Another beginner's memory consumption problem...

I'm trying the add-a-gram challenge from here: http://www.itasoftware.com/careers/programmers-archive.php ... and I'm also experiencing runaway memory consumption. If I load the supplied list of words (a 1.6M file) and search for shorter strings, things are OK. Memory consumption increases dramatically as I search for longer and longer trails. The function searchWord starts with a particular map and searches the maps below it (i.e. those containing shorter words), so when I search for longer trails, a larger number of maps must be examined. When I search for short trails, only a few (2-3 say) of the maps are examined. I think this implies some kind of lazy evaluation problem, but I'm not sure where to go from here. Was it a bad idea to use a Map of Maps as a data structure? Initially I had a list of Maps, but this wouldn't even load the full file (so I suppose lazy evaluation has allowed me to get a bit further...). Maps seemed like the right data structures to use for this problem. Is there some way I could force evaluation of the maps as the file is loaded? (Or is blaming the maps a red herring?) Program below:
module Main where
import Data.FiniteMap import Data.List import System.IO
Data structure is a Map of Maps. The top-level Map holds Maps indexed by Int (for words of size n). Each of these Maps holds words of that size, where a key is the sorted list of chars for a word, and the value is the list of words whose sorted characters are that key i.e. all words constructed from the same set of characters.
type WordMaps = FiniteMap Int WordMap type WordMap = FiniteMap String [String] type MyWord = String
ail + s = sail + n = nails + e = aliens + t = salient + r = entrails + c = clarinets + e = interlaces + d = CREDENTIALS (length 11) mar + c = cram + h = march + s = charms + o = chromas + n = monarchs + i = harmonics + a = maraschino + n = ANACHRONISM (length 11) -----------------------------------------------------------
addWordToMap :: String -> WordMap -> WordMap addWordToMap word m = case lookupFM m key of Just wlist -> addToFM m key (word:wlist) Nothing -> addToFM m key [word] where key = sort word
addWord :: String -> WordMaps -> WordMaps addWord word maps = addToFM maps key newmap where key = length word newmap = addWordToMap word oldmap oldmap = case (lookupFM maps key) of Just m -> m Nothing -> emptyFM
populateMaps :: WordMaps -> IO WordMaps populateMaps maps = do eof <- isEOF if eof then return maps else do word <- getLine if (length word) > 0 -- test for blank lines then populateMaps (addWord word maps) else populateMaps maps
-----------------------------------------------------------
getMap :: WordMaps -> Int -> WordMap getMap maps key = case (lookupFM maps key) of Nothing -> emptyFM Just m -> m
showWordList :: [String] -> String showWordList = concat . intersperse ", "
printTrail :: WordMaps -> [String] -> String printTrail _ [] = "" printTrail maps (w:ws) = (showWordList elts) ++ "\n" ++ (printTrail maps ws) where m = getMap maps (length w) elts = lookupWithDefaultFM m [""] w
removeEachChar :: String -> [String] removeEachChar word = removeEachChar' word (length word) []
removeEachChar' :: String -> Int -> [String] -> [String] removeEachChar' [] _ list = list removeEachChar' _ 0 list = list removeEachChar' word n list = removeEachChar' word (n-1) (newword:list) where newword = (take (n-1) word) ++ (drop n word)
unfussyHead :: [[String]] -> [String] unfussyHead [] = [] unfussyHead l = head l
Given a word (and a trail so far) this will return either an empty trail, or a trail to the first 3-char word that completes the search. A trail is just the list of keys, shortest first.
searchWord :: WordMaps -> [String] -> String -> [String] searchWord maps trail word | length word <= 3 = word:trail | otherwise = case lookupFM m word of Nothing -> [] Just _ -> unfussyHead $ filter ([] /=) $ map cont (removeEachChar word) where cont = searchWord maps (word:trail) m = getMap maps (length word)
For each word (key) in a level, call searchWord. The first to return a non-empty list is the winner. If the list for this level is empty, then recurse with the next level down.
searchLevel :: WordMaps -> Int -> [String] searchLevel _ 3 = [] searchLevel maps level = case (lookupFM maps level) of Nothing -> [] Just m -> let result = filter ([] /= ) $ map (searchWord maps []) (keysFM m) in if result == [] then searchLevel maps (level-1) else head result
Kick off search at top-level.
start :: WordMaps -> [String] start maps = searchLevel maps (last (keysFM maps))
Look for a specific trail. For testing/debugging.
findw :: WordMaps -> [String] findw maps = searchWord maps [] (sort "entrails")
findw maps = searchWord maps [] (sort "nails") findw maps = searchWord maps [] (sort "salient") findw maps = searchWord maps [] (sort "entrails") findw maps = searchWord maps [] (sort "clarinets")
readWords :: IO String readWords = do wordMaps <- populateMaps emptyFM return (printTrail wordMaps (findw wordMaps)) -- This is how we'd normally start: --return (printTrail wordMaps (start wordMaps))
main :: IO () main = do s <- readWords putStrLn s return ()
***************************************************************** The information in this email and in any attachments is confidential and intended solely for the attention and use of the named addressee(s). This information may be subject to legal professional or other privilege or may otherwise be protected by work product immunity or other legal rules. It must not be disclosed to any person without our authority. If you are not the intended recipient, or a person responsible for delivering it to the intended recipient, you are not authorised to and must not disclose, copy, distribute, or retain this message or any part of it. *****************************************************************
participants (1)
-
Bayley, Alistair