
Pete Kazmier wrote:
Bryan O'Sullivan
writes: After this switch, I found that spellchecking one word still took 4x as long in Haskell as Norvig's Python program. Since I was checking only one word in each case, essentially all of the runtime was taken up by building the word frequency map.
train = foldl' updateMap M.empty . map lower . mkWords where updateMap model word = M.insertWith' (+) word 1 model mkWords = filter (not . B.null) . X.splitWith isNotAlpha lower !s = X.map toLower s isNotAlpha !c = c < 0x41 || (c > 0x5a && c < 0x61) || c > 0x7a toLower !c | c >= 0x41 && c <= 0x5a = c + 0x20 | otherwise = c
After reading Bryan's post, I switched my right fold to a strict left fold and almost tripled my original speed. Could someone provide some guidelines on when to use each? I thought foldr should be used when building some large data structure such as the map I build.
Bryan, out of curiosity, is a non bytestring version of your code faster?