
Hello,
I found the following implementation using finite maps to work rather
well. (See http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data.FiniteMap.ht...)
-------------------
import Char
import Data.FiniteMap
isLetter c = isAlphaNum c || c=='\''
normalize = map toLower . filter isLetter
nwords = filter (not . null) . map normalize . words
wordCount w = addListToFM_C (+) emptyFM (zip (nwords w) [1,1 ..])
main = do
s <- getContents
mapM_ print $ fmToList $ wordCount s
-----------------
On a file with 264724 words and 17391 distinct words, this had the
following times:
real 0m38.208s
user 0m4.302s
sys 0m0.214s
It was compiled with -O using ghc 6.2.2. I had to increse the stack
size slightly with a commandline option.
Most of the time was actually spent rendering the output. I didn't
bother to wait for the original code to finish, as it was reporting
only a few distinct words per second.
Hope this helps
- Cale
On Sat, 01 Jan 2005 13:28:41 +0000, Dominic Fox
Another newbie question.
The following is a naive attempt at a word count function, that takes a string and returns a list of word/count pairs.
import Char
isLetter c = isAlphaNum c || c=='\''
normalize = map toLower . filter isLetter
isWord [] = False isWord s = True
nwords = filter isWord . map normalize . words
update w [] = [(w, 1)] update w ((x, n):xs) = if x==w then (x, n+1) : xs else (x, n) : (update w xs)
wordCount = foldr update [] . nwords
It works fine, but it's obviously very inefficient. I could make it less inefficient by implementing the word/count dictionary as a hash table or binary tree, but that's what standard libraries are for. So - what is there in the standard libraries that I could use in this scenario? I've looked at the Array module, but don't think it's quite suitable (we don't know the array bounds ahead of time, and the indices are strings). Data.HashTable looks better, but involves IO as it mutates the dictionary entries in place. Is there an alternative that could be used with a fold as in the code above?
Dominic _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe