
Hello again,
I actually ran the original code (on the same file, still using
gnome-terminal). I noticed that it sped up as it went along, so it
didn't take quite as much time as I originally thought it might:
real 13m5.190s
user 6m1.595s
sys 0m3.061s
By the way, the finite map code, when run from the Linux console as
opposed to a gnome-terminal, gives 5.189s of real time. (The disparity
between real and user times really was gnome-terminal's fault.)
- Cale
On Mon, 3 Jan 2005 03:06:28 -0500, Cale Gibbard wrote:
Hello,
I found the following implementation using finite maps to work rather
well. (See
http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data.FiniteMap.ht...)
-------------------
import Char
import Data.FiniteMap
isLetter c = isAlphaNum c || c == '\''
normalize = map toLower . filter isLetter
nwords = filter (not . null) . map normalize . words
wordCount w = addListToFM_C (+) emptyFM (zip (nwords w) [1,1 ..])
main = do s <- getContents
          mapM_ print $ fmToList $ wordCount s
-----------------
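For readers on newer GHC versions, here is a sketch of the same
technique using Data.Map (Data.FiniteMap was later deprecated in its
favour); the function names differ, but the approach is identical. The
translation is mine, not part of the original post:

```haskell
import Data.Char (isAlphaNum, toLower)
import qualified Data.Map as Map

-- A word character is a letter, digit, or apostrophe.
isWordChar :: Char -> Bool
isWordChar c = isAlphaNum c || c == '\''

-- Lower-case a word and drop any non-word characters.
normalize :: String -> String
normalize = map toLower . filter isWordChar

-- Split input into normalized, non-empty words.
nwords :: String -> [String]
nwords = filter (not . null) . map normalize . words

-- Tally each word; fromListWith (+) plays the role of addListToFM_C (+).
wordCount :: String -> Map.Map String Int
wordCount w = Map.fromListWith (+) (zip (nwords w) (repeat 1))

main :: IO ()
main = do
  s <- getContents
  mapM_ print (Map.toList (wordCount s))
```

Map.fromListWith combines duplicate keys with the supplied function,
just as addListToFM_C does, so the asymptotics are the same.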
On a file with 264724 words and 17391 distinct words, this had the
following times:
real 0m38.208s
user 0m4.302s
sys 0m0.214s
It was compiled with -O using GHC 6.2.2. I had to increase the stack
size slightly with a command-line option.
Most of the time was actually spent rendering the output. I didn't bother to wait for the original code to finish, as it was reporting only a few distinct words per second.
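Since rendering the output dominated the runtime, one possible
improvement (my sketch, not something from the post) is to build the
whole report as a single String and emit it with one putStr, rather
than issuing one print call per entry:

```haskell
-- Render (word, count) pairs as one String, one "word count" per line,
-- so the whole report can be written with a single putStr.
render :: [(String, Int)] -> String
render = unlines . map (\(w, n) -> w ++ " " ++ show n)

main :: IO ()
main = putStr (render [("hello", 2), ("world", 1)])
```

This also drops the quotation marks that print/show put around each
word, which is usually what you want for a word-count report.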
Hope this helps - Cale