
Kamil Dworakowski wrote:
On Jun 22, 10:03 am, Eugene Kirpichov
wrote: Hey, you're using String I/O!
nWORDS <- fmap (train . map B.pack . words) (readFile "big.txt")
This should be
WORDS <- fmap (train . B.words) (B.readFile "big.txt")
By the way, which exact file do you use as a misspellings file? The corpus linked to at Norvig's page has many. And do you have a driver program that I could run and obtain your timings?
Yep, Don pointed that out and I have changed the program accordingly. It didn't make any difference though. The time spent on building the dictionary is a small portion of the overall run time.
Please see the repository contents for the current version of the program: http://patch-tag.com/r/spellcorrect/snapshot/current/content/pretty
The eval-bytestring.hs there is the program I used for timing. Inside of it you will find the name of the misspellings file needed.
Thanks all for the suggestions. I'll try them when I get home tonight.
Another suggestion, is that you should try to make sure that the lists constructed by getCommonSpellingMistakesWithCorrections get optimized away. As written I'm not sure there will be sufficient inlining to ensure that. If you want to be explicit about removing them, something like the following should help:
module Main where import Prelude hiding (words) import SpellingCorrection import qualified Data.ByteString.Char8 as B import Data.Char import Data.IORef import Control.Monad (forM_)
main = do corrector <- getCorrector misspell_corpus <- B.readFile "FAWTHROP1DAT.643" n <- newIORef (0::Int) wrong <- newIORef (0::Int) forM_ (B.lines misspell_corpus) $ \line -> do modifyIORef' n (1+) let [ms,c] = map (B.map toLower) . B.words $ line if corrector ms /= c then modifyIORef' wrong (1+) else return () accuracy <- do n' <- readIORef n w' <- readIORef wrong return $! 100 * fromIntegral w' / fromIntegral n' putStrLn $ "accuracy" ++ show (100 - accuracy) ++ "%"
modifyIORef' :: IORef a -> (a -> a) -> IO () modifyIORef' r f = readIORef r >>= (writeIORef r $!) . f {-# INLINE modifyIORef' #-}
-- Live well, ~wren