
Yup, that's it. It does indeed go nice and fast with GHC 5.04.3, but slows down with 6.2. This is because 6.2 is using the new Data.PackedString library, which is unoptimised. (the code also needs a fix to hashPS for 6.2: indexPS is 0-based rather than 1-based). Cheers, Simon
Does the code below look familiar?
/Josef
\begin{code} -- compile with: ghc -O -package lang -- run with: ./a.out +RTS -H10m -K4m
import MArray import PackedString import IOExts import IO import Char import Monad
arr_size = 20000
main = do h <- openFile "Usr.Dict.Words" ReadMode sz <- hFileSize h ps <- hGetPS h (fromIntegral sz)
tbl <- newArray (0,arr_size) [] mapM (addToHashTable tbl) (linesPS ps)
h <- openFile "Input.txt" ReadMode sz <- hFileSize h ps <- hGetPS h (fromIntegral sz)
let test s = do b <- elemHashTable s tbl when (not b) (putStrLn (unpackPS s)) mapM test (linesPS ps)
type HashTable = IOArray Int [PackedString]
-- Looks bad, but GHC does a great job of optimising it: hashPS :: PackedString -> Int hashPS ps = foldr f 0 (map (ord.indexPS ps) [1..lengthPS ps]) where f n m = n + m * 128 `mod` 1048583
addToHashTable :: HashTable -> PackedString -> IO () addToHashTable tbl s = do let h = hashPS s index = h `mod` arr_size r <- readArray tbl index if (s `elem` r) then return () else do writeArray tbl index (s : r)
elemHashTable :: PackedString -> HashTable -> IO Bool elemHashTable s tbl = do let h = hashPS s index = h `mod` arr_size r <- readArray tbl index return (s `elem` r) \end{code} _______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
participants (1)
-
Simon Marlow