RE: Data.HashTable bug

19 Mar 2004


      Yup, that's it.  It does indeed go nice and fast with GHC 5.04.3, but
slows down with 6.2.  This is because 6.2 is using the new
Data.PackedString library, which is unoptimised.  (the code also needs a
fix to hashPS for 6.2: indexPS is 0-based rather than 1-based).
 
Cheers,
	Simon
...
Does the code below look familiar?
/Josef
\begin{code}
-- compile with: ghc -O -package lang
-- run with:     ./a.out +RTS -H10m -K4m 
import MArray
import PackedString
import IOExts
import IO
import Char
import Monad
arr_size = 20000
main = do
  h <- openFile "Usr.Dict.Words" ReadMode
  sz <- hFileSize h
  ps <- hGetPS h (fromIntegral sz)
tbl <- newArray (0,arr_size) []
  mapM (addToHashTable tbl) (linesPS ps)
h <- openFile "Input.txt" ReadMode
  sz <- hFileSize h
  ps <- hGetPS h (fromIntegral sz)
let test s = do b <- elemHashTable s tbl
        when (not b) (putStrLn (unpackPS s))
  mapM test (linesPS ps)
type HashTable = IOArray Int [PackedString]
-- Looks bad, but GHC does a great job of optimising it:
hashPS :: PackedString -> Int
hashPS ps = foldr f 0 (map (ord.indexPS ps) [1..lengthPS ps])
  where f n m = n + m * 128 `mod` 1048583
addToHashTable :: HashTable -> PackedString -> IO ()
addToHashTable tbl s = do
  let h = hashPS s
      index = h `mod` arr_size
  r <- readArray tbl index
  if (s `elem` r) then return () else do
  writeArray tbl index (s : r)
elemHashTable :: PackedString -> HashTable -> IO Bool
elemHashTable s tbl = do
  let h = hashPS s
      index = h `mod` arr_size
  r <- readArray tbl index
  return (s `elem` r)
\end{code}
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Simon Marlow

tags

participants (1)