
--- Isaac Gouy wrote:
We'll be happy to also show a Haskell program that uses Data.HashTable - first, someone needs to contribute that program.
Someone did: k-nucleotide Haskell GHC #2 http://shootout.alioth.debian.org/gp4/benchmark.php?test=knucleotide&lang=all

From: Isaac Gouy
To: haskell-cafe@haskell.org
Subject: [Haskell-cafe] Re: Haskell Speed
Date: Thu, 29 Dec 2005 13:00:15 -0800 (PST)

--- Isaac Gouy wrote:
We'll be happy to also show a Haskell program that uses Data.HashTable - first, someone needs to contribute that program.

Someone did: k-nucleotide Haskell GHC #2 http://shootout.alioth.debian.org/gp4/benchmark.php?test=knucleotide&lang=all
To comment on some observations about this program: most of the pressure now is on Data.HashTable. I suspected that the large memory usage came from substring-from-array conversions, so I made a version with

data MyString = MakeMyString { buf :: Ptr Char, size :: Int }

and there were no substrings in the map or anywhere else, but the memory consumption remained. After eliminating inserts and updates into the HashTable, memory was OK. Finally I realized that updates into the hash table actually increase memory usage to a large extent instead of keeping memory roughly constant on average. So I guess this is a bug in HashTable?

Second, in this test the hash function needs to be very strong; even with the following I got a longest chain of 16 elements:

myHashString = fromIntegral . ff''' . ff'' . ff' . foldr f 0
  where
    f c m   = f'' $ f' (ord c + m)
    f' m    = m + (m `shiftL` 10)
    f'' m   = m `xor` (m `shiftR` 6)
    ff' m   = m + (m `shiftL` 3)
    ff'' m  = m `xor` (m `shiftR` 11)
    ff''' m = m + (m `shiftL` 15)

The default hashString has a longest chain of 18 elements. Perhaps someone can explain whether such memory leaks from HashTable updates are normal, or are bugs? All in all, the functional version consumes less memory and is twice as fast.

Greetings, Bane.
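For readers who don't know the old Data.HashTable API being discussed, here is a minimal sketch of the lookup/insert/update counting pattern a k-nucleotide style program performs; the helper names and the fragment extraction are illustrative, not taken from the benchmark entry.

import qualified Data.HashTable as H

-- Count fixed-length fragments of a string, bumping a per-key counter:
-- one lookup, then either an insert (new key) or an update (existing key).
countFragments :: Int -> String -> IO (H.HashTable String Int)
countFragments k s = do
    ht <- H.new (==) H.hashString
    mapM_ (bump ht) [ frag | frag <- fragments s, length frag == k ]
    return ht
  where
    fragments str = [ take k (drop i str) | i <- [0 .. length str - 1] ]
    bump ht key = do
        found <- H.lookup ht key
        case found of
          Nothing -> H.insert ht key 1
          Just n  -> H.update ht key (n + 1) >> return ()

It is exactly this update step that Branimir observes growing the heap instead of reusing the existing entry's space.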

On Dec 29, 2005, at 7:44 PM, Branimir Maksimovic wrote:
To comment on some observations about this program: most of the pressure now is on Data.HashTable. I suspected that the large memory usage came from substring-from-array conversions, so I made a version with data MyString = MakeMyString { buf :: Ptr Char, size :: Int } and there were no substrings in the map or anywhere else, but the memory consumption remained. After eliminating inserts and updates into the HashTable, memory was OK. Finally I realized that updates into the hash table actually increase memory usage to a large extent instead of keeping memory roughly constant on average. So I guess this is a bug in HashTable?
Probably. The minimum table chunk size was rather large. I have been experimenting (tests are running even as I type) with alternate implementations of Data.HashTable. So far the winning implementation is one based on multiplicative hashing and simple table doubling. It seems to be consistently competitive with, or faster than, Data.Map.

At this point my inclination is to make the whole suite available:

Data.HashTable.Class
Data.HashTable.DataMap
Data.HashTable.Doubling
Data.HashTable.Flat
Data.HashTable.Freeze
Data.HashTable.Modulus
Data.HashTable.Multiplicative
Data.HashTable.Original
Data.HashTable.Test
Data.HashTable

I've separated out the hashing technique (Modulus, Multiplicative) from the hash table implementation. Note that a few of the above are bogus, and this is only a fraction of the implementations tried so far. I need to get distribution clearance for a bunch of this code from my employer, and figure out how to package it. The latter may be tricky, as Data.HashTable is currently rather intimately tied to a bunch of rather tricky bits of GHC.

-Jan-Willem Maessen
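Jan-Willem doesn't show his implementation here; as a generic illustration of the multiplicative hashing he names (the names and constant below are illustrative, not his code): with table doubling the table size is always a power of two, so a bucket index can be taken from the high bits of the key multiplied by a fixed odd constant.

import Data.Bits (shiftR)
import Data.Word (Word32)

-- Multiplicative (Fibonacci) hashing: multiply by a constant near
-- 2^32 / golden ratio and keep the top bits.  Word32 arithmetic wraps,
-- which is exactly what we want here.
multConst :: Word32
multConst = 2654435761   -- a commonly used choice

bucketIndex :: Int      -- log2 of the current table size
            -> Word32   -- pre-hashed key
            -> Int
bucketIndex logSize h =
    fromIntegral ((h * multConst) `shiftR` (32 - logSize))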

Hello Branimir,

Friday, December 30, 2005, 3:44:26 AM, you wrote:

BM> myHashString = fromIntegral . ff''' . ff'' . ff' . foldr f 0

I use the following hash function in my program:

filenameHash = fromIntegral . foldl (\h c -> h*37+(ord c)) 0

(it's written without an explicit parameter in order to try to help GHC's machinery better optimize this function)

BM> All in all functional version consumes less memory and is twice as
BM> fast.

IOArrays are second-class citizens in GHC/Haskell: they are scanned on _each_ GC, and this can substantially slow down a program which uses large IOArrays. Below are two runs of almost the same program using different array types. The program unpickles 10^6 Word32 values, using 1) Array Int Word32 and 2) IOArray Int Word32. The first program just unpickles a list and then uses listArray to transform it into an Array, while the second program writes the unpickled values directly into an IOArray. The second solution uses less memory, creates less temporary data, and obviously ought to be faster - but it actually runs 8 times slower, because of these dramatic GC times.

For your program, things will not be so bad, because the IOArray used inside your HashTable contains far fewer than a million elements and because your program has more to do itself. But try to compare the MUT time of the functional and imperative versions - maybe your algorithm really is faster and only the GC times make the comparison unfair.

... writing this message I thought that reducing the number of GCs could speed up my program, and your program too. So there is a third variant - using IOArray, but with "+RTS -A100m".
1) Array Int Word32
make array: 1.392 secs   (arr = listArray0 [1..10^6] :: Array Int Word32)
put array:  1.833 secs   (put_ h arr)
get array:  2.163 secs   (a <- get h :: IO (Array Int Word32))

  381,202,596 bytes allocated in the heap
  143,766,944 bytes copied during GC
   24,937,228 bytes maximum residency (8 sample(s))
  1439 collections in generation 0 ( 1.65s)
     8 collections in generation 1 ( 0.77s)
    54 Mb total memory in use
  INIT  time    0.01s  ( 0.00s elapsed)
  MUT   time    2.88s  ( 2.93s elapsed)
  GC    time    2.42s  ( 2.45s elapsed)
  EXIT  time    0.00s  ( 0.00s elapsed)
  Total time    5.32s  ( 5.39s elapsed)
  %GC time     45.6%   (45.5% elapsed)
  Alloc rate   131,721,698 bytes per MUT second
  Productivity 54.2% of total user, 53.5% of total elapsed
2) IOArray Int Word32
make array:  6.569 secs   (v <- newListArray (0,10^6-1) [1..10^6] :: IO (IOArray Int Word32))
put array:  14.110 secs   (put_ h arr)
get array:  23.874 secs   (a <- get h :: IO (IOArray Int Word32))

  360,377,080 bytes allocated in the heap
   87,007,920 bytes copied during GC
   18,237,472 bytes maximum residency (6 sample(s))
  1344 collections in generation 0 ( 40.77s)
     6 collections in generation 1 (  0.38s)
    36 Mb total memory in use
  INIT  time    0.01s  ( 0.00s elapsed)
  MUT   time    3.15s  ( 3.16s elapsed)
  GC    time   41.15s  (41.40s elapsed)
  EXIT  time    0.00s  ( 0.00s elapsed)
  Total time   44.31s  (44.56s elapsed)
  %GC time     92.9%   (92.9% elapsed)
  Alloc rate   114,007,301 bytes per MUT second
  Productivity  7.1% of total user, 7.1% of total elapsed
3) IOArray Int Word32 with "+RTS -A100m"
make: 0.391 secs
put:  2.343 secs
get:  2.003 secs

  440,654,968 bytes allocated in the heap
   26,413,568 bytes copied during GC
   20,043,828 bytes maximum residency (2 sample(s))
     4 collections in generation 0 ( 0.23s)
     2 collections in generation 1 ( 0.64s)
   120 Mb total memory in use
  INIT  time    0.02s  ( 0.00s elapsed)
  MUT   time    3.81s  ( 3.83s elapsed)
  GC    time    0.87s  ( 0.90s elapsed)
  EXIT  time    0.00s  ( 0.00s elapsed)
  Total time    4.71s  ( 4.74s elapsed)
  %GC time     18.6%   (19.1% elapsed)
  Alloc rate   114,963,466 bytes per MUT second
  Productivity 81.0% of total user, 80.5% of total elapsed

-- Best regards, Bulat mailto:bulatz@HotPOP.com
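For concreteness, here is a rough sketch of the two array constructions timed above; the bounds and names are assumed from Bulat's description (his pickling functions put_/get are from his own library and are not shown), and the third run is simply the same IOArray program started with "+RTS -A100m".

import Data.Array (Array, listArray)
import Data.Array.IO (IOArray, newListArray)
import Data.Word (Word32)

-- Run 1: a pure, immutable Array built from the unpickled list.
mkArray :: Array Int Word32
mkArray = listArray (0, 10^6 - 1) [1 .. 10^6]

-- Runs 2 and 3: a boxed, mutable IOArray filled directly; every GC has to
-- scan its boxed contents, which is where the 41s of GC time comes from.
mkIOArray :: IO (IOArray Int Word32)
mkIOArray = newListArray (0, 10^6 - 1) [1 .. 10^6]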

From: Bulat Ziganshin
Reply-To: Bulat Ziganshin
To: "Branimir Maksimovic"
CC: igouy@yahoo.com, haskell-cafe@haskell.org
Subject: Re[2]: [Haskell-cafe] Re: Haskell Speed
Date: Fri, 30 Dec 2005 17:56:57 +0300

Hello Branimir,
Friday, December 30, 2005, 3:44:26 AM, you wrote:
BM> myHashString = fromIntegral . ff''' . ff'' . ff' . foldr f 0
I use the following hash function in my program:

filenameHash = fromIntegral . foldl (\h c -> h*37+(ord c)) 0

(it's written without an explicit parameter in order to try to help GHC's machinery better optimize this function)
I've found that the hashing function, and anything that does calculation, is fast enough (I compared an imported C function and Haskell, and there was no real difference; Haskell is fast regarding calculations). The problem is the memory leak. In this case the hashing function has to be strong in order to avoid linear search. Once the memory leak with HashTable disappears I will use some even better hash functions.
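The chain lengths quoted in this thread (16 vs. 18 elements) are presumably measured with Data.HashTable's longestChain, which exists for exactly this kind of hash-quality check; a minimal sketch of such a check, with an assumed table type and function name:

import qualified Data.HashTable as H

-- Report the length of the worst bucket chain; a strong hash keeps this
-- small, while a weak one degenerates toward linear search.
reportLongestChain :: H.HashTable String Int -> IO ()
reportLongestChain ht = do
    chain <- H.longestChain ht
    putStrLn ("longest chain: " ++ show (length chain) ++ " elements")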
BM> All in all functional version consumes less memory and is twice as
BM> fast.
IOArrays are second-class citizens in GHC/Haskell: they are scanned on _each_ GC, and this can substantially slow down a program which uses large IOArrays.
Hm, there is the Hans Boehm GC for C and C++, and I have gcmalloc and gcmalloc_atomic. Why isn't this present in Haskell? gcmalloc_atomic is useful when allocating large arrays that do not contain any references/pointers.
For your program, things will not be so bad, because the IOArray used inside your HashTable contains far fewer than a million elements and because your program has more to do itself. But try to compare the MUT time of the functional and imperative versions - maybe your algorithm really is faster and only the GC times make the comparison unfair.
Hm, that can be solved if the hash table uses gcmalloc_atomic, I guess (if all the execution time goes to GC scans).
... writing this message I thought that reducing the number of GCs could speed up my program, and your program too. So there is a third variant - using IOArray, but with "+RTS -A100m".
Wow, that option almost doubled the speed of the HashTable program (the memory leak remains). Thank you for enlightening me about GC. After this, can I think of gc_add_scanrange, gc_remove_scanrange and gcmalloc_atomic functions, as they are now common in other languages? We need a low-level GC interface in order to produce fast programs.

Greetings, Bane.

Hello Branimir, Saturday, December 31, 2005, 4:55:51 AM, you wrote:
IOArrays are second-class citizens in GHC/Haskell: they are scanned on _each_ GC, and this can substantially slow down a program which uses large IOArrays.
BM> Hm, there is the Hans Boehm GC for C and C++, and I have gcmalloc and
BM> gcmalloc_atomic. Why isn't this present in Haskell? gcmalloc_atomic is
BM> useful when allocating large arrays that do not contain any
BM> references/pointers.

Because these arrays CONTAIN references :) they reference all the values inserted into the array so far. There is also IOUArray, which contains plain unboxed values of simple types (Int, Float, ...) and doesn't slow down the program.
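A minimal sketch, not from the thread, of the unboxed alternative Bulat mentions: an IOUArray of Word32 stores its elements flat, so the collector has no pointers to trace inside it, unlike the boxed IOArray timed above.

import Data.Array.IO (IOUArray, newArray, readArray, writeArray)
import Data.Word (Word32)

-- Fill and sum a million-element unboxed array.  The Word32s are stored
-- directly rather than as boxed heap objects, so GC never scans them.
sumUnboxed :: IO Word32
sumUnboxed = do
    arr <- newArray (0, 999999) 0 :: IO (IOUArray Int Word32)
    mapM_ (\i -> writeArray arr i (fromIntegral i)) [0 .. 999999]
    xs  <- mapM (readArray arr) [0 .. 999999]
    return (sum xs)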
... writing this message I thought that reducing the number of GCs could speed up my program, and your program too. So there is a third variant - using IOArray, but with "+RTS -A100m".
BM> Wow, that option almost doubled the speed of the HashTable program (the
BM> memory leak remains).
Use the option "+RTS -sstderr" to see the runtime of the program itself ("MUT time") and the "GC time" separately. See section "4.14.2. RTS options to control the garbage collector" of the GHC guide for more information.

-- Best regards, Bulat mailto:bulatz@HotPOP.com
participants (4)
- Branimir Maksimovic
- Bulat Ziganshin
- Isaac Gouy
- Jan-Willem Maessen