ANN: bytestring-trie 0.1.4

-------------------------------------------- -- bytestring-trie 0.1.4 -------------------------------------------- Another release for efficient finite maps from (byte)strings to values. Equal parts bugfix release and real release. This upgrade is suggested for all users and is what 0.1.0 should have been. -------------------------------------------- -- Changes (since 0.1.2) -------------------------------------------- * Fixed a number of obscure bugs when "" is used as a key. * Added keys and toListBy (which generalizes toList, keys,...). Also did some optimization of toListBy so it shouldn't be as egregiously inefficient as it was before. * Added fromListL, fromListR, and fromListS to Data.Trie.Convenience along with notes about when each is more efficient than the others. * Separated Data.Trie (the main module for users) from Data.Trie.Internal (gritty details, and core implementation). Data.Trie re-exports most of Data.Trie.Internal so this is primarily a cosmetic change. The showTrie and lookupBy_ functions are no longer exported from Data.Trie however. -------------------------------------------- -- Links -------------------------------------------- Homepage: http://code.haskell.org/~wren/ Hackage: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-trie Darcs: http://code.haskell.org/~wren/bytestring-trie/ Haddock (Darcs version): http://code.haskell.org/~wren/bytestring-trie/dist/doc/html/bytestring-trie/ -- Live well, ~wren

Do you have any benchmarks comparing dictionaries against Map ByteString Int, or Map String Int? -- Don wren:
-------------------------------------------- -- bytestring-trie 0.1.4 --------------------------------------------
Another release for efficient finite maps from (byte)strings to values. Equal parts bugfix release and real release. This upgrade is suggested for all users and is what 0.1.0 should have been.
-------------------------------------------- -- Changes (since 0.1.2) --------------------------------------------
* Fixed a number of obscure bugs when "" is used as a key.
* Added keys and toListBy (which generalizes toList, keys,...). Also did some optimization of toListBy so it shouldn't be as egregiously inefficient as it was before.
* Added fromListL, fromListR, and fromListS to Data.Trie.Convenience along with notes about when each is more efficient than the others.
* Separated Data.Trie (the main module for users) from Data.Trie.Internal (gritty details, and core implementation). Data.Trie re-exports most of Data.Trie.Internal so this is primarily a cosmetic change. The showTrie and lookupBy_ functions are no longer exported from Data.Trie however.
-------------------------------------------- -- Links --------------------------------------------
Homepage: http://code.haskell.org/~wren/
Hackage:
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-trie
Darcs: http://code.haskell.org/~wren/bytestring-trie/
Haddock (Darcs version):
http://code.haskell.org/~wren/bytestring-trie/dist/doc/html/bytestring-trie/
-- Live well, ~wren
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Don Stewart wrote:
Do you have any benchmarks comparing dictionaries against Map ByteString Int, or Map String Int?
I haven't personally run them, but Mark Wotton has compared [(ByteString,Int)] vs (Map ByteString Int) vs (Trie Int) version 0.1.1 or 0.1.2 and using data from /usr/share/dict/words: 9:22 ~/projects/current/MapBench % ghc --make Test.hs ../../Microbench.hs && ./Test [1 of 2] Compiling Microbench ( ../../Microbench.hs, ../../Microbench.o ) [2 of 2] Compiling Main ( Test.hs, Test.o ) Linking Test ... * alist lookup: 160.641ns per iteration / 6225.07 per second. * map lookup: 0.881ns per iteration / 1135623.22 per second. * Trie lookup: 0.243ns per iteration / 4116930.41 per second. The benchmarking code requires hacking microbench to use Integers instead of Ints, to avoid counter overflow. I haven't yet worked the benchmarking code into the Darcs repo, but it's available to those interested. I'll add it to the repo soon. The cost of inserting elements is higher for Trie than Map (I don't have the numbers). One possible reason for this is that the generic alterBy doesn't know whether it's inserting or deleting, so it uses smart constructors along the spine which adds the cost of consistency checks. I've planned to add a dedicated implementation for insertBy (or modifyBy, or some such name) in order to test this hypothesis. The cost of merging is still unknown, but big-endian patricia trees have better asymptotic complexity than the balanced trees used by Map, so Trie is probably better there too. Another improvement in the works is a smarter algorithm for splitting strings. Once it's in place, this should speed up all three of the main algorithms. -- Live well, ~wren
participants (2)
-
Don Stewart
-
wren ng thornton