fundata1 -- Karmic Social Capital Benchmark and Shootout

I am happy to announce fundata1 -- the largest-ever program per RAM allocation in Haskell, originally implemented in Clojure and then OCaml and Haskell for social network modeling. http://github.com/alexy/fundata1 It has now become the first large-scale social networking benchmark with a real dynamic social graph built from the actual Twitter gardenhose, with the data OK'd by Twitter and supplied along with the benchmark. I wrote three reference implementations, all on github as well. Clojure and OCaml are quite basic, while Haskell community had a chance to optimize its data structures and in fact fix a GC integer overflow while working on it. You're welcome to fork and improve all of these implementations, and supply others! There's a Google Group, http://groups.google.com/group/fundata/ to discuss the shootout. There's also a blog about it and other functional things at http://functional.tv/ Let the fun begin! -- Alexy Khrabrov firstname.lastnameATgmaildotcom

Great stuff! I have an improvements to HashMaps that I'm working on that will hopefully work well here.

In the lessons you say: Haskell proved too slow with String Map, so we ended up interning strings
and working with an IntMap and a dictionary to disintern back to strings as a last step. Daniel Fisher was instrumental in bringing Haskell up to speed with OCaml and then beating it. Don Stewart provided awesome leadership and amazing modification of Haskell's core data structured before your very eyes.
Can you elaborate on this?
and What do you mean by: "modification of Haskell's core data structured " ?
Daryoush
On Thu, Oct 28, 2010 at 5:53 PM, Alexy Khrabrov
I am happy to announce fundata1 -- the largest-ever program per RAM allocation in Haskell, originally implemented in Clojure and then OCaml and Haskell for social network modeling.
http://github.com/alexy/fundata1
It has now become the first large-scale social networking benchmark with a real dynamic social graph built from the actual Twitter gardenhose, with the data OK'd by Twitter and supplied along with the benchmark.
I wrote three reference implementations, all on github as well. Clojure and OCaml are quite basic, while Haskell community had a chance to optimize its data structures and in fact fix a GC integer overflow while working on it. You're welcome to fork and improve all of these implementations, and supply others!
There's a Google Group,
http://groups.google.com/group/fundata/
to discuss the shootout. There's also a blog about it and other functional things at
Let the fun begin!
-- Alexy Khrabrov firstname.lastnameATgmaildotcom
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Daryoush Weblog: http://perlustration.blogspot.com/

dmehrtash:
In the lessons you say:
Haskell proved too slow with String Map, so we ended up interning strings and working with an IntMap and a dictionary to disintern back to strings as a last step. Daniel Fisher was instrumental in bringing Haskell up to speed with OCaml and then beating it. Don Stewart provided awesome leadership and amazing modification of Haskell's core data structured before your very eyes.
Can you elaborate on this?
and What do you mean by: "modification of Haskell's core data structured " ?
I think he means some of the stuff Daniel and I did with specializing data structures (like IntMap) to their monomorphic key / elem types. -- Don

On Fri, Oct 29, 2010 at 1:17 PM, Daryoush Mehrtash
In the lessons you say:
Haskell proved too slow with String Map, so we ended up interning strings and working with an IntMap and a dictionary to disintern back to strings as a last step. Daniel Fisher was instrumental in bringing Haskell up to speed with OCaml and then beating it. Don Stewart provided awesome leadership and amazing modification of Haskell's core data structured before your very eyes.
A quick tip for anyone who needs to work on large maps with string keys in the future: try Milan Straka's hashmap package. String comparison is expensive and using a size balanced tree causes O(log n) of them. The HashMap data type uses an IntMap internally and typically (in the absence of collisions) needs to traverse the string only once, to compute the hash, and can then perform O(log n) cheap Int comparisons. I'm working on a more efficient hash function for ByteString and Text: an implementation of MurmurHash. The HashMap data type and the need hash function together should give us good performance for maps with string keys. Johan
participants (4)
-
Alexy Khrabrov
-
Daryoush Mehrtash
-
Don Stewart
-
Johan Tibell