
I dump the results of a computation as a Data.Trie of [(Int,Float)]. It contains about 5 million entries, each a list of 35 or fewer pairs. It takes 8 minutes to load it with Data.Binary and look up a single key. What can take so long? If I switch from compressed to uncompressed input (and then decode), the time is the same... It's not IO; the CPU is loaded at 100%.

I'm now thinking of using cereal. Given that I have Data.Binary in place, what needs to change to work with cereal? Is it binary-compatible? How can one construct a cereal instance for Data.Trie?

-- Alexy

cereal and binary are almost identical: cereal loads from a single strict ByteString, while binary loads from a lazy ByteString, i.e. a sequence of strict chunks. I'd imagine that constructing your list is the problem, not the parsing. Do some profiling first, since I doubt that switching to cereal will make much of a difference.

-- Don
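
To make the API difference described above concrete, here is a minimal sketch, assuming the binary, cereal and bytestring packages are installed. The `Entry` alias and the `viaBinary`/`viaCereal` helpers are names made up for this illustration. It only shows that the two libraries expose nearly the same `encode`/`decode` pair over different ByteString types; it does not assume the two byte formats are interchangeable, which depends on the instances and library versions in use.

```haskell
import qualified Data.Binary as Bin       -- works on lazy ByteStrings
import qualified Data.Serialize as Cer    -- cereal: works on strict ByteStrings

-- Hypothetical alias for one trie value, as described in the question.
type Entry = [(Int, Float)]

-- binary: encode yields a lazy ByteString; decode throws on malformed input.
viaBinary :: Entry -> Entry
viaBinary = Bin.decode . Bin.encode

-- cereal: encode yields a strict ByteString; decode reports errors in Either.
viaCereal :: Entry -> Either String Entry
viaCereal = Cer.decode . Cer.encode

main :: IO ()
main = do
  let sample = [(1, 0.5), (2, 1.25)] :: Entry
  print (viaBinary sample == sample)
  print (viaCereal sample == Right sample)
```

In practice the port mostly means swapping the import, switching file reads to strict ByteStrings, and handling the `Either` result that cereal's `decode` returns.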

On Sat, Jul 3, 2010 at 8:57 PM, braver wrote:
> It takes 8 minutes to load with Data.Binary and lookup a single key. What can take so long?

I suspect the Float instance is the problem. Try doing the same with (Int,Int) pairs.

On Sat, Jul 3, 2010 at 10:54 PM, Alexey Khudyakov wrote:
> I suspect the Float instance is the problem. Try doing the same with (Int,Int) pairs.

To clarify: if there is a significant improvement in performance (several times, or tens of times), then the Float instance is indeed the culprit.
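
The experiment suggested here can be sketched roughly as follows, assuming the binary, bytestring and time packages. The data generators `floatPairs` and `intPairs`, the `timed` helper and the size `n` are all hypothetical stand-ins for the real data. The sketch encodes the same number of pairs with a Float and with an Int payload, then times only the decoding step.

```haskell
import Control.Exception (evaluate)
import qualified Data.Binary as Bin
import qualified Data.ByteString.Lazy as L
import Data.Time.Clock (diffUTCTime, getCurrentTime)

-- Hypothetical test data: n pairs with a Float payload.
floatPairs :: Int -> [(Int, Float)]
floatPairs n = [(i, fromIntegral i / 7) | i <- [1 .. n]]

-- The same shape with an Int payload, for comparison.
intPairs :: Int -> [(Int, Int)]
intPairs n = [(i, i * 3) | i <- [1 .. n]]

-- Run an action and print the wall-clock time it took.
timed :: String -> IO a -> IO a
timed label act = do
  t0 <- getCurrentTime
  x <- act
  t1 <- getCurrentTime
  putStrLn (label ++ ": " ++ show (diffUTCTime t1 t0))
  return x

main :: IO ()
main = do
  let n = 200000
      bytesF = Bin.encode (floatPairs n)
      bytesI = Bin.encode (intPairs n)
  -- Force both encodings up front so that only decoding is timed.
  _ <- evaluate (L.length bytesF)
  _ <- evaluate (L.length bytesI)
  -- Summing every component forces each decoded value.
  _ <- timed "decode [(Int,Float)]" $
    evaluate (sum [fromIntegral a + b | (a, b) <- (Bin.decode bytesF :: [(Int, Float)])])
  _ <- timed "decode [(Int,Int)]" $
    evaluate (sum [a + b | (a, b) <- (Bin.decode bytesI :: [(Int, Int)])])
  return ()
```

Wall-clock timing like this is only a rough first check; GHC's profiler or a benchmarking library would give more reliable numbers.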

braver wrote:
> It takes 8 minutes to load with Data.Binary and lookup a single key. What can take so long?

The Binary instance for Trie is based on the old Binary instance for Data.IntMap. The latter had some corner-case performance issues which were recently fixed[1], but I haven't had a chance to look at the new instance or to figure out whether the changes would also be relevant for Trie. So this might be a source of your problems.

[1] Alas, I can't find the thread discussing the new instance ATM.

> How can one construct a cereal instance for Data.Trie?

If you send me an instance, I can apply the patch.

--
Live well,
~wren
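
As a starting point for the cereal instance asked about in this thread, here is a minimal sketch, assuming the bytestring-trie and cereal packages. It is not the library's own encoding and is not claimed to be byte-compatible with the existing Binary instance: it simply serializes the trie as a list of key/value pairs and rebuilds it with `fromList`. The `SerialTrie` wrapper is a hypothetical name, used here only to avoid defining an orphan instance.

```haskell
import qualified Data.ByteString.Char8 as BC
import Data.Serialize (Serialize (..), decode, encode)
import qualified Data.Trie as T

-- Hypothetical wrapper so the instance below is not an orphan.
newtype SerialTrie a = SerialTrie { unSerialTrie :: T.Trie a }

-- Assumption: a flat list of (key, value) pairs is an acceptable format.
instance Serialize a => Serialize (SerialTrie a) where
  put = put . T.toList . unSerialTrie
  get = fmap (SerialTrie . T.fromList) get

main :: IO ()
main = do
  let t = T.fromList
            [ (BC.pack "alpha", [(1, 0.5)])
            , (BC.pack "beta", [(2, 1.5), (3, 2.5)])
            ] :: T.Trie [(Int, Float)]
      bytes = encode (SerialTrie t)
  -- cereal's decode returns Either, so a malformed input is reported, not thrown.
  case decode bytes :: Either String (SerialTrie [(Int, Float)]) of
    Left err -> putStrLn ("decode failed: " ++ err)
    Right (SerialTrie t') -> print (T.lookup (BC.pack "beta") t')
```

Rebuilding through `fromList` inserts every key again on load, so this trades load speed for simplicity; an instance that mirrors the trie's internal structure would avoid that cost.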
participants (4)
- Alexey Khudyakov
- braver
- Don Stewart
- wren ng thornton