ACM Task for C++ and Java programmers in Haskell. How to make code faster?

Hi! Recently I've found interesting ACM research on C++ and Java efficiency. This task also was been solved on lisp/scheme and is described on http://www.flownet.com/ron/papers/lisp-java/ I tried to solve this task on Haskell but stuck with efficiency problems (approx in 1000 times slower and takes 20 times more memory). I hope the approach, I've used is correct and problem with data structures and implementation. It took more than 81% of time for garbage collection. I tried to use ByteStrings instead of Strings, but unfortunately it didn't help. The entire code I placed on http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2764 The original instructions are on: http://www.flownet.com/ron/papers/lisp-java/instructions.html Dictionary file could be downloaded from: http://www.flownet.com/ron/papers/lisp-java/dictionary. Input data: http://www.flownet.com/ron/papers/lisp-java/input.txt Expected output: http://www.flownet.com/ron/papers/lisp-java/output.txt Could someone help me to make this code faster? I'd like to see solution that will be elegant and fast, without heavy optimizations, that will make code unreadable. Also, if it possible, prepare the program to support SMP parallelism. Very thanks in advance. Hope you'll enjoy this task :) -- Best regards, Vasyl Pasternak

Hello Vasyl, Sunday, March 22, 2009, 8:25:02 PM, you wrote: i believe that i already seen this problem here a few years ago :) what search structure you was used? i think that immutable hash (represented as array of lists) would be useful here
Hi!
Recently I've found interesting ACM research on C++ and Java efficiency. This task also was been solved on lisp/scheme and is described on http://www.flownet.com/ron/papers/lisp-java/
I tried to solve this task on Haskell but stuck with efficiency problems (approx in 1000 times slower and takes 20 times more memory). I hope the approach, I've used is correct and problem with data structures and implementation. It took more than 81% of time for garbage collection. I tried to use ByteStrings instead of Strings, but unfortunately it didn't help.
The entire code I placed on http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2764
The original instructions are on: http://www.flownet.com/ron/papers/lisp-java/instructions.html
Dictionary file could be downloaded from: http://www.flownet.com/ron/papers/lisp-java/dictionary.
Input data: http://www.flownet.com/ron/papers/lisp-java/input.txt
Expected output: http://www.flownet.com/ron/papers/lisp-java/output.txt
Could someone help me to make this code faster? I'd like to see solution that will be elegant and fast, without heavy optimizations, that will make code unreadable. Also, if it possible, prepare the program to support SMP parallelism.
Very thanks in advance. Hope you'll enjoy this task :)
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Thanks for the point. I've found haskell solution there
http://www.haskell.org/haskellwiki/Phone_number
The second (shorter) solution is fantastic, it beats all other
languages, but I'll never wrote such code in reasonable time :)
2009/3/22 Bulat Ziganshin
Hello Vasyl,
Sunday, March 22, 2009, 8:25:02 PM, you wrote:
i believe that i already seen this problem here a few years ago :)
what search structure you was used? i think that immutable hash (represented as array of lists) would be useful here
Hi!
Recently I've found interesting ACM research on C++ and Java efficiency. This task also was been solved on lisp/scheme and is described on http://www.flownet.com/ron/papers/lisp-java/
I tried to solve this task on Haskell but stuck with efficiency problems (approx in 1000 times slower and takes 20 times more memory). I hope the approach, I've used is correct and problem with data structures and implementation. It took more than 81% of time for garbage collection. I tried to use ByteStrings instead of Strings, but unfortunately it didn't help.
The entire code I placed on http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2764
The original instructions are on: http://www.flownet.com/ron/papers/lisp-java/instructions.html
Dictionary file could be downloaded from: http://www.flownet.com/ron/papers/lisp-java/dictionary.
Input data: http://www.flownet.com/ron/papers/lisp-java/input.txt
Expected output: http://www.flownet.com/ron/papers/lisp-java/output.txt
Could someone help me to make this code faster? I'd like to see solution that will be elegant and fast, without heavy optimizations, that will make code unreadable. Also, if it possible, prepare the program to support SMP parallelism.
Very thanks in advance. Hope you'll enjoy this task :)
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
-- Best regards, Vasyl Pasternak

Vasyl Pasternak wrote:
The entire code I placed on http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2764
Could someone help me to make this code faster? I'd like to see solution that will be elegant and fast, without heavy optimizations, that will make code unreadable. Also, if it possible, prepare the program to support SMP parallelism.
The solution's already been posted, but to make this particular code faster, I recommend using Data.Trie instead of Data.Map ByteString. Tries are faster for lookup since they don't redundantly check the prefix of the query; also they're better for memory usage because they don't store redundant copies of the prefixes. http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-trie -- Live well, ~wren

I changed the program, now it similar to the program from the wiki
(http://www.haskell.org/haskellwiki/Phone_number)
The version with ByteString compared to version with ordinary Strings
works 3.5 times faster.
(I put it to http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2830)
But version with Data.Trie dissapointed me, it works 5 times slover
than version with Data.Map ByteString.
(here is the code http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2829)
Anyway, thanks to everyone who helped me, Haskell is really powerfull
tool in clever hands :)
2009/3/23 wren ng thornton
Vasyl Pasternak wrote:
The entire code I placed on http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2764
Could someone help me to make this code faster? I'd like to see solution that will be elegant and fast, without heavy optimizations, that will make code unreadable. Also, if it possible, prepare the program to support SMP parallelism.
The solution's already been posted, but to make this particular code faster, I recommend using Data.Trie instead of Data.Map ByteString. Tries are faster for lookup since they don't redundantly check the prefix of the query; also they're better for memory usage because they don't store redundant copies of the prefixes.
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-trie
-- Live well, ~wren
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Best regards, Vasyl Pasternak
participants (3)
-
Bulat Ziganshin
-
Vasyl Pasternak
-
wren ng thornton