
Hi Bram,
Sorry for being a bit late to this -- I have been on the road.
I have switched your example over to pre-compile the REs and use ByteString, and I can see a 13x speedup on scan and a 9x speedup on mapping. Curiously, nearly all of that speedup seems to come from lifting the RE compilation out of the loop, but I am pretty sure there are further gains to be had from rewriting the loops.
Do you have the Python code that was performing 80x better?
Chris
From: Alfredo Di Napoli
Try to use Text or ByteString instead of String. Try to use the compile and execute functions (http://hackage.haskell.org/package/regex-tdfa-1.2.1/docs/Text-Regex-TDFA-Byt...), and make sure each regex gets compiled only once.
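For illustration, here is a minimal sketch of what "compile once, then execute per line" could look like with regex-tdfa over ByteString. It is not taken from hanon: the pattern, the sample input and the names compileOnce, lineMatches and countMatchingLines are all made up for the example. Only compile, execute, defaultCompOpt and defaultExecOpt come from the library.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString.Char8 as B
import Text.Regex.TDFA (defaultCompOpt, defaultExecOpt)
import Text.Regex.TDFA.ByteString (Regex, compile, execute)

-- Hypothetical pattern, for illustration only (not one of hanon's).
emailPattern :: B.ByteString
emailPattern = "[a-z0-9.]+@[a-z0-9.]+"

-- Compile the pattern a single time; compile reports syntax errors as Left.
compileOnce :: B.ByteString -> Either String Regex
compileOnce = compile defaultCompOpt defaultExecOpt

-- Run the already-compiled regex against one line.
-- execute returns Right (Just _) when there is a match.
lineMatches :: Regex -> B.ByteString -> Bool
lineMatches re line =
  case execute re line of
    Right (Just _) -> True
    _              -> False

-- The per-line loop only ever sees the compiled Regex value,
-- so no compilation happens inside it.
countMatchingLines :: Regex -> B.ByteString -> Int
countMatchingLines re = length . filter (lineMatches re) . B.lines

main :: IO ()
main =
  case compileOnce emailPattern of
    Left err -> putStrLn ("regex failed to compile: " ++ err)
    Right re -> print (countMatchingLines re sampleInput)
  where
    -- Made-up sample input.
    sampleInput = "alice@example.com\nno match here\nbob@example.org\n"
```

The point of the shape is that the Regex value is built before the loop and passed in as an argument, rather than being rebuilt from the pattern string for every line.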
2017-05-16 12:12 GMT+03:00 Bram Neijt:
Dear reader,
I decided to do a little project which is a simple search and replace program for large text files.
Written in Haskell, it does a few different regex matches on each line and stores them in a leveldb key-value store to create a consistent/reviewable search-replace index. It should provide for some simple/brute-force anonymization of data and therefore I called it hanon (sorry, could not think of a better name).
https://github.com/BigDataRepublic/hanon
The code works, but I've done some benchmarking to compare it with Python, and the code is about 80x slower than doing the same thing in Python, making it useless for larger data files.
I'm obviously doing something wrong.
Could you give me tips on improving the performance of this code? Probably mainly looking at
https://github.com/BigDataRepublic/hanon/blob/master/src/Mapper.hs
where the regex code lives?
Greetings,
Bram
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
-- Sincerely, Stanislav Chernichkin.