Re: [Haskell-cafe] Haskell performance when it comes to regex?

29 May 2017

Hi Chris,

Thank you for looking into this and thank you for your pull-request.

I moved the "=~" outside of the map and that makes the whole thing a
huge amount faster.

It seems my assumption that =~ would memoise the regex creation (I read
that in a post on regex in Haskell[1]) was wrong.
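
For anyone reading this thread later, here is a minimal sketch of the kind
of change described above, assuming the regex-tdfa package and plain String
input. The pattern and the names digits, digitsRe, countSlow and countFast
are made up for illustration and are not taken from hanon.

```haskell
import Text.Regex.TDFA (Regex, makeRegex, matchTest, (=~))

-- Hypothetical pattern, chosen only for illustration.
digits :: String
digits = "[0-9]+"

-- Slow variant: (=~) receives the pattern as a String on every call,
-- so the regex may be rebuilt for each line.
countSlow :: [String] -> Int
countSlow ls = length (filter (\l -> l =~ digits) ls)

-- The regex is built once, as a top-level value, and then reused.
digitsRe :: Regex
digitsRe = makeRegex digits

-- Faster variant: only the matching happens per line.
countFast :: [String] -> Int
countFast ls = length (filter (matchTest digitsRe) ls)

main :: IO ()
main = do
  let ls = ["abc", "a1b", "42", ""]
  -- Both variants should report the same count.
  print (countSlow ls, countFast ls)
```

Both functions give the same answer; the only difference is where the
regex gets built.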

The 80x difference is now gone. The Python code did everything except the
leveldb stuff (but it still compiled the regexes every time, so it
seemed like a valid comparison at the time); see the attachment for the code.

Thank you all for your help!

Bram

[1] http://www.serpentine.com/blog/2007/02/27/a-haskell-regular-expression-tutor...

On Sun, May 28, 2017 at 2:22 PM, Chris Dornan  wrote:
...
Hi Bram,
Sorry for being a bit late to this -- I have been on the road.
I have switched over your example to pre-compile the REs and use ByteString
and can see 13x speedup on scan and a 9x speedup on mapping. Curiously,
nearly all of that speedup seems to be gained by lifting the RE compilation
out of the loop but I am pretty sure there are gains to be had from
re-writing the loops.
Do you have the Python code that was performing 80x better?
Chris
From: Alfredo Di Napoli 
Date: Monday, 22 May 2017 at 08:48
To: Bram Neijt 
Cc: Станислав Черничкин, haskell-cafe, Chris Dornan
Subject: Re: [Haskell-cafe] Haskell performance when it comes to regex?
Hi Bram,
you might be interested in the “regex” package from my colleague Chris
Dornan:
http://regex.uk/
I know some proper performance work still needs to be done, but I would be
curious to hear your experience report ;)
Alfredo
On 19 May 2017 at 18:52, Bram Neijt  wrote:
Thank you!
I already changed to Text instead, but I thought the regex was already
memoized by GHC, so that should not be a problem.
I'm trying regex-applicative now, maybe that will help, but it takes
some time to figure out the syntax. I'll also try to see if
precompilation helps.
Greetings,
Bram
On Fri, May 19, 2017 at 1:17 PM, Станислав Черничкин
 wrote:
...
Try to use Text or ByteString instead of strings. Try to use compile and
execute methods
(http://hackage.haskell.org/package/regex-tdfa-1.2.1/docs/Text-Regex-TDFA-Byt...),
and make sure the regex gets compiled only once.
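
A rough sketch of that advice, assuming regex-tdfa's
Text.Regex.TDFA.ByteString module with default compile and exec options.
The helper names compilePattern and matchingLines, the pattern and the
sample lines are all hypothetical.

```haskell
import qualified Data.ByteString.Char8 as B
import Text.Regex.TDFA (Regex, defaultCompOpt, defaultExecOpt)
import Text.Regex.TDFA.ByteString (compile, execute)

-- Compile the pattern once; a malformed pattern is reported as Left.
compilePattern :: B.ByteString -> Either String Regex
compilePattern = compile defaultCompOpt defaultExecOpt

-- Reuse the already compiled Regex for every line.
matchingLines :: Regex -> [B.ByteString] -> [B.ByteString]
matchingLines re = filter hit
  where
    -- execute returns Right (Just offsets) when the line matches.
    hit l = case execute re l of
      Right (Just _) -> True
      _              -> False

main :: IO ()
main =
  case compilePattern (B.pack "[a-z]+@[a-z]+") of
    Left err -> putStrLn ("bad pattern: " ++ err)
    Right re ->
      mapM_ B.putStrLn
        (matchingLines re (map B.pack ["bob@example", "no match here 123"]))
```

The point is only that compile runs once, outside the per-line work;
execute is then the only regex call inside the loop.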
2017-05-16 12:12 GMT+03:00 Bram Neijt :
...
Dear reader,
I decided to do a little project which is a simple search and replace
program for large text files.
Written in Haskell, it does a few different regex matches on each line
and stores them in a leveldb key-value store to create a
consistent/reviewable search-replace index. It should provide for some
simple/brute-force anonymization of data and therefore I called it
hanon (sorry, could not think of a better name).
https://github.com/BigDataRepublic/hanon
The code works, but I've done some benchmarking to compare it with
Python and the code is about 80x slower than doing the same thing in
Python, making it useless for larger data files.
I'm obviously doing something wrong.
Could you give me tips on improving the performance of this code?
Probably mainly looking at
https://github.com/BigDataRepublic/hanon/blob/master/src/Mapper.hs
where the regex code lives?
Greetings,
Bram
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
--
Sincerely, Stanislav Chernichkin.
