
jeff:
It was suggested that I might derive some performance benefit from using lazy ByteStrings in my tokenizer instead of regular Strings. Here's the code that I've tried. Note that I've hacked the "basic" wrapper code in the lazy version, so the code should be all but identical. The only thing I had to do out of the ordinary was write my own 'take' function instead of using the substring function provided by Data.ByteString.Lazy.Char8. The take function I wrote was derived from the one GHC uses in GHC.List and produces about the same code.
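(For concreteness, a minimal sketch of what such a GHC.List-style take over lazy ByteStrings could look like; the actual code isn't quoted in this thread, and takeL is a hypothetical name:)

import qualified Data.ByteString.Lazy.Char8 as L

-- Recursive, one-Char-at-a-time take in the style of GHC.List.take.
-- Note that the library's own L.take works a chunk at a time instead.
takeL :: Int -> L.ByteString -> L.ByteString
takeL n s
  | n <= 0    = L.empty
  | otherwise = case L.uncons s of
      Nothing      -> L.empty
      Just (c, cs) -> L.cons c (takeL (n - 1) cs)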
The non-lazy version runs in 38 seconds on a 211MB file versus the lazy version's 41 seconds. That of course doesn't seem like much, and in the non-lazy case I have to break the input up into multiple files, whereas I don't in the lazy version -- the splitting itself does not take any extra time. Still, the seconds add up to a couple of hours for me by the time I'm done, so I'd like to understand why, when the consensus was that Data.ByteString.Lazy might give me better performance in the end, it doesn't do so here.
I am running GHC 2.6 now, and am using -O3 as my optimization parameter. I'm profiling the code now, but was wondering if there was any insight...
don:
GHC 6.6 you mean? Can you post a complete example, including FileReader, so that I can compile the code, with some example input and output, to work out what's going on? By the way, if you're able to break the file into chunks already, we should be able to do even better with a strict ByteString. Cheers, Don
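(To illustrate the strict-ByteString route Don mentions: a minimal, hypothetical sketch of processing pre-split chunk files with Data.ByteString.Char8; countTokens and the file names are placeholders, not code from this thread:)

import qualified Data.ByteString.Char8 as S

-- Placeholder tokenizer: counts whitespace-separated tokens in one chunk.
countTokens :: S.ByteString -> Int
countTokens = length . S.words

main :: IO ()
main = do
  let chunkFiles = ["chunk1.txt", "chunk2.txt"]   -- assumed chunk file names
  counts <- mapM (fmap countTokens . S.readFile) chunkFiles
  print (sum counts)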