
Hi Cafe,

In one of my projects I have a lexer that seems to be taking an inordinate amount of time and space. The lexer is generated by Alex using the lazy ByteString (with position information) template. I compiled and ran with profiling enabled and got a report like this:

-------------------------------------------------------------------------------
        Tue Jun 21 16:56 2011 Time and Allocation Profiling Report  (Final)

           ViewCallGraph +RTS -p -RTS gnutls.bc

        total time  =       51.80 secs   (2590 ticks @ 20 ms)
        total alloc = 9,482,333,244 bytes  (excludes profiling overheads)

COST CENTRE       MODULE                              %time %alloc

alexScanTokens    Data.LLVM.Private.Lexer              24.1    4.5
alex_scan_tkn     Data.LLVM.Private.Lexer              21.2   32.7
tokenAs           Data.LLVM.Private.Parser.Primitive    6.7    2.9
alexGetChar       Data.LLVM.Private.Lexer               6.5   22.7
-------------------------------------------------------------------------------

The entries below these four are marginal. The third entry is from my own code and isn't a big deal (yet), but the other three seem to indicate that the lexer is responsible for about 50% of my runtime and memory allocation. For reference, this particular input is about 18M of text, though the ratios are just as bad for smaller inputs.

My uneducated suspicion is that Alex is constructing a separate ByteString to pass to each of my token constructors, and that this is responsible for a large part of the allocation. Most of my token constructors just ignore this ByteString - assuming there really is an allocation for each token, is there any way to avoid it? I looked at alexScanTokens, alex_scan_tkn, and alexGetChar but didn't see any obvious way to improve them.

Alternatively, does anyone have lexing performance tips that might help?

Thanks
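To make the pattern concrete, here is a stripped-down sketch of the kind of token actions I mean. The Token type and action names here are made up for illustration, not my real lexer; they just mirror the shape of actions in a posn/ByteString-style Alex wrapper, where each action gets the position and the matched lazy ByteString:

```haskell
-- Simplified sketch (hypothetical Token type and action names).
-- Each Alex action receives the position and the matched lazy ByteString.
import qualified Data.ByteString.Lazy.Char8 as BSL

data Token = TKeywordDefine               -- fixed token: payload unused
           | TIdentifier BSL.ByteString  -- payload token: keeps the text
  deriving (Show, Eq)

-- An action that ignores both arguments returns a shared nullary
-- constructor, so ideally no per-token ByteString should be needed:
defineAction :: pos -> BSL.ByteString -> Token
defineAction _ _ = TKeywordDefine

-- An action that does need the text forces a fresh copy, so the token
-- doesn't retain the whole 18M input via a lazy substring:
identAction :: pos -> BSL.ByteString -> Token
identAction _ s = TIdentifier (BSL.copy s)

main :: IO ()
main = do
  print (defineAction () (BSL.pack "define"))
  print (identAction () (BSL.pack "foo"))
```

The question is whether Alex still materializes the matched ByteString before calling something like defineAction, even though the action never looks at it.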