
Hi Cafe,

In one of my projects I have a lexer that seems to be taking an inordinate amount of time and space. The lexer is generated by Alex using the lazy ByteString (with position information) template. I compiled and ran with profiling enabled and got a report like this:

-------------------------------------------------------------------------------
        Tue Jun 21 16:56 2011 Time and Allocation Profiling Report  (Final)

           ViewCallGraph +RTS -p -RTS gnutls.bc

        total time  =       51.80 secs   (2590 ticks @ 20 ms)
        total alloc = 9,482,333,244 bytes  (excludes profiling overheads)

COST CENTRE       MODULE                              %time %alloc

alexScanTokens    Data.LLVM.Private.Lexer              24.1    4.5
alex_scan_tkn     Data.LLVM.Private.Lexer              21.2   32.7
tokenAs           Data.LLVM.Private.Parser.Primitive    6.7    2.9
alexGetChar       Data.LLVM.Private.Lexer               6.5   22.7
-------------------------------------------------------------------------------

The entries below these four are marginal. The third entry is from my own code and isn't a big deal (yet), but the other three seem to indicate that the lexer is responsible for about 50% of my runtime and memory allocation. For reference, this particular input is about 18M of text, though the ratios are just as bad for smaller inputs.

My uneducated suspicion is that Alex is constructing a separate ByteString to pass to each of my token constructors, and that this is responsible for a large part of the allocation. Most of my token constructors just ignore this ByteString - assuming there really is an allocation for each token, is there any way to avoid it? I looked at alexScanTokens, alex_scan_tkn, and alexGetChar but didn't see any obvious way to improve them.

Alternatively, does anyone have lexing performance tips that might help?

Thanks
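To make the pattern concrete, here is a stripped-down sketch of the kind of token actions I mean. The Token type and action names here are made up for illustration, not my real lexer; they just mirror the shape of actions in a posn/ByteString-style Alex wrapper, where each action gets the position and the matched lazy ByteString:

```haskell
-- Simplified sketch (hypothetical Token type and action names).
-- Each Alex action receives the position and the matched lazy ByteString.
import qualified Data.ByteString.Lazy.Char8 as BSL

data Token = TKeywordDefine               -- fixed token: payload unused
           | TIdentifier BSL.ByteString  -- payload token: keeps the text
  deriving (Show, Eq)

-- An action that ignores both arguments returns a shared nullary
-- constructor, so ideally no per-token ByteString should be needed:
defineAction :: pos -> BSL.ByteString -> Token
defineAction _ _ = TKeywordDefine

-- An action that does need the text forces a fresh copy, so the token
-- doesn't retain the whole 18M input via a lazy substring:
identAction :: pos -> BSL.ByteString -> Token
identAction _ s = TIdentifier (BSL.copy s)

main :: IO ()
main = do
  print (defineAction () (BSL.pack "define"))
  print (identAction () (BSL.pack "foo"))
```

The question is whether Alex still materializes the matched ByteString before calling something like defineAction, even though the action never looks at it.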