Hi,
Recently I was improving the performance of the simple-sql-parser library, which sits on top of megaparsec.
The main issue was memory consumption, not speed: a machine-generated SQL query can weigh megabytes.
I had success with a deduplicating cache for lexer tokens.
A complex parser produces very chunky lazy Text.
To avoid traversing these linked lists (poor cache locality) and copying into a strict Text,
the offsets into the input Text are recorded before and after the parser returns successfully => a shallow substring, which is deduplicated with high probability and costs no extra memcpy.
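The offset-slicing plus deduplication idea could be sketched roughly like this (hypothetical names; `slice` relies on strict Text take/drop sharing the underlying buffer, and the cache here is a plain Data.Map rather than whatever simple-sql-parser actually uses):

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Text as T
import Data.IORef (IORef, atomicModifyIORef')

type Intern = IORef (M.Map T.Text T.Text)

-- Zero-copy slice of the original input between the offsets recorded
-- before and after the lexeme parser ran; take/drop on strict Text
-- share the underlying array, so there is no memcpy.
slice :: T.Text -> Int -> Int -> T.Text
slice input start end = T.take (end - start) (T.drop start input)

-- Deduplicate: identical lexemes all end up sharing one cached Text.
intern :: Intern -> T.Text -> IO T.Text
intern ref t = atomicModifyIORef' ref $ \m ->
  case M.lookup t m of
    Just t' -> (m, t')            -- reuse the cached copy
    Nothing -> (M.insert t t m, t)
```

With this scheme, a keyword like SELECT that appears thousands of times in a megabyte-sized query is held in memory once.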
Another interesting idea I would like to try is to use ByteString for the input text and to mmap the input.
An SQL query is typically 99% ASCII, and all keywords are ASCII bytes.
Knowing that constraint, the parser doesn't have to spend time reconstructing characters
out of possibly-multibyte input when it expects keywords.
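A minimal sketch of what byte-level keyword matching could look like (assumed helper, not an existing API; it compares raw bytes with no UTF-8 decoding, which is safe because SQL keywords are ASCII — and it deliberately ignores word boundaries to stay short):

```haskell
import qualified Data.ByteString.Char8 as BS
import Data.Char (toLower)

-- Case-insensitive ASCII keyword check against a prefix of the raw
-- input bytes. The keyword is assumed to be given in lowercase.
-- No character reconstruction happens: this is byte comparison only.
isKeyword :: BS.ByteString -> BS.ByteString -> Bool
isKeyword kw input =
  BS.map toLower (BS.take (BS.length kw) input) == kw
```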
Megaparsec doesn't have a digital search tree (trie), which would help a lot with avoiding backtracking.
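For illustration, here is a toy digital search tree over keywords (String-based for brevity; a real one would work over bytes). The point is that one left-to-right pass decides among all keywords at once, instead of trying alternatives and backtracking:

```haskell
import qualified Data.Map.Strict as M

-- A minimal trie: a node may accept a complete keyword and has
-- labeled edges to child nodes.
data Trie = Trie { accepts :: Maybe String, children :: M.Map Char Trie }

empty :: Trie
empty = Trie Nothing M.empty

insert :: String -> Trie -> Trie
insert w = go w
  where
    go []       (Trie _ cs) = Trie (Just w) cs
    go (c:rest) (Trie a cs) =
      Trie a (M.insert c (go rest (M.findWithDefault empty c cs)) cs)

-- Longest keyword matching a prefix of the input, plus the rest of
-- the input. Each input character is inspected exactly once.
match :: Trie -> String -> Maybe (String, String)
match t0 s0 = go t0 s0 Nothing
  where
    go (Trie a cs) s best =
      let best' = maybe best (\w -> Just (w, s)) a
      in case s of
           (c:rest) | Just t <- M.lookup c cs -> go t rest best'
           _ -> best'
```

With keywords like SELECT and SET in the trie, the shared prefix SE is walked once and the branch is chosen by the next byte, so no `try`-style rewinding is needed.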
Thanks