
Hello,

I have modified the Alex lexer generator to support Unicode. The general idea is that the state machine works on the UTF-8 representation of the text. I am submitting my work here for review in order to take as much load as possible off the maintainer (Simon Marlow).

The prototype is available on GitHub:

git://github.com/jyp/Alex.git

Be sure to:

* check out the "utf8" branch (so that "git diff master" shows the changes);
* do a two-stage bootstrap before testing.

Caveats:

* the generated code depends on some UTF-8 packages;
* no attempt has been made to fix the bytestring-based wrappers;
* left-context recognition is no longer table-based;
* some debug code is still present.

Bug reports, comments, and especially patches are welcome :)

Thanks,
-- JP
Jean-Philippe Bernardy
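To illustrate what "the state machine works on the UTF-8 representation of the text" means, here is a minimal sketch (not the code in the utf8 branch): each character is encoded to UTF-8 and the state machine is handed the input one byte at a time. The names utf8Encode, Input, getByte and inputBytes are hypothetical, and the encoder is hand-written here rather than taken from one of the UTF-8 packages the generated code depends on.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.List (unfoldr)
import Data.Word (Word8)

-- | Encode one character as its UTF-8 byte sequence.
-- Sketch only: surrogate code points are not rejected.
utf8Encode :: Char -> [Word8]
utf8Encode c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [fromIntegral (0xC0 .|. (n `shiftR` 6)), cont n]
  | n < 0x10000 = [ fromIntegral (0xE0 .|. (n `shiftR` 12))
                  , cont (n `shiftR` 6), cont n ]
  | otherwise   = [ fromIntegral (0xF0 .|. (n `shiftR` 18))
                  , cont (n `shiftR` 12), cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low 6 bits of x.
    cont :: Int -> Word8
    cont x = fromIntegral (0x80 .|. (x .&. 0x3F))

-- | Hypothetical lexer input: bytes still pending from the current
-- character, followed by the not-yet-encoded rest of the text.
data Input = Input [Word8] String

-- | Hand the state machine one byte at a time. A multi-byte character
-- is encoded once and its remaining bytes are buffered in the Input.
getByte :: Input -> Maybe (Word8, Input)
getByte (Input (b:bs) s)  = Just (b, Input bs s)
getByte (Input [] (c:cs)) = getByte (Input (utf8Encode c) cs)
getByte (Input [] [])     = Nothing

-- | Every byte a byte-driven state machine would consume for the text.
inputBytes :: String -> [Word8]
inputBytes s = unfoldr getByte (Input [] s)

main :: IO ()
main = print (inputBytes "a\233\8364\128512")
```

Running main prints [97,195,169,226,130,172,240,159,152,128]: one byte for 'a', two for U+00E9, three for U+20AC and four for U+1F600. One consequence of driving the automaton with bytes is that each state has at most 256 possible input symbols, regardless of how large the Unicode character classes in the lexer specification are; the price is that a multi-byte character takes several transitions.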