
Hi GHCers,

I recently ran into a problem where Haddock does not correctly handle Unicode in doc comments. For example, with this file:

"""
module Example where

-- | 好
ok :: Int -> Int
ok x = x

-- | 个
misinterp :: Int -> Int
misinterp _ = (-1)

-- | 漢
failure :: Int -> Int
failure x = x-1
"""

Current versions of Haddock will output the documentation for "ok" correctly, will output an empty bulleted list as the documentation for "misinterp", and will not output any documentation at all for "failure" (printing a warning to stderr instead). This is kind of sad.

There is a very old open ticket about this issue: http://trac.haskell.org/haddock/ticket/20. The patches I've attached to that ticket fix the problem by using the native Unicode support in Alex 3. I've also attached to the ticket a patch that makes the changes to GHC's build system needed to build this new Haddock correctly.

Do these patches seem OK? Is it fine to insist on Alex 3? I think it was released in 2011, so by now we can assume that it is available on all machines that will want to build GHC.

If this patch is accepted, at some point we might want to think about switching GHC's own lexer to Alex 3's Unicode support rather than relying on the current hacks. My patches do not make any change along those lines.

Cheers,
Max
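One plausible reading of the three symptoms (a hypothesis, not something stated in the ticket) is that the pre-Alex-3 lexer effectively sees only the low 8 bits of each code point. The characters 好, 个 and 漢 are U+597D, U+4E2A and U+6F22, whose low bytes are `}`, `*` and `"`. In Haddock markup a leading `*` starts a bulleted list and `"` opens a module-name reference, so truncation would account for the empty list and for the parse failure, while `}` does not appear to be markup. The sketch below models only that assumed truncation; `truncateToByte` is a hypothetical helper, not Haddock's code.

```haskell
import Data.Bits ((.&.))
import Data.Char (chr, ord)
import Text.Printf (printf)

-- Hypothetical model of a byte-oriented lexer: keep only the low
-- 8 bits of the code point. This is an assumption for illustration,
-- not the actual Haddock implementation.
truncateToByte :: Char -> Char
truncateToByte c = chr (ord c .&. 0xFF)

-- Print each code point in hex next to the character it collapses to,
-- avoiding any dependence on the terminal's encoding for the CJK text.
main :: IO ()
main = mapM_ report "\x597D\x4E2A\x6F22"  -- 好 个 漢
  where
    report :: Char -> IO ()
    report c = printf "U+%04X -> %s\n" (ord c) (show (truncateToByte c))
```

Under this model, the character in "misinterp" collapses to `*` and the one in "failure" to `"`, matching the two misbehaving cases. With Alex 3's Unicode support the lexer matches on whole code points, so no such collapse can happen.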