
Jeremy,

Thanks, this is useful. Gitit already uses UTF-8 where it should, I think, but I hadn't noticed the issue with look* functions and ToMessage instances in HAppS. This affects gitit too (try typing an accented string into the search box, for example). I strongly agree that HAppS should be fixed, but until that happens I'll look into putting workarounds into gitit.

John
For example, lookPairs currently looks like this:
    lookPairs :: RqData [(String,String)]
    lookPairs = asks fst >>= return . map (\(n,vbs)->(n,L.unpack $ inputValue vbs))
As you can see, it just takes the incoming bytes and converts them to a String, but without doing any decoding. You probably want something more like:
    lookPairs :: RqData [(String,String)]
    lookPairs = asks fst >>= return . map (\(n,vbs)->(n,Data.ByteString.Lazy.UTF8.toString $ inputValue vbs))
Some of the other look* functions need patching as well.
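To make the difference concrete, here is a minimal sketch outside of HAppS, assuming the bytestring and utf8-string packages are installed. The names rawInput, undecoded and decoded are made up for illustration; only L.unpack and Data.ByteString.Lazy.UTF8.toString come from the snippets above.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as L
import qualified Data.ByteString.Lazy.UTF8 as U

-- Hypothetical request body: "caf" followed by the two bytes 0xC3 0xA9,
-- which are the UTF-8 encoding of U+00E9 (e with acute accent).
rawInput :: BL.ByteString
rawInput = BL.pack [0x63, 0x61, 0x66, 0xC3, 0xA9]

-- What the current lookPairs does: one Char per byte, no decoding.
undecoded :: String
undecoded = L.unpack rawInput   -- "caf\195\169" (five Chars)

-- What the patched lookPairs does: decode the bytes as UTF-8.
decoded :: String
decoded = U.toString rawInput   -- "caf\233" (four Chars)
```

The undecoded result carries two Chars where the user typed one, which is why an accented search string fails to match.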
Similarly, the ToMessage instances need to encode the outgoing data. Consider:
    instance ToMessage Html where
        toContentType _ = B.pack "text/html"
        toMessage = L.pack . renderHtml
We really want to make two changes:
    instance ToMessage Html where
        toContentType _ = B.pack "text/html; charset=UTF-8"           -- add the encoding
        toMessage = Data.ByteString.Lazy.UTF8.fromString . renderHtml -- encode the data
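The reason L.pack is not enough on the outgoing side is that the Char8 pack keeps only the low 8 bits of each Char. A small sketch, again assuming bytestring and utf8-string and using made-up names (page, truncated, encoded), shows the bytes each approach produces:

```haskell
import Data.Word (Word8)
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as L
import qualified Data.ByteString.Lazy.UTF8 as U

-- Hypothetical rendered output containing U+00E9 and U+20AC (euro sign).
page :: String
page = "caf\233 \8364"

-- Char8 pack: each Char is cut down to its low byte, so U+20AC (0x20AC)
-- silently becomes the single byte 0xAC.
truncated :: [Word8]
truncated = BL.unpack (L.pack page)
-- [99,97,102,233,32,172]

-- UTF-8 encoding: every code point survives as a multi-byte sequence.
encoded :: [Word8]
encoded = BL.unpack (U.fromString page)
-- [99,97,102,195,169,32,226,130,172]
```

The truncated bytes are not valid UTF-8, so the charset in the Content-Type header and the encoding step have to change together.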
3. Make sure that any I/O (readFile, writeFile, etc.) uses the UTF-8 functions from utf8-string.
If you don't want to patch HAppS-Server, then you could work around it by doing silliness like:
    do pairs' <- lookPairs
       let pairs = map (first toString . second toString) pairs'
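One way to spell out that workaround so that it type-checks: the toString functions in utf8-string take a ByteString, whereas lookPairs returns Strings, so this sketch uses decodeString from Codec.Binary.UTF8.String instead. That substitution is an assumption about what was intended. decodePairs is a hypothetical helper, not part of HAppS, and it assumes every Char in the input stands for one raw byte (0 to 255), which is what the unpatched lookPairs produces for values.

```haskell
import Codec.Binary.UTF8.String (decodeString)
import Control.Arrow ((***))

-- Hypothetical helper: re-decode name/value pairs whose Strings hold
-- one Char per undecoded UTF-8 byte.
decodePairs :: [(String, String)] -> [(String, String)]
decodePairs = map (decodeString *** decodeString)

-- Example: the value holds the two bytes of U+00E9 as separate Chars.
example :: [(String, String)]
example = decodePairs [("q", "caf\195\169")]
-- [("q","caf\233")]
```

In a handler this would be applied to the result of lookPairs, for instance with fmap decodePairs lookPairs. It should be dropped once the look* functions decode properly, since decoding a String twice would corrupt it.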
hope this helps. - jeremy