
Hello,

I have not looked at the gitit source code, but I have had this problem in other HAppS applications. The problem is that, by default, HAppS does nothing about string encodings.

The easy fix is to use UTF-8 and Unicode everywhere ('easy' compared to supporting multiple encodings). The goal is to make sure that in gitit a String is always a list of Unicode code points, and not a list of UTF-8 encoded octets. This means that whenever data comes into or goes out of gitit, it needs to be decoded or encoded.

To transition you need to do at least the following:

1. Set the charset of the outgoing pages so that the browser knows that the page is supposed to be UTF-8. For HTML, this can be done by adding this meta tag to the <head> of each page:

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

However, for text/plain, etc., you must set it in the HTTP header (which I will cover later). For HTML it is still useful to set the meta tag, though, so that the encoding is not lost if the page is saved to disk.

2. Use the utf8-string library, and make sure that all the inputs/outputs are decoded/encoded properly. This probably means patching your copy of HAppS-Server (or copying the modified functions into gitit). For example, lookPairs currently looks like this:
    lookPairs :: RqData [(String,String)]
    lookPairs = asks fst >>= return . map (\(n,vbs)->(n,L.unpack $ inputValue vbs))
As you can see, it just takes the incoming bytes and converts them to a String, but without doing any decoding. You probably want something more like:
    lookPairs :: RqData [(String,String)]
    lookPairs = asks fst >>= return . map (\(n,vbs)->(n,Data.ByteString.Lazy.UTF8.toString $ inputValue vbs))
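To make the difference concrete, here is a small standalone sketch (not HAppS code) comparing the two conversions on the two octets that encode 'é' in UTF-8. It assumes the bytestring and utf8-string packages, and that L in the snippets above refers to Data.ByteString.Lazy.Char8; the names eAcuteBytes, undecoded and decoded are made up for illustration.

```haskell
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as LC
import qualified Data.ByteString.Lazy.UTF8 as LU

-- The two octets that encode 'é' (U+00E9) in UTF-8.
-- (Hypothetical name, used only for this illustration.)
eAcuteBytes :: L.ByteString
eAcuteBytes = L.pack [0xC3, 0xA9]

-- Char8-style unpack: one Char per octet, no decoding.
-- The result has two Chars, "\195\169", which is the mangled form.
undecoded :: String
undecoded = LC.unpack eAcuteBytes

-- UTF-8 decoding: the two octets become the single code point "\233".
decoded :: String
decoded = LU.toString eAcuteBytes
```

In GHCi, length undecoded is 2 while length decoded is 1, which is exactly the "list of octets" versus "list of code points" distinction.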
Some of the other look* functions need patching as well. Similarly, the ToMessage instances need to encode the outgoing data. Consider:
    instance ToMessage Html where
        toContentType _ = B.pack "text/html"
        toMessage = L.pack . renderHtml
We really want to make two changes:
    instance ToMessage Html where
        toContentType _ = B.pack "text/html; charset=UTF-8" -- add the encoding
        toMessage = Data.ByteString.Lazy.UTF8.fromString . renderHtml -- encode the data
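The effect of the second change can be seen in isolation with a small sketch, again assuming the bytestring and utf8-string packages and that L above is Data.ByteString.Lazy.Char8. The names page, truncatedOctets and encodedOctets are hypothetical and not part of HAppS.

```haskell
import Data.Word (Word8)
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as LC
import qualified Data.ByteString.Lazy.UTF8 as LU

-- "café", with 'é' written as an escape so that the encoding of this
-- source file does not matter. (Hypothetical sample text.)
page :: String
page = "caf\233"

-- Char8-style pack: each Char is truncated to a single octet, so 'é'
-- goes out as the lone byte 233, which is not valid UTF-8.
truncatedOctets :: [Word8]
truncatedOctets = L.unpack (LC.pack page)

-- UTF-8 encoding: 'é' goes out as the two octets 195, 169.
encodedOctets :: [Word8]
encodedOctets = L.unpack (LU.fromString page)
```

Here truncatedOctets is [99,97,102,233] and encodedOctets is [99,97,102,195,169]. Only the second matches what a browser expects once the Content-Type says charset=UTF-8.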
3. Make sure that any I/O (readFile, writeFile, etc.) uses the UTF-8 functions from utf8-string.

If you don't want to patch HAppS-Server, then you could work around it by doing silliness like:

    do pairs' <- lookPairs
       let pairs = map (first decodeString . second decodeString) pairs'

(with first and second from Control.Arrow, and decodeString from Codec.Binary.UTF8.String, since the values are already Strings of octets at this point), but that seems error prone and is not a long-term solution.

The obvious long-term solution is for HAppS to fix its encoding issues. The simple fix is to hardwire it for UTF-8, but a system that supports arbitrary encodings might be nice? As far as I know, no one has even tried to submit a patch hardwiring HAppS to use UTF-8, which seems like a good short-term solution. You might try posting on the HAppS mailing list to see if such a patch would be welcome:

http://groups.google.com/group/HAppS

Hope this helps.
- jeremy

At Tue, 30 Dec 2008 13:58:15 +0100, Arnaud Bailly wrote:
Hello, I have started using Gitit and I am very happy with it and eager to start hacking. I am running into a practical problem: character encoding. When I edit pages using accented characters (I am French), the accents get mangled when the page comes back from the server.
The raw files are incorrectly encoded. Where should I look to fix this issue?
Thanks
ps: the wiki is live at http://www.notre-ecole.org
-- Arnaud Bailly, PhD OQube - Software Engineering
web> http://www.oqube.com
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe