Re: [Haskell-cafe] Gitit - Encoding

30 Dec 2008

Hello,

I have not looked at the gitit source code, but I have had this
problem in other HAppS applications. The problem is that by default
HAppS does nothing about string encodings. The easy fix is to use
utf-8 and unicode everywhere. ('easy' compared to supporting multiple
encodings).

The goal is to make sure that in gitit, a String is always a list of
unicode code points, and not a list of utf-8 encoded octets. This
means that whenever data comes in or goes out of gitit it needs to be
decoded or encoded.
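
To make the difference concrete, here is a minimal sketch, assuming the
bytestring and utf8-string packages are installed. It is not taken from
the gitit or HAppS source, and the names rawBytes, undecoded and decoded
are made up for illustration:

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import qualified Data.ByteString.Lazy.UTF8 as U

-- The utf-8 encoding of U+00E9 (e with acute accent) is the two
-- octets 0xC3 0xA9. Char8 pack stores one octet per Char here.
rawBytes :: L.ByteString
rawBytes = L.pack "\xC3\xA9"

-- Char8 unpack maps each octet to one Char: two Chars, i.e. a
-- list of utf-8 encoded octets (what we do not want).
undecoded :: String
undecoded = L.unpack rawBytes

-- utf-8 decoding yields a list of unicode code points: one Char.
decoded :: String
decoded = U.toString rawBytes
```

Loaded into GHCi, length undecoded should be 2 and length decoded should
be 1, because the two octets only become a single code point once they
have been decoded.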

To transition you need to do at least the following:

1. Set the charset of the outgoing pages so that the browser knows
that the page is supposed to be utf-8:

 For html, this can be done by adding this meta to the <head> of each page:

  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

 However, for text/plain, etc, you must set it in the HTTP header
 (which I will cover later). For html, it is still useful to set the
 meta tag though, so that if the page is saved to disk, the encoding
 is not lost.

2. Use the utf8-string library, and make sure that all the
inputs/outputs are decoded/encoded properly.

This probably means patching your copy of HAppS-Server (or copying the
modified functions into gitit). 

For example, lookPairs currently looks like this:
...
 lookPairs :: RqData [(String,String)]
 lookPairs = asks fst >>= return . map (\(n,vbs)->(n,L.unpack $ inputValue vbs))

As you can see, it just takes the incoming bytes and converts them to
a String, but without doing any decoding. You probably want something
more like:
...
 -- needs: import qualified Data.ByteString.Lazy.UTF8
 lookPairs :: RqData [(String,String)]
 lookPairs = asks fst >>= return . map (\(n,vbs)->(n,Data.ByteString.Lazy.UTF8.toString $ inputValue vbs))

Some of the other look* functions need patching as well.

Similarly, the ToMessage instances need to encode the outgoing data. Consider:
...
 instance ToMessage Html where
    toContentType _ = B.pack "text/html"
    toMessage = L.pack . renderHtml

We really want to make two changes: declare the charset in the
Content-Type header, and encode the body as utf-8:
...
 instance ToMessage Html where
    toContentType _ = B.pack "text/html; charset=UTF-8"            -- add the encoding
    toMessage = Data.ByteString.Lazy.UTF8.fromString . renderHtml  -- encode the data
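
To see why the outgoing data has to be encoded rather than just packed,
here is a small sketch, assuming the bytestring and utf8-string packages.
The page value is a hypothetical stand-in for the output of renderHtml,
and the other names are made up as well:

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import qualified Data.ByteString.Lazy.UTF8 as U

-- Hypothetical rendered page containing U+20AC (euro sign).
page :: String
page = "<p>\8364</p>"

-- Char8 pack keeps only the low 8 bits of each Char, so the euro
-- sign is truncated to a single, wrong octet.
packed :: L.ByteString
packed = L.pack page

-- utf-8 encoding represents the euro sign as three octets.
encoded :: L.ByteString
encoded = U.fromString page

-- Decoding the encoded bytes gives back the original String.
roundTrips :: Bool
roundTrips = U.toString encoded == page
```

Here L.length packed is 8 while L.length encoded is 10: the two extra
octets are the part of the euro sign that the plain pack throws away.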

3. Make sure that any I/O (readFile, writeFile, etc) uses the utf-8
functions from utf8-string.
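
One way to get utf-8 file I/O, sketched under the assumption that only
the ByteString conversion functions from utf8-string are available. The
helper names readFileUtf8 and writeFileUtf8 are hypothetical; if your
version of utf8-string ships its own file helpers, those would do the
same job:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.UTF8 as U

-- Read the raw octets of a file, then decode them as utf-8.
readFileUtf8 :: FilePath -> IO String
readFileUtf8 path = fmap U.toString (B.readFile path)

-- Encode a String as utf-8, then write the resulting octets.
writeFileUtf8 :: FilePath -> String -> IO ()
writeFileUtf8 path = B.writeFile path . U.fromString
```

Going through strict ByteStrings keeps the encoding explicit at the
boundary, so the Strings inside the application stay lists of code points.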

If you don't want to patch HAppS-Server, then you could work around it
by doing silliness like:

 -- decodeString: Codec.Binary.UTF8.String; first, second: Control.Arrow
 do pairs' <- lookPairs
    let pairs = map (first decodeString . second decodeString) pairs'

but that seems error prone and not a long term solution. The obvious
long term solution is for HAppS to fix its encoding issues. The simple
fix is to hardwire it for utf-8, but a system that supports
arbitrary encodings might be nice?
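
The workaround can be tried in isolation. This is only a sketch, assuming
decodeString from Codec.Binary.UTF8.String (utf8-string), which decodes a
String whose Chars each hold one utf-8 octet. rawPairs is a hypothetical
stand-in for what the unpatched lookPairs returns, and decodePairs is a
made-up helper name:

```haskell
import Codec.Binary.UTF8.String (decodeString)
import Control.Arrow ((***))

-- Hypothetical undecoded input: the value is "caf" followed by the
-- two utf-8 octets of U+00E9, one octet per Char.
rawPairs :: [(String, String)]
rawPairs = [("title", "caf\xC3\xA9")]

-- Decode both the name and the value of every pair.
decodePairs :: [(String, String)] -> [(String, String)]
decodePairs = map (decodeString *** decodeString)
```

The catch is that every call site has to remember to do this exactly
once, which is why patching the look* functions themselves is less
error prone.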

As far as I know, no one has even tried to submit a patch hardwiring
HAppS to use utf-8 -- which seems like a good short-term solution. You
might try posting on the HAppS mailing list and see if such a patch
would be welcome:

http://groups.google.com/group/HAppS

hope this helps.
- jeremy

At Tue, 30 Dec 2008 13:58:15 +0100, 
Arnaud Bailly wrote:
...
Hello,
I have started using Gitit and I am very happy with it and eager to
start hacking. I am running into a practical problem: characters
encoding. When I edit pages using accented characters (I am french),
the accents get mangled when the page come back from server.
The raw files are incorrectly encoded. Where Shall I look for fixing
this issue ?
Thanks
ps: the wiki is live at http://www.notre-ecole.org
-- 
Arnaud Bailly, PhD
OQube - Software Engineering
web> http://www.oqube.com
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Jeremy Shaw