[patch] Fix URI escaping in X.A.Search

Hello,
This patch fixes URI escaping in X.A.Search. Non-ASCII characters were not escaped. The patch modifies the escape function so that it encodes the string as UTF-8 and escapes all non-ASCII characters.
This is also a workaround for a bug in System.Process, which doesn't properly encode strings and just truncates Chars to 1 byte.
-- Khudyakov Alexey
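
The truncation described above can be illustrated with a small model. This is only a sketch of the reported behaviour, not code from System.Process; the names truncateToByte and mangle are hypothetical and exist just for this example.

```haskell
import Data.Bits ((.&.))
import Data.Char (chr, ord)

-- Hypothetical model of the reported bug: keep only the low 8 bits
-- of each code point, so anything above U+00FF turns into another character.
truncateToByte :: Char -> Char
truncateToByte c = chr (ord c .&. 0xFF)

-- Apply the truncation to a whole string, e.g. a query passed to a browser.
mangle :: String -> String
mangle = map truncateToByte
```

Under this model, ASCII input passes through unchanged, while the Cyrillic letter 'ы' (U+044B) comes out as 'K' (0x4B), which is why an unescaped non-ASCII query would reach the browser garbled.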

In the message of 24 April 2010 00:57:29 you wrote:
> Hello
> This patch fixes URI escaping in X.A.Search. Non-ASCII characters were not escaped. The patch modifies the escape function so that it encodes the string as UTF-8 and escapes all non-ASCII characters.
> This is also a workaround for a bug in System.Process, which doesn't properly encode strings and just truncates Chars to 1 byte.
Ping?

On Thu, Apr 29, 2010 at 7:29 AM, Khudyakov Alexey wrote:
> Ping?
I looked at it earlier, and I wasn't so sure that this is entirely right. For example, in 'isAscii c && isAlphaNum c = [c]', isn't isAscii redundant? I thought all AlphaNum were ASCII.
As well as that, I'm not confident about the use of encodeString. We had a long discussion about modifying the 'spawn' functions to UTF-8 encode stuff, and the upshot was that it seemed like a bad idea? See http://www.haskell.org/pipermail/xmonad/2009-January/thread.html#7012
-- gwern

In the message of 29 April 2010 19:47:26 you wrote:
> I looked at it earlier, and I wasn't so sure that this is entirely right. For example, in 'isAscii c && isAlphaNum c = [c]', isn't isAscii redundant? I thought all AlphaNum were ASCII.
No. It's necessary.
    Prelude Data.Char> isAlphaNum 'ы'
    True
> As well as that, I'm not confident about the use of encodeString. We had a long discussion about modifying the 'spawn' functions to UTF-8 encode stuff, and the upshot was that it seemed like a bad idea? See http://www.haskell.org/pipermail/xmonad/2009-January/thread.html#7012
In this particular case it's the right thing to do. According to RFC 3986 [1], all URLs should first be encoded in UTF-8 and then percent-encoded. So xmonad just emits a valid URL.
Quote from RFC 3986:
    Non-ASCII characters must first be encoded according to UTF-8 [STD63], and then each octet of the corresponding UTF-8 sequence must be percent-encoded to be represented as URI characters.
[1] http://tools.ietf.org/html/rfc3986
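
The two steps discussed in this message (UTF-8 encode, then percent-encode each octet) can be sketched as follows. This is not the patch itself: the thread mentions encodeString, whereas this sketch uses a hand-rolled UTF-8 encoder so that it depends only on base. The names utf8Octets, percent, escapeChar and escape are assumptions made for the example.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (isAlphaNum, isAscii, ord, toUpper)
import Numeric (showHex)

-- Encode one code point as its UTF-8 octets (each an Int in 0..255).
utf8Octets :: Char -> [Int]
utf8Octets c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- Render one octet as %XX with two upper-case hex digits.
percent :: Int -> String
percent b = '%' : pad (map toUpper (showHex b ""))
  where
    pad [d] = ['0', d]
    pad ds  = ds

-- Keep ASCII letters and digits; escape everything else.
-- isAscii is needed because isAlphaNum is also True for letters such as 'ы'.
escapeChar :: Char -> String
escapeChar c
  | isAscii c && isAlphaNum c = [c]
  | otherwise                 = concatMap percent (utf8Octets c)

escape :: String -> String
escape = concatMap escapeChar
```

With these definitions, escape "a b" gives "a%20b" and escape "ы" gives "%D1%8B". The sketch also escapes the unreserved characters '-', '.', '_' and '~', which RFC 3986 allows to stay literal; the result is longer than necessary but still a valid URI.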

On Thu, Apr 29, 2010 at 12:25 PM, Khudyakov Alexey wrote:
> In this particular case it's the right thing to do. According to RFC 3986 [1], all URLs should first be encoded in UTF-8 and then percent-encoded. So xmonad just emits a valid URL.
> Quote from RFC 3986: Non-ASCII characters must first be encoded according to UTF-8 [STD63], and then each octet of the corresponding UTF-8 sequence must be percent-encoded to be represented as URI characters.
Hm. Alright then; I'll apply a patch if you'll amend-record it to remove this:
-search browser site query = safeSpawn browser [site query]
+search browser site query = do
+    safeSpawn browser [site query]
Unnecessary 'do' is bad, and someone would have to clean it up with hlint eventually.
-- gwern
Participants (2): Gwern Branwen, Khudyakov Alexey