Re: [Haskell-cafe] Why so many strings in Network.URI, System.Posix and similar libraries?

14 Mar 2012

2012/3/12 Jeremy Shaw:
...
On Sun, Mar 11, 2012 at 1:33 PM, Jason Dusek wrote:
...
Well, to quote one example from RFC 3986:
 2.1.  Percent-Encoding
  A percent-encoding mechanism is used to represent a data octet in a
  component when that octet's corresponding character is outside the
  allowed set or is being used as a delimiter of, or within, the
  component.

Right. This describes how to convert an octet into a sequence of characters,
since nothing but a sequence of characters can appear in a URI.
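To make the octet-to-character direction concrete, here is a minimal Haskell sketch of percent-encoding a list of octets. The names `isUnreserved`, `encodeOctet`, and `percentEncode` are hypothetical and not from any library. The sketch assumes that every octet outside RFC 3986's unreserved set (letters, digits, `-`, `.`, `_`, `~`) should be encoded, which is the conservative choice for data placed inside a component. It also emits uppercase hex digits, as the RFC recommends for consistency.

```haskell
import Data.Char (chr, isAsciiLower, isAsciiUpper, isDigit, toUpper)
import Data.Word (Word8)
import Numeric (showHex)

-- RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
isUnreserved :: Char -> Bool
isUnreserved c = isAsciiUpper c || isAsciiLower c || isDigit c || c `elem` "-._~"

-- One octet becomes three characters: '%' and two uppercase hex digits.
encodeOctet :: Word8 -> String
encodeOctet w = '%' : map toUpper (pad (showHex w ""))
  where
    pad s = replicate (2 - length s) '0' ++ s

-- Octets in, characters out: octets whose ASCII character is unreserved
-- stand for themselves; every other octet is percent-encoded.
percentEncode :: [Word8] -> String
percentEncode = concatMap enc
  where
    enc w
      | isUnreserved c = [c]
      | otherwise = encodeOctet w
      where
        c = chr (fromIntegral w)
```

In GHCi, `percentEncode [104, 105, 32, 0xC3, 0xA9]` gives `"hi%20%C3%A9"`. The input is octets (the last two happen to be the UTF-8 encoding of "é"), and the output is characters.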
...
The syntax of URIs is a mechanism for describing data octets,
not Unicode code points. Describing URIs in terms of Unicode
code points is at variance with this.

Not sure what you mean by this. As the RFC says, a URI is defined entirely
by the identity of the characters that are used. There is definitely no
single, correct byte sequence for representing a URI. If I give you a
sequence of bytes and tell you it is a URI, the only way to decode it is to
first know which encoding the byte sequence uses: ASCII, UTF-16, etc.
Once you have decoded the byte sequence into a sequence of characters, only
then can you parse the URI.
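The decode-then-parse order can be sketched in Haskell. Everything here is hypothetical and deliberately tiny. `Encoding`, `decode`, `splitScheme`, and `parseBytes` are made-up names, and only ASCII and UTF-16BE are handled. The "parser" does nothing more than split off and validate the scheme (RFC 3986, section 3.1). A real program would instead use a proper decoder, such as `decodeUtf8'` from the text package, and then pass the resulting characters (as a `String`) to `parseURI` from Network.URI.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Char (chr, isAsciiLower, isAsciiUpper, isDigit)
import Data.Word (Word8)

-- The only two encodings this sketch understands (a hypothetical type).
data Encoding = ASCII | UTF16BE

-- Step 1: bytes -> characters. This cannot be done without knowing the encoding.
decode :: Encoding -> [Word8] -> Maybe String
decode ASCII bs
  | all (< 128) bs = Just (map (chr . fromIntegral) bs)
  | otherwise      = Nothing
decode UTF16BE bs = go bs
  where
    -- Combine each big-endian byte pair into one code unit. URI characters
    -- are all ASCII, so surrogates are rejected instead of being paired.
    go (hi : lo : rest) =
      let u = (fromIntegral hi `shiftL` 8) .|. fromIntegral lo :: Int
      in if u >= 0xD800 && u <= 0xDFFF then Nothing else (chr u :) <$> go rest
    go []  = Just []
    go [_] = Nothing -- an odd number of bytes is not valid UTF-16

-- Step 2: characters -> URI parts. A toy parser that only splits off the
-- scheme, checked against RFC 3986: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ).
splitScheme :: String -> Maybe (String, String)
splitScheme s = case break (== ':') s of
  (scheme@(c : cs), ':' : rest)
    | isAlpha' c && all isSchemeChar cs -> Just (scheme, rest)
  _ -> Nothing
  where
    isAlpha' x = isAsciiUpper x || isAsciiLower x
    isSchemeChar x = isAlpha' x || isDigit x || x `elem` "+-."

-- Decode first, then parse.
parseBytes :: Encoding -> [Word8] -> Maybe (String, String)
parseBytes enc bs = decode enc bs >>= splitScheme
```

The same URI, `http://a/`, is nine bytes in ASCII and eighteen in UTF-16BE. `parseBytes ASCII` on the first and `parseBytes UTF16BE` on the second both yield `Just ("http", "//a/")`. Decoding the UTF-16BE bytes as if they were ASCII produces NUL characters, so the scheme check fails and the result is `Nothing`.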

Mr. Shaw,

Thanks for taking the time to explain all this. It has really
helped me understand many parts of the URI spec much better.
I have deprecated my module in the latest release:

  http://hackage.haskell.org/package/URLb-0.0.1

because a URL parser working on bytes instead of characters
stands out to me now as a confused idea.

--
Jason Dusek
pgp  ///  solidsnack  1FD4C6C1 FED18A2B