Data encoding library

Magnus Therning

13 Oct 2007

6:26 p.m.

I've just created the page http://haskell.org/haskellwiki/Library/Data_encoding (there are no links into it at the moment).

Any comments on what I've written there? Are the locations in the hierarchy good?

I have code to share if you are interested in commenting on it as well. Just let me know and I'll find a place to put it for public download.

/M

--
Magnus Therning (OpenPGP: 0xAB4DFBA4)
magnus＠therning．org Jabber: magnus．therning＠gmail．com
http://therning.org/magnus

Windows [n.] A thirty-two bit extension and GUI shell to a sixteen bit patch to an eight bit operating system originally coded for a four bit microprocessor and sold by a two-bit company that can't stand one bit of competition.
    -- Anonymous USEnet post

apfelmus

14 Oct 2007

10:55 a.m.

Magnus Therning wrote:

> I've just created the page http://haskell.org/haskellwiki/Library/Data_encoding (there are no links into it at the moment).
>
> Any comments on what I've written there? Are the locations in the hierarchy good?

Nice, but do you really need separate modules for each encoding? I mean, Ashley's proposal

> 2. Codecs, i.e. encoder/decoder pairs such as charset converters
>
>     data Codec base derived = MkCodec
>       { encode :: derived -> base
>       , decode :: base -> Maybe derived  -- or other Monad
>       }
>
>     utf8 :: Codec [Word8] String
>     xml  :: Codec String XML

from the recent thread

    http://article.gmane.org/gmane.comp.lang.haskell.libraries/7663

would fit the bill perfectly, wouldn't it? In other words, encodings are just pairs of functions, nothing complicated:

    base16    :: Codec String [Word8]  -- decode Strings to Word8
    base32    :: Codec String [Word8]
    base64    :: ...
    base64url ::
    uuencode  ::

For more documentation in the types, a type synonym may come in very handy:

    type ASCII = String
    base16 :: Codec ASCII [Word8]
    ...

Want to encode an example? Here you go:

    encode base16 [0xde,0xad,0xbe,0xef] :: ASCII

Btw, it is essential that decode may fail on bad input!

Also, I don't have a clue about what chop and unchop are supposed to do.

Regards,
apfelmus
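To make the "encodings are just pairs of functions" idea concrete, here is a minimal sketch of a base16 codec built on the quoted Codec record. Everything beyond the record and the type synonym is an assumption made for illustration: the lower-case hex output, the helper names hexByte and unhex, and the main driver do not come from the proposal or from the library under discussion.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (digitToInt, intToDigit, isHexDigit)
import Data.Word (Word8)

-- Record from the quoted proposal: 'base' is the encoded form,
-- 'derived' is the decoded value.
data Codec base derived = MkCodec
  { encode :: derived -> base
  , decode :: base -> Maybe derived
  }

type ASCII = String

-- Hypothetical base16 codec (assumption: emits lower-case digits,
-- accepts either case when decoding).
base16 :: Codec ASCII [Word8]
base16 = MkCodec { encode = concatMap hexByte, decode = unhex }
  where
    -- One byte becomes two hex digits, high nibble first.
    hexByte w = [ intToDigit (fromIntegral (w `shiftR` 4))
                , intToDigit (fromIntegral (w .&. 0x0f)) ]
    -- Decoding fails on non-hex characters and on odd-length input.
    unhex (hi : lo : rest)
      | isHexDigit hi && isHexDigit lo = (byte hi lo :) <$> unhex rest
    unhex [] = Just []
    unhex _  = Nothing
    byte hi lo = fromIntegral (digitToInt hi `shiftL` 4 .|. digitToInt lo)

main :: IO ()
main = do
  putStrLn (encode base16 [0xde, 0xad, 0xbe, 0xef])  -- prints deadbeef
  print (decode base16 "deadbeef")                   -- Just [222,173,190,239]
  print (decode base16 "xyz")                        -- Nothing
```

Returning Maybe from decode is what lets bad input be reported, as the message insists; a richer error type would be a natural variation.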

Magnus Therning

4:50 p.m.

On Sun, Oct 14, 2007 at 12:55:28 +0200, apfelmus wrote:

> Magnus Therning wrote:
>
>> I've just created the page http://haskell.org/haskellwiki/Library/Data_encoding (there are no links into it at the moment). Any comments on what I've written there? Are the locations in the hierarchy good?
>
> Nice, but do you really need separate modules for each encoding? I mean, Ashley's proposal

There is no particular need for separate modules, except that in the current version there would be name clashes.

>> 2. Codecs, i.e. encoder/decoder pairs such as charset converters
>>
>>     data Codec base derived = MkCodec
>>       { encode :: derived -> base
>>       , decode :: base -> Maybe derived  -- or other Monad
>>       }
>>
>>     utf8 :: Codec [Word8] String
>>     xml  :: Codec String XML
>
> from the recent thread
>
>     http://article.gmane.org/gmane.comp.lang.haskell.libraries/7663
>
> would fit the bill perfectly, wouldn't it? In other words, encodings are just pairs of functions, nothing complicated
>
>     base16    :: Codec String [Word8]  -- decode Strings to Word8
>     base32    :: Codec String [Word8]
>     base64    :: ...
>     base64url ::
>     uuencode  ::
>
> For more documentation in the types, a type synonym may come in very handy
>
>     type ASCII = String
>     base16 :: Codec ASCII [Word8]
>     ...
>
> Want to encode an example? Here you go
>
>     encode base16 [0xde,0xad,0xbe,0xef] :: ASCII

A similar result could be gotten by using phantom types, right? But then there must be some way of liberating the result. I'm not sure yet whether they are worth it.

AFAIU the example from above then changes to

    encode [0xde,0xad,0xbe,0xef] :: Base16 ASCII

> Btw, it is essential that decode may fail on bad input!

I will for sure change the result of decode to deal with failures.

> Also, I don't have a clue about what chop and unchop are supposed to do.

For some encodings there are standard ways of splitting an encoded string over several lines. Unfortunately it's not always as simple as just splitting a string at a particular length. Uuencode is the most complicated I've come across so far. That's what chop/unchop is for.

/M

apfelmus

6:11 p.m.

Magnus Therning wrote:

>> 2. Codecs, i.e. encoder/decoder pairs such as charset converters
>>
>>     data Codec base derived = MkCodec
>>       { encode :: derived -> base
>>       , decode :: base -> Maybe derived  -- or other Monad
>>       }
>>
>>     utf8 :: Codec [Word8] String
>>     xml  :: Codec String XML
>>
>>     type ASCII = String
>>     base16 :: Codec ASCII [Word8]
>>     ...
>>
>>     encode base16 [0xde,0xad,0xbe,0xef] :: ASCII
>
> A similar result could be gotten by using phantom types, right?

Most likely, although I'm not sure whether the choice from your blog is the right one. I mean, the only-a-little-bit-phantom type

    newtype Base16 a = Base16 { unBase16 :: a } deriving (Eq,Show)

will do the job too

    instance DataEncoding Base16 where
        encode      = Base16 . b16Encode
        decode      = b16Decode . unBase16
        chop n      = Base16 . b16chop n . unBase16
        unchop      = Base16 . b16unchop . unBase16
        liberate    = unBase16
        incarcerate = Base16

Usually, the "normal" phantom type approach would be to make the encoding a phantom argument of a string type, not the other way round:

    newtype EncodedString enc = ES String
    data Base16   -- empty type, no constructors

    instance DataEncoding (EncodedString Base16) where ...

But your idea of fixing the encoding in the type for more type safety is good. Another way to do that would be to have an abstract data type

    -- this is not a String, this is base16-encoded data!
    newtype Base16 = Base16 String

with functions

    encode :: [Word8] -> Base16
    decode :: Base16  -> [Word8]

and functions

    encode :: Base16 -> String
    decode :: String -> Maybe Base16

The "normal" phantom type approach has the advantage of making the last functions polymorphic

    encode :: EncodedString enc -> String
    decode :: String -> EncodedString enc

    encode (ES s) = s
    decode s      = ES s

at the expense of shifting the possible failure to

    decode :: EncodedString Base16 -> Maybe [Word8]

Of course, you can use both phantom types and the codec approach, eliminating the need for a type class

    base16 :: Codec [Word8] (EncodedString Base16)
    string :: Codec (EncodedString a) String
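The "normal" phantom type variant described above can be sketched as follows. This is only an illustration under stated assumptions: the function names wrap, unwrap, encodeBase16 and decodeBase16 are hypothetical (the message reuses encode and decode for both layers), and the hex conversion is a simple stand-in for whatever the library actually does.

```haskell
import Data.Char (digitToInt, intToDigit, isHexDigit)
import Data.Word (Word8)

-- A string tagged with its encoding; 'enc' is a phantom parameter.
newtype EncodedString enc = ES String deriving (Eq, Show)

-- Empty type, no constructors: used only as a tag.
data Base16

-- Wrapping and unwrapping are polymorphic in the encoding and cannot fail.
unwrap :: EncodedString enc -> String
unwrap (ES s) = s

wrap :: String -> EncodedString enc
wrap = ES

-- Encoding always succeeds and fixes the tag to Base16.
encodeBase16 :: [Word8] -> EncodedString Base16
encodeBase16 = ES . concatMap hexByte
  where hexByte w = map (intToDigit . fromIntegral) [w `div` 16, w `mod` 16]

-- The possible failure has moved here: a tagged string may still
-- contain characters that are not valid base16.
decodeBase16 :: EncodedString Base16 -> Maybe [Word8]
decodeBase16 (ES s) = go s
  where
    go (hi : lo : rest)
      | isHexDigit hi && isHexDigit lo =
          (fromIntegral (16 * digitToInt hi + digitToInt lo) :) <$> go rest
    go [] = Just []
    go _  = Nothing

main :: IO ()
main = do
  putStrLn (unwrap (encodeBase16 [0xde, 0xad, 0xbe, 0xef]))  -- prints deadbeef
  print (decodeBase16 (wrap "not hex"))                      -- Nothing
```

Because wrap accepts any string, the tag records an intention rather than a validated fact, which is the trade-off the message points out.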

> But then there must be some way of liberating the result. I'm not sure yet whether they are worth it.
>
> AFAIU the example from above then changes to
>
>     encode [0xde,0xad,0xbe,0xef] :: Base16 ASCII

Concerning the choice between encoding the encoding (... ;-) in the types (like Base16) or as values (like base16 :: Codec ...), the observation is that you have to specify the encoding anyway :) either as type annotation ("type argument")

    encode [0xde,0xad,0xbe,0xef] :: EncodedString Base16
    encode' (undefined :: Base16) [0xde,0xad,0xbe,0xef]

or as value argument

    encode base16 [0xde,0xad,0xbe,0xef]

In this case, I would prefer the value argument approach for its brevity and mnemonics ("encode in base16 the following data"). However, possible strong type guarantees usually are a good argument for the typed approach. To be honest, I'm not really sure whether strong types would gain us something here.
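The three ways of naming the encoding can be put side by side in a small sketch. The class Encoding, the functions encodeAs, encodeProxy and encodeWith, the Encoder wrapper, the Base2 tag and the digits helper are all hypothetical names introduced here for illustration; none of them appears in the thread or in the library.

```haskell
import Data.Char (intToDigit)
import Data.Word (Word8)

newtype EncodedString enc = ES String deriving (Eq, Show)

-- Empty tag types naming the encodings at the type level.
data Base16
data Base2

-- Fixed-width digits of one byte in the given base, most significant first.
digits :: Int -> Int -> Word8 -> String
digits base width w =
  reverse (take width (map (intToDigit . (`mod` base))
                           (iterate (`div` base) (fromIntegral w))))

-- Typed approach: the result type selects the encoding.
class Encoding enc where
  encodeAs :: [Word8] -> EncodedString enc

instance Encoding Base16 where
  encodeAs = ES . concatMap (digits 16 2)

instance Encoding Base2 where
  encodeAs = ES . concatMap (digits 2 8)

-- Same class, but the encoding is named by a dummy argument
-- that is never evaluated.
encodeProxy :: Encoding enc => enc -> [Word8] -> EncodedString enc
encodeProxy _ = encodeAs

-- Value approach: the encoding is an ordinary value passed as an argument.
newtype Encoder = Encoder ([Word8] -> String)

base16, base2 :: Encoder
base16 = Encoder (concatMap (digits 16 2))
base2  = Encoder (concatMap (digits 2 8))

encodeWith :: Encoder -> [Word8] -> String
encodeWith (Encoder f) = f

main :: IO ()
main = do
  print (encodeAs [0xde, 0xad] :: EncodedString Base16)
  print (encodeProxy (undefined :: Base2) [0x05])
  putStrLn (encodeWith base16 [0xde, 0xad])
```

In all three calls the encoding has to be spelled out somewhere, which is the observation made above; the value form simply needs the least syntax.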

>> Also, I don't have a clue about what chop and unchop are supposed to do.
>
> For some encodings there are standard ways of splitting an encoded string over several lines. Unfortunately it's not always as simple as just splitting a string at a particular length. Uuencode is the most complicated I've come across so far. That's what chop/unchop is for.

Ah, that's what they are for. An idea would be to build the line length into the encoding, like

    base16 :: Int -> Codec [Word8] [String]

with the intention that

    encode (base16 70) x

will encode x with a line length of 70 characters. Hm, should

    decode (base16 70) s

fail when the lines are not 70 characters in length, or should it accept any line length? Maybe it should be

    base16 :: Maybe Int -> Codec [Word8] [String]

since the programmer may choose to not wrap lines anyway. But perhaps the line length is best paired with the data

    base16 :: Codec ([Word8], Maybe Int) [String]

so that

    encode base16 (..., Just 70) x

will encode with a line length of 70 characters and

    let (_,ll) = decode base16 s in ...

will return the parsed line length in ll.

Oh my lambda, it's wondrous how Haskell gives so many possibilities to ponder for such a seemingly innocent API design problem :)

Regards,
apfelmus
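The Maybe Int variant can be sketched as below. Several choices here are assumptions rather than anything settled in the thread: the type arguments follow the order of the original Codec record (encoded form first), so the signature reads Codec [String] [Word8]; decode accepts lines of any length, which is only one of the two options raised above; and chunksOf is a local helper written for this example.

```haskell
import Data.Char (digitToInt, intToDigit, isHexDigit)
import Data.Word (Word8)

data Codec base derived = MkCodec
  { encode :: derived -> base
  , decode :: base -> Maybe derived
  }

-- Split a list into pieces of at most n elements (n is assumed positive).
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | null xs   = []
  | otherwise = let (h, t) = splitAt n xs in h : chunksOf n t

-- Hypothetical line-wrapping base16 codec.
-- Nothing means "do not wrap"; Just n wraps the output at n characters.
base16 :: Maybe Int -> Codec [String] [Word8]
base16 lineLen = MkCodec
  { encode = wrapLines . concatMap hexByte
  , decode = unhex . concat          -- assumption: any line length is accepted
  }
  where
    wrapLines s = case lineLen of
      Nothing -> [s]
      Just n  -> chunksOf (max 1 n) s
    hexByte w = map (intToDigit . fromIntegral) [w `div` 16, w `mod` 16]
    unhex (hi : lo : rest)
      | isHexDigit hi && isHexDigit lo =
          (fromIntegral (16 * digitToInt hi + digitToInt lo) :) <$> unhex rest
    unhex [] = Just []
    unhex _  = Nothing

main :: IO ()
main = do
  mapM_ putStrLn (encode (base16 (Just 4)) [0xde, 0xad, 0xbe, 0xef])
  print (decode (base16 (Just 4)) ["de", "adbeef"])
```

A plain fixed-width split like this would not be enough for uuencode, which is the complication mentioned earlier in the thread.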

Magnus Therning

20 Oct 2007

12:46 p.m.

I've updated the Wiki page. I think I've decided on the API I want. Basically the old one remains and one similar to Ashley's has been added. Phantom types are cute and can easily be added in the future if needed. Once I've changed the type for decode to allow for failure I'll upload the code to code.haskell.org/dataenc.

/M
