
Folks,

Dominic Steinitz has contributed some crypto code (see attachment and message below). According to our draft hierarchy, there is a place reserved for crypto libraries: FileFormat.Encryption (see http://www.haskell.org/~simonmar/lib-hierarchy.html).

Question 1: there's a comment next to 'FileFormat' which mentions that 'Codec' might be a more accurate name. It seems to me that 'Codec' would indeed be a better choice: many of the libraries under FileFormat are more like pure stream-transformers than file formats, and Codec encompasses both. What do people think about changing this? (There aren't any other implementations of libraries under FileFormat that I'm aware of.) If FileFormat became Codec, then the existing FileFormat.Encoding looks a bit odd. But moving FileFormat.Encoding.Base64 up to Codec.Base64 (similarly for FileFormat.Encoding.Yenc) would seem to make sense.

Question 2: what should the insides of the FileFormat.Encryption (or Codec.Encryption) hierarchy look like?

Cheers, Simon

-----Original Message-----
From: Dominic Steinitz [mailto:dominic.steinitz@blueyonder.co.uk]
Sent: 21 April 2003 18:47
To: Simon Peyton-Jones; Simon Marlow
Cc: libraries-request@haskell.org

Simon, Simon,

Here's my first attempt at a crypto library. It compiles and I can run a test using ghc-inplace. The test checks against the example in http://www.itl.nist.gov/fipspubs/fip81.htm (except I couldn't find an example with PKCS#5 padding). I'm not sure what the next steps are.

Dominic Steinitz

This reminds me: for my ginsu project (http://repetae.net/john/computer/ginsu/) I implemented interfaces to curses and part of OpenSSL's cryptography routines. Would there be interest in putting those into the general Haskell libraries as well?

John
--
John Meacham - California Institute of Technology, Alum. - john@foo.net

John Meacham
I implemented interfaces to curses and part of OpenSSL's cryptography routines. Would there be interest in putting those into the general Haskell libraries as well?
Yes, I'm sure. Have a look at http://www.haskell.org/~simonmar/lib-hierarchy.html and make a proposal for where in the hierarchy you think they fit, bearing in mind that the guiding rule is to classify by the library's purpose. Then you can release the software as an independent "package" for ghc, Hugs, and nhc98. Regards, Malcolm

"Simon Marlow"
Question 1: there's a comment next to 'FileFormat' which mentions that 'Codec' might be a more accurate name. It seems to me that 'Codec' would indeed be a better choice: many of the libraries under FileFormat are more like pure stream-transformers than file formats, and Codec encompasses both. What do people think about changing this? (there aren't any other implementations of libraries under FileFormat that I'm aware of).
I concur that Codec is a better name than FileFormat. It is more general, since the encodings may in fact never be stored in files - they could just be transmitted directly to a decoder over a network for instance.
If FileFormat became Codec, then the existing FileFormat.Encoding looks a bit odd. But moving FileFormat.Encoding.Base64 up to Codec.Base64 (similarly for FileFormat.Encoding.Yenc) would seem to make sense.
Since we already have a little hierarchy based on the codec's purpose: Codec.Image (e.g. Codec.Image.Jpeg) Codec.Video (e.g. Codec.Video.Mpeg2) then how about adding Codec.Text (e.g. Codec.Text.Base64) because its role, like uuencoding, is to convert binary streams to 7-bit ASCII text i.e. transmissible by email. A codec that is truly multi-purposed would sit directly under Codec, but I can't immediately think of an example that might fit.
Question 2: what should the insides of the FileFormat.Encryption (or Codec.Encryption) hierarchy look like?
I would have thought that its population would look something like: Codec.Encryption.DES Codec.Encryption.RSA Codec.Encryption.Blowfish etc. No need for any deeper structure. Any attempt to further classify crypto schemes by method or purpose would be confusing I think. Crypto is just crypto, i.e. binary to binary. Regards, Malcolm

On Tue, Apr 22, 2003 at 12:15:41PM +0100, Malcolm Wallace wrote:
I concur that Codec is a better name than FileFormat. It is more general, since the encodings may in fact never be stored in files - they could just be transmitted directly to a decoder over a network for instance.
Me too.
If FileFormat became Codec, then the existing FileFormat.Encoding looks a bit odd. But moving FileFormat.Encoding.Base64 up to Codec.Base64 (similarly for FileFormat.Encoding.Yenc) would seem to make sense.
Since we already have a little hierarchy based on the codec's purpose: Codec.Image (e.g. Codec.Image.Jpeg) Codec.Video (e.g. Codec.Video.Mpeg2) then how about adding Codec.Text (e.g. Codec.Text.Base64) because its role, like uuencoding, is to convert binary streams to 7-bit ASCII text i.e. transmissible by email.
I would have thought Codec.Binary.Base64, since in the other cases (Image and Video), the name is for the content you wish to code, not the format in which it is encoded. e.g. Jpeg is a format for encoding an image in binary, while xpm is a format for encoding an image as text, but both would go under Codec.Image.
A codec that is truly multi-purposed would sit directly under Codec, but I can't immediately think of an example that might fit.
gzip? Although that would probably go under Compression or something.
Question 2: what should the insides of the FileFormat.Encryption (or Codec.Encryption) hierarchy look like?
I would have thought that its population would look something like:
Codec.Encryption.DES Codec.Encryption.RSA Codec.Encryption.Blowfish
etc. No need for any deeper structure. Any attempt to further classify crypto schemes by method or purpose would be confusing I think. Crypto is just crypto, i.e. binary to binary.
I would imagine you'd want a Codec.Encryption.PublicKey and Codec.Encryption.Symmetric, since public key encryption supports a different set of operations from simple symmetric ciphers. I'd think that all symmetric encryption could use a common interface of some sort, and perhaps all PK encryption could do the same, and it would be nice to reflect this in the hierarchy. Then you'd probably also want a Codec.Encryption.Hash for cryptographic hashes. I'd think you'd also want to create somewhere a module for converting a passphrase into a key of an arbitrary number of characters (which would probably be used by all the symmetric ciphers if they want to use a common interface), but here I'm revealing my ignorance of cryptography, since I don't know what such a thing is called. Some sort of variable-sized hash, I guess...

--
David Roundy
http://www.abridgegame.org

David Roundy wrote:
On Tue, Apr 22, 2003 at 12:15:41PM +0100, Malcolm Wallace wrote:
I concur that Codec is a better name than FileFormat. It is more general, since the encodings may in fact never be stored in files - they could just be transmitted directly to a decoder over a network for instance.
Me too.
Me too, too.
If FileFormat became Codec, then the existing FileFormat.Encoding looks a bit odd. But moving FileFormat.Encoding.Base64 up to Codec.Base64 (similarly for FileFormat.Encoding.Yenc) would seem to make sense.
Since we already have a little hierarchy based on the codec's purpose: Codec.Image (e.g. Codec.Image.Jpeg) Codec.Video (e.g. Codec.Video.Mpeg2) then how about adding Codec.Text (e.g. Codec.Text.Base64) because its role, like uuencoding, is to convert binary streams to 7-bit ASCII text i.e. transmissible by email.
I would have thought Codec.Binary.Base64, since in the other cases (Image and Video), the name is for the content you wish to code, not the format in which it is encoded. e.g. Jpeg is a format for encoding an image in binary, while xpm is a format for encoding an image as text, but both would go under Codec.Image.
I would agree with Codec.Binary.Base64. Codec.Text should be reserved for codecs that deal with text data; i.e. translating text between various encodings like unicode, ISO 8859-1, ascii, ebcdic, etc.
A codec that is truly multi-purposed would sit directly under Codec, but I can't immediately think of an example that might fit.
I would prefer Codec.General for any truly general-purpose codecs, but we shouldn't need it. It seems to me that any codec would have to be designed for some purpose, and it should be possible to classify them based on that purpose.
gzip? Although that would probably go under Compression or something.
Yes, but it should probably be called by the generic term Deflate. I would suggest Codec.Compression.Deflate. The zip file format is a different story; I would suggest FileFormat.Archive.Zip for that. We could also have FileFormat.Archive.Tar and FileFormat.Archive.Cab, as some other examples. FileFormat.Archive could be defined as providing methods to access any file format designed to store other files of arbitrary type; Gzip files would fit that definition even though they can only store a single file. So we could still have a FileFormat.Archive.Gzip, which would presumably use Codec.Compression.Deflate to do the bulk of the work.
Question 2: what should the insides of the FileFormat.Encryption (or Codec.Encryption) hierarchy look like?
I would have thought that its population would look something like:
Codec.Encryption.DES Codec.Encryption.RSA Codec.Encryption.Blowfish
etc. No need for any deeper structure. Any attempt to further classify crypto schemes by method or purpose would be confusing I think. Crypto is just crypto, i.e. binary to binary.
What about ROT13, or Enigma? Those are text to text. Not terribly secure today, but I can think of a more secure recent example: the solitaire cipher (see http://www.counterpane.com/solitaire.html). I don't think that justifies another hierarchy level, though.
I would imagine you'd want a Codec.Encryption.PublicKey and Codec.Encryption.Symmetric, since public key encryption supports a different set of operations from simple symmetric ciphers. I'd think that all symmetric encryption could use a common interface of some sort, and perhaps all PK encryption could do the same, and it would be nice to reflect this in the hierarchy. Then you'd probably also want a Codec.Encryption.Hash for cryptographic hashes. I'd think you'd also want to create somewhere a module for converting a passphrase into a key of an arbitrary number of characters (which would probably be used by all the symmetric ciphers if they want to use a common interface), but here I'm revealing my ignorance of cryptography, since I don't know what such a thing is called. Some sort of variable-sized hash, I guess...
I'm not sure public key vs. symmetric deserves an additional level in the hierarchy. Each encryption codec is inherently either sym or PK, and should implement the appropriate interface for its class. I think I'd like Codec.Encryption.PublicKey to be the name of the public key class, which each PK alg. would implement. By the way, crypto hashes should go somewhere else, because they're not codecs. Same goes for crypto random-number generators. Regards, Matt Harden
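To make the "one class per kind of cipher" idea concrete, here is a minimal sketch of what such interfaces might look like. Every name in it (SymmetricCipher, PublicKeyCipher, XorKey, ToyRSA, powMod) is hypothetical and not taken from any proposed library, and both instances are toys with no security value; they exist only to show that the two kinds of cipher naturally want different interfaces.

```haskell
import Data.Bits (xor)
import Data.Word (Word8)

-- Hypothetical interface: a symmetric cipher uses the same key in both directions.
class SymmetricCipher k where
  symEncrypt :: k -> [Word8] -> [Word8]
  symDecrypt :: k -> [Word8] -> [Word8]

-- Hypothetical interface: a public-key cipher encrypts with the public part
-- of a key pair and decrypts with the private part.
class PublicKeyCipher kp where
  pkEncrypt :: kp -> Integer -> Integer
  pkDecrypt :: kp -> Integer -> Integer

-- Toy symmetric instance: XOR with a repeating, non-empty key (NOT secure).
newtype XorKey = XorKey [Word8]

instance SymmetricCipher XorKey where
  symEncrypt (XorKey k) = zipWith xor (cycle k)
  symDecrypt = symEncrypt

-- Toy public-key instance: textbook RSA with tiny numbers (NOT secure).
data ToyRSA = ToyRSA
  { modulus    :: Integer
  , publicExp  :: Integer
  , privateExp :: Integer
  }

instance PublicKeyCipher ToyRSA where
  pkEncrypt k m = powMod m (publicExp k) (modulus k)
  pkDecrypt k c = powMod c (privateExp k) (modulus k)

-- Modular exponentiation by repeated squaring.
powMod :: Integer -> Integer -> Integer -> Integer
powMod _ 0 n = 1 `mod` n
powMod b e n
  | even e    = (half * half) `mod` n
  | otherwise = (b * powMod b (e - 1) n) `mod` n
  where half = powMod b (e `div` 2) n

-- Textbook key pair: p = 61, q = 53, n = 3233, e = 17, d = 2753.
toyKey :: ToyRSA
toyKey = ToyRSA 3233 17 2753
```

Under this sketch, each algorithm module (say a Codec.Encryption.RSA) would only need to supply an instance, and the hierarchy itself would not need a separate PublicKey/Symmetric level.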

G'day all. On Tue, Apr 22, 2003 at 12:15:41PM +0100, Malcolm Wallace wrote:
I would have thought that its population would look something like:
Codec.Encryption.DES Codec.Encryption.RSA Codec.Encryption.Blowfish
etc. No need for any deeper structure. Any attempt to further classify crypto schemes by method or purpose would be confusing I think. Crypto is just crypto, i.e. binary to binary.
On Tue, Apr 22, 2003 at 11:00:05PM -0500, Matt Harden wrote:
What about ROT13, or Enigma? Those are text to text.
If it helps, remember that RSA and DES, for example, don't do the same thing. "Encryption" is a general heading which at least includes symmetric ciphers, asymmetric ciphers, digital signatures, hash functions and secure random number generators, and that's just the algorithms. Then there are all the various modes in which they can be used to encrypt blocks of data (e.g. PKCS#1 encoding for RSA, inner-CBC vs outer-CBC mode for 3DES etc). Codec.Encryption specifies what the module is for, not how it is used. Presumably there are going to be some type classes somewhere which specify whether it is a BinarySymmetricCipher like TripleDES or TextSymmetricCipher like FourWheelNavalEnigma. (These names are just examples, of course.) It's the same, for example, with FileFormat. Zip and Gzip are conceptually similar, but one supports multiple files and the other does not.
By the way, crypto hashes should go somewhere else, because they're not codecs. Same goes for crypto random-number generators.
Which suggests that Codec isn't the right name. :-/ Cheers, Andrew Bromage

G'day all. On Wed, Apr 23, 2003 at 06:08:11PM +1000, I wrote:
Codec.Encryption specifies what the module is for, not how it is used. Presumably there are going to be some type classes somewhere which specify whether it is a BinarySymmetricCipher like TripleDES or TextSymmetricCipher like FourWheelNavalEnigma.
My bad. DES is not even a codec. A cipher is an algorithm for turning an n-bit block into another n-bit block (and back again) using an m-bit key. It does not cover encoding messages longer or shorter than an n-bit block. At best it's a "codec" for a certain fixed-size binary object where the size depends, in general, on the algorithm. This table may help clarify the thinking:

                | Algorithm     | Codec         | Format
----------------+---------------+---------------+----------------
Cipher          | DES           | CBC-DES       | SSL
MAC             | SHA-1         | HMAC-SHA-96   | SNMPv3
Text compress   | LZ77          | Deflate       | GZip
Image compress  | DCT           | JPEG          | JFIF

My copy of OpenSSL, for example, supports at least nine "codecs" based on single-strength DES.

Cheers, Andrew Bromage
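The split between "algorithm" and "codec" can be sketched in code. Below, a toy block cipher (a stand-in for DES, with no security value) only knows how to transform one fixed-size block, while a CBC mode with PKCS#5-style padding turns it into something that handles messages of any length. All names (toyEncryptBlock, cbcEncrypt and so on) are made up for illustration, and the key and IV are assumed to be exactly blockSize bytes long.

```haskell
import Data.Bits (xor)
import Data.Word (Word8)

type Block = [Word8]  -- assumed to hold exactly blockSize bytes

blockSize :: Int
blockSize = 8

-- The "algorithm": maps one block to one block under a key.
-- Toy stand-in for DES: XOR with the key, then rotate left by one byte.
toyEncryptBlock :: Block -> Block -> Block
toyEncryptBlock key blk = rotl (zipWith xor key blk)
  where rotl []     = []
        rotl (x:xs) = xs ++ [x]

toyDecryptBlock :: Block -> Block -> Block
toyDecryptBlock key blk = zipWith xor key (rotr blk)
  where rotr [] = []
        rotr xs = last xs : init xs

-- PKCS#5-style padding: append n bytes of value n, with 1 <= n <= blockSize.
pad :: [Word8] -> [Word8]
pad xs = xs ++ replicate n (fromIntegral n)
  where n = blockSize - (length xs `mod` blockSize)

-- Assumes the input is well-formed padded data.
unpad :: [Word8] -> [Word8]
unpad [] = []
unpad xs = take (length xs - fromIntegral (last xs)) xs

chunks :: [Word8] -> [Block]
chunks [] = []
chunks xs = let (b, rest) = splitAt blockSize xs in b : chunks rest

-- The "codec": CBC mode chains blocks so that arbitrary-length
-- messages can be encoded with the fixed-size algorithm.
cbcEncrypt :: Block -> Block -> [Word8] -> [Word8]
cbcEncrypt key iv = concat . go iv . chunks . pad
  where
    go _    []     = []
    go prev (b:bs) = let c = toyEncryptBlock key (zipWith xor prev b)
                     in c : go c bs

cbcDecrypt :: Block -> Block -> [Word8] -> [Word8]
cbcDecrypt key iv = unpad . concat . go iv . chunks
  where
    go _    []     = []
    go prev (c:cs) = zipWith xor prev (toyDecryptBlock key c) : go c cs
```

Swapping in a different mode would give a different "codec" over the same algorithm, which is the situation described for OpenSSL's several DES variants.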

Andrew J Bromage wrote:
This table may help clarify the thinking:
                | Algorithm     | Codec         | Format
----------------+---------------+---------------+----------------
Cipher          | DES           | CBC-DES       | SSL
MAC             | SHA-1         | HMAC-SHA-96   | SNMPv3
Text compress   | LZ77          | Deflate       | GZip
Image compress  | DCT           | JPEG          | JFIF
These algorithms are all normative. All compliant encoders and decoders produce the same output for the same input (they are true functions with inverses). How would you handle something like the MPEG audio and visual specs, which only specify the bitstream and not the actual algorithm? For example, two MPEG encoders with the same set of parameters can produce two totally different, yet compliant, bitstreams. Decoders, on the other hand, basically produce the same output for the same input (there is allowance for different roundoff/truncation strategies). MPEG is also complicated by the various layers, so there are multiple formats. For example, an MPEG program (a group of audio and video streams) can be stuffed into a program stream (DVDs do this) or a transport stream (digital TV via ATSC or DVB). AVI and QuickTime formats also have a similar issue, where they really specify a file format, and you can use a variety of codecs in them. This may not be a problem, but it is something to consider.

--
Matthew Donadio (m.p.donadio@ieee.org)
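The "bitstream is specified, encoder is not" situation can be illustrated with something far simpler than MPEG. In this sketch a run-length "bitstream" has one decoder but two different encoders, both of which produce streams the decoder accepts. The names (Bitstream, decodeRuns, encodeGreedy, encodeNaive) are hypothetical, and run-length coding is only a stand-in chosen to keep the example short.

```haskell
import Data.List (group)

-- The "bitstream" format: a list of (run length, symbol) pairs.
type Bitstream = [(Int, Char)]

-- The one normative piece: how a compliant stream is decoded.
decodeRuns :: Bitstream -> String
decodeRuns = concatMap (\(n, c) -> replicate n c)

-- Encoder A: merges each run of equal symbols into a single pair.
encodeGreedy :: String -> Bitstream
encodeGreedy = map (\g -> (length g, head g)) . group

-- Encoder B: emits every symbol as its own run of length 1.
encodeNaive :: String -> Bitstream
encodeNaive = map (\c -> (1, c))
```

For an input with repeated symbols the two encoders produce different streams, yet decodeRuns recovers the same string from both. An interface for such codecs would therefore have a single decode function but could not promise a unique encode.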

G'day all. On Wed, Apr 23, 2003 at 09:26:30PM -0400, Matthew Donadio wrote:
AVI and QuickTime formats also a have a similar issue, where they really specify a file format, and you can use a variety of codecs in them.
In these cases, each codec will have its own interface to plug into QuickTime. It's the same with a file format (e.g. OpenPGP) into which you can "plug in" ciphers, hashes, string-to-key algorithms etc. One issue with Deflate and Jpeg is that the "codecs" can be used in different formats in different ways (e.g. Jpeg is used in JFIF, PDF and TIFF). This is one reason why I think that generic versions of "codecs" are probably a bad idea. Far better to group by the general area which they serve (e.g. Crypto, Image, Text etc). Cheers, Andrew Bromage

Andrew J Bromage wrote:
A cipher is an algorithm for turning an n-bit block into another n-bit block (and back again) using an m-bit key. It does not cover encoding messages longer or shorter than an n-bit block. At best it's a "codec" for a certain fixed-size binary object where the size depends, in general, on the algorithm.
This table may help clarify the thinking:
                | Algorithm     | Codec         | Format
----------------+---------------+---------------+----------------
Cipher          | DES           | CBC-DES       | SSL
MAC             | SHA-1         | HMAC-SHA-96   | SNMPv3
Text compress   | LZ77          | Deflate       | GZip
Image compress  | DCT           | JPEG          | JFIF
This is great. I would just change one thing: in my opinion, SSL and SNMPv3 are not formats, but protocols. HTML is a format, HTTP is a protocol. The distinction should be that a format is a way of structuring data that is designed for storage, whereas a protocol is used to transfer data where there is a sender and a receiver, or a peer relationship in which multiple entities interact. Protocols are interactive, where formats are passive. An example of a crypto format might be the OpenPGP file format. An example of a MAC format (or at least, one involving a secure hash) might be the Unix /etc/shadow file.

Thanks, Matt Harden

G'day all. On Thu, Apr 24, 2003 at 11:22:12PM -0500, Matt Harden wrote:
This is great. I would just change one thing: in my opinion, SSL and SNMPv3 are not formats, but protocols.
Good point. I wanted a word which means "physical representation", and "format" seemed as good as any. The distinction between "protocol" and "format" is particularly important in ASN.1 land, where protocol (e.g. LDAP, Z39.50) and format (e.g. BER, XER etc) are completely independent. Cheers, Andrew Bromage

Matt Harden
I would agree with Codec.Binary.Base64.
The epithet "Binary" doesn't add any information. Almost all of the possible image and video codecs are binary-based, but that fact is really rather irrelevant to their purpose. It is also irrelevant to the purpose of Base64. I maintain that the primary purpose of Base64 is to encode something as text.
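The naming question is easier to see from the type of the function. Here is a minimal Base64 encoder sketch using the standard alphabet; its input is raw bytes and its output is text, which is the property both proposed names (Codec.Text.Base64 and Codec.Binary.Base64) are trying to capture from opposite ends. The function names (encodeBase64, sextets) are hypothetical and not part of any existing module.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- The standard Base64 alphabet, indexed by 6-bit value.
alphabet :: String
alphabet = ['A'..'Z'] ++ ['a'..'z'] ++ ['0'..'9'] ++ "+/"

-- Binary in, 7-bit ASCII text out.
encodeBase64 :: [Word8] -> String
encodeBase64 []           = []
encodeBase64 (a:b:c:rest) = sextets a b c ++ encodeBase64 rest
encodeBase64 [a, b]       = take 3 (sextets a b 0) ++ "="
encodeBase64 [a]          = take 2 (sextets a 0 0) ++ "=="

-- Pack three bytes into 24 bits and read them back as four 6-bit indices.
sextets :: Word8 -> Word8 -> Word8 -> String
sextets a b c = map (alphabet !!) [ix 18, ix 12, ix 6, ix 0]
  where
    n :: Int
    n = (fromIntegral a `shiftL` 16) .|. (fromIntegral b `shiftL` 8) .|. fromIntegral c
    ix s = (n `shiftR` s) .&. 63
```

For example, the three bytes of the ASCII string "Man" encode to "TWFu".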
I would prefer Codec.General for any truly general-purpose codecs,
Death to "Misc"! Having a category named something like General or Miscellaneous or Utils, simply conveys no useful information.
The zip file format is a different story; I would suggest FileFormat.Archive.Zip for that.
Surely Codec.Archive.Zip? Again, a zip, tar, or cab archive is not necessarily stored in a file.
I'm not sure public key vs. symmetric deserves an additional level in the hierarchy. Each encryption codec is inherently either sym or PK, and should implement the appropriate interface for its class. I think I'd like Codec.Encryption.PublicKey to be the name of the public key class, which each PK alg. would implement.
I agree with this. Regards, Malcolm

On Tue, 2003-04-22 at 12:22, Simon Marlow wrote:
Question 1: there's a comment next to 'FileFormat' which mentions that 'Codec' might be a more accurate name. It seems to me that 'Codec' would indeed be a better choice: many of the libraries under FileFormat are more like pure stream-transformers than file formats, and Codec encompasses both. What do people think about changing this? (there aren't any other implementations of libraries under FileFormat that I'm aware of).
If FileFormat became Codec, then the existing FileFormat.Encoding looks a bit odd. But moving FileFormat.Encoding.Base64 up to Codec.Base64 (similarly for FileFormat.Encoding.Yenc) would seem to make sense.
Question 2: what should the insides of the FileFormat.Encryption (or Codec.Encryption) hierarchy look like?
Having worked a bit with Unicode, I associate "codec" with Unicode encodings. Are there any plans for modules for working with Unicode codecs? I guess Chars are meant to be Unicode characters, but I want functions for converting between Strings and UTF-8, UTF-16, Latin1 (lossy, there are translation tables iirc), etc., depending on what the user's locale says. I guess e.g. gettext takes care of output, but input is more interesting.

/Martin
--
Martin Sjögren sjogren@debian.org -- marvin@dum.chalmers.se
GPG key: http://www.mdstud.chalmers.se/~md9ms/gpg.html
let hello = "hello" : hello in putStr (unlines hello)
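As a rough idea of what such conversion functions could look like, here is a sketch of a UTF-8 encoder and a lossy Latin1 encoder over String. The names (encodeUtf8, encodeLatin1) are hypothetical, locale handling is left out entirely, and surrogate code points are not rejected, so this is an illustration of the types involved rather than a complete codec.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode each Unicode code point as one to four UTF-8 bytes.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap (encodeChar . ord)

encodeChar :: Int -> [Word8]
encodeChar c
  | c < 0x80    = [fromIntegral c]
  | c < 0x800   = [0xC0 .|. hi 6,  cont 0]
  | c < 0x10000 = [0xE0 .|. hi 12, cont 6,  cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    -- Leading bits of the code point, placed in the first byte.
    hi n   = fromIntegral (c `shiftR` n)
    -- Continuation byte: 10xxxxxx carrying six bits of the code point.
    cont n = 0x80 .|. fromIntegral ((c `shiftR` n) .&. 0x3F)

-- Lossy Latin1 encoder: anything above U+00FF becomes '?'.
encodeLatin1 :: String -> [Word8]
encodeLatin1 = map (toByte . ord)
  where
    toByte c
      | c <= 0xFF = fromIntegral c
      | otherwise = fromIntegral (ord '?')
```

Decoders in the other direction would need some way to report malformed input, which is one reason input is the more interesting half.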
participants (8)
- Andrew J Bromage
- David Roundy
- John Meacham
- Malcolm Wallace
- Martin Sjögren
- Matt Harden
- Matthew Donadio
- Simon Marlow