
On Tue, Nov 18, 2008 at 10:02:20AM -0800, Michael Snoyman wrote:
I am trying to write some code to read flat files from a mainframe system. This includes some character fields. This is a fixed width file, so each field will have a consistent length between records, but there are fields of different length within a record. For example, I might have a "name" field length 20 and an eye color field length 5.
I am trying to use the binary library to read in this file. I've written a binary type, MFChar2, for reading in a 2-length character field. It is defined as such (you can safely ignore the ebcdicToAscii piece, it is just doing character conversion):
data MFChar2 = MFChar2 [Word8] instance Binary MFChar2 where put = undefined get = do ebcdic <- replicateM 2 getWord8 return $ MFChar2 $ map ebcdicToAscii ebcdic
What I would like to do is have some kind of generic "MFChar" data type which could take any character length, but I can't figure out how to do it. Any help would be appreciated.
Hm, interesting! The problem is that 'get' does not take any arguments, so must determine what to do from the type at which it is called. So the number of words to be read needs to be in the type. We can't put actual Int values in a type -- but there is actually a way to do what you want, by encoding natural numbers at the type level! I don't know whether this really belongs on a 'beginners' list but I couldn't resist. =) data Z -- the type representing zero data S n -- the type representing the successor of another natural -- for example, Z, S Z, and S (S Z) are types representing -- zero, one, and two. -- the n is for a type-level natural representing the length of the list. data MFChar n = MFChar [Word8] -- add a Word8 to the beginning of an MFChar, resulting in an MFChar -- one word longer mfCons :: Word8 -> MFChar n -> MFChar (S n) mfCons w (MFChar ws) = MFChar (w:ws) instance Binary (MFChar Z) where get = return $ MFChar [] instance (Binary (MFChar n)) => Binary (MFChar (S n)) where get = do ebcdic <- getWord8 rest <- get -- the correct type of get is -- inferred due to the use of mfCons below return $ mfCons (ebcdicToAscii ebcdic) rest Now if you wanted to read a field with 20 chars, you can use get :: Get (MFChar (S (S (S ... 20 S's ... Z)))) Ugly, I know. You could make it slightly more bearable by defining some type synonyms at the top of your program like type Five = S (S (S (S (S Z)))) type Ten = S (S (S (S (S Five)))) and so on. Then you can just say get :: Get (MFChar Ten) or whatever. This is untested but it (or something close to it) ought to work. Of course, you may well ask yourself whether this contortion is really worth it. Maybe it is, maybe it isn't, but I can't think of a better way to do it in Haskell. In a dependently typed language such as Agda, we could just put regular old natural numbers in the types, instead of going through contortions to encode natural numbers as types as we have to do here. So I guess the real answer to your question is "use a dependently typed language". =) If you have problems getting this to work or more questions, feel free to ask! -Brent