
OK, so I just spent an entire day trying to write some code, and after hours of struggling I have something that's semi-working but very ugly and rather unreliable. My usual guideline for Haskell programming is "if it seems hard, you're doing it wrong"... After many hours of effort, I came up with these: data Writer x instance Monad Writer run_writer :: Writer () -> ByteString write1 :: Bool -> Writer () write8 :: Word8 -> Writer () write16 :: Word16 -> Writer () write32 :: Word32 -> Writer () write64 :: Word64 -> Writer () writeN :: Word8 -> Word32 -> Writer () data Reader x instance Monad Reader run_reader :: Reader x -> ByteString -> x is_EOF :: Reader Bool read1 :: Reader Bool read8 :: Reader Word8 read16 :: Reader Word16 read32 :: Reader Word32 read64 :: Reader Word64 readN :: Word8 -> Reader Word32 Notice that, unlike the Binary package, all of these are bit-aligned rather than byte-aligned. This is what took so much effort. I strived to make each operation as efficient as possible - by performing as few steps as possible. It would have been *far* easier to, say, implement read8 by calling read1 eight times and then packing the bits back into a byte. But that wouldn't be very fast. So actually read8 reads a pair of bytes, bit-shifts them as required, and then ANDs them together. And so on. (Note that the Writer monad uses Data.Binary.Builder (only) from the Binary package. I assume this is more efficient...) I even sat down and wrote a battery of QuickCheck properties because - as you can imagine - there's a hell of a lot to go wrong here! After many hours, I managed to get it to pass all tests... Next I decided to write an LZW compressor. Here's what I came up with: data EncodeLZW symbol eLZW_start :: (Ord symbol) => EncodeLZW symbol eLZW_encode :: (Ord symbol) => EncodeLZW symbol -> symbol -> Writer (EncodeLZW symbol) eLZW_end :: (Ord symbol) => EncodeLZW symbol -> Writer () data DecodeLZW symbol dLZW_start :: DecodeLZW symbol dLZW_decode :: DecodeLZW symbol -> Reader (DecodeLZW symbol, symbol) So each step of the encoder (possibly) writes some bits, and also returns a new encoder state. The decoder is similar. (At least one program bug was caused by the wrong version of the state being used somewhere!) Then I tried to run the code, and oh look, there's a bug somewhere. Emphasis "somewhere". At this point I have several KB of code and no idea how in hell to test it. What I *want* to do is single-step through the encoder algorithm, watch it write bits out and update state, and then single-step through the decoder watching it read bits in and update its state, etc. But I can't. All the encoder does is return a giant Writer object which you can't look inside, only run. And likewise for the decoder. I could try GHC's new debugger. But my experiences with it so far have shown that for all but the most trivial programs possible, it becomes intractably difficult to figure out what the debugger is actually showing you. I actually tried to debug a "normal" LZW implementation once - one that didn't involve two highly convoluted custom monads and a stateful execution model with manually threaded state. This is *not* my idea of a fun time... In a "normal" programming language, at this point you would sprinkle a few print statements around so you can see the intermediate steps the program is taking. But I can't. I'm in the wrong monad. Curses! In the end, the only solution I could come up with was to design a second, "hacked" version of the two monads. Instead of importing one module, you import another that provides the same interface. However, now Reader and Writer are aliases to IO. All write requests cause pretty-printed binary to be sent to stdout, and all read requests pop up a prompt for input from stdin. It worked reasonably well in that I could now add the vitally necessary print statements, and see intermediate values and so forth... It wasn't very pretty though. At this point I decided that lugging 5 individual functions around was silly, and I should write a class: class Encoder e where encoder_start :: e x encode :: e x -> x -> Writer (e x) encoder_end :: e x -> Writer () class Decoder d where decoder_start :: d x decode :: d x -> Reader (d x, x) 10 points to the first person to spot the fatal flaw in this plan... Yep, that's right. The type system won't accept this. The reason? x needs an Ord context. If you turn on multi-parameter type classes *and* flexible instances, apparently this works: class Encoder e x where... instance (Ord x) => Encoder EncodeLZW x where... (I haven't tried but I *presume* it means what I think it does...) Alternatively, you can avoid using dangerous type system extensions and just write class Encoder e where encode :: (Symbol x) => e x -> x -> Writer (e x) ... and make Symbol encompass everything any possible encoder could want. (In reality, x is almost always Word8 or something. But I dislike loss of generality just for the hell of it...) So at least I got that part fixed. Heh. But now I find myself worrying about yet *another* problem: is Writer lazy enough? I mean, the idea is that you can write a program that lazily reads from a file, pushes it through a Writer, and lazily writes the result back to another file. The thing should chug along reasonably fast and use a constant amount of RAM. But all this is only true of Writer is actually lazy enough. I have a horrible feeling that all the complicated Origami inside makes it too strict. And I have no way to check! Actually, thinking about it, is Reader lazy enough? You call run_reader and it hands over your data, but if that data is, say, a list, will run_reader build the entire damn list before it'll hand it over? Or will the monadic code by called as-needed to generate the list elements? Obviously I desire the latter, but I'm really not sure what the actual behaviour is... In summary, I've spent several days coding, and I still don't have a single algorithm working. I've spent all my time wrestling with the mundane details of infrastructure rather than the interesting algorithms I actually set out to implement. This makes me very sad. :-( If anybody can think of a better set of abstractions or another way to do debugging, I'd be interested to hear about it. (This would all be so trivial in an OO language. Just make an Encoder object that updates its own state internally and talks to a Source object and a Destination object for its data...)