
...and again today I found myself trying to do something that would be very easy in an imperative language, but I cannot think of a single good way of doing it in Haskell. Hopfully somebody can give me some hints. I'm writing a whole bunch of data compression programs. In particular, one of them was a Huffman encoder. The encoder scans the input file, computes symbol frequencies, and builds a canonical Huffman table, which it then dumps to file. The actual compressed data goes after that. The encoder consists of two functions. One takes the Huffman table and serializes it. The other does the actual encoding. The (++) operator joins the two together before going to file. Getting the data back is mildly more tricky. There's a decoder function that takes the complete data stream, parses out the Huffman table, and returns a pair containing the Huffman table and the remaining data. Then a second decoder function actually does the Huffman decoding on this data. So far, it's all pretty easy. The size of the Huffman table determins how many bytes the decoder needs to read, so it's easy to figure out where the table ends. And then I had a brainwave. I altered the encoder so that the output from the Huffman table serialize function codes through a simple RLE encoder before going to file. Then a went and altered the decoder to... uh... oops. Now I have a problem. It's easy enough to pass the entire data stream through an RLE decoder and feed that to the Huffman table deserialize function, and it will give be back the table. But I now have *no clue* where the table ends in the original stream! If this was Java, you would write all these compression and decompression stages as stream wrappers. So you wrap the raw input stream with an RLE decoder, have the function read the Huffman table, and then take off the RLE decoder and process the rest of the stream. However, in Haskell, the reading and deserializing of the Huffman table doesn't so anything detectable to the original data stream, so you can't tell how much of it the RLE decoder actually processed. Um... help! (What I did in the end was to write the size of the RLE encoded Huffman table to the file and read it back in the decoder. But this shouldn't actually be necessary...)