ANN: MissingH 0.7.2 with GZip

Hello,

I'm pleased to announce the release of MissingH 0.7.2, available from http://quux.org/devel/missingh.

This release incorporates Ian Lynagh's pure-Haskell Inflate algorithm, along with CRC-32 and GZip file parsers of my own design, to make a pure-Haskell solution[1] for decompressing .gz files. At present it is rather slow, as little effort has gone into optimizing any of the three components. Patches to address the speed will be happily applied.

The Inflate and CRC-32 algorithms also provide the support necessary to handle ZIP files. I plan to introduce ZIP and tar file support in upcoming releases of MissingH.

Also, MissingH 0.7.0 was not announced on the library list. It introduced the ConfigParser[2] module, which can parse the flat or hierarchical configuration files made popular by Python's ConfigParser module. The files are simple for both humans and programmers <grin>. If care is taken, it is possible to craft files that are valid both to ConfigParser and to other systems such as the Unix shell or Make. ConfigParser can also completely parse various other formats "out of the box", including SMTP headers, Cabal Setup.description files, and, to a limited extent, /etc/passwd files.

All code in MissingH is pure Haskell. No C is required.

[1] http://quux.org/devel/missingh/html/MissingH.FileArchive.GZip.html
[2] http://quux.org/devel/missingh/html/MissingH.ConfigParser.html

--
John
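The following is a rough illustration of the flat file format described above. It is not MissingH's ConfigParser API, which is documented at [2]. The sketch is a minimal reader for Python-style text with three kinds of lines: `[section]` headers, `key = value` or `key: value` options, and `#`/`;` comment lines.

The names `parseConfig`, `splitOption`, `strip`, and `Section` are hypothetical. Collecting options that appear before any header into a `DEFAULT` bucket is also an assumption of this sketch. Continuation lines, interpolation, and merging of repeated sections are not handled.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, foldl')

-- A section name paired with its options, in file order.
type Section = (String, [(String, String)])

-- Minimal reader for flat, Python-style config text (a sketch, not
-- MissingH's API). Options that appear before any [section] header are
-- collected under "DEFAULT", an assumption of this sketch.
parseConfig :: String -> [Section]
parseConfig = reverse . map finish . foldl' step [("DEFAULT", [])] . lines
  where
    step acc raw =
      case strip raw of
        ""        -> acc                        -- blank line
        ('#' : _) -> acc                        -- comment
        (';' : _) -> acc                        -- comment
        ('[' : rest)
          | not (null rest), last rest == ']' -> (init rest, []) : acc
        line      -> addOption (splitOption line) acc
    -- Prepend to the current (most recent) section; reversed in 'finish'.
    addOption kv ((name, opts) : more) = (name, kv : opts) : more
    addOption _ [] = []                         -- unreachable: acc is never empty
    finish (name, opts) = (name, reverse opts)

-- Split "key = value" or "key: value" at the first '=' or ':'.
-- A line with neither becomes (line, "").
splitOption :: String -> (String, String)
splitOption line = (strip key, strip (drop 1 rest))
  where (key, rest) = break (`elem` "=:") line

-- Remove leading and trailing whitespace.
strip :: String -> String
strip = dropWhileEnd isSpace . dropWhile isSpace
```

A line such as `Subject: hello` splits on its first `:`. So this sketch happens to read simple `Header: value` lines as well as config options.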

On Sat, Dec 04, 2004 at 05:42:30PM -0600, John Goerzen wrote:
> Hello,
> I'm pleased to announce the release of MissingH 0.7.2, available from http://quux.org/devel/missingh.
> This release incorporates Ian Lynagh's pure-Haskell Inflate algorithm, and CRC-32 and GZip file parsers of my own design, to make a pure-Haskell solution[1] to decompressing .gz files. At present, it is rather slow, as there has been little effort to optimize any of the three components. Patches to address the speed will be happily applied. The Inflate and CRC32 algorithms also provide the necessary support to be able to handle ZIP files. I plan to introduce ZIP and tar file support in upcoming releases of MissingH.
The FileArchive.GZip module is interesting, but I'm not sure I care for the interface. Wouldn't it be better for

    decompress :: String -> (String, Maybe GZipError)

to be

    decompress :: [Word8] -> Either GZipError [Word8]

And I'm not sure how hDecompress works. Could we use this with a pipe to read a gzipped file, or does it have to be able to synchronously write the entire file to the output handle? What I'd really like would be

    openGZippedFile :: IOMode -> FilePath -> IO Handle

And of course, speed is an issue, so an optional zlib backend would be double-nice (or a really well-optimized decompressor). For now, I guess I'll stick with my tried and true (and butt ugly) zlib FFI interface, which uses threads and pipes to implement a gzOpenFile.

Actually, I see that compression is not implemented either...

--
David Roundy
http://www.darcs.net
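To make the `Either` style David suggests concrete, here is a sketch of just the header step, where an error can be detected before any data is produced. It follows the member header layout from RFC 1952: the magic bytes 0x1f 0x8b, compression method 8 for Deflate, and then a flag byte that selects optional fields. The names `skipGZipHeader`, `whenFlag`, `dropExactly`, and the `GZipError` constructors are hypothetical and are not MissingH's actual types.

```haskell
import Data.Bits (testBit)
import Data.Word (Word8)

-- Hypothetical error type for this sketch only.
data GZipError = NotGZip | UnsupportedMethod Word8 | TruncatedHeader
  deriving (Show, Eq)

-- Validate and skip a gzip member header (RFC 1952, section 2.3),
-- returning the remaining bytes, which start with the Deflate data.
skipGZipHeader :: [Word8] -> Either GZipError [Word8]
skipGZipHeader (0x1f : 0x8b : rest) =
  case rest of
    (cm : flg : more)
      | cm /= 8   -> Left (UnsupportedMethod cm)   -- 8 = Deflate
      | otherwise -> do
          afterFixed <- dropExactly 6 more         -- MTIME (4), XFL, OS
          afterExtra <- whenFlag flg 2 skipExtra afterFixed          -- FEXTRA
          afterName  <- whenFlag flg 3 skipZeroTerminated afterExtra -- FNAME
          afterCmt   <- whenFlag flg 4 skipZeroTerminated afterName  -- FCOMMENT
          whenFlag flg 1 (dropExactly 2) afterCmt                    -- FHCRC
    _ -> Left TruncatedHeader
skipGZipHeader _ = Left NotGZip

-- Apply a skipping step only if the given flag bit is set.
whenFlag :: Word8 -> Int -> ([Word8] -> Either GZipError [Word8])
         -> [Word8] -> Either GZipError [Word8]
whenFlag flg bit skip xs
  | testBit flg bit = skip xs
  | otherwise       = Right xs

-- Drop exactly n bytes, failing if the input is shorter than that.
dropExactly :: Int -> [Word8] -> Either GZipError [Word8]
dropExactly n xs
  | length prefix == n = Right suffix
  | otherwise          = Left TruncatedHeader
  where (prefix, suffix) = splitAt n xs

-- FEXTRA: a 2-byte little-endian length, then that many bytes.
skipExtra :: [Word8] -> Either GZipError [Word8]
skipExtra (lo : hi : xs) = dropExactly (fromIntegral lo + 256 * fromIntegral hi) xs
skipExtra _              = Left TruncatedHeader

-- FNAME and FCOMMENT: zero-terminated byte strings.
skipZeroTerminated :: [Word8] -> Either GZipError [Word8]
skipZeroTerminated xs = case break (== 0) xs of
  (_, _ : after) -> Right after
  _              -> Left TruncatedHeader
```

The same shape could not report a CRC-32 mismatch without first traversing all of the inflated data. That is the trade-off discussed in the reply below.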

On 2004-12-05, David Roundy wrote:
> On Sat, Dec 04, 2004 at 05:42:30PM -0600, John Goerzen wrote:
> The FileArchive.GZip module is interesting, but I'm not sure I care for the interface. Wouldn't it be better for
>     decompress :: String -> (String, Maybe GZipError)
> to be
>     decompress :: [Word8] -> Either GZipError [Word8]
The reason it doesn't use Either GZipError String is this: there are two places where a problem could occur. The first is while reading the header (the input is not a GZip file, etc.). The second is when the CRC-32 of the extracted data does not match the stored CRC-32 value. To validate this part, we must first inflate all the data, calculating the CRC-32 along the way. If the return type were an Either, the system would be forced to buffer the entire file in memory and check the CRC-32 before returning anything. In other words, the laziness would be lost.

As for [Word8] vs. String, I'm not sure what the advantages of using Word8 instead of Char are. I presume a slight performance benefit? String seemed handy, given hGetContents and hPutStr.
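The laziness trade-off John describes can be seen in a small sketch. It assumes the inflated payload is already available as a lazy `String` and that the stored CRC-32 has been read from the trailer. `checkLazy`, `checkStrict`, and this one-constructor `GZipError` are hypothetical. `crc32` is a plain bitwise CRC-32 using the reflected polynomial 0xEDB88320 (the one gzip uses), written for clarity rather than speed.

```haskell
import Data.Bits (complement, shiftR, xor, (.&.))
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word32)

-- Hypothetical error type for this sketch only.
data GZipError = CRCMismatch
  deriving (Show, Eq)

-- Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by gzip).
-- Assumes each Char holds a single byte (code point < 256).
crc32 :: String -> Word32
crc32 = complement . foldl' step 0xFFFFFFFF
  where
    step crc c = iterate shift1 (crc `xor` fromIntegral (ord c)) !! 8
    shift1 x
      | x .&. 1 == 1 = (x `shiftR` 1) `xor` 0xEDB88320
      | otherwise    = x `shiftR` 1

-- Tuple style: the payload is returned immediately and lazily; the CRC
-- is only computed when someone inspects the second component.
checkLazy :: Word32 -> String -> (String, Maybe GZipError)
checkLazy stored payload = (payload, verdict)
  where
    verdict
      | crc32 payload == stored = Nothing
      | otherwise               = Just CRCMismatch

-- Either style: choosing between Left and Right requires the CRC, so no
-- output is available until the whole payload has been traversed.
checkStrict :: Word32 -> String -> Either GZipError String
checkStrict stored payload
  | crc32 payload == stored = Right payload
  | otherwise               = Left CRCMismatch
```

For example, `take 5 (fst (checkLazy 0 (cycle "abc")))` yields `"abcab"` without ever computing a CRC. Pattern-matching on `checkStrict 0 (cycle "abc")`, by contrast, never returns.

This sketch still keeps the payload reachable until the verdict is forced. Computing the CRC incrementally while inflating, as described above, would avoid that.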
> And I'm not sure how hDecompress works. Could we use this with a pipe to read a gzipped file, or does it have to be able to synchronously write the entire file to the output handle?
Yes, you could use it with a pipe.
> What I'd really like would be
>     openGZippedFile :: IOMode -> FilePath -> IO Handle
Yes, me too. From what I can tell, though, that is impossible, or at least impossible to do in Haskell.
> And of course, speed is an issue, so an optional zlib backend would be double-nice (or a really well-optimized decompressor). For now, I guess I'll stick with my tried and true (and butt ugly) zlib ffi interface, which uses threads and pipes to implement a gzOpenFile.
Ahh. I suppose one could do that, but that sounds nasty. But you're right, I would like to see good Haskell bindings for both zlib and the bzip2 library.
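For reference, the threads-and-pipes technique David mentions can be sketched in a few lines. A forked thread writes transformed file contents into an OS pipe, and the caller receives the pipe's read end as an ordinary `Handle`.

Some assumptions apply:
- It relies on `createPipe` from the `process` package, available since process 1.2.1.
- It supports read mode only.
- `openTransformedFile` is a hypothetical name.

A decompressor with the `String -> (String, Maybe GZipError)` type could be plugged in as `fst . decompress`, at the cost of discarding the error report.

```haskell
import Control.Concurrent (forkIO)
import Control.Exception (finally)
import System.IO
import System.Process (createPipe)

-- Hypothetical helper: expose the lazily transformed contents of a file
-- as a readable Handle. A background thread writes the transformed bytes
-- into the write end of an OS pipe; the caller gets the read end.
openTransformedFile :: (String -> String) -> FilePath -> IO Handle
openTransformedFile transform path = do
  (readEnd, writeEnd) <- createPipe
  -- Binary mode: each Char is one byte, matching a String-of-bytes API.
  hSetBinaryMode readEnd True
  hSetBinaryMode writeEnd True
  let copyTransformed = do
        input <- openBinaryFile path ReadMode
        contents <- hGetContents input        -- lazy read
        hPutStr writeEnd (transform contents)
  -- Always close the write end so the reader eventually sees EOF.
  _ <- forkIO (copyTransformed `finally` hClose writeEnd)
  return readEnd
```

If the worker thread fails, `finally` still closes the write end, so the reader sees a premature end of file rather than hanging. The exception itself is only printed to stderr by the forked thread, so a real `openGZippedFile` would need some other way to report CRC errors.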
> Actually, I see that compression is not implemented either...
I haven't found any Deflate implementation for Haskell yet.

--
John
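As a starting point for the missing compression side, note that RFC 1951 allows "stored" blocks (BTYPE = 00) that carry bytes verbatim. The hypothetical `deflateStored` below emits only such blocks. Its output is therefore a valid Deflate stream that any inflater should accept, but it performs no compression at all. A real Deflate encoder would add LZ77 matching and Huffman-coded blocks.

```haskell
import Data.Bits (complement, shiftR, (.&.))
import Data.Word (Word16, Word8)

-- Encode bytes as a Deflate stream (RFC 1951) made only of stored
-- blocks: valid, but with no actual compression.
deflateStored :: [Word8] -> [Word8]
deflateStored xs =
  header : le16 len ++ le16 (complement len) ++ chunk ++ more
  where
    (chunk, rest) = splitAt 65535 xs   -- a stored block holds at most 65535 bytes
    final = null rest
    len = fromIntegral (length chunk) :: Word16
    -- Block header: BFINAL in bit 0, BTYPE = 00 in bits 1-2. The other
    -- five bits are padding, because stored data starts byte-aligned.
    header = if final then 0x01 else 0x00
    more = if final then [] else deflateStored rest

-- 16-bit little-endian encoding, used for LEN and NLEN.
le16 :: Word16 -> [Word8]
le16 w = [fromIntegral (w .&. 0xff), fromIntegral (w `shiftR` 8)]
```

Wrapping this output in a .gz file would additionally require the 10-byte gzip header, plus a trailer holding the CRC-32 and the input length (RFC 1952).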