
All,

I'm pleased to announce a major new release of the tar package for handling ".tar" archive files.

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/tar

This release has a completely new and much improved API. See the Hackage page for the API documentation.

There are high-level "all in one" functions for extracting or creating .tar files. More interestingly, it is easy to make variants by composing a pipeline. For example, extracting a ".tar.gz" file is just:
Tar.unpack dir . Tar.read . GZip.decompress =<< BS.readFile tar
Or creating a ".tar.bz2" file:
BS.writeFile tar . BZip.compress . Tar.write =<< Tar.pack base dir
In addition, it provides a full API for inspecting and constructing tar files without having to pack or unpack to local files. The functions are lazy, which allows large archives to be processed in constant space.

It is based on the tar-handling code that has been in use in the cabal-install program for the last year or so. It has been tested on a large number of real-world .tar.gz files, so compatibility should be pretty good.

There is also an 'htar' tool, which is essentially a demo program for the tar library. It implements the common subset of the command-line interface of the standard tar program, including gzip and bzip2 compression.

Thanks to Christian Maeder for feedback on pre-release versions.

Duncan
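To make the constant-space point concrete, here is a minimal self-contained sketch that does not use the tar package at all. `Item`, `EntryStream`, `totalSize`, `paths` and `synthetic` are hypothetical names for a simplified model of a lazily produced sequence of entries that ends either normally or with a parse error; the package's real types are described in its API documentation. A consumer that walks such a stream with a strict accumulator only needs the current entry in memory.

```haskell
-- Hypothetical, simplified model of a lazy stream of archive entries.
-- This is NOT the tar package's API; it only mimics the general idea of
-- "one entry at a time, ending either normally or with an error".
data Item = Item
  { itemPath :: FilePath
  , itemSize :: Integer
  } deriving (Show, Eq)

data EntryStream
  = More Item EntryStream   -- one entry, then the (lazily computed) rest
  | End                     -- normal end of the archive
  | Failed String           -- parse error part-way through
  deriving Show

-- Sum the sizes of all entries. The accumulator is forced at each step,
-- and each 'More' cell becomes garbage once we move past it (as long as
-- nothing else holds on to the head of the stream), so memory use does
-- not grow with the number of entries.
totalSize :: EntryStream -> Either String Integer
totalSize = go 0
  where
    go acc (More item rest) =
      let acc' = acc + itemSize item in acc' `seq` go acc' rest
    go acc End          = Right acc
    go _   (Failed err) = Left err

-- List entry paths; the result list is itself produced lazily.
paths :: EntryStream -> [FilePath]
paths (More item rest) = itemPath item : paths rest
paths _                = []

-- A lazily generated stream of n synthetic 512-byte entries, standing in
-- for the result of parsing a large archive on demand.
synthetic :: Int -> EntryStream
synthetic n = build 1
  where
    build i
      | i > n     = End
      | otherwise = More (Item ("file" ++ show i) 512) (build (i + 1))
```

Loaded into GHCi, `totalSize (synthetic 1000000)` should not need memory proportional to the number of entries, because nothing retains the start of the stream; binding the stream to a variable that is used again afterwards would keep it all alive.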

Duncan Coutts wrote:
All,
I'm pleased to announce a major new release of the tar package for handling ".tar" archive files.
Very nice!

I'm curious -- what specific variants of the tar format can it read and write?

* PAX?
* GNU tar sparse files?
* POSIX ustar
* various pre-posix archives?
* Solaris tar?
* Binary and text numbers in numeric fields?

-- John

On Mon, 2009-03-02 at 08:20 -0600, John Goerzen wrote:
Duncan Coutts wrote:
All,
I'm pleased to announce a major new release of the tar package for handling ".tar" archive files.
Very nice!
I'm curious -- what specific variants of the tar format can it read and write?
It can read and write the basic Unix V7, POSIX ustar and GNU formats.
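As a rough illustration of how these three formats differ on disk, the sketch below classifies a 512-byte header block by its magic field (offset 257, 6 bytes) and version field (offset 263, 2 bytes): POSIX ustar uses "ustar\0" plus "00", old GNU tar uses "ustar " plus " \0", and V7 headers predate these fields, which are normally zero-filled. `HeaderFormat` and `classifyHeader` are hypothetical names, not the tar package's API, and the sketch assumes the bytestring package that ships with GHC.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical header classification, for illustration only.
data HeaderFormat = V7 | Ustar | Gnu | Unknown
  deriving (Show, Eq)

-- Classify a 512-byte tar header block by its magic and version fields.
classifyHeader :: B.ByteString -> HeaderFormat
classifyHeader hdr
  | B.length hdr /= 512 = Unknown
  | magic == B.pack "ustar\0" && version == B.pack "00" = Ustar  -- POSIX ustar
  | magic == B.pack "ustar " && version == B.pack " \0" = Gnu    -- old GNU tar
  | B.all (== '\0') (B.append magic version)            = V7     -- no magic at all
  | otherwise = Unknown
  where
    -- Extract a fixed-width field at the given byte offset.
    field off len = B.take len (B.drop off hdr)
    magic   = field 257 6
    version = field 263 2
```

Note that an all-zero block also marks the end of an archive, so a real reader checks for that before trying to classify a header.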
* PAX?
PAX is a compatible extension of POSIX ustar. It just standardises some extra tar entry types ('x' and 'g'). These archives can be read and written, but there is no special support for them. You would match on entryContents with the pattern OtherEntryType 'x' paxHeader _ and then parse paxHeader, which is a UTF-8 file containing name/value pairs.
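The PAX header body is a sequence of records of the form `<length> <keyword>=<value>\n`, where `<length>` is the decimal size in bytes of the whole record, including the length digits themselves, the space and the newline. A minimal parser for that body could look like the following. `parsePaxRecords` is a hypothetical helper, not part of the tar package; it works on a strict `Data.ByteString` (if the entry body you have is a lazy ByteString, `Data.ByteString.Lazy.toStrict` in bytestring 0.10 and later converts it) and leaves UTF-8 decoding of keys and values to the caller.

```haskell
import Control.Monad (unless, when)
import qualified Data.ByteString.Char8 as B

-- Hypothetical parser for the body of a PAX extended header ('x' or 'g'
-- entry). Each record is "<len> <key>=<value>\n", where <len> counts every
-- byte of the record, including its own digits, the space and the newline.
-- Keys and values are returned as raw bytes; UTF-8 decoding is left to the
-- caller.
parsePaxRecords :: B.ByteString -> Either String [(B.ByteString, B.ByteString)]
parsePaxRecords input
  | B.null input = Right []
  | otherwise = do
      (len, afterLen) <- maybe (Left "missing record length") Right (B.readInt input)
      -- Number of bytes taken up by the length digits themselves.
      let digits = B.length input - B.length afterLen
      -- A record needs at least the digits, one space and one newline.
      unless (len >= digits + 2 && len <= B.length input) (Left "bad record length")
      let record = B.take len input
          rest   = B.drop len input
      unless (B.index record digits == ' ' && B.last record == '\n') (Left "malformed record")
      -- Strip "<len> " from the front and "\n" from the end, then split
      -- at the first '=' (values may themselves contain '=').
      let body = B.take (len - digits - 2) (B.drop (digits + 1) record)
          (key, eqValue) = B.break (== '=') body
      when (B.null eqValue) (Left "missing '=' in record")
      more <- parsePaxRecords rest
      Right ((key, B.drop 1 eqValue) : more)
```

For example, `parsePaxRecords (B.pack "12 path=foo\n")` evaluates to `Right [("path","foo")]`.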
* GNU tar sparse files?
No support. They'll get matched as OtherEntryType 'S'. However, unlike PAX, the GNU sparse format puts the sparse info directly in the tar header. That part of the header is not parsed, and the library does not give you access to the raw header data, so you cannot parse it yourself either.
* POSIX ustar
This is the standard format the library generates.
* various pre-posix archives?
Yes, at least the basic data from the V7 format. Data in the top half of the header is ignored.
* Solaris tar?
Insofar as it is standard POSIX ustar, yes. Again, it uses some extra entry types, like 'X' for extended info. These are preserved and you can access them, but there is no special support for parsing the body of these entry types.
* Binary and text numbers in numeric fields?
Only text at the moment. Binary ones are currently recognised and rejected with an error saying that binary numbers are not supported. Adding support would not be terribly hard; patches gladly accepted. I've only found one tarball that uses them (generated by the Perl Archive::Tar lib).

My main use case for the library so far is software distribution with .tar.gz files, where portability is important. So I've tested with all the .tar.gz and .tar.bz2 files I could get my hands on (quite a few on a Gentoo system). I've not looked at or tested use cases like backups, where large file support, preserving permissions, sparse files etc. are important. I've tried the star program's tar torture tests, but these should be automated into a testsuite.

I've done no performance tuning except to check that it works in constant space. On a cached 97 MB tarball (glibc) the timings on my machine are:

GNU tar uncompressed:
$ time tar -tf glibc-2.6.1.tar > /dev/null
real 0m0.126s
user 0m0.052s
sys 0m0.068s

Haskell tar uncompressed:
$ time htar -tf glibc-2.6.1.tar > /dev/null
real 0m0.617s
user 0m0.572s
sys 0m0.040s

GNU tar compressed:
$ time tar -tzf glibc-2.6.1.tar.gz > /dev/null
real 0m0.938s
user 0m0.880s
sys 0m0.056s

Haskell tar compressed:
$ time htar -tzf glibc-2.6.1.tar.gz > /dev/null
real 0m1.207s
user 0m1.188s
sys 0m0.016s

So it's slower (roughly 5x on the uncompressed tarball, roughly 30% on the gzipped one) but still perfectly good.

Duncan
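For anyone tempted to write that patch, here is a sketch of decoding a numeric header field such as the 12-byte size field. Traditional fields hold ASCII octal digits, padded with zeros or spaces and terminated by a NUL or space. The GNU extension instead sets the high bit of the first byte and stores a big-endian binary number, with a leading 0x80 byte for non-negative values; this is what allows, for example, file sizes of 8 GiB and above, which do not fit in 11 octal digits. `decodeNumber` is a hypothetical helper, not the tar package's code; it only handles the non-negative binary case and rejects any other binary encoding.

```haskell
import Data.Bits ((.&.))
import Data.Char (digitToInt, isOctDigit)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC

-- Hypothetical decoder for a numeric tar header field such as the 12-byte
-- size field or the 8-byte mode field. Illustration only.
decodeNumber :: BS.ByteString -> Either String Integer
decodeNumber field
  | BS.null field = Left "empty field"
  -- GNU base-256: leading 0x80, then a big-endian non-negative number.
  | BS.head field == 0x80 = Right (bigEndian (BS.tail field))
  -- Any other leading byte with the high bit set (e.g. 0xFF for negatives).
  | BS.head field .&. 0x80 /= 0 = Left "unsupported binary number"
  | otherwise = octal
  where
    bigEndian :: BS.ByteString -> Integer
    bigEndian = BS.foldl' (\acc w -> acc * 256 + fromIntegral w) 0

    -- Skip leading spaces, then stop at the first NUL or space terminator.
    digits = BC.takeWhile (\c -> c /= '\0' && c /= ' ') (BC.dropWhile (== ' ') field)

    octal
      | BC.null digits || not (BC.all isOctDigit digits) = Left "bad octal field"
      | otherwise = Right (BC.foldl' (\acc c -> acc * 8 + toInteger (digitToInt c)) 0 digits)
```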
participants (2)
- Duncan Coutts
- John Goerzen