
On Tue, 2009-02-24 at 10:34 +0100, Christian Maeder wrote:
Let me know what you think about the API and documentation. You mention above about exporting internal data structures. As far as I can see everything that is exported in the current code is needed. Let me know if you think it is too much or too little.
Ok, I think the api is too big (for a casual user). I don't want to know anything about the internals of an "Entry" or about a "TarPath". For refactoring cabal-install (using your tar package) the following interface was enough:
create :: FilePath -> FilePath -> FilePath -> IO () extract :: FilePath -> FilePath -> IO () read :: ByteString -> Entries write :: [Entry] -> ByteString pack :: FilePath -> FilePath -> IO [Entry] unpack :: FilePath -> Entries -> IO () data Entry fileName :: Entry -> FilePath fileContent :: Entry -> ByteString data Entries = Next Entry Entries | Done | Fail String
Maybe only a "isNormalFile" test-function for an Entry is missing.
isNormalFile entry = fileType entry == NormalFile No we really want one of these for every file type? isDirectory, isHardLink. I hope not. So perhaps that means you just want to expose fileType and FileType too in the top level API.
checkSecurity is not needed in the API, because it is done by unpack. (checkTarBomb does nothing currently).
It's needed if you're checking a tar file now because you expect to unpack it later, eg on hackage.
Tar entries should (usually will) not be constructed by the user.
I've got a use case where we do.
getDirectoryContentsRecursive does not really belong into this tar package.
True but it's so useful you'd have to implement it yourself if it was not provided. We can't get it into the System.Directory package very quickly.
I would be happy, if the existence of TarPath (and all the other funny entry fields) could be hidden from the user.
You'll have to trade that for some cases where you get error instead. Translating file paths to tar paths does not work in all cases. We can hide that in an IO error for functions that are in IO, like pack.
Manipulating Entries is also not a typical user task. (Maybe the type Entries should just be "[Either String Entry]", but the given type is fine, as it only allows a final failure string)
Yeah I think only one failure is good. Manipulating entries is more common than you think, if you are working with a tar file in memory and never extracting files to disk. For example the hackage-server both constructs and dissects tar files but never packs or unpacks them.
So rather than re-exporting almost everything from the other modules in the top module, I suggest my API above and simply expose all other modules in case some wants the internals.
Ok, so perhaps we should split the API into two, in the way you suggest above. I'll try that and see how it looks.
Currently I get round-trip byte-for-byte compatibility with about 50% of the .tar.gz packages on my system (I'm on gentoo so there's lots of those). The ones that are not byte-for-byte equal after reading/writing are still readable by other tools (and probably normalised and closer to standard compliant) but it needs investigating in more detail.
The checking API is incomplete (security, tarbombs, portability) and there are no tests for the lazy streaming properties yet (ie that we can process arbitrary large archives in constant space).
I can only suggest to release it soon, use it for cabal-install and make a new release of cabal-install for ghc-6.10.2
I'll not be using the tar package for cabal-install immediately. Keeping the number of dependencies low while we work out installation and packaging issues is fairly important. People already complain that cabal-install has any dependencies at all. Once it's bundled in the platform then that restriction will be lifted.
P.S. I could (darcs) send you my (humble) changes to cabal-install and tar
Please. Duncan