
On Tue, 2009-02-24 at 12:00 +0000, Duncan Coutts wrote:
> On Tue, 2009-02-24 at 10:34 +0100, Christian Maeder wrote:
> > checkSecurity is not needed in the API, because it is done by unpack. (checkTarBomb does nothing currently).
>
> It's needed if you're checking a tar file now because you expect to unpack it later, eg on hackage.
>
> > Tar entries should (usually will) not be constructed by the user.
>
> I've got a use case where we do.

So I should say what these use cases are.

One is cabal-install of course. That's fairly easy: it just needs to create and extract tar files.

The other case is hackage. For that we want to upload and check the contents of tar files, without ever unpacking them to local files. We want to check the tar file itself to make sure it is in a portable format (i.e. not containing any funky extensions that not all tar readers will grok) and that it contains nothing that is not portable between platforms, like file names that would be invalid on Windows. We also want to extract a single file in memory (the .cabal file).

Another case within hackage is constructing the 00-index.tar.gz file. That is built in memory from another internal representation of the package index. For that we really are constructing each entry ourselves, supplying all the appropriate info including file modification time, ownership etc.

A final case in hackage is serving the contents of .tar files. This is to let users browse the contents of packages, e.g. to read the README without having to download the whole .tar.gz. It should also make all the code more easily googleable. It's also the method we want to use for serving haddock docs. Bots or package owners will upload .tar.gz bundles of documentation and we'll serve the contents directly without unpacking them. We'll do that by gunzipping and storing the .tar file on disk, scanning it once to generate a file name -> (offset, length) index, and then when we service a request we open the .tar file, seek to the offset and return that many bytes.

I think that's it.

Duncan
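
P.S. To make the last case concrete, here is a rough sketch of the file name -> (offset, length) index and the seek-and-read step. It is only an illustration, not the tar package's API: the names TarIndex, buildIndex and serveEntry are made up for this example. It also assumes the simplest possible headers: the name is read from the first 100 bytes only (no ustar prefix field, no GNU long names) and the size is read as plain octal (no base-256 sizes for huge files).

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Map.Strict as Map
import Data.Char (digitToInt, isOctDigit)
import System.IO (IOMode (ReadMode), SeekMode (AbsoluteSeek), hSeek, withBinaryFile)

-- | Hypothetical index type: entry name -> (offset of its data, length in bytes).
type TarIndex = Map.Map FilePath (Integer, Int)

blockSize :: Int
blockSize = 512

-- | Scan an uncompressed tar archive once, recording where the data of
-- each regular file lives. Assumes simple headers (see the note above).
buildIndex :: B.ByteString -> TarIndex
buildIndex = go 0 Map.empty
  where
    go :: Int -> TarIndex -> B.ByteString -> TarIndex
    go off idx bs
      | B.length hdr < blockSize = idx  -- truncated input or end of data
      | B.all (== 0) hdr         = idx  -- an all-zero block ends the archive
      | otherwise                = go (off + skip) idx' (B.drop skip bs)
      where
        hdr    = B.take blockSize bs
        name   = BC.unpack (B.takeWhile (/= 0) (B.take 100 hdr))
        size   = parseOctal (B.take 12 (B.drop 124 hdr))
        flag   = BC.index hdr 156
        -- entry data is padded up to a whole number of 512-byte blocks
        padded = ((size + blockSize - 1) `div` blockSize) * blockSize
        skip   = blockSize + padded
        idx'
          | flag == '0' || flag == '\0' =
              Map.insert name (fromIntegral (off + blockSize), size) idx
          | otherwise = idx             -- skip directories, links etc.

-- | Parse the octal size field, ignoring leading spaces and the terminator.
parseOctal :: B.ByteString -> Int
parseOctal = BC.foldl' step 0 . BC.takeWhile isOctDigit . BC.dropWhile (== ' ')
  where
    step acc c = acc * 8 + digitToInt c

-- | Serve one entry: open the .tar file, seek to the offset, read the length.
serveEntry :: FilePath -> TarIndex -> FilePath -> IO (Maybe B.ByteString)
serveEntry tarPath idx name =
  case Map.lookup name idx of
    Nothing         -> return Nothing
    Just (off, len) -> withBinaryFile tarPath ReadMode $ \h -> do
      hSeek h AbsoluteSeek off
      Just <$> B.hGet h len
```

The point of the design is that the scan happens once, when the .tar file is stored, and each request after that costs a map lookup, one seek and one read. In practice the index would presumably be kept next to the .tar file rather than rebuilt for every request.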