Using lzip instead of xz for distributed tarballs

Hello all,
GHC is distributed as .tar.xz tarballs; I assume this is because it produces small tarballs. However, xz is ill-suited for archiving due to its lack of error recovery. Moreover, lzip produces smaller tarballs with GHC (I tested with ghc-8.8.2-x86_64-deb9-linux.tar) and decompression takes about the same amount of time.
There's more information on the project page: https://www.nongnu.org/lzip/lzip.html.
Cheers, Vanessa McHale
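For anyone who wants to reproduce the size comparison, here is a minimal sketch. It assumes both `xz` and `lzip` are on PATH and are run as stdin-to-stdout filters at level 9; the thread does not say which settings were used in the original test, so the level is an assumption. The function names (`compressed_size`, `compare`) are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed command lines: each tool reads stdin and writes stdout at level 9.
# The levels are a guess; the thread does not state the settings used.
DEFAULT_TOOLS = {
    "xz": ["xz", "-9", "-c"],
    "lzip": ["lzip", "-9", "-c"],
}


def compressed_size(tarball: Path, command: list[str]) -> int:
    """Pipe `tarball` through `command` and count the bytes it writes."""
    total = 0
    with tarball.open("rb") as src:
        with subprocess.Popen(command, stdin=src, stdout=subprocess.PIPE) as proc:
            # Read in 1 MiB chunks so the compressed output is never held in memory.
            for chunk in iter(lambda: proc.stdout.read(1 << 20), b""):
                total += len(chunk)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)
    return total


def compare(tarball: Path, tools: dict[str, list[str]] = DEFAULT_TOOLS) -> dict[str, int]:
    """Return {tool name: compressed size in bytes} for each tool found on PATH."""
    sizes = {}
    for name, command in tools.items():
        if shutil.which(command[0]) is None:
            continue  # tool not installed; skip it instead of failing
        sizes[name] = compressed_size(tarball, command)
    return sizes
```

Calling `compare(Path("ghc-8.8.2-x86_64-deb9-linux.tar"))` would return one size per installed tool. This only measures output size; decompression time would need a separate measurement.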

Vanessa McHale writes:
Hello all,
GHC is distributed as .tar.xz tarballs; I assume this is because it produces small tarballs. However, xz is ill-suited for archiving due to its lack of error recovery. Moreover, lzip produces smaller tarballs with GHC (I tested with ghc-8.8.2-x86_64-deb9-linux.tar) and decompression takes about the same amount of time.
Indeed, I recall seeing the "Why xz is not suitable for archival purposes" blog post quite a while ago and considered moving away from xz at the time, but I wasn't entirely convinced that the benefits would justify the churn, especially since xz tends to be pretty ubiquitous at this point while lzip is a fair bit less so.
I'd be happy to hear further reasons why we should switch, but I'll admit that I still don't quite see what switching would buy us; we do have a few backups spread across the planet, so the probability of us having to rely on the compressor for error recovery is pretty small.
Cheers,
- Ben

Would it be plausible to distribute both? That way users would not have to install lzip. Cheers, Vanessa McHale
On Jan 20, 2020, at 4:15 PM, Ben Gamari wrote:
Vanessa McHale writes:
Hello all,
GHC is distributed as .tar.xz tarballs; I assume this is because it produces small tarballs. However, xz is ill-suited for archiving due to its lack of error recovery. Moreover, lzip produces smaller tarballs with GHC (I tested with ghc-8.8.2-x86_64-deb9-linux.tar) and decompression takes about the same amount of time.
Indeed I recall seeing the "Why xz is not suitable for archival purposes" blog post quite a while ago and considered moving away from xz at the time but wasn't entirely convinced that the benefits would justify the churn, especially since xz tends to be pretty ubiquitous at this point while lzip is a fair bit less so.
I'd be happy to hear further reasons why we should switch but I'll admit that I still don't quite see what switching would buy us; we do have a few backups spread across the planet so the probability of us having to rely on the compressor for error recovery pretty small.
Cheers,
- Ben

On January 21, 2020 11:44:15 AM EST, Vanessa McHale wrote:
Would it be plausible to distribute both? That way users would not have to install lzip.
Cheers, Vanessa McHale
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
There is indeed precedent for this. IIRC, we distributed both bzip2 and xz tarballs for several years. I'm not opposed to offering both; the biggest cost is the storage, and that is relatively minor. I have opened #17726 to track this.
Cheers,
- Ben