
On Mon, Jul 2, 2012 at 3:14 PM, Duncan Coutts wrote:
Something to keep in mind is memory usage. I know Jeremy is looking at this from the infrastructure side, but I think from the app side there are also some likely culprits. Cabal's GenericPackageDescription type is very large in memory. Having tens of thousands of these means lots of memory. One hopefully easy way to save memory here, without going to the hassle of redoing Cabal's type definitions, is simply to increase sharing. There's a huge amount of repeated information. Start by sharing all the package names and versions. Then there's other metadata that rarely changes between versions of the same package. This kind of thing should be easy to evaluate: just write a test program that reads the index file and looks at peak memory use. Then try sharing stuff and see how much it drops. This sharing optimisation would still be useful even if later we go and redo GenericPackageDescription to be more compact.
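To make the sharing idea concrete, here is a minimal sketch of interning repeated fields through a Map, so that equal names, versions and licences end up pointing at one heap object. Everything in it is an assumption for illustration: PkgInfo is a hypothetical stand-in for the much larger GenericPackageDescription, and intern, Tables, sharePkg, shareAllWith and shareAll are made-up names, not part of Cabal or hackage-server.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (mapAccumL)

-- Hypothetical, tiny stand-in for GenericPackageDescription.
-- Strict fields, so building a PkgInfo forces the interned values.
data PkgInfo = PkgInfo
  { pkgName    :: !String
  , pkgVersion :: ![Int]
  , pkgLicense :: !String
  } deriving (Eq, Show)

-- An intern table maps each value to its single canonical copy.
type Intern a = Map.Map a a

-- Return the canonical copy of a value, adding it to the table if unseen.
intern :: Ord a => Intern a -> a -> (Intern a, a)
intern tbl x = case Map.lookup x tbl of
  Just shared -> (tbl, shared)               -- reuse the existing copy
  Nothing     -> (Map.insert x x tbl, x)     -- first occurrence becomes canonical

-- One intern table per kind of repeated field.
data Tables = Tables
  { names    :: Intern String
  , versions :: Intern [Int]
  , licenses :: Intern String
  }

emptyTables :: Tables
emptyTables = Tables Map.empty Map.empty Map.empty

-- Rebuild one record from the canonical copies of its fields.
sharePkg :: Tables -> PkgInfo -> (Tables, PkgInfo)
sharePkg (Tables ns vs ls) (PkgInfo n v l) =
  let (ns', n') = intern ns n
      (vs', v') = intern vs v
      (ls', l') = intern ls l
  in (Tables ns' vs' ls', PkgInfo n' v' l')

-- Thread the tables through the whole index, left to right.
shareAllWith :: [PkgInfo] -> (Tables, [PkgInfo])
shareAllWith = mapAccumL sharePkg emptyTables

shareAll :: [PkgInfo] -> [PkgInfo]
shareAll = snd . shareAllWith
```

The result list is still lazy, so a real test program would need to force each element before dropping the parsed originals; otherwise the unshared copies stay reachable through thunks and peak memory will not drop. Comparing peak heap with and without shareAll (for example by running with +RTS -s) is the kind of measurement suggested above.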
This should not hold up the launch of Hackage 2 (which is very important), but I think it's an important issue that we need to address: we don't want to store what is perhaps the most important data the Haskell community has in an experimental data store! Creating a correct data store (i.e. ACID) that also handles a moderate amount of load is quite a difficult undertaking and it shouldn't be taken lightly. Let's stick the data in some SQL database and spend our energy on other things. :)

Cheers,
Johan