
On Fri, Jul 1, 2011 at 1:43 PM, Gwern Branwen
Athas on #haskell wondered how many dependencies the average Haskell package had. I commented that it seemed like some fairly simple scripting to find out, and as these things tend to go, I wound up doing a complete solution myself.
First, we get most/all of Hackage locally to examine, as tarballs:
for package in `cabal list | grep '\*' | tr -d '\*'`; do cabal fetch $package; done
I think the index tarball has all the info you need, and would be faster to retrieve and process, if you or anyone else needs to get the .cabal files again:
http://hackage.haskell.org/packages/archive/00-index.tar.gz (2.2 MB)
The set of the latest package sdists is also available:
http://hackage.haskell.org/cgi-bin/hackage-scripts/archive.tar (~150 MB)
--Rogan
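If one goes the index-tarball route, note that the index holds every uploaded version of a package, not just the newest, so the latest .cabal has to be picked per package. A minimal sketch, assuming the tarball has already been unpacked into a directory laid out as name/version/name.cabal, and assuming GNU sort (for -V); the function name latest_cabals is made up for illustration:

```shell
# Hypothetical helper: print the path of the newest .cabal file for each
# package under an unpacked index directory ($1).
# Assumes the layout <name>/<version>/<name>.cabal and GNU sort -V.
latest_cabals() {
    for pkg in "$1"/*/; do
        [ -d "$pkg" ] || continue
        name=$(basename "$pkg")
        # Keep only entries that look like version numbers, then take the
        # highest one in version order (so 0.10 sorts after 0.9).
        latest=$(ls "$pkg" | grep '^[0-9]' | sort -V | tail -n 1)
        cabal="$pkg$latest/$name.cabal"
        if [ -f "$cabal" ]; then
            printf '%s\n' "$cabal"
        fi
    done
}
```

Each path it prints could then be fed to whatever parses the .cabal file, in place of the tar extraction step.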
Then we cd ~/.cabal/packages/hackage.haskell.org
Now we can run a command which extracts the .cabal file from each tarball to standard output:
find . -name "*.tar.gz" -exec tar --wildcards "*.cabal" -Oxf {} \;
We could grep for 'build-depends' or something, but that gives unreliable, dirty results. (>80k items, resulting in a hard-to-believe 87k total deps and an average of 27 deps per package.) So instead, we use the Cabal library to write a program that parses Cabal files & spits out the dependencies, and we feed each .cabal into that:
find . -name "*.tar.gz" -exec sh -c 'tar --wildcards "*.cabal" -Oxf {} | runhaskell ~/deps.hs' \;
And what is deps.hs? Turns out to be surprisingly easy to parse a String, extract the Library and Executable AST, and grab the [Dependency] field, and then print it out (code is not particularly clean):
    import Distribution.Package
    import Distribution.PackageDescription
    import Distribution.PackageDescription.Parse

    main :: IO ()
    main = do cbl <- getContents
              let desc = parsePackageDescription cbl
              case desc of
                ParseFailed _ -> return ()
                ParseOk _ d -> putStr $ unlines $ map show $ map (\(Dependency x _) -> x) $ extractDeps d

    extractDeps :: GenericPackageDescription -> [Dependency]
    extractDeps d = ldeps ++ edeps
      where ldeps = case (condLibrary d) of
                      Nothing -> []
                      Just c -> condTreeConstraints c
            edeps = concat $ map (condTreeConstraints . snd) $ condExecutables d
So what are the results? (The output of one run is attached.) I get 18,134 dependencies, having run on 3,137 files, or 5.8 dependencies per package.
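For anyone wanting to redo the arithmetic on their own run, here is a rough sketch. It assumes the combined output was saved to a file with one dependency per line; the helper names avg_deps and top_deps are made up for illustration and are not part of the original run:

```shell
# Hypothetical helper: average dependencies per package.
# $1 = file with one dependency per line, $2 = number of .cabal files processed.
avg_deps() {
    # Count non-empty lines, then divide by the package count.
    total=$(awk 'NF { c++ } END { print c + 0 }' "$1")
    awk -v t="$total" -v n="$2" 'BEGIN { printf "%.1f\n", t / n }'
}

# Hypothetical helper: the N most frequently listed dependencies (default 10).
# $1 = same file as above, $2 = how many to show.
top_deps() {
    sort "$1" | uniq -c | sort -rn | head -n "${2:-10}"
}
```

With the totals above, 18134 / 3137 comes to about 5.78, which rounds to the 5.8 quoted. Counting with uniq -c works on whole lines, so it does not matter exactly how each name is printed.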
-- gwern http://www.gwern.net