[Hackage] #184: cabal-install should report build results to hackage server

22 Nov 2007

      #184: cabal-install should report build results to hackage server
----------------------------+-----------------------------------------------
  Reporter:  duncan         |        Owner:         
      Type:  enhancement    |       Status:  new    
  Priority:  normal         |    Milestone:         
 Component:  cabal-install  |      Version:  1.2.2.0
  Severity:  normal         |     Keywords:         
Difficulty:  normal         |   Ghcversion:  6.4.2  
  Platform:  Linux          |  
----------------------------+-----------------------------------------------
 One way to get a lot more testing data on hackage packages is for
 cabal-install to report build successes and failures back to the hackage
 server. Hackage should keep this information and use it to work out which
 platform configurations a package builds on successfully and which it does
 not. This would help developers discover problems more quickly and would
 also be useful to users.

 An important consideration is privacy. Users should always have the option
 to not report anything, and any reported information that is kept should
 not contain identifying information. This is particularly important for
 build logs, which may contain paths etc. It should be clear to users what
 information they are reporting so that they can decide for themselves
 whether it meets their privacy needs. Since cabal can be used to build
 private code, it is vital that it reports only on packages that were
 obtained from the hackage server. It is also vital that the information is
 sent to the correct hackage server. It's possible to set up private
 hackage server instances, and it may be useful to collect build
 information "in house" too.
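
 As a rough sketch of the reporting rule described above, the decision could
 be a small pure function. Everything here is hypothetical: the type names,
 the three policy levels and the idea of identifying a server by its URL are
 assumptions made for illustration, not existing cabal-install code.

```haskell
-- Where a package came from. Only packages fetched from a hackage server
-- are ever candidates for reporting. (Hypothetical type.)
data Source
  = FromServer String        -- URL of the server the package came from
  | LocalDirectory FilePath  -- private or locally modified code
  deriving (Eq, Show)

-- What the user has agreed to send. (Hypothetical; the ticket only
-- requires that opting out is always possible.)
data ReportPolicy
  = NeverReport
  | ReportWithoutLogs
  | ReportWithLogs
  deriving (Eq, Show)

-- Decide where a report may be sent, if anywhere. The report goes back to
-- the same server the package was obtained from, so results for packages
-- from a private "in house" server never reach the public one.
reportTarget :: ReportPolicy -> Source -> Maybe String
reportTarget NeverReport _                  = Nothing
reportTarget _           (LocalDirectory _) = Nothing
reportTarget _           (FromServer url)   = Just url

-- Build logs are the most sensitive part, so they need explicit consent.
mayIncludeLog :: ReportPolicy -> Bool
mayIncludeLog ReportWithLogs = True
mayIncludeLog _              = False
```

 Keeping the decision pure makes it easy to show the user exactly what would
 be sent, and to which server, before anything leaves the machine.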

 So what information would be helpful?

  * build success or failure (qualified by what failed, build, docs, tests)
  * package name and version
  * hash of .cabal file just to make sure it's the same one we're all
 talking about and to detect local modifications.
  * precise versions of dependent packages that the package was built with.
  * os and arch strings
  * compiler flavour and version
  * versions of important build tools
  * In the case of a build failure, some part of the build log would be
 helpful. This is the most problematic part from a privacy point of view.
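
 The items above could be collected in a single record along these lines.
 This is only a sketch: the field names, the use of plain strings and
 integer lists for versions, and the two helper functions are assumptions
 for illustration and do not describe an agreed report format.

```haskell
-- Which stage a failure happened in (build, docs, tests).
data Stage = BuildStage | DocsStage | TestsStage
  deriving (Eq, Show)

data Outcome = Success | FailedAt Stage
  deriving (Eq, Show)

-- One anonymous build report. (Hypothetical record; names are illustrative.)
data BuildReport = BuildReport
  { pkgName      :: String
  , pkgVersion   :: [Int]
  , cabalHash    :: String             -- hash of the .cabal file
  , dependencies :: [(String, [Int])]  -- exact versions built against
  , osName       :: String
  , archName     :: String
  , compiler     :: (String, [Int])    -- flavour and version
  , buildTools   :: [(String, [Int])]  -- important build tools
  , outcome      :: Outcome
  , logExcerpt   :: Maybe String       -- part of the log, failures only
  } deriving (Eq, Show)

-- Drop the build log, for users who do not want to share it.
withoutLog :: BuildReport -> BuildReport
withoutLog r = r { logExcerpt = Nothing }

-- True when the reported .cabal hash matches the one the server holds,
-- i.e. the package was not modified locally.
unmodified :: String -> BuildReport -> Bool
unmodified serverHash r = cabalHash r == serverHash
```

 Making the log an optional field keeps the privacy-sensitive part separate
 from the rest of the report, so it can be stripped without touching
 anything else.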

 This is quite a bit of raw data and we can expect to have many hundreds of
 such reports for popular packages. How can we distill useful information
 from this kind of data?

 I expect we would want to do some statistical analysis where we look for
 common traits in the results. The data forms a multi-dimensional space of
 boolean values. By looking down the rows/columns of this space we should
 be able to identify trends, for example that a package always fails with
 ghc-6.8.x or always fails on Windows. Excluding those obvious failures, we
 may then be able to say that it does work on Mac OS X (except with
 ghc-6.8.1) or that it does work with regex-posix-0.92 (except on Windows),
 etc. This should allow us to give a summary saying yes/no to various
 properties of the environment.
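
 A very simple version of "looking down one dimension" might look like the
 following. It is a sketch under strong assumptions: each report is reduced
 to a list of key/value properties plus a success flag, and the three-way
 verdict stands in for whatever real statistics would be used.

```haskell
import Data.List (nub)

-- Environment properties of one report, e.g. [("os", "windows")].
type Env    = [(String, String)]
type Report = (Env, Bool)  -- True means the build succeeded

data Verdict = AlwaysBuilds | AlwaysFails | Mixed
  deriving (Eq, Show)

-- Classify a non-empty set of results.
verdict :: [Bool] -> Verdict
verdict results
  | and results      = AlwaysBuilds
  | not (or results) = AlwaysFails
  | otherwise        = Mixed

-- For one property (such as "os"), give a verdict for each value seen,
-- in order of first appearance.
summarise :: String -> [Report] -> [(String, Verdict)]
summarise key reports =
  [ (value, verdict [ ok | (env, ok) <- reports, lookup key env == Just value ])
  | value <- nub [ v | (env, _) <- reports, Just v <- [lookup key env] ]
  ]

-- Remove reports explained by an "always fails" value, so that the
-- remaining data can be examined along another dimension.
excluding :: String -> String -> [Report] -> [Report]
excluding key value = filter (\(env, _) -> lookup key env /= Just value)
```

 For instance, if every Windows report fails, summarise "os" marks Windows
 as AlwaysFails; excluding "os" "windows" then leaves the other reports to
 be summarised by compiler version or dependency version.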

 For failure cases, developers should be able to get access to the more
 detailed data, including build logs.

-- 
Ticket URL: http://hackage.haskell.org/trac/hackage/ticket/184
Hackage http://haskell.org/cabal/
Hackage: Cabal and related projects