
Hi, let me thank you for perusing this!
I am not sure how useful this is going to be:
+ Tests lots of common and important real-world libraries.
− Takes a lot of time to compile, and includes CPP macros and C code.
(More details in the README linked above.)
Another problem with the approach of taking modern real-world code: it uses a lot of non-boot libraries that are quite compiler-close and do low-level stuff (e.g. using Template Haskell, or stuff like that). If we add that to nofib, we'd have to maintain its compatibility with GHC as we continue developing GHC, probably using lots of CPP. This was less of an issue with the Haskell98 code in nofib.
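To make the CPP point concrete, here is a minimal sketch (not from the thread) of the sort of compatibility shim a modern benchmark module tends to accumulate. The module name and the sorting example are made up, and the MIN_VERSION_base macro is assumed to be the one Cabal defines when building with CPP enabled:

    {-# LANGUAGE CPP #-}
    -- Hypothetical compatibility shim of the kind a real-world benchmark
    -- would need to keep compiling across GHC/base releases.
    module Compat (compatSort) where

    #if MIN_VERSION_base(4,8,0)
    -- base >= 4.8 (shipped with GHC 7.10) exports sortOn from Data.List.
    import Data.List (sortOn)
    #else
    -- Older base: provide a local fallback definition.
    import Data.List (sortBy)
    import Data.Ord (comparing)

    sortOn :: Ord b => (a -> b) -> [a] -> [a]
    sortOn f = sortBy (comparing f)
    #endif

    -- Sort (name, metric) pairs by name, e.g. per-module results.
    compatSort :: [(String, Double)] -> [(String, Double)]
    compatSort = sortOn fst

Multiply that by every GHC-facing API a compiler-close library touches, and the maintenance cost adds up quickly.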
But is there a way to test realistic modern code without running into this problem?
What are the reasons, besides fragmentation, for a modern real-world test suite outside of GHC (maybe even maintained by a different set of people)? At some point you would also end up with a matrix of performance measurements, due to the evolution of the library and the evolution of GHC. Fixing the library to profile against GHC will likely end at some point in incompatibility with GHC; fixing GHC will similarly end at some point with the inability to compile the library. However, if both are always updated, how could one discriminate performance regressions of the library from regressions due to changes in GHC?

What measurements did you collect? Are they broken down per module? Something I've recently had some success with is dumping measurements into InfluxDB [1] (or a similar data point collection service) and hooking that up to Grafana [2] for visualization (a rough sketch follows after the references).

cheers,
moritz

[1]: https://www.influxdata.com/
[2]: http://grafana.org/
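As a rough sketch of the InfluxDB idea above: the snippet below pushes a single measurement to InfluxDB's 1.x-style /write?db=... endpoint using its line protocol, via the http-client package. The database name (ghc_perf), the measurement and field names, and the benchmark/GHC-version values are all made up for illustration:

    {-# LANGUAGE OverloadedStrings #-}
    -- Rough sketch: push one benchmark measurement to InfluxDB over HTTP.
    module Main (main) where

    import qualified Data.ByteString.Lazy.Char8 as LBS
    import           Network.HTTP.Client

    -- One data point in InfluxDB line protocol:
    --   measurement,tag=value,... field=value
    perfPoint :: String -> String -> Double -> String
    perfPoint benchmark ghcVersion allocs =
      "nofib,benchmark=" ++ benchmark
        ++ ",ghc=" ++ ghcVersion
        ++ " bytes_allocated=" ++ show allocs

    main :: IO ()
    main = do
      manager <- newManager defaultManagerSettings
      initReq <- parseRequest "http://localhost:8086/write?db=ghc_perf"
      let req = initReq
            { method      = "POST"
            , requestBody = RequestBodyLBS
                (LBS.pack (perfPoint "spectral/fibheaps" "9.6.4" 1.2e7))
            }
      resp <- httpLbs req manager
      -- InfluxDB answers 204 No Content when it accepts the point.
      print (responseStatus resp)

Grafana can then be pointed at the same database as a data source to graph, say, bytes_allocated per benchmark across GHC versions over time.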