> On a related note, how about separating test failures from failing
> performance tests ("stat too good" / "stat not good enough")? The latter are
> important, but they seem to be much more prone to fail without good reason.

I also think this is a good idea.

https://phabricator.haskell.org/D406

Gintautas Miliauskas