How does GHC's testsuite work?

Dear all, I am a member of OCaml's development team. More specifically, I am working on a test driver for the OCaml compiler, which will be part of OCaml's 4.06 release. I am currently writing an article to describe the tool and its principles. In this article, I would also like to talk about how other compilers' testsuites are handled, and looking at how things are done in GHC is natural. In OCaml, our testsuite essentially consists of whole programs that we compile and run, checking that the compilation and execution results match the expected ones.
From what I could see of GHC's testsuite, it seemed to me that it uses Python to drive the tests. I also understood that the testsuite has tests that are more like unit tests, in the .T file. Am I correct here? Or do you guys also have whole-program tests? If you do, how do you compile and run them?
Any comment / hint on this aspect of the test harness' design would be really helpful. Many thanks in advance, Sébastien.

Actually, it's the reverse of what you said: like OCaml, GHC has essentially no unit tests; the testsuite consists entirely of Haskell programs which we compile (and sometimes run; a lot of tests are for the typechecker only, so we don't bother running those). The .T file is just a way of letting the Python driver know what tests exist.
Edward
(In reply to Sébastien Hinderer's message of 2017-10-30 16:17:38 +0100.)

Dear Edward, Many thanks for your prompt response! Edward Z. Yang (2017/10/30 11:25 -0400):
> [...] The .T file is just a way of letting the Python driver know what tests exist.
Oh okay! Would you be able to point me to just a few tests, to get an idea of a few typical situations, please?
One other question I forgot to ask: how do you deal with conditional tests, for instance a test that should be run only on some platforms? In OCaml, we have tests for Fortran bindings that should be run only if a Fortran compiler is available. How would you deal with such tests?
Thanks! Sébastien.

Excerpts from Sébastien Hinderer's message of 2017-10-30 16:39:24 +0100:
> Would you be able to point me to just a few tests to get an idea of a few typical situations, please?
For example:
The metadata https://github.com/ghc/ghc/blob/master/testsuite/tests/typecheck/should_fail...
The source file https://github.com/ghc/ghc/blob/master/testsuite/tests/typecheck/should_fail...
The expected error output https://github.com/ghc/ghc/blob/master/testsuite/tests/typecheck/should_fail...
> One other question I forgot to ask: how do you deal with conditional tests? [...]
That is all managed inside the Python driver code. Example: https://github.com/ghc/ghc/blob/master/testsuite/tests/rts/all.T#L32
Edward
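As a rough illustration of how conditions can be handled inside a Python driver, the sketch below models a test's options as a dictionary and a condition as a function that may mark the test as skipped. The names `skip`, `normal`, `when`, `on_platform`, `have_tool` and `should_run` are defined here for this example only and are assumptions, not the real driver's API; the only library calls used are `sys.platform` and `shutil.which` from the Python standard library.

```python
import shutil
import sys
from typing import Callable, Dict

Opts = Dict[str, bool]
Modifier = Callable[[Opts], None]


def skip(opts: Opts) -> None:
    """Mark the test as one that must not be run."""
    opts["skip"] = True


def normal(opts: Opts) -> None:
    """Leave the test options untouched."""


def when(condition: bool, modifier: Modifier) -> Modifier:
    """Apply `modifier` only if `condition` holds; otherwise do nothing."""
    return modifier if condition else normal


def on_platform(prefix: str) -> bool:
    """True if the driver is running on a platform whose name starts with `prefix`."""
    return sys.platform.startswith(prefix)


def have_tool(name: str) -> bool:
    """True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


def should_run(*modifiers: Modifier) -> bool:
    """Apply every modifier to fresh options and report whether the test runs."""
    opts: Opts = {"skip": False}
    for modify in modifiers:
        modify(opts)
    return not opts["skip"]


# A test that is skipped on Windows, and one that needs a Fortran compiler
# (the tool name "gfortran" is only an example).
print(should_run(when(on_platform("win32"), skip)))
print(should_run(when(not have_tool("gfortran"), skip)))
```

The printed values depend on the machine running the sketch, which is the point: the decision is taken by ordinary Python code when the test is declared, not by the test program itself.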

Dear Edward, Many thanks to you, too, for your prompt response. Edward Z. Yang (2017/10/30 11:51 -0400):
> For example: [...]
Excellent, thanks! With these few hints I got the understanding I was looking for, so I'm really grateful.
> All managed inside the Python driver code. [...]
Okay, thanks, awesome! Best wishes, Sébastien.

Hi Sebastien, I’m looking forward to your report, surely there will be some interesting inspirations for us. On Monday, 30.10.2017, at 11:25 -0400, Edward Z. Yang wrote:
> [...] it's entirely Haskell programs which we compile (and sometimes run; a lot of tests are for the typechecker only so we don't bother running those.)
Let me add that these tests rarely check the actual output of the compiler (i.e. the program, or even the simplified code). Often it is enough to check
* whether the compile succeeds or fails as expected, or maybe
* what messages the compiler prints.
In a few cases we do dump the complete intermediate code (-ddump-simpl), but then the test case specifies a “normalization function” that checks the output for a certain property, e.g. by grepping for certain patterns.
The only real unit tests that I know of are these: http://git.haskell.org/ghc.git/tree/HEAD:/testsuite/tests/callarity/unittest These are effectively programs using “GHC-the-library”.
Joachim
--
Joachim Breitner
mail@joachim-breitner.de
http://www.joachim-breitner.de/
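The "normalization function" idea can be sketched as follows: rather than diffing a whole dump against a reference, the test first reduces the dump to the lines that matter and compares only those. This is a minimal sketch under stated assumptions; `grep_lines`, `check` and the dump text are invented for the example and do not reproduce real compiler output or the real driver's functions.

```python
import re
from typing import Callable, List


def grep_lines(pattern: str) -> Callable[[str], str]:
    """Build a normaliser that keeps only the lines matching `pattern`."""
    regex = re.compile(pattern)

    def normalise(dump: str) -> str:
        kept: List[str] = [
            line.strip() for line in dump.splitlines() if regex.search(line)
        ]
        return "\n".join(kept)

    return normalise


def check(dump: str, expected: str, normalise: Callable[[str], str]) -> bool:
    """Compare the normalised dump, not the raw dump, against the expectation."""
    return normalise(dump) == expected


# Two made-up dumps that differ only in incidental details (generated names).
dump_run1 = "unique_101 = setup\nresult = case x of y\nunique_102 = teardown\n"
dump_run2 = "unique_907 = setup\nresult = case x of y\nunique_908 = teardown\n"

keep_case = grep_lines(r"\bcase\b")
expected = "result = case x of y"

print(dump_run1 == dump_run2)                  # the raw dumps differ
print(check(dump_run1, expected, keep_case))   # the property of interest holds
print(check(dump_run2, expected, keep_case))
```

The design choice is that a test stays stable when unrelated parts of the intermediate code change, at the cost of checking only the property the normaliser extracts.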

Dear Joachim, Many thanks for your prompt, positive and encouraging response. Joachim Breitner (2017/10/30 13:44 -0400):
> I’m looking forward to your report, surely there will be some interesting inspirations for us.
Thanks. Unfortunately, the paper has been written in French. I'll see whether I can find the time to translate it into English soon. If I can, I'll definitely post a link.
> let me add that these tests rarely check the actual output of the compiler (i.e. the program, or even the simplified code). Often it is enough to check * whether the compile succeeds or fails as expected, or maybe * what messages the compiler prints.
I see. I think it is quite similar for OCaml.
> In a few cases we do dump the complete intermediate code (-ddump-simpl), but then the test case specifies a “normalization function” that checks the output for a certain property, e.g. by grepping for certain patterns.
Got it, thanks. We also have options to pretty-print the few internal representations we have but, as far as I know, we don't use any normalization function in such cases and just diff the output against an expected reference (yuk!).
> The only real unit tests that I know of are these: http://git.haskell.org/ghc.git/tree/HEAD:/testsuite/tests/callarity/unittest These are effectively programs using “GHC-the-library”
Okay, thanks! Will come back with a link ASAP! Sébastien.
Participants (4): Edward Z. Yang, Joachim Breitner, Sébastien Hinderer, Wolfram Kahl