GSoC: Improving Cabal's Test Support

Hello community! I've been working on a proposal for Google Summer of Code 2010 to improve Cabal's test support, as described on the Haskell SoC Trac [1]. Today I'm looking for feedback to see whether what I intend is what people want and need. As you read this, I kindly ask that you consider: 1) Would you use the facility I describe, were it available? and 2) What additional features would you like to see?

There have been two separate suggestions (of which I am aware) of ways to integrate tests into Cabal. One is to build the tests into their own executable, which uses an error code on exit to indicate test failure. The second is to have package authors write modules which Cabal will load (dynamically?) and run the tests from. The former method has the advantage of being simpler to implement, but is probably too coarse: a single exit code cannot report the results of individual tests. Although the second suggestion avoids some security concerns, it seems to me that a malicious party could simply put nefarious code into their Setup.hs file anyway, or even in the library being tested.

I propose to build a test suite as its own executable, but to avoid the problem of granularity by producing an output file detailing the success or failure of individual tests and any relevant error messages. The format of the file would be standardized through library routines I propose to write; these routines would run tests with HUnit or QuickCheck and process them into a common format. Cabal, or any other utility, could read this file to determine the state of the test suite. Perhaps Cabal could even warn the user about installing packages with failing tests.

Under this proposal, a test suite would look something like this (suppose I am writing a test suite for a module Foo, which has an existing test suite in QuickCheck):
module Main where
import Foo
import Test.QuickCheck
import Distribution.Test -- This module is part of the project I propose
main = runTests [ ("testBar", wrap $ testBar)
                , ("testBaz", wrap $ testBaz) ] -- (name, test)
'runTests' and 'wrap' would be provided by 'Distribution.Test'. 'wrap' would standardize the output of test routines. For QuickCheck tests, it would probably look like:
wrap :: Testable a => a -> IO (Bool, String)
where the Bool indicates success and the String can be an error message the test produced. 'runTests' would take the list of tests, format their results, and write the output to a file:
runTests :: [(String, IO (Bool, String))] -> IO ()
I would probably gather the test results into a value of type '[(String, Bool, String)]' -- the name, status, and messages associated with each test -- and use 'show' to produce a nice, human-readable, machine-parsable file.
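For concreteness, here is a minimal sketch of how 'runTests' might do that (the result file name, 'test-results', is an illustrative assumption):

-- A sketch only: run each named test, collect (name, status, message)
-- triples, and serialize them with 'show'.
module Distribution.Test (runTests) where

runTests :: [(String, IO (Bool, String))] -> IO ()
runTests tests = do
    results <- mapM run tests
    writeFile "test-results" (show results)
  where
    run (name, action) = do
        (ok, msg) <- action
        return (name, ok, msg)

The test suite would be included in the package description file with a stanza such as: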
Test
  main-is: Test.hs
  build-depends: foo, QuickCheck, Cabal
This would take all the same options as an 'Executable' stanza, but would tell Cabal to run this executable when './Setup test' is invoked. This of course requires Cabal to support building executables that depend on the library in the same package. Since version 1.8, Cabal supposedly supports this, but my experiments indicate the support is a little broken. (GHC is invoked with the '-package-id' option, but Cabal only gives it the package name. Fixing this would naturally be on the agenda for this project.)

At this point, the package author need only run:

$ ./Setup configure
$ ./Setup build
$ ./Setup test

to produce a file detailing the results of the test suite.

I apologize for taking up your time with such a lengthy message, and eagerly await your feedback! Thanks!

-- Thomas Tuegel

[1] http://hackage.haskell.org/trac/summer-of-code/ticket/1581

On Thu, Apr 1, 2010 at 3:52 PM, Thomas Tuegel wrote:
I propose to build a test suite as its own executable, but to avoid the problem of granularity by producing an output file detailing the success or failure of individual tests and any relevant error messages. The format of the file would be standardized through library routines I propose to write; these routines would run tests with HUnit or QuickCheck and process them into a common format. Cabal, or any other utility, could read this file to determine the state of the test suite. Perhaps Cabal could even warn the user about installing packages with failing tests.
There are a few frameworks that provide limited degrees of this functionality. I've recently added support to test-framework so that the results can be gathered into an xml format that complies with at least some (maybe all?) junit xml parsers. I specifically targeted junit xml so it would be easy to use existing continuous integration systems as-is, but the format is not ideal for Haskell tests: it would be nice, for example, to see how many successful quickcheck inputs were run, and the concepts of packages and classes had to be munged to work with Haskell modules and test groupings. I need to clean up the code and get it over to Max for review before it'll be widely available, but that's just a matter of finding the time (possibly next week).
module Main where
import Foo
import Test.QuickCheck
import Distribution.Test -- This module is part of the project I propose
main = runTests [ ("testBar", wrap $ testBar)
                , ("testBaz", wrap $ testBaz) ] -- (name, test)
'runTests' and 'wrap' would be provided by 'Distribution.Test'. 'wrap' would standardize the output of test routines. For QuickCheck tests, it would probably look like:
This is very similar to what test-framework (and other libs.) are doing -- it's well worth looking into them.
wrap :: Testable a => a -> IO (Bool, String)
where the Bool indicates success and the String can be an error message the test produced. 'runTests' would take the list of tests, format their results, and write the output to a file:
Keep in mind that there are at least two ways a test can fail -- through errors or false assertions -- and it's useful to distinguish between those. As indicated above, I think this bit of the problem has been largely solved: at the least, there has been a lot of work on designing test frameworks for most languages, and we should be able to take advantage of that here.
The test suite would be included in the package description file with a stanza such as:
Test
  main-is: Test.hs
  build-depends: foo, QuickCheck, Cabal
I've been thinking about this as well, and I like this general idea, but I'm not (yet) convinced it's the best. That's probably just because I'm cautious though :)
This would take all the same options as an 'Executable' stanza, but would tell Cabal to run this executable when './Setup test' is invoked. This of course requires Cabal to support building executables that depend on the library in the same package. Since version 1.8, Cabal supposedly supports this, but my experiments indicate the support is a little broken. (GHC is invoked with the '-package-id' option, but Cabal only gives it the package name. Fixing this would naturally be on the agenda for this project.)
At this point, the package author need only run:
$ ./Setup configure
$ ./Setup build
$ ./Setup test
My general feeling has been that Setup is being discouraged in favor of using 'cabal <foo>', but I don't have any solid evidence for that (and I could very well be wrong!). They do slightly different things, so I think it's wise to figure out which idiom is most likely to be used and work with that.

--Rogan

On Thu, Apr 1, 2010 at 6:13 PM, Rogan Creswick wrote:
On Thu, Apr 1, 2010 at 3:52 PM, Thomas Tuegel wrote:
At this point, the package author need only run:
$ ./Setup configure
$ ./Setup build
$ ./Setup test
My general feeling has been that Setup is being discouraged in favor of using 'cabal <foo>', but I don't have any solid evidence for that (and I could very well be wrong!). They do slightly different things, so I think it's wise to figure out which idiom is most likely to be used and work with that.
I haven't figured out how it's possible, but I'm convinced that ./Setup configure vs. cabal configure can lead to a different set of dependencies being selected. This can lead to diamond dependency problems. (I'm convinced this happened on at least one machine I know of.)

What I don't understand is how it's possible for the discrepancy to happen. It's as if ./Setup and cabal-install use different algorithms for dependency resolution, but as I understand it, both should be using the Cabal library for that. My only other thought is that perhaps ./Setup uses a different version of the Cabal library than what cabal-install uses. Perhaps Duncan can comment on this.

Jason

Jason Dagit wrote:
What I don't understand is how it's possible for the discrepancy to happen. It's as if ./Setup and cabal-install use different algorithms for dependency resolution, but as I understand it, both should be using the Cabal library for that. My only other thought is that perhaps ./Setup uses a different version of the Cabal library than what cabal-install uses.
My understanding is that Setup.hs will use whatever libraries you already have installed; cabal-install will use the latest allowable version of libraries.

-- Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

On Thu, Apr 1, 2010 at 9:13 PM, Rogan Creswick wrote:
There are a few frameworks that provide limited degrees of this functionality. I've recently added support to test-framework so that the results can be gathered into an xml format that complies with at least some (maybe all?) junit xml parsers. I specifically targeted junit xml so it would be easy to use existing continuous integration systems as-is, but the format is not ideal for Haskell tests: it would be nice, for example, to see how many successful quickcheck inputs were run, and the concepts of packages and classes had to be munged to work with Haskell modules and test groupings.
I need to clean up the code and get it over to Max for review before it'll be widely available, but that's just a matter of finding the time (possibly next week).
Thanks for pointing this out! I also noticed that the testrunner package [1] provides almost the same functionality I described here. (In fact, it's almost exactly the same on the type level.) I wouldn't want to reinvent the wheel, and leveraging one of these existing solutions would allow me to get a lot more done over the course of the summer.

I have been somewhat hesitant about using an xml format. If we want Cabal to be able to process the test results itself, which is not strictly necessary but potentially useful, using xml would have a substantial impact on Cabal's build dependencies.
The test suite would be included in the package description file with a stanza such as:
Test
  main-is: Test.hs
  build-depends: foo, QuickCheck, Cabal
I've been thinking about this as well, and I like this general idea, but I'm not (yet) convinced it's the best. That's probably just because I'm cautious though :)
Well, let me try to convince you! :) My intention is to have Cabal handle the 'Test' stanza exactly as if it were an 'Executable test' stanza. I think this gives package authors the most flexibility and the greatest ease of use. It's easy to use because, if you can write an 'Executable' stanza, you can write a 'Test' stanza. At the same time, you get all of Cabal's flexibility in handling executables. Furthermore, since Cabal will build this executable as if it were any other, you can write any test suite you want! Don't buy into the idea of a common output format for test results? Want to use a different framework? No problem! To Cabal, this is just another executable that it builds and invokes when you ask for tests (and one that it knows not to install).

I think ease of use and flexibility are the most important aspects of this plan. Part of the goal here is to entice more package authors to write test suites, which will only happen if the barrier to entry is low. At the same time, it would be nice to convert some authors with existing high-quality test suites, which requires flexibility to work with their existing tests.
At this point, the package author need only run:
$ ./Setup configure
$ ./Setup build
$ ./Setup test
My general feeling has been that Setup is being discouraged in favor of using 'cabal <foo>', but I don't have any solid evidence for that (and I could very well be wrong!). They do slightly different things, so I think it's wise to figure out which idiom is most likely to be used and work with that.
The Cabal documentation wasn't entirely clear to me regarding which is preferred. In any case, the changes I'm talking about making to Cabal would happen in 'Distribution.Simple', and cabal-install seems to use that, so the improvements should be available to authors and users regardless of which they use.

-- Thomas Tuegel

[1] http://hackage.haskell.org/package/testrunner

Thomas Tuegel wrote:
There have been two separate suggestions (of which I am aware) of ways to integrate tests into Cabal. One is to build the tests into their own executable which uses an error code on exit to indicate test failure.
I personally prefer this suggestion: for my graphviz library at least, I define at least one QuickCheck property that I cannot as yet actually use (as my Arbitrary instances aren't guaranteed to produce valid Dot code, so I can't run the test that attempts to pass a value through to dot/neato).
I propose to build a test suite as its own executable, but to avoid the problem of granularity by producing an output file detailing the success or failure of individual tests and any relevant error messages. The format of the file would be standardized through library routines I propose to write; these routines would run tests with HUnit or QuickCheck and process them into a common format. Cabal, or any other utility, could read this file to determine the state of the test suite. Perhaps Cabal could even warn the user about installing packages with failing tests.
All well and good, but your sample code (which I've omitted for the sake of brevity) doesn't seem to leave much room for customisation: for graphviz's test suite, I include various comments about the purpose of the test, etc., as well as changing some of QuickCheck's parameters; does your proposal allow the inclusion of such customisations?

As an aside, I question the necessity of this kind of proposal: how many people actually run tests for packages they download from Hackage? graphviz's test suite runs for 110 minutes, and I mainly use it to test that my changes/inclusions in what it prints and parses are consistent: I don't expect a user to bother running it (and would question why anyone would).

How does the inclusion of a "test" option to Cabal allow any substantial benefits to developers over building the package with a build-time flag to enable building a test suite?

-- Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

On Thu, Apr 1, 2010 at 9:36 PM, Ivan Lazar Miljenovic wrote:
I propose to build a test suite as its own executable, but to avoid the problem of granularity by producing an output file detailing the success or failure of individual tests and any relevant error messages. The format of the file would be standardized through library routines I propose to write; these routines would run tests with HUnit or QuickCheck and process them into a common format. Cabal, or any other utility, could read this file to determine the state of the test suite. Perhaps Cabal could even warn the user about installing packages with failing tests.
All well and good, but your sample code (which I've omitted for the sake of brevity) doesn't seem to leave much room for customisation: for graphviz's test suite, I include various comments about the purpose of the test, etc., as well as changing some of QuickCheck's parameters; does your proposal allow the inclusion of such customisations?
I've been looking at graphviz's test suite. You used a type
data Test = Test
    { name :: String
    , desc :: String
    , test :: IO Result -- ^ QuickCheck test.
    }
to represent tests (sorry for mangling the indentation), but this is practically the same as the type I suggested:
type Test = (String, IO (Bool, String))
The String output in the IO monad can just as easily be used to output a test description for a successful test as it can be used for an error message for a failing test. Other than that, the only difference is that you keep the entire QuickCheck Result, and I proposed just to keep a Bool (for compatibility between QuickCheck and HUnit). The use of Bool is purely illustrative; if I use one of the frameworks I've been discussing with Rogan, it will be whatever that framework uses, instead of a Bool.
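For illustration, adapting a test from that record type to the pair type is a few lines; this sketch renames the record to avoid a clash, and uses QuickCheck 2's 'isSuccess' to collapse the 'Result', reusing the description as the message string:

import Test.QuickCheck.Test (Result, isSuccess)

-- graphviz's representation, as quoted above (renamed here).
data GvTest = GvTest
    { name :: String
    , desc :: String
    , test :: IO Result -- ^ QuickCheck test.
    }

-- The representation proposed earlier in this thread.
type ProposedTest = (String, IO (Bool, String))

adapt :: GvTest -> ProposedTest
adapt t = (name t, do r <- test t
                      return (isSuccess r, desc t))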
As an aside, I question the necessity of this kind of proposal: how many people actually run tests for packages they download from Hackage? graphviz's test suite runs for 110 minutes, and I mainly use it to test that my changes/inclusions in what it prints and parses are consistent: I don't expect a user to bother running it (and would question why anyone would).
It's true that this proposal does much more for package authors than for users. I'm not proposing to require users to run tests for every package. At the same time, I'm sure that on the Linux side, distribution packagers would appreciate wider and more uniform availability of package tests; this proposal would help on both fronts.
How does the inclusion of a "test" option to Cabal allow any substantial benefits to developers over building the package with a build-time flag to enable building a test suite?
Admittedly, you stand to benefit less than a package author without such a complete test suite. At the very least, the proposed 'Test' stanza for the package description file would make it unnecessary for you to worry about recognizing a build-time flag and messing about with UserHooks. (As I said to Rogan, using the 'Test' stanza wouldn't lock you in to a particular framework: you could use it to compile your test suite as-is.) That lowers the barrier to entry for creating test suites, so more authors will use them, and that's a substantial benefit.

I also envision automated testing being offered as a service for package authors: I would like, e.g., to ensure that my package works correctly on different architectures and platforms, but I don't have the means to acquire all those different machines. Collectively, however, the community does. If we used a standard format for handling test results, we could offer a service where uploaded packages would be compiled and tested on an array of machines. Since we're using a common format, we can write one tool to parse and summarize the results across all those platforms, and authors can immediately see where their packages are failing. Certainly, any single package author could write a tool to do this for any single package's tests, but now we have a common tool everyone can use simply by writing tests.

To summarize: everyone's life is easier if we do this bit of plumbing for them.

-- Thomas Tuegel

Hello again! Based on the invaluable feedback I've received, I've made some revisions to the proposal I made a few days ago (at the end of this post, after my signature). I apologize for the length of my post, but I'd like once again to solicit feedback on this. Any commentary is very helpful!

Thanks!

-- Thomas Tuegel

Throughout this proposal, examples are given to indicate how a package author would utilize the features proposed here. In all these examples, suppose that the programmer is the author of the 'haskell-foo' package, which exposes the module 'Foo' and has a single test executable, 'foo-tests', using the QuickCheck testing library.

Package Description File Syntax

The syntax for designating test executables in package description files will be based on the existing syntax for describing executables. Such a stanza in the hypothetical package's description file would look like:
Test foo-tests
  main-is: foo-tests.hs
  build-depends: haskell-foo, Cabal, QuickCheck
This example is obviously minimal; this is really an 'Executable' stanza by another name, so any options recognized there would also be valid here.

Handling of Test Executables by Cabal

The changes proposed here will make it possible to build, test, and install a Cabal package with the usual sequence of commands:

$ cabal configure
$ cabal build
$ cabal test
$ cabal install

Cabal will recognize two new options during the 'configure' stage: '--enable-tests' and '--disable-tests'.

If 'cabal configure' is invoked with the '--enable-tests' option, then any test executables designated in the package description file will be built. For the purposes of the 'configure' and 'build' stages, they will be handled as if they were ordinary executables, i.e., described by 'Executable' stanzas. With tests enabled, the test programs will be executed and their results collected by Cabal during the 'test' stage.

If 'cabal configure' is invoked with the '--disable-tests' option (which should be the default if neither option is specified), then test executables designated in the package description file will be ignored, as if the 'Test' stanza were absent. Any attempt to invoke the 'test' stage with tests disabled should remind the user of that fact.

Regardless of the status of tests (enabled or disabled), the 'install' stage will ignore any executables designated as test suites, since it is not desirable to install the test executables.

Collection of Test Results

Cabal will provide a standard interface, residing in the module 'Distribution.Test', for running tests independent of the testing library used. A minimal outline of this module looks like:
module Distribution.Test where
type Name = String
type Result = Maybe Bool
type Info = String
type Output = String
-- 'Compiler' and 'ComponentLocalBuildInfo' are already provided by Cabal.
-- They are included here to aid in debugging test failures.
type Report = (Compiler, ComponentLocalBuildInfo, [(Name, Result, Info, Output)])
class Test t where
    wrap :: t -> IO (Result, Info)
runTests :: Test t => [(Name, t)] -> IO Report
writeResults :: Report -> IO ()
Instances of 'Test' will run a type of test from one of the testing libraries; part of this project will therefore be patching QuickCheck and HUnit to provide these instances. Any other testing library providing this instance will also be compatible with the automated testing features this proposal introduces.

The type 'Maybe Bool' is used throughout this framework to indicate a test result: 'Nothing' indicates a test was not run, 'Just False' indicates a failed test, and 'Just True' indicates a successful test. The 'Info' string captures any information provided by the testing library. However, because of the reliance of most test suites on standard output, Cabal will also capture the standard output produced during each test (when the test suite is invoked through 'cabal test'); the output will be included in the test result file.

The function 'writeResults' will write the test results to a file. The 'Show' instance for the type of its single argument will therefore constitute the standard test result file format. This has the advantage of being human- and machine-readable without requiring any extra dependencies to parse the file.

With this framework, the hypothetical package's author might write a test suite such as:
module Main where
import Distribution.Test
import Foo
import Test.QuickCheck
testBar :: Gen Bool
testBar = ...
testBaz :: Gen Bool
testBaz = ...
main = runTests [("testBar", testBar), ("testBaz", testBaz)] >>= writeResults
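For illustration, the 'Test' instance the patched QuickCheck might provide for generator-based properties could look something like this sketch (the instance head and the local restatement of the class are assumptions from the outline above; 'quickCheckResult', 'isSuccess', and 'output' are existing QuickCheck 2 functions):

{-# LANGUAGE FlexibleInstances #-}
module Distribution.Test.QuickCheck where

import Test.QuickCheck (Gen, forAll, quickCheckResult)
import Test.QuickCheck.Test (isSuccess, output)

-- Restated here so the sketch stands alone.
type Result = Maybe Bool
type Info = String

class Test t where
    wrap :: t -> IO (Result, Info)

-- Run the property; QuickCheck's own textual output becomes the Info.
instance Test (Gen Bool) where
    wrap g = do
        r <- quickCheckResult (forAll g id)
        return (Just (isSuccess r), output r)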
Reporting and Comparing Test Results

The 'cabal test' command will run tests by default, but support two other options:

1. '--report [file]', which will produce a nicely formatted report of the test results stored in the named file, or of the last run of the package's test suite if no file is specified, and

2. '--diff file1 file2', which will show the differences between the test results stored in two different files.

Because the report file format is readily parseable by any Haskell program, it could be processed into another format for compatibility with existing tools.

Rather than starting from scratch, you should strongly consider adapting something like test-framework to this task, as it has already done the heavy work of creating a way to combine tests from different frameworks into a single suite, and includes such features as displaying a progress bar during the QuickCheck tests. Furthermore, it is easily extendable to support new kinds of tests; for example, I found that it was relatively straightforward to add a new kind of "statistical" test to make sure that the average value of a function was where it should be.

Cheers,
Greg

On Apr 6, 2010, at 3:51 PM, Thomas Tuegel wrote:
Hello again!
Based on the invaluable feedback I've received, I've made some revisions to the proposal I made a few days ago (at the end of this post, after my signature). I apologize for the length of my post, but I'd like once again to solicit feedback on this. Any commentary is very helpful!
Thanks!

-- Thomas Tuegel
Throughout this proposal, examples are given to indicate how a package author would utilize the features proposed here. In all these examples, suppose that the programmer is the author of the 'haskell-foo' package, which exposes the module 'Foo' and has a single test executable, 'foo-tests', using the QuickCheck testing library.
Package Description File Syntax
The syntax for designating test executables in package description files will be based on the existing syntax for describing executables. Such a stanza in the hypothetical package's description file would look like:
Test foo-tests
  main-is: foo-tests.hs
  build-depends: haskell-foo, Cabal, QuickCheck
This example is obviously minimal; this is really an 'Executable' stanza by another name, so any options recognized there would also be valid here.
Handling of Test Executables by Cabal
The changes proposed here will make it possible to build, test, and install a Cabal package with the usual sequence of commands:
$ cabal configure
$ cabal build
$ cabal test
$ cabal install
Cabal will recognize two new options during the 'configure' stage: '--enable-tests' and '--disable-tests'.
If 'cabal configure' is invoked with the '--enable-tests' option, then any test executables designated in the package description file will be built. For the purposes of the 'configure' and 'build' stages, they will be handled as if they were ordinary executables, i.e., described by 'Executable' stanzas. With tests enabled, the test programs will be executed and their results collected by Cabal during the 'test' stage.
If 'cabal configure' is invoked with the '--disable-tests' option (which should be the default if neither option is specified), then test executables designated in the package description file will be ignored, as if the 'Test' stanza were absent. Any attempt to invoke the 'test' stage with tests disabled should remind the user of that fact.
Regardless of the status of tests (enabled or disabled), the 'install' stage will ignore any executables designated as test suites, since it is not desirable to install the test executables.
Collection of Test Results
Cabal will provide a standard interface, residing in the module 'Distribution.Test', for running tests independent of the testing library used. A minimal outline of this module looks like:
module Distribution.Test where
type Name = String
type Result = Maybe Bool
type Info = String
type Output = String
-- 'Compiler' and 'ComponentLocalBuildInfo' are already provided by Cabal.
-- They are included here to aid in debugging test failures.
type Report = (Compiler, ComponentLocalBuildInfo, [(Name, Result, Info, Output)])
class Test t where
    wrap :: t -> IO (Result, Info)
runTests :: Test t => [(Name, t)] -> IO Report
writeResults :: Report -> IO ()
Instances of 'Test' will run a type of test from one of the testing libraries; part of this project will therefore be patching QuickCheck and HUnit to provide these instances. Any other testing library providing this instance will also be compatible with the automated testing features this proposal introduces.
The type 'Maybe Bool' is used throughout this framework to indicate a test result: 'Nothing' indicates a test was not run, 'Just False' indicates a failed test, and 'Just True' indicates a successful test. The 'Info' string captures any information provided by the testing library. However, because of the reliance of most test suites on standard output, Cabal will also capture the standard output produced during each test (when the test suite is invoked through 'cabal test'); the output will be included in the test result file.
The function 'writeResults' will write the test results to a file. The 'Show' instance for the type of its single argument will therefore constitute the standard test result file format. This has the advantage of being human- and machine-readable without requiring any extra dependencies to parse the file.
With this framework, the hypothetical package's author might write a test suite such as:
module Main where
import Distribution.Test
import Foo
import Test.QuickCheck
testBar :: Gen Bool
testBar = ...
testBaz :: Gen Bool
testBaz = ...
main = runTests [("testBar", testBar), ("testBaz", testBaz)] >>= writeResults
Reporting and Comparing Test Results
The 'cabal test' command will run tests by default, but support two other options:
1. '--report [file]', which will produce a nicely formatted report of the test results stored in the named file, or of the last run of the package's test suite if no file is specified, and

2. '--diff file1 file2', which will show the differences between the test results stored in two different files.
Because the report file format is readily parseable by any Haskell program, it could be processed into another format for compatibility with existing tools.

On Tue, Apr 6, 2010 at 7:03 PM, Gregory Crosswhite wrote:
Rather than starting from scratch, you should strongly consider adapting something like test-framework to this task, as it has already done the heavy work of creating a way to combine tests from different frameworks into a single suite, and includes such features as displaying a progress bar during the QuickCheck tests. Furthermore, it is easily extendable to support new kinds of tests; for example, I found that it was relatively straightforward to add a new kind of "statistical" test to make sure that the average value of a function was where it should be.
Thanks for the suggestion! I've taken a hard look at test-framework in particular already. I like it, but the reason I haven't already chosen to go with it is that it does a lot _more_ than I'm trying to do here. The way my proposal stands, whatever framework I go with gets integrated with Cabal at least to the degree that Cabal can read test results. Rogan suggested that test-framework will shortly support JUnit XML as an output format, so with the rest of my proposal we'd be talking about adding an XML parsing library to Cabal's dependencies.

Now, if you're telling me I'm going off in the wrong direction by proposing to integrate a test framework into Cabal itself, that's another story. Should I pare down my proposal to only include support for a proper 'Test' stanza in the package description file and leave it at that?

Thanks!

-- Thomas Tuegel

On Apr 6, 2010, at 4:40 PM, Thomas Tuegel wrote:
Now, if you're telling me I'm going off in the wrong direction by proposing to integrate a test framework into Cabal itself, that's another story. Should I pare down my proposal to only include support for a proper 'Test' stanza in the package description file and leave it at that?
Yes, I think that would be a better use of your time. Furthermore, if you arrange this feature so that the programmer can use any test framework they like, it has the advantage that you are not entrenching a particular way of combining tests from different modules.

If you want there to be communication between Cabal and the test suite, it should be something simple, like passing in a location where the test results should be output (or a flag indicating that they should be written to the screen in "verbose" mode), and having Cabal in return receive from the test runner a single string with a summary of the test results to show to the user (when in "summary" as opposed to "verbose" mode), and possibly a flag if there were any failures, so that the user can be specifically alerted to this fact and pointed to the full test report.

Even if all you do is add systematic support for automatically building and running tests as part of the build process, it would still be a major improvement. It would be especially awesome if you made it possible to separate out the test framework dependencies from the package dependencies: at the moment, people who want to include tests with their package often feel the need to include dependencies on packages such as QuickCheck in order to make life easier for themselves, but if you could modify cabal so that it would only pull in such packages when the user requests that tests be run, that would be a real win!

Cheers,
Greg

On Tue, Apr 6, 2010 at 4:03 PM, Gregory Crosswhite wrote:
Rather than starting from scratch, you should strongly consider adapting something like test-framework to this task, as it has already done the heavy work of creating a way to combine tests from different frameworks into a single suite
I want to second this -- test-framework would be a good place to start, and you would be able to accomplish quite a lot more during the summer. Your proposal addresses (at least!) two different problems:

* updating cabal so that it can handle the build/test process; and,
* combining HUnit / QuickCheck / etc. to present a uniform interface.

test-framework and testrunner both address the second problem, and those solutions can be kept separate, at least for now. Figuring out the best way to specify test commands, dependencies, build/execution order, etc. is going to take some substantial effort, and I think that should be the first goal of the project. More comments in-line below...

On Apr 6, 2010, at 3:51 PM, Thomas Tuegel wrote:
Package Description File Syntax
The syntax for designating test executables in package description files will be based on the existing syntax for describing executables. Such a stanza in the hypothetical package's description file would look like:
Test foo-tests
  main-is: foo-tests.hs
  build-depends: haskell-foo, Cabal, QuickCheck
This example is obviously minimal; this is really an 'Executable' stanza by another name, so any options recognized there would also be valid here.
Cabal allows for multiple executable sections -- are multiple test sections allowed? If so, how are they handled when 'cabal test' is invoked? If not, will there be any support for running multiple test suites? (more on this below).

While the test executable could be configured to run different sets of tests (at runtime? hm.. we may need more flags to 'cabal test'), there are some situations where it's necessary to build multiple test suites because of odd library dependencies (for example, testing certain combinations of libraries -- imagine supporting multiple versions of ghc).

The existing Executable sections may serve the need fine, if we could specify how to run the tests in a different way. Perhaps a list of test commands could be specified instead, e.g.:
TestCommands: foo-test-ghc6.6, foo-test-ghc6.8, foo-props --all
Anyhow, just food for thought.
described by 'Executable' stanzas. With tests enabled, the test programs will be executed and their results collected by Cabal during the 'test' stage.
Where are the results collected, and in what format? My preference is to choose a sensible default (dist/test-results?) and allow it to be overridden in the cabal file.
module Distribution.Test where
type Name = String
type Result = Maybe Bool
I think you're reinventing the wheel a bit here (see comments above about test-framework). That aside, Result is still too restrictive. Ignored tests may well need justification (why were they not run?). There may also be multiple ways to ignore tests, and it isn't clear to me what those are, or which are important. I also feel pretty strongly that Result should distinguish between test failures and tests that failed due to exceptional circumstances. I don't know of any frameworks that do this in Haskell yet, but it has proven to be a useful distinction in other languages.

I'm not commenting on the rest of the framework proposal because I don't think the point of this SoC project is to write another testing framework.
The 'cabal test' command will run tests by default, but support two other options:
1. '--report [file]', which will produce a nicely formatted report of the test results stored in the named file, or of the last run of the package's test suite if no file is specified, and

2. '--diff file1 file2', which will show the differences between the test results stored in two different files.
See my comments about running multiple test suites, and parameterized test suites above. I think richer parameters are necessary (possibly just a --pass-through flag that hands all the subsequent parameters off to the test executable(s)).

--Rogan

On Tue, Apr 6, 2010 at 7:43 PM, Rogan Creswick wrote:
test-framework and testrunner both address the second problem, and those solutions can be kept separate, at least for now. Figuring out the best way to specify test commands, dependencies, build/execution order, etc. is going to take some substantial effort, and I think that should be the first goal of the project.
Ok, this is the bottom-line that I didn't understand after our first exchange, but I think now I do: I should entirely scrap the second aspect of my proposal and focus exclusively on making Cabal build and run test programs.
Cabal allows for multiple executable sections -- are multiple test sections allowed? If so, how are they handled when 'cabal test' is invoked? If not, will there be any support for running multiple test suites? (more on this below).
While the test executable could be configured to run different sets of tests (at runtime? hm.. we may need more flags to 'cabal test'), there are some situations where it's necessary to build multiple test suites because of odd library dependencies (for example, testing certain combinations of libraries -- imagine supporting multiple versions of ghc).
I had intended to allow multiple 'Executable' sections; that's what I meant by "this is really an 'Executable' stanza by another name". I think that takes care of your concern about building multiple test suites with different dependencies, also.

My thinking is that 'cabal test' should run all the tests that were built, and 'cabal test foo' should run only the test named 'foo'.
The existing Executable sections may serve the need fine, if we could specify how to run the tests in a different way. Perhaps a list of test commands could be specified instead, e.g.:
TestCommands: foo-test-ghc6.6, foo-test-ghc6.8, foo-props --all
Anyhow, just food for thought.
One of the reasons I prefer implementing a dedicated 'Test' stanza is that it makes it easier to tell which executables to install. There may be situations where we want to install test programs, but there will _always_ be situations where we _don't_ want to. (Maybe '--{en,dis}able-tests' passed to 'cabal install' should control this?) Could we take a 'TestCommands' list and parse out the options to get the executable names? Yes, but relying on that to work makes me uneasy. I think the more robust way to specify executable-specific options would instead be to add a field to the 'Test' section ('run-options' seems to be consistent with the naming scheme) that doesn't exist in 'Executable'.

(I've snipped some of the comments here; if we forget about test frameworks for this proposal, it's all tangential.)
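For concreteness, a stanza using the hypothetical 'run-options' field might look like this (the field name and flags are purely illustrative):

Test foo-tests
  main-is: foo-tests.hs
  build-depends: haskell-foo, QuickCheck
  run-options: --enable-bar --disable-baz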
See my comments about running multiple test suites, and parameterized test suites above. I think richer parameters are necessary. (possibly just a --pass-through flag that hands all the subsequent parameters off to the test executable(s))
That is an excellent suggestion that I will definitely adopt in my eventual proposal. Thanks!

-- Thomas Tuegel

On 7 April 2010 10:28, Thomas Tuegel wrote:
Ok, this is the bottom-line that I didn't understand after our first exchange, but I think now I do: I should entirely scrap the second aspect of my proposal and focus exclusively on making Cabal build and run test programs.
Just as food for thought: how would you deal with test programs that deal with user input? Should "cabal test" only run those that are automatic?

The reason I'm asking is that I'm developing a second test executable for graphviz that checks if it can parse "real world" Dot files. Now, I could have it do a search for all .dot and .gv files on the machine in question and try those, except not all of those might be Dot code (just something with the same extension); as such the user has to explicitly pass a list of filenames to the program. This is thus more of a guided test program rather than something that should be run automatically.

Also, I'm not sure if this has come up yet in your proposal: what about installation? One thing that Sergei (aka trofi) is trying to add in the Gentoo ebuilds for Haskell packages is something similar to this, where tests are optionally built and run; the problem is that if the tests reside in an optional executable then the installation stage will also install these test executables, because Cabal has no way of saying "build these but don't install them". This is probably going to be the case for your proposed project as well: should test executables be installed?

-- Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

On Apr 6, 2010, at 5:35 PM, Ivan Miljenovic wrote:
On 7 April 2010 10:28, Thomas Tuegel wrote:
Ok, this is the bottom-line that I didn't understand after our first exchange, but I think now I do: I should entirely scrap the second aspect of my proposal and focus exclusively on making Cabal build and run test programs.
Just as food for thought: how would you deal with test programs that deal with user input? Should "cabal test" only run those that are automatic?
Yes, I personally think that tests which are automatically run should be self-contained so that they require no additional intervention by the user. However, one could conceivably flag some tests as being "manual" so that they are only run when chosen specifically (i.e. if "foo" is manual then it is only run with "cabal run foo").

Cheers,
Greg
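In package-description terms, such a flag might look like this (the field name, its default, and the stanza are purely illustrative):

Test foo-guided
  main-is: foo-guided.hs
  automatic: False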

On 7 April 2010 10:45, Gregory Crosswhite wrote:
Yes, I personally think that tests which are automatically run should be self-contained so that they require no additional intervention by the user. However, one could conceivably flag some tests as being "manual" so that they are only run when chosen specifically (i.e. if "foo" is manual then it is only run with "cabal run foo").
Sounds good to me.

-- Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

On Apr 6, 2010, at 5:28 PM, Thomas Tuegel wrote:
On Tue, Apr 6, 2010 at 7:43 PM, Rogan Creswick wrote:
test-framework and testrunner both address the second problem, and those solutions can be kept separate, at least for now. Figuring out the best way to specify test commands, dependencies, build/execution order, etc. is going to take some substantial effort, and I think that should be the first goal of the project.
Ok, this is the bottom-line that I didn't understand after our first exchange, but I think now I do: I should entirely scrap the second aspect of my proposal and focus exclusively on making Cabal build and run test programs.
I concur with this conclusion.
Cabal allows for multiple executable sections -- are multiple test sections allowed? If so, how are they handled when 'cabal test' is invoked? If not, will there be any support for running multiple test suites? (more on this below).
While the test executable could be configured to run different sets of tests (at runtime? hm.. we may need more flags to 'cabal test'), there are some situations where it's necessary to build multiple test suites because of odd library dependencies (for example, testing certain combinations of libraries -- imagine supporting multiple versions of ghc).
I had intended to allow multiple 'Executable' sections; that's what I meant by "this is really an 'Executable' stanza by another name". I think that takes care of your concern about building multiple test suites with different dependencies, also.
My thinking is that 'cabal test' should run all the tests that were built, and 'cabal test foo' should run only the test named 'foo'.
This sounds like a good idea to me.
The existing Executable sections may serve the need fine, if we could specify how to run the tests in a different way. Perhaps a list of test commands could be specified instead, e.g.:
TestCommands: foo-test-ghc6.6, foo-test-ghc6.8, foo-props --all
Anyhow, just food for thought.
One of the reasons I prefer implementing a dedicated 'Test' stanza is that it makes it easier to tell which executables to install. There may be situations where we want to install test programs, but there will _always_ be situations where we _don't_ want to. (Maybe '--{en,dis}able-tests' passed to 'cabal install' should control this?) Could we take a 'TestCommands' list and parse out the options to get the executable names? Yes, but relying on that to work makes me uneasy. I think the more robust way to specify executable-specific options would instead be to add a field to the 'Test' section ('run-options' seems to be consistent with the naming scheme) that doesn't exist in 'Executable'.
(I've snipped some of the comments here; if we forget about test frameworks for this proposal, it's all tangential.)
I agree with your reasoning here about having a separate Test stanza so that tests can be kept separate from the rest of the package.

- Greg

I apologize for spamming, but it only just occurred to me how to get the best aspects of both our ideas:
On Tue, Apr 6, 2010 at 8:28 PM, Thomas Tuegel wrote:
On Tue, Apr 6, 2010 at 7:43 PM, Rogan Creswick wrote:
The existing Executable sections may serve the need fine, if we could specify how to run the tests in a different way. Perhaps a list of test commands could be specified instead, e.g.:
TestCommands: foo-test-ghc6.6, foo-test-ghc6.8, foo-props --all
Anyhow, just food for thought.
One of the reasons I prefer implementing a dedicated 'Test' stanza is that it makes it easier to tell which executables to install. There may be situations where we want to install test programs, but there will _always_ be situations where we _don't_ want to. (Maybe '--{en,dis}able-tests' passed to 'cabal install' should control this?) Could we take a 'TestCommands' list and parse out the options to get the executable names? Yes, but relying on that to work makes me uneasy. I think the more robust way to specify executable-specific options would instead be to add a field to the 'Test' section ('run-options' seems to be consistent with the naming scheme) that doesn't exist in 'Executable'.
Suppose we adopt your suggestion and let test programs be ordinary executables in ordinary 'Executable' sections, and make 'Test' sections that look like:
Test foo-1
  exe-is: foo
  options: --enable-bar --disable-baz
Test foo-2
  exe-is: foo
  options: --enable-bar --enable-baz
Test some-other-test
  exe-is: some-other-test
  options: --something-completely-different
Now we get named sets of options but don't have to worry about parsing through to find the executable names! It also solves the problem of running test suites with multiple sets of options at runtime.

-- Thomas Tuegel

Sounds like a good plan to me! You still need some way to separate out the test dependencies from the main build dependencies, though. Perhaps as a separate line in the main Cabal header section?

Also, per my earlier comment, it would be good if there were also an "Automatic" option in each test section that defaults to True and specifies whether a test is included automatically in "cabal test" or whether it needs to be specified manually.

Cheers,
Greg

On Apr 6, 2010, at 5:49 PM, Thomas Tuegel wrote:
I apologize for spamming, but it only just occurred to me how to get the best aspects of both our ideas:
On Tue, Apr 6, 2010 at 8:28 PM, Thomas Tuegel wrote:
On Tue, Apr 6, 2010 at 7:43 PM, Rogan Creswick wrote:
The existing Executable sections may serve the need fine, if we could specify how to run the tests in a different way. Perhaps a list of test commands could be specified instead, e.g.:
TestCommands: foo-test-ghc6.6, foo-test-ghc6.8, foo-props --all
Anyhow, just food for thought.
One of the reasons I prefer implementing a dedicated 'Test' stanza is that it makes it easier to tell which executables to install. There may be situations where we want to install test programs, but there will _always_ be situations where we _don't_ want to. (Maybe '--{en,dis}able-tests' passed to 'cabal install' should control this?) Could we take a 'TestCommands' list and parse out the options to get the executable names? Yes, but relying on that to work makes me uneasy. I think the more robust way to specify executable-specific options would instead be to add a field to the 'Test' section ('run-options' seems to be consistent with the naming scheme) that doesn't exist in 'Executable'.
Suppose we adopt your suggestion and let test programs be ordinary executables in ordinary 'Executable' sections, and make 'Test' sections that look like:
Test foo-1
  exe-is: foo
  options: --enable-bar --disable-baz
Test foo-2
  exe-is: foo
  options: --enable-bar --enable-baz
Test some-other-test
  exe-is: some-other-test
  options: --something-completely-different
Now we get named sets of options but don't have to worry about parsing through to find the executable names! It also solves the problem of running test suites with multiple sets of options at runtime.
-- Thomas Tuegel

On Tue, Apr 6, 2010 at 9:00 PM, Gregory Crosswhite wrote:
Sounds like a good plan to me!
You still need some way to separate out the test dependencies from the main build dependencies, though. Perhaps as a separate line in the main Cabal header section?
I think I can implement this inside Cabal in the conversion from GenericPackageDescription to PackageDescription, as if it were implemented all in terms of conditionals. So, if tests are enabled, the test executables are in the PackageDescription and their dependencies get pulled in; if tests are disabled, the executables get cut from the PackageDescription. Am I missing something?

-- Thomas Tuegel
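A rough sketch of that filtering step, using simplified stand-ins for Cabal's actual types (the real conversion works over conditional trees, which this ignores):

-- Stand-ins for illustration only; not Cabal's real definitions.
data Executable = Executable
    { exeName :: String
    , isTestExe :: Bool -- ^ True if this came from a 'Test' stanza.
    }

data PackageDescription = PackageDescription
    { executables :: [Executable] }

-- With tests disabled, test executables are dropped before dependency
-- resolution, so their build-depends never get pulled in.
finalize :: Bool -> PackageDescription -> PackageDescription
finalize testsEnabled pd
    | testsEnabled = pd
    | otherwise    = pd { executables = filter (not . isTestExe) (executables pd) }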

As I understand it, there is only one place in the Cabal package where *build* dependencies appear, in contrast to lines in the Library and Executable sections where the *install* dependencies appear (i.e., you might not need all of the build dependencies around merely to use the library or programs). So there would need to be a way of specifying in the .cabal file, in a convenient way (i.e., hopefully without having the package writer manually specify conditionals), that some of the *build* dependencies only apply if one is building the test packages.

Cheers,
Greg

On Apr 6, 2010, at 6:37 PM, Thomas Tuegel wrote:
On Tue, Apr 6, 2010 at 9:00 PM, Gregory Crosswhite wrote:
Sounds like a good plan to me!
You still need some way to separate out the test dependencies from the main build dependencies, though. Perhaps as a separate line in the main Cabal header section?
I think I can implement this inside Cabal in the conversion from GenericPackageDescription to PackageDescription, as if it were implemented all in terms of conditionals. So, if tests are enabled, the test executables are in the PackageDescription and their dependencies get pulled in; if tests are disabled, the executables get cut from the PackageDescription. Am I missing something?
-- Thomas Tuegel

On Tue, Apr 6, 2010 at 5:49 PM, Thomas Tuegel wrote:
Suppose we adopt your suggestion and let test programs be ordinary executables in ordinary 'Executable' sections, and make 'Test' sections that look like:
Test foo-1
  exe-is: foo
  options: --enable-bar --disable-baz
Test foo-2
  exe-is: foo
  options: --enable-bar --enable-baz
Test some-other-test
  exe-is: some-other-test
  options: --something-completely-different
Now we get named sets of options but don't have to worry about parsing through to find the executable names! It also solves the problem of running test suites with multiple sets of options at runtime.
Ah, I like this too, as well as Gregory's suggestion re: "Automatic" options.

--Rogan

I think that, rather than having Cabal try to combine the results of different frameworks, Cabal should specify interfaces that frameworks need to conform to. E.g., rather than integrating test-framework into Cabal so that HUnit works with it, modify HUnit so it emits the format that Cabal wants. And modify test-framework to emit the format that Cabal wants so, if someone can't convert their test suite to the CabalTest format, test-framework can act as an intermediary and handle the conversion of output.

Richard G.

On 10-04-06 5:03 PM, Gregory Crosswhite wrote:
Rather than starting from scratch, you should strongly consider adapting something like test-framework to this task, as it has already done the heavy work of creating a way to combine tests from different frameworks into a single suite, and includes such features as displaying a progress bar during the QuickCheck tests. Furthermore, it is easily extendable to support new kinds of tests; for example, I found that it was relatively straightforward to add a new kind of "statistical" test to make sure that the average value of a function was where it should be.
Cheers, Greg

On Wed, Apr 28, 2010 at 4:30 AM, Richard G. wrote:
I think that, rather than having Cabal try to combine the results of different frameworks, Cabal should specify interfaces that frameworks need to conform to.
E.g., rather than integrating test-framework into Cabal so that HUnit works with it, modify HUnit so it emits the format that Cabal wants. And modify test-framework to emit the format that Cabal wants so, if someone can't convert their test suite to the CabalTest format, test-framework can act as an intermediary and handle the conversion of output.
I think this is what we've ultimately decided to do, although we have yet to decide exactly what format Cabal should expect test results to be in. I realize that it's a little difficult to follow this discussion, since the proposal was in a state of flux. I think you can read my proposal at the GSoC site, but the proposal submission form kinda mangled my formatting. There is a public Google Documents version of my proposal at https://docs.google.com/Doc?docid=0AZzNFnSY9FOeZGd6MnQ4cWNfM2Q2N2J0OWZn&hl=en which should be up-to-date and contain all the information you need.

-- Thomas Tuegel

On Tue, 2010-04-06 at 18:51 -0400, Thomas Tuegel wrote:
Hello again!
Based on the invaluable feedback I've received, I've made some revisions to the proposal I made a few days ago (at the end of this post, after my signature). I apologize for the length of my post, but I'd like once again to solicit feedback on this. Any commentary is very helpful!
Hi Thomas.
Package Description File Syntax
The syntax for designating test executables in package description files will be based on the existing syntax for describing executables. Such a stanza in the hypothetical package's description file would look like:
Test foo-tests
  main-is: foo-tests.hs
  build-depends: haskell-foo, Cabal, QuickCheck
One feature that I consider to be vital (and as Cabal maintainer I get to say that kind of thing! ;-) ) is that the stanza must specify the "testing interface" that it supports. The importance of this is that it lets us develop improved testsuite interfaces in future.

At the moment there are two test interfaces we want to support. One is the simple unix style "exit code + stdout" interface. This is good because it is a lowest common denominator that all existing testsuites can fit into.

Of course that test interface does not provide any detailed machine-readable information (though you do get human-readable test logs). So that's why we want a second interface. That one should let the "testing agent" (for example "cabal test" but could be other agents) get much more detail about what tests can be run and then what the results are of various tests. The details of such an interface are up for discussion. I do not mind if that is a command line executable interface or a library interface.
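For reference, a testsuite satisfying the simple "exit code + stdout" interface needs nothing beyond something like this sketch (the test table is a stand-in; a real suite would run HUnit, QuickCheck, etc.):

module Main where

import Control.Monad (unless)
import System.Exit (exitFailure)

-- Stand-in tests, purely for illustration.
tests :: [(String, Bool)]
tests = [ ("reverse-involutive", reverse (reverse "abc") == "abc") ]

-- Report on stdout; signal overall failure through the exit code.
main :: IO ()
main = do
    mapM_ (\(n, ok) -> putStrLn (n ++ ": " ++ (if ok then "pass" else "FAIL"))) tests
    unless (all snd tests) exitFailure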
Handling of Test Executables by Cabal
The changes proposed here will make it possible to build, test, and install a Cabal package with the usual sequence of commands:
That all sounds reasonable. I'd like to add that the interface between the testsuite and a testing agent such as the cabal program should be clearly documented and specified. There will likely be dedicated test agents that want to run the tests too and send reports to other systems (e.g. dedicated hackage test agents) and convert to other formats (e.g. integration with companies' in-house systems). A "cabal test" user interface is obviously great for developers.

Gregory makes a good suggestion about using or adapting the existing test-framework package. That was also something I was thinking about. It would be good to work with the maintainer of the test-framework package so that it can be used to implement the interface that Cabal specifies. Duncan

On 4/7/10 12:33 PM, Duncan Coutts wrote:
The importance of this is that it lets us develop improved testsuite interfaces in future. At the moment there are two test interfaces we want to support. One is the simple unix style "exit code + stdout" interface. This is good because it is a lowest common denominator that all existing testsuites can fit into.
I don't know how, but maybe http://hackage.haskell.org/package/shelltestrunner can help in this niche ?

On Wed, Apr 7, 2010 at 3:33 PM, Duncan Coutts
The importance of this is that it lets us develop improved testsuite interfaces in future. At the moment there are two test interfaces we want to support. One is the simple unix style "exit code + stdout" interface. This is good because it is a lowest common denominator that all existing testsuites can fit into.
Of course that test interface does not provide any detailed machine-readable information (though you do get human-readable test logs). So that's why we want a second interface. That one should let the "testing agent" (for example "cabal test" but could be other agents) get much more detail about what tests can be run and then what the results are of various tests.
For the purpose of differentiating between these two, would a field in the test section such as "interface: stdout" (in the first case) or "interface: detailed" (in the second) suffice?
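For concreteness, such a field might appear in a stanza like this (a sketch only; the field name and its values are just the suggestion above, not settled syntax):

Test foo-tests
    interface: stdout
    main-is: foo-tests.hs
    build-depends: haskell-foo, QuickCheck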
The details of such an interface are up for discussion. I do not mind if that is a command line executable interface or a library interface.
That's something I've been thinking about. The former seems more portable. Maybe cabal could call the test program with "test_program --list" to produce a list of tests and "test_program test1,test2,test3" to run some tests.

I also want to ask how strictly this is within the scope of the SoC project, i.e. how well will my proposal be received if it focuses primarily on the first part of the problem (getting everything working for the stdout interface)? I ask because the detailed interface seems like a much larger mandate given that cabal doesn't really support any of the syntax/features for the simple stdout interface.
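A rough sketch of the executable side of the "--list" convention floated above (every name and flag here is hypothetical; the thread never fixed the details):

module Main where

import Control.Monad (unless)
import System.Environment (getArgs)
import System.Exit (exitFailure)

-- Hypothetical named tests; each action returns True on success.
tests :: [(String, IO Bool)]
tests = [ ("test1", return True)
        , ("test2", return True)
        , ("test3", return True)
        ]

-- Split "test1,test2,test3" into ["test1","test2","test3"].
splitCommas :: String -> [String]
splitCommas s = case break (== ',') s of
  (x, [])       -> [x]
  (x, _ : rest) -> x : splitCommas rest

main :: IO ()
main = do
  args <- getArgs
  case args of
    ["--list"] -> mapM_ (putStrLn . fst) tests  -- enumerate without running
    [names]    -> runSelected (splitCommas names)
    _          -> runSelected (map fst tests)   -- default: run everything
  where
    runSelected names = do
      ok <- mapM snd (filter ((`elem` names) . fst) tests)
      unless (and ok) exitFailure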
Handling of Test Executables by Cabal
The changes proposed here will make it possible to build, test, and install a Cabal package with the usual sequence of commands:
That all sounds reasonable. I'd like to add that the interface between the testsuite and a testing agent such as the cabal program should be clearly documented and specified. There will likely be dedicated test agents that want to run the tests too and send reports to other systems (e.g. dedicated hackage test agents) and convert to other formats (e.g. integration in companies in-house systems).
Rogan mentioned possible upcoming support in test-framework for JUnit XML as an output format for test results. That certainly seems to be widely supported; do you think it is suitable? -- Thomas Tuegel

On Wed, 2010-04-07 at 16:09 -0400, Thomas Tuegel wrote:
On Wed, Apr 7, 2010 at 3:33 PM, Duncan Coutts
wrote: The importance of this is that it lets us develop improved testsuite interfaces in future. At the moment there are two test interfaces we want to support. One is the simple unix style "exit code + stdout" interface. This is good because it is a lowest common denominator that all existing testsuites can fit into.
Of course that test interface does not provide any detailed machine-readable information (though you do get human-readable test logs). So that's why we want a second interface. That one should let the "testing agent" (for example "cabal test" but could be other agents) get much more detail about what tests can be run and then what the results are of various tests.
For the purpose of differentiating between these two, would a field in the test section such as "interface: stdout" (in the first case) or "interface: detailed" (in the second) suffice?
Yep.
The details of such an interface are up for discussion. I do not mind if that is a command line executable interface or a library interface.
That's something I've been thinking about. The former seems more portable. Maybe cabal could call the test program with "test_program --list" to produce a list of tests and "test_program test1,test2,test3" to run some tests.
Having some ideas is good. The details of the interface can be worked out during the project.
I also want to ask how strictly this is within the scope of the SoC project, i.e. how well will my proposal be received if it focuses primarily on the first part of the problem (getting everything working for the stdout interface)? I ask because the detailed interface seems like a much larger mandate given that cabal doesn't really support any of the syntax/features for the simple stdout interface.
Certainly we want to get the first part working first. My guess is that there is time within the 3-month GSoC period to complete the basic bits and to at least have a go at a prototype of a more detailed interface. That part doesn't need to be final, especially given that it will be possible to create new interfaces in future.
Handling of Test Executables by Cabal
The changes proposed here will make it possible to build, test, and install a Cabal package with the usual sequence of commands:
That all sounds reasonable. I'd like to add that the interface between the testsuite and a testing agent such as the cabal program should be clearly documented and specified. There will likely be dedicated test agents that want to run the tests too and send reports to other systems (e.g. dedicated hackage test agents) and convert to other formats (e.g. integration in companies in-house systems).
Rogan mentioned possible upcoming support in test-framework for JUnit XML as an output format for test results. That certainly seems to be widely supported; do you think it is suitable?
I think it's important to be able to convert into standard or custom formats. I've no idea if JUnit XML would make sense as the native format. It's plausible. Duncan

On Thu, Apr 8, 2010 at 5:53 AM, Duncan Coutts
I think it's important to be able to convert into standard or custom formats. I've no idea if JUnit XML would make sense as the native format. It's plausible.
I hadn't really thought about cabal, itself, being a consumer for test results -- but I like your (Duncan's) points about defining a testing interface, and keeping it extensible. For the record: I don't think junit xml is a good choice for a native format :), but I do think it's a good format to start with simply because there are many tools that can consume it already. --Rogan
Duncan

My thanks to all of you for your help! I've submitted my proposal as of this afternoon. I've done my best to ensure that the fruits of this discussion are represented there. As an aside, Google's form has seriously mangled my formatting; if anyone here has past experience and/or pointers, I'd love to hear them! Thanks! -- Thomas Tuegel

I think that formatted plain-text output would be much better than XML, something that is human-readable and relatively easy to parse via machine. Something similar to the GHC error output would work well because developers are familiar with it.

Test <n>:<Result> <Location> <Error message>

E.g.,
Test 1:Passed src/Some/File.hs:23
Test 2:Failed src/Some/File.hs:27 Expecting `4'; received `5'.
Test 3:Error src/Some/OtherFile.hs:39 Unexpected exception.

This would keep the complexity low in Cabal and allow for easy transformation to XML. Richard G. On 10-04-08 8:30 PM, Rogan Creswick wrote:
On Thu, Apr 8, 2010 at 5:53 AM, Duncan Coutts
wrote: I think it's important to be able to convert into standard or custom formats. I've no idea if JUnit XML would make sense as the native format. It's plausible.
I hadn't really thought about cabal, itself, being a consumer for test results -- but I like your (Duncan's) points about defining a testing interface, and keeping it extensible.
For the record: I don't think junit xml is a good choice for a native format :), but I do think it's a good format to start with simply because there are many tools that can consume it already.
--Rogan
Duncan

On 28 April 2010 09:24, Richard G.
I think that formatted plain-text output would be much better than XML, something that is human-readable and relatively easy to parse via machine. Something similar to the GHC error output would work well because developers are familiar with it.
I have previously advocated a library interface as a detailed testsuite interface (in addition to a lowest common denominator interface of stdio+exitcode). That is, a test stanza in a package's .cabal file would specify a module containing an entry point of the right type (like main but using a more interesting type). That way, cabal or any other tool could run the testsuite and produce results in whatever format it likes. As you suggest in your other post, it would make sense to adapt test-framework to implement the interface specified by Cabal. Duncan

On Wed, Apr 28, 2010 at 4:54 AM, Duncan Coutts
On 28 April 2010 09:24, Richard G.
wrote: I think that formatted plain-text output would be much better than XML, something that is human-readable and relatively easy to parse via machine. Something similar to the GHC error output would work well because developers are familiar with it.
I have previously advocated a library interface as a detailed testsuite interface (in addition to a lowest common denominator interface of stdio+exitcode). That is, a test stanza in a package's .cabal file would specify a module containing an entry point of the right type (like main but using a more interesting type).
That way, cabal or any other tool could run the testsuite and produce results in whatever format it likes.
I appreciate the elegance of this method, but it seems to me that it requires dynamic loading, which is currently in a sorry state. One way or another, cabal will need to provide a data structure it expects test suites to use for results. Is there a substantial advantage to a library interface, versus providing Read/Show instances for the test result data structure?
As you suggest in your other post, it would make sense to adapt test-framework to implement the interface specified by Cabal.
I agree, as well; this is essentially the approach I took in my proposal. -- Thomas Tuegel
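For comparison, the Read/Show approach Thomas asks about could be as small as the following (a sketch; the type and field names are invented for illustration):

-- A serialisable result type that a test executable could write out
-- and any testing agent could read back with 'read'.
data TestResult = TestResult
  { resultName    :: String  -- name of the individual test
  , resultPassed  :: Bool    -- whether it passed
  , resultMessage :: String  -- any error message it produced
  } deriving (Read, Show, Eq)

writeResults :: FilePath -> [TestResult] -> IO ()
writeResults path = writeFile path . show

readResults :: FilePath -> IO [TestResult]
readResults path = fmap read (readFile path)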

On Wed, 2010-04-28 at 10:42 -0400, Thomas Tuegel wrote:
On Wed, Apr 28, 2010 at 4:54 AM, Duncan Coutts
wrote: On 28 April 2010 09:24, Richard G.
wrote: I think that formatted plain-text output would be much better than XML, something that is human-readable and relatively easy to parse via machine. Something similar to the GHC error output would work well because developers are familiar with it.
I have previously advocated a library interface as a detailed testsuite interface (in addition to a lowest common denominator interface of stdio+exitcode). That is a test stanza in a package .cabal file would specify a module containing an entry point of the right type (like main but using a more interesting type).
That way, cabal or any other tool could run the testsuite and produce results in whatever format it likes.
I appreciate the elegance of this method, but it seems to me that it requires dynamic loading, which is currently in a sorry state.
Actually it doesn't require dynamic loading. It just requires compiling a stub program that imports the user's library and some test-runner code. Cabal is good at doing that kind of thing already (e.g. Setup.hs scripts).
One way or another, cabal will need to provide a data structure it expects test suites to use for results. Is there a substantial advantage to a library interface, versus providing Read/Show instances for the test result data structure?
Yes, it means the testing agent (cabal-install or some other program/system) can do more than simply run all the tests. It means it can enumerate them and not run them (think a GUI or web interface), run a subset of tests, run them in parallel etc. Duncan
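A sketch of the stub Duncan has in mind; everything here is hypothetical (the module and runner names stand in for whatever the final design specifies):

-- Generated by Cabal at build time; compiled like any other executable.
module Main where

import qualified FooTests                      -- the module named in the Test stanza
import Distribution.TestRunner (runTestSuite)  -- hypothetical runner library

main :: IO ()
main = runTestSuite FooTests.tests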

On Wed, Apr 28, 2010 at 8:19 AM, Duncan Coutts
Yes, it means the testing agent (cabal-install or some other program/system) can do more than simply run all the tests. It means it can enumerate them and not run them (think a GUI or web interface), run a subset of tests, run them in parallel etc.
I'm not convinced that this should be cabal's responsibility.

I think we would be better served by leaving this up to the test frameworks (indeed, test-framework has test filtering capabilities already). If 'cabal test' simply acts as a thin layer between the user/invoking system and the test framework, then we could pass arguments through to the underlying test binary and perform these tasks using whatever interface that test binary provides. This will buy us more flexibility in the long run. (I think this is at least a good place to start -- and matches my interpretation of Thomas's proposal.)

If Cabal takes on these responsibilities, then the testing API will be more constrained -- we won't be able to experiment with new test formats/methodologies as easily, since any tests will have to meet a specific API.

While I agree that we need standardization, I think that we should achieve that by using compatible output formats and compatible (user) interfaces (and enforcing those with tests, schema checkers, etc.). I don't see many benefits to baking this functionality into cabal when it could be done separately. --Rogan

If the goal is continuous integration, perhaps it would be sufficient to require "cabal test" to return an error code of 0 if all tests succeed, and something else if any of them fail; it can additionally print whatever output it wants in either case. The continuous integration system would then run "cabal test" after the build, and if it succeeded (error code 0) say nothing, and if it failed (error code something else) it would report that the build failed and show the output from "cabal test" to give details to the developer. Cheers, Greg On Apr 28, 2010, at 12:55 PM, Rogan Creswick wrote:
On Wed, Apr 28, 2010 at 8:19 AM, Duncan Coutts
wrote: Yes, it means the testing agent (cabal-install or some other program/system) can do more than simply run all the tests. It means it can enumerate them and not run them (think a GUI or web interface), run a subset of tests, run them in parallel etc.
I'm not convinced that this should be cabal's responsibility.
I think we would be better served by leaving this up to the test frameworks (indeed, test-framework has test filtering capabilities already). If 'cabal test' simply acts as a thin layer between the user/invoking system and the test framework, then we could pass arguments through to the underlying test binary and perform these tasks using whatever interface that test binary provides. This will buy us more flexibility in the long run. (I think this is at least a good place to start -- and matches my interpretation of Thomas's proposal.)
If Cabal takes on these responsibilities, then the testing api will be more constrained -- we won't be able to experiment with new test formats/methodologies as easily, since any tests will have to meet a specific API.
While I agree that we need standardization, I think that we should achieve that by using compatible output formats and compatible (user) interfaces (and enforcing those with tests, schema checkers, etc..). I don't see many benefits to baking this functionality into cabal when it could be done separately.
--Rogan
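For what it's worth, the exit-code convention Gregory describes above is straightforward to drive from a continuous-integration script; a minimal sketch (assuming only that "cabal test" exits non-zero on failure, as he proposes):

module Main where

import System.Exit (ExitCode (..), exitWith)
import System.Process (readProcessWithExitCode)

main :: IO ()
main = do
  (code, out, err) <- readProcessWithExitCode "cabal" ["test"] ""
  case code of
    ExitSuccess   -> return ()  -- all tests passed: say nothing
    ExitFailure _ -> do         -- report the failure with the captured output
      putStrLn "cabal test failed:"
      putStr out
      putStr err
      exitWith code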

On Wed, Apr 28, 2010 at 12:55 PM, Rogan Creswick
On Wed, Apr 28, 2010 at 8:19 AM, Duncan Coutts
wrote: Yes, it means the testing agent (cabal-install or some other program/system) can do more than simply run all the tests. It means it can enumerate them and not run them (think a GUI or web interface), run a subset of tests, run them in parallel etc.
I'm not convinced that this should be cabal's responsibility.
I think we would be better served by leaving this up to the test frameworks (indeed, test-framework has test filtering capabilities already). If 'cabal test' simply acts as a thin layer between the user/invoking system and the test framework, then we could pass arguments through to the underlying test binary and perform these tasks using whatever interface that test binary provides. This will buy us more flexibility in the long run. (I think this is at least a good place to start -- and matches my interpretation of Thomas's proposal.)
That is more or less how I intended my proposal to be read, the caveat being that I intentionally said very little about the detailed test suite interface. I agree that we should leave much up to the testing frameworks. If we start implementing facilities in Cabal to pick and choose specific tests from inside test suites, we're essentially writing yet another test framework into Cabal; I've been specifically discouraged from doing that since I began discussing this proposal on the list.

I am increasingly of the opinion that we should just provide a simple stdout/exit code interface and let test frameworks handle the rest:

If developers want continuous integration with existing testing tools, they can use a framework that supports the output format those tools use, and pipe that output to stdout to be captured by Cabal. Then they can turn whichever tool they want loose on the output file.

If developers want to independently run subsets of their tests, they can give them independent test stanzas in the .cabal file. Either they put the tests in different executables, or the test framework can provide command-line options for turning tests on and off.

Those are the big two usage scenarios we've discussed for the detailed test interface, and I think these examples demonstrate why it may be unnecessary. -- Thomas Tuegel
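Under the stanza syntax proposed earlier, splitting a suite across independent stanzas would look something like this (file and package names are hypothetical):

Test foo-unit-tests
    main-is: unit-tests.hs
    build-depends: haskell-foo, HUnit

Test foo-properties
    main-is: properties.hs
    build-depends: haskell-foo, QuickCheck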

On Wed, 2010-04-28 at 09:55 -0700, Rogan Creswick wrote:
On Wed, Apr 28, 2010 at 8:19 AM, Duncan Coutts
wrote: Yes, it means the testing agent (cabal-install or some other program/system) can do more than simply run all the tests. It means it can enumerate them and not run them (think a GUI or web interface), run a subset of tests, run them in parallel etc.
I'm not convinced that this should be cabal's responsibility.
"Cabal" should define the interface between testsuite and test runner. Nothing more. Packages should provide collections of tests. Testing agents should provide a test runner to actually run the tests and do something with the results. See for example test-framework which has exactly this decomposition between testsuites (a collection of tests) and a test runner. They are mediated by a common interface. The test-framework package provides both the interface, some adapters for QC/HUnit to provide tests, and also a sample test runner that prints results to the console. Tools like cabal-install that use the interface defined by Cabal can provide a test runner (almost certainly implemented in some other package) and then do something interesting with the results like showing them to the user or uploading them to hackage. Other tools can use other test runners and do other interesting things.
I think we would be better served by leaving this up to the test frameworks (indeed, test-framework has test filtering capabilities already). If 'cabal test' simply acts as a thin layer between the user/invoking system and the test framework, then we could pass arguments through to the underlying test binary and perform these tasks using whatever interface that test binary provides. This will buy us more flexibility in the long run. (I think this is at least a good place to start -- and matches my interpretation of Thomas's proposal.)
If Cabal takes on these responsibilities, then the testing api will be more constrained -- we won't be able to experiment with new test formats/methodologies as easily, since any tests will have to meet a specific API.
It is exactly defining this API that should allow flexibility. It means packages can provide tests and have them be used in multiple different ways by different test runners with different purposes and capabilities. If all you can do is run the testsuite and collect the results, that's much more constrained.

Note that we're also proposing a lowest-common-denominator testsuite interface that gives all the control to the package author, but means the test runner cannot interpret the results in any interesting way. The main point is to wrap any existing testsuite in a way that allows it to be run automatically and non-interactively for e.g. hackage testing. Duncan
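The decomposition Duncan describes boils down to a small vocabulary of types; a sketch of the shape only, not a proposed API (all names are invented):

-- What a package provides: a collection of named, runnable tests.
data Test = Test
  { testName :: String
  , runTest  :: IO TestOutcome
  }

data TestOutcome = Passed | Failed String

type TestSuite = [Test]

-- What a testing agent provides: some way of consuming a suite,
-- e.g. running all of it, a subset, or merely enumerating the names.
type TestRunner a = TestSuite -> IO a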

I like this. One area where this would be helpful is the ability to run the tests when different compile flags are used. E.g., the HUnit tests have different behaviors when compiled with and without optimization; it would be very handy if I could automate the testing of both cases.

I don't believe that testing of multiple compile flags should be done inside Cabal. Instead, the arguments that are passed to `cabal configure' should also be used to build the test programs. This would allow a simple script, or a more complex build system, to handle the testing of both cases. Richard On 10-04-28 9:19 AM, Duncan Coutts wrote:
I have previously advocated a library interface as a detailed testsuite interface (in addition to a lowest common denominator interface of stdio+exitcode). That is, a test stanza in a package's .cabal file would specify a module containing an entry point of the right type (like main but using a more interesting type).
That way, cabal or any other tool could run the testsuite and produce results in whatever format it likes.
I appreciate the elegance of this method, but it seems to me that it requires dynamic loading, which is currently in a sorry state.
Actually it doesn't require dynamic loading. It just requires compiling a stub program that imports the user's library and some test-runner code. Cabal is good at doing that kind of thing already (e.g. Setup.hs scripts).
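Richard's suggestion above, reusing the configure arguments when building the tests, might look like this in a simple script (a sketch using the ./Setup interface from the original proposal; --ghc-options is an existing configure flag):

$ ./Setup configure --ghc-options=-O0
$ ./Setup build && ./Setup test    # run the suite without optimization
$ ./Setup configure --ghc-options=-O2
$ ./Setup build && ./Setup test    # run the same suite with optimization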

On Wed, Apr 28, 2010 at 1:24 AM, Richard G.
I think that formatted plain-text output would be much better than XML, something that is human-readable and relatively easy to parse via machine. Something similar to the GHC error output would work well because developers are familiar with it.
I don't think we need to be limited to a single output format. It's a simple thing to have continuous integration (or cabal) invoke tests with a flag/option to output in a specific format. XML is useful because there are a number of mature tools that already expect xml -- we don't need to reinvent the wheel to get some of the capabilities that developers in other languages are enjoying if our tools use some of the same formats (despite the issues that may exist with those formats..). I like your suggestion for an emacs/dev-readable format, and it can coexist with xml and other "snazzier" outputs (such as the default format for test-framework, which uses many little tricks to draw and erase progress bars / etc.) --Rogan
Test <n>:<Result> <Location> <Error message>
E.g.,
Test 1:Passed src/Some/File.hs:23
Test 2:Failed src/Some/File.hs:27 Expecting `4'; received `5'.
Test 3:Error src/Some/OtherFile.hs:39 Unexpected exception.
This would keep the complexity low in Cabal and allow for easy transformation to XML.
Richard G.
On 10-04-08 8:30 PM, Rogan Creswick wrote:
On Thu, Apr 8, 2010 at 5:53 AM, Duncan Coutts
wrote: I think it's important to be able to convert into standard or custom formats. I've no idea if JUnit XML would make sense as the native format. It's plausible.
I hadn't really thought about cabal, itself, being a consumer for test results -- but I like your (Duncan's) points about defining a testing interface, and keeping it extensible.
For the record: I don't think junit xml is a good choice for a native format :), but I do think it's a good format to start with simply because there are many tools that can consume it already.
--Rogan
Duncan

Richard G. wrote:
I think that formatted plain-text output would be much better than XML, something that is human-readable and relatively easy to parse via machine. Something similar to the GHC error output would work well because developers are familiar with it.
Test <n>:<Result> <Location> <Error message>
E.g.,
Test 1:Passed src/Some/File.hs:23
Test 2:Failed src/Some/File.hs:27 Expecting `4'; received `5'.
Test 3:Error src/Some/OtherFile.hs:39
This is the format Emacs parses and lets you jump right to the corresponding file and position.
Unexpected exception.
This would keep the complexity low in Cabal and allow for easy transformation to XML.
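To gauge how cheap Richard's format is to consume, here is a sketch of a parser for it (assuming each record, message included, sits on one line as in the template above; the type and function names are invented):

-- One parsed line of the proposed report format.
data Outcome = Passed | Failed | Errored deriving (Show, Eq)

data Report = Report
  { reportNumber   :: Int
  , reportOutcome  :: Outcome
  , reportLocation :: String  -- e.g. "src/Some/File.hs:27"
  , reportMessage  :: String  -- e.g. "Expecting `4'; received `5'."
  } deriving Show

-- Parse "Test <n>:<Result> <Location> <Error message>" lines,
-- silently skipping anything that does not match.
parseReports :: String -> [Report]
parseReports = foldr step [] . lines
  where step l acc = maybe acc (: acc) (parseLine l)

parseLine :: String -> Maybe Report
parseLine l = case words l of
  ("Test" : numRes : loc : msg) -> do
    let (numStr, rest) = break (== ':') numRes
    n <- case reads numStr of
           [(n', "")] -> Just n'
           _          -> Nothing
    o <- case drop 1 rest of
           "Passed" -> Just Passed
           "Failed" -> Just Failed
           "Error"  -> Just Errored
           _        -> Nothing
    Just (Report n o loc (unwords msg))
  _ -> Nothing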
participants (10)
- Duncan Coutts
- Gregory Crosswhite
- Henning Thielemann
- Ivan Lazar Miljenovic
- Ivan Miljenovic
- Jason Dagit
- Richard G.
- Rogan Creswick
- Simon Michael
- Thomas Tuegel