
OK, so suppose you sit down and write a complicated string parser. Now how do you test that it works correctly? One way is to run the parser over a large corpus of "real world" samples. This has a number of problems: * If the parser is for a grammar you just invented yourself, presumably no "real world" corpus exists yet. * This tests that the most commonly used features work OK, but might not test whether more obscure features work right. * There's no really obvious way to check that what the parser returns is /correct/. You've got the input, but there's nothing to compare with. * This probably doesn't test whether the parser /fails/ for invalid input, since a real world corpus would presumably consist entirely of valid input. Most people's library of choice appears to be QuickCheck. But while it's not hard to have QuickCheck generate random strings and confirm that the returned parse tree (if any) doesn't violate any invariants, it's not so easy to check that the parse tree is actually what it should be. On top of that, depending on what the grammar is, the vast majority of random strings are probably just parse failures. Even if they aren't, there are probably specific constructs that are special-cased in the parser, which are very unlikely to appear by chance. (Things like keyword names, special combinations of punctuation, etc.) You can try to write your own text generator that constructs text more closely approximating what your parser is supposed to parse. But then how do you tell whether your generator is right? If you have a function that turns a parse tree back into text again, you can try verifying that a round-trip is the identity function. Except perhaps sometimes it isn't. Perhaps a given expression has more than one equivalent representation. A round-trip from string and back again is even less likely to be stable. So what's the best way to attack this problem?

On 11/06/11 14:10, Andrew Coppin wrote:
OK, so suppose you sit down and write a complicated string parser. Now how do you test that it works correctly?
If you have a function that turns a parse tree back into text again, you can try verifying that a round-trip is the identity function. Except perhaps sometimes it isn't. Perhaps a given expression has more than one equivalent representation. A round-trip from string and back again is even less likely to be stable.
So what's the best way to attack this problem?
_
If your parse tree has a "show" instance (or better yet, a pretty-print function) then you can generate random parse trees, print them, and then show that the parse returns an equal tree. However if you want to have useful error messages or a wider range of representations than just those generated by "show" then you will need to write a QuickCheck variant on the "show" function that emits all these variations, which is likely to be rather more work.

Choices, choices.
The first one is to use unit tests. Look at the grammar and make sure the obvious stuff fails or succeeds. "a + b", "a :+ b", etc. You can do this at the Haskell level with parser objects.
Next you can write small samples to test things the unit tests did not. Compare the output to known results. There are numerous examples of this in the open source world you can either reuse or crib from.
Personally I think QuickCheck is more work than it is worth here.
On Jun 11, 2011, at 8:18, Paul Johnson
On 11/06/11 14:10, Andrew Coppin wrote:
OK, so suppose you sit down and write a complicated string parser. Now how do you test that it works correctly?
If you have a function that turns a parse tree back into text again, you can try verifying that a round-trip is the identity function. Except perhaps sometimes it isn't. Perhaps a given expression has more than one equivalent representation. A round-trip from string and back again is even less likely to be stable.
So what's the best way to attack this problem?
_
If your parse tree has a "show" instance (or better yet, a pretty-print function) then you can generate random parse trees, print them, and then show that the parse returns an equal tree.
However if you want to have useful error messages or a wider range of representations than just those generated by "show" then you will need to write a QuickCheck variant on the "show" function that emits all these variations, which is likely to be rather more work.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Sat, Jun 11, 2011 at 10:06 AM, Sean Perry
Choices, choices.
The first one is to use unit tests. Look at the grammar and make sure the obvious stuff fails or succeeds. "a + b", "a :+ b", etc. You can do this at the Haskell level with parser objects.
Next you can write small samples to test things the unit tests did not. Compare the output to known results. There are numerous examples of this in the open source world you can either reuse or crib from.
Don't just test the "obvious" things. If the parse tree is truly a tree, it is an initial algebra and amenable to case analysis as an initial algebra. In other words, you can do a proof by induction, just by checking all the base cases.

On Sat, 11 Jun 2011 10:34:47 -0700
Alexander Solla
On Sat, Jun 11, 2011 at 10:06 AM, Sean Perry
wrote: Choices, choices.
The first one is to use unit tests. Look at the grammar and make sure the obvious stuff fails or succeeds. "a + b", "a :+ b", etc. You can do this at the Haskell level with parser objects.
Next you can write small samples to test things the unit tests did not. Compare the output to known results. There are numerous examples of this in the open source world you can either reuse or crib from.
Don't just test the "obvious" things. If the parse tree is truly a tree, it is an initial algebra and amenable to case analysis as an initial algebra. In other words, you can do a proof by induction, just by checking all the base cases.
Wouldn't also be reasonable for the test cases to be generated from the grammar automagically ? Seems much more useful than attempting to do it by hand. Seems like something which would have already been done, i.e. there's already a tool out there that does that ? Brian

On Sat, Jun 11, 2011 at 7:10 AM, Andrew Coppin
If you have a function that turns a parse tree back into text again, you can try verifying that a round-trip is the identity function. Except perhaps sometimes it isn't. Perhaps a given expression has more than one equivalent representation. A round-trip from string and back again is even less likely to be stable.
I find that this kind of testing works quite nicely in practice with QuickCheck, in the sense that it does a good job of finding bugs in both the unparser and the parser. It's possible, but unlikely, that you'll have a bug in the parser and a bug in the unparser that cancel one another out. Generate the ASTs; parse . unparse ought to be id, or perhaps t = t . parse . unparse for some simple transformation 't'. For example, if the parser annotates AST nodes with source positions, you may need 't' to strip them out. Kevin -- Kevin Charter kevin.charter@acm.org
participants (6)
-
Alexander Solla
-
Andrew Coppin
-
briand@aracnet.com
-
Kevin Charter
-
Paul Johnson
-
Sean Perry