Code that writes code

I'm working on a small Haskell package. One module in particular contains so much boilerplate that rather than write the code myself, I wrote a small Haskell program that autogenerates it for me. What's the best way to package this for Cabal? Just stick the generated file in there? Or is there some (easy) way to tell Cabal how to recreate this file itself?

My preferred approach is to check the generation script into source
control, and add it to Cabal's extra-source-files section. If the
generated file is a standard .hs module, Cabal should add it to the
sdist automatically.
You might want to add a note to the README documenting how to
regenerate the module, so anybody branching your code knows which file
to modify.
There are ways to make Cabal automatically generate a file, but they
seem to assume an external preprocessor operating on a (single input
file -> single output file) model. More complicated systems, such as
running a separate script or (many files -> many files) don't work
well.
On Thu, Aug 19, 2010 at 14:00, Andrew Coppin wrote:
I'm working on a small Haskell package. One module in particular contains so much boilerplate that rather than write the code myself, I wrote a small Haskell program that autogenerates it for me.
What's the best way to package this for Cabal? Just stick the generated file in there? Or is there some (easy) way to tell Cabal how to recreate this file itself?
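A minimal sketch of the kind of generator John describes. It assumes a hypothetical `Gen.hs` kept in the repository and listed under extra-source-files, which prints the module to stdout. The `Box` record and its field names are made up for illustration. The input list and the templates both live in the generator, so any change is made there and the output is regenerated rather than patched.

```haskell
-- Gen.hs (hypothetical generator, checked into source control).
-- Regenerate with:  runghc Gen.hs > src/Generated.hs
import Data.Char (toUpper)
import Data.List (intercalate)

-- The generator's input; a real project might read this from a data file.
fields :: [String]
fields = ["width", "height", "depth"]

capitalise :: String -> String
capitalise []       = []
capitalise (c : cs) = toUpper c : cs

-- Boilerplate emitted once per field: a setter with its type signature.
genSetter :: String -> String
genSetter f = unlines
  [ "set" ++ capitalise f ++ " :: Int -> Box -> Box"
  , "set" ++ capitalise f ++ " v b = b { " ++ f ++ " = v }"
  ]

-- The whole module, with a header telling readers not to edit it by hand.
generateModule :: [String] -> String
generateModule fs = unlines $
  [ "-- AUTOGENERATED by Gen.hs: edit the generator, not this file."
  , "module Generated where"
  , ""
  , "data Box = Box { " ++ intercalate ", " [f ++ " :: Int" | f <- fs] ++ " }"
  , ""
  ] ++ map genSetter fs

main :: IO ()
main = putStr (generateModule fields)
```

Running it prints a module containing `data Box` plus `setWidth`, `setHeight` and `setDepth`. That printed file is what gets committed and shipped in the sdist.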

Maybe not helpful to you at this stage, but...
An alternative to generating source code is to factor out the common "boilerplate" elements into separate functions, suitably parameterized, and to use higher order functions to stitch these together.
An example of this kind of approach, which is handled by code generation in some other languages (e.g. lex, yacc, etc.), is the Parsec combinator-based parsing library (http://www.haskell.org/haskellwiki/Parsec): instead of generating code, the syntax "rules" are written directly as Haskell functions, which assemble the common underlying repeated logic dynamically, behind the scenes.
I adopted a development of this approach for a programme with a built-in scripting language that I implemented some time ago: the scripting language was parsed using Parsec, not into a syntax tree, but directly into a dynamically assembled function that could be applied to some data to perform the scripted function (http://www.ninebynine.org/RDFNotes/Swish/Intro.html).
What I'm trying to point out here is that, rather than go through the step of generating source code and feeding it back into a Haskell compiler, it may be possible to use higher order functions to directly assemble the required logic within a single program. For me, this is one of the great power-features of functional programming, which I now tend to use where possible in other languages that support functions as first class values.
#g --
Andrew Coppin wrote:
I'm working on a small Haskell package. One module in particular contains so much boilerplate that rather than write the code myself, I wrote a small Haskell program that autogenerates it for me.
What's the best way to package this for Cabal? Just stick the generated file in there? Or is there some (easy) way to tell Cabal how to recreate this file itself?
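The following sketch follows Graham's description. A toy expression language, chosen only for illustration and supporting just `x`, integer literals, `+`, `*` and parentheses, is parsed straight into an `Int -> Int` function. No syntax tree is built and no source code is generated. The sketch uses `Text.ParserCombinators.ReadP` from `base` instead of Parsec so it needs no extra packages, but the combinator style is the same.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- A "compiled" script: not a syntax tree, just a function of its input x.
type Script = Int -> Int

-- Run a parser, then skip any trailing whitespace.
token :: ReadP a -> ReadP a
token p = p <* skipSpaces

symbol :: Char -> ReadP Char
symbol = token . char

-- Every rule yields a Script; operators combine Scripts pointwise,
-- so the result of parsing is already the function to run.
expr, term, factor :: ReadP Script
expr = chainl1 term (symbol '+' >> return (\f g x -> f x + g x))
term = chainl1 factor (symbol '*' >> return (\f g x -> f x * g x))
factor = variable +++ literal +++ parens
  where
    variable = symbol 'x' >> return id
    literal = do
      digits <- token (munch1 isDigit)
      return (const (read digits))
    parens = between (symbol '(') (symbol ')') expr

-- Nothing unless the whole input parses.
compile :: String -> Maybe Script
compile src =
  case [f | (f, "") <- readP_to_S (skipSpaces *> expr <* eof) src] of
    [f] -> Just f
    _   -> Nothing

main :: IO ()
main = case compile "x * x + 3" of
  Just f  -> print (map f [0, 1, 2])
  Nothing -> putStrLn "parse error"
```

Here `main` prints `[3,4,7]`. The "script" is parsed once into a function that is then applied to several inputs. The repeated logic lives in `chainl1` and the two small combining lambdas instead of in generated code.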

Graham Klyne
[...] rather than go through the step of generating source code and feeding it back into a Haskell compiler, it may be possible to use higher order functions to directly assemble the required logic within a single program. For me, this is one of the great power-features of functional programming [...]
I agree one hundred percent, and that's also what I stress when I teach. But of course this has to be balanced with the observation that in current Haskell, not everything is a value. Functions are, but modules and types are not. That's why you cannot directly handle them programmatically. So you either rewrite the program (unify the "similar" modules/types) or resort to syntactic manipulation (as a compiler pass, like Template Haskell, or by external processors), which has the severe downside of losing static typechecking (even if the generator is type-checked, you cannot be sure that its output is type-safe).
Anyway, the original poster asked about Cabal integration. For that, code generation in the compiler (Template Haskell) certainly is easier than external processors. The gtk2hs project also needs to generate boilerplate, and they put their generators into a separate package, http://hackage.haskell.org/package/gtk2hs-buildtools, that you need to cabal-install beforehand. (Somewhat strangely, gtk2hs-buildtools is not a dependency of gtk? Is that because Cabal packages cannot depend on executables?)
J.W.
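This is a rough illustration of J.W.'s point that Template Haskell keeps generation inside the compiler, so Cabal needs no extra build step. The sketch generates three small functions, `add1` to `add3` (made-up names), with a top-level splice. Because of GHC's stage restriction, the generating code has to live inside the splice itself or in a separately imported module. It cannot be an ordinary top-level function in the same module.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH

-- Top-level splice: runs at compile time and emits the declarations
--   add1 x = x + 1;  add2 x = x + 2;  add3 x = x + 3
$(do
    let mkAdder :: Integer -> Q Dec
        mkAdder i = do
          x <- newName "x"
          let name = mkName ("add" ++ show i)
              body = InfixE (Just (VarE x)) (VarE '(+)) (Just (LitE (IntegerL i)))
          return (FunD name [Clause [VarP x] (NormalB body) []])
    mapM mkAdder [1, 2, 3])

main :: IO ()
main = print (add1 10, add2 10, add3 (10 :: Integer))
```

This prints `(11,12,13)`. The generated declarations never exist as a source file, so there is nothing to check in or keep in sync.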

I would also like to strongly discourage code generators.
Any code that has to be "generated" can and should have its common characteristics separated out, with only the unique characteristics remaining, typically with an interface (i.e. type class) or polymorphic type dividing the two, creating a separation of concerns (this is really just abstraction).
Every software project which I've worked on that used a code generator turned into a nightmare, because when we find we need to change something about the generator's output, all the already generated code has to be updated manually while at the same time maintaining all of the unique modifications that have been made since the code was first generated. It's a horrible duplication of program logic and maintenance work.
Of course code generation is perfectly fine when the output is not intended to be read and maintained by a human. For example, a compiler is technically a code generator, but it is purely for optimization purposes and the output is not intended to then be maintained by a human manually. A compiler might unroll a loop, repeating the loop body a hundred times and causing obvious duplication of logic, but it's fine because the assembler output is not intended to be maintained by a human, only the source input is. Efficiency and maintainability cannot be satisfied at the same time, which is why assembly sucks (not maintainable) and so do dynamic/scripting languages (not efficient), and compiled languages like Haskell are awesome (source code is highly maintainable, compiler output is highly efficient).
Anyway, from my experience, if you're generating code intended to be maintained by a human, you're doing it wrong. Though I am very interested to hear counter examples.
Jesse
On 20/08/2010 6:17 PM, Graham Klyne wrote:
Maybe not helpful to you at this stage, but...
An alternative to generating source code is to factor out the common "boilerplate" elements into separate functions, suitably parameterized, and to use higher order functions to stitch these together.
An example of this kind of approach, which is handled by code generation in some other languages (e.g. lex, yacc, etc), is the Parsec combinator-based parsing library (http://www.haskell.org/haskellwiki/Parsec) - instead of generating code, the syntax "rules" are written directly using Haskell functions and assemble the common underlying repeated logic dynamically, behind the scenes.
I adopted a development of this approach for a programme with a built-in scripting language that I implemented some time ago: the scripting language was parsed using Parsec, not into a syntax tree, but directly into a dynamically assembled function that could be applied to some data to perform the scripted function (http://www.ninebynine.org/RDFNotes/Swish/Intro.html).
What I'm trying to point out here is that, rather than go through the step of generating source code and feeding it back into a Haskell compiler, it may be possible to use higher order functions to directly assemble the required logic within a single program. For me, this is one of the great power-features of functional programming, which I now tend to use where possible in other languages that support functions as first class values.
#g --
Andrew Coppin wrote:
I'm working on a small Haskell package. One module in particular contains so much boilerplate that rather than write the code myself, I wrote a small Haskell program that autogenerates it for me.
What's the best way to package this for Cabal? Just stick the generated file in there? Or is there some (easy) way to tell Cabal how to recreate this file itself?
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
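The following minimal sketch shows the separation Jesse describes, using made-up `Attribute`, `Color` and `Width` types. The per-type (unique) facts live in small instances. The shared rendering logic is written once against the type class instead of being stamped out for each type by a generator.

```haskell
-- The unique part of each type: supplied by a small instance.
class Attribute a where
  attrName  :: a -> String
  attrValue :: a -> String

-- The common part: written once, works for every instance.
render :: Attribute a => a -> String
render a = attrName a ++ "=" ++ show (attrValue a)

newtype Color = Color String
newtype Width = Width Double

instance Attribute Color where
  attrName _          = "color"
  attrValue (Color c) = c

instance Attribute Width where
  attrName _          = "width"
  attrValue (Width w) = show w

main :: IO ()
main = mapM_ putStrLn [render (Color "red"), render (Width 2.5)]
```

This prints `color="red"` and `width="2.5"`. Adding a new attribute means writing one instance, and changing the output format means editing `render` in one place.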

Jesse Schalken wrote:
I would also like to strongly discourage code generators.
Any code that has to be "generated" can and should have its common characteristics separated out, with only the unique characteristics remaining, typically with an interface (i.e. type class) or polymorphic type dividing the two, creating a separation of concerns (this is really just abstraction).
All the facilities of my family of types are accessed through type classes already.
Every software project which I've worked on that used a code generator turned into a nightmare, because when we find we need to change something about the generator's output, all the already generated code has to be updated manually while at the same time maintaining all of the unique modifications that have been made since the code was first generated. It's a horrible duplication of program logic and maintenance work.
Uh... why edit the generated code? Why not just modify the generator and rerun it? That's kind of the entire *point* of using a code generator rather than writing it all by hand...
Anyway, from my experience if you're generating code intended to be maintained by a human, you're doing it wrong. Though I am very interested to hear counter examples.
Yeah, I do get the feeling that if your code is repetitive enough to require automation, you're probably doing it wrong...

Jesse Schalken
Anyway, from my experience if you're generating code intended to be maintained by a human, you're doing it wrong. Though I am very interested to hear counter examples.
*ahem* http://code.haskell.org/graphviz/utils/AttributeGenerator.hs
Yes, it's ugly; the point is that every time I want to edit what I do with the different Attributes in graphviz (e.g. I'm thinking of having "smart" Gen functions for QuickCheck tests that will generate specific types of Attributes suitable for DotNodes, etc.), I don't have to do so by hand and possibly edit code that's already there: I merely add the appropriate code-writing functions to that script, remove the already existing code, run the script and paste the result.
Note that this is code that is generated at _write_ time, not build time; as much as possible the code is also generated to be human-readable: http://hackage.haskell.org/packages/archive/graphviz/2999.10.0.1/doc/html/sr... (but I have stuffed up in the past and tried editing it by hand, only to have breakages because I did it piecemeal rather than all in one go, which the script allows me to do). That is, the script is just there to help me out rather than doing some fancy auto-generation stuff which I have no control over.
-- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

On Aug 22, 2010, at 7:34 PM, Jesse Schalken wrote:
I would also like to strongly discourage code generators.
I've used ad hoc code generators a lot, and never had reason to regret it. The key point is that ALL maintenance of the generated code must be done by maintaining the generator and its input, NOT by patching the output.
Every software project which I've worked on that used a code generator turned into a nightmare, because when we find we need to change something about the generator's output, all the already generated code has to be updated manually while at the same time maintaining all of the unique modifications that have been made since the code was first generated. It's a horrible duplication of program logic and maintenance work.
If you need to change something about a generator's output, you do it (always!) by changing the generator's input, or by changing the generator. Then you *re*generate the code. There should never *be* any "unique modifications" to the output of a code generator.
Of course code generation is perfectly fine when the output is not intended to be read and maintained by a human.
"Read" and "maintained" are two different issues. Depending on the tool-chain, it may be necessary for people to read the generated code while debugging.

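One way to enforce Richard's rule mechanically is a small checker, sketched below under the assumption of a hypothetical `CheckFresh.hs` run in CI. It compares the checked-in generated file with a freshly regenerated copy and fails on the first differing line, so any hand patch to the output is caught. The program name and its two-file command line are illustrative only.

```haskell
-- CheckFresh.hs (hypothetical). Usage:
--   runghc Gen.hs > fresh.hs && runghc CheckFresh.hs src/Generated.hs fresh.hs
import System.Environment (getArgs)
import System.Exit (exitFailure)

-- | Nothing if both texts have identical lines; otherwise the 1-based
-- number of the first line that differs (a missing line counts as different).
firstStaleLine :: String -> String -> Maybe Int
firstStaleLine expected actual =
  case [n | (n, e, a) <- zip3 [1 ..] (padded expected) (padded actual), e /= a] of
    (n : _) -> Just n
    []      -> Nothing
  where
    len      = max (length (lines expected)) (length (lines actual))
    padded s = take len (map Just (lines s) ++ repeat Nothing)

main :: IO ()
main = do
  args <- getArgs
  case args of
    [checkedIn, fresh] -> do
      actual   <- readFile checkedIn
      expected <- readFile fresh
      case firstStaleLine expected actual of
        Nothing -> putStrLn "generated code is up to date"
        Just n  -> do
          putStrLn (checkedIn ++ ":" ++ show n ++ ": differs from generator output; regenerate it")
          exitFailure
    _ -> putStrLn "usage: CheckFresh <checked-in file> <freshly generated file>"
```

The check deliberately has no way to accept a "unique modification". If it fails, the fix is to change the generator or its input and regenerate.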
In the past I've taken a hybrid approach: ad-hoc fixes are made to the
generated code, but they are applied via 'patch' as an automated step, and the
diff is stored in source control with everything else.
You'll need extra tool support to build the diff, as well.
It's still really brittle, and I would only recommend it if you don't own
the input data you're generating from, and other consumers rely on it being
in its current state.
Take care,
Antoine
On Aug 24, 2010 4:51 PM, "Richard O'Keefe" wrote:
On Aug 22, 2010, at 7:34 PM, Jesse Schalken wrote:
I would also like to strongly discourage code generators.
I've used ad hoc code generators a lot, and never had reason to regret it.
The key point is that ALL maintenance of the generated code must be done by maintaining the generator and its input, NOT by patching the output.
Every software project which I've worked on that used a code generator turned into a nightmare, because when we find we need to change something about the generator's output, all the already generated code has to be updated manually while at the same time maintaining all of the unique modifications that have been made since the code was first generated. It's a horrible duplication of program logic and maintenance work.
If you need to change something about a generator's output, you do it (always!) by changing the generator's input, or by changing the generator. Then you *re*generate the code. There should never *be* any "unique modifications" to the output of a code generator.
Of course code generation is perfectly fine when the output is not intended to be read and maintained by a human.
"Read" and "maintained" are two different issues. Depending on the tool-chain, it may be necessary for people to read the generated code while debugging.

Quoth "Richard O'Keefe"
Of course code generation is perfectly fine when the output is not intended to be read and maintained by a human.
"Read" and "maintained" are two different issues. Depending on the tool-chain, it may be necessary for people to read the generated code while debugging.
Indeed, for my modest purposes it has been easy enough to generate code in a conventional, readable format - and honestly it's better self-documentation than the code I write directly. The mechanical consistency of generated code makes it especially transparent. Donn Cave, donn@avvanta.com

On 25 August 2010 07:51, Richard O'Keefe wrote:
On Aug 22, 2010, at 7:34 PM, Jesse Schalken wrote:
Every software project which I've worked on that used a code generator turned into a nightmare, because when we find we need to change something about the generator's output, all the already generated code has to be updated manually while at the same time maintaining all of the unique modifications that have been made since the code was first generated. It's a horrible duplication of program logic and maintenance work.
If you need to change something about a generator's output, you do it (always!) by changing the generator's input, or by changing the generator. Then you *re*generate the code. There should never *be* any "unique modifications" to the output of a code generator.
Yes, and if your ad-hoc changes cannot be expressed in the actual generated code, then you may wish to re-think what you're generating (what's the point of generating it if you have to edit it anyway?). -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

On 8/24/10 17:51, Richard O'Keefe wrote:
On Aug 22, 2010, at 7:34 PM, Jesse Schalken wrote:
I would also like to strongly discourage code generators.
I've used ad hoc code generators a lot, and never had reason to regret it.
The key point is that ALL maintenance of the generated code must be done by maintaining the generator and its input, NOT by patching the output.
I have one additional exception: I will sometimes autogenerate a skeleton by e.g. grepping over a source tree. This has the opposite constraint: once generated, you don't regenerate it unless you can characterize it well enough that you can treat it as above; but usually it's a one-time code refactoring.
-- brandon s. allbery [linux,solaris,freebsd,perl] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH
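Brandon's one-off case might look something like the sketch below. It scans the source files named on the command line for top-level `data`/`newtype` declarations, using a crude textual match as a stand-in for grep. It then prints instance stubs for a hypothetical `Pretty` class. The output is meant to be pasted in once and then edited by hand, not regenerated.

```haskell
import Data.Char (isUpper)
import Data.List (isPrefixOf)
import System.Environment (getArgs)

-- Names of types declared at the start of a line with "data" or "newtype".
declaredTypes :: String -> [String]
declaredTypes src =
  [ name
  | l <- lines src
  , kw <- ["data ", "newtype "]
  , kw `isPrefixOf` l
  , (name@(c : _) : _) <- [words (drop (length kw) l)]
  , isUpper c
  ]

-- A stub to be filled in by hand after the one-time generation.
skeleton :: String -> String
skeleton ty = unlines
  [ "instance Pretty " ++ ty ++ " where"
  , "  pretty = error \"TODO: pretty " ++ ty ++ "\""
  ]

main :: IO ()
main = do
  paths <- getArgs
  srcs  <- mapM readFile paths
  putStr (concatMap skeleton (concatMap declaredTypes srcs))
```

Because the stubs are edited immediately afterwards, rerunning the script would overwrite real work. This is the "opposite constraint" Brandon mentions.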

Check out the userHooks in Cabal[1]. I believe you can use, e.g.
hookedPreProcessors[2], or preBuild to preprocess your files into
regular Haskell files before building takes place.
[1]: http://www.haskell.org/ghc/docs/6.12.1/html/libraries/Cabal/Distribution-Sim...
[2]: http://www.haskell.org/ghc/docs/6.12.1/html/libraries/Cabal/Distribution-Sim...
On 19 August 2010 23:00, Andrew Coppin wrote:
I'm working on a small Haskell package. One module in particular contains so much boilerplate that rather than write the code myself, I wrote a small Haskell program that autogenerates it for me.
What's the best way to package this for Cabal? Just stick the generated file in there? Or is there some (easy) way to tell Cabal how to recreate this file itself?
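The following is a rough sketch of the `preBuild` route Christopher mentions. It assumes `build-type: Custom` with base, Cabal and process available to the setup script, `runghc` on the PATH, and a hypothetical generator script `gen/Gen.hs` that prints the module on stdout. It regenerates `src/Generated.hs` before every build and then defers to the default hook from `simpleUserHooks`.

```haskell
-- Setup.hs: sketch only. Assumes build-type: Custom, setup dependencies on
-- base, Cabal and process, and a hypothetical gen/Gen.hs that prints the
-- generated module on stdout.
import Distribution.Simple (UserHooks (..), defaultMainWithHooks, simpleUserHooks)
import System.Process (readProcess)

main :: IO ()
main = defaultMainWithHooks simpleUserHooks
  { preBuild = \args flags -> do
      -- Regenerate the module before every build.
      generated <- readProcess "runghc" ["gen/Gen.hs"] ""
      writeFile "src/Generated.hs" generated
      -- Then defer to the default pre-build hook.
      preBuild simpleUserHooks args flags
  }
```

Because the file is rewritten on every build, it should never be edited by hand. The generator script still needs to be listed in extra-source-files so that it ends up in the sdist.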
participants (11)
- Andrew Coppin
- Antoine Latter
- Brandon S Allbery KF8NH
- Christopher Done
- Donn Cave
- Graham Klyne
- Ivan Lazar Miljenovic
- Jesse Schalken
- Johannes Waldmann
- John Millikin
- Richard O'Keefe