The amount of CPP we have to use is getting out of hand

Hi,

(This was initially written as a Google+ post, but I'm reposting it here to raise awareness of the issue.)

The amount of CPP we have to use in Haskell is getting a bit out of hand. Here are the numbers of modules, per library, that use CPP for some of the libraries I maintain:

containers 18/18
hashable 4/5
unordered-containers 6/9
network 3/7
cassava 4/16
cabal/cabal-install 13/75
cabal/Cabal 7/78
ekg 1/15

If this doesn't look like a lot to you (I hope it does!) consider that some languages don't use CPP at all (e.g. Java).

CPP really sucks from a maintenance perspective:

* It's not Haskell, but a bizarre string concatenation language.
* The code is harder to read, bitrots more easily, and is harder to test.
* The code can't be compiled without using Cabal (which generates some of the CPP macros for us). This hurts e.g. ad-hoc testing/benchmarking.

There are a couple of reasons we use CPP, but the main one is breaking changes in GHC and the libraries we depend on. We need to reduce these kinds of breakages in the future. Dealing with breakages and maintaining the resulting CPP-ed code is costing us time we could spend on other things, such as improving our libraries or writing new ones. I for one would like to get on with writing applications instead of spending time on run-of-the-mill libraries.

Often these breaking changes are made in the name of "making things cleaner". Breaking changes, no matter how well intended, don't make code cleaner; they make it less clean*. Users end up having to use *both* the old "unclean" API *and* the new "clean" API.

The right way to evolve an API is to add new functions and data types, not modify old ones, whenever possible.

* It takes about 3 major GHC releases (~3 years) before you can remove the CPP, but since new things keep breaking all the time you always have a considerable amount of CPP.

-- Johan
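[Editor's note: to make the pattern concrete for readers who haven't hit it, here is a small, hypothetical compatibility module of the kind being described. MIN_VERSION_base is one of the macros Cabal generates, which is why such a module won't compile outside a Cabal build; the module name is invented, but bool did reach Data.Bool in base 4.7.]

```haskell
{-# LANGUAGE CPP #-}
-- Hypothetical compatibility shim: 'bool' was only added to Data.Bool
-- in base 4.7, so older GHCs need a local fallback definition.
module Compat (bool) where

#if MIN_VERSION_base(4,7,0)
import Data.Bool (bool)
#else
-- | bool f t p evaluates to f when p is False, and to t when p is True.
bool :: a -> a -> Bool -> a
bool f t p = if p then t else f
#endif
```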

+1, you have my full support and agreement.
On Fri, Jan 9, 2015, 3:57 PM Johan Tibell
wrote: [snip]
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

I agree in principle; however, I'm not sure how feasible it is in
practice. I quickly grepped through some of our (internal and public)
packages and checked why we need CPP (I only checked MIN_VERSION_foo
macros):
data type change: 11
newly added function/type: 6
added instance: 4
function change/rename: 3
function remove: 1
deprecated function: 1
As you can see, most instances are either data type changes (these are
almost all in Template Haskell or haskell-src-exts) or newly added
functions where we also want to support old versions without the
function. So avoiding changing or removing functions would reduce the
CPP usage in our case by about 15%, which is welcome, but doesn't
really change anything fundamental. For data type changes, I don't
really see an alternative: if new features are added to the AST, the
AST type changes. For additions of functions, I guess we could use
a local definition even for newer versions, but that makes it less
clear when you can remove it. For instance additions, I again see no
alternative.
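[Editor's note: the "local definition even for newer versions" option can be sketched concretely. This is a hypothetical illustration (isLeft was added to Data.Either in base 4.7): instead of guarding an import with MIN_VERSION_base, define a primed local copy unconditionally, trading CPP for a little duplication and, as noted, losing any marker for when the copy can be deleted.]

```haskell
-- Instead of:
--   #if MIN_VERSION_base(4,7,0)
--   import Data.Either (isLeft)
--   #else
--   ... local fallback ...
--   #endif
-- define a local copy unconditionally; the prime avoids clashing
-- with the real isLeft on newer versions of base.
isLeft' :: Either a b -> Bool
isLeft' (Left _) = True
isLeft' _        = False
```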
So while I think we can do slightly better, and I would love it if
that happened, it's probably not going to be significant.
Erik
P.S. Just to add some more data, here are the packages we're CPPing for:
9 base
7 template_haskell
4 network
4 haskell_src_exts
2 uuid
2 time
1 wai
1 json_schema
1 containers
1 HTTP
On Fri, Jan 9, 2015 at 3:00 PM, Michael Snoyman
wrote: [snip]

If you would like to compute the CPP usage for your modules, you can use
this command:
for lib in hashable cabal/Cabal cabal/cabal-install containers \
    unordered-containers cassava ekg network; do
  echo $lib
  find $lib -type d \( -name tests -o -name benchmarks -o -name dist \
      -o -name .cabal-sandbox -o -name tests-ghc \) -prune \
    -o -name Setup.hs -prune \
    -o -name '*.hs' -exec grep -l 'LANGUAGE.*CPP' {} \; | wc -l
  find $lib -type d \( -name tests -o -name benchmarks -o -name dist \
      -o -name .cabal-sandbox -o -name tests-ghc \) -prune \
    -o -name Setup.hs -prune \
    -o -name '*.hs' -print | wc -l
done
Replace the list in the 'in' clause with your list of packages (which
should all be in a per-package directory under $CWD).
On Fri, Jan 9, 2015 at 2:55 PM, Johan Tibell
wrote: [snip]

To complete the list, here are my other three packages:
network-uri 1/1
ekg-statsd 1/3
ekg-core 1/8
On Fri, Jan 9, 2015 at 3:46 PM, Johan Tibell
wrote: [snip]

On 09 Jan 2015, at 14:55, Johan Tibell wrote: [snip]
Hi,

I'm an outsider, so this could probably sound naive, but: why not think about an in-language feature to solve the problems addressed by CPP? I think these all fall under: enable some top-level declaration only if the XYZ feature is available.

This feature should allow the user to specify different definitions of the same symbol depending on the availability of compiler features, but also _module_ features. So if I can declare and export a newFunc function from my module only if DataKinds is supported, I can say so explicitly instead of relying on GHC version X.Y.Z. On the other hand, the users of my module can decide whether to compile some code depending on the fact that my module exports the function or not.

This should not be limited to "the module exports the function". Other kinds of "features" could be tested for, and modules should be able to declare new features worth testing in this way. For example, if in the 2.0 version of my module I've increased the laziness of my data structure, I can export the feature "MyModule.myFunc is lazy" (encoded in some way). Then the user can decide which implementation of their algorithm to use depending on this.

I think a system like this would remove most of the maintenance burden, because:

- Dependencies on library features are explicit, and the version numbers needed to support them can be inferred by cabal. For example, cabal could support a syntax like containers(with:foo_is_lazy) instead of containers >= x.y.z.
- GHC can automatically warn about features that are supported by all currently supported versions of GHC, so that the checks can be removed.
- Code is more testable, because the test suite could run tests multiple times, each time "faking" the availability of certain features, with GHC supporting a "fake old version mode" where it simply pretends not to know about a certain extension (not at all a "compatibility mode", to be clear). As for library features, integrating with cabal sandboxes, one could automatically switch library versions to run the tests with.
- Other?

I repeat: I'm an outsider to the world of Haskell package maintenance, so I could be missing something obvious. Hope this is useful though.
Bye, Nicola

I wonder how much of the CPP functionality could be implemented using Template Haskell?

Nicola Gigante
On 09 Jan 2015, at 14:55, Johan Tibell wrote:
[snip]
Hi
I'm an outsider, so this could probably sound naive, but: why not think about an in-language feature to solve the problems addressed by CPP?
This might be a good time to bring in the data point provided by Rust [1], where the attribute system is used to allow conditional compilation. For instance,

#[cfg(not(a_feature))]
pub fn my_function() { ... }

#[cfg(a_feature)]
pub fn my_function() { ... }

The build system can then detect whether the feature in question is available, and potentially pass `-f a_feature` to the compiler. `cfg` items can also have string values which can be tested for equality (although I think they intend on extending this at some point).

This works well for them as it is flexible and fits nicely into the language, reusing the attribute syntax that Rust users are already familiar with. The closest thing Haskell has to this is the conventional `{-# ... #-}` pragma syntax, but leveraging this would almost certainly require compiler support and a language extension.

Cheers,

- Ben

[1] http://doc.rust-lang.org/reference.html#conditional-compilation
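[Editor's note: for comparison, the nearest Haskell/Cabal approximation available today still routes through CPP: a Cabal flag sets a define, and the module tests it. The flag and macro names below are invented for illustration.]

```haskell
-- In the .cabal file (hypothetical flag):
--
--   flag a-feature
--     default: False
--
--   library
--     if flag(a-feature)
--       cpp-options: -DA_FEATURE
--
-- In the module, selected at compile time via CPP:
{-# LANGUAGE CPP #-}
#if defined(A_FEATURE)
myFunction :: Int -> Int
myFunction = (+ 1)   -- feature-enabled variant
#else
myFunction :: Int -> Int
myFunction = id      -- fallback variant
#endif
```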

On 01/09/2015 08:55 AM, Johan Tibell wrote:
[snip]
I looked at a few of these, and some of the CPP could be avoided. Whether or not the alternatives involve more work -- well, you be the judge.

1. TESTING constant. Here CPP is used to export internal stuff during testing, for example:

module Data.Set (
#if !defined(TESTING)
      Set
#else
      Set(..)
#endif

This allows you to put the tests in a separate module, but give them access to internal functions. I'm torn on which solution is better, but I've settled on putting the tests in the module with the functions they test. You then have to depend on e.g. tasty, but who cares -- Cabal should be running the test suites anyway and bail out if they fail. That's (half of..) what they're for. You also have to import the Test.Crap in each module, but this bothers me less than I thought it would. If you use doctest to test your examples, then those tests have to go in the module with the functions themselves, so at that point there's no additional uncleanliness felt.

2. Optionally enable new features with newer GHCs. One example:

#if MIN_VERSION_base(4,8,0)
import Data.Coerce
#endif

These are better addressed with git branches. Do your development on the master branch targeting the latest GHC, but also keep a branch for older GHCs. The master branch would have "import Data.Coerce", but the "old_ghc" branch would not. It doesn't produce much extra work -- git is designed to do exactly this. Whenever you make a new commit on master, it's trivial to merge it back into the old_ghc branch.

Suppose your library foo is at version 1.5.0 when a new GHC is released. You can use the master branch for 1.6.0, using the new features. The next time you make a release, just release two new packages: 1.5.1 and 1.6.1, targeting the old and new GHC respectively. This way you at least *work* off of a clean code base. Your new tricks in the master branch just look like a patch on top of what's in the old_ghc branch.
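[Editor's note: spelled out as a complete module (names invented), the TESTING trick in point 1 looks like the sketch below. The test suite's stanza in the .cabal file passes cpp-options: -DTESTING, so it, and only it, sees the constructors.]

```haskell
{-# LANGUAGE CPP #-}
-- Hypothetical minimal module using the TESTING export trick:
-- normal clients get the abstract type; the test suite, built
-- with -DTESTING, also gets the constructor.
module Data.IntBag
  (
#if !defined(TESTING)
    IntBag
#else
    IntBag(..)
#endif
  , empty
  ) where

newtype IntBag = IntBag [Int]

empty :: IntBag
empty = IntBag []
```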

Johan, I hear and agree. Even though I've never used CPP in my packages, I've read code that does ---and it's horrible. And I, too, have experienced new GHC stable versions breaking foundational libraries, including Cabal. But, my impression was that most of these breakages are due to certain GHC extensions being deprecated, and not because the compiler stops respecting the standard. Is my understanding correct? If so, then why not disable extensions and limit yourself to Haskell 2010? Yes, you get used to the good stuff quickly, and it's painful to give it up --- but isn't that what standards are for?

It is not my experience that encouraging stagnation in library APIs
will either reduce CPP or improve the community.
My project modules using CPP:
0/12
0/6
3/9 - Changes to base/tagged, conditionally including expensive
code needed by few users
10/116 - gmp arch specific issues, RecursiveDo vs DoRec,
bitSizeMaybe, debug trace
0/7
4/5 - Platform specific code selection
0/1
0/7
1/4 - Architecture specific unsafeness for performance gains
5/207 - File and line number enhanced error messages
0/1
On Fri, Jan 9, 2015 at 2:45 PM, Andrey Chudnov
wrote: [snip]
participants (9)
- Andrey Chudnov
- Ben Gamari
- David Fox
- Erik Hesselink
- Johan Tibell
- Michael Orlitzky
- Michael Snoyman
- Nicola Gigante
- Thomas DuBuisson