
On Saturday 16 October 2010 00:03:42, Jordan Ellis wrote:
I'm trying (really trying!) to get my head around Haskell, but one (among many) significant stumbling blocks for me so far has been trying to figure out how my code is actually going to perform. Here's a contrived example of the sort of thing that confuses me.

Let's say I have a function which produces a list of odd integers from 1 to n:

    f n = filter (odd) [1..n]

...and I have another function that makes a list of all sums of pairs of odd numbers from 1 to n:

    g n = [a + b | a <- (f n), b <- (f n)]

I assume that in most languages, (f n) would get called twice (once for a, and once for b), but what happens in Haskell? Does GHC recognize that, since f is a pure function, it only has to be evaluated once for a given n?
That depends. If you compile without optimisations, the list f n won't be shared (that may of course change in future versions); if you compile with optimisations, it will be shared.

GHC does very little common subexpression elimination, and CSE is not always a good thing. In the case above it's fine for small n, but consider what happens for large n. If the list is not shared, every a gets a new application of f (at least, that's GHC's current behaviour), and the algorithm runs in constant space. If the list is shared, the entire list remains in memory after it has been constructed for the first a, so the algorithm runs in O(n) space (though it runs faster, unless n is really large). In such cases, you can usually suppress undesired sharing with -fno-cse.
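For concreteness, here is a minimal sketch of the unshared shape being discussed (the module wrapper, type signatures and main are mine, just to make it a compilable example):

    module Main where

    f :: Int -> [Int]
    f n = filter odd [1..n]

    -- f n appears twice: compiled without -O, the inner list is typically
    -- rebuilt for every a (constant space); compiled with -O, CSE may share
    -- one list across both generators, keeping it in memory (O(n) space).
    g :: Int -> [Int]
    g n = [a + b | a <- f n, b <- f n]

    main :: IO ()
    main = print (sum (g 1001))

Comparing a -O0 build with a -O2 build of something like this (e.g. looking at the maximum residency reported by +RTS -s) is an easy way to see the space difference for yourself.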
If not, would a 'where clause' make any difference? E.g.,

    g n = [a + b | a <- h, b <- h]
      where h = f n
Yes when compiled without optimisations, no when compiled with. In general, when you want a value to be shared, give it a name. It may be shared even if you don't, but with a name, a decent implementation will share it.
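Spelled out in full, that is essentially the version from your question, dropped into the same module as above (the type signatures are mine):

    f :: Int -> [Int]
    f n = filter odd [1..n]

    -- Naming the list makes the sharing explicit in the source itself,
    -- so it no longer depends on what CSE happens to do at a given
    -- optimisation level.
    g :: Int -> [Int]
    g n = [a + b | a <- h, b <- h]
      where
        h = f n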
Thanks.