Newbie performance question

newer
A problem with a simple csv file...

Jordan Ellis

15 Oct 2010 15 Oct '10

10:03 p.m.

I'm trying (really trying!) to get my head around Haskell, but one (among many) significant stumbling blocks for me so far has been trying to figure out how my code is actually going to perform. Here's a contrived example of the sort of thing that confuses me. Let's say I have a function which produces a list of odd integers from 1 to n: f n = filter (odd) [1..n] ...and I have another function that makes a list of all sums of pairs of odd numbers from 1 to n: g n = [a + b | a <- (f n), b <- (f n)] I assume that in most languages, (f n) would get called twice (once for a, and once for b), but what happens in Haskell? Does GHC recognize that, since f is a pure function, it only has to be evaluated once for a given n? If not, would a 'where clause' make any difference? e.g., g n = [a + b | a <- h, b <- h] where h = f n Thanks.

Attachments:

attachment.html (text/html — 1.2 KB)

Show replies by date

Ozgur Akgun

15 Oct 15 Oct

10:26 p.m.

Just a quick reply, others might go into more detail. On 15 October 2010 23:03, Jordan Ellis wrote:

...

Let's say I have a function which produces a list of odd integers from 1 to n:

f n = filter (odd) [1..n]

...and I have another function that makes a list of all sums of pairs of odd numbers from 1 to n:

g n = [a + b | a <- (f n), b <- (f n)]

I assume that in most languages, (f n) would get called twice (once for a, and once for b), but what happens in Haskell? Does GHC recognize that, since f is a pure function, it only has to be evaluated once for a given n?

No, it doesn't. Haskell compilers doesn't do CSEhttp://en.wikipedia.org/wiki/Common_subexpression_elimination .

...

If not, would a 'where clause' make any difference? e.g.,

g n = [a + b | a <- h, b <- h] where h = f n

Yes, this is the way to go (or local let bindings). HTH, Ozgur

Daniel Fischer

10:57 p.m.

On Saturday 16 October 2010 00:03:42, Jordan Ellis wrote:

...

I'm trying (really trying!) to get my head around Haskell, but one (among many) significant stumbling blocks for me so far has been trying to figure out how my code is actually going to perform. Here's a contrived example of the sort of thing that confuses me. Let's say I have a function which produces a list of odd integers from 1 to n: f n = filter (odd) [1..n] ...and I have another function that makes a list of all sums of pairs of odd numbers from 1 to n: g n = [a + b | a <- (f n), b <- (f n)] I assume that in most languages, (f n) would get called twice (once for a, and once for b), but what happens in Haskell? Does GHC recognize that, since f is a pure function, it only has to be evaluated once for a given n?

That depends. If you compile without optimisations, the list f n won't be shared (that may of course change in future versions), if you compile with optimisations, it will be shared. GHC does very little common subexpression elimination, and CSE is not always a good thing. In the case above, it's fine for small n, but consider what happens for large n. If the list is not shared, for every a you get a new application of f (at least, that's GHC's current behaviour), and the algorithm runs in constant space. If the list is shared, the entire list remains in memory after it has been constructed for the first a. Thus the algorithm runs in O(n) space (but it runs faster unless n is really large). In such cases, you can usually suppress undesired sharing with -fno-cse.

...

If not, would a 'where clause' make any difference? e.g., g n = [a + b | a <- h, b <- h] where h = f n

Yes, when compiled without optimisations, no when compiled with. In general, when you want a value to be shared, give it a name. It may be shared if you don't but with a name, a decent implementation will share it.

...

Thanks.

Jordan Ellis

16 Oct 16 Oct

6:18 a.m.

Thanks for the reply. I have a quick follow-up question: With optimization enabled, will a value be shared between recursive function calls? For example, in the following: myRecursiveFunction 0 m = []myRecursiveFunction n m = m : myRecursiveFunction (n - 1) m ...will 'm' be calculated 'n' times, or just once? Thanks.

...

From: daniel.is.fischer@web.de To: beginners@haskell.org Subject: Re: [Haskell-beginners] Newbie performance question Date: Sat, 16 Oct 2010 00:57:41 +0200 CC: hookflash@hotmail.com

On Saturday 16 October 2010 00:03:42, Jordan Ellis wrote:

...
I'm trying (really trying!) to get my head around Haskell, but one (among many) significant stumbling blocks for me so far has been trying to figure out how my code is actually going to perform. Here's a contrived example of the sort of thing that confuses me. Let's say I have a function which produces a list of odd integers from 1 to n: f n = filter (odd) [1..n] ...and I have another function that makes a list of all sums of pairs of odd numbers from 1 to n: g n = [a + b | a <- (f n), b <- (f n)] I assume that in most languages, (f n) would get called twice (once for a, and once for b), but what happens in Haskell? Does GHC recognize that, since f is a pure function, it only has to be evaluated once for a given n?

That depends. If you compile without optimisations, the list f n won't be shared (that may of course change in future versions), if you compile with optimisations, it will be shared.

GHC does very little common subexpression elimination, and CSE is not always a good thing.

In the case above, it's fine for small n, but consider what happens for large n. If the list is not shared, for every a you get a new application of f (at least, that's GHC's current behaviour), and the algorithm runs in constant space. If the list is shared, the entire list remains in memory after it has been constructed for the first a. Thus the algorithm runs in O(n) space (but it runs faster unless n is really large). In such cases, you can usually suppress undesired sharing with -fno-cse.

...
If not, would a 'where clause' make any difference? e.g., g n = [a + b | a <- h, b <- h] where h = f n

Yes, when compiled without optimisations, no when compiled with. In general, when you want a value to be shared, give it a name. It may be shared if you don't but with a name, a decent implementation will share it.

...
Thanks.

Daniel Fischer

8:44 p.m.

On Saturday 16 October 2010 08:18:39, Jordan Ellis wrote:

...

I have a quick follow-up question: With optimization enabled, will a value be shared between recursive function calls? For example, in the following: myRecursiveFunction 0 m = [] myRecursiveFunction n m = m : myRecursiveFunction (n - 1) m ...will 'm' be calculated 'n' times, or just once? Thanks.

Here, you don't have two equal expressions, the m refers to the same entity in both appearances on the RHS. It's not guaranteed by the language definition, but every decent implementation will evaluate m only once (all occurrences are pointers to the same heap object, so you get n pointers, but only one pointed-to value). With or without optimisation. That's of course bad if m is large and you need a large part of it once at the beginning but only small parts afterwards, e.g. mapM_ (print . length) $ zipWith take (iterate (`quot` 2) 20000000) $ myRec n [1 .. k] To prevent that, if you don't want it, you have to jump through a couple of hoops.

5375

Age (days ago)

5376

Last active (days ago)

List overview

Download

4 comments

3 participants

participants (3)

Daniel Fischer
Jordan Ellis
Ozgur Akgun