
On Fri, 2008-12-19 at 14:40 +0100, Daniel Kraft wrote:
Hi,
I'm just a beginner trying to learn a little about Haskell, and as such write some toy programs (e.g. for projecteuler.net) in Haskell.
Currently, I'm experiencing what I would call "strange behaviour":
I've got a data-type
data Fraction = Fraction Int Int
to hold rational numbers (maybe there's already some built-in type for this in Haskell, much like for instance Scheme has a rational type?), and then I compute a list of pairs of those numbers, that is [(Fraction, Fraction)]. Fraction is declared an instance of Ord.
This list has up to 3 million elements. If I do
main = print $ length $ points
then the program prints out the length fine and takes around 6s to finish (compiled with GHC -O3). Ok, but I acknowledge that length isn't quite an expensive function, as I can imagine that Haskell does not compute and hold the entire list for this but instead each element at once and discards it afterwards.
Right.
Doing
main = print $ length $ map (\(x, _) -> x == Fraction 1 2) points
instead, gives a slightly longer runtime (6.6s), but in this case I'm sure that at least each element is computed; right?
Nope. Nothing looks at the result of the comparison so it is not done and so it does not force any of the fractions to be computed.
main = print $ length $ reverse points
gives 11.9s, and here I guess (?) that for this to work, the entire list is computed and hold in memory.
Yes, the list is held in memory but the elements are not computed.
However, trying to do
import List main = print $ length $ sort points
Now it really does have to compute the elements because comparing the elements involves inspecting them.
makes memory usage go up and the program does not finish in 15m, also spending most time waiting for swapped out memory.
Right. Much more memory used this time and that is pushing your machine into swapping which then goes very very slowly.
What am I doing wrong, why is sort this expensive in this case?
Sort itself is not so expensive, the expensive thing was calculating all the elements of your list and sort was the first function that had to do it. If you used something like sum (rather than length) then it'd hit the same issue as it would need to evaluate each element.
I would assume that computing and holding the whole list does not take too much memory, given its size and data type;
That does not seem to be the case. data Fraction = Fraction Int Int This takes 7 words per Fraction. There's 3 words for the Fraction constructor and it's two fields. Each Int also takes 2 words. On a 32bit machine that is 28 bytes, or 56 on a 64bit machine. We can reduce the memory requirements for Fraction by making the Int's strict and unpacking them into the Fraction constructor: data Fraction = Fraction {-# UNPACK #-} !Int {-# UNPACK #-} !Int Now it takes 3 words per Fraction. However it is no longer so lazy. Previously you could have (Fraction 3 (error "not used")) and as long as you never looked at the second Int it would work. Now that we made both fields strict (the ! on the fields) that does not work any more, (Fraction 3 (error "not used")) will produce the error, even if we only look at the first Int.
doing the very same calculation in C should be straight forward.
In an eager language like C the first one would have failed too because it would have had to compute all the elements eagerly and holding them all in memory at once seems to require more memory than your machine has. On the other hand C would use something more like the strict variant I mentioned above so the initial memory requirements would probably be lower.
And sort should be O(n * log n) for time and also not much more expensive in memory, right?
Sort does take a bit more memory than just reversing because it has to construct some intermediate lists while merging.
Am I running into a problem with lazyness?
A problem in your understanding of lazyness.
What can I do to avoid it?
Look at this again: length $ map (\(x, _) -> x == Fraction 1 2) points Try something like: length $ map (\_ -> error "look! never evaluated!") points you'll find it gives the same answer. That's because length never looks at the elements of the list.
As far as I see it though, the reverse or map call above should do nearly the same as sort, except maybe that the list needs to be stored in memory as a whole and sort has an additional *log n factor, but neither of those should matter. What's the problem here?
reverse never looks at the elements of the list, sort does. Duncan