
Bayley, Alistair wrote:
Currently, I'm experiencing what I would call "strange behaviour":
I've got a data-type
data Fraction = Fraction Int Int
to hold rational numbers (maybe there's already some built-in type for this in Haskell,
http://haskell.org/ghc/docs/latest/html/libraries/base/Data-Ratio.html
Thanks for the pointer, I knew there would be something already there :)
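For reference, a minimal sketch of what switching to the library type could look like. The alias name Fraction and the sample list are assumptions made for illustration; only Data.Ratio and its (%), numerator and denominator come from the linked documentation.

```haskell
import Data.List (sort)
import Data.Ratio (Ratio, (%), numerator, denominator)

-- Assumed alias: reuse the name Fraction for the library's ratio type.
type Fraction = Ratio Int

half :: Fraction
half = 1 % 2

-- Hypothetical sample data standing in for the real `points` list.
samplePoints :: [Fraction]
samplePoints = [2 % 4, 3 % 4, 1 % 3]

main :: IO ()
main = do
  -- (%) reduces to lowest terms, so 2 % 4 compares equal to 1 % 2.
  print (map (== half) samplePoints)
  -- Ratio has an Ord instance that orders by numeric value.
  print (sort samplePoints)
  print (numerator (2 % 4), denominator (2 % 4))
```

Because (%) normalises its result, equality and ordering behave numerically without a hand-written instance.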
This list has up to 3 million elements. If I do
main = print $ length $ points
main = print $ length $ map (\(x, _) -> x == Fraction 1 2) points
main = print $ length $ reverse points
However, trying to do
import List
main = print $ length $ sort points
makes memory usage go up, and the program does not finish within 15 minutes; it also spends most of its time waiting for swapped-out memory. What am I doing wrong? Why is sort this expensive in this case? I would assume that computing and holding the whole list does not take too much memory, given its size and data type; doing the very same calculation in C should be straightforward. And sort should be O(n * log n) in time and not much more expensive in memory, right?
Not having looked at your code, I think you are benefiting from fusion/deforestation in the first three main functions. In this case, although you may appear to be evaluating the entire list, in fact the list elements can be discarded as they are generated, so functions like length and reverse can run using constant space, rather than O(n) space.
How does reverse work in constant space? At the moment I can't imagine it doing so; that's why I tried it, but of course you could be right.
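To make the distinction concrete, here is a small sketch. The generator gen is a hypothetical stand-in for the real points list, which is not shown in this thread. As far as I can tell, length over a list that nothing else references can discard each cell as soon as it has been counted, while reverse cannot produce its first output cell before it has seen the last input cell, so it has to hold all n elements at once.

```haskell
-- Hypothetical generator standing in for the poster's `points`.
gen :: Int -> [(Int, Int)]
gen n = [(i, i + 1) | i <- [1 .. n]]

-- length walks the list once; each cell becomes garbage after it is
-- counted, provided nothing else holds a reference to the list.
countOnly :: Int -> Int
countOnly n = length (gen n)

-- reverse accumulates every element before length can start, so all
-- n elements are live at the same time (O(n) space).
countReversed :: Int -> Int
countReversed n = length (reverse (gen n))

main :: IO ()
main = do
  print (countOnly 3000000)
  print (countReversed 3000000)
```

Running the compiled program with +RTS -s and comparing the reported maximum residency of the two calls is one way to check this on a given GHC version.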
The sort function, however, requires that the entire list is retained, hence greater memory usage. I also think you are optimistic in the memory requirements of your 3 million element list. A list of Ints will take a lot more than 4 bytes per element (on 32-bit machines) because there's overhead for the list pointers, plus possibly the boxes for the Ints themselves. I think there are 3 machine words for each list entry (pointer to this element, pointer to next element, info-table pointer), and 2 words for each Int, but I'm just guessing: http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects
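Taking those guessed word counts at face value, a back-of-the-envelope estimate for a list of Fraction values can be written down as below. The 3 words assumed for the Fraction constructor itself (one header word plus two fields) are an additional assumption not stated above, and all names are made up for this sketch.

```haskell
-- Bytes per machine word on a 32-bit machine.
wordBytes :: Int
wordBytes = 4

-- Guessed sizes from the paragraph above.
consWords, boxedIntWords :: Int
consWords     = 3  -- info pointer, pointer to element, pointer to next cell
boxedIntWords = 2  -- info pointer plus the raw integer

-- Assumption: a Fraction with two pointer fields to boxed Ints.
lazyFractionWords :: Int
lazyFractionWords = 3 + 2 * boxedIntWords

-- Assumption: a Fraction whose two Ints are stored inline (unboxed).
unpackedFractionWords :: Int
unpackedFractionWords = 3

-- Rough heap footprint of an n-element list with the given element size.
estimateBytes :: Int -> Int -> Int
estimateBytes elemWords n = n * (consWords + elemWords) * wordBytes

main :: IO ()
main = do
  print (estimateBytes lazyFractionWords 3000000)     -- 120000000
  print (estimateBytes unpackedFractionWords 3000000) -- 72000000
```

So under these assumptions the fully evaluated list alone is on the order of 100 MB, before counting unevaluated thunks or the intermediate lists that sort builds.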
Of course that's the case, but with the list being 3 million elements, and not, say 100 (which would still fit into memory for a simple C array of ints), I thought it would be possible. Otherwise, how can one handle such amounts of data anyway? Only by using arrays?
You might get some mileage by suggesting to GHC that your Fraction type is strict e.g.
data Fraction = Fraction !Int !Int
which might persuade it to unbox the Ints, giving some space savings.
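A slightly more explicit variant, as a sketch: the UNPACK pragma asks GHC to store the two Ints directly inside the constructor instead of as pointers to boxes. Whether a bare strictness annotation alone triggers unpacking depends on the GHC version and flags such as -funbox-strict-fields, so the pragma is spelled out here. The sample list is hypothetical.

```haskell
-- Strict, unpacked fields: both Ints live inside the constructor.
data Fraction = Fraction {-# UNPACK #-} !Int {-# UNPACK #-} !Int
  deriving (Eq, Show)

-- Hypothetical sample data standing in for the real `points` list.
samplePoints :: [Fraction]
samplePoints = [Fraction 1 2, Fraction 2 3, Fraction 1 2]

-- Count how many elements are exactly Fraction 1 2.
countHalves :: [Fraction] -> Int
countHalves = length . filter (== Fraction 1 2)

main :: IO ()
main = print (countHalves samplePoints)
```

Note that the derived Eq compares fields structurally, so Fraction 2 4 would not be equal to Fraction 1 2 unless values are normalised when they are constructed.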
I already tried that, but it doesn't change the performance at all. I will now try the provided rational type, however; maybe that helps.
Thanks for the answers,
Daniel