[Haskell-cafe] sort and lazyness (?)

19 Dec 2008

      Hi,

I'm just a beginner trying to learn a little about Haskell, and as such 
write some toy programs (e.g. for projecteuler.net) in Haskell.

Currently, I'm experiencing what I would call "strange behaviour":

I've got a data-type

data Fraction = Fraction Int Int

to hold rational numbers (maybe there's already some built-in type for 
this in Haskell, much like for instance Scheme has a rational type?), 
and then I compute a list of pairs of those numbers, that is
[(Fraction, Fraction)].  Fraction is declared an instance of Ord.

This list has up to 3 million elements.  If I do

main = print $ length $ points

then the program prints out the length fine and takes around 6s to 
finish (compiled with GHC -O3).  Ok, but I acknowledge that length isn't 
quite an expensive function, as I can imagine that Haskell does not 
compute and hold the entire list for this but instead each element at 
once and discards it afterwards.

Doing

main = print $ length $ map (\(x, _) -> x == Fraction 1 2) points

instead, gives a slightly longer runtime (6.6s), but in this case I'm 
sure that at least each element is computed; right?

main = print $ length $ reverse points

gives 11.9s, and here I guess (?) that for this to work, the entire list 
is computed and hold in memory.

However, trying to do

import List
main = print $ length $ sort points

makes memory usage go up and the program does not finish in 15m, also 
spending most time waiting for swapped out memory.  What am I doing 
wrong, why is sort this expensive in this case?  I would assume that 
computing and holding the whole list does not take too much memory, 
given its size and data type; doing the very same calculation in C 
should be straight forward.  And sort should be O(n * log n) for time 
and also not much more expensive in memory, right?

Am I running into a problem with lazyness?  What can I do to avoid it? 
As far as I see it though, the reverse or map call above should do 
nearly the same as sort, except maybe that the list needs to be stored 
in memory as a whole and sort has an additional *log n factor, but 
neither of those should matter.  What's the problem here?

Is this something known with sort or similar functions?  I couldn't find 
anything useful on Google, though.  My code is below, and while I would 
of course welcome critics, I do not want to persuade anyone to read 
through it.

Thanks a lot,
Daniel

----------------------------------------------------------

-- Problem 165: Intersections of lines.

import List

-- The random number generator.

seeds :: [Integer]
seeds = 290797 : [ mod (x * x) 50515093 | x <- seeds ]

numbers :: [Int]
numbers = map fromInteger (map (\x -> mod x 500) (tail seeds))

-- Line segments, vectors and fractions.

data Segment = Segment Int Int Int Int
data Vector = Vector Int Int

data Fraction = Fraction Int Int
instance Eq Fraction where
   (Fraction a b) == (Fraction c d) = (a == c && b == d)
instance Ord Fraction where
   compare (Fraction a b) (Fraction c d)
     | (a == c && b == d) = EQ
     | b > 0     = if a * d < b * c then LT else GT
     | otherwise = if a * d > b * c then LT else GT

-- Build a normalized fraction and get its value.

normalize (Fraction a b) = let g = gcd a b;
                                aa = div a g;
                                bb = div b g in
                               if bb < 0
                                 then Fraction (-aa) (-bb)
                                 else Fraction aa bb

-- Find the inner product of two vectors.

innerProduct (Vector a b) (Vector c d) = a * c + b * d

-- Find the normal vector and direction of a line segment, as well as
-- the constant in straight-normal form for a given normal vector.

normalVector (Segment x1 y1 x2 y2) = Vector (y1 - y2) (x2 - x1)
direction (Segment x1 y1 x2 y2) = Vector (x2 - x1) (y2 - y1)
nfConstant (Segment x1 y1 x2 y2) n = innerProduct n (Vector x1 y1)

-- Check if a point is between the ends of the segment times D.

betweenEndsTimes d (Segment x1 y1 x2 y2) xD yD
   = let x1D = x1 * d; x2D = x2 * d; y1D = y1 * d; y2D = y2 * d;
         xDMin = min x1D x2D; xDMax = max x1D x2D;
         yDMin = min y1D y2D; yDMax = max y1D y2D in
       (xDMin <= xD && xDMax >= xD && yDMin <= yD && yDMax >= yD
        && (x1D /= xD || y1D /= yD) && (x2D /= xD || y2D /= yD))

-- If they are not parallel, we can find their intersection point (at least,
-- the one it would be if both were straights).  Then it is easy to check if
-- it is between the endpoints for both.
--
-- n1 * x + m1 * y = c1
-- n2 * x + m2 * y = c2
--
-- => x = (c1 * m2 - x2 * m1) / (n1 * m2 - n2 * m1)
-- => y = (n1 * c2 - n2 * c1) / (n1 * m2 - n2 * m1)
--
-- (Iff they are parallel, the determinant will be 0.)

trueIntersect s1 s2 = let (Vector n1 m1) = normalVector s1;
                           (Vector n2 m2) = normalVector s2;
                           c1 = nfConstant s1 (Vector n1 m1);
                           c2 = nfConstant s2 (Vector n2 m2);
                           d = n1 * m2 - n2 * m1;
                           xD = c1 * m2 - c2 * m1;
                           yD = n1 * c2 - n2 * c1 in
                         if d == 0
                           then Nothing
                           else
                             if (betweenEndsTimes d s1 xD yD)
                                 && (betweenEndsTimes d s2 xD yD)
                               then Just ((normalize $ Fraction xD d),
                                          (normalize $ Fraction yD d))
                               else Nothing

-- Build list of segments.

takeEveryForth :: [Int] -> [Int]
takeEveryForth (a:_:_:_:t) = a : (takeEveryForth t)

n1 = numbers
n2 = tail n1
n3 = tail n2
n4 = tail n3

segments = [ Segment a b c d | ((a, b), (c, d))
                                 <- zip (zip (takeEveryForth n1)
                                             (takeEveryForth n2))
                                        (zip (takeEveryForth n3)
                                             (takeEveryForth n4)) ]

-- For the first 5000 segments, calculate intersections.

firstSegments = take 5000 segments

intersects :: [Maybe (Fraction, Fraction)]
intersects = findInters firstSegments []
   where
     findInters [] l = l
     findInters (h:t) l = findInters t (addInters h t l)
       where
         addInters _ [] l = l
         addInters e (h:t) l = addInters e t ((trueIntersect e h) : l)

getPoints :: [Maybe (Fraction, Fraction)] -> [(Fraction, Fraction)]
getPoints [] = []
getPoints (Nothing : t) = getPoints t
getPoints ((Just v) : t) = v : (getPoints t)

points = getPoints intersects

-- Main program.

main :: IO ()
main = print $ length $ reverse points
--main = print $ length $ map (\(x, _) -> x == Fraction 1 2) points
--main = print $ length $ sort points

[Haskell-cafe] sort and lazyness (?)

Daniel Kraft