sort and lazyness (?)

19 Dec 2008

      Hi,

I'm just a beginner trying to learn a little about Haskell, and as such 
write some toy programs (e.g. for projecteuler.net) in Haskell.

Currently, I'm experiencing what I would call "strange behaviour":

I've got a data-type

data Fraction = Fraction Int Int

to hold rational numbers (maybe there's already some built-in type for 
this in Haskell, much like for instance Scheme has a rational type?), 
and then I compute a list of pairs of those numbers, that is
[(Fraction, Fraction)].  Fraction is declared an instance of Ord.

This list has up to 3 million elements.  If I do

main = print $ length $ points

then the program prints out the length fine and takes around 6s to 
finish (compiled with GHC -O3).  Ok, but I acknowledge that length isn't 
quite an expensive function, as I can imagine that Haskell does not 
compute and hold the entire list for this but instead each element at 
once and discards it afterwards.

Doing

main = print $ length $ map (\(x, _) -> x == Fraction 1 2) points

instead, gives a slightly longer runtime (6.6s), but in this case I'm 
sure that at least each element is computed; right?

main = print $ length $ reverse points

gives 11.9s, and here I guess (?) that for this to work, the entire list 
is computed and hold in memory.

However, trying to do

import List
main = print $ length $ sort points

makes memory usage go up and the program does not finish in 15m, also 
spending most time waiting for swapped out memory.  What am I doing 
wrong, why is sort this expensive in this case?  I would assume that 
computing and holding the whole list does not take too much memory, 
given its size and data type; doing the very same calculation in C 
should be straight forward.  And sort should be O(n * log n) for time 
and also not much more expensive in memory, right?

Am I running into a problem with lazyness?  What can I do to avoid it? 
As far as I see it though, the reverse or map call above should do 
nearly the same as sort, except maybe that the list needs to be stored 
in memory as a whole and sort has an additional *log n factor, but 
neither of those should matter.  What's the problem here?

Is this something known with sort or similar functions?  I couldn't find 
anything useful on Google, though.  My code is below, and while I would 
of course welcome critics, I do not want to persuade anyone to read 
through it.

Thanks a lot,
Daniel

----------------------------------------------------------

-- Problem 165: Intersections of lines.

import List

-- The random number generator.

seeds :: [Integer]
seeds = 290797 : [ mod (x * x) 50515093 | x <- seeds ]

numbers :: [Int]
numbers = map fromInteger (map (\x -> mod x 500) (tail seeds))

-- Line segments, vectors and fractions.

data Segment = Segment Int Int Int Int
data Vector = Vector Int Int

data Fraction = Fraction Int Int
instance Eq Fraction where
   (Fraction a b) == (Fraction c d) = (a == c && b == d)
instance Ord Fraction where
   compare (Fraction a b) (Fraction c d)
     | (a == c && b == d) = EQ
     | b > 0     = if a * d < b * c then LT else GT
     | otherwise = if a * d > b * c then LT else GT

-- Build a normalized fraction and get its value.

normalize (Fraction a b) = let g = gcd a b;
                                aa = div a g;
                                bb = div b g in
                               if bb < 0
                                 then Fraction (-aa) (-bb)
                                 else Fraction aa bb

-- Find the inner product of two vectors.

innerProduct (Vector a b) (Vector c d) = a * c + b * d

-- Find the normal vector and direction of a line segment, as well as
-- the constant in straight-normal form for a given normal vector.

normalVector (Segment x1 y1 x2 y2) = Vector (y1 - y2) (x2 - x1)
direction (Segment x1 y1 x2 y2) = Vector (x2 - x1) (y2 - y1)
nfConstant (Segment x1 y1 x2 y2) n = innerProduct n (Vector x1 y1)

-- Check if a point is between the ends of the segment times D.

betweenEndsTimes d (Segment x1 y1 x2 y2) xD yD
   = let x1D = x1 * d; x2D = x2 * d; y1D = y1 * d; y2D = y2 * d;
         xDMin = min x1D x2D; xDMax = max x1D x2D;
         yDMin = min y1D y2D; yDMax = max y1D y2D in
       (xDMin <= xD && xDMax >= xD && yDMin <= yD && yDMax >= yD
        && (x1D /= xD || y1D /= yD) && (x2D /= xD || y2D /= yD))

-- If they are not parallel, we can find their intersection point (at least,
-- the one it would be if both were straights).  Then it is easy to check if
-- it is between the endpoints for both.
--
-- n1 * x + m1 * y = c1
-- n2 * x + m2 * y = c2
--
-- => x = (c1 * m2 - x2 * m1) / (n1 * m2 - n2 * m1)
-- => y = (n1 * c2 - n2 * c1) / (n1 * m2 - n2 * m1)
--
-- (Iff they are parallel, the determinant will be 0.)

trueIntersect s1 s2 = let (Vector n1 m1) = normalVector s1;
                           (Vector n2 m2) = normalVector s2;
                           c1 = nfConstant s1 (Vector n1 m1);
                           c2 = nfConstant s2 (Vector n2 m2);
                           d = n1 * m2 - n2 * m1;
                           xD = c1 * m2 - c2 * m1;
                           yD = n1 * c2 - n2 * c1 in
                         if d == 0
                           then Nothing
                           else
                             if (betweenEndsTimes d s1 xD yD)
                                 && (betweenEndsTimes d s2 xD yD)
                               then Just ((normalize $ Fraction xD d),
                                          (normalize $ Fraction yD d))
                               else Nothing

-- Build list of segments.

takeEveryForth :: [Int] -> [Int]
takeEveryForth (a:_:_:_:t) = a : (takeEveryForth t)

n1 = numbers
n2 = tail n1
n3 = tail n2
n4 = tail n3

segments = [ Segment a b c d | ((a, b), (c, d))
                                 <- zip (zip (takeEveryForth n1)
                                             (takeEveryForth n2))
                                        (zip (takeEveryForth n3)
                                             (takeEveryForth n4)) ]

-- For the first 5000 segments, calculate intersections.

firstSegments = take 5000 segments

intersects :: [Maybe (Fraction, Fraction)]
intersects = findInters firstSegments []
   where
     findInters [] l = l
     findInters (h:t) l = findInters t (addInters h t l)
       where
         addInters _ [] l = l
         addInters e (h:t) l = addInters e t ((trueIntersect e h) : l)

getPoints :: [Maybe (Fraction, Fraction)] -> [(Fraction, Fraction)]
getPoints [] = []
getPoints (Nothing : t) = getPoints t
getPoints ((Just v) : t) = v : (getPoints t)

points = getPoints intersects

-- Main program.

main :: IO ()
main = print $ length $ reverse points
--main = print $ length $ map (\(x, _) -> x == Fraction 1 2) points
--main = print $ length $ sort points

Daniel Kraft

Jake Mcarthur

Bayley, Alistair

Daniel Kraft

Lutz Donnerhacke

Luke Palmer

Duncan Coutts

Luke Palmer

Bayley, Alistair

Luke Palmer

Bayley, Alistair

Duncan Coutts

Luke Palmer

Duncan Coutts

tags

participants (6)