
Hello all, A question on that most elusive of subjects.... performance in haskell (GHC 6.2) Using the GHC profiler, I obtained the following analysis results (i hope the code doesn't come out too ugly by mail): total time = 0.92 secs (46 ticks @ 20 ms) total alloc = 83,373,032 bytes (excludes profiling overheads) COST CENTRE MODULE %time %alloc isIn Main 50.0 22.8 getCon Main 13.0 16.7 o' Main 8.7 6.6 satisfies Main 6.5 0.0 powerList Main 6.5 46.9 CAF Main 6.5 0.1 showCon Main 4.3 0.3 MAIN MAIN 4.3 0.0 a' Main 0.0 6.7 The problem child, that isIn function, has got about 78000 entries in the profile log. I should probably mention that this is an incredibly dumbed down version of the program, the dimensions of the data it is supposed to handle are such that, on a trial run I killed the process after about 15 minutes, when I found out it hadn't even completed 3% of its task. sad stuff, really. Anyways, 'isIn' is a predicate that checks whether a given Double lies within an interval, where intervals are defined as ... define an interval bound, either inclusive (I) or exclusive (E)
data Bound = I Double | E Double deriving (Eq, Show, Ord) data Interval = Il Bound Bound | Nil Bound Bound deriving (Eq,Ord)
where Nil acts as the complement of an interval, this is reflected in the isIn function. ....
isIn :: Double -> Interval -> Bool isIn r (Nil x y) = not (isIn r (Il x y)) isIn r (Il (I x) (I y)) = r >= x && r <= y isIn r (Il (I x) (E y)) = r >= x && r < y isIn r (Il (E x) (I y)) = r > x && r <= y isIn r (Il (E x) (E y)) = r > x && r < y
I tried rewriting it to something that intuitively 'feels' like it should be faster, but i have no real idea about the cost of the respective haskell expressions: ... version 2
isIn :: Double -> Interval -> Bool isIn r (Nil x y) = not (isIn r (Il x y)) isIn r (Il x y) = case x of (I x') -> if r >= x' then case y of (I y') -> r <= y' (E y') -> r < y' else False (E x') -> if r > x' then case y of (I y') -> r <= y' (E y') -> r < y' else False
... which indeed turns out to be a tad bit faster, according to the new profile log. total time = 0.80 secs (40 ticks @ 20 ms) total alloc = 64,404,104 bytes (excludes profiling overheads) COST CENTRE MODULE %time %alloc isIn Main 30.0 0.0 getCon Main 25.0 21.6 powerList Main 15.0 60.7 showCon Main 7.5 0.3 o' Main 7.5 8.6 CAF Main 7.5 0.1 MAIN MAIN 5.0 0.0 a' Main 2.5 8.6 But it can hardly be called impressive. Can anyone see another obvious optimization, or have I just hit the ceiling because of the sheer number of function calls to isIn? I am still pretty new to haskell, and I find it hard to wrap my head around the way the compiler deals with my code. If someone has a few leads on general performance heuristics in haskell/GHC, I would be happy to read them too... thanks for your time. stijn

Stijn De Saeger wrote:
data Bound = I Double | E Double deriving (Eq, Show, Ord) data Interval = Il Bound Bound | Nil Bound Bound deriving (Eq,Ord)
isIn :: Double -> Interval -> Bool isIn r (Nil x y) = not (isIn r (Il x y)) isIn r (Il (I x) (I y)) = r >= x && r <= y isIn r (Il (I x) (E y)) = r >= x && r < y isIn r (Il (E x) (I y)) = r > x && r <= y isIn r (Il (E x) (E y)) = r > x && r < y
If performance is the main concern, I would flatten the data structure: data Interval = IlII Double Double | IlIE Double Double | IlEI Double Double | IlEE Double Double | NilII Double Double | NilIE Double Double | NilEI Double Double | NilEE Double Double isIn :: Double -> Interval -> Bool isIn r (IlII x y) = r >= x && r <= y isIn r (IlIE x y) = r >= x && r < y isIn r (IlEI x y) = r > x && r <= y isIn r (IlEE x y) = r > x && r < y isIn r (NilII x y) = r < x || r > y isIn r (NilIE x y) = r < x || r >= y isIn r (NilEI x y) = r <= x || r > y isIn r (NilEE x y) = r <= x || r >= y Depending on your application you might not need all of those cases. Another neat trick you can pull is to take advantage of the fact that Double is actually a discrete type, like Int, and you can therefore get away with closed intervals only: data Interval = Il Double Double | Nil Double Double isIn :: Double -> Interval -> Bool isIn r (Il x y) = r >= x && r <= y isIn r (Nil x y) = r < x || r > y But this requires nextLargestDouble and nextSmallestDouble functions. I don't know if Haskell provides them. Also, you could run into trouble with wider-than-Double intermediate values. Finally, if you never do anything with intervals except pass them to isIn, you can do this: type Interval = Double -> Bool isIn r i = i r -- Ben

On Mon, Jan 17, 2005 at 08:54:38PM -0800, Ben Rudiak-Gould wrote:
If performance is the main concern, I would flatten the data structure:
data Interval = IlII Double Double | IlIE Double Double | IlEI Double Double | IlEE Double Double | NilII Double Double | NilIE Double Double | NilEI Double Double | NilEE Double Double
I would go even further
data IntervalType = IlII | IlIE | IlEI | IlEE | NilII | NilIE | NilEI | NilEE data Interval = Interval IntervalType {-# UNPACK #-} !Double {-# UNPACK #-} !Double
now, the doubles can be stored in their native form and are not under a union data type (which always must be represented by a pointer) so accessing them can be very fast. John -- John Meacham - ⑆repetae.net⑆john⑈
participants (3)
-
Ben Rudiak-Gould
-
John Meacham
-
Stijn De Saeger