
Hi data Tree = Leaf Int | Node Tree Int Tree occurs :: Int -> Tree -> Bool occurs m (Leaf n) = m == n occurs m (Node l n r) = m == n || occurs m l || occurs m r It works but I'd like to know if it can be improved in any way. Thanks, Paul

prstanley:
Hi data Tree = Leaf Int | Node Tree Int Tree
occurs :: Int -> Tree -> Bool occurs m (Leaf n) = m == n occurs m (Node l n r) = m == n || occurs m l || occurs m r
It works but I'd like to know if it can be improved in any way.
You could probably get away with: data Tree = Leaf !Int | Node Tree !Int Tree but that's a minor issue. I don't see anything wrong this this code -- isn't that what you'd hope to write? -- Don

On 12/2/07, Don Stewart
prstanley:
Hi data Tree = Leaf Int | Node Tree Int Tree
occurs :: Int -> Tree -> Bool occurs m (Leaf n) = m == n occurs m (Node l n r) = m == n || occurs m l || occurs m r
It works but I'd like to know if it can be improved in any way.
You could probably get away with:
data Tree = Leaf !Int | Node Tree !Int Tree
but that's a minor issue.
IMO, there's no reason to even think about putting in the strictness annotations unless you've identified this datatype as part of a performance bottleneck in production code. Otherwise, there's no need to clutter your code and your mind with them :-) Cheers, Tim -- Tim Chevalier * catamorphism.org * Often in error, never in doubt "Base eight is just like base ten, really... if you're missing two fingers." -- Tom Lehrer

catamorphism:
On 12/2/07, Don Stewart
wrote: prstanley:
Hi data Tree = Leaf Int | Node Tree Int Tree
occurs :: Int -> Tree -> Bool occurs m (Leaf n) = m == n occurs m (Node l n r) = m == n || occurs m l || occurs m r
It works but I'd like to know if it can be improved in any way.
You could probably get away with:
data Tree = Leaf !Int | Node Tree !Int Tree
but that's a minor issue.
IMO, there's no reason to even think about putting in the strictness annotations unless you've identified this datatype as part of a performance bottleneck in production code. Otherwise, there's no need to clutter your code and your mind with them :-)
Very true ("a minor issue"). However, *rarely* do you want lazines in the atomic types (Int,Char,Word,Integer, et al) A little test: main = do n <- getArgs >>= readIO . head let t = make (n*2) n print (check t) make :: Int -> Int -> Tree make i 0 = Node (Leaf 0) i (Leaf 0) make i d = Node (make (i2-1) d2) i (make i2 d2) where i2 = 2*i d2 = d-1 check :: Tree -> Int check (Leaf _) = 0 check (Node l i r) = i + check l - check r Fully lazy: data Tree = Leaf Int | Node Tree Int Tree $ time ./A 25 49 ./A 25 18.20s user 0.04s system 99% cpu 18.257 total ^^^^^^ 3556K heap use. Strict in the elements, lazy in the spine: data Tree = Leaf !Int | Node Tree !Int Tree $ time ./A 25 49 ./A 25 14.41s user 0.03s system 99% cpu 14.442 total ^^^^^^ 3056K heap use. You'll be hard pressed to do better in C here. Finally, strict in the spine and elements: data Tree = Leaf !Int | Node !Tree !Int !Tree $ time ./A 25 A: out of memory (requested 1048576 bytes) ./A 25 11.46s user 2.60s system 97% cpu 14.379 total 657M heap use ^^^^ Lesson: * Full strictness == teh suckness. * Mixed lazy and strict == flexible and efficient data types. Makes me wonder why Map is strict in the spine, data Map k a = Tip | Bin {-# UNPACK #-} !Size !k a !(Map k a) !(Map k a) Anyone know? Cheers, Don

Fully lazy:
data Tree = Leaf Int | Node Tree Int Tree
$ time ./A 25 49 ./A 25 18.20s user 0.04s system 99% cpu 18.257 total ^^^^^^ 3556K heap use.
Strict in the elements, lazy in the spine:
data Tree = Leaf !Int | Node Tree !Int Tree
$ time ./A 25 49 ./A 25 14.41s user 0.03s system 99% cpu 14.442 total ^^^^^^ 3056K heap use.
And, oh, element strict, with -funbox-strict-fields, $ time ./A 25 49 ./A 25 12.25s user 0.01s system 99% cpu 12.346 total -- Don

dons:
* Full strictness == teh suckness. * Mixed lazy and strict == flexible and efficient data types.
Makes me wonder why Map is strict in the spine,
data Map k a = Tip | Bin {-# UNPACK #-} !Size !k a !(Map k a) !(Map k a)
Spencer points out that being sized-balanced, subtree sizes must be computed on insertion, so may as well make the structure spine strict anyway. -- Don

On Sun, 2007-12-02 at 21:54 -0800, Don Stewart wrote:
catamorphism:
On 12/2/07, Don Stewart
wrote: prstanley:
Hi data Tree = Leaf Int | Node Tree Int Tree
occurs :: Int -> Tree -> Bool occurs m (Leaf n) = m == n occurs m (Node l n r) = m == n || occurs m l || occurs m r
It works but I'd like to know if it can be improved in any way.
You could probably get away with:
data Tree = Leaf !Int | Node Tree !Int Tree
but that's a minor issue.
IMO, there's no reason to even think about putting in the strictness annotations unless you've identified this datatype as part of a performance bottleneck in production code. Otherwise, there's no need to clutter your code and your mind with them :-)
Very true ("a minor issue").
I agree that (in this context, beginning learning Haskell) it is a somewhat minor issue. But I disagree that this is something you should ignore until it becomes a problem and I do think that it should be part of learning Haskell. Properly using strictness is an important part of using Haskell. It makes the difference between code that stack overflows and code that doesn't, code that takes 100 seconds and code that takes 10, code that uses 3MB of RAM and code that uses 600. At least the first of these is not, in my mind, the difference between "optimized" and "unoptimized", but rather the difference between correct and incorrect. Writing better code at the beginning is much easier than trying to figure out what the problem is later. Furthermore, writing better code is not more difficult. In this case it merely means adding two characters. Of late, the "rules of thumb" for this sort of thing are becoming more widely known. Such things need to be "instinctively" part of how you write code, much like writing code tail-recursively or not using (++) left associatively. It's not that you should immediately know that this is better, but (more strongly) that you should not even think of the worse ways to begin with in many cases.

I agree that (in this context, beginning learning Haskell) it is a somewhat minor issue. But I disagree that this is something you should ignore until it becomes a problem and I do think that it should be part of learning Haskell. Properly using strictness is an important part of using Haskell. It makes the difference between code that stack overflows and code that doesn't, code that takes 100 seconds and code that takes 10, code that uses 3MB of RAM and code that uses 600. At least the first of these is not, in my mind, the difference between "optimized" and "unoptimized", but rather the difference between correct and incorrect. Writing better code at the beginning is much easier than trying to figure out what the problem is later. Furthermore, writing better code is not more difficult. In this case it merely means adding two characters. Of late, the "rules of thumb" for this sort of thing are becoming more widely known. Such things need to be "instinctively" part of how you write code, much like writing code tail-recursively or not using (++) left associatively. It's not that you should immediately know that this is better, but (more strongly) that you should not even think of the worse ways to begin with in many cases.
It would be great if someone could exemplify these "rules of thumb", e.g. "Primitive types such as Int should be strict unless in the three canonical examples X, Y and Z." My strictness radar is still quite poor and I feel I can't make informed decisions on when I need to make something more strict or lazy. -- Johan

"Johan Tibell"
It would be great if someone could exemplify these "rules of thumb", e.g. "Primitive types such as Int should be strict unless in the three canonical examples X, Y and Z." My strictness radar is still quite poor and I feel I can't make informed decisions on when I need to make something more strict or lazy.
I find that I often need to add strictness when: left thumb) parsing [Char] into something more compact, i.e. almost all cases. right thumb) storing data into maps, especially when the values are produced by multiple updates - i.e. doing word frequency counts. -k -- If I haven't seen further, it is by standing in the footprints of giants

On Mon, 2007-12-03 at 10:48 +0100, Ketil Malde wrote:
"Johan Tibell"
writes: It would be great if someone could exemplify these "rules of thumb", e.g. "Primitive types such as Int should be strict unless in the three canonical examples X, Y and Z." My strictness radar is still quite poor and I feel I can't make informed decisions on when I need to make something more strict or lazy.
I find that I often need to add strictness when:
left thumb) parsing [Char] into something more compact, i.e. almost all cases. right thumb) storing data into maps, especially when the values are produced by multiple updates - i.e. doing word frequency counts.
Indeed, this generalizes fairly well. In general when going from a "large" structure (especially recursive types or arrays) to a "small" one (especially base types or small non-recursive types e.g. a vector type) you want strictness. Cale Gibbard argues that this is the only case where strictness is desirable. In the other three cases, "small" to "large", "large" to "large" and "small" to "small" either laziness is preferable or there is not a big difference between them. http://www.haskell.org/haskellwiki/Stack_overflow gives some advice on how to choose strictness for avoiding stack overflows. On that page you can see the above rule in action in, for example, the difference between concat :: [[a]] -> [a] and sum :: Num a => [a] -> a. The techniques sidebar on the Performance page, http://www.haskell.org/haskellwiki/Performance also contains some bits of advice. For example, the widely known advice about making accumulating parameters strict. This is related to Ketil's "right rule of thumb." Oftentimes the best way to get good behaviour is to use strict (in the appropriate places) data constructors. This will often eliminate most or all of the need for (other) strictness annotations. For example, one way of solving the issue with the scanl code on the Stack Overflow wiki page is by using a head strict list type (which, incidentally, Clean has native support for.) In fact, I suspect most of the time a head strict list type is either comparable or what is desired (though certainly not all of the time).

G'day all. On Mon, 2007-12-03 at 10:48 +0100, Ketil Malde wrote:
I find that I often need to add strictness when:
left thumb) parsing [Char] into something more compact, i.e. almost all cases. right thumb) storing data into maps, especially when the values are produced by multiple updates - i.e. doing word frequency counts.
Quoting Derek Elkins
Indeed, this generalizes fairly well. In general when going from a "large" structure (especially recursive types or arrays) to a "small" one (especially base types or small non-recursive types e.g. a vector type) you want strictness.
On the "right thumb" rule, just a quick comment. In general, it makes sense for the "spine" of data structures to be lazy but the "content" to be strict, if the structure depends on the content. So, for example, in a binary search tree, it would make sense for the pointers-to-nodes to be lazy but the keys to be strict. However, if the "content" does _not_ determine the structure (e.g. lists), then it should not be strict by default. So, for example, in a binary search tree, while it makes sense for "keys" to be strict, it is wrong for "values" to be strict by default. Expressed as rules of thumb: 1. Data structure "spines" should almost always be lazy. 2. If it's logically a Functor, strictness will break the axioms, so don't do that. Cheers, Andrew Bromage

On Dec 3, 2007 9:23 PM,
2. If it's logically a Functor, strictness will break the axioms, so don't do that.
What do you mean by breaking the axioms? If I define
data List a = Nil | Cons !a !(List a)
instance Functor List where fmap f Nil = Nil fmap f (Cons x xs) = Cons (f x) (fmap f xs)
Then the laws 1) fmap id == id 2) fmap f . fmap g == fmap (f . g) won't hold? What am I missing here? Are there some bottoms hiding out? ... Oh, I think I saw one! Let
f x = 1 g x = _|_ l = Cons 2 Nil
Then
fmap f (fmap g l) == fmap f (Cons _|_ Nil) == fmap f _|_ == _|_
but
fmap (f . g) l == Cons (f (g 2)) Nil == Cons (f _|_) Nil == Cons 1 Nil
right? Very interesting. Is this written somewhere on the wiki? Cheers, -- Felipe.

Johan Tibell wrote:
It would be great if someone could exemplify these "rules of thumb", e.g. "Primitive types such as Int should be strict unless in the three canonical examples X, Y and Z." My strictness radar is still quite poor and I feel I can't make informed decisions on when I need to make something more strict or lazy.
+1 When I first learned Haskell, lazyness sounded like a great idea, and I was somewhat puzzled as to why you would ever want to turn such a thing off. Fortunately (?!) after lots of experiments with the lambda calculus and other such things, I quickly realised that reducing large subexpression can sometimes be a big win. But I couldn't find much on the Wiki that explains all this stuff, and it would probably be quite useful to have! Of course, now we need somebody to *write* the thing...

dons:
Strict in the elements, lazy in the spine:
data Tree = Leaf !Int | Node Tree !Int Tree
$ time ./A 25 49 ./A 25 14.41s user 0.03s system 99% cpu 14.442 total ^^^^^^ 3056K heap use.
And, finally, we can get a little speedup again over the basic element-strict, -funboxed-strict-fields tree by parallelising some of the traversals. Nothing great, but I didn't try very hard: Serial code: {-# OPTIONS -O2 -funbox-strict-fields #-} import System.Environment data Tree = Leaf !Int | Node Tree !Int Tree main = do n <- getArgs >>= readIO . head let t = make (n*2) n print (check t) make :: Int -> Int -> Tree make i 0 = Node (Leaf 0) i (Leaf 0) make i d = Node (make (i2-1) d2) i (make i2 d2) where i2 = 2*i d2 = d-1 check :: Tree -> Int check (Leaf _) = 0 check (Node l i r) = i + check l - check r Running: $ time ./A 28 55 ./A 28 24.39s user 0.03s system 99% cpu 24.424 total Ok. Now, parallelise that recursive call in 'check': check :: Tree -> Int check (Leaf _) = 0 check (Node l i r) = lp `par` (rp `pseq` i + lp - rp) -- <-- where lp = check l rp = check r Super-simple strategy -- will register too many sparks, but what the heh... $ time ./B 28 +RTS -N2 55 ./B 28 +RTS -N2 31.81s user 0.14s system 147% cpu 21.700 total Pretty good for a naive strategy, and only one branch, on one line had to be modified. Control.Parallel, yay! -- Don

On Mon, Dec 03, 2007 at 05:20:35AM +0000, PR Stanley wrote:
Hi data Tree = Leaf Int | Node Tree Int Tree
occurs :: Int -> Tree -> Bool occurs m (Leaf n) = m == n occurs m (Node l n r) = m == n || occurs m l || occurs m r
It works but I'd like to know if it can be improved in any way. Thanks, Paul
Note that occurs checks elements in pre-order; so it can be factored into preorder and elem, where elem is defined in the Prelude: preorder (Leaf n) = [n] preorder (Node l n r) = n : preorder l ++ preorder r occurs x = elem x . preorder --- Also, Tree can be made to work on any type of value as a type constructor: data Tree k = Leaf k | Node (Tree k) k (Tree k) -- definitions of preorder and occurs are the same as before Stefan

On Mon, Dec 03, 2007 at 05:20:35AM +0000, PR Stanley wrote:
Hi data Tree = Leaf Int | Node Tree Int Tree
occurs :: Int -> Tree -> Bool occurs m (Leaf n) = m == n occurs m (Node l n r) = m == n || occurs m l || occurs m r
It works but I'd like to know if it can be improved in any way. Thanks, Paul
I'm not sure this will count as improvement, but you can build the same kinds of trees with a simpler datatype: data Tree = Empty | Node Tree Int Tree This way there is only one constructor with the Int value, which can simplify other code a bit. Best regards Tomasz

PR Stanley wrote:
data Tree = Leaf Int | Node Tree Int Tree
occurs :: Int -> Tree -> Bool occurs m (Leaf n) = m == n occurs m (Node l n r) = m == n || occurs m l || occurs m r
It works but I'd like to know if it can be improved in any way.
That's entirely fine. The logical or || doesn't evaluate it's second argument occurs m r if the first argument occurs m l turns out to be already True. In other words, thanks to lazy evaluation, the search stops if m has been found in the left subtree, it won't search the right subtree anymore. Regards, apfelmus
participants (13)
-
ajb@spamcop.net
-
Andrew Coppin
-
apfelmus
-
Bulat Ziganshin
-
Derek Elkins
-
Don Stewart
-
Felipe Lessa
-
Johan Tibell
-
Ketil Malde
-
PR Stanley
-
Stefan O'Rear
-
Tim Chevalier
-
Tomasz Zielonka