Re: #ifdef considered harmful (was: DData)

Jan-Willem Maessen - Sun Labs East wrote: Simon Marlow wrote:
Hmm, wouldn't it be nice to have a "FastInt" library which re-bound the appropriate types and functions? Haskell libraries are chock-a-block with instances of "#ifdef __GHC__" which define things in terms of Int# instead of Int. We could simplify all those libraries if we did it just once, in one place.
The problem, of course, is that the naive non-GHC user could write a library which used "FastInt" polymorphically, and write code that wasn't portable to GHC. I don't think there's an easy, magic solution there. Perhaps such a library would be named FastIntWhichIPromiseToTestWithGHC.
Most of the time it's unnecessary to use an explicit Int#. If your Int is in a data structure, you can use {-# UNPACK #-} !Int
which is portable, and compiles to an unboxed Int in GHC >= 6.2.
The real problem, of course, isn't the data structures at all---GHC does an excellent job with these (though the "UNPACK" pragma is a revelation). The actual problem is function arguments. GHC can't do worker-wrapper unless a function is strict. Often we end up with something like this: data Tree a = Branch !Int a (Tree a) (Tree a) lookfor :: Int -> Tree a -> Maybe a lookfor Leaf key = Nothing lookfor (Branch k v l r) key | key==k = Just v | key < k = lookfor l key | otherwise = lookfor r key Here we don't use "key" in the first clause, so GHC isn't allowed to unbox. Most of the uses of Int# that I've seen in library code have been workarounds to this problem. In this case, being able to write lookfor :: FastInt -> Tree a -> Maybe a might give us some hope of getting the compiler to generate the unboxed code we might want, yet still afford us the opportunity to use boxed types if that's not possible. We could do this today without changing the language or the compilers. The library would have to be carefully coded... -Jan-Willem Maessen

Jan-Willem Maessen - Sun Labs East wrote:
[...] The actual problem is function arguments. GHC can't do worker-wrapper unless a function is strict. Often we end up with something like this:
data Tree a = Branch !Int a (Tree a) (Tree a)
lookfor :: Int -> Tree a -> Maybe a lookfor Leaf key = Nothing lookfor (Branch k v l r) key | key==k = Just v | key < k = lookfor l key | otherwise = lookfor r key
Here we don't use "key" in the first clause, so GHC isn't allowed to unbox. Most of the uses of Int# that I've seen in library code have been workarounds to this problem. [...]
If that's the only concern, there is no real problem. One can do: a) Help GHC a bit by hand with something like lookfor Leaf key | key == key = Nothing or lookfor Leaf key = key `seq` Nothing Not very nice, but a "traditional" way of improving strictness. b) Use GHC's -O2 flag, enabling specialization over constructors, which is exactly what you're looking for, see the "Game Plan" comment at the beginning of http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/specialise/SpecConstr.lhs?rev=1.20&content-type=text/x-cvsweb-markup Using "-O2 -ddump-simpl" with your (slightly corrected) example, one can see that the Ints are indeed unboxed during recursion. Cheers, S.

Sven Panne wrote:
Jan-Willem Maessen - Sun Labs East wrote:
[...] The actual problem is function arguments. GHC can't do worker-wrapper unless a function is strict. Often we end up with something like this:
data Tree a = Branch !Int a (Tree a) (Tree a)
lookfor :: Int -> Tree a -> Maybe a lookfor Leaf key = Nothing lookfor (Branch k v l r) key | key==k = Just v | key < k = lookfor l key | otherwise = lookfor r key
Here we don't use "key" in the first clause, so GHC isn't allowed to unbox. Most of the uses of Int# that I've seen in library code have been workarounds to this problem. [...]
If that's the only concern, there is no real problem. One can do:
a) Help GHC a bit by hand with something like
lookfor Leaf key | key == key = Nothing
or
lookfor Leaf key = key `seq` Nothing
Not very nice, but a "traditional" way of improving strictness.
And, of course, we need to remember to do it uniformly everywhere in our program. Figuring out where to put these things isn't always easy. And frankly, I'd love to stamp out the use of equality for strictification once and for all.
b) Use GHC's -O2 flag, enabling specialization over constructors, which is exactly what you're looking for, see the "Game Plan" comment at the beginning of
I had a fascinating conversation with Simon P-J some years back on exactly this subject. It's a lovely optimization. But (as you point out) I need to do -ddump-simpl (and understand the results) to be sure the compiler's done it for me. Tomorrow, the compiler could (for some reason) decide not to do it anymore. It's all terribly fragile. Fundamentally, the idea here is to give a library implementor another tool to say what they mean. They can be assured of reasonable-looking code that's still tuned to run like the blazes---with the confidence of a types backing them up. They don't have to go crawling through their code or the simplifier output a line at a time. Unboxed types were designed with exactly that goal in mind. But they're not portable. I'm suggesting a lightweight, simple wrapper which would be portable. Perhaps that wrapper should even be named "Int#"!
Using "-O2 -ddump-simpl" with your (slightly corrected) example, one can see that the Ints are indeed unboxed during recursion.
Cheers, S.

Fundamentally, the idea here is to give a library implementor another tool to say what they mean. They can be assured of reasonable-looking code that's still tuned to run like the blazes---with the confidence of a types backing them up.
Note that for Hugs and NHC, running like the blazes isn't so important but not leaking space like the blazes is. The HughesPJ pretty printer used to absolutely depend on the strictness of Int# to avoid space leaks. Adding in code to force all the Ints was tedious and I sometimes suspect there are one or two still missing. -- Alastair Reid
participants (3)
-
Alastair Reid
-
Jan-Willem Maessen - Sun Labs East
-
Sven Panne