
Hello all,

This proposal aims to make it easier to build data types for strict data structures. Everything described here can be done by writing a new data declaration; we'd like to support some more ad hoc mechanisms. This is not intended to be complete; rather, it's intended to jump-start discussion about the selection and use of strict and lazy data structures in Haskell.

1. Move strict Maybe and Either into base

We create new modules Data.Maybe.Strict and Data.Either.Strict which implement the following semantics:

    Just _|_  = _|_
    Left _|_  = _|_
    Right _|_ = _|_

We do not expose a left-strict or right-strict version of Either.

2. Move strict Tuple into base

We create a new module Data.Tuple.Strict that defines a new data type Pair with:

    Pair _|_ a = _|_
    Pair a _|_ = _|_

3. Syntax extension for strict pairs

We extend the syntax to allow a way of specifying strict pairs (perhaps (! 1, 2, 3 !), with operators (!,), (!,,), etc.). These are distinct from normal pairs and have their own set of functions.

4. Move strict IO into base

Unpredictable resource usage with hGetContents (which is lazy) and similar functions frequently trips up newcomers. Offer a module System.IO.Strict.

We note that proposals 1, 2 and 4 are implemented by the strict package, as is a form of 3 without nice syntax (a sketch of the relevant declarations appears at the end of this message).

5. Move strict-concurrency into base

The package strict-concurrency offers strict versions of MVars and Chans, which are frequent culprits for unpredictable behavior regarding which thread a thunk is evaluated on.

As an alternative to moving 1-5 into base, we can place them in the Haskell Platform.

6. Syntax extension for anonymous strict types

If I have a type (a, b) and I would like to make it strict in the first argument, I have to create a new data type. This syntax extension would permit me to create such a data type simply by writing (!a, b).

This is probably going to be difficult, so here are some possible implementation strategies:

* Such syntax is simply sugar for creating an anonymous data type with those properties. Coercions would need to be specified manually. Type inference errors might become harder to diagnose.

* We modify all types to have a strictness bit and modify the inference algorithm to propagate these bits. It sounds like a fascinating research problem with a paper at the end, if it actually works.

If this is implemented, it supersedes all of the previous points (and, in fact, probably many of the .Strict modules that currently exist, if the application of strictness is strictly (ahem) mechanical).

Discussion time: 1 month
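For concreteness, here is a minimal sketch of what proposals 1 and 2 could look like. The module name is an assumption for illustration; the strict package's actual API may differ.

    -- Minimal sketch of proposals 1 and 2; module name assumed.
    module Data.Strict.Sketch where

    import Prelude hiding (Maybe (..), Either (..))

    -- Strict in the payload: Just _|_ = _|_
    data Maybe a = Nothing | Just !a

    -- Strict in both payloads: Left _|_ = _|_ and Right _|_ = _|_
    data Either a b = Left !a | Right !b

    -- Strict in both components: Pair _|_ b = _|_ and Pair a _|_ = _|_
    data Pair a b = Pair !a !b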

On Thu, Feb 17, 2011 at 6:23 PM, Edward Z. Yang
1. Move strict Maybe and Either into base
We create a new module Data.Maybe.Strict and Data.Either.Strict which implement the following semantics:
    Just _|_  = _|_
    Left _|_  = _|_
    Right _|_ = _|_
We do not expose a left-strict or right-strict version of Either.
2. Move strict Tuple into base
We create a new module Data.Tuple.Strict that defines a new data type Pair with:
    Pair _|_ a = _|_
    Pair a _|_ = _|_
+1

Even if I couldn't have the remaining things in this proposal, I definitely want these (in base or the HP). Every other package defines its own versions of these internally, under different names. Having strict types lets you express your strictness properties much more simply, by changing the return type of a function instead of littering it with bang patterns and seqs; a small sketch follows.

Johan
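To illustrate the point, a minimal sketch assuming the strict Pair type from proposal 2; sumAndCount is a made-up example function.

    -- Sketch assuming the strict Pair of proposal 2:
    data Pair a b = Pair !a !b

    -- The strict fields force the running sum and count at every step, so
    -- the loop needs no bang patterns or seqs to avoid thunk build-up.
    sumAndCount :: [Int] -> Pair Int Int
    sumAndCount = go (Pair 0 0)
      where
        go acc []            = acc
        go (Pair s n) (x:xs) = go (Pair (s + x) (n + 1)) xs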

On 18 February 2011 02:23, Edward Z. Yang
As an alternative to moving 1-5 into base, we can place them in the Haskell Platform.
+1 for this alternative.
6. Syntax extension for anonymous strict types
If I have a type (a, b), and I would like to make it strict in the first argument, I have to create a new data type. This syntax extension would permit me to create such a data type simply by writing (!a, b).
This is probably going to be difficult, so here are some possible implementation strategies:
* Such syntax is simply sugar for creating an anonymous data type with those properties. Coercions would need to be specified manually. Type inference errors might become harder to diagnose.
* We modify all types to have a strictness bit and modify the inference algorithm to propagate these bits. It sounds like a fascinating research problem with a paper at the end, if it actually works.
If this is implemented, it supersedes all of the previous points (and in fact, probably many of the .Strict modules that currently exist, if application of strictness is strictly (ahem) mechanical.)
That would be really nice! It has always felt strange that I could write:

    data Maybe' a = Nothing | Just !a

but not:

    type Maybe' a = Maybe !a

Regards,
Bas

Edward Z. Yang wrote:
3. Syntax extension for strict pairs
We extend the syntax to allow a way of specifying strict pairs (perhaps (! 1, 2, 3 !), with operators (!,), (!,,), etc.). These are distinct from normal pairs and have their own set of functions.
FWIW, we thought about this when we were just starting out with DPH. In the end, we decided that the gains are not worth the new syntax. Now, we don't even use strict pairs (or other types) in DPH any longer because they interact rather horribly with fusion and other optimisations.

BTW, the following works now:

    data family Strict a
    data instance Strict (a, b)    = SPair !a !b
    data instance Strict (Maybe a) = SJust !a | SNothing
    ...

I was going to add this to the strict package at some point but it never happened.

Roman

Hello Roman,

The strict data family idea sounds like a very easy, programmatic way to generate usable strict versions of data types, and it corresponds to the explicit coercion method. With some judicious name hiding, we could keep the constructors named the same too, a la:

    data instance Strict (Prelude.Maybe a) = Just !a | Nothing

I'd be in favor of adding this to the strict package, though I'd want to play around with it a little first to make sure it's not too clunky (one annoyance may be having to declare instances for all types, which sounds like a good job for deriving); a fuller sketch follows.

I'm curious about how strict types interact horribly with fusion and other optimisations; I know strictness limits your ability to optimise, but my impression is that it shouldn't blow up too horribly unless you're trying to implement extremely clever optimisations.

Cheers,
Edward
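A minimal, self-contained sketch of the name-hiding variant described above; the module name and the conversion helpers are assumptions for illustration.

    {-# LANGUAGE TypeFamilies #-}
    -- Sketch of the name-hiding idea; module and helper names are assumed.
    module Data.Strict.Family where

    import Prelude hiding (Maybe (..))
    import qualified Prelude

    data family Strict a

    -- The strict instance reuses the familiar constructor names.
    data instance Strict (Prelude.Maybe a) = Just !a | Nothing

    -- Explicit coercions between the lazy and strict versions.
    toStrict :: Prelude.Maybe a -> Strict (Prelude.Maybe a)
    toStrict (Prelude.Just x) = Just x
    toStrict Prelude.Nothing  = Nothing

    fromStrict :: Strict (Prelude.Maybe a) -> Prelude.Maybe a
    fromStrict (Just x) = Prelude.Just x
    fromStrict Nothing  = Prelude.Nothing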

On Sun, Feb 20, 2011 at 6:11 PM, Edward Z. Yang
I'm curious about how strict types interact horribly with fusion and other optimisations; I know strictness limits your optimisation ability, but my impression is that it shouldn't blow up too horribly unless you're trying to implement extremely clever optimisations.
I'm curious too. In my experience it often helps, e.g. with getting things unboxed.

Do we really need to go all the way to type families to get strict data types? Wouldn't something simpler, like a Haskell 98 Data.Strict module, do?

Johan

On 21 February 2011 02:28, Johan Tibell
On Sun, Feb 20, 2011 at 6:11 PM, Edward Z. Yang
wrote:
I'm curious about how strict types interact horribly with fusion and other optimisations; I know strictness limits your optimisation ability, but my impression is that it shouldn't blow up too horribly unless you're trying to implement extremely clever optimisations.
I'm curious too. In my experience it often helps e.g. getting things unboxed.
I haven't seen what Roman is describing, but I would guess that it's something like this: if GHC sees

    let x = f y in g x

then it can apply a rule g (f y) = foo. But if it sees

    case f y of x -> g x

it may get scared and not do the rewrite. Adding strictness to your program introduces extra case bindings at the Core level and therefore makes this problem more frequent. You can see this problem with the following program:

    {-# LANGUAGE BangPatterns #-}

    g_f :: Bool -> Bool
    g_f y = y

    {-# RULES "g/f" forall y. g (f y) = g_f y #-}

    {-# NOINLINE f #-}
    f :: Bool -> Bool
    f = id

    {-# NOINLINE g #-}
    g :: Bool -> Bool
    g = not

    one x = g y where !y = f x
    two x = g y where  y = f x

    main = print (one True, two True)

GHC successfully rewrites "one" but not "two" using the RULE, so the output is (False, True).

Max

On Mon, Feb 21, 2011 at 11:04 AM, Max Bolingbroke
GHC successfully rewrites "one" but not "two" using the RULE, so the output is (False, True).
Hmm... did you mean to say "two" but not "one"?
-- Work is punishment for failing to procrastinate effectively.

Max Bolingbroke wrote:
I haven't seen what Roman is describing, but I would guess that it's something like if GHC sees:
let x = f y in g x
Then it can apply a rule g (f y) = foo. But if it sees:
case f y of x -> g x
It may get scared and not do the rewrite.
Yes, GHC won't rewrite in this case, which is a big problem. The other big problem is that strict types have much uglier laws than lazy ones. For instance, fst (x,y) = x, but strictFst (StrictPair x y) = y `seq` x. Strict types lead to a lot of seqs which are often quite artificial, disrupt other optimisations because of the dependencies they introduce, and prevent code from becoming dead.

In general, there seems to be a big difference between "evaluate this now" (which is what strict types do), "it's ok to evaluate this now" (which is what's needed for unboxing) and "this is guaranteed to be evaluated at this point" (which strict types do, too, and which is quite useful).

There is also http://hackage.haskell.org/trac/ghc/ticket/4081.

Roman
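A small sketch of the law stated above; StrictPair and strictFst are hypothetical names used only for illustration.

    -- Hypothetical strict pair, to illustrate the ugly law.
    data StrictPair a b = StrictPair !a !b

    strictFst :: StrictPair a b -> a
    strictFst (StrictPair x _) = x

    -- Because both fields are strict, constructing the pair forces y, so
    --   strictFst (StrictPair x y) = y `seq` x
    -- whereas for lazy pairs fst (x, y) = x, with no seq on y.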

On February 21, 2011 06:07:19 Roman Leshchinskiy wrote:
In general, there seems to be a big difference between "evaluate this now" (which is what strict types do), "it's ok to evaluate this now" (which is what's needed for unboxing) and "this is guaranteed to be evaluated at this point" (which strict types do, too, and which is quite useful).
The "theorems for free" literature does a fairly good job at clarifying how easy it is for messing with evaluation to break laws (rewrites) http://www.iai.uni-bonn.de/~jv/seq-slides.pdf In any event, unless I'm missing something, I believe you've just pointed out that we are missing one of our three horseman[1] pseq - "this is guaranteed to be evaluated at this point" seq - "evaluate this now" ??? - "it's ok to evaluate this now" Should something be done about this? I can certainly see how it would be useful information for the compiler wrt to optimization. The non-deterministic semantics ??? _|_ x -> possibly _|_ or x are also pretty interesting. Very reminiscent of what can happen with rewrites in the presence of the other two. Could there be something here (i.e., this would allow you to have rewrites that still preserve the semantics as the non-determinism would officially be part of the semantics)? Cheers! -Tyson [1] of the parametricity apocalypse

On February 21, 2011 08:15:38 Tyson Whitehead wrote:
The non-deterministic semantics
??? _|_ x -> possibly _|_ or x
are also pretty interesting. Very reminiscent of what can happen with rewrites in the presence of the other two. Could there be something here (i.e., this would allow you to have rewrites that still preserve the semantics as the non-determinism would officially be part of the semantics)?
To clarify a bit more: traditionally, I believe the problem is that you use seq in some function f. Appearing in some composition g, f results in the semantics

    g _|_ = a

There is a rewrite that either you or the compiler would like to perform on the composition g to give g', but due to the seq/pseq in f you wind up with

    g' _|_ = _|_

(or the other way around). If we are using ??? instead of seq/pseq, however, the rewrite can be expressed such that the semantics are preserved:

    g  _|_ = a or _|_, non-deterministically
    g' _|_ = a or _|_, non-deterministically

Cheers! -Tyson

Excerpts from Tyson Whitehead's message of Mon Feb 21 08:15:38 -0500 2011:
Should something be done about this? I can certainly see how it would be useful information for the compiler wrt optimization.
The non-deterministic semantics
??? _|_ x -> possibly _|_ or x
are also pretty interesting. Very reminiscent of what can happen with rewrites in the presence of the other two. Could there be something here (i.e., this would allow you to have rewrites that still preserve the semantics as the non-determinism would officially be part of the semantics)?
I don't have very much experience with rewrite rules, so I've pushed my thoughts and opinions to the end of the post.

* The bog-easy way of pulling this off (from the implementation side) would just be "seq", but ignoring that it exists when you're performing rewrite rules. [1]

* You can probably do better if both arguments are guaranteed (by the user) to be non-bottom. [2]

There is one essential difference between seq and strict data structures, which is that we want data going into strict data structures to be known so that it can be stored unboxed; but if no storage is happening at all, it seems plausible that we ought not to evaluate it at all.

So it seems to me that once we decide that a structure ought to be unboxed (strict), there's no point in applying any more rewrite rules: we presumably want the unboxed structure because it's now going to be shipped to the far ends of the earth, traversed in non-rewritable ways, etc. So in this sense, the failure of strict rewrite rules to make sense is sensible.

What we might want to figure out is whether we can make both unboxed and boxed versions of data structures coexist (the boxed versions for further rewriting, the unboxed for representation), and think about an operator that says, "Yes, now I want this structure to exist (efficiently) in memory" (and hope that the rewrite rules up to that point allow GHC to fill out that table directly). But non-uniform representations like this could blow up code size and cause their own problems.

Cheers,
Edward

[1] But I don't feel very good about this, since it's basically equivalent to "yeah, we don't have any semantics for this, but it will either be this or that, depending on what rules fire and which way the wind blows." Not that rewrite rules aren't already like that.

[2] But that is very unsafe, and although rewrite rules are "unsafe", here we've pushed the unsafety into the land of client code.

Edward Z. Yang wrote:
* The bog-easy way of pulling this off (from the implementation side) would just be "seq", but ignore that it exists when you're performing rewrite rules. [1]
Yes, though I doubt it's all that easy. It isn't necessarily clear under exactly which conditions this should happen. We could say that we want the rule "f (g x) = f' x" to fire in this case:

    case g x of y -> f y

What about this case, though?

    case g x of y -> case h x of z -> f y

Or this?

    case g x of y ->
      case y of (y1, y2) ->
        case y1 of _ ->
          case y2 of _ -> f y

In general, two function applications can be quite far apart in the presence of seqs, and they won't be brought together by the simplifier because those seqs introduce an artificial ordering on the computations.
So it seems to me that once we decide that a structure ought to be unboxed (strict), then there's no point in applying anymore rewrite rules: we presumably want the unboxed structure because it's now going to be shipped to the far ends of the earth, traversed in non-rewritable ways, etc. So in this sense, the failure of strict rewrite rules to make sense is sensible.
I disagree. Here is an example from DPH. A central operation is mapD, which applies a function in parallel to a distributed value (Dist). The intention is that each thread will apply this function to its private value:

    mapD :: (a -> b) -> Dist a -> Dist b

We have the obvious rewrite rule:

    "mapD/mapD" mapD f (mapD g x) = mapD (f . g) x

Now, mapD fully evaluates its argument before passing it to the function, so using strict types here seems like a good idea. Suppose we use strict pairs:

    data a :*: b = !a :*: !b

and get an intermediate term like this:

    mapD (\(x :*: y) -> f1 x :*: g1 y) $ mapD (\(x :*: y) -> f2 x :*: g2 y) xy

The "mapD/mapD" rule gives us this:

    mapD (\(x :*: y) -> case f2 x of x' ->
                        case g2 y of y' ->
                        case f1 x' of x'' ->
                        case g1 y' of y'' -> x'' :*: y'') xy

If we have rules for f1/f2 and g1/g2, they won't fire even though we really want them to. Contrast this with lazy pairs:

    mapD (\(x, y) -> (f1 x, g1 y)) $ mapD (\(x, y) -> (f2 x, g2 y)) xy
      = mapD (\(x, y) -> (f1 (f2 x), g1 (g2 y))) xy

Here, the rules will fire without any problems. But now GHC doesn't know that x and y are fully evaluated (or, rather, that it's ok to evaluate and unbox x and y before doing anything else, like executing loops). Note that f1 and g1 will usually be strict, so the two versions are really semantically equivalent. But GHC won't necessarily be able to figure this out on its own.

Roman

These examples have made it quite obvious that I don't know what I'm talking about. :-) I'm going to study seq and rewriting more closely; please continue the discussion, it is quite enlightening.

Cheers,
Edward

Johan Tibell wrote:
Do we really need to go all the way to type families to get strict data types? Wouldn't something simpler like a Haskell 98 Data.Strict module do?
Sure. Type families just give us nicer syntax. Also this:

    class Strictify a where
      strictify   :: a -> Strict a
      unstrictify :: Strict a -> a

Roman
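For illustration, a hypothetical instance of this class for pairs, assuming the Strict data family and SPair instance from earlier in the thread:

    -- Hypothetical instance, assuming
    --   data family Strict a
    --   data instance Strict (a, b) = SPair !a !b
    instance Strictify (a, b) where
      strictify (a, b)        = SPair a b
      unstrictify (SPair a b) = (a, b)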
participants (7):
- Bas van Dijk
- Edward Z. Yang
- Gábor Lehel
- Johan Tibell
- Max Bolingbroke
- Roman Leshchinskiy
- Tyson Whitehead