
After programming as an exercice the sum function, my version is faster than the Data.List's version. Looking at the source code, Data.List uses a foldl and not a foldl'. foldl' seems faster and allows to use very big lists. So, why is foldl used by Data.List for sum? -- Guillaume Pinot http://www.irccyn.ec-nantes.fr/~pinot/ « Les grandes personnes ne comprennent jamais rien toutes seules, et c'est fatigant, pour les enfants, de toujours leur donner des explications... » -- Antoine de Saint-Exupéry, Le Petit Prince () ASCII ribbon campaign -- Against HTML e-mail /\ http://www.asciiribbon.org -- Against proprietary attachments

texitoi:
After programming as an exercice the sum function, my version is faster than the Data.List's version. Looking at the source code, Data.List uses a foldl and not a foldl'. foldl' seems faster and allows to use very big lists. So, why is foldl used by Data.List for sum?
It's considered a bug in the H98 report I think (since there was no fold' at the time). If you're summing sequences, have a look at Data.Vector for faster code.

Am Mittwoch 10 März 2010 22:33:43 schrieb TeXitoi:
After programming as an exercice the sum function, my version is faster than the Data.List's version. Looking at the source code, Data.List uses a foldl and not a foldl'. foldl' seems faster and allows to use very big lists. So, why is foldl used by Data.List for sum?
Because Haskell is a non-strict language, and foldl' is strict -- someone might write a (legitimate) Num instance for a datatype such that foldl (+) 0 xs returns a good value, but foldl' (+) 0 xs gives ***Exception: Prelude.undefined for some lists xs. Since Haskell is non-strict, sum xs should give a good value then. However, with optimisations turned on, for the standard Num instances, GHC knows that sum is actually strict and produces code as if sum were defined using foldl'. Depending on how you timed the functions, there are several ways how you could have obtained your result without there being an actual difference between your implementation and the library's for Int, Integer, ...

Daniel Fischer
Am Mittwoch 10 März 2010 22:33:43 schrieb TeXitoi:
After programming as an exercice the sum function, my version is faster than the Data.List's version. Looking at the source code, Data.List uses a foldl and not a foldl'. foldl' seems faster and allows to use very big lists. So, why is foldl used by Data.List for sum?
Because Haskell is a non-strict language, and foldl' is strict -- someone might write a (legitimate) Num instance for a datatype such that
foldl (+) 0 xs
returns a good value, but
foldl' (+) 0 xs
gives
***Exception: Prelude.undefined
for some lists xs. Since Haskell is non-strict, sum xs should give a good value then.
Yeah, I see. I've thought at that looking the comment of maximum.
However, with optimisations turned on, for the standard Num instances, GHC knows that sum is actually strict and produces code as if sum were defined using foldl'.
OK, good to know.
Depending on how you timed the functions, there are several ways how you could have obtained your result without there being an actual difference between your implementation and the library's for Int, Integer, ...
In ghci, that's explain the non optimized version. Same result without any option to ghc. I have similar performances between sum and foldl' (+) 0 with ghc -O2. Thanks for the explainations. -- Guillaume Pinot http://www.irccyn.ec-nantes.fr/~pinot/ « Les grandes personnes ne comprennent jamais rien toutes seules, et c'est fatigant, pour les enfants, de toujours leur donner des explications... » -- Antoine de Saint-Exupéry, Le Petit Prince () ASCII ribbon campaign -- Against HTML e-mail /\ http://www.asciiribbon.org -- Against proprietary attachments

TeXitoi wrote:
why is foldl used by Data.List for sum?
Daniel Fischer wrote:
Because Haskell is a non-strict language, and foldl' is strict -- someone might write a (legitimate) Num instance for a datatype such that foldl (+) 0 xs returns a good value, but foldl' (+) 0 xs gives ***Exception: Prelude.undefined for some lists xs.
It is possible to define such a Num instance, but it is extremely rare for anything like that to come up in practice.
However, with optimisations turned on... GHC knows that sum is actually strict
GHC does that when optimizations are turned on, but that behavior is not required by the Haskell standard. So there is no guarantee that any given compiler will produce usable output if you use foldl instead of foldl' for sum. In GHCi sum is broken, because optimizations are not in effect there. You have to define your own version of sum using foldl' for every GHCi session (or put it in your .ghci file). So it's a trade-off between a slight convenience in a bizarre corner case and general usability. I agree with Don that this is a bug in the Haskell 98 standard. Regards, Yitz

Am Donnerstag 11 März 2010 15:23:32 schrieb Yitzchak Gale:
TeXitoi wrote:
why is foldl used by Data.List for sum?
Daniel Fischer wrote:
Because Haskell is a non-strict language, and foldl' is strict -- someone might write a (legitimate) Num instance for a datatype such that foldl (+) 0 xs returns a good value, but foldl' (+) 0 xs gives ***Exception: Prelude.undefined for some lists xs.
It is possible to define such a Num instance, but it is extremely rare for anything like that to come up in practice.
Yes. And I'd expect foldr (+) 0 to be much more useful for lazy Num instances than foldl (+) 0. Nevertheless, one has to take that possibility into account.
However, with optimisations turned on... GHC knows that sum is actually strict
GHC does that when optimizations are turned on, but that behavior is not required by the Haskell standard. So there is no guarantee that any given compiler will produce usable output if you use foldl instead of foldl' for sum.
In GHCi sum is broken, because optimizations are not in effect there. You have to define your own version of sum using foldl' for every GHCi session (or put it in your .ghci file).
So it's a trade-off between a slight convenience in a bizarre corner case and general usability. I agree with Don that this is a bug in the Haskell 98 standard.
I'm not sure whether it's a wart or a bug, but I agree that it would be better to have the default sum strict (and provide lazysum for the cases where it's useful).
Regards, Yitz
Cheers, Daniel

However, with optimisations turned on... GHC knows that sum is actually strict
GHC does that when optimizations are turned on, but that behavior is not required by the Haskell standard. So there is no guarantee that any given compiler will produce usable output if you use foldl instead of foldl' for sum.
In GHCi sum is broken, because optimizations are not in effect there. You have to define your own version of sum using foldl' for every GHCi session (or put it in your .ghci file).
So it's a trade-off between a slight convenience in a bizarre corner case and general usability. I agree with Don that this is a bug in the Haskell 98 standard.
I'm not sure whether it's a wart or a bug, but I agree that it would be better to have the default sum strict (and provide lazysum for the cases where it's useful).
That would be really inconsistent with the way the rest of the Haskell language and libraries works. Note that foldl' has a ' to indicate that it's not the same as foldl exactly. I would propose that sum' exist as well as sum, and that sum be lazy. Part of what's interesting about Haskell is the default non-strict evaluation of the language, and even if it's hard for newcomers to get used to it, they're better off having a consistent space to work in rather than all these exceptions to make algorithms like "sum" work more efficiently when the problem was one of documentation and education about the language. Of course if you want a strict functional language there's plenty of those out there. Dave
Regards, Yitz
Cheers, Daniel
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

David Leimbach wrote:
Note that foldl' has a ' to indicate that it's not the same as foldl exactly. I would propose that sum' exist as well as sum, and that sum be lazy.
I wish Haskell allowed ! to occur (non-initially) in alphanum_' identifiers as well as in symbolic ones. Then we could be more consistent about having ! mean strictness like it does with ($!), bang patterns, strict fields,... (too bad about (!) and (!!)). The prime has so many other uses, it's a shame it gets used up for strict/lazy variants, as if there were no other variations. -- Live well, ~wren

Note that foldl' has a ' to indicate that it's not the same as foldl exactly. I would propose that sum' exist as well as sum, and that sum be lazy.
Well, meaningful identifier names is nice, but I think here we have a case of the code smell "type info embedded in the name". Strictness of a function should be expressed in the function's type instead. But that seems impossible with Haskell at the moment. (At best, we can express strictness of constructors?) Hence we have "underspecified" behaviour: Prelude Data.List> :t foldl' foldl' :: (a -> b -> a) -> a -> [b] -> a Prelude Data.List> :t foldl foldl :: (a -> b -> a) -> a -> [b] -> a and need to resort to the awkward workaround via naming conventions. Of course Haskell implementations do have some kind of strictness information (e.g., in ghc interface files), so it's not impossible to define some kind of annotation system. Although I did not check what the compiler's strictness info is for foldl and fold' - and what was actually needed (at the source level). The current textual definition (Data.List API docs: "foldl' = a strict version of foldl") is not too precise, either. Well, I guess there's a huge design space. But it's a huge problem (describing/controlling the behaviour of lazy programs). Best - J.W.

On Fri, Mar 12, 2010 at 10:29 AM, Johannes Waldmann
Well, meaningful identifier names is nice, but I think here we have a case of the code smell "type info embedded in the name". Strictness of a function should be expressed in the function's type instead. But that seems impossible with Haskell at the moment. (At best, we can express strictness of constructors?) Hence we have "underspecified" behaviour:
Prelude Data.List> :t foldl' foldl' :: (a -> b -> a) -> a -> [b] -> a
Prelude Data.List> :t foldl foldl :: (a -> b -> a) -> a -> [b] -> a
Even if we had a syntax to express that the function is strict, wouldn't we still need two distinct function names for the strict and lazy case ? In that case, some sort of convention on naming is nice, because if I want to change a function to its strict version, I know there's a good chance it's the one that ends with a ' David.
and need to resort to the awkward workaround via naming conventions.
Of course Haskell implementations do have some kind of strictness information (e.g., in ghc interface files), so it's not impossible to define some kind of annotation system.
Although I did not check what the compiler's strictness info is for foldl and fold' - and what was actually needed (at the source level). The current textual definition (Data.List API docs: "foldl' = a strict version of foldl") is not too precise, either.
Well, I guess there's a huge design space. But it's a huge problem (describing/controlling the behaviour of lazy programs).
Best - J.W.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

David Virebayre
Even if we had a syntax to express that the function is strict, wouldn't we still need two distinct function names for the strict and lazy case ?
OK, I'd like to register a "code smell" for: "hierarchical/systematic structure inside identifier names"; suggested refactoring: use hierarchy/structure provided by the language, in this case, something like: Data.List.Strict.fold, Data.List.Lazy.fold Or - if we had static overloading, and strictness info in the type, then we wouldn't need different names. Can of worms ...

On Fri, Mar 12, 2010 at 12:01 PM, Johannes Waldmann
David Virebayre
writes:
in this case, something like: Data.List.Strict.fold, Data.List.Lazy.fold
But then if you need both version, you will have to import them qualified, which I don't like much.

David Virebayre
But then if you need both version, you will have to import them qualified, which I don't like much.
solution: type directed name resolution: * either in the language, http://hackage.haskell.org/trac/haskell-prime/wiki/TypeDirectedNameResolutio... * or, failing that, simulated by a helpful (type and module aware) IDE.

On Mar 12, 2010, at 06:01 , Johannes Waldmann wrote:
in this case, something like: Data.List.Strict.fold, Data.List.Lazy.fold
Or - if we had static overloading, and strictness info in the type, then we wouldn't need different names. Can of worms ...
Doesn't help if strictness needs to be controlled at use sites and not at the type level (which is the usual case). -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

Johannes Waldmann
Well, meaningful identifier names is nice, but I think here we have a case of the code smell "type info embedded in the name". Strictness of a function should be expressed in the function's type instead.
I've stumbled into this sentiment before, and it's one of those "I must be dumb because everybody else takes this for granted"-issues.¹ Are there any explanation of how/why this is a type issue? To me it appears natural to use the same higher order functions (say, foldr) with strict operations or lazy ones. OTOH, I can't off-hand think of a reasonable use for foldl or foldr', so you're probably right they should be separated.
Prelude Data.List> :t foldl' foldl' :: (a -> b -> a) -> a -> [b] -> a
Prelude Data.List> :t foldl foldl :: (a -> b -> a) -> a -> [b] -> a
What should the type look like? If memory serves, Clean allows bangs in type signatures, something like: foldl' :: (a -> b -> a) -> !a -> [b] -> a but I thought it just added a seq under the hood, much like bang patterns like foldl' f !z xs = ... do in Haskell, so it's not like !a is a separate type. Or? Let me explore a bit here. In deference to the time-honored if ill-advised tradition of abusing the apostrophe, let a' be the type a, but strictly evaluated. Perhaps things could look something like: foldl' :: (a' -> b' -> a) -> a' -> b (+) :: a' -> a' -> a -- strict (:) :: a -> [a] -> [a] -- non-strict foldl' (+) 0 [1..10] -- okay foldl' (flip (:)) [] [1..10] -- illegal Would something like this work? It seems to me that strictness doesn't apply to result types (x `seq` x === x), and you need to track the relationship between a and a', since you probably want to allow a lazy value as e.g. the second parameter of foldl'. I'm sure this has been discussed to death ages ago, is there a simple overview I might understand that discusses or summarizes the issue? -k ¹ As opposed to the "everybody else must be dumb, because they disagree with me"-issues. Perhaps I can work out my IQ from the ratio? -- If I haven't seen further, it is by standing in the footprints of giants

Ketil Malde
Prelude Data.List> :t foldl foldl :: (a -> b -> a) -> a -> [b] -> a
What should the type look like?
Good question - and in my posting I tried to avoid the impression that I have an answer, because I really haven't. My suggestion was that some of the annotations that are currently hidden in ghc interface files, could perhaps be lifted to source level (so the programmer can write down the properties she expects). Perhaps an analogy could be "mode" annotations for Prolog? I agree that "this must have been discussed before", and I'd appreciate some pointers.

On Fri, 12 Mar 2010, Johannes Waldmann wrote:
Ketil Malde
writes: Prelude Data.List> :t foldl foldl :: (a -> b -> a) -> a -> [b] -> a
What should the type look like?
Good question - and in my posting I tried to avoid the impression that I have an answer, because I really haven't.
foldl' :: Seq a => (a -> b -> a) -> a -> [b] -> a ? I don't think that foldl' is really what we want, since we cannot control the depth of strictness.

On 12 March 2010 10:38, Ketil Malde
What should the type look like? If memory serves, Clean allows bangs in type signatures, something like:
foldl' :: (a -> b -> a) -> !a -> [b] -> a
but I thought it just added a seq under the hood,
Thats my understanding too. I did look briefly at this, and I *think* its pretty straightforward to have a system of segregated strict and non-strict types, with a type constructor ! which "unlifts" a non-strict type to a strict one. In this system, the choice of whether to build a thunk at a let binding / application site is made based on the kind of the bound thing. You have a kind system like: k ::= * | ! | k -> k (* is standard lifted types, ! is unlifted type) And then the type constructor ! has kind "* -> !". You have to allow the (->) to have several kinds (a bit of a wart): (->) :: * -> * Lazy argument, result is enclosed in a thunk at the call site (unless at focus of evaluation) (->) :: * -> ! Lazy argument, result is evaluated eagerly at the call site (->) :: ! -> * Strict argument, result is enclosed in a thunk at the call site (unless at focus of evaluation) (->) :: ! -> ! Strict argument, result is evaluated eagerly at the call site You can then write signatures like this: eq :: !a -> !a -> Bool But what do the type quantifiers look like? The only reasonable answer is: eq :: forall (a :: *). !a -> !a -> Bool Quantifying over type variables of kind * would have to be the default to retain backwards compatibility. This is a bit annoying you will only be able to instantiate any polymorphic Haskell function at lazy types, unless you explicitly wrote it with the "strict" type signature explicitly. So you need e.g. a strict and a non-strict identity function. This seems to really be necessary because in general the code generated for the two alternatives won't be the same, so to get true "strictness polymorphism" you need 2^n copies of the code, where n is the number of type variables in the signature. There are probably some tricks you can play to ameliorate this blow up (strictness "dictionaries" anyone?) but it looks hard. Nonetheless, I think a system with segregated strict/non-strict types could be workable and interesting. I heard tell that JHC may have some prior art in this area.. Cheers, Max

Ketil Malde wrote:
What should the type look like? If memory serves, Clean allows bangs in type signatures, something like:
foldl' :: (a -> b -> a) -> !a -> [b] -> a
but I thought it just added a seq under the hood, much like bang patterns like
foldl' f !z xs = ...
do in Haskell, so it's not like !a is a separate type. Or?
The usual approach I've seen is not to distinguish strict and lazy datatypes, but rather to distinguish strict and lazy functions, e.g. by having two different arrows: (->) for lazy functions and (!->) for strict ones.[1] There are reasonable reasons for wanting to distinguish strict/lazy datatypes instead, though that's usually called distinguishing lifted vs unlifted types, which has different effects on the semantics. I don't recall which approach Clean uses.
Would something like this work? It seems to me that strictness doesn't apply to result types
Which is one benefit of the two arrows version: there's no way to talk about making the result strict.
and you need to track the relationship between a and a', since you probably want to allow a lazy value as e.g. the second parameter of foldl'.
Usually, if you're going to be distinguishing lifted and unlifted types then you'll want to be able to say that the unlifted values can always be coerced into the corresponding lifted type. Whether that involves subtyping, autoboxing, etc. will depend on the compiler and the language. [1] Actually, just having two is insufficiently expressive. What you actually want is to make the (->) type constructor take three arguments: the parameter type, the return type, and an index belonging to the set {Strict,Lazy}. That way you can use type variables to unify strictness behaviors of higher-order functions with their function arguments (i.e., the HOF is strict or lazy in various arguments depending on the strictness properties of the functional arguments). -- Live well, ~wren

On Sat, Mar 13, 2010 at 3:19 AM, wren ng thornton
The usual approach I've seen is not to distinguish strict and lazy datatypes, but rather to distinguish strict and lazy functions, e.g. by having two different arrows: (->) for lazy functions and (!->) for strict ones.[1]
But what about the laziness properties of e.g. the maybe function? ghci> maybe undefined id (Just ()) () ghci> maybe () undefined Nothing () ghci> maybe undefined id Nothing *** Exception: Prelude.undefined ghci> maybe () undefined (Just ()) *** Exception: Prelude.undefined It's clear that no type signature for maybe is going to tell you about all these cases. It's similarly impossible to imagine a type signature that will tell you that take 3 is strict up to the third nested cons of the input list, and no further. In general, laziness behaviour can get complicated quickly and so I'm not convinced that the type signature is a good home for that information. I suppose a function arrow that had the same effect as putting a ! pattern on the parameter to its left might not be a bad thing (although we could argue about the exact syntax and representation, as imo !-> is neither intuitively obvious nor aesthetically pleasing), but it's never going to make seq and ! patterns (which can be applied on a single equation rather than the whole function, and in nested/lambda bindings) and so forth redundant.

Ben Millwood wrote:
In general, laziness behaviour can get complicated quickly and so I'm not convinced that the type signature is a good home for that information.
Certainly it can. A lot of the same problems arise in the logic programming community under the topic of "modes", i.e. whether a logic variable must be ground (or rather, how defined it must be) before it's safe to "run the function backwards". Though, at present AFAIK, they've contented themselves with rough approximations like lifted/unlifted value types or strict/lazy arrows, either of which can catch the simple cases. Personally, I think the only way to capture *all* strictness information is if we have full dependent types (or worse, for logic languages, since their dependencies lack the directionality of usual dependent types). If we had full dependent types, and went with the lifted/unlifted distinction instead of the strict/lazy one, then we could give `maybe` the type: maybe :: (_:b) -> (_:a -> b') -> (m:Maybe a) -> (case m of Nothing => b ; Just _ => b') Would such a type be helpful? <shrug> I think adding full dependent types is a bit much if all we're interested in is strictness behavior. But, as you say, it seems very unlikely that we can encode strictness behaviors which may depend on particular runtime values without DTs. -- Live well, ~wren

wren> I wish Haskell allowed ! to occur (non-initially) in alphanum_' wren> identifiers as well as in symbolic ones. Then we could be more wren> consistent about having ! mean strictness BTW, does something in haskell syntax prevent '?' from appearing at the end of identifiers ? It is a nice way to name predicates. -- Paul

Am Freitag 12 März 2010 12:14:06 schrieb Paul R:
wren> I wish Haskell allowed ! to occur (non-initially) in alphanum_' wren> identifiers as well as in symbolic ones. Then we could be more wren> consistent about having ! mean strictness
BTW, does something in haskell syntax prevent '?' from appearing at the end of identifiers ?
Yes, http://haskell.org/onlinereport/lexemes.html#sect2 uniSymbol -> any Unicode symbol or punctuation Punctuation characters are symbols (as far as the lexical syntax is concerned), hence can't appear in varids, only in varsyms (operators).

On Thu, Mar 11, 2010 at 10:42 PM, wren ng thornton
David Leimbach wrote:
Note that foldl' has a ' to indicate that it's not the same as foldl exactly. I would propose that sum' exist as well as sum, and that sum be lazy.
I wish Haskell allowed ! to occur (non-initially) in alphanum_' identifiers as well as in symbolic ones. Then we could be more consistent about having ! mean strictness like it does with ($!), bang patterns, strict fields,... (too bad about (!) and (!!)). The prime has so many other uses, it's a shame it gets used up for strict/lazy variants, as if there were no other variations.
Yes, a lot of conventions don't always make sense in certain contexts, but at least the convention is there to help keep things consistent (for better or worse) Dave
-- Live well, ~wren
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Daniel Fischer wrote:
I'm not sure whether it's a wart or a bug, but I agree that it would be better to have the default sum strict
David Leimbach wrote:
That would be really inconsistent with the way the rest of the Haskell language and libraries works... I would propose that... sum be lazy...
Daniel meant "internally strict". All versions of "sum", even the current one, are strict in the literal sense: Prelude> sum undefined `seq` 42 *** Exception: Prelude.undefined As for how it works internally, there is no assumption in Haskell about that. Generally, it is up to the compiler to decide in each individual case. Here the correct thing to do in practice is to be strict, but only a very smart compiler like GHC with -O2 can figure that out. So the library should give that hint.
all these exceptions to make algorithms like "sum" work more efficiently when the problem was one of documentation and education about the language.
This is not a question of efficiency. The current version of "sum" is *broken* - it crashes programs. Haskell education would be far better served by removing bugs from the standard libraries than by needing to include rules like "Never use the standard 'sum' function unless you are absolutely certain that the input to the function will never exceed the stack size of your runtime." Regards, Yitz

I asked a similar question a while ago on the cafe
http://www.haskell.org/pipermail/haskell-cafe/2009-June/thread.html#62772
On Wed, Mar 10, 2010 at 5:33 PM, TeXitoi
After programming as an exercice the sum function, my version is faster than the Data.List's version. Looking at the source code, Data.List uses a foldl and not a foldl'. foldl' seems faster and allows to use very big lists. So, why is foldl used by Data.List for sum?
-- Guillaume Pinot http://www.irccyn.ec-nantes.fr/~pinot/
« Les grandes personnes ne comprennent jamais rien toutes seules, et c'est fatigant, pour les enfants, de toujours leur donner des explications... » -- Antoine de Saint-Exupéry, Le Petit Prince
() ASCII ribbon campaign -- Against HTML e-mail /\ http://www.asciiribbon.org -- Against proprietary attachments
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- keithsheppard.name
participants (15)
-
Ben Millwood
-
Brandon S. Allbery KF8NH
-
Daniel Fischer
-
David Leimbach
-
David Virebayre
-
Don Stewart
-
Henning Thielemann
-
Johannes Waldmann
-
Keith Sheppard
-
Ketil Malde
-
Max Bolingbroke
-
Paul R
-
TeXitoi
-
wren ng thornton
-
Yitzchak Gale