Preventing/handling space leaks

L.S., Does anyone know about documentation (preferably on the Web) on how to prevent/find/remove space leaks? Are there any differences between Hugs and GHC or any other Haskell platform, regarding space leaks? I read some discussions about why not everybody uses Haskell; it looks to me, that the problem of space leaks is a very good reason to not use Haskell for commercial applications. Java, for example, does not have this problem. -- Best regards, Henk-Jan van Tuyl

Henk-Jan.van.Tuyl wrote:
[...] it looks to me, that the problem of space leaks is a very good reason to not use Haskell for commercial applications. Java, for example, does not have this problem.
I just can't resist when I read PR statements like this (SUN's marketing department has *really* done a good job): Granted, Haskell has problems with space leaks from time to time, and it is especially easy for beginners to stumble over them, see e.g. the recurring "foldl (+) 0 [1..1000000]"-discussions popping up regularly. But for large realistic programs most programming languages converge and you basically have the choice of what kind of space leak you want: * C: Missing calls to free(), etc. * C++: All of C's leaks + lots of hard to find space leaks due to incorrectly handled exceptions + ... * Haskell: Functions which are not strict enough, thunks which are never evaluated but hold large data structures, etc. * Java: Listeners which are not de-registered, containers which are not "nulled" after removal of an element, badly written cache-like data structures, etc. Note that I earn my living with JVMs and Java programs (some >1mio LOC), but every project I worked for had problems with space leaks at some point. The advantage of Haskell IMHO is that you normally get them very early... :-] Cheers, S.

On 06-Dec-2003, Sven Panne
Henk-Jan.van.Tuyl wrote:
[...] it looks to me, that the problem of space leaks is a very good reason to not use Haskell for commercial applications. Java, for example, does not have this problem.
I just can't resist when I read PR statements like this (SUN's marketing department has *really* done a good job):
Yes, it is just plain wrong to say that Java never has space leaks.
Granted, Haskell has problems with space leaks from time to time, and it is especially easy for beginners to stumble over them,
The problem with Haskell is not so much that beginners sometimes stuble on space leaks -- the problem is that even seasoned experts have great difficulty analyzing the space usage of very simple Haskell programs.
But for large realistic programs most programming languages converge and you basically have the choice of what kind of space leak you want:
If you are suggesting that space leaks are equally frequent and equally easy to diagnose and avoid in these different languages, then I would disagree.
* C: Missing calls to free(), etc.
For C, leaks are common because it is easy to forget to insert calls to free(), and avoiding or fixing them can be difficult because figuring out when it is safe to call free() requires a non-local analysis to figure out when data is no longer used.
* C++: All of C's leaks + lots of hard to find space leaks due to incorrectly handled exceptions + ...
C does suffer from many of the same problems as C. But in C++, it is much easier to automate techniques like reference counting, which can be done manually in C but are much more cumbersome and error-prone when done manually.
* Haskell: Functions which are not strict enough, thunks which are never evaluated but hold large data structures, etc.
Yes. The difficulty with Haskell is that everything is lazy by default. There is no explicit syntax required to define a lazy function, so if you want to figure out which functions are too lazy, you may need to examine *every* function. There is often no explicit syntax for creating a thunk, so again it is difficult to spot which parts of the program are doing this.
* Java: Listeners which are not de-registered, containers which are not "nulled" after removal of an element,
In general Java can suffer from space leaks if variables which will not be used are not "nulled" out. However, avoiding and/or fixing such problems is a lot easier in Java than in C, since determining whether a variable can safely be nulled out only requires analysing uses of that variable, rather than analysing uses of the object to which that variable points. This is a _much_ easier kind of analysis. IMHO it is probably also much easier than trying to analyze space usage of Haskell programs.
badly written cache-like data structures, etc.
Those can be a problem in any language.
--
Fergus Henderson

[ Just one more mail and I promise to shut up on this topic... :-) ] Fergus Henderson wrote:
[...] C does suffer from many of the same problems as C. But in C++, it is much easier to automate techniques like reference counting, which can be done manually in C but are much more cumbersome and error-prone when done manually.
Granted, C++'s (copy) constructors, destructors and assignment operators make some things relatively easy compared to C, but the complexity of handling exceptions *correctly* makes things worse again: There is a famous article (I can't remember the title or the author) where a well-known C++ grandmaster explains a stack class, but another later article by someone else describes the numerous bugs in that class when exceptions are taken into account. And one final remark on Haskell and Java: In large projects in those languages you usually get the "genuine" space leaks in those languages plus all those nice little leaks from the ubiquitous native functions/methods. So you have to debug in at least two languages at the same time and I haven't seen combined tool support for this yet... :-P * * * Cheers, S.

Sven Panne
[ Just one more mail and I promise to shut up on this topic... :-) ]
Surely slamming C++ is on topic? :-)
Fergus Henderson wrote:
[...] C does suffer from many of the same problems as C. But in C++, it is much easier to automate techniques like reference counting, which can be done manually in C but are much more cumbersome and error-prone when done manually.
Granted, C++'s (copy) constructors, destructors and assignment operators make some things relatively easy compared to C, but the complexity of handling exceptions *correctly* makes things worse again: There is a famous article (I can't remember the title or the author) where a well-known C++ grandmaster explains a stack class, but another later article by someone else describes the numerous bugs in that class when exceptions are taken into account.
I think you're referring to this: http://www.awprofessional.com/content/images/020163371x/supplements/Exceptio... I'd also add that GC is more important in object oriented programming, it's more natural to pass objects around on a larger scale. At least IME.
And one final remark on Haskell and Java: In large projects in those languages you usually get the "genuine" space leaks in those languages plus all those nice little leaks from the ubiquitous native functions/methods. So you have to debug in at least two languages at the same time and I haven't seen combined tool support for this yet
I'm not sure I understand what you mean. Examples? -kzm -- If I haven't seen further, it is by standing in the footprints of giants

G'day all.
Quoting Sven Panne
Granted, C++'s (copy) constructors, destructors and assignment operators make some things relatively easy compared to C, but the complexity of handling exceptions *correctly* makes things worse again: There is a famous article (I can't remember the title or the author) where a well-known C++ grandmaster explains a stack class, but another later article by someone else describes the numerous bugs in that class when exceptions are taken into account.
Far be it from me to defend C++, but this problem is far better understood today than when that GoTW article (circa 1996, from memory) was written. C++ defines two levels of exception safety: the "weak" guarantee, which merely means that there are no space leaks. This is actually quite an easy thing to guarantee. The other is the "strong" guarantee, which effectively means transaction semantics: when an exception is thrown, either the operation happened fully or it didn't at all. For operations which do not have side-effects (e.g. C++ when only manipulating local state, or Haskell when not in a stateful monad), transaction semantics are actually not that hard to do. Haskell run-time systems, for example, do it automatically, by only performing the "cell update" operation when a reduction step is complete and no further exceptions can be raised. Current C++ thinking is to do precisely the same, only manually. Prepare everything, then "update" at the end, just like a two-phase commit. If the "update" operation cannot possibly raise an exception, and everything else is weakly exception-safe, then the operation is strongly exception-safe by default. In the presence of side-effecty operations, such as I/O, neither C++ nor Haskell can provide transaction semantics without extra work from the programmer. If your disk fills up half-way through your write operation, for example, neither a C++ nor a Haskell implementation will automatically roll back the operation. If that's what you need, it's up to you to do it. Neither C++ nor Haskell programmers usually consider transaction semantics on I/O operations to be worth it, so C++ (and Haskell) programmers usually only provide weak exception safety. The point of this is that strongly exception-safe C++ is, for the most part, actually no harder in principle than memory-leak-safe C++. This wasn't generally realised in 1996, though. Cheers, Andrew Bromage

On 08-Dec-2003, ajb@spamcop.net
G'day all.
Quoting Sven Panne
: Granted, C++'s (copy) constructors, destructors and assignment operators make some things relatively easy compared to C, but the complexity of handling exceptions *correctly* makes things worse again: There is a famous article (I can't remember the title or the author) where a well-known C++ grandmaster explains a stack class, but another later article by someone else describes the numerous bugs in that class when exceptions are taken into account.
Far be it from me to defend C++, but this problem is far better understood today than when that GoTW article (circa 1996, from memory) was written.
The problems are certainly better understood. But they are also
certainly NOT understood well enough for programmers to be able to
reliably avoid them.
Even the C++ standard library itself, which has been
subject to review by the world's best C++ experts, suffers
from exception safety problems. A new exception safety
problem with std::auto_ptr was discovered just last Friday! See
http://groups.google.com.au/groups?selm=uptf3hzya.fsf%40boost-consulting.com.
Note that this class has already been the subject of extensive analysis
of its exception safety, and indeed the only reason that auto_ptr
was introduced in the first place was in an attempt to help guarantee
exception safety!
--
Fergus Henderson

Fergus Henderson

G'day all.
Quoting Gabriel Dos Reis
I think it is fair to characterize auto_ptr as a curiously broken class since its conception and design.
I also think that's a fair characterisation. std::auto_ptr has always struck me as a fairly brittle abstraction. Learning that it has a new exception safety issue doesn't exactly come as a shock. Cheers, Andrew Bromage

"Henk-Jan.van.Tuyl"
L.S.,
(Whom?)
Does anyone know about documentation (preferably on the Web) on how to prevent/find/remove space leaks? Are there any differences between Hugs and GHC or any other Haskell platform, regarding space leaks?
I should probably invest the time to learn Hat or Buddha or something, but I find I get pretty far using GHCs (mostly adopted from NHC, I believe) memory profiling. Look it up in the (excellent) User Guide!
Java, for example, does not have [space leaks]
Makes me wonder why my Java-using aquaintances are so frustrated about it, then... :-) -kzm -- If I haven't seen further, it is by standing in the footprints of giants

L.S.,
On 07 Dec 2003 22:53:45 +0100,
"Henk-Jan.van.Tuyl"
writes: L.S.,
(Whom?)
L.S. :: Abbreviation a => Latin a L.S. = "Lectoribus Salutem" -- Readers be greated
Does anyone know about documentation (preferably on the Web) on how to prevent/find/remove space leaks? Are there any differences between Hugs and GHC or any other Haskell platform, regarding space leaks?
I should probably invest the time to learn Hat or Buddha or something, but I find I get pretty far using GHCs (mostly adopted from NHC, I believe) memory profiling. Look it up in the (excellent) User Guide!
Java, for example, does not have [space leaks]
Makes me wonder why my Java-using aquaintances are so frustrated about it, then... :-)
I have spent most of the six years with my former employer coding in C and some time in Java (and other languages) and had never any trouble with space leaks; mainly because I strictly followed some rules for Good Coding Practice (never use gotos, only one return statement per function, catch exceptions as early as possible, keep variables as local as possible, always check the number of free()s, etc. (etc. == etcetera == and so forth :-) )) So far I have seen only one rule for Good Coding Practice in Haskell: Do Not Use n+k Patterns. I hope someone can give some directions, how to avoid known pitfalls (especially Space Leaks). -- Best regards, Henk-Jan van Tuyl

hello, Henk-Jan van Tuyl wrote:
I have spent most of the six years with my former employer coding in C and some time in Java (and other languages) and had never any trouble with space leaks; mainly because I strictly followed some rules for Good Coding Practice (never use gotos, only one return statement per function, catch exceptions as early as possible, keep variables as local as possible, always check the number of free()s, etc. (etc. == etcetera == and so forth :-) )) So far I have seen only one rule for Good Coding Practice in Haskell: Do Not Use n+k Patterns. I hope someone can give some directions, how to avoid known pitfalls (especially Space Leaks).
not that i am a big fan of n+k patterns but why are they bad coding practise? as for avoiding space leaks, it is conceptually easy -- make sure that things you thought are evaluated, really are. use a profiler to confirm that what you established by looking at the code is correct. the quiestions are not as easy if you are writing a library -- often there is a trade off between lazyness and space leaks. typically you can make your code strict, and then it will be less likely to leak, but may surprise users with its behaviour, or one can make the code lazy, but then is some situations it will leak. just my 2 stotinki. iavor -- ================================================== | Iavor S. Diatchki, Ph.D. student | | Department of Computer Science and Engineering | | School of OGI at OHSU | | http://www.cse.ogi.edu/~diatchki | ==================================================

L.S.,
On Wed, 10 Dec 2003 09:49:54 -0800, Iavor S. Diatchki
hello,
Henk-Jan van Tuyl wrote:
: :
So far I have seen only one rule for Good Coding Practice in Haskell: Do Not Use n+k Patterns. I hope someone can give some directions, how to avoid known pitfalls (especially Space Leaks).
not that i am a big fan of n+k patterns but why are they bad coding practise?
1) It takes no effort, once you are use to it, to code without n+k patterns; on the other hand, when you often use these patterns, you might spend hours debugging an endless looping program. If you are working under high pressure in a large project, chances are, that the testing departement will find your bug and write a bug report (or worse, the customer might find it). Report handling and bug solving costs an enormous amount of money. This has resulted in the "clean room" approach for software design: prevent bugs rather than solve them. See also Finnagle's Law. 2) It is likely, that n+k patterns dissapear in the next Haskell standard. If you don't like to rewrite, test and debug all your software every few years, don't use any language/compiler features that are likely to dissapear. This is another thing that might cost companies a lot of money. -- Best regards, Henk-Jan van Tuyl

hi, first here is why i think n+k patterns are problematic; 1) they lead to some parsing awkwardness (e.g. when n+k pattern bindings are involved, but those don'treally make much sense anyways) 2) in haskell as it is, patterns are associated with algebraic datatypes, and n+k patterns may erronously suggest that all numbers are such for the rest, apologies if i appear to be ranting :-) Henk-Jan van Tuyl wrote:
1) It takes no effort, once you are use to it, to code without n+k patterns;
this does not seem like a good argument. there are many other features like that in haskell. for example, going by that we could remove lambda abstractions (i am not saying we should)
on the other hand, when you often use these patterns, you might spend hours debugging an endless looping program.
how do n+k patterns lead to looping programs?
If you are working under high pressure in a large project, chances are, that the testing departement will find your bug and write a bug report (or worse, the customer might find it). Report handling and bug solving costs an enormous amount of money. This has resulted in the "clean room" approach for software design: prevent bugs rather than solve them. See also Finnagle's Law.
i find this reasoning backward. i agree (of course!) that we should write programs without bugs. i find it strange that people often motivate that, by telling me that bugs cost a lot of money for some company. if companies happened to make money out of bugs (and some do), would it then be ok to write buggy software? i guess it all comes down to what one takes as primary -- writing good software, or making money.
2) It is likely, that n+k patterns dissapear in the next Haskell standard. If you don't like to rewrite, test and debug all your software every few years, don't use any language/compiler features that are likely to dissapear. This is another thing that might cost companies a lot of money.
i didn't know anyone is working on a next haskell standard. have n+k patterns been made obsolete? -iavor -- ================================================== | Iavor S. Diatchki, Ph.D. student | | Department of Computer Science and Engineering | | School of OGI at OHSU | | http://www.cse.ogi.edu/~diatchki | ==================================================

On Thu, 11 Dec 2003 06:23:51 -0800
"Iavor S. Diatchki"
Henk-Jan van Tuyl wrote: ...
2) It is likely, that n+k patterns dissapear in the next Haskell standard. If you don't like to rewrite, test and debug all your software every few years, don't use any language/compiler features that are likely to dissapear. This is another thing that might cost companies a lot of money.
i didn't know anyone is working on a next haskell standard. have n+k patterns been made obsolete?
I don't know of any other feature that has had this, "Many people feel that n+k patterns should not be used. These patterns may be removed or changed in future versions of Haskell.", said about it in the (online) Report (3.17.2). Obviously, the second sentence is vacuous; anything can be removed or changed in future versions of Haskell, but the intent seems clear (and is somewhat self-fulfilling).

The n+k pattern issue inspired endless debates on the Haskell committee and this feature was considered for removal in nearly every iteration of the Haskell report. We all agreed that n+k is extremely ad-hoc but that certain programs can be expressed slightly more elegantly using them. Unfortunately n+k doesn't match against negative numbers, so let n+1 = ... in .... n ... is not the same as let n = .... in .... (n-1) ... Of course n+k was designed for natural numbers but these are not a separate numeric type so you get a certain amount of confusion. One proposal was to make naturals a distinct type and restrict n+k to only naturals. The syntactic issues surrounding n+k are truely awful and I still have to look at the report to remember what happens with these: n+1 = 2 (n+1) = 2 A lot of people would have been happy to replace n+k by some view-like mechanism that gives the user control over the meaning of n+k but we never managed to get views into the report. John

L.S.,
On Thu, 11 Dec 2003 06:23:51 -0800, Iavor S. Diatchki
Henk-Jan van Tuyl wrote:
1) It takes no effort, once you are use to it, to code without n+k patterns;
this does not seem like a good argument. there are many other features like that in haskell. for example, going by that we could remove lambda abstractions (i am not saying we should)
This is, of course, as apposed to the effort it takes to solve problems that might occur when using n+k patterns.
on the other hand, when you often use these patterns, you might spend hours debugging an endless looping program.
how do n+k patterns lead to looping programs?
I can't remember exactly, it was a complicated story and I already decided not to use these patterns (it had something to do with pattern bindings; I don't know where I read it).
If you are working under high pressure in a large project, chances are, that the testing departement will find your bug and write a bug report (or worse, the customer might find it). Report handling and bug solving costs an enormous amount of money. This has resulted in the "clean room" approach for software design: prevent bugs rather than solve them. See also Finnagle's Law.
i find this reasoning backward. i agree (of course!) that we should write programs without bugs. i find it strange that people often motivate that, by telling me that bugs cost a lot of money for some company. if companies happened to make money out of bugs (and some do), would it then be ok to write buggy software? i guess it all comes down to what one takes as primary -- writing good software, or making money.
It is almost impossible to write large programs without bugs; it takes a lot of effort to prevent/solve bugs, and a lot of people don't seem to care much about this. I have often heard remarks like: - We don't have time to do things right the first time; our software will be too expensive if we try to do that. - It's easier to solve bugs afterwards. (How will you know you found every bug?) - It takes too much time to think about a structured solution. - If I use a goto, I have less typing to do. You can write perfect programs using gotos. - After explaining the first few steps of the V-model to a new project leader: "Whenever will you get to the real work if you do all that?!" - (At the yearly performance evaluation:) You asked me to write down the specs of the program that you were to design: you shouldn't stick to formalism when we have so little time. - We have to release version 3.0 before our competitor (or: we have to stick to the release date); we will solve the rest of the bugs in 3.1 (as we had only one month to solve a lot of bugs and add the missing pieces and it wasn't properly tested, release 3.1.1 was necessary to get the product accepted by our customers). - The client doesn't mind a few bugs. If you look up statistics in software engineering books, you will find that only one or two percent of the software is "bugfree" and according to customer specifications, at the first release. An amazing percentage of projects is even never finished (I believe it was twenty percent). To get back to the topic of this mailing list: in my search for reliable software for a reasonable amount of work, I found that Lockheed uses a functional language in the specification phase of their projects, to create testable function specifications. That's why I am now studying Haskell, to use either as a specification language or for coding the final product.
2) It is likely, that n+k patterns dissapear in the next Haskell standard. If you don't like to rewrite, test and debug all your software every few years, don't use any language/compiler features that are likely to dissapear. This is another thing that might cost companies a lot of money.
i didn't know anyone is working on a next haskell standard. have n+k patterns been made obsolete?
See http://www.haskell.org/development/ or search the web for "Haskell 2" or "Haskell II".
-iavor
-- Best regards, Henk-Jan van Tuyl

G'day all.
Quoting Henk-Jan van Tuyl
So far I have seen only one rule for Good Coding Practice in Haskell: Do Not Use n+k Patterns. I hope someone can give some directions, how to avoid known pitfalls (especially Space Leaks).
Here are a few rules of thumb: - Know what data is "transient" and what data is "persistent". - Persistent data should be fully evaluated where possible. - Transient data should not have multiple consumers if possible. - Use large CAFs sparingly. Also good: http://users.aber.ac.uk/afc/stricthaskell.html Cheers, Andrew Bromage
participants (10)
-
ajb@spamcop.net
-
Derek Elkins
-
Fergus Henderson
-
Gabriel Dos Reis
-
Henk-Jan van Tuyl
-
Henk-Jan.van.Tuyl
-
Iavor S. Diatchki
-
John Peterson
-
ketil+haskell@ii.uib.no
-
Sven Panne