
Hello to all Haskell experts! I have two more beginner questions.

I ran into texts mentioning types of "mutable" data. Does that mean the data is actually changed in place? How would I go about implementing such a data type, as I expect much higher performance in some applications? Do I do that with the Foreign module and Ptr types?

The second question is this: say I have some functions working with a data type, say a graph. Some of them might want to change the graph and return a new graph as a result. If they change, say, only one graph node, it would be extremely bad for performance if the whole graph were copied every time such a change takes place. Is the compiler smart enough to make the code copy only the parts that have changed (copying lazily, so to speak)? Is that even possible?

Thank you in advance for any hint!
Christian

Hi Christian!
On Tue, Aug 17, 2010 at 2:34 PM, C Gosch wrote:
I have two more beginner questions: I ran into texts mentioning types of "mutable" data. Does that mean the data is actually changed in place? How would I go about implementing such a data type, as I expect much higher performance in some applications? Do I do that with the Foreign module and Ptr types?
Yes, by mutable data we mean data that changes in place. There are several ways to use mutable data in Haskell: mutable variables and arrays in the ST monad, IORefs, MVars (for concurrent access), and Ptrs (although these should mostly be used for foreign function calls).
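For example (a minimal sketch, with illustrative function names): a counter held in an IORef is updated in place, and a mutable unboxed array is filled in place in ST and then frozen into an immutable array:

import Data.IORef (newIORef, modifyIORef, readIORef)
import Data.Array.ST (runSTUArray, newArray, writeArray)
import Data.Array.Unboxed (UArray)

-- An IORef is a mutable cell in IO; modifyIORef updates it in place.
counterDemo :: IO Int
counterDemo = do
  ref <- newIORef (0 :: Int)
  modifyIORef ref (+ 1)
  readIORef ref

-- A mutable unboxed array in ST, filled in place and then frozen
-- into an immutable UArray by runSTUArray.
squares :: Int -> UArray Int Int
squares n = runSTUArray $ do
  arr <- newArray (0, n) 0
  mapM_ (\i -> writeArray arr i (i * i)) [0 .. n]
  return arr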
The second question is this: say I have some functions working with a data type, say a graph. Some of them might want to change the graph and return a new graph as a result. If they change, say, only one graph node, it would be extremely bad for performance if the whole graph were copied every time such a change takes place. Is the compiler smart enough to make the code copy only the parts that have changed (copying lazily, so to speak)? Is that even possible?
Since the graph is immutable, most of the nodes in the new graph are shared with nodes in the old graph. This is known as path copying and is used everywhere persistent ("immutable") data structures are used. Updating a data structure (e.g. a tree) typically requires O(log n) nodes to be copied.

-- Johan
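P.S. To make the sharing concrete, here is a minimal sketch of path copying on a binary search tree (the Tree type below is just for illustration, not a library type). Inserting rebuilds only the nodes on the path from the root to the insertion point; every subtree off that path is shared with the old tree, so a balanced tree copies only O(log n) nodes:

data Tree a = Leaf | Node (Tree a) a (Tree a)

insert :: Ord a => a -> Tree a -> Tree a
insert x Leaf = Node Leaf x Leaf
insert x t@(Node l y r)
  | x < y     = Node (insert x l) y r   -- r is shared, not copied
  | x > y     = Node l y (insert x r)   -- l is shared, not copied
  | otherwise = t                       -- already present: share the whole tree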

Hi Johan,
thanks for the quick answer!
2010/8/17 Johan Tibell wrote:
Some of them might want to change the graph and return a new graph as a result. If they change, say, only one graph node, it would be extremely bad for performance if the whole graph were copied every time such a change takes place. Is the compiler smart enough to make the code copy only the parts that have changed (copying lazily, so to speak)? Is that even possible?
Since the graph is immutable, most of the nodes in the new graph are shared with nodes in the old graph. This is known as path copying and is used everywhere persistent ("immutable") data structures are used. Updating a data structure (e.g. a tree) typically requires O(log n) nodes to be copied.
So does that mean the compiler figures this out for my own data structures, or do I have to take care that copying is done this way? Sorry for my ignorance, I'm still learning :)

On Tuesday 17 August 2010 16:08:49, C Gosch wrote:
Since the graph is immutable, most of the nodes in the new graph are shared with nodes in the old graph. This is known as path copying and is used everywhere persistent ("immutable") data structures are used. Updating a data structure (e.g. a tree) typically requires O(log n) nodes to be copied.
So does that mean the compiler figures this out for my own data structures, or do I have to take care that copying is done this way? Sorry for my ignorance, I'm still learning :)
It's done automatically (and, if your nodes contain huge data, you'll be happy to learn that the data isn't copied at all, only the pointers to it).
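For instance, with a hypothetical record (illustrative names), updating one field allocates a new record cell, but the untouched fields are shared by pointer, however large they are:

data Profile = Profile
  { name   :: String
  , bio    :: String   -- imagine this field is huge
  , visits :: Int
  }

-- The result shares the very same heap objects for name and bio as p;
-- only a new Profile cell and the new Int are allocated.
bump :: Profile -> Profile
bump p = p { visits = visits p + 1 }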

On Tue, Aug 17, 2010 at 4:08 PM, C Gosch wrote:
Since the graph is immutable, most of the nodes in the new graph are shared with nodes in the old graph. This is known as path copying and is used everywhere persistent ("immutable") data structures are used. Updating a data structure (e.g. a tree) typically requires O(log n) nodes to be copied.
So does that mean the compiler figures this out for my own data structures, or do I have to take care that copying is done this way? Sorry for my ignorance, I'm still learning :)
In fact, this is not even a case of optimization by the compiler (it doesn't have to "figure it out") but simply a consequence of the evaluation model chosen for Haskell. Think of the "delete" function on lists, for instance:
delete :: Eq a => a -> [a] -> [a]
delete _ [] = []
delete e (x:xs) = if e == x then xs else x : delete e xs
It deletes the first occurrence of e in the list. If you follow the logic of this code, you'll see that the part of the list after the first occurrence of e is shared with the result of delete. This is automatic, and it is apparent in the code, since you said "if e == x then _xs_". In most languages you could do the same, but you would then have to be very cautious, since a modification of your old list would also change your new list (but only if you modified the part after the first "e"), making this a very dangerous function; in practice this solution wouldn't be offered by a sane library. In Haskell this function is perfectly safe, since you _can't_ modify the old list: it is immutable.

When you understand this, you'll see that there are plenty of interesting data structures that allow operations to share the greatest part of the structure, thus allowing you to use immutable data structures efficiently in many more situations than you would have thought. Chris Okasaki's "Purely Functional Data Structures" is an excellent introduction to this world (where the efficient data structures may not be the same as in your imperative programs).

-- Jedaï
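P.S. A tiny self-contained demonstration (with made-up example values) that the old list really is untouched: after the delete, xs and ys share the tail [3,4,5], and since lists are immutable, neither list can be changed through the other:

import Data.List (delete)

main :: IO ()
main = do
  let xs = [1, 2, 3, 4, 5] :: [Int]
      ys = delete 2 xs  -- the tail [3,4,5] is shared with xs
  print xs              -- still [1,2,3,4,5]: the old list is intact
  print ys              -- [1,3,4,5]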
participants (4)
- C Gosch
- Chaddaï Fouché
- Daniel Fischer
- Johan Tibell