using mutable data structures in pure functions

Consider the following idea for implementing pure functions: a pure function can allocate and modify memory as long as a) it never returns a reference to that memory, or b) it never again modifies the memory once it returns (i.e. it returns an immutable object). This is based on the idea of "value objects" in object-oriented languages: the only mutation of a value object occurs in its constructor, and once the constructor has finished initializing the object, it becomes immutable.

My first question is whether this is accurate, or whether it needs some qualification.

Secondly, where does this idea fit into the Haskell philosophy? For example, consider the definition of Data.List.nub:

    nub l = nub' l []
      where
        nub' [] _ = []
        nub' (x:xs) ls
          | x `elem` ls = nub' xs ls
          | otherwise   = x : nub' xs (x:ls)

Clearly the memory allocated to ls never escapes nub', so it seems that ls could be replaced with a mutable data structure (with an eye towards improving performance in special cases).

For another example, consider Data.Map.fromList. I expected fromList to build up the map using a mutable data structure and then "seal it up" before returning it, but it seems to call the same insert that one would call to add to the map after it has been constructed.

So, again, what is the Haskell philosophy towards using mutable data structures in pure functions? Is it:

1. leave it to the compiler to find these kinds of opportunities
2. just use the immutable data structures - after all, they are just as good (at least asymptotically)
3. you don't want to use mutable data structures because of _____
4. it does happen, and some examples are ______

Thanks,
ER
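For concreteness, the "build it up mutably, then seal it up" pattern the question describes does exist in the standard array libraries. A minimal sketch, using a made-up histogram function (not from the thread): runSTUArray lets the builder mutate freely, then freezes the array on the way out.

    import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
    import Data.Array.Unboxed (UArray)

    -- Count occurrences of values; assumes every x is in [0, n-1].
    -- The array is mutable inside the ST computation and immutable
    -- once runSTUArray returns it.
    histogram :: Int -> [Int] -> UArray Int Int
    histogram n xs = runSTUArray $ do
        counts <- newArray (0, n - 1) 0
        mapM_ (\x -> readArray counts x >>= writeArray counts x . (+ 1)) xs
        return counts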

I'm sure others will want to chime in here, but I'll offer my two cents.
On Sun, 11 Mar 2012 22:38:56 -0500, E R wrote:
So, again, what is the Haskell philosophy towards using mutable data structures in pure functions? Is it:
1. leave it to the compiler to find these kinds of opportunities
2. just use the immutable data structures - after all, they are just as good (at least asymptotically)
3. you don't want to use mutable data structures because of _____
4. it does happen, and some examples are ______
You will find that, a surprising amount of the time, option 2 is sufficient. After all, programmer time is frequently more expensive than processor time.

That being said, there are some cases where you really do want to utilize a mutable data structure inside an otherwise pure algorithm. This is precisely the use case for the ST monad: ST allows the use of mutable state inside a function while hiding that fact from the outside world.

Cheers,

- Ben
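A minimal sketch of what Ben describes (sumST is an illustrative example, not code from the thread): the STRef is mutated freely inside runST, yet sumST has an entirely pure type, and the rank-2 type of runST prevents the reference from escaping.

    import Control.Monad.ST (runST)
    import Data.STRef (newSTRef, readSTRef, writeSTRef)

    sumST :: Num a => [a] -> a
    sumST xs = runST $ do
        ref <- newSTRef 0                    -- mutable accumulator
        mapM_ (\x -> do
            acc <- readSTRef ref
            writeSTRef ref (acc + x)) xs     -- destructive update
        readSTRef ref                        -- only the value escapes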

On 3/11/12 11:52 PM, Ben Gamari wrote:
That being said, there are some cases where you really do want to utilize a mutable data structure inside an otherwise pure algorithm. This is precisely the use case for the ST monad: ST allows the use of mutable state inside a function while hiding that fact from the outside world.
Do note, however, that in certain cases the ST approach can be much slower than the obvious immutable approach (i.e., the State monad, especially when implemented directly via argument passing rather than via the monadic interface). I have some closed-source code where that assumption bit me; the ST code was over 4x slower.

One reason this can happen is that, since Haskell is predominantly pure, a whole lot of work has gone into optimizing the pure case. Another reason is that, if the compiler can see that the code is pure, it knows a bunch of safe optimizations the programmer may not have thought of. A final major reason is that the rearrangements necessary to get things into state-passing form often turn out to optimize the algorithm anyway (e.g., by ensuring linear access to inputs).

So don't just assume that ST/mutability == fast.

--
Live well,
~wren
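What wren calls the State monad "implemented directly via argument passing" is, for a simple fold like the sumST sketch above, just an accumulator threaded as a function argument. A hypothetical counterpart for comparison; the bang patterns keep the accumulator strict so GHC can unbox it:

    {-# LANGUAGE BangPatterns #-}

    sumAcc :: Num a => [a] -> a
    sumAcc = go 0
      where
        go !acc []     = acc                 -- the state is just an argument
        go !acc (x:xs) = go (acc + x) xs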

On Wed, Mar 14, 2012 at 5:50 PM, wren ng thornton wrote:
Do note, however, that in certain cases the ST approach can be much slower than the obvious immutable approach (i.e., the State monad, especially when implemented directly via argument passing rather than via the monadic interface). I have some closed-source code where that assumption bit me; the ST code was over 4x slower.
An additional reason is that runST often allocates a closure (it is marked NOINLINE), so using it in, e.g., a tight loop can increase allocation.

-- Johan
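A schematic illustration of that point, with made-up functions rather than code from the thread: entering runST once per element can allocate a closure on every iteration, whereas hoisting it around the whole traversal pays that cost once.

    import Control.Monad.ST (runST)

    perElement :: [Int] -> [Int]
    perElement = map (\x -> runST (return (x + 1)))  -- runST inside the loop

    wholeLoop :: [Int] -> [Int]
    wholeLoop xs = runST (return (map (+ 1) xs))     -- one runST overall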

On Sun, Mar 11, 2012 at 8:38 PM, E R wrote:
1. leave it to the compiler to find these kinds of opportunities
2. just use the immutable data structures - after all, they are just as good (at least asymptotically)
3. you don't want to use mutable data structures because of _____
4. it does happen, and some examples are ______
5. There's no substitute for thinking and understanding the trade-offs that you are making. :)

There is a "trick" to `nub` where you couldn't implement the internal
lookup list with an (assumed faster) search tree anyway.
`nub` only mandates equality not ordering, so building a ordered
structure like a binary tree is impossible. In practice i would be
hard to beat list as the intermediate structure in this case.
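If the caller can be asked for Ord rather than just Eq, an ordered intermediate structure does become possible. A sketch of that variant (modern containers ships essentially this as Data.Containers.ListUtils.nubOrd):

    import qualified Data.Set as Set

    nubOrd :: Ord a => [a] -> [a]
    nubOrd = go Set.empty
      where
        go _ [] = []
        go seen (x:xs)
          | x `Set.member` seen = go seen xs            -- O(log n) lookup
          | otherwise           = x : go (Set.insert x seen) xs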
On 12 March 2012 03:38, E R wrote:
For example, consider the definition of Data.List.nub:
    nub l = nub' l []
      where
        nub' [] _ = []
        nub' (x:xs) ls
          | x `elem` ls = nub' xs ls
          | otherwise   = x : nub' xs (x:ls)
Clearly the memory allocated to ls never escapes nub', so it seems that ls could be replaced with a mutable data structure (with an eye towards improving performance in special cases).
participants (5):

- Ben Gamari
- E R
- Johan Tibell
- Stephen Tetley
- wren ng thornton