Automatic Reference Counting

Hi guys,

Apple recently announced a new static analysis in Clang called ARC (Automatic Reference Counting). The idea is to provide what GC provides (zero memory-management code written by the programmer) without incurring the runtime penalty of having a GC run. It seems to be extremely effective in Objective-C land.

I was wondering if any of the compiler gurus out there could comment on the applicability of this kind of analysis to Haskell. Dropping the GC, and hence stopping it blocking all threads and killing parallel performance, seems like nirvana.

The one major problem still left with ARC is that it does not solve the retain-cycle problem of reference counting: programmers must insert a "weak" keyword to break cycles in their data graphs. Could even that analysis be automated statically? Could we put up with a language extension that added this annotation?

I'd love to hear some comments from people more experienced than I on this.

Thanks

Tom Davie

Reference counting usually has much higher overheads than garbage collection and is tricky to parallelise. Its main advantage is quicker release of memory.
I believe the main feature of ARC is that the user does not need to manually keep reference counts up to date. I've heard from people using CPython (which uses reference counting) as a library from C that it's very easy to accidentally forget to update a reference count correctly. With ARC the compiler takes care of it, so there's less opportunity for mistakes. ARC also optimizes away redundant reference updates within a function (Haskell functions are usually small, so I don't know how well that would work).
The reasons why reference counting is usually slower:
- Say you update a field "f" of object "o" from pointing to "a" to pointing to "b". This entails three memory writes instead of one, because you also need to update the reference counts of "a" and "b".
- You also need some extra space to store the reference counts.
- You need to take special care to avoid cascades causing occasional long pauses. E.g., if an object's reference count goes to zero, the counts of all objects it points to must be decremented; that might drive another object's count to zero, and so on.
- You need a backup tracing collector to collect cycles, as you mentioned.
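Both the cascade and the cycle problem can be seen concretely in CPython, which pairs reference counting with exactly such a backup tracing collector. A minimal sketch, assuming only the standard-library `gc` module:

```python
import gc

class Node:
    def __init__(self):
        self.next = None

gc.collect()                 # flush any pre-existing garbage first

# A two-node cycle: each object keeps the other's count above zero,
# so reference counting alone can never reclaim the pair.
a, b = Node(), Node()
a.next, b.next = b, a
del a, b

# The backup tracing collector finds and frees the unreachable cycle;
# collect() reports how many unreachable objects it found.
assert gc.collect() >= 2
```

The same `del` on an acyclic structure would have freed it immediately via the cascade of decrements, with no tracing pass needed.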
There are many optimizations possible, e.g. delayed reference
counting, but AFAIK reference counting is in general slower than
tracing GC. It does get used in situations where quicker resource
release and more predictable GC pauses are more important than
absolute performance (or peak throughput).
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

On 2 Jul 2011, at 17:18, Thomas Schilling wrote:
Reference counting usually has much higher overheads than garbage collection and is tricky to parallelise. Its main advantage is quicker release of memory. [...]
Apple's claim is that their optimiser works globally and can optimise away redundant reference updates across functions too.

It's interesting that you cite that GC is both faster and has lower memory overhead – Apple's stated reason for implementing ARC was that GC was both too slow and too memory-intensive to use sensibly on iDevices, and that ARC was both faster and less memory-intensive.

Tom Davie

On Sat, 2011-07-02 at 17:35 +0100, Thomas Davie wrote:
It's interesting that you cite that GC is both faster and lower memory overhead – Apple's stated reasons for implementing this were that GC was both too slow and too memory intensive to use sensibly on iDevices and that ARC was both faster and less memory intensive.
This is a little more complex than just "better" or "worse". The speed and memory overhead of reference counting depend on what percentage of your data structures are pointers, and on how much of your program is performing pointer updates. Presumably, iOS developers try to avoid lots of small heap allocations, and instead use packed data structures with in-place updates to larger contiguous blocks of memory. In that case, it's possible that reference counting is much faster and reduces memory usage compared to garbage collection. This is certainly not the case in a typical functional language, though.

When asking about memory usage, you also want to distinguish between "hot" memory usage (how much memory is actively used, and so we want it to fit in cache) and overall memory usage (total heap size, even though a lot of it may be swapped out to disk on a desktop). Garbage collection typically increases the overall heap size, but is better than reference counting at reducing the size of the "hot" area of memory.

So basically, I wouldn't call Apple's claims unusual when they are made for iOS and Objective-C (though "garbage collection is too slow to use sensibly on iDevices" is definitely pushing the ridiculous side of things), but I also wouldn't expect their rather specific environment to carry over to general-purpose computing, or especially to Haskell.

-- Chris Smith

On 2 Jul 2011, at 17:52, Chris Smith wrote:
Thanks, that's a great explanation :)

The important point about reference counting on iDevices is the near-realtime performance: pauses for collecting garbage are actually very short in comparison to tracing collectors (though more frequent). Some compilers even check at compile time and add code for reusing cells instead of freeing them when that is known to be safe; I think this was done for the pure functional programming language OPAL, if I'm not wrong. But OPAL has strict evaluation and no lazy constructs, and thus does not allow for cycles, unlike Haskell; that makes reference counting a viable and easily implementable option for OPAL.

About Apple's ARC: it actually uses the very same mechanisms the Clang static analyzer uses. At a first stage it works with the typical Objective-C conventions of object ownership. For example, if you have a function other than alloc and co. transferring object ownership to its caller, the static analyzer will complain about a possible space leak. In that case one has to add an attribute to the function telling the compiler that the function is intended to work like this. Having a look at the ARC docs, it seems to be the very same deal: if you have a function like the one just mentioned, you need to add that attribute to the function declaration for ARC to insert the correct retains/releases. So there is no real magic going on, but one needs to be really aware of it, or bug hunting (especially with external libraries) may become... maybe a little less fun...

Additionally, Clang adds some extra commands into the generated LLVM code which LLVM understands. This allows (i) further optimizations at later compilation stages, and (ii) skipping the message sends to objects for reference counting (as long as you don't implement retain/release yourself; by convention you don't), so the counter can be accessed directly in memory.

That's why ARC (if you follow Apple's conventions about object ownership) can be much more efficient than the current implementation.

- Steffen

On Tue, Jul 5, 2011 at 9:00 PM, steffen wrote:
The important point about reference counting on iDevices is the near-realtime performance, since pauses for collecting garbage are actually very short in comparison to tracing collectors (though more frequent).
You can get near-realtime performance from any collector if you structure your application to only need a fixed amount of memory, i.e. if you can make guarantees about the amount of memory collected across any two GC cycles.
On Sat, Jul 2, 2011 at 16:51, Thomas Davie wrote:
Apple recently announced a new static analysis in Clang called ARC (Automatic Reference Counting). The idea is to provide what GC provides (zero memory management code by the programmer), but not to incur the runtime penalty of having to have the GC run. I was wondering if any of the compiler gurus out there could comment on the applicability of this kind of analysis to Haskell. Dropping the GC and hence stopping it blocking all threads and killing parallel performance seems like nirvana.
RC touches dead data, and GC touches live data. Look at the memory profile of a typical Haskell program: live memory ranging in megabytes, dead data ranging in hundreds of megabytes (if not gigabytes) per second. It is unlikely that we'll get much better performance from reference counting, though a hybrid model might be useful for the later generation (where collections are rarer).

I think Haskell could do a lot to improve its concurrent GC. At the moment, each processor has a space of its own that it can collect concurrently, plus there is a global collection that freezes all threads and cleans up all garbage in the later generation. For very-large-memory apps, we could do a lot to extend concurrent GC into later generations, do a more effective job of either managing our own paging or integrating with OS virtual memory, and invoke the global collection only as a last resort. Pauses with paging-aware collectors can be improved by over two orders of magnitude [1].

[1] Garbage Collection without Paging: http://lambda-the-ultimate.org/node/2391

On 2 Jul 2011, at 18:35, Thomas Davie wrote:
It's interesting that you cite that GC is both faster and lower memory overhead – Apple's stated reasons for implementing this were that GC was both too slow and too memory intensive to use sensibly on iDevices and that ARC was both faster and less memory intensive.
Reality is probably a little more subtle than this. In general, and specifically for long-running and memory-intensive processes (such as servers), quality garbage collection (and especially compacting garbage collection) is probably more efficient overall. Apple already supported (and continues to support) garbage collection for Objective-C on their desktop systems. The primary motivation (as I understand it) for developing ARC is to bring (mostly) automatic memory management to the iOS platforms.

There are two reasons I've heard why Apple considers ARC a superior solution for the iOS platform:

1. iOS devices are much more resource-constrained than a desktop system. Therefore the delay that garbage collection causes before memory is available for re-allocation can have a much greater effect on an application.

2. Running a background garbage collector can introduce unpredictable pauses in your application, which would destroy the illusion of immediacy that is one of the prime characteristics of good iOS apps.

So for iOS, immediate memory release and predictable performance trump overall average performance. To see if this technique would be at all useful for Haskell, you'll have to evaluate these points in the context of a Haskell application and decide which trade-off brings you the most benefit.

Maarten

On Sat, 2 Jul 2011 16:51:53 +0100, you wrote:
Apple recently announced a new static analysis in Clang called ARC (Automatic Reference Counting). The idea is to provide what GC provides (zero memory management code by the programmer), but not to incur the runtime penalty of having to have the GC run. It seems to be extremely effective in objective-C land.
I was wondering if any of the compiler gurus out there could comment on the applicability of this kind of analysis to Haskell. Dropping the GC and hence stopping it blocking all threads and killing parallel performance seems like nirvana.
Reference counting as a means of lifetime management has been around for a long time in Microsoft's Component Object Model. And in both (Microsoft) Visual Basic and (Borland/CodeGear/Embarcadero) Delphi, reference counting is automatic, in a way that appears to be essentially identical to what Apple is describing. So the concept is nothing new; the new part is that it is being offered in a C dialect.
The one major problem that's still left with ARC is that it does not solve the retain cycle problem of reference counting. Programmers must insert a "weak" keyword to break cycles in their data graphs. Could even this analysis be automated statically? Could we put up with a language extension that added this annotation?
For handling cycles, there are alternatives to weak references, but I don't know that any of them is better than weak references. Excluding weak references, I don't know of a way to deal with cycles that isn't computationally equivalent to what full-blown garbage collectors already do, so there's really nothing to be gained there. If there were a way to do static analysis, there would be no need for garbage collection as it is currently implemented--the compiler would be able to insert explicit object lifetime management directly into the code.

-Steve Schafer
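The weak-reference fix is visible in any refcounted runtime. A minimal sketch in Python, using the standard-library `weakref` module and two hypothetical `Parent`/`Child` classes: a weak back-reference keeps the pair acyclic, so reference counting alone reclaims both objects even with the tracing collector switched off.

```python
import gc
import weakref

class Child:
    def __init__(self, parent):
        # Weak back-reference: does not contribute to the parent's
        # reference count, so no retain cycle is formed.
        self.parent = weakref.ref(parent)

class Parent:
    def __init__(self):
        self.child = Child(self)

gc.disable()                 # rule out the backup tracing collector entirely
p = Parent()
probe = weakref.ref(p)       # lets us observe when the Parent is freed
assert p.child.parent() is p
del p                        # reference counting alone frees both objects
assert probe() is None
gc.enable()
```

Had `Child.parent` been a strong reference, `probe()` would still return the object after `del p`: the pair would sit unreachable until a tracing pass, which is exactly the situation ARC's "weak" keyword exists to prevent.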
participants (7)

- Chris Smith
- David Barbour
- Maarten Hazewinkel
- steffen
- Steve Schafer
- Thomas Davie
- Thomas Schilling