
On Mon, Jul 12, 2010 at 01:50:01PM +0100, Simon Marlow wrote:
> Right. I like the idea of packages being able to declare re-exported modules, indeed I considered doing this in GHC (when we needed base3) but decided not to mainly because we would still need PackageImports, and once you have PackageImports you can implement re-exports using that, albeit with more boilerplate. The advantage of direct re-exports, as you say, is that it avoids a conflict when the module is imported. However, since some of the modules would conflict anyway, there didn't seem to be much advantage in avoiding the conflict in some cases but not others.
Well, one main use case is that I can do -phaskell98 and -phaskell2010 at the same time, so I can easily make the default jhc behavior the union of the two languages. Another motivation is that I want to be able to provide packages like ghc-base-3 and ghc-base-4: simple shims that present interfaces compatible with versions of ghc. Those ghc bases also export some haskell98 modules, so being able to do -phaskell98 -pghc-base-3 at the same time is important. There is also some philosophy behind it: I don't like coupling things that don't need to be coupled, such as the interface a library exports and the place that interface is implemented.
I think the ability for a package to define an 'interface' built up from other re-exported modules will be very useful going forward. I am not sure how hard something like that would be to implement in ghc, but it may be worth it in the future.
> Isn't that what PackageImports lets you do?
Hmm.. maybe. Jhc doesn't have PackageImports; the idea is that you re-export modules rather than importing them from a specific package and exporting their interface. So it sort of shifts the effort from the interface makers to the implementors: the Haskell 98 Prelude will actually have to live in a module called Compat.Haskell98.Prelude and just be re-exported by the 'haskell98' and 'haskell2010' packages. In one sense 'Compat.Haskell98' sort of acts as a package import, but it isn't actually tied to a package; it just refers to whatever Compat.Haskell98.Prelude is in scope. I also have a large aversion to allowing package names to infect source files carte blanche, as it blurs the line between the language itself and the environment the language runs in. It may be necessary at some point, but I'll put it off until a case comes up that makes it abundantly clear that it is needed.
In the past I just had the Boehm GC and the 'cross your fingers and hope static analysis can catch everything' option. But as of 0.7.4 I have a real GC that I hope to make the new default in the next major release (enabled with -fjgc on the command line).
It is an accurate GC implemented in portable C. I am using something based on the paper 'Accurate Garbage Collection in an Uncooperative Environment' [1], though the technique was discovered independently. I always pass the GC parameter as the first argument to functions, which is mapped to the same register, so I effectively get a dedicated register without having to declare a fixed global register variable in gcc. Plus I can omit the parameter in leaf functions that don't allocate, freeing up the register for normal use. I compile with -mregparm=2 on i386, so the first two arguments to a function are passed in registers.
I found that an independent shadow stack is actually faster than the linked version described in the paper (though I still pass around a pointer to the top of the stack as described). My theory is that taking the address of a stack-allocated object inhibits certain gcc optimizations.
The underlying allocator is based on Bonwick's slab allocator [2], which works quite well for a Haskell runtime: I have a slab for each type, so a slab of cons cells, a slab of size-3 tuples, and so forth.
[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.5570
[2] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.4374
> Interesting. This is quite a different set of decisions to the way GHC does things, so I look forward to comparing some benchmarks. My intuition is that if allocation is anything more complicated than "reg += size; if (reg > lim) { ... }" then you have a lot of work to do to make up for the overhead, but let's see!
Indeed, in certain places I quite consciously made different choices than ghc, not necessarily because I thought they were better, but because I knew they would open up a new design space for me to explore. In places like the RTS I felt much more comfortable being radically different due to my embedded/operating-systems background, though a lot of the middle end is pretty much analogous to what ghc does (jhc core is based on a PTS instead of System F). My Grin has also diverged from Boquist's GRIN quite a bit: modern architectures, and my desire to target languages where I may not control the native code generator, have dictated a different set of optimizations. However, its essential character, that it is a proper, pure monad, remains intact.

Yeah, I didn't realize how important the allocator was until I started benchmarking; spending time cutting the cost of marking garbage in half didn't help nearly as much as shaving a few cycles off the allocator. The fast path of the allocator is actually very fast: each object type has its own allocator and free list, so allocation is pretty much just pulling an object off the free list. It is already the appropriate size, and its constant fields are pre-initialized since they aren't cleared during garbage collection. (A heuristic sometimes claims full pages back to the general pool.) The slow path has to grab a full page and initialize it, but that isn't really much slower, as I can prefetch the needed cache lines, so the cost is on the order of another cache-line fill. (Thinking about computational complexity in terms of O(cache line fills) rather than O(operations) is much more appropriate on today's architectures.) In any case, something I didn't expect, and that completely surprised me in a good way, was that adding the garbage collector made a large number of benchmarks actually run faster compared to a simple incrementing pointer that never reclaimed anything.
Apparently it was the indirection short-circuiting that the GC performed that did it. Saving that cache-line fill for the indirection node was a huge win that made up for the GC cost. (cachegrind from the valgrind suite is a great tool for figuring this sort of thing out.) There was probably also some benefit from packing objects of the same type tightly against each other: cons cells are exactly 2 words and packed right next to each other filling a page, so I can fit more on the same cache line, and cells of the same type allocated at around the same time probably have some locality of reference.

        John

-- 
John Meacham - ⑆repetae.net⑆john⑈ - http://notanumber.net/