Re: [Haskell-cafe] is 256M RAM insufficient for a 20 million element Int/Int map?

2008/10/20 Don Stewart:
> claus.reinke:
>> Ideally, I'd just like to indicate the strictness in the types (hint to
>> ghc hackers: 'Data.Map.Map Int !Int' and '[!a]' would really be useful!-)
>
> In general, being able to specialise polymorphic structures so they look
> like unpacked monomorphic ones would be awesome.
>
>     (!Int, !Bool) -> (,) {-# UNPACK #-}!Int {-# UNPACK #-}!Bool
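To make the idea concrete: today this specialisation has to be done by hand, by writing a monomorphic type with strict, unpacked fields. A sketch (the names PairIntBool and toPair are made up for illustration, not part of any library):

```haskell
-- Sketch: a hand-written specialisation of (Int, Bool) with both
-- fields strict. The UNPACK pragma asks GHC to store the Int's raw
-- machine word in the constructor instead of a pointer to a
-- heap-allocated Int box.
data PairIntBool = PairIntBool {-# UNPACK #-} !Int !Bool
  deriving (Eq, Show)

-- Converting back to the polymorphic representation costs a fresh
-- allocation and re-boxing of the Int, which is the interoperability
-- cost with polymorphic functions such as those of type (a, b) -> a.
toPair :: PairIntBool -> (Int, Bool)
toPair (PairIntBool i b) = (i, b)
```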
I spent some time thinking about how one might actually go about implementing this in GHC today, and came to the conclusion that it interacts far too badly with separate compilation.

In an ideal world, you would know which specialisations to generate for the data structure when compiling it in the first place. You would then of course generate the appropriate specialisations for all subsequent polymorphic function definitions that use the specialised type ((,), IntMap or whatever), so you don't lose the ability to apply your unpacked version to a function of a type like (a, b) -> a - see http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.55.8984, which Claus pointed to. But we don't have that information yet at the point where we compile a data definition...

The CLR has a similar problem, but gets away with it because it has a JIT compiler that can see the original definitions of classes/methods and instantiate them at the necessary types at runtime (see http://research.microsoft.com/~akenn/generics/space2004generics.pdf).

One possible approach for GHC would be to generate all possible specialisations for a datatype + functions when compiling a module. This sounds bad, but actually you only need to cater for at most three unboxed field sizes:

1) 32 bit (Int#, Word#, Float#, Addr#)
2) 64 bit (Int64#, Word64#)
3) Usually 64 bit (Double#)

Of course, if you have a type like (,,,) with 4 type arguments you still have to potentially generate 3^4 specialisations (and corresponding functions!), so this /can/ get bad quickly :-).

This is also sort of a hack, in that it crucially relies on the fact that unboxed tuples don't really make sense as type arguments yet. This is because types like (# Int, Bool #) -> Int are rejected by the compiler - unboxed tuples can only occur on the right of function arrows.
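That restriction can be seen directly in code. A sketch (divMod# and sumDivMod are made-up names, and the "rejected" behaviour described is that of GHC around the time of this email - later GHCs relaxed it):

```haskell
{-# LANGUAGE UnboxedTuples, MagicHash #-}

-- An unboxed tuple is fine as a function *result*: both components
-- come back without any heap allocation for the pair itself.
divMod# :: Int -> Int -> (# Int, Int #)
divMod# n d = (# n `div` d, n `mod` d #)

-- But it could not appear on the left of an arrow (or as a type
-- argument), so the following was rejected by the compiler:
--
--   bad :: (# Int, Bool #) -> Int
--
-- which is why "instantiate (,) at an unboxed tuple" was not
-- something the specialiser had to worry about.

-- Consuming the result forces an immediate case analysis:
sumDivMod :: Int -> Int -> Int
sumDivMod n d = case divMod# n d of
  (# q, r #) -> q + r
```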
However, this is just a technical limitation, and were it to be lifted it might make sense to be able to instantiate data types at unboxed tuple types, blowing this scheme out of the water.

You could also export the entire definitions of the functions + data structures you might need to specialise across module boundaries, and generate specialisations where they are needed. However, this can lead to redundant code generation with diamond dependencies (see the Syme/Kennedy paper - though I don't think their problem with redundant type descriptors applies to Haskell, making it easier to apply) and would bloat .hi files. This might still be the best way to go about it, should someone choose to implement it.

A further alternative would be to compile every function just once, but pass something like a dictionary to every function to tell it how to handle its own arguments (how much stuff to pop off the stack etc). Clean, but the huge runtime overhead of this approach kind of defeats the point of using unboxed types in the first place.

A final alternative would be to somehow recompile a module with the desired instantiation if it turned out not to contain it. But this could potentially lead to greatly increased compile times...

So, I can't really see a clean way to go about this without going the whole hog and turning GHC into a whole-program compiler :-)

Cheers,
Max
Max Bolingbroke