Re: ANNOUNCE: Harpy -- run-time code generation library

[Relocated to haskell-cafe] Dirk Kleeblatt wrote:
apfelmus wrote:
Note that even currently, your operations cannot be strict in the address a label refers to, because that address may be determined only after the label's first use. In other words, your example code

  fac = do
      [...]
      jmp loopTest                       -- (1)
      [...]
      loopTest @@ cmp ecx (0 :: Word32)  -- (2)
      [...]

already shows that the function jmp that generates a jmp instruction may not be strict in the position it jumps to, as the address behind loopTest is only known later, at line (2).
When generating code for (1), the label loopTest is used as an index into a map, to see whether there's a code position associated with it. If there is, the code position is used to compute the jump offset for the jmp instruction; if not (as in this example), a dummy value is placed in the code buffer, and Harpy remembers this position to be patched later on. At (2), the label is defined, and this leads to patching all instructions referring to it that have been emitted before this definition.
So, yes, the code position is only used after the definition of the label. But the "look up in a map"-part makes the jmp operation strict in the label parameter.
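In pseudo-code, the bookkeeping is roughly the following (a sketch with made-up names and types, not Harpy's actual internals):

  import qualified Data.Map as M
  import Data.Word (Word32)

  newtype Label = Label Int deriving (Eq, Ord)

  data Labels = Labels
      { defined :: M.Map Label Int    -- label -> known code position
      , pending :: M.Map Label [Int]  -- label -> buffer offsets to patch
      }

  -- Emitting a jump: if the label is defined, compute the real offset;
  -- otherwise emit a dummy and remember where to patch later. The
  -- M.lookup is what makes this strict in the label.
  jumpOffset :: Label -> Int -> Labels -> (Word32, Labels)
  jumpOffset l here ls =
      case M.lookup l (defined ls) of
          Just pos -> (fromIntegral (pos - here), ls)
          Nothing  -> ( 0  -- dummy value, patched later
                      , ls { pending = M.insertWith (++) l [here] (pending ls) } )

  -- Defining a label: record its position and report which buffer
  -- offsets must now be patched with the real jump target.
  defineLabel :: Label -> Int -> Labels -> ([Int], Labels)
  defineLabel l pos ls =
      ( M.findWithDefault [] l (pending ls)
      , ls { defined = M.insert l pos (defined ls)
           , pending = M.delete l (pending ls) } )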
Ah, I slowly get what you mean. In short, the point is that you're reinventing lazy evaluation because you're writing directly to a memory buffer, and writing a raw _|_ is impossible. So, you invent a scheme to delay the evaluation of the _|_ until they are defined. With raw writing, we indeed have the strictness

  writeByte buf _|_ = _|_

which implies

  jmp _|_ = _|_

I thought that you'd simply store the instructions in a list/monoid like the Builder monoid from Data.Binary, with the effect that

  jmp _|_ /= _|_

as a monadic action, although it stores _|_ as an instruction. Lazy evaluation then takes care of filling the _|_-instructions with content in times of need. Here's an illustration with a fictional robodog toy assembly language:

  import Control.Monad.RWS

  type Address = Int
  data Dogcode = Bark | Bite | Jump Address

  type CodeGen = RWS () [Dogcode] Address

  emitDogcode :: Dogcode -> CodeGen Address
  emitDogcode c = do
      x <- get
      tell [c]
      put (x + 1)
      return x

  jump :: Address -> CodeGen Address
  jump = emitDogcode . Jump

  bark, bite :: CodeGen Address
  bark = emitDogcode Bark
  bite = emitDogcode Bite

  generateDogCode :: CodeGen a -> CodeBuffer
  generateDogCode c =
      let (_, instructionCount, code) = runRWS c () 0
      in  ... whatever to do with a list of instructions ...

Now, you magically get recursive do-notation without lifting a finger, because RWS is already an instance of MonadFix. For example, the following works out of the box:

  pluto :: CodeGen Address
  pluto = mdo
      bark
      jump loop
      bite
      loop <- bark
      bark
      jump loop

(The return value of type Address encapsulated in the monad is the line number of the last instruction of pluto.)

If you really want to directly write to a memory buffer, I think that you can still use lazy evaluation to fill in the _|_ by emitting a "jump PATCHME" dummy instruction and building a stack (f.i. of type [Address]) of addresses. The stack takes the role of the map you currently use; I think that a stack is enough. The addresses therein may well be _|_. After having written all instructions into the buffer, all the _|_ are resolved, and you can patch the "PATCHME"s in a second pass over the buffer.

However, because of omnipresent lazy evaluation, it is unclear whether "directly" writing to a buffer instead of first building a monoid does give a speed/space advantage. So, your current approach may well be slower than a naïve "first (difference) list, then raw memory" approach.
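Coming back to pluto: with the RecursiveDo extension, running it through plain runRWS shows the knot-tying at work (assuming the definitions above; the expected result is reconstructed by hand):

  demo :: [Dogcode]
  demo = let (_, _, code) = runRWS pluto () 0 in code
  -- The Jump at position 1 refers forward to position 3, even though
  -- 'loop' was still an unevaluated thunk when the instruction was
  -- emitted:
  --   demo == [Bark, Jump 3, Bite, Bark, Bark, Jump 3]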
We could omit the map, and just remember where to patch the code, but then we'd need to explicitly call some function after code generation that does the patching. We had implemented this, but found the current solution easier to use, since backpatching is completely automatic and hidden from the user.
Huh, I thought that runCodeGen would do an additional backpatching pass just as transparently?
I also think that having liftIO in the CodeGen-monad is plain wrong. I mean, CodeGen is a monad that generates code without any execution taking place. The execution part is already handled by runCodeGen. Having liftIO means that arbitrary Haskell programs can be intertwined with assembly generation and I doubt that you want that.
Feel free to doubt, but this is exactly what we want. :-)
Also, note that runCodeGen runs the code _generation_, executing the generated code is done _within_ the CodeGen monad via the functions generated by callDecl (or the predefined functions in the Harpy.Call module). This is even more intertwined, but intentional.
Huh? That means that code gets executed during its own generation? But why do you mix separate concerns? I don't see what use this is besides being an opportunity to mess up.
Of course, again a different design is possible, making runCodeGen return a binary code object, that can be called from the IO monad. But then, the user has to care about releasing code buffers, and not to have unevaluated closures having code pointers to already released run-time generated code.
Huh, why does the user have to care? Shouldn't wrapping the raw memory block into a Foreign.ForeignPtr do the job?
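I'm thinking of something like this minimal sketch (a hypothetical CodeBuffer wrapper; real executable memory would need mmap/mprotect rather than malloc, but the lifetime story is the same):

  import Foreign.ForeignPtr (ForeignPtr, newForeignPtr)
  import Foreign.Marshal.Alloc (finalizerFree, mallocBytes)
  import Data.Word (Word8)

  -- The buffer is released by the Haskell garbage collector once no
  -- closure holds a reference to it; the user never frees it by hand.
  newtype CodeBuffer = CodeBuffer (ForeignPtr Word8)

  newCodeBuffer :: Int -> IO CodeBuffer
  newCodeBuffer n = do
      p  <- mallocBytes n
      fp <- newForeignPtr finalizerFree p
      return (CodeBuffer fp)

Regards,
apfelmus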

apfelmus wrote:
Dirk Kleeblatt wrote:
apfelmus wrote: [...] So, yes, the code position is only used after the definition of the label. But the "look up in a map"-part makes the jmp operation strict in the label parameter.
Ah, I slowly get what you mean. In short, the point is that you're reinventing lazy evaluation because you're writing directly to a memory buffer and writing a raw _|_ is impossible. So, you invent a scheme to delay the evaluation of the _|_ until they are defined.
Exactly.
I thought that you'd simply store the instructions in a list/monoid like the Builder-Monoid from Data.Binary
Currently, we don't.
Here's an illustration with a fictional robodog toy assembly language
import Control.Monad.RWS
type Address = Int
data Dogcode = Bark | Bite | Jump Address
Nice! :-)
If you really want to directly write to a memory buffer,
It makes interleaving code generation, execution of the buffer, generating more code, executing a little bit and so on (see below) easier, but...
However, because of omnipresent lazy evaluation, it is unclear whether "directly" writing to a buffer instead of first building a monoid does give a speed/space advantage. So, your current approach may well be slower than a naïve "first (difference) list, then raw memory" approach.
... after seeing some benchmarks in the last few hours, I guess this is right: doing the optimizations that the binary package does might give laziness and speed at the same time.
We could omit the map, and just remember where to patch the code, but then we'd need to explicitly call some function after code generation that does the patching. We had implemented this, but found the current solution easier to use, since backpatching is completely automatic and hidden from the user.
Huh, I thought that runCodeGen would do an additional backpatching pass just as transparently?
Currently, it doesn't. And in the current implementation, it's not necessary.
I also think that having liftIO in the CodeGen-monad is plain wrong. I mean, CodeGen is a monad that generates code without any execution [...]

Also, note that runCodeGen runs the code _generation_, executing the generated code is done _within_ the CodeGen monad via the functions generated by callDecl (or the predefined functions in the Harpy.Call module). This is even more intertwined, but intentional.
Huh? That means that code gets executed during its own generation? But why do you mix separate concerns? I don't see what use this is besides being an opportunity to mess up.
One of our projects is a just-in-time compiler for a functional language. Here, compilation is done lazily: a starting stub is compiled and executed; when execution reaches some point for which no code has been generated yet, the next bit of code is compiled and the program is resumed, and so on. So execution and code generation are interleaved.

Another project is related to dependent type checking as described by Benjamin Grégoire and Xavier Leroy in [1]. Here, functions can occur in types, and the cited paper describes how these functions can be executed by compiled code during type checking. So type checking, code generation, and execution are interleaved.
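As a rough sketch of the first scenario (made-up Fragment/Compiled types, nothing Harpy-specific), compilation on demand looks like this:

  import Data.IORef
  import qualified Data.Map as M

  type Fragment = Int    -- names a piece of source not yet compiled
  type Compiled = IO ()  -- an executable compiled fragment

  -- Compile a fragment the first time execution demands it and cache
  -- the result, so compilation and execution interleave.
  demandCompile :: IORef (M.Map Fragment Compiled)
                -> (Fragment -> IO Compiled)  -- the compiler
                -> Fragment
                -> IO Compiled
  demandCompile cache compile f = do
      m <- readIORef cache
      case M.lookup f m of
          Just code -> return code
          Nothing   -> do
              code <- compile f
              modifyIORef cache (M.insert f code)
              return code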
Of course, again a different design is possible, making runCodeGen return a binary code object, that can be called from the IO monad. But then, the user has to care about releasing code buffers, and not to have unevaluated closures having code pointers to already released run-time generated code.
Huh, why does the user have to care? Shouldn't wrapping the raw memory block into a Foreign.ForeignPtr do the job?
Well, both projects manage a separate memory block which is used like a heap by generated code. This heap contains closures that have code pointers into generated code. And these are subject to our (not yet implemented... ;-) ) garbage collectors, not the Haskell collector.
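Schematically, such a closure looks like this (a made-up picture, not our actual layout):

  import Foreign.Ptr (FunPtr)

  -- A closure in the managed heap pairs a pointer into run-time
  -- generated code with its captured environment. Our collector must
  -- trace 'env' and keep the code buffer alive while 'code' is
  -- reachable.
  data Value   = I Int | Fun Closure
  data Closure = Closure
      { code :: FunPtr (IO ())  -- points into a generated code buffer
      , env  :: [Value]         -- captured free variables
      }

Kind regards,
Dirk

[1] http://pauillac.inria.fr/~xleroy/publi/strong-reduction.pdf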

Dirk Kleeblatt wrote:
apfelmus wrote:
Dirk Kleeblatt wrote:
apfelmus wrote:
I also think that having liftIO in the CodeGen-monad is plain wrong. I mean, CodeGen is a monad that generates code without any execution
note that runCodeGen runs the code _generation_, executing the generated code is done _within_ the CodeGen monad via the functions generated by callDecl (or the predefined functions in the Harpy.Call module). This is even more intertwined, but intentional.
Huh? That means that code gets executed during its own generation? But why do you mix separate concerns? I don't see what use this is besides being an opportunity to mess up.
One of our projects is a just-in-time compiler for a functional language. Here, compilation is done lazily: A starting stub is compiled and executed, when the execution reaches some point for which no code has been generated yet the next bit of code is compiled, and the program is resumed, and so on. So execution and code generation are interleaved.
Another project is related to dependent type checking as described by Benjamin Grégoire and Xavier Leroy in [1]. Here, functions can occur in types, and the cited paper describes how these functions can be executed by compiled code during type checking. So type checking, code generation, and execution are interleaved.
Of course, again a different design is possible, making runCodeGen return a binary code object, that can be called from the IO monad. But then, the user has to care about releasing code buffers, and not to have unevaluated closures having code pointers to already released run-time generated code.
Huh, why does the user have to care? Shouldn't wrapping the raw memory block into a Foreign.ForeignPtr do the job?
Well, both projects manage a separate memory block which is used like a heap by generated code. This heap contains closures that have code pointers into generated code. And these are subject to our (not yet implemented... ;-) ) garbage collectors, not the Haskell collector.
Ah, I think the memory organization issues are the driving force for liftIO in the CodeGen-monad, not the possibility to interleave code generation and execution. This is because you can always interleave code generation and execution with the "binary code object" design as well, like in

  do
      codeobject  <- runCodeGen code1
      result      <- exec codeobject
      codeobject2 <- runCodeGen (code2 result)
      result2     <- exec codeobject2
      ...

Note that the second generated code may well depend on the result gained from executing the first.

However, you currently don't want free-floating binary code objects because you want to manage memory yourself. In other words, you carry around a single-threaded memory agglomeration that contains code and data, i.e. you're working in a monad

  StateT Memory IO a

because only that can guarantee that there exists only a single instance of the memory agglomeration. (In fact, you have to use IORefs, but that's not essential.)

Still, I think that writing opcodes somewhere and memory-managing this somewhere are separate concerns. I'd separate them by splitting the CodeGen monad into a monad (I'll call it "Executor") over IO that threads the mentioned memory agglomeration, and a data type that represents x86 assembly (hereby called "Code"). It is convenient to have Code be a monad too, but it's not necessary, and conceptually wrong.

Given this separation, you can support *both* designs simultaneously:

- calling x86 assembly inside the Executor

    foo :: Code -> Executor ()
    foo code = do
        buf <- newBuffer (size code)
        writeCode buf code
        exec buf
        freeBuffer buf

- calling code from binary code objects

    bar :: Code -> IO ()
    bar code = do
        (f :: IO ()) <- mkFunction code
        f

The first one is for your intended applications. The second one is a simple way to outsource performance-critical parts of a Haskell program to assembly, something some people on this list are probably very interested in.

Last but not least, here are more details about Code and why it's not really a monad. The basic operation on Code is to glue two Code pieces together in sequence

  append :: Code -> Code -> Code

In an assembly language without jumps, that's all there is to it, besides primitive instructions like

  bite :: Code
  bark :: Code

and so on. Thus, Code is at least a monoid. Now, jumps are what make Code special: we need a way to reference code positions, aka labels. Let's assume a primitive

  jump :: Label -> Code

One way is to base it on the following two operations

  append' :: Code -> (Label -> Code) -> Code
  ccall   :: (Label -> Code) -> Code

The function append' supplies the position of its first argument to the second, so that the second can reference it. The function ccall supplies the position after the end of the code block to the code block itself, just like "call with current continuation" would do. The robodog example from last time then becomes

  pluto :: Code
  pluto = bark
          `append` ccall (\loop -> jump loop `append` bite)
          `append` (bark `append'` \loop -> bark `append` jump loop)

(both lambda-bound labels denote the same position, namely the fourth instruction). Of course, making Code a MonadFix (i.e. parameterizing it over labels) is more convenient (and I even think that the above primitives are not enough for multiple cross-jumps). In fact, append' is very much like >>= and ccall like mfix. In any case, I wanted to show that Code is weaker than a monad. From another point of view, jumps make Code a specification of graphs.