
Dirk Kleeblatt wrote:
apfelmus wrote:
Dirk Kleeblatt wrote:
apfelmus wrote:
I also think that having liftIO in the CodeGen-monad is plain wrong. I mean, CodeGen is a monad that generates code without any execution
note that runCodeGen runs the code _generation_, executing the generated code is done _within_ the CodeGen monad via the functions generated by callDecl (or the predefined functions in the Harpy.Call module). This is even more intertwined, but intentional.
Huh? That means that code gets executed during its own generation? But why do you mix separate concerns? I don't see what use this is besides being an opportunity to mess up.
One of our projects is a just-in-time compiler for a functional language. Here, compilation is done lazily: a starting stub is compiled and executed; when execution reaches some point for which no code has been generated yet, the next bit of code is compiled, the program is resumed, and so on. So execution and code generation are interleaved.
Another project is related to dependent type checking as described by Benjamin Grégoire and Xavier Leroy in [1]. Here, functions can occur in types, and the cited paper describes how these functions can be executed by compiled code. During type checking. So type checking, code generation, and execution are interleaved.
Of course, again a different design is possible, making runCodeGen return a binary code object that can be called from the IO monad. But then the user has to take care of releasing code buffers, and of not keeping unevaluated closures with code pointers into already released run-time generated code.
Huh, why does the user have to care? Shouldn't wrapping the raw memory block into a Foreign.ForeignPtr do the job?
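A minimal sketch of the ForeignPtr idea (newCodeBuffer is an invented name here, and a real JIT buffer would additionally need executable page permissions via mmap/VirtualProtect, which this sketch ignores): the attached finalizer frees the raw block automatically once the Haskell garbage collector drops the last reference.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (finalizerFree, mallocBytes)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Allocate a raw code buffer; 'finalizerFree' releases it automatically
-- when the garbage collector drops the last reference to the ForeignPtr.
newCodeBuffer :: Int -> IO (ForeignPtr Word8)
newCodeBuffer bytes = mallocBytes bytes >>= newForeignPtr finalizerFree

main :: IO ()
main = do
  buf <- newCodeBuffer 16
  withForeignPtr buf $ \p -> do
    pokeByteOff p 0 (0xC3 :: Word8)   -- the x86 'ret' opcode, as an example
    b <- peekByteOff p 0
    print (b :: Word8)
```

No explicit free is needed: when `buf` becomes unreachable, the finalizer runs and releases the memory.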
Well, both projects manage a separate memory block that is used like a heap by the generated code. This heap contains closures with code pointers into generated code, and these are subject to our (not yet implemented... ;-) ) garbage collectors, not the Haskell collector.
Ah, I think the memory organization issues are the driving force behind liftIO in the CodeGen monad, not the possibility of interleaving code generation and execution. This is because you can always interleave code generation and execution with the "binary code object" design as well, like in

    do
      codeobject  <- runCodeGen code1
      result      <- exec codeobject
      codeobject2 <- runCodeGen (code2 result)
      result2     <- exec codeobject2
      ...

Note that the second generated code may well depend on the result gained from executing the first.

However, you currently don't want free-floating binary code objects because you want to manage memory yourself. In other words, you carry around a single-threaded memory agglomeration that contains code and data, i.e. you're working in a monad

    StateT Memory IO a

because only that can guarantee that there exists only a single instance of the memory agglomeration. (In fact, you have to use IORefs, but that's not essential.)

Still, I think that writing opcodes somewhere and memory-managing that somewhere are separate concerns. I'd separate them by splitting the CodeGen monad into a monad over IO that threads the mentioned memory agglomeration (I'll call it "Executor") and a data type that represents x86 assembly (hereby called "Code"). It is convenient to have Code be a monad too, but it's not necessary and conceptually wrong.

Given this separation, you can support *both* designs simultaneously:

- calling x86 assembly inside the Executor

    foo :: Code -> Executor ()
    foo code = do
      buf <- newBuffer (size code)
      writeCode buf code
      exec buf
      freeBuffer buf

- calling code from binary code objects

    bar :: Code -> IO ()
    bar code = do
      (f :: IO ()) <- mkFunction code
      f

The first one is for your intended applications. The second one is a simple way to outsource performance-critical parts of a Haskell program to assembly, something some people on this list are probably very interested in.
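The foo design above can be sketched concretely. Everything here is an invented toy stand-in, not Harpy's API: Code is a list of opcode strings, Memory an association list of buffers, and exec merely prints the opcodes instead of running machine code.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (StateT, evalStateT, get, modify, put)

-- A toy stand-in for a block of x86 assembly.
newtype Code = Code [String]

size :: Code -> Int
size (Code ops) = length ops

-- The single-threaded memory agglomeration: buffer ids paired with contents.
type Memory = [(Int, [String])]

-- The proposed Executor monad: IO plus the threaded memory state.
type Executor = StateT Memory IO

newBuffer :: Int -> Executor Int
newBuffer _ = do
  mem <- get
  let n = length mem
  put (mem ++ [(n, [])])
  return n

writeCode :: Int -> Code -> Executor ()
writeCode n (Code ops) = modify (map upd)
  where upd (i, old) = if i == n then (i, ops) else (i, old)

freeBuffer :: Int -> Executor ()
freeBuffer n = modify (filter ((/= n) . fst))

-- "Executing" a buffer just prints its opcodes in this sketch.
exec :: Int -> Executor ()
exec n = do
  mem <- get
  case lookup n mem of
    Just ops -> lift (mapM_ putStrLn ops)
    Nothing  -> lift (putStrLn "dangling code pointer!")

-- Generate, run, and release a code buffer entirely inside Executor.
foo :: Code -> Executor ()
foo code = do
  buf <- newBuffer (size code)
  writeCode buf code
  exec buf
  freeBuffer buf

main :: IO ()
main = evalStateT (foo (Code ["bark", "bite"])) []
```

Because Memory is threaded through StateT, buffer allocation and release stay single-instance, while Code itself remains a pure value that could just as well be handed to a `bar`-style function.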
Last but not least, here are more details about Code and why it's not really a monad. The basic operation on Code is to glue two Code pieces together in sequence:

    append :: Code -> Code -> Code

In an assembly language without jumps, that's about all there is to it, besides primitive instructions like

    bite :: Code
    bark :: Code

and so on. Thus, Code is at least a monoid. Now, jumps are what make Code special: we need a way to reference code positions, aka labels. Let's assume a primitive

    jump :: Label -> Code

One way is to base it on the following two operations:

    append' :: Code -> (Label -> Code) -> Code
    ccall   :: (Label -> Code) -> Code

The function append' supplies the position of its first argument to the second, so that the second can reference it. The function ccall supplies the position after the end of the code block to the code block itself, just like "call with current continuation" would do. The robodog example from last time then becomes

    pluto :: Code
    pluto = bark `append` ccall (\loop -> (jump loop) `append` bite)
            `append'` bark $ \loop -> bark `append` (jump loop)

Of course, making Code = Code Label a MonadFix is more convenient (and I even think that the above primitives are not enough for multiple cross-jumps). In fact, append' is very much like >>= and ccall is like mfix. In any case, I wanted to show that Code is weaker than a monad. From another point of view, jumps make Code a specification of graphs.

Regards,
apfelmus
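These primitives can be made concrete in a toy encoding (entirely hypothetical, not Harpy's implementation, and all names here are invented): take a Label to be an absolute instruction address, and a Code to be a function from its start address to the rendered opcode list. Then append' simply passes along the start address, and ccall's "position after the end" is obtained by tying a lazy knot.

```haskell
-- Hypothetical toy encoding of Code: a label is an absolute instruction
-- address, and a Code maps its start address to a list of rendered opcodes.
type Label = Int
newtype Code = Code (Label -> [String])

instr :: String -> Code
instr s = Code (\_ -> [s])

bark, bite :: Code
bark = instr "bark"
bite = instr "bite"

jump :: Label -> Code
jump l = Code (\_ -> ["jmp " ++ show l])

-- Glue two pieces of code together in sequence.
append :: Code -> Code -> Code
append (Code f) (Code g) =
  Code (\pos -> let xs = f pos in xs ++ g (pos + length xs))

-- Supply the start address of the first argument to the second,
-- so the second can jump back to it.
append' :: Code -> (Label -> Code) -> Code
append' c k = Code (\pos -> let Code f = c `append` k pos in f pos)

-- Supply the address *after* the block to the block itself, like
-- call-with-current-continuation.  This ties a lazy knot: the block's
-- instruction count must not depend on the label's numeric value.
ccall :: (Label -> Code) -> Code
ccall k = Code (\pos ->
  let Code f = k end
      xs     = f pos
      end    = pos + length xs
  in xs)

render :: Code -> [String]
render (Code f) = f 0

-- A simplified robodog loop (not the original pluto): bark twice,
-- then jump back to the start, forever.
rex :: Code
rex = bark `append'` \loop -> bark `append` jump loop

-- ccall lets a block jump past its own end, here skipping the bite.
skipBite :: Code
skipBite = ccall (\done -> jump done `append` bite) `append` bark

main :: IO ()
main = do
  print (render rex)       -- ["bark","bark","jmp 0"]
  print (render skipBite)  -- ["jmp 2","bite","bark"]
```

Note how append' threads positions forward, exactly like >>= threads results, while ccall's recursive `end` is the same knot mfix would tie in a MonadFix formulation.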