
Dirk Kleeblatt wrote:
apfelmus wrote:
Dirk Kleeblatt wrote:
apfelmus wrote:
I also think that having liftIO in the CodeGen-monad is plain wrong. I mean, CodeGen is a monad that generates code without any execution
note that runCodeGen runs the code _generation_, executing the generated code is done _within_ the CodeGen monad via the functions generated by callDecl (or the predefined functions in the Harpy.Call module). This is even more intertwined, but intentional.
Huh? That means that code gets executed during its own generation? But why do you mix separate concerns? I don't see what use this is besides being an opportunity to mess up.
One of our projects is a just-in-time compiler for a functional language. Here, compilation is done lazily: a starting stub is compiled and executed; when execution reaches some point for which no code has been generated yet, the next bit of code is compiled, the program is resumed, and so on. So execution and code generation are interleaved.
Another project is related to dependent type checking as described by Benjamin Grégoire and Xavier Leroy in [1]. Here, functions can occur in types, and the cited paper describes how these functions can be executed by compiled code. During type checking. So type checking, code generation, and execution are interleaved.
Of course, again a different design is possible, making runCodeGen return a binary code object that can be called from the IO monad. But then the user has to take care of releasing code buffers, and of not keeping unevaluated closures with code pointers into already released run-time generated code.
Huh, why does the user have to care? Shouldn't wrapping the raw memory block into a Foreign.ForeignPtr do the job?
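A minimal sketch of the ForeignPtr idea (newCodeBuffer is an invented name here, and a real JIT buffer would additionally need executable page permissions via mmap/VirtualProtect, which this sketch ignores): the attached finalizer frees the raw block automatically once the Haskell garbage collector drops the last reference.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (finalizerFree, mallocBytes)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Allocate a raw code buffer; 'finalizerFree' releases it automatically
-- when the garbage collector drops the last reference to the ForeignPtr.
newCodeBuffer :: Int -> IO (ForeignPtr Word8)
newCodeBuffer bytes = mallocBytes bytes >>= newForeignPtr finalizerFree

main :: IO ()
main = do
  buf <- newCodeBuffer 16
  withForeignPtr buf $ \p -> do
    pokeByteOff p 0 (0xC3 :: Word8)   -- the x86 'ret' opcode, as an example
    b <- peekByteOff p 0
    print (b :: Word8)
```

No explicit free is needed: when `buf` becomes unreachable, the finalizer runs and releases the memory.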
Well, both projects manage a separate memory block that is used like a heap by the generated code. This heap contains closures with code pointers into generated code, and these are subject to our (not yet implemented... ;-) ) garbage collectors, not the Haskell collector.
Ah, I think the memory organization issues are the driving force behind liftIO in the CodeGen monad, not the possibility of interleaving code generation and execution. This is because you can always interleave code generation and execution with the "binary code object" design as well, like in

    do
      codeobject  <- runCodeGen code1
      result      <- exec codeobject
      codeobject2 <- runCodeGen (code2 result)
      result2     <- exec codeobject2
      ...

Note that the second generated code may well depend on the result gained from executing the first.

However, you currently don't want free-floating binary code objects because you want to manage memory yourself. In other words, you carry around a single-threaded memory agglomeration that contains code and data, i.e. you're working in a monad

    StateT Memory IO a

because only that can guarantee that there exists only a single instance of the memory agglomeration. (In fact, you have to use IORefs, but that's not essential.)

Still, I think that writing opcodes somewhere and memory-managing that somewhere are separate concerns. I'd separate them by splitting the CodeGen monad into a monad over IO that threads the mentioned memory agglomeration (I'll call it "Executor") and a data type that represents x86 assembly (hereby called "Code"). It is convenient to have Code be a monad too, but it's not necessary and conceptually wrong.

Given this separation, you can support *both* designs simultaneously:

- calling x86 assembly inside the Executor

    foo :: Code -> Executor ()
    foo code = do
      buf <- newBuffer (size code)
      writeCode buf code
      exec buf
      freeBuffer buf

- calling code from binary code objects

    bar :: Code -> IO ()
    bar code = do
      (f :: IO ()) <- mkFunction code
      f

The first one is for your intended applications. The second one is a simple way to outsource performance-critical parts of a Haskell program to assembly, something some people on this list are probably very interested in.
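The foo design above can be sketched concretely. Everything here is an invented toy stand-in, not Harpy's API: Code is a list of opcode strings, Memory an association list of buffers, and exec merely prints the opcodes instead of running machine code.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (StateT, evalStateT, get, modify, put)

-- A toy stand-in for a block of x86 assembly.
newtype Code = Code [String]

size :: Code -> Int
size (Code ops) = length ops

-- The single-threaded memory agglomeration: buffer ids paired with contents.
type Memory = [(Int, [String])]

-- The proposed Executor monad: IO plus the threaded memory state.
type Executor = StateT Memory IO

newBuffer :: Int -> Executor Int
newBuffer _ = do
  mem <- get
  let n = length mem
  put (mem ++ [(n, [])])
  return n

writeCode :: Int -> Code -> Executor ()
writeCode n (Code ops) = modify (map upd)
  where upd (i, old) = if i == n then (i, ops) else (i, old)

freeBuffer :: Int -> Executor ()
freeBuffer n = modify (filter ((/= n) . fst))

-- "Executing" a buffer just prints its opcodes in this sketch.
exec :: Int -> Executor ()
exec n = do
  mem <- get
  case lookup n mem of
    Just ops -> lift (mapM_ putStrLn ops)
    Nothing  -> lift (putStrLn "dangling code pointer!")

-- Generate, run, and release a code buffer entirely inside Executor.
foo :: Code -> Executor ()
foo code = do
  buf <- newBuffer (size code)
  writeCode buf code
  exec buf
  freeBuffer buf

main :: IO ()
main = evalStateT (foo (Code ["bark", "bite"])) []
```

Because Memory is threaded through StateT, buffer allocation and release stay single-instance, while Code itself remains a pure value that could just as well be handed to a `bar`-style function.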
Last but not least, here are more details about Code and why it's not really a monad. The basic operation on Code is to glue two Code pieces together in sequence:

    append :: Code -> Code -> Code

In an assembly language without jumps, that's about all there is to it, besides primitive instructions like

    bite :: Code
    bark :: Code

and so on. Thus, Code is at least a monoid. Now, jumps are what make Code special: we need a way to reference code positions, aka labels. Let's assume a primitive

    jump :: Label -> Code

One way is to base it on the following two operations:

    append' :: Code -> (Label -> Code) -> Code
    ccall   :: (Label -> Code) -> Code

The function append' supplies the position of its first argument to the second, so that the second can reference it. The function ccall supplies the position after the end of the code block to the code block itself, just like "call with current continuation" would do. The robodog example from last time then becomes

    pluto :: Code
    pluto = bark `append` ccall (\loop -> (jump loop) `append` bite)
            `append'` bark $ \loop -> bark `append` (jump loop)

Of course, making Code = Code Label a MonadFix is more convenient (and I even think that the above primitives are not enough for multiple cross-jumps). In fact, append' is very much like >>= and ccall is like mfix. In any case, I wanted to show that Code is weaker than a monad. From another point of view, jumps make Code a specification of graphs.

Regards,
apfelmus
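These primitives can be made concrete in a toy encoding (entirely hypothetical, not Harpy's implementation, and all names here are invented): take a Label to be an absolute instruction address, and a Code to be a function from its start address to the rendered opcode list. Then append' simply passes along the start address, and ccall's "position after the end" is obtained by tying a lazy knot.

```haskell
-- Hypothetical toy encoding of Code: a label is an absolute instruction
-- address, and a Code maps its start address to a list of rendered opcodes.
type Label = Int
newtype Code = Code (Label -> [String])

instr :: String -> Code
instr s = Code (\_ -> [s])

bark, bite :: Code
bark = instr "bark"
bite = instr "bite"

jump :: Label -> Code
jump l = Code (\_ -> ["jmp " ++ show l])

-- Glue two pieces of code together in sequence.
append :: Code -> Code -> Code
append (Code f) (Code g) =
  Code (\pos -> let xs = f pos in xs ++ g (pos + length xs))

-- Supply the start address of the first argument to the second,
-- so the second can jump back to it.
append' :: Code -> (Label -> Code) -> Code
append' c k = Code (\pos -> let Code f = c `append` k pos in f pos)

-- Supply the address *after* the block to the block itself, like
-- call-with-current-continuation.  This ties a lazy knot: the block's
-- instruction count must not depend on the label's numeric value.
ccall :: (Label -> Code) -> Code
ccall k = Code (\pos ->
  let Code f = k end
      xs     = f pos
      end    = pos + length xs
  in xs)

render :: Code -> [String]
render (Code f) = f 0

-- A simplified robodog loop (not the original pluto): bark twice,
-- then jump back to the start, forever.
rex :: Code
rex = bark `append'` \loop -> bark `append` jump loop

-- ccall lets a block jump past its own end, here skipping the bite.
skipBite :: Code
skipBite = ccall (\done -> jump done `append` bite) `append` bark

main :: IO ()
main = do
  print (render rex)       -- ["bark","bark","jmp 0"]
  print (render skipBite)  -- ["jmp 2","bite","bark"]
```

Note how append' threads positions forward, exactly like >>= threads results, while ccall's recursive `end` is the same knot mfix would tie in a MonadFix formulation.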