Clean Dynamics and serializing code to disk

Hey everyone; recently I've been toying around with various methods of writing a shell and reading the academic literature on such things. The best prior art on the subject seems to be the ESTHER shell (see http://citeseer.ist.psu.edu/689593.html, http://citeseer.ist.psu.edu/744494.html, ftp://ftp.cs.kun.nl/pub/Clean/papers/2003/vWeA2003-Esther.pdf).

Now, ESTHER is a really cool-looking shell, but it has two main problems for me:
1) Source doesn't seem to be available anywhere online
2) It's written in Clean and not Haskell

No problem. All the hard stuff is done, and there's a good 50 pages of documentation, so how hard could it be? Clean is practically Haskell anyway. But I immediately ran into a road-block:

"The shell is built on top of Clean's hybrid static/dynamic type system and its dynamic I/O run-time support. It allows programmers to save any Clean expression, i.e. a graph that can contain data, references to functions, and closures, to disk. Clean expressions can be written to disk as a _dynamic_, which contains a representation of their (polymorphic) static type, while preserving sharing. Clean programs can load dynamics from disk and use run-time type pattern matching to reintegrate them into the statically-typed program."

The Data.Dynamic library seems to do everything as far as dynamic types and run-time pattern matching go, but I haven't figured out how one could write Haskell expressions to disk, as Clean's system (http://www.st.cs.ru.nl/papers/2002/verm2002-LazyDynamicIO.ps.gz) apparently allows. Does anyone know of any neat or tricky ways this could be done? Projects, extensions, whatever?

On #haskell, quicksilver did tell me of one neat way to serialize various things through Data.Binary by using ADTs, along the lines of the following simple example:

--
module Main (main) where

import Data.Binary

data Math = Add | Subtract | Multiply
            deriving Show

eval :: (Num a) => Math -> a -> a -> a
eval f = case f of
           Add      -> (+)
           Subtract -> (-)
           Multiply -> (*)

instance Binary Math where
  put Add      = putWord8 0
  put Subtract = putWord8 1
  put Multiply = putWord8 2
  get = do
    tag_ <- getWord8
    case tag_ of
      0 -> return Add
      1 -> return Subtract
      2 -> return Multiply

main = do
  encodeFile "tmp.s" [Add, Subtract, Multiply]
  a <- decodeFile "tmp.s"
  putStr $ show (a :: [Math])
--

Since from my Lisp days I know that code is data, it strikes me that one could probably smuggle Haskell expressions via this route, although I am not sure this is a good way to go or even how one would do it: to turn, say, a list of the chosen ADT back into real functions, you need the 'eval' function, but 'eval' can only produce functions of a single type. So you'd need to either create as many ADTs and instances as there are varieties of type signatures in Haskell '98 and the libraries, I guess, or somehow encode things in a lambda calculus. Is that a route worth pursuing?

-- gwern
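Since the Math example above only ever recovers operators of one type, one way to picture the "encode it in a lambda calculus" idea is to serialize a small expression ADT and interpret it after loading. The following is a minimal sketch along those lines, with made-up names (Expr, evalE, expr.bin); it is still just plain Data.Binary, not Clean-style dynamic I/O, so no actual closures or polymorphic types are stored.

module Main (main) where

import Data.Binary

-- a tiny expression language standing in for "real" code
data Expr = Lit Integer
          | Plus Expr Expr
          | Times Expr Expr
          deriving Show

instance Binary Expr where
  put (Lit n)     = putWord8 0 >> put n
  put (Plus a b)  = putWord8 1 >> put a >> put b
  put (Times a b) = putWord8 2 >> put a >> put b
  get = do
    tag_ <- getWord8
    case tag_ of
      0 -> do n <- get; return (Lit n)
      1 -> do a <- get; b <- get; return (Plus a b)
      2 -> do a <- get; b <- get; return (Times a b)
      _ -> fail "Expr: unknown tag"

-- the interpreter that turns the serialized "code" back into behaviour
evalE :: Expr -> Integer
evalE (Lit n)     = n
evalE (Plus a b)  = evalE a + evalE b
evalE (Times a b) = evalE a * evalE b

main :: IO ()
main = do
  encodeFile "expr.bin" (Plus (Lit 2) (Times (Lit 3) (Lit 4)))
  e <- decodeFile "expr.bin"
  print (evalE e)               -- prints 14

The limitation raised in the question remains: evalE fixes the result type, so covering arbitrary Haskell values would need either many such interpreters or a richer (e.g. untyped lambda-calculus) object language.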

On 5 Dec 2007, gwern0@gmail.com wrote:
Since from my Lisp days I know that code is data, it strikes me that one could probably smuggle Haskell expressions via this route, although I am not sure this is a good way to go or even how one would do it: to turn, say, a list of the chosen ADT back into real functions, you need the 'eval' function, but 'eval' can only produce functions of a single type. So you'd need to either create as many ADTs and instances as there are varieties of type signatures in Haskell '98 and the libraries, I guess, or somehow encode things in a lambda calculus. Is that a route worth pursuing?
I too am interested in serializing functions, but for a different purpose: distributed computing without emulating global shared memory like GdH.

The hard part, as I understand it, is tracking down all the references in a function. Once they are identified, we can wrap the whole thing up (sort of lambda lifting at runtime) and send that. I believe this is what the GUM runtime does internally. I am unaware of a way to get at this information without modification of the runtime. If the function we want to serialize is available at compile time, the compiler should be able to do the lambda lifting and give us a binary object that we can serialize. I don't know if this is possible now or if it would need a compiler modification.

Perhaps the Mobile Haskell approach is a good idea---serializing byte-code generated by GHCi. In one paper, they reference new functions

  packV   :: a -> IO CString
  unpackV :: CString -> IO a

although I'm skeptical of these type signatures. At least, they are only valid for byte-code, so they don't tell the whole story. This byte-code is dynamically linked on the receiving end, so the same libraries must be compiled there, but compiled code is never serialized. From the article:

  "Packing, or serializing, arbitrary graph structures is not a trivial task and care must be taken to preserve sharing and cycles. As in GpH, GdH and Eden, packing is done breadth-first, closure by closure, and when the closure is packed its address is recorded in a temporary table that is checked for each new closure to be packed to preserve sharing and cycles. We proceed packing until every reachable graph has been serialised."

As long as the real work takes place in compiled code, sending the byte-code might not be a bad idea, and it has the added benefit of being platform-independent. However, I haven't been able to find specifics about the implementation of packV/unpackV, and I would think the runtime is better positioned to do this analysis itself. Perhaps someone on this list who knows a thing or two about the internals can offer some insight. :-)

Jed
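A rough model of the packing scheme the quoted passage describes (breadth-first, closure by closure, with a table of already-packed nodes so sharing and cycles survive) can be sketched over an explicit graph. The names here (NodeId, Packed, packGraph) are invented, and ordinary data stands in for heap closures, which is precisely the part that would need runtime support:

module PackSketch where

import qualified Data.Map as M

-- an explicit in-memory graph: each node has a label and successor node ids
type NodeId = Int
type Graph  = M.Map NodeId (String, [NodeId])

-- a packed node: its table index, its label, and the table indices of its
-- successors; indices into the table capture sharing and cycles
data Packed = Packed Int String [Int]
              deriving Show

-- breadth-first packing: a node is given a table index the first time it
-- is seen; successors already in the table are simply referenced by index
packGraph :: Graph -> NodeId -> [Packed]
packGraph g root = bfs (M.singleton root 0) [root]
  where
    bfs _     []       = []
    bfs table (n:rest) =
        Packed (table M.! n) lbl kidIx : bfs table' (rest ++ fresh)
      where
        (lbl, kids)     = g M.! n
        (table', fresh) = foldl assign (table, []) kids
        kidIx           = map (table' M.!) kids
        assign (t, new) k = case M.lookup k t of
          Just _  -> (t, new)
          Nothing -> (M.insert k (M.size t) t, new ++ [k])

-- tiny example with sharing (nodes 1 and 2 both point at 3) and a cycle (3 -> 1)
example :: Graph
example = M.fromList [ (1, ("a", [2,3])), (2, ("b", [3])), (3, ("c", [1])) ]

-- packGraph example 1
--   == [Packed 0 "a" [1,2], Packed 1 "b" [2], Packed 2 "c" [0]]

The real difficulty Jed points to is untouched here: for actual closures the "successors" are heap pointers, which only the runtime (or a byte-code interpreter like GHCi's) can enumerate.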

if you want to make something like this for haskell (and i'd very much like to use it!-), there are several issues, including:

1 handling code:
- going for a portable intermediate representation, such as bytecode, is most promising, especially if the code representation is fairly stable (if you're worried about performance, choose a representation that can be compiled further by the backend on the target machine)
- shipping the code objects, and serializing only pointers into those, then linking into the objects dynamically on load, can work if platform and compiler (ghc versions aren't abi-compatible) don't change

2 handling types:
- 'out e' needs a runtime representation of the static type of e; 'e <- in' needs to compare e's type representation at runtime against the static type expected by the usage context (a small Data.Dynamic sketch of this check follows this message)
- types evolve, type representations evolve
- haskell's Dynamic doesn't handle polymorphic types, and its type representations are not standardised, so they can change even if the types do not (there's no guarantee that reading an expression written by another program will work)
- doing a proper integration of Dynamic raises other issues wrt interactions with other type system features (see the Clean papers for examples) or wrt parametricity (if any type can be wrapped in/extracted from a Dynamic, can you now have type-selective functions of type 'a->a'?)

3 handling graphs: in a typical functional language implementation, there are several parts that need to do this already, although they tend to be specialised to their intended use, so they might not cover all needs for general serialization
- a distributed implementation needs to ship graphs to other nodes (but often ships code in a separate setup phase..)
- the memory manager needs to move graphs between memory areas (but often does not touch code at all)
graph representations evolve, so if you can reuse code from one of the existing parts here, that will save you not only initial time but, more importantly, avoid a maintenance nightmare.

4 surviving evolution: if you got all of that working, the more interesting issues start popping up
- implementations move on, can your code keep up?
- programs move on, can your system handle versions?
- the distinction between static/dynamic has suddenly become rather blurry, raising lots of interesting opportunities, but also pointing out limitations of tools that rely on a fixed single compile-time/runtime cycle

many of those issues were investigated long ago, in the area of orthogonally persistent systems (including types and first-class procedures), see a thread from earlier this year for a few references: http://www.haskell.org/pipermail/haskell-cafe/2007-June/027162.html

non-trivial, but doable - you'd probably spend a lot more time thinking, investigating current code, prototyping options, and sorting out issues, than on the final implementation. it is the kind of problem where you either spend a lot of time preparing an overnight success, or you start hacking right away and never get anywhere!-)

claus

ps: i implemented first-class i/o for a pre-haskell functional language (long gone now) - i experimented with dynamic linking of object code, but went with byte code, didn't have to struggle with types, because the language had runtime types and checking anyway, and was able to reuse code from the distributed implementation for storage/retrieval. and saw only the very early stages of 4 ;-)
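To make point 2 above concrete, here is a minimal sketch of the runtime check at 'e <- in' using Data.Dynamic. As Claus notes, Dynamic only covers monomorphic types and its representation is not guaranteed to be stable across programs; outE below is just an invented stand-in for a stored dynamic, not something actually read from disk.

module Main (main) where

import Data.Dynamic

-- 'out e': the value is paired with a representation of its (monomorphic)
-- static type; a polymorphic value must be given a concrete type first
outE :: Dynamic
outE = toDyn (reverse :: String -> String)

-- 'e <- in': fromDynamic compares the stored type representation against
-- the type expected by the usage context, and yields Nothing on mismatch
main :: IO ()
main = do
  case fromDynamic outE :: Maybe (String -> String) of
    Just f  -> putStrLn (f "cimanyd")                  -- representations match
    Nothing -> putStrLn "type mismatch"
  case fromDynamic outE :: Maybe (Int -> Int) of
    Just _  -> putStrLn "unexpected match"
    Nothing -> putStrLn "Int -> Int view rejected at runtime"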

gwern0@gmail.com wrote:
Hey everyone; recently I've been toying around with various methods of writing a shell and reading the academic literature on such things. The best prior art on the subject seems to be the ESTHER shell (see http://citeseer.ist.psu.edu/689593.html, http://citeseer.ist.psu.edu/744494.html, ftp://ftp.cs.kun.nl/pub/Clean/papers/2003/vWeA2003-Esther.pdf).
Now, ESTHER is a really cool looking shell, but it has two main problems for me: 1) Source doesn't seem to be available anywhere online ...
The source code of ESTHER is included with Clean 2.2, in the directory Libraries/Hilde of the Windows 32-bit binary zip and of the sources zip and tar.

Kind regards,
John van Groningen

On 2007.12.05 15:56:49 +0100, John van Groningen wrote:
gwern0@gmail.com wrote:
Hey everyone; recently I've been toying around with various methods of writing a shell and reading the academic literature on such things. The best prior art on the subject seems to be the ESTHER shell (see http://citeseer.ist.psu.edu/689593.html, http://citeseer.ist.psu.edu/744494.html, ftp://ftp.cs.kun.nl/pub/Clean/papers/2003/vWeA2003-Esther.pdf).
Now, ESTHER is a really cool looking shell, but it has two main problems for me: 1) Source doesn't seem to be available anywhere online ...
The source code of ESTHER is included with Clean 2.2, in the directory Libraries/Hilde of the Windows 32-bit binary zip and of the sources zip and tar.
Kind regards,
John van Groningen
Thanks for the information! I had no idea it'd be included with the Clean compiler package, but it's there alright. Interesting reading, too. -- gwern
participants (4)
- Claus Reinke
- gwern0@gmail.com
- Jed Brown
- John van Groningen