
Joel Reymont wrote:
Folks,
Allegro Common Lisp has AllegroCache [1], a database built on B-Trees that lets one store Lisp objects of any type. You can designate certain slots (object fields) as keys and use them for lookup. ACL used to come bundled with the ObjectStore OODBMS for the same purpose but then adopted a native solution.
AllegroCache is not distributed or replicating but supports automatic versioning. You can redefine a class and new code will store more (or less) data in the database while code that uses the old schema will merrily chug along. That implies being able to put persistent code into the database. Easy enough in Lisp, less easy in Haskell. How do you serialize it?
Erlang [2] has Mnesia [3] which lets you store any Erlang term ("object"). It stores records (tuples, actually) and you can also designate key fields and use them for lookup. I haven't looked into this deeply but Mnesia is built on top of DETS (Disk-based Term Storage) which most likely also uses a form of B-Trees. Erlang also has a very disciplined approach to code updates, which presumably helps a lot when functions are stored.
As a rule, storing functions along with data is a can of worms. Either you actually store the code as a BLOB or you store a pointer to the function in memory. Either way you run into problems when you upgrade your software and expect the stored functions to work in the new context.
Mnesia is distributed and replicated in real-time. There's no automatic versioning with Mnesia but user code can be run to read old records and write new ones.
Would it make sense to build a similar type of database for Haskell? I can immediately see how versioning would be much harder as Haskell is statically typed. I would love to extend recent gains in binary serialization, though, to add indexing of records based on a designated key, plus distribution and real-time replication.
I very much admire Mnesia, even though I'm not an Erlang programmer. It would indeed be really cool to have something like that. But Mnesia is built on the Erlang OTP middleware. I would suggest that Haskell needs a middleware with the same sort of capabilities first. Then we can build a database on top of it.
What do you think?
To stimulate discussion I would like to ask a couple of pointed questions:
- How would you "designate" a key for a Haskell data structure?

I haven't tried compiling it, but something like:

    class (Ord k) => DataKey a k | a -> k where
      keyValue :: a -> k
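For what it's worth, that class does compile with the MultiParamTypeClasses and FunctionalDependencies extensions. A minimal sketch, where the Person record and its instance are invented purely for illustration:

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}

-- Each record type a is associated with exactly one key type k
-- (the a -> k dependency); keyValue extracts the key used for lookup.
class Ord k => DataKey a k | a -> k where
  keyValue :: a -> k

-- Hypothetical record, keyed on an Int identifier.
data Person = Person { personId :: Int, personName :: String }

instance DataKey Person Int where
  keyValue = personId

main :: IO ()
main = print (keyValue (Person 42 "Joel"))  -- prints 42
```

The functional dependency is what lets the database code ask for "the" key of a record without an ambiguity error at the use site.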
- Is the concept of a schema applicable to Haskell?

The real headache is type safety. Erlang is entirely dynamically typed, so untyped schemas with column values looked up by name at run time fit right in, and it's up to the programmer to manage schema and code evolution to prevent errors. Doing all this in a statically type-safe way is another layer of complexity and checking.
Actually this is also just another special case of the middleware case. If we have two processes, A and B, that need to communicate, then they need to agree on a protocol. Part of that protocol is the data types. If B is a database then this reduces to the schema problem. So let's look at the more general problem first and see if we can solve that.

There are roughly two ways for A and B to agree on the protocol. One is to implement the protocol separately in A and B. If it is done correctly then they will work together, but this is not statically checkable (ignoring state machines and model checking for now). This is the Erlang approach, because dynamic checking is the Erlang philosophy. Alternatively the protocol can be defined in a special-purpose protocol module P, and A and B then import P. This is the approach taken by CORBA with IDL.

However, what happens if P is updated to P'? Does this mean that both A and B need to be recompiled and restarted simultaneously? Requiring this is a Bad Thing; imagine if every bank in the world had to upgrade and restart its computers simultaneously in order to upgrade a common protocol. (This protocol versioning problem was one of the major headaches with CORBA.) We would have to have P and P' live simultaneously, with processes negotiating the latest version of the protocol that they both support when they start talking. That way the introduction of P' does not need to be simultaneous with the withdrawal of P. There is still the possibility of a run-time failure at the protocol negotiation stage, of course, if it transpires that the two processes have no common protocol.

So we need a DSL which allows the definition of data types and abstract protocols (i.e. who sends what to whom, and when) that can be imported by the two processes on each end of the link (do we need N-way protocols?). If we could embed this in Haskell directly then so much the better, but something that needs preprocessing would be fine too.
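The negotiation step can be sketched very simply: each side advertises the protocol versions it supports, and they settle on the highest common one, with the run-time failure case surfacing as Nothing. The names here are invented for illustration:

```haskell
import Data.List (intersect)

type Version = Int

-- Pick the highest protocol version both peers support.
-- Nothing models the run-time failure when no common version exists.
negotiate :: [Version] -> [Version] -> Maybe Version
negotiate ours theirs =
  case ours `intersect` theirs of
    []     -> Nothing
    common -> Just (maximum common)

main :: IO ()
main = do
  print (negotiate [1,2,3] [2,3,4])  -- Just 3: both sides speak P' (v3)
  print (negotiate [1] [4])          -- Nothing: no common protocol
```

The point is that introducing P' only extends a peer's advertised list; peers still on P keep working until P is withdrawn from everyone's list.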
However there is a wrinkle here: what about "pass-through" processes which don't interpret the data but just store and forward it? Various forms of protocol adapter fit this scenario, as does the database you originally asked about. We want these things to talk in a type-safe manner without needing to be compiled with every data structure they transmit. You could describe them using type variables, so that, for instance, if a database table is created to store a datatype D then any process reading or writing the data must also use D, even though the database itself knows nothing more of D than its name. Similarly, a gateway that sets up a channel for datatype D would not need to know anything more than the name.

Paul.
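The pass-through idea maps naturally onto phantom types: the gateway handles only an opaque payload, while the type parameter ties producer and consumer to the same datatype D without the gateway ever inspecting it. A rough sketch, with invented names, using Show/Read in place of a real binary serializer to keep it self-contained:

```haskell
-- A channel tagged with the datatype it carries; the payload is an
-- opaque String, so the middle of the pipeline needs nothing beyond
-- the phantom tag d.
newtype Channel d = Channel { payload :: String }

-- Only typed endpoints can put data in or take data out.
send :: Show d => d -> Channel d
send = Channel . show

receive :: Read d => Channel d -> d
receive = read . payload

-- The gateway forwards without knowing or mentioning d's structure.
forward :: Channel d -> Channel d
forward = id

main :: IO ()
main = do
  let ch = forward (send (3 :: Int, "hello"))
  print (receive ch :: (Int, String))  -- prints (3,"hello")
```

Because forward is polymorphic in d, the gateway compiles once and carries any datatype, yet the types still prevent a reader from decoding a Channel Person as a Channel Order.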