
I'm kind of stuck that's why I'm posting here to ask wether this makes sense at all, maybe someone else has already done it? What I'd like to have: Some way representing relational data which is typically stored in databases such as Postgresql.. Rewriting something like Postgresql in haskell would take ages.. So I'd be satisfied with having in memory representation only (this would fit the HAppS state system very well .. :) Why ? * type safety * less conversions compared to SQL data * no need to switch processes, parse SQL etc so maybe it's even faster? (a small benchmark showed that inserting 20000 Ints into a list was 8 times faster than using MySQL parsing 20000 INSERT INTO x (1) statements ) I'd like to illustrate two different ideas using a small example: (A) data CD = CD { title :: String, tracks :: [ Track ] } data Track = Track { track :: String, cd :: CD } data PDB = PDB { cds :: Set CD, tracks :: Set Track } because it's not using foreign ids but kind of pointers I'll call this the pointer method using uniq ids it would look like this: (B) data CD = CD { id : Int, title :: String, tracks :: [Int ] } data Track = Track { trackId :: Int, track :: String, cd :: Int } data IDB = IDB { cds :: Map Int CD, tracks :: Map Int Track } I will call it I DB (I = using ids) PDB: pro : * less work when doing joins (no need to look foreign rows up) con : * you need uniq ids or such when serializing to disk * When updating a track you'll also have to update the pointer stored in cds. and if you had another table shelfs.. this had to be updated as well.. IDB: the other way round I find the idea not using any lookups when using joins appealing. Of course having a simple data Table = Table Map UniqId Rec isn't enough, sometimes you need more than one index or even a multi index: data Table = Table { byId :: Map Int Rec , byNameAndAge :: Map String (Map Int (Set Rec)) } Note that I've used Set here as well because this index does'nt have to be uniq! starting to write an insertTable :: Table -> Rec -> Table more than twice is getting tedious.. Of course you can start using some type hackery to insert a rec into all maps automatically.. but you'll get into trouble making the type system apply the best index not the first matching one. (I bet this could be done using HList etc somehow as well.. ) So my current attempt is defining the database using some data types and make template haskell derive those insertIntoTable and update functions. I've added the draft below. But before continuing spending much time on it I'd like to get your advice: Is there a chance that it will pay off? Some general considerations: haskell solution con: haskell can get close to C but in general it may be >10 times slower when not caring too much about design or writing low level (see recent thread about md5 or one where David Roundy has said something about a matrix thread: only 10 times slower?) Using a garbage collector on database data (some hundred MB) might not be the optimal way because I feel you can tell exactly when you no longer need a piece of allocated memory here? So some time might be wasted. projects tend to run longer as expected.. And if data no longer fits into memory .. :(... -> bad performance I think systems such as postgresql do scale much better if you have some gbs of data and only use the most recent X records frequently.. So maybe you'll have to spend time later which you've won by using a haskell relational data representation in memory only.. Another solution: use clusters - I don't have any experience. pro: much more safety (STM, type system ..) there are less possibilities making compared to C / PHP etc Do you also think (A) is more interesting because some load (looking up foreign keys) is moved on insert / delete and update operations taking less time in but are called more frequently thus maybe reducing peak load on queries? Of course some time would have to be spend on queries wich might look like this: let queryresult = $(query ( tables + constraints + relations ) ) db automatically generating the query function taking into account expected index cardinality etc.. Any comments, suggestions, links to existing solutions (except coddfish, haskelldb) ? Marc Weber draft ============= types represeting tables and db ======================== module RDMH.Types where import Language.Haskell.TH import Language.Haskell.TH.Syntax data Uniqueness = Uniq | NotUniq deriving (Show, Eq) data ModifyMode = InsertOnly | UpdateInsert | UpdateInsertDelete deriving (Show, Eq) type TypeS = String -- a name of a data type (data A = ..) data Index = I { uniqueness :: Uniqueness , key :: Exp -- a fuction rec -> key , subkeys :: [ Index ] , keyType :: TypeS -- the key type (this *should* be determined by the function.. But I don't know yet how to do it.. Probably using some GHC API magic -- , expectedCardinality :: Int -- how should this look like exactly? This will be used to select the best indexes } deriving (Show, Eq) -- Int is negative weight / cardinality. -- If you issue a query having two contstraints on types beeing indexed the one having biggest cardinality is taken first -- first type:` -- convinient function to get the type of a name - see TestCase.hs t = ConT . mkName -- type (either key or row) k = VarE . mkName -- gets the key -- app (VarE $ mkName "fst" ) -- a table representation data Table = Table { tableName :: String , row :: TypeS -- this row type must be an instance of Eq , indexes :: [ Index ] -- [ ( [ mkName "Int", mkName "String" ]] is represented as Map Int ( Map String rec ) -- , history :: Bool -- to be implemented. Each time a record is updated the original row should be saved somehow somewhere , modify :: ModifyMode -- , insertTimeStamp :: Bool -- , updateTimeStamp :: Bool } deriving (Eq, Show) data DB = DB { dbName :: String , tables :: [ Table ] , oneToN :: [ (Table, Table, String) ] -- String must be set if you want two relations to the same table (eg one ticket has foreign inbound and outbound flight) -- , nToM :: [ ( Table, Table ) ] } ============= example use case ======================================= {-# OPTIONS_GHC -XTemplateHaskell #-} module TestCase where import Language.Haskell.TH import RDMH.Types import RDMH.TH import Data.Time -- the data data RCD = CD { title :: String , artist :: String , year :: Int } data RTrack = Track { trackTitle :: String , recordingDate :: UTCTime } -- a track table with two indexes title and recordingDate trackT = Table "track" (t "RTrack") -- row type -- indexes: [ I NotUniq (k "trackTitle" ) [] (t "String") , I NotUniq (k "recordingDate" ) [] (t "UTCTime") ] UpdateInsertDelete -- a cd table with two indexes title and year cdT = Table "cd" (t "RCD" ) -- row type -- indexes: [ I Uniq (k "title" ) [] (t "String") , I NotUniq (k "year" ) [] (t "Int") ] UpdateInsertDelete -- the db: both tables and a simple relation db = DB "cdDB" [ cdT, trackT ] [ OneToN cdT trackT "" ) ] $(createDB db)