Top-down inserts in Persistent

I'm loading data from XML into a Haskell data type, and I'd like to use Persistent to save it to a database. The examples from the Yesod book have you manually define a FooId field and create the relationships yourself from the bottom up. For example, "a person has many cars":

    blah blah [persistLowerCase|
    Person
        name String
    Car
        ownerId PersonId Eq
        name String
    |]

This works well if you're responsible for creating every person/car manually. But what if the data are given to you? If I were to parse people from an XML file, the cars wouldn't have people_ids in them. Instead I'd get,

    blah blah [persistLowerCase|
    Person
        name String
        cars [Car]
    Car
        name String
    |]

As long as the cars list contains another Persistent type, it seems like I should be able to insert a person and have it insert the cars, with proper foreign keys, automatically. Doing it manually isn't straight-forward because I can't add the "ownerId" field to my Car type and still expect to parse it from the XML (which has no such field).

Any ideas? I'm not married to Persistent yet; I just want to read in some XML and save it to a database without having to specify the names and types in three places (preferred place: in Haskell). I don't care too much about the schema I get as long as it's relational.

Hi Michael,

The entity definitions in Persistent are very close to the SQL schema: in a
1-to-many relation you must define the foreign key in the "many" table.

You should preferably not insert a car before its owner is inserted, since
that would give you a null reference. So if possible you should insert
people first, which will return their IDs, and you can then insert the cars
safely. You can also construct keys manually, but this is kind of a hack
since you may construct invalid IDs.

In a relational schema you can make the name of the person the primary key.
There has been some work on adding arbitrarily typed primary keys to
Persistent, but I'm not sure whether it has been released or is only on
master. Either way, using a person's name as a primary key may be a bad idea
because of collisions.

Having some mismatch when moving things to relational storage is common. A
lot of the time I end up creating intermediary types that hold the data in
a format that is easier to work with. I don't mind this at all: Haskell
makes it very safe to add proxy types and refactor them. You sometimes end
up doing more queries to the DB than seems necessary, but this is only a
problem if it turns out to be a bottleneck.

HTH,
Adam
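
To make the "intermediary types" idea concrete, here is a minimal sketch that uses only base and no database. All of the names are hypothetical: `PersonXML`/`CarXML` stand for whatever the XML parser produces, and `PersonId` stands in for the key type that Persistent would generate. The point is only that the child rows cannot be built until the parent's key exists.

```haskell
-- Shapes as parsed from the XML: cars are nested and carry no owner key.
newtype CarXML = CarXML { carXmlName :: String }

data PersonXML = PersonXML
  { personXmlName :: String
  , personXmlCars :: [CarXML]
  }

-- Shapes as stored: flat rows, with the foreign key on the "many" side.
-- PersonId is a hypothetical stand-in for a database-generated key.
newtype PersonId = PersonId Int deriving (Eq, Show)

newtype Person = Person { personName :: String } deriving (Eq, Show)

data Car = Car
  { carOwnerId :: PersonId
  , carName    :: String
  } deriving (Eq, Show)

-- The parent row needs nothing from the database.
toPerson :: PersonXML -> Person
toPerson = Person . personXmlName

-- The child rows can only be built once the parent's key is known.
toCars :: PersonId -> PersonXML -> [Car]
toCars owner p = [Car owner (carXmlName c) | c <- personXmlCars p]
```

With Persistent, the key would presumably come from `insert (toPerson p)`, and each element of `toCars key p` would then be handed to `insert_`. The XML types stay free of any `ownerId` field.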
On Thu, Dec 26, 2013 at 12:17 PM, Michael Orlitzky wrote:
> I'm loading data from XML into a Haskell data type, and I'd like to use Persistent to save it to a database. The examples from the Yesod book have you manually define a FooId field and create the relationships yourself from the bottom up. For example, "a person has many cars":
>
>     blah blah [persistLowerCase|
>     Person
>         name String
>     Car
>         ownerId PersonId Eq
>         name String
>     |]
>
> This works well if you're responsible for creating every person/car manually. But what if the data are given to you? If I were to parse people from an XML file, the cars wouldn't have people_ids in them. Instead I'd get,
>
>     blah blah [persistLowerCase|
>     Person
>         name String
>         cars [Car]
>     Car
>         name String
>     |]
>
> As long as the cars list contains another Persistent type, it seems like I should be able to insert a person and have it insert the cars, with proper foreign keys, automatically. Doing it manually isn't straight-forward because I can't add the "ownerId" field to my Car type and still expect to parse it from the XML (which has no such field).
>
> Any ideas? I'm not married to Persistent yet; I just want to read in some XML and save it to a database without having to specify the names and types in three places (preferred place: in Haskell). I don't care too much about the schema I get as long as it's relational.

On 12/29/2013 12:33 PM, Adam Bergmark wrote:
> Hi Michael,
>
> The entity definitions in Persistent are very close to the SQL schema: in a 1-to-many relation you must define the foreign key in the "many" table.
>
> You should preferably not insert a car before its owner is inserted, since that would give you a null reference. So if possible you should insert people first, which will return their IDs, and you can then insert the cars safely. You can also construct keys manually, but this is kind of a hack since you may construct invalid IDs.
>
> ...
>
> Having some mismatch when moving things to relational storage is common. A lot of the time I end up creating intermediary types that hold the data in a format that is easier to work with. I don't mind this at all: Haskell makes it very safe to add proxy types and refactor them. You sometimes end up doing more queries to the DB than seems necessary, but this is only a problem if it turns out to be a bottleneck.

I have 650 XML documents -- all with different schemas -- to import. Assuming some of them are outdated or unused, I might wind up doing 100 before I declare victory. Still an offensive amount of XML =)

To parse the XML I already need to create 100 Haskell data types; that part is unavoidable. But since XML is XML, all of those data types are trees.

Michael Snoyman suggested,

    forM_ people $ \(PersonXML name cars) -> do
        personId <- insert $ Person name
        forM_ cars $ \car ->
            insert_ $ Car personId car

which works for one tree, Person { [Car] }. But it doesn't work for Person { [Car], [Shoes] }, or anything else. The essence of the problem is that I don't want to write 100 functions like the forM above that all do the same thing but to trees with slightly different shapes. They should all follow the same pattern: insert the big thing, then insert the little things with automatic foreign keys to the big thing.
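
The "insert the big thing, then the little things" pattern can be sketched once, independently of any one tree shape. What follows is only an illustration under stated assumptions: it uses base alone, `Row`, `Node`, `ToNode`, `insertNode` and `insertTree` are all hypothetical names, and a counter threaded through the traversal stands in for the keys a real database would return. It is not Persistent's or Groundhog's API.

```haskell
import Data.List (mapAccumL)

-- One stored row: table name, own key, optional parent key, and columns.
data Row = Row
  { rowTable  :: String
  , rowKey    :: Int
  , rowParent :: Maybe Int
  , rowCols   :: [(String, String)]
  } deriving (Eq, Show)

-- A parsed value viewed as a tree: its own columns plus nested children.
data Node = Node
  { nodeTable    :: String
  , nodeCols     :: [(String, String)]
  , nodeChildren :: [Node]
  }

-- Each XML-derived type only has to describe itself as a Node.
class ToNode a where
  toNode :: a -> Node

-- Insert the parent first, then every child with the parent's key.
-- The Int is the next unused key (a stand-in for a database-assigned key;
-- for simplicity one counter is shared by all tables).
insertNode :: Maybe Int -> Int -> Node -> (Int, [Row])
insertNode parent next (Node table cols kids) =
  let self          = Row table next parent cols
      (next', rows) = mapAccumL (insertNode (Just next)) (next + 1) kids
  in  (next', self : concat rows)

insertTree :: ToNode a => a -> [Row]
insertTree = snd . insertNode Nothing 1 . toNode

-- Example shapes, including the Person { [Car], [Shoes] } case.
newtype Car   = Car   { carName    :: String }
newtype Shoes = Shoes { shoesBrand :: String }

data Person = Person
  { personName  :: String
  , personCars  :: [Car]
  , personShoes :: [Shoes]
  }

instance ToNode Car where
  toNode (Car n) = Node "cars" [("name", n)] []

instance ToNode Shoes where
  toNode (Shoes b) = Node "shoes" [("brand", b)] []

instance ToNode Person where
  toNode (Person n cs ss) =
    Node "people" [("name", n)] (map toNode cs ++ map toNode ss)
```

Under this scheme each of the 100 types would need a `ToNode` instance, but the traversal itself is written only once. Whether those instances can themselves be derived generically is a separate question.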

On Mon, Dec 30, 2013 at 3:11 PM, Michael Orlitzky wrote:
> I have 650 XML documents -- all with different schemas -- to import. Assuming some of them are outdated or unused, I might wind up doing 100 before I declare victory. Still an offensive amount of XML =)
>
> To parse the XML I already need to create 100 Haskell data types; that part is unavoidable. But since XML is XML, all of those data types are trees.

Are you sure a relational schema with the structure of each type of XML document is the best approach for your dataset? It sounds like you could benefit from a less structured approach, since your data doesn’t sound very regular.

> Michael Snoyman suggested,
>
>     forM_ people $ \(PersonXML name cars) -> do
>         personId <- insert $ Person name
>         forM_ cars $ \car ->
>             insert_ $ Car personId car
>
> which works for one tree, Person { [Car] }. But it doesn't work for Person { [Car], [Shoes] }, or anything else. The essence of the problem is that I don't want to write 100 functions like the forM above that all do the same thing but to trees with slightly different shapes. They should all follow the same pattern: insert the big thing, then insert the little things with automatic foreign keys to the big thing.

Boris Lykah’s [Groundhog] library sounds like a good fit for your situation:

    {-# LANGUAGE FlexibleInstances, GADTs, QuasiQuotes, TemplateHaskell, TypeFamilies #-}

    import Control.Monad.IO.Class (liftIO)
    import Database.Groundhog.Core (insert, select)
    import Database.Groundhog.Generic.Sql.Functions (like)
    import Database.Groundhog.Generic (defaultMigrationLogger, runDbConn, runMigration)
    import Database.Groundhog.Postgresql (withPostgresqlConn)
    import Database.Groundhog.TH (defaultCodegenConfig, groundhog, migrationFunction, mkPersist)

    data Car = Car { carName :: String } deriving Show
    data Driver = Driver { driverName :: String , driverCars :: [Car] } deriving Show

    penelope, anthills :: Driver
    penelope = Driver "Penelope Pitstop" [Car "The Compact Pussycat"]
    anthills = Driver "The Ant Hill Mob" [Car "The Bulletproof Bomb", Car "Chugga-Boom"]

    mkPersist defaultCodegenConfig { migrationFunction = Just "migrateAll" } [groundhog|
    - entity: Car
    - entity: Driver
    |]

    main :: IO ()
    main = withPostgresqlConn "host=/tmp" $ runDbConn $ do
      runMigration defaultMigrationLogger migrateAll
      mapM_ insert [penelope, anthills]
      drivers <- select $ DriverNameField `like` "The%"
      liftIO $ mapM_ print drivers

This code will create a few tables: one for the `Driver` constructor, another for the `Car` constructor, and a couple of tables to keep track of what’s in the list in the `Driver` constructor. It will even create triggers to help maintain the list-related tables clean, although I venture it’d be uncomfortable manipulating this specific generated schema by hand.

Groundhog is very flexible with the sort of data types and schemas it can work with. That example was getting a bit long so I didn’t include anything related to constraints, but specifying uniqueness constraints and the like is relatively painless.

Boris wrote a very nice [tutorial] for Groundhog in FP Complete’s School of Haskell, and the Hackage documentation for the [`groundhog-th`] package describes the `groundhog` quasiquoter pretty well.

[Groundhog]: http://hackage.haskell.org/package/groundhog
[tutorial]: https://www.fpcomplete.com/school/to-infinity-and-beyond/competition-winners...
[`groundhog-th`]: http://hackage.haskell.org/package/groundhog-th-0.4.0.3/docs/Database-Ground...

On 12/30/2013 04:57 PM, Manuel Gómez wrote:
> On Mon, Dec 30, 2013 at 3:11 PM, Michael Orlitzky wrote:
>> I have 650 XML documents -- all with different schemas -- to import. Assuming some of them are outdated or unused, I might wind up doing 100 before I declare victory. Still an offensive amount of XML =)
>>
>> To parse the XML I already need to create 100 Haskell data types; that part is unavoidable. But since XML is XML, all of those data types are trees.
>
> Are you sure a relational schema with the structure of each type of XML document is the best approach for your dataset? It sounds like you could benefit from a less structured approach, since your data doesn’t sound very regular.

There's a complicated and uninteresting answer to this, so for now let's just say the job is to get it into SQL somehow. I did consider other options, but this is the path of least resistance, resistant as it may be.

> Boris Lykah’s [Groundhog] library sounds like a good fit for your situation:

I have been vacillating between Persistent and Groundhog in my prototype. For now I'm using Groundhog, but I haven't written any code yet that would rule out Persistent.

> <code>
>
> This code will create a few tables: one for the `Driver` constructor, another for the `Car` constructor, and a couple of tables to keep track of what’s in the list in the `Driver` constructor. It will even create triggers to help maintain the list-related tables clean, although I venture it’d be uncomfortable manipulating this specific generated schema by hand.

Yes! It is tempting isn't it? I emailed Boris about this and unfortunately the list handling is unsupported (undocumented) and is likely to disappear in its current form. Otherwise I had considered running a manual migration after the Groundhog ones to create the necessary views.

> Groundhog is very flexible with the sort of data types and schemas it can work with. That example was getting a bit long so I didn’t include anything related to constraints, but specifying uniqueness constraints and the like is relatively painless.
>
> Boris wrote a very nice [tutorial] for Groundhog in FP Complete’s School of Haskell, and the Hackage documentation for the [`groundhog-th`] package describes the `groundhog` quasiquoter pretty well.

Thank you for the suggestion; I do like the way Groundhog leaves my types alone. If I can come up with a way to do a generic tree insert, it will be necessary to leave out e.g. the "cars" column from the "people" table (even though I still need it in the Haskell type). At the moment I am banging my head against the Data.Data docs to try to get that working. All I have so far is some writing on the wall in blood about how Hackage 3 should automatically reject any function with more than two type variables and no examples.
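
For readers wondering what a Data.Data-based approach might look like, here is a small sketch that separates a record's ordinary columns from its nested child lists. It is not the code from this thread, and the names `isChildList`, `classifyFields`, `columnFields` and `childFields` are made up for illustration. It assumes single-constructor records with named fields, and treats any list field other than a String as a child collection.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data

data Car = Car { carName :: String } deriving (Data, Show)

data Person = Person
  { personName :: String
  , personCars :: [Car]
  } deriving (Data, Show)

-- True for a field whose type is a list of anything except Char,
-- so String fields still count as ordinary columns.
isChildList :: Data d => d -> Bool
isChildList d = case splitTyConApp (typeOf d) of
  (tc, [elemType]) -> tc == listTyCon && elemType /= charType
  _                -> False
  where
    listTyCon = typeRepTyCon (typeOf "")
    charType  = typeOf 'x'

-- Pair each record field name with whether it holds nested children.
classifyFields :: Data a => a -> [(String, Bool)]
classifyFields x = zip (constrFields (toConstr x)) (gmapQ isChildList x)

-- Fields that would become columns in the value's own table.
columnFields :: Data a => a -> [String]
columnFields x = [name | (name, False) <- classifyFields x]

-- Fields that would be left out and inserted separately as child rows.
childFields :: Data a => a -> [String]
childFields x = [name | (name, True) <- classifyFields x]
```

Here `gmapQ` visits each immediate field of the record and `constrFields` supplies the matching field names, so the "cars" field can be recognised and excluded without writing anything specific to `Person`.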
participants (3)
- Adam Bergmark
- Manuel Gómez
- Michael Orlitzky