
On Mon, Dec 30, 2013 at 3:11 PM, Michael Orlitzky wrote:
On 12/29/2013 12:33 PM, Adam Bergmark wrote:
I have 650 XML documents -- all with different schemas -- to import. Assuming some of them are outdated or unused, I might wind up doing 100 before I declare victory. Still an offensive amount of XML =)
To parse the XML I already need to create 100 Haskell data types; that part is unavoidable. But since XML is XML, all of those data types are trees.
Are you sure a relational schema with the structure of each type of XML document is the best approach for your dataset? It sounds like you could benefit from a less structured approach, since your data doesn’t sound very regular.
Michael Snoyman suggested,
    forM_ people $ \(PersonXML name cars) -> do
      personId <- insert $ Person name
      forM_ cars $ \car -> insert_ $ Car personId car
which works for one tree, Person { [Car] }. But it doesn't work for Person { [Car], [Shoes] }, or anything else. The essence of the problem is that I don't want to write 100 functions like the forM above that all do the same thing but to trees with slightly different shapes. They should all follow the same pattern: insert the big thing, then insert the little things with automatic foreign keys to the big thing.
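One way to picture the "insert the big thing, then the little things" pattern is to write the traversal once against a generic tree and have each data type describe only its own shape. The following is a rough sketch under stated assumptions, not code from this thread: `Row`, `Tree`, `ToTree`, and `insertTree` are hypothetical names, the "database" is an in-memory list held in an `IORef`, and every payload is flattened to a `String`, which a real schema with typed columns would not do.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical in-memory stand-in for a database: each row records its
-- table, an optional foreign key to its parent row, and a payload.
type Key = Int

data Row = Row
  { rowKey     :: Key
  , rowTable   :: String
  , rowParent  :: Maybe Key
  , rowPayload :: String
  } deriving (Eq, Show)

-- A shape-agnostic view of a parsed document: table name, payload, children.
data Tree = Node String String [Tree]

-- Each XML data type only has to say how it maps onto a Tree.
class ToTree a where
  toTree :: a -> Tree

-- Insert the parent first, then every child with the parent's key as its
-- foreign key. Written once and reused for every tree shape.
insertTree :: IORef [Row] -> Maybe Key -> Tree -> IO Key
insertTree db parent (Node table payload children) = do
  rows <- readIORef db
  let key = length rows + 1
  modifyIORef' db (++ [Row key table parent payload])
  mapM_ (insertTree db (Just key)) children
  return key

-- Example types (hypothetical, loosely following the Person/Car/Shoes case).
data Car = Car String
data Shoe = Shoe String
data Person = Person String [Car] [Shoe]

instance ToTree Car where
  toTree (Car name) = Node "car" name []

instance ToTree Shoe where
  toTree (Shoe name) = Node "shoe" name []

instance ToTree Person where
  toTree (Person name cars shoes) =
    Node "person" name (map toTree cars ++ map toTree shoes)

main :: IO ()
main = do
  db <- newIORef []
  _ <- insertTree db Nothing
         (toTree (Person "Alice" [Car "Volvo"] [Shoe "boot", Shoe "clog"]))
  readIORef db >>= mapM_ print
```

This still needs one small `ToTree` instance per type, but the insertion logic itself lives in a single function, so adding a second child list only changes the instance.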
Boris Lykah’s [Groundhog] library sounds like a good fit for your situation:

    {-# LANGUAGE FlexibleInstances, GADTs, QuasiQuotes, TemplateHaskell, TypeFamilies #-}

    import Control.Monad.IO.Class (liftIO)
    import Database.Groundhog.Core (insert, select)
    import Database.Groundhog.Generic.Sql.Functions (like)
    import Database.Groundhog.Generic (defaultMigrationLogger, runDbConn, runMigration)
    import Database.Groundhog.Postgresql (withPostgresqlConn)
    import Database.Groundhog.TH (defaultCodegenConfig, groundhog, migrationFunction, mkPersist)

    data Car = Car { carName :: String } deriving Show

    data Driver = Driver
      { driverName :: String
      , driverCars :: [Car]
      } deriving Show

    penelope, anthills :: Driver
    penelope = Driver "Penelope Pitstop" [Car "The Compact Pussycat"]
    anthills = Driver "The Ant Hill Mob" [Car "The Bulletproof Bomb", Car "Chugga-Boom"]

    mkPersist defaultCodegenConfig { migrationFunction = Just "migrateAll" } [groundhog|
    - entity: Car
    - entity: Driver
    |]

    main :: IO ()
    main = withPostgresqlConn "host=/tmp" $ runDbConn $ do
      runMigration defaultMigrationLogger migrateAll
      mapM_ insert [penelope, anthills]
      drivers <- select $ DriverNameField `like` "The%"
      liftIO $ mapM_ print drivers

This code will create a few tables: one for the `Driver` constructor, another for the `Car` constructor, and a couple of tables to keep track of what’s in the list in the `Driver` constructor. It will even create triggers to help keep the list-related tables clean, although I venture it’d be uncomfortable manipulating this specific generated schema by hand.

Groundhog is very flexible with the sort of data types and schemas it can work with. That example was getting a bit long so I didn’t include anything related to constraints, but specifying uniqueness constraints and the like is relatively painless.
Boris wrote a very nice [tutorial] for Groundhog in FP Complete’s School of Haskell, and the Hackage documentation for the [`groundhog-th`] package describes the `groundhog` quasiquoter pretty well.

[Groundhog]: http://hackage.haskell.org/package/groundhog
[tutorial]: https://www.fpcomplete.com/school/to-infinity-and-beyond/competition-winners...
[`groundhog-th`]: http://hackage.haskell.org/package/groundhog-th-0.4.0.3/docs/Database-Ground...