
On 12/30/2013 04:57 PM, Manuel Gómez wrote:
On Mon, Dec 30, 2013 at 3:11 PM, Michael Orlitzky wrote:
On 12/29/2013 12:33 PM, Adam Bergmark wrote:
I have 650 XML documents -- all with different schemas -- to import. Assuming some of them are outdated or unused, I might wind up doing 100 before I declare victory. Still an offensive amount of XML =)
To parse the XML I already need to create 100 Haskell data types; that part is unavoidable. But since XML is XML, all of those data types are trees.
Are you sure a relational schema with the structure of each type of XML document is the best approach for your dataset? It sounds like you could benefit from a less structured approach, since your data doesn’t sound very regular.
There's a complicated and uninteresting answer to this, so for now let's just say the job is to get it into SQL somehow. I did consider other options, but this is the path of least resistance, resistant as it may be.
Boris Lykah’s [Groundhog] library sounds like a good fit for your situation:
I have been vacillating between Persistent and Groundhog in my prototype. For now I'm using Groundhog, but I haven't written any code yet that would rule out Persistent.
<code>
This code will create a few tables: one for the `Driver` constructor, another for the `Car` constructor, and a couple of tables to keep track of what’s in the list in the `Driver` constructor. It will even create triggers to help keep the list-related tables clean, although I venture it’d be uncomfortable manipulating this specific generated schema by hand.
Yes! It is tempting, isn't it? I emailed Boris about this and unfortunately the list handling is unsupported (undocumented) and is likely to disappear in its current form. Otherwise I had considered running a manual migration after the Groundhog ones to create the necessary views.
Groundhog is very flexible with the sort of data types and schemas it can work with. That example was getting a bit long so I didn’t include anything related to constraints, but specifying uniqueness constraints and the like is relatively painless.
Boris wrote a very nice [tutorial] for Groundhog in FP Complete’s School of Haskell, and the Hackage documentation for the [`groundhog-th`] package describes the `groundhog` quasiquoter pretty well.
Thank you for the suggestion; I do like the way Groundhog leaves my types alone. If I can come up with a way to do a generic tree insert, it will be necessary to leave out e.g. the "cars" column from the "people" table (even though I still need it in the Haskell type). At the moment I am banging my head against the Data.Data docs to try to get that working. All I have so far is some writing on the wall in blood about how Hackage 3 should automatically reject any function with more than two type variables and no examples.
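To make the "leave out the cars column" idea concrete, here is a minimal sketch that uses only `Data.Data` from base. The `Person` and `Car` records, the `samplePerson` value, and the helper names `isScalar`, `columns` and `children` are all hypothetical, made up for illustration. The rule that only `String` and `Int` fields count as columns is likewise an assumption; a real import would need a longer list of leaf types. It does not touch Groundhog or SQL at all, it only shows how the field names can be split generically.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
module Main where

-- Data.Data re-exports Data.Typeable, so cast and Typeable come from here too.
import Data.Data (Data, Typeable, cast, constrFields, gmapQ, toConstr)
import Data.Maybe (isJust)

-- Hypothetical record types standing in for two of the parsed XML trees.
data Car = Car { make :: String, year :: Int }
  deriving (Show, Data, Typeable)

data Person = Person { name :: String, cars :: [Car] }
  deriving (Show, Data, Typeable)

-- Assumption: a field becomes a column only if it is a String or an Int.
-- Anything else (such as [Car]) is treated as a child to insert separately.
isScalar :: Data d => d -> Bool
isScalar d = isJust (cast d :: Maybe String)
          || isJust (cast d :: Maybe Int)

-- Pair each record field name with the scalar test applied to its value.
-- gmapQ visits the immediate subterms in the same order as constrFields.
classify :: Data a => a -> [(String, Bool)]
classify x = zip (constrFields (toConstr x)) (gmapQ isScalar x)

-- Field names that would become columns of the row itself.
columns :: Data a => a -> [String]
columns x = [ f | (f, True) <- classify x ]

-- Field names that would be left out of the row and handled as child rows.
children :: Data a => a -> [String]
children x = [ f | (f, False) <- classify x ]

-- Hypothetical sample value.
samplePerson :: Person
samplePerson = Person { name = "Alice", cars = [Car "Saab" 1999] }

main :: IO ()
main = do
  print (columns samplePerson)
  print (children samplePerson)
```

With the sample value, `columns` should report only `name` and `children` only `cars`, while the `cars` field stays in the Haskell type. Recursing into the children (for example with another `gmapQ` pass) and attaching the parent's key is the part this sketch leaves open.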