xml deserialization using generics?

Cant decide whether this list is appropriate for questions related to generics usage, or only to generics design? So let me know if I should use haskell@haskell.org instead, but on the other hand I figure you guys are the experts :-) Trying to write an xmlserialization function for arbitrary (Data) objects, and not requiring a DTD or a code generation step. It's possible in C# so it really ought to be possible in Haskell ;-) xml serialization is pretty easy, see section 9.1.1 of http://www.haskell.org/haskellwiki/HXT On the other hand deserialization is still a work in progress. At least I'm not smart enough to figure it out, and nothing comes up in Google. The xml parsing bit is done, see section 9.1.2 of http://www.haskell.org/haskellwiki/HXT So, we can get a list of pairs/triples etc containing the data of our choice, such as field names, field values (in string format), data types, constructors, etc. What's remaining is to take makeConstrM, our final data type, and the list of field values, and to create the final object. Here's my non-working attempt at this bit so far. Most of the working bits come from a demonstration by kpreid in irc yesterday night of using makeConstrM with a pair of strings. The rest of the code is my feeble attempt to get this working for the Config custom data type. runM' :: (MonadState [String] m, Monad m, Data a) => m a runM' = do value <- gets head modify tail -- then one of: (pick the non-working function of your choice ;-) : -- return read (fromJust value) -- return (fromJust $ cast value ) -- return (fst $ head $ gread( "(" ++ value ++ ")" ) ) -- return (fromConstrM runM' constr) -- return (fromConstr contr) testConstrM' :: (Read a, Data a, Read c, Read b, Data b, Data c) => [String] -> a -> (b,c) testConstrM' fieldvalues object = evalState( fromConstrM runM' (toConstr object) ) fieldvalues data Config = Config{ name :: String, age :: Int } deriving( Data, Show, Typeable, Ord, Eq, Read ) createConfig = Config "blah" 3 test = testConstrM' ["qsdfqsdf", "7"] createConfig Note that whilst I'm a total Haskell newbie (this is week 2), I've used Reflection extensively in other languages, eg wrote a fast async rpc layer over guaranteed udp for C# http://metaverse.svn.sourceforge.net/viewvc/metaverse/Trunk/Source/Metaverse..., so I know more or less what I'm aiming for, just not exactly how to do it ;-)

Hugh Perkins wrote:
Cant decide whether this list is appropriate for questions related to generics usage, or only to generics design?
I think your question is quite appropriate -- actually, it is a good question. It asks essentially for a monadic gmap, which seems quite important. This operation is also the `inverse' of one of our comparison tests: listify. The latter collects values from the leaves of a data type in a list. That list is a `state' of a value, so to speak. The state can be transported over the network, and, on the other end, can be infused into a prototype object (assuming it has the same structure). The monadic gmap subsumes two traversals: generic map and reduction. So, your question looks even better. I wonder if we should add monadic gmap to our test suite? I have just committed the monadic gmap implementation and the test for one generic library, the one I'm more familiar with. The test is in the file http://darcs.haskell.org/generics/comparison/SmashA/Deserialize.hs We start with the data type of a company C [D "Research" (E (P "Laemmel" "Amsterdam") (S 8000.0)) [PU (E (P "Joost" "Amsterdam") (S 1000.0)), PU (E (P "Marlow" "Cambridge") (S 2000.0))], D "Strategy" (E (P "Blair" "London") (S 100000.0)) []] (see http://darcs.haskell.org/generics/comparison/CompanyDatatypes.hs) We can serialize it: test1 = serialize genCom obtaining a list ["Research","Laemmel","Amsterdam","8000.0","Joost","Amsterdam", "1000.0","Marlow","Cambridge","2000.0","Strategy","Blair", "London","100000.0"] OTH, we can take the list retro = ["Metaphysics", "Kant","Koeningsberg","800.0", "Hume","Edinburgh","100.0", "Marlowe","Cambridge","200.0", "Ruling","Thatcher","London","50000.0"] and use it to `upgrade' the company genCom: test2 = deserialize genCom retro giving us *Deserialize> test2 C [D "Metaphysics" (E (P "Kant" "Koeningsberg") (S 800.0)) [PU (E (P "Hume" "Edinburgh") (S 100.0)), PU (E (P "Marlowe" "Cambridge") (S 200.0))], D "Ruling" (E (P "Thatcher" "London") (S 50000.0)) []]

Better still is McBride and Paterson's idiomatic traverse, which subsumes monadic gmap and a bit more (since some idioms - phantom idioms - aren't monads). The serialise/deserialise example is given there too. See Bruno's and my paper "The Essence of the Iterator Pattern". http://web.comlab.ox.ac.uk/jeremy.gibbons/publications/#iterator Jeremy On 27 Jun 2007, at 09:39, oleg@pobox.com wrote:
Hugh Perkins wrote:
Cant decide whether this list is appropriate for questions related to generics usage, or only to generics design?
I think your question is quite appropriate -- actually, it is a good question. It asks essentially for a monadic gmap, which seems quite important. This operation is also the `inverse' of one of our comparison tests: listify. The latter collects values from the leaves of a data type in a list. That list is a `state' of a value, so to speak. The state can be transported over the network, and, on the other end, can be infused into a prototype object (assuming it has the same structure).
The monadic gmap subsumes two traversals: generic map and reduction. So, your question looks even better.
I wonder if we should add monadic gmap to our test suite?
I have just committed the monadic gmap implementation and the test for one generic library, the one I'm more familiar with. The test is in the file
http://darcs.haskell.org/generics/comparison/SmashA/Deserialize.hs
We start with the data type of a company
C [D "Research" (E (P "Laemmel" "Amsterdam") (S 8000.0)) [PU (E (P "Joost" "Amsterdam") (S 1000.0)), PU (E (P "Marlow" "Cambridge") (S 2000.0))], D "Strategy" (E (P "Blair" "London") (S 100000.0)) []]
(see http://darcs.haskell.org/generics/comparison/CompanyDatatypes.hs)
We can serialize it:
test1 = serialize genCom
obtaining a list
["Research","Laemmel","Amsterdam","8000.0","Joost","Amsterdam", "1000.0","Marlow","Cambridge","2000.0","Strategy","Blair", "London","100000.0"]
OTH, we can take the list
retro = ["Metaphysics", "Kant","Koeningsberg","800.0", "Hume","Edinburgh","100.0", "Marlowe","Cambridge","200.0", "Ruling","Thatcher","London","50000.0"]
and use it to `upgrade' the company genCom:
test2 = deserialize genCom retro
giving us
*Deserialize> test2 C [D "Metaphysics" (E (P "Kant" "Koeningsberg") (S 800.0)) [PU (E (P "Hume" "Edinburgh") (S 100.0)), PU (E (P "Marlowe" "Cambridge") (S 200.0))], D "Ruling" (E (P "Thatcher" "London") (S 50000.0)) []]
_______________________________________________ Generics mailing list Generics@haskell.org http://www.haskell.org/mailman/listinfo/generics
Jeremy.Gibbons@comlab.ox.ac.uk Oxford University Computing Laboratory, TEL: +44 1865 283508 Wolfson Building, Parks Road, FAX: +44 1865 283531 Oxford OX1 3QD, UK. URL: http://www.comlab.ox.ac.uk/oucl/people/jeremy.gibbons.html

Oleg, Ok that looks like exactly what I'm looking for :-) Unfortunately I cant seem to run it. It comes up with strange typing messages when compiling: *TestDeserialize> serialize P <interactive>:1:0: No instance for (LDat (TL_red [a]) (HCons (Int -> [String]) (HCons (Float -> [String]) (HCons (String -> [S tring]) HNil))) (String -> String -> Person) df) arising from use of `serialize' at <interactive>:1:0-10 Possible fix: add an instance declaration for (LDat (TL_red [a]) (HCons (Int -> [String]) (HCons (Float -> [String]) (HCons (String -> [String]) HNil)) ) (String -> String -> Person) df) In the expression: serialize P In the definition of `it': it = serialize P where P is: getP = P "Lammel" "Amsterdam" (for simplicity) Similar looking errors with genCom, and/or with deserialize. I guess I need to add an instance or deriving statement to the P class? On another track, read through SYB3 a little, and it seems to have a really clean way to create polymorphic functions, such as: class Data a => StringParser a where parsestring :: String -> a parsestring x = fst $ head $gread(x) instance StringParser Int where parsestring x = read x instance StringParser String where parsestring x = read( "\"" ++ x ++ "\"" ) Looks like it should do exactly what I need: take a string and create any data type, as long as the data type is an instance of Data. Unfortunately, and strangely(?), it only seems to work for datatypes declared as instance. That's surprising since the default function in the class declaration works for the Data class. It would make sense if the default function would be applied to all data types which are instances of the base classes? So, that didnt work yet. A normal "read" ought to work, but there's no way of declaring that the children of the parent data type are instances of Read, so it works straight away less well (gets caught by the compiler). The only return functions which build so far are: return (fromJust $ fst $ head $ gread( "(" ++ (fromJust value) ++ ")" ) ) and: return ((fromJust . cast) value) ... but neither of these work particularly well at runtime, and they're difficult to customize for different data types. I really like the SYB3 approach to creating customizable generic functions, but it's a shame that there's no way of declaring a default function that works for all instances of the base classe(s). Or maybe I just didnt find out how to do that yet?

Well, figured out a solution to parsing xml. It's not really pretty, but it works. Basically we just convert the incoming xml into a gread compatible format then use gread :-D If someone has a more elegant solution, please let me know. For the moment, it will only work with Ints and Strings as the children, but it's pretty easy to add new primitive data types, as long as it's pretty easy to make them compatible with gread. You just need to add additional lines to the case statement in xmlToGShowFormat Again, if someone has a more elegant solution, I'd enjoy seeing it, but at least this one works :-D module ParseXml where import IO import Char import List import Maybe import Data.Generics hiding (Unit) import Text.XML.HXT.Arrow hiding (when) data Config = Config{ name :: String, age :: Int } --data Config = Config{ age :: Int } deriving( Data, Show, Typeable, Ord, Eq, Read ) createConfig = Config "qsdfqsdf" 3 --createConfig = Config 3 gshow' :: Data a => a -> String gshow' t = fromMaybe (showConstr(toConstr t)) (cast t) -- helper function from http://www.defmacro.org/ramblings/haskell-web.html introspectData :: Data a => a -> [(String, String)] introspectData a = zip fields (gmapQ gshow' a) where fields = constrFields $ toConstr a -- function to create xml string from single-layer Haskell data type xmlSerialize object = "<" ++ show(toConstr object) ++ ">" ++ foldr (\(a,b) x -> x ++ "<" ++ a ++ ">" ++ b ++ "" ++ a ++ ">") "" ( introspectData object ) ++ "" ++ show(toConstr object) ++ ">" -- parse xml to HXT tree, and obtain the value of node "fieldname" -- returns a string getValue xml fieldname | length(resultlist) > 0 = Just (head resultlist) | otherwise = Nothing where resultlist = (runLA ( constA xml >>> xread >>> deep ( hasName fieldname ) >>> getChildren >>> getText ))[] -- parse templateobject to get list of field names -- apply these to xml to get list of values -- return (fieldnames list, value list) xmlToGShowFormat :: Data a => String -> a -> String xmlToGShowFormat xml templateobject = go where mainconstructorname = (showConstr $ toConstr templateobject) fields = constrFields $ toConstr templateobject values = map ( \fieldname -> getValue xml fieldname ) fields datatypes = gmapQ (dataTypeOf) templateobject constrs = gmapQ (toConstr) templateobject datatypereps = gmapQ (dataTypeRep . dataTypeOf) templateobject fieldtogshowformat (value,datatyperep) = case datatyperep of IntRep -> "(" ++ fromJust value ++ ")" _ -> show(fromJust value) formattedfieldlist = map (fieldtogshowformat) (zip values datatypereps) go = "(" ++ mainconstructorname ++ " " ++ (concat $ intersperse " " formattedfieldlist ) ++ ")" xmlDeserialize xml templateobject = fst $ head $ gread( xmlToGShowFormat xml templateobject) dotest = xmlDeserialize (xmlSerialize createConfig) createConfig :: Config dotest' = xmlDeserialize ("<Config><age>12</age><name>test name!</name></Config>") createConfig :: Config

Hugh Perkins wrote:
Unfortunately I cant seem to run it. It comes up with strange typing messages when compiling: *TestDeserialize> serialize P
<interactive>:1:0: No instance for (LDat (TL_red [a]) ... where P is:
getP = P "Lammel" "Amsterdam"
(for simplicity)
Similar looking errors with genCom, and/or with deserialize.
I guess I need to add an instance or deriving statement to the P class?
You have encountered a bit confusing behavior of GHCi. If you put
getP = P "Lammel" "Amsterdam" t2 = serialize getP
into the file Deserialize.hs, which you then load into GHCi, you can type "t2" at the GHCi prompt and see the expected result. OTH, typing "serialize getP" at the GHCi prompt gives the error you reported. This is a well known issue with GHCi: it uses the flags given in the OPTIONS_GHC pragma when interpreting files, but forgets these flags when interpreting expressions typed at the prompt. I think this a bit confusing behavior started since GHC 6.4.0. So, you could place any complex expression in a file and load that into GHCi (Haskell Emacs mode makes this quite convenient), or start your GHCi with appropriate flags (at least, -fglasgow-exts). The third alternative is to help GHCi by defining serialize like the following
serialize1 xs = gapp (TL_red (concat::[[String]]->[String])) primitive_fields_show xs
with the explicit signature for 'concat'. We can now type "serialize1 getP" at the GHCi prompt and get the desired result.
participants (3)
-
Hugh Perkins
-
Jeremy Gibbons
-
oleg@pobox.com