
Dear haskell-cafe: I'm trying to parse an Open Financial eXchange (OFX) 1.x file. It details my bank transactions, like debit card purchases. It's SGML-based and it goes like: <OFX>[...] <STMTRS>[...] <STMTTRN>[...] <TRNUID>9223ry29r389 <NAME>THE GROCERY STORE BLABLABLA <TRNAMT>234.99 </STMTTRN> <STMTTRN>[...] <TRNUID>1237tg832t <NAME>SOME DUDE ON PAYPAL 4781487 <TRNAMT>2174.27 </STMTTRN> </STMTRS> </OFX> I've left out a bunch, but as you can see it's tree-shaped, and the only reason they didn't misuse XML as a data serialization language instead of SGML was because it wasn't popular yet. (OFX 2.x uses XML but my bank doesn't use OFX 2.x.) When I imagine how to put this into a data structure, I think: -- The '...' below is stuff like the date, info about the bank data OFX = OFX { statement :: StatementResponse, ... } -- The '...' below is stuff like the account number data StatementResponse = StatementResponse { transactions: [Transaction], ... } data Transaction = Transaction { id :: String, name :: String, amount :: Decimal, sic :: Maybe Int, ... } Then I tried to make a parser to emit those data types and failed. I come from Python, where there's no problem if a function returns different types of values depending on its inputs, but that doesn't fly in Haskell. I've tried data OFXThing = OFX { statement :: OFXThing } | StatementResponse { ... transactions :: [OFXThing] } but that would let me make trees of things that make no sense in OFX, like a transaction containing a statement. I made a data Tree k v = Branch k [Tree k v] | Leaf k v type TextTree = Tree String String and a tagsoup-parsec parser that returns Branches for tags like OFX, and Leafs for tags like TRNUID. But now I just have a tree of strings. That holds no useful type information. I want my types to say that OFXes contain statements and statements contain transactions - just like the OFX DTD says. How can I construct the types so that they are tight enough to be meaningful and loose enough that it's possible to write functions that emit them?

On 10 September 2010 17:53, Jared Jennings
I've tried
data OFXThing = OFX { statement :: OFXThing } | StatementResponse { ... transactions :: [OFXThing] }
but that would let me make trees of things that make no sense in OFX, like a transaction containing a statement.
Using alternative constructors (i.e. sum types) is the right approach to get different "datatypes" in the tree. However you aren't respecting the nesting of the tree here - "OFX" is a level higher in the tree than a statement response and the OFX constructor recurs on itself which looks suspect. I suspect OFX is pathologically huge format and isn't a good starting point for designing syntax trees (the downloadable Spec seemed to be several megabytes zipped). If the DTD is very large you might want to use the untyped tree to extract parts of interest and convert after parsing to a smaller typed tree (with only the syntax you are interested in). By the way, HaXML has has a tool called DTD2HS (I think) that will generate Haskell datatypes from a DTD definition. Best wishes Stephen

I show how this can be done using uu-parsinglib. Note that we have sevral parsers, each having its own type: module Transactions where import Text.ParserCombinators.UU import Text.ParserCombinators.UU.Examples import Data.Char pTagged tag (pAttr, pPayload) = pToken ("<" ++ tag ++ ">") *> pAttr *> spaces *> pPayload <* spaces <* pToken ("" ++ tag ++ ">") pTag tag pPayload = pToken ("<" ++ tag ++ ">") *> pPayload data OFX = OFX Response deriving Show data Response = Response [Transaction] deriving Show data Transaction = Transaction String String Amount deriving Show data Amount = Amount Int Int deriving Show pAmount = "TRNAMT" `pTag` (Amount <$> pNatural <* pSym '.' <*> pNatural) pTransaction = "STMTTRN" `pTagged` (pAttr, Transaction <$> "TRNUID" `pTag` pLine <*> "NAME" `pTag` pLine <*> pAmount ) pResponse = "STMTRS" `pTagged` (pAttr, Response <$> pList (pTransaction <* spaces)) pOFX = "OFX" `pTagged` (pAttr, OFX <$> pResponse ) pAttr :: Parser String pAttr = pToken "[...]" spaces = pMunch (`elem` " \n\t") pDigitAsInt = digit2Int <$> pDigit pNatural = foldl (\a b -> a * 10 + b ) 0 <$> pList1 pDigitAsInt digit2Int a = ord a - ord '0' pDigit :: Parser Char pDigit = pSym ('0', '9') pLine = pMunch (/='\n') <* spaces main = do input <- readFile "TrInput" run (pOFX <* spaces) input Running the main function on your code gives: *Transactions> :r [1 of 1] Compiling Transactions ( Transactions.hs, interpreted ) Ok, modules loaded: Transactions. *Transactions> main -- -- > Result: OFX (Response [Transaction "9223ry29r389" "THE GROCERY STORE BLABLABLA" (Amount 234 99),Transaction "1237tg832t" "SOME DUDE ON PAYPAL 4781487" (Amount 2174 27)]) -- *Transactions> It is interesting to what happens if your input is incorrect, Doaitse On 10 sep 2010, at 18:53, Jared Jennings wrote:
<OFX>[...] <STMTRS>[...] <STMTTRN>[...] <TRNUID>9223ry29r389 <NAME>THE GROCERY STORE BLABLABLA <TRNAMT>234.99 </STMTTRN> <STMTTRN>[...] <TRNUID>1237tg832t <NAME>SOME DUDE ON PAYPAL 4781487 <TRNAMT>2174.27 </STMTTRN> </STMTRS> </OFX>

On Fri, Sep 10, 2010 at 2:00 PM, S. Doaitse Swierstra
I show how this can be done using uu-parsinglib. Note that we have sevral parsers, each having its own type:
Thanks for such a complete example, Doaitse! Unfortunately I have a requirement I didn't disclose: the simple tags like <TRNUID>, <NAME>, <AMOUNT> could come in any order; and some are optional. I tried to fix that by making every field in my Transaction record a Maybe, and keeping a Transaction as state for my parser. But after so many Maybes I began to think this was not the right way. And I had to run a parser as part of another parser. And after all that, it wouldn't build because it was badly typed. But in any case, thanks for turning me on to Text.ParserCombinators.UU; I'd only tried Parsec before.

On 16 sep 2010, at 05:42, Jared Jennings wrote:
On Fri, Sep 10, 2010 at 2:00 PM, S. Doaitse Swierstra
wrote: I show how this can be done using uu-parsinglib. Note that we have sevral parsers, each having its own type:
Thanks for such a complete example, Doaitse! Unfortunately I have a requirement I didn't disclose: the simple tags like <TRNUID>, <NAME>, <AMOUNT> could come in any order; and some are optional. I tried to fix that by making every field in my Transaction record a Maybe, and keeping a Transaction as state for my parser. But after so many Maybes I began to think this was not the right way. And I had to run a parser as part of another parser. And after all that, it wouldn't build because it was badly typed.
The good news is that the library has combinators for that too ;-} Just change a few lines. If they are optional use the pOpt combinator instead of the pOne. Doaitse module Transactions where import Text.ParserCombinators.UU import Text.ParserCombinators.UU.Examples import Data.Char pTagged tag (pAttr, pPayload) = pToken ("<" ++ tag ++ ">") *> pAttr *> spaces *> pPayload <* spaces <* pToken ("" ++ tag ++ ">") pTag tag pPayload = pToken ("<" ++ tag ++ ">") *> pPayload data OFX = OFX Response deriving Show data Response = Response [Transaction] deriving Show data Transaction = Transaction String String Amount deriving Show data Amount = Amount Int Int deriving Show pAmount = "TRNAMT" `pTag` (Amount <$> pNatural <* pSym '.' <*> pNatural) pTransaction = "STMTTRN" `pTagged` (pAttr, Transaction `pMerge` ( pOne ("TRNUID" `pTag` pLine) <||> pOne ("NAME" `pTag` pLine) <||> pOne pAmount ) ) pResponse = "STMTRS" `pTagged` (pAttr, Response <$> pList (pTransaction <* spaces)) pOFX = "OFX" `pTagged` (pAttr, OFX <$> pResponse ) pAttr :: Parser String pAttr = pToken "[...]" spaces = pMunch (`elem` " \n\t") pDigitAsInt = digit2Int <$> pDigit pNatural = foldl (\a b -> a * 10 + b ) 0 <$> pList1 pDigitAsInt digit2Int a = ord a - ord '0' pDigit :: Parser Char pDigit = pSym ('0', '9') pLine = pMunch (/='\n') <* spaces main = do input <- readFile "TrInput" run (pOFX <* spaces) input
But in any case, thanks for turning me on to Text.ParserCombinators.UU; I'd only tried Parsec before. _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (4)
-
Jared Jennings
-
Malcolm Wallace
-
S. Doaitse Swierstra
-
Stephen Tetley