parsec and source material with random order lines

Hi,

I'm trying to parse ical files but the source material doesn't matter much. First, I know there is an icalendar library on Hackage, but I'm trying to learn as well through this.

Now the format is really quite simple and I am in fact parsing it, and it works, but I don't like the code I'm writing. It feels wrong and I'm sure there is a better way. For now I'm parsing it to an array of arrays, but I want to fill a proper "data" structure.

For my purpose the file contains a bunch of records like this:

    BEGIN:VEVENT
    DTSTART:20121218T103000Z
    DTEND:20121218T120000Z
    [..]
    DESCRIPTION: [..]
    END:VEVENT

There are a bunch of records I don't care about, and I also want to parse no matter what the order of the directives is (so, I also want to parse if DTEND appears before DTSTART for instance, and so on).

That last part is my one problem. I can't do:

    parseBegin
    start <- parseStart
    end <- parseEnd
    skipRows
    desc <- parseDesc
    skipRows
    end <- parseEnd
    return Event { eventStart = start, eventEnd = end ...}

My current working code is:

    parseEvent = do
        parseBegin
        contents <- many1 $ (try startDate)
                        <|> (try endDate)
                        <|> (try description)
                        <|> unknownCalendarInfo
        parseEnd
        return contents

But then contents is of course an array, while I want to return only one element here.

SOMEHOW what I would like is:

    parseEvent = do
        parseBegin
        contents <- many1 $ (start <- T.try startDate)
                        <|> (end <- T.try endDate)
                        <|> (desc <- T.try description)
                        <|> unknownCalendarInfo
        parseEnd
        return Event { eventStart = start, eventEnd = end ...}

But obviously, as far as Parsec is concerned, startDate could occur several times, and also it's just not valid Haskell syntax.

So, any hint about this problem? Parsing multi-line records with Parsec, when I don't know the order in which the lines will appear? I mean, sure, I can convert my array to the proper data structure: find which element in the array contains the start date, then which contains the end date, and build my data structure. But I'm sure something much nicer can be done... I just can't find how.

I see the author of iCalendar fixed the problem but I can't completely understand his source. It's too many things at the same time for me; I need to take this one step at a time.

Thank you!

Emmanuel
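For reference, the fallback described in the post (collect the parsed lines into a list, then pick out the ones that matter) can be kept fairly short. The sketch below is only an illustration: the `Field` type, the `eventDesc` field and the `toEvent` function are hypothetical names, not taken from the original code, and dates are kept as plain strings.

```haskell
import Data.Maybe (listToMaybe)

-- Hypothetical result type of the per-line parsers (an assumption,
-- not the poster's actual type).
data Field
  = StartDate String
  | EndDate String
  | Description String
  | Unknown String
  deriving (Show, Eq)

data Event = Event
  { eventStart :: String
  , eventEnd   :: String
  , eventDesc  :: String
  } deriving (Show, Eq)

-- Take the first occurrence of each wanted field, in whatever order the
-- lines appeared. The result is Nothing if any of the three is missing.
toEvent :: [Field] -> Maybe Event
toEvent fs =
  Event <$> listToMaybe [s | StartDate s <- fs]
        <*> listToMaybe [e | EndDate e <- fs]
        <*> listToMaybe [d | Description d <- fs]
```

With this, the list returned by `many1` could be passed through `toEvent`. Unknown lines are ignored and repeated fields are silently resolved in favour of the first one, which may or may not be the behaviour you want.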

Hi Emmanuel,
Sounds like you want a permutation parser, perhaps? Check out
http://hackage.haskell.org/packages/archive/parsec/latest/doc/html/Text-Pars...
-Brent
On Tue, Dec 25, 2012 at 12:18:37AM +0100, Emmanuel Touzery wrote:
[..]
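To make the suggestion concrete, here is a minimal sketch of a permutation parser built with `permute`, `<$$>` and `<||>` from `Text.Parsec.Perm`. The `Event` record, the `eventDesc` field and the `field` helper are assumptions made for illustration, dates are kept as plain strings, and the sketch assumes an event contains exactly these three lines and nothing else.

```haskell
import Text.Parsec
import Text.Parsec.Perm (permute, (<$$>), (<||>))
import Text.Parsec.String (Parser)

-- Hypothetical record; the field names follow the ones used in the question.
data Event = Event
  { eventStart :: String
  , eventEnd   :: String
  , eventDesc  :: String
  } deriving (Show, Eq)

-- One "NAME:value" line. The `try` makes a failed name match consume no
-- input, so the permutation parser can move on to the next alternative.
field :: String -> Parser String
field name = try (string name *> char ':') *> manyTill anyChar endOfLine

-- The three fields may appear in any order; each must appear exactly once.
parseEvent :: Parser Event
parseEvent =
  string "BEGIN:VEVENT" *> endOfLine
    *> permute (Event <$$> field "DTSTART"
                      <||> field "DTEND"
                      <||> field "DESCRIPTION")
    <* string "END:VEVENT"
```

Running `parse parseEvent "" input` on an event whose lines come in a different order should still fill the record fields in the declared order. A line that is not one of the three fields makes this version fail.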
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

I think you are right, this is probably the right track. A little more
googling for permutation parsers turned up this, which is also about parsing
iCal using Parsec:
http://stackoverflow.com/questions/3706172/haskell-parsec-and-unordered-prop...
I'll review all this and see if that solves the problem... Thank you!
Emmanuel
On Tue, Dec 25, 2012 at 3:28 AM, Brent Yorgey wrote:
[..]


Hello,
Well, now I've checked it in further detail... a permutation parser is
basically what I want.
But there is a catch, and here I'm really pushing it; at this point it's
really my own problem.
Say the parsing unit is the line (which is my situation), there are 6 lines
of data, and I don't know their order. The permutation parser requires that
I know how to parse all 6 lines and pass each of them to my data
constructor, while in reality I only care about 3 of those 6 lines. I can't
give it a parser that discards lines, a "this I don't parse" parser.
And it makes sense... It's just that in my case, I don't want to load all
the possible fields contained in an iCalendar file; only a couple of them
matter to me.
I think I'll pre-process the data to keep only the lines I care about
(filtering out all directives I don't understand, which I can do by simply
checking the first few characters of each line) and then give the result
to Parsec. That should be a winning combination.
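The pre-processing step described here could look roughly like the following. The `wanted` list and the `keepKnown` name are hypothetical; this is a sketch of the idea, not a complete iCalendar reader.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical list of line prefixes worth keeping; extend as needed.
wanted :: [String]
wanted = ["BEGIN:VEVENT", "END:VEVENT", "DTSTART:", "DTEND:", "DESCRIPTION:"]

-- Drop every line that does not start with one of the wanted prefixes.
keepKnown :: String -> String
keepKnown = unlines . filter isWanted . lines
  where
    isWanted l = any (`isPrefixOf` l) wanted
```

Two caveats under these assumptions: `lines` splits on `\n` only, so CRLF input keeps a trailing `\r` on each line, and folded continuation lines (long values continued on a following line that begins with whitespace) would be dropped by this filter, truncating long descriptions.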
Just let me know if there is a more elegant way, but it's starting to be a
bit of a messy situation (I don't know the order, and I don't want to use
all the input data..), so I'm not sure there is.
Thank you a lot!
Emmanuel
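One possible alternative to pre-processing, offered only as a sketch: let each field parser also swallow any uninteresting lines that follow it, so the permutation parser never sees them. The names `known`, `junkLine` and `field` and the `Event` record are assumptions for illustration, and dates are kept as plain strings.

```haskell
import Text.Parsec
import Text.Parsec.Perm (permute, (<$$>), (<||>))
import Text.Parsec.String (Parser)

-- Hypothetical record, as in the question.
data Event = Event { eventStart, eventEnd, eventDesc :: String }
  deriving (Show, Eq)

known :: [String]
known = ["DTSTART", "DTEND", "DESCRIPTION"]

-- Any line that is neither a known field nor the END marker.
-- notFollowedBy consumes no input, so a known line is left untouched.
junkLine :: Parser ()
junkLine = do
  notFollowedBy (choice [try (string k) | k <- "END:VEVENT" : map (++ ":") known])
  _ <- manyTill anyChar endOfLine
  return ()

-- A known field, followed by any number of lines we do not care about.
field :: String -> Parser String
field name =
  try (string name *> char ':') *> manyTill anyChar endOfLine
    <* skipMany junkLine

parseEvent :: Parser Event
parseEvent =
  string "BEGIN:VEVENT" *> endOfLine *> skipMany junkLine
    *> permute (Event <$$> field "DTSTART"
                      <||> field "DTEND"
                      <||> field "DESCRIPTION")
    <* string "END:VEVENT"
```

This keeps everything inside Parsec, at the cost of listing the known field names twice. A field that appears more than once still makes the parse fail, since a permutation parser expects each component exactly once.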
participants (3)
- Brent Yorgey
- Dudley Brooks
- Emmanuel Touzery