Difficulties in accessing inner elements of data types

Hi, I'm working on a Haskell library for interacting with emacs org files. For those that do not know, an org file is a structured outline style file that has nested headings, text, tables and other elements. For example: * Heading 1 Some text, more text. This is a subelement of Heading 1 1. You can also have list 1. and nested lists 2. more... ** Nested Heading (subelement of Heading 1) text... (subelement of Nested Heading) ** Another level 2 heading (subelement of Heading 1) | Desc | Value | |-----------+--------------------------------------| | Table | You can also have tables in the file | | another | row | | seperator | you can have seps as well, eg | |-----------+--------------------------------------| * Another top level heading There are many more features, see orgmode.org My library enables read and write access to a subset of this format (eg lists aren't parsed atm). The data structures used for writing are: data OrgFile = OrgFile [OrgFileElement] data OrgFileElement = Table OrgTable | Paragraph String | Heading OrgHeading -- heading level, title, subelements data OrgHeading = OrgHeading Int String [OrgFileElement] data OrgTable = OrgTable [OrgTableRow] data OrgTableRow = OrgTableRow [String] | OrgTableRowSep To write a file you contruct a OrgFile out of those elements, and pass it to a writeOrgFile func. Eg: writeOrg $ OrgFile [Heading (OrgHeading 1 "h1" [Paragraph "str"])] would produce: * h1 str I was going to use the same data structures for reading an org file, but it quickly became apparent that this would not be suitable, as you needed the position of the file of an element to be able to report errors. Eg if you needed to report an error that a number was expected, the message "'cat' is not a number" is not very useful, but "Line 2031: 'cat' is not a number" is. So the data structures I used were: data FilePosition = FilePosition Line Column data WithPos a = WithPos { filePos :: FilePosition, innerValue :: a } data OrgTableP = OrgTableP [WithPos OrgTableRow] data OrgFileElementP = TableP OrgTableP | ParagraphP String | HeadingP OrgHeadingP data OrgHeadingP = OrgHeadingP Int String [WithPos OrgFileElementP] data OrgFileP = OrgFileP [WithPos OrgFileElementP] Finally there is a function readOrg, which takes a string, and returns an OrgTableP. Now, this all works as expected (files are correctly being parsed and written), however I am having a lot of trouble trying to come up with a decent API to work with this. While writing an OrgFile is fairly easy, reading (and accessing inner parts) of an org file is very tedious, and modifying them is horrendous. For example, to read the description line for the project named "Project14" in the file: * 2007 Projects ** Project 1 Description: 1 Tags: None ** Project 2 Tags: asdf,fdsa Description: hello * 2008 Projects * 2009 Projects ** Project14 Tags: RightProject Description: we want this requires the code: type ErrorS = String listToEither str [] = Left str listToEither _ (x:_) = Right x get14 :: OrgFileP -> Either ErrorS String get14 (OrgFileP elements) = getDesc =<< (getRightProject . concatProjects) elements where concatProjects :: [WithPos OrgFileElementP] -> [OrgHeadingP] concatProjects [] = [] concatProjects ((WithPos _ (HeadingP h)) : rest) = h : concatProjects rest concatProjects (_ : rest) = concatProjects rest getRightProject :: [OrgHeadingP] -> Either ErrorS OrgHeadingP getRightProject = listToEither "Couldn't find project14" . filter (\(OrgHeadingP _ name _) -> name == "Project14") getDesc :: OrgHeadingP -> Either ErrorS String getDesc (OrgHeadingP _ _ children) = case filter paragraphWithDesc (map innerValue children) of [] -> Left $ show (filePos $ head children) ++ ": Couldn't find desc in project" ((ParagraphP str):_) -> Right str _ -> error "should not be possible" paragraphWithDesc :: OrgFileElementP -> Bool paragraphWithDesc (ParagraphP str) = str =~ "Description" paragraphWithDesc _ = False If you think that is bad, try writing a function that adds the Tag "Hard" to Project2 :( What I really need is a DSL that would allow sql like queries on an OrgFileP. For example: select (anyHeading `next` headingWithName "Project14" `withFailMsg` "couldn't find p14" `next` paragraphMatchingRegex "Description" `withFailMsg` "no desc") org `output` paragraphText would return a String OR select (anyHeading `next` headingWithName "Project2" `next` paragraphMatchingRegex "Tag:") org `modify` paragraphText (++ ",Hard") would return an OrgFile, with the new Hard tag added. However, I don't know if this is even possible, how to do it, or if there is a better alternative to this. I would really apreciate any hints with regards to this. It would be useful to know if there are other libraries that also face this problem, and how they solved it. Finally, I would be grateful for any other advice regarding my code. One thing that has bugged me is my solution for having file position info - my solution never seemed very elegant. Thanks, David

Hi David,
What you are wanting to do is query and transform a reasonably large
AST. Fortunately there are solutions, generic programming should do
exactly what you want without too much hassle. I'd personally
recommend taking a look at the Uniplate library, starting by reading
the Haskell Workshop 2007 paper (its an academic paper, but for you
the really useful stuff is in Section 2 which is more tutorial style)
- http://www-users.cs.york.ac.uk/~ndm/uniplate/
I'm sure I could solve your particular queries using Uniplate, but
I'll leave it for you to take a look and figure out. If you can't
figure out how to do it after reading section 2 of the paper mail back
and I'll take a closer look.
Thanks
Neil
2009/3/3 David Miani
Hi, I'm working on a Haskell library for interacting with emacs org files. For those that do not know, an org file is a structured outline style file that has nested headings, text, tables and other elements. For example:
* Heading 1 Some text, more text. This is a subelement of Heading 1 1. You can also have list 1. and nested lists 2. more...
** Nested Heading (subelement of Heading 1) text... (subelement of Nested Heading)
** Another level 2 heading (subelement of Heading 1) | Desc | Value | |-----------+--------------------------------------| | Table | You can also have tables in the file | | another | row | | seperator | you can have seps as well, eg | |-----------+--------------------------------------|
* Another top level heading There are many more features, see orgmode.org
My library enables read and write access to a subset of this format (eg lists aren't parsed atm).
The data structures used for writing are:
data OrgFile = OrgFile [OrgFileElement] data OrgFileElement = Table OrgTable | Paragraph String | Heading OrgHeading
-- heading level, title, subelements data OrgHeading = OrgHeading Int String [OrgFileElement]
data OrgTable = OrgTable [OrgTableRow]
data OrgTableRow = OrgTableRow [String] | OrgTableRowSep
To write a file you contruct a OrgFile out of those elements, and pass it to a writeOrgFile func. Eg:
writeOrg $ OrgFile [Heading (OrgHeading 1 "h1" [Paragraph "str"])] would produce: * h1 str
I was going to use the same data structures for reading an org file, but it quickly became apparent that this would not be suitable, as you needed the position of the file of an element to be able to report errors. Eg if you needed to report an error that a number was expected, the message "'cat' is not a number" is not very useful, but "Line 2031: 'cat' is not a number" is. So the data structures I used were:
data FilePosition = FilePosition Line Column
data WithPos a = WithPos { filePos :: FilePosition, innerValue :: a }
data OrgTableP = OrgTableP [WithPos OrgTableRow]
data OrgFileElementP = TableP OrgTableP | ParagraphP String | HeadingP OrgHeadingP
data OrgHeadingP = OrgHeadingP Int String [WithPos OrgFileElementP]
data OrgFileP = OrgFileP [WithPos OrgFileElementP]
Finally there is a function readOrg, which takes a string, and returns an OrgTableP.
Now, this all works as expected (files are correctly being parsed and written), however I am having a lot of trouble trying to come up with a decent API to work with this. While writing an OrgFile is fairly easy, reading (and accessing inner parts) of an org file is very tedious, and modifying them is horrendous.
For example, to read the description line for the project named "Project14" in the file:
* 2007 Projects ** Project 1 Description: 1 Tags: None ** Project 2 Tags: asdf,fdsa Description: hello * 2008 Projects * 2009 Projects ** Project14 Tags: RightProject Description: we want this
requires the code:
type ErrorS = String listToEither str [] = Left str listToEither _ (x:_) = Right x
get14 :: OrgFileP -> Either ErrorS String get14 (OrgFileP elements) = getDesc =<< (getRightProject . concatProjects) elements where concatProjects :: [WithPos OrgFileElementP] -> [OrgHeadingP] concatProjects [] = [] concatProjects ((WithPos _ (HeadingP h)) : rest) = h : concatProjects rest concatProjects (_ : rest) = concatProjects rest
getRightProject :: [OrgHeadingP] -> Either ErrorS OrgHeadingP getRightProject = listToEither "Couldn't find project14" . filter (\(OrgHeadingP _ name _) -> name == "Project14")
getDesc :: OrgHeadingP -> Either ErrorS String getDesc (OrgHeadingP _ _ children) = case filter paragraphWithDesc (map innerValue children) of [] -> Left $ show (filePos $ head children) ++ ": Couldn't find desc in project" ((ParagraphP str):_) -> Right str _ -> error "should not be possible"
paragraphWithDesc :: OrgFileElementP -> Bool paragraphWithDesc (ParagraphP str) = str =~ "Description" paragraphWithDesc _ = False
If you think that is bad, try writing a function that adds the Tag "Hard" to Project2 :(
What I really need is a DSL that would allow sql like queries on an OrgFileP. For example: select (anyHeading `next` headingWithName "Project14" `withFailMsg` "couldn't find p14" `next` paragraphMatchingRegex "Description" `withFailMsg` "no desc") org `output` paragraphText
would return a String OR select (anyHeading `next` headingWithName "Project2" `next` paragraphMatchingRegex "Tag:") org `modify` paragraphText (++ ",Hard")
would return an OrgFile, with the new Hard tag added.
However, I don't know if this is even possible, how to do it, or if there is a better alternative to this. I would really apreciate any hints with regards to this. It would be useful to know if there are other libraries that also face this problem, and how they solved it.
Finally, I would be grateful for any other advice regarding my code. One thing that has bugged me is my solution for having file position info - my solution never seemed very elegant.
Thanks, David
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

While writing an OrgFile is fairly easy, reading (and accessing inner parts) of an org file is very tedious, and modifying them is horrendous.
Have you looked at http://hackage.haskell.org/cgi-bin/hackage-scripts/package/data-accessor It's something I've used successfully when wanting to manipulate the internals of complex types. Tim

Seems like Conal's semantic editor combinators could be of interest too, if
I'm understanding correctly:
http://conal.net/blog/posts/semantic-editor-combinators/
On Tue, Mar 3, 2009 at 8:00 PM, Tim Docker
While writing an OrgFile is fairly easy, reading (and accessing inner parts) of an org file is very tedious, and modifying them is horrendous.
Have you looked at
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/data-accessor
It's something I've used successfully when wanting to manipulate the internals of complex types.
Tim
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Hi David, I'm working on a Haskell library for interacting with emacs org files. For
those that do not know, an org file is a structured outline style file that has nested headings, text, tables and other elements.
Great! Sounds like fun. :) Now, this all works as expected (files are correctly being parsed and
written), however I am having a lot of trouble trying to come up with a decent API to work with this. While writing an OrgFile is fairly easy, reading (and accessing inner parts) of an org file is very tedious, and modifying them is horrendous.
I can imagine.
However, I don't know if this is even possible, how to do it, or if there is a better alternative to this. I would really apreciate any hints with regards to this. It would be useful to know if there are other libraries that also face this problem, and how they solved it.
I definite agree with Neil: I think generic programming is exactly what you are looking for. Fortunately, there are a number of libraries available to help you solve your problem. The next problem is figuring out which one and learning how to it. For this purpose, a comparison of libraries for generic programming in Haskell was recently published: http://www.cs.uu.nl/wiki/Alexey/ComparingLibrariesForGenericProgrammingInHas... That should give you an idea of what's out there and the pros and cons of each library. I found your example particularly interesting, so I decided to try out a solution in EMGM (Extensible and Modular Generics for the Masses). I wrote up my experiment here: http://splonderzoek.blogspot.com/2009/03/experiments-with-emgm-emacs-org-fil... More information on EMGM is here: http://www.cs.uu.nl/wiki/GenericProgramming/EMGM I'd be happy to help you figure out if EMGM is appropriate for other things you want to do. Good luck with your library! Regards, Sean

Well thanks to everyone that has replied so far - I've had an interesting time trying out different ideas. Firstly, for Neil Mitchell's suggestions regarding uniplate: I read through both uniplate and scrap your boilerplate libraries (found the second after reading about uniplate). For whatever reason, I understood syb better than uniplate (but thats probably just me). This actually worked quite well (note that I've changed the data types slightly, but it doesn't change the code that much): eitherOr :: Either a b -> Either a b -> Either a b eitherOr x@(Right _) _ = x eitherOr _ y = y getP14Desc :: OrgElement -> Either ErrString String getP14Desc org = everything eitherOr (Left descError `mkQ` findDesc) =<< everything eitherOr (Left findError `mkQ` findP14) org where findP14 h@(Heading {headingName=name}) | name == "Project14" = Right h findP14 _ = Left findError findDesc (Paragraph {paragraphText=text}) | text =~ "Description" = Right text findDesc _ = Left findError descError = "Couldn't find description for project" findError = "Couldn't find project." While it isn't that many less loc than my original code, it was much simpler to get working. Also, the find methods could easily be factored out. My second problem, adding the tag "Hard" to Project2 was also fairly simple: addHardTag org = everywhere (mkT addTagToP2) org where addTagToP2 h@(Heading {headingName=name}) | name == "Project 2" = everywhere (mkT addTag) h addTagToP2 x = x addTag text | text =~ "Tags:" = text ++ ",newtag" However, I also wanted to try out Tim Docker's suggestion for using data- accessor. That seemed to also be very promising, except for one thing - data- accessor doesn't seem to be able to cope with multiple constructors! The code for this was faily simple though, so I went about making it work for multiple constructors. The original definition for an Accessor d f (where d is the datatype and f is the type of the field) was Cons {decons :: d -> f -> (d, f)} -- (this wasn't exported by the module though) There is a problem with that for multiple constructors though - its possible that there will be no return for a given accessor. Eg running get headingName' (Paragraph "some text") would not be possible. So I changed the code to this: newtype Accessor1 d f = Accessor (d -> Maybe f, f -> d -> d) If the getter failed, Nothing is returned. If the setter failed, it acts like id. After using that for a while I realized there was potential to have an accessor automatically access all the children of a data type. This could be achieved by changing the return type of the getter to [f], and changing the setter function to a modifier function: newtype MultiConAccessor d f = MultiConAccessor ((d -> [f]),((f -> f) -> d -> d)) I also wrote the chain function, which joins to accessors together After a lot of definitions (although most should be able to be automated with template haskell), I could use the code: projectAccessor name = headingChildren' `chain` -- top level elements headingChildren' `chain` -- level 2 liftFilterS (== name) headingName' getP14Desc2 = getVal $ projectAccessor "Project14" `chain` headingChildren' `chain` liftFilterS (=~ "Description:") paragraphText' `chain` paragraphText' addHardTag2 = modVal (projectAccessor "Project 2" `chain` headingChildren' `chain` liftFilterS (=~ "Tags:") paragraphText' `chain` paragraphText') (++ ",newtag") I've posted all the code at http://moonpatio.com/fastcgi/hpaste.fcgi/view?id=1778#a1778 It isn't very well documented, as I was just experimenting with this. Finally, thanks Sean for your response. That blog post was very nice! Your solution also looks good (especially since most of the code was automated). I haven't had a chance to have a close look at EMGM but will in the next couple of days. So I've gone from having no solution a few days ago to having 3 or 4 now! Not sure which solution I will stick with, any seem to do the job. So thanks everyone! David
participants (5)
-
Andrew Wagner
-
David Miani
-
Neil Mitchell
-
Sean Leather
-
Tim Docker