Help with a project design

Hi,

this is going to be long, sorry about that. And probably also a bit off topic...;-)

I need a way to manage bibliographies, a pretty common problem, isn't it? I used to use a wiki I developed for this task as well.[1] The wiki was basically based on Bibtex. I thought I could rewrite that bibliographic management system in Haskell, but I've also been following the work of some guys who are trying to develop a citation style language in XML.[2]

Since my confidence with Haskell is growing, I thought I could try to write an implementation of that Citation Style Language, but now I'm running into my basic lack of computer science education.

Such an effort would be useful if I could write a library and release it, which requires a clean architecture and a simple exported API. But, while I can grasp difficult computational concepts like monads or arrows, choosing a given path of development and creating that API is probably out of my reach. This is why I'm asking for help. Or, probably better, for directions on how to start acquiring such skills.

The task this library should do is simple: given an xml object (representing a bibliographic reference), render it with rules stored in a different xml object (the citation style). While I think I can find solutions for the rendering problem, what I find difficult is the design of the reference xml objects.

Bibliographic entries have different types, which must be rendered differently. These types can be classified into 3 main classes (books, articles, parts of a book), and the entries within each class can be rendered with the same methods. That seems to fit Haskell perfectly.

Now, I basically see 2 approaches:

1. create some data structures (most of their parts are common) to map different types of bibliographic entries, and create the needed classes with the render methods;

2. keep the xml objects as xml and create an abstract interface to the xml objects to get the data required for rendering and classifying the xml objects. This way I would have to:
- create data types to store different types of xml objects (data Book = Book XmlTree, data Article, etc.): these data types represent my reference classes;
- create a class of 'render'-able types with the render method and define the instances;
- create an existential type to set the type of the xml objects with some kind of setType :: XmlTree -> ExistentialContainer

I think that the first approach is not abstract enough and requires a lot of boilerplate code to translate a specific type of bibliographic entry into a Haskell type. Moreover, this brings me back to Bibtex, which maps each entry type to a set of rendering rules, while xml objects (MODS[3]) have no type (the type must be deduced from the presence of given elements).

The second one is the one I'm leaning towards. But I'm also thinking that I should probably first study the "scrap your boilerplate" approach a bit ... on the other hand, I think that I should probably pick a path, follow it and see what happens.

In other words, I keep on testing the feasibility of different approaches, probably because I have not grasped the problem entirely.

And then there is the API: function names, argument order, and so on.

Is there some material I could read to get some guidelines for such a task?

I know that this is some kind of meta question that is not really Haskell specific, even though I would like to have Haskell specific answers...;-)

But any kind of suggestion will be appreciated, especially if you can point me to materials that, even if not directly connected with my specific problem, can help me understand the basic principles of functional programming design.

Thanks for your kind attention and sorry for such a long message.

Andrea

[1] http://uniwakka.sf.net
[2] xbiblio.sf.net
[3] "Metadata Object Description Schema" (MODS) http://www.loc.gov/standards/mods/
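For readers following the thread, the second approach described in this message could look roughly like the sketch below. It is only an illustration under stated assumptions: XmlTree here is a tiny stand-in type rather than the tree type of a real XML library, the element names ("author", "title", "journal", "host") are made up and are not MODS elements, and the helper functions childText, hasChild and field are hypothetical. Only the names Book, Article, render, setType and ExistentialContainer come from the message itself.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.Maybe (fromMaybe)

-- Stand-in for a real XML tree type (assumption; an XML library
-- would provide its own XmlTree).
data XmlTree = Element String [XmlTree] | Text String
  deriving (Show, Eq)

-- One wrapper per reference class; the data stays XML inside.
newtype Book     = Book XmlTree
newtype Article  = Article XmlTree
newtype BookPart = BookPart XmlTree

-- Abstract interface to the XML (hypothetical helpers):
-- text of the first child element with the given name, if any.
childText :: String -> XmlTree -> Maybe String
childText name (Element _ kids) =
  case [t | Element n [Text t] <- kids, n == name] of
    (t:_) -> Just t
    []    -> Nothing
childText _ (Text _) = Nothing

-- Does the element have a child element with the given name?
hasChild :: String -> XmlTree -> Bool
hasChild name (Element _ kids) = any isNamed kids
  where
    isNamed (Element n _) = n == name
    isNamed _             = False
hasChild _ (Text _) = False

-- Field lookup with a placeholder for missing data.
field :: String -> XmlTree -> String
field name = fromMaybe "?" . childText name

-- The class of 'render'-able types.
class Render a where
  render :: a -> String

instance Render Book where
  render (Book t) = field "author" t ++ ". " ++ field "title" t ++ "."

instance Render Article where
  render (Article t) =
    field "author" t ++ ". \"" ++ field "title" t ++ "\". "
      ++ field "journal" t ++ "."

instance Render BookPart where
  render (BookPart t) =
    field "author" t ++ ". \"" ++ field "title" t ++ "\". In "
      ++ field "host" t ++ "."

-- Existential wrapper hiding which reference class was chosen.
data ExistentialContainer = forall a. Render a => ExistentialContainer a

instance Render ExistentialContainer where
  render (ExistentialContainer r) = render r

-- Deduce the class from the presence of given elements.
setType :: XmlTree -> ExistentialContainer
setType t
  | hasChild "journal" t = ExistentialContainer (Article t)
  | hasChild "host" t    = ExistentialContainer (BookPart t)
  | otherwise            = ExistentialContainer (Book t)
```

With these definitions, render (setType tree) classifies a tree and renders it in one step. Note that the existential wrapper only lets callers do what the Render class allows, so in this sketch it behaves much like a plain function XmlTree -> String.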

Andrea Rossato wrote:
The task this library should do is simple: given an xml object (representing a bibliographic reference), render it with rules stored in a different xml object (the citation style). While I think I can find solutions for this problem - the rendering -, what I find difficult is the design of the reference xml objects.
Bibliographic entries have different types, which must be rendered differently. These types can be classified into 3 main classes (books, articles, parts of a book) that can be rendered with the same methods. That seems to fit Haskell perfectly.
Now, I basically see 2 approaches:
1. create some data structures (most part of them is common) to map different types of bibliographic entries, and create the needed classes with the render methods;
2. keep the xml objects as xml and create an abstract interface to the xml objects to get the data required for rendering and classifying the xml objects. This way I would have to:
- create data types to store different types of xml objects (data Book = Book XmlTree, data Article, etc.): these data types represent my reference classes;
- create a class of 'render'-able types with the render method and define the instances;
- create an existential type to set the type of the xml objects with some kind of setType :: XmlTree -> ExistentialContainer
I may not be overly qualified (or experienced with Haskell) to give you advice, so take what follows with caution.

I would definitely prefer choice 1 over 2. I think it is very important to design the data structure independently of any external representation of that data. XML is a fine way to represent data externally, but this should not influence your choice of data structure. I'd rather keep the possibility of alternative representations in the back of my head, and make the data structure general enough that they could be added without disrupting your main algorithms.

Abstraction can be added later; if you find that you need to maintain invariants for your bibliographic data that cannot be easily expressed in the type itself, then you might consider making your data type abstract, i.e. put it into a module of its own and export only an API.
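To make choice 1 concrete, here is a minimal sketch of a data structure that knows nothing about XML. Everything in it is an assumption for illustration: the field names, the three constructors (taken from the three classes mentioned in the original post) and the output format are made up, and a real citation style would drive the rendering from the style rules instead of hard-coding it.

```haskell
import Data.List (intercalate)

-- Fields shared by every kind of entry (hypothetical selection).
data Common = Common
  { authors :: [String]
  , title   :: String
  , year    :: Int
  } deriving (Show, Eq)

-- One constructor per reference class, independent of any
-- external representation (XML, Bibtex, ...).
data Reference
  = Book Common String      -- publisher
  | Article Common String   -- journal
  | BookPart Common String  -- title of the containing book
  deriving (Show, Eq)

-- A single rendering function; the compiler can warn when a
-- constructor is not handled.
render :: Reference -> String
render ref = case ref of
  Book c publisher -> prefix c ++ title c ++ ". " ++ publisher ++ "."
  Article c journal -> prefix c ++ quoted (title c) ++ ". " ++ journal ++ "."
  BookPart c host -> prefix c ++ quoted (title c) ++ ". In " ++ host ++ "."
  where
    prefix c = intercalate ", " (authors c) ++ " (" ++ show (year c) ++ "). "
    quoted s = "\"" ++ s ++ "\""
```

Readers for XML or any other format would then be separate functions producing a Reference, so adding another input format leaves render untouched.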
I think that the first approach is not abstract enough and requires a lot of boilerplate code to translate into a Haskell type a specific type of bibliographic entry.
A certain amount of boilerplate may be unavoidable. I have never found this to be a serious obstacle, but again that may be due to my limited experience. It is a bit tedious to write but OTOH may even serve you as a 'finger exercise'. If it really gets out of hand, 'scrap' it in some way ;) I recommend the Uniplate approach because it is very easy to understand, performs well, and requires the fewest extensions.

OK, you have been warned...

Cheers
Ben
participants (2): Andrea Rossato, Benjamin Franksen