
Hi All,

During the last few days I've been working on the ModuleExtractor, a high-level extractor of modules from Haskell source files. This is not a low-level parser of the kind used by compilers, since it only cares about the things related to documentation. I am using Daan Leijen's Parsec library, which seems to be well designed, well documented and reasonably fast.

The motivation for this work is to replace my home-brewed parsing of source files in the Haskell Module Browser (or rather a sophisticated "grepping"), which I currently do in Smalltalk, with a Haskell version. I do not expect to gain much speed here (the Hugs implementation will probably be much slower than the Squeak one), but the idea is to move as much code as possible from the Squeak side to the Haskell side, in order to create support code that could benefit other people wishing to interface such browsers to systems other than Squeak.

I think this information is relevant to our discussion; it could help clarify some issues and provide an experimental tool.

The parser aims to extract this information from the source files:

    data Module = Module
        { name       :: String      -- done
        , comment    :: String      -- done
        , exports    :: [Export]    -- chunk for now
        , imports    :: [Import]    -- chunk for now
        , fixities   :: [Fixity]    -- done
        , classes    :: [Class]     -- chunk for now
        , instances  :: [Instance]  -- chunk for now
        , categories :: [String]    -- chunk for now
        , functions  :: [Function]  -- done
        , footnote   :: String      -- done
        }

At the first stage, the parser breaks the source code into chunks, each pairing a comment with the code it belongs to:

    type Chunk = (Comment, Code)

and then examines each chunk to convert it to one of the entities specified above. For example, the Function datatype is defined as:

    data Function = Function
        { funName      :: String
        , funSignature :: Signature
        , funBody      :: String
        }

The good news is that the parser is able to deal with any positional placement of comments.
For example, when it deals with functions it considers any one or all of the following comment options, concatenating whatever it finds (lines marked "+" are comment positions, lines marked "x" are code):

    + Many "--" comments or a "{- .. -}" comment before the signature
    x Signature
    + Many "--" comments or a "{- .. -}" comment after the signature
    x First line of function body
    + Many indented "--" comment lines
    x Indented function body

A similar pattern applies to other entities.

But in order for this positional approach to work I had to admit the concept of a category (known and cherished in Smalltalk, Objective C, Eiffel). In Haskell's case, a special banner separates groups of functions. If the banner is not marked somehow, it becomes part of the comment of the entity that follows it (wrong, but not catastrophic).

It seems, after all, that I was not entirely correct in one of my previous posts: an intelligent parser can cope with a purely positional layout, given a bit of help in the form of defined category delimiters. I should have remembered this, because I've done similar parsing for the Xcoral browser for Java.

I thought this would be helpful information for our discussion. I'll post the code when it's ready.

Jan
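
To make the chunking and category ideas above concrete, here is a rough sketch. It is not the ModuleExtractor code and it does not use Parsec; it uses only plain list functions from base. Everything named here (Item, bannerPrefix, blocks, classify, extract) is hypothetical, as is the "-- ## " banner convention. It also assumes that chunks are separated by blank lines and it ignores "{- .. -}" comments entirely.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

type Comment = String
type Code    = String

-- A chunk pairs the collected comment text with the code it documents.
type Chunk = (Comment, Code)

-- Hypothetical result type: either a category banner or an ordinary chunk.
data Item
  = Category String
  | Entity Chunk
  deriving (Eq, Show)

-- Assumed banner convention (not from the original post).
bannerPrefix :: String
bannerPrefix = "-- ## "

-- Split the input lines into blocks separated by blank lines.
blocks :: [String] -> [[String]]
blocks ls = case dropWhile isBlank ls of
  []   -> []
  rest -> let (b, rest') = break isBlank rest
          in  b : blocks rest'
  where
    isBlank = all isSpace

-- A line counts as a comment if, after indentation, it starts with "--".
-- (This is naive: operators such as "-->" would be misclassified.)
isLineComment :: String -> Bool
isLineComment = ("--" `isPrefixOf`) . dropWhile isSpace

-- A lone banner line becomes a category. Otherwise all "--" comments in
-- the block are concatenated, wherever they appear, and the remaining
-- lines are kept as code.
classify :: [String] -> Item
classify [l]
  | bannerPrefix `isPrefixOf` l = Category (drop (length bannerPrefix) l)
classify ls = Entity (unlines (map stripMarker cs), unlines code)
  where
    cs          = filter isLineComment ls
    code        = filter (not . isLineComment) ls
    stripMarker = dropWhile isSpace . drop 2 . dropWhile isSpace

extract :: String -> [Item]
extract = map classify . blocks . lines
```

Given a banner line, a blank line, and then a function whose comments sit before the signature, after the signature and inside the indented body, extract yields one Category followed by one Entity whose comment text is the three comments joined in source order. Without the banner rule, the banner would simply be collected as part of the comment of the entity that follows it, which is the failure mode described above.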