
Hello This is my first post to Haskell Café. I am a hobbyist programmer who has lots of experience with imperative and OO languages ranging from 80286 assembly to Ada and Ruby. Last four years I've been learning Haskell, which has been a painfully educational experience - I have a good understanding about how CPUs work, but very little understanding about maths. This is a very long post, I apologize. Now I'm trying to create a tool which automatically generates foreign function interface (FFI) imports from C and C++ to Haskell. I have written a GCC plugin which extracts all relevant (and lots of irrelevant) information from GCC's TREEs and dumps the output as JSON. It also dumps preprocessor macros which is one of the reasons this tol is being given birth. The tool is used like Shake - import Development.Familiar and write code which glues all relevant data processing and FFI generation functions together. This is also going to be a reusable library which others can use for whatever they find the data to be useful for. This could include making FFI bindings to other languages. Some code analysis tools also might find the data useful. I'm having a hard time importing this data to Haskell in a type safe way. GCC's TREEs are structured in an OO way. I have tried to find a more functional way of representing them, but so far with no luck. The JSON data is typed, and the hierarchy is exactly the same as in GCC. Below I construct a tree of the hierarchy where I list a few of possible types and very few of the fields the data types contain. For listing fields and their types I use a syntax similar to Haskell. This shortened listing should be enough for illustration: * macro: (data fields include: name :: string, tokens :: [token], etc.) * tree: all nodes have a unique ID to be able to recover the cycles among the node graph; (id :: integer) * * translation_unit: (declarations :: [top_level_declaration]) * * constant: a compile-time constant value; (type :: type) * * * complex_constant: (value :: complex) * * * integer_constant: (value :: integer) * * namespace_decl: (name :: identifier, declarations :: [top_level_declaration]) * * declaration: (name :: identifier, type :: type) * * * const_decl: One component of enum type; (value :: type) * * * function_decl: (type :: function_type, params :: [param_decl], result :: result_decl) * * * type_decl: (no additional fields) * * type: (declaration :: type_decl, name :: identifier, size :: integer_constant, alignment :: integer, qualifiers :: [qualifier], completeness :: bool) * * * array_type: (element_type :: type, element_count :: integer) * * * numeric_type: (precision :: integer, signed :: bool, min_value :: integer_constant, max_value :: integer_constant) * * * * enumeral_type: (values :: [const_decl]) * * * function_type: (param_types :: [type], result_type :: type) * * * pointer_type: (referred_type :: type) * * * record_type: struct (C or C++)/union; (base_types :: [(type, access_info)], fields :: [field_decl or const_decl or type_decl], methods :: [method_decl]) * reference: Any node may be replaced by a reference. They can be forward or backward references. (referred_id :: integer) As you may see, tree nodes form many many cycles together. Subclasses sometimes provide stronger invariants for their fields than the parent classes. For example all declarations are associated with a type and for function_decl nodes this type is always a function_type. Now to my question. How to represent this in Haskell. Because of the enormous amount of fields per tree node, I certainly need to use records. -- Haskell follows: data BaseTree = BaseTree { id :: Int, ... ... } data BaseDecl = BaseDecl { baseTree :: BaseTree, name :: Identifier, declType :: Type, ... } data ConstDecl = ConstDecl { baseDecl :: BaseDecl, value :: ConstValue } data ConstValue = ComplexConst Complex | IntegerConst Integer data TypeDecl = TypeDecl { baseDecl :: BaseDecl } But... What shall I do about the heterogenous lists which many nodes contain. I wish I don't need to make myriards of ADTs for them, like: data AnyTopLevelDecl = ATLDConst ConstDecl | ATLDFunction FunctionDecl | ATLDNamespace NamespaceDecl | ATLDType TypeDecl -- ... data StructFieldDecl = SFDConst ConstDecl | SFDField FieldDecl | SFDType TypeDecl And so forth, basicly each heterogenous list gets its own ADT, because there is often variance in which node types can appear in the lists. I could use one big ADT like AnyDecl. However, to me it would feel like a bad design, if my types allowed RecordType nodes to reference NamespaceDecls, and therefore I don't want to pack all nodes into one type, unless I could use some type parameters or constraints to restrict the possible values the ADT could take. At first I thought that the following could solve my problem: data SomeDecl a = SomeDecl where SomeConst :: ConstDecl -> SomeDecl ConstDecl SomeField :: FieldDecl ->SomeDecl FieldDecl SomeFunction :: FunctionDecl -> SomeDecl FunctionDecl SomeType :: TypeDecl -> SomeDecl TypeDecl -- and then I would use type classes like this class TopLevelDecl decl where -- No methods needed instance TopLevelDecl ConstDecl instance TopLevelDecl FunctionDecl instance TopLevelDecl TypeDecl -- And then I could write functions like this: handleTopLevelDecl :: TopLevelDecl d => SomeDecl d -> Whatever handleTopLevelDecl (FunctionDecl f) = -- ... -- and I'd list all cases which have a TopLevelDecl instance, but not -- those which don't have. I also thought that my data types could look like -- this: data NamespaceDecl where NamespaceDecl :: { baseTree :: BaseTree, forall a. TopLevelDecl a => declarations :: [a] } Obviously this does not work, because type classes are open, and even though GHC prevents me from calling handleTopLevelDecl with (SOmeDecl FieldDecl), it still complains if I don't define an equation for it, because in some other module somebody might make a TopLevelDecl instance for FieldDecl. Now I would like to know if there is a way to solve my problem in an elegant way, e.g. by using type families. My second question is how to resolve the cyclicity of the node graph. I don't want to tie the knot, because I want to be able to manipulate the graph. Maps are one option, yes: data AnyNode = (list all possible nodes) type Id = Int type NodeIdMap = Map Id AnyNode -- Then my data types refer to other nodes like this: data FunctionDecl = FunctionDecl { parameters :: [Id] } Now I lose all type safety. I would have to insert a run-time check to every dereference of an Id if I want to make sure that function parameter declarations don't contain StringConstants. I've also considered using STRefs. During JSON parsing I would need to populate the nodes with the node Ids they refer to and then have a separate pass which turns all Ids to typed STRefs. At least with this approach I would get all possible run-time errors straight after parsing, and not when the data is used. I'm again sorry for this long elaboration. However I think that if you consider answering me you have a pretty good understanding of my problem and also a good knowledge of my level of understanding Haskell. I would like to unlearn OO-style thinking, and I would want to find a data and type representation which feels Haskellish. However I don't want to cut down the data. I want all information to be accessible in case somebody needs it. Thank you in advance! -- Aura