
Hi All,

In my quest to get Yhc bytecode compiled from Yhc Core I've discovered that I will need to make (yet more) changes to Core.

In short, the changes necessary are (from most substantial to least substantial):

  a) changing the way names are encoded
  b) adding to Core a list of symbols imported from other modules
  c) adding yet more things to the CorePrim data type

Some of these may break some people's code, but hopefully not too many or too much.

This email is quite long, so feel free to skip to the relevant sections if you don't want to read it all.

---------------------------------------------
BACKGROUND
---------------------------------------------

Previously I was looking at converting the names generated by Core back into nhc98's internal Id data type and then using the nhc98 symbol table.

I've decided this is a bad idea because it seriously limits what transformations could be made to Core; anything that would cause a mismatch with the nhc98 symbol table wouldn't work. This is very much against the spirit of converting the backend to generate from Core in the first place.

Thus the more ideal solution would be to convert the internal PosLambda to a Core form that contains enough information to do the complete bytecode generation process. Having done the translation, the nhc98 symbol table could then simply be forgotten about.

Unfortunately, Core at the moment doesn't quite have all the information needed.

---------------------------------------------
CHANGE a) CHANGING THE ENCODING OF NAMES
---------------------------------------------

I propose changing the way Core encodes names from Module.Item to Module;Item. For example, the fromJust function would appear as

    Data.Maybe;fromJust x = ...

instead of

    Data.Maybe.fromJust x = ....

-- CONSEQUENCES ---------

- Anyone who relies on being able to parse the names will find that their name parsing code breaks.
- Anyone trying to convert the names to valid Haskell identifiers will need to change their code.

-- REASON ---------------

The reason this change is necessary is to do with class instances and how the interpreter loads symbols. Consider the following class instance:

    module Foo.Bar where

    data Baz = Baz

    instance Eq Baz where
        a == b = True

The Core generated for the '==' function would currently look like:

    Foo.Bar.Prelude.Eq.Foo.Bar.Baz.== a b = True

This encodes:

- that the instance is defined in the Foo.Bar module
- that it is an instance of the class Prelude.Eq
- that the data type being given an instance is Foo.Bar.Baz
- that the function being defined is '=='

The problem is with the ambiguity in separating these components. Suppose some function defined in another module needs to use the == function for the Baz datatype. It would do this by asking the interpreter to load

    "Foo.Bar.Prelude.Eq.Foo.Bar.Baz.=="

In order to load this, the interpreter first needs to work out which module file it should load. Unfortunately, from this name alone it has no way of knowing. This name could be

    (Foo.Bar.Prelude).(Eq.Foo.Bar.Baz.==)

Or

    (Foo).(Bar.Prelude.Eq).(Foo.Bar.Baz.==)

Or even

    (Foo.Bar.Prelude.Eq.Foo.Bar.Baz).(==)

The name simply doesn't contain enough information to decide which part is the module name and which part is the item in that module.

I thus suggest changing the name Core generates to

    Foo.Bar;Prelude.Eq.Foo.Bar.Baz.==

which makes it clear that everything before the semicolon is the module name. Semicolon is a good choice of separator because it is one of the few characters that cannot appear in a valid Haskell identifier.

---------------------------------------------
CHANGE b) ADDING AN IMPORT TABLE
---------------------------------------------

I propose changing the Core datatype to include a list of symbols that are imported from other modules. So

    data Core = Core {
        ...
        coreImportSymbols :: [CoreImport]
        ...
    }

    data CoreImport = CoreImportData CoreData
                    | CoreImportFunc { coreImportName :: String, coreImportArity :: Int }

-- CONSEQUENCES ---------

- Anyone who does a complete pattern match on Core will find their code breaks, as it will have gained an extra field.

-- REASON ---------------

The only information Yhc Core currently provides about symbols defined in other modules is their name. This is not enough information to compile applications of those functions or case expressions over those datatypes.

For example, in module Foo you make an application of the function 'Bar.bar' such as

    Foo.foo x = Bar.bar (x+1)

To compile this application the compiler needs to know the arity of the bar function. Depending on the arity it will then either make a partial application, a saturated application or a super-saturated application (each of which would generate different bytecodes).

Similarly, when casing on a datatype

    Foo.foo x = case x of
                  Bar.Bar y -> ...

the compiler needs to know what the tag number for Bar.Bar is, and whether this case statement is complete or partial (again each has different bytecodes).

---------------------------------------------
CHANGE c) ADDING FIELDS TO CorePrim
---------------------------------------------

I propose changing the CorePrim datatype to:

    CorePrim {
        ...
        corePrimExternal :: String,   -- the 'C' name of the function
        corePrimConv     :: String,   -- the calling convention
        corePrimImport   :: Bool,     -- whether this is import/export
        corePrimTypes    :: [String]  -- the types of the arguments/return
    }

Three of these changes were suggested earlier. The types would be a simple encoding of the argument and return types, so

    foreign import malloc :: Int -> Ptr a

would have types

    [ "Prelude.Int", "Data.Foreign.Ptr a" ]

-- CONSEQUENCES ---------

- Anyone who does a complete pattern match on CorePrim (it's not recommended) will find their code breaks.
- Recommendation: from now on, people should not do a complete pattern match on CorePrim, and should instead use the field selectors and (CorePrim{}) for pattern matches. This will make it easier to accommodate any further changes to CorePrim (which may well be necessary).

-- REASON ---------------

The current CorePrim datatype does not contain enough information to compile calls to foreign functions. With the above changes it would contain enough, at least from this bytecode backend's point of view.

---------------------------------------------
CONCLUSION
---------------------------------------------

From a detailed look at the code, and a start at implementing the Yhc Core to Yhc bytecode compiler, I believe the changes listed above are everything that's necessary. I could easily be proven wrong on that one though ;-)

Cheers

Tom
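
P.S. To make changes a) and b) a bit more concrete, here is a rough sketch of what a consumer of the new information might do: split a name in the proposed Module;Item encoding, and use an imported function's arity to decide what kind of application is being compiled. The names splitCoreName, AppKind and classifyApp are made up for illustration and are not part of Yhc or Yhc Core; the real backend may well structure this differently.

```haskell
-- Hypothetical helpers, not part of Yhc: a sketch of how the proposed
-- "Module;Item" encoding and the import arity could be used.

-- | Split a name at the first semicolon into (module name, item name).
-- Returns Nothing when the name contains no ';' separator, e.g. a name
-- still in the old "Module.Item" encoding.
splitCoreName :: String -> Maybe (String, String)
splitCoreName name =
  case break (== ';') name of
    (modName, ';' : item) -> Just (modName, item)
    _                     -> Nothing

-- | How an application relates to the arity of the function being called.
data AppKind = Partial | Saturated | SuperSaturated
  deriving (Eq, Show)

-- | Classify an application from the callee's arity (as would be recorded
-- in coreImportArity) and the number of arguments actually supplied.
classifyApp :: Int -> Int -> AppKind
classifyApp arity nArgs =
  case compare nArgs arity of
    LT -> Partial
    EQ -> Saturated
    GT -> SuperSaturated
```

Loaded into GHCi, splitCoreName "Foo.Bar;Prelude.Eq.Foo.Bar.Baz.==" gives Just ("Foo.Bar","Prelude.Eq.Foo.Bar.Baz.=="), so the module to load is unambiguous, while the old dotted form gives Nothing because there is no separator to split on.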