
Hi All, In my quest to get Yhc bytecode compiled from Yhc Core I've discovered that I will need to make (yet more) changes to Core. In short the changes necessary are (from most substantial to least substantial): a) changing the way names are encoded b) adding to Core a list of symbols imported from other modules c) adding yet more things to the CorePrim data type Some of these may break some people's code, but hopefully not too many or too much. This email is quite long so feel free to skip to the relevant sections if you don't want to read it all. --------------------------------------------- BACKGROUND --------------------------------------------- Previously I was looking at converting the names generated by Core back into nhc98's internal Id data type and then using the nhc98 symbol table. I've decided this is a bad idea because it seriously limits what possible transformations could be made to core; anything that would cause a mismatch with the nhc98 symbol table wouldn't work. This is very much against the spirit of converting the backend to generate from Core in the first place. Thus the more ideal solution would be to convert the internal PosLambda to a Core form that contained enough information to do the complete bytecode generation process. Having done the translation the nhc98 symbol table could then simply be forgotten about. However, unfortunately Core at the moment, doesn't quite have all the information needed. --------------------------------------------- CHANGE a) CHANGING THE ENCODING OF NAMES --------------------------------------------- I propose changing the way Core encodes names from Module.Item to Module;Item. For example, the fromJust function would appear as Data.Maybe;fromJust x = ... instead of Data.Maybe.fromJust x = .... -- CONSEQUENCES --------- - Anyone who relies on being able to parse the names will find the name parsing code will break. - Anyone trying to convert the names to valid Haskell identifiers will need to change their code. -- REASON --------------- The reason this change is necessary is to do with class instances and how the interpreter load symbols. Consider the following class instance module Foo.Bar data Baz = Baz instance Eq Baz where a == b = True The Core generated for the '==' function would currently look like: Foo.Bar.Prelude.Eq.Foo.Bar.Baz.== a b = True This encodes: - that the instance is defined in the Foo.Bar module - that it is an instance of the class Prelude.Eq - that the data type being given an instance is Foo.Bar.Baz - that the function being defined is '==' The problem is with the ambiguity in separating these components. Suppose some function defined in another module needs to use the == function for the Baz datatype. It would do this by asking the interpreter to load "Foo.Bar.Prelude.Eq.Foo.Bar.Baz.==" In order to load this the interpreter first needs to work out which module file it should load. Unfortunately from this name alone it has no way of knowing. This name could be (Foo.Bar.Prelude).(Eq.Foo.Bar.Baz.==) Or (Foo).(Bar.Prelude.Eq).(Foo.Bar.Baz.==) Or even (Foo.Bar.Prelude.Eq.Foo.Bar.Baz).(==) The name simply doesn't contain enough information to decide which part is the module name and which part is the item in that module. I thus suggest changing the name Core generates to Foo.Bar;Prelude.Eq.Foo.Bar.Baz.== which makes it clear. Semicolon is a good choice of separator because it is one of the few characters that cannot appear in a valid Haskell identifier. --------------------------------------------- CHANGE b) ADDING AN IMPORT TABLE --------------------------------------------- I propose changing the Core datatype to include a list of symbols that are imported from other modules. So data Core = { ... coreImportSymbols :: [CoreImport] ... } data CoreImport = CoreImportData CoreData | CoreImportFunc { coreImportName :: String, coreImportArity :: Int } -- CONSEQUENCES --------- - Anyone who does a complete pattern match on Core will find their code breaks as it will have gained an extra field. -- REASON --------------- The only information Yhc Core currently provides about symbols defined in other modules is their name. This is not enough information to compile applications to those functions or make cases on those datatypes. For example, in module Foo you make an application to the function 'Bar.bar' such as Foo.foo x = Bar.bar (x+1) To compile this application the compiler needs to know the arity of the bar function. Depending on the arity it will then either make a partial application, a saturated application or a super-saturated application (each of which would generate different bytecodes). Similarly when casing on a datatype Foo.foo x = case x of Bar.Bar y -> ... The compiler needs to know what the tag number for Bar.Bar is, and whether this case statement is complete or partial (again each has different bytecodes). --------------------------------------------- CHANGE c) ADDING FIELDS TO CorePrim --------------------------------------------- I propose changing the CorePrim datatype to: CorePrim { ... corePrimExternal :: String, -- the 'C' name of the function corePrimConv :: String, -- the calling convention corePrimImport :: Bool, -- whether this is import/export corePrimTypes :: [String] -- the types of the arguments/return } Three of these changes were suggested earlier. The types would be a simple encoding of the arguments and return type, so. foreign import malloc :: Int -> Ptr a would have types [ "Prelude.Int", "Data.Foreign.Ptr a" ] -- CONSEQUENCES --------- - Anyone who does a complete pattern match on CorePrim (it's not recommended) will find their code breaks. - Recommendation: from now on people don't do a complete pattern match on CorePrim instead using the field selectors and (CorePrim{}) for pattern matches. This will make it easier to accommodate any further changes to CorePrim (which may well be necessary). -- REASON --------------- The current CorePrim datatype does not contain enough information to compile calls to foreign functions. The above changes would mean that from this bytecode backend's point of view this would no longer be true. --------------------------------------------- CONCLUSION --------------------------------------------- From a detailed look at the code, and a start at implementing the Yhc Core to Yhc bytecode compiler, I believe the changes listed above are everything that's necessary. I could easily be proven wrong on that one though ;-) Cheers Tom

Hi
CHANGE a) CHANGING THE ENCODING OF NAMES
I propose changing the way Core encodes names from Module.Item to Module;Item. For example, the fromJust function would appear as
Data.Maybe;fromJust x = ...
instead of
Data.Maybe.fromJust x = ....
I disagree. Data.Maybe.fromJust is how you would write the fully qualified name in Haskell, and is entirely unambiguous. One of the design decisions was to make Core and Haskell appear as similar as possible.
The reason this change is necessary is to do with class instances and how the interpreter load symbols. Consider the following class instance
Foo.Bar;Prelude.Eq.Foo.Bar.Baz.==
Why can't this encoding be applied only to class instances? For a class you need two module names and a function name, module1;module2.function should be plenty.
which makes it clear. Semicolon is a good choice of separator because it is one of the few characters that cannot appear in a valid Haskell identifier.
Agreed, semi-colon is a sensible choice.
CHANGE b) ADDING AN IMPORT TABLE
I propose changing the Core datatype to include a list of symbols that are imported from other modules.
Is this necessary? We currently have a list of imported modules in the Core data type. I would have thought you could build the import table outside of the Core data type, since its not information in the Core file, more information that is taken from other files.
The only information Yhc Core currently provides about symbols defined in other modules is their name. This is not enough information to compile applications to those functions or make cases on those datatypes.
From the import list in a Core file, you can load all the .hi files, and make a mapping outside of the Core data type. i.e, CorePlusImports = (Core, CoreImportData).
The advantage of not putting this inside the Core data type is that you can change it freely, and use a Data.Map for faster lookups etc. It also allows to add extra information like strictness (Yhc.Core already has a strictness analyser)
CHANGE c) ADDING FIELDS TO CorePrim
Three of these changes were suggested earlier.
All seem sensible.
- Recommendation: from now on people don't do a complete pattern match on CorePrim instead using the field selectors and
Agreed. We also have isCorePrim, and functions like coreFuncArity/coreFuncName which work on both primitives and functions. Thanks Neil

Neil Mitchell wrote:
I disagree. Data.Maybe.fromJust is how you would write the fully qualified name in Haskell, and is entirely unambiguous. One of the design decisions was to make Core and Haskell appear as similar as possible.
Well in the case of Data.Maybe.fromJust it is unambiguous however in my opinion consistency is more important than closeness to Haskell identifiers (perhaps others differ on this). Of course closeness to Haskell for identifiers is a bit of a moot point anyway since if you want to generate Haskell you need to mangle the names in any case. You can't generate a Haskell file containing Data.Maybe.fromJust x = ... If closeness to Haskell is deemed more important than consistency then the unambiguous cases could be left as they are. However I think much of the benefit of having the names close to Haskell will have already been lost. It's quite nice being able to say *all* the generated names look like Haskell identifiers. It's much less nice saying "well some of the names look like Haskell identifiers, except class instances (and possible some others) which are different". To my way of looking closeness to Haskell identifiers is something that should either be wholly kept or wholly sacrificed; 50-50 satisfies no-one.
Why can't this encoding be applied only to class instances? For a class you need two module names and a function name, module1;module2.function should be plenty.
Sadly whilst class instances are the major culprit they are not the only one, local functions get the same kind of problems. For example module Foo foo x = bar x where bar x = ... generates Foo.Foo.Prelude.200.bar v217 v218 = ... this is just about decidable, but it's awfully hard work. There may also be other things that I'm not aware of that do this too. In my opinion the cleanest resolution is to say that the module name is always separated with a ';'.
Is this necessary? We currently have a list of imported modules in the Core data type. I would have thought you could build the import table outside of the Core data type, since its not information in the Core file, more information that is taken from other files.
Well thinking about it it's probably not strictly necessary for my purposes - it would be possible to instead use the 'String -> Id' map now generated by the conversion process. However anyone else wanting to compile Core who wants to retain the ability to do separate compilation is likely to need the information and they probably wont have that mapping available. That said ...
The advantage of not putting this inside the Core data type is that you can change it freely, and use a Data.Map for faster lookups etc. It also allows to add extra information like strictness (Yhc.Core already has a strictness analyser)
These are both good points, of course I was simply going to use the list to build a Data.Map and then use that for the compilation process. But the strictness information is a reasonable issue. That said the .hi files don't contain strictness information either so adding strictness would already involve a .hi file change (which is not a minor thing). I guess either way is fine on this one ... Cheers Tom

Hi
Well in the case of Data.Maybe.fromJust it is unambiguous however in my opinion consistency is more important than closeness to Haskell identifiers (perhaps others differ on this).
Closeness to Haskell reduces the learning curve, and also makes it easier to follow when debugging. ; vs . is not massive, so probably isn't massively different. The other thing that directs me slightly further towards ; is the (.) function - Prelude.. feels ugly, Prelude;. feels less so.
Of course closeness to Haskell for identifiers is a bit of a moot point anyway since if you want to generate Haskell you need to mangle the names in any case. You can't generate a Haskell file containing
Data.Maybe.fromJust x = ...
Generating resultant Haskell is a rare case, so this shouldn't have any bearing on our choice. We also need to encode much more than just this.
Sadly whilst class instances are the major culprit they are not the only one, local functions get the same kind of problems. For example
module Foo
foo x = bar x where bar x = ...
generates
Foo.Foo.Prelude.200.bar v217 v218 = ...
I find this case much more compelling than the class one. However, it seems in this case that 200 is a "unique" bit, so why not Foo.200_bar, which is still unique, and entirely unambiguous. The Prelude, and the duplication of Foo, both seem really random.
this is just about decidable, but it's awfully hard work. There may also be other things that I'm not aware of that do this too. In my opinion the cleanest resolution is to say that the module name is always separated with a ';'.
I am still unconvinced. I agree that the rule for extracting module names should be simple and unambiguous, but still think introducing ; is probably unnecessary.
now generated by the conversion process. However anyone else wanting to compile Core who wants to retain the ability to do separate compilation is likely to need the information and they probably wont have that mapping available.
Anyone wanting to do separate compilation in Core will need some sort of .hi file, and is likely to need custom information from each .hi file. At the moment that would be awfully painful, once its needed we'll generalise the .hi file mechanism.
These are both good points, of course I was simply going to use the list to build a Data.Map and then use that for the compilation process. But the strictness information is a reasonable issue. That said the .hi files don't contain strictness information either so adding strictness would already involve a .hi file change (which is not a minor thing).
As said, we need to generalise .hi once we support this. Thanks Neil

Neil Mitchell wrote:
Closeness to Haskell reduces the learning curve, and also makes it easier to follow when debugging. ; vs . is not massive, so probably isn't massively different. The other thing that directs me slightly further towards ; is the (.) function - Prelude.. feels ugly, Prelude;. feels less so.
Yes I think "Data.Maybe.fromJust" is a lot better than "Data.Maybe._417", but I think the difference between "Data.Maybe.fromJust" and "Data.Maybe;fromJust" is fairly small ...
Generating resultant Haskell is a rare case, so this shouldn't have any bearing on our choice. We also need to encode much more than just this.
fair enough.
I find this case much more compelling than the class one. However, it seems in this case that 200 is a "unique" bit, so why not Foo.200_bar, which is still unique, and entirely unambiguous. The Prelude, and the duplication of Foo, both seem really random.
That's interesting since I can see absolutely no way to make the class case unambiguous except by adding a different separator, in that way it would seem the more compelling case :-) Converting all such names to "Foo.200_bar" would indeed fix the problem, though the conversion process might be a bit fiddly ...
I am still unconvinced. I agree that the rule for extracting module names should be simple and unambiguous, but still think introducing ; is probably unnecessary.
Well I think it's definitely necessary in the case of classes, and if you're going to do it for classes why not be consistent and do it everywhere? It would definitely be useful to know what other people think on this question ... As a little side note for the same reasons the current system doesn't guarantee unique names for class instances. (ModA.ModB).(ModC.ClassD).(ModE.DataF).g and (ModA).(ModB.ModC.ClassD).(ModE.DataF).g are different instances, but they have the same unique Core name. Pretty unlikely to occur in practice but possible none the less.
Anyone wanting to do separate compilation in Core will need some sort of .hi file, and is likely to need custom information from each .hi file. At the moment that would be awfully painful, once its needed we'll generalise the .hi file mechanism.
Well ... possibly though I think you could do a lot just knowing the arity and data constructors. Of course for strictness information the core strictness analyser could even transform the core to an appropriately strict version (through the use of cases) in which case no strictness information needs to be passed to the back-end at all. In many ways this is the cleaner solution. As I say, I'm happy enough to leave that change out, I just imagined it might be useful to people. Cheers Tom

Hi
Converting all such names to "Foo.200_bar" would indeed fix the problem, though the conversion process might be a bit fiddly ...
Yes, but I think it would be worth it.
Well I think it's definitely necessary in the case of classes, and if you're going to do it for classes why not be consistent and do it everywhere?
I agree its definitely necessary in the case of classes. I don't see treating classes like the other functions as being consistent. Classes are fundamentally different - most functions have one module name, classes have three - this is a built in inconsistency. I can see advantages to being able to "spot" class dictionaries and recover precisely their original context information, I've wanted it before and this desugaring scheme, applied only to classes, would give that.
Anyone wanting to do separate compilation in Core will need some sort of .hi file, and is likely to need custom information from each .hi file. At the moment that would be awfully painful, once its needed we'll generalise the .hi file mechanism.
Well ... possibly though I think you could do a lot just knowing the arity and data constructors. Of course for strictness information the core strictness analyser could even transform the core to an appropriately strict version (through the use of cases) in which case no strictness information needs to be passed to the back-end at all. In many ways this is the cleaner solution.
Don't you then pay to create a thunk, then evaluate it later? I thought that this was one of the main costs saved by strictness, which wouldn't be saved by a case transformation. Thanks Neil

Neil Mitchell wrote:
I agree its definitely necessary in the case of classes. I don't see treating classes like the other functions as being consistent. Classes are fundamentally different - most functions have one module name, classes have three - this is a built in inconsistency. I can see advantages to being able to "spot" class dictionaries and recover precisely their original context information, I've wanted it before and this desugaring scheme, applied only to classes, would give that.
Of course it's still very easy in the case that it's applied everywhere. You split the string based on ';'. If you get two parts its a normal name, if you get 3 or 4 parts its a class instance. Three is possible because the instance dictionary is generated as Foo.Bar;Prelude.Eq;Foo.Bar.Baz = (,) Foo.Bar;Prelude.Eq;Foo.Bar.Baz;== Foo.Bar;Prelude.Eq;Foo.Bar.Baz;/=
Don't you then pay to create a thunk, then evaluate it later? I thought that this was one of the main costs saved by strictness, which wouldn't be saved by a case transformation.
Conceptually, no case f x of _ -> ... is a perfectly reasonable description to the backend that 'f x' should be compiled strictly (i.e. without thunks).
I would have thought the best encoding would be:
Foo.Bar;Foo.Bar.Baz;Prelude.Eq.==
OR
Prelude.Eq;Foo.Bar.Baz;Foo.Bar.==
In the first the rule for extracting module names is "up to the first ; or last .", in the second its "skip all bits before ;" - and I'm leaning towards the first. I read the first as "hey, I'm located in Foo.Bar, I operate over Foo.Bar.Baz and I'm really just a type specific version of Prelude.Eq.=="
Yeah that would be fine, don't forget that the class dictionary is Foo.Bar;Foo.Bar.Baz;Prelude.Eq if that has any bearings on trying to understand what things are.
Whatever scheme we do go with, I suggest a coreFunModule function be added which obtains the module from a function name, and assuming dictionaries are unambiguous, coreFunDictionary to split up the pieces.
or even 'coreModule' which would also work for data types. 'coreFunInstance' is possibly a better name than 'coreFunDictionary'. Cheers Tom

Hi
Of course it's still very easy in the case that it's applied everywhere. You split the string based on ';'. If you get two parts its a normal name, if you get 3 or 4 parts its a class instance. Three is possible because the instance dictionary is generated as
Foo.Bar;Prelude.Eq;Foo.Bar.Baz = (,) Foo.Bar;Prelude.Eq;Foo.Bar.Baz;== Foo.Bar;Prelude.Eq;Foo.Bar.Baz;/=
True, this leads me to the conclusion that in dictionaries we separate all pieces with ; - leaving 3 or 4 pieces. I know you can still split the dictionary, but by the time you've got 4 ;'s floating around, you are a million miles from either Data.Maybe.fromJust or Data.Maybe;fromJust - so I think being consistent with Haskell wins.
Don't you then pay to create a thunk, then evaluate it later? I thought that this was one of the main costs saved by strictness, which wouldn't be saved by a case transformation.
Conceptually, no
case f x of _ -> ...
is a perfectly reasonable description to the backend that 'f x' should be compiled strictly (i.e. without thunks).
Yes, but: foo x y = case x of _ -> ... And in another module: bar x = foo (f x) x Now the (f x) thunk is still created, then immediately forced. If we are expanding our .hi file format (which should only be done once its binary!) we could add some inlining information as well, esp for things like otherwise. Thanks Neil

Neil Mitchell wrote:
True, this leads me to the conclusion that in dictionaries we separate all pieces with ; - leaving 3 or 4 pieces. I know you can still split the dictionary, but by the time you've got 4 ;'s floating around, you are a million miles from either Data.Maybe.fromJust or Data.Maybe;fromJust - so I think being consistent with Haskell wins.
Well I still prefer my suggestion that all module names be separated with ';', however, in the interests of making progress I'll agree the the Neil variant, which gives valid examples as: - Simple function Data.Maybe.fromJust - Local function Foo.200_bar - instance dictionary Foo.Bar;Foo.Bar.Baz;Prelude.Eq - instance method Foo.Bar;Foo.Bar.Baz;Prelude.Eq;== INVARIANT: - In a simple dotted name (e.g. Data.Maybe.fromJust) the non-module part of the name may not contain a dot unless the first character of the name is a non alpha/underscore. The complication is to accommodate for operators containing '.' like: Foo.Bar.+.+ which is of course (Foo.Bar).(+.+). Note that this also works correctly for local functions that are operators containing '.'. For example Foo.Bar.200_+.+ obeys the rule because it starts with a digit and not with an alpha or an underscore. I'll have the name conversion routine balk and die on anything that doesn't conform to one of those 4 forms. I'll also leave out the import stuff for now, with a view that it's easy enough to add it later if people do decide they do want it. The conversion routine is going to generate the same information in either case, it's just whether it gets put in the core structure or not. I'll add the new stuff to the primitives, if anyone has any additional suggestions please say so :-)
Yes, but:
foo x y = case x of _ -> ...
And in another module:
bar x = foo (f x) x
Now the (f x) thunk is still created, then immediately forced.
If we are expanding our .hi file format (which should only be done once its binary!) we could add some inlining information as well, esp for things like otherwise.
Ah yes if you want separate compilation you need to include the information in the .hi file, you're quite right :-) Cheers Tom

Hi Tom
Well I still prefer my suggestion that all module names be separated with ';'
Foo.Bar.+.+
which is of course (Foo.Bar).(+.+). Note that this also works correctly for local functions that are operators containing '.'. For example
Foo.Bar.200_+.+
Hmm, now I'm changing my mind. Those examples make me think that ; is more sensible - since otherwise you are really screwing with small differences in what makes an operator or not - I especially dislike the "leading digits" rule. I would rather that Core didn't care too much about operator vs function differences, but these two examples show it will have to. Therefore I think ; makes more sense. If we use ; we can also change it so: module Foo where foo = ... where bar = ... Foo;foo.bar - which is exactly what you would want, to keep as much info as possible. Of course, this change could be done sometime in the future.
I'll also leave out the import stuff for now, with a view that it's easy enough to add it later if people do decide they do want it. The conversion routine is going to generate the same information in either case, it's just whether it gets put in the core structure or not.
Agreed, ditto with the primitive stuff. Thanks Neil

Neil Mitchell wrote:
Hmm, now I'm changing my mind. Those examples make me think that ; is more sensible - since otherwise you are really screwing with small differences in what makes an operator or not - I especially dislike the "leading digits" rule. I would rather that Core didn't care too much about operator vs function differences, but these two examples show it will have to. Therefore I think ; makes more sense.
Okay. As a little side note, you won't believe how nhc98 stores local functions (which I've just discovered) module Foo foo = .. where bar = ... The name Foo.Foo.Prelude.200.bar is actually stored in nhc98 as an instance dictionary! - defined in the module Foo - on the datatype Prelude.200, i.e. the two hundred tuple!?! - to the class(!?) Foo.Foo.bar Yes, really. Thus currently my code encodes the name as Foo;Prelude.(,,,,, ... ,,,,,,,,);Foo.Foo.bar The fiddly bit is going to be deciding that this isn't an instance dictionary but is actually a locally defined function ... *sigh*
If we use ; we can also change it so:
module Foo where
foo = ... where bar = ...
Foo;foo.bar - which is exactly what you would want, to keep as much info as possible.
Of course, this change could be done sometime in the future.
Yes this would be ideal, but unfortunately it would involve some extensive changes as currently nhc98 doesn't store that much information about what a function was defined inside. Cheers :-) Tom

Hi
Okay. As a little side note, you won't believe how nhc98 stores local functions (which I've just discovered)
module Foo
foo = .. where bar = ...
The name Foo.Foo.Prelude.200.bar is actually stored in nhc98 as an instance dictionary!
- defined in the module Foo - on the datatype Prelude.200, i.e. the two hundred tuple!?! - to the class(!?) Foo.Foo.bar
Yes, really. Thus currently my code encodes the name as
Foo;Prelude.(,,,,, ... ,,,,,,,,);Foo.Foo.bar
LOL!!!!
The fiddly bit is going to be deciding that this isn't an instance dictionary but is actually a locally defined function ... *sigh*
Isn't the solution to change this at generation time? I would have thought it was hard to fix up later. Thanks Neil

module Foo
foo x = bar x where bar x = ...
generates
Foo.Foo.Prelude.200.bar v217 v218 = ...
I don't see where the Prelude bit came from, but I'm assuming that the second "Foo" was a typo for lower-case "foo".
so why not Foo.200_bar, which is still unique, and entirely unambiguous.
FWIW, my preference would be to keep a reference to the enclosing function name in the local name, e.g. Foo.foo.200.bar This style has the advantage of remaining decipherable by the careful reader. "bar" is a local definition within "Foo.foo", and because there are potentially many different local "bar"s within different clauses of "Foo.foo", the number disambiguates. Even better if the number was essentially a line number or similar (clause number?), to make it even more readable. Regards, Malcolm

Hi
Foo.Foo.Prelude.200.bar v217 v218 = ...
I don't see where the Prelude bit came from, but I'm assuming that the second "Foo" was a typo for lower-case "foo".
Thin air. And no typo, it duplicates the module name twice.
FWIW, my preference would be to keep a reference to the enclosing function name in the local name, e.g.
Foo.foo.200.bar
This style has the advantage of remaining decipherable by the careful reader. "bar" is a local definition within "Foo.foo", and because there are potentially many different local "bar"s within different clauses of "Foo.foo", the number disambiguates.
I agree, which necessitates Foo;foo.200.bar, to split the components back up.
Even better if the number was essentially a line number or similar (clause number?), to make it even more readable.
It can't be a line number, since you can potentially put two clauses on the same line. You could certainly do better with the desugaring, but I think a more annoying desugar problem comes with _LAMBDA's which crop up quite a bit, and have absolutely nothing tying them back to their original location. Thanks Neil

Even better if the number was essentially a line number or similar (clause number?), to make it even more readable.
It can't be a line number, since you can potentially put two clauses on the same line. You could certainly do better with the desugaring, but I think a more annoying desugar problem comes with _LAMBDA's which crop up quite a bit, and have absolutely nothing tying them back to their original location.
How much in Core has an "obvious" translation from Haskell, such that different implementations or versions, can compile the same Haskell code and get the exact same Core text out? If only local things like the details of how let/lambda-bound variables are named, vary, there is more possible separate compilation of separate files (or something like that...). I think (line,column) is sufficiently unique identification? Maybe 200_4 would represent line 200, column 4. (or "200.4"? By itself, that looks like a fractional number, which is bad) Isaac

Hi
How much in Core has an "obvious" translation from Haskell, such that different implementations or versions, can compile the same Haskell code and get the exact same Core text out?
Not a whole lot. There is also the issue of optimisations, even the very basic ones will permute expressions around. You can get the same top-level declarations regardless, but that is about it.
If only local things like the details of how let/lambda-bound variables are named, vary, there is more possible separate compilation of separate files (or something like that...).
The top level interface can be made very solid, so separate compilation still works fine.
I think (line,column) is sufficiently unique identification? Maybe 200_4 would represent line 200, column 4. (or "200.4"? By itself, that looks like a fractional number, which is bad)
That may be sufficient, or not. Consider: f x = (a,b) where (a,b) = x This compiles to something like: f x = (fst x, snd x) However, fst and snd are actually lambda's, which could conceivably have the same source position. I do think line_col could be made to work though, with some thought, and it would be a very handy scheme. Thanks Neil

Tom Shackell wrote:
Of course closeness to Haskell for identifiers is a bit of a moot point anyway since if you want to generate Haskell you need to mangle the names in any case. You can't generate a Haskell file containing
Data.Maybe.fromJust x = ...
Actually, you can. As long as "the module name with which you're qualifying the symbol you're defining", is the same name as "the name of the module you are in". 200_bar --A. starts with a number 200_+.+ --B. part varid, part varsym instance names --C. very complicated (A) might be changed with a prefix; (b) and (c) would require more mangling to make them reasonable names, so it may not be worth it. The semicolon seems good - it's not even part of "[]", "(,,,,)", etc. (which is why the comma on second thought seems like a bad idea to me) Of course it is quite contradictory to the way the semicolon is used in Haskell syntax. Foo.Bar;Prelude.Eq;Foo.Bar.Baz;== ok Foo.Bar;Foo.Bar.Baz;Prelude.Eq.== I don't think this one makes much sense, since Prelude is the module and Prelude.Eq is in it. Prelude.Eq;Foo.Bar.Baz;Foo.Bar.== not unreasonable, although do consider that it's possible to define several Foo.Bar.=='s each part of different classes defined in different modules. All of these however, don't put semicolons between Prelude.Eq (why not Prelude;Eq ?) Foo.Bar.Baz (why not Foo.Bar;Baz ?) . Since qualification is required, Foo.Bar;Prelude;Eq;Foo.Bar;Baz;== would be perfectly unambiguous, containing the three relevant modules, the class name, the type name, (and the member name when it's not the whole dictionary being referred to). Prelude.Eq and Foo.Bar.Baz are tightly tied together, though, and it would be nice for them to use a different separator symbol. such as Foo.Bar#Prelude;Eq#Foo.Bar;Baz#== if we can only come up with a good '#' (two semicolons?? some curly bracket(s)?? the "`" symbol used in haskell to make functions infix??) Anyway, code that uses it should have an abstraction (data type or just functions) for dealing with them in a comprehensible way. So we are free to choose any simple unambiguous form we want. BTW. good work on Core, I hope to be compiling to it some day soon (don't get your hopes up, I have more unfinished plans than time ;) Isaac

On Sat, Aug 04, 2007 at 11:01:38PM +0100, Tom Shackell wrote:
Consider the following class instance
module Foo.Bar
data Baz = Baz
instance Eq Baz where a == b = True
The Core generated for the '==' function would currently look like:
Foo.Bar.Prelude.Eq.Foo.Bar.Baz.== a b = True
This encodes: - that the instance is defined in the Foo.Bar module - that it is an instance of the class Prelude.Eq - that the data type being given an instance is Foo.Bar.Baz - that the function being defined is '=='
suggest changing the name Core generates to
Foo.Bar;Prelude.Eq.Foo.Bar.Baz.==
How do you know that this isn't: - that the instance is defined in the Foo.Bar module - that it is an instance of the class Prelude.Eq.Foo - that the data type being given an instance is Bar.Baz - that the function being defined is '==' ? Thanks Ian

Ian Lynagh wrote:
How do you know that this isn't:
- that the instance is defined in the Foo.Bar module - that it is an instance of the class Prelude.Eq.Foo - that the data type being given an instance is Bar.Baz - that the function being defined is '=='
?
Indeed you don't, however, the yhc compiler only cares about being able to separate the module name so the change is sufficient for its purposes. To give a truely unique and unambiguous name something like Foo.Bar;Prelude.Eq;Foo.Bar.Baz;== would be necessary. This would also be a possible encoding if people thought it was preferable. Cheers :-) Tom

Hi
Indeed you don't, however, the yhc compiler only cares about being able to separate the module name so the change is sufficient for its purposes.
To give a truely unique and unambiguous name something like
Foo.Bar;Prelude.Eq;Foo.Bar.Baz;==
would be necessary. This would also be a possible encoding if people thought it was preferable.
I would have thought the best encoding would be: Foo.Bar;Foo.Bar.Baz;Prelude.Eq.== OR Prelude.Eq;Foo.Bar.Baz;Foo.Bar.== In the first the rule for extracting module names is "up to the first ; or last .", in the second its "skip all bits before ;" - and I'm leaning towards the first. I read the first as "hey, I'm located in Foo.Bar, I operate over Foo.Bar.Baz and I'm really just a type specific version of Prelude.Eq.==" Although I would be perfectly happy with 3 semi-colons - I think it is important to know and track class dictionaries at the Core level. Whatever scheme we do go with, I suggest a coreFunModule function be added which obtains the module from a function name, and assuming dictionaries are unambiguous, coreFunDictionary to split up the pieces. Thanks Neil
participants (5)
-
Ian Lynagh
-
Isaac Dupree
-
Malcolm Wallace
-
Neil Mitchell
-
Tom Shackell