Proposals for changes to searching behaviour

Hi Folks,

Henrik Nilsson & I have been discussing the current behaviour of GHC when searching for modules, particularly in combination with hierarchical modules, and have identified two ways in which things might be made more flexible.

The current situation forces you to put sources in a directory hierarchy which mirrors the module hierarchy. While simple, this might be inconvenient, and will certainly be painful when the module hierarchy gets deeper. Also, the current situation seems to have caught people out more than once (perhaps that's a documentation problem, though).

My own preference is for keeping things as simple as possible, so I'm generally in favour of the status quo - but we always value feedback from our users, so please let us know what you think.

Anyway, here are the two suggestions:

- The sources for a module A.B.C would be allowed to be placed in either A.B.C.hs or A/B/C.hs relative to one of the directories in the search path. Currently only A/B/C.hs is allowed. This is an easy change to make, and I believe Hugs already does it this way.

- We could provide the ability to specify a module prefix to associate with a directory in the search path. For example, you could say that the directory '.' is associated with the module prefix "Graphics.Rendering.OpenGL" and avoid having to place your sources in the directory Graphics/Rendering/OpenGL. I'm not sure what syntax we'd use for this. Henrik suggested placing the module prefix in square brackets before the directory, eg. ghc -i '-i[Graphics.Rendering.OpenGL].' In contrast to the previous suggestion, this would actually save some trips to the OS when GHC is looking for files.

Please let us know if either of these would make your life easier, or if there's anything else you'd like to see.

Cheers,
Simon
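
For readers who want to see the first suggestion concretely, here is a minimal sketch of the candidate file names involved. The function names (splitModule, candidateFiles, searchCandidates) are made up for illustration and are not GHC internals, and the order in which the two forms are tried is an assumption, since the proposal does not specify one.

```haskell
import Data.List (intercalate, nub)

-- | Split a dotted module name such as "A.B.C" into its components.
splitModule :: String -> [String]
splitModule s = case break (== '.') s of
  (c, [])       -> [c]
  (c, _ : rest) -> c : splitModule rest

-- | Candidate source files for a module under the first suggestion.
-- Assumption: the existing hierarchical form is tried before the flat one.
candidateFiles :: String -> [FilePath]
candidateFiles m = nub
  [ intercalate "/" cs ++ ".hs"   -- A/B/C.hs (the only form allowed today)
  , intercalate "." cs ++ ".hs"   -- A.B.C.hs (the proposed addition)
  ]
  where cs = splitModule m

-- | All paths to try, for each directory on the search path in order.
searchCandidates :: [FilePath] -> String -> [FilePath]
searchCandidates dirs m = [ d ++ "/" ++ f | d <- dirs, f <- candidateFiles m ]
```

With this sketch, candidateFiles "A.B.C" yields the two names from the proposal, so each search-path directory costs at most two lookups per module instead of one.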

On Mon, Dec 09, 2002 at 12:40:18PM -0000, Simon Marlow wrote:
- The sources for a module A.B.C would be allowed to be placed in either A.B.C.hs or A/B/C.hs relative to one of the directories in the search path. Currently only A/B/C.hs is allowed.
Please let us know if either of these would make your life easier, or if there's anything else you'd like to see.
Since the problem is often that your program/library has a manageable depth, but it is _located_ very deep in the hierarchy (eg. User.* modules if you have a long and complicated domain), then how about allowing A.B/C.hs for module A.B.C? Then you don't need to have a long, mostly empty dummy hierarchy.
Lauri Alanko
la@iki.fi

Hi Lauri,
Since the problem is often that your program/library has a manageable depth, but it is _located_ very deep in the hierarchy (eg. User.* modules if you have a long and complicated domain), then how about allowing A.B/C.hs for module A.B.C? Then you don't need to have a long, mostly empty dummy hierarchy.
Simon M. and I thought about that, but felt that it was unnecessarily complicated since it would lead to the number of possible file names for a module growing exponentially with the number of components in the module name. It would probably be possible to devise a recursive search algorithm to deal with this, but that would be significantly more complicated than what GHC currently does, and, worse, users would also have to be aware of the exact algorithm used. Probably not worth it.
But note that with the suggested approach, you can flatten the hierarchy as much or as little as you like, and you can call the top-level directory whatever you like, including "A.B", since the compiler does not care about that name.
/Henrik
--
Henrik Nilsson
Yale University
Department of Computer Science
nilsson@cs.yale.edu
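
To make the growth Henrik mentions concrete, here is a small sketch that enumerates every file name you would get if "." and "/" could be mixed freely between components. The function mixedCandidates is a hypothetical helper, not something GHC or Hugs provides. A module name with n components has n-1 separators, each with two choices, hence 2^(n-1) candidates per search-path directory.

```haskell
-- | Every file name obtained by joining the module's components with
-- either '/' or '.', independently at each position.
mixedCandidates :: [String] -> [FilePath]
mixedCandidates []       = []
mixedCandidates [c]      = [c ++ ".hs"]
mixedCandidates (c : cs) =
  [ c ++ [sep] ++ rest | sep <- "/.", rest <- mixedCandidates cs ]
```

For A.B.C this gives four names, including Lauri's A.B/C.hs. A five-component name such as Graphics.Rendering.OpenGL.GL.Foo would already give sixteen.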

On Mon, Dec 09, 2002 at 12:40:18PM -0000, Simon Marlow wrote:
- The sources for a module A.B.C would be allowed to be placed in either A.B.C.hs or A/B/C.hs relative to one of the directories in the search path. Currently only A/B/C.hs is allowed.
This is an easy change to make, and I believe Hugs already does it this way.
I like this idea, especially if this is currently the way Hugs does it. It's great for smaller projects.
- We could provide the ability to specify a module prefix to associate with a directory in the search path. For example, you could say that the directory '.' is associated with the module prefix "Graphics.Rendering.OpenGL" and avoid having to place your sources in the directory Graphics/Rendering/OpenGL.
I'm not sure what syntax we'd use for this. Henrik suggested placing the module prefix in square brackets before the directory, eg. ghc -i '-i[Graphics.Rendering.OpenGL].'
This seems a bit unpredictable to me; it means that you can
have a whole bunch of unrelated modules sitting together in the
same directory, and then confuse the user even more with obscure
GHC commandline switches :).
I'd argue that if you have a Graphics.Rendering.OpenGL module,
you should make it 100% obvious that the module is in
a Graphics.Rendering category; either putting it in a
Graphics/Rendering directory or having
a Graphics.Rendering.OpenGL.hs file makes this explicit.
To put it another way -- is there a situation where you don't
want to use either of the above two module naming schemes, and
can justify having unrelated modules in an arbitrarily organised
directory structure?
--
#ozone/algorithm

Hi Andre,
I like this idea, especially if this is currently the way Hugs does it. It's great for smaller projects.
Yes, we believe Hugs does allow "A.B.C.hs" as well as "A/B/C.hs". Ultimately, I think it is actually going to be quite important that the different Haskell tools provide reasonably compatible ways of finding sources/libraries. That includes the mechanisms discussed here as well as a package mechanism.
I'm not sure what syntax we'd use for this. Henrik suggested placing the module prefix in square brackets before the directory, eg. ghc -i '-i[Graphics.Rendering.OpenGL].'
This seems a bit unpredictable to me; it means that you can have a whole bunch of unrelated modules sitting together in the same directory, and then confuse the user even more with obscure GHC commandline switches :).
First, the point here is to reduce the number of assumptions built into a Haskell system about where sources live and what they are called. The fewer assumptions that are made, the greater the likelihood that it will interoperate smoothly with other tools. Try to use the current mapping from hierarchical module names to file names with the Make VPATH mechanism for an example of what I mean.
Second, I don't think this particular example of command line syntax is that obscure. Being able to tell GHC (and ultimately, I would hope, other Haskell implementations) what part of the module hierarchy they can find along a certain search path seems quite natural to me.
I'd argue that if you have a Graphics.Rendering.OpenGL module, you should make it 100% obvious that the module is in a Graphics.Rendering category; either putting it in a Graphics/Rendering directory or having a Graphics.Rendering.OpenGL.hs file makes this explicit.
I see little merit in having a Haskell system enforce such rules. When dealing with large systems possibly involving many different languages and tools, it is very hard to predict what kind of source structure is going to be most suitable and easiest for someone to get familiar with. Ultimately, the person(s) implementing an application/library is/are the one(s) best qualified for making such decisions, and the tools should ideally support that (within reason), not get in the way. Furthermore, different people might have different opinions on what's obvious, or what's the best tradeoff between "obviousness" and convenience.
To put it another way -- is there a situation where you don't want to use either of the above two module naming schemes, and can justify having unrelated modules in an arbitrarily organised directory structure?
Well, again, ultimately I think the application/library implementor(s) should have the ultimate say as to how to organize his/her/their sources, and which files are sufficiently related to be put in a single directory.
One reason I don't particularly like the "A/B/C" scheme is that my sources can end up being spread out over several directories just because of the names I happen to choose for the modules. If, say, a library consists of the top-level module "A.B.C" and a bunch of internal components "A.B.C.M1", "A.B.C.M2", etc., I can't see why I should not be allowed to put them all in one directory.
Another reason is how it interacts with tools like "Make". I've already mentioned the VPATH mechanism. Other make facilities like its (as well as the invoked shell's) wildcard support for file name matching also become much less useful.
The reason I'm not quite happy with "fully qualified" file names is that they could become inconveniently long, and that it still can make sense to use directories for part of the module hierarchy.
I think the two suggestions ("." as an alternative to "/", and the possibility to associate a search path with a module prefix) complement each other quite nicely, yielding a scheme which lets the implementors decide how to best organize their source code.
/Henrik
--
Henrik Nilsson
Yale University
Department of Computer Science
nilsson@cs.yale.edu
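
As a rough illustration of how the two suggestions could combine, here is a sketch of a lookup in which each search-path entry carries a module prefix. The Entry type and the functions entryCandidates and resolve are hypothetical names, and the behaviour when a module name equals the prefix exactly is an assumption, since the thread does not settle it.

```haskell
import Data.List (intercalate, nub, stripPrefix)

-- | A search-path entry: a directory plus the module prefix it stands for.
-- An empty prefix behaves like an ordinary search-path directory.
data Entry = Entry { entryPrefix :: [String], entryDir :: FilePath }

-- | Candidate files for a module (given as its components) under one entry.
entryCandidates :: Entry -> [String] -> [FilePath]
entryCandidates (Entry pre dir) modl =
  case stripPrefix pre modl of
    Nothing   -> []   -- prefix does not match: no file system lookups at all
    Just []   -> []   -- assumption: the prefix on its own names no file
    Just rest -> nub [ dir ++ "/" ++ intercalate sep rest ++ ".hs"
                     | sep <- ["/", "."] ]

-- | Candidates across the whole search path, in order.
resolve :: [Entry] -> [String] -> [FilePath]
resolve entries modl = concatMap (`entryCandidates` modl) entries
```

With the entry Entry ["A","B"] ".", the module A.B.C maps to ./C.hs and A.B.C.M1 maps to ./C/M1.hs or ./C.M1.hs, so the library from the example above could live in a single directory as C.hs, C.M1.hs, C.M2.hs. Entries whose prefix does not match are skipped without touching the file system, which is the saving in OS trips mentioned in the original proposal.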

On Mon, Dec 09, 2002 at 12:03:10PM -0500, nilsson@cs.yale.edu wrote:
If, say, a library consists of the top-level module "A.B.C" and a bunch of internal components "A.B.C.M1", "A.B.C.M2", etc., I can't see why I should not be allowed to put them all in one directory.
I think that's the selling point for me. I'm now convinced it's a good idea. That being said, I wrote that message because I've been struggling with obscure build tools for the last few days, which work fine until they break -- then you're in for hours of pain. So I prefer to stick to fairly obvious, explicit ways of doing something, and I guess I saw Simon's second suggestion as another way to have more pain-inducing bizarre build schemes via make. (Of course, this relies on the developer wanting to invent bizarre Makefiles, but I've seen _plenty_ of Makefiles where I can't even _begin_ to work out how they work. That's my fear.)
Another reason is how it interacts with tools like "Make". I've already mentioned the VPATH mechanism.
It's all bad once that VPATH word gets mentioned ;).
--
#ozone/algorithm

It's all bad once that VPATH word gets mentioned ;).
Granted! But other work-arounds can be even worse! And again, if an implementor for some reason thinks he or she could benefit from using a mechanism like VPATH, it's good if that's not too painful.
/Henrik
--
Henrik Nilsson
Yale University
Department of Computer Science
nilsson@cs.yale.edu

Hi,
I'm not sure what syntax we'd use for this. Henrik suggested placing the module prefix in square brackets before the directory, eg. ghc -i '-i[Graphics.Rendering.OpenGL].'
In contrast to the previous suggestion, this would actually save some trips to the OS when GHC is looking for files.
Actually, I suggested putting it *after*. But putting it before might be quite nice, actually. So credit where credit is due!
/Henrik
--
Henrik Nilsson
Yale University
Department of Computer Science
nilsson@cs.yale.edu
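
Since the syntax was still open at this point, the following is only a sketch of how the "prefix before the directory" form might be parsed. The function parseEntry is hypothetical, and treating an argument without brackets as a plain directory with an empty prefix is an assumption made for illustration.

```haskell
-- | Parse a hypothetical search-path argument of the form "[Prefix]dir".
-- Returns the module prefix (empty for a plain directory) and the directory.
parseEntry :: String -> Maybe (String, FilePath)
parseEntry ('[' : rest) =
  case break (== ']') rest of
    (pre, ']' : dir) | not (null pre), not (null dir) -> Just (pre, dir)
    _                                                 -> Nothing  -- unterminated or empty parts
parseEntry dir
  | null dir  = Nothing
  | otherwise = Just ("", dir)
```

Under these assumptions the argument from the example, [Graphics.Rendering.OpenGL]., parses to the prefix Graphics.Rendering.OpenGL and the directory ".". A malformed argument such as [A.B is rejected rather than guessed at.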

- The sources for a module A.B.C would be allowed to be placed in either A.B.C.hs or A/B/C.hs relative to one of the directories in the search path. Currently only A/B/C.hs is allowed.
This is an easy change to make, and I believe Hugs already does it this way.
Sounds useful (provided all major implementations agree). Btw, there's another convention, of adding suffixes to indicate processing phases (e.g., Main.hs.cpp.gz.uue). Is that likely to lead to conflicts (perhaps Main.WWW.hs for pipe through runhugs then post on web site?-)?
- We could provide the ability to specify a module prefix to associate with a directory in the search path. For example, you could say that the directory '.' is associated with the module prefix "Graphics.Rendering.OpenGL" and avoid having to place your sources in the directory Graphics/Rendering/OpenGL.
I'm not sure what syntax we'd use for this. Henrik suggested placing the module prefix in square brackets before the directory, eg. ghc -i '-i[Graphics.Rendering.OpenGL].'
Does that mean I can refer to X.hs as [Graphics.Rendering.OpenGL(.Graphics.Rendering.OpenGL)*/]X.hs ?-) Probably no problem with Haskell's explicit imports.
In contrast to the previous suggestion, this would actually save some trips to the OS when GHC is looking for files.
I'm not sure about the details of your first suggestion, but if you take it to permit all mixtures of "." and "/", such as A.B/C.hs (I assume at least A/B.C.hs is already permitted?), you can get rid of the second - e.g., simply make "Graphics.Rendering.OpenGL" a link to ".". That way, directory trees could be compressed (even the middle sections of long paths) while providing visible documentation of those shortcuts. I'd prefer that to compiler options (and instead of shortcutting into ".", I'd probably have one second layer of directories, with shortcuts from "." into those, if only to avoid naming conflicts at the leaves of the tree..).
Claus

The suggested changes sound hard to understand and to implement consistently in all compilers. I lean towards leaving the spec as it is. IIRC, the current Hugs semantics is a complex balancing act intended to achieve backward compatibility and implement module paths at the same time. I'd prefer to see everyone switch over to the new way so that we can drop old features.
--
Alastair Reid

Hi Alastair!
The suggested changes sound hard to understand and to implement consistently in all compilers. I lean towards leaving the spec as it is.
I think they're pretty straightforward, actually. Could you elaborate on why it seems complicated to you? As to implementation, I can't imagine that it would be that difficult to get something reasonably consistent. After all, different compilers/interpreters have managed to handle a language like Haskell98 with extensions in a fairly consistent manner.
IIRC, the current Hugs semantics is a complex balancing act intended to achieve backward compatibility and implement module paths at the same time. I'd prefer to see everyone switch over to the new way so that we can drop old features.
Which way is the old and which is the new? I've noticed that Hugs looks for, e.g., a module "A.B.C" in both "A/B/C.hs" and "A.B.C.hs", which seems nice to me and is part of what we suggested. The only extra addition would be to associate prefixes with directories in the search path. Is that really complicated?
Regards,
/Henrik
--
Henrik Nilsson
Yale University
Department of Computer Science
nilsson@cs.yale.edu
participants (6): Alastair Reid, Andre Pang, Claus Reinke, Lauri Alanko, nilsson@cs.yale.edu, Simon Marlow