
In a recent project that compiles Haskell source from data (i.e. of type Text from the module Data.Text), it would be useful to be able to decouple GHC's notion of where modules live from the file system. This doesn't seem to be programmatically controllable.

How tenable is this? Would it be useful for anyone else to have compilation itself be more first-class in the language? If I think about languages such as Lisp/Racket/Clojure, there's a certain flexibility there that Haskell lacks, but it's not apparent why, other than historical reasons. Would this imply moving compiling and linking into a different monad than IO?

At the moment, to compile some source that exists as Text, my system has to write a bunch of temp files, including the Text that contains the main module; put other modules in directories named a certain way; run the compiler across them via some exec command that calls GHC or stack externally; then read the resulting executable back off disk to store it in its final destination.

It might be useful to be able to do this from within Haskell code directly, somewhat like the hint library works. Though, in this case it would almost certainly also require being able to have two versions of GHC loaded at once, which would also imply being able to have multiple or different versions of libraries loaded simultaneously, too, and possibly also just from data, i.e. not from disk. It feels like a massive, massive project at that point, though, like we'd be putting an entire dependency system into a first-class programmable context. I'm still interested in what folks think about these ideas, even though this may never eventuate.

Does it seem to anyone else like abstracting the library and module-access capabilities of compilation, so that it's polymorphic over where it gets its data from, might be useful? Is this just ridiculous? Does this step into Backpack's territory?

From memory, the Haskell report doesn't specify that modules necessarily need to be tied to the file system, but I think GHC imposes one file per module and that it be on the FS.

Julian
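For concreteness, the temp-file round trip described above can be sketched roughly like this, assuming a single-module program and `ghc` on the PATH; the paths, flags, and function name are illustrative, not taken from the poster's actual system:

```haskell
-- A rough sketch of the workflow: write the Text source out to a
-- scratch directory, shell out to GHC, and read the binary back.
-- All paths and flags here are illustrative.
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.Directory (createDirectoryIfMissing, getTemporaryDirectory)
import System.Exit (ExitCode (..))
import System.FilePath ((</>))
import System.Process (readProcessWithExitCode)

compileFromText :: T.Text -> IO (Either String BS.ByteString)
compileFromText src = do
  tmp <- getTemporaryDirectory
  let dir = tmp </> "ghc-from-text"
  createDirectoryIfMissing True dir
  TIO.writeFile (dir </> "Main.hs") src
  (code, _out, err) <- readProcessWithExitCode
    "ghc" ["-o", dir </> "Main", dir </> "Main.hs"] ""
  case code of
    ExitSuccess   -> Right <$> BS.readFile (dir </> "Main")  -- binary bytes
    ExitFailure _ -> pure (Left err)  -- GHC's diagnostics arrive on stderr
```

A multi-module build would additionally have to lay the other modules out on disk in the dotted-name directory layout before the `ghc` call, which is exactly the file-system coupling being complained about.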

As long as we're using system tools for assembling and linking, we must abide by their constraints. Not using them isn't really viable; linking in particular is a nightmare. The bytecode interpreter has a partial linker because it can't use the system one, and it's easily the nastiest part of GHC. It's also completely nonportable, by definition: every target has its own notion of what relocations are available and how they work.

On Mon, Dec 2, 2024 at 4:16 AM julian getcontented.com.au <julian@getcontented.com.au> wrote:
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
--
brandon s allbery kf8nh
allbery.b@gmail.com

One compromise GHC could make would be to abstract the notion of a module from the Haskell source file (multiple modules per file, or no file involved at all), but ultimately, as Brandon says, the file system has to get involved for the linker, so the concept of a module needs to remain compatible with the act of writing it out as a file.

On 02/12/2024 at 10:36, Brandon Allbery wrote:
--
Hécate ✨
🐦: @TechnoEmpress
IRC: Hecate
WWW: https://glitchbra.in
RUN: BSD

It's not ridiculous. I would also like the GHC API to abstract over the file system so that I can store modules in a DB. It would require quite a lot of refactoring, though...

Sylvain

On 02/12/2024 10:16, julian getcontented.com.au wrote:
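To make the file-system abstraction Sylvain mentions concrete, one hypothetical shape for it might look like the following. None of this is real GHC API; the class and both instances are invented purely for illustration:

```haskell
{-# LANGUAGE FlexibleInstances #-}
-- Hypothetical sketch of "polymorphic over where module sources come
-- from": the pipeline asks an abstract store for sources instead of
-- assuming the file system. Not part of the GHC API.
import qualified Data.Map.Strict as M
import System.Directory (doesFileExist)

class Monad m => ModuleStore m where
  -- dotted module name ("Data.Foo") -> source text, if the store has it
  lookupModuleSource :: String -> m (Maybe String)

-- The status quo: module names map to .hs files on disk.
instance ModuleStore IO where
  lookupModuleSource name = do
    let path = map (\c -> if c == '.' then '/' else c) name ++ ".hs"
    exists <- doesFileExist path
    if exists then Just <$> readFile path else pure Nothing

-- An in-memory store (e.g. rows pulled from a DB), using the reader
-- monad ((->) env) from base over a Map of name -> source.
instance ModuleStore ((->) (M.Map String String)) where
  lookupModuleSource = M.lookup
```

The point of the sketch is that the same pipeline code could then run against disk, a database, or plain in-memory Text, which is the "refactoring" being asked for.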

On 02.12.24 10:16, julian getcontented.com.au wrote:
> In a recent project that compiles Haskell source from data (ie of type Text from the module Data.Text), it would be useful to be able to decouple the dependency between GHC's notion of where modules are and the file system. This doesn't seem to be programmatically controllable.
>
> How tenable is this?
I can report how this works in the Java world, where this exists.

TL;DR: the real issues are programmer workflows, tool integration, and optimization, not so much semantics. The mechanism itself is a non-problem there, as it was designed into the ecosystem right from the start.

Even there, it came with a number of significant downsides. It places pretty hefty constraints on global optimizations, inlining in particular: you can't usefully inline if you don't know whether a call will be polymorphic because somebody added a subclass. You either have to prevent subclassing, or forfeit cross-module inlining, or keep track of dependencies so you can undo inlining whenever assumptions about polymorphism are broken by new code. Now, Haskell's polymorphism is different from Java's, but I'd expect similar issues.

In the Java world, this meant integrating the optimization phase into the runtime system, increasing the code and memory footprint of the JVM, and incurring a heavy runtime cost during start-up, when the bulk of optimisations is run. The Java world is currently swinging from dynamic code loading to static precompilation (look for references to GraalVM if you want to know more); however, this burdens the application programmer with defining the dynamic loading behaviour of the class system, even though that's done in the build specifications and not in the code itself (which comes with its own set of problems, such as having to cross-reference code and build specs when reasoning about code, though that affects only those who want to control compiler behaviour tightly).
> Would it be useful for anyone else to have compilation itself be more first class in the language? If I think about languages such as LISP/Racket/Clojure, there's a certain flexibility there that Haskell lacks, but it's not apparent why, other than historical reasons?
These languages are extremely hard to optimize, so at least the GHC people won't be able to follow that route. If that's fine with you, then your suggestion seems doable.
> Would this imply changing compiling and linking into a different Monad than IO?
I can't say with confidence, but I wouldn't expect that to be an issue. A compiler typically maps special operations like this to machine code or possibly intermediate code as part of the optimization phase; the choice of IO mechanism is more of a programmer-facing issue. The exact conditions under which a given optimization is applicable do depend on the details of the IO mechanism's semantics, so I'd guess nobody will want to touch that part: they'd rather declare that IO is fine and modify other systems to make IO work, if there are problems with it.
> At the moment to compile some source that exists in Text, my system has to write a bunch of temp files including the Text that contains the main module, and then put other modules in directories named a certain way, run the compiler across them via some exec command to call GHC or stack externally, then read the resulting executable back off disk to store it in its final destination.
And now we're in the area of application-programmer downsides.

If you make these files temporary and unavailable to the programmer for debugging, stack traces and such become meaningless. You'll have to add tooling for that, i.e. code that takes the stack traces and maps them back to the specification language you're generating code from. You'll have to treat the messages in such a stack trace as low-level and add a translation step that transforms the low-level semantics into what the programmer specified; i.e. you have to know exactly what Haskell code patterns can exist, have a full list of possible errors, and write code that does the translation. Similar considerations apply to debuggers, profilers, and whatever other tools have a connection to code lines. That's a pretty tall order, not only because of the translation step, but because you have to integrate that translation with a multitude of tools, some (most?) of them under active development, i.e. moving targets.

Code generators like yours typically use another technique: generate the code into a directory that's not under version control but is part of the module paths of all tools, and generate code with comments that refer back to the original specification, utilizing the programmer's knowledge to do the backwards translation. In the Java world, this kind of thing was recently integrated into the toolchains. There's a mechanism called an "annotation processor" (please ignore the "annotation" part; it's just the trigger for the mechanism) which is run by the Java compiler and generates code into a generated-code subdirectory; the toolchains know to include this directory in their module paths, pretty recently even by default.
> It might be useful to be able to do this from within Haskell code directly, somewhat like the hint library works. Though, in this case it would almost certainly also require being able to have two versions of GHC loaded at once, which would also imply being able to have multiple or different versions of libraries loaded simultaneously, too, and possibly also just from data, ie not from disk. It feels like a massive, massive project at that point, though, like we'd be putting an entire dependency system into a first-class programmable context. I'm still interested in what folks think about these ideas, even though this may never eventuate.

It will be less massive if you start by integrating code generation into the toolchains, I think. But yeah, I think it's still a massive project.
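For reference, what the hint library mentioned above already offers is in-process interpretation against the single GHC it was built with; the "two versions of GHC at once" part has no analogue. A minimal sketch, assuming hint is installed (the helper name is ours):

```haskell
-- Interpret an expression from a string in-process via hint.
-- This covers only the single-GHC, single-package-set case.
import Language.Haskell.Interpreter

evalIntExpr :: String -> IO (Either InterpreterError Int)
evalIntExpr expr = runInterpreter $ do
  setImports ["Prelude"]       -- modules visible to the expression
  interpret expr (as :: Int)   -- type-checked against Int
```

So `evalIntExpr "2 + 3"` should yield `Right 5` when hint and a matching GHC are available; what it cannot do is load a second compiler or a conflicting library version, which is where the "massive project" begins.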
> Does it seem to anyone else like abstracting the library and module-access capabilities of compilation so that it's polymorphic over where it gets its data from might be useful?

You can't usefully work with generated code unless you automate the translation of messages from the Haskell level to the new combined Haskell+specification language. I am not sure that such a thing is even realistically doable; even the Java ecosystem has been shying away from that approach, despite having much more manpower than Haskell's, and despite having a JVM that was designed and built for integrating anonymous code. E.g. I hit bugs^W unexpected behaviour in Hibernate-generated code that I couldn't diagnose. The behaviour remained a mystery, and the code generation was too deeply hidden below layers of polymorphic library code, so I started the bytecode disassembler and found, to my incredulous amazement, that Hibernate would add attributes, a behaviour it never documented; I had accidentally defined an attribute with a conflicting name, so Hibernate would interfere with the application logic and vice versa.
Sorry for the wall of text, but it's a pretty big topic. HTH.

Jo
participants (5)
- Brandon Allbery
- Hécate
- jo@durchholz.org
- julian getcontented.com.au
- Sylvain Henry