I did some reading and realized that the question is a bit too broad. I now have Simon Marlow's book on parallel/concurrent Haskell and am going through it.
Files depend on each other because one defines something that the other uses. There's more to it in SystemVerilog, but assume just `define/`ifdef pairs (SystemVerilog's backtick analogue of C's #define/#ifdef).
So the dependency is only implicit: you know it only after you've parsed both files.
Nevertheless, a well-organized codebase will `include the files with the definitions (and do the same for the other SystemVerilog constructs that cause dependencies), so I'm ok with optimistically assuming no dependencies and computing a list of "problems" that trigger recompilation.
So if:
file A:
...
`define foo
...
file B:
...
`ifdef foo
...
`endif
...
and the command line gives the order fileA, fileB, then we compile both files in parallel. The result of compiling A will be (ast, [foo], []) and the result of compiling B will be (ast/Error, [], [foo]), i.e. (AST, def list, use list). Since foo is in the use list of B and in the def list of A, we know we have to recompile B (and compiling B doesn't pass a def list on to a possible C until we're satisfied with it). So when there are dependencies, I'm ok with recompiling multiple times.
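A minimal sketch of that check (the names `CompileResult` and `needsRecompile` are mine, not from any real tool): each speculative compilation returns its AST together with def and use lists, and a later file must be redone if any macro it used is defined by a file that precedes it on the command line.

```haskell
import Data.List (intersect)

-- Hypothetical result of optimistically compiling one file:
-- an AST placeholder, the macros it defines, and the macros it used.
data CompileResult = CompileResult
  { ast  :: String    -- stand-in for a real AST (or an Error)
  , defs :: [String]  -- macros this file defines
  , uses :: [String]  -- macros this file's outcome depended on
  } deriving (Eq, Show)

-- A file must be recompiled if any macro it used is defined by a file
-- that comes before it on the command line.
needsRecompile :: [CompileResult] -> CompileResult -> Bool
needsRecompile earlier r =
  not (null (concatMap defs earlier `intersect` uses r))
```

In the example above, `needsRecompile [resultOfA] resultOfB` is True because foo sits in both lists, while a file using only undefined macros sails through on the first pass.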
As for the second question, splitting a toplevel across files is even more perverse and should be extremely rare (I think it exists only for historical reasons: the first Verilog simulators assumed the entire design was available and basically did a 'cat f1..fN' before compiling). Again, I'm happy to redo work in this case. If the compilation units are [[a,b,c], [d,e]] and we get a parse error in c after having consumed the entire file, and d also has a parse error, we restart with a new specification [[a,b,(c,d)], [e]], where (c,d) means the concatenation of the contents of files c and d. The parses of a and b are still valid; the parse of e might be. But even if we invalidate and redo everything in exchange for simple code, I'm ok with that.
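The respec step can be written as a small list transformation (`Source`, `Concat`, and `fuseWithNext` are hypothetical names of my own): when the last file of unit n runs off the end of its input, fuse it with the first file of the following unit and retry.

```haskell
-- A source is either a single file or a concatenation of sources.
data Source = File FilePath | Concat Source Source
  deriving (Eq, Show)

-- If the last file of unit n fails by consuming all its input, fuse it
-- with the first file of unit n+1. Assumes unit n is non-empty; if there
-- is no following unit to borrow from, the spec is returned unchanged.
fuseWithNext :: Int -> [[Source]] -> [[Source]]
fuseWithNext n units =
  case splitAt n units of
    (before, u : (v : vs) : rest) ->
      before ++ [init u ++ [Concat (last u) v]] ++ (vs : rest)
    _ -> units
```

With the example spec, `fuseWithNext 0 [[a,b,c],[d,e]]` yields `[[a,b,(c,d)],[e]]`, exactly the restart specification described above.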
My question can be abstracted a bit more in terms of list processing.
We have a list of lists [[a]]. We need to process each leaf element by mapping a function p a e, which processes an element of type 'a' in an environment 'e' that comes from the evaluation of the previous elements. Up to this point it would be normal monadic code that hides 'e'. In addition, since in our domain we expect 'e' `intersect` 'u' to usually be empty (where 'u' is the set of uses we discover while processing an element), we'd like to start processing all elements in parallel and redo the ones that turn out to have been invalidated by earlier ones. I'd like to find a solution to this in good Haskell style.
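Here is a purely sequential model of that speculate-then-repair idea (`Proc` and `speculate` are my own sketch, with the environment taken first for convenience; in real code the first pass is the one you'd run in parallel with `parMap` from the parallel package or `mapConcurrently` from async). Pass 1 runs every element against the empty environment; pass 2 walks left to right, accumulating defs, and redoes any element whose speculative uses intersect the environment built so far.

```haskell
import Data.List (intersect)

-- A processor takes the environment built so far and an element, and
-- returns its result plus the defs it adds and the uses it discovered.
type Proc e a r = [e] -> a -> (r, [e], [e])

-- Speculate-then-repair over a flat list (the [[a]] grouping only
-- matters for the split-toplevel case, which we ignore here).
-- The Bool in the output marks elements that had to be redone.
speculate :: Eq e => Proc e a r -> [a] -> [(r, Bool)]
speculate p xs = go [] (zip xs (map (p []) xs))  -- pass 1: empty env
  where
    go _   [] = []
    go env ((x, (r, ds, us)) : rest)
      | null (env `intersect` us) = (r, False) : go (env ++ ds) rest
      | otherwise =
          -- redo with the complete prefix environment; its defs (not the
          -- speculative ones) are what flows to later elements
          let (r', ds', _us') = p env x
          in  (r', True) : go (env ++ ds') rest
```

For example, with a toy processor whose elements define some macros and branch on one use, `speculate p [(["foo"], "x"), ([], "foo")]` keeps the first speculative result and redoes only the second element, mirroring the fileA/fileB case above.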
We can forget the toplevel split across files for now; that can be dealt with in an outer layer that restarts everything, throwing away all the work.