I did some reading and realized that the question is a bit too broad. I now have Simon Marlow's book on parallel/concurrent Haskell and am going through it.
Files depend on each other because one defines something that the other uses. There's more to it in SystemVerilog, but assume just `define/`ifdef pairs (SystemVerilog's backtick analogue of C's #define/#ifdef).
So the dependency is only implicit: you know it only after you've parsed both files.
Nevertheless, a well-organized codebase will `include the files with the definitions (and do the same for the other SystemVerilog constructs that cause dependencies), so I'm ok with optimistically assuming no dependencies and computing a list of "problems" that trigger recompilation.
So if:
file A:
...
`define foo
...
file B:
...
`ifdef foo
...
`endif
...
and the command line gives the order fileA, fileB, then we compile both files in parallel. The result of compiling A will be (ast, [foo], []) and the result of compiling B will be (ast/Error, [], [foo]), i.e. (AST, def list, use list). Since foo is in the use list of B and in the def list of A, we know we have to recompile B (and compiling B doesn't pass a def list on to a possible C until we're satisfied with it). So when there are dependencies, I'm ok with recompiling multiple times.
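A minimal sketch of that check (the names `CompileResult` and `needsRecompile` are mine, not from any real tool): each speculative compilation returns its AST together with def and use lists, and a later file must be redone if any macro it used is defined by a file that precedes it on the command line.

```haskell
import Data.List (intersect)

-- Hypothetical result of optimistically compiling one file:
-- an AST placeholder, the macros it defines, and the macros it used.
data CompileResult = CompileResult
  { ast  :: String    -- stand-in for a real AST (or an Error)
  , defs :: [String]  -- macros this file defines
  , uses :: [String]  -- macros this file's outcome depended on
  } deriving (Eq, Show)

-- A file must be recompiled if any macro it used is defined by a file
-- that comes before it on the command line.
needsRecompile :: [CompileResult] -> CompileResult -> Bool
needsRecompile earlier r =
  not (null (concatMap defs earlier `intersect` uses r))
```

In the example above, `needsRecompile [resultOfA] resultOfB` is True because foo sits in both lists, while a file using only undefined macros sails through on the first pass.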
As for the second question, splitting a toplevel across files is even more perverse and should be extremely rare (I think it exists only for historical reasons: the first Verilog simulators assumed the entire design was available and basically did a 'cat f1..fN' before compiling). Again, I'm happy to redo work in this case. If the compilation units are [[a,b,c], [d,e]] and we get a parse error in c after having consumed the entire file, and d also has a parse error, we restart with a new specification [[a,b,(c,d)], [e]], where (c,d) means the concatenation of the contents of files c and d. The parses of a and b are still valid; the parse of e might be. But even if we invalidate and redo everything in exchange for simple code, I'm ok with that.
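The respec step can be written as a small list transformation (`Source`, `Concat`, and `fuseWithNext` are hypothetical names of my own): when the last file of unit n runs off the end of its input, fuse it with the first file of the following unit and retry.

```haskell
-- A source is either a single file or a concatenation of sources.
data Source = File FilePath | Concat Source Source
  deriving (Eq, Show)

-- If the last file of unit n fails by consuming all its input, fuse it
-- with the first file of unit n+1. Assumes unit n is non-empty; if there
-- is no following unit to borrow from, the spec is returned unchanged.
fuseWithNext :: Int -> [[Source]] -> [[Source]]
fuseWithNext n units =
  case splitAt n units of
    (before, u : (v : vs) : rest) ->
      before ++ [init u ++ [Concat (last u) v]] ++ (vs : rest)
    _ -> units
```

With the example spec, `fuseWithNext 0 [[a,b,c],[d,e]]` yields `[[a,b,(c,d)],[e]]`, exactly the restart specification described above.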
My question can be abstracted a bit more in terms of list processing.
We have a list of lists [[a]]. We need to process each leaf element by mapping a function p a e, which processes an element of type 'a' in an environment 'e' that comes from the evaluation of the previous elements. Up to this point it would be normal monadic code that hides 'e'. In addition, since in our domain we expect 'e' `intersect` 'u' to usually be empty (where 'u' is the set of uses we discover while processing an element), we'd like to start processing all elements in parallel and redo the ones that turn out to have been invalidated by earlier ones. I'd like to find a solution to this in good Haskell style.
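Here is a purely sequential model of that speculate-then-repair idea (`Proc` and `speculate` are my own sketch, with the environment taken first for convenience; in real code the first pass is the one you'd run in parallel with `parMap` from the parallel package or `mapConcurrently` from async). Pass 1 runs every element against the empty environment; pass 2 walks left to right, accumulating defs, and redoes any element whose speculative uses intersect the environment built so far.

```haskell
import Data.List (intersect)

-- A processor takes the environment built so far and an element, and
-- returns its result plus the defs it adds and the uses it discovered.
type Proc e a r = [e] -> a -> (r, [e], [e])

-- Speculate-then-repair over a flat list (the [[a]] grouping only
-- matters for the split-toplevel case, which we ignore here).
-- The Bool in the output marks elements that had to be redone.
speculate :: Eq e => Proc e a r -> [a] -> [(r, Bool)]
speculate p xs = go [] (zip xs (map (p []) xs))  -- pass 1: empty env
  where
    go _   [] = []
    go env ((x, (r, ds, us)) : rest)
      | null (env `intersect` us) = (r, False) : go (env ++ ds) rest
      | otherwise =
          -- redo with the complete prefix environment; its defs (not the
          -- speculative ones) are what flows to later elements
          let (r', ds', _us') = p env x
          in  (r', True) : go (env ++ ds') rest
```

For example, with a toy processor whose elements define some macros and branch on one use, `speculate p [(["foo"], "x"), ([], "foo")]` keeps the first speculative result and redoes only the second element, mirroring the fileA/fileB case above.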
We can forget the toplevel split across files for now; that can be dealt with in an outer layer that restarts everything, throwing away all the work.