
On 02/04/2012 07:37, Mikhail Glushenkov wrote:
Hi all,
[Hoping it's not too late.]
During my work on parallelising 'ghc --make' [1] I encountered a stumbling block: running 'ghc --make' can often be much faster than using separate compile ('ghc -c') and link stages, which means that any parallel build tool built on top of 'ghc -c' will be significantly handicapped [2]. As far as I understand, this is mainly due to interface file caching: 'ghc --make' only needs to parse and load each interface file once. One potential improvement (suggested by Duncan Coutts [3]) is to produce whole-package interface files and load them using mmap().
Questions:
- Would implementing this optimisation be a worthwhile/realistic GSoC project?
- What are other potential ways to bring 'ghc -c' performance up to par with 'ghc --make'?
My guess is that this won't have a significant impact on 'ghc -c' compile times. The advantage of squashing the .hi files for a package together is that they could share a string table, which would save a bit of space and time. However, I think the time saved is small compared to the cost of deserialising and typechecking the declarations from the interface, which still has to be done. In fact, it might make things worse if the string table for the whole base package is larger than the individual tables that would otherwise be read from the .hi files.

I don't think mmap() will buy very much over the current scheme of just reading the file into a ByteArray.

Of course, this is all (educated) guesswork without actual measurements, and I could be wrong.

Perhaps there are ways to optimise the reading of interface files. A good first step would be to do some profiling and see where the hotspots are.

Cheers,
Simon
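The string-table trade-off described above can be made concrete with a toy model. This is a hypothetical sketch only: it treats an "interface file" as a plain list of the names its string table must hold, with made-up module contents, and does not model GHC's real .hi format or serialisation. It compares the per-file tables with a single table shared across the package.

```haskell
import Data.List (nub)

-- Hypothetical model: an interface is just the list of strings
-- (module names, identifiers, ...) its string table must store.
type Iface = [String]

-- Size of a per-file string table: each distinct string stored once.
tableSize :: Iface -> Int
tableSize = length . nub

-- Size of one string table shared by every interface in the package.
sharedSize :: [Iface] -> Int
sharedSize = length . nub . concat

-- Toy "package" of three modules; the contents are made up.
modules :: [Iface]
modules =
  [ ["GHC.Base", "map", "foldr", "Maybe", "Just", "Nothing"]
  , ["Data.List", "map", "foldr", "sortBy", "Maybe"]
  , ["Data.Maybe", "Maybe", "Just", "Nothing", "fromMaybe"]
  ]

main :: IO ()
main = do
  let perFile = map tableSize modules
  putStrLn ("per-file tables:      " ++ show perFile)
  putStrLn ("sum of per-file:      " ++ show (sum perFile))
  putStrLn ("shared package table: " ++ show (sharedSize modules))
```

In this toy, the per-file tables hold 6, 5 and 5 strings (16 in total), while the shared table holds 10. Sharing saves space across the whole package, but a single 'ghc -c' run that needs only one module's strings would load a table of 10 entries instead of at most 6, which is the "might make things worse" case.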