On Feb 16, 2008 12:01 AM, John Meacham wrote:
On Fri, Feb 15, 2008 at 07:21:55PM +0100, Lemmih wrote:
Greetings,
I've found a few hotspots that I'll be working on. I'd be very interested in discussing solutions.
Performance flaws:
 * IdMaps are used to generate new ids.
 * Ho files contain huge amounts of duplicate information.
 * Ho files aren't saved lazily.
 * C code is used for generating atoms.
Repeatedly mapping variables to 'const Nothing' is very expensive. It is currently the most expensive procedure in Jhc, taking ~20% CPU time when compiling the base library.
Hmm, yeah. Sometimes I used Maps and sometimes Sets, depending on what I already had. Sometimes it helped to switch between them and sometimes not; it wasn't always obvious. Ideally all new id selection will be done in Name.Id as part of the general plan to turn Id into a newtype. It has the newIds routine; a couple of variants of that to work on IdMap and IdSet would be good. If I am just doing a map (const Nothing) before passing to the id selection routine, then that can probably just be dropped, since the id selection stuff doesn't care about the actual values in the map.
The id selection can be finicky. Using Set.size to seed the iterations helped a bunch, but I wanted to try a hash function from Id -> Id at some point, as it should reduce the time spent linearly probing for an open Id.
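To illustrate why the map (const Nothing) pass could be dropped: a selection routine that only tests key membership can stay polymorphic in the map's value type. The following is only a rough sketch, not the Name.Id code. Data.IntMap stands in for IdMap, Id is assumed to be a plain Int, and newIdFor is a made-up name.

```haskell
import qualified Data.IntMap as IntMap

-- Stand-ins for Jhc's types (assumption: Id is an Int, IdMap is an IntMap).
type Id = Int
type IdMap a = IntMap.IntMap a

-- Hypothetical helper: pick an Id that is not a key of the map.
-- Only the keys are inspected, so the value type stays polymorphic and
-- callers do not need to rewrite every value to Nothing beforehand.
newIdFor :: IdMap a -> Id
newIdFor used = go (IntMap.size used + 1)   -- seed the search with the size
  where
    go candidate
      | candidate `IntMap.member` used = go (candidate + 1)  -- linear probe
      | otherwise                      = candidate
```

With a signature like this, the same call works on an IdMap of any value type, so the conversion pass disappears from the caller.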
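A rough sketch of the hashing idea, for discussion only; nothing like this exists in Jhc as far as this thread says. It assumes Id is a plain Int, a 64-bit Int, and an id space of 1 to 2^30. The names scatter and freshId are made up. Instead of stepping to the next integer on a collision, each attempt is spread across the id space, so a dense run of used ids does not force a long walk.

```haskell
import Data.Bits ((.&.))
import qualified Data.IntSet as IntSet

-- Assumption for this sketch: Id is an Int in the range 1 .. 2^30.
type Id = Int

idMask :: Int
idMask = 2 ^ (30 :: Int) - 1

-- Multiplying by an odd constant is a bijection modulo 2^30, so distinct
-- attempt numbers always map to distinct candidate ids.
scatter :: Int -> Id
scatter n = ((n * 2654435761) .&. idMask) + 1

-- Hypothetical selection routine: seed the attempt counter with the size
-- of the used set, then try scattered candidates until one is free.
-- Terminates as long as the used set is smaller than the id space.
freshId :: IntSet.IntSet -> Id
freshId used = go (IntSet.size used)
  where
    go n
      | c `IntSet.member` used = go (n + 1)
      | otherwise              = c
      where c = scatter n
```

The trade-off is that the chosen ids are no longer small and consecutive, which may or may not matter for how compactly they are stored.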
The base library contains 65 megabytes of uncompressed data. Most of that is duplicate information that disappears when it is compressed. However, parsing that amount of data takes considerable time.
Which duplicate data in particular is concerning you? I am in the process of completely reorganizing the Ho file layout so it is probably best to hold off here. Some of the redundancy is there on purpose, but most probably isn't.
Each atom is saved ~100 times. A TVr can contain 50k of data and each TVr is saved ~24 times.
--
Cheers,
Lemmih