Re: [jhc] Hotspots.

19 Feb 2008


On Feb 16, 2008 12:01 AM, John Meacham wrote:
...
On Fri, Feb 15, 2008 at 07:21:55PM +0100, Lemmih wrote:
...
Greetings,
I've found a few hotspots that I'll be working on. I'd be very
interested in discussing solutions.
Performance flaws:
 * IdMaps are used to generate new ids.
 * Ho files contain huge amounts of duplicate information.
 * Ho files aren't saved lazily.
 * C code is used for generating atoms.
Repeatedly mapping variables to 'const Nothing' is very expensive. It
is currently the most expensive procedure in Jhc, taking ~20% CPU time
when compiling the base library.
Hmm.. yeah, sometimes I used Maps, sometimes Sets, depending on what I
already have. Sometimes it helped to switch between them, sometimes
not; it wasn't always obvious. Ideally all new id selection will be
done in Name.Id as part of the general plan to turn Id into a newtype.
It has the newIds routine; a couple of variants of that to work on IdMap
and IdSet would be good. If I am just doing a map (const Nothing) before
passing to the id selection routine then that can probably just be
dropped, since the id selection stuff doesn't care about the actual
values in the map.
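
To make the point about dropping the map (const Nothing) concrete, here is a rough sketch of id selection that only ever looks at keys. It is not Jhc's actual code: Id is modelled as a plain Int, IdSet and IdMap as IntSet and IntMap, and the names freshIds, newIdsFromSet and newIdsFromMap are made up for illustration.

```haskell
import qualified Data.IntMap as IntMap
import qualified Data.IntSet as IntSet

-- Assumption: Id is a plain Int here; in Jhc it is planned to become a newtype.
type Id = Int

-- | All positive ids for which the "already used" predicate is False,
-- in ascending order. The list is lazy, so callers take what they need.
freshIds :: (Id -> Bool) -> [Id]
freshIds used = filter (not . used) [1 ..]

-- | Hypothetical variant working directly on a set of used ids.
newIdsFromSet :: IntSet.IntSet -> [Id]
newIdsFromSet s = freshIds (`IntSet.member` s)

-- | Hypothetical variant working directly on a map keyed by id.
-- It is polymorphic in the value type, so the values are never inspected
-- and no intermediate map of Nothings has to be built.
newIdsFromMap :: IntMap.IntMap a -> [Id]
newIdsFromMap m = freshIds (`IntMap.member` m)
```

With variants like these a caller holding an IdMap of any value type can ask for fresh ids directly, which is where the saving would come from.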
The id selection can be finicky. Using Set.size to seed the iterations
helped a bunch, but I wanted to try a hash function from Id -> Id at some
point, as it should reduce the time spent linearly probing for an open
Id.
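
The two probing strategies mentioned here can be sketched roughly as follows. This is an illustration under assumptions, not the code in Name.Id: Id is a plain Int, the names newIdSized, scramble and newIdHashed are hypothetical, and the "hash" is a full-period linear congruential step modulo 2^30, chosen only because it is guaranteed to visit every value in that range and so cannot loop forever while a free id exists there.

```haskell
import Data.Bits ((.&.))
import qualified Data.IntSet as IntSet

-- Assumption: Id is a plain Int and valid ids are positive.
type Id = Int

-- Candidates are kept within 30 bits.
mask :: Int
mask = 0x3fffffff

-- | Linear probing seeded with the set size: start just past the number of
-- used ids and walk upwards until an unused id is found.
newIdSized :: IntSet.IntSet -> Id
newIdSized used = head (filter (`IntSet.notMember` used) [seed ..])
  where
    seed = IntSet.size used + 1

-- | Hypothetical Id -> Id scrambling step. The multiplier is 1 mod 4 and the
-- increment is odd, so the step has full period modulo 2^30.
scramble :: Id -> Id
scramble x = (1103515245 * x + 12345) .&. mask

-- | Probing by repeated scrambling instead of stepping by one, so a dense
-- run of used ids is jumped over rather than walked through.
newIdHashed :: IntSet.IntSet -> Id
newIdHashed used = head (filter free (iterate scramble seed))
  where
    seed = (IntSet.size used + 1) .&. mask
    free i = i > 0 && i `IntSet.notMember` used
```

Whether the scrambled probe actually beats the size-seeded one would depend on how the used ids are distributed, so it would need measuring.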
...
The base library contains 65 megabytes of uncompressed data. Most of
that is duplicate information that disappears when it is compressed.
However, parsing that amount of data takes considerable time.
Which duplicate data in particular is concerning you? I am in the
process of completely reorganizing the Ho file layout so it is probably
best to hold off here.  Some of the redundancy is there on purpose, but
most probably isn't.
Each atom is saved ~100 times. A TVr can contain 50k of data and each
TVr is saved ~24 times.
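
To picture what removing that kind of duplication could look like, here is a small sketch that stores each distinct atom once in a table and replaces every occurrence by an index into it. It says nothing about the real or planned Ho layout: atoms are modelled as plain strings and internAtoms and expandAtoms are made-up names.

```haskell
import Data.List (foldl')
import qualified Data.Map as Map

-- Assumption: atoms are modelled as plain strings for this sketch.
type Atom = String

-- | Split a stream of atoms into a table of distinct atoms (in order of
-- first appearance) and a list of indices into that table.
internAtoms :: [Atom] -> ([Atom], [Int])
internAtoms atoms = (reverse table, reverse refs)
  where
    (_, table, refs) = foldl' step (Map.empty, [], []) atoms
    step (seen, tbl, out) a =
      case Map.lookup a seen of
        -- Already in the table: emit only its index.
        Just i -> (seen, tbl, i : out)
        -- First occurrence: give it the next index and add it to the table.
        Nothing ->
          let i = Map.size seen
          in (Map.insert a i seen, a : tbl, i : out)

-- | Rebuild the original stream from the table and the indices.
expandAtoms :: ([Atom], [Int]) -> [Atom]
expandAtoms (table, refs) = map (table !!) refs
```

An atom that appears ~100 times would then cost one table entry plus ~100 small indices instead of ~100 full copies.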

-- 
Cheers,
  Lemmih