A Faster whatIs
---------------

These are the patches to merge the "faster whatIs" changes into the
Nov 2002 Hugs sources.  Use the shell script patchNov2002toFWHATIS.sh
to do the merge.

For completeness, I have appended the pertinent hugs-users email.

milbo@icon.co.za     Dec 9, 2002

---------------------------------------------------------------------

From: "Stephen Milborrow" <milbo@icon.co.za>
To: <hugs-users@haskell.org>
Subject: A Faster whatIs
Date: Sat, 30 Nov 2002 11:10:47 +0200

Hello everyone:

I made some changes to the Hugs sources which give small runtime
speed gains.  The results are in the table below.  The gains are
minor, but I thought some people in the group might be interested.


                                   WHATIS   WHATIS1    FHEAP

Speed Increase (percent)
              gamteb                 8         9         5
              parser                 6         8       gcfail
              prolog                 9         9         2
              queens                10        13         4

Extra BSS memory (kBytes)           24        24         0

Nbr source lines changed            50       144        41



---Notes On The Table

The table headings WHATIS, WHATIS1, and FHEAP refer to three
different sets of changes.  WHATIS and WHATIS1 are described
below.  FHEAP was an experiment that simply used a fixed size
heap (an array instead of a malloc) and is impractical because
it rules out use of the Hugs -h flag.

The (motley) collection of test programs is from nofib.
I timed runhugs for each program without command line flags,
except that for queens I used -h1000000.  Different programs
would probably give different results: the usual caveats apply.
These programs were just the nofib ones I had at hand.

I compiled with MSC 5.0 Service Pack 3 using modified Nov 2002
Hugs sources. I ran the timing tests on a Windows 98 machine.



---Strategy

I wanted to see what speed gains could be achieved by taking a
worm's eye view of the code, with no changes to fundamental
algorithms.  I also wanted to make changes that would be limited
to just a few places in the code -- a minimal force approach.

To start off, I ran the MSC profiler.  This confirmed that
whatIs() is a candidate for optimization, as already noted in
Mark's Gopher implementation document and in comments in
storage.c.  But the profiler showed a few other hotspots too.
(An interesting thing to do is sort the profiler results on
execution line-count, which immediately tells you which are the
most executed lines in the program.)



---WHATIS Change

To reduce the time spent in whatIs, I created a byte array
whatCode of whatIs codes.  I then created a whatIs macro which
replaces the whatIs function:


  #define whatIs(c) (isPair(c)?                        \
                      (isTag(fst(c)) ? fst(c) : AP ) : \
                      whatCode[c])


Negative indexing into whatCode is prevented by the isPair test.  To
keep the size of whatCode reasonable, I had to reduce the range
of unboxed ints -- the bigger the range, the bigger the whatCode
array.  I settled on a range of 2048, i.e. ints between -1023 and
1024 inclusive are unboxed and all others are boxed.  The extra
boxing will actually slow down execution of certain Haskell
programs, but as far as I can tell the majority of real Hugs
programs would be unaffected.  Since whatCode holds one byte per
value, increasing this range to, say, 10 000 would add only about
8 000 bytes -- nothing really when you consider that the memory
footprint of winhugs is about 10 MBytes.
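To make the table idea concrete, here is a minimal sketch of a table-driven whatIs on a toy cell model. It is not the Hugs code: the cell layout, the constants (NAMEMIN, SMALLMIN, INTLO, INTHI and the tag values) and the helpers slowWhatIs, initWhatCode, mkSmall and mkInt are all hypothetical. As in the description above, pairs are negative and atoms are non-negative, so the isPair test keeps the whatCode lookup in bounds, and the table holds one byte per atom value, which is why widening the unboxed int range costs memory.

```c
#include <assert.h>

/* Toy model only; every constant and helper here is hypothetical.
   Negative cells are pairs (indices into the heap arrays); non-negative
   cells are atoms whose kind depends on the range they fall in. */
typedef int Cell;

enum {
    NIL       = 0,
    TAGMIN    = 1,        /* box-cell tags: TAGMIN .. SPECMIN-1       */
    AP        = 1,
    INTCELL   = 2,        /* what whatIs reports for any integer      */
    NAME      = 3,
    SPECMIN   = 0x80,     /* special cells                            */
    NAMEMIN   = 0x100,    /* name table entries                       */
    SMALLMIN  = 0x200,    /* unboxed int n is stored as SMALLMIN+n-INTLO */
    INTLO     = -1023,    /* unboxed range -1023 .. 1024,             */
    INTHI     = 1024,     /* i.e. 2048 values                         */
    CELLLIMIT = SMALLMIN + (INTHI - INTLO + 1)  /* one past last atom */
};

#define HEAPSIZE 1000
static Cell heapFst[HEAPSIZE], heapSnd[HEAPSIZE];
static int  heapNext = 1;                 /* pair -1 is the first one */

#define isPair(c) ((c) < 0)
#define fst(c)    heapFst[-(c)]
#define snd(c)    heapSnd[-(c)]
#define isTag(c)  (TAGMIN <= (c) && (c) < SPECMIN)

static Cell pair(Cell a, Cell b) {
    assert(heapNext < HEAPSIZE);
    heapFst[heapNext] = a;
    heapSnd[heapNext] = b;
    return -(heapNext++);
}

/* The "before" style: a chain of range comparisons for atoms. */
static Cell slowWhatIs(Cell c) {
    if (isPair(c))     return isTag(fst(c)) ? fst(c) : AP;
    if (c >= SMALLMIN) return INTCELL;
    if (c >= NAMEMIN)  return NAME;
    return c;          /* NIL, tags and specials denote themselves */
}

/* One byte per atom value; every code above is < 256. */
static unsigned char whatCode[CELLLIMIT];

static void initWhatCode(void) {
    for (Cell c = 0; c < CELLLIMIT; c++)
        whatCode[c] = (unsigned char)slowWhatIs(c);
}

/* Table-driven whatIs: isPair guards the lookup against negative
   indices; atoms must stay below CELLLIMIT. */
#define whatIs(c) (isPair(c) ?                         \
                     (isTag(fst(c)) ? fst(c) : AP) :   \
                     whatCode[c])

/* Ints inside the unboxed range become atoms; others are boxed as
   pairs tagged INTCELL, with the value kept in snd. */
static int  isSmall(int n) { return INTLO <= n && n <= INTHI; }
static Cell mkSmall(int n) { assert(isSmall(n)); return SMALLMIN + (n - INTLO); }
static Cell mkInt(int n)   { return isSmall(n) ? mkSmall(n) : pair(INTCELL, n); }
```

initWhatCode() has to run once before the first whatIs; the table merely caches the answers of the comparison chain, so both versions agree on every atom while the macro does a single load.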

This change yields the speed improvements under the WHATIS
column in the table.



---WHATIS1 Change

isTag is defined as

  #define isTag(c)   (TAGMIN<=(c) && (c)<SPECMIN)

I reduced the cost of isTag slightly by changing defines in
storage.h so that only box-cell tags are in the range 1 to
0x7f (I shifted special cell values down to start at
0x80).  I then defined variants of isTag and friends:

  #define isTag1(c)  (((c) & TAG_MASK) == 0)

  #define whatIs1(c) (isPair(c)?                         \
                        (isTag1(fst(c))? fst(c) : AP ) : \
                        whatCode[c])

  #define isAp1(c)   (isPair(c) && !isTag1(fst(c)))


whatIs1 is a faster version of whatIs, with the "bug" that it
will return the wrong value if fst(c) is NIL.  When used in
several places as a replacement for whatIs, it gains a few
percentage points of speed, as shown in the WHATIS1 column above.
WHATIS1 sits on top of WHATIS so the gains attributable to
WHATIS1 alone are the differences between the percentages in the
two columns.
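A small sketch of why the mask test is cheaper and where the NIL caveat comes from. TAG_MASK is not defined in the text, so its value here (~0x7f, clearing the low seven bits) is an assumption consistent with tags occupying 1 to 0x7f; NIL is assumed to be 0, which is what the stated caveat implies, and the tiny heap is hypothetical. One AND plus a test against zero replaces two comparisons, but 0 also passes it, so isAp1 misclassifies a pair whose fst is NIL.

```c
#include <assert.h>

/* Hypothetical layout following the description: NIL is 0, box-cell
   tags are 1 .. 0x7f, special cells start at 0x80, pairs are negative.
   TAG_MASK is an assumed value: (c & TAG_MASK) == 0 exactly when
   0 <= c <= 0x7f, which includes NIL. */
typedef int Cell;

enum { NIL = 0, TAGMIN = 1, SPECMIN = 0x80 };
#define TAG_MASK (~0x7f)

#define isTag(c)  (TAGMIN <= (c) && (c) < SPECMIN)  /* two comparisons  */
#define isTag1(c) (((c) & TAG_MASK) == 0)           /* one AND and test */

/* Toy heap: pair cell -i keeps its first component in heapFst[i]. */
static Cell heapFst[8];
#define isPair(c) ((c) < 0)
#define fst(c)    heapFst[-(c)]

#define isAp(c)   (isPair(c) && !isTag(fst(c)))
#define isAp1(c)  (isPair(c) && !isTag1(fst(c)))  /* wrong if fst is NIL */
```

The two tests agree on every value except NIL, so the faster variants are only safe where a pair with a NIL first component cannot occur.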



---Another Candidate

Another candidate for this kind of optimization is the line
in eval():

   if (!isCfun(n) && (ar=name(n).arity)<=(sp-base)) {...

This is one of the most executed lines in the entire program.  

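If the stored arity of Cfuns were offset by a largish number,
say 10000, then the above test against isCfun wouldn't be
needed, provided sp-base never reaches that offset: a
constructor's stored arity would then always exceed sp-base.
This change would introduce inefficiencies elsewhere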
(we would have to un-offset the stored arity before using it
elsewhere) but the net effect would probably be a speed gain.  I
shied away from this change because it would require ubiquitous
(though easy) code changes.
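A minimal sketch of the proposed trick, with hypothetical names (Name, storedArity, CFUN_ARITY_OFFSET, realArity, readyOld, readyNew); depth stands for sp-base in eval(). The assert makes the key assumption explicit: the single comparison is only equivalent to the original two-part test while the stack depth stays below the offset, so a real interpreter would need an offset larger than its maximum stack depth.

```c
#include <assert.h>

/* Hypothetical sketch; not the Hugs data structures. */
#define CFUN_ARITY_OFFSET 10000

typedef struct {
    int storedArity;  /* real arity, plus the offset for constructors */
    int isCfun;       /* kept only to express the original test       */
} Name;

static Name mkName(int arity, int isCfun) {
    Name n;
    n.storedArity = isCfun ? arity + CFUN_ARITY_OFFSET : arity;
    n.isCfun = isCfun;
    return n;
}

/* The "un-offset" needed by every other use of the arity; assumes
   real arities are always below the offset. */
static int realArity(const Name *n) {
    int a = n->storedArity;
    return a >= CFUN_ARITY_OFFSET ? a - CFUN_ARITY_OFFSET : a;
}

/* Original shape: !isCfun(n) && arity <= sp-base */
static int readyOld(const Name *n, int depth) {
    return !n->isCfun && realArity(n) <= depth;
}

/* Offset shape: one comparison, valid while depth < the offset. */
static int readyNew(const Name *n, int depth) {
    assert(0 <= depth && depth < CFUN_ARITY_OFFSET);
    return n->storedArity <= depth;
}
```

The saving in the hot path is one test per call; the cost is the extra subtraction in realArity wherever the true arity is needed.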



---Final Comments

Optimizing code written by Mark Jones is a challenge.  It
becomes a little less daunting if we change the rules by
allowing ourselves to waste some memory in the pursuit of speed,
and to introduce some ugliness into the code. Even so, the speed
improvements I got were small.

If anyone is interested, I would be happy to send the sources.
They are modified Nov 2002 sources with all the changes
demarcated by #define's. Six files changed.

I would be interested to know what results these changes give on
other machines.


---
Stephen Milborrow

To: "Stephen Milborrow" <milbo@icon.co.za>
Cc: <hugs-users@haskell.org>
Subject: Re: A Faster whatIs
From: Alastair Reid <alastair@reid-consulting-uk.ltd.uk>
Date: 30 Nov 2002 14:54:51 +0000


> Hello everyone: I made some changes to the Hugs sources which give
> small runtime speed gains.

I think you're the first person to look at optimizing Hugs in quite a
while.  Great stuff!

I toyed with changing whatIs a few years ago.  Instead of your
table-based approach, I was thinking of using a bitwise encoding.  My
idea was to use one bit to select int/not int and that everything else
would be interpreted as a triple of the form:

  (WhatisTag,ModuleId,Index)

Where, for example, the WhatisTag is 5 bits, the ModuleId is 10 bits
and the Index is 16 bits (the numbers will probably need tweaking
somewhat). So, each module would have its own set of tables for
storing names, types, etc.

There's a whole bunch of tags like BANG, COMP, ASPAT, etc. which don't
have any associated data - they'd be in a tuple of the form

  (WhatisTag,<unused>,MoreTag)

I figured this would be faster because most whatis tests could be
resolved just by looking at the tag.  I also figured it would lead to
a more flexible structure in Hugs because:

- We regularly overflow a bunch of hardwired limits in Hugs as we
  load bigger and bigger programs.  Programs are getting bigger not
  because modules are getting bigger but because we load more modules.
  This change would make those hardwired limits be per-module limits
  so Hugs ought to scale better.

- Hugs uses a stacklike method for loading modules.  If a module you
  loaded early changes, then all modules loaded after it have to be
  reloaded whether they depend on that module or not.  This change
  could lead the way to relaxing that constraint.

There are other ways to achieve both these goals - this just seemed like
a good way of killing 3 birds with one stone.


I wonder how this change would stack up against yours.

--
Alastair

ps I'd really like to ask for your code so I can spend some time
looking at it but I have to spend more time on things that actually
pay me for the next few months so I doubt I'll manage it.  Sorry.

From: "Sigbjorn Finne" <sof@galois.com>
To: "Stephen Milborrow" <milbo@icon.co.za>
Cc: <hugs-users@haskell.org>
Subject: Re: A Faster whatIs
Date: Wed, 4 Dec 2002 18:19:41 -0800

Hi there,

'diff -u' would be a Fine Choice & if the changes aren't
too big, I suggest Cc'ing the hugs-bugs list for the benefit
of people that don't use CVS.

Hugs survives on the contributions of the community, so your
contrib is most welcome (as is that of others who might have
some changes up their sleeves!)

--sigbjorn

----- Original Message -----
From: "Stephen Milborrow" <milbo@icon.co.za>
To: "Sigbjorn Finne" <sof@galois.com>
Cc: <hugs-users@haskell.org>
Sent: Tuesday, December 03, 2002 06:00
Subject: Re: A Faster whatIs


> Hello Sigbjorn,
> How should I send the files: diff, diff -c, or all 6 complete changed
> source files?
> And to you directly, or to the hugs-users' group as well?  I'm not sure what
> the protocol is here. In any case I will tar and gzip what I send.  Let me
> know what is easiest for you.
> Thanks,
> Stephen.
>
> ----- Original Message -----
> From: Sigbjorn Finne <sof@galois.com>
> To: Stephen Milborrow <milbo@icon.co.za>
> Cc: <hugs-users@haskell.org>
> Sent: Monday, December 02, 2002 4:52 PM
> Subject: Re: A Faster whatIs
>
>
> "Stephen Milborrow" <milbo@icon.co.za> writes:
> >
> > Hello everyone:
> >
> > I made some changes to the Hugs sources which give small runtime
> > speed gains.
>     ...
> >
> >
> > If anyone is interested, I would be happy to send the sources.
> > They are modified Nov 2002 sources with all the changes
> > demarcated by #define's. Six files changed.
> >
> > I would be interested to know what results these changes give on
> > other machines.
> >
>
> Hi Stephen,
>
> good writeup. I'd be interested in having a close look at your
> changes & very possibly integrate them into the CVS sources for the
> benefit of others to both use and test.
>
> --sigbjorn
>

