> On Mon, 30 Jul 2001, Simon Marlow wrote:
>
> > I've looked at a few. The most common factor in the worst performers
> > seems to be the performance of String-related operations. I tried
> > converting a few to use PackedStrings but that didn't help much. I
> > suspect our PackedString implementation could do with some tuning, and
> > we could gain considerable benefit from having an hGetLinePS function
> > (like hGetLine but gets a PackedString).
> >
> Well, I think the common factor is I/O. The lazy I/O seems to be a real
> bottleneck here. Trying to improve that would gain even more, I think.
> I'm saying this because I did quite a bit of fiddling with some of the
> examples (including trying PackedStrings). The conclusion I drew from
> doing this was that the I/O performance was the problem.

Agreed - I/O is indeed a bottleneck, but I believe it's the
strings-as-lists-of-characters aspect of I/O rather than the lazy aspect
that's the killer.
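
To make the representation point concrete, here is a minimal sketch. It
assumes Data.ByteString.Char8 (from the bytestring package) as a stand-in
for PackedString; that substitution is mine, not what the enclosed program
uses, and the names wordAsList and wordPacked are purely illustrative.

```haskell
import qualified Data.ByteString.Char8 as B

-- A String is a linked list of Char: one cons cell per character.
wordAsList :: String
wordAsList = "hello"

-- A packed string keeps the same characters in one contiguous buffer.
-- (ByteString is assumed here as a stand-in for PackedString.)
wordPacked :: B.ByteString
wordPacked = B.pack "hello"

main :: IO ()
main = do
  print (length wordAsList)                  -- walks the list cell by cell
  print (B.length wordPacked)                -- reads a stored length
  print (B.unpack wordPacked == wordAsList)  -- same characters either way
```

Every String produced by I/O pays the per-character list cost, whether or
not the read itself is lazy.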
Enclosed is a version of the spell-checker program that is about a
factor of 3 faster than the one that Doug currently has; it uses
PackedStrings and a home-grown hash table. The only downside is that
it reads the entire input into a big PackedString before producing
anything, and profiling suggests that it spends about half its time
splitting the big string into lines. I think if I hack up an hGetLinePS
I might be able to get another factor of 2 out of it.
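
Roughly what I have in mind for the line-at-a-time version, as a sketch
only. It assumes Data.ByteString.Char8, whose hGetLine returns a packed
line, in place of the hGetLinePS that does not exist yet, and Data.Set in
place of the home-grown hash table. The dictionary file name "words.txt"
and the helpers known and checkLines are hypothetical.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Set as Set
import Control.Monad (unless)
import System.IO (Handle, hIsEOF, stdin)

-- Hypothetical helper: is this word in the dictionary?
known :: Set.Set B.ByteString -> B.ByteString -> Bool
known dict w = Set.member w dict

-- Read one packed line at a time and print every word that is not in
-- the dictionary. No big input string is built and then split.
checkLines :: Set.Set B.ByteString -> Handle -> IO ()
checkLines dict h = loop
  where
    loop = do
      eof <- hIsEOF h
      unless eof $ do
        w <- B.hGetLine h          -- packed line, no [Char] in between
        unless (known dict w) (B.putStrLn w)
        loop

main :: IO ()
main = do
  -- "words.txt" is an assumed dictionary file, one word per line.
  dict <- Set.fromList . B.lines <$> B.readFile "words.txt"
  checkLines dict stdin
```

The dictionary is still read in one go here; only the input being checked
is streamed, which is where the line-splitting time goes.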
Cheers,
Simon