
Wow, I just woke up to see this :). I'm impressed at the speed of the response, thanks Daniel.

Bad news first.
a) According to codechef, you must also consider digits.
You're right, I totally missed this. Thanks :)

b) Your distance function is wrong.
With idx i j = (i+1)*(j+1) - 1, you write to the same position several times, resulting in garbage. You should use idx i j = i*(n+1) + j. Unfortunately, that slows things down.
Wow, that's just such an incredible "doh" moment for me. I had initially written the array as being indexed by a tuple (i,j), and later, to speed things up, I moved to a one-dimensional array without realising I had done something so dumb.
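Just to convince myself, here's the collision in miniature (a standalone toy with a made-up n, not the actual solution code):

import Prelude

-- Row-major flattening gives every (i,j) its own slot in the
-- (m+1)*(n+1)-cell table; the old formula collides.
idx :: Int -> Int -> Int -> Int
idx n i j = i * (n + 1) + j

broken :: Int -> Int -> Int
broken i j = (i + 1) * (j + 1) - 1

main :: IO ()
main = do
  print (broken 1 3, broken 3 1)  -- (7,7): two cells share one slot
  print (idx 3 1 3, idx 3 3 1)    -- (7,13): distinct slots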
Now some good news.

a) Since the substitution cost for a pair of letters doesn't depend on the strings, you can make a universal substitution-cost matrix (UArray (Char,Char) Int) and read the cost off that. It doesn't work wonders, but it speeds things up a little.
Yes, this makes a lot of sense. I had initially kept the letterHash outside; I moved it inside substitutionCost to encapsulate it, since it didn't change the time much. But making the entire substitution-cost matrix fixed makes a lot of sense.
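If I've understood right, the shape is something like this (a sketch only: the cost function below is a placeholder, the real one comes from the problem statement):

import Data.Array.Unboxed (UArray, array, (!))

-- Placeholder cost: 0 for equal characters, 1 otherwise.
cost :: Char -> Char -> Int
cost a b = if a == b then 0 else 1

-- Precompute every pairwise cost once; '0'..'z' covers both the digits
-- and the lowercase letters (plus a few unused codes in between).
costMatrix :: UArray (Char, Char) Int
costMatrix = array (('0', '0'), ('z', 'z'))
  [ ((a, b), cost a b) | a <- ['0' .. 'z'], b <- ['0' .. 'z'] ]

-- The substitution cost is now a single unboxed-array read.
substitutionCost :: Char -> Char -> Int
substitutionCost a b = costMatrix ! (a, b)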
b) If the lengths of the two strings differ by more than 2, the Levenshtein distance is at least 3, so you needn't calculate it at all. This was probably your intention, but laziness doesn't quite work the way you thought (if I interpreted your intentions correctly). With
distance orig new = memf m n
  where
    m = snd $ bounds orig
    n = snd $ bounds new
    ...
if |m - n| > 2, the thunks for the array entries must still be written - although most needn't be evaluated in this case, writing them still takes a lot of time. Make it
distance orig new = f m n
and no thunks need be written at all in this case. Cuts down running time by nearly half :)
Wow, yes. I was so fixated on the fibonacci example of memoisation I had seen that I didn't think of this. I also still don't pay enough attention to thunks and the time they take. So the problem here is that when I call memf, an entire array of thunks is written out, and then only the last one is evaluated; I can skip the array creation entirely in that case. Makes sense to me :)
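So the restructured top level would look roughly like this (my reconstruction, with made-up names memo and f; substitutionCost is a stand-in here, and I'm assuming, as your remark suggests, that only distances up to 2 matter, so anything larger can be reported as 3):

import Data.Array.Unboxed

-- Strings are assumed to be indexed from 1, so snd (bounds s) is the length.
distance :: UArray Int Char -> UArray Int Char -> Int
distance orig new = f m n
  where
    m = snd (bounds orig)
    n = snd (bounds new)
    idx i j = i * (n + 1) + j

    -- boxed array of lazily evaluated cells, laid out row-major
    memo :: Array Int Int
    memo = listArray (0, (m + 1) * (n + 1) - 1)
             [ f i j | i <- [0 .. m], j <- [0 .. n] ]

    f i j
      | abs (i - j) > 2 = 3           -- at least |i - j| edits; cap at 3
      | min i j == 0    = max i j     -- comparing against an empty prefix
      | otherwise       = minimum
          [ memo ! idx (i - 1) j + 1                  -- deletion
          , memo ! idx i (j - 1) + 1                  -- insertion
          , memo ! idx (i - 1) (j - 1)
              + substitutionCost (orig ! i) (new ! j) -- substitution
          ]

    -- stand-in for the real cost function (see the matrix sketch above)
    substitutionCost a b = if a == b then 0 else 1

Since the top-level call goes straight to f, the |m - n| > 2 case returns before memo is ever forced, so no thunks get written at all.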
I think you could speed it up significantly by calculating the distance more lazily.
I'd love to hear your thoughts on how that might work; I thought the whole thing was inherently lazy?
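My best guess while I wait: maybe compute the table row by row and bail out once a whole row exceeds 2? A sketch of that idea (plain 0/1 costs to keep it short; this is only my guess, not necessarily what you mean):

-- Row-by-row Levenshtein with an early bail-out: row minima never
-- decrease, so once every entry exceeds 2 the answer is at least 3.
distanceCapped :: String -> String -> Int
distanceCapped orig new = go [0 .. length new] orig
  where
    go row [] = min 3 (last row)
    go row (c:cs)
      | minimum row > 2 = 3                   -- can never drop below 3 again
      | otherwise       = go (nextRow c row) cs

    -- next DP row from the previous one, in a single scanl pass
    nextRow c row@(d:_) =
      scanl step (d + 1) (zip3 row (tail row) new)
      where
        step left (diag, up, c') =
          minimum [ left + 1                               -- insertion
                  , up + 1                                 -- deletion
                  , diag + if c == c' then 0 else 1 ]      -- substitution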
The profiling output is pretty straightforward. You have two functions that take up more or less all the time, one is substitutionCost, the other distance.
The former is comparatively benign: the letterHash should be calculated only once and kept alive through the entire run, but you use a Map.lookup and `elem` (plus a branch); with 26 letters, a lookup takes on average about four comparisons, then in general two more for `elem`. An array lookup will be much faster.
Aha, right.
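To make that concrete for myself (the Int codes below are a stand-in for whatever letterHash really stores):

import Data.Array.Unboxed (UArray, array, (!))
import qualified Data.Map as Map

-- Map-based lookup: roughly log2(26) comparisons per call.
letterMap :: Map.Map Char Int
letterMap = Map.fromList (zip ['a' .. 'z'] [0 ..])

-- Unboxed-array lookup: one index computation and one read.
letterArr :: UArray Char Int
letterArr = array ('a', 'z') (zip ['a' .. 'z'] [0 ..])

lookupSlow, lookupFast :: Char -> Int
lookupSlow c = letterMap Map.! c
lookupFast c = letterArr ! c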
The latter really does use a lot of time. As said before, a large part of that is because you're not lazy enough. Still, it is a complicated calculation, so it remains a time-consuming task. For more detailed information on which parts take the most time, add further cost centres ({-# SCC "thing" #-} annotations). Be aware, however, that profiling often rules out optimisations and thus changes the run-time behaviour of your code.
Thanks.
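For my own notes, a toy example of manual cost centres (arbitrary names, nothing to do with the real code):

-- Build with something like:  ghc -O2 -prof Main.hs
-- Run with:  ./Main +RTS -p   to get a per-centre time/alloc report.
main :: IO ()
main = print (sum (map step [1 .. 1000000 :: Int]))
  where
    step x = ({-# SCC "square" #-} (x * x)) + ({-# SCC "bump" #-} (x + 1))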
Thanks a bunch. I'm gonna make those changes now :). Might take me a while though, because my Haskell code-writing speed is still a bit slow =p.