[darcs #222] performance problem with massive changes in a single file

20 Feb 2005

      I'm cc'ing haskell-cafe on this darcs bug, since I imagine there might
be a student (or professor) hanging around there who might be interested
in the implementation of an efficient LCS algorithm in haskell.

[markjugg - Sun Feb 20 09:15:57 2005]:
...
Typically in software, 1000's of line aren't changed at the time, which
is why this hasn't been an big issue before.
Indeed.  The issue here is that darcs uses the exact Hunt-Szymanski LCS
algorithm when it computes the diff, while GNU diff uses an approximate
--but faster--algorithm, which works "almost" as well.  The approximate
algorithm is reasonably well-documented, but more complex to implement
than the exact one, just because it's harder to understand.

This would be a reasonable project for an interested haskell CS student.
 The LCS is implemented in pure haskell 98, and is of signature

lcs :: Ord a => [a] -> [a] -> [a]

It's an interesting problem that's easily testable, and has different
scaling behavior in different limits (finite alphabet, infinite
alphabet, permutations of unique objects).

David (ever optimistic for new contributors to darcs!)
http://darcs.net

David Roundy via RT

ajb＠spamcop.net

tags

participants (2)