
From: Jared Updike [mailto:jupdike@gmail.com]
Thanks for posting the code. It works on pretty large data sets (for example, a thousand Strings each) and I have a hunch that if I use Data.ByteString it would even work fast enough on my quarter meg text files (split on words, ~40,000 and ~50,000 words each) to use in place of GNU sdiff or diff. Did you use FastPackedString or ByteString to get performance you alluded to?
You'll notice that the primary interface is: diff :: Eq a => [a] -> [a] -> [Modification a] So it's not optimised for any kind of String input. The performance I'm alluding to is not something I've tested with any rigor, or even metrics to support. I simply needed to compare a couple of large text files which GNU diff didn't handle (I think when the input is over a certain size it breaks it into chunks and diffs the chunks). I tried this code, and it did the job in a reasonable time (my PC has 1G of memory, and that helps when the input lists have to reside entirely in memory). That was using naïve String-IO too, so there might well be some mileage in a FPS-specific implementation. I profiled the code by generating large inputs in memory, and then invoking the various interfaces, so as to isolate the testing from IO performance. And that was just to find the performance hotspots in the algorithm, not to compare against other diff implementations. Alistair ***************************************************************** Confidentiality Note: The information contained in this message, and any attachments, may contain confidential and/or privileged material. It is intended solely for the person(s) or entity to which it is addressed. Any review, retransmission, dissemination, or taking of any action in reliance upon this information by persons or entities other than the intended recipient(s) is prohibited. If you received this in error, please contact the sender and delete the material from any computer. *****************************************************************