
Just a quick status report, and a couple of lessons learned.

Things work adequately, as far as I can tell. I can now process heaps of data without anything blowing up, and it appears to be faster than spam-stat.el, although I haven't measured.

I'm back to using readFile for file IO, and it works nicely, as long as I make sure all of the file is processed. I think this is a good way of processing large amounts of data (where the processing reduces the data size); reading the entire file into memory strictly quickly becomes too costly (expanded to linked lists of Unicode characters, ugh).

Don't trust FiniteMap to evaluate anything. I have evidence that one of the major space leaks was the FiniteMap evaluating the strings used as keys only far enough to prove them unique. (Is this right?) Strictifying the strings helped a lot; there is a sketch of what I mean below.

One question, though, about hFlush. I print out the status by repeatedly putStr'ing "blah blah \r". With NoBuffering set, it works, but when I follow the putStr with 'hFlush stdout' it doesn't (it only outputs very sporadically). I guess I'm misunderstanding what hFlush is for; anybody care to elaborate? (The pattern is sketched below, too.)

And a final lesson: unlike cockroaches, computer bugs hide in the light as well as in the darkness. One bug in the very trivial token-parsing code caused a lot of words that should have been ignored to be included.

Thanks to everybody who helped out.

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants
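
PS: For anyone curious, here is roughly the shape of the counting loop I ended up with. This is a simplified sketch, not the actual filter code: the names (countWords, bump, strict) are made up for illustration, and it uses Data.Map.Strict where my code uses FiniteMap. The idea is the same, though: read the file lazily, fold over it strictly, and force each key before inserting it, so the map never holds a thunk that still points into the file contents.

    import qualified Data.Map.Strict as M
    import Data.List (foldl')

    -- Read lazily, accumulate strictly: readFile streams the file on
    -- demand, foldl' forces the map at every step, and forcing each
    -- word before inserting it means the map stores a fully evaluated
    -- key rather than an unevaluated thunk.
    countWords :: FilePath -> IO (M.Map String Int)
    countWords path = do
      contents <- readFile path
      return $! foldl' bump M.empty (words contents)
      where
        strict s = length s `seq` s        -- force the whole string
        bump m w = M.insertWith (+) (strict w) 1 m

With this shape the whole file never has to be in memory at once; only the table of counts grows.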
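
PPS: To make the hFlush question concrete, this is the pattern I mean (again a simplified sketch; the message text and the loop in main are just for illustration):

    import System.IO (hFlush, stdout)
    import Control.Monad (forM_)

    -- Overwrite a single status line: print with a trailing \r and
    -- then flush.  As I understand it, hFlush should push the text out
    -- regardless of the buffering mode, which is why the sporadic
    -- output I described above surprises me.
    status :: Int -> Int -> IO ()
    status done total = do
      putStr ("processed " ++ show done ++ " of " ++ show total ++ " \r")
      hFlush stdout

    main :: IO ()
    main = forM_ [1 .. 100 :: Int] (\i -> status i 100) >> putStrLn ""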