
Am Montag 19 April 2010 17:17:11 schrieb Arnoldo Muller:
The strings will not be longer than 30 characters.
For 20 -30 character strings, using ByteStrings should be better, in my tests about 40% faster, allocation figures slightly lower, resident memory much lower and bytes copied during GC very much lower. For a sample of english text (many short words, few long), ByteStrings were about 25% faster, allocation figures very slightly lower, resident memory much lower, bytes copied much lower (relative difference not as large as for longer strings).
I am doing sets of 2000 (total of 2000^2 distance computations)
That shouldn't give memory problems either way.
I am expecting that all the operations will be lazyly performed but at some point I get a memory error.
My guess is a bad consumption pattern.
Most of the memory is being allocated for the hamming distance and I am still unable to find the source of my memory leak.
Allocation as such is not a problem, resident memory is the important thing. Try heap profiling to see what holds on to memory (+RTS -hc would be a good first run).
Regards,
Arnoldo
On Mon, Apr 19, 2010 at 3:47 PM, Daniel Fischer
Am Montag 19 April 2010 14:37:33 schrieb John Lato:
Is it really necessary to use Strings? I think a packed type, e.g. Vector or ByteString, would be much more efficient here.
Not very much if the strings are fairly short (and the list isn't too long, so there's not a big difference in cache-friendliness). If eight-bit characters aren't enough, packing the strings into UArray Int Char gives performance quite close to ByteStrings.
Of course this is only likely to be a benefit if you can move away from String entirely.
I suspect that "hamming2" would perform better then.
John