
7 Jul 2011, 2:27 a.m.
On 7/6/11 6:45 PM, Aleksandar Dimitrov wrote:
One hint, if you ever find yourself reading in quantitative linguistic data with Haskell: forget lazy IO. Forget strict IO too, unless your documents are never bigger than a few hundred megabytes. If you're not keeping the whole document in memory but are keeping some of it, never keep that data around in ByteStrings; use Text or SmallString instead (ByteStrings will invariably leak space in this scenario). Learn how to use Iteratees and use them judiciously.
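For illustration, one well-known way a ByteString can hold on to memory (possibly, though not necessarily, the leak meant above) is that slicing functions such as `take`, `drop`, and `words` return views that share the original buffer. A small token kept in a long-lived structure can therefore keep a whole input chunk alive. The sketch below assumes a word-frequency count over input chunks; `countChunk` is a hypothetical name, not something from the message. It uses `copy` from the bytestring package so that each stored key owns its own small buffer.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.Map.Strict as M
import Data.List (foldl')

-- | Hypothetical helper: add the word counts of one input chunk to an
-- accumulated frequency map.
--
-- 'BC.words' returns slices that share the chunk's buffer, so storing a
-- slice directly as a map key would keep the entire chunk reachable.
-- 'BC.copy' allocates a fresh buffer holding only the word's bytes.
countChunk :: M.Map BC.ByteString Int -> BC.ByteString -> M.Map BC.ByteString Int
countChunk acc chunk = foldl' step acc (BC.words chunk)
  where
    -- The key is forced during insertion (keys are compared), so the copy
    -- happens immediately rather than lingering as a thunk that still
    -- references the chunk.
    step m w = M.insertWith (+) (BC.copy w) 1 m
```

Folding `countChunk` over the chunks of a large file keeps only the copied keys and their counts in the map. Decoding each token to `Text` also produces an independent value, which fits the advice above, at the cost of a decoding step.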
I definitely agree with the iteratees comment, but I'm curious about the leaks you mention. I haven't run into leakiness issues (that I'm aware of) in my use of ByteStrings for NLP.

-- 
Live well,
~wren