Re: [Haskell-cafe] NLP libraries and tools?

On 7/7/11 3:38 AM, Aleksandar Dimitrov wrote:
> On Wed, Jul 06, 2011 at 07:27:10PM -0700, wren ng thornton wrote:
>> I definitely agree with the iteratees comment, but I'm curious about the leaks you mention. I haven't run into leakiness issues (that I'm aware of) in my use of ByteStrings for NLP.
>
> The issue is this: strict ByteStrings retain pointers to the original chunk. The chunk is probably bigger than you'd want to keep in memory, if you, say, wanted to just keep one or two words. In my case, the chunk was some 65K (that was my Iteratee chunk size.)

Oh, that issue. Yeah, I maintain an intern table and make sure that the copy in the table is a trimmed copy instead of keeping the whole string alive. I guess I should factor that part of my tagger out into a separate package :)

I didn't know if you meant there was a technical issue, e.g. something about the fact that ByteString uses pinned memory (whereas Text doesn't, IIRC).

--
Live well,
~wren
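
For anyone following along, here is a minimal sketch of the trimmed-copy idea, not the code from the tagger mentioned above. It assumes the `bytestring` and `containers` packages; the names `InternTable` and `intern` are hypothetical, made up for illustration. The relevant library call is `copy`, which allocates a fresh buffer holding only the bytes of the slice, so the table no longer keeps the original chunk alive.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M

-- Hypothetical intern table: maps each distinct token to a single
-- canonical copy that does not share a buffer with the input chunk.
type InternTable = M.Map B.ByteString B.ByteString

-- Look a token up. On a hit, return the canonical copy already in the
-- table. On a miss, store a trimmed copy (B.copy) rather than the slice
-- itself, since the slice would retain the whole chunk it was cut from.
intern :: B.ByteString -> InternTable -> (B.ByteString, InternTable)
intern tok table =
  case M.lookup tok table of
    Just canonical -> (canonical, table)
    Nothing ->
      let trimmed = B.copy tok
      in trimmed `seq` (trimmed, M.insert trimmed trimmed table)
```

The caller would thread the table through its tokenizer and keep only the value returned by `intern`, dropping the slice it passed in. Once no slices of a chunk remain reachable, the chunk can be collected even though the interned words live on.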