
1 Jun 2011, 6:46 a.m.
On Wednesday 01 June 2011 12:28:28, John Lato wrote:
> There are a few solutions to this. The first is to make a copy of the bytestring so only the required data is retained. In my experiments this wasn't helpful, but it would depend on your corpus. The second is to start with smaller chunks.
A third: check whether the word is already known, and *make a copy only if it is not*. That way only the required parts (plus the chunk currently being processed) should stay in memory, because the map never holds a slice that points into an old chunk.
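A rough sketch of that idea, assuming the words are counted in a `Data.Map` keyed by strict `ByteString`s. The names `countWord` and `countWords` are made up for illustration, and words that straddle a chunk boundary are not handled here:

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M
import Data.List (foldl')

-- Count one word. A known word only bumps the counter, and M.adjust
-- leaves the stored key untouched. A new word is copied first, so the
-- map key does not keep the whole chunk it was sliced from alive.
countWord :: M.Map B.ByteString Int -> B.ByteString -> M.Map B.ByteString Int
countWord m w
  | M.member w m = M.adjust (+ 1) w m
  | otherwise    = M.insert (B.copy w) 1 m

-- Fold over all words of all chunks (each chunk a strict ByteString).
countWords :: [B.ByteString] -> M.Map B.ByteString Int
countWords = foldl' countWord M.empty . concatMap B.words
```

This does two lookups per word (`M.member`, then `M.adjust` or `M.insert`), which keeps the sketch simple. Whether the copying pays off will still depend on the corpus: with many distinct words you copy a lot, with few you copy almost nothing.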