
Hello,

On 11/29/2011 10:57 PM, Stephen Tetley wrote:
Hi Hugo
What is a POSTags and how big do you expect it to be?
type Token = String
type Tag = String
type NGramTag = (Token, Tag, Tag)
type POSTags = Z.Zipper NGramTag
Generally I'd recommend you first try to calculate the size of your data rather than try to strictify things. See Johan Tibell's very useful posts:
http://blog.johantibell.com/2011/06/memory-footprints-of-some-common-data.ht...
http://blog.johantibell.com/2011/06/computing-size-of-hashmap.html
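As a rough illustration of that kind of size calculation, here is a minimal sketch for the types quoted above. The per-constructor costs are rule-of-thumb figures for 64-bit GHC (3 words per list cons cell, up to 2 words per boxed Char, 4 words for a 3-tuple) and are assumptions, not measurements. The helper names (wordSize, stringWords, nGramTagWords, listWords, listBytes) are made up for this example. It ignores sharing, so it is an upper bound: repeated tags and common characters may well be shared in practice.

```haskell
-- Back-of-the-envelope heap estimate, in machine words.
-- All constants are assumed rule-of-thumb values for 64-bit GHC.

type Token = String
type Tag = String
type NGramTag = (Token, Tag, Tag)

-- Bytes per machine word (assumption: 64-bit platform).
wordSize :: Int
wordSize = 8

-- A String is a linked list of Char: 3 words per cons cell plus
-- up to 2 words per boxed Char, so at most 5 words per character.
stringWords :: String -> Int
stringWords s = 5 * length s

-- A 3-tuple costs a header word plus three pointers, plus its fields.
nGramTagWords :: NGramTag -> Int
nGramTagWords (tok, t1, t2) =
  4 + stringWords tok + stringWords t1 + stringWords t2

-- A list adds one 3-word cons cell per element.
listWords :: (a -> Int) -> [a] -> Int
listWords elemWords xs = sum [3 + elemWords x | x <- xs]

-- Estimated size in bytes of a plain list of n-gram tags.
listBytes :: [NGramTag] -> Int
listBytes = (* wordSize) . listWords nGramTagWords
```

Under these assumptions a single entry such as ("the", "DT", "NN") comes to 39 words, or 42 with its cons cell, so String-based data can occupy many times its size on disk.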
Going by the size of the data as Strings, I am expecting a maximum of 50 MB. Profiling (after a painful 80 minutes) shows:
total alloc = 20,350,382,592 bytes
Way too much.
Once you know the size of your data, you can decide if it is too big to comfortably work with in memory. If it is too big, you need to make sure you are streaming[*] it rather than forcing it into memory.
If POSTags is large, I'd be very concerned about the top line of updateState - reversing lists (or sorting them) simply doesn't play well with streaming.
The zipper does quite a bit of reversing and appending. I also need to reverse lists to retain the order of the characters (text). I also do sorting, but I have eliminated this in the tests.

So my question: how can one "force" the reversing and appending? Anyone?

TIA,
Hugo F.
[*] Even in a lazy language like Haskell, streaming data isn't necessarily automatic.
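On the question of forcing a reverse or an append, one possible approach is sketched below. It assumes the deepseq package (which ships with GHC) and plain lists rather than the Z.Zipper type, whose internals are not shown in this thread. The names pushStrict, collect and appendForced are hypothetical. The idea is to evaluate each element fully as it is added to an accumulator, so that no chain of unevaluated thunks builds up, and to use force on an append when the whole result is wanted in memory at once.

```haskell
import Control.DeepSeq (NFData, deepseq, force)
import Data.List (foldl')

type Token = String
type Tag = String
type NGramTag = (Token, Tag, Tag)

-- Fully evaluate an element before consing it onto the accumulator.
pushStrict :: NFData a => a -> [a] -> [a]
pushStrict x acc = x `deepseq` (x : acc)

-- Build a list in reverse with a strict left fold, then restore the
-- original order. foldl' evaluates the accumulator at every step,
-- which in turn triggers the deepseq in pushStrict.
collect :: NFData a => [a] -> [a]
collect = reverse . foldl' (flip pushStrict) []

-- (++) is lazy in its result; force walks the whole appended list.
-- Note that force only does its work once the result is itself
-- demanded, e.g. via ($!), seq, or a bang pattern at the use site.
appendForced :: NFData a => [a] -> [a] -> [a]
appendForced xs ys = force (xs ++ ys)
```

Forcing trades laziness for predictable memory use, so it does not help if the fully evaluated data is itself too large. It may also be worth keeping in mind that the total alloc figure in a GHC profile is cumulative allocation over the whole run, not the amount live at any one time.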
_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners