
Hello,

I'm writing a small program to process Delicious [1] RSS feeds. I like to look at the recent feeds to see what others have bookmarked lately, but they contain a lot of duplicates, because an entry is shown for each person who bookmarks a given URL. I decided to write a program that trims out the entries I've seen before.

My first version read a feed (initially just an on-disk copy of an RSS feed) and removed the duplicate items within that one feed. It worked great. Then I wanted to add persistence, so that state would be kept from one run to the next. I decided to use Data.Binary to serialize the Data.Map I was using and re-load it on each run.

Unfortunately, making this change caused a "Stack Space Overflow" error, and I couldn't track down what was wrong. That was with GHC 6.8.2. I recently upgraded to GHC 6.10.1, and now the memory grows without bound until it actually locks up my machine. This happens even when I comment out the code for serializing and de-serializing the map, so essentially the only difference from my prior version is that the function where the map is initialized returns IO [Item] instead of [Item].

The latest version of my code is up on github [2], and the sample RSS feed I was processing is included in the repo. I'd appreciate some help in how to attack this problem. I tried profiling it (back when I was using 6.8.2), but nothing in the output was enlightening, at least with my limited Haskell experience. I am unsure how to get this to work, or whether the problem is even in my code.

Additionally, I am unsure whether my serialization code would work anyway. Because Haskell is not pass-by-reference, would the changes to the seenMap propagate back to my deDupWithSerializedMap function, where it is serialized? If not, how would I go about doing this? I think part of my problem might be the difference between pure and impure code and how to separate the two.

Thanks for the help!
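To make the question about the seenMap concrete, here is a rough sketch of one possible way to split the pure and impure parts. It is not the code in the repo: the Item fields, the SeenMap type, and the names deDup and deDupWithStore are hypothetical stand-ins. The idea is that the pure function returns the updated map alongside the new items, so nothing has to "propagate back"; the IO wrapper only loads the map, calls the pure function, and saves the result.

```haskell
import qualified Data.Map as Map
import Data.Binary (decodeFile, encodeFile)
import Data.List (foldl')
import System.Directory (doesFileExist)

-- Hypothetical stand-ins for the real types in the repo.
type Url = String

data Item = Item { itemUrl :: Url, itemTitle :: String }
  deriving (Eq, Show)

-- URLs seen so far, mapped to the title they were first seen with.
type SeenMap = Map.Map Url String

-- Pure part: drop items whose URL is already in the map and return
-- the updated map together with the items that were new.
deDup :: SeenMap -> [Item] -> ([Item], SeenMap)
deDup seen0 items = (reverse fresh, seen1)
  where
    -- foldl' evaluates the accumulator pair at every step, and the
    -- guard below looks at the map, so no long chain of pending
    -- inserts builds up while walking the list.
    (fresh, seen1) = foldl' step ([], seen0) items
    step (acc, seen) item
      | Map.member (itemUrl item) seen = (acc, seen)
      | otherwise =
          (item : acc, Map.insert (itemUrl item) (itemTitle item) seen)

-- Impure part: load the map if a store file exists, run the pure
-- function, and write the updated map back out.
-- Assumption: depending on the version of the binary package,
-- decodeFile may read the file lazily, in which case the map may need
-- to be fully forced before writing to the same path.
deDupWithStore :: FilePath -> [Item] -> IO [Item]
deDupWithStore path items = do
  exists <- doesFileExist path
  seen <- if exists then decodeFile path else return Map.empty
  let (fresh, seen') = deDup seen items
  encodeFile path seen'
  return fresh
```

With this shape, only deDupWithStore lives in IO, and deDup can be tried out on its own in GHCi with Map.empty and a short list of items.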
[1] http://www.delicious.com/
[2] http://github.com/Nafai77/recent-feeds/tree/master

---------------
Travis B. Hartwell
Software Toolsmith
Blog: http://www.travishartwell.net/blog
Where to find me: http://www.travishartwell.net/blog/static/where_to_find_me