
Bertram Felgenhauer wrote:
I'm fairly certain that the stack overflow is (indirectly) caused by Data.Binary, not Data.Map.
Yes, I think you are right. At least it seems clear that the stack overflow is not directly caused by fromDistinctAscList.
The result of 'decode' is a list of known length with unevaluated elements (because the Get monad is lazy) [*], where each element depends on the previous one (because reading is sequential, and the elements of the list may have nonuniform size). Now when you evaluate the last element, that will demand the last but one element, which in turn will demand the previous element, and so on. If the list is long enough, you'll get a stack overflow.
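
To make the shape of the problem concrete, here is a minimal sketch (the name 'chain' and the length 1000000 are illustrative, not from the original program): every element is a thunk that refers to the element before it, so demanding only the last one descends the whole chain.

  -- A list whose elements are thunks, each referring to the previous
  -- element, much like the offsets in a sequentially decoded stream.
  chain :: Int -> [Int]
  chain n = go 0 n
    where
      go _    0 = []
      go prev k = let x = prev + 1      -- thunk holding on to 'prev'
                  in x : go x (k - 1)

  main :: IO ()
  main = print (last (chain 1000000))
  -- Walking the spine is cheap, but forcing the last element forces the
  -- one before it, and so on all the way back; with an old default stack
  -- limit (or e.g. +RTS -K1m) this evaluation overflows the stack.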
Using fromAscList instead of fromDistinctAscList cures the problem, because it inspects every key in the list, starting from the beginning, in order to eliminate duplicates. fromDistinctAscList, on the other hand, does not inspect the keys at all.
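
If one wanted to keep fromDistinctAscList, a possible alternative cure is to do that inspection by hand: walk the decoded list once, forcing each element left to right, before building the map. A sketch of that idea ('forceElems' and 'buildMap' are made-up names; forcing only to WHNF appears to be enough here because the chain lives in the element thunks themselves):

  import Data.List (foldl')
  import qualified Data.Map as Map

  -- Force every list element to WHNF, left to right, then return the
  -- list unchanged.
  forceElems :: [a] -> [a]
  forceElems xs = foldl' (flip seq) () xs `seq` xs

  -- fromDistinctAscList never looks at the keys, so look at them
  -- (and the values) ourselves first.
  buildMap :: [((Int, Maybe String), Int)] -> Map.Map (Int, Maybe String) Int
  buildMap = Map.fromDistinctAscList . forceElems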
do [path] <- getArgs
   m <- liftM decode (BS.readFile path) :: IO [((Int, Maybe String), Int)]
   putStrLn . show . last $ m
should also exhibit the same stack overflow.
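
For comparison, here is a sketch of the same test where the only substantive change is a strict left-to-right walk over the decoded list before 'last' is demanded (imports added to make it self-contained, and BS is assumed to be Data.ByteString.Lazy, which decode expects):

  import Control.Monad (liftM)
  import Data.Binary (decode)
  import Data.List (foldl')
  import System.Environment (getArgs)
  import qualified Data.ByteString.Lazy as BS

  main :: IO ()
  main = do
    [path] <- getArgs
    m <- liftM decode (BS.readFile path) :: IO [((Int, Maybe String), Int)]
    -- Forcing each element in order means no element is left as a thunk
    -- pointing at its predecessor by the time 'last' is evaluated.
    foldl' (flip seq) () m `seq` (putStrLn . show . last $ m)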
Indeed it does. So, if I understand correctly, this is yet another example of what I would call the lazy applied strict thunk chain problem (I'm sure you understand what I mean by that).

The standard "party line" advice in such circumstances is to prevent their formation by evaluating eagerly as they are applied. However, this is often not possible for users of libraries they don't "own", if the API does not provide the necessary strictness control. I would argue that even where it is possible, it is sometimes undesirable too (if evaluation is expensive, saves nothing in heap use, and the final value may never be needed).

So I still think the stack management system should be designed so that, as far as is practical (given finite memory), any expression that can be built on the heap can also be evaluated without causing a "stack overflow". But I guess this rant is not much help to the OP :-)

Regards
--
Adrian Hey