
Right, I wouldn't use DList for this. Here's an alternative I use:
data AList a = ANil | ASing a | Append (AList a) (AList a)
lenA :: AList a -> Int lenA ANil = 0 lenA (ASing _) = 1 lenA (Append l r) = lenA l + lenA r
appendA ANil r = r appendA l ANil = l appendA l r = Append l r
Note how appendA is strict(ish) and eliminates ANils, so in a writer monad it shouldn't build up a space leak. I'm sure you can write toList (don't use (++) though).
I hope you're not overestimating me :) I thrashed around for a good long time trying to get (++) to associate to the right, and then stumbled across OrdList in ghc, which is apparently just this data structure, with the addition of 'Many [a]'.
I'd put the bang on ws:
m>>= k = WriterT $ do (a, w) <- runWriterT m (b, w') <- runWriterT (k a) let !ws = w `mappend` w' return (b, ws)
The problem with this monad is that >>= isn't tail-recursive, so it will cause stack overflows on recursive monadic functions. I suspect that a better alternative to the strict writer monad is the strict state monad in most cases, because its bind is tail-recursive.
Bang on 'ws' was my first instinct too, but it appeared to have no effect. I tried to work out a manual reduction for >>= to figure out exactly what is being forced when, but I got a headache and decided to do something more relaxing, like watch Primer. Yes, I suppose Writer's >>= isn't tail recursive... which seems like a disaster for >>=, since they are usually composed into very long chains. I know non-tail recursiveness isn't necessarily a disaster in haskell because of laziness, but it seems like you need just one strict monad in the stack, like ErrorT or IO and you are no longer lazy. I implemented LoggerT as a strict state monad, and I think it got better, but there's still a lot of garbage coming from somewhere. AppendList actually seems slower in the face of repeated appends than (:) followed by reverse, as I mentioned in the other email. Or it could be I'm not measuring things right...
I also have a situation where -hb shows about 4mb of drag at the most, but when I run with '-hc -hbdrag', I only see 10k peaks. Shouldn't filtering the graph by drag add up to as much drag as when the graph isn't filtered?
That sounds suspicious. If you can make a self-contained example that demonstrates it and create a ticket, that would be a great help.
Ok, I'll try to reduce this to a manageable size. It's the same code that produces the deadlock while profiling, so perhaps that's related. thanks again for the advice!