
On Friday 03 September 2010 00:22:14, Jan Christiansen wrote:
Hi,
On 02.09.2010, at 13:41, Daniel Fischer wrote:
takes a little to run and keeps the entire file until the first occurrence of pat in memory.
first of all thanks very much for the detailed instructions.
I have rewritten the example slightly using Strings instead of Bytestrings. Replacing all occurrences of 'ä' by "ä" in the collected works of Shakespeare ; ) has a maximum memory usage of around 65MB with the current implementation of intersperse while it has a maximum memory usage of only around 5KB with the less strict implementation.
No surprise, there aren't many 'ä's in Shakespeare's works, are there?
I think this is due to the Wadler tuple space leak.
Yup.
The same would apply to the current implementation of lines. I wonder whether an implementation of lines analogous to splitBy has any disadvantages.
Hardly, but yes. 'break' constructs a pair pretty immediately, so case break p (x:xs) of (pre, post) -> pre : case post of [] -> [] (y:ys) -> stuff can only do harm if (p x) diverges, but then it does. Currently, lines (_|_ : rest) = _|_ : _|_ while withe the break, we'd have lines' (_|_ : rest) = _|_ On the other hand, the current implementation of lines does not seem to suffer from Wadler's tuple space leak (according to one test I made), so I'd stick with the current implementation for the time being.
That is, we could have
intersect [] _|_ = [] and intersect (_|_:_|_) [] = []
or
intersect [] (_|_:_|_) = [] and intersect _|_ [] = []
and the current implementation satisfies neither.
Right. So the question is, has the current implementation advantages over either of these? (I don't see any.) If not, which of these two behaviours is preferable?
I'd prefer the first one as it is in line with the left to right pattern matching of Haskell.
Moi aussi.