
Hi all,

in my Haskell exercises I often build lists by appending values at the end. But I read somewhere that in Haskell this is inefficient, since there is no 'pointer to the tail of the list'. I've also seen some examples of recursive functions which build the list tail-first and then, in the base case of the recursion, return the reverse of the accumulated results, counting on the fact that in Haskell 'reverse list' just means 'consume the list from the tail'.

Just out of curiosity (I have no need to speed up the little programs with which I try to teach myself some Haskell): is this a consolidated pattern? And are appends really expensive?

Ciao
-------
FB

In general, you want to build a list "from the front". Consider the list constructor: x:xs is a list when x is an element and xs is a list; on the other hand, xs:x will give you a type error.

I think the Lisp syntax is a bit more elucidating. In Lisp, you use "cons" to create a pair, e.g. (cons 1 2), and to create a list you make a pair whose last element is empty and cons elements onto the front. E.g. (cons 2 '()) gives you the list '(2), which is Lisp for [2], and (cons 1 (cons 2 (cons 3 (cons 4 '())))) gives you '(1 2 3 4), Lisp for [1,2,3,4]. In order to add to the end of the list, you need to unravel the whole list, replacing the '() at the end with (cons 5 '()). On the other hand, adding to the front just requires you to cons another value onto it.

Haskell works the same way, but it uses an infix operator:

    [1,2,3,4] = 1:2:3:4:[] = 1 : (2 : (3 : (4 : [])))

So appending to the end of a list is expensive, while adding to the front just requires x : list. Obviously, if you're actually concatenating two lists, you need to walk through one of them anyway; but if you're just putting a single element on, it's better to put it on the front.

Normally, this is accomplished in practice by writing recursive functions which work towards the end of the list. For example, map will apply a function to the first element and then map over the rest:

    map f (x:xs) = f x : map f xs
    map _ []     = []

So you use your list as a stack: pulling everything off, and then placing it all back on with the function applied.

Does that help, or did I miss the point?

Cheers,
Cory Knapp

Francesco Bochicchio wrote:
Hi all, in my haskell exercises I often build list by appending values at the end. [...] And are append really expensive? Ciao ------- FB
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners
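The front-vs-back distinction above can be sketched as two ways of rebuilding a list (the function names buildBack and buildFront are made up for illustration):

```haskell
-- Quadratic: each (++ [x]) re-walks the whole accumulated list.
buildBack :: [a] -> [a]
buildBack = foldl (\acc x -> acc ++ [x]) []

-- Linear: cons to the front, then reverse once at the end.
buildFront :: [a] -> [a]
buildFront = reverse . foldl (\acc x -> x : acc) []

main :: IO ()
main = print (buildBack [1..5 :: Int] == buildFront [1..5])  -- prints True
```

Both produce the same list; they differ only in how much work the intermediate steps do.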

2009/2/7 Cory Knapp
Does that help, or did I miss the point?
Ok, I'll try to be more clear with an example. The following function - which surely can be written in a better way - takes a list and a predicate and builds two lists: a list of lists of consecutive elements that satisfy the predicate, and a list of separators, i.e. of elements that do not satisfy the predicate. That is:

    groups odd [1,3,2,5,9,6] -> ([[1,3],[5,9]], [2,6])

The function uses another function, groupWhile, which works like takeWhile but also returns the rest of the list.

Now, from the way the function works, it should build the two result lists by appending new elements to them. But, as you said, appending is expensive in Haskell, so instead I build the lists using (:), which gives me the lists in reverse order, and then I use the 'final step' of the function - the base case of the recursion - to reverse the result. If I understand laziness, this should be a low-cost operation, because it does not actually build a reversed list but only makes the list be 'consumed' from the tail.

Now, this trick is not mine: I read it somewhere on the internet (I can't remember where), so I was asking whether it is classified among the 'neat tricks' or among the 'ugly hacks' :-)

Here is the code I was referring to:

    groups :: (a -> Bool) -> [a] -> ([[a]], [a])
    groups f lst = groups_ f ([],[]) lst
      where
        groups_ f (gs,seps) []  = (reverse gs, reverse seps)
        groups_ f (gs,seps) [a] = if f a
                                    then (reverse ([a]:gs), reverse seps)
                                    else (reverse gs, reverse (a:seps))
        groups_ f (gs,seps) lst = groups_ f (g:gs, r:seps) est
          where (g, (r:est)) = groupWhile f lst

Ciao
-------
FB
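The groupWhile helper isn't shown in the message; assuming it behaves as described (like takeWhile, but also returning the unconsumed rest of the list), a minimal sketch could be:

```haskell
-- A guess at the groupWhile used above: like takeWhile, but it also
-- returns the rest of the list after the first failing element.
groupWhile :: (a -> Bool) -> [a] -> ([a], [a])
groupWhile _ []     = ([], [])
groupWhile p (x:xs)
  | p x       = let (ys, rest) = groupWhile p xs in (x:ys, rest)
  | otherwise = ([], x:xs)

main :: IO ()
main = print (groupWhile odd [1,3,2,5,9,6 :: Int])  -- prints ([1,3],[2,5,9,6])
```

As the next reply points out, this is exactly the behaviour of span from the Prelude.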

Francesco Bochicchio wrote:
But, as you said, appending is expensive in haskell
We can say how expensive. Namely, fully evaluating xs ++ ys takes O(length xs) time. Always appending an element at the end will lead to something like

    (..((([1] ++ [2]) ++ [3]) ++ [4]) ++ ..) ++ [n]

which will take O(n^2) time instead of linear time. This is probably best demonstrated with the following two implementations of reverse:

    reverse1 []     = []
    reverse1 (x:xs) = reverse1 xs ++ [x]

    reverse2 xs = go [] xs
      where
        go ys []     = ys
        go ys (x:xs) = go (x:ys) xs

The first runs in quadratic time while the second runs in linear time.
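The two implementations above, written out as a self-contained module (the main here is just a quick sanity check that they agree):

```haskell
-- Quadratic reverse: the (++ [x]) re-walks the result at every step.
reverse1 :: [a] -> [a]
reverse1 []     = []
reverse1 (x:xs) = reverse1 xs ++ [x]

-- Linear reverse: an accumulator consed to at the front.
reverse2 :: [a] -> [a]
reverse2 = go []
  where
    go ys []     = ys
    go ys (x:xs) = go (x:ys) xs

main :: IO ()
main = print (reverse1 [1..10 :: Int] == reverse2 [1..10])  -- prints True
```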
so instead I build the lists using (:), which gives me the list in reverse order, and then I use the 'final step' of the function - the base case of the recursion - to reverse the result. If I understand laziness, this should be a low-cost operation because it does not actually build a reversed list, but only makes the list be 'consumed' from the tail.
No, reverse has nothing to do with laziness and still costs O(n) time. It's just that building the list in linear time and then reversing it in linear time is cheaper than building the list in quadratic time.
Here is the code I was referring to:
    groups :: (a -> Bool) -> [a] -> ([[a]], [a])
    groups f lst = groups_ f ([],[]) lst
      where
        groups_ f (gs,seps) []  = (reverse gs, reverse seps)
        groups_ f (gs,seps) [a] = if f a
                                    then (reverse ([a]:gs), reverse seps)
                                    else (reverse gs, reverse (a:seps))
        groups_ f (gs,seps) lst = groups_ f (g:gs, r:seps) est
          where (g, (r:est)) = groupWhile f lst
While using reverse instead of appending at the end of the list is a good idea, not building the list in reverse at all is much better.

First, let me restate your code as

    groups p xs = go ([],[]) xs
      where
        go (gs,seps) xs =
          let (g,rest) = span p xs
          in case rest of
               r:est -> go (g:gs, r:seps) est
               []    -> (reverse (g:gs), reverse seps)

The groupWhile function can be found in the Prelude; it's called span. This function groups differs from yours, for instance we have

    groups odd [2,2,2] == ([[],[],[],[]],[2,2,2])
    groups odd [1,2]   == ([[1],[]],[2])
    groups odd [1,1]   == ([[1,1]],[])

By the way, your code crashes in the last case.

Now, let's reformulate it so that the result won't be built in reverse. The key is to eliminate the unnecessary accumulating parameter and work with the result of the recursive call to groups directly:

    groups p xs =
      let (g,rest) = span p xs
      in case rest of
           r:est -> let (gs,seps) = groups p est in (g:gs, r:seps)
           []    -> ([g], [])

I'd say that this is the canonical implementation. In case you wrote this function to split strings, have a look at

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/split

Regards,
apfelmus

--
http://apfelmus.nfshost.com
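For reference, the canonical version above as a runnable module, together with the example results quoted in the reply:

```haskell
-- The "canonical" groups: no accumulator, the recursive result is
-- extended at the front, so nothing needs to be reversed.
groups :: (a -> Bool) -> [a] -> ([[a]], [a])
groups p xs =
  let (g, rest) = span p xs
  in case rest of
       r:est -> let (gs, seps) = groups p est in (g:gs, r:seps)
       []    -> ([g], [])

main :: IO ()
main = do
  print (groups odd [1,3,2,5,9,6 :: Int])  -- prints ([[1,3],[5,9],[]],[2,6])
  print (groups odd [2,2,2 :: Int])        -- prints ([[],[],[],[]],[2,2,2])
  print (groups odd [1,1 :: Int])          -- prints ([[1,1]],[])
```

Note the trailing [] in the first result: like the restated accumulator version, this groups always produces one more group than separators, which is where it differs from the original.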

Francesco Bochicchio wrote:
Heinrich Apfelmus wrote:
No, reverse has nothing to do with laziness and still costs O(n) time. It's just that building the list in linear time and then reversing it in linear time is cheaper than building the list in quadratic time.
I see. Thanks for the information. I keep thinking of Haskell lists as if they were 'generators', but this is not always the case... and I realize now that there is no way to reverse a list without expanding it first.
Well, they are 'generators' from an operational point of view, in that elements are generated on demand. But other than that, they are just a tree:

      :
     / \
    1   :
       / \
      2   :
         / \
        3   ...
Now, let's reformulate it so that the result won't be built in reverse. The key is to eliminate the unnecessary accumulating parameter and work with the result of the recursive call to groups directly:
groups p xs = let (g,rest) = span p xs in case rest of r:est -> let (gs,seps) = groups p est in (g:gs,r:seps) [] -> ([g], [])
I'd say that this is the canonical implementation.
The reason I used accumulators is that I try to teach myself to always write tail-recursive functions. I have been bitten by stack exhaustion in one of my first exercises (working on a large amount of data), so I thought that non-tail-recursive functions should always be avoided. But using accumulators often leads to appending to lists, so one has to find the balance between the two things...
Due to lazy evaluation, something that looks tail recursive is not necessarily tail recursive at all; I strongly advise you to unlearn this habit. For example, the version of groups that uses an accumulator is less efficient than the one without. In general, a lazy approach to performance is best: just use the simplest possible implementation and postpone performance considerations to when the program actually takes ages to compute the result.

Concerning stack exhaustion and large amounts of data, there is one very common pattern worth knowing, namely foldr vs. foldl'; see also

http://en.wikibooks.org/wiki/Haskell/Performance_Introduction#Space
http://book.realworldhaskell.org/read/functional-programming.html

Other than that, there is no way around understanding Haskell's evaluation model properly.

Regards,
apfelmus

--
http://apfelmus.nfshost.com
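A minimal sketch of the foldl' pattern mentioned above (the name sumStrict is just for illustration): foldl' forces the accumulator at every step, so it runs in constant stack space where a plain foldl would first build a huge chain of unevaluated (+) thunks.

```haskell
import Data.List (foldl')

-- Strict left fold: the running sum is evaluated at each step,
-- so no tower of thunks is built for a long input list.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

main :: IO ()
main = print (sumStrict [1..1000000])  -- prints 500000500000
```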

Yes, appending to the tail of a list is an expensive operation. You may ask yourself why there is no pointer to the end of a list. That's because the end is in general not known (e.g. if the list is produced by a computation, or if it's infinite). However, if you know up front that your list will be finite and you don't rely on laziness, you may want to use a Data.Sequence instead:
http://www.haskell.org/ghc/docs/latest/html/libraries/containers/Data-Sequence.html
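A small sketch of building "at the back" with Data.Sequence, where snoc (|>) is cheap at both ends (the helper name buildSeq is made up for illustration):

```haskell
import qualified Data.Sequence as Seq
import Data.Sequence (Seq, (|>))
import Data.Foldable (toList)

-- Appending one element with (|>) is O(1) for a Seq, so building
-- the structure back-to-front is no longer a quadratic trap.
buildSeq :: [a] -> Seq a
buildSeq = foldl (|>) Seq.empty

main :: IO ()
main = print (toList (buildSeq [1,2,3,4 :: Int]))  -- prints [1,2,3,4]
```

The trade-off is that a Seq is finite and stricter than a list, which is why the advice above only applies when you don't rely on laziness.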
Cheers,
Adrian

On 07.02.2009 at 21:29, Francesco Bochicchio wrote:
Hi all, in my haskell exercises I often build list by appending values at the end. [...] And are append really expensive? Ciao ------- FB
participants (4)
- Adrian Neumann
- Cory Knapp
- Francesco Bochicchio
- Heinrich Apfelmus