Performance of delete-and-return-last-element

Suppose I need to get an element from a data structure, and also modify the data structure. For example, I might need to get and delete the last element of a list: darle xs = ((last xs), (rmlast xs)) where rmlast [_] = [] rmlast (y:ys) = y:(rmlast ys) There are probably other and better ways to write rmlast, but I want to focus on the fact that darle here, for lack of a better name off the top of my head, appears to traverse the list twice. Once to get the element, and once to remove it to produce a new list. This seems bad. Especially for large data structures, I don't want to be traversing twice to do what ought to be one operation. To fix it, I might be tempted to write something like: darle' [a] = (a, []) darle' (x:xs) = let (a, ys) = darle' xs in (a, (x:ys)) But this version has lost its elegance. It was also kind of harder to come up with, and for more complex data structures (like the binary search tree) the simpler expression is really desirable. Can a really smart compiler transform/optimize the first definition into something that traverses the data structure only once? Can GHC? - Lucas

I don't think a really smart compiler can make that transformation. It
looks like an exponential-time algorithm would be required, but I can't
prove that.
GHC definitely won't...
For this specific example, though, I'd probably do:
darle :: [a] -> (a, [a])
darle xs =
case reverse xs of
[] -> error "darle: empty list"
(x:xs) -> (x, reverse xs)
- Clark
On Fri, Aug 30, 2013 at 2:18 PM, Lucas Paul
Suppose I need to get an element from a data structure, and also modify the data structure. For example, I might need to get and delete the last element of a list:
darle xs = ((last xs), (rmlast xs)) where rmlast [_] = [] rmlast (y:ys) = y:(rmlast ys)
There are probably other and better ways to write rmlast, but I want to focus on the fact that darle here, for lack of a better name off the top of my head, appears to traverse the list twice. Once to get the element, and once to remove it to produce a new list. This seems bad. Especially for large data structures, I don't want to be traversing twice to do what ought to be one operation. To fix it, I might be tempted to write something like:
darle' [a] = (a, []) darle' (x:xs) = let (a, ys) = darle' xs in (a, (x:ys))
But this version has lost its elegance. It was also kind of harder to come up with, and for more complex data structures (like the binary search tree) the simpler expression is really desirable. Can a really smart compiler transform/optimize the first definition into something that traverses the data structure only once? Can GHC?
- Lucas
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

isn't this what zippers are for? b On Aug 30, 2013, at 1:04 PM, Clark Gaebel wrote:
I don't think a really smart compiler can make that transformation. It looks like an exponential-time algorithm would be required, but I can't prove that.
GHC definitely won't...
For this specific example, though, I'd probably do:
darle :: [a] -> (a, [a]) darle xs = case reverse xs of [] -> error "darle: empty list" (x:xs) -> (x, reverse xs)
- Clark
On Fri, Aug 30, 2013 at 2:18 PM, Lucas Paul
wrote: Suppose I need to get an element from a data structure, and also modify the data structure. For example, I might need to get and delete the last element of a list: darle xs = ((last xs), (rmlast xs)) where rmlast [_] = [] rmlast (y:ys) = y:(rmlast ys)
There are probably other and better ways to write rmlast, but I want to focus on the fact that darle here, for lack of a better name off the top of my head, appears to traverse the list twice. Once to get the element, and once to remove it to produce a new list. This seems bad. Especially for large data structures, I don't want to be traversing twice to do what ought to be one operation. To fix it, I might be tempted to write something like:
darle' [a] = (a, []) darle' (x:xs) = let (a, ys) = darle' xs in (a, (x:ys))
But this version has lost its elegance. It was also kind of harder to come up with, and for more complex data structures (like the binary search tree) the simpler expression is really desirable. Can a really smart compiler transform/optimize the first definition into something that traverses the data structure only once? Can GHC?
- Lucas
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

One solution would be to fold over a specific semigroup instead of a recursive function: |import Data.Semigroup import Data.Foldable(foldMap) import Data.Maybe(maybeToList) data Darle a =Darle {getInit :: [a],getLast ::a } deriving Show instance Semigroup (Darle a)where ~(Darle xs1 l1) <> ~(Darle xs2 l2) =Darle (xs1 ++ [l1] ++ xs2) l2 darle :: [a] ->Darle a darle = foldr1 (<>) . map (Darle [])| It's somewhat more verbose, but the core idea is clearly expressed in the one line that defines |<>|, and IMHO it better shows /what/ are we doing rather than /how/. It's sufficiently lazy so that you can do something like |head . getInit $ darle [1..]|. Best regards, Petr Dne 08/30/2013 08:18 PM, Lucas Paul napsal(a):
Suppose I need to get an element from a data structure, and also modify the data structure. For example, I might need to get and delete the last element of a list:
darle xs = ((last xs), (rmlast xs)) where rmlast [_] = [] rmlast (y:ys) = y:(rmlast ys)
There are probably other and better ways to write rmlast, but I want to focus on the fact that darle here, for lack of a better name off the top of my head, appears to traverse the list twice. Once to get the element, and once to remove it to produce a new list. This seems bad. Especially for large data structures, I don't want to be traversing twice to do what ought to be one operation. To fix it, I might be tempted to write something like:
darle' [a] = (a, []) darle' (x:xs) = let (a, ys) = darle' xs in (a, (x:ys))
But this version has lost its elegance. It was also kind of harder to come up with, and for more complex data structures (like the binary search tree) the simpler expression is really desirable. Can a really smart compiler transform/optimize the first definition into something that traverses the data structure only once? Can GHC?
- Lucas
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Am 31.08.13 14:35, schrieb Petr Pudlák:
One solution would be to fold over a specific semigroup instead of a recursive function:
|import Data.Semigroup import Data.Foldable(foldMap) import Data.Maybe(maybeToList)
data Darle a =Darle {getInit :: [a],getLast ::a } deriving Show instance Semigroup (Darle a)where ~(Darle xs1 l1) <> ~(Darle xs2 l2) =Darle (xs1 ++ [l1] ++ xs2) l2
darle :: [a] ->Darle a darle = foldr1 (<>) . map (Darle [])|
It's somewhat more verbose, but the core idea is clearly expressed in the one line that defines |<>|, and IMHO it better shows /what/ are we doing rather than /how/. It's sufficiently lazy so that you can do something like |head . getInit $ darle [1..]|.
I am wondering why you put the Semigroup instance there and what the
other imports are for. Doesn't this work just as well?
data Darle a = Darle {getInit :: [a], getLast :: a}
deriving Show
~(Darle xs1 l1) <> ~(Darle xs2 l2) = Darle (xs1 ++ [l1] ++ xs2) l2
darle :: [a] ->Darle a
darle = foldr1 (<>) . map (Darle [])
Seems to work here. I am still puzzled, though, if this is really a good
idea performance-wise. I am afraid I don't understand it well enough.
Harald
--
Harald Bögeholz

Am 31.08.13 14:35, schrieb Petr Pudlák:
One solution would be to fold over a specific semigroup instead of a recursive function:
|import Data.Semigroup import Data.Foldable(foldMap) import Data.Maybe(maybeToList)
data Darle a =Darle {getInit :: [a],getLast ::a } deriving Show instance Semigroup (Darle a)where ~(Darle xs1 l1) <> ~(Darle xs2 l2) =Darle (xs1 ++ [l1] ++ xs2) l2
darle :: [a] ->Darle a darle = foldr1 (<>) . map (Darle [])|
It's somewhat more verbose, but the core idea is clearly expressed in the one line that defines |<>|, and IMHO it better shows /what/ are we doing rather than /how/. It's sufficiently lazy so that you can do something like |head . getInit $ darle [1..]|. I am wondering why you put the Semigroup instance there and what the other imports are for. Doesn't this work just as well? Sorry, the two other imports are redundant, I forgot to erase them when
Dne 09/01/2013 09:13 PM, Harald Bögeholz napsal(a): playing with various ideas. The Semigroup instance of course isn't necessary for this particular purpose. But having it (1) signals that the operation satisfies some laws (associativity) and (2) allows the structure to be reused anywhere where a Semigroup is required. For example, we can wrap it into `Option` to get a monoid, and perhaps use it in `foldMap`. This way we extend the functionality to empty collections: ```haskell darle :: Foldable f => f a -> Maybe (Darle a) darle = getOption . foldMap (Option . Just . Darle []) ``` Best regards, Petr
participants (5)
-
Ben
-
Clark Gaebel
-
Harald Bögeholz
-
Lucas Paul
-
Petr Pudlák