
Dear Cafe, Here's a basic exercise in list processing: define a function runs :: Ord a => [a] -> [[a]] that breaks up its input list into a list of increasing "runs"; e.g., runs [3,4,5,6,2,3,4,1,2,1] ---> [[3,4,5,6],[2,3,4],[1,2],[1]] A natural solution is the following: runs [] = [] runs (x:xs) = let (ys, zs) = run x xs in (x:ys) : runs zs where run x [] = ([], []) run x (y:ys) = if x <= y then let (us, vs) = run y ys in (y:us, vs) else ([], y:ys) My question: can we do better than this? It seems that this solution is constantly building and breaking apart pairs. (Or is it, when optimized?) Todd Wilson

If you use a case statement instead of a let statement in the recursive
case, then GHC will know the pairs are being made and immediately taken
apart, and will optimize it to use unboxed pairs internally.
On Thu, Mar 16, 2023, 20:33 Todd Wilson
Dear Cafe,
Here's a basic exercise in list processing: define a function
runs :: Ord a => [a] -> [[a]]
that breaks up its input list into a list of increasing "runs"; e.g.,
runs [3,4,5,6,2,3,4,1,2,1] ---> [[3,4,5,6],[2,3,4],[1,2],[1]]
A natural solution is the following:
runs [] = [] runs (x:xs) = let (ys, zs) = run x xs in (x:ys) : runs zs where run x [] = ([], []) run x (y:ys) = if x <= y then let (us, vs) = run y ys in (y:us, vs) else ([], y:ys)
My question: can we do better than this? It seems that this solution is constantly building and breaking apart pairs. (Or is it, when optimized?)
Todd Wilson _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

By "using unboxed pairs", do you mean that Haskell optimizes this so that
it is equivalent somehow to the following Prolog version of my program?
runs([], []).
runs([X|Xs], [[X|Ys]|Rs]) :-
run(X, Xs, Ys, Zs),
runs(Zs, Rs).
run(_, [], [], []) :- !.
run(X, [Y|Ys], [Y|Us], Vs) :- X =< Y, !, run(Y, Ys, Us, Vs).
run(_, Ys, [], Ys).
Here, it is clear that, in the second clause for `runs`, computation is
happening on two fronts -- `Ys` and `Rs` -- and we can build the two conses
in the return value before we do the calls that fill in the missing
parts, so this ends up being a tail recursion. When the computation that
produces `Ys` finishes, the computation that produces `Rs` can resume.
Maybe this can best be explained functionally using continuations?
Todd Wilson
On Thu, Mar 16, 2023 at 6:38 PM Zemyla
If you use a case statement instead of a let statement in the recursive case, then GHC will know the pairs are being made and immediately taken apart, and will optimize it to use unboxed pairs internally.
On Thu, Mar 16, 2023, 20:33 Todd Wilson
wrote: Dear Cafe,
Here's a basic exercise in list processing: define a function
runs :: Ord a => [a] -> [[a]]
that breaks up its input list into a list of increasing "runs"; e.g.,
runs [3,4,5,6,2,3,4,1,2,1] ---> [[3,4,5,6],[2,3,4],[1,2],[1]]
A natural solution is the following:
runs [] = [] runs (x:xs) = let (ys, zs) = run x xs in (x:ys) : runs zs where run x [] = ([], []) run x (y:ys) = if x <= y then let (us, vs) = run y ys in (y:us, vs) else ([], y:ys)
My question: can we do better than this? It seems that this solution is constantly building and breaking apart pairs. (Or is it, when optimized?)
Todd Wilson _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

It seems that this solution is constantly building and breaking apart pairs.
At first glance it seems fine. In order to pull a sublist off the front of a list you’ll need to build a new list for that part, so the disassemble/reassemble is necessary there. You can use an “as pattern” to avoid re-creating `y:ys` in your else clause, but that’s somewhat minor. I don’t see anywhere else where you are pulling something apart and then recreating the same thing. Regarding the other response, an unboxed pair is just an optimization whereby a pair of values can be returned from a function without actually allocating heap storage, but it’s just reducing memory allocation, nothing conceptually more fancy. Jeff
On Mar 16, 2023, at 6:34 PM, Todd Wilson
wrote: Dear Cafe,
Here's a basic exercise in list processing: define a function runs :: Ord a => [a] -> [[a]] that breaks up its input list into a list of increasing "runs"; e.g., runs [3,4,5,6,2,3,4,1,2,1] ---> [[3,4,5,6],[2,3,4],[1,2],[1]] A natural solution is the following: runs [] = [] runs (x:xs) = let (ys, zs) = run x xs in (x:ys) : runs zs where run x [] = ([], []) run x (y:ys) = if x <= y then let (us, vs) = run y ys in (y:us, vs) else ([], y:ys) My question: can we do better than this? It seems that this solution is constantly building and breaking apart pairs. (Or is it, when optimized?)
Todd Wilson _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

On Thu, Mar 16, 2023 at 06:33:27PM -0700, Todd Wilson wrote:
Here's a basic exercise in list processing: define a function
runs :: Ord a => [a] -> [[a]]
that breaks up its input list into a list of increasing "runs"; e.g.,
runs [3,4,5,6,2,3,4,1,2,1] ---> [[3,4,5,6],[2,3,4],[1,2],[1]]
A natural solution is the following:
runs [] = [] runs (x:xs) = let (ys, zs) = run x xs in (x:ys) : runs zs where run x [] = ([], []) run x (y:ys) = if x <= y then let (us, vs) = run y ys in (y:us, vs) else ([], y:ys)
My question: can we do better than this? It seems that this solution is constantly building and breaking apart pairs. (Or is it, when optimized?)
The key feature of this solution is that it is lazy in the tail of the list of runs. For example, the below completes quickly despite ostensibly working with an infinite list of runs. It is able to "emit" the first run as soon as a successort is smaller than its predecessor. {-# LANGUAGE ScopedTypeVariables #-} module Main (main) where runs :: forall a. Ord a => [a] -> [[a]] runs [] = [] runs (x:xs) = let (ys, zs) = run x xs in (x:ys) : runs zs where run :: a -> [a] -> ([a], [a]) run x [] = ([], []) run x l@(y:ys) = if x <= y then let (us, vs) = run y ys in (y:us, vs) else ([], l) main :: IO () main = print $ sum $ map sum $ take 100 $ runs $ concat $ map (\i -> [0..i]) [0..] It is also able to generate the leading elements of an infinite first run: main :: IO () main = print $ sum $ take 100 $ head $ runs $ [0..] Any constant factors are less important. -- Viktor.

I want to thank all of those who contributed to this thread. If I can
summarize the contributions (feel free to clarify any misrepresentations):
- The code I provided for this problem is probably optimal, except
perhaps for the small point about using an @-pattern in `run` to avoid
reconstructing `y:ys` in the last line -- but isn't this also something the
compiler could easily discover and take advantage of without explicit
direction?
- The compiler will avoid breaking down and reconstructing pairs on each
recursive call, even though that is how it is coded.
- Although nobody commented on my Prolog version (perhaps because this
is a Haskell list!), the point I made there about compiling this as a tail
call that doesn't grow the stack should also apply to Haskell (does it?).
One thing that's clear from the Prolog code is that it goes down the input
list and distributes the elements to one of two places -- the end of the
current run or the beginning of a new run -- interspersed with some
cons-cell creation. Is that what's happening in Haskell also?
- Reference was made to Haskell's `sort` implementation, which
essentially finds runs (increasing and decreasing) in the input before
repeatedly merging them in pairs until one run remains. Decreasing runs can
be turned into increasing ones by using a tail call with a list
accumulator, the way iterative reverse works. But increasing runs are
handled in the same way that my Prolog code does, essentially by using what
appears to be the Haskell equivalent of Prolog's difference-lists:
functions whose arguments are the extendible ends of lists.
I'm intrigued by the idea mentioned in this last bullet point. Is this
functional equivalent of Prolog's difference lists -- essentially partial
data structures that are grown top-down rather than bottom-up, actually
efficient? Is there an overhead penalty to be paid for using functions
here, or is it compiled away into Prolog-like tail calls that fill in the
missing parts of the data structure?
Todd Wilson
On Thu, Mar 16, 2023 at 6:33 PM Todd Wilson
Dear Cafe,
Here's a basic exercise in list processing: define a function
runs :: Ord a => [a] -> [[a]]
that breaks up its input list into a list of increasing "runs"; e.g.,
runs [3,4,5,6,2,3,4,1,2,1] ---> [[3,4,5,6],[2,3,4],[1,2],[1]]
A natural solution is the following:
runs [] = [] runs (x:xs) = let (ys, zs) = run x xs in (x:ys) : runs zs where run x [] = ([], []) run x (y:ys) = if x <= y then let (us, vs) = run y ys in (y:us, vs) else ([], y:ys)
My question: can we do better than this? It seems that this solution is constantly building and breaking apart pairs. (Or is it, when optimized?)
Todd Wilson

On Sun, Mar 26, 2023 at 10:59:13AM -0700, Todd Wilson wrote:
Is there an overhead penalty to be paid for using functions here, or is it compiled away into Prolog-like tail calls that fill in the missing parts of the data structure?
Efficient implementations of functional langauges make extensive use of tail call optimisation. I don't know whether looking at the GHC-generated "Core" for your implementation will answer any of your questions, but here it is: Haskell: runs (x:xs) = let (ys, zs) = run x xs in (x:ys) : runs zs where run x [] = ([], []) run x (y:ys) = if x <= y then let (us, vs) = run y ys in (y:us, vs) else ([], y:ys) Core (with some added comments): -- Unpack the "Ord" dictionary to extract just the required "<=" -- function and call the "$wruns" worker ("ww3" is "<=", and "ds" -- is the list to be transformed: -- runs :: forall a. Ord a => [a] -> [[a]] runs = \ (@a) ($dOrd :: Ord a) (ds :: [a]) -> case $dOrd of { C:Ord ww ww1 ww2 ww3 ww4 ww5 ww6 ww7 -> $wruns ww3 ds } Rec { $wruns :: forall {a}. (a -> a -> Bool) -> [a] -> [[a]] $wruns = \ (@a) (ww :: a -> a -> Bool) (ds :: [a]) -> case ds of { [] -> []; -- runs [] = [] : x xs -> -- runs (x:xs) = let (ys, zs) = run x xs in (x:ys) : runs zs let { ds1 :: ([a], [a]) -- A lazily evaluated thunk for (run x xs) ds1 = letrec { -- Internal recursion in "run" returns strict unboxed pairs -- (on the stack) avoiding heap or thunk allocation for the tuple. $wrun :: a -> [a] -> (# [a], [a] #) $wrun = \ (x1 :: a) (ds2 :: [a]) -> case ds2 of wild1 { -- (y:ys) is "wild1" [] -> (# [], [] #); -- run x [] = ([], []) : y ys -> case ww x1 y of { -- x <= y ? False -> (# [], wild1 #); -- else ([], y:ys) True -> -- then let (us, vs) = run y ys in (y:us, vs) let { ds3 :: ([a], [a]) -- A "thunk" for (run y ys) evaluated lazily ds3 = case $wrun y ys of { (# ww1, ww2 #) -> (ww1, ww2) } } in (# : y (case ds3 of { (us, vs) -> us }), case ds3 of { (us, vs) -> vs } #) } }; } in case $wrun x xs of { (# ww1, ww2 #) -> (ww1, ww2) } } in : (: x (case ds1 of { (ys, zs) -> ys })) (case ds1 of { (ys, zs) -> $wruns ww zs }) } end Rec } So for a non-empty cons-cell (: x ...) the result is a new cons cell: : (x : (fst ds1)) (runs (snd ds1)) --------- -------------- in which both underline parts are computed lazily (on demand) from the thunk "ds1": λ> head $ head $ runs $ 42 : undefined 42 When we do want the successor of the first element, we look no futher than necessary: λ> head $ runs $ 42 : 0 : undefined [42] λ> take 2 $ head $ runs $ 42 : 43 : undefined [42,43] Does this help? FWIW, the (simplified) Core output was generated via: hscore() { ghc -ddump-simpl -dsuppress-idinfo -dsuppress-coercions -dsuppress-type-applications -dsuppress-uniques -dsuppress-module-prefixes "$@" } hscore -O2 Runs.hs > Runs.core -- Viktor.

YMMV, but based on some benchmarking I've done, the default implementation using `compareBy` seems to optimize better than trying to force dictionary unpacking. The only way I've been able to generically sort faster than base's `compareBy` is to use nonempty list structures for runs.
—
Sent from my phone with K-9 Mail.
On 26 March 2023 19:07:03 UTC, Viktor Dukhovni
On Sun, Mar 26, 2023 at 10:59:13AM -0700, Todd Wilson wrote:
Is there an overhead penalty to be paid for using functions here, or is it compiled away into Prolog-like tail calls that fill in the missing parts of the data structure?
Efficient implementations of functional langauges make extensive use of tail call optimisation.
I don't know whether looking at the GHC-generated "Core" for your implementation will answer any of your questions, but here it is:
Haskell: runs (x:xs) = let (ys, zs) = run x xs in (x:ys) : runs zs where run x [] = ([], []) run x (y:ys) = if x <= y then let (us, vs) = run y ys in (y:us, vs) else ([], y:ys)
Core (with some added comments): -- Unpack the "Ord" dictionary to extract just the required "<=" -- function and call the "$wruns" worker ("ww3" is "<=", and "ds" -- is the list to be transformed: -- runs :: forall a. Ord a => [a] -> [[a]] runs = \ (@a) ($dOrd :: Ord a) (ds :: [a]) -> case $dOrd of { C:Ord ww ww1 ww2 ww3 ww4 ww5 ww6 ww7 -> $wruns ww3 ds }
Rec { $wruns :: forall {a}. (a -> a -> Bool) -> [a] -> [[a]] $wruns = \ (@a) (ww :: a -> a -> Bool) (ds :: [a]) -> case ds of { [] -> []; -- runs [] = [] : x xs -> -- runs (x:xs) = let (ys, zs) = run x xs in (x:ys) : runs zs let { ds1 :: ([a], [a]) -- A lazily evaluated thunk for (run x xs) ds1 = letrec { -- Internal recursion in "run" returns strict unboxed pairs -- (on the stack) avoiding heap or thunk allocation for the tuple. $wrun :: a -> [a] -> (# [a], [a] #) $wrun = \ (x1 :: a) (ds2 :: [a]) -> case ds2 of wild1 { -- (y:ys) is "wild1" [] -> (# [], [] #); -- run x [] = ([], []) : y ys -> case ww x1 y of { -- x <= y ? False -> (# [], wild1 #); -- else ([], y:ys) True -> -- then let (us, vs) = run y ys in (y:us, vs) let { ds3 :: ([a], [a]) -- A "thunk" for (run y ys) evaluated lazily ds3 = case $wrun y ys of { (# ww1, ww2 #) -> (ww1, ww2) } } in (# : y (case ds3 of { (us, vs) -> us }), case ds3 of { (us, vs) -> vs } #) } }; } in case $wrun x xs of { (# ww1, ww2 #) -> (ww1, ww2) } } in : (: x (case ds1 of { (ys, zs) -> ys })) (case ds1 of { (ys, zs) -> $wruns ww zs }) } end Rec }
So for a non-empty cons-cell (: x ...) the result is a new cons cell:
: (x : (fst ds1)) (runs (snd ds1)) --------- --------------
in which both underline parts are computed lazily (on demand) from the thunk "ds1":
λ> head $ head $ runs $ 42 : undefined 42
When we do want the successor of the first element, we look no futher than necessary:
λ> head $ runs $ 42 : 0 : undefined [42]
λ> take 2 $ head $ runs $ 42 : 43 : undefined [42,43]
Does this help? FWIW, the (simplified) Core output was generated via:
hscore() { ghc -ddump-simpl -dsuppress-idinfo -dsuppress-coercions -dsuppress-type-applications -dsuppress-uniques -dsuppress-module-prefixes "$@" } hscore -O2 Runs.hs > Runs.core
-- Viktor. _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Thanks, Viktor, for that information, especially the macro for getting
Haskel core output at the end, which I will try to use in the future and
perhaps avoid having to query this list to get such answers! I have a few
follow-up questions on this code, however:
On Sun, Mar 26, 2023 at 12:07 PM Viktor Dukhovni
I don't know whether looking at the GHC-generated "Core" for your implementation will answer any of your questions, but here it is:
Haskell: runs (x:xs) = let (ys, zs) = run x xs in (x:ys) : runs zs where run x [] = ([], []) run x (y:ys) = if x <= y then let (us, vs) = run y ys in (y:us, vs) else ([], y:ys)
Core (with some added comments): -- Unpack the "Ord" dictionary to extract just the required "<=" -- function and call the "$wruns" worker ("ww3" is "<=", and "ds" -- is the list to be transformed: -- runs :: forall a. Ord a => [a] -> [[a]] runs = \ (@a) ($dOrd :: Ord a) (ds :: [a]) -> case $dOrd of { C:Ord ww ww1 ww2 ww3 ww4 ww5 ww6 ww7 -> $wruns ww3 ds }
Rec { $wruns :: forall {a}. (a -> a -> Bool) -> [a] -> [[a]] $wruns = \ (@a) (ww :: a -> a -> Bool) (ds :: [a]) -> case ds of { [] -> []; -- runs [] = [] : x xs -> -- runs (x:xs) = let (ys, zs) = run x xs in (x:ys) : runs zs let { ds1 :: ([a], [a]) -- A lazily evaluated thunk for (run x xs) ds1 = letrec { -- Internal recursion in "run" returns strict unboxed pairs -- (on the stack) avoiding heap or thunk allocation for the tuple. $wrun :: a -> [a] -> (# [a], [a] #) $wrun = \ (x1 :: a) (ds2 :: [a]) -> case ds2 of wild1 { -- (y:ys) is "wild1" [] -> (# [], [] #); -- run x [] = ([], []) : y ys -> case ww x1 y of { -- x <= y ? False -> (# [], wild1 #); -- else ([], y:ys) True -> -- then let (us, vs) = run y ys in (y:us, vs) let { ds3 :: ([a], [a]) -- A "thunk" for (run y ys) evaluated lazily
Why doesn't ds3 have an explicitly unboxed pair type, and does that have any performance implications? For example, ...
ds3 = case $wrun y ys of { (# ww1, ww2 #) -> (ww1, ww2) } } in (# : y (case ds3 of { (us, vs) -> us }), case ds3 of { (us, vs) -> vs } #)
Granted I'm not that familiar with core, but It sure looks like this code breaks apart pairs (with the equivalent of fst and snd) and rebuilds them
} }; } in case $wrun x xs of { (# ww1, ww2 #) -> (ww1, ww2) } } in : (: x (case ds1 of { (ys, zs) -> ys })) (case ds1 of { (ys, zs) -> $wruns ww zs }) } end Rec }
So for a non-empty cons-cell (: x ...) the result is a new cons cell:
: (x : (fst ds1)) (runs (snd ds1)) --------- --------------
in which both underline parts are computed lazily (on demand) from the thunk "ds1":
λ> head $ head $ runs $ 42 : undefined 42
When we do want the successor of the first element, we look no futher than necessary:
λ> head $ runs $ 42 : 0 : undefined [42]
λ> take 2 $ head $ runs $ 42 : 43 : undefined [42,43]
Does this help?
Yes, it does, thanks, although I was aware of this aspect of the laziness of my code from the beginning and was concerned more with how the output lists were built.
FWIW, the (simplified) Core output was generated via: hscore() { ghc -ddump-simpl -dsuppress-idinfo -dsuppress-coercions -dsuppress-type-applications -dsuppress-uniques -dsuppress-module-prefixes "$@" } hscore -O2 Runs.hs > Runs.core
Thanks again for this, it will be helpful going forward. --Todd

On Sun, Mar 26, 2023 at 04:24:09PM -0700, Todd Wilson wrote:
Core (with some added comments): -- Unpack the "Ord" dictionary to extract just the required "<=" -- function and call the "$wruns" worker ("ww3" is "<=", and "ds" -- is the list to be transformed: -- runs :: forall a. Ord a => [a] -> [[a]] runs = \ (@a) ($dOrd :: Ord a) (ds :: [a]) -> case $dOrd of { C:Ord ww ww1 ww2 ww3 ww4 ww5 ww6 ww7 -> $wruns ww3 ds }
Rec { $wruns :: forall {a}. (a -> a -> Bool) -> [a] -> [[a]] $wruns = \ (@a) (ww :: a -> a -> Bool) (ds :: [a]) -> case ds of { [] -> []; -- runs [] = [] : x xs -> -- runs (x:xs) = let (ys, zs) = run x xs in (x:ys) : runs zs let { ds1 :: ([a], [a]) -- A lazily evaluated thunk for (run x xs) ds1 = letrec { -- Internal recursion in "run" returns strict unboxed pairs -- (on the stack) avoiding heap or thunk allocation for the tuple. $wrun :: a -> [a] -> (# [a], [a] #) $wrun = \ (x1 :: a) (ds2 :: [a]) -> case ds2 of wild1 { -- (y:ys) is "wild1" [] -> (# [], [] #); -- run x [] = ([], []) : y ys -> case ww x1 y of { -- x <= y ? False -> (# [], wild1 #); -- else ([], y:ys) True -> -- then let (us, vs) = run y ys in (y:us, vs) let { ds3 :: ([a], [a]) -- A "thunk" for (run y ys) evaluated lazily
Why doesn't ds3 have an explicitly unboxed pair type, and does that have any performance implications? For example, ...
Precisely because "ds3" must be evaluated lazily, it can't be an unboxed pair (which are always strictly evaluated).
ds3 = case $wrun y ys of { (# ww1, ww2 #) -> (ww1, ww2) } } in (# : y (case ds3 of { (us, vs) -> us }), case ds3 of { (us, vs) -> vs } #)
Granted I'm not that familiar with Core, but It sure looks like this code breaks apart pairs (with the equivalent of fst and snd) and rebuilds them
The inner (recursive) invocation of "run" must also be lazily evaluated, so yes, its output needs to be boxed as a pair.
When we do want the successor of the first element, we look no futher than necessary:
λ> head $ runs $ 42 : 0 : undefined [42]
λ> take 2 $ head $ runs $ 42 : 43 : undefined [42,43]
Does this help?
Yes, it does, thanks, although I was aware of this aspect of the laziness of my code from the beginning and was concerned more with how the output lists were built.
As with all lists, they are built via the (:) constructor from a head element and a tail. GHC reuses any full list or tail it can reuse, and constructs new cons cells that are not already in hand. Because these are pure functions working with immutable data, to construct an initial segment of a list we must build a new list, we can't truncate the original original in place, it is immutable. Therefore, the original list will be picked apart and reassembled. With "linear Haskell" there are in some cases opportunities to mutate certain objects in place, because they are sure to not have any other references. But that isn't the case here. So even `runs [0..10]` has to pick apart and reassemble the list. I hope I understood correctly what you're getting at with the concern about building and rebuilding. It looks to me like the Core code does exactly as much building and re-building as required by laziness, and the result can be consumed in a single pass in constant space (multi-pass use naturally memoises the result). -- Viktor.

Linear Haskell helps control in-place mutation, but that's still only in
mutable references and arrays. As I understand it, data built from regular
constructors is more fundamentally immutable because of the way GHC garbage
collection works.
On Sun, Mar 26, 2023, 8:00 PM Viktor Dukhovni
On Sun, Mar 26, 2023 at 04:24:09PM -0700, Todd Wilson wrote:
Core (with some added comments): -- Unpack the "Ord" dictionary to extract just the required "<=" -- function and call the "$wruns" worker ("ww3" is "<=", and "ds" -- is the list to be transformed: -- runs :: forall a. Ord a => [a] -> [[a]] runs = \ (@a) ($dOrd :: Ord a) (ds :: [a]) -> case $dOrd of { C:Ord ww ww1 ww2 ww3 ww4 ww5 ww6 ww7 -> $wruns ww3 ds }
Rec { $wruns :: forall {a}. (a -> a -> Bool) -> [a] -> [[a]] $wruns = \ (@a) (ww :: a -> a -> Bool) (ds :: [a]) -> case ds of { [] -> []; -- runs [] = [] : x xs -> -- runs (x:xs) = let (ys, zs) = run x xs in (x:ys) : runs zs let { ds1 :: ([a], [a]) -- A lazily evaluated thunk for (run x xs) ds1 = letrec { -- Internal recursion in "run" returns strict unboxed pairs -- (on the stack) avoiding heap or thunk allocation for the tuple. $wrun :: a -> [a] -> (# [a], [a] #) $wrun = \ (x1 :: a) (ds2 :: [a]) -> case ds2 of wild1 { -- (y:ys) is "wild1" [] -> (# [], [] #); -- run x [] = ([], []) : y ys -> case ww x1 y of { -- x <= y ? False -> (# [], wild1 #); -- else ([], y:ys) True -> -- then let (us, vs) = run y ys in (y:us, vs) let { ds3 :: ([a], [a]) -- A "thunk" for (run y ys) evaluated lazily
Why doesn't ds3 have an explicitly unboxed pair type, and does that have any performance implications? For example, ...
Precisely because "ds3" must be evaluated lazily, it can't be an unboxed pair (which are always strictly evaluated).
ds3 = case $wrun y ys of { (#
ww1, ww2 #) -> (ww1, ww2) } } in
(# : y (case ds3 of { (us, vs) ->
us }),
case ds3 of { (us, vs) -> vs }
#)
Granted I'm not that familiar with Core, but It sure looks like this code breaks apart pairs (with the equivalent of fst and snd) and rebuilds them
The inner (recursive) invocation of "run" must also be lazily evaluated, so yes, its output needs to be boxed as a pair.
When we do want the successor of the first element, we look no futher than necessary:
λ> head $ runs $ 42 : 0 : undefined [42]
λ> take 2 $ head $ runs $ 42 : 43 : undefined [42,43]
Does this help?
Yes, it does, thanks, although I was aware of this aspect of the laziness of my code from the beginning and was concerned more with how the output lists were built.
As with all lists, they are built via the (:) constructor from a head element and a tail. GHC reuses any full list or tail it can reuse, and constructs new cons cells that are not already in hand.
Because these are pure functions working with immutable data, to construct an initial segment of a list we must build a new list, we can't truncate the original original in place, it is immutable.
Therefore, the original list will be picked apart and reassembled.
With "linear Haskell" there are in some cases opportunities to mutate certain objects in place, because they are sure to not have any other references. But that isn't the case here.
So even `runs [0..10]` has to pick apart and reassemble the list. I hope I understood correctly what you're getting at with the concern about building and rebuilding.
It looks to me like the Core code does exactly as much building and re-building as required by laziness, and the result can be consumed in a single pass in constant space (multi-pass use naturally memoises the result).
-- Viktor. _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

I don't know prolog, but I believe the answer to your last question is no.
The functions will actually be function closures in this case. There's no
way to leave a (native) Haskell list open to being extended at the end,
except to the extent that lazy evaluation does so. Somewhat separately,
while it would be possible for a Haskell implementation to construct
strict-spined list-like structures with constant stack space using tail
recursion modulo cons, GHC doesn't do that.
On Sun, Mar 26, 2023, 1:59 PM Todd Wilson
I want to thank all of those who contributed to this thread. If I can summarize the contributions (feel free to clarify any misrepresentations):
- The code I provided for this problem is probably optimal, except perhaps for the small point about using an @-pattern in `run` to avoid reconstructing `y:ys` in the last line -- but isn't this also something the compiler could easily discover and take advantage of without explicit direction? - The compiler will avoid breaking down and reconstructing pairs on each recursive call, even though that is how it is coded. - Although nobody commented on my Prolog version (perhaps because this is a Haskell list!), the point I made there about compiling this as a tail call that doesn't grow the stack should also apply to Haskell (does it?). One thing that's clear from the Prolog code is that it goes down the input list and distributes the elements to one of two places -- the end of the current run or the beginning of a new run -- interspersed with some cons-cell creation. Is that what's happening in Haskell also? - Reference was made to Haskell's `sort` implementation, which essentially finds runs (increasing and decreasing) in the input before repeatedly merging them in pairs until one run remains. Decreasing runs can be turned into increasing ones by using a tail call with a list accumulator, the way iterative reverse works. But increasing runs are handled in the same way that my Prolog code does, essentially by using what appears to be the Haskell equivalent of Prolog's difference-lists: functions whose arguments are the extendible ends of lists.
I'm intrigued by the idea mentioned in this last bullet point. Is this functional equivalent of Prolog's difference lists -- essentially partial data structures that are grown top-down rather than bottom-up, actually efficient? Is there an overhead penalty to be paid for using functions here, or is it compiled away into Prolog-like tail calls that fill in the missing parts of the data structure?
Todd Wilson
On Thu, Mar 16, 2023 at 6:33 PM Todd Wilson
wrote: Dear Cafe,
Here's a basic exercise in list processing: define a function
runs :: Ord a => [a] -> [[a]]
that breaks up its input list into a list of increasing "runs"; e.g.,
runs [3,4,5,6,2,3,4,1,2,1] ---> [[3,4,5,6],[2,3,4],[1,2],[1]]
A natural solution is the following:
runs [] = [] runs (x:xs) = let (ys, zs) = run x xs in (x:ys) : runs zs where run x [] = ([], []) run x (y:ys) = if x <= y then let (us, vs) = run y ys in (y:us, vs) else ([], y:ys)
My question: can we do better than this? It seems that this solution is constantly building and breaking apart pairs. (Or is it, when optimized?)
Todd Wilson
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Interesting. So one consequence of this, if I understand correctly, is that
the aforementioned library implementation
https://hackage.haskell.org/package/base-4.18.0.0/docs/src/Data.OldList.html...
of `sort` could actually be improved by rewriting the `ascending` function
there so that it returned a pair, as in my code, rather than had a
"difference-list" functional accumulator argument that is finally applied
in the base case.
--Todd
On Sun, Mar 26, 2023 at 12:53 PM David Feuer
I don't know prolog, but I believe the answer to your last question is no. The functions will actually be function closures in this case. There's no way to leave a (native) Haskell list open to being extended at the end, except to the extent that lazy evaluation does so. Somewhat separately, while it would be possible for a Haskell implementation to construct strict-spined list-like structures with constant stack space using tail recursion modulo cons, GHC doesn't do that.
On Sun, Mar 26, 2023, 1:59 PM Todd Wilson
wrote: I want to thank all of those who contributed to this thread. If I can summarize the contributions (feel free to clarify any misrepresentations):
- The code I provided for this problem is probably optimal, except perhaps for the small point about using an @-pattern in `run` to avoid reconstructing `y:ys` in the last line -- but isn't this also something the compiler could easily discover and take advantage of without explicit direction? - The compiler will avoid breaking down and reconstructing pairs on each recursive call, even though that is how it is coded. - Although nobody commented on my Prolog version (perhaps because this is a Haskell list!), the point I made there about compiling this as a tail call that doesn't grow the stack should also apply to Haskell (does it?). One thing that's clear from the Prolog code is that it goes down the input list and distributes the elements to one of two places -- the end of the current run or the beginning of a new run -- interspersed with some cons-cell creation. Is that what's happening in Haskell also? - Reference was made to Haskell's `sort` implementation, which essentially finds runs (increasing and decreasing) in the input before repeatedly merging them in pairs until one run remains. Decreasing runs can be turned into increasing ones by using a tail call with a list accumulator, the way iterative reverse works. But increasing runs are handled in the same way that my Prolog code does, essentially by using what appears to be the Haskell equivalent of Prolog's difference-lists: functions whose arguments are the extendible ends of lists.
I'm intrigued by the idea mentioned in this last bullet point. Is this functional equivalent of Prolog's difference lists -- essentially partial data structures that are grown top-down rather than bottom-up, actually efficient? Is there an overhead penalty to be paid for using functions here, or is it compiled away into Prolog-like tail calls that fill in the missing parts of the data structure?
Todd Wilson
On Thu, Mar 16, 2023 at 6:33 PM Todd Wilson
wrote: Dear Cafe,
Here's a basic exercise in list processing: define a function
runs :: Ord a => [a] -> [[a]]
that breaks up its input list into a list of increasing "runs"; e.g.,
runs [3,4,5,6,2,3,4,1,2,1] ---> [[3,4,5,6],[2,3,4],[1,2],[1]]
A natural solution is the following:
runs [] = [] runs (x:xs) = let (ys, zs) = run x xs in (x:ys) : runs zs where run x [] = ([], []) run x (y:ys) = if x <= y then let (us, vs) = run y ys in (y:us, vs) else ([], y:ys)
My question: can we do better than this? It seems that this solution is constantly building and breaking apart pairs. (Or is it, when optimized?)
Todd Wilson
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
participants (6)
-
David Feuer
-
Jeff Clites
-
Keith
-
Todd Wilson
-
Viktor Dukhovni
-
Zemyla