
On Fri, Jan 17, 2020 at 07:08:37PM -0500, Viktor Dukhovni wrote:
An alternative implementation of splitWeight is lazier, producing the first element of the initial chunk in constant time even when the target weight is large and calls for billions of elements (on a 64-bit system):
    splitWeight' :: Int -> (a -> Int) -> [a] -> ([a], [a])
    splitWeight' target weight xs = go 0 xs
      where
        go _ [] = ([], [])
        go n (h:ts)
            | let w = n + weight h
            , w < target = first (h:) $ go w ts
            | otherwise  = ([h], ts)

        -- Define directly, or import from Data.Bifunctor
        first f ~(a, b) = (f a, b)
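As a quick check of the laziness claim, here is a self-contained sketch (the definitions are repeated so it compiles on its own; the sample inputs are my own): even with a target of a billion and an infinite input list, the first elements of the initial chunk come out immediately.

```haskell
splitWeight' :: Int -> (a -> Int) -> [a] -> ([a], [a])
splitWeight' target weight xs = go 0 xs
  where
    go _ [] = ([], [])
    go n (h:ts)
        | let w = n + weight h
        , w < target = first (h:) $ go w ts
        | otherwise  = ([h], ts)

-- Lazy in the pair, which is what makes the head available early.
first :: (a -> b) -> (a, c) -> (b, c)
first f ~(a, b) = (f a, b)

main :: IO ()
main = do
    -- Billion-element target, infinite input: the head of the
    -- initial chunk is still produced in constant time.
    let (chunk, _rest) = splitWeight' 1000000000 (const 1) [1 :: Int ..]
    print (take 3 chunk)        -- [1,2,3]
    -- A small finite example of the split itself:
    print (splitWeight' 3 (const 1) [1..5 :: Int])  -- ([1,2,3],[4,5])
```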
And with the below all-in-one version:

    chunks :: Int -> (a -> Int) -> [a] -> [[a]]
    chunks target weight xs = go 0 xs
      where
        go _ [] = []
        go n (h:ts)
            | null ts = [h] : []
            | let w = n + weight h
            , w < target = cons1 h $ go w ts
            | otherwise = [h] : go 0 ts
        cons1 h ~(a:b) = (h : a) : b

a strict fold of the generated list of chunks, compiled with optimization, runs in:

  * constant space, regardless of the input list length
  * ~constant space, as the chunk size varies over 7 orders of magnitude
  * ~constant time, as the chunk size varies over 7 orders of magnitude

The folds I'm testing look like:

    foldl' (\a l -> a + foldl' (+) 0 l) 0 $
        chunks 1_000_000 (const 1) [1..100_000_000]

(with -XNumericUnderscores for readable large decimals).

For substantially smaller chunks (size 1 or 100 instead of 1_000_000), this runs less than a factor of 2 slower than the version using my first splitWeight, which benefits from tail-call optimization and does not significantly suffer from having to reverse the constructed chunks.

The lazier splitWeight' would mostly be useful in other contexts, where one might not consume most of each constructed chunk; it is slower when the result is consumed strictly. Of course, if one bothers with weighted chunking, it seems likely that the subsequent processing will be strict.

-- 
    Viktor.
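For concreteness, a small self-contained sketch of chunks on toy inputs (the definition is repeated verbatim from above; the sample inputs and expected results are my own):

```haskell
chunks :: Int -> (a -> Int) -> [a] -> [[a]]
chunks target weight xs = go 0 xs
  where
    go _ [] = []
    go n (h:ts)
        | null ts = [h] : []
        | let w = n + weight h
        , w < target = cons1 h $ go w ts
        | otherwise = [h] : go 0 ts
    -- Lazily prepend h to the first (possibly not yet forced) chunk.
    cons1 h ~(a:b) = (h : a) : b

main :: IO ()
main = do
    -- Unit weights: a chunk closes once its running weight reaches 3.
    print $ chunks 3 (const 1) [1..7 :: Int]     -- [[1,2,3],[4,5,6],[7]]
    -- Element values as weights: a chunk ends with the element that
    -- reaches the target; the final input element always joins the
    -- current chunk (the `null ts` guard).
    print $ chunks 10 id [4, 4, 4, 9, 2 :: Int]  -- [[4,4,4],[9,2]]
```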