a sort of chunk

newer
Taking over double-conversion on...

older
10 PhD studentships in Nottingham...

PICCA Frederic-Emmanuel

17 Jan 2020 17 Jan '20

10:10 a.m.

Hello, I need to write some code in order to do some computations with stack of images coming from hdf5 files. Each computation done per images is quite intensive, so I want to do this on multiple cores of my computer (24 in my case) A stack of images is composed of a list of files. each file contain n images. So I decided to proceed like this: create a data type which represent a chunk of this stack. data Chunk = Chunk { filename :: FilePath , from :: Int , to :: Int } And indeed the full stack is [Chunk] Now since I have n core, I need to split the full stack in equivalent chunk of images. Just for example. at the begining I have this image stack 30476 images [Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00077.nxs" 0 698,Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00078.nxs" 0 1104,Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00079.nxs" 0 1510,Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00080.nxs" 0 1914,Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00081.nxs" 0 2318,Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00082.nxs" 0 2720,Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00083.nxs" 0 2169,Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00084.nxs" 0 2445,Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00085.nxs" 0 2720,Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00086.nxs" 0 490,Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00087.nxs" 0 812,Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00088.nxs" 0 1133,Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00089.nxs" 0 1454,Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00090.nxs" 0 1773,Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00091.nxs" 0 2090,Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00092.nxs" 0 2406,Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00093.nxs" 0 2720] so i need to write a function which do something like chunksOf chunksOf :: Int -> [Chunk] -> [[Chunk]] In this case I need to create chunk of 1325 images, so the finla result should be [ [Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00077.nxs" 0 698,Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00078.nxs" 0 627] , [Chunk "/nfs/ruche-sixs/sixs-soleil/com-sixs/2019/Run3/FeSCO_Cu111/sample2_ascan_omega_00078.nxs" 627 1104, ...] ... ] I would like your help in order to give me advices or help me find information whcih can help me implement this chunksOf method. thanks for your help. Frederic

Show replies by date

PICCA Frederic-Emmanuel

17 Jan 17 Jan

3:57 p.m.

Hello i end up with this data Chunk = Chunk FilePath Int Int deriving Show chunkLen :: Chunk -> Int chunkLen (Chunk _ f t) = t - f chunks :: Int -> [Chunk] -> [[Chunk]] chunks n cs = reverse $ map reverse $ go cs [[]] 0 where go :: [Chunk] -> [[Chunk]] -> Int -> [[Chunk]] go [] _ _ = [] go [x@(Chunk fn f t)] (c:cs') acc = if acc + chunkLen x < n then (x : c) : cs' else go [(Chunk fn (f + n - acc) t)] ([] : ((Chunk fn f (f + n - acc)) : c) : cs') 0 go (x@(Chunk fn f t):xs) (c:cs') acc = if acc + chunkLen x < n then go xs ((x : c) : cs') (acc + chunkLen x) else go ((Chunk fn (f + n - acc) t) : xs) ([] : ((Chunk fn f (f + n - acc)) : c) : cs') 0 But i do not find this that elegant...

Viktor Dukhovni

10:41 p.m.

On Fri, Jan 17, 2020 at 03:57:18PM +0000, PICCA Frederic-Emmanuel wrote:

...

chunkLen :: Chunk -> Int chunkLen (Chunk _ f t) = t - f

chunks :: Int -> [Chunk] -> [[Chunk]] chunks n cs = reverse $ map reverse $ go cs [[]] 0 where go :: [Chunk] -> [[Chunk]] -> Int -> [[Chunk]] go [] _ _ = [] go [x@(Chunk fn f t)] (c:cs') acc = if acc + chunkLen x < n then (x : c) : cs' else go [(Chunk fn (f + n - acc) t)] ([] : ((Chunk fn f (f + n - acc)) : c) : cs') 0 go (x@(Chunk fn f t):xs) (c:cs') acc = if acc + chunkLen x < n then go xs ((x : c) : cs') (acc + chunkLen x) else go ((Chunk fn (f + n - acc) t) : xs) ([] : ((Chunk fn f (f + n - acc)) : c) : cs') 0

To your specific question I would refactor this a bit: -- | Split a list after the first element that reaches a target cumulative weight -- splitWeight :: Int -> (a -> Int) -> [a] -> ([a], [a]) splitWeight target weight xs = (,) <$> reverse . fst <*> snd $ go 0 xs [] where go _ [] acc = (acc, []) go n (h:ts) acc | let w = n + weight h , w < target = go w ts $ h : acc | otherwise = (h : acc, ts) -- | Partition a list into chunks, with each non-final chunk having -- a weight at least equal to the target. -- chunks :: Int -> (a -> Int) -> [a] -> [[a]] chunks target weight = unfoldr gen gen [] = Nothing gen xs = Just $ splitWeight target weight xs Example: λ> chunks 42 id [1..25] [[1,2,3,4,5,6,7,8,9],[10,11,12,13],[14,15,16],[17,18,19],[20,21,22],[23,24],[25]] Your weight function would be:

...

chunkLen :: Chunk -> Int chunkLen (Chunk _ f t) = t - f

That said, it feels like perhaps you're asking the wrong question. If you want parallelize monadic list reduction over multiple cores, perhaps something out of https://hackage.haskell.org/package/monad-par would meet your needs? You might find https://www.oreilly.com/library/view/parallel-and-concurrent/9781449335939/ a good resource. -- Viktor.

Viktor Dukhovni

18 Jan 18 Jan

12:08 a.m.

On Fri, Jan 17, 2020 at 05:41:12PM -0500, Viktor Dukhovni wrote:

...

To your specific question I would refactor this a bit:

-- | Split a list after the first element that reaches a target cumulative weight -- splitWeight :: Int -> (a -> Int) -> [a] -> ([a], [a]) splitWeight target weight xs = (,) <$> reverse . fst <*> snd $ go 0 xs [] where go _ [] acc = (acc, []) go n (h:ts) acc | let w = n + weight h , w < target = go w ts $ h : acc | otherwise = (h : acc, ts)

An alternative implementation of splitWeight is lazier, and produces the first element of the initial chunk in constant time even when the target weight is large and requires billions of elements (on a 64bit system): -- import Data.Bifunctor splitWeight' :: Int -> (a -> Int) -> [a] -> ([a], [a]) splitWeight' target weight xs = go 0 xs where go _ [] = ([], []) go n (h:ts) | let w = n + weight h , w < target = first (h:) $ go w ts | otherwise = ([h], ts) -- Define directly, or import from Data.Bifunctor first f ~(a, b) = (f a, b) λ> take 5 $ fst $ splitWeight' (maxBound :: Int) id [1..] [1,2,3,4,5] -- Viktor.

Viktor Dukhovni

12:12 p.m.

On Fri, Jan 17, 2020 at 07:08:37PM -0500, Viktor Dukhovni wrote:

...

An alternative implementation of splitWeight is lazier, and produces the first element of the initial chunk in constant time even when the target weight is large and requires billions of elements (on a 64bit system):

-- import Data.Bifunctor

splitWeight' :: Int -> (a -> Int) -> [a] -> ([a], [a]) splitWeight' target weight xs = go 0 xs where go _ [] = ([], []) go n (h:ts) | let w = n + weight h , w < target = first (h:) $ go w ts | otherwise = ([h], ts)

-- Define directly, or import from Data.Bifunctor first f ~(a, b) = (f a, b)

And with the below all-in-one version: chunks :: Int -> (a -> Int) -> [a] -> [[a]] chunks target weight xs = go 0 xs where go _ [] = [] go n (h:ts) | null ts = [h] : [] | let w = n + weight h , w < target = cons1 h $ go w ts | otherwise = [h] : go 0 ts cons1 h ~(a:b) = (h : a) : b a strict fold of the generated list of chunks compiled with optimization runs in: * constant space regardless of the input list length * ~constant space as the chunk size varies over 7 orders of magnitude. * ~constant time as the chunk size varies over 7 orders of magnitude. The folds I'm testing look like: foldl' (\a l -> a + foldl' (+) 0 l) 0 $ chunks 1_000_000 (const 1) [1..100_000_000] (with -XNumericUnderscores for readable large decimals). For substantially smaller chunks (size 1 or 100 instead of 1_000_000), this runs less than a factor of 2 slower than the version using my first splitWeight, which benefits from tail call optimization and does not significantly suffer for having to reverse the constructed chunks. The lazier splitWeight' would be mostly useful in other contexts where one might not consume most of the constructed chunk. It is slower when the result is consumed strictly. Of course if one bothers with weighted chunking, it seems likely that the subsequent processing will be strict. -- Viktor.

PICCA Frederic-Emmanuel

19 Jan 19 Jan

11:17 a.m.

Hello viktor import Data.List data T = T String Int Int deriving Show l :: [T] l = [ T "1" 0 30 , T "2" 0 40 ] tweight :: T -> Int tweight (T n f t) = t - f chunks :: Int -> (a -> Int) -> [a] -> [[a]] chunks target weight xs = go 0 xs where go _ [] = [] go n (h:ts) | null ts = [h] : [] | let w = n + weight h, w < target = cons1 h $ go w ts | otherwise = [h] : go 0 ts cons1 h ~(a:b) = (h : a) : b main :: IO () main = do let cs = chunks 10 tweight l print cs seems to work, but this is not what I want :)) picca@cush:~$ runhaskell test.hs [[T "1" 0 30],[T "2" 0 40]] I want at the end to split each of my T into chunk of length target. like this [[T "1" 0 10], [T "1" 10 20], [T "1" 20 30], [T "2" 0 10], ...] So a split function like this should be used split :: Int -> a -> (a, a) split s (T n f t) = (T n f s, T n s t) Cheers Frederic

Viktor Dukhovni

20 Jan 20 Jan

8:29 a.m.

On Sun, Jan 19, 2020 at 11:17:54AM +0000, PICCA Frederic-Emmanuel wrote:

...

..., but this is not what I want :))

I want at the end to split each of my T into chunk of length target.

like this

[[T "1" 0 10], [T "1" 10 20], [T "1" 20 30], [T "2" 0 10], ...]

So a split function like this should be used

split :: Int -> a -> (a, a) split s (T n f t) = (T n f s, T n s t)

{-# LANGUAGE BangPatterns #-} {-# LANGUAGE MultiWayIf #-} {-# LANGUAGE StandaloneDeriving #-} data W n a = W !a !n !n deriving instance (Show n, Show a) => Show (W n a) wget :: W n a -> a wget (W a _ _) = a weight :: Num n => W n a -> n weight (W _ l h) = h - l wsplit :: Num n => W n a -> n -> (W n a, W n a) wsplit (W a l h) n = ( (W a l (l + n)), (W a (l+n) h) ) chunk :: (Num n, Ord n) => n -> [W n a] -> [[W n a]] chunk target = go target target where go tgt _ [] = [] go tgt gap [x] = golast tgt gap x go tgt gap ~(x:xs) = let gap' = gap - weight x in if | gap' > 0 -> cons1 x $ go tgt gap' xs | gap' == 0 -> [x] : go tgt tgt xs | (x1, x2) <- wsplit x gap -> [x1] : go tgt tgt (x2 : xs) cons1 !x ~(c:cs) = (x : c) : cs golast tgt gap x = if | weight x <= gap -> [x] : [] | (x1, x2) <- wsplit x gap -> [x1] : golast tgt tgt x2 {-# SPECIALIZE chunk :: Int -> [W Int a] -> [[W Int a]] #-} -- Viktor.

PICCA Frederic-Emmanuel

10:03 a.m.

Hello victor I decided to use your solution, since I find it more elegant than mine :)) data Chunk n a = Chunk !a !n !n deriving instance (Show n, Show a) => Show (Chunk n a) cweight :: Num n => Chunk n a -> n cweight (Chunk _ l h) = h - l csplit :: Num n => Chunk n a -> n -> (Chunk n a, Chunk n a) csplit (Chunk a l h) n = ( (Chunk a l (l + n)), (Chunk a (l+n) h) ) chunk :: (Num n, Ord n) => n -> [Chunk n a] -> [[Chunk n a]] chunk target = go target target where go _ _ [] = [] go tgt gap [x] = golast tgt gap x go tgt gap ~(x:xs) = let gap' = gap - cweight x in if | gap' > 0 -> cons1 x $ go tgt gap' xs | gap' == 0 -> [x] : go tgt tgt xs | (x1, x2) <- csplit x gap -> [x1] : go tgt tgt (x2 : xs) cons1 !x ~(c:cs) = (x : c) : cs golast tgt gap x = if | cweight x <= gap -> [x] : [] | (x1, x2) <- csplit x gap -> [x1] : golast tgt tgt x2 {-# SPECIALIZE chunk :: Int -> [Chunk Int FilePath] -> [[Chunk Int FilePath]] #-} But when compiling I have these warning

...

src/Hkl/Projects/Sixs.hs:(59,5)-(65,69): warning: [-Wincomplete-uni-patterns] Pattern match(es) are non-exhaustive In a pattern binding: Patterns not matched: [] | 59 | go _ _ [] = [] | ^^^^^^^^^^^^^^^^^^^^^^^...

I do not understand this one

...

src/Hkl/Projects/Sixs.hs:67:5-35: warning: [-Wincomplete-uni-patterns] Pattern match(es) are non-exhaustive In a pattern binding: Patterns not matched: [] | 67 | cons1 !x ~(c:cs) = (x : c) : cs | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If this is something which can not be reach, is it possible to explain to ghc how to avoid these warning ? cheers Fred

Viktor Dukhovni

10:39 a.m.

On Mon, Jan 20, 2020 at 10:03:16AM +0000, PICCA Frederic-Emmanuel wrote:

...

chunk :: (Num n, Ord n) => n -> [Chunk n a] -> [[Chunk n a]] chunk target = go target target where go _ _ [] = [] go tgt gap [x] = golast tgt gap x go tgt gap ~(x:xs) = let gap' = gap - cweight x in if | gap' > 0 -> cons1 x $ go tgt gap' xs | gap' == 0 -> [x] : go tgt tgt xs | (x1, x2) <- csplit x gap -> [x1] : go tgt tgt (x2 : xs)

cons1 !x ~(c:cs) = (x : c) : cs

golast tgt gap x = if | cweight x <= gap -> [x] : [] | (x1, x2) <- csplit x gap -> [x1] : golast tgt tgt x2

I've tried both GHC 8.6.5 and GHC 8.8.1, and get no warnings with either "-O -Wall" or "-O2 -Wall" and the below (my original names, which look equivalent). chunk :: (Num n, Ord n) => n -> [W n a] -> [[W n a]] chunk target = go target target where go _ _ [] = [] go tgt gap [x] = golast tgt gap x -- XXX: Try dropping the "~" on the next line it should be redundant go tgt gap ~(x:xs) = let gap' = gap - weight x in if | gap' > 0 -> cons1 x $ go tgt gap' xs | gap' == 0 -> [x] : go tgt tgt xs | (x1, x2) <- wsplit x gap -> [x1] : go tgt tgt (x2 : xs) cons1 !x ~(c:cs) = (x : c) : cs golast tgt gap x = if | weight x <= gap -> [x] : [] | (x1, x2) <- wsplit x gap -> [x1] : golast tgt tgt x2

...

But when compiling I have these warning

...
src/Hkl/Projects/Sixs.hs:(59,5)-(65,69): warning: [-Wincomplete-uni-patterns] Pattern match(es) are non-exhaustive In a pattern binding: Patterns not matched: [] | 59 | go _ _ [] = [] | ^^^^^^^^^^^^^^^^^^^^^^^...

I do not understand this one

See the XXX comment I added above, with that and also "-Wincomplete-uni-patterns" in addition to "-Wall", I no longer get that warning.

...

...
src/Hkl/Projects/Sixs.hs:67:5-35: warning: [-Wincomplete-uni-patterns] Pattern match(es) are non-exhaustive In a pattern binding: Patterns not matched: [] | 67 | cons1 !x ~(c:cs) = (x : c) : cs | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If this is something which can not be reach, is it possible to explain to ghc how to avoid these warning ?

However, with "-Wincomplete-uni-patterns" I still get this one, but the irrefutable binding is there for a reason, stricness in the second argument has undesirable performance implications, we know that the list won't be empty. So perhaps compile this code in a module that does not have that warning turned on. It is not too difficult to rewrite "chunk" so that it operates on (type-level) non-empty lists, but perhaps easier to just ignore or disable the warning. -- Viktor.

Viktor Dukhovni

11:47 p.m.

On Mon, Jan 20, 2020 at 05:39:16AM -0500, Viktor Dukhovni wrote:

...

...
...
src/Hkl/Projects/Sixs.hs:67:5-35: warning: [-Wincomplete-uni-patterns] Pattern match(es) are non-exhaustive In a pattern binding: Patterns not matched: [] | 67 | cons1 !x ~(c:cs) = (x : c) : cs | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If this is something which can not be reach, is it possible to explain to ghc how to avoid these warning ?

However, with "-Wincomplete-uni-patterns" I still get this one, but the irrefutable binding is there for a reason, stricness in the second argument has undesirable performance implications, we know that the list won't be empty. So perhaps compile this code in a module that does not have that warning turned on.

Actually, the warning can be eliminated by writing: cons1 x cs = (x : head cs) : tail cs which just as "unsafe" (we know that `cs` won't be empty, but the compiler does not), but use of the partial functions 'head' and 'tail' does not elicit a warning. The `!x` is also redundant, we've already evaluated the weight of `x`'. -- Viktor.

2003

Age (days ago)

2006

Last active (days ago)

List overview

Download

9 comments

2 participants

participants (2)

PICCA Frederic-Emmanuel
Viktor Dukhovni

a sort of chunk

tags

participants (2)