Pragmatic concurrency Re: [Haskell-cafe] multiple computations, same input

On 2006-03-28 at 08:02+0200 Tomasz Zielonka wrote:
I wonder if it would be possible to remove the space-leak by running both branches concurrently, and scheduling threads in a way that would minimise the space-leak. I proposed this before
http://www.haskell.org/pipermail/haskell-cafe/2005-December/013428.html
I would like to hear opinions from some compiler gurus.
This is something I've been thinking about on and off for a long time (probably since John Hughes mentioned the case of "average"). I even kept Tomasz's original message in my inbox until today in the hope that I'd get round to sending a response, but my flaky health gets in the way. So here, and I hope people will allow for the fact that I'm half asleep as I write this, is an attempt. There are some observations I'd like to make, and a proposal. Since the proposal relates (in a small way) to concurrency and is, I think worthwhile, I've cc'd this message to haskell-prime. 1) choosing the optimal reduction strategy is undecidable 2) we shouldn't (in general) attempt to do undecidable things automatically 3) Separation of concerns: Pragmatic decisions about evaluation order should be kept separate from the denotational aspect of the code. By this token, seq shouldn't be a function (because it isn't one), but a pragma. The fact that it's shorter to write seq a b than {-# SEQ a #-} b is a matter of syntax, so shouldn't rate highly in language design decisions. Perhaps we want a different syntax for this kind of pragma, but that's a side issue. So, to take Tomasz's example of wc, we want to be able to define it essentially this way: wc cs = (ll, ww, cc) where ll = lines cs ww = words cs cc = length cs but add [a] pragma[s] to the effect that evaluation should be input driven, and that ll, ww, and cc are to be given equal time. Something like {-# STEPPER cs; ROUND_ROBIN ll,ww,cc #-} (please do not take this as a suggestion of real syntax!). The way I would implement this is to add a new primitive, STEP, which is like seq except that it only evaluates its argument until it encounters another STEP. (It really isn't much different to seq). So after the compiler understood the pragma, it would replace wc with this (allowing the compiler to pretend step is a function): wc cs = (ll, ww, cc) where ll = lines cs' ww = words cs' cc = length cs' cs' = foldr (\a -> STEP ll . STEP ww . STEP cc . (a:)) [] cs Evaluation would start as normal (a wrinkle here is that the way I've written it, whichever element of the tuple is evaluated first gets two goes at the start, but that's a compiler detail). when it came to evaluating cs', it would be looking at a thunk something like STEP ll (STEP ww (STEP cc ('x': ...))) update the thunk to (STEP ww (STEP cc ('x': ...))) evaluate ll until (and if) it hits the thunk again, update it to (STEP cc ('x': ...)) evaluate ww until it hits the thunk, update it to 'x' : (STEP ...) evaluate cc, and so on. It seems to me that this wouldn't take much effort to implement, but it would provide a simple means of removing space leaks from a whole bunch of programmes without mangling the source code much. Jón -- Jón Fairbairn Jon.Fairbairn at cl.cam.ac.uk

On Wed, Mar 29, 2006 at 12:50:02PM +0100, Jon Fairbairn wrote:
[...]
but add [a] pragma[s] to the effect that evaluation should be input driven, and that ll, ww, and cc are to be given equal time. Something like {-# STEPPER cs; ROUND_ROBIN ll,ww,cc #-} (please do not take this as a suggestion of real syntax!).
The way I would implement this is to add a new primitive, STEP, which is like seq except that it only evaluates its argument until it encounters another STEP. (It really isn't much different to seq).
[...]
It seems to me that this wouldn't take much effort to implement, but it would provide a simple means of removing space leaks from a whole bunch of programmes without mangling the source code much.
Actually, it may require no effort from compiler implementors. I just managed to get the desired effect in current GHC! :-) I implemented your idea of stepper by writing the function stepper that rewrites the list invoking "yield" every 500 processed elements. This way I can concurrently consume the list without the space leak - when a thread evaluates too many list elements, it gets preempted. I think it suffices if RTS employs a round-robin scheduler. I am not sure it's important. The code isn't as beautiful as the naive wc implementation. That's because I haven't yet thought how to hide newEmptyMVar, forkIO, putMVar i takeMVar. Perhaps someone will come up with a solution to this. import Control.Concurrent import Control.Monad import System.IO.Unsafe (unsafePerformIO) stepper l = s n l where n = 500 s 0 (x:xs) = unsafePerformIO $ do yield return (x : s n xs) s i (x:xs) = x : s (i-1) xs s _ [] = [] main = do cs <- liftM stepper getContents ll <- newEmptyMVar ww <- newEmptyMVar cc <- newEmptyMVar forkIO $ putMVar ll $! length (lines cs) forkIO $ putMVar ww $! length (words cs) forkIO $ putMVar cc $! length cs takeMVar ll >>= print takeMVar ww >>= print takeMVar cc >>= print See how well it works: $ cat words words words words | ./A +RTS -sstderr ./A +RTS -K8M -sstderr 394276 394272 3725868 <- that's the size of cs 643,015,284 bytes allocated in the heap 72,227,708 bytes copied during GC 109,948 bytes maximum residency (46 sample(s)) <- no space leak! 2452 collections in generation 0 ( 0.33s) 46 collections in generation 1 ( 0.00s) 2 Mb total memory in use <- no space leak! INIT time 0.00s ( 0.01s elapsed) MUT time 1.25s ( 1.27s elapsed) GC time 0.33s ( 0.36s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 1.58s ( 1.64s elapsed) %GC time 20.9% (22.0% elapsed) Alloc rate 514,412,227 bytes per MUT second Productivity 79.1% of total user, 76.2% of total elapsed Thanks for your idea, Jon! :-) Best regards Tomasz

On Thu, Mar 30, 2006 at 05:05:30PM +0200, Tomasz Zielonka wrote:
Actually, it may require no effort from compiler implementors. I just managed to get the desired effect in current GHC! :-)
More specifically: in uniprocessor GHC 6.4.1.
I implemented your idea of stepper by writing the function stepper that rewrites the list invoking "yield" every 500 processed elements. This way I can concurrently consume the list without the space leak - when a thread evaluates too many list elements, it gets preempted. I think it suffices if RTS employs a round-robin scheduler. I am not sure it's important.
I just realised that this technique will only work on uniprocessors! :-( I relies on only one thread running at any moment. If there are multiple CPUs, yielding won't stop the current thread from consuming the list.
The code isn't as beautiful as the naive wc implementation. That's because I haven't yet thought how to hide newEmptyMVar, forkIO, putMVar i takeMVar. Perhaps someone will come up with a solution to this.
Here is my attempt to make the code more pure. The "concurrently" combinator uses CPS, because otherwise it was a bit difficult to split evaluation into two phases - first forking the thread, second taking the result from an MVar. I also tried using additional data constructor wrapper for the result, so first phase occured when forcing the constructor, and the second when forcing it's parameter, but it was tricky to use it properly considering that "let" and "where" bindings use irrefutable patterns. import Control.Concurrent import Control.Monad import System.IO.Unsafe stepper :: Int -> [a] -> [a] stepper n l = s n l where s 0 (x:xs) = unsafePerformIO $ do yield return (x : s n xs) s i (x:xs) = x : s (i-1) xs s _ [] = [] concurrently :: a -> (a -> b) -> b concurrently e f = unsafePerformIO $ do var <- newEmptyMVar forkIO $ putMVar var $! e return (f (unsafePerformIO (takeMVar var))) wc :: String -> (Int, Int, Int) wc cs0 = let cs = stepper 500 cs0 in concurrently (length (lines cs)) $ \ll -> concurrently (length (words cs)) $ \ww -> concurrently (length cs) $ \cc -> (ll, ww, cc) main = do cs <- getContents print (wc cs) It's probably worth noting that (in this case) when I remove "yield", so I only use concurrency with no stepper, the space-leak is also reduced, but not completely. Best regards Tomasz
participants (2)
-
Jon Fairbairn
-
Tomasz Zielonka