Re: [Haskell-cafe] 1st attempt at parallelizing

26 Jul 2010

      2010/7/26 Günther Schmidt :
...
Hi all,
Hello!
...
I'm spidering web pages, the implementation currently is synchronous. I'd
like to parallelize this for speed-up, ie. get up to 6 pages in parallel and
recycle those threads.
This is usually called concurrent programming, not parallel.
...
Now I have come across good examples for this on the web before, but I doubt
I'd find it again right away.
I'd appreciate some good pointers.
There's a simple way of doing this with Chans, for example:

import Control.Applicative
import Control.Concurrent.STM
import Control.Monad
import qualified Data.Map as M

data Page = ...
data Info = ...

download :: Page -> IO Info
download = ...

getOneByOne :: [Page] -> IO (M.Map Page Info)
getOneByOne = M.fromList <$> mapM (\p -> (,) p <$> download p)

downloader :: TChan (Maybe Page) -> TChan (Page, Info) -> IO ()
downloader in out = do
  mp <- atomically (readTChan in)
  case mp of
    Nothing -> return ()
    Just p -> download p >>= atomically . writeTChan out

getConcurrent :: Int -> [Page] -> IO [M.Map Page Info]
getConcurrent n xs = do
  in <- newTChanIO
  out <- newTChanIO
  replicateM_ n (forkIO $ downloader in out) -- create n threads
  mapM (writeTChan in . Just) xs
  replicateM_ n (writeTChan in Nothing) -- kill n threads
  M.fromList <$> mapM (\_ -> readTChan out) xs

This code doesn't take exceptions into account, which you should, but
this works.  Well, I guess, didn't try, if it compiles then it should
;).

HTH,

-- 
Felipe.

Re: [Haskell-cafe] 1st attempt at parallelizing

Felipe Lessa