
2010/7/26 Günther Schmidt
Hi all,
Hello!
I'm spidering web pages, the implementation currently is synchronous. I'd like to parallelize this for speed-up, ie. get up to 6 pages in parallel and recycle those threads.
This is usually called concurrent programming, not parallel.
Now I have come across good examples for this on the web before, but I doubt I'd find it again right away.
I'd appreciate some good pointers.
There's a simple way of doing this with Chans, for example: import Control.Applicative import Control.Concurrent.STM import Control.Monad import qualified Data.Map as M data Page = ... data Info = ... download :: Page -> IO Info download = ... getOneByOne :: [Page] -> IO (M.Map Page Info) getOneByOne = M.fromList <$> mapM (\p -> (,) p <$> download p) downloader :: TChan (Maybe Page) -> TChan (Page, Info) -> IO () downloader in out = do mp <- atomically (readTChan in) case mp of Nothing -> return () Just p -> download p >>= atomically . writeTChan out getConcurrent :: Int -> [Page] -> IO [M.Map Page Info] getConcurrent n xs = do in <- newTChanIO out <- newTChanIO replicateM_ n (forkIO $ downloader in out) -- create n threads mapM (writeTChan in . Just) xs replicateM_ n (writeTChan in Nothing) -- kill n threads M.fromList <$> mapM (\_ -> readTChan out) xs This code doesn't take exceptions into account, which you should, but this works. Well, I guess, didn't try, if it compiles then it should ;). HTH, -- Felipe.