1st attempt at parallelizing

Hi all,

I'm spidering web pages; the implementation is currently synchronous. I'd like to parallelize this for a speed-up, i.e. fetch up to 6 pages in parallel and recycle those threads.

Now, I have come across good examples for this on the web before, but I doubt I'd find them again right away.

I'd appreciate some good pointers.

Günther

2010/7/26 Günther Schmidt
> Hi all,

Hello!

> I'm spidering web pages; the implementation is currently synchronous. I'd like to parallelize this for a speed-up, i.e. fetch up to 6 pages in parallel and recycle those threads.

This is usually called concurrent programming, not parallel.

> Now, I have come across good examples for this on the web before, but I doubt I'd find them again right away.
> I'd appreciate some good pointers.

There's a simple way of doing this with Chans, for example:

import Control.Applicative
import Control.Concurrent.STM
import Control.Monad
import qualified Data.Map as M

data Page = ...
data Info = ...

download :: Page -> IO Info
download = ...

getOneByOne :: [Page] -> IO (M.Map Page Info)
getOneByOne = M.fromList <$> mapM (\p -> (,) p <$> download p)

downloader :: TChan (Maybe Page) -> TChan (Page, Info) -> IO ()
downloader in out = do
  mp <- atomically (readTChan in)
  case mp of
    Nothing -> return ()
    Just p -> download p >>= atomically . writeTChan out

getConcurrent :: Int -> [Page] -> IO [M.Map Page Info]
getConcurrent n xs = do
  in <- newTChanIO
  out <- newTChanIO
  replicateM_ n (forkIO $ downloader in out) -- create n threads
  mapM (writeTChan in . Just) xs
  replicateM_ n (writeTChan in Nothing) -- kill n threads
  M.fromList <$> mapM (\_ -> readTChan out) xs

This code doesn't take exceptions into account, which you should, but this works. Well, I guess, didn't try, if it compiles then it should ;).

HTH,

--
Felipe.
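On the exceptions caveat above: one possible way to keep a worker alive when a download throws is to wrap the call in `try` from `Control.Exception`, so that the failure becomes an ordinary value. This is only a sketch under assumptions. `Page`, `Info`, the stub `download` and the name `safeDownload` are placeholders chosen for illustration and are not part of the posted code.

```haskell
import Control.Exception (ErrorCall (..), SomeException, throwIO, try)

-- Placeholder types; the post leaves Page and Info unspecified.
type Page = String
type Info = Int

-- Hypothetical stand-in for the real download: fails on an empty page name.
download :: Page -> IO Info
download "" = throwIO (ErrorCall "empty page")
download p  = return (length p)

-- Turn any exception thrown by download into a Left value.
-- Catching SomeException also catches asynchronous exceptions,
-- which real code may prefer to rethrow.
safeDownload :: Page -> IO (Page, Either SomeException Info)
safeDownload p = do
  r <- try (download p)
  return (p, r)
```

A worker that calls `safeDownload` instead of `download` would always have something to write to the output channel, so the collecting side is not left waiting for a result that never arrives.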

2010/7/26 Felipe Lessa
> downloader :: TChan (Maybe Page) -> TChan (Page, Info) -> IO ()
> downloader in out = do
>   mp <- atomically (readTChan in)
>   case mp of
>     Nothing -> return ()
>     Just p -> download p >>= atomically . writeTChan out

Oops! Of course there should be recursion here! (This is a bug the typechecker probably wouldn't catch.)

downloader :: TChan (Maybe Page) -> TChan (Page, Info) -> IO ()
downloader in out = do
  mp <- atomically (readTChan in)
  case mp of
    Nothing -> return ()
    Just p -> download p >>= atomically . writeTChan out >> downloader in out

Cheers,

--
Felipe.
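For readers who want to try the idea, here is a hedged, compile-oriented sketch of the same worker-pool pattern. It differs from the code as posted in a few places: `in` is a reserved word in Haskell, so the input channel is named `inp`; `forkIO` is imported from `Control.Concurrent`; every channel operation is wrapped in `atomically`; the worker writes a `(Page, Info)` pair to match the output channel's type; and `getConcurrent` returns a single map. `Page`, `Info` and the stub `download` are placeholders assumed for illustration.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (forM_, replicateM_)
import qualified Data.Map as M

-- Placeholder types; the post leaves Page and Info unspecified.
type Page = String
type Info = Int

-- Hypothetical stand-in for the real download.
download :: Page -> IO Info
download = return . length

-- Worker: take pages until a Nothing arrives, sending back (page, info) pairs.
downloader :: TChan (Maybe Page) -> TChan (Page, Info) -> IO ()
downloader inp out = do
  mp <- atomically (readTChan inp)
  case mp of
    Nothing -> return ()
    Just p  -> do
      i <- download p
      atomically (writeTChan out (p, i))
      downloader inp out

getConcurrent :: Int -> [Page] -> IO (M.Map Page Info)
getConcurrent n xs = do
  inp <- newTChanIO
  out <- newTChanIO
  replicateM_ n (forkIO (downloader inp out))          -- start n workers
  forM_ xs (atomically . writeTChan inp . Just)         -- queue every page
  replicateM_ n (atomically (writeTChan inp Nothing))   -- one stop signal per worker
  -- Collect exactly one result per queued page.
  M.fromList <$> mapM (\_ -> atomically (readTChan out)) xs
```

For the use case in the original question, `getConcurrent 6 pages` would run six workers. As in the posted code, exceptions are not handled: if a download throws, its worker dies and the final collection step waits for a result that never comes.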

Dear Felipe,

thank you for the code and for the correction :).

As usual, I come across interesting stuff when I have no immediate need for it, and when I do, I can't find it anymore.

I am looking for something slightly more abstract, and IIRC there was recently a post about the pi-calculus which seemed elegant, even though the author told me himself it was not meant for any real-world use.

But I believe the Galois boys have created a lib, called orc?, for this purpose. I think I'll check into that and see how it goes.

Günther

2010/7/26 Günther Schmidt
> Dear Felipe,
> thank you for the code and for the correction :).
> As usual, I come across interesting stuff when I have no immediate need for it, and when I do, I can't find it anymore.
> I am looking for something slightly more abstract, and IIRC there was recently a post about the pi-calculus which seemed elegant, even though the author told me himself it was not meant for any real-world use.
> But I believe the Galois boys have created a lib, called orc?, for this purpose. I think I'll check into that and see how it goes.

BTW, Galois also employs women, not just boys/men. "Galois folks" would be more appropriate and gender-inclusive.

And yes, Orc is pretty cool and should be perfectly suited for what you're doing, as fetching data from websites was one of the original use cases for Orc.

Jason

Dear Jason,

> And yes, Orc is pretty cool and should be perfectly suited for what you're doing, as fetching data from websites was one of the original use cases for Orc.
> Jason

thanks for that, it's nice to be on the right track for once.

Günther

participants (3)
- Felipe Lessa
- Günther Schmidt
- Jason Dagit