
On Mon, Mar 02, 2009 at 04:10:41PM +0100, Manlio Perillo wrote:
Anish Muttreja wrote:
On Sun, Mar 01, 2009 at 07:25:56PM +0100, Manlio Perillo wrote:
Hi.
I have a function that does some IO (it takes a file path, reads the file, parses it, and returns some data), and I would like to parallelize it so that multiple files can be parsed in parallel.
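For reference, a function of the shape described above, plus the sequential way of applying it to many files, might look like the following minimal sketch. `parseFile` and `parseAll` are hypothetical names, and the "parsing" is just a word count standing in for the real parser.

```haskell
-- Hypothetical stand-in for the parser described above: read a file
-- and return some data derived from it (here, simply its word count).
parseFile :: FilePath -> IO Int
parseFile path = do
  contents <- readFile path
  -- Force the result so the lazy read completes and the handle closes.
  return $! length (words contents)

-- Sequential baseline: parse the files one after another.
parseAll :: [FilePath] -> IO [Int]
parseAll = mapM parseFile
```

This is the version the question wants to speed up: `mapM` runs each `parseFile` action in turn.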
I would like to use the simple mapReduce function from Real World Haskell:
    import Control.Parallel (pseq)
    import Control.Parallel.Strategies (Strategy, parMap, using)

    mapReduce :: Strategy b    -- evaluation strategy for mapping
              -> (a -> b)      -- map function
              -> Strategy c    -- evaluation strategy for reduction
              -> ([b] -> c)    -- reduce function
              -> [a]           -- list to map over
              -> c
    mapReduce mapStrat mapFunc reduceStrat reduceFunc input =
        mapResult `pseq` reduceResult
      where mapResult    = parMap mapStrat mapFunc input
            reduceResult = reduceFunc mapResult `using` reduceStrat
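To make the pure use of this function concrete, here is a minimal sketch, assuming the `parallel` package (which provides `Control.Parallel` and `Control.Parallel.Strategies`). `totalWords` is a hypothetical example, not part of the book's code: it counts the words of several in-memory documents in parallel and sums the counts.

```haskell
import Control.Parallel (pseq)
import Control.Parallel.Strategies (Strategy, parMap, rseq, using)

-- The mapReduce function quoted above.
mapReduce :: Strategy b -> (a -> b) -> Strategy c -> ([b] -> c) -> [a] -> c
mapReduce mapStrat mapFunc reduceStrat reduceFunc input =
    mapResult `pseq` reduceResult
  where
    mapResult    = parMap mapStrat mapFunc input
    reduceResult = reduceFunc mapResult `using` reduceStrat

-- Hypothetical example: total word count over several documents.
-- Each per-document count is sparked in parallel; rseq evaluates an
-- Int to WHNF, which for an Int means fully evaluated.
totalWords :: [String] -> Int
totalWords = mapReduce rseq (length . words) rseq sum
```

For example, `totalWords ["a b c", "d e", ""]` counts 3, 2 and 0 words and sums them to 5. Note that the inputs here are already in memory; the question is how to get there from file paths.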
Is this possible?
Thanks,
Manlio Perillo
Would this work?
I suspect that it will not work.
Read each file into a String (or a ByteString) using a lazy function, and then call mapReduce with the strings instead of the file paths.
    import qualified Data.ByteString.Lazy.Char8 as L
    import System.IO (IOMode (ReadMode), openFile)

    do handles <- mapM (\f -> openFile f ReadMode) files
       strings <- mapM L.hGetContents handles
       let result = mapReduce ...
The actual work of reading each file would then happen on demand, inside the parsing function called by mapReduce.
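A minimal runnable sketch of this lazy-read suggestion, assuming the `parallel` and `bytestring` packages. `countLines` and `totalLines` are hypothetical; counting newline characters stands in for the real parser. `L.readFile` opens each file immediately but reads its bytes only when `countLines` demands them inside `mapReduce`.

```haskell
import Control.Parallel (pseq)
import Control.Parallel.Strategies (Strategy, parMap, rseq, using)
import qualified Data.ByteString.Lazy.Char8 as L

-- The mapReduce function quoted earlier in the thread.
mapReduce :: Strategy b -> (a -> b) -> Strategy c -> ([b] -> c) -> [a] -> c
mapReduce mapStrat mapFunc reduceStrat reduceFunc input =
    mapResult `pseq` reduceResult
  where
    mapResult    = parMap mapStrat mapFunc input
    reduceResult = reduceFunc mapResult `using` reduceStrat

-- Hypothetical parser stand-in: count the newline characters in a file.
-- L.count returns an Int64, so convert it to Int.
countLines :: L.ByteString -> Int
countLines = fromIntegral . L.count '\n'

-- Open every file up front with lazy reads; the contents are consumed
-- on demand when mapReduce evaluates countLines for each file.
totalLines :: [FilePath] -> IO Int
totalLines files = do
  contents <- mapM L.readFile files
  -- ($!) forces the reduction here, so all reading happens before return.
  return $! mapReduce rseq countLines rseq sum contents
```

The catch is the one raised in the reply: every file is opened before any of them is read, so the number of simultaneously open handles grows with the length of the list.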
By doing this, I will probably lose all control over file resource usage.
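One common way to keep handle usage bounded, swapped in here as an alternative to the lazy approach rather than something proposed in the thread, is to read each file strictly and parallelize only the pure parsing. This is a sketch assuming the `parallel` and `bytestring` packages; `countLines` and `totalLinesStrict` are hypothetical names.

```haskell
import Control.Parallel.Strategies (parMap, rseq)
import qualified Data.ByteString.Char8 as B

-- Hypothetical parser stand-in: count the newline characters.
countLines :: B.ByteString -> Int
countLines = B.count '\n'

-- Read each file strictly: B.readFile reads the whole file and closes
-- its handle before the next file is opened. Only the parsing of the
-- in-memory contents is done in parallel.
totalLinesStrict :: [FilePath] -> IO Int
totalLinesStrict files = do
  contents <- mapM B.readFile files
  return $! sum (parMap rseq countLines contents)
```

The trade-off is memory: at most one handle is open at a time, but all file contents are held in memory at once before parsing starts.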
OK, how about this: is there a reason why I can't replace the type variables b and c in the type signature of mapReduce with (IO b') and (IO c')? b and c can be any types.

    mapReduce :: Strategy (IO b')      -- evaluation strategy for mapping
              -> (a -> IO b')          -- map function
              -> Strategy (IO c')      -- evaluation strategy for reduction
              -> ([IO b'] -> IO c')    -- reduce function
              -> [a]                   -- list to map over
              -> IO c'

Just remember to wrap all values back in the IO monad.

Anish
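A minimal sketch of this IO-typed instantiation, assuming the `parallel` package; `parseFile` and `totalChars` are hypothetical, with a character count standing in for the parser. It type-checks with b = IO Int and c = IO Int as described. One thing worth knowing when trying it: a strategy such as `rseq` evaluates an IO action only to weak head normal form, which builds the action without running it, so the file reads themselves still happen one after another when `sequence` runs the actions.

```haskell
import Control.Parallel (pseq)
import Control.Parallel.Strategies (Strategy, parMap, rseq, using)

-- The mapReduce function quoted earlier in the thread.
mapReduce :: Strategy b -> (a -> b) -> Strategy c -> ([b] -> c) -> [a] -> c
mapReduce mapStrat mapFunc reduceStrat reduceFunc input =
    mapResult `pseq` reduceResult
  where
    mapResult    = parMap mapStrat mapFunc input
    reduceResult = reduceFunc mapResult `using` reduceStrat

-- Hypothetical stand-in for the file parser: returns the character count.
parseFile :: FilePath -> IO Int
parseFile path = do
  contents <- readFile path
  return $! length contents

-- mapReduce at b = IO Int and c = IO Int. The map step produces a list
-- of IO actions; the reduce step sequences them (running the reads in
-- order) and sums the results.
totalChars :: [FilePath] -> IO Int
totalChars = mapReduce rseq parseFile rseq (fmap sum . sequence)
```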
Thanks,
Manlio