
Hello Haskellers, I'm wondering how to get the following to work: I need to parse about 18.000 files. I would like to have a function 'parseFiles' that parses all these files and returns the result in a list. When I execute 'test' from the simplified code below I get the error: *** Exception: file_0014994.txt: openFile: resource exhausted (Too many open files) So it seems that 'parseFiles' tries to open all the ~18.000 files and gets exhausted when opening the 14994 file. What I would like is 'take 3 =<< parseFiles' to read only the first 3 files. Is this possible, and if so, what is the best way to do this? Note that the code below is a bit simplified: ------------------------------------------------------------------------------ module Main where import Text.ParserCombinators.Parsec data T = ... test = print . take 3 =<< parseFiles parseFiles :: IO [T] parseFiles = mapM parseFile =<< getFileFPs getFileFPs :: IO [FilePath] getFileFPs = ... -- returns a large list of about 18.000 FilePaths parseFile :: FilePath -> IO T parseFile fp = liftM getRight $ parseFromFile someParser fp getRight (Right r) = r someParser :: Parser T someParser = ... ------------------------------------------------------------------------------ Greetings, Bas van Dijk

On Mon, Oct 23, 2006 at 08:48:24PM +0200, Bas van Dijk wrote:
So it seems that 'parseFiles' tries to open all the ~18.000 files and gets exhausted when opening the 14994 file.
What I would like is 'take 3 =<< parseFiles' to read only the first 3 files.
Is this possible, and if so, what is the best way to do this?
parseFiles :: IO [T] parseFiles = mapM parseFile =<< getFileFPs
use this function instead of mapM above: unsafeInterleaveMapIO f (x:xs) = unsafeInterleaveIO $ do y <- f x ys <- unsafeInterleaveMapIO f xs return (y : ys) unsafeInterleaveMapIO _ [] = return [] Best regards Tomasz

On Monday 23 October 2006 21:50, Tomasz Zielonka wrote:
unsafeInterleaveMapIO f (x:xs) = unsafeInterleaveIO $ do y <- f x ys <- unsafeInterleaveMapIO f xs return (y : ys) unsafeInterleaveMapIO _ [] = return []
Great it works! I didn't know about unsafeInterleaveIO [1]. Why is it called 'unsafe'? And how does the laziness work? Is it because of the 'let r = ... in (s, r)'? [1] ghc/libraries/base/GHC/IOBase.lhs:249: ------------------------------------------------------------------------------- {-| 'unsafeInterleaveIO' allows 'IO' computation to be deferred lazily. When passed a value of type @IO a@, the 'IO' will only be performed when the value of the @a@ is demanded. This is used to implement lazy file reading, see 'System.IO.hGetContents'. -} {-# INLINE unsafeInterleaveIO #-} unsafeInterleaveIO :: IO a -> IO a unsafeInterleaveIO (IO m) = IO ( \ s -> let r = case m s of (# _, res #) -> res in (# s, r #)) -- We believe that INLINE on unsafeInterleaveIO is safe, because the -- state from this IO thread is passed explicitly to the interleaved -- IO, so it cannot be floated out and shared. ------------------------------------------------------------------------------- Thanks, Bas van Dijk

Hello Bas, Tuesday, October 24, 2006, 1:03:55 AM, you wrote:
Great it works! I didn't know about unsafeInterleaveIO [1]. Why is it called 'unsafe'? And how does the laziness work? Is it because of the 'let r = ... in (s, r)'?
read either http://haskell.org/haskellwiki/IO_inside or Simon's paper "awkward squad" mentioned there in your original program, parseFile don't parsed file. it just returned thunk (computation) that will parse file when data from this thunk will be requested. so, your mapM returned 18k of such thunks, each with its own file open. Tomasz's solution postpone not only parsing, but the whole parseFile call (including file open) until the data from this parsing will be really requested. because your next code uses results of each parsing before using results of next one, this allows garbage collector to close files of already used parsings before you go to consume next one. smart solution -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bas van Dijk wrote:
On Monday 23 October 2006 21:50, Tomasz Zielonka wrote:
unsafeInterleaveMapIO f (x:xs) = unsafeInterleaveIO $ do y <- f x ys <- unsafeInterleaveMapIO f xs return (y : ys) unsafeInterleaveMapIO _ [] = return []
Great it works! I didn't know about unsafeInterleaveIO [1].
Why is it called 'unsafe'?
Because it causes pure code to perform side-effects (=IO), albeit in a controlled manner, so it's not as bad as unsafePerformIO. For instance, using getContents you get a string (list of chars) with the property that evaluating subsequent elements of the list causes IO to happen (in this case reading another character from stdin). Thus, unsafeInterleaveIO is safe only if it is not observable (from inside the program) when exactly the IO gets performed. Ben

I don't know why these unsafe* functions are suggested so easily. Their name is chosen for good reason. On Mon, 23 Oct 2006, Bas van Dijk wrote:
------------------------------------------------------------------------------ module Main where
import Text.ParserCombinators.Parsec
data T = ...
test = print . take 3 =<< parseFiles
parseFiles :: IO [T] parseFiles = mapM parseFile =<< getFileFPs
Is it possible to turn it into parseFiles :: [IO T] ? Then you can easily do sequence (take 3 parseFiles)

On Tuesday 24 October 2006 13:03, Henning Thielemann wrote:
Is it possible to turn it into
parseFiles :: [IO T]
?
Then you can easily do
sequence (take 3 parseFiles)
Thanks, I think I go for this sollution: parseFiles :: IO [IO a] parseFiles = liftM (map parseFile) getFileFPs test = print =<< sequence . take 3 =<< parseFiles Bas van Dijk
participants (5)
-
Bas van Dijk
-
Benjamin Franksen
-
Bulat Ziganshin
-
Henning Thielemann
-
Tomasz Zielonka