
On Thursday 17 September 2009 22:20:55, Cristiano Paris wrote:
On Thu, Sep 17, 2009 at 10:01 PM, Daniel Fischer
wrote:
...
readBit fn = do
    txt <- readFile fn
    let (l,_:bdy) = span (/= '\n') txt
    return $ Bit (read l) bdy
?
With
import System.Environment (getArgs)
import Data.List (sortBy)
import Data.Ord (comparing)

main = do
    args <- getArgs
    let n = case args of
              (a:_) -> read a
              _     -> 1000
    bl <- mapM readBit ["file1.txt","file2.txt"]
    mapM_ (putStrLn . show . index) $ sortBy (comparing index) bl
    mapM_ (putStrLn . take 20 . drop n . body) bl
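The `readBit` above relies on a `Bit` type that the thread never shows, and its lazy `let` pattern `(l,_:bdy)` fails at runtime (when `l` or `bdy` is forced) if the file contains no newline. A minimal sketch of just the parsing step as a pure, total function, assuming a hypothetical `data Bit = Bit { index :: Int, body :: String }` and that the first line holds the index:

```haskell
import Text.Read (readMaybe)

-- Hypothetical record standing in for the thread's Bit type:
-- 'index' is the metadata from the first line, 'body' is the rest.
data Bit = Bit { index :: Int, body :: String }
  deriving Show

-- Split the contents at the first newline and parse the header.
-- Returns Nothing instead of crashing when the header is not a number;
-- a file without a newline is treated as a header with an empty body.
parseBit :: String -> Maybe Bit
parseBit txt =
  case break (== '\n') txt of
    (l, _ : bdy) -> fmap (\i -> Bit i bdy) (readMaybe l)
    (l, [])      -> fmap (\i -> Bit i "") (readMaybe l)
```

Keeping the parse pure means it can be reused whichever IO strategy ends up supplying the text.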
Yes, it *seems* to work, but... the files don't get closed (each readFile stays unfinished until its body has been read), so I think I'm going to have problems when the number of files to read exceeds the maximum number of open handles a process can have.
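One way to scan metadata without accumulating open handles is to read only the header line strictly and let `withFile` close the handle before returning. A minimal sketch, assuming (as in the code above) that the first line of each file is an integer index; `readIndex` is a hypothetical name:

```haskell
import System.IO (IOMode (ReadMode), hGetLine, withFile)
import Text.Read (readMaybe)

-- Read only the first line of a file and parse it as an Int.
-- hGetLine is strict, so the whole line is in memory before withFile
-- closes the handle: at most one file is open at a time, however many
-- files are scanned. (hGetLine throws an EOF error on an empty file.)
readIndex :: FilePath -> IO (Maybe Int)
readIndex fn = withFile fn ReadMode $ \h -> do
  l <- hGetLine h
  return $! readMaybe l
```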
Indeed. If the number of files is large, reading lazily with readFile doesn't work well. You want to eat the cake and have it. If you have a lot of files, want to read the metadata of all of them, select a (much) smaller number of files by some criterion on the set of metadata, and then read the bodies of the selected files, it's hairy. Reading all bodies immediately is probably ruled out by memory restrictions.

The clean approach would be to separate the reading of metadata and body. The drawback is that you then have a second entry into IO. Using unsafePerformIO, you can pretend that you don't re-enter IO. Whether that is safe in your situation, I don't know. Probably not (rule of thumb: no nontrivial action wrapped in unsafePerformIO is safe, though the chances are decent that it works most of the time).
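The clean two-pass approach described above might look like the following sketch: read every header, choose files from the collected metadata, then re-enter IO to read only the chosen bodies. The selection criterion ("the `k` smallest indices") and the names `readIndex`, `readBody` and `selectAndRead` are illustrative assumptions, not anything from the thread:

```haskell
import Data.List (sortOn)
import System.IO (IOMode (ReadMode), hGetContents, hGetLine, withFile)
import Text.Read (readMaybe)

-- Pass 1: header only; the handle is closed before returning.
readIndex :: FilePath -> IO (Maybe Int)
readIndex fn = withFile fn ReadMode $ \h -> do
  l <- hGetLine h
  return $! readMaybe l

-- Pass 2: reopen the file, skip the header and force the whole body
-- (via its length) so that withFile can safely close the handle.
readBody :: FilePath -> IO String
readBody fn = withFile fn ReadMode $ \h -> do
  _ <- hGetLine h
  s <- hGetContents h
  length s `seq` return s

-- Read the metadata of all files, keep the k with the smallest index
-- (files with an unparsable header are dropped), then read only those
-- bodies.
selectAndRead :: Int -> [FilePath] -> IO [(Int, String)]
selectAndRead k fns = do
  metas <- mapM (\fn -> (\mi -> (mi, fn)) <$> readIndex fn) fns
  let chosen = take k (sortOn fst [ (i, fn) | (Just i, fn) <- metas ])
  mapM (\(i, fn) -> (\b -> (i, b)) <$> readBody fn) chosen
```

The price is exactly the drawback mentioned: each selected file is opened twice, and nothing guarantees it is unchanged between the two passes.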
That's a possibility I had considered, though not by using readFile directly.
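The "pretend you don't re-enter IO" idea can be sketched with `unsafeInterleaveIO` from `System.IO.Unsafe`, swapped in here for the `unsafePerformIO` mentioned above because it is designed for deferring an IO action until its result is demanded. This is a hedged sketch under the same hypothetical `Bit` type and first-line-is-index assumption, not a recommendation:

```haskell
import System.IO (IOMode (ReadMode), hGetContents, hGetLine, withFile)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical record: index from the first line, body from the rest.
data Bit = Bit { index :: Int, body :: String }

-- Read the header now (its handle is closed immediately) but defer the
-- body: the file is reopened only if and when 'body' is forced, and is
-- never reopened if the body is never used.
-- Caveats: if the file changes or disappears in between, the body is
-- inconsistent or an IO exception surfaces inside pure code.
readBitDeferred :: FilePath -> IO Bit
readBitDeferred fn = do
  i <- withFile fn ReadMode (fmap read . hGetLine)
  b <- unsafeInterleaveIO $ withFile fn ReadMode $ \h -> do
         _ <- hGetLine h
         s <- hGetContents h
         length s `seq` return s
  return (Bit i b)
```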
Cristiano