
I have the following code, which reads several thousand text files and creates some Map data. On Windows XP, it crapped out with the error "resource exhausted: too many open files." So I'm sure I need to make it stricter. I've never tried to do that before, so I'm not sure where the strictness is necessary. Maybe all I need is to switch to strict ByteStrings?

    -- Read a bunch of text files. The files are organized in pairs, where the
    -- file names in each pair are "a<number>.txt" and "f<number>.txt".
    -- Example: "a00001.txt" and "f00001.txt".
    -- Each file is a list of float values specified in a certain text format
    -- (which happens to be an output format of the audio-related program Csound).
    --
    -- Input:
    --   [(String,String)] : a list of pairs of file names
    --   String            : directory in which the files exist
    readPvsFiles :: [(String,String)] -> String -> IO (Map Int (Map Float Float))
    readPvsFiles filenames dir = do
      -- contents :: [(Int, Map Float Float)]
      --   where Int is the number in the file name, and Map Float Float maps
      --   numbers in the first file to numbers in the second file
      contents <- mapM (oneFilePair dir) filenames
      return $ M.fromList contents

    -- Read one file pair.
    --
    -- Input:
    --   String          : directory of the files
    --   (String,String) : the two file names in the pair
    -- Output:
    --   (Int, Map Float Float) : Int is the number in the file name, and
    --   Map Float Float maps numbers in the first file to numbers in the second
    oneFilePair :: String -> (String,String) -> IO (Int, Map Float Float)
    oneFilePair dir (ffile,afile) = do
      -- Read the float values from each file (which is a list of floats in
      -- a text format).
      fvalues <- readTableValues (dir ++ "/" ++ ffile)
      avalues <- readTableValues (dir ++ "/" ++ afile)
      -- t is the number in the file name.
      let t = read . take 6 . drop 1 $ ffile
      return (t, M.fromList $ zip fvalues avalues)

    -- Open the file via readFile, and parse all the text values in it.
    readTableValues :: String -> IO [Float]
    readTableValues s = do
      b <- readFile s
      let bs = lines b
          lenLine = head . drop 1 $ bs
          n = read (drop 6 lenLine) :: Int
          -- valueLines :: [String]. This is a list of the lines in the file
          -- that have the float values, one float value in text format per line.
          valueLines = take n . drop 25 $ bs
      return $ map read valueLines
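[Editorial note: for context, here is a strict variant of readTableValues, sketched as an illustration; it is not part of the original post. Replacing the lazy Prelude readFile with Data.ByteString.Char8.readFile forces each file to be read and closed before the next one is opened, which is exactly what the handle-exhaustion error calls for.]

```haskell
import qualified Data.ByteString.Char8 as C

-- Strict sketch of readTableValues: C.readFile slurps the whole file
-- and closes its handle at once, so mapping this over thousands of
-- files cannot run out of file descriptors.
readTableValuesStrict :: String -> IO [Float]
readTableValuesStrict s = do
  b <- C.readFile s                     -- strict read; handle closed here
  let bs         = C.lines b
      lenLine    = C.unpack (bs !! 1)   -- second line holds the value count
      n          = read (drop 6 lenLine) :: Int
      valueLines = take n . drop 25 $ bs
  return $ map (read . C.unpack) valueLines
```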

Dennis Raddle wrote:
I have the following code, which reads several thousand text files and creates some Map data. On Windows XP, it crapped out with the error "resource exhausted: too many open files." So I'm sure I need to make it stricter.
Yes.
I've never tried to do that before, so I'm not sure where the strictness is necessary.
When you use readFile, each file remains open until the entire file has been read. When reading is delayed due to laziness, the open file handles begin to pile up.
Maybe all I need is to switch to strict ByteStrings?
That should solve your problem in this case, yes.

You might consider using Text instead of ByteString, though; you are interpreting those values as text. Text has built-in functions for parsing floating point numbers. With ByteString, you'd have to change each line to a String. I almost always prefer using Text or ByteString over String nowadays anyway.

To solve the problem when using readFile from the Prelude, you would need to make sure that each file is read all the way to the end as you go along. One trick sometimes used for that is to use the evaluate function from Control.Exception to force evaluation of the length of each file:

    b <- readFile s
    evaluate $ length b
    ...

That would cause the entire contents of the file to be read into memory immediately and the file to be closed, like the behavior of readFile for strict ByteString and strict Text.

-Yitz
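[Editorial note: the strict-Text route Yitz mentions could look roughly like the hypothetical rewrite below. Data.Text.IO.readFile is strict, and Data.Text.Read.double parses the floats without a detour through String. Note it yields Double rather than the original's Float, since that is what Data.Text.Read.double returns.]

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import qualified Data.Text.Read as TR

-- Hypothetical strict-Text rewrite of readTableValues.
-- TIO.readFile is strict, so the handle is closed before parsing starts.
readTableValuesText :: FilePath -> IO [Double]
readTableValuesText s = do
  b <- TIO.readFile s
  let bs = T.lines b
      -- the second line carries the value count after a 6-character label
      n  = either error fst . TR.decimal . T.drop 6 $ bs !! 1
      valueLines = take n . drop 25 $ bs
  return $ map (either error fst . TR.double) valueLines
```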

Hi Yitz,
I tried "evaluate" before posting my question, and for some reason it didn't change the behavior. But strict ByteStrings worked. I haven't tried Text yet.

It was a pain putting "map (chr . fromIntegral)" in front of everything to get regular strings.
On Wed, Nov 9, 2011 at 10:40 AM, Yitzchak Gale wrote:

To solve the problem when using readFile from the Prelude, you would need to make sure that each file is read all the way to the end as you go along. One trick sometimes used for that is to use the evaluate function from Control.Exception to force evaluation of the length of each file:

    b <- readFile s
    evaluate $ length b
    ...

That would cause the entire contents of the file to be read into memory immediately and the file to be closed, like the behavior of readFile for strict ByteString and strict Text.

-Yitz

On Thursday 10 November 2011, 00:48:20, Dennis Raddle wrote:

It was a pain putting "map (chr . fromIntegral)" in front of everything to get regular strings.

    import qualified Data.ByteString.Char8 as C

    foo <- C.unpack `fmap` C.readFile whichever

You don't need to put it *everywhere*
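[Editorial note: applied to readTableValues from the original post, Daniel's suggestion is a one-line change; the sketch below is an illustration, with the String-based parsing left as it was.]

```haskell
import qualified Data.ByteString.Char8 as C

-- Same parsing code as the original, but the read is strict:
-- C.readFile closes the handle immediately, and a single C.unpack
-- recovers an ordinary String for the existing String-based code.
readTableValues :: String -> IO [Float]
readTableValues s = do
  b <- C.unpack `fmap` C.readFile s
  let bs = lines b
      lenLine = head . drop 1 $ bs
      n = read (drop 6 lenLine) :: Int
      valueLines = take n . drop 25 $ bs
  return $ map read valueLines
```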

Ah yes, so my first problem was not using the Char8 version, so 'unpack' was getting me [Word8] instead of [Char].

On Wed, Nov 9, 2011 at 3:57 PM, Daniel Fischer <daniel.is.fischer@googlemail.com> wrote:

    import qualified Data.ByteString.Char8 as C

    foo <- C.unpack `fmap` C.readFile whichever

You don't need to put it *everywhere*
participants (3):
- Daniel Fischer
- Dennis Raddle
- Yitzchak Gale