Reading Multiple Files and Iterate Function Application

Dear All, Another I/O question. Let us say that you are given a list of files file1.dat, file2.dat...file10.dat and so on (i.e. every file is indexed by a number and every file is a single column where every entry is a string without spaces). In the snippet below I read file1.dat, convert it to a list and then print out its length. Now, how can I iterate the process on file1.dat, file2.dat and file3.dat and store the lengths in a list? I would like to map the file reading and following operations on the list [1,2.3], but that is giving me a headache. It is relatively easy to create the file name filename="file"++(show i)++".dat" , for i=1,2,3 but it the the iteration part that is giving me troubles. Any suggestion is appreciated. Cheers Lorenzo -- ##################################################################### import Data.Ord import Data.List main :: IO () main = do list_t <- readFile "file1.dat" let list = lines list_t let b = length list putStrLn "length of list is, " print b

On Monday 11 October 2010 16:56:58, Lorenzo Isella wrote:
Dear All, Another I/O question. Let us say that you are given a list of files file1.dat, file2.dat...file10.dat and so on (i.e. every file is indexed by a number and every file is a single column where every entry is a string without spaces). In the snippet below I read file1.dat, convert it to a list and then print out its length. Now, how can I iterate the process on file1.dat, file2.dat and file3.dat and store the lengths in a list?
fileLength :: FilePath -> IO Int fileLength file = fmap length (readFile file) filename :: Int -> FilePath filename i = "file" ++ show i ++ ".dat" getAllLengths :: [Int] -> IO [Int] getAllLengths nums = mapM (fileLength . filename) nums If you want something other than the character count, instead of fileLength use e.g. countLines :: FilePath -> IO Int countLines file = fmap (length . lines) (readFile file) or whatever you're interested in. Another nice thing is often forM (from Control.Monad) forM nums $ \i -> do let filename = "file" ++ show i ++ ".dat" contents <- readFile filename let result = function contents doSomethingOrNot return result
I would like to map the file reading and following operations on the list [1,2.3], but that is giving me a headache. It is relatively easy to create the file name
filename="file"++(show i)++".dat" , for i=1,2,3
but it the the iteration part that is giving me troubles. Any suggestion is appreciated. Cheers
Lorenzo

Thanks a lot Daniel, but I am a bit lost (up to not long ago I did not even know the existence of a control monad...and some unstructured reading did not help). Some online research about mapM and fmap led me here http://en.wikibooks.org/wiki/Haskell/Category_theory and I think I am a bit astray at this point ;-) Why does my "simple" snippet below raise a number of errors? Cheers Lorenzo import Data.Ord import Data.List main :: IO () main = do let nums=[1,2] let fl = getAllLengths nums putStrLn "fl is, " print fl filename :: Int -> FilePath filename i = "file" ++ show i ++ ".dat" fileLength :: FilePath -> IO Int fileLength file = fmap length (readFile file) getAllLengths :: [Int] -> IO [Int] getAllLengths nums = mapM (fileLength . filename) nums On 10/11/2010 05:21 PM, Daniel Fischer wrote:
On Monday 11 October 2010 16:56:58, Lorenzo Isella wrote:
Dear All, Another I/O question. Let us say that you are given a list of files file1.dat, file2.dat...file10.dat and so on (i.e. every file is indexed by a number and every file is a single column where every entry is a string without spaces). In the snippet below I read file1.dat, convert it to a list and then print out its length. Now, how can I iterate the process on file1.dat, file2.dat and file3.dat and store the lengths in a list?
fileLength :: FilePath -> IO Int fileLength file = fmap length (readFile file)
filename :: Int -> FilePath filename i = "file" ++ show i ++ ".dat"
getAllLengths :: [Int] -> IO [Int] getAllLengths nums = mapM (fileLength . filename) nums
If you want something other than the character count, instead of fileLength use e.g.
countLines :: FilePath -> IO Int countLines file = fmap (length . lines) (readFile file)
or whatever you're interested in.
Another nice thing is often forM (from Control.Monad)
forM nums $ \i -> do let filename = "file" ++ show i ++ ".dat" contents<- readFile filename let result = function contents doSomethingOrNot return result
I would like to map the file reading and following operations on the list [1,2.3], but that is giving me a headache. It is relatively easy to create the file name
filename="file"++(show i)++".dat" , for i=1,2,3
but it the the iteration part that is giving me troubles. Any suggestion is appreciated. Cheers
Lorenzo

Hi,
replace
let fl = getAllLengths nums
by
fl <- getAllLengths nums
, since getAllLengths returns a monadic action.
The author of the following book is much better at explaining why this is so
than I am: http://learnonlineyouahaskell.com/
http://learnyouahaskell.com/. May
I suggest you read it cover to cover, it's really really good. It is
probably the best way to learn Haskell at the moment, together with trying
out code snippets like you're doing right now.
Regards,
Thomas
On Mon, Oct 11, 2010 at 6:06 PM, Lorenzo Isella
Thanks a lot Daniel, but I am a bit lost (up to not long ago I did not even know the existence of a control monad...and some unstructured reading did not help). Some online research about mapM and fmap led me here http://en.wikibooks.org/wiki/Haskell/Category_theory and I think I am a bit astray at this point ;-)
Why does my "simple" snippet below raise a number of errors?
Cheers
Lorenzo
import Data.Ord
import Data.List
main :: IO ()
main = do
let nums=[1,2]
let fl = getAllLengths nums
putStrLn "fl is, " print fl
filename :: Int -> FilePath filename i = "file" ++ show i ++ ".dat"
fileLength :: FilePath -> IO Int fileLength file = fmap length (readFile file)
getAllLengths :: [Int] -> IO [Int] getAllLengths nums = mapM (fileLength . filename) nums
On 10/11/2010 05:21 PM, Daniel Fischer wrote:
On Monday 11 October 2010 16:56:58, Lorenzo Isella wrote:
Dear All, Another I/O question. Let us say that you are given a list of files file1.dat, file2.dat...file10.dat and so on (i.e. every file is indexed by a number and every file is a single column where every entry is a string without spaces). In the snippet below I read file1.dat, convert it to a list and then print out its length. Now, how can I iterate the process on file1.dat, file2.dat and file3.dat and store the lengths in a list?
fileLength :: FilePath -> IO Int fileLength file = fmap length (readFile file)
filename :: Int -> FilePath filename i = "file" ++ show i ++ ".dat"
getAllLengths :: [Int] -> IO [Int] getAllLengths nums = mapM (fileLength . filename) nums
If you want something other than the character count, instead of fileLength use e.g.
countLines :: FilePath -> IO Int countLines file = fmap (length . lines) (readFile file)
or whatever you're interested in.
Another nice thing is often forM (from Control.Monad)
forM nums $ \i -> do let filename = "file" ++ show i ++ ".dat" contents<- readFile filename let result = function contents doSomethingOrNot return result
I would like to map the file reading and following operations on the
list [1,2.3], but that is giving me a headache. It is relatively easy to create the file name
filename="file"++(show i)++".dat" , for i=1,2,3
but it the the iteration part that is giving me troubles. Any suggestion is appreciated. Cheers
Lorenzo
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

On 10/11/2010 07:20 PM, Thomas Miedema wrote:
Hi,
replace let fl = getAllLengths nums
by fl <- getAllLengths nums
, since getAllLengths returns a monadic action.
The author of the following book is much better at explaining why this is so than I am: http://learnonlineyouahaskell.com/ http://learnyouahaskell.com/. May I suggest you read it cover to cover, it's really really good. It is probably the best way to learn Haskell at the moment, together with trying out code snippets like you're doing right now.
Regards, Thomas
On Mon, Oct 11, 2010 at 6:06 PM, Lorenzo Isella
mailto:lorenzo.isella@gmail.com> wrote: Thanks a lot Daniel, but I am a bit lost (up to not long ago I did not even know the existence of a control monad...and some unstructured reading did not help). Some online research about mapM and fmap led me here http://en.wikibooks.org/wiki/Haskell/Category_theory and I think I am a bit astray at this point ;-)
Why does my "simple" snippet below raise a number of errors?
Cheers
Lorenzo
import Data.Ord
import Data.List
main :: IO ()
main = do
let nums=[1,2]
let fl = getAllLengths nums
putStrLn "fl is, " print fl
filename :: Int -> FilePath filename i = "file" ++ show i ++ ".dat"
fileLength :: FilePath -> IO Int fileLength file = fmap length (readFile file)
getAllLengths :: [Int] -> IO [Int] getAllLengths nums = mapM (fileLength . filename) nums
On 10/11/2010 05:21 PM, Daniel Fischer wrote:
On Monday 11 October 2010 16:56:58, Lorenzo Isella wrote:
Dear All, Another I/O question. Let us say that you are given a list of files file1.dat, file2.dat...file10.dat and so on (i.e. every file is indexed by a number and every file is a single column where every entry is a string without spaces). In the snippet below I read file1.dat, convert it to a list and then print out its length. Now, how can I iterate the process on file1.dat, file2.dat and file3.dat and store the lengths in a list?
fileLength :: FilePath -> IO Int fileLength file = fmap length (readFile file)
filename :: Int -> FilePath filename i = "file" ++ show i ++ ".dat"
getAllLengths :: [Int] -> IO [Int] getAllLengths nums = mapM (fileLength . filename) nums
If you want something other than the character count, instead of fileLength use e.g.
countLines :: FilePath -> IO Int countLines file = fmap (length . lines) (readFile file)
or whatever you're interested in.
Another nice thing is often forM (from Control.Monad)
forM nums $ \i -> do let filename = "file" ++ show i ++ ".dat" contents<- readFile filename let result = function contents doSomethingOrNot return result
I would like to map the file reading and following operations on the list [1,2.3], but that is giving me a headache. It is relatively easy to create the file name
filename="file"++(show i)++".dat" , for i=1,2,3
but it the the iteration part that is giving me troubles. Any suggestion is appreciated. Cheers
Lorenzo
_______________________________________________ Beginners mailing list Beginners@haskell.org mailto:Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners
Thanks Thomas. Yep, I do need some extra reading unfortunately. One question: if I was to apply a function on many files file1, file2...using e.g. Python, this would be my pipeline read file1 do stuff on file 1 read file2 do stuff on file 2 ...... Now, due to the laziness of haskell, can I here resort to this approach read file1, file2... into a single list map (do-my-stuff) on list As far as I understand, this should not result e.g. into a huge RAM consumptions since files are read and processed only when needed (hence one at the time). Am I on the right track? Cheers Lorenzo

On Tuesday 12 October 2010 11:02:47, Lorenzo Isella wrote:
Thanks Thomas. Yep, I do need some extra reading unfortunately. One question: if I was to apply a function on many files file1, file2...using e.g. Python, this would be my pipeline read file1 do stuff on file 1
read file2 do stuff on file 2
......
Now, due to the laziness of haskell, can I here resort to this approach
read file1, file2... into a single list
map (do-my-stuff) on list
As far as I understand, this should not result e.g. into a huge RAM consumptions since files are read and processed only when needed (hence one at the time). Am I on the right track?
Yes, but there are dangers on that way. With readFile, the contents are read lazily upon demand, but the file is opened immediately for reading. So contentsList <- mapM readFile fileList or allContents <- fmap concat $ mapM readFile fileList can make you run out of file handles if fileList is long enough. Also, the file handles aren't closed until the entire contents of the file has been read (there are a few situations where the handle is closed earlier) and they're not guaranteed to be immediately closed when the end of the file has been reached, they could linger for a GC or two. That means you can also run out of file handles when you process the files sequentially (if you have a bad consumption pattern). The memory usage depends on your consumption pattern, independent of whether theSting[s] you process come[s] from file readings or from a non-IO generator. If you keep references to the beginning of the list, you get a leak, if you consume the list sequentially, it runs in small space.
Cheers
Lorenzo

On Monday 11 October 2010 18:06:50, Lorenzo Isella wrote:
Thanks a lot Daniel, but I am a bit lost (up to not long ago I did not even know the existence of a control monad...and some unstructured reading did not help).
I think there's a misunderstanding here. Control is the top-level name for stuff related to control flow, like Control.Concurrent for concurrency combinators, Control.Parallel (and Control.Parallel.Strategies) for parallelism combinators. Monads (some) are also related to control flow, so the monad stuff that's not available from the Prelude lives in Control.Monad and Control.Monad.Whatever (Control.Monad.State, Control.Monad.Writer, ...) Control.Monad (part of the standard libraries) provides a lot of general functions for working with Monads, among them forM (which is "flip mapM").
Some online research about mapM and fmap led me here http://en.wikibooks.org/wiki/Haskell/Category_theory and I think I am a bit astray at this point ;-)
Why does my "simple" snippet below raise a number of errors? Cheers
Lorenzo
import Data.Ord
import Data.List
main :: IO ()
main = do
let nums=[1,2]
let fl = getAllLengths nums
That means fl is the IO-action which gets the file lengths. You want the result, thus fl <- getAllLengths nums to bind the result of that action to the name fl.
putStrLn "fl is, " print fl
filename :: Int -> FilePath filename i = "file" ++ show i ++ ".dat"
fileLength :: FilePath -> IO Int fileLength file = fmap length (readFile file)
getAllLengths :: [Int] -> IO [Int] getAllLengths nums = mapM (fileLength . filename) nums
participants (3)
-
Daniel Fischer
-
Lorenzo Isella
-
Thomas Miedema