file line operation perhaps need loop

Hello,

I have two text files, and I want to mix them line by line, e.g.

$ cat url1.txt
url1_1.line
url1_2.line

$ cat url2.txt
url2_1.line
url2_2.line

and I want this file as the result:

$ cat aha.txt
url1_1.line
url2_1.line
url1_2.line
url2_2.line

I first wrote this snippet of code:
---
import System.IO

mix :: [a] -> [a] -> [a]
mix [] ys = ys
mix xs [] = xs
mix (x:xs) (y:ys) = [x,y] ++ mix xs ys

f1 = do
  contents1 <- readFile "url1.txt"
  contents2 <- readFile "url2.txt"
  let urls1 = lines contents1
      urls2 = lines contents2
      urls = mix urls1 urls2
  writeFile "aha.txt" (unlines urls)
--
This works fine, but I think that if the two files are very big, readFile will consume too much memory. So I need to read the files line by line, but I am stuck on how to write the loop in the IO monad:
---
main = do
  h1 <- openFile "url1.txt" ReadMode
  h2 <- openFile "url2.txt" ReadMode
  line1 <- hGetLine h1
  line2 <- hGetLine h2
  print $ line1 : line2 : [] -- I don't know how to go on from here
  hClose h1
  hClose h2
--
Any ideas? Thank you all.

--
Sun Yi Ming

On Wed, Jul 20, 2005 at 02:27:36PM +0800, Sun Yi Ming wrote:
> Hello,
>
> I have two text files, and I want to mix them line by line, e.g.
>
> $ cat url1.txt
> url1_1.line
> url1_2.line
>
> $ cat url2.txt
> url2_1.line
> url2_2.line
>
> and I want this file as the result:
>
> $ cat aha.txt
> url1_1.line
> url2_1.line
> url1_2.line
> url2_2.line
>
> I first wrote this snippet of code:
> ---
> import System.IO
>
> mix :: [a] -> [a] -> [a]
> mix [] ys = ys
> mix xs [] = xs
> mix (x:xs) (y:ys) = [x,y] ++ mix xs ys
>
> f1 = do
>   contents1 <- readFile "url1.txt"
>   contents2 <- readFile "url2.txt"
>   let urls1 = lines contents1
>       urls2 = lines contents2
>       urls = mix urls1 urls2
>   writeFile "aha.txt" (unlines urls)
> --
> This works fine, but I think that if the two files are very big, readFile will consume too much memory.

Ah, but this is exactly where laziness wins big time: a smart implementation of readFile will lazily read the actual file only as far as needed. Thus, reading with readFile will not read the entire file into memory at once. What will happen is that writeFile starts writing and, upon discovering that it needs the value of urls, starts reading. Any value that has already been written turns into garbage and will be garbage collected. I would be slightly surprised if this code used more than constant memory.

> So I need to read the files line by line, but I am stuck on how to write the loop in the IO monad:
> ---
> main = do
>   h1 <- openFile "url1.txt" ReadMode
>   h2 <- openFile "url2.txt" ReadMode
>   line1 <- hGetLine h1
>   line2 <- hGetLine h2
>   print $ line1 : line2 : [] -- I don't know how to go on from here
>   hClose h1
>   hClose h2
> --
> Any ideas? Thank you all.

Yes. You need to split the lines

  line1 <- hGetLine h1
  line2 <- hGetLine h2
  print $ line1 : line2: []

into a separate function that will then recurse over the file.

Doei, Arthur.

--
  /\    / | arthurvl@cs.uu.nl             | Work like you don't need the money
 /__\  /  | A friend is someone with whom | Love like you have never been hurt
/    \/__ | you can dare to be yourself   | Dance like there's nobody watching
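
The suggestion above, moving the per-line work into a separate recursive function, might look roughly like the following. This is only a sketch and does not come from any message in the thread: the helper names `mixHandles` and `copyRest` are hypothetical, the file names are the ones from the original question, and lines are written straight to the output file instead of being printed so that nothing has to be kept in memory.

```haskell
import System.IO

-- Hypothetical helper (not from the thread): alternately copy one line
-- from each input handle to the output handle. As soon as one input is
-- exhausted, copy whatever is left of the other one.
mixHandles :: Handle -> Handle -> Handle -> IO ()
mixHandles h1 h2 out = do
  eof1 <- hIsEOF h1
  eof2 <- hIsEOF h2
  case (eof1, eof2) of
    (True, _) -> copyRest h2 out
    (_, True) -> copyRest h1 out
    _ -> do
      hGetLine h1 >>= hPutStrLn out
      hGetLine h2 >>= hPutStrLn out
      mixHandles h1 h2 out

-- Hypothetical helper: copy the remaining lines of a single handle.
copyRest :: Handle -> Handle -> IO ()
copyRest h out = do
  eof <- hIsEOF h
  if eof
    then return ()
    else do
      hGetLine h >>= hPutStrLn out
      copyRest h out

main :: IO ()
main =
  withFile "url1.txt" ReadMode $ \h1 ->
    withFile "url2.txt" ReadMode $ \h2 ->
      withFile "aha.txt" WriteMode $ \out ->
        mixHandles h1 h2 out
```

Using `withFile` instead of explicit `openFile`/`hClose` pairs is a design choice of this sketch: it closes each handle even if an exception is raised part-way through.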

Arthur van Leeuwen wrote:
> Ah, but this is exactly where lazyness wins bigtime: a smart implementation of readFile will lazily read the actual file for as far as needed. Thus, reading with readFile will not read the entire file into memory at once. What will happen is that writeFile starts writing, and upon discovery of needing the value of urls it will then start reading. Any value already written in this case obviously turns into garbage and will be garbage collected. I would be slightly surprised if this code uses more than constant memory.
>
>> so i need to read the file line by line but stunned by the loop in IO Monad:
>> ---
>> main = do
>>   h1 <- openFile "url1.txt" ReadMode
>>   h2 <- openFile "url2.txt" ReadMode
>>   line1 <- hGetLine h1
>>   line2 <- hGetLine h2
>>   print $ line1 : line2 : [] -- i don't howto do
>>   hClose h1
>>   hClose h2
>> --
>> any ideas? thank you all.
>
> Yes. You need to split the lines
>
>   line1 <- hGetLine h1
>   line2 <- hGetLine h2
>   print $ line1 : line2: []
>
> into a separate function that will then recurse over the file.

Ah, readFile is lazy, that's great! My hat's off to the Haskell design and implementation teams for their robust and elegant work. Thank you all!

BTW, sorry to Doei Arthur for replying to you by mistake.

--
Sun Yi Ming

On Wed, 2005-07-20 at 14:27 +0800, Sun Yi Ming wrote: [snip]
> I first wrote this snippet of code:
> ---
> import System.IO
>
> mix :: [a] -> [a] -> [a]
> mix [] ys = ys
> mix xs [] = xs
> mix (x:xs) (y:ys) = [x,y] ++ mix xs ys
>
> f1 = do
>   contents1 <- readFile "url1.txt"
>   contents2 <- readFile "url2.txt"
>   let urls1 = lines contents1
>       urls2 = lines contents2
>       urls = mix urls1 urls2
>   writeFile "aha.txt" (unlines urls)
> --
> This works fine, but I think that if the two files are very big, readFile will consume too much memory. So I need to read the files line by line, but I am stuck on how to write the loop in the IO monad:

Did you try it on a big file to see what happens? There should not be any problem, because readFile is lazy. That is, it reads the contents of the file on demand, not all at once. The only thing you have to be careful about is that you do not require all the contents of the file before any output can be produced.

Bernie.
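
The on-demand behaviour described above can be seen without touching any files. The sketch below is an illustration added here, not code from the thread, apart from `mix`, which is copied from the original question. It feeds `mix` two infinite lists, which stand in for very large inputs, and still gets its first results immediately.

```haskell
-- mix exactly as defined in the original question.
mix :: [a] -> [a] -> [a]
mix [] ys = ys
mix xs [] = xs
mix (x:xs) (y:ys) = [x,y] ++ mix xs ys

main :: IO ()
main = do
  -- Both inputs are infinite, yet the first six results are available
  -- right away: mix inspects only as much input as the consumer asks for.
  print (take 6 (mix [1, 3 ..] [2, 4 ..]))   -- prints [1,2,3,4,5,6]
  -- By contrast, length has to walk the entire list before it can return,
  -- so it only makes sense on finite input. Asking for something like this
  -- on file contents before writing any output means reading the whole file.
  print (length (mix "abc" "de"))            -- prints 5
```

In the file-based program the same reasoning applies: as long as the result list is only consumed front to back by writeFile, earlier lines can be garbage collected while later ones are still being read.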

Sun Yi Ming wrote:
Hello,
> This works fine, but I think that if the two files are very big, readFile will consume too much memory. So I need to read the files line by line, but I am stuck on how to write the loop in the IO monad:
>
> main = do
>   h1 <- openFile "url1.txt" ReadMode
>   h2 <- openFile "url2.txt" ReadMode
>   line1 <- hGetLine h1
>   line2 <- hGetLine h2
>   print $ line1 : line2 : [] -- I don't know how to go on from here
>   hClose h1
>   hClose h2

Don't worry about memory... GNU tools differ from other Unix-like systems here: rather than interleaving I/O with computation, they "slurp" or mmap the entire file into memory and then process it.

import System.IO

main :: IO ()
main = do
  -- openFile needs an IOMode as its second argument
  h1 <- openFile "url1.lsm" ReadMode
  h2 <- openFile "url2.lsm" ReadMode
  -- zipFiles is an IO action, so run it before printing its result
  ls <- zipFiles h1 h2
  print ls
  hClose h1
  hClose h2

zipFiles :: Handle -> Handle -> IO [String]
zipFiles h1 h2 = do
  eof1 <- hIsEOF h1
  eof2 <- hIsEOF h2
  case (eof1, eof2) of
    (False, False) -> do
      l1 <- hGetLine h1
      l2 <- hGetLine h2
      rest <- zipFiles h1 h2
      return (l1 : l2 : rest)
    -- one file is exhausted: return the remaining lines of the other
    (False, True) -> fmap lines (hGetContents h1)
    (True, False) -> fmap lines (hGetContents h2)
    _ -> return []

I haven't tested it, but... it should work...

(I like fixed-width fonts and text-only mails... please, if you mail me, send me only text, not HTML.)

Matej 'Yin' Gagyi

On Wednesday 20 July 2005 08:27, Sun Yi Ming wrote:
> Hello,
>
> I have two text files, and I want to mix them line by line,
> [...]
> import System.IO
>
> mix :: [a] -> [a] -> [a]
> mix [] ys = ys
> mix xs [] = xs
> mix (x:xs) (y:ys) = [x,y] ++ mix xs ys
>
> f1 = do
>   contents1 <- readFile "url1.txt"
>   contents2 <- readFile "url2.txt"
>   let urls1 = lines contents1
>       urls2 = lines contents2
>       urls = mix urls1 urls2
>   writeFile "aha.txt" (unlines urls)
> --
> This works fine, but I think that if the two files are very big, readFile will consume too much memory.

No. Both files are read lazily (on demand). This is how 'readFile' is specified. The program should work fine even with very large files. Try it.

Ben

participants (6)
- Arthur van Leeuwen
- Benjamin Franksen
- Bernard Pope
- Henning Thielemann
- Sun Yi Ming
- yin