Silly I/O question

John Goerzen

28 Sep 2004

8:05 p.m.

I'm trying to write a program that will copy an arbitrarily large text file to a destination, and duplicate it 100 times. Thus:

    ./myprog < Input > Output

would be the same as running:

    cat Input > Output     # once
    cat Input >> Output    # 99 times

My first attempt was this:

    import IO

    main = disp 100

    disp 0 = return ()
    disp n = do
        c <- getContents
        putStr c
        hSeek stdin AbsoluteSeek 0
        disp (n-1)

That failed, though, because getContents closes the file after it's been completely read (ugh -- why?).

So then I tried to work on various options around hGetLine and hPutStrLn. But I couldn't figure out a way to make this either properly tail-recursive while handling the exception, or to avoid polling for EOF each time through the function.

I also don't want to store the entire file in memory -- the idea is to seek back to the beginning for each iteration. I'm assuming that stdin is a seekable fd.

I checked the wiki for a pattern here but didn't see any.

Suggestions?


Peter Simons

28 Sep

9:06 p.m.

John Goerzen writes:

...

That failed, though, because getContents closes the file after it's been completely read (ugh -- why?).

getContents reads from standard input: you can't seek on that stream. Just think of "cat [...]

    printTimes :: Int -> String -> IO ()
    printTimes n msg = sequence_ (replicate n (putStr msg))

    printTimes' :: Int -> String -> IO ()
    printTimes' n msg = putStr (concat (replicate n msg))

This means, unfortunately, that you'll have to keep the whole file in memory, but if you want to read from standard input, there is no way around that.

You could read the contents once, write it to a temporary file, and then copy it multiple times from there. Then you could do it in blocks. But that's probably not what you want to do.

...

But I couldn't figure out a way to make this either properly tail-recursive while handling the exception, or to avoid polling for EOF each time through the function.

You might want to write a function that copies the file _once_ and then just call that function several times. Like in the examples above. I don't think you need explicit recursion at all.

Hope this is helpful.

Peter

John Goerzen

9:19 p.m.

On 2004-09-28, Peter Simons wrote:

...

John Goerzen writes:

...
That failed, though, because getContents closes the file after it's been completely read (ugh -- why?).

You could read the contents once, write it to a temporary file, and then copy it multiple times from there. Then you could do it in blocks. But that's probably not what you want to do.

That just moves the problem :-) If I assume that stdin is redirected, it is seekable, and I could do the same there. But the block I/O in Haskell makes no sense to me (how do I allocate a Ptr type block thingy)?

...

...
But I couldn't figure out a way to make this either properly tail-recursive while handling the exception, or to avoid polling for EOF each time through the function.

You might want to write a function that copies the file _once_ and then just call that function several times. Like in the examples above. I don't think you need explicit recursion at all.

If I load it into memory, yes. Otherwise, it seems not so easy.

...

Hope this is helpful.

Yes, thanks for the insight.

FWIW, this is working for me:

    import IO

    main = disp 100

    disp 0 = return ()
    disp n =
        let copy x = do
                eof <- isEOF
                if eof
                    then return ()
                    else do
                        line <- getLine
                        putStrLn line
                        (copy 0)
        in do
            copy 0
            hSeek stdin AbsoluteSeek 0
            disp (n-1)

but it seems wasteful to poll isEOF so much.

-- John

Jon Fairbairn

9:59 p.m.

On 2004-09-28 at 21:19-0000 John Goerzen wrote:

...

On 2004-09-28, Peter Simons wrote:

...
John Goerzen writes: FWIW, this is working for me:

import IO

main = disp 100

disp 0 = return ()
disp n =
    let copy x = do
            eof <- isEOF
            if eof
                then return ()
                else do
                    line <- getLine
                    putStrLn line
                    (copy 0)
    in do
        copy 0
        hSeek stdin AbsoluteSeek 0
        disp (n-1)

but it seems wasteful to poll isEOF so much.

Why do you say that? The condition has to be tested for, whether you do it by polling or waiting for an error to be thrown.

For my 2¢, I think I prefer this sort of thing to look like this:

...

import IO

number_of_copies = 100

main = mapM_ contentsToStdOut $ replicate number_of_copies stdin

contentsToStdOut hdl = do
    line_by_line hdl
    hSeek hdl AbsoluteSeek 0

line_by_line hdl = foldIO (const putStrLn) () hGetLine hdl

foldIO process_item initial_state io_operation handle = process initial_state
  where
    process state = do
        eof <- hIsEOF handle
        if eof
            then return state
            else do
                item <- io_operation handle
                new_item <- process_item state item
                process $ new_item

and some version of foldIO should probably be in a library somewhere. If you really don't like polling, you can write this:

...

contentsToStdOut hdl = do
    t <- try $ line_by_line hdl
    hSeek hdl AbsoluteSeek 0
    case t of
        Right () -> error "this never happens"
        Left e -> if isEOFError e then return () else ioError e

with

...

line_by_line hdl = do
    line <- hGetLine hdl
    putStrLn line
    line_by_line hdl

Note that all of these are incorrect because hGetLine doesn't tell you whether there was a newline at the end of file.

-- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk
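The caveat about the final newline can be seen without any file handling. The following is only a sketch: it uses the standard `lines` and `unlines` functions as a stand-in for the hGetLine/putStrLn loop, and `roundTrip` is a name made up for the illustration.

```haskell
-- Reading line by line and writing each line back with putStrLn
-- behaves much like `unlines . lines` applied to the file contents.
-- (`roundTrip` is a hypothetical name, not something from the thread.)
roundTrip :: String -> String
roundTrip = unlines . lines

main :: IO ()
main = do
    -- Input that already ends in a newline comes back unchanged.
    print (roundTrip "a\nb\n" == "a\nb\n")
    -- Input without a final newline gains one, so the copy differs.
    print (roundTrip "a\nb" == "a\nb\n")
```

Both comparisons hold, which means a file whose last line lacks a newline is not copied byte for byte by any of the line-based versions.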

Alastair Reid

10:57 p.m.

On Tuesday 28 September 2004 22:19, John Goerzen wrote:

...

[program that calls isEOF once per line deleted]

but it seems wasteful to poll isEOF so much.

I think all Haskell implementations have a buffering layer between the Haskell level and the operating system. So, calls to hGetLine, hIsEOF, etc. don't make (expensive) calls to the operating system but, instead, make (cheap) function calls to the I/O library to examine the state of the buffer for that file. In other words, calling isEOF is pretty cheap.

That said, if you want to write a cat-like program which is as fast as Unix cat, you should not process data a character at a time or a line at a time but, rather, read fixed size blocks. Ideally the block size would match what the OS can provide efficiently and you would avoid introducing additional layers of buffering. You would also avoid converting from the external representation (a sequence of bytes in memory) to some internal representation (a linked list of characters, an array of unboxed values, or whatever) since you will waste a lot of time in conversion.

-- Alastair Reid

ps It sounds like you're trying to learn Haskell by writing programs with lots of I/O in them. This isn't really playing to Haskell's strengths and forces you to learn some tricky stuff (and, if chasing performance, learn some murky, non-portable libraries) before you learn what Haskell is really good for.

John Goerzen

11:43 p.m.

On 2004-09-28, Alastair Reid wrote:

...

On Tuesday 28 September 2004 22:19, John Goerzen wrote: That said, if you want to write a cat-like program which is as fast as Unix cat, you should not process data a character at a time or a line at a time but, rather, read fixed size blocks. Ideally the block size would match what

Right. However, the way to do that is not really apparent from the docs.
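One way to read fixed-size blocks without handling a Ptr directly is sketched below. It assumes the bytestring package bundled with current GHC, which is not mentioned anywhere in this thread; `chunkSize` and `copyChunks` are names made up for the example, and the chunk size is a guess rather than a tuned value.

```haskell
import qualified Data.ByteString as B
import Control.Monad (replicateM_, unless)
import System.IO

-- Hypothetical chunk size; not tuned for any particular OS.
chunkSize :: Int
chunkSize = 65536

-- Copy from the current position of `src` to `dst` in strict chunks,
-- so the data is never turned into a list of characters.
copyChunks :: Handle -> Handle -> IO ()
copyChunks src dst = loop
  where
    loop = do
        chunk <- B.hGet src chunkSize    -- empty only at end of file
        unless (B.null chunk) $ do
            B.hPut dst chunk
            loop

main :: IO ()
main = replicateM_ 100 $ do
    hSeek stdin AbsoluteSeek 0           -- fails if stdin is not seekable
    copyChunks stdin stdout
```

Because the bytes are passed through untouched, this version also preserves a missing final newline.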

...

ps It sounds like you're trying to learn Haskell by writing programs with lots of I/O in them. This isn't really playing to Haskell's strengths and forces you to learn some tricky stuff (and, if chasing performance, learn some murky, non-portable libraries) before you learn what Haskell is really good for.

I appreciate the wisdom, Alastair, and understand what you're saying. Partly you are seeing these I/O-related questions because I've figured out other things without help :-)

At the same time, I'm not new to functional programming, nor to lazy evaluation (though it has been some time since I've worked with it extensively), and have been spending a lot of time with OCaml recently. It strikes me as very similar to Haskell in several ways.

Most of the programs I write are I/O intensive. I/O is the most, well, different thing about Haskell when compared to my prior experiences. I have no problem framing tasks recursively, for instance.

One of the things I do when I learn a new language is to try to probe where its weaknesses are. I'm not saying I/O is a weakness in Haskell; just that it was a cause for concern after the shootout results on Alioth. (BTW, I, having known Haskell for a few hours, did manage to speed up one of them by a factor of three while still using only one line of code, so things may not be so bad <g>) The fact that so many Haskell tutorials save I/O for many chapters down, or even don't bother to cover it at all, is a cause for concern for a hacker-type like me.

No, I'm not going to bother with murky low-level hackish libraries for I/O. I just want to understand the strengths and limitations of the existing system. OCaml's system is blazingly fast. But Haskell's is, well, beautiful. I like that.

I/O is very critical to a lot of different applications. I think maybe Haskell is shortchanged in that department sometimes, and just suffers from some under-documentation. (Hal Daume III's Yet Another Haskell Tutorial had a great down-to-earth coverage of it that was accessible and easy to follow.)

karczma

29 Sep

12:08 a.m.

John Goerzen writes:

...

One of the things I do when I learn a new language is to try to probe where its weaknesses are.

Please, when meeting new women in your life, don't do so. Otherwise you won't live long enough in order to appreciate your new knowledge... Jerzy Karczmarczuk

Peter Simons

28 Sep

11:30 p.m.

John Goerzen writes:

...

But the block I/O in Haskell makes no sense to me (how do I allocate a Ptr type block thingy)?

With mallocArray. Block I/O is difficult, though.
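For the "Ptr type block thingy", here is a rough sketch of a block copy. Note that it uses allocaBytes from Foreign.Marshal.Alloc rather than the mallocArray mentioned above, so that the buffer is released automatically; `blockSize` and `copyBlocks` are hypothetical names, and the block size is a guess. It also assumes stdin is redirected from a regular file.

```haskell
import Control.Monad (replicateM_, when)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import System.IO

-- Hypothetical block size; 64 KiB is a guess, not a tuned value.
blockSize :: Int
blockSize = 65536

-- Copy everything from the current position of `src` to `dst`,
-- one block at a time, reusing a single temporary buffer.
copyBlocks :: Handle -> Handle -> IO ()
copyBlocks src dst = allocaBytes blockSize loop
  where
    loop :: Ptr Word8 -> IO ()
    loop buf = do
        n <- hGetBuf src buf blockSize   -- 0 means end of file
        when (n > 0) $ do
            hPutBuf dst buf n
            loop buf

main :: IO ()
main = do
    hSetBinaryMode stdin True
    hSetBinaryMode stdout True
    replicateM_ 100 $ do
        hSeek stdin AbsoluteSeek 0       -- fails if stdin is not seekable
        copyBlocks stdin stdout
```

The buffer only lives for the duration of one copyBlocks call, so nothing has to be freed by hand.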

...

...
You might want to write a function that copies the file _once_ and then just call that function several times.

...

If I load it into memory, yes. Otherwise, it seems not so easy.

Looking at your program:

...

main = disp 100

...

disp 0 = return ()
disp n =
    let copy x = do
            eof <- isEOF
            if eof
                then return ()
                else do
                    line <- getLine
                    putStrLn line
                    (copy 0)
    in do
        copy 0
        hSeek stdin AbsoluteSeek 0
        disp (n-1)

Just let the function _begin_ with the seek, copy the file, and return. Then call it 100 times. No recursion.

Peter
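A minimal sketch of that shape, under the assumption that stdin is redirected from a seekable file. `copyOnce` is a made-up name; the inner loop is still recursive, but the 100 repetitions come from replicateM_ rather than from a counter.

```haskell
import Control.Monad (replicateM_, unless)
import System.IO

-- Rewind `src`, then copy it line by line to `dst` and return.
-- As with the other line-based versions in this thread, a final line
-- without a newline is written back with one.
copyOnce :: Handle -> Handle -> IO ()
copyOnce src dst = do
    hSeek src AbsoluteSeek 0             -- fails if src is not seekable
    loop
  where
    loop = do
        eof <- hIsEOF src
        unless eof $ do
            hGetLine src >>= hPutStrLn dst
            loop

main :: IO ()
main = replicateM_ 100 (copyOnce stdin stdout)
```

Seeking at the start rather than at the end also means the function works on a handle that has already been partly read.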
