RE: [Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some randomnewbiequestions

On 08 January 2005 08:09, Aaron Denney wrote:
On 2005-01-07, Simon Marlow wrote:
- Can you use (some encoding of) Unicode for your Haskell source files? I don't think this is true in any Haskell compiler right now.
I assume this won't be done until the next one is done...
Not necessarily; GHC doesn't use the standard IO library for reading source files.
- Can you do String I/O in some encoding of Unicode? No Haskell compiler has support for this yet, and there are design decisions to be made. Some progress has been made on an experimental prototype (see recent discussion on this list).
Many of the easy ways to do this that I've heard proposed make the current hacks for binary IO fail.
Making hacks fail isn't necessarily a bad thing :-)
IMHO, we really, really, need a standard, supported way to do binary IO.
I agree, but I think it should be part of a larger redesign of the IO library. The streams proposal includes binary I/O, by the way. I'm not keen to provide binary IO on top of the existing IO library, and then to have Unicode as a layer on top of that. Performance will be terrible. It needs to be designed properly from the ground up.
If I can read in and output octets, then I can implement Unicode handling on top of that. In fact, it would let a bunch of the proposed ideas for Unicode support be implemented in pure Haskell and have their API details hashed out and polished.
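A minimal sketch of that layering, assuming octets arrive as a list of Word8. The name decodeUtf8 is hypothetical and not part of any proposal in this thread; the decoder is simplified (it does not reject overlong forms or surrogates) and maps malformed input to U+FFFD.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper: decode UTF-8 octets into a String in pure Haskell.
-- Malformed or truncated sequences become U+FFFD and decoding resumes
-- at the next octet.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b : bs)
  | b < 0x80 = chr (fromIntegral b) : decodeUtf8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F)
  | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F)
  | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07)
  | otherwise = '\xFFFD' : decodeUtf8 bs
  where
    -- n continuation octets are expected after the lead octet.
    multi n lead =
      let (cont, rest) = splitAt n bs
          cp = foldl step (fromIntegral lead) cont
       in if length cont == n && all isCont cont && cp <= 0x10FFFF
            then chr cp : decodeUtf8 rest
            else '\xFFFD' : decodeUtf8 bs
    isCont c = c .&. 0xC0 == 0x80
    step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)

main :: IO ()
main = do
  -- 'H', U+00E9 and U+20AC encoded as UTF-8.
  let octets = [0x48, 0xC3, 0xA9, 0xE2, 0x82, 0xAC]
  print (map fromEnum (decodeUtf8 octets)) -- prints [72,233,8364]
```

Because the decoder is an ordinary function over octets, a different character set per file is just a different function applied to that file's bytes.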
For unix, there are a couple of different tacks one could take. The locale system is standard, and does work, but is ugly and a pain to work with. In particular, it's another (set of) global variables. And what do you do with a character not expressible in the current locale?
I'd like the possibility of different character sets for different files, for example.
Not a problem. Have you looked at the streams proposal? Cheers, Simon

Simon Marlow wrote:
Not a problem. Have you looked at the streams proposal?
Is there a Wiki page or URL with the streams proposal? -- % Andre Pang : trust.in.love.to.save http://www.algorithm.com.au/

Andre Pang wrote:
Is there a Wiki page or URL with the streams proposal?
It's here: http://www.haskell.org/~simonmar/io/System.IO.html -- Ben

Ben Rudiak-Gould writes:
"fileRead :: File -> FileOffset -> Integer -> Buffer -> IO ()" This is unimplementable safely if the descriptor is read concurrently by different processes. The current position is shared. -- __("< Marcin Kowalczyk \__/ qrczak@knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/

Marcin 'Qrczak' Kowalczyk wrote:
"fileRead :: File -> FileOffset -> Integer -> Buffer -> IO ()"
This is unimplementable safely if the descriptor is read concurrently by different processes. The current position is shared.
... which is terrible library design, which we should avoid if at all possible, which is one of several reasons that I want to get rid of the notion of "current position". Hence the above prototype. fileRead can be implemented in terms of OS primitives, and it's easy enough to implement a thread-safe seek/read interface on top of it. The reverse isn't true--if we provided seek/read, it would be very hard to implement fileRead safely. (Maybe that's what you were saying?) -- Ben
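A rough sketch of that layering, under stated assumptions: File, readAt, Cursor, seekCursor and readCursor are hypothetical names, not the proposal's API, and the positional read is emulated here with seek plus read under a lock, which is only safe within a single process. An implementation on top of an OS primitive such as pread would not need that lock.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar, modifyMVar_, newMVar, withMVar)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (Ptr)
import System.IO

-- Hypothetical stand-in for the proposal's File type.
newtype File = File (MVar Handle)

openFileRO :: FilePath -> IO File
openFileRO path = File <$> (openBinaryFile path ReadMode >>= newMVar)

-- Positional read: the caller never sees a "current position".
-- Emulated with hSeek + hGetBuf under a lock (an assumption of this sketch).
readAt :: File -> Integer -> Int -> IO [Word8]
readAt (File lock) off len =
  withMVar lock $ \h -> do
    hSeek h AbsoluteSeek off
    allocaBytes len $ \buf -> do
      n <- hGetBuf h (buf :: Ptr Word8) len
      peekArray n buf

-- A seek/read interface layered on top: each cursor owns its own offset.
data Cursor = Cursor File (MVar Integer)

newCursor :: File -> IO Cursor
newCursor f = Cursor f <$> newMVar 0

seekCursor :: Cursor -> Integer -> IO ()
seekCursor (Cursor _ pos) off = modifyMVar_ pos (const (return off))

readCursor :: Cursor -> Int -> IO [Word8]
readCursor (Cursor f pos) len =
  modifyMVar pos $ \off -> do
    bytes <- readAt f off len
    return (off + fromIntegral (length bytes), bytes)

main :: IO ()
main = do
  let path = "cursor-demo.tmp"
  writeFile path "abcdefghij"
  f <- openFileRO path
  c1 <- newCursor f
  c2 <- newCursor f
  seekCursor c2 5
  a <- readCursor c1 3 -- bytes 0..2
  b <- readCursor c2 3 -- bytes 5..7
  c <- readCursor c1 3 -- bytes 3..5
  putStrLn (map (toEnum . fromIntegral) (a ++ b ++ c)) -- prints abcfghdef
```

The two cursors never disturb each other because the only shared operation takes an explicit offset.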

Ben Rudiak-Gould writes:
fileRead can be implemented in terms of OS primitives,
Only if they already support reading from a fixed offset (like pread). I'm not sure if we can rely on something like this being always available, or whether it should be emulated using lseek which is safe only as long as we are the only process using the given open file. pread requires that the file is seekable, which means that it can't be used for all file handles: not for pipes, sockets, terminals nor various other devices.
and it's easy enough to implement a thread-safe seek/read interface on top of it.
Not if it must cooperate with other processes, and you *do* want to set a file position before running another program with redirected standard I/O. In this case it's not enough that you set a private Haskell variable holding its logical file position - you must perform the lseek syscall.
Doing something differently than everybody else has a risk of limited interoperability, even if the new way is "better", and thus must be carefully evaluated to check whether all lost functionality is unimportant enough to lose.
BTW, on Unix sockets and files are the same, but probably not on Windows. I don't know details about WinAPI. I know that basic file I/O uses HANDLEs, winsock uses ints which emulate Unix descriptors; what I don't know is whether you can also use HANDLEs for sockets (perhaps each winsock fd has an associated HANDLE with easy translation in both directions? or is there another API for sockets on HANDLEs?) and how do you perform redirection of standard I/O, in terms of HANDLEs or what - in particular I don't know whether you can redirect standard I/O to a socket.
How should Haskell view this? I mean that on Unix it should somehow make files and sockets interchangeable, in order to support I/O redirection for programs being run; but it's not easy if you use completely different interfaces for files and sockets, as the streams proposal seems to do.
-- __("< Marcin Kowalczyk \__/ qrczak@knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/

Marcin 'Qrczak' Kowalczyk wrote:
Ben Rudiak-Gould writes:
fileRead can be implemented in terms of OS primitives,
Only if they already support reading from a fixed offset (like pread). I'm not sure if we can rely on something like this being always available, or whether it should be emulated using lseek which is safe only as long as we are the only process using the given open file.
First of all, I don't think any OS shares file pointers between processes. Otherwise it would be practically impossible to safely use an inherited filehandle via any API. Different threads using the same filehandle do share a file pointer (which is a major nuisance in my experience, because of the lack of an atomic seek-read/write), but a Posix fork duplicates the file pointer along with all other state. I can't believe I'm wrong about this, but someone please correct me if I am. This limits the problem to a single process. If you're only using GHC's lightweight threads, there's no problem at all. If you're using OS threads, the worst thing that could happen is that you might have to protect handle access with a critical section. I don't think this would lead to a noticeable performance hit when combined with the other overhead of file read/write operations (or lazy evaluation for that matter).
pread requires that the file is seekable, which means that it can't be used for all file handles: not for pipes, sockets, terminals nor various other devices.
The file interface in this library is only used for files, which are always seekable (by definition). If you want to treat a file as a stream, you create an InputStream or OutputStream backed by the file. Such streams maintain internal (per-stream) file pointers.
Not if it must cooperate with other processes, and you *do* want to set a file position before running another program with redirected standard I/O. In this case it's not enough that you set a private Haskell variable holding its logical file position - you must perform the lseek syscall.
If you're using Posix fork/exec, you can use Posix lseek without losing portability. If you're using a higher-level Haskell library to spawn the program, it will be Stream-aware (if it supports redirection at all) and will know how to set the system file pointer when necessary.
Doing something differently than everybody else has a risk of limited interoperability, even if the new way is "better", and thus must be carefully evaluated to check whether all lost functionality is unimportant enough to lose.
Very true. (But hardly a new problem for Haskell.) -- Ben

Ben Rudiak-Gould writes:
First of all, I don't think any OS shares file pointers between processes.
Unix does. It's because shared files are usually stdin/stdout/stderr (I mean that they are visible as stdin/stdout/stderr, rather than about their nature as terminals - they may be regular files), which are usually accessed sequentially without seeking. Quite often they are not seekable at all (terminals or pipes), which means that they behave as if the file pointer was always positioned at the end (for writing) or beginning (for reading) and shared. If they are seekable, the position is shared so you can redirect I/O to a process running subprograms.
Otherwise it would be practically impossible to safely use an inherited filehandle via any API.
Pipes are not seekable and always behave as if the position is shared. It doesn't make them impossible to safely inherit. They are inherited on fork because they are anonymous objects, so it's the only way to connect them; after fork most programs close the reading end in one of the processes and the writing end in the other. It's rare that two processes read from the same pipe or write to the same pipe. If they do, one of them is usually a helper program started by the other, and the other waits for the helper to finish. If you want two processes to access the same file independently, pass a file name and open it twice.
The file interface in this library is only used for files, which are always seekable (by definition).
What do you mean by files? What you get from open() is not always seekable, because it's not always a regular file. The name may refer to a device (whether it's seekable depends on the particular device; block devices are seekable but most character devices are not, e.g. /dev/ttyS0, /dev/lp0 - serial and parallel ports) or to a named pipe (not seekable). -- __("< Marcin Kowalczyk \__/ qrczak@knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/

Marcin 'Qrczak' Kowalczyk wrote:
Ben Rudiak-Gould writes:
The file interface in this library is only used for files, which are always seekable (by definition).
What do you mean by files? What you get from open() is not always seekable [...]
This was all discussed a year ago, and rather than reiterate it I'll try to expand the wiki page when I have a chance. Maybe all of this new discussion should be on the wiki also. -- Ben

Ben Rudiak-Gould writes:
Marcin 'Qrczak' Kowalczyk wrote:
Ben Rudiak-Gould writes:
fileRead can be implemented in terms of OS primitives,
Only if they already support reading from a fixed offset (like pread). I'm not sure if we can rely on something like this being always available, or whether it should be emulated using lseek which is safe only as long as we are the only process using the given open file.
First of all, I don't think any OS shares file pointers between processes. Otherwise it would be practically impossible to safely use an inherited filehandle via any API. Different threads using the same filehandle do share a file pointer (which is a major nuisance in my experience, because of the lack of an atomic seek-read/write), but a Posix fork duplicates the file pointer along with all other state. I can't believe I'm wrong about this, but someone please correct me if I am.
This may be what you wrote, but let me still put it: dup()-ed filehandles share a common file position. Handles straight from open() have independent file positions. fork() duplicates filehandles and the child inherits those => the child process shares the file position with its parent. Whether it's threads or processes doesn't make the difference; whether it's dup() or open() does. This is my interpretation of the docs, I didn't test it... :) -- Feri.
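The dup() versus open() half of this can be checked with a short program. This is a sketch assuming a POSIX system and GHC's unix package (System.Posix.IO); the helper names openFdRO, tell and offsetsAfterSeek are hypothetical. It does not exercise fork().

```haskell
import System.IO (IOMode (ReadMode), SeekMode (AbsoluteSeek, RelativeSeek), openFile)
import System.Posix.IO (closeFd, dup, fdSeek, handleToFd)
import System.Posix.Types (Fd, FileOffset)

-- Hypothetical helper: open a file and hand back the raw descriptor.
openFdRO :: FilePath -> IO Fd
openFdRO path = openFile path ReadMode >>= handleToFd

-- Current offset of a descriptor, i.e. lseek(fd, 0, SEEK_CUR).
tell :: Fd -> IO FileOffset
tell fd = fdSeek fd RelativeSeek 0

-- Seek one descriptor to offset 5, then report the offset seen through
-- a dup()-ed descriptor and through a separately opened one.
offsetsAfterSeek :: FilePath -> IO (FileOffset, FileOffset)
offsetsAfterSeek path = do
  fd <- openFdRO path
  dupFd <- dup fd
  other <- openFdRO path
  _ <- fdSeek fd AbsoluteSeek 5
  viaDup <- tell dupFd
  viaOpen <- tell other
  mapM_ closeFd [fd, dupFd, other]
  return (viaDup, viaOpen)

main :: IO ()
main = do
  writeFile "dup-demo.tmp" "0123456789"
  offsetsAfterSeek "dup-demo.tmp" >>= print -- prints (5,0)
```

The duplicate follows the seek because both descriptors refer to one open file description; the second open() created its own.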

Ferenc Wagner wrote:
dup()-ed filehandles share a common file position.
They also share the file status flags (O_NONBLOCK, O_APPEND etc). So,
enabling or disabling non-blocking I/O will affect all descriptors
obtained by duplication (either by dup/dup2 or by fork).
OTOH, each descriptor has its own set of descriptor flags (i.e. the
close-on-exec flag).
A related issue is that device state (e.g. terminal settings) is a
property of the device itself, and so is shared amongst all
descriptors which refer to the device regardless of whether they were
created by dup/dup2 or a separate open() call.
For this reason, hSetBuffering shouldn't be modifying the ICANON flag,
IMHO.
--
Glynn Clements
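The split between file status flags and descriptor flags can be observed the same way. A sketch assuming a POSIX system and GHC's unix package; flagsSeenViaDup is a hypothetical helper name. It sets O_NONBLOCK and FD_CLOEXEC on one descriptor and queries both through a duplicate.

```haskell
import System.IO (IOMode (ReadMode), openFile)
import System.Posix.IO
  ( FdOption (CloseOnExec, NonBlockingRead),
    closeFd,
    dup,
    handleToFd,
    queryFdOption,
    setFdOption,
  )

-- Returns (non-blocking seen via dup, close-on-exec seen via dup).
flagsSeenViaDup :: FilePath -> IO (Bool, Bool)
flagsSeenViaDup path = do
  fd <- openFile path ReadMode >>= handleToFd
  setFdOption fd NonBlockingRead False -- start from a known state
  dupFd <- dup fd -- dup() clears FD_CLOEXEC on the new descriptor
  setFdOption fd NonBlockingRead True -- file status flag (O_NONBLOCK)
  setFdOption fd CloseOnExec True -- descriptor flag (FD_CLOEXEC)
  nonBlocking <- queryFdOption dupFd NonBlockingRead
  closeOnExec <- queryFdOption dupFd CloseOnExec
  mapM_ closeFd [fd, dupFd]
  return (nonBlocking, closeOnExec)

main :: IO ()
main = do
  writeFile "flags-demo.tmp" "x"
  flagsSeenViaDup "flags-demo.tmp" >>= print -- prints (True,False)
```

The status flag shows up on the duplicate while the descriptor flag does not, matching the description above.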

First of all, I don't think any OS shares file pointers between processes. Otherwise it would be practically impossible to safely use an inherited filehandle via any API. Different threads using the same filehandle do share a file pointer (which is a major nuisance in my experience, because of the lack of an atomic seek-read/write), but a Posix fork duplicates the file pointer along with all other state. I can't believe I'm wrong about this, but someone please correct me if I am.
I'm afraid you _are_ wrong :-(... POSIX quite clearly states that the file
descriptors are duplicated by fork(), but the "open file descriptions"
(your "file pointers") are shared. This is obvious by observing what
happens when you write to the same fd in parent and child after a
fork: you don't end up overwriting, you end up interleaving.
--KW 8-)
--
Keith Wansbrough

"Simon Marlow" writes:
For unix, there are couple different tacks one could take. The locale system is standard, and does work, but is ugly and a pain to work with. In particular, it's another (set of) global variables. And what do you do with a character not expressible in the current locale? I'd like to possibility of different character sets for different files, for example.
Not a problem. Have you looked at the streams proposal?
I don't suppose this will make (stream)getContents any more efficient, beyond reducing the data size from Char to Word8? (So I still need to use explicit buffering (as described by Peter Simons, IIRC) to get fast IO?) And one small comment: is it still considered good form to prefix functions with argument type (instead of using modules - i.e. streamGetContents and fileGetContents as opposed to Stream.getContents and File.getContents). -kzm -- If I haven't seen further, it is by standing in the footprints of giants

On Mon, 10 Jan 2005 17:12:44 -0000, Simon Marlow wrote:
Not a problem. Have you looked at the streams proposal?
I've missed most of the discussion on this, so if someone could just clarify the reasons for a few things I find peculiar:
* Prefixing function names with their types, which is not necessary with qualified imports.
* Why not put isEOS in the InputStream class, since it doesn't make sense for OutputStreams?
* Why not introduce an additional class BufferedStream containing the buffering functions (instead of having them in Stream and resorting to weird failure patterns for non-buffered streams):
    setBufferMode :: s -> BufferMode -> IO ()
    getBufferMode :: s -> IO BufferMode
    flush :: s -> IO ()
    sync :: s -> IO Bool
  Where setBufferMode is not allowed to "fail" (by returning False). If for some reason it does fail, an exception is thrown. Same thing with getBufferMode: it will return a buffer mode or throw an exception (not return "NoBuffering" instead of a failure).
/S
-- Sebastian Sylvan +46(0)736-818655 UIN: 44640862
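The suggested class split could look roughly like this. It is a sketch only: the four BufferedStream signatures are the ones given above, while the contents of Stream (closeStream), the local BufferMode type and the toy MemStream instance are assumptions made up for illustration.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Local stand-in for the proposal's BufferMode (an assumption).
data BufferMode = NoBuffering | LineBuffering | BlockBuffering (Maybe Int)
  deriving (Eq, Show)

-- Operations every stream supports; closeStream is a placeholder.
class Stream s where
  closeStream :: s -> IO ()

-- Buffering lives in its own class, so unbuffered streams simply have
-- no instance instead of failing at run time.
class Stream s => BufferedStream s where
  setBufferMode :: s -> BufferMode -> IO ()
  getBufferMode :: s -> IO BufferMode
  flush :: s -> IO ()
  sync :: s -> IO Bool

-- Toy in-memory stream: a buffer mode and a count of pending bytes.
data MemStream = MemStream (IORef BufferMode) (IORef Int)

newMemStream :: IO MemStream
newMemStream = MemStream <$> newIORef NoBuffering <*> newIORef 0

instance Stream MemStream where
  closeStream _ = return ()

instance BufferedStream MemStream where
  setBufferMode (MemStream mode _) = writeIORef mode
  getBufferMode (MemStream mode _) = readIORef mode
  flush (MemStream _ pending) = writeIORef pending 0
  sync s = flush s >> return True

main :: IO ()
main = do
  s <- newMemStream
  setBufferMode s (BlockBuffering (Just 4096))
  getBufferMode s >>= print -- prints BlockBuffering (Just 4096)
  closeStream s
```

With this shape, calling setBufferMode on a stream type that has no BufferedStream instance is a compile-time error rather than a False return value.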

On 2005-01-10, Simon Marlow wrote:
- Can you do String I/O in some encoding of Unicode? No Haskell compiler has support for this yet, and there are design decisions to be made. Some progress has been made on an experimental prototype (see recent discussion on this list).
Many of the easy ways to do this that I've heard proposed make the current hacks for binary IO fail.
Making hacks fail isn't necessarily a bad thing :-)
True, but making the tasks the hacks enable impossible (sans FFI) is. And doing that myself involves rewrapping stdio and networking.
I'm not keen to provide binary IO on top of the existing IO library, and then to have Unicode as a layer on top of that. Performance will be terrible. It needs to be designed properly from the ground up.
Agreed. I was thinking of binary IO as the base in the end, with both the current standard and some Unicode layer on top of that, though even that might be unwieldy.
Not a problem. Have you looked at the streams proposal?
I took a brief look a while back. I don't have much time to hack on anything these days, and nothing stood out as being obviously horrible. I'll take another look. -- Aaron Denney -><-
participants (10): Aaron Denney, Andre Pang, Ben Rudiak-Gould, Ferenc Wagner, Glynn Clements, Keith Wansbrough, Ketil Malde, Marcin 'Qrczak' Kowalczyk, Sebastian Sylvan, Simon Marlow