Adding new constructors to IOMode to support "don't overwrite if already exists" behavior

The current behavior for System.IO.openFile will either truncate an existing file (with WriteMode) or append new content to the end (with AppendMode). However, there's another behavior that's available from the underlying API: signal an error when the file already exists. I poked around at the code and i think that with a couple extra variants of IOMode, this behavior could be added with a relatively small patch.
I have a branch pushed where i added this behavior: https://gitlab.haskell.org/QuietMisdreavus/ghc/-/commit/1ff18d4d3fc63f42c371...
I'm a relative newcomer to contributing to GHC or the base library, so if i'm missing something in my patch please let me know. Specifically, i would like to know if there's a place i can add a test for this behavior. I'm also open to changing the names of the new IOModes - this was just something i wrote in to get something working.
Looking forward to working with maintainers to get this added!
Thanks, Grey Mitchell (@QuietMisdreavus)
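To make the two existing behaviors concrete, here is a minimal sketch that uses only System.IO from base. The helper name `rewriteWith` is hypothetical and not part of the patch; the path is any scratch file the caller picks.

```haskell
import System.IO

-- | Seed the file with "old\n", reopen it in the given mode, write "new\n",
-- and return what the file contains afterwards.
-- 'rewriteWith' is a hypothetical helper for illustration only.
rewriteWith :: IOMode -> FilePath -> IO String
rewriteWith mode path = do
  writeFile path "old\n"                        -- the file now exists with content
  withFile path mode $ \h -> hPutStr h "new\n"  -- reopen in the mode under test
  withFile path ReadMode $ \h -> do
    contents <- hGetContents h
    -- hGetContents is lazy: force the whole string before the handle closes.
    length contents `seq` return contents
```

With WriteMode the old content is discarded, with AppendMode it is kept and extended. Neither call fails because the file already exists, which is the gap the proposal is about.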

At least naively, this seems like a really good idea.
1) Do folks who have more experience across the range of supported platforms have any opinions about these additional semantics?
2) Are these names suitably descriptive / unambiguous and/or otherwise widely used / discoverable?
On Mon, Mar 30, 2020 at 12:22 PM Grey Mitchell wrote:
> [...]
_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

One really important question is what modes are actually promised by actual file systems on every POSIX or otherwise supported platform. Table stakes is that code should work well on every tier 1 platform, and tier 2 as well.
And it’s been pointed out to me that this is part of the Haskell report, so whatever final design is done needs to be excellent.
Meta aside: on and off I sometimes wonder what the world would be like if file systems weren’t stuck in POSIX minima tar pits (heck, an MVCC file system with transactions would rock). But that’s unrelated to improving the world we have. So either way, this at least starts a discussion about what we can do better today!
Pardon the slow reply, I think everyone everywhere is having a very strange start to spring.
Be well and look forward to improving bits and pieces of how we deal with files on computers ;)
-Carter
On Tue, Mar 31, 2020 at 5:51 PM Carter Schonwald wrote:
> [...]

Re: Platform support: I have access to Windows, Linux, and Mac machines, so i can run a quick test if need be. (My current development environment is on a Mac, so i'd need to set up a fresh environment on my Windows and Linux machines.) My assumption is that the EXCL flag being enabled by these modes is part of the basic support for the POSIX open() call - even the MSVC C standard library includes it as part of its emulated open(). It would of course be important to verify this by running the code.
Re: Names: I want to reiterate that the names in my branch were sketches. I did a quick survey of a handful of programming languages just now to see if there was any existing consensus. Rust and .NET call the behavior "create new" in their enumerations. The docs for Python and Ruby call the flag "exclusive create" or "exclusive access", reflecting the name of the underlying C flag. This seems to tell me that my initial suggestion of "no overwrite" doesn't really match up; i'd like to offer another suggestion of WriteCreateNewMode and ReadWriteCreateNewMode.
No worries about the slow reply! I've been there, so it's totally understandable.
-grey
On Tue, Mar 31, 2020, at 4:15 PM, Carter Schonwald wrote:
> [...]
Thanks, Grey
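As a rough illustration of the renamed modes, the sketch below models them as a standalone type and lists the POSIX flag names each one would plausibly translate to. The type `ProposedMode` and both functions are hypothetical; the mapping follows what the thread says about the existing modes (O_CREAT for every writing mode, O_APPEND for AppendMode, O_EXCL for the new modes) and is not taken from the actual patch.

```haskell
-- Hypothetical model of IOMode with the two suggested constructors added.
-- Only the last two constructors are new; the rest mirror System.IO.IOMode.
data ProposedMode
  = ReadMode
  | WriteMode
  | AppendMode
  | ReadWriteMode
  | WriteCreateNewMode
  | ReadWriteCreateNewMode
  deriving (Show, Eq, Enum, Bounded)

-- | POSIX open() flag names each mode is assumed to translate to.
-- Truncation for WriteMode is not listed because the thread describes it
-- as a separate step rather than an open() flag.
posixFlags :: ProposedMode -> [String]
posixFlags ReadMode               = ["O_RDONLY"]
posixFlags WriteMode              = ["O_WRONLY", "O_CREAT"]
posixFlags AppendMode             = ["O_WRONLY", "O_CREAT", "O_APPEND"]
posixFlags ReadWriteMode          = ["O_RDWR", "O_CREAT"]
posixFlags WriteCreateNewMode     = ["O_WRONLY", "O_CREAT", "O_EXCL"]
posixFlags ReadWriteCreateNewMode = ["O_RDWR", "O_CREAT", "O_EXCL"]

-- | True when opening in this mode should fail if the file already exists.
failsIfExists :: ProposedMode -> Bool
failsIfExists mode = "O_EXCL" `elem` posixFlags mode
```

Under these assumptions, only the two "create new" modes report `failsIfExists`, which matches the "create new" naming used by Rust and .NET.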

Let's have a look at the POSIX spec for open(): https://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html It very clearly distinguishes between *2* disjoint sets of flags:
* You have to use exactly one of O_EXEC, O_RDONLY, O_RDWR, O_SEARCH, and O_WRONLY.
* In addition, you can specify any combination of O_APPEND, O_CLOEXEC, O_CREAT, O_DIRECTORY, O_DSYNC, O_EXCL, O_NOCTTY, O_NOFOLLOW, O_NONBLOCK, O_RSYNC, O_SYNC, O_TRUNC, and O_TTY_INIT.
Alas, GHC.IO.FD and GHC.IO.IOMode completely ignore this distinction already and add tons of special cases:
* They add an ad hoc combination of O_WRONLY and O_APPEND (AppendMode).
* They leave out O_EXEC and O_SEARCH.
* They add an ad hoc boolean flag for non-blocking I/O to openFile, i.e. pick 1 of the 13 possible additional flags.
* For some obscure reason, O_NOCTTY is always added.
* HandleType doesn't reflect the first set of flags.
My proposal is to not make this mess even worse by adding yet another ad hoc combination, but to think about how to expose these two sets in a sane way.
I didn't have a look at the Windows counterpart or into all the *nix variants yet, but I guess that the overlap in the API is quite big. But even the intersection of all the relevant platforms will very probably have the distinction between the 2 sets of flags, so we should reflect that in the Haskell API, too.
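One way to picture the two disjoint sets is to give each its own type, so that "exactly one access mode" is enforced by construction and "any combination of the rest" is a plain collection. This is only a sketch of the idea, not a proposed API: every name below (`AccessMode`, `OpenFlag`, `OpenSpec`, `normalise`, `appendLike`, `createNew`) is hypothetical.

```haskell
import Data.List (nub, sort)

-- First set: exactly one of these must be chosen.
data AccessMode = ReadOnly | WriteOnly | ReadWrite | Exec | Search
  deriving (Show, Eq)

-- Second set: any combination may be added (the 13 flags listed by POSIX).
data OpenFlag
  = Append | CloExec | Creat | Directory | DSync | Excl | NoCTTY
  | NoFollow | NonBlock | RSync | Sync | Trunc | TtyInit
  deriving (Show, Eq, Ord, Enum, Bounded)

-- A request carries one access mode plus a collection of extra flags.
data OpenSpec = OpenSpec
  { access :: AccessMode
  , extras :: [OpenFlag]
  } deriving (Show, Eq)

-- | Sort the extra flags and drop duplicates, so that specs built in a
-- different order compare equal.
normalise :: OpenSpec -> OpenSpec
normalise spec = spec { extras = sort (nub (extras spec)) }

-- | What AppendMode amounts to in this model: no longer a special case.
appendLike :: OpenSpec
appendLike = OpenSpec WriteOnly [Append]

-- | The "fail if the file exists" request from the start of the thread.
createNew :: OpenSpec
createNew = normalise (OpenSpec WriteOnly [Excl, Creat, Excl])
```

In this model the O_EXCL case is just another value of `OpenSpec` rather than a new constructor, which is the point of keeping the two sets apart.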

Hi all!
I took another look at this over the past weekend and came up with a different solution that might fit Sven's idea of a more general mechanism to expose the `open()` flags.
I have a new branch open on GitLab: https://gitlab.haskell.org/QuietMisdreavus/ghc/-/commit/ed8f142f9e18f8ef7389b8e8e4d722dcaf740971?view=parallel&w=1 I'm curious at what point i should turn this branch into an MR.
The rough idea is that i introduced a new `IOFlags` type, which wraps the idea of `open()` flags, and added some new "openFileWithFlags" functions that take this new type in addition to the existing `IOMode`. These flags are then combined with the base ones from `IOMode` to create the final set of flags given to `open()`. I decided to keep all the existing "openFile" (et al) behavior w.r.t. always creating the file, clearing the file if `WriteMode`/`ReadWriteMode` were given instead of `AppendMode`, etc.
In the initial commit, i only added EXCL since all the other existing flags are exposed by either existing IOMode behavior or in other functions, but it could be possible to add new flags.
-----
Speaking of additional flags to support, i did some additional research into the different sets of flags exposed on different platforms. I'm choosing to ignore the MSVC version of `open()` for the moment, since MSYS is used on Windows currently, and they expose the Linux version of the syscall.
For example, here's an excerpt from the man page for `open(2)`, on my macOS system, talking about `oflags`:
```
The flags specified for the oflag argument must include exactly one of
the following file access modes:

      O_RDONLY        open for reading only
      O_WRONLY        open for writing only
      O_RDWR          open for reading and writing

In addition any combination of the following values can be or'ed in oflag:

      O_NONBLOCK      do not block on open or for data to become available
      O_APPEND        append on each write
      O_CREAT         create file if it does not exist
      O_TRUNC         truncate size to 0
      O_EXCL          error if O_CREAT and the file exists
      O_SHLOCK        atomically obtain a shared lock
      O_EXLOCK        atomically obtain an exclusive lock
      O_NOFOLLOW      do not follow symlinks
      O_SYMLINK       allow open of symlinks
      O_EVTONLY       descriptor requested for event notifications only
      O_CLOEXEC       mark as close-on-exec
```
(I'm not sure if there are up-to-date man pages for macOS online any more? The only pages i could find on the Apple Developer site were out-of-date pages labeled for iOS, most recently updated in 2016.)
Just from this list, it's worthwhile to winnow out the flags that are already exposed or made irrelevant by existing behavior. For example, O_CREAT is always passed for IOModes that open a file for writing, and O_APPEND is already exposed in AppendMode. O_TRUNC is done manually in GHC.IO.FD.openFile' when the IOMode of WriteMode is passed in. O_SHLOCK and O_EXLOCK are also done inside the runtime in C code in `rts/fileLock.c`. (Would it be worthwhile to expose these flags anyway to allow for system-level locking?) It looks like O_SYMLINK is macOS-exclusive, since it doesn't appear in the pages for Linux or FreeBSD (linked below). O_CLOEXEC sounds like behavior that's otherwise not exposed, but is it valid to `exec()` here? I'm admittedly not super familiar with all the internals that would need to be accounted for in a fork/exec combo.
To my eyes, that leaves O_EXCL (which i've exposed in my commit), O_NONBLOCK (which is currently exposed in separate functions which i did not wrap in my commit), O_NOFOLLOW, and maybe O_SHLOCK/O_EXLOCK. This also doesn't mention Linux/BSD-exclusive flags that are not exposed on macOS.
For reference, here is the Linux page for `open(3p)`: http://man7.org/linux/man-pages/man3/open.3p.html and the FreeBSD page for `open(2)`: https://www.freebsd.org/cgi/man.cgi?query=open&sektion=2&manpath=FreeBSD+12.1-RELEASE+and+Ports
These mention even more flags, including the additional modes of O_EXEC and (on Linux) O_SEARCH. An interesting flag not mentioned in the macOS page is O_SYNC (and friends, O_FSYNC/O_DIRECT on BSD and O_DSYNC/O_RSYNC on Linux), which provide extra filesystem synchronization outside of the runtime. They both also mention O_DIRECTORY, which seems like a good guard to expose similarly to O_EXCL or O_NOFOLLOW.
If necessary, we could expose the IOFlags constructor and allow libraries like `unix` to expose system-specific flags themselves.
-----
Regardless, i'm curious about everyone's impression of this approach. This should allow a more extensible method to expose these additional flags, which should sidestep the previous concerns about naming the new IOModes or creating "new ad-hoc combinations of flags".
Thanks, Victoria Mitchell (formerly Grey Mitchell) (@QuietMisdreavus) https://quietmisdreavus.net/
On Wed, Apr 1, 2020, at 12:45 AM, Sven Panne wrote:
> [...]
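The `IOFlags` idea from the message above can be sketched roughly as follows. This is not the code from the branch: the representation (a newtype over a bitmask with a Monoid instance), the names `exclusive`, `baseFlags`, `finalFlags`, and `hasExcl`, and the numeric constants (the values used by Linux x86-64 headers, hard-coded here only for illustration) are all assumptions. The per-mode base flags follow the thread's description: O_CREAT for every writing mode, O_APPEND for AppendMode, O_NOCTTY always.

```haskell
import Data.Bits ((.&.), (.|.))
import Foreign.C.Types (CInt)
import System.IO (IOMode (..))

-- Hypothetical stand-in for the IOFlags type described in the thread.
newtype IOFlags = IOFlags CInt
  deriving (Eq, Show)

-- Combining two sets of extra flags is a bitwise OR.
instance Semigroup IOFlags where
  IOFlags a <> IOFlags b = IOFlags (a .|. b)

instance Monoid IOFlags where
  mempty = IOFlags 0

-- Illustrative constants (Linux x86-64 values). Real code would take these
-- from the platform headers instead of hard-coding them.
oRDONLY, oWRONLY, oRDWR, oCREAT, oEXCL, oNOCTTY, oAPPEND :: CInt
oRDONLY = 0
oWRONLY = 1
oRDWR   = 2
oCREAT  = 0o100
oEXCL   = 0o200
oNOCTTY = 0o400
oAPPEND = 0o2000

-- | The one extra flag the first commit is said to expose.
exclusive :: IOFlags
exclusive = IOFlags oEXCL

-- | Flags implied by the existing IOMode, as described in the thread.
baseFlags :: IOMode -> CInt
baseFlags ReadMode      = oRDONLY .|. oNOCTTY
baseFlags WriteMode     = oWRONLY .|. oCREAT .|. oNOCTTY
baseFlags AppendMode    = oWRONLY .|. oCREAT .|. oAPPEND .|. oNOCTTY
baseFlags ReadWriteMode = oRDWR .|. oCREAT .|. oNOCTTY

-- | Final flag word that would be handed to open().
finalFlags :: IOMode -> IOFlags -> CInt
finalFlags mode (IOFlags extra) = baseFlags mode .|. extra

-- | Does a flag word request exclusive creation?
hasExcl :: CInt -> Bool
hasExcl flags = flags .&. oEXCL /= 0
```

Passing `mempty` reproduces the existing behavior of a mode, while `exclusive` adds O_EXCL on top of it. A platform library could define further `IOFlags` values without touching `IOMode`.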

O_NOFOLLOW|O_PATH is more or less the same as O_SYMLINK: it gives you
a handle on a symlink. But there are semantic differences including in
how you make use of it. O_NOFOLLOW by itself is useful for secure
applications; O_PATH more or less mandates system-specific code that
we tend to either not support or fold into POSIX or Win32 support.
The file locking flags are a minefield: they don't mean the same thing
on Windows vs. POSIX, and may have semantic differences between POSIX
platforms. It's better to use the low level calls as described above
if you need locking, since you can then write the appropriate code for
the platform and your use case.
O_DIRECTORY by itself isn't problematic, but only does something
useful on FreeBSD (you can read a directory's raw contents using the
dirent code). Other calls are system dependent as above, and force you
into POSIX if supported at all.
In short, there's good reason for most of the open flags to not be
supported in the OS-independent openFile. O_EXCL and maybe O_NOFOLLOW
are the most likely to be portably useful.
On 5/11/20, Victoria Mitchell wrote:
> [...]
-- brandon s allbery kf8nh allbery.b@gmail.com

> O_NOFOLLOW|O_PATH is more or less the same as O_SYMLINK: it gives you a handle on a symlink. But there are semantic differences including in how you make use of it. O_NOFOLLOW by itself is useful for secure applications; O_PATH more or less mandates system-specific code that we tend to either not support or fold into POSIX or Win32 support.
O_NOFOLLOW is distinct from O_SYMLINK in that O_NOFOLLOW will error if the given path includes a symlink. That’s why i think it’s useful to expose, just as a new kind of failure mode that applications may want to account for.
> The file locking flags are a minefield: they don't mean the same thing on Windows vs. POSIX, and may have semantic differences between POSIX platforms. It's better to use the low level calls as described above if you need locking, since you can then write the appropriate code for the platform and your use case.
This makes sense; i wasn’t familiar with the file locking APIs, so it’s important to know that they vary between platforms. This is also why i suggested allowing platform-support libraries to expose their specific set of IOFlags.
> I should also add that I have no idea whether O_NOFOLLOW exists on Windows, or how it would be implemented on an old-style "symlink" / NTFS reparse point. This may again force it into system dependent code, especially in the latter case.
The MSVC POSIX compatibility implementation of `open()` does not define the O_NOFOLLOW flag, so if you're looking for whether Windows supports it, that's probably a good indicator: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/open-wopen?...
But i was under the impression that MSYS was used on Windows, which defines a Linux-equivalent version of `open()`. I'm not sure how MSYS deals with O_NOFOLLOW, though.
Thanks, Victoria Mitchell

On Mon, May 11, 2020, at 11:25 AM, Victoria Mitchell wrote:
> The MSVC POSIX compatibility implementation of `open()` does not define the O_NOFOLLOW flag, so if you're looking for whether Windows supports it, that's probably a good indicator: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/open-wopen?...
> But i was under the impression that MSYS was used on Windows, which defines a Linux-equivalent version of `open()`. I'm not sure how MSYS deals with O_NOFOLLOW, though.
So i pulled out one of my Windows machines and set up a GHC build environment on it, to check what was available. I wasn't sure about the exact setup that was going on, and i wanted to make sure before i started adding flags. Turns out, the MinGW C compiler that GHC uses exposes the same set of flags that the MSVC C compiler does, meaning that there is no O_NOFOLLOW behavior possible on Windows. Unfortunately, this also means that the only really common flag between the major platforms that isn't already being used is O_EXCL.
Thanks, Victoria Mitchell
participants (5)
- Brandon Allbery
- Carter Schonwald
- Grey Mitchell
- Sven Panne
- Victoria Mitchell