
I'm trying to write to a temp file in /tmp and then move it to another location that may be in /tmp. If I use System.Directory.renameFile and the final location is in another filesystem, I'll get an error "renameFile:renamePath:rename: unsupported operation (Invalid cross-device link)". If I copy to the new location and then remove the temp file, but the new location is also in /tmp, I am doing an unnecessary copy (instead of mv), which is inefficient. I don't know how to detect whether two filepaths are in the same filesystem. What's the best way to move the file if possible but copy-and-delete if necessary? Josh

The source code to mv.c might be useful: https://github.com/coreutils/coreutils/blob/master/src/mv.c
On 4/15/21 12:32 AM, ☂Josh Chia (謝任中) wrote:
I'm trying to write to a temp file in /tmp and then move it to another location that may be in /tmp.
If I use System.Directory.renameFile and the final location is in another filesystem, I'll get an error "renameFile:renamePath:rename: unsupported operation (Invalid cross-device link)".
If I copy to the new location and then remove the temp file, but the new location is also in /tmp, I am doing an unnecessary copy (instead of mv), which is inefficient.
I don't know how to detect whether two filepaths are in the same filesystem.
What's the best way to move the file if possible but copy-and-delete if necessary?
Josh
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

I don't know how to detect whether two filepaths are in the same filesystem.
Yes you do!
If I use System.Directory.renameFile and the final location is in another filesystem, I'll get an error "renameFile:renamePath:rename: unsupported operation (Invalid cross-device link)".
And it follows that if you get no error, the two filepaths are in the same filesystem.
What's the best way to move the file if possible but copy-and-delete if necessary?
Catch that error, and copy-and-delete.
Donn
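A minimal Haskell sketch of this suggestion, assuming GHC: `renameFile` is tried first, and only an EXDEV failure triggers the copy-and-delete fallback. The helper names `isCrossDevice` and `moveFile` are hypothetical. The errno check relies on the `ioe_errno` field that GHC's `IOException` exposes through `GHC.IO.Exception`, on `eXDEV` from `Foreign.C.Error`, and on `renameFile` passing the errno of the failed call through, which as far as I can tell it does.

```haskell
import Control.Exception (catch, throwIO)
import Foreign.C.Error (Errno (..), eXDEV)
import GHC.IO.Exception (IOException (..))
import System.Directory (copyFile, removeFile, renameFile)

-- | True if the failed call reported EXDEV ("Invalid cross-device link").
-- GHC-specific: reads the raw errno stored in the 'ioe_errno' field.
isCrossDevice :: IOException -> Bool
isCrossDevice e = fmap Errno (ioe_errno e) == Just eXDEV

-- | Hypothetical helper: rename when possible (cheap, same filesystem);
-- fall back to copy-and-delete only when the OS says EXDEV.
-- Any other error (missing source, permissions, ...) is re-thrown.
moveFile :: FilePath -> FilePath -> IO ()
moveFile src dst =
  renameFile src dst `catch` \e ->
    if isCrossDevice e
      then copyFile src dst >> removeFile src
      else throwIO e
```

Note that the fallback path is not atomic as a whole: a crash between `copyFile` and `removeFile` leaves both files in place.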

Catch that error, and copy-and-delete.
I'd say this is the typical EAFP (Easier to Ask for Forgiveness than Permission) coding style from the Zen of Python: http://docs.python.org//glossary.html#term-eafp
I'm very curious to know how Haskellers think about it and deal with it. Is there something like a Zen of Haskell?
On 2021-04-15, at 13:08, Donn Cave wrote:
I don't know how to detect whether two filepaths are in the same filesystem.
Yes you do!
If I use System.Directory.renameFile and the final location is in another filesystem, I'll get an error "renameFile:renamePath:rename: unsupported operation (Invalid cross-device link)".
And it follows that if you get no error, the two filepaths are in the same filesystem.
What's the best way to move the file if possible but copy-and-delete if necessary?
Catch that error, and copy-and-delete.
Donn

On Thu, 15 Apr 2021, YueCompl via Haskell-Cafe wrote:
Catch that error, and copy-and-delete.
I suggest this is typical EAFP (Easier to ask for forgiveness than permission) coding style from Zen of Python http://docs.python.org//glossary.html#term-eafp
I'm very curious to know how Haskellers think and do about it?
I think this is independent of the language. With regard to file operations you must use the exception-catching style, because if you check first and then operate, someone might alter the files or directories in the short time in between.
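To illustrate the point in Haskell, here is a sketch contrasting check-then-act with act-then-handle, using file removal as a simpler stand-in for the rename case. `removeIfExistsRacy` and `removeIfExists` are hypothetical names; `doesFileExist` and `removeFile` come from the `directory` package, and `catchIOError` and `isDoesNotExistError` from `System.IO.Error`.

```haskell
import Control.Monad (when)
import System.Directory (doesFileExist, removeFile)
import System.IO.Error (catchIOError, isDoesNotExistError)

-- | Check-then-act: racy. Another process may delete the file between
-- 'doesFileExist' and 'removeFile', and then 'removeFile' throws anyway.
removeIfExistsRacy :: FilePath -> IO ()
removeIfExistsRacy path = do
  exists <- doesFileExist path
  when exists (removeFile path)

-- | Act-then-handle: just try it, and treat "does not exist" as success.
-- Every other failure (e.g. permission denied) is re-thrown.
removeIfExists :: FilePath -> IO ()
removeIfExists path =
  removeFile path `catchIOError` \e ->
    if isDoesNotExistError e then pure () else ioError e
```

The second version has no window between the check and the action, because the check *is* the action's result.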

quoth YueCompl
Catch that error, and copy-and-delete.
I suggest this is typical EAFP (Easier to ask for forgiveness than permission) coding style from Zen of Python http://docs.python.org//glossary.html#term-eafp
I'm very curious to know how Haskellers think and do about it?
Is there something close to Zen of Haskell?
For another example, I quote from my man page for access(2), which can be used to determine whether the caller has the necessary privileges, etc.:

"The result of access() should not be used to make an actual access control decision, since its response, even if correct at the moment it is formed, may be outdated at the time you act on it. access() results should only be used to pre-flight, such as when configuring user interface elements or for optimization purposes. The actual access control decision should be made by attempting to execute the relevant system call while holding the applicable credentials, and properly handling any resulting errors; and this must be done even though access() may have predicted success."

That particular consideration doesn't apply so much to this case, as far as I can see anyway, but as a model for how to deal with situations like this it seems sound to me, as long as the operation can be trusted to either work or fail inexpensively and without side effects. You make the intended functionality the main program path; you handle failures as you see fit.

However, I have to say, it appears to be less of an option in Haskell than in most programming languages I'm familiar with.

    main = catchIOError (rename oldFile newFile) (\ e -> print (ioeGetErrorType e))

... all I get out of that is "unsupported operation". I see no way to get to the POSIX error value, EXDEV, which isn't one of the few common errors that has a documented Haskell test. In OCaml, for example, the well-known errors are all enumerated (including EXDEV), but if you need to handle one that isn't, there's EUNKNOWNERR with a value. A casual look at the documentation suggests that while I can catch this error, Haskell doesn't give me the means to identify it.

To make the most of my appearance here on haskell-cafe ... I find the notion of a "Zen" of this or that language kind of awkward.
I suppose it's one of those usages that has taken on its own diluted popular meaning with few traces of the original (goes back to "Zen in the Art of Archery"? I don't know.) But as little as the term may intend to refer to the actual religious practice, that religious practice is a real thing with many adherents, a complicated doctrine with various schools, a lot of priests, and one where I find it very hard to recognize the application to computer programming.
Donn
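Regarding the EXDEV question: as far as I know, GHC does keep the raw errno, just not through the portable `System.IO.Error` API. `IOException`, exported with its fields from `GHC.IO.Exception`, has an `ioe_errno :: Maybe CInt` field, and `Foreign.C.Error` provides `eXDEV` plus `errnoToIOError`, which builds an `IOError` from an errno the way the library wrappers do. The sketch below (helper names `errnoOf`, `sampleExdev` and `isCrossDeviceError` are hypothetical) constructs such an error without needing a second filesystem, to show that its error type is the coarse `UnsupportedOperation` while the errno still identifies it as EXDEV.

```haskell
import Foreign.C.Error (Errno (..), eXDEV, errnoToIOError)
import GHC.IO.Exception (IOErrorType (..), IOException (..))
import System.IO.Error (ioeGetErrorType)

-- | The raw errno recorded in an IOError, if any (GHC-specific field).
errnoOf :: IOError -> Maybe Errno
errnoOf = fmap Errno . ioe_errno

-- | An EXDEV failure built from errno the same way the library wrappers
-- build it, so this demo does not need two filesystems.
sampleExdev :: IOError
sampleExdev = errnoToIOError "rename" eXDEV Nothing Nothing

-- | A finer-grained test than comparing 'ioeGetErrorType' with
-- 'UnsupportedOperation', which other errno values also map to.
isCrossDeviceError :: IOError -> Bool
isCrossDeviceError e = errnoOf e == Just eXDEV
```

Because this reaches into `GHC.IO.Exception`, it is tied to GHC rather than to the Haskell report.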

Hi Josh, the fastest and safest way is to create the temporary file in the directory of the final location and then move the file to its final name. This way you ensure that the temporary file is always on the same device or partition as the final file and therefore a move is always possible. And it's the safest, because a move is an atomic operation - which copy isn't - and therefore other processes will never see a partially updated file. Greetings, Daniel
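A sketch of this approach using only `base`, `directory` and `filepath`: `openTempFile` creates a uniquely named file in the destination's own directory, so the final `renameFile` never crosses a filesystem boundary. `writeFileAtomic` is a hypothetical name. Caveats: `openTempFile` creates the file readable and writable only by the current user, and that mode carries over to the result; nothing here calls fsync, so this gives atomic replacement but not crash durability; and the atomicity argument is a POSIX one (see the Windows discussion further down the thread).

```haskell
import Control.Exception (bracketOnError)
import System.Directory (removeFile, renameFile)
import System.FilePath (takeDirectory, takeFileName)
import System.IO (hClose, hPutStr, openTempFile)
import System.IO.Error (catchIOError)

writeFileAtomic :: FilePath -> String -> IO ()
writeFileAtomic dst contents =
  bracketOnError
    -- Acquire: a uniquely named temp file in the destination's directory.
    (openTempFile (takeDirectory dst) (takeFileName dst ++ ".tmp"))
    -- On failure: close the handle (a no-op if already closed) and delete
    -- the temp file, ignoring errors from the cleanup itself.
    (\(tmp, h) -> do
        hClose h
        removeFile tmp `catchIOError` \_ -> pure ())
    -- Main path: write, close, then rename over dst on the same filesystem.
    (\(tmp, h) -> do
        hPutStr h contents
        hClose h
        renameFile tmp dst)
```

If writing or renaming throws, `bracketOnError` removes the temporary file, so nothing is left behind under a transient name.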

I'm not sure you need this, but on a shared filesystem (NFS, e.g.), when implementing generate-on-demand data production with parallelism, a trick is to create a temporary directory named exclusively for your particular run of the critical section of the code. Create it with a generated name in the location where your final file will go (to ensure the same filesystem), decorated (with the value of a local counter, e.g.) so that it does not look like the ultimately desired path/name, and expect the OS to fail the creation if that name is taken. If it fails, try another name; otherwise you have succeeded and can be sure this directory is exclusive to your current thread of execution. Then generate the data (which may take time), put the payload into a file inside this directory, fsync to make sure it is durably stored, and rename it to the ultimate path/name you want. Another parallel process may have done the same thing and race to overwrite your result; that is fine as long as the data-generation algorithm holds some invariant. Either way, you will never have a corrupted or incomplete file at the expected path/name. I'd say this is not a Haskell thing at all, just OS tricks.
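A rough Haskell rendering of the exclusive-directory trick, assuming that `createDirectory` fails atomically when the name already exists (as `mkdir(2)` does on local POSIX filesystems). The "already exists" failure, detected with `isAlreadyExistsError`, plays the role of "expect failure from the OS". The helper names, the `.work` suffix and the counter scheme are hypothetical, and the fsync step is omitted because `base` offers no portable way to call it.

```haskell
import System.Directory (createDirectory, removeDirectory, renameFile)
import System.FilePath (takeDirectory, takeFileName, (</>))
import System.IO.Error (catchIOError, isAlreadyExistsError)

-- | Claim a scratch directory next to @dst@ (hence on the same filesystem).
-- If another process already holds a name, try the next counter value.
claimScratchDir :: FilePath -> IO FilePath
claimScratchDir dst = go (0 :: Int)
  where
    base = takeDirectory dst </> ("." ++ takeFileName dst ++ ".work")
    go n = do
      let candidate = base ++ show n
      (createDirectory candidate >> pure candidate) `catchIOError` \e ->
        if isAlreadyExistsError e then go (n + 1) else ioError e

-- | Generate into the claimed directory, then rename the payload into place.
-- Readers of @dst@ see either the old file or a complete new one.
publishVia :: FilePath -> String -> IO ()
publishVia dst payload = do
  dir <- claimScratchDir dst
  let tmp = dir </> "payload"
  writeFile tmp payload
  renameFile tmp dst
  removeDirectory dir
```

If generation throws, the scratch directory is left behind in this sketch; a real version would clean it up with `bracketOnError` or similar.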
On 2021-04-15, at 17:05, Sven Panne wrote:
On Thu, 15 Apr 2021 at 10:24, Daniel Trstenjak <daniel.trstenjak@gmail.com> wrote:
And it's the safest, because a move is an atomic operation [...]
... unless you are on Windows. ;-)

On Thu, Apr 15, 2021 at 11:05:25AM +0200, Sven Panne wrote:
On Thu, 15 Apr 2021 at 10:24, Daniel Trstenjak wrote:
And it's the safest, because a move is an atomic operation [...]
... unless you are on Windows. ;-)
I thought it is, if you ensure that the move is done on the same device or partition.
Greetings, Daniel

On Thu, 15 Apr 2021 at 13:00, Daniel Trstenjak <daniel.trstenjak@gmail.com> wrote:
On Thu, Apr 15, 2021 at 11:05:25AM +0200, Sven Panne wrote:
On Thu, 15 Apr 2021 at 10:24, Daniel Trstenjak <daniel.trstenjak@gmail.com> wrote:
And it's the safest, because a move is an atomic operation [...]
... unless you are on Windows. ;-)
I thought it is if you ensure that the move is done on the same device or partition.
Which "move" do you mean? ReplaceFileA's return code exposes various levels of non-atomicity ( https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-replac...) and Transactional NTFS (e.g. MoveFileTransactedA) has already been deprecated ( https://docs.microsoft.com/en-us/windows/win32/fileio/deprecation-of-txf). Perhaps there are other, even more arcane variations on this conceptually simple operation on Windows, but I don't know... Supporting this madness in a cross-platform way is even more interesting.

That would work but may not be the most efficient in all cases. On many
systems, /tmp is a tmpfs, which being memory-backed is more efficient than
a file on a physical disk or network, so writing to /tmp has performance
advantages.
On Thu, Apr 15, 2021 at 4:23 PM Daniel Trstenjak wrote:
Hi Josh,
the fastest and safest way is to create the temporary file in the directory of the final location and then move the file to its final name.
This way you ensure that the temporary file is always on the same device or partition as the final file and therefore a move is always possible.
And it's the safest, because a move is an atomic operation - which copy isn't - and therefore other processes will never see a partially updated file.
Greetings, Daniel

Hi,
On 15.04.21 at 11:27, ☂Josh Chia (謝任中) wrote:
That would work but may not be the most efficient in all cases. On many
systems, /tmp is a tmpfs, which being memory-backed is more efficient than a file on a physical disk or network, so writing to /tmp has performance advantages.
This isn't necessarily relevant in the case the OP describes. In the case where the target directory is also on tmpfs, writing there directly is just as fast as writing to /tmp first. In the case where the target directory is on a slower medium, it is not more efficient to write to /tmp first, because you still pay the performance penalty once you copy from /tmp to the target directory.

Writing directly to the slower medium might be slower if you first write something to the file and later overwrite parts of it with other content, or if you decide that you don't need the file at all and just delete it instead of copying it. But you can largely avoid even those penalties by not calling fsync (or close) on the open file before it contains exactly the data you ultimately want. That way the written content is not forced out to the medium before fsync is called (the kernel may still write it back on its own schedule), and you get pretty much the same performance as writing on tmpfs. Controlling when fsync is called might be tricky, though, if the file is not produced by your own code. I'm also not sure whether Haskell calls fsync implicitly in some cases other than closing the file descriptor.

Regards
Sven

On Thu, Apr 15, 2021 at 12:32:16PM +0800, ☂Josh Chia (謝任中) wrote:
What's the best way to move the file if possible but copy-and-delete if necessary?
The `conduit` package has `sinkFileCautious`, which creates a temporary file in the target directory, deletes it if an exception is thrown, but otherwise renames the temp file to the requested name on success.

https://hackage.haskell.org/package/conduit-1.3.4.1/docs/Conduit.html#v:sink...

There are fancier things one can do on Linux systems with unnamed temporary files created via openat(2), which can be linked into the target directory only when ready via linkat(2), but this is not particularly portable, and not even supported by all Linux filesystems. AFAIK it is not possible to avoid a narrow window during which the temporary file exists under a transient name, because linkat(2) does not provide a way to atomically replace the target if it exists. So the calling sequence is (with appropriate error checks, not shown):

    fd = openat(dirfd, ...|O_TMPFILE, mode);
    write(fd, ...);
    ...
    write(fd, ...);
    (void) unlinkat(dirfd, "file.tmp", 0);
    linkat(fd, "", dirfd, "file.tmp", AT_EMPTY_PATH);
    renameat(dirfd, "file.tmp", dirfd, "file");

--
Viktor.
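A short sketch of the `conduit` route, assuming the `conduit` (1.3.x) and `bytestring` packages are installed. `sinkFileCautious`, `runConduitRes`, `yieldMany` and `(.|)` are from the `Conduit` module linked above; `writeChunks` is a hypothetical wrapper.

```haskell
{-# LANGUAGE OverloadedStrings #-} -- for ByteString literals in callers

import Conduit (runConduitRes, sinkFileCautious, yieldMany, (.|))
import Data.ByteString (ByteString)

-- | Stream chunks into @dst@. 'sinkFileCautious' writes to a temporary
-- file in the destination's directory and moves it into place only if
-- the whole pipeline succeeds; on an exception the temp file is deleted.
writeChunks :: FilePath -> [ByteString] -> IO ()
writeChunks dst chunks = runConduitRes $ yieldMany chunks .| sinkFileCautious dst
```

In a real program the upstream would be the actual producer (a network source, a transformation pipeline) rather than a fixed list.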
participants (9)
- Daniel Trstenjak
- David Kraeutmann
- Donn Cave
- Henning Thielemann
- Sven Bartscher
- Sven Panne
- Viktor Dukhovni
- YueCompl
- ☂Josh Chia (謝任中)