RE: getpid() or something similar

> Hal Daume writes:
> > In GHC in the Posix module, there's GetProcessID:
>
> Thanks for the pointer!
>
> Unfortunately, this is a GHC-only solution, which is really annoying in my case: the software is otherwise completely compiler-independent, and I don't want to restrict it to GHC just because I need a unique file name for a temporary file!
>
> Is there no other way? I cannot be the first person ever to need a temporary file, can I? Surely there must be some portable way, too?

Even using a ProcessID doesn't guarantee uniqueness, it just gives you a good chance that your file won't conflict - you still have to cope with the situation that a file with that name already exists. The right way to do this is to try to open it for writing, and try a different name if the open fails. Using System.Random to get a starting point is probably just as good as using the ProcessID, and is more portable.

Cheers,
	Simon
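A minimal C sketch of Simon's "try a name, and try another if it is taken" loop, assuming a POSIX system. The `O_CREAT | O_EXCL` flags (which come up later in the thread) are what turn "create only if it does not exist" into a single step, so an existing file makes `open` fail with `EEXIST` instead of being opened. The helper name `open_unique`, the `prefix.N` naming scheme and the 100-attempt limit are illustrative assumptions, not part of any library.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>   /* close(), needed by callers */

/* Hypothetical helper: try dir/prefix.N for N = start, start+1, ...
   and return a descriptor for the first name that did not exist yet.
   The name actually used is written to path. Returns -1 on failure. */
int open_unique(const char *dir, const char *prefix, unsigned start,
                char *path, size_t pathlen)
{
    for (unsigned attempt = 0; attempt < 100; attempt++) {
        int n = snprintf(path, pathlen, "%s/%s.%u", dir, prefix,
                         start + attempt);
        if (n < 0 || (size_t)n >= pathlen)
            return -1;                /* name does not fit in path */

        /* Create the file, or fail with EEXIST if the name is taken.
           Check and creation happen in one call (on local filesystems;
           NFS is another story, as the thread goes on to discuss). */
        int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
        if (fd >= 0)
            return fd;
        if (errno != EEXIST)
            return -1;                /* some other error, e.g. EACCES */
        /* Name taken: loop round and try the next number. */
    }
    errno = EEXIST;                   /* gave up after 100 names */
    return -1;
}
```

In the spirit of Simon's suggestion, `start` could come from a random generator rather than the process ID; the caller owns the returned descriptor and must `close()` it.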

Simon Marlow writes:
> Even using a ProcessID doesn't guarantee uniqueness [...]

Why not? If I use the file only temporarily (it is gone once the process terminates), something like /tmp/foo.<pid> will be unique, for all I can tell.

> The right way to do this is to try to open it for writing, and try a different name if the open fails.

Unfortunately, this approach features a race condition because POSIX has no notion of mandatory file locking. I'd really rather avoid this, if I can.

Is there no other way? And one that works across different compilers?

Peter

On Thu, May 22, 2003 at 12:05:46PM +0200, Peter Simons wrote:
> Simon Marlow writes:
> > Even using a ProcessID doesn't guarantee uniqueness [...]
>
> Why not? If I use the file only temporarily (it is gone once the process terminates), something like /tmp/foo.<pid> will be unique, for all I can tell.

A Bad Person using another program may have created a file named /tmp/foo.<pid> to annoy you (or crash your program)...
-- 
David Roundy
http://www.abridgegame.org
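To make David's point concrete, here is a small C sketch, assuming POSIX; `pid_name`, `open_naive` and `open_exclusive` are hypothetical names for illustration. If someone plants a symbolic link at the predictable name /tmp/foo.<pid>, a plain `O_CREAT` open follows it and truncates whatever file it points to, while `O_CREAT | O_EXCL` refuses with `EEXIST` (POSIX specifies that a symbolic link counts as existing here, which matches the man page text quoted later in the thread).

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Peter's predictable name: <dir>/foo.<pid>.
   Returns 0, or -1 if it does not fit in buf. */
int pid_name(const char *dir, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%s/foo.%ld", dir, (long)getpid());
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Naive creation: O_CREAT without O_EXCL opens whatever is already at
   that name, following a symlink if one was planted there. */
int open_naive(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
}

/* Exclusive creation: fails with EEXIST if anything exists at that
   name, including a symbolic link, so a planted file is detected. */
int open_exclusive(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
}
```

Note that `O_EXCL` only detects the planted name; the program still has to pick another name and retry.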

On Thu, May 22, 2003 at 12:05:46PM +0200, Peter Simons wrote:
> Simon Marlow writes:
> > Even using a ProcessID doesn't guarantee uniqueness [...]
>
> Why not? If I use the file only temporarily (it is gone once the process terminates), something like /tmp/foo.<pid> will be unique, for all I can tell.

Standard procedure is to integrate:
1) your hostname
2) the current time
3) the pid
4) an incrementing counter
5) your program name
into the name. That should make it unique. The reason to include your hostname is that you may be on an NFS filesystem where different systems will be using the same PID at the same time. The reason to use the current time is that PIDs are reused. And you need the counter to create multiple temporary files within a second of each other. The program name is there because other apps may use this scheme.

> > The right way to do this is to try to open it for writing, and try a different name if the open fails.
>
> Unfortunately, this approach features a race condition because POSIX has no notion of mandatory file locking. I'd really rather avoid this, if I can.
>
> Is there no other way? And one that works across different compilers?

Mandatory locks aren't needed. (And they are a common extension to the fcntl(2) locking mechanism anyway; at least, I do not know of a system which doesn't support them.)

    open(..., O_RDWR | O_CREAT | O_EXCL, 0600);

is what you want (wrapped in Haskell, of course). It will create the file if it doesn't exist (O_CREAT), but if it already exists it will return an error (EEXIST). This check is done ATOMICALLY, meaning there is no race condition. Of course, if you follow the above advice when choosing a name, this is unlikely to be an issue unless a program tries to step on your toes.

John
-- 
---------------------------------------------------------------------------
John Meacham - California Institute of Technology, Alum. - john@foo.net
---------------------------------------------------------------------------
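A C sketch of John's naming recipe, assuming POSIX `gethostname()`, `getpid()` and `time()`. The function name `robust_tmp_name` and the field order `progname.time.pid_counter.host` are illustrative choices; Maildir uses a similar time/pid/host combination, but this is not its exact format. As John says, the result still has to be opened with `O_CREAT | O_EXCL`: a good name makes a collision unlikely, it does not rule one out.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: build "<progname>.<time>.<pid>_<counter>.<host>"
   in buf. Returns 0 on success, -1 on error (progname contains '/',
   hostname lookup failed, or buf is too small). */
int robust_tmp_name(const char *progname, char *buf, size_t len)
{
    static unsigned long counter = 0;  /* shared by all calls; not thread-safe */
    char host[256];

    if (strchr(progname, '/') != NULL)
        return -1;                      /* want a bare name, not a path */
    if (gethostname(host, sizeof host) != 0)
        return -1;
    host[sizeof host - 1] = '\0';       /* a truncated name may lack the NUL */

    int n = snprintf(buf, len, "%s.%ld.%ld_%lu.%s",
                     progname,          /* 5) program name */
                     (long)time(NULL),  /* 2) current time */
                     (long)getpid(),    /* 3) pid */
                     counter++,         /* 4) incrementing counter */
                     host);             /* 1) hostname */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Two calls in the same second from the same process differ only in the counter field, which is exactly the case John adds the counter for.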

Alright, you all got me convinced. :-) Thanks a lot for the helpful replies! Peter

> Mandatory locks aren't needed. (And they are a common extension to the fcntl(2) locking mechanism anyway; at least, I do not know of a system which doesn't support them.)
>
> open(..., O_RDWR | O_CREAT | O_EXCL, 0600);
>
> is what you want (wrapped in Haskell, of course). It will create the file if it doesn't exist (O_CREAT), but if it already exists it will return an error (EEXIST). This check is done ATOMICALLY, meaning there is no race condition.

Not so; on NFS, the implementation is *not* atomic, and the race remains. See the Linux open(2) man page, for example:

    O_EXCL  When used with O_CREAT, if the file already exists it is an
            error and the open will fail. In this context, a symbolic
            link exists, regardless of where it points to. O_EXCL is
            broken on NFS file systems; programs which rely on it for
            performing locking tasks will contain a race condition. The
            solution for performing atomic file locking using a lockfile
            is to create a unique file on the same fs (e.g., incorporating
            hostname and pid), use link(2) to make a link to the lockfile.
            If link() returns 0, the lock is successful. Otherwise, use
            stat(2) on the unique file to check if its link count has
            increased to 2, in which case the lock is also successful.

The suggested solution still requires a unique filename, so you still need to use the complicated filename-generation technique, and you still can't proof yourself against a hostile user guessing the right name to use at the right moment.

Summary: NFS is broken, but it's too late now to fix it.

--KW 8-)
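The lockfile recipe from the man page, sketched in C under POSIX assumptions. `acquire_lock` is a hypothetical name, and the caller is assumed to supply a unique name on the same filesystem as the lock (built, for example, from hostname and pid as the man page suggests). Releasing the lock is simply `unlink(lockfile)`. As Keith says, this still rests on that unique name.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create uniquename, then link(2) it to lockfile.
   Returns 1 if we now hold the lock, 0 if someone else holds it,
   and -1 if the unique file could not be created. */
int acquire_lock(const char *uniquename, const char *lockfile)
{
    int fd = open(uniquename, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    close(fd);

    int held = 0;
    if (link(uniquename, lockfile) == 0) {
        held = 1;                       /* link() reported success */
    } else {
        /* The man page says to check the link count anyway: on NFS the
           link may have been made even though link() reported an error
           (for example, if the server's reply was lost and the retried
           request failed). A count of 2 means lockfile is our file. */
        struct stat st;
        if (stat(uniquename, &st) == 0 && st.st_nlink == 2)
            held = 1;
    }
    unlink(uniquename);                 /* the lock lives on as lockfile */
    return held;
}
```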

On Thu, 22 May 2003 08:21:37 -0700, John Meacham wrote:
> standard procedure is to integrate:
> 1) your hostname
> 2) the current time
> 3) the pid
> 4) an incrementing counter
> 5) your program name
> [...]
> open(..., O_RDWR | O_CREAT | O_EXCL, 0600);
> is what you want [...]
Why do you need such a unique name? Using the open call, you can always choose another if it already exists. One way or the other, you still need to check atomically for security reasons: no matter how unique your name is, your code shouldn't rely on the file not being created between checking and creation, and uniqueness means little against a malicious attack. Simply using a random number generator would seem sufficient, though I'd probably do at least progname++num, more so that the user can see what's related to what (if files get left around) than because of the off chance of 4 billion numerically named files.

On Thu, May 22, 2003 at 12:21:10PM -0400, Derek Elkins wrote:
> Why do you need such a unique name? Using the open call, you can always choose another if it already exists. One way or the other, you still need to check atomically for security reasons: no matter how unique your name is, your code shouldn't rely on the file not being created between checking and creation, and uniqueness means little against a malicious attack. Simply using a random number generator would seem sufficient, though I'd probably do at least progname++num, more so that the user can see what's related to what (if files get left around) than because of the off chance of 4 billion numerically named files.
Like someone else mentioned, NFS (and probably some other filesystems) have weird semantics where O_CREAT | O_EXCL doesn't always work properly. On such broken filesystems there is not much you can do, but to make the system as robust as possible one should use everything at one's disposal.

I should also mention that I didn't just make up the previous formula for 'robust' temporary files. It is used in various applications and is 'best common practice'; see the Maildir format for another example.

As for 32-bit random numbers being sufficient, I have one technical argument against it and one anecdote.

The technical argument is called 'the birthday attack'; a web search will provide lots of info on it. The upshot is that 32 bits is not nearly as secure as you think, because probability does not work the way our intuition says.

The anecdote involves a certain distributed operating system in which each machine, when booted, would wait a random amount of time before connecting to the server, since there were many machines and all of them connecting at once would wedge the server. All seemed well until their first power outage: the power came back, and 3 minutes later the whole system came tumbling down, much to their surprise. Their random number generators were seeded from the clock; since the power came on for all systems at the same time, all the generators were seeded with the same value, and hence the first number out of each was the same on every system. The moral: random numbers aren't always random :)

For the quick and dirty behind the birthday attack, ask yourself this: how many people do you need before two of them most likely (> 50% chance) share a birthday? If you said anything greater than 23, that is too high. Basically, the number of random samples needed before two collide is much smaller than one thinks: it grows as the square root of the size of the space, so a bigger space (like 32 bits) doesn't help as much as one might think.
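John's birthday question can be checked numerically. The C function below (the name `draws_for_even_odds` is made up for this illustration) multiplies out the exact probability that n independent, uniformly random draws from a space of N values are all distinct, and stops at the first n where a repeat is at least as likely as not. It assumes the draws really are independent and uniform, which, as the anecdote shows, is itself an assumption.

```c
/* Smallest number of draws n from `space` equally likely values such
   that P(at least two draws are equal) >= 1/2, using
   P(all distinct) = prod_{i=1}^{n-1} (1 - i/space). */
unsigned long draws_for_even_odds(double space)
{
    double p_distinct = 1.0;   /* a single draw cannot collide */
    unsigned long n = 1;       /* draws made so far */

    while (1.0 - p_distinct < 0.5) {
        /* draw n+1 must avoid the n values already drawn */
        p_distinct *= 1.0 - (double)n / space;
        n++;
    }
    return n;
}
```

`draws_for_even_odds(365.0)` returns 23, the classic birthday answer. For a 32-bit space (`4294967296.0`) it returns roughly 77,000, in line with the usual approximation sqrt(2 N ln 2) and on the order of 2^16 rather than 2^32.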

On Thu, 22 May 2003 09:47:08 -0700, John Meacham wrote:
> On Thu, May 22, 2003 at 12:21:10PM -0400, Derek Elkins wrote:
> > Why do you need such a unique name [...]
> Like someone else mentioned, NFS (and probably some other filesystems) have weird semantics where O_CREAT | O_EXCL doesn't always work properly. On such broken filesystems there is not much you can do, but to make the system as robust as possible one should use everything at one's disposal.

As you say, in that case there's nothing you can do; however, the long filename does nothing to raise security, which was my focus.

> I should also mention that I didn't just make up the previous formula for 'robust' temporary files. It is used in various applications and is 'best common practice'; see the Maildir format for another example.

I didn't say that it was a bad idea or that you made it up; it just seems like overkill (but if computer/video games have taught me anything, overkill is good). However, the main reason for my response was that the reply slightly gave the impression that the long filename by itself was good enough. I just wanted to make clear that that is NOT the case (well, that's part of my reply). The other part is just that I suspect many programs can reasonably get by without such a gung-ho effort (though that makes some presumptions about the use).

> As for 32-bit random numbers being sufficient, I have one technical argument against it and one anecdote. The technical argument is called 'the birthday attack'; a web search will provide lots of info on it. The upshot is that 32 bits is not nearly as secure as you think, because probability does not work the way our intuition says. The anecdote involves a certain distributed operating system in which each machine, when booted, would wait a random amount of time before connecting to the server, since there were many machines and all of them connecting at once would wedge the server. All seemed well until their first power outage: the power came back, and 3 minutes later the whole system came tumbling down, much to their surprise. Their random number generators were seeded from the clock; since the power came on for all systems at the same time, all the generators were seeded with the same value, and hence the first number out of each was the same on every system. The moral: random numbers aren't always random :)

[[ On computers, random numbers are rarely random ]]

I'm aware of what the birthday attack is. I also thought of this problem on a smaller scale: simply starting two instances at the same time and seeding based on time (the moral here is "Don't seed on time in this case!" and, as a side note, "Don't use that generator anymore if knowledge of the random numbers has security value"). On the small scale it's not much of a problem, especially if relatively few temp files are used. I'm not sure how extreme this scenario would have to get for it to be a real problem; it would certainly depend on the application.

> The upshot is the 32 bits is not nearly as secure as you think ...

Taking the "you" personally, whether you meant it that way or not: I didn't put forth my views on its security. On non-broken filesystems, the random number would have no security benefit. However, even on broken filesystems, the random number would be no less secure than the long filename (especially if the count always started at some known number and/or was incremented by a known amount). You'd have me on robustness, though; I was focusing on the security aspect.
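The failure in John's anecdote, and in Derek's two-instances case, is easy to reproduce with the C library generator: the C standard guarantees that calling `srand()` with the same seed yields the same sequence from `rand()`. `first_draw` is a hypothetical helper, and its argument stands in for a clock reading taken at the same moment on two machines.

```c
#include <stdlib.h>

/* Seed the C library generator from a (simulated) clock reading and
   return the first value drawn, as a freshly booted machine would. */
int first_draw(unsigned int clock_seed)
{
    srand(clock_seed);
    return rand();
}
```

Two machines that call `first_draw((unsigned)time(NULL))` in the same second get identical values. A different seed per instance avoids that particular collision, but `rand()` is not designed to be unpredictable, which matters if, as Derek notes, knowledge of the numbers has security value.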

"Simon Marlow" writes:
> > Is there no other way? I cannot be the first person ever to need a temporary file, can I? Surely there must be some portable way, too?
> Even using a ProcessID doesn't guarantee uniqueness, it just gives you a good chance that your file won't conflict - you still have to cope with
How about wrapping mkstemp(3)? And perhaps add the wrapper somewhere in System? Sounds like useful functionality to me.

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants
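A C sketch of the call such a wrapper would make, assuming a POSIX system. `mkstemp(3)` replaces the trailing `XXXXXX` of the template in place and creates the file as if by `open()` with `O_RDWR | O_CREAT | O_EXCL` (POSIX.1-2008 also fixes the mode at 0600), so it combines the name generation and the exclusive open discussed earlier in the thread. `make_temp` and `remove_temp` are hypothetical names for illustration.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wrapper: create a fresh file "<dir>/<prefix>XXXXXX".
   mkstemp() overwrites the six X characters in path with the chosen
   suffix and returns an open descriptor, or -1 with errno set. */
int make_temp(const char *dir, const char *prefix, char *path, size_t len)
{
    if (strchr(prefix, '/') != NULL)
        return -1;                      /* prefix must be a bare name */
    int n = snprintf(path, len, "%s/%sXXXXXX", dir, prefix);
    if (n < 0 || (size_t)n >= len)
        return -1;                      /* template does not fit */
    return mkstemp(path);
}

/* Clean up: close the descriptor and remove the file's name. */
int remove_temp(int fd, const char *path)
{
    int rc = close(fd);
    if (unlink(path) != 0)
        rc = -1;
    return rc;
}
```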
participants (7)
- David Roundy
- Derek Elkins
- John Meacham
- Keith Wansbrough
- ketil@ii.uib.no
- Peter Simons
- Simon Marlow