Reading a list of null-terminated filenames from stdin?

If I want to read a list of filenames, each terminated with a nul byte, from stdin (kind of like xargs -0), what's the best way to do that in Haskell? Or am I swimming against the current to do anything but newline-termination?
Google's top hit is: https://downloads.haskell.org/~ghc/7.6.1/docs/html/libraries/bytestring-0.10... ...but I don't see anything about nul termination there. I checked the top ~5 hits and didn't find much.
I also checked https://www.haskell.org/hoogle/?hoogle=Lines
The goal is to be able to operate on filenames that contain newlines, but it's not the end of the world if that isn't very practical.
Thanks!
-- Dan Stromberg

On Oct 10, 2015, at 11:45 AM, Dan Stromberg wrote:
If I want to read a list of filenames, each terminated with a nul byte, from stdin (kind of like xargs -0), what's the best way to do that in Haskell? Or am I swimming against the current to do anything but newline-termination?
It looks like all the stdin-specific functions are line-oriented, but you can use "Data.ByteString.hGet stdin". http://hackage.haskell.org/package/bytestring-0.10.6.0/docs/Data-ByteString....
Google's top hit is: https://downloads.haskell.org/~ghc/7.6.1/docs/html/libraries/bytestring-0.10... ...but I don't see anything about nul termination there. I checked the top ~5 hits, and didn't find much.
I don’t think it matters in this case, but you should be aware that Google does not always find the most recent results when it comes to Hackage packages. (See my link above.) -Karl

First off, thanks for your response.
On Sat, Oct 10, 2015 at 12:14 PM, Karl Voelker wrote:
On Oct 10, 2015, at 11:45 AM, Dan Stromberg wrote:
If I want to read a list of filenames, each terminated with a nul byte, from stdin (kind of like xargs -0), what's the best way to do that in Haskell? Or am I swimming against the current to do anything but newline-termination?
It looks like all the stdin-specific functions are line-oriented, but you can use "Data.ByteString.hGet stdin".
http://hackage.haskell.org/package/bytestring-0.10.6.0/docs/Data-ByteString....
I'm very much a Haskell newb, but does Data.ByteString.hGet stdin read a fixed (maximum) number of bytes, rather than a nul terminated sequence of bytes?
Google's top hit is:
https://downloads.haskell.org/~ghc/7.6.1/docs/html/libraries/bytestring-0.10...
...but I don't see anything about nul termination there. I checked the top ~5 hits, and didn't find much.
I don’t think it matters in this case, but you should be aware that Google does not always find the most recent results when it comes to Hackage packages. (See my link above.)
I'll keep that in mind. Thanks. -- Dan Stromberg

On Oct 10, 2015, at 12:40 PM, Dan Stromberg wrote:
I'm very much a Haskell newb, but does Data.ByteString.hGet stdin read a fixed (maximum) number of bytes, rather than a nul terminated sequence of bytes?
Yes, you’d have to write some code to look for nuls yourself. Or, if the input is small enough that you don’t mind reading it all into memory at once, use hGetContents. -Karl
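Following Karl's hGetContents suggestion, a minimal sketch, assuming the whole input fits comfortably in memory: read stdin as one strict ByteString with `Data.ByteString.hGetContents` and cut it on the zero byte with `Data.ByteString.split`. The helper name `splitNul` is hypothetical (not part of bytestring), and it is lenient about a missing final nul.
```haskell
import qualified Data.ByteString as B
import System.IO (hSetBinaryMode, stdin)

-- Hypothetical helper: split nul-terminated records.
-- B.split yields an empty last piece after the final nul, which is dropped;
-- if the input lacks a final nul, the last record is kept as-is.
splitNul :: B.ByteString -> [B.ByteString]
splitNul bs
  | B.null bs      = []
  | B.last bs == 0 = init (B.split 0 bs)
  | otherwise      = B.split 0 bs

main :: IO ()
main = do
  hSetBinaryMode stdin True
  -- Strict read: all of stdin is loaded into memory at once.
  input <- B.hGetContents stdin
  -- print uses the Show instance, so an embedded newline shows up as \n.
  mapM_ print (splitNul input)
```
Piping something like `find . -print0` into it should print one quoted name per line. Keeping the names as ByteString rather than String avoids any decoding step, so non-UTF-8 names pass through untouched.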

On Sun, Oct 11, 2015 at 1:01 AM, Andrew Bernard wrote:
On what operating system can filenames contain newlines?
Any unixlike system. You would need to quote the newline from the shell, but it is perfectly valid.
-- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

On Sun, Oct 11, 2015 at 10:43 AM, Brandon Allbery wrote:
On Sun, Oct 11, 2015 at 1:01 AM, Andrew Bernard wrote:
On what operating system can filenames contain newlines?
Any unixlike system. You would need to quote the newline from the shell, but it is perfectly valid.
Valid... OK. Perfectly?? Here's a counter view: http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html
Summary: funny chars in filenames are a feature close to a bug in *nix filesystems.
So my advice for this would be:
- If you have better things to do, don't bother.
- If for some reason you do need to bother, respect Postel's law (https://en.wikipedia.org/wiki/Robustness_principle) and allow it out more reluctantly than in.

Hi Brandon,
It may be valid, but it is just asking for trouble with many tools and utilities and scripts. Funny, I had to check re UNIX. I have been programming UNIX systems for over thirty years and never even imagined a newline in a filename!
Would the OP say why he needs to use newlines in filenames? Something best avoided. I suppose this is not a Haskell matter, but one does have to ask.
Andrew
On 11 Oct 2015, at 16:13, Brandon Allbery wrote:
Any unixlike system. You would need to quote the newline from the shell, but it is perfectly valid.

Andrew Bernard wrote:
Hi Brandon,
It may be valid, but it is just asking for trouble with many tools and utilities and scripts. Funny, I had to check re UNIX. I have been programming UNIX systems for over thirty years and never even imagined a newline in a filename!
Would the OP say why he needs to use newlines in filenames? Something best avoided. I suppose this is not a Haskell matter, but one does have to ask.
The OP must speak for himself, but as far as I am concerned, no reasonable person uses newlines in filenames, ever. However, the possibility is there, and it may happen that someone unreasonable has created a filename with a newline in it. This may become a security issue if, for example, someone creates a file named "/tmp/foo /etc/passwd bar.log" and a careless system person runs a script as root to remove all files named *.log from /tmp/ and subdirectories.
    find /tmp -name '*.log' -print | xargs rm
is an insanely reckless way to do it, which in this case would cause the removal of the password file. Much better:
    find /tmp -name '*.log' -type f -print0 | xargs -0 rm
or even better:
    find /tmp -name '*.log' -type f -exec rm {} +
In summary, if you have to encode a list of filenames in a byte stream, zero termination is the correct way to do it.
– Harald
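To make Harald's point concrete on the writing side, here is a small sketch contrasting the two framings. The path is a hypothetical stand-in for his example, assuming embedded newlines (a directory "foo\n" under /tmp containing etc/ and a file "passwd\nbar.log"); `encodeNul`, `encodeNewline`, and `decodeWith` are illustrative helpers, not library functions.
```haskell
import qualified Data.ByteString.Char8 as BC

-- Hypothetical names: the first one embeds two newlines.
exampleNames :: [BC.ByteString]
exampleNames = map BC.pack ["/tmp/foo\n/etc/passwd\nbar.log", "/tmp/ok.log"]

-- Terminate each name with a nul byte, as find -print0 does.
encodeNul :: [BC.ByteString] -> BC.ByteString
encodeNul = BC.concat . map (`BC.snoc` '\0')

-- Terminate each name with a newline, as find -print does.
encodeNewline :: [BC.ByteString] -> BC.ByteString
encodeNewline = BC.concat . map (`BC.snoc` '\n')

-- Split a stream where every record ends with sep; init drops the
-- empty piece that B.split leaves after the final terminator.
decodeWith :: Char -> BC.ByteString -> [BC.ByteString]
decodeWith sep bs
  | BC.null bs = []
  | otherwise  = init (BC.split sep bs)

main :: IO ()
main = do
  -- Nul framing recovers exactly the two original names.
  print (decodeWith '\0' (encodeNul exampleNames) == exampleNames)
  -- Newline framing turns the first name into three separate "names",
  -- one of which is /etc/passwd.
  mapM_ print (decodeWith '\n' (encodeNewline exampleNames))
```
The nul byte is the one value (besides '/', which only separates path components) that cannot appear inside a Unix filename, which is why it is the unambiguous choice of terminator.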

Why not base64-encode the input text to produce file names? https://en.wikipedia.org/wiki/Base64#Filenames The encoded file names would be less likely to cause problems, and you could get the original text back by decoding the file name.
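A sketch of Imants's idea, assuming RFC 4648's URL-safe "base64url" alphabet ('-' and '_' in place of '+' and '/', so an encoded name can never contain a slash) and omitting '=' padding. `encodeName` and `decodeName` are hypothetical names written out by hand for illustration; the base64-bytestring package's Data.ByteString.Base64.URL module offers ready-made equivalents.
```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import qualified Data.ByteString as B
import Data.Char (ord)
import Data.List (elemIndex)
import Data.Word (Word8)

-- base64url alphabet: no '/', so the output is a legal file name.
alphabet :: String
alphabet = ['A' .. 'Z'] ++ ['a' .. 'z'] ++ ['0' .. '9'] ++ "-_"

-- Encode bytes as unpadded base64url (3 bytes -> 4 characters).
encodeName :: B.ByteString -> String
encodeName = go . B.unpack
  where
    go (a : b : c : rest) = chars 4 a b c ++ go rest
    go [a, b]             = chars 3 a b 0
    go [a]                = chars 2 a 0 0
    go []                 = []
    -- Pack three bytes into 24 bits, emit the top n 6-bit groups.
    chars n a b c =
      let w = (fromIntegral a `shiftL` 16) .|. (fromIntegral b `shiftL` 8) .|. fromIntegral c :: Int
      in take n [alphabet !! ((w `shiftR` s) .&. 63) | s <- [18, 12, 6, 0]]

-- Decode unpadded base64url; Nothing on a foreign character or bad length.
decodeName :: String -> Maybe B.ByteString
decodeName str = B.pack <$> (mapM (`elemIndex` alphabet) str >>= go)
  where
    go (a : b : c : d : rest) = (bytes 3 [a, b, c, d] ++) <$> go rest
    go [a, b, c]              = Just (bytes 2 [a, b, c, 0])
    go [a, b]                 = Just (bytes 1 [a, b, 0, 0])
    go []                     = Just []
    go _                      = Nothing -- a single leftover character is never valid
    -- Pack four 6-bit groups into 24 bits, emit the top n bytes.
    bytes n xs =
      let w = foldl (\acc x -> (acc `shiftL` 6) .|. x) 0 xs :: Int
      in take n [fromIntegral ((w `shiftR` s) .&. 255) :: Word8 | s <- [16, 8, 0]]

main :: IO ()
main = do
  -- A name with an embedded newline, as discussed in the thread.
  let original = B.pack (map (fromIntegral . ord) "foo\nbar.log")
      encoded  = encodeName original
  putStrLn encoded
  print (decodeName encoded == Just original)
```
Two caveats: encoding grows a name by about a third, so inputs longer than roughly 190 bytes exceed the 255-byte name limit common on Linux filesystems, and because the alphabet mixes upper and lower case, distinct encoded names can collide on a case-insensitive filesystem.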

Imants Cekusins wrote:
why not base64 encode input text to produce file names:
An intriguing idea, but sometimes you just have to accept what the upstream program provides. – Harald

On Sun, Oct 11, 2015 at 5:22 AM, Harald Hanche-Olsen wrote:
Andrew Bernard wrote:
It may be valid, but it is just asking for trouble with many tools and utilities and scripts. Funny, I had to check re UNIX. I have been programming UNIX systems for over thirty years and never even imagined a newline in a filename!
Would the OP say why he needs to use newlines in filenames? Something best avoided. I suppose this is not a Haskell matter, but one does have to ask.
The OP must speak for himself, but as far as I am concerned, no reasonable person uses newlines in filenames, ever.
However, the possibility is there, and it may happen that someone unreasonable has created a filename with a newline in it. This may become a security issue if, for example, someone creates a file named
This is indeed the problem. I'm a sysadmin; if I need a tool like this, I don't usually have any say in what is in the filenames, and sadly, people *do* use all manner of odd characters, including newlines, non-UTF-8, etc. It's my place to deal with what is, not what would be in an ideal world.
-- brandon s allbery kf8nh

On Sat, Oct 10, 2015 at 9:45 PM, Dan Stromberg wrote:
If I want to read a list of filenames, each terminated with a nul byte, from stdin (kind of like xargs -0), what's the best way to do that in Haskell? Or am I swimming against the current to do anything but newline-termination?
Google's top hit is:
https://downloads.haskell.org/~ghc/7.6.1/docs/html/libraries/bytestring-0.10... ...but I don't see anything about nul termination there. I checked the top ~5 hits, and didn't find much.
I also checked https://www.haskell.org/hoogle/?hoogle=Lines
The goal is to be able to operate on filenames that contain newlines, but it's not the end of the world if that isn't very practical.
Thanks!
-- Dan Stromberg
To give you an idea of how this might be done, I put together an example using the conduit-combinators library: https://gist.github.com/snoyberg/2a5ca79d97f483bdcfe9
This can be done in a more low-level manner by using the bytestring library directly, which requires learning fewer new concepts. However, streaming libraries like conduit and pipes are specifically designed for handling these kinds of problems. There's a tutorial on conduit available at: https://github.com/snoyberg/conduit#readme
Michael
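For the lower-level route using only the bytestring library, one possible sketch (not the approach in the linked gist) reads stdin in fixed-size chunks with `Data.ByteString.hGetSome` and carries any partial name over to the next chunk, so memory use stays bounded however long the input is. `splitComplete`, `forEachNulTerminated`, and the 32 KiB chunk size are illustrative choices, not library API.
```haskell
import Control.Monad (unless)
import qualified Data.ByteString as B
import System.IO (Handle, hSetBinaryMode, stdin)

-- Split a buffer into its complete nul-terminated records and the
-- unterminated tail (which may be empty).
splitComplete :: B.ByteString -> ([B.ByteString], B.ByteString)
splitComplete buf = case B.elemIndex 0 buf of
  Nothing -> ([], buf)
  Just i ->
    let (records, tailBytes) = splitComplete (B.drop (i + 1) buf)
    in (B.take i buf : records, tailBytes)

-- Stream nul-terminated records from a handle one chunk at a time,
-- running the action on each. A final record with no trailing nul is
-- still passed on (lenient about what it accepts).
forEachNulTerminated :: Handle -> (B.ByteString -> IO ()) -> IO ()
forEachNulTerminated h action = go B.empty
  where
    chunkSize = 32 * 1024
    go leftover = do
      -- hGetSome returns an empty ByteString only at end of input.
      chunk <- B.hGetSome h chunkSize
      if B.null chunk
        then unless (B.null leftover) (action leftover)
        else do
          let (records, rest) = splitComplete (leftover `B.append` chunk)
          mapM_ action records
          -- Copy so the small leftover does not keep the whole chunk alive.
          go (B.copy rest)

main :: IO ()
main = do
  hSetBinaryMode stdin True
  forEachNulTerminated stdin print
```
Only the unfinished tail of each chunk is carried forward, and a single filename is short, so the repeated `B.append` stays cheap in practice.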
participants (8)
- Andrew Bernard
- Brandon Allbery
- Dan Stromberg
- Harald Hanche-Olsen
- Imants Cekusins
- Karl Voelker
- Michael Snoyman
- Rustom Mody