
Hi all,

I am so often in need of this (I suppose, non-existent) module that I am going crazy :) I have found discussions in the archives of the Haskell mailing lists, but is there any implementation (even POSIX-only, even Unix-only, but preferably portable) of functions to

- concatenate file paths
- tell if a path is absolute or relative
- chase symlinks
- find all the files in a directory (yes, that's what I need :))

I am tired of finding bugs in my implementation; is there someone with a better thing to offer? Is there any hope of seeing such functions in the GHC libraries, or are they already there and it's just me who can't find them?

Thanks in advance

Vincenzo

Vincenzo aka Nick Name wrote:
I am so often in need of this (I suppose, non-existent) module that I am going crazy :) I have found discussions in the archives of the Haskell mailing lists, but is there any implementation (even POSIX-only, even Unix-only, but preferably portable) of functions to
- concatenate file paths
(concat . intersperse "/")
Maybe you wanted something more, e.g. canonicalisation?
- tell if a path is absolute or relative
((== '/') . head)
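As a minimal sketch, the two one-liners above can be wrapped as total functions. The names `joinPaths` and `isAbsolutePath` are made up for illustration, and treating '/' as the only separator is a POSIX assumption. The second function avoids the crash that `head` causes on an empty path.

```haskell
import Data.List (intercalate)

-- Join path components with '/'; same result as (concat . intersperse "/").
joinPaths :: [FilePath] -> FilePath
joinPaths = intercalate "/"

-- Total version of ((== '/') . head): an empty path is treated as
-- relative instead of raising an exception.
isAbsolutePath :: FilePath -> Bool
isAbsolutePath ('/' : _) = True
isAbsolutePath _         = False
```

The `filepath` library, which postdates this thread, ships portable equivalents in `System.FilePath` (`joinPath`, `</>`, `isAbsolute`), so a hand-rolled version like this is mostly useful as an illustration.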
- chase symlinks
I was going to ask what you meant here but, AFAICT, Haskell (at least, GHC 5.04) doesn't appear to recognise the existence of symlinks. So, whatever you meant, the answer is probably "no".
- find all the files in a directory (yes, that's what I need :))
Define "file" (e.g. "regular file", "anything other than a directory",
"directory entry" etc). Also, define "in"; i.e. are you talking about
a recursive search (like "find")?
Simplest case:
getDirectoryContents
If you only want files, as defined by doesFileExist:
import Control.Monad (filterM)
import System.Directory (doesFileExist, getDirectoryContents)

getDirectoryFiles dir = do
  entries <- getDirectoryContents dir
  filterM (doesFileExist . ((dir ++ "/") ++)) entries
Recursion is somewhat harder (maybe even impossible), given that
doesDirectoryExist doesn't distinguish between a directory and a
symlink to a directory, and the Posix module doesn't appear to include
anything which would be of use (e.g. a binding for lstat()).
--
Glynn Clements

On Thursday 5 February 2004 at 19:13, Glynn Clements wrote:
- concatenate file paths
(concat . intersperse "/")
Maybe you wanted something more, e.g. canonicalisation?
Yes, maybe an interface to realpath(3) is what I really need.
- tell if a path is absolute or relative
((== '/') . head)
Agreed :)
- chase symlinks
I was going to ask what you meant here but, AFAICT, Haskell (at least, GHC 5.04) doesn't appear to recognise the existence of symlinks. So, whatever you meant, the answer is probably "no".
I currently use the System.Posix module from GHC 6, which has stat and lstat equivalents. What I want is to get the real file a symlink points to, once I know that it is a symlink. This can be done with recursion, of course, and it's trivial; I was just wondering whether it was implemented somewhere else, because I am not very experienced in working with filesystems and could make mistakes (e.g. I realized only recently that a hash table of already visited files is necessary to avoid cyclic links; also, without getting the canonical path, I could visit a file twice).
- find all the files in a directory (yes, that's what I need :))
Define "file" (e.g. "regular file", "anything other than a directory", "directory entry" etc). Also, define "in"; i.e. are you talking about a recursive search (like "find")?
Yes, I forgot to say "recursively". I have an OCaml implementation, but it's error-prone because it lacks "canonicalization", so I did not want to translate it into Haskell with the same problem. Currently I work around all this by forking "find", but that is error-prone too, because I have no way to distinguish a newline that ends a file name from a newline in the middle of one. I should put something like "///" at the end of each file name with "find -printf" and then parse that, but it would really be preferable to write a Haskell library function equivalent to Unix find.
Recursion is somewhat harder (maybe even impossible), given that doesDirectoryExist doesn't distinguish between a directory and a symlink to a directory, and the Posix module doesn't appear to include anything which would be of use (e.g. a binding for lstat()).
Yes, the System.Posix module in GHC 6 has more features, but I still don't like hand-coding functions like "realpath". If there is nothing else, maybe the best thing is to write a binding to this function (I have never done that, but I guess it's a one-liner), and to carefully read the source code of GNU find and implement it in Haskell the same way.
V.
--
Empty and useless theatres could fill up if you offered to play yourself [CCCP]
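A binding of the kind mentioned above could look roughly like the following sketch. The wrapper name `realPath` is hypothetical. It assumes a libc where passing NULL as the second argument makes realpath() allocate the result with malloc (POSIX.1-2008 behaviour); older systems require a caller-supplied buffer of PATH_MAX bytes instead.

```haskell
import Foreign.C.Error (throwErrnoIfNull)
import Foreign.C.String (CString, peekCString, withCString)
import Foreign.Marshal.Alloc (free)
import Foreign.Ptr (nullPtr)

-- char *realpath(const char *path, char *resolved_path);
foreign import ccall unsafe "stdlib.h realpath"
  c_realpath :: CString -> CString -> IO CString

-- Canonicalise a path: resolves symlinks, "." and "..".
-- Throws an IOError built from errno if realpath() returns NULL.
-- Note: the strings are marshalled with the current locale encoding,
-- which may not round-trip arbitrary file name bytes.
realPath :: FilePath -> IO FilePath
realPath path =
  withCString path $ \cPath -> do
    resolved <- throwErrnoIfNull "realpath" (c_realpath cPath nullPtr)
    result <- peekCString resolved
    free resolved  -- buffer was malloc()ed by realpath (assumption above)
    return result
```

So it is not quite a one-liner, but close. The `directory` library later gained `canonicalizePath`, which covers a similar need without writing any FFI code.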

On Thu, 2004-02-05 at 13:44, Vincenzo aka Nick Name wrote:
Yes, I forgot to say "recursively". I have an OCaml implementation, but it's error-prone because it lacks "canonicalization", so I did not want to translate it into Haskell with the same problem. Currently I work around all this by forking "find", but that is error-prone too, because I have no way to distinguish a newline that ends a file name from a newline in the middle of one. I should put something like "///" at the end of each file name with "find -printf" and then parse that, but it would really be preferable to write a Haskell library function equivalent to Unix find.
If your "find" supports it (GNU find does, I don't know about others), you can use "find -print0" to NUL-terminate file names.
Carl Witty
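For illustration, here is a sketch of how NUL-terminated output could be consumed from Haskell. The names `splitNul` and `findPrint0` are hypothetical, and the code assumes that the `process` library is installed and that a `find` supporting `-print0` is on the PATH.

```haskell
import System.Process (readProcess)

-- Split NUL-terminated output into file names. Newlines inside a
-- name are kept, since only '\NUL' acts as a terminator.
splitNul :: String -> [FilePath]
splitNul "" = []
splitNul s  = case break (== '\NUL') s of
  (name, _ : rest) -> name : splitNul rest
  (name, [])       -> [name]  -- tolerate a missing final terminator

-- Run "find DIR -print0" and return the names it reports.
-- readProcess decodes the output using the current locale encoding.
findPrint0 :: FilePath -> IO [FilePath]
findPrint0 dir = fmap splitNul (readProcess "find" [dir, "-print0"] "")
```

Since NUL cannot occur in a Unix file name, this removes the ambiguity that a newline or "///" separator would leave.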

Vincenzo aka Nick Name wrote:
I was going to ask what you meant here but, AFAICT, Haskell (at least, GHC 5.04) doesn't appear to recognise the existence of symlinks. So, whatever you meant, the answer is probably "no".
I currently use the System.Posix module from GHC 6, which has stat and lstat equivalents. What I want is to get the real file a symlink points to, once I know that it is a symlink. This can be done with recursion, of course, and it's trivial; I was just wondering whether it was implemented somewhere else, because I am not very experienced in working with filesystems and could make mistakes (e.g. I realized only recently that a hash table of already visited files is necessary to avoid cyclic links; also, without getting the canonical path, I could visit a file twice).
Again, you probably want a binding for realpath(). However, note that realpath() implementations don't generally keep a history. They just keep a symlink counter; in the event of a cycle, the counter eventually hits its limit and the call fails (POSIX specifies ELOOP for this case; ENAMETOOLONG is returned if the expanded path exceeds PATH_MAX first).
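The counter approach can be sketched in Haskell with the `unix` and `filepath` libraries. The function name `chaseSymlink` and its hop-limit parameter are made up for illustration. Unlike realpath(), this sketch only resolves a symlink in the final path component, not in intermediate directories.

```haskell
import System.FilePath (takeDirectory, (</>))
import System.Posix.Files

-- Follow a chain of symlinks for at most 'hops' steps.
-- Returns Nothing if the limit is reached (e.g. a cycle), so no
-- table of visited files is needed.
chaseSymlink :: Int -> FilePath -> IO (Maybe FilePath)
chaseSymlink hops path = do
  st <- getSymbolicLinkStatus path            -- lstat(): does not follow links
  if not (isSymbolicLink st)
    then return (Just path)
    else if hops <= 0
      then return Nothing
      else do
        target <- readSymbolicLink path
        -- A relative target is relative to the link's own directory;
        -- (</>) returns an absolute target unchanged.
        chaseSymlink (hops - 1) (takeDirectory path </> target)
```

A call such as `chaseSymlink 32 "some/link"` would then either yield the final non-symlink path or `Nothing` when the chain is too long.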
- find all the files in a directory (yes, that's what I need :))
Define "file" (e.g. "regular file", "anything other than a directory", "directory entry" etc). Also, define "in"; i.e. are you talking about a recursive search (like "find")?
Yes, I forgot to say "recursively". I have an OCaml implementation, but it's error-prone because it lacks "canonicalization", so I did not want to translate it into Haskell with the same problem. Currently I work around all this by forking "find", but that is error-prone too, because I have no way to distinguish a newline that ends a file name from a newline in the middle of one. I should put something like "///" at the end of each file name with "find -printf" and then parse that,
Use "find ... -print0", which NUL-terminates each filename. Having said that, the only real-world scenario in which you are likely to encounter filenames which contain embedded newlines is if someone created them with malicious intent. More on malicious intent below.
but it would really be preferable to write a Haskell library function equivalent to Unix find.
For recursive directory scanning, you don't need full
canonicalisation; you just need to be able to distinguish actual
directories from symlinks to directories (i.e. lstat()). Just ensure
that all symlinks are treated as leaves, along with "." and "..", and
you have a strict tree structure.
FWIW, this assumes that the OS doesn't allow hard links to be made to
directories. However, AFAIK, that's true of every version of Unix
which is still in use outside of a computer museum. Even on the ones
which did allow hard links to directories, directory-recursion tended
to exhibit undesirable (but entirely predictable) behaviour if you
actually did so.
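The rule above (symlinks, "." and ".." are leaves) can be sketched as follows, assuming the `directory`, `filepath` and `unix` libraries. The function name `walk` is hypothetical, and the sketch does nothing about entries that vanish or change during the scan.

```haskell
import Control.Monad (forM)
import System.Directory
import System.FilePath ((</>))
import System.Posix.Files

-- List every entry below 'dir' without ever following a symlink.
walk :: FilePath -> IO [FilePath]
walk dir = do
  names <- getDirectoryContents dir
  -- "." and ".." are leaves that we simply skip.
  let children = [dir </> n | n <- names, n `notElem` [".", ".."]]
  nested <- forM children $ \path -> do
    st <- getSymbolicLinkStatus path  -- lstat(): reports the link itself
    if isDirectory st                 -- False for a symlink to a directory
      then fmap (path :) (walk path)
      else return [path]
  return (concat nested)
```

Because a symlink to a directory is reported as a symlink by lstat(), it is listed but never entered, so a link pointing back up the tree cannot cause an endless recursion.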
Also, if you are concerned about security issues, you need to consider
the possibility of symlink races; i.e. where an attacker does:
chdir("foo");
mkdir("bar", mode);
/* "find" lstat()s "bar" and decides that it's a directory */
rename("bar", "_bar");
symlink("/etc", "bar");
/* "find" ends up chdir()ing into /etc */
To deal with that situation, calls to chdir() need to be followed up
with a check to ensure that they ended up where they thought they
would, e.g. by comparing the device:inode pair for "." with the values
obtained from the lstat() on the directory entry, or by comparing the
device:inode pair for ".." with those for the previous directory.
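The first of these checks (comparing the device:inode pair of "." with the earlier lstat() result) might look like this in Haskell, using the `unix` library. The function name `safeChdir` is made up for illustration, and this is only a sketch of the idea, not a hardened implementation.

```haskell
import System.Posix.Directory
import System.Posix.Files

-- Enter 'name' only if lstat() says it is a real directory, then
-- verify that we landed in that same directory.
-- Returns False without changing directory if 'name' is not a
-- directory. Also returns False if the identities differ, but in
-- that case the process is ALREADY in the wrong directory and the
-- caller has to recover (e.g. abort the scan).
safeChdir :: FilePath -> IO Bool
safeChdir name = do
  before <- getSymbolicLinkStatus name   -- lstat() on the directory entry
  if not (isDirectory before)
    then return False
    else do
      changeWorkingDirectory name
      after <- getFileStatus "."         -- stat() on where we ended up
      return (deviceID before == deviceID after
              && fileID before == fileID after)
```

If an attacker swaps the directory for a symlink between the lstat() and the chdir(), the inode (and possibly the device) of "." no longer matches, which is what the comparison detects.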
--
Glynn Clements