Re: Abstract FilePath Proposal

28 Jun 2015

Normalization is a very hairy issue, which is not just platform specific
but also filesystem specific. Mac OS X is probably the worst of all worlds
in that respect, where HFS+ will do NFD normalization and may or may not
have case sensitivity depending on how that partition was formatted.
Network file shares and disk images may or may not have case sensitivity
and can use either NFD or NFC normalization based on mount options.

Contrary to statements earlier in the thread, NFD normalization happens on
HFS+ filesystems (the default) regardless of whether you're using POSIX
APIs or not. It's easy to prove this to yourself by creating a file with
U+00E9 (LATIN SMALL LETTER E WITH ACUTE) in the name (from any of the APIs)
and you'll see it come back out (e.g. from readdir) as two code points: 'e'
and then U+0301 (COMBINING ACUTE ACCENT). It'll also do some weird
transformations to file names that contain byte sequences that are not
valid UTF-8.
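
To make the two forms concrete without needing an HFS+ volume, here is a small sketch using only base. The names nfcName, nfdName and codePoints are made up for illustration; the sketch only shows that the precomposed and decomposed spellings are different code point sequences, which is why a name can come back from readdir looking different from what was written.

```haskell
import Data.Char (ord, toUpper)
import Numeric (showHex)

-- "é" as a single precomposed code point (the NFC spelling).
nfcName :: String
nfcName = "\x00E9"

-- "é" as 'e' followed by COMBINING ACUTE ACCENT (the NFD spelling).
nfdName :: String
nfdName = "e\x0301"

-- Render each character as U+XXXX so the difference is visible.
codePoints :: String -> [String]
codePoints = map fmt
  where
    fmt c = "U+" ++ pad (map toUpper (showHex (ord c) ""))
    pad s = replicate (4 - length s) '0' ++ s
```

In GHCi, codePoints nfcName gives ["U+00E9"] and codePoints nfdName gives ["U+0065","U+0301"], so nfcName == nfdName is False even though both render as the same glyph.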

On Sun, Jun 28, 2015 at 12:05 PM, Edward Kmett wrote:
...
Worse there are situations where you absolutely _have_ to be able to use
\\?\ encoding of a path on Windows to read, modify or delete files with
"impossible names" that were created by other means.
e.g. filenames like AUX, which had traditional roles under DOS and cause
weird interactions, or files that were created with "impossibly long names"
-- which can happen in the wild when you move directories around, etc.
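
As a rough illustration of the "impossible names" problem, the sketch below flags path components that collide with the DOS device names and detects the \\?\ prefix. The function names are hypothetical, the device list is the commonly documented one (CON, PRN, AUX, NUL, COM1-9, LPT1-9), and the sketch ignores other Win32 rules such as trailing dots and spaces.

```haskell
import Data.Char (toUpper)
import Data.List (isPrefixOf)

-- Device names commonly documented as reserved in the Win32 namespace.
reservedNames :: [String]
reservedNames =
  ["CON", "PRN", "AUX", "NUL"]
    ++ [dev ++ show n | dev <- ["COM", "LPT"], n <- [1 .. 9 :: Int]]

-- True when a single path component is a reserved device name.
-- Case-insensitive, and the extension is ignored, so "aux.txt" is flagged too.
isReservedComponent :: String -> Bool
isReservedComponent name = map toUpper base `elem` reservedNames
  where
    base = takeWhile (/= '.') name

-- True when the path already starts with the \\?\ prefix.
hasVerbatimPrefix :: String -> Bool
hasVerbatimPrefix = isPrefixOf "\\\\?\\"
```

A path library that rewrote or rejected such components would make these files unreachable, which is the argument for leaving the path untouched.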
I'm weakly in favor of the proposal precisely because it is the first
version of this concept that I've seen that DOESN'T try to get too clever
with regards to adding all sorts of normalization and this proposal seems
to be the simplest move that would enable us to do something correctly in
the future, regardless of what that correct thing winds up being.
-Edward
On Sun, Jun 28, 2015 at 8:09 AM, David Turner <
dct25-561bs@mythic-beasts.com> wrote:
...
Hi,
I think it'd be more robust to handle normalisation when converting from
String/Text to FilePath (and combining things with (</>) and so on) rather
than in the underlying representation.
It's absolutely crucial that you can ask the OS for a filename (which it
gives you as a sequence of bytes) and then pass that exact same sequence of
bytes back to the OS without any normalisation or other useful alterations
having taken place.
You can do some deeply weird stuff in Windows by starting an absolute
path with \\?\, including apparently using '.' and '..' as the name of a
filesystem component:
Because it turns off automatic expansion of the path string, the "\\?\"
prefix also allows the use of ".." and "." in the path names, which can be
useful if you are attempting to perform operations on a file with these
otherwise reserved relative path specifiers as part of the fully qualified
path.
(from
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).a...
)
I don't fancy shaking all the corner cases out of this. An explicit
'normalise' function seems ok, but baking normalisation into the type
itself seems bad.
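
A minimal sketch of that split, assuming a hypothetical RawPath type that stores the bytes exactly as the OS returned them. fromOS, toOS and this particular normalise are illustrative names and not part of any proposal; the point is only that the round trip is the identity and normalisation is something the caller asks for.

```haskell
import Data.Word (Word8)

-- Hypothetical opaque path: the exact bytes the OS handed over.
newtype RawPath = RawPath [Word8]
  deriving (Eq, Show)

-- Wrapping and unwrapping never alter the bytes.
fromOS :: [Word8] -> RawPath
fromOS = RawPath

toOS :: RawPath -> [Word8]
toOS (RawPath bs) = bs

-- Explicit, opt-in normalisation: collapse runs of '/' (0x2F) into one.
-- This rule is the sketch's own choice, not a claim about any platform.
normalise :: RawPath -> RawPath
normalise (RawPath bs) = RawPath (go bs)
  where
    go (0x2F : rest@(0x2F : _)) = go rest
    go (b : rest) = b : go rest
    go [] = []
```

With this shape, toOS (fromOS bs) == bs holds for every byte sequence, including ones that are not valid UTF-8, while normalise is available to callers who want it.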
Cheers,
David
On 28 June 2015 at 11:03, Boespflug, Mathieu wrote:
...
Hi Neil,
why does the proposal *not* include normalization?
There are four advantages that I see to making FilePath a datatype:
1. it makes it possible to implement the correct semantics for some
systems (including POSIX),
2. it allows for information hiding, which in turn helps modularity,
3. the type is distinct from any other type, hence static checks are
stronger,
4. it becomes possible to quotient values over some arbitrary set of
identities that makes sense. i.e. in the case of FilePath, arguably
"foo/bar//baz" *is* "foo/bar/baz" *is* "foo//bar/baz" for all intents
and purposes, so it is not useful to distinguish these three ways of
writing down the same path (and in fact in practice distinguishing
them leads to subtle bugs). That is, the Eq instance compares
FilePaths modulo a few laws.
Do you propose to forgo (4)? If so, why?
If we're going through a deprecation process, could we do so once, by
getting the notion of path equality we want right the first time?
Contrary to type indexing FilePath, it seems to me that the design
space for path identities is much smaller. Essentially, exactly the
ones here:
https://hackage.haskell.org/package/filepath-1.1.0.2/docs/System-FilePath-Po...
.
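
For illustration only, point (4) could look something like the sketch below. The Path wrapper, components and isRooted are hypothetical names, and the set of identities is just the repeated-separator one from the example. Note that this version also treats a trailing slash as insignificant, which is a further design choice the thread has not settled.

```haskell
-- Hypothetical wrapper; the stored string is kept exactly as given.
newtype Path = Path String
  deriving Show

-- Split on '/' and drop the empty pieces produced by repeated separators.
components :: String -> [String]
components s = case break (== '/') s of
  (c, [])       -> keep c
  (c, _ : rest) -> keep c ++ components rest
  where
    keep c = [c | not (null c)]

-- A leading '/' is significant, so it is compared separately.
isRooted :: String -> Bool
isRooted = (== "/") . take 1

-- Equality modulo repeated separators: the quotient described in (4).
instance Eq Path where
  Path a == Path b =
    isRooted a == isRooted b && components a == components b
```

Here Path "foo/bar//baz" == Path "foo/bar/baz" is True while Path "/foo" == Path "foo" is False. The stored string is untouched, so the quotient lives in the Eq instance and not in the representation.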
Best,
Mathieu
On 27 June 2015 at 12:12, Neil Mitchell wrote:
...
Hi Niklas,
The function writeFile takes a FilePath. We could fork base or tell everyone
to use writeFile2, but in practice everyone will keep using writeFile, and
thus String for FilePath. This approach is the only thing we could figure
that made sense.
Henning: we do not propose normalisation on initialisation. For ASCII
strings fromFilePath . toFilePath will be id. It might also be for unicode
on some/all platforms. Of course, you can write your own FilePath creator
that does normalisation on construction.
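
A rough sketch of what that could look like. toFilePath and fromFilePath are the names used above; the AbstractPath type, its String representation and toNormalisedFilePath are placeholders assumed for illustration, since the real representation is deliberately hidden.

```haskell
-- Placeholder representation; the proposal keeps the real one abstract.
newtype AbstractPath = AbstractPath String

-- No normalisation on construction.
toFilePath :: String -> AbstractPath
toFilePath = AbstractPath

fromFilePath :: AbstractPath -> String
fromFilePath (AbstractPath s) = s

-- The round-trip law stated above, as a checkable predicate.
roundTrips :: String -> Bool
roundTrips s = fromFilePath (toFilePath s) == s

-- A user-written creator that normalises on construction.
-- Turning '\\' into '/' is just an example rule chosen for this sketch.
toNormalisedFilePath :: String -> AbstractPath
toNormalisedFilePath = toFilePath . map slash
  where
    slash '\\' = '/'
    slash c    = c
```

With the placeholder representation roundTrips holds for every String; with a real platform encoding it is only promised for ASCII, as stated above.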
Thanks, Neil
On Saturday, 27 June 2015, Niklas Larsson wrote:
...
Hi!
Instead of trying to minimally patch the existing API and still breaking
loads of code, why not make a new API that doesn't have to compromise and
deprecate the old one?
Niklas
________________________________
From: Herbert Valerio Riedel
Sent: 2015-06-26 18:09
To: libraries@haskell.org; ghc-devs@haskell.org
Subject: Abstract FilePath Proposal
Hello *,
What?
=====
We (see From: & CC: headers) propose, plain and simple, to turn the
currently defined type-synonym
    type FilePath = String
into an abstract/opaque data type instead.
Why/How/When?
=============
For details (including motivation and a suggested transition scheme)
please consult
https://ghc.haskell.org/trac/ghc/wiki/Proposal/AbstractFilePath
Suggested discussion period: 4 weeks
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Re: Abstract FilePath Proposal

Bob Ippolito