[GHC] #9114: Invalid UTF8 not round-tripped correctly

#9114: Invalid UTF8 not round-tripped correctly
------------------------------------+-------------------------------------
       Reporter:  nomeata           |             Owner:
           Type:  bug               |            Status:  new
       Priority:  normal            |         Milestone:
      Component:  libraries/base    |           Version:  7.6.3
       Keywords:                    |  Operating System:  Unknown/Multiple
   Architecture:  Unknown/Multiple  |   Type of failure:  None/Unknown
     Difficulty:  Unknown           |         Test Case:
     Blocked By:                    |          Blocking:
Related Tickets:                    |
------------------------------------+-------------------------------------

 As reported by Robert Bihlmeyer at http://bugs.debian.org/748125, the promised round-tripping of invalid UTF8 sequences in filenames through String does not work:

 ```
 $ mkdir foo
 $ touch foo/$(echo -e '\xC0\xB7.txt')
 $ ghc -e 'System.Directory.getDirectoryContents "foo" >>= print . last'
 "7.txt"
 ```

 The sequence 0xC0B7 is an (invalid, overlong) encoding of 0x37, i.e. `'7'`, so if it is mapped to `'7'`, no round-tripping is possible: encoding `'7'` again yields the single byte 0x37, not the original two bytes. (Other invalid byte sequences are round-tripped.)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9114
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
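The arithmetic behind the last paragraph of the description can be checked by hand. The sketch below is not GHC's decoder; `decodeTwoByte` and `isOverlong` are hypothetical helper names introduced only for illustration. It applies the standard two-byte UTF-8 layout (110xxxxx 10xxxxxx) to the bytes 0xC0 0xB7 and shows that the payload is 0x37, a code point that already fits in a single byte, which is what makes the sequence overlong.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper: extract the payload bits of a two-byte UTF-8
-- sequence (110xxxxx 10xxxxxx). It performs no validity checks.
decodeTwoByte :: Word8 -> Word8 -> Int
decodeTwoByte b0 b1 =
  ((fromIntegral b0 .&. 0x1F) `shiftL` 6) .|. (fromIntegral b1 .&. 0x3F)

-- Hypothetical helper: a two-byte sequence is overlong when its payload
-- fits in 7 bits and could have been encoded as one byte.
isOverlong :: Word8 -> Word8 -> Bool
isOverlong b0 b1 = decodeTwoByte b0 b1 < 0x80

main :: IO ()
main = do
  let cp = decodeTwoByte 0xC0 0xB7
  print cp                      -- 55, i.e. 0x37
  print (chr cp)                -- '7'
  print (isOverlong 0xC0 0xB7)  -- True
```

A decoder that accepts this sequence and returns `'7'` loses the information that two bytes were read, so encoding the resulting String cannot reproduce the original file name.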

#9114: Invalid UTF8 not round-tripped correctly

Comment (by nomeata):

 Sigh, got the syntax wrong and am lacking the rights to edit my own posts. If someone with the rights feels like it, please replace {{{```}}} by {{{{{{}}}

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9114#comment:1

#9114: Invalid UTF8 not round-tripped correctly

Description changed by tibbe:

Old description:

 As reported by Robert Bihlmeyer at http://bugs.debian.org/748125, the promised round-tripping of invalid UTF8 sequences in filenames through String does not work:

 ```
 $ mkdir foo
 $ touch foo/$(echo -e '\xC0\xB7.txt')
 $ ghc -e 'System.Directory.getDirectoryContents "foo" >>= print . last'
 "7.txt"
 ```

 The sequence 0xC0B7 is an (invalid, overlong) encoding of 0x37, i.e. `'7'`, so if it is mapped to `'7'`, no round-tripping is possible. (Other invalid byte sequences are round-tripped.)

New description:

 As reported by Robert Bihlmeyer at http://bugs.debian.org/748125, the promised round-tripping of invalid UTF8 sequences in filenames through String does not work:

 {{{
 $ mkdir foo
 $ touch foo/$(echo -e '\xC0\xB7.txt')
 $ ghc -e 'System.Directory.getDirectoryContents "foo" >>= print . last'
 "7.txt"
 }}}

 The sequence 0xC0B7 is an (invalid, overlong) encoding of 0x37, i.e. `'7'`, so if it is mapped to `'7'`, no round-tripping is possible. (Other invalid byte sequences are round-tripped.)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9114#comment:2

#9114: Invalid UTF8 not round-tripped correctly

Comment (by hvr):

 Shouldn't this rather be filed against `libraries/directory`?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9114#comment:3

#9114: Invalid UTF8 not round-tripped correctly

Comment (by nomeata):

 No, I believe the bug is in `GHC.IO.Encoding` or `GHC.IO.Encoding.UTF8`.

 I tried to give an example using that interface, but failed to work with these buffers; the following code always prints `'\NUL'`:

 {{{
 import GHC.IO.Encoding
 import GHC.IO.Buffer
 import GHC.IO.Encoding.Types

 main = do
     te <- getFileSystemEncoding
     case te of
         TextEncoding _ decIO _ -> do
             BufferCodec encode recover close getState setState <- decIO
             bb <- newByteBuffer 2 ReadBuffer
             writeWord8Buf (bufRaw bb) 0 0xC0
             writeWord8Buf (bufRaw bb) 1 0xB7
             checkBuffer bb
             cb <- newCharBuffer 1 WriteBuffer
             (InputUnderflow,_,cb') <- encode bb cb
             close
             c <- peekCharBuf (bufRaw cb') 0
             print c
 }}}

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9114#comment:4
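The buffer-level attempt above may print `'\NUL'` simply because the two bytes are written into the raw memory while the buffer's read/write offsets stay at zero, so the decoder possibly sees an empty input; that is a guess, not something confirmed in the ticket. A way to exercise the same file system encoding without touching `Buffer` directly is sketched below, assuming the `GHC.Foreign` marshalling functions `peekCStringLen` and `withCStringLen` from `base`. The helper name `roundTrip` is hypothetical. The result depends on the GHC version and the locale, so no expected output is given.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (getFileSystemEncoding)

-- Hypothetical helper: decode raw bytes to a String with the file system
-- encoding, then encode that String again and return both results.
roundTrip :: [Word8] -> IO (String, [Word8])
roundTrip bytes = do
  enc <- getFileSystemEncoding
  -- bytes -> String, as done for file names read from the OS
  str <- withArrayLen bytes $ \n p ->
    GHC.peekCStringLen enc (castPtr p, n)
  -- String -> bytes, as done for file names passed back to the OS
  out <- GHC.withCStringLen enc str $ \(p, n) ->
    peekArray n (castPtr p)
  return (str, out)

main :: IO ()
main = do
  let input = [0xC0, 0xB7]
  (str, out) <- roundTrip input
  print str
  print out
  putStrLn (if out == input then "round-tripped" else "NOT round-tripped")
```

If the decoded String is `"7"`, the bug described in this ticket is present; if the bytes come back unchanged, the invalid sequence survived the trip through String.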

#9114: Invalid UTF8 not round-tripped correctly
-------------------------------------+-------------------------------------
        Reporter:  nomeata           |                   Owner:  ekmett
            Type:  bug               |                  Status:  closed
        Priority:  normal            |               Milestone:
       Component:  Core Libraries    |                 Version:  7.6.3
      Resolution:  worksforme        |                Keywords:
Operating System:  Unknown/Multiple  |            Architecture:  Unknown/Multiple
 Type of failure:  None/Unknown      |               Test Case:
      Blocked By:                    |                Blocking:
 Related Tickets:                    |  Differential Revisions:
-------------------------------------+-------------------------------------
Changes (by thomie):

 * status:  infoneeded => closed
 * resolution:  => worksforme

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9114#comment:8