[GHC] #10907: GHC fails to read file with byte-order mark when LANG=C

#10907: GHC fails to read file with byte-order mark when LANG=C
-------------------------------------+-------------------------------------
Reporter: RyanGlScott                | Owner:
Type: bug                            | Status: new
Priority: normal                     | Milestone:
Component: Compiler (Parser)         | Version: 7.10.2
Keywords:                            | Operating System: Linux
Architecture: x86_64 (amd64)         | Type of failure: GHC doesn't work at all
Test Case:                           | Blocked By:
Blocking:                            | Related Tickets: #6016, #6037
Differential Revisions:              |
-------------------------------------+-------------------------------------
I've attached a minimal example that causes the problem. Compiling
`ByteOrderMark.hs` like so fails:

{{{
$ LANG=C ghc ByteOrderMark.hs
ByteOrderMark.hs: hLookAhead: invalid argument (invalid byte sequence)
}}}

I can reproduce this on x86_64 Linux using GHC 7.10.2 and HEAD.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10907
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

Changes (by RyanGlScott):

* Attachment "ByteOrderMark.hs" added.

Minimal example

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10907

Description changed by RyanGlScott:

Old description:

> I've attached a minimal example that causes the problem. Compiling
> `ByteOrderMark.hs` like so fails:
>
> {{{
> $ LANG=C ghc ByteOrderMark.hs
> ByteOrderMark.hs: hLookAhead: invalid argument (invalid byte sequence)
> }}}
>
> I can reproduce this on x86_64 Linux using GHC 7.10.2 and HEAD.

New description:

I've attached a minimal example that causes the problem. Compiling
`ByteOrderMark.hs` like so fails:

{{{
$ LANG=C ghc ByteOrderMark.hs
ByteOrderMark.hs: hLookAhead: invalid argument (invalid byte sequence)
}}}

I can reproduce this on x86_64 Linux using GHC 7.10.2 and HEAD.

Note that compiling it without `LANG=C` does work.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10907#comment:1
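
For readers without access to the attachment, a file of the same kind can be produced with a few lines of Haskell. This is only a sketch: the module text and the names `bomModule` and `writeBomModule` are hypothetical stand-ins, not the contents of the attached `ByteOrderMark.hs`. The only property that matters here is that the file begins with the UTF-8 byte-order mark (bytes `EF BB BF`).

```haskell
import System.IO

-- Hypothetical stand-in for the attached ByteOrderMark.hs: a trivial module
-- whose first three bytes are the UTF-8 byte-order mark (EF BB BF).
-- The "\&" escape only terminates the preceding hex escape.
bomModule :: String
bomModule = "\xEF\xBB\xBF\&module Main where\nmain :: IO ()\nmain = return ()\n"

-- Write the module through a binary handle so that each Char below 256 is
-- written as exactly one byte, independent of the current locale.
writeBomModule :: FilePath -> IO ()
writeBomModule path = withBinaryFile path WriteMode (\h -> hPutStr h bomModule)
```

Calling `writeBomModule "ByteOrderMark.hs"` and then running the `LANG=C ghc ByteOrderMark.hs` command from the description should exercise the same code path, assuming the attachment differs only in its module body.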

Comment (by osa1):

Just tried this and it worked fine on my x86_64 Linux, with GHC 7.10.1
and HEAD. Is there anything else I need to set other than `LANG`?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10907#comment:2

Comment (by osa1):

Sorry, I had to unset `$LC_CTYPE`; now I can reproduce.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10907#comment:3

Comment (by nomeata):

I can confirm as well. Furthermore, it worked with 7.8, and it only seems
to affect files with a BOM; files with other Unicode characters but no BOM
are read successfully.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10907#comment:4

Comment (by thomie):

It's quite likely caused by my patch for #6016.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10907#comment:5

Comment (by nomeata):

The problem seems to be `skipBOM` in `StringBuffer.hs`, which switches to
text mode so that `hLookAhead` is able to consume the whole BOM instead of
just its first byte. But in text mode we are locale dependent.

At first I thought it would make sense to stay in binary mode, but then
`hLookAhead` returns just one byte, which is not enough to detect a BOM.
Using `hGetChar` twice would help, but if there is no BOM, we'd have to
rewind. Are we sure we can `hSeek` on all buffers that we need to?

A `Word16` encoding would help. Or maybe it works well enough to force
UTF-8 for this single `hLookAhead`.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10907#comment:6
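
The last idea in this comment, forcing UTF-8 for the single `hLookAhead`, could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from GHC or from any patch on this ticket: `skipBOMSketch` is a hypothetical name, the handle is assumed to be in binary mode on entry, and the `"UTF-8//IGNORE"` encoding (which skips undecodable bytes) is a choice made here so that the lookahead cannot fail with an invalid byte sequence.

```haskell
import Control.Exception (bracket_)
import Control.Monad (unless, when)
import System.IO

-- Hypothetical sketch: skip a leading UTF-8 byte-order mark on a handle that
-- is in binary mode, without consulting LANG or LC_CTYPE.
skipBOMSketch :: Handle -> IO ()
skipBOMSketch h = do
  eof <- hIsEOF h
  unless eof $ do
    -- Decode as UTF-8 regardless of the locale; "//IGNORE" drops bytes that
    -- are not valid UTF-8 instead of raising "invalid byte sequence".
    enc <- mkTextEncoding "UTF-8//IGNORE"
    -- Use the UTF-8 decoder only for the lookahead, then return to binary
    -- mode so the rest of the file is read as raw bytes again.
    bracket_ (hSetEncoding h enc) (hSetBinaryMode h True) $ do
      c <- hLookAhead h
      -- U+FEFF is the BOM; consume it only when it is actually present.
      when (c == '\xfeff') (hGetChar h >> return ())
    -- Not handled in this sketch: a file that contains only undecodable
    -- bytes, for which hLookAhead would reach end of file.
```

Because the decoder is chosen explicitly, no rewind with `hSeek` is needed when there is no BOM: `hLookAhead` leaves the character in the buffer, and switching back to binary mode makes the same bytes available to later reads.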

Comment (by nomeata):

> It's quite likely caused by my patch for #6016.

I believe so as well. Since you have touched that last, shall I leave this
to you?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10907#comment:7

Comment (by thomie):

Please go ahead and fix it yourself if you know of a good solution.
Otherwise I'll put it on my list.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10907#comment:8

Changes (by nomeata):

* status: new => patch
* differential: => Phab:D1274

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10907#comment:9

Comment (by Joachim Breitner

Changes (by nomeata):

* status: patch => closed
* resolution: => fixed

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10907#comment:11