[GHC] #13486: inconsistency in handling the BOM Byte-order-mark in reading and putStrLn

#13486: inconsistency in handling the BOM Byte-order-mark in reading and putStrLn
-------------------------------------+-------------------------------------
           Reporter:  andrewufrank   |             Owner:  (none)
               Type:  bug            |            Status:  new
           Priority:  normal         |         Milestone:
          Component:  Compiler       |           Version:  8.0.2
           Keywords:                 |  Operating System:  Linux
       Architecture:                 |   Type of failure:  Poor/confusing
  Unknown/Multiple                   |  error message
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
This is a very annoying issue and has been discussed already (e.g. #1744 and https://mail.haskell.org/pipermail/haskell-cafe/2011-January/088021.html).

I think it is OK that the BOM character is not automatically removed when reading a file, but it is then INCONSISTENT not to show the BOM character when printing the file content.

A minimal test:

    v <- readFile "fileWithBOM"
    putStrLn "the file content"
    putStrLn v
    putStrLn (show v)
    return ()

The first putStrLn of the content (putStrLn v) gives no indication that there is a BOM character in the input and that it was not removed from the result. Only the second putStrLn (with the incorrect show applied to the result string) demonstrates the presence of the BOM character:

    "\65279\r\n.sprache English\r\n\.....

Consistency here is important to warn the programmer early on (after reading and checking the file content), because other tools (e.g. parsec) see the BOM character and fail.

I recommend that the BOM character is read but also shown by putStrLn; I guess this is preferable to automatic (silent) removal. Reading it in and not showing it, however, leads to misguided searches for strange errors caused by the BOM.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13486
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

Description changed by andrewufrank:
New description:

This is a very annoying issue and has been discussed already (e.g. #1744 and https://mail.haskell.org/pipermail/haskell-cafe/2011-January/088021.html).

I think it is OK that the BOM character is not automatically removed when reading a file, but it is then INCONSISTENT not to show the BOM character when printing the file content.

A minimal test:

{{{
v <- readFile "fileWithBOM"
putStrLn "the file content"
putStrLn v
putStrLn (show v)
return ()
}}}

The first putStrLn of the content (putStrLn v) gives no indication that there is a BOM character in the input and that it was not removed from the result. Only the second putStrLn (with the incorrect show applied to the result string) demonstrates the presence of the BOM character:

    "\65279\r\n.sprache English\r\n\.....

Consistency here is important to warn the programmer early on (after reading and checking the file content), because other tools (e.g. parsec) see the BOM character and fail.

I recommend that the BOM character is read but also shown by putStrLn; I guess this is preferable to automatic (silent) removal. Reading it in and not showing it, however, leads to misguided searches for strange errors caused by the BOM.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13486#comment:1
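To illustrate the difference the description points at, here is a minimal sketch using only the Prelude. The names bom, startsWithBOM and visible are hypothetical helpers, not part of base. The idea is that putStrLn sends U+FEFF (decimal 65279) to the terminal unchanged, where it is typically rendered with zero width, whereas show escapes every character above '\DEL' and so makes it visible.

```haskell
-- Hypothetical helper names; only Prelude functions are used.

-- | The byte order mark as a Haskell character (U+FEFF, decimal 65279).
bom :: Char
bom = '\65279'

-- | True when the string begins with a BOM character.
startsWithBOM :: String -> Bool
startsWithBOM (c : _) = c == bom
startsWithBOM []      = False

-- | Render a string the way 'show' does: quoted, with control and
-- non-ASCII characters escaped, so a leading BOM appears as \65279.
visible :: String -> String
visible = show
```

A program could call startsWithBOM on the result of readFile right after reading, rather than relying on the printed output to reveal the character.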

Comment (by bgamari):

Ugh, this is rather unfortunate. I wonder what other language implementations do here.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13486#comment:2

Comment (by andrewufrank):

I do not see a systematically correct solution (given that the BOM is not a systematically correct feature ..). Therefore I suggest warnings in the different haddock comments:

- for putStrLn: invisible characters (e.g. a BOM) may not be shown (use "putStrLn . show" to make them visible)
- for parsec (and others): consider invisible characters (e.g. a BOM at the start of the file) and use
      optional (char '\65279') -- to remove BOM if present

The problem is not the BOM in particular, but the existence of completely invisible characters in UTF-8.

Alternatively, consider a readFile alternative which does not read a BOM at the beginning.

None of these solutions is really attractive!

andrew

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13486#comment:3
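The "readFile alternative" mentioned above could look roughly like the following sketch. The names stripBOM and readFileSkippingBOM are hypothetical and not part of base. The sketch assumes the file is UTF-8 and relies on the utf8_bom encoding from System.IO, which is documented to ignore a BOM at the start of the input stream.

```haskell
import System.IO

-- | Hypothetical helper: drop a single leading BOM (U+FEFF) if present.
-- A BOM anywhere else in the string is left untouched.
stripBOM :: String -> String
stripBOM ('\65279' : rest) = rest
stripBOM s                 = s

-- | Hypothetical readFile variant: decode the file as UTF-8 and let the
-- utf8_bom encoding skip a BOM at the start of the stream.
-- Assumption: the file really is UTF-8 encoded.
readFileSkippingBOM :: FilePath -> IO String
readFileSkippingBOM path = do
  h <- openFile path ReadMode
  hSetEncoding h utf8_bom
  hGetContents h  -- lazy; the handle is closed once the contents are consumed
```

stripBOM is the pure option for strings that were already read with plain readFile, for example before handing them to a parser. readFileSkippingBOM avoids ever seeing the character, at the cost of fixing the encoding to UTF-8 instead of using the locale encoding.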