
#13486: inconsistency in handling the BOM Byte-order-mark in reading and putStrLn
------------------------------------------------+---------------------------------------
        Reporter:  andrewufrank                 |               Owner:  (none)
            Type:  bug                          |              Status:  new
        Priority:  normal                       |           Milestone:
       Component:  Compiler                     |             Version:  8.0.2
      Resolution:                               |            Keywords:
Operating System:  Linux                        |        Architecture:  Unknown/Multiple
 Type of failure:  Poor/confusing error message |           Test Case:
      Blocked By:                               |            Blocking:
 Related Tickets:                               | Differential Rev(s):
       Wiki Page:                               |
------------------------------------------------+---------------------------------------

Description changed by andrewufrank:

This is a very annoying issue and has already been discussed (e.g. in #1744 and at https://mail.haskell.org/pipermail/haskell-cafe/2011-January/088021.html).
I think it is OK that the BOM character is not automatically removed when reading a file, but it is then INCONSISTENT not to show the BOM character when printing the file content.
A minimal test:
{{{
main :: IO ()
main = do
  v <- readFile "fileWithBOM"
  putStrLn "the file content"
  putStrLn v
  putStrLn (show v)
  return ()
}}}
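The same observation can be reproduced without an external file. This is a minimal sketch, not part of the original report: it assumes a UTF-8 locale for stdout, creates a hypothetical file bom-demo.txt, and reads it through a handle with an explicit utf8 encoding instead of readFile's locale encoding. Plain utf8 decoding keeps the BOM as the character '\xFEFF' at the start of the string; putStrLn does write that character back out, but since U+FEFF is a zero-width character, a terminal normally displays nothing for it, which is why only show makes it visible. The helper hasBOM is hypothetical.

```haskell
import System.IO

-- Hypothetical helper: True if the decoded text starts with U+FEFF,
-- i.e. a BOM that was not removed while reading.
hasBOM :: String -> Bool
hasBOM ('\xFEFF' : _) = True
hasBOM _              = False

main :: IO ()
main = do
  -- Hypothetical test file, created here so the sketch needs no external input.
  let path = "bom-demo.txt"
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h "\xFEFF.sprache English\n"
  -- Plain utf8 decoding keeps the BOM as the first Char of the result.
  v <- withFile path ReadMode $ \h -> do
    hSetEncoding h utf8
    s <- hGetContents h
    length s `seq` return s   -- force the lazy read before withFile closes h
  putStrLn v                  -- the BOM is written too, but usually renders as nothing
  print (take 1 v)            -- "\65279"
  print (hasBOM v)            -- True
```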
The output of putStrLn v gives no indication that the input contains a BOM character that was not removed from the result; only the last putStrLn, with its otherwise unusual show on the result string, reveals the presence of the BOM character:
"\65279\r\n.sprache English\r\n\.....
Consistency here is important to warn the programmer early (when reading and checking the file content), because other tools (e.g. parsec) see the BOM character and fail.
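To illustrate why a retained BOM trips up downstream parsers, here is a sketch with a hypothetical toy parser, parseDirective, standing in for a parsec grammar (parsec itself is not used, to keep the example dependency-free). The input is shaped like the show output above: BOM, CRLF, then ".sprache English". Because U+FEFF is a Unicode format character rather than whitespace, even a parser that skips leading whitespace stops at it. Stripping one leading '\xFEFF' before parsing, via the hypothetical helper stripBOM, is one common workaround, not something GHC does for you.

```haskell
import Data.Char (isSpace)

-- Hypothetical helper: drop one leading U+FEFF (a decoded BOM), if present.
stripBOM :: String -> String
stripBOM ('\xFEFF' : rest) = rest
stripBOM s                 = s

-- Hypothetical stand-in for a real parser (e.g. one written with parsec):
-- skips leading whitespace, then expects '.' followed by a directive name.
parseDirective :: String -> Maybe String
parseDirective s = case dropWhile isSpace s of
  ('.' : rest) -> case words rest of
    (name : _) -> Just name
    []         -> Nothing
  _ -> Nothing

main :: IO ()
main = do
  -- Input shaped like the file in the report: BOM, CRLF, then the first directive.
  let input = "\xFEFF\r\n.sprache English\r\n"
  -- U+FEFF is a format character, not whitespace, so isSpace does not skip it.
  print (isSpace '\xFEFF')                -- False
  print (parseDirective input)            -- Nothing
  print (parseDirective (stripBOM input)) -- Just "sprache"
```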
I recommend that the BOM character be read but also shown by putStrLn; I guess this is preferable to automatic (silent) removal. Reading it in but not showing it, however, leads to misguided searches for strange errors caused by the BOM.
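Both behaviours weighed above can already be approximated in user code. This is a minimal sketch using only base, assuming a UTF-8 locale: the hypothetical helper visibleBOM renders U+FEFF as the marker <U+FEFF> when printing (the "read but show" option), while the utf8_bom encoding exported by System.IO gives opt-in silent removal, because it drops a leading BOM when decoding. The file name bom-demo.txt and the helper readWith are hypothetical.

```haskell
import System.IO

-- Hypothetical debugging helper: replace every U+FEFF with a visible marker.
visibleBOM :: String -> String
visibleBOM = concatMap render
  where
    render '\xFEFF' = "<U+FEFF>"
    render c        = [c]

-- Hypothetical helper: read a whole file with the given encoding,
-- forcing the lazy contents before the handle is closed.
readWith :: TextEncoding -> FilePath -> IO String
readWith enc path = withFile path ReadMode $ \h -> do
  hSetEncoding h enc
  s <- hGetContents h
  length s `seq` return s

main :: IO ()
main = do
  let path = "bom-demo.txt" -- hypothetical file, created here
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h "\xFEFF.sprache English\n"
  raw <- readWith utf8 path        -- BOM kept
  putStr (visibleBOM raw)          -- <U+FEFF>.sprache English
  clean <- readWith utf8_bom path  -- leading BOM dropped while decoding
  print (take 1 clean)             -- "."
```

Showing the marker keeps the BOM in the data, so a parser still sees it; utf8_bom removes it, but only for handles where the programmer explicitly asks for it.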
--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13486#comment:1
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler