
#13486: inconsistency in handling the BOM Byte-order-mark in reading and putStrLn
-------------------------------------+-------------------------------------
           Reporter:  andrewufrank   |             Owner:  (none)
               Type:  bug            |            Status:  new
           Priority:  normal         |         Milestone:
          Component:  Compiler       |           Version:  8.0.2
           Keywords:                 |  Operating System:  Linux
       Architecture:                 |   Type of failure:  Poor/confusing
  Unknown/Multiple                   |  error message
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 This is a very annoying issue and has been discussed already (e.g. #1744
 and https://mail.haskell.org/pipermail/haskell-cafe/2011-January/088021.html).

 I think it is OK that the BOM character is not automatically removed when
 reading a file, but it is then INCONSISTENT not to show the BOM character
 when printing the file content.

 A minimal test:

     main :: IO ()
     main = do
         v <- readFile "fileWithBOM"
         putStrLn "the file content"
         putStrLn v          -- the BOM is not visible in this output
         putStrLn (show v)   -- the BOM appears as \65279
         return ()

 The output of putStrLn v gives no indication that the input contained a BOM
 character and that it is still part of the result. Only the second
 putStrLn, which applies show to the result string, demonstrates the
 presence of the BOM character:

     "\65279\r\n.sprache English\r\n\.....

 Consistency here is important to warn the programmer early on (when
 checking the file content right after reading it), because other tools
 (e.g. parsec) see the BOM character and fail.

 I recommend that the BOM character is read and also shown by putStrLn; I
 guess this is preferable to automatic (silent) removal. Reading the BOM in
 and then not showing it, however, leads to misguided searches for strange
 errors caused by the BOM.

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13486
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
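
For anyone hitting the behaviour described in this ticket, here is a minimal sketch of how a program could detect a leading BOM, make it visible, or drop it explicitly. The helper names (bom, hasBOM, showBOM, stripBOM, readFileSkippingBOM) are hypothetical and chosen for illustration only; they are not part of base and not a proposed fix for GHC. The only library features relied on are System.IO's openFile, hSetEncoding, hGetContents and the utf8_bom encoding, which ignores a BOM at the start of the input. The sketch assumes the file is UTF-8.

```haskell
import System.IO

-- | U+FEFF, the character that `show` renders as "\65279".
bom :: Char
bom = '\xFEFF'

-- | True if the string starts with a BOM.
hasBOM :: String -> Bool
hasBOM (c:_) = c == bom
hasBOM []    = False

-- | Replace a leading BOM with a visible marker, for diagnostics.
showBOM :: String -> String
showBOM (c:rest) | c == bom = "<BOM>" ++ rest
showBOM s                   = s

-- | Drop a leading BOM explicitly; other strings are returned unchanged.
stripBOM :: String -> String
stripBOM (c:rest) | c == bom = rest
stripBOM s                   = s

-- | Read a file as UTF-8, letting the utf8_bom encoding skip an initial BOM.
--   Assumption: the file really is UTF-8. The contents are read lazily.
readFileSkippingBOM :: FilePath -> IO String
readFileSkippingBOM path = do
    h <- openFile path ReadMode
    hSetEncoding h utf8_bom
    hGetContents h
```

With these helpers, putStrLn (showBOM v) would print a visible marker for a string read with readFile, while readFileSkippingBOM is an explicit opt-in for the silent removal that the report argues should not be the default.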