
On Sun, Nov 28, 2010 at 9:19 AM, Michael Snoyman wrote:
On Sun, Nov 28, 2010 at 8:53 AM, Yitzchak Gale wrote:
Michael Snoyman wrote:
Perhaps a silly question, but are you certain that the input file is valid UTF-8?
That is a very good point.
You could also try using the readFile from utf8-string... [or] read the contents as a lazy bytestring and then use the decode functions...
Those approaches are now both deprecated. Either do what you are doing, which gives you conceptually simple strings as lists of Char, or, for better efficiency, use the text package:
    -- readFile and putStr for lazy Text live in the .IO module
    import qualified Data.Text.Lazy.IO as T

    main :: IO ()
    main = do
      text <- T.readFile "unicode.txt"
      T.putStr text
In any case, you still need to have the correct encoding set on the handles as before. (And the input needs to be valid for your selected encoding.)
Which is why I would actually recommend sticking with the bytestring/text combination when you know what the file encoding will be and it is not system-dependent. It's the approach that I use with Hamlet et al for precisely that reason.
Sorry for replying to myself, but I didn't clarify that very well. You're right that setting encoding on the handle can work well enough for this, but it does *not* address invalid byte sequences (AFAIK), which can be dealt with using the bytestring/text decoding combination.

Michael
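
For illustration, here is a minimal sketch of the bytestring/text decoding combination mentioned above, assuming the file is meant to be UTF-8 and reusing the "unicode.txt" name from the earlier example. The helper names decodeStrict and decodeLenient are hypothetical, made up for this sketch; decodeUtf8', decodeUtf8With and lenientDecode come from the text package. It is one possible way to handle invalid input, not necessarily what Hamlet does.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.Encoding (decodeUtf8', decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)
import System.IO (hPutStrLn, stderr)

-- Hypothetical helper: strict decoding. Returns Left with a description
-- of the problem if the bytes are not valid UTF-8.
decodeStrict :: B.ByteString -> Either String T.Text
decodeStrict = either (Left . show) Right . decodeUtf8'

-- Hypothetical helper: lenient decoding. Invalid input is replaced with
-- the Unicode replacement character U+FFFD instead of failing.
decodeLenient :: B.ByteString -> T.Text
decodeLenient = decodeUtf8With lenientDecode

main :: IO ()
main = do
  -- Read raw bytes, so the handle's encoding plays no part on input.
  bytes <- B.readFile "unicode.txt"
  case decodeStrict bytes of
    Right text -> TIO.putStr text
    Left err -> do
      hPutStrLn stderr ("invalid UTF-8 in input: " ++ err)
      TIO.putStr (decodeLenient bytes)
```

Note that the output side still goes through stdout's handle encoding, so that part of the earlier advice continues to apply.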