Reading files efficiently

19 Mar 2006

I've got another n00b question.  Thanks for all the help you have been 
giving me!

I want to read a text file.  As an example, let's use 
/usr/share/dict/words and try to print out the last line of the file. 
First of all I came up with this program:

import System.IO
main = readFile "/usr/share/dict/words" >>= putStrLn.last.lines

This program gives the following error, presumably because there is an 
ISO-8859-1 character in the dictionary:
"Program error: <handle>: IO.getContents: protocol error (invalid 
character encoding)"

How can I tell the Haskell system that it is to read ISO-8859-1 text 
rather than UTF-8?
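
A minimal sketch of one way to do this, assuming a GHC release whose
System.IO exports hSetEncoding and latin1 (these arrived in GHC after
this message was written, and Hugs may not provide them).  The helper
name readFileLatin1 is made up for illustration:

```haskell
import System.IO

-- Hypothetical helper: read a whole file, decoding it as ISO-8859-1.
-- Assumes System.IO provides hSetEncoding and latin1.
readFileLatin1 :: FilePath -> IO String
readFileLatin1 path = do
  h <- openFile path ReadMode
  hSetEncoding h latin1   -- decode bytes as Latin-1 instead of the locale default
  hGetContents h          -- lazy read; the handle closes once the input is consumed

main :: IO ()
main = readFileLatin1 "/usr/share/dict/words" >>= putStrLn . last . lines
```

Every byte is a valid ISO-8859-1 character, so decoding this way should
not raise the "invalid character encoding" error.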

I then used iconv to convert the file to UTF-8 and tried again.  This 
time it worked, but it seems horribly inefficient -- Hugs took 2.8 
seconds to read a 96,000-line file.  By contrast the equivalent Python 
program:

print open("words", "r").readlines()[-1]

took 0.05 seconds.  I assume I must be doing something wrong here, and 
somehow causing Haskell to use a particularly inefficient algorithm. 
Can anyone give me any clues what I should be doing instead?
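
One possible direction, sketched here under the assumption that GHC and
the bytestring package are available (Hugs is not covered): read the
file as a strict ByteString rather than a lazy String.  The helper name
lastLine is made up for illustration.  Since the bytes are never
decoded, this also avoids the encoding error above, at the cost of
treating the text as raw 8-bit characters:

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical helper: the last line of a block of bytes,
-- or the empty string if there are no lines at all.
lastLine :: B.ByteString -> B.ByteString
lastLine contents = case B.lines contents of
  [] -> B.empty
  ls -> last ls

main :: IO ()
main = B.readFile "/usr/share/dict/words" >>= B.putStrLn . lastLine
```

A String is a linked list of characters, so building one for every line
of a large file is comparatively expensive; a ByteString keeps the data
in a packed byte array, which is closer to what the Python version does.
I have not timed this against the numbers above.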

Thanks again,
Pete

Pete Chown
