Reading files efficiently

I've got another n00b question, thanks for all the help you have been giving me!
I want to read a text file. As an example, let's use /usr/share/dict/words and try to print out the last line of the file. First of all I came up with this program:
    import System.IO
    main = readFile "/usr/share/dict/words" >>= putStrLn.last.lines
This program gives the following error, presumably because there is an ISO-8859-1 character in the dictionary: "Program error: <handle>: IO.getContents: protocol error (invalid character encoding)"
How can I tell the Haskell system that it is to read ISO-8859-1 text rather than UTF-8?
I now used iconv to convert the file to UTF-8 and tried again. This time it worked, but it seems horribly inefficient -- Hugs took 2.8 seconds to read a 96,000 line file. By contrast the equivalent Python program:
    print open("words", "r").readlines()[-1]
took 0.05 seconds. I assume I must be doing something wrong here, and somehow causing Haskell to use a particularly inefficient algorithm. Can anyone give me any clues what I should be doing instead?
Thanks again,
Pete
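The encoding question above is not picked up in the replies shown here. As a minimal sketch only, assuming a GHC whose System.IO exports hSetEncoding and latin1 (this is not a Hugs answer), a handle can be told to decode ISO-8859-1 before it is read. The function name lastLineLatin1 is made up for illustration.

```haskell
import System.IO

-- Hypothetical helper: read a file as ISO-8859-1 (Latin-1) text and
-- return its last line, or "" if the file has no lines.
lastLineLatin1 :: FilePath -> IO String
lastLineLatin1 path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h latin1          -- decode bytes as ISO-8859-1, not the locale default
    contents <- hGetContents h     -- lazy read; forced below
    let ls     = lines contents
        result = if null ls then "" else last ls
    -- Force the whole result before withFile closes the handle.
    length result `seq` return result

main :: IO ()
main = lastLineLatin1 "/usr/share/dict/words" >>= putStrLn
```

This addresses only the decoding error, not the speed; it still builds a String for every line.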

a) Compile your code with GHC instead of interpreting it. GHC is blazing fast.
    $ ghc -O A.hs
    $ time ./a.out
    Zyzzogeton
    ./a.out 0.23s user 0.01s system 91% cpu 0.257 total
b) If not satisfied with the result, use packed strings (as Python does).
    http://www.cse.unsw.edu.au/~dons/fps.html
    import qualified Data.FastPackedString as P
    import IO
    main = P.readFile "/usr/share/dict/words" >>= P.hPut stdout . last . P.lines
    $ ghc -O2 -package fps B.hs
    $ time ./a.out
    Zyzzogeton./a.out 0.04s user 0.02s system 86% cpu 0.063 total
0.06s is ok with me :)
-- Don
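For readers without the fps package: it was later renamed bytestring, which ships with GHC. The packed-string approach in (b) might look roughly like this with Data.ByteString.Char8. This is a sketch under that assumption, not Don's code; lastLineOf is an illustrative name.

```haskell
import qualified Data.ByteString.Char8 as B

-- Last line of a strict ByteString, or empty if there are no lines.
lastLineOf :: B.ByteString -> B.ByteString
lastLineOf contents =
  case B.lines contents of
    [] -> B.empty
    ls -> last ls

main :: IO ()
main = B.readFile "/usr/share/dict/words" >>= B.putStrLn . lastLineOf
```

Char8 treats each byte as one character and does no decoding, so the bytes of the last line are written back out unchanged.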

dons:
Faster, don't split up the file into lines. Here we're following the "How to optimise Haskell code by posting to haskell-cafe@" law:
    import qualified Data.FastPackedString as P
    import IO
    main = do P.readFile "/usr/share/dict/words" >>= P.hPut stdout . snd . P.spanEnd (/='\n') . P.init
              putChar '\n'
    $ time ./a.out
    Zyzzogeton
    ./a.out 0.00s user 0.01s system 60% cpu 0.013 total
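The same idea, scanning back from the end of the buffer instead of splitting every line, can be sketched with Data.ByteString.Char8 from the bytestring package (the successor to fps). Unlike the one-liner above, this variant checks for an empty input and for a missing trailing newline before trimming; lastLineFast is an illustrative name, not part of any library.

```haskell
import qualified Data.ByteString.Char8 as B

-- Take the bytes after the last newline, ignoring one trailing newline.
lastLineFast :: B.ByteString -> B.ByteString
lastLineFast contents = snd (B.spanEnd (/= '\n') trimmed)
  where
    -- Drop a single trailing '\n' if there is one; B.init alone would
    -- fail on an empty file and would eat a real character otherwise.
    trimmed
      | not (B.null contents) && B.last contents == '\n' = B.init contents
      | otherwise                                        = contents

main :: IO ()
main = B.readFile "/usr/share/dict/words" >>= B.putStrLn . lastLineFast
```

Only the tail of the buffer is examined, so the cost of finding the line no longer depends on how many lines the file has.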

Donald Bruce Stewart wrote:
a) Compile your code with GHC instead of interpreting it. GHC is blazing fast.
That's one answer I suppose! I quite liked using Hugs for that particular program because it's a script that I didn't want to spend time compiling. Oh well, it's not that important. I did notice that the script runs much quicker with runghc rather than runhugs. Is there any way of making runghc work with a script whose name doesn't end ".hs"?
b) If not satisifed with the result, Use packed strings (as python does).
Good suggestion, thanks.
Pete

Donald Bruce Stewart wrote:
a) Compile your code with GHC instead of interpreting it. GHC is blazing fast.
That's one answer I suppose! I quite liked using Hugs for that particular program because it's a script that I didn't want to spend time compiling. Oh well, it's not that important.
You can use 'ghci' as well -- it's much like hugs.
I did notice that the script runs much quicker with runghc rather than runhugs. Is there any way of making runghc work with a script whose name doesn't end ".hs"?
Well, I know this works:
    $ cat A.lhs
    #!/usr/bin/env runhaskell
    > main = putStrLn "gotcha!"
    $ ./A.lhs
    gotcha!
But for files with no .hs or .lhs extension? Anyone know of a trick?
-- Don

Donald Bruce Stewart wrote:
Well, I know this works:
    $ cat A.lhs
    #!/usr/bin/env runhaskell
    > main = putStrLn "gotcha!"
    $ ./A.lhs
    gotcha!
But for files with no .hs or .lhs extension? Anyone know of a trick?
GHC 6.6 will allow this, because we added the -x flag (works just like gcc's -x flag). E.g. "ghc -x hs foo.wibble" will interpret foo.wibble as a .hs file. I have an uncommitted patch for runghc that uses -x, I need to test & commit it.
Cheers,
Simon

Simon Marlow wrote:
GHC 6.6 will allow this, because we added the -x flag (works just like gcc's -x flag). eg. "ghc -x hs foo.wibble" will interpret foo.wibble as a .hs file. I have an uncommitted patch for runghc that uses -x, I need to test & commit it.
Ah, that will be very useful, thanks!
You may already know this, but there is an oddity with shellscripts that
can make it difficult to pass flags like -x in a useful way. It's
easiest to show this by an example. First create a shellscript called
bar, containing one line:
#!./foo -x -y
Now create foo from foo.c:
#include
participants (3)
- dons@cse.unsw.edu.au
- Pete Chown
- Simon Marlow