Shootout favoring imperative code

Scherrer, Chad

4 Jan 2006 4 Jan '06

3:35 p.m.

Several people on this list have said that the shootout favors imperative code. Is this really the case? Why is it Clean seems to have no trouble (for the incomplete set of benchmarks that are written in it)? http://shootout.alioth.debian.org/clean.php How difficult would it be to translate the Clean algorithms into Haskell? -Chad

Show replies by date

Sebastian Sylvan

4 Jan 4 Jan

5:16 p.m.

On 1/4/06, Scherrer, Chad wrote:

...

Several people on this list have said that the shootout favors imperative code. Is this really the case? Why is it Clean seems to have no trouble (for the incomplete set of benchmarks that are written in it)?

http://shootout.alioth.debian.org/clean.php

How difficult would it be to translate the Clean algorithms into Haskell?

IMO, the problem isn't that it's not *possible* to make the Haskell versions as fast at the C versions. You just write them in an imperative style using pointers, peek, poke etc. However, most people don't use Haskell for it's facilites for writing C-style code. Some of the problems seem to be heavily geared towards an imperative *implementation*, meaning that a Haskell version is hardly idiomatic Haskell (and as such I , and I suspect otehrs, really have no inclination to work on it). Take the fannuch benchmark for instance. Part of it is to generate input data (all permutations of a sequence). It would be better to not require the program to print out the first 30 lists from the input data, since that places additional (completely unneeded) restrictions on how to implement the program (and leads to an unnecessarily ugly implementation if you generate the input in a non-imperative fashion). I assume it's no coincidence that this sequence exactly matches the "straightforward" way to generate permutations in C. Now, there is some value to restrict implementaiton in *some* cases. For instance if you want to test the speed of a function call with a recursive algorithm, you need to enforce that the algorithm is written with recursion. So in conclusion: By placing too many requirements on the implementations of the algorithms you make the benchmarks completely useless for languages that aren't C-ish in nature. It all degenerates into "are there enough programmers who are willing to spend time writing C in Haskell?" and not how easy it is to solve the problems in Haskell, and how fast the results are. I should note that there are a few good "unbiased" benchmarks in there as well (such as chamenos and others) which place very little restriction on implementation and just says what needs to be done... I should also note that I don't think these benchmarks mean anything at all. It would be better to test, say, the best possible solutions for some of the ICFP programming contests, for example. They say a lot more about real-world speed. /S -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862

Brent Fulgham

7:03 p.m.

--- Sebastian Sylvan wrote:

...

Some of the problems seem to be heavily geared towards an imperative *implementation*, meaning that a Haskell version is hardly idiomatic Haskell (and as such I , and I suspect otehrs, really have no inclination to work on it).

...

Take the fannuch benchmark for instance. Part of it is to generate input data (all permutations of a sequence). It would be better to not require the program to print out the first 30 lists from the input data, since that

...

additional (completely unneeded) restrictions on how to implement the program (and leads to an unnecessarily ugly implementation if you generate

I agree that several benchmarks suffer from this problem, but we have been trying to change this where possible. places the input in

...

a non-imperative fashion).

I must disagree. We based this benchmark on a very standard benchmark studied by Lisp implementors (and others, see e.g., http://citeseer.ist.psu.edu/315141.html)in an effort to address the problems of the original array access benchmark (which was extremely imperative in nature). I don't think asking Haskell to handle this simple program is unfair; Ken Anderson and others dealt with this for Lisp many years ago.

...

I assume it's no coincidence that this sequence exactly matches the "straightforward" way to generate permutations in C.

...

I should also note that I don't think these benchmarks mean anything at all. It would be better to test, say,

But I think Haskell may face real-world cases where data must be produced in some known order. For Haskell to be a contender in "real world" use, it sometimes has to confront ugly requirements. the best

...

possible solutions for some of the ICFP programming contests, for example. They say a lot more about real-world speed.

Agreed. However, it's a lot easier to get volunteers to implement small benchmarks (therefore, providing the ability to compare many languages) rather than large ICFP entries. -Brent

Sebastian Sylvan

7:23 p.m.

On 1/4/06, Brent Fulgham wrote:

...

But I think Haskell may face real-world cases where data must be produced in some known order. For Haskell to be a contender in "real world" use, it sometimes has to confront ugly requirements.

I must respectfully note that you contradict yourself somewhat. First you state that there's no problem introducing unnecessary requirements on the order of generating input data because it's to be expected in the "real-world", and then you state that the reason for not using more real-world benchmarks is to facilitate more volonteers. Wouldn't less strict requirements where they are posible also facilitate more contributions? My point here was that even though you _can_ generate this data in Haskell, there's no point in requiring (because the order doesn't matter for the benchmark itself). This needless requirementm (for the data to follow the order you get from an imperative solution) will only put off contributors for functional solutions. If you wanted to be fair here the order would be much more intricate and require considerable obfuscation for all langauges. /S -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862

Brent Fulgham

7:37 p.m.

--- Sebastian Sylvan wrote:

...

My point here was that even though you _can_ generate this data in Haskell, there's no point in requiring (because the order doesn't matter for the benchmark itself).

We do need to agree on which 30 permutations should be used in the validation of the benchmark (just to make sure that the algorithms are producing correct output). Perhaps we could specify the 30 (or perhaps 'N') permutations as an input file, or perhaps require that they be hard-coded into the program? The problem with using an input file is that now we are involving file I/O in the benchmark, which introduces questions about where time is being spent (i.e., file access instead of pancake-flipping). -Brent

Sebastian Sylvan

7:43 p.m.

On 1/4/06, Brent Fulgham wrote:

...

--- Sebastian Sylvan wrote:

...
My point here was that even though you _can_ generate this data in Haskell, there's no point in requiring (because the order doesn't matter for the benchmark itself).

We do need to agree on which 30 permutations should be used in the validation of the benchmark (just to make sure that the algorithms are producing correct output).

Wouldn't the "maximum number of flips" output be enough for validation? /S -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862

Sebastian Sylvan

5 Jan 5 Jan

4:41 p.m.

On 1/4/06, Brent Fulgham wrote:

...

--- Sebastian Sylvan wrote:

...
Some of the problems seem to be heavily geared towards an imperative *implementation*, meaning that a Haskell version is hardly idiomatic Haskell (and as such I , and I suspect otehrs, really have no inclination to work on it).

I agree that several benchmarks suffer from this problem, but we have been trying to change this where possible.

A good example of an unfair benchmark that Udo Stenzel noted over at the shootout haskell wiki is "sum-file". Here you specify that no line is ever more than 128 characters long. What's the purpose of doing that? Clearly it's to make life easier for C programmers, is it not? (do C programs never have to deal with the "real world" where such assumptions can't be made?). Furthermore an 128 digit number in base 10 would occupy 53 bytes, but the C (and most other) implementations assume that it will all fit within one machine word, which is obviously against the spec (the only thing the spec says about size of the numbers is that it's at most 127 digits, since one char is a newline, so the implementation must assume the worst about the size of the numbers to be compliant). This is a one-liner in idiomatic Haskell (getContents >>= print . sum . map read . lines), but since there are restrictions specifically tailored to make life easier for lower level languages, the Haskell submission must resort to all sorts of "hacks" to compete (to circumvent the high-level general tools available). Unfair! /S -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862

Chris Kuklewicz

5 p.m.

Also about sum-file: They do not reveal what the actual 8k test file contains. So there is no way to reproduce the benchmark locally for testing. (One can learn it totals 400000, but since negative numbers are allowed, this does not help much). The problem can even be solved in one line with (g)awk. Apparantly it is bottlenecked by parsing strings into integers, but they specify "Programs should use built-in line-oriented I/O functions rather than custom-code." which means the programmer's hands are completely tied. So it is just a benchmark of the build-in library function, not of any algorithm the programmer provides. There is no need to beat a dead horse, though. This benchmark sets out to test fgets / atoi, and that is all. There are better benchmarks to spend time on. -- Chris Sebastian Sylvan wrote:

...

On 1/4/06, Brent Fulgham wrote:

...
--- Sebastian Sylvan wrote:

...
Some of the problems seem to be heavily geared towards an imperative *implementation*, meaning that

a Haskell

...
version is hardly idiomatic Haskell (and as such I ,

and I

...
suspect otehrs, really have no inclination to work

on it).

I agree that several benchmarks suffer from this problem, but we have been trying to change this where possible.

A good example of an unfair benchmark that Udo Stenzel noted over at the shootout haskell wiki is "sum-file". Here you specify that no line is ever more than 128 characters long. What's the purpose of doing that? Clearly it's to make life easier for C programmers, is it not? (do C programs never have to deal with the "real world" where such assumptions can't be made?). Furthermore an 128 digit number in base 10 would occupy 53 bytes, but the C (and most other) implementations assume that it will all fit within one machine word, which is obviously against the spec (the only thing the spec says about size of the numbers is that it's at most 127 digits, since one char is a newline, so the implementation must assume the worst about the size of the numbers to be compliant).

This is a one-liner in idiomatic Haskell (getContents >>= print . sum . map read . lines), but since there are restrictions specifically tailored to make life easier for lower level languages, the Haskell submission must resort to all sorts of "hacks" to compete (to circumvent the high-level general tools available). Unfair!

/S

-- Sebastian Sylvan +46(0)736-818655 UIN: 44640862 _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Sebastian Sylvan

6:39 p.m.

On 1/5/06, Chris Kuklewicz wrote:

...

Also about sum-file: They do not reveal what the actual 8k test file contains. So there is no way to reproduce the benchmark locally for testing. (One can learn it totals 400000, but since negative numbers are allowed, this does not help much).

The problem can even be solved in one line with (g)awk.

Apparantly it is bottlenecked by parsing strings into integers, but they specify "Programs should use built-in line-oriented I/O functions rather than custom-code." which means the programmer's hands are completely tied. So it is just a benchmark of the build-in library function, not of any algorithm the programmer provides.

There is no need to beat a dead horse, though. This benchmark sets out to test fgets / atoi, and that is all. There are better benchmarks to spend time on.

I agree. The benchmark really is about how fast you can call low-level IO system calls. But since Haskell is a high-level language it's natural that it's a bit difficult to get access to these unsafe (but fast) low-level functions. In fact, if I really wanted to do this, I would use the FFI... Do you think they'll accept this contribution for sum-file? -------- import Foreign.C import Foreign.Ptr import Foreign.Marshal.Array foreign import ccall "stdio.h" fgets :: CString -> Int -> Ptr CFile ->IO CString foreign import ccall safe "stdlib.h" atoi :: CString -> Int foreign import ccall safe "stdio.h &(*stdin)" c_stdin :: Ptr CFile bufferSize = 128 loop :: CString -> Int -> IO Int loop buf v = do ret <- fgets buf bufferSize c_stdin case (ret == nullPtr) of True -> return v -- eof, or some other error False -> do let i = atoi buf i `seq` loop buf (v + i) -- force eval of 'i'! main = do v <- allocaArray bufferSize (\buf -> loop buf 0) print v -------------- -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862

Chris Kuklewicz

8:35 p.m.

I did manage to tweak SumFile to use unboxed Int# and go 10% faster. http://haskell.org/hawiki/SumFile Sebastian Sylvan wrote:

...

On 1/5/06, Chris Kuklewicz wrote:

...
Also about sum-file: They do not reveal what the actual 8k test file contains. So there is no way to reproduce the benchmark locally for testing. (One can learn it totals 400000, but since negative numbers are allowed, this does not help much).

The problem can even be solved in one line with (g)awk.

Apparantly it is bottlenecked by parsing strings into integers, but they specify "Programs should use built-in line-oriented I/O functions rather than custom-code." which means the programmer's hands are completely tied. So it is just a benchmark of the build-in library function, not of any algorithm the programmer provides.

There is no need to beat a dead horse, though. This benchmark sets out to test fgets / atoi, and that is all. There are better benchmarks to spend time on.

I agree. The benchmark really is about how fast you can call low-level IO system calls. But since Haskell is a high-level language it's natural that it's a bit difficult to get access to these unsafe (but fast) low-level functions. In fact, if I really wanted to do this, I would use the FFI...

Do you think they'll accept this contribution for sum-file?

--------

import Foreign.C import Foreign.Ptr import Foreign.Marshal.Array

foreign import ccall "stdio.h" fgets :: CString -> Int -> Ptr CFile ->IO CString foreign import ccall safe "stdlib.h" atoi :: CString -> Int foreign import ccall safe "stdio.h &(*stdin)" c_stdin :: Ptr CFile

bufferSize = 128

loop :: CString -> Int -> IO Int loop buf v = do ret <- fgets buf bufferSize c_stdin case (ret == nullPtr) of True -> return v -- eof, or some other error False -> do let i = atoi buf i `seq` loop buf (v + i) -- force eval of 'i'!

main = do v <- allocaArray bufferSize (\buf -> loop buf 0) print v

--------------

-- Sebastian Sylvan +46(0)736-818655 UIN: 44640862

Udo Stenzel

11:41 p.m.

Sebastian Sylvan wrote:

...

On 1/5/06, Chris Kuklewicz wrote:

...
There is no need to beat a dead horse, though. This benchmark sets out to test fgets / atoi, and that is all. There are better benchmarks to spend time on.

I agree. The benchmark really is about how fast you can call low-level IO system calls. But since Haskell is a high-level language it's natural that it's a bit difficult to get access to these unsafe (but fast) low-level functions.

There's probably a bit more to it. First off, one could legitimately argue that (liftM lines getContents) is the Haskell way to do line oriented IO. The real question is, why does the fast solution have to be ugly

...

foreign import ccall "stdio.h" fgets :: CString -> Int -> Ptr CFile ->IO CString foreign import ccall safe "stdlib.h" atoi :: CString -> Int foreign import ccall safe "stdio.h &(*stdin)" c_stdin :: Ptr CFile

and why does the idiomatic solution have to be slow? | main = print . sum . map read . lines =<< getContents The biggest hit is probably the construction of a huge String as linked list, which cannot be deforested away (not with the foldr/build mechanism anyway). Assuming we find a better representation for Strings, we could make some headway here and in many other benchmarks. So I think, just by replacing String and along with it getContents, lines and read, we will get competitive speed and retain the ability to handle arbitrarily long lines. Of course, the shootout wouldn't care for a feature that is otherwise quite important in practice... Anyway, I'll try try to come up with something and then follow up on this. Udo. -- A man always needs to remember one thing about a beautiful woman: Somewhere, somebody's tired of her.

Chris Kuklewicz

6 Jan 6 Jan

9:51 a.m.

Relative speed comparison below Udo Stenzel wrote:

...

Sebastian Sylvan wrote:

...
On 1/5/06, Chris Kuklewicz wrote:

...
There is no need to beat a dead horse, though. This benchmark sets out to test fgets / atoi, and that is all. There are better benchmarks to spend time on.

I agree. The benchmark really is about how fast you can call low-level IO system calls. But since Haskell is a high-level language it's natural that it's a bit difficult to get access to these unsafe (but fast) low-level functions.

There's probably a bit more to it. First off, one could legitimately argue that (liftM lines getContents) is the Haskell way to do line oriented IO. The real question is, why does the fast solution have to be ugly

...
foreign import ccall "stdio.h" fgets :: CString -> Int -> Ptr CFile ->IO CString foreign import ccall safe "stdlib.h" atoi :: CString -> Int foreign import ccall safe "stdio.h &(*stdin)" c_stdin :: Ptr CFile

and why does the idiomatic solution have to be slow?

| main = print . sum . map read . lines =<< getContents

The biggest hit is probably the construction of a huge String as linked list, which cannot be deforested away (not with the foldr/build mechanism anyway). Assuming we find a better representation for Strings, we could make some headway here and in many other benchmarks.

So I think, just by replacing String and along with it getContents, lines and read, we will get competitive speed and retain the ability to handle arbitrarily long lines. Of course, the shootout wouldn't care for a feature that is otherwise quite important in practice... Anyway, I'll try try to come up with something and then follow up on this.

Udo.

The solution on the wiki (Char by Char) is the fastest I could make :

...

print . total =<< getContents Time was 1.00 seconds

I tried using getLine and it was slower:

...

let next rt = do line <- catch getLine (\_ -> return []) if (null line) then return (I# rt) else next (rt +# aLine line) Time was 3.79 seconds

I tried using getContents and lines and it was slowest:

...

total <- return.(next 0#).lines =<< getContents Time was 4.76 seconds

...

From this, I assume the buffering that getContents does is very optimized. The cost of getLine was very nontrivial. So for non-binary input, I would recommend using getContents and processing it Char by Char.

-- Chris

Sebastian Sylvan

7 Jan 7 Jan

1:40 p.m.

On 1/6/06, Udo Stenzel wrote:

...

Sebastian Sylvan wrote:

...
On 1/5/06, Chris Kuklewicz wrote:

...
There is no need to beat a dead horse, though. This benchmark sets out to test fgets / atoi, and that is all. There are better benchmarks to spend time on.

I agree. The benchmark really is about how fast you can call low-level IO system calls. But since Haskell is a high-level language it's natural that it's a bit difficult to get access to these unsafe (but fast) low-level functions.

There's probably a bit more to it. First off, one could legitimately argue that (liftM lines getContents) is the Haskell way to do line oriented IO. The real question is, why does the fast solution have to be ugly

...
foreign import ccall "stdio.h" fgets :: CString -> Int -> Ptr CFile ->IO CString foreign import ccall safe "stdlib.h" atoi :: CString -> Int foreign import ccall safe "stdio.h &(*stdin)" c_stdin :: Ptr CFile

and why does the idiomatic solution have to be slow?

| main = print . sum . map read . lines =<< getContents

The biggest hit is probably the construction of a huge String as linked list, which cannot be deforested away (not with the foldr/build mechanism anyway). Assuming we find a better representation for Strings, we could make some headway here and in many other benchmarks.

So I think, just by replacing String and along with it getContents, lines and read, we will get competitive speed and retain the ability to handle arbitrarily long lines. Of course, the shootout wouldn't care for a feature that is otherwise quite important in practice... Anyway, I'll try try to come up with something and then follow up on this.

It would be neat if the PackedString library contained functions such as hGetLine etc. It does have a function for reading from a buffer, but it won't stop at a newline... But yeah, fast string manipulation is difficult when using a linked-list representation... /S -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862

Chris Kuklewicz

4:18 p.m.

New subject: x86 code generation going wrong?

Hello, I need to ask for some help to test x86 code generation. There is a factor of two runtime difference between the code I am benchmarking on my OS X powerbook G4 (ghc 6.4.1) and shootout's speed on a linux x86 machine (ghc 6.4.1). Could someone else running on x86 test the three versions pasted below before I think about submitting another one to the shootout? To compile "ghc --make filename.hs -o program" To run "cat input-file | time ./program" where to save space, the gzip'd input file is at http://paradosso.mit.edu/~ckuklewicz/sum-file-test-input.gz ------------------------------------------------------------------------- -- Original version {-# OPTIONS -O2 #-} import Char( ord ) main :: IO () main = getContents >>= print . accP 0 0 accP :: Int -> Int -> String -> Int accP before this [] = before+this accP before this ('\n':xs) = accP (before+this) 0 xs accP before this ('-' :xs) = accN before this xs accP before this (x :xs) = accP before (this*10+ord(x)-ord('0')) xs accN :: Int -> Int -> String -> Int accN before this [] = before-this accN before this ('\n':xs) = accP (before-this) 0 xs accN before this (x :xs) = accN before (this*10+ord(x)-ord('0')) xs ------------------------------------------------------------------------- -- Faster on G4, 2x slower on x86 {-# OPTIONS -O2 -funbox-strict-fields #-} import GHC.Base data I = I !Int main = print . new (I 0) =<< getContents new (I i) [] = i new (I i) ('-':xs) = neg (I 0) xs where neg (I n) ('\n':xs) = new (I (i - n)) xs neg (I n) (x :xs) = neg (I (parse x + (10 * n))) xs new (I i) (x:xs) = pos (I (parse x)) xs where pos (I n) ('\n':xs) = new (I (i + n)) xs pos (I n) (x :xs) = pos (I (parse x + (10 * n))) xs parse c = ord c - ord '0' ------------------------------------------------------------------------- -- Explicitly unboxed proposal, faster on G4 {-# OPTIONS -fglasgow-exts -O2 #-} import GHC.Base main = print . sumFile =<< getContents where sumFile = (\rest -> newLine rest 0#) newLine [] rt = (I# rt) newLine ('-':rest) rt = negLine rest 0# where negLine ('\n':rest) soFar = newLine rest (rt -# soFar) negLine ( x :rest) soFar = negLine rest (d2i x +# (10# *# soFar)) newLine (x:rest) rt = posLine rest (d2i x) where posLine ('\n':rest) soFar = newLine rest (rt +# soFar) posLine ( x :rest) soFar = posLine rest (d2i x +# (10# *# soFar)) d2i (C# c) = (ord# c) -# z where z = ord# '0'# ------------------------------------------------------------------------- Thanks, Chris

Branimir Maksimovic

8 Jan 8 Jan

10:57 a.m.

New subject: x86 code generation going wrong?

...

From: Chris Kuklewicz To: haskell-cafe@haskell.org Subject: [Haskell-cafe] x86 code generation going wrong? Date: Sat, 07 Jan 2006 16:18:59 +0000

Hello,

I need to ask for some help to test x86 code generation.

There is a factor of two runtime difference between the code I am benchmarking on my OS X powerbook G4 (ghc 6.4.1) and shootout's speed on a linux x86 machine (ghc 6.4.1).

Could someone else running on x86 test the three versions pasted below before I think about submitting another one to the shootout?

Here are the tests on P4 2.4 ghz and athlon 64 3000 linux test1-3 in respective order of appearance (note:OPTIONS didn't do anything I have to compile -O2 -fglasgow-exts explicitely, because I've got compile error for test3.hs ) [bmaxa@maxa ~/haskell/myhaskell] $ time ./test1 < sum-file-test-input 4000000 real 0m3.550s user 0m3.440s sys 0m0.080s [bmaxa@maxa ~/haskell/myhaskell] $ time ./test2 < sum-file-test-input 4000000 real 0m3.708s user 0m3.660s sys 0m0.060s [bmaxa@maxa ~/haskell/myhaskell] $ time ./test3 < sum-file-test-input 4000000 real 0m3.678s user 0m3.620s sys 0m0.050s This is on athlon64 3000 , linux : [bmaxa@devel64 ~]$ time ./test1 < sum-file-test-input 4000000 real 0m5.782s user 0m5.724s sys 0m0.056s [bmaxa@devel64 ~]$ time ./test2 < sum-file-test-input 4000000 real 0m5.953s user 0m5.900s sys 0m0.052s [bmaxa@devel64 ~]$ time ./test3 < sum-file-test-input 4000000 real 0m5.403s user 0m5.332s sys 0m0.072s Greetings, Bane.

...

To compile "ghc --make filename.hs -o program"

To run "cat input-file | time ./program"

where to save space, the gzip'd input file is at

http://paradosso.mit.edu/~ckuklewicz/sum-file-test-input.gz

------------------------------------------------------------------------- -- Original version {-# OPTIONS -O2 #-} import Char( ord )

main :: IO () main = getContents >>= print . accP 0 0

accP :: Int -> Int -> String -> Int accP before this [] = before+this accP before this ('\n':xs) = accP (before+this) 0 xs accP before this ('-' :xs) = accN before this xs accP before this (x :xs) = accP before (this*10+ord(x)-ord('0')) xs

accN :: Int -> Int -> String -> Int accN before this [] = before-this accN before this ('\n':xs) = accP (before-this) 0 xs accN before this (x :xs) = accN before (this*10+ord(x)-ord('0')) xs

------------------------------------------------------------------------- -- Faster on G4, 2x slower on x86 {-# OPTIONS -O2 -funbox-strict-fields #-} import GHC.Base

data I = I !Int

main = print . new (I 0) =<< getContents

new (I i) [] = i new (I i) ('-':xs) = neg (I 0) xs where neg (I n) ('\n':xs) = new (I (i - n)) xs neg (I n) (x :xs) = neg (I (parse x + (10 * n))) xs new (I i) (x:xs) = pos (I (parse x)) xs where pos (I n) ('\n':xs) = new (I (i + n)) xs pos (I n) (x :xs) = pos (I (parse x + (10 * n))) xs

parse c = ord c - ord '0'

------------------------------------------------------------------------- -- Explicitly unboxed proposal, faster on G4 {-# OPTIONS -fglasgow-exts -O2 #-}

import GHC.Base

main = print . sumFile =<< getContents where sumFile = (\rest -> newLine rest 0#)

newLine [] rt = (I# rt) newLine ('-':rest) rt = negLine rest 0# where negLine ('\n':rest) soFar = newLine rest (rt -# soFar) negLine ( x :rest) soFar = negLine rest (d2i x +# (10# *# soFar)) newLine (x:rest) rt = posLine rest (d2i x) where posLine ('\n':rest) soFar = newLine rest (rt +# soFar) posLine ( x :rest) soFar = posLine rest (d2i x +# (10# *# soFar))

d2i (C# c) = (ord# c) -# z where z = ord# '0'# -------------------------------------------------------------------------

Thanks, Chris

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

_________________________________________________________________ FREE pop-up blocking with the new MSN Toolbar - get it now! http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/

Bulat Ziganshin

11:19 a.m.

New subject: Re[2]: x86 code generation going wrong?

Hello Branimir, Sunday, January 08, 2006, 1:57:06 PM, you wrote: BM> of appearance (note:OPTIONS didn't do anything I have to BM> compile -O2 -fglasgow-exts explicitely, because I've got compile error for BM> test3.hs ) use {-# OPTIONS_GHC -O2 -fglasgow-exts #-} -- Best regards, Bulat mailto:bulatz@HotPOP.com

Simon Marlow

9 Jan 9 Jan

11:32 a.m.

Sebastian Sylvan wrote:

...

It would be neat if the PackedString library contained functions such as hGetLine etc. It does have a function for reading from a buffer, but it won't stop at a newline... But yeah, fast string manipulation is difficult when using a linked-list representation...

My version of the packed string library does have an hGetLine. Don Stewart was merging my version with his fps at some point, Don - any news on that? Cheers, Simon

Einar Karttunen

11 Jan 11 Jan

3:06 p.m.

On 09.01 11:32, Simon Marlow wrote:

...

Sebastian Sylvan wrote:

...
It would be neat if the PackedString library contained functions such as hGetLine etc. It does have a function for reading from a buffer, but it won't stop at a newline... But yeah, fast string manipulation is difficult when using a linked-list representation...

My version of the packed string library does have an hGetLine. Don Stewart was merging my version with his fps at some point, Don - any news on that?

Getting a fast FastPackedString will solve the problems with many benchmarks. A similar thing for arrays would be nice - although this is more about inteface:

...

module Data.Array.UnsafeOps where

import Data.Array.Base hiding((!))

{-# INLINE (!) #-} (!) :: MArray a e m => a Int e -> Int -> m e (!) = unsafeRead

{-# INLINE set #-} set :: MArray a e m => a Int e -> Int -> e -> m () set = unsafeWrite

{-# INLINE swap #-} swap :: MArray a e m => a Int e -> Int -> Int -> m () swap arr x y = do xv <- arr ! x yv <- arr ! y set arr x yv set arr y xv

{-# INLINE combineTo #-} combineTo :: MArray a e m => a Int e -> Int -> (e -> e -> e) -> a Int e -> Int -> m () combineTo a0 i0 f a1 i1 = do v0 <- a0 ! i0 v1 <- a1 ! i1 set a0 i0 $! f v0 v1

and so forth. Usually imperative solutions have something like "a[i] += b[i]", which currently is quite tedious and ugly to translate to MArrays. Now it would become "combineTo a i (+) b i". - Einar Karttunen

Bulat Ziganshin

6:20 p.m.

Hello Einar, Wednesday, January 11, 2006, 6:06:56 PM, you wrote:

...

...
My version of the packed string library does have an hGetLine. Don Stewart was merging my version with his fps at some point, Don - any news on that?

EK> Getting a fast FastPackedString will solve the problems with many EK> benchmarks. btw, JHC's version of FPS uses slightly less memory (i don't remember, 8 or 12 bytes per each string) and i think must be faster (because it uses ByteArray# instead of Addr#). so, the best variant is to add hGetLine to John's library

...

...
set arr x yv

(arr,x) =: yv looks better ;) EK> and so forth. Usually imperative solutions have something like EK> "a[i] += b[i]", which currently is quite tedious and ugly to EK> translate to MArrays. Now it would become "combineTo a i (+) b i". you are definitely a Hal Daume's client, look at http://www.isi.edu/~hdaume/STPP/ -- Best regards, Bulat mailto:bulatz@HotPOP.com

Brent Fulgham

6 Jan 6 Jan

1:26 a.m.

--- Chris Kuklewicz wrote:

...

Also about sum-file: They do not reveal what the actual 8k test file contains. So there is no way to reproduce the benchmark locally for testing. (One can learn it totals 400000, but since negative numbers are allowed, this does not help much).

It's created by catting the example file together multiple times. I'll update the page to be more clear about this, and I can send you the actual file used on the test machine if you want.

...

Apparantly it is bottlenecked by parsing strings into integers, but they specify "Programs should use built-in line-oriented I/O functions rather than custom-code." which means the programmer's hands are completely tied. So it is just a benchmark of the build-in library function, not of any algorithm the programmer provides.

Yes -- it was designed as a test of the standard I/O system. -Brent

Chris Kuklewicz

9:55 a.m.

Brent Fulgham wrote:

...

--- Chris Kuklewicz wrote:

...
Also about sum-file: They do not reveal what the actual 8k test file contains. So there is no way to reproduce the benchmark locally for testing. (One can learn it totals 400000, but since negative numbers are allowed, this does not help

much).

It's created by catting the example file together multiple times. I'll update the page to be more clear about this, and I can send you the actual file used on the test machine if you want.

That is what I did as a hack. Nice to see I guessed right. Right now I use a 1,680,000 line (concatenated) version to get the processing times large enough to discern small improvements.

...

...
Apparantly it is bottlenecked by parsing strings into integers, but they specify "Programs should use built-in line-oriented I/O functions rather than custom-code." which means the programmer's hands are completely tied. So it is just a

benchmark of the

...
build-in library function, not of any algorithm the programmer provides.

Yes -- it was designed as a test of the standard I/O system.

-Brent

I assumed that that I could use getContents, like the previously accepted Haskell entry. It returns the entire stdin as a single (lazy) string, so it is technically not "line oriented". But it is certainly a "standard I/O system" for Haskell. I'll submit my improved version soon. -- Chris

7113

Age (days ago)

7120

Last active (days ago)

List overview

Download

20 comments

9 participants

participants (9)

Branimir Maksimovic
Brent Fulgham
Bulat Ziganshin
Chris Kuklewicz
Einar Karttunen
Scherrer, Chad
Sebastian Sylvan
Simon Marlow
Udo Stenzel