
Thanks to those guys who've submitted parallel programs to the language benchmarks game, we're climbing up the rankings, now in 3rd, and ahead of C :) http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=all Just one or two more parallel programs required... Submit them here, and we can test on quad core, http://haskell.org/haskellwiki/Shootout/Parallel -- Don

Don Stewart wrote:
Thanks to those guys who've submitted parallel programs to the language benchmarks game, we're climbing up the rankings, now in 3rd, and ahead of C :)
http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=all
Yay! This makes me very happy. ;-)

Don Stewart ha scritto:
Thanks to those guys who've submitted parallel programs to the language benchmarks game, we're climbing up the rankings, now in 3rd, and ahead of C :)
This is cheating, IMHO. Some test comparisons are unfair. The first problem is with the thread-ring benchmark. Haskell uses the "concurrent Haskell" extension, but all the other programs (with some exceptions) use OS threads. This is unfair: a C program could, for example, use a user-space threads library such as GNU Pth, but there is no such program. If I remove the thread-ring bench, then Haskell drops to sixth, after Java and OCaml. With parallel programs it is the same: the other languages do not have parallel versions.
[...]
Manlio Perillo

On 22 Sep 2008, at 11:46, Manlio Perillo wrote:
Don Stewart ha scritto:
Thanks to those guys who've submitted parallel programs to the language benchmarks game, we're climbing up the rankings, now in 3rd, and ahead of C :)
This is cheating, IMHO. Some test comparisons are unfair.
The first problem is with the thread-ring benchmark. Haskell uses the "concurrent Haskell" extension, but all the other programs (with some exceptions) use OS threads. This is unfair: a C program could, for example, use a user-space threads library such as GNU Pth, but there is no such program.
Who said that C *had* to use OS threads? The point of the shootout is to show the relative strengths of the various languages. One of Haskell's strengths is excellent lightweight-thread support; C has nothing comparable, so C does badly on the tests that check how well you can deal with many small threads.
With parallel programs it is the same: the other languages do not have parallel versions.
Yes, and the new benchmarks are *specifically* designed to test how fast programs are on more recent multi-core hardware, so again, the other languages are welcome to submit parallel versions... It just turns out that Haskell is pretty damn good at doing parallelism. Bob
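For concreteness, the entire point of that benchmark fits in a few lines of Concurrent Haskell. The following is a stripped-down thread-ring sketch, not the actual shootout entry; the thread and hop counts are illustrative:

import Control.Concurrent
import Control.Monad

-- Connect n lightweight threads into a ring with MVars and pass a
-- token around until the hop count reaches zero, then print the id
-- of the thread left holding the token.
ring :: Int -> Int -> IO ()
ring nthreads hops = do
    done  <- newEmptyMVar
    boxes <- replicateM nthreads newEmptyMVar
    forM_ (zip3 [1 ..] boxes (tail boxes ++ [head boxes])) $
        \(ident, inbox, outbox) -> forkIO $ forever $ do
            n <- takeMVar inbox
            if n == 0
                then putMVar done (ident :: Int)
                else putMVar outbox (n - 1)
    putMVar (head boxes) hops
    print =<< takeMVar done

main :: IO ()
main = ring 503 500000

Forking 503 Haskell threads is trivial; forking 503 OS threads, and context-switching between them for every hop, is what hurts the C entries.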

Hello Manlio, Monday, September 22, 2008, 1:46:55 PM, you wrote:
This is cheating, IMHO. Some test comparisons are unfair.
this overall test is useless for speed comparison. afair, there are only 2-3 programs whose speed doesn't depend heavily on libraries. in the DNA test, for example, Tcl (or PHP?) was the leader just because it has a better regexp library. to make things even funnier, this test allows use of libs bundled with the compiler, but not 3rd-party ones, so ghc (not Haskell!), which includes more built-in libs than gcc, looks like a leader. of course, no one uses bare ghc or gcc in the real world. even benchmarks that test the pure speed of compiled code are useless, because 1) they are limited by RAM speed, not the speed of the code itself, and 2) there is a fashion in the Haskell world of writing for the shootout in a very low-level style, which isn't actually used in real programming and is actually understood only by a few "wizards" developing high-performance haskell code. so what this shootout actually shows is that
1) in the real world, program speed depends more on libs than on compilers. if you compare a weak language plus lots of libs with a strong language with few libs, the first variant will win (unless you go and reimplement all those libs on your own).
2) highly-optimized Haskell code is only 2-3 times slower than real C code produced by an average C programmer. taking into account that such Haskell code takes many times longer to write and needs much more knowledge, Haskell can hardly be compared to C.
3) if you need to compare real-world Haskell code with C, you may look at these papers: An approach to fast arrays in Haskell http://www.cse.unsw.edu.au/~chak/papers/afp-arrays.ps.gz Rewriting Haskell Strings http://www.cse.unsw.edu.au/~dons/papers/fusion.pdf - what they call the "naive approach" is real Haskell programs not speeded up by special libs -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Mon, Sep 22, 2008 at 2:07 PM, Bulat Ziganshin
this overall test is useless for speed comparison. afair, there are only 2-3 programs whose speed doesn't depend heavily on libraries. in the DNA test, for example, Tcl (or PHP?) was the leader just because it has a better regexp library
On the regex-dna benchmark, I'll have to agree. It's unfortunate to have a benchmark so dependent on the set of libraries included in the distribution, and e.g. Text.Regex.PCRE kicks Text.Regex.Posix's behind in this benchmark - but we probably can't use it only because one's bundled and the other isn't. Maybe we could roll our own regexp engine for this specific benchmark though (for example, Text.Regex.TDFA is within 10% or something of PCRE and AFAIK pure Haskell - a customized and downsized version of that could probably be made quite competitive).
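For what it's worth, the swap itself is nearly free at the source level. A hedged sketch, assuming the regex-pcre package and the (=~) interface shared via regex-base (the pattern is just an illustrative fragment in the style of the regex-dna spec, not the real benchmark input):

import Text.Regex.PCRE ((=~))

-- Text.Regex.Posix and Text.Regex.PCRE expose the same (=~)
-- operator, so switching engines is mostly a matter of changing
-- this one import line.
main :: IO ()
main = do
    let dna = "ggtattttaatttatagtagggtata"
    print (dna =~ "agggta[cgt]a|t[acg]taccct" :: Bool)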
to make things even funnier, this test allows use of libs bundled with the compiler, but not 3rd-party ones, so ghc (not Haskell!), which includes more built-in libs than gcc, looks like a leader. of course, no one uses bare ghc or gcc in the real world
I don't think it's ever claimed that the shootout is a fair benchmark of real-world problems :D
2) there is a fashion in the Haskell world of writing for the shootout in a very low-level style, which isn't actually used in real programming and is actually understood only by a few "wizards" developing high-performance haskell code
That is certainly the case with a few of the benchmark implementations, and in the past it was required to get the performance. In some cases it's also due simply to the haskell implementation being a straight-from-C port - which necessitates using pointers and putting everything in IO etc... Some of that haskell code is among the least readable code I've ever read (see e.g. the fannkuch benchmark at http://shootout.alioth.debian.org/u64q/benchmark.php?test=fannkuch&lang=ghc). But that's what you get when porting pointer-rich C code directly into Haskell :) With bytestrings, unboxed arrays, light-weight threads and other tricks, we can usually replace all those ugly low-level programs with nice high-level haskell ones - it's all about allowing the compiler to do good stuff with naive (or at least naive-looking) code, and teaching it how to cut through the abstractions. (As well as using the right abstractions, of course...) // Simon

Hello Simon, Monday, September 22, 2008, 9:03:52 PM, you wrote:
With bytestrings, unboxed arrays, light-weight threads and other tricks, we can usually replace all those ugly low-level programs with nice high-level haskell ones
i don't think these 3 libs allow writing high-level high-performance code in *most* cases. just for example, try to write wc without using "words" -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Mon, Sep 22, 2008 at 1:12 PM, Bulat Ziganshin
Hello Simon,
Monday, September 22, 2008, 9:03:52 PM, you wrote:
With bytestrings, unboxed arrays, light-weight threads and other tricks, we can usually replace all those ugly low-level programs with nice high-level haskell ones
i don't think these 3 libs allow writing high-level high-performance code in *most* cases. just for example, try to write wc without using "words".
To a newbie, that's a cryptic statement. Are you saying that these libs aren't needed to make a high-performance "wc", and that the Prelude functions are sufficient? (I note too that there is Data.ByteString.Char8.words, but I assume you are talking about the Prelude function.) Graham
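For readers following along, here is a minimal sketch of the kind of program under discussion: a wc over lazy ByteStrings that counts words with an explicit fold, i.e. without the library's words, as the challenge demands. Whether this still counts as "naive" code is exactly what is being argued about:

import qualified Data.ByteString.Lazy.Char8 as L
import Data.Char (isSpace)

-- wc-style output: lines, words, bytes.  Lines and bytes come from
-- library primitives; words are counted by folding over characters
-- and noticing space/non-space transitions.
main :: IO ()
main = do
    s <- L.getContents
    let step (n, wasSpace) c =
            let sp = isSpace c
            in (if wasSpace && not sp then n + 1 else n, sp)
        nwords = fst (L.foldl' step (0 :: Int, True) s)
    print (L.count '\n' s, nwords, L.length s)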

Hello Graham,
i don't think these 3 libs allow writing high-level high-performance code in *most* cases. just for example, try to write wc without using "words".
To a newbie, that's a cryptic statement. Are you saying that these libs aren't needed to make a high-performance "wc", and that the Prelude functions are sufficient? (I note too that there is Data.ByteString.Char8.words, but I assume you are talking about the Prelude function.)
i mean that naive haskell code is very slow and 3 or 5 or twelve libs can't solve the problem of ghc generating slow code -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

--- Bulat Ziganshin
Hello Graham,
i don't think these 3 libs allow writing high-level high-performance code in *most* cases. just for example, try to write wc without using "words".
To a newbie, that's a cryptic statement. Are you saying that these libs aren't needed to make a high-performance "wc", and that the Prelude functions are sufficient? (I note too that there is Data.ByteString.Char8.words, but I assume you are talking about the Prelude function.)
i mean that naive haskell code is very slow and 3 or 5 or twelve libs can't solve the problem of ghc generating slow code
Is there something particularly fascinating about naive code written in any language?

Hello Isaac, Monday, September 22, 2008, 11:49:30 PM, you wrote:
i mean that naive haskell code is very slow and 3 or 5 or twelve libs can't solve the problem of ghc generating slow code
Is there something particularly fascinating about naive code written in any language?
yes, in asm the number of instructions executed more or less defines the number of CPU cycles used. C is known as portable asm. Haskell was developed with the goal of hiding implementation details from egg-headed scientists, and this obviously should have some drawbacks -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Tue, 2008-09-23 at 00:20 +0400, Bulat Ziganshin wrote:
Hello Isaac,
Monday, September 22, 2008, 11:49:30 PM, you wrote:
i mean that naive haskell code is very slow and 3 or 5 or twelve libs can't solve the problem of ghc generating slow code
Is there something particularly fascinating about naive code written in any language?
yes, in asm the number of instructions executed more or less defines the number of CPU cycles used.
On modern processors (RISC designs), naive asm is likely to be extraordinarily slow, because memory usage, cache behaviour and register scheduling dominate processor cycles.
C is known as portable asm.
Known as != is. And naive C is also extraordinarily slow, especially if written at a high level. It is not the least bit difficult to write memory hogs in C. (I should know; I've done it. And so has every major software house, including open source projects, that ships C, for that matter.)
Haskell was developed with the goal of hiding implementation details from egg-headed scientists, and this obviously should have some drawbacks
Should != is. Not all shoot-out entries look like C with Haskell syntax (although some do). Naive Haskell can be 100s of times slower than well-tuned C; naive C can be 100s of times slower than well-tuned Haskell (where well-tuned Haskell can just mean using good data structures). It's quite naive indeed to dismiss better data structures and better algorithms as `libraries'. jcc

Hello Jonathan, Tuesday, September 23, 2008, 12:30:19 AM, you wrote:
yes, in asm the number of instructions executed more or less defines the number of CPU cycles used.
.... well, i more or less know all this stuff. but are you really comparing this to Haskell??? are Haskell programs typically written with an eye on cache misses and CPU ticks? i suggest you look into the two papers i mentioned - they show naive Haskell hundreds of times slower than naive C. it's a reality, against all your arguments -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Tue, 2008-09-23 at 00:46 +0400, Bulat Ziganshin wrote:
Hello Jonathan,
Tuesday, September 23, 2008, 12:30:19 AM, you wrote:
yes, in asm the number of instructions executed more or less defines the number of CPU cycles used.
....
well, i more or less know all this stuff. but are you really comparing this to Haskell???
You were tasked with finding a language where `naive' code is fast. Assembler doesn't count; you have to think about caches, pipelines and register pressure even there.
are Haskell programs typically written with an eye on cache misses and CPU ticks?
No. At that level, you write assembler (or C). But assembler and C are usually not written in a naive fashion (not as naive as naive Haskell). And when they are, they suck, performance-wise.
i suggest you look into the two papers i mentioned
I don't remember the exact citations, but I'm sure I've read them. Probably several times over. In most papers I've seen, I think the authors looked for a large contrast; they aren't comparing Haskell and C on truly level ground.
- they show naive Haskell hundreds of times slower
With slow data structures.
vs naive C.
Not so naive; just better data structures. Which are then replicated in the relevant papers. And the result is neither extremely low-level code nor 100s of times slower than C. The term `naive' gets thrown around in the introduction to these papers for effect, but the Haskell code under discussion by the time you reach the conclusion isn't much less `naive' than the C code in the introduction.
it's a reality, against all your arguments
Bang harder!! I don't think the next list over can hear you yet! jcc

Hello Bulat,
On Mon, Sep 22, 2008 at 3:09 PM, Bulat Ziganshin
Hello Graham,
i don't think these 3 libs allow writing high-level high-performance code in *most* cases. just for example, try to write wc without using "words".
To a newbie, that's a cryptic statement. Are you saying that these libs aren't needed to make a high-performance "wc", and that the Prelude functions are sufficient? (I note too that there is Data.ByteString.Char8.words, but I assume you are talking about the Prelude function.)
i mean that naive haskell code is very slow and 3 or 5 or twelve libs can't solve the problem of ghc generating slow code
I'm fairly new to Haskell and the Haskell community, but I can say from my experience of hacking on GHC, the GHC team of developers are very interested in performance improvements, e.g. this thread is about performance! So Bulat, why don't you hack on GHC yourself and help improve the compiler? Or suggest detailed improvements since you seem capable of citing specific examples? Thank you. __ Donnie

Hello Donnie, Tuesday, September 23, 2008, 2:53:17 AM, you wrote:
i mean that naive haskell code is very slow and 3 or 5 or twelve libs can't solve the problem of ghc generating slow code
I'm fairly new to Haskell and the Haskell community, but I can say from my experience of hacking on GHC, the GHC team of developers are very interested in performance improvements, e.g. this thread is about performance! So Bulat, why don't you hack on GHC yourself and help improve the compiler? Or suggest detailed improvements since you seem capable of citing specific examples?
for the same reason i don't feed 5000 men with 7 breads - it's not my business ;) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

bulat.ziganshin:
Hello Donnie,
Tuesday, September 23, 2008, 2:53:17 AM, you wrote:
i mean that naive haskell code is very slow and 3 or 5 or twelve libs can't solve the problem of ghc generating slow code
I'm fairly new to Haskell and the Haskell community, but I can say from my experience of hacking on GHC, the GHC team of developers are very interested in performance improvements, e.g. this thread is about performance! So Bulat, why don't you hack on GHC yourself and help improve the compiler? Or suggest detailed improvements since you seem capable of citing specific examples?
for the same reason i don't feed 5000 men with 7 breads - it's not my business ;)
Ok. So I'll just say: high level, efficient code is an overriding theme of many individuals working on Haskell. Things are better and better each year. We do not stand still. For example, Bulat cites a paper talking about naive list code from 2002, however, by 2008 we know how to do fusion on lists, so it runs in the same time as low level loops, the technique is implemented and you can download it from hackage, http://hackage.haskell.org/cgi-bin/hackage-scripts/package/stream-fusion Simon Marlow is busy adding more efficient GC and parallelism to GHC, and there's a summer project to rewrite the native code generator. GHC gained pointer tagging in the last release cycle, dramatically reducing the cost of algebraic data types. At the same time, we're writing books about how to program "naively" in Haskell, such that your code doesn't suck. That is: training. Teaching people how to write Haskell. We can see it paying off, where naive code performs very very well, http://shootout.alioth.debian.org/gp4/benchmark.php?test=sumcol&lang=all And lots and lots more people able to write good code for Hackage. I find Bulat's outlook rather bleak, and I think it is time to update expectations. Progress is beautiful. -- Don
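To make the fusion point concrete, here is a small sketch assuming the stream-fusion package, whose Data.List.Stream module mirrors the Data.List API (the particular pipeline is an arbitrary example, not a shootout entry):

import qualified Data.List.Stream as S

-- The composed filter/map/sum pipeline fuses into a single loop
-- with no intermediate lists, so this "naive" compositional style
-- runs like a hand-written accumulator loop.
main :: IO ()
main = print (S.sum (S.map (* 2) (S.filter even [1 .. 1000000 :: Int])))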

Hello Don, Tuesday, September 23, 2008, 3:12:37 AM, you wrote:
for the same reason i don't feed 5000 men with 7 breads - it's not my business ;)
Ok. So I'll just say: high level, efficient code is an overriding theme of many individuals working on Haskell. Things are better and better each year. We do not stand still.
yes. when we say that things are greatly improving each year, this means that they were bad previously ;)
For example, Bulat cites a paper talking about naive list code from 2002, however, by 2008 we know how to do fusion on lists, so it runs in the same time as low level loops, the technique is implemented and you can download it from hackage,
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/stream-fusion
someone might ask why this great code isn't used in ghc by default. probably it is just work in progress which isn't yet ready to replace the old slow code. and then we see that fusion isn't a magic wand which improves the speed of every piece of haskell code that's slower than the C; it just produces C-speed code sometimes
Simon Marlow is busy adding more efficient GC and parallelism to GHC, and there's a summer project to rewrite the native code generator.
well, i'm talking about the *current* real situation. if you consider this an attack against Haskell developers, that's your mistake. the problem is that for many years i have written about the slowness of ghc code; every time, you consider this an attack and write that in the future Haskell will become much faster. we are still waiting for this future, however
GHC gained pointer tagging in the last release cycle, dramatically reducing the cost of algebraic data types.
10-20% speed improvement, on average. it's the only real improvement (for my own program) i know of. i think the ByteString-related libs gave bigger improvements, but their use isn't automatic and they don't help in every situation. they just provide fast library code for some concrete (although very frequent) tasks, such as counting lines
At the same time, we're writing books about how to program "naively" in Haskell, such that your code doesn't suck. That is: training. Teaching people how to write Haskell.
it is what i say - if naive code were effective and automagically optimized by ghc, we wouldn't need all those tutorials. anyone who has looked into your tutorial on writing efficient Haskell code will never say that it's easier than in C. so we can conclude that optimized haskell programs are several times slower than C ones and take several times longer to write
We can see it paying off, where naive code performs very very well,
http://shootout.alioth.debian.org/gp4/benchmark.php?test=sumcol&lang=all
yes! it's my beloved example of "elegant" haskell code entirely based on the readInt function added to the ghc libs specifically to win in this test. its implementation is really simple and naive:

-- ---------------------------------------------------------------------
-- Reading from ByteStrings

-- | readInt reads an Int from the beginning of the ByteString.  If there is no
-- integer at the beginning of the string, it returns Nothing, otherwise
-- it just returns the int read, and the rest of the string.
readInt :: ByteString -> Maybe (Int, ByteString)
readInt as
    | null as   = Nothing
    | otherwise =
        case unsafeHead as of
            '-' -> loop True  0 0 (unsafeTail as)
            '+' -> loop False 0 0 (unsafeTail as)
            _   -> loop False 0 0 as
  where
    loop :: Bool -> Int -> Int -> ByteString -> Maybe (Int, ByteString)
    STRICT4(loop)
    loop neg i n ps
        | null ps   = end neg i n ps
        | otherwise =
            case B.unsafeHead ps of
                w | w >= 0x30 && w <= 0x39
                      -> loop neg (i+1) (n * 10 + (fromIntegral w - 0x30)) (unsafeTail ps)
                  | otherwise -> end neg i n ps

    end _    0 _ _  = Nothing
    end True _ n ps = Just (negate n, ps)
    end _    _ n ps = Just (n, ps)

when the gcc developers start adding functions that perform shootout benchmarks to the C libraries, we will continue this discussion :D
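(For reference, the sum-file entry built on readInt is essentially the following sketch - close to, though not literally, the submitted program:)

{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as L

-- Sum one integer per line of stdin, sumcol-style: readInt parses
-- each number and hands back the rest of the stream, and we drop
-- the trailing newline ourselves.
main :: IO ()
main = print . go 0 =<< L.getContents
  where
    go :: Int -> L.ByteString -> Int
    go !acc s = case L.readInt s of
        Nothing      -> acc
        Just (n, s') -> go (acc + n) (L.drop 1 s')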
And lots and lots more people able to write good code for Hackage.
I find Bulat's outlook rather bleak, and I think it is time to update expectations.
Progress is beautiful.
:) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Hello Don, Tuesday, September 23, 2008, 4:22:19 AM, you wrote:
bulat.ziganshin:
when the gcc developers start adding functions that perform shootout benchmarks to the C libraries, we will continue this discussion :D
atoi(3).
it isn't the same as readInt, which was added specifically for this test. it doesn't support arbitrary-size streams and doesn't return the rest of the stream -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

bulat.ziganshin:
Hello Don,
Tuesday, September 23, 2008, 4:22:19 AM, you wrote:
bulat.ziganshin:
when the gcc developers start adding functions that perform shootout benchmarks to the C libraries, we will continue this discussion :D
atoi(3).
it isn't the same as readInt, which was added specifically for this test. it doesn't support arbitrary-size streams and doesn't return the rest of the stream
Hmm? That is wrong. These functions explicitly work on arbitrarily long lazy bytestrings, and return the rest of the stream in a pair:

readInt :: ByteString -> Maybe (Int, ByteString)
readInteger :: ByteString -> Maybe (Integer, ByteString)

These are the initial parts of a bytestring lexing library, more of which is appearing in the bytestring-lex package on Hackage. -- Don

Hello Don, Tuesday, September 23, 2008, 4:36:55 AM, you wrote:
it isn't the same as readInt, which was added specifically for this test. it doesn't support arbitrary-size streams and doesn't return the rest of the stream
Hmm? That is wrong. These functions explicitly work on arbitrarily long lazy bytestrings, and return the rest of the stream in a pair:
a lazy bytestring which is read from a file on demand is the same as an arbitrary-sized stream. the equivalent C code would be much faster (and probably limited by memory speed)
readInt :: ByteString -> Maybe (Int, ByteString)
readInteger :: ByteString -> Maybe (Integer, ByteString)
These are the initial parts of a bytestring lexing library, more of which is appearing in the bytestring-lex package on Hackage.
readInt was added to the FPS library by you and immediately used to improve the speed of this benchmark two years ago. there was no readInteger, since the shootout doesn't contain such a benchmark :) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

2008/9/23 Bulat Ziganshin
Hello Don,
Tuesday, September 23, 2008, 4:22:19 AM, you wrote:
bulat.ziganshin:
when the gcc developers start adding functions that perform shootout benchmarks to the C libraries, we will continue this discussion :D
atoi(3).
it isn't the same as readInt, which was added specifically for this test. it doesn't support arbitrary-size streams and doesn't return the rest of the stream
Yeah, readInt has a more useful type than atoi() in most circumstances, so obviously that's a defect which somehow disqualifies this function... (I wasn't aware that this function was only useful in the shootout context; I should probably scrap it from my other programs now.) I think we should write all the entries in Haskell98 and disable all optimisation in GHC too; that would give us a much fairer picture of Haskell's current performance. -- Jedaï

Hello Chaddaï, Tuesday, September 23, 2008, 4:39:18 AM, you wrote:
it isn't the same as readInt, which was added specifically for this test. it doesn't support arbitrary-size streams and doesn't return the rest of the stream
I think we should write all the entries in Haskell98 and disable all optimisation in GHC too; that would give us a much fairer picture of Haskell's current performance.
well, it's what the C developers do. just look at the full list of C and Haskell entries for this benchmark - all the qualified C programs used the std C/C++ libs; several really optimized entries were disqualified. and then Don found a genius solution to the problem - add this function to the GHC libs. or wasn't the shootout the first use of this function? -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

At the risk of getting sucked into a silly discussion, I'd like to point out
that the C code uses the following "really simple and naive" function:
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/strtol.c?rev=1.42.2.2&content-type=text/x-cvsweb-markup&cvsroot=glibc
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/strtol_l.c?rev=1.4.2.3&content-type=text/x-cvsweb-markup&cvsroot=glibc
Oh, and it "simply and naively" loops with the following:
while (fgets_unlocked (line, MAXLINELEN, stdin))
If Bulat's point is that the shootout has inspired work on Haskell
performance, and in the stdlibs no less, then lovely, and that's all to the
good. Below every high level interface is lots of low level pain.
If his point is anything else, this is getting absurd.
--S

Hello Sterling, Tuesday, September 23, 2008, 5:13:57 AM, you wrote:
Oh, and it "simply and naively" loops with the following: while (fgets_unlocked (line, MAXLINELEN, stdin)) If Bulat's point is that the shootout has inspired work on Haskell performance, and in the stdlibs no less, then lovely, and that's all to the good. Below every high level interface is lots of low level pain.
functions used to make C code faster are obviously worse than those used for Haskell code. just look - Clean gets 2x better performance than C
If his point is anything else, this is getting absurd.
my point is very simple - these tests say nothing about real performance, since they were heavily optimized, including by adding special functions to the ghc libs. the amount of work required to do this is much, much more than the amount of work required to write optimal C/asm code, and this work obviously doesn't speed up every Haskell program. so what we have in the Haskell world now is heroic efforts to speed up shootout tests, which say nothing about real Haskell performance. what we have in practice is a 10-20% speedup in ghc 6.8 and several libs which may improve speed in some usages -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
and this work obviously doesn't speed up every Haskell program. so what we have in the Haskell world now is heroic efforts to speed up shootout tests, which say nothing about real Haskell performance. what we have in practice is a 10-20% speedup in ghc 6.8 and several libs which may improve speed in some usages
If you understand performance as well as you claim to - and from your previous postings, I believe you *do* understand performance well - then you will know that "10-20% speedup" is almost entirely meaningless in isolation.

Any given particular program has a bottleneck; this bottleneck may be different depending on the OS/hardware configuration, although normally it won't be. Improvements which touch that bottleneck can have staggering benefits in the 40-500% range; improvements which are elsewhere have tiny <5% or non-measurable effects. In fact, various improvements made to GHC in the 6.4-6.10 timeline have had enormous, order-of-magnitude improvements to particular code patterns which had particular bottlenecks. Meanwhile they may well have had no effect at all on other code patterns which had different bottlenecks.

You may ask, what are the common code patterns? What are the common bottlenecks? I'm not aware of good studies to answer these questions, although they probably exist; I don't read widely the research in this area. [Naive C programs tend to IO bottleneck or memory bottleneck; I strongly suspect naive haskell programs tend to GC bottleneck; but I don't think either of these observations is particularly profound or useful]

What matters to a particular programmer of course is not actually common patterns and common bottlenecks. It is the bottleneck in his particular program. However: The Shootout is a *game*. It even says that in its name. It's a game which many of us enjoy playing; if you don't enjoy it, please feel free not to play. Many of us find that, by playing the game, we learn a lot of interesting things about the low-level performance of GHC in interesting edge-cases. The quad-core machine recently added to the benchmark has enabled us to learn interesting things about `par` and Control.Parallel. Learning these things, and sharing them, may help us write better programs, help us teach other people to write better programs, and help the GHC team write a better compiler. Jules

Hello Jules, Tuesday, September 23, 2008, 2:21:34 PM, you wrote:
performance. what we have in practice is a 10-20% speedup in ghc 6.8 and several libs which may improve speed in some usages
If you understand performance as well as you claim to - and from your previous postings, I believe you *do* understand performance well - then you will know that "10-20% speedup" is almost entirely meaningless in isolation.
Any given particular program has a bottleneck; this bottleneck may be
that's good when we consider optimization of a specific program; in my own one, the bottlenecks have been studied and rewritten either in C or in low-level Haskell. but when we talk about compiler speed, we are forced to talk about some average values. these numbers are given by the GHC developers; i have seen the same on one test of my own. a few fixed bottlenecks don't say too many good things about ghc - it's not that we are now 100x faster than C at least sometimes, it's that we have fixed some cases where we were 100x slower -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

the amount of work required to do this is much, much more than the amount of work required to write optimal C/asm code
I'm sorry, but no it's not. I've been using Haskell for a little under two
years now, and I'm already able to produce programs with significantly less
pain which outperform the C equivalents. Sure, if I pour a lot more time and
head scratching into my C, then I can probably beat my Haskell, but I just
don't have the time (or the need) to introduce pointer tagging, lazy
evaluation, and referential transparency to my C code.
Anyway, this thread has lost any usefulness.
-- /jve

Hello John, Tuesday, September 23, 2008, 4:27:05 PM, you wrote:
the amount of work required to do this is much, much more than the amount of work required to write optimal C/asm code
I'm sorry, but no it's not. I've been using Haskell for a little under two years now, and I'm already able to produce programs with significantly less pain which outperform the C equivalents. Sure, if
can you write a program which sums hexadecimal numbers that is both significantly shorter than a C 10-liner and works faster? :) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Probably not, but I think you completely missed my point. Perhaps I should
have originally written "my original C equivalents" rather than "the".
You're probably just a better C programmer than me.
-- /jve

Hello John, Tuesday, September 23, 2008, 5:39:17 PM, you wrote:
Probably not, but I think you completely missed my point. Perhaps I should have originally written "my original C equivalents" rather than "the". You're probably just a better C programmer than me.
well, i'm not talking about me personally or anyone else. my points are that 1) modern C compilers allow you to get close-to-asm performance; Haskell doesn't. 2) the fastest Haskell code is very hard to write, much harder than the C one. so in trying to write the fastest possible code in Haskell you lose several times over, in both speed of code and speed of development. of course i assume here the best Haskell and C optimization knowledge; personal skills may vary -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Tue, Sep 23, 2008 at 2:52 PM, Bulat Ziganshin
Hello John,
Tuesday, September 23, 2008, 5:39:17 PM, you wrote:
Probably not, but I think you completely missed my point. Perhaps I should have originally written "my original C equivalents" rather than "the". You're probably just a better C programmer than me.
well, i'm not talking about me personally or anyone else. my points are that
1) modern C compilers allow you to get close-to-asm performance; Haskell doesn't. 2) the fastest Haskell code is very hard to write, much harder than the C one
This is only true for tiny toy benchmarks, not for real applications. For real applications you're going to have bottlenecks in small parts of the code. So what if that 5-20% of the code has to be written using specialized techniques to get speed, when you get several times more productivity/correctness/maintainability in the 80-95% of the code which is not the bottleneck?

If you're involved in a game where everyone tries to show off how their programming language deals with the bottlenecks, then obviously you'd use these "ugly" techniques to optimize things. It may be that it's more convenient to write this low-level code in C (though I wouldn't say it's "much" more convenient - slightly more for some cases, less for others), but the price you pay is that you have to use this low-level style for *everything*. Most of the inconvenience, IMO, is due to syntax, not semantics (specifically having to do "<-" on separate lines in order to read mutable memory, rather than being able to extract the value at the usage site - though I do believe some syntactic sugar was on the way for that?).

So I think you're wrong to criticize people for showing that Haskell can be fast. Yes it takes effort (though you can increasingly avoid most of it by using libraries like ByteString), but nobody is saying that it doesn't. You're arguing against a straw man. The point is that you *can* get good performance in Haskell for the parts of your program which are a bottleneck. The shootout only tests these parts, which is why the Haskell code in those submissions is sometimes a bit obscure (though not nearly to the extent you make out). But that's not how *real* applications look! In real applications that kind of code would be only a fraction of the total code base; most of it would be perfectly ordinary Haskell! (whereas in C everything would be, well, C) Cheers, -- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862

On Sep 22, 2008, at 20:16 , Bulat Ziganshin wrote:
Ok. So I'll just say: high level, efficient code is an overriding theme of many individuals working on Haskell. Things are better and better each year. We do not stand still.
yes. when we say that things are greatly improving each year, this means that they were bad previously ;)
You're reaching --- trying to use the past to assert the present --- to try to support your incorrect thesis. People who insist something can't be done should stay out of the way of those who are doing it :) -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

Don Stewart ha scritto:
[...] Ok. So I'll just say: high level, efficient code is an overriding theme of many individuals working on Haskell. Things are better and better each year. We do not stand still.
Is there any roadmap for improving support for intensive IO multiplexing? Or, at least, some papers about how this is implemented in GHC? As far as I understand, select is used in two separate places. How much effort would it take to implement a pluggable "reactor" (select, poll, epoll, kqueue, /dev/poll, and so on)?
[...]
Thanks Manlio Perillo

Hello Manlio, Tuesday, September 23, 2008, 1:36:16 PM, you wrote:
Is there any roadmap for improving support for intensive IO multiplexing? Or, at least, some papers about how this is implemented in GHC?
As far as I understand, select is used in two separate places. How much effort would it take to implement a pluggable "reactor" (select, poll, epoll, kqueue, /dev/poll, and so on)?
look at the alt-network package. improving the existing ghc i/o library may be a bad idea, since it is a large monolithic piece of code which is already hard to maintain. there were several efforts to provide a new i/o library; of those, my Streams library was the largest. unfortunately, everyone working on those libs stopped. so the most probable scenario now is that we will eventually get a ByteString-based modular I/O library from the FPS/Binary developers -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin ha scritto:
Hello Manlio,
Tuesday, September 23, 2008, 1:36:16 PM, you wrote:
Is there any roadmap for improving support for intensive IO multiplexing? Or, at least, some papers about how this is implemented in GHC?
As far as I understand, select is used in two separate places. How much effort would it take to implement a pluggable "reactor" (select, poll, epoll, kqueue, /dev/poll, and so on)?
look at the alt-network package
improving the existing ghc i/o library may be a bad idea, since it is a large monolithic piece of code which is already hard to maintain. there were several efforts to provide a new i/o library; of those, my Streams library was the largest. unfortunately, everyone working on those libs stopped. so the most probable scenario now is that we will eventually get a ByteString-based modular I/O library from the FPS/Binary developers
Maybe improving GHC to make Haskell suitable for writing highly reliable internet servers is not of interest?
Thanks Manlio Perillo

Hello Manlio, Tuesday, September 23, 2008, 3:14:58 PM, you wrote:
Maybe improving GHC to make Haskell suitable for writing highly reliable internet servers is not of interest?
well, if it's interesting - do it :) various people do what they find most exciting/important. actually, the alt-network package is exactly about fast network i/o -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin ha scritto:
Hello Manlio,
Tuesday, September 23, 2008, 3:14:58 PM, you wrote:
Maybe improving GHC to make Haskell suitable for writing highly reliable internet servers is not of interest?
well, if it's interesting - do it :)
Unfortunately, I no longer have the time for "do it yourself", unless there is *really* no other solution (and, in that case, the solution is to use Erlang).
various people do what they find most exciting/important. actually, the alt-network package is exactly about fast network i/o
Where can I find alt-network? Thanks Manlio Perillo

Hello Manlio, Tuesday, September 23, 2008, 3:43:03 PM, you wrote:
Maybe improving GHC to make Haskell suitable for writing highly reliable internet servers is not of interest?
well, if it's interesting - do it :)
Unfortunately, I no longer have the time for "do it yourself", unless there is *really* no other solution (and, in that case, the solution is to use Erlang).
believe it or not, other developers are in the same position :)
most exciting/important. actually, the alt-network package is exactly about fast network i/o
http://www.cs.helsinki.fi/u/ekarttun/network-alt/network-alt-0.3.tar.gz
Licence: the code is subject to a three-clause BSD licence.
Contact: Einar Karttunen <ekarttun at cs helsinki fi>, musasabi on #haskell in irc.freenode.net
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Manlio Perillo wrote:
Don Stewart ha scritto:
[...] Ok. So I'll just say: high level, efficient code is an overriding theme of many individuals working on Haskell. Things are better and better each year. We do not stand still.
Is there any roadmap for improving support for intensive IO multiplexing? Or, at least, some papers about how this is implemented in GHC?
As far as I understand, select is used in two separate places. How much effort would it take to implement a pluggable "reactor" (select, poll, epoll, kqueue, /dev/poll, and so on)?
We'd certainly support any efforts to add support for a more modern I/O multiplexing or asynchronous I/O back-end to the IO library. It's not too difficult, because the interface between the low-level I/O supplier and the rest of the IO library is rather small - just a few functions that read and write blocks of data in a blocking or non-blocking way. The blocking variants currently communicate with the I/O manager thread which does select() on behalf of the clients. Before doing this, though, I'd urge you to check that your application really does have scaling issues with the current system. If it doesn't, then great, and if it does then you have a good benchmark to evaluate the replacement. Cheers, Simon

Simon Marlow ha scritto:
Manlio Perillo wrote: [...] We'd certainly support any efforts to add support for a more modern I/O multiplexing or asynchronous I/O back-end to the IO library. It's not too difficult, because the interface between the low-level I/O supplier and the rest of the IO library is rather small - just a few functions that read and write blocks of data in a blocking or non-blocking way. The blocking variants currently communicate with the I/O manager thread which does select() on behalf of the clients.
Is there some documentation that summarizes the current status, and how it all fits together? Are there benchmarks that show the use of a separate I/O manager thread is a good solution?
Before doing this, though, I'd urge you to check that your application really does have scaling issues with the current system. If it doesn't, then great, and if it does then you have a good benchmark to evaluate the replacement.
Right now I don't have any Haskell applications. The only applications that I develop and need to scale are, until now, web applications, and for this I use Nginx + mod_wsgi + wsgix (the last two implemented by myself, http://hg.mperillo.ath.cx/). I'm quite satisfied with the current status, and I don't think I need to change language/frameworks. However, I'm looking for a good environment for implementing generic internet servers, or web applications with special needs. As an example, one of my "maybe future" tasks is to write a simple BitTorrent tracker + seeder. I tried to write it as an Nginx module, but it requires a lot of boilerplate code, so I gave up. Twisted (a Python asynchronous framework) is a comfortable environment, but I feel concurrent Haskell is superior. It surely will work using just select, but since I have experience (including indirect experience) with both Twisted and Nginx, I know that using select is asking for trouble (though the solution used by Haskell is very new to me). It is ok for clients, but not for servers that need to face the internet.
Cheers, Simon
Thanks Manlio Perillo

manlio_perillo:
However, I'm looking for a good environment for implementing generic internet servers, or web applications with special needs. As an example, one of my "maybe future" tasks is to write a simple BitTorrent tracker + seeder.
You could look at conjure, the BitTorrent client that uses STM for network multiplexing, http://darcs.haskell.org/~lemmih/conjure/
Twisted (a Python asynchronous framework) is a comfortable environment, but I feel concurrent Haskell is superior.
Should be a lot faster, given there's compiled native code, and no global locks. Actually, the very kind of thing we see on the shootout now :)
It surely will work using just select, but since I have experience (including indirect experience) with both Twisted and Nginx, I know that using select is asking for trouble (though the solution used by Haskell is very new to me).
Yes, you'd probably use STM's `orElse` to multiplex IO from different sources, and GHC's lightweight threads for concurrency. We've used solutions like this at Galois to build servers that are both high level and efficient. -- Don
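As a sketch of what that multiplexing looks like (the channel names here are made up for illustration, not from any particular server):

import Control.Concurrent.STM

-- Wait on two sources at once: if the first readTChan would block,
-- `orElse` falls through to the second, so the composite transaction
-- blocks only until *either* channel has data.
readEither :: TChan a -> TChan b -> STM (Either a b)
readEither sock timer =
    fmap Left (readTChan sock) `orElse` fmap Right (readTChan timer)

A server loop then just does ev <- atomically (readEither sock timer) and dispatches on Left/Right, with each connection handled by its own forkIO'd thread.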

On Wed, Sep 24, 2008 at 12:31 PM, Don Stewart
Twisted (a Python asynchronous framework) is a comfortable environment, but I feel concurrent Haskell is superior.
Should be a lot faster, given there's compiled native code, and no global locks.
The concurrent Haskell programming model is vastly nicer. Twisted is entirely event-driven, so it's nearly as far from a comfortable environment as you might hope to stretch. I can't speak to their respective performance strengths. However, I've done quite a bit of concurrent networking with lightweight coroutines in Python (greenlet), and its performance is nothing to write home about.

Don Stewart ha scritto:
manlio_perillo:
However, I'm looking for a good environment for implementing generic internet servers, or web applications with special needs. As an example, one of my "maybe future" tasks is to write a simple BitTorrent tracker + seeder.
You could look at conjure, the BitTorrent client that uses STM for network multiplexing,
It's unfortunate that the project seems to be dead.
Twisted (a Python asynchronous framework) is a comfortable environment, but I feel concurrent Haskell is superior.
Should be a lot faster, given there's compiled native code, and no global locks.
With Twisted you usually don't use threads.
[...]
Manlio Perillo

Manlio Perillo wrote:
Simon Marlow ha scritto:
Manlio Perillo wrote: [...] We'd certainly support any efforts to add support for a more modern I/O multiplexing or asynchronous I/O back-end to the IO library. It's not too difficult, because the interface between the low-level I/O supplier and the rest of the IO library is rather small - just a few functions that read and write blocks of data in a blocking or non-blocking way. The blocking variants currently communicate with the I/O manager thread which does select() on behalf of the clients.
Is there some documentation that summarizes the current status, and how it all fits together?
Sadly no, but the code is mostly restricted to a couple of modules (GHC.Handle and GHC.Conc).
Are there benchmarks that show the use of a separate I/O manager thread is a good solution?
Compared to what? I did measure the difference between doing this in Haskell code in the IO library (as with -threaded) against doing it in the RTS (without -threaded), and the Haskell version won hands down, mainly because it only re-initiates the select() when it has to change the file handles in the set. Using epoll() instead of select() should be even better. Cheers, Simon

Simon Marlow ha scritto:
Manlio Perillo wrote:
Simon Marlow ha scritto:
Manlio Perillo wrote: [...] We'd certainly support any efforts to add support for a more modern I/O multiplexing or asynchronous I/O back-end to the IO library. It's not too difficult, because the interface between the low-level I/O supplier and the rest of the IO library is rather small - just a few functions that read and write blocks of data in a blocking or non-blocking way. The blocking variants currently communicate with the I/O manager thread which does select() on behalf of the clients.
Is there some documentation that summarizes the current status, and how it all fits together?
Sadly no, but the code is mostly restricted to a couple of modules (GHC.Handle and GHC.Conc).
Ok. I'm also reading the "unify" paper. What I would like to do is to write a library for highly efficient IO, reusing the code from Nginx (which is well written and reusable). What Nginx gives us is:

1) Several modules for efficient event handling on all platforms. There is also Windows support with IOCP, but the code is not public (lately, however, someone is porting Nginx to Windows, and I can always ask the author of Nginx to make his code public).
2) Support for SSL.
3) Support for timeouts, using a red-black tree. In Haskell timeouts are handled by spawning a thread with forkIO, but is this efficient? (see the sketch below)
4) Support for an async DNS resolver.
5) Support for "sending" buffer chains. A buffer chain is "sent" with writev. In Haskell a ByteString has several chunks, so it can be efficiently "sent" using writev.
6) Support for file buffers. A file buffer is "sent" with sendfile, if available. There is no equivalent in Haskell.

Since I started working with Nginx, I have been interested in reusing the Nginx code as a library/framework, but none of the languages I knew had native support for microthreads (Python has greenlets, but it already has Twisted too, so there is little interest in implementing yet another async framework). GHC seems the best platform to use, since it has microthreads, and the Haskell language allows "safe" use of OS threads.
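On the timeout question in point 3, a minimal sketch of the forkIO-based approach, using System.Timeout from base (the five-second figure is arbitrary):

import System.Timeout (timeout)

-- timeout forks a lightweight watchdog thread that throws an
-- asynchronous exception at the action if it has not finished
-- within the given number of microseconds; Nothing means it fired.
main :: IO ()
main = do
    r <- timeout (5 * 1000000) getLine
    case r of
        Nothing   -> putStrLn "timed out"
        Just line -> putStrLn ("got: " ++ line)

Since the watchdog is a green thread, thousands of pending timeouts are cheap to create; whether that matches a red-black tree of timers under real server load is exactly the open question here.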
Are there benchmarks that show the use of a separate I/O manager thread is a good solution?
Compared to what? I did measure the difference between doing this in Haskell code in the IO library (as with -threaded) against doing it in the RTS (without -threaded),
Yes, this is what I meant.
and the Haskell version won hands down, mainly because it only re-initiates the select() when it has to change the file handles in the set. Using epoll() instead of select() should be even better.
Ok, thanks. Is there only one IO manager thread in GHC? And can forkIO execute code in a separate OS thread, if necessary?
Cheers, Simon
Regards Manlio

simonmarhaskell:
Manlio Perillo wrote:
Don Stewart ha scritto:
[...] Ok. So I'll just say: high level, efficient code is an overriding theme of many individuals working on Haskell. Things are better and better each year. We do not stand still.
Is there any roadmap for improving support for intensive IO multiplexing? Or, at least, some papers about how this is implemented in GHC?
As far as I understand, select is used in two separate places. How much effort would it take to implement a pluggable "reactor" (select, poll, epoll, kqueue, /dev/poll, and so on)?
We'd certainly support any efforts to add support for a more modern I/O multiplexing or asynchronous I/O back-end to the IO library. It's not too difficult, because the interface between the low-level I/O supplier and the rest of the IO library is rather small - just a few functions that read and write blocks of data in a blocking or non-blocking way. The blocking variants currently communicate with the I/O manager thread which does select() on behalf of the clients.
Indeed, it has been done at least once elsewhere, http://www.seas.upenn.edu/~lipeng/homepage/unify.html Check the tarball for a custom bytestring and epoll implementation used to scale up to 10M Haskell threads, http://www.seas.upenn.edu/%7Elipeng/unify/unify-0.0.1.tar.gz Perhaps code worth resuscitating for more epoll-based servers. -- Don

On Sep 22, 2008, at 18:53 , Donnie Jones wrote:
I'm fairly new to Haskell and the Haskell community, but I can say from my experience of hacking on GHC that the GHC team of developers is very interested in performance improvements, e.g. this thread is about performance! So Bulat, why don't you hack on GHC yourself and help improve the compiler? Or suggest detailed improvements, since you seem capable of citing specific examples?
Because he's convinced himself it's pointless because Haskell will never be able to run faster than C (note how he keeps trying to run dons out of the discussion because he has proof to the contrary). -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

Hello Brandon, Wednesday, September 24, 2008, 9:36:56 PM, you wrote:
Because he's convinced himself it's pointless because Haskell will never be able to run faster than C
taking into account that C compilers are very close to the maximum speed possible, and that this required many years of development by many people, it's not as easy as you believe ;)
(note how he keeps trying to run dons out of the discussion because he has proof to the contrary).
Don is free to present such examples. Unfortunately, all the examples I've seen were due to the use of slow C code which can easily be made faster. In particular, the shootout can't be considered a representative benchmark -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Sep 24, 2008, at 13:44 , Bulat Ziganshin wrote:
(note how he keeps trying to run dons out of the discussion because he has proof to the contrary).
Don is free to present such examples. Unfortunately, all the examples I've seen were due to the use of slow C code which can easily be made faster. In particular, the shootout can't be considered a representative benchmark
"I reserve the right to disallow any actual evidence for any reason I can come up with, including amorphous and vacuous ones" (you can almost always write something faster, but with how much effort?) Thanks, I get enough of that from the government here. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

Hello Brandon, Wednesday, September 24, 2008, 11:13:14 PM, you wrote:
can come up with, including amorphous and vacuous ones" (you can almost always write something faster, but with how much effort?)
as I said, the effort to optimize Haskell code is several times larger while the result is several times slower -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Sep 24, 2008, at 15:20 , Bulat Ziganshin wrote:
Wednesday, September 24, 2008, 11:13:14 PM, you wrote:
can come up with, including amorphous and vacuous ones" (you can almost always write something faster, but with how much effort?)
as I said, the effort to optimize Haskell code is several times larger while the result is several times slower
...and we're back to "dons demonstrated otherwise, so you have to invent reasons to disqualify him". If all you can do is get in the way of people doing what you claim is impossible, please stop. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

Hello Brandon, Thursday, September 25, 2008, 12:43:55 AM, you wrote:
as I said, the effort to optimize Haskell code is several times larger while the result is several times slower
...and we're back to "dons demonstrated otherwise, so you have to
please show me the example that you mean and I will show the exact reasons why this Haskell code wasn't compared to the best C code -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Wed, Sep 24, 2008 at 11:20 PM, Bulat Ziganshin
please show me the example that you mean and I will show the exact reasons why this Haskell code wasn't compared to the best C code
The shootout seems pretty popular, and there are still a lot of C programmers around, so I wonder why the C code on the shootout would be of poor quality.

Hello david48, Thursday, September 25, 2008, 1:38:55 AM, you wrote:
please show me the example that you mean and I will show the exact reasons why this Haskell code wasn't compared to the best C code
The shootout seems pretty popular, and there are still a lot of C programmers around, so I wonder why the C code on the shootout would be of poor quality.
1. The speed of most shootout examples depends heavily on the availability and quality of the libraries bundled with the compiler. The shootout authors don't allow using 3rd-party libs, nor rewriting this functionality from scratch. For example, C falls behind in the multithreading tests because C compilers don't include green-thread libs
2. Unlike Don, the C authors can't modify the libs bundled with their compilers to reach maximum speed on these benchmarks. For example, using readInt instead of the generic read made the program tens of times faster, and even let it slightly outperform the C version that uses standard library functions -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
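For the curious, the readInt point in concrete form: a hedged sketch of a sumfile-style program, summing whitespace-separated integers from stdin. B.readInt parses an Int straight out of the packed buffer, avoiding the String round-trip that the generic read forces:

import qualified Data.ByteString.Char8 as B

-- Sum the integers on stdin. B.readInt returns Just (n, rest) on
-- success; read :: String -> Int would instead force a lazy list
-- of Chars to be built for every number parsed.
main :: IO ()
main = do
  input <- B.getContents
  print (sum [n | Just (n, _) <- map B.readInt (B.words input)])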

I'm going to have to agree with David... even if you ignore the multi-threaded projects, why couldn't the C programs just implement a very specific version of the third-party library inside their code? Is there anything stopping them?
On Wed, Sep 24, 2008 at 5:50 PM, Bulat Ziganshin
Hello david48,
Thursday, September 25, 2008, 1:38:55 AM, you wrote:
please show me the example that you mean and I will show the exact reasons why this Haskell code wasn't compared to the best C code
The shootout seems pretty popular, and there are still a lot of C programmers around, so I wonder why the C code on the shootout would be of poor quality.
1. The speed of most shootout examples depends heavily on the availability and quality of the libraries bundled with the compiler. The shootout authors don't allow using 3rd-party libs, nor rewriting this functionality from scratch. For example, C falls behind in the multithreading tests because C compilers don't include green-thread libs
2. Unlike Don, the C authors can't modify the libs bundled with their compilers to reach maximum speed on these benchmarks. For example, using readInt instead of the generic read made the program tens of times faster, and even let it slightly outperform the C version that uses standard library functions
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
-- /jve

Hallo, John Van Enk wrote:
I'm going to have to agree with David... even if you ignore the multi-threaded projects, why couldn't the C programs just implement a very specific version of the third-party library inside their code? Is there anything stopping them?
Maybe they don't care *that* much? Cheers, -alex http://www.ventonegro.org/

Hello John, Thursday, September 25, 2008, 1:55:18 AM, you wrote: ask the benchmark authors, or just examine the code before drawing any conclusions. For example, in the sumfile benchmark, the C entries use fgets+atoi or C++ streams to read the input numbers, so C runs 2x slower than Clean and 4x slower than low-level Haskell code (which wasn't accepted, for the same reasons). So we have a "fight of invalids", where the Haskell invalid was made much faster (on this test) by adding a special function to the ghc libs
I'm going to have to agree with David... even if you ignore the multi-threaded projects, why couldn't the C programs just implement a very specific version of the third-party library inside their code? Is there anything stopping them?
On Wed, Sep 24, 2008 at 5:50 PM, Bulat Ziganshin
wrote: Hello david48,
Thursday, September 25, 2008, 1:38:55 AM, you wrote:
please show me the example that you mean and I will show the exact reasons why this Haskell code wasn't compared to the best C code
The shootout seems pretty popular, and there are still a lot of C programmers around, so I wonder why the C code on the shootout would be of poor quality.
1. The speed of most shootout examples depends heavily on the availability and quality of the libraries bundled with the compiler. The shootout authors don't allow using 3rd-party libs, nor rewriting this functionality from scratch. For example, C falls behind in the multithreading tests because C compilers don't include green-thread libs
2. Unlike Don, the C authors can't modify the libs bundled with their compilers to reach maximum speed on these benchmarks. For example, using readInt instead of the generic read made the program tens of times faster, and even let it slightly outperform the C version that uses standard library functions
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

"John Van Enk"
I'm going to have to agree with David... even if you ignore the multi-threaded projects, why couldn't the C programs just implement a very specific version of the third-party library inside their code?
Or they could just FFI to the Haskell libraries :-) -k -- If I haven't seen further, it is by standing in the footprints of giants

Hello Ketil, Thursday, September 25, 2008, 8:57:05 PM, you wrote:
"John Van Enk"
writes:
I'm going to have to agree with David... even if you ignore the multi-threaded projects, why couldn't the C programs just implement a very specific version of the third-party library inside their code?
Or they could just FFI to the Haskell libraries :-)
there are lots of green-thread libs for C; they are just not bundled with gcc -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Hello Brandon, Thursday, September 25, 2008, 12:43:55 AM, you wrote:
as I said, the effort to optimize Haskell code is several times larger while the result is several times slower
...and we're back to "dons demonstrated otherwise, so you have to invent reasons to disqualify him". If all you can do is get in the way of people doing what you claim is impossible, please stop.
and btw, it's you who doesn't have any technical arguments and tries to "disqualify" me instead. If you have something technical to say, you may do it instead of Don. If you just *believe* that Haskell is cool while not knowing anything about the tech details, you can stop this meaningless talk -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On 2008.09.22 21:12:06 +0400, Bulat Ziganshin
Hello Simon,
Monday, September 22, 2008, 9:03:52 PM, you wrote:
With bytestrings, unboxed arrays, light-weight threads and other tricks, we can usually replace all those ugly low-level programs with nice high-level haskell ones
I don't think these 3 libs allow writing high-level, high-performance code in *most* cases. Just as an example, try to write wc without using "words"
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
http://haskell.org/haskellwiki/Wc seems to do fine; you'll notice it drops "lines" after the first version. -- gwern class ASPIC Duress BNC WWSV of intelligence retrieval Al PRF

Hello Gwern, Tuesday, September 23, 2008, 3:33:02 AM, you wrote:
high-performance code in *most* cases. Just as an example, try to write wc without using "words"
http://haskell.org/haskellwiki/Wc seems to do fine; you'll notice it drops "lines" after the first version.
actually it counts lines using a built-in function. You may find that naive C is 6x faster than naive Haskell, and the difference is so small only because the C version is bound by memory speed -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

2008/9/23 Bulat Ziganshin
http://haskell.org/haskellwiki/Wc seems to do fine; you'll notice it drops "lines" after the first version.
actually it counts lines using a built-in function. You may find that naive C is 6x faster than naive Haskell, and the difference is so small only because the C version is bound by memory speed
So what? Strings represented as lists of characters are slow? What an incredible discovery! The ByteString versions are fast; ByteString was invented especially for IO-intensive situations. It's a library you can use pretty naively and mostly get excellent performance, so what exactly isn't Haskelly enough for you in this solution? The guts of ByteString aren't idiomatic Haskell? And? The guts of the JVM aren't written in Java either... At least ByteString was built over GHC, which means it can be done. Besides, part of ByteString's performance comes from stream fusion, and that's specifically something that is hard to achieve outside of Haskell... What exactly is your point?
- Haskell is slow, and we can't make it faster? That's obviously false, as demonstrated by the progress of recent years.
- Haskell is slower than C? Well, that's probably true, because C can touch the hardware directly and can always optimize every single aspect of a computation... On the other hand, that kind of uber-optimization has a cost that I am unwilling to pay; I would rather write high-level Haskell and take a small hit on execution time.
- We shouldn't center ourselves on performance to promote Haskell (or should we even promote Haskell)? Well, there you may have a point, but maybe it would be better to just make that point, and avoid denying other people's efforts to make Haskell faster?
-- Jedaï
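For reference, this is the flavor of naive-but-fast code in question: a minimal, hedged ByteString "wc -l" counting newline bytes on stdin, roughly what the later versions on the wiki page do:

import qualified Data.ByteString.Char8 as B

-- Count lines by counting newline bytes; B.count runs as a tight
-- loop over the packed buffer rather than walking a list of Chars.
main :: IO ()
main = B.getContents >>= print . B.count '\n'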

--- Simon Brenner
On Mon, Sep 22, 2008 at 2:07 PM, Bulat Ziganshin
wrote: this overall test is useless for speed comparison. afair, there are only 2-3 programs whose speed doesn't heavily depend on libraries. in the DNA test, for example, Tcl (or PHP?) was the leader just because it has a better regexp library
On the regex-dna benchmark, I'll have to agree. It's unfortunate to have a benchmark so dependent on the set of libraries included in the distribution, and e.g. Text.Regex.PCRE kicks Text.Regex.Posix's behind in this benchmark - but we probably can't use it only because one's bundled and the other isn't. Maybe we could roll our own regexp engine for this specific benchmark though (for example, Text.Regex.TDFA is within 10% or something of PCRE and AFAIK pure Haskell - a customized and downsized version of that could probably be made quite competitive).
You could always suggest use of Text.Regex.PCRE, provide a program and instructions on how to install Text.Regex.PCRE on Ubuntu. -snip-
With bytestrings, unboxed arrays, light-weight threads and other tricks, we can usually replace all those ugly low-level programs with nice high-level haskell ones ...
Go do!

igouy2:
--- Simon Brenner
wrote: On Mon, Sep 22, 2008 at 2:07 PM, Bulat Ziganshin
wrote: this overall test is useless for speed comparison. afair, there are only 2-3 programs whose speed doesn't heavily depend on libraries. in the DNA test, for example, Tcl (or PHP?) was the leader just because it has a better regexp library
On the regex-dna benchmark, I'll have to agree. It's unfortunate to have a benchmark so dependent on the set of libraries included in the distribution, and e.g. Text.Regex.PCRE kicks Text.Regex.Posix's behind in this benchmark - but we probably can't use it only because one's bundled and the other isn't. Maybe we could roll our own regexp engine for this specific benchmark though (for example, Text.Regex.TDFA is within 10% or something of PCRE and AFAIK pure Haskell - a customized and downsized version of that could probably be made quite competitive).
You could always suggest use of Text.Regex.PCRE, provide a program and instructions on how to install Text.Regex.PCRE on Ubuntu.
-snip-
With bytestrings, unboxed arrays, light-weight threads and other tricks, we can usually replace all those ugly low-level programs with nice high-level haskell ones ...
Go do!
All is in hand. Tim Newsham's uploaded a regex-pcre and regex-posix entry to the wiki, and I'm testing now on quad core. If it survives testing, we'll submit it to Isaac. -- Don

igouy2:
--- Simon Brenner
wrote: On Mon, Sep 22, 2008 at 2:07 PM, Bulat Ziganshin
wrote: this overall test is useless for speed comparison. afair, there are only 2-3 programs whose speed doesn't heavily depend on libraries. in the DNA test, for example, Tcl (or PHP?) was the leader just because it has a better regexp library
On the regex-dna benchmark, I'll have to agree. It's unfortunate to have a benchmark so dependent on the set of libraries included in the distribution, and e.g. Text.Regex.PCRE kicks Text.Regex.Posix's behind in this benchmark - but we probably can't use it only because one's bundled and the other isn't. Maybe we could roll our own regexp engine for this specific benchmark though (for example, Text.Regex.TDFA is within 10% or something of PCRE and AFAIK pure Haskell - a customized and downsized version of that could probably be made quite competitive).
You could always suggest use of Text.Regex.PCRE, provide a program and instructions on how to install Text.Regex.PCRE on Ubuntu.
I've now submitted a Text.Regex.PCRE parallelised entry written by Tim Newsham. It is by far the fastest Haskell regex entry so far (down to 9s on quad core, from 100+ seconds on single core for the old entry),
http://alioth.debian.org/tracker/index.php?func=detail&aid=311132&group_id=30402&atid=411646
It does require the regex-pcre library, which, if it isn't in your package system on Ubuntu, you can certainly build:
$ wget http://hackage.haskell.org/packages/archive/regex-pcre-builtin/0.94.2.0.7.7/... ar.gz
$ tar xzf regex-pcre-builtin-0.94.2.0.7.7.tar.gz
$ cd regex-pcre-builtin-0.94.2.0.7.7
$ runhaskell Setup.hs configure --prefix=$HOME
$ runhaskell Setup.hs build
$ sudo runhaskell Setup.hs install
I also included these details on the ticket. It uses a simple parMap strategy. Cheers, Don
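For readers who want the shape of that strategy, here is a hedged sketch (not Tim Newsham's actual entry) of counting each regex-dna pattern in parallel with parMap. The =~ operator is from regex-pcre; asking for a [[String]] result yields one element per match, and rdeepseq is the fully-evaluating strategy in the current parallel package (older versions spelled it rnf):

import Control.Parallel.Strategies (parMap, rdeepseq)
import Text.Regex.PCRE ((=~))

-- One spark per pattern: each count is forced in parallel, then
-- the results are collected in order.
countAll :: [String] -> String -> [(String, Int)]
countAll patterns dna =
  parMap rdeepseq (\p -> (p, length (dna =~ p :: [[String]]))) patterns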

Don Stewart wrote:
[...] I've now submitted a Text.Regex.PCRE parallelised entry written by Tim Newsham. It is by far the fastest haskell regex entry so far (down to 9s on quad core, from 100+ seconds on single core for the old entry),
http://alioth.debian.org/tracker/index.php?func=detail&aid=311132&group_id=30402&atid=411646
It does require the regex-pcre library, which if it isn't in your package system on Ubuntu, you can certainly build,
What performance gain do you obtain using regex-pcre-builtin compared with a native Haskell regex library?
[...]
Thanks Manlio Perillo

And, though I had never seen it before, the current winner for speed is "ATS" ( http://www.ats-lang.org/ ), which is a dependently-typed functional language.

ChrisK wrote:
And, though I had never seen it before, the current winner for speed is "ATS" ( http://www.ats-lang.org/ ), which is a dependently-typed functional language.
And as discussed on Reddit recently[1] it uses a good deal of embedded C/C++ and so is subject to the same non-idiomatic complaints.
While the non-idiomatic complaint is a valid one, I think it's also overrated. Back in the day people said that an automatic compiler would never be able to out-perform hand-written assembly; and today gcc produces better assembly code than any mere mortal. Yet occasionally C/C++ programmers still feel the need to do the compiler's job. Similarly, if ATS or Haskell lets one write better C/C++ than any mere mortal, then it has won -- even if programmers feel the need to do the compiler's job from time to time. Compilers for any language will never be omniscient, but they're far more consistent about applying known optimization patterns globally.
While some tasks do require absolute performance, high-level languages seek to optimize programmer time, which is far more valuable than cpu time for the great majority of tasks. The optimization techniques for Haskell are quite different than for C, but I wouldn't say they are inherently "harder" in any real sense. Optimizing Haskell requires a deep understanding of the runtime internals of ghc (or other compiler of choice); optimizing C requires a deep understanding of the runtime internals of C's memory 'runtime'. The only difference I see is that C's view of the world is taught as canon and has been a target of research for longer.
[1] http://www.reddit.com/r/programming/comments/72hmw/language_shootout_ats_is_...
-- Live well, ~wren

Don Stewart wrote:
Thanks to those guys who've submitted parallel programs to the language benchmarks game, we're climbing up the rankings, now in 3rd, and ahead of C :)
http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=all
Just one or two more parallel programs required...
Submit them here, and we can test on quad core,
There is a lot of effort going into improving CPU time, but what about memory usage?
http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=ghc&lang2=ghc
Compared to the C versions:
- mandelbrot requires 18x the memory
- k-nucleotide requires 13x the memory
- reverse-complement requires 4.9x the memory
The main problem is with k-nucleotide, which requires a total of 2,920,980 KB!
http://shootout.alioth.debian.org/u64q/benchmark.php?test=knucleotide&lang=all&sort=kb
http://shootout.alioth.debian.org/u64q/benchmark.php?test=revcomp&lang=all&sort=kb
For mandelbrot there is an alternate version with better memory and CPU usage.
-- Don
Manlio Perillo
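For anyone who wants to check numbers like these locally, GHC's runtime can report heap statistics directly. A typical invocation (the program and input file names here are placeholders) looks like:

$ ghc -O2 knucleotide.hs -o knucleotide
$ ./knucleotide < knucleotide-input.txt +RTS -sstderr

The -sstderr summary includes the maximum-residency figure, which is a rough analogue of the shootout's memory column (the shootout measures OS-level peak usage, so the two won't match exactly).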

manlio_perillo:
Don Stewart wrote:
Thanks to those guys who've submitted parallel programs to the language benchmarks game, we're climbing up the rankings, now in 3rd, and ahead of C :)
http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=all
Just one or two more parallel programs required...
Submit them here, and we can test on quad core,
There is a lot of effort going into improving CPU time, but what about memory usage?
http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=ghc&lang2=ghc
No one is looking at memory at the moment, as the parallel/multicore profiler isn't working in 6.8.2. -- Don

Don Stewart wrote:
[...]
There is a lot of effort going into improving CPU time, but what about memory usage?
http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=ghc&lang2=ghc
No one is looking at memory at the moment, as the parallel/multicore profiler isn't working in 6.8.2.
But the k-nucleotide application doesn't make use of concurrent or parallel features...
-- Don
Manlio Perillo
participants (24)
- Alex Sandro Queiroz e Silva
- Andrew Coppin
- Brandon S. Allbery KF8NH
- Bryan O'Sullivan
- Bulat Ziganshin
- Chaddaï Fouché
- ChrisK
- david48
- Don Stewart
- Donnie Jones
- Graham Fawcett
- Gwern Branwen
- Isaac Gouy
- John Van Enk
- Jonathan Cast
- Jules Bean
- Ketil Malde
- Manlio Perillo
- Sebastian Sylvan
- Simon Brenner
- Simon Marlow
- Sterling Clover
- Thomas Davie
- wren ng thornton