Re: Why can't Haskell be faster?

Peter Hercek wrote:
* it is easy to mark stuff strict (even in function signatures etc), so it is possible to save on unnecessary CAF creations
Also, the Clean compiler has a strictness analyzer. The compiler will analyze code and find many (but not all) cases where a function argument can be made strict without changing the behavior of the program.
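On the GHC side, the same kind of annotation is available by hand via seq or bang patterns, and GHC's strictness analyser finds many of these cases itself at -O. A minimal sketch (module and function names invented for illustration):

  {-# LANGUAGE BangPatterns #-}
  module StrictSum where

  -- Without the bang on 'acc', each recursive call would pile up a thunk;
  -- with it (or with GHC's strictness analysis at -O), the accumulator is
  -- evaluated eagerly at every step.
  sumTo :: Int -> Int
  sumTo n = go 0 1
    where
      go !acc i
        | i > n     = acc
        | otherwise = go (acc + i) (i + 1)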

Hi
I've been working on optimising Haskell for a little while
(http://www-users.cs.york.ac.uk/~ndm/supero/), so here are my thoughts
on this. The Clean and Haskell languages both reduce to pretty much
the same Core language, with pretty much the same type system, once
you get down to it - so I don't think the difference in performance
is a language thing, but a compiler thing. The
uniqueness type stuff may give Clean a slight benefit, but I'm not
sure how much they use that in their analyses.
Both Clean and GHC do strictness analysis - I don't know which one
does better, but both do quite well. I think Clean has some
generalised fusion framework, while GHC relies on rules and short-cut
deforestation. GHC goes through C-- to C or ASM, while Clean has been
generating native code for a lot longer. GHC is based on the STG
machine, while Clean is based on the ABC machine - not sure which is
better, but there are differences there.
My guess is that the native code generator in Clean beats GHC, which
wouldn't be too surprising as GHC is currently rewriting its CPS and
Register Allocator to produce better native code.
Thanks
Neil
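To make the "rules" approach concrete: GHC lets libraries state rewrite rules that fire at -O, and the classic illustration is collapsing two list traversals into one (this is the standard map/map example from the GHC documentation, not the actual library source):

  module MapFusion where

  -- At -O, GHC rewrites 'map f (map g xs)' to 'map (f . g) xs', removing
  -- the intermediate list; short-cut deforestation works in the same spirit
  -- via foldr/build rules in the standard libraries.
  {-# RULES
  "map/map" forall f g xs.  map f (map g xs) = map (f . g) xs
    #-}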

ndmitchell:
My guess is that the native code generator in Clean beats GHC, which wouldn't be too surprising as GHC is currently rewriting its CPS and Register Allocator to produce better native code.
Yes, this was my analysis too -- it's in the native code gen, which is perhaps the main GHC bottleneck now. -- Don

So in a few years time when GHC has matured we can expect performance to be
on par with current Clean? So Clean is a good approximation to peak
performance?
--ryan

goalieca:
So in a few years time when GHC has matured we can expect performance to be on par with current Clean? So Clean is a good approximation to peak performance?
The current Clean compiler, for micro benchmarks, seems to be rather good, yes. Any slowdown wrt. the same program in Clean could be considered a bug in GHC... And remember usually Haskell is competing against 'high level' languages like python for adoption, where we're 5-500x faster anyway... -- Don

On 31/10/2007, Don Stewart
And remember usually Haskell is competing against 'high level' languages like python for adoption, where we're 5-500x faster anyway...
Not so sure about that last thing. I'd love to use Haskell for performance, in other words use it because it makes it easier to write parallel and concurrent programs (NDP and STM mainly, though I wouldn't mind some language support for message passing, and perhaps Sing#-style static protocol specifications, with some high degree of inference). Anyway, in order for that to be reasonable I think it's important that even the sequential code (where actual data dependencies enforce evaluation order) runs very quickly; otherwise we'll lose out to some C-based language (written with 10x the effort) again when we start bumping into the wall of Amdahl's law... -- Sebastian Sylvan

Don Stewart
goalieca:
So in a few years time when GHC has matured we can expect performance to be on par with current Clean? So Clean is a good approximation to peak performance?
If I remember the numbers, Clean is pretty close to C for most benchmarks, so I guess it is fair to say it is a good approximation to practical peak performance. Which proves that it is possible to write efficient low-level code in Clean.
And remember usually Haskell is competing against 'high level' languages like python for adoption, where we're 5-500x faster anyway...
Unfortunately, they replaced line counts with bytes of gzip'ed code -- while the former certainly has its problems, I simply cannot imagine what relevance the latter has (beyond hiding extreme amounts of repetitive boilerplate in certain languages).

When we compete against Python and its ilk, we do so for programmer productivity first, and performance second. LOC was a nice measure, and encouraged terser and more idiomatic programs than the current crop of performance-tweaked low-level stuff.

BTW, Python isn't so bad, performance wise. Much of what I do consists of reading some files, building up some hashes (associative arrays or finite maps, depending on where you come from :-), and generating some output. Python used to do pretty well here compared to Haskell, with rather efficient hashes and text parsing, although I suspect ByteString IO and other optimizations may have changed that now.

-k -- If I haven't seen further, it is by standing in the footprints of giants
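For comparison, the kind of "read a file, build a finite map, write some output" script being discussed might look roughly like this in Haskell with strict ByteStrings (a sketch only; the file name is made up):

  import qualified Data.ByteString.Char8 as B
  import qualified Data.Map as M

  -- Count word frequencies in a file, Python-dict style: strict ByteString
  -- input, a Data.Map built with fromListWith, plain output.
  main :: IO ()
  main = do
    ws <- fmap B.words (B.readFile "input.txt")
    let freq = M.fromListWith (+) [ (w, 1 :: Int) | w <- ws ]
    mapM_ (\(w, n) -> putStrLn (B.unpack w ++ " " ++ show n)) (M.toList freq)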

I assume the reason they switched away from LOC is to prevent
programmers artificially reducing their LOC count, e.g. by using
a = 5; b = 6;
rather than
a = 5;
b = 6;
in languages where newlines aren't syntactically significant. When
gzipped, I guess that the ";\n" string will be represented about as
efficiently as just the single semi-colon.

Ketil Malde wrote:
Python used to do pretty well here compared to Haskell, with rather efficient hashes and text parsing, although I suspect ByteString IO and other optimizations may have changed that now.
It still does just fine. For typical "munge a file with regexps, lists, and maps" tasks, Python and Perl remain on par with comparably written Haskell. This is because the scripting-level code acts as a thin layer of glue around I/O, regexps, lists, and dicts, all of which are written in native code.

The Haskell regexp libraries actually give us something of a leg down with respect to Python and Perl.

The aggressive use of polymorphism in the return type of (=~) makes it hard to remember which of the possible return types gives me what information. Not only did I write a regexp tutorial to understand the API in the first place, I have to reread it every time I want to match a regexp.

A suitable solution would be a return type of RegexpMatch a => Maybe a (to live alongside the existing types, but aiming to become the one that's easy to remember), with appropriate methods on a, but I don't have time to write up a patch.

Hi Bryan,

I wrote the current regex API, so your suggestions are interesting to me. The same goes for anyone else's regex API opinions, of course.

Bryan O'Sullivan wrote:
Ketil Malde wrote:
Python used to do pretty well here compared to Haskell, with rather efficient hashes and text parsing, although I suspect ByteString IO and other optimizations may have changed that now.
It still does just fine. For typical "munge a file with regexps, lists, and maps" tasks, Python and Perl remain on par with comparably written Haskell. This because the scripting-level code acts as a thin layer of glue around I/O, regexps, lists, and dicts, all of which are written in native code.
The Haskell regexp libraries actually give us something of a leg down with respect to Python and Perl.
True, the pure Haskell library is not as fast as a C library. In particular, the current regex-tdfa handles lazy bytestring in a sub-optimal manner. This may eventually be fixed. But the native code libraries have also been wrapped in the same API, and they are quite fast when combined with strict ByteStrings.
The aggressive use of polymorphism in the return type of (=~) makes it hard to remember which of the possible return types gives me what information. Not only did I write a regexp tutorial to understand the API in the first place, I have to reread it every time I want to match a regexp.
The (=~) operator uses many return types provided by the instances of RegexContext. These are all thin wrappers around the unpolymorphic return types of the RegexLike class. So (=~) could be avoided altogether, or another API created.
A suitable solution would be a return type of RegexpMatch a => Maybe a (to live alongside the existing types, but aiming to become the one that's easy to remember), with appropriate methods on a, but I don't have time to write up a patch.
The (=~~) is the monadic wrapper for (=~) to allow for different failure behaviors. So using (=~~) with Maybe is already possible, and gives Nothing whenever there are zero matches. But more interesting to me is learning what API you would like to see. What would you like the code that uses the API to be? Could you sketch either the definition or usage of your RegexMatch class suggestion? I don't use my own regex API much, so external feedback and ideas would be wonderful. -- Chris
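For concreteness, here is roughly how the same match reads at several of the RegexContext result types (assuming one of the wrapped backends such as regex-posix; the string and pattern are made up):

  import Text.Regex.Posix ((=~), (=~~))

  main :: IO ()
  main = do
    let s = "foo123bar456"
    print (s =~ "[0-9]+" :: Bool)                     -- True: is there a match at all?
    print (s =~ "[0-9]+" :: String)                   -- "123": text of the first match
    print (s =~ "[0-9]+" :: (String, String, String)) -- ("foo","123","bar456"): before, match, after
    print (s =~~ "[0-9]+" :: Maybe String)            -- Just "123"; Nothing when nothing matches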

ChrisK wrote:
The Haskell regexp libraries actually give us something of a leg down with respect to Python and Perl.
True, the pure Haskell library is not as fast as a C library.
Actually, I wasn't referring to the performance of the libraries, merely to the non-stick nature of the API. For my purposes, regex-pcre performs well (though I owe you some patches to make it, and other regex back ends, compile successfully out of the box).
But more interesting to me is learning what API you would like to see. What would you like the code that uses the API to be?
Python's regexp API is pretty easy to use, and also to remember. Here's what it does for match objects. http://docs.python.org/lib/match-objects.html

Unfortunately, they replaced line counts with bytes of gzip'ed code -- while the former certainly has its problems, I simply cannot imagine what relevance the latter has (beyond hiding extreme amounts of repetitive boilerplate in certain languages).
Sounds pretty fair to me. Programming is a job of compressing a solution set. Excessive boilerplate might mean that you have to type a lot, but doesn't necessarily mean that you have to think a lot. I think the previous line count was skewed in favor of very terse languages like Haskell, especially languages that let you put many ideas onto a single line. At the very least there should be a constant factor applied when comparing Haskell line counts to Python line counts, for example. (Python has very strict rules about putting multiple things on the same line.)

Obviously no simple measure is going to satisfy everyone, but I think the gzip measure is more even-handed across a range of languages. It probably more closely approximates the amount of mental effort and hence time it requires to construct a program (i.e. I can whip out a lot of lines of code in Python very quickly, but it takes a lot more of them to do the same work as a single, dense line of Haskell code).
When we compete against Python and its ilk, we do so for programmer productivity first, and performance second. LOC was a nice measure, and encouraged terser and more idiomatic programs than the current crop of performance-tweaked low-level stuff.
The Haskell entries in the shootout are very obviously written for speed and not elegance. If you want to do better on the LoC measure, you can definitely do so (at the expense of speed).
Tim Newsham http://www.thenewsh.com/~newsham/

On 01/11/2007, Tim Newsham
Personally I think syntactic noise is highly distracting, and semantic noise is even worse! Gzip'd files don't show you that one language will require you to do 90% book-keeping for 10% algorithm while the other lets you get on with the job; it may make it look as if both languages are roughly equally good at letting the programmer focus on the important bits. I'm not sure what metric to use, but actively disguising noisy languages using compression sure doesn't seem anywhere close to the ideal. Token count would be good, but then we'd need a parser for each language, which is quite a bit of work to do... -- Sebastian Sylvan

Hello Sebastian, Thursday, November 1, 2007, 9:58:45 PM, you wrote:
the ideal. Token count would be good, but then we'd need a parser for each language, which is quite a bit of work to do...
I think that wc (word count) would be a good enough approximation -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On 02/11/2007, Bulat Ziganshin
Yes, as long as you police abuse (e.g. "if(somevar)somfunccall(foo,bar,baz)" shouldn't be treated as a single word). -- Sebastian Sylvan

"Sebastian Sylvan"
Obviously no simple measure is going to satisfy everyone, but I think the gzip measure is more even handed across a range of languages. It probably more closely aproximates the amount of mental effort [..]
I'm not sure I follow that reasoning? At any rate, I think the ICFP contest is much better as a measure of productivity. But, just like for performance, LOC for the shootout can be used as a micro-benchmark.
Personally I think syntactic noise is highly distracting, and semantic noise is even worse!
This is important - productivity doesn't depend so much on the actual typing, but on the ease of refactoring, identifying and fixing bugs, i.e. *reading* code. Verbosity means noise, and also lower information content in a screenful of code. I think there were some (Erlang?) papers where they showed a correlation between program size (in LOC), time of development (and possibly number of bugs?) - regardless of language.
Token count would be good, but then we'd need a parser for each language, which is quite a bit of work to do...
Whatever you do, it'll be an approximation, so why not 'wc -w'? With 'wc -c' for J etc where programs can be written as spaceless sequences of symbols. Or just average chars, words and lines? -k -- If I haven't seen further, it is by standing in the footprints of giants
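(For what it's worth, a 'wc -w'-style token approximation is a one-liner in most languages; a throwaway Haskell version, just to illustrate the idea:)

  -- Counts whitespace-separated tokens on stdin, in the spirit of 'wc -w'.
  main :: IO ()
  main = interact (\s -> show (length (words s)) ++ "\n")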

Hi
So in a few years time when GHC has matured we can expect performance to be on par with current Clean? So Clean is a good approximation to peak performance?
No. Many real-world programs could run at least twice as fast, I'm relatively sure. Clean is a good short-term target, but in the long run Haskell should be aiming for equivalence with highly optimised C. Thanks Neil

On 10/31/07, Neil Mitchell
in the long run Haskell should be aiming for equivalence with highly optimised C.
Really, that's not very ambitious. Haskell should be setting its sights higher. :-)

When I first started reading about Haskell I misunderstood what currying was all about. I thought that if you provided one argument to a two-argument function, say, then it'd do partial evaluation. Very soon I was sorely let down as I discovered that it simply made a closure that waits for the second argument to arrive so the reduction can be carried out.

But every day, while coding at work (in C++), I see situations where true partial evaluation would give a big performance payoff, and yet there are so few languages that natively support it. Of course it would require part of the compiler to be present in the runtime. But by generating code in inner loops specialised to the data at hand it could easily outperform C code in a wide variety of real world code. I know there has been some research in this area, and some commercial C++ products for partial evaluation have appeared, so I'd love to see it in an easy to use Haskell form one day.

Just dreaming, I know... -- Dan

On Wed, 31 Oct 2007, Dan Piponi wrote:
But every day, while coding at work (in C++), I see situations where true partial evaluation would give a big performance payoff, and yet there are so few languages that natively support it. Of course it would require part of the compiler to be present in the runtime. But by generating code in inner loops specialised to the data at hand it could easily outperform C code in a wide variety of real world code. I know there has been some research in this area, and some commercial C++ products for partial evaluation have appeared, so I'd love to see it in an easy to use Haskell form one day.
I weakly remember an article on HaWiki about that ... If you write

  foo :: X -> Y -> Z
  foo x = let bar y = ... x ... y ... in bar

would this give you true partial evaluation?

On Wed, 2007-10-31 at 23:44 +0100, Henning Thielemann wrote:
On Wed, 31 Oct 2007, Dan Piponi wrote:
But every day, while coding at work (in C++), I see situations where true partial evaluation would give a big performance payoff, and yet there are so few languages that natively support it. Of course it would require part of the compiler to be present in the runtime. But by generating code in inner loops specialised to the data at hand it could easily outperform C code in a wide variety of real world code. I know there has been some research in this area, and some commercial C++ products for partial evaluation have appeared, so I'd love to see it in an easy to use Haskell form one day.
I weakly remember an article on HaWiki about that ...
Probably RuntimeCompilation (or something like that and linked from the Knuth-Morris-Pratt implementation on HaWiki) written by Andrew Bromage.
If you write
foo :: X -> Y -> Z
foo x = let bar y = ... x ... y ... in bar
would this give you true partial evaluation?
No. Partial evaluation (usually) implies a heck of a lot more than what you are trying to do.
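(For concreteness: what the 'foo x = let bar y = ...' pattern above does buy you is sharing of the work done on the first argument across later applications, rather than generation of specialised code. A minimal sketch with invented names:)

  import qualified Data.Set as Set

  -- The Set is built once, when 'memberOf' is applied to its first argument,
  -- and is then reused by every query made through the returned function;
  -- no specialised code is generated at run time.
  memberOf :: Ord a => [a] -> (a -> Bool)
  memberOf xs = let s = Set.fromList xs in \x -> Set.member x s

  main :: IO ()
  main = do
    let isKeyword = memberOf ["let", "in", "where", "case", "of"]
    print (map isKeyword ["let", "foo", "where"])   -- [True,False,True]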

G'day all.
Quoting Derek Elkins
Probably RuntimeCompilation (or something like that and linked from the Knuth-Morris-Pratt implementation on HaWiki) written by Andrew Bromage.
I didn't keep a copy, but if someone wants to retrieve it from the Google cache and put it on the new wiki (under the new licence, of course), please do so. Cheers, Andrew Bromage

On 10/31/07, ajb@spamcop.net
I didn't keep a copy, but if someone wants to retrieve it from the Google cache and put it on the new wiki (under the new licence, of course), please do so.
Cheers, Andrew Bromage
Done: http://www.haskell.org/haskellwiki/RuntimeCompilation . Please update it as needed. Justin

Quoting Justin Bailey
Done: http://www.haskell.org/haskellwiki/RuntimeCompilation . Please update it as needed.
Thanks! Cheers, Andrew Bromage

There are many ways to implement currying. And even with GHC you can get it
to do some work given one argument if you write the function the right way.
I've used this in some code where it was crucial.
But yeah, a code generator at run time is a very cool idea, and one that has
been studied, but not enough.
-- Lennart

Hello Lennart, Thursday, November 1, 2007, 2:45:49 AM, you wrote:
But yeah, a code generator at run time is a very cool idea, and one that has been studied, but not enough.
VM-based languages (Java, C#) have runtimes that compile bytecode to native code at runtime -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Yes, of course. But they don't do partial evaluation.

On Wed, Oct 31, 2007 at 03:37:12PM +0000, Neil Mitchell wrote:
My guess is that the native code generator in Clean beats GHC, which wouldn't be too surprising as GHC is currently rewriting its CPS and Register Allocator to produce better native code.
I don't think the register allocator is being rewritten so much as it is being written:

  stefan@stefans:/tmp$ cat X.hs
  module X where

  import Foreign
  import Data.Int

  memset :: Ptr Int32 -> Int32 -> Int -> IO ()
  memset p v i = p `seq` v `seq` case i of
      0 -> return ()
      _ -> poke p v >> memset (p `plusPtr` sizeOf v) v (i - 1)

  stefan@stefans:/tmp$ ghc -fbang-patterns -O2 -c -fforce-recomp -ddump-asm X.hs
  ...
  X_zdwa_info:
          movl 8(%ebp),%eax
          testl %eax,%eax
          jne .LcH6
          movl $base_GHCziBase_Z0T_closure+1,%esi
          addl $12,%ebp
          jmp *(%ebp)
  .LcH6:
          movl 4(%ebp),%ecx
          movl (%ebp),%edx
          movl %ecx,(%edx)
          movl (%ebp),%ecx
          addl $4,%ecx
          decl %eax
          movl %eax,8(%ebp)
          movl %ecx,(%ebp)
          jmp X_zdwa_info
  ...

Admittedly that's better than it used to be (I recall 13 memory references last time I tested it), but still... the reason for your performance woes should be quite obvious in that snippet.

Stefan

Hi
I don't think the register allocator is being rewritten so much as it is being written:
From talking to Ben, who rewrote the register allocator over the summer, he said that the new graph based register allocator is pretty good. The thing that is holding it back is the CPS conversion bit, which was also being rewritten over the summer, but didn't get finished. I think these are both things which are likely to be done for 6.10.
Thanks Neil

On Thu, Nov 01, 2007 at 02:30:17AM +0000, Neil Mitchell wrote:
Hi
I don't think the register allocator is being rewritten so much as it is being written:
From talking to Ben, who rewrote the register allocator over the summer, he said that the new graph based register allocator is pretty good. The thing that is holding it back is the CPS conversion bit, which was also being rewritten over the summer, but didn't get finished. I think these are both things which are likely to be done for 6.10.
Oh, that's good news. I look forward to a massive increase in the performance of GHC-compiled programs, most specifically GHC itself. Stefan

Yes, that's right. We'll be doing a lot more work on the code generator in the rest of this year and 2008. Here "we" includes Norman Ramsey and John Dias, as well as past interns Michael Adams and Ben Lippmeier, so we have real muscle!

Simon

On 01/11/2007, Simon Peyton-Jones
Yes, that's right. We'll be doing a lot more work on the code generator in the rest of this year and 2008. Here "we" includes Norman Ramsey and John Dias, as well as past interns Michael Adams and Ben Lippmeier, so we have real muscle!
That's very good to know. I wonder where I could read more about the current state of the art in Haskell compilation techniques and about the implementation of GHC in general? Is there a book on it, or maybe some group of papers that would help me understand it? Cheers, Paulo Matos
-- Paulo Jorge Matos - pocm at soton.ac.uk http://www.personal.soton.ac.uk/pocm PhD Student @ ECS University of Southampton, UK

http://hackage.haskell.org/trac/ghc/wiki/Commentary

On 01/11/2007, at 2:37 AM, Neil Mitchell wrote:
My guess is that the native code generator in Clean beats GHC, which wouldn't be too surprising as GHC is currently rewriting its CPS and Register Allocator to produce better native code.
I discussed this with Rinus Plasmeijer (chief designer of Clean) a couple of years ago, and if I remember correctly, he said that the native code generator in Clean was very good, and a significant reason why Clean produces (relatively) fast executables. I think he said that they had an assembly programming guru on their team. (Apologies to Rinus if I am mis-remembering the conversation). At the time I was impressed by how fast Clean could recompile itself. Cheers, Bernie.

Bernie wrote:
I discussed this with Rinus Plasmeijer (chief designer of Clean) a couple of years ago, and if I remember correctly, he said that the native code generator in Clean was very good, and a significant reason why Clean produces (relatively) fast executables. I think he said that they had an assembly programming guru on their team. (Apologies to Rinus if I am mis-remembering the conversation).
That guru would be John van Groningen... If I understood correctly, and I think I did, John is now working on a Haskell front end for the Clean compiler---which is actually quite interesting in the light of the present discussion. Cheers, Stefan

Neil wrote:
The Clean and Haskell languages both reduce to pretty much the same Core language, with pretty much the same type system, once you get down to it - so I don't think the difference between the performance is a language thing, but it is a compiler thing. The uniqueness type stuff may give Clean a slight benefit, but I'm not sure how much they use that in their analyses.
From what I know from the Nijmegen team, having the uniqueness information available and actually using it for code generation does allow for an impressive speed-up. The thing is: in principle, there is, I think, no reason why we can't do the same thing for Haskell. Of course, the Clean language exposes uniqueness types at its surface level, but that is in no way essential to the underlying analysis. Exposing uniqueness types is, in that sense, just an alternative to monadic encapsulation of side effects. So, a Haskell compiler could just implement a uniqueness analysis under the hood and use the results for generating code that does in-place updates that are guaranteed to maintain referential transparency.

Interestingly---but now I'm shamelessly plugging a paper of Jurriaan Hage, Arie Middelkoop, and myself, presented at this year's ICFP [*]---such an analysis is very similar to sharing analysis, which may be used by compilers for lazy languages to avoid unnecessary thunk updates.

Cheers, Stefan

[*] Jurriaan Hage, Stefan Holdermans, and Arie Middelkoop. A generic usage analysis with subeffect qualifiers. In Ralf Hinze and Norman Ramsey, editors, Proceedings of the 12th ACM SIGPLAN International Conference on Functional Programming, ICFP 2007, Freiburg, Germany, October 1–3, pages 235–246. ACM Press, 2007. http://doi.acm.org/10.1145/1291151.1291189.
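(As an aside on the Haskell side of this comparison: guaranteed-safe in-place update is already available to programmers through the ST monad, where the rank-2 type of runST plays a role analogous to Clean's uniqueness attribute. A minimal sketch, with invented names:)

  import Control.Monad.ST
  import Data.STRef

  -- The mutable cell never escapes runST, so the in-place updates cannot be
  -- observed from outside and the result is an ordinary pure Int.
  sumST :: [Int] -> Int
  sumST xs = runST (do
      r <- newSTRef 0
      mapM_ (\x -> modifySTRef r (+ x)) xs
      readSTRef r)

Here sumST [1..10] evaluates to 55, with the accumulation done by destructive update.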

Stefan Holdermans wrote:
Exposing uniqueness types is, in that sense, just an alternative to monadic encapsulation of side effects.
While *World -> (a, *World) seems to work in practice, I wonder what its (denotational) semantics are. I mean, the two programs

  loop, loop' :: *World -> ((),*World)
  loop w = loop w
  loop' w = let (_,w') = print "x" w in loop' w'

both have denotation _|_ but are clearly different in terms of side effects. (The example is from SPJ's awkward-squad tutorial.) Any pointers?

Regards, apfelmus

Hi there,
I'm new here, so sorry if I'm stating the obvious.
On Nov 1, 2007 2:46 PM, apfelmus
Stefan Holdermans wrote:
Exposing uniqueness types is, in that sense, just an alternative to monadic encapsulation of side effects.
While *World -> (a, *World) seems to work in practice, I wonder what its (denotational) semantics are. I mean, the two programs
loop, loop' :: *World -> ((),*World)
loop w = loop w
loop' w = let (_,w') = print "x" w in loop' w'
both have denotation _|_ but are clearly different in terms of side effects. (The example is from SPJs awkward-squad tutorial.) Any pointers?
For side-effecting programs one has to consider bisimilarity. It's common that programs which are semantically equivalent (under operational or denotational semantics) behave differently with regard to side effects (I/O) - i.e. they are not bisimilar. http://en.wikipedia.org/wiki/Bisimulation Arnar


Paul Hudak wrote:
loop, loop' :: *World -> ((),*World)
loop w = loop w
loop' w = let (_,w') = print "x" w in loop' w'
both have denotation _|_ but are clearly different in terms of side effects.
One can certainly use an operational semantics such as bisimulation, but you don't have to abandon denotational semantics. The trick is to make output part of the "final answer". For a conventional imperative language one could define, for example, a (lifted, recursive) domain:
Answer = Terminate + (String x Answer)
and then define a semantic function "meaning", say, such that:
meaning loop  = _|_
meaning loop' = <"x", <"x", ... >>
In other words, loop denotes bottom, whereas loop' denotes the infinite sequence of "x"s. There would typically also be a symbol to denote proper termination, perhaps <>.
A good read on this stuff is Reynolds book "Theories of Programming Languages", where domain constructions such as the above are called "resumptions", and can be made to include input as well.
In the case of Clean, programs take as input a "World" and generate a "World" as output. One of the components of that World would presumably be "standard output", and that component's value would be _|_ in the case of loop, and <"x", <"x", ... >> in the case of loop'. Another part of the World might be a file system, a printer, a missile firing, and so on. Presumably loop and loop' would not affect those parts of the World.
Ah, so the denotational semantics would be a bit like good old stream-based IO. However, we have to change the semantics of -> , there's no way to put the side effects in *World only. I mean, the problem of both loop and loop' is that they never return any world at all, the side effects occur during function evaluation. Then, we'd need a "purity lemma" that states that any function not involving the type *World as in- and output is indeed pure, which may be a bit tricky to prove in the presence of higher-order functions and polymorphism. I mean, the function arrows are "tagged" for side effects in a strange way, namely by looking like *World -> ... -> (*World, ...). In contrast, we can see IO a as an abstract (co-)data type subject to some straightforward operational semantics, no need to mess with the pure -> . So, in a sense, the Haskell way is cleaner than the Clean way ;) Regards, apfelmus
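(For concreteness, Paul's Answer domain renders directly as a lazy Haskell codata type; the names below are invented for illustration:)

  -- Answer = Terminate + (String x Answer), as a lazy Haskell data type.
  data Answer = Terminate              -- proper termination, <>
              | Output String Answer   -- one output event, then the rest of the answer

  -- The denotations described above:
  meaningLoop, meaningLoop' :: Answer
  meaningLoop  = meaningLoop                 -- _|_
  meaningLoop' = Output "x" meaningLoop'     -- <"x", <"x", ... >>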

On Nov 2, 2007, at 6:35 , apfelmus wrote:
during function evaluation. Then, we'd need a "purity lemma" that states that any function not involving the type *World as in- and output is indeed pure, which may be a bit tricky to prove in the presence of higher-order functions and polymorphism. I mean, the function arrows are "tagged" for side effects in a strange way, namely by looking like *World -> ... -> (*World, ...).
I don't quite see that; the Clean way looks rather suspiciously like my "unwrapped I/O in GHC" example from a couple weeks ago, so I have trouble seeing where any difficulty involving functions not using *World / RealWorld# creeps in. I will grant that hiding *World / RealWorld# inside IO is cleaner from a practical standpoint, though. Just not from a semantic one. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

Brandon S. Allbery KF8NH wrote:
apfelmus wrote:
during function evaluation. Then, we'd need a "purity lemma" that states that any function not involving the type *World as in- and output is indeed pure, which may be a bit tricky to prove in the presence of higher-order functions and polymorphism. I mean, the function arrows are "tagged" for side effects in a strange way, namely by looking like *World -> ... -> (*World, ...).
I don't quite see that; the Clean way looks rather suspiciously like my "unwrapped I/O in GHC" example from a couple weeks ago, so I have trouble seeing where any difficulty involving functions not using *World / RealWorld# creeps in.
I will grant that hiding *World / RealWorld# inside IO is cleaner from a practical standpoint, though. Just not from a semantic one.
What do you mean? I mean, in Clean, we may ask the following question: are all functions of type, say,

  forall a . Either (a -> *World -> a) String -> [*World]

or

  Either (forall a . a -> *World -> a) String -> Maybe *World

pure? In Haskell, the answer to any such question is unconditionally "yes" (unless you're hacking with unsafePerformIO and GHC internals like RealWorld# of course) even with plenty of appearances of the IO type constructor. But in Clean, functions may perform side effects; that's the only way to explain why the examples loop and loop' aren't the same.

Regards, apfelmus

On Fri, 2007-11-02 at 08:35 -0400, Brandon S. Allbery KF8NH wrote:
On Nov 2, 2007, at 6:35 , apfelmus wrote:
during function evaluation. Then, we'd need a "purity lemma" that states that any function not involving the type *World as in- and output is indeed pure, which may be a bit tricky to prove in the presence of higher-order functions and polymorphism. I mean, the function arrows are "tagged" for side effects in a strange way, namely by looking like *World -> ... -> (*World, ...).
I don't quite see that; the Clean way looks rather suspiciously like my "unwrapped I/O in GHC" example from a couple weeks ago, so I have trouble seeing where any difficulty involving functions not using *World / RealWorld# creeps in.
I will grant that hiding *World / RealWorld# inside IO is cleaner from a practical standpoint, though. Just not from a semantic one.
On the contrary. GHC's IO newtype isn't an implementation of IO in Haskell at all. It's an implementation in a language that has a Haskell-compatible subset, but that also has semantically bad constructs like unsafePerformIO and unsafeInterleaveIO that give side effects to operations of pure, non-RealWorld#-involving types. Clean's type system is specified in a way that eliminates both functions from the language, which recovers purity. But proving that is harder than I'd like to attempt. jcc

On Nov 2, 2007, at 11:51 , Jonathan Cast wrote:
I will grant that hiding *World / RealWorld# inside IO is cleaner from a practical standpoint, though. Just not from a semantic one.
On the contrary. GHC's IO newtype isn't an implementation of IO in Haskell at all. It's an implementation in a language that has a Haskell-compatible subset, but that also has semantically bad constructs
Differing viewpoints, I guess; from my angle, Clean's "uniqueness constraint" looks like a hack hidden in the compiler. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

On Fri, 2007-11-02 at 11:56 -0400, Brandon S. Allbery KF8NH wrote:
On Nov 2, 2007, at 11:51 , Jonathan Cast wrote:
I will grant that hiding *World / RealWorld# inside IO is cleaner from a practical standpoint, though. Just not from a semantic one.
On the contrary. GHC's IO newtype isn't an implementation of IO in Haskell at all. It's an implementation in a language that has a Haskell-compatible subset, but that also has semantically bad constructs
Differing viewpoints, I guess; from my angle, Clean's "uniqueness constraint" looks like a hack hidden in the compiler.
Yeah. After all, the "uniqueness constraint" has a theory with an excellent pedigree (IIUC linear logic, whose proof theory Clean uses here, goes back at least to the 60s, and Wadler proposed linear types for IO before anybody had heard of monads). It's not some random hack somebody happened to notice would work, any more than existential types are. jcc

Hello, Just a bit of minor academic nitpicking...
Yeah. After all, the "uniqueness constraint" has a theory with an excellent pedigree (IIUC linear logic, whose proof theory Clean uses here, goes back at least to the 60s, and Wadler proposed linear types for IO before anybody had heard of monads).
Linear logic/typing does not quite capture uniqueness types since a term with a unique type can always be copied to become non-unique, but a linear type cannot become unrestricted. As a historical note, the first paper on linear logic was published by Girard in 1987; but the purely linear core of linear logic has (non-commutative) antecedents in a system introduced by Lambek in a 1958 paper titled "The Mathematics of Sentence Structure". -Jeff

On Fri, 2007-11-02 at 14:41 -0400, Jeff Polakow wrote:
Hello,
Just a bit of minor academic nitpicking...
Yeah. After all, the "uniqueness constraint" has a theory with an excellent pedigree (IIUC linear logic, whose proof theory Clean uses here, goes back at least to the 60s, and Wadler proposed linear types for IO before anybody had heard of monads).
Linear logic/typing does not quite capture uniqueness types since a term with a unique type can always be copied to become non-unique, but a linear type cannot become unrestricted.
Can I write a Clean program with a function that duplicates World?
As a historical note, the first paper on linear logic was published by Girard in 1987; but the purely linear core of linear logic has (non-commutative) antecedents in a system introduced by Lambek in a 1958 paper titled "The Mathematics of Sentence Structure".
OK then. Correction accepted. jcc

Hello,
Just a bit of minor academic nitpicking...
Yeah. After all, the "uniqueness constraint" has a theory with an excellent pedigree (IIUC linear logic, whose proof theory Clean uses here, goes back at least to the 60s, and Wadler proposed linear types for IO before anybody had heard of monads).
Linear logic/typing does not quite capture uniqueness types since a term with a unique type can always be copied to become non-unique, but a linear type cannot become unrestricted.
Can I write a Clean program with a function that duplicates World?
Clean won't let you duplicate the World. My comment on the mismatch with linear logic is aimed more at general uniqueness type systems (e.g. recent work by de Vries, Plasmeijer, and Abrahamson such as https://www.cs.tcd.ie/~devriese/pub/ifl06-paper.pdf). Sorry for the confusion. -Jeff

On Fri, 2007-11-02 at 15:43 -0400, Jeff Polakow wrote:
Hello,
Just a bit of minor academic nitpicking...
Yeah. After all, the "uniqueness constraint" has a theory with an excellent pedigree (IIUC linear logic, whose proof theory Clean uses here, goes back at least to the 60s, and Wadler proposed linear types for IO before anybody had heard of monads).
Linear logic/typing does not quite capture uniqueness types since a term with a unique type can always be copied to become non-unique, but a linear type cannot become unrestricted.
Can I write a Clean program with a function that duplicates World?
Clean won't let you duplicate the World. My comment on the mismatch with linear logic is aimed more at general uniqueness type systems (e.g. recent work by de Vries, Plasmeijer, and Abrahamson such as https://www.cs.tcd.ie/~devriese/pub/ifl06-paper.pdf). Sorry for the confusion.
Ah. I see. jcc
participants (28)
- ajb@spamcop.net
- apfelmus
- Arnar Birgisson
- Bernie Pope
- Brandon S. Allbery KF8NH
- Bryan O'Sullivan
- Bulat Ziganshin
- ChrisK
- Dan Piponi
- Derek Elkins
- Don Stewart
- Henning Thielemann
- Jeff Polakow
- Jeff.Harper@handheld.com
- Jonathan Cast
- Justin Bailey
- Ketil Malde
- Lennart Augustsson
- Neil Mitchell
- Paul Hudak
- Paulo J. Matos
- Rodrigo Queiro
- Ryan Dickie
- Sebastian Sylvan
- Simon Peyton-Jones
- Stefan Holdermans
- Stefan O'Rear
- Tim Newsham