
Hello GHC, This page http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/IntegratedCodeG... mentions a to-be-developed "dataflow rewriting engine". Can someone please send a description of what this will do? Thanks! -- Chad Scherrer "Time flies like an arrow; fruit flies like a banana" -- Groucho Marx

Norman, John Would you care to respond to this? (Perhaps by amplifying the wiki page?) A good starting point is perhaps Craig's paper. Simon | -----Original Message----- | From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow-haskell-users-bounces@haskell.org] On | Behalf Of Chad Scherrer | Sent: 22 August 2008 22:21 | To: GHC Users | Subject: "dataflow rewriting engine" | | Hello GHC, | | This page | http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/IntegratedCodeG... | mentions a to-be-developed "dataflow rewriting engine". Can someone | please send a description of what this will do? | | Thanks! | -- | | Chad Scherrer | | "Time flies like an arrow; fruit flies like a banana" -- Groucho Marx | _______________________________________________ | Glasgow-haskell-users mailing list | Glasgow-haskell-users@haskell.org | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

I've added some text and links to point the reader in the right direction. Here's the new text, cribbed from the Wiki: Dataflow optimization: We can define a new optimization simply by defining a lattice of dataflow facts (akin to a specialized logic) and then writing the dataflow-transfer functions found in compiler textbooks. Handing these functions to the dataflow engine produces a new optimization that is not only useful on its own, but that can easily be composed with other optimizations to create an integrated "superoptimization" that is strictly more powerful than any sequence of individual optimizations, no matter how many times they are re-run. The dataflow engine is based on (Lerner, Grove, and Chambers 2002 http://citeseer.ist.psu.edu/old/lerner01composing.html); you can find a functional implementation of the dataflow engine presented in (Ramsey and Dias 2005 http://www.cs.tufts.edu/~nr/pubs/zipcfg-abstract.html). Let me know how I can further clarify the text, -j
-----Original Message----- From: Simon Peyton-Jones Sent: Tuesday, August 26, 2008 1:32 PM To: Norman Ramsey; John Dias Cc: Chad Scherrer; GHC Users Subject: RE: "dataflow rewriting engine"
Norman, John
Would you care to respond to this? (Perhaps by amplifying the wiki page?) A good starting point is perhaps Craig's paper.
Simon
| -----Original Message----- | From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow- haskell-users-bounces@haskell.org] On | Behalf Of Chad Scherrer | Sent: 22 August 2008 22:21 | To: GHC Users | Subject: "dataflow rewriting engine" | | Hello GHC, | | This page | http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/Integrated CodeGen | mentions a to-be-developed "dataflow rewriting engine". Can someone | please send a description of what this will do? | | Thanks! | -- | | Chad Scherrer | | "Time flies like an arrow; fruit flies like a banana" -- Groucho Marx | _______________________________________________ | Glasgow-haskell-users mailing list | Glasgow-haskell-users@haskell.org | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

I think we're all rather excited about seeing this stuff land. What's the expected timeline, wrt. ghc 6.10's release? -- Don t-jodias:
I've added some text and links to point the reader in the right direction. Here's the new text, cribbed from the Wiki:
Dataflow optimization: We can define a new optimization simply by defining a lattice of dataflow facts (akin to a specialized logic) and then writing the dataflow-transfer functions found in compiler textbooks. Handing these functions to the dataflow engine produces a new optimization that is not only useful on its own, but that can easily be composed with other optimizations to create an integrated "superoptimization" that is strictly more powerful than any sequence of individual optimizations, no matter how many times they are re-run. The dataflow engine is based on (Lerner, Grove, and Chambers 2002 http://citeseer.ist.psu.edu/old/lerner01composing.html); you can find a functional implementation of the dataflow engine presented in (Ramsey and Dias 2005 http://www.cs.tufts.edu/~nr/pubs/zipcfg-abstract.html).
Let me know how I can further clarify the text, -j
-----Original Message----- From: Simon Peyton-Jones Sent: Tuesday, August 26, 2008 1:32 PM To: Norman Ramsey; John Dias Cc: Chad Scherrer; GHC Users Subject: RE: "dataflow rewriting engine"
Norman, John
Would you care to respond to this? (Perhaps by amplifying the wiki page?) A good starting point is perhaps Craig's paper.
Simon
| -----Original Message----- | From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow- haskell-users-bounces@haskell.org] On | Behalf Of Chad Scherrer | Sent: 22 August 2008 22:21 | To: GHC Users | Subject: "dataflow rewriting engine" | | Hello GHC, | | This page | http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/Integrated CodeGen | mentions a to-be-developed "dataflow rewriting engine". Can someone | please send a description of what this will do? | | Thanks! | -- | | Chad Scherrer | | "Time flies like an arrow; fruit flies like a banana" -- Groucho Marx | _______________________________________________ | Glasgow-haskell-users mailing list | Glasgow-haskell-users@haskell.org | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Has there been any thought about working with the LLVM project? I didn't find anything on the wiki along those lines. Thanks, Deborah On Aug 26, 2008, at 10:57 AM, Don Stewart wrote:
I think we're all rather excited about seeing this stuff land. What's the expected timeline, wrt. ghc 6.10's release?
-- Don
t-jodias:
I've added some text and links to point the reader in the right direction. Here's the new text, cribbed from the Wiki:
Dataflow optimization: We can define a new optimization simply by defining a lattice of dataflow facts (akin to a specialized logic) and then writing the dataflow-transfer functions found in compiler textbooks. Handing these functions to the dataflow engine produces a new optimization that is not only useful on its own, but that can easily be composed with other optimizations to create an integrated "superoptimization" that is strictly more powerful than any sequence of individual optimizations, no matter how many times they are re- run. The dataflow engine is based on (Lerner, Grove, and Chambers 2002 http://citeseer.ist.psu.edu/old/lerner01composing.html); you can find a functional implementation of the dataflow engine presented in (Ramsey and Dias 2005 http://www.cs.tufts.edu/~nr/pubs/zipcfg-abstract.html).
Let me know how I can further clarify the text, -j
-----Original Message----- From: Simon Peyton-Jones Sent: Tuesday, August 26, 2008 1:32 PM To: Norman Ramsey; John Dias Cc: Chad Scherrer; GHC Users Subject: RE: "dataflow rewriting engine"
Norman, John
Would you care to respond to this? (Perhaps by amplifying the wiki page?) A good starting point is perhaps Craig's paper.
Simon
| -----Original Message----- | From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow- haskell-users-bounces@haskell.org] On | Behalf Of Chad Scherrer | Sent: 22 August 2008 22:21 | To: GHC Users | Subject: "dataflow rewriting engine" | | Hello GHC, | | This page | http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/Integrated CodeGen | mentions a to-be-developed "dataflow rewriting engine". Can someone | please send a description of what this will do? | | Thanks! | -- | | Chad Scherrer | | "Time flies like an arrow; fruit flies like a banana" -- Groucho Marx | _______________________________________________ | Glasgow-haskell-users mailing list | Glasgow-haskell-users@haskell.org | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Excerpts from Deborah Goldsmith's message of Tue Aug 26 14:26:33 -0500 2008:
Has there been any thought about working with the LLVM project? I didn't find anything on the wiki along those lines.
I imagine if tried, haskell as a language on LLVM would quickly end up a second-class citizen, due to non-strict semantics? Austin

Deborah Goldsmith:
Has there been any thought about working with the LLVM project? I didn't find anything on the wiki along those lines.
I have only had a rather brief look at LLVM, but my understanding at the moment is that LLVM would not be able to support one of GHC's current code layout optimisations. More precisely, with LLVM, it would not be possible to enforce that the meta data for a closure is placed right before (in terms of layout in the address space) the code executing the "eval" method of that same closure. GHC uses that to have the closure code pointer point directly to the "eval" code (and hence also by an appropriate offset) to the various fields of the meta data. If that layout cannot be ensured, GHC needs to take one more indirection to execute "evals" (which is a very frequent operation) - this is what an unregistered build does btw. However, I am not convinced that this layout optimisation is really gaining that much extra performance these days. In particular, since dynamic pointer tagging, very short running "evals" (for which the extra indirection incurs the largest overhead) have become less frequent. Even if there is a slight performance regression, I think, it would be worthwhile to consider giving up on the described layout constraint. It is the Last Quirk that keeps GHC from using standard compiler back-ends (such as LLVM), and I suspect, it is not worth it anymore. When we discussed this last, Simon Marlow planned to run benchmarks to determine how much performance the layout optimisation gains us these days. Simon, did you ever get around to that? Manuel
On Aug 26, 2008, at 10:57 AM, Don Stewart wrote:
I think we're all rather excited about seeing this stuff land. What's the expected timeline, wrt. ghc 6.10's release?
-- Don
t-jodias:
I've added some text and links to point the reader in the right direction. Here's the new text, cribbed from the Wiki:
Dataflow optimization: We can define a new optimization simply by defining a lattice of dataflow facts (akin to a specialized logic) and then writing the dataflow-transfer functions found in compiler textbooks. Handing these functions to the dataflow engine produces a new optimization that is not only useful on its own, but that can easily be composed with other optimizations to create an integrated "superoptimization" that is strictly more powerful than any sequence of individual optimizations, no matter how many times they are re- run. The dataflow engine is based on (Lerner, Grove, and Chambers 2002 http://citeseer.ist.psu.edu/old/lerner01composing.html); you can find a functional implementation of the dataflow engine presented in (Ramsey and Dias 2005 http://www.cs.tufts.edu/~nr/pubs/zipcfg-abstract.html).
Let me know how I can further clarify the text, -j
-----Original Message----- From: Simon Peyton-Jones Sent: Tuesday, August 26, 2008 1:32 PM To: Norman Ramsey; John Dias Cc: Chad Scherrer; GHC Users Subject: RE: "dataflow rewriting engine"
Norman, John
Would you care to respond to this? (Perhaps by amplifying the wiki page?) A good starting point is perhaps Craig's paper.
Simon
| -----Original Message----- | From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow- haskell-users-bounces@haskell.org] On | Behalf Of Chad Scherrer | Sent: 22 August 2008 22:21 | To: GHC Users | Subject: "dataflow rewriting engine" | | Hello GHC, | | This page | http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/Integrated CodeGen | mentions a to-be-developed "dataflow rewriting engine". Can someone | please send a description of what this will do? | | Thanks! | -- | | Chad Scherrer | | "Time flies like an arrow; fruit flies like a banana" -- Groucho Marx | _______________________________________________ | Glasgow-haskell-users mailing list | Glasgow-haskell-users@haskell.org | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

On Wed, 2008-08-27 at 10:25 +1000, Manuel M T Chakravarty wrote:
However, I am not convinced that this layout optimisation is really gaining that much extra performance these days. In particular, since dynamic pointer tagging, very short running "evals" (for which the extra indirection incurs the largest overhead) have become less frequent. Even if there is a slight performance regression, I think, it would be worthwhile to consider giving up on the described layout constraint. It is the Last Quirk that keeps GHC from using standard compiler back-ends (such as LLVM), and I suspect, it is not worth it anymore.
There's also a potential benefit on the other side, that the cpu's instruction cache is not wasted on non-instruction data. Apparently some cpus also do instruction read-ahead and suffer slowdown if they encounter data that does not decode as valid op codes. Obviously it's not allowed to fault since the instructions are not actually executed but it can impact on performance according to Intel manuals (presumably because it ends up flushing some instruction decoding caches or something). Of course, as you've discussed, the only way to find out is to benchmark. Duncan

| > Has there been any thought about working with the LLVM project? I | > didn't find anything on the wiki along those lines. | | I have only had a rather brief look at LLVM, but my understanding at | the moment is that LLVM would not be able to support one of GHC's | current code layout optimisations. With the new, modular code generation framework, described here http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGen I think it'll be much easier [for someone else] to consider using LLVM. I don't see the "data next to code" thing as a show-stopper; it used to be a performance win of a handful of percent, but I don't think we have recent data. And an LLVM back end might be willing to give up that few percent. There may well be other difficulties, of course. If anyone is interested, the best thing is to wait for the new stuff to be committed to the HEAD (which will happen in Sept), and look at the new Cmm data types (which define the language you'd have to convert to LLVM). Simon

Manuel M T Chakravarty wrote:
Deborah Goldsmith:
Has there been any thought about working with the LLVM project? I didn't find anything on the wiki along those lines.
I have only had a rather brief look at LLVM, but my understanding at the moment is that LLVM would not be able to support one of GHC's current code layout optimisations. More precisely, with LLVM, it would not be possible to enforce that the meta data for a closure is placed right before (in terms of layout in the address space) the code executing the "eval" method of that same closure. GHC uses that to have the closure code pointer point directly to the "eval" code (and hence also by an appropriate offset) to the various fields of the meta data. If that layout cannot be ensured, GHC needs to take one more indirection to execute "evals" (which is a very frequent operation) - this is what an unregistered build does btw.
However, I am not convinced that this layout optimisation is really gaining that much extra performance these days. In particular, since dynamic pointer tagging, very short running "evals" (for which the extra indirection incurs the largest overhead) have become less frequent. Even if there is a slight performance regression, I think, it would be worthwhile to consider giving up on the described layout constraint. It is the Last Quirk that keeps GHC from using standard compiler back-ends (such as LLVM), and I suspect, it is not worth it anymore.
When we discussed this last, Simon Marlow planned to run benchmarks to determine how much performance the layout optimisation gains us these days. Simon, did you ever get around to that?
I didn't get around to benchmarking it, but since the layout optimisation is easily switched off (it's called tablesNextToCode inside GHC) there's really nothing stopping someone from building a backend that doesn't rely on it. Everything works without this optimisation, including GHCi, the debugger, and the FFI. My guess is you'd pay a few percent on average for not doing it. You're quite right that pointer tagging makes it less attractive, but like most optimisations there are programs that fall outside the common case. Programs that do a lot of thunk evals will suffer the most. Cheers, Simon

| I think we're all rather excited about seeing this stuff land. | What's the expected timeline, wrt. ghc 6.10's release? Good question. I've updated the overview here http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGen to say what we plan. Simon

Wow, lots of great information. We'll take a look at the papers and get back if there's any remaining confusion. Thanks! Chad
participants (9)
-
Austin Seipp
-
Chad Scherrer
-
Deborah Goldsmith
-
Don Stewart
-
Duncan Coutts
-
John Dias
-
Manuel M T Chakravarty
-
Simon Marlow
-
Simon Peyton-Jones