Re: [Haskell-cafe] library on common sub-expression elimination?

I guess you mean the function that converts an abstract syntax tree to a directed acyclic graph (DAG).

Just for completeness I should mention that if the object language has effects, including non-termination, one has to be careful eliminating seemingly common expressions. For example, in

\a i -> if i >= 0 then a ! i else if i > 0 then a ! i + a ! (i-1) else ....

we see the expression (a ! i) repeated in both branches of the conditional. Eliminating the `duplicate' by pulling it out,

\a i -> let t = a ! i in if i >= 0 then t else if i > 0 then t + a ! (i-1) else ....

would be wrong, wouldn't it? This issue has been extensively investigated in the Fortran compiler community; Elliott, Finne and de Moor's ``Compiling Embedded Languages'' (JFP 2003) talks about it at length.

The standard technique to detect occurrences of common subexpressions is so-called hash-consing. There are (OCaml) libraries for it:

@InProceedings{Filliatre-Conchon-hash-consing,
  author   = {Jean-Christophe Filli{\^a}tre and Sylvain Conchon},
  title    = {Type-Safe Modular Hash-Consing},
  pages    = {12--19},
  crossref = "ml2006",
  doi      = "10.1145/1159876.1159880"
}

The upcoming distilled tutorial at DSL 2011

  http://dsl2011.bordeaux.inria.fr/index.php?option=com_content&view=article&id=2&Itemid=2

will present a Haskell library for hash-consing. The library can work with the standard Haskell hash tables or without them (using Data.Map, for example). The library does _not_ rely on stable names and other internal GHC operations with unstable semantics. The library will find all common sub-expressions.

Incidentally, despite the Lisp-sounding name, hash-consing was invented before Lisp. It was described, for the English audience, in the first volume of Comm. ACM, in 1958:

@Article{Ershov-hash-consing,
  author  = {A. P. Ershov},
  title   = {On programming of arithmetic operations},
  journal = "Communications of the {ACM}",
  year    = 1958,
  volume  = 1,
  number  = 8,
  pages   = {3--6},
  doi     = "10.1145/368892.368907",
  url     = "http://portal.acm.org/citation.cfm?id=368907"
}

The translation is quite accurate, as far as I could see, but misses the flowcharts and the diagram of the original paper. This short paper fully describes what we now call hash tables, hash functions, useful properties of hash functions, and hash-consing. The article analyzes the time complexity of the algorithm. Since the algorithm has two exits, writing programs in the continuation-passing style must have been familiar back then.

The library to be presented at DSL 2011 unwittingly follows Ershov's algorithm closely. The library is (hopefully) better described (see the preface to the English translation of Ershov's paper). Nowadays, starting a paper with the phrase ``All unexplained terms are those from [1]'' (where [1] is the single reference) would not be taken kindly by reviewers.
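To make the hash-consing idea concrete, here is a minimal sketch of a bottom-up AST-to-DAG conversion keyed on a Data.Map (with Control.Monad.State from mtl), in the spirit of the message above. It is only an illustration of the technique, not the DSL 2011 library; the Exp and Node types and all function names are invented for the example.

import qualified Data.Map as M
import Control.Monad.State

-- Object-language expressions as an ordinary tree.
data Exp = Lit Int | Var String | Add Exp Exp | Mul Exp Exp
  deriving (Eq, Ord, Show)

-- One layer of the tree, with children replaced by node ids.
data Node = NLit Int | NVar String | NAdd Int Int | NMul Int Int
  deriving (Eq, Ord, Show)

type DAG = (M.Map Node Int, Int)   -- node table and next free id

-- Insert a node, reusing an existing id if the node has been seen before.
hashcons :: Node -> State DAG Int
hashcons n = do
  (tbl, next) <- get
  case M.lookup n tbl of
    Just i  -> return i
    Nothing -> put (M.insert n next tbl, next + 1) >> return next

-- Convert a tree to a DAG bottom-up; equal subtrees get the same id.
toDAG :: Exp -> State DAG Int
toDAG (Lit i)   = hashcons (NLit i)
toDAG (Var v)   = hashcons (NVar v)
toDAG (Add a b) = do x <- toDAG a; y <- toDAG b; hashcons (NAdd x y)
toDAG (Mul a b) = do x <- toDAG a; y <- toDAG b; hashcons (NMul x y)

-- (x + 1) * (x + 1): the subexpression x + 1 ends up as a single node.
example :: (Int, DAG)
example = runState (toDAG (Mul (Add (Var "x") (Lit 1))
                               (Add (Var "x") (Lit 1))))
                   (M.empty, 0)

Evaluating example returns node id 3 and a table of only four nodes, although the tree (x + 1) * (x + 1) has seven constructors; the two occurrences of x + 1 have been identified.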

Just to make it explicit, is

\a i ->
  let t = a ! i in
  if i >= 0 then
    t
  else if i > 0 then
    t + a ! (i-1)
  else ....

a bad idea because of the last else-case? Can it be mended with another
pass for if-expressions?

The upcoming distilled tutorial at DSL 2011 - thank you for the link.
I'm going to experiment with data-reify, since the library you've
mentioned is OCaml only.
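For reference, a rough sketch of what a data-reify experiment might look like. The class and function names (Data.Reify, MuRef, DeRef, mapDeRef, reifyGraph, Graph) are from the data-reify package as I remember them, so treat the details as approximate. One caveat: data-reify recovers sharing that already exists in the heap (it works via stable names), so two separately constructed but syntactically equal subterms are not merged the way a hash-consing pass would merge them.

{-# LANGUAGE TypeFamilies #-}
import Data.Reify

-- The object language as a plain tree ...
data Exp = Lit Int | Add Exp Exp

-- ... and one layer of it with recursive positions abstracted out.
data ExpF r = LitF Int | AddF r r
  deriving Show

instance MuRef Exp where
  type DeRef Exp = ExpF
  mapDeRef _ (Lit i)   = pure (LitF i)
  mapDeRef f (Add a b) = AddF <$> f a <*> f b

-- The let-bound x is a single heap node used twice, so it shows up
-- exactly once in the reified graph.
shared :: Exp
shared = let x = Add (Lit 1) (Lit 2) in Add x x

main :: IO ()
main = do
  Graph nodes root <- reifyGraph shared
  print root
  mapM_ print nodes

Printing the graph for shared lists the Add (Lit 1) (Lit 2) node once, referenced twice from the root.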

On 8/12/11 12:52 PM, Anton Kholomiov wrote:
Just to make it explicit, is
\a i -> let t = a ! i in if i >= 0 then t else if i > 0 then t + a ! (i-1) else ....
a bad idea because of the last else-case? Can it be mended with another pass for if-expressions?
Lifting the computation out of the guards preserves the semantics of the original program only if:

(a) you can guarantee that it has no observable effects
    (i) this clearly precludes nontermination
    (ii) this also precludes it taking any non-zero measurable length of time to compute (or memory, or any other observable resource)

or

(b) you can guarantee that the computation is executed in every possible execution path from the point to which the computation is lifted (including all exceptional paths); moreover, you can guarantee that all observable effects generated prior to the original sites of the computation are identical along all paths.

The reason for including (a)(ii) is that, if the computation takes a non-zero measurable length of time, then we can detect a difference between the original and the modified program whenever the computation does not adhere to condition (b). This is notable for performance reasons, but, more importantly, it's critical for the semantics of security. The relative time taken by different branches of a computation is an observable effect which could allow the informational content of "secret" variables [1] to leak out and be discovered.

It is only valid to ignore (a)(ii) when the semantics you're preserving explicitly exclude concerns with program execution time, memory, security, etc., by assuming/pretending that these things aren't observable.

[1] In this case, information about the value of i, and possibly additional things in the trailing else case.

--
Live well,
~wren
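To make the point about exceptional paths concrete in plain Haskell: with a lazy let the hoisted binding happens to be harmless, because t is never forced in the final branch, but an object language that evaluates let-bound expressions eagerly behaves differently. The sketch below is my own illustration, not from the thread; BangPatterns models the eager binding, and the final 0 stands in for the elided else-case. The transformed function crashes on an input the original handles fine.

{-# LANGUAGE BangPatterns #-}
import Data.Array

original, hoisted :: Array Int Int -> Int -> Int

original a i =
  if i >= 0 then a ! i
  else if i > 0 then a ! i + a ! (i-1)
  else 0                               -- stand-in for the original ....

hoisted a i =
  let !t = a ! i in                    -- strict: forces a ! i up front
  if i >= 0 then t
  else if i > 0 then t + a ! (i-1)
  else 0

-- original arr (-1) returns 0, but hoisted arr (-1) raises an
-- array-index error: an observable difference introduced by the
-- "common subexpression elimination".
arr :: Array Int Int
arr = listArray (0, 2) [10, 20, 30]

A diverging or merely slow a ! i would expose the same kind of difference, which is what conditions (a) and (b) above rule out.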

in "let t= a!i" a!i might be out of bounds ?
On Fri, Aug 12, 2011 at 9:52 AM, Anton Kholomiov wrote:
Just to make it explicit, is
\a i ->
let t = a ! i in if i >= 0 then t else if i > 0 then t + a ! (i-1) else ....
a bad idea because of the last else-case? Can it be mended with another pass for if-expressions?
The upcoming distilled tutorial at DSL 2011 - thank you for the link.
I'm going to experiment with data-reify, while the library you've mentioned is OCaml only.
participants (4)
- Anton Kholomiov
- oleg@okmij.org
- Vinod Grover
- wren ng thornton