I love purity, but it's killing me.

older
Re: [Haskell-cafe] please comment...

Tom Hawkins

8 Feb 2008 8 Feb '08

6:33 a.m.

I've been programming with Haskell for a few years and love it. One of my favorite applications of Haskell is using for domain specific languages. However, after designing a handful of DSLs, I continue to hit what appears to be a fundamental hurdle -- or at least I have yet to find an adequate solution. My DSLs invariably define a datatype to capture expressions; something like this: data Expression = Add Expression Expression | Sub Expression Expression | Variable String | Constant Int deriving Eq Using the datatype Expression, it is easy to mass a collections of functions to help assemble complex expressions, which leads to very concise programs in the DSL. The problem comes when I want to generate efficient code from an Expression (ie. to C or some other target language). The method I use invovles converting the tree of subexpressions into an acyclic graphic to eliminate common subexpressions. The nodes are then topologically ordered and assigned an instruction, or statement for each node. For example: let a = Add (Constant 10) (Variable "i1") b = Sub (Variable "i2") (Constant 2) c = Add a b would compile to a C program that may look like this: a = 10 + i1; b = i2 - 2; c = a + b; The process of converting an expression tree to a graph uses either Eq or Ord (either derived or a custom instance) to search and build a set of unique nodes to be ordered for execution. In this case "a", then "b", then "c". The problem is expressions often have shared, equivalent subnodes, which dramatically grows the size of the tree. For example: let d = Add c c e = Add d d -- "e" now as 16 leaf nodes. As these trees grow in size, the equality comparison in graph construction quickly becomes the bottleneck for DSL compilation. What's worse, the phase transition from tractable to intractable is very sharp. In one of my DSL programs, I made a seemingly small change, and compilation time went from milliseconds to not-in-a-million-years. Prior to Haskell, I wrote a few DSLs in OCaml. I didn't have this problem in OCaml because each "let" expression was mutable, and I could use the physical equality operator to perform fast comparisons. Unfortunately, I have grown to love Haskell's type system and its lack of side effects, and could never go back. Is there anything that can be done to dramatically speed up comparisons, or is there a better approach I can take to extract common subexpressions? I should point out I have an opportunity to get Haskell on a real industrial application. But if I can't solve this problem, I may have to resort to far less eloquent languages. :-( Thanks for any and all help. -Tom

Show replies by date

Alfonso Acosta

8 Feb 8 Feb

6:57 a.m.

As I pointed out a few days ago in another thread, you can benefit from using Observable sharing [1] Be warned that Observable sharing is a non-conservative extension of Haskell and it breaks referential transparency. [1] http://www.cs.chalmers.se/~koen/pubs/entry-asian99-lava.html On Feb 8, 2008 7:33 AM, Tom Hawkins wrote:

...

I've been programming with Haskell for a few years and love it. One of my favorite applications of Haskell is using for domain specific languages. However, after designing a handful of DSLs, I continue to hit what appears to be a fundamental hurdle -- or at least I have yet to find an adequate solution.

My DSLs invariably define a datatype to capture expressions; something like this:

data Expression = Add Expression Expression | Sub Expression Expression | Variable String | Constant Int deriving Eq

Using the datatype Expression, it is easy to mass a collections of functions to help assemble complex expressions, which leads to very concise programs in the DSL.

The problem comes when I want to generate efficient code from an Expression (ie. to C or some other target language). The method I use invovles converting the tree of subexpressions into an acyclic graphic to eliminate common subexpressions. The nodes are then topologically ordered and assigned an instruction, or statement for each node. For example:

let a = Add (Constant 10) (Variable "i1") b = Sub (Variable "i2") (Constant 2) c = Add a b

would compile to a C program that may look like this:

a = 10 + i1; b = i2 - 2; c = a + b;

The process of converting an expression tree to a graph uses either Eq or Ord (either derived or a custom instance) to search and build a set of unique nodes to be ordered for execution. In this case "a", then "b", then "c". The problem is expressions often have shared, equivalent subnodes, which dramatically grows the size of the tree. For example:

let d = Add c c e = Add d d -- "e" now as 16 leaf nodes.

As these trees grow in size, the equality comparison in graph construction quickly becomes the bottleneck for DSL compilation. What's worse, the phase transition from tractable to intractable is very sharp. In one of my DSL programs, I made a seemingly small change, and compilation time went from milliseconds to not-in-a-million-years.

Prior to Haskell, I wrote a few DSLs in OCaml. I didn't have this problem in OCaml because each "let" expression was mutable, and I could use the physical equality operator to perform fast comparisons. Unfortunately, I have grown to love Haskell's type system and its lack of side effects, and could never go back.

Is there anything that can be done to dramatically speed up comparisons, or is there a better approach I can take to extract common subexpressions? I should point out I have an opportunity to get Haskell on a real industrial application. But if I can't solve this problem, I may have to resort to far less eloquent languages. :-(

Thanks for any and all help.

-Tom _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Emil Axelsson

7 a.m.

I know of a few of ways to express sharing in a pure language: 1) "Observable sharing", which, in general, is unsafe. http://www.cs.chalmers.se/~koen/pubs/entry-asian99-lava.html 2) Using Template Haskell http://www.dcs.gla.ac.uk/publications/PAPERS/7524/EmbedHDLinTH.ps 3) Matthew Naylor has done some work on "expressible sharing", which has advantages over both methods above. I don't have any reference though... 4) Use a monad (but I'm sure this is what you're trying to avoid). / Emil On 2008-02-08 07:33, Tom Hawkins wrote:

...

I've been programming with Haskell for a few years and love it. One of my favorite applications of Haskell is using for domain specific languages. However, after designing a handful of DSLs, I continue to hit what appears to be a fundamental hurdle -- or at least I have yet to find an adequate solution.

My DSLs invariably define a datatype to capture expressions; something like this:

data Expression = Add Expression Expression | Sub Expression Expression | Variable String | Constant Int deriving Eq

Using the datatype Expression, it is easy to mass a collections of functions to help assemble complex expressions, which leads to very concise programs in the DSL.

The problem comes when I want to generate efficient code from an Expression (ie. to C or some other target language). The method I use invovles converting the tree of subexpressions into an acyclic graphic to eliminate common subexpressions. The nodes are then topologically ordered and assigned an instruction, or statement for each node. For example:

let a = Add (Constant 10) (Variable "i1") b = Sub (Variable "i2") (Constant 2) c = Add a b

would compile to a C program that may look like this:

a = 10 + i1; b = i2 - 2; c = a + b;

The process of converting an expression tree to a graph uses either Eq or Ord (either derived or a custom instance) to search and build a set of unique nodes to be ordered for execution. In this case "a", then "b", then "c". The problem is expressions often have shared, equivalent subnodes, which dramatically grows the size of the tree. For example:

let d = Add c c e = Add d d -- "e" now as 16 leaf nodes.

As these trees grow in size, the equality comparison in graph construction quickly becomes the bottleneck for DSL compilation. What's worse, the phase transition from tractable to intractable is very sharp. In one of my DSL programs, I made a seemingly small change, and compilation time went from milliseconds to not-in-a-million-years.

Prior to Haskell, I wrote a few DSLs in OCaml. I didn't have this problem in OCaml because each "let" expression was mutable, and I could use the physical equality operator to perform fast comparisons. Unfortunately, I have grown to love Haskell's type system and its lack of side effects, and could never go back.

Is there anything that can be done to dramatically speed up comparisons, or is there a better approach I can take to extract common subexpressions? I should point out I have an opportunity to get Haskell on a real industrial application. But if I can't solve this problem, I may have to resort to far less eloquent languages. :-(

Thanks for any and all help.

-Tom _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Matthew Naylor

10:56 a.m.

Hi, (Warning: longish message!) There is some concern, and rightly so, that observable sharing is dangerous, and that your Haskell program will explode if you use it, and perhaps even that anyone who uses it is "dirty" and should be sent to matron's for a good scrubbing! However, when used "safely", it is not a hack, nor even dirty, but a nice, simple solution to an otherwise nasty problem. Below I define what I mean by "safely". First consider the circumstances under which obserevable sharing is useful. Typically we have some Haskell function "f" that produces a symbolic representation of an expression. For whatever reason, we'd like to *generate a program* that computes the value of this expression, rather that computing it in Haskell. For example, in Lava, we want to generate a VHDL program so that the expression can be computed on, say, an FPGA. In Tom's case, he wants to generate a C program to compute the expression. All perfectly reasonable, and in my opinion, a very powerfull way to use Haskell. Now recall that referential transparency lets you replace equals with equals without changing the *value produced* by a program. Note that it says nothing about preserving *runtime behaviour*. Sharing, for example, may be lost. So if you do equational reasoning on function "f" (above), and loose some sharing, then you can only expect that the same sharing will also be also lost in the generated program. As long as the generated program computes the same result as it did before, referential transparency will be, overall, preserved; it would only be lost intermediately. This is what I mean by "safe". Now, there remains the concern that Haskell's semantics does not enforce sharing. A Haskell compiler is free to change the sharing a program at a whim, unknowingly to the programmer who may be relying on it in for an efficient program. However, to my knowledge, it is an unwritten rule of Haskell compilers that sharing *is* preserved, and that they do perform *graph* reduction. Clean, a similar language to Haskell, indeed has a semantics based on graphs. So I don't believe Haskell being non-strict (not necessarily lazy) is a good reason for not using observable sharing. Though I do feel better when I compile without -O. :-) Finally, when I say "observable sharing", I don't necessarily mean it as defined by Koen Claessen and David Sands. I simply mean the use of unsafePerformIO to detect sharing, whether or not this is done by an "eq" predicate on Refs. (I say this because I think there are simpler ways to detect sharing, though these will probably not have the nice semantic properties of observable sharing.) Sorry for the somewhat long exposition. :-) Matt.

Henning Thielemann

1:16 p.m.

On Fri, 8 Feb 2008, Matthew Naylor wrote:

...

Now recall that referential transparency lets you replace equals with equals without changing the *value produced* by a program. Note that it says nothing about preserving *runtime behaviour*. Sharing, for example, may be lost. So if you do equational reasoning on function "f" (above), and loose some sharing, then you can only expect that the same sharing will also be also lost in the generated program. As long as the generated program computes the same result as it did before, referential transparency will be, overall, preserved; it would only be lost intermediately. This is what I mean by "safe".

I think there are degrees of observability. If a Haskell library immediately talks to a C library and shares resources generated by the library, then this sharing can be hardly observed and the method is somehow safe. If you generate a C program with Haskell and write it to a disk it can be easily observed and people might rely on a particular resulting C program. If the C program is piped to a C compiler which is immediately run, then sharing can be hardly observed. Even within Haskell sharing is somehow observable, the Haskell program could observe the free memory of the machine and thus it can see a difference between sharing and duplicated objects.

Tim Chevalier

3:02 p.m.

On 2/8/08, Matthew Naylor wrote:

...

it in for an efficient program. However, to my knowledge, it is an unwritten rule of Haskell compilers that sharing *is* preserved, and that they do perform *graph* reduction. Clean, a similar language to

I'm not sure that programmers ought to be relying on this rule. Sure, all Haskell compilers I know of preserve sharing and do graph reduction. But conventional wisdom is not the same thing as an unwritten rule. Someday, someone might come along and write a Haskell compiler that isn't based on graph reduction and doesn't preserve sharing at the implementation level (while still preserving the informal semantics of Haskell). A programmer who had written code that failed to compile correctly under this hypothetical compiler would be a very naughty Haskell programmer indeed.

...

Haskell, indeed has a semantics based on graphs. So I don't believe

Haskell doesn't have a semantics, graph-based or not... or at least not a formal one, and if not a formal one, I don't know what you mean :-) Cheers, Tim -- Tim Chevalier * http://cs.pdx.edu/~tjc * Often in error, never in doubt "There are no sexist decisions to be made. There are antisexist decisions to be made. And they require tremendous energy and self-scrutiny, as well as moral stamina..." -- Samuel R. Delany

Bertram Felgenhauer

9 Feb 9 Feb

2:34 p.m.

Matthew Naylor wrote: [snip]

...

Finally, when I say "observable sharing", I don't necessarily mean it as defined by Koen Claessen and David Sands. I simply mean the use of unsafePerformIO to detect sharing, whether or not this is done by an "eq" predicate on Refs. (I say this because I think there are simpler ways to detect sharing, though these will probably not have the nice semantic properties of observable sharing.)

ghc actually provides a primop for this: reallyUnsafePtrEquality# :: a -> a -> Int# Use at your own risk. Note that you can only check for equality uing that primop. To detect cycles in data structures efficiently, a total order would be better, but in the presence of copying garbage collection that's asking too much. Bertram

Felipe Lessa

2:46 p.m.

On Feb 9, 2008 12:34 PM, Bertram Felgenhauer wrote:

...

ghc actually provides a primop for this:

reallyUnsafePtrEquality# :: a -> a -> Int#

Use at your own risk.

Why is it more than unsafe? 'unsafePerformIO' seems to me a lot unsafer than 'reallyUnsafePtrEquality#'. Also, is anybody using 'reallyUnsafePtrEquality#' on a working project? Cheers, -- Felipe.

Bertram Felgenhauer

2:49 p.m.

Matthew Naylor wrote: (snip)

...

Now, there remains the concern that Haskell's semantics does not enforce sharing. A Haskell compiler is free to change the sharing a program at a whim, unknowingly to the programmer who may be relying on it in for an efficient program. However, to my knowledge, it is an unwritten rule of Haskell compilers that sharing *is* preserved, and that they do perform *graph* reduction.

That is not true anymore for the threaded runtime of ghc. If two threads demand the same thunk, one of them will usually block, but there is a small window where both threads can start evaluting the expression. To prevent this, you'd have to take a lock or otherwise synchronize the threads upon entering each thunk, which is prohibitively expensive. See "Haskell on a Shared-Memory Multiprocessor", http://www.haskell.org/~simonmar/papers/multiproc.pdf for details, section 3.1 in particular. Bertram

Tom Hawkins

8 Feb 8 Feb

11:28 p.m.

On 2/8/08, Emil Axelsson wrote:

...

I know of a few of ways to express sharing in a pure language:

1) "Observable sharing", which, in general, is unsafe. 2) Using Template Haskell 3) Matthew Naylor has done some work on "expressible sharing", which has 4) Use a monad (but I'm sure this is what you're trying to avoid).

Or... 5) Forget embedding the DSL, and write a direct compiler. In addition to the sharing problem, another shortcoming of Haskell DSLs is they can not fully exploit the benefits of algebraic datatypes. Specifically, pattern matching ADTs can only be used to control the compile-time configuration of the target, it can't be used to describe the target's behavior -- at least for DSLs that generate code that executes outside of Haskell's runtime. Writing a real compiler would solve both of these problems. Is there any Haskell implementation that has a clean cut-point, from which I can start from a fully type-checked, type-annotated intermediate representation? And thanks for the link to John's paper describing Hydra's use of Template Haskell. I will definiately consider TH. -Tom

Don Stewart

9 Feb 9 Feb

12:47 a.m.

tomahawkins:

...

On 2/8/08, Emil Axelsson wrote:

...
I know of a few of ways to express sharing in a pure language:

1) "Observable sharing", which, in general, is unsafe. 2) Using Template Haskell 3) Matthew Naylor has done some work on "expressible sharing", which has 4) Use a monad (but I'm sure this is what you're trying to avoid).

Or...

5) Forget embedding the DSL, and write a direct compiler.

In addition to the sharing problem, another shortcoming of Haskell DSLs is they can not fully exploit the benefits of algebraic datatypes. Specifically, pattern matching ADTs can only be used to control the compile-time configuration of the target, it can't be used to describe the target's behavior -- at least for DSLs that generate code that executes outside of Haskell's runtime.

Writing a real compiler would solve both of these problems. Is there any Haskell implementation that has a clean cut-point, from which I can start from a fully type-checked, type-annotated intermediate representation?

Taking the output of GHC's intermediate phase, after optimising leaves you with type checked, optimised, 'Core' -- basically lambda calculus with extras. It's a good start if you then want to hand-compile that down. Extract it with -fext-core -- Don

Henning Thielemann

5:20 a.m.

On Fri, 8 Feb 2008, Tom Hawkins wrote:

...

5) Forget embedding the DSL, and write a direct compiler.

In addition to the sharing problem, another shortcoming of Haskell DSLs is they can not fully exploit the benefits of algebraic datatypes. Specifically, pattern matching ADTs can only be used to control the compile-time configuration of the target, it can't be used to describe the target's behavior -- at least for DSLs that generate code that executes outside of Haskell's runtime.

Also in a pure Haskell library you will try to avoid direct access to constructors, because the internal data structures might change. Better are functions that access the internal data of a type, like 'maybe' and 'either' for 'Maybe' and 'Either', respectively.

Matthew Naylor

7:07 p.m.

Hi Tom,

...

In addition to the sharing problem, another shortcoming of Haskell DSLs is they can not fully exploit the benefits of algebraic datatypes. Specifically, pattern matching ADTs can only be used to control the compile-time configuration of the target, it can't be used to describe the target's behavior -- at least for DSLs that generate code that executes outside of Haskell's runtime.

you can embed algebraic data types and pattern matching in Haskell with your own semantics, and retain type inference. It goes something like this: (nil, (|>)) = datatype (cons0 [] \/ cons2 (:)) map f xs = match xs rules where rules (x, xs) = [ nil --> nil , x |> xs --> f x |> map f xs ] here, map :: (Term a -> Term b) -> Term [a] -> Term [b]. The main issue is that you have to quantify the free variables in patterns, hence the "rules" function. I don't know if this helps you.

...

Writing a real compiler would solve both of these problems. Is there any Haskell implementation that has a clean cut-point, from which I can start from a fully type-checked, type-annotated intermediate representation?

The Yhc.Core library is very simple to use and fairly mature (Neil's been tweeking it for about 3 years now), but it doesn't meet your type-annotated requirement. (Untyped core is still pretty useful, though.) If you go the real compiler route, would it not make sense to take the DSL as the source language rather than Haskell? Or are the DSL and Haskell quite similar? Or perhaps you are thinking of a two language system, where some code is evaluated at compile time by Haskell, and some is compiled to the target language? If so, maybe you just want domain specific syntax inside a Haskell program, in which case the paper "Why it's nice to be quoted: quasiquoting for haskell" might be relevant (it's actually supported in GHC I think). Anyway, all very thought provoking! Matt. P.S. Tom Hawkins wrote:

...

Emil Axelsson wrote:

...
I know of a few of ways to express sharing in a pure language:

1) "Observable sharing", which, in general, is unsafe. 2) Using Template Haskell 3) Matthew Naylor has done some work on "expressible sharing", which has 4) Use a monad (but I'm sure this is what you're trying to avoid).

Or...

5) Forget embedding the DSL, and write a direct compiler.

Taking options 2 or 5 just to solve the sharing problem sounds to me like a lot of hard work for little reward. But don't worry, I won't repeat my observable sharing speech. :-)

Tom Hawkins

10 Feb 10 Feb

6:45 a.m.

Hi Matt, On Feb 9, 2008 1:07 PM, Matthew Naylor wrote:

...

If you go the real compiler route, would it not make sense to take the DSL as the source language rather than Haskell? Or are the DSL and Haskell quite similar?

The two are nearly identical. In fact the only significant difference between the languages is the semantics of top level monad; it wouldn't be IO, but something else. With the syntax the same, it could leverage much of Haskell's standard library.

...

Or perhaps you are thinking of a two language system, where some code is evaluated at compile time by Haskell, and some is compiled to the target language?

Not necessarily in the same compilation flow, but I can think of several scenarios where it would be advantageous for code written in this other language to be pulled into a conventional Haskell program.

...

Taking options 2 or 5 just to solve the sharing problem sounds to me like a lot of hard work for little reward. But don't worry, I won't repeat my observable sharing speech. :-)

So is the general strategy with observable sharing to use unsafePerformIO with Data.Unique to label expressions at construction? Ahh...clever! I did not think of this. Of course, now that you have me reading up on Yhc.Core, option #5 is looking considerably more fun. -Tom

Matthew Naylor

6:52 p.m.

Hi Tom,

...

So is the general strategy with observable sharing to use unsafePerformIO with Data.Unique to label expressions at construction?

something like that, yes. Basically, you just need: {-# NOINLINE ref #-} ref x = unsafePerformIO (newIORef x) and you can write expressions like ref False == ref False and let x = ref False in x == x However, while referential equality is enough for sharing detection, I *suspect* it's simpler to use the fact that refs are IORefs and you can read and write them (in the IO monad). So a very simple Lava might look like module Lava (Bit,Netlist,low,high,nand2,netlist) where import Data.IORef import System.IO.Unsafe {-# NOINLINE ref #-} ref x = unsafePerformIO (newIORef x) type Ref = IORef (Maybe Int) data Bit = Gate String Ref [Bit] type Netlist = [(String, Int, [Int])] -- gate, output, inputs low = Gate "low" (ref Nothing) [] high = Gate "high" (ref Nothing) [] nand2 (a, b) = Gate "nand2" (ref Nothing) [a, b] netlist :: Bit -> IO Netlist netlist x = do i <- newIORef (0 :: Int) ; f i x where f i (Gate str r xs) = do val <- readIORef r num <- readIORef i case val of Nothing -> do writeIORef r (Just num) writeIORef i (num+1) rest <- mapM (f i) xs let is = map ((\(g,o,is) -> o) . head) rest return ((str,num,is):concat rest) Just j -> return [("indirection",j,[])] -- explicit sharing! Indirections can be filtered out at the end, they don't actually give the netlist any information.

...

Of course, now that you have me reading up on Yhc.Core, option #5 is looking considerably more fun.

Yeah, I think Yhc.Core is pretty nifty too. Thank Neil! Matt.

Alfonso Acosta

9 Feb 9 Feb

11:55 p.m.

On Feb 9, 2008 12:28 AM, Tom Hawkins wrote:

...

5) Forget embedding the DSL, and write a direct compiler.

In addition to the sharing problem, another shortcoming of Haskell DSLs is they can not fully exploit the benefits of algebraic datatypes. Specifically, pattern matching ADTs can only be used to control the compile-time configuration of the target, it can't be used to describe the target's behavior -- at least for DSLs that generate code that executes outside of Haskell's runtime.

Only partly true. Probably you are not aware of them (I myself learned about its existence a few days ago) but pattern quasiquoting (available in GHC's HEAD) can be used for that. http://www.haskell.org/ghc/dist/current/docs/users_guide/template-haskell.ht...

...

Writing a real compiler would solve both of these problems. Is there any Haskell implementation that has a clean cut-point, from which I can start from a fully type-checked, type-annotated intermediate representation?

If you have to write a compiler why not define a language which fits better with the semantics of the embedded language instead of using plain Haskell? The approach you propose has the disadvantages of both the embedded and the standalone languages. On one hand you have to stick with the syntax of the host language which may not fit with your exact semantical requirements and, on the other hand, you cannot take advantage of all the existing machinery around the host language (you have to code your own compiler). Furthermore, the first citizen status of functions make it impossible (or really difficult at least) to compile EDSL descriptions avoiding runtime and simply applying a static analysis approach (using Core or plain Haskell as input).

...

And thanks for the link to John's paper describing Hydra's use of Template Haskell. I will definiately consider TH.

Well, TH would be one of those static analysis approaches. Actually, O'Donell's implementation (which uses an outdated version of Template Haskell) only works with a small Haskell subset. So using TH you'd probably be changing the host language anyhow. Furthermore, the TH approach consists in adding node labels by preprocessing the EDSL description, making sharing observable. That makes the original EDSL description inpure. The only difference is that side effects are added by preprocessing instead of using runtime unsafe functions. ---- Some pointers covering the topic: [1] and [2] summarize what are the alternatives to observe sharing in Haskell whereas [3] compares the embedded approach vs standalone approach and advocates the last one. 1) http://www.imit.kth.se/~ingo/MasterThesis/ThesisAlfonsoAcosta2007.pdf (section 2.4.1 and 3.1) 2) http://www.cs.um.edu.mt/svrg/Papers/csaw2006-01.pdf (section 3) 3) http://web.cecs.pdx.edu/~sheard/papers/secondLook.ps

Bulat Ziganshin

8 Feb 8 Feb

7:09 a.m.

Hello Tom, Friday, February 8, 2008, 9:33:35 AM, you wrote:

...

The process of converting an expression tree to a graph uses either Eq or Ord (either derived or a custom instance) to search and build a set of unique nodes to be ordered for execution.

in similar situation, i've added hash field to each node, initialized by smart constructor: data Expression = Add Hash Expression Expression | ... type Hash=Int add x y = Add (x*y+1234567) x y ... -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Henning Thielemann

8:16 a.m.

On Fri, 8 Feb 2008, Tom Hawkins wrote:

...

I've been programming with Haskell for a few years and love it. One of my favorite applications of Haskell is using for domain specific languages. However, after designing a handful of DSLs, I continue to hit what appears to be a fundamental hurdle -- or at least I have yet to find an adequate solution.

It seems to become a FAQ. I think all DSLs suffer from the same problems: sharing and recursion. I've used wrappers for CSound, SuperCollider, MetaPost, they all have these problems.

Chung-chieh Shan

13 Feb 13 Feb

7:55 a.m.

Henning Thielemann wrote in article in gmane.comp.lang.haskell.cafe:

...

It seems to become a FAQ. I think all DSLs suffer from the same problems: sharing and recursion. I've used wrappers for CSound, SuperCollider, MetaPost, they all have these problems.

What do you mean by the "recursion" problem? Sometimes (or perhaps even often), sharing in an EDSL can be expressed in two ways. First, to reuse a -value- in the embedded language, you could introduce a "let" construct in the embedded language. let_ expr body = body expr Second, to reuse an -expression- in the embedded language, if your interpreter is compositional (here by "interpreter" I include a compiler, and by "compositional" I mean a fold), then you can represent an embedded expression simply as its interpretation. add x y = x + y let expr = add x y in add expr expr Jacques Carette, Oleg Kiselyov, and I have been exploring this "final" representation. http://okmij.org/ftp/Computation/tagless-typed.html -- Edit this signature at http://www.digitas.harvard.edu/cgi-bin/ken/sig I am a signature virus. Put me in your signature.

Paul L

27 May 27 May

7:58 a.m.

Let-expression in the EDSL indeed solves the sharing problem, but only partially. Recursion appears when you have a leaf node pointing back to the root node or another branch and forming a cyclic graph in the data structure. It is often desirable to recover cyclic sharing when showing/reading/interpretating EDSL programs. One possible solution is to further introduce a fixed point data constructor, a Rec or even LetRec to explicitly capture cycles. But then you still incur much overheads interpreting them, and syntax wise it just gets more and more complicated to the point that turning the EDSL into a DSL (or even a preprocessor with your own lexer and parser) becomes more attractive. Another alternative is to express the EDSL as Monad/MonadFix, or Arrows/ArrowLoop. There are still interpretive overheads, but at the very least they could help with the syntax. The tagless paper is really nice, but I doubt it offers solutions to the (cyclic) sharing problem. On 2/13/08, Chung-chieh Shan wrote:

...

Henning Thielemann wrote in article in gmane.comp.lang.haskell.cafe:

...
It seems to become a FAQ. I think all DSLs suffer from the same problems: sharing and recursion. I've used wrappers for CSound, SuperCollider, MetaPost, they all have these problems.

What do you mean by the "recursion" problem?

Sometimes (or perhaps even often), sharing in an EDSL can be expressed in two ways. First, to reuse a -value- in the embedded language, you could introduce a "let" construct in the embedded language.

let_ expr body = body expr

Second, to reuse an -expression- in the embedded language, if your interpreter is compositional (here by "interpreter" I include a compiler, and by "compositional" I mean a fold), then you can represent an embedded expression simply as its interpretation.

add x y = x + y let expr = add x y in add expr expr

Jacques Carette, Oleg Kiselyov, and I have been exploring this "final" representation. http://okmij.org/ftp/Computation/tagless-typed.html

-- Edit this signature at http://www.digitas.harvard.edu/cgi-bin/ken/sig I am a signature virus. Put me in your signature.

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

-- Regards, Paul Liu Yale Haskell Group http://www.haskell.org/yale

Paul L

8:17 a.m.

BTW, I doubt the (cyclic) sharing problem relates that much to purity, because in an impure language (or the unsafe observable sharing), you still have to remember whether something has been traversed or not and in the worst case accumulates everything that's been traversed so far before releasing all of them in the end. If you remember the history by mutating some states, e.g., using a dirty tag, you lose the ability to do simultaneous traversals. Adding a simple indirect reference only to the places where sharing is needed (and thus making it explicit) could alleviate this problem. But this solution exists in both pure and impure languages. So let's love purity still :-) -- Regards, Paul Liu Yale Haskell Group http://www.haskell.org/yale

Chung-chieh Shan

7 Jun 7 Jun

1:48 a.m.

On 2009-05-27T03:58:58-0400, Paul L wrote:

...

One possible solution is to further introduce a fixed point data constructor, a Rec or even LetRec to explicitly capture cycles. But then you still incur much overheads interpreting them,

I don't understand this criticism -- what interpretive overhead do you mean? Certainly the Rec/LetRec encoding is pretty efficient for one object language with cycles, namely the lambda calculus with Rec or LetRec. :) One concrete way for you to explain what interpretive overhead you mean, if it's not too much trouble, might be to compare a Rec/LetRec encoding of a particular object language to another encoding that does not have the interpretive overhead you mean and is therefore more efficient. -- Edit this signature at http://www.digitas.harvard.edu/cgi-bin/ken/sig We want our revolution, and we want it now! -- Marat/Sade We want our revolution, and we'll take it at such time as you've gotten around to delivering it -- Haskell programmer

Paul L

9 Jun 9 Jun

5:24 a.m.

Interpreting lambda calculus is neither cheap or efficient, otherwise we wouldn't all be using compilers :-) By "interpretive overhead" of adding Let/Rec/LetRec to an object language I mean the need to introduce variables, scoping, and environment (mapping variables to either values or structures they bind to) during interpretations, which are otherwise not needed in the object language. I can't show you how I can do better because I don't have a solution. The open question is whether there exists such a solution that's both elegant and efficient at maintain proper sharing in the object language. We certainly can get rid of all interpretive overheads by either having a "tagless" interpreter (as in Oleg and Shan's paper), or by direct compilation. But so far I don't see how a tagless interpreter could handle sharing when it can't be distinguished in the host language. One would argue that the overhead of variables (and the environment associated with them) can be avoided by having a higher order syntax, but that has its own problem. Let me illustrate with a data structure that uses higher order Rec. data C a = Val a | ... | Rec (C a -> C a) val :: C a -> a val (Val x) = x val ... val (Rec f) = val (fix f) where fix f = f (fix f) update :: C a -> C a update (val x) = ... update ... update (Rec f) = Rec (\x -> ...) The problem is right there in the creation of a new closure during update (Rec f). Haskell would not evaluate under lambda, and repeated updates will inevitably result in space and time leaks. -- Regards, Paul Liu Yale Haskell Group http://www.haskell.org/yale On 6/6/09, Chung-chieh Shan wrote:

...

On 2009-05-27T03:58:58-0400, Paul L wrote:

...
One possible solution is to further introduce a fixed point data constructor, a Rec or even LetRec to explicitly capture cycles. But then you still incur much overheads interpreting them,

I don't understand this criticism -- what interpretive overhead do you mean? Certainly the Rec/LetRec encoding is pretty efficient for one object language with cycles, namely the lambda calculus with Rec or LetRec. :)

One concrete way for you to explain what interpretive overhead you mean, if it's not too much trouble, might be to compare a Rec/LetRec encoding of a particular object language to another encoding that does not have the interpretive overhead you mean and is therefore more efficient.

-- Edit this signature at http://www.digitas.harvard.edu/cgi-bin/ken/sig We want our revolution, and we want it now! -- Marat/Sade We want our revolution, and we'll take it at such time as you've gotten around to delivering it -- Haskell programmer

Chung-chieh Shan

11:56 p.m.

Paul L wrote in article <856033f20906082224s2b7d5391gdc7a4ed913004639@mail.gmail.com> in gmane.comp.lang.haskell.cafe:

...

The open question is whether there exists such a solution that's both elegant and efficient at maintain proper sharing in the object language.

What is your criterion for "efficient"?

...

We certainly can get rid of all interpretive overheads by either having a "tagless" interpreter (as in Oleg and Shan's paper), or by direct compilation.

(BTW, the paper is by Jacques Carette, Oleg Kiselyov, and Chung-chieh Shan.)

...

But so far I don't see how a tagless interpreter could handle sharing when it can't be distinguished in the host language.

Indeed, I would agree with those on this thread who have stated that sharing should be distinguished in the host language. -- Edit this signature at http://www.digitas.harvard.edu/cgi-bin/ken/sig We want our revolution, and we want it now! -- Marat/Sade We want our revolution, and we'll take it at such time as you've gotten around to delivering it -- Haskell programmer

Conal Elliott

26 May 26 May

11:49 p.m.

Hi Tom, I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo & Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for. What's your latest wisdom about CSE in DSELs? Thanks, - Conal On Thu, Feb 7, 2008 at 11:33 PM, Tom Hawkins wrote:

...

I've been programming with Haskell for a few years and love it. One of my favorite applications of Haskell is using for domain specific languages. However, after designing a handful of DSLs, I continue to hit what appears to be a fundamental hurdle -- or at least I have yet to find an adequate solution.

My DSLs invariably define a datatype to capture expressions; something like this:

data Expression = Add Expression Expression | Sub Expression Expression | Variable String | Constant Int deriving Eq

Using the datatype Expression, it is easy to mass a collections of functions to help assemble complex expressions, which leads to very concise programs in the DSL.

The problem comes when I want to generate efficient code from an Expression (ie. to C or some other target language). The method I use invovles converting the tree of subexpressions into an acyclic graphic to eliminate common subexpressions. The nodes are then topologically ordered and assigned an instruction, or statement for each node. For example:

let a = Add (Constant 10) (Variable "i1") b = Sub (Variable "i2") (Constant 2) c = Add a b

would compile to a C program that may look like this:

a = 10 + i1; b = i2 - 2; c = a + b;

The process of converting an expression tree to a graph uses either Eq or Ord (either derived or a custom instance) to search and build a set of unique nodes to be ordered for execution. In this case "a", then "b", then "c". The problem is expressions often have shared, equivalent subnodes, which dramatically grows the size of the tree. For example:

let d = Add c c e = Add d d -- "e" now as 16 leaf nodes.

As these trees grow in size, the equality comparison in graph construction quickly becomes the bottleneck for DSL compilation. What's worse, the phase transition from tractable to intractable is very sharp. In one of my DSL programs, I made a seemingly small change, and compilation time went from milliseconds to not-in-a-million-years.

Prior to Haskell, I wrote a few DSLs in OCaml. I didn't have this problem in OCaml because each "let" expression was mutable, and I could use the physical equality operator to perform fast comparisons. Unfortunately, I have grown to love Haskell's type system and its lack of side effects, and could never go back.

Is there anything that can be done to dramatically speed up comparisons, or is there a better approach I can take to extract common subexpressions? I should point out I have an opportunity to get Haskell on a real industrial application. But if I can't solve this problem, I may have to resort to far less eloquent languages. :-(

Thanks for any and all help.

-Tom _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

wren ng thornton

27 May 27 May

12:23 a.m.

Conal Elliott wrote:

...

Hi Tom,

I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo & Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for.

What's your latest wisdom about CSE in DSELs?

Thanks, - Conal

One common trick that Tom didn't seem to mention in the 2008-02-07T23:33 post is hash cons'ing. Given a perfect hash function, traverse the term bottom-up storing each (hash,subterm) pair in a memo table and replacing the subterm by its hash. Once that's done, equality checks are trivial, and the memotable can be converted to SSA rather easily. This works best if you amortize the memoization by doing it with smart constructors, so that you don't need to worry about the exponential duplication of work for expressions with DAGy structure sharing in the Haskell. Since it's stateful, that means the smart constructors may need to be in an appropriate monad/applicative for passing the memo table around (some hash functions may not need to store the table explicitly). Maybe this is the too-slow too-complex solution you're using already? -- Live well, ~wren

Conal Elliott

2:59 p.m.

Hi Wren, I considered the idea of hashing, but not *perfect* hashing. I don't know how to hash perfectly with something like expressions, which have infinitely many values. Since it's stateful, that means the smart constructors may need to be in an

...

appropriate monad/applicative for passing the memo table around (some hash functions may not need to store the table explicitly).

Hm -- stateful? Unless I'm misunderstanding, a stateful & monadic/applicative approach would break the simple functional interface I'm going for. Could well be I haven't formed a mental picture that matches yours. - Conal On Tue, May 26, 2009 at 5:23 PM, wren ng thornton wrote:

...

Conal Elliott wrote:

...
Hi Tom,

I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo & Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for.

What's your latest wisdom about CSE in DSELs?

Thanks, - Conal

One common trick that Tom didn't seem to mention in the 2008-02-07T23:33 post is hash cons'ing.

Given a perfect hash function, traverse the term bottom-up storing each (hash,subterm) pair in a memo table and replacing the subterm by its hash. Once that's done, equality checks are trivial, and the memotable can be converted to SSA rather easily.

This works best if you amortize the memoization by doing it with smart constructors, so that you don't need to worry about the exponential duplication of work for expressions with DAGy structure sharing in the Haskell. Since it's stateful, that means the smart constructors may need to be in an appropriate monad/applicative for passing the memo table around (some hash functions may not need to store the table explicitly).

Maybe this is the too-slow too-complex solution you're using already?

-- Live well, ~wren

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

wren ng thornton

6:33 p.m.

Conal Elliott wrote:

...

Hi Wren,

I considered the idea of hashing, but not *perfect* hashing. I don't know how to hash perfectly with something like expressions, which have infinitely many values.

An imperfect hash can work. You'll need a memo table with a source of unique symbols (e.g. storing the next unused integer) in order to, effectively, make the "hash" function perfect[1]. If you have a source of unique symbols then you can also use a trie, Data.Map, or similar in lieu of a hash map. In a language with pointers (or stable names), the pointer is often used as the "hash" in conjunction with using the memo table as an intern table for smart constructors. Thus, client code can never observe that structurally equal expressions could have different hashes. [1] hash :: HashState m => Expr -> m Hash hash e = case lookupHS e of Just h -> return h Nothing -> do h <- nextH insertHS e h return h

...

...
Since it's stateful, that means the smart constructors may need to be in an appropriate monad/applicative for passing the memo table around (some hash functions may not need to store the table explicitly).

Hm -- stateful? Unless I'm misunderstanding, a stateful & monadic/applicative approach would break the simple functional interface I'm going for. Could well be I haven't formed a mental picture that matches yours.

Er, it's only stateful for the versions above that use pointers or a source of unique symbols (since they need to maintain a memo table). If you can come up with a perfect hash function[2], then there's no need to create/store the memo table at all, since it can be reconstructed on the fly. Since perfect hashing often isn't feasible, the stateful approximations to a perfect hash function are generally used. Sorry if I was unclear. If you don't mind unsafePerformIO (or similar hacks) then you can hide the state from the type system by using the same table for the whole program. Generally for hash cons'ing you want your tables to be as large as they can be (to maximize sharing) so this shouldn't be problematic. However, for languages with scoping it can be beneficial to use separate tables to recognize when expressions need to be recomputed; so the global store might want to be something like a stack of memo tables with fall-through lookup. I believe Applicative is powerful enough to capture the sort of state passing needed since the client code can't ever make decisions based on the state. So with smart constructors (to package up the <*> etc) I'd think you should be able to have an EDSL that looks nice, just with a more complicated type. Perhaps the issues are with mixing pure Haskell functions into the EDSL? ... The real trick behind hash cons'ing is everywhere substituting the "hash" in for the sub-expression, effectively flattening all expressions into a single ply. Thus, expression constructors "cons the hashes" rather than cons'ing expressions. It's similar in spirit to trie'ing, but from the bottom up in the same way that dynamic programming is done. The reason for wanting to do the hashing in smart constructors, as opposed to at the end, is to maximize the benefit of dynamic programming. If all client-visible expressions are represented by hashes, then any structure sharing in the Haskell layer is sharing the hash representation, thus you don't need to traverse the shared substructure multiple times. (If you hand construct equal expressions without sharing, then you'll have to traverse each expression to prove that they're equal, but you can use that proof (the hashes) thenceforth). For host languages with destructive updates (like Smalltalk's "become"), you can rewrite the subterms as you traverse them, so doing it at the end isn't too bad. If you only expose smart constructors then your Expr type can "recurse" as whatever Hash type. If you do the hashing at the end, then you'll need to define a catamorphism on Expr. ... This is probably similar to what you're doing in Pan, Vertigo, and Pajama (I haven't read it). The general technique is elegant in its simplicity, and it's not uncommon. Though, like most dynamic programming tricks, it seems not to be as widely known as I would assume, so I thought I'd mention it in case you've missed it. [2] Into any domain of terms that can quickly answer (==), namely flat terms like Integer. Using a bounded type like Int can give better performance guarantees, but there's only so many of them. -- Live well, ~wren

Tom Hawkins

12:24 a.m.

On Tue, May 26, 2009 at 6:49 PM, Conal Elliott wrote:

...

Hi Tom,

I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo & Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for.

What's your latest wisdom about CSE in DSELs?

I wasn't able to find a solution that offered both performance and elegance, so I changed the fundamental operation of the DSL (in this case, atom). When atom was still a hardware description language, the compiler would combine several user defined expressions together resulting in very wide and deep expression trees, resulting in the same problem you are observing. But when I switch the target of atom from HDL to C, the compiler no longer needed to perform the same expression expansion. And since the user defined expressions are generally shallow -- at least in the case of my applications -- atom is able to get away with exhaustive equality comparison (deriving Eq). Sorry I can't be of more help. -Tom

Sebastiaan Visser

10:15 a.m.

On May 27, 2009, at 1:49 AM, Conal Elliott wrote:

...

Hi Tom,

I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo & Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for.

What do you mean with `exponential behavior'? Exponential related to what? For my FRP EDSL to JavaScript (toy) compiler[1] I've been implementing CSE as well. I traverses the expression tree recursively and creates an small intermediate language containing id's (pointers) to expressions instead of real sub-expressions. Maybe (probably) I am very naive, but I think this trick takes time linear to the amount of sub-expressions in my script. When using a trie instead of a binary tree for the comparisons there should be no more character (or atomic expression) comparisons that the amount of characters in the script. So the problem seems not to be CSE algorithm, but the fact that EDSL itself tends to blow up because it is hosted in Haskell. Like Tom's example:

...

let d = Add c c e = Add d d -- "e" now as 16 leaf nodes.

But again, I might be missing some important point here.

...

What's your latest wisdom about CSE in DSELs?

Thanks, - Conal

On Thu, Feb 7, 2008 at 11:33 PM, Tom Hawkins wrote:

...
...

-- Sebastiaan Visser (warning: messy code) [1] http://github.com/sebastiaanvisser/frp-js/blob/b4f37d3b564c4932a3019b9b580e6...

Sittampalam, Ganesh

10:51 a.m.

Sebastiaan Visser wrote:

...

On May 27, 2009, at 1:49 AM, Conal Elliott wrote:

...
Hi Tom,

I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo & Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for.

What do you mean with `exponential behavior'? Exponential related to what?

For my FRP EDSL to JavaScript (toy) compiler[1] I've been implementing CSE as well. I traverses the expression tree recursively and creates an small intermediate language containing id's (pointers) to expressions instead of real sub-expressions.

Maybe (probably) I am very naive, but I think this trick takes time linear to the amount of sub-expressions in my script. When using a trie instead of a binary tree for the comparisons there should be no more character (or atomic expression) comparisons that the amount of characters in the script.

So the problem seems not to be CSE algorithm, but the fact that EDSL itself tends to blow up because it is hosted in Haskell. Like Tom's example:

...
let d = Add c c e = Add d d -- "e" now as 16 leaf nodes.

But again, I might be missing some important point here.

That's exactly right. But it's pretty inconvenient to have your expression tree to blow up exponentially in relation to the code the user actually wrote! You can indeed construct an intermediate language that collapses this blowup, but the pass to create it must take exponential time if written completely purely, since it has to visit everything at least once. In my experience [1], observable sharing using GHC's stable names is a pretty effective solution to this problem. Ganesh [1] http://www.earth.li/~ganesh/research/paradise-icfp08/ =============================================================================== Please access the attached hyperlink for an important electronic communications disclaimer: http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html ===============================================================================

Sebastiaan Visser

11:27 a.m.

On May 27, 2009, at 12:51 PM, Sittampalam, Ganesh wrote:

...

Sebastiaan Visser wrote:

...
...

But again, I might be missing some important point here.

That's exactly right. But it's pretty inconvenient to have your expression tree to blow up exponentially in relation to the code the user actually wrote! You can indeed construct an intermediate language that collapses this blowup, but the pass to create it must take exponential time if written completely purely, since it has to visit everything at least once.

In my experience [1], observable sharing using GHC's stable names is a pretty effective solution to this problem.

Ganesh

[1] http://www.earth.li/~ganesh/research/paradise-icfp08/

Thanks, I just pushed your paper on top of my stack. -- Sebastiaan Visser

Conal Elliott

3:14 p.m.

...

In my experience [1], observable sharing using GHC's stable names is a pretty effective solution to this problem.

Plus unsafePerformIO and weak references as in *Stretching the storage manager: weak pointers and stable names in Haskellhttp://citeseer.ist.psu.edu/peytonjones99stretching.html *? Lacking a more elegant alternative, that's what I'll probably do again, as in Pan, Vertigo, and Pajama. - Conal On Wed, May 27, 2009 at 3:51 AM, Sittampalam, Ganesh < ganesh.sittampalam@credit-suisse.com> wrote:

...

Sebastiaan Visser wrote:

...
On May 27, 2009, at 1:49 AM, Conal Elliott wrote:

...
Hi Tom,

I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo & Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for.

What do you mean with `exponential behavior'? Exponential related to what?

For my FRP EDSL to JavaScript (toy) compiler[1] I've been implementing CSE as well. I traverses the expression tree recursively and creates an small intermediate language containing id's (pointers) to expressions instead of real sub-expressions.

Maybe (probably) I am very naive, but I think this trick takes time linear to the amount of sub-expressions in my script. When using a trie instead of a binary tree for the comparisons there should be no more character (or atomic expression) comparisons that the amount of characters in the script.

So the problem seems not to be CSE algorithm, but the fact that EDSL itself tends to blow up because it is hosted in Haskell. Like Tom's example:

...
let d = Add c c e = Add d d -- "e" now as 16 leaf nodes.

But again, I might be missing some important point here.

That's exactly right. But it's pretty inconvenient to have your expression tree to blow up exponentially in relation to the code the user actually wrote! You can indeed construct an intermediate language that collapses this blowup, but the pass to create it must take exponential time if written completely purely, since it has to visit everything at least once.

In my experience [1], observable sharing using GHC's stable names is a pretty effective solution to this problem.

Ganesh

[1] http://www.earth.li/~ganesh/research/paradise-icfp08/http://www.earth.li/%7Eganesh/research/paradise-icfp08/

=============================================================================== Please access the attached hyperlink for an important electronic communications disclaimer: http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html

===============================================================================

Sittampalam, Ganesh

4:38 p.m.

Yes, though we don't bother with weak pointers as we only keep the stable names map around for the duration of CSE so there's no ongoing memory leak issue. ________________________________ From: haskell-cafe-bounces@haskell.org [mailto:haskell-cafe-bounces@haskell.org] On Behalf Of Conal Elliott Sent: 27 May 2009 16:14 To: Sittampalam, Ganesh Cc: Haskell Cafe Subject: Re: [Haskell-cafe] I love purity, but it's killing me. In my experience [1], observable sharing using GHC's stable names is a pretty effective solution to this problem. Plus unsafePerformIO and weak references as in Stretching the storage manager: weak pointers and stable names in Haskell http://citeseer.ist.psu.edu/peytonjones99stretching.html ? Lacking a more elegant alternative, that's what I'll probably do again, as in Pan, Vertigo, and Pajama. - Conal On Wed, May 27, 2009 at 3:51 AM, Sittampalam, Ganesh wrote: Sebastiaan Visser wrote: > On May 27, 2009, at 1:49 AM, Conal Elliott wrote: >> Hi Tom, >> >> I've been working on another code-generating graphics compiler, >> generating GPU code. As always, I run into the problem of efficient >> common subexpression elimination. In Pan, Vertigo & Pajama, I used >> lazy memoization, using stable pointers and weak references, to avoid >> the worst-case-exponential behavior you mention below. I'm now using >> a bottom-up CSE method that's slower and more complicated than I'm >> going for. > > What do you mean with `exponential behavior'? Exponential related to > what? > > For my FRP EDSL to JavaScript (toy) compiler[1] I've been > implementing CSE as well. I traverses the expression tree recursively > and creates an small intermediate language containing id's (pointers) > to expressions instead of real sub-expressions. > > Maybe (probably) I am very naive, but I think this trick takes time > linear to the amount of sub-expressions in my script. When using a > trie instead of a binary tree for the comparisons there should be no > more character (or atomic expression) comparisons that the amount of > characters in the script. > > So the problem seems not to be CSE algorithm, but the fact that EDSL > itself tends to blow up because it is hosted in Haskell. Like Tom's > example: > > > let d = Add c c > > e = Add d d -- "e" now as 16 leaf nodes. > > But again, I might be missing some important point here. That's exactly right. But it's pretty inconvenient to have your expression tree to blow up exponentially in relation to the code the user actually wrote! You can indeed construct an intermediate language that collapses this blowup, but the pass to create it must take exponential time if written completely purely, since it has to visit everything at least once. In my experience [1], observable sharing using GHC's stable names is a pretty effective solution to this problem. Ganesh [1] http://www.earth.li/~ganesh/research/paradise-icfp08/ http://www.earth.li/%7Eganesh/research/paradise-icfp08/ ======================================================================== ======= Please access the attached hyperlink for an important electronic communications disclaimer: http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html ======================================================================== ======= =============================================================================== Please access the attached hyperlink for an important electronic communications disclaimer: http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html ===============================================================================

Heinrich Apfelmus

29 May 29 May

8:08 a.m.

Conal Elliott wrote:

...

Sittampalam, Ganesh wrote

...
In my experience [1], observable sharing using GHC's stable names is a pretty effective solution to this problem.

Plus unsafePerformIO and weak references as in *Stretching the storage manager: weak pointers and stable names in Haskellhttp://citeseer.ist.psu.edu/peytonjones99stretching.html *?

Lacking a more elegant alternative, that's what I'll probably do again, as in Pan, Vertigo, and Pajama.

Note that the lack of a more elegant alternative, i.e. one that avoids unsafePerformIO for observing sharing, is not a random inconvenience but an unavoidable consequence of embedding a DSL into a host language. In other words, there is a fundamental problem here and unsafePerformIO is but the usual duct tape for inadequately fixing fundamental problems. The problem is that there are two types of let expressions, the one of the host language and the one of the embedded language. For instance, consider the example let a = Add 2 3 :: Expr b = Add a a :: Expr Replacing equals with equals, this is the same as Add (Add 2 3) (Add 2 3) :: Expr and there is no sharing in the embedded language. But I argue that this is a good thing, for sharing in the embedded language should be done with let expressions of the embedded language, like for example Let ['a' := Add 2 3, 'b' := Add (Var 'a') (Var 'a') ] :: Expr Clearly, these are two different expressions of the embedded language, even if one is an optimization of the other! One could say that the let of the host language is a shorthand notation for constructing large Expr and only the Let of the embedded language can express sharing inside the embedded language. As soon as we try to make the former synonymous to the latter, our ability to use it as a shorthand notation is gone and it now becomes *impossible* to represent Add (Add 2 3) (Add 2 3) :: Expr as let a = Add 2 3 :: Expr b = Add a a :: Expr Whether this is desirable or not is irrelevant; the host language Haskell will rebel at this. An analogous example would be the two fixed points of a MonadFix , namely the "internal" (embedded) fixed point fixInternal :: (a -> m a) -> m a fixInternal = mfix and the "external" (host) fixed point fixExternal :: (a -> m a) -> m a fixExternal = \f -> fix (>>= f) (Perhaps not surprisingly, MonadFix was first discovered/used when designing the DSL Lava, if I am informed correctly.) Regards, apfelmus -- http://apfelmus.nfshost.com

Conal Elliott

27 May 27 May

3:06 p.m.

...

What do you mean with `exponential behavior'? Exponential related to what?

I mean that the size of the observable tree can be exponential in the size of the unobservable dag representation. So the problem seems not to be CSE algorithm, but the fact that EDSL itself

...

tends to blow up because it is hosted in Haskell.

In other words, the tree size blows up, and hosting in pure Haskell doesn't allow us to examine the compact dag. Are we on the same track now? - Conal On Wed, May 27, 2009 at 3:15 AM, Sebastiaan Visser wrote:

...

On May 27, 2009, at 1:49 AM, Conal Elliott wrote:

...
Hi Tom,

I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo & Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for.

What do you mean with `exponential behavior'? Exponential related to what?

For my FRP EDSL to JavaScript (toy) compiler[1] I've been implementing CSE as well. I traverses the expression tree recursively and creates an small intermediate language containing id's (pointers) to expressions instead of real sub-expressions.

Maybe (probably) I am very naive, but I think this trick takes time linear to the amount of sub-expressions in my script. When using a trie instead of a binary tree for the comparisons there should be no more character (or atomic expression) comparisons that the amount of characters in the script.

So the problem seems not to be CSE algorithm, but the fact that EDSL itself tends to blow up because it is hosted in Haskell. Like Tom's example:

...
let d = Add c c e = Add d d -- "e" now as 16 leaf nodes.

But again, I might be missing some important point here.

What's your latest wisdom about CSE in DSELs?

...
Thanks, - Conal

On Thu, Feb 7, 2008 at 11:33 PM, Tom Hawkins wrote:

...
...

-- Sebastiaan Visser

(warning: messy code) [1] http://github.com/sebastiaanvisser/frp-js/blob/b4f37d3b564c4932a3019b9b580e6...

Conal Elliott

3:26 p.m.

I just remembered: Andy Gill has a new paper "Type Directed Observable Sharing" (http://www.ittc.ku.edu/~andygill/paper.php?label=DSLExtract09) that looks very relevant. Abstract: Haskell is a great language for writing and supporting embedded Domain

...

Specific Languages (DSLs). Some form of observable sharing is often a critical capability for allowing so-called deep DSLs to be compiled and processed. In this paper, we describe and explore uses of an IO function for reification which allows direct observation of sharing.

On Tue, May 26, 2009 at 4:49 PM, Conal Elliott wrote:

...

Hi Tom,

I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo & Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for.

What's your latest wisdom about CSE in DSELs?

Thanks, - Conal

On Thu, Feb 7, 2008 at 11:33 PM, Tom Hawkins wrote:

...
I've been programming with Haskell for a few years and love it. One of my favorite applications of Haskell is using for domain specific languages. However, after designing a handful of DSLs, I continue to hit what appears to be a fundamental hurdle -- or at least I have yet to find an adequate solution.

My DSLs invariably define a datatype to capture expressions; something like this:

data Expression = Add Expression Expression | Sub Expression Expression | Variable String | Constant Int deriving Eq

Using the datatype Expression, it is easy to mass a collections of functions to help assemble complex expressions, which leads to very concise programs in the DSL.

The problem comes when I want to generate efficient code from an Expression (ie. to C or some other target language). The method I use invovles converting the tree of subexpressions into an acyclic graphic to eliminate common subexpressions. The nodes are then topologically ordered and assigned an instruction, or statement for each node. For example:

let a = Add (Constant 10) (Variable "i1") b = Sub (Variable "i2") (Constant 2) c = Add a b

would compile to a C program that may look like this:

a = 10 + i1; b = i2 - 2; c = a + b;

The process of converting an expression tree to a graph uses either Eq or Ord (either derived or a custom instance) to search and build a set of unique nodes to be ordered for execution. In this case "a", then "b", then "c". The problem is expressions often have shared, equivalent subnodes, which dramatically grows the size of the tree. For example:

let d = Add c c e = Add d d -- "e" now as 16 leaf nodes.

As these trees grow in size, the equality comparison in graph construction quickly becomes the bottleneck for DSL compilation. What's worse, the phase transition from tractable to intractable is very sharp. In one of my DSL programs, I made a seemingly small change, and compilation time went from milliseconds to not-in-a-million-years.

Prior to Haskell, I wrote a few DSLs in OCaml. I didn't have this problem in OCaml because each "let" expression was mutable, and I could use the physical equality operator to perform fast comparisons. Unfortunately, I have grown to love Haskell's type system and its lack of side effects, and could never go back.

Is there anything that can be done to dramatically speed up comparisons, or is there a better approach I can take to extract common subexpressions? I should point out I have an opportunity to get Haskell on a real industrial application. But if I can't solve this problem, I may have to resort to far less eloquent languages. :-(

Thanks for any and all help.

-Tom _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

5871

Age (days ago)

6358

Last active (days ago)

List overview

Download

36 comments

17 participants

participants (17)

Alfonso Acosta
Bertram Felgenhauer
Bulat Ziganshin
Chung-chieh Shan
Conal Elliott
Don Stewart
Emil Axelsson
Felipe Lessa
Heinrich Apfelmus
Henning Thielemann
Matthew Naylor
Paul L
Sebastiaan Visser
Sittampalam, Ganesh
Tim Chevalier
Tom Hawkins
wren ng thornton