
Hi all,

for people that have followed my posts on the DSL subject this question will probably seem strange, especially asking it now.

I have read quite a lot on the subject lately, most of it written by the great old ones (come on guys, you know whom I mean :)).

What I could gather from their papers was that a DSL is basically something entirely abstract as such, i.e. it allows you to build and combine expressions in a language which is specific to your problem domain. Regardless of the further details of how to do that, and there are quite a few, the crux is that the expressions are abstract of "meaning". The meaning depends on how you *evaluate* an expression, which can be done in more than one way, and that, as far as I understand it, is where the true power lies.

So, you might wonder, since I figured it out this far, why ask what a DSL is? Because out there I see quite a lot of stuff that is labeled as a DSL, for example packages on Hackage, quite useful ones too, where I don't see the split between assembling an expression tree and evaluating it; to me those look more like combinator libraries.

Thus: What is a DSL?

Günther

Hi,

A DSL is just a domain-specific language. It doesn't imply any specific implementation technique.

An *embedded* DSL is a library implemented in a more general language, which has been designed to give the "feeling" of a stand-alone language. Still nothing about implementation.

A *shallow embedding* of a DSL is when the "evaluation" is done immediately by the functions and combinators of the DSL. I don't think it's possible to draw a line between a combinator library and a shallowly embedded DSL.

A *deep embedding* is when interpretation is done on an intermediate data structure.

/ Emil
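To make the distinction concrete, here is a minimal sketch (all names invented for illustration) of one tiny expression language embedded both ways:

-- Shallow embedding: the combinators do the evaluation themselves;
-- an expression simply *is* its meaning (here an Integer).
type ExprS = Integer

litS :: Integer -> ExprS
litS = id

addS :: ExprS -> ExprS -> ExprS
addS = (+)

-- Deep embedding: the combinators build an intermediate data structure,
-- and every interpretation is a separate function over it.
data ExprD = Lit Integer | Add ExprD ExprD

evalD :: ExprD -> Integer
evalD (Lit n)   = n
evalD (Add x y) = evalD x + evalD y

prettyD :: ExprD -> String
prettyD (Lit n)   = show n
prettyD (Add x y) = "(" ++ prettyD x ++ " + " ++ prettyD y ++ ")"

The deep version admits as many interpretations as you care to write (evalD, prettyD, ...); the shallow version commits to one interpretation in the combinators themselves.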

Hi Emil,
now that is an interpretation I could live with!
Glad I posted the question.
Günther
Am 07.10.2009, 17:24 Uhr, schrieb Emil Axelsson:

[snip]

So, if I understand this: Parsec is a DSL, and I'm going to venture it's a "deep embedding" -- I don't understand the internals, but if I were to build something like Parsec, I would probably build up a "Parser" data structure, then apply optimizations to it, then "run" it with another function. Am I on the right track here?

/Joe

On Oct 7, 2009, at 11:24 AM, Emil Axelsson wrote:

[snip]

On Wed, 2009-10-07 at 11:32 -0400, Joe Fredette wrote:
[snip]
Parsec, like most other parser combinator libraries, is a shallowly embedded DSL. The "Parser a" type is a Haskell function that does parsing, i.e. a function of type String -> Maybe (String, a). (Obviously, the real Parsec library allows more than strings, and has better error reporting than this type, but this is the basic idea.) You can't analyse it further---you can't transform it into another grammar to optimise it or print it out---because the information about what things it accepts has been locked up in a non-analysable Haskell function. The only thing you can do with it is feed it input and see what happens.
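To make that concrete, a minimal sketch of such a shallow parser type (String-only, no error reporting, invented names rather than Parsec's actual internals):

newtype Parser a = Parser { runParser :: String -> Maybe (String, a) }

-- Accept one specific character.
char :: Char -> Parser Char
char c = Parser $ \s -> case s of
    (x:xs) | x == c -> Just (xs, x)
    _               -> Nothing

-- All we can do with such a parser is run it; the grammar it encodes is
-- hidden inside an opaque function and cannot be printed or optimised.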
A deep embedding of a parsing DSL (really a context-sensitive grammar DSL) would look something like the following. I think I saw something like this in the Agda2 code somewhere, but I stumbled across it when I was trying to work out what "free" applicative functors were.

First we define what a production with a semantic action is, parameterised by the type of non-terminals in our grammar and the result type:

data Production nt a
    = Stop a
    | Terminal Char (Production nt a)
    | forall b. NonTerminal (nt b) (Production nt (b -> a))
You can think of a production as a list of either terminals or non-terminals, terminated by the "value" of that production. The non-regular nested type argument in NonTerminal means that the final value can depend on the values that will be returned when parsing the strings that match other non-terminals. Productions are functors:
instance Functor (Production nt) where
    fmap f (Stop a)           = Stop (f a)
    fmap f (Terminal c p)     = Terminal c (fmap f p)
    fmap f (NonTerminal nt p) = NonTerminal nt (fmap (fmap f) p)
They are also applicative functors:
instance Applicative (Production nt) where
    pure = Stop
    (Stop f)           <*> a = fmap f a
    (Terminal c t)     <*> a = Terminal c (t <*> a)
    (NonTerminal nt t) <*> a = NonTerminal nt (fmap flip t <*> a)
A rule in one of our grammars is just a list of alternative productions:
newtype Rule nt a = Rule [Production nt a]
Since lists are (applicative) functors and (applicative) functors compose, Rule nt is also a Functor and Applicative functor:
instance Functor (Rule nt) where
    fmap f (Rule l) = Rule (fmap (fmap f) l)
instance Applicative (Rule nt) where
    pure x = Rule $ pure (pure x)
    (Rule lf) <*> (Rule la) = Rule $ (<*>) <$> lf <*> la
It is also an instance of Alternative, because we composed with lists:
instance Alternative (Rule nt) where
    empty = Rule []
    (Rule r1) <|> (Rule r2) = Rule $ r1 <|> r2
A grammar is a map from nonterminals to rules, which are lists of alternative productions, which may themselves refer back to nonterminals in the grammar:
type Grammar nt = forall a. nt a -> Rule nt a
Given a value of type "Grammar nt", and a starting nonterminal in "nt a" for some "a", one can easily write a function that translates it into a Parsec grammar to do actual parsing, or implement a different parsing strategy using memoisation or something similar. The translation to a traditional parser combinator library is actually an (indexed-)homomorphism of applicative functors + extra operations, which is pretty cool.

If you also know some extra facts about the "nt" type (e.g. that it is finite), then it should be possible to implement a CYK or Earley parser using this, or to print out the grammar (for documentation purposes, or for telling another node in a distributed network what things you accept, for instance).

Note that these grammars are strictly less powerful than the ones that can be expressed using Parsec, because we only have a fixed range of possibilities for each rule, rather than allowing previously parsed input to determine what the parser will accept in the future. This is the fundamental reason for using the applicative functor interface rather than the monad interface here.

I'll give an example grammar for parsing expressions modelled by the following data type:
data Expr = ENum Int | ESum Expr Expr | EProduct Expr Expr deriving Show
To define a grammar in this formalism, one first has to define the set of nonterminals that one wants to use:
data NT a where
    Value   :: NT Expr
    Product :: NT Expr
    Sum     :: NT Expr
Now, a grammar is simply a function from members of this type to productions. We use the applicative/alternative functor interface to build up the productions. Conor's SHE would make this look a lot nicer, using idiom brackets.
myGrm :: Grammar NT
myGrm Value   = ENum <$> posInt
            <|> id <$ char '(' <*> nt Sum <* char ')'

myGrm Product = EProduct <$> nt Value <* char '*' <*> nt Product
            <|> id <$> nt Value

myGrm Sum     = ESum <$> nt Product <* char '+' <*> nt Sum
            <|> id <$> nt Product
This needs a couple of simple functions to make things look nice:
char :: Char -> Rule nt ()
char c = Rule [Terminal c $ Stop ()]
nt :: nt a -> Rule nt a
nt nonterminal = Rule [NonTerminal nonterminal $ Stop id]
And a general definition for parsing single-digit numbers. This works for any set of non-terminals, so it is a reusable component that works for any grammar:
choice :: Alternative f => [f a] -> f a
choice = foldl (<|>) empty
digit :: Rule nt Int
digit = choice [ x <$ char (intToDigit x) | x <- [0..9] ]
posInt :: Rule nt Int
posInt = fix 1 . reverse <$> some digit
  where fix n []     = 0
        fix n (d:ds) = d*n + fix (n*10) ds
Bob

Robert Atkey wrote:
A deep embedding of a parsing DSL (really a context-sensitive grammar DSL) would look something like the following. I think I saw something like this in the Agda2 code somewhere, but I stumbled across it when I was trying to work out what "free" applicative functors were.
[snip code & explanation]
This is extremely cool. I tried to understand in my head how this all works but it just didn't click. It all seemed like magic. Then I went ahead and tried to write a printer for your example grammar and now everything is much clearer. Although I had to fight the type checker quite a bit. This is the generic part:
class Print f where
    pr :: f a -> String
instance Print nt => Print (Production nt) where pr = printProduction
printProduction :: Print nt => Production nt a -> String
printProduction (Stop _)                  = ""
printProduction (Terminal t (Stop _))     = show t
printProduction (Terminal t p)            = show t ++ " " ++ printProduction p
printProduction (NonTerminal nt (Stop _)) = pr nt
printProduction (NonTerminal nt p)        = pr nt ++ " " ++ printProduction p
instance Print nt => Print (Rule nt) where
    pr (Rule ps) = printPs ps
      where printPs []     = ""
            printPs [p]    = printProduction p
            printPs (p:ps) = printProduction p ++ " | " ++ printPs ps
data Any f = forall a. Any (f a)
class Enumerable f where enumeration :: [Any f]
printRule :: Print nt => (nt a -> Rule nt a) -> nt a -> String
printRule g nt = pr nt ++ " ::= " ++ pr (g nt)
printGrammar :: (Print nt, Enumerable nt) => Grammar nt -> String
printGrammar g = foldr (++) "" (intersperse "\n" rules)
  where rules = map printAnyRule enumeration
        printAnyRule (Any nt) = printRule g nt
We must also provide instances for the concrete types:
instance Enumerable NT where enumeration = [Any Sum, Any Product, Any Value]
instance Print NT where
    pr Value   = "Value"
    pr Product = "Product"
    pr Sum     = "Sum"
So far so good. This even works... almost ;-)

*Main> putStrLn $ printGrammar myGrm
Sum ::= Product '+' Sum | Product
Product ::= Value '*' Product | Value
Value ::= Interrupted.    -- had to hit Ctrl-C here

When I replace 'posInt' with 'digit' in the rule for Value
myGrm Value = ENum <$> digit <|> id <$ char '(' <*> nt Sum <* char ')'
then the printer terminates just fine:

*Main> putStrLn $ printGrammar myGrm
Sum ::= Product '+' Sum | Product
Product ::= Value '*' Product | Value
Value ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '(' Sum ')'

I found that the problem is the use of function 'some' from Control.Applicative in
posInt :: Rule nt Int
posInt = fix 1 . reverse <$> some digit
  where fix n []     = 0
        fix n (d:ds) = d*n + fix (n*10) ds
Since 'some' is defined recursively, this creates an infinite production for numbers that you can neither print nor otherwise analyse in finite time. I can see at least two solutions: One is to parameterize everything over the type of terminals, too. A type suitable for the example would be
data T = TNum Int | TPlus | TMult | TOParen | TCParen
and leave token recognition to a separate scanner. The second solution (which I followed) is to break the recursion by adding another nonterminal to the NT type:
data NT a where
    Sum     :: NT Expr
    Product :: NT Expr
    Value   :: NT Expr
    Number  :: NT [Int]
    Digit   :: NT Int
instance Enumerable NT where enumeration = [Any Sum, Any Product, Any Value, Any Number, Any Digit]
instance Print NT where
    pr Sum     = "Sum"
    pr Product = "Product"
    pr Value   = "Value"
    pr Number  = "Number"
    pr Digit   = "Digit"
(Adding Digit /and/ Number is not strictly necessary, but it makes for a nicer presentation.)
myGrm :: Grammar NT
myGrm Sum     = ESum <$> nt Product <* char '+' <*> nt Sum
            <|> id <$> nt Product

myGrm Product = EProduct <$> nt Value <* char '*' <*> nt Product
            <|> id <$> nt Value

myGrm Value   = (ENum . toNat) <$> nt Number
            <|> id <$ char '(' <*> nt Sum <* char ')'

myGrm Number  = extend <$> nt Digit <*> optional (nt Number)

myGrm Digit   = digit
extend d Nothing   = [d]
extend d (Just ds) = d:ds
toNat :: [Int] -> Int
toNat = fix 1 . reverse
  where fix n []     = 0
        fix n (d:ds) = d*n + fix (n*10) ds
With this I get

*Main> putStrLn $ printGrammar myGrm
Sum ::= Product '+' Sum | Product
Product ::= Value '*' Product | Value
Value ::= Number | '(' Sum ')'
Number ::= Digit Number | Digit
Digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'

Moral: Be careful with recursive functions when constructing a data representation (e.g. for a deep DSL). You might get an infinite representation, which isn't what you want in this case.

Oh, and another point: there should be a distinguished "start" nonterminal, otherwise this is not really a grammar. This suggests something like
type Grammar nt a = (nt a, forall b. nt b -> Rule nt b)
Next thing I'll try is to transform such a grammar into an actual parser...

Cheers
Ben

Ben Franksen wrote:
Next thing I'll try is to transform such a grammar into an actual parser...
Which I also managed to get working. However, this exposed yet another problem I am not sure how to solve.

The problem manifests itself with non-left-factored rules like

Number ::= Digit Number | Digit

Translating such a grammar directly into a Parsec parser leads to parse errors, because Parsec's choice operator is predictive: if a production has consumed any input, the whole choice fails, even if alternative productions would succeed:

*Main> P.parseTest (parseGrammar myGrm) "2"
parse error at (line 1, column 2):
unexpected end of input
expecting Number

Of course, one solution is to apply Parsec's try combinator to all choices in a rule. But this rather defeats the purpose of using a (by default) predictive parser in the first place, which is to avoid unnecessary backtracking.

So, a better solution is to left-factor the grammar before translating to Parsec. Since we have a data representation of the grammar that we can readily analyse and transform, this should be possible given some suitable algorithm. But how is this transformation to be typed? My first naive attempt was to define (recap: nt :: * -> * is the type of nonterminals, t :: * is the type of terminals a.k.a. tokens, and a is the result type):
leftFactor :: Grammar nt t a -> Grammar nt t a
Of course, this is wrong: left-factoring is expected to introduce new nonterminals, so on the right-hand side of the type we should not have the same 'nt' as on the left. Instead we should have some other type that is "'nt' extended with new constructors". Moreover, we cannot statically know how many new nonterminals are added, so we cannot simply apply a type function to nt. Is this solvable at all in Haskell or do I need proper dependent types to express this?

I have very vague ideas that revolve around setting up some recursive type function that on each level adds one constructor, define a common interface with a (multiparam) type class and then use existential quantification in the result type to hide the resulting type of nonterminals.

The next question is: Even if this turns out to be possible, isn't it overkill? Maybe it is better to use an infinite type for the nonterminals in the first place and let the grammar be a partial function? OTOH, the formulation of the grammar as a function that pattern matches on the nonterminals is very elegant.

Cheers
Ben
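One way to make "'nt' extended with new constructors" precise is a functor sum plus an existential, sketched below with invented names (the same sum idea shows up later in this thread as :+:); Rule is assumed to be parameterised over tokens as in the surrounding messages. This only pins down the types; the algorithm itself is deliberately left open:

{-# LANGUAGE ExistentialQuantification, RankNTypes, TypeOperators #-}

-- The sum of two nonterminal types: the old ones plus some fresh ones.
data (f :+: g) a = Old (f a) | New (g a)

-- A left-factored grammar over 'nt' is a grammar over 'nt' extended
-- with an existentially hidden type of fresh nonterminals.
data LeftFactored nt t a = forall new. LeftFactored
    ((nt :+: new) a)                                      -- start symbol
    (forall b. (nt :+: new) b -> Rule (nt :+: new) t b)   -- rule set

-- Start symbol and rule set passed separately to avoid impredicativity.
leftFactor :: nt a -> (forall b. nt b -> Rule nt t b) -> LeftFactored nt t a
leftFactor = undefined  -- the interesting part, unresolved here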

Ben Franksen wrote:
Ben Franksen wrote:
Next thing I'll try is to transform such a grammar into an actual parser...
Which I also managed to get working.
First, before all this talking to myself here is boring you to death, please shout and I'll go away. Anyway, at least one person has privately expressed interest, so I'll post my code for the translation.(*)
{-# LANGUAGE ExistentialQuantification, GADTs, Rank2Types #-}
{-# LANGUAGE TypeSynonymInstances, MultiParamTypeClasses, ImpredicativeTypes #-}

import qualified Text.ParserCombinators.Parsec as P
Note that I have parameterized everything on the token (terminal) type. Here are the data types, adapting the rest of the code is completely mechanical.
data Production nt t a
    = Stop a
    | Terminal t (Production nt t a)
    | forall b. NonTerminal (nt b) (Production nt t (b -> a))
newtype Rule nt t a = Rule [Production nt t a]
type RuleSet nt t = forall a. nt a -> Rule nt t a
type Grammar nt t b = (nt b, RuleSet nt t)
I should probably turn this into a proper data type, which would BTW also make the ImpredicativeTypes extension unnecessary.

Translation to Parsec
---------------------

We restrict ourselves to Char as terminals for simplicity. The generalization to arbitrary token types would need another three arguments: showTok :: (tok -> String), nextPos :: (SourcePos -> tok -> [tok] -> SourcePos), and testTok :: (tok -> Maybe a), which are needed by P.tokenPrim.
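As a hedged sketch of that generalisation (the helper name 'terminal' is invented; P.tokenPrim and P.GenParser are Parsec's own):

-- Parse one specific token, given the show and position-update functions
-- that P.tokenPrim needs.
terminal :: Eq tok
         => (tok -> String)                              -- showTok
         -> (P.SourcePos -> tok -> [tok] -> P.SourcePos) -- nextPos
         -> tok
         -> P.GenParser tok () tok
terminal showTok nextPos t =
    P.tokenPrim showTok nextPos
                (\x -> if x == t then Just x else Nothing)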
parseGrammar :: Print nt => Grammar nt Char a -> P.Parser a
parseGrammar (start, rules) = parseNonTerminal rules start
parseNonTerminal :: Print nt => RuleSet nt Char -> nt a -> P.Parser a
parseNonTerminal rs nt = parseRule rs (rs nt) P.<?> pr nt
parseRule :: Print nt => RuleSet nt Char -> Rule nt Char a -> P.Parser a
parseRule rs (Rule ps) = P.choice (map ({- P.try . -} parseProduction rs) ps)
parseProduction :: Print nt => RuleSet nt Char -> Production nt Char a -> P.Parser a
parseProduction _  (Stop x)           = return x
parseProduction rs (Terminal c p)     = P.char c >> parseProduction rs p
parseProduction rs (NonTerminal nt p) = do
    vnt <- parseNonTerminal rs nt
    vp  <- parseProduction rs p
    return (vp vnt)
This is really not difficult, once you understand how the list-like Production type works. The trick is that a NonTerminal forces the "rest" of the production to return a function type, so you can apply its result to the result of parsing the nonterminal, whereas the result of parsing terminals gets ignored by the "rest" of the production.

You might wonder how the code manages to return the correct integer values inside an ENum. Well, I did, at least. I don't yet understand it completely, but I think the answer is in the Functor and Applicative instances: all the code that interprets syntactic elements (up to the abstract syntax) inside the myGrm function gets pushed down through the elements of a production until it ends up at a Stop, where we can finally pull it out (see the first clause of parseProduction).

Note also the (commented-out) use of P.try in function parseRule. Let's try it:

*Main> putStrLn (printGrammar myGrm)
*Start ::= Sum
Sum ::= Product '+' Sum | Product
Product ::= Value '*' Product | Value
Value ::= Number | '(' Sum ')'
Number ::= Digit Number | Digit
Digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
*Main> P.parseTest (parseGrammar myGrm) "2*(2+52)"
parse error at (line 1, column 2):
unexpected "*"
expecting Number

After re-inserting the P.try call, I can actually parse expressions (yay!):

*Main> :r
[1 of 1] Compiling Main             ( Grammar.lhs, interpreted )
Ok, modules loaded: Main.
*Main> P.parseTest (parseGrammar myGrm) "2*(2+52)"
EProduct (ENum 2) (ESum (ENum 2) (ENum 52))

BTW, does anyone know a source (books, papers, blogs, whatever) about algorithms for automatic left-factoring? I searched with google and found some interesting papers on eliminating left recursion but nothing so far on left-factoring. Have these problems all been solved before the internet age?

Cheers
Ben

(*) One of these days I really should get my hands dirty and set up a weblog; suggestions for how to proceed are appreciated. I would especially like something where I can just upload a literate Haskell file and it gets formatted automagically. Bonus points for beautifying operator symbols a la lhs2tex ;-)
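One level of the transformation itself is easy enough to sketch in an untyped toy setting (invented names, deliberately ignoring the typing problem raised earlier in the thread; a real implementation would iterate this and factor the freshly created rules too):

import Data.Function (on)
import Data.List (groupBy, sortBy)

data Sym = T Char | N String deriving (Eq, Ord, Show)
type Prod = [Sym]

-- Alternatives sharing a first symbol are collapsed into a single
-- alternative that defers the rest to a fresh nonterminal.
leftFactor :: String -> [Prod] -> [(String, [Prod])]
leftFactor name alts = (name, map fst results) : concatMap snd results
  where
    groups  = groupBy ((==) `on` first) (sortBy (compare `on` first) alts)
    first p = case p of { [] -> Nothing; (s:_) -> Just s }
    results = zipWith factor [1 :: Int ..] groups
    factor _ [p] = (p, [])                  -- unique prefix: unchanged
    factor i ps@((s:_):_) =
        let fresh = name ++ show i          -- fresh nonterminal name
        in  ([s, N fresh], [(fresh, map tail ps)])
    factor _ (p:_) = (p, [])                -- collapse duplicate empties

-- ghci> leftFactor "Number" [[N "Digit", N "Number"], [N "Digit"]]
-- [("Number",[[N "Digit",N "Number1"]]),("Number1",[[N "Number"],[]])]
-- i.e. Number ::= Digit Number1 and Number1 ::= Number | epsilon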

On Oct 11, 2009, at 18:00 , Ben Franksen wrote:
First, before all this talking to myself here is boring you to death, please shout and I'll go away. Anyway, at least one person has privately expressed interest, so I'll post my code for the translation.(*)
It's -cafe, so let 'er rip. And maybe write it up for TMR, if you don't want to set up a blog with all the goodies?

-- 
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH

On Sun, Oct 11, 2009 at 06:29:58PM -0400, Brandon S. Allbery KF8NH wrote:
It's -cafe, so let 'er rip. And maybe write it up for TMR, if you don't
Yes please! =)

-Brent

This problem of dynamically transforming grammars and building parsers out of it is addressed in:

@inproceedings{1411296,
  author    = {Viera, Marcos and Swierstra, S. Doaitse and Lempsink, Eelco},
  title     = {Haskell, do you read me?: constructing and composing efficient top-down parsers at runtime},
  booktitle = {Haskell '08: Proceedings of the first ACM SIGPLAN symposium on Haskell},
  year      = {2008},
  isbn      = {978-1-60558-064-7},
  pages     = {63--74},
  location  = {Victoria, BC, Canada},
  doi       = {http://doi.acm.org/10.1145/1411286.1411296},
  publisher = {ACM},
  address   = {New York, NY, USA},
}

and the code can be found on hackage under the name ChristmasTree. The left-factorisation is explained in a paper we presented at the last LDTA and which will appear in ENTCS. Since we have signed some copyright form I do not think I can attach it here, but if you send me a mail, I can definitely send you the paper.

Doaitse

On 11 okt 2009, at 21:54, Ben Franksen wrote:

[snip]

S. Doaitse Swierstra wrote:
This problem of dynamically transforming grammars and bulding parsers out of it is addressed in:
@inproceedings{1411296, author = {Viera, Marcos and Swierstra, S. Doaitse and Lempsink, Eelco}, title = {Haskell, do you read me?: constructing and composing efficient top-down parsers at runtime}, [...] }
Indeed, it looks as if you solved exactly the problem that vexed me! I had just found the presentation that corresponds to the paper you mention, and I also found a preprint of "Typed transformations of Typed Abstract Syntax" which I am studying at the moment.

I must say your construction is, well, involved... Not that I want to belittle this really astounding and clever achievement... but one disadvantage of your approach (which it shares with almost all examples I have seen for dependently typed or heterogeneous collections) is that (IIUC) the typed map from references to abstract syntactic terms is operationally an association list, indexed by unary numbers. I would expect this to scale poorly if the number of references (e.g. nonterminals) gets large.

I think it would make for quite an interesting research project to study whether it is possible to achieve the same level of precise static typing with more efficient data structures. I imagine that using some 'fake dependent type' variant of [Bit] for the key and the equivalent of a patricia tree for the map could perhaps be made to work???

Cheers
Ben

On Sun, 2009-10-11 at 21:54 +0200, Ben Franksen wrote:
Ben Franksen wrote:
Next thing I'll try is to transform such a grammar into an actual parser...
Which I also managed to get working. However, this exposed yet another problem I am not sure how to solve.
Another option is to not use a recursive descent parser, and to switch to a parsing algorithm that can handle any context-free grammar, such as CYK or Earley's algorithm. A little test implementation of a well-typed version of the CYK algorithm seems to work, and seems to be as efficient as the normal imperative one if enough memoisation is used. I'm trying to see if I can get Earley's algorithm to work nicely in the well-typed setting.

Bob

On Sat, 2009-10-10 at 20:12 +0200, Ben Franksen wrote:
Since 'some' is defined recursively, this creates an infinite production for numbers that you can neither print nor otherwise analyse in finite time.
Yes, sorry, I should have been more careful there. One has to be careful to handle EDSLs that have potentially infinite syntax properly.
I can see at least two solutions: One is to parameterize everything over the type of terminals, too.
The second solution (which I followed) is to break the recursion by adding another nonterminal to the NT type:
A third solution is to add the Kleene star to the grammar DSL, so the representation of productions becomes:
data Production nt a
    = Stop a
    | Terminal Char (Production nt a)
    | forall b. NonTerminal (nt b) (Production nt (b -> a))
    | forall b. Star (Production nt b) (Production nt ([b] -> a))
You also need to add the necessary parts for Alternative to the Production type too, because they may be nested inside Star constructors:
    | Zero
    | Choose (Production nt a) (Production nt a)
The type Production nt is now directly an Applicative and an Alternative and also has the combinator:
star :: Production nt a -> Production nt [a]
star p = Star p $ Stop id
The type of grammars is changed to (with the addition of the starting nonterminal, as you point out):
type Grammar nt = forall a. nt a -> Production nt a
It is probably also possible to write a function that converts grammars with "Star"s in them to ones without, by introducing new non-terminals in the way you did.

Bob

On 2009-10-22 14:44, Robert Atkey wrote:
On Sat, 2009-10-10 at 20:12 +0200, Ben Franksen wrote:
Since 'some' is defined recursively, this creates an infinite production for numbers that you can neither print nor otherwise analyse in finite time.
Yes, sorry, I should have been more careful there. One has to be careful to handle EDSLs that have potentially infinite syntax properly.
I find it useful to carefully distinguish between inductive and coinductive components of the syntax. Consider the following recogniser combinator language, implemented in Agda, for instance:

data P : Bool → Set where
  ∅   : P false
  ε   : P true
  tok : Bool → P false
  _∣_ : ∀ {n₁ n₂} → P n₁ → P n₂ → P (n₁ ∨ n₂)
  _·_ : ∀ {n₁ n₂} → P n₁ → ∞? n₁ (P n₂) → P (n₁ ∧ n₂)

The recognisers are indexed on their nullability; the index is true iff the recogniser accepts the empty string. The definition of P is inductive, except that the right argument of the sequencing combinator (_·_) is allowed to be coinductive when the left argument does not accept the empty string:

∞? : Bool → Set → Set
∞? true  A = A
∞? false A = ∞ A

(You can read ∞ A as a suspended computation of type A.) The limitations imposed upon coinduction in the definition of P ensure that recognition is decidable. For more details, see http://www.cs.nott.ac.uk/~nad/listings/parser-combinators/TotalRecognisers.h...

-- 
/NAD

On 2009-10-07 17:29, Robert Atkey wrote:
A deep embedding of a parsing DSL (really a context-sensitive grammar DSL) would look something like the following. I think I saw something like this in the Agda2 code somewhere, but I stumbled across it when I was trying to work out what "free" applicative functors were.
The Agda code you saw may have been due to Ulf Norell and me. There is a note about it on my web page: http://www.cs.nott.ac.uk/~nad/publications/danielsson-norell-parser-combinat...
Note that these grammars are strictly less powerful than the ones that can be expressed using Parsec because we only have a fixed range of possibilities for each rule, rather than allowing previously parsed input to determine what the parser will accept in the future.
Previously parsed input /can/ determine what the parser will accept in the future (as pointed out by Peter Ljunglöf in his licentiate thesis). Consider the following grammar for the context-sensitive language {aⁿbⁿcⁿ | n ∈ ℕ}:

data NT a where
    Start :: NT ()                 -- Start ∷= aⁿbⁿcⁿ
    ABC   :: Nat -> NT ()          -- ABC n ∷= aˡbⁿ⁺ˡcⁿ⁺ˡ
    X     :: Char -> Nat -> NT ()  -- X x n ∷= xⁿ

g :: Grammar NT
g Start   = nt (ABC 0)
g (ABC n) = char 'a' <* nt (ABC (succ n))
        <|> nt (X 'b' n) <* nt (X 'c' n)
g (X c n) | n == 0    = pure ()
          | otherwise = char c <* nt (X c (pred n))
And a general definition for parsing single-digit numbers. This works for any set of non-terminals, so it is a reusable component that works for any grammar:
Things become more complicated if the reusable component is defined using non-terminals which take rules (defined using an arbitrary non-terminal type) as arguments. Exercise: Define a reusable variant of the Kleene star, without using grammars of infinite depth.

-- 
/NAD

On Tue, 2009-10-13 at 13:28 +0100, Nils Anders Danielsson wrote:
On 2009-10-07 17:29, Robert Atkey wrote:
A deep embedding of a parsing DSL (really a context-sensitive grammar DSL) would look something like the following. I think I saw something like this in the Agda2 code somewhere, but I stumbled across it when I was trying to work out what "free" applicative functors were.
The Agda code you saw may have been due to Ulf Norell and me. There is a note about it on my web page:
http://www.cs.nott.ac.uk/~nad/publications/danielsson-norell-parser-combinat...
Yes, it might have been that, OTOH I'm sure I saw it in some Haskell code. Maybe I was imagining it.
Note that these grammars are strictly less powerful than the ones that can be expressed using Parsec because we only have a fixed range of possibilities for each rule, rather than allowing previously parsed input to determine what the parser will accept in the future.
Previously parsed input /can/ determine what the parser will accept in the future (as pointed out by Peter Ljunglöf in his licentiate thesis). Consider the following grammar for the context-sensitive language {aⁿbⁿcⁿ| n ∈ ℕ}:
Yes, sorry, I was sloppy in what I said there. Do you know of a characterisation of what languages having a possibly infinite number of nonterminals gives you? Is it all context-sensitive languages or a subset?
And a general definition for parsing single-digit numbers. This works for any set of non-terminals, so it is a reusable component that works for any grammar:
Things become more complicated if the reusable component is defined using non-terminals which take rules (defined using an arbitrary non-terminal type) as arguments. Exercise: Define a reusable variant of the Kleene star, without using grammars of infinite depth.
I see that you have an answer in the paper you linked to above. Another possible answer is to consider open sets of rules in a grammar:
data OpenRuleSet inp exp =
    forall hidden. OpenRuleSet
        (forall a. (exp :+: hidden) a -> Rule (exp :+: hidden :+: inp) a)
data (f :+: g) a = Left2 (f a) | Right2 (g a)
So OpenRuleSet inp exp exports definitions of the nonterminals in 'exp', imports definitions of nonterminals in 'inp' (and has a collection of hidden nonterminals). It is then possible to combine them with a function of type:
combineG :: (inp1 :=> exp1 :+: inp)
         -> (inp2 :=> exp2 :+: inp)
         -> OpenRuleSet inp1 exp1
         -> OpenRuleSet inp2 exp2
         -> OpenRuleSet inp (exp1 :+: exp2)
One can then give a reusable Kleene star by stating it as an open rule set:
star :: forall a nt. Rule nt a -> OpenRuleSet nt (Equal [a])
where Equal is the usual equality GADT. Obviously, this would be a bit clunky to use in practice, but maybe more specialised versions of combineG could be given.

Bob

On 2009-10-22 14:56, Robert Atkey wrote:
Yes, it might have been that, OTOH I'm sure I saw it in some Haskell code. Maybe I was imagining it.
There is some related Haskell code in the Agda repository.
Do you know of a characterisation of what languages having a possibly infinite amount of nonterminals gives you. Is it all context-sensitive languages or a subset?
I found a PhD thesis by Marvin Solomon (Cornell, 1977) which mentions the following, in retrospect obvious, fact: with an infinite set of non-terminals you can represent /any/ (countable) language, by using one non-terminal for every string in the language.

I adapted this argument to show that a total parser combinator library which I have implemented in Agda has exactly the same expressive power as (total) functions of type List Bool → List R (assuming the token type is Bool):

  Parser combinators are as expressive as possible
  http://sneezy.cs.nott.ac.uk/fplunch/weblog/?p=271

-- 
/NAD

On 22 okt 2009, at 15:56, Robert Atkey wrote:
....
Previously parsed input /can/ determine what the parser will accept in the future (as pointed out by Peter Ljunglöf in his licentiate thesis). Consider the following grammar for the context-sensitive language {aⁿbⁿcⁿ| n ∈ ℕ}:
Yes, sorry, I was sloppy in what I said there. Do you know of a characterisation of what languages having a possibly infinite amount of nonterminals gives you. Is it all context-sensitive languages or a subset?
The answer is: all context-sensitive languages. This is a very old insight which has come back in various forms in computer science.

The earliest conception in CS terms is the concept of an affix grammar, in which the infinite number of nonterminals is generated by parameterising non-terminals by trees. They were invented by Kees Koster and Lambert Meertens (who applied them to generate music: http://en.wikipedia.org/wiki/index.html?curid=5314967) in the beginning of the sixties of the last century. There is a long follow-up on this idea, of which the two most well-known versions are the so-called two-level grammars, which were used in the Algol68 report, and the attribute grammar formalism first described by Knuth. The full Algol68 language is defined in terms of a two-level grammar.

Key publications/starting points if you want to learn more about these are:

- the Algol68 report: http://burks.brighton.ac.uk/burks/language/other/a68rr/rrtoc.htm
- the Wikipedia article on affix grammars: http://en.wikipedia.org/wiki/Affix_grammar
- a nice book about the basics of two-level grammars is the Cleaveland & Uzgalis book, "Grammars for programming languages", which may be hard to get, but there is hope: http://www.amazon.com/Grammars-Programming-Languages-languages/dp/0444001875
- http://www.agfl.cs.ru.nl/papers/agpl.ps
- http://comjnl.oxfordjournals.org/cgi/content/abstract/32/1/36

Doaitse Swierstra

Let me add to this, as I've used the term "DSL" without (*gasp*) fully understanding it before.

In addition to "What is a DSL", I'd like to ask: "How is a DSL different from an API?" -- in the sense that an API is a set of, say, combinators to filter email, plus a monad in which to combine them. Or even an API in the more traditional sense of the set of exposed operations on a given type. Is an API a kind of DSL? A kind of embedded DSL?

Also, "What is the difference between an EDSL and a DSL?" -- I've got a vague intuition of the difference, but am unsure how to particularly delineate them.

Also, any good introductory papers/books/other resources on DSLs and how to design, build and use them would be _lovely_.

/Joe

On Oct 7, 2009, at 11:10 AM, Günther Schmidt wrote:

[snip]

Hi Joe
Am 07.10.2009, 17:26 Uhr, schrieb Joe Fredette
Let me add to this, as I've used the term "DSL" without (*gasp*) fully understanding it before.
Welcome to the club then! :)
In addition to "What is a DSL", I'd like to ask:
"How is a DSL different from an API?" -- in the sense that an API is a set of, say, combinators to filter email + a monad in which to combine them. Or even the API in the more traditional sense of the set of exposed operations on a given type. Is an API a kind of DSL? A kind of Embedded DSL?
Also,
"What is the difference between an EDSL and a DSL?" -- I've got a vague intuition of the difference, but am unsure how to particularly delineate them.
Well, that part I think I can answer. An EDSL is when you don't start from scratch, i.e. when you do not, let's say, build a compiler that parses a String and then eventually "executes" it. Rather, you define the "terms", i.e. primitive terms (terminals) and non-terminals, with the means of the "host" language (Haskell in my case).
Also, any good introductory papers/books/other resources on DSLs and how to design, build and use them would be _lovely_.
Well, as a book I could recommend Paul Hudak's "School of Expression" (SOE). The way he abstracts is by means of a DSL: he assembles objects -- geometric regions, triangles, circles, squares etc. -- combines them with the help of functions, and *later* evaluates them. Now he is definitely using a DSL here, but that is by no means the only way of implementing the abstract through a DSL. Once that has sunk in, I suggest papers from Oleg and others on the subject, but to get started SOE would be a good idea.

Günther
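A toy sketch in the spirit of that region language (invented names, not SOE's actual API) -- the point being that you assemble a description first and interpret it later:

data Region = Circle Double           -- radius
            | Square Double           -- side length
            | Union Region Region
            | Intersect Region Region

-- One of several possible evaluations: point containment.
inRegion :: (Double, Double) -> Region -> Bool
inRegion (x, y) (Circle r)      = x*x + y*y <= r*r
inRegion (x, y) (Square s)      = abs x <= s/2 && abs y <= s/2
inRegion p      (Union a b)     = inRegion p a || inRegion p b
inRegion p      (Intersect a b) = inRegion p a && inRegion p b

A second interpretation (say, rendering or pretty-printing the region) would be just another function over the same Region value.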

Hi,
Some random observations:

A (E)DSL and an API fall on the same plane when they just expose functionality of a library.

The difference between an EDSL and a DSL is really just the E, which means embedded into a host language, so the embedded language can be built on top of some existing machinery, in Haskell typically the type system. Haskell is particularly good for EDSLs (but so are Scheme or CL) because the syntax of Haskell lets you have a nice syntax for the embedded language, and the type system makes it possible to have, with more or less simplicity, typing guarantees for the specific language.

A regular expression library often comprises a regexp language, which is considered part of the API. That language is (or can be) parsed, compiled and executed.

Some EDSLs require you to execute the Haskell program to output some "object" code; others require only the execution of some function equivalent to runState for the particular monad the EDSL uses.

Providing a specialised language on top of a library is quite common, for instance command line tools to process images. Those command line tools can later be used in some programs (think scripting languages). For instance, the "dot" program of the graphviz suite can be run with unsafePerformIO to get graphviz features inside Haskell.

Parsing a String into some data structure is just a special case of transforming some data structure into another data structure because it is easier to process that way. For instance HOAS into de Bruijn and vice versa.

So for me, there is not such a strong distinction between API and language.
Cheers,
Thu
2009/10/7 Joe Fredette:

[snip]

2009/10/7 Joe Fredette
Let me add to this, as I've used the term "DSL" without (*gasp*) fully understanding it before.
In addition to "What is a DSL", I'd like to ask:
"How is a DSL different from an API?"
I don't think there is a sharp divide here. A nice example was given by Pat Hanrahan at the recent NVIDIA GPU conference. He proposed the idea that OpenGL is a DSL. His reasoning was that he could give a formal grammar that accurately captures the structure of many fragments of code making calls to OpenGL. For example, you have blocks of code bracketed by glBegin() and glEnd() with sequences of primitives in between. In fact, some people indent their code to reflect this structure, as if glBegin() and glEnd() were control structures within the host language.

I've argued that every monad gives a DSL. They all have the same syntax -- do-notation -- but each choice of monad gives quite different semantics for this notation. For example, the list monad gives a DSL for non-determinism.
--
Dan
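A standard illustration of that last point: read in the list monad, do-notation becomes a little language for exhaustive, non-deterministic search.

import Control.Monad (guard)

-- All Pythagorean triples with entries up to n, written in the
-- "non-determinism DSL" that do-notation provides for the list monad.
triples :: Int -> [(Int, Int, Int)]
triples n = do
    a <- [1 .. n]
    b <- [a .. n]
    c <- [b .. n]
    guard (a*a + b*b == c*c)  -- prune branches that fail the predicate
    return (a, b, c)

-- ghci> triples 15
-- [(3,4,5),(5,12,13),(6,8,10),(9,12,15)]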

dpiponi:
[snip]
I've informally argued that a true DSL -- separate from a good API -- should have semantic characteristics of a language: binding forms, control structures, abstraction, composition. Some have type systems. Basic DSLs may have only a few characteristics of languages though -- a (partial) grammar. That's closer to a well-defined API in my books.

-- Don

Hi Don,
I've informally argued that a true DSL -- separate from a good API -- should have semantic characteristics of a language: binding forms, control structures, abstraction, composition. Some have type systems.
That is one requirement that confuses me: abstraction. I thought of DSLs as "special purpose" languages, i.e. you give your DSL everything it needs for that purpose. Why would it also need the ability to express even further abstractions; it is supposed to *be* the abstraction.

Günther

2009/10/7 Günther Schmidt
[snip]
Hi,

Programming abstractions at the DSL level, not to further abstract what the DSL covers.

Functions, for instance, are typical abstraction means offered by programming languages. Even if your language is specific to some domain, being able to create your own functions, and not only rely on those provided by the DSL implementation, is important.

Imagine an (E)DSL for 3D programming (e.g. a shading language): the language is designed to fit the problem well (in this case 3D linear algebra, color operations, ...), but you'll agree it would be a shame not to be able to define your own functions.

Cheers,
Thu

minh thu wrote:
[snip]
But isn't one of the advantages of an _E_DSL that we can use the host language (Haskell) as a meta or macro language for the DSL? I would think that this greatly reduces the need to provide abstraction facilities /inside/ the DSL. In fact most existing (and often cited examples of) EDSLs in Haskell do not provide abstraction.

Cheers
Ben

On Wed, Oct 7, 2009 at 2:52 PM, Ben Franksen
But isn't one of the advantages of an _E_DSL that we can use the host language (Haskell) as a meta or macro language for the DSL?
Substantially so. I've used brief examples where the EDSL syntax is basically the data declaration (perhaps with some operators overloading constructors) to demonstrate Haskell's fitness as a host language for EDSLs. This is also a credit to the expressiveness of Haskell's data declarations. /jve
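A tiny sketch of the kind of example described here (Expr is an invented name): the data declaration is the whole syntax, and a Num instance overloads numeric literals and operators so that ordinary Haskell expressions build the tree.

  data Expr = Lit Integer | Add Expr Expr | Mul Expr Expr
    deriving Show

  instance Num Expr where
    fromInteger = Lit
    (+)         = Add
    (*)         = Mul
    negate e    = Mul (Lit (-1)) e
    abs         = error "abs: not part of this DSL"
    signum      = error "signum: not part of this DSL"

  -- (1 + 2 * 3 :: Expr)  ==>  Add (Lit 1) (Mul (Lit 2) (Lit 3))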

2009/10/7 Ben Franksen
minh thu wrote:
2009/10/7 Günther Schmidt:
I've informally argued that a true DSL -- separate from a good API -- should have semantic characteristics of a language: binding forms, control structures, abstraction, composition. Some have type systems.
That is one requirement that confuses me: abstraction.
I thought of DSLs as "special purpose" languages, i.e. you give your DSL everything it needs for that purpose.
Why would it also need the ability to express even further abstractions? It is supposed to *be* the abstraction.
I mean programming abstractions at the DSL level, not further abstracting what the DSL covers.
Functions, for instance, are typical abstraction means offered by programming languages. Even if your language is specific to some domain, being able to create your own functions, and not only rely on those provided by the DSL implementation, is important.
Imagine an (E)DSL for 3D programming (e.g. a shading language): the language is designed to fit the problem well (e.g. in this case, 3D linear algebra, color operations, ...) but you'll agree it would be a shame not to be able to provide your own functions.
But isn't one of the advantages of an _E_DSL that we can use the host language (Haskell) as a meta or macro language for the DSL?
It is.
I would think that this greatly reduces the need to provide abstraction facilities /inside/ the DSL. In fact most existing (and often cited examples of) EDSLs in Haskell do not provide abstraction.
Even when you have good macro support, you don't code everything at the macro level. But it all depends on the particular EDSL we're talking about. If the EDSL is close to a regular programming language, it is likely to provide the ability to create functions. Cheers, Thu

Ben Franksen skrev:
minh thu wrote:
2009/10/7 Günther Schmidt:
I've informally argued that a true DSL -- separate from a good API -- should have semantic characteristics of a language: binding forms, control structures, abstraction, composition. Some have type systems.
That is one requirement that confuses me: abstraction.
I thought of DSLs as "special purpose" languages, i.e. you give your DSL everything it needs for that purpose.
Why would it also need the ability to express even further abstractions? It is supposed to *be* the abstraction.
I mean programming abstractions at the DSL level, not further abstracting what the DSL covers.
Functions, for instance, are typical abstraction means offered by programming languages. Even if your language is specific to some domain, being able to create your own functions, and not only rely on those provided by the DSL implementation, is important.
Imagine an (E)DSL for 3D programming (e.g. a shading language): the language is designed to fit the problem well (e.g. in this case, 3D linear algebra, color operations, ...) but you'll agree it would be a shame not to be able to provide your own functions.
But isn't one of the advantages of an _E_DSL that we can use the host language (Haskell) as a meta or macro language for the DSL? I would think that this greatly reduces the need to provide abstraction facilities /inside/ the DSL. In fact, most existing (and often-cited) EDSLs in Haskell do not provide abstraction.
I would say that the DSL is what the user sees. In this view, I think it's correct to say that many (or most) DSLs need function abstraction. Whether or not the internal data structure has function abstraction is an implementation detail. / Emil
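One way to picture this, sketched with higher-order abstract syntax (all names invented; needs the GADTs extension): the user sees DSL-level function abstraction, while internally a Lam node just stores a host-language function -- a representation choice that could equally be de Bruijn indices.

  {-# LANGUAGE GADTs #-}

  data Expr a where
    Val :: a -> Expr a
    Add :: Expr Int -> Expr Int -> Expr Int
    Lam :: (Expr a -> Expr b) -> Expr (a -> b)
    App :: Expr (a -> b) -> Expr a -> Expr b

  eval :: Expr a -> a
  eval (Val a)   = a
  eval (Add x y) = eval x + eval y
  eval (Lam f)   = \a -> eval (f (Val a))
  eval (App f x) = eval f (eval x)

  -- what the user sees: function abstraction inside the DSL
  double :: Expr (Int -> Int)
  double = Lam (\x -> Add x x)

  -- eval (App double (Val 21))  ==>  42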

Emil Axelsson wrote:
Ben Franksen skrev:
minh thu wrote:
2009/10/7 Günther Schmidt:
I've informally argued that a true DSL -- separate from a good API -- should have semantic characteristics of a language: binding forms, control structures, abstraction, composition. Some have type systems.
That is one requirement that confuses me: abstraction.
I thought of DSLs as "special purpose" languages, i.e. you give your DSL everything it needs for that purpose.
Why would it also need the ability to express even further abstractions? It is supposed to *be* the abstraction.
I mean programming abstractions at the DSL level, not further abstracting what the DSL covers.
Functions, for instance, are typical abstraction means offered by programming languages. Even if your language is specific to some domain, being able to create your own functions, and not only rely on those provided by the DSL implementation, is important.
Imagine an (E)DSL for 3D programming (e.g. a shading language): the language is designed to fit the problem well (e.g. in this case, 3D linear algebra, color operations, ...) but you'll agree it would be a shame not to be able to provide your own functions.
But isn't one of the advantages of an _E_DSL that we can use the host language (Haskell) as a meta or macro language for the DSL? I would think that this greatly reduces the need to provide abstraction facilities /inside/ the DSL. In fact, most existing (and often-cited) EDSLs in Haskell do not provide abstraction.
I would say that the DSL is what the user sees. In this view, I think it's correct to say that many (or most) DSLs need function abstraction. Whether or not the internal data structure has function abstraction is an implementation detail.
If it is a stand-alone DSL (i.e. with its own parser), then yes. But I was referring to embedded DSLs, i.e. a DSL as a library in a host language (e.g. Haskell). In this case the user sees the host language by construction, which means she has less need of function abstraction /inside/ the DSL.
Cheers Ben

Last February we celebrated 25 years of CS education in Utrecht. On that occasion I wrote a paper for a wider public explaining my views on computer science and on programming languages in particular. I am attaching it since it might shed some light on the discussion. Doaitse

What is a DSL?
How about this as a formal-ish definition, for at least a pretty big class of DSLs:

A DSL is an algebraic theory in the sense of universal algebra. I.e. it is an API of a specific form, which consists of:
a) a collection of abstract types, the carriers. Need not all be of kind *.
b) a collection of operations, of type t1 -> t2 -> ... -> tn, where tn must be one of the carrier types from (a), but the others can be any types you like.
c) (Optional) a collection of properties about the operations (e.g. equations that must hold).

Haskell has a nice way of specifying such things (except part (c)): type classes. Examples of type classes that fit this schema include Monad, Applicative and Alternative. Ones that don't fit include Eq, Ord and Show. The Num type class would fit, if it didn't specify Eq and Show as superclasses.

An implementation of a DSL is just an implementation of the corresponding type class. Shallowly embedded DSLs dispense with the type class step and just give a single implementation. Deeply embedded implementations are *initial* implementations: there is a unique function from the deep embedding to any of the other implementations that preserves all the operations. The good thing about this definition is that anything we do to the deep embedding, we can do to any of the other implementations via the unique map.

Thanks to Church and Reynolds, we can always get a deep embedding for free (free as in "Theorems for Free"). If our DSL is defined by some type class T, then the deep embedding is:

  type DeepT = forall a. T a => a

(and so on, for multiple carrier types, possibly with type parameterisation). Of course, there is often an easier and more efficient way of representing the initial algebra using algebraic data types.

Conor McBride often goes on about how the initial algebra (i.e. the deep embedding) of a given specification is the one you should be worrying about, because it often has a nice concrete representation and gives you all you need to reason about any of the other implementations.

Bob

-- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
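To make the recipe concrete, here is a minimal sketch for a one-sorted theory (ExprSym, Deep and the instances are invented for illustration, not Bob's code): the class is the algebraic theory, each instance is an implementation, and the polymorphic type is the deep embedding, with interp as the unique structure-preserving map out of it.

  {-# LANGUAGE RankNTypes, FlexibleInstances #-}

  -- the theory: one carrier r, two operations
  class ExprSym r where
    lit :: Int -> r
    add :: r -> r -> r

  -- the deep embedding "for free"
  newtype Deep = Deep { runDeep :: forall r. ExprSym r => r }

  instance ExprSym Deep where
    lit n = Deep (lit n)
    add (Deep x) (Deep y) = Deep (add x y)

  -- one implementation: evaluation
  instance ExprSym Int where
    lit = id
    add = (+)

  -- another implementation: pretty-printing
  instance ExprSym String where
    lit = show
    add x y = "(" ++ x ++ " + " ++ y ++ ")"

  -- the unique map from the deep embedding to any implementation
  interp :: ExprSym r => Deep -> r
  interp d = runDeep d

  -- interp (add (lit 1) (lit 2) :: Deep) :: Int     ==>  3
  -- interp (add (lit 1) (lit 2) :: Deep) :: String  ==>  "(1 + 2)"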

2009/10/7 Robert Atkey
What is a DSL?
How about this as a formal-ish definition, for at least a pretty big class of DSLs:
A DSL is an algebraic theory in the sense of universal algebra. I.e. it is an API of a specific form, which consists of:
a) a collection of abstract types, the carriers. Need not all be of kind *.
b) a collection of operations, of type t1 -> t2 -> ... -> tn, where tn must be one of the carrier types from (a), but the others can be any types you like.
c) (Optional) a collection of properties about the operations (e.g. equations that must hold).
Haskell has a nice way of specifying such things (except part (c)): type classes.
Examples of type classes that fit this schema include Monad, Applicative and Alternative. Ones that don't fit include Eq, Ord and Show. The Num type class would fit, if it didn't specify Eq and Show as superclasses.
An implementation of a DSL is just an implementation of the corresponding type class. Shallowly embedded DSLs dispense with the type class step and just give a single implementation. Deeply embedded implementations are *initial* implementations: there is a unique function from the deep embedding to any of the other implementations that preserves all the operations. The good thing about this definition is that anything we do to the deep embedding, we can do to any of the other implementations via the unique map.
Thanks to Church and Reynolds, we can always get a deep embedding for free (free as in "Theorems for Free"). If our DSL is defined by some type class T, then the deep embedding is:

  type DeepT = forall a. T a => a

(and so on, for multiple carrier types, possibly with type parameterisation).
Of course, there is often an easier and more efficient way of representing the initial algebra using algebraic data types.
Conor McBride often goes on about how the initial algebra (i.e. the deep embedding) of a given specification is the one you should be worrying about, because it often has a nice concrete representation and gives you all you need to reason about any of the other implementations.
It's funny, because I wouldn't have thought about this in terms of type classes off the top of my head. What I've been thinking about a lot lately (because I'm trying to prepare notes on it) is building classifying categories from signatures, then considering the category of all possible functorial "models" (read: "DSL embeddings") into the target category. I guess we're essentially talking about the same thing.
The difference from looking at it as type classes is that with product-preserving functors from your classifying category you really do get all your equations preserved; however, the topic came up earlier today of what a language would look like if it had a built-in notion of functorial semantics -- my guess is that it'd be like a stronger version of ML functors, but I don't really know. Cheers, C

I'd also like to note that the canonical pronunciation of DSL ends in "-izzle".

"George" == George Pollard
writes:
George> I'd also like to note that the canonical pronunciation of George> DSL ends in "-izzle". Whose canon? Interestingly, I have always assumed the canonical pronunciation of DSSSL was diesel, as JADE stands for JAmes's DSSSL Engine. I don't see why removing extra S-es should shorten the vowel. -- Colin Adams Preston Lancashire

On Thu, Oct 8, 2009 at 6:08 AM, Colin Paul Adams wrote:
"George" == George Pollard writes:
George> I'd also like to note that the canonical pronunciation of DSL ends in "-izzle".
Whose canon?
Interestingly, I have always assumed the canonical pronunciation of DSSSL was diesel, as JADE stands for JAmes's DSSSL Engine.
I don't see why removing extra S-es should shorten the vowel.
Wht vwl? U mst b Englsh. 2 n Amrcn, DSSSL is "dissel"; all short vowels. DSL is "dee-ess-ell". "Dizzle" is a brbrzm.
-grgg

"Gregg" == Gregg Reynolds
writes:
Gregg> On Thu, Oct 8, 2009 at 6:08 AM, Colin Paul Adams
Gregg>
"George" == George Pollard
writes: >> George> I'd also like to note that the canonical pronunciation of George> DSL ends in "-izzle". >> >> Whose canon? >> >> Interestingly, I have always assumed the canonical >> pronunciation of DSSSL was diesel, as JADE stands for JAmes's >> DSSSL Engine. >> >> I don't see why removing extra S-es should shorten the vowel. >> >> Wht vwl? U mst b Englsh. 2 n Amrcn, DSSSL is "dissel"; all >> short vowels.
Certainly I am English, and so is James Clark. -- Colin Adams Preston Lancashire

Hi Bob, I tried to understand this by applying what you said here to your deep embedding of a parsing DSL. But I can't figure out how to do that. What things become the type class T? greetings, Sjoerd On Oct 7, 2009, at 9:18 PM, Robert Atkey wrote:
What is a DSL?
How about this as a formal-ish definition, for at least a pretty big class of DSLs:
A DSL is an algebraic theory in the sense of universal algebra. I.e. it is an API of a specific form, which consists of:
a) a collection of abstract types, the carriers. Need not all be of kind *.
b) a collection of operations, of type t1 -> t2 -> ... -> tn, where tn must be one of the carrier types from (a), but the others can be any types you like.
c) (Optional) a collection of properties about the operations (e.g. equations that must hold).
Haskell has a nice way of specifying such things (except part (c)): type classes.
Examples of type classes that fit this schema include Monad, Applicative and Alternative. Ones that don't fit include Eq, Ord and Show. The Num type class would fit, if it didn't specify Eq and Show as superclasses.
An implementation of a DSL is just an implementation of the corresponding type class. Shallowly embedded DSLs dispense with the type class step and just give a single implementation. Deeply embedded implementations are *initial* implementations: there is a unique function from the deep embedding to any of the other implementations that preserves all the operations. The good thing about this definition is that anything we do to the deep embedding, we can do to any of the other implementations via the unique map.
Thanks to Church and Reynolds, we can always get a deep embedding for free (free as in "Theorems for Free"). If our DSL is defined by some type class T, then the deep embedding is:

  type DeepT = forall a. T a => a

(and so on, for multiple carrier types, possibly with type parameterisation).
Of course, there is often an easier and more efficient way of representing the initial algebra using algebraic data types.
Conor McBride often goes on about how the initial algebra (i.e. the deep embedding) of a given specification is the one you should be worrying about, because it often has a nice concrete representation and gives you all you need to reason about any of the other implementations.
Bob
-- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
-- Sjoerd Visscher sjoerd@w3future.com

On Mon, 2009-10-12 at 15:49 +0200, Sjoerd Visscher wrote:
Hi Bob,
I tried to understand this by applying what you said here to your deep embedding of a parsing DSL. But I can't figure out how to do that. What things become the type class T?
Here's the "API" version of the grammar DSL: class GrammarDSL grammar where type Rule grammar :: (* -> *) -> * -> * pure :: a -> Rule grammar nt a (<*>) :: Rule grammar nt (a -> b) -> Rule grammar nt a -> Rule grammar nt b empty :: Rule grammar nt a (<|>) :: Rule grammar nt a -> Rule grammar nt a -> Rule grammar nt a char :: Char -> Rule grammar nt () nt :: nt a -> Rule grammar nt a grammar :: forall nt a. nt a -> (forall a. nt a -> Rule grammar nt a) -> grammar nt a The language of typed-grammars-with-actions is composed of: * two sorts: "grammar"s and "rule"s * "rule"s support the applicative and alternative interfaces, and also two special operators for incorporating terminals and nonterminals into rules. * "grammar"s support a single operation: taking a nonterminal-indexed collection of rules, and a starting non-terminal (as Ben Franksen pointed out), producing a grammar. Basically, the idea is to think 1) "what are the syntactic categories of my DSL?", these become the sorts; 2) "what are the basic syntactic constructions of my DSL?", these become the operations of the type class. Because we are embedded in Haskell, we can have infinite syntax, as demonstrated by the "grammar" operation. WRT the recipe for getting deep embeddings in my previous post, it isn't quite true that the type Grammar nt a = forall grammar. GrammarDSL grammar => grammar nt a is isomorphic to the deep embedding I posted before, because it doesn't guarantee that the applicative functor or alternative laws hold, while the deep embedding does (and it also ensures that <*> and <|> distribute). It isn't hard to come up with a deep embedding that is initial for the completely free version though. The deep embedding from the previous post is an instance of this type class. So is, as Ben Franksen showed, a Parsec parser. I ended up having to inline the applicative and alternative interfaces into the class definition above. I wanted to write: class (Applicative (Rule grammar nt), Alternative (Rule grammar nt)) => Grammar grammar where ... but GHC wouldn't let me, complaining that 'nt' wasn't bound. Is there any reason this couldn't be made to work? Bob -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
participants (18):
- Ben Franksen
- Brandon S. Allbery KF8NH
- Brent Yorgey
- Colin Paul Adams
- Creighton Hogg
- Dan Piponi
- Don Stewart
- Emil Axelsson
- George Pollard
- Gregg Reynolds
- Günther Schmidt
- Joe Fredette
- John Van Enk
- minh thu
- Nils Anders Danielsson
- Robert Atkey
- S. Doaitse Swierstra
- Sjoerd Visscher