feasability of implementing an awk interpreter.

newer
How to substitute FD like a -> b c...

Michael Litchard

20 Aug 2010 20 Aug '10

3:05 a.m.

I'd like the community to give me feedback on the difficulty level of implementing an awk interpreter. What language features would be required? Specifically I'm hoping that TH is not necessary because I'm nowhere near that skill level. An outline of a possible approach would be appreciated. I am using http://www.math.utah.edu/docs/info/gawk_toc.html as a guide to the language description.

Show replies by date

Brandon S Allbery KF8NH

20 Aug 20 Aug

3:54 a.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 08/19/2010 11:05 PM, Michael Litchard wrote:

...

I'd like the community to give me feedback on the difficulty level of implementing an awk interpreter. What language features would be required? Specifically I'm hoping that TH is not necessary because I'm nowhere near that skill level.

At most TH might save you a little boilerplate, but I don't think there would be enough to bother with it anyway. It should be fairly straightforward to implement in terms of state transformers atop IO for the runtime part, and any reasonable parser framework would do for the program-parsing phase. The hardest part is probably working out how you want to represent the state of the awk engine (variables; input line and fields; "compiled" program fragments and the patterns they're attached to; etc.) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkxt/HYACgkQIn7hlCsL25X3XwCfW7iw6RZM2r6nDynrNLZ2sAqQ PBMAnRJM/zcyRaZWumuBNytCNRUWGcvE =xrvu -----END PGP SIGNATURE-----

Jason Dagit

4:05 a.m.

On Thu, Aug 19, 2010 at 8:05 PM, Michael Litchard wrote:

...

I'd like the community to give me feedback on the difficulty level of implementing an awk interpreter. What language features would be required? Specifically I'm hoping that TH is not necessary because I'm nowhere near that skill level.

I'd love to have portable pure haskell implementations of the traditional unix tools. If it were done well, it would allow you to 'cabal install' yourself into a usable dev environment on windows :) I'd much rather do that than deal with cygwin/mingw. Someone (was it Stephen Hicks?) was writing (or finished writing?) an sh parser and I got really excited for the same reason. It would be a cool project, but I'm not sure I can justify to myself spending my spare cycles on it.

...

An outline of a possible approach would be appreciated. I am using http://www.math.utah.edu/docs/info/gawk_toc.html as a guide to the language description.

I think this is a good opportunity for you to learn about monad transformers. To that end, I think you will like this paper (quite easy for beginners to pick up): http://www.grabmueller.de/martin/www/pub/Transformers.en.html At least, that's how I first learned about them and I though it was easy to read at the time :) You might also want to read (and try) some of the tutorials that focus on creating interpreters just to sort of get some practice in that area. I haven't read it, but I've heard good things about this one: http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours You might also focus on the 'core' of awk. Think about, what is the minimal language and start from there. Grow your implementation adding features bit by bit. It's also a good opportunity to do testing. You have a reference implementation and so you can write lots of tests for each feature as you add them. I hope that helps, Jason

John Lask

7:32 a.m.

On 20/08/2010 1:35 PM, Jason Dagit wrote: fairly easy .. you might want to check out the following tutorial ... http://www.crsr.net/Programming_Languages/SoftwareTools/ch5.html he implements a basic grep tool, you might then want to check out one of the regex packages as a basis for your implementation of awk.

...

On Thu, Aug 19, 2010 at 8:05 PM, Michael Litchard mailto:michael@schmong.org> wrote:

I'd like the community to give me feedback on the difficulty level of implementing an awk interpreter. What language features would be required? Specifically I'm hoping that TH is not necessary because I'm nowhere near that skill level.

I'd love to have portable pure haskell implementations of the traditional unix tools. If it were done well, it would allow you to 'cabal install' yourself into a usable dev environment on windows :) I'd much rather do that than deal with cygwin/mingw.

Someone (was it Stephen Hicks?) was writing (or finished writing?) an sh parser and I got really excited for the same reason. It would be a cool project, but I'm not sure I can justify to myself spending my spare cycles on it.

An outline of a possible approach would be appreciated. I am using http://www.math.utah.edu/docs/info/gawk_toc.html as a guide to the language description.

I think this is a good opportunity for you to learn about monad transformers. To that end, I think you will like this paper (quite easy for beginners to pick up): http://www.grabmueller.de/martin/www/pub/Transformers.en.html

At least, that's how I first learned about them and I though it was easy to read at the time :)

You might also want to read (and try) some of the tutorials that focus on creating interpreters just to sort of get some practice in that area. I haven't read it, but I've heard good things about this one: http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours

You might also focus on the 'core' of awk. Think about, what is the minimal language and start from there. Grow your implementation adding features bit by bit. It's also a good opportunity to do testing. You have a reference implementation and so you can write lots of tests for each feature as you add them.

I hope that helps, Jason

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Josef Svenningsson

9:33 a.m.

On Fri, Aug 20, 2010 at 6:05 AM, Jason Dagit wrote:

...

On Thu, Aug 19, 2010 at 8:05 PM, Michael Litchard wrote:

...
I'd like the community to give me feedback on the difficulty level of implementing an awk interpreter. What language features would be required? Specifically I'm hoping that TH is not necessary because I'm nowhere near that skill level.

Implementing an awk interpreter in Haskell can be a fun project. I have a

...

...
http://www.math.utah.edu/docs/info/gawk_toc.html as a guide to the language description.

You might also focus on the 'core' of awk. Think about, what is the minimal language and start from there. Grow your implementation adding features bit by bit. It's also a good opportunity to do testing. You have a reference implementation and so you can write lots of tests for each feature as you add them.

When I wrote my awk interpreter I decided to go for the whole language from start. I had reasons for doing this as there were certain aspects of this

half finished implementation lying around on the hard drive. It's perfectly possible to implement it without using any super fancy language features. But as other people have pointed out, monads are helpful for dealing with a lot of the plumbing in the interpreter. An outline of a possible approach would be appreciated. I am using that I wanted to capture but it is not they way I would recommend going about it. I definitely second Jason's advice at trying to capture the core functionality first. Have fun, Josef

Michael Litchard

5:14 p.m.

Thank you all for your encouragement. I need to think about the core functionality, and do some reading. On Fri, Aug 20, 2010 at 2:33 AM, Josef Svenningsson wrote:

...

On Fri, Aug 20, 2010 at 6:05 AM, Jason Dagit wrote:

...
On Thu, Aug 19, 2010 at 8:05 PM, Michael Litchard wrote:

...
I'd like the community to give me feedback on the difficulty level of implementing an awk interpreter. What language features would be required? Specifically I'm hoping that TH is not necessary because I'm nowhere near that skill level.

Implementing an awk interpreter in Haskell can be a fun project. I have a half finished implementation lying around on the hard drive. It's perfectly possible to implement it without using any super fancy language features. But as other people have pointed out, monads are helpful for dealing with a lot of the plumbing in the interpreter.

...
...
An outline of a possible approach would be appreciated. I am using http://www.math.utah.edu/docs/info/gawk_toc.html as a guide to the language description.

You might also focus on the 'core' of awk. Think about, what is the minimal language and start from there. Grow your implementation adding features bit by bit. It's also a good opportunity to do testing. You have a reference implementation and so you can write lots of tests for each feature as you add them.

When I wrote my awk interpreter I decided to go for the whole language from start. I had reasons for doing this as there were certain aspects of this that I wanted to capture but it is not they way I would recommend going about it. I definitely second Jason's advice at trying to capture the core functionality first. Have fun, Josef

Don Stewart

5:18 p.m.

There's a lot of examples of languages implemented in Haskell to choose from, too http://haskell.org/haskellwiki/Applications_and_libraries/Compilers_and_inte... michael:

...

Thank you all for your encouragement. I need to think about the core functionality, and do some reading.

On Fri, Aug 20, 2010 at 2:33 AM, Josef Svenningsson wrote:

...
On Fri, Aug 20, 2010 at 6:05 AM, Jason Dagit wrote:

...
On Thu, Aug 19, 2010 at 8:05 PM, Michael Litchard wrote:

...
I'd like the community to give me feedback on the difficulty level of implementing an awk interpreter. What language features would be required? Specifically I'm hoping that TH is not necessary because I'm nowhere near that skill level.

Implementing an awk interpreter in Haskell can be a fun project. I have a half finished implementation lying around on the hard drive. It's perfectly possible to implement it without using any super fancy language features. But as other people have pointed out, monads are helpful for dealing with a lot of the plumbing in the interpreter.

...
...
An outline of a possible approach would be appreciated. I am using http://www.math.utah.edu/docs/info/gawk_toc.html as a guide to the language description.

You might also focus on the 'core' of awk. Think about, what is the minimal language and start from there. Grow your implementation adding features bit by bit. It's also a good opportunity to do testing. You have a reference implementation and so you can write lots of tests for each feature as you add them.

When I wrote my awk interpreter I decided to go for the whole language from start. I had reasons for doing this as there were certain aspects of this that I wanted to capture but it is not they way I would recommend going about it. I definitely second Jason's advice at trying to capture the core functionality first. Have fun, Josef

Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Richard O'Keefe

23 Aug 23 Aug

6:07 a.m.

On Aug 21, 2010, at 5:14 AM, Michael Litchard wrote:

...

Thank you all for your encouragement. I need to think about the core functionality, and do some reading.

But what _is_ "the core functionality". The Single Unix Specification can be browsed on-line. There is no part of it labelled "core"; it's all required or it isn't AWK. There are weird little gotchas like File "foo" = '{ prin' File "bar" = 't 2 }' awk -f foo -f bar is legal and is required to act the same as awk '{ print 2 }' mawk fails this, and I don't blame it, and I don't really _care_. Is that "core"? Who knows? Whatever the "core functionality" might be, YOU will have to define what that "core" is. There's no standard, or even common, sublanguage.

...

Roel van Dijk

7 a.m.

On Mon, Aug 23, 2010 at 8:07 AM, Richard O'Keefe wrote:

...

But what _is_ "the core functionality". The Single Unix Specification can be browsed on-line. There is no part of it labelled "core"; it's all required or it isn't AWK. There are weird little gotchas like File "foo" = '{ prin' File "bar" = 't 2 }' awk -f foo -f bar is legal and is required to act the same as awk '{ print 2 }' mawk fails this, and I don't blame it, and I don't really _care_. Is that "core"? Who knows?

I say that that behaviour is not part of the language but of the runtime.

...

Whatever the "core functionality" might be, YOU will have to define what that "core" is. There's no standard, or even common, sublanguage.

One approach to find the core of a language is to find which parts can be implemented in terms of other parts. If part B can be expressed in terms of part A then B doesn't belong in the core.

Richard O'Keefe

24 Aug 24 Aug

12:43 a.m.

On Aug 23, 2010, at 7:00 PM, Roel van Dijk wrote:

...

On Mon, Aug 23, 2010 at 8:07 AM, Richard O'Keefe wrote:

...
But what _is_ "the core functionality". The Single Unix Specification can be browsed on-line. There is no part of it labelled "core"; it's all required or it isn't AWK.

[If -f progfile is specified, the application shall ensure that the files named by each of the progfile option-arguments are text files and their concatenation, in the same order as they appear in the arguments, is an awk program. ] is what I was referring to.

...

...
Is that "core"? Who knows?

I say that that behaviour is not part of the language but of the runtime.

Actually, it's a *compile*-time thing.

...

...
Whatever the "core functionality" might be, YOU will have to define what that "core" is. There's no standard, or even common, sublanguage.

One approach to find the core of a language is to find which parts can be implemented in terms of other parts. If part B can be expressed in terms of part A then B doesn't belong in the core.

Agreed. But it's not clear that AWK *has* a non-trivial core in that sense. OK, so you can define != in terms of == and >,<=,>= in terms of <, and you can define + and unary - in terms of infix -. And you can define (a,b,c,...) as (a SUBSEP b SUBSEP c SUBSEP ...). But you can't, for example, define print <number> in terms of print (<number> "") because number printing and number to string printing use different format variables (OFMT and CONVFMT respectively), and you can't define the two of them in terms of sprintf() because there is no way for an AWK program to _test_ whether a value is a number or a string or an uninitialized value (which has defined properties) or an uncommitted numeric string. What you would have to do would be to define an *extended* 'core' containing case(E; U, x.I, x.F, x.UI, x.UF, x.S) U - what to do for uninitialized value x.I - what to do for an integral value x.F - what to do for a non-integral number x.UI - what to do for a uncommitted maybe-integer-maybe-string x.UF - what to do for an uncommitted maybe-float-maybe-string x.S - what to do for a string That is, the core you need contains operations that are NOT in the source language. Here's one of my favourite quotations from the Single Unix Specification V3 description of AWK: For example, with historical implementations the following program: { a = "+2" b = 2 if (NR % 2) c = a + b if (a == b) print "numeric comparison" else print "string comparison" } would perform a numeric comparison (and output numeric comparison) for each odd-numbered line, but perform a string comparison (and output string comparison) for each even-numbered line. IEEE Std 1003.1-2001 ensures that comparisons will be numeric if necessary. I just tried four AWK implementations. GNU AWK and Mike's AWK both wrote string comparison string comparison string comparison string comparison as required by the standard. But two others (one provided by a major UNIX vendor, and the other provided by one of the inventors of AWK) did indeed write numeric comparison string comparison numeric comparison string comparison Now let's make an apparently tiny change to the program. Let's replace a = "+2" by a = ENVIRON["FOO"] and do setenv FOO +2 in the shell. Now all four implementations print numeric comparison four times. Getting this right is not just a tiny tweak to the system, it's a fundamental issue that affects the way you represent AWK 'values' in your interpreter. Then there are the undefined things. Consider BEGIN { echo = "echo" n = getline

...

Bill Atkins

22 Aug 22 Aug

3:22 a.m.

I don't think Template Haskell will be essential for this - you will probably need a parser (probably written with Parsec), an eval function, and a state monad to represent imperative changes made by the language you're evaluating. Template Haskell is more for the elimination of boilerplate code or turning specs into compile-time constraints. On Thursday Aug 19, 2010, at 11:05 PM, Michael Litchard wrote:

...

I'd like the community to give me feedback on the difficulty level of implementing an awk interpreter. What language features would be required? Specifically I'm hoping that TH is not necessary because I'm nowhere near that skill level.

An outline of a possible approach would be appreciated. I am using http://www.math.utah.edu/docs/info/gawk_toc.html as a guide to the language description. _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Brandon S Allbery KF8NH

3:37 a.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 08/21/2010 11:22 PM, Bill Atkins wrote:

...

I don't think Template Haskell will be essential for this - you will probably need a parser (probably written with Parsec), an eval function, and a state monad to represent imperative changes made by the language you're evaluating. Template Haskell is more for the elimination of boilerplate code or turning specs into compile-time constraints.

On Thursday Aug 19, 2010, at 11:05 PM, Michael Litchard wrote:

...
I'd like the community to give me feedback on the difficulty level of implementing an awk interpreter. What language features would be required? Specifically I'm hoping that TH is not necessary because I'm nowhere near that skill level.

Something that might not be clear to a beginning Haskell programmer is that laziness subsumes many of the things you would use macros for in another language. In particular, you can trivially create new "control structures" because the code you control with them only executes when needed. This is why Haskell is popular for EDSLs (embedded domain-specific languages). Template Haskell can usually be ignored until you're programming in the type system or other advanced usages. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkxwm14ACgkQIn7hlCsL25X+DwCfdsb+UABmQw1y9489F973EpC1 oKAAn1a2OKqrJpAzpZzUenHaP8gG6zPo =eWBI -----END PGP SIGNATURE-----

5430

Age (days ago)

5434

Last active (days ago)

List overview

Download

11 comments

9 participants

participants (9)

Bill Atkins
Brandon S Allbery KF8NH
Don Stewart
Jason Dagit
John Lask
Josef Svenningsson
Michael Litchard
Richard O'Keefe
Roel van Dijk