feasability of implementing an awk interpreter.

I'd like the community to give me feedback on the difficulty level of implementing an awk interpreter. What language features would be required? Specifically I'm hoping that TH is not necessary because I'm nowhere near that skill level. An outline of a possible approach would be appreciated. I am using http://www.math.utah.edu/docs/info/gawk_toc.html as a guide to the language description.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 08/19/2010 11:05 PM, Michael Litchard wrote:
I'd like the community to give me feedback on the difficulty level of implementing an awk interpreter. What language features would be required? Specifically I'm hoping that TH is not necessary because I'm nowhere near that skill level.
At most TH might save you a little boilerplate, but I don't think there would be enough to bother with it anyway. It should be fairly straightforward to implement in terms of state transformers atop IO for the runtime part, and any reasonable parser framework would do for the program-parsing phase. The hardest part is probably working out how you want to represent the state of the awk engine (variables; input line and fields; "compiled" program fragments and the patterns they're attached to; etc.) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkxt/HYACgkQIn7hlCsL25X3XwCfW7iw6RZM2r6nDynrNLZ2sAqQ PBMAnRJM/zcyRaZWumuBNytCNRUWGcvE =xrvu -----END PGP SIGNATURE-----

On Thu, Aug 19, 2010 at 8:05 PM, Michael Litchard
I'd like the community to give me feedback on the difficulty level of implementing an awk interpreter. What language features would be required? Specifically I'm hoping that TH is not necessary because I'm nowhere near that skill level.
I'd love to have portable pure haskell implementations of the traditional unix tools. If it were done well, it would allow you to 'cabal install' yourself into a usable dev environment on windows :) I'd much rather do that than deal with cygwin/mingw. Someone (was it Stephen Hicks?) was writing (or finished writing?) an sh parser and I got really excited for the same reason. It would be a cool project, but I'm not sure I can justify to myself spending my spare cycles on it.
An outline of a possible approach would be appreciated. I am using http://www.math.utah.edu/docs/info/gawk_toc.html as a guide to the language description.
I think this is a good opportunity for you to learn about monad transformers. To that end, I think you will like this paper (quite easy for beginners to pick up): http://www.grabmueller.de/martin/www/pub/Transformers.en.html At least, that's how I first learned about them and I though it was easy to read at the time :) You might also want to read (and try) some of the tutorials that focus on creating interpreters just to sort of get some practice in that area. I haven't read it, but I've heard good things about this one: http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours You might also focus on the 'core' of awk. Think about, what is the minimal language and start from there. Grow your implementation adding features bit by bit. It's also a good opportunity to do testing. You have a reference implementation and so you can write lots of tests for each feature as you add them. I hope that helps, Jason

On 20/08/2010 1:35 PM, Jason Dagit wrote: fairly easy .. you might want to check out the following tutorial ... http://www.crsr.net/Programming_Languages/SoftwareTools/ch5.html he implements a basic grep tool, you might then want to check out one of the regex packages as a basis for your implementation of awk.
On Thu, Aug 19, 2010 at 8:05 PM, Michael Litchard
mailto:michael@schmong.org> wrote: I'd like the community to give me feedback on the difficulty level of implementing an awk interpreter. What language features would be required? Specifically I'm hoping that TH is not necessary because I'm nowhere near that skill level.
I'd love to have portable pure haskell implementations of the traditional unix tools. If it were done well, it would allow you to 'cabal install' yourself into a usable dev environment on windows :) I'd much rather do that than deal with cygwin/mingw.
Someone (was it Stephen Hicks?) was writing (or finished writing?) an sh parser and I got really excited for the same reason. It would be a cool project, but I'm not sure I can justify to myself spending my spare cycles on it.
An outline of a possible approach would be appreciated. I am using http://www.math.utah.edu/docs/info/gawk_toc.html as a guide to the language description.
I think this is a good opportunity for you to learn about monad transformers. To that end, I think you will like this paper (quite easy for beginners to pick up): http://www.grabmueller.de/martin/www/pub/Transformers.en.html
At least, that's how I first learned about them and I though it was easy to read at the time :)
You might also want to read (and try) some of the tutorials that focus on creating interpreters just to sort of get some practice in that area. I haven't read it, but I've heard good things about this one: http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours
You might also focus on the 'core' of awk. Think about, what is the minimal language and start from there. Grow your implementation adding features bit by bit. It's also a good opportunity to do testing. You have a reference implementation and so you can write lots of tests for each feature as you add them.
I hope that helps, Jason
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Fri, Aug 20, 2010 at 6:05 AM, Jason Dagit
On Thu, Aug 19, 2010 at 8:05 PM, Michael Litchard
wrote: I'd like the community to give me feedback on the difficulty level of implementing an awk interpreter. What language features would be required? Specifically I'm hoping that TH is not necessary because I'm nowhere near that skill level.
Implementing an awk interpreter in Haskell can be a fun project. I have a
http://www.math.utah.edu/docs/info/gawk_toc.html as a guide to the language description.
You might also focus on the 'core' of awk. Think about, what is the minimal language and start from there. Grow your implementation adding features bit by bit. It's also a good opportunity to do testing. You have a reference implementation and so you can write lots of tests for each feature as you add them.
When I wrote my awk interpreter I decided to go for the whole language from start. I had reasons for doing this as there were certain aspects of this
half finished implementation lying around on the hard drive. It's perfectly possible to implement it without using any super fancy language features. But as other people have pointed out, monads are helpful for dealing with a lot of the plumbing in the interpreter. An outline of a possible approach would be appreciated. I am using that I wanted to capture but it is not they way I would recommend going about it. I definitely second Jason's advice at trying to capture the core functionality first. Have fun, Josef

Thank you all for your encouragement. I need to think about the core
functionality, and do some reading.
On Fri, Aug 20, 2010 at 2:33 AM, Josef Svenningsson
On Fri, Aug 20, 2010 at 6:05 AM, Jason Dagit
wrote: On Thu, Aug 19, 2010 at 8:05 PM, Michael Litchard
wrote: I'd like the community to give me feedback on the difficulty level of implementing an awk interpreter. What language features would be required? Specifically I'm hoping that TH is not necessary because I'm nowhere near that skill level.
Implementing an awk interpreter in Haskell can be a fun project. I have a half finished implementation lying around on the hard drive. It's perfectly possible to implement it without using any super fancy language features. But as other people have pointed out, monads are helpful for dealing with a lot of the plumbing in the interpreter.
An outline of a possible approach would be appreciated. I am using http://www.math.utah.edu/docs/info/gawk_toc.html as a guide to the language description.
You might also focus on the 'core' of awk. Think about, what is the minimal language and start from there. Grow your implementation adding features bit by bit. It's also a good opportunity to do testing. You have a reference implementation and so you can write lots of tests for each feature as you add them.
When I wrote my awk interpreter I decided to go for the whole language from start. I had reasons for doing this as there were certain aspects of this that I wanted to capture but it is not they way I would recommend going about it. I definitely second Jason's advice at trying to capture the core functionality first. Have fun, Josef

There's a lot of examples of languages implemented in Haskell to choose from, too http://haskell.org/haskellwiki/Applications_and_libraries/Compilers_and_inte... michael:
Thank you all for your encouragement. I need to think about the core functionality, and do some reading.
On Fri, Aug 20, 2010 at 2:33 AM, Josef Svenningsson
wrote: On Fri, Aug 20, 2010 at 6:05 AM, Jason Dagit
wrote: On Thu, Aug 19, 2010 at 8:05 PM, Michael Litchard
wrote: I'd like the community to give me feedback on the difficulty level of implementing an awk interpreter. What language features would be required? Specifically I'm hoping that TH is not necessary because I'm nowhere near that skill level.
Implementing an awk interpreter in Haskell can be a fun project. I have a half finished implementation lying around on the hard drive. It's perfectly possible to implement it without using any super fancy language features. But as other people have pointed out, monads are helpful for dealing with a lot of the plumbing in the interpreter.
An outline of a possible approach would be appreciated. I am using http://www.math.utah.edu/docs/info/gawk_toc.html as a guide to the language description.
You might also focus on the 'core' of awk. Think about, what is the minimal language and start from there. Grow your implementation adding features bit by bit. It's also a good opportunity to do testing. You have a reference implementation and so you can write lots of tests for each feature as you add them.
When I wrote my awk interpreter I decided to go for the whole language from start. I had reasons for doing this as there were certain aspects of this that I wanted to capture but it is not they way I would recommend going about it. I definitely second Jason's advice at trying to capture the core functionality first. Have fun, Josef
Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Aug 21, 2010, at 5:14 AM, Michael Litchard wrote:
Thank you all for your encouragement. I need to think about the core functionality, and do some reading.
But what _is_ "the core functionality". The Single Unix Specification can be browsed on-line. There is no part of it labelled "core"; it's all required or it isn't AWK. There are weird little gotchas like File "foo" = '{ prin' File "bar" = 't 2 }' awk -f foo -f bar is legal and is required to act the same as awk '{ print 2 }' mawk fails this, and I don't blame it, and I don't really _care_. Is that "core"? Who knows? Whatever the "core functionality" might be, YOU will have to define what that "core" is. There's no standard, or even common, sublanguage.

On Mon, Aug 23, 2010 at 8:07 AM, Richard O'Keefe
But what _is_ "the core functionality". The Single Unix Specification can be browsed on-line. There is no part of it labelled "core"; it's all required or it isn't AWK. There are weird little gotchas like File "foo" = '{ prin' File "bar" = 't 2 }' awk -f foo -f bar is legal and is required to act the same as awk '{ print 2 }' mawk fails this, and I don't blame it, and I don't really _care_. Is that "core"? Who knows?
I say that that behaviour is not part of the language but of the runtime.
Whatever the "core functionality" might be, YOU will have to define what that "core" is. There's no standard, or even common, sublanguage.
One approach to find the core of a language is to find which parts can be implemented in terms of other parts. If part B can be expressed in terms of part A then B doesn't belong in the core.

On Aug 23, 2010, at 7:00 PM, Roel van Dijk wrote:
On Mon, Aug 23, 2010 at 8:07 AM, Richard O'Keefe
wrote: But what _is_ "the core functionality". The Single Unix Specification can be browsed on-line. There is no part of it labelled "core"; it's all required or it isn't AWK.
[If -f progfile is specified, the application shall ensure that the files named by each of the progfile option-arguments are text files and their concatenation, in the same order as they appear in the arguments, is an awk program. ] is what I was referring to.
Is that "core"? Who knows?
I say that that behaviour is not part of the language but of the runtime.
Actually, it's a *compile*-time thing.
Whatever the "core functionality" might be, YOU will have to define what that "core" is. There's no standard, or even common, sublanguage.
One approach to find the core of a language is to find which parts can be implemented in terms of other parts. If part B can be expressed in terms of part A then B doesn't belong in the core.
Agreed. But it's not clear that AWK *has* a non-trivial core in that
sense. OK, so you can define != in terms of == and >,<=,>= in terms
of <, and you can define + and unary - in terms of infix -. And you
can define (a,b,c,...) as (a SUBSEP b SUBSEP c SUBSEP ...). But you
can't, for example, define
print <number>
in terms of
print (<number> "")
because number printing and number to string printing use different
format variables (OFMT and CONVFMT respectively), and you can't
define the two of them in terms of sprintf() because there is no
way for an AWK program to _test_ whether a value is a number or a
string or an uninitialized value (which has defined properties) or
an uncommitted numeric string.
What you would have to do would be to define an *extended* 'core'
containing
case(E; U, x.I, x.F, x.UI, x.UF, x.S)
U - what to do for uninitialized value
x.I - what to do for an integral value
x.F - what to do for a non-integral number
x.UI - what to do for a uncommitted maybe-integer-maybe-string
x.UF - what to do for an uncommitted maybe-float-maybe-string
x.S - what to do for a string
That is, the core you need contains operations that are NOT in the
source language.
Here's one of my favourite quotations from the Single Unix Specification
V3 description of AWK:
For example, with historical implementations the following program:
{
a = "+2"
b = 2
if (NR % 2)
c = a + b
if (a == b)
print "numeric comparison"
else
print "string comparison"
}
would perform a numeric comparison (and output numeric comparison)
for each odd-numbered line, but perform a string comparison (and
output string comparison) for each even-numbered line. IEEE Std 1003.1-2001 ensures that comparisons will be numeric if necessary.
I just tried four AWK implementations.
GNU AWK and Mike's AWK both wrote
string comparison
string comparison
string comparison
string comparison
as required by the standard. But two others (one provided by a major
UNIX vendor, and the other provided by one of the inventors of AWK)
did indeed write
numeric comparison
string comparison
numeric comparison
string comparison
Now let's make an apparently tiny change to the program.
Let's replace
a = "+2"
by
a = ENVIRON["FOO"]
and do
setenv FOO +2
in the shell. Now all four implementations print
numeric comparison
four times.
Getting this right is not just a tiny tweak to the system,
it's a fundamental issue that affects the way you represent
AWK 'values' in your interpreter.
Then there are the undefined things.
Consider
BEGIN {
echo = "echo"
n = getline

I don't think Template Haskell will be essential for this - you will probably need a parser (probably written with Parsec), an eval function, and a state monad to represent imperative changes made by the language you're evaluating. Template Haskell is more for the elimination of boilerplate code or turning specs into compile-time constraints. On Thursday Aug 19, 2010, at 11:05 PM, Michael Litchard wrote:
I'd like the community to give me feedback on the difficulty level of implementing an awk interpreter. What language features would be required? Specifically I'm hoping that TH is not necessary because I'm nowhere near that skill level.
An outline of a possible approach would be appreciated. I am using http://www.math.utah.edu/docs/info/gawk_toc.html as a guide to the language description. _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 08/21/2010 11:22 PM, Bill Atkins wrote:
I don't think Template Haskell will be essential for this - you will probably need a parser (probably written with Parsec), an eval function, and a state monad to represent imperative changes made by the language you're evaluating. Template Haskell is more for the elimination of boilerplate code or turning specs into compile-time constraints.
On Thursday Aug 19, 2010, at 11:05 PM, Michael Litchard wrote:
I'd like the community to give me feedback on the difficulty level of implementing an awk interpreter. What language features would be required? Specifically I'm hoping that TH is not necessary because I'm nowhere near that skill level.
Something that might not be clear to a beginning Haskell programmer is that laziness subsumes many of the things you would use macros for in another language. In particular, you can trivially create new "control structures" because the code you control with them only executes when needed. This is why Haskell is popular for EDSLs (embedded domain-specific languages). Template Haskell can usually be ignored until you're programming in the type system or other advanced usages. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkxwm14ACgkQIn7hlCsL25X+DwCfdsb+UABmQw1y9489F973EpC1 oKAAn1a2OKqrJpAzpZzUenHaP8gG6zPo =eWBI -----END PGP SIGNATURE-----
participants (9)
-
Bill Atkins
-
Brandon S Allbery KF8NH
-
Don Stewart
-
Jason Dagit
-
John Lask
-
Josef Svenningsson
-
Michael Litchard
-
Richard O'Keefe
-
Roel van Dijk