
On 24/11/2012, at 5:26 PM, wren ng thornton wrote:
On 11/20/12 6:54 AM, citb@lavabit.com wrote:
Hello,
I know nothing about compilers and interpreters. I checked several books, but none of them explained why we have to translate a high-level language into a small (core) language. Is it impossible (or just very hard) to translate a high-level language directly into machine code?
It is possible to remove stages in the standard compilation pipeline, and doing so can speed up compilation time. For example, Perl doesn't build an abstract syntax tree (for now-outdated performance reasons), and instead compiles the source language directly into bytecode (which is then interpreted by the runtime). This is one of the reasons why Perl is (or was?) so much faster than other interpreted languages like Python etc.
I have found Perl to be anywhere from the same speed as AWK (reading and writing lots of data with hardly any processing) to 10 times slower than AWK (measured against the 'mawk' implementation of AWK).

The deeply saddening thing here is that there are lots of improvements I _know_ how to make to mawk to make it even faster, but the thing that stops me is the way mawk generates word-code directly without an AST.

I don't know why Perl is direct-to-byte-code, but I'm pretty sure why mawk is direct-to-word-code and The One True Awk interprets its AST. AWK used to run on PDP-11s, and for large source files it had room for VM instructions or ASTs but not both at the same time.

Designing a compiler for fast *compiling* leads to one kind of design; designing for fast *running* of generated code leads to another. And run times for scripting languages depend on things like the structure of hash tables, the quality of hashing functions, the cripplingly excessive richness of certain regular expression libraries, the memory management scheme, ...
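
To make the trade-off concrete, here is a toy sketch in Python. It is not how mawk or Perl are actually built; the expression grammar, the stack instructions (PUSH, LOAD, ADD, SUB, MUL) and every function name are invented for illustration. The same parser is driven two ways: once emitting instructions as it parses, and once building an AST that a separate pass can constant-fold before code is generated.

```python
import operator
import re

# Hypothetical instruction names and their meanings for a toy stack machine.
OPS = {"+": ("ADD", operator.add), "-": ("SUB", operator.sub), "*": ("MUL", operator.mul)}
BY_NAME = {name: fn for name, fn in OPS.values()}

def tokenize(src):
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*()]", src)

class Parser:
    """Recursive-descent parser for + - * expressions, driven by semantic actions."""
    def __init__(self, tokens, leaf, binop):
        self.toks, self.i = tokens, 0
        self.leaf, self.binop = leaf, binop

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def next(self):
        tok = self.peek()
        self.i += 1
        return tok

    def expr(self):
        left = self.term()
        while self.peek() in ("+", "-"):
            op = self.next()
            left = self.binop(op, left, self.term())
        return left

    def term(self):
        left = self.atom()
        while self.peek() == "*":
            op = self.next()
            left = self.binop(op, left, self.atom())
        return left

    def atom(self):
        tok = self.next()
        if tok == "(":
            node = self.expr()
            self.next()  # consume ")"
            return node
        return self.leaf(tok)

def compile_direct(src):
    """One pass: instructions are emitted the moment each construct is parsed."""
    code = []
    def leaf(tok):
        code.append(("PUSH", int(tok)) if tok.isdigit() else ("LOAD", tok))
    def binop(op, _left, _right):
        code.append((OPS[op][0],))  # operands were already emitted while parsing
    Parser(tokenize(src), leaf, binop).expr()
    return code

def parse_ast(src):
    """Same parser, but the actions build a tree: int, name, or (op, left, right)."""
    leaf = lambda tok: int(tok) if tok.isdigit() else tok
    return Parser(tokenize(src), leaf, lambda op, l, r: (op, l, r)).expr()

def fold(node):
    """Constant folding: a rewrite that needs the whole subtree in hand."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node[0], fold(node[1]), fold(node[2])
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op][1](left, right)
    return (op, left, right)

def gen(node, code):
    if isinstance(node, int):
        code.append(("PUSH", node))
    elif isinstance(node, str):
        code.append(("LOAD", node))
    else:
        gen(node[1], code)
        gen(node[2], code)
        code.append((OPS[node[0]][0],))
    return code

def compile_via_ast(src):
    return gen(fold(parse_ast(src)), [])

def run(code, env):
    """Interpret the toy stack code with variables looked up in env."""
    stack = []
    for ins in code:
        if ins[0] == "PUSH":
            stack.append(ins[1])
        elif ins[0] == "LOAD":
            stack.append(env[ins[1]])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(BY_NAME[ins[0]](left, right))
    return stack.pop()
```

For `x * (2 + 3 * 4)` the direct route emits seven instructions, while the AST route folds the constant subexpression first and emits three. Getting the same result from the direct emitter would mean patching code that has already been written out, which is the kind of obstacle described above; the price of the AST route is holding both the tree and the instructions in memory at once.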