
I know nothing about theoretical computer science, but I was wondering: is it possible to forget about types, keep only the concept of data constructors, and have an analyzer determine the correctness of the code and the "staticness" of the data? Basically this is what Scheme does, no? Doesn't Scheme have static whole-program analyzers that remove the overhead of the "symbol tags" and check the correctness of a program (Stalin, Petit-Scheme, ...)? What are the pros and cons? Thank you, Peter

bf3:
I know nothing about theoretical computer science, but I was wondering: is it possible to forget about types, keep only the concept of data constructors, and have an analyzer determine the correctness of the code and the "staticness" of the data?
The analysis would be type inference and checking. -- Don
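(As a small illustrative sketch of that inference, not part of the original exchange: given a definition with no signature at all, GHC reconstructs the most general type from the way the data is constructed and matched.)

module InferSketch where

-- No type signature is given; GHC's inference assigns swap the most
-- general type (a, b) -> (b, a), and GHCi's :t command will report it.
swap (x, y) = (y, x)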

Peter Verswyvelen
I know nothing about theoretical computer science, but I was wondering: is it possible to forget about types, keep only the concept of data constructors, and have an analyzer determine the correctness of the code and the "staticness" of the data?
Basically this is what Scheme does, no? Doesn't Scheme have static whole-program analyzers that remove the overhead of the "symbol tags" and check the correctness of a program (Stalin, Petit-Scheme, ...)?
What are the pros and cons?
Basically, it's a matter of taste, and of how much of the checking can be done at compile time... which gets quite involved and O(big) if all you have is (tagged) lists with type information. And, yes, Stalin manages to specialize a -> a functions to Int -> Int to make numerical code as fast or faster than C, but so does GHC. Plus, a GHC build may allow you to get a coffee; Stalin allows you to go shopping, watch a movie and then go on vacation. That is because, in general, you can't forget about the type of your data: you need it in one way or another to do anything with it.

-- (c) this sig last receiving data processing entity. Inspect headers for past copyright information. All rights reserved. Unauthorised copying, hiring, renting, public performance and/or broadcasting of this signature prohibited.
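As a rough illustration of the kind of specialization meant here (a sketch only; the module and function below are made up, not from the thread), GHC will produce a monomorphic Int copy of a polymorphic numeric function when asked via a SPECIALIZE pragma:

module Specialise where

-- A polymorphic numeric function, type-checked once at the general
-- type Num a => [a] -> a.
sumSquares :: Num a => [a] -> a
sumSquares = foldl (\acc x -> acc + x * x) 0

-- Ask GHC for a monomorphic copy at Int; calls at that type then pay
-- no type-class dictionary overhead.
{-# SPECIALIZE sumSquares :: [Int] -> Int #-}

main :: IO ()
main = print (sumSquares [1 .. 100 :: Int])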

Achim Schneider
And, yes, Stalin manages to specialize a -> a functions to Int -> Int to make numerical code as fast or faster than C, but so does GHC.
That is, seen formally, quite fuzzy. I'm going to be beaten for it.

Thank you for explaining. I was wondering if the same syntax could be used somehow (not in Haskell, but in some theoretical language): I mean, use an annotation to tell the compiler that a "type tag" should be determined at compile time rather than at runtime, and report an error otherwise. So e.g.

// Runtime tag, aka data constructor
foo (Int n) = ...

// Compile-time tag, aka type
foo (>Int< n) = ...

Might not make any sense...

You're talking about O(big)... But wasn't the C language in some way successful because, on the hardware of that time, other much nicer languages (e.g. LISP) were just way too slow? Or was this just O(n) times slower?

IMHO: shouldn't concepts that are conceptually the same at runtime and at compile time (in this case, "giving meaning to / adding constraints on bits of data") look very similar in the language? Most languages require completely different syntax and code when you want something to be lazy versus strict. Haskell doesn't: you can just add an annotation if you want it to be strict, and not much rewriting is required. However, if I want to change a runtime data constructor definition (and code) into a compile-time type, I basically have to rewrite all of my code. That is not the case in Scheme, as far as I understand it.
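To make the runtime-tag versus compile-time-tag distinction concrete, here is a loose Haskell rendering of the two notations (a sketch only; the >Int< syntax is hypothetical and the names below are made up):

module TagSketch where

data Value = IntV Int | FloatV Float

-- "Runtime tag": the constructor is a piece of data that the pattern
-- match inspects while the program runs.
fooRuntime :: Value -> Int
fooRuntime (IntV n)   = n + 1
fooRuntime (FloatV _) = 0

-- "Compile-time tag": the Int in the signature is checked entirely at
-- compile time and leaves nothing to inspect at runtime.
fooStatic :: Int -> Int
fooStatic n = n + 1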

Peter Verswyvelen
Thank you for explaining.
I was wondering if the same syntax could be used somehow (not in Haskell, but in some theoretical language): I mean, use an annotation to tell the compiler that a "type tag" should be determined at compile time rather than at runtime, and report an error otherwise.
So e.g.
// Runtime tag, aka data constructor
foo (Int n) = ...
// Compile-time tag, aka type
foo (>Int< n) = ...
Might not make any sense...
ghc -ddump-simpl and check that your values get unboxed...
You're talking about O(big)... But wasn't the C language in some way successful because, on the hardware of that time, other much nicer languages (e.g. LISP) were just way too slow? Or was this just O(n) times slower?
Compiler technology also wasn't as advanced as it is now; Stalin can't compile even small programs in under, say, 5 minutes. Compare this to e.g. Turbo Pascal, which afair used three passes (parsing, error reporting, code generation); it was similar with C compilers back then. Lisp was fast on Lisp machines, where it plays the same role that C plays on von Neumann architectures: an assembler. I'm not at all sure about the specific O's involved, but I guess it's quite easy to get to NP-complete if you want to do very much without much information.
IMHO: shouldn't concepts that are conceptually the same at runtime and at compile time (in this case, "giving meaning to / adding constraints on bits of data") look very similar in the language? Most languages require completely different syntax and code when you want something to be lazy versus strict. Haskell doesn't: you can just add an annotation if you want it to be strict, and not much rewriting is required. However, if I want to change a runtime data constructor definition (and code) into a compile-time type, I basically have to rewrite all of my code. That is not the case in Scheme, as far as I understand it.
Scheme knows no types but the builtins like INT or PAIR or LIST or SYMBOL and so on. Even if you distinguish, say,

('complex 1 2) from ('vec3 1 2 3)

the compiler in general won't stop you from passing these things into the wrong functions. It doesn't even know that a function is passed a "LIST (QUOTEDSYMBOL INT INT)" or a "LIST (QUOTEDSYMBOL INT INT INT)"; it just sees a pair, in both cases.

Lisp is actually not really meant to be compiled, but interpreted. The nice thing is that it doesn't need more than a handful of primitives, a list parser, a heap manager/garbage collector and an evaluator, all of which can be implemented in under 1000 lines of C. Things get more involved with get/cc, but then how many C programmers have ever heard of setjmp...

Off the top of my head, one set of possible primitives is

quote lambda set! succ pred cond

You can then start by defining define by

(set! 'define (lambda (sym f) (set! sym f)))

There's the wizard book and Graham's ANSI Common Lisp if you're interested in how cheap Lisp actually is.
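To spell out the contrast in Haskell terms (a minimal sketch; the types and names below are illustrative, not from the thread): the 'complex and 'vec3 tags become constructors of two distinct types, so using one where the other is expected is rejected at compile time rather than discovered, if ever, at runtime.

module TagContrast where

data Complex = Complex Double Double
data Vec3    = Vec3 Double Double Double

magnitude :: Complex -> Double
magnitude (Complex re im) = sqrt (re * re + im * im)

-- magnitude (Vec3 1 2 3)
-- Uncommenting the call above gives a compile-time type error; the
-- Scheme analogue, passing ('vec3 1 2 3) where ('complex re im) is
-- expected, would at best fail when the program runs.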

Achim Schneider writes:
Lisp is actually not really meant to be compiled, but interpreted. The nice thing is that it doesn't need more than a handful of primitives, a list parser and heap manager/garbage collector and evaluator, which all can be implemented in under 1000 lines of C. Things get more involved with get/cc, but then how many C programmers ever heard of setjmp...
Would you mind stopping to spread dubious truths? Certainly, Lisp processors started with simple eval/apply interpreters, since they were easy to construct, but compilers, their name is Legion! Look at the CMU Common Lisp compiler, the GNU CLISP compiler, the LispWorks compiler, the Allegro compiler, ... There are also Lisp->C translators; the result is of course compiled. CLiCC, for instance, is a German (Kiel) product, perhaps not so far from you. Where did you read that Lisp is not meant to be compiled, for goodness' sake!? Jerzy Karczmarczuk

On Jan 16, 2008, at 18:58, jerzy.karczmarczuk@info.unicaen.fr wrote:
Achim Schneider writes:
Lisp is actually not really meant to be compiled, but interpreted. [...]
Would you mind stopping to spread dubious truths? Certainly, Lisp processors started with simple eval/apply interpreters, since they were easy to construct, but compilers, their name is Legion!
He is correct given that he almost certainly means "was not originally meant to be compiled" --- and please, spare us the obvious pedantry. Also, you might want to take a close look at your public persona as exposed on this list.
-- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH

On 2008.01.17 00:58:19 +0100, jerzy.karczmarczuk@info.unicaen.fr scribbled 0.9K characters:
Achim Schneider writes:
Lisp is actually not really meant to be compiled, but interpreted. The nice thing is that it doesn't need more than a handful of primitives, a list parser and heap manager/garbage collector and evaluator, which all can be implemented in under 1000 lines of C. Things get more involved with get/cc, but then how many C programmers ever heard of setjmp...
Would you mind stopping to spread dubious truths? Certainly, Lisp processors started with simple eval/apply interpreters, since they were easy to construct, but compilers, their name is Legion! Look at the CMU Common Lisp compiler, the GNU CLISP compiler, the LispWorks compiler, the Allegro compiler, ... [...] Jerzy Karczmarczuk
I don't think it's a dubious truth. Apparently a lot of Lisps (like Maclisp or Interlisp, I hear) had a situation where the semantics of a program could differ depending on whether it was compiled or interpreted, and Scheme and Common Lisp made a point of trying to avoid that. In _Introduction to Common Lisp_, we read:

"Most Lisp implementations are internally inconsistent in that by default the interpreter and compiler may assign different semantics to correct programs. This semantic difference stems primarily from the fact that the interpreter assumes all variables to be dynamically scoped, whereas the compiler assumes all variables to be local unless explicitly directed otherwise. This difference has been the usual practice in Lisp for the sake of convenience and efficiency but can lead to very subtle bugs. The definition of Common Lisp avoids such anomalies by explicitly requiring the interpreter and compiler to impose identical semantics on correct programs so far as possible."

http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node6.html#SECTION005100000000...

Given that it was designed as interpreted, that compilation was motivated by efficiency concerns, and that interpreted techniques differed from compiled techniques (in a way that would allow you to redefine and change more stuff on the fly), I think it's a reasonable case to say that many Lisps - like all the ones before Scheme and CL - were meant to be interpreted and not so much compiled.

-- gwern
NATIA DIA Burns espionage 97 utopia orthodox Meade cond SOCIMI

gwern0@gmail.com wrote:
On 2008.01.17 00:58:19 +0100, jerzy.karczmarczuk@info.unicaen.fr scribbled 0.9K characters:
Achim Schneider writes:
Lisp is actually not really meant to be compiled, but interpreted. ... Would you mind stopping to spread dubious truths? ... I don't think it's a dubious truth.
It's about as accurate as saying "Television is actually not really meant to be color, but black and white".

Anton van Straaten
gwern0@gmail.com wrote:
On 2008.01.17 00:58:19 +0100, jerzy.karczmarczuk@info.unicaen.fr scribbled 0.9K characters:
Achim Schneider writes:
Lisp is actually not really meant to be compiled, but interpreted. ... Would you mind stopping to spread dubious truths? ... I don't think it's a dubious truth.
It's about as accurate as saying "Television is actually not really meant to be color, but black and white".
Yes, that about fits... the chroma data still has only half the resolution of the luminosity. In fact, it wasn't even meant to be a programming language, just a calculus. But still, I should have written "was meant", not "is meant".

Achim Schneider continues to comment the Lisp history:
In fact, it wasn't even meant to be a programming language, just a calculus.
There is a comprehensive German article (in English), by Herbert Stoyan, on this historical issue:
http://www8.informatik.uni-erlangen.de/html/lisp/histlit1.html

Stoyan reminds us of a - currently - not so obvious truth: something like the functional paradigm was not so popular at that time, the second half of the fifties and the beginning of the sixties. People reasoned rather in terms of algorithms, and McCarthy was no exception. Let's cite Stoyan:

"To come back to functional programming, it is an important fact that McCarthy as mathematician was familiar with some formal mathematical languages but did not have a deep, intimate understanding of all their details. McCarthy himself has stressed this fact (23). His aim was to use the mathematical formalismus as languages and not as calculi. This is the root of the historical fact that he never took the Lambda-Calculus conversion rules as a sound basis for LISP implementation."

So, I believe it is not so brilliant an idea to confound the Church calculus with Lisp! We also have the text of the Master himself, available on-line:
http://www-formal.stanford.edu/jmc/history/lisp/lisp.html

The chapter on the "prehistory":
http://www-formal.stanford.edu/jmc/history/lisp/node2.html#SECTION00020000000000000000

begins: "My desire for an algebraic list processing language for artificial intelligence work on the IBM 704 computer arose in the summer of 1956 during the Dartmouth Summer Research Project on Artificial Intelligence which was the first organized study of AI. During this meeting, Newell, Shaw and Simon described IPL 2, a list processing language for Rand Corporation's JOHNNIAC..."

So, sorry, but McCarthy from the very beginning thought about making a usable computer language, not a "calculus". When discussing the evolution of FLPL, the *third* point mentions Church for the first time:

"c. To use functions as arguments, one needs a notation for functions, and it seemed natural to use the lambda-notation of Church (1941). I didn't understand the rest of his book, so I wasn't tempted to try to implement his more general mechanism for defining functions. Church used higher order functionals instead of using conditional expressions. ..."

See also the article by Paul Graham:
http://lib.store.yahoo.net/lib/paulgraham/jmc.ps

=======

Before somebody once more accuses me of pedantry, or says publicly here that I am aggressive towards people: you know that spreading half-truths, and also plain rubbish, on the Internet is extremely easy. Wherever you look, you find plenty of occasions to err; it suffices to put yourself in the mode of a dead person from the movie "The Sixth Sense" by M. Night Shyamalan, with Bruce Willis and Haley Joel Osment. The boy says to the other main character (unaware of his own condition) that "dead people see only what they WANT to see"... And somehow false information spreads more easily than true information.

THIS list, independently of the freedom to chat about anything, is still targeted at serious comp. sci. problems, and I think that the fact that somebody is young and inexperienced is not a justification for making false claims, just because this or that person has had XXX years less than myself in which to read some easily available texts. Of course, anybody may say "dubious truths"; I am ashamed of myself, but from time to time I explode. Sorry about this. Jerzy Karczmarczuk

jerzy.karczmarczuk@info.unicaen.fr wrote:
"[McCarthy's] aim was to use the mathematical formalismus as languages and not as calculi. This is the root of the historical fact that he never took the Lambda-Calculus conversion rules as a sound basis for LISP implementation." So, I believe it is not so brilliant an idea to confound the Church calculus with Lisp!
It's difficult to extricate the two, as Lisp & McCarthy's experience shows. The decision to use lambda notation without the accompanying semantics would have been fine if Lisp had not also had first-class functions. But with first-class functions, and without lexical scoping semantics, Lisp suffered from scoping bugs which were only resolved once Lisp's 'lambda' was changed to follow Church's semantics, as Sussman and Steele originally did for Scheme. When CL adopted lexical scoping, it was seen as a choice, but it wasn't really much of a choice. The choice was between continuing with a fundamentally buggy language and working around those bugs somehow, or fixing it by adopting Church's lexical scoping rules.
Wherever you look, you find plenty of occasions to err; it suffices to put yourself in the mode of a dead person from the movie "The Sixth Sense" by M. Night Shyamalan, with Bruce Willis and Haley Joel Osment.
I see dead languages... Anton

jerzy.karczmarczuk@info.unicaen.fr wrote:
[...]
I heard it somewhere trustworthy. Instinctively, I would guess somewhere in the direction of Graham, but I'm not sure at all. On the other hand, you can be absolutely sure that I didn't get it off the next warez board, nor from Bill Gates. The story, afaicr, went along the lines of "professor writes something for a cool paper about some calculus, student gets hold of it, implements it, and completely baffles the professor, who never dared to think about letting that one run on a computer". It might be one of those apples that didn't fall off a tree while Newton had his idea about gravity, but then it's a complete, nice story with a nice moral of its own. After all, Lisp's history isn't as important to me as its semantics, and everything I read about it was kind of accidental; it just wasn't uninteresting enough to skip.

It's about as accurate as saying "Television is actually not really meant to be color, but black and white".
Funny, but that is actually correct, since both NTSC and PAL did a lot of tricks to carry color information over the same infrastructure as black-and-white TVs :) Of course, that will soon be over when we all go digital; then we have jaggies and blockies because of over-compression :) Okay, ooooooooff topic. Cheers, Peter

On 17 Jan 2008, at 12:31 pm, Achim Schneider wrote:
Lisp is actually not really meant to be compiled, but interpreted.
The "classic" Lisp is Lisp 1.5. The Lisp 1.5 Programmer's Manual, published in I think 1961, contains "Appendix D: The Lisp Compiler". If I'm reading appendix G correctly, the compiler was under 4000 words of storage.
The nice thing is that it doesn't need more than a handful of primitives, a list parser and heap manager/garbage collector and evaluator, which all can be implemented in under 1000 lines of C. Things get more involved with get/cc, but then how many C programmers ever heard of setjmp...
I have no idea what get/cc might be, unless it is a mistake for call/cc, but that's Scheme, not Lisp. "Classic" Lisp stack management wasn't really any harder than Pascal stack management (in part because classic Lisps were still struggling to get closures right).
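For anyone who, like the C programmers mentioned above, has not met call/cc: a minimal, purely illustrative sketch of the idea in Haskell, using the Cont monad from the mtl package (the function name and example are invented here, not from the thread):

module CallCCSketch where

import Control.Monad (when)
import Control.Monad.Cont (callCC, runCont)

-- callCC captures "the rest of the computation" as a function (here
-- 'exit'); calling it jumps out early, a bit like a structured
-- setjmp/longjmp.
doubleUnlessNegative :: Int -> Int
doubleUnlessNegative x = runCont go id
  where
    go = callCC $ \exit -> do
      when (x < 0) (exit 0)   -- early exit with 0
      return (x * 2)

-- usage: doubleUnlessNegative 21 == 42, doubleUnlessNegative (-3) == 0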

"Richard A. O'Keefe"
I have no idea what get/cc might be, unless it is a mistake for call/cc, but that's Scheme, not Lisp.
Erm... yes. I guess it's the part of call/cc that gets the continuation before calling it. Actually, I shouldn't be talking about stuff that was used before I was even planned...

ghc -ddump-simpl and check that your values get unboxed...
I was not really talking about boxed/unboxed values; that's another issue, I think. What I was talking about is more related to the work of Neil Mitchell and Colin Runciman on their static checker for pattern matching:
http://www-users.cs.york.ac.uk/~ndm/downloads/paper-unfailing_haskell_a_stat...

For example, suppose we had a language that only knows "bits" as a datatype, together with data constructors (tagging the bits):

data Number = Int Bits#32 | Float Bits#32

(Int x) + (Int y) = Int (primAddInt32 x y)
(Float x) + (Float y) = Float (primAddFloat32 x y)

etc.

(1) Would a sufficiently smart compiler be able to eliminate the tagging overhead at runtime? Answer: yes?
(2) Would such a language be just as statically safe as Haskell? Answer: I guess so?
(3) What would the complexity of the compiler be, given a code size of n? Answer: O(???)
(4) Would it be possible to create an incremental compiler that would reduce the complexity from, say, quadratic to linear or constant? On very old computers I worked with great incremental C assemblers and linkers, where the time needed to recompile/relink was mostly proportional to the amount of changes one made. I guess current compilers are so complex that making them incremental would be insane?

Thank you, Peter
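As a rough sketch of how the tagging question looks in present-day GHC (illustrative names only, not from the paper or the thread): strictness plus UNPACK lets the compiler store the payload unboxed inside each constructor, but the constructor tag itself still exists and is inspected by the pattern match at runtime; erasing the tag completely is exactly what question (1) asks about.

module TaggedNumber where

-- Strict, unpacked fields: the Int/Float payloads are stored unboxed
-- inside the constructor, so only the tag remains as "overhead".
data Number = I {-# UNPACK #-} !Int
            | F {-# UNPACK #-} !Float
  deriving Show

add :: Number -> Number -> Number
add (I x) (I y) = I (x + y)
add (F x) (F y) = F (x + y)
-- the case below is what a static pattern-match checker would try to
-- rule out at compile time
add _     _     = error "add: mismatched tags"

-- usage: add (I 1) (I 2)  ==>  I 3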

On 17 Jan 2008, at 10:56 am, Peter Verswyvelen wrote:
You're talking about O(big)... But wasn't the C language in some way successful because, on the hardware of that time, other much nicer languages (e.g. LISP) were just way too slow? Or was this just O(n) times slower?
No. C was designed as a Systems Implementation Language (there were lots of SILs) with the following advantages:

(1) Tolerable code from a fairly naive compiler. (The PDP-11 UNIX V7 C compiler did a little optimisation, but not very much; function entry/exit was done using calls to library support routines in order to save space, so every function call involved *three* hardware-level function calls. I speeded up a text editor by about 20% by replacing just two tiny functions with hand-written assembler, and it was this function call overhead that was saved.)

(2) Tolerably compact code, and a fairly useful library of stuff that you didn't actually have to use. (Again, said text editor got a notable space saving by *not* using any C stdio stuff at all, not using any floating point stuff at all, and not even linking, let alone calling, malloc().)

(3) A fairly direct mapping between language and machine, so the "performance model" you had to keep in your head was simple.

(4) A fair degree of portability between compilers (although the PC world spoiled this to some extent).

Lisp *performance* compared with C was always O(1) and sometimes excellent; I have had Scheme code running faster than C. It was the memory footprint caused by Lisp's comparatively large library (possibly even including the compiler) always being there in full, and the (largely imaginary) cost of garbage collection, which scared people off.

It is intensely annoying to an old Lisp hacker to see Java succeeding despite being worse at just about everything Lisp was ever criticised for. But in fairness, the other thing was that before the advent of Common Lisp, every Lisp was different. Develop in MacLisp and you could forget about delivering in Interlisp, and vice versa. This is why, although I actually have Lazy ML on my machine still, I dropped it for Haskell.
participants (8)
- Achim Schneider
- Anton van Straaten
- Brandon S. Allbery KF8NH
- Don Stewart
- gwern0@gmail.com
- jerzy.karczmarczuk@info.unicaen.fr
- Peter Verswyvelen
- Richard A. O'Keefe