Re: [Haskell-cafe] Well typed OS

At a high level, you seem to be proposing encoding and enforcing safety properties of executables using a type system. One instantiation of this idea is proof-carrying-code (Necula and Lee, 1996, see https://en.wikipedia.org/wiki/Proof-carrying_code), which ensures safety properties of machine code. The spin-off Cedilla Systems built a mobile OS (intended for phones) around this, with the idea that third-party apps would run unprotected with safety guaranteed solely by sophisticated use of types. The Fox Project, http://www.cs.cmu.edu/~fox/, was philosophically similar, and their web page contains a bunch of interesting pointers to relevant work. For whatever reason, this stuff never got much penetration. But in today's Internet of Things, with malware being injected into mobile phones and cars and light bulbs and toothbrushes and smart dust, it may deserve another look.

You could consider the compiler for the IO type to be part of the TCB and execute source code with caching, like Nix. Every process could be wrapped in an IO sandbox that enforces the types of all IO operations. That just requires being able to implement a -> Maybe b, i.e. having a fixed binary representation for the data going into and out of the inner program.

Something like rudimentary programmable scripts that run on kernel system call invocations already exists in Linux, so it could be possible to compile down to that.
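
A rough sketch of that a -> Maybe b boundary, assuming Data.Binary instances play the role of the fixed binary representation (runSandboxed is a made-up name, not an existing API):

import           Data.Binary          (Binary, decodeOrFail, encode)
import qualified Data.ByteString.Lazy as BL

-- Hypothetical wrapper the trusted runtime puts around an inner program:
-- decode the process's input, run the typed program, re-encode the output.
-- Input that does not parse as an 'a' is rejected, which is where the
-- "a -> Maybe b" boundary lives.
runSandboxed :: (Binary a, Binary b)
             => (a -> IO b) -> BL.ByteString -> IO (Maybe BL.ByteString)
runSandboxed prog input =
  case decodeOrFail input of
    Left _          -> pure Nothing               -- ill-formed input: refuse to run
    Right (_, _, x) -> Just . encode <$> prog x
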
Alexander
2018, 17:04 Barak A. Pearlmutter wrote:
At a high level, you seem to be proposing encoding and enforcing safety properties of executables using a type system. One instantiation of this idea is proof-carrying-code (Necula and Lee, 1996, see https://en.wikipedia.org/wiki/Proof-carrying_code), which ensures safety properties of machine code. The spin-off Cedilla Systems built a mobile OS (intended for phones) around this, with the idea that third-party apps would run unprotected with safety guaranteed solely by sophisticated use of types. The Fox Project, http://www.cs.cmu.edu/~fox/, was philosophically similar, and their web page contains a bunch of interesting pointers to relevant work.
For whatever reason, this stuff never got much penetration. But in today's Internet of Things, with malware being injected into mobile phones and cars and light bulbs and toothbrushes and smart dust, it may deserve another look.

On Sat, Oct 6, 2018 at 11:03 PM Barak A. Pearlmutter wrote:
At a high level, you seem to be proposing encoding and enforcing safety properties of executables using a type system. One instantiation of this idea is proof-carrying-code (Necula and Lee, 1996, see https://en.wikipedia.org/wiki/Proof-carrying_code), which ensures safety properties of machine code.
I would guess a factor that prevented something like PCC from taking off is the fact that proofs over binaries are relatively pretty challenging to express and verify, and ISA semantics tend to be really complicated.

I think an intermediate approach that might be a lot cheaper to implement is to ship some sort of IR (rather than machine code) that is A) relatively cheap to check and compile and B) carries some variety of safety guarantee.

As I recall, A) is already in use. Apple requires developers to submit applications as LLVM IR rather than machine code.

LLVM IR definitely isn't designed with B) in mind, however. I could imagine something like Core being useful here. A fully annotated type-checked language with semantics amenable to formal analysis.

You could add some sort of effects system to the type system of the IR, as a basis for enforcing security policies. E.g. network access could be an effect.

The advantage of doing this over PCC is that we don't have to retrofit the proof system to match the ISA, but instead can design the IR specifically to make the proof system easy. A downside is that you have to compile the application once upon downloading it, but this doesn't seem to cause too much of an issue in practice.
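
To make "network access could be an effect" concrete, here is a toy sketch in plain Haskell (not GHC Core; NetAccess, Prog, HttpGet and runNoNet are all invented for illustration):

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- A tiny effect-indexed "IR": a program's type records whether it may
-- touch the network.
data NetAccess = NoNet | Net

data Prog (n :: NetAccess) a where
  Pure    :: a -> Prog n a
  Bind    :: Prog n a -> (a -> Prog n b) -> Prog n b
  ReadF   :: FilePath -> Prog n String   -- allowed under any policy
  HttpGet :: String -> Prog 'Net String  -- only typeable when Net is granted

-- The trusted loader for a "no network" policy only accepts Prog 'NoNet;
-- no HttpGet case is needed because the type rules it out statically.
runNoNet :: Prog 'NoNet a -> IO a
runNoNet (Pure x)   = pure x
runNoNet (Bind m k) = runNoNet m >>= runNoNet . k
runNoNet (ReadF p)  = readFile p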

I would guess a factor that prevented something like PCC from taking off is the fact that proofs over binaries are relatively pretty challenging to express and verify, and ISA semantics tend to be really complicated.
The Cedilla folks had a pretty clever way of dealing with that issue. They had a very small proof checker on the embedded device, which used highly compressed information associated with the binary to basically guide a theorem prover, where the information helped the prover make all its choices.

This information was generated from a proof generated by a compiler, which compiled a high-level language (C in their case) annotated with extra safety-proof-related information down to both machine code and a proof of that machine code's safety. So they sort of did what you're describing, but partially evaluated away the IR.
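
To give a flavour of "compressed information guiding a checker", here is a toy sketch in Haskell; the logic and the certificate format (a list of choice bits) are invented for illustration and bear no relation to Cedilla's actual system:

-- Instead of shipping a full proof, the producer ships one bit per choice
-- point, and the small trusted checker replays the search deterministically.
data Prop = Atom String | Prop :/\: Prop | Prop :\/: Prop
  deriving (Eq, Show)

-- Check that the hypotheses entail the goal, consuming one hint bit at
-- every disjunction; returns the unconsumed hints on success.
check :: [Prop] -> Prop -> [Bool] -> Maybe [Bool]
check hyps goal hints = case goal of
  a@(Atom _) -> if a `elem` hyps then Just hints else Nothing
  p :/\: q   -> check hyps p hints >>= check hyps q
  p :\/: q   -> case hints of
                  b : rest -> check hyps (if b then p else q) rest
                  []       -> Nothing

-- check [Atom "q"] (Atom "p" :\/: Atom "q") [False]  ==  Just []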

The JVM is also doing something similar. There's a step in the process of loading the .class files called "bytecode verification". It's basically a type checker for the stack language used in the bytecode.

This was problematic in Java before version 7, because it was possible to cause denial-of-service attacks by sending malicious bytecode payloads. The verifier would spend O(n^2) time verifying that the bytecode behaves correctly. As of Java 7, compilers are required to add additional info alongside the bytecode instructions. This extra info is called a stackmap table and allows the verifier to complete its verification in O(n) steps, IIRC.

So, a certain instantiation of PCC is already used in the wild.
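
As a rough picture of "a type checker for the stack language", here is a toy straight-line verifier in Haskell (the instruction set and names are invented; the real verifier also handles branches, exception handlers and locals - and the stackmap table is precisely the extra info that lets it check branch targets in a single linear pass instead of iterating to a fixpoint):

-- Abstractly interpret each instruction against a stack of types.
data Ty = TInt | TRef deriving (Eq, Show)

data Instr
  = PushInt Int   -- push an int constant
  | PushNull      -- push a null reference
  | AddInt        -- pop two ints, push their sum
  | Pop           -- discard the top of the stack

verify :: [Instr] -> [Ty] -> Either String [Ty]
verify []         stack = Right stack
verify (i : rest) stack = step i stack >>= verify rest
  where
    step (PushInt _) st                 = Right (TInt : st)
    step PushNull    st                 = Right (TRef : st)
    step AddInt      (TInt : TInt : st) = Right (TInt : st)
    step AddInt      st                 = Left ("AddInt: ill-typed stack " ++ show st)
    step Pop         (_ : st)           = Right st
    step Pop         []                 = Left "Pop: stack underflow"

-- verify [PushInt 1, PushInt 2, AddInt] []  ==  Right [TInt]
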
On 07/10/2018 12:00, Barak A. Pearlmutter wrote:
I would guess a factor that prevented something like PCC from taking off is the fact that proofs over binaries are relatively pretty challenging to express and verify, and ISA semantics tend to be really complicated.
The Cedilla folks had a pretty clever way of dealing with that issue. They had a very small proof checker on the embedded device, which used highly compressed information associated with the binary to basically guide a theorem prover, where the information helped the prover make all its choices.
This information was generated from a proof generated by a compiler, which compiled a high-level language (C in their case) annotated with extra safety-proof-related information down to both machine code and a proof of that machine code's safety. So they sort of did what you're describing, but partially evaluated away the IR.
-- Ionuț G. Stan | http://igstan.ro | http://bucharestfp.ro

On 07.10.2018 at 12:41, Ionuț G. Stan wrote:
The JVM is also doing something similar. There's a step in the process of loading the .class files called "bytecode verification". It's basically a type checker for the stack language used in the bytecode.
It's checking more than just types:
* Branch target and exception handler address validity.
* Stack offset validity.
* Each code branch ends with a return instruction.
* Local variables initialized before use in each branch of code.
* No stack overflow or underflow along each code path.
* Validation of private and protected access.
I suspect that all of this can be expressed as a type check, but some of them require that a type be generated for each natural number that's in the class file, and I have yet to see an approach that gives each of these types a printable, humanly understandable representation.

On 07/10/2018 14:10, Joachim Durchholz wrote:
On 07.10.2018 at 12:41, Ionuț G. Stan wrote:
The JVM is also doing something similar. There's a step in the process of loading the .class files called "bytecode verification". It's basically a type checker for the stack language used in the bytecode.
It's checking more than just types:
Sure. I used an informal, hand-wavy sense for "type checker".
* Branch target and exception handler address validity.
* Stack offset validity.
* Each code branch ends with a return instruction.
* Local variables initialized before use in each branch of code.
* No stack overflow or underflow along each code path.
* Validation of private and protected access.
I suspect that all of this can be expressed as a type check, but some of them require that a type be generated for each natural number that's in the class file, and I have yet to see an approach that gives each of these types a printable, humanly understandable representation.
Can you give a bit more detail here? Or a pointer to someone who elaborated on this? I got a bit lost on the natural number step, which seems to be some sort of theoretical way to represent those types. On the other hand you say they're in the class file. BTW, I'm quite familiar with the class file format and have even started writing a bytecode verifier for my own compiler pet project. So, be as specific as you can.
-- Ionuț G. Stan | http://igstan.ro | http://bucharestfp.ro

On 07.10.2018 at 13:47, Ionuț G. Stan wrote:
a type be generated for each natural number that's in the class file, and I have yet to see an approach that gives each of these types a printable, humanly understandable representation.
Can you give a bit more detail here? Or a pointer to someone who elaborated on this? I got a bit lost on the natural number step, which seems to be some sort of theoretical way to represent those types.
You can make Haskell's type system count, by defining a `Zero` type and a `Succ a` type - well, I think it's a kind (i.e. a type of types). That (and equality) is enough to do natural numbers - you have Zero, One = Succ Zero, Two = Succ (Succ Zero), etc. I don't know enough about this stuff to give a recipe, I just saw descriptions of what people were doing.

The earliest awesome trick of this kind was somebody who encoded matrix squareness in the type system. (You cannot even define inversion for a non-square matrix, so square matrices are pretty different beasts than non-square ones.) Somebody else (I think) started doing the natural numbers. You can do all kinds of cool tricks with that, such as cleanly expressing that to multiply two matrices, the height of one matrix needs to be the same as the width of the other. In theory, you can express any arithmetic relationship within the type system that way: addresses, matrix dimension constraints, etc. In practice, any error message is going to look like "invalid type Succ (Succ (Succ ... (Succ Zero)))", and you have to count the number of Succs, which sucks (pun intended).

I suspect you can also do fractional numbers with that - you essentially translate the axioms into type (or kind) declarations. I'd expect that to be even less usable in practice. However, it's intriguing what you can do with types.

Now please take this all with a grain of salt. I never used this kind of stuff in practice, I just saw the reports of people doing this clever insanity and didn't want to do it myself. The approach never took off, so I wouldn't be surprised if this kind of old lore isn't present in Haskell Café anymore.
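
For concreteness, here is roughly what the Zero/Succ trick looks like in today's Haskell, using DataKinds and GADTs; the names Vec, dot, Matrix and matVec are just for illustration:

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Type-level naturals, promoted to the kind level by DataKinds.
data Nat = Zero | Succ Nat

-- A vector whose length is tracked in its type.
data Vec (n :: Nat) a where
  Nil  :: Vec 'Zero a
  Cons :: a -> Vec n a -> Vec ('Succ n) a

-- Both arguments must have the same length, checked at compile time.
dot :: Num a => Vec n a -> Vec n a -> a
dot Nil         Nil         = 0
dot (Cons x xs) (Cons y ys) = x * y + dot xs ys

-- A matrix with statically known dimensions: r rows of c columns.
type Matrix r c a = Vec r (Vec c a)

-- Multiplying an (r x k) matrix by a k-vector: the "width of one equals
-- the height of the other" constraint is enforced by the type checker.
matVec :: Num a => Matrix r k a -> Vec k a -> Vec r a
matVec Nil          _ = Nil
matVec (Cons row m) v = Cons (dot row v) (matVec m v)

A dimension mismatch is then a compile-time error, and the message does spell out types like 'Succ ('Succ 'Zero) - which is exactly the readability problem mentioned above.
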
BTW, I'm quite familiar with the class file format and have even started writing a bytecode verifier for my own compiler pet project. So, be as specific as you can.
I was more like handwaving at the possibility of turning all assertions over integral numbers into type constraints. I am definitely not recommending doing this in practice! Regards, Jo

The problem with an IR is that some languages would inevitably suffer - LLVM in particular was designed as a backend for a C compiler, and so it is not necessarily well-suited for lazy languages, immutable languages, etc. (not to mention self-modifying assembly and other such pathological beasts...)

On 10/06/2018 08:31 PM, William Yager wrote:
On Sat, Oct 6, 2018 at 11:03 PM Barak A. Pearlmutter <barak@pearlmutter.net> wrote:
At a high level, you seem to be proposing encoding and enforcing safety properties of executables using a type system. One instantiation of this idea is proof-carrying-code (Necula and Lee, 1996, see https://en.wikipedia.org/wiki/Proof-carrying_code), which ensures safety properties of machine code.
I would guess a factor that prevented something like PCC from taking off is the fact that proofs over binaries are relatively pretty challenging to express and verify, and ISA semantics tend to be really complicated.
I think an intermediate approach that might be a lot cheaper to implement is to ship some sort of IR (rather than machine code) that is A) relatively cheap to check and compile and B) carries some variety of safety guarantee.
As I recall, A) is already in use. Apple requires developers to submit applications as LLVM IR rather than machine code.
LLVM IR definitely isn't designed with B) in mind, however. I could imagine something like Core being useful here. A fully annotated type-checked language with semantics amenable to formal analysis.
You could add some sort of effects system to the type system of the IR, as a basis for enforcing security policies. E.g. network access could be an effect.
The advantage of doing this over PCC is that we don't have to retrofit the proof system to match the ISA, but instead can design the IR specifically to make the proof system easy. A downside is that you have to compile the application once upon downloading it, but this doesn't seem to cause too much of an issue in practice.

On 08.10.2018 at 01:34, Vanessa McHale wrote:
The problem with an IR is that some languages would inevitably suffer - LLVM in particular was designed as a backend for a C compiler, and so it is not necessarily well-suited for lazy languages, immutable languages, etc. (not to mention self-modifying assembly and other such pathological beasts...)

Actually LLVM is built for being adaptable to different kinds of languages. It does have a bias towards C-style languages, but you can adapt what doesn't fit your needs *and still keep the rest*.

The following was true a few years ago:

When I asked, the LLVM IR was intentionally not specified to be reusable across languages, so that different compiler toolchains could adapt the IR to whatever their language or backend infrastructure needed.

Garbage collection is one area where you have to do a lot of work. There are some primitive instructions that support it, but the semantics is vague and doesn't cover all kinds of write barriers. You'll have to roll your own IR extensions - or maybe I didn't understand the primitives well enough to see how much they cover. Anyway, LLVM does not come with a GC implementation. OTOH, it does not prevent you from doing a GC. In particular, you're free to avoid C-style pointers, so you have the full range of GC algorithms available.

Laziness? No problem. If you do tagless/spineless, you'll code the evaluation machine anyway. Just add an IR instruction that calls the interpreter.

Immutability? No problem - actually nowhere a problem. Immutability happens at the language level, at the IR level it is pretty irrelevant because compilers try to replace object copying by in-place modification wherever possible, anyway.

Self-modifying assembly? No IR really supports that. Mostly it's backends that generate self-modifying code from IR instructions for specific backends.

TL;DR: For its generality, LLVM IR is better suited to languages with specific needs in the backend than anything else that I have seen (which means C runtimes, various VM proofs of concept which don't really count, and JVM - in particular I don't know how .net compares).

Regards, Jo

On Mon, Oct 8, 2018 at 8:53 AM Joachim Durchholz wrote:
On 08.10.2018 at 01:34, Vanessa McHale wrote:
The problem with an IR is that some languages would inevitably suffer - LLVM in particular was designed as a backend for a C compiler, and so it is not necessarily well-suited for lazy languages, immutable languages, etc. (not to mention self-modifying assembly and other such pathological beasts...)

Actually LLVM is built for being adaptable to different kinds of languages. It does have a bias towards C-style languages, but you can adapt what doesn't fit your needs *and still keep the rest*.
The following was true a few years ago:
When I asked, the LLVM IR was intentionally not specified to be reusable across languages, so that different compiler toolchains could adapt the IR to whatever their language or backend infrastructure needed.
Garbage collection is one area where you have to do a lot of work. There are some primitive instructions that support it, but the semantics is vague and doesn't cover all kinds of write barriers. You'll have to roll your own IR extensions - or maybe I didn't understand the primitives well enough to see how much they cover. Anyway, LLVM does not come with a GC implementation. OTOH, it does not prevent you from doing a GC. In particular, you're free to avoid C-style pointers, so you have the full range of GC algorithms available.
Laziness? No problem. If you do tagless/spineless, you'll code the evaluation machine anyway. Just add an IR instruction that calls the interpreter.
I'm far from expert in this area, but isn't that "interpreter" a simple yet slow approach to codegen? My understanding is that when you use, say, a global variable as a register for your evaluation machine, it is slower than if you somehow pin real hardware register for that purpose. I think this is what "registerized" GHC build means.

In LLVM you can't use, say, RSP in a way you want, but it is doomed to be "stack pointer register", even if you don't use stack at all. As I read in some blog, you can slightly affect LLVM codegen by adding calling conventions, but the real solution would be another algorithm for instruction selection. No one has implemented that yet, AFAIK.
Immutability? No problem - actually nowhere a problem. Immutability happens at the language level, at the IR level it is pretty irrelevant because compilers try to replace object copying by in-place modification wherever possible, anyway.
Self-modifying assembly? No IR really supports that. Mostly it's backends that generate self-modifying code from IR instructions for specific backends.
TL;DR: For its generality, LLVM IR is better suited to languages with specific needs in the backend than anything else that I have seen (which means C runtimes, various VM proofs of concept which don't really count, and JVM - in particular I don't know how .net compares).
Regards, Jo

On 08.10.2018 at 09:08, Gleb Popov wrote:
Laziness? No problem. If you do tagless/spineless, you'll code the evaluation machine anyway. Just add an IR instruction that calls the interpreter.
I'm far from expert in this area, but isn't that "interpreter" a simple yet slow approach to codegen? My understanding is that when you use, say, a global variable as a register for your evaluation machine, it is slower than if you somehow pin real hardware register for that purpose. I think this is what "registerized" GHC build means.
My impression has been that GHC is compiling everything down to machine code. FWIW most mentions I saw were of the kind "we replace tagless/spineless with X", not of the kind "we build on tagless/spineless doing Y". Not that I have read many papers - I'm awfully behind on these topics.
In LLVM you can't use, say, RSP in a way you want, but it is doomed to be "stack pointer register", even if you don't use stack at all.
Actually it seems possible since some version between 3.4 and 3.6. https://stackoverflow.com/questions/29365198/how-to-use-llvm-intrinsics-llvm... discusses a way of doing exactly that. Not sure about the ramifications, though.

Still, seeing how many wildly varying languages use LLVM for code generation, it seems to be widely considered the most flexible one available. If somebody knows of something more flexible, paint me interested :-)

Regards, Jo
participants (7)
- Alexander Kjeldaas
- Barak A. Pearlmutter
- Gleb Popov
- Ionuț G. Stan
- Joachim Durchholz
- Vanessa McHale
- William Yager