Type checking to prevent data injection attacks?

While thinking about my next blog posting, I realized that it was a
technique similar to what Perl does with the concept of "tainted"
strings.
If you're not familiar with it (and I'm probably getting the details
wrong, not having written any Perl in decades), a string that comes
from an external source is considered "tainted". String constants in
the program source are not. There's a builtin that declares a tainted
string as not being tainted (presumably, the programmer has examined
it somehow to determine this). Any string operation that returns a
string returns a tainted string if any of the strings involved were
tainted. Finally, functions that are subject to data injection attacks
don't work on tainted strings.
It seems like the Haskell type system ought to be able to support
this. I can see a couple of approaches that might work (a "tainted"
monad for use on the different string types or an Either-ish
datatype), but don't have the chops to decide which approach might be
better - or maybe the answer is that it depends on the goal.
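To make the Either-ish idea concrete, here is a minimal sketch of one way
it could look, with the taint tracked at run time. All the names (TString,
untaintIf, runShell) are made up for illustration and don't come from any
existing package; the sink only pretends to run a command.

```haskell
-- Hypothetical sketch: each string carries its taint status at run time.
data TString = Tainted String | Clean String
  deriving (Show, Eq)

-- Drop the tag and expose the underlying text.
raw :: TString -> String
raw (Tainted s) = s
raw (Clean s) = s

-- Appending gives a clean result only when both inputs are clean,
-- which mirrors the propagation rule described for Perl.
instance Semigroup TString where
  Clean a <> Clean b = Clean (a ++ b)
  a <> b = Tainted (raw a ++ raw b)

-- The programmer vouches for a string by supplying a check; a tainted
-- string that fails the check stays tainted.
untaintIf :: (String -> Bool) -> TString -> TString
untaintIf ok (Tainted s) | ok s = Clean s
untaintIf _ t = t

-- A stand-in for an injection-prone function: it refuses tainted input.
runShell :: TString -> Either String String
runShell (Clean s) = Right ("would run: " ++ s)
runShell (Tainted _) = Left "refusing to run a tainted string"
```

With this design a misuse shows up as a Left value while the program is
running, not as a type error, so it checks less than the type system could.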
I did some googling for this, but didn't turn up anything that seemed
promising. Lots of stuff on type checking untrusted values, but no
flagging strings as untrusted. Which leaves the questions...
Is this actually a sane idea?
Is there already a Haskell package that does this? Possibly part of a
web framework?
A package for another language, or a paper discussing doing this?
Thoughts on other approaches than the two I mentioned?
Thoughts on the best way to do this?
Thanks,

Yep, search for the standard examples of Phantom Types.
Peter
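
For anyone following that pointer, a minimal sketch of how phantom types
could encode taint at compile time. Every name here (Str, fromOutside,
sanitizeWith, runCommand) is invented for the example and is not taken from
an existing library; runCommand only builds a string and runs nothing.

```haskell
-- Type-level tags with no values; they exist only to label strings.
data Tainted
data Clean

-- The parameter `t` never appears on the right-hand side: it is a phantom.
-- In a real module you would export the type but not the `Str` constructor,
-- so the functions below are the only ways to build or convert one.
newtype Str t = Str String

-- Anything read from the outside world starts out tainted.
fromOutside :: String -> Str Tainted
fromOutside = Str

-- Strings written by the programmer are clean by convention. Note that
-- this accepts any String, so the convention is not enforced here.
literal :: String -> Str Clean
literal = Str

-- Appending requires both sides to carry the same tag.
append :: Str t -> Str t -> Str t
append (Str a) (Str b) = Str (a ++ b)

-- A clean string may always be treated as tainted, never the reverse.
weaken :: Str Clean -> Str Tainted
weaken (Str s) = Str s

-- The only route from tainted to clean: the caller supplies the escaping.
sanitizeWith :: (String -> String) -> Str Tainted -> Str Clean
sanitizeWith f (Str s) = Str (f s)

-- A sink that is open to injection accepts clean strings only.
runCommand :: Str Clean -> String
runCommand (Str s) = "would run: " ++ s
```

Passing the result of fromOutside straight to runCommand is rejected by the
type checker, because Str Tainted and Str Clean are different types.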
On 21 December 2012 20:49, Mike Meyer
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

Mike Meyer wrote:
Is this actually a sane idea?
Yes
Is there already a Haskell package that does this? Possibly part of a web framework?
I've been using Esqueleto (an SQL EDSL) and it sanitizes/quotes all values while constructing SQL queries.
Erik
--
Erik de Castro Lopo
http://www.mega-nerd.com/

On Fri, Dec 21, 2012 at 3:15 PM, Erik de Castro Lopo
Mike Meyer wrote:
Is there already a Haskell package that does this? Possibly part of a web framework?
I've been using Esqueleto (an SQL EDSL) and it sanitizes/quotes all values while constructing SQL queries.
From that description, this is actually different from what I'm talking about. It's typical for SQL packages even in dynamic languages.
What I have in mind is that the query (with placeholders for the values) would have to be a string constant (provided by the programmer) or flagged as "checked and not tainted" by the programmer; otherwise, trying to run the query would fail to type check.
If you have a system where you want to let the user select the column names in the query, then SQL placeholders/sanitization may not work - you need to build the query string "by hand". Being able to use the type system to ensure that no string gets used that hasn't been sanitized would be nice.
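
A rough sketch of that idea, assuming GHC's OverloadedStrings extension:
string literals in the source become queries directly, and everything else
has to go through an explicit "I checked this" function. Query,
checkedByHand and runQuery are hypothetical names, and runQuery is a
stand-in that formats text without touching a database.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.String (IsString (..))

-- Hypothetical query type; a real module would not export the constructor.
newtype Query = Query String

-- With OverloadedStrings, a string literal can be used where a Query is
-- expected. Caveat: fromString can also be called explicitly on any String,
-- so this is a guard rail and not a guarantee.
instance IsString Query where
  fromString = Query

-- Explicit, easy-to-grep escape hatch for strings the programmer has checked.
checkedByHand :: String -> Query
checkedByHand = Query

-- Stand-in for a database call: only shows what would be sent.
runQuery :: Query -> [String] -> String
runQuery (Query q) params = q ++ " with parameters " ++ show params

-- A literal query with a placeholder type checks as-is.
fixedQuery :: String
fixedQuery = runQuery "SELECT name FROM users WHERE id = ?" ["42"]

-- This one is rejected, because (++) produces a String, not a Query:
-- broken :: String -> String
-- broken userInput = runQuery ("SELECT " ++ userInput ++ " FROM users") []
```

Building the column list by hand would then have to end in a call to
checkedByHand, which marks the places that need reviewing.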

The approach Perl takes can be traced back to its origin as a
scripting language for systems programming. In that context it makes
sense as a simple safety system to help prevent basic injection
attacks on often hastily and poorly constructed scripts. A further
reason for the "tainted" flag lies in the incredibly poor type system
in Perl (so much so that Perl doesn't even distinguish between
strings, integers, or floats). Very little in Haskell is vulnerable to
any kind of injection attack; there is, for instance, no real
equivalent of the eval function that's the source of the majority of
Perl's problems. For those few areas that are vulnerable to injection
(things like templating systems, SQL queries, etc.), it's generally
better to perform whatever escaping is necessary at that point, as the
proper way to sanitize a string will largely depend on the context in
which it's being used (i.e. what's a proper sanitization for a SQL
query would be inappropriate for an HTML form, and vice versa). Most
sane libraries in Haskell will default to escaping appropriate values
in their inputs, and give the programmer explicit access to bypass
that sanitization if he has specific reason to want to do so, and it
is then his responsibility to ensure that the inputs are properly
sanitized.
If you really wanted to do something equivalent, it would be a simple
matter of creating a TaintedString type that just wraps an existing
String, and then a series of functions to "untaint" instances of
TaintedString. Ultimately though, I feel it would be an awful lot of
work for very little gain. Anything in your application that's
vulnerable to injection attacks should escape its inputs, no matter
what the source is.
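
As a minimal sketch of that wrapper (names are hypothetical, and the
escaping rules are deliberately simplified), each "untaint" function is tied
to one context and returns a type for that context only:

```haskell
-- Wraps a String from an untrusted source. In a real module the
-- constructor would be the only way in, and the functions below the
-- only way out.
newtype TaintedString = TaintedString String

-- Separate result types, so HTML-escaped text cannot be passed where a
-- SQL literal is expected, and vice versa.
newtype Html = Html String
  deriving (Show, Eq)

newtype SqlLiteral = SqlLiteral String
  deriving (Show, Eq)

-- Escape the characters that are special in HTML text and attributes.
untaintForHtml :: TaintedString -> Html
untaintForHtml (TaintedString s) = Html (concatMap escape s)
  where
    escape '<' = "&lt;"
    escape '>' = "&gt;"
    escape '&' = "&amp;"
    escape '"' = "&quot;"
    escape '\'' = "&#39;"
    escape c = [c]

-- Quote as a SQL string literal by doubling single quotes. Simplified:
-- some databases also treat backslashes specially, and a driver's own
-- placeholders are usually the better tool.
untaintForSqlLiteral :: TaintedString -> SqlLiteral
untaintForSqlLiteral (TaintedString s) = SqlLiteral ("'" ++ concatMap quote s ++ "'")
  where
    quote '\'' = "''"
    quote c = [c]
```

The two functions give different results for the same input, which is the
point about sanitization depending on where the string ends up.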
-R. Kyle Murphy
--
Curiosity was framed, Ignorance killed the cat.
On Fri, Dec 21, 2012 at 4:31 PM, Mike Meyer

Mike Meyer
I did some googling for this, but didn't turn up anything that seemed promising. Lots of stuff on type checking untrusted values, but no flagging strings as untrusted. Which leaves the questions...
Is this actually a sane idea?
Of course. However, this hasn't come up too often in Haskell, because in most cases parsing/processing is part of getting a string from the outside world, so you don't get tainted strings in the first place. That's because the usual stream processing abstractions don't actually produce strings, but whatever you requested with all the processing necessary to convert the raw stream to it.
Greets,
Ertugrul
--
Not to be or to be and (not to be or to be and (not to be or to be and (not to be or to be and ... that is the list monad.
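
A small sketch of what parsing at the boundary can look like, using the
column-selection case from earlier in the thread. The Column type, the
column names, the users table and the 1000-row cap are all assumptions made
up for the example; readMaybe is from Text.Read in base.

```haskell
import Text.Read (readMaybe)

-- Hypothetical set of columns a user is allowed to select.
data Column = Name | Email | CreatedAt
  deriving (Show, Eq)

-- Raw input becomes a Column or is rejected; no free-form string survives.
parseColumn :: String -> Maybe Column
parseColumn "name" = Just Name
parseColumn "email" = Just Email
parseColumn "created_at" = Just CreatedAt
parseColumn _ = Nothing

-- The SQL text for a column comes from the program, not from the user.
columnSql :: Column -> String
columnSql Name = "name"
columnSql Email = "email"
columnSql CreatedAt = "created_at"

-- Numeric input is parsed to an Int and range-checked (the cap of 1000
-- is an arbitrary choice for the example).
parseLimit :: String -> Maybe Int
parseLimit s = case readMaybe s of
  Just n | n > 0 && n <= 1000 -> Just n
  _ -> Nothing

-- Only parsed values can reach the query builder.
buildQuery :: Column -> Int -> String
buildQuery col limit =
  "SELECT " ++ columnSql col ++ " FROM users LIMIT " ++ show limit
```

Since buildQuery never takes a String argument, there is nothing left to
mark as tainted by the time the query is assembled.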

"Taint" can be thought of as a baby version of dynamic information flow control; the general problem tries to prevent *any* information from leaking out of a tainted source. Here's an example of IFC in Haskell: http://hackage.haskell.org/package/lio
Cheers,
Edward
Excerpts from Mike Meyer's message of Sat Dec 22 04:49:20 +0800 2012:
participants (6)
- Edward Z. Yang
- Erik de Castro Lopo
- Ertugrul Söylemez
- Kyle Murphy
- Mike Meyer
- Peter Hall