Re: [Haskell-cafe] Tor project

Hi -

Yes, we (Galois) are. The end goal is to have a Tor implementation running on a HaLVM. Right now the project is internal, but the plan is to push a basic relay node implementation out to our GitHub site sometime in the next few weeks.

As for TLS, it is possible that timing attacks based on a functional language implementation could be more likely than those for a traditional C implementation. On the other hand, functional language implementations protect you from a wide variety of attacks that occur in C implementations. I don’t believe the balance has been studied, but it’d be interesting.

I do know the OCaml/Mirage folks have been having good luck with their TLS implementation. I believe they’ve at least started doing some red team analysis of it, as well, with good results. See their various blog posts, starting with http://openmirage.org/blog/introducing-ocaml-tls. I’m not sure if Vincent’s library has been subject to similar evaluation, and I know the partial library in our Tor implementation has not been.

- Adam

Quoth:

I’ll make a question out of it:

Is anyone working on a Haskell project compatible with Tor? Library, server, browser, etc.? Is anyone interested in such a project? Would the Haskell runtime and TLS modules have any time-dependent behavior that would make encryption harder/easier to break than C implementations? Basically, is the TLS attack surface smaller or larger than OpenSSL’s, and is there any data to support claims either way?

On 2014-07-31 18:59, Adam Wick wrote:
Wow, that actually turned out to be interesting! Thanks Michael and Adam! :)

PS: I wonder what the response to timing attacks will be...

Regards,

On 31.07.2014 18:59, Adam Wick wrote:
As for TLS, it is possible that timing attacks based on a functional language implementation could be more likely than those for a traditional C implementation. (...) I don’t believe the balance has been studied, but it’d be interesting.
I believe no evidence is available, not even anecdotal. And it would be rather expensive a subject to study.

But, AFAIK, the (necessary and sufficient) protection against timing attacks is the addition of randomized waits. In the protocol layer, not in pure encryption/decryption/hashing routines. I strive not to use words I don't understand, but I have the M. word in mind for structuring such a computation.

In other words, I think it is a myth.

-- Kind regards, Wojtek N.

I am not an expert, but I think timing attacks try to push data through a system and look for time-dependent calculations. If every packet that leaves a system has the same data size and the same encryption time, the attacker would not be able to detect any difference in time with respect to a difference in data. Time could also vary if you mucked with the voltage of the CPU, or if some calculation could vary.

I would guess that if it is not possible to make all packets the same size and time, randomizing time would hide time differences. However, it may be possible to extract the randomness. This is just a conjecture on my part.

A workaround might be to use hardware encryption. I work on an A9, and with OpenSSL there is the option to have hardware do the actual encryption, etc. I have not had the time to implement this, but I believe that Linux for i.MX6 has support for hardware encryption. If nothing else, it is best to use the hardware for random number generation.

My interest would be to run it on my Wandboard and Yocto Linux. Hence my questions about cross-compilers. I am still stuck on that problem because I have not figured out how to make GHC pass architecture options to the gcc cross-compiler but not to the local Linux gcc. It seems some variables in the build are tied together. But eventually I’ll probably figure it out.

I think the hypervisor approach is also interesting. Just build a mini OS with TLS and Tor. That could reduce the attack surface by eliminating Linux. This would be interesting for a repeater. I was thinking of doing the same on top of a smaller kernel such as eCos. I tried to get GHC running on that, but there is some missing POSIX support, so I went back to Linux.

Mike
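The fixed-size-packet idea from the first paragraph can be sketched in a few lines of Haskell. This is an illustrative sketch, not code from the project: the names `cellSize` and `padToCell` are made up (Tor does frame traffic in fixed-size cells), and a real scheme would also have to encode the payload length so the receiver can unpad unambiguously.

```haskell
import qualified Data.ByteString as BS

-- Illustrative: pad every outgoing payload to a fixed cell size before
-- encryption, so ciphertext length reveals nothing about plaintext length.
-- A real scheme must encode the payload length (or use unambiguous
-- padding) so the receiver can strip the padding again.
cellSize :: Int
cellSize = 512

padToCell :: BS.ByteString -> Maybe BS.ByteString
padToCell payload
  | BS.length payload > cellSize = Nothing  -- caller must fragment first
  | otherwise =
      Just (payload `BS.append` BS.replicate (cellSize - BS.length payload) 0)
```

Every cell that leaves the system then has identical size, addressing the data-size half of Michael's observation; the equal-encryption-time half is a separate problem.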
On Jul 31, 2014, at 3:11 PM, Wojtek Narczyński wrote:

[Unrelated to original topic!]

If you're using Linux on i.MX6, then you can also use native compilation, which works better than cross-compilation.

If eCos does not provide proper POSIX, then you can also have a look at RTEMS. I attempted to port GHC to it a year ago, IIRC, but it would require some changes to cabal and the libraries too. But anyway, it is a really nice OS target...

Karel

On 08/ 1/14 01:42 AM, Michael Jones wrote:

Karel,

I know I can build on the target, but I am avoiding it. I am trying to keep the target simple and small. I don't even have a disk drive on the target.

Mike

On Thu, Jul 31, 2014 at 2:11 PM, Wojtek Narczyński wrote:
But, AFAIK, the (necessary and sufficient) protection against timing attacks is the addition of randomized waits. In the protocol layer, not in pure encryption/decryption/hashing routines.
I agree that we don't have a lot of evidence for/against timing attacks in functional languages (that I know of).

But adding a randomized delay never seemed the correct solution to me (granted, I have had the luck never to have had to write code sensitive to such issues, and I never wrote a timing attack exploit either), and I don't think that doing it in the protocol layer makes it either necessary or sufficient.

http://rdist.root.org/2010/01/07/timing-independent-array-comparison/

This explains the pitfalls in some possible timing attack misconceptions.

--
xmpp: berdario@gmail.com
bitmessage: BM-2cTYXfGiSTsnx3righ6aHcJSWe4MV17jDP
gpg fingerprint: 3F8D53518012716C4EEF7DF67B498306B3BF75A0 (used just for signing commits)

A comparison would be interesting, though. I'm aware of constant-time implementations of cryptographic functions to reduce timing issues, but these are often coded in C or ASM. I have no clue whether this would be possible in a functional language, as the compiler would have to know not to optimize in short cuts or anything in the code.

On 08/01/2014 02:05 AM, Dario Bertini wrote:

On 01.08.2014 08:59, Friedrich Wiemer wrote:
A comparision would be interesting though. I'm aware of constant time implementations of cryptographic functions to reduce timing issues, but these are often coded in C or ASM - I have no clue if this would be possible in a functional language, as the compiler has to know not to optimize for short cuts or anything in the code?
Well, how about something like inConstantTime :: timeBudget -> (functionToPerform :: CryptoResult) -> IO (Maybe CryptoResult)
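For concreteness, Wojtek's proposed signature could be fleshed out roughly as follows. This is a sketch, not an endorsed design: the name `inConstantTime` is from the post, but the body is an assumption built on GHC's monotonic clock, and it only pads wall-clock time (`evaluate` forces only WHNF, and cache effects are not addressed at all).

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (evaluate)
import GHC.Clock (getMonotonicTimeNSec)

-- Sketch of the proposed combinator: run the action, then sleep until a
-- fixed time budget (in microseconds) has elapsed. Returns Nothing if
-- the action overran its budget, which the caller must treat as an error.
inConstantTime :: Int -> IO a -> IO (Maybe a)
inConstantTime budgetUs action = do
  start <- getMonotonicTimeNSec
  r <- action >>= evaluate      -- WHNF only; deep results need deeper forcing
  end <- getMonotonicTimeNSec
  let elapsedUs = fromIntegral ((end - start) `div` 1000)
  if elapsedUs > budgetUs
    then pure Nothing
    else Just r <$ threadDelay (budgetUs - elapsedUs)
```

Even under these assumptions, `threadDelay` granularity and scheduler jitter would bound how precisely the budget is honored.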

Well, how about something like
inConstantTime :: timeBudget -> (functionToPerform :: CryptoResult) -> IO (Maybe CryptoResult)
I'm no expert, but aren't timing attacks also possible with something like that. If your `functionToPerform' touches the cache in funny ways, the program after resuming from the timeout might have different timings as there could be cache misses in one scenario, but not the other.

On 01.08.2014 09:27, Luke Clifton wrote:
Well, how about something like
inConstantTime :: timeBudget -> (functionToPerform :: CryptoResult) -> IO (Maybe CryptoResult)
I'm no expert, but aren't timing attacks also possible with something like that. If your `functionToPerform' touches the cache in funny ways, the program after resuming from the timeout might have different timings as there could be cache misses in one scenario, but not the other.
Oh come on, there is still a number of slow buffers in between: kernel, network cards, switches, routers.

But the delay caused by those buffers does not depend on the internal workings of your crypto algorithm.

The delay caused by potential cache misses could very well leak information. If your algorithm terminates early on a string comparison, and therefore doesn't end up evicting more pages out of the cache performing whatever operations would have come after had the strings matched, there could be real-world measurable differences, even if you waited for the clock to hit some threshold before continuing on with execution.

Regardless of how much delay there is in other sources, if it is not directly correlated, it can be averaged out.
On Fri, Aug 1, 2014 at 3:38 PM, Wojtek Narczyński wrote:

On 08/ 1/14 09:38 AM, Wojtek Narczyński wrote:
I think the original poster has been talking about something like this:

https://www.cs.unc.edu/~reiter/papers/2012/CCS.pdf
https://eprint.iacr.org/2014/248.pdf

Not fun reading, indeed. So yes, I would also like to see a paper about the attacks above being carried out against a purely functional TLS implementation. The results may be interesting, especially when we consider functional programming to provide more secure code by default (in comparison with C)...

Karel

On 01.08.2014 09:49, Karel Gardas wrote:
A scientific proof that sending your private keys to a public cloud is not very smart. Adorable. Now seriously, thanks for pointing out the papers.

One would need to add countermeasures for this side channel, too, I guess.

On 01.08.2014 10:02, Friedrich Wiemer wrote:
Countermeasures here, countermeasures there, and the best language to do it is C. I find it hard to believe.

On Fri, Aug 01, 2014 at 10:08:08AM +0200, Wojtek Narczyński wrote:
Countermeasures here, countermeasures there, and the best language to do it is C. I find it hard to believe.
Well, it's a double-edged sword. C is close to the metal, so you have a lot of control over timing, memory allocation, even CPU registers; this is beneficial when dealing with things like timing attacks, and it is a disaster when trying to write provably correct code. I believe that having both in the same language would be the holy grail of secure programming.

On 1 August 2014 08:59, Friedrich Wiemer wrote:
A comparision would be interesting though. I'm aware of constant time implementations of cryptographic functions to reduce timing issues, but these are often coded in C or ASM - I have no clue if this would be possible in a functional language, as the compiler has to know not to optimize for short cuts or anything in the code?
I thought about the possibilities of making a "fixed time" implementation of some standard functions a while ago. What I *think* should give a solution is the following.

Why do C implementations work against, say, timing attacks? Well, a very typical scenario is password checking: you have two input [Char]s, and need to see if they match, which you do by zipping them and seeing if all pairs are of equal elements. Now of course, once you have found a first element which doesn't match, you /can/ stop computation because you know the result, but you don't /want/ to, because... well, because you don't want to expose the fact that you're done. In other words, you want to look like you're still processing!

So instead, let's make an implementation where even if halfway through checking equality of two passwords we discover that they're unequal, there is still some more information we need to compute! Let's /actually/ make it so that we still need to do more computations. So what if we made, for every standard library Haskell function, something that somehow accumulates small bits of data that take the full input data to process, but that in the end we are not interested in? For example, we could make an implementation of all :: (a -> Bool) -> [a] -> Bool (with a different signature) which doesn't just compute the output we need, but also computes a "cumulative hash function" (for lack of better words) of all the elements.

So let us give a name to the "trash data" we attach to any "input data", whose hash function (or whatever constant-time function) gets accumulated throughout the algorithm that processes the input... "slack data"? Now, exactly what slack data you attach to input data depends on the thing that you want to be constant in: if you want your computation to always take as long as the length of the input data, attach that much slack data. But, for example, if passwords can be up to 128 chars long, and you don't want to expose /any/ information about the given password, you should attach 128 items of slack data to your input, possibly making it larger.

Let me know if the above didn't make sense. I don't know how to implement this, but I think it might give a way to fix *many* side-channel leaks using functional techniques.

PS: obviously this is not how you want to be checking passwords, but this is quite a typical side-channel leak scenario.
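The password-checking example above is close to the standard "accumulate a difference word" trick for constant-time comparison. A sketch of that idea in Haskell follows; the name `constEq` is made up, a strict fold stands in for the proposed slack-data accumulator, and, as the thread goes on to discuss, GHC optimizations and cache effects may still undermine it in practice.

```haskell
import Data.Bits (xor, (.|.))
import Data.Char (ord)
import Data.List (foldl')

-- Compare two strings without short-circuiting: every character pair is
-- consumed, and the accumulated value is inspected only at the end, so a
-- mismatch early in the input does not end the traversal early. The
-- strict foldl' forces each step. The length check still leaks length,
-- so inputs should be padded to a fixed size first.
constEq :: String -> String -> Bool
constEq xs ys
  | length xs /= length ys = False
  | otherwise =
      foldl' (\acc (x, y) -> acc .|. (ord x `xor` ord y)) 0 (zip xs ys) == 0
```

In the "slack data" framing, the accumulator plays the role of the data that takes the full input to compute but is discarded at the end (here it is not quite discarded: its zero-test is the answer, which is one way to keep the compiler from eliminating it).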

I don't think this is a solution either. You're actually trying to cheat around compiler optimizations, aren't you? What if the compiler notices you don't need this trash data and optimizes the computation away?

On 08/01/2014 10:11 AM, Wojtek Narczyński wrote:

On 01.08.2014 10:03, Auke Booij wrote:

(...) Let me know if the above didn't make sense.

You asked for it. For me, doing useless computations just to make sure the timing is right does not make sense. Using a clock makes sense.

On 1 August 2014 10:11, Wojtek Narczyński wrote:
You asked for it. For me, doing useless computations just to make sure the timing is right, does not make sense. Using a clock makes sense.
If you believe that is the case, how are you planning to tackle power analysis attacks? (Note that such attacks are not just theoretical.)

Let's say you went to sit an exam, but were already given the answers by a friend beforehand. How do you fake actually sitting the exam? You don't set a clock: that only fixes one of the ways in which the examiners might discover you don't need to work to do the exam. Instead, you stare at your paper intensely, make useless drawings on your rough work paper, and pretend you're working hard. If you want to look like you're processing, you had better actually be processing.

C allows you to compute absolutely nothing, because the compiler isn't smart enough to see that. But Haskell's compiler is much better at detecting that you're computing trash, so we need to be more convincing that we're not. You can call it a cheat around compiler optimizations, but really that's missing the point, because Haskell doesn't even work without those optimizations. This is a way to process data in a configurable way.

Obviously you'll need to somehow tell the compiler that the eventual result of all the slack data propagation should not be thrown out. But this is nothing new: we already have the IO monad (although admittedly its internal data gets optimized away, or so I heard; but the important thing is that the effects stay).

On 2014-08-01 10:28, Auke Booij wrote:

If you believe that is the case, how are you planning to tackle power analysis attacks? (Note that such attacks are not just theoretical.)

Oh, okay, my mind is set on remote attacks. In the scenario where the attacker has access to the hardware under attack, doing fake computations most resembling normal computations is definitely better than a clock. In general, access to the hardware raises the bar for security very high. In such a scenario assembly language might be the best choice, or at least analysis of the binary code.

Timing attacks generally work such that the attacker samples a large number of runs and maps their execution times; "humps" in the graph show different code paths being taken.

Because they're statistical attacks, adding uniform noise (random delays) does not work - it's uniform noise after all; you'll just raise the floor, but the humps will still be there, because the random delays just average out. Ten million times (10 + rand()) is still going to be smaller than ten million times (1000 + rand()).

Instead, a way I've seen recommended is to run the entire input through a cryptographic hash function and derive a delay from that, in a range that dwarfs the actual timing difference. This way, things will still average out, but the averaging will not compensate for the delay we added, because that delay is always the same for a given input.

Of course designing functions to minimize execution time differences is still a good idea; it's just not very reliable on its own, and for many real-world cases it's not even practical (or outright impossible) - even if your code doesn't branch based on whether a username was found in the database or not, your database does.

On Thu, Jul 31, 2014 at 05:05:26PM -0700, Dario Bertini wrote:

On 01.08.2014 09:10, Tobias Dammers wrote:
Timing attacks generally work such that the attacker samples a large number of runs and maps their execution times; "humps" in the graph show different code paths being taken.
Because they're statistical attacks, adding uniform noise (random delays) does not work - it's uniform noise after all; you'll just raise the floor, but the humps will still be there; the random delays will just average out. Ten million times (10 + rand()) is still going to be smaller than ten million times (1000 + rand()).

Okay, so basically we would have to add dependent random noise. For example, send out a Normal distribution. Or a smiley (Beta, Weibull).

Instead, a way I've seen recommended is to run the entire input through a cryptographic hash function and derive a delay from that, in a range that dwarfs the actual timing difference. This way, things will still average out, but the averaging will not compensate for the delay we added, because that delay is always the same for a given input.

Of course designing functions to minimize execution time differences is still a good idea; it's just not very reliable on its own, and for many real-world cases it's not even practical (or outright impossible) - even if your code doesn't branch based on whether a username was found in the database or not, your database does.

Why the gimmick, for each and every piece of crypto code? Why not just use the clock for timing? The crypto lib would probably have to tune itself on startup.

-- Wojtek N.

On Fri, Aug 01, 2014 at 09:35:37AM +0200, Wojtek Narczyński wrote:
Why the gimmick, for each and every piece of crypto code? Why not just use the clock for timing? The crypto lib would probably have to tune itself on startup.
Sorry, I didn't mean use the crypto function to cause the delay; I meant using a cryptographic hash to determine how long the delay should be. Once we have a number, the system clock should of course be used to actually delay things.

The point of this is as follows. The attacker sends a large number of messages, each of them multiple times, averaging execution times. The reasonable expectation is that the average execution times will fall into distinct groups, each representing one execution path. The averaging is needed because there will be fluctuations anyway, if only because other things are happening on the network and on the target machine, but those noise sources typically all have uniform or at least normal distributions, so averaging will filter them out. The part that remains after filtering is what interests the attacker. Then, if you have sampled, say, 10,000 random usernames, and 10 of them fall into one group while the other 9,990 fall into the other, it's a pretty good guess that those 10 are usernames that actually exist on the target system and the others don't.

Now, if we add some random noise, we just add one more uniformly distributed noise source to our signal, but we already have a bunch of those, so we're not really winning anything here: the extra noise will just get filtered out along with the rest. However, if we add noise that depends on the input, the filtering won't remove it, because for each given input it is constant. Instead of removing the noise component from `code(input) + system_noise()`, leaving `code(input)`, the attacker now deals with `code(input) + hash(input) + system_noise()`, leaving them with `code(input) + hash(input)`. Without cracking the hash, I don't know how you'd remove the hash-based delay from the sample. Possible mistakes would be insufficient granularity in the hash-based delay (e.g. if the difference between code paths is 1ms and our delay has a 100ms granularity, it's still trivial to filter out the hash noise), insufficient range (if the difference between code paths is 100ms and our delay is between 0 and 25ms, we still have two distinct groups), a weak hash function, and probably a bunch more.
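The input-dependent delay described above can be sketched in a few lines of Haskell. This is only an illustrative toy: the names `toyHash`, `delayMicros`, and `paddedLookup` are mine, and FNV-1a stands in for what would have to be a *keyed* cryptographic MAC in practice (with an unkeyed hash the attacker can compute `hash(input)` themselves and subtract it).

```haskell
import Control.Concurrent (threadDelay)
import Data.Bits (xor, (.&.))
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word64)

-- Toy stand-in for a keyed hash: FNV-1a over the input characters.
-- A real implementation must use a keyed MAC (e.g. HMAC-SHA256 with a
-- secret key), otherwise the delay is trivially computable.
toyHash :: String -> Word64
toyHash = foldl' step 0xcbf29ce484222325
  where step h c = (h `xor` fromIntegral (ord c)) * 0x100000001b3

-- Deterministic, input-dependent delay in microseconds, in [0, 50000).
-- Because the delay is constant per input, averaging repeated queries
-- cannot filter it out the way it filters uniform system noise.
delayMicros :: String -> Int
delayMicros input =
  fromIntegral (toyHash input .&. 0xFFFF) * 50000 `div` 65536

-- Wrap a lookup so its observable latency includes the padding.
paddedLookup :: (String -> IO a) -> String -> IO a
paddedLookup act input = do
  threadDelay (delayMicros input)
  act input
```

The granularity and range caveats from the paragraph above apply directly: here the delay spans 0–50ms at microsecond granularity, and both numbers would need tuning against the actual timing differences being masked.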
-- Wojtek N.

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

hi,
However, if we add noise that depends on the input, the filtering won't remove it, because for each given input it is constant. Instead of removing the noise component from `code(input) + system_noise()`, leaving `code(input)`, the attacker now deals with `code(input) + hash(input) + system_noise()`, leaving them with `code(input) + hash(input)`. Without cracking the hash, I don't know how you'd remove the hash-based delay from the sample; possible mistakes would be insufficient granularity in the hash-based delay (e.g. if the difference between code paths is 1ms and our delay has a 100ms granularity, it's still trivial to filter out the hash noise), insufficient range (if the difference between code paths is 100ms and our delay is between 0 and 25ms, we still have two distinct groups), a weak hash function, and probably a bunch more.
just a note, that that's what sebastian schinzel's "Deterministic and Unpredictable delay padding" is about. see the talk (i linked to earlier):
talk from sebastian schinzel on 29c3: http://media.ccc.de/browse/congress/2012/29c3-5044-en-time_is_not_on_your_si...
slides: http://sebastian-schinzel.de/29c3/download/29c3-schinzel.pdf
abstract: http://sebastian-schinzel.de/_download/cosade-2011-extended-abstract.pdf
cheers, tob

Yeah, hadn't read your message yet. This practice is actually pretty much common knowledge in the security industry.

On Fri, Aug 01, 2014 at 12:04:48PM +0200, Tobias Florek wrote:
hi,
just a note, that that's what sebastian schinzel's "Deterministic and Unpredictable delay padding" is about. see the talk (i linked to earlier):
talk from sebastian schinzel on 29c3:
http://media.ccc.de/browse/congress/2012/29c3-5044-en-time_is_not_on_your_si...
slides: http://sebastian-schinzel.de/29c3/download/29c3-schinzel.pdf
abstract: http://sebastian-schinzel.de/_download/cosade-2011-extended-abstract.pdf
cheers, tob

On 01.08.2014 12:04, Tobias Florek wrote:
just a note, that that's what sebastian schinzel's "Deterministic and Unpredictable delay padding" is about. see the talk (i linked to earlier):
talk from sebastian schinzel on 29c3:
http://media.ccc.de/browse/congress/2012/29c3-5044-en-time_is_not_on_your_si...
slides: http://sebastian-schinzel.de/29c3/download/29c3-schinzel.pdf
abstract: http://sebastian-schinzel.de/_download/cosade-2011-extended-abstract.pdf
I just realized that the unifying framework for countermeasures against any resource-based attack (time, electricity, heat, radiation) is MaxEnt. Constant time is trivially MaxEnt. If you can't do constant time, go for a uniform distribution over a number of predefined times. If you can't do that either, then from the space of possible distributions, choose the one with maximum entropy. So much for the theory. E.T. Jaynes would be delighted to see his work appear in yet another context.

-- Wojtek
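The "uniform distribution over a number of predefined times" option can be sketched by padding every response up to the next multiple of a fixed quantum. This is my own sketch, not anyone's library code: `remainingMicros` and `padToQuantum` are hypothetical names, and the `time` package (a GHC boot library) is assumed for wall-clock measurement.

```haskell
import Control.Concurrent (threadDelay)
import Data.Time.Clock (diffUTCTime, getCurrentTime)

-- Microseconds still needed to reach the next multiple of the quantum.
-- Always in (0, quantum], so an exact multiple gets one full extra
-- quantum of padding rather than zero.
remainingMicros :: Int -> Int -> Int
remainingMicros quantum elapsed = quantum - (elapsed `mod` quantum)

-- Pad an IO action's wall-clock duration up to the next quantum
-- boundary. All inputs whose true cost lands within the same quantum
-- become indistinguishable to a network observer; only the bucket
-- index leaks.
padToQuantum :: Int -> IO a -> IO a
padToQuantum quantum act = do
  t0 <- getCurrentTime
  r  <- act
  t1 <- getCurrentTime
  let elapsed = floor (realToFrac (diffUTCTime t1 t0) * 1e6 :: Double)
  threadDelay (remainingMicros quantum elapsed)
  pure r
```

With a single quantum larger than any real execution time this degenerates to constant time; smaller quanta trade latency for leaking the bucket index, which is the MaxEnt compromise described above.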

[random delay for mitigating side channel attacks]
that is not enough. to mitigate _network attacks_ you need at least predictable random delay. i was toying around with adding that to vincent hanquez's tls library but did not find the time to do it in the end. it would be great to see others do that, of course :).

talk from sebastian schinzel on 29c3: http://media.ccc.de/browse/congress/2012/29c3-5044-en-time_is_not_on_your_si...
slides: http://sebastian-schinzel.de/29c3/download/29c3-schinzel.pdf
abstract: http://sebastian-schinzel.de/_download/cosade-2011-extended-abstract.pdf

it should be stressed that delay only helps against network side channels. if you have an attacker on the same physical hardware, you will at least need branchless code. that is a very hard problem. i think it's pretty much impossible to solve in haskell alone. maybe with a dsl that generates code it's possible though. cryptol looks interesting in that regard (whenever it gets its code generators back). offloading the computation to an fpga that only you have access to should solve cache-line side channels. http://cryptol.net

regarding observable gc behavior, it's interesting to see tedu's recent blog post about attacking string compare in lua. http://www.tedunangst.com/flak/post/timing-attacks-vs-interned-strings

tob

On 1 August 2014 10:26, Tobias Florek
it should be stressed that delay only helps against network side channels. if you have an attacker on the same physical hardware, you will at least need branchless code. that is a very hard problem. i think it's pretty much impossible to solve in haskell alone. maybe with a dsl that generates code it's possible though. cryptol looks interesting in that regard (whenever it gets its code generators back). offloading the computation to an fpga that only you have access to should solve cache-line side channels.
Just wanted to say that what I posted might give hope for such "branchless" code (or in fact: code that may branch, but by construction not in a detectable way).

hi,
Just wanted to say that what I posted might give hope for such "branchless" code (or in fact: code that may branch, but by construction not in a detectable way).
i don't have the papers handy, but on the same host you can observe cache line collisions. that means you cannot do something different that takes the same time and generates the same amount of heat. you will have to do _the same thing_.

of course packages like vincent hanquez's securemem provide that kind of equality check (and other very useful properties), so some building blocks are there. interaction with the garbage collector is still something to worry about though. in some gcs you can observe whether a string is in use somewhere in the program or not. i am not intimately familiar with ghc's gc, but i don't expect that particular vulnerability is a problem when using securemem (or even bytestring or text). there might (and i assume will) be many other opportunities to observe some state from outside the program, though.

don't let me discourage you though. every step towards fewer side channels is a valuable step!

tob
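For readers unfamiliar with the kind of equality check being discussed, here is a minimal sketch over plain `Word8` lists. The name `ctEq` is mine and this is not securemem's actual API; note also that, as pointed out above, GHC's laziness, GC, and code generation mean this only removes the obvious source-level early exit, it does not guarantee constant time at the machine level.

```haskell
import Data.Bits (xor, (.|.))
import Data.List (foldl')
import Data.Word (Word8)

-- Constant-time-style equality: combine every byte pair with xor and
-- fold the differences together with (.|.), instead of returning at
-- the first mismatch the way (==) on lists does. A mismatch at byte 0
-- therefore takes as long as a mismatch at the last byte. Only the
-- length check can short-circuit, and lengths are usually public.
ctEq :: [Word8] -> [Word8] -> Bool
ctEq xs ys =
  length xs == length ys &&
  foldl' (.|.) 0 (zipWith xor xs ys) == 0
```

An early-exit comparison leaks the position of the first differing byte through timing, which is exactly the string-compare attack mentioned in tedu's post below.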

Everybody, thanks for an interesting discussion. And, by the way, we're all on the NSA watch list now.

Aren't we anyway?

On Fri, Aug 01, 2014 at 11:54:06AM +0200, Wojciech Narczyński wrote:
Everybody,
thanks for an interesting discussion.
And, by the way, we're all on NSA watch list now.

On Fri, Aug 1, 2014 at 5:54 AM, Wojciech Narczyński
And, by the way, we're all on NSA watch list now.
If you've ever given a thought to security, you were already on it. >.>

-- brandon s allbery kf8nh
sine nomine associates
allbery.b@gmail.com  ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad
http://sinenomine.net

Tobias Florek
hi,
Just wanted to say that what I posted might give hope for such "branchless" code (or in fact: code that may branch, but by construction not in a detectable way).
i don't have the papers handy, but on the same host you can observe cache line collisions. that means you cannot do something different that takes the same time and generates the same amount of heat. you will have to do _the same thing_.
For example, the Flush+Reload attack, https://eprint.iacr.org/2013/448.pdf. Or branch-prediction attacks (see the references in the paper above). Absolutely beautiful stuff.
of course packages like vincent hanquez securemem provide that kind of equality checks (and other very useful properties). so some building blocks are there. interaction with the garbage collector is still something to worry about though. in some gcs you can observe whether a string is in use somewhere in the program or not. i am not intimate with ghc's gc but i don't expect that particular vulnerability is a problem when using securemem (or even bytestring or text), but there might (and i assume will) be many other opportunities to observe some state from outside the program.
don't let me discourage you though. every step to less side channels is a valuable step!
tob
-- lelf
participants (14)
- Adam Wick
- Antonio Nikishaev
- Auke Booij
- Bardur Arantsson
- Brandon Allbery
- Dario Bertini
- Friedrich Wiemer
- Karel Gardas
- Luke Clifton
- Michael Jones
- Tobias Dammers
- Tobias Florek
- Wojciech Narczyński
- Wojtek Narczyński