Encrypting streamed data

I have a use case that requires public-key cryptography to encrypt a large amount of data in a streaming fashion (get it out of a DB, encrypt it, put it into an AWS S3 bucket).
The command-line gpg tool seems able to encrypt/decrypt data from stdin to stdout in a streaming fashion, but in my attempts to use it, it seems very file-based for things like which keys to use. I would prefer to pass the public key as an actual value rather than a file, if nothing else because this is for tools that don't have email addresses on which to base and address their keys.
Is there an existing library that can achieve this using conduit/pipes/whatever? cryptonite-conduit only covers hashing, and hOpenPGP is poorly documented and I can't work out how to use it: "just follow the types" is difficult when the Haddock docs don't link to the required types (seemingly because it uses the "import Module as X" trick for re-exporting everything, but then everything from those modules isn't available).
Can anyone recommend a solution?
-- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com http://IvanMiljenovic.wordpress.com

Hi,
I do not know of a library to do this, sorry. Note that the way public-key
crypto works in a streaming fashion is typically to use the public-key bit
only to encrypt a key for a symmetric cipher and then use the (much-faster)
symmetric encryption for the actual data. The symmetric bit could well be
something like AES256-CBC or AES256-CTR.
This means that the format of the resulting data is a bit
implementation-defined as it has to include the asymmetrically-encrypted
symmetric key first, followed by the stream of symmetrically-encrypted data. GnuPG
includes quite a bit of metadata in its files that describes the algorithms
used and delimits the pieces, so if you want the resulting files to be
GnuPG-compatible you'll need to take this into account.
If it were me, I'd probably just shell out to `gpg`. It's fast and low-risk.
Hope that helps,
David
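A rough sketch of the "shell out to `gpg`" suggestion, for illustration only (it is not code from this thread). It assumes a `gpg` binary on the PATH and a recipient key that is already present and trusted in the keyring; the names `gpgEncryptArgs`, `pump` and `encryptHandle` are made up for this example. Data is copied in fixed-size chunks, so memory use stays bounded regardless of input size.

```haskell
import           Control.Concurrent      (forkIO)
import           Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import qualified Data.ByteString         as B
import           System.Exit             (ExitCode)
import           System.IO               (Handle, hClose)
import           System.Process          (CreateProcess (..), StdStream (..),
                                          createProcess, proc, waitForProcess)

-- | Arguments for a non-interactive public-key encryption run.
-- With no file argument, gpg reads stdin and writes to stdout.
gpgEncryptArgs :: String -> [String]
gpgEncryptArgs recipient = ["--batch", "--encrypt", "--recipient", recipient]

-- | Copy one handle to another in 64 KiB chunks until end of input.
pump :: Handle -> Handle -> IO ()
pump from to = do
  chunk <- B.hGetSome from 65536
  if B.null chunk
    then pure ()
    else B.hPut to chunk >> pump from to

-- | Stream @input@ through gpg and write the ciphertext to @output@.
-- Error handling is deliberately minimal in this sketch.
encryptHandle :: String -> Handle -> Handle -> IO ExitCode
encryptHandle recipient input output = do
  (Just gpgIn, Just gpgOut, _, ph) <-
    createProcess (proc "gpg" (gpgEncryptArgs recipient))
      { std_in = CreatePipe, std_out = CreatePipe }
  done <- newEmptyMVar
  -- Drain gpg's stdout concurrently so that neither pipe fills up and blocks.
  _ <- forkIO (pump gpgOut output >> putMVar done ())
  pump input gpgIn
  hClose gpgIn          -- signals end of input to gpg
  takeMVar done
  waitForProcess ph
```

The reader thread is what keeps this from deadlocking: gpg starts emitting ciphertext before it has consumed all of its input, so both pipes have to be serviced at the same time.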

Apologies, just seen the bit about wanting to pass the key in directly
rather than using the GPG keyring because there are no email addresses
attached to your various keys.
Maybe a silly question, but can you give them email addresses to identify
them in a GPG-compatible manner? They don't have to have associated
mailboxes so the addresses can be totally made-up.
If not, I'd probably start to look to something like openssl to do the
symmetric encryption and manage the keys for that separately. It looks
possible to build a streaming AES implementation using the nonstreaming
functions in `cryptonite`, but the usual recommendation is that it's far too
risky to do any low-level crypto yourself, so this seems like a bad idea.

On 6 July 2017 at 17:23, David Turner wrote:
> Apologies, just seen the bit about wanting to pass the key in directly rather than using the GPG keyring because there are no email addresses attached to your various keys.
> Maybe a silly question, but can you give them email addresses to identify them in a GPG-compatible manner? They don't have to have associated mailboxes so the addresses can be totally made-up.
Yes, that's my fallback, since it's possible to tell gpg to use a different directory so I can provide a key externally with my transmission request, load it into a temporary store, grab the identity out and use that. It would just be more convenient to have a "here's a ByteString with the public key" option (which I can always implement as a wrapper function).
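The wrapper described here could look roughly like the following sketch. It is an illustration under assumptions, not tested against any particular GnuPG version: `importKey`, `fingerprints`, `splitColons` and the file name `recipient.pub` are hypothetical, and the caller is assumed to create (and later delete) the throwaway home directory. It uses gpg's `--homedir`, `--import`, `--with-colons` and `--with-fingerprint` options, and relies on the colon-separated listing format, where an `fpr` record carries the fingerprint in its tenth field.

```haskell
import qualified Data.ByteString as B
import           System.FilePath ((</>))
import           System.Process  (readProcess)

-- | Split one line of @--with-colons@ output into its fields.
splitColons :: String -> [String]
splitColons s = case break (== ':') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitColons rest

-- | Fingerprints found in @gpg --with-colons@ output, in listing order.
-- Subkey fingerprints may be listed too; the primary key comes first.
fingerprints :: String -> [String]
fingerprints out =
  [ fs !! 9
  | l <- lines out
  , let fs = splitColons l
  , take 1 fs == ["fpr"]
  , length fs > 9
  ]

-- | Import a public key, given as a value, into a throwaway gpg home
-- directory and return the fingerprints gpg then reports for it.
importKey :: FilePath -> B.ByteString -> IO [String]
importKey homedir key = do
  let keyFile = homedir </> "recipient.pub"   -- hypothetical file name
  B.writeFile keyFile key
  _ <- readProcess "gpg" ["--homedir", homedir, "--batch", "--import", keyFile] ""
  fingerprints <$> readProcess "gpg"
    ["--homedir", homedir, "--batch", "--with-colons", "--with-fingerprint", "--list-keys"] ""
```

The first fingerprint returned can then be passed as the `--recipient` of an encryption run that uses the same `--homedir`; since a freshly imported key carries no trust, such a run would probably also need `--trust-model always`.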

If your data fits in RAM maybe it's best to forget about the streaming and
use the saltine library's Box module. Then you can't really go wrong on the
crypto front.

On 6 July 2017 at 19:17, Patrick Chilton wrote:
> If your data fits in RAM maybe it's best to forget about the streaming and use the saltine library's Box module. Then you can't really go wrong on the crypto front.
If that were the case, I wouldn't be asking ;-)

This is something I've been interested in too, so with some guidance from Vincent I put together a short example of doing this:
https://gist.github.com/snoyberg/20243aae347b38ad09daaf8b129e2efb
It's got some magic values in a few places (especially that 65!), and the usage of leftovers/B.append is far from efficient. However, it should get the general idea across.

> https://gist.github.com/snoyberg/20243aae347b38ad09daaf8b129e2efb
> It's got some magic values in a few places (especially that 65!)
Off topic, but sometimes when I find myself using magic values I can't/won't get rid of, I'll just apply the good old habits learned in Java times. For example you might find a section at the top of a file with things like
    _DAYS_IN_A_WEEK_, _KNOWN_SIZE_OF_POINT_ :: Int
    -- | Seems to be a good approximation for now
    _DAYS_IN_A_WEEK_ = 7
    -- | When you ask "What's the point", this will not answer your question.
    _KNOWN_SIZE_OF_POINT_ = 65
Uppercase makes it easy to identify these as constants/magic values. The underscore in front works as an initial lowercase letter so they can be used as values despite uppercase.
The goal is not necessarily to make these values easy to change, but to add documentation to usage sites.
It's unnecessary in such a quick demonstration, but I wanted to take the opportunity to throw in my _VALUE_OF_CONTRIBUTION_IN_CENTS_ cents because I haven't seen others do something like this.
Cheers, MarLinn
PS: The underscore does introduce its own error message though.
    • Found hole: _VALUE_OF_CONTRIBUTION_IN_CENTS_ :: Double
      Or perhaps ‘_VALUE_OF_CONTRIBUTION_IN_CENTS_’ is mis-spelled, or not in scope

You can avoid the underscores by defining the constants as pattern synonyms
so that they are syntactically more like constructors. This also, of
course, allows you to use them as patterns, which can also be nice for
constants.
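A minimal sketch of what this could look like, assuming GHC 8.0 or later for the pattern signatures. The constant names mirror the earlier message with the underscores dropped; `describe` is a made-up function that exists only to show a constant used as a pattern.

```haskell
{-# LANGUAGE PatternSynonyms #-}

-- | Seems to be a good approximation for now.
pattern DAYS_IN_A_WEEK :: Int
pattern DAYS_IN_A_WEEK = 7

-- | The magic 65 from the gist, given a name.
pattern KNOWN_SIZE_OF_POINT :: Int
pattern KNOWN_SIZE_OF_POINT = 65

-- | The synonyms work on the left-hand side of an equation...
describe :: Int -> String
describe KNOWN_SIZE_OF_POINT = "an encoded point"
describe _                   = "something else"

-- | ...and as ordinary values in expressions.
daysInAFortnight :: Int
daysInAFortnight = 2 * DAYS_IN_A_WEEK
```

Because these are implicitly bidirectional synonyms over literal patterns, each name can be used both to match and to construct the value.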

> You can avoid the underscores by defining the constants as pattern synonyms so that they are syntactically more like constructors. This also, of course, allows you to use them as patterns, which can also be nice for constants.
Nice! This seems a bit evil, and not a full substitute. But it also helps hijack syntax highlighting, and it forces users to use a recent-ish version of GHC. I consider that A Good Thing™.
Now… as everything is “constant” in Haskell, on to define everything in capital letter patterns and return to good old COBOL times! 🍺 Wohoo!
Cheers, MarLinn

On Jul 6, 2017, at 12:58 AM, Ivan Lazar Miljenovic wrote:
> I have a use case for needing to use public key cryptography to encrypt a large amount of data in a streaming fashion (get it out of a DB, encrypt, put into an AWS S3 bucket).
What are the data-format requirements? Do you need (binary) CMS output? GPG-compatible output? Or just roll your own?
Integrity protection can be tricky with large data streams. Most data formats for enveloped data have a single MAC at the end, which means that the decoder has to consume all the data before it is known to be valid!
So if you're in a position to avoid a standard all-in-one format, it makes sense to "packetize" the stream, with integrity protection for each "packet", and packet sequence numbers to preserve overall stream integrity. With vast amounts of data, you'll want to be careful with the symmetric cipher modes: AEAD (AES-GCM, for example) protects only a limited amount of data before you need to rekey. It may be simplest to just generate a new symmetric key for every N megabytes of data.
With a careful design of the "packet" format, you can use in-memory crypto for each packet. Don't forget to include an "end-of-stream" packet to defeat truncation attacks.
-- Viktor.
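The framing part of this idea can be sketched as follows. This is only an illustration of sequence numbers plus an end-of-stream marker, and it deliberately contains no cryptography: in a real design each packet would be sealed with an AEAD, with the sequence number and final flag bound into the authenticated data, and that step is assumed here rather than shown. The `Packet` type and the names `packetize` and `reassemble` are hypothetical.

```haskell
import qualified Data.ByteString as B
import           Data.Word       (Word64)

-- | One unit of the stream. ASSUMPTION: per-packet encryption and
-- authentication happen elsewhere and cover all three fields.
data Packet = Packet
  { seqNo   :: Word64
  , isFinal :: Bool
  , payload :: B.ByteString
  } deriving (Eq, Show)

-- | Split the input into packets of at most @n@ bytes, numbered from 0,
-- followed by an empty end-of-stream packet.
packetize :: Int -> B.ByteString -> [Packet]
packetize n = go 0
  where
    size = max 1 n   -- guard against a non-positive chunk size
    go i bs
      | B.null bs = [Packet i True B.empty]
      | otherwise =
          let (chunk, rest) = B.splitAt size bs
          in  Packet i False chunk : go (i + 1) rest

-- | Rebuild the stream, rejecting reordered, missing or truncated input.
reassemble :: [Packet] -> Either String B.ByteString
reassemble = go 0 []
  where
    go _ _ [] = Left "truncated: no end-of-stream packet"
    go i acc (p : ps)
      | seqNo p /= i = Left "packet out of sequence"
      | isFinal p    = if null ps
                         then Right (B.concat (reverse acc))
                         else Left "data after end-of-stream"
      | otherwise    = go (i + 1) (payload p : acc) ps
```

Since each packet is bounded in size, it can be sealed and opened in memory, and a stream that simply stops early is detected because the end-of-stream packet never arrives.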

On 7 July 2017 at 01:44, Viktor Dukhovni wrote:
> What are the data-format requirements? Do you need (binary) CMS output? GPG-compatible output? Or just roll your own?
The intent is to be able to transfer data between two parties such that only the recipient is able to view it (hence the usage of public key cryptography). GPG/PGP compatibility is preferable as it's common, but anything that is sufficiently standardised will suffice (this will potentially be used by others who aren't using Haskell and thus can't just use a library to do so). (The other advantage of GPG/PGP is that the security testing team is more familiar with it and thus more likely to sign off on it.)
> Integrity protection can be tricky with large data streams. Most data formats for enveloped data have a single MAC at the end, which means that the decoder has to consume all the data before it is known to be valid!
> So if you're in a position to avoid a standard all-in-one format, it makes sense to "packetize" the stream, with integrity protection for each "packet", and packet sequence numbers to preserve overall stream integrity. With vast amounts of data, you'll want to be careful with the symmetric cipher modes, AEAD (AES-GCM, for example) protects only a limited amount of data before you need to rekey. It may be simplest to just generate a new symmetric key for every N megabytes of data.
> With a careful design of the "packet" format, you can use in-memory crypto for each packet. Don't forget to include an "end-of-stream" packet to defeat truncation attacks.
This sounds good in theory, but in practice I'm not versed enough in security to want to roll my own if I can avoid it, and trying to document such a format for others to use could be problematic.

If you are not a security expert, I would strongly recommend against rolling your own encryption scheme. If you are a security expert, you would probably not be asking us for advice on how to roll your own encryption scheme.
I would suggest that you find something off the rack that might meet your needs (whether it's written in Haskell or not) and make sure you understand how it works and what your threat model is well enough to decide whether it does, in fact, meet your needs.

I'm also not sure what benefit you would get from rolling your own versus
using AWS's server-side S3 bucket encryption[1], now that I think about it.
[1]
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.htm...
On Thu, Jul 6, 2017 at 3:56 PM Rein Henrichs
If you are not a security expert, I would strongly recommend against rolling your own encryption scheme. If you are a security expert, you would probably not be asking us for advice on how to roll your own encryption scheme. I would suggest that you find something off the rack that might meet your needs (whether it's written in Haskell or not) and make sure you understand how it works and what your threat model is well enough to decide whether it does, in fact, meet your needs.
On Thu, Jul 6, 2017 at 3:29 PM Ivan Lazar Miljenovic < ivan.miljenovic@gmail.com> wrote:
On 7 July 2017 at 01:44, Viktor Dukhovni
wrote: On Jul 6, 2017, at 12:58 AM, Ivan Lazar Miljenovic <
ivan.miljenovic@gmail.com> wrote:
I have a use case for needing to use public key cryptography to encrypt a large amount of data in a streaming fashion (get it out of a DB, encrypt, put into an AWS S3 bucket).
What are the data-format requirements? Do you need (binary) CMS output? GPG-compatible output? Or just roll your own?
The intent is to be able to transfer data between two parties such that only the recipient is able to view it (hence the usage of public key cryptography). GPG/PGP compatability is preferable as it's common, but anything that is sufficiently standardised (as this will potentially be used by others that aren't me doing so with Haskell and thus can't just use a library to do so) will suffice.
(The other advantage of GPG/PGP is that the security testing team is more familiar with it and thus likely to sign off on it.)
Integrity protection can be tricky with large data streams. Most data formats for enveloped data have a single MAC at the end, which means that the decoder has to consume all the data before it is known to be valid!
So if you're in a position to avoid a standard all-in-one format, it makes sense to "packetize" the stream, with integrity protection for each "packet", and packet sequence numbers to preserve overall stream integrity. With vast amounts of data, you'll want to be careful with the symmetric cipher modes, AEAD (AES-GCM, for example) protects only a limited amount of data before you need to rekey. It may be simplest to just generate a new symmetric key for every N megabytes of data.
With a careful design of the "packet" format, you can use in-memory crypto for each packet. Don't forget to include an "end-of-stream" packet to defeat truncation attacks.
This sounds good in theory, but in practice I'm not versed enough in security to want to try and roll my own if I could avoid it, and trying to document such a format for others to use could be problematic.allowed to post.
-- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com http://IvanMiljenovic.wordpress.com

On 7 July 2017 at 09:04, Rein Henrichs wrote:
I'm also not sure what benefit you would get from rolling your own versus using AWS's server-side S3 bucket encryption[1], now that I think about it.
[1] http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.htm...
Due to regulatory requirements, we can't rely upon that and need to ensure we never put any private data in AWS that even someone with login access to AWS can read.
On Thu, Jul 6, 2017 at 3:56 PM Rein Henrichs
wrote: If you are not a security expert, I would strongly recommend against rolling your own encryption scheme. If you are a security expert, you would probably not be asking us for advice on how to roll your own encryption scheme. I would suggest that you find something off the rack that might meet your needs (whether it's written in Haskell or not) and make sure you understand how it works and what your threat model is well enough to decide whether it does, in fact, meet your needs.
-- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com http://IvanMiljenovic.wordpress.com

Ah, yes. That would do it.

Hi,
I can't think of a terribly good way to achieve GPG/PGP-compatibility
without simply using GPG/PGP, since the file format is quite involved.
That said, here is how to implement a CBC-mode block cipher encryption
using Conduit, which is suitable for something like AES256 encryption. It
is almost certainly vulnerable to side-channel attacks (timing,
cache-poisoning, etc) but as a pure function from input to output it is
equivalent to `openssl aes-256-cbc -e -K <KEY-IN-HEX> -iv <IV-IN-HEX> -in
data/plain-text.txt` which I should hope would be standard enough for
analysis.
This leaves you with the problem of storing the key and IV securely,
encrypted using the asymmetric key that you first thought of, but hopefully
that problem is surmountable!
Cheers,
David

{-# LANGUAGE LambdaCase #-}
-- LambdaCase is needed for the `\case` in encryptConduit.

import Control.Monad
import Control.Monad.IO.Class
import Control.Monad.Trans.Resource
import Crypto.Cipher.AES
import Crypto.Cipher.Types
import Crypto.Data.Padding
import Crypto.Error
import qualified Data.ByteString as B
import Data.Conduit
import Data.Conduit.Binary
import Data.Monoid

loadKey :: IO B.ByteString
loadKey = B.readFile "data/key.dat"

loadIV :: IO (IV AES256)
loadIV = do
  bytes <- B.readFile "data/iv.dat"
  maybe (error "makeIV failed") return $ makeIV bytes

loadCipher :: IO AES256
loadCipher = throwCryptoErrorIO =<< cipherInit <$> loadKey

loadPlainText :: IO B.ByteString
loadPlainText = B.readFile "data/plain-text.txt"

-- Encrypt whole blocks as they arrive and carry any leftover bytes forward.
-- The IV for each chunk is the last ciphertext block of the previous chunk,
-- so the output matches a one-shot CBC encryption of the whole input.
encryptConduit :: (BlockCipher c, Monad m) => c -> IV c -> B.ByteString -> Conduit B.ByteString m B.ByteString
encryptConduit cipher iv partialBlock = await >>= \case
  -- End of input: pad and encrypt whatever is left over.
  Nothing -> yield $ cbcEncrypt cipher iv $ pad (PKCS7 (blockSize cipher)) partialBlock
  Just moreBytes -> let
      fullBlocks = (B.length moreBytes + B.length partialBlock) `div` blockSize cipher
      (thisTime, nextTime) = B.splitAt (fullBlocks * blockSize cipher) (partialBlock <> moreBytes)
    in do
      iv' <- if B.null thisTime then return iv else do
        let cipherText = cbcEncrypt cipher iv thisTime
            lastBlockOfCipherText = B.drop (B.length cipherText - blockSize cipher) cipherText
        yield cipherText
        maybe (error "makeIV failed") return $ makeIV lastBlockOfCipherText
      encryptConduit cipher iv' nextTime

-- Writes both a one-shot and a streaming encryption of the same file so the
-- two outputs can be compared.
go :: IO ()
go = do
  c <- loadCipher
  iv <- loadIV
  pt <- loadPlainText
  let padded = pad (PKCS7 (blockSize c)) $ pt
      encrypted = cbcEncrypt c iv padded
  B.writeFile "data/haskell-oneshot.dat" encrypted
  runResourceT $ runConduit
    $ sourceFile "data/plain-text.txt"
    =$= encryptConduit c iv mempty
    =$= sinkFile "data/haskell-streaming.dat"

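The remaining problem mentioned in the message above, shipping the key and IV along with the ciphertext, comes down to putting a small header in front of the encrypted stream. The sketch below shows one possible layout under stated assumptions: a 4-byte big-endian length, then the symmetric key already wrapped with the recipient's public key, then the IV. The names `encodeHeader` and `decodeHeader` and the layout itself are invented for illustration; they are not a standard format, and the asymmetric wrapping of the key is assumed to happen elsewhere.

```haskell
import Control.Monad (guard)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Data.ByteString.Builder (byteString, toLazyByteString, word32BE)

-- | Hypothetical header layout: 4-byte big-endian length of the wrapped key,
-- the wrapped (asymmetrically encrypted) key, then the IV.
encodeHeader :: B.ByteString -> B.ByteString -> B.ByteString
encodeHeader wrappedKey iv =
  BL.toStrict . toLazyByteString $ mconcat
    [ word32BE (fromIntegral (B.length wrappedKey))
    , byteString wrappedKey
    , byteString iv
    ]

-- | Parse the header back out, given the expected IV length (16 for AES).
-- Returns the wrapped key, the IV, and the remaining ciphertext bytes.
decodeHeader :: Int -> B.ByteString -> Maybe (B.ByteString, B.ByteString, B.ByteString)
decodeHeader ivLen bytes = do
  let (lenBytes, rest) = B.splitAt 4 bytes
  guard (B.length lenBytes == 4)
  let keyLen = B.foldl' (\acc w -> acc * 256 + fromIntegral w) 0 lenBytes :: Int
      (wrappedKey, rest') = B.splitAt keyLen rest
      (iv, body) = B.splitAt ivLen rest'
  guard (B.length wrappedKey == keyLen && B.length iv == ivLen)
  Just (wrappedKey, iv, body)
```

A writer would emit `encodeHeader wrappedKey iv` first and then the output of the streaming encryption; a reader calls `decodeHeader` on the first bytes it receives and feeds the rest to the decryptor. Note that this header carries no integrity protection of its own.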

On Tue, Jul 11, 2017 at 03:35:56PM +0100, David Turner wrote:
I can't think of a terribly good way to achieve GPG/PGP-compatibility without simply using GPG/PGP, since the file format is quite involved.
That said, here is how to implement a CBC-mode block cipher encryption using Conduit, which is suitable for something like AES256 encryption. It is almost certainly vulnerable to side-channel attacks (timing, cache-poisoning, etc) but as a pure function from input to output it is equivalent to `openssl aes-256-cbc -e -K <KEY-IN-HEX> -iv <IV-IN-HEX> -in data/plain-text.txt` which I should hope would be standard enough for analysis.
Just straight CBC lacks integrity protection. A MAC is still required, and usually one wants asymmetric key exchange, rather than a shared symmetric key. So in practice one wants something like CMS (successor to S/MIME). The OpenSSL cms(1) command can do clear signing, encryption, or both (in either order) by piping the output of one to the other. In many applications it is safer to encrypt then sign, rather than sign and then encrypt.
Packetizing large input streams is well worth it, but while each "packet" can use any of a number of standard formats, some standards (notably CMS AFAIK) may lack support for packetizing large input streams. The OpenPGP standard does support breaking streams into "packets", so for large streams that may be optimal; you just need an OpenPGP library implementation that supports sensible packet sizes, and perhaps an FFI interface for Haskell.
Alternatively, just a pipe to a CLI will do, but the "gpg" CLI does not appear to support creating streams with more than one packet (unless this happens implicitly for "large-enough" streams).
-- Viktor.
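For the "pipe to a CLI" route, a minimal sketch using the `process` package might look like this. It assumes `gpg` is on the PATH and that the recipient's public key is already in the keyring and trusted; `gpgArgs` and `encryptHandle` are names made up for this example. The child process reads and writes the given handles directly, so nothing is buffered in the Haskell program.

```haskell
import System.Exit (ExitCode)
import System.IO (Handle)
import System.Process (CreateProcess (..), StdStream (..), createProcess, proc, waitForProcess)

-- | Arguments for a non-interactive public-key encryption to one recipient.
-- With no input file named, gpg reads stdin and writes to stdout.
gpgArgs :: String -> [String]
gpgArgs recipient = ["--batch", "--encrypt", "--recipient", recipient]

-- | Run gpg with the given handles as its stdin and stdout, and wait for it
-- to finish. Assumes the recipient key is already known to the local keyring.
encryptHandle :: String -> Handle -> Handle -> IO ExitCode
encryptHandle recipient hIn hOut = do
  (_, _, _, ph) <- createProcess (proc "gpg" (gpgArgs recipient))
    { std_in = UseHandle hIn
    , std_out = UseHandle hOut
    }
  waitForProcess ph
```

The handles can be files, pipes, or sockets, so the same function could sit between a database export and an upload step; checking the returned `ExitCode` is left to the caller.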
participants (8)
- David Turner
- Ivan Lazar Miljenovic
- Jake McArthur
- MarLinn
- Michael Snoyman
- Patrick Chilton
- Rein Henrichs
- Viktor Dukhovni