PSA: If you're serializing floating point numbers, don't use binary

Serialization of floating point numbers with binary is fantastically slow, and incorrect if you’re using NaNs. See
- https://github.com/kolmodin/binary/issues/64
- https://github.com/kolmodin/binary/issues/69

I recently spent half a day debugging performance problems because of this, and since backwards compatibility with older formats is required, this problem is probably not going to be solved.

We decided to switch to cereal for this reason. With some patches (https://github.com/GaloisInc/cereal/pull/46) cereal was 30x faster for the data we were serializing (scientific computing, mostly Doubles packed in nested records containing vectors).

The size of the serialized data is also roughly 3 times smaller – with binary a Double takes at least 25 bytes of space instead of 8. With Float it’s even worse, 25 bytes instead of 4.
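To illustrate the NaN and size complaints, here is a minimal sketch that assumes binary's `Double` instance goes through `decodeFloat`/`encodeFloat`, which is how I understand the linked issues. It uses only `base`; `viaDecodeFloat` is a hypothetical helper for illustration, not part of binary's API.

```haskell
-- Sketch only: mimics a decodeFloat-based encoding without using binary.

-- | Round-trip a Double through its (mantissa, exponent) pair.
-- Hypothetical helper, named here for illustration.
viaDecodeFloat :: Double -> Double
viaDecodeFloat d = uncurry encodeFloat (decodeFloat d)

main :: IO ()
main = do
  let nan = 0 / 0 :: Double
  -- The pair that gets serialized in place of the raw 8 bytes:
  -- an arbitrary-precision Integer plus an Int.
  print (decodeFloat (1.5 :: Double))
  -- Ordinary values survive the round trip.
  print (viaDecodeFloat 1.5 == 1.5)
  -- A NaN goes in, but what comes back is no longer a NaN.
  print (isNaN nan, isNaN (viaDecodeFloat nan))
```

The `Integer` mantissa and the `Int` exponent each carry their own overhead when serialized, which is plausibly where the extra bytes come from.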

I unsafeCoerce to Word64 first. This usually helps in making the output
compatible with non-Haskell applications and is much faster.
Tom
On Tue, Feb 9, 2016 at 10:19 AM, Francesco Mazzoli wrote:
> [...]
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
--
Tom Nielsen | Chief Data Science Officer
Beautiful Destinations | www.beautifuldestinations.com

`cereal` and the `bytestring` `Builder` both work by converting the `Double` to a `Word64` and writing that out big-endian. A bit more expensive than a blind `unsafeCoerce`, but same idea.
Francesco
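A minimal sketch of that idea, assuming a `base` recent enough to export `castDoubleToWord64` and `castWord64ToDouble` from `GHC.Float` (4.10 or later, so newer than this thread). They are used here as a safe stand-in for `unsafeCoerce`. `encodeDouble` and `decodeDouble` are hypothetical names for illustration, not cereal's or bytestring's API.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word64, Word8)
import GHC.Float (castDoubleToWord64, castWord64ToDouble)

-- | Write a Double as its 8-byte IEEE 754 bit pattern, big-endian.
encodeDouble :: Double -> BL.ByteString
encodeDouble = B.toLazyByteString . B.word64BE . castDoubleToWord64

-- | Read 8 big-endian bytes back into a Double; Nothing on a wrong length.
decodeDouble :: BL.ByteString -> Maybe Double
decodeDouble bs
  | BL.length bs == 8 = Just (castWord64ToDouble (BL.foldl' step 0 bs))
  | otherwise = Nothing
  where
    -- Shift the accumulator left by one byte and append the next byte.
    step :: Word64 -> Word8 -> Word64
    step acc b = acc * 256 + fromIntegral b

main :: IO ()
main = do
  let x = 1.5 :: Double
      nan = 0 / 0 :: Double
  print (BL.unpack (encodeDouble x))
  -- Same bytes as the Builder's own Double encoder.
  print (encodeDouble x == B.toLazyByteString (B.doubleBE x))
  print (decodeDouble (encodeDouble x))
  -- NaN stays NaN because the bit pattern is copied verbatim.
  print (fmap isNaN (decodeDouble (encodeDouble nan)))
```

Every `Double` takes exactly 8 bytes this way, and the output is the plain IEEE 754 layout that non-Haskell readers expect.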
On 9 Feb 2016, at 11:40, Tom Nielsen wrote:
> I unsafeCoerce to Word64 first. This usually helps in making the output compatible with non-Haskell applications and is much faster.
participants (2)
- Francesco Mazzoli
- Tom Nielsen