haskell crypto is reaaaaaaaaaally slow

Anatoly Yakovenko

20 Jun 2007 20 Jun '07

3:18 a.m.

$ time ./md5sum ./md5sum [105,252,52,138,187,192,216,17,225,123,185,3,124,101,86,132] real 0m4.790s user 0m3.688s sys 0m0.492s $ time md5sum ./md5sum 69fc348abbc0d811e17bb9037c655684 ./md5sum real 0m0.023s user 0m0.000s sys 0m0.008s this is my implementation using crypto (http://www.haskell.org/crypto/). Am I doing something wrong? module Main where import System import qualified Data.Digest.MD5 as MD5 import qualified Data.ByteString as BS main = do args <- getArgs dt <- BS.readFile $ head args putStrLn $ show $ MD5.hash . BS.unpack $ dt

Show replies by date

dons＠cse.unsw.edu.au

20 Jun 20 Jun

3:24 a.m.

aeyakovenko:

...

$ time ./md5sum ./md5sum [105,252,52,138,187,192,216,17,225,123,185,3,124,101,86,132]

real 0m4.790s user 0m3.688s sys 0m0.492s

$ time md5sum ./md5sum 69fc348abbc0d811e17bb9037c655684 ./md5sum

real 0m0.023s user 0m0.000s sys 0m0.008s

this is my implementation using crypto (http://www.haskell.org/crypto/). Am I doing something wrong?

module Main where

import System import qualified Data.Digest.MD5 as MD5 import qualified Data.ByteString as BS

main = do args <- getArgs dt <- BS.readFile $ head args putStrLn $ show $ MD5.hash . BS.unpack $ dt ^^^^ ^^^^^^^^^

not a good idea. You need an MD5 over bytestrings, not [Word8]. -- Don

Anatoly Yakovenko

3:26 a.m.

...

...
module Main where

import System import qualified Data.Digest.MD5 as MD5 import qualified Data.ByteString as BS

main = do args <- getArgs dt <- BS.readFile $ head args putStrLn $ show $ MD5.hash . BS.unpack $ dt ^^^^ ^^^^^^^^^

not a good idea.

You need an MD5 over bytestrings, not [Word8].

Where do I get one?

Stefan O'Rear

3:34 a.m.

On Wed, Jun 20, 2007 at 01:24:00PM +1000, Donald Bruce Stewart wrote:

...

aeyakovenko:

...
$ time ./md5sum ./md5sum [105,252,52,138,187,192,216,17,225,123,185,3,124,101,86,132]

real 0m4.790s user 0m3.688s sys 0m0.492s

$ time md5sum ./md5sum 69fc348abbc0d811e17bb9037c655684 ./md5sum

real 0m0.023s user 0m0.000s sys 0m0.008s

this is my implementation using crypto (http://www.haskell.org/crypto/). Am I doing something wrong?

module Main where

import System import qualified Data.Digest.MD5 as MD5 import qualified Data.ByteString as BS

main = do args <- getArgs dt <- BS.readFile $ head args putStrLn $ show $ MD5.hash . BS.unpack $ dt ^^^^ ^^^^^^^^^

not a good idea.

You need an MD5 over bytestrings, not [Word8].

Wouldn't deforestation have produced something within a factor of 4-ish of optimal? Stefan

Anatoly Yakovenko

4:51 a.m.

yea, this still gives me 4 seconds: module Main where import System import System.Time import qualified Data.Digest.MD5 as MD5 import qualified Data.ByteString as BS main = do args <- getArgs dt <- BS.readFile $ head args let unpacked = BS.unpack dt print $ length $! unpacked start <- getClockTime print $ MD5.hash $! unpacked end <- getClockTime print $ diffClockTimes end start $ ./md5sum ./md5sum 633413 [194,27,77,139,124,90,48,7,141,255,193,164,169,2,128,63] TimeDiff {tdYear = 0, tdMonth = 0, tdDay = 0, tdHour = 0, tdMin = 0, tdSec = 4, tdPicosec = 21077000000

dons＠cse.unsw.edu.au

4:56 a.m.

stefanor:

...

On Wed, Jun 20, 2007 at 01:24:00PM +1000, Donald Bruce Stewart wrote:

...
aeyakovenko:

...
$ time ./md5sum ./md5sum [105,252,52,138,187,192,216,17,225,123,185,3,124,101,86,132]

real 0m4.790s user 0m3.688s sys 0m0.492s

$ time md5sum ./md5sum 69fc348abbc0d811e17bb9037c655684 ./md5sum

real 0m0.023s user 0m0.000s sys 0m0.008s

this is my implementation using crypto (http://www.haskell.org/crypto/). Am I doing something wrong?

module Main where

import System import qualified Data.Digest.MD5 as MD5 import qualified Data.ByteString as BS

main = do args <- getArgs dt <- BS.readFile $ head args putStrLn $ show $ MD5.hash . BS.unpack $ dt ^^^^ ^^^^^^^^^

not a good idea.

You need an MD5 over bytestrings, not [Word8].

Wouldn't deforestation have produced something within a factor of 4-ish of optimal?

There's no deforestation between ByteString.unpack and MD5.hash. So all the time here is spent taking some big bytestring, and unpacking it into n list cons cells. md5ing the bytestring directly should knock 95% off that. -- Don

dons＠cse.unsw.edu.au

4:54 a.m.

dons:

...

aeyakovenko:

...
$ time ./md5sum ./md5sum [105,252,52,138,187,192,216,17,225,123,185,3,124,101,86,132]

real 0m4.790s user 0m3.688s sys 0m0.492s

$ time md5sum ./md5sum 69fc348abbc0d811e17bb9037c655684 ./md5sum

real 0m0.023s user 0m0.000s sys 0m0.008s

this is my implementation using crypto (http://www.haskell.org/crypto/). Am I doing something wrong?

module Main where

import System import qualified Data.Digest.MD5 as MD5 import qualified Data.ByteString as BS

main = do args <- getArgs dt <- BS.readFile $ head args putStrLn $ show $ MD5.hash . BS.unpack $ dt ^^^^ ^^^^^^^^^

not a good idea.

You need an MD5 over bytestrings, not [Word8].

-- Don

I note a couple of other issues: crypto is compiled with: Ghc-options: -fglasgow-exts that is, no optimisations, although it would certainly benefit from Ghc-options: -O2 -fexcess-precision -funbox-strict-fields So I'd recompile the crypto package with that first. Then be sure to compile your code with ghc -O2 -- bytestrings love -O2. Finally, to actually get C speed, use a C md5. Here's an example Haskell binding to the OpenSSL libraries 'md5' function, which you can compile and run like so: $ ghc MD5.hs -lcrypto -o hsmd5 $ time ./hsmd5 /usr/share/dict/words MD5 (/usr/share/dict/words) = e5c152147e93b81424c13772330e74b3 ./hsmd5 /usr/share/dict/words 0.01s user 0.02s system 80% cpu 0.029 total versus my system's 'md5' program: $ time md5 /usr/share/dict/words MD5 (/usr/share/dict/words) = e5c152147e93b81424c13772330e74b3 md5 /usr/share/dict/words 0.02s user 0.00s system 29% cpu 0.052 total Oh huh, that's interesting... And the code: {-# OPTIONS -O2 -fffi -#include "openssl/md5.h" #-} -- -- A few imports, should tidy these up one day. -- import System.Environment import qualified Data.ByteString.Base as B (unsafeUseAsCStringLen) import qualified Data.ByteString as B import Foreign import Foreign.C.Types import Numeric import Text.Printf main = do (f:_) <- getArgs src <- B.readFile f printf "MD5 (%s) = %s \n" f (md5sum src) -- --------------------------------------------------------------------- -- -- Fast md5 using OpenSSL and zero-copying bytestrings -- -- -- The md5 hash should be referentially transparent.. -- md5sum :: B.ByteString -> String md5sum p = unsafePerformIO $ B.unsafeUseAsCStringLen p $ \(ptr,n) -> do digest <- c_md5 ptr (fromIntegral n) nullPtr go digest 0 [] where -- print it in 0-padded hex format go :: Ptr Word8 -> Int -> [String] -> IO String go p n acc | n >= 16 = return $ concat (reverse acc) | otherwise = do w <- peekElemOff p n go p (n+1) (draw w : acc) draw w = case showHex w [] of [x] -> ['0', x] x -> x -- unsigned char *MD5(const unsigned char *d, unsigned long n, unsigned char *md); -- foreign import ccall "openssl/md5.h MD5" c_md5 :: Ptr CChar -> CULong -> Ptr CChar -> IO (Ptr Word8) Happy hacking. -- Don

dons＠cse.unsw.edu.au

5:21 a.m.

...

...
aeyakovenko:

...
$ time ./md5sum ./md5sum [105,252,52,138,187,192,216,17,225,123,185,3,124,101,86,132]

real 0m4.790s user 0m3.688s sys 0m0.492s

$ time md5sum ./md5sum 69fc348abbc0d811e17bb9037c655684 ./md5sum

real 0m0.023s user 0m0.000s sys 0m0.008s

I wasn't happy with the hex printing loop. Here's a shorter version. {-# OPTIONS -O2 -fffi #-} -- -- ghc MD5.hs -o hsmd5 -lcrypto -- import System.Environment import qualified Data.ByteString.Base as B (unsafeUseAsCStringLen) import qualified Data.ByteString as B import Foreign import Foreign.C.Types import Numeric import Text.Printf import Control.Monad main = do (f:_) <- getArgs src <- B.readFile f printf "MD5 (%s) = %s \n" f (md5sum src) -- Fast md5 using OpenSSL and non-copying bytestrings md5sum :: B.ByteString -> String md5sum p = unsafePerformIO $ B.unsafeUseAsCStringLen p $ \(ptr,n) -> do digest <- c_md5 ptr (fromIntegral n) nullPtr liftM concat $ forM [0..15] $ \n -> do w <- peekElemOff digest n return $ case showHex w [] of [x] -> ['0', x]; x -> x -- unsigned char *MD5(const unsigned char *d, unsigned long n, unsigned char *md); foreign import ccall "openssl/md5.h MD5" c_md5 :: Ptr CChar -> CULong -> Ptr CChar -> IO (Ptr Word8) ByteStrings were designed for this zero-copy passing of big data to C, by the way, so its a perfect fit. -- Don

dons＠cse.unsw.edu.au

5:58 a.m.

New subject: Haskell, crypto and OpenSSL

dons:

...

-- Fast md5 using OpenSSL and non-copying bytestrings md5sum :: B.ByteString -> String md5sum p = unsafePerformIO $ B.unsafeUseAsCStringLen p $ \(ptr,n) -> do digest <- c_md5 ptr (fromIntegral n) nullPtr liftM concat $ forM [0..15] $ \n -> do w <- peekElemOff digest n return $ case showHex w [] of [x] -> ['0', x]; x -> x

-- unsigned char *MD5(const unsigned char *d, unsigned long n, unsigned char *md); foreign import ccall "openssl/md5.h MD5" c_md5 :: Ptr CChar -> CULong -> Ptr CChar -> IO (Ptr Word8)

ByteStrings were designed for this zero-copy passing of big data to C, by the way, so its a perfect fit.

by the way, I note we do have a binding to OpenSSL, of sorts: http://cryp.to/hopenssl/ But it needs updating to use ByteStrings. Would make a good project for someone into crypto. -- Don

Anatoly Yakovenko

6:15 a.m.

i don't think its a problem with bytestrings. I strictly unpacked the bytestring and i still have 4 seconds for an md5 sum of the [word8] On 6/19/07, Donald Bruce Stewart wrote:

...

...
...
aeyakovenko:

...
$ time ./md5sum ./md5sum [105,252,52,138,187,192,216,17,225,123,185,3,124,101,86,132]

real 0m4.790s user 0m3.688s sys 0m0.492s

$ time md5sum ./md5sum 69fc348abbc0d811e17bb9037c655684 ./md5sum

real 0m0.023s user 0m0.000s sys 0m0.008s

I wasn't happy with the hex printing loop. Here's a shorter version.

{-# OPTIONS -O2 -fffi #-} -- -- ghc MD5.hs -o hsmd5 -lcrypto --

import System.Environment import qualified Data.ByteString.Base as B (unsafeUseAsCStringLen) import qualified Data.ByteString as B import Foreign import Foreign.C.Types import Numeric import Text.Printf import Control.Monad

main = do (f:_) <- getArgs src <- B.readFile f printf "MD5 (%s) = %s \n" f (md5sum src)

-- Fast md5 using OpenSSL and non-copying bytestrings md5sum :: B.ByteString -> String md5sum p = unsafePerformIO $ B.unsafeUseAsCStringLen p $ \(ptr,n) -> do digest <- c_md5 ptr (fromIntegral n) nullPtr liftM concat $ forM [0..15] $ \n -> do w <- peekElemOff digest n return $ case showHex w [] of [x] -> ['0', x]; x -> x

-- unsigned char *MD5(const unsigned char *d, unsigned long n, unsigned char *md); foreign import ccall "openssl/md5.h MD5" c_md5 :: Ptr CChar -> CULong -> Ptr CChar -> IO (Ptr Word8)

ByteStrings were designed for this zero-copy passing of big data to C, by the way, so its a perfect fit.

-- Don

dons＠cse.unsw.edu.au

6:22 a.m.

aeyakovenko:

...

i don't think its a problem with bytestrings. I strictly unpacked the bytestring and i still have 4 seconds for an md5 sum of the [word8]

Until someone writes an md5 on bytestrings, we can't really say how much the slowdown is due to unpacking, GC and other memory stress building and walking the big list, or the list-based md5 sum. There's a few sources of problems once you decide to build a giant list from each byte in the file. I'd suspect a pure haskell md5 over bytestrings would be competitive with a C implemetation though. Easier to just call OpenSSL. -- Don

Michael T. Richter

6:46 a.m.

On Wed, 2007-20-06 at 15:21 +1000, Donald Bruce Stewart wrote:

...

-- unsigned char *MD5(const unsigned char *d, unsigned long n, unsigned char *md); foreign import ccall "openssl/md5.h MD5" c_md5 :: Ptr CChar -> CULong -> Ptr CChar -> IO (Ptr Word8)

...

ByteStrings were designed for this zero-copy passing of big data to C, by the way, so its a perfect fit.

I'm not so sure I like the idea of having to do this by passing it over to C. Crypto sounds like exactly the kind of application that would require the certainty of FP reasoning. Is there no way to make it work reasonably efficiently in Haskell proper? -- Michael T. Richter (GoogleTalk: ttmrichter@gmail.com) I can see computers everywhere - except in the productivity statistics! (Robert Solow)

dons＠cse.unsw.edu.au

6:50 a.m.

ttmrichter:

...

On Wed, 2007-20-06 at 15:21 +1000, Donald Bruce Stewart wrote:

-- unsigned char *MD5(const unsigned char *d, unsigned long n, un signed char *md); foreign import ccall "openssl/md5.h MD5" c_md5 :: Ptr CChar -> CULong -> Ptr CChar -> IO (Ptr Word8)

ByteStrings were designed for this zero-copy passing of big data to C , by the way, so its a perfect fit.

I'm not so sure I like the idea of having to do this by

You don't *have* to do it via C, but you *can* do it, if you want.

...

passing it over to C. Crypto sounds like exactly the kind of application that would require the certainty of FP reasoning. Is there no way to make it work reasonably efficiently in Haskell proper?

Sure, why not? We've a good native code compiler, after all. Write an md5 over ByteString -- it should be pretty competitive. Here's a reference C implementation to start from: http://www.cse.unsw.edu.au/~dons/tmp/md5.c -- Don

Dominic Steinitz

8:23 p.m.

I'm probably missing something here but writing MD5 (and for that matter SHA1) requires bit twiddling operations (Data.Bits) and these aren't defined for ByteString. For example, SHA1 defines the following function and it's not clear to me how you'd implement this for ByteString rather than Word8. f n x y z | n <= 19 = (x .&. y) .|. ((complement x) .&. z) | n <= 39 = x `xor` y `xor` z | n <= 59 = (x .&. y) .|. (x .&. z) .|. (y .&. z) | n <= 79 = x `xor` y `xor` z I'd love to have blazzingly fast implementations for all the functions in the crypto library so if anyone feels inclined, any contributions would be very gratefully accepted. Unfortunately, I don't have the time to do this myself. Dominic.

David Roundy

8:57 p.m.

As its name implies, ByteString stores a string of bytes, which are Word8. It's a replacement for a list, not for Word8. David On Wed, Jun 20, 2007 at 09:23:51PM +0100, Dominic Steinitz wrote:

...

I'm probably missing something here but writing MD5 (and for that matter SHA1) requires bit twiddling operations (Data.Bits) and these aren't defined for ByteString. For example, SHA1 defines the following function and it's not clear to me how you'd implement this for ByteString rather than Word8.

f n x y z | n <= 19 = (x .&. y) .|. ((complement x) .&. z) | n <= 39 = x `xor` y `xor` z | n <= 59 = (x .&. y) .|. (x .&. z) .|. (y .&. z) | n <= 79 = x `xor` y `xor` z

I'd love to have blazzingly fast implementations for all the functions in the crypto library so if anyone feels inclined, any contributions would be very gratefully accepted. Unfortunately, I don't have the time to do this myself.

Dominic.

Andrew Coppin

6:50 p.m.

Donald Bruce Stewart wrote:

...

Finally, to actually get C speed, use a C md5.

I always feel worried when people say this... It's almost like admitting "hey, Haskell is beautiful, but it can never be fast". I always find myself wanting that statement to be false...

Jason Dagit

10:23 p.m.

On 6/20/07, Andrew Coppin wrote:

...

Donald Bruce Stewart wrote:

...
Finally, to actually get C speed, use a C md5.

I always feel worried when people say this... It's almost like admitting "hey, Haskell is beautiful, but it can never be fast". I always find myself wanting that statement to be false...

I agree with you, but at the same time, if Don says something about the performance of Haskell I tend to trust him on it. I find that his ability to optimize Haskell tends to set the bar. Otoh, I'd love to see someone demonstrate otherwise here :) Jason

Derek Elkins

10:34 p.m.

On Wed, 2007-06-20 at 15:23 -0700, Jason Dagit wrote:

...

On 6/20/07, Andrew Coppin wrote:

...
Donald Bruce Stewart wrote:

...
Finally, to actually get C speed, use a C md5.

I always feel worried when people say this... It's almost like admitting "hey, Haskell is beautiful, but it can never be fast". I always find myself wanting that statement to be false...

I agree with you, but at the same time, if Don says something about the performance of Haskell I tend to trust him on it. I find that his ability to optimize Haskell tends to set the bar. Otoh, I'd love to see someone demonstrate otherwise here :)

Well this is something else Don said,

...

I'd suspect a pure haskell md5 over bytestrings would be competitive with a C implemetation though. Easier to just call OpenSSL.

Anatoly Yakovenko

11:11 p.m.

I don't think the problem with performance of crypto has anything to do with unpacking ByteStrings. If I unpack the bytestrings first, then run the hash, and just time the hash algorithm, i still get 4 seconds with crypto where the C implementation gives me 0.02 seconds. Thats 200 times slower in haskell, to me it just seems like a bad implementation. You should be able to stay within an order of magnitude from C with haskell without resorting to weird compiler tricks. Anatoly On 6/20/07, Derek Elkins wrote:

...

On Wed, 2007-06-20 at 15:23 -0700, Jason Dagit wrote:

...
On 6/20/07, Andrew Coppin wrote:

...
Donald Bruce Stewart wrote:

...
Finally, to actually get C speed, use a C md5.

I always feel worried when people say this... It's almost like admitting "hey, Haskell is beautiful, but it can never be fast". I always find myself wanting that statement to be false...

I agree with you, but at the same time, if Don says something about the performance of Haskell I tend to trust him on it. I find that his ability to optimize Haskell tends to set the bar. Otoh, I'd love to see someone demonstrate otherwise here :)

Well this is something else Don said,

...
I'd suspect a pure haskell md5 over bytestrings would be competitive with a C implemetation though. Easier to just call OpenSSL.

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Derek Elkins

11:19 p.m.

On Wed, 2007-06-20 at 16:11 -0700, Anatoly Yakovenko wrote:

...

I don't think the problem with performance of crypto has anything to do with unpacking ByteStrings. If I unpack the bytestrings first, then run the hash, and just time the hash algorithm, i still get 4 seconds with crypto where the C implementation gives me 0.02 seconds. Thats 200 times slower in haskell, to me it just seems like a bad implementation. You should be able to stay within an order of magnitude from C with haskell without resorting to weird compiler tricks.

A list of Word8 is -extremely- inefficient.

David Roundy

11:49 p.m.

On Wed, Jun 20, 2007 at 06:19:35PM -0500, Derek Elkins wrote:

...

On Wed, 2007-06-20 at 16:11 -0700, Anatoly Yakovenko wrote:

...
I don't think the problem with performance of crypto has anything to do with unpacking ByteStrings. If I unpack the bytestrings first, then run the hash, and just time the hash algorithm, i still get 4 seconds with crypto where the C implementation gives me 0.02 seconds. Thats 200 times slower in haskell, to me it just seems like a bad implementation. You should be able to stay within an order of magnitude from C with haskell without resorting to weird compiler tricks.

A list of Word8 is -extremely- inefficient.

To expand on that terse (but very true) statement, a list of Word8 increases the space usage by a factor of probably around an order of magnitude (two pointers + 1 byte vs 1 byte), completely destroys your memory access pattern (which becomes random-access rather than sequential), and introduces additional nonlocal memory accesses. One would hope that a hash function would be moderately close to being memory-limited, so we're talking about a multiple-order-of-magnitude slowdown. The main benefit of ByteString isn't weird compiler tricks, it's using a reasonable data structure for the problem (although the weird compiler tricks help, too). -- David Roundy Department of Physics Oregon State University

Stefan O'Rear

11:53 p.m.

On Wed, Jun 20, 2007 at 04:49:55PM -0700, David Roundy wrote:

...

To expand on that terse (but very true) statement, a list of Word8 increases the space usage by a factor of probably around an order of magnitude (two pointers + 1 byte vs 1 byte), completely destroys your

Three pointers. [ INFO PTR (like a tag but not quite) ] [ PTR to Word8 (these are hashconsed, thankfully) ] [ PTR to next value ] Stefan

Duncan Coutts

21 Jun 21 Jun

3:36 a.m.

On Wed, 2007-06-20 at 16:53 -0700, Stefan O'Rear wrote:

...

On Wed, Jun 20, 2007 at 04:49:55PM -0700, David Roundy wrote:

...
To expand on that terse (but very true) statement, a list of Word8 increases the space usage by a factor of probably around an order of magnitude (two pointers + 1 byte vs 1 byte), completely destroys your

Three pointers.

[ INFO PTR (like a tag but not quite) ] [ PTR to Word8 (these are hashconsed, thankfully) ] [ PTR to next value ]

So that's 12 bytes on a 32bit box, or 24 or a 64bit one, to represent one byte of data. For comparison, ByteStrings have a bigger overhead but a lower linear factor: sizeof [Word8] of length n : 32bit: n * 12 64bit: n * 24 sizeof ByteString of length n : 32bit: 40 + n (or 32 + n for shared bytestrings like substrings) 64bit: 80 + n (or 64 + n for shared) Incidentally a more space efficient representation that could preserve the same operations speeds (like O(1) substring) would be: data ByteString = BS ByteArray# Int Int which would be 4 unshared words and 2 shared words rather than the current 5 unshared and 4 shared that we get from using ForeignPtrs. The smallest possible would be 2 words overhead by just using a ByteArray#, but that sacrifices O(1) substring which is pretty important for a functional style. Duncan

Stefan O'Rear

3:43 a.m.

On Thu, Jun 21, 2007 at 04:36:13AM +0100, Duncan Coutts wrote:

...

The smallest possible would be 2 words overhead by just using a ByteArray#, but that sacrifices O(1) substring which is pretty important for a functional style.

Bulat Ziganshin

4:14 a.m.

New subject: Re[2]: haskell crypto is reaaaaaaaaaally slow

Hello Duncan, Thursday, June 21, 2007, 7:36:13 AM, you wrote:

...

The smallest possible would be 2 words overhead by just using a ByteArray#,

i tried it once and found that ByteArray# size is returned rounded to 4 - there is no way in GHC runtime to alloc, say, exactly 37 bytes. and don't forget to add 2 unused bytes at average -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Duncan Coutts

4:48 a.m.

New subject: Re[2]: haskell crypto is reaaaaaaaaaally slow

On Thu, 2007-06-21 at 08:14 +0400, Bulat Ziganshin wrote:

...

Hello Duncan,

Thursday, June 21, 2007, 7:36:13 AM, you wrote:

...
The smallest possible would be 2 words overhead by just using a ByteArray#,

i tried it once and found that ByteArray# size is returned rounded to 4 - there is no way in GHC runtime to alloc, say, exactly 37 bytes. and don't forget to add 2 unused bytes at average

Right, GHC heap object are always aligned to the natural alignment of the architecture, be that 4 or 8 bytes. Try the same experiment with C's malloc. I'd be very surprised if you can allocate 37 bytes and not end up using 40 (plus some extra for remembering the allocation length). Duncan

Bulat Ziganshin

22 Jun 22 Jun

6:52 a.m.

New subject: Re[4]: haskell crypto is reaaaaaaaaaally slow

Hello Duncan, Thursday, June 21, 2007, 8:48:53 AM, you wrote:

...

...
...
The smallest possible would be 2 words overhead by just using a ByteArray#,

i tried it once and found that ByteArray# size is returned rounded to 4 - there is no way in GHC runtime to alloc, say, exactly 37 bytes. and don't forget to add 2 unused bytes at average

...

Right, GHC heap object are always aligned to the natural alignment of the architecture, be that 4 or 8 bytes.

...

Try the same experiment with C's malloc. I'd be very surprised if you can allocate 37 bytes and not end up using 40 (plus some extra for remembering the allocation length).

that i'm trying to say is that one need to store exact string size because value returned by getSizeOfByteArray is aligned to 4 -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Duncan Coutts

3:55 p.m.

New subject: Re[4]: haskell crypto is reaaaaaaaaaally slow

On Fri, 2007-06-22 at 10:52 +0400, Bulat Ziganshin wrote:

...

...
...
i tried it once and found that ByteArray# size is returned rounded to 4 - there is no way in GHC runtime to alloc, say, exactly 37 bytes. and don't forget to add 2 unused bytes at average

...
Right, GHC heap object are always aligned to the natural alignment of the architecture, be that 4 or 8 bytes.

...

that i'm trying to say is that one need to store exact string size because value returned by getSizeOfByteArray is aligned to 4

Ah yes, you're quite right. To allow GHC's ByteArray# to be used to implement a compact string type it'd have to be changed to store the length in bytes rather than words. Duncan

Bulat Ziganshin

21 Jun 21 Jun

4:10 a.m.

New subject: Re[2]: haskell crypto is reaaaaaaaaaally slow

Hello David, Thursday, June 21, 2007, 3:49:55 AM, you wrote:

...

...
A list of Word8 is -extremely- inefficient.

...

To expand on that terse (but very true) statement, a list of Word8 increases the space usage by a factor of probably around an order of magnitude (two pointers + 1 byte vs 1 byte), completely destroys your memory access pattern (which becomes random-access rather than sequential), and introduces additional nonlocal memory accesses. One would hope that a hash function would be moderately close to being memory-limited, so we're talking about a multiple-order-of-magnitude slowdown.

if this list is produced lazily, then it will live in generation-2 area of GC which is 256kb large. lists are slow due to double laziness -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin

4:07 a.m.

New subject: Re[2]: haskell crypto is reaaaaaaaaaally slow

Hello Anatoly, Thursday, June 21, 2007, 3:11:13 AM, you wrote:

...

implementation. You should be able to stay within an order of magnitude from C with haskell without resorting to weird compiler tricks.

why you believe in it? are you ever implemented anything in Haskell without tricks and it was only 10x slower than C equivalent? -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

apfelmus

9:25 a.m.

Anatoly Yakovenko wrote:

...

I don't think the problem with performance of crypto has anything to do with unpacking ByteStrings. If I unpack the bytestrings first, then run the hash, and just time the hash algorithm, i still get 4 seconds with crypto where the C implementation gives me 0.02 seconds. Thats 200 times slower in haskell, to me it just seems like a bad implementation. You should be able to stay within an order of magnitude from C with haskell without resorting to weird compiler tricks.

Anatoly, the idea to unpack is the problem. You have to operate directly on the ByteString to get decent performance, for instance with a fold. Compare import Data.ByteString.Lazy as BS -- very slow checksum = foldl' xor 0 . BS.unpack -- blazingly fast checksum' = BS.foldl' xor 0 Regards, apfelmus

Bulat Ziganshin

20 Jun 20 Jun

3:38 p.m.

Hello Anatoly, if you still believe in haskell/ghc speed i suggest you to read the following: http://www.cse.unsw.edu.au/~chak/papers/afp-arrays.ps.gz http://www.cse.unsw.edu.au/~dons/papers/fusion.pdf -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

6590

Age (days ago)

6592

Last active (days ago)

List overview

Download

31 comments

12 participants

participants (12)

Anatoly Yakovenko
Andrew Coppin
apfelmus
Bulat Ziganshin
David Roundy
Derek Elkins
Dominic Steinitz
dons＠cse.unsw.edu.au
Duncan Coutts
Jason Dagit
Michael T. Richter
Stefan O'Rear