haskell crypto is reaaaaaaaaaally slow

$ time ./md5sum ./md5sum [105,252,52,138,187,192,216,17,225,123,185,3,124,101,86,132] real 0m4.790s user 0m3.688s sys 0m0.492s $ time md5sum ./md5sum 69fc348abbc0d811e17bb9037c655684 ./md5sum real 0m0.023s user 0m0.000s sys 0m0.008s this is my implementation using crypto (http://www.haskell.org/crypto/). Am I doing something wrong? module Main where import System import qualified Data.Digest.MD5 as MD5 import qualified Data.ByteString as BS main = do args <- getArgs dt <- BS.readFile $ head args putStrLn $ show $ MD5.hash . BS.unpack $ dt

aeyakovenko:
$ time ./md5sum ./md5sum [105,252,52,138,187,192,216,17,225,123,185,3,124,101,86,132]
real 0m4.790s user 0m3.688s sys 0m0.492s
$ time md5sum ./md5sum 69fc348abbc0d811e17bb9037c655684 ./md5sum
real 0m0.023s user 0m0.000s sys 0m0.008s
this is my implementation using crypto (http://www.haskell.org/crypto/). Am I doing something wrong?
module Main where
import System import qualified Data.Digest.MD5 as MD5 import qualified Data.ByteString as BS
main = do args <- getArgs dt <- BS.readFile $ head args putStrLn $ show $ MD5.hash . BS.unpack $ dt ^^^^ ^^^^^^^^^
not a good idea. You need an MD5 over bytestrings, not [Word8]. -- Don

module Main where
import System import qualified Data.Digest.MD5 as MD5 import qualified Data.ByteString as BS
main = do args <- getArgs dt <- BS.readFile $ head args putStrLn $ show $ MD5.hash . BS.unpack $ dt ^^^^ ^^^^^^^^^
not a good idea.
You need an MD5 over bytestrings, not [Word8].
Where do I get one?

On Wed, Jun 20, 2007 at 01:24:00PM +1000, Donald Bruce Stewart wrote:
aeyakovenko:
$ time ./md5sum ./md5sum [105,252,52,138,187,192,216,17,225,123,185,3,124,101,86,132]
real 0m4.790s user 0m3.688s sys 0m0.492s
$ time md5sum ./md5sum 69fc348abbc0d811e17bb9037c655684 ./md5sum
real 0m0.023s user 0m0.000s sys 0m0.008s
this is my implementation using crypto (http://www.haskell.org/crypto/). Am I doing something wrong?
module Main where
import System import qualified Data.Digest.MD5 as MD5 import qualified Data.ByteString as BS
main = do args <- getArgs dt <- BS.readFile $ head args putStrLn $ show $ MD5.hash . BS.unpack $ dt ^^^^ ^^^^^^^^^
not a good idea.
You need an MD5 over bytestrings, not [Word8].
Wouldn't deforestation have produced something within a factor of 4-ish of optimal? Stefan

yea, this still gives me 4 seconds: module Main where import System import System.Time import qualified Data.Digest.MD5 as MD5 import qualified Data.ByteString as BS main = do args <- getArgs dt <- BS.readFile $ head args let unpacked = BS.unpack dt print $ length $! unpacked start <- getClockTime print $ MD5.hash $! unpacked end <- getClockTime print $ diffClockTimes end start $ ./md5sum ./md5sum 633413 [194,27,77,139,124,90,48,7,141,255,193,164,169,2,128,63] TimeDiff {tdYear = 0, tdMonth = 0, tdDay = 0, tdHour = 0, tdMin = 0, tdSec = 4, tdPicosec = 21077000000

stefanor:
On Wed, Jun 20, 2007 at 01:24:00PM +1000, Donald Bruce Stewart wrote:
aeyakovenko:
$ time ./md5sum ./md5sum [105,252,52,138,187,192,216,17,225,123,185,3,124,101,86,132]
real 0m4.790s user 0m3.688s sys 0m0.492s
$ time md5sum ./md5sum 69fc348abbc0d811e17bb9037c655684 ./md5sum
real 0m0.023s user 0m0.000s sys 0m0.008s
this is my implementation using crypto (http://www.haskell.org/crypto/). Am I doing something wrong?
module Main where
import System import qualified Data.Digest.MD5 as MD5 import qualified Data.ByteString as BS
main = do args <- getArgs dt <- BS.readFile $ head args putStrLn $ show $ MD5.hash . BS.unpack $ dt ^^^^ ^^^^^^^^^
not a good idea.
You need an MD5 over bytestrings, not [Word8].
Wouldn't deforestation have produced something within a factor of 4-ish of optimal?
There's no deforestation between ByteString.unpack and MD5.hash. So all the time here is spent taking some big bytestring, and unpacking it into n list cons cells. md5ing the bytestring directly should knock 95% off that. -- Don

dons:
aeyakovenko:
$ time ./md5sum ./md5sum [105,252,52,138,187,192,216,17,225,123,185,3,124,101,86,132]
real 0m4.790s user 0m3.688s sys 0m0.492s
$ time md5sum ./md5sum 69fc348abbc0d811e17bb9037c655684 ./md5sum
real 0m0.023s user 0m0.000s sys 0m0.008s
this is my implementation using crypto (http://www.haskell.org/crypto/). Am I doing something wrong?
module Main where
import System import qualified Data.Digest.MD5 as MD5 import qualified Data.ByteString as BS
main = do args <- getArgs dt <- BS.readFile $ head args putStrLn $ show $ MD5.hash . BS.unpack $ dt ^^^^ ^^^^^^^^^
not a good idea.
You need an MD5 over bytestrings, not [Word8].
-- Don
I note a couple of other issues: crypto is compiled with: Ghc-options: -fglasgow-exts that is, no optimisations, although it would certainly benefit from Ghc-options: -O2 -fexcess-precision -funbox-strict-fields So I'd recompile the crypto package with that first. Then be sure to compile your code with ghc -O2 -- bytestrings love -O2. Finally, to actually get C speed, use a C md5. Here's an example Haskell binding to the OpenSSL libraries 'md5' function, which you can compile and run like so: $ ghc MD5.hs -lcrypto -o hsmd5 $ time ./hsmd5 /usr/share/dict/words MD5 (/usr/share/dict/words) = e5c152147e93b81424c13772330e74b3 ./hsmd5 /usr/share/dict/words 0.01s user 0.02s system 80% cpu 0.029 total versus my system's 'md5' program: $ time md5 /usr/share/dict/words MD5 (/usr/share/dict/words) = e5c152147e93b81424c13772330e74b3 md5 /usr/share/dict/words 0.02s user 0.00s system 29% cpu 0.052 total Oh huh, that's interesting... And the code: {-# OPTIONS -O2 -fffi -#include "openssl/md5.h" #-} -- -- A few imports, should tidy these up one day. -- import System.Environment import qualified Data.ByteString.Base as B (unsafeUseAsCStringLen) import qualified Data.ByteString as B import Foreign import Foreign.C.Types import Numeric import Text.Printf main = do (f:_) <- getArgs src <- B.readFile f printf "MD5 (%s) = %s \n" f (md5sum src) -- --------------------------------------------------------------------- -- -- Fast md5 using OpenSSL and zero-copying bytestrings -- -- -- The md5 hash should be referentially transparent.. -- md5sum :: B.ByteString -> String md5sum p = unsafePerformIO $ B.unsafeUseAsCStringLen p $ \(ptr,n) -> do digest <- c_md5 ptr (fromIntegral n) nullPtr go digest 0 [] where -- print it in 0-padded hex format go :: Ptr Word8 -> Int -> [String] -> IO String go p n acc | n >= 16 = return $ concat (reverse acc) | otherwise = do w <- peekElemOff p n go p (n+1) (draw w : acc) draw w = case showHex w [] of [x] -> ['0', x] x -> x -- unsigned char *MD5(const unsigned char *d, unsigned long n, unsigned char *md); -- foreign import ccall "openssl/md5.h MD5" c_md5 :: Ptr CChar -> CULong -> Ptr CChar -> IO (Ptr Word8) Happy hacking. -- Don

aeyakovenko:
$ time ./md5sum ./md5sum [105,252,52,138,187,192,216,17,225,123,185,3,124,101,86,132]
real 0m4.790s user 0m3.688s sys 0m0.492s
$ time md5sum ./md5sum 69fc348abbc0d811e17bb9037c655684 ./md5sum
real 0m0.023s user 0m0.000s sys 0m0.008s
I wasn't happy with the hex printing loop. Here's a shorter version. {-# OPTIONS -O2 -fffi #-} -- -- ghc MD5.hs -o hsmd5 -lcrypto -- import System.Environment import qualified Data.ByteString.Base as B (unsafeUseAsCStringLen) import qualified Data.ByteString as B import Foreign import Foreign.C.Types import Numeric import Text.Printf import Control.Monad main = do (f:_) <- getArgs src <- B.readFile f printf "MD5 (%s) = %s \n" f (md5sum src) -- Fast md5 using OpenSSL and non-copying bytestrings md5sum :: B.ByteString -> String md5sum p = unsafePerformIO $ B.unsafeUseAsCStringLen p $ \(ptr,n) -> do digest <- c_md5 ptr (fromIntegral n) nullPtr liftM concat $ forM [0..15] $ \n -> do w <- peekElemOff digest n return $ case showHex w [] of [x] -> ['0', x]; x -> x -- unsigned char *MD5(const unsigned char *d, unsigned long n, unsigned char *md); foreign import ccall "openssl/md5.h MD5" c_md5 :: Ptr CChar -> CULong -> Ptr CChar -> IO (Ptr Word8) ByteStrings were designed for this zero-copy passing of big data to C, by the way, so its a perfect fit. -- Don

dons:
-- Fast md5 using OpenSSL and non-copying bytestrings md5sum :: B.ByteString -> String md5sum p = unsafePerformIO $ B.unsafeUseAsCStringLen p $ \(ptr,n) -> do digest <- c_md5 ptr (fromIntegral n) nullPtr liftM concat $ forM [0..15] $ \n -> do w <- peekElemOff digest n return $ case showHex w [] of [x] -> ['0', x]; x -> x
-- unsigned char *MD5(const unsigned char *d, unsigned long n, unsigned char *md); foreign import ccall "openssl/md5.h MD5" c_md5 :: Ptr CChar -> CULong -> Ptr CChar -> IO (Ptr Word8)
ByteStrings were designed for this zero-copy passing of big data to C, by the way, so its a perfect fit.
by the way, I note we do have a binding to OpenSSL, of sorts: http://cryp.to/hopenssl/ But it needs updating to use ByteStrings. Would make a good project for someone into crypto. -- Don

i don't think its a problem with bytestrings. I strictly unpacked the
bytestring and i still have 4 seconds for an md5 sum of the [word8]
On 6/19/07, Donald Bruce Stewart
aeyakovenko:
$ time ./md5sum ./md5sum [105,252,52,138,187,192,216,17,225,123,185,3,124,101,86,132]
real 0m4.790s user 0m3.688s sys 0m0.492s
$ time md5sum ./md5sum 69fc348abbc0d811e17bb9037c655684 ./md5sum
real 0m0.023s user 0m0.000s sys 0m0.008s
I wasn't happy with the hex printing loop. Here's a shorter version.
{-# OPTIONS -O2 -fffi #-} -- -- ghc MD5.hs -o hsmd5 -lcrypto --
import System.Environment import qualified Data.ByteString.Base as B (unsafeUseAsCStringLen) import qualified Data.ByteString as B import Foreign import Foreign.C.Types import Numeric import Text.Printf import Control.Monad
main = do (f:_) <- getArgs src <- B.readFile f printf "MD5 (%s) = %s \n" f (md5sum src)
-- Fast md5 using OpenSSL and non-copying bytestrings md5sum :: B.ByteString -> String md5sum p = unsafePerformIO $ B.unsafeUseAsCStringLen p $ \(ptr,n) -> do digest <- c_md5 ptr (fromIntegral n) nullPtr liftM concat $ forM [0..15] $ \n -> do w <- peekElemOff digest n return $ case showHex w [] of [x] -> ['0', x]; x -> x
-- unsigned char *MD5(const unsigned char *d, unsigned long n, unsigned char *md); foreign import ccall "openssl/md5.h MD5" c_md5 :: Ptr CChar -> CULong -> Ptr CChar -> IO (Ptr Word8)
ByteStrings were designed for this zero-copy passing of big data to C, by the way, so its a perfect fit.
-- Don

aeyakovenko:
i don't think its a problem with bytestrings. I strictly unpacked the bytestring and i still have 4 seconds for an md5 sum of the [word8]
Until someone writes an md5 on bytestrings, we can't really say how much the slowdown is due to unpacking, GC and other memory stress building and walking the big list, or the list-based md5 sum. There's a few sources of problems once you decide to build a giant list from each byte in the file. I'd suspect a pure haskell md5 over bytestrings would be competitive with a C implemetation though. Easier to just call OpenSSL. -- Don

On Wed, 2007-20-06 at 15:21 +1000, Donald Bruce Stewart wrote:
-- unsigned char *MD5(const unsigned char *d, unsigned long n, unsigned char *md); foreign import ccall "openssl/md5.h MD5" c_md5 :: Ptr CChar -> CULong -> Ptr CChar -> IO (Ptr Word8)
ByteStrings were designed for this zero-copy passing of big data to C, by the way, so its a perfect fit.
I'm not so sure I like the idea of having to do this by passing it over
to C. Crypto sounds like exactly the kind of application that would
require the certainty of FP reasoning. Is there no way to make it work
reasonably efficiently in Haskell proper?
--
Michael T. Richter

ttmrichter:
On Wed, 2007-20-06 at 15:21 +1000, Donald Bruce Stewart wrote:
-- unsigned char *MD5(const unsigned char *d, unsigned long n, un signed char *md); foreign import ccall "openssl/md5.h MD5" c_md5 :: Ptr CChar -> CULong -> Ptr CChar -> IO (Ptr Word8)
ByteStrings were designed for this zero-copy passing of big data to C , by the way, so its a perfect fit.
I'm not so sure I like the idea of having to do this by
You don't *have* to do it via C, but you *can* do it, if you want.
passing it over to C. Crypto sounds like exactly the kind of application that would require the certainty of FP reasoning. Is there no way to make it work reasonably efficiently in Haskell proper?
Sure, why not? We've a good native code compiler, after all. Write an md5 over ByteString -- it should be pretty competitive. Here's a reference C implementation to start from: http://www.cse.unsw.edu.au/~dons/tmp/md5.c -- Don

I'm probably missing something here but writing MD5 (and for that matter SHA1) requires bit twiddling operations (Data.Bits) and these aren't defined for ByteString. For example, SHA1 defines the following function and it's not clear to me how you'd implement this for ByteString rather than Word8. f n x y z | n <= 19 = (x .&. y) .|. ((complement x) .&. z) | n <= 39 = x `xor` y `xor` z | n <= 59 = (x .&. y) .|. (x .&. z) .|. (y .&. z) | n <= 79 = x `xor` y `xor` z I'd love to have blazzingly fast implementations for all the functions in the crypto library so if anyone feels inclined, any contributions would be very gratefully accepted. Unfortunately, I don't have the time to do this myself. Dominic.

As its name implies, ByteString stores a string of bytes, which are Word8. It's a replacement for a list, not for Word8. David On Wed, Jun 20, 2007 at 09:23:51PM +0100, Dominic Steinitz wrote:
I'm probably missing something here but writing MD5 (and for that matter SHA1) requires bit twiddling operations (Data.Bits) and these aren't defined for ByteString. For example, SHA1 defines the following function and it's not clear to me how you'd implement this for ByteString rather than Word8.
f n x y z | n <= 19 = (x .&. y) .|. ((complement x) .&. z) | n <= 39 = x `xor` y `xor` z | n <= 59 = (x .&. y) .|. (x .&. z) .|. (y .&. z) | n <= 79 = x `xor` y `xor` z
I'd love to have blazzingly fast implementations for all the functions in the crypto library so if anyone feels inclined, any contributions would be very gratefully accepted. Unfortunately, I don't have the time to do this myself.
Dominic.

On 6/20/07, Andrew Coppin
Donald Bruce Stewart wrote:
Finally, to actually get C speed, use a C md5.
I always feel worried when people say this... It's almost like admitting "hey, Haskell is beautiful, but it can never be fast". I always find myself wanting that statement to be false...
I agree with you, but at the same time, if Don says something about the performance of Haskell I tend to trust him on it. I find that his ability to optimize Haskell tends to set the bar. Otoh, I'd love to see someone demonstrate otherwise here :) Jason

On Wed, 2007-06-20 at 15:23 -0700, Jason Dagit wrote:
On 6/20/07, Andrew Coppin
wrote: Donald Bruce Stewart wrote:
Finally, to actually get C speed, use a C md5.
I always feel worried when people say this... It's almost like admitting "hey, Haskell is beautiful, but it can never be fast". I always find myself wanting that statement to be false...
I agree with you, but at the same time, if Don says something about the performance of Haskell I tend to trust him on it. I find that his ability to optimize Haskell tends to set the bar. Otoh, I'd love to see someone demonstrate otherwise here :)
Well this is something else Don said,
I'd suspect a pure haskell md5 over bytestrings would be competitive with a C implemetation though. Easier to just call OpenSSL.

I don't think the problem with performance of crypto has anything to
do with unpacking ByteStrings. If I unpack the bytestrings first, then
run the hash, and just time the hash algorithm, i still get 4 seconds
with crypto where the C implementation gives me 0.02 seconds. Thats
200 times slower in haskell, to me it just seems like a bad
implementation. You should be able to stay within an order of
magnitude from C with haskell without resorting to weird compiler
tricks.
Anatoly
On 6/20/07, Derek Elkins
On Wed, 2007-06-20 at 15:23 -0700, Jason Dagit wrote:
On 6/20/07, Andrew Coppin
wrote: Donald Bruce Stewart wrote:
Finally, to actually get C speed, use a C md5.
I always feel worried when people say this... It's almost like admitting "hey, Haskell is beautiful, but it can never be fast". I always find myself wanting that statement to be false...
I agree with you, but at the same time, if Don says something about the performance of Haskell I tend to trust him on it. I find that his ability to optimize Haskell tends to set the bar. Otoh, I'd love to see someone demonstrate otherwise here :)
Well this is something else Don said,
I'd suspect a pure haskell md5 over bytestrings would be competitive with a C implemetation though. Easier to just call OpenSSL.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Wed, 2007-06-20 at 16:11 -0700, Anatoly Yakovenko wrote:
I don't think the problem with performance of crypto has anything to do with unpacking ByteStrings. If I unpack the bytestrings first, then run the hash, and just time the hash algorithm, i still get 4 seconds with crypto where the C implementation gives me 0.02 seconds. Thats 200 times slower in haskell, to me it just seems like a bad implementation. You should be able to stay within an order of magnitude from C with haskell without resorting to weird compiler tricks.
A list of Word8 is -extremely- inefficient.

On Wed, Jun 20, 2007 at 06:19:35PM -0500, Derek Elkins wrote:
On Wed, 2007-06-20 at 16:11 -0700, Anatoly Yakovenko wrote:
I don't think the problem with performance of crypto has anything to do with unpacking ByteStrings. If I unpack the bytestrings first, then run the hash, and just time the hash algorithm, i still get 4 seconds with crypto where the C implementation gives me 0.02 seconds. Thats 200 times slower in haskell, to me it just seems like a bad implementation. You should be able to stay within an order of magnitude from C with haskell without resorting to weird compiler tricks.
A list of Word8 is -extremely- inefficient.
To expand on that terse (but very true) statement, a list of Word8 increases the space usage by a factor of probably around an order of magnitude (two pointers + 1 byte vs 1 byte), completely destroys your memory access pattern (which becomes random-access rather than sequential), and introduces additional nonlocal memory accesses. One would hope that a hash function would be moderately close to being memory-limited, so we're talking about a multiple-order-of-magnitude slowdown. The main benefit of ByteString isn't weird compiler tricks, it's using a reasonable data structure for the problem (although the weird compiler tricks help, too). -- David Roundy Department of Physics Oregon State University

On Wed, Jun 20, 2007 at 04:49:55PM -0700, David Roundy wrote:
To expand on that terse (but very true) statement, a list of Word8 increases the space usage by a factor of probably around an order of magnitude (two pointers + 1 byte vs 1 byte), completely destroys your
Three pointers. [ INFO PTR (like a tag but not quite) ] [ PTR to Word8 (these are hashconsed, thankfully) ] [ PTR to next value ] Stefan

On Wed, 2007-06-20 at 16:53 -0700, Stefan O'Rear wrote:
On Wed, Jun 20, 2007 at 04:49:55PM -0700, David Roundy wrote:
To expand on that terse (but very true) statement, a list of Word8 increases the space usage by a factor of probably around an order of magnitude (two pointers + 1 byte vs 1 byte), completely destroys your
Three pointers.
[ INFO PTR (like a tag but not quite) ] [ PTR to Word8 (these are hashconsed, thankfully) ] [ PTR to next value ]
So that's 12 bytes on a 32bit box, or 24 or a 64bit one, to represent one byte of data. For comparison, ByteStrings have a bigger overhead but a lower linear factor: sizeof [Word8] of length n : 32bit: n * 12 64bit: n * 24 sizeof ByteString of length n : 32bit: 40 + n (or 32 + n for shared bytestrings like substrings) 64bit: 80 + n (or 64 + n for shared) Incidentally a more space efficient representation that could preserve the same operations speeds (like O(1) substring) would be: data ByteString = BS ByteArray# Int Int which would be 4 unshared words and 2 shared words rather than the current 5 unshared and 4 shared that we get from using ForeignPtrs. The smallest possible would be 2 words overhead by just using a ByteArray#, but that sacrifices O(1) substring which is pretty important for a functional style. Duncan

On Thu, Jun 21, 2007 at 04:36:13AM +0100, Duncan Coutts wrote:
The smallest possible would be 2 words overhead by just using a ByteArray#, but that sacrifices O(1) substring which is pretty important for a functional style.
Not necessarily the minimum! data String = S0 | S1 Word | S2 Word | S3 Word | S4 Word | S5 Word Word | S6 Word Word | S7 Word Word | S8 Word Word | S9 Word Word Word | S10 Word Word Word | S11 Word Word Word | S12 Word Word Word | S13 Word Word Word Word | S14 Word Word Word Word | S15 Word Word Word Word | S16 Word Word Word Word | S17 Word Word Word Word Word | S18 Word Word Word Word Word | S19 Word Word Word Word Word | S20 Word Word Word Word Word | SLong ByteArray# Stefan

Hello Duncan, Thursday, June 21, 2007, 7:36:13 AM, you wrote:
The smallest possible would be 2 words overhead by just using a ByteArray#,
i tried it once and found that ByteArray# size is returned rounded to 4 - there is no way in GHC runtime to alloc, say, exactly 37 bytes. and don't forget to add 2 unused bytes at average -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Thu, 2007-06-21 at 08:14 +0400, Bulat Ziganshin wrote:
Hello Duncan,
Thursday, June 21, 2007, 7:36:13 AM, you wrote:
The smallest possible would be 2 words overhead by just using a ByteArray#,
i tried it once and found that ByteArray# size is returned rounded to 4 - there is no way in GHC runtime to alloc, say, exactly 37 bytes. and don't forget to add 2 unused bytes at average
Right, GHC heap object are always aligned to the natural alignment of the architecture, be that 4 or 8 bytes. Try the same experiment with C's malloc. I'd be very surprised if you can allocate 37 bytes and not end up using 40 (plus some extra for remembering the allocation length). Duncan

Hello Duncan, Thursday, June 21, 2007, 8:48:53 AM, you wrote:
The smallest possible would be 2 words overhead by just using a ByteArray#,
i tried it once and found that ByteArray# size is returned rounded to 4 - there is no way in GHC runtime to alloc, say, exactly 37 bytes. and don't forget to add 2 unused bytes at average
Right, GHC heap object are always aligned to the natural alignment of the architecture, be that 4 or 8 bytes.
Try the same experiment with C's malloc. I'd be very surprised if you can allocate 37 bytes and not end up using 40 (plus some extra for remembering the allocation length).
that i'm trying to say is that one need to store exact string size because value returned by getSizeOfByteArray is aligned to 4 -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Fri, 2007-06-22 at 10:52 +0400, Bulat Ziganshin wrote:
i tried it once and found that ByteArray# size is returned rounded to 4 - there is no way in GHC runtime to alloc, say, exactly 37 bytes. and don't forget to add 2 unused bytes at average
Right, GHC heap object are always aligned to the natural alignment of the architecture, be that 4 or 8 bytes.
that i'm trying to say is that one need to store exact string size because value returned by getSizeOfByteArray is aligned to 4
Ah yes, you're quite right. To allow GHC's ByteArray# to be used to implement a compact string type it'd have to be changed to store the length in bytes rather than words. Duncan

Hello David, Thursday, June 21, 2007, 3:49:55 AM, you wrote:
A list of Word8 is -extremely- inefficient.
To expand on that terse (but very true) statement, a list of Word8 increases the space usage by a factor of probably around an order of magnitude (two pointers + 1 byte vs 1 byte), completely destroys your memory access pattern (which becomes random-access rather than sequential), and introduces additional nonlocal memory accesses. One would hope that a hash function would be moderately close to being memory-limited, so we're talking about a multiple-order-of-magnitude slowdown.
if this list is produced lazily, then it will live in generation-2 area of GC which is 256kb large. lists are slow due to double laziness -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Hello Anatoly, Thursday, June 21, 2007, 3:11:13 AM, you wrote:
implementation. You should be able to stay within an order of magnitude from C with haskell without resorting to weird compiler tricks.
why you believe in it? are you ever implemented anything in Haskell without tricks and it was only 10x slower than C equivalent? -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Anatoly Yakovenko wrote:
I don't think the problem with performance of crypto has anything to do with unpacking ByteStrings. If I unpack the bytestrings first, then run the hash, and just time the hash algorithm, i still get 4 seconds with crypto where the C implementation gives me 0.02 seconds. Thats 200 times slower in haskell, to me it just seems like a bad implementation. You should be able to stay within an order of magnitude from C with haskell without resorting to weird compiler tricks.
Anatoly, the idea to unpack is the problem. You have to operate directly on the ByteString to get decent performance, for instance with a fold. Compare import Data.ByteString.Lazy as BS -- very slow checksum = foldl' xor 0 . BS.unpack -- blazingly fast checksum' = BS.foldl' xor 0 Regards, apfelmus

Hello Anatoly, if you still believe in haskell/ghc speed i suggest you to read the following: http://www.cse.unsw.edu.au/~chak/papers/afp-arrays.ps.gz http://www.cse.unsw.edu.au/~dons/papers/fusion.pdf -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
participants (12)
-
Anatoly Yakovenko
-
Andrew Coppin
-
apfelmus
-
Bulat Ziganshin
-
David Roundy
-
Derek Elkins
-
Dominic Steinitz
-
dons@cse.unsw.edu.au
-
Duncan Coutts
-
Jason Dagit
-
Michael T. Richter
-
Stefan O'Rear