
I have re-written SHA1 so that is more idiomatically haskell and it is easy to see how it implements the specification. The only problem is I now have a space leak. I can see where the leak is but I'm less sure what to do about getting rid of it. Here's the offending function: pad :: [Word8] -> [Word8] pad xs = xs ++ [0x80] ++ ps ++ lb where l = length xs pl = (64-(l+9)) `mod` 64 ps = replicate pl 0x00 lb = i2osp 8 (8*l) I've thought about zipping the xs with [1..] which will give me a length as I go. Is this the right way to go are there better techniques for dealing with this? I've attached the full source below. Dominic. module Main(main) where import Data.Char import Data.Bits import Data.List import Data.Word import System import Codec.Utils type Rotation = Int rotL :: Rotation -> Word32 -> Word32 rotL s a = shiftL a s .|. shiftL a (s-32) instance Num [Word32] where a + b = zipWith (+) a b f n x y z | (0 <= n) && (n <= 19) = (x .&. y) .|. ((complement x) .&. z) | (20 <= n) && (n <= 39) = x `xor` y `xor` z | (40 <= n) && (n <= 59) = (x .&. y) .|. (x .&. z) .|. (y .&. z) | (60 <= n) && (n <= 79) = x `xor` y `xor` z | otherwise = error "invalid index for f" k n | (0 <= n) && (n <= 19) = 0x5a827999 | (20 <= n) && (n <= 39) = 0x6ed9eba1 | (40 <= n) && (n <= 59) = 0x8f1bbcdc | (60 <= n) && (n <= 79) = 0xca62c1d6 | otherwise = error "invalid index for k" -- Word120 -> Word512 -> Word120 oneBlock ss xs = (as!!80):(bs!!80):(cs!!80):(ds!!80):(es!!80):[] where ws = xs ++ map (rotL 1) (zipWith4 xxxor wm3s wm8s wm14s wm16s) where xxxor a b c d = a `xor` b `xor` c `xor` d wm3s = drop (16-3) ws wm8s = drop (16-8) ws wm14s = drop (16-14) ws wm16s = drop (16-16) ws as = (ss!!0):ts bs = (ss!!1):as cs = (ss!!2):(map (rotL 30) bs) ds = (ss!!3):cs es = (ss!!4):ds ts = (map (rotL 5) as) + (zipWith4 f [0..] bs cs ds) + es + (map k [0..]) + ws ss :: [Word32] ss = [0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0] pad :: [Word8] -> [Word8] pad xs = xs ++ [0x80] ++ ps ++ lb where l = length xs pl = (64-(l+9)) `mod` 64 ps = replicate pl 0x00 lb = i2osp 8 (8*l) blockWord8sIn512 :: [Word8] -> [[Word8]] blockWord8sIn512 = unfoldr g where g [] = Nothing g xs = Just (splitAt 64 xs) getWord32s :: [Word8] -> [Word32] getWord32s s = map f [0..15] where f i = foldl (+) 0 $ map (\n -> toEnum (fromEnum (s!!(i*4+n))) `shiftL` (fromIntegral (8 * (3-n)))) [0..3] blockWord32sIn512 :: [Word8] -> [[Word32]] blockWord32sIn512 = (map getWord32s) . blockWord8sIn512 . pad -- Word120 -> Word512 -> Word120 hashOnce ss a = ss + oneBlock ss a hash = foldl' hashOnce ss . blockWord32sIn512 convert :: String -> [Word8] convert = map (fromIntegral . ord) short :: [Word8] short = convert "abc" message :: [Word8] message = convert "abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq" performance n = (convert . take n . repeat) 'a' test n = mapM_ (putStrLn . show . hash) [short, message, performance n] main = do progName <- getProgName args <- getArgs if length args /= 1 then putStrLn ("Usage: " ++ progName ++ " <testSize>") else test (read (args!!0))

I have re-written SHA1 so that is more idiomatically haskell and it is easy to see how it implements the specification. The only problem is I now have a space leak. I can see where the leak is but I'm less sure what to do about getting rid of it.
Here's the offending function:
pad :: [Word8] -> [Word8] pad xs = xs ++ [0x80] ++ ps ++ lb where l = length xs pl = (64-(l+9)) `mod` 64 ps = replicate pl 0x00 lb = i2osp 8 (8*l)
I would try something along the following lines (untested): \begin{spec} catWithLen xs f = xs ++ f (length xs) \end{spec} \begin{code} catWithLen :: [a] -> (Int -> [a]) -> [a] catWithLen xs f = h 0 xs where h k [] = f k h k (x : xs) = case succ k of -- forcing evaluation k' -> x : h k' xs \end{code} \begin{code} pad :: [Word8] -> [Word8] pad xs = catWithLen xs f where f l = 0x80 : ps lb where -- we know that |l = length xs| pl = (64-(l+9)) `mod` 64 ps = funPow pl (0x00 :) lb = i2osp 8 (8*l) \end{code} If you are lucky, then the replicate and the (++lb) in the original code might be fused by the compiler as an instance of foldr-build or something related --- check the optimised core code. In my variant I changed this to rely on efficient function powering instead: \begin{spec} funPow k f = foldr (.) id $ replicate k f \end{spec} \begin{code} funPow :: Int -> (a -> a) -> (a -> a) funPow n f = case compare n 0 of LT -> error ("funPow: negative argument: " ++ show n) EQ -> id GT -> pow n f where pow m g = if m > 1 then let (m',r) = divMod m 2 g' = g . g in if r == 0 then pow m' g' else pow m' g' . g else g \end{code} (You will probably also consider using Data.Bits for (`mod` 64), (8*), and (`divMod` 2). ) Wolfram

On Saturday 03 February 2007 19:42, kahl@cas.mcmaster.ca wrote:
I have re-written SHA1 so that is more idiomatically haskell and it is easy to see how it implements the specification. The only problem is I now have a space leak. I can see where the leak is but I'm less sure what to do about getting rid of it.
Here's the offending function:
pad :: [Word8] -> [Word8] pad xs = xs ++ [0x80] ++ ps ++ lb where l = length xs pl = (64-(l+9)) `mod` 64 ps = replicate pl 0x00 lb = i2osp 8 (8*l)
I would try something along the following lines (untested):
\begin{spec} catWithLen xs f = xs ++ f (length xs) \end{spec}
\begin{code} catWithLen :: [a] -> (Int -> [a]) -> [a] catWithLen xs f = h 0 xs where h k [] = f k h k (x : xs) = case succ k of -- forcing evaluation k' -> x : h k' xs \end{code}
\begin{code} pad :: [Word8] -> [Word8] pad xs = catWithLen xs f where f l = 0x80 : ps lb where -- we know that |l = length xs| pl = (64-(l+9)) `mod` 64 ps = funPow pl (0x00 :) lb = i2osp 8 (8*l) \end{code}
If you are lucky, then the replicate and the (++lb) in the original code might be fused by the compiler as an instance of foldr-build or something related --- check the optimised core code.
Wolfram, Thanks but this gives a different problem: dom@heisenberg:~/sha1> ./allInOne 1000001 +RTS -hc -RTS [2845392438,1191608682,3124634993,2018558572,2630932637] [2224569924,473682542,3131984545,4182845925,3846598897] Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize' to increase it. Dominic.

On Sun, Feb 04, 2007 at 08:30:44AM +0000, Dominic Steinitz wrote:
On Saturday 03 February 2007 19:42, kahl@cas.mcmaster.ca wrote:
I would try something along the following lines (untested):
\begin{spec} catWithLen xs f = xs ++ f (length xs) \end{spec}
\begin{code} catWithLen :: [a] -> (Int -> [a]) -> [a] catWithLen xs f = h 0 xs where h k [] = f k h k (x : xs) = case succ k of -- forcing evaluation k' -> x : h k' xs
Nice try. k', as a variable binding, is irrefutable.
\end{code}
\begin{code} pad :: [Word8] -> [Word8] pad xs = catWithLen xs f where f l = 0x80 : ps lb where -- we know that |l = length xs| pl = (64-(l+9)) `mod` 64 ps = funPow pl (0x00 :) lb = i2osp 8 (8*l) \end{code}
Thanks but this gives a different problem:
dom@heisenberg:~/sha1> ./allInOne 1000001 +RTS -hc -RTS [2845392438,1191608682,3124634993,2018558572,2630932637] [2224569924,473682542,3131984545,4182845925,3846598897] Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize' to increase it.
expected result of the excessive laziness above.

\begin{code} catWithLen :: [a] -> (Int -> [a]) -> [a] catWithLen xs f = h 0 xs where h k [] = f k h k (x : xs) = case succ k of -- forcing evaluation k' -> x : h k' xs \end{code}
Thanks but this gives a different problem:
dom@heisenberg:~/sha1> ./allInOne 1000001 +RTS -hc -RTS [2845392438,1191608682,3124634993,2018558572,2630932637] [2224569924,473682542,3131984545,4182845925,3846598897] Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize' to increase it.
Does it still do that if you youse seq instead of case? \begin{code} catWithLen :: [a] -> (Int -> [a]) -> [a] catWithLen xs f = h 0 xs where h k [] = f k h k (x : xs) = let k' = succ k in k' `seq` x : h k' xs \end{code} Wolfram

hi Dominic Explicit recursion works just fine for me and keeps things simple: pad :: [Word8] -> [Word8] pad xs = pad' xs 0 pad' (x:xs) l = x : pad' xs (succ l) pad' [] l = [0x80] ++ ps ++ lb where pl = (64-(l+9)) `mod` 64 ps = replicate pl 0x00 lb = i2osp 8 (8*l) at the cost of (very slightly) hiding data flow. Seems exactly what you were trying to avoid? Cheers pepe

On Saturday 03 February 2007 19:56, Pepe Iborra wrote:
pad :: [Word8] -> [Word8] pad xs = pad' xs 0
pad' (x:xs) l = x : pad' xs (succ l) pad' [] l = [0x80] ++ ps ++ lb where pl = (64-(l+9)) `mod` 64 ps = replicate pl 0x00 lb = i2osp 8 (8*l) Pepe,
Thanks but this gives me a different problem dom@heisenberg:~/sha1> ./allInOne 1000001 +RTS -hc -RTS [2845392438,1191608682,3124634993,2018558572,2630932637] [2224569924,473682542,3131984545,4182845925,3846598897] Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize' to increase it. Someone suggested pad :: Num a => [a] -> [a] pad = pad' 0 where pad' !l [] = [0x80] ++ ps ++ lb where pl = (64-(l+9)) `mod` 64 ps = replicate pl 0x00 lb = i2osp 8 (8*l) pad' !l (x:xs) = x : pad' (l+1) xs but that didn't compile *Main> :r [2 of 2] Compiling Main ( allInOne.hs, interpreted ) allInOne.hs:83:14: Parse error in pattern Failed, modules loaded: Codec.Utils. Dominic.

On Sun, Feb 04, 2007 at 08:20:23AM +0000, Dominic Steinitz wrote:
Someone suggested
pad :: Num a => [a] -> [a] pad = pad' 0 where pad' !l [] = [0x80] ++ ps ++ lb where pl = (64-(l+9)) `mod` 64 ps = replicate pl 0x00 lb = i2osp 8 (8*l) pad' !l (x:xs) = x : pad' (l+1) xs
but that didn't compile
*Main> :r [2 of 2] Compiling Main ( allInOne.hs, interpreted )
allInOne.hs:83:14: Parse error in pattern Failed, modules loaded: Codec.Utils.
Dominic.
The '!' is a GHC extension, enabled using the flag '-fbang-patterns'. Equivalently, you can use Haskell 98's "seq" : pad :: Num a => [a] -> [a] pad = pad' 0 where pad' l [] | l `seq` False = undefined pad' l [] = [0x80] ++ ps ++ lb where pl = (64-(l+9)) `mod` 64 ps = replicate pl 0x00 lb = i2osp 8 (8*l) pad' l (x:xs) = x : pad' (l+1) xs The first alternative never succeeds, but to see that the compiler must force the evaluation of 'l'.

On Sunday 04 February 2007 08:28, Stefan O'Rear wrote:
On Sun, Feb 04, 2007 at 08:20:23AM +0000, Dominic Steinitz wrote:
Someone suggested
pad :: Num a => [a] -> [a] pad = pad' 0 where pad' !l [] = [0x80] ++ ps ++ lb where pl = (64-(l+9)) `mod` 64 ps = replicate pl 0x00 lb = i2osp 8 (8*l) pad' !l (x:xs) = x : pad' (l+1) xs
but that didn't compile
*Main> :r [2 of 2] Compiling Main ( allInOne.hs, interpreted )
allInOne.hs:83:14: Parse error in pattern Failed, modules loaded: Codec.Utils.
Dominic.
The '!' is a GHC extension, enabled using the flag '-fbang-patterns'.
The test program runs to completion but still has a space leak consuming over 25m.
Equivalently, you can use Haskell 98's "seq" :
pad :: Num a => [a] -> [a] pad = pad' 0 where pad' l [] | l `seq` False = undefined pad' l [] = [0x80] ++ ps ++ lb where pl = (64-(l+9)) `mod` 64 ps = replicate pl 0x00 lb = i2osp 8 (8*l) pad' l (x:xs) = x : pad' (l+1) xs
The first alternative never succeeds, but to see that the compiler must force the evaluation of 'l'.
dom@heisenberg:~/sha1> ./allInOne 1000001 +RTS -hc -RTS [2845392438,1191608682,3124634993,2018558572,2630932637] [2224569924,473682542,3131984545,4182845925,3846598897] Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize' to increase it. Dominic. PS I appreciate all the help I'm getting.

On Sun, Feb 04, 2007 at 09:45:12AM +0000, Dominic Steinitz wrote:
pad :: Num a => [a] -> [a] pad = pad' 0 where pad' l [] | l `seq` False = undefined
Stupid typo, that should be: where pad' l _ | l `seq` False = undefined
pad' l [] = [0x80] ++ ps ++ lb where pl = (64-(l+9)) `mod` 64 ps = replicate pl 0x00 lb = i2osp 8 (8*l) pad' l (x:xs) = x : pad' (l+1) xs
The first alternative never succeeds, but to see that the compiler must force the evaluation of 'l'.

If anyone wants to play with this, here's a version of the leak that doesn't need any libraries or extensions. pad causes a stack overflow and pad1 uses up about 6m of heap. Dominic. module Main(main) where import Data.Word import Data.Bits import Data.List pad = pad' 0 where pad' l [] = [0x80] ++ ps where pl = (64-(l+9)) `mod` 64 ps = replicate pl 0x00 pad' l (x:xs) = x : pad' (l+1) xs pad1 xs = xs ++ [0x80] ++ ps where l = length xs pl = (64-(l+9)) `mod` 64 ps = replicate pl 0x00 test :: Int -> Word8 test n = foldl' xor 0x55 (pad (replicate n 0x55)) test1 :: Int -> Word8 test1 n = foldl' xor 0x55 (pad1 (replicate n 0x55)) main = putStrLn (show (test1 1000001))

pad causes a stack overflow and pad1 uses up about 6m of heap. pad1 xs = xs ++ [0x80] ++ ps where l = length xs pl = (64-(l+9)) `mod` 64 ps = replicate pl 0x00
wild guess: if you compute the length when the consumer reaches ps, you hold on to a copy of xs longer than needed, whereas if you compute the length upfront, you unfold xs too early. the zip-trick you mentioned might work around this, allowing you to consume xs lazily while still having its length at the end. but if you're only interested in the **length modulo 64**, you should be able to process and let go of xs in chunks of length 64, too small for overflows? Claus

On 2/4/07, Dominic Steinitz
pad causes a stack overflow and pad1 uses up about 6m of heap.
pad = pad' 0 where pad' l [] = [0x80] ++ ps where pl = (64-(l+9)) `mod` 64 ps = replicate pl 0x00 pad' l (x:xs) = x : pad' (l+1) xs
pad = pad' 0 where pad' l [] = [0x80] ++ ps where pl = (64-(l+9)) `mod` 64 ps = replicate pl 0x00 pad' l (x:xs) = x : (pad' $! l+1) xs -- otherwise (l+1) it will be deferred until replicate -- Tolik
participants (6)
-
Anatoly Zaretsky
-
Claus Reinke
-
Dominic Steinitz
-
kahl@cas.mcmaster.ca
-
Pepe Iborra
-
Stefan O'Rear