
In preparing the speed ups in uuid-0.1.2, I investigated various ways to store 16 bytes of data in a Haskell object. Surprisingly, storing as 4 Word32 values in a standard data type worked best for that application. I've extracted out the testing work for that and put it here: http://bitbucket.org/mtnviewmark/haskell-playground/src/tip/16bytes/ There you can find the code that tests storing 16 bytes in various ways:
import qualified Data.Array as A import qualified Data.Array.Unboxed as U import qualified Data.Array.Vector as V import qualified Data.ByteString as B
data InBytes = BY !Word8 !Word8 !Word8 !Word8 !Word8 !Word8 !Word8 !Word8 !Word8 !Word8 !Word8 !Word8 !Word8 !Word8 !Word8 !Word8 deriving (Eq, Ord)
data InWords = WO !Word32 !Word32 !Word32 !Word32 deriving (Eq, Ord)
data InList = LI [Word8] deriving (Eq, Ord) data InByteString = BS B.ByteString deriving (Eq, Ord) data InArray = AR (A.Array Int Word8) deriving (Eq, Ord) data InUArray = UA (U.UArray Int Word8) deriving (Eq, Ord) data InVector = VE (V.UArr Word8) deriving (Eq)
Depending on operations will be needed most, different storage methods do best. Enjoy! - Mark Mark Lentczner (MtnViewMark) http://www.ozonehouse.com/mark/

On Tue, Jan 05, 2010 at 04:06:09PM -0800, Mark Lentczner wrote:
In preparing the speed ups in uuid-0.1.2, I investigated various ways to store 16 bytes of data in a Haskell object. Surprisingly, storing as 4 Word32 values in a standard data type worked best for that application.
However, on an Core 2 Duo in x86-64 mode with GHC 6.10, 2 Word64 values wins over 4 Word32 values. Attached is the modified source code. Here is the summary between them equal-itself/words 76.41 ns equal-itself/words64 64.52 ns equal-same/words 79.64 ns equal-same/words64 67.85 ns equal-differ/words 66.11 ns equal-differ/words64 62.79 ns compare-same/words 80.59 ns compare-same/words64 68.68 ns compare-differ/words 67.14 ns compare-differ/words64 64.22 ns toList-and-sum/words 991.32 ns toList-and-sum/words64 839.45 ns map-and-sum/words 1.04 us map-and-sum/words64 1.12 us fold-and-sum/words 882.88 ns fold-and-sum/words64 755.98 ns fromList-and-eq/words 740.41 ns fromList-and-eq/words64 749.07 ns build-and-eq/words 484.54 ns build-and-eq/words64 447.81 ns unfold-and-eq/words 577.12 ns unfold-and-eq/words64 541.46 ns Cheers, -- Felipe.

2010/1/5 Mark Lentczner
There you can find the code that tests storing 16 bytes in various ways:
data InWords = WO !Word32 !Word32 !Word32 !Word32 deriving (Eq, Ord)
data InList = LI [Word8] deriving (Eq, Ord) data InByteString = BS B.ByteString deriving (Eq, Ord) data InArray = AR (A.Array Int Word8) deriving (Eq, Ord) data InUArray = UA (U.UArray Int Word8) deriving (Eq, Ord) data InVector = VE (V.UArr Word8) deriving (Eq)
You've got an extra level of indirection there due to the use of data instead of newtype, so you're paying an additional boxing penalty for everything except your first case. Are you compiling with -funbox-strict-fields?

On Tue, Jan 05, 2010 at 04:40:55PM -0800, Bryan O'Sullivan wrote:
You've got an extra level of indirection there due to the use of data instead of newtype, so you're paying an additional boxing penalty for everything except your first case. Are you compiling with -funbox-strict-fields?
I've changed those data's to newtype's but using words still seems better, unless mapping and folding over bytes is more important. In that case maybe storing the bytes separately might be better. Maybe a more complex/realistic benchmark? I've also rerun the benchmark in a x86-32 chroot. In this environment Word32 seems to win over Word64. But, who cares about 32 bits anyway? ;) I'm attaching the source code and the summary results. Everything was compiled with 'ghc -fforce-recomp -O3'. -- Felipe.
participants (3)
-
Bryan O'Sullivan
-
Felipe Lessa
-
Mark Lentczner