Hi,

As I said, here is the Criterion benchmark.

The pure version is considerably faster than the IO version: about 143 ms versus about 345 ms per run, i.e. roughly 2.4 times faster.

Results:

warming up
estimating clock resolution...
mean is 1.620930 us (320001 iterations)
found 1799 outliers among 319999 samples (0.6%)
1785 (0.6%) high severe
estimating cost of a clock call...
mean is 82.09459 ns (12 iterations)
found 2 outliers among 12 samples (16.7%)
2 (16.7%) high mild

benchmarking diffPure
collecting 100 samples, 1 iterations each, in estimated 14.30449 s
mean: 142.8618 ms, lb 142.0964 ms, ub 143.7192 ms, ci 0.950
std dev: 4.158374 ms, lb 3.678794 ms, ub 4.771238 ms, ci 0.950
variance introduced by outliers: 23.847%
variance is moderately inflated by outliers

benchmarking diffIO
collecting 100 samples, 1 iterations each, in estimated 33.09729 s
mean: 344.6756 ms, lb 342.0381 ms, ub 347.5559 ms, ci 0.950
std dev: 14.14651 ms, lb 12.91057 ms, ub 15.73398 ms, ci 0.950
variance introduced by outliers: 38.516%
variance is moderately inflated by outliers

Testing module:

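-- Imports for the criterion 0.8-style API used here.
import Codec.Picture
import Criterion.Config (cfgSamples, defaultConfig, ljust)
import Criterion.Main (bench, defaultMainWith, nf, nfIO)

-- origF and chngdF are the FilePaths of the original and changed images.
doBench :: IO ()
doBench = do
    Right (ImageRGB8 img1) <- readImage origF
    Right (ImageRGB8 img2) <- readImage chngdF

    -- Take 100 samples per benchmark; nf/nfIO force the result to normal form.
    defaultMainWith (defaultConfig { cfgSamples = ljust 100 }) (return ())
        [ bench "diffPure" $ nf   (diffPure img1) img2
        , bench "diffIO"   $ nfIO (diffIO   img1 img2)
        ]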

diffIO :: Image PixelRGB8 -> Image PixelRGB8 -> IO (Image PixelRGBA8)
diffIO i1@(Image { imageWidth = w, imageHeight = h }) i2 = withImage w h pixGen
  where pixGen x y = return $ pixGenH i1 i2 x y

diffPure :: Image PixelRGB8 -> Image PixelRGB8 -> Image PixelRGBA8
diffPure i1@(Image { imageWidth = w, imageHeight = h }) i2 = generateImage pixGen w h
  where pixGen x y = pixGenH i1 i2 x y

pixGenH :: Image PixelRGB8 -> Image PixelRGB8 -> Int -> Int -> PixelRGBA8
pixGenH i1 i2 x y =
    if p1 == p2
        then PixelRGBA8 0 0 0 0
        else PixelRGBA8 r g b 254
  where
    p1@(PixelRGB8 r g b) = pixelAt i1 x y
    p2                   = pixelAt i2 x y

Best regards,

vlatko

-------- Original Message --------
Subject: Re: [Haskell-cafe] JuicyFruit - explanation of speed difference of pure and monadic image generation
From: Vlatko Basic <vlatko.basic@gmail.com>
To: Alp Mestanogullari <alpmestan@gmail.com>, Joey Adams <joeyadams3.14159@gmail.com>
Cc: The Haskell Cafe <haskell-cafe@haskell.org>
Date: 21.03.2014 11:10

Hi,

Well, we can all relax. The library is good. :-)

My "perfect" timing function wasn't evaluating the pure results deeply enough in these cases.
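For anyone who hits the same pitfall, here is a minimal sketch of a timing helper that forces the result all the way to normal form before stopping the clock. It is not the timing function from this thread: `timePure` is a hypothetical name, and it assumes the `deepseq` and `time` packages that ship with GHC.

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import Data.Time.Clock (NominalDiffTime, diffUTCTime, getCurrentTime)

-- Hypothetical helper: time a pure function applied to one argument.
-- 'force' fully evaluates the result when it is demanded, and 'evaluate'
-- makes that demand happen here, between the two clock readings.
-- Without them, laziness could defer the work until after 'end' is taken.
timePure :: NFData b => (a -> b) -> a -> IO (b, NominalDiffTime)
timePure f x = do
    start <- getCurrentTime
    r     <- evaluate (force (f x))
    end   <- getCurrentTime
    return (r, end `diffUTCTime` start)
```

Something like `timePure (diffPure img1) img2` would then measure the whole image generation, not just the construction of a thunk. Criterion's `nf` does the same kind of forcing and adds proper statistics on top.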

However, there still seems to be a difference of about 50%, but I suppose that could be explained by intermediate lists on such large data sets, as Joey Adams said.
I'll benchmark it properly with Criterion and post the results here, for info.

Sorry for the trouble. :-(

vlatko

-------- Original Message --------
Subject: Re: [Haskell-cafe] JuicyFruit - explanation of speed difference of pure and monadic image generation
From: Alp Mestanogullari <alpmestan@gmail.com>
To: Joey Adams <joeyadams3.14159@gmail.com>
Cc: The Haskell Cafe <haskell-cafe@haskell.org>, vlatko.basic@gmail.com
Date: 20.03.2014 15:43

Could it be because you are calling withImage in IO whereas generateImage goes through ST? A lot of the nice performance numbers of JuicyPixels come from its carefully tailored ST usage, which in turn comes from the efficiency of unboxed mutable vectors (as in the "vector" package).

So could you post the benchmark result for a version where you call runST on the result of withImage? That should be a fairer comparison. Also, writing a criterion benchmark would help, and would make sure the functions are run properly without either of the two taking advantage of computations previously performed by the other.
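A minimal sketch of what such a runST variant might look like, assuming withImage is polymorphic over any PrimMonad (so it can run in ST as well as IO). The name `diffST` is made up for this illustration, and `pixGenH` is copied from the testing module earlier in the thread so that the block stands alone.

```haskell
import Codec.Picture
    ( Image (..), PixelRGB8 (..), PixelRGBA8 (..), pixelAt, withImage )
import Control.Monad.ST (runST)

-- Hypothetical ST-based variant: the same monadic generator as diffIO,
-- but run in ST and exposed as a pure function.
diffST :: Image PixelRGB8 -> Image PixelRGB8 -> Image PixelRGBA8
diffST i1@(Image { imageWidth = w, imageHeight = h }) i2 =
    runST (withImage w h (\x y -> return (pixGenH i1 i2 x y)))

-- Pixel comparison helper, as posted in the testing module.
pixGenH :: Image PixelRGB8 -> Image PixelRGB8 -> Int -> Int -> PixelRGBA8
pixGenH i1 i2 x y =
    if p1 == p2
        then PixelRGBA8 0 0 0 0
        else PixelRGBA8 r g b 254
  where
    p1@(PixelRGB8 r g b) = pixelAt i1 x y
    p2                   = pixelAt i2 x y
```

Since the result is pure, it could be benchmarked the same way as diffPure, for example with `bench "diffST" $ nf (diffST img1) img2`.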