Re: [Haskell-cafe] Monad transformer performance - Request to review benchmarking code + results

Thanks for digging deeper, David. What exactly did you inline?
Also, am I the only one losing my mind over this? It's such a
straightforward use of available code structuring tools in Haskell. How
come the compiler is not being smart about this OOB?
-- Saurabh.
On 29 Jan 2017 9:42 pm, "David Turner"
wrote: Here's the profiling summary that I got:

COST CENTRE                      MODULE                             %time %alloc

getOverhead                      Criterion.Monad                     41.3   0.0
>>=                              Lucid.Base                          19.2  41.6
makeElement.\.\                  Lucid.Base                          11.4  23.4
fromHtmlEscapedString            Blaze.ByteString.Builder.Html.Utf8   7.9  14.9
>>=                              Data.Vector.Fusion.Util              2.3   1.7
return                           Lucid.Base                           1.4   2.1
runBenchmark.loop                Criterion.Measurement                1.2   0.0
with.\                           Lucid.Base                           1.0   2.1
foldlMapWithKey                  Lucid.Base                           0.5   2.6
streamDecodeUtf8With.decodeChunk Data.Text.Encoding                   0.0   1.7
As expected, HtmlT's bind is the expensive bit. However I've been unable to
encourage it to go away using INLINE pragmas.
On 29 January 2017 at 15:45, Oliver Charles
I would start by inlining operations in the Functor, Applicative and Monad classes for your monad and all the layers in the stack (such as HtmlT). An un-inlined monadic bind can end up allocating a lot (as it's such a common operation).
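To illustrate the suggestion, here is a minimal sketch of a writer-like transformer with INLINE pragmas on every class method. TraceT and its methods are invented for illustration; this is not Lucid's actual code, just the shape of the fix being proposed:

```haskell
import Control.Monad (ap, liftM)

-- A made-up WriterT-like transformer, loosely in the spirit of HtmlT.
newtype TraceT m a = TraceT { runTraceT :: m (a, [String]) }

instance Monad m => Functor (TraceT m) where
  fmap = liftM
  {-# INLINE fmap #-}

instance Monad m => Applicative (TraceT m) where
  pure a = TraceT (pure (a, []))
  {-# INLINE pure #-}
  (<*>) = ap
  {-# INLINE (<*>) #-}

instance Monad m => Monad (TraceT m) where
  -- Without the INLINE pragma, every (>>=) at an unknown `m` goes
  -- through a dictionary and allocates the intermediate tuples.
  m >>= f = TraceT $ do
    (a, w1) <- runTraceT m
    (b, w2) <- runTraceT (f a)
    pure (b, w1 ++ w2)
  {-# INLINE (>>=) #-}

main :: IO ()
main = do
  (x, logs) <- runTraceT $ do
    TraceT (pure ((), ["step 1"]))
    TraceT (pure (42 :: Int, ["step 2"]))
  print (x, logs)
```

With the pragmas in place, GHC keeps the unfoldings in the interface file, so a caller compiling against a concrete stack can collapse the binds away.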
On Sun, 29 Jan 2017, 3:32 pm Saurabh Nanda,
wrote: Please tell me what to INLINE. I'll update the benchmarks.
Also, shouldn't this be treated as a GHC bug then? Using monad transformers as intended should not result in a severe performance penalty! Either monad transformers themselves are a problem or GHC is not doing the right thing.
-- Saurabh.
On 29 Jan 2017 7:50 pm, "Oliver Charles"
wrote: I would wager a guess that this can be solved with INLINE pragmas. We recently added INLINE to just about everything in transformers and got a significant speed up.
On Sun, 29 Jan 2017, 11:18 am David Turner, < dct25-561bs@mythic-beasts.com> wrote:
I would guess that the issue lies within HtmlT, which looks vaguely similar to a WriterT transformer but without much in the way of optimisation (e.g. INLINE pragmas). But that's just a guess after about 30 sec of glancing at https://hackage.haskell.org/package/lucid-2.9.7/docs/src/Lucid-Base.html so don't take it as gospel.
My machine is apparently an i7-4770 of a similar vintage to yours, running Ubuntu in a VirtualBox VM hosted on Windows. 4GB of RAM in the VM, 16 in the host FWIW.
On 29 Jan 2017 10:26, "Saurabh Nanda"
wrote: Thank you for the PR. Does your research suggest something is wrong with HtmlT when combined with any MonadIO, not necessarily ActionT? Is this an mtl issue or a lucid issue in that case?
Curiously, what's your machine config? I'm on a late 2011 macbook pro with 10G ram and some old i5.
-- Saurabh.
On 29 Jan 2017 3:05 pm, "David Turner"
wrote: The methodology does look reasonable, although I think you should wait for all the scotty threads to start before starting the benchmarks, as I see this interleaved output:
Setting phasers to stun... (port 3002) (ctrl-c to quit)
Setting phasers to stun... (port 3003) (ctrl-c to quit)
Setting phasers to stun... (port 3001) (ctrl-c to quit)
benchmarking bareScotty
Setting phasers to stun... (port 3000) (ctrl-c to quit)
Your numbers are wayyy slower than the ones I see on my dev machine:
benchmarking bareScotty
Setting phasers to stun... (port 3000) (ctrl-c to quit)
time                 10.94 ms   (10.36 ms .. 11.52 ms)
                     0.979 R²   (0.961 R² .. 0.989 R²)
mean                 12.53 ms   (11.98 ms .. 13.28 ms)
std dev              1.702 ms   (1.187 ms .. 2.589 ms)
variance introduced by outliers: 66% (severely inflated)

benchmarking bareScottyBareLucid
time                 12.95 ms   (12.28 ms .. 13.95 ms)
                     0.972 R²   (0.951 R² .. 0.989 R²)
mean                 12.20 ms   (11.75 ms .. 12.69 ms)
std dev              1.236 ms   (991.3 μs .. 1.601 ms)
variance introduced by outliers: 50% (severely inflated)

benchmarking transScottyBareLucid
time                 12.05 ms   (11.70 ms .. 12.39 ms)
                     0.992 R²   (0.982 R² .. 0.996 R²)
mean                 12.43 ms   (12.06 ms .. 13.01 ms)
std dev              1.320 ms   (880.5 μs .. 2.071 ms)
variance introduced by outliers: 54% (severely inflated)

benchmarking transScottyTransLucid
time                 39.73 ms   (32.16 ms .. 49.45 ms)
                     0.668 R²   (0.303 R² .. 0.969 R²)
mean                 42.59 ms   (36.69 ms .. 54.38 ms)
std dev              16.52 ms   (8.456 ms .. 25.96 ms)
variance introduced by outliers: 92% (severely inflated)

benchmarking bareScotty
time                 11.46 ms   (10.89 ms .. 12.07 ms)
                     0.986 R²   (0.975 R² .. 0.994 R²)
mean                 11.73 ms   (11.45 ms .. 12.07 ms)
std dev              800.6 μs   (636.8 μs .. 975.3 μs)
variance introduced by outliers: 34% (moderately inflated)
but nonetheless I do also see the one using renderTextT to be substantially slower than the one without.
I've sent you a PR [1] that isolates Lucid from Scotty and shows that renderTextT is twice as slow over IO than it is over Identity, and it's ~10% slower over Reader too:
benchmarking renderText
time                 5.529 ms   (5.328 ms .. 5.709 ms)
                     0.990 R²   (0.983 R² .. 0.995 R²)
mean                 5.645 ms   (5.472 ms .. 5.888 ms)
std dev              593.0 μs   (352.5 μs .. 908.2 μs)
variance introduced by outliers: 63% (severely inflated)

benchmarking renderTextT Id
time                 5.439 ms   (5.243 ms .. 5.640 ms)
                     0.991 R²   (0.985 R² .. 0.996 R²)
mean                 5.498 ms   (5.367 ms .. 5.631 ms)
std dev              408.8 μs   (323.8 μs .. 552.9 μs)
variance introduced by outliers: 45% (moderately inflated)

benchmarking renderTextT Rd
time                 6.173 ms   (5.983 ms .. 6.396 ms)
                     0.990 R²   (0.983 R² .. 0.995 R²)
mean                 6.284 ms   (6.127 ms .. 6.527 ms)
std dev              581.6 μs   (422.9 μs .. 773.0 μs)
variance introduced by outliers: 55% (severely inflated)

benchmarking renderTextT IO
time                 12.35 ms   (11.84 ms .. 12.84 ms)
                     0.989 R²   (0.982 R² .. 0.995 R²)
mean                 12.22 ms   (11.85 ms .. 12.76 ms)
std dev              1.159 ms   (729.5 μs .. 1.683 ms)
variance introduced by outliers: 50% (severely inflated)
I tried replacing
forM [1..10000] (\_ -> div_ "hello world!")
with
replicateM_ 10000 (div_ "hello world!")
which discards the list of 10,000 () values that the forM thing generates, but this made very little difference.
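The difference between the two variants can be seen in a tiny standalone sketch (unrelated to the benchmark itself; plain IO actions stand in for div_):

```haskell
import Control.Monad (forM, replicateM_)

main :: IO ()
main = do
  -- forM builds and returns a list of all the results, even when
  -- every result is just ().
  xs <- forM [1 .. 5 :: Int] (\_ -> pure ())
  print (length xs)
  -- replicateM_ runs the same effects but discards the results,
  -- so no result list is ever allocated.
  replicateM_ 5 (pure () :: IO ())
  putStrLn "done"
```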
Hope this helps,
David
[1] https://github.com/vacationlabs/monad-transformer-benchmark/pull/2
On 29 January 2017 at 07:26, Saurabh Nanda
wrote: Hi,
I was noticing a severe drop in performance when Lucid's HtmlT was combined with Scotty's ActionT. I've tried putting together a minimal repro at https://github.com/vacationlabs/monad-transformer-benchmark and would request someone with better knowledge of benchmarking to check whether the methodology is correct.
Is my reading of 200ms performance penalty correct?
-- Saurabh.
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Why do you keep expecting the compiler to "be smart"? It's just shuffling data around; any kind of magic efficiency with monadic computations requires specific knowledge about monads, which is not something we encode into the compiler. Just saying "this should be obvious" is not very productive.
On Sun, 29 Jan 2017, 4:59 pm Saurabh Nanda,
Thanks for digging deeper, David. What exactly did you inline?
Also, am I the only one losing my mind over this? It's such a straightforward use of available code structuring tools in Haskell. How come the compiler is not being smart about this OOB?
-- Saurabh.
On 29 Jan 2017 9:42 pm, "David Turner"
wrote: Here's the profiling summary that I got:
COST CENTRE MODULE %time %alloc
getOverhead Criterion.Monad 41.3 0.0
>>=                              Lucid.Base                          19.2  41.6
makeElement.\.\                  Lucid.Base                          11.4  23.4
fromHtmlEscapedString            Blaze.ByteString.Builder.Html.Utf8   7.9  14.9
>>=                              Data.Vector.Fusion.Util              2.3   1.7
return                           Lucid.Base                           1.4   2.1
runBenchmark.loop                Criterion.Measurement                1.2   0.0
with.\                           Lucid.Base                           1.0   2.1
foldlMapWithKey                  Lucid.Base                           0.5   2.6
streamDecodeUtf8With.decodeChunk Data.Text.Encoding                   0.0   1.7

Why do you keep expecting the compiler to "be smart"? It's just shuffling data around, any type of magic efficiency with Monadic computations requires specific knowledge about monads, which is not something we encode into the compiler. just saying "this should be obvious" is not very productive.
Two reasons:

* If inlining certain functions can give a boost to the performance, then is it unreasonable to expect the compiler to have better heuristics about when commonly occurring code patterns should be inlined? In this case monads and mtl being the commonly occurring code patterns.

* At a broader level, the promise of writing pure functions was to be able to talk about 'intent', not 'implementation' -- the classic `map` vs `for loop` example. Additionally, pure functions give the compiler enough opportunity to optimise code. Both these higher level promises are being broken by this experience. Hence, I'm feeling "cheated".

-- Saurabh.

Was going through
https://downloads.haskell.org/~ghc/7.0.3/docs/html/users_guide/pragmas.html
and came across the SPECIALIZE pragma. Is it possible to write it in a way that it specializes a complete monad transformer stack to a given concrete stack?
-- Saurabh.
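For what it's worth, SPECIALIZE works per binding rather than per stack: each pragma asks GHC to compile one extra copy of one polymorphic function at one concrete type, so calls at that type avoid dictionary passing. A hedged sketch (countdown and its stack are made up for illustration):

```haskell
import Control.Monad.Trans.State  -- from the transformers package

-- A polymorphic function over an abstract base monad m.
countdown :: Monad m => StateT Int m Int
countdown = do
  n <- get
  if n <= 0 then pure n else put (n - 1) >> countdown
-- Ask GHC for a dedicated copy compiled at the concrete stack:
{-# SPECIALIZE countdown :: StateT Int IO Int #-}

main :: IO ()
main = do
  r <- evalStateT countdown 5
  print r
```

There is no single pragma that specializes every function in a stack at once; each exported polymorphic function would need its own SPECIALIZE (or INLINABLE, which lets GHC specialize at use sites).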

Saurabh Nanda
Two reasons:
* If inlining certain functions can give a boost to the performance, then is it unreasonable to expect the compiler to have better heuristics about when commonly occurring code patterns should be inlined? In this case monads and mtl being the commonly occurring code patterns.
* At a broader level, the promise of writing pure functions was to be able to talk about 'intent', not 'implementation' -- the classic `map` vs `for loop` example. Additionally, pure functions give the compiler enough opportunity to optimise code. Both these higher level promises are being broken by this experience. Hence, I'm feeling "cheated"
Inlining can only happen if the code to be inlined is available. This means explicitly marking things with {-# INLINE #-} or {-# INLINEABLE #-} (or GHC seeing something as "small" and deciding it's worth putting in the .hi file). If I give you "foo True" and tell you nothing more about "foo" - how are you going to optimise that? That's essentially what is happening - GHC has no more information so all it can do is just call the function and hope for the best.
Additionally, pure functions give the compiler enough opportunity to optimise code. Both these higher level promises are being broken by this experience. Hence, I'm feeling "cheated"
Certainly, but the benefits of purity are different. We use purity to know that we can rewrite ASTs, introduce sharing, lift let-bindings up, move case statements around, and so on. This is very different from "understanding how monad transformers work". You still seem to be suggesting that "it's obvious" a compiler should be able to optimise this code, but without actually being able to inline the code and unleash all of GHC's optimisations, I'm not sure what else the compiler can do. It's just a black box otherwise. Of course, it's a shame that {-# INLINE #-} is almost a requirement for some types of performant code, but right now, it is what it is.

-- ocharles
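The "foo True" point can be sketched in one file; the module boundary that would hide the unfolding is marked in comments, and `expensive` is a made-up example, not from the thread:

```haskell
-- Imagine this binding lives in a library module compiled separately.
expensive :: Bool -> Int
expensive True  = sum [1 .. 100]
expensive False = 0
{-# INLINABLE expensive #-}
-- INLINABLE keeps the right-hand side in the .hi interface file, so a
-- caller in *another* module can still inline or specialise it. Without
-- the pragma (and when GHC doesn't judge it "small"), that caller sees
-- only the type Bool -> Int and can do nothing but emit a call.

main :: IO ()
main = print (expensive True)
```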

Inlining can only happen if the code to be inlined is available. This means explicitly marking things with {-# INLINE #-} or {-# INLINEABLE #-} (or GHC seeing something as "small" and deciding it's worth putting in the .hi file). If I give you "foo True" and tell you nothing more about "foo" - how are you going to optimise that? That's essentially what is happening - GHC has no more information so all it can do is just call the function and hope for the best.

I don't understand. If the compiler can figure out how to inline certain functions automatically (without the use of the INLINE pragma), what's causing monadic binds and lifts to be out of the consideration set? Why do you say GHC has "no more information"? How does it have "more information" about functions that it does decide to inline automatically?

-- Saurabh.

On 7 Feb 2017 02:49, "Saurabh Nanda"

Because, I guess, nobody has put the time and effort into optimising this
particular benchmark. Lucid's fast enough that there are normally other
more pressing bottlenecks in a real application. The compiler has no
relevant smarts here, it's the library code to look at.
There's something funny going on on your system that I can't help with,
since I'm seeing rendering and serving the 230kB HTML page in a reasonably
punchy 10-20ms range, an order of magnitude less than your numbers.
Nonetheless, here is a fork of Lucid which performs substantially better
running renderTextT over the IO monad on the benchmarks I sent earlier,
thanks to a sprinkling of inlining:
https://github.com/chrisdone/lucid/compare/master...DaveCTurner:98d69d045703...
benchmarking renderText
time 4.900 ms (4.577 ms .. 5.218 ms)
0.895 R² (0.749 R² .. 0.988 R²)
mean 5.560 ms (5.189 ms .. 6.461 ms)
std dev 1.717 ms (510.4 μs .. 3.380 ms)
variance introduced by outliers: 95% (severely inflated)
benchmarking renderTextT Id
time 4.879 ms (4.755 ms .. 5.036 ms)
0.989 R² (0.979 R² .. 0.997 R²)
mean 5.057 ms (4.946 ms .. 5.219 ms)
std dev 373.7 μs (285.5 μs .. 483.4 μs)
variance introduced by outliers: 47% (moderately inflated)
benchmarking renderTextT Rd
time 5.034 ms (4.916 ms .. 5.152 ms)
0.994 R² (0.989 R² .. 0.997 R²)
mean 5.226 ms (5.090 ms .. 5.772 ms)
std dev 713.8 μs (261.3 μs .. 1.417 ms)
variance introduced by outliers: 74% (severely inflated)
benchmarking renderTextT IO
time 7.168 ms (6.694 ms .. 7.557 ms)
0.969 R² (0.946 R² .. 0.982 R²)
mean 8.388 ms (8.014 ms .. 8.880 ms)
std dev 1.132 ms (932.1 μs .. 1.397 ms)
variance introduced by outliers: 71% (severely inflated)
and here are all the things I tried on it:
https://github.com/chrisdone/lucid/compare/master...DaveCTurner:inline-the-t...
Hope that helps,
David
On 29 January 2017 at 16:59, Saurabh Nanda
Thanks for digging deeper, David. What exactly did you inline?
Also, am I the only one losing my mind over this? It's such a straightforward use of available code structuring tools in Haskell. How come the compiler is not being smart about this OOB?
-- Saurabh.
participants (3)
- David Turner
- Oliver Charles
- Saurabh Nanda