Re: [Haskell-cafe] Monad transformer performance - Request to review benchmarking code + results

Thanks for digging deeper, David. What exactly did you inline?
Also, am I the only one losing my mind over this? It's such a
straightforward use of available code structuring tools in Haskell. How
come the compiler is not being smart about this OOB?
-- Saurabh.
On 29 Jan 2017 9:42 pm, "David Turner"
wrote: Here's the profiling summary that I got:

COST CENTRE                      MODULE                             %time %alloc

getOverhead                      Criterion.Monad                     41.3   0.0
>>=                              Lucid.Base                          19.2  41.6
makeElement.\.\                  Lucid.Base                          11.4  23.4
fromHtmlEscapedString            Blaze.ByteString.Builder.Html.Utf8   7.9  14.9
>>=                              Data.Vector.Fusion.Util              2.3   1.7
return                           Lucid.Base                           1.4   2.1
runBenchmark.loop                Criterion.Measurement                1.2   0.0
with.\                           Lucid.Base                           1.0   2.1
foldlMapWithKey                  Lucid.Base                           0.5   2.6
streamDecodeUtf8With.decodeChunk Data.Text.Encoding                   0.0   1.7
As expected, HtmlT's bind is the expensive bit. However I've been unable to
encourage it to go away using INLINE pragmas.
On 29 January 2017 at 15:45, Oliver Charles
I would start by inlining operations in the Functor, Applicative and Monad classes for your monad and all the layers in the stack (such as HtmlT). An un-inlined monadic bind can end up allocating a lot (as it's such a common operation).
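To illustrate the suggestion, here is a minimal sketch of a writer-like transformer with INLINE pragmas on every class method. TraceT and its methods are invented for illustration; this is not Lucid's actual code, just the shape of the fix being proposed:

```haskell
import Control.Monad (ap, liftM)

-- A made-up WriterT-like transformer, loosely in the spirit of HtmlT.
newtype TraceT m a = TraceT { runTraceT :: m (a, [String]) }

instance Monad m => Functor (TraceT m) where
  fmap = liftM
  {-# INLINE fmap #-}

instance Monad m => Applicative (TraceT m) where
  pure a = TraceT (pure (a, []))
  {-# INLINE pure #-}
  (<*>) = ap
  {-# INLINE (<*>) #-}

instance Monad m => Monad (TraceT m) where
  -- Without the INLINE pragma, every (>>=) at an unknown `m` goes
  -- through a dictionary and allocates the intermediate tuples.
  m >>= f = TraceT $ do
    (a, w1) <- runTraceT m
    (b, w2) <- runTraceT (f a)
    pure (b, w1 ++ w2)
  {-# INLINE (>>=) #-}

main :: IO ()
main = do
  (x, logs) <- runTraceT $ do
    TraceT (pure ((), ["step 1"]))
    TraceT (pure (42 :: Int, ["step 2"]))
  print (x, logs)
```

With the pragmas in place, GHC keeps the unfoldings in the interface file, so a caller compiling against a concrete stack can collapse the binds away.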
On Sun, 29 Jan 2017, 3:32 pm Saurabh Nanda,
wrote: Please tell me what to INLINE. I'll update the benchmarks.
Also, shouldn't this be treated as a GHC bug then? Using monad transformers as intended should not result in a severe performance penalty! Either monad transformers themselves are a problem or GHC is not doing the right thing.
-- Saurabh.
On 29 Jan 2017 7:50 pm, "Oliver Charles"
wrote: I would wager a guess that this can be solved with INLINE pragmas. We recently added INLINE to just about everything in transformers and got a significant speed up.
On Sun, 29 Jan 2017, 11:18 am David Turner, < dct25-561bs@mythic-beasts.com> wrote:
I would guess that the issue lies within HtmlT, which looks vaguely similar to a WriterT transformer but without much in the way of optimisation (e.g. INLINE pragmas). But that's just a guess after about 30 sec of glancing at https://hackage.haskell.org/package/lucid-2.9.7/docs/src/Lucid-Base.html so don't take it as gospel.
My machine is apparently an i7-4770 of a similar vintage to yours, running Ubuntu in a VirtualBox VM hosted on Windows. 4GB of RAM in the VM, 16 in the host FWIW.
On 29 Jan 2017 10:26, "Saurabh Nanda"
wrote: Thank you for the PR. Does your research suggest something is wrong with HtmlT when combined with any MonadIO, not necessarily ActionT? Is this an mtl issue or a lucid issue in that case?
Curiously, what's your machine config? I'm on a late 2011 macbook pro with 10G ram and some old i5.
-- Saurabh.
On 29 Jan 2017 3:05 pm, "David Turner"
wrote: The methodology does look reasonable, although I think you should wait for all the scotty threads to start before starting the benchmarks, as I see this interleaved output:
Setting phasers to stun... (port 3002) (ctrl-c to quit)
Setting phasers to stun... (port 3003) (ctrl-c to quit)
Setting phasers to stun... (port 3001) (ctrl-c to quit)
benchmarking bareScotty
Setting phasers to stun... (port 3000) (ctrl-c to quit)
Your numbers are wayyy slower than the ones I see on my dev machine:
benchmarking bareScotty
Setting phasers to stun... (port 3000) (ctrl-c to quit)
time                 10.94 ms   (10.36 ms .. 11.52 ms)
                     0.979 R²   (0.961 R² .. 0.989 R²)
mean                 12.53 ms   (11.98 ms .. 13.28 ms)
std dev              1.702 ms   (1.187 ms .. 2.589 ms)
variance introduced by outliers: 66% (severely inflated)

benchmarking bareScottyBareLucid
time                 12.95 ms   (12.28 ms .. 13.95 ms)
                     0.972 R²   (0.951 R² .. 0.989 R²)
mean                 12.20 ms   (11.75 ms .. 12.69 ms)
std dev              1.236 ms   (991.3 μs .. 1.601 ms)
variance introduced by outliers: 50% (severely inflated)

benchmarking transScottyBareLucid
time                 12.05 ms   (11.70 ms .. 12.39 ms)
                     0.992 R²   (0.982 R² .. 0.996 R²)
mean                 12.43 ms   (12.06 ms .. 13.01 ms)
std dev              1.320 ms   (880.5 μs .. 2.071 ms)
variance introduced by outliers: 54% (severely inflated)

benchmarking transScottyTransLucid
time                 39.73 ms   (32.16 ms .. 49.45 ms)
                     0.668 R²   (0.303 R² .. 0.969 R²)
mean                 42.59 ms   (36.69 ms .. 54.38 ms)
std dev              16.52 ms   (8.456 ms .. 25.96 ms)
variance introduced by outliers: 92% (severely inflated)

benchmarking bareScotty
time                 11.46 ms   (10.89 ms .. 12.07 ms)
                     0.986 R²   (0.975 R² .. 0.994 R²)
mean                 11.73 ms   (11.45 ms .. 12.07 ms)
std dev              800.6 μs   (636.8 μs .. 975.3 μs)
variance introduced by outliers: 34% (moderately inflated)
but nonetheless I do also see the one using renderTextT to be substantially slower than the one without.
I've sent you a PR [1] that isolates Lucid from Scotty and shows that renderTextT is twice as slow over IO than it is over Identity, and it's ~10% slower over Reader too:
benchmarking renderText
time                 5.529 ms   (5.328 ms .. 5.709 ms)
                     0.990 R²   (0.983 R² .. 0.995 R²)
mean                 5.645 ms   (5.472 ms .. 5.888 ms)
std dev              593.0 μs   (352.5 μs .. 908.2 μs)
variance introduced by outliers: 63% (severely inflated)

benchmarking renderTextT Id
time                 5.439 ms   (5.243 ms .. 5.640 ms)
                     0.991 R²   (0.985 R² .. 0.996 R²)
mean                 5.498 ms   (5.367 ms .. 5.631 ms)
std dev              408.8 μs   (323.8 μs .. 552.9 μs)
variance introduced by outliers: 45% (moderately inflated)

benchmarking renderTextT Rd
time                 6.173 ms   (5.983 ms .. 6.396 ms)
                     0.990 R²   (0.983 R² .. 0.995 R²)
mean                 6.284 ms   (6.127 ms .. 6.527 ms)
std dev              581.6 μs   (422.9 μs .. 773.0 μs)
variance introduced by outliers: 55% (severely inflated)

benchmarking renderTextT IO
time                 12.35 ms   (11.84 ms .. 12.84 ms)
                     0.989 R²   (0.982 R² .. 0.995 R²)
mean                 12.22 ms   (11.85 ms .. 12.76 ms)
std dev              1.159 ms   (729.5 μs .. 1.683 ms)
variance introduced by outliers: 50% (severely inflated)
I tried replacing
forM [1..10000] (\_ -> div_ "hello world!")
with
replicateM_ 10000 (div_ "hello world!")
which discards the list of 10,000 () values that the forM thing generates, but this made very little difference.
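The difference between the two variants can be seen in a tiny standalone sketch (unrelated to the benchmark itself; plain IO actions stand in for div_):

```haskell
import Control.Monad (forM, replicateM_)

main :: IO ()
main = do
  -- forM builds and returns a list of all the results, even when
  -- every result is just ().
  xs <- forM [1 .. 5 :: Int] (\_ -> pure ())
  print (length xs)
  -- replicateM_ runs the same effects but discards the results,
  -- so no result list is ever allocated.
  replicateM_ 5 (pure () :: IO ())
  putStrLn "done"
```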
Hope this helps,
David
[1] https://github.com/vacationlabs/monad-transformer-benchmark/pull/2
On 29 January 2017 at 07:26, Saurabh Nanda
wrote: Hi,
I was noticing a severe drop in performance when Lucid's HtmlT was combined with Scotty's ActionT. I've tried putting together a minimal repro at https://github.com/vacationlabs/monad-transformer-benchmark and would request someone with better knowledge of benchmarking to check whether the methodology is correct.
Is my reading of 200ms performance penalty correct?
-- Saurabh.
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Why do you keep expecting the compiler to "be smart"? It's just shuffling data around; any kind of magic efficiency with monadic computations requires specific knowledge about monads, which is not something we encode into the compiler. Just saying "this should be obvious" is not very productive.
On Sun, 29 Jan 2017, 4:59 pm Saurabh Nanda,
Thanks for digging deeper, David. What exactly did you inline?
Also, am I the only one losing my mind over this? It's such a straightforward use of available code structuring tools in Haskell. How come the compiler is not being smart about this OOB?
-- Saurabh.
On 29 Jan 2017 9:42 pm, "David Turner"
wrote: Here's the profiling summary that I got:
COST CENTRE MODULE %time %alloc
getOverhead Criterion.Monad 41.3 0.0
>>=                              Lucid.Base                          19.2  41.6
makeElement.\.\                  Lucid.Base                          11.4  23.4
fromHtmlEscapedString            Blaze.ByteString.Builder.Html.Utf8   7.9  14.9
>>=                              Data.Vector.Fusion.Util              2.3   1.7
return                           Lucid.Base                           1.4   2.1
runBenchmark.loop                Criterion.Measurement                1.2   0.0
with.\                           Lucid.Base                           1.0   2.1
foldlMapWithKey                  Lucid.Base                           0.5   2.6
streamDecodeUtf8With.decodeChunk Data.Text.Encoding                   0.0   1.7

Why do you keep expecting the compiler to "be smart"? It's just shuffling data around, any type of magic efficiency with Monadic computations requires specific knowledge about monads, which is not something we encode into the compiler. just saying "this should be obvious" is not very productive.
Two reasons:

* If inlining certain functions can give a boost to the performance, then is it unreasonable to expect the compiler to have better heuristics about when commonly occurring code patterns should be inlined? In this case monads and mtl being the commonly occurring code patterns.

* At a broader level, the promise of writing pure functions was to be able to talk about 'intent', not 'implementation' -- the classic `map` vs `for loop` example. Additionally, pure functions give the compiler enough opportunity to optimise code. Both these higher level promises are being broken by this experience. Hence, I'm feeling "cheated".

-- Saurabh.

Was going through
https://downloads.haskell.org/~ghc/7.0.3/docs/html/users_guide/pragmas.html
and came across the SPECIALIZE pragma. Is it possible to write it in a way that it specializes a complete monad transformer stack to a given concrete stack?
-- Saurabh.
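For what it's worth, SPECIALIZE works per binding rather than per stack: each pragma asks GHC to compile one extra copy of one polymorphic function at one concrete type, so calls at that type avoid dictionary passing. A hedged sketch (countdown and its stack are made up for illustration):

```haskell
import Control.Monad.Trans.State  -- from the transformers package

-- A polymorphic function over an abstract base monad m.
countdown :: Monad m => StateT Int m Int
countdown = do
  n <- get
  if n <= 0 then pure n else put (n - 1) >> countdown
-- Ask GHC for a dedicated copy compiled at the concrete stack:
{-# SPECIALIZE countdown :: StateT Int IO Int #-}

main :: IO ()
main = do
  r <- evalStateT countdown 5
  print r
```

There is no single pragma that specializes every function in a stack at once; each exported polymorphic function would need its own SPECIALIZE (or INLINABLE, which lets GHC specialize at use sites).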

Saurabh Nanda
Two reasons:
* If inlining certain functions can give a boost to the performance, then is it unreasonable to expect the compiler to have better heuristics about when commonly occurring code patterns should be inlined? In this case monads and mtl being the commonly occurring code patterns.
* At a broader level, the promise of writing pure functions was to be able to talk about 'intent', not 'implementation' -- the classic `map` vs `for loop` example. Additionally, pure functions give the compiler enough opportunity to optimise code. Both these higher level promises are being broken by this experience. Hence, I'm feeling "cheated"
Inlining can only happen if the code to be inlined is available. This means explicitly marking things with {-# INLINE #-} or {-# INLINEABLE #-} (or GHC seeing something as "small" and deciding it's worth putting in the .hi file). If I give you "foo True" and tell you nothing more about "foo" - how are you going to optimise that? That's essentially what is happening - GHC has no more information so all it can do is just call the function and hope for the best.
Additionally, pure functions give the compiler enough opportunity to optimise code. Both these higher level promises are being broken by this experience. Hence, I'm feeling "cheated"
Certainly, but the benefits of purity are different. We use purity to know that we can rewrite ASTs, introduce sharing, lift let-bindings up, move case statements around, and so on. This is very different from "understanding how monad transformers work". You still seem to be suggesting that "it's obvious" a compiler should be able to optimise this code, but without actually being able to inline the code and unleash all of GHC's optimisations, I'm not sure what else the compiler can do. It's just a black box otherwise. Of course, it's a shame that {-# INLINE #-} is almost a requirement for some types of performant code, but right now, it is what it is.

-- ocharles
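The "foo True" point can be sketched in one file; the module boundary that would hide the unfolding is marked in comments, and `expensive` is a made-up example, not from the thread:

```haskell
-- Imagine this binding lives in a library module compiled separately.
expensive :: Bool -> Int
expensive True  = sum [1 .. 100]
expensive False = 0
{-# INLINABLE expensive #-}
-- INLINABLE keeps the right-hand side in the .hi interface file, so a
-- caller in *another* module can still inline or specialise it. Without
-- the pragma (and when GHC doesn't judge it "small"), that caller sees
-- only the type Bool -> Int and can do nothing but emit a call.

main :: IO ()
main = print (expensive True)
```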

Inlining can only happen if the code to be inlined is available. This means explicitly marking things with {-# INLINE #-} or {-# INLINEABLE #-} (or GHC seeing something as "small" and deciding it's worth putting in the .hi file). If I give you "foo True" and tell you nothing more about "foo" - how are you going to optimise that? That's essentially what is happening - GHC has no more information so all it can do is just call the function and hope for the best.

I don't understand. If the compiler can figure out how to inline certain functions automatically (without the use of the INLINE pragma), what's causing monadic binds and lifts to be out of the consideration set? Why do you say GHC has "no more information"? How does it have "more information" about functions that it does decide to inline automatically?

-- Saurabh.

On 7 Feb 2017 02:49, "Saurabh Nanda"

Because, I guess, nobody has put the time and effort into optimising this
particular benchmark. Lucid's fast enough that there are normally other
more pressing bottlenecks in a real application. The compiler has no
relevant smarts here, it's the library code to look at.
There's something funny going on on your system that I can't help with,
since I'm seeing rendering and serving the 230kB HTML page in a reasonably
punchy 10-20ms range, an order of magnitude less than your numbers.
Nonetheless, here is a fork of Lucid which performs substantially better
running renderTextT over the IO monad on the benchmarks I sent earlier,
thanks to a sprinkling of inlining:
https://github.com/chrisdone/lucid/compare/master...DaveCTurner:98d69d045703...
benchmarking renderText
time 4.900 ms (4.577 ms .. 5.218 ms)
0.895 R² (0.749 R² .. 0.988 R²)
mean 5.560 ms (5.189 ms .. 6.461 ms)
std dev 1.717 ms (510.4 μs .. 3.380 ms)
variance introduced by outliers: 95% (severely inflated)
benchmarking renderTextT Id
time 4.879 ms (4.755 ms .. 5.036 ms)
0.989 R² (0.979 R² .. 0.997 R²)
mean 5.057 ms (4.946 ms .. 5.219 ms)
std dev 373.7 μs (285.5 μs .. 483.4 μs)
variance introduced by outliers: 47% (moderately inflated)
benchmarking renderTextT Rd
time 5.034 ms (4.916 ms .. 5.152 ms)
0.994 R² (0.989 R² .. 0.997 R²)
mean 5.226 ms (5.090 ms .. 5.772 ms)
std dev 713.8 μs (261.3 μs .. 1.417 ms)
variance introduced by outliers: 74% (severely inflated)
benchmarking renderTextT IO
time 7.168 ms (6.694 ms .. 7.557 ms)
0.969 R² (0.946 R² .. 0.982 R²)
mean 8.388 ms (8.014 ms .. 8.880 ms)
std dev 1.132 ms (932.1 μs .. 1.397 ms)
variance introduced by outliers: 71% (severely inflated)
and here are all the things I tried on it:
https://github.com/chrisdone/lucid/compare/master...DaveCTurner:inline-the-t...
Hope that helps,
David
On 29 January 2017 at 16:59, Saurabh Nanda
Thanks for digging deeper, David. What exactly did you inline?
Also, am I the only one losing my mind over this? It's such a straightforward use of available code structuring tools in Haskell. How come the compiler is not being smart about this OOB?
-- Saurabh.
participants (3)
- David Turner
- Oliver Charles
- Saurabh Nanda