I would start by inlining the operations in the Functor, Applicative and Monad instances for your monad and for every layer in the stack (such as HtmlT). A monadic bind that is not inlined can end up allocating a lot, because it is such a common operation.
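
As a rough illustration of that suggestion, here is a minimal sketch of a writer-like transformer with INLINE pragmas on its Functor, Applicative and Monad methods. OutT, emit, renderOutT and example are hypothetical names made up for this sketch; this is not Lucid's actual HtmlT definition, and whether the pragmas remove the slowdown seen in the benchmarks would still need to be measured.

```haskell
import Control.Monad (replicateM_)
import Data.Functor.Identity (Identity, runIdentity)

-- Hypothetical writer-like transformer (NOT Lucid's HtmlT): it carries the
-- accumulated output as a difference list next to the result.
newtype OutT m a = OutT { runOutT :: m (String -> String, a) }

instance Functor m => Functor (OutT m) where
  fmap f (OutT m) = OutT (fmap (\(w, a) -> (w, f a)) m)
  {-# INLINE fmap #-}

instance Monad m => Applicative (OutT m) where
  pure a = OutT (return (id, a))
  {-# INLINE pure #-}
  OutT mf <*> OutT mx = OutT $ do
    (w1, f) <- mf
    (w2, x) <- mx
    return (w1 . w2, f x)
  {-# INLINE (<*>) #-}

instance Monad m => Monad (OutT m) where
  -- Bind runs on every step of a do-block, so it is the method where a
  -- missed inlining opportunity is most likely to show up.
  OutT m >>= k = OutT $ do
    (w1, a) <- m
    (w2, b) <- runOutT (k a)
    return (w1 . w2, b)
  {-# INLINE (>>=) #-}

-- Append one chunk of output.
emit :: Monad m => String -> OutT m ()
emit s = OutT (return ((s ++), ()))
{-# INLINE emit #-}

-- Run the transformer and return the collected output.
renderOutT :: Monad m => OutT m a -> m String
renderOutT (OutT m) = do
  (w, _) <- m
  return (w "")
{-# INLINE renderOutT #-}

-- Small usage example over Identity.
example :: String
example = runIdentity (renderOutT (replicateM_ 3 (emit "<div>hi</div>")))
```

Loading this in GHCi and evaluating example gives three concatenated div elements. Comparing the Core (for instance with -ddump-simpl) with and without the pragmas is one way to check whether GHC actually inlines the bind at the use sites.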

On Sun, 29 Jan 2017, 3:32 pm Saurabh Nanda, <saurabhnanda@gmail.com> wrote:

Please tell me what to INLINE. I'll update the benchmarks.

Also, shouldn't this be treated as a GHC bug then? Using monad transformers as intended should not result in a severe performance penalty! Either monad transformers themselves are a problem or GHC is not doing the right thing.

-- Saurabh.

On 29 Jan 2017 7:50 pm, "Oliver Charles" <ollie@ocharles.org.uk> wrote:

I would wager a guess that this can be solved with INLINE pragmas. We recently added INLINE to just about everything in transformers and got a significant speed up.

On Sun, 29 Jan 2017, 11:18 am David Turner, <dct25-561bs@mythic-beasts.com> wrote:

I would guess that the issue lies within HtmlT, which looks vaguely similar to a WriterT transformer but without much in the way of optimisation (e.g. INLINE pragmas). But that's just a guess after about 30 sec of glancing at https://hackage.haskell.org/package/lucid-2.9.7/docs/src/Lucid-Base.html so don't take it as gospel.

My machine is apparently an i7-4770 of a similar vintage to yours, running Ubuntu in a VirtualBox VM hosted on Windows. 4GB of RAM in the VM, 16 in the host FWIW.

On 29 Jan 2017 10:26, "Saurabh Nanda" <saurabhnanda@gmail.com> wrote:

Thank you for the PR. Does your research suggest something is wrong with HtmlT when combined with any MonadIO, not necessarily ActionT? Is this an mtl issue or a lucid issue in that case?

Curiously, what's your machine config? I'm on a late 2011 macbook pro with 10G ram and some old i5.

-- Saurabh.

On 29 Jan 2017 3:05 pm, "David Turner" <dct25-561bs@mythic-beasts.com> wrote:

The methodology does look reasonable, although I think you should wait for all the scotty threads to start before starting the benchmarks, as I see this interleaved output:

    Setting phasers to stun... (port 3002) (ctrl-c to quit)
    Setting phasers to stun... (port 3003) (ctrl-c to quit)
    Setting phasers to stun... (port 3001) (ctrl-c to quit)
    benchmarking bareScotty
    Setting phasers to stun... (port 3000) (ctrl-c to quit)

Your numbers are wayyy slower than the ones I see on my dev machine:

    benchmarking bareScotty
    Setting phasers to stun... (port 3000) (ctrl-c to quit)
    time                 10.94 ms   (10.36 ms .. 11.52 ms)
                         0.979 R²   (0.961 R² .. 0.989 R²)
    mean                 12.53 ms   (11.98 ms .. 13.28 ms)
    std dev              1.702 ms   (1.187 ms .. 2.589 ms)
    variance introduced by outliers: 66% (severely inflated)

    benchmarking bareScottyBareLucid
    time                 12.95 ms   (12.28 ms .. 13.95 ms)
                         0.972 R²   (0.951 R² .. 0.989 R²)
    mean                 12.20 ms   (11.75 ms .. 12.69 ms)
    std dev              1.236 ms   (991.3 μs .. 1.601 ms)
    variance introduced by outliers: 50% (severely inflated)

    benchmarking transScottyBareLucid
    time                 12.05 ms   (11.70 ms .. 12.39 ms)
                         0.992 R²   (0.982 R² .. 0.996 R²)
    mean                 12.43 ms   (12.06 ms .. 13.01 ms)
    std dev              1.320 ms   (880.5 μs .. 2.071 ms)
    variance introduced by outliers: 54% (severely inflated)

    benchmarking transScottyTransLucid
    time                 39.73 ms   (32.16 ms .. 49.45 ms)
                         0.668 R²   (0.303 R² .. 0.969 R²)
    mean                 42.59 ms   (36.69 ms .. 54.38 ms)
    std dev              16.52 ms   (8.456 ms .. 25.96 ms)
    variance introduced by outliers: 92% (severely inflated)

    benchmarking bareScotty
    time                 11.46 ms   (10.89 ms .. 12.07 ms)
                         0.986 R²   (0.975 R² .. 0.994 R²)
    mean                 11.73 ms   (11.45 ms .. 12.07 ms)
    std dev              800.6 μs   (636.8 μs .. 975.3 μs)
    variance introduced by outliers: 34% (moderately inflated)

but nonetheless I do also see the one using renderTextT to be substantially slower than the one without.

I've sent you a PR [1] that isolates Lucid from Scotty and shows that renderTextT is twice as slow over IO than it is over Identity, and it's ~10% slower over Reader too:

    benchmarking renderText
    time                 5.529 ms   (5.328 ms .. 5.709 ms)
                         0.990 R²   (0.983 R² .. 0.995 R²)
    mean                 5.645 ms   (5.472 ms .. 5.888 ms)
    std dev              593.0 μs   (352.5 μs .. 908.2 μs)
    variance introduced by outliers: 63% (severely inflated)

    benchmarking renderTextT Id
    time                 5.439 ms   (5.243 ms .. 5.640 ms)
                         0.991 R²   (0.985 R² .. 0.996 R²)
    mean                 5.498 ms   (5.367 ms .. 5.631 ms)
    std dev              408.8 μs   (323.8 μs .. 552.9 μs)
    variance introduced by outliers: 45% (moderately inflated)

    benchmarking renderTextT Rd
    time                 6.173 ms   (5.983 ms .. 6.396 ms)
                         0.990 R²   (0.983 R² .. 0.995 R²)
    mean                 6.284 ms   (6.127 ms .. 6.527 ms)
    std dev              581.6 μs   (422.9 μs .. 773.0 μs)
    variance introduced by outliers: 55% (severely inflated)

    benchmarking renderTextT IO
    time                 12.35 ms   (11.84 ms .. 12.84 ms)
                         0.989 R²   (0.982 R² .. 0.995 R²)
    mean                 12.22 ms   (11.85 ms .. 12.76 ms)
    std dev              1.159 ms   (729.5 μs .. 1.683 ms)
    variance introduced by outliers: 50% (severely inflated)

I tried replacing

    forM [1..10000] (\_ -> div_ "hello world!")

with

    replicateM_ 10000 (div_ "hello world!")

which discards the list of 10,000 () values that the forM thing generates, but this made very little difference.

Hope this helps,

David

On 29 January 2017 at 07:26, Saurabh Nanda <saurabhnanda@gmail.com> wrote:

Hi,

I was noticing severe drop in performance when Lucid's HtmlT was being combined with Scotty's ActionT. I've tried putting together a minimal repro at https://github.com/vacationlabs/monad-transformer-benchmark

Request someone with better knowledge of benchmarking to check if the benchmarking methodology is correct.

Is my reading of 200ms performance penalty correct?

-- Saurabh.
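
For readers unfamiliar with the forM to replicateM_ substitution mentioned in David's message above, the following sketch shows the difference in plain IO, using only base. The names runForM, runReplicateM_ and compareRuns are hypothetical and exist only for this illustration; it does not use Lucid and says nothing about timing.

```haskell
import Control.Monad (forM, replicateM_)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Run an action n times with forM, which builds and returns a list
-- holding one result per iteration.
runForM :: Int -> IO () -> IO [()]
runForM n act = forM [1 .. n] (\_ -> act)

-- Run an action n times with replicateM_, which discards every result.
runReplicateM_ :: Int -> IO () -> IO ()
runReplicateM_ n act = replicateM_ n act

-- Returns (length of the list forM produced,
--          times the action ran under forM,
--          times the action ran under replicateM_).
compareRuns :: Int -> IO (Int, Int, Int)
compareRuns n = do
  ref <- newIORef (0 :: Int)
  let tick = modifyIORef' ref (+ 1)
  units <- runForM n tick
  afterForM <- readIORef ref
  runReplicateM_ n tick
  afterBoth <- readIORef ref
  return (length units, afterForM, afterBoth - afterForM)
```

Both variants run the action the same number of times; the only difference is the list of () values that forM hands back, which is consistent with the observation that the swap made very little difference.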
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.