
Updated my gist and looped the benchmark to run 1000 times to reduce
variance caused by measurement error. My work machine is a 6-core Westmere,
but I'm home now and ran the benchmark on my quad-core i7 MacBook Pro (GHC
7.4.1, -O2, 64-bit), and I get the same results:
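The looping amounts to something like the following (a sketch of one way to do it, not necessarily what's in the gist; the file name and benchmark body are taken from the earlier benchmark in this thread):

import Criterion.Main
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import qualified Data.Char

-- Run each body 1000 times per sample by mapping over 1000 copies of the
-- input and forcing every resulting length with nf.
main :: IO ()
main = do
    input <- S.readFile "bench.hs"
    defaultMain
        [ bench "Char8" $
            nf (map (S.length . S8.map Data.Char.toLower)) (replicate 1000 input)
        ]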
master /Users/greg/tmp/haskell/gist-3756876 [*] $ ./bench
warming up
estimating clock resolution...
mean is 1.300787 us (640001 iterations)
found 4606 outliers among 639999 samples (0.7%)
3708 (0.6%) high severe
estimating cost of a clock call...
mean is 57.02712 ns (8 iterations)
found 1 outliers among 8 samples (12.5%)
1 (12.5%) high severe
benchmarking Char8
collecting 100 samples, 1 iterations each, in estimated 19.08500 s
mean: 190.8388 ms, lb 190.7002 ms, ub 190.9720 ms, ci 0.950
std dev: 695.8268 us, lb 597.0216 us, ub 840.2860 us, ci 0.950
benchmarking Char8 toLowerC
mean: 7.421057 ms, lb 7.398322 ms, ub 7.444365 ms, ci 0.950
std dev: 117.5378 us, lb 100.6587 us, ub 139.2367 us, ci 0.950
found 9 outliers among 100 samples (9.0%)
4 (4.0%) low mild
5 (5.0%) high mild
variance introduced by outliers: 8.495%
variance is slightly inflated by outliers
benchmarking bsToLower
mean: 15.43166 ms, lb 15.39326 ms, ub 15.47557 ms, ci 0.950
std dev: 210.4574 us, lb 178.0131 us, ub 256.2110 us, ci 0.950
found 4 outliers among 100 samples (4.0%)
4 (4.0%) high mild
variance introduced by outliers: 6.588%
variance is slightly inflated by outliers
benchmarking Word8
mean: 21.76645 ms, lb 21.72059 ms, ub 21.82300 ms, ci 0.950
std dev: 259.4630 us, lb 209.6453 us, ub 374.8074 us, ci 0.950
G
On Thu, Sep 20, 2012 at 6:58 PM, Michael Snoyman wrote:
On Thu, Sep 20, 2012 at 7:25 PM, Gregory Collins wrote:
Hey Michael,
BTW -- you're getting crap performance here because of the fromEnum/toEnum round-trip in toLowerC (toEnum does a range check). An updated version using unsafeChr is faster than your bsToLower call: https://gist.github.com/3756876
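Roughly, the change looks like this (a sketch of the idea, not the gist's exact code; unsafeChr comes from GHC.Base):

import GHC.Base (unsafeChr)

-- Skip the range check that toEnum/chr performs; adding 32 to an
-- upper-case ASCII letter always yields a valid Char.
toLowerC :: Char -> Char
toLowerC c
    | c >= 'A' && c <= 'Z' = unsafeChr (fromEnum c + 32)
    | otherwise            = c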
master /home/gdc/tmp/haskell/chr/gist-3756876 ⚠ $ ./bench
warming up
estimating clock resolution...
mean is 1.290661 us (640001 iterations)
found 2935 outliers among 639999 samples (0.5%)
2541 (0.4%) high severe
estimating cost of a clock call...
mean is 31.63774 ns (13 iterations)
found 2 outliers among 13 samples (15.4%)
2 (15.4%) high mild
benchmarking Char8
mean: 145.1935 us, lb 141.4375 us, ub 149.8138 us, ci 0.950
std dev: 21.28567 us, lb 18.14038 us, ub 23.87625 us, ci 0.950
found 22 outliers among 100 samples (22.0%)
22 (22.0%) high severe
variance introduced by outliers: 89.411%
variance is severely inflated by outliers
benchmarking Char8 toLowerC
mean: 12.74308 us, lb 12.24657 us, ub 13.31365 us, ci 0.950
std dev: 2.712828 us, lb 2.434402 us, ub 2.904232 us, ci 0.950
variance introduced by outliers: 94.689%
variance is severely inflated by outliers
benchmarking bsToLower
mean: 20.68829 us, lb 20.66869 us, ub 20.70941 us, ci 0.950
std dev: 104.5939 ns, lb 95.37577 ns, ub 120.6401 ns, ci 0.950
G.
On Thu, Sep 20, 2012 at 5:01 PM, Michael Snoyman wrote:
Well... let's test it out:
benchmarking Char8
mean: 333.0050 us, lb 329.2846 us, ub 336.2362 us, ci 0.950
std dev: 17.73400 us, lb 15.69876 us, ub 19.45947 us, ci 0.950
variance introduced by outliers: 51.452%
variance is severely inflated by outliers
benchmarking Char8 toLowerC
mean: 117.1571 us, lb 116.8739 us, ub 117.4219 us, ci 0.950
std dev: 1.394150 us, lb 1.189928 us, ub 1.649276 us, ci 0.950
benchmarking Word8
mean: 41.01667 us, lb 40.94708 us, ub 41.09468 us, ci 0.950
std dev: 378.4175 ns, lb 335.4655 ns, ub 462.6281 ns, ci 0.950
benchmarking bsToLower
mean: 37.37589 us, lb 37.24453 us, ub 37.48697 us, ci 0.950
std dev: 616.5653 ns, lb 513.7510 ns, ub 752.8996 ns, ci 0.950
found 9 outliers among 100 samples (9.0%)
3 (3.0%) low severe
4 (4.0%) low mild
2 (2.0%) high mild
variance introduced by outliers: 9.426%
variance is slightly inflated by outliers
So a specialized `Char -> Char` function helps, but doesn't completely close the performance gap. (Updates at the same gist[1].)
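For reference, the specialized function is roughly of this shape (a sketch; the gist has the exact version, and char8Lower is just a name for the wrapper here):

import qualified Data.ByteString.Char8 as S8

-- ASCII-only case mapping on Char, avoiding Data.Char.toLower's full
-- Unicode handling, applied with the Char8 map.
toLowerC :: Char -> Char
toLowerC c
    | c >= 'A' && c <= 'Z' = toEnum (fromEnum c + 32)
    | otherwise            = c

char8Lower :: S8.ByteString -> S8.ByteString
char8Lower = S8.map toLowerC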
I disagree that an extra package is a problem: this is such a low-level detail that average users won't really need to be aware of the package's existence, and I think the marginal increase in compile times shouldn't cause any issues. I used to worry much more about adding extra packages to the mix, but with more recent versions of cabal-install and the community's general improvement in handling dependency hell, I see less of a reason to worry.
That said, I think having specialized toLower/toUpper in a central place -- perhaps even bytestring itself -- would be a good thing.
Michael
[1] https://gist.github.com/3756212
On Thu, Sep 20, 2012 at 5:47 PM, Gregory Collins wrote:
This is, of course, not an apples-to-apples test:
Prelude Data.Char> toUpper 'χ'
'\935'
Prelude Data.Char> putStrLn ('\935':[])
Χ
...which I suppose is the point. I wonder whether a version of toUpper/toLower on Char restricted to ASCII values would have the same performance here.
We only call toLower explicitly in one place in snap-server, but where it
would be nice to fix this is for HTTP headers, where I think we are all using case-insensitive (which just calls "map toLower"). Probably we should send Bas a patch to optimize the FoldCase instance for ByteString.
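Such a patch would presumably boil down to an ASCII-only byte map, something along these lines (a sketch, not the actual case-insensitive code; foldCaseBS is a hypothetical helper name):

import qualified Data.ByteString as S
import Data.Word (Word8)

-- ASCII-only fold case for header names, mapping directly over the raw
-- bytes instead of going through the Char8 interface and Data.Char.toLower.
foldCaseBS :: S.ByteString -> S.ByteString
foldCaseBS = S.map lowerW8
  where
    lowerW8 :: Word8 -> Word8
    lowerW8 w
        | w >= 65 && w <= 90 = w + 32   -- 'A'..'Z' -> 'a'..'z'
        | otherwise          = w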
Personally I would prefer not to have yet another tiny package here, as the package zoo has enough creatures in it as it is. Do we think we have a real problem here beyond the toUpper/toLower case? I suspect that for most other uses of Data.ByteString.Char8 the conversion is a no-op.
G
On Thu, Sep 20, 2012 at 4:17 PM, Michael Snoyman wrote:
On Thu, Sep 20, 2012 at 2:10 PM, Michael Snoyman wrote:
On Thu, Sep 20, 2012 at 11:41 AM, Kazu Yamamoto wrote:
> Hello,
>
> ByteString is an array of Word8 but it seems to me that people tend to
> use the Char interface with Data.ByteString.Char8 instead of the Word8
> interface with Data.ByteString. Since the functions defined in
> Data.ByteString.Char8 convert Word8 to Char and Char to Word8, it has
> unnecessary overhead. Yes, the overhead is ignorable in many cases,
> but I would like to remove it for a high performance server.
>
> Why do people use Data.ByteString.Char8? I guess that there are two
> reasons:
>
> - There are no standard utility functions for Word8 such as "isUpper"
> - Numeric literals (e.g. 72 for 'H') are not readable
>
> To fix these problems, I implemented the Data.Word8 module and
> uploaded the word8 library to Hackage:
>
> http://hackage.haskell.org/packages/archive/word8/0.0.0/doc/html/Data-Word8....
>
> If Michael and Bas like this, I would like to modify warp and
> case-insensitive to use the word8 library. What do people think of
> this?
>
> My concern is that character names start with "_". Some people would
> dislike this convention. But I do not have a better idea at this
> moment. Suggestions are welcome.
>
> --Kazu
Sounds good to me. I put together a simple benchmark to compare the performance of toLower, and the results are encouraging:
benchmarking Char8
mean: 38.04527 us, lb 37.94080 us, ub 38.12774 us, ci 0.950
std dev: 470.9770 ns, lb 364.8254 ns, ub 748.3015 ns, ci 0.950
benchmarking Word8
mean: 4.807265 us, lb 4.798199 us, ub 4.816563 us, ci 0.950
std dev: 47.20958 ns, lb 41.51181 ns, ub 55.07049 ns, ci 0.950
I want to try throwing one more idea into the mix; I'll post updates when I have them.
So to answer your question: I'd be happy to include word8 in warp :).
Michael
{-# LANGUAGE OverloadedStrings #-}
import Criterion.Main
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import qualified Data.Char
import qualified Data.Word8

main :: IO ()
main = do
    input <- S.readFile "bench.hs"
    defaultMain
        [ bench "Char8" $ whnf (S.length . S8.map Data.Char.toLower) input
        , bench "Word8" $ whnf (S.length . S.map Data.Word8.toLower) input
        ]
I tried implementing a lower-level approach to avoid the Word8 boxing. The results improved a bit, but not significantly:
benchmarking Char8
mean: 318.2341 us, lb 314.5367 us, ub 320.4834 us, ci 0.950
std dev: 14.48230 us, lb 10.00946 us, ub 21.22126 us, ci 0.950
found 9 outliers among 100 samples (9.0%)
8 (8.0%) low severe
variance introduced by outliers: 43.472%
variance is moderately inflated by outliers
benchmarking Word8
mean: 35.79037 us, lb 35.66547 us, ub 35.92601 us, ci 0.950
std dev: 665.5299 ns, lb 599.3413 ns, ub 741.6474 ns, ci 0.950
variance introduced by outliers: 11.349%
variance is moderately inflated by outliers
benchmarking bsToLower
mean: 31.49299 us, lb 31.32314 us, ub 31.65027 us, ci 0.950
std dev: 835.2251 ns, lb 744.4337 ns, ub 946.1789 ns, ci 0.950
variance introduced by outliers: 20.925%
variance is moderately inflated by outliers
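The low-level version is roughly of this shape (a sketch of the approach; the exact code is in the gist below):

import qualified Data.ByteString as S
import qualified Data.ByteString.Internal as BI
import qualified Data.ByteString.Unsafe as BU
import Data.Word (Word8)
import Foreign.Ptr (Ptr)
import Foreign.Storable (pokeByteOff)

-- Walk the input with unsafeIndex and write each lower-cased byte straight
-- into a freshly allocated output buffer, skipping per-element bounds checks.
bsToLower :: S.ByteString -> S.ByteString
bsToLower bs = BI.unsafeCreate len (\dst -> go dst 0)
  where
    len = S.length bs

    go :: Ptr Word8 -> Int -> IO ()
    go dst i
        | i >= len  = return ()
        | otherwise = do
            let w  = BU.unsafeIndex bs i
                w' = if w >= 65 && w <= 90 then w + 32 else w
            pokeByteOff dst i w'
            go dst (i + 1)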
Perhaps someone with more experience with this level of optimization would be able to improve the algorithm:
https://gist.github.com/3756212
Michael
-- Gregory Collins
Hmm... I don't get your results.
benchmarking Char8
mean: 394.2475 us, lb 393.1611 us, ub 395.3824 us, ci 0.950
std dev: 5.674103 us, lb 4.574321 us, ub 7.548278 us, ci 0.950
found 16 outliers among 100 samples (16.0%)
1 (1.0%) low severe
4 (4.0%) low mild
6 (6.0%) high mild
5 (5.0%) high severe
variance introduced by outliers: 7.517%
variance is slightly inflated by outliers
benchmarking Char8 toLowerC
mean: 81.19748 us, lb 80.95403 us, ub 81.40814 us, ci 0.950
std dev: 1.154865 us, lb 977.5925 ns, ub 1.497224 us, ci 0.950
found 2 outliers among 100 samples (2.0%)
1 (1.0%) low severe
variance introduced by outliers: 7.506%
variance is slightly inflated by outliers
benchmarking Word8
mean: 43.01692 us, lb 42.94030 us, ub 43.09647 us, ci 0.950
std dev: 401.2451 ns, lb 362.3989 ns, ub 458.7243 ns, ci 0.950
benchmarking bsToLower
mean: 36.61481 us, lb 36.46137 us, ub 36.79378 us, ci 0.950
std dev: 850.7579 ns, lb 717.1316 ns, ub 1.004895 us, ci 0.950
found 16 outliers among 100 samples (16.0%)
2 (2.0%) low mild
10 (10.0%) high mild
4 (4.0%) high severe
variance introduced by outliers: 17.062%
variance is moderately inflated by outliers
I'm compiling with -O2 and running GHC 7.4.1 on 64-bit Linux. I'm not sure what would lead to such a significant difference in our runtimes. Any chance you can include Word8 in your run?
Michael
--
Gregory Collins