Re: [web-devel] Data.Word8 (word8 library)

20 Sep 2012

On Thu, Sep 20, 2012 at 2:10 PM, Michael Snoyman wrote:
On Thu, Sep 20, 2012 at 11:41 AM, Kazu Yamamoto wrote:
Hello,
ByteString is an array of Word8, but it seems to me that people tend to
use the Char interface of Data.ByteString.Char8 instead of the Word8
interface of Data.ByteString. Since the functions defined in
Data.ByteString.Char8 convert Word8 to Char and Char to Word8, they have
unnecessary overhead. Yes, the overhead is negligible in many cases,
but I would like to remove it for high-performance servers.
Why do people use Data.ByteString.Char8? I guess that there are two
reasons:
- There are no standard utility functions for Word8, such as "isUpper"
- Numeric literals (e.g. 72 for 'H') are not readable
To fix these problems, I implemented the Data.Word8 module and
uploaded the word8 library to Hackage:
http://hackage.haskell.org/packages/archive/word8/0.0.0/doc/html/Data-Word8....
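To make the two points above concrete, here is a minimal sketch of what named Word8 constants and a Word8-only "toLower" could look like. The names `_A`, `_Z`, `_H`, `isUpperAscii` and `toLowerAscii` are hypothetical and chosen only for illustration; the sketch handles ASCII only and is not claimed to match the actual definitions in Data.Word8.

```haskell
import Data.Word (Word8)

-- Hypothetical named constants, following the leading-underscore
-- convention described in the post (so that 72 can be written as _H).
_A, _Z, _H :: Word8
_A = 65
_Z = 90
_H = 72

-- ASCII-only upper-case test that never converts to Char.
isUpperAscii :: Word8 -> Bool
isUpperAscii w = _A <= w && w <= _Z

-- ASCII-only lower-casing; all other bytes are returned unchanged.
toLowerAscii :: Word8 -> Word8
toLowerAscii w
  | isUpperAscii w = w + 32
  | otherwise      = w
```

Loaded in GHCi, `toLowerAscii _H` evaluates to 104, the byte for 'h'. With functions of this shape, `Data.ByteString.map toLowerAscii` works directly on the bytes without going through Char.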
If Michael and Bas like this, I would like to modify warp and
case-insensitive to use the word8 library. What do people think of this?
My concern is that the character names start with "_". Some people may
dislike this convention, but I do not have a better idea at the moment.
Suggestions are welcome.
--Kazu
Sounds good to me. I put together a simple benchmark to compare the
performance of toLower, and the results are encouraging (the Word8
version is roughly 8x faster on this input):
benchmarking Char8
mean: 38.04527 us, lb 37.94080 us, ub 38.12774 us, ci 0.950
std dev: 470.9770 ns, lb 364.8254 ns, ub 748.3015 ns, ci 0.950
benchmarking Word8
mean: 4.807265 us, lb 4.798199 us, ub 4.816563 us, ci 0.950
std dev: 47.20958 ns, lb 41.51181 ns, ub 55.07049 ns, ci 0.950
I want to try throwing one more idea into the mix. I'll post updates
when I have them.
So to answer your question: I'd be happy to include word8 in warp :).
Michael
{-# LANGUAGE OverloadedStrings #-}
import Criterion.Main
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import qualified Data.Char
import qualified Data.Word8

main :: IO ()
main = do
    -- Use this source file itself as the benchmark input.
    input <- S.readFile "bench.hs"
    defaultMain
        -- S.length forces the mapped ByteString to weak head normal form.
        [ bench "Char8" $ whnf (S.length . S8.map Data.Char.toLower) input
        , bench "Word8" $ whnf (S.length . S.map Data.Word8.toLower) input
        ]
I tried implementing a more low-level approach to avoid the Word8
boxing. The results improved a bit, but not significantly:

benchmarking Char8
mean: 318.2341 us, lb 314.5367 us, ub 320.4834 us, ci 0.950
std dev: 14.48230 us, lb 10.00946 us, ub 21.22126 us, ci 0.950
found 9 outliers among 100 samples (9.0%)
  8 (8.0%) low severe
variance introduced by outliers: 43.472%
variance is moderately inflated by outliers

benchmarking Word8
mean: 35.79037 us, lb 35.66547 us, ub 35.92601 us, ci 0.950
std dev: 665.5299 ns, lb 599.3413 ns, ub 741.6474 ns, ci 0.950
variance introduced by outliers: 11.349%
variance is moderately inflated by outliers

benchmarking bsToLower
mean: 31.49299 us, lb 31.32314 us, ub 31.65027 us, ci 0.950
std dev: 835.2251 ns, lb 744.4337 ns, ub 946.1789 ns, ci 0.950
variance introduced by outliers: 20.925%
variance is moderately inflated by outliers

Perhaps someone with more experience with this level of optimization
would be able to improve the algorithm:

https://gist.github.com/3756212
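
For readers without the gist at hand, one general shape such a low-level version can take is a direct pointer loop over the bytes. The sketch below is only an assumption about the technique, not the contents of the gist; `lowerByte` and `lowerAscii` are hypothetical names, and the lower-casing is ASCII-only.

```haskell
import qualified Data.ByteString as S
import Data.ByteString.Internal (unsafeCreate)
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Word (Word8)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- ASCII-only lower-casing of a single byte (hypothetical helper).
lowerByte :: Word8 -> Word8
lowerByte w
  | w >= 65 && w <= 90 = w + 32
  | otherwise          = w

-- Allocate the result buffer once, then copy bytes from the source
-- buffer one by one, lower-casing each on the way.
lowerAscii :: S.ByteString -> S.ByteString
lowerAscii src =
  unsafeCreate (S.length src) $ \dst ->
    unsafeUseAsCStringLen src $ \(srcC, len) -> do
      let srcP = castPtr srcC :: Ptr Word8
          go :: Int -> IO ()
          go i
            | i >= len  = return ()
            | otherwise = do
                w <- peekByteOff srcP i
                pokeByteOff dst i (lowerByte w)
                go (i + 1)
      go 0
```

Whether a loop like this beats `S.map` depends on how well GHC already optimizes the latter, which is consistent with the modest difference reported above.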

Michael