
On Tue, Mar 23, 2010 at 03:31:33PM -0400, Nick Bowler wrote:
On 18:25 Tue 23 Mar, Iustin Pop wrote:
On Tue, Mar 23, 2010 at 01:21:49PM -0400, Nick Bowler wrote:
On 18:11 Tue 23 Mar, Iustin Pop wrote:
I agree with the principle of correctness, but let's be honest - the gap between ByteString on one side and String or Text on the other is (many) orders of magnitude, not just a few percentage points…
I've been struggling with this problem too and it's not nice. Every time one uses the system readFile & friends (anything that doesn't read via ByteStrings), it's hellishly slow.
Test: read a file and compute its length in chars. The input text file is ~40MB in size and has one non-ASCII char. The test might seem stupid, but it is a simple one. GHC 6.12.1. (A rough sketch of the code for each variant follows the results below.)
Data.ByteString.Lazy (bytestring readFile + length) - < 10 milliseconds, incorrect length (as expected).
Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11 seconds, correct length.
Data.Text.Lazy (system readFile + pack + length) - 26s, correct length.
String (system readFile + length) - ~1 second, correct length.
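
A minimal sketch of what the four measurements could have looked like (the exact code wasn't posted; the module choices and command-line handling here are my assumptions, not the original benchmark):

-- Sketch, not the original benchmark code.
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.UTF8 as BLU   -- utf8-string package
import qualified Data.Text.Lazy as TL
import System.Environment (getArgs)

main :: IO ()
main = do
  [variant, path] <- getArgs
  case variant of
    "bytestring" -> print . BL.length =<< BL.readFile path   -- counts bytes, hence the "incorrect" length
    "utf8"       -> print . BLU.length . BLU.fromString =<< readFile path
    "text"       -> print . TL.length . TL.pack =<< readFile path
    "string"     -> print . length =<< readFile path
    _            -> error "usage: bench {bytestring|utf8|text|string} FILE"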
Is this a mistake? Your own report shows String & readFile being an order of magnitude faster than everything else, contrary to your earlier claim.
No, it's not a mistake. String is faster than packing to Text and taking the length, but it's 100 times slower than ByteString.
Only if you don't care about obtaining the correct answer, in which case you may as well just say const 42 or somesuch, which is even faster.
My whole point is that the difference between byte processing and char processing in Haskell is not a few percentage points, but an order of magnitude. I would really like to have only the 6x penalty that Python shows, for example.
Hang on a second... less than 10 milliseconds to read 40 megabytes from disk? Something's fishy here.
Of course I don't want to benchmark the disk, and therefore the source file is on tmpfs.
I ran my own tests with a 400M file (419430400 bytes) consisting almost exclusively of the letter 'a' with two Japanese characters placed at every multiple of 40 megabytes (UTF-8 encoded).
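
For reference, a generator for an input file like that might look as follows (a sketch under my own assumptions about the layout: ten 40 MiB chunks of 'a', each ending in two Japanese characters, i.e. 6 bytes of UTF-8, for a total of 419430400 bytes):

-- Sketch of a test-file generator, not Nick's actual one.
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import qualified Data.ByteString.Lazy.UTF8 as BLU

main :: IO ()
main = BL.writeFile "test.txt" (BL.concat (replicate 10 chunk))
  where
    chunk = BLC.replicate (40 * 1024 * 1024 - 6) 'a'
            `BL.append` BLU.fromString "日本"   -- two 3-byte UTF-8 characters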
With Prelude.readFile/length and 5 runs, I see
10145ms, 10087ms, 10223ms, 10321ms, 10216ms.
with approximately 10% of that time spent performing GC each run.
With Data.ByteString.Lazy.readFile/length and 5 runs, I see
8223ms, 8192ms, 8077ms, 8091ms, 8174ms.
with approximately 20% of that time spent performing GC each run. Maybe there are some magic command-line options to tune the GC for our purposes, but I only managed to make things slower. Thus, I'll handwave a bit and just shave the GC time off each result.
Prelude: 9178ms mean with a standard deviation of 159ms. Data.ByteString.Lazy: 6521ms mean with a standard deviation of 103ms.
Therefore, we managed a throughput of 43 MB/s with the Prelude (and got the right answer), while we managed 61 MB/s with lazy ByteStrings (and got the wrong answer). My disk won't go much, if at all, faster than the second result, so that's good.
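
(For anyone wanting to reproduce this, a minimal wall-clock harness along these lines would do. This is my own sketch, not the code Nick ran; the GC share would come from the RTS statistics, e.g. running the compiled binary with +RTS -s, not from the program itself.)

-- Minimal timing sketch; the file name "test.txt" is assumed.
import qualified Data.ByteString.Lazy as BL
import Data.Time.Clock (diffUTCTime, getCurrentTime)

timeIt :: String -> IO Int -> IO ()
timeIt label act = do
  start <- getCurrentTime
  n     <- act
  end   <- n `seq` getCurrentTime
  putStrLn $ label ++ ": length " ++ show n ++ " in " ++ show (diffUTCTime end start)

main :: IO ()
main = do
  timeIt "Prelude"    (length `fmap` readFile "test.txt")
  timeIt "ByteString" ((fromIntegral . BL.length) `fmap` BL.readFile "test.txt")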
I'll bet that for a 400MB file, if you have more than 2GB of RAM, most of it will be cached. If you want to check Haskell performance, just copy it to a tmpfs filesystem so that the disk is out of the equation.
So that's a 30% reduction in throughput. I'd say that's a lot worse than a few percentage points, but certainly not orders of magnitude.
That's because you're possibly benchmarking the disk as well. With a 400MB file on tmpfs, lazy bytestring readFile + length takes ~150ms on my machine, which is way faster than 8 seconds…
On the other hand, using Data.ByteString.Lazy.readFile and Data.ByteString.Lazy.UTF8.length, we get results of around 12000ms with approximately 5% of that time spent in GC, which is rather worse than the Prelude. Data.Text.Lazy.IO.readFile and Data.Text.Lazy.length are even worse, with results of around 25 *seconds* (!!) and 2% of that time spent in GC.
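
(Again, rough sketches of what those two measurements presumably looked like; this is my reconstruction, not the actual code:)

-- Reconstruction of the two character-count variants above.
import Data.Int (Int64)
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.UTF8 as BLU
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.IO as TLIO

utf8Length :: FilePath -> IO Int
utf8Length path = BLU.length `fmap` BL.readFile path

textLength :: FilePath -> IO Int64
textLength path = TL.length `fmap` TLIO.readFile path

main :: IO ()
main = do
  print =<< utf8Length "test.txt"
  print =<< textLength "test.txt"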
GNU wc computes the correct answer as quickly as lazy bytestrings compute the wrong answer. With Perl 5.8, slurping the entire file as UTF-8 computes the correct answer just as slowly as the Prelude. In my first ever Python program (with Python 2.6), I tried to read the entire file as a unicode string and it quickly crashed due to running out of memory (yikes!), so it earns a DNF.
So, for computing the right answer with this simple test, it looks like the Prelude is the best option. We tie with Perl and lose only to GNU wc (which is written in C). Really, though, it would be nice to close that gap.
Totally agreed :)

iustin