Re[4]: FPS again

15 Jul 2006

      Hello Duncan,

Saturday, July 15, 2006, 8:04:26 PM, you wrote:
...
...
can you test that this implementation
  lines = split 0x0a
is as fast as existing (long) ones both for Lazy and Strict ByteString?
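
For the strict case, a minimal sketch of what is being compared. The name linesViaSplit is hypothetical, not something from the fps repo. It also shows that a bare `split 0x0a` is not quite `lines`: split returns a trailing empty piece when the input ends in a newline, so that piece has to be dropped.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as SC

-- Hypothetical helper (not part of the library): 'lines' expressed
-- through 'split' on the newline byte 0x0a.
linesViaSplit :: S.ByteString -> [S.ByteString]
linesViaSplit s
  | S.null s  = []
  | otherwise = dropTrailingEmpty (S.split 0x0a s)
  where
    -- 'split' yields a final empty piece when the input ends in '\n';
    -- 'lines' does not, so that piece is removed here.
    dropTrailingEmpty ps
      | not (null ps) && S.null (last ps) = init ps
      | otherwise                         = ps

-- Reference implementation to benchmark against.
linesReference :: S.ByteString -> [S.ByteString]
linesReference = SC.lines
```

Any benchmark would compare linesViaSplit against linesReference on the same input; the sketch says nothing about which one is faster.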
...
It might actually be the other way around, that the split implementation
could benefit from the work that went into the optimisation of the lines
function. I spent quite some time trying to optimise the lines
implementation, at least for the Lazy module. To get better performance
it relies on the assumption that many lines fit into a chunk. That may
not be true for uses of split in general. It's worth investigating.
well, you know this problem much deeper than me. so i'm shutting up :)

although i can say that strict ByteString should benefit from your
implementation too (both for lines and split, for obvious reasons)

imho, Lazy.split should just use (map P.split) and then join the lines
that were split between adjacent blocks
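
A rough sketch of that idea, under the assumption that P is the strict module (imported as S here). The name splitLazy is hypothetical, and this is not the code in the Lazy module: each chunk is split with the strict split, and a piece that runs up to the end of a chunk is carried over and glued to the first piece of the next chunk.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.Word (Word8)

-- Hypothetical: lazy split built from the strict split, chunk by chunk.
splitLazy :: Word8 -> L.ByteString -> [L.ByteString]
splitLazy w = go [] . L.toChunks
  where
    -- 'acc' holds, in reverse order, the strict pieces of a field that
    -- began in an earlier chunk and has not been terminated yet.
    go acc []
      | null acc  = []
      | otherwise = [L.fromChunks (reverse acc)]
    go acc (c:cs) =
      case S.split w c of
        []     -> go acc cs            -- empty chunk: nothing to add
        [p]    -> go (p : acc) cs      -- no separator: the field continues
        (p:ps) -> L.fromChunks (reverse (p : acc))  -- closes the pending field
                    : map L.fromStrict (init ps)    -- fields inside this chunk
                   ++ go [last ps] cs               -- last piece may continue
```

On inputs where a field crosses a chunk boundary, the result should agree with the library's own L.split; whether it is as fast is a separate question for the benchmarks.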
...
Btw, you can run the benchmarks too, they are included in the fps repo.
...
...
also, isn't it faster to use the following implementation:
  isSpaceWord8 = (spacesFlagsArray!)?
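
A minimal sketch of the table-lookup variant, so the two versions can be benchmarked side by side. The set of bytes treated as whitespace (0x20, 0x09..0x0d, 0xa0) is an assumption here, as are the names isSpaceWord8Cmp and isSpaceWord8Tbl; only spacesFlagsArray comes from the proposal above.

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.Word (Word8)

-- Comparison-based test. The set of whitespace bytes is an assumption:
-- space, \t \n \v \f \r, and 0xa0 (non-breaking space in Latin-1).
isSpaceWord8Cmp :: Word8 -> Bool
isSpaceWord8Cmp w = w == 0x20 || (w >= 0x09 && w <= 0x0d) || w == 0xa0

-- 256-entry table of flags, built once from the comparison version.
spacesFlagsArray :: UArray Word8 Bool
spacesFlagsArray =
  listArray (minBound, maxBound) (map isSpaceWord8Cmp [minBound .. maxBound])

-- Lookup-based test: a single array index per byte.
isSpaceWord8Tbl :: Word8 -> Bool
isSpaceWord8Tbl = (spacesFlagsArray !)
```

Both functions give the same answer for every byte; which one is faster depends on the compiler and on how the bounds check in (!) is optimised, so it needs measuring.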
...
Benchmark it and tell us which is faster.
can my laziness be enough justification? :)
...
...
also, i propose to move getLine/getContents/putStr/interact/readFile-type
functions into .Char8 modules (both for strict and lazy bytestrings),
because these functions are encoding-dependent and work with texts
(as opposed to hGet/hPut, which work with raw binary data blocks).
...
Yes, getLine and putStrLn are encoding dependent (they know the encoding
of '\n'). getContents, putStr, readFile, interact etc are
encoding-independent, they're just the same as hGet/hPut, working on
binary data blocks. Indeed putStr = hPut stdout.
they all work with text files, so they are also encoding-dependent
(translating CR+LF to LF on windows). putStr is the only exception, but
it can be moved along for company :)

this will make a clear distinction between functions using ByteString as
a raw sequence of bytes (hGet/hPut) and functions using ByteString as
a packed String representing text data
...
...
in particular, i tried to implement Lazy.hGetLines as 'hGetContents >>= lines'
but it was impossible because the 'lines' function is defined only in
the Lazy.Char8 module
...
Yes, that's the way it should be. And of course there is no need for
hGetLines in the Lazy module since it is just hGetContents >>= lines
In my opinion the hGetLines in the other module should be removed too as
it's just a special case of what the Lazy module does.
it's also possible. but the situation where one ByteString
implementation supports a particular function while another doesn't
is, imho, not very good. the user should be able to switch between
implementations w/o rewriting the entire program

btw, you may be interested to know that i implemented mmapBinaryFile
in the Streams lib, based on the code from ByteString. it works both on
Windows and Unix, using the universal mmap API i described in my letter to
David Roundy

-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin