
On Sat, 2006-07-15 at 19:16 +0400, Bulat Ziganshin wrote:
Hello Donald,
can you test whether the implementation lines = split 0x0a is as fast as the existing (long) ones, for both Lazy and Strict ByteString?
It might actually be the other way around, that the split implementation could benefit from the work that went into the optimisation of the lines function. I spent quite some time trying to optimise the lines implementation, at least for the Lazy module. To get better performance it relies on the assumption that many lines fit into a chunk. That may not be true for uses of split in general. It's worth investigating. Btw, you can run the benchmarks too, they are included in the fps repo.
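One thing to check before benchmarking is that the two definitions agree on their results. The following is a minimal sketch, assuming the strict Data.ByteString API: as far as I can tell, split keeps an empty final piece when the input ends in a newline, whereas lines does not. The names linesViaSplit and linesViaSplit' are hypothetical and only serve the illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- The proposal as written: split on the newline byte (0x0a).
linesViaSplit :: B.ByteString -> [B.ByteString]
linesViaSplit = B.split 0x0a

-- Hypothetical fix-up: drop the empty final piece that split
-- produces when the input ends in a newline, so that the result
-- matches what BC.lines returns.
linesViaSplit' :: B.ByteString -> [B.ByteString]
linesViaSplit' s = case B.split 0x0a s of
  ps | not (null ps) && B.null (last ps) -> init ps
     | otherwise                          -> ps
```

Any timing comparison should probably use the adjusted version, since the extra pass over the result list is part of the cost of defining lines this way.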
also, isn't it faster to use the following implementation: isSpaceWord8 = (spacesFlagsArray!)?
Benchmark it and tell us which is faster.
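For anyone who wants to try it, here is a rough sketch of what the table-lookup version might look like. It assumes an unboxed array from the array package, and it assumes the set of whitespace bytes is space, tab through carriage return, and the Latin-1 non-breaking space; neither assumption comes from the proposal itself, which only gives the name spacesFlagsArray.

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.Word (Word8)

-- 256-entry lookup table, one flag per possible byte value.
-- The chosen whitespace set is an assumption for this sketch.
spacesFlagsArray :: UArray Word8 Bool
spacesFlagsArray = listArray (0, 255) (map isSpaceByte [0 .. 255])
  where
    isSpaceByte :: Word8 -> Bool
    isSpaceByte w = w == 0x20 || (w >= 0x09 && w <= 0x0d) || w == 0xa0

-- The proposed definition: a single bounds-checked array lookup.
isSpaceWord8 :: Word8 -> Bool
isSpaceWord8 = (spacesFlagsArray !)
```

Whether this beats a chain of comparisons is exactly what the benchmark would have to show; the (!) operator does a bounds check, so the comparison is not obviously one-sided.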
also, i propose to move getLine/getContents/putStr/interact/readFile-type functions into .Char8 modules (both for strict and lazy bytestrings), because these functions are encoding-dependent and work with texts (as opposed to hGet/hPut, which work with raw binary data blocks).
Yes, getLine and putStrLn are encoding-dependent (they know the encoding of '\n'). getContents, putStr, readFile, interact etc. are encoding-independent; they're just the same as hGet/hPut, working on binary data blocks. Indeed putStr = hPut stdout.
in particular, i tried to implement Lazy.hGetLines as 'hGetContents >>= lines' but it was impossible because the 'lines' function is defined only in the Lazy.Char8 module
Yes, that's the way it should be. And of course there is no need for hGetLines in the Lazy module since it is just hGetContents >>= lines. In my opinion the hGetLines in the other module should be removed too, as it's just a special case of what the Lazy module does.
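Spelled out with types, the shorthand hGetContents >>= lines amounts to mapping lines over the result of hGetContents. A minimal sketch, assuming the Data.ByteString.Lazy and Data.ByteString.Lazy.Char8 module names; readLines is a hypothetical helper added only to show usage:

```haskell
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as LC
import System.IO (Handle, IOMode (ReadMode), openFile)

-- Read the handle lazily and break the contents at '\n'.
-- The newline convention comes from the Char8 module, which is
-- why lines lives there and not in the plain Lazy module.
hGetLines :: Handle -> IO [L.ByteString]
hGetLines h = fmap LC.lines (L.hGetContents h)

-- Hypothetical convenience wrapper: the handle is left for the
-- lazy hGetContents to close once the input has been consumed.
readLines :: FilePath -> IO [L.ByteString]
readLines path = openFile path ReadMode >>= hGetLines
```

Because the lines are produced lazily, a caller can process a large file line by line without holding all of it in memory.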
i am sending you a bunch of small patches that fix the I/O part of the library, providing the same set of operations for lazy and strict bytestrings, for ghc and non-ghc platforms
also, i ran into small problems using the FPS repository for development (it seems that i'm the first windows developer of the lib). First, i propose to change the darcs 'prefs' file to the following:
test cd tests && make fast
- it should work both on unix and windows
Fair enough. :-)
second, i've changed the 'time' calls in tests/Makefile to use my own 't' utility instead of 'time'. but of course it's not a universal solution. at least, 'time' in the windows shell (cmd.exe) is a _built-in_ utility that has nothing in common with unix 'time' :)
Duncan