
firefly:
General
-------
ByteStrings are faster than native Haskell Strings. This is good and expected. Their memory use, though, is another matter: there seems to be a grave memory bug in GHC 6.9.
Bug report -- since the bytestring library hasn't changed between 6.8.2 and head, that sounds like material for a bug report.
Lazy bytestrings are slower than strict bytestrings -- this was completely unexpected to me.
Which programs should I look at? Is there a specific program or two where lazy bytestring performance is entirely out of whack (relative to 6.8.2)? If so, please forward them to me and Duncan.
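For concreteness, here is the kind of minimal comparison program that makes a strict/lazy gap easy to reproduce (the newline-counting task and the command-line interface are illustrative only, not one of the benchmark programs from the series); running each variant under +RTS -sstderr lets you compare both time and allocation:

-- Illustrative strict-vs-lazy comparison, not one of the benchmark
-- programs: count newlines in a file with either interface.
import System.Environment (getArgs)
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L

main :: IO ()
main = do
  args <- getArgs
  case args of
    ["strict", file] -> S.readFile file >>= print . S.count '\n'
    ["lazy",   file] -> L.readFile file >>= print . L.count '\n'
    _                -> putStrLn "usage: countlines (strict|lazy) FILE"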
I/O
---
The I/O patterns seen in the various programs are:

1) strict bytestrings: read increasingly large chunks (they double in size every time). I don't know whether the already-read data has to be copied around, or whether the memory allocator can extend existing blocks in place (if they sit at the "front" of the heap, say) the way realloc() can.
If you're reading from stdin and we don't know the size, bytestring's getContents uses realloc.
Even if the blocks do get copied around, that's not where the performance limiter is (speed-wise). It does seem to be a bit unwise in terms of memory pressure.
Does the GC kick in every time the heap has to expand? Does it do a full collection then? If no allocations other than the one that caused the heap to expand have happened since the last collection, perhaps the collection could be skipped (or some similar criterion).
2) lazy bytestrings: read blocks of 32K-8 bytes, as expected. The C benchmarks show that there is no penalty for reading 32K-8 bytes rather than a full 32K. (A sketch of this chunked-read pattern follows after this list.)
3) native Strings: read blocks of 8K.
The C benchmarks show that it barely matters whether the block size is 4K, 32K, or 32K-8 bytes; the small differences due to block size are completely in the noise. Reading very large blocks, though, as the strict bytestrings do, does have a cost (38-70% slower, depending on CPU, kernel, RAM, bus speeds, etc. -- see email 10 in the series). Even so, that is still peanuts compared to the total run time of almost all the benchmarks.
Useful information, thanks.
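For reference, here is a rough sketch of the chunked-read pattern behind point 2. It is a simplified approximation, not the library's actual hGetContents, and it assumes that Data.ByteString.Lazy.Internal exports defaultChunkSize (32K minus a small per-chunk allocator overhead, which is where the 32K-8 reads come from):

-- Simplified sketch of lazy chunked reading: pull one chunk of
-- defaultChunkSize bytes per demand, using lazy IO.
import System.IO (Handle)
import System.IO.Unsafe (unsafeInterleaveIO)
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.ByteString.Lazy.Internal (defaultChunkSize)

lazyChunks :: Handle -> IO L.ByteString
lazyChunks h = fmap L.fromChunks loop
  where
    loop = unsafeInterleaveIO $ do
      c <- S.hGet h defaultChunkSize   -- 32K minus a small overhead
      if S.null c
        then return []
        else do cs <- loop
                return (c : cs)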
Backend
-------
The backend turns out to be rather bad. It is generally very easy to improve the generated code by hand without doing any sophisticated analysis. One could rewrite inner loops with heroic code (loading four characters at a time into a 32-bit register, or even using MMX to handle eight characters in parallel), but that isn't necessary to get a good improvement and start being competitive with C.
The backend is probably what makes some of the lazy bytestring and native String code costly: there are too many tests of whether the buffer has been exhausted (I think). Some simple hoisting and common subexpression elimination would go a long way here.
There may be things we can do at the library level too. We changed the lazy bytestring representation in the last release cycle to remove a number of tests. A specific example would help, though, if you have one.
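To make the kind of loop concrete (a hypothetical example, not taken from the benchmark programs): an index-based scan over a strict ByteString re-tests the end-of-buffer condition on every iteration, and in the generated code that test and the surrounding index arithmetic are exactly where hoisting and better register allocation would pay off:

{-# LANGUAGE BangPatterns #-}
-- Hypothetical inner loop: count occurrences of a byte with an explicit
-- index. The exhaustion test (i >= len) runs on every iteration.
import Data.Word (Word8)
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU

countByte :: Word8 -> B.ByteString -> Int
countByte w bs = go 0 0
  where
    len = B.length bs
    go !i !acc
      | i >= len                 = acc
      | BU.unsafeIndex bs i == w = go (i + 1) (acc + 1)
      | otherwise                = go (i + 1) acc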
Thoroughness
------------
It really is necessary to benchmark more than one GHC/library combination; otherwise the bad memory bugs wouldn't stand out so clearly, and I would have believed, as Don Stewart did, that the memory allocation bug had been fixed.
I fixed a particular issue in the library. But please file a bug report if there are other issues, particularly with respect to GHC head.
Action Items
------------
Suggested short-term actions:

o Can the allocator expand blocks in place? If not, see whether that can be implemented.
o Find out why the "by hand" lazy bytestring versions are that slow - is it due to lack of hoisting in the backend?
Example please!
o Improve the backend, mainly by not hiding information from the register allocator. Alternatively, use an assembly-to-assembly transformer that improves the register allocation.
Some nice examples that could help motivate the native code generator hackers would be useful.
o Fix -sstderr.
o Perhaps even let the RTS print out /proc/self/maps or /proc/self/status, or the "peak RSS" returned by getrusage() in the ru_maxrss field? (A stopgap for this is sketched below.)
o Find the memory leaks in the GHC 6.9 code.
Bug report time.
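In the meantime, a Linux-only stopgap for the peak-RSS point is to read it from the proc filesystem yourself; this is a workaround sketch (the VmHWM line in /proc/self/status is the peak resident set size), not an RTS feature:

-- Linux-only workaround sketch: report peak resident set size by
-- printing the VmHWM line from /proc/self/status, e.g. at program exit.
import Data.List (isPrefixOf)

printPeakRSS :: IO ()
printPeakRSS = do
  status <- readFile "/proc/self/status"
  mapM_ putStrLn [ l | l <- lines status, "VmHWM" `isPrefixOf` l ]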
o GHC should allow round-tripping of the C-- code it generates.
o It would be awfully nice if the backend didn't sometimes throw the recognizable labels away.
o Can the Core/STG code be made easier to read?
Can you ensure these points are summarised and directed to the GHC bug tracker, http://hackage.haskell.org/trac/ghc/newticket?type=bug, so this work isn't lost on -cafe@.

-- Don