
However, there seems to be a conflict between the nature of mixing and stream processing when it comes to efficiency. As it turns out, it's more efficient to process channels one by one within a chunk than to produce samples one by one: it takes far less context switching to generate all of channel 1's output, then channel 2's (adding it to the mix as we go), and so on, than to mix sample 1 of every channel, then sample 2, etc., because we can write much tighter loops when we deal with only one channel at a time. On the other hand, stream fusion is naturally suited to generating samples one by one. It looks like the general solution requires a fusible transpose operation; otherwise we're back to hand-coding the mixer. Have you found a satisfying solution to this problem?
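To make the tradeoff concrete, here is a minimal sketch (not from the post; `Chunk`, `mixByChannel`, and `mixBySample` are names I made up) contrasting the two traversal orders over a chunk of unboxed samples:

```haskell
import Data.List (foldl')
import qualified Data.Vector.Unboxed as V

type Chunk = V.Vector Float

-- Channel-at-a-time: one tight inner loop per channel, each pass
-- accumulating that channel into the running mix.
-- (Assumes every channel's chunk has length len.)
mixByChannel :: Int -> [Chunk] -> Chunk
mixByChannel len = foldl' (V.zipWith (+)) (V.replicate len 0)

-- Sample-at-a-time: for every output sample, touch all channels.
-- This is the order a fused stream naturally produces, but the inner
-- loop now switches channel context on every sample.
mixBySample :: Int -> [Chunk] -> Chunk
mixBySample len chans = V.generate len (\i -> sum [c V.! i | c <- chans])
```

The transpose I mean would be whatever lets a fused pipeline expose the first traversal order even though each producer is written in the second.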
I wonder whether Data Parallel Haskell might be able to help here: MOD rendering is a scatter-gather style of processing. The problem is that different channels trigger different processing.
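A rough sketch of why the parallelism is irregular (this is plain Haskell rather than actual DPH code, and all the names here are hypothetical):

```haskell
import Data.List (foldl')
import qualified Data.Vector.Unboxed as V

type Chunk = V.Vector Float

-- Each channel pairs its sample data with its own effect chain
-- (volume ramp, filter, resampler, ...), so the per-channel work is
-- not uniform: the awkward case for flat data parallelism, and the
-- kind of nesting DPH is meant to cover.
data Channel = Channel
  { chSamples :: Chunk
  , chProcess :: Chunk -> Chunk  -- channel-specific processing
  }

-- Run each channel's own processing (the potentially parallel,
-- "scatter" part), then reduce the results into one mix ("gather").
renderChunk :: Int -> [Channel] -> Chunk
renderChunk len chans =
  foldl' (V.zipWith (+)) (V.replicate len 0)
         [ chProcess ch (chSamples ch) | ch <- chans ]
```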