
Hal Daume wrote a lot, including the following.
More seriously, I think the idea of going to and from lists of Word8 is going to kill performance. Especially if you are going to write to a file in the end. You're going to have to go:
data structure -> [Word8] -> File
and likely the middle won't be deforested by any compiler. Whether you use functional arrays or lists is somewhat irrelevant -- functional arrays are also exceedingly slow. I completely agree that going via lists of Word8's is unacceptable. My design does not require this. An array of Word8's can be written or read to a file in a single operation (not counting the length, which needs to be communicated some other way).
When you need to write to an intermediate data structure, the most efficient implementation would probably be to write directly to an array, which you realloc as necessary. This could be wrapped up into a single pure operation of type :: HasBinaryWrite a => a -> Bytes (say). Of course you would need to use IO in the middle, since it happens that the current interfaces to blocks of reallocable bytes require IO, but I don't really see how that can be avoided, and the end-user doesn't have to see it.