
On Mon, 2007-11-19 at 20:06 -0600, Nicolas Frisby wrote:
> In light of this discussion, I think the "fully spine-strict list instance does more good than bad" argument is starting to sound like a premature optimization. Consequently, using a newtype to treat the necessarily lazy instances as special cases is an inappropriate bandaid.
I agree.
> My current opinion: if Data.Binary makes both a fully strict list instance (not []) and a fully lazy list instance (this would be the default for []) available, then that will also make available all of the intermediate degrees of strictness. I'll elaborate on that a bit. If the user defines a function appSpecificSplit :: MyDataType -> [StrictList a], then the user can control the compactness and laziness of the serialisation by tuning that splitting function. Neil's 255 scheme fits as one particular case, the split255 :: [a] -> [StrictList a] function. I would hesitate to hard-code a number of elements, since it certainly depends on the application, and exposing it as a parameter maximizes the reusability of the code.
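To make the splitting idea concrete, here is a minimal sketch using only base. StrictList, fromList, toList and splitEvery are hypothetical names for illustration, not anything Data.Binary provides; split255 is just splitEvery with the lump size fixed at 255.

```haskell
-- Hypothetical spine- and element-strict list (not part of Data.Binary).
data StrictList a = Nil | Cons !a !(StrictList a)
  deriving (Eq, Show)

-- Build a strict list; forcing the result forces the whole lump.
fromList :: [a] -> StrictList a
fromList = foldr Cons Nil

toList :: StrictList a -> [a]
toList Nil         = []
toList (Cons x xs) = x : toList xs

-- Split a lazy list into strict lumps of at most n elements.
-- The outer list stays lazy, so only one lump is forced at a time.
splitEvery :: Int -> [a] -> [StrictList a]
splitEvery n xs
  | n <= 0    = error "splitEvery: lump size must be positive"
  | otherwise = go xs
  where
    go [] = []
    go ys = let (lump, rest) = splitAt n ys
            in fromList lump : go rest

-- The 255-element case mentioned above.
split255 :: [a] -> [StrictList a]
split255 = splitEvery 255
```

Because the lump size is an ordinary argument, an application can pick whatever trade-off between compactness and laziness suits its data.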
Fully lazy is the wrong default here, I think. But fully strict is also not right. What would fit best with the style of the rest of the Data.Binary library is to be lazy in a lumpy way. This can give excellent performance, whereas being fully lazy cannot (because the chunk size becomes far too small, which increases the overhead). Has anyone actually said they want the list serialisation to be fully lazy? Is there a need for anything more than just not being fully strict? If there is, I don't see it. If it really is needed, it can be added just by flushing after serialising each element.
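One possible reading of "lazy in a lumpy way" is the count-prefixed scheme with 255 as the lump size. The sketch below assumes the binary package; putLumpy, getLumpy, encodeLumpy and decodeLumpy are hypothetical names, and this is not the library's actual list instance. At most 255 cells of the spine have to be forced before a lump's count byte can be written.

```haskell
import Control.Monad (replicateM)
import Data.Binary (Binary, get, put)
import Data.Binary.Get (Get, getWord8, runGet)
import Data.Binary.Put (Put, putWord8, runPut)
import qualified Data.ByteString.Lazy as L

-- Write the list in lumps of at most 255 elements. Each lump is
-- preceded by a one-byte count; a zero count terminates the list.
putLumpy :: Binary a => [a] -> Put
putLumpy xs = case splitAt 255 xs of
  ([], _)      -> putWord8 0
  (lump, rest) -> do
    putWord8 (fromIntegral (length lump)) -- length is 1..255, fits a byte
    mapM_ put lump
    putLumpy rest

-- Read lumps back until the zero terminator.
getLumpy :: Binary a => Get [a]
getLumpy = do
  n <- getWord8
  if n == 0
    then return []
    else do
      lump <- replicateM (fromIntegral n) get
      rest <- getLumpy
      return (lump ++ rest)

encodeLumpy :: Binary a => [a] -> L.ByteString
encodeLumpy = runPut . putLumpy

decodeLumpy :: Binary a => L.ByteString -> [a]
decodeLumpy = runGet getLumpy
```

The fully lazy variant would presumably follow each put with flush from Data.Binary.Put, at the cost of the very small chunks described above.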
> "Reaching for the sky" idea: Does the Put "monad" offer enough information for an instance to be able to recognize when it has filled a lazy bytestring's first chunk? It could tailor its strictness (i.e. vary how much of the spine is forced before any output is generated) in order to best line up with the chunks of lazy bytestring it is producing. This might be trying to fit too much into the interface. And it might even make Put an actual monad ;)
That is something I've considered. Serialise just as much of the list as is necessary to fill the remainder of a chunk. Actually we'd always fill just slightly more than a chunk because we don't know how big each list element will be, we only know when we've gone over.

Duncan
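The "fill the remainder of a chunk" behaviour can be modelled in pure code. This is only a sketch: fillChunk is a hypothetical name, and the size function stands in for "bytes written after serialising this element", which a real Put would only learn after the fact. No Data.Binary API is used.

```haskell
-- Take elements until the space left in the current chunk is used up.
-- The element that reaches or crosses the boundary is still included,
-- because we only find out we have gone over after writing it.
fillChunk :: Int -> (a -> Int) -> [a] -> ([a], [a])
fillChunk space size = go space
  where
    go _ [] = ([], [])
    go left (x : xs)
      | left' <= 0 = ([x], xs)                 -- chunk is now full (or over)
      | otherwise  = let (ys, rest) = go left' xs
                     in (x : ys, rest)
      where
        left' = left - size x
```

With 10 bytes of space and 4-byte elements, three elements are taken (12 bytes), so the lump overshoots the chunk slightly, as described above. Only the elements that were taken have their spine forced.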