Adding binarySize to Binary

Hello,
I keep wanting something like:
    binarySize :: Binary a => Proxy a -> (Int, Maybe Int)
in Data.Binary for returning the minimum and maximum (or Nothing for infinite) space requirements for objects of a given type.
Proxy is just defined as "data Proxy t = Proxy", but omitting it is also possible. Would other people consider this a useful addition?
- Einar Karttunen

ekarttun:
Hello
I keep wanting something like:
binarySize :: Binary a => Proxy a -> (Int,Maybe Int)
in Data.Binary for returning the minimum and maximum (or Nothing for infinite) space requirements for objects of a given type.
Proxy is just defined as "data Proxy t = Proxy", but omitting it is also possible. Would other people consider this a useful addition?
Yes, I've thought this would be useful too. A la 'sizeOf' in Storable. Should it be a member of the Binary class? -- Don

On Sun, 2007-02-04 at 14:00 +1100, Donald Bruce Stewart wrote:
ekarttun:
Hello
I keep wanting something like:
binarySize :: Binary a => Proxy a -> (Int,Maybe Int)
in Data.Binary for returning the minimum and maximum (or Nothing for infinite) space requirements for objects of a given type.
Proxy is just defined as "data Proxy t = Proxy", but omitting it is also possible. Would other people consider this a useful addition?
Yes, I've thought this would be useful too. A la 'sizeOf' in Storable. Should it be a member of the Binary class?
What would it mean? How hard would it have to work to be accurate? Is this just for (mostly-)fixed size records? What about lists? I don't think we should be forcing the list just to see how big it is. What is the use-case? Duncan

On 04.02 12:31, Duncan Coutts wrote:
Dons wrote:
Yes, I've thought this would be useful too. A la 'sizeOf' in Storable. Should it be a member of the Binary class?
Yes, as there is no other good way of implementing it in general.
What would it mean? How hard would it have to work to be accurate? Is this just for (mostly-)fixed size records? What about lists? I don't think we should be forcing the list just to see how big it is.
As it would operate on types rather than values (think sizeOf), the result for "[a]" would probably be "(1, Nothing)" - a list takes one byte at minimum and infinite bytes at maximum.
What is the use-case?
I am playing with on disk B-tree like things and object sizes become quite important when deciding layout. Several optimizations can be done if the objects are known to be fixed size (or have only small size variations). And if it is known beforehand how many objects will fit into a page that helps too. - Musasabi
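A minimal sketch of how type-level size bounds could feed a page-layout decision like the one described above. Everything here is an assumption for illustration: the class BinarySize, its instances, and the helper objectsPerPage are hypothetical and not part of Data.Binary, and the list bound (1, Nothing) simply follows the figure given in this message rather than any real encoding.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))
import Data.Word (Word32, Word8)

-- Hypothetical class: (minimum bytes, maximum bytes or Nothing if unbounded).
class BinarySize a where
  binarySize :: Proxy a -> (Int, Maybe Int)

-- Fixed-size types: minimum and maximum coincide.
instance BinarySize Word8 where
  binarySize _ = (1, Just 1)

instance BinarySize Word32 where
  binarySize _ = (4, Just 4)

-- Lists: at least one byte, no upper bound (the figure used in the thread).
instance BinarySize [a] where
  binarySize _ = (1, Nothing)

-- Pairs: bounds add up; an unbounded component makes the pair unbounded.
instance (BinarySize a, BinarySize b) => BinarySize (a, b) where
  binarySize _ = (loA + loB, (+) <$> hiA <*> hiB)
    where
      (loA, hiA) = binarySize (Proxy :: Proxy a)
      (loB, hiB) = binarySize (Proxy :: Proxy b)

-- How many objects are guaranteed to fit in a page of the given size.
-- Nothing means the type has no usable upper bound.
objectsPerPage :: BinarySize a => Int -> Proxy a -> Maybe Int
objectsPerPage pageBytes p =
  case binarySize p of
    (_, Just hi) | hi > 0 -> Just (pageBytes `div` hi)
    _ -> Nothing
```

With these assumed instances, a (Word32, Word8) record has bounds (5, Just 5), so the number of records per page is known before any value is inspected, while a list-valued record yields Nothing and needs a different layout.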

On Sun, Feb 04, 2007 at 04:33:52PM +0200, Musasabi wrote:
On 04.02 12:31, Duncan Coutts wrote:
Dons wrote:
Yes, I've thought this would be useful too. A la 'sizeOf' in Storable. Should it be a member of the Binary class?
Yes, as there is no other good way of implementing it in general.
And I presume the class would define a default function returning (1,Nothing), or maybe returning (0,Nothing), since some data types will take no space. So it'd only put a burden on instance-declarers who want to help out optimizing users of their data type.
-- David Roundy
Department of Physics
Oregon State University
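The default-method idea above might look roughly like the following. This is only a sketch: the class name Sized, the method name sizeBounds, and the Opaque type are made up for illustration, and the one-byte figure for Bool is an assumption about the encoding.

```haskell
import Data.Proxy (Proxy (..))

-- Hypothetical class with the conservative default suggested above.
class Sized a where
  sizeBounds :: Proxy a -> (Int, Maybe Int)
  sizeBounds _ = (0, Nothing)  -- "anything from zero bytes up"

-- An instance author who does nothing still gets a valid answer.
data Opaque = Opaque

instance Sized Opaque

-- An instance author who wants to help overrides the default.
instance Sized Bool where
  sizeBounds _ = (1, Just 1)  -- assumed: one byte per Bool

-- A type that takes no space at all.
instance Sized () where
  sizeBounds _ = (0, Just 0)
```

The default (0, Nothing) is always a correct bound, so existing instances would keep working unchanged; only instances that want to enable optimizations need extra code.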

Hi
As it would operate on types rather than values (think sizeOf), the result for "[a]" would probably be "(1, Nothing)" - a list takes one byte at minimum and infinite bytes at maximum.
That interface seems horrible - it looks like it will only be useful to a small number of people, and not be general at all. I really don't like the idea of an interface specifying a "fuzz factor" (which is what upper/lower bounds correspond to).
In my BinaryDefer library I have a class BinaryDeferStatic:
    class BinaryDefer a => BinaryDeferStatic a where
        -- | Must be a constant, must not examine first argument
        getSize :: Proxy a -> Int
This is for things which have a fixed and static size based on their type. I could also see a reason for having a sizeOf method in the Binary class - where if unimplemented it just calls encode and then B.length. Anything else just seems to be an ugly API...
Thanks
Neil
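To make the two alternatives in this message concrete, here is a small self-contained sketch. It does not use the real Binary or BinaryDefer classes: Encodable, encodeBytes, and StaticSize are hypothetical stand-ins, and the byte formats are toy encodings chosen only so the example runs with base alone.

```haskell
import Data.Proxy (Proxy (..))
import Data.Word (Word8)

-- Stand-in for Binary: a toy encoder plus a value-level sizeOf whose
-- default encodes the value and measures the result.
class Encodable a where
  encodeBytes :: a -> [Word8]
  sizeOf :: a -> Int
  sizeOf = length . encodeBytes

-- Stand-in for BinaryDeferStatic: the size depends on the type only.
class Encodable a => StaticSize a where
  -- Must be a constant; must not examine its argument.
  getSize :: Proxy a -> Int

instance Encodable Bool where
  encodeBytes b = [if b then 1 else 0]

instance StaticSize Bool where
  getSize _ = 1

-- Toy list format: a 1 byte before each element, a 0 byte at the end.
-- Lists get no StaticSize instance because their size varies by value.
instance Encodable a => Encodable [a] where
  encodeBytes xs = concatMap (\x -> 1 : encodeBytes x) xs ++ [0]
```

Here getSize answers from the type alone, whereas the default sizeOf needs an actual value and walks all of it, which is the distinction the later replies in the thread turn on.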

On Mon, Feb 05, 2007 at 06:37:38PM +0000, Neil Mitchell wrote:
Hi
As it would operate on types rather than values (think sizeOf), the result for "[a]" would probably be "(1, Nothing)" - a list takes one byte at minimum and infinite bytes at maximum.
That interface seems horrible - it looks like it will only be useful to a small number of people, and not be general at all. I really don't like the idea of an interface specifying a "fuzz factor" (which is what upper/lower bounds correspond to)
But often a fuzz factor is helpful, and it's always well-defined. One would often like to allocate padded structures so you can either modify them in-place or have O(1) access to the elements, and you need a max size for that. The min size would be helpful for knowing when it's worth padding. It'd be heuristic, but if the max size is 1000 times larger than the min size, you might not want to always allocate the maximum.
In my BinaryDefer library I have a class BinaryDeferStatic:
    class BinaryDefer a => BinaryDeferStatic a where
        -- | Must be a constant, must not examine first argument
        getSize :: Proxy a -> Int
This is for things which have a fixed and static size based on their type.
Except that the class-based approach is no good for a function (or data type) which is intended to work with any data, even that which doesn't have a static size.
I could also see a reason for having a sizeOf method in the Binary class - where if unimplemented it just calls encode and then B.length.
Except that sizeOf as you describe it would operate on values rather than on types, and as such would be useless for any of the uses of binarySize.
Anything else just seems to be an ugly API...
I agree about the ugliness of binarySize as implemented (returning a tuple). Why not something like:
    class Binary a where
        ...
        maxSize :: Proxy a -> Maybe Int
        maxSize _ = Nothing
        minSize :: Proxy a -> Int
        minSize _ = 0
        staticSize :: Proxy a -> Maybe Int
        staticSize p | maxSize p == Just (minSize p) = maxSize p
        staticSize _ = Nothing
Thus one can do nice things like make an array class that operates on any Binary data, but can do nice tricks to optimize access times. e.g. one might want to allocate N*maxSize space, so you can have O(1) writes (in a mutable array, or on disk).
-- David Roundy
Department of Physics
Oregon State University
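A runnable sketch of the maxSize/minSize/staticSize proposal, together with the fixed-slot layout it would enable. The class is named Sized here so it stands alone without Data.Binary; the instances, the one-tag-byte encoding assumed for Maybe, and the helper slotOffset are all hypothetical.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))
import Data.Word (Word16)

-- Stand-alone version of the three proposed methods and their defaults.
class Sized a where
  maxSize :: Proxy a -> Maybe Int
  maxSize _ = Nothing
  minSize :: Proxy a -> Int
  minSize _ = 0
  staticSize :: Proxy a -> Maybe Int
  staticSize p
    | maxSize p == Just (minSize p) = maxSize p
    | otherwise = Nothing

instance Sized Word16 where
  maxSize _ = Just 2
  minSize _ = 2

-- Assumed encoding: one tag byte, followed by the payload if present.
instance Sized a => Sized (Maybe a) where
  minSize _ = 1
  maxSize _ = (+ 1) <$> maxSize (Proxy :: Proxy a)

-- Byte offset of slot i when every slot is padded to maxSize.
-- Nothing means the element type is unbounded and cannot be padded.
slotOffset :: Sized a => Proxy a -> Int -> Maybe Int
slotOffset p i = (* i) <$> maxSize p
```

For Maybe Word16, staticSize is Nothing because the size varies between 1 and 3 bytes, yet slotOffset still gives O(1) addressing by reserving 3 bytes per slot, which is the padding trade-off discussed in this message.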
participants (5)
- David Roundy
- dons@cse.unsw.edu.au
- Duncan Coutts
- Musasabi
- Neil Mitchell