computational overhead of the "data" declarations

I Haskell School of Expression (p172), it says: A newtype declaration is just like a data declaration, except that it can
only be used to defined data types with single constructor. The new data type is different from the analogous one created by a data declaration, in that there is no computational overhead in having the constructor.... You can think of a newtype as defining a "new type" with exactly the same structure, behaviour, and performance as the underlying type.
What is (or where do you see) the computational overhead of the "data" declrations?

On Tue, Oct 28, 2008 at 12:13:11PM -0700, Daryoush Mehrtash wrote:
I Haskell School of Expression (p172), it says:
A newtype declaration is just like a data declaration, except that it can
only be used to defined data types with single constructor. The new data type is different from the analogous one created by a data declaration, in that there is no computational overhead in having the constructor.... You can think of a newtype as defining a "new type" with exactly the same structure, behaviour, and performance as the underlying type.
What is (or where do you see) the computational overhead of the "data" declrations?
Consider the data declaration data Foo = Bar Int | Baz Char String When a value of type Foo is stored in memory, there will actually be some memory set aside for the tag (either Bar or Baz) since pattern matching may need to be performed at runtime. This is different from a C union type, where if you want such a tag you must store it yourself. However, consider data Bad = Bad Int Some memory will still be set aside to hold the 'Bad' tag, although in this case we can see that isn't necessary: a value of type Bad can only have one possible constructor. The solution is to use a newtype: newtype Good = Good Int Now, when a value of type Bad is stored at runtime, it will simply store an Int, with no tag. The benefit of declaring a newtype like this is that the typechecker will ensure that you cannot mix up Goods and Ints, although at runtime they will have the same representation. So, to answer your question, the only computational overhead with a data declaration is the extra memory and time to store and process the constructor tags. It usually isn't noticeable, but sometimes the difference can be important. -Brent

So, to answer your question, the only computational overhead with a data declaration is the extra memory and time to store and process the constructor tags. It usually isn't noticeable, but sometimes the difference can be important.
Is there also potentially overhead due to laziness? I mean, in this case: newtype Fast = Fast Int If you have a Fast value, you know you've got a real Int, not a thunk, right? But here: data Slow = Slow Int Doesn't a Slow value remain unevaluated until you pattern match out the Int?

eford:
So, to answer your question, the only computational overhead with a data declaration is the extra memory and time to store and process the constructor tags. It usually isn't noticeable, but sometimes the difference can be important.
Is there also potentially overhead due to laziness? I mean, in this case:
newtype Fast = Fast Int
If you have a Fast value, you know you've got a real Int, not a thunk, right? But here:
data Slow = Slow Int
Doesn't a Slow value remain unevaluated until you pattern match out the Int?
Also, remember that GHC is an optimising, strictness-analysing compiler, making 'data' cheap or non-existent. Tags are also represented as bits in pointers to heap values, so 'tags' are very cheap, if they're still needed (e.g. when testing on Just/Nothing). E.g. Slow = Slow { unSlow :: Int } deriving Show bigger :: Slow -> Slow bigger (Slow n) = Slow (n * 52) small = Slow 7 main = print (unSlow $ bigger small) Is compiled too: lvl :: String lvl = $wshowSignedInt 0 364 ([] @ Char) So there's no runtime overhead. -- Don

newtype Good = Good Int
Now, when a value of type Bad is stored at runtime, it will simply store an Int, with no tag
Did you mean to say Good in the above sentence?
What are the run time implication of newtype vs data where types are
unwrapped and wrapped from on type to another?
Thanks
Daryoush
On Tue, Oct 28, 2008 at 1:08 PM, Brent Yorgey
On Tue, Oct 28, 2008 at 12:13:11PM -0700, Daryoush Mehrtash wrote:
I Haskell School of Expression (p172), it says:
A newtype declaration is just like a data declaration, except that it can
only be used to defined data types with single constructor. The new data type is different from the analogous one created by a data declaration, in that there is no computational overhead in having the constructor.... You can think of a newtype as defining a "new type" with exactly the same structure, behaviour, and performance as the underlying type.
What is (or where do you see) the computational overhead of the "data" declrations?
Consider the data declaration
data Foo = Bar Int | Baz Char String
When a value of type Foo is stored in memory, there will actually be some memory set aside for the tag (either Bar or Baz) since pattern matching may need to be performed at runtime. This is different from a C union type, where if you want such a tag you must store it yourself.
However, consider
data Bad = Bad Int
Some memory will still be set aside to hold the 'Bad' tag, although in this case we can see that isn't necessary: a value of type Bad can only have one possible constructor. The solution is to use a newtype:
newtype Good = Good Int
Now, when a value of type Bad is stored at runtime, it will simply store an Int, with no tag. The benefit of declaring a newtype like this is that the typechecker will ensure that you cannot mix up Goods and Ints, although at runtime they will have the same representation.
So, to answer your question, the only computational overhead with a data declaration is the extra memory and time to store and process the constructor tags. It usually isn't noticeable, but sometimes the difference can be important.
-Brent _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Brent Yorgey wrote:
Daryoush Mehrtash wrote:
What is (or where do you see) the computational overhead of the "data" declrations?
So, to answer your question, the only computational overhead with a data declaration is the extra memory and time to store and process the constructor tags. It usually isn't noticeable, but sometimes the difference can be important.
Another important part of this that wasn't mentioned is pattern matching. For datatypes with only one constructor there's only one branch to choose among for (one-ply deep) patterns, and so the matching can be skipped entirely. This saves a conditional/vector jump, which can dramatically affect cache behavior. Not needing to pattern match also allows unpacking of the arguments to the constructor into arguments to the function, thus saving on allocation and garbage collection. More importantly, if the arguments are primitive types and strict then they can be unboxed into registers as well. In principle many of these optimizations can be and are applied to single-constructor 'data' types, but using 'newtype' tells the compiler to be sure to optimize them thusly. -- Live well, ~wren
participants (5)
-
Brent Yorgey
-
Daryoush Mehrtash
-
Don Stewart
-
Eli Ford
-
wren ng thornton