
On Thursday 06 May 2010 20:35:15, Travis Erdman wrote:
Two questions here, I think they might be related, perhaps even the same, but I am not sure, so I will ask both:
Q1: Re the function f below, I like the first implementation as it's "cleaner", but is the second implementation necessary for performance purposes?
-- g = some CPU intensive function
-- alternate 1 f a b = c + (g b) where c = dosomethingelse a (g b)
A Haskell implementation *might* calculate (g b) only once, but very probably it will calculate it twice.
-- alternate 2 f a b = c + saveit where saveit = g b c = dosomethingelse a saveit
A Haskell implementation is allowed to calculate saveit twice, but a decent one won't. If you want to share the result of a computation, give a name to it. One reason why sharing the common subexpression (g b) in the first snippet is not done is that it often is a terrible idea. If, for example, g b is a large list, sharing hogs a lot of memory. If dosomethingelse and (+) consume that list in a suitable manner, only a small portion of it need be in memory at any given time. Of course, if (g b) takes only very little memory (certainly if it is an Int, Word, ..., Integer [almost certainly]), automatic sharing would be a good idea. However, looking for common subexpressions to share depending on their type would be even more complicated than looking for all common subexpressions, so probably nobody implemented it.
Q2: Imagine that I might be creating a custom data structure. I need to access both the data in that structure, but also "features" (don't know proper comp sci term) of the data structure. For instance, consider a Haskell list. The stuff in the list is the data, whereas the length of the list is what I am calling a "feature" here. Some of my features might be quite a bit more computationally intensive than "length", and may be referenced multiple times in many different places in my code. For performance purposes, should I consider modifying my data structure to embed these "features" as part of the data to ensure that it is only calculated once?
Yes, unless these features eat too much memory (then recalculating might be cheaper). If you need the features only for the big total, a wrapper analogous to MyList below is a good choice data FeatureThing = FT { feature1, feature2 :: Int, feature3 :: Bool, ..., thing :: Thing } , otherwise incorporate the features directly in Thing (like the size is incorporated in Set/Map).
Or can I rely on Haskell to do that for me?
No, you can't rely on that.
For instance, do I need to create the equivalent of:
data MyList a = MyList {mylist::[a], mylength::Int}
Thanks again for all your generous advice.