
On Fri, 2010-05-21 at 19:17 +0300, Yitzchak Gale wrote:
> Duncan Coutts wrote:
> > There are various sorts of programs that deal with a large quantity of data and often that includes some date/time data. It's a great shame that programs dealing with lots of time data might have to avoid using the time package and revert to things like newtype Seconds = Seconds Int32 simply because they take less space and can be unpacked into other data structures.
> That is true. But on the other hand, my guess is that most applications don't need those optimizations.
> One of the most common complaints heard about Haskell is that Data.Time is so much more difficult to use than the basic time API in most other languages. I believe that those complaints are totally unjustified. It's more difficult because, unlike the others, it is correct, and it uses the type system to prevent subtle intermittent bugs that almost always exist in applications using those other libraries.
Indeed. I appreciate that the interface helps us all to write correct time-handling code. That's why it's such a shame if people are tempted to go back to primitive stuff simply because of the memory profile.
> But in any case, I think we should be very careful not to make the interface even more complicated
As you note below, it's not a real change or complication to the API.
> just to achieve what is a premature optimization in the vast majority of applications.
Unfortunately the burden of a standard/common library is that people want to use it in a wide range of circumstances. So while it'd certainly be a premature optimisation in most circumstances, it's fairly important in others. I happened to be working recently on an unremarkable program, where the heap profiler told me that a very significant portion of the heap was storing time data structures.
> Many of the suggestions you are making would actually be transparent except when constructors are used explicitly. So perhaps we could achieve most of what you are suggesting without changing the interface if we provide polymorphic functions in place of each of the constructors, then move the actual constructors to "Internals" modules for use only by those who need them. We would then be free to change the internal types now and in the future without significant impact on the interface.
I don't think we need to go that far in making the representations completely hidden, though I don't object to that if you think it's a design improvement. While technically it is an API change to switch the Pico fields to an equivalent numeric type, it is one that is unlikely to break many uses, and where it does, the fix is likely only a type signature or a fromIntegral. The point is that it keeps the existing spirit of the interface and does not add interface complexity.
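As a sketch of what that could look like (hypothetical names, not the time package's API): a TimeOfDay-like record whose seconds field is a strict, unpacked fixed-point picosecond count instead of a lazy Pico. Most call sites would only need a numeric conversion.

```haskell
module Main where

import Data.Int (Int64)

-- Hypothetical record (the names are not from the time package): strict,
-- unpackable fields let GHC flatten the whole value into one heap object
-- instead of a record of pointers to boxed thunks.
data CompactTOD = CompactTOD
  { ctodHour :: {-# UNPACK #-} !Int
  , ctodMin  :: {-# UNPACK #-} !Int
  , ctodPico :: {-# UNPACK #-} !Int64  -- seconds scaled by 10^12
  } deriving (Eq, Show)

-- Call sites that consumed the old fractional seconds field typically
-- only need a numeric conversion such as fromIntegral:
ctodSeconds :: CompactTOD -> Double
ctodSeconds t = fromIntegral (ctodPico t) / 1e12

main :: IO ()
main = print (ctodSeconds (CompactTOD 12 30 (45 * 10 ^ 12)))
```

An Int64 picosecond count comfortably covers a day (86400 * 10^12 < 2^63), which is why the fixed-point choice costs nothing in range here.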
> As for laziness, it's usually not correct to have strictness by default in a library as general as this one. For example, it's conceivable that someone will construct a UTCTime where the Day is easy to compute but the TimeOfDay results in heavy computation or even a bottom.
Honestly I find it a bit hard to conceive of. :-) It appears that pretty much all the functions that construct and consume TimeOfDay, LocalTime and UTCTime are strict[*]. So unless people are using the constructors directly and then not using any other functions on them, it looks like one cannot really use time structures lazily anyway (though perhaps I missed some; I don't know the library that well).

[*] localTimeToUTC, utcToLocalTime and utcToZonedTime appear to be lazy in the time-of-day component, but strict in the day component.
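For concreteness, direct constructor use is the one place where the current lazy fields are observable. With the lazy records as they stand today, this builds a UTCTime whose time-of-day component is bottom and still reads the day back:

```haskell
import Data.Time (UTCTime (..), fromGregorian)

-- With lazy record fields, the undefined time-of-day thunk is never
-- forced when only the day is inspected; strict fields would turn this
-- into an exception at construction time.
main :: IO ()
main = do
  let t = UTCTime (fromGregorian 2010 5 21) undefined
  print (utctDay t)
```
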
> That user would be unpleasantly surprised if we introduced strictness. Gratuitous strictness is also a premature optimization, especially in a general-purpose library. Haskell is lazy by default.
It's not a hard-and-fast rule that everything should be lazy. I certainly appreciate that we need laziness in the right places. I think it depends on whether we consider these values as units or not. I tend to think of time values as atomic units, like complex or rational numbers. The standard H98 complex and rational types are strict in their components because conceptually they're not compound types; they're atomic.
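The analogy can be checked directly: Data.Complex declares `data Complex a = !a :+ !a`, so forcing a complex number to weak head normal form forces both components, unlike an ordinary lazy pair:

```haskell
import Control.Exception (SomeException, evaluate, try)
import Data.Complex (Complex ((:+)))

-- Complex has strict components, so it behaves as one atomic unit:
-- evaluating it to WHNF forces both parts.  A plain pair does not.
main :: IO ()
main = do
  strict <- try (evaluate (undefined :+ 0 :: Complex Double))
              :: IO (Either SomeException (Complex Double))
  lazy <- try (evaluate (undefined, 0 :: Double))
              :: IO (Either SomeException (Double, Double))
  putStrLn ("strict Complex: " ++ either (const "bottom") (const "ok") strict)
  putStrLn ("lazy pair: " ++ either (const "bottom") (const "ok") lazy)
```

The same strict-field shape is what allows GHC to unpack a Complex Double into an unboxed pair of machine doubles inside other records.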
> Perhaps we could introduce strict alternatives for some of the functions. That wouldn't help the size of the data types though, unless we change some of them to type classes...
It's not about the strictness of the functions. It's about the in-memory representation. We cannot achieve a compact representation if we use large branching records. We only get it from using strict fields, since that allows us to unbox them into flat, compact records.

I admit I was assuming that the laziness in most of the time records is unnecessary. If we all conclude that the laziness of the existing representations is essential, then there are really few improvements we can make (probably even if we were to hide the representations).

In that case it might make more sense to add separate compact representations that can be converted to the standard representations, e.g. a reduced-resolution compact TimeStamp type that can be converted to/from UTCTime or LocalTime, or something like that. The idea being that you store your hundreds of thousands of compact TimeStamps in data structures, but convert to the regular type for the calculations. The downside of course is that this does add interface complexity; it would be nicer if we could make the regular types reasonably compact.

Perhaps we can see what other people think about the balance of use cases between those that need the laziness vs those that need compact representations. I may well be overestimating how common the latter use case is.

Duncan
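A minimal sketch of such a compact TimeStamp (hypothetical names and a whole-second resolution assumed; the conversion functions are illustrative, not proposed API):

```haskell
import Data.Int (Int64)
import Data.Time (UTCTime (..), addUTCTime, diffUTCTime, fromGregorian)

-- Hypothetical compact type (not in the time package): whole seconds
-- since the Unix epoch in an Int64, so it can be unpacked into other
-- records.  Reduced resolution is the price; convert to UTCTime for
-- the actual calculations.
newtype TimeStamp = TimeStamp Int64
  deriving (Eq, Ord, Show)

epoch :: UTCTime
epoch = UTCTime (fromGregorian 1970 1 1) 0

toTimeStamp :: UTCTime -> TimeStamp
toTimeStamp t = TimeStamp (round (diffUTCTime t epoch))

fromTimeStamp :: TimeStamp -> UTCTime
fromTimeStamp (TimeStamp s) = addUTCTime (fromIntegral s) epoch

main :: IO ()
main = do
  let ts = TimeStamp 1274458620  -- an arbitrary second count
  print (toTimeStamp (fromTimeStamp ts) == ts)
```

The round trip is exact because NominalDiffTime is fixed-point, so integral second counts survive the conversion unchanged.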