
On Fri, 2010-05-21 at 19:17 +0300, Yitzchak Gale wrote:
> Duncan Coutts wrote:
> > There are various sorts of programs that deal with a large quantity of data and often that includes some date/time data. It's a great shame that programs dealing with lots of time data might have to avoid using the time package and revert to things like newtype Seconds = Seconds Int32 simply because they take less space and can be unpacked into other data structures.
> That is true. But on the other hand, my guess is that most applications don't need those optimizations.
> One of the most common complaints heard about Haskell is that Data.Time is so much more difficult to use than the basic time API in most other languages. I believe that those complaints are totally unjustified. It's more difficult because, unlike the others, it is correct, and it uses the type system to prevent subtle intermittent bugs that almost always exist in applications using those other libraries.
Indeed. I appreciate that the interface helps us all to write correct time-handling code. That's why it's such a shame if people are tempted to go back to primitive stuff simply because of the memory profile.
> But in any case, I think we should be very careful not to make the interface even more complicated
As you note below, it's not a real change or complication to the API.
> just to achieve what is a premature optimization in the vast majority of applications.
Unfortunately the burden of a standard/common library is that people want to use it in a wide range of circumstances. So while it'd certainly be a premature optimisation in most circumstances, it's fairly important in others. I happened to be working recently on an unremarkable program, where the heap profiler told me that a very significant portion of the heap was storing time data structures.
> Many of the suggestions you are making would actually be transparent except when constructors are used explicitly. So perhaps we could achieve most of what you are suggesting without changing the interface if we provide polymorphic functions in place of each of the constructors, then move the actual constructors to "Internals" modules for use only by those who need them. We would then be free to change the internal types now and in the future without significant impact on the interface.
I don't think we need to go that far in making the representations completely hidden, though I don't object to that if you think it's a design improvement. While technically it is an API change to switch the Pico fields to an equivalent numeric type, it is one that is unlikely to break many uses, and where it does, the fix is likely only a type signature or a fromIntegral. The point is that it keeps the existing spirit of the interface and does not add interface complexity.
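As a sketch of what that could look like (hypothetical names, not the time package's API): a TimeOfDay-like record whose seconds field is a strict, unpacked fixed-point picosecond count instead of a lazy Pico. Most call sites would only need a numeric conversion.

```haskell
module Main where

import Data.Int (Int64)

-- Hypothetical record (the names are not from the time package): strict,
-- unpackable fields let GHC flatten the whole value into one heap object
-- instead of a record of pointers to boxed thunks.
data CompactTOD = CompactTOD
  { ctodHour :: {-# UNPACK #-} !Int
  , ctodMin  :: {-# UNPACK #-} !Int
  , ctodPico :: {-# UNPACK #-} !Int64  -- seconds scaled by 10^12
  } deriving (Eq, Show)

-- Call sites that consumed the old fractional seconds field typically
-- only need a numeric conversion such as fromIntegral:
ctodSeconds :: CompactTOD -> Double
ctodSeconds t = fromIntegral (ctodPico t) / 1e12

main :: IO ()
main = print (ctodSeconds (CompactTOD 12 30 (45 * 10 ^ 12)))
```

An Int64 picosecond count comfortably covers a day (86400 * 10^12 < 2^63), which is why the fixed-point choice costs nothing in range here.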
> As for laziness, it's usually not correct to have strictness by default in a library as general as this one. For example, it's conceivable that someone will construct a UTCTime where the Day is easy to compute but the TimeOfDay results in heavy computation or even a bottom.
Honestly I find it a bit hard to conceive of. :-) It appears that pretty much all the functions that construct and consume TimeOfDay, LocalTime and UTCTime are strict[*]. So unless people are using the constructors directly and then not using any other functions on them, it looks like one cannot really use time structures lazily anyway (though perhaps I missed some; I don't know the library that well).

[*] localTimeToUTC, utcToLocalTime and utcToZonedTime appear to be lazy in the time-of-day component, but strict in the day component.
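For concreteness, direct constructor use is the one place where the current lazy fields are observable. With the lazy records as they stand today, this builds a UTCTime whose time-of-day component is bottom and still reads the day back:

```haskell
import Data.Time (UTCTime (..), fromGregorian)

-- With lazy record fields, the undefined time-of-day thunk is never
-- forced when only the day is inspected; strict fields would turn this
-- into an exception at construction time.
main :: IO ()
main = do
  let t = UTCTime (fromGregorian 2010 5 21) undefined
  print (utctDay t)
```
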
> That user would be unpleasantly surprised if we introduced strictness. Gratuitous strictness is also a premature optimization, especially in a general-purpose library. Haskell is lazy by default.
It's not a hard-and-fast rule that everything should be lazy. I certainly appreciate that we need laziness in the right places. I think it depends on whether we consider these values as units or not. I tend to think of time values as atomic units, like complex or rational numbers. The standard H98 complex and rational types are strict in their components because conceptually they're not compound types; they're atomic.
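The analogy can be checked directly: Data.Complex declares `data Complex a = !a :+ !a`, so forcing a complex number to weak head normal form forces both components, unlike an ordinary lazy pair:

```haskell
import Control.Exception (SomeException, evaluate, try)
import Data.Complex (Complex ((:+)))

-- Complex has strict components, so it behaves as one atomic unit:
-- evaluating it to WHNF forces both parts.  A plain pair does not.
main :: IO ()
main = do
  strict <- try (evaluate (undefined :+ 0 :: Complex Double))
              :: IO (Either SomeException (Complex Double))
  lazy <- try (evaluate (undefined, 0 :: Double))
              :: IO (Either SomeException (Double, Double))
  putStrLn ("strict Complex: " ++ either (const "bottom") (const "ok") strict)
  putStrLn ("lazy pair: " ++ either (const "bottom") (const "ok") lazy)
```

The same strict-field shape is what allows GHC to unpack a Complex Double into an unboxed pair of machine doubles inside other records.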
> Perhaps we could introduce strict alternatives for some of the functions. That wouldn't help the size of the data types though, unless we change some of them to type classes...
It's not about the strictness of the functions. It's about the in-memory representation. We cannot achieve a compact representation if we use large branching records. We only get it from using strict fields, since that allows us to unbox them into flat, compact records.

I admit I was assuming that the laziness in most of the time records is unnecessary. If we all conclude that the laziness of the existing representations is essential, then there are really few improvements we can make (probably even if we were to hide the representations).

In that case it might make more sense to add separate compact representations that can be converted to the standard representations, e.g. a reduced-resolution compact TimeStamp type that can be converted to/from UTCTime or LocalTime, or something like that. The idea being that you store your hundreds of thousands of compact TimeStamps in data structures, but convert to the regular type for the calculations. The downside of course is that this does add interface complexity; it would be nicer if we could make the regular types reasonably compact.

Perhaps we can see what other people think about the balance of use cases between those that need the laziness vs those that need compact representations. I may well be overestimating how common the latter use case is.

Duncan
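A minimal sketch of such a compact TimeStamp (hypothetical names and a whole-second resolution assumed; the conversion functions are illustrative, not proposed API):

```haskell
import Data.Int (Int64)
import Data.Time (UTCTime (..), addUTCTime, diffUTCTime, fromGregorian)

-- Hypothetical compact type (not in the time package): whole seconds
-- since the Unix epoch in an Int64, so it can be unpacked into other
-- records.  Reduced resolution is the price; convert to UTCTime for
-- the actual calculations.
newtype TimeStamp = TimeStamp Int64
  deriving (Eq, Ord, Show)

epoch :: UTCTime
epoch = UTCTime (fromGregorian 1970 1 1) 0

toTimeStamp :: UTCTime -> TimeStamp
toTimeStamp t = TimeStamp (round (diffUTCTime t epoch))

fromTimeStamp :: TimeStamp -> UTCTime
fromTimeStamp (TimeStamp s) = addUTCTime (fromIntegral s) epoch

main :: IO ()
main = do
  let ts = TimeStamp 1274458620  -- an arbitrary second count
  print (toTimeStamp (fromTimeStamp ts) == ts)
```

The round trip is exact because NominalDiffTime is fixed-point, so integral second counts survive the conversion unchanged.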