Fwd: Proposal (breaking change, but probably not one that will break any real code): strictify genericLength

Since Krzysztof made their replies public, I figured I'd add this reply to the mix (an expanded/improved version of a message I sent them). I think these things are worth considering however things turn out. If, for the sake of stability, genericLength has to stay (pretty much) the same, it probably wouldn't hurt to add a genericLength' or some such. As for the current genericLength, it is not a panacea even for lazy naturals. I started by noticing that the same result is achieved by the slightly nicer definition genericLength = foldr (\_ y = 1 + y) However, just looking at the definition more closely than I had reveals an efficiency problem even here. Specifically, this definition assumes that a + b is defined something like Zero + b = b Succ a + b = a + Succ b It would, however, be perfectly legitimate to do it the other way around: a + Zero = a a + Succ b = Succ a + b In the former case, 1+y is O(1) and y+1 is O(y), whereas the opposite is true in the latter case! The situation for the strict version has some subtleties too. One thing that's not subtle is that the strict version should be written using foldl' (at least in most cases), which should allow it to take advantage of the new call arity analysis and resulting optimization. As for the nitty-gritty, there are several cases to think about: 1. Int8, Int16, Word8, and Word16, where it is advantageous to calculate the length as an Int (or perhaps a Word) and then narrow the result—I'm not sure if GHC will do this right automatically in this case (and the answer could depend on whether -fllvm is used or not), but it would be best to do some benchmarking before concluding that it's okay to ignore. 2. Int32, Word32, Int64, and Word64, which will generally work well calculated directly, although it's possible that the latter two might be done better some other way on 32-bit machines. 3. Integer: it should work to calculate that as a Word64 and convert it, but then again if we wait 30 years that might not be true anymore, maybe. 4. Floating point: The current definition, as well as the one you propose, seem wrong for floating point. I would expect the floating point length of a sufficiently long list to be the closest approximation from the Word64 length, but if you tried to find the genericLength of a list with length 2^24 as a Float with these definitions, you will have a problem: they will actually get stuck at the largest exactly representable integer and go no further.
participants (1)
-
David Feuer