Re: Proposal #3271: new methods for Data.Sequence

Ross, fromList2 takes perhaps 3ms off the time when n=50000. (It is essentially a two-liner, so I consider this acceptable.) Nevertheless, I have slightly improved the performance of the stable heapsort; essentially, when labeling values with their index, the index is left as a lazy thunk, but the computation is organized so computing any index takes O(log n). The speed increase from fromList . Data.List.sort . toList is still but 40% of the speed increase from the heapsort. Again, I'd like to propose a compromise: make Data.Sequence.sortBy cmp be either fromList2 . Data.List.sortBy cmp . toList, or use the stable pairing-heap sort, and add a second sorting method, sortBy', that is unstable but faster. Times (ms) min mean +/-sd median max to/from list: 113.561 119.800 5.470 118.678 139.390 pairing heap: 62.440 67.032 3.861 65.174 80.499 stable pheap: 93.402 102.253 11.854 98.529 158.922 fromList2 . toList: 110.221 118.364 6.525 116.265 136.683 Louis Wasserman wasserman.louis@gmail.com

Again, I'd like to propose a compromise: make Data.Sequence.sortBy cmp be either fromList2 . Data.List.sortBy cmp . toList, or use the stable pairing-heap sort, and add a second sorting method, sortBy', that is unstable but faster.
Sounds good to me but how about calling them unstableSortBy and unstableSort?

Done. See the new patch
herehttp://hackage.haskell.org/trac/ghc/attachment/ticket/3271/new-methods-for-d...
.
Key notes:
Among other things, I manually deforested the stable sorting algorithm,
resulting in a moderate performance gain on simply using Data.List.sortBy.
The current benchmarks indicate:
- unstableSort is typically 40% faster than sort, with no RTS options.
- With memory to spare, unstableSort is typically 15% faster than sort.
- On sorted data with no RTS options, unstableSort is nearly 70% faster
than sort.
- On sorted data with memory to spare, unstableSort is nearly 60% faster
than sort.
(Did I mention I think pairing heaps are excellent?)
Among all the methods I added to Data.Sequence, the most code-intensive are
tails/inits and consDigitToTree/snocDigitToTree.
The second was more intensively used with some previous zipWith
implementations that have since been replaced, but they add moderate speed
improvements to the significant amount of already-bulky code for appending
two sequences, and I can live with that on my conscience.
The code for tails/inits provides actual, concrete algorithmic benefits, in
particular that evaluating a tail from the sequence of tails is O(log n), as
opposed to O(n), but the entire tail sequence still only takes O(n) to
build. In particular, evaluating one tail and then the tail just after it
is still every bit as cheap as it should be. Finally, evaluating the whole
sequence of tails with this method is even faster than scanr (<|) empty, or
even a similar scanl, which is pretty nice.
Louis Wasserman
wasserman.louis@gmail.com
On Sun, Jul 19, 2009 at 9:57 PM, Evan Laforge
Again, I'd like to propose a compromise: make Data.Sequence.sortBy cmp be either fromList2 . Data.List.sortBy cmp . toList, or use the stable pairing-heap sort, and add a second sorting method, sortBy', that is unstable but faster.
Sounds good to me but how about calling them unstableSortBy and unstableSort?

On Mon, Jul 20, 2009 at 02:33:52PM -0400, Louis Wasserman wrote:
Among other things, I manually deforested the stable sorting algorithm, resulting in a moderate performance gain on simply using Data.List.sortBy.
That's not supposed to be necessary, because toList is supposed to fuse with list consumers. Unfortunately it doesn't in cases like this because it doesn't get inlined, even if we add an INLINE pragma, due to GHC bug #2353. If the GHC bug isn't fixed, I think the next best thing would be to force the inlining of toList with a RULES pragma in Data.Foldable.

I will look into this during August. Add yourself to the cc list for #2353 if interested Simon | -----Original Message----- | From: libraries-bounces@haskell.org [mailto:libraries-bounces@haskell.org] On | Behalf Of Ross Paterson | Sent: 03 August 2009 20:29 | To: libraries@haskell.org | Subject: Re: Proposal #3271: new methods for Data.Sequence | | On Mon, Jul 20, 2009 at 02:33:52PM -0400, Louis Wasserman wrote: | > Among other things, I manually deforested the stable sorting algorithm, | > resulting in a moderate performance gain on simply using Data.List.sortBy. | | That's not supposed to be necessary, because toList is supposed to fuse | with list consumers. Unfortunately it doesn't in cases like this because | it doesn't get inlined, even if we add an INLINE pragma, due to GHC bug | #2353. If the GHC bug isn't fixed, I think the next best thing would | be to force the inlining of toList with a RULES pragma in Data.Foldable. | _______________________________________________ | Libraries mailing list | Libraries@haskell.org | http://www.haskell.org/mailman/listinfo/libraries

On Thu, Aug 06, 2009 at 08:22:28AM +0100, Simon Peyton-Jones wrote:
On 03 August 2009 20:29, Ross Paterson wrote: | On Mon, Jul 20, 2009 at 02:33:52PM -0400, Louis Wasserman wrote: | > Among other things, I manually deforested the stable sorting algorithm, | > resulting in a moderate performance gain on simply using Data.List.sortBy. | | That's not supposed to be necessary, because toList is supposed to fuse | with list consumers. Unfortunately it doesn't in cases like this because | it doesn't get inlined, even if we add an INLINE pragma, due to GHC bug | #2353. If the GHC bug isn't fixed, I think the next best thing would | be to force the inlining of toList with a RULES pragma in Data.Foldable.
I will look into this during August. Add yourself to the cc list for #2353 if interested.
OK, I'll add INLINE toList to Data.Foldable and wait till it starts working. (If it doesn't, I can add RULES to unconditionally force the expansion.) In any case we won't need to manually deforest Data.Sequence.sortBy. The naive version will (soon) be just as fast: sortBy :: (a -> a -> Ordering) -> Seq a -> Seq a sortBy cmp xs = fromList2 (length xs) (Data.List.sortBy cmp (toList xs))

On Mon, Jul 20, 2009 at 02:33:52PM -0400, Louis Wasserman wrote:
Among all the methods I added to Data.Sequence, the most code-intensive are tails/inits and consDigitToTree/snocDigitToTree.
The second was more intensively used with some previous zipWith implementations that have since been replaced, but they add moderate speed improvements to the significant amount of already-bulky code for appending two sequences, and I can live with that on my conscience.
In the latest version of your patch, consDTree/snocDTree are defined using foldr/foldl, which expands to repeated consTree/snocTree. In that case, it would be simpler to go back to writing out the repeated consTree/snocTree.
participants (4)
-
Evan Laforge
-
Louis Wasserman
-
Ross Paterson
-
Simon Peyton-Jones