
On Mon, Jul 6, 2009 at 12:26 PM, Petr Pudlak wrote:
> More precisely: the result of sorting an n-element list or array should be a structure that allows asking for the i-th element of the result (for example, a lazy array). Asking for k arbitrary elements should take no more than O(min(n * log n, k * n)).
I'd argue this is not really a true use of "laziness" and more just amortized cost. That said, yes, it can be done! I'm going to use an imperative data structure, because <sarcasm>I hate you all</sarcasm> and it makes the analysis and presentation slightly easier.

To wit: we keep a sorted array of unsorted bags of inputs, each tagged with its size, where everything in an earlier bag is smaller than everything in a later one. For example (suppose we start with an array of the numbers 1..10, scrambled), we might start with:

(10,{1,5,6,8,3,4,2,7,9,10})

And after some operations, it might look like:

(3,{1,3,2});(4,{6,7,5,4});(3,{8,9,10})

And we're trying to (eventually) hit the fully sorted state:

(1,{1});(1,{2});...etc.

(It's not hard to see how to maintain an array of pointers into these bags so that the bag containing the i-th element can be found in constant time.)

Now, using a linear-time selection algorithm like median-of-medians or Chazelle's, it's easy to see how to find the k-th smallest element in such a data structure: find the bag containing the k-th element, and apply the selection algorithm to it. Furthermore, still in linear time, we can (while we're working) split the bag around the element we found. So, given the data state:

(10,{1,5,6,8,3,4,2,7,9,10})

if we're asked for the 4th smallest element, we'll update to:

(3,{1,3,2});(1,{4});(6,{5,6,8,7,9,10})

So far so good, right? Now we just apply one more change: after each lookup, we walk across the list of bags and split each one in half (as if we had been asked for the median element of each bag). In my running example, we'd go to:

(1,{1});(1,{2});(1,{3});(1,{4});(2,{5,6});(1,{7});(3,{8,9,10})

This is still linear time: we run a linear-time algorithm on disjoint subsets of the input. Now, note that after k lookups to this structure, the largest bag is at most n/2^k elements long! Since each lookup costs O(n), k lookups cost O(k * n); and after about log n lookups every bag is a singleton, so no further selection work is needed. Couple that with a slight optimization that overwrites the lookup table into the bags with just the correct results once the bags are small enough, and this matches your O(min(n * log n, k * n)) bound quite nicely.

Now, I doubt it'll be fast, but that's not what you asked. Plus, come up with a persuasive use case for a system where you /really need/ the 4th, 27th, and 957th largest elements /now/ and none of the rest, and I promise I'll make something practically efficient. :)

If someone can translate my algorithm into a non-side-effecting one, I'd appreciate it, but I think that, like disjoint-set union, this is probably inherently side-effecting. Yes, yes, we could use functional arrays, but short of that I don't see a way to avoid side effects to take care of my amortized work.

AHH
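P.S. For concreteness, here's a minimal purely functional sketch of the bag structure in Haskell. Caveats, since this is not a faithful translation: selectSplit below is a quickselect-style split (expected linear time) standing in for median-of-medians (worst-case linear); the constant-time pointer array is replaced by a linear walk over the bag list, which doesn't change the O(n)-per-lookup bound; and the refined structure is threaded explicitly instead of being memoized in a lazy array. All the names (Bags, selectSplit, lookupKth) are made up for this sketch.

import Data.List (partition)

-- A bag is an unsorted chunk of the input.  The structure is a list of
-- bags ordered so that everything in an earlier bag is <= everything in
-- a later bag.  (The size tags and pointer array from the prose are
-- dropped; we just walk the list, still O(n) per lookup.)
newtype Bags a = Bags [[a]]

fromList :: [a] -> Bags a
fromList xs = Bags [xs]

-- Partition a bag around its i-th smallest element (0-based),
-- quickselect style: expected linear time.  A worst-case linear
-- selection (median-of-medians) would slot in here unchanged.
selectSplit :: Ord a => Int -> [a] -> ([a], a, [a])
selectSplit _ [] = error "selectSplit: index out of range"
selectSplit i (p:xs)
  | i <  nl   = let (ls', m, rs') = selectSplit i ls
                in (ls', m, rs' ++ p : rs)
  | i == nl   = (ls, p, rs)
  | otherwise = let (ls', m, rs') = selectSplit (i - nl - 1) rs
                in (ls ++ p : ls', m, rs')
  where
    (ls, rs) = partition (< p) xs
    nl       = length ls

-- Ask for the i-th smallest element (0-based).  Returns the element
-- together with the refined structure: the bag we hit is split around
-- the answer, and then *every* bag is split at its median, so after k
-- lookups no bag holds more than n / 2^k elements.
lookupKth :: Ord a => Int -> Bags a -> (a, Bags a)
lookupKth i (Bags bags) = (x, Bags (concatMap halve bags'))
  where
    (x, bags') = go i bags

    go _ []     = error "lookupKth: index out of range"
    go k (b:bs)
      | k < length b = let (ls, m, rs) = selectSplit k b
                       in (m, dropEmpty [ls, [m], rs] ++ bs)
      | otherwise    = let (m, bs') = go (k - length b) bs
                       in (m, b : bs')

    -- The post-lookup halving pass: linear-time work on disjoint bags.
    halve b
      | length b <= 1 = [b]
      | otherwise     = let (ls, m, rs) = selectSplit (length b `div` 2) b
                        in dropEmpty [ls, [m], rs]

    dropEmpty = filter (not . null)

main :: IO ()
main = do
  let s0      = fromList [1,5,6,8,3,4,2,7,9,10 :: Int]
      (x, s1) = lookupKth 3 s0   -- 4th smallest, as in the running example
      (y, _ ) = lookupKth 6 s1   -- 7th smallest, from the refined bags
  print x                        -- 4
  print y                        -- 7

Note that this version is persistent rather than side-effecting: each lookup returns the refined bag list, and the n/2^k bound holds as long as you keep querying the newest version. To get the "lazy array" interface Petr asked for, you'd still want to memoize lookups so repeated queries share the refinement work, which is where the side effects (or a lazy array) sneak back in.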