
There seems to be some confusion about the question. There are two key things to keep in mind here:
1) You must make at most O(k*n) comparisons (in the worst case) if the list has length n.
2) The output must be the indices and not the numbers themselves.
So, any algorithm that sorts is no good (think of n as huge, and k small). Another interesting caveat to this is that if k=2, you can actually solve the problem with just (n+log n) comparisons in worst case(instead of 2*n, that you get by a naive approach), and it's a nice exercise to do this.
As a further clarification, this is not a homework question. I genereally do whatever programming I do in Matlab, as I work with matrices (huge ones) and use this function a lot. I just wanted to see how different an implementation you can get in Haskell (I am new to Haskell so I might not be able to come up with the best way to do this).
----- Original Message ----
From: Stefan O'Rear
On Thu, Apr 12, 2007 at 08:58:33AM +0530, raghu vardhan wrote:
What's the best way to implement the following function in haskell: Given a list and an integer k as input return the indices of the least k elements in the list. The code should be elegant and also, more importantly, must not make more than the minimum O(k*length(list)) number of operations.
Go read and thoroughly understand "Why Functional Programming Matters."
Also, your asyptotic complexity bound is just plain wrong. I'd give faster code, but Don is suspicious (and I can never tell these things myself).
Stefan
Don tells me (in #haskell) that you are legitimate, so here is the
example:
kminima k lst = take k $ sort lst
If you want to be really explicit about it, here is a sort that will
work:
sort [] = []
sort l@(x:_) = filter (

G'day all.
Quoting raghu vardhan
So, any algorithm that sorts is no good (think of n as huge, and k small).
The essence of all the answers that you've received is that it doesn't matter if k is small, sorting is still the most elegant answer in Haskell. The reason is that in Haskell, a function which sort function will produce the sorted list lazily. If you don't demand the while list, you don't perform the whole sort. Some algorithms are better than others for minimising the amount of work if not all of the list is demanded, but ideally, producing the first k elements will take O(n log k + k) time. Cheers, Andrew Bromage

raghu vardhan
So, any algorithm that sorts is no good (think of n as huge, and k small).
With lazy evaluation, it is! http://article.gmane.org/gmane.comp.lang.haskell.general/15010 ajb@spamcop.net wrote:
The essence of all the answers that you've received is that it doesn't matter if k is small, sorting is still the most elegant answer in Haskell.
(It's not only most elegant, it can even be put to work.)
The reason is that in Haskell, a function which sort function will produce the sorted list lazily. If you don't demand the while list, you don't perform the whole sort.
Some algorithms are better than others for minimising the amount of work if not all of the list is demanded, but ideally, producing the first k elements will take O(n log k + k) time.
You mean O(k * log n + n) of course. Regards, apfelmus

The link pretty much answers my question, though you probably require a little bit more book keeping to get the indices out. Compared to the iterative version or the matlab version, this definitely is more elegant, but also is trickier. apfelmus wrote:
raghu vardhan
: So, any algorithm that sorts is no good (think of n as huge, and k small).
With lazy evaluation, it is!
http://article.gmane.org/gmane.comp.lang.haskell.general/15010
ajb@spamcop.net wrote:
The essence of all the answers that you've received is that it doesn't matter if k is small, sorting is still the most elegant answer in Haskell.
(It's not only most elegant, it can even be put to work.)
The reason is that in Haskell, a function which sort function will produce the sorted list lazily. If you don't demand the while list, you don't perform the whole sort.
Some algorithms are better than others for minimising the amount of work if not all of the list is demanded, but ideally, producing the first k elements will take O(n log k + k) time.
You mean O(k * log n + n) of course.
Regards, apfelmus
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- View this message in context: http://www.nabble.com/k-minima-in-Haskell-tf3563220.html#a9964572 Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

\begin{code} kmin :: Ord a => Int -> [a] -> [Int] kmin k x = map snd $ Set.toList $ foldl' insertIfSmall (Set.fromList h) t where (h, t) = splitAt k $ zip x [0..] insertIfSmall :: Ord a => Set.Set a -> a -> Set.Set a insertIfSmall s e | e < mx = Set.insert e s' | otherwise = s where (mx, s') = Set.deleteFindMax s \end{code} This gives O(log k * (n + k)) execution in constant memory. If you need the result indices to be in order, you can put in a sort at the end without changing the complexity. This could be improved by a significant constant factor by using a priority queue instead of Set. Any Edison people out there? Regards, Yitz

\begin{code} kmin :: Ord a => Int -> [a] -> [Int] kmin k x = map snd $ Set.toList $ foldl' insertIfSmall (Set.fromList h) t where (h, t) = splitAt k $ zip x [0..] insertIfSmall :: Ord a => Set.Set a -> a -> Set.Set a insertIfSmall s e | e < mx = Set.insert e s' | otherwise = s where (mx, s') = Set.deleteFindMax s \end{code} This gives O(log k * (n + k)) execution in constant memory. If you need the result indices to be in order, you can put in a sort at the end without changing the complexity. This could be improved by a significant constant factor by using a priority queue instead of Set. Any Edison people out there? Regards, Yitz

Does the answer change if the data source isn't a list, already in memory,
but a stream? That is, will the sort end up pulling the entire stream into
memory, when we only need k elements from the entire stream.
Interestingly, this is almost exactly the same as one of my standard
interview questions, with the main difference being looking for the k
elements closest to a target value, rather than the smallest. Not that it
really changes the problem, but recognizing that is one of the things I'm
looking for.
On 4/12/07, apfelmus
raghu vardhan
: So, any algorithm that sorts is no good (think of n as huge, and k small).
With lazy evaluation, it is!
http://article.gmane.org/gmane.comp.lang.haskell.general/15010
ajb@spamcop.net wrote:
The essence of all the answers that you've received is that it doesn't matter if k is small, sorting is still the most elegant answer in Haskell.
(It's not only most elegant, it can even be put to work.)
The reason is that in Haskell, a function which sort function will produce the sorted list lazily. If you don't demand the while list, you don't perform the whole sort.
Some algorithms are better than others for minimising the amount of work if not all of the list is demanded, but ideally, producing the first k elements will take O(n log k + k) time.
You mean O(k * log n + n) of course.
Regards, apfelmus
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Ah, but which k elements? You won't know until you've drained your entire stream! Dan Steve Downey wrote:
Does the answer change if the data source isn't a list, already in memory, but a stream? That is, will the sort end up pulling the entire stream into memory, when we only need k elements from the entire stream.
Interestingly, this is almost exactly the same as one of my standard interview questions, with the main difference being looking for the k elements closest to a target value, rather than the smallest. Not that it really changes the problem, but recognizing that is one of the things I'm looking for.
On 4/12/07, *apfelmus*
mailto:apfelmus@quantentunnel.de> wrote: raghu vardhan
mailto:mrvr84@yahoo.co.in>: > So, any algorithm that sorts is no good (think of n as huge, and k small). With lazy evaluation, it is!
http://article.gmane.org/gmane.comp.lang.haskell.general/15010
ajb@spamcop.net mailto:ajb@spamcop.net wrote: > The essence of all the answers that you've received is that it doesn't > matter if k is small, sorting is still the most elegant answer in Haskell.
(It's not only most elegant, it can even be put to work.)
> The reason is that in Haskell, a function which sort function will produce the > sorted list lazily. If you don't demand the while list, you don't perform > the whole sort. > > Some algorithms are better than others for minimising the amount of > work if not all of the list is demanded, but ideally, producing the > first k elements will take O(n log k + k) time.
You mean O(k * log n + n) of course.
Regards, apfelmus
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org mailto:Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
------------------------------------------------------------------------
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Dan Weston
Ah, but which k elements? You won't know until you've drained your entire stream!
True, but you still don't need to keep the whole stream in memory at once, just the k-least-so-far as you work your way through the stream - once you've read a part of the stream you can mostly forget it again. The question as I understood it was one of if even in Haskell there's a better way than sorting that means you need only have a fragment of the stream in memory at once. -- Mark

Both Yitzchak's and my suggestions should run in constant space--some strictness annotation or switching to foldl' might be necessary. On 4/12/07, Mark T.B. Carroll wrote:
Dan Weston
writes: Ah, but which k elements? You won't know until you've drained your entire stream!
True, but you still don't need to keep the whole stream in memory at once, just the k-least-so-far as you work your way through the stream - once you've read a part of the stream you can mostly forget it again. The question as I understood it was one of if even in Haskell there's a better way than sorting that means you need only have a fragment of the stream in memory at once.
-- Mark
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

ajb@spamcop.net wrote:
Quoting apfelmus
: You mean O(k * log n + n) of course.
Erm, yes. You can do it in an imperative language by building a heap in O(n) time followed by removing k elements, in O(k log n) time.
Ah, sorry, there are indeed to variants not comparable to each other. Threading a heap of k elements over the entire list needs O(n log k + k) time and putting all of the list into a heap takes O(k log n + n) time. For say k = O(sqrt(n)), the former is slower than the latter but it only needs to keep O(k) list elements in memory. I think that every k-minima algorithm of the form take k . sort has to keep all list elements in memory: the sort may not throw away anything because it cannot know how many elements are requested. Regards, apfelmus

For various reasons I had a similar problem that I solved iteratively simply with a sorted list of the actual best elements. The only particular things were 1. keep element count (to easily know if the element should be inserted in any case) 2. keep the list in reverse order to have the biggest element as first, and make the common case (list stays the same) fast 3. make the list strict (avoid space leaks) not the best in worst case of decreasingly ordered elements O(n*k), but good enough for me. A set + explicit maximal element would probably be the best solution. Fawzi -- | keeps the score of the n best (high score) -- (uses list, optimized for small n) data NBest a = NBest Int [a] deriving (Eq) -- | merges an element in the result with given ranking function merge1 :: Int -> (a -> Double) -> a -> NBest a -> NBest a merge1 n rankF fragment (NBest m []) | m==0 && n>0 = NBest 1 [fragment] | m==0 = NBest 0 [] | otherwise = error "empty list and nonzero count" merge1 n rankF fragment (NBest m (xl@(x0:xs))) | n>m = NBest (m+1) (insertOrdered fragment xl) | rankF fragment < (rankF x0) = NBest n (insertOrdered fragment xs) | otherwise = NBest m xl where insertOrdered x (x1:xr) | rankF x >= rankF x1 = x:x1:xr | otherwise = let r = insertOrdered x xr in x1 `seq` r `seq` x1:r where insertOrdered x [] = [x]
participants (11)
-
ajb@spamcop.net
-
apfelmus
-
Dan Weston
-
Fawzi Mohamed
-
mark@ixod.org
-
Nicolas Frisby
-
R M
-
raghu vardhan
-
Ronny Wichers Schreur
-
Steve Downey
-
Yitzchak Gale