k-minima in Haskell

raghu vardhan

12 Apr 2007 12 Apr '07

3:28 a.m.

What's the best way to implement the following function in haskell: Given a list and an integer k as input return the indices of the least k elements in the list. The code should be elegant and also, more importantly, must not make more than the minimum O(k*length(list)) number of operations. R M Send a FREE SMS to your friend's mobile from Yahoo! Messenger. Get it now at http://in.messenger.yahoo.com/

Attachments:

attachment.html (text/html — 940 bytes)

Show replies by date

dons＠cse.unsw.edu.au

12 Apr 12 Apr

3:30 a.m.

mrvr84:

...

What's the best way to implement the following function in haskell: Given a list and an integer k as input return the indices of the least k elements in the list. The code should be elegant and also, more importantly, must not make more than the minimum O(k*length(list)) number of operations. R M

Is this a homework question? -- Don

Stefan O'Rear

3:38 a.m.

On Thu, Apr 12, 2007 at 08:58:33AM +0530, raghu vardhan wrote:

...

What's the best way to implement the following function in haskell: Given a list and an integer k as input return the indices of the least k elements in the list. The code should be elegant and also, more importantly, must not make more than the minimum O(k*length(list)) number of operations.

Go read and thoroughly understand "Why Functional Programming Matters." Also, your asyptotic complexity bound is just plain wrong. I'd give faster code, but Don is suspicious (and I can never tell these things myself). Stefan

Stefan O'Rear

3:47 a.m.

On Wed, Apr 11, 2007 at 08:38:48PM -0700, Stefan O'Rear wrote:

...

On Thu, Apr 12, 2007 at 08:58:33AM +0530, raghu vardhan wrote:

...
What's the best way to implement the following function in haskell: Given a list and an integer k as input return the indices of the least k elements in the list. The code should be elegant and also, more importantly, must not make more than the minimum O(k*length(list)) number of operations.

Go read and thoroughly understand "Why Functional Programming Matters."

Also, your asyptotic complexity bound is just plain wrong. I'd give faster code, but Don is suspicious (and I can never tell these things myself).

Stefan

Don tells me (in #haskell) that you are legitimate, so here is the example: kminima k lst = take k $ sort lst If you want to be really explicit about it, here is a sort that will work: sort [] = [] sort l@(x:_) = filter (x) l (A stable quicksort, btw) Note that this is FASTER than your bound - somewhere between O(n) and O(n*log n). Ain't lazy evaluation great? :) Stefan

Tim Chevalier

4:20 a.m.

On 4/11/07, Stefan O'Rear wrote:

...

If you want to be really explicit about it, here is a sort that will work:

sort [] = [] sort l@(x:_) = filter (x) l

(A stable quicksort, btw)

You may be missing a few recursive calls there :-) Cheers, Tim -- Tim Chevalier * chevalier@alum.wellesley.edu * Often in error, never in doubt Confused? See http://catamorphism.org/transition.html

Stefan O'Rear

4:24 a.m.

On Wed, Apr 11, 2007 at 09:20:12PM -0700, Tim Chevalier wrote:

...

On 4/11/07, Stefan O'Rear wrote:

...
If you want to be really explicit about it, here is a sort that will work:

sort [] = [] sort l@(x:_) = filter (x) l

(A stable quicksort, btw)

You may be missing a few recursive calls there :-)

Indeed. Stefan

Thomas Hartman

13 Apr 13 Apr

9:34 a.m.

...

...
You may be missing a few recursive calls there :-)

Indeed.

I'm confused. Is this a legitimate stable quicksort, or not? (My guess is, it is indeed legit as written.) This was also the first I have heard of stability as a sort property. http://perldoc.perl.org/sort.html may shed some light on this... "A stable sort means that for records that compare equal, the original input ordering is preserved. Mergesort is stable, quicksort is not. " Is this description a fair characterization of stability for the current discussion? I'm a bit confused but if I understand correctly sort from the prelude is non stable quicksort, which has O(k n^2) as the worst case, whereas stable quicksort has O( k* log n + n). non-stable quicksort is just sort from the prelude: qsort [] = [] qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs) If any in the above was incorrect, please holler. 2007/4/12, Stefan O'Rear :

...

On Wed, Apr 11, 2007 at 09:20:12PM -0700, Tim Chevalier wrote:

...
On 4/11/07, Stefan O'Rear wrote:

...
If you want to be really explicit about it, here is a sort that will work:

sort [] = [] sort l@(x:_) = filter (x) l

(A stable quicksort, btw)

You may be missing a few recursive calls there :-)

Indeed.

Stefan _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Thomas Hartman

9:38 a.m.

And for reference, here is again stefan's "stable" quicksort from his earlier post. " sort [] = [] sort l@(x:_) = filter (x) l (A stable quicksort, btw) " This is the code whose legitimacy I am requesting confirmation of. 2007/4/13, Thomas Hartman :

...

...
...
You may be missing a few recursive calls there :-)

Indeed.

I'm confused.

Is this a legitimate stable quicksort, or not? (My guess is, it is indeed legit as written.)

This was also the first I have heard of stability as a sort property.

http://perldoc.perl.org/sort.html may shed some light on this...

"A stable sort means that for records that compare equal, the original input ordering is preserved. Mergesort is stable, quicksort is not. "

Is this description a fair characterization of stability for the current discussion?

I'm a bit confused but if I understand correctly sort from the prelude is non stable quicksort, which has O(k n^2) as the worst case, whereas stable quicksort has O( k* log n + n).

non-stable quicksort is just sort from the prelude:

qsort [] = [] qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)

If any in the above was incorrect, please holler.

2007/4/12, Stefan O'Rear :

...
On Wed, Apr 11, 2007 at 09:20:12PM -0700, Tim Chevalier wrote:

...
On 4/11/07, Stefan O'Rear wrote:

...
If you want to be really explicit about it, here is a sort that will work:

sort [] = [] sort l@(x:_) = filter (x) l

(A stable quicksort, btw)

You may be missing a few recursive calls there :-)

Indeed.

Stefan _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Thomas Hartman

9:53 a.m.

Trying to understand this better I also came across http://groups.google.de/group/fa.haskell/browse_thread/thread/1345c49faff85926/8f675bd2aeaa02ba?lnk=st&q=%22I+assume+that+you+want+to+find+the+k%27th+smallest+element%22&rnum=1&hl=en#8f675bd2aeaa02ba where apfulmus gives an implementation of mergesort, which he claims "runs in O(n) time instead of the expected O(n log n)" Does that mean you can have the k minima in O(n) time, where n is length of list, which would seem to be an improvement? mergesort [] = [] mergesort xs = foldtree1 merge $ map return xs foldtree1 f [x] = x foldtree1 f xs = foldtree1 f $ pairs xs where pairs [] = [] pairs [x] = [x] pairs (x:x':xs) = f x x' : pairs xs merge [] ys = ys merge xs [] = xs merge (x:xs) (y:ys) = if x <= y then x:merge xs (y:ys) else y:merge (x:xs) ys 2007/4/13, Thomas Hartman :

...

And for reference, here is again stefan's "stable" quicksort from his earlier post.

" sort [] = [] sort l@(x:_) = filter (x) l

(A stable quicksort, btw) "

This is the code whose legitimacy I am requesting confirmation of.

2007/4/13, Thomas Hartman :

...
...
...
You may be missing a few recursive calls there :-)

Indeed.

I'm confused.

Is this a legitimate stable quicksort, or not? (My guess is, it is indeed legit as written.)

This was also the first I have heard of stability as a sort property.

http://perldoc.perl.org/sort.html may shed some light on this...

"A stable sort means that for records that compare equal, the original input ordering is preserved. Mergesort is stable, quicksort is not. "

Is this description a fair characterization of stability for the current discussion?

I'm a bit confused but if I understand correctly sort from the prelude is non stable quicksort, which has O(k n^2) as the worst case, whereas stable quicksort has O( k* log n + n).

non-stable quicksort is just sort from the prelude:

qsort [] = [] qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)

If any in the above was incorrect, please holler.

2007/4/12, Stefan O'Rear :

...
On Wed, Apr 11, 2007 at 09:20:12PM -0700, Tim Chevalier wrote:

...
On 4/11/07, Stefan O'Rear wrote:

...
If you want to be really explicit about it, here is a sort that will work:

sort [] = [] sort l@(x:_) = filter (x) l

(A stable quicksort, btw)

You may be missing a few recursive calls there :-)

Indeed.

Stefan _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Thomas Hartman

9:56 a.m.

Rereading this, I see in fact apfelmus explains this is "O(n + k*log n)" for the first k elements, which this discussion also maintains is the best case. So, there's no discrepancy. I think this is a very valuable post to read for the explanation. 2007/4/13, Thomas Hartman :

...

Trying to understand this better I also came across

http://groups.google.de/group/fa.haskell/browse_thread/thread/1345c49faff85926/8f675bd2aeaa02ba?lnk=st&q=%22I+assume+that+you+want+to+find+the+k%27th+smallest+element%22&rnum=1&hl=en#8f675bd2aeaa02ba

where apfulmus gives an implementation of mergesort, which he claims

"runs in O(n) time instead of the expected O(n log n)"

Does that mean you can have the k minima in O(n) time, where n is length of list, which would seem to be an improvement?

mergesort [] = [] mergesort xs = foldtree1 merge $ map return xs

foldtree1 f [x] = x foldtree1 f xs = foldtree1 f $ pairs xs where pairs [] = [] pairs [x] = [x] pairs (x:x':xs) = f x x' : pairs xs

merge [] ys = ys merge xs [] = xs merge (x:xs) (y:ys) = if x <= y then x:merge xs (y:ys) else y:merge (x:xs) ys

2007/4/13, Thomas Hartman :

...
And for reference, here is again stefan's "stable" quicksort from his earlier post.

" sort [] = [] sort l@(x:_) = filter (x) l

(A stable quicksort, btw) "

This is the code whose legitimacy I am requesting confirmation of.

2007/4/13, Thomas Hartman :

...
...
...
You may be missing a few recursive calls there :-)

Indeed.

I'm confused.

Is this a legitimate stable quicksort, or not? (My guess is, it is indeed legit as written.)

This was also the first I have heard of stability as a sort property.

http://perldoc.perl.org/sort.html may shed some light on this...

"A stable sort means that for records that compare equal, the original input ordering is preserved. Mergesort is stable, quicksort is not. "

Is this description a fair characterization of stability for the current discussion?

I'm a bit confused but if I understand correctly sort from the prelude is non stable quicksort, which has O(k n^2) as the worst case, whereas stable quicksort has O( k* log n + n).

non-stable quicksort is just sort from the prelude:

qsort [] = [] qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)

If any in the above was incorrect, please holler.

2007/4/12, Stefan O'Rear :

...
On Wed, Apr 11, 2007 at 09:20:12PM -0700, Tim Chevalier wrote:

...
On 4/11/07, Stefan O'Rear wrote:

...
If you want to be really explicit about it, here is a sort that will work:

sort [] = [] sort l@(x:_) = filter (x) l

(A stable quicksort, btw)

You may be missing a few recursive calls there :-)

Indeed.

Stefan _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

ajb＠spamcop.net

11:32 a.m.

G'day all. Quoting Thomas Hartman :

...

Does that mean you can have the k minima in O(n) time, where n is length of list, which would seem to be an improvement?

It's worth considering what the theoretical minimum is. You have n elements, and you need to locate a specific k-element permutation. There are n! / (n-k)! such permutations. You therefore need log(n! / (n-k)!) bits of information. A binary comparison provides one bit of information. So the number of comparisons that you need to get that much information is: O(log(n! / (n-k)!)) = O(n log n - (n-k) log (n-k)) = O(n log (n/(n-k)) + k log (n-k)) That looks right to me. If k << n, then this simplifies to O(n + k log n), and if k is close to n, it simplifies to O(n log n + k). Cheers, Andrew Bromage

Ian Lynagh

2:09 p.m.

On Fri, Apr 13, 2007 at 07:32:20AM -0400, ajb@spamcop.net wrote:

...

Quoting Thomas Hartman :

...
Does that mean you can have the k minima in O(n) time, where n is length of list, which would seem to be an improvement?

It's worth considering what the theoretical minimum is.

You have n elements, and you need to locate a specific k-element permutation. There are n! / (n-k)! such permutations. You therefore need log(n! / (n-k)!) bits of information.

A binary comparison provides one bit of information. So the number of comparisons that you need to get that much information is:

O(log(n! / (n-k)!)) = O(n log n - (n-k) log (n-k)) = O(n log (n/(n-k)) + k log (n-k))

That looks right to me. If k << n, then this simplifies to O(n + k log n), and if k is close to n, it simplifies to O(n log n + k).

Hmm, is something wrong with the following?: Tuple each element with its position: O(n) Find kth smallest element in linear time, as per [1]: O(n) Filter list for elements <= kth smallest: O(n) Sort filtered list by position: O(k log k) map snd to get the positions: O(k) Total: O(n + k log k) (the filter step will take care of elements with the same value as the kth smallest, as the filter is also comparing element positions when the values are the same). Thanks Ian [1] http://en.wikipedia.org/wiki/Selection_algorithm

Colin DeVilbiss

6:44 p.m.

On 4/13/07, Ian Lynagh wrote:

...

Tuple each element with its position: O(n) Find kth smallest element in linear time, as per [1]: O(n) Filter list for elements <= kth smallest: O(n) Sort filtered list by position: O(k log k) map snd to get the positions: O(k)

Total: O(n + k log k)

[1] http://en.wikipedia.org/wiki/Selection_algorithm

Inspired by the above, I thought I'd see about writing it. Note that attaching the indices prevents equal items from comparing equal. I didn't feel like writing the code for a special data type that ignored a second element for the purposes of comparisons; that just means replacing "zip" and "map send". The user can add stability by selective use of reverse within the continuation functions. There should probably be strictness annotations somewhere, and calls to "length" should probably be accumulated in partition instead, but the idea should be sound (except for the likelihood of a bad pivot). partition cont _ [] lt eq gt = cont lt eq gt partition cont p (x:xs) lt eq gt = case x `compare` p of LT -> partition cont p xs (x:lt) eq gt EQ -> partition cont p xs lt (x:eq) gt GT -> partition cont p xs lt eq (x:gt) qsort [] = [] qsort (x:xs) = partition qs' x xs [] [x] [] where qs' lt eq gt = qsort lt ++ (eq ++ qsort gt) findfirst _ [] = [] findfirst k (x:xs) = partition ff' x xs [] [x] [] where ff' lt eq gt = let { ll = length lt; lle = ll + length eq } in if k < ll then findfirst k lt else if k > lle then lt ++ eq ++ findfirst (k - lle) gt else lt ++ take (k - ll) eq getSmallest k = qsort . findfirst k getSmallestIndices k = map snd . getSmallest k . flip zip [0..]

apfelmus

14 Apr 14 Apr

10:33 a.m.

ajb@spamcop.net wrote:

...

You have n elements, and you need to locate a specific k-element permutation. There are n! / (n-k)! such permutations. You therefore need log(n! / (n-k)!) bits of information.

A binary comparison provides one bit of information. So the number of comparisons that you need to get that much information is:

O(log(n! / (n-k)!)) = O(n log n - (n-k) log (n-k)) = O(n log (n/(n-k)) + k log (n-k))

That looks right to me. If k << n, then this simplifies to O(n + k log n), and if k is close to n, it simplifies to O(n log n + k).

Ian Lynagh wrote:

...

Hmm, is something wrong with the following?: [...] Total: O(n + k log k)

Mh, I'm not sure. At least, we have log (n! / (n-k)!) = log n! - log (n-k)! = log 1 + log 2 + log 3 + ... + log (n-k) + ... + log n - log 1 - log 2 - log 3 - ... - log (n-k) = log (n-k +1) + ... + log (n-k +k) which is greater than (k log (n-k)) and smaller than (k log n). For k=1, this estimate yields a minimum time of (log n) to find the minimum of a list. While not wrong, this clearly underestimates the necessary O(n). I think that estimating (n log (n/(n-k)) to n for k << n is not correct because the logarithm of 1 = n/n is 0 and not 1 :) Ian Lynagh wrote:

...

[1] http://en.wikipedia.org/wiki/Selection_algorithm

Thanks for the link, Ian. It clearly shows that a lazy take k . qsort takes only O(n + k log k) time. I posted an analysis as follow up to the old thread on haskell-general http://article.gmane.org/gmane.comp.lang.haskell.general/15110 Regards, apfelmus

ajb＠spamcop.net

15 Apr 15 Apr

5:57 a.m.

G'day all. I wrote:

...

...
O(log(n! / (n-k)!)) = O(n log n - (n-k) log (n-k)) = O(n log (n/(n-k)) + k log (n-k))

That looks right to me. If k << n, then this simplifies to O(n + k log n), and if k is close to n, it simplifies to O(n log n + k).

Quoting Ian Lynagh :

...

Hmm, is something wrong with the following?: [...] Total: O(n + k log k)

The problem with with my simplifications. :-) But as an exercise, prove: O(n log (n/(n-k)) + k log (n-k)) <= O(n + k log k) Cheers, Andrew Bromage

Thorkil Naur

13 Apr 13 Apr

1:08 p.m.

Hello, My Hugs tells me this: Prelude> let sort [] = []; sort l@(x:_) = filter (x) l in sort [1,3,2] [1,3,2] Prelude> So, no, this is not a working sorting function. Inserting the "few missing recursive calls": Prelude> let sort [] = []; sort l@(x:_) = sort ( filter (x) l ) in sort [1,3,2] [1,2,3] Prelude> Best regards Thorkil On Friday 13 April 2007 11:38, Thomas Hartman wrote:

...

And for reference, here is again stefan's "stable" quicksort from his earlier post.

" sort [] = [] sort l@(x:_) = filter (x) l

(A stable quicksort, btw) "

This is the code whose legitimacy I am requesting confirmation of.

2007/4/13, Thomas Hartman :

...
...
...
You may be missing a few recursive calls there :-)

Indeed.

...

ajb＠spamcop.net

12 Apr 12 Apr

5:33 a.m.

G'day all. Quoting raghu vardhan :

...

What's the best way to implement the following function in haskell: Given a list and an integer k as input return the indices of the least k elements in the list. The code should be elegant and also, more importantly, must not make more than the minimum O(k*length(list)) number of operations.

Pretty much like everyone has says, although it's best to use a real lazy O(n log n) sort, not quicksort-with-dumbest-pivot. To get the indices, use the Schwartzian transform: sortWith :: (Ord b) => (a -> b) -> [a] -> [a] sortWith f = mergeRuns . runs where runs = map (:[]) mergeRuns [] = [] mergeRuns [xs] = xs mergeRuns xss = mergeRuns (mergeRun xss) mergeRun (xs1:xs2:xss) = mergeOne xs1 xs2 : mergeRun xss mergeRun xss = xss mergeOne [] ys = ys mergeOne xs [] = xs mergeOne xs'@(x:xs) ys':(y:ys) = case compare (f x) (f y) of LT -> x : mergeOne xs ys' GT -> y : mergeOne xs' ys EQ -> x : y : mergeOne xs ys getKMinima :: (Ord a) => [a] -> [Int] getKMinima k = map fst . take k . sortWith snd . zip [0..] Cheers, Andrew Bromage

Kurt Hutchinson

2:22 p.m.

On 4/12/07, ajb@spamcop.net wrote:

...

To get the indices, use the Schwartzian transform:

sortWith :: (Ord b) => (a -> b) -> [a] -> [a] sortWith f = mergeRuns . runs where runs = map (:[])

mergeRuns [] = [] mergeRuns [xs] = xs mergeRuns xss = mergeRuns (mergeRun xss)

mergeRun (xs1:xs2:xss) = mergeOne xs1 xs2 : mergeRun xss mergeRun xss = xss

mergeOne [] ys = ys mergeOne xs [] = xs mergeOne xs'@(x:xs) ys':(y:ys) = case compare (f x) (f y) of LT -> x : mergeOne xs ys' GT -> y : mergeOne xs' ys EQ -> x : y : mergeOne xs ys

getKMinima :: (Ord a) => [a] -> [Int] getKMinima k = map fst . take k . sortWith snd . zip [0..]

For my own edification, what is the benefit of this sortWith over sortBy? f `on` g = \ x y -> f ( g x ) ( g y ) kminima k = map fst . take k . sortBy ( compare `on` snd ) . zip [0..]

Vincent Kraeutler

2:48 p.m.

Kurt Hutchinson wrote:

...

On 4/12/07, ajb@spamcop.net wrote:

...
To get the indices, use the Schwartzian transform:

sortWith :: (Ord b) => (a -> b) -> [a] -> [a] sortWith f = mergeRuns . runs where runs = map (:[])

mergeRuns [] = [] mergeRuns [xs] = xs mergeRuns xss = mergeRuns (mergeRun xss)

mergeRun (xs1:xs2:xss) = mergeOne xs1 xs2 : mergeRun xss mergeRun xss = xss

mergeOne [] ys = ys mergeOne xs [] = xs mergeOne xs'@(x:xs) ys':(y:ys) = case compare (f x) (f y) of LT -> x : mergeOne xs ys' GT -> y : mergeOne xs' ys EQ -> x : y : mergeOne xs ys

getKMinima :: (Ord a) => [a] -> [Int] getKMinima k = map fst . take k . sortWith snd . zip [0..]

For my own edification, what is the benefit of this sortWith over sortBy?

f `on` g = \ x y -> f ( g x ) ( g y ) kminima k = map fst . take k . sortBy ( compare `on` snd ) . zip [0..]

possibly related (newbie question): pairs are instances of Ord, why not directly sort those (implying the item to be sorted is fst):

...

kminima k = \list -> map snd $ take k $ sort $ zip list [0..]

ajb＠spamcop.net

13 Apr 13 Apr

1:09 a.m.

G'day all. Quoting Kurt Hutchinson :

...

For my own edification, what is the benefit of this sortWith over sortBy?

None. I wanted to write my own sort, illustrating how the lazy evaluation thing works, and I didn't want a name clash with an existing library function. Cheers, Andrew Bromage

Dino Morelli

12 Apr 12 Apr

2:07 p.m.

On Thu, 12 Apr 2007, raghu vardhan wrote:

...

What's the best way to implement the following function in haskell: Given a list and an integer k as input return the indices of the least k elements in the list. The code should be elegant and also, more importantly, must not make more than the minimum O(k*length(list)) number of operations.

R M

I don't know about performance, but trying this problem I was struck again by the gorgeous, terse code that can be created: import Data.List import Data.Ord kminima :: (Enum a, Num a, Ord b) => Int -> [b] -> [a] kminima k lst = take k $ map fst $ sortBy (comparing snd) $ zip [0 ..] lst commented: kminima k lst = take k -- We want k items off the front $ map fst -- Just the list of indices $ sortBy (comparing snd) -- Sort the pairs by their snd $ zip [0 ..] lst -- Preserve indices in tuples Prelude> :l kminima.hs [1 of 1] Compiling Main ( kminima.lhs, interpreted ) Ok, modules loaded: Main. *Main> kminima 3 [71,71,17,14,16,91,18,71,58,75,65,79,76,18,4,45,87,51,93,36] [14,3,4] *Main> kminima 4 [10,9,8,7,6,5,4,3,2,1] [9,8,7,6] -- .~. Dino Morelli /V\ email: dino@ui3.info /( )\ irc: dino- ^^-^^ preferred distro: Debian GNU/Linux http://www.debian.org

6665

Age (days ago)

6668

Last active (days ago)

List overview

Download

20 comments

13 participants

participants (13)

ajb＠spamcop.net
apfelmus
Colin DeVilbiss
Dino Morelli
dons＠cse.unsw.edu.au
Ian Lynagh
Kurt Hutchinson
raghu vardhan
Stefan O'Rear
Thomas Hartman
Thorkil Naur
Tim Chevalier
Vincent Kraeutler