
What's the best way to implement the following function in haskell: Given a list and an integer k as input return the indices of the least k elements in the list. The code should be elegant and also, more importantly, must not make more than the minimum O(k*length(list)) number of operations. R M Send a FREE SMS to your friend's mobile from Yahoo! Messenger. Get it now at http://in.messenger.yahoo.com/

mrvr84:
What's the best way to implement the following function in haskell: Given a list and an integer k as input return the indices of the least k elements in the list. The code should be elegant and also, more importantly, must not make more than the minimum O(k*length(list)) number of operations. R M
Is this a homework question? -- Don

On Thu, Apr 12, 2007 at 08:58:33AM +0530, raghu vardhan wrote:
What's the best way to implement the following function in haskell: Given a list and an integer k as input return the indices of the least k elements in the list. The code should be elegant and also, more importantly, must not make more than the minimum O(k*length(list)) number of operations.
Go read and thoroughly understand "Why Functional Programming Matters." Also, your asyptotic complexity bound is just plain wrong. I'd give faster code, but Don is suspicious (and I can never tell these things myself). Stefan

On Wed, Apr 11, 2007 at 08:38:48PM -0700, Stefan O'Rear wrote:
On Thu, Apr 12, 2007 at 08:58:33AM +0530, raghu vardhan wrote:
What's the best way to implement the following function in haskell: Given a list and an integer k as input return the indices of the least k elements in the list. The code should be elegant and also, more importantly, must not make more than the minimum O(k*length(list)) number of operations.
Go read and thoroughly understand "Why Functional Programming Matters."
Also, your asyptotic complexity bound is just plain wrong. I'd give faster code, but Don is suspicious (and I can never tell these things myself).
Stefan
Don tells me (in #haskell) that you are legitimate, so here is the
example:
kminima k lst = take k $ sort lst
If you want to be really explicit about it, here is a sort that will
work:
sort [] = []
sort l@(x:_) = filter (

On 4/11/07, Stefan O'Rear
If you want to be really explicit about it, here is a sort that will work:
sort [] = [] sort l@(x:_) = filter (
x) l (A stable quicksort, btw)
You may be missing a few recursive calls there :-) Cheers, Tim -- Tim Chevalier * chevalier@alum.wellesley.edu * Often in error, never in doubt Confused? See http://catamorphism.org/transition.html

On Wed, Apr 11, 2007 at 09:20:12PM -0700, Tim Chevalier wrote:
On 4/11/07, Stefan O'Rear
wrote: If you want to be really explicit about it, here is a sort that will work:
sort [] = [] sort l@(x:_) = filter (
x) l (A stable quicksort, btw)
You may be missing a few recursive calls there :-)
Indeed. Stefan

You may be missing a few recursive calls there :-)
Indeed.
I'm confused.
Is this a legitimate stable quicksort, or not? (My guess is, it is
indeed legit as written.)
This was also the first I have heard of stability as a sort property.
http://perldoc.perl.org/sort.html may shed some light on this...
"A stable sort means that for records that compare equal, the original
input ordering is preserved. Mergesort is stable, quicksort is not. "
Is this description a fair characterization of stability for the
current discussion?
I'm a bit confused but if I understand correctly sort from the prelude
is non stable quicksort, which has O(k n^2) as the worst case, whereas
stable quicksort has O( k* log n + n).
non-stable quicksort is just sort from the prelude:
qsort [] = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)
If any in the above was incorrect, please holler.
2007/4/12, Stefan O'Rear
On Wed, Apr 11, 2007 at 09:20:12PM -0700, Tim Chevalier wrote:
On 4/11/07, Stefan O'Rear
wrote: If you want to be really explicit about it, here is a sort that will work:
sort [] = [] sort l@(x:_) = filter (
x) l (A stable quicksort, btw)
You may be missing a few recursive calls there :-)
Indeed.
Stefan _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

And for reference, here is again stefan's "stable" quicksort from his
earlier post.
"
sort [] = []
sort l@(x:_) = filter (
You may be missing a few recursive calls there :-)
Indeed.
I'm confused.
Is this a legitimate stable quicksort, or not? (My guess is, it is indeed legit as written.)
This was also the first I have heard of stability as a sort property.
http://perldoc.perl.org/sort.html may shed some light on this...
"A stable sort means that for records that compare equal, the original input ordering is preserved. Mergesort is stable, quicksort is not. "
Is this description a fair characterization of stability for the current discussion?
I'm a bit confused but if I understand correctly sort from the prelude is non stable quicksort, which has O(k n^2) as the worst case, whereas stable quicksort has O( k* log n + n).
non-stable quicksort is just sort from the prelude:
qsort [] = [] qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)
If any in the above was incorrect, please holler.
2007/4/12, Stefan O'Rear
: On Wed, Apr 11, 2007 at 09:20:12PM -0700, Tim Chevalier wrote:
On 4/11/07, Stefan O'Rear
wrote: If you want to be really explicit about it, here is a sort that will work:
sort [] = [] sort l@(x:_) = filter (
x) l (A stable quicksort, btw)
You may be missing a few recursive calls there :-)
Indeed.
Stefan _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Trying to understand this better I also came across
http://groups.google.de/group/fa.haskell/browse_thread/thread/1345c49faff85926/8f675bd2aeaa02ba?lnk=st&q=%22I+assume+that+you+want+to+find+the+k%27th+smallest+element%22&rnum=1&hl=en#8f675bd2aeaa02ba
where apfulmus gives an implementation of mergesort, which he claims
"runs in O(n) time instead of the expected O(n log n)"
Does that mean you can have the k minima in O(n) time, where n is
length of list, which would seem to be an improvement?
mergesort [] = []
mergesort xs = foldtree1 merge $ map return xs
foldtree1 f [x] = x
foldtree1 f xs = foldtree1 f $ pairs xs
where
pairs [] = []
pairs [x] = [x]
pairs (x:x':xs) = f x x' : pairs xs
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys) =
if x <= y then x:merge xs (y:ys) else y:merge (x:xs) ys
2007/4/13, Thomas Hartman
And for reference, here is again stefan's "stable" quicksort from his earlier post.
" sort [] = [] sort l@(x:_) = filter (
x) l (A stable quicksort, btw) "
This is the code whose legitimacy I am requesting confirmation of.
2007/4/13, Thomas Hartman
: You may be missing a few recursive calls there :-)
Indeed.
I'm confused.
Is this a legitimate stable quicksort, or not? (My guess is, it is indeed legit as written.)
This was also the first I have heard of stability as a sort property.
http://perldoc.perl.org/sort.html may shed some light on this...
"A stable sort means that for records that compare equal, the original input ordering is preserved. Mergesort is stable, quicksort is not. "
Is this description a fair characterization of stability for the current discussion?
I'm a bit confused but if I understand correctly sort from the prelude is non stable quicksort, which has O(k n^2) as the worst case, whereas stable quicksort has O( k* log n + n).
non-stable quicksort is just sort from the prelude:
qsort [] = [] qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)
If any in the above was incorrect, please holler.
2007/4/12, Stefan O'Rear
: On Wed, Apr 11, 2007 at 09:20:12PM -0700, Tim Chevalier wrote:
On 4/11/07, Stefan O'Rear
wrote: If you want to be really explicit about it, here is a sort that will work:
sort [] = [] sort l@(x:_) = filter (
x) l (A stable quicksort, btw)
You may be missing a few recursive calls there :-)
Indeed.
Stefan _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Rereading this, I see in fact apfelmus explains this is
"O(n + k*log n)" for the first k elements, which this discussion also
maintains is the best case. So, there's no discrepancy.
I think this is a very valuable post to read for the explanation.
2007/4/13, Thomas Hartman
Trying to understand this better I also came across
where apfulmus gives an implementation of mergesort, which he claims
"runs in O(n) time instead of the expected O(n log n)"
Does that mean you can have the k minima in O(n) time, where n is length of list, which would seem to be an improvement?
mergesort [] = [] mergesort xs = foldtree1 merge $ map return xs
foldtree1 f [x] = x foldtree1 f xs = foldtree1 f $ pairs xs where pairs [] = [] pairs [x] = [x] pairs (x:x':xs) = f x x' : pairs xs
merge [] ys = ys merge xs [] = xs merge (x:xs) (y:ys) = if x <= y then x:merge xs (y:ys) else y:merge (x:xs) ys
2007/4/13, Thomas Hartman
: And for reference, here is again stefan's "stable" quicksort from his earlier post.
" sort [] = [] sort l@(x:_) = filter (
x) l (A stable quicksort, btw) "
This is the code whose legitimacy I am requesting confirmation of.
2007/4/13, Thomas Hartman
: You may be missing a few recursive calls there :-)
Indeed.
I'm confused.
Is this a legitimate stable quicksort, or not? (My guess is, it is indeed legit as written.)
This was also the first I have heard of stability as a sort property.
http://perldoc.perl.org/sort.html may shed some light on this...
"A stable sort means that for records that compare equal, the original input ordering is preserved. Mergesort is stable, quicksort is not. "
Is this description a fair characterization of stability for the current discussion?
I'm a bit confused but if I understand correctly sort from the prelude is non stable quicksort, which has O(k n^2) as the worst case, whereas stable quicksort has O( k* log n + n).
non-stable quicksort is just sort from the prelude:
qsort [] = [] qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)
If any in the above was incorrect, please holler.
2007/4/12, Stefan O'Rear
: On Wed, Apr 11, 2007 at 09:20:12PM -0700, Tim Chevalier wrote:
On 4/11/07, Stefan O'Rear
wrote: If you want to be really explicit about it, here is a sort that will work:
sort [] = [] sort l@(x:_) = filter (
x) l (A stable quicksort, btw)
You may be missing a few recursive calls there :-)
Indeed.
Stefan _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

G'day all.
Quoting Thomas Hartman
Does that mean you can have the k minima in O(n) time, where n is length of list, which would seem to be an improvement?
It's worth considering what the theoretical minimum is. You have n elements, and you need to locate a specific k-element permutation. There are n! / (n-k)! such permutations. You therefore need log(n! / (n-k)!) bits of information. A binary comparison provides one bit of information. So the number of comparisons that you need to get that much information is: O(log(n! / (n-k)!)) = O(n log n - (n-k) log (n-k)) = O(n log (n/(n-k)) + k log (n-k)) That looks right to me. If k << n, then this simplifies to O(n + k log n), and if k is close to n, it simplifies to O(n log n + k). Cheers, Andrew Bromage

On Fri, Apr 13, 2007 at 07:32:20AM -0400, ajb@spamcop.net wrote:
Quoting Thomas Hartman
: Does that mean you can have the k minima in O(n) time, where n is length of list, which would seem to be an improvement?
It's worth considering what the theoretical minimum is.
You have n elements, and you need to locate a specific k-element permutation. There are n! / (n-k)! such permutations. You therefore need log(n! / (n-k)!) bits of information.
A binary comparison provides one bit of information. So the number of comparisons that you need to get that much information is:
O(log(n! / (n-k)!)) = O(n log n - (n-k) log (n-k)) = O(n log (n/(n-k)) + k log (n-k))
That looks right to me. If k << n, then this simplifies to O(n + k log n), and if k is close to n, it simplifies to O(n log n + k).
Hmm, is something wrong with the following?: Tuple each element with its position: O(n) Find kth smallest element in linear time, as per [1]: O(n) Filter list for elements <= kth smallest: O(n) Sort filtered list by position: O(k log k) map snd to get the positions: O(k) Total: O(n + k log k) (the filter step will take care of elements with the same value as the kth smallest, as the filter is also comparing element positions when the values are the same). Thanks Ian [1] http://en.wikipedia.org/wiki/Selection_algorithm

On 4/13/07, Ian Lynagh
Tuple each element with its position: O(n) Find kth smallest element in linear time, as per [1]: O(n) Filter list for elements <= kth smallest: O(n) Sort filtered list by position: O(k log k) map snd to get the positions: O(k)
Total: O(n + k log k)
Inspired by the above, I thought I'd see about writing it. Note that attaching the indices prevents equal items from comparing equal. I didn't feel like writing the code for a special data type that ignored a second element for the purposes of comparisons; that just means replacing "zip" and "map send". The user can add stability by selective use of reverse within the continuation functions. There should probably be strictness annotations somewhere, and calls to "length" should probably be accumulated in partition instead, but the idea should be sound (except for the likelihood of a bad pivot). partition cont _ [] lt eq gt = cont lt eq gt partition cont p (x:xs) lt eq gt = case x `compare` p of LT -> partition cont p xs (x:lt) eq gt EQ -> partition cont p xs lt (x:eq) gt GT -> partition cont p xs lt eq (x:gt) qsort [] = [] qsort (x:xs) = partition qs' x xs [] [x] [] where qs' lt eq gt = qsort lt ++ (eq ++ qsort gt) findfirst _ [] = [] findfirst k (x:xs) = partition ff' x xs [] [x] [] where ff' lt eq gt = let { ll = length lt; lle = ll + length eq } in if k < ll then findfirst k lt else if k > lle then lt ++ eq ++ findfirst (k - lle) gt else lt ++ take (k - ll) eq getSmallest k = qsort . findfirst k getSmallestIndices k = map snd . getSmallest k . flip zip [0..]

ajb@spamcop.net wrote:
You have n elements, and you need to locate a specific k-element permutation. There are n! / (n-k)! such permutations. You therefore need log(n! / (n-k)!) bits of information.
A binary comparison provides one bit of information. So the number of comparisons that you need to get that much information is:
O(log(n! / (n-k)!)) = O(n log n - (n-k) log (n-k)) = O(n log (n/(n-k)) + k log (n-k))
That looks right to me. If k << n, then this simplifies to O(n + k log n), and if k is close to n, it simplifies to O(n log n + k).
Ian Lynagh wrote:
Hmm, is something wrong with the following?: [...] Total: O(n + k log k)
Mh, I'm not sure. At least, we have log (n! / (n-k)!) = log n! - log (n-k)! = log 1 + log 2 + log 3 + ... + log (n-k) + ... + log n - log 1 - log 2 - log 3 - ... - log (n-k) = log (n-k +1) + ... + log (n-k +k) which is greater than (k log (n-k)) and smaller than (k log n). For k=1, this estimate yields a minimum time of (log n) to find the minimum of a list. While not wrong, this clearly underestimates the necessary O(n). I think that estimating (n log (n/(n-k)) to n for k << n is not correct because the logarithm of 1 = n/n is 0 and not 1 :) Ian Lynagh wrote:
Thanks for the link, Ian. It clearly shows that a lazy take k . qsort takes only O(n + k log k) time. I posted an analysis as follow up to the old thread on haskell-general http://article.gmane.org/gmane.comp.lang.haskell.general/15110 Regards, apfelmus

G'day all. I wrote:
O(log(n! / (n-k)!)) = O(n log n - (n-k) log (n-k)) = O(n log (n/(n-k)) + k log (n-k))
That looks right to me. If k << n, then this simplifies to O(n + k log n), and if k is close to n, it simplifies to O(n log n + k).
Quoting Ian Lynagh
Hmm, is something wrong with the following?: [...] Total: O(n + k log k)
The problem with with my simplifications. :-) But as an exercise, prove: O(n log (n/(n-k)) + k log (n-k)) <= O(n + k log k) Cheers, Andrew Bromage

Hello,
My Hugs tells me this:
Prelude> let sort [] = []; sort l@(x:_) = filter (
And for reference, here is again stefan's "stable" quicksort from his earlier post.
" sort [] = [] sort l@(x:_) = filter (
x) l (A stable quicksort, btw) "
This is the code whose legitimacy I am requesting confirmation of.
2007/4/13, Thomas Hartman
: You may be missing a few recursive calls there :-)
Indeed.
...

G'day all.
Quoting raghu vardhan
What's the best way to implement the following function in haskell: Given a list and an integer k as input return the indices of the least k elements in the list. The code should be elegant and also, more importantly, must not make more than the minimum O(k*length(list)) number of operations.
Pretty much like everyone has says, although it's best to use a real lazy O(n log n) sort, not quicksort-with-dumbest-pivot. To get the indices, use the Schwartzian transform: sortWith :: (Ord b) => (a -> b) -> [a] -> [a] sortWith f = mergeRuns . runs where runs = map (:[]) mergeRuns [] = [] mergeRuns [xs] = xs mergeRuns xss = mergeRuns (mergeRun xss) mergeRun (xs1:xs2:xss) = mergeOne xs1 xs2 : mergeRun xss mergeRun xss = xss mergeOne [] ys = ys mergeOne xs [] = xs mergeOne xs'@(x:xs) ys':(y:ys) = case compare (f x) (f y) of LT -> x : mergeOne xs ys' GT -> y : mergeOne xs' ys EQ -> x : y : mergeOne xs ys getKMinima :: (Ord a) => [a] -> [Int] getKMinima k = map fst . take k . sortWith snd . zip [0..] Cheers, Andrew Bromage

On 4/12/07, ajb@spamcop.net
To get the indices, use the Schwartzian transform:
sortWith :: (Ord b) => (a -> b) -> [a] -> [a] sortWith f = mergeRuns . runs where runs = map (:[])
mergeRuns [] = [] mergeRuns [xs] = xs mergeRuns xss = mergeRuns (mergeRun xss)
mergeRun (xs1:xs2:xss) = mergeOne xs1 xs2 : mergeRun xss mergeRun xss = xss
mergeOne [] ys = ys mergeOne xs [] = xs mergeOne xs'@(x:xs) ys':(y:ys) = case compare (f x) (f y) of LT -> x : mergeOne xs ys' GT -> y : mergeOne xs' ys EQ -> x : y : mergeOne xs ys
getKMinima :: (Ord a) => [a] -> [Int] getKMinima k = map fst . take k . sortWith snd . zip [0..]
For my own edification, what is the benefit of this sortWith over sortBy? f `on` g = \ x y -> f ( g x ) ( g y ) kminima k = map fst . take k . sortBy ( compare `on` snd ) . zip [0..]

Kurt Hutchinson wrote:
On 4/12/07, ajb@spamcop.net
wrote: To get the indices, use the Schwartzian transform:
sortWith :: (Ord b) => (a -> b) -> [a] -> [a] sortWith f = mergeRuns . runs where runs = map (:[])
mergeRuns [] = [] mergeRuns [xs] = xs mergeRuns xss = mergeRuns (mergeRun xss)
mergeRun (xs1:xs2:xss) = mergeOne xs1 xs2 : mergeRun xss mergeRun xss = xss
mergeOne [] ys = ys mergeOne xs [] = xs mergeOne xs'@(x:xs) ys':(y:ys) = case compare (f x) (f y) of LT -> x : mergeOne xs ys' GT -> y : mergeOne xs' ys EQ -> x : y : mergeOne xs ys
getKMinima :: (Ord a) => [a] -> [Int] getKMinima k = map fst . take k . sortWith snd . zip [0..]
For my own edification, what is the benefit of this sortWith over sortBy?
f `on` g = \ x y -> f ( g x ) ( g y ) kminima k = map fst . take k . sortBy ( compare `on` snd ) . zip [0..]
possibly related (newbie question): pairs are instances of Ord, why not directly sort those (implying the item to be sorted is fst):
kminima k = \list -> map snd $ take k $ sort $ zip list [0..]

G'day all.
Quoting Kurt Hutchinson
For my own edification, what is the benefit of this sortWith over sortBy?
None. I wanted to write my own sort, illustrating how the lazy evaluation thing works, and I didn't want a name clash with an existing library function. Cheers, Andrew Bromage

On Thu, 12 Apr 2007, raghu vardhan wrote:
What's the best way to implement the following function in haskell: Given a list and an integer k as input return the indices of the least k elements in the list. The code should be elegant and also, more importantly, must not make more than the minimum O(k*length(list)) number of operations.
R M
I don't know about performance, but trying this problem I was struck again by the gorgeous, terse code that can be created: import Data.List import Data.Ord kminima :: (Enum a, Num a, Ord b) => Int -> [b] -> [a] kminima k lst = take k $ map fst $ sortBy (comparing snd) $ zip [0 ..] lst commented: kminima k lst = take k -- We want k items off the front $ map fst -- Just the list of indices $ sortBy (comparing snd) -- Sort the pairs by their snd $ zip [0 ..] lst -- Preserve indices in tuples Prelude> :l kminima.hs [1 of 1] Compiling Main ( kminima.lhs, interpreted ) Ok, modules loaded: Main. *Main> kminima 3 [71,71,17,14,16,91,18,71,58,75,65,79,76,18,4,45,87,51,93,36] [14,3,4] *Main> kminima 4 [10,9,8,7,6,5,4,3,2,1] [9,8,7,6] -- .~. Dino Morelli /V\ email: dino@ui3.info /( )\ irc: dino- ^^-^^ preferred distro: Debian GNU/Linux http://www.debian.org
participants (13)
-
ajb@spamcop.net
-
apfelmus
-
Colin DeVilbiss
-
Dino Morelli
-
dons@cse.unsw.edu.au
-
Ian Lynagh
-
Kurt Hutchinson
-
raghu vardhan
-
Stefan O'Rear
-
Thomas Hartman
-
Thorkil Naur
-
Tim Chevalier
-
Vincent Kraeutler