Re: Naive question on lists of duplicates

7 Jun 2003

      On Thu, Jun 05, 2003 at 08:09:02AM -0500, Stecher, Jack wrote:
...
I have an exceedingly simple problem to address, and am wondering if
there are relatively straightforward ways to improve the efficiency
of my solution.
Was there actually a problem with the efficiency of your first code?
...
The task is simply to look at a lengthy list of stock keeping units
(SKUs -- what retailers call individual items), stores, dates that a
promotion started, dates the promotion ended, and something like
sales amount; we want to pull out the records where promotions
overlap.  I will have dates in yyyymmdd format, so there's probably
no harm in treating them as Ints.
(Unless this is really a one-shot deal, I suspect using Ints for dates
is a bad decision...)
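
One possible reason, which is my reading rather than anything spelled out above: arithmetic on yyyymmdd Ints does not count days, although comparisons still work. A minimal sketch of the alternative, assuming the `time` package that ships with GHC; the helper name dayFromYmd is made up for illustration:

```haskell
import Data.Time.Calendar (Day, diffDays, fromGregorianValid)

-- Convert a yyyymmdd integer such as 20030605 into a calendar day.
-- Returns Nothing for impossible dates such as 20030231.
dayFromYmd :: Int -> Maybe Day
dayFromYmd n = fromGregorianValid (fromIntegral y) m d
  where
    (y, md) = n `divMod` 10000   -- year, and the remaining mmdd part
    (m, d)  = md `divMod` 100    -- month and day of month

-- Number of days from the first date to the second, if both are valid.
daysBetween :: Int -> Int -> Maybe Integer
daysBetween from to = diffDays <$> dayFromYmd to <*> dayFromYmd from
```

With this, daysBetween 20030228 20030301 counts one day, whereas subtracting the raw Ints gives 73.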
...
My suggestion went something like this (I'm not at my desk so I
don't have exactly what I typed):
I have a different algorithm, which should be nearly optimal, but I
find it harder to describe than to show the code (which is untested):
...
import Data.List (sortBy, insertBy)
-- One promotion record; dates are yyyymmdd Ints as in the question.
data PromotionRec = PR {sku :: String, store :: String, startDate :: Int, endDate :: Int, amount :: Float}
-- Orderings by start date and by end date.
compareStart, compareEnd :: PromotionRec -> PromotionRec -> Ordering
compareStart x y = compare (startDate x) (startDate y)
compareEnd x y = compare (endDate x) (endDate y)
...
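-- Each result is a promotion together with the earlier-starting
-- promotions still active when it starts; singletons are dropped.
overlap :: [PromotionRec] -> [[PromotionRec]]
overlap l = filter (\g -> length g > 1)
               (overlap' [] (sortBy compareStart l))
-- 'active' holds the promotions seen so far, sorted by end date.
overlap' :: [PromotionRec] -> [PromotionRec] -> [[PromotionRec]]
overlap' _ [] = []
overlap' active (x:xs) =
  let active' = dropWhile (\y -> endDate y < startDate x) active in
  (x:active') : overlap' (insertBy compareEnd x active') xs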
The key is that, by keeping the list of currently active promotions
sorted by ending date, we only need to discard an initial portion of
the list at each step.

You could get a moderately more efficient implementation by keeping
the active list as a heap rather than a list.
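
A rough sketch of that idea, untested against real data. It swaps in Data.Map.Strict from the `containers` package (keyed by end date) as a stand-in for a proper heap, and the name overlapMap is made up for illustration. Like the list version, it does not compare SKU or store, so it assumes the records have already been grouped by those:

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)
import qualified Data.Map.Strict as Map

data PromotionRec = PR
  { sku :: String, store :: String
  , startDate :: Int, endDate :: Int
  , amount :: Float
  } deriving (Show, Eq)

-- Active promotions grouped by end date; the smallest key ends soonest.
type Active = Map.Map Int [PromotionRec]

overlapMap :: [PromotionRec] -> [[PromotionRec]]
overlapMap = filter ((> 1) . length) . go Map.empty . sortBy (comparing startDate)
  where
    go :: Active -> [PromotionRec] -> [[PromotionRec]]
    go _ [] = []
    go active (x:xs) =
      let -- discard every promotion that ended before x starts
          active' = Map.dropWhileAntitone (< startDate x) active
          current = x : concat (Map.elems active')
      in current : go (Map.insertWith (++) (endDate x) [x] active') xs
```

Insertion and expiry become logarithmic in the number of active promotions; listing the overlapping records is still linear in their number, so the gain only matters when many promotions are active at once.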

Peace,
	Dylan

Dylan Thurston