
On Saturday 09 January 2010 08:04:20, Will Ness wrote:
Daniel Fischer writes:
On Friday 08 January 2010 19:45:47, Will Ness wrote:
Daniel Fischer writes:
It's not tail-recursive; the recursive call is inside a (<:).
It is (spMerge that is).
No. "In computer science, tail recursion (or tail-end recursion) is a special case of recursion in which the last operation of the function, the tail call, is a recursive call." The last operation of spMerge is a call to celebrate or the pair constructor (be that P or (,)). Doesn't matter, though, as for lazy languages, tail recursion isn't very important.
It calls the tail-recursive (<:) in a tail position. What you've done is to eliminate the outstanding context by moving it inward. Your detailed explanation is clearer than that. :)
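An aside for readers following along: in a lazy language, what buys constant-space operation is guarded recursion, producing a constructor before the recursive call, rather than tail recursion in the strict-language sense. A minimal illustration, with names that are mine and not from the thread:

-- Tail-recursive in the textbook sense: the last operation is the
-- recursive call. It cannot yield anything until the whole input has
-- been consumed, and without a strict accumulator it builds a thunk chain.
sumAcc :: Int -> [Int] -> Int
sumAcc acc []     = acc
sumAcc acc (x:xs) = sumAcc (acc + x) xs

-- Not tail-recursive: the recursive call sits under the (:) constructor.
-- Precisely because of that, consumers can demand one element at a time
-- in constant space, even from infinite input. This is the shape spMerge
-- has, with its recursive call guarded by consSP.
mapSucc :: [Int] -> [Int]
mapSucc []     = []
mapSucc (x:xs) = x + 1 : mapSucc xs

-- take 3 (mapSucc [1..]) == [2,3,4], despite the infinite input.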
BTW when I run VIP code it is consistently slower than using just pairs,
I can't reproduce that. Ceteris paribus, I get the exact same allocation and GC figures whether I use People or (,), and the running times are identical for all practical purposes (the difference between People and (,) is smaller than the difference between runs of the same program; the difference between the fastest and the slowest run of the two is less than 0.5%). I think it must be the other changes you made.
modified with wheel and feeder and all. So what's needed is to re-implement your approach for pairs:
mergeSP (a,b) ~(c,d) = let (bc,bd) = spMerge b c d in (a ++ bc, bd)
  where
    spMerge u [] d = ([], merge u d)
    spMerge u@(x:xs) w@(y:ys) d = case compare x y of
        LT -> consSP x $ spMerge xs w d
        EQ -> consSP x $ spMerge xs ys d
        GT -> consSP y $ spMerge u ys d

consSP x ~(a,b) = (x:a, b)   -- don't forget that magic `~` !!!
I called that (<:).
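That `~` is doing real work. Here is a minimal demonstration of why the irrefutable pattern is needed when the pair being matched is defined in terms of the result being constructed; consStrict and consLazy are hypothetical names, not from the thread:

-- Strict version: matching (a,b) forces the argument to WHNF first.
consStrict :: a -> ([a],[a]) -> ([a],[a])
consStrict x (a,b) = (x:a, b)

-- Lazy version: ~(a,b) defers the match until a or b is demanded,
-- so the (:) cell can be produced immediately.
consLazy :: a -> ([a],[a]) -> ([a],[a])
consLazy x ~(a,b) = (x:a, b)

-- Knot-tied values, analogous to spMerge consuming its own output:
onesGood, onesBad :: ([Int],[Int])
onesGood = consLazy 1 onesGood    -- take 3 (fst onesGood) == [1,1,1]
onesBad  = consStrict 1 onesBad   -- forcing this diverges (<<loop>> in GHC)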
BTW I'm able to eliminate sharing without a compiler switch by using a unit argument (primes ()).
Yes, I can too. But it's easy to make a false step and trigger sharing. I can get a nice speedup (~15%, mostly due to much less garbage collecting) by doing the final merge in a function without unnecessarily wrapping the result in a pair (whose second component is ignored):

-- Doesn't need -fno-cse anymore,
-- but it needs -XScopedTypeVariables for the local type signatures
primes :: forall a. Integral a => () -> [a]
primes () = 2:3:5:7:11:13:calcPrimes 17 primes''
  where
    calcPrimes s cs = rollFrom s `minus` compos cs
    bootstrap = 17:19:23:29:31:37:calcPrimes 41 bootstrap
    primes'   = calcPrimes 17 bootstrap
    primes''  = calcPrimes 17 primes'

    pmults :: a -> ([a],[a])
    pmults p = case map (*p) (rollFrom p) of
                 (x:xs) -> ([x],xs)

    multip :: [a] -> [([a],[a])]
    multip ps = map pmults ps

    compos :: [a] -> [a]
    compos ps = case pairwise mergeSP (multip ps) of
                  ((a,b):cs) -> a ++ funMerge b (pairwise mergeSP cs)

    funMerge b (x:y:zs) = let (c,d) = mergeSP x y
                          in mfun b c d (pairwise mergeSP zs)

    mfun u@(x:xs) w@(y:ys) d l = case compare x y of
        LT -> x : mfun xs w d l
        EQ -> x : mfun xs ys d l
        GT -> y : mfun u ys d l
    mfun u [] d l = funMerge (merge u d) l

This uses a different folding structure again, which seems to give slightly better performance than the original tree-fold structure. In contrast to the VippyPrimes, it profits much from a larger allocation area: running with +RTS -A2M gives a >10% speedup for prime #10M/20M, +RTS -A8M nearly 20%. -A16M and -A32M buy a little more, but in that range at least, it's not much (it may be significant for larger targets). Still way slower than PQ, but the gap is narrowing.
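The code above assumes the helpers used throughout the thread. For reference, here are minimal definitions of pairwise, merge and minus consistent with how they are used (a stand-in for rollFrom is sketched further below); treat these as a reconstruction, not necessarily the exact code from earlier messages:

-- Merge adjacent elements pairwise; applied repeatedly by compos and
-- funMerge above, this produces the folding structure being described.
pairwise :: (a -> a -> a) -> [a] -> [a]
pairwise f (x:y:ys) = f x y : pairwise f ys
pairwise _ xs       = xs

-- Ordered union of two increasing lists, dropping cross-duplicates.
merge :: Ord a => [a] -> [a] -> [a]
merge xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : merge xt ys
    EQ -> x : merge xt yt
    GT -> y : merge xs yt
merge xs ys = xs ++ ys

-- Ordered difference of two increasing lists.
minus :: Ord a => [a] -> [a] -> [a]
minus xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : minus xt ys
    EQ ->     minus xt yt
    GT ->     minus xs yt
minus xs _ = xs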
mtwprimes () = 2:3:5:7:primes
  where primes = doPrimes 121 primes

doPrimes n prs = let (h,t) = span (< n) $ rollFrom 11
                 in h ++ t `diff` comps prs
doPrimes2 n prs = let (h,t) = span (< n) $ rollFrom (12-1)
                  in h ++ t `diff` comps prs

mtw2primes () = 2:3:5:7:primes
  where primes  = doPrimes 26 primes2
        primes2 = doPrimes2 121 primes2
Using 'splitAt 26' in place of 'span (< 121)' didn't work though.
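For completeness: these definitions lean on rollFrom, the 2-3-5-7 wheel from earlier in the thread (diff and comps play the roles of minus and compos above). Here is a deliberately naive, semantically equivalent stand-in, useful for testing; the real rollFrom rolls the 48 gaps of the 210-wheel instead of trial-dividing:

-- Numbers >= n that are coprime to 2, 3, 5 and 7. For n = 11 this
-- enumerates exactly the candidates the 2-3-5-7 wheel produces.
rollFrom :: Integral a => a -> [a]
rollFrom n = [ k | k <- [n..], all (\p -> k `mod` p /= 0) [2,3,5,7] ]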
How about them wheels? :)
Well, what about them?