
Dear Malcolm, Ian, et al. On Thu, 2009-11-19 at 04:14 +0000, Malcolm Wallace wrote:
I suppose the only convincing argument is empirical.
Hear, hear! Although... the empirical argument changes with the versions of the compiler. Anyone up for an evaluation of alternative implementations of all functions in base? :p
Using the following simple and unscientific benchmark, it turns out that take +drop is ~ 2x faster than splitAt. Maybe list fusion or something is kicking in.
Is it possible that the tuple wrapping and unwrapping in the splitAt-implementation hurts optimization? On Wed, 2009-11-18 at 14:47 +0000, Ian Lynagh wrote:
My breaks has generally been such that breaks "123,456,,78" == ["123", "456", "", "78"] but the details probably depend on exactly what I've been using it for.
I don't remember ever needing yours. I'd have thought that breaks :: (a -> Bool) -> [a] -> [([a], [a])] would make more sense, but personally I'd vote for not adding a breaks at all.
The more I looked at the spans/breaks, the more I figured that didn't quite cover the majority of cases for which I hack in extra functionality. I think I have it down to the bare bones of what I was missing; there's no function in Data.List to segment a list based on sequence properties. In other words, there is no way to extract runs (or clumps, if you prefer). An alternative suggestion, thus: runs :: (a -> a -> Bool) -> [a] -> [[a]] runs p xs = ... which produces a list of runs, i.e. the first result is that prefix of xs, such that for all consecutive elements e_i, e_{i+1}, the property holds, i.e. p e_i e_{i+1} -->> True. Although not exactly equivalent, spans' can be implemented as: spans' p = runs (\x y -> p x == p y) the difference being the first span:
spans odd [2,3,4,5,7,9,8,0,3,5,9] [[],[2],[3],[4],[5,7,9],[8,0],[3,5,9]] spans' odd [2,3,4,5,7,9,8,0,3,5,9] [[2],[3],[4],[5,7,9],[8,0],[3,5,9]]
This difference is of no consequence to the types of programs I used spans in. This new implementation makes spans so simple that inclusion in Data.List is no longer necessary. So my new proposal would be to include groupsOf and runs. Now for the empirical stuff. I have two implementations: runs :: (a -> a -> Bool) -> [a] -> [[a]] runs _ [ ] = [] runs _ [x] = [[x]] runs p (x:xs) = r : runs p xs' where (r,xs') = run x xs cons' x (xs,y) = (x:xs,y) run y [] = ([y],[]) run y l@(x:xs) | p y x = cons' y $ run x xs | otherwise = ([y],l) runsAlt :: (a -> a -> Bool) -> [a] -> [[a]] runsAlt _ [] = [[]] runsAlt _ xs@[x] = [xs] runsAlt p (x:xs@(y:_)) | p x y = (x : head xs') : tail xs' | otherwise = [x]:xs' where xs' = runsAlt p xs (I welcome suggestions for improvements.) Used in the program: bigList = concat $ replicate 10000000 [5,6,9,1,3,4,2] main = print (length (runs (>) bigList)) compiled without and with -O2: runs : 5.40s user 0.03s system 99% cpu 5.438 total runsOpt : 4.40s user 0.03s system 99% cpu 4.440 total runsAlt : 4.89s user 0.04s system 99% cpu 4.934 total runsAltOpt : 4.14s user 0.03s system 99% cpu 4.207 total We have a winner ;) Regards, Philip