
Hi again folks, I am still at it with my time-series problem, for those who haven't been following; I have a list of (time stamp, value) pairs and I need to do various bits and bobs with them. I have got arithmetic down pat now, thanks to the kind help of various members of the list - now I am looking at functions that look at some historical data in the time-series and do some work on that to give me an answer for a particular day. I have chosen to represent my time series in reverse date order, since non of the operations will ever want to look into the future, but often they would like to look in to the past. A function I would like to write is 'avg'. For a particular day, it computes the average of the values last 'n' points; if there are not n points to fetch, thee is no answer. I then combine those to make a new time series. e.g. If my input time series was [(5,10),(4,20),(3,30),(2,40), (1,50)] (Where 5, 4, 3, 2, 1 are timestamps and 10, 20, 30, 50, 50 are values) I would like the answer [(5,20), (4,30), (3,40)] (e.g. 20 = (10+20+30)/3 etc.. I can't get an answer for timestamps 2 and 1 because there isn't enough historical data) So I have written some code to do this, and it works nicely enough; but it is _slow_. To do 1000 averages of different lengths on a series with only 3000 points takes about 200 seconds on my (not overly shabby) laptop. The equivalent C program takes under a second. I am entirely sure that this is due to some failing on my part. I have been mucking around with the profiler all afternoon lazifying and delazifying various bits and bobs with no dramatic success so I thought I might put it to y'all if you don't mind! So here's some code. I've kept it quite general because there are a lot of functions I would like to implement that do similar things with bits of historical data. General comments on the Haskellyness/goodness of my code are welcomed as well, I'm still very much a beginner at this! --------- SNIP -------------- -- Take n elements from a list if at least n exist takeMaybe n l | length l < n = Nothing | otherwise = Just $! (take n l) -- Little utility function, take a function f and apply it to the whole list, -- then the tail etc... lMap _ [] = [] lMap f (x:xs) = (f (x:xs)):(lMap f xs) -- Little utility function to take a list containing Maybes and delete them -- Returning a list with the values inside the Just maybeListToList [] = [] maybeListToList (x:xs) = maybe (maybeListToList xs) (\y -> y:(maybeListToList xs)) x -- Return a list of lists, where each sublist is a list of the next n values histMaybe x = lMap (takeMaybe x) hist n x = maybeListToList $ histMaybe n x -- Take a function which works on a list of things and apply it only to a -- list of the second elements in a list of tuples 'l'. applyToValues f l = let (ts,vs) = unzip l in zip ts $ f vs -- Create a timeseries with the cumulative sum of the last n values cumL n l = map sum (hist n l) cum = applyToValues . cumL -- Creates a timeseries with the average of the last n values avgL n l = map ((*) (1/fromIntegral(n))) $ cumL n l avg = applyToValues . avgL --------- SNIP -------------- According to the profiler (log attached), the vast majority of the time is spent in takeMaybe, presumably allocating and deallocating enormous amounts of memory for each of my little temporary sublists. I have tried liberally sprinkling $! and 'seq' about, thinking that might help but I am clearly not doing it right. Perhaps list is the wrong basic data structure for what I am doing? I hope I didn't bore you with that rather long email, I will leave it at that. If it would be useful, I could give you the complete program with a data set if anyone is keen enough to try for themselves. Thanks, Philip