
Am Freitag 31 Juli 2009 22:39:53 schrieb Paul Moore:
How would I efficiently write a function in Haskell to count occurrences of unique elements in a (potentially very large) list? For example, given the list [1,2,3,4,5,3,4,2,4] I would like the output [[1,1], [2,2], [3,2], [4,3], [5,1]] (or some equivalent representation).
Clearly, this won't be possible for infinite input, but I would like it to be as efficient as possible for (lazy) input lists containing many millions of elements. So an implementation based on group . sort is not going to work.
In an imperative language like Python, I'd use a dictionary as an accumulator - something like
for el in input: accums[i] = accums.get(i, 0) + 1
That's about as efficient as you can hope for (assuming an efficient dictionary implementation). How would I code something equivalent in Haskell?
If the elements come from a relatively small range and are suitable for array indexing, import Data.Array.IArray import Data.Array.Unboxed accumArray :: (IArray a e, Ix i) => (e -> e' -> e) -- ^ An accumulating function -> e -- ^ A default element -> (i,i) -- ^ The bounds of the array -> [(i, e')] -- ^ List of associations -> a i e -- ^ Returns: the array accumArray (+) 0 (mini,maxi) $ zip list (repeat 1) is pretty fast (beats the hell out of Data.Map). If your elements can't be unboxed, the accumArray function from Data.Array does it too, albeit much slower (still faster than Data.Map, in my experience).
Thanks, Paul.