Re: [Haskell-cafe] Efficient functional idiom for histogram

31 Jul 2009

      Am Freitag 31 Juli 2009 22:39:53 schrieb Paul Moore:
...
How would I efficiently write a function in Haskell to count
occurrences of unique elements in a (potentially very large) list? For
example, given the list [1,2,3,4,5,3,4,2,4] I would like the output
[[1,1], [2,2], [3,2], [4,3], [5,1]] (or some equivalent
representation).
Clearly, this won't be possible for infinite input, but I would like
it to be as efficient as possible for (lazy) input lists containing
many millions of elements. So an implementation based on group . sort
is not going to work.
In an imperative language like Python, I'd use a dictionary as an
accumulator - something like
for el in input:
        accums[i] = accums.get(i, 0) + 1
That's about as efficient as you can hope for (assuming an efficient
dictionary implementation). How would I code something equivalent in
Haskell?
If the elements come from a relatively small range and are suitable for array indexing,

import Data.Array.IArray
import Data.Array.Unboxed

accumArray :: (IArray a e, Ix i)
           => (e -> e' -> e)     -- ^ An accumulating function
           -> e                  -- ^ A default element
           -> (i,i)              -- ^ The bounds of the array
           -> [(i, e')]          -- ^ List of associations
           -> a i e              -- ^ Returns: the array

accumArray (+) 0 (mini,maxi) $ zip list (repeat 1)

is pretty fast (beats the hell out of Data.Map). If your elements can't be unboxed, the 
accumArray function from Data.Array does it too, albeit much slower (still faster than 
Data.Map, in my experience).
...
Thanks,
Paul.

Re: [Haskell-cafe] Efficient functional idiom for histogram

Daniel Fischer