Efficient functional idiom for histogram

older
Basic questions about concurrency...

Paul Moore

31 Jul 2009 31 Jul '09

8:39 p.m.

How would I efficiently write a function in Haskell to count occurrences of unique elements in a (potentially very large) list? For example, given the list [1,2,3,4,5,3,4,2,4] I would like the output [[1,1], [2,2], [3,2], [4,3], [5,1]] (or some equivalent representation). Clearly, this won't be possible for infinite input, but I would like it to be as efficient as possible for (lazy) input lists containing many millions of elements. So an implementation based on group . sort is not going to work. In an imperative language like Python, I'd use a dictionary as an accumulator - something like for el in input: accums[i] = accums.get(i, 0) + 1 That's about as efficient as you can hope for (assuming an efficient dictionary implementation). How would I code something equivalent in Haskell? Thanks, Paul.

Show replies by date

Gregory Collins

31 Jul 31 Jul

8:52 p.m.

Paul Moore writes:

...

How would I efficiently write a function in Haskell to count occurrences of unique elements in a (potentially very large) list? For example, given the list [1,2,3,4,5,3,4,2,4] I would like the output [[1,1], [2,2], [3,2], [4,3], [5,1]] (or some equivalent representation).

import qualified Data.Map as Map import Data.Map (Map) histogram :: Ord a => [a] -> [(a,Int)] histogram = Map.assocs . foldl f Map.empty where f m k = Map.insertWith (+) k 1 m G. -- Gregory Collins

Gregory Collins

8:53 p.m.

Gregory Collins writes:

...

Paul Moore writes:

...
How would I efficiently write a function in Haskell to count occurrences of unique elements in a (potentially very large) list? For example, given the list [1,2,3,4,5,3,4,2,4] I would like the output [[1,1], [2,2], [3,2], [4,3], [5,1]] (or some equivalent representation).

import qualified Data.Map as Map import Data.Map (Map)

histogram :: Ord a => [a] -> [(a,Int)] histogram = Map.assocs . foldl f Map.empty where f m k = Map.insertWith (+) k 1 m

Here "foldl" should be "foldl'" (d'oh!) G -- Gregory Collins

Paul Moore

10:39 p.m.

2009/7/31 Gregory Collins :

...

Paul Moore writes:

...
How would I efficiently write a function in Haskell to count occurrences of unique elements in a (potentially very large) list? For example, given the list [1,2,3,4,5,3,4,2,4] I would like the output [[1,1], [2,2], [3,2], [4,3], [5,1]] (or some equivalent representation).

import qualified Data.Map as Map import Data.Map (Map)

histogram :: Ord a => [a] -> [(a,Int)] histogram = Map.assocs . foldl f Map.empty where f m k = Map.insertWith (+) k 1 m

Right. I see how that works, and can work out how to think about this sort of thing from your example. Thanks very much. BTW, I did know that Haskell had an efficient map implementation, I just wasn't sure how to use it "functionally" - I probably should have searched a bit harder for examples before posting. Thanks for the help in any case. Paul.

Deniz Dogan

11:54 p.m.

2009/8/1 Paul Moore :

...

BTW, I did know that Haskell had an efficient map implementation, I just wasn't sure how to use it "functionally" - I probably should have searched a bit harder for examples before posting. Thanks for the help in any case.

Know that Data.Map uses size balanced trees and is not e.g. a hash map. -- Deniz Dogan

Paul Moore

1 Aug 1 Aug

1:44 p.m.

2009/7/31 Paul Moore :

...

2009/7/31 Gregory Collins :

...
Paul Moore writes:

...
How would I efficiently write a function in Haskell to count occurrences of unique elements in a (potentially very large) list? For example, given the list [1,2,3,4,5,3,4,2,4] I would like the output [[1,1], [2,2], [3,2], [4,3], [5,1]] (or some equivalent representation).

import qualified Data.Map as Map import Data.Map (Map)

histogram :: Ord a => [a] -> [(a,Int)] histogram = Map.assocs . foldl f Map.empty where f m k = Map.insertWith (+) k 1 m

Right. I see how that works, and can work out how to think about this sort of thing from your example. Thanks very much.

BTW, I did know that Haskell had an efficient map implementation, I just wasn't sure how to use it "functionally" - I probably should have searched a bit harder for examples before posting. Thanks for the help in any case.

Hmm, I'm obviously still mucking up the performance somehow. My full program (still a toy, but a step on the way to what I'm aiming at) is as follows. It's rolling 3 6-sided dice 100000 times, and printing a summary of the results. import System.Random import qualified Data.Map as Map import Data.Map (Map) import Data.List dice :: Int -> Int -> IO Int dice 0 n = return 0 dice m n = do total <- dice (m - 1) n roll <- randomRIO (1, n) return (total + roll) simulate count m n = do mapM (dice m) (replicate count n) histogram :: Ord a => [a] -> [(a,Int)] histogram = Map.assocs . foldl f Map.empty where f m k = Map.insertWith (+) k 1 m simulation = do lst <- simulate 100000 3 6 return (histogram lst) main = do s <- simulation putStrLn (show s) When compiled, this takes over twice as long as a naively implemented Python program. What am I doing wrong here? I'd have expected compiled Haskell to be faster than interpreted Python, so obviously my approach is wrong. I'm expecting the answer to be that I've got unnecessary laziness - which is fine, but ultimately my interest is in ease of expression and performance combined, so I'm looking for beginner-level improvements rather than subtle advanced techniques like unboxing. Thanks, Paul. PS I know my code is probably fairly clumsy - I'd appreciate style suggestions, but my main interest here is whether a beginner, with a broad programming background, a basic understanding of Haskell, and access to Google, put together a clear, efficient, program (ie, the case where my usual scripting language is too slow and I want to knock something up quickly in a high-level, high-performance language).

Pekka Karjalainen

2:58 p.m.

On Sat, Aug 1, 2009 at 4:44 PM, Paul Moore wrote:

...

PS I know my code is probably fairly clumsy - I'd appreciate style suggestions, but my main interest here is whether a beginner, with a broad programming background, a basic understanding of Haskell, and access to Google, put together a clear, efficient, program (ie, the case where my usual scripting language is too slow and I want to knock something up quickly in a high-level, high-performance language).

Here is one way to rewrite your program. It improved the speed somewhat for me. I timed both programs on my computer. I suppose one could try using an array for calculating the histogram as well, but I only tried the simples thing here. I hope someone can weigh in with a more thorough analysis. Please note how I've avoided including the IO Monad in some type signatures by extracting the data from it locally (with <-). It is quite possible to apply the histogram function to the data before going through the IO Monad as well, but it doesn't appear to change the execution speed much here. Caveat: My testing wasn't extensive. I just compiled with -O and timed the programs a couple of times. import System.Random import qualified Data.Map as Map import Data.Map (Map) import Data.List diceRolls :: Int -> IO [Int] diceRolls highVal = do generator <- getStdGen return (randomRs (1, highVal) generator) groupDice :: Int -> [Int] -> [[Int]] groupDice chunk rolls = map (take chunk) $ iterate (drop chunk) rolls simulate :: Int -> Int -> Int -> IO [Int] simulate count m n = do rolls <- diceRolls n let sums = map sum $ groupDice m rolls return (take count sums) histogram :: Ord a => [a] -> [(a,Int)] histogram = Map.assocs . foldl f Map.empty where f m k = Map.insertWith (+) k 1 m simulation = do lst <- simulate 100000 3 6 return (histogram $ lst) main = do s <- simulation putStrLn (show s)

Ketil Malde

5:36 p.m.

Paul Moore writes:

...

What am I doing wrong here? I'd have expected compiled Haskell to be faster than interpreted Python, so obviously my approach is wrong.

Two things from my experience: Python associative arrays are fast, and Haskell random numbers are slow. So by building a map from random numbers, you are likely doing a fairly pessimal comparison. Did you profile to see where the time is spent?

...

I'm expecting the answer to be that I've got unnecessary laziness

IME, laziness usually affects space, but not so much time performance. Although 'insertWith (+)' looks like it would build unevaluated thunks for the sums. -k -- If I haven't seen further, it is by standing in the footprints of giants

Paul Moore

6:06 p.m.

2009/8/1 Ketil Malde :

...

Paul Moore writes:

...
What am I doing wrong here? I'd have expected compiled Haskell to be faster than interpreted Python, so obviously my approach is wrong.

Two things from my experience: Python associative arrays are fast, and Haskell random numbers are slow. So by building a map from random numbers, you are likely doing a fairly pessimal comparison.

Hmm, I think you're saying that for this problem, Haskell is likely to be worse than Python. Interesting. I guess I'd assumed that something CPU-intensive like this would benefit from a compiled language. Just goes to show that intuition about performance is usually wrong (something I really should know by now :-)) Is the issue with random numbers just the implementation, or is it something inherent in the non-pure nature of a random number generator that makes it hard for Haskell to implement efficiently? If the latter, that probably makes Haskell a somewhat poor choice for simulation-type programs.

...

Did you profile to see where the time is spent?

Not until you suggested it :-) I was pleasantly surprised by how easy it was to do. The results bear out what you were saying, the bottleneck is entirely in the "dice" function.

...

...
I'm expecting the answer to be that I've got unnecessary laziness

IME, laziness usually affects space, but not so much time performance. Although 'insertWith (+)' looks like it would build unevaluated thunks for the sums.

That'll teach me to bandy about technical terms as if I understand them :-) Thanks for the comments, Paul.

Sterling Clover

6:50 p.m.

On Aug 1, 2009, at 2:06 PM, Paul Moore wrote:

...

Is the issue with random numbers just the implementation, or is it something inherent in the non-pure nature of a random number generator that makes it hard for Haskell to implement efficiently? If the latter, that probably makes Haskell a somewhat poor choice for simulation-type programs.

Well, I'm not sure of the details, but in your original implementation, you're performing IO to pull the seed out of a ref at every iteration. Pekka Karjalainen's doesn't do that, which probably helps with the speedup. Along with that, Haskell has a fairly slow random implementation. As I recall however, this is partially because it hasn't received a great deal of optimization, but mainly because the algorithm itself fulfills some rather strong properties -- in particular it must be fairly statistically robust, and it must provide a "split" function which produces generators that are independently robust [1]. This limits algorithm choice quite a bit. For other random numbers, with different properties (faster, but with tradeoffs in robustness, or ability to split, or both), you can check hackage for at least mersenne-random and gsl-random. There may be others that I don't recall. Cheers, Sterl. [1] http://www.haskell.org/onlinereport/random.html

Anton van Straaten

7:40 p.m.

Sterling Clover wrote:

...

For other random numbers, with different properties (faster, but with tradeoffs in robustness, or ability to split, or both), you can check hackage for at least mersenne-random and gsl-random.

gsl-random is definitely worth a try for performance. I recently did a rough benchmark (not rigorous) of Control.Monad.Random, which uses System.Random, and Control.Monad.MC, which uses GSL.Random. Control.Monad.Random took about 3.6 times longer to generate a few million random numbers compared to Control.Monad.MC, and used five times as much heap. The GSL-based version generated about 340,000 random numbers per second on a 1.8GHz notebook. Anton

Daniel Fischer

8:38 p.m.

Am Samstag 01 August 2009 20:50:10 schrieb Sterling Clover:

...

On Aug 1, 2009, at 2:06 PM, Paul Moore wrote:

...
Is the issue with random numbers just the implementation, or is it something inherent in the non-pure nature of a random number generator that makes it hard for Haskell to implement efficiently? If the latter, that probably makes Haskell a somewhat poor choice for simulation-type programs.

If you view a PRNG as a function from the seed to the sequence of generated numbers or as a function state -> (bitpattern, newstate), PRNGs are pure (at least, I know no counterexample), so it's not inherently inefficient in Haskell, though it's probably still faster in C. One thing that makes StdGen slow is splittability, as Sterling points out below. For a simulation programme where you don't need splittability, choose a different PRNG.

...

Well, I'm not sure of the details, but in your original implementation, you're performing IO to pull the seed out of a ref at every iteration. Pekka Karjalainen's doesn't do that, which probably helps with the speedup. Along with that, Haskell has a fairly slow random implementation. As I recall however, this is partially because it hasn't received a great deal of optimization, but mainly because the algorithm itself fulfills some rather strong properties -- in particular it must be fairly statistically robust, and it must provide a "split" function which produces generators that are independently robust [1]. This limits algorithm choice quite a bit.

For other random numbers, with different properties (faster, but with tradeoffs in robustness, or ability to split, or both), you can check hackage for at least mersenne-random and gsl-random.

I didn't get much speedup with System.Random.Mersenne.Pure64 (might be because I have a 32-bit system and Word64 goes through foreign calls, if that is still the case), but GSL.Random.Gen reduced the time by a factor of over 5. It forces you back into IO and is a little less convenient, but if speed is a concern, it's a price worth to pay. --------------------------------------- module Main (main) where import GSL.Random.Gen import qualified Data.Map as Map import Data.Map (Map) import Data.List import System.IO.Unsafe import System.Time import Data.Word dice :: RNG -> Int -> Int -> IO Int dice _ 0 n = return 0 dice rng m n = do total <- dice rng (m - 1) n roll <- fmap (+1) $ getUniformInt rng n return (total + roll) simulate _ 0 _ _ = return [] simulate rng count m n = unsafeInterleaveIO $ do val <- dice rng m n tl <- simulate rng (count-1) m n return (val:tl) histogram :: Ord a => [a] -> [(a,Int)] histogram = Map.assocs . foldl' f Map.empty where f m k = Map.insertWith' (+) k 1 m simulation rng = do lst <- simulate rng 1000000 3 6 return (histogram lst) main = do rng <- newRNG mt19937 sd <- getTimeSeed setSeed rng sd -- omit seeding for reproducible results s <- simulation rng putStrLn (show s) getTimeSeed :: IO Word64 getTimeSeed = do TOD a b <- getClockTime return . fromInteger $ 10^6*a + b `quot` (10^6) --------------------------------------------------

...

There may be others that I don't recall.

Cheers, Sterl.

Dmitry Olshansky

5 Aug 5 Aug

1:14 p.m.

My measurements show that - using strict version of insertWith doesn't improve performance. - in case of compilation with -O2 flag foldl' also is equal to foldl (-O2 gives approx 2 time impovements).- using RandomGen and State monad to generate a list gives at least 4 times improvements (on 1 000 000 items). More complicated improvements (using Array, PRNG and so on) were not tested by me.

Yitzchak Gale

3:01 p.m.

Dmitry Olshansky wrote:

...

My measurements show that... (-O2 gives approx 2 time impovements). ...using RandomGen and State monad to generate a list gives at least 4 times improvements (on 1 000 000 items).

You earlier said:

...

this takes over twice as long as a naively implemented Python program

So now our "naive" Haskell - ordinary usage of Data.Map and System.Random, without resorting to things like unboxed arrays - is beating naive Python handily. Correct? Or is this with an alternate RNG? Although I think even that would be fair, since Python uses Mersenne. Regards, Yitz

Paul Moore

3:42 p.m.

2009/8/5 Yitzchak Gale :

...

Dmitry Olshansky wrote:

...
My measurements show that... (-O2 gives approx 2 time impovements). ...using RandomGen and State monad to generate a list gives at least 4 times improvements (on 1 000 000 items).

You earlier said:

...
this takes over twice as long as a naively implemented Python program

The latter was me :-)

...

So now our "naive" Haskell - ordinary usage of Data.Map and System.Random, without resorting to things like unboxed arrays - is beating naive Python handily. Correct?

I haven't checked myself (and won't have time in the next couple of weeks, as I'm on holiday - but I'll pick this up when I get back)., but it sounds like it. I'd like to check Dmitry's suggestions, mainly to see how they fit with my feel for "naive" (ie, at my beginner level, do I understand why they are more efficient). But I'd expect (compiled) Haskell to beat (interpreted) Python. That's sort of the point, really... The big measures for me are (1) by how much, and (2) how readable is the code.

...

Or is this with an alternate RNG? Although I think even that would be fair, since Python uses Mersenne.

I got the impression Dmitry was using Haskell's standard RNG, not Mersenne Twister. If so, then we'd get further improvements with MT, but that's still a hit against Haskell, as I'd interpret it as meaning that Haskell supplies as default a PRNG which costs noticeable performance in order to provide guarantees that "ordinary" programs don't need. Paul.

Dmitry Olshansky

7:11 p.m.

...

I got the impression Dmitry was using Haskell's standard RNG, not Mersenne Twister. If so, then we'd get further improvements with MT, but that's still a hit against Haskell, as I'd interpret it as meaning that Haskell supplies as default a PRNG which costs noticeable performance in order to provide guarantees that "ordinary" programs don't need.

Paul.

Yes, I used standard RNG just made corrections from the second Daniel Fischer's post. My file is attached. Commented strings are from original version and next lines are from Daniel. (Note that "simulation" and "simulate" types depend on type of "dice"). Just compile with ghc --make -O2 histogram.hs and run histogram +RTS -s As a pre-intermediate ;-) I just check some proposals...

wren ng thornton

11:57 p.m.

Paul Moore wrote:

...

2009/8/5 Yitzchak Gale :

...
Or is this with an alternate RNG? Although I think even that would be fair, since Python uses Mersenne.

I got the impression Dmitry was using Haskell's standard RNG, not Mersenne Twister. If so, then we'd get further improvements with MT, but that's still a hit against Haskell, as I'd interpret it as meaning that Haskell supplies as default a PRNG which costs noticeable performance in order to provide guarantees that "ordinary" programs don't need.

I'm not sure how fair that is. For most "ordinary" programs I've written that use randomness anywhere, they have many different places that want randomness. Having a single global seed can suffice for this, but it introduces severe constraints that are seldom noticed. Namely, these are in fact *P*RNGs. If every place needing randomness is given its own seed, then it becomes possible to store all those seeds, allowing for replay. Replay can be good for capturing the history of a particular run of a simulation/game ---which is difficult at best in non-Haskell approaches. And more critically the seeds can be stored in a "core dump" allowing replay as a debugging tool, allowing reproduction of 'random' bugs which are otherwise quite difficult to track down. Imperative languages could use the same algorithms as Haskell, but they do not; which means these benefits of PRNGs are seldom even thought of. By separating concerns, the Haskell approach not only leads to cleaner code, but that separation can also be used to add new capabilities. The invisible manacles of imperativism should not be looked upon lightly. -- Live well, ~wren

Malcolm Wallace

1 Aug 1 Aug

6:14 p.m.

...

I'm expecting the answer to be that I've got unnecessary laziness - which is fine, but ultimately my interest is in ease of expression and performance combined, so I'm looking for beginner-level improvements rather than subtle advanced techniques like unboxing.

You're right, it is too lazy. Here are a couple of strictifications that should help:

...

histogram :: Ord a => [a] -> [(a,Int)] histogram = Map.assocs . foldl f Map.empty where f m k = Map.insertWith (+) k 1 m

Turn foldl into foldl' (from Data.List) and Map.insertWith into Map.insertWith'. The strict versions simply force the intermediate structures to be evaluated, rather than hanging around as large accumulations of closures. Regards, Malcolm

Daniel Fischer

6:31 p.m.

Am Samstag 01 August 2009 15:44:39 schrieb Paul Moore:

...

2009/7/31 Paul Moore :

...
2009/7/31 Gregory Collins :

Hmm, I'm obviously still mucking up the performance somehow. My full program (still a toy, but a step on the way to what I'm aiming at) is as follows. It's rolling 3 6-sided dice 100000 times, and printing a summary of the results.

import System.Random import qualified Data.Map as Map import Data.Map (Map) import Data.List

dice :: Int -> Int -> IO Int dice 0 n = return 0 dice m n = do total <- dice (m - 1) n roll <- randomRIO (1, n) return (total + roll)

Don't do too much in IO, it's better to separate the pure parts from the IO. IMO, this would better have signature dice :: RandomGen g => Int -> Int -> g -> (Int,g) dice 0 _ g = (0,g) dice m n g = case dice (m-1) n g of (total,g1) -> case randomR (1,n) g1 of (roll,g2) -> (total+roll,g2) or, better still be in a State monad or the Random monad ( http://hackage.haskell.org/package/MonadRandom ) die :: RandomGen g => Int -> State g Int die n = State $ randomR (1,n) dice :: RandomGen g => Int -> Int -> State g Int dice m n = liftM sum $ replicateM m (die n)

...

-- the "do" is superfluous

...

simulate count m n = do mapM (dice m) (replicate count n)

Ouch, that hurts (not yet so incredibly much for 100000 rolls, but if you try 1000000, it'll really hurt). Since you're doing it in IO, the whole list must be built before any further processing can begin, so you're building up a largish list, only to destroy it immediately afterwards, much work for the garbage collector. If you can accumulate the scores as they come, the intermediate list can be fused away and the garbage collector is kept idle. If you absolutely have to do it in IO, use import System.IO.Unsafe simulate 0 _ _ = return [] simulate count m n = unsafeInterleaveIO $ do val <- dice m n rst <- simulate (count-1) m n return (val:rst) to avoid building a large list. If you use the (lazy) State monad, that's automatically done :). simulate count m n = replicateM count (dice m n) -- now in State histogram :: Ord a => [a] -> [(a,Int)] histogram = Map.assocs . foldl f Map.empty where f m k = Map.insertWith (+) k 1 m -- simulation :: RandomGen g => State g [(Int,Int)] simulation = do lst <- simulate 1000000 3 6 return (histogram lst) main = do sg <- getStdGen print $ evalState simulation sg much faster, still not very fast, since StdGen isn't a particularly fast PRNG. Another method is to create an infinite list of random numbers and use it as needed: ------------------------------------------------------- module Main (main) where import System.Random import Data.Array.Unboxed import Data.List import System.Environment (getArgs) dice :: RandomGen g => g -> Int -> [Int] dice g mx = randomRs (1,mx) g splits :: Int -> [a] -> [[a]] splits l = unfoldr f where f xs = case splitAt l xs of r@(h,t) | null t -> Nothing | otherwise -> Just r simulation :: RandomGen g => g -> Int -> Int -> Int -> UArray Int Int simulation g rep dn df = accumArray (+) 0 (dn,dn*df) lst where rls = dice g df scs = splits dn rls lst = take rep [(sum rll,1) | rll <- scs] main :: IO () main = do (rp:dn:df:_) <- getArgs sg <- getStdGen print $ assocs $ simulation sg (read rp) (read dn) (read df) ------------------------------------------------------------- Using an unboxed array instead of a Map gives a little extra speed, but not much.

...

histogram :: Ord a => [a] -> [(a,Int)] histogram = Map.assocs . foldl f Map.empty where f m k = Map.insertWith (+) k 1 m

For some reason it doesn't make much difference here, but it should be the strict versions, foldl' and insertWith' in general.

...

simulation = do lst <- simulate 100000 3 6 return (histogram lst)

main = do s <- simulation putStrLn (show s)

When compiled, this takes over twice as long as a naively implemented Python program.

What am I doing wrong here? I'd have expected compiled Haskell to be faster than interpreted Python, so obviously my approach is wrong. I'm expecting the answer to be that I've got unnecessary laziness

Quite on the contrary, it's unnecessary strictness here :D

...

- which is fine, but ultimately my interest is in ease of expression and performance combined, so I'm looking for beginner-level improvements rather than subtle advanced techniques like unboxing.

Nothing advanced with using unboxed arrays.

...

Thanks, Paul.

PS I know my code is probably fairly clumsy

Actually, the style is rather good, I think (mine's worse, usually). You shouldn't use IO so much, though, and your code betrays a certain level of unfamiliarity with strictness/performance characteristics of the libraries. But that's natural.

...

- I'd appreciate style suggestions, but my main interest here is whether a beginner, with a broad programming background, a basic understanding of Haskell, and access to Google, put together a clear, efficient, program (ie, the case where my usual scripting language is too slow and I want to knock something up quickly in a high-level, high-performance language).

Performance is a nontrivial thing, it takes some experience to know which data structures to use when. And, as said above, Haskell's StdGen isn't fast, the above programme spends about 90% of the time creating pseudo random numbers.

Malcolm Wallace

31 Jul 31 Jul

8:54 p.m.

...

In an imperative language like Python, I'd use a dictionary as an accumulator - something like

for el in input: accums[i] = accums.get(i, 0) + 1

Haskell has efficient dictionary structures too, e.g. Data.Map List.foldl' (\m x-> Map.insertWith' (+) x 1 m) Map.empty Regards, Malcolm

Daniel Fischer

9:14 p.m.

Am Freitag 31 Juli 2009 22:39:53 schrieb Paul Moore:

...

How would I efficiently write a function in Haskell to count occurrences of unique elements in a (potentially very large) list? For example, given the list [1,2,3,4,5,3,4,2,4] I would like the output [[1,1], [2,2], [3,2], [4,3], [5,1]] (or some equivalent representation).

Clearly, this won't be possible for infinite input, but I would like it to be as efficient as possible for (lazy) input lists containing many millions of elements. So an implementation based on group . sort is not going to work.

In an imperative language like Python, I'd use a dictionary as an accumulator - something like

for el in input: accums[i] = accums.get(i, 0) + 1

That's about as efficient as you can hope for (assuming an efficient dictionary implementation). How would I code something equivalent in Haskell?

If the elements come from a relatively small range and are suitable for array indexing, import Data.Array.IArray import Data.Array.Unboxed accumArray :: (IArray a e, Ix i) => (e -> e' -> e) -- ^ An accumulating function -> e -- ^ A default element -> (i,i) -- ^ The bounds of the array -> [(i, e')] -- ^ List of associations -> a i e -- ^ Returns: the array accumArray (+) 0 (mini,maxi) $ zip list (repeat 1) is pretty fast (beats the hell out of Data.Map). If your elements can't be unboxed, the accumArray function from Data.Array does it too, albeit much slower (still faster than Data.Map, in my experience).

...

Thanks, Paul.

5812

Age (days ago)

5817

Last active (days ago)

List overview

Download

20 comments

12 participants

participants (12)

Anton van Straaten
Daniel Fischer
Deniz Dogan
Dmitry Olshansky
Gregory Collins
Ketil Malde
Malcolm Wallace
Paul Moore
Pekka Karjalainen
Sterling Clover
wren ng thornton
Yitzchak Gale