
On 10/25/05, Charles SDudu
Hello, I need to calculate the frequency of each character in a String. And if I can do this really well in C, I dont find a nice (and fast) answer in haskell. I tried several functions, listed below, and even the fastest do a lot of unnecessary things :
calc :: String -> [ (Char, Int) ]
-- 3.0s normally (without profiling) -- time 10-12% alloc 59% (info from profiling) -- so it's the fastest when I profile but not when I compile normally -- mutable array may be better but it's to complicated for me
calc = filter (\p -> snd p > 0) . assocs . foldl (\k c -> unsafeReplace k [(fromEnum c, (unsafeAt k (fromEnum c))+1)] ) k where k = array (toEnum 0, toEnum 255) [(toEnum i, 0) | i <- [0 .. 255]] :: UArray Char Int
-- 2.1s normally -- time 15-19% alloc 40% (info from profiling) -- so for true, it's the best but the sort and group probably do unnecessary things calc s = map (\l -> (head l, length l)) $ group $ sort s
-- 3.4s normally -- time 58% alloc 0% (info from profiling) -- this one dont do unnecessary things but has to read the file again for each character -- calc s = map (\c -> (c, foldl (\a b -> if b==c then a+1 else a) 0 s)) $ nub s
-- 22s normally -- time 85% alloc 92% (info from profiling) -- this one read the file only one time but is really slow calc = foldl (addfreq) [] where addfreq f c = let xs1 = takeWhile (\f -> fst f /= c) f xs2 = dropWhile (\f -> fst f /= c) f xs = if null xs2 then [(c,1)] else ((fst . head) xs2, (snd . head) xs2 + 1) : tail xs2 in xs1 ++ xs
-- I have a lot of even slower version but I wont include them -- each compilation was done with GHC 6.4.1 with the -O flag and with -O -prof -auto-all for profiling
Try this:
import Data.Array.ST import Data.Array.Base import Control.Monad import Control.Monad.ST import Data.Word import Data.Char
main = do c <- getContents print (frequency c)
frequency str = runST (frequency' str)
frequency' :: String -> ST s [(Char,Int)] frequency' str = do arr <- newArray ('\0','\255') 0 :: ST s (STUArray s Char Int) flip mapM_ str $ \c -> do r <- unsafeRead arr (ord c) unsafeWrite arr (ord c) (r+1) liftM (filter (\(c,n) -> n>0)) (getAssocs arr)
-- Friendly, Lemmih