Hmm, is insertWith' new? If I remember right, I think the stack overflows were happening because Map.insertWith isn't strict enough. Otherwise I think the code is the same. But I would expect intTable to be faster, since it uses IntMap, and there's no
IntMap.insertWith' as of 6.6.1 (though it may be easy enough to add one).
Chad
Since I'm interested in the stack overflow issue, and getting acquainted with quickcheck, I thought I would take this opportunity to compare your ordTable with some code Yitzchak Gale posted earlier, against Ham's original problem.
As far as I can tell, they're the same. They work on lists up to 100000 element lists of strings, but on 10^6 size lists I lose patience waiting for them to finish.
Is there a more scientific way of figuring out if one version is better than the other by using, say profiling tools?
Or by reasoning about the code?
t.
****************************************
import Data.List
import qualified Data.Map as M
import Control.Arrow
import Test.QuickCheck
import Test.GenTestData
import System.Random
{-
Is there a library function to take a list of Strings and return a list of
ints showing how many times each String occurs in the list.
So for example:
["egg", "egg", "cheese"] would return [2,1]
-}
testYitzGale n = do
l <- rgenBndStrRow (10,10) (10^n,10^n) -- 100000 strings, strings are 10 chars long, works. craps out on 10^6.
m <- return $ freqFold l
putStrLn $ "map items: " ++ ( show $ M.size m )
testCScherer n = do
l <- rgenBndStrRow (10,10) (10^n,10^n) -- same limitations as yitz gale code.
m <- return $ ordTable l
putStrLn $ "items: " ++ ( show $ length m )
-- slow for big lists
--freqArr = Prelude.map ( last &&& length ) . group . sort
-- yitz gale code. same as chad scherer code? it's simpler to understand, but is it as fast?
freqFold :: [[Char]] -> M.Map [Char] Int
freqFold = foldl' g M.empty
where g accum x = M.insertWith' (+) x 1 accum
-- c scherer code. insists on ord. far as I can tell, same speed as yitz.
ordTable :: (Ord a) => [a] -> [(a,Int)]
ordTable xs = M.assocs $! foldl' f M.empty xs
where f m x = let m' = M.insertWith (+) x 1 m
Just v = M.lookup x m'
in v `seq` m'
l = ["egg","egg","cheese"]
-- other quickcheck stuff
--prop_unchanged_by_reverse = \l -> ( freqArr (l :: [[Char]]) ) == ( freqArr $ reverse l )
--prop_freqArr_eq_freqFold = \l -> ( freqArr (l :: [[Char]]) == (freqFold l))
--test1 = quickCheck prop_unchanged_by_reverse
--test2 = quickCheck prop_freqArr_eq_freqFold
--------------- generate test data:
genBndStrRow (minCols,maxCols) (minStrLen, maxStrLen) = rgen ( genBndLoL (minStrLen, maxStrLen) (minCols,maxCols) )
gen gen = do
sg <- newStdGen
return $ generate 10000 sg gen
-- generator for a list with length between min and max
genBndList :: Arbitrary a => (Int, Int) -> Gen [a]
genBndList (min,max) = do
len <- choose (min,max)
vector len
-- lists of lists
--genBndLoL :: (Int, Int) -> (Int, Int) -> Gen [[a]]
genBndLoL (min1,max1) (min2,max2) = do
len1 <- choose (min1,max1)
len2 <- choose (min2,max2)
vec2 len1 len2
--vec2 :: Arbitrary a => Int -> Int -> Gen [[a]]
vec2 n m = sequence [ vector m | i <- [1..n] ]