Hmm, is insertWith' new? If I remember right, I think the stack overflows
were happening because Map.insertWith isn't strict enough. Otherwise I think
the code is the same. But I would expect intTable to be faster, since it
uses IntMap, and there's no IntMap.insertWith' as of 6.6.1 (though it may be
easy enough to add one).
Chad
On 10/17/07, Thomas Hartman
Since I'm interested in the stack overflow issue, and getting acquainted with quickcheck, I thought I would take this opportunity to compare your ordTable with some code Yitzchak Gale posted earlier, against Ham's original problem.
As far as I can tell, they're the same. They work on lists up to 100000 element lists of strings, but on 10^6 size lists I lose patience waiting for them to finish.
Is there a more scientific way of figuring out if one version is better than the other by using, say profiling tools?
Or by reasoning about the code?
t.
****************************************
import Data.List import qualified Data.Map as M import Control.Arrow import Test.QuickCheck import Test.GenTestData import System.Random
{- Is there a library function to take a list of Strings and return a list of ints showing how many times each String occurs in the list.
So for example:
["egg", "egg", "cheese"] would return [2,1] -}
testYitzGale n = do l <- rgenBndStrRow (10,10) (10^n,10^n) -- 100000 strings, strings are 10 chars long, works. craps out on 10^6. m <- return $ freqFold l putStrLn $ "map items: " ++ ( show $ M.size m )
testCScherer n = do l <- rgenBndStrRow (10,10) (10^n,10^n) -- same limitations as yitz gale code. m <- return $ ordTable l putStrLn $ "items: " ++ ( show $ length m )
-- slow for big lists --freqArr = Prelude.map ( last &&& length ) . group . sort
-- yitz gale code. same as chad scherer code? it's simpler to understand, but is it as fast? freqFold :: [[Char]] -> M.Map [Char] Int freqFold = foldl' g M.empty where g accum x = M.insertWith' (+) x 1 accum -- c scherer code. insists on ord. far as I can tell, same speed as yitz. ordTable :: (Ord a) => [a] -> [(a,Int)] ordTable xs = M.assocs $! foldl' f M.empty xs where f m x = let m' = M.insertWith (+) x 1 m Just v = M.lookup x m' in v `seq` m'
l = ["egg","egg","cheese"]
-- other quickcheck stuff --prop_unchanged_by_reverse = \l -> ( freqArr (l :: [[Char]]) ) == ( freqArr $ reverse l ) --prop_freqArr_eq_freqFold = \l -> ( freqArr (l :: [[Char]]) == (freqFold l)) --test1 = quickCheck prop_unchanged_by_reverse --test2 = quickCheck prop_freqArr_eq_freqFold
--------------- generate test data: genBndStrRow (minCols,maxCols) (minStrLen, maxStrLen) = rgen ( genBndLoL (minStrLen, maxStrLen) (minCols,maxCols) )
gen gen = do sg <- newStdGen return $ generate 10000 sg gen
-- generator for a list with length between min and max genBndList :: Arbitrary a => (Int, Int) -> Gen [a] genBndList (min,max) = do len <- choose (min,max) vector len
-- lists of lists --genBndLoL :: (Int, Int) -> (Int, Int) -> Gen [[a]] genBndLoL (min1,max1) (min2,max2) = do len1 <- choose (min1,max1) len2 <- choose (min2,max2) vec2 len1 len2
--vec2 :: Arbitrary a => Int -> Int -> Gen [[a]] vec2 n m = sequence [ vector m | i <- [1..n] ]