But if I understand correctly, intTable
can only deal with integer keys, whereas BH's original question would have
wanted string keys, and I can't see a way to convert string to int and
back.
Hmm, is insertWith' new? If I remember right, I think
the stack overflows were happening because Map.insertWith isn't strict
enough. Otherwise I think the code is the same. But I would expect intTable
to be faster, since it uses IntMap, and there's no IntMap.insertWith' as
of 6.6.1 (though it may be easy enough to add one).
Since I'm interested in the stack overflow issue, and getting acquainted
with quickcheck, I thought I would take this opportunity to compare your
ordTable with some code Yitzchak Gale posted earlier, against Ham's original
problem.
As far as I can tell, they're the same. They work on lists up to 100000
element lists of strings, but on 10^6 size lists I lose patience waiting
for them to finish.
Is there a more scientific way of figuring out if one version is better
than the other by using, say profiling tools?
Or by reasoning about the code?
t.
****************************************
import Data.List
import qualified Data.Map as M
import Control.Arrow
import Test.QuickCheck
import Test.GenTestData
import System.Random
{-
Is there a library function to take a list of Strings and return a list
of
ints showing how many times each String occurs in the list.
So for example:
["egg", "egg", "cheese"] would return [2,1]
-}
testYitzGale n = do
l <- rgenBndStrRow (10,10) (10^n,10^n) -- 100000 strings,
strings are 10 chars long, works. craps out on 10^6.
m <- return $ freqFold l
putStrLn $ "map items: " ++ ( show $ M.size m )
testCScherer n = do
l <- rgenBndStrRow (10,10) (10^n,10^n) -- same limitations
as yitz gale code.
m <- return $ ordTable l
putStrLn $ "items: " ++ ( show $ length m )
-- slow for big lists
--freqArr = Prelude.map ( last &&& length ) . group . sort
-- yitz gale code. same as chad scherer code? it's simpler to understand,
but is it as fast?
freqFold :: [[Char]] -> M.Map [Char] Int
freqFold = foldl' g M.empty
where g accum x = M.insertWith' (+) x 1 accum
-- c scherer code. insists on ord. far as I can tell, same speed as yitz.
ordTable :: (Ord a) => [a] -> [(a,Int)]
ordTable xs = M.assocs $! foldl' f M.empty xs
where f m x = let m' = M.insertWith (+) x 1 m
Just v = M.lookup x m'
in v `seq`
m'
gen gen = do
sg <- newStdGen
return $ generate 10000 sg gen
-- generator for a list with length between min and max
genBndList :: Arbitrary a => (Int, Int) -> Gen [a]
genBndList (min,max) = do
len <- choose (min,max)
vector len
-- lists of lists
--genBndLoL :: (Int, Int) -> (Int, Int) -> Gen [[a]]
genBndLoL (min1,max1) (min2,max2) = do
len1 <- choose (min1,max1)
len2 <- choose (min2,max2)
vec2 len1 len2
--vec2 :: Arbitrary a => Int -> Int -> Gen [[a]]
vec2 n m = sequence [ vector m | i <- [1..n] ]
---
This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.