Since I'm interested in the stack overflow
issue, and getting acquainted with quickcheck, I thought I would take this
opportunity to compare your ordTable with some code Yitzchak Gale posted
earlier, against Ham's original problem.
As far as I can tell, they're the same.
They work on lists up to 100000 element lists of strings, but on 10^6 size
lists I lose patience waiting for them to finish.
Is there a more scientific way of figuring
out if one version is better than the other by using, say profiling tools?
Or by reasoning about the code?
t.
****************************************
import Data.List
import qualified Data.Map as M
import Control.Arrow
import Test.QuickCheck
import Test.GenTestData
import System.Random
{-
Is there a library function to take
a list of Strings and return a list of
ints showing how many times each String
occurs in the list.
So for example:
["egg", "egg",
"cheese"] would return [2,1]
-}
testYitzGale n = do
l <- rgenBndStrRow (10,10)
(10^n,10^n) -- 100000 strings, strings are 10 chars long, works.
craps out on 10^6.
m <- return $ freqFold l
putStrLn $ "map items:
" ++ ( show $ M.size m )
testCScherer n = do
l <- rgenBndStrRow (10,10)
(10^n,10^n) -- same limitations as yitz gale code.
m <- return $ ordTable l
putStrLn $ "items: "
++ ( show $ length m )
-- slow for big lists
--freqArr = Prelude.map ( last &&&
length ) . group . sort
-- yitz gale code. same as chad scherer
code? it's simpler to understand, but is it as fast?
freqFold :: [[Char]] -> M.Map [Char]
Int
freqFold = foldl' g M.empty
where g accum x = M.insertWith'
(+) x 1 accum
-- c scherer code. insists on ord.
far as I can tell, same speed as yitz.
ordTable :: (Ord a) => [a] ->
[(a,Int)]
ordTable xs = M.assocs $! foldl' f
M.empty xs
where f m x = let m'
= M.insertWith (+) x 1 m
Just v = M.lookup x m'
in v `seq` m'
gen gen = do
sg <- newStdGen
return $ generate 10000 sg gen
-- generator for a list with length
between min and max
genBndList :: Arbitrary a => (Int,
Int) -> Gen [a]
genBndList (min,max) = do
len <- choose (min,max)
vector len
-- lists of lists
--genBndLoL :: (Int, Int) -> (Int,
Int) -> Gen [[a]]
genBndLoL (min1,max1) (min2,max2) =
do
len1 <- choose (min1,max1)
len2 <- choose (min2,max2)
vec2 len1 len2
--vec2 :: Arbitrary a => Int ->
Int -> Gen [[a]]
vec2 n m = sequence [ vector m | i
<- [1..n] ]
Chad Scherrer <chad.scherrer@gmail.com> Sent by: haskell-cafe-bounces@haskell.org
>
>
> Is there a library function to take a list of Strings and return a
list of
> ints showing how many times each String occurs in the list.
>
> So for example:
>
> ["egg", "egg", "cheese"] would return
[2,1]
>
> I couldn't find anything on a search, or anything in the librarys.
>
> Thanks BH.
Hi BH,
This might be overkill, but it works well for me. And it avoid stack overflows
I
was originally getting for very large lists. Dean Herrington and I came
up with
this:
ordTable :: (Ord a) => [a] -> [(a,Int)]
ordTable xs = Map.assocs $! foldl' f Map.empty xs
where f m x = let m' = Map.insertWith (+) x 1 m
Just v = Map.lookup x m'
in v `seq`
m'
intTable :: [Int] -> [(Int,Int)]
intTable xs = IntMap.assocs $! foldl' f IntMap.empty xs
where f m x = let m' = IntMap.insertWith (+) x 1 m
Just v = IntMap.lookup x m'
in v `seq`
m'
If you like, it's easily wrapped in a Table class.
Chad
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
---
This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.