>  But I would expect intTable to be faster,

But if I understand correctly, intTable can only deal with integer keys, whereas BH's original question would have wanted string keys, and I can't see a way to convert string to int and back.

t.



"Chad Scherrer" <chad.scherrer@gmail.com>

10/17/2007 11:38 PM

To
Thomas Hartman/ext/dbcom@DBAmericas
cc
haskell-cafe@haskell.org, haskell-cafe-bounces@haskell.org
Subject
Re: [Haskell-cafe] Re: Suspected stupid Haskell Question





Hmm, is insertWith' new? If I remember right, I think the stack overflows were happening because Map.insertWith isn't strict enough. Otherwise I think the code is the same. But I would expect intTable to be faster, since it uses IntMap, and there's no IntMap.insertWith' as of 6.6.1 (though it may be easy enough to add one).

Chad

On 10/17/07, Thomas Hartman < thomas.hartman@db.com> wrote:

Since I'm interested in the stack overflow issue, and getting acquainted with quickcheck, I thought I would take this opportunity to compare your ordTable with some code Yitzchak Gale posted earlier, against Ham's original problem.


As far as I can tell, they're the same. They work on lists up to 100000 element lists of strings, but on 10^6 size lists I lose patience waiting for them to finish.


Is there a more scientific way of figuring out if one version is better than the other by using, say profiling tools?


Or by reasoning about the code?


t.


****************************************


import Data.List

import qualified Data.Map as M

import Control.Arrow

import Test.QuickCheck

import Test.GenTestData

import System.Random


{-

Is there a library function to take a list of Strings and return a list of

ints showing how many times each String occurs in the list.


So for example:


["egg", "egg", "cheese"] would return [2,1]
-}


testYitzGale n = do

 l <- rgenBndStrRow (10,10) (10^n,10^n)  -- 100000 strings, strings are 10 chars long, works. craps out on 10^6.

 m <- return $ freqFold l    
 putStrLn $ "map items: " ++ ( show $ M.size m )


testCScherer n = do

 l <- rgenBndStrRow (10,10) (10^n,10^n)  -- same limitations as yitz gale code.

 m <- return $ ordTable l    
 putStrLn $ "items: " ++ ( show $ length m )



-- slow for big lists

--freqArr = Prelude.map ( last &&& length ) . group . sort


-- yitz gale code. same as chad scherer code? it's simpler to understand, but is it as fast?

freqFold :: [[Char]] -> M.Map [Char] Int

freqFold = foldl' g M.empty

 where g accum x = M.insertWith' (+) x 1 accum

-- c scherer code. insists on ord. far as I can tell, same speed as yitz.

ordTable :: (Ord a) => [a] -> [(a,Int)]

ordTable xs = M.assocs $! foldl' f M.empty xs

   where f m x = let  m' = M.insertWith (+) x 1 m

                      Just v = M.lookup x m'

                 in v `seq` m'



l = ["egg","egg","cheese"]


-- other quickcheck stuff

--prop_unchanged_by_reverse = \l -> ( freqArr (l :: [[Char]]) ) == ( freqArr $ reverse l )

--prop_freqArr_eq_freqFold = \l -> ( freqArr (l :: [[Char]]) == (freqFold l))

--test1 = quickCheck prop_unchanged_by_reverse

--test2 = quickCheck prop_freqArr_eq_freqFold


--------------- generate test data:
genBndStrRow (minCols,maxCols) (minStrLen, maxStrLen) = rgen ( genBndLoL (minStrLen, maxStrLen) (minCols,maxCols) )


gen gen = do

 sg <- newStdGen

 return $ generate 10000 sg gen


-- generator for a list with length between min and max

genBndList :: Arbitrary a => (Int, Int) -> Gen [a]

genBndList (min,max) = do

 len <- choose (min,max)

 vector len



-- lists of lists

--genBndLoL :: (Int, Int) -> (Int, Int) -> Gen [[a]]

genBndLoL (min1,max1) (min2,max2) = do

 len1 <- choose (min1,max1)

 len2 <- choose (min2,max2)

 vec2 len1 len2


--vec2 :: Arbitrary a => Int -> Int -> Gen [[a]]

vec2 n m = sequence [ vector m | i <- [1..n] ]





---

This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.