
I have some numbers on this. I have a list of first names for males from the census data. I have a function 'male :: String -> Maybe Double' which returns (maybe) the probability of a person being given that name. I have two versions, one based on string matching, the other based on building the data into a trie and then converting the trie into haskell source and the using that to match). For example, we have:
module Male where
male "james" = Just 3.318 male "john" = Just 3.271 male "robert" = Just 3.143 male "michael" = Just 2.629 male "william" = Just 2.451 male "david" = Just 2.363 male "richard" = Just 1.703 ... male _ = Nothing
and in the other version, we have
male = male_start where male_start ('a':xs) = male_66 xs male_start ('b':xs) = male_67 xs male_start ('c':xs) = male_68 xs male_start ('d':xs) = male_69 xs ... male_start _ = Nothing male_66 ('a':xs) = male_66_66 xs male_66 ('b':xs) = male_66_67 xs male_66 ('d':xs) = male_66_69 xs male_66 ('g':xs) = male_66_72 xs ...
Finally, I implemented a version which reads data into a finitemap. the original database contains 1219 names. i test this by taking all male names, all female names, randomizing them, and repeating this data 20 times. this leads to 109880 runs. i ran each implementation 5 times; the results are: using trie: 0.890u 0.020s 0:00.90 101.1% 0+0k 0+0io 327pf+0w 0.910u 0.000s 0:00.90 101.1% 0+0k 0+0io 327pf+0w 0.870u 0.030s 0:00.90 100.0% 0+0k 0+0io 327pf+0w 0.910u 0.020s 0:00.93 100.0% 0+0k 0+0io 327pf+0w 0.920u 0.020s 0:00.95 98.9% 0+0k 0+0io 327pf+0w using string-matching: 10.280u 0.060s 0:10.51 98.3% 0+0k 0+0io 280pf+0w 10.340u 0.030s 0:10.86 95.4% 0+0k 0+0io 279pf+0w 10.310u 0.040s 0:10.72 96.5% 0+0k 0+0io 281pf+0w 10.330u 0.040s 0:10.55 98.2% 0+0k 0+0io 280pf+0w 10.420u 0.020s 0:10.63 98.2% 0+0k 0+0io 280pf+0w for finitemap: 1.110u 0.020s 0:01.14 99.1% 0+0k 0+0io 195pf+0w 1.110u 0.010s 0:01.14 98.2% 0+0k 0+0io 195pf+0w 1.100u 0.030s 0:01.14 99.1% 0+0k 0+0io 195pf+0w 1.120u 0.010s 0:01.15 98.2% 0+0k 0+0io 195pf+0w 1.190u 0.010s 0:01.24 96.7% 0+0k 0+0io 195pf+0w so string-matching is terribly slow; using the finitemap is actually surprisingly fast, though still about 30% slower than the trie version. perhaps a better-optimized trie version would do better, but it's hard to say. On Mon, 23 Feb 2004, Simon Peyton-Jones wrote:
The trouble is that you probably *don't* want to expand this case x of { "foogle" -> e1; _ -> e2 } to this
case x of c1:x1 -> case c1 of 'f' -> case x1 of c2:x2 -> case c2 of 'o' -> of ....
So GHC generates a series of equality tests instead. A decent alternative might be:
generate case expressions when there is more than one string in the list, otherwise use an equality test
That would not be hard to do. If it becomes important to you, I'd have a go. But before doing so, could you do the work by hand and see if it makes a useful performance difference?
Simon
| -----Original Message----- | From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow-haskell-users- | bounces@haskell.org] On Behalf Of Sven Panne | Sent: 22 February 2004 15:32 | To: John Meacham | Cc: glasgow-haskell-users@haskell.org | Subject: Re: optimization question | | John Meacham wrote: | > I was wondering if: | > | > case x of | > "foo" -> Foo | > "bar" -> Bar | > "fuzz" -> Fuzz | > "fuzo" -> Fuzo | > x -> other .. thing | > | > would optimize to | > | > let z = other .. thing in | > case x of | > ('f':x) -> case x of | > ('u':'z': x) -> | > "z" -> Fuzz | > "o" -> Fuzo | > _ -> z | > "oo" -> Foo | > _ -> z | > "bar" -> Bar | > _ -> z | | String literals are handled in a special way in GHC, so your example is | essentially converted into an if-cascade, which is not what you want. | OTOH, if you write the strings in their expanded form like ['f','o','o'], | you get your optimized version automatically. Perhaps Simon^2 can comment | on the rationale behind this, I can't remember the reason... | | Cheers, | S. | | _______________________________________________ | Glasgow-haskell-users mailing list | Glasgow-haskell-users@haskell.org | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users _______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
-- Hal Daume III | hdaume@isi.edu "Arrest this man, he talks in maths." | www.isi.edu/~hdaume