
I was wondering if:

    case x of
        "foo"  -> Foo
        "bar"  -> Bar
        "fuzz" -> Fuzz
        "fuzo" -> Fuzo
        x      -> other .. thing

would optimize to

    let z = other .. thing in
    case x of
        ('f':x) -> case x of
            ('u':'z':x) -> case x of
                "z" -> Fuzz
                "o" -> Fuzo
                _   -> z
            "oo" -> Foo
            _    -> z
        "bar" -> Bar
        _     -> z

The reason I ask is I am writing something which will generate large case statements like the first form and want to know if I should bother pre-optimizing it to the second.

        John
-- 
---------------------------------------------------------------------------
John Meacham - California Institute of Technology, Alum. - john@foo.net
---------------------------------------------------------------------------

John Meacham
The reason I ask is I am writing something which will generate large case statements [over strings] and want to know if I should bother pre-optimizing it to [search tries]¹
I've no idea what compilers do, but one quick-and-dirty way to optimize this could perhaps be to insert the strings into a FiniteMap, and use lookupWithDefaultFM(?) to get the resulting value?

-kzm

¹ I think this is an accurate representation of your question?

-- 
If I haven't seen further, it is by standing in the footprints of giants
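For concreteness, a minimal sketch of that table-driven approach, written against Data.Map (the modern successor to FiniteMap) rather than FiniteMap itself; the Result type and the Other default are stand-ins for John's constructors and his "other .. thing" branch:

    import qualified Data.Map as Map

    data Result = Foo | Bar | Fuzz | Fuzo | Other
        deriving (Show, Eq)

    -- The keyword table, built once.
    table :: Map.Map String Result
    table = Map.fromList
        [ ("foo",  Foo)
        , ("bar",  Bar)
        , ("fuzz", Fuzz)
        , ("fuzo", Fuzo)
        ]

    -- Lookup with a default, the Data.Map analogue of lookupWithDefaultFM.
    classify :: String -> Result
    classify x = Map.findWithDefault Other x table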

John Meacham wrote:
I was wondering if:
case x of "foo" -> Foo "bar" -> Bar "fuzz" -> Fuzz "fuzo" -> Fuzo x -> other .. thing
would optimize to
    let z = other .. thing in
    case x of
        ('f':x) -> case x of
            ('u':'z':x) -> case x of
                "z" -> Fuzz
                "o" -> Fuzo
                _   -> z
            "oo" -> Foo
            _    -> z
        "bar" -> Bar
        _     -> z
String literals are handled in a special way in GHC, so your example is essentially converted into an if-cascade, which is not what you want. OTOH, if you write the strings in their expanded form like ['f','o','o'], you get your optimized version automatically. Perhaps Simon^2 can comment on the rationale behind this; I can't remember the reason...

Cheers,
   S.
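For illustration, here is the same dispatch written with expanded character lists, the form that (per the remark above) GHC compiles into nested matching rather than an equality cascade. The Result type and the Other default are assumed placeholders for John's constructors and his "other .. thing" branch:

    data Result = Foo | Bar | Fuzz | Fuzo | Other
        deriving Show

    classify :: String -> Result
    classify x = case x of
        ['f','o','o']     -> Foo
        ['b','a','r']     -> Bar
        ['f','u','z','z'] -> Fuzz
        ['f','u','z','o'] -> Fuzo
        _                 -> Other   -- stands in for "other .. thing"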

On Sun, Feb 22, 2004 at 12:20:35AM -0800, John Meacham wrote:
case x of "foo" -> Foo "bar" -> Bar "fuzz" -> Fuzz "fuzo" -> Fuzo x -> other .. thing
The reason I ask is I am writing something which will generate large case statements like the first form and want to know if I should bother pre-optimizing it to the second.

        John
I suppose such things should be made by flex-style generators. Haskell has one (called Alex); I wonder how efficient it is compared to hand-written "case .. of" or "if .. = .." scanners. I've heard flex is at least not worse than hand-written C code.

-- 
Max

Max Kirillov writes:
[...] I will generate large case statements like the first form and want to know if I should bother pre-optimizing it to the second.
I suppose such things should be made by flex-style generators.
If you don't mind using FFI, the tool of choice would probably be http://www.gnu.org/software/gperf/.

  Peter
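A rough sketch of what the FFI side of that could look like, assuming gperf has been run with its default settings so that it emits a C lookup function named in_word_set taking the string and its length and returning the matched keyword (or NULL). The names and the length type are assumptions about the generated C file, not a worked-out binding:

    {-# LANGUAGE ForeignFunctionInterface #-}
    module GperfLookup (isKeyword) where

    import Foreign.C.String (CString, withCStringLen)
    import Foreign.C.Types  (CUInt(..))
    import Foreign.Ptr      (nullPtr)

    -- gperf's default lookup function: returns a pointer to the matched
    -- keyword, or NULL if the string is not in the set.  If the generated
    -- code uses size_t for the length, use CSize instead of CUInt.
    foreign import ccall unsafe "in_word_set"
        c_in_word_set :: CString -> CUInt -> IO CString

    isKeyword :: String -> IO Bool
    isKeyword s =
        withCStringLen s $ \(ptr, len) -> do
            r <- c_in_word_set ptr (fromIntegral len)
            return (r /= nullPtr)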

G'day all.
Quoting Peter Simons:
If you don't mind using FFI, the tool of choice would probably be http://www.gnu.org/software/gperf/.
Perfect hash functions are actually not that much better than "imperfect" hash functions for the case where you have keys to search for which are not in the set.

"Imperfect" hashing requires that a key be scanned at least twice: once to compute the hash and once for the final equality test. The equality test may need to be performed more than once if two keys hash to the same value. A good rule of thumb is that hash collisions are unlikely until you reach around sqrt(N) keys, where N is the size of the hash space. So, for example, for 32-bit hash values, you almost certainly won't get a collision unless you insert 65,000 or so keys, which is much more than Hal's 1,300 or so.

In the absence of hash collisions, "perfect" hashing is pretty much the same. The only differences are a) the hash function is more expensive to compute, and b) the equality test is guaranteed to happen at most once.

Perfect hashing is best when you know that you're not going to search for keys that are not in the set. For example, an algorithm which requires two passes over the words in some set of documents could easily benefit from perfect hashing, since the words that you'll find in the second pass will only be those found in the first pass.

Radix-based searching, on the other hand, requires only one pass through the key and no arithmetic.

Cheers,
Andrew Bromage
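To make the "scanned at least twice" point concrete, here is a small sketch (not from the thread) of imperfect hashing over strings: the key is walked once to compute the hash and once more for the equality test, and only a collision forces additional equality tests. The hash function and names here are illustrative choices, not anything prescribed above:

    import Data.Char (ord)
    import Data.List (foldl')
    import qualified Data.IntMap as IM

    -- First scan of the key: compute an (imperfect) hash.
    hash :: String -> Int
    hash = foldl' (\h c -> h * 31 + ord c) 5381

    -- Buckets map a hash value to the (key, value) pairs sharing that hash.
    type Table v = IM.IntMap [(String, v)]

    fromList :: [(String, v)] -> Table v
    fromList = foldl' ins IM.empty
      where ins t (k, v) = IM.insertWith (++) (hash k) [(k, v)] t

    -- Second scan of the key: the equality test against the bucket's keys.
    lookupDefault :: v -> String -> Table v -> v
    lookupDefault def k t =
        case IM.lookup (hash k) t of
            Nothing  -> def
            Just kvs -> maybe def id (lookup k kvs)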
participants (6)

- ajb@spamcop.net
- John Meacham
- Ketil Malde
- Max Kirillov
- Peter Simons
- Sven Panne