
I was wondering if:

    case x of
        "foo"  -> Foo
        "bar"  -> Bar
        "fuzz" -> Fuzz
        "fuzo" -> Fuzo
        x      -> other .. thing

would optimize to

    let z = other .. thing in
    case x of
        ('f':x) -> case x of
            ('u':'z':x) -> case x of
                "z" -> Fuzz
                "o" -> Fuzo
                _   -> z
            "oo" -> Foo
            _    -> z
        "bar" -> Bar
        _     -> z

The reason I ask is I am writing something which will generate large case statements like the first form and want to know if I should bother pre-optimizing it to the second.

        John
-- 
---------------------------------------------------------------------------
John Meacham - California Institute of Technology, Alum. - john@foo.net
---------------------------------------------------------------------------

John Meacham
The reason I ask is I am writing something which will generate large case statements [over strings] and want to know if I should bother pre-optimizing it to [search tries]¹
I've no idea what compilers do, but one quick-and-dirty way to optimize this could perhaps be to insert the strings into a FiniteMap, and use lookupWithDefaultFM(?) to get the resulting value?

-kzm

¹ I think this is an accurate representation of your question?

-- 
If I haven't seen further, it is by standing in the footprints of giants
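For concreteness, a minimal sketch of that table-driven approach, written against Data.Map (the modern successor to FiniteMap) rather than FiniteMap itself; the Result type and the Other default are stand-ins for John's constructors and his "other .. thing" branch:

    import qualified Data.Map as Map

    data Result = Foo | Bar | Fuzz | Fuzo | Other
        deriving (Show, Eq)

    -- The keyword table, built once.
    table :: Map.Map String Result
    table = Map.fromList
        [ ("foo",  Foo)
        , ("bar",  Bar)
        , ("fuzz", Fuzz)
        , ("fuzo", Fuzo)
        ]

    -- Lookup with a default, the Data.Map analogue of lookupWithDefaultFM.
    classify :: String -> Result
    classify x = Map.findWithDefault Other x table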

John Meacham wrote:
I was wondering if:
case x of "foo" -> Foo "bar" -> Bar "fuzz" -> Fuzz "fuzo" -> Fuzo x -> other .. thing
would optimize to
    let z = other .. thing in
    case x of
        ('f':x) -> case x of
            ('u':'z':x) -> case x of
                "z" -> Fuzz
                "o" -> Fuzo
                _   -> z
            "oo" -> Foo
            _    -> z
        "bar" -> Bar
        _     -> z
String literals are handled in a special way in GHC, so your example is essentially converted into an if-cascade, which is not what you want. OTOH, if you write the strings in their expanded form like ['f','o','o'], you get your optimized version automatically. Perhaps Simon^2 can comment on the rationale behind this; I can't remember the reason...

Cheers,
   S.
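For illustration, here is the same dispatch written with expanded character lists, the form that (per the remark above) GHC compiles into nested matching rather than an equality cascade. The Result type and the Other default are assumed placeholders for John's constructors and his "other .. thing" branch:

    data Result = Foo | Bar | Fuzz | Fuzo | Other
        deriving Show

    classify :: String -> Result
    classify x = case x of
        ['f','o','o']     -> Foo
        ['b','a','r']     -> Bar
        ['f','u','z','z'] -> Fuzz
        ['f','u','z','o'] -> Fuzo
        _                 -> Other   -- stands in for "other .. thing"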

On Sun, Feb 22, 2004 at 12:20:35AM -0800, John Meacham wrote:
case x of "foo" -> Foo "bar" -> Bar "fuzz" -> Fuzz "fuzo" -> Fuzo x -> other .. thing
The reason I ask is I am writing something which will generate large case statements like the first form and want to know if I should bother pre-optimizing it to the second.

        John
I suppose such things should be made by flex-style generators. Haskell has one (called Alex); I wonder how efficient it is compared to hand-written "case .. of" or "if .. = .." scanners. I've heard flex is at least not worse than hand-written C code.

-- 
Max

Max Kirillov writes:
[...] I will generate large case statements like the first form and want to know if I should bother pre-optimizing it to the second.
I suppose such things should be made by flex-style generators.
If you don't mind using FFI, the tool of choice would probably be http://www.gnu.org/software/gperf/.

  Peter
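A rough sketch of what the FFI side of that could look like, assuming gperf has been run with its default settings so that it emits a C lookup function named in_word_set taking the string and its length and returning the matched keyword (or NULL). The names and the length type are assumptions about the generated C file, not a worked-out binding:

    {-# LANGUAGE ForeignFunctionInterface #-}
    module GperfLookup (isKeyword) where

    import Foreign.C.String (CString, withCStringLen)
    import Foreign.C.Types  (CUInt(..))
    import Foreign.Ptr      (nullPtr)

    -- gperf's default lookup function: returns a pointer to the matched
    -- keyword, or NULL if the string is not in the set.  If the generated
    -- code uses size_t for the length, use CSize instead of CUInt.
    foreign import ccall unsafe "in_word_set"
        c_in_word_set :: CString -> CUInt -> IO CString

    isKeyword :: String -> IO Bool
    isKeyword s =
        withCStringLen s $ \(ptr, len) -> do
            r <- c_in_word_set ptr (fromIntegral len)
            return (r /= nullPtr)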

G'day all.
Quoting Peter Simons:
If you don't mind using FFI, the tool of choice would probably be http://www.gnu.org/software/gperf/.
Perfect hash functions are actually not that much better than "imperfect" hash functions for the case where you have keys to search for which are not in the set.

"Imperfect" hashing requires that a key be scanned at least twice: once to compute the hash and once for the final equality test. The equality test may need to be performed more than once if two keys hash to the same value. A good rule of thumb is that hash collisions are unlikely until you reach around sqrt(N) keys, where N is the size of the hash space. So, for example, for 32-bit hash values, you almost certainly won't get a collision unless you insert 65,000 or so keys, which is much more than Hal's 1,300 or so.

In the absence of hash collisions, "perfect" hashing is pretty much the same. The only differences are a) the hash function is more expensive to compute, and b) the equality test is guaranteed to happen at most once.

Perfect hashing is best when you know that you're not going to search for keys that are not in the set. For example, an algorithm which requires two passes over the words in some set of documents could easily benefit from perfect hashing, since the words that you'll find in the second pass will only be those found in the first pass.

Radix-based searching, on the other hand, requires only one pass through the key and no arithmetic.

Cheers,
Andrew Bromage
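To make the "scanned at least twice" point concrete, here is a small sketch (not from the thread) of imperfect hashing over strings: the key is walked once to compute the hash and once more for the equality test, and only a collision forces additional equality tests. The hash function and names here are illustrative choices, not anything prescribed above:

    import Data.Char (ord)
    import Data.List (foldl')
    import qualified Data.IntMap as IM

    -- First scan of the key: compute an (imperfect) hash.
    hash :: String -> Int
    hash = foldl' (\h c -> h * 31 + ord c) 5381

    -- Buckets map a hash value to the (key, value) pairs sharing that hash.
    type Table v = IM.IntMap [(String, v)]

    fromList :: [(String, v)] -> Table v
    fromList = foldl' ins IM.empty
      where ins t (k, v) = IM.insertWith (++) (hash k) [(k, v)] t

    -- Second scan of the key: the equality test against the bucket's keys.
    lookupDefault :: v -> String -> Table v -> v
    lookupDefault def k t =
        case IM.lookup (hash k) t of
            Nothing  -> def
            Just kvs -> maybe def id (lookup k kvs)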
participants (6)

- ajb@spamcop.net
- John Meacham
- Ketil Malde
- Max Kirillov
- Peter Simons
- Sven Panne