
G'day all.
Quoting Brian Hulley
So perhaps my original spec is impossible to implement, though it is an open question whether some very clever encoding (with corresponding implementation of <) could be found which would lead to a better average performance (whatever that means).
For what it's worth: - It's not such a dumb idea to separate equality testing from less-than testing. A (Unique,ByteString) pair makes the former fast and the latter pretty fast when it's needed. - Arithmetic coding preserves lexicographic ordering. If you have a good statistical model for what sort of strings might be used as atoms, it may be possible to construct a representation of atoms which fit in machine 32- or 64-bit word most of the time. (This is similar to the way that GHC represents Integers; use a machine word when it fits.) - Hash table implementations rely on using a cheap test first (comparing hash codes), going to a full comparison only if the cheap test doesn't rule out what you're looking for.
As an aside, if the monad was removed then the result of atom "a" < atom "b" (atom :: String -> Atom) could not be determined by analysis of the program. It would depend on the evaluation order chosen by the compiler, but in a sense this doesn't matter because whatever the result is, it would be the same at any future time during the same run of the program so the use of Atoms as keys would still be safe. But is this still "functional"?
Questions of this nature produce much disagreement. Once upon a time, program arguments and environment variables were passed to a Haskell program using a similar mechanism. I think we got rid of this because it's not "functional" if you consider programs that fork. Cheers, Andrew Bromage