
Brian Hulley wrote:
So perhaps my original spec is impossible to implement, though it is an open question whether some very clever encoding (with corresponding implementation of <) could be found which would lead to a better average performance (whatever that means).
An alternative design for an atom module could be:
create :: MonadIO m => String -> m Atom toString :: Atom -> String
instance Eq Atom -- O(1) instance Ord Atom -- O(1) but depends on creation order
but here the < would not be lexicographic, so although it would be useful for implementing symbol tables, environments etc it's not ideal for GUI use (eg when displaying a tree of modules where everything should be listed alphabetically).
Eureka! There *is* a way to get O(1) lexicographic comparisons of Atoms :-) although at the expense of O(n + k) but possibly amortized O(n + ?) creation time where n is the length of the string and k is the number of Atoms in existence at any given time. The previous reason for needing to be clairvoyant in the choice of Unique (or Integer or Int) to represent an Atom was to allow expressions such as atom "a" < atom "b" because when the code has determined the representation for the first atom the creation of the second atom mustn't be allowed to change it. However with monadic creation, such expressions can't be written, and therefore it would be possible to represent atoms as follows: newtype Atom = Atom (IORef Int) -- or IORef Integer and when new atoms are created, we could simply update the Int's referenced by other atoms to ensure that the relative ordering is still lexicographic. If we assume that it's very unlikely we'd need more than 2^31 atoms to exist at any given time (if this assumption is wrong we could use Integer instead of Int) and that usually we'd have a lot less, the Int's could be allocated with gaps between them so that we'd only occasionally need to adjust the Int's of the atoms to the left or right when a new atom is inserted in the table, and in all cases it's extremely unlikely that we'd need to adjust the representation of all the other Atoms (hence amortized O(n) hopefully). The comparison would then be: compare (Atom l) (Atom r) = unsafePerformIO $ do li <- readIORef l ri <- readIORef r return (compare li ri) which is safe as long as creation of atoms is not allowed inside unsafePerformIO (it would be nice if there was a way to tell the typechecker that a specific action is not allowed in unsafe IO) Still there would be some tricky details to work out eg how to ensure that the average number of Integer's that need to be updated on creation of each new atom is as small as possible, so I'll leave this as an exercise for the reader!!! :-) Essentially it depends on how well the following can be implemented: data OrderingToken = ??? instance Eq OrderingToken instance Ord OrderingToken create :: MonadIO m => m OrderingToken -- x < y |- createBetween x y == z where x < z and z < y createBetween :: MonadIO m => OrderingToken -> OrderingToken -> m (OrderingToken) It would be interesting to investigate what the theoretical tradeoffs would be between the complexities of the above functions (assuming we want (<) to be O(1)) and whether or not they require to be monadic. Anyway apologies for rambling on, Brian. -- Logic empowers us and Love gives us purpose. Yet still phantoms restless for eras long past, congealed in the present in unthought forms, strive mightily unseen to destroy us. http://www.metamilk.com