
Is there an implementation of a symbol type in Haskell i.e. a string which has a constant-time comparison operation? Mike

Michael Vanier wrote:
Is there an implementation of a symbol type in Haskell i.e. a string which has a constant-time comparison operation?
Stefan O'Rear wrote:
Yes, I beleive GHC uses one (utils/FastString.lhs iirs)
In some cases where you would need that in other languages, you would use an algebraic data type in Haskell, e.g.: data Color = Red | Orange | Yellow | Green | Blue | Purple deriving (Eq, Show) -Yitz

I'm thinking of a symbol type that can be used for a compiler, so a simple algebraic data type wouldn't work. Unfortunately the GHC datatype isn't part of the GHC haskell libraries AFAICT. Mike Yitzchak Gale wrote:
Michael Vanier wrote:
Is there an implementation of a symbol type in Haskell i.e. a string which has a constant-time comparison operation?
Stefan O'Rear wrote:
Yes, I beleive GHC uses one (utils/FastString.lhs iirs)
In some cases where you would need that in other languages, you would use an algebraic data type in Haskell, e.g.:
data Color = Red | Orange | Yellow | Green | Blue | Purple deriving (Eq, Show)
-Yitz

Michael Vanier, before Yitzchak Gale and himself:
I'm thinking of a symbol type that can be used for a compiler...
Ah. Perhaps Data.HashTable is what you are looking for then?
Hmm, I was hoping for something that didn't involve side effects.
Some popular Haskell libraries suffer from acute Monaditis. The hash tables, the random numbers, etc., all is embedded in IO or something similar. But the essential algorithms are independent of this structuration. You *can* make streams of random numbers, using the same generation algorithms, without monads. You *can* make your hashed sequences of strings using lists, without monads. You will be simply obliged to carry with you all that stuff, as *explicit* world state. So, go ahead, extract from Data.Hash what you need, and structure the access and the eventual insertions/deletions in a way you wish. The basic idea of constant-time access strings is usually based on hashing. But you may wish to be more elegant... So, why not try tries? http://en.wikipedia.org/wiki/Trie http://www.eecs.harvard.edu/~ellard/Q-97/HTML/root/node22.html http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Trie/ They have been implemented in Haskell as well, probably many, many times. But beware, Google on "Haskell tries" will offer you this: http://www.muskogeephoenix.com/highschoolsports/local_story_281003617.html Jerzy Karczmarczuk

Michael Vanier wrote:
Yitzchak Gale wrote:
Ah. Perhaps Data.HashTable is what you are looking for then?
Hmm, I was hoping for something that didn't involve side effects.
There's always Data.Map newtype Symbol = S { unS :: Integer } deriving (Eq, Ord) type SymbolTable = Data.Map Symbol String Regards, apfelmus

I wrote:
Perhaps Data.HashTable is what you are looking for then?
Jerzy Karczmarczuk wrote:
extract from Data.Hash what you need... why not try tries?
apfelmus wrote:
There's always Data.Map
Those are log n. I would personally use those for almost every application, but Mike says he wants constant time, for a compiler. Apparantly he wants to do some really serious optimization. HashTable is highly optimized. Of course, any memory access is really no better than log n, but HashTable uses the built-in parallelization available on most hardware that makes it look like constant time up to the size of system memory. (Otherwise known as accessing memory locations by address.) So yes, using this hardware has side effects and needs to be in IO. You could do it in the ST monad instead of the IO monad if you want, if that is any consolation. -Yitzchak -Yitz

On Wed, 10 Oct 2007, Yitzchak Gale wrote:
I wrote:
Perhaps Data.HashTable is what you are looking for then?
Jerzy Karczmarczuk wrote:
extract from Data.Hash what you need... why not try tries?
apfelmus wrote:
There's always Data.Map
Those are log n. I would personally use those for almost every application, but Mike says he wants constant time, for a compiler.
However, I'm guessing he can use key comparison rather than value comparison - that changes things somewhat. -- flippa@flippac.org "My religion says so" explains your beliefs. But it doesn't explain why I should hold them as well, let alone be restricted by them.

On 10/10/07, Michael Vanier
Is there an implementation of a symbol type in Haskell i.e. a string which has a constant-time comparison operation?
To borrow Prolog terminology, it sounds like you're looking for an "atom" data type. I've not done it, but I've plotted to implement a module according to the following sketch: module Data.Atom where data Atom .... atom :: String -> Atom -- or ByteString name :: Atom -> String -- or ByteString instance Eq Atom where ... instance Ord Atom where ... The constructor function would do hash-consing using unsafePerformIO internally to build a [hash] table of extant Atoms. If ByteString is used for the internal "name", the hash consing means that you can use the Ptr for O(1) equality tests. The implementation of compare would still need to do a normal string comparison, after doing an initial equality test. If you do the O(1) equality test before doing a full compare, the performance will be very good in many situations, since non-equal comparisons tend to terminate quickly. The exception of course is strings with long common prefixes (e.g. URLs). For symbol names in a compiler, this is unlikely to be a significant problem. cheers, T. -- Thomas Conway drtomc@gmail.com Silence is the perfectest herald of joy: I were but little happy, if I could say how much.

For a contrary point of view, there is a footnote at the bottom of page 20 in "Parsec, a fast combinator parser" by Daan Leijen, the creator of Parsec: "I have to warn the reader though that experience with the HaskellLight compiler has shown that it hardly pays off in practice to use special identifier representations instead of normal strings" Thomas Conway wrote:
On 10/10/07, Michael Vanier
wrote: Is there an implementation of a symbol type in Haskell i.e. a string which has a constant-time comparison operation?
To borrow Prolog terminology, it sounds like you're looking for an "atom" data type.
I've not done it, but I've plotted to implement a module according to the following sketch:
module Data.Atom where
data Atom ....
atom :: String -> Atom -- or ByteString
name :: Atom -> String -- or ByteString
instance Eq Atom where ...
instance Ord Atom where ...
The constructor function would do hash-consing using unsafePerformIO internally to build a [hash] table of extant Atoms. If ByteString is used for the internal "name", the hash consing means that you can use the Ptr for O(1) equality tests. The implementation of compare would still need to do a normal string comparison, after doing an initial equality test.
If you do the O(1) equality test before doing a full compare, the performance will be very good in many situations, since non-equal comparisons tend to terminate quickly. The exception of course is strings with long common prefixes (e.g. URLs). For symbol names in a compiler, this is unlikely to be a significant problem.
cheers, T.
participants (8)
-
apfelmus
-
Dan Weston
-
jerzy.karczmarczuk@info.unicaen.fr
-
Michael Vanier
-
Philippa Cowderoy
-
Stefan O'Rear
-
Thomas Conway
-
Yitzchak Gale