
On Thu, 2005-08-18 at 12:17 +0100, Adrian Hey wrote:
On Thursday 18 Aug 2005 8:27 am, Axel Simon wrote:
On Thu, 2005-08-18 at 08:25 +0100, Adrian Hey wrote:
Also, I'd be grateful someone could tell me what the data structure I've used is actually called :-)
AFAIK, what you've implemented is called a trie, pronounced "try" from retrieval.
Thanks, I guessed it is some form of trie but I'm a bit confused about the precise definitions of various very similar data structures. I was looking at.. http://www.cs.mcgill.ca/~cs251/OldCourses/1997/topic7/
Ah! I thought it was a benign/naive question, but it really was a trap and I fell for it...
I'm not entirly clear about the difference between "tries", "compact tries" (depends what you call compact AFAICS) and "PATRICIA tries", and is a PATRICIA trie the same thing as a "PATRICIA tree"?
The TString type is basically a linked list which may or may not be terminated with an N way fork (in the form of an AVL tree, N >=2), but then strings are linked lists anyway in Haskell. So is this compact or not I wonder?
I guess in that sense there are no compact tries in Haskell. Unless you use PackedString.
It's compact in the sense that it doesn't require an AVL tree at every character (most of which would be singletons in practice). I guess you could achieve the same thing with this definition..
data TString v = TString {-# UNPACK #-} !Char String v !(AVL (TString v))
.. which might look more compact, but isn't really AFAICS (well not while Strings are linked lists of boxed Chars).
There is still a difference: If you look up a word, the suffix of the word will never be evaluated. In particular, the chain of TString nodes containing a character each will never be built. This could avoid potential space leaks. Furthermore, a compiler might do some optimisation magic on lists, which doesn't apply to your data type.
Replacing the String with a packed string might be better, but I'm not sure how efficient those are in reality (and if I was to go that route it might be a lot simpler and more efficient to use a completely different approach, such as hashing and a pure AVL tree say).
PackedStrings are certainly not the right thing to put into a general- purpose data structure. The problem is that if you request one character from a thunk that contains a conversion to PackedString, the whole packed string will be built before this single character can be extracted again. Hence the library would have a rather weird evaluation behaviour. I once fiddled with Olaf Chitil's pretty printing library and tried to improve it by using PackedStrings. It blew right up into my face since it forced stuff which was nicely delayed before, rendering all benefits of the concise representation useless. My 5p, Axel.