
When I was at university, we learned a programming language known as Smalltalk. I was rather good at it. [Ironically, making "small talk" is one of the things I do worst IRL! But anyway, back to the topic...] In Smalltalk, there is a wide selection of collection types, all with different facilities and efficiency trade offs. There is bag, set, list, array, ordered list, dictionary, hash table, weak array, etc. A whole menagerie of collection types. However, Haskell only has 1 type of collection: linked lists. (And only single-linked at that.) While other "normal" programming languages spend huge amounts of effort trying to select exactly the right collection type for the task in hand, Haskell programs only ever use linked lists. Why is that, exactly? Does writing software in Haskell magically change the properties of these data structures such that lists become more efficient than all the other types? Or is it that other data structures are only efficient when used with in-place updates? (The latter statement appears to be isomorphic to stating that Haskell programs must necessarily be less efficient than impure ones.) Thoughts?

On Tue, 19 Jun 2007 19:26:20 +0100
Andrew Coppin
When I was at university, we learned a programming language known as Smalltalk. I was rather good at it. [Ironically, making "small talk" is one of the things I do worst IRL! But anyway, back to the topic...]
In Smalltalk, there is a wide selection of collection types, all with different facilities and efficiency trade offs. There is bag, set, list, array, ordered list, dictionary, hash table, weak array, etc. A whole menagerie of collection types.
See Edison (http://www.eecs.tufts.edu/~rdocki01/edison.html). Yes, the standard library is lacking a few structures (I often miss priority queues), but the code certainly exists elsewhere.
However, Haskell only has 1 type of collection: linked lists. (And only single-linked at that.) While other "normal" programming languages spend huge amounts of effort trying to select exactly the right collection type for the task in hand, Haskell programs only ever use linked lists.
I think you need to read more Haskell code. Data.Map and Data.Set (both tree-based data structures) are used very often. Arrays exist in the standard library, but aren't used very often -- O(1) access is usually not needed and the O(n) update cost for immutable arrays is quite expensive. Also, note the wild success of Data.ByteString -- a structure that is like a weird list/array hybrid.
Why is that, exactly?
Lists are very handy in a lazy programming language -- Haskell lets us use lists not only as a data structure, but as control structures as well. Also, lists or Data.Map are often "close enough". Who needs a bag when "Map a Int" gets the job done? What good is a heap when Data.Map supports findMin? Cheers, Spencer Janssen
Does writing software in Haskell magically change the properties of these data structures such that lists become more efficient than all the other types? Or is it that other data structures are only efficient when used with in-place updates? (The latter statement appears to be isomorphic to stating that Haskell programs must necessarily be less efficient than impure ones.)
Thoughts?
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On 6/20/07, Spencer Janssen
(I often miss priority queues)
I must say, there have been several times lately where I have wanted such a type. Actually, I've found Data.Map isn't too bad if you use deleteMin/deleteMax. What's more, deleteM{in,ax} & insert are both O(log n) for most reasonable implementations that could be used for a Data.Map-like structure, which works out much the same in many cases. In particular, I find my self wanting to use a priority queue for N-way sorted merge, which you can do with Data.Map: (compiles, so clearly works even though I have not tested it. ;-) ) import Data.List as List import Data.Map as Map merge :: Ord t => [[t]] -> [t] merge lists = merge' $ Map.fromList $ concatMap makePair lists where makePair [] = [] makePair (x:xs) = [(x,xs)] merge' heap | Map.null heap = [] | otherwise = x:(merge' $ removeEqual x $ reinsert xs heap') where ((x,xs), heap') = deleteFindMin heap reinsert [] heap = heap reinsert (x:xs) heap = Map.insert x xs heap removeEqual x heap | Map.null heap = heap | x /= y = heap | otherwise = removeEqual x $ reinsert ys heap' where ((y,ys), heap') = deleteFindMin heap The other thing I have found myself doing often is using splitLookup followed by union, though what I really want is "join" being the dual of split - i.e. requiring all the keys in the rhs to be greater than the keys in the lhs. My own AVL tree implementation has this operation which is O(log n), which is rather better than union's O(n log n). T. -- Dr Thomas Conway drtomc@gmail.com Silence is the perfectest herald of joy: I were but little happy, if I could say how much.

Thomas Conway wrote:
In particular, I find my self wanting to use a priority queue for N-way sorted merge, which you can do with Data.Map: (compiles, so clearly works even though I have not tested it. ;-) )
import Data.List as List import Data.Map as Map
merge :: Ord t => [[t]] -> [t] merge lists = merge' $ Map.fromList $ concatMap makePair lists where makePair [] = [] makePair (x:xs) = [(x,xs)]
merge' heap | Map.null heap = [] | otherwise = x:(merge' $ removeEqual x $ reinsert xs heap') where ((x,xs), heap') = deleteFindMin heap
reinsert [] heap = heap reinsert (x:xs) heap = Map.insert x xs heap
removeEqual x heap | Map.null heap = heap | x /= y = heap | otherwise = removeEqual x $ reinsert ys heap' where ((y,ys), heap') = deleteFindMin heap
Eh, why not a simple mergesort that also deletes duplicates? -- the nested lists must be sorted: map sort xs == xs mergesort :: Ord a => [[a]] -> [a] mergesort [] = [] mergesort xs = foldtree1 merge xs foldtree1 :: (a -> a -> a) -> [a] -> a foldtree1 f [x] = x foldtree1 f xs = foldtree1 f $ pairs xs where pairs [] = [] pairs [x] = [x] pairs (x:x':xs) = f x x' : pairs xs merge :: Ord a => [a] -> [a] -> [a] merge [] ys = ys merge xs [] = xs merge xs'@(x:xs) ys'@(y:ys) | x < y = x:merge xs ys' | x == y = merge xs ys' | otherwise = y:merge xs' ys The function 'foldtree1' folds the elements of the list as if they where in a binary tree: foldrtree1 f [1,2,3,4,5,6,7,8] ==> ((1 `f` 2) `f` (3 `f` 4)) `f` ((5 `f` 6) `f` (7 `f` 8)) and with f = merge, this serves as heap (although a very implicit one). The hole mergesort will take O(n*log (length xs)) where n = length (concat xs) time. Moreover, this variant of mergesort happens to generate elements as soon as they are available, i.e. head . mergesort is O(n) See also http://article.gmane.org/gmane.comp.lang.haskell.general/15010
The other thing I have found myself doing often is using splitLookup followed by union, though what I really want is "join" being the dual of split - i.e. requiring all the keys in the rhs to be greater than the keys in the lhs. My own AVL tree implementation has this operation which is O(log n), which is rather better than union's O(n log n).
2-3-Finger trees support efficient splits and concatenations: http://www.soi.city.ac.uk/~ross/papers/FingerTree.html In fact, you can build a plethora of data structures from them. Regards, apfelmus

On 6/20/07, apfelmus
Eh, why not a simple mergesort that also deletes duplicates?
I had to sit down and think about this, and while for the simple case that I showed, your equivalent code is definitely simpler, and probably more efficient. The actual case that I'm dealing with, where I believe Data.Map (or similar, incl finger trees) has a benefit is one in which it's not simply a case of lists of items, yielding a list of items. I'm manipulating an on-disk inverted index, so rather than a simple list of items, the code is actually monadic, doing IO to retrieve the items off disk, and the cost of creating the intermediate lists is unwearable. The key problem is that you loose the laziness because of the IO monad, so if you're not careful, you end up trying to store the complete intermediate lists. If you can assume you can hold enough stuff in memory, then nice elegant lazy algorithms work beautifully. I'm doing external algorithms which have to work on Gbs of data, which I can't hold in memory. So the type signature of my merge is approximately: type Reader t = IO (Maybe t) type Writer t = t -> IO () merge :: [Reader t] -> Writer t -> IO () Actually, the scope of the problem is such that I could *almost* finesse the problem by reading the globs of data as ByteStrings, then use lazy evaluation internally, and write the outputs merge blobLocators writer = do lists <- fmap decode $ mapM readOffDisk blobLocators mapM_ writer (mergesort lists) but I need to keep a firm lid on the resource usage, so I can't. <sigh> cheers, T. -- Dr Thomas Conway drtomc@gmail.com Silence is the perfectest herald of joy: I were but little happy, if I could say how much.

On Fri, 2007-06-22 at 09:38 +1000, Thomas Conway wrote:
The actual case that I'm dealing with, where I believe Data.Map (or similar, incl finger trees) has a benefit is one in which it's not simply a case of lists of items, yielding a list of items. I'm manipulating an on-disk inverted index, so rather than a simple list of items, the code is actually monadic, doing IO to retrieve the items off disk, and the cost of creating the intermediate lists is unwearable. The key problem is that you loose the laziness because of the IO monad, so if you're not careful, you end up trying to store the complete intermediate lists.
You might find that lazy IO is helpful in this case. The primitive that implements lazy IO is unsafeInterleaveIO :: IO a -> IO a Note that using a Map will probably not help since it needs to read all the keys to be able to construct it so that'd pull in all the data from disk. Duncan

On 6/22/07, Duncan Coutts
You might find that lazy IO is helpful in this case. The primitive that implements lazy IO is unsafeInterleaveIO :: IO a -> IO a
Personally, unsafeInterleaveIO is so horribly evil, that even just having typed the name, I'll have to put the keyboard through the dishwasher (see http://www.coudal.com/keywasher.php). Also, I need to support concurrent querying and updates, and trying to manage the locking is quite hard enough as it is, without trying to keep track of which postings vectors have closures pointing to them!
Note that using a Map will probably not help since it needs to read all the keys to be able to construct it so that'd pull in all the data from disk.
Well, in the case I'm dealing with, the map can contain the current key from each postings vector, and the closure for reading the remainder of the vector. E.g. Map Key ([IO (Maybe Key)]). T. -- Dr Thomas Conway drtomc@gmail.com Silence is the perfectest herald of joy: I were but little happy, if I could say how much.

On Fri, 2007-06-22 at 15:34 +1000, Thomas Conway wrote:
On 6/22/07, Duncan Coutts
wrote: You might find that lazy IO is helpful in this case. The primitive that implements lazy IO is unsafeInterleaveIO :: IO a -> IO a
Personally, unsafeInterleaveIO is so horribly evil, that even just having typed the name, I'll have to put the keyboard through the dishwasher (see http://www.coudal.com/keywasher.php). Also, I need to support concurrent querying and updates, and trying to manage the locking is quite hard enough as it is, without trying to keep track of which postings vectors have closures pointing to them!
Ah yes, fair enough. If you're doing updates at the same time then lazy IO isn't appropriate as you need control over when the IO happens. Duncan

Thomas Conway wrote:
On 6/22/07, Duncan Coutts
wrote: You might find that lazy IO is helpful in this case. The primitive that implements lazy IO is unsafeInterleaveIO :: IO a -> IO a
Personally, unsafeInterleaveIO is so horribly evil, that even just having typed the name, I'll have to put the keyboard through the dishwasher (see http://www.coudal.com/keywasher.php).
:D :D Finally someone who fully understands the true meaning of the prefix "unsafe" ;)
Note that using a Map will probably not help since it needs to read all the keys to be able to construct it so that'd pull in all the data from disk.
Well, in the case I'm dealing with, the map can contain the current key from each postings vector, and the closure for reading the remainder of the vector. E.g. Map Key ([IO (Maybe Key)]).
In any case, you have to store as many keys as you have lists to sort, but lazy mergesort will not hold on more than (length xs + 1) keys in memory at a single moment in time and only force one new key per retrieval. No lingering intermediate lists :) In this situation, unsafeInterleaveIO is an easy way to carry this behavior over to the IO-case: type Reader t = IO (Maybe t) type Writer t = t -> IO () readList :: Reader t -> IO [t] readList m = unsafeInterleaveIO $ do mx <- m case mx of Just x -> liftM (x:) $ readList m Nothing -> return [] mergesortIO :: Ord t => [Reader t] -> Writer t -> IO () mergesortIO xs f = do ys <- mapM readList xs mapM_ f $ mergesort ys Here, readList creates only as many list elements as you demand, similarly to getContents. Of course, it has the same problem as getContents, namely that you can accidentally close the file before having read all data. But this is applies to any on-demand approach be it with IO or without. Also, you can make the heap in mergesort explicit and obtain something similar to your current approach with Data.Map. The observation is that while mergesort does create a heap, its shape does not change and is determined solely by (length xs). -- convenient invariant: -- the smaller element comes from the left child data Ord b => Heap m b = Leaf m b | Branch b (Tree a b) (Tree a b) -- smart constructor branch :: Ord b => Tree m b -> Tree m b -> Tree m b branch x y | gx <= gy = Branch gx x y | otherwise = Branch gy y x where (gx,gy) = (getMin x, getMin y) -- fromList is the only way to "insert" elements into a heap fromList :: Ord b => [(m,b)] -> Heap m b fromList = foldtree1 branch . map (uncurry Leaf) getMin :: Heap m b -> b getMin (Leaf _ b) = b getMin (Branch b _ _ ) = b deleteMin :: Heap (Reader b) b -> IO (Maybe (Heap (Reader b) b)) deleteMin (Leaf m _) = m >>= return . fmap (Leaf m) deleteMin (Branch _ x y) = do mx' <- deleteMin x return . Just $ case mx' of Just x' -> branch x' y Nothing -> y mergesortIO :: Ord t => [Reader t] -> Writer t -> IO () mergesortIO xs f = ...
Also, I need to support concurrent querying and updates, and trying to manage the locking is quite hard enough as it is, without trying to keep track of which postings vectors have closures pointing to them!
I guess you have considered Software Transactional Memory for atomic operations? http://research.microsoft.com/~simonpj/papers/stm/index.htm Also, write-once-read-many data structures (like lazy evaluation uses them all the time) are probably very easy to get locked correctly. Regards, apfelmus

On 6/22/07, apfelmus
I guess you have considered Software Transactional Memory for atomic operations? http://research.microsoft.com/~simonpj/papers/stm/index.htm
Also, write-once-read-many data structures (like lazy evaluation uses them all the time) are probably very easy to get locked correctly.
STM was *the* justification to the mgt for letting me use Haskell rather than C++. :-) However, you do need to take care, because in this context it would be easy to end up creating great big transactions which conflict with one another, which quite aside from wasting CPU on retries, can in extreme cases lead to starvation. A bit like laziness, STM is fantastic for correctness, but can be a bit obtuse for performance. With that proviso, I think STM is better than sliced bread.[*] Incidentally, I read Herlihy's papers on lock free data structures early on in my work on parallelism and concurrency for Mercury in the mid 90's. What a shame I didn't have the wit to understand them properly at the time, or Mercury might have had STM 10 years ago. :-) T. [*] People who know me well, would realize that since I bake my own bread and slice it with a bread-knife myself, comparison to sliced bread may be faint praise. It isn't. -- Dr Thomas Conway drtomc@gmail.com Silence is the perfectest herald of joy: I were but little happy, if I could say how much.

On Tue, Jun 19, 2007 at 07:26:20PM +0100, Andrew Coppin wrote:
However, Haskell only has 1 type of collection: linked lists. (And only single-linked at that.) While other "normal" programming languages spend huge amounts of effort trying to select exactly the right collection type for the task in hand, Haskell programs only ever use linked lists.
Not true. They also use efficient sets, maps, priority queues, (imperative) hash tables, etc. Take a look at Data.Map and Data.Set, which come with GHC. Best regards Tomek

I don't know where you got the notion that such structures are not
available in Haskell. There are many efficient data structures in the
libraries. Lists are not magical, just popular, natural, and
traditional. Specialized data structures are always important.
Take a look at the Data.* modules in
http://haskell.org/ghc/docs/latest/html/libraries/
Also see
http://www.haskell.org/ghc/docs/edison/
There are many references to be found. You may want to cozy up with
this one if you're really interested.
http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf
http://books.google.com/books?id=SxPzSTcTalAC
On 6/19/07, Andrew Coppin
When I was at university, we learned a programming language known as Smalltalk. I was rather good at it. [Ironically, making "small talk" is one of the things I do worst IRL! But anyway, back to the topic...]
In Smalltalk, there is a wide selection of collection types, all with different facilities and efficiency trade offs. There is bag, set, list, array, ordered list, dictionary, hash table, weak array, etc. A whole menagerie of collection types.
However, Haskell only has 1 type of collection: linked lists. (And only single-linked at that.) While other "normal" programming languages spend huge amounts of effort trying to select exactly the right collection type for the task in hand, Haskell programs only ever use linked lists.
Why is that, exactly? Does writing software in Haskell magically change the properties of these data structures such that lists become more efficient than all the other types? Or is it that other data structures are only efficient when used with in-place updates? (The latter statement appears to be isomorphic to stating that Haskell programs must necessarily be less efficient than impure ones.)
Thoughts?
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Nicolas Frisby wrote:
I don't know where you got the notion that such structures are not available in Haskell. There are many efficient data structures in the libraries. Lists are not magical, just popular, natural, and traditional. Specialized data structures are always important.
Mmm, OK. Maybe it's just that Haskell (or Haskell-related writings) doesn't place quite so much emphasis on them...?
There are many references to be found. You may want to cozy up with this one if you're really interested.
That certainly looks interesting...

I'm not a Haskell guru but I think I can answer this... On Tuesday 19 June 2007 19:26:20 Andrew Coppin wrote:
In Smalltalk, there is a wide selection of collection types, all with different facilities and efficiency trade offs. There is bag, set, list, array, ordered list, dictionary, hash table, weak array, etc. A whole menagerie of collection types.
However, Haskell only has 1 type of collection: linked lists. (And only single-linked at that.) While other "normal" programming languages spend huge amounts of effort trying to select exactly the right collection type for the task in hand, Haskell programs only ever use linked lists.
Why is that, exactly? Does writing software in Haskell magically change the properties of these data structures such that lists become more efficient than all the other types? Or is it that other data structures are only efficient when used with in-place updates? (The latter statement appears to be isomorphic to stating that Haskell programs must necessarily be less efficient than impure ones.)
I believe you are misinterpreting what you see. Like other modern functional languages, Haskell simply provides a different starting point in the context of data structures. Singly linked lists are integrated into the language, with custom syntax, literals and so forth. This makes them easier to use and, consequently, they are more common. If you choose immutability (which has many benefits) then doubly linked lists become comparatively undesirable. In the presence of sum types, pattern matching, views, persistence and concurrency, tree-based collections can be preferable. Also, the Haskell standard library appears to include all of the usual data structures and many more that are made accessible by functional programming and immutability. For example, hash tables, sets and maps: http://haskell.org/ghc/docs/latest/html/libraries/base/Data-HashTable.html http://haskell.org/ghc/docs/latest/html/libraries/base/Data-IntMap.html http://haskell.org/ghc/docs/latest/html/libraries/base/Data-IntSet.html -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. The OCaml Journal http://www.ffconsultancy.com/products/ocaml_journal/?e

On 6/19/07, Andrew Coppin
However, Haskell only has 1 type of collection: linked lists. (And only single-linked at that.) While other "normal" programming languages spend huge amounts of effort trying to select exactly the right collection type for the task in hand, Haskell programs only ever use linked lists.
Um, not quite. Lists may be the most common data structure in a Haskell program, but they certainly aren't the *only* collection type. A quick tour through the Hierarchical Libraries[0] documentation finds: - Data.Map (more commonly known as 'dictionaries') - Data.HashTable (same idea, different runtime characteristics) - many varieties of Data.Array (fixed sized arrays, with quick access to any element) - Data.Sequence - Data.Set - .... The core language also offers tuples, which are a very interesting kind of collection, if very primitive. Also, tree structures are trivial to create that it is often easier to define the precise tree structure you want instead of reusing one of a dozen possible modules. Here's one that's common in many tutorials: data Tree a = Leaf a | Branch (Tree a) (Tree a) deriving (Show, Eq) These are just some of the purely functional data structures[1] that provide collection types within a Haskell program. -- Adam [0]: http://www.haskell.org/ghc/docs/latest/html/libraries/index.html [1]: http://www.google.com/search?q=purely+functional+data+structures

Adam Turoff wrote:
- Data.Map (more commonly known as 'dictionaries')
Ah yes, I've heard about this one.
- Data.HashTable (same idea, different runtime characteristics)
Well, you learn something every day... (And that, of course, is the reason for asking questions in the first place!)
- many varieties of Data.Array (fixed sized arrays, with quick access to any element)
Again, well known... uh... well not "friends" exactly, but yeah.
- Data.Sequence
What's that do?
- Data.Set
IIRC, doesn't that only work for types in Ord?
The core language also offers tuples, which are a very interesting kind of collection, if very primitive.
Tuples are definitely "interesting". They are of course a sort-of "collection", although not in the usually used sense. Here in Haskell, we also have Maybe, another "interesting" collection type...
Also, tree structures are trivial to create that it is often easier to define the precise tree structure you want instead of reusing one of a dozen possible modules.
I read in The Fun of Programming an implementation of [several kinds of] heaps in Haskell. It was a joy to behold... (I don't see any heaps in the standard libraries though...?)

Andrew Coppin writes:
However, Haskell only has 1 type of collection: linked lists. (And only single-linked at that.)
Woah there, what about: Data.Map -- lookup tables Data.Array -- lookup tables with enumerated keys. Mutable interfaces are also available. Data.Sequence -- two-ended sequence type supporting a fast variety of operations quicker than lists Data.Graph -- graph type Data.Set -- unordered collection Data.Tree -- rose tree type And those are just the ones distributed with GHC. -- -David House, dmhouse@gmail.com

David House wrote:
Data.Sequence -- two-ended sequence type supporting a fast variety of operations quicker than lists
Um... I guess I'll have to go research that one a bit...
Data.Graph -- graph type
What would you use that for? (And what does it do?)
Data.Tree -- rose tree type
What's a rose tree? (I only know about binary trees. Well, and N-ary trees... but nobody uses those.)

Andrew Coppin writes:
Data.Graph -- graph type
What would you use that for? (And what does it do?)
It's for graphs, in the graph-theory [1] sense. The referenced page gives a list of example problems in the area, most of which are very practical. [1]: http://en.wikipedia.org/wiki/Graph_theory
Data.Tree -- rose tree type
What's a rose tree? (I only know about binary trees. Well, and N-ary trees... but nobody uses those.)
Well, it is said that a rose tree by any other name would be just as N-ary. (I think they're the same concept :)). -- -David House, dmhouse@gmail.com

David House wrote:
Andrew Coppin writes:
Data.Graph -- graph type
What would you use that for? (And what does it do?)
It's for graphs, in the graph-theory [1] sense.
Yes, I realise that. (I'm not a graph theory expert, but I'm aware of the subject.) But what kind of thing would you use a general graph for? (Rather than some more specific custom data type.)
Data.Tree -- rose tree type
What's a rose tree? (I only know about binary trees. Well, and N-ary trees... but nobody uses those.)
Well, it is said that a rose tree by any other name would be just as N-ary. (I think they're the same concept :)).
LOL! I asked Wikipedia about "rose tree" and got something quite different... ;-)

Just a couple of examples: many non-trivial program analyses (like
optimizations or type-inference) rely on viewing the AST as a graph.
Graph reduction is an evaluation paradigm, and I'm guessing that a
(specification-oriented) interpreter might use a graph.
On 6/20/07, Andrew Coppin
David House wrote:
Andrew Coppin writes:
Data.Graph -- graph type
What would you use that for? (And what does it do?)
It's for graphs, in the graph-theory [1] sense.
Yes, I realise that. (I'm not a graph theory expert, but I'm aware of the subject.) But what kind of thing would you use a general graph for? (Rather than some more specific custom data type.)
Data.Tree -- rose tree type
What's a rose tree? (I only know about binary trees. Well, and N-ary trees... but nobody uses those.)
Well, it is said that a rose tree by any other name would be just as N-ary. (I think they're the same concept :)).
LOL! I asked Wikipedia about "rose tree" and got something quite different... ;-)
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Albert Y. C. Lai wrote:
Andrew Coppin wrote:
But what kind of thing would you use a general graph for? (Rather than some more specific custom data type.)
Representing networks.
Yes... "graph" and "network" are virtually synonymous. I'm still wondering what you'd use a network for in a computer program. (Unless of course you were writing a network simulator, that is...)

On 6/20/07, Andrew Coppin
Yes... "graph" and "network" are virtually synonymous. I'm still wondering what you'd use a network for in a computer program.
Writing a Haskell compiler. http://en.wikibooks.org/wiki/Haskell/Graph_reduction -- Dan

Dan Piponi wrote:
On 6/20/07, Andrew Coppin
wrote: Yes... "graph" and "network" are virtually synonymous. I'm still wondering what you'd use a network for in a computer program.
Writing a Haskell compiler.
True enough - but that's a rather specific task. I'm still not seeing vast numbers of other uses for this...

Andrew Coppin wrote:
Dan Piponi wrote:
On 6/20/07, Andrew Coppin
wrote: Yes... "graph" and "network" are virtually synonymous. I'm still wondering what you'd use a network for in a computer program.
Writing a Haskell compiler.
True enough - but that's a rather specific task. I'm still not seeing vast numbers of other uses for this...
You can see lots of applications for graphs at the following page: http://www.graph-magics.com/practic_use.php That was the second result for http://www.google.com/search?as_q=%22applications%20of%20graph%20algorithms%..., and there are some other results in there that you might find interesting too.

Calvin Smith wrote:
Andrew Coppin wrote:
True enough - but that's a rather specific task. I'm still not seeing vast numbers of other uses for this...
You can see lots of applications for graphs at the following page:
I see a pattern here - these are all the kinds of programs that I'm never likely to ever write. Maybe that's the cause? ;-)

On Wednesday 20 June 2007 19:42:59 Andrew Coppin wrote:
But what kind of thing would you use a general graph for?
Connectivity in networks, being anything from computers on the internet to atoms in a molecule. Most graphs are best represented as sum types (and abstract references, like identifiers) in FPLs like Haskell though, rather than using a more general representation. Abstract syntax trees in compilers and interpreters, and scene graphs in computer graphics being two obvious examples. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. The OCaml Journal http://www.ffconsultancy.com/products/ocaml_journal/?e

On Tue, 19 Jun 2007, Andrew Coppin wrote:
In Smalltalk, there is a wide selection of collection types, all with different facilities and efficiency trade offs. There is bag, set, list, array, ordered list, dictionary, hash table, weak array, etc. A whole menagerie of collection types.
However, Haskell only has 1 type of collection: linked lists. (And only single-linked at that.) While other "normal" programming languages spend huge amounts of effort trying to select exactly the right collection type for the task in hand, Haskell programs only ever use linked lists.
Most Haskell programs use other data structures than lists. Writing Haskell programs doesn't automagically change the the behaviour of linked list into the collection type you need. Have you looked at the Haskell standard library? There are quite many collection types available: sets, maps, hash tables, queues, graphs, you name it. Note that most collection types are not built into the language itself, but are part of the standard library, i.e. the STL in case of C++, the Java standard library, and I presume it is the same with Smalltalk. The equivalent of Haskell's list data type would be the array type of most imperative or object-oriented languages. Both are some sort of basic collection type, good for their own sake, but if you want more specialized collection types, you have to implement them. Regards, Jens

Jens Fisseler wrote:
The equivalent of Haskell's list data type would be the array type of most imperative or object-oriented languages. Both are some sort of basic collection type, good for their own sake, but if you want more specialized collection types, you have to implement them.
Maybe it's just a culture thing then... In your typical OOP language, you spend five minutes thinking "now, what collection type shall I use here?" before going on to actually write the code. In Haskell, you just go "OK, so I'll put a list here..."

I'm afraid I must agree with you a little. Many people use lists when a
different data structure would have been better. It's a pity, because
Haskell provides a large number of different data structures.
On 6/19/07, Andrew Coppin
Jens Fisseler wrote:
The equivalent of Haskell's list data type would be the array type of most imperative or object-oriented languages. Both are some sort of basic collection type, good for their own sake, but if you want more specialized collection types, you have to implement them.
Maybe it's just a culture thing then... In your typical OOP language, you spend five minutes thinking "now, what collection type shall I use here?" before going on to actually write the code. In Haskell, you just go "OK, so I'll put a list here..."
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Jun 19, 2007, at 16:23 , Andrew Coppin wrote:
Jens Fisseler wrote:
The equivalent of Haskell's list data type would be the array type of most imperative or object-oriented languages. Both are some sort of basic collection type, good for their own sake, but if you want more specialized collection types, you have to implement them.
Maybe it's just a culture thing then... In your typical OOP language, you spend five minutes thinking "now, what collection type shall I use here?" before going on to actually write the code. In Haskell, you just go "OK, so I'll put a list here..."
Haskell is, in many ways, a descendant of Lisp. This does tend to lead to lists being *the* collection type, in my experience: sure, others get used, but lists are the ones you see in examples and such. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

On Tue, 2007-06-19 at 18:49 -0400, Brandon S. Allbery KF8NH wrote:
On Jun 19, 2007, at 16:23 , Andrew Coppin wrote:
Jens Fisseler wrote:
The equivalent of Haskell's list data type would be the array type of most imperative or object-oriented languages. Both are some sort of basic collection type, good for their own sake, but if you want more specialized collection types, you have to implement them.
Maybe it's just a culture thing then... In your typical OOP language, you spend five minutes thinking "now, what collection type shall I use here?" before going on to actually write the code. In Haskell, you just go "OK, so I'll put a list here..."
Haskell is, in many ways, a descendant of Lisp. This does tend to lead to lists being *the* collection type, in my experience: sure, others get used, but lists are the ones you see in examples and such.
Not in my experience. Certainly lists are used all over the place*, but I rarely see them abused. Also, "lists" aren't lists in Lisp, they're more akin to rose-trees (or going the other way, there are only pairs in Lisp). In practice, almost all Haskell programs use custom defined algebraic data types which are usually tree like. Declaring and using data types is easier in Haskell than it is in almost any other language. * As others have mentioned, lists represent loops and loops are extremely common in programming in general.

Derek Elkins wrote:
On Tue, 2007-06-19 at 18:49 -0400, Brandon S. Allbery KF8NH wrote:
Haskell is, in many ways, a descendant of Lisp. This does tend to lead to lists being *the* collection type, in my experience: sure, others get used, but lists are the ones you see in examples and such.
Not in my experience. Certainly lists are used all over the place*, but I rarely see them abused. Also, "lists" aren't lists in Lisp, they're more akin to rose-trees (or going the other way, there are only pairs in Lisp).
In practice, almost all Haskell programs use custom defined algebraic data types which are usually tree like. Declaring and using data types is easier in Haskell than it is in almost any other language.
True...
* As others have mentioned, lists represent loops and loops are extremely common in programming in general.
Um... surely *every* collection type represents a loop?

I don't think the collection type (a,b) is best thought of as a loop.
Neither is a (non-trivial) tree.
On 6/20/07, Andrew Coppin
Derek Elkins wrote:
On Tue, 2007-06-19 at 18:49 -0400, Brandon S. Allbery KF8NH wrote:
Haskell is, in many ways, a descendant of Lisp. This does tend to lead to lists being *the* collection type, in my experience: sure, others get used, but lists are the ones you see in examples and such.
Not in my experience. Certainly lists are used all over the place*, but I rarely see them abused. Also, "lists" aren't lists in Lisp, they're more akin to rose-trees (or going the other way, there are only pairs in Lisp).
In practice, almost all Haskell programs use custom defined algebraic data types which are usually tree like. Declaring and using data types is easier in Haskell than it is in almost any other language.
True...
* As others have mentioned, lists represent loops and loops are extremely common in programming in general.
Um... surely *every* collection type represents a loop?
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Lennart Augustsson wrote:
I don't think the collection type (a,b) is best thought of as a loop.
True. That's a rather special type; I haven't seen anything remotely like it in any other language.
Neither is a (non-trivial) tree.
Erm... Depends on your idea of "loop" I suppose. A tree represents a recursive loop quite nicely. ;-) My point of course was that an array is a loop just as much as a list is. Same goes for a set. Or even a dictionary (looping over key/value pairs), whichever way it's implemented internally.

On 6/21/07, Andrew Coppin
Lennart Augustsson wrote:
I don't think the collection type (a,b) is best thought of as a loop.
True. That's a rather special type; I haven't seen anything remotely like it in any other language.
Is it that special? How is it different to the C++ STL std::pair template type? I must be missing something. -- Dr Thomas Conway drtomc@gmail.com Silence is the perfectest herald of joy: I were but little happy, if I could say how much.

Thomas Conway wrote:
On 6/21/07, Andrew Coppin
wrote: Lennart Augustsson wrote:
I don't think the collection type (a,b) is best thought of as a loop.
True. That's a rather special type; I haven't seen anything remotely like it in any other language.
Is it that special? How is it different to the C++ STL std::pair template type? I must be missing something.
I've never "seen" C++. (Well, not properly anyway...)

On Wednesday 20 June 2007 20:04:25 Andrew Coppin wrote:
Lennart Augustsson wrote:
I don't think the collection type (a,b) is best thought of as a loop.
True. That's a rather special type; I haven't seen anything remotely like it in any other language.
Are you referring to a 2-tuple/pair? -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. The OCaml Journal http://www.ffconsultancy.com/products/ocaml_journal/?e

Andrew Coppin wrote:
[...] type (a,b) [...]
That's a rather special type; I haven't seen anything remotely like it in any other language.
This type isn't that special in Haskell (apart from being syntax-sugared), it could be defined as data Pair a b = Pair a b The equivalent of this definition introduces pairs in other languages, too. Consider Java: public class Pair { public A fst; public B snd; public Pair(A fst, B snd) { this.fst = fst; this.snd = snd; } } But there's Lisp, wich doesn't allow custom algebraic data types, but instead build all data from pairs. They are called cons cells, and could be defined like this in Haskell: data Cons = Nil | Cons Cons Cons In Lisp, pairs are indeed special.
A tree represents a recursive loop quite nicely. ;-)
What's a recursive loop?
My point of course was that an array is a loop just as much as a list is.
No, it isn't. A loop has the following properties: (1) you can access only the current value (2) you can move only forward by computing some new current value A (single linked) list has the following properties: (1) you can access only the current value (2) you can move only forward by following the next pointer An array has the following properties: (1) you can access each value (2) you don't need to move around In a lazy language, "following the next pointer" triggers "computing the new value", so loops are similar to lists, but different from arrays.
[...] whichever way it's implemented internally.
The point is: Some usage of Haskell lists is internally implemented as loops. for example this haskell code let result = 1 : zipWith (+) result result in result !! 10 is equivalent to this c code int result = 1; for (int i = 0; i < 10; i++) result = result + result; and is hopefully compiled to something like this c code. Of course, you can loop over most collections, in the sense of repeatedly running some code for each element. This is expressed in Haskell in a bunch of type classes, most notably Functor.
Same goes for a set. Or even a dictionary (looping over key/value pairs), whichever way it's implemented internally.
I take your "wichever way" to apply to all collections you mentioned. Let's consider this set representation: type Set a = a -> Bool dual a x = not (a x) member y a = a y notMember y a = dual a y empty y = False insert x a y = x == y || a y remove a x y = x /= y && a y union a b y = a y || b y intersection a b y = a y && b y difference a b = intersection a (dual b) filter = intersection (Function names and their meaning is taken from Data.Set. a and b stands for sets, x and y for set elements and f for some function. the dual set is the set containing all elements except those in the original set) What has a set represented like this in common with a loop? Tillmann

Tillmann Rendel wrote:
Andrew Coppin wrote:
[...] type (a,b) [...]
That's a rather special type; I haven't seen anything remotely like it in any other language.
This type isn't that special in Haskell (apart from being syntax-sugared), it could be defined as
data Pair a b = Pair a b
The equivalent of this definition introduces pairs in other languages, too. Consider Java:
public class Pair { public A fst; public B snd; public Pair(A fst, B snd) { this.fst = fst; this.snd = snd; } }
OK, I don't even understand that syntax. Have they changed the Java language spec or something?
My point of course was that an array is a loop just as much as a list is.
No, it isn't. A loop has the following properties:
(1) you can access only the current value (2) you can move only forward by computing some new current value
A (single linked) list has the following properties:
(1) you can access only the current value (2) you can move only forward by following the next pointer
An array has the following properties:
(1) you can access each value (2) you don't need to move around
In a lazy language, "following the next pointer" triggers "computing the new value", so loops are similar to lists, but different from arrays.
I just observe that in most programs, almost all looping constructs are to do with looping over *collections*, and it doesn't really matter what kind of collections they are...
Of course, you can loop over most collections, in the sense of repeatedly running some code for each element. This is expressed in Haskell in a bunch of type classes, most notably Functor.
Sadly, in Haskell you can't do that to "collections" that can only contain a specific type of element. I'm kinda hoping that Associated Types will fix this...?
Same goes for a set. Or even a dictionary (looping over key/value pairs), whichever way it's implemented internally.
I take your "wichever way" to apply to all collections you mentioned. Let's consider this set representation:
type Set a = a -> Bool
dual a x = not (a x) member y a = a y notMember y a = dual a y empty y = False insert x a y = x == y || a y remove a x y = x /= y && a y union a b y = a y || b y intersection a b y = a y && b y difference a b = intersection a (dual b) filter = intersection
(Function names and their meaning is taken from Data.Set. a and b stands for sets, x and y for set elements and f for some function. the dual set is the set containing all elements except those in the original set)
What has a set represented like this in common with a loop? I don't even understand that... :-S

public class Pair { public A fst; public B snd; public Pair(A fst, B snd) { this.fst = fst; this.snd = snd; } }
OK, I don't even understand that syntax. Have they changed the Java language spec or something?
Yes. As of version 5 (or 1.5, or whatever you want to call it), Java has parametric polymorphism. Do a Google search for "Java generics". -Brent

Brent Yorgey wrote:
OK, I don't even understand that syntax. Have they changed the Java language spec or something?
Yes. As of version 5 (or 1.5, or whatever you want to call it), Java has parametric polymorphism. Do a Google search for "Java generics".
OMG - they actually added a language feature to Java... o_O I bet it's broken though! ;-)

It's not that broken, It was designed by people from the FP community. :)
On 6/21/07, Andrew Coppin
Brent Yorgey wrote:
OK, I don't even understand that syntax. Have they changed the Java language spec or something?
Yes. As of version 5 (or 1.5, or whatever you want to call it), Java has parametric polymorphism. Do a Google search for "Java generics".
OMG - they actually added a language feature to Java... o_O
I bet it's broken though! ;-)
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Andrew Coppin wrote:
I don't even understand that... :-S
Ok, I'll try to explain it: I represent sets by their characteristic function, wich returns True for members of the set, and False for other values. type Set a = a -> Bool For example, the set of numbers containing only 42 is represented by answerSet = (== 42) and the set of even numbers by evenSet = even Given this representation, we can easily encode a very fundamental set operation, namely taking the dual set. The dual set is the set containing all values (of a given type) except those contained in the original set. The dual set of the set containing only 42 is the set containing every number except 42. The dual set of the set of even numbers is the set of odd numbers. To implement the dual operation, we start with it's type signature. dual should take a set and return a set: dual :: Set a -> Set a By substituting the definition of Set, we have dual :: (a -> Bool) -> (a -> Bool) The last pair of parentheses is redundant, so we arrive at dual :: (a -> Bool) -> a -> Bool So dual can be seen as having two arguments, the original set and the value to test for membership in the dual set. The implementation is easy: the value is in the dual set, if it isn't in the original set:
dual a y = not (a y)
Remember that a y is True, if and only if a contains y. so (not (a y)) is False, if and only if a contains y, wich is exactly what we want for the characteristic function of the dual set. (I'm not actually sure if dual sets are called dual sets. I've chosen the word because dual sets arise from dual characteristic functions, in the "swap the meaning of True and False" sense of dual). The other operations are implemented likewise. If you're interested, try to the write their type signatures to understand what each parameter means. Tillmann

Andrew Coppin writes:
Maybe it's just a culture thing then... In your typical OOP language, you spend five minutes thinking "now, what collection type shall I use here?" before going on to actually write the code. In Haskell, you just go "OK, so I'll put a list here..."
The design stage of any Haskell program should include a lot of time thinking about your data structures, type classes, and how they all interact. If anything this plays a larger role than in OOP. -- -David House, dmhouse@gmail.com

On Tue, 19 Jun 2007, Andrew Coppin wrote:
Maybe it's just a culture thing then... In your typical OOP language, you spend five minutes thinking "now, what collection type shall I use here?" before going on to actually write the code. In Haskell, you just go "OK, so I'll put a list here..."
I seriously doubt this. You kind of mix two things, languages and libraries. Collections will most of the time be implemented as a library. So, in order to use the appropriate collection type, you have to know you're standard library or know of other libraries available. Then, whenever you need an appropriate collection type, you look what's available. It's this way with Java, with C++ and *definitely* with Haskell, too. Regards, Jens

Andrew Coppin wrote:
When I was at university, we learned a programming language known as Smalltalk. I was rather good at it. [Ironically, making "small talk" is one of the things I do worst IRL! But anyway, back to the topic...]
In Smalltalk, there is a wide selection of collection types, all with different facilities and efficiency trade offs. There is bag, set, list, array, ordered list, dictionary, hash table, weak array, etc. A whole menagerie of collection types.
However, Haskell only has 1 type of collection: linked lists. (And only single-linked at that.) While other "normal" programming languages spend huge amounts of effort trying to select exactly the right collection type for the task in hand, Haskell programs only ever use linked lists.
Why is that, exactly? Does writing software in Haskell magically change the properties of these data structures such that lists become more efficient than all the other types? Or is it that other data structures are only efficient when used with in-place updates? (The latter statement appears to be isomorphic to stating that Haskell programs must necessarily be less efficient than impure ones.)
Thoughts?
You seem to only bee looking at 2 line long functions in Haskell. Partly the use of lists comes from the fact that iterating over a sequence of values is a very very common operation (the C++ STL's forward iterators; java has 'foreach' syntax). Haskell's lazy list captures this idiom, and the compiler can apply useful optimizations (deforesting). But Haskell does comes with many data structures that most programs use: http://haskell.org/ghc/docs/latest/html/libraries/ documents what comes with GHC: Data.ByteString ( new and very efficient operations ) Data.Sequence ( based on finger trees ) Data.Tree Data.HashTable Data.Map Data.IntMap Data.Set Data.IntSet Data.PackedString Data.Queue Data.Graph For arrays there are the immutable: Data.Array Data.Array.Diff (secretly mutates for efficiency) Data.Array.Unboxed And mutable arrays: Data.Array.IO (boxed and unboxed) Data.Array.ST (boxed and unboxed) And for interfacing with the world: Foreign.Marshal.Array Data.Array.Storable http://hackage.haskell.org/packages/archive/pkg-list.html#cat:Data%20Structu... lists "collections" and "Edison" which both aim to implement a family of collections. And http://haskell.org/haskellwiki/Applications_and_libraries/Data_structures has more information on the wiki. -- Chris

*Albert Y. C. Lai wrote: *
I am actually amazed that we have a comprehensive, widely-distributed package for graphs and graph algorithms. I can't say the same about other programming communities (they tend to re-implement it every time).
Yeah... pitty the documentation isn't more readable. :-(

Hallo,
On 6/19/07, Andrew Coppin
However, Haskell only has 1 type of collection: linked lists. (And only single-linked at that.) While other "normal" programming languages spend huge amounts of effort trying to select exactly the right collection type for the task in hand, Haskell programs only ever use linked lists.
Huh? -- -alex http://www.ventonegro.org/

Take a look at http://www.haskell.org/ghc/docs/latest/html/libraries/index.html There are more elsewhere. I am actually amazed that we have a comprehensive, widely-distributed package for graphs and graph algorithms. I can't say the same about other programming communities (they tend to re-implement it every time).

A lot of people have had comments on this thread, but I have a off-hand question: what data types are required by the 98 standard? I figured it was just lists & tuples because they have syntactic support, but is that true? Thanks, Creighton

On Tue, Jun 19, 2007 at 06:42:22PM -0500, Creighton Hogg wrote:
A lot of people have had comments on this thread, but I have a off-hand question: what data types are required by the 98 standard? I figured it was just lists & tuples because they have syntactic support, but is that true?
According to http://haskell.org/onlinereport we have: () - unit (,+) - tuples (->) - functions Array - arrays Bool - booleans BufferMode - buffering CalendarTime - parsed human time Char - characters ClockTime - time_t Complex - complexes Day - duh Double - dp. floating point Either - sums ExitCode - for system etc Float - single precision Handle - filehandles HandlePosn - fpos_t IO - actions IOError - exceptions IOMode - open mode Int - fixnums Integer - bignums Maybe - optionality Month - duh Ordering - trichotomy Permissions - filemodes Ratio - rationals SeekMode - whence StdGen - standard rng seed TimeDiff - parsed human timediff TimeLocale - for formatting times [] - lists Of these, Array, Maybe, (,), (), (->), and [] can be considered general containers. PS: why is there no index for the libraries - I had to compile that list manually :( Stefan
participants (25)
-
Adam Turoff
-
ajb@spamcop.net
-
Albert Y. C. Lai
-
Alex Queiroz
-
Andrew Coppin
-
apfelmus
-
Brandon S. Allbery KF8NH
-
Brent Yorgey
-
Calvin Smith
-
Chris Kuklewicz
-
Creighton Hogg
-
Dan Piponi
-
David House
-
Derek Elkins
-
Duncan Coutts
-
Jens Fisseler
-
Jens Fisseler
-
Jon Harrop
-
Lennart Augustsson
-
Nicolas Frisby
-
Spencer Janssen
-
Stefan O'Rear
-
Thomas Conway
-
Tillmann Rendel
-
Tomasz Zielonka