ANNOUNCE: hierarchical-clustering and gsc-weighting

Hello!

I'm pleased to announce the release of two new packages:

http://hackage.haskell.org/package/hierarchical-clustering
http://hackage.haskell.org/package/gsc-weighting

'hierarchical-clustering' provides a function to create a dendrogram from a list of items and a distance function between them. The most common linkage types are available: single linkage, complete linkage and UPGMA. An item can be anything, for example a DNA sequence, so this may be used to create a phylogenetic tree.

Alternatively, it may be used with the 'gsc-weighting' package to assign weights to the items. Weights are assigned such that close items get smaller weights than distant items, which avoids over-representing groups of closely related items. The package name comes from the authors of the algorithm: Gerstein, Sonnhammer and Chothia. Again, this may be used for DNA or protein sequences.

Cheers!

--
Felipe.
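To make the weighting idea concrete, here is a minimal sketch of GSC-style weighting over a dendrogram. It is not the code of the 'gsc-weighting' package: the Dendrogram type, the function names weightsBelow and gscWeights, the handling of zero-height branches and the weight given to a lone item are all assumptions made for illustration. Each leaf starts with the length of the edge to its parent, and every edge higher up is shared among the leaves below it in proportion to their current weights.

```haskell
-- Sketch of GSC-style weighting; NOT the gsc-weighting package's code.
-- The Dendrogram type and every name below are assumptions.

-- | Leaves hold items; a branch stores the height (merge distance)
-- at which its two subtrees were joined. Leaves sit at height 0.
data Dendrogram a
  = Leaf a
  | Branch Double (Dendrogram a) (Dendrogram a)
  deriving (Show, Eq)

-- | Weights of the leaves under a node, including their share of the
-- edge that joins this node to a parent at height 'parentHeight'.
weightsBelow :: Double -> Dendrogram a -> [(a, Double)]
weightsBelow parentHeight (Leaf x) = [(x, parentHeight)]
weightsBelow parentHeight (Branch h l r) =
    [ (x, w + share w) | (x, w) <- ws ]
  where
    ws    = weightsBelow h l ++ weightsBelow h r
    total = sum (map snd ws)
    edge  = parentHeight - h
    -- Share the edge in proportion to the current weights.
    -- Assumption: if all weights are zero, split the edge equally.
    share w
      | total > 0 = edge * w / total
      | otherwise = edge / fromIntegral (length ws)

-- | Weights for every item in the tree. The root has no edge above it.
gscWeights :: Dendrogram a -> [(a, Double)]
gscWeights (Leaf x)         = [(x, 1)]  -- assumption: a lone item gets weight 1
gscWeights t@(Branch h _ _) = weightsBelow h t

main :: IO ()
main = print (gscWeights example)
  where
    -- 'a' and 'b' are close (joined at 1); 'c' is far from both (joined at 4).
    example = Branch 4 (Branch 1 (Leaf 'a') (Leaf 'b')) (Leaf 'c')
```

For the example tree, the close pair 'a' and 'b' each end up with weight 2.5, while the distant item 'c' gets 4. The weights here are left unnormalized, which may differ from what the package returns.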

Felipe Lessa writes:
Hello!
I'm pleased to announce the release of two new packages:
http://hackage.haskell.org/package/hierarchical-clustering
http://hackage.haskell.org/package/gsc-weighting
'hierarchical-clustering' provides a function to create a dendrogram from a list of items and a distance function between them. The most common linkage types are available: single linkage, complete linkage and UPGMA. An item can be anything, for example a DNA sequence, so this may be used to create a phylogenetic tree.
What actual clustering algorithm are you using here?

Also, would it be possible to have some more documentation there in general? At the very least, in your next release explain what a dendrogram is and why someone would want to use your package (I had to do some quick Wikipedia reading to refresh my memory on what a dendrogram, etc. is, to get an understanding of what it does).

--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

On Tue, Aug 3, 2010 at 8:01 AM, Ivan Lazar Miljenovic wrote:
Felipe Lessa writes:
'hierarchical-clustering' provides a function to create a dendrogram from a list of items and a distance function between them. The most common linkage types are available: single linkage, complete linkage and UPGMA. An item can be anything, for example a DNA sequence, so this may be used to create a phylogenetic tree.
What actual clustering algorithm are you using here?
A naïve O(n^2) algorithm using a distance matrix. This can be improved without changing the API, however.
Also, would it be possible to have some more documentation there in general? At the very least, in your next release explain what a dendrogram is and why someone would want to use your package (I had to do some quick Wikipedia reading to refresh my memory on what a dendrogram, etc. is, to get an understanding of what it does).
Documentation is always good, but I didn't want to take the time to explain everything from the beginning. I guess most people coming to this package will already know that they want a dendrogram. But if they don't, a quick googling is very effective. Hmm, I guess some diagrams would be nice.

I only took the time to explain why there is both an "UPGMA" and a "FakeAverageLinkage", because that distinction isn't easy to find on the web. Actually, I still haven't found anyone talking about it, just people using either one under the same name, "average linkage". =)

Cheers,

--
Felipe.

On Tue, Aug 3, 2010 at 8:23 AM, Felipe Lessa wrote:
On Tue, Aug 3, 2010 at 8:01 AM, Ivan Lazar Miljenovic wrote:
Felipe Lessa writes:
'hierarchical-clustering' provides a function to create a dendrogram from a list of items and a distance function between them. The most common linkage types are available: single linkage, complete linkage and UPGMA. An item can be anything, for example a DNA sequence, so this may be used to create a phylogenetic tree.
What actual clustering algorithm are you using here?
A naïve O(n^2) algorithm using a distance matrix. This can be improved without changing the API, however.
What a blunder! I mean, an O(n^3) algorithm: each step takes O(n^2), and you need 'n' steps to create the whole dendrogram. I'll fix the documentation in the next release.

Cheers! =)

--
Felipe.
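For readers who want to see where the O(n^3) comes from, here is a minimal sketch of naive agglomerative clustering. It is not the package's implementation: the types and the names naiveDendrogram, clusterDistance and elements are assumptions, and instead of maintaining a distance matrix it recomputes cluster distances from the items to keep the code short. Each merge step evaluates the item distance for every pair of items in different clusters, which is fewer than n^2 evaluations, and n - 1 merges are needed, giving O(n^3) distance evaluations overall.

```haskell
import Data.List (minimumBy, tails)
import Data.Ord (comparing)

-- Illustrative types only; NOT the hierarchical-clustering package's code.
data Dendrogram a
  = Leaf a
  | Branch Double (Dendrogram a) (Dendrogram a)
  deriving (Show, Eq)

data Linkage = SingleLinkage | CompleteLinkage | UPGMA
  deriving (Show, Eq)

-- | All items stored in a (sub)tree.
elements :: Dendrogram a -> [a]
elements (Leaf x)       = [x]
elements (Branch _ l r) = elements l ++ elements r

-- | Distance between two clusters, recomputed from their items:
-- minimum, maximum or mean over all cross-cluster item pairs.
clusterDistance
  :: Linkage -> (a -> a -> Double) -> Dendrogram a -> Dendrogram a -> Double
clusterDistance linkage dist c1 c2 =
  case linkage of
    SingleLinkage   -> minimum ds
    CompleteLinkage -> maximum ds
    UPGMA           -> sum ds / fromIntegral (length ds)
  where
    ds = [dist x y | x <- elements c1, y <- elements c2]

-- | Merge the closest pair of clusters until a single tree remains.
-- Returns Nothing for an empty list of items.
naiveDendrogram
  :: Linkage -> (a -> a -> Double) -> [a] -> Maybe (Dendrogram a)
naiveDendrogram linkage dist = go . map Leaf
  where
    go []  = Nothing
    go [c] = Just c
    go cs  = go (Branch d x y : rest)
      where
        indexed = zip [0 :: Int ..] cs
        -- One step: every unordered pair of current clusters.
        candidates =
          [ (clusterDistance linkage dist p q, (m, p), (n, q))
          | (m, p) : later <- tails indexed, (n, q) <- later ]
        (d, (i, x), (j, y)) =
          minimumBy (comparing (\(e, _, _) -> e)) candidates
        rest = [c | (k, c) <- indexed, k /= i, k /= j]

absDist :: Double -> Double -> Double
absDist a b = abs (a - b)

main :: IO ()
main = mapM_ (\l -> print (naiveDendrogram l absDist [0, 1, 3, 7]))
             [SingleLinkage, CompleteLinkage, UPGMA]
```

The three linkage types differ only in how clusterDistance summarizes the cross-cluster distances, so on the same points they can produce the same tree shape with different branch heights.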