Re: Proposal for Data.Graph : Improve graph creation complexity when nodes have "consecutive" keys

2 Apr 2018


      2018-04-03 0:18 GMT+02:00 Ivan Lazar Miljenovic :
...
(Sending this back to the libraries@ list as well.)
On 2 April 2018 at 23:59, Olivier S.  wrote:
...
2018-04-02 14:34 GMT+02:00 Ivan Lazar Miljenovic
:
...
On 2 April 2018 at 21:30, Olivier S.  wrote:
...
Hello,
I'm resending this proposal, which is simplified w.r.t the first one,
and
where I removed a wrong analysis of a benchmark.
- Proposal I : Optimize the time complexity of (key -> Maybe Vertex)
lookups
and graph creation when keys are Integral and consecutive.
(The related PR for proposal I is [1], including benchmarks showing
the
...
...
...
performance improvements.)
Currently, (key -> Maybe Vertex) lookups returned by graphFromEdges
consist
of a binary search on an array, with a time complexity of O(log V) (I
will
use V for "Count of vertices", E for "Count of edges").
At the risk of bikeshedding, can you please use |V| and |E| to refer
to the order and size of the graph respectively?
Do you mean in the haddock documentation for complexities or here? I
don't
know which is mor readable, O( (V+E) * log V ) or O( (|V|+|E|) * log |V|
).
Anyway it would be a quick change in the PR, I'm not particularly
attached
to the notation.
Both.  |V| and |E| are more standard for this, as V and E represent
the vertices and edges themselves.
...
...
...
When key is Integral, and keys (of nodes passed to the graph creation
function) form a set of /consecutive/ values (for example : [4,5,6,7]
or
[5,6,4,7]), we can have an O(1) lookup by substracting the value of
the
smallest key, and checking for bounds.
I'm not sure I follow this part; are you ignoring order in these lists
(you're referring to sets but using list notation)?
I'm not ignoring the order, let me try to give a more precise definition:
keys is a list of consecutive keys iff it verifies:
-- (1) keys contains no duplicates
Set.size (Set.fromList keys) == length keys
-- (2) there is no "gap" between values, when sorted:
sort keys == [minimum keys .. maximum keys]
The O(1) lookup is at line 516 of Data/Graph.hs in
https://github.com/haskell/containers/pull/549/files (key_vertex)
...
...
Hence, graph creation complexity is improved, and user algorithms
using
...
(key
-> Maybe Vertex) lookups will see their complexity reduced by a factor
of
up-to O(log V).
The PR introduces this lookup and uses it for functions
graphFromEdgesWithConsecutiveKeys and
graphFromEdgesWithConsecutiveAscKeys.
Here is a summary of complexities for (graph creation, lookup
function)
as
they stand in the current state of the PR:
- graphFromEdges (the currently existing function):
O( (V+E) * log V ), O(log V)
- graphFromEdgesWithConsecutiveKeys (new function):
O( E + (V*log V) ), O(1)
- graphFromEdgesWithConsecutiveAscKeys (new function) :
O( V+E ), O(1)
- Proposal II : Deprecate `graphFromEdges` taking [(node, key, [key])]
in
favor of `graphFromMap` taking (Map key (node,[key]))
If we pass the same key twice in the list we pass to 'graphFromEdges'
it
is
undefined which node for that key will actually be used.
Introducing 'graphFromMap', taking a (Map key (node,[key]) would
alleviate
this issue, through the type used.
Off the top of my head, I'm not a big fan of this.  If we're going to
improve this, then I'd prefer to do so in such a way that allowed for
usage with IntMap
Yes, IntMap seems to be better wrt performances than Map. Quoting the
doc of
IntMap:
This data structure performs especially well on binary operations like
union
and intersection. However, my benchmarks show that it is also (much)
faster
on insertions and deletions when compared to a generic size-balanced map
implementation (see Data.Map).
...
(is there an existing type-class that covers
association list-style data structures?).
There is the Map type-class (which I just discovered) in :
https://hackage.haskell.org/package/collections-api-1.0.0.
0/docs/Data-Collections.html#g:4
Except that it's in another library ;-)
So it seems using Data.IntMap would be a good compromise?
...
...
With instances defined here, but only for Lazy versions Data.Map and
Data.IntMap:
Note that the data structures for the Lazy and Strict variants of
[Int]Map are the same, it's just the strictness of the functions that
operate on them that differ.
That's interesting, I wasn't aware of this.
...
...
https://hackage.haskell.org/package/collections-base-
instances-1.0.0.0/docs/Data-Collections-BaseInstances.html
...
Also, the type-class class doesn't have toAscList (or toList) functions
(which is what we would use in the implementation).
So if we want to rely on this we would need to implement toAscList, and
probably add instances for Strict maps (Data.IntMap.Strict,
Data.Map.Strict)
...
...
Ideally you could also use
HashMap from unordered-containers as well, but since we ultimately
want `type Vertex = Int` I'm not sure if that's worth it; IntMap,
however, is.
I see another problem with HashMap : it doesn't provide a toAscList
...
where the keys are sorted, so we would have to sort them, incurring a
fixed
O(V log V) cost, whereas with Map and IntMap the user has the
...
create the map from an ascending list (fromAscList), in O(V) time and we
can
get the list back also (toAscList) in O(V) time.
...
...
Also, using a Map makes the implementation a bit more "natural" :
function
possibility to
there
...
...
...
is
no need for sorting by key, as Map.toAscList gives exactly the sorted
list
we want.
We could also deprecate graphFromEdgesWithConsecutiveKeys and
graphFromEdgesWithConsecutiveAscKeys (introduced in proposal I) in
favor
of
graphFromConsecutiveMap.
About the naming, I propose two different schemes:
Either:
    - graphFromEdges                 (takes a List, deprecated,
existing
function)
    - graphFromEdgesInMap            (takes a Map)
    - graphFromEdgesInConsecutiveMap (takes a Map with consecutive
keys)
Or:
    - graphFromEdges                 (takes a List, deprecated,
existing
function)
    - graphFromMap
    - graphFromConsecutiveMap
 with these, to reflect the Map / List duality in the naming scheme:
    - graphFromList               (takes a List, deprecated, redirects
to
graphFromEdges)
    - graphFromConsecutiveList    (takes a List, deprecated, redirects
to
graphFromEdgesWithConsecutiveKeys)
    - graphFromConsecutiveAscList (takes a List, deprecated, redirects
to
graphFromEdgesWithConsecutiveAscKeys)
Cheers,
Olivier Sohn
[1] https://github.com/haskell/containers/pull/549
_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
http://IvanMiljenovic.wordpress.com
--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
http://IvanMiljenovic.wordpress.com