
Hello,
While we began discussing Proposal II with Ivan Lazar Miljenovic, I was
expecting first to discuss Proposal I, as my short-term goal is to get the
PR associated with Proposal I to be merged : https://github.com/haskell/
containers/pull/549
So please bring forward any remarks / concerns / approvals, for Proposal I
in priority!
Thank you,
Olivier
2018-04-02 13:30 GMT+02:00 Olivier S.
Hello,
I'm resending this proposal, which is simplified w.r.t the first one, and where I removed a wrong analysis of a benchmark.
- Proposal I : Optimize the time complexity of (key -> Maybe Vertex) lookups and graph creation when keys are Integral and consecutive.
(The related PR for proposal I is [1], including benchmarks showing the performance improvements.)
Currently, (key -> Maybe Vertex) lookups returned by graphFromEdges consist of a binary search on an array, with a time complexity of O(log V) (I will use V for "Count of vertices", E for "Count of edges").
When key is Integral, and keys (of nodes passed to the graph creation function) form a set of /consecutive/ values (for example : [4,5,6,7] or [5,6,4,7]), we can have an O(1) lookup by substracting the value of the smallest key, and checking for bounds.
Hence, graph creation complexity is improved, and user algorithms using (key -> Maybe Vertex) lookups will see their complexity reduced by a factor of up-to O(log V).
The PR introduces this lookup and uses it for functions graphFromEdgesWithConsecutiveKeys and graphFromEdgesWithConsecutiveA scKeys.
Here is a summary of complexities for (graph creation, lookup function) as they stand in the current state of the PR:
- graphFromEdges (the currently existing function): O( (V+E) * log V ), O(log V) - graphFromEdgesWithConsecutiveKeys (new function): O( E + (V*log V) ), O(1) - graphFromEdgesWithConsecutiveAscKeys (new function) : O( V+E ), O(1)
- Proposal II : Deprecate `graphFromEdges` taking [(node, key, [key])] in favor of `graphFromMap` taking (Map key (node,[key]))
If we pass the same key twice in the list we pass to 'graphFromEdges' it is undefined which node for that key will actually be used. Introducing 'graphFromMap', taking a (Map key (node,[key]) would alleviate this issue, through the type used.
Also, using a Map makes the implementation a bit more "natural" : there is no need for sorting by key, as Map.toAscList gives exactly the sorted list we want.
We could also deprecate graphFromEdgesWithConsecutiveKeys and graphFromEdgesWithConsecutiveAscKeys (introduced in proposal I) in favor of graphFromConsecutiveMap.
About the naming, I propose two different schemes:
Either: - graphFromEdges (takes a List, deprecated, existing function) - graphFromEdgesInMap (takes a Map) - graphFromEdgesInConsecutiveMap (takes a Map with consecutive keys) Or: - graphFromEdges (takes a List, deprecated, existing function) - graphFromMap - graphFromConsecutiveMap with these, to reflect the Map / List duality in the naming scheme: - graphFromList (takes a List, deprecated, redirects to graphFromEdges) - graphFromConsecutiveList (takes a List, deprecated, redirects to graphFromEdgesWithConsecutiveKeys) - graphFromConsecutiveAscList (takes a List, deprecated, redirects to graphFromEdgesWithConsecutiveAscKeys)
Cheers, Olivier Sohn