A rule of thumb that has served me well w.r.t. exposing internal modules is to expose a Data.Foo.Internal module, but make it clear that it is a very fragile interface.

I would even go so far as to say that this module does not follow the PVP and that users should expect breaking changes to come fast and often. Users can then only safely depend on it with minor-version-specific bounds. This ameliorates the concern that exposing internals ties your hands as an implementor.

Breaking this API shouldn't require discussion on the mailing lists, as it is an internal implementation detail. This should further ameliorate concerns about it tying your hands as an implementor.

This lets users who need to write performant code avoid forking the entire package. (I've had to do this before with Map, Text, and other packages that have been rather hidebound about not exposing implementation details. It sucks.)

My experience is that maybe 1-2% of your users need it, but when they need it, it is the difference between the package being usable and having to be replaced completely with something else. They are also the kind of users who understand the need for the best possible implementation and who will roll with the punches.

Chasing after changes in the implementation is generally far less work than maintaining an entire fork.

-Edward

On Mon, Oct 7, 2013 at 4:10 AM, Milan Straka <fox@ucw.cz> wrote:

Hi all,

> -----Original message-----
> From: Ryan Newton <rrnewton@gmail.com>

> Sent: 7 Oct 2013, 00:16
>
> Ok, so we've narrowed the focus quite a bit to JUST exposing enough from
> containers to enable a third-party library to do all the parallel
> traversals it wants. Which of the following limited proposals would people
> like more?
>
> (1) Expose Bin/Tip from, say, Data.Map.Internal, as in this patch:
>
> https://github.com/rrnewton/containers/commit/5d6b07f69e8396023101039a4aaab619af41c810
>
> (2) a splitTree function [1]. A patch can be found here:
> https://github.com/rrnewton/containers/commit/6153896f0c7e6cdf70656dc6b641ce61711175f8
>
> The argument for (1) would be that it doesn't pollute any namespaces people
> actually use at all, and Tip & Bin would seem to be pretty darn stable at
> this point. The only consumers of this information in practice would be
> downstream companion libraries (like, say, a parallel traversals library
> for monad-par & LVish!) Those could be updated if there were ever a
> seismic shift in the containers implementations.

I am strongly against (1). Exposing the internal representation seems really
wrong. FYI, I am planning to change the representation of Data.Map and
Data.Set to a three-constructor representation (I already have some
benchmarks, halving time complexity of fold and decreasing memory usage
by ~ 20%). So no, the internal representation is subject to change and
I do not want it to become part of the API.

As for (2), I am not very happy about the type -- returning _three_ maps
makes some assumptions about the internal representation. This can be
seen when considering IntMap.splitTree -- there are no three IntMaps to
return in an IntMap.splitTree, only two.

What about the list version:
  splitTree :: Map k a -> [Map k a]
If splitTree is INLINE, I think we can assume the deforestation will
happen. That would allow us to define splitTree for IntMap too.
If someone is worried, could they check that deforestation really does
happen?
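
To make the list-returning shape concrete, here is a minimal sketch on toy
types. These are NOT the real containers internals: Tree, Trie, splitTrie
and sizeVia are names made up for illustration, and the sketch says nothing
about whether deforestation actually fires.

```haskell
-- Toy stand-ins for illustration only; the real Data.Map and IntMap
-- representations differ (size fields, prefixes, masks, strictness).

-- A binary search tree shaped like the Bin/Tip discussed in this thread.
data Tree k a = Tip | Bin (Tree k a) k a (Tree k a)

-- A radix-style tree with values only at the leaves, loosely like IntMap.
data Trie a = Nil | Leaf Int a | Branch (Trie a) (Trie a)

toListT :: Tree k a -> [(k, a)]
toListT Tip           = []
toListT (Bin l k v r) = toListT l ++ [(k, v)] ++ toListT r

toListI :: Trie a -> [(Int, a)]
toListI Nil          = []
toListI (Leaf k v)   = [(k, v)]
toListI (Branch l r) = toListI l ++ toListI r

-- A Bin node naturally yields three pieces: left, root element, right.
splitTree :: Tree k a -> [Tree k a]
splitTree Tip           = []
splitTree (Bin l k v r) = [l, Bin Tip k v Tip, r]

-- A Branch node only has two pieces to offer; the list type absorbs that.
splitTrie :: Trie a -> [Trie a]
splitTrie Nil          = []
splitTrie (Branch l r) = [l, r]
splitTrie leaf         = [leaf]

-- A consumer written against "t -> [t]" does not care how many pieces
-- come back; each piece could be handed to a separate worker.
sizeVia :: (t -> [t]) -> (t -> Int) -> t -> Int
sizeVia split size = sum . map size . split
```

The same sizeVia works with either splitTree or splitTrie, which is the
point of returning a list rather than a fixed triple.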

Cheers,
Milan