
Brent Yorgey wrote:
P2. There should be no information loss, that is, keep the delimiters, keep the separators, keep the parts of the original list xs that satisfy a predicate p, do not lose information about the beginning and the end of the list relative to the first and last elements of the list respectively. The user of the function decides what to discard.
P3. A split list should be unsplittable so as to recover the original list xs. (I made up the word unsplittable.) (P2 implies P3, but let us state this anyway.)
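As a rough illustration of P2 and P3 taken together, here is a minimal sketch of a splitter that keeps every element, so that "unsplitting" is just concatenation. The names splitKeeping and unsplit are hypothetical and do not come from Data.List or Data.List.Split; the choice to represent each delimiter as its own singleton list is likewise only an assumption made for this example.

```haskell
-- Hypothetical names for illustration; not part of Data.List or Data.List.Split.

-- Split a list into alternating runs of non-delimiters and single
-- delimiters. Nothing is discarded: an empty run is emitted when the
-- input starts or ends with a delimiter, so the position of the first
-- and last elements is still visible in the result.
splitKeeping :: (a -> Bool) -> [a] -> [[a]]
splitKeeping isDelim xs =
  case break isDelim xs of
    (chunk, [])     -> [chunk]
    (chunk, d : ys) -> chunk : [d] : splitKeeping isDelim ys

-- P3: recovering the original list is plain concatenation.
unsplit :: [[a]] -> [a]
unsplit = concat
```

Under these assumptions, splitKeeping (== ';') "foo;bar;baz" gives ["foo",";","bar",";","baz"], and unsplit applied to that result gives back "foo;bar;baz".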
I'm not sure I agree with this.
Thanks for stating this. Dropping P3 would change my thinking about this topic, that is, if we drop P3, then I would prefer that no splitter functions are added to Data.List and that it is left as is.
The problem is that much (most?) of the time, people looking for a split function want to discard delimiters; for example, if you have a string like "foo;bar;baz" and you want to split it into ["foo","bar","baz"].
I agree with this comment when I think about strings, about what I would do most of the time, and from a pragmatic point of view.
In this case it's really annoying to have to throw away the delimiters yourself, especially if you just get back a list like ["foo",";","bar",";","baz"] and have to decide which things are delimiters and which aren't,
I certainly understand this point, however,
with no help from the type system.
-- P5. The splitter functions should fit within the spirit of the
-- Data.List module and even the original Haskell 98 List module
-- in terms of type signature and complexity of implementation.

In my mind, the idea of adding a few splitter functions to Data.List does not preclude Data.List.Split. From my perspective of P5 above, within the spirit of Data.List, you can work with things like a, [a], (a,b), Maybe a, Eq a, ... . If you wish to do more, a separate module would be in order, as you have done.
But, as you noted, throwing away information like this is bad from an elegance/formal properties point of view. This is exactly why I designed the Data.List.Split library as I did: the core internal splitting function is information-preserving, and by using various combinators the user can choose to throw away whatever information they are not interested in.
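The design described above can be sketched roughly as follows: an information-preserving core that tags each piece, plus small combinators that discard what the caller does not want. This is only a sketch under stated assumptions. The type Piece and the functions splitPieces, dropSeps and unpieces are hypothetical names invented for this example, and they are not claimed to match the actual definitions in Data.List.Split.Internals.

```haskell
-- Hypothetical types and names for illustration only; this is not the
-- actual Data.List.Split.Internals code.

-- Tagging each piece lets the type system say which parts are
-- separators and which are ordinary content.
data Piece a = Sep a | Body [a]
  deriving (Eq, Show)

-- Information-preserving core: every input element ends up in exactly
-- one Piece, in order.
splitPieces :: (a -> Bool) -> [a] -> [Piece a]
splitPieces isSep xs =
  case break isSep xs of
    (body, [])     -> [Body body]
    (body, s : ys) -> Body body : Sep s : splitPieces isSep ys

-- A combinator that discards the separators and keeps only the bodies.
dropSeps :: [Piece a] -> [[a]]
dropSeps ps = [body | Body body <- ps]

-- Recover the original list from the tagged pieces.
unpieces :: [Piece a] -> [a]
unpieces = concatMap flatten
  where
    flatten (Sep s)  = [s]
    flatten (Body b) = b
```

With these definitions, dropSeps (splitPieces (== ';') "foo;bar;baz") gives ["foo","bar","baz"], while unpieces (splitPieces (== ';') "foo;bar;baz") gives back the original string. The decision to discard is made by the caller, after the split.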
Perfect. So, as you see it, are there one, two, or three functions in or hiding in Data.List.Split.Internals that can be factored out and placed into Data.List and that are in line with P1 to P7? You do not actually need to agree with P1 to P7; it is a conceptual exercise. The idea is that Data.List.Split would flow more naturally from Data.List with these few functions added to it.

Also, as concrete examples or to clarify points, the words split, delimiter, separator and variations thereof have been used. This already implies a theme. Do you conceive of Data.List.Split as primarily a way to help programmers from other backgrounds manipulate strings, that is, to supply some nice idioms, but generalized from [Char] to [a]?

If I were to write

    organizeBy :: ([a] -> Bool) -> [a] -> [([a], [a])]

could you think of a specification such that this function would be a workhorse in implementing Data.List.Split.Internals and Data.List.Split?

Alex took the point of view later in this thread that, now that Data.List.Split exists, anything that we move to Data.List will be arbitrary in its cutoff. Duncan responded by advancing the idea that, by examining what is happening in Haskell code, we may find a few useful functions for Data.List. My intermediate idea would be to examine Data.List.Split and Data.List.Split.Internals and think about factoring out very general idioms that could be placed into Data.List, would be the workhorses for implementing Data.List.Split.Internals and Data.List.Split, and would be in line with P1 to P7, which I acknowledge is my point of view. If a few words could be found that fit the above, they would merit Data.List.

Finally, this whole thread brings up a question in my mind about module design. As work is put into Data.List.Split, what is the guiding principle that prevents it from becoming Data.List.Extensions or, to be a bit more direct, Data.List.TheFunctionsThatWereForgotten? At http://haskell.org/haskellwiki/Data.List.Split, we have
An important caveat: we should strive to keep things flexible yet SIMPLE. The more complicated things get, the closer this gets to just being a general parsing or regex library. So the right balance needs to be struck.
I agree, and we have
A theoretical module which contains implementations/combinators for implementing every possible method of list-splitting known to man. This way no one has to argue about what the correct interface for split is, we can just have them all.
Is this not Data.List? In other words, what idea or theme does a new Haskell programmer use to decide to look first into Data.List as opposed to Data.List.Split, and vice versa?

Cheers,
- Marcus

--
Marcus D. Gabriel, Ph.D.
Saint Louis, FRANCE
http://www.marcus.gabriel.name
mailto:marcus@gabriel.name
Tel: +33.3.89.69.05.06
Portable: +33.6.34.56.07.75