
Christian Maeder wrote:
Marcus D. Gabriel wrote:
If I were to write
organizeBy :: ([a] -> Bool) -> [a] -> [([a], [a])]
I quite like your idea,
Thanks.
but I think the input predicate "([a] -> Bool)" is too ambitious, although it would nicely unify Brent's
data Delimiter a where DelimEltPred :: (a -> Bool) -> Delimiter a DelimSublist :: Eq a => [a] -> Delimiter a
With the predicate ([a] -> Bool) you have to check all inits of your input to detect a delimiter and only if all inits are no delimiter you know the head element is not part of a delimiter and repeat checking the inits of the tail. I think this is too inefficient in general.
The name "organizeBy" was a metaphor. I was trying to distance my language from words such as split, delimiter, and separator to something more general. As for the use of inits, yes, this is a performance hit. If you use something like
splitOnAList :: Eq a => [a] -> [a] -> <To Be Decided>
to split on a list such as "\r\n", then you can use isPrefixOf whereupon the performance is good enough (actually, its not bad at all). (http://www.haskell.org/pipermail/libraries/2009-January/011061.html) In retrospect, the idea of splitting, delimiters, separators, and fields is what this is all about anyway.
In order to keep things simple I would vote for a split function that only takes the simple predicate (a -> Bool) and leaves the more complicated splitting to _additional_ functions (following http://haskell.org/haskellwiki/Simple_to_complex)
For instance
replace :: [a] -> a -> [a] -> [a]
could replace a sublist (first argument) with a single element (second argument)
This would help in simple cases, but leaves it to the user to choose a suitable delimiter element.
But a general splitting on sublists could be implemented via
splitBy isNothing (replace (map Just sl) Nothing (map Just l))
(For a fixed sublist sl, "Nothing" is enough to represent the delimiter)
Nice. It took me a moment, but nice.
For a simple predicate "(a -> Bool)" it remains to discuss the output of splitBy.
You've proposed "[([a], [a])]", but for the simpler case "[(a, [a])]" or "[([a], a)]" may do,
Actually, although enticing, I do not believe that [(a, [a])] is possible due to the corner cases when there is no beginning non-delimiter or ending delimiter, that is, one needs [(Maybe a, [a])]. (Please check me on this.)
but in order to capture the full information of lists before and after delimiters, something like "([a], [(a, [a])])" is needed.
Since a tuple "(a, [a])" can be viewed as non-empty list of type "[a]", "([a], [(a, [a])])" collapses to a non-empty list of type "[[a]]" with a _tail_ of non-empty element lists.
I unfortunately do not follow you here, sorry. Be that as it may, I have come to appreciate the output [[a]].
Therefore I propose the following splitBy as a "work horse".
splitBy :: (a -> Bool) -> [a] -> [[a]]
This works for me.
The implementation is simply using "break". The basic property is:
concat (splitBy p l) == l
I would consider this a requirement, so this works for me.
(This is almost in the spirit of Data.List, since groupBy or words also produces non-empty element lists.)
Getting rid of a final delimiter or all of them can be implemented in O(n) (see below).
And finally:
wordsBy p = filter (not . null) . dropDelims . splitBy p linesBy p = dropDelims . dropFinalDelim . splitBy p
with:
words s == wordsBy isSpace s lines s == linesBy (== '\n') s
Surely, one might prefer a more type-safe version wrt non-empty lists, but the main idea is to use function composition rather than composing a (non-haskell98) splitter input data type as Brent does in his split package.
I am aligned with your point of view above in the sense of additions to Data.List. The more that I think about and play with your definition of the function replace, the more I like it. It captures an idiom that I have used on the UNIX command line when hacking away on text, that is, replace sequences with an unambiguous marker for later use. So, in summary, your idea would be to introduce two functions into Data.List:
splitBy :: (a -> Bool) -> [a] -> [[a]] replace :: Eq a => [a] -> a -> [a] -> [a]
Is this correct? If so, how would you define
splitOnAList :: Eq a => [a] -> [a] -> [[a]]
using splitBy and replace. For example,
splitOnAList "\r\n" "abc\r\nxyz\r\n" == ["abc","\r\n","xyz","\r\n"]
Given this, maybe we should look at all of the many ways that software makes CSV (TSV) files and see if splitBy and replace can reasonably handle the task. Actually, can even Data.List.Split reasonably handle the task? (I only just recalled this common little problem that is almost trivial but never really so.) Cheers, - Marcus
Cheers Christian
import Data.Char (isSpace)
splitBy :: (a -> Bool) -> [a] -> [[a]] splitBy p l = let (fr, rt) = break p l in case rt of [] -> [fr] d : tl -> let hd : tll = splitBy p tl in fr : (d : hd) : tll
dropFinalDelim :: [[a]] -> [[a]] dropFinalDelim ll@(l : ls) = if null ls then if null l then [] else ll else if null (tail (last ls)) then init ll else ll
dropDelims :: [[a]] -> [[a]] dropDelims ll = case ll of [] -> [] l : ls -> l : map tail ls
wordsBy :: (a -> Bool) -> [a] -> [[a]] wordsBy p = filter (not . null) . dropDelims . splitBy p
linesBy :: (a -> Bool) -> [a] -> [[a]] linesBy p = dropDelims . dropFinalDelim . splitBy p
prop_wordsBy :: String -> Bool prop_wordsBy s = words s == wordsBy isSpace s
prop_linesBy :: String -> Bool prop_linesBy s = lines s == linesBy (== '\n') s
replace :: Eq a => [a] -> a -> [a] -> [a] replace sl@(_ : _) r l = case l of [] -> l x : xs -> case stripPrefix sl l of Nothing -> x : replace sl r xs Just rt -> r : replace sl r rt
subListSplit :: Eq a => [a] -> [a] -> [[Maybe a]] subListSplit sl@(_ : _) l = splitBy isNothing (replace (map Just sl) Nothing (map Just l))
unintercalate :: Eq a => [a] -> [a] -> [[a]] unintercalate sl@(_ : _) = map (map fromJust) . dropDelims . subListSplit sl
-- Marcus D. Gabriel, Ph.D. Saint Louis, FRANCE http://www.marcus.gabriel.name mailto:marcus@gabriel.name