
Marcus D. Gabriel wrote:
Christian Maeder wrote: [...]
splitOnAList :: Eq a => [a] -> [a] -> <To Be Decided>
to split on a list such as "\r\n", then you can use isPrefixOf whereupon the performance is good enough (actually, its not bad at all).
The special case for "\r\n" is actually trivial, because "\r" can simply be filtered out first.
But a general splitting on sublists could be implemented via
splitBy isNothing (replace (map Just sl) Nothing (map Just l))
(For a fixed sublist sl, "Nothing" is enough to represent the delimiter)
Nice. It took me a moment, but nice.
Good, see below my definition of subListSplit and unintercalate.
For a simple predicate "(a -> Bool)" it remains to discuss the output of splitBy.
You've proposed "[([a], [a])]", but for the simpler case "[(a, [a])]" or "[([a], a)]" may do,
Actually, although enticing, I do not believe that [(a, [a])] is possible due to the corner cases when there is no beginning non-delimiter or ending delimiter, that is, one needs [(Maybe a, [a])]. (Please check me on this.)
You're right here, there are several ways to accommodate all delimiters and non-delimiters. I've done it as below.
but in order to capture the full information of lists before and after delimiters, something like "([a], [(a, [a])])" is needed.
Since a tuple "(a, [a])" can be viewed as non-empty list of type "[a]", "([a], [(a, [a])])" collapses to a non-empty list of type "[[a]]" with a _tail_ of non-empty element lists.
I unfortunately do not follow you here, sorry. Be that as it may, I have come to appreciate the output [[a]].
In "([a], [(a, [a])])" the first component takes the (possibly empty) part before the first delimiter. The second part of type [(a, [a])] are the remaining delimiter with (possibly empty or longer) non-delimiter pairs. Such pairs are then viewed as non-empty lists. After changing the second component to [[a]] the resulting pair ([a], [[a]]) is then also changed to a non-empty list of lists [[a]].
wordsBy p = filter (not . null) . dropDelims . splitBy p linesBy p = dropDelims . dropFinalDelim . splitBy p
I'ld like to improve linesBy as follows: linesBy p = dropFinalNil . dropDelims . splitBy p (dropFinalNil is simpler than dropFinalDelim and dropDelims can assume a non-empty list from splitBy.)
So, in summary, your idea would be to introduce two functions into Data.List:
splitBy :: (a -> Bool) -> [a] -> [[a]] replace :: Eq a => [a] -> a -> [a] -> [a]
Is this correct?
Personally, I'ld be content with wordsBy only, but adding linesBy, splitBy and the combination of "dropDelims . splitBy p" (under some suitable names) would make sense for me with or without replace. (In fact, replace should be generalized further.)
If so, how would you define
splitOnAList :: Eq a => [a] -> [a] -> [[a]]
using splitBy and replace. For example,
splitOnAList "\r\n" "abc\r\nxyz\r\n" == ["abc","\r\n","xyz","\r\n"]
Again, for "\r\n" this is simply: concatMap (: ["\r\n"]) . linesBy (== '\n') . filter (/= '\r') For the general case take my unintercalate below: splitOnAList sl = intercalate [sl] . map (: []) . unintercalate sl This code only puts back identical delimiters that nobody needs because the delimiter is fixed and known via the input argument. For sublist matching, keeping delimiters is unnecessary in general!
reasonably handle the task. Actually, can even Data.List.Split reasonably handle the task? (I only just recalled this common little problem that is almost trivial but never really so.)
Data.List.Split can handle all these tasks (and more), too, only less elegant (I think).
import Data.Char (isSpace)
splitBy :: (a -> Bool) -> [a] -> [[a]] splitBy p l = let (fr, rt) = break p l in case rt of [] -> [fr] d : tl -> let hd : tll = splitBy p tl in fr : (d : hd) : tll
dropFinalDelim :: [[a]] -> [[a]]
forget dropFinalDelim one and take: dropFinalNil :: [[a]] -> [[a]] dropFinalNil ll@(_ : _) = if null (last ll) then init ll else ll
dropDelims :: [[a]] -> [[a]]
dropDelims does no longer need to work for empty inputs:
dropDelims ll = case ll of [] -> [] l : ls -> l : map tail ls
dropDelims (l : ls) = l : map tail ls but a total variant makes also sense: dropDelims ll = let (ft, rt) = splitAt 1 ll in ft ++ map (drop 1) rt
wordsBy :: (a -> Bool) -> [a] -> [[a]] wordsBy p = filter (not . null) . dropDelims . splitBy p
linesBy :: (a -> Bool) -> [a] -> [[a]]
change linesBy to: linesBy p = dropFinalNil . dropDelims . splitBy p
replace :: Eq a => [a] -> a -> [a] -> [a] replace sl@(_ : _) r l = case l of [] -> l x : xs -> case stripPrefix sl l of Nothing -> x : replace sl r xs Just rt -> r : replace sl r rt
subListSplit :: Eq a => [a] -> [a] -> [[Maybe a]] subListSplit sl@(_ : _) l = splitBy isNothing (replace (map Just sl) Nothing (map Just l))
unintercalate :: Eq a => [a] -> [a] -> [[a]] unintercalate sl@(_ : _) = map (map fromJust) . dropDelims . subListSplit sl
unintercalate can also be simplified to: unintercalate sl@(_ : _) = map catMaybes . subListSplit sl Cheers Christian