
We seem to have not added any splitters such as splitBy or split to Data.List despite the discussions. Here is yet another perspective with the hope that it will make it. For the record, see the threads * http://www.haskell.org/pipermail/libraries/2004-July/thread.html#2342 * http://www.haskell.org/pipermail/haskell-cafe/2006-July/thread.html#16559 * http://www.haskell.org/pipermail/libraries/2008-January/thread.html#8922 see the article * http://www.haskell.org/pipermail/libraries/2006-October/006072.html and see the wiki page * http://haskell.org/haskellwiki/List_function_suggestions Essentially, I believe that the difficulty in choosing a splitter for Data.List comes from the fact that it does not take much of a problem to solve before a simple splitter is just not enough whereupon you should just use a FSM, Alex, or Parsec for example. A way forward would be to state a few meta-properties and properties up front and then see if this leads to a reasonable solution for the simple problems that just need a solution that is "good enough." I would suggest that 1. a set of splitter words should be small in number, 2. such a set should fit within the spirit of the List module of the Haskell 98 standard libraries, 3. it should pick up where break and span left off, and 4. the results coming from this set of splitter words should be unsplittable. I would argue that the above suggestions imply at least two words * splitBy :: (a -> Bool) -> [a] -> [([a], [a])] and * split :: [a] -> [a] -> [([a], [a])] such that splitBy takes a predicate p, e.g., (=='\n'), and a list xs and such that split takes a list of equivalent delimiters ds, e.g., " \t", and a list xs. The functions splitBy and split should at least have the follow properites.
(concatMap (\(x1, x2) -> x1 ++ x2) $ splitBy p xs) == xs splitBy _ [] == [([],[])] splitBy (\x -> False) xs == [(xs,[])] splitBy (\x -> True) xs == [([],[x1]),([],[x2]),([],[x3]), ...]
and
(concatMap (\(x1, x2) -> x1 ++ x2) $ split ds xs) == xs split _ [] == [([],[])] split [] xs == [(xs,[])] split xs xs == [([],[x1]),([],[x2]), ([],[x3]), ...]
where xs == x1:x2:x2:...:[]. Examples would be
splitBy (=='/') "/a/b" == [("","/"),("a","/"),("b","")] splitBy (=='/') "//aa//bb" == [("","/"),("","/"),("aa","/"),("","/"),("bb","")] splitBy (=='\n') "\na\nb" == [("","\n"),("a","\n"),("b","")] split xs xs == [([],[x1]),([],[x2]), ([],[x3]), ...] split " \t" "a\tb \tc\t d " == [("a","\t"),("b"," "),("","\t"),("c","\t"),(""," "),("d"," ")] split "/" "/a/b" == [("","/"),("a","/"),("b","")]
This leaves the one remaining "simple" case which is splitting using a list such as "\r\n" which splitBy and split cannot handle easily. This leads to * splitUsing :: (Eq a) => [a] -> [a] -> [([a], [a])] such that
(concatMap (\(x1, x2) -> x1 ++ x2) $ splitUsing xs xs') == xs' splitUsing _ [] == [([] , [])] splitUsing [] xs == [(xs , [])] splitUsing xs xs == [([] , xs)]
Examples would be
splitUsing "\r\n" "a" == [("a", "")] splitUsing "\r\n" "\r\n" == [("" , "\r\n")] splitUsing "\r\n" "a\r\n" == [("a", "\r\n")] splitUsing "\r\n" "a\r\n\r\n" == [("a", "\r\n"), ("" , "\r\n")] splitUsing "\r\n" "a\r\nb" == [("a", "\r\n"), ("b", "")] splitUsing "\r\n" "a\r\nb\r\n" == [("a", "\r\n"), ("b", "\r\n")]
Wrapping up, based on a sort of logical extension, you could imagine * splitWhere :: ([a] -> Bool) -> [a] -> [([a], [a])] but I believe that this fourth word is one too many, cute, but not necessary. If the above three splitter words are not enough, then I would say that you need a more powerful tool beyond the spirit of the Haskell 98 standard List module. Just to check, the Haskell 98 List.lines function would be defined as
lines98 xs = map fst $ split "\n" xs
and proper lines' and unlines' functions would be
lines' :: String -> Either [String] [String] lines' [] = Left [] lines' xs = let res = split "\n" xs nul = (null.snd.last) res in (if nul then Left else Right) $ map fst res
unlines' :: Either [String] [String] -> String unlines' exss = intercalate "\n" $ either (id) (++[[]]) exss
such that
unlines' . lines' == id
This is to say, (Right [String]) if the original xs ends in '\n' and (Left [String]) if it does not. For your review, the attached file Split.hs defines and documents splitBy, split, splitUsing, and splitWhere, and the attached file SplitProperties.hs defines tests for these same words. Finally, I would propose adding splitBy, split, and splitUsing to Data.List as three words that fit within the spirit of the Haskell 98 List module, take off where break and span leave off, are unsplittable, and are "good enough". Cheers, - Marcus -- Marcus D. Gabriel, Ph.D. Saint Louis, FRANCE http://www.marcus.gabriel.name mailto:marcus@gabriel.name Tel: +33.3.89.69.05.06 Portable: +33.6.34.56.07.75