RE: Prelude function suggestions

How about: -- | Splits a list into components delimited by separators, where the -- predicate returns True for a separator element. The resulting -- components do not contain the separators. Two adjacent separators -- result in an empty component in the output. eg. -- -- @ -- > split (=='a') "aabbaca" -- ["","","bb","c",""] -- @ split :: (a -> Bool) -> [a] -> [[a]] split p s = case rest of [] -> [chunk] _:rest -> chunk : split p rest where (chunk, rest) = break p s -- | Like 'split', except that sequences of adjacent separators are -- treated as a single separator. eg. -- -- @ -- > tokens (=='a') "aabbaca" -- ["bb","c"] -- @ tokens :: (a -> Bool) -> [a] -> [[a]] tokens p = filter (not.null) . split p 'split' is the same as Python's split. There is already a precedent for using short names in the Data.List module so I think 'split' is the right choice, and 'tokens' is fairly descriptive. Then we have words = tokens isSpace which is quite nice. Unfortunately lines /= split (=='\n') because lines strips off blank lines at the end. So perhaps we do need the third version. Cheers, Simon

Simon Marlow writes:
split :: (a -> Bool) -> [a] -> [[a]]
tokens :: (a -> Bool) -> [a] -> [[a]]
Unfortunately, neither function would help me solve my \r\n-line-ending case. I need separators that are longer than one "character". I also think the idea of being able to use the full "current prefix" for the decision is important. I can't use these function, because they would, for example, erroneously split the string "abc\ndef". So far, I like the breakWhere function Ketil proposed best. Peter

On 29 Jul 2004 12:32:19 +0200, Peter Simons
Simon Marlow writes:
split :: (a -> Bool) -> [a] -> [[a]]
tokens :: (a -> Bool) -> [a] -> [[a]]
Unfortunately, neither function would help me solve my \r\n-line-ending case. I need separators that are longer than one "character". I also think the idea of being able to use the full "current prefix" for the decision is important. I can't use these function, because they would, for example, erroneously split the string "abc\ndef".
So far, I like the breakWhere function Ketil proposed best.
I often use an even more generic function: splitter :: ([a] -> (b,[a])) -> [a] -> [b] splitter _ [] = [] splitter f xs = b : splitter f rest where (b,rest) = f xs The name might not be ideal, but the general mechanics of it is pretty nice. /Martin

On Thu, Jul 29, 2004 at 12:40:08PM +0200, Martin Sjögren wrote:
I often use an even more generic function:
splitter :: ([a] -> (b,[a])) -> [a] -> [b] splitter _ [] = [] splitter f xs = b : splitter f rest where (b,rest) = f xs
The name might not be ideal, but the general mechanics of it is pretty nice.
Have you looked at List.unfoldr? It's even more general. unfoldr :: (b -> Maybe (a, b)) -> b -> [a] Best regards, Tom -- .signature: Too many levels of symbolic links

W liście z czw, 29-07-2004, godz. 13:39 +0200, Tomasz Zielonka napisał:
Have you looked at List.unfoldr? It's even more general.
unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
And when we finally find a very general solution, the specific case of "split colon-separated fields" requires some thought and three lines of code using higher order functions and pattern matching, instead of just splitSep (== ':') line :-) Make simple things easy, complex things possible. Concentrate on the first part. Generalize only if it doesn't make the common simple case harder. After all, one always can write a complex splitting function by hand. The simple version doesn't have to cover all possible generalizations, only those which don't disturb common cases. -- __("< Marcin Kowalczyk \__/ qrczak@knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/

On Thu, Jul 29, 2004 at 01:57:51PM +0200, Marcin 'Qrczak' Kowalczyk wrote:
W liście z czw, 29-07-2004, godz. 13:39 +0200, Tomasz Zielonka napisał:
Have you looked at List.unfoldr? It's even more general.
unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
And when we finally find a very general solution, the specific case of "split colon-separated fields" requires some thought and three lines of code using higher order functions and pattern matching, instead of just splitSep (== ':') line :-)
Make simple things easy, complex things possible. Concentrate on the first part. Generalize only if it doesn't make the common simple case harder. After all, one always can write a complex splitting function by hand. The simple version doesn't have to cover all possible generalizations, only those which don't disturb common cases.
I agree. I often found that writing a splitting function with unfoldr is a bit cumbersome. Best regards, Tom -- .signature: Too many levels of symbolic links

At 14:11 29/07/04 +0200, Tomasz Zielonka wrote:
On Thu, Jul 29, 2004 at 01:57:51PM +0200, Marcin 'Qrczak' Kowalczyk wrote:
W li¶cie z czw, 29-07-2004, godz. 13:39 +0200, Tomasz Zielonka napisa³:
Have you looked at List.unfoldr? It's even more general.
unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
And when we finally find a very general solution, the specific case of "split colon-separated fields" requires some thought and three lines of code using higher order functions and pattern matching, instead of just splitSep (== ':') line :-)
Make simple things easy, complex things possible. Concentrate on the first part. Generalize only if it doesn't make the common simple case harder. After all, one always can write a complex splitting function by hand. The simple version doesn't have to cover all possible generalizations, only those which don't disturb common cases.
I agree. I often found that writing a splitting function with unfoldr is a bit cumbersome.
So it appears. But how much is this due to the prelude/library functions not working together so well? Using the formulation of unfold from [1], some solutions appear a bit less cumbersome; e.g.: [[ unfold :: (b->Bool) -> (b->a) -> (b->b) -> b -> [a] unfold p f g x | p x = [] | otherwise = f x : unfold p f g (g x) splitBy p = unfold null (fst . break p) (tail . snd . break p) t1 = splitBy (==':') "ab:cde:fgh::ijk:" ]] The p and g functions can be tweaked to achieve the variations mentioned, though that's arguably getting as complicated as writing the required function to begin with. It does bother me that this formulation requires (in the example above) the 'break p' to be evaluated twice. I also note that none of the other functions offered have provided for the two use-cases I mentioned in [2], AFAICT. #g -- [1] http://web.comlab.ox.ac.uk/oucl/work/jeremy.gibbons/publications/index.html#... [2] http://www.haskell.org//pipermail/libraries/2004-July/002353.html ------------ Graham Klyne For email: http://www.ninebynine.org/#Contact

Simon Marlow wrote:
How about: split :: (a -> Bool) -> [a] -> [[a]]
I would not mind if you put it into Data.List (although I would need to change some code that I've copied from elsewhere) It's the libraries's writer privilege to choose (short) names.
Unfortunately
lines /= split (=='\n')
because lines strips off blank lines at the end.
Only the last newline is ignored: lines "\na\nb\n\n" = ["","a","b",""] Thus, it behaves in the same way as splitPS (arguably this is maybe not a bug) Christian
participants (7)
-
Christian Maeder
-
Graham Klyne
-
Marcin 'Qrczak' Kowalczyk
-
Martin Sjögren
-
Peter Simons
-
Simon Marlow
-
Tomasz Zielonka