
I want to write a function whose behavior is as follows: foo "string1\nstring2\r\nstring3\nstring4" = ["string1", "string2\r\nstring3", "string4"] Note the sequence "\r\n", which is ignored. How can I do this?

Doesn't the function "lines" handle different line-endings?
(In the Prelude and in Data.List)
If not, doing this with parsec would be easy (yet maybe slightly
overkill...)
2012/1/2 max
I want to write a function whose behavior is as follows:
foo "string1\nstring2\r\nstring3\nstring4" = ["string1", "string2\r\nstring3", "string4"]
Note the sequence "\r\n", which is ignored. How can I do this?
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

В Mon, 2 Jan 2012 10:45:18 +0100
Yves Parès
Doesn't the function "lines" handle different line-endings? (In the Prelude and in Data.List)
If not, doing this with parsec would be easy (yet maybe slightly overkill...)
2012/1/2 max
I want to write a function whose behavior is as follows:
foo "string1\nstring2\r\nstring3\nstring4" = ["string1", "string2\r\nstring3", "string4"]
Note the sequence "\r\n", which is ignored. How can I do this?
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Okay, so it doesn't handle different line-endings.
I have a more general solution (statefulSplit)
http://hpaste.org/55980
I cannot test it as I don't have an interpreter at hand, but if someone
has, I'd be glad to have comments.
(It might be more readable by using the State monad)
2012/1/2 max
В Mon, 2 Jan 2012 10:45:18 +0100 Yves Parès
пишет: Prelude> lines "string1\nstring2\r\nstring3\nstring4" ["string1","string2\r","string3","string4"]
Doesn't the function "lines" handle different line-endings? (In the Prelude and in Data.List)
If not, doing this with parsec would be easy (yet maybe slightly overkill...)
2012/1/2 max
I want to write a function whose behavior is as follows:
foo "string1\nstring2\r\nstring3\nstring4" = ["string1", "string2\r\nstring3", "string4"]
Note the sequence "\r\n", which is ignored. How can I do this?
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Am 02.01.2012 10:44, schrieb max:
I want to write a function whose behavior is as follows:
foo "string1\nstring2\r\nstring3\nstring4" = ["string1", "string2\r\nstring3", "string4"]
Note the sequence "\r\n", which is ignored. How can I do this?
replace the sequence by something unique first, i.e. a single "\r" (and revert this change later). (Replacing a single character is easier using concatMap). HTH Christian -- | replace first (non-empty) sublist with second one in third -- argument list replace :: Eq a => [a] -> [a] -> [a] -> [a] replace sl r = case sl of [] -> error "replace: empty list" _ -> concat . unfoldr (\ l -> case l of [] -> Nothing hd : tl -> Just $ case stripPrefix sl l of Nothing -> ([hd], tl) Just rt -> (r, rt))

On 02/01/2012 09:44, max wrote:
I want to write a function whose behavior is as follows:
foo "string1\nstring2\r\nstring3\nstring4" = ["string1", "string2\r\nstring3", "string4"]
Note the sequence "\r\n", which is ignored. How can I do this? Doing it probably the hard way (and getting it wrong) looks like the following...
-- Function to accept (normally) a single character. Special-cases -- \r\n. Refuses to accept \n. Result is either an empty list, or -- an (accepted, remaining) pair. parseTok :: String -> [(String, String)] parseTok "" = [] parseTok (c1:c2:cs) | ((c1 == '\r') && (c2 == '\n')) = [(c1:c2:[], cs)] parseTok (c:cs) | (c /= '\n') = [(c:[], cs)] | True = [] -- Accept a sequence of those (mostly single) characters parseItem :: String -> [(String, String)] parseItem "" = [("","")] parseItem cs = [(j1s ++ j2s, k2s) | (j1s,k1s) <- parseTok cs , (j2s,k2s) <- parseItem k1s ] -- Accept a whole list of strings parseAll :: String -> [([String], String)] parseAll [] = [([],"")] parseAll cs = [(j1s:j2s,k2s) | (j1s,k1s) <- parseItem cs , (j2s,k2s) <- parseAll k1s ] -- Get the first valid result, which should have consumed the -- whole string but this isn't checked. No check for existence either. parse :: String -> [String] parse cs = fst (head (parseAll cs)) I got it wrong in that this never consumes the \n between items, so it'll all go horribly wrong. There's a good chance there's a typo or two as well. The basic idea should be clear, though - maybe I should fix it but I've got some other things to do at the moment. Think of the \n as a separator, or as a prefix to every "item" but the first. Alternatively, treat it as a prefix to *every* item, and artificially add an initial one to the string in the top-level parse function. The use tail etc to remove that from the first item. See http://channel9.msdn.com/Tags/haskell - there's a series of 13 videos by Dr. Erik Meijer. The eighth in the series covers this basic technique - it calls them monadic and uses the do notation and that confused me slightly at first, it's the *list* type which is monadic in this case and (as you can see) I prefer to use list comprehensions rather than do notation. There may be a simpler way, though - there's still a fair bit of Haskell and its ecosystem I need to figure out. There's a tool called alex, for instance, but I've not used it.

On Mon, Jan 2, 2012 at 3:14 PM, max
I want to write a function whose behavior is as follows:
foo "string1\nstring2\r\nstring3\nstring4" = ["string1", "string2\r\nstring3", "string4"]
Note the sequence "\r\n", which is ignored. How can I do this?
Here's a simple way (may not be the most efficient) - import Data.List (isSuffixOf) split = reverse . foldl f [] . lines where f [] w = [w] f (x:xs) w = if "\r" `isSuffixOf` x then ((x++"\n"++w):xs) else (w:x:xs) Testing - ghci> split "ab\r\ncd\nefgh\nhijk" ["ab\r\ncd","efgh","hijk"] -- Anupam

On Mon, Jan 02, 2012 at 12:44:23PM +0300, max wrote:
I want to write a function whose behavior is as follows:
foo "string1\nstring2\r\nstring3\nstring4" = ["string1", "string2\r\nstring3", "string4"]
Note the sequence "\r\n", which is ignored. How can I do this?
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
unixLines :: String -> [String] unixLines xs = reverse . map reverse $ go xs "" [] where go [] l ls = l:ls go ('\r':'\n':xs) l ls = go xs ('\n':'\r':l) ls go ('\n':xs) l ls = go xs "" (l:ls) go (x:xs) l ls = go xs (x:l) ls

max
I want to write a function whose behavior is as follows:
foo "string1\nstring2\r\nstring3\nstring4" = ["string1", "string2\r\nstring3", "string4"]
Note the sequence "\r\n", which is ignored. How can I do this?
cabal install split then do something like import Data.List (groupBy) import Data.List.Split (splitOn) rn '\r' '\n' = True rn _ _ = False required_function = fmap concat . splitOn ["\n"] . groupBy rn (though that might be an abuse of groupBy) -- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk

В Mon, 02 Jan 2012 11:12:49 +0000
Jon Fairbairn
max
writes: I want to write a function whose behavior is as follows:
foo "string1\nstring2\r\nstring3\nstring4" = ["string1", "string2\r\nstring3", "string4"]
Note the sequence "\r\n", which is ignored. How can I do this?
cabal install split
then do something like
import Data.List (groupBy) import Data.List.Split (splitOn)
rn '\r' '\n' = True rn _ _ = False
required_function = fmap concat . splitOn ["\n"] . groupBy rn
(though that might be an abuse of groupBy)
This is the simplest solution of the proposed, in my opinion. Thank you very much.

On Mon, Jan 2, 2012 at 10:12 AM, max
This is the simplest solution of the proposed, in my opinion. Thank you very much.
Better yet, don't use String and use Text. Then you just need T.splitOn "\r\n" [1]. Cheers, [1] http://hackage.haskell.org/packages/archive/text/0.11.1.12/doc/html/Data-Tex... -- Felipe.

On Mon, Jan 2, 2012 at 5:52 PM, Felipe Almeida Lessa
On Mon, Jan 2, 2012 at 10:12 AM, max
wrote: This is the simplest solution of the proposed, in my opinion. Thank you very much.
Better yet, don't use String and use Text. Then you just need T.splitOn "\r\n" [1].
That is actually the opposite of what the OP wants, however it's interesting that Text has a function like that and not the String functions in the standard library. -- Anupam

String is really for small strings. Text is more efficent and also has
more functionality, including most, if not all, of the functions
defined for String.
On Mon, Jan 2, 2012 at 3:12 PM, Anupam Jain
On Mon, Jan 2, 2012 at 5:52 PM, Felipe Almeida Lessa
wrote: On Mon, Jan 2, 2012 at 10:12 AM, max
wrote: This is the simplest solution of the proposed, in my opinion. Thank you very much.
Better yet, don't use String and use Text. Then you just need T.splitOn "\r\n" [1].
That is actually the opposite of what the OP wants, however it's interesting that Text has a function like that and not the String functions in the standard library.
-- Anupam
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Markus Läll

On 02/01/2012 11:12, Jon Fairbairn wrote:
max
writes: I want to write a function whose behavior is as follows:
foo "string1\nstring2\r\nstring3\nstring4" = ["string1", "string2\r\nstring3", "string4"]
Note the sequence "\r\n", which is ignored. How can I do this? cabal install split
then do something like
import Data.List (groupBy) import Data.List.Split (splitOn)
rn '\r' '\n' = True rn _ _ = False
required_function = fmap concat . splitOn ["\n"] . groupBy rn
(though that might be an abuse of groupBy)
Sadly, it turns out that not only is this an abuse of groupBy, but it has (I think) a subtle bug as a result. I was inspired by this to try some other groupBy stuff, and it didn't work. After scratching my head a bit, I tried the following... Prelude> import Data.List Prelude Data.List> groupBy (<) [1,2,3,2,1,2,3,2,1] [[1,2,3,2],[1,2,3,2],[1]] That wasn't exactly the result I was expecting :-( Explanation (best guess) - the function passed to groupBy, according to the docs, is meant to test whether two values are 'equal'. I'm guessing the assumption is that the function will effectively treat values as belonging to equivalence classes. That implies some rules such as... (a == a) reflexivity : (a == b) => (b == a) transitivity : (a == b) && (b == c) => (a == c) I'm not quite certain I got those names right, and I can't remember the name of the first rule at all, sorry. The third rule is probably to blame here. By the rules, groupBy doesn't need to compare adjacent items. When it starts a new group, it seems to always use the first item in that new group until it finds a mismatch. In my test, that means it's always comparing with 1 - the second 2 is included in each group because although (3 < 2) is False, groupBy isn't testing that - it's testing (1 < 2). In the context of this \r\n test function, this behaviour will I guess result in \r\n\n being combined into one group. The second \n will therefore not be seen as a valid splitting point. Personally, I think this is a tad disappointing. Given that groupBy cannot check or enforce that it's test respects equivalence classes, it should ideally give results that make as much sense as possible either way. That said, even if the test was always given adjacent elements, there's still room for a different order of processing the list (left-to-right or right-to-left) to give different results - and in any case, maybe it's more efficient the way it is.

On 04/01/2012 16:47, Steve Horne wrote:
(a == a) reflexivity : (a == b) => (b == a) transitivity : (a == b) && (b == c) => (a == c)
Oops - that's... reflexivity : (a == a) symmetry : (a == b) => (b == a) transitivity : (a == b) && (b == c) => (a == c) An equivalence relation is a relation that meets all these conditions.

Le Wed, 04 Jan 2012 17:49:15 +0000,
Steve Horne
On 04/01/2012 16:47, Steve Horne wrote:
(a == a) reflexivity : (a == b) => (b == a) transitivity : (a == b) && (b == c) => (a == c)
Oops - that's...
reflexivity : (a == a) symmetry : (a == b) => (b == a) transitivity : (a == b) && (b == c) => (a == c)
An equivalence relation is a relation that meets all these conditions.
I prefer to use "transymmetry" (although I guess it is not a regular word): reflexivity: a ≃ a transymmetry: ∀ a b. b≃a ⇒ ∀ c. c≃a ⇒ b≃c so I only have 2 rules. transymmetry is trivially derived from transitivity and symmetry. symmetry is trivially derived from reflexivity and transymmetry. transitivity is trivially derived from symmetry and transymmetry (and thus from transymmetry and reflexivity)
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Am 04.01.2012 17:47, schrieb Steve Horne:
On 02/01/2012 11:12, Jon Fairbairn wrote:
max
writes: I want to write a function whose behavior is as follows:
foo "string1\nstring2\r\nstring3\nstring4" = ["string1", "string2\r\nstring3", "string4"]
Note the sequence "\r\n", which is ignored. How can I do this?
Why do you have these (unhealthy) different kinds of line breaks (Unix and Windows style) in your string in the first place? I hope, not by something calling "unlines" (or intercalate "\n") earlier. Cheers Christian

Steve Horne
On 02/01/2012 11:12, Jon Fairbairn wrote:
max
writes: I want to write a function whose behavior is as follows:
foo "string1\nstring2\r\nstring3\nstring4" = ["string1", "string2\r\nstring3", "string4"]
Note the sequence "\r\n", which is ignored. How can I do this? cabal install split
then do something like
import Data.List (groupBy) import Data.List.Split (splitOn)
rn '\r' '\n' = True rn _ _ = False
required_function = fmap concat . splitOn ["\n"] . groupBy rn
(though that might be an abuse of groupBy)
Sadly, it turns out that not only is this an abuse of groupBy, but it has (I think) a subtle bug as a result.
It does indeed. Thanks. That was pretty much what I feared.
Explanation (best guess) - the function passed to groupBy, according to the docs, is meant to test whether two values are 'equal'. I'm guessing the assumption is that the function will effectively treat values as belonging to equivalence classes. That implies some rules such as...
Right. This issue has come up from time to time since groupBy was first written, and someone pops up to justify the present behaviour, but I can never remember why.
In the context of this \r\n test function, this behaviour will I guess result in \r\n\n being combined into one group. The second \n will therefore not be seen as a valid splitting point.
Correct. In my defence, I did say “do something like” :-)
Personally, I think this is a tad disappointing. Given that groupBy cannot check or enforce that it's test respects equivalence classes, it should ideally give results that make as much sense as possible either way. That said, even if the test was always given adjacent elements, there's still room for a different order of processing the list (left-to-right or right-to-left) to give different results - and in any case, maybe it's more efficient the way it is.
Looking back at the libraries list, I get the impression that there was a suggestion to change the behaviour of groupBy, but it doesn’t seem to have happened. -- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk http://www.chaos.org.uk/~jf/Stuff-I-dont-want.html (updated 2010-09-14)

On 05/01/2012 10:02, Jon Fairbairn wrote:
Steve Horne
writes: Personally, I think this is a tad disappointing. Given that groupBy cannot check or enforce that it's test respects equivalence classes, it should ideally give results that make as much sense as possible either way. That said, even if the test was always given adjacent elements, there's still room for a different order of processing the list (left-to-right or right-to-left) to give different results - and in any case, maybe it's more efficient the way it is. Looking back at the libraries list, I get the impression that there was a suggestion to change the behaviour of groupBy, but it doesn’t seem to have happened.
I've realised that the left-to-right vs. right-to-left order thing makes no difference - I don't know why I thought that now. I've written an implementation, only the predicate is inverse-logic - True means cut-between-these rather than keep-these-together. I keep thinking there should be a tail-recursive implementation, but the usual trick would either mean using ++ or difference lists or similar, or would deliver the results in reverse order. If anyone can think of a way to get the correct result in one pass through the list (assuming tail recursion is optimised), I'm curious. Or... does non-strict evaluation mean I shouldn't worry about it? Maybe it does a good job of evaluating the head quickly anyway, as the data dependencies are quite localized? I've been wondering how lazy evaluation interacts with recursion over lists in performance terms for a while. -- groupCut - Similar to groupBy, but where groupBy assumes an equivalence relation, -- groupCut takes a function that indicates where to cut. The two parameters to this -- function are always adjacent items from the list, and if the function returns True, -- a cut is done between the two items. groupCut :: (x -> x -> Bool) -> [x] -> [[x]] groupCut f [] = [] groupCut f xs = let (y,ys,yss) = groupCut' f xs in (y:ys):yss -- arg1 - cut here test function -- arg2 - input list -- result - triple of current (head char, head group excl. head char, tail groups) -- -- the input list must not be empty - this is handled in the front-end function. groupCut' :: (x -> x -> Bool) -> [x] -> (x, [x], [[x]]) groupCut' f (x:[]) = (x, [], []) groupCut' f (x:xs) = let (y,ys,yss) = groupCut' f xs in if (f x y) then (x, [], (y:ys):yss) else (x, y:ys, yss)

On Thu, Jan 5, 2012 at 05:57, Steve Horne
-- groupCut - Similar to groupBy, but where groupBy assumes an equivalence relation, -- groupCut takes a function that indicates where to cut. The two parameters to this -- function are always adjacent items from the list, and if the function returns True, -- a cut is done between the two items.
span/break? -- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

On 05/01/2012 11:09, Brandon Allbery wrote:
On Thu, Jan 5, 2012 at 05:57, Steve Horne
mailto:sh006d3592@blueyonder.co.uk> wrote: -- groupCut - Similar to groupBy, but where groupBy assumes an equivalence relation, -- groupCut takes a function that indicates where to cut. The two parameters to this -- function are always adjacent items from the list, and if the function returns True, -- a cut is done between the two items.
span/break?
Using those, the test function won't always be passed two *adjacent* elements from the list. After all, they're based on takeWhile and dropWhile, which take unary functions, meaning an element has already been curried in (the starting element of the group). That's probably how the current groupBy is implemented - the approach that assumes an equivalence relation, giving unexpected results when the By function isn't an equivalence relation.

Steve Horne
On 05/01/2012 11:09, Brandon Allbery wrote:
On Thu, Jan 5, 2012 at 05:57, Steve Horne
mailto:sh006d3592@blueyonder.co.uk> wrote: -- groupCut - Similar to groupBy, but where groupBy assumes an equivalence relation, -- groupCut takes a function that indicates where to cut. The two parameters to this -- function are always adjacent items from the list, and if the function returns True, -- a cut is done between the two items.
span/break?
Using those, the test function won't always be passed two *adjacent* elements from the list. After all, they're based on takeWhile and dropWhile, which take unary functions, meaning an element has already been curried in (the starting element of the group).
That's probably how the current groupBy is implemented - the approach that assumes an equivalence relation, giving unexpected results when the By function isn't an equivalence relation.
groupBy is currently implemented using span. It strikes me that we ought to specify some properties for what we want. Start by defining: pairwiseInOrderBy p l = all (uncurry p) (l `zip` drop 1 l) giving all (pairwiseInOrderBy p) (groupCut p l) and we would want concat (groupCut p l) == l (all modulo nontermination side conditions). Anything else? -- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk

groupBy is currently implemented using span. It strikes me that we ought to specify some properties for what we want. Start by defining: pairwiseInOrderBy p l = all (uncurry p) (l `zip` drop 1 l) giving all (pairwiseInOrderBy p) (groupCut p l) and we would want concat (groupCut p l) == l (all modulo nontermination side conditions). Anything else? To be honest, I've worked out what's going on in this case and I have an implementation or two of what I'd want in case I need it, plus I've
On 06/01/2012 10:39, Jon Fairbairn wrote: posted it in case it was useful to the OP. There's nothing I really want to persue any further.

Am 05.01.2012 11:57, schrieb Steve Horne: [...]
groupCut :: (x -> x -> Bool) -> [x] -> [[x]] [...]
How about a break function that respects an escape character (1. arg) (and drops the delimiter - 2. arg) and use this function for unfolding? import Data.List break' :: (a -> Bool) -> (a -> Bool) -> [a] -> ([a], [a]) break' e p l = case l of [] -> (l, []) c : r | p c -> ([], r) | e c -> case r of [] -> (l, []) d : t -> let (f, s) = break' e p t in (c : d : f, s) | otherwise -> let (f, s) = break' e p r in (c : f, s) split' :: (a -> Bool) -> (a -> Bool) -> [a] -> [[a]] split' e p = unfoldr $ \ l -> if null l then Nothing else Just $ break' e p l *Main> split' (== '\r') (== '\n') "string1\nstring2\r\nstring3\nstring4" ["string1","string2\r\nstring3","string4"] C.

On 05/01/2012 11:55, Christian Maeder wrote:
Am 05.01.2012 11:57, schrieb Steve Horne: [...]
groupCut :: (x -> x -> Bool) -> [x] -> [[x]] [...]
How about a break function that respects an escape character (1. arg) (and drops the delimiter - 2. arg) and use this function for unfolding? Interesting.
I was going to accuse you of cheating - who says there's a spare value to use? - but you seem to be using Maybe, so well played. You're also using unfoldr, which I really must play with a bit - I don't really have a feel for how unfolding works ATM. Thanks.

Am 05.01.2012 13:04, schrieb Steve Horne: [...]
I was going to accuse you of cheating - who says there's a spare value to use? - but you seem to be using Maybe, so well played.
You're also using unfoldr, which I really must play with a bit - I don't really have a feel for how unfolding works ATM.
You may prefer another variant of unfoldr (without Maybe): unfoldr' :: ([b] -> (a, [b])) -> [b] -> [a] unfoldr' f l = if null l then [] else let (a, r) = f l in a : unfoldr' f r split' :: (a -> Bool) -> (a -> Bool) -> [a] -> [[a]] split' e p = unfoldr' $ break' e p C. P.S. my break' function fails for "\r\r\n" (as the first char escapes the second and the second no longer the third)

On Mon, Jan 02, 2012 at 12:44:23PM +0300, max wrote:
I want to write a function whose behavior is as follows:
foo "string1\nstring2\r\nstring3\nstring4" = ["string1", "string2\r\nstring3", "string4"]
Note the sequence "\r\n", which is ignored. How can I do this?
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
A short yet requiring regex solution:
import Text.Regex.PCRE match (makeRegex "(?:[^\r\n]+|\r\n)+" :: Regex) "b\nc\r\n\n\r\n\nd" :: [[String]]

If you're interested in learning parsec, RWH covered this topic in depth in
Chapter 16, Choices and Errors:
http://book.realworldhaskell.org/read/using-parsec.html.
On Mon, Jan 2, 2012 at 3:44 AM, max
I want to write a function whose behavior is as follows:
foo "string1\nstring2\r\nstring3\nstring4" = ["string1", "string2\r\nstring3", "string4"]
Note the sequence "\r\n", which is ignored. How can I do this?
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (13)
-
Anupam Jain
-
AUGER Cédric
-
Brandon Allbery
-
Christian Maeder
-
emacsray@gmail.com
-
Felipe Almeida Lessa
-
Jon Fairbairn
-
Jonathan Frywater
-
Markus Läll
-
max
-
Simon Hengel
-
Steve Horne
-
Yves Parès