
On 02/01/2012 11:12, Jon Fairbairn wrote:
max writes:
I want to write a function whose behavior is as follows:
foo "string1\nstring2\r\nstring3\nstring4" = ["string1", "string2\r\nstring3", "string4"]
Note the sequence "\r\n", which is ignored. How can I do this?

cabal install split
then do something like
import Data.List (groupBy)
import Data.List.Split (splitOn)

rn '\r' '\n' = True
rn _ _ = False
required_function = fmap concat . splitOn ["\n"] . groupBy rn
(though that might be an abuse of groupBy)

Sadly, it turns out that not only is this an abuse of groupBy, but it has (I think) a subtle bug as a result. I was inspired by this to try some other groupBy stuff, and it didn't work. After scratching my head a bit, I tried the following...

    Prelude> import Data.List
    Prelude Data.List> groupBy (<) [1,2,3,2,1,2,3,2,1]
    [[1,2,3,2],[1,2,3,2],[1]]

That wasn't exactly the result I was expecting :-(

Explanation (best guess) - the function passed to groupBy, according to the docs, is meant to test whether two values are 'equal'. I'm guessing the assumption is that the function will effectively treat values as belonging to equivalence classes. That implies some rules such as...

    reflexivity  : (a == a)
    symmetry     : (a == b) => (b == a)
    transitivity : (a == b) && (b == c) => (a == c)

The second and third rules together are probably to blame here. If the test really is an equivalence, then comparing an item with the first item of its group gives the same answer as comparing it with its neighbour, so by the rules groupBy doesn't need to compare adjacent items. When it starts a new group, it seems to always compare against the first item in that new group until it finds a mismatch. In my test, that means it's always comparing with 1 - the second 2 is included in each group because although (3 < 2) is False, groupBy isn't testing that - it's testing (1 < 2).

In the context of this \r\n test function, this behaviour will I guess result in \r\n\n being combined into one group, because each \n is compared with the leading \r. The second \n will therefore not be seen as a valid splitting point.

Personally, I think this is a tad disappointing. Given that groupBy cannot check or enforce that its test respects equivalence classes, it should ideally give results that make as much sense as possible either way. That said, even if the test was always given adjacent elements, there's still room for a different order of processing the list (left-to-right or right-to-left) to give different results - and in any case, maybe it's more efficient the way it is.
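
For what it's worth, here is a rough sketch of what "always compare adjacent elements" could look like. The names groupByAdjacent and splitLines are made up for illustration and are not part of Data.List or the split package. To keep the sketch runnable with base alone, the splitting step is hand-rolled with break instead of using splitOn. It is one possible workaround under those assumptions, not a claim about how groupBy ought to be implemented.

```haskell
import Data.List (groupBy)

-- The test from the quoted solution: True only for a '\r' followed by '\n'.
rn :: Char -> Char -> Bool
rn '\r' '\n' = True
rn _    _    = False

-- Hypothetical variant of groupBy that compares each element with its
-- immediate right-hand neighbour rather than with the first element of
-- the current group. It folds from the right, so it is intended for
-- finite lists only.
groupByAdjacent :: (a -> a -> Bool) -> [a] -> [[a]]
groupByAdjacent rel = foldr step []
  where
    step x ((y:ys):gs) | rel x y = (x : y : ys) : gs
    step x gs                    = [x] : gs

-- Hypothetical replacement for required_function: split on a bare "\n"
-- group while leaving "\r\n" pairs inside the pieces.
splitLines :: String -> [String]
splitLines = map concat . splitAtLF . groupByAdjacent rn
  where
    splitAtLF gs = case break (== "\n") gs of
      (chunk, [])       -> [chunk]
      (chunk, _ : rest) -> chunk : splitAtLF rest

-- groupBy rn "a\r\n\nb"          == ["a","\r\n\n","b"]
-- groupByAdjacent rn "a\r\n\nb"  == ["a","\r\n","\n","b"]
-- groupByAdjacent (<) [1,2,3,2,1,2,3,2,1]
--                                == [[1,2,3],[2],[1,2,3],[2],[1]]
-- splitLines "string1\nstring2\r\nstring3\nstring4"
--                                == ["string1","string2\r\nstring3","string4"]
```

With the library groupBy the second \n ends up inside the "\r\n\n" group, so no group equal to "\n" exists to split on. With the adjacent comparison it stays a group of its own.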