
As someone who's not used these library methods before, I would expect splitBy and splitLines to work differently to each other. When splitting into lines, I would assume that it is repeatedly applying the regular expression "([^t]*)(t|$)", where t is the line terminator. You return the first group each time and discard the rest; the second group also handles the end-of-string boundary condition. As others have said, I would expect splitBy to return all of the zero-length matches as well, interleaving a "[^t]*" match-and-return with a "t" match-and-discard. The collapsed form of the output is the same as interleaving a "[^t]+" match-and-return with a "t*" match-and-discard.

Matthew

On Thursday 13 July 2006 10:16, Jon Fairbairn wrote:
On 2006-07-12 at 23:24BST "Brian Hulley" wrote:
Christian Maeder wrote:
Donald Bruce Stewart wrote:
Question over whether it should be:
splitBy (=='a') "aabbaca" == ["","","bb","c",""]
or
splitBy (=='a') "aabbaca" == ["bb","c"]
I argue the second form is what people usually want.
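The two candidates can be made concrete with a minimal sketch. The `splitBy` below is a hypothetical definition written for illustration (it is not a Prelude or `Data.List` function), built on `break`; `splitByCollapsed` is an equally hypothetical name for the second form.

```haskell
-- Hypothetical splitBy with separator semantics (the first form):
-- n separators always yield n+1 fields, so empty fields are kept.
splitBy :: (a -> Bool) -> [a] -> [[a]]
splitBy p xs = case break p xs of
  (field, [])       -> [field]                 -- no separator left: final field
  (field, _ : rest) -> field : splitBy p rest  -- drop one separator and continue

-- Hypothetical collapsed variant (the second form): every empty field,
-- including leading and trailing ones, is dropped, so runs of separators
-- behave as a single separator.
splitByCollapsed :: (a -> Bool) -> [a] -> [[a]]
splitByCollapsed p = filter (not . null) . splitBy p
```

Loaded into GHCi, `splitBy (=='a') "aabbaca"` gives `["","","bb","c",""]` and `splitByCollapsed (=='a') "aabbaca"` gives `["bb","c"]`. Defining the second form in terms of the first keeps the information-preserving version primitive; the collapsed one is a single `filter` away.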
Yes, the second form is needed for "words", but the first form is needed for "lines", where one final empty element needs to be removed from your version!
Prelude> lines "a\nb\n"
["a","b"]
Prelude> lines "a\n\nb\n\n"
["a","","b",""]
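Christian's point can be checked with a small sketch, again assuming a hypothetical `splitBy` with separator semantics: `Prelude.words` coincides with the collapsed form, and `Prelude.lines` coincides with the first form once a single trailing empty field is dropped. `linesViaSplit` and `wordsViaSplit` are illustrative names, not library functions.

```haskell
import Data.Char (isSpace)

-- Hypothetical splitBy with separator semantics (keeps empty fields).
splitBy :: (a -> Bool) -> [a] -> [[a]]
splitBy p xs = case break p xs of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitBy p rest

-- 'lines' via the first form: splitBy always returns at least one field,
-- and the last one is empty exactly when the input is empty or ends in '\n'.
-- Dropping that single trailing empty field reproduces Prelude.lines.
linesViaSplit :: String -> [String]
linesViaSplit s
  | null (last fields) = init fields
  | otherwise          = fields
  where
    fields = splitBy (== '\n') s

-- 'words' via the second form: drop every empty field.
wordsViaSplit :: String -> [String]
wordsViaSplit = filter (not . null) . splitBy isSpace
```

For example, `linesViaSplit "a\n\nb\n\n"` and `lines "a\n\nb\n\n"` both give `["a","","b",""]`.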
Prelude.lines and Prelude.unlines treat '\n' as a terminator instead of a separator. I'd argue that this is poor design, since information is lost, i.e. lines . unlines === id whereas unlines . lines =/= id. If '\n' had been properly conceived of as a separator, the identity would hold.
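Brian's identity claim is easy to check with a sketch. `splitLinesSep` and `joinLinesSep` are hypothetical separator-based replacements for `lines` and `unlines`, built on the same illustrative `splitBy`; `intercalate` is the real `Data.List` function. With these, `joinLinesSep . splitLinesSep` is the identity on every `String`, whereas `unlines (lines "a")` is `"a\n"`. The converse, `splitLinesSep . joinLinesSep`, holds only for non-empty lists whose elements contain no '\n': `joinLinesSep []` is `""`, which splits back to `[""]`.

```haskell
import Data.List (intercalate)

-- Hypothetical splitBy with separator semantics (keeps empty fields).
splitBy :: (a -> Bool) -> [a] -> [[a]]
splitBy p xs = case break p xs of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitBy p rest

-- Hypothetical separator-based counterparts of lines/unlines.
-- splitLinesSep removes each '\n' and keeps every field between them.
splitLinesSep :: String -> [String]
splitLinesSep = splitBy (== '\n')

-- joinLinesSep puts '\n' *between* lines rather than after each one,
-- so it reinserts exactly the separators that splitLinesSep removed.
joinLinesSep :: [String] -> String
joinLinesSep = intercalate "\n"
```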
Hooray! I've been waiting to ask "Why aren't we asking what laws hold for these operations?" but now you've saved me the effort. I've been bitten by unlines . lines /= id already; it's something we could gainfully change without wrecking too much code, methinks.
So I vote for the first option, i.e.:
splitBy (=='a') "aabbaca" == ["","","bb","c",""]
Seconded.
As far as naming is concerned, since this is a declarative language, surely we shouldn't be using active verbs like this? (OK, I lost that argument way back in the mists of Haskell 0.0 with take. Before then I called "take" "first": "first n some_list" reads perfectly well.)
Jón
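Matthew's regular-expression reading, at the top of this message, can be transcribed into plain Haskell without a regex package. This is a sketch under assumptions: a terminator predicate stands in for `t`, `break` plays the role of the `[^t]*` match, and `splitLines`, `splitBy` and `splitByCollapsed` are hypothetical definitions, not existing library functions.

```haskell
-- Matthew's model of splitLines: while input remains, match ([^t]*)(t|$),
-- return the [^t]* group and discard the terminator (or end of input).
splitLines :: (a -> Bool) -> [a] -> [[a]]
splitLines _ [] = []                              -- input consumed: stop
splitLines isT xs = case break isT xs of
  (line, [])       -> [line]                      -- (t|$) matched end of input
  (line, _ : rest) -> line : splitLines isT rest  -- (t|$) matched a terminator

-- Matthew's model of splitBy: alternate a [^t]* match (returned, possibly
-- empty) with a single t (discarded), so zero-length fields are kept.
splitBy :: (a -> Bool) -> [a] -> [[a]]
splitBy isT xs = case break isT xs of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitBy isT rest

-- The collapsed output: keep only the non-empty runs of non-terminators.
splitByCollapsed :: (a -> Bool) -> [a] -> [[a]]
splitByCollapsed isT = filter (not . null) . splitBy isT
```

The only difference between the first two is the empty-input case: `splitLines` stops once the input is consumed, which is why a trailing terminator produces no final empty line (matching `Prelude.lines`), while `splitBy` always emits the field after the last separator.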