
[split, chop, all that]
How about biting the bullet and providing a real "tokenizer"? I have had the problem of splitting a text into lines that used \r\n as the EOL marker, not just \n, so I couldn't use 'lines'. Judging by the 'split' (or 'chop') proposals I've seen so far, I couldn't use those for this purpose either, because they don't support multi-character separators.

Shooting from the hip, I'd say this more general function would do the trick:

  tokenize :: (a -> Bool) -> (a -> Bool) -> [a] -> [[a]]

The first function returns 'True' if the current input element is part of a valid token. The second function (the "skipper") returns 'True' if the current element is ignorable "whitespace". The input "foo bar \t claus \r\n stuff", for instance, could be tokenized into ["foo", "bar", "claus", "stuff"] by a call along these lines:

  tokenize isAlphaNum isSpace "foo bar \t claus \r\n stuff"

Basically, the 'tokenize' function would consume input until the first function says 'False'. Then it would consume (and drop) input until the second function says 'False'. And so on, until the end of the input is reached. One would have to think about what 'tokenize' should do if _both_ functions say 'False' for an input element, but IMHO that could simply be an 'error'.

I think this would be a nice addition to the standard library, and 'split' (or 'chop') would simply be specialized versions of it.
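For concreteness, here is a minimal sketch of what such a 'tokenize' could look like. The error message and the handling of leading/trailing whitespace are my own assumptions, not a settled design:

  import Data.Char (isAlphaNum, isSpace)

  -- Consume elements while 'keep' holds, then drop elements while
  -- 'skip' holds, and repeat until the input is exhausted.  An
  -- element satisfying neither predicate is reported as an error,
  -- as suggested above.
  tokenize :: (a -> Bool) -> (a -> Bool) -> [a] -> [[a]]
  tokenize _    _    [] = []
  tokenize keep skip xs
    | null tok  = tokenize keep skip rest'
    | otherwise = tok : tokenize keep skip rest'
    where
      (tok, rest) = span keep xs
      rest' = case rest of
        []                -> []
        (y:_) | skip y    -> dropWhile skip rest
              | otherwise -> error "tokenize: unclassifiable element"

With this definition,

  tokenize isAlphaNum isSpace "foo bar \t claus \r\n stuff"

evaluates to ["foo", "bar", "claus", "stuff"], and my \r\n line-splitting case falls out as

  tokenize (\c -> c /= '\r' && c /= '\n') (\c -> c == '\r' || c == '\n')

modulo the fact that consecutive separators are skipped together, so empty lines are dropped, unlike with 'lines'.

Peter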