On 02/01/2012 09:44, max wrote:
I want to write a function whose behavior is as follows:
foo "string1\nstring2\r\nstring3\nstring4" = ["string1", "string2\r\nstring3", "string4"]
Note the sequence "\r\n", which is ignored. How can I do this?

Doing it what is probably the hard way (and getting it wrong) looks like the following:
    -- Function to accept (normally) a single character. Special-cases
    -- \r\n. Refuses to accept \n. Result is either an empty list, or
    -- a single (accepted, remaining) pair.
    parseTok :: String -> [(String, String)]
    parseTok "" = []
    parseTok (c1:c2:cs)
      | (c1 == '\r') && (c2 == '\n') = [(c1:c2:[], cs)]
    parseTok (c:cs)
      | c /= '\n' = [(c:[], cs)]
      | otherwise = []

    -- Accept a sequence of those (mostly single) characters
    parseItem :: String -> [(String, String)]
    parseItem "" = [("", "")]
    parseItem cs = [ (j1s ++ j2s, k2s) | (j1s, k1s) <- parseTok cs
                                       , (j2s, k2s) <- parseItem k1s ]

    -- Accept a whole list of strings
    parseAll :: String -> [([String], String)]
    parseAll [] = [([], "")]
    parseAll cs = [ (j1s : j2s, k2s) | (j1s, k1s) <- parseItem cs
                                     , (j2s, k2s) <- parseAll k1s ]

    -- Get the first valid result, which should have consumed the
    -- whole string, but this isn't checked. No check for existence either.
    parse :: String -> [String]
    parse cs = fst (head (parseAll cs))

I got it wrong in that this never consumes the \n between items, so it'll all go horribly wrong: parseItem can only succeed by consuming the rest of the input, and parseTok refuses the \n, so any input containing a separator yields no parse at all and head fails on the empty list. There's a good chance there's a typo or two as well. The basic idea should be clear, though. Maybe I should fix it, but I've got some other things to do at the moment.

Think of the \n as a separator, or as a prefix to every "item" but the first. Alternatively, treat it as a prefix to *every* item, and artificially add an initial one to the string in the top-level parse function. Then use tail etc. to remove that from the first item.

See http://channel9.msdn.com/Tags/haskell: there's a series of 13 videos by Dr. Erik Meijer. The eighth in the series covers this basic technique. It calls these parsers monadic and uses do notation, which confused me slightly at first; it's the *list* type that is monadic in this case, and (as you can see) I prefer to use list comprehensions rather than do notation.

There may be a simpler way, though. There's still a fair bit of Haskell and its ecosystem I need to figure out.
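Following the "prefix to every item" idea above, a revised version of the same list-comprehension parser might look like the sketch below. It is a sketch under a few assumptions: parseItem now stops at the first \n (or the end of input) instead of requiring the whole remaining string to be consumed, parseNlItem is a hypothetical helper that eats the \n prefix, and parse keeps only parses that consumed everything. Because the parser itself consumes the artificial leading \n, no tail is needed afterwards.

```haskell
-- Revised sketch: '\n' is treated as a prefix to every item, and
-- parse adds an artificial one in front of the whole input.

-- Accept one token: a "\r\n" pair, or any single character except '\n'.
parseTok :: String -> [(String, String)]
parseTok ('\r' : '\n' : cs) = [("\r\n", cs)]
parseTok (c : cs) | c /= '\n' = [([c], cs)]
parseTok _ = []

-- Accept as many tokens as possible, stopping (without consuming it)
-- at the first '\n' or at the end of the input.
parseItem :: String -> [(String, String)]
parseItem cs
  | null toks = [("", cs)]
  | otherwise = [ (t ++ item, rest') | (t, rest) <- toks
                                     , (item, rest') <- parseItem rest ]
  where
    toks = parseTok cs

-- Hypothetical helper: consume the '\n' prefix, then one item.
parseNlItem :: String -> [(String, String)]
parseNlItem ('\n' : cs) = parseItem cs
parseNlItem _ = []

-- Accept a list of '\n'-prefixed items.
parseAll :: String -> [([String], String)]
parseAll "" = [([], "")]
parseAll cs = [ (item : items, rest') | (item, rest) <- parseNlItem cs
                                      , (items, rest') <- parseAll rest ]

-- Add the artificial leading '\n', then keep only the parses that
-- consumed the whole input (avoiding head on an empty list).
parse :: String -> [String]
parse cs = case [items | (items, "") <- parseAll ('\n' : cs)] of
  (items : _) -> items
  []          -> []
```

With this, parse "string1\nstring2\r\nstring3\nstring4" gives ["string1", "string2\r\nstring3", "string4"]. Since \n is treated strictly as a separator, a trailing \n produces a final empty item.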
There's a tool called alex, for instance, but I've not used it.
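For a format this simple, a direct recursive split may be enough, without any parser machinery or a lexer generator. Note that Data.List's lines won't do it on its own, since it breaks at every \n, including the one in "\r\n". The sketch below is hand-written and not based on any library function; the helper go and its reversed accumulator are illustrative choices, and it assumes the same separator semantics as the question.

```haskell
-- Split on '\n', except where the '\n' is the second half of "\r\n".
foo :: String -> [String]
foo = go ""
  where
    -- acc holds the current item's characters in reverse order.
    go acc ""               = [reverse acc]
    go acc ('\r':'\n':rest) = go ('\n':'\r':acc) rest  -- keep "\r\n" inside the item
    go acc ('\n':rest)      = reverse acc : go "" rest -- a bare '\n' ends the item
    go acc (c:rest)         = go (c:acc) rest
```

For example, foo "string1\nstring2\r\nstring3\nstring4" evaluates to ["string1", "string2\r\nstring3", "string4"]. Building each item in reverse keeps every step constant-time; the reverse happens once per item.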