
Hello, I need a function and wonder whether I can copy some existing code so I don't have to write it myself. It should split a string into a list of strings: splitAtTopLevelCommas :: String -> [String] I need something similar to `splitOn ","` from the Text package with the property intercalate "," . splitAtTopLevelCommas = id But it should split the string only on a few commas, not all. You can think of the result list as representations of Haskell values, for example splitAtTopLevelCommas "True,(1,(2,[3,4])),Just ('a',\")\")" should yield ["True", "(1,(2,[3,4]))", "Just ('a',\")\")"] I expect writing this function to be quite tedious (ignore commas in parens, ignore parens in strings, quotation, ...) and would prefer to copy code from some parsing library. Do you have an idea what I could use? Or how to solve it from scratch in a few lines? Sebastian

You might want to check out parsec, and the chapter related to it in RWH.
http://book.realworldhaskell.org/read/using-parsec.html
On Sun, Jan 23, 2011 at 12:09 AM, Sebastian Fischer
Hello,
I need a function and wonder whether I can copy some existing code so I don't have to write it myself.
It should split a string into a list of strings:
splitAtTopLevelCommas :: String -> [String]
I need something similar to `splitOn ","` from the Text package with the property
intercalate "," . splitAtTopLevelCommas = id
But it should split the string only on a few commas, not all. You can think of the result list as representations of Haskell values, for example
splitAtTopLevelCommas "True,(1,(2,[3,4])),Just ('a',\")\")"
should yield
["True", "(1,(2,[3,4]))", "Just ('a',\")\")"]
I expect writing this function to be quite tedious (ignore commas in parens, ignore parens in strings, quotation, ...) and would prefer to copy code from some parsing library. Do you have an idea what I could use? Or how to solve it from scratch in a few lines?
Sebastian
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Alex R

Sebastian Fischer
I expect writing this function to be quite tedious (ignore commas in parens, ignore parens in strings, quotation, ...) and would prefer to copy code from some parsing library. Do you have an idea what I could use? Or how to solve it from scratch in a few lines?
Maybe Text.Show.Pretty.parseValue in the pretty-show package can help? -- Edit this signature at http://www.digitas.harvard.edu/cgi-bin/ken/sig Insert wit here.

On Sun, Jan 23, 2011 at 4:31 PM, Chung-chieh Shan
Maybe Text.Show.Pretty.parseValue in the pretty-show package can help?
That's what I was looking for, thanks!
On Sun, Jan 23, 2011 at 5:23 PM, Stephen Tetley
I don't think you can do this "simply" as you think you would always have to build a parse tree.
Isn't it enough to maintain a stack of open parens, brackets, char- and string-terminators and escape chars? Below is my attempt at solving the problem without an expression parser. In practice, if you follow the skeleton syntax tree style you might
find "not caring" about the details of syntax is almost as much work as caring about them. I've tried a couple of times to make a skeleton parser that does paren nesting and little else, but always given up and just used a proper parser as the skeleton parser never seemed robust.
Indeed I doubt that the implementation below is robust and it's too tricky to be easily maintainable. I include it for reference. Let me know if you spot an obvious mistake.. Sebastian splitTLC :: String -> [String] splitTLC = parse "" type Stack = String parse :: Stack -> String -> [String] parse _ "" = [] parse st (c:cs) = next c st $ parse (updStack c st) cs next :: Char -> Stack -> [String] -> [String] next c [] xs = if c==',' then [] : xs else c <: xs next c (_:_) xs = c <: xs infixr 0 <: (<:) :: Char -> [String] -> [String] c <: [] = [[c]] c <: (x:xs) = (c:x):xs updStack :: Char -> Stack -> Stack updStack char stack = case (char,stack) of -- char is an escaped character (_ ,'\\':xs) -> xs -- the next character is not -- char is the escape character ('\\', xs) -> '\\':xs -- push it on the stack -- char is the string terminator ('"' , '"':xs) -> xs -- closes current string literal ('"' , ''':xs) -> ''':xs -- ignored inside character ('"' , xs) -> '"':xs -- opens a new string -- char is the character terminator (''' , ''':xs) -> xs -- closes current character literal (''' , '"':xs) -> '"':xs -- ignored inside string (''' , xs) -> ''':xs -- opens a new character -- parens and brackets (_ , '"':xs) -> '"':xs -- are ignored inside strings (_ , ''':xs) -> ''':xs -- and characters ('(' , xs) -> '(':xs -- new opening paren (')' , '(':xs) -> xs -- closing paren ('[' , xs) -> '[':xs -- opening bracket (']' , '[':xs) -> xs -- closing bracket -- other character don't modify the stack (ignoring record syntax) (_ , xs) -> xs

I don't think you can do this "simply" as you think you would always have to build a parse tree. If the input is valid Haskell you could follow Chung-chieh Shan's suggestion, otherwise you could parse to a "skeleton syntax tree" - look for work by Jonathan Bacharach on Dylan macros and "Java Syntax Extension". In practice, if you follow the skeleton syntax tree style you might find "not caring" about the details of syntax is almost as much work as caring about them. I've tried a couple of times to make a skeleton parser that does paren nesting and little else, but always given up and just used a proper parser as the skeleton parser never seemed robust.
participants (4)
-
Alex Rozenshteyn
-
Chung-chieh Shan
-
Sebastian Fischer
-
Stephen Tetley