
Thanks. It seems my original parser also works against FOO,BAR,BAZ if you only modify atom = string "," <|> ( many1 $ noneOf "()<>," ) -- add , Indeed, what to call the "thingies" in a parser is a source of some personal consternation. What is a token, what is an atom, what is an expr? It all seems to be somewhat ad hoc.
2009/6/9 Daniel Fischer
: Am Dienstag 09 Juni 2009 20:29:09 schrieb Thomas Hartman:
All I want to do is split on commas, but not the commas inside () or <> tags.
I have been wanting to master parsec for a long time and this simple exercise looked like a good place to start.
The code below does the right thing. Am I missing any tricks to make this simpler/neater?
Thanks, thomas.
thartman@ubuntu:~/perlArena>cat splitEm. splitEm.hs splitEm.hs~ splitEm.pl splitEm.pl~ thartman@ubuntu:~/perlArena>cat splitEm.hs {-# LANGUAGE ScopedTypeVariables #-} import Text.ParserCombinators.Parsec import Text.ParserCombinators.Parsec.Char import Text.PrettyPrint (vcat, render, text) import Data.List.Split hiding (sepBy, chunk) import Text.ParserCombinators.Parsec.Token
import Debug.Trace import Debug.Trace.Helpers
-- this works, but is there a neater/cleaner way? main = ripInputsXs (toEof splitter) "splitter" [ goodS, badS ]
-- I need a way to split on commas, but not the commas inside '<>' or '()' characters goodS = "<*2>FOO<2,1>,<*3>(SigB<8:0:2>,BAR),<*2>Siga<2:0>,Sigb<8,7,6,5,0>" badS = "<*2)FOO<2,1>,<*3>(SigB<8:0:2>,BAR),<*2>Siga<2:0>,Sigb<8,7,6,5,0>" -- the first < matches a ), so reject this
splitter = do chunks :: [String] <- toEof (many chunk) let pieces = map concat $ splitOn [","] chunks return pieces -- chunks where atom = string "," <|> ( many1 $ noneOf "()<>" ) chunk = parenExpr <|> atom
I think that does not do what you want.
For input "FOO,BAR,BAZ", chunks is ["FOO,BAR,BAZ"], that won't be split; as far as I can see, it splits only on commas directly following a parenExpr (or at the beginning of the input or directly following another splitting comma).
parenExpr :: GenParser Char st [Char] parenExpr = let paren p = betweenInc (char '(' ) (char ')' ) p <|> betweenInc (char '<' ) (char '>' ) p in paren $ option "" $ do ps <- many1 $ parenExpr <|> atom return . concat $ ps
betweenInc o' c' p' = do o <- o' p <- p' c <- c' return $ [o] ++ p ++ [c]
toEof p' = do r <- p' eof return r
ripInputs prs prsName xs = mapM_ (putStrLn . show . parse prs prsName ) xs ripInputsXs prs prsName xs = mapM_ (putStrLn . showXs . parse prs prsName ) xs where showXs v = case v of Left e -> show e Right xs -> render . vcat . map text $ xs
I can offer (sorry for the names, and I don't know if what that does is really what you want):
keepSepBy :: Parser a -> Parser a -> Parser [a] keepSepBy p sep = (do r <- p (do s <- sep xs <- keepSepBy p sep return (r:s:xs)) <|> return [r]) <|> return []
twain :: Parser a -> Parser a -> Parser [a] -> Parser [a] twain open close list = do o <- open l <- list c <- close return (o:l++[c])
comma :: Parser String comma = string ","
simpleChar :: Parser Char simpleChar = noneOf "<>(),"
suite :: Parser String suite = many1 simpleChar
atom :: Parser String atom = fmap concat $ many1 (parenExp <|> suite)
parenGroup :: Parser String parenGroup = fmap concat $ keepSepBy atom comma
parenExp :: Parser String parenExp = twain (char '<') (char '>') parenGroup <|> twain (char '(') (char ')') parenGroup
chunks :: Parser [String] chunks = sepBy atom comma
splitter = do cs <- chunks eof return cs
goodS = "<*2>FOO<2,1>,<*3>(SigB<8:0:2>,BAR),<*2>Siga<2:0>,Sigb<8,7,6,5,0>" badS = "<*2)FOO<2,1>,<*3>(SigB<8:0:2>,BAR),<*2>Siga<2:0>,Sigb<8,7,6,5,0>"
goodRes = parse splitter "splitter" goodS badRes = parse splitter "splitter" badS
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe