Re: [Haskell-cafe] attpoarsec and recursive parsing

Hi Frederic, Essentially, the '|' character acts like a binary operator in parsing arithmetical expression trees. To that end, the most ergonomic module I know of is provided by parsec [1], which has a port [2] to attoparsec. You have to decide whether the '|' operator should be left- or right- associative. That question vanishes if you decide to re-design your data type as data MaskLocation = MaskLocation Text | MaskLocation'Tmpl Text | MaskLocation'Or (NonEmpty MaskLocation) and use the sepBy1 combinator. Also you should think about whether and how to escape the separator Char inside ordinary MaskLocation strings. Olaf [1] https://hackage.haskell.org/package/parsec-3.1.17.0/docs/Text-ParserCombinat... [2] https://hackage.haskell.org/package/attoparsec-expr

Hello, I end up with this solution in order to parse this data type. If you have some advice in order to simplify or beautify the Parser part do not hesitate. Cheers Fred data MaskLocation = MaskLocation Text | MaskLocation'Tmpl Text | MaskLocation'Or MaskLocation MaskLocation deriving (Eq, Generic, Show) deriving anyclass (FromJSON, ToJSON) instance FieldEmitter MaskLocation where fieldEmitter (MaskLocation t) = t fieldEmitter (MaskLocation'Tmpl t) = t fieldEmitter (MaskLocation'Or l r) = fieldEmitter l <> " | " <> fieldEmitter r instance FieldParsable MaskLocation where fieldParser = do let loc :: Text -> MaskLocation loc t = if "{scannumber:" `Data.Text.isInfixOf` t then MaskLocation'Tmpl (strip t) else MaskLocation (strip t) t <- takeTill (== '|') if t == "" then fail "MaskLocation is Empty" else do mc <- peekChar case mc of Nothing -> pure $ loc t Just '|' -> do _ <- char '|' -- extract the '|' char MaskLocation'Or (loc t) <$> fieldParser Just c -> fail ("MaskLocation " <> [c])

On Fri, 2024-11-22 at 11:05 +0100, PICCA Frederic-Emmanuel wrote:
Hello, I end up with this solution
in order to parse this data type.
If you have some advice in order to simplify or beautify the Parser part do not hesitate.
Cheers
Fred
data MaskLocation = MaskLocation Text | MaskLocation'Tmpl Text | MaskLocation'Or MaskLocation MaskLocation deriving (Eq, Generic, Show) deriving anyclass (FromJSON, ToJSON)
instance FieldEmitter MaskLocation where fieldEmitter (MaskLocation t) = t fieldEmitter (MaskLocation'Tmpl t) = t fieldEmitter (MaskLocation'Or l r) = fieldEmitter l <> " | " <> fieldEmitter r
instance FieldParsable MaskLocation where fieldParser = do let loc :: Text -> MaskLocation loc t = if "{scannumber:" `Data.Text.isInfixOf` t then MaskLocation'Tmpl (strip t) else MaskLocation (strip t)
t <- takeTill (== '|') if t == "" then fail "MaskLocation is Empty" else do mc <- peekChar case mc of Nothing -> pure $ loc t Just '|' -> do _ <- char '|' -- extract the '|' char MaskLocation'Or (loc t) <$> fieldParser Just c -> fail ("MaskLocation " <> [c])
The peekChar seems suspicious. That is what backtracking parsers abstract over. Attoparsec always backtracks, in other parser libraries there is the try combinator. You match on the '|' char using peekChar and then later discard it by the _ <- char '|' parser fragment. Also note that MaskLocation is a tree (with two different types of leaves), but your fieldParser will only ever produce trees that have a single leaf in the left branch. These trees are therefore equivalent to non-empty linked lists. Therefore you might as well use sepBy1: type P = Data.Attoparsec.Text.Parser loc :: Text -> Either Text Text loc t = if ("{scannumber:" `Data.Text.isInfixOf` t) then Left (strip t) else Right (strip t) -- note: either id id ~ fieldEmitter locationOrTempl :: P (Either Text Text) locationOrTempl = fmap loc (takeWhile1 (/= '|')) sep :: P Text sep = fromString "|" -- IsString instance of P maskLocations :: P [Either Text Text] maskLocations = locationOrTempl `sepBy1` sep Olaf
participants (2)
-
Olaf Klinke
-
PICCA Frederic-Emmanuel