
Hi, I'm trying (for the first time ever) to use RULES pragmas to achieve some nice speedups in my bytestring parsing library. The relevant code in my library's module is: -- The module imports Control.Applicative which containes 'many' and 'some'. -- | The parser @satisfy p@ succeeds for any byte for which the -- supplied function @p@ returns 'True'. Returns the byte that is -- actually parsed. satisfy :: (Word8 -> Bool) -> Parser Word8 satisfy p = Parser $ \s@(S bs pos eof) succ fail -> case S.uncons bs of Just (b, bs') -> if p b then succ b (S bs' (pos + 1) eof) else fail s Nothing -> if eof then fail s else IPartial $ \x -> case x of Just bs' -> retry (S bs' pos eof) Nothing -> fail (S bs pos True) where retry s' = unParser (satisfy p) s' succ fail -- | @byte b@ parses a single byte @b@. Returns the parsed byte -- (i.e. @b@). byte :: Word8 -> Parser Word8 byte b = satisfy (== b) -- --------------------------------------------------------------------- -- Rewrite rules satisfyMany :: (Word8 -> Bool) -> Parser S.ByteString satisfyMany p = undefined -- More efficient implementation goes here. satisfySome :: (Word8 -> Bool) -> Parser S.ByteString satisfySome p = undefined -- More efficient implementation goes here. {-# RULES "fmap/pack/many/satisfy" forall p. fmap S.pack (many (satisfy p)) = satisfyMany p "fmap/pack/some/satisfy" forall p. fmap S.pack (some (satisfy p)) = satisfySome p #-} In another module where I use the library I have this code: pHeaders :: Parser [(S.ByteString, S.ByteString)] pHeaders = many header where header = liftA2 (,) fieldName (byte (c2w ':') *> spaces *> contents) fieldName = liftA2 (S.cons) letter fieldChars contents = liftA2 (S.append) (fmap S.pack $ some notEOL <* crlf) (continuation <|> pure S.empty) continuation = liftA2 (S.cons) ((c2w ' ') <$ some (oneOf (map c2w " \t"))) contents -- It's important that all three of these definitions are kept on the -- top level to have RULES fire correctly. fieldChars = fmap S.pack $ many fieldChar -- fieldChar = letter <|> digit <|> oneOf (map c2w "-_") fieldChar = satisfy isFieldChar where isFieldChar b = (isDigit $ chr $ fromIntegral b) || (isAlpha $ chr $ fromIntegral b) || (b `elem` map c2w "-_") I want the fieldChars use of 'fmap S.pack $ many fieldChar' to trigger my rewrite rule "fmap/pack/many/satisfy" which it does in this case. The trouble is that the rule only triggers when I make at least fieldChars and fieldChar top-level definition and isFieldChar either a named local definition in fieldChar or a top-level definition. If I turn the predicate (isFieldChar) into to an anonymous lambda function it doesn't trigger, if I make either fieldChars or fieldChars a local defintion (in a where clause) of pHeaders it doesn't trigger. If I make fieldChar a local definition in fieldChars it doesn't trigger, etc. It would be great if there was a way to make this a bit less fragile and have the rule trigger more often as it is potentially a huge performance win. I understand it's hard to guarantee that the rule always triggers but now it triggers in rare cases. -- Johan

Hey Johan, The main thing to remember is that anything you wish to match on in a rule needs to not be inlined in the first pass. So to match "many" or "satisfy" robustly, you'll need: {-# NOINLINE [1] many #-} For example. johan.tibell:
Hi,
I'm trying (for the first time ever) to use RULES pragmas to achieve some nice speedups in my bytestring parsing library. The relevant code in my library's module is:
-- The module imports Control.Applicative which containes 'many' and 'some'.
-- | The parser @satisfy p@ succeeds for any byte for which the -- supplied function @p@ returns 'True'. Returns the byte that is -- actually parsed. satisfy :: (Word8 -> Bool) -> Parser Word8 satisfy p = Parser $ \s@(S bs pos eof) succ fail -> case S.uncons bs of Just (b, bs') -> if p b then succ b (S bs' (pos + 1) eof) else fail s Nothing -> if eof then fail s else IPartial $ \x -> case x of Just bs' -> retry (S bs' pos eof) Nothing -> fail (S bs pos True) where retry s' = unParser (satisfy p) s' succ fail
-- | @byte b@ parses a single byte @b@. Returns the parsed byte -- (i.e. @b@). byte :: Word8 -> Parser Word8 byte b = satisfy (== b)
-- --------------------------------------------------------------------- -- Rewrite rules
satisfyMany :: (Word8 -> Bool) -> Parser S.ByteString satisfyMany p = undefined -- More efficient implementation goes here.
satisfySome :: (Word8 -> Bool) -> Parser S.ByteString satisfySome p = undefined -- More efficient implementation goes here.
{-# RULES
"fmap/pack/many/satisfy" forall p. fmap S.pack (many (satisfy p)) = satisfyMany p
"fmap/pack/some/satisfy" forall p. fmap S.pack (some (satisfy p)) = satisfySome p #-}
In another module where I use the library I have this code:
pHeaders :: Parser [(S.ByteString, S.ByteString)] pHeaders = many header where header = liftA2 (,) fieldName (byte (c2w ':') *> spaces *> contents) fieldName = liftA2 (S.cons) letter fieldChars contents = liftA2 (S.append) (fmap S.pack $ some notEOL <* crlf) (continuation <|> pure S.empty) continuation = liftA2 (S.cons) ((c2w ' ') <$ some (oneOf (map c2w " \t"))) contents
-- It's important that all three of these definitions are kept on the -- top level to have RULES fire correctly. fieldChars = fmap S.pack $ many fieldChar
-- fieldChar = letter <|> digit <|> oneOf (map c2w "-_") fieldChar = satisfy isFieldChar where isFieldChar b = (isDigit $ chr $ fromIntegral b) || (isAlpha $ chr $ fromIntegral b) || (b `elem` map c2w "-_")
I want the fieldChars use of 'fmap S.pack $ many fieldChar' to trigger my rewrite rule "fmap/pack/many/satisfy" which it does in this case. The trouble is that the rule only triggers when I make at least fieldChars and fieldChar top-level definition and isFieldChar either a named local definition in fieldChar or a top-level definition. If I turn the predicate (isFieldChar) into to an anonymous lambda function it doesn't trigger, if I make either fieldChars or fieldChars a local defintion (in a where clause) of pHeaders it doesn't trigger. If I make fieldChar a local definition in fieldChars it doesn't trigger, etc.
It would be great if there was a way to make this a bit less fragile and have the rule trigger more often as it is potentially a huge performance win. I understand it's hard to guarantee that the rule always triggers but now it triggers in rare cases.
-- Johan _______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

On Thu, Mar 13, 2008 at 11:24 PM, Don Stewart
Hey Johan,
The main thing to remember is that anything you wish to match on in a rule needs to not be inlined in the first pass.
So to match "many" or "satisfy" robustly, you'll need:
{-# NOINLINE [1] many #-}
For example.
Does that mean that I should add a [2] to my rules so they are run after? I guess it doesn't hurt to try them earlier. -- Johan

johan.tibell:
On Thu, Mar 13, 2008 at 11:24 PM, Don Stewart
wrote: Hey Johan,
The main thing to remember is that anything you wish to match on in a rule needs to not be inlined in the first pass.
So to match "many" or "satisfy" robustly, you'll need:
{-# NOINLINE [1] many #-}
For example.
Does that mean that I should add a [2] to my rules so they are run after? I guess it doesn't hurt to try them earlier.
Right, the rules can run anytime. You want them to run before many gets inlined.

Note also that you are attaching your rule to 'fmap', however, since your rule only applies to the 'Parser' monad (if I am reading it properly) then 'fmap' will have been replaced by the 'fmap' in the Functor Parser instance. So you would probably be better off doing something like instance Functor Parser where fmap = parserFmap {-# RULES forall x . parserFmap S.pack ( ... Also, you have to be careful attaching rules to functions you don't define, as you don't know how it will interact with other rules, inline pragmas or whatnot that might already be attached to that function. John -- John Meacham - ⑆repetae.net⑆john⑈

On Fri, Mar 14, 2008 at 6:28 PM, John Meacham
Note also that you are attaching your rule to 'fmap', however, since your rule only applies to the 'Parser' monad (if I am reading it properly) then 'fmap' will have been replaced by the 'fmap' in the Functor Parser instance. So you would probably be better off doing something like
instance Functor Parser where fmap = parserFmap
{-# RULES forall x . parserFmap S.pack ( ...
Didn't think of that. I'll change my code accordingly.
Also, you have to be careful attaching rules to functions you don't define, as you don't know how it will interact with other rules, inline pragmas or whatnot that might already be attached to that function.
Sounds like good advice. Thanks.
participants (3)
-
Don Stewart
-
Johan Tibell
-
John Meacham