
8 Mar
2017
8 Mar
'17
7:48 p.m.
Have you seen uu-parsinglib (http://foswiki.cs.uu.nl/foswiki/HUT/ParserCombinators)? It implements a non-backtracking parser that works similarly to what you describe. On 9 March 2017 01:21:09 GMT+01:00, Javran Chengwrote: >Hi all, > >I come across an idea about parsing a bunch of strings, >just want to share this idea and looking for some comments and maybe >pointing me to some related articles. > >The idea is when you have: > >1 <$ string "aabb" <|> 2 <$ string "aacc" > >you can rewrite it as: > >string "aa" >> (1 <$ string "bb" <|> 2 <$ string "cc") > >so that you don't have to deal with the common prefix "aa" twice. > >Then I further extend this idea: if we have a set of strings (s1, s2, >...), >whose element is not a prefix of any other set member, >why don't we turn things like (v1 <$ string s1 <|> v2 <$ string s2 <|> >...) >into a "tree-structure" so we don't need to deal with >any common prefix more than once? > >So this motivates my little toy implementation at >https://gist.github.com/Javran/1cbfe9897d9a5c8fae5d20a33fd83813 > >basically given a list [(s1,v1),(s2,v2)...], we want to generate a >parser >equivalent to (v1 <$ string s1 <|> v2 <$ string s2 <|> ...) >but perhaps with less backtracking. My code does this by turning the >set of >strings into a tree structure (see function "compact") and >using this "Compact" representation as a driver to generate the parser. >I used ReadP to have a concrete parser for testing, but this should >work >with any parsing library supporting MonadPlus. > >I think they are bunch of things that can be improved: > >- for now every intermediate node of "Compact" contains just a char, so >this will end up generating a parser that has branch `char 'f' >> char >'o' >>> 'o'` in it > instead of the slightly more efficient version `string "foo"` >- we could further attach a weight to each string, this allows >rearranging >(<|>) chains so that a more frequent string parser appears earlier. >- if the input [(s1,v1),(s2,v2)...] are all known constants at compile >time, probably TemplateHaskell can be used for generating the parser at >compile time (as a Haskell expression). >I'm not sure how difficult this would be though, as I have very limited >understanding of TH. > >but before I explore further, I want to collect some comments as I said >in >the beginning. > >Best, >-- >Javran (Fang) Cheng -- Sent from my Android device with K-9 Mail. Please excuse my brevity.