
8 Mar
2017
8 Mar
'17
9:41 p.m.
Hi Jaro, thanks for the link, having worked with few parsing libs but uu-parsinglib sounds new to me and I'm definitely thinking about having a try. I'm taking a look but haven't yet found any part of code that attempts to factor common prefixes out. On Wed, Mar 8, 2017 at 7:48 PM, Jarowrote: > Have you seen uu-parsinglib (http://foswiki.cs.uu.nl/ > foswiki/HUT/ParserCombinators)? It implements a non-backtracking parser > that works similarly to what you describe. > > > On 9 March 2017 01:21:09 GMT+01:00, Javran Cheng > wrote: >> >> Hi all, >> >> I come across an idea about parsing a bunch of strings, >> just want to share this idea and looking for some comments and maybe >> pointing me to some related articles. >> >> The idea is when you have: >> >> 1 <$ string "aabb" <|> 2 <$ string "aacc" >> >> you can rewrite it as: >> >> string "aa" >> (1 <$ string "bb" <|> 2 <$ string "cc") >> >> so that you don't have to deal with the common prefix "aa" twice. >> >> Then I further extend this idea: if we have a set of strings (s1, s2, >> ...), whose element is not a prefix of any other set member, >> why don't we turn things like (v1 <$ string s1 <|> v2 <$ string s2 <|> >> ...) into a "tree-structure" so we don't need to deal with >> any common prefix more than once? >> >> So this motivates my little toy implementation at >> https://gist.github.com/Javran/1cbfe9897d9a5c8fae5d20a33fd83813 >> >> basically given a list [(s1,v1),(s2,v2)...], we want to generate a parser >> equivalent to (v1 <$ string s1 <|> v2 <$ string s2 <|> ...) >> but perhaps with less backtracking. My code does this by turning the set >> of strings into a tree structure (see function "compact") and >> using this "Compact" representation as a driver to generate the parser. >> I used ReadP to have a concrete parser for testing, but this should work >> with any parsing library supporting MonadPlus. >> >> I think they are bunch of things that can be improved: >> >> - for now every intermediate node of "Compact" contains just a char, so >> this will end up generating a parser that has branch `char 'f' >> char 'o' >> >> 'o'` in it >> instead of the slightly more efficient version `string "foo"` >> - we could further attach a weight to each string, this allows >> rearranging (<|>) chains so that a more frequent string parser appears >> earlier. >> - if the input [(s1,v1),(s2,v2)...] are all known constants at compile >> time, probably TemplateHaskell can be used for generating the parser at >> compile time (as a Haskell expression). >> I'm not sure how difficult this would be though, as I have very >> limited understanding of TH. >> >> but before I explore further, I want to collect some comments as I said >> in the beginning. >> >> Best, >> -- >> Javran (Fang) Cheng >> > > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. > -- Javran (Fang) Cheng