
a brain fart? Hi, cafe, I've been playing a little bit with a small command processor, and I decided it'd be nice to allow the user to not have to enter a complete command, but to recognize a unique prefix of it. So I started with the list of allowed commands, used filter and isPrefixOf, and was happy. But then I increased the complexity a little bit and it got hairier, so I decided to rewrite the parser for this bit in parsec. The function I came up with is parsePrefixOf n str = string (take n str) >> opts (drop n str) >> return str where opts [] = return () opts (c:cs) = optional (char c >> opts cs) which I call as parseFoo = parsePrefixOf 1 "foo" and it recognizes all of "f", "fo", and "foo" as "foo". OK so far, this also seems to work fine. But during the course of writing this, I made a stupid mistake at one point, and the result of that seemed odd. Consider the following program. It's stupid because the required prefix of "frito" is only 2 characters, which isn't enough to actually distinguish this from the next one, "fromage". (And if I change that to 2 to 3 characters, everything works fine.) So here's the complete program module Main where import Prelude import System import Text.ParserCombinators.Parsec as TPCP myPrefixOf n str = string (take n str) >> opts (drop n str) >> return str where opts [] = return () opts (c:cs) = optional (char c >> opts cs) myTest = myPrefixOf 1 "banana" <|> myPrefixOf 1 "chocolate" <|> TPCP.try (myPrefixOf 2 "frito") <|> myPrefixOf 3 "fromage" myBig = spaces >> myTest >>= (\g -> spaces >> eof >> return g) parseTry input = case parse myBig "test" input of Left err -> return (show err) Right val -> return ("success: '" ++ val ++ "'") main = getArgs >>= (\a -> parseTry (a !! 0)) >>= putStrLn If I compile this, say as program "opry", and run it as shown below, I expect the results I get for all but the last one: % ./opry b success: 'banana' % ./opry c success: 'chocolate' % ./opry fr success: 'frito' % ./opry fri success: 'frito' % ./opry fro "test" (line 1, column 3): unexpected "o" expecting "i", white space or end of input Sooo... why do I get that last one? My expectation was that parsec would try the string "fro" with the parser for "frito", it would fail, having consumed 2 characters, but then the TPCP.try which is wrapped around all of that should restore everything, and then the final parser for "fromage" should succeed. The same reasoning seems to me to apply if I specify 3 characters as the required initial portion for "frito", and if I do that it does succeed as I expect. So is this a bug in parsec, or a bug in my brain? thanks... Uwe

On Mon, Oct 12, 2009 at 9:28 PM, Uwe Hollerbach
a brain fart?
Hi, cafe, I've been playing a little bit with a small command processor, and I decided it'd be nice to allow the user to not have to enter a complete command, but to recognize a unique prefix of it. So I started with the list of allowed commands, used filter and isPrefixOf, and was happy. But then I increased the complexity a little bit and it got hairier, so I decided to rewrite the parser for this bit in parsec. The function I came up with is
parsePrefixOf n str = string (take n str) >> opts (drop n str) >> return str where opts [] = return () opts (c:cs) = optional (char c >> opts cs)
which I call as
parseFoo = parsePrefixOf 1 "foo"
and it recognizes all of "f", "fo", and "foo" as "foo".
OK so far, this also seems to work fine. But during the course of writing this, I made a stupid mistake at one point, and the result of that seemed odd. Consider the following program. It's stupid because the required prefix of "frito" is only 2 characters, which isn't enough to actually distinguish this from the next one, "fromage". (And if I change that to 2 to 3 characters, everything works fine.) So here's the complete program
module Main where
import Prelude import System import Text.ParserCombinators.Parsec as TPCP
myPrefixOf n str = string (take n str) >> opts (drop n str) >> return str where opts [] = return () opts (c:cs) = optional (char c >> opts cs)
myTest = myPrefixOf 1 "banana" <|> myPrefixOf 1 "chocolate" <|> TPCP.try (myPrefixOf 2 "frito") <|> myPrefixOf 3 "fromage"
myBig = spaces >> myTest >>= (\g -> spaces >> eof >> return g)
parseTry input = case parse myBig "test" input of Left err -> return (show err) Right val -> return ("success: '" ++ val ++ "'")
main = getArgs >>= (\a -> parseTry (a !! 0)) >>= putStrLn
If I compile this, say as program "opry", and run it as shown below, I expect the results I get for all but the last one:
% ./opry b success: 'banana'
% ./opry c success: 'chocolate'
% ./opry fr success: 'frito'
% ./opry fri success: 'frito'
% ./opry fro "test" (line 1, column 3): unexpected "o" expecting "i", white space or end of input
Sooo... why do I get that last one? My expectation was that parsec would try the string "fro" with the parser for "frito", it would fail, having consumed 2 characters, but then the TPCP.try which is wrapped around all of that should restore everything, and then the final parser for "fromage" should succeed. The same reasoning seems to me to apply if I specify 3 characters as the required initial portion for "frito", and if I do that it does succeed as I expect.
So is this a bug in parsec, or a bug in my brain?
Move the try to the last alternative.

On 10/12/09, Derek Elkins
On Mon, Oct 12, 2009 at 9:28 PM, Uwe Hollerbach
wrote: a brain fart?
Hi, cafe, I've been playing a little bit with a small command processor, and I decided it'd be nice to allow the user to not have to enter a complete command, but to recognize a unique prefix of it. So I started with the list of allowed commands, used filter and isPrefixOf, and was happy. But then I increased the complexity a little bit and it got hairier, so I decided to rewrite the parser for this bit in parsec. The function I came up with is
parsePrefixOf n str = string (take n str) >> opts (drop n str) >> return str where opts [] = return () opts (c:cs) = optional (char c >> opts cs)
which I call as
parseFoo = parsePrefixOf 1 "foo"
and it recognizes all of "f", "fo", and "foo" as "foo".
OK so far, this also seems to work fine. But during the course of writing this, I made a stupid mistake at one point, and the result of that seemed odd. Consider the following program. It's stupid because the required prefix of "frito" is only 2 characters, which isn't enough to actually distinguish this from the next one, "fromage". (And if I change that to 2 to 3 characters, everything works fine.) So here's the complete program
module Main where
import Prelude import System import Text.ParserCombinators.Parsec as TPCP
myPrefixOf n str = string (take n str) >> opts (drop n str) >> return str where opts [] = return () opts (c:cs) = optional (char c >> opts cs)
myTest = myPrefixOf 1 "banana" <|> myPrefixOf 1 "chocolate" <|> TPCP.try (myPrefixOf 2 "frito") <|> myPrefixOf 3 "fromage"
myBig = spaces >> myTest >>= (\g -> spaces >> eof >> return g)
parseTry input = case parse myBig "test" input of Left err -> return (show err) Right val -> return ("success: '" ++ val ++ "'")
main = getArgs >>= (\a -> parseTry (a !! 0)) >>= putStrLn
If I compile this, say as program "opry", and run it as shown below, I expect the results I get for all but the last one:
% ./opry b success: 'banana'
% ./opry c success: 'chocolate'
% ./opry fr success: 'frito'
% ./opry fri success: 'frito'
% ./opry fro "test" (line 1, column 3): unexpected "o" expecting "i", white space or end of input
Sooo... why do I get that last one? My expectation was that parsec would try the string "fro" with the parser for "frito", it would fail, having consumed 2 characters, but then the TPCP.try which is wrapped around all of that should restore everything, and then the final parser for "fromage" should succeed. The same reasoning seems to me to apply if I specify 3 characters as the required initial portion for "frito", and if I do that it does succeed as I expect.
So is this a bug in parsec, or a bug in my brain?
Move the try to the last alternative.
No, that doesn't do it... I get the same error (and also the same if I wrap both alternatives in try). Uwe

On Oct 12, 2009, at 22:28 , Uwe Hollerbach wrote:
parsePrefixOf n str = string (take n str) >> opts (drop n str) >> return str where opts [] = return () opts (c:cs) = optional (char c >> opts cs)
Seems to me this will succeed as soon as it possibly can...
myTest = myPrefixOf 1 "banana" <|> myPrefixOf 1 "chocolate" <|> TPCP.try (myPrefixOf 2 "frito") <|> myPrefixOf 3 "fromage"
...so the "frito" branch gets committed as soon as "fr" is read/parsed (myTest returns)...
% ./opry fro "test" (line 1, column 3): unexpected "o" expecting "i", white space or end of input
...which is why this is looking for "white space or end of input". My fix would be to have myPrefixOf require the prefix be terminated in whatever way is appropriate (end of input, white space, operator?) instead of simply accepting as soon as it gets a prefix match regardless of what follows. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

On 10/12/09, Brandon S. Allbery KF8NH
On Oct 12, 2009, at 22:28 , Uwe Hollerbach wrote:
parsePrefixOf n str = string (take n str) >> opts (drop n str) >> return str where opts [] = return () opts (c:cs) = optional (char c >> opts cs)
Seems to me this will succeed as soon as it possibly can...
myTest = myPrefixOf 1 "banana" <|> myPrefixOf 1 "chocolate" <|> TPCP.try (myPrefixOf 2 "frito") <|> myPrefixOf 3 "fromage"
...so the "frito" branch gets committed as soon as "fr" is read/parsed (myTest returns)...
% ./opry fro "test" (line 1, column 3): unexpected "o" expecting "i", white space or end of input
...which is why this is looking for "white space or end of input".
My fix would be to have myPrefixOf require the prefix be terminated in whatever way is appropriate (end of input, white space, operator?) instead of simply accepting as soon as it gets a prefix match regardless of what follows.
-- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH
Ah, yes, I see where I went wrong; thank you! Uwe

Brandon S. Allbery KF8NH wrote:
My fix would be to have myPrefixOf require the prefix be terminated in whatever way is appropriate (end of input, white space, operator?) instead of simply accepting as soon as it gets a prefix match regardless of what follows.
Maybe you can use notFollowedBy for this. HTH, Martijn.

On 10/12/09, Martijn van Steenbergen
Brandon S. Allbery KF8NH wrote:
My fix would be to have myPrefixOf require the prefix be terminated in whatever way is appropriate (end of input, white space, operator?) instead of simply accepting as soon as it gets a prefix match regardless of what follows.
Maybe you can use notFollowedBy for this.
HTH,
Martijn.
Yes, I've looked at that and am thinking about it. I'm not quite certain it's needed in my real program... I seem to have convinced myself that if I actually specify a proper set of unique prefixes, ie, set the required lengths for both "frito" and "fromage" to 3 in the test program, I won't get into this situation. Assuming I haven't committed another brain-fart there, that would be sufficient; presumably, in a real program one would want to actually specify the unique prefix, rather than a non-unique pre-prefix. It seems to work fine in my real program, anyway. Uwe

I could not resist this. The code import Text.ParserCombinators.UU.Parsing pCommand [] = pure [] pCommand xxs@(x:xs) = ((:) <$> pSym x <*> pCommand xs) `opt` xxs pCommands = amb . foldr (<|>) pFail . map pCommand $ ["banana", "chocolate", "frito", "fromage"] t :: String -> ([String], [Error Char Char Int]) t input = parse ( (,) <$> pCommands <*> pEnd) (listToStr input) gives the following results: *Main> t "" (["banana","chocolate","frito","fromage"],[]) *Main> t "b" (["banana"],[]) *Main> t "fr" (["frito","fromage"],[]) *Main> t "x" (["banana","chocolate","frito","fromage"],[ The token 'x'was not consumed by the parsing process.]) *Main> t "frox" (["fromage"],[ The token 'x'was not consumed by the parsing process.]) *Main> t "frx" (["frito","fromage"],[ The token 'x'was not consumed by the parsing process.]) *Main> I think it is less clumsy and far less confusing than the Parsec code. Note that the function amb tells that its parameter parser can be ambiguous Doaitse On 13 okt 2009, at 17:10, Uwe Hollerbach wrote:
On 10/12/09, Martijn van Steenbergen
wrote: Brandon S. Allbery KF8NH wrote:
My fix would be to have myPrefixOf require the prefix be terminated in whatever way is appropriate (end of input, white space, operator?) instead of simply accepting as soon as it gets a prefix match regardless of what follows.
Maybe you can use notFollowedBy for this.
HTH,
Martijn.
Yes, I've looked at that and am thinking about it. I'm not quite certain it's needed in my real program... I seem to have convinced myself that if I actually specify a proper set of unique prefixes, ie, set the required lengths for both "frito" and "fromage" to 3 in the test program, I won't get into this situation. Assuming I haven't committed another brain-fart there, that would be sufficient; presumably, in a real program one would want to actually specify the unique prefix, rather than a non-unique pre-prefix. It seems to work fine in my real program, anyway.
Uwe _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Hi, all, thanks for the further inputs, all good stuff to think about... although it's going to be a little while before I can appreciate the inner beauty of Doaitse's version! :-) I had considered the approach of doing a post-parsec verification, but decided I wanted to keep it all inside the parser, hence the desire to match prefixes there (and lack of desire to write 'string "p" <|> string "pr" <|> string "pre" ...'. By way of background, the actual stuff I'm wanting to match is not food names, but some commands for a small ledger program I'm working on. I needed something like that and was tired of losing data to quicken every so often. I realize of course that there are other excellent ledger-type programs out there, but hey, I also needed another hacking project. I'll put this onto hackage in a while, once it does most of the basics of what I need. No doubt the main differentiator between mine and those other excellent ledger programs out there will be that mine has fewer features and more bugs... thanks again, all! Uwe

On 15 okt 2009, at 16:58, Uwe Hollerbach wrote:
Hi, all, thanks for the further inputs, all good stuff to think about... although it's going to be a little while before I can appreciate the inner beauty of Doaitse's version! :-)
The nice thing is that you do not have to understand the inner workings ;-} I basically builds a greedy parser for each word to be recognised which can stop and assume the rest is there if it can no longer proceed (the `opt` is greedy in its left alternative) . Hence it recognises the longest possible prefix. Since my parsers pursue all alternatives in parallel you automatically get what you want, without having to indicate prefix lengths, calls to try, etc. The "amb" combinator has type amb :: Parser a -> Parser [a] and collects the result from all alternatives its argument parser is constructed from; you might say it convert an ambiguous parser to a parser with a list as result, hence preventing the rest of the input being parsed over and over again. I am currently working on bringing back more abstract interpretation in the implementation (i.e. what we have had for almost 10 years in the uulib library), but I do not expect you to see a lot of that from the outside. If you want to work with left-recursive parsers (which does not seem to be the case), you may revert to more complicated solutions such as found in the "christmastree" (Changing Haskell's Read Implementation Such That by Manipulationg Abstract Syntax Trees Read Evaluates Efficiently) package if you need to generate parsers online, or to happy-based solutions in case your grammar is fixed. If you have any questions do not hesitate to ask, Doaitse
I had considered the approach of doing a post-parsec verification, but decided I wanted to keep it all inside the parser, hence the desire to match prefixes there (and lack of desire to write 'string "p" <|> string "pr" <|> string "pre" ...'.
By way of background, the actual stuff I'm wanting to match is not food names, but some commands for a small ledger program I'm working on. I needed something like that and was tired of losing data to quicken every so often. I realize of course that there are other excellent ledger-type programs out there, but hey, I also needed another hacking project. I'll put this onto hackage in a while, once it does most of the basics of what I need. No doubt the main differentiator between mine and those other excellent ledger programs out there will be that mine has fewer features and more bugs...
thanks again, all!
Uwe

Uwe Hollerbach wrote:
Yes, I've looked at that and am thinking about it. I'm not quite certain it's needed in my real program... I seem to have convinced myself that if I actually specify a proper set of unique prefixes, ie, set the required lengths for both "frito" and "fromage" to 3 in the test program, I won't get into this situation. Assuming I haven't committed another brain-fart there, that would be sufficient; presumably, in a real program one would want to actually specify the unique prefix, rather than a non-unique pre-prefix. It seems to work fine in my real program, anyway.
Another approach ---assuming you're not wedded to Parsec--- would be to construct a trie, e.g. with bytestring-trie[1]. Then use Data.Trie.submap to look up the query. If the result is unique then you go with it, if not then list the submap's keys in your error message. The big benefit of this approach is that you needn't maintain a list of lengths for disambiguating the keys, the trie does that for you. [1] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-trie -- Live well, ~wren

My fix would be to parse as many letters as possible "many1 alpha" (that's longest match) and then check the result with "isPrefixOf" for all your alternatives (and return the alternative that matches first). Cheers Christian Martijn van Steenbergen wrote:
Brandon S. Allbery KF8NH wrote:
My fix would be to have myPrefixOf require the prefix be terminated in whatever way is appropriate (end of input, white space, operator?) instead of simply accepting as soon as it gets a prefix match regardless of what follows.
Maybe you can use notFollowedBy for this.
HTH,
Martijn.
participants (7)
-
Brandon S. Allbery KF8NH
-
Christian Maeder
-
Derek Elkins
-
Martijn van Steenbergen
-
S. Doaitse Swierstra
-
Uwe Hollerbach
-
wren ng thornton