Proving my point

Achim Schneider

16 May 2008 16 May '08

5:42 p.m.

"test.l" (line 7, column 1): unexpected end of input expecting "(", Lambda abstraction, Let binding, Atom, end of input or Function application I obviously don't know anything about Parsec's inner workings. I'm going to investigate as soon as I stopped despairing. -- (c) this sig last receiving data processing entity. Inspect headers for past copyright information. All rights reserved. Unauthorised copying, hiring, renting, public performance and/or broadcasting of this signature prohibited.

Show replies by date

Andrew Coppin

16 May 16 May

6:52 p.m.

Achim Schneider wrote:

...

"test.l" (line 7, column 1): unexpected end of input expecting "(", Lambda abstraction, Let binding, Atom, end of input or Function application

I obviously don't know anything about Parsec's inner workings. I'm going to investigate as soon as I stopped despairing.

Wait... "unexpected end of input; expecting [...] end of input [...]" That's just *wrong*...! ;-) But don't despaire - show us your parser and what it's supposed to parse, and I'm sure somebody [maybe even me] will be able to tell you what's up.

Achim Schneider

7:33 p.m.

Andrew Coppin wrote:

...

Wait... "unexpected end of input; expecting [...] end of input [...]"

That's just *wrong*...! ;-)

But don't despaire - show us your parser and what it's supposed to parse, and I'm sure somebody [maybe even me] will be able to tell you what's up.

This is what I came up with while simplifying the parser: import Text.Parsec identifier = do whiteSpace s <- many1 letter whiteSpace return s whiteSpace = do eof <|> ((many $ choice [ char ' ', newline ]) >> return ()) main = do let syn = runParser (do char '\\' many1 identifier char ':' whiteSpace identifier whiteSpace ) () "" "\\a b" print syn Admittedly, this is a quite degenerate case crouching in at least 10 corners simultaneously. Anyway, I get % ./test Left (line 1, column 5): unexpected end of input expecting end of input, letter or ":" and if I change it to whiteSpace = do (many eof >> return ()) <|> ((many $ choice [ char ' ', newline ]) >> return ()) Left (line 1, column 3): unexpected " " expecting letter, end of input or ":" Please, please don't ask me for the rationale of using eof like this, you would get the same answer as if you'd ask me why I cast a stone into the sea. -- (c) this sig last receiving data processing entity. Inspect headers for past copyright information. All rights reserved. Unauthorised copying, hiring, renting, public performance and/or broadcasting of this signature prohibited.

Philippa Cowderoy

8:25 p.m.

On Fri, 16 May 2008, Achim Schneider wrote:

...

Andrew Coppin wrote:

...
Wait... "unexpected end of input; expecting [...] end of input [...]"

That's just *wrong*...! ;-)

But don't despaire - show us your parser and what it's supposed to parse, and I'm sure somebody [maybe even me] will be able to tell you what's up.

This is what I came up with while simplifying the parser:

import Text.Parsec

identifier = do whiteSpace s <- many1 letter whiteSpace return s

whiteSpace = do eof <|> ((many $ choice [ char ' ', newline ]) >> return ())

main = do let syn = runParser (do char '\\' many1 identifier char ':' whiteSpace identifier whiteSpace ) () "" "\\a b" print syn

Admittedly, this is a quite degenerate case crouching in at least 10 corners simultaneously. Anyway, I get

% ./test Left (line 1, column 5): unexpected end of input expecting end of input, letter or ":"

Confusing, isn't it? It's almost the right message, too. I'm pretty sure the misbehaviour's because eof doesn't consume - see what happens if you put an error message on all of whiteSpace?

...

and if I change it to

whiteSpace = do (many eof >> return ()) <|> ((many $ choice [ char ' ', newline ]) >> return ())

Left (line 1, column 3): unexpected " " expecting letter, end of input or ":"

Which is broken for your purposes, but that's because many always succeeds so the changed whiteSpace doesn't actually eat whitespace.

...

Please, please don't ask me for the rationale of using eof like this, you would get the same answer as if you'd ask me why I cast a stone into the sea.

As a matter of general practice I'd suggest including eof exactly once, as in: topLevel = do {r <- realTopLevel; eof; return r} realTopLevel = ... Which isn't to say that you haven't run into something confusing and possibly broken here, of course. -- flippa@flippac.org "The reason for this is simple yet profound. Equations of the form x = x are completely useless. All interesting equations are of the form x = y." -- John C. Baez

Philippa Cowderoy

8:40 p.m.

On Fri, 16 May 2008, Philippa Cowderoy wrote:

...

Confusing, isn't it? It's almost the right message, too. I'm pretty sure the misbehaviour's because eof doesn't consume - see what happens if you put an error message on all of whiteSpace?

It is indeed, and because the error merging code can't tell eof's "don't consume" from the "don't consume" try returns when its parm fails - nor is there any equivalent distinction in the error values. Which is to say: it's broken, but at least I know how to fix it in the library. -- flippa@flippac.org "The reason for this is simple yet profound. Equations of the form x = x are completely useless. All interesting equations are of the form x = y." -- John C. Baez

Daniel Fischer

8:25 p.m.

Am Freitag, 16. Mai 2008 21:33 schrieb Achim Schneider:

...

Andrew Coppin wrote:

...
Wait... "unexpected end of input; expecting [...] end of input [...]"

That's just *wrong*...! ;-)

But don't despaire - show us your parser and what it's supposed to parse, and I'm sure somebody [maybe even me] will be able to tell you what's up.

This is what I came up with while simplifying the parser:

import Text.Parsec

identifier = do whiteSpace s <- many1 letter whiteSpace return s

whiteSpace = do eof <|> ((many $ choice [ char ' ', newline ]) >> return ())

main = do let syn = runParser (do char '\\' many1 identifier char ':' whiteSpace identifier whiteSpace ) () "" "\\a b" print syn

running char '\\' on input "\\a b", okay, no problem. running many1 identifier on remaining input "a b" 1. whiteSpace: no eof, many (choice [char ' ', newline) returns [], all fine, nothing consumed 2. many1 letter, remainin input begins with 'a', okay, many1 letter returns "a", remaining input " b" 3. whiteSpace: n eof, one ' ' ~> ' ' consumed, remains "b" first identifier parsed, try another one. whiteSpace again consumes nothing, many1 letter returns "b", remaining input is null. whiteSpace finds eof, second identifier parsed, all input consumed. Now there are two options for a successful parse, a) another identifier b) ':' a) 1. whiteSpace, finds eof, consumes nothing, success 2. many1 letter fails immediately, nothing consumed, overall, identifier fails without consuming anything. b) char ':' fails without consumption. Drat, I want an identifier or a ':' here, I don't expect end of input yet. "unexpected end of input" "expecting " Lemme look what would I need here to continue. Ah, an identifier, how would that start? Oh, yes it could start with "end of input" what else? many (choice [char ' ', newline]), 'many', hmm doesn't require anything specific, I'll skip that, what then? many1 letter, oh, yes ", letter" Or I could continue with char ':' here, so " or \":\"" Oops, bad definition of whiteSpace, I'd say. Don't accept an eof unless you really don't want to continue.

...

Admittedly, this is a quite degenerate case crouching in at least 10 corners simultaneously. Anyway, I get

% ./test Left (line 1, column 5): unexpected end of input expecting end of input, letter or ":"

and if I change it to

whiteSpace = do (many eof >> return ()) <|> ((many $ choice [ char ' ', newline ]) >> return ())

Left (line 1, column 3): unexpected " " expecting letter, end of input or ":"

Sure thing, (many eof) never fails, so the new whiteSpace is in fact many eof >> return () So after having parsed the first identifier, we're trying to either parse another one or a ':'. First, identifier is tried. many eof succeeds, now try letter, that fails. so we have a successful start into many eof, hence, for identifier to succeed, we need more eofs or a letter next ~> "expecting end of input, letter" Or we have no more identifiers, then we need " or \":\""

...

Please, please don't ask me for the rationale of using eof like this, you would get the same answer as if you'd ask me why I cast a stone into the sea.

And why did you do that?

Achim Schneider

8:47 p.m.

Daniel Fischer wrote:

...

[very helpful stuff]

...
Please, please don't ask me for the rationale of using eof like this, you would get the same answer as if you'd ask me why I cast a stone into the sea.

And why did you do that?

To cast away something I don't understand. -- (c) this sig last receiving data processing entity. Inspect headers for past copyright information. All rights reserved. Unauthorised copying, hiring, renting, public performance and/or broadcasting of this signature prohibited.

Philippa Cowderoy

7:39 p.m.

On Fri, 16 May 2008, Achim Schneider wrote:

...

"test.l" (line 7, column 1): unexpected end of input expecting "(", Lambda abstraction, Let binding, Atom, end of input or Function application

I obviously don't know anything about Parsec's inner workings. I'm going to investigate as soon as I stopped despairing.

One gotcha here, which really wants fixing: if the show instance for your token type returns "", Parsec assumes that's EOF for error purposes. Guess who ran into that with a separate token for layout-inserted braces? -- flippa@flippac.org Sometimes you gotta fight fire with fire. Most of the time you just get burnt worse though.

Achim Schneider

8:49 p.m.

Philippa Cowderoy wrote:

...

On Fri, 16 May 2008, Achim Schneider wrote:

Guess who ran into that with a separate token for layout-inserted braces?

It can't be me, as I attempted to be as lazy as possible, not going for a tokenising pass, and ended up being too lazy. -- (c) this sig last receiving data processing entity. Inspect headers for past copyright information. All rights reserved. Unauthorised copying, hiring, renting, public performance and/or broadcasting of this signature prohibited.

Philippa Cowderoy

8:57 p.m.

On Fri, 16 May 2008, Achim Schneider wrote:

...

Philippa Cowderoy wrote:

...
On Fri, 16 May 2008, Achim Schneider wrote:

Guess who ran into that with a separate token for layout-inserted braces?

It can't be me, as I attempted to be as lazy as possible, not going for a tokenising pass, and ended up being too lazy.

Nah, you just picked the wrong way to attempt discipline. I don't use separate tokenising/lexing passes in a lot of my code (though you can't really avoid it when you want to do layout), it's a matter of knowing how it's done. Unless you've got a lexical structure that prevents it (which is to say, there're situations in which two tokens following each other aren't allowed to have whitespace between them), it's a good idea to have your token productions eat any whitespace following them, and then your toplevel becomes: do {whitespace; r <- realTopLevel; eof; return r} and then you need never worry about it again. -- flippa@flippac.org Ivanova is always right. I will listen to Ivanova. I will not ignore Ivanova's recomendations. Ivanova is God. And, if this ever happens again, Ivanova will personally rip your lungs out!

Achim Schneider

9:14 p.m.

Philippa Cowderoy wrote:

...

On Fri, 16 May 2008, Achim Schneider wrote:

...
Philippa Cowderoy wrote:

...
On Fri, 16 May 2008, Achim Schneider wrote:

Guess who ran into that with a separate token for layout-inserted braces?

It can't be me, as I attempted to be as lazy as possible, not going for a tokenising pass, and ended up being too lazy.

Nah, you just picked the wrong way to attempt discipline. I don't use separate tokenising/lexing passes in a lot of my code (though you can't really avoid it when you want to do layout), it's a matter of knowing how it's done. Unless you've got a lexical structure that prevents it (which is to say, there're situations in which two tokens following each other aren't allowed to have whitespace between them), it's a good idea to have your token productions eat any whitespace following them, and then your toplevel becomes:

do {whitespace; r <- realTopLevel; eof; return r}

and then you need never worry about it again.

My problem is that realTopLevel = expr, and that I get into an infinite recursion, never "closing" enough parens, never hitting eof. Additionally, two passes are definitely easier to reason about, and it's wise to be lazy there. Btw: Is there any way to make Parsec return a tree of things it tried? The end-user error messages are quite often just not informative enough while debugging the parser itself. -- (c) this sig last receiving data processing entity. Inspect headers for past copyright information. All rights reserved. Unauthorised copying, hiring, renting, public performance and/or broadcasting of this signature prohibited.

Philippa Cowderoy

9:46 p.m.

On Fri, 16 May 2008, Achim Schneider wrote:

...

My problem is that realTopLevel = expr, and that I get into an infinite recursion, never "closing" enough parens, never hitting eof.

Have you run into the left-recursion trap, by any chance? This doesn't work: expr = do expr; ... You can cover common cases with combinators like many* and chain* though.

...

Btw: Is there any way to make Parsec return a tree of things it tried? The end-user error messages are quite often just not informative enough while debugging the parser itself.

If you're willing to accept a little pain, you can write a few helper functions akin to that keep a log in Parsec's state and extract it from there. -- flippa@flippac.org Society does not owe people jobs. Society owes it to itself to find people jobs.

Achim Schneider

10:17 p.m.

Philippa Cowderoy wrote:

...

On Fri, 16 May 2008, Achim Schneider wrote:

...
My problem is that realTopLevel = expr, and that I get into an infinite recursion, never "closing" enough parens, never hitting eof.

Have you run into the left-recursion trap, by any chance?

This doesn't work:

expr = do expr; ...

expr = do {e <- parens expr; return $ Nest e} <|> lambda <|> _let <|> try app <|> atom There's at least one token before any recursion, so I guess not. After all, it terminates. It's my state that does not succeed in directing the parser not to mess up, so I'm reimplementing the thing as a two-pass but stateless parser now. Definitely the easier and clearer thing to do: I can have an end of line token that carries the number of trailing spaces, so I got perfect indent information without any pain involved, at all, and don't have to make parsers fail based on state. -- (c) this sig last receiving data processing entity. Inspect headers for past copyright information. All rights reserved. Unauthorised copying, hiring, renting, public performance and/or broadcasting of this signature prohibited.

Philippa Cowderoy

10:45 p.m.

On Sat, 17 May 2008, Achim Schneider wrote:

...

There's at least one token before any recursion, so I guess not. After all, it terminates. It's my state that does not succeed in directing the parser not to mess up, so I'm reimplementing the thing as a two-pass but stateless parser now.

In most cases, you're better off stateless unless you've got a really good reason for it. Or at least, not using the state for anything that affects the parse itself.

...

Definitely the easier and clearer thing to do: I can have an end of line token that carries the number of trailing spaces, so I got perfect indent information without any pain involved, at all, and don't have to make parsers fail based on state.

Definitely! Are you doing some form of layout? It's certainly not worth doing in one pass IMO, I ended up with a three pass design much like that in the Haskell 98 report. Well, that's an understatement - I took the algorithm from it! -- flippa@flippac.org There is no magic bullet. There are, however, plenty of bullets that magically home in on feet when not used in exactly the right circumstances.

Achim Schneider

11:53 p.m.

Philippa Cowderoy wrote:

...

On Sat, 17 May 2008, Achim Schneider wrote:

...
Definitely the easier and clearer thing to do: I can have an end of line token that carries the number of trailing spaces, so I got perfect indent information without any pain involved, at all, and don't have to make parsers fail based on state.

Definitely! Are you doing some form of layout?

Yes, /pair x y m: m x y /fst z: z \p q: p /snd z: z \p q: q /numbers: pair one two /run: pair (fst numbers) (snd numbers) run is supposed to work (/ indicates a let). I'm trying to purge scheme out of my mind by implementing something that looks quite like it, and then change it. The rule is simple: An indented line continues the previous, and a non-indented closes every opened paren except one from the previous line, eof closing all that are left. I still have to think about recursive lets, but I guess I will go unlambda and just include a Y combinator, keeping the syntax simple. OTOH, I'm thinking about experimenting with a thing remotely resembling varargs and streams, being able to generate and consume possibly infinite argument streams, somewhat equalling tuples, lists and application. Just playing around, you know. -- (c) this sig last receiving data processing entity. Inspect headers for past copyright information. All rights reserved. Unauthorised copying, hiring, renting, public performance and/or broadcasting of this signature prohibited.

6262

Age (days ago)

6262

Last active (days ago)

List overview

Download

14 comments

4 participants

participants (4)

Achim Schneider
Andrew Coppin
Daniel Fischer
Philippa Cowderoy

Proving my point

Achim Schneider

Andrew Coppin

Achim Schneider

Philippa Cowderoy

Philippa Cowderoy

Daniel Fischer

Achim Schneider

Philippa Cowderoy

Achim Schneider

Philippa Cowderoy

Achim Schneider

Philippa Cowderoy

Achim Schneider

Philippa Cowderoy

Achim Schneider

tags

participants (4)