Re: [Haskell-beginners] How to Lex, Parse, and Serialize-to-XML email messages

Hi Tim,
> I realize you've already finished with the project ...
Actually, your message comes at an excellent time. I am not finished with the project. I have only finished one of the email headers -- the From Header.
Just today I was wondering how to proceed next:
- Should I extend my parser so that it deals with each of the other email headers? That is, create one monolithic parser for the entire email message? That doesn't seem very modular. I don't think that Happy supports importing other Happy parsers. Ideally I would create a parser for the From Header, a parser for the To Header, a parser for the Subject Header, and so forth. Then I would import each of them to create one unified email parser. If Happy doesn't support importing, I figured it might be better to switch to something that can combine parsers - a parser combinator library - such as Parsec. Unfortunately, I don't know anything about Parsec, but I am eager to learn.
- I wonder if I can use Happy to generate individual parsers - a parser for the From Header, a parser for the To Header, a parser for the Subject Header - and then use Parsec to combine them?
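To make the modular idea concrete, here is a minimal sketch of how Parsec can combine one small parser per header into a single parser. The Header type, the headerLine helper, and the decision to keep each header value as an unparsed string are assumptions made for illustration only; a real parser would analyse each value further. The sketch says nothing about whether Happy-generated parsers can be plugged in.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical result type: one constructor per header kind.
data Header
  = From String
  | To String
  | Subject String
  deriving (Show, Eq)

-- Shared helper (an assumption, not part of Parsec): match the header
-- name, then take the rest of the line as the raw value.
headerLine :: String -> Parser String
headerLine name = do
  _     <- string (name ++ ": ")
  value <- many (noneOf "\r\n")
  optional endOfLine
  return value

-- One small parser per header.
fromHeader, toHeader, subjectHeader :: Parser Header
fromHeader    = From    <$> headerLine "From"
toHeader      = To      <$> headerLine "To"
subjectHeader = Subject <$> headerLine "Subject"

-- The unified parser is just a choice between the small ones.
-- `try` makes a failed alternative give back any input it consumed.
header :: Parser Header
header = try fromHeader <|> try toHeader <|> subjectHeader

-- A whole header section: zero or more headers, then end of input.
headers :: Parser [Header]
headers = many header <* eof
```

Each per-header parser could live in its own module and be imported where `header` is defined; adding a new header is then one more alternative in the `<|>` chain.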
As you can see, Tim, your suggestion to use Parsec falls on receptive ears. I welcome all suggestions.
/Roger
From: beginners-bounces@haskell.org [mailto:beginners-bounces@haskell.org] On Behalf Of Tim Holland
Sent: Friday, June 28, 2013 12:18 PM
To: beginners@haskell.org
Subject: Re: [Haskell-beginners] How to Lex, Parse, and Serialize-to-XML email messages
Hi Roger,
I realize you've already finished with the project, but for the future I think it's a lot easier to use a parser combinator library, with Text.Parsec and Text.Parsec.String, to do a similar thing. For example, if you were parsing XML and wanted to parse a single tag, you would try something like this:
    import Text.Parsec
    import Text.Parsec.String (Parser)

    parseTag :: Parser Tag
    parseTag = many1 alphaNum <?> "tag"
To get a tagged form, try
    parseTagged :: Parser (Tag, [Elem])
    parseTagged = (do
        char '<'
        name <- parseTag
        char '>'
        content <- many (try parseElem)
        string "</"
        parseTag
        char '>'
        return (name, content))
      <?> "tagged form"
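For reference, here is a self-contained variant of the XML fragment that loads as it stands. The Tag and Elem types and the parseText and parseElem parsers are not given in this message, so the definitions below are assumptions chosen to keep the sketch small. It also checks that the closing tag matches the opening one, which the fragment above does not do.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed types: the message does not define Tag or Elem.
type Tag = String

data Elem
  = Text String      -- a run of character data
  | Node Tag [Elem]  -- a tagged form with its children
  deriving (Show, Eq)

parseTag :: Parser Tag
parseTag = many1 alphaNum <?> "tag"

-- Character data: any non-empty run of characters other than '<'.
parseText :: Parser Elem
parseText = Text <$> many1 (noneOf "<")

-- An element is either a nested tagged form or plain text.
parseElem :: Parser Elem
parseElem = (uncurry Node <$> parseTagged) <|> parseText

parseTagged :: Parser (Tag, [Elem])
parseTagged = (do
    _       <- char '<'
    name    <- parseTag
    _       <- char '>'
    -- `try` lets the loop stop cleanly when it meets "</":
    -- parseElem consumes '<', fails on '/', and backtracks.
    content <- many (try parseElem)
    _       <- string "</"
    _       <- string name   -- closing tag must match the opening one
    _       <- char '>'
    return (name, content))
  <?> "tagged form"
```

In ghci, `parse parseTagged "" "<a>hi<b>x</b></a>"` should give `Right ("a",[Text "hi",Node "b" [Text "x"]])`.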
and so on. I haven't tried this out, but a parser similar to yours would go something like this:
    -- Datatypes
    type DisplayName  = String
    type EmailAddress = String
    data Mailbox = Mailbox DisplayName EmailAddress deriving (Show)

    -- parseComments, parseDisplayName and parseWebsiteName still need
    -- to be written.

    parseFromHeader :: Parser [Mailbox]
    parseFromHeader = do
        string "From: "
        mailboxes <- many (try parseMailbox)
        return mailboxes

    parseMailbox :: Parser Mailbox
    parseMailbox = (do
        parseComments
        -- Names are optional
        name <- option "" (try parseDisplayName)
        parseComments
        address <- parseEmailAddress
        parseComments
        optional (char ',')
        return (Mailbox name address))
      <?> "an individual's mailbox"

    parseEmailAddress :: Parser EmailAddress
    parseEmailAddress = do
        optional (char '<')
        handle <- many1 (noneOf "@") -- Or whatever is valid here
        char '@'
        domain <- parseDomain
        optional (char '>')
        return (handle ++ "@" ++ domain)

    parseDomain :: Parser String
    parseDomain =
        -- A domain may be wrapped in square brackets
        between (char '[') (char ']') parseDomain
        <|> parseWebsiteName
And so on. Again, I've tested none of the email header bits, but the XML bit works. It requires some level of comfort with monadic operations, but beyond that I think it's a much simpler way to parse.
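Since parseComments, parseDisplayName and parseWebsiteName are not defined above, here is a cut-down sketch that does load and run. It assumes a much simpler grammar than real email allows: no comments, no quoted strings, mailboxes separated by commas, and a loose character set for addresses. The helper names ws, addrSpec, namedMailbox and bareMailbox are hypothetical.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

type DisplayName  = String
type EmailAddress = String

data Mailbox = Mailbox DisplayName EmailAddress
  deriving (Show, Eq)

-- Skip spaces and tabs (comments are not handled in this sketch).
ws :: Parser ()
ws = skipMany (oneOf " \t")

-- local-part "@" domain, with deliberately loose character sets.
addrSpec :: Parser EmailAddress
addrSpec = do
  localPart <- many1 (noneOf "@<>, \t\r\n")
  _         <- char '@'
  domain    <- many1 (alphaNum <|> oneOf ".-")
  return (localPart ++ "@" ++ domain)

-- "Display Name <user@host>": the name is everything before '<'.
namedMailbox :: Parser Mailbox
namedMailbox = do
  name <- many (noneOf "<,\r\n")
  addr <- between (char '<') (char '>') addrSpec
  return (Mailbox (trimEnd name) addr)
  where
    trimEnd = reverse . dropWhile (== ' ') . reverse

-- "user@host" with no display name; the name is left empty.
bareMailbox :: Parser Mailbox
bareMailbox = Mailbox "" <$> addrSpec

-- Try the named form first and fall back to a bare address.
parseMailbox :: Parser Mailbox
parseMailbox = ws *> (try namedMailbox <|> bareMailbox) <* ws

parseFromHeader :: Parser [Mailbox]
parseFromHeader = string "From:" *> (parseMailbox `sepBy1` char ',')
```

Using `sepBy1` for the commas, instead of an optional trailing comma inside parseMailbox, means a stray comma at the end of the header is reported as an error rather than silently accepted.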
Regards,
Tim Holland
On 28 June 2013 03:00,
    data Foo = Int Int | Float
    :t Int
    Int :: Int -> Foo
    :t Float
    Float :: Foo
    :t Int 4
    Int 4 :: Foo
It's confusing to have type constructors that use names of existing types. It's not intuitive that the name "Int" could refer to two different things, which brings me to:
    data Bar = Bar Int
    :t Bar
    Bar :: Int -> Bar
Yay? I can have a simple type with one constructor named the same as the type.
Why is this allowed? Is it useful somehow?
--Patrick
------------------------------
Message: 2
Date: Thu, 27 Jun 2013 11:37:46 -0400
From: Brandon Allbery
> I noticed that ghci lets me do this:
Not just ghci, but ghc as well.
> Yay? I can have a simple type with one constructor named the same as the type. Why is this allowed? Is it useful somehow?
It's convenient for pretty much the situation you showed, where the type constructor and data constructor have the same name. A number of people do advocate that it not be used, though, because it can be confusing for people. (Not for the compiler; data and type constructors can't be used in the same places, so it never has trouble keeping straight which is which.) It might be best to consider this as "there is no good reason to *prevent* it from happening, from a language standpoint".
--
brandon s allbery kf8nh, sine nomine associates
allbery.b@gmail.com, ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad
http://sinenomine.net
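A small sketch of the point about type constructors and data constructors living in separate places: in a type signature the name Bar can only mean the type, and in an expression or pattern it can only mean the constructor. The functions wrap and unwrap are hypothetical names used only for illustration.

```haskell
-- On the left of '=' Bar is the type constructor;
-- on the right it is the data constructor.
data Bar = Bar Int
  deriving (Show, Eq)

-- In a signature, Bar is the type. In the pattern, Bar is the
-- data constructor being matched.
unwrap :: Bar -> Int
unwrap (Bar n) = n

-- In an expression, Bar is the data constructor, a function Int -> Bar.
wrap :: Int -> Bar
wrap = Bar
```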