Re: [Haskell-beginners] How to Lex, Parse, and Serialize-to-XML email messages

28 Jun 2013

Hi Tim,

> I realize you've already finished with the project ...

Actually, your message comes at an excellent time. I am not finished with the project. I have only finished one of the email headers -- the From Header.

Just today I was wondering how to proceed next:

- Should I extend my parser so that it deals with each of the other email headers? That is, create one monolithic parser for the entire email message? That doesn't seem very modular. I don't think that Happy supports importing other Happy parsers. Ideally I would create a parser for the From Header, a parser for the To Header, a parser for the Subject Header, and so forth. Then I would import each of them to create one unified email parser. If Happy doesn't support importing, I figured it might be better to switch to something that can combine parsers, a parser combinator library such as Parsec. Unfortunately, I don't know anything about Parsec, but I am eager to learn.

- I wonder if I can use Happy to generate individual parsers (a parser for the From Header, a parser for the To Header, a parser for the Subject Header) and then use Parsec to combine them?
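
A minimal sketch of the modular layout described above, assuming Parsec is used for every header rather than Happy. The names Header, headerNamed, and emailHeaders are hypothetical, and each header value is kept as a raw string instead of being parsed further:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One constructor per header kind; payloads stay as raw strings in this sketch.
data Header
  = From String
  | To String
  | Subject String
  deriving (Show, Eq)

-- Hypothetical helper: parse "Name: value" up to the end of the line.
-- 'try' lets the next alternative run if the header name does not match.
headerNamed :: String -> (String -> Header) -> Parser Header
headerNamed name mk = do
  _ <- try (string (name ++ ": "))
  value <- many (noneOf "\r\n")
  optional endOfLine
  return (mk value)

-- Each header gets its own small parser, which could live in its own module.
fromHeader, toHeader, subjectHeader :: Parser Header
fromHeader    = headerNamed "From" From
toHeader      = headerNamed "To" To
subjectHeader = headerNamed "Subject" Subject

-- The unified parser is the individual parsers combined with 'choice'.
emailHeaders :: Parser [Header]
emailHeaders = many (choice [fromHeader, toHeader, subjectHeader]) <* eof
```

Adding another header then means writing one more small parser and adding it to the list passed to choice.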

As you can see, Tim, your suggestion to use Parsec falls on receptive ears. I welcome all suggestions.

/Roger

From: beginners-bounces@haskell.org [mailto:beginners-bounces@haskell.org] On Behalf Of Tim Holland
Sent: Friday, June 28, 2013 12:18 PM
To: beginners@haskell.org
Subject: Re: [Haskell-beginners] How to Lex, Parse, and Serialize-to-XML email messages

Hi Roger,

I realize you've already finished with the project, but for the future I think it's a lot easier to use a parser combinator library, with Text.Parsec and Text.Parsec.String, to do a similar thing. For example, if you were parsing XML, to parse a single tag you would try something like this:

parseTag :: Parser Tag
parseTag = many1 alphaNum <?> "tag"

To get a tagged form, try

parseTagged :: Parser (Tag, [Elem])
parseTagged = (do
  char '<'
  name <- parseTag
  char '>'
  content <- many (try parseElem)
  string "</"
  parseTag
  char '>'
  return (name, content))
  <?> "tagged form"
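
The snippets above leave Tag, Elem, and parseElem undefined. A self-contained sketch, assuming Tag is a String and Elem is either plain text or a nested tagged form (both definitions are guesses for illustration, not taken from the original message), could look like this:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

type Tag = String

-- Assumed element type: plain text or a nested tagged form.
data Elem
  = Text String
  | Node Tag [Elem]
  deriving (Show, Eq)

parseTag :: Parser Tag
parseTag = many1 alphaNum <?> "tag"

-- An element is a nested tagged form or a run of text up to the next '<'.
parseElem :: Parser Elem
parseElem = (uncurry Node <$> parseTagged) <|> (Text <$> many1 (noneOf "<"))

parseTagged :: Parser (Tag, [Elem])
parseTagged = (do
  _ <- char '<'
  name <- parseTag
  _ <- char '>'
  -- 'try' backtracks when parseElem runs into the closing "</".
  content <- many (try parseElem)
  _ <- string "</"
  _ <- parseTag
  _ <- char '>'
  return (name, content))
  <?> "tagged form"
```

Running parse parseTagged "" on "<a>hi<b>x</b></a>" should yield the tag "a" with a text element and one nested node. Like the snippet above, this sketch does not check that the closing tag name matches the opening one.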

And so on. I haven't tried this out, but a parser similar to yours would go something like this:

--Datatypes
type DisplayName = String
type EmailAddress = String
data Mailbox = Mailbox DisplayName EmailAddress deriving (Show)

parseFromHeader :: Parser [Mailbox]
parseFromHeader = do
  string "From: "
  mailboxes <- many (try parseMailbox)
  return mailboxes

parseMailbox :: Parser Mailbox
parseMailbox = (do
  parseComments
  -- Names are optional
  name <- option "" (try parseDisplayName)
  parseComments
  address <- parseEmailAddress
  parseComments
  -- The last mailbox has no trailing comma
  optional (char ',')
  return (Mailbox name address))
  <?> "an individual's mailbox"

parseEmailAddress :: Parser EmailAddress
parseEmailAddress = do
  optional (char '<') -- Angle brackets are optional
  handle <- many1 (noneOf "@") -- Or whatever is valid here
  char '@'
  domain <- parseDomain
  optional (char '>')
  return (handle ++ "@" ++ domain)

parseDomain :: Parser String
parseDomain =
      between (char '[') (char ']') parseDomain
  <|> parseWebsiteName

And so on. Again, I've tested none of the email header bits, but the XML bit works. It requires some level of comfort with monadic operations, but beyond that I think it's a much simpler way to parse.
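
The email header code above relies on helpers that are not shown (parseComments, parseDisplayName, parseWebsiteName). The following is a rough, self-contained sketch of the same idea with those gaps filled in by assumption: comments are non-nested parenthesised text, a display name must be followed by '<', and the address grammar is a simplification rather than the full RFC syntax. parseAddrSpec is a hypothetical helper name.

```haskell
import Data.Char (isSpace)
import Text.Parsec
import Text.Parsec.String (Parser)

type DisplayName = String
type EmailAddress = String
data Mailbox = Mailbox DisplayName EmailAddress deriving (Show, Eq)

-- Assumption: comments are non-nested "(...)" runs, surrounded by whitespace.
parseComments :: Parser ()
parseComments = spaces >> skipMany (comment >> spaces)
  where comment = between (char '(') (char ')') (many (noneOf "()"))

-- Assumption: a display name is a quoted string or bare words, followed by '<'.
parseDisplayName :: Parser DisplayName
parseDisplayName = do
  name <- quoted <|> many1 (noneOf "<>@,\"()")
  _ <- lookAhead (char '<')
  return (trimEnd name)
  where
    quoted  = between (char '"') (char '"') (many (noneOf "\"")) <* spaces
    trimEnd = reverse . dropWhile isSpace . reverse

-- Simplified address: local part, '@', dotted domain.
parseAddrSpec :: Parser EmailAddress
parseAddrSpec = do
  localPart <- many1 (alphaNum <|> oneOf "._+-")
  _ <- char '@'
  domain <- many1 (alphaNum <|> oneOf ".-")
  return (localPart ++ "@" ++ domain)

-- Either "<addr>" or a bare "addr".
parseEmailAddress :: Parser EmailAddress
parseEmailAddress =
  between (char '<') (char '>') parseAddrSpec <|> parseAddrSpec

parseMailbox :: Parser Mailbox
parseMailbox = (do
  parseComments
  -- Names are optional; 'try' undoes a failed attempt at reading one.
  name <- option "" (try parseDisplayName)
  address <- parseEmailAddress
  parseComments
  return (Mailbox name address))
  <?> "mailbox"

-- Mailboxes are separated by commas; the header must span the whole input.
parseFromHeader :: Parser [Mailbox]
parseFromHeader = do
  _ <- string "From:"
  mailboxes <- parseMailbox `sepBy1` char ','
  eof
  return mailboxes
```

For example, parse parseFromHeader "" applied to "From: John Smith <john@example.org>, (work) sally@example.org" should produce one named mailbox and one with an empty display name. Using sepBy1 instead of an optional trailing comma keeps the separator handling out of parseMailbox.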

Regards,
Tim Holland

On 28 June 2013 03:00, <beginners-request@haskell.org> wrote:
Send Beginners mailing list submissions to
        beginners@haskell.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
        beginners-request@haskell.org

You can reach the person managing the list at
        beginners-owner@haskell.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."

Today's Topics:

   1.  data declaration using other type's names? (Patrick Redmond)
   2. Re:  data declaration using other type's names? (Brandon Allbery)
   3. Re:  data declaration using other type's names? (Nikita Danilenko)
   4. Re:  what to do about excess memory usage (Chaddaï Fouché)
   5. Re:  what to do about excess memory usage (James Jones)
   6.  How to Lex, Parse, and Serialize-to-XML email messages
      (Costello, Roger L.)

----------------------------------------------------------------------

Message: 1
Date: Thu, 27 Jun 2013 11:24:51 -0400
From: Patrick Redmond <plredmond@gmail.com>
Subject: [Haskell-beginners] data declaration using other type's
        names?
To: beginners@haskell.org
Message-ID:
        <v6g@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hey Haskellers,

I noticed that ghci lets me do this:

> data Foo = Int Int | Float
> :t Int
Int :: Int -> Foo
> :t Float
Float :: Foo
> :t Int 4
Int 4 :: Foo

It's confusing to have type constructors that use names of existing
types. It's not intuitive that the name "Int" could refer to two
different things, which brings me to:

> data Bar = Bar Int
> :t Bar
Bar :: Int -> Bar

Yay? I can have a simple type with one constructor named the same as the type.

Why is this allowed? Is it useful somehow?

--Patrick

------------------------------

Message: 2
Date: Thu, 27 Jun 2013 11:37:46 -0400
From: Brandon Allbery <allbery.b@gmail.com>
Subject: Re: [Haskell-beginners] data declaration using other type's
        names?
To: The Haskell-Beginners Mailing List - Discussion of primarily
        beginner-level topics related to Haskell <beginners@haskell.org>
Message-ID:
        <CAKFCL4U-E4B_%2Bcts0vpNX8Ar9wccQDjgzWOYHLXLsLAv%2BQn_cg@mail.gmail.co...>
Content-Type: text/plain; charset="utf-8"

On Thu, Jun 27, 2013 at 11:24 AM, Patrick Redmond <plredmond@gmail.com> wrote:

> I noticed that ghci lets me do this:

Not just ghci, but ghc as well.

> Yay? I can have a simple type with one constructor named the same as the
> type.
> Why is this allowed? Is it useful somehow?

It's convenient for pretty much the situation you showed, where the type
constructor and data constructor have the same name. A number of people do
advocate that it not be used, though, because it can be confusing for
people. (Not for the compiler; data and type constructors can't be used in
the same places, so it never has trouble keeping straight which is which.)

It might be best to consider this as "there is no good reason to *prevent*
it from happening, from a language standpoint".
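
A small sketch of the point about separate namespaces, reusing the Foo and Bar declarations from the question. The helper names mkFoo and unBar are made up for illustration:

```haskell
-- Type constructors and data constructors live in separate namespaces,
-- so the data constructors Int and Float do not clash with the Prelude types.
data Foo = Int Int | Float
  deriving (Show, Eq)

data Bar = Bar Int
  deriving (Show, Eq)

-- In the signature, Int and Foo are types. In the body, Int is the
-- data constructor of Foo, applied to a value of the Prelude type Int.
mkFoo :: Int -> Foo
mkFoo n = Int n

-- In the signature, Bar is the type. In the pattern, Bar is the
-- data constructor being matched.
unBar :: Bar -> Int
unBar (Bar n) = n
```

The position of a name (type signature versus expression or pattern) is what tells the compiler which of the two meanings applies.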

--
brandon s allbery kf8nh                               sine nomine associates
allbery.b@gmail.com                                  ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net

Costello, Roger L.
