Re: [Haskell-cafe] simple parsec question

5 Mar 2013

      Carlo,

Thanks a lot! This looks very promising (though I have to test it for my
purpose more in depth). As you mention, the key seems to be the optionMaybe
combinator. Thanks for pointing to it.

Immanuel

2013/3/5 Carlo Hamalainen 
...
On Mon, Mar 4, 2013 at 1:44 AM, Immanuel Normann <
immanuel.normann@googlemail.com> wrote:
...
I am trying to parse a semi structured text with parsec that basically
should identify sections. Each section starts with a headline and has an
unstructured content - that's all.
Here's my attempt: https://gist.github.com/carlohamalainen/5087207
{-# LANGUAGE FlexibleContexts #-}
import Text.Parsec
import Control.Applicative hiding ((<|>),many)
-- Example input:
{-
top 1:
some text ... bla
top 2:
more text ... bla bla
-}
data Top = Top String deriving (Show)
data Content = Content [String] deriving (Show)
data Section = Section Top Content deriving (Show)
headline = do
    t <- many1 (noneOf ":\n")
    char ':'
    newline
return $ Top t
contentLine = do
    x <- many (noneOf ":\n")
    newline
    return x
content = do
    line <- optionMaybe (try contentLine)
case line of Just x -> do xs <- content
                              return (x:xs)
                 _      -> return []
section = do
    h <- headline
    c <- Content <$> content
    return $ Section h c
main = do
    x <- readFile "simple.txt"
    print $ parse (many section) "" x
Example run using your sample data:
$ runhaskell Simple.hs
Right [Section (Top "top 1") (Content ["","some text ... bla",""]),Section
(Top "top 2") (Content ["","more text ... bla bla",""])]
Notes:
* I had to assume that a content line does not contain a ':', because that
is the only way to distinguish a head-line (correct me if I'm wrong).
* The key was to use optionMaybe along with try; see the definition of
content.
* I haven't tested this code on very large inputs.
* I slightly changed the definition of Content to have a list of Strings,
one for each line. I'm sure this could be altered if you wanted to retain
all whitespace.
* I am still new to Parsec, so don't take this as the definitive answer ;-)
--
Carlo Hamalainen
http://carlo-hamalainen.net