Re: [Haskell-cafe] simple parsec question

3 Mar 2013

      Andrey,

Thanks for your attempt, but it doesn't seem to work. The easy part is the
headline, but the content makes trouble.

Let me write the code a bit more explicit, so you can copy and paste it:

------------------------------------------
{-# LANGUAGE FlexibleContexts #-}

module Main where

import Text.Parsec

data Top = Top String deriving (Show)
data Content = Content String deriving (Show)
data Section = Section Top Content deriving (Show)

headline :: Stream s m Char => ParsecT s u m Top
headline = manyTill anyChar (char ':' >> newline) >>= return . Top

content :: Stream s m Char => ParsecT s u m Content
content = manyTill anyChar (try headline) >>= return . Content

section :: Stream s m Char => ParsecT s u m Section
section = do {h <- headline; c <- content; return (Section h c)}
------------------------------------------

Assume the following example text is stored in  "/tmp/test.txt":
---------------------------
top 1:

some text ... bla

top 2:

more text ... bla bla
---------------------------

Now I run the section parser in ghci against the above mentioned example
text stored in "/tmp/test.txt":

*Main> parseFromFile section "/tmp/test.txt"
Right (Section (Top "top 1") (Content ""))

I don't understand the behaviour of the content parser here. Why does it
return ""? Or perhaps more generally, I don't understand the manyTill
combinator (though I read the docs).

Side remark: of cause for this little task it is probably to much effort to
use parsec. However, my content in fact has an internal structure which I
would like to parse further, but I deliberately abstracted from these
internals as they don't effect my above stated problem.

Immanuel

2013/3/3 Andrey Chudnov 
...
Immanuel,
Since a heading always starts with a new line (and ends with a colon
followed by a carriage return or just a colon?), I think it might be useful
to first separate the input into lines and then classify them depending on
whether it's a heading or not and reassemble them into the value you need.
You don't even need parsec for that.
However, if you really want to use parsec, you can write something like
(warning, not tested):
many $ liftM2 Section headline content
   where headline = anyChar `manyTill` (char ':' >> spaces >> newline)
               content  = anyChar `manyTill` (try $ newline >> headline)
/Andrey
On 3/3/2013 10:44 AM, Immanuel Normann wrote:
...
I am trying to parse a semi structured text with parsec that basically
should identify sections. Each section starts with a headline and has an
unstructured content - that's all. For instance, consider the following
example text (inside the dashed lines):
---------------------------
top 1:
some text ... bla
top 2:
more text ... bla bla
---------------------------
This should be parsed into a structure like this:
[Section (Top 1) (Content "some text ... bla"), Section (Top 1) (Content
"more text ... bla")]
Say, I have a parser "headline", but the content after a headline could
be anything that is different from what "headline" parses.
How could the "section" parser making use of "headline" look like?
My idea would be to use the "manyTill" combinator, but I don"t find an
easy solution.