
On Mon, Mar 4, 2013 at 1:44 AM, Immanuel Normann < immanuel.normann@googlemail.com> wrote:
I am trying to parse a semi structured text with parsec that basically should identify sections. Each section starts with a headline and has an unstructured content - that's all.
Here's my attempt: https://gist.github.com/carlohamalainen/5087207 {-# LANGUAGE FlexibleContexts #-} import Text.Parsec import Control.Applicative hiding ((<|>),many) -- Example input: {- top 1: some text ... bla top 2: more text ... bla bla -} data Top = Top String deriving (Show) data Content = Content [String] deriving (Show) data Section = Section Top Content deriving (Show) headline = do t <- many1 (noneOf ":\n") char ':' newline return $ Top t contentLine = do x <- many (noneOf ":\n") newline return x content = do line <- optionMaybe (try contentLine) case line of Just x -> do xs <- content return (x:xs) _ -> return [] section = do h <- headline c <- Content <$> content return $ Section h c main = do x <- readFile "simple.txt" print $ parse (many section) "" x Example run using your sample data: $ runhaskell Simple.hs Right [Section (Top "top 1") (Content ["","some text ... bla",""]),Section (Top "top 2") (Content ["","more text ... bla bla",""])] Notes: * I had to assume that a content line does not contain a ':', because that is the only way to distinguish a head-line (correct me if I'm wrong). * The key was to use optionMaybe along with try; see the definition of content. * I haven't tested this code on very large inputs. * I slightly changed the definition of Content to have a list of Strings, one for each line. I'm sure this could be altered if you wanted to retain all whitespace. * I am still new to Parsec, so don't take this as the definitive answer ;-) -- Carlo Hamalainen http://carlo-hamalainen.net