What I'm trying to do is write a function with the signature:
data Urls =
Txt Text.Text
| Url Text.Text URI.URI
deriving (Show)
parseUrls :: Text.Text -> Either String [ Urls ]
parseUrls text = ...
Given a text block, it finds all the URLs, and breaks things
into either URLs, or blocks of text which are not URLs. The
full text is attached, for those who are interested. But the
problem I'm hitting is using the Attoparsec parser
URI.ByteString exports. When I do:
*Base.DetectURL AP>
AP.parseOnly (URI.uriParser URI.laxURIParserOptions) "http://foo/bar"
Right (...)
So, that works. But when I add a single space on the end of the
string:
*Base.DetectURL AP>
AP.parseOnly (URI.uriParser URI.laxURIParserOptions) "http://foo/bar
"
Left "Failed reading: MalformedPath"
It fails. Note that this isn't a problem with parseOnly-
the real code looks like:
parseAllUris ::
AP.Parser (Bldr.Builder, [ Urls ])
parseAllUris = msum
[ aUri, noUri, finished ]
where
finished =
return (mempty, [])
aUri = do
(txt,
url) <- AP.match $
URI.uriParser URI.laxURIParserOptions
(bldr,
us) <- msum [ noUri, finished ]
return
$ (mempty, (Url (E.decodeUtf8 txt) url
: prependText bldr us))
noUri = do
c <-
AP.anyChar
(bldr,
us) <- parseAllUris
return
$ ((Bldr.charUtf8 c) `mappend` bldr, us)
And this has the problem as well- parsing a URL with
anything following it fails, and it doesn't detect any URLs.
The parseOnly is just the easy way to demonstrate it.
So, my question is, is there some way in attoparsec to
tell it to just parse as much as makes sense, and leave the
rest? Alternatively, is this a problem with the way
URI.ByteString module constructed it's parser, and a
different parser could work? Or, worst of all, is this a
problem with the way that URIs are defined and no conforming
parser will work?
Thanks.
Brian