
On Fri, Oct 12, 2012 at 8:28 AM, Emmanuel Touzery wrote:
Hi,
when parsing the string representing a page, you could save all the links you encounter.
After the parsing you would load the linked pages and parse those in turn.
You would repeat this until no new links are returned or a maximum depth is reached.
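(As a rough sketch of that crawl loop in Haskell, under assumptions not in the original mail: fetchPage and extractLinks are placeholder helpers to be filled in with whatever HTTP and HTML-parsing libraries you use.)

import Data.Set (Set)
import qualified Data.Set as Set

type URL = String

-- Placeholder helpers, assumed rather than taken from any particular library:
fetchPage :: URL -> IO String
fetchPage = undefined

extractLinks :: String -> [URL]
extractLinks = undefined

-- Fetch pages level by level, skipping URLs already seen, and stop
-- when the depth budget runs out or no links remain.
crawl :: Int -> Set URL -> [URL] -> IO [String]
crawl 0     _    _    = return []
crawl _     _    []   = return []
crawl depth seen urls = do
    let fresh = filter (`Set.notMember` seen) urls
    pages <- mapM fetchPage fresh
    let seen'    = foldr Set.insert seen fresh
        newLinks = concatMap extractLinks pages
    rest <- crawl (depth - 1) seen' newLinks
    return (pages ++ rest)

Seeding it with crawl maxDepth Set.empty [startUrl] would yield all reachable page bodies down to that depth.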
Thanks for the tip. That sounds much more reasonable than what I mentioned. It still seems a bit "spaghetti" to me, though (but maybe I just have to get used to the Haskell way).
To be more specific about what I want to do: I want to parse TV program listings. On the first page I have the daily listing for a channel: start/end time, title, category, and possibly a link. To fully parse one TV program I can follow the link, if it's present, and get the extra information there (summary, pictures...).
If this were me, I would write the following:

data ChannelListing = ChannelListing [BasicProgramInfo]

-- | Summary of a program
data BasicProgramInfo = BasicProgramInfo
    { basicStartTime :: ...
    , basicEndTime   :: ...
    , basicTitle     :: ...
    , basicUrl       :: URL
    }

-- | Full details of a program
data ProgramInfo = ...

fetchChannelListing :: ChannelId -> IO ChannelListing
fetchProgramInfo    :: BasicProgramInfo -> IO ProgramInfo

And then I would string my program together from these primitives. That way large portions of the code can be built up from the pure data types, but the top level can load them as needed with impure functions.

This is just my first impression, though.

Antoine
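(To illustrate that "string together" step, a minimal sketch assuming the types above; fetchFullListing is a hypothetical name, not from the original mail.)

-- Fetch a channel's listing, then fetch the full details for each entry.
fetchFullListing :: ChannelId -> IO [ProgramInfo]
fetchFullListing cid = do
    ChannelListing basics <- fetchChannelListing cid
    mapM fetchProgramInfo basics

Everything below these two IO actions can stay pure: parsing, filtering, and formatting all operate on plain BasicProgramInfo and ProgramInfo values.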