On Fri, Oct 12, 2012 at 8:58 PM, Sean Perry <shaleh@speakeasy.net> wrote:
On Oct 12, 2012, at 7:19 AM, Emmanuel Touzery wrote:
>>
>> Overall, splitting your algorithm into simple steps — steps that would
>> do just a part of work and return incomplete objects — is the way to go.
>>
>
> You have a point about splitting code into smaller functions. I would just rather have getDetails called from getProgramme than have a parent call both separately. And the parent must connect the two by doing the IO if I want both pieces to be pure. That is what bothers me most.
>

Think about this from a testing perspective. How do you verify that your code which identifies links is working? If the link finding is mixed in with the link retrieving, you end up having to dummy out the IO. Now imagine the code becoming more complicated: as Alexander suggests, you may later want to retrieve images too. Then you need to mock out the image retrieval as well.
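
A minimal sketch of that separation, assuming a deliberately naive scanner instead of a real HTML parser. The names extractLinks and fetchLinked are hypothetical and not from the code under discussion. The fetcher is passed in as an argument, so a test can hand in a stub and never touch the network:

```haskell
import Data.List (isPrefixOf)

-- Pure: collect every href="..." target found in a page body.
-- A naive scanner for illustration only, not a real HTML parser.
extractLinks :: String -> [String]
extractLinks [] = []
extractLinks s@(_:rest)
  | marker `isPrefixOf` s =
      let (url, after) = break (== '"') (drop (length marker) s)
      in  url : extractLinks after
  | otherwise = extractLinks rest
  where marker = "href=\""

-- Effectful: the only piece that needs a fetcher. Because the fetcher
-- is a parameter, tests can run this in Maybe (or any other monad)
-- instead of IO.
fetchLinked :: Monad m => (String -> m String) -> String -> m [String]
fetchLinked fetch page = mapM fetch (extractLinks page)
```

With this split, extractLinks can be checked against plain strings, and only fetchLinked needs a stand-in for the network.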

Perhaps you should think of this as creating a matching DOM-like structure. First your tree starts out empty. Then you parse the top level and return a new tree with data and dangling nodes that are links needing to be followed. You check "have I gone as deep as I would like?". If not, pass the new partial tree to the retrieval routine and start filling it in. Now you are back to the depth check. When the retrieval has reached its goal, the tree is returned and it is as populated as it can be. Now the rest of your code can use the tree for whatever it needs.
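
The partial tree and depth check could look roughly like this. It is only a sketch: the Tree type, parsePage and fill are hypothetical names, and both the fetcher and the pure link finder are passed in as assumptions rather than tied to any particular library:

```haskell
-- A node is either fetched content with children, or a dangling link.
data Tree
  = Page String [Tree]   -- page body plus its child nodes
  | Link String          -- URL that has not been followed yet
  deriving (Eq, Show)

-- Pure: turn one fetched body into a node whose children all dangle.
parsePage :: (String -> [String]) -> String -> Tree
parsePage findLinks body = Page body (map Link (findLinks body))

-- Follow dangling links until the depth budget is used up.
-- Each followed link costs one level of depth.
fill :: Monad m
     => (String -> m String)   -- fetcher (stubbed out in tests)
     -> (String -> [String])   -- pure link finder
     -> Int                    -- how many more link levels to follow
     -> Tree
     -> m Tree
fill fetch findLinks depth tree
  | depth <= 0 = return tree   -- deep enough: leave the rest dangling
  | otherwise = case tree of
      Link url -> do
        body <- fetch url
        fill fetch findLinks (depth - 1) (parsePage findLinks body)
      Page body kids ->
        Page body <$> mapM (fill fetch findLinks depth) kids
```

Only fill needs a monad; parsePage and the Tree itself stay pure, and the depth budget also bounds the work if pages link to each other in a cycle.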

Remember to always ask "how do I test this?". One of the key reasons to keep code pure is that it makes testing so much easier. Every small piece can be verified.

Thank you for your opinion, it does bring another set of concerns.

What you suggest is the approach Daniel Trstenjak proposed in the very first answer, and it definitely has value, but the question is code readability. It's a fine balance. That's what I was asking at the beginning: how hard should we strive for pure code? Here the balance seems to depend on the person (while I thought it was dogma in the Haskell community: as much pure code as possible).

In this case I think purity means more code to be written (re-reading and re-writing data structures instead of writing them just once), and I'm not sure it's worth the cost. I'd say Daniel Trstenjak's second answer convinced me, but I'm just starting with Haskell and I guess I'll get a clearer sense of this with time.

But it's also good to see that there is consensus on how to code this, if we want to maximize pure code.

Emmanuel