
Hi,

I'm working on an RSS file getter. I was wondering if I could get some help getting files to download and save without holding the entire file in memory in between. I chose Conduit's version of simpleHttp only because it was recommended, and it was the quickest thing I could get to work correctly because I was eager to get started on this project, so I'd be happy to switch.

Here's where I define the download and save functions:
https://github.com/orblivion/feedGetter/blob/master/rss.hs#L107

And here's where I use them, getting multiple files at a time with async:
https://github.com/orblivion/feedGetter/blob/master/rss.hs#L208

When I run this, it prints that it's "Getting" the file, waits a while (presumably to download the whole thing), then says it's "Saving". I checked the file system, and the file isn't there during the pause. I'm not entirely sure why. Is it my choice of libraries, or the way I'm using them? Perhaps something to do with async? I just tried content <- simpleHttp "http://google.com" in ghci, and it does pause for a second, so I'm guessing this is strict from the get-go. But I've done almost no I/O before.

Is there a straightforward, canonical option? It seems like there should be. But if it comes down to using pipes or conduit, what the heck, I'll try it out; I'd like to learn pipes eventually.

Thanks a lot,
Dan

On Sat, Aug 10, 2013 at 05:16:58PM -0700, Dan Krol wrote:
> I'm working on an rss file getter. I was wondering if I could get some help getting files to download and save without holding the entire file in memory in between.
> [...]
> I just tried content <- simpleHttp "http://google.com" in ghci, and it does pause for a second, so I'm guessing this is strict from the get-go.

Michael is very good at documenting his packages; this is what I found in the http-conduit docs for simpleHttp (http://is.gd/WkDb7G):

    Note: Even though this function returns a lazy bytestring, it does not utilize lazy I/O, and therefore the entire response body will live in memory. If you want constant memory usage, you'll need to use the conduit package and http directly.

/M

--
Magnus Therning                      OpenPGP: 0xAB4DFBA4
email: magnus@therning.org   jabber: magnus@therning.org
twitter: magthe               http://therning.org/magnus

I invented the term Object-Oriented, and I can tell you I did not have C++ in mind.
     -- Alan Kay
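Following the docs note (use conduit and `http` directly for constant memory), a minimal sketch of streaming a response body straight to disk. It assumes the modern package APIs (http-conduit 2.3's `Network.HTTP.Conduit` with `http`, `parseRequest`, `newManager`, `tlsManagerSettings`; conduit 1.3; resourcet; async). The helper names `streamToFile`, `downloadTo`, and `downloadAll` are hypothetical, not taken from feedGetter, and the block defines helpers only, leaving `main` to the caller.

```haskell
-- Sketch only: assumes http-conduit >= 2.3, conduit >= 1.3, resourcet, async.
-- streamToFile, downloadTo and downloadAll are hypothetical helper names.
import Conduit
import Control.Concurrent.Async (mapConcurrently_)
import Control.Monad.Trans.Resource (ResourceT, runResourceT)
import Data.ByteString (ByteString)
import Network.HTTP.Conduit
  (Manager, http, newManager, parseRequest, responseBody, tlsManagerSettings)

-- Copy chunks from any ByteString source into a file as they arrive,
-- so only the current chunk is held in memory.
streamToFile :: FilePath -> ConduitT () ByteString (ResourceT IO) () -> ResourceT IO ()
streamToFile path source = runConduit (source .| sinkFile path)

-- Download one URL directly to disk instead of first collecting the
-- whole body in memory, which is what simpleHttp does.
downloadTo :: Manager -> String -> FilePath -> IO ()
downloadTo manager url path = do
  request <- parseRequest url
  runResourceT $ do
    response <- http request manager
    streamToFile path (responseBody response)

-- Fetch several (url, file) pairs concurrently, sharing one Manager
-- (and so one connection pool) across all downloads.
downloadAll :: [(String, FilePath)] -> IO ()
downloadAll jobs = do
  manager <- newManager tlsManagerSettings
  mapConcurrently_ (\(url, path) -> downloadTo manager url path) jobs
```

For example, `downloadAll [("http://example.com/episode1.mp3", "episode1.mp3")]` writes the file as chunks arrive, so it grows on disk during the download. With `parseRequest`, error responses such as a 404 page are saved like any other body; building the request with `parseUrlThrow` instead makes non-2xx statuses throw.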

Ah yes, the actual docs. Somehow didn't think to check that, sorry.
Alright, I'll try to figure that one out, thanks. Any particular reason
nobody just offers http over lazy I/O? Is it just because lazy I/O is
generally discouraged? Or just particularly bad over a network?
And is this an area where Conduit is better than Pipes? There doesn't seem
to be a similar http for Pipes.
On Sat, Aug 10, 2013 at 11:30 PM, Magnus Therning wrote:
> Note: Even though this function returns a lazy bytestring, it does not utilize lazy I/O, and therefore the entire response body will live in memory. If you want constant memory usage, you'll need to use the conduit package and http directly.

_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners

Dan Krol wrote:
> Ah yes, the actual docs. Somehow didn't think to check that, sorry.
> Alright, I'll try to figure that one out, thanks. Any particular reason nobody just offers http over lazy I/O? Is it just because lazy I/O is generally discouraged? Or just particularly bad over a network?

Lazy I/O is particularly problematic when implementing network servers, and generally discouraged for networking code.

> And is this an area where Conduit is better than Pipes?

Conduit has been around longer and is thus more mature and complete.

> There doesn't seem to be a similar http for Pipes.
I believe Andrew Cowie, author of http-streams (on top of io-streams), is working on a Pipes version.

HTH,
Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
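For comparison with the conduit approach, a minimal sketch of the same stream-to-disk idea using http-streams on top of io-streams. It assumes http-streams' `get` convenience function from `Network.Http.Client`, which takes the URL as a `ByteString` and hands the body to a callback as an `InputStream`; the helper names `saveInputStream` and `downloadWithHttpStreams` are hypothetical.

```haskell
-- Sketch only: assumes the http-streams and io-streams packages.
-- saveInputStream and downloadWithHttpStreams are hypothetical helper names.
import Data.ByteString (ByteString)
import Network.Http.Client (get)
import System.IO.Streams (InputStream)
import qualified System.IO.Streams as Streams

-- Pump every chunk from an input stream into a file until end of input;
-- withFileAsOutput closes the file when the copy finishes.
saveInputStream :: FilePath -> InputStream ByteString -> IO ()
saveInputStream path input = Streams.withFileAsOutput path (Streams.connect input)

-- Issue a GET request and stream the response body to a file. The body
-- arrives as an InputStream, so it is consumed chunk by chunk rather
-- than collected into one ByteString first.
downloadWithHttpStreams :: ByteString -> FilePath -> IO ()
downloadWithHttpStreams url path =
  get url (\_response body -> saveInputStream path body)
```

The body should only be read inside the callback, since `get` closes the connection once the handler returns.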
participants (3)
- Dan Krol
- Erik de Castro Lopo
- Magnus Therning