Conduits vs. lazy byte strings

I'm working on a web app that loads a file, tweaks it a bit, then downloads the result. I'd like it to use as little memory as possible, just as good practice, especially since the tweaking all happens in the first kilobyte or so of the file and the rest is passed through untouched. The current version uses a conduit that just feeds the data into sinkLbs to get a lazy bytestring, which is then processed. I think this will have the desired behavior (after all, the bytestring is lazy), but I have an itch that says I should be doing the processing in the conduit. Would someone tell me whether I understand things correctly and the itch is just leftover imperative thinking, or whether the itch is right and I need to fix the code? If you're interested, you can find the code at https://www.fpcomplete.com/user/mwm/xyzifiy

Mike Meyer writes:
> The current version uses a conduit that just reads the data to a sinkLbs to get a lazy bytestring, which is then processed.
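sinkLbs reads the entire input into memory, so this is the exact opposite of what you want. The lazy bytestring it returns only exists once the whole stream has been consumed, so its laziness doesn't save you anything.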
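A minimal sketch of why, assuming the current conduit API (conduit 1.3 or later, with `ConduitT`, `.|` and `runConduitPure`, which is newer than the conduit 1.1 discussed in this thread). `collectAll` is a hypothetical stand-in written for illustration, not the library's source for sinkLbs, but any sink that hands back the whole input as one lazy ByteString has this shape: it can only return once `await` reports end of input, and by then it is holding every chunk it has seen.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import qualified Data.ByteString.Lazy as L
import Data.Conduit (ConduitT, await, runConduitPure, (.|))
import qualified Data.Conduit.List as CL

-- | Hypothetical sinkLbs-style sink (illustration only, not library code).
-- It cannot return until await yields Nothing (end of input), and until
-- then every chunk is kept in acc, so the whole input ends up in memory.
collectAll :: Monad m => ConduitT S.ByteString o m L.ByteString
collectAll = go []
  where
    -- acc: chunks read so far, newest first
    go acc = do
      mchunk <- await
      case mchunk of
        Nothing    -> return (L.fromChunks (reverse acc))
        Just chunk -> go (chunk : acc)

-- In-memory usage: the result is available only after all three chunks
-- have been pulled from the source.
collected :: L.ByteString
collected = runConduitPure (CL.sourceList (map S8.pack ["ab", "cd", "ef"]) .| collectAll)
```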
> Someone want to tell me if I correctly understand things and the itch is just leftover imperative thinking, or the itch is right and I need to fix the code?
You should write a Conduit or a Sink that does the processing you need. By default, you'll receive "chunks" at each call to "await". If you need lines, there is a linesUnbounded Conduit (as of conduit 1.1), but it still reads whole chunks into memory at a time (I believe the default chunk size is 32k). But that's the same behavior as plain lazy I/O.

Once your Conduit or Sink (i.e., Consumer) finds the data it needs, it should simply end and not call await anymore. This tells upstream that processing is done and that all finalizers should be run.

John
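A sketch of the Conduit approach for the tweak-the-first-kilobyte case, under the same conduit 1.3+ assumption. `readHeader` and `tweakPrefix` are hypothetical names, and the tweak function stands in for whatever the app does to the header. `readHeader` is a Sink that stops calling `await` once it has n bytes, pushing any surplus back with `leftover`. `tweakPrefix` reuses it, yields the rewritten prefix, then streams the remaining chunks through with `awaitForever yield`, so memory use stays roughly at n bytes plus a chunk or two instead of the whole file.

```haskell
import Control.Monad (unless)
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import Data.Conduit (ConduitT, await, awaitForever, leftover, runConduitPure, yield, (.|))
import qualified Data.Conduit.List as CL

-- | A Sink that collects the first n bytes and then simply stops calling
-- await. Bytes read past n are pushed back with 'leftover' so that a
-- following component in the same monadic chain still sees them.
readHeader :: Monad m => Int -> ConduitT S.ByteString o m S.ByteString
readHeader n = go [] 0
  where
    -- acc: chunks read so far, newest first; len: their total length
    go acc len
      | len >= n  = finish acc
      | otherwise = do
          mchunk <- await
          case mchunk of
            Nothing    -> finish acc  -- input shorter than n bytes
            Just chunk -> go (chunk : acc) (len + S.length chunk)
    finish acc = do
      let (prefix, rest) = S.splitAt n (S.concat (reverse acc))
      unless (S.null rest) (leftover rest)
      return prefix

-- | A Conduit that rewrites the first n bytes with f and passes the rest
-- of the input through unchanged, one chunk at a time.
tweakPrefix :: Monad m => Int -> (S.ByteString -> S.ByteString)
            -> ConduitT S.ByteString S.ByteString m ()
tweakPrefix n f = do
  prefix <- readHeader n
  yield (f prefix)
  awaitForever yield

-- In-memory usage: "tweak" (here: reverse) the first 3 bytes.
chunksIn :: [S.ByteString]
chunksIn = map S8.pack ["ab", "cd", "ef"]

-- evaluates to ["cba","d","ef"]
tweaked :: [S.ByteString]
tweaked = runConduitPure (CL.sourceList chunksIn .| tweakPrefix 3 S.reverse .| CL.consume)

-- evaluates to "abc"; the source is never asked for "ef"
header :: S.ByteString
header = runConduitPure (CL.sourceList chunksIn .| readHeader 3)
```

In the web app this would sit between a file source and the output, e.g. `runConduitRes (sourceFile path .| tweakPrefix 1024 fixHeader .| sinkFile outPath)`, with `sourceFile` and `sinkFile` from `Data.Conduit.Binary` in conduit-extra and `fixHeader` a hypothetical tweak function. How the stream is attached to the HTTP response depends on the web framework and isn't shown here.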
Participants (2): John Wiegley, Mike Meyer