Auto-termination and leftovers in Conduits

Hey,

Say I have a stream of Data.Text.Text objects flowing through a conduit, where the divisions between successive Data.Text.Text items occur at arbitrary boundaries (maybe the source is sourceFile $= decode utf8). I'd like to create a Sink that returns a tuple of (the first line, the rest of the input).

My first attempt looks like this:

    sink = do
        out1 <- CT.lines C.=$ CL.head
        out2 <- CL.consume
        return (out1, T.concat out2)

However, the following input produces:

    runIdentity $ CL.sourceList ["abc\nde", "f\nghi"] C.$$ sink
    (Just "abc","f\nghi")

But what I really want is

    (Just "abc", "\ndef\nghi")

I think this is due to the auto-termination you mention in [1]. My guess is that when CT.lines yields the first value, (CL.head then also yields it,) and execution is auto-terminated before CT.lines gets a chance to specify any leftovers.

How can I write this sink? (I know I can just use CL.consume and T.break (== '\n'), but I'm not interested in that. I'm trying to figure out how to get the behavior I'm looking for with conduits.)

Thanks,
Myles

[1] http://hackage.haskell.org/packages/archive/conduit/0.5.2.7/doc/html/Data-Co...

The important issue here is that, when using =$, $=, and =$=, leftovers
will be discarded. To see this more clearly, realize that the first line of
sink is equivalent to:

    out1 <- C.injectLeftovers CT.lines C.>+> CL.head

So any leftovers from lines are lost once you move past that line. In order
to get this to work, put the consume inside the same composition:

    sink = C.injectLeftovers CT.lines C.>+> do
        out1 <- CL.head
        out2 <- CL.consume
        return (out1, T.unlines out2)

Or:

    sink = CT.lines C.=$ do
        out1 <- CL.head
        out2 <- CL.consume
        return (out1, T.unlines out2)
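
For anyone reading this later, here is a minimal, self-contained sketch that puts the first attempt and the second suggestion side by side. It is written under some assumptions that are not part of the thread: it targets the current conduit API (ConduitT, .| and runConduitPure from the conduit package, CT.lines from conduit-extra) instead of the 0.5-era =$ and $$ operators, and the names firstAttempt, fixedSink and input are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Conduit (ConduitT, runConduitPure, (.|))
import qualified Data.Conduit.List as CL
import qualified Data.Conduit.Text as CT
import qualified Data.Text as T

-- Hypothetical name: the sink from the question. CT.lines is fused
-- only with CL.head, so whatever CT.lines has already pulled from
-- upstream but not yet yielded is dropped when that fusion finishes.
firstAttempt :: Monad m => ConduitT T.Text o m (Maybe T.Text, T.Text)
firstAttempt = do
    out1 <- CT.lines .| CL.head
    out2 <- CL.consume
    return (out1, T.concat out2)

-- Hypothetical name: the suggested fix. Both consumers run inside the
-- same fusion, so CL.consume sees every remaining line from CT.lines.
fixedSink :: Monad m => ConduitT T.Text o m (Maybe T.Text, T.Text)
fixedSink = CT.lines .| do
    out1 <- CL.head
    out2 <- CL.consume
    return (out1, T.unlines out2)

-- The chunked input from the question.
input :: [T.Text]
input = ["abc\nde", "f\nghi"]

main :: IO ()
main = do
    -- (Just "abc","f\nghi"): the buffered "de" is lost
    print (runConduitPure (CL.sourceList input .| firstAttempt))
    -- (Just "abc","def\nghi\n")
    print (runConduitPure (CL.sourceList input .| fixedSink))
```

Note that the fixed version rebuilds the rest of the input with T.unlines, so it returns "def\nghi\n" and not the "\ndef\nghi" asked for in the question: the newline that ended the first line is consumed by the line splitter, and T.unlines appends one after the last line.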
Michael
On Sat, Oct 27, 2012 at 9:20 PM, Myles C. Maxfield wrote:
> [...]

Cool! Thanks so much!
--Myles
On Sat, Oct 27, 2012 at 8:35 PM, Michael Snoyman wrote:
> [...]