reading file content with conduit

Hello, I have a strange behaviour with my code when reading data from a file saved by sinkFile. I only see the first record of the file, although the file seems to contain several records. Any hints ? -- built in a sandbox with GHC 8.2.2, base 4.10.1.0, binary 0.8.5.1, bytestring 0.8.10.2, conduit 1.3.1 {-# LANGUAGE DeriveGeneric #-} import qualified Data.ByteString as DB import qualified Data.ByteString.Lazy as DBL import qualified Data.Binary as DBI import GHC.Generics (Generic) import Conduit import Data.Int (Int64) data MySubRec = MySubRec { sr1 :: Float, sr2 :: Float } deriving (Generic, Show) data MyRec = MyRec { r1 :: Int64, r2 :: String, r3 :: [MySubRec], r4 :: [MySubRec] } deriving (Generic, Show) instance DBI.Binary MySubRec instance DBI.Binary MyRec es = MySubRec { sr1 =1.0, sr2 =1000.5 } myList = repeat es e1 = MyRec{ r1=1, r2="e1", r3=take 2 myList, r4=take 1 myList} e2 = MyRec{ r1=2, r2="e2", r3=take 2 myList, r4=take 1 myList} myData = concat $ repeat [e1,e2] dataToBs :: Monad m => ConduitT MyRec DB.ByteString m () dataToBs = do d <- await case d of Just bs -> do yield $ DBL.toStrict $ DBI.encode bs dataToBs _ -> return () bsToData :: Monad m => ConduitT DB.ByteString MyRec m () bsToData = do d <- await case d of Just bs -> do yield $ DBI.decode $ DBL.fromStrict bs bsToData _ -> return () main = do runConduitRes $ yieldMany (take 10 myData) .| dataToBs .| sinkFile "/tmp/res.bin" runConduitRes $ sourceFile "/tmp/res.bin" .| bsToData .| mapM_C (liftIO . putStrLn . show)

Hello Fabien,
in your example, the problem is that sourceFile gives you a bytestring
chunk that contains
more than one record and decoding function does not return a list of
values, instead
return only one. So you need to pass the leftover data to the next
decoding round.
You may consider using
https://hackage.haskell.org/package/binary-conduit package that
does that (see https://hackage.haskell.org/package/binary-conduit-1.3.1/docs/src/Data.Condu...)
--
Best regards, Alexander.
On Fri, 2 Nov 2018 at 15:09, Fabien R
Hello, I have a strange behaviour with my code when reading data from a file saved by sinkFile. I only see the first record of the file, although the file seems to contain several records.
Any hints ?
-- built in a sandbox with GHC 8.2.2, base 4.10.1.0, binary 0.8.5.1, bytestring 0.8.10.2, conduit 1.3.1 {-# LANGUAGE DeriveGeneric #-} import qualified Data.ByteString as DB import qualified Data.ByteString.Lazy as DBL import qualified Data.Binary as DBI import GHC.Generics (Generic) import Conduit import Data.Int (Int64)
data MySubRec = MySubRec { sr1 :: Float, sr2 :: Float } deriving (Generic, Show)
data MyRec = MyRec { r1 :: Int64, r2 :: String, r3 :: [MySubRec], r4 :: [MySubRec] } deriving (Generic, Show)
instance DBI.Binary MySubRec instance DBI.Binary MyRec
es = MySubRec { sr1 =1.0, sr2 =1000.5 } myList = repeat es e1 = MyRec{ r1=1, r2="e1", r3=take 2 myList, r4=take 1 myList} e2 = MyRec{ r1=2, r2="e2", r3=take 2 myList, r4=take 1 myList} myData = concat $ repeat [e1,e2]
dataToBs :: Monad m => ConduitT MyRec DB.ByteString m () dataToBs = do d <- await case d of Just bs -> do yield $ DBL.toStrict $ DBI.encode bs dataToBs _ -> return () bsToData :: Monad m => ConduitT DB.ByteString MyRec m () bsToData = do d <- await case d of Just bs -> do yield $ DBI.decode $ DBL.fromStrict bs bsToData _ -> return () main = do runConduitRes $ yieldMany (take 10 myData) .| dataToBs .| sinkFile "/tmp/res.bin" runConduitRes $ sourceFile "/tmp/res.bin" .| bsToData .| mapM_C (liftIO . putStrLn . show) _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
-- Alexander

Thanks Alexander, The package fixed the problem. I thought that, since a conduit is driven by downstream, if bsToData requested a record, sourceFile would only send the corresponding Bytestrings. -- Fabien

Hello Fabien,
your expectations are correct, but in order to make this really
happen your consumer function should be aware of the
conduit pipeline in order to consume only the required amount of data.
That may happen automatically in two cases:
1. your function consumes an entire chunk
2. your function can work in an iterative way and can return unprocessed
data or continuation that may consume more data (the case of iterative API
in binary)
A nice example of the function that is related to your use case and
aware of the conduit pipeline:
https://hackage.haskell.org/package/conduit-extra-1.3.0/docs/Data-Conduit-At...
If you pass a parser in sinkParser, it will consume only the required
amount of data.
--
Alexander
On Sun, 4 Nov 2018 at 11:48, Fabien R
Thanks Alexander, The package fixed the problem. I thought that, since a conduit is driven by downstream, if bsToData requested a record, sourceFile would only send the corresponding Bytestrings.
-- Fabien _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
-- Alexander
participants (2)
-
Alexander V Vershilov
-
Fabien R