Re: [Haskell-cafe] empty fields are dropped in bytestring csv

19 Feb 2012


For future reference, here is a hacky patch against bytestring-csv-0.1.2 that
fixes this. The cost centre annotations (SCC pragmas) were used to verify,
anecdotally, that the change doesn't significantly impact performance.
(Interestingly, the Alex lexer in bytestring-csv appears to allocate 1.5GB
while lexing a 1.6MB CSV file!?)

Text/CSV/ByteString.hs

65c65
<         fields       = [ unquote s | Item s <- line ]
---
...
fields       = [ unquote s | Item s <- pline line]
76a77,86
...
pline fs@(Item x : []) = fs
pline (Item x : Comma : []) = {-# SCC "plinea" #-} Item x : Comma : Item S.empty : []
...
pline (Item x : Comma : rs) = {-# SCC "plineb" #-} Item x : Comma : pline rs
pline (Comma : []) = {-# SCC "plinec" #-} Comma : Item S.empty : Comma : Item S.empty : []
pline (Comma : rs) = {-# SCC "plined" #-} Item S.empty : Comma : pline rs
pline (Newline : rs ) = []
pline [] = []
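
To see what pline does without building the whole package, here is a rough
standalone sketch. The Token type below is a stand-in I made up for
illustration (the real one comes from the Alex lexer in bytestring-csv and may
well differ), fieldsOf is a hypothetical helper that leaves out unquote, and
only the clauses visible in the patch above are reproduced, so treat it as an
approximation rather than the patched module itself.

```haskell
import qualified Data.ByteString.Char8 as S

-- Stand-in token type (an assumption, not the package's actual definition).
data Token = Item S.ByteString | Comma | Newline
  deriving (Eq, Show)

-- The clauses visible in the patch, without the SCC pragmas.
-- Elided lines of the patch are not reconstructed, so this function is
-- partial: token sequences not covered below will fail to match.
pline :: [Token] -> [Token]
pline fs@[Item _]           = fs
pline [Item x, Comma]       = [Item x, Comma, Item S.empty]
pline (Item x : Comma : rs) = Item x : Comma : pline rs
pline [Comma]               = [Comma, Item S.empty, Comma, Item S.empty]
pline (Comma : rs)          = Item S.empty : Comma : pline rs
pline (Newline : _)         = []
pline []                    = []

-- Hypothetical helper mirroring the patched list comprehension,
-- with unquote omitted to keep the sketch self-contained.
fieldsOf :: [Token] -> [S.ByteString]
fieldsOf line = [ s | Item s <- pline line ]
```

With these definitions, the tokens for the problem row "1,,9", that is
[Item "1", Comma, Comma, Item "9"], come back from fieldsOf as three fields,
with an empty one in the middle. A trailing comma likewise yields a final
empty field.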

On 17 February 2012 23:16, Tom Doris wrote:
...
The bytestring-csv package appears to have a bug whereby empty fields are
dropped completely from the row. This differs from Text.CSV, which returns
an empty field in the parse result. I'd argue this is a bug in
bytestring-csv. Does anyone know whether this has been raised before, or
know of a workaround?
Prelude Data.Maybe Data.List Text.CSV.ByteString Data.ByteString.Char8> parseCSV $ pack "a,b,c\n1,2,3\n1,,9\n"
Just [["a","b","c"],["1","2","3"],["1","9"]]
-- the last row has two fields ^
Prelude Text.CSV> parseCSV "/tmp/err" "a,b,c\n1,2,3\n1,,9\n"
Right [["a","b","c"],["1","2","3"],["1","","9"],[""]]