Parsing CSV files


Shawn P. Garbett

29 Jul 2003

3:01 p.m.

I did a small search for parsing a comma separated file in Haskell and didn't find anything, so I put together some code to do this. It doesn't handle whitespace very well; this would be a nice addition if someone out there has an idea. Also, in the rows parser I had trouble using two do blocks without braces; I couldn't prevent the layout from being ambiguous. Anyway, here's the code:

-------------------------
import Parsec
import System.IO

-- Code to parse a comma separated file

content :: Parser String
content = many1 (noneOf "\"")

cell :: Parser String
cell = do char '\"'
          w <- content;
          char '\"' <?> "end of cell"
          return w;
   <|> return ""   -- Empty cell

separator :: Parser ()
separator = skipMany1 (char ',')

cells :: Parser [String]
cells = sepBy1 cell separator

contents :: Parser [String]
contents = sepBy1 content separator;

rows :: Parser [[String]]
rows = do c <- cells
          newline
          do { cs <- rows; return (c:cs); } <|> return [c]

csv :: Parser [[String]]
csv = do r <- rows
         eof
         return r

-- Main routine
main :: IO ()
main = do result <- parseFromFile csv "simple.txt"
          case (result) of
            Left err -> print err
            Right sb -> print sb
-------------------

Shawn Garbett
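One possible way to close the whitespace gap mentioned above is to wrap each cell in a combinator that skips spaces and tabs on either side. This is only a sketch under assumptions: it uses the modern `Text.Parsec` module names rather than the `Parsec` import in the post, and the names `blanks`, `padded` and `quotedCell` are made up for illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Skip spaces and tabs, but not newlines, since a newline ends a row.
blanks :: Parser ()
blanks = skipMany (oneOf " \t")

-- Run a cell parser, ignoring blanks on either side of it.
-- (Hypothetical helper, not part of Parsec.)
padded :: Parser a -> Parser a
padded p = blanks *> p <* blanks

-- A quoted cell, as in the post, but allowing an empty "".
quotedCell :: Parser String
quotedCell = between (char '"') (char '"' <?> "end of cell")
                     (many (noneOf "\""))

-- Cells separated by commas; a missing cell parses as "".
cells :: Parser [String]
cells = padded (quotedCell <|> return "") `sepBy1` char ','
```

Whitespace inside the quotes is kept untouched; only the padding between a quote and the neighbouring comma is dropped.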


Bryn Keller

29 Jul

3:36 p.m.

Shawn P. Garbett wrote:

...

I did a small search for parsing a comma seperated file in Haskell and didn't find anything--

I threw one together a while ago:

http://www.xoltar.org/languages/haskell/CSV.hs

which isn't much different from your example, though it does handle nested quotation marks.

Bryn

Shawn P. Garbett

3:57 p.m.

Never mind the previous version; I've fixed a few bugs in it (it now handles unquoted numbers and blank fields correctly). Here's my current version:

------------------------------------------------------------------------
import Parsec
import System.IO

-- Sometimes unquoted numbers appear
number :: Parser String
number = do f  <- (char '-' <|> digit)
            fs <- many digit
            return (f:fs)

-- A cell can be a quoted value, a number or empty
cell :: Parser String
cell = do { char '\"'; w <- (many1 (noneOf "\"")); char '\"' <?> "end of cell"; return w; }
   <|> number
   <|> return ""   -- Empty cell

-- A comma of course
separator :: Parser Char
separator = char ','

-- Group of cells with a separator
cells :: Parser [String]
cells = sepBy1 cell separator

-- For extracting comma delimited values of a cell
contents :: Parser [String]
contents = sepBy1 (many (noneOf ",")) separator

-- Rows are a set of cells followed by a newline,
-- This is followed by more rows or nothing
rows :: Parser [[String]]
rows = do c <- cells
          newline
          do { cs <- rows; return (c:cs); } <|> return [c]

-- Comma Separated Values, set of rows followed by eof
csv :: Parser [[String]]
csv = do r <- rows
         eof
         return r

-- Main routine, for testing
main :: IO ()
main = do result <- parseFromFile csv "sb.txt"
          case (result) of
            Left err -> print err
            Right sb -> print sb

Keith Wansbrough

4:26 p.m.

...

Nevermind the previous version, I've solved a few bugs in it (like unquoted numbers and correctly handling blank fields).

1. Any string without commas or newlines can be unquoted; no need to restrict it to digits.

2. In a quoted string, "" (that is, two double quotes) stands for one double quote character. That is, the strings

   "This is a quoted string"
   K"oln

   appear in a CSV file as

   """This is a quoted string"""
   "K""oln"

Also, be sure to support newlines within quoted strings (I think you do already). Many CSV parsers fail to do this, with nasty results.

--KW 8-)
--
Keith Wansbrough
http://www.cl.cam.ac.uk/users/kw217/
University of Cambridge Computer Laboratory.
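The three points above (arbitrary unquoted fields, "" as an escaped quote, and newlines inside quoted fields) might be sketched in Parsec roughly as follows. This is not anyone's posted parser; it assumes the modern `Text.Parsec` module names, all parser names are illustrative, and it assumes every row, including the last, ends with a line terminator.

```haskell
import Control.Monad (void)
import Text.Parsec
import Text.Parsec.String (Parser)

-- A quoted field: any characters between double quotes, where ""
-- stands for one literal quote. Commas and newlines are allowed inside.
quotedField :: Parser String
quotedField =
  between (char '"') (char '"' <?> "closing quote") (many quotedChar)
  where
    quotedChar = noneOf "\"" <|> try (string "\"\"" >> return '"')

-- An unquoted field: any (possibly empty) run of characters
-- that contains no comma and no line break.
unquotedField :: Parser String
unquotedField = many (noneOf ",\r\n")

field :: Parser String
field = quotedField <|> unquotedField

-- One row: at least one field, separated by commas.
record :: Parser [String]
record = field `sepBy1` char ','

-- Accept Windows, Unix and old Mac line endings.
eol :: Parser ()
eol = void (try (string "\r\n") <|> string "\n" <|> string "\r")

-- Assumption: each row is terminated by a line ending.
csvFile :: Parser [[String]]
csvFile = record `endBy` eol <* eof
```

With these definitions, `parse csvFile "" input` returns either a parse error or the list of rows. A file whose last row lacks a trailing newline is rejected by this sketch; relaxing that needs extra care, because an empty line would otherwise parse as a row with one empty field.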

John Meacham

6:40 p.m.

Hey, I wrote a CSV parser too. Seeing as how this seems like a common thing (having at least 3 independent implementations), perhaps it belongs in the libraries somewhere? Some area dedicated to useful little grammars would be handy: CSV, C header files, .x (rpcgen), various preference file formats, etc.

John
--
---------------------------------------------------------------------------
John Meacham - California Institute of Technology, Alum. - john@foo.net
---------------------------------------------------------------------------

Shawn P. Garbett

7:32 p.m.

On Tuesday 29 July 2003 01:40 pm, John Meacham wrote:

...

Hey, I wrote a CSV parser too. seeing as how this seems like a common thing (having 3 independent implementations at least), perhaps it belongs in libraries somewhere? some area dedicated to useful little grammers would be handy. csv, c header files, .x (rpcgen), various preference file formats, etc...

What Haskell needs is something along the lines of CPAN: contributed modules of various utility. Not part of the base libraries, but nevertheless useful and maybe needed for your project. I just poked around and got a copy of a "set" data type. Sure beats writing one. There are lots of FFIs floating around as well. So there are three major categories to start with, for which code is already floating around:

Parsers
Data Structures
FFI

I'm sure there are several more; this was just off the top of my head.

Shawn

Graham Klyne

7:33 p.m.

At 11:40 29/07/03 -0700, John Meacham wrote:

...

Hey, I wrote a CSV parser too. seeing as how this seems like a common thing (having 3 independent implementations at least), perhaps it belongs in libraries somewhere? some area dedicated to useful little grammers would be handy. csv, c header files, .x (rpcgen), various preference file formats, etc...

Maybe the approach to standard library should be via a "cookbook" of some kind?

I tend to find that with functions like reading CSV, different applications have slightly different requirements, so it doesn't always help to have a one-size-fits-all function in the library, at least until the range of variations is well understood.

Which is all by way of agreeing with your second suggestion.

#g

-------------------
Graham Klyne
PGP: 0FAA 69FF C083 000B A2E9 A131 01B9 1C7A DBCA CB5E

John Meacham

10:17 p.m.

Perhaps something like vim.sf.net's script section, where anyone can post a script (or Haskell file) and everyone can immediately see and download it. The good ones filter to the top, and since the individual files don't need to be part of a larger library infrastructure, there is much more capability for ad hoc code reuse...

John

On Tue, Jul 29, 2003 at 08:33:07PM +0100, Graham Klyne wrote:

...

At 11:40 29/07/03 -0700, John Meacham wrote:

...
Hey, I wrote a CSV parser too. seeing as how this seems like a common thing (having 3 independent implementations at least), perhaps it belongs in libraries somewhere? some area dedicated to useful little grammers would be handy. csv, c header files, .x (rpcgen), various preference file formats, etc...

Maybe the approach to standard library should be via a "cookbook" of some kind?

I tend to find that with functions like reading CSV, different applications have slightly different requirements, so it doesn't always help to have a one-size-fits-all function in the library, at least until the range of variations is well understood.

Which is all by way of agreeing with your second suggestion.

#g

------------------- Graham Klyne PGP: 0FAA 69FF C083 000B A2E9 A131 01B9 1C7A DBCA CB5E

-- Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

--
---------------------------------------------------------------------------
John Meacham - California Institute of Technology, Alum. - john@foo.net
---------------------------------------------------------------------------
