Parsing a file with data divided into sections

Patrick LeBoutillier

4 Feb 2009 4 Feb '09

1:58 a.m.

Hi all, I wanted to parse a file that looks like this: MONDAY JOHN JIM LINDA TUESDAY BILL BOB WEDNESDAY THURSDAY SAM TODD LARRY LUKE FRIDAY TED in order to count the number of people for each day. After a (very) long time and a lot of trial and error, I managed to do it with this program: import Char (isSpace) main = interact (unlines . countSections . lines) where countSections = map (show) . snd . foldr compileSections (0, []) compileSections line (n, acc) = if isSection line then (0, (line, n) : acc) else (n + 1, acc) isSection line = not . isSpace . head $ line which outputs: ("MONDAY",3) ("TUESDAY",2) ("WEDNESDAY",0) ("THURSDAY",4) ("FRIDAY",1) I had quite a hard time figuring out how to keep count of the number of records in each sections. Is there a more obvious way to handle these types of problems? Are there some builtins that could of made it easier? In Perl I would probably have used a hash and a variable to keep count of the current day, incrementing the hash value for each person until I got to the next day, but it's not obvious to me how to transpose this technique to functional programming. Thanks a lot, Patrick -- ===================== Patrick LeBoutillier Rosemère, Québec, Canada

Show replies by date

Magnus Therning

4 Feb 4 Feb

10:02 a.m.

New subject: Parsing a file with data divided into sections

On Wed, Feb 4, 2009 at 1:58 AM, Patrick LeBoutillier wrote:

...

Hi all,

I wanted to parse a file that looks like this:

MONDAY JOHN JIM LINDA TUESDAY BILL BOB WEDNESDAY THURSDAY SAM TODD LARRY LUKE FRIDAY TED

in order to count the number of people for each day. After a (very) long time and a lot of trial and error, I managed to do it with this program:

import Char (isSpace)

main = interact (unlines . countSections . lines) where countSections = map (show) . snd . foldr compileSections (0, []) compileSections line (n, acc) = if isSection line then (0, (line, n) : acc) else (n + 1, acc) isSection line = not . isSpace . head $ line

which outputs:

("MONDAY",3) ("TUESDAY",2) ("WEDNESDAY",0) ("THURSDAY",4) ("FRIDAY",1)

I had quite a hard time figuring out how to keep count of the number of records in each sections. Is there a more obvious way to handle these types of problems? Are there some builtins that could of made it easier?

I think that you've pretty much used accumulators in the way they most often are used. In many cases you don't _have_ to keep count though. Here's one way to get the same result, but without keeping count: countDays [] = [] countDays ls = let day = head ls count = length $ takeWhile (isSpace . head) $ tail ls in (day, count) : countDays (drop (1 + count) ls) main = interact (unlines . map show . countDays . lines)

...

In Perl I would probably have used a hash and a variable to keep count of the current day, incrementing the hash value for each person until I got to the next day, but it's not obvious to me how to transpose this technique to functional programming.

Often transposing from imperative to functional isn't what you want to do. One of the arguments for learning both imperative and functional languages is that they approach problems differently, resulting in different solutions. I suspect you will find Haskell, and indeed any functional language, difficult to use if you try to "think imperative". It takes time to learn new ways to think about problems, but in the end you'll never look at things the same way again :-) /M -- Magnus Therning (OpenPGP: 0xAB4DFBA4) magnus＠therning．org Jabber: magnus＠therning．org http://therning.org/magnus identi.ca|twitter: magthe

Heinrich Apfelmus

11:10 a.m.

New subject: Parsing a file with data divided into sections

Magnus Therning wrote:

...

Patrick LeBoutillier wrote:

...
I wanted to parse a file that looks like this:

MONDAY JOHN JIM LINDA TUESDAY BILL BOB WEDNESDAY THURSDAY SAM TODD LARRY LUKE FRIDAY TED

in order to count the number of people for each day. After a (very) long time and a lot of trial and error, I managed to do it with this program:

Nice program, especially your use of function composition is good style.

...

I think that you've pretty much used accumulators in the way they most often are used. In many cases you don't _have_ to keep count though. Here's one way to get the same result, but without keeping count:

countDays [] = [] countDays ls = let day = head ls count = length $ takeWhile (isSpace . head) $ tail ls in (day, count) : countDays (drop (1 + count) ls)

main = interact (unlines . map show . countDays . lines)

Here's a version using span from the Prelude: main = interact $ unlines . map show . countDays . lines countDays [] = [] countDays (day:xs) = (day, length people) : countDays xs' where (people, xs') = span (isSpace . head) xs Note that this file format is very simple and it's ok to use lines and isSpace to parse it. But the tool of choice are parser combinators like Text.Parsec or Text.ParserCombinators.ReadP . -- http://apfelmus.nfshost.com

Magnus Therning

11:43 a.m.

New subject: Parsing a file with data divided into sections

On Wed, Feb 4, 2009 at 11:10 AM, Heinrich Apfelmus wrote: [..]

...

Here's a version using span from the Prelude:

main = interact $ unlines . map show . countDays . lines

countDays [] = [] countDays (day:xs) = (day, length people) : countDays xs' where (people, xs') = span (isSpace . head) xs

Ah, yes, `span` is the function I was looking for! I would have found it if I had bothered to go to Hoogle :-( /M -- Magnus Therning (OpenPGP: 0xAB4DFBA4) magnus＠therning．org Jabber: magnus＠therning．org http://therning.org/magnus identi.ca|twitter: magthe

Patrick LeBoutillier

2:15 p.m.

New subject: Parsing a file with data divided into sections

...

Nice program, especially your use of function composition is good style.

Thanks, I must admit I put alot of time into refactoring it. There's so many different ways of doing stuff (and then simplifying them) in Haskell!

...

...
I think that you've pretty much used accumulators in the way they most often are used. In many cases you don't _have_ to keep count though. Here's one way to get the same result, but without keeping count:

countDays [] = [] countDays ls = let day = head ls count = length $ takeWhile (isSpace . head) $ tail ls in (day, count) : countDays (drop (1 + count) ls)

main = interact (unlines . map show . countDays . lines)

Here's a version using span from the Prelude:

main = interact $ unlines . map show . countDays . lines

countDays [] = [] countDays (day:xs) = (day, length people) : countDays xs' where (people, xs') = span (isSpace . head) xs

Both of these examples are great and exactly what I was looking for: a different approach to the problem. I guess the step that was missing in my "thought process" is that recursion doesn't have to imply processing the list elements 1 by 1 (1 recursive call for each element). Of course it makes perfect sense once you see it.... Thanks a lot to everyone, as usual the people on this list are always very kind and helpful. Patrick

...

Note that this file format is very simple and it's ok to use lines and isSpace to parse it. But the tool of choice are parser combinators like Text.Parsec or Text.ParserCombinators.ReadP .

-- http://apfelmus.nfshost.com

_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

-- ===================== Patrick LeBoutillier Rosemère, Québec, Canada

Brent Yorgey

2:49 p.m.

New subject: Parsing a file with data divided into sections

On Wed, Feb 04, 2009 at 12:10:02PM +0100, Heinrich Apfelmus wrote:

...

Magnus Therning wrote:

Here's a version using span from the Prelude:

main = interact $ unlines . map show . countDays . lines

countDays [] = [] countDays (day:xs) = (day, length people) : countDays xs' where (people, xs') = span (isSpace . head) xs

Note that this file format is very simple and it's ok to use lines and isSpace to parse it. But the tool of choice are parser combinators like Text.Parsec or Text.ParserCombinators.ReadP .

Let me also point out that using the 'split' package from Hackage [1], we can implement this nicely without any icky manual recursion, and without resorting to a full-blown parsing library: import Data.List.Split import Data.Char main = interact (unlines . map show . countDays . lines) countDays = map length . tail . splitWhen isDay where isDay = not . isSpace . head 'splitWhen' splits the list of lines into chunks using things that satisfy the predicate as delimiters; then we just 'map length' over those chunks to count the people for each day. The 'tail' is necessary because the split actually produces a blank list of people *before* the first day, but otherwise this is entirely straightforward. -Brent [1] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/split

Christian Maeder

5 Feb 5 Feb

9:58 a.m.

New subject: mind your head, was: Parsing a file with data divided into sections

Magnus Therning wrote:

...

...
isSection line = not . isSpace . head $ line

I'ld be more careful with "head"! Imagine your binary fails with: Prelude.head: empty list

...

countDays [] = [] countDays ls = let day = head ls count = length $ takeWhile (isSpace . head) $ tail ls in (day, count) : countDays (drop (1 + count) ls)

here too, and better use a pattern like: ls@(hd, tl) Cheers Christian

Christian Maeder

6 Feb 6 Feb

8:47 a.m.

New subject: mind your head, was: Parsing a file with data divided into sections

I wrote:

...

Magnus Therning wrote:

...
...
isSection line = not . isSpace . head $ line

I'ld be more careful with "head"! Imagine your binary fails with:

Prelude.head: empty list

...

...
countDays [] = [] countDays ls = let day = head ls count = length $ takeWhile (isSpace . head) $ tail ls in (day, count) : countDays (drop (1 + count) ls)

here too, and better use a pattern like: ls@(hd, tl)

oops, the pattern should be: ls@(hd : tl) Cheers again Christian

Magnus Therning

9:52 a.m.

New subject: mind your head, was: Parsing a file with data divided into sections

On Fri, Feb 6, 2009 at 8:47 AM, Christian Maeder wrote:

...

I wrote:

...
Magnus Therning wrote:

...
...
isSection line = not . isSpace . head $ line

I'ld be more careful with "head"! Imagine your binary fails with:

Prelude.head: empty list

see also http://haskell.org/haskellwiki/Haskell_programming_tips#Partial_functions_li...

...
...
countDays [] = [] countDays ls = let day = head ls count = length $ takeWhile (isSpace . head) $ tail ls in (day, count) : countDays (drop (1 + count) ls)

here too, and better use a pattern like: ls@(hd, tl)

oops, the pattern should be: ls@(hd : tl)

Another course of action would be to add exception handling; if head throws an exception then the input file isn't following the accepted syntax. /M -- Magnus Therning (OpenPGP: 0xAB4DFBA4) magnus＠therning．org Jabber: magnus＠therning．org http://therning.org/magnus identi.ca|twitter: magthe

Christian Maeder

10:30 a.m.

New subject: mind your head, was: Parsing a file with data divided into sections

Magnus Therning wrote: [...]

...

...
...
...
...
isSection line = not . isSpace . head $ line

In this particular case I would use: isSection = null . takeWhile isSpace

...

...
...
Prelude.head: empty list see also http://haskell.org/haskellwiki/Haskell_programming_tips#Partial_functions_li...

Another course of action would be to add exception handling; if head throws an exception then the input file isn't following the accepted syntax.

I'ld rather check all lines before (or filter non empty ones) than trying to catch Prelude.head: empty list! C.

Magnus Therning

10:40 a.m.

New subject: mind your head, was: Parsing a file with data divided into sections

On Fri, Feb 6, 2009 at 10:30 AM, Christian Maeder wrote: [..]

...

...
Another course of action would be to add exception handling; if head throws an exception then the input file isn't following the accepted syntax.

I'ld rather check all lines before (or filter non empty ones) than trying to catch Prelude.head: empty list!

Yes, that's how I modified my own "solution" once I tracked down what sort of input it would fail on: main = interact (unlines . map show . countDays . dropWhile null . lines) /M -- Magnus Therning (OpenPGP: 0xAB4DFBA4) magnus＠therning．org Jabber: magnus＠therning．org http://therning.org/magnus identi.ca|twitter: magthe

6003

Age (days ago)

6005

Last active (days ago)

List overview

Download

10 comments

5 participants

participants (5)

Brent Yorgey
Christian Maeder
Heinrich Apfelmus
Magnus Therning
Patrick LeBoutillier