Parsing a file with data divided into sections

Hi all, I wanted to parse a file that looks like this: MONDAY JOHN JIM LINDA TUESDAY BILL BOB WEDNESDAY THURSDAY SAM TODD LARRY LUKE FRIDAY TED in order to count the number of people for each day. After a (very) long time and a lot of trial and error, I managed to do it with this program: import Char (isSpace) main = interact (unlines . countSections . lines) where countSections = map (show) . snd . foldr compileSections (0, []) compileSections line (n, acc) = if isSection line then (0, (line, n) : acc) else (n + 1, acc) isSection line = not . isSpace . head $ line which outputs: ("MONDAY",3) ("TUESDAY",2) ("WEDNESDAY",0) ("THURSDAY",4) ("FRIDAY",1) I had quite a hard time figuring out how to keep count of the number of records in each sections. Is there a more obvious way to handle these types of problems? Are there some builtins that could of made it easier? In Perl I would probably have used a hash and a variable to keep count of the current day, incrementing the hash value for each person until I got to the next day, but it's not obvious to me how to transpose this technique to functional programming. Thanks a lot, Patrick -- ===================== Patrick LeBoutillier Rosemère, Québec, Canada

On Wed, Feb 4, 2009 at 1:58 AM, Patrick LeBoutillier
Hi all,
I wanted to parse a file that looks like this:
MONDAY JOHN JIM LINDA TUESDAY BILL BOB WEDNESDAY THURSDAY SAM TODD LARRY LUKE FRIDAY TED
in order to count the number of people for each day. After a (very) long time and a lot of trial and error, I managed to do it with this program:
import Char (isSpace)
main = interact (unlines . countSections . lines) where countSections = map (show) . snd . foldr compileSections (0, []) compileSections line (n, acc) = if isSection line then (0, (line, n) : acc) else (n + 1, acc) isSection line = not . isSpace . head $ line
which outputs:
("MONDAY",3) ("TUESDAY",2) ("WEDNESDAY",0) ("THURSDAY",4) ("FRIDAY",1)
I had quite a hard time figuring out how to keep count of the number of records in each sections. Is there a more obvious way to handle these types of problems? Are there some builtins that could of made it easier?
I think that you've pretty much used accumulators in the way they most often are used. In many cases you don't _have_ to keep count though. Here's one way to get the same result, but without keeping count: countDays [] = [] countDays ls = let day = head ls count = length $ takeWhile (isSpace . head) $ tail ls in (day, count) : countDays (drop (1 + count) ls) main = interact (unlines . map show . countDays . lines)
In Perl I would probably have used a hash and a variable to keep count of the current day, incrementing the hash value for each person until I got to the next day, but it's not obvious to me how to transpose this technique to functional programming.
Often transposing from imperative to functional isn't what you want to do. One of the arguments for learning both imperative and functional languages is that they approach problems differently, resulting in different solutions. I suspect you will find Haskell, and indeed any functional language, difficult to use if you try to "think imperative". It takes time to learn new ways to think about problems, but in the end you'll never look at things the same way again :-) /M -- Magnus Therning (OpenPGP: 0xAB4DFBA4) magnus@therning.org Jabber: magnus@therning.org http://therning.org/magnus identi.ca|twitter: magthe

Magnus Therning wrote:
Patrick LeBoutillier wrote:
I wanted to parse a file that looks like this:
MONDAY JOHN JIM LINDA TUESDAY BILL BOB WEDNESDAY THURSDAY SAM TODD LARRY LUKE FRIDAY TED
in order to count the number of people for each day. After a (very) long time and a lot of trial and error, I managed to do it with this program:
Nice program, especially your use of function composition is good style.
I think that you've pretty much used accumulators in the way they most often are used. In many cases you don't _have_ to keep count though. Here's one way to get the same result, but without keeping count:
countDays [] = [] countDays ls = let day = head ls count = length $ takeWhile (isSpace . head) $ tail ls in (day, count) : countDays (drop (1 + count) ls)
main = interact (unlines . map show . countDays . lines)
Here's a version using span from the Prelude: main = interact $ unlines . map show . countDays . lines countDays [] = [] countDays (day:xs) = (day, length people) : countDays xs' where (people, xs') = span (isSpace . head) xs Note that this file format is very simple and it's ok to use lines and isSpace to parse it. But the tool of choice are parser combinators like Text.Parsec or Text.ParserCombinators.ReadP . -- http://apfelmus.nfshost.com

On Wed, Feb 4, 2009 at 11:10 AM, Heinrich Apfelmus
Here's a version using span from the Prelude:
main = interact $ unlines . map show . countDays . lines
countDays [] = [] countDays (day:xs) = (day, length people) : countDays xs' where (people, xs') = span (isSpace . head) xs
Ah, yes, `span` is the function I was looking for! I would have found it if I had bothered to go to Hoogle :-( /M -- Magnus Therning (OpenPGP: 0xAB4DFBA4) magnus@therning.org Jabber: magnus@therning.org http://therning.org/magnus identi.ca|twitter: magthe

Nice program, especially your use of function composition is good style.
Thanks, I must admit I put alot of time into refactoring it. There's so many different ways of doing stuff (and then simplifying them) in Haskell!
I think that you've pretty much used accumulators in the way they most often are used. In many cases you don't _have_ to keep count though. Here's one way to get the same result, but without keeping count:
countDays [] = [] countDays ls = let day = head ls count = length $ takeWhile (isSpace . head) $ tail ls in (day, count) : countDays (drop (1 + count) ls)
main = interact (unlines . map show . countDays . lines)
Here's a version using span from the Prelude:
main = interact $ unlines . map show . countDays . lines
countDays [] = [] countDays (day:xs) = (day, length people) : countDays xs' where (people, xs') = span (isSpace . head) xs
Both of these examples are great and exactly what I was looking for: a different approach to the problem. I guess the step that was missing in my "thought process" is that recursion doesn't have to imply processing the list elements 1 by 1 (1 recursive call for each element). Of course it makes perfect sense once you see it.... Thanks a lot to everyone, as usual the people on this list are always very kind and helpful. Patrick
Note that this file format is very simple and it's ok to use lines and isSpace to parse it. But the tool of choice are parser combinators like Text.Parsec or Text.ParserCombinators.ReadP .
-- http://apfelmus.nfshost.com
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners
-- ===================== Patrick LeBoutillier Rosemère, Québec, Canada

On Wed, Feb 04, 2009 at 12:10:02PM +0100, Heinrich Apfelmus wrote:
Magnus Therning wrote:
Here's a version using span from the Prelude:
main = interact $ unlines . map show . countDays . lines
countDays [] = [] countDays (day:xs) = (day, length people) : countDays xs' where (people, xs') = span (isSpace . head) xs
Note that this file format is very simple and it's ok to use lines and isSpace to parse it. But the tool of choice are parser combinators like Text.Parsec or Text.ParserCombinators.ReadP .
Let me also point out that using the 'split' package from Hackage [1], we can implement this nicely without any icky manual recursion, and without resorting to a full-blown parsing library: import Data.List.Split import Data.Char main = interact (unlines . map show . countDays . lines) countDays = map length . tail . splitWhen isDay where isDay = not . isSpace . head 'splitWhen' splits the list of lines into chunks using things that satisfy the predicate as delimiters; then we just 'map length' over those chunks to count the people for each day. The 'tail' is necessary because the split actually produces a blank list of people *before* the first day, but otherwise this is entirely straightforward. -Brent [1] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/split

Magnus Therning wrote:
isSection line = not . isSpace . head $ line
I'ld be more careful with "head"! Imagine your binary fails with: Prelude.head: empty list
countDays [] = [] countDays ls = let day = head ls count = length $ takeWhile (isSpace . head) $ tail ls in (day, count) : countDays (drop (1 + count) ls)
here too, and better use a pattern like: ls@(hd, tl) Cheers Christian

I wrote:
Magnus Therning wrote:
isSection line = not . isSpace . head $ line
I'ld be more careful with "head"! Imagine your binary fails with:
Prelude.head: empty list
see also http://haskell.org/haskellwiki/Haskell_programming_tips#Partial_functions_li...
countDays [] = [] countDays ls = let day = head ls count = length $ takeWhile (isSpace . head) $ tail ls in (day, count) : countDays (drop (1 + count) ls)
here too, and better use a pattern like: ls@(hd, tl)
oops, the pattern should be: ls@(hd : tl) Cheers again Christian

On Fri, Feb 6, 2009 at 8:47 AM, Christian Maeder
I wrote:
Magnus Therning wrote:
isSection line = not . isSpace . head $ line
I'ld be more careful with "head"! Imagine your binary fails with:
Prelude.head: empty list
see also http://haskell.org/haskellwiki/Haskell_programming_tips#Partial_functions_li...
countDays [] = [] countDays ls = let day = head ls count = length $ takeWhile (isSpace . head) $ tail ls in (day, count) : countDays (drop (1 + count) ls)
here too, and better use a pattern like: ls@(hd, tl)
oops, the pattern should be: ls@(hd : tl)
Another course of action would be to add exception handling; if head throws an exception then the input file isn't following the accepted syntax. /M -- Magnus Therning (OpenPGP: 0xAB4DFBA4) magnus@therning.org Jabber: magnus@therning.org http://therning.org/magnus identi.ca|twitter: magthe

Magnus Therning wrote: [...]
isSection line = not . isSpace . head $ line
In this particular case I would use: isSection = null . takeWhile isSpace
Prelude.head: empty list see also http://haskell.org/haskellwiki/Haskell_programming_tips#Partial_functions_li...
Another course of action would be to add exception handling; if head throws an exception then the input file isn't following the accepted syntax.
I'ld rather check all lines before (or filter non empty ones) than trying to catch Prelude.head: empty list! C.

On Fri, Feb 6, 2009 at 10:30 AM, Christian Maeder
Another course of action would be to add exception handling; if head throws an exception then the input file isn't following the accepted syntax.
I'ld rather check all lines before (or filter non empty ones) than trying to catch Prelude.head: empty list!
Yes, that's how I modified my own "solution" once I tracked down what sort of input it would fail on: main = interact (unlines . map show . countDays . dropWhile null . lines) /M -- Magnus Therning (OpenPGP: 0xAB4DFBA4) magnus@therning.org Jabber: magnus@therning.org http://therning.org/magnus identi.ca|twitter: magthe
participants (5)
-
Brent Yorgey
-
Christian Maeder
-
Heinrich Apfelmus
-
Magnus Therning
-
Patrick LeBoutillier