Re: [Haskell-beginners] Simple data summarization

11 Mar 2009

      Hi Patrick! 

Thanks very much - that's great!  
Many thanks also to Roland and Thomas - your solutions are great too! 

Although I'm still new to Haskell, I love its power and elegance.  It 
does have a bit of a learning curve, mainly because its *so* powerful 
that its a bit like trying to fly a jumbo jet..... ;)  

Only one more question. If I wanted to do a crosstab (say, with 
ethnicity down the left-hand side, and gender across the top), how could 
that be done?  
In other words, the output would look like this -

                         F      M 
NZ European    xxx   xxx
NZ Maori          xxx   xxx 

- where the x's are the totals for each category (NZ European F), (NZ 
European M), (NZ Maori, F), NZ Maori, M). 
I think it would involve zipWith and Array, but beyond that, I find it 
hard to think this through in two dimensions.... :) 
Crosstab code would be *great* to play around with!

Thanks again - bye for now -
 - Andy

Patrick LeBoutillier wrote:
...
Andy,
I came up with this solution that works like you described:
import Data.List.Split
mysplit = wordsBy (==',')
toPairs :: [String] -> [(String, [String])]
toPairs (header:rows) = foldr f (initPairs header) $ splitRows rows
    where f row acc = zipWith (\f (h,r) -> (h,f:r)) row acc
          initPairs header = map (\h -> (h, [])) $ mysplit header
          splitRows rows = map (mysplit) rows
summarizeByWith :: String -> (Int -> Int -> Int) -> [(String,
[String])] -> (String, Int)
summarizeByWith var agg pairs = case (lookup var pairs) of
    Just vals -> (var, foldl agg 0 $ map (read) vals)
    otherwise -> ("", 0)
main = interact (show . summarizeByWith "Books" (+) . toPairs . lines)
However in my opinion a solution like that proposed by Roland is
preferable since it can process the input line by line instead of
storing it all in memory. It seems also simpler and propably more
efficient.
However it was interesting hacking at your algorithm because it made
me realize how you can use lists of pairs (association lists) in
haskell where you might have used hash tables in another language.
Cheers,
Patrick
On Tue, Mar 10, 2009 at 4:33 AM, Andy Elvey  wrote:
...
Hi all -
In the process of learning Haskell I'm wanting to do some simple data
summarization.
( Btw, I'm looking at putting any submitted code for this in the "cookbook"
section of
the Haskell wiki.  Imo it would be very useful there as a "next step" up
from just reading
in a file and printing it out.  )
This would involve reading in a delimited file like this - ( just a
contrived example of how many books
some people own ) -
Name,Gender,Age,Ethnicity,Books
Mary,F,14,NZ European, 11
Brian,M,13,NZ European, 6
Josh,M,12,NZ European, 14
Regan,M,14,NZ Maori, 9
Helen,F,15,NZ Maori, 17
Anna,F,14,NZ European, 16
Jess,F,14,NZ Maori, 21
.... and doing some operations on it. As you can see, the file has column
headings - I prefer to be able to manipulate data with
headings (as it is what I do a lot of at work, using another programming
language).
I've tried to break the problem down into small parts as follows. a) Read
the file into a list of pairs.
The first element of the pair would be the column heading.
The second will be a list containing the data.
For example, ("Name",  [Mary,  Brian,  Josh,  Regan, ..... ]  )
b) Select a numeric variable to summarise ( "Books" in this example) c) Do a
fold to summarize the variable. I think a left-fold would be the one to use
here, but I may
be wrong....
After looking through previous postings on this list, I found some code
which is somewhat similar to what I'm after (although the data it was
crunching is very different).  This is what I've come up with so far -
summarize [] = []
summarize ls = let
      byvariable = head ls
      numeric_variable = last ls
      sum = foldl (+) 0 $ numeric_variable
in (byvariable, sum) : sum ls
main = interact (unlines . map show . summarize . lines)
I think this might be a useful start, but I still need to read the data into
a list of pairs as mentioned, and I'm unsure as to how to
do that.
Many thanks in advance for any help received.  As mentioned, I'm sure that
examples like this could be very useful to other beginners, so I'm keen to
make sure that any help given is made maximum use of (by putting any code on
the Haskell wiki). - Andy
_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners