
Hi Patrick!

Thanks very much - that's great! Many thanks also to Roland and Thomas - your solutions are great too!

Although I'm still new to Haskell, I love its power and elegance. It does have a bit of a learning curve, mainly because it's *so* powerful that it's a bit like trying to fly a jumbo jet..... ;)

Only one more question. If I wanted to do a crosstab (say, with ethnicity down the left-hand side, and gender across the top), how could that be done? In other words, the output would look like this -

              F     M
NZ European   xxx   xxx
NZ Maori      xxx   xxx

- where the x's are the totals for each category: (NZ European, F), (NZ European, M), (NZ Maori, F), (NZ Maori, M).

I think it would involve zipWith and Array, but beyond that, I find it hard to think this through in two dimensions.... :)

Crosstab code would be *great* to play around with! Thanks again - bye for now -
- Andy

Patrick LeBoutillier wrote:
Andy,
I came up with this solution that works like you described:
import Data.List.Split

mysplit = wordsBy (==',')

toPairs :: [String] -> [(String, [String])]
toPairs (header:rows) = foldr f (initPairs header) $ splitRows rows
  where f row acc = zipWith (\f (h,r) -> (h,f:r)) row acc
        initPairs header = map (\h -> (h, [])) $ mysplit header
        splitRows rows = map (mysplit) rows

summarizeByWith :: String -> (Int -> Int -> Int) -> [(String, [String])] -> (String, Int)
summarizeByWith var agg pairs =
  case (lookup var pairs) of
    Just vals -> (var, foldl agg 0 $ map (read) vals)
    otherwise -> ("", 0)

main = interact (show . summarizeByWith "Books" (+) . toPairs . lines)
However, in my opinion a solution like the one proposed by Roland is preferable, since it can process the input line by line instead of storing it all in memory. It also seems simpler and probably more efficient.
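To illustrate the line-by-line idea, here is a minimal sketch. It is not Roland's actual code (which is not quoted in this message); the names splitCommas and sumColumn are made up for the example, and it assumes plain comma-separated fields with no quoting and a numeric column that `read` can parse.

```haskell
import Data.List (elemIndex, foldl')

-- Split a line on commas. Quoted or escaped commas are NOT handled.
splitCommas :: String -> [String]
splitCommas s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitCommas rest

-- Sum one named column, folding over the data rows one at a time.
-- Returns Nothing if the input is empty or the column name is not
-- in the header. Assumes every row has at least as many fields as
-- the header; a shorter row makes (!!) fail.
sumColumn :: String -> [String] -> Maybe Int
sumColumn _ [] = Nothing
sumColumn var (header : rows) = do
  i <- elemIndex var (splitCommas header)
  return (foldl' (\acc row -> acc + read (splitCommas row !! i)) 0 rows)
```

It could be wired up in the same way as the program above, e.g. main = interact (show . sumColumn "Books" . lines). Because foldl' keeps only a running total, the rows do not all need to be held in memory at once.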
However, it was interesting hacking at your algorithm, because it made me realize how you can use lists of pairs (association lists) in Haskell where you might have used hash tables in another language.
Cheers,
Patrick
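For the crosstab asked about at the top of this thread, one possible sketch follows. It uses Data.Map keyed on (row, column) pairs rather than zipWith and Array, which is a swapped-in technique and not something proposed in the thread. The names splitCommas, crosstab and render are hypothetical, and the code assumes unquoted comma-separated fields and a value column that `read` can parse as an Int.

```haskell
import Data.List (elemIndex, intercalate, nub, sort)
import qualified Data.Map as Map

-- Split a line on commas. Quoted or escaped commas are NOT handled.
splitCommas :: String -> [String]
splitCommas s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitCommas rest

-- Total valVar for every (rowVar, colVar) combination in the data.
-- Returns Nothing if the input is empty or a column name is unknown.
crosstab :: String -> String -> String -> [String]
         -> Maybe (Map.Map (String, String) Int)
crosstab _ _ _ [] = Nothing
crosstab rowVar colVar valVar (header : rows) = do
  r <- elemIndex rowVar names
  c <- elemIndex colVar names
  v <- elemIndex valVar names
  return (Map.fromListWith (+) [cell r c v (splitCommas line) | line <- rows])
  where
    names = splitCommas header
    -- One ((row label, column label), value) entry per data row.
    cell r c v fs = ((fs !! r, fs !! c), read (fs !! v))

-- Lay the totals out as tab-separated lines, column labels first.
-- Combinations that never occur in the data are shown as 0.
render :: Map.Map (String, String) Int -> [String]
render table = intercalate "\t" ("" : cols) : map line rowKeys
  where
    rowKeys = nub (map fst (Map.keys table))
    cols    = nub (sort (map snd (Map.keys table)))
    line rk = intercalate "\t"
      (rk : [show (Map.findWithDefault 0 (rk, c) table) | c <- cols])
```

With the sample file from the original post, crosstab "Ethnicity" "Gender" "Books" gives 27 and 20 for NZ European (F, M) and 38 and 9 for NZ Maori (F, M). A main along the lines of interact (maybe "" (unlines . render) . crosstab "Ethnicity" "Gender" "Books" . lines) would print the table.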
On Tue, Mar 10, 2009 at 4:33 AM, Andy Elvey wrote:

Hi all -

In the process of learning Haskell I'm wanting to do some simple data summarization. (Btw, I'm looking at putting any submitted code for this in the "cookbook" section of the Haskell wiki. Imo it would be very useful there as a "next step" up from just reading in a file and printing it out.)

This would involve reading in a delimited file like this (just a contrived example of how many books some people own) -
Name,Gender,Age,Ethnicity,Books
Mary,F,14,NZ European, 11
Brian,M,13,NZ European, 6
Josh,M,12,NZ European, 14
Regan,M,14,NZ Maori, 9
Helen,F,15,NZ Maori, 17
Anna,F,14,NZ European, 16
Jess,F,14,NZ Maori, 21
.... and doing some operations on it. As you can see, the file has column headings - I prefer to be able to manipulate data with headings (as it is what I do a lot of at work, using another programming language).
I've tried to break the problem down into small parts as follows.

a) Read the file into a list of pairs. The first element of the pair would be the column heading. The second will be a list containing the data. For example, ("Name", [Mary, Brian, Josh, Regan, ..... ] )
b) Select a numeric variable to summarise ("Books" in this example).
c) Do a fold to summarize the variable. I think a left-fold would be the one to use here, but I may be wrong....
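Steps a) to c) can be sketched roughly as below. This is only one way to do it, using transpose instead of the foldr/zipWith approach in Patrick's reply; the names splitCommas, toColumns and summarizeColumn are invented for the example. It assumes every row has the same number of fields as the header, and that the chosen column parses with `read`.

```haskell
import Data.List (foldl', transpose)

-- Split a line on commas. Quoted or escaped commas are NOT handled.
splitCommas :: String -> [String]
splitCommas s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitCommas rest

-- a) Pair each column heading with the list of values in that column.
--    transpose turns the list of rows into a list of columns.
--    With a header but no data rows this returns [], since there
--    are no columns to pair the headings with.
toColumns :: [String] -> [(String, [String])]
toColumns [] = []
toColumns (header : rows) =
  zip (splitCommas header) (transpose (map splitCommas rows))

-- b) Look up the chosen variable, then c) left-fold over its values.
--    Returns Nothing if the variable is not among the headings.
summarizeColumn :: String -> [(String, [String])] -> Maybe Int
summarizeColumn var cols = do
  vals <- lookup var cols
  return (foldl' (+) 0 (map read vals))
```

A strict left fold (foldl') is a reasonable choice for step c), since the running total is a plain number and nothing is gained by building it lazily.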
After looking through previous postings on this list, I found some code which is somewhat similar to what I'm after (although the data it was crunching is very different). This is what I've come up with so far -
summarize [] = []
summarize ls = let byvariable = head ls
                   numeric_variable = last ls
                   sum = foldl (+) 0 $ numeric_variable
               in (byvariable, sum) : sum ls

main = interact (unlines . map show . summarize . lines)

I think this might be a useful start, but I still need to read the data into a list of pairs as mentioned, and I'm unsure as to how to do that.

Many thanks in advance for any help received. As mentioned, I'm sure that examples like this could be very useful to other beginners, so I'm keen to make sure that any help given is made maximum use of (by putting any code on the Haskell wiki).
- Andy
_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners