
On Sunday, 5 June 2011, 20:46, Sean Charles wrote:
I have a CSV file containing some data about trains: stations from, to, times etc. and I wanted to 'learn some more Haskell' and, to my astonishment, I have gotten thus far but I am not sure *why* it works or *how* I got there! LMAO Here is the relevant code ...
====>
trains :: String -> IO ()
trains csvfile = do
  legodata <- parseCSVFromFile csvfile
  case legodata of
    Left error -> print error
    Right legodata -> mapM_ putStrLn (trainCodes legodata)

-- Assumes row 1 contains cell header information
-- Note: the train-code is always the third cell
trainCodes :: [Record] -> [String]
trainCodes = nub . map (!! 2) . tail
That'll bomb of course on malformed input, but that's probably okay in this scenario.
====>
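On the "bomb on malformed input" remark: both tail (on an empty list) and (!! 2) (on a row with fewer than three cells) are partial. If that ever matters, one possible total variant is sketched below. The names thirdCell and trainCodesSafe are made up for this sketch, and Field and Record are redefined locally with the same shape as in Text.CSV so that the snippet compiles without the csv package.

```haskell
import Data.List (nub)
import Data.Maybe (mapMaybe)

-- Same shape as the Text.CSV types, redefined here so the sketch is
-- self-contained (assumption: you would import them from Text.CSV instead).
type Field = String
type Record = [Field]

-- Total replacement for (!! 2): Nothing when the row has fewer than three cells.
thirdCell :: Record -> Maybe Field
thirdCell (_ : _ : c : _) = Just c
thirdCell _               = Nothing

-- Like trainCodes, but short rows are skipped and an empty input yields [].
-- drop 1 removes the header row without failing on an empty list.
trainCodesSafe :: [Record] -> [String]
trainCodesSafe = nub . mapMaybe thirdCell . drop 1
```

Whether silently skipping short rows is the right policy depends on the data; reporting them may be preferable.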
I was chuffed with writing the trainCodes as a point-free function, that sort of thing is getting a little easier to work with but I still have real head-banging frustrations sometimes with seemingly simple things, like looping and just printing stuff out, despite having taught myself LISP six years ago and Erlang in recent years! I quit!! I really do!!!
My confusion arises over:
mapM_ putStrLn (trainCodes legodata)
Given that:
mapM_ :: Monad m => (a -> m b) -> [a] -> m ()
Here's how I read it: For any m that is a Monad, mapM_ takes a function that "takes an 'a' and returns it wrapped in a monad",
Not it, but a value based on it (the mapM_'ed function has type (a -> m b)). "A value wrapped in a monad" is kind of a skewed picture; it doesn't really do justice to State or Cont, for example.

mapM_ is the composition of

    map :: (a -> c) -> [a] -> [c]

restricted to types c = m b for some Monad m (that part produces a list [m b]), and

    sequence_ :: (Monad m) => [m b] -> m ()

sequence_ "runs" all the actions in the list and discards their results. If you want to collect the results, there's

    sequence :: (Monad m) => [m b] -> m [b]

and the composition of sequence and map,

    mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]
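Written out as code, the two compositions look roughly like this. The primed names are only there to avoid clashing with the Prelude's own mapM_ and mapM; the real library definitions may differ in detail, so take this as a sketch of the idea.

```haskell
-- map builds the list of actions, sequence_ runs them and drops the results.
mapM_' :: Monad m => (a -> m b) -> [a] -> m ()
mapM_' f = sequence_ . map f

-- Same idea, but sequence keeps the results in a list.
mapM' :: Monad m => (a -> m b) -> [a] -> m [b]
mapM' f = sequence . map f
```

With this definition, mapM_' putStrLn ["a", "b"] first builds the list [putStrLn "a", putStrLn "b"] of type [IO ()], and sequence_ then runs the two actions in order.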
a "list of a's" and returns a "monad containing 'unit'", the empty-list LISP-() undefined voidy thing.
Given that:
putStrLn :: String -> IO ()
this means that 'a' is String and 'm b' is IO () and my list of [a] is the result of calling 'trainCodes legodata'.
Right.
trainCodes = nub . map (!! 2) . tail
legodata is [Record] (from Text.CSV) and so, 'tail' removes the header row from the data, 'map (!! 2)' extracts the third field from each row and finally 'nub' removes the duplicates. Thus the return type from trainCodes is [String].
Yup.
Gluing it all together:
mapM_ :: Monad m => (a -> m b) -> [a] -> m ()
putStrLn :: String -> IO ()
trainCodes :: [Record] -> [String]
the type of my call then would seem to be:
String -> IO () -> [String] -> IO ()
Missing parentheses, it's (String -> IO ()) -> [String] -> IO ()
"putStrLn" -> (trainCodes legodata) -> IO ()
which means that not only have I got the types correct for the call but the result type of 'IO ()' also satisfies the type return for my function and hence it executes from 'main' (where it is called from) with no issues.
So, am I finally beginning to get a grip on it all ?
Looks quite so.
This list is a constant source of education and I don't post very often as you guys give me far too much stuff to be reading all the time! :)
I am using Text.CSV to read my file and all I wanted to do was to output a list of unique codes from column three of the spreadsheet data, one per line, so that I can use this Haskell as part of a bigger 'bash' script.
And that's what your code does :)
Any detailed explanations that might help me better understand my solution would be welcome; right now I feel I 'just got lucky' although there must be a glimmer of understanding somewhere! LOL
ToDos:
1. parse file to get a list of rows -- parseCSVFromFile
2. remove header row -- tail
3. extract the field(s) of interest -- (!! 2) for one, map (!! 2) for the list
4. remove duplicates -- nub
5. output -- mapM_ putStrLn

1. is delegated to a library function; how that works need not concern us at the moment.
2. should be clear.
3. also clear.
4. library; you need not care how it does what it does (unless performance becomes an issue: nub is O(n^2). If that's too slow, you have to use faster variants that exploit the fact that in your case you have more than the Eq constraint, which is all nub can work with. An Ord constraint gives easy O(n*log n) implementations [using Data.Set, for example]; in a few special cases O(n) is possible).
5. putStrLn is clear; for mapM_ see above.

1. and 5. involve IO (reading a file and printing to stdout, respectively); 2., 3. and 4. operate only on data, so those steps can be combined into a pipeline like you did.
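For step 4, one way such an Ord-based variant could look is sketched here with Data.Set from the containers package. The name ordNub is just a label for this sketch; like nub, it keeps the first occurrence of each element in the original order.

```haskell
import qualified Data.Set as Set

-- Order-preserving duplicate removal in O(n * log n).
-- Needs Ord, whereas nub only needs Eq.
ordNub :: Ord a => [a] -> [a]
ordNub = go Set.empty
  where
    go _ [] = []
    go seen (x : xs)
      | x `Set.member` seen = go seen xs                    -- already emitted, skip
      | otherwise           = x : go (Set.insert x seen) xs -- first occurrence, keep
```

It would slot into the pipeline in place of nub, as in ordNub . map (!! 2) . tail. For a handful of train codes the difference is unlikely to be noticeable.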
Thanks, Sean.
PS: Phew!