Re: [Haskell-cafe] Doubting Haskell

(I'm copying the list on this, since my reply contains a tutorial
which might be of use to other beginners.)
On 19/02/2008, Alan Carter
Hi Cale,
On Feb 19, 2008 3:48 PM, Cale Gibbard
wrote: Just checking up, since you haven't replied on the list. Was my information useful? Did I miss any questions you might have had? If you'd like, I posted some examples of using catch here:
Thanks for your enquiry! My experiment continues. I did put a progress report on the list - your examples together with a similar long an short pair got me over the file opening problem, and taught me some things about active whitespace :-) I couldn't get withFile working (says out of scope, maybe 'cos I'm ghc 6.6 on my Mac)
Make sure to put: import System.IO at the top of your source file, if you haven't been. This should import everything documented here: http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html
but it turned out the line I was looking for (collapsed from the examples) was:
text <- readFile "data.txt" `catch` \_ -> return ""
This ensures the program never loses control, crashing or becoming unpredictable by attempting to use an invalid resource, by yielding an empty String if for any reason the file read fails. Then an empty String makes it very quickly through parsing. I guess that's quite "functiony" :-)
Amazing how easy once I knew how. Even stranger that I couldn't find a "bread and butter" example of it.
Then I was going very quickly for a while. My file is dumped from a WordPress MySql table. Well formed lines have 4 tab separated fields (I'm using pipes for tabs here):
line id | record id | property | value
Line IDs are unique and don't matter. All lines with the same record ID give a value to a property in the same record, similar to this:
1|1|name|arthur 2|1|quest|seek holy grail 3|1|colour|blue 4|2|name|robin 5|2|quest|run away 6|2|colour|yellow
Organizing that was a joy. It took minutes:
let cutUp = tail (filter (\fields -> (length fields) == 4) (map (\x -> split x '\t') (lines text))) This should almost certainly be a function of text: cutUp text = tail (filter (\fields -> (length fields) == 4) (map (\x -> split x '\t') (lines text)))
I found a split on someone's blog (looking for a library tokenizer), but I can understand it just fine. I even get to chuck out ill-formed lines and remove the very first (which contains MySql column names) on the way through!
Sadly, there's no general library function for doing this. We have words and lines (and words would work here, if your fields never have spaces), but nobody's bothered to put anything more general for simple splitting into the base libraries (though I'm sure there's plenty on hackage -- MissingH has a Data.String.Utils module which contains split and a bunch of others, for example). However, for anything more complicated, there are also libraries like Parsec, which are generally really effective, so I highly recommend looking at that at some point.
I then made a record to put things in, and wrote some lines to play with it (these are the real property names):
data Entry = Entry { occupation :: String , iEnjoyMyJob :: Int , myJobIsWellDefined :: Int , myCoworkersAreCooperative :: Int , myWorkplaceIsStressful :: Int , myJobIsStressful :: Int , moraleIsGoodWhereIWork :: Int , iGetFrustratedAtWork :: Int } ... let e = Entry{occupation = "", iEnjoyMyJob = 0} let f = e {occupation = "alan"} let g = f {iEnjoyMyJob = 47} putStrLn ((occupation g) ++ " " ++ (show (iEnjoyMyJob g)))
Then I ran into another quagmire. I think I have to use Data.Map to build a collection of records keyed by record id, and fill them in by working through the list of 4 item lists called cutUp. As with the file opening problem I can find a few examples that convert a list of tuples to a Data.Map, one to one. I found a very complex example that convinced me a map from Int to a record is possible, but gave me no understanding of how to do it. I spent a while trying to use foldl before I decided it can't be appropriate (I need to pass more values). So I tried a couple of recursive functions, something like:
type Entries = M.Map Int Entry ... let entries = loadEntries cutUp ... loadEntries :: [[String]] -> Entries loadEntries [] = M.empty Entries loadEntries [x : xs] = loadEntry (loadEntries xs) x -- Possible common beginner error here: [x:xs] means the list with one element which is a list whose first element is x and whose tail is xs. Your type signature and the type of cutUp seems to confirm that this is the right type, but you don't seem to have a case to handle a longer list of lists. If you want just a list with first entry x, and with tail xs, that's just (x:xs). If you want to handle lists of lists recursively, you'll generally need two cases: ([]:xss) and ((x:xs):xss). We'll end up doing something different instead of recursion in a moment.
loadEntry entries _ rid fld val = entries
Trying to create an empty map at the bottom of the recursion so later I can try to fiddle about checking if the key is present and crating a new record otherwise, then updating the record with a changed one (a big case would be needed deep in to do each property update). If I'm on the right track it's not good enough to get better, so now I'm just throwing bits of forest animals into the pot at random again :-(
So I certainly would be grateful for a clue! The bits I can do (I got a non-trivial wxHaskell frame sorted out quite easily, the tokenizing and record bit were OK) I think show I'm not *totally* stupid at this, I'm putting loads of time investment in (it's an experiement in itself) but there do seem to be certain specific things that would be ubiquitous patterns in any production or scripting environment, which are not discussed at all and far from obvious. The more I see of Haskell the more I suspect this issue is the gating one for popular uptake.
I couldn't help thinking of this bit, from the Wikipedia entry on the Cocteau Twins:
"The band's seventh LP, Four-Calendar Café, was released in late 1993. It was a departure from the heavily-processed, complex and layered sounds of Blue Bell Knoll and Heaven or Las Vegas, featuring clearer and more minimalistic arrangements. This, along with the record's unusually comprehensible lyrics, led to mixed reviews for the album: Some critics accused the group of selling out and producing an 'accessible album,' while others praised the new direction as a felicitous development worthy of comparison with Heaven or Las Vegas."
Best wishes,
Alan
I woke up rather early, and haven't much to do, so I'll turn this into a tutorial. :) Okay. The most common ways to build a map are by using the fromList, fromListWith, or fromListWithKey functions. You can see them in the documentation here: http://www.haskell.org/ghc/docs/latest/html/libraries/containers/Data-Map.ht... The types are: fromList :: (Ord k) => [(k,a)] -> Map k a fromListWith :: (Ord k) => (a -> a -> a) -> [(k,a)] -> Map k a fromListWithKey :: (Ord k) => (k -> a -> a -> a) -> [(k,a)] -> Map k a They take a list of (key,value) pairs, and build a map from it. Additionally, the fromListWith function takes a function which specifies how the values should be combined if their keys collide. There is also a fromListWithKey function which allows the means of combination to depend on the key as well. At this point we realise something interesting about the way the data is being represented: if there is a field in someone's record with no row in the database, what should the resulting field contain? In C, they often use some integer which is out of range, like -1 for this. How about for a missing occupation field? Well, that's a String, you could use some generic failure string, or an empty string, but I'll show you another possibility that just might be convenient. If t is any type, then the type (Maybe t) consists of the values Nothing, and Just x, whenever x is a value of type t. This is another convenient way to represent the idea that a computation might fail. Let's start by changing your record type so that each field is a Maybe value, that is, either the value Nothing, or the value Just x, where x is the value it would have been. data Entry = Entry { occupation :: Maybe String , iEnjoyMyJob :: Maybe Int , myJobIsWellDefined :: Maybe Int , myCoworkersAreCooperative :: Maybe Int , myWorkplaceIsStressful :: Maybe Int , myJobIsStressful :: Maybe Int , moraleIsGoodWhereIWork :: Maybe Int , iGetFrustratedAtWork :: Maybe Int } There's a very general function in the module Control.Monad which I'd like to use just for the Maybe type here. It's called mplus, and for Maybe, it works like this: mplus (Just x) _ = Just x mplus Nothing y = y So if the first parameter isn't Nothing, that's what you get, otherwise, you get the second parameter. Of course, this operation has an identity element which is Nothing. So this lets you combine partial information expressed by Maybe types, in a left-biased way. It's about to become obvious that record types are less convenient than perhaps they could be in Haskell, and this is absolutely true -- I'd actually probably use a somewhat different representation myself (possibly something involving a Map from Strings (field names) to Int values), but I can't really be sure what you intend with this data, and how much type safety you want. I'll elide the field names just because I can here. It's not necessarily good style. combine :: Entry -> Entry -> Entry combine (Entry a1 a2 a3 a4 a5 a6 a7 a8) (Entry b1 b2 b3 b4 b5 b6 b7 b8) = Entry (a1 `mplus` b1) (a2 `mplus` b2) (a3 `mplus` b3) (a4 `mplus` b4) (a5 `mplus` b5) (a6 `mplus` b6) (a7 `mplus` b7) (a8 `mplus` b8) Even with all the shorthand, this is pretty ugly (and I'll show how I'd represent the data in a moment), but what this does is to combine two partial entries, favouring the information in the first, but filling the holes in the first with data from the second. This operation has an identity element, which is: emptyEntry = Entry Nothing Nothing Nothing Nothing Nothing Nothing Nothing Nothing Let's try a different representation, which is a little more flexible, but expresses less in the type system. data Entry = Entry { occupation :: Maybe String, survey :: M.Map String Int } deriving (Eq, Ord, Show) So now, instead of a bunch of separate Maybe Int fields, we have just one Map from String to Int. If we don't have information for a field, we simply won't have that key in the Map. Of course, this means we'll have to use strings for field labels. If that seems unhappy, you could always define a type like: data SurveyQuestion = IEnjoyMyJob | MyJobIsWellDefined | MyCoworkersAreCooperative | MyWorkplaceIsStressful | MyJobIsStressful | MoraleIsGoodWhereIWork | IGetFrustratedAtWork deriving (Eq, Ord, Show) to be used in place of the String type. Let's see how combine will look now: combine :: Entry -> Entry -> Entry combine (Entry o1 s1) (Entry o2 s2) = Entry (o1 `mplus` o2) (s1 `M.union` s2) Or, using the record syntax more: combine :: Entry -> Entry -> Entry combine e1 e2 = Entry { occupation = (occupation e1 `mplus` occupation e2), survey = (survey e1 `M.union` survey e2) } Again, this new version has an identity with respect to combine, which is: emptyEntry = Entry {occupation = Nothing, survey = (M.empty)} Now, we just need a way to take one of your rows, and turn it into a (key,value) pair, where the value is a partial entry. readRow :: [String] -> (Int, Entry) readRow [n, k, "occupation", v] = (read k, emptyEntry { occupation = Just v }) readRow [n, k, f, v] = (read k, emptyEntry { survey = M.singleton f (read v) }) readRow xs = error "failure case, should never happen!" There is actually a failure case that I'm not handling here, which is what happens when the value or key fails to parse as an Int. For that we'd use reads instead of read, but let's ignore it for now. We can then map this function over our cut up rows, something along the lines of: map readRow (cutUp text) at which point we'll have a list of (Int, Entry) pairs. We then want to fill up our Entries Map with those, and we want to combine them as we go using the combine function: entryMap text = M.fromListWith combine (map readRow (cutUp text)) Some final changes we could consider would be putting more of the error handling into readRow itself: if it was to return a singleton Map rather than an (Int, Entry) pair, then it could return the empty Map on failure, and the results would then be combined using the function M.unionsWith combine. We could move the length 4 test out of cutUp then, and just make it the fall-through case in readRow. I'll also use reads, which returns a list of (parse,rest-of-string) pairs, to handle the failure cases where the numbers don't parse, by just treating those rows as nonexistent: readRow :: [String] -> M.Map Int Entry readRow [n, k, f, v] = case reads k of [] -> M.empty -- the key didn't parse (k',_):_ -> if f == "occupation" then M.singleton k' (emptyEntry { occupation = Just v }) else case reads v of [] -> M.empty -- the value didn't parse (v',_):_ -> M.singleton k' (emptyEntry { survey = M.singleton f v' }) readRow xs = M.empty -- this handles the case when the list is any length but 4 cutUp text = tail (map (\x -> split x '\t') (lines text)) -- which allows cutUp to be simpler entryMap text = M.unionsWith combine (map readRow (cutUp text)) Anyway, I hope this tutorial gives some idea of how things progress, and what sort of thinking is generally involved. Note that the focus here was more on finding the right combining operations, and then using pre-existing higher-order functions to collapse the structure, than it was on recursion. Indeed, the final program there doesn't contain any explicit recursion at all! This is generally something to aim for. Using explicit recursion is generally a means of last resort. Higher order functions on data structures are our control structures. - Cale

Cale,
On Feb 20, 2008 10:58 AM, Cale Gibbard
(I'm copying the list on this, since my reply contains a tutorial which might be of use to other beginners.)
Thank you so much for this - I've just started playing with it so few intelligent responses yet. I'm sure it will be of *huge* use to others, right in the middle of the "gap" I fell into. The experiment continues - I'll be back :-) Many thanks, Alan -- ... the PA system was moaning unctuously, like a lady hippopotamus reading A. E. Housman ..." -- James Blish, "They Shall Have Stars"

Cale Gibbard wrote:
I woke up rather early, and haven't much to do, so I'll turn this into a tutorial. :)
Cale, this is fantastic, as always. I often find myself searching for material like this when introducing people to Haskell. Would you be willing to put this on the wiki? Thanks, Yitz

Many thanks for the explanations when I was first experimenting with Haskell. I managed to finish translating a C++ wxWidgets program into Haskell wxHaskell, and am certainly impressed. I've written up some reflections on my newbie experience together with both versions, which might be helpful to people interested in popularizing Haskell, at: http://the-programmers-stone.com/2008/03/04/a-first-haskell-experience/ Regards, Alan -- ... the PA system was moaning unctuously, like a lady hippopotamus reading A. E. Housman ..." -- James Blish, "They Shall Have Stars"

Alan Carter wrote:
I've written up some reflections on my newbie experience together with both versions, which might be helpful to people interested in popularizing Haskell, at:
http://the-programmers-stone.com/2008/03/04/a-first-haskell-experience/
Thank you for writing this. On the lack of simple examples showing, for example, file IO: I seem to recall a Perl book (maybe it was Edition 1 of the Camel Book) which had lots of very short programs each illustrating one typical job. Also the Wiki does have some pages of "worked example" programs. But I agree, we could do better. I'm surprised you found the significant whitespace difficult. Yes, the formal rules are a bit arcane, but I just read them as "does the Right Thing", and it generally works for me. I didn't know about the significance of comments, but then I've never written an outdented comment. I had a look through your code, and although I admit I haven't done the work, I'm sure that there would be ways of factoring out all the commonality and thereby reducing the length. Finally, thanks for that little story about the BBC B. I had one of those, and I always wondered about that heatsink, and the stonking big resistor next to it. They looked out of scale with the rest of the board. Paul.

Paul Johnson
I'm surprised you found the significant whitespace difficult.
I wonder if this has something to do with the editor one uses? I use Emacs, and just keep hitting TAB, cycling through possible alignments, until things align sensibly. I haven't really tried, but I can imagine lining things up manually would be more painful, especially if mixing tabs and spaces. -k -- If I haven't seen further, it is by standing in the footprints of giants

On Tue, Mar 4, 2008 at 4:16 AM, Ketil Malde
Paul Johnson
writes: I'm surprised you found the significant whitespace difficult.
I wonder if this has something to do with the editor one uses? I use Emacs, and just keep hitting TAB, cycling through possible alignments, until things align sensibly. I haven't really tried, but I can imagine lining things up manually would be more painful, especially if mixing tabs and spaces.
Especially if mixing tabs and spaces indeed. Haskell does the Python thing of assuming that a tab is 8 spaces, which IMO is a mistake. The sensible thing to do if you have a whitespace-sensitive language that accepts both spaces in tabs is to make them incomparable to each other; i.e. main = do <sp><sp>putStrLn $ "Hello" <sp><sp><tab>++ "World" -- compiles fine main = do <sp><sp>putStrLn $ "Hello" <tab>++ "World" -- error, can't tell how indented '++ "World"' is... Luke

On 04/03/2008, Luke Palmer
On Tue, Mar 4, 2008 at 4:16 AM, Ketil Malde
wrote: Paul Johnson
writes: I'm surprised you found the significant whitespace difficult.
I wonder if this has something to do with the editor one uses? I use Emacs, and just keep hitting TAB, cycling through possible alignments, until things align sensibly. I haven't really tried, but I can imagine lining things up manually would be more painful, especially if mixing tabs and spaces.
Especially if mixing tabs and spaces indeed. Haskell does the Python thing of assuming that a tab is 8 spaces, which IMO is a mistake. The sensible thing to do if you have a whitespace-sensitive language that accepts both spaces in tabs is to make them incomparable to each other; i.e. <snip>
I honestly think that tab characters occurring anywhere but in a comment should be considered a lexical error and rejected by the compiler outright. More problems are caused by trying to continue with only tabs, or some mixture of tabs and spaces than just getting one's editor to expand tabs automatically. - Cale

Especially if mixing tabs and spaces indeed. Haskell does the Python thing of assuming that a tab is 8 spaces, which IMO is a mistake. The
FWIW, most people in python land think the same thing, and the -t flag makes mixed tabs and spaces a warning or error. At the least, -Wall could report mixed usage. At the most, make it an error.

On 04/03/2008, Alan Carter
http://the-programmers-stone.com/2008/03/04/a-first-haskell-experience/
That was an interesting read. Thanks for posting it. I also liked the tale of the BBC ULA - it reminded me of a demo I saw once at an Acorn show, where they had a RISC PC on show, with a (IBM) PC card in it. They were demonstrating how hot the PC chip runs compared to the ARM RISC chip by using it to make toast. I dread to think what you could do with one of today's monsters :-) Paul.

About the line length needed for Haskell programs, there was a discussion
about this some time ago, that could be regarded as a tutorial for
reducing indentation:
http://haskell.org/pipermail/haskell-cafe/2007-July/028787.html
As for the idle core you mention: I keep one core fully occupied with a
program that searches for a cure against cancer, see:
http://www.computeagainstcancer.org/
The example you gave for the use of "map" can be simplified:
map func (take (10 [0..])) -- should actually be: map func (take 10
[0..])
->
map func [0..9]
Regards,
Henk-Jan van Tuyl
--
http://functor.bamikanarie.com
http://Van.Tuyl.eu/
--
On Tue, 04 Mar 2008 07:29:24 +0100, Alan Carter
Many thanks for the explanations when I was first experimenting with Haskell. I managed to finish translating a C++ wxWidgets program into Haskell wxHaskell, and am certainly impressed.
I've written up some reflections on my newbie experience together with both versions, which might be helpful to people interested in popularizing Haskell, at:
http://the-programmers-stone.com/2008/03/04/a-first-haskell-experience/
Regards,
Alan
-- -- Met vriendelijke groet, Henk-Jan van Tuyl -- http://functor.bamikanarie.com http://Van.Tuyl.eu/ --

2008/3/4, Alan Carter
I've written up some reflections on my newbie experience together with both versions, which might be helpful to people interested in popularizing Haskell, at:
http://the-programmers-stone.com/2008/03/04/a-first-haskell-experience/
This is truly interesting, any learning experience is enlightening, we truly do need to lower this barrier of admittance of which you speak. On another subject, there are still point in your code that could be clearer or done with less cruft : maxOfHistogram stats = snd (foldl (\(cA, vA) (cB, vB) -> if (vA > vB) then (cA, vA) else (cB, vB)) (0, 0) stats) can become : maxofHistogram stats = foldl' max 0 (map snd stats) ("foldl' max 0" could be replaced by "maximum" but there wouldn't be a default 0 anymore) more importantly, you can replace this kind of code : vA <- varCreate [] vB <- varCreate [] -- ... vL <- varCreate [] vM <- varCreate [] vN <- varCreate [] vO <- varCreate [] by : [vA, vB, vC, vD, vE, vF, vG, vH, vI, vJ, vK, vL, vM, vN, vO] <- replicateM 15 (varCreate []) (true also for the "dA <- textEntry statusFrame [text := "0", alignment := AlignRight]" sequence) I'm not sure that functions like getdTotal couldn't be improved, I wonder if a small Map for the elements of d wouldn't make the code much better and offer other opportunities for abstractions. As it is, enumeration like : [[label "Total Entries", widget (getdTotal d)] ,[label "Valid Entries", widget (getdValid d)] -- ... ,[label "MDMA", widget (getdMdma d)] ,[label "Caffeine", widget (getdCaffeine d)]] could be slightly reduced by : let bindLabelAndWidget (lbl,getter) = [label lbl, widget (getter d)] in map bindLabelAndWidget [("Total Entries", getdTotal), ("Valid Entries", getdValid) ,(...)] And little thing like : mapM_ (\f -> do repaint f) knownFrames becoming : mapM_ repaint knownFrames I also do concur that a flag or a warning to signal mixed tabulations and space would be a _very_ good idea ! -- Jedaï

chaddai.fouche:
2008/3/4, Alan Carter
: I've written up some reflections on my newbie experience together with both versions, which might be helpful to people interested in popularizing Haskell, at:
http://the-programmers-stone.com/2008/03/04/a-first-haskell-experience/
I also do concur that a flag or a warning to signal mixed tabulations and space would be a _very_ good idea !
Such a flag already exists: -fwarn-tabs As in: $ ghc -fwarn-tabs A.hs -no-recomp A.hs:3:0: Tab character -- Don

Thanks for an interesting write-up. And not bad for a first Haskell
program. :)
There's still a number of things you could do to limit the boiler plate
code, though.
On Tue, Mar 4, 2008 at 6:29 AM, Alan Carter
Many thanks for the explanations when I was first experimenting with Haskell. I managed to finish translating a C++ wxWidgets program into Haskell wxHaskell, and am certainly impressed.
I've written up some reflections on my newbie experience together with both versions, which might be helpful to people interested in popularizing Haskell, at:
http://the-programmers-stone.com/2008/03/04/a-first-haskell-experience/
Regards,
Alan
-- ... the PA system was moaning unctuously, like a lady hippopotamus reading A. E. Housman ..." -- James Blish, "They Shall Have Stars" _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Concerning the Haskell program that does some statistics and displays some graphs, I must say that if that were the task I had to solve I would not use either C++ or Haskell, but R, the open source S lookalike. The best way to be productive as a programmer is to not write code if you can steal it. R looks like an imperative language, but it is "value-oriented" in the same way that SETL is, so is by some criteria a functional language of sorts.

A quick note here. This is a *really* excellent tutorial on a variety of subjects. It shows how monad operators can be used responsibly (to clarify code, not obfuscate it), it shows how chosing a good data structure and a good algorithm can work wonders for your code, and on a simplistic level, it shows how to build a database in Haskell. Would it be possible to clean this up and put it in the wiki somewhere? Thanks Bob On 20 Feb 2008, at 09:58, Cale Gibbard wrote:
(I'm copying the list on this, since my reply contains a tutorial which might be of use to other beginners.)
On 19/02/2008, Alan Carter
wrote: Hi Cale,
On Feb 19, 2008 3:48 PM, Cale Gibbard
wrote: Just checking up, since you haven't replied on the list. Was my information useful? Did I miss any questions you might have had? If you'd like, I posted some examples of using catch here:
Thanks for your enquiry! My experiment continues. I did put a progress report on the list - your examples together with a similar long an short pair got me over the file opening problem, and taught me some things about active whitespace :-) I couldn't get withFile working (says out of scope, maybe 'cos I'm ghc 6.6 on my Mac)
Make sure to put:
import System.IO
at the top of your source file, if you haven't been. This should import everything documented here: http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html
but it turned out the line I was looking for (collapsed from the examples) was:
text <- readFile "data.txt" `catch` \_ -> return ""
This ensures the program never loses control, crashing or becoming unpredictable by attempting to use an invalid resource, by yielding an empty String if for any reason the file read fails. Then an empty String makes it very quickly through parsing. I guess that's quite "functiony" :-)
Amazing how easy once I knew how. Even stranger that I couldn't find a "bread and butter" example of it.
Then I was going very quickly for a while. My file is dumped from a WordPress MySql table. Well formed lines have 4 tab separated fields (I'm using pipes for tabs here):
line id | record id | property | value
Line IDs are unique and don't matter. All lines with the same record ID give a value to a property in the same record, similar to this:
1|1|name|arthur 2|1|quest|seek holy grail 3|1|colour|blue 4|2|name|robin 5|2|quest|run away 6|2|colour|yellow
Organizing that was a joy. It took minutes:
let cutUp = tail (filter (\fields -> (length fields) == 4) (map (\x -> split x '\t') (lines text)))
This should almost certainly be a function of text:
cutUp text = tail (filter (\fields -> (length fields) == 4) (map (\x -> split x '\t') (lines text)))
I found a split on someone's blog (looking for a library tokenizer), but I can understand it just fine. I even get to chuck out ill-formed lines and remove the very first (which contains MySql column names) on the way through!
Sadly, there's no general library function for doing this. We have words and lines (and words would work here, if your fields never have spaces), but nobody's bothered to put anything more general for simple splitting into the base libraries (though I'm sure there's plenty on hackage -- MissingH has a Data.String.Utils module which contains split and a bunch of others, for example). However, for anything more complicated, there are also libraries like Parsec, which are generally really effective, so I highly recommend looking at that at some point.
I then made a record to put things in, and wrote some lines to play with it (these are the real property names):
data Entry = Entry { occupation :: String , iEnjoyMyJob :: Int , myJobIsWellDefined :: Int , myCoworkersAreCooperative :: Int , myWorkplaceIsStressful :: Int , myJobIsStressful :: Int , moraleIsGoodWhereIWork :: Int , iGetFrustratedAtWork :: Int } ... let e = Entry{occupation = "", iEnjoyMyJob = 0} let f = e {occupation = "alan"} let g = f {iEnjoyMyJob = 47} putStrLn ((occupation g) ++ " " ++ (show (iEnjoyMyJob g)))
Then I ran into another quagmire. I think I have to use Data.Map to build a collection of records keyed by record id, and fill them in by working through the list of 4 item lists called cutUp. As with the file opening problem I can find a few examples that convert a list of tuples to a Data.Map, one to one. I found a very complex example that convinced me a map from Int to a record is possible, but gave me no understanding of how to do it. I spent a while trying to use foldl before I decided it can't be appropriate (I need to pass more values). So I tried a couple of recursive functions, something like:
type Entries = M.Map Int Entry ... let entries = loadEntries cutUp ... loadEntries :: [[String]] -> Entries loadEntries [] = M.empty Entries loadEntries [x : xs] = loadEntry (loadEntries xs) x -- Possible common beginner error here: [x:xs] means the list with one element which is a list whose first element is x and whose tail is xs. Your type signature and the type of cutUp seems to confirm that this is the right type, but you don't seem to have a case to handle a longer list of lists. If you want just a list with first entry x, and with tail xs, that's just (x:xs). If you want to handle lists of lists recursively, you'll generally need two cases: ([]:xss) and ((x:xs):xss). We'll end up doing something different instead of recursion in a moment.
loadEntry entries _ rid fld val = entries
Trying to create an empty map at the bottom of the recursion so later I can try to fiddle about checking if the key is present and crating a new record otherwise, then updating the record with a changed one (a big case would be needed deep in to do each property update). If I'm on the right track it's not good enough to get better, so now I'm just throwing bits of forest animals into the pot at random again :-(
So I certainly would be grateful for a clue! The bits I can do (I got a non-trivial wxHaskell frame sorted out quite easily, the tokenizing and record bit were OK) I think show I'm not *totally* stupid at this, I'm putting loads of time investment in (it's an experiement in itself) but there do seem to be certain specific things that would be ubiquitous patterns in any production or scripting environment, which are not discussed at all and far from obvious. The more I see of Haskell the more I suspect this issue is the gating one for popular uptake.
I couldn't help thinking of this bit, from the Wikipedia entry on the Cocteau Twins:
"The band's seventh LP, Four-Calendar Café, was released in late 1993. It was a departure from the heavily-processed, complex and layered sounds of Blue Bell Knoll and Heaven or Las Vegas, featuring clearer and more minimalistic arrangements. This, along with the record's unusually comprehensible lyrics, led to mixed reviews for the album: Some critics accused the group of selling out and producing an 'accessible album,' while others praised the new direction as a felicitous development worthy of comparison with Heaven or Las Vegas."
Best wishes,
Alan
I woke up rather early, and haven't much to do, so I'll turn this into a tutorial. :)
Okay. The most common ways to build a map are by using the fromList, fromListWith, or fromListWithKey functions. You can see them in the documentation here:
http://www.haskell.org/ghc/docs/latest/html/libraries/containers/Data-Map.ht...
The types are:
fromList :: (Ord k) => [(k,a)] -> Map k a
fromListWith :: (Ord k) => (a -> a -> a) -> [(k,a)] -> Map k a
fromListWithKey :: (Ord k) => (k -> a -> a -> a) -> [(k,a)] -> Map k a
They take a list of (key,value) pairs, and build a map from it. Additionally, the fromListWith function takes a function which specifies how the values should be combined if their keys collide. There is also a fromListWithKey function which allows the means of combination to depend on the key as well.
At this point we realise something interesting about the way the data is being represented: if there is a field in someone's record with no row in the database, what should the resulting field contain? In C, they often use some integer which is out of range, like -1 for this.
How about for a missing occupation field? Well, that's a String, you could use some generic failure string, or an empty string, but I'll show you another possibility that just might be convenient.
If t is any type, then the type (Maybe t) consists of the values Nothing, and Just x, whenever x is a value of type t. This is another convenient way to represent the idea that a computation might fail.
Let's start by changing your record type so that each field is a Maybe value, that is, either the value Nothing, or the value Just x, where x is the value it would have been.
data Entry = Entry { occupation :: Maybe String , iEnjoyMyJob :: Maybe Int , myJobIsWellDefined :: Maybe Int , myCoworkersAreCooperative :: Maybe Int , myWorkplaceIsStressful :: Maybe Int , myJobIsStressful :: Maybe Int , moraleIsGoodWhereIWork :: Maybe Int , iGetFrustratedAtWork :: Maybe Int }
There's a very general function in the module Control.Monad which I'd like to use just for the Maybe type here. It's called mplus, and for Maybe, it works like this:
mplus (Just x) _ = Just x mplus Nothing y = y
So if the first parameter isn't Nothing, that's what you get, otherwise, you get the second parameter. Of course, this operation has an identity element which is Nothing.
So this lets you combine partial information expressed by Maybe types, in a left-biased way.
It's about to become obvious that record types are less convenient than perhaps they could be in Haskell, and this is absolutely true -- I'd actually probably use a somewhat different representation myself (possibly something involving a Map from Strings (field names) to Int values), but I can't really be sure what you intend with this data, and how much type safety you want.
I'll elide the field names just because I can here. It's not necessarily good style.
combine :: Entry -> Entry -> Entry combine (Entry a1 a2 a3 a4 a5 a6 a7 a8) (Entry b1 b2 b3 b4 b5 b6 b7 b8) = Entry (a1 `mplus` b1) (a2 `mplus` b2) (a3 `mplus` b3) (a4 `mplus` b4) (a5 `mplus` b5) (a6 `mplus` b6) (a7 `mplus` b7) (a8 `mplus` b8)
Even with all the shorthand, this is pretty ugly (and I'll show how I'd represent the data in a moment), but what this does is to combine two partial entries, favouring the information in the first, but filling the holes in the first with data from the second. This operation has an identity element, which is:
emptyEntry = Entry Nothing Nothing Nothing Nothing Nothing Nothing Nothing Nothing
Let's try a different representation, which is a little more flexible, but expresses less in the type system.
data Entry = Entry { occupation :: Maybe String, survey :: M.Map String Int } deriving (Eq, Ord, Show)
So now, instead of a bunch of separate Maybe Int fields, we have just one Map from String to Int. If we don't have information for a field, we simply won't have that key in the Map. Of course, this means we'll have to use strings for field labels. If that seems unhappy, you could always define a type like:
data SurveyQuestion = IEnjoyMyJob | MyJobIsWellDefined | MyCoworkersAreCooperative | MyWorkplaceIsStressful | MyJobIsStressful | MoraleIsGoodWhereIWork | IGetFrustratedAtWork deriving (Eq, Ord, Show)
to be used in place of the String type.
Let's see how combine will look now:
combine :: Entry -> Entry -> Entry combine (Entry o1 s1) (Entry o2 s2) = Entry (o1 `mplus` o2) (s1 `M.union` s2)
Or, using the record syntax more:
combine :: Entry -> Entry -> Entry combine e1 e2 = Entry { occupation = (occupation e1 `mplus` occupation e2), survey = (survey e1 `M.union` survey e2) }
Again, this new version has an identity with respect to combine, which is:
emptyEntry = Entry {occupation = Nothing, survey = (M.empty)}
Now, we just need a way to take one of your rows, and turn it into a (key,value) pair, where the value is a partial entry.
readRow :: [String] -> (Int, Entry) readRow [n, k, "occupation", v] = (read k, emptyEntry { occupation = Just v }) readRow [n, k, f, v] = (read k, emptyEntry { survey = M.singleton f (read v) }) readRow xs = error "failure case, should never happen!"
There is actually a failure case that I'm not handling here, which is what happens when the value or key fails to parse as an Int. For that we'd use reads instead of read, but let's ignore it for now.
We can then map this function over our cut up rows, something along the lines of:
map readRow (cutUp text)
at which point we'll have a list of (Int, Entry) pairs.
We then want to fill up our Entries Map with those, and we want to combine them as we go using the combine function:
entryMap text = M.fromListWith combine (map readRow (cutUp text))
Some final changes we could consider would be putting more of the error handling into readRow itself: if it was to return a singleton Map rather than an (Int, Entry) pair, then it could return the empty Map on failure, and the results would then be combined using the function M.unionsWith combine. We could move the length 4 test out of cutUp then, and just make it the fall-through case in readRow. I'll also use reads, which returns a list of (parse,rest-of-string) pairs, to handle the failure cases where the numbers don't parse, by just treating those rows as nonexistent:
readRow :: [String] -> M.Map Int Entry readRow [n, k, f, v] = case reads k of [] -> M.empty -- the key didn't parse (k',_):_ -> if f == "occupation" then M.singleton k' (emptyEntry { occupation = Just v }) else case reads v of [] -> M.empty -- the value didn't parse (v',_):_ -> M.singleton k' (emptyEntry { survey = M.singleton f v' }) readRow xs = M.empty -- this handles the case when the list is any length but 4
cutUp text = tail (map (\x -> split x '\t') (lines text)) -- which allows cutUp to be simpler
entryMap text = M.unionsWith combine (map readRow (cutUp text))
Anyway, I hope this tutorial gives some idea of how things progress, and what sort of thinking is generally involved. Note that the focus here was more on finding the right combining operations, and then using pre-existing higher-order functions to collapse the structure, than it was on recursion.
participants (14)
-
Alan Carter
-
Cale Gibbard
-
Chaddaï Fouché
-
Don Stewart
-
Evan Laforge
-
hjgtuyl@chello.nl
-
Ketil Malde
-
Lennart Augustsson
-
Luke Palmer
-
Paul Johnson
-
Paul Moore
-
Richard A. O'Keefe
-
Thomas Davie
-
Yitzchak Gale