New subject: Doubting Haskell

20 Feb 2008

      (I'm copying the list on this, since my reply contains a tutorial
which might be of use to other beginners.)

On 19/02/2008, Alan Carter  wrote:
...
Hi Cale,
On Feb 19, 2008 3:48 PM, Cale Gibbard  wrote:
...
Just checking up, since you haven't replied on the list. Was my
information useful? Did I miss any questions you might have had? If
you'd like, I posted some examples of using catch here:
Thanks for your enquiry! My experiment continues. I did put a progress
report on the list - your examples together with a similar long an
short pair got me over the file opening problem, and taught me some
things about active whitespace :-) I couldn't get withFile working
(says out of scope, maybe 'cos I'm ghc 6.6 on my Mac)
Make sure to put:

import System.IO

at the top of your source file, if you haven't been. This should
import everything documented here:
http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html
...
but it turned out the line I was looking for (collapsed from the examples)
was:
text <- readFile "data.txt" `catch` \_ -> return ""
This ensures the program never loses control, crashing or becoming
unpredictable by attempting to use an invalid resource, by yielding an
empty String if for any reason the file read fails. Then an empty
String makes it very quickly through parsing. I guess that's quite
"functiony" :-)
Amazing how easy once I knew how. Even stranger that I couldn't find a
"bread and butter" example of it.
Then I was going very quickly for a while. My file is dumped from a
WordPress MySql table. Well formed lines have 4 tab separated fields
(I'm using pipes for tabs here):
line id | record id | property | value
Line IDs are unique and don't matter. All lines with the same record
ID give a value to a property in the same record, similar to this:
1|1|name|arthur
2|1|quest|seek holy grail
3|1|colour|blue
4|2|name|robin
5|2|quest|run away
6|2|colour|yellow
Organizing that was a joy. It took minutes:
let cutUp = tail (filter (\fields -> (length fields) == 4)
                                      (map (\x -> split x '\t') (lines text)))

This should almost certainly be a function of text:

cutUp text = tail (filter (\fields -> (length fields) == 4)
                                 (map (\x -> split x '\t') (lines text)))
...
I found a split on someone's blog (looking for a library tokenizer),
but I can understand it just fine. I even get to chuck out ill-formed
lines and remove the very first (which contains MySql column names) on
the way through!
Sadly, there's no general library function for doing this. We have
words and lines (and words would work here, if your fields never have
spaces), but nobody's bothered to put anything more general for simple
splitting into the base libraries (though I'm sure there's plenty on
hackage -- MissingH has a Data.String.Utils module which contains
split and a bunch of others, for example). However, for anything more
complicated, there are also libraries like Parsec, which are generally
really effective, so I highly recommend looking at that at some point.
...
I then made a record to put things in, and wrote some lines to play
with it (these are the real property names):
data Entry = Entry
  { occupation                :: String
  , iEnjoyMyJob               :: Int
  , myJobIsWellDefined        :: Int
  , myCoworkersAreCooperative :: Int
  , myWorkplaceIsStressful    :: Int
  , myJobIsStressful          :: Int
  , moraleIsGoodWhereIWork    :: Int
  , iGetFrustratedAtWork      :: Int
  }
...
  let e = Entry{occupation = "", iEnjoyMyJob = 0}
  let f = e {occupation = "alan"}
  let g = f {iEnjoyMyJob = 47}
  putStrLn ((occupation g) ++ " " ++ (show (iEnjoyMyJob g)))
Then I ran into another quagmire. I think I have to use Data.Map to
build a collection of records keyed by record id, and fill them in by
working through the list of 4 item lists called cutUp. As with the
file opening problem I can find a few examples that convert a list of
tuples to a Data.Map, one to one. I found a very complex example that
convinced me a map from Int to a record is possible, but gave me no
understanding of how to do it. I spent a while trying to use foldl
before I decided it can't be appropriate (I need to pass more values).
So I tried a couple of recursive functions, something like:
type Entries = M.Map Int Entry
...
  let entries = loadEntries cutUp
...
loadEntries :: [[String]] -> Entries
loadEntries [] = M.empty Entries
loadEntries [x : xs] = loadEntry (loadEntries xs) x
-- Possible common beginner error here: [x:xs] means the list with one
element which is a list whose first element is x and whose tail is xs.
Your type signature and the type of cutUp seems to confirm that this
is the right type, but you don't seem to have a case to handle a
longer list of lists. If you want just a list with first entry x, and
with tail xs, that's just (x:xs). If you want to handle lists of lists
recursively, you'll generally need two cases: ([]:xss) and
((x:xs):xss). We'll end up doing something different instead of
recursion in a moment.
loadEntry entries _ rid fld val = entries
Trying to create an empty map at the bottom of the recursion so later
I can try to fiddle about checking if the key is present and crating a
new record otherwise, then updating the record with a changed one (a
big case would be needed deep in to do each property update). If I'm
on the right track it's not good enough to get better, so now I'm just
throwing bits of forest animals into the pot at random again :-(
So I certainly would be grateful for a clue! The bits I can do (I got
a non-trivial wxHaskell frame sorted out quite easily, the tokenizing
and record bit were OK) I think show I'm not *totally* stupid at this,
I'm putting loads of time investment in (it's an experiement in
itself) but there do seem to be certain specific things that would be
ubiquitous patterns in any production or scripting environment, which
are not discussed at all and far from obvious. The more I see of
Haskell the more I suspect this issue is the gating one for popular
uptake.
I couldn't help thinking of this bit, from the Wikipedia entry on the
Cocteau Twins:
"The band's seventh LP, Four-Calendar Café, was released in late 1993.
It was a departure from the heavily-processed, complex and layered
sounds of Blue Bell Knoll and Heaven or Las Vegas, featuring clearer
and more minimalistic arrangements. This, along with the record's
unusually comprehensible lyrics, led to mixed reviews for the album:
Some critics accused the group of selling out and producing an
'accessible album,' while others praised the new direction as a
felicitous development worthy of comparison with Heaven or Las Vegas."
Best wishes,
Alan
I woke up rather early, and haven't much to do, so I'll turn this into
a tutorial. :)

Okay. The most common ways to build a map are by using the fromList,
fromListWith, or fromListWithKey functions. You can see them in the
documentation here:

http://www.haskell.org/ghc/docs/latest/html/libraries/containers/Data-Map.ht...

The types are:

fromList :: (Ord k) => [(k,a)] -> Map k a

fromListWith :: (Ord k) => (a -> a -> a) -> [(k,a)] -> Map k a

fromListWithKey :: (Ord k) => (k -> a -> a -> a) -> [(k,a)] -> Map k a

They take a list of (key,value) pairs, and build a map from it.
Additionally, the fromListWith function takes a function which
specifies how the values should be combined if their keys collide.
There is also a fromListWithKey function which allows the means of
combination to depend on the key as well.

At this point we realise something interesting about the way the data
is being represented: if there is a field in someone's record with no
row in the database, what should the resulting field contain? In C,
they often use some integer which is out of range, like -1 for this.

How about for a missing occupation field? Well, that's a String, you
could use some generic failure string, or an empty string, but I'll
show you another possibility that just might be convenient.

If t is any type, then the type (Maybe t) consists of the values
Nothing, and Just x, whenever x is a value of type t. This is another
convenient way to represent the idea that a computation might fail.

Let's start by changing your record type so that each field is a Maybe
value, that is, either the value Nothing, or the value Just x, where x
is the value it would have been.

data Entry = Entry
  { occupation                :: Maybe String
  , iEnjoyMyJob               :: Maybe Int
  , myJobIsWellDefined        :: Maybe Int
  , myCoworkersAreCooperative :: Maybe Int
  , myWorkplaceIsStressful    :: Maybe Int
  , myJobIsStressful          :: Maybe Int
  , moraleIsGoodWhereIWork    :: Maybe Int
  , iGetFrustratedAtWork      :: Maybe Int
  }

There's a very general function in the module Control.Monad which I'd
like to use just for the Maybe type here. It's called mplus, and for
Maybe, it works like this:

mplus (Just x) _ = Just x
mplus Nothing  y = y

So if the first parameter isn't Nothing, that's what you get,
otherwise, you get the second parameter. Of course, this operation has
an identity element which is Nothing.

So this lets you combine partial information expressed by Maybe types,
in a left-biased way.

It's about to become obvious that record types are less convenient
than perhaps they could be in Haskell, and this is absolutely true --
I'd actually probably use a somewhat different representation myself
(possibly something involving a Map from Strings (field names) to Int
values), but I can't really be sure what you intend with this data,
and how much type safety you want.

I'll elide the field names just because I can here. It's not
necessarily good style.

combine :: Entry -> Entry -> Entry
combine (Entry a1 a2 a3 a4 a5 a6 a7 a8) (Entry b1 b2 b3 b4 b5 b6 b7 b8)
    = Entry (a1 `mplus` b1) (a2 `mplus` b2) (a3 `mplus` b3) (a4 `mplus` b4)
            (a5 `mplus` b5) (a6 `mplus` b6) (a7 `mplus` b7) (a8 `mplus` b8)

Even with all the shorthand, this is pretty ugly (and I'll show how
I'd represent the data in a moment), but what this does is to combine
two partial entries, favouring the information in the
first, but filling the holes in the first with data from the second.
This operation has an identity element, which is:

emptyEntry = Entry Nothing Nothing Nothing Nothing Nothing Nothing
Nothing Nothing

Let's try a different representation, which is a little more flexible,
but expresses less in the type system.

data Entry = Entry { occupation :: Maybe String, survey :: M.Map String Int }
   deriving (Eq, Ord, Show)

So now, instead of a bunch of separate Maybe Int fields, we have just
one Map from String to Int. If we don't have information for a field,
we simply won't have that key in the Map. Of course, this means we'll
have to use strings for field labels. If that seems unhappy, you could
always define a type like:

data SurveyQuestion = IEnjoyMyJob
                    | MyJobIsWellDefined
                    | MyCoworkersAreCooperative
                    | MyWorkplaceIsStressful
                    | MyJobIsStressful
                    | MoraleIsGoodWhereIWork
                    | IGetFrustratedAtWork
    deriving (Eq, Ord, Show)

to be used in place of the String type.

Let's see how combine will look now:

combine :: Entry -> Entry -> Entry
combine (Entry o1 s1) (Entry o2 s2) = Entry (o1 `mplus` o2) (s1 `M.union` s2)

Or, using the record syntax more:

combine :: Entry -> Entry -> Entry
combine e1 e2 = Entry { occupation = (occupation e1 `mplus` occupation e2),
                        survey = (survey e1 `M.union` survey e2) }

Again, this new version has an identity with respect to combine, which is:

emptyEntry = Entry {occupation = Nothing, survey = (M.empty)}

Now, we just need a way to take one of your rows, and turn it into a
(key,value) pair, where the value is a partial entry.

readRow :: [String] -> (Int, Entry)
readRow [n, k, "occupation", v] = (read k, emptyEntry { occupation = Just v })
readRow [n, k, f, v] = (read k, emptyEntry { survey = M.singleton f (read v) })
readRow xs = error "failure case, should never happen!"

There is actually a failure case that I'm not handling here, which is
what happens when the value or key fails to parse as an Int. For that
we'd use reads instead of read, but let's ignore it for now.

We can then map this function over our cut up rows, something along
the lines of:

map readRow (cutUp text)

at which point we'll have a list of (Int, Entry) pairs.

We then want to fill up our Entries Map with those, and we want to
combine them as we go using the combine function:

entryMap text = M.fromListWith combine (map readRow (cutUp text))

Some final changes we could consider would be putting more of the
error handling into readRow itself: if it was to return a singleton
Map rather than an (Int, Entry) pair, then it could return the empty
Map on failure, and the results would then be combined using the
function M.unionsWith combine. We could move the length 4 test out of
cutUp then, and just make it the fall-through case in readRow. I'll
also use reads, which returns a list of (parse,rest-of-string) pairs,
to handle the failure cases where the numbers don't parse, by just
treating those rows as nonexistent:

readRow :: [String] -> M.Map Int Entry
readRow [n, k, f, v] =
   case reads k of
      []       -> M.empty -- the key didn't parse
      (k',_):_ ->
          if f == "occupation"
             then M.singleton k' (emptyEntry { occupation = Just v })
             else case reads v of
                    []       -> M.empty -- the value didn't parse
                    (v',_):_ -> M.singleton k'
                                    (emptyEntry { survey = M.singleton f v' })
readRow xs = M.empty -- this handles the case when the list is any length but 4

cutUp text = tail (map (\x -> split x '\t') (lines text)) -- which
allows cutUp to be simpler

entryMap text = M.unionsWith combine (map readRow (cutUp text))

Anyway, I hope this tutorial gives some idea of how things progress,
and what sort of thinking is generally involved. Note that the focus
here was more on finding the right combining operations, and then
using pre-existing higher-order functions to collapse the structure,
than it was on recursion. Indeed, the final program there doesn't
contain any explicit recursion at all! This is generally something to
aim for. Using explicit recursion is generally a means of last resort.
Higher order functions on data structures are our control structures.

 - Cale

Re: [Haskell-cafe] Doubting Haskell

Cale Gibbard

Alan Carter

Yitzchak Gale

Alan Carter

Paul Johnson

Ketil Malde

Luke Palmer

Cale Gibbard

Evan Laforge

Paul Moore

hjgtuyl＠chello.nl

Chaddaï Fouché

Don Stewart

Lennart Augustsson

Richard A. O'Keefe

Thomas Davie

tags

participants (14)