Splitting of "More on Datatypes"

Hi all, One of the ideas I had while reading the book but didn't have the guts to pull off without asking for feedback was splitting the More on Datatypes module in Intermediate Haskell. As of now, it has a first part which describes an assortment of important datatype techniques (enumerations, records and a more formal presentation of parametrized types - using Maybe as example) and then moves on to a long discussion of binary trees and how to define maps and folds for arbitrary data structures. The coupling of these two parts is very loose (the only concept of the first part needed for the data structure discussions is parametrization) and probably does not justify them being glued together to make such a long (27k) module. Another related issue is that one important omission from the Beginner's Track is an explanation of newtype, and I guess More on Datatypes would be the right place to put it. That would make the module even longer, however, and would be another reason to make the split. The only thing I am not sure of is a good name for the second "half" of More on Datatypes in case of a split. The obvious choice would be "Trees and ???" (I have no idea of a good concise term to replace the ???), although maybe something more generic like "Introduction to Data Structures" could work. Thanks, and see you, Daniel Mlot P.S.: By the way, Apfelmus has moved Type Declarations to just before Pattern Matching. Probably the right thing to do (it just didn't fit very well in Haskell Basics), but it will also require some reworking of Type Declarations (when I made that module more verbose I envisioned it as a preparation for a more general Pattern Matching discussion several modules ahead, but with the rearranging the redundancies are probably too obvious). I will give it a try soon.

Daniel Mlot wrote:
One of the ideas I had while reading the book but didn't have the guts to pull off without asking for feedback was splitting the More on Datatypes module in Intermediate Haskell. As of now, it has a first part which describes an assortment of important datatype techniques (enumerations, records and a more formal presentation of parametrized types - using Maybe as example) and then moves on to a long discussion of binary trees and how to define maps and folds for arbitrary data structures. The coupling of these two parts is very loose (the only concept of the first part needed for the data structure discussions is parametrization) and probably does not justify them being glued together to make such a long (27k) module. Another related issue is that one important omission from the Beginner's Track is an explanation of newtype, and I guess More on Datatypes would be the right place to put it. That would make the module even longer, however, and would be another reason to make the split.
The only thing I am not sure of is a good name for the second "half" of More on Datatypes in case of a split. The obvious choice would be "Trees and ???" (I have no idea of a good concise term to replace the ???), although maybe something more generic like "Introduction to Data Structures" could work.
I concur. In my opinion, the second part about generalized folds and maps does not even belong to the beginner's track; it should be moved to a chapter "Generic Programming" in the "Fun With Types" section. (Some prefer the name "Datatype algebra" for that currently missing chapter, but the standard name is "Generic Programming". By the way, it's prerequisite to the second part of the "Zippers" chapter.)
P.S.: By the way, Apfelmus has moved Type Declarations to just before Pattern Matching. Probably the right thing to do (it just didn't fit very well in Haskell Basics), but it will also require some reworking of Type Declarations (when I made that module more verbose I envisioned it as a preparation for a more general Pattern Matching discussion several modules ahead, but with the rearranging the redundancies are probably too obvious). I will give it a try soon.
Yep, I moved it because it seems silly to discuss data types but then defer the discussion of how to use them (= pattern matching) until much later. My vision for the 'Haskell Basics' section is to present a minimal subset of the Haskell syntax and language that still enables people to write pretty much any program they want. This way, readers don't have to wade through all the syntactic variety before getting to the core concepts. In particular, I propose the following subset: * guards, but no if .. then .. else * where clauses for local definitions, but no let * Lists, list comprehensions and many Prelude functions involving lists; but no pattern matching. (Not sure about omitting pattern matching on lists yet, but the idea is to encourage "whole-meal" programming with lists.) * Standard primitive types Int, Double, Char, String, (,), -> and polymorphism. * Type synonyms to encourage abstraction; but no user-defined data types, tuples will have to serve. * Type classes need to be mentioned because of error messages due to the overloading of numeric literals. Also, readers need the show function. * Very simple IO, do syntax: reading and writing files, interacting on the terminal Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

On 05/10/2010 10:51 AM, Heinrich Apfelmus wrote:
I concur. In my opinion, the second part about generalized folds and maps does not even belong to the beginner's track; it should be moved to a chapter "Generic Programming" in the "Fun With Types" section. (Some prefer the name "Datatype algebra" for that currently missing chapter, but the standard name is "Generic Programming". By the way, it's prerequisite to the second part of the "Zippers" chapter.)
In that case, the part about "Trees" would stay more or less where it is now, while the final sections would be moved forward to "Fun With Types/Generic Programming", which would replace the existing red link to "Datatype algebra"? Anyway, I intend to carry out a provisional splitting in two parts - "More on Datatypes" and "Other data structures" (the weak unspecific name is intended to be provisional) so that restructuring of the first part is easier to carry out. Before doing that, however, I must ensure I understand how page moving in Wikibooks works... (By the way, I noticed the missing prerequisite on "Zippers", and would in normal circumstances have got seriously confused by it - but dataype derivatives struck me as such an awesome concept I didn't bother)
Yep, I moved it because it seems silly to discuss data types but then defer the discussion of how to use them (= pattern matching) until much later.
My vision for the 'Haskell Basics' section is to present a minimal subset of the Haskell syntax and language that still enables people to write pretty much any program they want. This way, readers don't have to wade through all the syntactic variety before getting to the core concepts.
In particular, I propose the following subset:
* guards, but no if .. then .. else * where clauses for local definitions, but no let * Lists, list comprehensions and many Prelude functions involving lists; but no pattern matching. (Not sure about omitting pattern matching on lists yet, but the idea is to encourage "whole-meal" programming with lists.) * Standard primitive types Int, Double, Char, String, (,), -> and polymorphism. * Type synonyms to encourage abstraction; but no user-defined data types, tuples will have to serve. * Type classes need to be mentioned because of error messages due to the overloading of numeric literals. Also, readers need the show function. * Very simple IO, do syntax: reading and writing files, interacting on the terminal
A restructuring like this would help to make the book to have better internal consistency. For instance, reworking the way local definitions conditionals are presented in the beginning by presenting just the simpler possibilities would help with finding a proper use for modules like "Control structures" or "More on functions", which as of now are rather annoying because they feel detached from the context of the rest of the book. I also think that the absence of any discussion on how numeric types/classes work is a big problem, and basic reading and writing from files is probably too useful to leave out. I have a few doubts about your subset, though. Firstly, even if you are probably thinking of a full-scale reassembling of the modules and chapters that might nullify my concerns, I wonder if a minimal subset which "enables people to write pretty much any program they want" wouldn't cause too much material to be placed in a single book segment/chapter, in detriment of the other chapters in the Beginner's Trail. Also, with regards to the list-related material, pattern matching looks like such an integral part of day-to-day Haskell usage it seems natural to present initial examples without getting into details of how it actually works. Of course this is just an impression from a newbie haskeller, and that there may be conceptual problems I might not be giving due weight to - for instance, it could well be that many people, after seeing pattern matching in action without a proper explanation, think it is some kind of magic and develop wrong ideas about how the language works (such a situation could make a good reason for stimulating "whole-meal programming"). Finally, a note about exploration of "many Prelude functions involving lists". One of the things I enjoyed about the initial modules of the book was how it presented the essential concepts with minimal "clutter" of alternative syntax and, specially, systematic study of the standard library. I realize that this slower, less practical style probably does not work for everyone, and is occasionally overdone (for instance, by waiting an eternity to introduce type classes), but still I feel such an approach has merit - and furthermore, it would help to distinguish our basic modules from, say, LYAH. There are a few subtler ways of working with Prelude in the book, which is already done semi-intentionally in some parts of the book, such as slipping new Prelude functions in examples and exercises and stimulating readers to be curious (for instance, by using GHCi to immediately query the type signature of every and each unknown function they happen to meet). Regards, Daniel Mlot

Daniel Mlot wrote:
Heinrich Apfelmus wrote:
I concur. In my opinion, the second part about generalized folds and maps does not even belong to the beginner's track; it should be moved to a chapter "Generic Programming" in the "Fun With Types" section.
In that case, the part about "Trees" would stay more or less where it is now, while the final sections would be moved forward to "Fun With Types/Generic Programming", which would replace the existing red link to "Datatype algebra"?
Yes, exactly. (Generic programming = doing things 'generically' for many data types, like generalizing the fold function to every data type.)
My vision for the 'Haskell Basics' section is to present a minimal subset of the Haskell syntax and language that still enables people to write pretty much any program they want. This way, readers don't have to wade through all the syntactic variety before getting to the core concepts. [..] [..] I have a few doubts about your subset, though. Firstly, even if you are probably thinking of a full-scale reassembling of the modules and chapters that might nullify my concerns, I wonder if a minimal subset which "enables people to write pretty much any program they want" wouldn't cause too much material to be placed in a single book segment/chapter, in detriment of the other chapters in the Beginner's Trail.
True, the Beginner's track could lose one segment, but that's not necessarily bad; the number of segments is not set in stone, after all. That said, I envision a separation like this: * Haskell Basics => write any functionality * Elementary Haskell => write idiomatic Haskell code, full syntax * Intermediate Haskell => write libraries, i.e. modules and type classes So, Haskell Basics is really just a subset of what a full Haskell programmer should know, but it's enough to write small programs in a rather limited style that can nonetheless express pretty much anything. For example, I imagine that Haskell Basics teaches only squares n = map square [1..n] where square x = x ^ 2 while a full Haskell programmer would write this as squares n = map (^2) [1..n]
Also, with regards to the list-related material, pattern matching looks like such an integral part of day-to-day Haskell usage it seems natural to present initial examples without getting into details of how it actually works. Of course this is just an impression from a newbie haskeller, and that there may be conceptual problems I might not be giving due weight to - for instance, it could well be that many people, after seeing pattern matching in action without a proper explanation, think it is some kind of magic and develop wrong ideas about how the language works (such a situation could make a good reason for stimulating "whole-meal programming").
There's nothing wrong with pattern matching. If anything, the problem is this: With pattern matching and recursion, you can define any function on lists. But then, it is also tempting to do exactly that and forgo the vocabulary of functions offered in the Prelude, at the detriment of good Haskell style. For instance, consider the following task: you are given a file that has a number on each line and you have to add them all up. A direct, recursive, and utterly unintelligible solution would be to go character by character and accumulate the sum example :: String -> Int example s = go 0 "" s where go sum 0 [] = sum go sum number (c:cs) | c == '\n' = go (sum + read number) 0 cs | isDigit c = go sum (number ++ [c]) cs whereas the proper Haskell way to think about this task is example = sum . map read . lines Okay, I created this example for this very purpose, but it does happen in practice: "Memory usage problem" http://thread.gmane.org/gmane.comp.lang.haskell.beginners/3842 (read the original question and my reply)
Finally, a note about exploration of "many Prelude functions involving lists". One of the things I enjoyed about the initial modules of the book was how it presented the essential concepts with minimal "clutter" of alternative syntax and, specially, systematic study of the standard library. I realize that this slower, less practical style probably does not work for everyone, and is occasionally overdone (for instance, by waiting an eternity to introduce type classes), but still I feel such an approach has merit - and furthermore, it would help to distinguish our basic modules from, say, LYAH. There are a few subtler ways of working with Prelude in the book, which is already done semi-intentionally in some parts of the book, such as slipping new Prelude functions in examples and exercises and stimulating readers to be curious (for instance, by using GHCi to immediately query the type signature of every and each unknown function they happen to meet).
In the light of the above, conveying a good grasp on the vocabulary is necessary, but I agree that a diligent study of standard libraries is not necessarily a good idea. The thing is that the reader can only memorize so many things at once, and it would be a waste to squander this on uninteresting syntactic variations and library functions that he doesn't need right now. I imagine the best way to solve this is to present a subset of the Prelude in style of a "cheat sheet", i.e. as documentation that the reader should consult whenever he needs new vocabulary. Astonishingly, such a thing does not exist yet, at least not how I imagine it. There is * Bernie Pope - A Tour of the Haskell Prelude * GHC doc - Prelude but the former has no thematic categories like the latter while the latter does not restrict itself to the Haskell Basics subset. I've given some private lessons recently and I was surprised to find that it's really tricky to find the show function if you only know that you're looking for something that will convert stuff into a String . Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

On 05/12/2010 05:53 AM, Heinrich Apfelmus wrote:
That said, I envision a separation like this:
* Haskell Basics => write any functionality * Elementary Haskell => write idiomatic Haskell code, full syntax * Intermediate Haskell => write libraries, i.e. modules and type classes
So, Haskell Basics is really just a subset of what a full Haskell programmer should know, but it's enough to write small programs in a rather limited style that can nonetheless express pretty much anything.
For example, I imagine that Haskell Basics teaches only
squares n = map square [1..n] where square x = x ^ 2
while a full Haskell programmer would write this as
squares n = map (^2) [1..n]
There is one very important issue which your proposal accounts for. As of now, that there no clear rationale which justifies the division between Basics, Elementary and Intermediate. For that reason, I don't believe the separation is really useful to readers. Furthermore, it is confusing to have "Elementary Haskell" as the second segment of the book, and a restructuring would be a nice opportunity to address such problems with less generic names. As for the number of segments, I guess that as long as the underlying logic is clear it is just a convenience decision - for instance, if Haskell Basics ends up with, say, 15 modules it could be handy to make an arbitrary pt. 1 / pt. 2 division. I will make an exercise and try to display an interpretation of your suggestions using the "List of topics" I created in the Wikibook. By the way: I guess that both "More about lists" and "List processing" would go in their entirety to the new "Basics" (except for the casual "teaser" about currying and partial function application nested in "More about lists")
There's nothing wrong with pattern matching. If anything, the problem is this: With pattern matching and recursion, you can define any function on lists. But then, it is also tempting to do exactly that and forgo the vocabulary of functions offered in the Prelude, at the detriment of good Haskell style.
[...]
Okay, I created this example for this very purpose, but it does happen in practice:
"Memory usage problem" http://thread.gmane.org/gmane.comp.lang.haskell.beginners/3842 (read the original question and my reply)
That thread drives the point quite clearly, and I must admit I can, based on my newbie experiences, relate to this tendency of defining everything explicitly due to inexperience or merely to the desire of playing with the language. Guess that's why someone put a funny-looking "Don't get TOO excited about recursion" header in the "Recursion" module :-)
Daniel Mlot wrote:
One of the things I enjoyed about the initial modules of the book was how it presented the essential concepts with minimal "clutter" of alternative syntax and, specially, systematic study of the standard library.
[...]
In the light of the above, conveying a good grasp on the vocabulary is necessary, but I agree that a diligent study of standard libraries is not necessarily a good idea. The thing is that the reader can only memorize so many things at once, and it would be a waste to squander this on uninteresting syntactic variations and library functions that he doesn't need right now.
I imagine the best way to solve this is to present a subset of the Prelude in style of a "cheat sheet", i.e. as documentation that the reader should consult whenever he needs new vocabulary.
Astonishingly, such a thing does not exist yet, at least not how I imagine it. There is
* Bernie Pope - A Tour of the Haskell Prelude * GHC doc - Prelude
but the former has no thematic categories like the latter while the latter does not restrict itself to the Haskell Basics subset. I've given some private lessons recently and I was surprised to find that it's really tricky to find the show function if you only know that you're looking for something that will convert stuff into a String .
Full agreement here. A cheat sheet should really be a top priority, specially if there is nothing like what we are envisioning available elsewhere. To your considerations on how the cheat sheet should be a "documentation" module detached from the main flow of the book, how it should restrict itself to the essentials and how it should be organized by topics I would only add a proposal on presentation. Essentially, I believe the "clutter" applies both ways - at the same time I don't want to be bothered by systematic discussion of Prelude while I'm trying to grasp the concepts I would like to be able to use the cheat sheet as expediently as possible, without having to wade through vague discussions or minute details if I don't want to, just picking the tool I need and stealing away back to the book or to my code. For that reason, a format like the one of "A Tour of the Haskell Prelude" is truly inadequate for what we intend. Instead, I believe the cheat sheet pages should have two parts: * A very straight-to-the-point table, located at the very beginning of the pages, which presented the functions, type signatures and one-line descriptions, (ideally) in a way that no more than a quick glance should be needed for locating the function you need. * Following the table, a section with more detailed comments on each of the functions. The subsections on each function would be linked from the table for convenience. This separation would also have the advantage of not forcing us to write the extra comments and notes in some strict format, as the necessary restrictions would already have been put in practice in the table. I will help kickstarting the cheat sheet with formatting experiments and some initial examples, even if I don't think I can go too far on my own due to lack of Prelude experience. Regards, Daniel Mlot

Hello everybody, It has been nearly two years since I wrote a flurry of messages to this list, while animatedly copy-editing and restructuring the early parts of the Wikibook and, alongside Apfelmus, establishing grand plans for a reorganization. Regrettably, the energy I had available to back such efforts extinguished itself way too quickly... Anyway, the book-writing bug has just bitten me again, and so over the last few nights I worked towards making the early chapters less confusing. Things are far from perfect, but I believe they are now easier to follow, both for readers and for any contributors trying to figure out what is missing. In particular, as far as I can tell the creeping problems with chapters having prerequisites found later in the book have now been solved. Beyond telling you of such developments, I also would like to ask for your opinion on a specific chapter. Yesterday I reorganized "More About Lists" in an attempt to 1. make it into a coherent story - one which starts with the "obvious" recursive doubleList and ends with multiplyList n = map ((*) n); 2. give readers an informal, "en passant" primer on what higher-order functions are, without getting down to technicalities; and 3. illustrating why map is so useful, of course. The problem is that now I look at the finished text and worry that the "Generalizing even further" section, which prepares the ground for the introduction of map (and where most of the higher-order primer is), is now too dense in information for a newbie. It would be perfectly feasible to avoid most of the higher-order discussion by, immediately after the first "real" paragraph, telling it bluntly that "Haskell allows us to make our function more general by passing another function as argument" and then going straight into the definition of map. (Such an approach, by the way, would be much more in line with Apfelmus' plans (documented in the list archives from around May 2010), which is a good thing as his plans were coherent and had great potential.) Summing it up, while I think there is some pedagogical value in my current approach to "More About Lists", I have serious doubts on whether it would work in practice. So I invite you to skim through it and share your thoughts on those issues. Your help will be much appreciated. Regards, Daniel Mlot
participants (2)
-
Daniel Mlot
-
Heinrich Apfelmus