Re: [Haskell-cafe] Re: Seeking advice on a style question

30 Dec 2006

      On Fri, 29 Dec 2006 21:01:31 +0100, you wrote:
...
...
...
process :: Item -> MediaKind -> MediaSize -> Language -> SFO
"Item" doesn't tell me anything. Seems to be an XML-File containing the
questions and such.
The reason it's just "Item" is that it can be a number of different
things. It can be a full-blown questionnaire, composed of a number of
questions, but it could also be just one question (sometimes the users
want to see what a question layout looks like before okaying its
inclusion into the questionnaire stream). The functions are overloaded
to handle the various different kinds of Items.
...
Mh, I cannot guess what a "pagemaster" might do, but from its arguments,
it looks like the "ink guy" responsible for actual printing (or
on-screen display). So he might know about graphics, colors and inches
but not about content.
A pagemaster defines the sizes and locations of the various parts of the
page (top and bottom margins, left and right sidebars, body region), as
well as the content of everything except the body region (which is where
the questions go). There are four different page definitions in the
pagemaster: first page, last page, even page and odd page.

The pagemaster also contains a couple of other bits of information that
don't fit neatly anywhere else (discussed below).
...
Maybe one should write
 filter willBeDisplayedQuestion $
instead, but I think the name 'stripUndisplayedQuestions' says it all.
Sure. "stripUndisplayedQuestions" is indeed just a simple filter.
...
...
...
appendEndQuestions item pagemaster $
Uh, why do questions depend on pagemaster and thus on mediaSize? Are
these some floating questions appearing on every page, like the name of
the guy to be questioned? Those should be treated somewhere else.
End questions are questions that are inserted automagically at the end
of (almost) every questionnaire. They depend on the Item because only
questionnaires get them, and they depend on the pagemaster because not
every questionnaire gets them. (This is one of those additional bits of
information that is stored in the pagemaster. It may seem like it would
be better stored in the questionnaire itself, but there are some
complicated reasons why that doesn't work. Obviously, it would be
possible to rearrange the data after it is retrieved from the database,
although I'm not sure that there would be a net simplification.)
...
...
...
coalesceParentedQuestions :: [Question] -> [Question]
                  coalesceParentedQuestions $
This makes me suspicious whether [Question] is the right type. Apparently,
  data Question = GroupedQuestions [String]
or something like that, so that a Question may well be a tree of
questions. Guessing that this function resolves some tree structure that
got specified by explicitly naming nodes, I'd suggest a type
'[QuestionTaggedWithLevel] -> Tree Question' instead. Note that one also
has fold and filter on Trees for further processing.
Some questions are composed of multiple sub-questions that are treated
as separate questions in the database. Because the people who created
and maintain the database have difficulty fully grasping the concept of
trees (or hierarchies in general, actually), I have to jump through a
few hoops here and there to massage the data into something meaningful.

While it's true that a parent question looks superficially like a tree
of child questions, there's more to it than that; the visual layout of
the parent question is not generated by a simple traversal over its
children, for example. So, for all of the processing that follows, a
parent question (one with child questions) looks just like any other
question, and any parent question-specific details remain hidden inside.
...
...
...
validateQuestionContent :: [Question] -> [Question]
                  validateQuestionContent $
Uh, I think the type is plain wrong. Doesn't the name suggest 'Question
-> Bool' and a fatal error when a question content is invalid?
No. The idea is to never fail to assemble the questionnaire. If there is
a question with invalid content, then it is replaced by a dummy question
that contains some descriptive text explaining the problem. So
"validateQuestionContent" might more loquaciously be called
"inspectTheQuestionsAndReplaceAnyThatDontLookRightWithAnErrorMessageShapedLikeAQuestion."

I haven't shown it here, but there is an accompanying Writer that
accumulates a log of errors and warnings as well. The final step
generates and prepends a "job ticket" page onto the output; the errors
and warnings are listed on that page.
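
To make the "never fail" idea concrete, here is a minimal sketch. It is not the actual code: the Question type, the isValid rule, and the name validateWithLog are all hypothetical, and the log is carried in a plain tuple (using the Applicative instance for pairs from base) as a stand-in for the Writer mentioned above.

```haskell
-- Hypothetical, simplified type; the real Question is far richer.
data Question = Question
  { questionId   :: Int
  , questionText :: String
  } deriving (Eq, Show)

-- Assumed validity rule, for illustration only: the text must be non-empty.
isValid :: Question -> Bool
isValid = not . null . questionText

-- Check one question. An invalid question is replaced by a dummy question
-- whose text describes the problem, and the same message goes into the log.
validateOne :: Question -> ([String], Question)
validateOne q
  | isValid q = ([], q)
  | otherwise = ([msg], q { questionText = "[" ++ msg ++ "]" })
  where
    msg = "Question " ++ show (questionId q) ++ " has invalid content"

-- Never fails: every input question yields exactly one output question.
-- The pair ([String], a) acts as a simple writer, so the log messages are
-- concatenated in question order.
validateWithLog :: [Question] -> ([String], [Question])
validateWithLog = traverse validateOne
```

The output list always has the same length as the input list, so the later stages never need to know that anything went wrong.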
...
...
...
loadQuestions item;
'loadQuestions' is a strange name (too imperative) but that's personal
taste and maybe a hint to the fact that 'item' is stored inside a file.
A database, actually. First, the item's details are retrieved, and
depending on what kind of item it is, a list of questions associated
with that item is retrieved. For example, if the item is a
questionnaire, things like the questionnaire title, etc. are retrieved
along with the list of questions contained within the questionnaire.
...
...
...
(numberedQuestions,questionCategories) = numberQuestions pagemaster questions;
Yet again the pagemaster. I don't think that mere numbering should
depend on mediaSize, not even implicitly. Why must questionCategories be
collected? Aren't they inherent in 'Tree Question', so that every Branch
has a unique category? Automatic numbering is fine, though.
Another piece of miscellaneous information contained within the
pagemaster is the starting question number. (Some questionnaires start
with a question number other than 1 because there is a post-processing
step where various "front ends" are pasted onto variable "back
ends"--another example of where a hierarchical approach would have made
more sense, but couldn't be adopted because the database people couldn't
cope.)

"Question categories" is a slight misnomer; it should be "Question
category/question number associations"; they're used in cross-reference
resolution, described below. It would be possible to separate the
question numbering and question category association generation into two
separate passes, but while doing so would eliminate the need to return a
tuple, it wouldn't significantly "linearize" the data flow.
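
A rough sketch of the single-pass version, under simplifying assumptions: the Question type here is hypothetical, and the starting number is passed in directly rather than being pulled out of the pagemaster.

```haskell
type Category = String

-- Hypothetical, simplified question: just a category and some text.
data Question = Question
  { category :: Category
  , text     :: String
  } deriving (Eq, Show)

-- One pass yields both results: the questions paired with their numbers,
-- and the category/question-number associations used later for
-- cross-reference resolution.
numberQuestions :: Int -> [Question] -> ([(Int, Question)], [(Category, Int)])
numberQuestions start qs = (numbered, associations)
  where
    numbered     = zip [start ..] qs
    associations = [ (category q, n) | (n, q) <- numbered ]
```

Splitting this into two functions would remove the tuple, but the associations would still have to be threaded through to the cross-reference step.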
...
...
...
numberedQuestions' = coalesceNAQuestions numberedQuestions;
Does 'NA' mean not answered? Isn't that either a fatal error or a
Maybe-Answer? 'coalesce' makes me suspicious, I could live with a 'filter'.
NA means "not applicable." In order to maintain parallel question
numbers across a range of related questionnaires, some questions are
marked "not applicable" in some questionnaires. The idea of NA question
coalescence is that if there are two or more NA questions in a row, they
are replaced by a single combined NA question. Thus, instead of

 (16) Not applicable
 (17) Not applicable
 (18) Not applicable

we have

 (16)-(18) Not applicable
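
The coalescing step can be sketched as follows. The Numbered type and the label helper are hypothetical simplifications, not the real data structures; the point is only that each run of adjacent NA entries collapses into one entry covering the whole range.

```haskell
-- Hypothetical, simplified representation of a numbered question.
data Numbered
  = Regular Int String      -- question number and text
  | NotApplicable Int Int   -- first and last question number covered
  deriving (Eq, Show)

-- Merge each run of adjacent NA entries into a single entry spanning the run.
coalesceNAQuestions :: [Numbered] -> [Numbered]
coalesceNAQuestions (NotApplicable lo _ : NotApplicable _ hi : rest) =
  coalesceNAQuestions (NotApplicable lo hi : rest)
coalesceNAQuestions (q : rest) = q : coalesceNAQuestions rest
coalesceNAQuestions []         = []

-- Render an entry the way it appears in the examples above.
label :: Numbered -> String
label (Regular n t) = "(" ++ show n ++ ") " ++ t
label (NotApplicable lo hi)
  | lo == hi  = "(" ++ show lo ++ ") Not applicable"
  | otherwise = "(" ++ show lo ++ ")-(" ++ show hi ++ ") Not applicable"
```

A single NA question between two regular questions is left alone, which is why this is a coalescing step rather than a filter.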
...
...
...
(bands,sequenceLayouts) = buildLayout mediaKind language numberedQuestions';
Ah, there's no pagemaster, only mediaKind and language, although the
pagemaster would be tempting here. I guess that layout builds for
'endless paper' (band).
At this point, questions lose their identities as questions, and are
replaced by their bands, which are the page body-wide rectangles that
are painted, one after another, into the body regions of the pages. Each
question consists of one or more bands (some questions go for many pages,
and so contain dozens of bands). Each band also contains information
that is used to automatically insert continuation headers and footers
(e.g., "Question (45) continued on next page") whenever a page break
occurs in the middle of a question. This is where "language" comes in,
by the way. Although the item and pagemaster implicitly contain language
information, this is the only place where the questionnaire assembler
itself needs to know what language is being used, because it has to
decide between "Continued" and "Continuación," etc.
...
...
...
bands' = resolveCrossReferences bands;
Mh, cross reference for thing x on page y? But there aren't any pages
yet. Likely that I just don't know what bands are.
That's a typo, by the way; it should have been:

 bands' = resolveCrossReferences bands questionCategories;

Questions are cross-referenced by question number. For example, question
4 might be in the "Sales" category, while question 22 might be "Detailed
Sales." The last item of question 22 might be "Total; should equal the
value reported in (4)." In order to make the layouts as reusable as
possible, rather than hard-coding "(4)" in that last item in (22), there
is a tag that looks something like this:
...
<text>Total; should equal the value reported in <question-ref category="Sales"/>.</text>
...
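
A minimal sketch of how such a reference might be resolved, assuming the band text has already been parsed into fragments. The Fragment type is hypothetical, and leaving an unknown category unresolved is an assumption made for this sketch; the behavior in that case is not described here.

```haskell
type Category = String

-- Hypothetical model of band text: literal pieces and unresolved references.
data Fragment
  = Literal String
  | QuestionRef Category
  deriving (Eq, Show)

-- Replace a reference with "(n)", where n is the question number associated
-- with its category. A reference to an unknown category is left as-is.
resolveFragment :: [(Category, Int)] -> Fragment -> Fragment
resolveFragment assocs (QuestionRef cat) =
  case lookup cat assocs of
    Just n  -> Literal ("(" ++ show n ++ ")")
    Nothing -> QuestionRef cat
resolveFragment _ fragment = fragment

-- Same argument order as the corrected call: bands first, then the
-- category/question-number associations.
resolveCrossReferences :: [Fragment] -> [(Category, Int)] -> [Fragment]
resolveCrossReferences fragments assocs = map (resolveFragment assocs) fragments
```

Because the lookup goes through the category rather than a hard-coded number, the same layout still works in a questionnaire where the "Sales" question is not number 4.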
...
...
groupedBands = groupBands bands';
(can't guess on that)
In order to implement widow/orphan control, not every band is allowed to
start a new page ("keep with previous" and "keep with next," in effect).
Before being handed off to the paginator, the bands are grouped so that
each group of bands begins with a band that _is_ allowed to start a
page, followed by the next n bands that aren't allowed to start a page.
Each grouped band is then treated by the paginator as an indivisible
entity. (At this point, the grouped bands could be coalesced into single
bands, but doing so adds a bit of unnecessary overhead to the rendering
phase.)
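
The grouping itself is simple; a sketch with a hypothetical Band type that carries only a name and the "may start a page" flag looks like this. It assumes that the very first band always opens a group, whatever its flag says.

```haskell
-- Hypothetical, simplified band: a name plus the widow/orphan flag.
data Band = Band
  { bandName     :: String
  , canStartPage :: Bool
  } deriving (Eq, Show)

-- Each group begins with one band and then takes every following band
-- that is not allowed to start a page.
groupBands :: [Band] -> [[Band]]
groupBands []       = []
groupBands (b : bs) = (b : followers) : groupBands rest
  where
    (followers, rest) = break canStartPage bs
```

The paginator then only ever breaks pages between groups, never inside one.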
...
...
...
pages = paginate item mediaKind mediaSize pagemaster groupedBands;
Now, the dependence on mediaSize is fine. But it's duplicated by pagemaster.
That's correct; it would be possible to simply copy some of the
information that needs to be propagated into a "long-lived" structure
like the pagemaster.
...
...
...
pages' = combineRows pages;
     sfo = createSFO pages' sequenceLayouts;
     in sfo
(can't guess on that)
SFO is the final XML-format document description that is then rendered
to a variety of output devices (screen, printer, PS or PDF file, etc.).
...
In summary, I think that the dependencies on the pagemaster are not
adequate, he mixes too many concerns that should be separated.
True, but then that's even more miscellaneous bits and pieces to carry
around. I guess what makes me uncomfortable is that when I'm writing
down a function like process1 (not its real name, as you might imagine),
I want to concentrate on the high-level data flow and the steps of the
transformation. I don't want to have to expose all of the little bits
and pieces that aren't really relevant to the high-level picture.
Obviously, in the definitions of the functions that make up process1,
those details become important, but all of that should be internal to
those function definitions.

Steve Schafer
Fenestra Technologies Corp.
http://www.fenestra.com/