
On Fri, 29 Dec 2006 21:01:31 +0100, you wrote:
process :: Item -> MediaKind -> MediaSize -> Language -> SFO
"Item" doesn't tell me anything. Seems to be an XML-File containing the questions and such.
The reason it's just "Item" is that it can be a number of different things. It can be a full-blown questionnaire, composed of a number of questions, but it could also be just one question (sometimes the users want to see what a question layout looks like before okaying its inclusion into the questionnaire stream). The functions are overloaded to handle the various different kinds of Items.
Mh, I cannot guess what a "pagemaster" might do, but from its arguments, it looks like the "ink guy" responsible for actual printing (or on-screen display). So he might know about graphics, colors and inches but not about content.
A pagemaster defines the sizes and locations of the various parts of the page (top and bottom margins, left and right sidebars, body region), as well as the content of everything except the body region (which is where the questions go). There are four different page definitions in the pagemaster: first page, last page, even page and odd page. The pagemaster also contains a couple of other bits of information that don't fit neatly anywhere else (discussed below).
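To make that description concrete, here is a rough sketch of what such a structure might look like. All of the type and field names are invented for illustration (the real pagemaster is not shown in this thread), the header text stands in for "everything except the body region," and the precedence of first/last over even/odd is an assumption.

```haskell
-- Hypothetical rectangle on the page, in whatever unit the media uses.
data Rect = Rect { left, top, width, height :: Double }
  deriving (Eq, Show)

-- One page definition: where the body goes, plus non-body content.
-- 'headerText' is a stand-in for margins, sidebars and their content.
data PageDef = PageDef
  { bodyRegion :: Rect
  , headerText :: String
  } deriving (Eq, Show)

-- The four page definitions, plus the miscellaneous bits that ride along.
data Pagemaster = Pagemaster
  { firstPage, lastPage, evenPage, oddPage :: PageDef
  , startingQuestionNumber :: Int
  , hasEndQuestions        :: Bool
  } deriving (Eq, Show)

-- Pick the definition for page n of a document with 'total' pages.
-- Assumption: "first" wins over "last", and both win over even/odd.
pageDefFor :: Pagemaster -> Int -> Int -> PageDef
pageDefFor pm total n
  | n == 1     = firstPage pm
  | n == total = lastPage pm
  | even n     = evenPage pm
  | otherwise  = oddPage pm
```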
Maybe one should write filter willBeDisplayedQuestion $ instead, but I think the name 'stripUndisplayedQuestions' says it all.
Sure. "stripUndisplayedQuestions" is indeed just a simple filter.
appendEndQuestions item pagemaster $
Uh, why do questions depend on pagemaster and thus on mediaSize? Are these some floating questions appearing on every page, like the name of the guy to be questioned? Those should be treated somewhere else.
End questions are questions that are inserted automagically at the end of (almost) every questionnaire. They depend on the Item because only questionnaires get them, and they depend on the pagemaster because not every questionnaire gets them. (This is one of those additional bits of information that is stored in the pagemaster. It may seem like it would be better stored in the questionnaire itself, but there are some complicated reasons why that doesn't work. Obviously, it would be possible to rearrange the data after it is retrieved from the database, although I'm not sure that there would be a net simplification.)
coalesceParentedQuestions :: [Question] -> [Question]
coalesceParentedQuestions $
This makes me suspicious whether [Question] is the right type. Apparently, data Question = GroupedQuestions [String] or something like that, so that a Question may well be a tree of questions. Guessing that this function resolves some tree structure that got specified by explicitly naming nodes, I'd suggest a type '[QuestionTaggedWithLevel] -> Tree Question' instead. Note that one also has fold and filter on Trees for further processing.
Some questions are composed of multiple sub-questions that are treated as separate questions in the database. Because the people who created and maintain the database have difficulty fully grasping the concept of trees (or hierarchies in general, actually), I have to jump through a few hoops here and there to massage the data into something meaningful. While it's true that a parent question looks superficially like a tree of child questions, there's more to it than that; the visual layout of the parent question is not generated by a simple traversal over its children, for example. So, for all of the processing that follows, a parent question (one with child questions) looks just like any other question, and any parent question-specific details remain hidden inside.
validateQuestionContent :: [Question] -> [Question]
validateQuestionContent $
Uh, I think the type is plain wrong. Doesn't the name suggest 'Question -> Bool' and a fatal error when a question content is invalid?
No. The idea is to never fail to assemble the questionnaire. If there is a question with invalid content, then it is replaced by a dummy question that contains some descriptive text explaining the problem. So "validateQuestionContent" might more loquaciously be called "inspectTheQuestionsAndReplaceAnyThatDontLookRightWithAnErrorMessageShapedLikeAQuestion." I haven't shown it here, but there is an accompanying Writer that accumulates a log of errors and warnings as well. The final step generates and prepends a "job ticket" page onto the output; the errors and warnings are listed on that page.
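A minimal sketch of the replace-and-log idea, under stated assumptions: the Question type and the validity rule are invented, and a plain pair ([String], a) stands in for the Writer (it has the same accumulate-a-log behaviour and needs nothing beyond the Prelude).

```haskell
-- Hypothetical, simplified question type; the real one is richer.
data Question = Question
  { questionId   :: Int
  , questionText :: String
  } deriving (Eq, Show)

-- Stand-in validity rule (an assumption): a question needs non-empty text.
problemWith :: Question -> Maybe String
problemWith q
  | null (questionText q) = Just "question text is empty"
  | otherwise             = Nothing

-- Check one question. A bad question is swapped for a dummy whose text
-- describes the problem, and the same message is added to the log.
validateOne :: Question -> ([String], Question)
validateOne q = case problemWith q of
  Nothing  -> ([], q)
  Just why ->
    let msg = "Question " ++ show (questionId q) ++ ": " ++ why
    in  ([msg], q { questionText = "[ERROR] " ++ msg })

-- Never fails: the output always has as many questions as the input.
-- 'traverse' concatenates the per-question logs in order.
validateQuestionContent :: [Question] -> ([String], [Question])
validateQuestionContent = traverse validateOne
```

The log half of the result is what would end up on the job ticket page.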
loadQuestions item;
'loadQuestions' is a strange name (too imperative) but that's personal taste and maybe a hint to the fact that 'item' is stored inside a file.
A database, actually. First, the item's details are retrieved, and depending on what kind of item it is, a list of questions associated with that item is retrieved. For example, if the item is a questionnaire, things like the questionnaire title, etc. are retrieved along with the list of questions contained within the questionnaire.
(numberedQuestions,questionCategories) = numberQuestions pagemaster questions;
Yet again the pagemaster. I don't think that mere numbering should depend on mediaSize, not even implicitly. Why must questionCategories be collected? Aren't they inherent in 'Tree Question', so that every Branch has a unique category? Automatic numbering is fine, though.
Another piece of miscellaneous information contained within the pagemaster is the starting question number. (Some questionnaires start with a question number other than 1 because there is a post-processing step where various "front ends" are pasted onto variable "back ends"--another example of where a hierarchical approach would have made more sense, but couldn't be adopted because the database people couldn't cope.) "Question categories" is a slight misnomer; it should be "Question category/question number associations"; they're used in cross-reference resolution, described below. It would be possible to separate the question numbering and question category association generation into two separate passes, but while doing so would eliminate the need to return a tuple, it wouldn't significantly "linearize" the data flow.
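Roughly, the single pass that produces both results could look like the sketch below. The Question type is invented, and the starting number is passed in directly rather than being pulled out of the pagemaster; if two questions share a category, this version keeps the later one, which is an assumption rather than the real rule.

```haskell
import qualified Data.Map as Map

-- Hypothetical question type: an optional category plus some content.
data Question = Question
  { category :: Maybe String
  , body     :: String
  } deriving (Eq, Show)

-- Number the questions from 'start' and, in the same pass, record which
-- number each category ended up with (used later for cross-references).
numberQuestions :: Int -> [Question] -> ([(Int, Question)], Map.Map String Int)
numberQuestions start qs = (numbered, categories)
  where
    numbered   = zip [start ..] qs
    -- Questions without a category are simply skipped here.
    categories = Map.fromList [ (c, n) | (n, Question (Just c) _) <- numbered ]
```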
numberedQuestions' = coalesceNAQuestions numberedQuestions;
Does 'NA' mean not answered? Isn't that either a fatal error or a Maybe-Answer? 'coalesce' makes me suspicious, I could live with a 'filter'.
NA means "not applicable." In order to maintain parallel question numbers across a range of related questionnaires, some questions are marked "not applicable" in some questionnaires. The idea of NA question coalescence is that if there are two or more NA questions in a row, they are replaced by a single combined NA question. Thus, instead of

(16) Not applicable
(17) Not applicable
(18) Not applicable

we have

(16)-(18) Not applicable
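A sketch of that coalescence, assuming a made-up representation in which every numbered question carries an inclusive number range (the real types are different):

```haskell
data Content = NotApplicable | Body String
  deriving (Eq, Show)

-- Hypothetical numbered question covering the range firstNo..lastNo.
data Numbered = Numbered
  { firstNo :: Int
  , lastNo  :: Int
  , content :: Content
  } deriving (Eq, Show)

-- Merge each run of adjacent NA questions into one that spans the run.
-- Working from the right, a question absorbs its successor when both
-- are NA and their numbers are consecutive.
coalesceNAQuestions :: [Numbered] -> [Numbered]
coalesceNAQuestions = foldr merge []
  where
    merge q (next : rest)
      | content q == NotApplicable
      , content next == NotApplicable
      , lastNo q + 1 == firstNo next
      = q { lastNo = lastNo next } : rest
    merge q acc = q : acc

-- Render the number label, e.g. "(16)" or "(16)-(18)".
label :: Numbered -> String
label q
  | firstNo q == lastNo q = paren (firstNo q)
  | otherwise             = paren (firstNo q) ++ "-" ++ paren (lastNo q)
  where
    paren n = "(" ++ show n ++ ")"
```

This is why a plain filter won't do: nothing is dropped, but neighbouring entries are combined.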
(bands,sequenceLayouts) = buildLayout mediaKind language numberedQuestions';
Ah, there's no pagemaster, only mediaKind and language, although the pagemaster would be tempting here. I guess that layout builds for 'endless paper' (band).
At this point, questions lose their identities as questions, and are replaced by their bands, which are the page body-wide rectangles that are painted, one after another, into the body regions of the pages. Each question consists of one or more bands (some questions go for many pages, and so contain dozens of bands). Each band also contains information that is used to automatically insert continuation headers and footers (e.g., "Question (45) continued on next page") whenever a page break occurs in the middle of a question. This is where "language" comes in, by the way. Although the item and pagemaster implicitly contain language information, this is the only place where the questionnaire assembler itself needs to know what language is being used, because it has to decide between "Continued" and "Continuación," etc.
bands' = resolveCrossReferences bands;
Mh, cross reference for thing x on page y? But there aren't any pages yet. Likely that I just don't know what bands are.
That's a typo, by the way; it should have been:

bands' = resolveCrossReferences bands questionCategories;

Questions are cross-referenced by question number. For example, question 4 might be in the "Sales" category, while question 22 might be "Detailed Sales." The last item of question 22 might be "Total; should equal the value reported in (4)." In order to make the layouts as reusable as possible, rather than hard-coding "(4)" in that last item in (22), there is a tag that looks something like this:
<text>Total; should equal the value reported in <question-ref category="Sales"/>.</text>
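Resolving such a tag then amounts to a lookup from category to question number. A rough sketch, with an invented Fragment type standing in for whatever the bands actually contain, and with a made-up placeholder for categories that have no number:

```haskell
import qualified Data.Map as Map

-- Hypothetical text fragment: literal text or an unresolved reference
-- such as <question-ref category="Sales"/>.
data Fragment = Literal String | QuestionRef String
  deriving (Eq, Show)

-- Replace a reference by the parenthesised number of its category.
-- Assumption: an unknown category becomes visible placeholder text
-- instead of aborting the whole job.
resolveFragment :: Map.Map String Int -> Fragment -> Fragment
resolveFragment cats (QuestionRef cat) =
  case Map.lookup cat cats of
    Just n  -> Literal ("(" ++ show n ++ ")")
    Nothing -> Literal ("(unknown category: " ++ cat ++ ")")
resolveFragment _ lit = lit

-- Resolve every fragment and glue the resulting text together.
render :: Map.Map String Int -> [Fragment] -> String
render cats fs = concat [ s | Literal s <- map (resolveFragment cats) fs ]
```

With "Sales" mapped to 4, the tagged text above renders as "Total; should equal the value reported in (4)."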
groupedBands = groupBands bands';
(can't guess on that)
In order to implement widow/orphan control, not every band is allowed to start a new page ("keep with previous" and "keep with next," in effect). Before being handed off to the paginator, the bands are grouped so that each group of bands begins with a band that _is_ allowed to start a page, followed by the next n bands that aren't allowed to start a page. Each grouped band is then treated by the paginator as an indivisible entity. (At this point, the grouped bands could be coalesced into single bands, but doing so adds a bit of unnecessary overhead to the rendering phase.)
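The grouping itself is short. A sketch with a made-up Band type that carries only the one flag that matters here:

```haskell
-- Hypothetical band: all that matters here is whether a page may start on it.
data Band = Band
  { canStartPage :: Bool
  , bandName     :: String
  } deriving (Eq, Show)

-- Each group is one band followed by all the immediately following bands
-- that may not start a page. Assumption: the very first band opens a
-- group even if its flag says otherwise, since it has to go somewhere.
groupBands :: [Band] -> [[Band]]
groupBands []       = []
groupBands (b : bs) = (b : followers) : groupBands rest
  where
    (followers, rest) = span (not . canStartPage) bs
```

The paginator then only ever breaks pages between groups, never inside one.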
pages = paginate item mediaKind mediaSize pagemaster groupedBands;
Now, the dependence on mediaSize is fine. But it's duplicated by pagemaster.
That's correct; it would be possible to simply copy some of the information that needs to be propagated into a "long-lived" structure like the pagemaster.
pages' = combineRows pages;
sfo = createSFO pages' sequenceLayouts;
in sfo
(can't guess on that)
SFO is the final XML-format document description that is then rendered to a variety of output devices (screen, printer, PS or PDF file, etc.).
In summary, I think that the dependencies on the pagemaster are not adequate, he mixes too many concerns that should be separated.
True, but then that's even more miscellaneous bits and pieces to carry around. I guess what makes me uncomfortable is that when I'm writing down a function like process1 (not its real name, as you might imagine), I want to concentrate on the high-level data flow and the steps of the transformation. I don't want to have to expose all of the little bits and pieces that aren't really relevant to the high-level picture. Obviously, in the definitions of the functions that make up process1, those details become important, but all of that should be internal to those function definitions.

Steve Schafer
Fenestra Technologies Corp.
http://www.fenestra.com/