
[Apologies for the long delay in replying; I've been traveling, etc.] On Sun, 31 Dec 2006 20:11:47 +0100, you wrote:
The other extreme is the one I favor: the whole pipeline is expressible as a chain of function compositions via (.). One should be able to write
process = rectangles2pages . questions2rectangles
This means that (rectangles2pages) comes from a (self-written) layout library and that (questions2rectangles) comes from a question-formatting library, and the two concerns are completely separated from each other. If such a factorization can be achieved, you get clear semantics, bug reduction and code reuse for free.
I favor that approach, too. ;) The problem is that when there is a multi-step process, and various bits of information get propagated throughout, as required by the various steps in the process, the overall decomposition into a series of steps a . b . c . ... can become brittle in the face of changing requirements. Let's say, for example, a change request comes in that now requires step 13 to access information that had previously been discarded back at step 3. The simple approach is to propagate that information in the data structures that are passed among the intervening steps. But that means that all of the steps are "touched" by the change--because the relevant data structures are redefined--even though they're just passing the new data along. The less simple (and not always feasible) approach is to essentially start over again and re-jigger all of the data structures and subprocesses to handle the new requirement. But this can obviously become quite a task.
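To make the brittleness concrete, here is a minimal Haskell sketch with invented stage names and types (nothing here is from the actual system):

data Parsed    = Parsed    [String]
data Formatted = Formatted [String]
data Paged     = Paged     [[String]]

parse :: String -> Parsed
parse = Parsed . lines

format :: Parsed -> Formatted
format (Parsed qs) = Formatted (map ("* " ++) qs)

paginate :: Formatted -> Paged
paginate (Formatted rs) = Paged [rs]  -- one page holds everything, for the sketch

process :: String -> Paged
process = paginate . format . parse

If (paginate) later needs something that (parse) saw but discarded (say, the raw source text), then Parsed and Formatted must both grow a field, and (format) must be edited to copy it through, even though (format) itself never looks at it.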
If there are only the cases of some single question or a full questionnaire, you could always do
blowup :: SingleQuestion -> FullQuestionaire
preview = process (blowup a_question)
...
In general, I think that it's the task of (process) to inspect (Item) and to plug together the right steps. For instance, a single question does not need page breaks or similar. I would avoid overloading the (load*) functions and (paginate) on (Item).
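In code, that suggestion might look something like this (every name here is invented for illustration):

type Question = String
type Document = [String]

render :: [Question] -> Document
render = map ("Q: " ++)

paginate :: Document -> Document
paginate = ("-- page break --" :)  -- placeholder pagination

data Item = SingleQuestion Question
          | Questionnaire [Question]

process :: Item -> Document
process (SingleQuestion q) = render [q]           -- skip pagination
process (Questionnaire qs) = paginate (render qs) -- paginate the lot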
A single question can be several pages long, so it does need to be paginated. The reason for the decomposition as it now stands is that any item (and there are more kinds of items than just questions and questionnaires) can be decomposed into a pagemaster and a list of questions. Once that has occurred, all items acquire essentially the same "shape." That's why loading the pagemaster and loading the questions are the first two steps in the process.
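In other words, something like the following hypothetical shape (the types and the default pagemaster are guesses, not the real code):

type Question   = String
data PageMaster = PageMaster { pageHeight :: Int }

data Item = SingleQuestion Question
          | Questionnaire PageMaster [Question]
          -- (more kinds of items in the real system)

loadPageMaster :: Item -> PageMaster
loadPageMaster (SingleQuestion _)  = PageMaster 60  -- some default master
loadPageMaster (Questionnaire m _) = m

loadQuestions :: Item -> [Question]
loadQuestions (SingleQuestion q)   = [q]
loadQuestions (Questionnaire _ qs) = qs

After those two steps, every item looks the same: a pagemaster plus a list of questions; the rest of the pipeline need not care what kind of item it started from.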
Btw, the special place "end" suggests that the "question markup language" does not incorporate all of: "conditional questions", "question groups", "group templates"? Otherwise, I'd just let the user insert
<if media="print">
  <template-instance ref="endquestions.xml" />
</if>
at the end of every questionnaire. If you use such a tiny macro language (preferably with sane and simple semantics), you can actually merge (stripUndisplayedQuestions) and (appendEndQuestions) into a function (evalMacros) without much fuss.
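A sketch of that merged pass over a hypothetical macro AST (Node, Media and the loader are all invented here): failing conditionals are dropped, which subsumes (stripUndisplayedQuestions), and template instances are expanded in place, which is where (appendEndQuestions) goes:

data Media = Print | Electronic deriving Eq

data Node = Question String
          | If Media [Node]            -- <if media="...">...</if>
          | TemplateInstance FilePath  -- <template-instance ref="..."/>

evalMacros :: Media -> (FilePath -> [Node]) -> [Node] -> [Node]
evalMacros media load = concatMap go
  where
    go (Question q) = [Question q]
    go (If m ns)
      | m == media = evalMacros media load ns  -- condition holds: keep contents
      | otherwise  = []                        -- strip undisplayed branch
    go (TemplateInstance ref) =
      evalMacros media load (load ref)         -- splice the template in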
If only I had the power to impose those kinds of changes.... Unfortunately, I have little control over the logical organization of questions, questionnaires and all of the other little bits and pieces. (I assure you I would have done it quite differently if I could.) Instead, I have to deal with an ad hoc pseudo-hierarchical quasi-relational database structure, and to settle for occasional extra columns to be added to the tables in order to specify information that I can't synthesize any other way.
Uh, that doesn't sound good. I assume that the post-processing is not implemented in Haskell?
Not even remotely so. ;) In the paper world, post-processing consists of semi-automated collation and stapling of the actual printed pages. In the electronic world, during previous survey periods, an analogous process was used (a "front" questionnaire and a "back" questionnaire would be figuratively stapled together); we're looking to make the merging a bit smoother and more automatic this time around.

As is often the case, the motivation for the rather arcane post-processing is human, rather than technical. Let's say I have ten different questionnaires, where the first five pages of each questionnaire are identical, and these are followed by six additional pages that differ from one questionnaire to another. That's a total of 10 * 11 = 110 pages, but only 5 + 10 * 6 = 65 _distinct_ pages.

As hard as it may be to believe, the people who are responsible for approving the questionnaires see it like this: If the system produces one 5-page "front" questionnaire and ten 6-page "back" questionnaires, then that's 65 pages that they have to inspect. But if the system were to produce ten 11-page questionnaires, even though the first five pages of each questionnaire are generated from exactly the same data using exactly the same software, that's 110 pages that they have to inspect.
Fine, though I don't see exactly why this isn't done after the questions have been transformed to printable things but before they are distributed across pages. So the references cannot refer to page numbers, yet must be processed after transforming questions to rectangles?
It's not until you get to the "rectangles" level that you can see the text and tokens that need to be replaced.

--------

Thanks for all of the discussion. I think I have a lot to ponder....

Steve Schafer
Fenestra Technologies Corp.
http://www.fenestra.com/