
Steve Schafer wrote:
[Apologies for the long delay in replying; I've been traveling, etc.] [never mind]
The other extreme is the one I favor: the whole pipeline is expressible as a chain of function compositions via (.). One should be able to write
process = rectangles2pages . questions2rectangles
This means that (rectangles2pages) comes from a (self written) layout library and that (questions2rectangles) comes from a question formatting library, and the two concerns are completely separated from each other. If such a factorization can be achieved, you get clear semantics, bug reduction and code reuse for free.
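To make the separation concrete, here is a minimal, self-contained sketch; the types 'Question', 'Rect' and 'Page' and the trivial function bodies are invented purely for illustration, only the shape of the composition matters:

  data Question = Question { questionText :: String }
  data Rect     = Rect     { rectText :: String, rectHeight :: Integer }
  data Page     = Page     { pageRects :: [Rect] }

  -- question formatting library: knows about questions, not about pages
  questions2rectangles :: [Question] -> [Rect]
  questions2rectangles = map (\q -> Rect (questionText q) 1)

  -- layout library: knows about pages, not about questions
  rectangles2pages :: [Rect] -> [Page]
  rectangles2pages = map (\r -> Page [r])   -- toy layout: one rectangle per page

  -- the whole pipeline is just the composition of the two
  process :: [Question] -> [Page]
  process = rectangles2pages . questions2rectangles

Changing how questions are rendered touches only the first function; changing the page layout touches only the second.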
I favor that approach, too. ;) The problem is that when there is a multi-step process, and various bits of information get propagated throughout, as required by the various steps in the process, the overall decomposition into a series of steps a . b . c . ... can become brittle in the face of changing requirements.
Let's say, for example, a change request comes in that now requires step 13 to access information that had previously been discarded back at step 3. The simple approach is to propagate that information in the data structures that are passed among the intervening steps. But that means that all of the steps are "touched" by the change--because the relevant data structures are redefined--even though they're just passing the new data along.
Ah, I forgot to point it out: polymorphism is your dear friend, of course. For example, 'rectangles2pages' should be fully polymorphic in the stuff that's inside the rectangles, just like, for instance, 'nub :: Eq a => [a] -> [a]' is polymorphic in the list elements. One possibility is something like

  class Rectangle a where
      width, height :: a -> Integer

  type Pagenumber = Integer

  data Rectangle a => Pages a = Pages
      { stickyboxes   :: [(Position, a)]   -- appear on every page
      , pagenumberpos :: Position          -- absolute numbering later
      , pages         :: Data.Map.Map Pagenumber [(Position, a)]  -- actual contents
      }

  data Position = Position { x :: Integer, y :: Integer }

  data Footer a = Footer { content :: a, position :: HAlign }
  data HAlign   = Left | Center | Right

  rectangles2pages :: Rectangle a => Footer a -> [a] -> Pages a

but there are many others. The type of 'rectangles2pages' dictates that it can only rearrange but not alter the data inside 'a' (this is due to _parametric_ polymorphism). Now, you may use it for normal text processing via

  instance Rectangle Paragraph where ...

Or you can abuse the pagination algorithm to align bread, buns and cookies on several tablets for baking in the stove:

  instance Rectangle Cookie where ...   -- in the sense of a bounding box,
                                        -- however you like it :)

The point is that you will have minimal trouble when requirements change if you somehow managed to keep 'rectangles2pages' as general as possible. The code above does not need to be changed if you need to carry extra information around in 'a'. That's what I meant by "(self written) layout library": for me, a "library" is necessarily polymorphic. Later on, you can specialize the types. So printing will be

  print :: Pages Paragraph -> Graphic

and it is clear that 'Pages Cookie' cannot be printed. A pipeline could look like

  process = print . rectangles2pages footer . ...
      where footer = loadFooter item

Note that I chose to still plumb a footer around, which makes sense in case it is fully specified in the item. Of course, it can also be put in a pair '(Footer a, [a])':

  process = print . uncurry rectangles2pages . ...

which is the way to go if it is generated on the fly by the previous step in the pipeline. Given suitable generality, the right choice is often natural and satisfying. And who says that there isn't an even better generalization that incorporates the footer more elegantly?

Of course, the difficult thing is to discover the right generalization. This can be quite an art. Oh, I can spend days thinking without writing a single line of code, and I have never, ever encountered a situation where it wasn't worth the effort. The point is: you have to implement the corresponding general functionality anyway because often, even the special case needs the full power in some way or another. Implementing it for the special case only is like coding with a blindfold. Still, the generalization can turn out to be inadequate, but because we don't need to worry about the type 'a', things will be easier. And, at some point, the interface doesn't need to change anymore: every change makes it more general, there is a maximum generality, and we know that monotone sequences converge :)
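To illustrate the specialization step, here is a small self-contained sketch; 'Paragraph', 'Cookie' and 'fitsColumn' are invented for this example (and the 'Rectangle' class is repeated so the sketch stands on its own):

  class Rectangle a where
      width, height :: a -> Integer

  -- normal text processing: a paragraph has content plus a bounding box
  data Paragraph = Paragraph { parText :: String, parW, parH :: Integer }

  instance Rectangle Paragraph where
      width  = parW
      height = parH

  -- the same layout code can arrange cookies on a baking tablet
  data Cookie = Cookie { diameter :: Integer }

  instance Rectangle Cookie where   -- bounding box of the cookie
      width  = diameter
      height = diameter

  -- any function written against 'Rectangle a' works for both, e.g. a toy
  -- helper that checks whether a column of rectangles fits a given height
  fitsColumn :: Rectangle a => Integer -> [a] -> Bool
  fitsColumn pageHeight xs = sum (map height xs) <= pageHeight

Adding extra fields to 'Paragraph' later touches neither 'fitsColumn' nor anything else written against 'Rectangle a', which is exactly the point about keeping the layout library polymorphic.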
Btw, the special place "end" suggests that the "question markup language" does not incorporate all of: "conditional questions", "question groups", "group templates"? Otherwise, I'd just let the user insert
<if media="print"> <template-instance ref="endquestions.xml" /> </if>
at the end of every questionnaire. If you use such a tiny macro language (preferably with sane and simple semantics), you can actually merge (stripUndisplayedQuestions) and (appendEndQuestions) into a function (evalMacros) without much fuss.
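For concreteness, one possible shape of such an 'evalMacros'; the 'MediaKind' and 'Item' types, the constructor names and the stubbed-out 'loadTemplate' are all made up for this sketch:

  data MediaKind = Print | Screen deriving Eq

  data Item a
      = Plain a                        -- an ordinary question
      | If MediaKind [Item a]          -- <if media="...">...</if>
      | TemplateInstance FilePath      -- <template-instance ref="..."/>

  -- expanding templates is left abstract here; in a real program it would
  -- load and parse the referenced questionnaire fragment
  loadTemplate :: FilePath -> [Item a]
  loadTemplate _ = []

  evalMacros :: MediaKind -> [Item a] -> [a]
  evalMacros media = concatMap eval
    where
      eval (Plain x)            = [x]
      eval (If m xs)            = if m == media then evalMacros media xs else []
      eval (TemplateInstance f) = evalMacros media (loadTemplate f)

A single call like 'evalMacros Print items' then drops the screen-only parts and expands the end-question template in one pass.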
If only I had the power to impose those kinds of changes....
Unfortunately, I have little control over the logical organization of questions, questionnaires and all of the other little bits and pieces. (I assure you I would have done it quite differently if I could.) Instead, I have to deal with an ad hoc pseudo-hierarchical quasi-relational database structure, and to settle for occasional extra columns to be added to the tables in order to specify information that I can't synthesize any other way.
Argh. Dealing with badly designed external interfaces is a nightmare. But there may be a way out: while you unfortunately cannot change their database design, they fortunately cannot control how your code represents their database. Who says that you import the database as it is? Who forbids rearranging it internally? I think I'd interleave a processing step

  saneImport :: RawDatabase -> (QuestionHierarchy, PageTemplate)

that fetches all present information and groups and massages the data until it becomes organized. This is much like your 'questions' and 'pagemaster' combined, but it only depends on the database content, not on parameters like 'mediaKind' or 'Item' which determine how and what content is to be processed. This way, all troubles arising from a badly designed database are confined to 'saneImport' and don't affect the actual processing step.
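A rough sketch of what that could look like, with placeholder types standing in for the real rows and the cleaned-up representation (nothing here is meant as the actual schema):

  data RawDatabase       = RawDatabase       { rawRows :: [(String, String)] }
  data QuestionHierarchy = QuestionHierarchy { topLevel :: [String] }
  data PageTemplate      = PageTemplate      { headerText, footerText :: String }

  -- all knowledge about the ad hoc schema is confined to this one function;
  -- everything downstream only ever sees the sane representation
  saneImport :: RawDatabase -> (QuestionHierarchy, PageTemplate)
  saneImport db = ( QuestionHierarchy (map fst (rawRows db))
                  , PageTemplate "" "" )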
Fine, though I don't see exactly why this isn't done after the questions have been transformed into printable things but before they are distributed across pages. So the references cannot refer to page numbers, yet must be processed after transforming questions to rectangles?
It's not until you get to the "rectangles" level that you can see the text and tokens that need to be replaced.
I don't quite understand why, but let's have another example of polymorphism that may be used before the "rectangle" level but also after. We tackle macros (the end questions can be transformed to them when fetching them from the database) and cross references inside some other data structure. One possibility to transform questions with macros and references to plain text is

  -- very simple macros
  data Macro a = Return a | If (MediaKind -> Bool) a

  execute1 :: MediaKind -> Macro a -> Maybe a
  execute1 _     (Return x) = Just x
  execute1 media (If b x)   = if b media then Just x else Nothing

  -- execute can be formulated with an Applicative functor,
  -- but we'll keep it easy here
  executeMacros :: MediaKind -> [Macro a] -> [a]
  executeMacros media = concatMap (maybeToList . execute1 media)

  -- references
  data Ref a = Here a | Ref CrossRef

  resolve :: Data.Map.Map CrossRef a -> Ref a -> a
  resolve = ...   -- for simplicity, the lookup always succeeds

  -- questions whose question text is of type a
  data Question a = FreeForm a (Answer String) | SingleChoice (Answer Index) [a]

  instance Functor Question where
      fmap f (FreeForm x a)      = FreeForm (f x) a
      fmap f (SingleChoice a xs) = SingleChoice a (map f xs)

  -- now comes the interesting type
  type RawQuestion = Macro (Question (Ref String))

  -- and we want to flatten it to (Question String)
  textonly :: MediaKind -> Data.Map.Map CrossRef String
           -> [RawQuestion] -> [Question String]
  textonly media refs = map (fmap (resolve refs)) . executeMacros media

The basic idea is the following: it is clear how to resolve a single cross reference, that's what

  resolve :: Data.Map.Map CrossRef a -> Ref a -> a

does. Now, we simply want to lift this into other data structures:

  resolveQs :: Data.Map.Map CrossRef String -> Question (Ref String) -> Question String
  resolveQs refs = fmap (resolve refs)

The lifting itself can be made polymorphic in the surrounding structure:

  resolveInside :: Functor f => Data.Map.Map CrossRef a -> f (Ref a) -> f a
  resolveInside refs = fmap (resolve refs)

with the help of type constructors and functors. In more complicated cases, one may need other type classes like those from 'Data.Foldable', 'Control.Applicative' or 'Data.Traversable'. Depending on the functions that need to be lifted, custom (multi parameter) type classes may quickly show up. In general, the discipline of lifting functions systematically to many data types is called "generic programming". In a sense, not only the pipeline's functionality needs to be assembled from smaller functions, but their types need to be assembled from smaller pieces as well!
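To see the lifting in action, here is a small self-contained variant that also shows the 'Data.Traversable' route when a lookup may fail; 'CrossRef' is just a String here and the example data is made up:

  import qualified Data.Map as Map
  import Data.Traversable (Traversable, traverse)

  type CrossRef = String

  data Ref a = Here a | Ref CrossRef

  resolve :: Map.Map CrossRef a -> Ref a -> a
  resolve _    (Here x) = x
  resolve refs (Ref k)  = refs Map.! k       -- assumed to succeed, as above

  -- lift into any Functor: lists, Maybe, Question, ...
  resolveInside :: Functor f => Map.Map CrossRef a -> f (Ref a) -> f a
  resolveInside refs = fmap (resolve refs)

  -- if a lookup may fail, Traversable hands back the whole structure in Maybe
  resolveSafely :: Traversable f => Map.Map CrossRef a -> f (Ref a) -> Maybe (f a)
  resolveSafely refs = traverse step
    where
      step (Here x) = Just x
      step (Ref k)  = Map.lookup k refs

  main :: IO ()
  main = do
      let refs = Map.fromList [("q1", "see question 1")]
      print (resolveInside refs [Here "hello", Ref "q1"])
      print (resolveSafely refs [Here "hello", Ref "missing"])

Running 'main' prints the fully resolved list first and then 'Nothing' for the list containing the dangling reference.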
As hard as it may be to believe, the people who are responsible for approving the questionnaires see it like this: If the system produces one 5-page "front" questionnaire and ten 6-page "back" questionnaires, then that's 65 pages that they have to inspect. But if the system were to produce ten 11-page questionnaires, even though the first five pages of each questionnaire are generated from exactly the same data using exactly the same software, that's 110 pages that they have to inspect.
X-)
Thanks for all of the discussion. I think I have a lot to ponder....
May the λ guide your path ;) And of course, you can always outsource some pondering to the mailing list. Regards, apfelmus