Re: [Haskell-cafe] Efficient use of ByteString and type classes in template system

16 Apr 2007


      johan.tibell:
...
Hi Haskell Caf?!
I'm writing a perl/python like string templating system which I plan
to release soon:
darcs get http://darcs.johantibell.com/template
The goal is to provide simple string templating; no inline code, etc..
An alternative to printf and ++.
Ok. You might also want to briefly look at the other templating system I
know of in Haskell, this small module by Stefan Wehr,

    http://www.cse.unsw.edu.au/~dons/code/icfp05/tests/unit-tests/VariableExpans...

Just a quick thing he did for the ICFP contest, but does indicate one
way to do it (i.e. via pretty printing).
...
Example usage:
...
import qualified Data.ByteString as B
import Text.Template
helloTemplate = "Hello, $name! Would you like some ${fruit}s?"
helloContext = [("name", "Johan"), ("fruit", "banana")]
test1 = B.putStrLn $ substitute (B.pack helloTemplate) helloContext
I want to make it perform well, especially when creating a template
once and then rendering it multiple times. "Compiling" the template is
a separate step from rendering in this use case:
...
compiledTemplate = template $ B.pack helloTemplate
test2 = B.putStrLn $ render compiledTemplate helloContext
A template is represented by a list of template fragments, each
fragment is either a ByteString literal or a variable which is looked
up in the "context" when rendered.
...
data Frag = Lit ByteString | Var ByteString
newtype Template = Template [Frag]
This leads me to my first question. Would a lazy ByteString be better
or worse here? The templates are of limited length. I would say the
length is usually between one paragraph and a whole HTML page. The
Template data type already acts a bit like a lazy ByteString since it
consists of several chunks (although the chunck size is not adjusted
to the CPU cache size like with the lazy ByteString).
Probably lazy bytestrings are better here, since you get O(n/k) append
cost, rather than O(n).  If most strings are small, it mightn't be
noticeable.
...
Currently the context in which a template is rendered is represented
by a type class.
...
class Context c where
   lookup :: ByteString -> c -> Maybe ByteString
instance Context (Map String String) where
   lookup k c = liftM B.pack (Map.lookup (B.unpack k) c)
instance Context (Map ByteString ByteString) where
   lookup = Map.lookup
-- More instance, for [(String, String)], etc.
I added this as a convenience for the user, mainly to work around the
problem of not having ByteString literals. A typical usage would have
the keys in the context being literals and the values some variables:
note sure if it is relevant, but:

    pack "Foo"

will be converted via rewrite rules to a bytestring literal at compile
time. So there's no overhead for having String literals.
...
...
someContext = Map.fromList [("name", name), ("fruit", fruit)]
I'm not sure if this was a good decision, With this I'm halfway to the
(in)famous Stringable class and it seems like many smarter people than
Yes, seems a little worrying.
...
me have avoided introducing such a class. How will this affect
performace? Take for example the rendering function:
...
render :: Context c => Template -> c -> ByteString
render (Template frags) ctx = B.concat $ map (renderFrag ctx) frags
renderFrag :: Context c => c -> Frag -> ByteString
renderFrag ctx (Lit s) = s
renderFrag ctx (Var x) = case Text.Template.lookup x ctx of
                          Just v  -> v
                          Nothing -> error $ "Key not found: " ++ 
                          (B.unpack x)
How will the type dictionary 'c' hurt performance here? Would
specializing the function directly in render help?
Hmm. Hard to say: look at the Core code and we will know.

Really though, you'll need some stress test cases to be able to make
resonable conclusions about performance.
...
...
render (Template frags) ctx = B.concat $ map (renderFrag f) frags
   where f = flip Text.Template.lookup ctx
renderFrag f (Var x) = case f x of
I can see the implementation taking one of the following routes:
- Go full Stringable, including for the Template
- Revert to Context = Map ByteString ByteString which was the original
implementation.
- Some middle road, without MPTC, for example:
...
class Context c where
   lookup :: ByteString -> c ByteString ByteString -> Maybe ByteString
This would allow the user to supply some more efficient data type for
lookup but not change the string type. Having a type class would allow
me to provide things like the possibility to create a Context from a
record where each record accessor function would server as key.
Something like:
...
data Person { personName :: String, personAge :: Int }
would get converted (using Data?) to:
personContext = [("personName", show $ personName aPerson),
                ("personAge", show $ personAge aPerson)]
but not actually using a Map but the record itself.
I guess my more general question is: how do I reason about the
performance of my code or any code like this? Are there any other
performance improvements that could be made?
Also, I would be grateful if someone could provide some feedback on
the implementation, anything goes!
I still have some known TODOs:
- Import error messages for invalid uses of "$".
- Improve the regex usage overall.
- Add some more functions; the plan is to add those function which
could be expressed in efficiently with the current interface. An
example is things like renderAndWrite, when writing doing a B.concat
first is unnecessary.
I'd suggest: keep it simple and fast. Then work out what extra stuff you
need.

-- Don

Re: [Haskell-cafe] Efficient use of ByteString and type classes in template system

dons＠cse.unsw.edu.au