
johan.tibell:
Hi Haskell Caf?!
I'm writing a perl/python like string templating system which I plan to release soon:
darcs get http://darcs.johantibell.com/template
The goal is to provide simple string templating; no inline code, etc.. An alternative to printf and ++.
Ok. You might also want to briefly look at the other templating system I know of in Haskell, this small module by Stefan Wehr, http://www.cse.unsw.edu.au/~dons/code/icfp05/tests/unit-tests/VariableExpans... Just a quick thing he did for the ICFP contest, but does indicate one way to do it (i.e. via pretty printing).
Example usage:
import qualified Data.ByteString as B import Text.Template
helloTemplate = "Hello, $name! Would you like some ${fruit}s?" helloContext = [("name", "Johan"), ("fruit", "banana")]
test1 = B.putStrLn $ substitute (B.pack helloTemplate) helloContext
I want to make it perform well, especially when creating a template once and then rendering it multiple times. "Compiling" the template is a separate step from rendering in this use case:
compiledTemplate = template $ B.pack helloTemplate
test2 = B.putStrLn $ render compiledTemplate helloContext
A template is represented by a list of template fragments, each fragment is either a ByteString literal or a variable which is looked up in the "context" when rendered.
data Frag = Lit ByteString | Var ByteString newtype Template = Template [Frag]
This leads me to my first question. Would a lazy ByteString be better or worse here? The templates are of limited length. I would say the length is usually between one paragraph and a whole HTML page. The Template data type already acts a bit like a lazy ByteString since it consists of several chunks (although the chunck size is not adjusted to the CPU cache size like with the lazy ByteString).
Probably lazy bytestrings are better here, since you get O(n/k) append cost, rather than O(n). If most strings are small, it mightn't be noticeable.
Currently the context in which a template is rendered is represented by a type class.
class Context c where lookup :: ByteString -> c -> Maybe ByteString
instance Context (Map String String) where lookup k c = liftM B.pack (Map.lookup (B.unpack k) c)
instance Context (Map ByteString ByteString) where lookup = Map.lookup
-- More instance, for [(String, String)], etc.
I added this as a convenience for the user, mainly to work around the problem of not having ByteString literals. A typical usage would have the keys in the context being literals and the values some variables:
note sure if it is relevant, but: pack "Foo" will be converted via rewrite rules to a bytestring literal at compile time. So there's no overhead for having String literals.
someContext = Map.fromList [("name", name), ("fruit", fruit)]
I'm not sure if this was a good decision, With this I'm halfway to the (in)famous Stringable class and it seems like many smarter people than
Yes, seems a little worrying.
me have avoided introducing such a class. How will this affect performace? Take for example the rendering function:
render :: Context c => Template -> c -> ByteString render (Template frags) ctx = B.concat $ map (renderFrag ctx) frags
renderFrag :: Context c => c -> Frag -> ByteString renderFrag ctx (Lit s) = s renderFrag ctx (Var x) = case Text.Template.lookup x ctx of Just v -> v Nothing -> error $ "Key not found: " ++ (B.unpack x)
How will the type dictionary 'c' hurt performance here? Would specializing the function directly in render help?
Hmm. Hard to say: look at the Core code and we will know. Really though, you'll need some stress test cases to be able to make resonable conclusions about performance.
render (Template frags) ctx = B.concat $ map (renderFrag f) frags where f = flip Text.Template.lookup ctx
renderFrag f (Var x) = case f x of
I can see the implementation taking one of the following routes: - Go full Stringable, including for the Template - Revert to Context = Map ByteString ByteString which was the original implementation. - Some middle road, without MPTC, for example:
class Context c where lookup :: ByteString -> c ByteString ByteString -> Maybe ByteString This would allow the user to supply some more efficient data type for lookup but not change the string type. Having a type class would allow me to provide things like the possibility to create a Context from a record where each record accessor function would server as key. Something like:
data Person { personName :: String, personAge :: Int } would get converted (using Data?) to: personContext = [("personName", show $ personName aPerson), ("personAge", show $ personAge aPerson)] but not actually using a Map but the record itself.
I guess my more general question is: how do I reason about the performance of my code or any code like this? Are there any other performance improvements that could be made?
Also, I would be grateful if someone could provide some feedback on the implementation, anything goes!
I still have some known TODOs:
- Import error messages for invalid uses of "$". - Improve the regex usage overall. - Add some more functions; the plan is to add those function which could be expressed in efficiently with the current interface. An example is things like renderAndWrite, when writing doing a B.concat first is unnecessary.
I'd suggest: keep it simple and fast. Then work out what extra stuff you need. -- Don