Hi dear Cafe!
I'm trying to accomplish what should be a trivial task: generating a PDF from an HTML template using Pandoc.
So far I've tried the `wkhtmltopdf` and `pdflatex` creators, both without luck.
First I want to say a few words about the `pdflatex` and `xelatex` creators, for anyone who struggles with the same task in the future, since code examples are quite hard to find on the web.
Initially I wasn't able to render a document with the `pdflatex` creator. Note that `pdflatex` requires a lot of LaTeX packages to be installed, especially font packages. I also spent several hours getting rendering to work, because I hadn't specified a template in `WriterOptions`. `pdflatex` isn't able to handle Cyrillic Unicode characters, so I eventually figured out that I had to use the `xelatex` creator. I also found and used the default template:
> pandoc <- readHtml def (toStrict $ renderHtml html)
> tpl' <- getDefaultTemplate "latex"
> makePDF "xelatex" [] writeLaTeX (def {writerTemplate = Just tpl'}) pandoc
In this case, however, I got blank spaces instead of Cyrillic characters in the resulting PDF, plus a bunch of console warnings about characters missing from the default font. I assume the font itself is specified in the template. I've looked into the default template and it's huge. I could probably prepare a simpler template for my own needs, but getting familiar with LaTeX document syntax would take a lot of time.
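On the missing Cyrillic glyphs: with `xelatex`, Pandoc's default LaTeX template loads `fontspec` and takes the main font from the `mainfont` template variable, so pointing that variable at a system font with Cyrillic coverage may be enough, without writing a custom template. Below is a minimal sketch that shells out to the `pandoc` executable with `readProcessWithExitCode` from the `process` package, rather than using the library API as in the code above (a swapped-in approach). `DejaVu Serif` is an assumed installed font, `pandocArgs`/`renderPdf` are hypothetical helper names, and `--pdf-engine` assumes pandoc 2.0 or newer.

```haskell
module Main where

import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode, showCommandForUser)

-- Assumption: an installed system font with Cyrillic glyphs (example only).
cyrillicFont :: String
cyrillicFont = "DejaVu Serif"

-- Hypothetical helper: pandoc -f html --pdf-engine=xelatex -V mainfont=... -o OUT IN
-- (--pdf-engine needs pandoc >= 2.0; older releases called it --latex-engine)
pandocArgs :: FilePath -> FilePath -> [String]
pandocArgs input output =
  [ "-f", "html"
  , "--pdf-engine=xelatex"
  , "-V", "mainfont=" ++ cyrillicFont -- one argv entry; no shell is involved
  , "-o", output
  , input
  ]

-- Hypothetical helper: run pandoc (must be on PATH), show stderr on failure.
renderPdf :: FilePath -> FilePath -> IO ()
renderPdf input output = do
  (code, _out, err) <- readProcessWithExitCode "pandoc" (pandocArgs input output) ""
  case code of
    ExitSuccess -> putStrLn ("wrote " ++ output)
    ExitFailure n -> putStrLn ("pandoc failed (" ++ show n ++ "):\n" ++ err)

main :: IO ()
main =
  -- Only prints the equivalent shell command; call renderPdf to actually run it.
  putStrLn (showCommandForUser "pandoc" (pandocArgs "input.html" "output.pdf"))
```

When staying with the library API, the equivalent would presumably be adding the same `mainfont` entry to the `writerVariables` field of `WriterOptions`; that field's type changed between Pandoc releases, so check it against the version you build with.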
I've also tried `wkhtmltopdf`, which seems to be a lightweight and easy solution. It seemed to work well except for encoding issues: the Cyrillic text in the resulting PDF is rendered incorrectly. I tried passing `["encoding utf-8"]` as arguments in the `makePDF` call, but this results in a runtime error:
> --margin-bottom specified in incorrect location
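Before reordering anything, the shape of the extra arguments may be worth checking. In the Pandoc excerpt quoted below, `pdfargs` is concatenated as-is with the output of Pandoc's `toArgs` helper, which (in the Pandoc sources) emits the flag and its value as two separate entries such as `"--margin-bottom"`, `"1.2in"`. Each list element apparently becomes one argv entry, so `["encoding utf-8"]` would reach `wkhtmltopdf` as a single argument containing a space and no leading `--`. A minimal sketch of the two forms; `wrongArgs`/`rightArgs` are illustrative names, and `showCommandForUser` from the `process` package is used only to display the resulting command lines:

```haskell
module Main where

import System.Process (showCommandForUser)

-- What was passed: ONE argv entry, with a space inside and no "--" prefix.
wrongArgs :: [String]
wrongArgs = ["encoding utf-8"]

-- Flag and value as two separate entries, the same shape Pandoc's
-- toArgs produces for its own options.
rightArgs :: [String]
rightArgs = ["--encoding", "utf-8"]

main :: IO ()
main = do
  -- Shell-quoted rendering of each command line, for comparison.
  putStrLn (showCommandForUser "wkhtmltopdf" wrongArgs)
  putStrLn (showCommandForUser "wkhtmltopdf" rightArgs)
```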
Googling this error led me to the clue that passing the encoding argument to `wkhtmltopdf` breaks the argument order expected by the command that Pandoc generates. This could probably be fixed easily, but Pandoc has a lot of open issues on GitHub, and a fix also requires some digging into the `wkhtmltopdf` command-line syntax. I've looked into the Pandoc sources and it seems possible to provide a simple patch, but I need some guidance. According to the `wkhtmltopdf` documentation, it distinguishes global options, page options, cover options, and table-of-contents options. `encoding` is a page-level option, but Pandoc puts the extra arguments given to `makePDF` after its default arguments (`pdfargs` in the following code sample):
> let args = mathArgs ++ concatMap toArgs
> [("page-size", getField "papersize" meta')
> ,("title", getField "title" meta')
> ,("margin-bottom", Just $ fromMaybe "1.2in"
> (getField "margin-bottom" meta'))
> ,("margin-top", Just $ fromMaybe "1.25in"
> (getField "margin-top" meta'))
> ,("margin-right", Just $ fromMaybe "1.25in"
> (getField "margin-right" meta'))
> ,("margin-left", Just $ fromMaybe "1.25in"
> (getField "margin-left" meta'))
> ,("footer-html", getField "footer-html" meta')
> ,("header-html", getField "header-html" meta')
> ] ++ pdfargs
This is most likely what breaks everything. The quickest and dirtiest workaround I see is to check each argument and, if it is a page-level argument, repeat it for each page object. Another solution might be to specify the encoding for the Pandoc document some other way, but I can't figure out how to do that yet.
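The "quick and dirty" workaround can be sketched as a pure function that splits the option pairs into global and page-level ones, then lays them out as `wkhtmltopdf [GLOBAL OPTION]... <input> [PAGE OPTION]... <output>`, i.e. page options follow the page object they apply to. This is not a patch to Pandoc: the input and output file names are presumably appended elsewhere in Pandoc's code (they do not appear in the excerpt), so a real patch would have to touch that part too. `pageOptionNames` and `wkhtmltopdfArgs` are hypothetical names, and only `encoding` is classified as page-level here; the full list should be taken from `wkhtmltopdf --extended-help`.

```haskell
module Main where

import Data.List (partition)

-- Hypothetical classification: only names listed here count as page options.
-- "encoding" is a page option per the wkhtmltopdf docs; extend as needed.
pageOptionNames :: [String]
pageOptionNames = ["encoding"]

-- ("margin-bottom", "1.2in") becomes ["--margin-bottom", "1.2in"]:
-- flag and value are separate argv entries.
toArgs :: (String, String) -> [String]
toArgs (name, value) = ["--" ++ name, value]

-- Layout: [GLOBAL OPTION]... <input> [PAGE OPTION]... <output>
wkhtmltopdfArgs :: [(String, String)] -> FilePath -> FilePath -> [String]
wkhtmltopdfArgs opts input output =
  concatMap toArgs globalOpts ++ [input] ++ concatMap toArgs pageOpts ++ [output]
  where
    -- partition keeps the original relative order within each group
    (pageOpts, globalOpts) = partition ((`elem` pageOptionNames) . fst) opts

main :: IO ()
main =
  print $
    wkhtmltopdfArgs
      [("margin-bottom", "1.2in"), ("encoding", "utf-8"), ("margin-top", "1.25in")]
      "input.html"
      "output.pdf"
```

Run as is, this prints `["--margin-bottom","1.2in","--margin-top","1.25in","input.html","--encoding","utf-8","output.pdf"]`: the margins stay in front of the input file and `--encoding` moves behind it. With several page objects, the page-level options would have to be repeated after each input.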
Maybe someone has already faced a similar task and knows an easier way to render HTML to PDF with Haskell. I would be very grateful for any help, advice, or other clues on how to achieve my goal.
Arthur.