Generating PDF from HTML with Pandoc

Hi dear Cafe!

I'm trying to accomplish what should be a trivial task: generating a PDF from an HTML template using Pandoc. So far I've tried the `wkhtmltopdf` and `pdflatex` creators, both with no luck.

First, a few words about the `pdflatex` and `xelatex` creators for anyone who struggles with the same task in the future, since code examples are quite hard to find on the web.

Initially I wasn't able to render the document with the `pdflatex` creator at all. `pdflatex` requires a lot of LaTeX packages to be installed, especially font packages. I also spent several hours getting rendering to happen at all because I hadn't specified a template in `WriterOptions`. `pdflatex` could not handle Cyrillic Unicode characters, and I finally figured out that I have to use the `xelatex` creator. I also found and used the default template:

    pandoc <- readHtml def (toStrict $ renderHtml html)
    tpl' <- getDefaultTemplate "latex"
    makePDF "xelatex" [] writeLaTeX (def {writerTemplate = Just tpl'}) pandoc

But in this case I got white space instead of Cyrillic characters in the resulting PDF, and a bunch of console warnings about characters missing from the default font. I assume the font itself is specified in the template. I looked into the default template and it's huge. I guess I could prepare a simpler template for my own needs, but it would take a lot of time to get familiar with LaTeX document syntax.

I've also tried `wkhtmltopdf`, which seems to be a lightweight and easy solution. It seemed to work well except for encoding issues: the Cyrillic text in the resulting PDF is rendered incorrectly. I tried passing `["encoding utf-8"]` as arguments in the `makePDF` call, but this results in a runtime error:
--margin-bottom specified in incorrect location
Googling around this issue gave me a clue: when I pass the encoding argument to `wkhtmltopdf`, it breaks the expected order of arguments in the command that Pandoc generates. This could likely be fixed easily, but Pandoc has a lot of open issues on GitHub, and the fix requires some digging into the `wkhtmltopdf` command-line argument syntax. I've looked into the Pandoc sources and it seems possible to provide a simple patch, but I need some guidance.

`wkhtmltopdf` distinguishes between global args, page args, cover args, and table-of-contents args. `encoding` is a page-level argument, but Pandoc puts the extra args specified in `makePDF` after the default page arguments (`pdfargs` in the following code sample):

    let args = mathArgs ++ concatMap toArgs
                 [("page-size", getField "papersize" meta')
                 ,("title", getField "title" meta')
                 ,("margin-bottom", Just $ fromMaybe "1.2in" (getField "margin-bottom" meta'))
                 ,("margin-top", Just $ fromMaybe "1.25in" (getField "margin-top" meta'))
                 ,("margin-right", Just $ fromMaybe "1.25in" (getField "margin-right" meta'))
                 ,("margin-left", Just $ fromMaybe "1.25in" (getField "margin-left" meta'))
                 ,("footer-html", getField "footer-html" meta')
                 ,("header-html", getField "header-html" meta')
                 ] ++ pdfargs

This is likely what breaks everything. The quickest and dirtiest workaround I see is to check each argument and, if it is a page-level argument, emit it for each page object. Another solution might be to specify the encoding for the Pandoc document some other way, but I can't figure out how to do that yet.

Maybe someone has already faced a similar task and knows an easier way to render HTML to PDF with Haskell. I would be very grateful for any help, advice, or other clues on how to achieve my goal.

Arthur.
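
One way to read the workaround described in the message is to split the options so that global ones always come first and page-level ones follow. The following is only a minimal sketch of that idea, not Pandoc's actual code: `Option`, `pageLevelFlags`, `orderOptions`, and `flatten` are hypothetical names, and the list of page-level flags contains only `--encoding`, the one option the message identifies as page-level.

```haskell
import Data.List (partition)

-- | A command-line option as a (flag, value) pair,
-- for example ("--encoding", "utf-8").
type Option = (String, String)

-- | Flags treated as page-level. ASSUMPTION: this list is illustrative and
-- deliberately incomplete; it would have to be extended from the
-- wkhtmltopdf manual before real use.
pageLevelFlags :: [String]
pageLevelFlags = ["--encoding"]

-- | Put global options first and page-level options last, preserving the
-- relative order inside each group.
orderOptions :: [Option] -> [Option]
orderOptions opts = globals ++ pageLevel
  where
    (pageLevel, globals) = partition ((`elem` pageLevelFlags) . fst) opts

-- | Flatten (flag, value) pairs into a plain argument list.
flatten :: [Option] -> [String]
flatten = concatMap (\(flag, value) -> [flag, value])
```

With this split, a caller could pass user-supplied options in any order and still hand the external program its global options before any page-level ones.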

Could you call wkhtmltopdf directly with System.Process? Pandoc doesn't
seem to add much value here.
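
For readers who want to try this suggestion, here is a rough sketch of calling `wkhtmltopdf` through `System.Process`. It assumes the `wkhtmltopdf` executable is on the PATH and that the HTML has already been written to a file; `wkhtmlArgs` and `htmlFileToPdf` are hypothetical helper names, and the option order (options, then input file, then output file) is an assumption based on common command-line usage rather than something stated in this thread.

```haskell
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- | Build the argument list: the encoding option discussed in the thread,
-- then the input HTML file, then the output PDF file.
wkhtmlArgs :: FilePath -> FilePath -> [String]
wkhtmlArgs input output = ["--encoding", "utf-8", input, output]

-- | Run wkhtmltopdf (ASSUMPTION: it is installed and on the PATH).
-- Returns Left with the captured stderr if the exit code is non-zero.
htmlFileToPdf :: FilePath -> FilePath -> IO (Either String ())
htmlFileToPdf input output = do
  (code, _stdout, err) <-
    readProcessWithExitCode "wkhtmltopdf" (wkhtmlArgs input output) ""
  pure $ case code of
    ExitSuccess   -> Right ()
    ExitFailure _ -> Left err
```

A call such as `htmlFileToPdf "invoice.html" "invoice.pdf"` (file names chosen only for illustration) would then bypass Pandoc's argument handling entirely.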
In reply to Geraldus's message of Tue, Jun 9, 2020, 09:15.

Thank you, Patrick. This is exactly the solution I came up with right after
I sent my message to the Cafe. For some reason I didn't receive a copy of my
own message and wasn't able to respond until someone else responded first.
Many thanks. Indeed, in most cases, if you can ask a question you can find a
solution.
On Tue, Jun 9, 2020 at 13:29, Patrick Chilton wrote:
> Could you call wkhtmltopdf directly with System.Process? Pandoc doesn't seem to add much value here.

Hi Arthur,

Geraldus writes:
> I'm trying to achieve trivial task to generate PDF from HTML template using Pandoc.
> So far I've tried `wkhtmltopdf` and `pdflatex` creators, both with no luck.

I second Patrick's suggestion to use wkhtmltopdf directly. You may also want to post this question to the [pandoc-discuss] mailing list, as this appears to be more of a pandoc than a Haskell question.

Cheers,
Albert

[pandoc-discuss]: https://groups.google.com/forum/#!forum/pandoc-discuss

--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124

With Pandoc, you might try using the ConTeXt engine to generate the PDF
instead of the default pdflatex.
I have no idea about Cyrillic characters or processing HTML directly, but
the ConTeXt engine seemed to work better when I wanted to process Markdown
and generate tagged PDFs for accessibility.
Conrad
H. Conrad Cunningham, D.Sc.
Professor Emeritus
Computer & Information Science
University of Mississippi, USA
In reply to Albert Krewinkel's message of Tue, Jun 9, 2020 at 3:51 AM.
Participants (4): Albert Krewinkel, Conrad Cunningham, Geraldus, Patrick Chilton