Converting wiki pages into pdf

8 Sep 2011

Hello all,
I am trying to write a Haskell program that downloads HTML pages from
Wikipedia, including images, and converts them into PDF. I wrote a
small script:

import Network.HTTP
import Data.Maybe
import Data.List

main = do
        x <- getLine
        htmlpage <- getResponseBody =<< simpleHTTP ( getRequest x ) -- open url
        --print.words $ htmlpage
        let ind_1 = fromJust . ( \n -> findIndex ( n `isPrefixOf`) . tails $ htmlpage ) $ "<!-- content -->"
            ind_2 = fromJust . ( \n -> findIndex ( n `isPrefixOf`) . tails $ htmlpage ) $ "<!-- /content -->"
            tmphtml = drop ind_1 $ take ind_2  htmlpage
        writeFile "down.html" tmphtml

and it is working fine, except that some symbols are not rendering as
they should. Could someone please suggest how to accomplish this
task?
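
For reference, the marker-based extraction in the script can be sketched as a pure function. This is only a sketch of the same logic, not a fix for the rendering problem: it returns Nothing when a marker is missing instead of failing the way fromJust does. The names indexOf and between are hypothetical helpers introduced here, not library functions.

```haskell
import Data.List (findIndex, isPrefixOf, tails)

-- Index of the first occurrence of a marker in a string, if any.
-- (indexOf is a hypothetical helper name.)
indexOf :: String -> String -> Maybe Int
indexOf marker = findIndex (marker `isPrefixOf`) . tails

-- Text from the start marker (inclusive) up to the end marker (exclusive),
-- mirroring `drop ind_1 $ take ind_2 htmlpage` in the script.
-- Returns Nothing when either marker is absent.
-- (between is a hypothetical helper name.)
between :: String -> String -> String -> Maybe String
between start end page = do
  i <- indexOf start page
  j <- indexOf end page
  return (drop i (take j page))
```

In the script this would be used as between "<!-- content -->" "<!-- /content -->" htmlpage, with the Nothing case handled before calling writeFile.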

Thank you
Mukesh Tiwari
