Hi all
Recently I wrote a little program that takes text file with ANSI color codes and outputs HTML file. Example input and output files (also attached):
http://tener.videomed.com.pl/ansi2html/samo_pk_merusa.log
http://tener.videomed.com.pl/ansi2html/samo_pk_merusa.log.html
The good thing: it is linear in time. But this is not so much surprising given the problem stated. The bad thing: it is also linear in space and have a feeling that it can run a lot faster.
I decided to optimize the code and now I have two things I don't understand and one bug:
1) Profiling shows that very simple functions are source of great memory and time consumption. However, if I turn them off and simply print their input arguments instead, the overall time and memory consumption doesn't change. But now another function is acting badly. My guess: somehow the cost of Parsec code is shifted into whatever function is using it's output. Let's see:
doOneFile :: String -> IO ()
doOneFile fname = do
t1 <- getCurrentTime
doesFileExist (fname ++ ".html") >>= \b -> if b then hPutStrLn stderr $ printf "File already processed, skipping: %s" fname else do
src <- readFile fname
out <- openFile (fname ++ ".html") WriteMode
let parsed = (parse mainParser fname src)
hSetBuffering out (BlockBuffering (Just 64000))
hPutStrLn out "<html>"
hPutStrLn out "<body bgcolor=\"black\">"
hPutStrLn out "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">"
hPutStrLn out "<span style=\"font-family: monospace; font-size: 13;\" ><span>"
execStateT (hPrintHtml (St id)) (out,emptyStyle) -- wypisujemy pierwszy wiersz
execStateT (mapM_ hPrintHtml (((either (\x -> (trace $ show x) [] ) (id) parsed)) :: [CharOrColor])) (out,emptyStyle) -- [1]
-- mapM_ (hPutStr out . show) (((either (\x -> (trace $ show x) [] ) (id) ((parse mainParser fname src)))) :: [CharOrColor]) -- [2]
hPutStrLn out "</span></span>"
hPutStrLn out "</body>"
hPutStrLn out "</html>"
t2 <- getCurrentTime
hPutStrLn stderr $ printf "File %s processed. It took %s. File size was %d characters." fname (show $ diffUTCTime t2 t1) (length src)
hFlush out
hClose out
When profiled:
beginSpan Main 32.7 39.7
hPrintHtml Main 21.1 18.4
dataParser Main 17.3 18.6
doOneFile Main 10.4 7.4
colorParser Main 8.7 8.8
justColor Main 3.2 2.0
mainParser Main 3.2 1.9
code0' Main 1.1 1.1
Ok, let's exchange [1] for [2]. Now:
doOneFile Main 53.4 59.9
dataParser Main 21.6 21.7
colorParser Main 13.6 10.3
justColor Main 3.8 2.3
mainParser Main 3.6 2.2
code0' Main 1.3 1.2
2) I tried to use both of my processor cores. But simply typing spawning a thread for every file quickly lead to swapping. So instead of this I implemented "thread pool" with fixed number of threads.
-- begin code --
type Pool = Chan ()
takeFromPool :: Pool -> IO ()
takeFromPool p = readChan p >> return ()
fillPool :: Pool -> IO ()
fillPool p = writeChan p ()
makeThreadPool :: Int -> IO Pool
makeThreadPool num = do
p <- newChan
repeatNum num (fillPool p)
return p
repeatNum :: Int -> IO () -> IO ()
repeatNum n act | n > 0 = act >> (repeatNum (n-1) act)
| otherwise = return ()
sparkComp :: Pool -> IO () -> IO (MVar ())
sparkComp pool comp = do
takeFromPool pool >>= evaluate
mvar <- newEmptyMVar >>= evaluate
forkIO $ (comp >> fillPool pool >> putMVar mvar ()) >>= evaluate -- core dumps when changed do forkOS
return mvar
mapMPar :: (a -> IO ()) -> [a] -> Int -> IO ()
mapMPar comp lst numT = do
tPool <- makeThreadPool numT
mvars <- mapM (sparkComp tPool) (map comp lst)
mapM_ takeMVar (mvars :: [MVar ()])
return ()
numThreads = 2
-- end code --
Now, when compiled with -threaded and run with -N2 the program is not faster, but this may be result of -threaded switch. The real puzzle for me: without -threaded switch code using mapMPar is 30% faster then using mapM_, even when numThreads = 1. I have no idea why.
3) When I changed my thread pool implementation to use forkOS instead of forkIO it core dumped when run with -N2 and multiple file input. With -N1 it was ok. Unfortunately I was unable to reproduce this bug after I changed something unrelated to above code. I'm working on GHC version 6.9.20080622, Windows XP. Perhaps someone will succeed reproducing it.
Do you have any ideas about this program?
Best regards
Christopher Skrzêtnicki