
Hi all Recently I wrote a little program that takes text file with ANSI color codes and outputs HTML file. Example input and output files (also attached): http://tener.videomed.com.pl/ansi2html/samo_pk_merusa.log http://tener.videomed.com.pl/ansi2html/samo_pk_merusa.log.html The good thing: it is linear in time. But this is not so much surprising given the problem stated. The bad thing: it is also linear in space and have a feeling that it can run a lot faster. I decided to optimize the code and now I have two things I don't understand and one bug: 1) Profiling shows that very simple functions are source of great memory and time consumption. However, if I turn them off and simply print their input arguments instead, the overall time and memory consumption doesn't change. But now another function is acting badly. My guess: somehow the cost of Parsec code is shifted into whatever function is using it's output. Let's see: doOneFile :: String -> IO () doOneFile fname = do t1 <- getCurrentTime doesFileExist (fname ++ ".html") >>= \b -> if b then hPutStrLn stderr $ printf "File already processed, skipping: %s" fname else do src <- readFile fname out <- openFile (fname ++ ".html") WriteMode let parsed = (parse mainParser fname src) hSetBuffering out (BlockBuffering (Just 64000)) hPutStrLn out "<html>" hPutStrLn out "
" hPutStrLn out "" hPutStrLn out "<span>" execStateT (hPrintHtml (St id)) (out,emptyStyle) -- wypisujemy pierwszy wiersz execStateT (mapM_ *hPrintHtml* (((either (\x -> (trace $ show x) [] ) (id) parsed)) :: [CharOrColor])) (out,emptyStyle) -- *[1]* -- mapM_ (hPutStr out . show) (((either (\x -> (trace $ show x) [] ) (id) ((parse mainParser fname src)))) :: [CharOrColor]) -- *[2]* hPutStrLn out "</span></span>" hPutStrLn out "</body>" hPutStrLn out "</html>" t2 <- getCurrentTime hPutStrLn stderr $ printf "File %s processed. It took %s. File size was %d characters." fname (show $ diffUTCTime t2 t1) (length src) hFlush out hClose out
When profiled: beginSpan Main 32.7 39.7 hPrintHtml Main 21.1 18.4 dataParser Main 17.3 18.6 doOneFile Main 10.4 7.4 colorParser Main 8.7 8.8 justColor Main 3.2 2.0 mainParser Main 3.2 1.9 code0' Main 1.1 1.1 Ok, let's exchange [1] for [2]. Now: doOneFile Main 53.4 59.9 dataParser Main 21.6 21.7 colorParser Main 13.6 10.3 justColor Main 3.8 2.3 mainParser Main 3.6 2.2 code0' Main 1.3 1.2 2) I tried to use both of my processor cores. But simply typing spawning a thread for every file quickly lead to swapping. So instead of this I implemented "thread pool" with fixed number of threads. -- begin code -- type Pool = Chan () takeFromPool :: Pool -> IO () takeFromPool p = readChan p >> return () fillPool :: Pool -> IO () fillPool p = writeChan p () makeThreadPool :: Int -> IO Pool makeThreadPool num = do p <- newChan repeatNum num (fillPool p) return p repeatNum :: Int -> IO () -> IO () repeatNum n act | n > 0 = act >> (repeatNum (n-1) act) | otherwise = return () sparkComp :: Pool -> IO () -> IO (MVar ()) sparkComp pool comp = do takeFromPool pool >>= evaluate mvar <- newEmptyMVar >>= evaluate forkIO $ (comp >> fillPool pool >> putMVar mvar ()) >>= evaluate -- core dumps when changed do forkOS return mvar mapMPar :: (a -> IO ()) -> [a] -> Int -> IO () mapMPar comp lst numT = do tPool <- makeThreadPool numT mvars <- mapM (sparkComp tPool) (map comp lst) mapM_ takeMVar (mvars :: [MVar ()]) return () numThreads = 2 -- end code -- Now, when compiled with -threaded and run with -N2 the program is not faster, but this may be result of -threaded switch. The real puzzle for me: without -threaded switch code using mapMPar is 30% faster then using mapM_, even when numThreads = 1. I have no idea why. 3) When I changed my thread pool implementation to use forkOS instead of forkIO it core dumped when run with -N2 and multiple file input. With -N1 it was ok. Unfortunately I was unable to reproduce this bug after I changed something unrelated to above code. I'm working on GHC version 6.9.20080622, Windows XP. Perhaps someone will succeed reproducing it. Do you have any ideas about this program? Best regards Christopher Skrzętnicki