Hi all

Recently I wrote a little program that takes a text file with ANSI color codes and outputs an HTML file. Example input and output files (also attached):

http://tener.videomed.com.pl/ansi2html/samo_pk_merusa.log
http://tener.videomed.com.pl/ansi2html/samo_pk_merusa.log.html

The good thing: it is linear in time, but that is not very surprising given the problem. The bad thing: it is also linear in space, and I have a feeling that it could run a lot faster.
I decided to optimize the code and now I have two things I don't understand and one bug:

1) Profiling shows that very simple functions are the source of most of the memory and time consumption. However, if I turn them off and simply print their input arguments instead, the overall time and memory consumption doesn't change; another function just starts behaving badly instead. My guess: somehow the cost of the Parsec code is shifted into whatever function consumes its output. Let's see:

doOneFile :: String -> IO ()
doOneFile fname = do
  t1 <- getCurrentTime
  doesFileExist (fname ++ ".html") >>= \b -> if b then hPutStrLn stderr $ printf "File already processed, skipping: %s" fname else do
    src <- readFile fname
    out <- openFile (fname ++ ".html") WriteMode
    let parsed = (parse mainParser fname src)
    hSetBuffering out (BlockBuffering (Just 64000))
    hPutStrLn out "<html>"
    hPutStrLn out "<body bgcolor=\"black\">"
    hPutStrLn out "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">"
    hPutStrLn out "<span style=\"font-family: monospace; font-size: 13;\" ><span>"
    execStateT (hPrintHtml (St id)) (out,emptyStyle) -- print the first line
    execStateT (mapM_ hPrintHtml (((either (\x -> (trace $ show x) [] ) (id) parsed)) :: [CharOrColor])) (out,emptyStyle) -- [1]
--    mapM_ (hPutStr out . show) (((either (\x -> (trace $ show x) [] ) (id) ((parse mainParser fname src)))) :: [CharOrColor]) -- [2]
    hPutStrLn out "</span></span>"
    hPutStrLn out "</body>"
    hPutStrLn out "</html>"
    t2 <- getCurrentTime
    hPutStrLn stderr $ printf "File %s processed. It took %s. File size was %d characters." fname (show $ diffUTCTime t2 t1) (length src)
    hFlush out
    hClose out

When profiled (columns: cost centre, module, %time, %alloc):

beginSpan                      Main                  32.7   39.7
hPrintHtml                     Main                  21.1   18.4
dataParser                     Main                  17.3   18.6
doOneFile                      Main                  10.4    7.4
colorParser                    Main                   8.7    8.8
justColor                      Main                   3.2    2.0
mainParser                     Main                   3.2    1.9
code0'                         Main                   1.1    1.1

Ok, let's exchange [1] for [2]. Now:

doOneFile                      Main                  53.4   59.9
dataParser                     Main                  21.6   21.7
colorParser                    Main                  13.6   10.3
justColor                      Main                   3.8    2.3
mainParser                     Main                   3.6    2.2
code0'                         Main                   1.3    1.2
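
So the cost does not disappear, it just shows up in doOneFile once that is the code consuming the parser output. One way I could test this guess (just a sketch, untested; it assumes Control.Exception.evaluate is in scope and CharOrColor as above, and would replace the `let parsed = ...` line and line [1]) is to force the parse result before any printing, so the parsing work happens in one known place:

    let parsedList = either (\e -> trace (show e) []) id
                            (parse mainParser fname src) :: [CharOrColor]
    _ <- evaluate (length parsedList) -- forcing the spine runs the whole parser here
    execStateT (mapM_ hPrintHtml parsedList) (out,emptyStyle)

If the big numbers for beginSpan and hPrintHtml then drop in variant [1], that would support the guess.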


2) I tried to use both of my processor cores. Simply spawning a thread for every file quickly led to swapping, so instead I implemented a "thread pool" with a fixed number of threads.

-- begin code --
type Pool = Chan ()

takeFromPool :: Pool -> IO ()
takeFromPool p = readChan p >> return ()

fillPool :: Pool -> IO ()
fillPool p = writeChan p ()

makeThreadPool :: Int -> IO Pool
makeThreadPool num = do
  p <- newChan
  repeatNum num (fillPool p)
  return p

repeatNum :: Int -> IO () -> IO ()
repeatNum n act | n > 0     = act >> (repeatNum (n-1) act)
                | otherwise = return ()

sparkComp :: Pool -> IO () -> IO (MVar ())
sparkComp pool comp = do
  takeFromPool pool  >>= evaluate
  mvar <- newEmptyMVar  >>= evaluate
  forkIO $ (comp >> fillPool pool >> putMVar mvar ()) >>= evaluate -- core dumps when changed to forkOS
  return mvar

mapMPar :: (a -> IO ()) -> [a] -> Int -> IO ()
mapMPar comp lst numT = do
  tPool <- makeThreadPool numT
  mvars <- mapM (sparkComp tPool) (map comp lst)
  mapM_ takeMVar (mvars :: [MVar ()])
  return ()

numThreads = 2
-- end code --
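
For reference, this is roughly how the pool gets used (a sketch only; it assumes the file names arrive as command-line arguments, hence getArgs):

import System.Environment (getArgs)

main :: IO ()
main = do
  files <- getArgs                     -- one file name per argument
  mapMPar doOneFile files numThreads   -- at most numThreads files in flight at once

The Chan of () is really just a counting semaphore here, so Control.Concurrent.QSem should be an equivalent off-the-shelf replacement.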

Now, when compiled with -threaded and run with +RTS -N2, the program is no faster, but that may be an effect of the -threaded switch itself. The real puzzle for me: without the -threaded switch, the code using mapMPar is 30% faster than the code using mapM_, even when numThreads = 1. I have no idea why.
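
In other words, the two runs differ only in something like this (sketch; files as in the usage sketch above):

  mapM_   doOneFile files      -- plain sequential version
  mapMPar doOneFile files 1    -- pool with a single worker; somehow ~30% faster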

3) When I changed my thread pool implementation to use forkOS instead of forkIO, it core dumped when run with -N2 on multiple input files. With -N1 it was OK. Unfortunately, I was unable to reproduce the bug after I changed something unrelated to the code above. I'm working with GHC version 6.9.20080622 on Windows XP. Perhaps someone will succeed in reproducing it.


Do you have any ideas about this program?

Best regards

Christopher Skrzętnicki