Here is the part that actually changed:
-----------
split c str = let (p,ps) = aux str in (p:ps)
where
aux [] = ([],[])
aux (x:cs) = let (xs,xss) = aux cs in
if x == c then ([c],(xs:xss)) else ((x:xs),xss)
splitPred :: (Eq a) => (a -> Bool) -> [a] -> [[a]]
splitPred pr str = let (p,ps) = aux str in (p:ps)
where
aux [] = ([],[])
aux (x:cs) = let (xs,xss) = aux cs in
if pr x then ([],((x:xs):xss)) else ((x:xs),xss)
doOneFile :: String -> IO ()
doOneFile fname = do
t1 <- getCurrentTime
doesFileExist (fname ++ ".html") >>= \b -> if b then hPutStrLn stderr $ printf "File already processed, skipping: %s" fname else do
src <- readFile fname
out <- openFile (fname ++ ".html") WriteMode
hSetBuffering out (BlockBuffering (Just 64000))
hPutStrLn out "<html>"
hPutStrLn out "<body bgcolor=\"black\">"
hPutStrLn out "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">"
hPutStrLn out "<span style=\"font-family: monospace; font-size: 13;\" ><span>"
let extractData = \p -> case p of
Right x -> x
Left err -> (trace . show $ err) []
let srcSplit = splitPred (`elem`"\n") src
let parsed = concatMap (extractData . parse mainParser fname) srcSplit
execStateT (hPrintHtml (St id)) (out,emptyStyle) -- wypisujemy pierwszy wiersz
execStateT (mapM_ hPrintHtml parsed) (out,emptyStyle)
hPutStrLn out "</span></span>"
hPutStrLn out "</body>"
hPutStrLn out "</html>"
t2 <- getCurrentTime
hPutStrLn stderr $ printf "File %s processed. It took %s. File size was %d characters." fname (show $ diffUTCTime t2 t1) (length src)
hClose out
------
The whole file is also attached. You will find there another (worse) implementation of split and a little bit of code similar to thread pool stuff.
On Sun, Jul 20, 2008 at 8:17 AM, John Meacham <
john@repetae.net> wrote:
On Sun, Jul 20, 2008 at 02:34:09AM +0400, Bulat Ziganshin wrote:
> i think that Parsec library should hold entire file in memory only when
> you use 'try' for whole file. otherwise it should omit data as
> proceeded
I do not believe that is the case, since the return type of runParser
"Either ParseError a" means that before you can extract the result of
the parse from the 'Right' branch, it must evaluate whether the result
is 'Left' or 'Right' meaning it needs to parse the whole input in order
to determine whether the parse was succesful.
It's true it has to parse the whole file, but it is not true it has to
reside in the memory: only the results must be there. In this case,
when the result is 1-1 transformation of input, it is true. But
consider this program:
----
module Main where
import Text.ParserCombinators.Parsec
par = eof <|> (char 'a' >> par)
alst = take 200000000 (repeat 'a')
main = print (runParser par () "" alst)
----
It runs in constant memory:
$ ./partest.exe +RTS -sstderr
C:\cygwin\home\Metharius\killer\killerPy\ansi2html\partest.exe +RTS -sstderr
Right ()
84,326,845,636 bytes allocated in the heap
22,428,536 bytes copied during GC
9,684 bytes maximum residency (1 sample(s))
13,848 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 160845 collections, 0 parallel, 0.63s, 0.63s elapsed
Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed
INIT time 0.02s ( 0.00s elapsed)
MUT time 54.31s ( 54.55s elapsed)
GC time 0.63s ( 0.63s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 54.95s ( 55.17s elapsed)
%GC time 1.1% (1.1% elapsed)
Alloc rate 1,552,176,623 bytes per MUT second
Productivity 98.8% of total user, 98.4% of total elapsed
Best regards
Christopher Skrzętnicki