Dear all,
I'm trying to implement a simple file-based database. I apparently have
a space leak, but I have no clue where it comes from.
Here's the file-based database implementation:
http://pastebin.com/QqiqcXFw
The idea is to have a database table in a single textual file. One line
equals one table row. The fields within a row are whitespace separated.
The first field is the key. Because I'd like to work with large files, I
don't want to load the whole file into memory. Instead, I'd like to be
able to fetch the rows on demand, by keys. Thus I first create an index
that maps keys to file offsets. I use ReaderT to add the index to the
IO monad.
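For readers who cannot open the pastebin link, the design described above can be sketched roughly as follows. This is not the pastebin code: the names Table, buildIndex, openIndexed, fetchRow and runSketch are hypothetical, and it uses strict ByteString instead of Data.Text so that byte offsets are easy to track. It also assumes "\n" line endings.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Reader (ReaderT, runReaderT, asks)
import Data.Char (isSpace)
import System.IO

-- Maps each key (first field of a line) to the byte offset of its line.
type Index = M.Map B.ByteString Integer

-- Hypothetical table handle: the open file plus its index.
data Table = Table { tableHandle :: Handle, tableIndex :: Index }

-- The index is made available to IO actions through ReaderT.
type DB = ReaderT Table IO

-- Scan the file once, recording where every line starts.
buildIndex :: Handle -> IO Index
buildIndex h = go 0 M.empty
  where
    go !off !idx = do
      eof <- hIsEOF h
      if eof
        then return idx
        else do
          line <- B.hGetLine h
          -- B.copy detaches the key from the buffer holding the whole line.
          let key  = B.copy (B.takeWhile (not . isSpace) line)
              -- +1 for the '\n' that hGetLine strips (assumes "\n" endings).
              off' = off + fromIntegral (B.length line) + 1
          go off' (M.insert key off idx)

-- Open the file in binary mode and index it.
openIndexed :: FilePath -> IO Table
openIndexed path = do
  h   <- openBinaryFile path ReadMode
  idx <- buildIndex h
  return (Table h idx)

-- Seek to the stored offset and read just that one row.
fetchRow :: B.ByteString -> DB (Maybe [B.ByteString])
fetchRow key = do
  h   <- asks tableHandle
  idx <- asks tableIndex
  case M.lookup key idx of
    Nothing  -> return Nothing
    Just off -> liftIO $ do
      hSeek h AbsoluteSeek off
      fmap (Just . B.words) (B.hGetLine h)

-- Run a DB action against an opened table.
runSketch :: Table -> DB a -> IO a
runSketch t action = runReaderT action t
```

With this sketch, only the keys and offsets stay in memory; a row is read from disk each time fetchRow is called and can be collected once the caller is done with it.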
For testing, I use a dummy table produced as follows:
import System.IO
import Text.Printf
import Control.Monad

row = unwords [printf "field%03d" (i::Int) | i <- [1..999]]

main = do
  forM_ [1..250000] $ \i ->
    putStrLn $ printf "row%06d %s" (i::Int) row
This generates a 2.1G textual file, which I store on my disk.
The testing code:
import FileDB
import qualified Data.Text as T
import Text.Printf
import Control.Applicative
import Control.Monad
import Control.Monad.Trans
import System.IO
import System.Environment

main = do
  (f:_) <- getArgs
  t <- openTable f
  runDB t $ do
    ks <- getKeys
    liftIO $ do
      putStrLn . printf "%d keys read" $ length ks
      putStrLn "Press any key to continue..."
      getChar
    forM_ ks $ \k -> do
      Just r <- getRow k
      liftIO . putStrLn $ printf "Row \"%s\" has %d fields"
        (T.unpack k) (length r)
When I run the test on the 2.1GB file, the whole program consumes 10GB.
6GB seem to be allocated after the index is built (just before entering
the forM_ function). The remaining 4GB are allocated while fetching all
the rows.
I find both things difficult to explain.
6GB seems too much for the index. Each key is 9 characters (stored as
Data.Text), and I have 250K such keys in a Data.Map. Should this really
add up to 6GB?
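As a rough sanity check, a back-of-envelope estimate of the index size can be written down as below. Every constant is an assumption rather than a measurement: 64-bit GHC with 8-byte words, keys stored as UTF-16 (as in text versions before 2.0), one Data.Map node per key, and a boxed Int file offset as the value (the actual value type in the pastebin code may differ). The names wordsPerEntry and indexBytes are made up for this sketch.

```haskell
-- Back-of-envelope size of the index; every constant below is an assumption.
wordBytes :: Int
wordBytes = 8  -- assuming 64-bit GHC

-- Heap words per index entry under the assumptions stated above.
wordsPerEntry :: Int
wordsPerEntry = mapNode + textHeader + arrayHeader + keyPayload + boxedOffset
  where
    mapNode     = 6  -- Bin: header, size, key, value, left, right
    textHeader  = 4  -- Text: header, array, offset, length
    arrayHeader = 2  -- ByteArray#: header, byte count
    keyPayload  = (9 * 2 + wordBytes - 1) `div` wordBytes  -- 9 chars, 2 bytes each, rounded up to words
    boxedOffset = 2  -- boxed Int: header, value

-- Estimated bytes for an index holding the given number of keys.
indexBytes :: Int -> Int
indexBytes nKeys = nKeys * wordsPerEntry * wordBytes
```

Under these assumptions, indexBytes 250000 comes to 34,000,000 bytes, i.e. tens of megabytes rather than gigabytes, provided each key owns only its own 9 characters.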
Also, I have no idea why fetching all the rows, one by one, should
consume any additional memory. Each row is fetched and its length is
computed and printed out. I see no reason for the rows to be retained in
the memory.
Here's the memory allocation summary:
> 1,093,931,338,632 bytes allocated in the heap
> 2,225,144,704 bytes copied during GC
> 4,533,898,000 bytes maximum residency (26 sample(s))
> 3,080,926,336 bytes maximum slop
> 10004 MB total memory in use (0 MB lost due to fragmentation)
>
>                                     Tot time (elapsed)  Avg pause  Max pause
>   Gen  0   2171739 colls,     0 par   45.29s   45.26s     0.0000s    0.0030s
>   Gen  1        26 colls,     0 par    1.50s    1.53s     0.0589s    0.7087s
>
> INIT time 0.00s ( 0.00s elapsed)
> MUT time 279.92s (284.85s elapsed)
> GC time 46.80s ( 46.79s elapsed)
> EXIT time 0.68s ( 0.71s elapsed)
> Total time 327.40s (332.35s elapsed)
>
> %GC time 14.3% (14.1% elapsed)
>
> Alloc rate 3,908,073,170 bytes per MUT second
>
> Productivity 85.7% of total user, 84.4% of total elapsed
By the way, I don't understand the "bytes allocated in the heap" figure,
which is approx. 1000 GB (?).
I'm obviously doing something wrong here. I'd be thankful for any help.
Best,
Jan
_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners