filesystem verification utility

Hi, I had a requirement to generate various kinds of I/O patterns on a filesystem and subsequently verify this. The initial version of the program below implements a random I/O pattern with multiple threads. Even when the number of I/O is as small as 200 and 10 concurrent theads, the amount of memory used is huge. When I run the program it consumes close to 1to2GB memory. Moreover the rate at which it generates the I/O is very low which is not good for testing a filesystem. I have used System.POSIX.IO but I tried System.IO and did not see much difference either. I would appreciate help in identifying ways to improve this. {-# OPTIONS -fglasgow-exts #-} import System.Random import Data.List import Monad import System.Posix.IO import System.Posix.Types import Data.Time.Clock import System.Posix.Files import Data.Maybe import GHC.IO.Device (SeekMode(..)) import Control.Exception import Data.Typeable import System import Control.Concurrent --myrandomRlist :: (Num t, Random t) => t -> IO [t] --myrandomRlist x = liftM (randomRs (0,x)) newStdGen myrandomRlist :: (Num t, Random t) => t ->t -> StdGen -> [t] myrandomRlist min max seed = randomRs (min,max) seed data IoLoc = IoLoc {offset :: FileOffset, size::ByteCount, num::Int } deriving (Show, Typeable) instance Exception IoLoc data Corrupt = Corrupt String IoLoc deriving (Show, Typeable) instance Exception Corrupt data FileHdr = FileHdr {fileName::FilePath, seed::StdGen, minIoSize::Int, maxIoSize::Int, ioCount::Int} deriving (Show) data FileIO = FileIO {fd::Fd, params::FileHdr, fileData::[IoLoc]} genPattern :: FilePath -> IoLoc -> String genPattern f l = take (read $ show $size l) (cycle $ "(" ++ f ++ ")" ++ "[" ++ (show (offset l)) ++ ":" ++ (show (size l)) ++ (show (num l)) ++ "]") doHdrRead :: Fd -> IO [Char] doHdrRead fd = do (str, count) <- fdRead fd 100 return (takeWhile (\x-> not (x == '\n')) str) doGetHdr :: Fd -> IO FileHdr doGetHdr fd = do file <- doHdrRead fd seed <- doHdrRead fd minIoSize <- doHdrRead fd maxIoSize <- doHdrRead fd ioCount <- doHdrRead fd return (FileHdr file (read seed) (read minIoSize) (read maxIoSize) (read ioCount)) doHdrFromFile :: FilePath -> IO FileIO doHdrFromFile name = do fd <- openFd name ReadWrite (Just ownerModes) (OpenFileFlags {append=False, exclusive=False, noctty=True, nonBlock=False, trunc=False}) hdr <- doGetHdr fd return (FileIO fd hdr []) doHdrWrite :: Fd -> [Char] -> IO ByteCount doHdrWrite fd str = do fdWrite fd (take 100 (str ++ "\n" ++ (cycle ['\0']))) doWriteHdr :: Fd -> FileHdr -> IO ByteCount doWriteHdr fd hdr = do doHdrWrite fd (show (fileName hdr)) doHdrWrite fd (show (seed hdr)) doHdrWrite fd (show (minIoSize hdr)) doHdrWrite fd (show (maxIoSize hdr)) doHdrWrite fd (show (ioCount hdr)) doWriteFile :: FilePath -> IO FileHdr doWriteFile name = do fd <- openFd name ReadWrite (Just ownerModes) (OpenFileFlags {append=False, exclusive=False, noctty=True, nonBlock=False, trunc=True}) seed <- newStdGen hdr <- return (FileHdr name seed 4096 (2*8096) 200) doWriteHdr fd hdr return hdr overLap (IoLoc off1' sz1' num1) (IoLoc off2' sz2' num2) = ((off1 > off2) && (off1 < off2 +sz2)) ||((off1+sz1 >off2) &&(off1+sz1 < off2 +sz2)) where off1 = read (show off1') sz1 = read (show sz1') off2 = read (show off2') sz2 = read (show sz2') genIoList1 :: [IoLoc] -> [Int] -> [Int] -> Int -> Int -> [IoLoc] genIoList1 list offset size _ 0 = list genIoList1 list offset size 0 _ = list genIoList1 list (offset:os) (size:ss) count bound = if isNothing $ find (overLap x) list then genIoList1 ([x] ++ list) os ss (count - 1) (bound -1) else genIoList1 list os ss count (bound -1) where x = IoLoc (read (show offset)) (read(show size)) count -- offset = (read (show offset1)) -- size = (read(show size1)) genIoList :: FileHdr -> [IoLoc] genIoList (FileHdr name seed min max count) = genIoList1 [] (myrandomRlist 4000 1099511627776 seed) (myrandomRlist min max seed) count (count*2) doActualIo :: Fd -> IoLoc -> String -> IO () doActualIo fd (IoLoc off sz num) str = do off <- fdSeek fd AbsoluteSeek off fdWrite fd str return () doVerifyIo :: Fd -> IoLoc -> String -> IO () doVerifyIo fd (IoLoc off sz num) str = do off <- fdSeek fd AbsoluteSeek off (filedata, count) <- fdRead fd sz if str == filedata then return () else throwIO (Corrupt ("Data corruption in #" ++ (take 200 str) ++ "#" ++ (take 200 filedata))(IoLoc off sz num)) mainWrite file = do hdr <- doWriteFile file hdrIO <- doHdrFromFile file iolst <- return $ genIoList hdr app <- return $ zip iolst (map (genPattern (fileName hdr)) iolst) putStrLn (fileName hdr) mapM (\(x,y)-> doActualIo (fd hdrIO) x y) app mapM (\(x,y)-> doVerifyIo (fd hdrIO) x y) app return hdr mainVerify file = do hdrIO <- doHdrFromFile file hdr <- return (params hdrIO) iolst <- return $ genIoList hdr file <- return $ filter (\x->not ( x=='"')) (fileName hdr) app <- return $ zip iolst (map (genPattern file) iolst) putStrLn $ filter (\x->not ( x=='"')) (fileName hdr) mapM (\(x,y)-> doVerifyIo (fd hdrIO) x y) app return hdr main = do x <- getArgs if (length x) == 3 then do main1 else do putStrLn "USAGE:\nfile-io write/verify <full-path file name> <number of threads> \ \\n\n\ \Simple IO load generator with write verification. This utility is designed\n\ \to generate multi-threaded IO load which will write a pattern to the file. \n\ \When this is invoked with the same parameters with the verify option \n\ \the data written will be verified.\n\ \ " main1 = do [op, name, numProcs] <- getArgs m <- newEmptyMVar n <- return $read numProcs case op of "write" -> do mapM forkIO [(fillfile (name ++ (show i)) ) m|i<-[1..n]] "verify" -> do mapM forkIO [(verifyfile (name ++ (show i)) ) m|i<-[1..n]] x <- mapM takeMVar $ take (read numProcs) $ repeat m putStrLn $ show $ and x fillfile filename m = do mainWrite filename putMVar m True verifyfile filename m = do mainVerify filename putMVar m True

I don't know much, but you're using show, read, and String. These may be
part of your problem.
On Mon, Jan 10, 2011 at 2:32 PM, Anand Mitra
Hi,
I had a requirement to generate various kinds of I/O patterns on a filesystem and subsequently verify this. The initial version of the program below implements a random I/O pattern with multiple threads. Even when the number of I/O is as small as 200 and 10 concurrent theads, the amount of memory used is huge. When I run the program it consumes close to 1to2GB memory. Moreover the rate at which it generates the I/O is very low which is not good for testing a filesystem. I have used System.POSIX.IO but I tried System.IO and did not see much difference either. I would appreciate help in identifying ways to improve this.
{-# OPTIONS -fglasgow-exts #-} import System.Random import Data.List import Monad import System.Posix.IO import System.Posix.Types import Data.Time.Clock import System.Posix.Files import Data.Maybe import GHC.IO.Device (SeekMode(..)) import Control.Exception import Data.Typeable import System import Control.Concurrent
--myrandomRlist :: (Num t, Random t) => t -> IO [t] --myrandomRlist x = liftM (randomRs (0,x)) newStdGen
myrandomRlist :: (Num t, Random t) => t ->t -> StdGen -> [t] myrandomRlist min max seed = randomRs (min,max) seed
data IoLoc = IoLoc {offset :: FileOffset, size::ByteCount, num::Int } deriving (Show, Typeable) instance Exception IoLoc
data Corrupt = Corrupt String IoLoc deriving (Show, Typeable) instance Exception Corrupt
data FileHdr = FileHdr {fileName::FilePath, seed::StdGen, minIoSize::Int, maxIoSize::Int, ioCount::Int} deriving (Show)
data FileIO = FileIO {fd::Fd, params::FileHdr, fileData::[IoLoc]}
genPattern :: FilePath -> IoLoc -> String genPattern f l = take (read $ show $size l) (cycle $ "(" ++ f ++ ")" ++ "[" ++ (show (offset l)) ++ ":" ++ (show (size l)) ++ (show (num l)) ++ "]")
doHdrRead :: Fd -> IO [Char] doHdrRead fd = do (str, count) <- fdRead fd 100 return (takeWhile (\x-> not (x == '\n')) str)
doGetHdr :: Fd -> IO FileHdr doGetHdr fd = do file <- doHdrRead fd seed <- doHdrRead fd minIoSize <- doHdrRead fd maxIoSize <- doHdrRead fd ioCount <- doHdrRead fd return (FileHdr file (read seed) (read minIoSize) (read maxIoSize) (read ioCount))
doHdrFromFile :: FilePath -> IO FileIO doHdrFromFile name = do fd <- openFd name ReadWrite (Just ownerModes) (OpenFileFlags {append=False, exclusive=False, noctty=True, nonBlock=False, trunc=False}) hdr <- doGetHdr fd return (FileIO fd hdr [])
doHdrWrite :: Fd -> [Char] -> IO ByteCount doHdrWrite fd str = do fdWrite fd (take 100 (str ++ "\n" ++ (cycle ['\0'])))
doWriteHdr :: Fd -> FileHdr -> IO ByteCount doWriteHdr fd hdr = do doHdrWrite fd (show (fileName hdr)) doHdrWrite fd (show (seed hdr)) doHdrWrite fd (show (minIoSize hdr)) doHdrWrite fd (show (maxIoSize hdr)) doHdrWrite fd (show (ioCount hdr))
doWriteFile :: FilePath -> IO FileHdr doWriteFile name = do fd <- openFd name ReadWrite (Just ownerModes) (OpenFileFlags {append=False, exclusive=False, noctty=True, nonBlock=False, trunc=True}) seed <- newStdGen hdr <- return (FileHdr name seed 4096 (2*8096) 200) doWriteHdr fd hdr return hdr
overLap (IoLoc off1' sz1' num1) (IoLoc off2' sz2' num2) = ((off1 > off2) && (off1 < off2 +sz2)) ||((off1+sz1 >off2) &&(off1+sz1 < off2 +sz2)) where off1 = read (show off1') sz1 = read (show sz1') off2 = read (show off2') sz2 = read (show sz2')
genIoList1 :: [IoLoc] -> [Int] -> [Int] -> Int -> Int -> [IoLoc]
genIoList1 list offset size _ 0 = list genIoList1 list offset size 0 _ = list
genIoList1 list (offset:os) (size:ss) count bound = if isNothing $ find (overLap x) list then genIoList1 ([x] ++ list) os ss (count - 1) (bound -1) else genIoList1 list os ss count (bound -1) where x = IoLoc (read (show offset)) (read(show size)) count -- offset = (read (show offset1)) -- size = (read(show size1))
genIoList :: FileHdr -> [IoLoc] genIoList (FileHdr name seed min max count) = genIoList1 [] (myrandomRlist 4000 1099511627776 seed) (myrandomRlist min max seed) count (count*2)
doActualIo :: Fd -> IoLoc -> String -> IO () doActualIo fd (IoLoc off sz num) str = do off <- fdSeek fd AbsoluteSeek off fdWrite fd str return ()
doVerifyIo :: Fd -> IoLoc -> String -> IO () doVerifyIo fd (IoLoc off sz num) str = do off <- fdSeek fd AbsoluteSeek off (filedata, count) <- fdRead fd sz if str == filedata then return () else throwIO (Corrupt ("Data corruption in #" ++ (take 200 str) ++ "#" ++ (take 200 filedata))(IoLoc off sz num))
mainWrite file = do hdr <- doWriteFile file hdrIO <- doHdrFromFile file iolst <- return $ genIoList hdr app <- return $ zip iolst (map (genPattern (fileName hdr)) iolst) putStrLn (fileName hdr) mapM (\(x,y)-> doActualIo (fd hdrIO) x y) app mapM (\(x,y)-> doVerifyIo (fd hdrIO) x y) app return hdr
mainVerify file = do hdrIO <- doHdrFromFile file hdr <- return (params hdrIO) iolst <- return $ genIoList hdr file <- return $ filter (\x->not ( x=='"')) (fileName hdr) app <- return $ zip iolst (map (genPattern file) iolst) putStrLn $ filter (\x->not ( x=='"')) (fileName hdr) mapM (\(x,y)-> doVerifyIo (fd hdrIO) x y) app return hdr
main = do x <- getArgs if (length x) == 3 then do main1 else do putStrLn "USAGE:\nfile-io write/verify <full-path file name> <number of threads> \ \\n\n\ \Simple IO load generator with write verification. This utility is designed\n\ \to generate multi-threaded IO load which will write a pattern to the file. \n\ \When this is invoked with the same parameters with the verify option \n\ \the data written will be verified.\n\ \ "
main1 = do [op, name, numProcs] <- getArgs m <- newEmptyMVar n <- return $read numProcs case op of "write" -> do mapM forkIO [(fillfile (name ++ (show i)) ) m|i<-[1..n]] "verify" -> do mapM forkIO [(verifyfile (name ++ (show i)) ) m|i<-[1..n]] x <- mapM takeMVar $ take (read numProcs) $ repeat m putStrLn $ show $ and x
fillfile filename m = do mainWrite filename putMVar m True
verifyfile filename m = do mainVerify filename putMVar m True
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners
-- Alex R

On 10 January 2011 22:26, Stephen Tetley
And append (++) inside the genPattern function...
Changing to use "functional strings" which have the ShowS type is the obvious way to avoid the cost of (++) append. There is a library on Hackage that provides many utility functions for a wrapped ShowS type and an IsString instance so you can still use string literals. http://hackage.haskell.org/package/dstring However you might be better using one of the alternative String packages such as Text or ByteString.

Based on feedback I inferred that the huge memory usage was mostly
because of the String handling in the patern generation. To make it
more efficient I have used Data.ByteString.Lazy.Char8
but now I get the following error when I execute
stress: tmp/asdf-1: hPutBuf: illegal operation (handle is closed)
stress: tmp/asdf-3: hPutBuf: illegal operation (handle is closed)
stress: tmp/asdf-5: hPutBuf: illegal operation (handle is closed)
stress: tmp/asdf-2: hPutBuf: illegal operation (handle is closed)
stress: tmp/asdf-7: hPutBuf: illegal operation (handle is closed)
stress: tmp/asdf-10: hPutBuf: illegal operation (handle is closed)
I have explicitly remove all calls to close despite this I get this
error. I tried both version of seek the fdSeek as well as hSeek. The
documentation on hSeek had a confusing comment that "The offset i is
given in terms of 8-bit bytes"
At this point I am unable to understand how the handle is getting
explicitly closed.
{-# OPTIONS -fglasgow-exts #-}
import Data.Int
import qualified Data.ByteString.Lazy.Char8 as L
import System.Random
import Data.List
import Monad
import System.Posix.IO
import System.Posix.Types
import Data.Time.Clock
import System.Posix.Files
import Data.Maybe
import GHC.IO.Device (SeekMode(..))
import Control.Exception
import Data.Typeable
import System
import Control.Concurrent
import System.IO
--myrandomRlist :: (Num t, Random t) => t -> IO [t]
--myrandomRlist x = liftM (randomRs (0,x)) newStdGen
myrandomRlist :: (Num t, Random t) => t ->t -> StdGen -> [t]
myrandomRlist min max seed = randomRs (min,max) seed
data IoLoc = IoLoc {offset ::FileOffset, size::Int, num::Int }
deriving (Show, Typeable)
instance Exception IoLoc
data Corrupt = Corrupt String IoLoc
deriving (Show, Typeable)
instance Exception Corrupt
data FileHdr = FileHdr {fileName::FilePath, seed::StdGen,
minIoSize::Int, maxIoSize::Int, ioCount::Int}
deriving (Show)
data FileIO = FileIO {fd::Handle, params::FileHdr, fileData::[IoLoc]}
check hdl = do
closed <- hIsClosed hdl
if closed
then do
putStrLn $ "file handle was closed" ++ (show hdl)
else do
return ()
genPattern :: FilePath -> IoLoc -> L.ByteString
genPattern f l =
L.take (fromIntegral (size l)) (L.cycle $ L.pack ("(" ++ f ++
")" ++ "[" ++ (show (offset l)) ++ ":" ++ (show (size l)) ++ (show
(num l)) ++ "]"))
doHdrRead :: Handle -> IO [Char]
doHdrRead x = do
fd <- handleToFd x
(str, count) <- fdRead fd 100
return (takeWhile (\x-> not (x == '\n')) str)
doGetHdr :: Handle -> IO FileHdr
doGetHdr fd = do
file <- doHdrRead fd
seed <- doHdrRead fd
minIoSize <- doHdrRead fd
maxIoSize <- doHdrRead fd
ioCount <- doHdrRead fd
return (FileHdr file (read seed) (read minIoSize) (read maxIoSize)
(read ioCount))
doHdrFromFile :: Handle -> IO FileIO
doHdrFromFile fd = do
-- fd <- openBinaryFile name ReadWriteMode -- (Just ownerModes)
(OpenFileFlags {append=False, exclusive=False, noctty=True,
nonBlock=False, trunc=False})
hSeek fd AbsoluteSeek 0
hdr <- doGetHdr fd
return (FileIO fd hdr [])
doHdrWrite :: Handle -> [Char] -> IO ()
doHdrWrite fd str = do
hPutStr fd (take 100 (str ++ "\n" ++ (cycle ['\0'])))
doWriteHdr :: Handle -> FileHdr -> IO ()
doWriteHdr fd hdr = do
doHdrWrite fd (show (fileName hdr))
doHdrWrite fd (show (seed hdr))
doHdrWrite fd (show (minIoSize hdr))
doHdrWrite fd (show (maxIoSize hdr))
doHdrWrite fd (show (ioCount hdr))
doWriteFile :: FilePath -> IO FileIO
doWriteFile name = do
fd <- openBinaryFile name ReadWriteMode -- (Just ownerModes)
(OpenFileFlags {append=False, exclusive=False, noctty=True,
nonBlock=False, trunc=True})
seed <- newStdGen
hdr <- return (FileHdr name seed 4096 (2*8096) 200)
doWriteHdr fd hdr
return (FileIO fd hdr [])
overLap (IoLoc off1' sz1' num1) (IoLoc off2' sz2' num2) =
((off1 > off2) && (off1 < off2 +sz2)) ||((off1+sz1 >off2)
&&(off1+sz1 < off2 +sz2))
where
off1 = read (show off1')
sz1 = read (show sz1')
off2 = read (show off2')
sz2 = read (show sz2')
genIoList1 :: [IoLoc] -> [Int] -> [Int] -> Int -> Int -> [IoLoc]
genIoList1 list offset size _ 0 =
list
genIoList1 list offset size 0 _ =
list
genIoList1 list (offset:os) (size:ss) count bound =
if isNothing $ find (overLap x) list
then genIoList1 ([x] ++ list) os ss (count - 1) (bound -1)
else genIoList1 list os ss count (bound -1)
where
x = IoLoc (read (show offset)) (read(show size)) count
-- offset = (read (show offset1))
-- size = (read(show size1))
genIoList :: FileHdr -> [IoLoc]
genIoList (FileHdr name seed min max count) =
genIoList1 [] (myrandomRlist 4000 1099511627776 seed)
(myrandomRlist min max seed) count (count*2)
doActualIo :: Handle -> IoLoc -> L.ByteString -> IO ()
doActualIo fd (IoLoc off sz num) str = do
-- hSeek fd AbsoluteSeek off
rfd <- (handleToFd fd)
fdSeek rfd AbsoluteSeek off
L.hPut fd str
return ()
doVerifyIo :: Handle -> IoLoc -> L.ByteString -> IO ()
doVerifyIo fd (IoLoc off sz num) str = do
-- hSeek fd AbsoluteSeek off
rfd <- (handleToFd fd)
fdSeek rfd AbsoluteSeek off
filedata <- L.hGet fd sz
if str == filedata
then return ()
else throwIO (Corrupt ("Data corruption in #" ++ (take 200
(L.unpack str)) ++ "#" ++ (take 200 (L.unpack filedata)))(IoLoc off
sz num))
mainWrite file = do
(FileIO hd _ _) <- doWriteFile file
hdrIO <- doHdrFromFile hd
check hd
(FileIO _ hdr _) <- return hdrIO
check hd
iolst <- return $ genIoList hdr
app <- return $ zip iolst (map (genPattern (fileName hdr)) iolst)
putStrLn (fileName hdr)
mapM (\(x,y)-> doActualIo (fd hdrIO) x y) app
mapM (\(x,y)-> doVerifyIo (fd hdrIO) x y) app
-- hClose hd
return hdr
mainVerify file = do
hd <- openBinaryFile file ReadWriteMode
hdrIO <- doHdrFromFile hd
hdr <- return (params hdrIO)
iolst <- return $ genIoList hdr
file <- return $ filter (\x->not ( x=='"')) (fileName hdr)
app <- return $ zip iolst (map (genPattern file) iolst)
putStrLn $ filter (\x->not ( x=='"')) (fileName hdr)
mapM (\(x,y)-> doVerifyIo (fd hdrIO) x y) app
-- hClose hd
return hdr
main = do
x <- getArgs
if (length x) == 3
then do
main1
else do
putStrLn "USAGE:\nfile-io write/verify <full-path file name>
<number of threads> \
\\n\n\
\Simple IO load generator with write verification. This utility is designed\n\
\to generate multi-threaded IO load which will write a pattern to the file. \n\
\When this is invoked with the same parameters with the verify option \n\
\the data written will be verified.\n\
\ "
main1 = do
[op, name, numProcs] <- getArgs
m <- newEmptyMVar
n <- return $read numProcs
case op of
"write" -> do mapM forkIO [(fillfile (name ++ (show i)) ) m|i<-[1..n]]
"verify" -> do mapM forkIO [(verifyfile (name ++ (show i)) ) m|i<-[1..n]]
x <- mapM takeMVar $ take (read numProcs) $ repeat m
putStrLn $ show $ and x
fillfile filename m = do
mainWrite filename
putMVar m True
verifyfile filename m = do
mainVerify filename
putMVar m True
On Tue, Jan 11, 2011 at 1:02 AM, Anand Mitra
Hi,
I had a requirement to generate various kinds of I/O patterns on a filesystem and subsequently verify this. The initial version of the program below implements a random I/O pattern with multiple threads. Even when the number of I/O is as small as 200 and 10 concurrent theads, the amount of memory used is huge. When I run the program it consumes close to 1to2GB memory. Moreover the rate at which it generates the I/O is very low which is not good for testing a filesystem. I have used System.POSIX.IO but I tried System.IO and did not see much difference either. I would appreciate help in identifying ways to improve this.

Hi Anand Firstly apologies - my advice from yesterday was trivial advice, changing to a better representation of Strings and avoiding costly operations (++) is valuable and should improve the performance of the program, but it might be a small overall improvement and it doesn't get to the heart of things. Really you need to do two things - one is consider what you are doing and evaluate whether it is appropriate for a performance sensitive app, the other is to profile and find the bits that are too slow. I rarely use Control.Concurrent so I can't offer any real experience but I'm concerned that it is adding overhead for no benefit. Looking at the code and what the comments say it does - I don't think your situation benefits from concurrency. A thread in your program could do all is work in one go, its not that you need to be servicing many clients (cf. a web server that needs to service many clients without individual long waits so it makes sense to schedule them) or that you are waiting on other processes making resources available. So for your program, any changes to execution caused by scheduling / de-scheduling threads (probably) just add to the total time. If you have a multi-core machine you could potentially benefit from parallelism - splitting the work amongst available cores. But in GHC forkIO spawns "green threads" which run in the same OS thread so you won't be getting any automatic parallelism from the code even if you have multi-core. However don't take my word for this - I could easily be wrong. If you want performance you really do need to see what the profiler tells you. Best wishes Stephen

Hi Stephen,
Thanks for you reply. I will give you my motivation behind this
application so that you have better context. We have been porting ZFS
to linux and have written a few programs to stress the filesystem and
generate IO load similar to what would be expected in a production
environment. The key objective is to find bugs in the ZFS code. I have
been interested in haskell for quite some time and used this as an
excuse to write something which might be useful. It is possible that
what I am doing would be much more efficient in C than in haskell.
However if my objective is fulfilled to within a fair degree, the
effort will be fun as well as worth the time invested in it.
On the high level I expect the program to do the following
- ability to perform multi-threaded i/o both threadlevel and process
level to increasing contention on locks and shared resources in the
filesystem data structures to expose race conditions.
- verify each write that has been written. i.e. the content of the
file is a function of a random seed which will allow us to detect
errors like misplaced writes lost writes etc.
- the contents should be self identifying for debugging purposes. If
mysterious data appears within a file the contents would make it
obvious where it should actually belong. I.e. file name, offset, size
and io sequence number.
- ability to generate metadata load
The existing code does only some of these but it could be expanded to
cover all aspects if the existing performance is promising.
Getting back to the problem at hand. I had some luck in identifying
the cause of the "handle is closed". I suspected that it could be
because I was mixing calls from System.IO and System.POSIX.IO. After
making the them uniform atleast I don't get a "handle is closed" error
but it is hung. Not being familiar with the debugger I resorted to
more traditional means of putting putStrLn to find what was happening.
And it appeared that the program was getting hung just before starting
the random IO. At this point I was distracted with some other work
while the apparently hung program was executing. When I came back to
the xterm there was the debug putStrLn I had added. This told me that
it wasn't hung but just taking a lot of time. From this evidence it
was instantly clear what was happening. To simplify the verification
process I was checking that there weren't any overlaping I/O. It seems
that a large number of them were overlapping and hence an inordinate
amount of time was being spent generating the list. I could bet that
is where my memory was going as well. I have removed the constraints
on the random list generator and now program is able to saturate the
disk bandwidth without any trouble. Thanks for you help I'll welcome
suggestions to improve the code.
regards
--
mitra
On Tue, Jan 11, 2011 at 3:38 PM, Stephen Tetley
Hi Anand
Firstly apologies - my advice from yesterday was trivial advice, changing to a better representation of Strings and avoiding costly operations (++) is valuable and should improve the performance of the program, but it might be a small overall improvement and it doesn't get to the heart of things.
Really you need to do two things - one is consider what you are doing and evaluate whether it is appropriate for a performance sensitive app, the other is to profile and find the bits that are too slow.
I rarely use Control.Concurrent so I can't offer any real experience but I'm concerned that it is adding overhead for no benefit. Looking at the code and what the comments say it does - I don't think your situation benefits from concurrency. A thread in your program could do all is work in one go, its not that you need to be servicing many clients (cf. a web server that needs to service many clients without individual long waits so it makes sense to schedule them) or that you are waiting on other processes making resources available. So for your program, any changes to execution caused by scheduling / de-scheduling threads (probably) just add to the total time.
If you have a multi-core machine you could potentially benefit from parallelism - splitting the work amongst available cores. But in GHC forkIO spawns "green threads" which run in the same OS thread so you won't be getting any automatic parallelism from the code even if you have multi-core.
However don't take my word for this - I could easily be wrong. If you want performance you really do need to see what the profiler tells you.
Best wishes
Stephen
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners
participants (3)
-
Alex Rozenshteyn
-
Anand Mitra
-
Stephen Tetley