
I have an accept-loop:
do (conn, _saddr) <- accept sock
forkIO $ initializeConnection conn
Which allocates memory iff accept allocates, I suppose. To test the theory,
is there a way I can force an allocation that won't get optimized away?
According to the old print-statement debugging method, it is accept that
causes the problem. The accept is the last thing that happens before the
hang.
I created a test program to hammer on accept.
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forever, liftM)
import Network (listenOn, withSocketsDo, PortID(..))
import Network.BSD (getProtocolNumber)
import Network.Socket (Family(..), SockAddr(..), SocketType(..), Socket(..),
accept, connect, inet_addr, socket, sClose)
import Network.Socket.ByteString
import Data.ByteString.Char8 ()
import System.Environment (getArgs)
import System.IO (stdout, hFlush)
port = 8080
main = do
args <- getArgs
if length args < 1 || (head args /= "client" && head args /= "server")
then putStrLn "say client or server"
else withSocketsDo $ case head args of
"client" -> client
"server" -> server
client = do
putStrLn "Client mode"
tcp <- getProtocolNumber "tcp"
forever $ do sock <- socket AF_INET Stream tcp
localhost <- inet_addr "127.0.0.1"
putStr "Making a connection..."
connect sock (SockAddrInet port localhost)
send sock "a"
_ <- recv sock 1
sClose sock
putStrLn " done client connection"
server = do
putStrLn "Server mode"
sock <- listenOn (PortNumber port)
count <- newMVar 0
forever $ do count' <- modifyMVar count (\c -> return (c+1,c+1))
putStr $ (show count') ++ " About to accept..."
(conn, _saddr) <- accept sock
putStrLn " accepted."
hFlush stdout
forkIO $ handleServerConnection conn count'
handleServerConnection conn count = do
putStr $ (show count) ++ " Handling a server connection..."
rd <- recv conn 1
send conn "a"
sClose conn
putStrLn " done server connection."
hFlush stdout
While it doesn't seem to fully hang, it does go through mysterious "hiccups"
in which an accept call takes many seconds to return. I'm on Mac OS X Snow
Leopard, ghc 6.12.1.
I'm not quite sure how I ought to proceed. I'm still very open to debugging
tools and techniques I could use to approach the problem!
Aran
On Thu, May 13, 2010 at 12:10 PM, Jason Dagit
On Thu, May 13, 2010 at 5:53 AM, Aran Donohue
wrote: Thanks folks! Forward progress is made...
Unfortunately, programs don't seem to write out their threadscope event logs until they terminate, and mine hangs until I kill it, so I can't get at the event log.
Tracing has taught me that before the hang-cause, my program splits its time in pthread_cond_wait in two different threads, and select in a third. After the hang, it no longer calls select and one of those pthread_cond_waits in the other. In the version without -threaded that doesn't hang, it never does any pthread_cond_wait and never misses the select.
Now to go figure out what impossible condition it's waiting on, I guess.
The select sounds like the IO manager thread (a thread in the RTS not your code). Is it possible that one of your threads does work but never allocates memory? I've heard in some cases that can lead to starvation. I think the explanation was that thread switching happens on allocation?
Jason