Amazonka, conduit and sockets not closing

I've run into a problem with running out of file descriptors. The following snippet is a trimmed down version of what I'm doing:

#+begin_src haskell
main :: IO ()
main = do
  awsEnv <- newEnv Discover
  runAWSCond awsEnv $
    sqsSource queueUrl .| C.mapC snd .| sqsDeleteSink queueUrl
  where
    runAWSCond awsEnv =
      runResourceT . runAWS awsEnv . within Frankfurt . C.runConduit

sqsSource :: MonadAWS m => T.Text -> C.ConduitT () (T.Text, T.Text) m ()
sqsSource queueUrl = do
  (_, msgs) <- C.lift $ recvSQS queueUrl
  C.yieldMany msgs
  sqsSource queueUrl

sqsDeleteSink :: MonadAWS m => T.Text -> C.ConduitT T.Text o m ()
sqsDeleteSink queueUrl = do
  C.await >>= \case
    Nothing -> pure ()
    Just receiptHandle -> do
      void $ C.lift $ delSQS queueUrl receiptHandle
      sqsDeleteSink queueUrl

recvSQS queueUrl = do
  let rm = receiveMessage queueUrl & rmMaxNumberOfMessages ?~ 10
  rmrs <- send rm
  let status = rmrs ^. rmrsResponseStatus
      msgs = rmrs ^. rmrsMessages & traversed %~ extract
  pure (status, catMaybes msgs)
  where
    extract msg = do
      body <- msg ^. mBody
      rh <- msg ^. mReceiptHandle
      pure (body, rh)

delSQS queueUrl receiptHandle = do
  let dm = deleteMessage queueUrl receiptHandle
  send dm
#+end_src

This works fine for a while, but given a queue with enough messages it will fail with something like

#+begin_example
TransportError (HttpExceptionRequest Request {
  host                 = "sqs.eu-central-1.amazonaws.com"
  port                 = 443
  secure               = True
  requestHeaders       = [("Host","sqs.eu-central-1.amazonaws.com"),("X-Amz-Date","20201126T101659Z"),("X-Amz-Content-SHA256","2e4bdf20a857a1416f218b1218670cf019ff53268d0adb34fe06402a62f3271d"),("Content-Type","application/x-www-form-urlencoded; charset=utf-8"),("Authorization","<REDACTED>")]
  path                 = "/"
  queryString          = ""
  method               = "POST"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 0
  responseTimeout      = ResponseTimeoutMicro 70000000
  requestVersion       = HTTP/1.1
}
(ConnectionFailure Network.Socket.getAddrInfo (called with preferred socket type/protocol: AddrInfo {addrFlags = [AI_ADDRCONFIG], addrFamily = AF_UNSPEC, addrSocketType = Stream, addrProtocol = 0, addrAddress = <assumed to be undefined>, addrCanonName = <assumed to be undefined>}, host name: Just "sqs.eu-central-1.amazonaws.com", service name: Just "443"): does not exist (System error)))
#+end_example

After some detours I found out that it's actually not a network issue, but rather that the process runs out of file descriptors. Using =lsof= I can see that it doesn't seem to close /any/ sockets at all, instead they get stuck in a =CLOSE_WAIT= state:

#+begin_example
COMMAND    PID   USER  FD   TYPE DEVICE SIZE/OFF NODE NAME
wd-stats 88674 magnus  23u  IPv4 815196      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:60624->52.119.188.213:https (CLOSE_WAIT)
wd-stats 88674 magnus  24u  IPv4 811362      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:43482->52.119.189.184:https (CLOSE_WAIT)
wd-stats 88674 magnus  25u  IPv4 811386      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:60628->52.119.188.213:https (CLOSE_WAIT)
wd-stats 88674 magnus  26u  IPv4 813527      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:43486->52.119.189.184:https (CLOSE_WAIT)
...
#+end_example

Am I using Amazonka and/or Conduit in a way that results in this? How should I use them?

Or, is it an issue somewhere "below" my code? What can I do to address that?

Thanks for any insights or help.

/M

--
Magnus Therning              OpenPGP: 0x927912051716CE39
email: magnus@therning.org   twitter: magthe
http://magnus.therning.org/

Action is the foundational key to all success.
     — Pablo Picasso

I thought CLOSE_WAIT *is* one of the "closed" states. TCP sockets stick around for a few minutes after use, right? You may simply be generating sockets faster than the operating system can handle. Find some way to reuse existing sockets, perhaps?
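For what it's worth, a minimal sketch of the "reuse" idea with plain http-client and http-client-tls; this is only an illustration of connection pooling at the HTTP layer, not necessarily how amazonka manages its connections internally, and the SQS URL is just an example target:

#+begin_src haskell
import Network.HTTP.Client (httpLbs, newManager, parseRequest, responseStatus)
import Network.HTTP.Client.TLS (tlsManagerSettings)

main :: IO ()
main = do
  -- One Manager for the whole program: http-client keeps a pool of
  -- keep-alive connections per host, so repeated requests reuse an
  -- existing socket instead of opening a new one every time.
  mgr <- newManager tlsManagerSettings
  req <- parseRequest "https://sqs.eu-central-1.amazonaws.com/"
  r1 <- httpLbs req mgr
  r2 <- httpLbs req mgr  -- typically reuses the connection opened for r1
  print (responseStatus r1, responseStatus r2)
#+end_src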

Linux has kernel params you can tweak for socket reuse. Also look up SO_REUSEADDR for background.
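For background, this is roughly what setting SO_REUSEADDR looks like from Haskell with the network package. It is my own illustration on a listening socket (the port is arbitrary), and the option only helps with rebinding a local address that is stuck in TIME_WAIT; it does nothing for descriptors sitting in CLOSE_WAIT:

#+begin_src haskell
import Network.Socket

main :: IO ()
main = do
  -- Resolve a local address to listen on.
  addr:_ <- getAddrInfo
              (Just defaultHints { addrFlags = [AI_PASSIVE]
                                 , addrSocketType = Stream })
              Nothing
              (Just "8080")
  sock <- socket (addrFamily addr) (addrSocketType addr) (addrProtocol addr)
  setSocketOption sock ReuseAddr 1  -- SO_REUSEADDR: allow rebinding an address in TIME_WAIT
  bind sock (addrAddress addr)
  listen sock 5
  close sock
#+end_src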
On Nov 28, 2020, at 8:44 AM, Bryan Richter wrote:
I thought CLOSE_WAIT *is* one of the "closed" states. TCP sockets stick around for a few minutes after use, right? You may simply be generating sockets faster than the operating system can handle. Find some way to reuse existing sockets, perhaps?

On Thu, Nov 26, 2020 at 02:12:49PM +0100, Magnus Therning wrote:
After some detours I found out that it's actually not a network issue, but rather that the process runs out of file descriptors. Using =lsof= I can see that it doesn't seem to close /any/ sockets at all, instead they get stuck in a =CLOSE_WAIT= state:

#+begin_example
COMMAND    PID   USER  FD   TYPE DEVICE SIZE/OFF NODE NAME
wd-stats 88674 magnus  23u  IPv4 815196      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:60624->52.119.188.213:https (CLOSE_WAIT)
wd-stats 88674 magnus  24u  IPv4 811362      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:43482->52.119.189.184:https (CLOSE_WAIT)
wd-stats 88674 magnus  25u  IPv4 811386      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:60628->52.119.188.213:https (CLOSE_WAIT)
wd-stats 88674 magnus  26u  IPv4 813527      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:43486->52.119.189.184:https (CLOSE_WAIT)
...
#+end_example
How many such still open file descriptors did you find? (If you run "lsof -n -P -i tcp -a -p $pid", it'll produce the output faster, reporting only sockets.)

Contrary to other replies, the sockets above are indeed NOT closed in your process, otherwise they'd not be associated with a file descriptor and would just show up in "netstat", but not in "lsof" output.

I don't know what happens inside Amazonka, but typically clients doing many concurrent HTTPS calls employ a TlsManager that maintains a connection pool, which avoids opening too many concurrent connections but also keeps a limited number of connections open for further requests. How many still open connections did you find?

I don't know whether the TlsManager aggregates connections by name or by IP address. If the latter, perhaps (very speculatively, without looking at the underlying code, ...) Amazon's IP address is changing quickly (short or 0 TTL), breaking the connection pool's per-destination connection limits. This is a wild guess, more evidence is needed to make it actually plausible or rule it out.

--
    Viktor.
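Very speculatively, wiring one explicit shared TLS manager into the environment might look something like the sketch below. The newManager/tlsManagerSettings calls come from http-client/http-client-tls, while the envManager lens and newEnv Discover are my recollection of the amazonka 1.6 API, so treat the exact names as assumptions rather than a confirmed fix:

#+begin_src haskell
import Control.Lens ((&), (.~))
import Network.AWS (Credentials (Discover), envManager, newEnv)
import Network.HTTP.Client (newManager)
import Network.HTTP.Client.TLS (tlsManagerSettings)

main :: IO ()
main = do
  -- One TLS-capable Manager for the whole program; http-client pools
  -- keep-alive connections per destination and closes idle ones.
  mgr <- newManager tlsManagerSettings
  env <- newEnv Discover
  -- Point the amazonka Env at the shared manager (envManager is the
  -- HasEnv lens; name assumed from amazonka 1.6).
  let env' = env & envManager .~ mgr
  -- ... run the SQS conduit pipeline against env' instead of env ...
  pure ()
#+end_src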

Viktor Dukhovni writes:
How many such still open file descriptors did you find?
Hundreds of them.
(If you run "lsof -n -P -i tcp -a -p $pid", it'll produce the output faster, reporting only sockets).
Contrary to other replies, indeed the sockets above are NOT closed in your process, otherwise they'd not be associated with a file descriptor and would just show up in "netstat", but not "lsof" output.
I don't know what happens inside Amazonka, but typically clients doing many concurrent HTTPS calls employ a TlsManager that maintains a connection pool, and would avoid opening too many concurrent connections, but would also keep a limited number of connections open for more requests.
After I reported it to the Amazonka project[1] I found out that it most likely is a known issue[2]. I have yet to confirm that the fix for [2] solves the issue I'm seeing.

[1]: https://github.com/brendanhay/amazonka/issues/608
[2]: https://github.com/brendanhay/amazonka/issues/490

/M

--
Magnus Therning              OpenPGP: 0x927912051716CE39
email: magnus@therning.org   twitter: magthe
http://magnus.therning.org/

I am always doing that which I cannot do, in order that I may learn
how to do it.
     — Pablo Picasso
participants (4)

- Bryan Richter
- Magnus Therning
- Viktor Dukhovni
- Will Yager