network package and SIGVTALRM

Hi all,

I am writing a DCC subsystem for an IRC client. After all the handshakes are done I just connect to the server and start `recv`. The code I use for this is:

    getPackets
        :: MVar Int
        -> FilePath -- ^ Name media
        -> Int      -- ^ File size
        -> AddrInfo
        -> ExceptT DCCError IO ()
    getPackets mvar name totalSize addr =
      do receivedSize <- lift $ bracket acquire release receive
         let delta = (totalSize - receivedSize)
         if delta > 0
           then throwE (NotFullRecv delta)
           else return ()
      where
        bufferSize = 16384

        acquire :: IO (IO.Handle,Socket)
        acquire = (,) <$> (IO.openFile name IO.WriteMode)
                      <*> newSocket addr

        release :: (IO.Handle,Socket) -> IO ()
        release (hdl, sock) = IO.hClose hdl >> close sock

        receive :: (IO.Handle,Socket) -> IO Int
        receive (hdl, sock) =
            flip execStateT 0 . fix $ \loop -> do
                mediaData <- lift (B.recv sock bufferSize)
                unless (B.null mediaData) $ do
                    S.modify' (+ (B.length mediaData))
                    currentSize <- S.get
                    lift $ B.hPut hdl mediaData
                           >> B.send sock (int2BS currentSize)
                           >> swapMVar mvar currentSize
                    loop

Part of the protocol is that after each `recv` I send the current received size in network byte order. Hence the `B.send` line in `receive`. I use this function:

    -- | given a number forms a bytestring with each digit on a separate
    -- Word8 in network byte-order
    int2BS :: Int -> B.ByteString
    int2BS i | w <- (fromIntegral i :: Word32) =
        B.pack [ (fromIntegral (shiftR w 24) :: Word8)
               , (fromIntegral (shiftR w 16) :: Word8)
               , (fromIntegral (shiftR w  8) :: Word8)
               , (fromIntegral w             :: Word8)]

Everything works correctly until around 1/2 of a test transfer (i.e. for a file of 340M it gets 170M). That first half is received in the right order (I tested with a video and it was playable until the middle). On smaller files the bug doesn't happen; the file is received completely. I did a little bit of `strace` and `tcpdump` and I got this:

    -- strace -e trace=network -p $client
    (..)
    30439 recvfrom(13, "\312\255\201\337\376\355\253\r\177\276\204X]8\6\221\301#\361<>\273+\355\5\343B \333\366\351W"..., 16384, 0, NULL, NULL) = 1380
    30439 sendto(13, "\n\273\31l", 4, 0, NULL, 0) = 4
    30439 recvfrom(13, 0x20023f010, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
    30439 recvfrom(13, "\222llq_H\23\17\275\f}\367\"P4\23\207\312$w\371J\354aW2\243R\32\v\n\251"..., 16384, 0, NULL, NULL) = 1380
    30439 sendto(13, "\n\273\36\320", 4, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
    30438 --- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=0, ptr=0}} ---
    30438 --- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=0, ptr=0}} ---
    (..)

    -- tcpdump
    05:20:22.788273 IP tapioca.36346 > 198.255.92.74.36103: Flags [.], ack 48004680, win 489, options [nop,nop,TS val 627805332 ecr 675358947,nop,nop,sack 1 {48006060:48066780}], length 0
    05:20:22.975627 IP 198.255.92.74.36103 > tapioca.36346: Flags [.], seq 48004680:48006060, ack 82629, win 0, options [nop,nop,TS val 675359033 ecr 627805248], length 1380
    05:20:23.014991 IP tapioca.36346 > 198.255.92.74.36103: Flags [.], ack 48066780, win 4, options [nop,nop,TS val 627805559 ecr 675359033], length 0
    05:20:23.768012 IP 198.255.92.74.36103 > tapioca.36346: Flags [P.], seq 48066780:48067292, ack 82629, win 0, options [nop,nop,TS val 675359232 ecr 627805559], length 512
    05:20:23.768143 IP tapioca.36346 > 198.255.92.74.36103: Flags [.], ack 48067292, win 0, options [nop,nop,TS val 627806312 ecr 675359232], length 0
    05:20:24.523397 IP 198.255.92.74.36103 > tapioca.36346: Flags [.], ack 82629, win 0, options [nop,nop,TS val 675359421 ecr 627806312], length 0

What bothers me is that SIGVTALRM in the strace output. I am not the greatest unix hacker, but that signal is related to setitimer and I haven't explicitly set that up. So I am scratching my head a little.

Maybe somebody has experienced something related with the network package? Do you notice something in the logs? Thanks in advance.

--
Ruben Astudillo

Hi Ruben,
This signal is used internally by the Haskell IO manager; nothing is wrong
with it.
What I see in the dump is that 198.255.92.74 is announcing a window of size
0 from the beginning, and tapioca ends up sending a 0 window too. Are you
sure they are both reading what the other one sends to them? Given that
your code only fails on large files, I guess the sender never reads the
4-byte status messages the receiver sends to it, and the processes deadlock
after the socket buffer fills up.
As a side note, consider using putWord32be from the binary or cereal
packages instead of serializing the data yourself.
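
For reference, a minimal sketch of the putWord32be suggestion, using Data.Binary.Put from the binary package. The name encodeAck is made up for this example; only putWord32be and runPut come from the library. It should produce the same four bytes as int2BS.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Put (putWord32be, runPut)
import Data.Word (Word32)

-- | Encode the running byte count as a 4-byte big-endian (network order)
-- acknowledgement. 'encodeAck' is a hypothetical name, not part of the
-- original client. Counts above 2^32 - 1 wrap around, as in int2BS.
encodeAck :: Int -> B.ByteString
encodeAck n = BL.toStrict (runPut (putWord32be (fromIntegral n :: Word32)))
```

With this definition, encodeAck 180033900 gives the bytes 0x0A 0xBB 0x19 0x6C, which is the "\n\273\31l" payload of the first sendto in the strace output.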
On Tue, 12 Jul 2016 at 03:35, Ruben Astudillo wrote:
> [...] What bothers me is that SIGVTALRM on the strace output. [...] Do you notice something on the logs?
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

On 12/07/16 11:40, Andrey Sverdlichenko wrote:
> [...] they are both reading what the other one sends to them? Given that your code only fails on large files, I guess the sender never reads the 4-byte status messages the receiver sends to it, and the processes deadlock after the socket buffer fills up.

Right on the nail. I played a bit with tcpdump/wireshark and saw that I was just sending 0-length ACKs and multiple 4-byte messages joined in a single packet. It seems TCP buffers tiny packets using a strategy called Nagle's algorithm, and so it collected all my packets in a buffer until a threshold was reached. Then it sent them all at once, driving the other end crazy. Adding one line to the function that sets up the socket

    newSocket :: AddrInfo -> IO Socket
    newSocket addr = do
        sock <- socket AF_INET Stream defaultProtocol
        setSocketOption sock NoDelay 1 -- this was added
        connect sock (addrAddress addr)
        return sock

makes the downloads run to the end. :-)

> As a side note, consider using putWord32be from the binary or cereal packages instead of serializing the data yourself.

Note taken. I just don't want to impose a new dependency in my first patch. I did copy putWord32be (at least in style) for my int2BS function, though. Thanks a lot!

--
Ruben Astudillo

On Wed, Jul 13, 2016 at 3:14 AM Ruben Astudillo wrote:
> Right on the nail. I played a bit with tcpdump/wireshark to see I was just sending 0-length ACKs and multiple 4-byte messages joined in a single packet. It seems TCP buffers tiny packets using a strategy called Nagle's algorithm, and so it collected all my packets in a buffer until a threshold was reached. Then it sent them all at once, driving the other end crazy.

This looks a bit scary. It should not matter whether replies are merged or not. By any chance, doesn't your code use recv with some large max buffer size, but expect to get only 4 bytes because this is how much the receiver sends each time? If so, change it to read 4 bytes only, and handle the case when fewer than 4 bytes are available. TCP does not preserve message boundaries, and you can expect reads to return data in arbitrarily sized chunks.

Regards,
Andrey
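
The advice to read exactly 4 bytes and to cope with short reads can be sketched as below. This is only an illustration under stated assumptions: recvExactWith and chunkSource are hypothetical names, and the reader is passed in as a function so that the example runs without a socket. With the network package the first argument would be something like `recv sock` from Network.Socket.ByteString, assuming an empty result means the peer closed the connection.

```haskell
import qualified Data.ByteString as B
import Data.IORef (newIORef, atomicModifyIORef')

-- | Read exactly @n@ bytes with a recv-like action, looping over short
-- reads. Returns 'Nothing' if the stream ends before @n@ bytes arrive.
-- Hypothetical helper, not part of the original client.
recvExactWith :: (Int -> IO B.ByteString) -> Int -> IO (Maybe B.ByteString)
recvExactWith recvSome n = go n []
  where
    go remaining acc
      | remaining <= 0 = pure (Just (B.concat (reverse acc)))
      | otherwise = do
          -- Never ask for more than is still missing, so bytes that
          -- belong to the next message stay in the stream.
          chunk <- recvSome remaining
          if B.null chunk
            then pure Nothing
            else go (remaining - B.length chunk) (chunk : acc)

-- | Stand-in for a socket: hands out the given chunks in order, never
-- more than the requested limit per call, then reports end of stream.
chunkSource :: [B.ByteString] -> IO (Int -> IO B.ByteString)
chunkSource chunks = do
  ref <- newIORef chunks
  pure $ \limit -> atomicModifyIORef' ref $ \cs -> case cs of
    []         -> ([], B.empty)
    (c : rest) ->
      let (now, later) = B.splitAt limit c
      in (if B.null later then rest else later : rest, now)
```

For example, with a source built from the chunks [0x0A] and [0xBB, 0x19, 0x6C, 0xFF], a call to recvExactWith for 4 bytes needs two reads and returns 0x0A 0xBB 0x19 0x6C, leaving 0xFF for the next message.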

I suppose the TCP_NODELAY socket option can prevent Nagle's algorithm from joining small packets?
On Wed, Jul 13, 2016 at 11:10 AM Andrey Sverdlichenko wrote:
> [...] TCP does not preserve message boundaries, and you can expect reads to return data in arbitrarily sized chunks.

If you are lucky. They may still be merged by the sender if a
retransmission occurs, or on the receiving side if the receiver waits too
long before reading, and that is up to the OS scheduler to control.
The NODELAY option is used to improve interactive latency; it will not make
TCP obey message boundaries.
On Wed, Jul 13, 2016 at 11:43 AM Baojun Wang wrote:
> I suppose the TCP_NODELAY socket option can prevent Nagle's algorithm from joining small packets?

On 13/07/16 14:54, Andrey Sverdlichenko wrote:
> If you are lucky. They may still be merged by the sender if a retransmission occurs, or on the receiving side if the receiver waits too long before reading, and that is up to the OS scheduler to control. The NODELAY option is used to improve interactive latency; it will not make TCP obey message boundaries.

You're right. But understanding the problem a little better may clarify why NODELAY is a valid option. DCC is in part a redundant protocol. When you connect, the sender gives you the file you want, but for coherency reasons every once in a while you have to reply with the current transferred size through the same socket. This was a means of preserving consistency that is redundant with the mechanisms already implemented in TCP. From the page [1] I am using for the implementation:

``client A sends blocks of data (usually 1-2 KB) and at every block awaits confirmation from the client B, that when receiving a block should reply 4 bytes containing an positive number specifying the total size of the file received up to that moment.

The transmission closes when the last acknowledge is received by client A.

The acknowledges were meant to include some sort of coherency check in the transmission, but in fact no client can recover from an acknowledge error/desync, all of them just close the connection declaring the transfer as failed (the situation is even worse in fact, often acknowledge errors aren't even detected!).

Since the packet-acknowledge round trip eats a lot of time, many clients included the send-ahead feature; the client A does not wait for the acknowledge of the first packet before sending the second one.''

The last part explains why my download still succeeded until half the size of the file. But not sending any reply (because the message is too small to send) is a failure of interactivity in the protocol, not of message boundaries (which specify that the reply is 4 bytes in length).

[1]: http://www.kvirc.net/doc/doc_dcc_connection.html

--
Ruben Astudillo

Hi Ruben,
I think you're falling into a common trap re. TCP. On a lightly-loaded
network, if you send a block of data on one host it typically arrives at
the other end of the connection as one thing. In other words, calls to
send() and recv() are one-to-one. In that situation adding NODELAY will
(seem to) solve problems like the ones that you were seeing. However, it
will all fall to pieces when you're running under load or there's
congestion or some other kind of problem, as it's perfectly legitimate for
packets to be combined and/or fragmented which breaks this one-to-one
relationship on which the correctness of your program rests.
You _must_ treat data received over TCP as a continuous stream of bytes and
not a sequence of discrete packets, and do things such as accounting for
the case where your 4-byte length indicator is split across two packets so
does not all arrive at once. If you don't, it will bite you at the very
worst time, and will do so nondeterministically. This kind of thing is very
hard to reproduce in a test environment.
There is nothing special about the DCC protocol that makes it immune from
this effect.
Best wishes,
David
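
To make the stream-of-bytes point concrete, here is a small sketch of an incremental decoder for 4-byte big-endian values that gives the same result however the stream is cut into chunks. The names feed, decodeAll and Pending are made up for this illustration and do not come from the client discussed in the thread.

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftL, (.|.))
import Data.List (foldl')
import Data.Word (Word32)

-- | Bytes of an incomplete 4-byte value carried over from earlier reads.
type Pending = B.ByteString

-- | Feed one chunk of any size. Returns the complete big-endian values
-- now available, plus the bytes that must wait for the next chunk.
feed :: Pending -> B.ByteString -> ([Word32], Pending)
feed pending chunk = go (B.append pending chunk)
  where
    go bytes
      | B.length bytes < 4 = ([], bytes)
      | otherwise =
          let (w, rest)      = B.splitAt 4 bytes
              (ws, leftover) = go rest
          in (toWord32 w : ws, leftover)

    toWord32 :: B.ByteString -> Word32
    toWord32 = B.foldl' (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- | Decode a whole sequence of chunks, threading the leftover bytes
-- from one chunk to the next.
decodeAll :: [B.ByteString] -> ([Word32], Pending)
decodeAll = foldl' step ([], B.empty)
  where
    step (done, pending) chunk =
      let (ws, pending') = feed pending chunk
      in (done ++ ws, pending')
```

The eight bytes 00 00 05 64 00 00 0A C8 decode to 1380 and 2760 whether they arrive as a single chunk or split as 3 + 3 + 2 bytes; in the split case the first value is only produced once its fourth byte has arrived.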
On 14 July 2016 at 10:48, Ruben Astudillo wrote:
> [...] But not sending any reply (because the message is too small to send) is a failure of interactivity in the protocol, not of message boundaries (which specify that the reply is 4 bytes in length).

> The last part explains why my download still succeeded until half the size of the file. But not sending any reply (because the message is too small to send) is a failure of interactivity in the protocol, not of message boundaries (which specify that the reply is 4 bytes in length).

TCP with Nagle's algorithm enabled will not hold data indefinitely. In fact, it only waits for a few tenths of a second, hoping there will be more data to send. If not, whatever it has is sent. What your dumps show is a window announcement of size 0, which means that a lot of data was successfully received by the TCP stack but never read from the socket, and this happens in both directions. You may want to check why your processes stop issuing read/recv calls.
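
One possible reason for the reads stopping on the receiving side is an inference from the strace output and is not confirmed anywhere in the thread: the second sendto returns EAGAIN, and if the acknowledgement write then blocks inside the single receive loop, no further recv is issued either. The sketch below shows one way to keep reading while an acknowledgement is stuck, by moving the writes to their own thread. receiveDecoupled is a hypothetical name, and the three actions are passed in as parameters so the example does not depend on the network package.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Concurrent.MVar
import Control.Exception (finally)
import qualified Data.ByteString as B

-- | Receive until end of stream while a separate thread writes the
-- acknowledgements, so a blocked write cannot stall the reads.
-- Assumption: an empty chunk means the peer closed the connection.
receiveDecoupled
  :: IO B.ByteString          -- ^ next chunk, e.g. a recv on the socket
  -> (B.ByteString -> IO ())  -- ^ store a chunk, e.g. write it to a file
  -> (Int -> IO ())           -- ^ send the running total; may block
  -> IO Int
receiveDecoupled readChunk store sendAck = do
  acks <- newChan            -- unbounded queue of totals still to send
  done <- newEmptyMVar
  let ackLoop = do
        next <- readChan acks
        case next of
          Nothing -> pure ()
          Just n  -> sendAck n >> ackLoop
  -- Signal completion even if sendAck throws.
  _ <- forkIO (ackLoop `finally` putMVar done ())
  let loop total = do
        chunk <- readChunk
        if B.null chunk
          then pure total
          else do
            store chunk
            let total' = total + B.length chunk
            total' `seq` writeChan acks (Just total')
            loop total'
  total <- loop (0 :: Int)
  writeChan acks Nothing     -- tell the ack thread to finish
  takeMVar done              -- wait until the queued acks were handled
  pure total
```

This only keeps the local side draining its receive buffer. If the peer never reads the acknowledgements, they will still pile up in the queue and in the socket buffer.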
Participants (4):
- Andrey Sverdlichenko
- Baojun Wang
- David Turner
- Ruben Astudillo