summaryrefslogtreecommitdiffstats
path: root/net/tipc/bcast.c
diff options
context:
space:
mode:
authorJon Paul Maloy <jon.maloy@ericsson.com>2017-01-03 10:55:11 -0500
committerDavid S. Miller <davem@davemloft.net>2017-01-03 11:13:05 -0500
commit365ad353c2564bba8835290061308ba825166b3a (patch)
tree06439f0724f3df34e29e8d3fb32432894e6d8ff0 /net/tipc/bcast.c
parent4d8642d896c53966d32d5e343c3620813dd0e7c8 (diff)
downloadop-kernel-dev-365ad353c2564bba8835290061308ba825166b3a.zip
op-kernel-dev-365ad353c2564bba8835290061308ba825166b3a.tar.gz
tipc: reduce risk of user starvation during link congestion
The socket code currently handles link congestion by either blocking and trying to send again when the congestion has abated, or just returning to the user with -EAGAIN and let him re-try later. This mechanism is prone to starvation, because the wakeup algorithm is non-atomic. During the time the link issues a wakeup signal, until the socket wakes up and re-attempts sending, other senders may have come in between and occupied the free buffer space in the link. This in turn may lead to a socket having to make many send attempts before it is successful. In extremely loaded systems we have observed latency times of several seconds before a low-priority socket is able to send out a message. In this commit, we simplify this mechanism and reduce the risk of the described scenario happening. When a message is attempted sent via a congested link, we now let it be added to the link's backlog queue anyway, thus permitting an oversubscription of one message per source socket. We still create a wakeup item and return an error code, hence instructing the sender to block or stop sending. Only when enough space has been freed up in the link's backlog queue do we issue a wakeup event that allows the sender to continue with the next message, if any. The fact that a socket now can consider a message sent even when the link returns a congestion code means that the sending socket code can be simplified. Also, since this is a good opportunity to get rid of the obsolete 'mtu change' condition in the three socket send functions, we now choose to refactor those functions completely. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'net/tipc/bcast.c')
-rw-r--r--net/tipc/bcast.c6
1 files changed, 3 insertions, 3 deletions
diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index aa1babb..c35fad3 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -174,7 +174,7 @@ static void tipc_bcbase_xmit(struct net *net, struct sk_buff_head *xmitq)
* and to identified node local sockets
* @net: the applicable net namespace
* @list: chain of buffers containing message
- * Consumes the buffer chain, except when returning -ELINKCONG
+ * Consumes the buffer chain.
* Returns 0 if success, otherwise errno: -ELINKCONG,-EHOSTUNREACH,-EMSGSIZE
*/
int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list)
@@ -197,7 +197,7 @@ int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list)
tipc_bcast_unlock(net);
/* Don't send to local node if adding to link failed */
- if (unlikely(rc)) {
+ if (unlikely(rc && (rc != -ELINKCONG))) {
__skb_queue_purge(&rcvq);
return rc;
}
@@ -206,7 +206,7 @@ int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list)
tipc_bcbase_xmit(net, &xmitq);
tipc_sk_mcast_rcv(net, &rcvq, &inputq);
__skb_queue_purge(list);
- return 0;
+ return rc;
}
/* tipc_bcast_rcv - receive a broadcast packet, and deliver to rcv link
OpenPOWER on IntegriCloud