summaryrefslogtreecommitdiffstats
path: root/net/sched
Commit message (Collapse)AuthorAgeFilesLines
* net_sched: get rid of struct tcf_commonWANG Cong2016-07-252-77/+72
| | | | | | | | | | | After the previous patch, struct tc_action should be enough to represent the generic tc action, tcf_common is not necessary any more. This patch gets rid of it to make tc action code more readable. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net_sched: move tc_action into tcf_commonWANG Cong2016-07-2514-279/+256
| | | | | | | | | | | | | | | | | | | struct tc_action is confusing, currently we use it for two purposes: 1) Pass in arguments and carry out results from helper functions 2) A generic representation for tc actions The first one is error-prone, since we need to make sure we don't miss anything. This patch aims to get rid of this use, by moving tc_action into tcf_common, so that they are allocated together in hashtable and can be cast'ed easily. And together with the following patch, we could really make tc_action a generic representation for all tc actions and each type of action can inherit from it. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: Add match-all classifier hw offloading.Yotam Gigi2016-07-241-3/+73
| | | | | | | | | | | | | | | | | | Following the work that have been done on offloading classifiers like u32 and flower, now the match-all classifier hw offloading is possible. if the interface supports tc offloading. To control the offloading, two tc flags have been introduced: skip_sw and skip_hw. Typical usage: tc filter add dev eth25 parent ffff: \ matchall skip_sw \ action mirred egress mirror \ dev eth27 Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: introduce Match-all classifierJiri Pirko2016-07-243-0/+259
| | | | | | | | | | | The matchall classifier matches every packet and allows the user to apply actions on it. This filter is very useful in usecases where every packet should be matched, for example, packet mirroring (SPAN) can be setup very easily using that filter. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2016-07-241-2/+4
|\ | | | | | | | | | | Just several instances of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/sched/sch_htb: clamp xstats tokens to fit into 32-bit intKonstantin Khlebnikov2016-07-181-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In kernel HTB keeps tokens in signed 64-bit in nanoseconds. In netlink protocol these values are converted into pshed ticks (64ns for now) and truncated to 32-bit. In struct tc_htb_xstats fields "tokens" and "ctokens" are declared as unsigned 32-bit but they could be negative thus tool 'tc' prints them as signed. Big values loose higher bits and/or become negative. This patch clamps tokens in xstat into range from INT_MIN to INT_MAX. In this way it's easier to understand what's going on here. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
* | hfsc: reduce hfsc_sched to 14 cachelinesFlorian Westphal2016-07-081-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hfsc_sched is huge (size: 920, cachelines: 15), but we can get it to 14 cachelines by placing level after filter_cnt (covering 4 byte hole) and reducing period/nactive/flags to u32 (period is just a counter, incremented when class becomes active -- 2**32 is plenty for this purpose, also, long is only 32bit wide on 32bit platforms anyway). cl_vtperiod is exported to userspace via tc_hfsc_stats, but its period member is already u32, so no precision is lost there either. Cc: Michal Soltys <soltys@ziu.info> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2016-07-061-1/+1
|\ \ | |/ | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/mellanox/mlx5/core/en.h drivers/net/ethernet/mellanox/mlx5/core/en_main.c drivers/net/usb/r8152.c All three conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * net_sched: fix mirrored packets checksumWANG Cong2016-07-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Similar to commit 9b368814b336 ("net: fix bridge multicast packet checksum validation") we need to fixup the checksum for CHECKSUM_COMPLETE when pushing skb on RX path. Otherwise we get similar splats. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net sched actions: skbedit convert to use more modern nla_put_xxxJamal Hadi Salim2016-07-041-8/+4
| | | | | | | | | | Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net sched actions: skbedit add support for mod-ing skb pkt_typeJamal Hadi Salim2016-07-041-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extremely useful for setting packet type to host so i dont have to modify the dst mac address using pedit (which requires that i know the mac address) Example usage: tc filter add dev eth0 parent ffff: protocol ip pref 9 u32 \ match ip src 5.5.5.5/32 \ flowid 1:5 action skbedit ptype host This will tag all packets incoming from 5.5.5.5 with type PACKET_HOST Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | bpf: refactor bpf_prog_get and type check into helperDaniel Borkmann2016-07-012-12/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since bpf_prog_get() and program type check is used in a couple of places, refactor this into a small helper function that we can make use of. Since the non RO prog->aux part is not used in performance critical paths and a program destruction via RCU is rather very unlikley when doing the put, we shouldn't have an issue just doing the bpf_prog_get() + prog->type != type check, but actually not taking the ref at all (due to being in fdget() / fdput() section of the bpf fd) is even cleaner and makes the diff smaller as well, so just go for that. Callsites are changed to make use of the new helper where possible. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/sched/sch_hfsc.c: anchor virtual curve at proper vt in hfsc_change_fsc()Michal Soltys2016-07-011-1/+1
| | | | | | | | | | | | | | | | | | cl->cl_vt alone is relative only to the current backlog period, while the curve operates on cumulative virtual time. This patch adds missing cl->cl_vtoff. Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/sched/sch_hfsc.c: go passive after vt updateMichal Soltys2016-07-011-16/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a class is going passive, it should update its cl_vt first to be consistent with the last dequeue operation. Otherwise its cl_vt will be one packet behind and parent's cvtmax might not be updated as well. One possible side effect is if some class goes passive and subsequently goes active /without/ its parent going passive - with cl_vt lagging one packet behind - comparison made in init_vf() will be affected (same period). Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/sched/sch_hfsc.c: remove leftover dlist and droplistMichal Soltys2016-07-011-8/+0
| | | | | | | | | | | | | | | | | | | | | | This is update to: commit a09ceb0e08140a ("sched: remove qdisc->drop") That commit removed qdisc->drop, but left alone dlist and droplist that no longer serve any meaningful purpose. Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/sched/sch_hfsc.c: add unlikely() in qdisc_peek_len()Michal Soltys2016-07-011-1/+1
| | | | | | | | | | | | | | The condition can only succeed on wrong configurations. Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/sched/sch_hfsc.c: handle corner cases where head may change invalidating ↵Michal Soltys2016-07-011-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | calculated deadline Realtime scheduling implemented in HFSC uses head of the queue to make the decision about which packet to schedule next. But in case of any head drop, the deadline calculated for the previous head is not necessarily correct for the next head (unless both packets have the same length). Thanks to peek() function used during dequeue - which internally is a dequeue operation - hfsc is almost safe from this issue, as peek() dequeues and isolates the head storing it temporarily until the real dequeue happens. But there is one exception: if after the class activation a drop happens before the first dequeue operation, there's never a chance to do the peek(). Adding peek() call in enqueue - if this is the first packet in a new backlog period AND the scheduler has realtime curve defined - fixes that one corner case. The 1st hfsc_dequeue() will use that peeked packet, similarly as every subsequent hfsc_dequeue() call uses packet peeked by the previous call. Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2016-06-307-74/+73
|\ \ | |/ | | | | | | | | | | | | Several cases of overlapping changes, except the packet scheduler conflicts which deal with the addition of the free list parameter to qdisc_enqueue(). Signed-off-by: David S. Miller <davem@davemloft.net>
| * netem: fix a use after freeEric Dumazet2016-06-231-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If the packet was dropped by lower qdisc, then we must not access it later. Save qdisc_pkt_len(skb) in a temp variable. Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * act_ife: acquire ife_mod_lock before reading ifeoplistWANG Cong2016-06-231-6/+8
| | | | | | | | | | | | | | Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * act_ife: only acquire tcf_lock for existing actionsWANG Cong2016-06-231-24/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alexey reported that we have GFP_KERNEL allocation when holding the spinlock tcf_lock. Actually we don't have to take that spinlock for all the cases, especially for the new one we just create. To modify the existing actions, we still need this spinlock to make sure the whole update is atomic. For net-next, we can get rid of this spinlock because we already hold the RTNL lock on slow path, and on fast path we can use RCU to protect the metalist. Joint work with Jamal. Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * htb: call qdisc_root with rcu read lock heldFlorian Westphal2016-06-151-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | saw a debug splat: net/include/net/sch_generic.h:287 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by kworker/2:1/710: #0: ("events"){.+.+.+}, at: [<ffffffff8106ca1d>] #1: ((&q->work)){+.+...}, at: [<ffffffff8106ca1d>] process_one_work+0x14d/0x690 Workqueue: events htb_work_func Call Trace: [<ffffffff812dc763>] dump_stack+0x85/0xc2 [<ffffffff8109fee7>] lockdep_rcu_suspicious+0xe7/0x120 [<ffffffff814ced47>] htb_work_func+0x67/0x70 Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net sched actions: bug fix dumping actions directly didnt produce NLMSG_DONEJamal Hadi Salim2016-06-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This refers to commands to direct action access as follows: sudo tc actions add action drop index 12 sudo tc actions add action pipe index 10 And then dumping them like so: sudo tc actions ls action gact iproute2 worked because it depended on absence of TCA_ACT_TAB TLV as end of message. This fix has been tested with iproute2 and is backward compatible. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * act_ipt: fix a bind refcnt leakWANG Cong2016-06-151-2/+5
| | | | | | | | | | | | | | | | | | | | And avoid calling tcf_hash_check() twice. Fixes: a57f19d30b2d ("net sched: ipt action fix late binding") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net_sched: prio: insure proper transactional behaviorEric Dumazet2016-06-151-34/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now prio_init() can return -ENOMEM, it also has to make sure any allocated qdiscs are freed, since the caller (qdisc_create()) wont call ->destroy() handler for us. More generally, we want a transactional behavior for "tc qdisc change ...", so prio_tune() should not make modifications if any error is returned. It means that we must validate parameters and allocate missing qdisc(s) before taking root qdisc lock exactly once, to not leave the prio qdisc in an intermediate state. Fixes: cbdf45116478 ("net_sched: prio: properly report out of memory errors") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net_sched: fix pfifo_head_drop behavior vs backlogEric Dumazet2016-06-141-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the qdisc is full, we drop a packet at the head of the queue, queue the current skb and return NET_XMIT_CN Now we track backlog on upper qdiscs, we need to call qdisc_tree_reduce_backlog(), even if the qlen did not change. Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net_sched: prio: properly report out of memory errorsEric Dumazet2016-06-121-20/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At Qdisc creation or change time, prio_tune() creates missing pfifo qdiscs but does not return an error code if one qdisc could not be allocated. Leaving a qdisc in non operational state without telling user anything about this problem is not good. Also, testing if we replace something different than noop_qdisc a second time makes no sense so I removed useless code. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: netem: do not call qdisc_drop() with a NULL skbEric Dumazet2016-06-291-4/+8
| | | | | | | | | | | | | | | | | | | | If skb_unshare() fails, we call qdisc_drop() with a NULL skb, which is no longer supported. Fixes: 520ac30f4551 ("net_sched: drop packets after root qdisc lock is released") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: generalize bulk dequeueEric Dumazet2016-06-251-10/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When qdisc bulk dequeue was added in linux-3.18 (commit 5772e9a3463b "qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE"), it was constrained to some specific qdiscs. With some extra care, we can extend this to all qdiscs, so that typical traffic shaping solutions can benefit from small batches (8 packets in this patch). For example, HTB is often used on some multi queue device. And bonding/team are multi queue devices... Idea is to bulk-dequeue packets mapping to the same transmit queue. This brings between 35 and 80 % performance increase in HTB setup under pressure on a bonding setup : 1) NUMA node contention : 610,000 pps -> 1,110,000 pps 2) No node contention : 1,380,000 pps -> 1,930,000 pps Now we should work to add batches on the enqueue() side ;) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Florian Westphal <fw@strlen.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: sch_htb: export class backlog in dumpsEric Dumazet2016-06-251-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We already get child qdisc qlen, we also can get its backlog so that class dumps can report it. Also replace qstats by a single drop counter, but move it in a separate cache line so that drops do not dirty useful cache lines. Tested: $ tc -s cl sh dev eth0 class htb 1:1 root leaf 3: prio 0 rate 1Gbit ceil 1Gbit burst 500000b cburst 500000b Sent 2183346912 bytes 9021815 pkt (dropped 2340774, overlimits 0 requeues 0) rate 1001Mbit 517543pps backlog 120758b 499p requeues 0 lended: 9021770 borrowed: 0 giants: 0 tokens: 9 ctokens: 9 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: fq_codel: cache skb->truesize into skb->cbEric Dumazet2016-06-251-3/+4
| | | | | | | | | | | | | | | | | | | | Now we defer skb drops, it makes sense to keep a copy of skb->truesize in struct codel_skb_cb to avoid one cache line miss per dropped skb in fq_codel_drop(), to reduce latencies a bit further. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: drop packets after root qdisc lock is releasedEric Dumazet2016-06-2526-101/+135
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Qdisc performance suffers when packets are dropped at enqueue() time because drops (kfree_skb()) are done while qdisc lock is held, delaying a dequeue() draining the queue. Nominal throughput can be reduced by 50 % when this happens, at a time we would like the dequeue() to proceed as fast as possible. Even FQ is vulnerable to this problem, while one of FQ goals was to provide some flow isolation. This patch adds a 'struct sk_buff **to_free' parameter to all qdisc->enqueue(), and in qdisc_drop() helper. I measured a performance increase of up to 12 %, but this patch is a prereq so that future batches in enqueue() can fly. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net, cls: also reject deleting all filters when TCA_KIND presentDaniel Borkmann2016-06-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | When we check for RTM_DELTFILTER, we should also reject the request for deleting all filters under a given parent when TCA_KIND attribute is present. If present, it's currently just ignored but there's also no point to let it pass in the first place either since this doesn't have any meaning with wild-card removal. Fixes: ea7f8277f907 ("net, cls: allow for deleting all filters for given parent") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: sch_fq: defer skb freeingEric Dumazet2016-06-151-1/+1
| | | | | | | | | | | | | | sfq_reset() can use rtnl_kfree_skbs() instead of kfree_skb() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: sch_pie: defer skb freeingEric Dumazet2016-06-151-1/+1
| | | | | | | | | | | | | | | | | | | | pie_change() can use rtnl_qdisc_drop() to benefit from deferred freeing. pie_reset() is already using qdisc_reset_queue() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: sch_netem: defer skb freeingEric Dumazet2016-06-151-3/+1
| | | | | | | | | | | | | | | | | | | | | | rtnl_kfree_skbs() can be used in tfifo_reset() It would be nice if we could iterate through rb tree instead of removing one skb at a time, and build a single skb chain. But this is left for a future patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: sch_htb: defer skb freeingEric Dumazet2016-06-151-2/+2
| | | | | | | | | | | | | | | | | | Both htb_reset() and htb_destroy() can use __qdisc_reset_queue() instead of __skb_queue_purge() to defer skb freeing of internal queues. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: sch_hhf: defer skb freeingEric Dumazet2016-06-151-2/+2
| | | | | | | | | | | | | | Both hhf_reset() and hhf_change() can use rtnl_kfree_skbs() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: fq_codel: defer skb freeingEric Dumazet2016-06-151-8/+9
| | | | | | | | | | | | | | Both fq_codel_change() and fq_codel_reset() can use rtnl_kfree_skbs() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: sch_fq: defer skb freeingEric Dumazet2016-06-151-6/+13
| | | | | | | | | | | | | | Both fq_change() and fq_reset() can use rtnl_kfree_skbs() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: sch_codel: defer skb freeing in codel_change()Eric Dumazet2016-06-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | codel_change() can use rtnl_qdisc_drop() to defer expensive skb freeing after locks are released. codel_reset() already has support for deferred skb freeing because it uses qdisc_reset_queue() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: sch_choke: defer skb freeingEric Dumazet2016-06-151-4/+4
| | | | | | | | | | | | | | | | choke_reset() and choke_change() can use rtnl_qdisc_drop() to defer expensive skb freeing after locks are released. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: add the ability to defer skb freeingEric Dumazet2016-06-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | qdisc are changed under RTNL protection and often while blocking BH and root qdisc spinlock. When lots of skbs need to be dropped, we free them under these locks causing TX/RX freezes, and more generally latency spikes. This commit adds rtnl_kfree_skbs(), used to queue skbs for deferred freeing. Actual freeing happens right after RTNL is released, with appropriate scheduling points. rtnl_qdisc_drop() can also be used in place of disc_drop() when RTNL is held. qdisc_reset_queue() and __qdisc_reset_queue() get the new behavior, so standard qdiscs like pfifo, pfifo_fast... have their ->reset() method automatically handled. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: make tcf_hash_check() booleanWANG Cong2016-06-157-11/+16
| | | | | | | | | | | | | | Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | act_police: rename tcf_act_police_locate() to tcf_act_police_init()WANG Cong2016-06-151-4/+4
| | | | | | | | | | | | | | | | | | This function is just ->init(), rename it to make it obvious. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: remove internal use of TC_POLICE_*WANG Cong2016-06-152-3/+3
| | | | | | | | | | | | | | | | | | | | | | These should be gone when we removed CONFIG_NET_CLS_POLICE. We can not totally remove them since they are exposed to userspace. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/sched: flower: Return error when hw can't offload and skip_sw is setAmir Vadai2016-06-141-17/+25
| | | | | | | | | | | | | | | | | | | | | | When skip_sw is set and hardware fails to apply filter, return error to user. This will make error propagation logic similar to the one currently used in u32 classifier. Also, changed code to use tc_skip_sw() utility function. Signed-off-by: Amir Vadai <amirva@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sched: remove NET_XMIT_POLICEDFlorian Westphal2016-06-122-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sch_atm returns this when TC_ACT_SHOT classification occurs. But all other schedulers that use tc_classify (htb, hfsc, drr, fq_codel ...) return NET_XMIT_SUCCESS | __BYPASS in this case so just do that in atm. BATMAN uses it as an intermediate return value to signal forwarding vs. buffering, but it did not return POLICED to callers outside of BATMAN. Reviewed-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: remove generic throttled managementEric Dumazet2016-06-107-17/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __QDISC_STATE_THROTTLED bit manipulation is rather expensive for HTB and few others. I already removed it for sch_fq in commit f2600cf02b5b ("net: sched: avoid costly atomic operation in fq_dequeue()") and so far nobody complained. When one ore more packets are stuck in one or more throttled HTB class, a htb dequeue() performs two atomic operations to clear/set __QDISC_STATE_THROTTLED bit, while root qdisc lock is held. Removing this pair of atomic operations bring me a 8 % performance increase on 200 TCP_RR tests, in presence of throttled classes. This patch has no side effect, since nothing actually uses disc_is_throttled() anymore. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: netem: remove qdisc_is_throttled() useEric Dumazet2016-06-101-3/+0
| | | | | | | | | | | | | | | | | | | | Looks like it is only there as some optimization attempt. Since __QDISC_STATE_THROTTLED set/unset is way too expensive, and netem is the last user, just remove this check. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud