| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Add support for reading switch registers with 'ethtool -d'.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
| |
On some chips it is possible to access the switch eeprom.
Add infrastructure support for it.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
| |
Some switches provide chip temperature data.
Add support for reporting it through the hwmon subsystem.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Setting skb->protocol to a private protocol type may result in warning
messages such as
e1000e 0000:00:19.0 em1: checksum_partial proto=dada!
This happens if the L3 protocol is IP or IPv6 and skb->ip_summed is set
to CHECKSUM_PARTIAL. Looking through the code, it appears that changing
skb->protocol for transmitted packets is not necessary and may actually
be harmful. For example, it prevents purposely unmodified (from a DSA
perspective) network drivers from properly setting up their transmit
checksum offload pointers since they inspect skb->protocol to set up the
IPv4 header or IPv6 header pointers. So don't unnecessarily change the
protocol field.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In neigh_parms_release() we loop over all entries to find the entry given in
argument and being able to remove it from the list. By using a double linked
list, we can avoid this loop.
Here are some numbers with 30 000 dummy interfaces configured:
Before the patch:
$ time rmmod dummy
real 2m0.118s
user 0m0.000s
sys 1m50.048s
After the patch:
$ time rmmod dummy
real 1m9.970s
user 0m0.000s
sys 0m47.976s
Suggested-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
napi_schedule() can be called from any context and has to mask hard
irqs.
Add a variant that can only be called from hard interrupts handlers
or when irqs are already masked.
Many NIC drivers can use it from their hard IRQ handler instead of
generic variant.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a sysctl that causes an interface's optimistic addresses
to be considered equivalent to other non-deprecated addresses
for source address selection purposes. Preferred addresses
will still take precedence over optimistic addresses, subject
to other ranking in the source address selection algorithm.
This is useful where different interfaces are connected to
different networks from different ISPs (e.g., a cell network
and a home wifi network).
The current behaviour complies with RFC 3484/6724, and it
makes sense if the host has only one interface, or has
multiple interfaces on the same network (same or cooperating
administrative domain(s), but not in the multiple distinct
networks case.
For example, if a mobile device has an IPv6 address on an LTE
network and then connects to IPv6-enabled wifi, while the wifi
IPv6 address is undergoing DAD, IPv6 connections will try use
the wifi default route with the LTE IPv6 address, and will get
stuck until they time out.
Also, because optimistic nodes can receive frames, issue
an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMSTIC
flag appropriately set). A second RTM_NEWADDR is sent if DAD
completes (the address flags have changed), otherwise an
RTM_DELADDR is sent.
Also: add an entry in ip-sysctl.txt for optimistic_dad.
Signed-off-by: Erik Kline <ek@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While testing upcoming Yaogong patch (converting out of order queue
into an RB tree), I hit the max reordering level of linux TCP stack.
Reordering level was limited to 127 for no good reason, and some
network setups [1] can easily reach this limit and get limited
throughput.
Allow a new max limit of 300, and add a sysctl to allow admins to even
allow bigger (or lower) values if needed.
[1] Aggregation of links, per packet load balancing, fabrics not doing
deep packet inspections, alternative TCP congestion modules...
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch generalizes commit d6a4a1041176 ("tcp: GSO should be TSQ
friendly") to protocols using skb_set_owner_w()
TCP uses its own destructor (tcp_wfree) and needs a more complex scheme
as explained in commit 6ff50cd55545 ("tcp: gso: do not generate out of
order packets")
This allows UDP sockets using UFO to get proper backpressure,
thus avoiding qdisc drops and excessive cpu usage.
Here are performance test results (macvlan on vlan):
- Before
# netperf -t UDP_STREAM ...
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
212992 65507 60.00 144096 1224195 1258.56
212992 60.00 51 0.45
Average: CPU %user %nice %system %iowait %steal %idle
Average: all 0.23 0.00 25.26 0.08 0.00 74.43
- After
# netperf -t UDP_STREAM ...
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
212992 65507 60.00 109593 0 957.20
212992 60.00 109593 957.20
Average: CPU %user %nice %system %iowait %steal %idle
Average: all 0.18 0.00 8.38 0.02 0.00 91.43
[edumazet] Rewrote patch and changelog.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
| |
ERROR: "lockdep_ovsl_is_held" [net/openvswitch/vport-gre.ko] undefined!
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original motivation for this change was to allow the helper to be used
in files other than actions.c as part of work on an odp select group
action.
It was as pointed out by Thomas Graf that this helper would be best off
living in netlink.h. Furthermore, I think that the generic nature of this
helper means it is best off in netlink.h regardless of if it is used more
than one .c file or not. Thus, I would like it considered independent of
the work on an odp select group action.
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The internal and netdev vport remain part of openvswitch.ko. Encap
vports including vxlan, gre, and geneve can be built as separate
modules and are loaded on demand. Modules can be unloaded after use.
Datapath ports keep a reference to the vport module during their
lifetime.
Allows to remove the error prone maintenance of the global list
vport_ops_list.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This feature is defined in IEEE Std 802.11-2012, 10.23.13. It allows
the AP devices to keep track of the hardware-address-to-IP-address
mapping of the mobile devices within the WLAN network.
The AP will learn this mapping via observing DHCP, ARP, and NS/NA
frames. When a request for such information is made (i.e. ARP request,
Neighbor Solicitation), the AP will respond on behalf of the
associated mobile device. In the process of doing so, the AP will drop
the multicast request frame that was intended to go out to the wireless
medium.
It was recommended at the LKS workshop to do this implementation in
the bridge layer. vxlan.c is already doing something very similar.
The DHCP snooping code will be added to the userspace application
(hostapd) per the recommendation.
This RFC commit is only for IPv4. A similar approach in the bridge
layer will be taken for IPv6 as well.
Signed-off-by: Kyeyoon Park <kyeyoonp@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
| |
Let compiler decide what to do with static void __ipxitf_put()
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
| |
use %08X instead of %08lX and remove casting.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
| |
include ipx.h from sysctl_net_ipx.c
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
| |
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
| |
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
| |
unsigned char *sha (source) was already in original git version
but was never used.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
| |
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
| |
See Documentation/CodingStyle Chapter 6
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch save the fn before doing rt6_backtrack.
Hence, without redo-ing the fib6_lookup(), saved_fn can be used
to redo rt6_select() with RT6_LOOKUP_F_REACHABLE off.
Some minor changes I think make sense to review as a single patch:
* Remove the 'out:' goto label.
* Remove the 'reachable' variable. Only use the 'strict' variable instead.
After this patch, "failing ip6_ins_rt()" should be the only case that
requires a redo of fib6_lookup().
Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
| |
When there is a RTF_CACHE hit, no need to redo fib6_lookup()
with reachable=0.
Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is the prep work to reduce the number of calls to fib6_lookup().
The BACKTRACK macro could be hard-to-read and error-prone due to
its side effects (mainly goto).
This patch is to:
1. Replace BACKTRACK macro with a function (fib6_backtrack) with the following
return values:
* If it is backtrack-able, returns next fn for retry.
* If it reaches the root, returns NULL.
2. The caller needs to decide if a backtrack is needed (by testing
rt == net->ipv6.ip6_null_entry).
3. Rename the goto labels in ip6_pol_route() to make the next few
patches easier to read.
Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
| |
Remove trailing whitespace in tcp.h icmp.c syncookies.c
Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The commit "net: Save TX flow hash in sock and set in skbuf on xmit"
introduced the inet_set_txhash() and ip6_set_txhash() routines to calculate
and record flow hash(sk_txhash) in the socket structure. sk_txhash is used
to set skb->hash which is used to spread flows across multiple TXQs.
But, the above routines are invoked before the source port of the connection
is created. Because of this all outgoing connections that just differ in the
source port get hashed into the same TXQ.
This patch fixes this problem for IPv4/6 by invoking the the above routines
after the source port is available for the socket.
Fixes: b73c3d0e4("net: Save TX flow hash in sock and set in skbuf on xmit")
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
| |
pskb_may_pull() maybe change skb->data and make nh and exthdr pointer
oboslete, so recompute the nd and exthdr
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The crafted header start address is from a driver supplied buffer, which
one can reasonably expect to be aligned on a 4-bytes boundary.
However ATM the TSO helper API is only used by ethernet drivers and
the tcp header will then be aligned to a 2-bytes only boundary from the
header start address.
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
Use netdev_alloc_pcpu_stats to allocate percpu stats and initialize syncp.
Fixes: 22e0f8b9322c "net: sched: make bstats per cpu and estimator RCU safe"
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The synchronize_rcu() in netlink_release() introduces unacceptable
latency. Reintroduce minimal lookup so we can drop the
synchronize_rcu() until socket destruction has been RCUfied.
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When running tipcTC&tipcTS test suite, below lockdep unsafe locking
scenario is reported:
[ 1109.997854]
[ 1109.997988] =================================
[ 1109.998290] [ INFO: inconsistent lock state ]
[ 1109.998575] 3.17.0-rc1+ #113 Not tainted
[ 1109.998762] ---------------------------------
[ 1109.998762] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 1109.998762] swapper/7/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 1109.998762] (slock-AF_TIPC){+.?...}, at: [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762] {SOFTIRQ-ON-W} state was registered at:
[ 1109.998762] [<ffffffff810a4770>] __lock_acquire+0x6a0/0x1d80
[ 1109.998762] [<ffffffff810a6555>] lock_acquire+0x95/0x1e0
[ 1109.998762] [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80
[ 1109.998762] [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762] [<ffffffffa0004fe8>] tipc_link_xmit+0xa8/0xc0 [tipc]
[ 1109.998762] [<ffffffffa000ec6f>] tipc_sendmsg+0x15f/0x550 [tipc]
[ 1109.998762] [<ffffffffa000f165>] tipc_connect+0x105/0x140 [tipc]
[ 1109.998762] [<ffffffff817676ee>] SYSC_connect+0xae/0xc0
[ 1109.998762] [<ffffffff81767b7e>] SyS_connect+0xe/0x10
[ 1109.998762] [<ffffffff817a9788>] compat_SyS_socketcall+0xb8/0x200
[ 1109.998762] [<ffffffff81a306e5>] sysenter_dispatch+0x7/0x1f
[ 1109.998762] irq event stamp: 241060
[ 1109.998762] hardirqs last enabled at (241060): [<ffffffff8105a4ad>] __local_bh_enable_ip+0x6d/0xd0
[ 1109.998762] hardirqs last disabled at (241059): [<ffffffff8105a46f>] __local_bh_enable_ip+0x2f/0xd0
[ 1109.998762] softirqs last enabled at (241020): [<ffffffff81059a52>] _local_bh_enable+0x22/0x50
[ 1109.998762] softirqs last disabled at (241021): [<ffffffff8105a626>] irq_exit+0x96/0xc0
[ 1109.998762]
[ 1109.998762] other info that might help us debug this:
[ 1109.998762] Possible unsafe locking scenario:
[ 1109.998762]
[ 1109.998762] CPU0
[ 1109.998762] ----
[ 1109.998762] lock(slock-AF_TIPC);
[ 1109.998762] <Interrupt>
[ 1109.998762] lock(slock-AF_TIPC);
[ 1109.998762]
[ 1109.998762] *** DEADLOCK ***
[ 1109.998762]
[ 1109.998762] 2 locks held by swapper/7/0:
[ 1109.998762] #0: (rcu_read_lock){......}, at: [<ffffffff81782dc9>] __netif_receive_skb_core+0x69/0xb70
[ 1109.998762] #1: (rcu_read_lock){......}, at: [<ffffffffa0001c90>] tipc_l2_rcv_msg+0x40/0x260 [tipc]
[ 1109.998762]
[ 1109.998762] stack backtrace:
[ 1109.998762] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.17.0-rc1+ #113
[ 1109.998762] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 1109.998762] ffffffff82745830 ffff880016c03828 ffffffff81a209eb 0000000000000007
[ 1109.998762] ffff880017b3cac0 ffff880016c03888 ffffffff81a1c5ef 0000000000000001
[ 1109.998762] ffff880000000001 ffff880000000000 ffffffff81012d4f 0000000000000000
[ 1109.998762] Call Trace:
[ 1109.998762] <IRQ> [<ffffffff81a209eb>] dump_stack+0x4e/0x68
[ 1109.998762] [<ffffffff81a1c5ef>] print_usage_bug+0x1f1/0x202
[ 1109.998762] [<ffffffff81012d4f>] ? save_stack_trace+0x2f/0x50
[ 1109.998762] [<ffffffff810a406c>] mark_lock+0x28c/0x2f0
[ 1109.998762] [<ffffffff810a3440>] ? print_irq_inversion_bug.part.46+0x1f0/0x1f0
[ 1109.998762] [<ffffffff810a467d>] __lock_acquire+0x5ad/0x1d80
[ 1109.998762] [<ffffffff810a70dd>] ? trace_hardirqs_on+0xd/0x10
[ 1109.998762] [<ffffffff8108ace8>] ? sched_clock_cpu+0x98/0xc0
[ 1109.998762] [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30
[ 1109.998762] [<ffffffff810a10dc>] ? lock_release_holdtime.part.29+0x1c/0x1a0
[ 1109.998762] [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90
[ 1109.998762] [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc]
[ 1109.998762] [<ffffffff810a6555>] lock_acquire+0x95/0x1e0
[ 1109.998762] [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762] [<ffffffff810a6fb6>] ? trace_hardirqs_on_caller+0xa6/0x1c0
[ 1109.998762] [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80
[ 1109.998762] [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762] [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc]
[ 1109.998762] [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762] [<ffffffffa00076bd>] tipc_rcv+0x5ed/0x960 [tipc]
[ 1109.998762] [<ffffffffa0001d1c>] tipc_l2_rcv_msg+0xcc/0x260 [tipc]
[ 1109.998762] [<ffffffffa0001c90>] ? tipc_l2_rcv_msg+0x40/0x260 [tipc]
[ 1109.998762] [<ffffffff81783345>] __netif_receive_skb_core+0x5e5/0xb70
[ 1109.998762] [<ffffffff81782dc9>] ? __netif_receive_skb_core+0x69/0xb70
[ 1109.998762] [<ffffffff81784eb9>] ? dev_gro_receive+0x259/0x4e0
[ 1109.998762] [<ffffffff817838f6>] __netif_receive_skb+0x26/0x70
[ 1109.998762] [<ffffffff81783acd>] netif_receive_skb_internal+0x2d/0x1f0
[ 1109.998762] [<ffffffff81785518>] napi_gro_receive+0xd8/0x240
[ 1109.998762] [<ffffffff815bf854>] e1000_clean_rx_irq+0x2c4/0x530
[ 1109.998762] [<ffffffff815c1a46>] e1000_clean+0x266/0x9c0
[ 1109.998762] [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30
[ 1109.998762] [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90
[ 1109.998762] [<ffffffff817842b1>] net_rx_action+0x141/0x310
[ 1109.998762] [<ffffffff810bd710>] ? handle_fasteoi_irq+0xe0/0x150
[ 1109.998762] [<ffffffff81059fa6>] __do_softirq+0x116/0x4d0
[ 1109.998762] [<ffffffff8105a626>] irq_exit+0x96/0xc0
[ 1109.998762] [<ffffffff81a30d07>] do_IRQ+0x67/0x110
[ 1109.998762] [<ffffffff81a2ee2f>] common_interrupt+0x6f/0x6f
[ 1109.998762] <EOI> [<ffffffff8100d2b7>] ? default_idle+0x37/0x250
[ 1109.998762] [<ffffffff8100d2b5>] ? default_idle+0x35/0x250
[ 1109.998762] [<ffffffff8100dd1f>] arch_cpu_idle+0xf/0x20
[ 1109.998762] [<ffffffff810999fd>] cpu_startup_entry+0x27d/0x4d0
[ 1109.998762] [<ffffffff81034c78>] start_secondary+0x188/0x1f0
When intra-node messages are delivered from one process to another
process, tipc_link_xmit() doesn't disable BH before it directly calls
tipc_sk_rcv() on process context to forward messages to destination
socket. Meanwhile, if messages delivered by remote node arrive at the
node and their destinations are also the same socket, tipc_sk_rcv()
running on process context might be preempted by tipc_sk_rcv() running
BH context. As a result, the latter cannot obtain the socket lock as
the lock was obtained by the former, however, the former has no chance
to be run as the latter is owning the CPU now, so headlock happens. To
avoid it, BH should be always disabled in tipc_sk_rcv().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Locking dependency detected below possible unsafe locking scenario:
CPU0 CPU1
T0: tipc_named_rcv() tipc_rcv()
T1: [grab nametble write lock]* [grab node lock]*
T2: tipc_update_nametbl() tipc_node_link_up()
T3: tipc_nodesub_subscribe() tipc_nametbl_publish()
T4: [grab node lock]* [grab nametble write lock]*
The opposite order of holding nametbl write lock and node lock on
above two different paths may result in a deadlock. If we move the
the updating of the name table after link state named out of node
lock, the reverse order of holding locks will be eliminated, and
as a result, the deadlock risk.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
if ->encapsulation is set we have to use inner_tcp_hdrlen and add the
size of the inner network headers too.
This is 'mostly harmless'; tbf might send skb that is slightly over
quota or drop skb even if it would have fit.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
skb_gso_segment has three possible return values:
1. a pointer to the first segmented skb
2. an errno value (IS_ERR())
3. NULL. This can happen when GSO is used for header verification.
However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
and would oops when NULL is returned.
Note that these call sites should never actually see such a NULL return
value; all callers mask out the GSO bits in the feature argument.
However, there have been issues with some protocol handlers erronously not
respecting the specified feature mask in some cases.
It is preferable to get 'have to turn off hw offloading, else slow' reports
rather than 'kernel crashes'.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
skb_gso_segment() has a 'features' argument representing offload features
available to the output path.
A few handlers, e.g. GRE, instead re-fetch the features of skb->dev and use
those instead of the provided ones when handing encapsulation/tunnels.
Depending on dev->hw_enc_features of the output device skb_gso_segment() can
then return NULL even when the caller has disabled all GSO feature bits,
as segmentation of inner header thinks device will take care of segmentation.
This e.g. affects the tbf scheduler, which will silently drop GRE-encap GSO skbs
that did not fit the remaining token quota as the segmentation does not work
when device supports corresponding hw offload capabilities.
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pablo Neira Ayuso says:
====================
netfilter fixes for net
The following patchset contains netfilter fixes for your net tree,
they are:
1) Fix missing MODULE_LICENSE() in the new nf_reject_ipv{4,6} modules.
2) Restrict nat and masq expressions to the nat chain type. Otherwise,
users may crash their kernel if they attach a nat/masq rule to a non
nat chain.
3) Fix hook validation in nft_compat when non-base chains are used.
Basically, initialize hook_mask to zero.
4) Make sure you use match/targets in nft_compat from the right chain
type. The existing validation relies on the table name which can be
avoided by
5) Better netlink attribute validation in nft_nat. This expression has
to reject the configuration when no address and proto configurations
are specified.
6) Interpret NFTA_NAT_REG_*_MAX if only if NFTA_NAT_REG_*_MIN is set.
Yet another sanity check to reject incorrect configurations from
userspace.
7) Conditional NAT attribute dumping depending on the existing
configuration.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| | |
Dump NFTA_NAT_REG_ADDR_MIN if this is non-zero. Same thing with
NFTA_NAT_REG_PROTO_MIN.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| | |
Interpret NFTA_NAT_REG_ADDR_MAX if NFTA_NAT_REG_ADDR_MIN is present,
otherwise, skip it. Same thing with NFTA_NAT_REG_PROTO_MAX.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
We have to validate that we at least get an NFTA_NAT_REG_ADDR_MIN or
NFTA_NFT_REG_PROTO_MIN attribute. Reject the configuration if none
of them are present.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We have to validate the real chain type to ensure that matches/targets
are not used out from their scope (eg. MASQUERADE in nat chain type).
The existing validation relies on the table name, but this is not
sufficient since userspace can fool us by using the appropriate table
name with a different chain type.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Set hook_mask to zero for non-base chains, otherwise people may hit
bogus errors from the xt_check_target() and xt_check_match() when
validating the uninitialized hook_mask.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This adds the missing validation code to avoid the use of nat/masq from
non-nat chains. The validation assumes two possible configuration
scenarios:
1) Use of nat from base chain that is not of nat type. Reject this
configuration from the nft_*_init() path of the expression.
2) Use of nat from non-base chain. In this case, we have to wait until
the non-base chain is referenced by at least one base chain via
jump/goto. This is resolved from the nft_*_validate() path which is
called from nf_tables_check_loops().
The user gets an -EOPNOTSUPP in both cases.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
[ 23.545204] nf_reject_ipv4: module license 'unspecified' taints kernel.
Fixes: c8d7b98 ("netfilter: move nf_send_resetX() code to nf_reject_ipvX modules")
Reported-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull networking fixes from David Miller:
"A quick batch of bug fixes:
1) Fix build with IPV6 disabled, from Eric Dumazet.
2) Several more cases of caching SKB data pointers across calls to
pskb_may_pull(), thus referencing potentially free'd memory. From
Li RongQing.
3) DSA phy code tests operation presence improperly, instead of going:
if (x->ops->foo)
r = x->ops->foo(args);
it was going:
if (x->ops->foo(args))
r = x->ops->foo(args);
Fix from Andew Lunn"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
Net: DSA: Fix checking for get_phy_flags function
ipv6: fix a potential use after free in sit.c
ipv6: fix a potential use after free in ip6_offload.c
ipv4: fix a potential use after free in gre_offload.c
tcp: fix build error if IPv6 is not enabled
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The check for the presence or not of the optional switch function
get_phy_flags() called the function, rather than checked to see if it
is a NULL pointer. This causes a derefernce of a NULL pointer on all
switch chips except the sf2, the only switch to implement this call.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 6819563e646a ("net: dsa: allow switch drivers to specify phy_device::dev_flags")
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
pskb_may_pull() maybe change skb->data and make iph pointer oboslete,
fix it by geting ip header length directly.
Fixes: ca15a078 (sit: generate icmpv6 error when receiving icmpv4 error)
Cc: Oussama Ghorbel <ghorbel@pivasoftware.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
pskb_may_pull() maybe change skb->data and make opth pointer oboslete,
so set the opth again
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
pskb_may_pull() may change skb->data and make greh pointer oboslete;
so need to reassign greh;
but since first calling pskb_may_pull already ensured that skb->data
has enough space for greh, so move the reference of greh before second
calling pskb_may_pull(), to avoid reassign greh.
Fixes: 7a7ffbabf9("ipv4: fix tunneled VM traffic over hw VXLAN/GRE GSO NIC")
Cc: Wei-Chun Chao <weichunc@plumgrid.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio updates from Rusty Russell:
"One cc: stable commit, the rest are a series of minor cleanups which
have been sitting in MST's tree during my vacation. I changed a
function name and made one trivial change, then they spent two days in
linux-next"
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits)
virtio-rng: refactor probe error handling
virtio_scsi: drop scan callback
virtio_balloon: enable VQs early on restore
virtio_scsi: fix race on device removal
virito_scsi: use freezable WQ for events
virtio_net: enable VQs early on restore
virtio_console: enable VQs early on restore
virtio_scsi: enable VQs early on restore
virtio_blk: enable VQs early on restore
virtio_scsi: move kick event out from virtscsi_init
virtio_net: fix use after free on allocation failure
9p/trans_virtio: enable VQs early
virtio_console: enable VQs early
virtio_blk: enable VQs early
virtio_net: enable VQs early
virtio: add API to enable VQs early
virtio_net: minor cleanup
virtio-net: drop config_mutex
virtio_net: drop config_enable
virtio-blk: drop config_mutex
...
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after probe returns, but virtio 9p device
adds self to channel list within probe, at which point VQ can be
used in violation of the spec.
To fix, call virtio_device_ready before using VQs.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|