| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
|
|
|
| |
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
|
|
|
| |
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
|
|
|
| |
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
In a later patch, we're going to need some atomic bit flags. Since that
field will need to be an unsigned long, we mitigate that space
consumption by migrating some other bitflags to the new field. Start
with the rq_secure flag.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|\
| |
| |
| |
| |
| |
| | |
Mainly what I need is 860a0d9e511f "sunrpc: add some tracepoints in
svc_rqst handling functions", which subsequent server rpc patches from
jlayton depend on. I'm merging this later tag on the assumption that's
more likely to be a tested and stable point.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add a new directory heirarchy under the debugfs sunrpc/ directory:
sunrpc/
rpc_xprt/
<xprt id>/
Within that directory, we can put files that give info about the
xprts. We do have the (minor) problem that there is no succinct,
unique identifier for rpc_xprts. So we generate them synthetically
with a static atomic_t counter.
For now, this directory just holds an "info" file, but we may add
other files to it in the future.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
It's possible to get a dump of the RPC task queue by writing a value to
/proc/sys/sunrpc/rpc_debug. If you write any value to that file, you get
a dump of the RPC client task list into the log buffer. This is a rather
inconvenient interface however, and makes it hard to get immediate info
about the task queue.
Add a new directory hierarchy under debugfs:
sunrpc/
rpc_clnt/
<clientid>/
Within each clientid directory we create a new "tasks" file that will
dump info similar to what shows up in the log buffer, but with a few
small differences -- we avoid printing raw kernel addresses in favor of
symbolic names and the XID is also displayed.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.linux-nfs.org/projects/anna/nfs-rdma into linux-next
Pull NFS client RDMA changes for 3.19 from Anna Schumaker:
"NFS: Client side changes for RDMA
These patches various bugfixes and cleanups for using NFS over RDMA, including
better error handling and performance improvements by using pad optimization.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>"
* tag 'nfs-rdma-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
xprtrdma: Display async errors
xprtrdma: Enable pad optimization
xprtrdma: Re-write rpcrdma_flush_cqs()
xprtrdma: Refactor tasklet scheduling
xprtrdma: unmap all FMRs during transport disconnect
xprtrdma: Cap req_cqinit
xprtrdma: Return an errno from rpcrdma_register_external()
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
An async error upcall is a hard error, and should be reported in
the system log.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The Linux NFS/RDMA server used to reject NFSv3 WRITE requests when
pad optimization was enabled. That bug was fixed by commit
e560e3b510d2 ("svcrdma: Add zero padding if the client doesn't send
it").
We can now enable pad optimization on the client, which helps
performance and is supported now by both Linux and Solaris servers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently rpcrdma_flush_cqs() attempts to avoid code duplication,
and simply invokes rpcrdma_recvcq_upcall and rpcrdma_sendcq_upcall.
1. rpcrdma_flush_cqs() can run concurrently with provider upcalls.
Both flush_cqs() and the upcalls were invoking ib_poll_cq() in
different threads using the same wc buffers (ep->rep_recv_wcs
and ep->rep_send_wcs), added by commit 1c00dd077654 ("xprtrmda:
Reduce calls to ib_poll_cq() in completion handlers").
During transport disconnect processing, this sometimes resulted
in the same reply getting added to the rpcrdma_tasklets_g list
more than once, which corrupted the list.
2. The upcall functions drain only a limited number of CQEs,
thanks to the poll budget added by commit 8301a2c047cc
("xprtrdma: Limit work done by completion handler").
Fixes: a7bc211ac926 ("xprtrdma: On disconnect, don't ignore ... ")
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=276
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Restore the separate function that schedules the reply handling
tasklet. I need to call it from two different paths.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When using RPCRDMA_MTHCAFMR memory registration, after a few
transport disconnect / reconnect cycles, ib_map_phys_fmr() starts to
return EINVAL because the provider has exhausted its map pool.
Make sure that all FMRs are unmapped during transport disconnect,
and that ->send_request remarshals them during an RPC retransmit.
This resets the transport's MRs to ensure that none are leaked
during a disconnect.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Recent work made FRMR registration and invalidation completions
unsignaled. This greatly reduces the adapter interrupt rate.
Every so often, however, a posted send Work Request is allowed to
signal. Otherwise, the provider's Work Queue will wrap and the
workload will hang.
The number of Work Requests that are allowed to remain unsignaled is
determined by the value of req_cqinit. Currently, this is set to the
size of the send Work Queue divided by two, minus 1.
For FRMR, the send Work Queue is the maximum number of concurrent
RPCs (currently 32) times the maximum number of Work Requests an
RPC might use (currently 7, though some adapters may need more).
For mlx4, this is 224 entries. This leaves completion signaling
disabled for 111 send Work Requests.
Some providers hold back dispatching Work Requests until a CQE is
generated. If completions are disabled, then no CQEs are generated
for quite some time, and that can stall the Work Queue.
I've seen this occur running xfstests generic/113 over NFSv4, where
eventually, posting a FAST_REG_MR Work Request fails with -ENOMEM
because the Work Queue has overflowed. The connection is dropped
and re-established.
Cap the rep_cqinit setting so completions are not left turned off
for too long.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=269
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The RPC/RDMA send_request method and the chunk registration code
expects an errno from the registration function. This allows
the upper layers to distinguish between a recoverable failure
(for example, temporary memory exhaustion) and a hard failure
(for example, a bug in the registration logic).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.linux-nfs.org/projects/anna/nfs-rdma into linux-next
Pull pull additional NFS client changes for 3.19 from Anna Schumaker:
"NFS: Generic client side changes from Chuck
These patches fixes for iostats and SETCLIENTID in addition to cleaning
up the nfs4_init_callback() function.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>"
* tag 'nfs-cel-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
NFS: Clean up nfs4_init_callback()
NFS: SETCLIENTID XDR buffer sizes are incorrect
SUNRPC: serialize iostats updates
|
| | |/
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Occasionally mountstats reports a negative retransmission rate.
Ensure that two RPCs completing concurrently don't confuse the sums
in the transport's op_metrics array.
Since pNFS filelayout can invoke rpc_count_iostats() on another
transport from xprt_release(), we can't rely on simply holding the
transport_lock in xprt_release(). There's nothing for it but hard
serialization. One spin lock per RPC operation should make this as
painless as it can be.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
It's always set to the same value as CONFIG_TRACEPOINTS, so we can just
use that instead.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
It's always set to whatever CONFIG_SUNRPC_DEBUG is, so just use that.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add tracepoints inside the main loop on xs_tcp_data_recv that allow
us to keep an eye on what's happening during each phase of it.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
...so we can keep track of when calls are sent and replies received.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |/
| |
| |
| |
| |
| |
| | |
...just around svc_send, svc_recv and svc_process for now.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull networking fixes from David Miller:
1) Fix BUG when decrypting empty packets in mac80211, from Ronald Wahl.
2) nf_nat_range is not fully initialized and this is copied back to
userspace, from Daniel Borkmann.
3) Fix read past end of b uffer in netfilter ipset, also from Dan
Carpenter.
4) Signed integer overflow in ipv4 address mask creation helper
inet_make_mask(), from Vincent BENAYOUN.
5) VXLAN, be2net, mlx4_en, and qlcnic need ->ndo_gso_check() methods to
properly describe the device's capabilities, from Joe Stringer.
6) Fix memory leaks and checksum miscalculations in openvswitch, from
Pravin B SHelar and Jesse Gross.
7) FIB rules passes back ambiguous error code for unreachable routes,
making behavior confusing for userspace. Fix from Panu Matilainen.
8) ieee802154fake_probe() doesn't release resources properly on error,
from Alexey Khoroshilov.
9) Fix skb_over_panic in add_grhead(), from Daniel Borkmann.
10) Fix access of stale slave pointers in bonding code, from Nikolay
Aleksandrov.
11) Fix stack info leak in PPP pptp code, from Mathias Krause.
12) Cure locking bug in IPX stack, from Jiri Bohac.
13) Revert SKB fclone memory freeing optimization that is racey and can
allow accesses to freed up memory, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (71 commits)
tcp: Restore RFC5961-compliant behavior for SYN packets
net: Revert "net: avoid one atomic operation in skb_clone()"
virtio-net: validate features during probe
cxgb4 : Fix DCB priority groups being returned in wrong order
ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg
openvswitch: Don't validate IPv6 label masks.
pptp: fix stack info leak in pptp_getname()
brcmfmac: don't include linux/unaligned/access_ok.h
cxgb4i : Don't block unload/cxgb4 unload when remote closes TCP connection
ipv6: delete protocol and unregister rtnetlink when cleanup
net/mlx4_en: Add VXLAN ndo calls to the PF net device ops too
bonding: fix curr_active_slave/carrier with loadbalance arp monitoring
mac80211: minstrel_ht: fix a crash in rate sorting
vxlan: Inline vxlan_gso_check().
can: m_can: update to support CAN FD features
can: m_can: fix incorrect error messages
can: m_can: add missing delay after setting CCCR_INIT bit
can: m_can: fix not set can_dlc for remote frame
can: m_can: fix possible sleep in napi poll
can: m_can: add missing message RAM initialization
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Commit c3ae62af8e755 ("tcp: should drop incoming frames without ACK
flag set") was created to mitigate a security vulnerability in which a
local attacker is able to inject data into locally-opened sockets by
using TCP protocol statistics in procfs to quickly find the correct
sequence number.
This broke the RFC5961 requirement to send a challenge ACK in response
to spurious RST packets, which was subsequently fixed by commit
7b514a886ba50 ("tcp: accept RST without ACK flag").
Unfortunately, the RFC5961 requirement that spurious SYN packets be
handled in a similar manner remains broken.
RFC5961 section 4 states that:
... the handling of the SYN in the synchronized state SHOULD be
performed as follows:
1) If the SYN bit is set, irrespective of the sequence number, TCP
MUST send an ACK (also referred to as challenge ACK) to the remote
peer:
<SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
After sending the acknowledgment, TCP MUST drop the unacceptable
segment and stop processing further.
By sending an ACK, the remote peer is challenged to confirm the loss
of the previous connection and the request to start a new connection.
A legitimate peer, after restart, would not have a TCB in the
synchronized state. Thus, when the ACK arrives, the peer should send
a RST segment back with the sequence number derived from the ACK
field that caused the RST.
This RST will confirm that the remote peer has indeed closed the
previous connection. Upon receipt of a valid RST, the local TCP
endpoint MUST terminate its connection. The local TCP endpoint
should then rely on SYN retransmission from the remote end to
re-establish the connection.
This patch lets SYN packets through the discard added in c3ae62af8e755,
so that spurious SYN packets are properly dealt with as per the RFC.
The challenge ACK is sent unconditionally and is rate-limited, so the
original vulnerability is not reintroduced by this patch.
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Not sure what I was thinking, but doing anything after
releasing a refcount is suicidal or/and embarrassing.
By the time we set skb->fclone to SKB_FCLONE_FREE, another cpu
could have released last reference and freed whole skb.
We potentially corrupt memory or trap if CONFIG_DEBUG_PAGEALLOC is set.
Reported-by: Chris Mason <clm@fb.com>
Fixes: ce1a4ea3f1258 ("net: avoid one atomic operation in skb_clone()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |\
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains two bugfixes for your net tree, they are:
1) Validate netlink group from nfnetlink to avoid an out of bound array
access. This should only happen with superuser priviledges though.
Discovered by Andrey Ryabinin using trinity.
2) Don't push ethernet header before calling the netfilter output hook
for multicast traffic, this breaks ebtables since it expects to see
skb->data pointing to the network header, patch from Linus Luessing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Ebtables on the OUTPUT chain (NF_BR_LOCAL_OUT) would not work as expected
for both locally generated IGMP and MLD queries. The IP header specific
filter options are off by 14 Bytes for netfilter (actual output on
interfaces is fine).
NF_HOOK() expects the skb->data to point to the IP header, not the
ethernet one (while dev_queue_xmit() does not). Luckily there is an
br_dev_queue_push_xmit() helper function already - let's just use that.
Introduced by eb1d16414339a6e113d89e2cca2556005d7ce919
("bridge: Add core IGMP snooping support")
Ebtables example:
$ ebtables -I OUTPUT -p IPv6 -o eth1 --logical-out br0 \
--log --log-level 6 --log-ip6 --log-prefix="~EBT: " -j DROP
before (broken):
~EBT: IN= OUT=eth1 MAC source = 02:04:64:a4:39:c2 \
MAC dest = 33:33:00:00:00:01 proto = 0x86dd IPv6 \
SRC=64a4:39c2:86dd:6000:0000:0020:0001:fe80 IPv6 \
DST=0000:0000:0000:0004:64ff:fea4:39c2:ff02, \
IPv6 priority=0x3, Next Header=2
after (working):
~EBT: IN= OUT=eth1 MAC source = 02:04:64:a4:39:c2 \
MAC dest = 33:33:00:00:00:01 proto = 0x86dd IPv6 \
SRC=fe80:0000:0000:0000:0004:64ff:fea4:39c2 IPv6 \
DST=ff02:0000:0000:0000:0000:0000:0000:0001, \
IPv6 priority=0x0, Next Header=0
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Make sure the netlink group exists, otherwise you can trigger an out
of bound array memory access from the netlink_bind() path. This splat
can only be triggered only by superuser.
[ 180.203600] UBSan: Undefined behaviour in ../net/netfilter/nfnetlink.c:467:28
[ 180.204249] index 9 is out of range for type 'int [9]'
[ 180.204697] CPU: 0 PID: 1771 Comm: trinity-main Not tainted 3.18.0-rc4-mm1+ #122
[ 180.205365] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
+04/01/2014
[ 180.206498] 0000000000000018 0000000000000000 0000000000000009 ffff88007bdf7da8
[ 180.207220] ffffffff82b0ef5f 0000000000000092 ffffffff845ae2e0 ffff88007bdf7db8
[ 180.207887] ffffffff8199e489 ffff88007bdf7e18 ffffffff8199ea22 0000003900000000
[ 180.208639] Call Trace:
[ 180.208857] dump_stack (lib/dump_stack.c:52)
[ 180.209370] ubsan_epilogue (lib/ubsan.c:174)
[ 180.209849] __ubsan_handle_out_of_bounds (lib/ubsan.c:400)
[ 180.210512] nfnetlink_bind (net/netfilter/nfnetlink.c:467)
[ 180.210986] netlink_bind (net/netlink/af_netlink.c:1483)
[ 180.211495] SYSC_bind (net/socket.c:1541)
Moreover, define the missing nf_tables and nf_acct multicast groups too.
Reported-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |\ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:
====================
pull request: wireless 2014-11-20
Please full this little batch of fixes intended for the 3.18 stream!
For the mac80211 patch, Johannes says:
"Here's another last minute fix, for minstrel HT crashing
depending on the value of some uninitialised stack."
On top of that...
Ben Greear fixes an ath9k regression in which a BSSID mask is
miscalculated.
Dmitry Torokhov corrects an error handling routing in brcmfmac which
was checking an unsigned variable for a negative value.
Johannes Berg avoids a build problem in brcmfmac for arches where
linux/unaligned/access_ok.h and asm/unaligned.h conflict.
Mathy Vanhoef addresses another brcmfmac issue so as to eliminate a
use-after-free of the URB transfer buffer if a timeout occurs.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |\ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg <johannes@sipsolutions.net> says:
"Here's another last minute fix, for minstrel HT crashing
depending on the value of some uninitialised stack."
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The commit 5935839ad73583781b8bbe8d91412f6826e218a4
"mac80211: improve minstrel_ht rate sorting by throughput & probability"
introduced a crash on rate sorting that occurs when the rate added to
the sorting array is faster than all the previous rates. Due to an
off-by-one error, it reads the rate index from tp_list[-1], which
contains uninitialized stack garbage, and then uses the resulting index
for accessing the group rate stats, leading to a crash if the garbage
value is big enough.
Cc: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This fixes an old regression introduced by commit
b0d0d915 (ipx: remove the BKL).
When a recvmsg syscall blocks waiting for new data, no data can be sent on the
same socket with sendmsg because ipx_recvmsg() sleeps with the socket locked.
This breaks mars-nwe (NetWare emulator):
- the ncpserv process reads the request using recvmsg
- ncpserv forks and spawns nwconn
- ncpserv calls a (blocking) recvmsg and waits for new requests
- nwconn deadlocks in sendmsg on the same socket
Commit b0d0d915 has simply replaced BKL locking with
lock_sock/release_sock. Unlike now, BKL got unlocked while
sleeping, so a blocking recvmsg did not block a concurrent
sendmsg.
Only keep the socket locked while actually working with the socket data and
release it prior to calling skb_recv_datagram().
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
When userspace doesn't provide a mask, OVS datapath generates a fully
unwildcarded mask for the flow by copying the flow and setting all bits
in all fields. For IPv6 label, this creates a mask that matches on the
upper 12 bits, causing the following error:
openvswitch: netlink: Invalid IPv6 flow label value (value=ffffffff, max=fffff)
This patch ignores the label validation check for masks, avoiding this
error.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |_|/
| | |/| |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
pim6_protocol was added when initiation, but it not deleted.
Similarly, unregister RTNL_FAMILY_IP6MR rtnetlink.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
It has been reported that generating an MLD listener report on
devices with large MTUs (e.g. 9000) and a high number of IPv6
addresses can trigger a skb_over_panic():
skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
dev:port1
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:100!
invalid opcode: 0000 [#1] SMP
Modules linked in: ixgbe(O)
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
[...]
Call Trace:
<IRQ>
[<ffffffff80578226>] ? skb_put+0x3a/0x3b
[<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
[<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
[<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
[<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
[<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
[<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
[<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
[<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
[<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
[<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70
mld_newpack() skb allocations are usually requested with dev->mtu
in size, since commit 72e09ad107e7 ("ipv6: avoid high order allocations")
we have changed the limit in order to be less likely to fail.
However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
macros, which determine if we may end up doing an skb_put() for
adding another record. To avoid possible fragmentation, we check
the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
assumption as the actual max allocation size can be much smaller.
The IGMP case doesn't have this issue as commit 57e1ab6eaddc
("igmp: refine skb allocations") stores the allocation size in
the cb[].
Set a reserved_tailroom to make it fit into the MTU and use
skb_availroom() helper instead. This also allows to get rid of
igmp_skb_size().
Reported-by: Wei Liu <lw1a2.jing@gmail.com>
Fixes: 72e09ad107e7 ("ipv6: avoid high order allocations")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: David L Stevens <david.stevens@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |\ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch
Pravin B Shelar says:
====================
Open vSwitch
Following fixes are accumulated in ovs-repo.
Three of them are related to protocol processing, one is
related to memory leak in case of error and one is to
fix race.
Patch "Validate IPv6 flow key and mask values" has conflicts
with net-next, Let me know if you want me to send the patch
for net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Reject flow label key and mask values with invalid bits set.
Introduced by commit 3fdbd1ce11e5 ("openvswitch: add ipv6 'set'
action").
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
dp read operations depends on ovs_dp_cmd_fill_info(). This API
needs to looup vport to find dp name, but vport lookup can
fail. Therefore to keep vport reference alive we need to
take ovs lock.
Introduced by commit 6093ae9abac1 ("openvswitch: Minimize
dp and vport critical sections").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
match_validate() enforce that a mask matching on NDP attributes has also an
exact match on ICMPv6 type.
The ICMPv6 type, which is 8-bit wide, is stored in the 'tp.src' field of
'struct sw_flow_key', which is 16-bit wide.
Therefore, an exact match on ICMPv6 type should only check the first 8 bits.
This commit fixes a bug that prevented flows with an exact match on NDP field
from being installed
Introduced by commit 03f0d916aa03 ("openvswitch: Mega flow implementation").
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The checksum of ICMPv6 packets uses the IP pseudoheader as part of
the calculation, unlike ICMP in IPv4. This was not implemented,
which means that modifying the IP addresses of an ICMPv6 packet
would cause the checksum to no longer be correct as the psuedoheader
did not match.
Introduced by commit 3fdbd1ce11e5 ("openvswitch: add ipv6 'set' action").
Reported-by: Neal Shrader <icosahedral@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Need to free memory in case of sample action error.
Introduced by commit 651887b0c22cffcfce7eb9c ("openvswitch: Sample
action without side effects").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Solves possible lockup issues that can be seen from firmware DCB agents calling
into the DCB app api.
DCB firmware event queues can be tied in with NAPI so that dcb events are
generated in softIRQ context. This can results in calls to dcb_*app()
functions which try to take the dcb_lock.
If the the event triggers while we also have the dcb_lock because lldpad or
some other agent happened to be issuing a get/set command we could see a cpu
lockup.
This code was not originally written with firmware agents in mind, hence
grabbing dcb_lock from softIRQ context was not considered.
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |\ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter updates for your net tree,
they are:
1) Fix missing initialization of the range structure (allocated in the
stack) in nft_masq_{ipv4, ipv6}_eval, from Daniel Borkmann.
2) Make sure the data we receive from userspace contains the req_version
structure, otherwise return an error incomplete on truncated input.
From Dan Carpenter.
3) Fix handling og skb->sk which may cause incorrect handling
of connections from a local process. Via Simon Horman, patch from
Calvin Owens.
4) Fix wrong netns in nft_compat when setting target and match params
structure.
5) Relax chain type validation in nft_compat that was recently included,
this broke the matches that need to be run from the route chain type.
Now iptables-test.py automated regression tests report success again
and we avoid the only possible problematic case, which is the use of
nat targets out of nat chain type.
6) Use match->table to validate the tablename, instead of the match->name.
Again patch for nft_compat.
7) Restore the synchronous release of objects from the commit and abort
path in nf_tables. This is causing two major problems: splats when using
nft_compat, given that matches and targets may sleep and call_rcu is
invoked from softirq context. Moreover Patrick reported possible event
notification reordering when rules refer to anonymous sets.
8) Fix race condition in between packets that are being confirmed by
conntrack and the ctnetlink flush operation. This happens since the
removal of the central spinlock. Thanks to Jesper D. Brouer to looking
into this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
After removal of the central spinlock nf_conntrack_lock, in
commit 93bb0ceb75be2 ("netfilter: conntrack: remove central
spinlock nf_conntrack_lock"), it is possible to race against
get_next_corpse().
The race is against the get_next_corpse() cleanup on
the "unconfirmed" list (a per-cpu list with seperate locking),
which set the DYING bit.
Fix this race, in __nf_conntrack_confirm(), by removing the CT
from unconfirmed list before checking the DYING bit. In case
race occured, re-add the CT to the dying list.
While at this, fix coding style of the comment that has been
updated.
Fixes: 93bb0ceb75be2 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock")
Reported-by: bill bonaparte <programme110@gmail.com>
Signed-off-by: bill bonaparte <programme110@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The existing xtables matches and targets, when used from nft_compat, may
sleep from the destroy path, ie. when removing rules. Since the objects
are released via call_rcu from softirq context, this results in lockdep
splats and possible lockups that may be hard to reproduce.
Patrick also indicated that delayed object release via call_rcu can
cause us problems in the ordering of event notifications when anonymous
sets are in place.
So, this patch restores the synchronous object release from the commit
and abort paths. This includes a call to synchronize_rcu() to make sure
that no packets are walking on the objects that are going to be
released. This is slowier though, but it's simple and it resolves the
aforementioned problems.
This is a partial revert of c7c32e7 ("netfilter: nf_tables: defer all
object release via rcu") that was introduced in 3.16 to speed up
interaction with userspace.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Instead of the match->name, which is of course not relevant.
Fixes: f3f5dde ("netfilter: nft_compat: validate chain type in match/target")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Check for nat chain dependency only, which is the one that can
actually crash the kernel. Don't care if mangle, filter and security
specific match and targets are used out of their scope, they are
harmless.
This restores iptables-compat with mangle specific match/target when
used out of the OUTPUT chain, that are actually emulated through filter
chains, which broke when performing strict validation.
Fixes: f3f5dde ("netfilter: nft_compat: validate chain type in match/target")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Instead of init_net when using xtables over nftables compat.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
ip_vs_prepare_tunneled_skb() ignores ->sk when allocating a new
skb, either unconditionally setting ->sk to NULL or allowing
the uninitialized ->sk from a newly allocated skb to leak through
to the caller.
This patch properly copies ->sk and increments its reference count.
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
|