| Commit message (Collapse) | Author | Age | Files | Lines |
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
r286227:
Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability:
- The existing TCP INP_INFO lock continues to protect the global inpcb list
stability during full list traversal (e.g. tcp_pcblist()).
- A new INP_LIST lock protects inpcb list actual modifications (inp allocation
and free) and inpcb global counters.
It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input())
and INP_INFO_WLOCK only in occasional operations that walk all connections.
PR: 183659
Differential Revision: https://reviews.freebsd.org/D2599
Reviewed by: jhb, adrian
Tested by: adrian, nitroboost-gmail.com
Sponsored by: Verisign, Inc.
r286443:
Fix a kernel assertion issue introduced with r286227:
Avoid too strict INP_INFO_RLOCK_ASSERT checks due to
tcp_notify() being called from in6_pcbnotify().
Reported by: Larry Rosenman <ler@lerctr.org>
Submitted by: markj, jch
|
|\ \
| |/ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Fix a double-free when an inp transitions to INP_TIMEWAIT state
after having been dropped.
This change enforces in_pcbdrop() logic in tcp_input():
"in_pcbdrop() is used by TCP to mark an inpcb as unused and avoid future packet
delivery or event notification when a socket remains open but TCP has closed."
PR: 203175
Reported by: Palle Girgensohn, Slawa Olhovchenkov
Tested by: Slawa Olhovchenkov
Reviewed by: Slawa Olhovchenkov
Approved by: gnn, Slawa Olhovchenkov
Differential Revision: https://reviews.freebsd.org/D8211
Sponsored by: Verisign, inc
|
|\ \
| |/ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
r271119:
In tcp_input(), don't acquire the pcbinfo global write lock for SYN
packets targeting a listening socket. Permit to reduce TCP input
processing starvation in context of high SYN load (e.g. short-lived TCP
connections or SYN flood).
Submitted by: Julien Charbon <jcharbon@verisign.com>
Reviewed by: adrian, hiren, jhb, Mike Bentkofsky
r272081:
Catch up with r271119.
|
|\ \
| |/ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Change net.inet.tcp.ecn.enable sysctl mib from a binary off/on
control to a three way setting.
0 - Totally disable ECN. (no change)
1 - Enable ECN if incoming connections request it. Outgoing
connections will request ECN. (no change from present != 0 setting)
2 - Enable ECN if incoming connections request it. Outgoing
conections will not request ECN.
Change the default value of net.inet.tcp.ecn.enable from 0 to 2.
Linux version 2.4.20 and newer, Solaris, and Mac OS X 10.5 and newer have
similar capabilities. The actual values above match Linux, and the default
matches the current Linux default.
Reviewed by: eadler
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D6386
|
|\ \
| |/ |
|
| |
| |
| |
| |
| |
| | |
Prevent underflows in tp->snd_wnd if the remote side ACKs more than
tp->snd_wnd. This can happen, for example, when the remote side responds
to a window probe by ACKing the one byte it contains.
|
|\ \
| |/ |
|
| |
| |
| |
| | |
Improve tcp duplicate ack processing when SACK is present.
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Calculate the correct amount of bytes that are in-flight for a connection as
suggested by RFC 6675.
MFC: r292046
r290122 added 4 bytes and removed 8 in struct sackhint. Add a pad entry of 4
bytes to restore the size.
|
|\ \
| |/ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Implementation of server-side TCP Fast Open (TFO) [RFC7413].
TFO is disabled by default in the kernel build. See the top comment
in sys/netinet/tcp_fastopen.c for implementation particulars.
Differential Revision: https://reviews.freebsd.org/D4350
Sponsored by: Verisign, Inc.
|
|\ \
| |/ |
|
| |
| |
| |
| | |
Add a comment specifying how we implement rfc3042.
|
|/
|
|
|
|
|
|
|
|
|
| |
Do not count security policy violation twice.
ipsec*_in_reject() do this by their own.
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
TAG: IPSEC-HEAD
Issue: #4841
|
|
|
|
|
|
|
|
| |
Check TCP timestamp option flag so that the automatic receive buffer
scaling code does not use an uninitialized timestamp echo reply value
from the stack when timestamps are not enabled.
Approved by: re (gjb)
|
|
|
|
|
|
|
|
| |
Ensure that the flowid hashtype is assigned to the inp if the flowid
is also assigned.
Spotted by: gallatin
Tested by: gallatin
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r275358:
Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.
This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.
"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.
r275483:
Remove M_FLOWID from SCTP code.
r276982:
Remove no longer used "M_FLOWID" flag from mbuf.h and update the netisr
manpage.
Note: The FreeBSD version has been bumped.
Reviewed by: hps, tuexen
Sponsored by: Limelight Networks
|
|
|
|
|
|
|
|
|
| |
Improve transmit sending offload, TSO, algorithm in general. This
change allows all HCAs from Mellanox Technologies to function properly
when TSO is enabled. See r271946 and r272595 for more details about
this commit.
Sponsored by: Mellanox Technologies
|
|
|
|
|
| |
Security: FreeBSD-SA-14:19.tcp
Approved by: re (implicit, security advisory)
|
|
|
|
|
|
| |
Remove the prototpye for the static inline function
tcp_signature_verify_input().
The function is defined before first use already.
|
|
|
|
|
|
|
|
| |
Remove the prototypes for things that are no longer file local but were
moved to the header file.
Was suppoed to be MFCed with: r266596
Pointy hat to: bz
|
|
|
|
|
|
|
|
| |
Move the tcp_fields_to_host() and tcp_fields_to_net() (inline)
functions to the tcp_var.h header file in order to avoid further
duplication with upcoming commits.
Reviewed by: np
|
| |
|
|
|
|
| |
MFC slacker: adrian
|
|
|
|
| |
Pointy hat to: peter
|
|
|
|
| |
PR: kern/99188
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The TCP delayed ACK logic isn't aware of LRO passing up large aggregated
segments thinking it received only one segment. This causes it to enable
the delay the ACK for 100ms to wait for another segment which may never
come because all the data was received already.
Doing delayed ACK for LRO segments is bogus for two reasons: a) it pushes
us further away from acking every other packet; b) it introduces additional
delay in responding to the sender. The latter is especially bad because it
is in the nature of LRO to aggregated all segments of a burst with no more
coming until an ACK is sent back.
Change the delayed ACK logic to detect LRO segments by being larger than
the MSS for this connection and issuing an immediate ACK for them to keep
the ACK clock ticking without interruption.
Reported by: julian, cperciva
Tested by: cperciva
Reviewed by: lstewart
Approved by: re (glebius)
|
|
|
|
|
|
|
|
|
|
|
| |
sbdrop_locked() to cut acked mbufs from the socket buffer. Free this
chain a batch manner after the socket buffer lock is dropped.
This measurably reduces contention on socket buffer.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
Approved by: re (marius)
|
|
|
|
|
|
|
|
|
| |
dynamic translation so that their arguments match the definitions for
these providers in Solaris and illumos. Thus, existing scripts for these
providers should work unmodified on FreeBSD.
Tested by: gnn, hiren
MFC after: 1 month
|
|
|
|
|
|
|
|
|
|
|
| |
structure is used, but they already have equal fields in the struct
newipsecstat, that was introduced with FAST_IPSEC and then was merged
together with old ipsecstat structure.
This fixes kernel stack overflow on some architectures after migration
ipsecstat to PCPU counters.
Reported by: Taku YAMAMOTO, Maciej Milewski
|
|
|
|
|
|
| |
violations.
Update related comments and style.
|
|
|
|
| |
Change interface of kread_counters() similar ot kread() in the netstat(1).
|
|
|
|
|
|
|
|
| |
increased the pointer, not the memory it points to.
In collaboration with: kib
Reported & tested by: Ian FREISLICH <ianf clue.co.za>
Sponsored by: Nginx, Inc.
|
|
|
|
|
|
| |
accounting.
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
limited in the amount of data they can handle at once.
Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to
change the limit.
The lowest allowable size is IP_MAXPACKET / 8 (8192 bytes) as anything
less wouldn't be very useful anymore. The upper limit is still at
IP_MAXPACKET (65536 bytes). Raising it requires further auditing of
the IPv4/v6 code path's as the length field in the IP header would
overflow leading to confusion in firewalls and others packet handler on
the real size of the packet.
The placement into "struct ifnet" is a bit hackish but the best place
that was found. When the stack/driver boundary is updated it should
be handled in a better way.
Submitted by: cperciva (earlier version)
Reviewed by: cperciva
Tested by: cperciva
MFC after: 1 week (using spare struct members to preserve ABI)
|
|
|
|
|
|
|
|
|
| |
duplicate ACK make sure we actually have new data to send.
This prevents us from sending unneccessary pure ACKs.
Reported by: Matt Miller <matt@matthewjmiller.net>
Tested by: Matt Miller <matt@matthewjmiller.net>
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
connections in the accept queue and contiguous new incoming SYNs.
Compared to the original submitters patch I've moved the test
next to the SYN handling to have it together in a logical unit
and reworded the comment explaining the issue.
Submitted by: Matt Miller <matt@matthewjmiller.net>
Submitted by: Juan Mojica <jmojica@gmail.com>
Reviewed by: Matt Miller (changes)
Tested by: pho
MFC after: 1 week
|
| |
|
|
|
|
|
|
|
|
|
| |
Convert 'struct ipstat' and 'struct tcpstat' to counter(9).
This speeds up IP forwarding at extreme packet rates, and
makes accounting more precise.
Sponsored by: Nginx, Inc.
|
|
|
|
|
|
|
|
|
|
| |
For TIMEWAIT handling tcp_input may have to jump back for an additional
pass through pcblookup. Prior to this change the fwd_tag had been
discarded after the first lookup, so a new connection attempt delivered
locally via 'ipfw fwd' would fail to find a match.
As of r248886 the tag will be detached and freed when passed to the
socket buffer.
|
|
|
|
|
|
|
|
|
|
|
|
| |
logic (refer to [1] for associated discussion). snd_cwnd and snd_wnd are
unsigned long and on 64 bit hosts, min() will truncate them to 32 bits and could
therefore potentially corrupt the result (although under normal operation,
neither variable should legitmately exceed 32 bits).
[1] http://lists.freebsd.org/pipermail/freebsd-net/2013-January/034297.html
Submitted by: jhb
MFC after: 1 week
|
| |
|
| |
|
|
|
|
|
|
|
| |
same code for IPv4 and IPv6 in tcp_input, we should check both
M_IP_NEXTHOP and M_IP6_NEXTHOP flags.
MFC after: 3 days
|
|
|
|
|
|
|
|
|
| |
but later after processing and freeing the tag, we need to jump back again
to the findpcb label. Since the fwd_tag pointer wasn't NULL we tried to
process and free the tag for second time.
Reported & tested by: Pawel Tyll <ptyll nitronet.pl>
MFC after: 3 days
|