path: root/sys/netinet/siftr.c
Commit message (author, date, files changed, lines removed/added)
* Move the SIFTR DTrace probe out of the writing thread context and directly into the place where the data is collected.
  (gnn, 2015-04-30, 1 file, -1/+2)
* Brief demo script showing the various values that can be read via the new SIFTR statically defined tracepoint (SDT).
  Differential Revision: https://reviews.freebsd.org/D2387
  Reviewed by: bz, markj
  (gnn, 2015-04-29, 1 file, -0/+3)
* The addition of flowid and flowtype in r280233 and r280237 respectively forgot to extend the IPv6 packet node format string, which causes a build failure when SIFTR is compiled with IPv6 support.
  Reported by: Lars Eggert
  (lstewart, 2015-03-24, 1 file, -1/+1)
* Add connection flow type to siftr(4).
  Suggested by: adrian
  Sponsored by: Limelight Networks
  (hiren, 2015-03-19, 1 file, -3/+8)
* Add connection flowid to siftr(4).
  Reviewed by: lstewart
  MFC after: 1 week
  Sponsored by: Limelight Networks
  Differential Revision: https://reviews.freebsd.org/D2089
  (hiren, 2015-03-18, 1 file, -3/+8)
* In preparation for merging projects/sendfile, transform bare access to the sb_cc member of struct sockbuf into a pair of inline functions: sbavail() and sbused(). Right now they are equal, but once the notion of "not ready" socket buffer data is checked in, they will differ.
  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  (glebius, 2014-11-12, 1 file, -2/+2)
* The SYSCTL data pointers can come from userspace and must not be directly accessed. Although this will work on some platforms, it can throw an exception if the pointer is invalid and then panic the kernel. Add a missing SYSCTL_IN() of the "SCTP_BASE_STATS" structure.
  MFC after: 3 days
  Sponsored by: Mellanox Technologies
  (hselasky, 2014-10-28, 1 file, -26/+27)
* Include necessary headers that are currently available only through namespace pollution via if_var.h.
  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  (glebius, 2013-10-28, 1 file, -0/+2)
* The hashmask returned by hashinit() is a valid index in the returned hash array. Fix a potential siftr(4) memory leak and an INVARIANTS-triggered kernel panic in hashdestroy() by ensuring the last array index in the flow counter hash table is flushed of entries.
  MFC after: 3 days
  (lstewart, 2013-03-07, 1 file, -1/+1)
* Switch the entire IPv4 stack to keep the IP packet header in network byte order. Any host byte order processing is done in local variables, and host byte order values are never[1] written to a packet.
  After this change a packet processed by the stack isn't modified at all[2] except for TTL.
  After this change a network stack hacker doesn't need to scratch his head trying to figure out what the byte order is at a given place in the stack.
  [1] One exception still remains. Raw sockets convert to host byte order before passing a packet to an application. This will probably remain for a long time for compatibility.
  [2] ip_input() still subtracts the header length from ip->ip_len, but this is planned to be fixed soon.
  Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru>
  Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>
  (glebius, 2012-10-22, 1 file, -1/+2)
* Decompose the current single inpcbinfo lock into two locks:
  - The existing ipi_lock continues to protect the global inpcb list and inpcb counter. This lock is now relegated to a small number of allocation and free operations, and occasional operations that walk all connections (including, awkwardly, certain UDP multicast receive operations; something to revisit).
  - A new ipi_hash_lock protects the two inpcbinfo hash tables for looking up connections and bound sockets, manipulated using new INP_HASH_*() macros. This lock, combined with inpcb locks, protects the 4-tuple address space.
  Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb connection locks, so it may be acquired while manipulating a connection on which a lock is already held, avoiding the need to acquire the inpcbinfo lock preemptively when a binding change might later be required. As a result, however, lookup operations necessarily go through a reference acquire while holding the lookup lock, later acquiring an inpcb lock if required.
  A new function in_pcblookup() looks up connections and accepts flags indicating how to return the inpcb. Due to lock order changes, callers no longer need to acquire locks before performing a lookup: the lookup routine will acquire the ipi_hash_lock as needed. In the future, it will also be able to use alternative lookup and locking strategies transparently to callers, such as pcbgroup lookup. New lookup flags, supplementing the existing INPLOOKUP_WILDCARD flag, are:
    INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
    INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
  Callers must pass exactly one of these flags (for the time being).
  Some notes:
  - All protocols are updated to work within the new regime, especially TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely eliminated, and global hash lock hold times are dramatically reduced compared to previous locking.
  - The TCP syncache still relies on the pcbinfo lock, something that we may want to revisit.
  - Support for reverting to the FreeBSD 7.x locking strategy in TCP input is no longer available: hash lookup locks are now held only very briefly during inpcb lookup, rather than for potentially extended periods. However, the pcbinfo ipi_lock will still be acquired if a connection state might change such that a connection is added or removed.
  - Raw IP sockets continue to use the pcbinfo ipi_lock for protection, because they maintain their own hash tables.
  - The interface in6_pcblookup_hash_locked() is maintained, which allows callers to acquire hash locks and perform one or more lookups atomically with 4-tuple allocation: this is required only for TCPv6, as there is no in6_pcbconnect_setup(), which there should be.
  - UDPv6 locking remains significantly more conservative than UDPv4 locking, which relates to source address selection. This needs attention, as it likely significantly reduces parallelism in this code for multithreaded socket use (such as in BIND).
  - In the UDPv4 and UDPv6 multicast cases, we need to revisit locking somewhat, as they relied on ipi_lock to stabilise 4-tuple matches, which is no longer sufficient. A second check once the inpcb lock is held should do the trick, keeping the general case from requiring the inpcb lock for every inpcb visited.
  - This work reminds us that we need to revisit locking of the v4/v6 flags, which may be accessed lock-free both before and after this change.
  - Right now, a single lock name is used for the pcbhash lock. This is undesirable, and probably another argument is required to take care of this (or a char array name field in the pcbinfo?).
  This is not an MFC candidate for 8.x due to its impact on lookup and locking semantics. It's possible some of these issues could be worked around with compatibility wrappers, if necessary.
  Reviewed by: bz
  Sponsored by: Juniper Networks, Inc.
  (rwatson, 2011-05-30, 1 file, -14/+8)
* Staticize malloc types.
  Approved by: lstewart
  MFC after: 1 week
  (pluknet, 2011-04-13, 1 file, -8/+5)
* Use the full and proper company name for Swinburne University of Technology throughout the source tree.
  Requested by: Grenville Armitage, Director of CAIA at Swinburne University of Technology
  MFC after: 3 days
  (lstewart, 2011-04-12, 1 file, -4/+5)
* After some off-list discussion, revert a number of changes to the DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless.
  Changes reverted:
  ------------------------------------------------------------------------
  r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines
  Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined.
  ------------------------------------------------------------------------
  r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines
  Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.
  ------------------------------------------------------------------------
  r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines
  Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
  (dim, 2010-11-22, 1 file, -1/+1)
* Fix a minor code redundancy nit.
  MFC after: 3 days
  (lstewart, 2010-11-20, 1 file, -3/+1)
* When enabling or disabling SIFTR with a VIMAGE kernel, ensure we add or remove the SIFTR pfil(9) hook functions to or from all network stacks.
  This patch allows packets inbound to or outbound from a vnet to be "seen" by SIFTR. Additional work is required to allow SIFTR to actually generate log messages for all vnet-related packets, because the siftr_findinpcb() function does not yet search for inpcbs across all vnets. This issue will be fixed separately.
  Reported and tested by: David Hayes <dahayes at swin edu au>
  MFC after: 3 days
  (lstewart, 2010-11-20, 1 file, -12/+24)
* Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.
  (dim, 2010-11-14, 1 file, -1/+1)
* Standardise all Swinburne-related copyright/licence statements throughout the tree in preparation for another large code import. Swinburne University is the legal entity that owns copyright, and the 2-clause BSD licence is acceptable.
  (lstewart, 2010-11-12, 1 file, -2/+2)
* The university does not require that its CRICOS number be included in source code. Remove all references from the tree.
  MFC after: 3 days
  (lstewart, 2010-11-12, 1 file, -2/+1)
* Log the number of segments currently in the reassembly queue.
  Sponsored by: FreeBSD Foundation
  (lstewart, 2010-09-25, 1 file, -6/+11)
* Remove the TCP inflight bandwidth limiter as announced in r211315 to give way for the pluggable congestion control framework. It is the task of the congestion control algorithm to set the congestion window and amount of inflight data without external interference.
  In 'struct tcpcb' the variables previously used by the inflight limiter are renamed to spares to keep the ABI intact and to have some more space for future extensions.
  In 'struct tcp_info' the variable 'tcpi_snd_bwnd' is not removed to preserve the ABI. It is always set to 0.
  In siftr.c in 'struct pkt_node' the variable 'snd_bwnd' is not removed to preserve the ABI. It is always set to 0.
  These unused variables in the various structures may be reused in the future, or garbage collected before the next release or at some other point when an ABI change happens anyway for other reasons.
  No MFC is planned. The inflight bandwidth limiter stays disabled by default in the other branches but remains available.
  (andre, 2010-09-16, 1 file, -2/+2)
* - Move common code from the hook functions that fills in a packet node struct to a separate inline function. This further reduces duplicate code that didn't have a good reason to stay as it was.
  - Reorder the malloc of a pkt_node struct in the hook functions such that it only occurs if we managed to find a usable tcpcb associated with the packet.
  - Make the inp_locally_locked variable's type consistent with the prototype of siftr_siftdata().
  Sponsored by: FreeBSD Foundation
  (lstewart, 2010-07-18, 1 file, -115/+87)
* The SIFTR DPCPU statistics struct was not being zeroed between enable/disable cycles, so the values would accumulate rather than reset for each cycle.
  Sponsored by: FreeBSD Foundation
  (lstewart, 2010-07-13, 1 file, -0/+2)
* Catch up with the rename of DPCPU_SUM to DPCPU_VARSUM in r209978.
  Sponsored by: FreeBSD Foundation
  (lstewart, 2010-07-13, 1 file, -10/+10)
* Import the Statistical Information For TCP Research (SIFTR) kernel module into FreeBSD.
  SIFTR logs a range of statistics on active TCP connections to a log file, providing the ability to make highly granular measurements of TCP connection state. The tool is aimed at system administrators, developers and researchers alike.
  Please take it for a spin and test it out; the man page should have all the information required to get you going.
  Many thanks go to the Cisco University Research Program Fund at Community Foundation Silicon Valley and the FreeBSD Foundation. Their support of our work at the Centre for Advanced Internet Architectures, Swinburne University of Technology is greatly appreciated.
  Sponsored by: Cisco URP, FreeBSD Foundation
  Reviewed by: dwmalone, gnn, rpaulo
  Tested by: Many on freebsd-current@ and elsewhere over the years
  MFC after: 1 month
  (lstewart, 2010-07-03, 1 file, -0/+1568)