summaryrefslogtreecommitdiffstats
path: root/sys/netinet
Commit message (Collapse)AuthorAgeFilesLines
...
* When performing IP fast forwarding, immediately drop traffic which isbms2004-11-041-0/+6
| | | | | | | | | | | destined for a blackhole route. This also means that blackhole routes do not need to be bound to lo(4) or disc(4) interfaces for the net.inet.ip.fastforwarding=1 case. Submitted by: james at towardex dot com Sponsored by: eXtensible Open Router Project <URL:http://www.xorp.org/> MFC after: 3 weeks
* Until this change, the UDP input code used global variables udp_in,rwatson2004-11-041-57/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | udp_in6, and udp_ip6 to pass socket address state between udp_input(), udp_append(), and soappendaddr_locked(). While file in the default configuration, when running with multiple netisrs or direct ithread dispatch, this can result in races wherein user processes using recvmsg() get back the wrong source IP/port. To correct this and related races: - Eliminate udp_ip6, which is believed to be generated but then never used. Eliminate ip_2_ip6_hdr() as it is now unneeded. - Eliminate setting, testing, and existence of 'init' status fields for the IPv6 structures. While with multiple UDP delivery this could lead to amortization of IPv4 -> IPv6 conversion when delivering an IPv4 UDP packet to an IPv6 socket, it added substantial complexity and side effects. - Move global structures into the stack, declaring udp_in in udp_input(), and udp_in6 in udp_append() to be used if a conversion is required. Pass &udp_in into udp_append(). - Re-annotate comments to reflect updates. With this change, UDP appears to operate correctly in the presence of substantial inbound processing parallelism. This solution avoids introducing additional synchronization, but does increase the potential stack depth. Discovered by: kris (Bug Magnet) MFC after: 3 weeks
* Remove RFC1644 T/TCP support from the TCP side of the network stack.andre2004-11-0213-841/+34
| | | | | | | | | | | | | | | | A complete rationale and discussion is given in this message and the resulting discussion: http://docs.freebsd.org/cgi/mid.cgi?4177C8AD.6060706 Note that this commit removes only the functional part of T/TCP from the tcp_* related functions in the kernel. Other features introduced with RFC1644 are left intact (socket layer changes, sendmsg(2) on connection oriented protocols) and are meant to be reused by a simpler and less intrusive reimplemention of the previous T/TCP functionality. Discussed on: -arch
* Correct a bug in TCP SACK that could result in wedging of the TCP stackrwatson2004-10-301-2/+2
| | | | | | | | | | | under high load: only set function state to loop and continuing sending if there is no data left to send. RELENG_5_3 candidate. Feet provided: Peter Losher <Peter underscore Losher at isc dot org> Diagnosed by: Aniel Hartmeier <daniel at benzedrine dot cx> Submitted by: mohan <mohans at yahoo-inc dot com>
* Add a matching tunable for net.inet.tcp.sack.enable sysctl.rwatson2004-10-261-0/+1
|
* Check that rt_mask(rt) is non-NULL before dereferencing it, in thebms2004-10-261-0/+1
| | | | | | RTM_ADD case, thus avoiding a panic. Submitted by: Iasen Kostov
* IPDIVERT is a module now and tell the other parts of the kernel about it.andre2004-10-251-0/+4
| | | | IPDIVERT depends on IPFIREWALL being loaded or compiled into the kernel.
* For variables that are only checked with defined(), don't provideru2004-10-241-1/+1
| | | | any fake value.
* Shave 40 unused bytes from struct tcpcb.andre2004-10-221-1/+0
|
* When printing the initialization string and IPDIVERT is not compiled into theandre2004-10-221-1/+1
| | | | kernel refer to it as "loadable" instead of "disabled".
* Refuse to unload the ipdivert module unless the 'force' flag is given to ↵andre2004-10-221-1/+11
| | | | | | | kldunload. Reflect the fact that IPDIVERT is a loadable module in the divert(4) and ipfw(8) man pages.
* Destroy the UMA zone on unload.andre2004-10-191-0/+1
|
* Slightly extend the locking during unload to fully cover the protocolandre2004-10-191-5/+6
| | | | | deregistration. This does not entirely close the race but narrows the even previously extremely small chance of a race some more.
* Annotate a newly introduced race present due to the unloading ofrwatson2004-10-191-0/+4
| | | | | | | | protocols: it is possible for sockets to be created and attached to the divert protocol between the test for sockets present and successful unload of the registration handler. We will need to explore more mature APIs for unregistering the protocol and then draining consumers, or an atomic test-and-unregister mechanism.
* Convert IPDIVERT into a loadable module. This makes use of the dynamic ↵andre2004-10-195-37/+92
| | | | | | | | | | | loadability of protocols. The call to divert_packet() is done through a function pointer. All semantics of IPDIVERT remain intact. If IPDIVERT is not loaded ipfw will refuse to install divert rules and natd will complain about 'protocol not supported'. Once it is loaded both will work and accept rules and open the divert socket. The module can only be unloaded if no divert sockets are open. It does not close any divert sockets when an unload is requested but will return EBUSY instead.
* Properly declare the "net.inet" sysctl subtree.andre2004-10-191-0/+1
|
* Pre-emptively define IPPROTO_SPACER to 32767, the same value as PROTO_SPACERandre2004-10-191-0/+6
| | | | | to document that this value is globally assigned for a special purpose and may not be reused within the IPPROTO number space.
* Make use of the PROTO_SPACER functionality for dynamically loadableandre2004-10-191-2/+19
| | | | | | | protocols in inetsw[] and define initially eight spacer slots. Remove conflicting declaration 'struct pr_usrreqs nousrreqs'. It is now declared and initialized in kern/uipc_domain.c.
* Support for dynamically loadable and unloadable IP protocols in the ipmux.andre2004-10-192-1/+64
| | | | | | | | | | | | | | | With pr_proto_register() it has become possible to dynamically load protocols within the PF_INET domain. However the PF_INET domain has a second important structure called ip_protox[] that is derived from the 'struct protosw inetsw[]' and takes care of the de-multiplexing of the various protocols that ride on top of IP packets. The functions ipproto_[un]register() allow to dynamically adjust the ip_protox[] array mux in a consistent and easy way. To register a protocol within ip_protox[] the existence of a corresponding and matching protocol definition in inetsw[] is required. The function does not allow to overwrite an already registered protocol. The unregister function simply replaces the mux slot with the default index pointer to IPPROTO_RAW as it was previously.
* Add a macro for the destruction of INP_INFO_LOCK's used by loadable modules.andre2004-10-191-0/+1
|
* Make comments more clear. Change the order of one if() statement to check theandre2004-10-191-3/+8
| | | | more likely variable first.
* Push acquisition of the accept mutex out of sofree() into the callerrwatson2004-10-183-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (sorele()/sotryfree()): - This permits the caller to acquire the accept mutex before the socket mutex, avoiding sofree() having to drop the socket mutex and re-order, which could lead to races permitting more than one thread to enter sofree() after a socket is ready to be free'd. - This also covers clearing of the so_pcb weak socket reference from the protocol to the socket, preventing races in clearing and evaluation of the reference such that sofree() might be called more than once on the same socket. This appears to close a race I was able to easily trigger by repeatedly opening and resetting TCP connections to a host, in which the tcp_close() code called as a result of the RST raced with the close() of the accepted socket in the user process resulting in simultaneous attempts to de-allocate the same socket. The new locking increases the overhead for operations that may potentially free the socket, so we will want to revise the synchronization strategy here as we normalize the reference counting model for sockets. The use of the accept mutex in freeing of sockets that are not listen sockets is primarily motivated by the potential need to remove the socket from the incomplete connection queue on its parent (listen) socket, so cleaning up the reference model here may allow us to substantially weaken the synchronization requirements. RELENG_5_3 candidate. MFC after: 3 days Reviewed by: dwhite Discussed with: gnn, dwhite, green Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de> Reported by: Vlad <marchenko at gmail dot com>
* Don't release the udbinfo lock until after the last use of UDP inpcbrwatson2004-10-121-3/+3
| | | | | | | | in udp_input(), since the udbinfo lock is used to prevent removal of the inpcb while in use (i.e., as a form of reference count) in the in-bound path. RELENG_5 candidate.
* Modify the thrilling "%D is using my IP address %s!" message so thatrwatson2004-10-121-1/+7
| | | | | | it isn't printed if the IP address in question is '0.0.0.0', which is used by nodes performing DHCP lookup, and so constitute a false positive as a report of misconfiguration.
* When the access control on creating raw sockets was modified so thatrwatson2004-10-121-20/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | processes in jail could create raw sockets, additional access control checks were added to raw IP sockets to limit the ways in which those sockets could be used. Specifically, only the socket option IP_HDRINCL was permitted in rip_ctloutput(). Other socket options were protected by a call to suser(). This change was required to prevent processes in a Jail from modifying system properties such as multicast routing and firewall rule sets. However, it also introduced a regression: processes that create a raw socket with root privilege, but then downgraded credential (i.e., a daemon giving up root, or a setuid process switching back to the real uid) could no longer issue other unprivileged generic IP socket option operations, such as IP_TOS, IP_TTL, and the multicast group membership options, which prevented multicast routing daemons (and some other tools) from operating correctly. This change pushes the access control decision down to the granularity of individual socket options, rather than all socket options, on raw IP sockets. When rip_ctloutput() doesn't implement an option, it will now pass the request directly to in_control() without an access control check. This should restore the functionality of the generic IP socket options for raw sockets in the above-described scenarios, which may be confirmed with the ipsockopt regression test. RELENG_5 candidate. Reviewed by: csjp
* Acquire the send socket buffer lock around tcp_output() activitiesrwatson2004-10-091-2/+14
| | | | | | | | | | reaching into the socket buffer. This prevents a number of potential races, including dereferencing of sb_mb while unlocked leading to a NULL pointer deref (how I found it). Potentially this might also explain other "odd" TCP behavior on SMP boxes (although haven't seen it reported). RELENG_5 candidate.
* When running with debug.mpsafenet=0, initialize IP multicast routingrwatson2004-10-071-4/+7
| | | | | | | | | callouts as non-CALLOUT_MPSAFE. Otherwise, they may trigger an assertion regarding Giant if they enter other parts of the stack from the callout. MFC after: 3 days Reported by: Dikshie < dikshie at ppk dot itb dot ac dot id >
* - Estimate the amount of data in flight in sack recovery and use itps2004-10-057-59/+83
| | | | | | | | | | to control the packets injected while in sack recovery (for both retransmissions and new data). - Cleanups to the sack codepaths in tcp_output.c and tcp_sack.c. - Add a new sysctl (net.inet.tcp.sack.initburst) that controls the number of sack retransmissions done upon initiation of sack recovery. Submitted by: Mohan Srinivasan <mohans@yahoo-inc.com>
* Add support to IPFW for matching by TCP data length.green2004-10-032-0/+24
|
* Add support to IPFW for classification based on "diverted" statusgreen2004-10-033-16/+42
| | | | (that is, input via a divert socket).
* Add to IPFW the ability to do ALTQ classification/tagging.green2004-10-032-0/+54
|
* Validate the action pointer to be within the rule size, so that trying togreen2004-09-301-0/+5
| | | | add corrupt ipfw rules would not potentially panic the system or worse.
* Add an additional struct inpcb * argument to pfil(9) in order to enablemlaier2004-09-296-19/+45
| | | | | | | | | | | | | | | | | | | passing along socket information. This is required to work around a LOR with the socket code which results in an easy reproducible hard lockup with debug.mpsafenet=1. This commit does *not* fix the LOR, but enables us to do so later. The missing piece is to turn the filter locking into a leaf lock and will follow in a seperate (later) commit. This will hopefully be MT5'ed in order to fix the problem for RELENG_5 in forseeable future. Suggested by: rwatson A lot of work by: csjp (he'd be even more helpful w/o mentor-reviews ;) Reviewed by: rwatson, csjp Tested by: -pf, -ipfw, LINT, csjp and myself MFC after: 3 days LOR IDs: 14 - 17 (not fixed yet)
* Assign so_pcb to NULL rather than 0 as it's a pointer.rwatson2004-09-291-1/+1
| | | | Spotted by: dwhite
* o Turn net.inet.ip.check_interface sysctl off by default.maxim2004-09-241-1/+1
| | | | | | | | | | | When net.inet.ip.check_interface was MFCed to RELENG_4 3+ years ago in rev. 1.130.2.17 ip_input.c it was 1 by default but shortly changed to 0 (accidently?) in rev. 1.130.2.20 in RELENG_4 only. Among with the fact this knob is not documented it breaks POLA especially in bridge environment. OK'ed by: andre Reviewed by: -current
* Fix an out of bounds write during the initialization of the PF_INET protocolandre2004-09-161-4/+14
| | | | | | | | | family to the ip_protox[] array. The protocol number of IPPROTO_DIVERT is larger than IPPROTO_MAX and was initializing memory beyond the array. Catch all these kinds of errors by ignoring protocols that are higher than IPPROTO_MAX or 0 (zero). Add more comments ip_init().
* Clarify some comments for the M_FASTFWD_OURS case in ip_input().andre2004-09-151-4/+4
|
* Remove the last two global variables that are used to store packet state whileandre2004-09-154-39/+48
| | | | | | | | | it travels through the IP stack. This wasn't much of a problem because IP source routing is disabled by default but when enabled together with SMP and preemption it would have very likely cross-corrupted the IP options in transit. The IP source route options of a packet are now stored in a mtag instead of the global variable.
* Do not allow 'ipfw fwd' command when IPFIREWALL_FORWARD is not compiled intoandre2004-09-131-0/+4
| | | | the kernel. Return EINVAL instead.
* If we have to 'ipfw fwd'-tag a packet the second time in ipfw_pfil_out() don'tandre2004-09-131-3/+5
| | | | | | | prepend an already existing tag again. Instead unlink it and prepend it again to have it as the first tag in the chain. PR: kern/71380
* Make comments more clear for the packet changed cases after pfil hooks.andre2004-09-131-1/+2
|
* Fix ip_input() fallback for the destination modified cases (from the packetandre2004-09-131-6/+4
| | | | | | | | | filters). After the ipfw to pfil move ip_input() expects M_FASTFWD_OURS tagged packets to have ip_len and ip_off in host byte order instead of network byte order. PR: kern/71652 Submitted by: mlaier (patch)
* Make 'ipfw tee' behave as inteded and designed. A tee'd packet is copiedandre2004-09-131-11/+11
| | | | | | | | | | and sent to the DIVERT socket while the original packet continues with the next rule. Unlike a normally diverted packet no IP reassembly attemts are made on tee'd packets and they are passed upwards totally unmodified. Note: This will not be MFC'd to 4.x because of major infrastucture changes. PR: kern/64240 (and many others collapsed into that one)
* Check flag do_bridge always, even if kernel was compiled withoutglebius2004-09-091-11/+5
| | | | | | | | BRIDGE support. This makes dynamic bridge.ko working. Reviewed by: sam Approved by: julian (mentor) MFC after: 1 week
* revert comment from rev1.158 now that rev1.225 backed it out..jmg2004-09-061-3/+1
| | | | MFC after: 3 days
* Recover normal behavior: return EINVAL to attempt to add a divert ruleglebius2004-09-051-2/+5
| | | | | | | when module is built without IPDIVERT. Silence from: andre Approved by: julian (mentor)
* fix up socket/ip layer violation... don't assume/know thatjmg2004-09-056-8/+17
| | | | SO_DONTROUTE == IP_ROUTETOIF and SO_BROADCAST == IP_ALLOWBROADCAST...
* Apply error and success logic consistently to the function netisr_queue() andandre2004-08-271-1/+1
| | | | | | | | | | | | | | | | | | its users. netisr_queue() now returns (0) on success and ERRNO on failure. At the moment ENXIO (netisr queue not functional) and ENOBUFS (netisr queue full) are supported. Previously it would return (1) on success but the return value of IF_HANDOFF() was interpreted wrongly and (0) was actually returned on success. Due to this schednetisr() was never called to kick the scheduling of the isr. However this was masked by other normal packets coming through netisr_dispatch() causing the dequeueing of waiting packets. PR: kern/70988 Found by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp> MFC after: 3 days
* In the case the destination of a packet was changed by the packet filterandre2004-08-271-2/+2
| | | | | | | | | | | | to point to a local IP address; and the packet was sourced from this host we fill in the m_pkthdr.rcvif with a pointer to the loopback interface. Before the function ifunit("lo0") was used to obtain the ifp. However this is sub-optimal from a performance point of view and might be dangerous if the loopback interface has been renamed. Use the global variable 'loif' instead which always points to the loopback interface. Submitted by: brooks
* Remove a junk line left over from the recent IPFW to PFIL_HOOKS conversion.andre2004-08-271-1/+0
|
OpenPOWER on IntegriCloud