path: root/sys/netinet
Commit message  [Author, Age, Files, Lines]
...
* Push acquisition of the accept mutex out of sofree() into the caller  [rwatson, 2004-10-18, 3 files, -0/+3]
  (sorele()/sotryfree()):

  - This permits the caller to acquire the accept mutex before the socket mutex, avoiding sofree() having to drop the socket mutex and re-order, which could lead to races permitting more than one thread to enter sofree() after a socket is ready to be free'd.

  - This also covers clearing of the so_pcb weak socket reference from the protocol to the socket, preventing races in clearing and evaluation of the reference such that sofree() might be called more than once on the same socket.

  This appears to close a race I was able to easily trigger by repeatedly opening and resetting TCP connections to a host, in which the tcp_close() code called as a result of the RST raced with the close() of the accepted socket in the user process resulting in simultaneous attempts to de-allocate the same socket.

  The new locking increases the overhead for operations that may potentially free the socket, so we will want to revise the synchronization strategy here as we normalize the reference counting model for sockets. The use of the accept mutex in freeing of sockets that are not listen sockets is primarily motivated by the potential need to remove the socket from the incomplete connection queue on its parent (listen) socket, so cleaning up the reference model here may allow us to substantially weaken the synchronization requirements.

  RELENG_5_3 candidate.

  MFC after: 3 days
  Reviewed by: dwhite
  Discussed with: gnn, dwhite, green
  Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de>
  Reported by: Vlad <marchenko at gmail dot com>
* Don't release the udbinfo lock until after the last use of UDP inpcb  [rwatson, 2004-10-12, 1 file, -3/+3]
  in udp_input(), since the udbinfo lock is used to prevent removal of the inpcb while in use (i.e., as a form of reference count) in the in-bound path.

  RELENG_5 candidate.
* Modify the thrilling "%D is using my IP address %s!" message so that  [rwatson, 2004-10-12, 1 file, -1/+7]
  it isn't printed if the IP address in question is '0.0.0.0', which is used by nodes performing DHCP lookup, and so constitutes a false positive as a report of misconfiguration.
* When the access control on creating raw sockets was modified so that  [rwatson, 2004-10-12, 1 file, -20/+41]
  processes in jail could create raw sockets, additional access control checks were added to raw IP sockets to limit the ways in which those sockets could be used. Specifically, only the socket option IP_HDRINCL was permitted in rip_ctloutput(). Other socket options were protected by a call to suser(). This change was required to prevent processes in a Jail from modifying system properties such as multicast routing and firewall rule sets.

  However, it also introduced a regression: processes that create a raw socket with root privilege, but then downgraded credential (i.e., a daemon giving up root, or a setuid process switching back to the real uid) could no longer issue other unprivileged generic IP socket option operations, such as IP_TOS, IP_TTL, and the multicast group membership options, which prevented multicast routing daemons (and some other tools) from operating correctly.

  This change pushes the access control decision down to the granularity of individual socket options, rather than all socket options, on raw IP sockets. When rip_ctloutput() doesn't implement an option, it will now pass the request directly to in_control() without an access control check. This should restore the functionality of the generic IP socket options for raw sockets in the above-described scenarios, which may be confirmed with the ipsockopt regression test.

  RELENG_5 candidate.

  Reviewed by: csjp
* Acquire the send socket buffer lock around tcp_output() activities  [rwatson, 2004-10-09, 1 file, -2/+14]
  reaching into the socket buffer. This prevents a number of potential races, including dereferencing of sb_mb while unlocked leading to a NULL pointer deref (how I found it). Potentially this might also explain other "odd" TCP behavior on SMP boxes (although haven't seen it reported).

  RELENG_5 candidate.
* When running with debug.mpsafenet=0, initialize IP multicast routing  [rwatson, 2004-10-07, 1 file, -4/+7]
  callouts as non-CALLOUT_MPSAFE. Otherwise, they may trigger an assertion regarding Giant if they enter other parts of the stack from the callout.

  MFC after: 3 days
  Reported by: Dikshie < dikshie at ppk dot itb dot ac dot id >
* - Estimate the amount of data in flight in sack recovery and use it  [ps, 2004-10-05, 7 files, -59/+83]
    to control the packets injected while in sack recovery (for both retransmissions and new data).
  - Cleanups to the sack codepaths in tcp_output.c and tcp_sack.c.
  - Add a new sysctl (net.inet.tcp.sack.initburst) that controls the number of sack retransmissions done upon initiation of sack recovery.

  Submitted by: Mohan Srinivasan <mohans@yahoo-inc.com>
* Add support to IPFW for matching by TCP data length.  [green, 2004-10-03, 2 files, -0/+24]
* Add support to IPFW for classification based on "diverted" status  [green, 2004-10-03, 3 files, -16/+42]
  (that is, input via a divert socket).
* Add to IPFW the ability to do ALTQ classification/tagging.  [green, 2004-10-03, 2 files, -0/+54]
* Validate the action pointer to be within the rule size, so that trying to  [green, 2004-09-30, 1 file, -0/+5]
  add corrupt ipfw rules would not potentially panic the system or worse.
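A bounds check of this kind can be sketched as follows. This is only an illustration of the idea, not the committed code: `struct rule_hdr`, its field layout, and `check_rule()` are hypothetical stand-ins for the real ipfw rule structure and validation routine.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified rule header; NOT the real struct ip_fw. */
struct rule_hdr {
	uint16_t act_ofs;	/* offset of the action, in 32-bit words */
	uint16_t cmd_len;	/* total length of the commands, in 32-bit words */
};

/*
 * Return 0 if the action offset points inside the rule, EINVAL otherwise.
 * Rejecting the rule up front means later code never follows an action
 * pointer that lies outside the memory occupied by the rule.
 */
static int
check_rule(const struct rule_hdr *rule)
{
	assert(rule != NULL);
	if (rule->cmd_len == 0 || rule->act_ofs >= rule->cmd_len)
		return (EINVAL);
	return (0);
}
```

The check runs once when the rule is added, so the per-packet path does not pay for it.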
* Add an additional struct inpcb * argument to pfil(9) in order to enable  [mlaier, 2004-09-29, 6 files, -19/+45]
  passing along socket information. This is required to work around a LOR with the socket code which results in an easily reproducible hard lockup with debug.mpsafenet=1. This commit does *not* fix the LOR, but enables us to do so later. The missing piece is to turn the filter locking into a leaf lock and will follow in a separate (later) commit.

  This will hopefully be MT5'ed in order to fix the problem for RELENG_5 in the foreseeable future.

  Suggested by: rwatson
  A lot of work by: csjp (he'd be even more helpful w/o mentor-reviews ;)
  Reviewed by: rwatson, csjp
  Tested by: -pf, -ipfw, LINT, csjp and myself
  MFC after: 3 days
  LOR IDs: 14 - 17 (not fixed yet)
* Assign so_pcb to NULL rather than 0 as it's a pointer.  [rwatson, 2004-09-29, 1 file, -1/+1]

  Spotted by: dwhite
* o Turn net.inet.ip.check_interface sysctl off by default.  [maxim, 2004-09-24, 1 file, -1/+1]

  When net.inet.ip.check_interface was MFCed to RELENG_4 3+ years ago in rev. 1.130.2.17 ip_input.c it was 1 by default but was shortly changed to 0 (accidentally?) in rev. 1.130.2.20 in RELENG_4 only. Along with the fact that this knob is not documented, it breaks POLA, especially in a bridge environment.

  OK'ed by: andre
  Reviewed by: -current
* Fix an out of bounds write during the initialization of the PF_INET protocol  [andre, 2004-09-16, 1 file, -4/+14]
  family to the ip_protox[] array. The protocol number of IPPROTO_DIVERT is larger than IPPROTO_MAX and was initializing memory beyond the array.

  Catch all these kinds of errors by ignoring protocols that are higher than IPPROTO_MAX or 0 (zero).

  Add more comments to ip_init().
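The class of bug described here, indexing a fixed-size table with an unchecked protocol number, can be sketched as below. All names (`PROTO_MAX`, `struct proto_entry`, `protox`, `protox_init()`) are hypothetical stand-ins rather than the kernel's own, and the exact boundary test (`p >= PROTO_MAX`) is an assumption about how "higher than IPPROTO_MAX" was implemented.

```c
#include <assert.h>
#include <stddef.h>

#define PROTO_MAX 256		/* stands in for IPPROTO_MAX */

/* Hypothetical protocol switch entry; only the number matters here. */
struct proto_entry {
	int number;		/* protocol number */
};

/* Maps a protocol number to an index into the switch table. */
static unsigned char protox[PROTO_MAX];

/*
 * Fill protox[] from the switch table, skipping any entry whose protocol
 * number is zero or does not fit in the array. Returns how many entries
 * were skipped. The sketch assumes n <= 256 so an index fits in a byte.
 */
static int
protox_init(const struct proto_entry *sw, size_t n)
{
	int skipped = 0;

	assert(sw != NULL);
	for (size_t i = 0; i < n; i++) {
		int p = sw[i].number;

		if (p <= 0 || p >= PROTO_MAX) {
			skipped++;	/* would have written past the array */
			continue;
		}
		protox[p] = (unsigned char)i;
	}
	return (skipped);
}
```

Skipping out-of-range entries, rather than asserting, keeps initialization robust when a pseudo-protocol with a large number is registered alongside the real ones.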
* Clarify some comments for the M_FASTFWD_OURS case in ip_input().  [andre, 2004-09-15, 1 file, -4/+4]
* Remove the last two global variables that are used to store packet state while  [andre, 2004-09-15, 4 files, -39/+48]
  it travels through the IP stack. This wasn't much of a problem because IP source routing is disabled by default but when enabled together with SMP and preemption it would have very likely cross-corrupted the IP options in transit.

  The IP source route options of a packet are now stored in a mtag instead of the global variable.
* Do not allow 'ipfw fwd' command when IPFIREWALL_FORWARD is not compiled into  [andre, 2004-09-13, 1 file, -0/+4]
  the kernel. Return EINVAL instead.
* If we have to 'ipfw fwd'-tag a packet the second time in ipfw_pfil_out() don't  [andre, 2004-09-13, 1 file, -3/+5]
  prepend an already existing tag again. Instead unlink it and prepend it again to have it as the first tag in the chain.

  PR: kern/71380
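The "unlink it and prepend it again" step amounts to a move-to-front on a singly linked list. The sketch below uses a hypothetical `struct tag` and `tag_move_to_front()` instead of the kernel's m_tag routines, so it illustrates the list manipulation only.

```c
#include <stddef.h>

/* Hypothetical tag node; the kernel's struct m_tag is more involved. */
struct tag {
	int id;
	struct tag *next;
};

/*
 * If a tag with the given id is already on the chain, unlink it and put it
 * back at the head, so it is never on the chain twice. Returns the new head.
 * If no such tag exists the chain is returned unchanged, and the caller
 * would allocate and prepend a fresh tag instead.
 */
static struct tag *
tag_move_to_front(struct tag *head, int id)
{
	struct tag **pp = &head;

	while (*pp != NULL && (*pp)->id != id)
		pp = &(*pp)->next;
	if (*pp == NULL)
		return (head);

	struct tag *t = *pp;
	*pp = t->next;		/* unlink from its current position */
	t->next = head;		/* prepend to what is left of the chain */
	return (t);
}
```

Walking with a pointer-to-pointer avoids special-casing a tag that is already first.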
* Make comments more clear for the packet changed cases after pfil hooks.  [andre, 2004-09-13, 1 file, -1/+2]
* Fix ip_input() fallback for the destination modified cases (from the packet  [andre, 2004-09-13, 1 file, -6/+4]
  filters). After the ipfw to pfil move ip_input() expects M_FASTFWD_OURS tagged packets to have ip_len and ip_off in host byte order instead of network byte order.

  PR: kern/71652
  Submitted by: mlaier (patch)
* Make 'ipfw tee' behave as intended and designed. A tee'd packet is copied  [andre, 2004-09-13, 1 file, -11/+11]
  and sent to the DIVERT socket while the original packet continues with the next rule. Unlike a normally diverted packet no IP reassembly attempts are made on tee'd packets and they are passed upwards totally unmodified.

  Note: This will not be MFC'd to 4.x because of major infrastructure changes.

  PR: kern/64240 (and many others collapsed into that one)
* Check flag do_bridge always, even if kernel was compiled without  [glebius, 2004-09-09, 1 file, -11/+5]
  BRIDGE support. This makes dynamic bridge.ko work.

  Reviewed by: sam
  Approved by: julian (mentor)
  MFC after: 1 week
* revert comment from rev1.158 now that rev1.225 backed it out..  [jmg, 2004-09-06, 1 file, -3/+1]

  MFC after: 3 days
* Recover normal behavior: return EINVAL to attempt to add a divert rule  [glebius, 2004-09-05, 1 file, -2/+5]
  when module is built without IPDIVERT.

  Silence from: andre
  Approved by: julian (mentor)
* fix up socket/ip layer violation... don't assume/know that  [jmg, 2004-09-05, 6 files, -8/+17]
  SO_DONTROUTE == IP_ROUTETOIF and SO_BROADCAST == IP_ALLOWBROADCAST...
* Apply error and success logic consistently to the function netisr_queue() and  [andre, 2004-08-27, 1 file, -1/+1]
  its users.

  netisr_queue() now returns (0) on success and ERRNO on failure. At the moment ENXIO (netisr queue not functional) and ENOBUFS (netisr queue full) are supported.

  Previously it would return (1) on success but the return value of IF_HANDOFF() was interpreted wrongly and (0) was actually returned on success. Due to this schednetisr() was never called to kick the scheduling of the isr. However this was masked by other normal packets coming through netisr_dispatch() causing the dequeueing of waiting packets.

  PR: kern/70988
  Found by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp>
  MFC after: 3 days
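The return convention described in this message (0 on success, an errno value on failure, and the handler kicked only after a successful enqueue) can be sketched as follows. `struct pktq`, `QLEN_MAX` and `pkt_enqueue()` are hypothetical; the `kicks` counter merely stands in for the call to schednetisr().

```c
#include <errno.h>
#include <stddef.h>

#define QLEN_MAX 2	/* tiny limit so the "queue full" case is easy to hit */

/* Hypothetical queue state; not the kernel's struct ifqueue. */
struct pktq {
	int enabled;	/* is a handler registered for this queue? */
	int len;	/* packets currently queued */
	int kicks;	/* how many times the handler was scheduled */
};

/*
 * Queue one packet. Returns 0 on success, ENXIO if the queue is not
 * functional and ENOBUFS if it is full. The handler is scheduled only on
 * the success path, which is the step the misread return value used to skip.
 */
static int
pkt_enqueue(struct pktq *q)
{
	if (q == NULL || !q->enabled)
		return (ENXIO);
	if (q->len >= QLEN_MAX)
		return (ENOBUFS);
	q->len++;
	q->kicks++;	/* stands in for schednetisr() */
	return (0);
}
```

With 0 meaning success, callers can write `if ((error = pkt_enqueue(q)) != 0)` and propagate the errno value unchanged.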
* In the case the destination of a packet was changed by the packet filter  [andre, 2004-08-27, 1 file, -2/+2]
  to point to a local IP address; and the packet was sourced from this host we fill in the m_pkthdr.rcvif with a pointer to the loopback interface.

  Before the function ifunit("lo0") was used to obtain the ifp. However this is sub-optimal from a performance point of view and might be dangerous if the loopback interface has been renamed. Use the global variable 'loif' instead which always points to the loopback interface.

  Submitted by: brooks
* Remove a junk line left over from the recent IPFW to PFIL_HOOKS conversion.  [andre, 2004-08-27, 1 file, -1/+0]
* Always compile PFIL_HOOKS into the kernel and remove the associated kernel  [andre, 2004-08-27, 5 files, -47/+29]
  compile option. All FreeBSD packet filters now use the PFIL_HOOKS API and thus it becomes a standard part of the network stack.

  If no hooks are connected the entire packet filter hooks section and related activities are jumped over. This removes any performance impact if no hooks are active.

  Both OpenBSD and DragonFlyBSD have integrated PFIL_HOOKS permanently as well.
* Revert the last change to sys/modules/ipfw/Makefile and fix a  [ru, 2004-08-26, 2 files, -1/+5]
  standalone module build in a better way.

  Silence from: andre
  MFC after: 3 days
* Allocate memory when dumping pipes with M_WAITOK flag.  [pjd, 2004-08-25, 1 file, -9/+33]
  On a system with a huge number of pipes, M_NOWAIT fails almost always, because of memory fragmentation. My fix is different than the patch proposed by Pawel Malachowski, because in FreeBSD 5.x we cannot sleep while holding dummynet mutex (in 4.x there is no such lock). My fix is also ugly, but there is no easy way to prepare a nice and clean fix.

  PR: kern/46557
  Submitted by: Eugene Grosbein <eugen@grosbein.pp.ru>
  Reviewed by: mlaier
* Allow early drop for non-ALTQ enabled queues in an ALTQ-enabled kernel.  [mlaier, 2004-08-22, 1 file, -13/+14]
  Previously the early drop was disabled unconditionally for ALTQ-enabled kernels. This should give some benefit for the normal gateway + LAN-server case with a busy LAN leg and an ALTQ managed uplink.

  Reviewed and style help from: cperciva, pjd
* When sliding the m_data pointer forward, update m_pkthdr.len as well  [rwatson, 2004-08-22, 1 file, -1/+3]
  as m_len, or the pkthdr length will be inconsistent with the actual length of data in the mbuf chain. The symptom of this occurring was "out of data" warnings from in_cksum_skip() on large UDP packets sent via the loopback interface.

  Foot shot: green
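The invariant being restored is that the per-buffer length and the packet-header length must shrink together when the data pointer moves forward. A minimal sketch, using a hypothetical single-buffer `struct buf` instead of a real mbuf chain:

```c
#include <stddef.h>

/* Hypothetical single-buffer packet; a real mbuf chain has more state. */
struct buf {
	char *data;	/* analogue of m_data */
	size_t len;	/* analogue of m_len: bytes valid in this buffer */
	size_t pkt_len;	/* analogue of m_pkthdr.len: bytes in the whole packet */
};

/*
 * Slide the data pointer forward by n bytes. Both lengths are adjusted;
 * updating only `len` would leave `pkt_len` claiming more data than the
 * buffer holds, which is the inconsistency the message above describes.
 * Returns 0 on success, -1 if n exceeds the data available.
 */
static int
buf_trim_front(struct buf *b, size_t n)
{
	if (b == NULL || n > b->len)
		return (-1);
	b->data += n;
	b->len -= n;
	b->pkt_len -= n;
	return (0);
}
```

A consumer that walks the buffer for `pkt_len` bytes, as a checksum routine would, only sees consistent data when both fields are kept in step.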
* When a prison is given the ability to create raw sockets (when the  [csjp, 2004-08-21, 2 files, -0/+18]
  security.jail.allow_raw_sockets sysctl MIB is set to 1) where privileged access to jails is given out, it is possible for prison root to manipulate various network parameters which affect the host environment. This commit plugs a number of security holes associated with the use of raw sockets and prisons.

  This commit makes the following changes:

  - Add a comment to rtioctl warning developers that if they add any ioctl commands, they should use super-user checks where necessary, as it is possible for PRISON root to make it this far in execution.
  - Add super-user checks for the execution of the SIOCGETVIFCNT and SIOCGETSGCNT IP multicast ioctl commands.
  - Add a super-user check to rip_ctloutput(). If the calling cred is PRISON root, make sure the socket option name is IP_HDRINCL, otherwise deny the request.

  Although this patch corrects a number of security problems associated with raw sockets and prisons, the warning in jail(8) should still apply, and by default we should keep the default value of security.jail.allow_raw_sockets MIB to 0 (or disabled) until we are certain that we have tracked down all the problems.

  Looking forward, we will probably want to eliminate the references to curthread.

  This may be a MFC candidate for RELENG_5.

  Reviewed by: rwatson
  Approved by: bmilekic (mentor)
* When prepending space onto outgoing UDP datagram payloads to hold the  [rwatson, 2004-08-21, 1 file, -4/+7]
  UDP/IP header, make sure that space is also allocated for the link layer header. If an mbuf must be allocated to hold the UDP/IP header (very likely), then this will avoid an additional mbuf allocation at the link layer. This trick is also used by TCP and other protocols to avoid extra calls to the mbuf allocator in the ethernet (and related) output routines.
* Fix a stupid typo which prevented an ipfw KLD unload from successfully cleaning  [andre, 2004-08-20, 1 file, -1/+1]
  up its remains. Do not terminate 'if' lines with ';'.

  Spotted by: claudio@OpenBSD.ORG (sitting 3m from my desk)
  Pointy hat to: andre
* When unloading ipfw module use callout_drain() to make absolutely sure that  [andre, 2004-08-19, 1 file, -1/+1]
  all callouts are stopped and finished. Move it before IPFW_LOCK() to avoid deadlocking when draining callouts.
* For IPv6 access pointer to tcpcb only after we have checked it is valid.  [andre, 2004-08-19, 2 files, -2/+8]

  Found by: Coverity's automated analysis (via Ted Unangst)
* Give a useful error message if someone tries to compile IPFIREWALL into the  [andre, 2004-08-19, 1 file, -0/+4]
  kernel without specifying PFIL_HOOKS as well.
* Do not unconditionally ignore IPDIVERT and IPFIREWALL_FORWARD when building  [andre, 2004-08-19, 2 files, -4/+0]
  the ipfw KLD.

  For IPFIREWALL_FORWARD this does not have any side effects. If the module has it but not the kernel it just doesn't do anything.

  For IPDIVERT the KLD will not be loadable if the kernel doesn't have IPDIVERT compiled in too. However this is the least disturbing behaviour. The user can just recompile either the module or the kernel to match the other one. The access to the machine is not denied if ipfw refuses to load.
* Bring back the sysctl 'net.inet.ip.fw.enable' to unbreak the startup scripts  [andre, 2004-08-19, 4 files, -0/+11]
  and to be able to disable ipfw if it was compiled directly into the kernel.
* Push down pcbinfo and inpcb locking from udp_send() into udp_output().  [rwatson, 2004-08-19, 1 file, -25/+35]
  This provides greater context for the locking and allows us to avoid locking the pcbinfo structure if no binding operations will take place (i.e., already bound, connected, and no explicit sendto() address).
* In in_pcbrehash(), do assert the inpcb lock as well as the pcbinfo lock.  [rwatson, 2004-08-19, 1 file, -1/+1]
* Fix build of ip_input.c with "options IPSEC" -- the "pass:" label  [rwatson, 2004-08-18, 1 file, -1/+1]
  is used with both FAST_IPSEC and IPSEC, but was defined for only FAST_IPSEC.
* Make the kernel compile again if you are not using PFIL_HOOKS  [peter, 2004-08-18, 1 file, -0/+4]
* Convert ipfw to use PFIL_HOOKS. This change is transparent to userland  [andre, 2004-08-17, 14 files, -836/+648]
  and preserves the ipfw ABI. The ipfw core packet inspection and filtering functions have not been changed, only how ipfw is invoked is different.

  However there are many changes to how ipfw and its add-ons are handled:

  In general ipfw is now called through the PFIL_HOOKS and most associated magic, that was in ip_input() or ip_output() previously, is now done in ipfw_check_[in|out]() in the ipfw PFIL handler.

  IPDIVERT is entirely handled within the ipfw PFIL handlers. A packet to be diverted is checked if it is fragmented, if yes, ip_reass() gets in for reassembly. If not, or all fragments arrived and the packet is complete, divert_packet is called directly. For 'tee' no reassembly attempt is made and a copy of the packet is sent to the divert socket unmodified. The original packet continues its way through ip_input/output().

  ipfw 'forward' is done via m_tag's. The ipfw PFIL handlers tag the packet with the new destination sockaddr_in. A check if the new destination is a local IP address is made and the m_flags are set appropriately. ip_input() and ip_output() have some more work to do here. For ip_input() the m_flags are checked and a packet for us is directly sent to the 'ours' section for further processing. Destination changes on the input path are only tagged and the 'srcrt' flag to ip_forward() is set to disable destination checks and ICMP replies at this stage. The tag is going to be handled on output. ip_output() again checks for m_flags and the 'ours' tag. If found, the packet will be dropped back to the IP netisr where it is going to be picked up by ip_input() again and then directly sent to the 'ours' section. When only the destination changes, the route's 'dst' is overwritten with the new destination from the forward m_tag. Then it jumps back at the route lookup again and skips the firewall check because it has been marked with M_SKIP_FIREWALL. ipfw 'forward' has to be compiled into the kernel with 'option IPFIREWALL_FORWARD' to enable it.

  DUMMYNET is entirely handled within the ipfw PFIL handlers. A packet for a dummynet pipe or queue is directly sent to dummynet_io(). Dummynet will then inject it back into ip_input/ip_output() after it has served its time. Dummynet packets are tagged and will continue from the next rule when they hit the ipfw PFIL handlers again after re-injection.

  BRIDGING and IPFW_ETHER are not changed yet and use ipfw_chk() directly as they did before. Later this will be changed to dedicated ETHER PFIL_HOOKS.

  More detailed changes to the code:

  conf/files
    Add netinet/ip_fw_pfil.c.

  conf/options
    Add IPFIREWALL_FORWARD option.

  modules/ipfw/Makefile
    Add ip_fw_pfil.c.

  net/bridge.c
    Disable PFIL_HOOKS if ipfw for bridging is active. Bridging ipfw is still directly invoked to handle layer2 headers and packets would get a double ipfw when run through PFIL_HOOKS as well.

  netinet/ip_divert.c
    Removed divert_clone() function. It is no longer used.

  netinet/ip_dummynet.[ch]
    Neither the route 'ro' nor the destination 'dst' need to be stored while in dummynet transit. Structure members and associated macros are removed.

  netinet/ip_fastfwd.c
    Removed all direct ipfw handling code and replaced it with the new 'ipfw forward' handling code.

  netinet/ip_fw.h
    Removed 'ro' and 'dst' from struct ip_fw_args.

  netinet/ip_fw2.c
    (Re)moved some global variables and the module handling.

  netinet/ip_fw_pfil.c
    New file containing the ipfw PFIL handlers and module initialization.

  netinet/ip_input.c
    Removed all direct ipfw handling code and replaced it with the new 'ipfw forward' handling code. ip_forward() no longer requires the 'next_hop' struct sockaddr_in argument. Disable early checks if 'srcrt' is set.

  netinet/ip_output.c
    Removed all direct ipfw handling code and replaced it with the new 'ipfw forward' handling code.

  netinet/ip_var.h
    Add ip_reass() as general function. (Used from ipfw PFIL handlers for IPDIVERT.)

  netinet/raw_ip.c
    Directly check if ipfw and dummynet control pointers are active.

  netinet/tcp_input.c
    Rework the 'ipfw forward' to local code to work with the new way of forward tags.

  netinet/tcp_sack.c
    Remove include 'opt_ipfw.h' which is not needed here.

  sys/mbuf.h
    Remove m_claim_next() macro which was exclusively for ipfw 'forward' and is no longer needed.

  Approved by: re (scottl)
* White space cleanup for netinet before branch:  [rwatson, 2004-08-16, 32 files, -674/+674]

  - Trailing tab/space cleanup
  - Remove spurious spaces between or before tabs

  This change avoids touching files that Andre likely has in his working set for PFIL hooks changes for IPFW/DUMMYNET.

  Approved by: re (scottl)
  Submitted by: Xin LI <delphij@frontfree.net>
* Put the 'antispoof' opcode in the proper place in the opcode list such  [obrien, 2004-08-16, 1 file, -1/+1]
  that it doesn't break the ipfw2 ABI.
* Get rid of the RANDOM_IP_ID option and make it a sysctl. NetBSD  [dwmalone, 2004-08-14, 8 files, -46/+23]
  have already done this, so I have styled the patch on their work:

  1) introduce a ip_newid() static inline function that checks the sysctl and then decides if it should return a sequential or random IP ID.
  2) named the sysctl net.inet.ip.random_id
  3) IPv6 flow IDs and fragment IDs are now always random. Flow IDs and frag IDs are significantly less common in the IPv6 world (ie. rarely generated per-packet), so there should be smaller performance concerns.

  The sysctl defaults to 0 (sequential IP IDs).

  Reviewed by: andre, silby, mlaier, ume
  Based on: NetBSD
  MFC after: 2 months
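The shape of item 1, a static inline that consults a run-time knob and returns either a sequential or a random ID, might look roughly like the sketch below. It is not the committed ip_newid(): `random_id`, `next_id` and `newid()` are hypothetical names, and rand() is only a placeholder for whatever generator the kernel actually uses.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the net.inet.ip.random_id sysctl; 0 = sequential. */
static int random_id = 0;

/* Next sequential ID; wraps naturally at 16 bits. */
static uint16_t next_id = 0;

/*
 * Return a new 16-bit ID. The knob is checked on every call, so flipping
 * it at run time takes effect immediately, which a compile-time option
 * cannot offer.
 */
static inline uint16_t
newid(void)
{
	if (random_id)
		return ((uint16_t)rand());	/* placeholder, not a suitable generator */
	return (next_id++);
}
```

Keeping the decision in one small inline function means each caller that needs an ID uses the same policy without repeating the check.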