summaryrefslogtreecommitdiffstats
path: root/sys/netinet/in_pcb.c
Commit message (Collapse)AuthorAgeFilesLines
* Port randomization leads to extremely fast port reuse at highsilby2005-01-021-4/+52
| | | | | | | | | | | | | | | | | connection rates, which is causing problems for some users. To retain the security advantage of random ports and ensure correct operation for high connection rate users, disable port randomization during periods of high connection rates. Whenever the connection rate exceeds randomcps (10 by default), randomization will be disabled for randomtime (45 by default) seconds. These thresholds may be tuned via sysctl. Many thanks to Igor Sysoev, who proved the necessity of this change and tested many preliminary versions of the patch. MFC After: 20 seconds
* Push acquisition of the accept mutex out of sofree() into the callerrwatson2004-10-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (sorele()/sotryfree()): - This permits the caller to acquire the accept mutex before the socket mutex, avoiding sofree() having to drop the socket mutex and re-order, which could lead to races permitting more than one thread to enter sofree() after a socket is ready to be free'd. - This also covers clearing of the so_pcb weak socket reference from the protocol to the socket, preventing races in clearing and evaluation of the reference such that sofree() might be called more than once on the same socket. This appears to close a race I was able to easily trigger by repeatedly opening and resetting TCP connections to a host, in which the tcp_close() code called as a result of the RST raced with the close() of the accepted socket in the user process resulting in simultaneous attempts to de-allocate the same socket. The new locking increases the overhead for operations that may potentially free the socket, so we will want to revise the synchronization strategy here as we normalize the reference counting model for sockets. The use of the accept mutex in freeing of sockets that are not listen sockets is primarily motivated by the potential need to remove the socket from the incomplete connection queue on its parent (listen) socket, so cleaning up the reference model here may allow us to substantially weaken the synchronization requirements. RELENG_5_3 candidate. MFC after: 3 days Reviewed by: dwhite Discussed with: gnn, dwhite, green Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de> Reported by: Vlad <marchenko at gmail dot com>
* Assign so_pcb to NULL rather than 0 as it's a pointer.rwatson2004-09-291-1/+1
| | | | Spotted by: dwhite
* In in_pcbrehash(), do assert the inpcb lock as well as the pcbinfo lock.rwatson2004-08-191-1/+1
|
* Assert the locks of inpcbinfo's and inpcb's passed into in_pcbconnect()rwatson2004-08-111-0/+6
| | | | | and in_pcbconnect_setup(), since these functions frob the port and address state of inpcbs.
* Disallow a particular kind of port theft described by the following scenario:yar2004-07-281-10/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Alice is too lazy to write a server application in PF-independent manner. Therefore she knocks up the server using PF_INET6 only and allows the IPv6 socket to accept mapped IPv4 as well. An evil hacker known on IRC as cheshire_cat has an account in the same system. He starts a process listening on the same port as used by Alice's server, but in PF_INET. As a consequence, cheshire_cat will distract all IPv4 traffic supposed to go to Alice's server. Such sort of port theft was initially enabled by copying the code that implemented the RFC 2553 semantics on IPv4/6 sockets (see inet6(4)) for the implied case of the same owner for both connections. After this change, the above scenario will be impossible. In the same setting, the user who attempts to start his server last will get EADDRINUSE. Of course, using IPv4 mapped to IPv6 leads to security complications in the first place, but there is no reason to make it even more unsafe. This change doesn't apply to KAME since it affects a FreeBSD-specific part of the code. It doesn't modify the out-of-box behaviour of the TCP/IP stack either as long as mapping IPv4 to IPv6 is off by default. MFC after: 1 month
* Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This iscperciva2004-07-261-2/+2
| | | | | | | | | | | somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb
* o connect(2): if there is no a route to the destinationmaxim2004-06-161-3/+1
| | | | | | | | do not pick up the first local ip address for the source ip address, return ENETUNREACH instead. Submitted by: Gleb Smirnoff Reviewed by: -current (silence)
* Socket MAC labels so_label and so_peerlabel are now protected byrwatson2004-06-131-1/+4
| | | | | | | | | | | | | SOCK_LOCK(so): - Hold socket lock over calls to MAC entry points reading or manipulating socket labels. - Assert socket lock in MAC entry point implementations. - When externalizing the socket label, first make a thread-local copy while holding the socket lock, then release the socket lock to externalize to userspace.
* Extend coverage of SOCK_LOCK(so) to include so_count, the socketrwatson2004-06-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | reference count: - Assert SOCK_LOCK(so) macros that directly manipulate so_count: soref(), sorele(). - Assert SOCK_LOCK(so) in macros/functions that rely on the state of so_count: sofree(), sotryfree(). - Acquire SOCK_LOCK(so) before calling these functions or macros in various contexts in the stack, both at the socket and protocol layers. - In some cases, perform soisdisconnected() before sotryfree(), as this could result in frobbing of a non-present socket if sotryfree() actually frees the socket. - Note that sofree()/sotryfree() will release the socket lock even if they don't free the socket. Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS
* When checking for possible port theft, skip over a TCP inpcbyar2004-05-201-7/+3
| | | | | | | | | | | | | | unless it's in the closed or listening state (remote address == INADDR_ANY). If a TCP inpcb is in any other state, it's impossible to steal its local port or use it for port theft. And if there are both closed/listening and connected TCP inpcbs on the same localIP:port couple, the call to in_pcblookup_local() will find the former due to the design of that function. No objections raised in: -net, -arch MFC after: 1 month
* Wrap two long lines in the previous commit.silby2004-04-231-2/+4
|
* Take out an unneeded variable I forgot to remove in the last commit,silby2004-04-221-2/+3
| | | | and make two small whitespace fixes so that diffs vs rev 1.142 are minimal.
* Simplify random port allocation, and add net.inet.ip.portrange.randomized,silby2004-04-221-27/+13
| | | | | | which can be used to turn off randomized port allocation if so desired. Requested by: alfred
* Switch from using sequential to random ephemeral port allocation,silby2004-04-201-6/+28
| | | | | | | | | implementation taken directly from OpenBSD. I've resisted committing this for quite some time because of concern over TIME_WAIT recycling breakage (sequential allocation ensures that there is a long time before ports are recycled), but recent testing has shown me that my fears were unwarranted.
* Remove advertising clause from University of California Regent'simp2004-04-071-4/+0
| | | | | | | license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson
* Fixed misspelling of IPPORT_MAX as USHRT_MAX. Don't include <sys/limits.h>bde2004-04-061-9/+9
| | | | | | | | | to implement this mistake. Fixed some nearby style bugs (initialization in declaration, misformatting of this initialization, missing blank line after the declaration, and comparision of the non-boolean result of the initialization with 0 using "!". In KNF, "!" is not even used to compare booleans with 0).
* Reduce 'td' argument to 'cred' (struct ucred) argument in those functions:pjd2004-03-271-25/+24
| | | | | | | | | | | | | | - in_pcbbind(), - in_pcbbind_setup(), - in_pcbconnect(), - in_pcbconnect_setup(), - in6_pcbbind(), - in6_pcbconnect(), - in6_pcbsetport(). "It should simplify/clarify things a great deal." --rwatson Requested by: rwatson Reviewed by: rwatson, ume
* Remove unused argument.pjd2004-03-271-2/+1
| | | | Reviewed by: ume
* Remove unused function.pjd2004-03-251-10/+0
| | | | It was used in FreeBSD 4.x, but now we're using cr_canseesocket().
* Scrub unused variable zeroin_addr.rwatson2004-03-101-2/+0
|
* do not deref freed pointerume2004-01-131-2/+2
| | | | | Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net> Reviewed by: itojun
* Make sure all uses of stack allocated struct route's are properlyandre2003-11-261-2/+1
| | | | | | | | zeroed. Doing a bzero on the entire struct route is not more expensive than assigning NULL to ro.ro_rt and bzero of ro.ro_dst. Reviewed by: sam (mentor) Approved by: re (scottl)
* Split the "inp" mutex class into separate classes for each of divert,sam2003-11-261-2/+3
| | | | | | | | raw, tcp, udp, raw6, and udp6 sockets to avoid spurious witness complaints. Reviewed by: rwatson Approved by: re (rwatson)
* bzero() the the sockaddr used for the destination address fortmm2003-11-231-0/+1
| | | | | | | | | | | | rtalloc_ign() in in_pcbconnect_setup() before it is filled out. Otherwise, stack junk would be left in sin_zero, which could cause host routes to be ignored because they failed the comparison in rn_match(). This should fix the wrong source address selection for connect() to 127.0.0.1, among other things. Reviewed by: sam Approved by: re (rwatson)
* Introduce tcp_hostcache and remove the tcp specific metrics fromandre2003-11-201-83/+14
| | | | | | | | | | | | | | | | | | | | | | | the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache. It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve. tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address. It removes significant locking requirements from the tcp stack with regard to the routing table. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)
* Introduce a MAC label reference in 'struct inpcb', which cachesrwatson2003-11-181-7/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | the MAC label referenced from 'struct socket' in the IPv4 and IPv6-based protocols. This permits MAC labels to be checked during network delivery operations without dereferencing inp->inp_socket to get to so->so_label, which will eventually avoid our having to grab the socket lock during delivery at the network layer. This change introduces 'struct inpcb' as a labeled object to the MAC Framework, along with the normal circus of entry points: initialization, creation from socket, destruction, as well as a delivery access control check. For most policies, the inpcb label will simply be a cache of the socket label, so a new protocol switch method is introduced, pr_sosetlabel() to notify protocols that the socket layer label has been updated so that the cache can be updated while holding appropriate locks. Most protocols implement this using pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use the the worker function in_pcbsosetlabel(), which calls into the MAC Framework to perform a cache update. Biba, LOMAC, and MLS implement these entry points, as do the stub policy, and test policy. Reviewed by: sam, bms Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
* add missing inpcb lock before call to tcp_twclose (which reclaims the inpcb)sam2003-11-131-0/+1
| | | | Supported by: FreeBSD Foundation
* o reorder some locking asserts to reflect the order of the lockssam2003-11-131-3/+4
| | | | | | | o correct a read-lock assert in in_pcblookup_local that should be a write-lock assert (since time wait close cleanups may alter state) Supported by: FreeBSD Foundation
* In in_pcbconnect_setup(), don't use the cached inp->inp_route unlessiedowse2003-11-101-4/+4
| | | | | | | | it is marked as RTF_UP. This appears to fix a crash that was sometimes triggered when dhclient(8) tried to send a packet after an interface had been detatched. Reviewed by: sam
* add locking assertionssam2003-11-081-4/+29
| | | | Supported by: FreeBSD Foundation
* - cleanup SP refcnt issue.ume2003-11-041-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - share policy-on-socket for listening socket. - don't copy policy-on-socket at all. secpolicy no longer contain spidx, which saves a lot of memory. - deep-copy pcb policy if it is an ipsec policy. assign ID field to all SPD entries. make it possible for racoon to grab SPD entry on pcb. - fixed the order of searching SA table for packets. - fixed to get a security association header. a mode is always needed to compare them. - fixed that the incorrect time was set to sadb_comb_{hard|soft}_usetime. - disallow port spec for tunnel mode policy (as we don't reassemble). - an user can define a policy-id. - clear enc/auth key before freeing. - fixed that the kernel crashed when key_spdacquire() was called because key_spdacquire() had been implemented imcopletely. - preparation for 64bit sequence number. - maintain ordered list of SA, based on SA id. - cleanup secasvar management; refcnt is key.c responsibility; alloc/free is keydb.c responsibility. - cleanup, avoid double-loop. - use hash for spi-based lookup. - mark persistent SP "persistent". XXX in theory refcnt should do the right thing, however, we have "spdflush" which would touch all SPs. another solution would be to de-register persistent SPs from sptree. - u_short -> u_int16_t - reduce kernel stack usage by auto variable secasindex. - clarify function name confusion. ipsec_*_policy -> ipsec_*_pcbpolicy. - avoid variable name confusion. (struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct secpolicy *) - count number of ipsec encapsulations on ipsec4_output, so that we can tell ip_output() how to handle the packet further. - When the value of the ul_proto is ICMP or ICMPV6, the port field in "src" of the spidx specifies ICMP type, and the port field in "dst" of the spidx specifies ICMP code. - avoid from applying IPsec transport mode to the packets when the kernel forwards the packets. Tested by: nork Obtained from: KAME
* - Add a new function tcp_twrecycleable, which tells us if the ISN whichsilby2003-11-011-0/+12
| | | | | | | | | | | | | we will generate for a given ip/port tuple has advanced far enough for the time_wait socket in question to be safely recycled. - Have in_pcblookup_local use tcp_twrecycleable to determine if time_Wait sockets which are hogging local ports can be safely freed. This change preserves proper TIME_WAIT behavior under normal circumstances while allowing for safe and fast recycling whenever ephemeral port space is scarce.
* Overhaul routing table entry cleanup by introducing a new rtexpungesam2003-10-301-5/+3
| | | | | | | | | | | | routine that takes a locked routing table reference and removes all references to the entry in the various data structures. This eliminates instances of recursive locking and also closes races where the lock on the entry had to be dropped prior to calling rtrequest(RTM_DELETE). This also cleans up confusion where the caller held a reference to an entry that might have been reclaimed (and in some cases used that reference). Supported by: FreeBSD Foundation
* Locking for updates to routing table entries. Each rtentry gets a mutexsam2003-10-041-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | that covers updates to the contents. Note this is separate from holding a reference and/or locking the routing table itself. Other/related changes: o rtredirect loses the final parameter by which an rtentry reference may be returned; this was never used and added unwarranted complexity for locking. o minor style cleanups to routing code (e.g. ansi-fy function decls) o remove the logic to bump the refcnt on the parent of cloned routes, we assume the parent will remain as long as the clone; doing this avoids a circularity in locking during delete o convert some timeouts to MPSAFE callouts Notes: 1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level applications cannot/do-no know about mutex's. Doing this requires that the mutex be the last element in the structure. A better solution is to introduce an externalized version of struct rtentry but this is a major task because of the intertwining of rtentry and other data structures that are visible to user applications. 2. There are known LOR's that are expected to go away with forthcoming work to eliminate many held references. If not these will be resolved prior to release. 3. ATM changes are untested. Sponsored by: FreeBSD Foundation Obtained from: BSD/OS (partly)
* Consistently use the BSD u_int and u_short instead of the SYSV uint andjhb2003-08-071-1/+1
| | | | | | | ushort. In most of these files, there was a mixture of both styles and this change just makes them self-consistent. Requested by: bde (kern_ktrace.c)
* Deprecate machine/limits.h in favor of new sys/limits.h.kan2003-04-291-2/+1
| | | | | | | Change all in-tree consumers to include <sys/limits.h> Discussed on: standards@ Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
* The ancient and outdated concept of "privileged ports" in UNIX-typecjc2003-02-211-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | OSes has probably caused more problems than it ever solved. Allow the user to retire the old behavior by specifying their own privileged range with, net.inet.ip.portrange.reservedhigh default = IPPORT_RESERVED - 1 net.inet.ip.portrange.reservedlo default = 0 Now you can run that webserver without ever needing root at all. Or just imagine, an ftpd that can really drop privileges, rather than just set the euid, and still do PORT data transfers from 20/tcp. Two edge cases to note, # sysctl net.inet.ip.portrange.reservedhigh=0 Opens all ports to everyone, and, # sysctl net.inet.ip.portrange.reservedhigh=65535 Locks all network activity to root only (which could actually have been achieved before with ipfw(8), but is somewhat more complicated). For those who stick to the old religion that 0-1023 belong to root and root alone, don't touch the knobs (or even lock them by raising securelevel(8)), and nothing changes.
* Add a TCP TIMEWAIT state which uses less space than a fullblown TCPjlemon2003-02-191-4/+25
| | | | | | | | control block. Allow the socket and tcpcb structures to be freed earlier than inpcb. Update code to understand an inp w/o a socket. Reviewed by: hsu, silby, jayanth Sponsored by: DARPA, NAI Labs
* Back out M_* changes, per decision of the TRB.imp2003-02-191-1/+1
| | | | Approved by: trb
* in_pcbnotifyall() requires an exclusive protocol lock for notify functionshsu2003-02-121-7/+7
| | | | which modify the connection list, namely, tcp_notify().
* remove the restriction on build a kernel with FAST_IPSEC and INET6;sam2003-01-301-3/+0
| | | | | | you still don't want to use the two together, but it's ok to have them in the same kernel (the problem that initiated this bandaid has long since been fixed)
* Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.alfred2003-01-211-1/+1
| | | | Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
* temporarily disallow FAST_IPSEC and INET6 to avoid potential panics;sam2002-11-081-0/+3
| | | | will correct this before 5.0 release
* Replace in_pcbladdr() with a more generic inner subroutine foriedowse2002-10-211-84/+121
| | | | | | | | | | | | | | | in_pcbconnect() called in_pcbconnect_setup(). This version performs all of the functions of in_pcbconnect() except for the final committing of changes to the PCB. In the case of an EADDRINUSE error it can also provide to the caller the PCB of the duplicate connection, avoiding an extra in_pcblookup_hash() lookup in tcp_connect(). This change will allow the "temporary connect" hack in udp_output() to be removed and is part of the preparation for adding the IP_SENDSRCADDR control message. Discussed on: -net Approved by: re
* Split out most of the logic from in_pcbbind() into a new functioniedowse2002-10-201-36/+64
| | | | | | | | | | | called in_pcbbind_setup() that does everything except commit the changes to the PCB. There should be no functional change here, but in_pcbbind_setup() will be used by the soon-to-appear IP_SENDSRCADDR control message implementation to check or allocate the source address and port. Discussed on: -net Approved by: re
* Tie new "Fast IPsec" code into the build. This involves the usualsam2002-10-161-0/+10
| | | | | | | | | | | | configuration stuff as well as conditional code in the IPv4 and IPv6 areas. Everything is conditional on FAST_IPSEC which is mutually exclusive with IPSEC (KAME IPsec implmentation). As noted previously, don't use FAST_IPSEC with INET6 at the moment. Reviewed by: KAME, rwatson Approved by: silence Supported by: Vernier Networks
* Create new functions in_sockaddr(), in6_sockaddr(), andtruckman2002-08-211-26/+27
| | | | | | | | | | | | | | | | | in6_v4mapsin6_sockaddr() which allocate the appropriate sockaddr_in* structure and initialize it with the address and port information passed as arguments. Use calls to these new functions to replace code that is replicated multiple times in in_setsockaddr(), in_setpeeraddr(), in6_setsockaddr(), in6_setpeeraddr(), in6_mapped_sockaddr(), and in6_mapped_peeraddr(). Inline COMMON_END in tcp_usr_accept() so that we can call in_sockaddr() with temporary copies of the address and port after the PCB is unlocked. Fix the lock violation in tcp6_usr_accept() (caused by calling MALLOC() inside in6_mapped_peeraddr() while the PCB is locked) by changing the implementation of tcp6_usr_accept() to match tcp_usr_accept(). Reviewed by: suz
* cleanup usage of ip6_mapped_addr_on and ip6_v6only. now,ume2002-07-251-1/+1
| | | | | | ip6_mapped_addr_on is unified into ip6_v6only. MFC after: 1 week
* Notify functions can destroy the pcb, so they have to return anhsu2002-06-141-2/+3
| | | | | | | | indication of whether this happenned so the calling function knows whether or not to unlock the pcb. Submitted by: Jennifer Yang (yangjihui@yahoo.com) Bug reported by: Sid Carter (sidcarter@symonds.net)
OpenPOWER on IntegriCloud