path: root/sys/kern/uipc_usrreq.c
Commit message    Author    Age    Files    Lines
* Configure UMA warnings for the following zones:    pjd    2012-12-07    1    -0/+1
  - unp_zone: kern.ipc.maxsockets limit reached
  - socket_zone: kern.ipc.maxsockets limit reached
  - zone_mbuf: kern.ipc.nmbufs limit reached
  - zone_clust: kern.ipc.nmbclusters limit reached
  - zone_jumbop: kern.ipc.nmbjumbop limit reached
  - zone_jumbo9: kern.ipc.nmbjumbo9 limit reached
  - zone_jumbo16: kern.ipc.nmbjumbo16 limit reached

  Note that these warnings are printed no more often than every five
  minutes and can be globally turned off by setting the sysctl/tunable
  vm.zone_warnings to 0.

  Discussed on: arch
  Obtained from: WHEEL Systems
  MFC after: 2 weeks
* Schedule garbage collection run for the in-flight rights passed over    kib    2012-11-20    1    -3/+3
  the unix domain sockets to the next tick, coalescing the serial calls
  until the collection fires. The thought is that more work for the
  collector could arise in the near future, allowing it to clean more and
  not spend too much CPU on repeated collection when there is no garbage.

  Currently the collection task is fired immediately upon unix domain
  socket close if there are any rights in flight, which caused excessive
  CPU usage and too long blocking of the threads waiting for
  unp_list_lock and unp_link_rwlock in write mode.

  Robert noted that it would be nice if we could find some heuristic by
  which we decide whether to run GC a bit more quickly. E.g., if the
  number of UNIX domain sockets is close to its resource limit, but not
  quite.

  Reported and tested by: Markus Gebert <markus.gebert@hostpoint.ch>
  Reviewed by: rwatson
  MFC after: 2 weeks
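The coalescing described in this commit message can be sketched in user space. This is only an illustration under stated assumptions, not the kernel change itself: `struct gc_sched`, `gc_request()`, and `gc_tick()` are hypothetical names, the sketch is single-threaded, and it ignores the locking the real code needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scheduler state for a deferred collector. */
struct gc_sched {
	bool pending;	/* a run is already scheduled for the next tick */
	int runs;	/* number of collections that actually executed */
};

/*
 * Request a collection. Returns true only if this call scheduled a new
 * run; requests made while one is pending are coalesced into it.
 */
bool
gc_request(struct gc_sched *s)
{
	assert(s != NULL);
	if (s->pending)
		return (false);
	s->pending = true;
	return (true);
}

/* Called once per tick: run the collector at most once, then re-arm. */
void
gc_tick(struct gc_sched *s)
{
	assert(s != NULL);
	if (!s->pending)
		return;
	s->pending = false;
	s->runs++;	/* stand-in for the real collection work */
}
```

With this shape, any number of socket closes between two ticks costs a single collection, which is the CPU saving the message argues for.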
* Update comment.    glebius    2012-11-16    1    -1/+2
* Remove the support for using non-mpsafe filesystem modules.    kib    2012-10-22    1    -24/+5
  In particular, do not lock Giant conditionally when calling into the
  filesystem module, remove the VFS_LOCK_GIANT() and related macros.
  Stop handling buffers belonging to non-mpsafe filesystems.

  The VFS_VERSION is bumped to indicate the interface change which does
  not result in the interface signatures changes.

  Conducted and reviewed by: attilio
  Tested by: pho
* Fix up kernel sources to be ready for a 64-bit ino_t.    mdf    2012-09-27    1    -1/+1
  Original code by: Gleb Kurtsou
* Supply the pr_ctloutput method for local datagram sockets,    glebius    2012-09-07    1    -0/+1
  so that setsockopt() and getsockopt() work on them.

  This makes 'tools/regression/sockets/unix_cmsg -t dgram' more
  successful.
* When checking if file descriptor number is valid, explicitly check for 'fd'    pjd    2012-06-13    1    -1/+1
  being less than 0 instead of using cast-to-unsigned hack.

  Today's commit was brought to you by the letters 'B', 'D' and 'E' :)
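The two styles of descriptor check mentioned in this message can be compared side by side. This is a minimal user-space sketch, not the kernel code: `fd_valid_cast()`, `fd_valid_explicit()`, and the `nfiles` bound are hypothetical names, and the sketch assumes `nfiles` is non-negative.

```c
#include <assert.h>

/*
 * The "cast-to-unsigned hack": a negative fd converts to a very large
 * unsigned value, so a single comparison rejects it as out of range.
 */
int
fd_valid_cast(int fd, int nfiles)
{
	assert(nfiles >= 0);
	return ((unsigned int)fd < (unsigned int)nfiles);
}

/* The explicit form: the negative case is spelled out for the reader. */
int
fd_valid_explicit(int fd, int nfiles)
{
	assert(nfiles >= 0);
	return (fd >= 0 && fd < nfiles);
}
```

Both functions accept and reject the same inputs; the explicit version trades one extra comparison for code that does not rely on integer conversion rules.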
* Introduce VOP_UNP_BIND(), VOP_UNP_CONNECT(), and VOP_UNP_DETACH()    trociny    2012-02-29    1    -8/+6
  operations for setting and accessing vnode's v_socket field.

  The operations are necessary to implement proper unix socket handling
  on layered file systems like nullfs(5).

  This change fixes the long-standing issue with nullfs(5) in that unix
  sockets did not work between lower and upper layers: if we bound to a
  socket on the lower layer we could connect only to the lower path; if
  we bound to the upper layer we could connect only to the upper path.
  The new behavior is that one can connect to both the lower and the
  upper paths regardless of what layer path one binds to.

  PR: kern/51583, kern/159663
  Suggested by: kib
  Reviewed by: arch
  MFC after: 2 weeks
* When detaching a unix domain socket, uipc_detach() checks    trociny    2012-02-25    1    -0/+39
  unp->unp_vnode pointer to detect if there is a vnode associated with
  (bound to) this socket and does necessary cleanup if there is.

  The issue is that after forced unmount this check may be too late as
  the unp_vnode is reclaimed and the reference is stale.

  To fix this provide a helper function that is called on a socket vnode
  reclamation to do necessary cleanup.

  Pointed by: kib
  Reviewed by: kib
  MFC after: 2 weeks
* unp_connect() may use a shared lock on the vnode to fetch the socket.    trociny    2012-02-21    1    -2/+2
  Suggested by: jhb
  Reviewed by: jhb, kib, rwatson
  MFC after: 2 weeks
* Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.    ed    2011-11-07    1    -4/+5
  The SYSCTL_NODE macro defines a list that stores all child-elements of
  that node. If there's no SYSCTL_DECL macro anywhere else, there's no
  reason why it shouldn't be static.
* Fix handling of corrupt compress(1)ed data. [11:04]    bz    2011-09-28    1    -0/+4
  Add missing length checks on unix socket addresses. [11:05]

  Approved by: so (cperciva)
  Approved by: re (kensmith)
  Security: FreeBSD-SA-11:04.compress
  Security: CVE-2011-2895 [11:04]
  Security: FreeBSD-SA-11:05.unix
* Prevent the hiwatermark for the unix domain socket from becoming    kib    2011-08-20    1    -2/+5
  effectively negative. Often seen as upstream fastcgi connection
  timeouts in nginx when using sendfile over unix domain sockets for
  communication.

  Sendfile(2) may send more bytes than currently allowed by the
  hiwatermark of the socket, e.g. because the so_snd sockbuf lock is
  dropped after sbspace() call in the kern_sendfile() loop. In this case,
  recalculated hiwatermark will overflow. Since lowatermark is renewed as
  half of the hiwatermark by sendfile code, and both are unsigned, the
  send buffer never reaches the free space requested by lowatermark,
  causing indefinite wait in sendfile.

  Reviewed by: rwatson
  Approved by: re (bz)
  MFC after: 2 weeks
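The unsigned wraparound behind this bug is easy to reproduce in isolation. The sketch below is an illustration only: `hiwat_naive()` and `hiwat_clamped()` are hypothetical helpers, and clamping to zero is an assumed remedy chosen for the example, not necessarily what the commit does.

```c
#include <assert.h>

/*
 * Naive recalculation: when more bytes are in flight than the current
 * high watermark allows, the unsigned subtraction wraps around to a
 * huge value instead of going negative.
 */
unsigned int
hiwat_naive(unsigned int hiwat, unsigned int in_flight)
{
	return (hiwat - in_flight);
}

/* Assumed remedy for illustration: never let the result wrap. */
unsigned int
hiwat_clamped(unsigned int hiwat, unsigned int in_flight)
{
	unsigned int result;

	result = (in_flight < hiwat) ? hiwat - in_flight : 0;
	assert(result <= hiwat);	/* a clamped value cannot exceed the input */
	return (result);
}
```

With a wrapped high watermark, a low watermark computed as half of it is also enormous, so the free space in the send buffer can never satisfy it; that is the indefinite wait the message describes.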
* Mfp4 CH=177274,177280,177284-177285,177297,177324-177325    bz    2011-02-16    1    -2/+10
  VNET socket push back: try to minimize the number of places where we
  have to switch vnets and narrow down the time we stay switched. Add
  assertions to the socket code to catch possibly unset vnets as seen in
  r204147.

  While this reduces the number of vnet recursion in some places like
  NFS, POSIX local sockets and some netgraph, .. recursions are
  impossible to fix.

  The current expectations are documented at the beginning of
  uipc_socket.c along with the other information there.

  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  Reviewed by: jhb
  Tested by: zec
  Tested by: Mikolaj Golub (to.my.trociny gmail.com)
  MFC after: 2 weeks
* The unp_gc() function drops and reacquires lock between scan and    kib    2011-02-01    1    -12/+16
  collect phases. The unp_discard() function executes
  unp_externalize_fp(), which might make the socket eligible for gc-ing,
  and then, later, taskqueue will close the socket. Since unp_gc() dropped
  the list lock to do the malloc, close might happen after the mark step
  but before the collection step, causing collection to not find the
  socket and miss one array element.

  I believe that the race was there before r216158, but the stated
  revision made the window much wider by postponing the close to
  taskqueue sometimes.

  Only process as many array elements as we find the sockets during
  second phase of gc [1]. Take linkage lock and recheck the eligibility
  of the socket for gc, as well as call fhold() under the linkage lock.

  Reported and tested by: jmallett
  Submitted by: jmallett [1]
  Reviewed by: rwatson, jeff (possibly)
  MFC after: 1 week
* Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need    mdf    2011-01-18    1    -9/+10
  to rely on the format string.
* Trim whitespaces at the end of lines. Use the commit to record    kib    2010-12-03    1    -2/+2
  proper log message for r216150.

  MFC after: 1 week

  If unix socket has a unix socket attached as the rights that has a
  unix socket attached as the rights that has a unix socket attached as
  the rights ... Kernel may overflow the stack on attempt to close such
  socket.

  Only close the rights file in the context of the current close if the
  file is not unix domain socket. Otherwise, postpone the work to
  taskqueue, preventing unlimited recursion.

  The pass of the unix domain sockets over the SCM_RIGHTS message control
  is not widely used, and more, the close of the socket with still
  attached rights is mostly an application failure. The change should not
  affect the performance of typical users of SCM_RIGHTS.

  Reviewed by: jeff, rwatson
* Reviewed by: jeff, rwatson    kib    2010-12-03    1    -5/+74
  MFC after: 1 week
* Remove spurious '/*-' marks and fix some other style problems.    trasz    2010-07-22    1    -1/+1
  Submitted by: bde@
* Revert r210225 - turns out I was wrong; the "/*-" is not license-only    trasz    2010-07-18    1    -1/+1
  thing; it's also used to indicate that the comment should not be
  automatically rewrapped.

  Explained by: cperciva@
* The "/*-" comment marker is supposed to denote copyrights. Remove non-copyright    trasz    2010-07-18    1    -1/+1
  occurrences from sys/sys/ and sys/kern/.
* Fix build on amd64, where sysctl arg1 is a pointer.    rwatson    2009-10-05    1    -1/+1
  Reported by: Mr Tinderbox
  MFC after: 3 months
* First cut at implementing SOCK_SEQPACKET support for UNIX (local) domain    rwatson    2009-10-05    1    -16/+123
  sockets. This allows for reliable bi-directional datagram communication
  over UNIX domain sockets, in contrast to SOCK_DGRAM (M:N, unreliable)
  or SOCK_STREAM (bi-directional bytestream). Largely, this reuses
  existing UNIX domain socket code. This allows applications requiring
  record-oriented semantics to do so reliably via local IPC.

  Some implementation notes (also present in XXX comments):

  - Currently we lack an sbappend variant able to do datagrams and
    control data without doing addresses, so we mark SOCK_SEQPACKET as
    PR_ADDR. Adding a new variant will solve this problem.

  - UNIX domain sockets on FreeBSD provide back-pressure/flow control
    notification for stream sockets by manipulating the send socket
    buffer's size during pru_send and pru_rcvd. This trick works less
    well for SOCK_SEQPACKET as sosend_generic() uses sb_hiwat not just
    to manage blocking, but also to determine maximum datagram size.
    Fixing this requires rethinking how back-pressure is done for
    SOCK_SEQPACKET; in the meantime, it's possible to get EMSGSIZE when
    buffers fill, instead of blocking.

  Discussed with: benl
  Reviewed by: bz, rpaulo
  MFC after: 3 months
  Sponsored by: Google
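From user space, the record-oriented behaviour this commit introduces can be observed with a connected socket pair. The sketch below is a hedged illustration rather than a test from the tree: `seqpacket_first_record()` is a hypothetical helper, and it assumes a platform where `socketpair(2)` supports `SOCK_SEQPACKET` in the `AF_UNIX` domain.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send two records over a SOCK_SEQPACKET pair, then read once into a
 * buffer large enough for both. Returns the size of the first read, or
 * -1 on error. With record boundaries preserved, a single recv() is
 * expected to return only the first record, not both concatenated.
 */
ssize_t
seqpacket_first_record(void)
{
	int sv[2];
	char buf[64];
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) == -1)
		return (-1);
	if (send(sv[0], "hello", 5, 0) != 5 ||
	    send(sv[0], "world!", 6, 0) != 6) {
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	n = recv(sv[1], buf, sizeof(buf), 0);
	assert(n <= (ssize_t)sizeof(buf));	/* recv() never overfills buf */
	close(sv[0]);
	close(sv[1]);
	return (n);
}
```

On a SOCK_STREAM pair the same read could legitimately return all 11 bytes at once, which is the contrast the commit message draws between the socket types.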
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c and    rwatson    2009-08-01    1    -1/+2
  vnet.h, we now use jails (rather than vimages) as the abstraction for
  virtualization management, and what remained was specific to virtual
  network stacks. Minor cleanups are done in the process, and comments
  updated to reflect these changes.

  Reviewed by: bz
  Approved by: re (vimage blanket)
* Remove unnecessary/redundant includes.    jamie    2009-06-23    1    -1/+0
  Approved by: bz (mentor)
* Fix a deadlock in the getpeername() method for UNIX domain sockets.    jhb    2009-06-18    1    -4/+4
  Instead of locking the local unp followed by the remote unp, use the
  same locking model as accept() and read lock the global link lock
  followed by the remote unp while fetching the remote sockaddr.

  Reported by: Mel Flynn mel.flynn of mailing.thruhere.net
  Reviewed by: rwatson
  MFC after: 1 week
* Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC    rwatson    2009-06-05    1    -1/+0
  and used in a large number of files, but also because an increasing
  number of incorrect uses of MAC calls were sneaking in due to
  copy-and-paste of MAC-aware code without the associated opt_mac.h
  include.

  Discussed with: pjd
* Add internal 'mac_policy_count' counter to the MAC Framework, which is a    rwatson    2009-06-02    1    -2/+0
  count of the number of registered policies.

  Rather than unconditionally locking sockets before passing them into
  MAC, lock them in the MAC entry points only if mac_policy_count is
  non-zero.

  This avoids locking overhead for a number of socket system calls when
  no policies are registered, eliminating measurable overhead for the MAC
  Framework for the socket subsystem when there are no active policies.

  Possibly socket locks should be acquired by policies if they are
  required for socket labels, which would further avoid locking overhead
  when there are policies but they don't require labeling of sockets, or
  possibly don't even implement socket controls.

  Obtained from: TrustedBSD Project
* Change the curvnet variable from a global const struct vnet *,    zec    2009-05-05    1    -0/+5
  previously always pointing to the default vnet context, to a
  dynamically changing thread-local one. The curvnet context should be
  set on entry to networking code via CURVNET_SET() macros, and reverted
  to previous state via CURVNET_RESTORE(). Recursions on curvnet are
  permitted, though strongly discouraged.

  This change should have no functional impact on nooptions VIMAGE
  kernel builds, where CURVNET_* macros expand to whitespace.

  The curthread->td_vnet (aka curvnet) variable's purpose is to be an
  indicator of the vnet context in which the current network-related
  operation takes place, in case we cannot deduce the current vnet
  context from any other source, such as by looking at mbuf's
  m->m_pkthdr.rcvif->if_vnet, socket's so->so_vnet etc. Moreover, so far
  curvnet has turned out to be an invaluable consistency checking aid: it
  helps to catch cases when sockets, ifnets or any other vnet-aware
  structures may have leaked from one vnet to another.

  The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was
  a result of an empirical iterative process, with an aim to reduce
  recursions on CURVNET_SET() to a minimum, while still reducing the
  scope of CURVNET_SET() to networking only operations - the alternative
  would be calling CURVNET_SET() on each system call entry. In general,
  curvnet has to be set in three typical cases: when processing
  socket-related requests from userspace or from within the kernel; when
  processing inbound traffic flowing from device drivers to upper layers
  of the networking stack, and when executing timer-driven networking
  functions.

  This change also introduces a DDB subcommand to show the list of all
  vnet instances.

  Approved by: julian (mentor)
* Remove VOP_LEASE and supporting functions. This hasn't been used since    rwatson    2009-04-10    1    -3/+1
  the removal of NQNFS, but was left in in case it was required for
  NFSv4. Since our new NFSv4 client and server can't use it for their
  requirements, GC the old mechanism, as well as other unused
  lease-related code and interfaces.

  Due to its impact on kernel programming and binary interfaces, this
  change should not be MFC'd.

  Proposed by: jeff
  Reviewed by: jeff
  Discussed with: rmacklem, zach loafman @ isilon
* Decompose the global UNIX domain sockets rwlock into two different    rwatson    2009-03-08    1    -102/+96
  locks: a global list/counter/generation counter protected by a new
  mutex unp_list_lock, and a global linkage rwlock, unp_global_rwlock,
  which protects the connections between UNIX domain sockets.

  This eliminates conditional lock acquisition that was previously a
  property of the global lock being held over sonewconn() leading to a
  call to uipc_attach(), which also required the global lock, but
  couldn't rely on it as other paths existed to uipc_attach() that didn't
  hold it: now uipc_attach() uses only the list lock, which follows the
  linkage lock in the lock order. It may also reduce contention on the
  global lock for some workloads.

  Add global UNIX domain socket locks to hard-coded witness lock order.

  MFC after: 1 week
  Discussed with: kris
* White space and comment tweaks.    rwatson    2009-01-01    1    -2/+2
  MFC after: 3 weeks
* Rename mbcnt to mbcnt_delta in uipc_send() -- unlike other local    rwatson    2008-12-30    1    -3/+3
  variables named mbcnt in uipc_usrreq.c, this instance is a delta rather
  than a cache of sb_mbcnt.

  MFC after: 3 weeks
* Retire the MALLOC and FREE macros. They are an abomination unto style(9).    des    2008-10-23    1    -1/+1
  MFC after: 3 months
* Remove stale comment: while uipc_connect2() was, until recently, not    rwatson    2008-10-11    1    -3/+0
  static so it could be used by fifofs (actually portalfs), it is now
  static.

  Submitted by: kensmith
* Remove stale comment (and XXX saying so) about why we zero the file    rwatson    2008-10-08    1    -6/+0
  descriptor pointer in unp_freerights: we can no longer recurse into
  unp_gc due to unp_gc being invoked in a deferred way, but it's still a
  good idea.

  MFC after: 3 days
* Differentiate pr_usrreqs for stream and datagram UNIX domain sockets, and    rwatson    2008-10-08    1    -4/+25
  employ soreceive_dgram for the datagram case.

  MFC after: 3 months
* Now that portalfs doesn't directly invoke uipc_connect2(), make it a    rwatson    2008-10-06    1    -1/+2
  static symbol.

  MFC after: 3 days
* Further minor cleanups to UNIX domain sockets:    rwatson    2008-10-03    1    -24/+16
  - Staticize and locally prototype functions uipc_ctloutput(),
    unp_dispose(), unp_init(), and unp_externalize(), none of which have
    been required outside of uipc_usrreq.c since uipc_proto.c was
    removed.
  - Remove stale prototype for uipc_usrreq(), which has not existed in
    the code since 1997
  - Forward declare and staticize uipc_usrreqs structure in
    uipc_usrreq.c and not un.h.
  - Comment on why uipc_connect2() is still non-static -- it is used
    directly by fifofs.
  - Remove stale comments, tidy up whitespace.

  MFC after: 3 days (where applicable)
* Remove or update several stale comments.    rwatson    2008-10-03    1    -16/+16
  A bit of whitespace/style cleanup.

  Update copyright.

  MFC after: 3 days (applicable changes)
* Fill in a few sysctl descriptions.    trhodes    2008-07-26    1    -7/+10
  Approved by: rwatson
* Use bcopy instead of strlcpy in uipc_bind and unp_connect, since    emaste    2008-07-03    1    -2/+4
  soun->sun_path isn't a null-terminated string. As UNIX(4) states, "the
  terminating NUL is not part of the address." Since strlcpy has to
  return "the total length of the string [it] tried to create," it walks
  off the end of soun->sun_path looking for a \0.

  This reverts r105332.

  Reported by: Ryan Stone
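The point about length-delimited paths can be shown with a small user-space helper. This is a sketch under stated assumptions, not the kernel code: `copy_sun_path()` is a hypothetical function, `memcpy()` stands in for the kernel's `bcopy()`, and the caller is assumed to know the path length (for example, derived from the socket address length).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a sun_path-style byte array of known length `len`, which need
 * not be NUL-terminated, into `dst` and terminate it there. Returns 0
 * on success or -1 if `dst` cannot hold the path plus the terminator.
 */
int
copy_sun_path(char *dst, size_t dstsize, const char *path, size_t len)
{
	assert(dst != NULL && path != NULL);
	if (len >= dstsize)
		return (-1);
	memcpy(dst, path, len);	/* reads exactly len bytes of the source */
	dst[len] = '\0';	/* terminator is added to the copy only */
	return (0);
}
```

A string function that scans the source for a terminator cannot be used here, because it may read past the end of the array; copying by explicit length never touches bytes beyond `len`.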
* Move unlock of global UNIX domain socket lock slightly lower in    rwatson    2008-01-18    1    -1/+1
  unp_connect(): it is expected to return with the lock held, and two
  possible error paths otherwise returned with it unlocked.

  The fix committed here is slightly different from the patch in the PR,
  but along an alternative line suggested in the PR.

  PR: 119778
  MFC after: 3 days
  Submitted by: James Juran <james dot juran at baesystems dot com>
* VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in    attilio    2008-01-13    1    -1/+1
  conjunction with 'thread' argument passing which is always curthread.
  Remove the useless extra argument and pass curthread explicitly to
  lower layer functions, when necessary.

  KPI results broken by this change, which should affect several ports,
  so version bumping and manpage update will be further committed.

  Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
* Remove "lock pushdown" todo item in comment -- I did that for 7.0.    rwatson    2008-01-10    1    -1/+0
  MFC after: 3 weeks
* Correct typos in comments.    rwatson    2008-01-10    1    -2/+2
  MFC after: 3 weeks
* - Place the fhold() in unp_internalize_fp to be more consistent with refs.    jeff    2008-01-01    1    -9/+5
  - Clear all of the gc flags before doing a run. Stale flags were
    causing us to skip some descriptors.
  - If a unp socket has been marked REF in a gc pass it can't be dead.

  Found by: rwatson's test tool.
* - Check the correct variable against NULL in two places.    jeff    2007-12-31    1    -4/+2
  - If the unp_file is NULL that means it has never been internalized
    and it must be reachable.
* Remove explicit locking of struct file.    jeff    2007-12-30    1    -237/+175
  - Introduce a finit() which is used to initialize the fields of struct
    file in such a way that the ops vector is only valid after the data,
    type, and flags are valid.
  - Protect f_flag and f_count with atomic operations.
  - Remove the global list of all files and associated accounting.
  - Rewrite the unp garbage collection such that it no longer requires
    the global list of all files and instead uses a list of all unp
    sockets.
  - Mark sockets in the accept queue so we don't incorrectly gc them.

  Tested by: kris, pho
* Merge first in a series of TrustedBSD MAC Framework KPI changes    rwatson    2007-10-24    1    -4/+4
  from Mac OS X Leopard--rationalize naming for entry points to the
  following general forms:

    mac_<object>_<method/action>
    mac_<object>_check_<method/action>

  The previous naming scheme was inconsistent and mostly reversed from
  the new scheme. Also, make object types more consistent and remove
  spaces from object types that contain multiple parts ("posix_sem" ->
  "posixsem") to make mechanical parsing easier. Introduce a new
  "netinet" object type for certain IPv4/IPv6-related methods. Also
  simplify, slightly, some entry point names.

  All MAC policy modules will need to be recompiled, and modules not
  updated as part of this commit will need to be modified to conform to
  the new KPI.

  Sponsored by: SPARTA (original patches against Mac OS X)
  Obtained from: TrustedBSD Project, Apple Computer