path: root/sys/kern/uipc_usrreq.c
Commit message [Author, Age, Files, Lines]
* Fix build on amd64, where sysctl arg1 is a pointer. [rwatson, 2009-10-05, 1 file, -1/+1]
  Reported by: Mr Tinderbox
  MFC after: 3 months
* First cut at implementing SOCK_SEQPACKET support for UNIX (local) domain [rwatson, 2009-10-05, 1 file, -16/+123]
  sockets. This allows for reliable bi-directional datagram communication over UNIX domain sockets, in contrast to SOCK_DGRAM (M:N, unreliable) or SOCK_STREAM (bi-directional bytestream). Largely, this reuses existing UNIX domain socket code. This allows applications requiring record-oriented semantics to obtain them reliably via local IPC.

  Some implementation notes (also present in XXX comments):

  - Currently we lack an sbappend variant able to do datagrams and control data without doing addresses, so we mark SOCK_SEQPACKET as PR_ADDR. Adding a new variant will solve this problem.
  - UNIX domain sockets on FreeBSD provide back-pressure/flow control notification for stream sockets by manipulating the send socket buffer's size during pru_send and pru_rcvd. This trick works less well for SOCK_SEQPACKET as sosend_generic() uses sb_hiwat not just to manage blocking, but also to determine maximum datagram size. Fixing this requires rethinking how back-pressure is done for SOCK_SEQPACKET; in the meantime, it's possible to get EMSGSIZE when buffers fill, instead of blocking.

  Discussed with: benl
  Reviewed by: bz, rpaulo
  MFC after: 3 months
  Sponsored by: Google
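  The record-oriented behaviour described in the SOCK_SEQPACKET commit can be seen from userland. The following is a minimal sketch, not code from the commit; `seqpacket_roundtrip()` is a hypothetical helper name, and it assumes a system where `socketpair(2)` accepts `SOCK_SEQPACKET` for `AF_UNIX`.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send two records over an AF_UNIX SOCK_SEQPACKET pair and read them back.
 * Returns 0 if every recv() returned exactly one record, -1 otherwise.
 * (Hypothetical helper for illustration only.)
 */
int
seqpacket_roundtrip(void)
{
	int sv[2];
	char buf[64];
	ssize_t n;
	int ok = -1;

	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) == -1)
		return (-1);

	if (send(sv[0], "first", 5, 0) != 5 ||
	    send(sv[0], "second", 6, 0) != 6)
		goto out;

	/* The buffer has room for both records, yet only one is returned. */
	n = recv(sv[1], buf, sizeof(buf), 0);
	if (n != 5 || memcmp(buf, "first", 5) != 0)
		goto out;
	n = recv(sv[1], buf, sizeof(buf), 0);
	if (n != 6 || memcmp(buf, "second", 6) != 0)
		goto out;
	ok = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return (ok);
}
```

  A caller could check the result with `assert(seqpacket_roundtrip() == 0);`. With SOCK_STREAM in place of SOCK_SEQPACKET, the first recv() would be free to return both writes merged into one buffer.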
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c and [rwatson, 2009-08-01, 1 file, -1/+2]
  vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes.

  Reviewed by: bz
  Approved by: re (vimage blanket)
* Remove unnecessary/redundant includes. [jamie, 2009-06-23, 1 file, -1/+0]
  Approved by: bz (mentor)
* Fix a deadlock in the getpeername() method for UNIX domain sockets. [jhb, 2009-06-18, 1 file, -4/+4]
  Instead of locking the local unp followed by the remote unp, use the same locking model as accept() and read lock the global link lock followed by the remote unp while fetching the remote sockaddr.

  Reported by: Mel Flynn mel.flynn of mailing.thruhere.net
  Reviewed by: rwatson
  MFC after: 1 week
* Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC [rwatson, 2009-06-05, 1 file, -1/+0]
  and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include.

  Discussed with: pjd
* Add internal 'mac_policy_count' counter to the MAC Framework, which is a [rwatson, 2009-06-02, 1 file, -2/+0]
  count of the number of registered policies.

  Rather than unconditionally locking sockets before passing them into MAC, lock them in the MAC entry points only if mac_policy_count is non-zero. This avoids locking overhead for a number of socket system calls when no policies are registered, eliminating measurable overhead for the MAC Framework for the socket subsystem when there are no active policies.

  Possibly socket locks should be acquired by policies if they are required for socket labels, which would further avoid locking overhead when there are policies but they don't require labeling of sockets, or possibly don't even implement socket controls.

  Obtained from: TrustedBSD Project
* Change the curvnet variable from a global const struct vnet *, [zec, 2009-05-05, 1 file, -0/+5]
  previously always pointing to the default vnet context, to a dynamically changing thread-local one. The curvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discouraged.

  This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_* macros expand to whitespace.

  The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, socket's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another.

  The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, with an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typical cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions.

  This change also introduces a DDB subcommand to show the list of all vnet instances.

  Approved by: julian (mentor)
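  The set/restore discipline described for curvnet can be sketched in userland C. This is only an analogy under stated assumptions: `CTX_SET()`, `CTX_RESTORE()`, `cur_ctx`, and the one-field `struct vnet` below are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

struct vnet { int id; };	/* stand-in for the kernel's struct vnet */

/* Thread-local "current context", playing the role of curvnet. */
static _Thread_local struct vnet *cur_ctx;

/* Hypothetical analogues of CURVNET_SET() / CURVNET_RESTORE(). */
#define CTX_SET(v)	struct vnet *saved_ctx = cur_ctx; cur_ctx = (v)
#define CTX_RESTORE()	cur_ctx = saved_ctx

/* Enter a context, observe it, and restore the caller's context. */
int
inner(struct vnet *v)
{
	CTX_SET(v);
	int seen = cur_ctx->id;
	CTX_RESTORE();
	return (seen);
}

/* Returns 1 if a nested set/restore leaves the outer context intact. */
int
nested_demo(void)
{
	struct vnet a = { 1 }, b = { 2 };

	CTX_SET(&a);
	int in = inner(&b);		/* recursion: temporarily switch to b */
	int back = cur_ctx->id;		/* a again once inner() has restored */
	CTX_RESTORE();
	return (in == 2 && back == 1 && cur_ctx == NULL);
}
```

  Because each scope saves the previous value in a local variable, recursion works, at the cost of an extra save and restore per nesting level, which is consistent with the commit's aim of keeping recursions to a minimum.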
* Remove VOP_LEASE and supporting functions. This hasn't been used since [rwatson, 2009-04-10, 1 file, -3/+1]
  the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease-related code and interfaces.

  Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd.

  Proposed by: jeff
  Reviewed by: jeff
  Discussed with: rmacklem, zach loafman @ isilon
* Decompose the global UNIX domain sockets rwlock into two different [rwatson, 2009-03-08, 1 file, -102/+96]
  locks: a global list/counter/generation counter protected by a new mutex unp_list_lock, and a global linkage rwlock, unp_global_rwlock, which protects the connections between UNIX domain sockets.

  This eliminates conditional lock acquisition that was previously a property of the global lock being held over sonewconn() leading to a call to uipc_attach(), which also required the global lock, but couldn't rely on it as other paths existed to uipc_attach() that didn't hold it: now uipc_attach() uses only the list lock, which follows the linkage lock in the lock order. It may also reduce contention on the global lock for some workloads.

  Add global UNIX domain socket locks to hard-coded witness lock order.

  MFC after: 1 week
  Discussed with: kris
* White space and comment tweaks. [rwatson, 2009-01-01, 1 file, -2/+2]
  MFC after: 3 weeks
* Rename mbcnt to mbcnt_delta in uipc_send() -- unlike other local [rwatson, 2008-12-30, 1 file, -3/+3]
  variables named mbcnt in uipc_usrreq.c, this instance is a delta rather than a cache of sb_mbcnt.

  MFC after: 3 weeks
* Retire the MALLOC and FREE macros. They are an abomination unto style(9). [des, 2008-10-23, 1 file, -1/+1]
  MFC after: 3 months
* Remove stale comment: while uipc_connect2() was, until recently, not [rwatson, 2008-10-11, 1 file, -3/+0]
  static so it could be used by fifofs (actually portalfs), it is now static.

  Submitted by: kensmith
* Remove stale comment (and XXX saying so) about why we zero the file [rwatson, 2008-10-08, 1 file, -6/+0]
  descriptor pointer in unp_freerights: we can no longer recurse into unp_gc due to unp_gc being invoked in a deferred way, but it's still a good idea.

  MFC after: 3 days
* Differentiate pr_usrreqs for stream and datagram UNIX domain sockets, and [rwatson, 2008-10-08, 1 file, -4/+25]
  employ soreceive_dgram for the datagram case.

  MFC after: 3 months
* Now that portalfs doesn't directly invoke uipc_connect2(), make it a [rwatson, 2008-10-06, 1 file, -1/+2]
  static symbol.

  MFC after: 3 days
* Further minor cleanups to UNIX domain sockets: [rwatson, 2008-10-03, 1 file, -24/+16]
  - Staticize and locally prototype functions uipc_ctloutput(), unp_dispose(), unp_init(), and unp_externalize(), none of which have been required outside of uipc_usrreq.c since uipc_proto.c was removed.
  - Remove stale prototype for uipc_usrreq(), which has not existed in the code since 1997.
  - Forward declare and staticize uipc_usrreqs structure in uipc_usrreq.c and not un.h.
  - Comment on why uipc_connect2() is still non-static -- it is used directly by fifofs.
  - Remove stale comments, tidy up whitespace.

  MFC after: 3 days (where applicable)
* Remove or update several stale comments. [rwatson, 2008-10-03, 1 file, -16/+16]
  A bit of whitespace/style cleanup.

  Update copyright.

  MFC after: 3 days (applicable changes)
* Fill in a few sysctl descriptions. [trhodes, 2008-07-26, 1 file, -7/+10]
  Approved by: rwatson
* Use bcopy instead of strlcpy in uipc_bind and unp_connect, since [emaste, 2008-07-03, 1 file, -2/+4]
  soun->sun_path isn't a null-terminated string. As UNIX(4) states, "the terminating NUL is not part of the address." Since strlcpy has to return "the total length of the string [it] tried to create," it walks off the end of soun->sun_path looking for a \0.

  This reverts r105332.

  Reported by: Ryan Stone
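  The length-bounded copy that this commit argues for can be illustrated in userland. The sketch below is an assumption-laden illustration rather than the kernel change itself: `copy_sun_path()` is a hypothetical helper, and it uses `memcpy()` where the commit uses `bcopy()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Copy the path portion of a sockaddr_un whose total address length is
 * `len` into `out` (capacity `outsz`) and NUL-terminate the copy.
 * The source path is never assumed to be NUL-terminated.
 * Returns the path length, or -1 if the lengths are inconsistent.
 */
int
copy_sun_path(const struct sockaddr_un *soun, socklen_t len, char *out,
    size_t outsz)
{
	size_t hdr = offsetof(struct sockaddr_un, sun_path);
	size_t pathlen;

	assert(soun != NULL && out != NULL);
	if (len < hdr || len > sizeof(*soun))
		return (-1);
	pathlen = len - hdr;
	if (pathlen >= outsz)
		return (-1);
	/* Bounded by the address length, not by a search for '\0'. */
	memcpy(out, soun->sun_path, pathlen);
	out[pathlen] = '\0';
	return ((int)pathlen);
}
```

  The design point is that the copy length comes from the address length supplied by the caller, so bytes past the end of the path are never read, whatever they contain.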
* Move unlock of global UNIX domain socket lock slightly lower in [rwatson, 2008-01-18, 1 file, -1/+1]
  unp_connect(): it is expected to return with the lock held, and two possible error paths otherwise returned with it unlocked.

  The fix committed here is slightly different from the patch in the PR, but along an alternative line suggested in the PR.

  PR: 119778
  MFC after: 3 days
  Submitted by: James Juran <james dot juran at baesystems dot com>
* VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in [attilio, 2008-01-13, 1 file, -1/+1]
  conjunction with 'thread' argument passing which is always curthread. Remove the unneeded extra argument and pass curthread explicitly to lower layer functions, when necessary.

  The KPI is broken by this change, which should affect several ports, so a version bump and manpage update will be committed separately.

  Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
* Remove "lock pushdown" todo item in comment -- I did that for 7.0. [rwatson, 2008-01-10, 1 file, -1/+0]
  MFC after: 3 weeks
* Correct typos in comments. [rwatson, 2008-01-10, 1 file, -2/+2]
  MFC after: 3 weeks
* - Place the fhold() in unp_internalize_fp to be more consistent with refs. [jeff, 2008-01-01, 1 file, -9/+5]
  - Clear all of the gc flags before doing a run. Stale flags were causing us to skip some descriptors.
  - If a unp socket has been marked REF in a gc pass it can't be dead.

  Found by: rwatson's test tool.
* - Check the correct variable against NULL in two places. [jeff, 2007-12-31, 1 file, -4/+2]
  - If the unp_file is NULL that means it has never been internalized and it must be reachable.
* Remove explicit locking of struct file. [jeff, 2007-12-30, 1 file, -237/+175]
  - Introduce a finit() which is used to initialize the fields of struct file in such a way that the ops vector is only valid after the data, type, and flags are valid.
  - Protect f_flag and f_count with atomic operations.
  - Remove the global list of all files and associated accounting.
  - Rewrite the unp garbage collection such that it no longer requires the global list of all files and instead uses a list of all unp sockets.
  - Mark sockets in the accept queue so we don't incorrectly gc them.

  Tested by: kris, pho
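  The "ops vector valid last" ordering described for finit() can be sketched with C11 atomics. This is an illustration under assumptions, not the kernel's implementation: `struct file_sketch`, `struct fileops_sketch`, `finit_sketch()`, and `file_read_sketch()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Simplified stand-ins for struct fileops and struct file. */
struct fileops_sketch {
	int (*fo_read)(void *data);
};

struct file_sketch {
	void *f_data;
	int f_type;
	int f_flag;
	_Atomic(const struct fileops_sketch *) f_ops;	/* published last */
};

/* Fill in data, flags, and type first; publish the ops vector last. */
void
finit_sketch(struct file_sketch *fp, int flag, int type, void *data,
    const struct fileops_sketch *ops)
{
	fp->f_data = data;
	fp->f_flag = flag;
	fp->f_type = type;
	/* Release store: the plain stores above are visible before f_ops. */
	atomic_store_explicit(&fp->f_ops, ops, memory_order_release);
}

/* A reader that only trusts the other fields once f_ops is non-NULL. */
int
file_read_sketch(struct file_sketch *fp)
{
	const struct fileops_sketch *ops =
	    atomic_load_explicit(&fp->f_ops, memory_order_acquire);

	if (ops == NULL)
		return (-1);	/* not initialized yet */
	return (ops->fo_read(fp->f_data));
}

/* Example read method: interpret f_data as a pointer to int. */
int
read_int(void *data)
{
	return (*(int *)data);
}
```

  A reader that sees a non-NULL ops pointer through the acquire load is guaranteed to also see the data, type, and flags written before the release store, which is the property the commit message describes.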
* Merge first in a series of TrustedBSD MAC Framework KPI changes [rwatson, 2007-10-24, 1 file, -4/+4]
  from Mac OS X Leopard--rationalize naming for entry points to the following general forms:

    mac_<object>_<method/action>
    mac_<object>_check_<method/action>

  The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names.

  All MAC policy modules will need to be recompiled, and modules not updated as part of this commit will need to be modified to conform to the new KPI.

  Sponsored by: SPARTA (original patches against Mac OS X)
  Obtained from: TrustedBSD Project, Apple Computer
* When we do open, we should lock the vnode exclusively. This fixes a few races: [pjd, 2007-07-26, 1 file, -1/+1]
  - fifo race, where two threads assign v_fifoinfo,
  - v_writecount modifications,
  - v_object modifications,
  - and probably more...

  Discussed with: kib, ups
  Approved by: re (rwatson)
* Add DDB "show unpcb" command, allowing DDB to print out many pertinent [rwatson, 2007-05-29, 1 file, -0/+121]
  details from UNIX domain socket protocol layer state.
* Remove one more stale comment regarding unpcb type-safety. [rwatson, 2007-05-11, 1 file, -4/+0]
* Clarify and update quite a few comments to reflect locking optimizations, [rwatson, 2007-05-11, 1 file, -38/+21]
  the addition of unpcb refcounts, and bug fixes. Some of these fixes are appropriate for MFC.

  MFC after: 3 days
* Don't acquire Giant unconditionally. [wkoszek, 2007-05-06, 1 file, -14/+20]
  Reviewed by: rwatson
* Replace custom file descriptor array sleep lock constructed using a mutex [rwatson, 2007-04-04, 1 file, -9/+10]
  and flags with an sxlock. This leads to a significant and measurable performance improvement as a result of access to shared locking for frequent lookup operations, reduced general overhead, and reduced overhead in the event of contention. All of these are important for threaded applications where simultaneous access to a shared file descriptor array occurs frequently. Kris has reported 2x-4x transaction rate improvements on 8-core MySQL benchmarks; smaller improvements can be expected for many workloads as a result of reduced overhead.

  - Generally eliminate the distinction between "fast" and regular acquisition of the filedesc lock; the plan is that they will now all be fast. Change all locking instances to either shared or exclusive locks.
  - Correct a bug (pointed out by kib) in fdfree() where previously msleep() was called without the mutex held; sx_sleep() is now always called with the sxlock held exclusively.
  - Universally hold the struct file lock over changes to struct file, rather than the filedesc lock or no lock. Always update the f_ops field last. A further memory barrier is required here in the future (discussed with jhb).
  - Improve locking and reference management in linux_at(), which fails to properly acquire vnode references before using vnode pointers. Annotate improper use of vn_fullpath(), which will be replaced at a future date.

  In fcntl(), we conservatively acquire an exclusive lock, even though in some cases a shared lock may be sufficient, which should be revisited. The dropping of the filedesc lock in fdgrowtable() is no longer required as the sxlock can be held over the sleep operation; we should consider removing that (pointed out by attilio).

  Tested by: kris
  Discussed with: jhb, kris, attilio, jeff
* In uipc_close(), we no longer always free the unpcb, as the last reference [rwatson, 2007-03-12, 1 file, -1/+2]
  may be dropped later. In this case, always unlock the unpcb so as not to leak the lock.

  Found by: kris (BugMagnet)
* Remove two simultaneous acquisitions of multiple unpcb locks from [rwatson, 2007-03-01, 1 file, -22/+19]
  uipc_send in cases where only a global read lock is held by breaking them out and avoiding the unpcb lock acquire in the common case. This avoids deadlocks which manifested with X11, and should also marginally further improve performance.

  Reported by: sepotvin, brooks
* Lock unp2 after checking for a non-NULL unp2 pointer in uipc_send() on [rwatson, 2007-02-28, 1 file, -1/+1]
  datagram UNIX domain sockets, not before.
* Revise locking strategy used for UNIX domain sockets in order to improve [rwatson, 2007-02-26, 1 file, -223/+469]
  concurrency:

  - Add per-unpcb mutexes protecting unpcb connection state, fields, etc.
  - Replace global UNP mutex with a global UNP rwlock, which will protect the UNIX domain socket connection topology, v_socket, and be acquired exclusively before acquiring more than per-unpcb at a time in order to avoid lock order issues.

  In performance measurements involving MySQL, this change has little or no overhead on UP (+/- 1%), but leads to a significant (5%-30%) improvement in multi-processor measurements using the sysbench and supersmack benchmarks.

  Much testing by: kris
  Approved by: re (kensmith)
* Add an additional MAC check to the UNIX domain socket connect path: [rwatson, 2007-02-22, 1 file, -0/+5]
  check that the subject has read/write access to the vnode using the vnode MAC check.

  MFC after: 3 weeks
  Submitted by: Spencer Minear <spencer_minear at securecomputing dot com>
  Obtained from: TrustedBSD Project
* Break introductory comment into two paragraphs to separate material on the [rwatson, 2007-02-20, 1 file, -12/+9]
  garbage collection complications from general discussion of UNIX domain sockets.

  Staticize unp_addsockcred().

  Remove XXX comment regarding Giant and v_socket -- v_socket is protected by the global UNIX domain socket lock.
* Minor rearrangement of global variables, comments, etc, in UNIX domain [rwatson, 2007-02-14, 1 file, -37/+34]
  sockets.
* Change unp_mtx to supporting recursion, and do not drop the unp_mtx over [rwatson, 2007-02-14, 1 file, -13/+5]
  sonewconn() in unp_connect(). This avoids a race that occurs due to v_socket being an uncounted reference, as the lock was being released in order to call sonewconn(), which otherwise recurses into the UNIX domain socket code via pru_attach, as well as holding the lock over a sleeping memory allocation in uipc_attach(). Switch to a non-sleeping memory allocation during UNIX domain socket attach.

  This fix is non-ideal in that it requires enabling recursion, but is a much smaller change than moving to using true references for v_socket.

  The reported panic occurs in unp_connect() following the return of sonewconn().

  Update copyright year.

  Panic reported by: jhb
* Set UNP_CONNECTING when committing to moving ahead in unp_connect(). [rwatson, 2007-02-13, 1 file, -0/+1]
  This logic was lost when merging the remainder of these changes in 1.178.
* Push UNIX domain socket locking further into uipc_ctloutput() in order to [rwatson, 2007-02-06, 1 file, -2/+6]
  avoid holding the UNIX domain socket subsystem lock over sooptcopyin() and sooptcopyout(). This problem was introduced when LOCAL_CREDS and LOCAL_CONNWAIT support were added.

  Reviewed by: mdodd
* Canonicalize copyrights in some files I hold copyrights on: [rwatson, 2007-01-08, 1 file, -1/+1]
  - Sort by date in license blocks, oldest copyright first.
  - All rights reserved after all copyrights, not just the first.
  - Use (c) to be consistent with other entries.

  MFC after: 3 days
* - Close a race between enumerating UNIX domain socket pcb structures via [jhb, 2007-01-05, 1 file, -15/+50]
  sysctl and socket teardown by adding a reference count to the UNIX domain pcb object and fixing the sysctl that enumerates unpcbs to grab a reference on each unpcb while it builds the list to copy out to userland.
  - Close a race between UNIX domain pcb garbage collection (unp_gc()) and file descriptor teardown (fdrop()) by adding a new garbage collection flag FWAIT. unp_gc() sets FWAIT while it walks the message buffers in a UNIX domain socket looking for nested file descriptor references and clears the flag when it is finished. fdrop() checks to see if the flag is set on a file descriptor whose refcount just dropped to 0 and waits for unp_gc() to clear the flag before completely destroying the file descriptor.

  MFC after: 1 week
  Reviewed by: rwatson
  Submitted by: ups
  Hopefully makes the panics go away: mx1
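  The first fix in the commit above relies on a common pattern: an enumerator takes its own reference on each object, so teardown by the owner cannot free the object underneath it. A minimal userland sketch follows, assuming a pthread mutex in place of the kernel locks; `struct pcb`, `pcb_alloc()`, `pcb_hold()`, and `pcb_rele()` are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Stand-in for a protocol control block with a reference count. */
struct pcb {
	int id;
	unsigned refcount;	/* protected by pcb_lock in this sketch */
};

static pthread_mutex_t pcb_lock = PTHREAD_MUTEX_INITIALIZER;

/* Allocate a pcb holding one reference, owned by the "socket". */
struct pcb *
pcb_alloc(int id)
{
	struct pcb *p = malloc(sizeof(*p));

	if (p != NULL) {
		p->id = id;
		p->refcount = 1;
	}
	return (p);
}

/* Take an extra reference, e.g. while building a list to copy out. */
void
pcb_hold(struct pcb *p)
{
	pthread_mutex_lock(&pcb_lock);
	p->refcount++;
	pthread_mutex_unlock(&pcb_lock);
}

/* Drop a reference; free on the last one. Returns 1 if freed. */
int
pcb_rele(struct pcb *p)
{
	int last;

	pthread_mutex_lock(&pcb_lock);
	assert(p->refcount > 0);
	last = (--p->refcount == 0);
	pthread_mutex_unlock(&pcb_lock);
	if (last)
		free(p);
	return (last);
}
```

  In this sketch, if the enumerator calls pcb_hold() before the owning socket calls pcb_rele(), the owner's release returns 0 and the object stays valid until the enumerator drops its own reference.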
* Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h [rwatson, 2006-10-22, 1 file, -1/+2]
  begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead.

  This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd.

  Obtained from: TrustedBSD Project
  Sponsored by: SPARTA
* Minor white space tweaks. [rwatson, 2006-08-13, 1 file, -4/+2]
* Move definition of UNIX domain socket protosw and domain entries from [rwatson, 2006-08-07, 1 file, -3/+35]
  uipc_proto.c to uipc_usrreq.c, making localdomain static. Remove uipc_proto.c as it's no longer used. With this change, UNIX domain sockets are entirely encapsulated in uipc_usrreq.c.