summaryrefslogtreecommitdiffstats
path: root/sys/kern/uipc_socket.c
Commit message (Collapse)AuthorAgeFilesLines
* Modify label allocation semantics for sockets: pass in soalloc's mallocrwatson2002-10-051-3/+11
| | | | | | | | | | | flags so that we can call malloc with M_NOWAIT if necessary, avoiding potential sleeps while holding mutexes in the TCP syncache code. Similar to the existing support for mbuf label allocation: if we can't allocate all the necessary label store in each policy, we back out the label allocation and fail the socket creation. Sync from MAC tree. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
* Make similar changes to fo_stat() and fo_poll() as made earlier torwatson2002-08-161-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fo_read() and fo_write(): explicitly use the cred argument to fo_poll() as "active_cred" using the passed file descriptor's f_cred reference to provide access to the file credential. Add an active_cred argument to fo_stat() so that implementers have access to the active credential as well as the file credential. Generally modify callers of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which was redundantly provided via the fp argument. This set of modifications also permits threads to perform these operations on behalf of another thread without modifying their credential. Trickle this change down into fo_stat/poll() implementations: - badfo_poll(), badfo_stat(): modify/add arguments. - kqueue_poll(), kqueue_stat(): modify arguments. - pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to MAC checks rather than td->td_ucred. - soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather than cred to pru_sopoll() to maintain current semantics. - sopoll(): moidfy arguments. - vn_poll(), vn_statfile(): modify/add arguments, pass new arguments to vn_stat(). Pass active_cred to MAC and fp->f_cred to VOP_POLL() to maintian current semantics. - vn_close(): rename cred to file_cred to reflect reality while I'm here. - vn_stat(): Add active_cred and file_cred arguments to vn_stat() and consumers so that this distinction is maintained at the VFS as well as 'struct file' layer. Pass active_cred instead of td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics. - fifofs: modify the creation of a "filetemp" so that the file credential is properly initialized and can be used in the socket code if desired. Pass ap->a_td->td_ucred as the active credential to soo_poll(). If we teach the vnop interface about the distinction between file and active credentials, we would use the active credential here. Note that current inconsistent passing of active_cred vs. file_cred to VOP's is maintained. It's not clear why GETATTR would be authorized using active_cred while POLL would be authorized using file_cred at the file system level. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Use the credential authorizing the socket creation operation to performrwatson2002-08-121-2/+2
| | | | | | | | | | | the jail check and the MAC socket labeling in socreate(). This handles socket creation using a cached credential better (such as in the NFS client code when rebuilding a socket following a disconnect: the new socket should be created using the nfsmount cached cred, not the cred of the thread causing the socket to be rebuilt). Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Include file cleanup; mac.h and malloc.h at one point had orderingrwatson2002-08-011-1/+1
| | | | | | relationship requirements, and no longer do. Reminded by: bde
* Introduce support for Mandatory Access Control and extensiblerwatson2002-08-011-1/+42
| | | | | | | | | | | kernel access control. Implement two IOCTLs at the socket level to retrieve the primary and peer labels from a socket. Note that this user process interface will be changing to improve multi-policy support. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Introduce support for Mandatory Access Control and extensiblerwatson2002-07-311-0/+11
| | | | | | | | | | | | | | | kernel access control. Invoke the necessary MAC entry points to maintain labels on sockets. In particular, invoke entry points during socket allocation and destruction, as well as creation by a process or during an accept-scenario (sonewconn). For UNIX domain sockets, also assign a peer label. As the socket code isn't locked down yet, locking interactions are not yet clear. Various protocol stack socket operations (such as peer label assignment for IPv4) will follow. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Catch up to rev 1.87 of sys/sys/socketvar.h (sb_cc changed from u_longmike2002-07-241-1/+1
| | | | | | to u_int). Noticed by: sparc64 tinderbox
* More caddr_t removal.alfred2002-06-291-2/+2
| | | | Change struct knote's kn_hook from caddr_t to void *.
* At long last, commit the zero copy sockets code.ken2002-06-261-0/+102
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes. ti.4: Update the ti(4) man page to include information on the TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options, and also include information about the new character device interface and the associated ioctls. man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated links. jumbo.9: New man page describing the jumbo buffer allocator interface and operation. zero_copy.9: New man page describing the general characteristics of the zero copy send and receive code, and what an application author should do to take advantage of the zero copy functionality. NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS, TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT. conf/files: Add uipc_jumbo.c and uipc_cow.c. conf/options: Add the 5 options mentioned above. kern_subr.c: Receive side zero copy implementation. This takes "disposable" pages attached to an mbuf, gives them to a user process, and then recycles the user's page. This is only active when ZERO_COPY_SOCKETS is turned on and the kern.ipc.zero_copy.receive sysctl variable is set to 1. uipc_cow.c: Send side zero copy functions. Takes a page written by the user and maps it copy on write and assigns it kernel virtual address space. Removes copy on write mapping once the buffer has been freed by the network stack. uipc_jumbo.c: Jumbo disposable page allocator code. This allocates (optionally) disposable pages for network drivers that want to give the user the option of doing zero copy receive. uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are enabled if ZERO_COPY_SOCKETS is turned on. Add zero copy send support to sosend() -- pages get mapped into the kernel instead of getting copied if they meet size and alignment restrictions. uipc_syscalls.c:Un-staticize some of the sf* functions so that they can be used elsewhere. (uipc_cow.c) if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid calling malloc() with M_WAITOK. Return an error if the M_NOWAIT malloc fails. The ti(4) driver and the wi(4) driver, at least, call this with a mutex held. This causes witness warnings for 'ifconfig -a' with a wi(4) or ti(4) board in the system. (I've only verified for ti(4)). ip_output.c: Fragment large datagrams so that each segment contains a multiple of PAGE_SIZE amount of data plus headers. This allows the receiver to potentially do page flipping on receives. if_ti.c: Add zero copy receive support to the ti(4) driver. If TI_PRIVATE_JUMBOS is not defined, it now uses the jumbo(9) buffer allocator for jumbo receive buffers. Add a new character device interface for the ti(4) driver for the new debugging interface. This allows (a patched version of) gdb to talk to the Tigon board and debug the firmware. There are also a few additional debugging ioctls available through this interface. Add header splitting support to the ti(4) driver. Tweak some of the default interrupt coalescing parameters to more useful defaults. Add hooks for supporting transmit flow control, but leave it turned off with a comment describing why it is turned off. if_tireg.h: Change the firmware rev to 12.4.11, since we're really at 12.4.11 plus fixes from 12.4.13. Add defines needed for debugging. Remove the ti_stats structure, it is now defined in sys/tiio.h. ti_fw.h: 12.4.11 firmware. ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13, and my header splitting patches. Revision 12.4.13 doesn't handle 10/100 negotiation properly. (This firmware is the same as what was in the tree previously, with the addition of header splitting support.) sys/jumbo.h: Jumbo buffer allocator interface. sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to indicate that the payload buffer can be thrown away / flipped to a userland process. socketvar.h: Add prototype for socow_setup. tiio.h: ioctl interface to the character portion of the ti(4) driver, plus associated structure/type definitions. uio.h: Change prototype for uiomoveco() so that we'll know whether the source page is disposable. ufs_readwrite.c:Update for new prototype of uiomoveco(). vm_fault.c: In vm_fault(), check to see whether we need to do a page based copy on write fault. vm_object.c: Add a new function, vm_object_allocate_wait(). This does the same thing that vm_object allocate does, except that it gives the caller the opportunity to specify whether it should wait on the uma_zalloc() of the object structre. This allows vm objects to be allocated while holding a mutex. (Without generating WITNESS warnings.) vm_object_allocate() is implemented as a call to vm_object_allocate_wait() with the malloc flag set to M_WAITOK. vm_object.h: Add prototype for vm_object_allocate_wait(). vm_page.c: Add page-based copy on write setup, clear and fault routines. vm_page.h: Add page based COW function prototypes and variable in the vm_page structure. Many thanks to Drew Gallatin, who wrote the zero copy send and receive code, and to all the other folks who have tested and reviewed this code over the years.
* Implement SO_NOSIGPIPE option for sockets. This allows one to request thatalfred2002-06-201-0/+2
| | | | | | | an EPIPE error return not generate SIGPIPE on sockets. Submitted by: lioux Inspired by: Darwin
* Back out my lats commit of locking down a socket, it conflicts with hsu's work.tanimura2002-05-311-177/+31
| | | | Requested by: hsu
* - td will never be NULL, so the call to soalloc() in socreate() will alwaysarr2002-05-211-2/+2
| | | | | be passed a 1; we can, however, use M_NOWAIT to indicate this. - Check so against NULL since it's a pointer to a structure.
* - OR the flag variable with M_ZERO so that the uma_zalloc() handles thearr2002-05-211-2/+1
| | | | | zero'ing out of the allocated memory. Also removed the logical bzero that followed.
* Lock down a socket, milestone 1.tanimura2002-05-201-31/+177
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a socket buffer. The mutex in the receive buffer also protects the data in struct socket. o Determine the lock strategy for each members in struct socket. o Lock down the following members: - so_count - so_options - so_linger - so_state o Remove *_locked() socket APIs. Make the following socket APIs touching the members above now require a locked socket: - sodisconnect() - soisconnected() - soisconnecting() - soisdisconnected() - soisdisconnecting() - sofree() - soref() - sorele() - sorwakeup() - sotryfree() - sowakeup() - sowwakeup() Reviewed by: alfred
* Make funsetown() take a 'struct sigio **' so that the locking canalfred2002-05-061-1/+1
| | | | | | | | | | | | | | | | be done internally. Ensure that no one can fsetown() to a dying process/pgrp. We need to check the process for P_WEXIT to see if it's exiting. Process groups are already safe because there is no such thing as a pgrp zombie, therefore the proctree lock completely protects the pgrp from having sigio structures associated with it after it runs funsetownlst. Add sigio lock to witness list under proctree and allproc, but over proc and pgrp. Seigo Tanimura helped with this.
* Redo the sigio locking.alfred2002-05-011-1/+1
| | | | | | | | | | | Turn the sigio sx into a mutex. Sigio lock is really only needed to protect interrupts from dereferencing the sigio pointer in an object when the sigio itself is being destroyed. In order to do this in the most unintrusive manner change pgsigio's sigio * argument into a **, that way we can lock internally to the function.
* Make sure that sockets undergoing accept filtering are aborted in asilby2002-04-261-2/+1
| | | | | | | | | LRU fashion when the listen queue fills up. Previously, there was no mechanism to kick out old sockets, leading to an easy DoS of daemons using accept filtering. Reviewed by: alfred MFC after: 3 days
* There's only one socket zone so we don't need to remember ithsu2002-04-081-2/+1
| | | | in every socket structure.
* UMA permited us to utilize the 'waitok' flag to soalloc.jeff2002-03-201-2/+7
|
* Remove references to vm_zone.h and switch over to the new uma API.jeff2002-03-201-4/+4
| | | | | Also, remove maxsockets. If you look carefully you'll notice that the old zone allocator never honored this anyway.
* This is the first part of the new kernel memory allocator. This replacesjeff2002-03-191-1/+1
| | | | | | malloc(9) and vm_zone with a slab like allocator. Reviewed by: arch@
* In sosend(), enforce the socket buffer limits regardless of whetheriedowse2002-02-281-1/+1
| | | | | | the data was supplied as a uio or an mbuf. Previously the limit was ignored for mbuf data, and NFS could run the kernel out of mbufs when an ipfw rule blocked retransmissions.
* Simple p_ucred -> td_ucred changes to start using the per-thread ucredjhb2002-02-271-1/+1
| | | | reference.
* Get rid of the twisted MFREE() macro entirely.dillon2002-02-051-2/+2
| | | | | Reviewed by: dg, bmilekic MFC after: 3 days
* Fix select on fifos.alfred2002-01-141-1/+8
| | | | | | | | | | | | | Backout revision 1.56 and 1.57 of fifo_vnops.c. Introduce a new poll op "POLLINIGNEOF" that can be used to ignore EOF on a fifo, POLLIN/POLLRDNORM is converted to POLLINIGNEOF within the FIFO implementation to effect the correct behavior. This should allow one to view a fifo pretty much as a data source rather than worry about connections coming and going. Reviewed by: bde
* o Make the credential used by socreate() an explicit argument torwatson2001-12-311-2/+3
| | | | | | | | | | | | | | socreate(), rather than getting it implicitly from the thread argument. o Make NFS cache the credential provided at mount-time, and use the cached credential (nfsmount->nm_cred) when making calls to socreate() on initially connecting, or reconnecting the socket. This fixes bugs involving NFS over TCP and ipfw uid/gid rules, as well as bugs involving NFS and mandatory access control implementations. Reviewed by: freebsd-arch
* Give struct socket structures a ref counting interface similar todillon2001-11-171-6/+27
| | | | | | | vnodes. This will hopefully serve as a base from which we can expand the MP code. We currently do not attempt to obtain any mutex or SX locks, but the door is open to add them when we nail down exactly how that part of it is going to work.
* Remove EOL whitespace.keramida2001-11-121-8/+8
| | | | Reviewed by: alfred
* Make KASSERT's print the values that triggered a panic.keramida2001-11-121-2/+3
| | | | Reviewed by: alfred
* Change the kernel's ucred API as follows:jhb2001-10-111-2/+1
| | | | | | | | - crhold() returns a reference to the ucred whose refcount it bumps. - crcopy() now simply copies the credentials from one credential to another and has no return value. - a new crshared() primitive is added which returns true if a ucred's refcount is > 1 and false (0) otherwise.
* - Combine kern.ps_showallprocs and kern.ipc.showallsockets intorwatson2001-10-091-19/+0
| | | | | | | | | | | | | | | | | | | | | | | a single kern.security.seeotheruids_permitted, describes as: "Unprivileged processes may see subjects/objects with different real uid" NOTE: kern.ps_showallprocs exists in -STABLE, and therefore there is an API change. kern.ipc.showallsockets does not. - Check kern.security.seeotheruids_permitted in cr_cansee(). - Replace visibility calls to socheckuid() with cr_cansee() (retain the change to socheckuid() in ipfw, where it is used for rule-matching). - Remove prison_unpcb() and make use of cr_cansee() against the UNIX domain socket credential instead of comparing root vnodes for the UDS and the process. This allows multiple jails to share the same chroot() and not see each others UNIX domain sockets. - Remove unused socheckproc(). Now that cr_cansee() is used universally for socket visibility, a variety of policies are more consistently enforced, including uid-based restrictions and jail-based restrictions. This also better-supports the introduction of additional MAC models. Reviewed by: ps, billf Obtained from: TrustedBSD Project
* Only allow users to see their own socket connections ifps2001-10-051-0/+30
| | | | | | | | | kern.ipc.showallsockets is set to 0. Submitted by: billf (with modifications by me) Inspired by: Dave McKay (aka pm aka Packet Magnet) Reviewed by: peter MFC after: 2 weeks
* Hopefully improve control message passing over Unix domain sockets.dwmalone2001-10-041-14/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1) Allow the sending of more than one control message at a time over a unix domain socket. This should cover the PR 29499. 2) This requires that unp_{ex,in}ternalize and unp_scan understand mbufs with more than one control message at a time. 3) Internalize and externalize used to work on the mbuf in-place. This made life quite complicated and the code for sizeof(int) < sizeof(file *) could end up doing the wrong thing. The patch always create a new mbuf/cluster now. This resulted in the change of the prototype for the domain externalise function. 4) You can now send SCM_TIMESTAMP messages. 5) Always use CMSG_DATA(cm) to determine the start where the data in unp_{ex,in}ternalize. It was using ((struct cmsghdr *)cm + 1) in some places, which gives the wrong alignment on the alpha. (NetBSD made this fix some time ago). This results in an ABI change for discriptor passing and creds passing on the alpha. (Probably on the IA64 and Spare ports too). 6) Fix userland programs to use CMSG_* macros too. 7) Be more careful about freeing mbufs containing (file *)s. This is made possible by the prototype change of externalise. PR: 29499 MFC after: 6 weeks
* Use the passed in thread to selrecord() instead of curthread.jhb2001-09-211-2/+2
|
* KSE Milestone 2julian2001-09-121-34/+34
| | | | | | | | | | | | | | Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
* Undo part of the tangle of having sys/lock.h and sys/mutex.h included inmarkm2001-05-011-0/+3
| | | | | | | | | | | other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
* Actually show the values that tripped the assertion "receive 1"alfred2001-04-271-1/+3
|
* When doing a recv(.. MSG_WAITALL) for a message which is larger thanjlemon2001-03-161-0/+6
| | | | | | | | | | | the socket buffer size, the receive is done in sections. After completing a read, call pru_rcvd on the underlying protocol before blocking again. This allows the the protocol to take appropriate action, such as sending a TCP window update to the peer, if the window happened to close because the socket buffer was filled. If the protocol is not notified, a TCP transfer may stall until the remote end sends a window probe.
* Push the test for a disconnected socket when accept()ing down to thejlemon2001-03-091-4/+1
| | | | | protocol layer. Not all protocols behave identically. This fixes the brokenness observed with unix-domain sockets (and postfix)
* In soshutdown(), use SHUT_{RD,WR,RDWR} instead of FREAD and FWRITE.ru2001-02-271-3/+5
| | | | Also, return EINVAL if `how' is invalid, as required by POSIX spec.
* Introduce a NOTE_LOWAT flag for use with the read/write filters, whichjlemon2001-02-241-0/+4
| | | | | | | | | | allow the watermark to be passed in via the data field during the EV_ADD operation. Hook this up to the socket read/write filters; if specified, it overrides the so_{rcv|snd}.sb_lowat values in the filter. Inspired by: "Ronald F. Guilmette" <rfg@monkeys.com>
* When returning EV_EOF for the socket read/write filters, also returnjlemon2001-02-241-0/+2
| | | | | | | the current socket error in fflags. This may be useful for determining why a connect() request fails. Inspired by: "Jonathan Graehl" <jonathan@graehl.org>
* o Move per-process jail pointer (p->pr_prison) to inside of the subjectrwatson2001-02-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | credential structure, ucred (cr->cr_prison). o Allow jail inheritence to be a function of credential inheritence. o Abstract prison structure reference counting behind pr_hold() and pr_free(), invoked by the similarly named credential reference management functions, removing this code from per-ABI fork/exit code. o Modify various jail() functions to use struct ucred arguments instead of struct proc arguments. o Introduce jailed() function to determine if a credential is jailed, rather than directly checking pointers all over the place. o Convert PRISON_CHECK() macro to prison_check() function. o Move jail() function prototypes to jail.h. o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the flag in the process flags field itself. o Eliminate that "const" qualifier from suser/p_can/etc to reflect mutex use. Notes: o Some further cleanup of the linux/jail code is still required. o It's now possible to consider resolving some of the process vs credential based permission checking confusion in the socket code. o Mutex protection of struct prison is still not present, and is required to protect the reference count plus some fields in the structure. Reviewed by: freebsd-arch Obtained from: TrustedBSD Project
* Extend kqueue down to the device layer.jlemon2001-02-151-27/+28
| | | | Backwards compatible approach suggested by: peter
* Return ECONNABORTED from accept if connection is closed while on thejlemon2001-02-141-5/+2
| | | | | | | listen queue, as well as the current behavior of a zero-length sockaddr. Obtained from: KAME Reviewed by: -net
* First step towards an MP-safe zone allocator:des2001-01-211-2/+2
| | | | | | | - have zalloc() and zfree() always lock the vm_zone. - remove zalloci() and zfreei(), which are now redundant. Reviewed by: bmilekic, jasone
* * Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.bmilekic2000-12-211-9/+9
| | | | | | | | | | | | | | | | | | This is because calls with M_WAIT (now M_TRYWAIT) may not wait forever when nothing is available for allocation, and may end up returning NULL. Hopefully we now communicate more of the right thing to developers and make it very clear that it's necessary to check whether calls with M_(TRY)WAIT also resulted in a failed allocation. M_TRYWAIT basically means "try harder, block if necessary, but don't necessarily wait forever." The time spent blocking is tunable with the kern.ipc.mbuf_wait sysctl. M_WAIT is now deprecated but still defined for the next little while. * Fix a typo in a comment in mbuf.h * Fix some code that was actually passing the mbuf subsystem's M_WAIT to malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the value of the M_WAIT flag, this could have became a big problem.
* Convert more malloc+bzero to malloc+M_ZERO.dwmalone2000-12-081-4/+2
| | | | | Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>
* Accept filters broke kernels compiled without options INET.alfred2000-11-201-6/+19
| | | | | | | Make accept filters conditional on INET support to fix. Pointed out by: bde Tested and assisted by: Stephen J. Kiernan <sab@vegamuse.org>
* Check so_error in filt_so{read|write} in order to detect UDP errors.jlemon2000-09-281-0/+4
| | | | PR: 21601
OpenPOWER on IntegriCloud