path: root/sys/kern/uipc_syscalls.c
Commit message | Author | Age | Files | Lines
* Fixed a bug in sendfile(2) where the sent data would be corrupted due | dg | 2003-12-01 | 1 | -0/+5
  to sendfile(2) being erroneously automatically restarted after a signal
  is delivered. Fixed by converting ERESTART to EINTR prior to exiting.
  Updated manual page to indicate the potential EINTR error, its cause
  and consequences.
  Approved by: re@freebsd.org
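The ERESTART-to-EINTR conversion described in the entry above can be sketched as follows. This is a minimal illustration, not the committed diff: the function name and the `SKETCH_ERESTART` constant are hypothetical stand-ins, since the kernel's ERESTART value is not visible to portable userland code.

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel-internal "restart the syscall" code. */
#define SKETCH_ERESTART (-1)

/*
 * Map the internal restart code to EINTR on the way out, so that a
 * partially completed transfer is reported to the caller instead of being
 * silently re-issued from the beginning. All other codes pass through.
 */
static int
sketch_sendfile_exit_error(int error)
{
	if (error == SKETCH_ERESTART)
		return (EINTR);
	return (error);
}
```

With this shape, the caller sees EINTR and can decide how much of the transfer to retry.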
* - Modify alpha's sf_buf implementation to use the direct | alc | 2003-11-16 | 1 | -3/+4
    virtual-to-physical mapping.
  - Move the sf_buf API to its own header file; make struct sf_buf's
    definition machine dependent. In this commit, we remove an
    unnecessary field from struct sf_buf on the alpha, amd64, and ia64.
    Ultimately, we may eliminate struct sf_buf on those architectures
    except as an opaque pointer that references a vm page.
* falloc allocates a file structure and adds it to the file descriptor | dwmalone | 2003-10-19 | 1 | -4/+3
  table, acquiring the necessary locks as it works. It usually returns
  two references to the new descriptor: one in the descriptor table
  and one via a pointer argument.

  As falloc releases the FILEDESC lock before returning, there is a
  potential for a process to close the reference in the file descriptor
  table before falloc's caller gets to use the file. I don't think this
  can happen in practice at the moment, because Giant indirectly
  protects closes.

  To stop the file being completely closed in this situation, this change
  makes falloc set the refcount to two when both references are
  returned. This makes life easier for several of falloc's callers,
  because the first thing they previously did was grab an extra
  reference on the file.

  Reviewed by: iedowse
  Idea run past: jhb
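The reference-counting rule in the falloc entry above can be modelled with a toy structure. This is a sketch under stated assumptions: `toy_file`, `toy_falloc()` and `toy_fdrop()` are hypothetical names, and only the reference count is modelled, not the descriptor table or any locking.

```c
#include <stdlib.h>

/* Toy stand-in for struct file; only the reference count is modelled. */
struct toy_file {
	int refcount;
};

/*
 * Allocate a file. When the caller also wants a pointer back
 * (want_ptr != 0), start with two references: one owned by the descriptor
 * table and one owned by the caller.
 */
static struct toy_file *
toy_falloc(int want_ptr)
{
	struct toy_file *fp = malloc(sizeof(*fp));

	if (fp == NULL)
		return (NULL);
	fp->refcount = want_ptr ? 2 : 1;
	return (fp);
}

/* Drop one reference; free on the last one. Returns 1 if the file was freed. */
static int
toy_fdrop(struct toy_file *fp)
{
	if (--fp->refcount > 0)
		return (0);
	free(fp);
	return (1);
}
```

If another thread closes the descriptor right after allocation, the first drop leaves the count at one, so the pointer held by the allocating caller remains valid until that caller drops its own reference.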
* Migrate the sf_buf allocator that is used by sendfile(2) and zero-copy | alc | 2003-08-29 | 1 | -99/+0
  sockets into machine-dependent files. The rationale for this
  migration is illustrated by the modified amd64 allocator. It uses the
  amd64's direct map to avoid ephemeral mappings in the kernel's
  address space. On an SMP, the ephemeral mappings result in an IPI
  for TLB shootdown for each transmitted page. Yuck.

  Maintainers of other 64-bit platforms with direct maps should be able
  to use the amd64 allocator as a reference implementation.
* Drop Giant in recvit before returning an error to the caller to avoid | kan | 2003-08-11 | 1 | -1/+4
  leaking the Giant on the syscall exit.
* If connect(2) has been interrupted by a signal and therefore the | yar | 2003-08-06 | 1 | -3/+8
  connection is to be established asynchronously, behave as in the
  case of non-blocking mode:
  - keep the SS_ISCONNECTING bit set thus indicating that the connection
    establishment is in progress, which is the case (clearing the bit in
    this case was just a bug);
  - return EALREADY, instead of the confusing and unreasonable EADDRINUSE,
    upon further connect(2) attempts on this socket until the connection
    is established (this also brings our connect(2) into accord with
    IEEE Std 1003.1.)
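From the caller's side, the behaviour in the connect(2) entry above means that EINTR, like EINPROGRESS and EALREADY, signals a connection still being established rather than a failure. A minimal sketch of that classification follows; the helper name `connect_in_progress()` is hypothetical and not part of any system API.

```c
#include <errno.h>

/*
 * Return 1 if the errno left by a failed connect(2) means "establishment
 * is still in progress": the caller should wait for the socket to become
 * writable and then read SO_ERROR, not call connect(2) again expecting a
 * fresh attempt. Return 0 for every other error.
 */
static int
connect_in_progress(int err)
{
	return (err == EINTR || err == EINPROGRESS || err == EALREADY);
}
```

An application would typically follow a positive result with poll(2) for writability and a getsockopt(2) call for SO_ERROR to learn the final outcome.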
* Do some minor Giant pushdown made possible by copyin, fget, fdrop, | dwmalone | 2003-08-04 | 1 | -13/+8
  malloc and mbuf allocation all not requiring Giant.
  1) ostat, fstat and nfstat don't need Giant until they call fo_stat.
  2) accept can copyin the address length without grabbing Giant.
  3) sendit doesn't need Giant, so don't bother grabbing it until
     kern_sendit.
  4) move Giant grabbing from each individual recv* syscall to recvit.
* Use kmem_alloc_nofault() rather than kmem_alloc_pageable() in sf_buf_init(). | alc | 2003-08-02 | 1 | -1/+1
  (See revision 1.140 of kern/sys_pipe.c for a detailed rationale.)
  Submitted by: tegge
* VOP_GETVOBJECT() wants to be called with the vnode lock held. | truckman | 2003-06-19 | 1 | -0/+3
* Finish the vm object locking in sendfile(2). More generally, | alc | 2003-06-12 | 1 | -1/+8
  the vm locking in sendfile(2) is complete.
* Lock the vm object when removing a page. | alc | 2003-06-11 | 1 | -0/+3
* Use __FBSDID(). | obrien | 2003-06-11 | 1 | -1/+3
* Grab giant in sendit rather than kern_sendit because sockargs may | dwmalone | 2003-05-29 | 1 | -4/+6
  allocate mbufs with M_TRYWAIT, which may require Giant.
  Reviewed by: bmilekic
  Approved by: re (scottl)
* Split sendit into two parts. The first part, still called sendit, that | dwmalone | 2003-05-05 | 1 | -50/+65
  does the copyin stuff and then calls the second part kern_sendit to do
  the hard work. Don't bother holding Giant during the copyin phase.

  The intent of this is to allow the Linux emulator to implement send*
  syscalls without using the stackgap.
* Recent changes to uipc_cow.c have eliminated the need for some | alc | 2003-03-31 | 1 | -3/+3
  sf_buf-related variables to be global. Make them either local to
  sf_buf_init() or static.
* Pass the vm_page's address to sf_buf_alloc(); map the vm_page as part | alc | 2003-03-29 | 1 | -9/+6
  of sf_buf_alloc() instead of expecting sf_buf_alloc()'s caller to map it.

  The ultimate reason for this change is to enable two optimizations:
  (1) that there never be more than one sf_buf mapping a vm_page at a time
  and (2) 64-bit architectures can transparently use their 1-1 virtual
  to physical mapping (e.g., "K0SEG") avoiding the overhead of
  pmap_qenter() and pmap_qremove().
* Pass the sf buf to MEXTADD() as the optional argument. This permits | alc | 2003-03-16 | 1 | -5/+3
  the simplification of socow_iodone() and sf_buf_free(); they don't have
  to reverse engineer the sf buf from the data's address.
* Remove GIANT_REQUIRED from sf_buf_free(). | alc | 2003-03-06 | 1 | -2/+0
* Sync new socket nonblocking/async state with file flags in accept(). | tegge | 2003-02-23 | 1 | -0/+7
  PR: 1775
  Reviewed by: mbr
* Remove duplicate includes. | cognet | 2003-02-20 | 1 | -1/+0
  Submitted by: Cyril Nguyen-Huu <cyril@ci0.org>
* Back out M_* changes, per decision of the TRB. | imp | 2003-02-19 | 1 | -11/+11
  Approved by: trb
* Break out the bind and connect syscalls, intending to make calling | ume | 2003-02-03 | 1 | -15/+40
  these syscalls internally easy. This is preparation for forthcoming
  IPv6 support for Linuxlator.
  Submitted by: dwmalone
  MFC after: 10 days
* Consolidate MIN/MAX macros into one place (param.h). | alfred | 2003-02-02 | 1 | -3/+0
  Submitted by: Hiten Pandya <hiten@unixdaemons.com>
* Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. | alfred | 2003-01-21 | 1 | -11/+11
  Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
* Bow to the whining masses and change a union back into void *. Retain | dillon | 2003-01-13 | 1 | -4/+4
  removal of unnecessary casts and throw in some minor cleanups to see
  if anyone complains, just for the hell of it.
* Change struct file f_data to un_data, a union of the correct struct | dillon | 2003-01-12 | 1 | -4/+4
  pointer types, and remove a huge number of casts from code using it.

  Change struct xfile xf_data to xun_data (ABI is still compatible).

  If we need to add a #define for f_data and xf_data we can, but I don't
  think it will be necessary. There are no operational changes in this
  commit.
* Move the declaration of the socket fileops from socketvar.h to file.h. | phk | 2002-12-23 | 1 | -2/+0
  This allows us to use the new typedefs and removes the need for a
  number of forward struct declarations in socketvar.h
* Integrate mac_check_socket_send() and mac_check_socket_receive() | rwatson | 2002-10-06 | 1 | -0/+22
  checks from the MAC tree: allow policies to perform access control
  for the ability of a process to send and receive data via a socket.
  At some point, we might also pass in additional address information
  if an explicit address is requested on send.
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
* In an SMP environment post-Giant it is no longer safe to blindly | truckman | 2002-10-03 | 1 | -2/+4
  dereference the struct sigio pointer without any locking. Change
  fgetown() to take a reference to the pointer instead of a copy of the
  pointer and call SIGIO_LOCK() before copying the pointer and
  dereferencing it.
  Reviewed by: rwatson
* accept(2) on a socket that has been shutdown(2) normally returns | archie | 2002-08-28 | 1 | -5/+4
  ECONNABORTED. Make this happen in the non-blocking case as well.
  The previous behavior was to return EAGAIN, which (a) is not consistent
  with the blocking case and (b) causes the application to think the
  socket is still valid.
  PR: bin/42100
  Reviewed by: freebsd-net
  MFC after: 3 days
* In order to better support flexible and extensible access control, | rwatson | 2002-08-15 | 1 | -1/+6
  make a series of modifications to the credential arguments relating
  to file read and write operations to clarify which credential is
  used for what:

  - Change fo_read() and fo_write() to accept "active_cred" instead of
    "cred", and change the semantics of consumers of fo_read() and
    fo_write() to pass the active credential of the thread requesting
    an operation rather than the cached file cred. The cached file
    cred is still available in fo_read() and fo_write() consumers
    via fp->f_cred. These changes largely in sys_generic.c.

  For each implementation of fo_read() and fo_write(), update cred
  usage to reflect this change and maintain current semantics:

  - badfo_readwrite() unchanged
  - kqueue_read/write() unchanged
  - pipe_read/write() now authorize MAC using active_cred rather
    than td->td_ucred
  - soo_read/write() unchanged
  - vn_read/write() now authorize MAC using active_cred but
    VOP_READ/WRITE() with fp->f_cred

  Modify vn_rdwr() to accept two credential arguments instead of a
  single credential: active_cred and file_cred. Use active_cred
  for MAC authorization, and select a credential for use in
  VOP_READ/WRITE() based on whether file_cred is NULL or not. If
  file_cred is provided, authorize the VOP using that cred,
  otherwise the active credential, matching current semantics.

  Modify current vn_rdwr() consumers to pass a file_cred if used
  in the context of a struct file, and to always pass active_cred.
  When vn_rdwr() is used without a file_cred, pass NOCRED.

  These changes should maintain current semantics for read/write,
  but avoid a redundant passing of fp->f_cred, as well as making
  it more clear what the origin of each credential is in file
  descriptor read/write operations.

  Follow-up commits will make similar changes to other file descriptor
  operations, and modify the MAC framework to pass both credentials
  to MAC policy modules so they can implement either semantic for
  revocation.

  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, NAI Labs
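The credential selection described for vn_rdwr() in the entry above comes down to a two-way choice. The sketch below is illustrative only: `toy_cred` and `toy_vop_cred()` are hypothetical names, and a NULL pointer stands in for "no file credential supplied".

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's credential structure. */
struct toy_cred {
	int uid;
};

/*
 * Pick the credential used to authorize the underlying VOP: the file
 * credential when one is supplied, otherwise the active (thread)
 * credential. MAC checks, not modelled here, would always use active_cred.
 */
static const struct toy_cred *
toy_vop_cred(const struct toy_cred *active_cred,
    const struct toy_cred *file_cred)
{
	return (file_cred != NULL ? file_cred : active_cred);
}
```

Keeping the two credentials as separate arguments makes the origin of each one explicit at every call site, which is the stated goal of the change.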
* Fix return case for negative namelen by jumping to normal exit processing | rwatson | 2002-08-15 | 1 | -2/+4
  rather than immediately returning, or we may not unlock necessary locks.
  Noticed by: Mike Heffner <mheffner@acm.vt.edu>
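The "jump to normal exit processing" fix in the entry above is the usual single-exit cleanup pattern. The sketch below models it with a counter in place of a real lock; `toy_lock()`, `toy_unlock()`, `toy_getname()` and the choice of EINVAL as the error value are all assumptions made for illustration.

```c
#include <errno.h>

/* Hypothetical lock, modelled as a depth counter so leaks are observable. */
static int toy_lock_depth;

static void toy_lock(void)   { toy_lock_depth++; }
static void toy_unlock(void) { toy_lock_depth--; }

static int
toy_getname(int namelen)
{
	int error = 0;

	toy_lock();
	if (namelen < 0) {
		error = EINVAL;
		goto done;	/* a bare "return (EINVAL)" here would leak the lock */
	}
	/* ... the real work would happen here, still under the lock ... */
done:
	toy_unlock();
	return (error);
}
```

After any call, successful or not, `toy_lock_depth` is back at zero, which is the property the early return violated.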
* Moved sf_buf_alloc and sf_buf_free function declarations to sys/socketvar.h | dg | 2002-08-13 | 1 | -2/+0
  so that they can be seen by external callers.
* Remove obsolete comment about sf_buf_* functions being static. They were | dg | 2002-08-13 | 1 | -3/+0
  made un-static in rev 1.114.
* Fix sendfile(), which was calling vn_rdwr() without an aresid parameter and | semenu | 2002-08-11 | 1 | -2/+2
  thus hitting EIO at the end of file. This is believed to be a feature
  (not a bug) of vn_rdwr(), so we turn it off by supplying aresid param.
  Reviewed by: rwatson, dg
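The aresid behaviour mentioned in the entry above can be modelled in a few lines: with no residual pointer a short transfer is treated as an error, while a residual pointer lets the caller learn about the shortfall and carry on. `toy_rdwr()` is a hypothetical model with simplified types, not the kernel routine.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Toy model: try to transfer `len` bytes when only `avail` remain.
 * With aresid == NULL, any shortfall is reported as EIO. With a non-NULL
 * aresid, the number of bytes not transferred is stored there and the
 * call succeeds.
 */
static int
toy_rdwr(size_t len, size_t avail, size_t *aresid)
{
	size_t done = len < avail ? len : avail;
	size_t resid = len - done;

	if (aresid != NULL)
		*aresid = resid;
	else if (resid != 0)
		return (EIO);
	return (0);
}
```

A read that runs into end of file therefore fails without the pointer and succeeds with it, matching the symptom the commit describes.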
* While we're at it, add range checks similar to those in previous commit to | nectar | 2002-08-09 | 1 | -0/+8
  getsockname() and getpeername(), too.
* Add additional range checks for copyout targets. | rwatson | 2002-08-09 | 1 | -0/+2
  Submitted by: Silvio Cesare <silvio@qualys.com>
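The two entries above do not show the checks themselves, so the following is only a generic illustration of range-checking a caller-supplied length before copying out: reject negative values and clamp oversized ones. `toy_copy_name()` and its EINVAL return are assumptions, not the committed code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most `*lenp` bytes of a `srclen`-byte name into `dst`.
 * A negative caller-supplied length is rejected; a too-large one is
 * clamped so the copy never reads past the end of the source. On success
 * `*lenp` holds the number of bytes actually copied.
 */
static int
toy_copy_name(void *dst, int *lenp, const void *src, size_t srclen)
{
	if (*lenp < 0)
		return (EINVAL);
	if ((size_t)*lenp > srclen)
		*lenp = (int)srclen;
	memcpy(dst, src, (size_t)*lenp);
	return (0);
}
```

Without the sign check, a negative length converted to an unsigned size would become a very large copy count.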
* Include file cleanup; mac.h and malloc.h at one point had ordering | rwatson | 2002-08-01 | 1 | -0/+1
  relationship requirements, and no longer do.
  Reminded by: bde
* Introduce support for Mandatory Access Control and extensible | rwatson | 2002-07-31 | 1 | -0/+21
  kernel access control.

  Instrument connect(), listen(), and bind() system calls to invoke
  MAC framework entry points to permit policies to authorize these
  requests. This can be useful for policies that want to limit
  the activity of processes involving particular types of IPC and
  network activity.

  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, NAI Labs
* o In do_sendfile(), replace vm_page_sleep_busy() by vm_page_sleep_if_busy() | alc | 2002-07-30 | 1 | -5/+6
    and extend the scope of the page queues lock to cover all accesses
    to the page's flags and busy fields.
* - Make use of the VM_ALLOC_WIRED flag in the call to vm_page_alloc() in | arr | 2002-07-23 | 1 | -12/+12
    do_sendfile(). This allows us to rearrange an if statement in order to
    avoid doing an unnecessary call to vm_page_lock_queues(), and an attempt
    at re-wiring the pages (which were wired in the vm_page_alloc() call).
  Reviewed by: alc, jhb
* Lock accesses to the page queues by sendfile() and friends. | alc | 2002-07-13 | 1 | -0/+8
* Create a bug-for-bug FreeBSD4 compatible version of sendfile and move the | alfred | 2002-07-12 | 1 | -3/+36
  fixed sendfile over. This is needed to preserve binary compatibility
  from 4.x to 5.x.
* nuke more instances of caddr_t | alfred | 2002-06-29 | 1 | -23/+19
* remove or replace caddr_t with void. | alfred | 2002-06-28 | 1 | -32/+29
  make the mbuf external free function take a void * rather than caddr_t.
* At long last, commit the zero copy sockets code. | ken | 2002-06-26 | 1 | -7/+7
  MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes.
  ti.4: Update the ti(4) man page to include information on the
    TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options, and also
    include information about the new character device interface and
    the associated ioctls.
  man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated
    links.
  jumbo.9: New man page describing the jumbo buffer allocator interface
    and operation.
  zero_copy.9: New man page describing the general characteristics of the
    zero copy send and receive code, and what an application author
    should do to take advantage of the zero copy functionality.
  NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
    TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.
  conf/files: Add uipc_jumbo.c and uipc_cow.c.
  conf/options: Add the 5 options mentioned above.
  kern_subr.c: Receive side zero copy implementation. This takes
    "disposable" pages attached to an mbuf, gives them to a user
    process, and then recycles the user's page. This is only active
    when ZERO_COPY_SOCKETS is turned on and the
    kern.ipc.zero_copy.receive sysctl variable is set to 1.
  uipc_cow.c: Send side zero copy functions. Takes a page written by the
    user and maps it copy on write and assigns it kernel virtual
    address space. Removes copy on write mapping once the buffer has
    been freed by the network stack.
  uipc_jumbo.c: Jumbo disposable page allocator code. This allocates
    (optionally) disposable pages for network drivers that want to
    give the user the option of doing zero copy receive.
  uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are
    enabled if ZERO_COPY_SOCKETS is turned on.
    Add zero copy send support to sosend() -- pages get mapped into
    the kernel instead of getting copied if they meet size and
    alignment restrictions.
  uipc_syscalls.c: Un-staticize some of the sf* functions so that they
    can be used elsewhere. (uipc_cow.c)
  if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid calling
    malloc() with M_WAITOK. Return an error if the M_NOWAIT malloc
    fails.
    The ti(4) driver and the wi(4) driver, at least, call this with
    a mutex held. This causes witness warnings for 'ifconfig -a' with
    a wi(4) or ti(4) board in the system. (I've only verified for
    ti(4)).
  ip_output.c: Fragment large datagrams so that each segment contains a
    multiple of PAGE_SIZE amount of data plus headers. This allows
    the receiver to potentially do page flipping on receives.
  if_ti.c: Add zero copy receive support to the ti(4) driver. If
    TI_PRIVATE_JUMBOS is not defined, it now uses the jumbo(9) buffer
    allocator for jumbo receive buffers.
    Add a new character device interface for the ti(4) driver for the
    new debugging interface. This allows (a patched version of) gdb
    to talk to the Tigon board and debug the firmware. There are also
    a few additional debugging ioctls available through this
    interface.
    Add header splitting support to the ti(4) driver.
    Tweak some of the default interrupt coalescing parameters to more
    useful defaults.
    Add hooks for supporting transmit flow control, but leave it
    turned off with a comment describing why it is turned off.
  if_tireg.h: Change the firmware rev to 12.4.11, since we're really at
    12.4.11 plus fixes from 12.4.13.
    Add defines needed for debugging.
    Remove the ti_stats structure, it is now defined in sys/tiio.h.
  ti_fw.h: 12.4.11 firmware.
  ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13, and my
    header splitting patches. Revision 12.4.13 doesn't handle 10/100
    negotiation properly. (This firmware is the same as what was in
    the tree previously, with the addition of header splitting
    support.)
  sys/jumbo.h: Jumbo buffer allocator interface.
  sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to indicate
    that the payload buffer can be thrown away / flipped to a
    userland process.
  socketvar.h: Add prototype for socow_setup.
  tiio.h: ioctl interface to the character portion of the ti(4) driver,
    plus associated structure/type definitions.
  uio.h: Change prototype for uiomoveco() so that we'll know whether the
    source page is disposable.
  ufs_readwrite.c: Update for new prototype of uiomoveco().
  vm_fault.c: In vm_fault(), check to see whether we need to do a page
    based copy on write fault.
  vm_object.c: Add a new function, vm_object_allocate_wait(). This does
    the same thing that vm_object_allocate() does, except that it
    gives the caller the opportunity to specify whether it should wait
    on the uma_zalloc() of the object structure.
    This allows vm objects to be allocated while holding a mutex.
    (Without generating WITNESS warnings.)
    vm_object_allocate() is implemented as a call to
    vm_object_allocate_wait() with the malloc flag set to M_WAITOK.
  vm_object.h: Add prototype for vm_object_allocate_wait().
  vm_page.c: Add page-based copy on write setup, clear and fault
    routines.
  vm_page.h: Add page based COW function prototypes and variable in the
    vm_page structure.

  Many thanks to Drew Gallatin, who wrote the zero copy send and receive
  code, and to all the other folks who have tested and reviewed this code
  over the years.
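The ip_output.c note in the entry above says each fragment should carry a whole number of pages of data plus headers. The arithmetic can be sketched as below; `toy_frag_payload()` is a hypothetical helper, and the real code may apply further constraints that are not modelled here.

```c
#include <stddef.h>

/*
 * Largest per-fragment payload that fits in `mtu` after `hdrlen` bytes of
 * header and is a whole multiple of `pagesz`. Returns 0 when not even one
 * page fits, in which case page-aligned fragmentation is not possible.
 */
static size_t
toy_frag_payload(size_t mtu, size_t hdrlen, size_t pagesz)
{
	if (pagesz == 0 || mtu <= hdrlen)
		return (0);
	return (((mtu - hdrlen) / pagesz) * pagesz);
}
```

For example, with a 9000-byte MTU, a 20-byte header and 4096-byte pages, 8980 bytes are available and two full pages (8192 bytes) fit. With a 1500-byte MTU no page fits, so the result is 0.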
* Implement SO_NOSIGPIPE option for sockets. This allows one to request that | alfred | 2002-06-20 | 1 | -1/+2
  an EPIPE error return not generate SIGPIPE on sockets.
  Submitted by: lioux
  Inspired by: Darwin
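From userland, the option in the entry above is set with setsockopt(2) at the SOL_SOCKET level. A minimal sketch follows; the wrapper name `sketch_no_sigpipe()` is hypothetical, and the ENOPROTOOPT fallback for systems whose headers lack SO_NOSIGPIPE is an assumption of this example.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>

/*
 * Ask that writes to `fd` fail with EPIPE without raising SIGPIPE.
 * Returns 0 on success, -1 with errno set on failure. Where the headers
 * do not define SO_NOSIGPIPE, this sketch reports ENOPROTOOPT.
 */
static int
sketch_no_sigpipe(int fd)
{
#ifdef SO_NOSIGPIPE
	int on = 1;

	return (setsockopt(fd, SOL_SOCKET, SO_NOSIGPIPE, &on, sizeof(on)));
#else
	(void)fd;
	errno = ENOPROTOOPT;
	return (-1);
#endif
}
```

A caller would invoke this once after creating the socket and then handle EPIPE from write(2) or send(2) as an ordinary error.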
* Catch up to changes in ktrace API. | jhb | 2002-06-07 | 1 | -8/+8
* Back out my last commit of locking down a socket, it conflicts with hsu's work. | tanimura | 2002-05-31 | 1 | -35/+2
  Requested by: hsu
* Lock down a socket, milestone 1. | tanimura | 2002-05-20 | 1 | -2/+35
  o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
    socket buffer. The mutex in the receive buffer also protects the data
    in struct socket.
  o Determine the lock strategy for each member in struct socket.
  o Lock down the following members:
    - so_count
    - so_options
    - so_linger
    - so_state
  o Remove *_locked() socket APIs. Make the following socket APIs
    touching the members above now require a locked socket:
    - sodisconnect()
    - soisconnected()
    - soisconnecting()
    - soisdisconnected()
    - soisdisconnecting()
    - sofree()
    - soref()
    - sorele()
    - sorwakeup()
    - sotryfree()
    - sowakeup()
    - sowwakeup()
  Reviewed by: alfred