summaryrefslogtreecommitdiffstats
path: root/sys/kern/uipc_cow.c
Commit message (Collapse)AuthorAgeFilesLines
* Add some FEATURE macros for various features (AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/netchild2011-02-251-0/+2
| | | | | | | | | | | | | PMC/SYSV/...). No FreeBSD version bump, the userland application to query the features will be committed last and can serve as an indication of the availablility if needed. Sponsored by: Google Summer of Code 2010 Submitted by: kibab Reviewed by: arch@ (parts by rwatson, trasz, jhb) X-MFC after: to be determined in last commit with code from this project
* Correct the order of the arguments to vm_fault_quick_hold_pages().alc2010-12-261-1/+1
|
* Introduce and use a new VM interface for temporarily pinning pages. Thisalc2010-12-251-9/+5
| | | | | | | new interface replaces the combined use of vm_fault_quick() and pmap_extract_and_hold() throughout the kernel. In collaboration with: kib@
* Remove page queues locking from all sf_buf_mext()-like functions. The pagealc2010-05-061-5/+1
| | | | | | lock now suffices. Fix a couple nearby style violations.
* Add page locking to the vm_page_cow* functions.alc2010-05-041-3/+0
| | | | | | | Push down the acquisition and release of the page queues lock into vm_page_wire(). Reviewed by: kib
* This is the first step in transitioning responsibility for synchronizingalc2010-05-031-0/+4
| | | | | | access to the page's wire_count from the page queues lock to the page lock. Submitted by: kmacy
* On Alan's advice, rather than do a wholesale conversion on a singlekmacy2010-04-301-1/+3
| | | | | | | | | | | | architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib
* Extend the struct vm_page wire_count to u_int to avoid the overflowkib2009-01-031-1/+5
| | | | | | | | | | | | | | | | of the counter, that may happen when too many sendfile(2) calls are being executed with this vnode [1]. To keep the size of the struct vm_page and offsets of the fields accessed by out-of-tree modules, swap the types and locations of the wire_count and cow fields. Add safety checks to detect cow overflow and force fallback to the normal copy code for zero-copy sockets. [2] Reported by: Anton Yuzhaninov <citrin citrin ru> [1] Suggested by: alc [2] Reviewed by: alc MFC after: 2 weeks
* Give MEXTADD() another argument to make both void pointers to thephk2008-02-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | free function controlable, instead of passing the KVA of the buffer storage as the first argument. Fix all conventional users of the API to pass the KVA of the buffer as the first argument, to make this a no-op commit. Likely break the only non-convetional user of the API, after informing the relevant committer. Update the mbuf(9) manual page, which was already out of sync on this point. Bump __FreeBSD_version to 800016 as there is no way to tell how many arguments a CPP macro needs any other way. This paves the way for giving sendfile(9) a way to wait for the passed storage to have been accessed before returning. This does not affect the memory layout or size of mbufs. Parental oversight by: sam and rwatson. No MFC is anticipated.
* Previously, nothing prevented the page that was returned by pmap_extract()alc2005-10-231-4/+3
| | | | | | from being reclaimed before it was wired. Use pmap_extract_and_hold() instead of pmap_extract() and retain the hold on the page until it has been wired.
* Verify that access to the given address is allowed from user-space.alc2005-10-221-1/+8
| | | | Discussed with: rwatson@
* Eliminate spl* calls.alc2005-10-211-6/+0
|
* Allow sends sent from non page-aligned userspace addresses to begallatin2005-06-051-8/+9
| | | | | | | considered for zero-copy sends. Reviewed by: alc Submitted by: Romer Gil at Rice University
* /* -> /*- for copyright notices, minor format tweaks as necessaryimp2005-01-061-1/+1
|
* Introduce two new options, "CPU private" and "no wait", to sf_buf_alloc().alc2004-11-081-1/+1
| | | | | | Change the spelling of the "catch" option to be consistent with the new options. Implement the "no wait" option. An implementation of the "CPU private" for i386 will be committed at a later date.
* In some cases, sf_buf_alloc() should sleep with pri PCATCH; in others, italc2004-04-031-1/+1
| | | | | | | should not. Add a new parameter so that the caller can specify which is the case. Reported by: dillon
* Revise socow_iodone() in light of recent sf_buf changes. Specifically,alc2004-03-171-5/+9
| | | | | | use sf_buf_free() instead of sf_buf_mext() to consolidate all actions that require the page queues lock in one critical section. While I'm here remove unnecessary splvm() and splx() calls.
* Refactor the existing machine-dependent sf_buf_free() into a machine-alc2004-03-161-1/+1
| | | | | | | | | | dependent function by the same name and a machine-independent function, sf_buf_mext(). Aside from the virtue of making more of the code machine- independent, this change also makes the interface more logical. Before, sf_buf_free() did more than simply undo an sf_buf_alloc(); it also unwired and if necessary freed the page. That is now the purpose of sf_buf_mext(). Thus, sf_buf_alloc() and sf_buf_free() can now be used as a general-purpose emphemeral map cache.
* Handle sf_buf_alloc() returning null. This can happen if thegallatin2004-01-171-9/+18
| | | | | | process takes a signal while waiting for an sf_buf to become available. Reviewed by: alc
* - Modify alpha's sf_buf implementation to use the direct virtual-to-alc2003-11-161-3/+5
| | | | | | | | | physical mapping. - Move the sf_buf API to its own header file; make struct sf_buf's definition machine dependent. In this commit, we remove an unnecessary field from struct sf_buf on the alpha, amd64, and ia64. Ultimately, we may eliminate struct sf_buf on those architecures except as an opaque pointer that references a vm page.
* Use __FBSDID().obrien2003-06-111-2/+4
|
* The data in an sf_buf should not be modified by the mbuf system. Markalc2003-04-111-1/+1
| | | | | | the mbuf as read only. Reviewed by: gallatin
* Remove some dead code.alc2003-04-081-8/+1
|
* Pass the vm_page's address to sf_buf_alloc(); map the vm_page as partalc2003-03-291-3/+1
| | | | | | | | | | of sf_buf_alloc() instead of expecting sf_buf_alloc()'s caller to map it. The ultimate reason for this change is to enable two optimizations: (1) that there never be more than one sf_buf mapping a vm_page at a time and (2) 64-bit architectures can transparently use their 1-1 virtual to physical mapping (e.g., "K0SEG") avoiding the overhead of pmap_qenter() and pmap_qremove().
* - Add vm_paddr_t, a physical address type. This is required for systemsjake2003-03-251-1/+1
| | | | | | | | | | | | | | | where physical addresses larger than virtual addresses, such as i386s with PAE. - Use this to represent physical addresses in the MI vm system and in the i386 pmap code. This also changes the paddr parameter to d_mmap_t. - Fix printf formats to handle physical addresses >4G in the i386 memory detection code, and due to kvtop returning vm_paddr_t instead of u_long. Note that this is a name change only; vm_paddr_t is still the same as vm_offset_t on all currently supported platforms. Sponsored by: DARPA, Network Associates Laboratories Discussed with: re, phk (cdevsw change)
* Fix a race condition in socow_setup(): The page must be wired beforegallatin2003-03-181-4/+7
| | | | | | | sf_buf_alloc() is called, as sf_buf_alloc() may sleep. If it does sleep, the page might be reclaimed before wiring occurs. Reported by: alc
* Pass the sf buf to MEXTADD() as the optional argument. This permitsalc2003-03-161-6/+3
| | | | | the simplification of socow_iodone() and sf_buf_free(); they don't have to reverse engineer the sf buf from the data's address.
* Remove some unnecessary actions by the zero-copy setup and teardown code.alc2003-03-091-10/+1
| | | | | | | | | Remove an incorrect comment. (Incrementing an object's reference count does not prevent a process from exiting. The real concern here is that the physical page must not be deleted until transmission is complete. That is already handled by the VM system and sf_buf_free().) Tested by: ken
* Change iov_base's type from `char *' to the standard `void *'. Allmike2002-10-111-1/+1
| | | | | uses of iov_base which assume its type is `char *' (in order to do pointer arithmetic) have been updated to cast iov_base to `char *'.
* o Synchronize updates to struct vm_page::cow with the page queues lock.alc2002-09-021-1/+3
|
* Moved sf_buf_alloc and sf_buf_free function declarations to sys/socketvar.hdg2002-08-131-2/+0
| | | | so that they can be seen by external callers.
* Lock accesses to the page queues.alc2002-07-131-0/+2
|
* Remove the advertising clause from the Duke BSD copyright on thegallatin2002-07-061-4/+1
| | | | | | | zero-copy files Requested by: rwatson Approved by: Jeff Chase (my old boss at Duke)
* catch up with mextadd callback taking a void argument instead of a caddr_t.alfred2002-06-291-2/+2
|
* At long last, commit the zero copy sockets code.ken2002-06-261-0/+181
MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes. ti.4: Update the ti(4) man page to include information on the TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options, and also include information about the new character device interface and the associated ioctls. man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated links. jumbo.9: New man page describing the jumbo buffer allocator interface and operation. zero_copy.9: New man page describing the general characteristics of the zero copy send and receive code, and what an application author should do to take advantage of the zero copy functionality. NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS, TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT. conf/files: Add uipc_jumbo.c and uipc_cow.c. conf/options: Add the 5 options mentioned above. kern_subr.c: Receive side zero copy implementation. This takes "disposable" pages attached to an mbuf, gives them to a user process, and then recycles the user's page. This is only active when ZERO_COPY_SOCKETS is turned on and the kern.ipc.zero_copy.receive sysctl variable is set to 1. uipc_cow.c: Send side zero copy functions. Takes a page written by the user and maps it copy on write and assigns it kernel virtual address space. Removes copy on write mapping once the buffer has been freed by the network stack. uipc_jumbo.c: Jumbo disposable page allocator code. This allocates (optionally) disposable pages for network drivers that want to give the user the option of doing zero copy receive. uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are enabled if ZERO_COPY_SOCKETS is turned on. Add zero copy send support to sosend() -- pages get mapped into the kernel instead of getting copied if they meet size and alignment restrictions. uipc_syscalls.c:Un-staticize some of the sf* functions so that they can be used elsewhere. (uipc_cow.c) if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid calling malloc() with M_WAITOK. Return an error if the M_NOWAIT malloc fails. The ti(4) driver and the wi(4) driver, at least, call this with a mutex held. This causes witness warnings for 'ifconfig -a' with a wi(4) or ti(4) board in the system. (I've only verified for ti(4)). ip_output.c: Fragment large datagrams so that each segment contains a multiple of PAGE_SIZE amount of data plus headers. This allows the receiver to potentially do page flipping on receives. if_ti.c: Add zero copy receive support to the ti(4) driver. If TI_PRIVATE_JUMBOS is not defined, it now uses the jumbo(9) buffer allocator for jumbo receive buffers. Add a new character device interface for the ti(4) driver for the new debugging interface. This allows (a patched version of) gdb to talk to the Tigon board and debug the firmware. There are also a few additional debugging ioctls available through this interface. Add header splitting support to the ti(4) driver. Tweak some of the default interrupt coalescing parameters to more useful defaults. Add hooks for supporting transmit flow control, but leave it turned off with a comment describing why it is turned off. if_tireg.h: Change the firmware rev to 12.4.11, since we're really at 12.4.11 plus fixes from 12.4.13. Add defines needed for debugging. Remove the ti_stats structure, it is now defined in sys/tiio.h. ti_fw.h: 12.4.11 firmware. ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13, and my header splitting patches. Revision 12.4.13 doesn't handle 10/100 negotiation properly. (This firmware is the same as what was in the tree previously, with the addition of header splitting support.) sys/jumbo.h: Jumbo buffer allocator interface. sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to indicate that the payload buffer can be thrown away / flipped to a userland process. socketvar.h: Add prototype for socow_setup. tiio.h: ioctl interface to the character portion of the ti(4) driver, plus associated structure/type definitions. uio.h: Change prototype for uiomoveco() so that we'll know whether the source page is disposable. ufs_readwrite.c:Update for new prototype of uiomoveco(). vm_fault.c: In vm_fault(), check to see whether we need to do a page based copy on write fault. vm_object.c: Add a new function, vm_object_allocate_wait(). This does the same thing that vm_object allocate does, except that it gives the caller the opportunity to specify whether it should wait on the uma_zalloc() of the object structre. This allows vm objects to be allocated while holding a mutex. (Without generating WITNESS warnings.) vm_object_allocate() is implemented as a call to vm_object_allocate_wait() with the malloc flag set to M_WAITOK. vm_object.h: Add prototype for vm_object_allocate_wait(). vm_page.c: Add page-based copy on write setup, clear and fault routines. vm_page.h: Add page based COW function prototypes and variable in the vm_page structure. Many thanks to Drew Gallatin, who wrote the zero copy send and receive code, and to all the other folks who have tested and reviewed this code over the years.
OpenPOWER on IntegriCloud