path: root/sys/kern/kern_subr.c
Commit message (Author, Age, Files, Lines)
* /* -> /*- for copyright notices, minor format tweaks as necessary
  (imp, 2005-01-06, 1 file, -1/+1)
* Correct the handling of two unusual cases by the zero-copy receive path, specifically, vm_pgmoveco():
  1. If vm_pgmoveco() sleeps on a busy page, it must redo the lookup because the page may have been freed.
  2. If the receive buffer is copy-on-write due to, for example, a fork, then although the first vm object in the shadow chain may not contain a page there may still be one from a backing object that is mapped. Thus, a pmap_remove() is required for the new page rather than the backing object's page to be seen by the application.
  Also, add some comments to vm_pgmoveco() and update some assertions.
  Tested by: ken@
  (alc, 2004-12-13, 1 file, -16/+26)
* Tidy up the zero-copy receive path: Remove an unneeded argument to uiomoveco() and userspaceco().
  (alc, 2004-12-08, 1 file, -6/+3)
* Update the Tigon 1 and 2 driver to use the sf_buf API for implementing zero-copy receive of jumbo frames. This eliminates the need for the jumbo frame allocator implemented in kern/uipc_jumbo.c and sys/jumbo.h. Remove it.
  Note: Zero-copy receive of jumbo frames did not work without these changes; I believe there was insufficient locking on the jumbo vm object.
  Tested by: ken@
  Discussed with: gallatin@
  (alc, 2004-12-06, 1 file, -6/+4)
* Eliminate an unused argument to vm_pgmoveco().
  (alc, 2004-11-08, 1 file, -4/+2)
* Two changes to vm_pgmoveco():
  - Eliminate an initialized but unused variable.
  - Eliminate an unnecessary call to clear the page's PG_BUSY flag. (The call to vm_page_rename() already clears the page's PG_BUSY flag through its call to vm_page_remove().)
  (alc, 2004-11-05, 1 file, -3/+1)
* The synchronization provided by vm object locking has eliminated the need for most calls to vm_page_busy(). Specifically, most calls to vm_page_busy() occur immediately prior to a call to vm_page_remove(). In such cases, the containing vm object is locked across both calls. Consequently, the setting of the vm page's PG_BUSY flag is not even visible to other threads that are following the synchronization protocol.
  This change (1) eliminates the calls to vm_page_busy() that immediately precede a call to vm_page_remove() or functions, such as vm_page_free() and vm_page_rename(), that call it and (2) relaxes the requirement in vm_page_remove() that the vm page's PG_BUSY flag is set. Now, the vm page's PG_BUSY flag is set only when the vm object lock is released while the vm page is still in transition. Typically, this is when it is undergoing I/O.
  (alc, 2004-11-03, 1 file, -2/+0)
* Add a WITNESS_WARN() to uiomove() to whine if locks are held when this function is called.
  MFC after: 1 month
  (jhb, 2004-10-12, 1 file, -0/+2)
* Clean up and wash struct iovec and struct uio handling.
  Add copyiniov() which copies a struct iovec array in from userland into a malloc'ed struct iovec. Caller frees.
  Change uiofromiov() to malloc the uio (caller frees) and name it copyinuio() which is more appropriate.
  Add cloneuio() which returns a malloc'ed copy. Caller frees.
  Use them throughout.
  (phk, 2004-07-10, 1 file, -17/+46)
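The "copy in, malloc, caller frees" contract described for copyiniov() can be illustrated in userland. The following is only a sketch under stated assumptions: sketch_copyiniov() and SKETCH_MAXIOV are hypothetical names, memcpy() stands in for the kernel's copyin(), and rejecting an empty vector is a choice of this sketch rather than something the commit message states.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical cap on the vector length; the kernel has its own limit. */
#define SKETCH_MAXIOV 1024

/*
 * Userland analogue of "copy an iovec array into malloc'ed memory".
 * memcpy() stands in for copyin(). On success *iovp points at a private
 * copy that the caller must free().
 */
static int
sketch_copyiniov(const struct iovec *src, unsigned int iovcnt,
    struct iovec **iovp)
{
	struct iovec *iov;

	*iovp = NULL;
	if (iovcnt == 0)
		return (EINVAL);	/* assumption of this sketch */
	if (iovcnt > SKETCH_MAXIOV)
		return (EMSGSIZE);	/* refuse oversized vectors up front */
	iov = malloc(iovcnt * sizeof(*iov));
	if (iov == NULL)
		return (ENOMEM);
	memcpy(iov, src, iovcnt * sizeof(*iov));
	*iovp = iov;
	return (0);
}
```

Because ownership moves to the caller on success, every successful call needs exactly one matching free(), which is the point of the "Caller frees" notes in the message above.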
* - Change mi_switch() and sched_switch() to accept an optional thread to switch to. If a non-NULL thread pointer is passed in, then the CPU will switch to that thread directly rather than calling choosethread() to pick a thread to switch to.
  - Make sched_switch() aware of idle threads and know to do TD_SET_CAN_RUN() instead of sticking them on the run queue rather than requiring all callers of mi_switch() to know to do this if they can be called from an idle thread.
  - Move constants for arguments to mi_switch() and thread_single() out of the middle of the function prototypes and up above into their own section.
  (jhb, 2004-07-02, 1 file, -1/+1)
* Remove checks for curthread == NULL - it can't happen.
  (tjr, 2004-06-03, 1 file, -5/+3)
* Move TDF_DEADLKTREAT into td_pflags (and rename it accordingly) to avoid having to acquire sched_lock when manipulating it in lockmgr(), uiomove(), and uiomove_fromphys().
  Reviewed by: jhb
  (tjr, 2004-06-03, 1 file, -9/+4)
* Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999.
  Approved by: core
  (imp, 2004-04-05, 1 file, -4/+0)
* Rename iov_to_uio to uiofromiov to be more consistent with other uio* functions.
  Suggested by: bde
  (silby, 2004-02-04, 1 file, -1/+1)
* Style fixes
  Submitted by: bde
  (silby, 2004-02-04, 1 file, -29/+29)
* Remove debugging code that slipped into the previous commit.
  Spotted by: bde
  (silby, 2004-02-02, 1 file, -3/+0)
* Rewrite sendfile's header support so that headers are now sent in the first packet along with data, instead of in their own packet. When serving files of size (packetsize - headersize) or smaller, this will result in one less packet crossing the network. Quick testing with thttpd and http_load has shown a noticeable performance improvement in this case (350 vs 330 fetches per second.)
  Included in this commit are two support routines, iov_to_uio, and m_uiotombuf; these routines are used by sendfile to construct the header mbuf chain that will be linked to the rest of the data in the socket buffer.
  (silby, 2004-02-01, 1 file, -0/+42)
* - Add a flags parameter to mi_switch. The value of flags may be SW_VOL or SW_INVOL. Assert that one of these is set in mi_switch() and properly adjust the rusage statistics. This is to simplify the large number of users of this interface which were previously all required to adjust the proper counter prior to calling mi_switch(). This also facilitates more switch and locking optimizations.
  - Change all callers of mi_switch() to pass the appropriate parameter and remove direct references to the process statistics.
  (jeff, 2004-01-25, 1 file, -2/+1)
* Add __restrict qualifiers to copyinfrom, copyinstrfrom, copystr, copyinstr, copyin and copyout.
  (alfred, 2003-12-26, 1 file, -2/+4)
* Introduce a uiomove_frombuf helper routine that handles computing and validating the offset within a given memory buffer before handing the real work off to uiomove(9).
  Use uiomove_frombuf in procfs to correct several issues with integer arithmetic that could result in underflows/overflows. As a side-effect, the code is significantly simplified.
  Add additional sanity checks when computing a memory allocation size in pfs_read.
  Submitted by: rwatson (original uiomove_frombuf -- bugs are mine :-)
  Reported by: Joost Pol <joost@pine.nl> (integer underflows/overflows)
  (nectar, 2003-10-02, 1 file, -0/+23)
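The kind of offset validation this message describes can be sketched in userland as follows. This is an illustration, not the committed code: struct sketch_req is a hypothetical stand-in for the offset and residual count carried by a struct uio, sketch_move_frombuf() is a hypothetical name, and memcpy() replaces the hand-off to uiomove(9). The point is that every subtraction happens only after the operands have been range-checked, so nothing can wrap around.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the request state kept in a struct uio. */
struct sketch_req {
	off_t	 offset;	/* where in the source buffer to start */
	ssize_t	 resid;		/* bytes the caller still wants */
	char	*dst;		/* destination of the copy */
};

static int
sketch_move_frombuf(const void *buf, size_t buflen, struct sketch_req *req)
{
	size_t offset, n;

	/* Negative values would wrap when converted to unsigned sizes. */
	if (req->offset < 0 || req->resid < 0)
		return (EINVAL);
	/* Starting at or past the end is a successful zero-length read. */
	if ((unsigned long long)req->offset >= buflen)
		return (0);
	offset = (size_t)req->offset;
	n = buflen - offset;		/* safe: offset < buflen */
	if (n > (size_t)req->resid)
		n = (size_t)req->resid;
	memcpy(req->dst, (const char *)buf + offset, n);
	req->dst += n;
	req->offset += (off_t)n;
	req->resid -= (ssize_t)n;
	return (0);
}
```

Centralizing the checks in one helper means each caller no longer repeats the arithmetic, which matches the simplification the message reports for procfs.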
* Use __FBSDID().
  (obrien, 2003-06-11, 1 file, -1/+3)
* - Add vm object locking to vm_pgmoveco().
  - Add a comment to vm_pgmoveco() describing what remains to be done for vm locking.
  (alc, 2003-06-09, 1 file, -2/+5)
* Tweak the clearing of TDF_DEADLKTREAT so that we only bother grabbing the lock and clearing the flag if it was clear when uiomove() was called.
  (jhb, 2003-05-05, 1 file, -2/+2)
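The save-and-conditionally-restore pattern described here can be sketched as below. All names are hypothetical: SKETCH_DEADLKTREAT stands in for the real flag, struct sketch_thread for the thread structure, and sketch_lock()/sketch_unlock() for whatever lock protected the flag at the time. The lock helper only counts acquisitions so the saving is observable.

```c
/* Hypothetical flag bit and thread structure for illustration only. */
#define SKETCH_DEADLKTREAT 0x1

struct sketch_thread {
	int flags;
};

static int sketch_lock_acquisitions;	/* counts trips through the lock */

static void sketch_lock(void) { sketch_lock_acquisitions++; }
static void sketch_unlock(void) { }

static void
sketch_uiomove(struct sketch_thread *td)
{
	int save;

	sketch_lock();
	save = td->flags & SKETCH_DEADLKTREAT;	/* remember the old state */
	td->flags |= SKETCH_DEADLKTREAT;
	sketch_unlock();

	/* ... the actual data copy would happen here ... */

	/* Only pay for the lock again if this call is what set the flag. */
	if (save == 0) {
		sketch_lock();
		td->flags &= ~SKETCH_DEADLKTREAT;
		sketch_unlock();
	}
}
```

When the flag was already set on entry, the function leaves it set and skips the second lock acquisition entirely.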
* Remove extraneous check. We are not going to return from copyin/out on the stack of a thread A but actually be thread B instead of thread A.
  (jhb, 2003-03-25, 1 file, -2/+0)
* Zero copy send and receive fixes:
  - On receive, vm_map_lookup() needs to trigger the creation of a shadow object. To make that happen, call vm_map_lookup() with PROT_WRITE instead of PROT_READ in vm_pgmoveco().
  - On send, a shadow object will be created by the vm_map_lookup() in vm_fault(), but vm_page_cowfault() will delete the original page from the backing object rather than simply letting the legacy COW mechanism take over. In other words, the new page should be added to the shadow object rather than replacing the old page in the backing object. (i.e. vm_page_cowfault() should not be called in this case.) We accomplish this by making sure fs.object == fs.first_object before calling vm_page_cowfault() in vm_fault().
  Submitted by: gallatin, alc
  Tested by: ken
  (ken, 2003-03-08, 1 file, -1/+1)
* Remove ENABLE_VFS_IOOPT. It is a long unfinished work-in-progress.
  Discussed on: arch@
  (alc, 2003-03-06, 1 file, -106/+2)
* Convert one of our main caddr_t consumers, uiomove(9), to void *.
  (des, 2003-03-02, 1 file, -5/+5)
* Clean up whitespace, unregisterize, ANSIfy, remove prototypes made superfluous by ANSIfication.
  (des, 2003-03-02, 1 file, -55/+19)
* Back out M_* changes, per decision of the TRB.
  Approved by: trb
  (imp, 2003-02-19, 1 file, -2/+2)
* Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
  Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
  (alfred, 2003-01-21, 1 file, -2/+2)
* Reduce the number of times that we acquire and release the page queues lock by making vm_page_rename()'s caller, rather than vm_page_rename(), responsible for acquiring it.
  (alc, 2002-12-29, 1 file, -2/+0)
* Extend the scope of the page queues lock in vm_pgmoveco().
  (alc, 2002-12-20, 1 file, -4/+4)
* Hold the page queues lock when performing vm_page_busy().
  (alc, 2002-12-18, 1 file, -0/+2)
* Use pmap_remove_all() instead of pmap_remove() before freeing the page in vm_pgmoveco(); the page may have more than one mapping. Hold the page queues lock when calling pmap_remove_all().
  Approved by: re (blanket)
  (alc, 2002-11-28, 1 file, -5/+4)
* - Create a new scheduler api that is defined in sys/sched.h
  - Begin moving scheduler specific functionality into sched_4bsd.c
  - Replace direct manipulation of scheduler data with hooks provided by the new api.
  - Remove KSE specific state modifications and single runq assumptions from kern_switch.c
  Reviewed by: -arch
  (jeff, 2002-10-12, 1 file, -1/+2)
* Change iov_base's type from `char *' to the standard `void *'. All uses of iov_base which assume its type is `char *' (in order to do pointer arithmetic) have been updated to cast iov_base to `char *'.
  (mike, 2002-10-11, 1 file, -5/+8)
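Standard C does not define arithmetic on `void *', which is why the cast mentioned above is needed wherever code steps through an I/O vector. A minimal userland sketch, with sketch_iov_advance() being a hypothetical helper name:

```c
#include <stddef.h>
#include <sys/uio.h>

/*
 * Consume up to n bytes from the front of an iovec. iov_base is a
 * void *, so it is cast to char * before the byte offset is added.
 */
static void
sketch_iov_advance(struct iovec *iov, size_t n)
{
	if (n > iov->iov_len)
		n = iov->iov_len;	/* never step past the end */
	iov->iov_base = (char *)iov->iov_base + n;
	iov->iov_len -= n;
}
```

The result of the addition converts back to `void *' implicitly on assignment, so only the read side needs the cast.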
* o Convert a vm_page_sleep_busy() into a vm_page_sleep_if_busy() with appropriate page queue locking.
  (alc, 2002-08-04, 1 file, -1/+3)
* o Lock page queue accesses by vm_page_free().
  (alc, 2002-07-21, 1 file, -0/+2)
* Fix compilation with ENABLE_VFS_IOOPT turned on and ZERO_COPY_SOCKETS turned off.
  Clean up #ifdefs, and remove a bunch of unnecessary includes.
  Reviewed by: bde
  Tested by: netchild
  (ken, 2002-07-12, 1 file, -16/+11)
* Add a hashdestroy() function to undo the actions of hashinit().
  (iedowse, 2002-06-30, 1 file, -0/+15)
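A paired init/destroy for a chained hash table might look like the userland sketch below. Everything here is an assumption made for illustration: the sketch_ names, the bucket layout, the policy of rounding the bucket count down to a power of two so that a mask can replace a modulo, and the check that all buckets are empty before the table is released. The commit message itself only says that hashdestroy() undoes hashinit().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical singly linked chain and bucket head. */
struct sketch_entry {
	struct sketch_entry *next;
};

struct sketch_bucket {
	struct sketch_entry *first;
};

/* Allocate a table whose size is the largest power of two <= elements. */
static struct sketch_bucket *
sketch_hashinit(unsigned long elements, unsigned long *hashmask)
{
	struct sketch_bucket *tbl;
	unsigned long size, i;

	if (elements == 0)
		return (NULL);
	for (size = 1; size <= elements / 2; size <<= 1)
		continue;
	tbl = malloc(size * sizeof(*tbl));
	if (tbl == NULL)
		return (NULL);
	for (i = 0; i < size; i++)
		tbl[i].first = NULL;
	*hashmask = size - 1;	/* index with (hash & *hashmask) */
	return (tbl);
}

/* Release the table; destroying a table that still has entries is a bug. */
static void
sketch_hashdestroy(struct sketch_bucket *tbl, unsigned long hashmask)
{
	unsigned long i;

	for (i = 0; i <= hashmask; i++)
		assert(tbl[i].first == NULL);
	free(tbl);
}
```

Passing the mask back into the destroy routine lets it recover the bucket count without storing a separate size field.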
* Part 1 of KSE-III
  The ability to schedule multiple threads per process (on one cpu) by making ALL system calls optionally asynchronous.
  To come: ia64 and power-pc patches, patches for gdb, test program (in tools)
  Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands)
  NOTE: this is still Beta code, and contains lots of debugging stuff. Expect slight instability in signals.
  (julian, 2002-06-29, 1 file, -1/+0)
* More caddr_t removal.
  Change struct knote's kn_hook from caddr_t to void *.
  (alfred, 2002-06-29, 1 file, -4/+4)
* At long last, commit the zero copy sockets code.
  MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes.
  ti.4: Update the ti(4) man page to include information on the TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options, and also include information about the new character device interface and the associated ioctls.
  man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated links.
  jumbo.9: New man page describing the jumbo buffer allocator interface and operation.
  zero_copy.9: New man page describing the general characteristics of the zero copy send and receive code, and what an application author should do to take advantage of the zero copy functionality.
  NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS, TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.
  conf/files: Add uipc_jumbo.c and uipc_cow.c.
  conf/options: Add the 5 options mentioned above.
  kern_subr.c: Receive side zero copy implementation. This takes "disposable" pages attached to an mbuf, gives them to a user process, and then recycles the user's page. This is only active when ZERO_COPY_SOCKETS is turned on and the kern.ipc.zero_copy.receive sysctl variable is set to 1.
  uipc_cow.c: Send side zero copy functions. Takes a page written by the user and maps it copy on write and assigns it kernel virtual address space. Removes copy on write mapping once the buffer has been freed by the network stack.
  uipc_jumbo.c: Jumbo disposable page allocator code. This allocates (optionally) disposable pages for network drivers that want to give the user the option of doing zero copy receive.
  uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are enabled if ZERO_COPY_SOCKETS is turned on. Add zero copy send support to sosend() -- pages get mapped into the kernel instead of getting copied if they meet size and alignment restrictions.
  uipc_syscalls.c: Un-staticize some of the sf* functions so that they can be used elsewhere. (uipc_cow.c)
  if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid calling malloc() with M_WAITOK. Return an error if the M_NOWAIT malloc fails. The ti(4) driver and the wi(4) driver, at least, call this with a mutex held. This causes witness warnings for 'ifconfig -a' with a wi(4) or ti(4) board in the system. (I've only verified for ti(4)).
  ip_output.c: Fragment large datagrams so that each segment contains a multiple of PAGE_SIZE amount of data plus headers. This allows the receiver to potentially do page flipping on receives.
  if_ti.c: Add zero copy receive support to the ti(4) driver. If TI_PRIVATE_JUMBOS is not defined, it now uses the jumbo(9) buffer allocator for jumbo receive buffers. Add a new character device interface for the ti(4) driver for the new debugging interface. This allows (a patched version of) gdb to talk to the Tigon board and debug the firmware. There are also a few additional debugging ioctls available through this interface. Add header splitting support to the ti(4) driver. Tweak some of the default interrupt coalescing parameters to more useful defaults. Add hooks for supporting transmit flow control, but leave it turned off with a comment describing why it is turned off.
  if_tireg.h: Change the firmware rev to 12.4.11, since we're really at 12.4.11 plus fixes from 12.4.13. Add defines needed for debugging. Remove the ti_stats structure, it is now defined in sys/tiio.h.
  ti_fw.h: 12.4.11 firmware.
  ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13, and my header splitting patches. Revision 12.4.13 doesn't handle 10/100 negotiation properly. (This firmware is the same as what was in the tree previously, with the addition of header splitting support.)
  sys/jumbo.h: Jumbo buffer allocator interface.
  sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to indicate that the payload buffer can be thrown away / flipped to a userland process.
  socketvar.h: Add prototype for socow_setup.
  tiio.h: ioctl interface to the character portion of the ti(4) driver, plus associated structure/type definitions.
  uio.h: Change prototype for uiomoveco() so that we'll know whether the source page is disposable.
  ufs_readwrite.c: Update for new prototype of uiomoveco().
  vm_fault.c: In vm_fault(), check to see whether we need to do a page based copy on write fault.
  vm_object.c: Add a new function, vm_object_allocate_wait(). This does the same thing that vm_object_allocate() does, except that it gives the caller the opportunity to specify whether it should wait on the uma_zalloc() of the object structure. This allows vm objects to be allocated while holding a mutex. (Without generating WITNESS warnings.) vm_object_allocate() is implemented as a call to vm_object_allocate_wait() with the malloc flag set to M_WAITOK.
  vm_object.h: Add prototype for vm_object_allocate_wait().
  vm_page.c: Add page-based copy on write setup, clear and fault routines.
  vm_page.h: Add page based COW function prototypes and variable in the vm_page structure.
  Many thanks to Drew Gallatin, who wrote the zero copy send and receive code, and to all the other folks who have tested and reviewed this code over the years.
  (ken, 2002-06-26, 1 file, -20/+171)
* Remove UIO_USERISPACE - we do not support any split instruction/data address space machines (eg: pdp-11) and are not likely to ever do so. Nothing in our kernel sets this.
  (peter, 2002-06-20, 1 file, -6/+0)
* o Condition the compilation of uiomoveco() and vm_uiomove() on ENABLE_VFS_IOOPT.
  o Add a comment to the effect that this code is experimental support for zero-copy I/O.
  (alc, 2002-05-05, 1 file, -3/+7)
* In a threaded world, different priorities become properties of different entities. Make it so.
  Reviewed by: jhb@freebsd.org (john baldwin)
  (julian, 2002-02-11, 1 file, -1/+1)
* Fix a bug introduced in r. 1.28: when copy{in,out} would fail for an iovec that was not the last one in the uio, the error would be ignored silently.
  Bug found and fix proposed by: jhb
  (tmm, 2002-02-08, 1 file, -1/+2)
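The class of bug described here, where a later successful iteration hides an earlier failure, can be shown with a small userland loop. This is a sketch, not the kernel code: sketch_copyout() is a hypothetical primitive that fails with EFAULT on a NULL destination to imitate a bad user address, and sketch_scatter() is a hypothetical driver loop over an iovec array.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical copy primitive: a NULL destination imitates a bad address. */
static int
sketch_copyout(const void *src, void *dst, size_t len)
{
	if (dst == NULL)
		return (EFAULT);
	memcpy(dst, src, len);
	return (0);
}

/* Scatter src across the iovecs, reporting how many bytes were moved. */
static int
sketch_scatter(const char *src, size_t srclen, struct iovec *iov,
    int iovcnt, size_t *donep)
{
	size_t done = 0, n;
	int error = 0, i;

	for (i = 0; i < iovcnt && done < srclen; i++) {
		n = iov[i].iov_len;
		if (n > srclen - done)
			n = srclen - done;
		error = sketch_copyout(src + done, iov[i].iov_base, n);
		if (error != 0)
			break;	/* stop here so a later success cannot mask it */
		done += n;
	}
	*donep = done;
	return (error);
}
```

Without the break, the assignment to error in the next iteration would overwrite the failure, and the caller would see success unless the failing iovec happened to be the last one.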
* Change the preemption code for software interrupt thread schedules and mutex releases to not require flags for the cases when preemption is not allowed:
  The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent switching to a higher priority thread on mutex release and swi schedule, respectively when that switch is not safe. Now that the critical section API maintains a per-thread nesting count, the kernel can easily check whether or not it should switch without relying on flags from the programmer. This fixes a few bugs in that all current callers of swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from fast interrupt handlers and the swi_sched of softclock needed this flag. Note that to ensure that swi_sched()'s in clock and fast interrupt handlers do not switch, these handlers have to be explicitly wrapped in critical_enter/exit pairs. Presently, just wrapping the handlers is sufficient, but in the future with the fully preemptive kernel, the interrupt must be EOI'd before critical_exit() is called. (critical_exit() can switch due to a deferred preemption in a fully preemptive kernel.)
  I've tested the changes to the interrupt code on i386 and alpha. I have not tested ia64, but the interrupt code is almost identical to the alpha code, so I expect it will work fine. PowerPC and ARM do not yet have interrupt code in the tree so they shouldn't be broken. Sparc64 is broken, but that's been ok'd by jake and tmm who will be fixing the interrupt code for sparc64 shortly.
  Reviewed by: peter
  Tested on: i386, alpha
  (jhb, 2002-01-05, 1 file, -1/+1)
* Make uio_yield() a global. Call uio_yield() between chunks in vn_rdwr_inchunks(), allowing other processes to gain an exclusive lock on the vnode. Specifically: directory scanning, to avoid a race to the root directory, and multiple child processes coring simultaneously so they can figure out that some other core'ing child has an exclusive adv lock and just exit instead.
  This completely fixes performance problems when large programs core. You can have hundreds of copies (forked children) of the same binary core all at once and not notice.
  MFC after: 3 days
  (dillon, 2001-09-26, 1 file, -3/+1)
* Fix locking on td_flags for TDF_DEADLKTREAT. If the comments in the code are true that curthread can change during this function, then this flag needs to become a KSE flag, not a thread flag.
  (jhb, 2001-09-13, 1 file, -1/+6)