path: root/sys/kern/kern_subr.c
Commit message (author, date, files changed, lines -/+)
* Rename iov_to_uio to uiofromiov to be more consistent with other uio*
  functions. (silby, 2004-02-04, 1 file, -1/+1)
  Suggested by: bde
* Style fixes. (silby, 2004-02-04, 1 file, -29/+29)
  Submitted by: bde
* Remove debugging code that slipped into the previous commit.
  (silby, 2004-02-02, 1 file, -3/+0)
  Spotted by: bde
* Rewrite sendfile's header support so that headers are now sent in the first
  packet along with data, instead of in their own packet.
  (silby, 2004-02-01, 1 file, -0/+42)
  When serving files of size (packetsize - headersize) or smaller, this will
  result in one less packet crossing the network. Quick testing with thttpd
  and http_load has shown a noticeable performance improvement in this case
  (350 vs 330 fetches per second.)
  Included in this commit are two support routines, iov_to_uio and
  m_uiotombuf; these routines are used by sendfile to construct the header
  mbuf chain that will be linked to the rest of the data in the socket
  buffer. (See the sketch below.)
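  A minimal sketch of how sendfile might use these routines, purely as an
  illustration: the signatures of iov_to_uio() and m_uiotombuf() shown here
  are assumptions based on this commit message, not the committed prototypes.

	/*
	 * Sketch: wrap the caller's header iovecs (struct sf_hdtr) in a
	 * uio, then copy the header bytes into a fresh mbuf chain that can
	 * be linked ahead of the file data in the socket buffer.
	 */
	struct uio hdr_uio;
	struct mbuf *m_header;

	iov_to_uio(hdr->headers, hdr->hdr_cnt, &hdr_uio);	/* assumed */
	hdr_uio.uio_rw = UIO_WRITE;

	m_header = m_uiotombuf(&hdr_uio, M_WAITOK, hdr_uio.uio_resid);
	if (m_header == NULL)
		return (ENOBUFS);
	/* m_header is later prepended to the first data mbuf. */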
* - Add a flags parameter to mi_switch. The value of flags may be SW_VOL or
    SW_INVOL. (jeff, 2004-01-25, 1 file, -2/+1)
    Assert that one of these is set in mi_switch() and properly adjust the
    rusage statistics. This is to simplify the large number of users of this
    interface which were previously all required to adjust the proper counter
    prior to calling mi_switch(). This also facilitates more switch and
    locking optimizations.
  - Change all callers of mi_switch() to pass the appropriate parameter and
    remove direct references to the process statistics. (A sketch follows.)
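  A hedged before/after sketch of what the new flag means for callers; the
  rusage field name below is an assumption for illustration.

	/* Before: each caller bumped the context-switch counter itself. */
	mtx_lock_spin(&sched_lock);
	p->p_stats->p_ru.ru_nvcsw++;	/* assumed field, for illustration */
	mi_switch();
	mtx_unlock_spin(&sched_lock);

	/* After: mi_switch() adjusts rusage from the flag it is passed. */
	mtx_lock_spin(&sched_lock);
	mi_switch(SW_VOL);	/* voluntary; SW_INVOL when preempted */
	mtx_unlock_spin(&sched_lock);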
* Add __restrict qualifiers to copyinfrom, copyinstrfrom, copystr, copyinstr,
  copyin and copyout. (alfred, 2003-12-26, 1 file, -2/+4)
  (A sketch of the qualified prototypes follows.)
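  For context, the qualified prototypes look roughly like this; __restrict
  tells the compiler the source and destination buffers do not overlap.
  Treat the exact spelling as a sketch rather than the committed
  declarations.

	int	copyin(const void * __restrict udaddr,
		    void * __restrict kaddr, size_t len);
	int	copyout(const void * __restrict kaddr,
		    void * __restrict udaddr, size_t len);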
* Introduce a uiomove_frombuf helper routine that handles computing and
  validating the offset within a given memory buffer before handing the real
  work off to uiomove(9). (nectar, 2003-10-02, 1 file, -0/+23)
  Use uiomove_frombuf in procfs to correct several issues with integer
  arithmetic that could result in underflows/overflows. As a side-effect, the
  code is significantly simplified. (A sketch of the helper follows.)
  Add additional sanity checks when computing a memory allocation size in
  pfs_read.
  Submitted by: rwatson (original uiomove_frombuf -- bugs are mine :-)
  Reported by: Joost Pol <joost@pine.nl> (integer underflows/overflows)
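  A minimal sketch of the helper's shape, assuming the
  uiomove_frombuf(void *buf, int buflen, struct uio *uio) signature; the
  committed routine may differ in detail.

	/*
	 * Sketch: validate the requested offset against the buffer bounds,
	 * clamp the transfer length, and let uiomove() do the real work.
	 * The bounds checks make the later subtraction safe from
	 * underflow/overflow.
	 */
	int
	uiomove_frombuf(void *buf, int buflen, struct uio *uio)
	{
		size_t len = uio->uio_resid;

		if (uio->uio_offset < 0 || uio->uio_resid < 0 ||
		    uio->uio_offset > buflen)
			return (EINVAL);
		if (len > (size_t)(buflen - uio->uio_offset))
			len = buflen - uio->uio_offset;
		return (uiomove((char *)buf + uio->uio_offset, (int)len,
		    uio));
	}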
* Use __FBSDID(). (obrien, 2003-06-11, 1 file, -1/+3)
* - Add vm object locking to vm_pgmoveco(). (alc, 2003-06-09, 1 file, -2/+5)
  - Add a comment to vm_pgmoveco() describing what remains to be done for vm
    locking.
* Tweak the clearing of TDF_DEADLKTREAT so that we only bother grabbing the
  lock and clearing the flag if it was clear when uiomove() was called.
  (jhb, 2003-05-05, 1 file, -2/+2)
* Remove extraneous check. (jhb, 2003-03-25, 1 file, -2/+0)
  We are not going to return from copyin/out on the stack of a thread A but
  actually be thread B instead of thread A.
* Zero copy send and receive fixes: (ken, 2003-03-08, 1 file, -1/+1)
  - On receive, vm_map_lookup() needs to trigger the creation of a shadow
    object. To make that happen, call vm_map_lookup() with PROT_WRITE
    instead of PROT_READ in vm_pgmoveco().
  - On send, a shadow object will be created by the vm_map_lookup() in
    vm_fault(), but vm_page_cowfault() will delete the original page from
    the backing object rather than simply letting the legacy COW mechanism
    take over. In other words, the new page should be added to the shadow
    object rather than replacing the old page in the backing object. (i.e.
    vm_page_cowfault() should not be called in this case.) We accomplish
    this by making sure fs.object == fs.first_object before calling
    vm_page_cowfault() in vm_fault().
  Submitted by: gallatin, alc
  Tested by: ken
* Remove ENABLE_VFS_IOOPT. (alc, 2003-03-06, 1 file, -106/+2)
  It is a long unfinished work-in-progress.
  Discussed on: arch@
* Convert one of our main caddr_t consumers, uiomove(9), to void *.
  (des, 2003-03-02, 1 file, -5/+5)
* Clean up whitespace, unregisterize, ANSIfy, remove prototypes made
  superfluous by ANSIfication. (des, 2003-03-02, 1 file, -55/+19)
* Back out M_* changes, per decision of the TRB. (imp, 2003-02-19, 1 file, -2/+2)
  Approved by: trb
* Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
  (alfred, 2003-01-21, 1 file, -2/+2)
  Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
  (A usage sketch follows.)
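  A brief sketch of the convention this commit establishes for malloc(9)
  callers (note the entry above: the change was later backed out by the TRB);
  M_TEMP is a placeholder malloc type.

	/* Non-blocking: may fail, so the caller must check for NULL. */
	buf = malloc(size, M_TEMP, M_NOWAIT);
	if (buf == NULL)
		return (ENOMEM);

	/* Blocking under this convention: pass 0 rather than M_WAITOK. */
	buf = malloc(size, M_TEMP, 0);	/* sleeps until memory is free */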
* Reduce the number of times that we acquire and release the page queues
  lock by making vm_page_rename()'s caller, rather than vm_page_rename(),
  responsible for acquiring it. (alc, 2002-12-29, 1 file, -2/+0)
* Extend the scope of the page queues lock in vm_pgmoveco().
  (alc, 2002-12-20, 1 file, -4/+4)
* Hold the page queues lock when performing vm_page_busy().
  (alc, 2002-12-18, 1 file, -0/+2)
* Use pmap_remove_all() instead of pmap_remove() before freeing the page
  in vm_pgmoveco(); the page may have more than one mapping.
  (alc, 2002-11-28, 1 file, -5/+4)
  Hold the page queues lock when calling pmap_remove_all().
  Approved by: re (blanket)
* - Create a new scheduler api that is defined in sys/sched.h.
    (jeff, 2002-10-12, 1 file, -1/+2)
  - Begin moving scheduler specific functionality into sched_4bsd.c.
  - Replace direct manipulation of scheduler data with hooks provided by the
    new api.
  - Remove KSE specific state modifications and single runq assumptions from
    kern_switch.c.
  Reviewed by: -arch
* Change iov_base's type from `char *' to the standard `void *'.
  (mike, 2002-10-11, 1 file, -5/+8)
  All uses of iov_base which assume its type is `char *' (in order to do
  pointer arithmetic) have been updated to cast iov_base to `char *'.
  (See the sketch below.)
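  The resulting idiom when stepping through an iovec, since pointer
  arithmetic on void * is not defined by standard C:

	/* Advance an iovec by cnt bytes: cast before doing arithmetic. */
	iov->iov_base = (char *)iov->iov_base + cnt;
	iov->iov_len -= cnt;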
* o Convert a vm_page_sleep_busy() into a vm_page_sleep_if_busy()
    with appropriate page queue locking. (alc, 2002-08-04, 1 file, -1/+3)
* o Lock page queue accesses by vm_page_free(). (alc, 2002-07-21, 1 file, -0/+2)
* Fix compilation with ENABLE_VFS_IOOPT turned on and ZERO_COPY_SOCKETS
  turned off. (ken, 2002-07-12, 1 file, -16/+11)
  Clean up #ifdefs, and remove a bunch of unnecessary includes.
  Reviewed by: bde
  Tested by: netchild
* Add a hashdestroy() function to undo the actions of hashinit().
  (iedowse, 2002-06-30, 1 file, -0/+15)
  (A usage sketch follows.)
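  A small sketch of the hashinit()/hashdestroy() pairing; the element type
  and the M_TEMP malloc type are placeholders.

	struct myelem { LIST_ENTRY(myelem) link; u_long key; };
	LIST_HEAD(myhead, myelem) *hashtbl;
	u_long hashmask;

	hashtbl = hashinit(64, M_TEMP, &hashmask);
	/* ... insert elements into hashtbl[key & hashmask] ... */

	/* Every chain must be empty before the table is freed. */
	hashdestroy(hashtbl, M_TEMP, hashmask);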
* Part 1 of KSE-III. (julian, 2002-06-29, 1 file, -1/+0)
  The ability to schedule multiple threads per process (on one cpu) by
  making ALL system calls optionally asynchronous.
  To come: ia64 and power-pc patches, patches for gdb, test program (in
  tools).
  Reviewed by: Almost everyone who counts (at various times, peter, jhb,
  matt, alfred, mini, bernd, and a cast of thousands)
  NOTE: this is still Beta code, and contains lots of debugging stuff.
  Expect slight instability in signals.
* More caddr_t removal. (alfred, 2002-06-29, 1 file, -4/+4)
  Change struct knote's kn_hook from caddr_t to void *.
* At long last, commit the zero copy sockets code. (ken, 2002-06-26, 1 file, -20/+171)
  MAKEDEV:        Add MAKEDEV glue for the ti(4) device nodes.
  ti.4:           Update the ti(4) man page to include information on the
                  TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
                  and also include information about the new character
                  device interface and the associated ioctls.
  man9/Makefile:  Add jumbo.9 and zero_copy.9 man pages and associated
                  links.
  jumbo.9:        New man page describing the jumbo buffer allocator
                  interface and operation.
  zero_copy.9:    New man page describing the general characteristics of
                  the zero copy send and receive code, and what an
                  application author should do to take advantage of the
                  zero copy functionality.
  NOTES:          Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
                  TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.
  conf/files:     Add uipc_jumbo.c and uipc_cow.c.
  conf/options:   Add the 5 options mentioned above.
  kern_subr.c:    Receive side zero copy implementation. This takes
                  "disposable" pages attached to an mbuf, gives them to a
                  user process, and then recycles the user's page. This is
                  only active when ZERO_COPY_SOCKETS is turned on and the
                  kern.ipc.zero_copy.receive sysctl variable is set to 1.
  uipc_cow.c:     Send side zero copy functions. Takes a page written by
                  the user and maps it copy on write and assigns it kernel
                  virtual address space. Removes copy on write mapping once
                  the buffer has been freed by the network stack.
  uipc_jumbo.c:   Jumbo disposable page allocator code. This allocates
                  (optionally) disposable pages for network drivers that
                  want to give the user the option of doing zero copy
                  receive.
  uipc_socket.c:  Add kern.ipc.zero_copy.{send,receive} sysctls that are
                  enabled if ZERO_COPY_SOCKETS is turned on. Add zero copy
                  send support to sosend() -- pages get mapped into the
                  kernel instead of getting copied if they meet size and
                  alignment restrictions.
  uipc_syscalls.c: Un-staticize some of the sf* functions so that they can
                  be used elsewhere. (uipc_cow.c)
  if_media.c:     In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
                  calling malloc() with M_WAITOK. Return an error if the
                  M_NOWAIT malloc fails. The ti(4) driver and the wi(4)
                  driver, at least, call this with a mutex held. This
                  causes witness warnings for 'ifconfig -a' with a wi(4)
                  or ti(4) board in the system. (I've only verified for
                  ti(4)).
  ip_output.c:    Fragment large datagrams so that each segment contains a
                  multiple of PAGE_SIZE amount of data plus headers. This
                  allows the receiver to potentially do page flipping on
                  receives.
  if_ti.c:        Add zero copy receive support to the ti(4) driver. If
                  TI_PRIVATE_JUMBOS is not defined, it now uses the
                  jumbo(9) buffer allocator for jumbo receive buffers. Add
                  a new character device interface for the ti(4) driver for
                  the new debugging interface. This allows (a patched
                  version of) gdb to talk to the Tigon board and debug the
                  firmware. There are also a few additional debugging
                  ioctls available through this interface. Add header
                  splitting support to the ti(4) driver. Tweak some of the
                  default interrupt coalescing parameters to more useful
                  defaults. Add hooks for supporting transmit flow control,
                  but leave it turned off with a comment describing why it
                  is turned off.
  if_tireg.h:     Change the firmware rev to 12.4.11, since we're really at
                  12.4.11 plus fixes from 12.4.13. Add defines needed for
                  debugging. Remove the ti_stats structure, it is now
                  defined in sys/tiio.h.
  ti_fw.h:        12.4.11 firmware.
  ti_fw2.h:       12.4.11 firmware, plus selected fixes from 12.4.13, and
                  my header splitting patches. Revision 12.4.13 doesn't
                  handle 10/100 negotiation properly. (This firmware is the
                  same as what was in the tree previously, with the
                  addition of header splitting support.)
  sys/jumbo.h:    Jumbo buffer allocator interface.
  sys/mbuf.h:     Add a new external mbuf type, EXT_DISPOSABLE, to indicate
                  that the payload buffer can be thrown away / flipped to a
                  userland process.
  socketvar.h:    Add prototype for socow_setup.
  tiio.h:         ioctl interface to the character portion of the ti(4)
                  driver, plus associated structure/type definitions.
  uio.h:          Change prototype for uiomoveco() so that we'll know
                  whether the source page is disposable.
  ufs_readwrite.c: Update for new prototype of uiomoveco().
  vm_fault.c:     In vm_fault(), check to see whether we need to do a page
                  based copy on write fault.
  vm_object.c:    Add a new function, vm_object_allocate_wait(). This does
                  the same thing that vm_object_allocate() does, except
                  that it gives the caller the opportunity to specify
                  whether it should wait on the uma_zalloc() of the object
                  structure. This allows vm objects to be allocated while
                  holding a mutex. (Without generating WITNESS warnings.)
                  vm_object_allocate() is implemented as a call to
                  vm_object_allocate_wait() with the malloc flag set to
                  M_WAITOK.
  vm_object.h:    Add prototype for vm_object_allocate_wait().
  vm_page.c:      Add page-based copy on write setup, clear and fault
                  routines.
  vm_page.h:      Add page based COW function prototypes and variable in
                  the vm_page structure.
  Many thanks to Drew Gallatin, who wrote the zero copy send and receive
  code, and to all the other folks who have tested and reviewed this code
  over the years.
* Remove UIO_USERISPACE. (peter, 2002-06-20, 1 file, -6/+0)
  We do not support any split instruction/data address space machines
  (eg: pdp-11) and are not likely to ever do so. Nothing in our kernel
  sets this.
* o Condition the compilation of uiomoveco() and vm_uiomove()
    on ENABLE_VFS_IOOPT. (alc, 2002-05-05, 1 file, -3/+7)
  o Add a comment to the effect that this code is experimental support for
    zero-copy I/O.
* In a threaded world, different priorities become properties of
  different entities. Make it so. (julian, 2002-02-11, 1 file, -1/+1)
  Reviewed by: jhb@freebsd.org (john baldwin)
* Fix a bug introduced in r. 1.28: when copy{in,out} would fail for an
  iovec that was not the last one in the uio, the error would be silently
  ignored. (tmm, 2002-02-08, 1 file, -1/+2)
  Bug found and fix proposed by: jhb
* Change the preemption code for software interrupt thread schedules and
  mutex releases to not require flags for the cases when preemption is not
  allowed: (jhb, 2002-01-05, 1 file, -1/+1)
  The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
  switching to a higher priority thread on mutex release and swi schedule,
  respectively, when that switch is not safe. Now that the critical section
  API maintains a per-thread nesting count, the kernel can easily check
  whether or not it should switch without relying on flags from the
  programmer. This fixes a few bugs in that all current callers of
  swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
  fast interrupt handlers and the swi_sched of softclock needed this flag.
  Note that to ensure that swi_sched()'s in clock and fast interrupt
  handlers do not switch, these handlers have to be explicitly wrapped in
  critical_enter/exit pairs. Presently, just wrapping the handlers is
  sufficient, but in the future with the fully preemptive kernel, the
  interrupt must be EOI'd before critical_exit() is called. (critical_exit()
  can switch due to a deferred preemption in a fully preemptive kernel.)
  I've tested the changes to the interrupt code on i386 and alpha. I have
  not tested ia64, but the interrupt code is almost identical to the alpha
  code, so I expect it will work fine. PowerPC and ARM do not yet have
  interrupt code in the tree so they shouldn't be broken. Sparc64 is
  broken, but that's been ok'd by jake and tmm who will be fixing the
  interrupt code for sparc64 shortly.
  Reviewed by: peter
  Tested on: i386, alpha
* Make uio_yield() a global. (dillon, 2001-09-26, 1 file, -3/+1)
  Call uio_yield() between chunks in vn_rdwr_inchunks(), allowing other
  processes to gain an exclusive lock on the vnode. Specifically: directory
  scanning, to avoid a race to the root directory, and multiple child
  processes coring simultaneously so they can figure out that some other
  coring child has an exclusive advisory lock and just exit instead.
  This completely fixes performance problems when large programs core. You
  can have hundreds of copies (forked children) of the same binary core all
  at once and not notice. (A sketch of the chunked-write pattern follows.)
  MFC after: 3 days
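  A rough sketch of the chunked-I/O pattern described above; the chunk size
  and the vn_rdwr() argument list are placeholders, not the committed code.

	/*
	 * Sketch: perform a large write in bounded chunks, calling
	 * uio_yield() between chunks so other processes can acquire an
	 * exclusive lock on the vnode.
	 */
	#define RDWR_CHUNK	(64 * 1024)	/* placeholder chunk size */

	while (len > 0) {
		size_t chunk = (len > RDWR_CHUNK) ? RDWR_CHUNK : len;

		error = vn_rdwr(UIO_WRITE, vp, base, chunk, offset,
		    UIO_SYSSPACE, ioflg, cred, NULL, td);	/* assumed args */
		if (error != 0)
			break;
		base = (char *)base + chunk;
		offset += chunk;
		len -= chunk;
		uio_yield();	/* let lock waiters run */
	}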
* Fix locking on td_flags for TDF_DEADLKTREAT. (jhb, 2001-09-13, 1 file, -1/+6)
  If the comments in the code are true that curthread can change during this
  function, then this flag needs to become a KSE flag, not a thread flag.
* KSE Milestone 2. (julian, 2001-09-12, 1 file, -12/+15)
  Note: ALL MODULES MUST BE RECOMPILED.
  Make the kernel aware that there are smaller units of scheduling than the
  process. (But only allow one thread per process at this time.) This is
  functionally equivalent to the previous -current except that there is a
  thread associated with each process.
  Sorry john! (your next MFC will be a doozy!)
  Reviewed by: peter@freebsd.org, dillon@freebsd.org
  X-MFC after: ha ha ha ha
* Remove spl's in uio_yield() that are covered by the sched_lock.
  (jhb, 2001-07-03, 1 file, -3/+0)
* After one too many PRs on the subject, bite the bullet and define IOV_MAX
  and its associated constants. (wollman, 2001-06-18, 1 file, -0/+4)
  Implement _SC_IOV_MAX in the usual way. Be a bit sloppy about the
  namespace question; this should get cleared up in time for 5.0.
  (A userland example follows.)
  MFC after: 1 month
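  For reference, userland can query the limit at compile time or run time:

	#include <limits.h>	/* IOV_MAX */
	#include <stdio.h>
	#include <unistd.h>	/* sysconf(), _SC_IOV_MAX */

	int
	main(void)
	{
		/* Run-time query; fall back to the compile-time constant. */
		long iovmax = sysconf(_SC_IOV_MAX);

		if (iovmax == -1)
			iovmax = IOV_MAX;
		printf("writev()/readv() accept up to %ld iovecs\n", iovmax);
		return (0);
	}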
* Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
  other "system" header files. (markm, 2001-05-01, 1 file, -2/+2)
  Also help the deprecation of lockmgr.h by making it a sub-include of
  sys/lock.h and removing sys/lockmgr.h from kernel .c files.
  Sort sys/*.h includes where possible in affected files.
  OK'ed by: bde (with reservations)
* Introduce copyinfrom and copyinstrfrom, which can copy data from either
  user or kernel space. (jlemon, 2001-02-16, 1 file, -0/+36)
  This will allow layering of os-compat (e.g.: linux) system calls. Apply
  the changes to mount. (A sketch of the dispatch follows.)
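  A minimal sketch of the segment-flag dispatch these routines perform;
  treat it as illustrative rather than the committed implementation.

	/*
	 * Sketch: copy from user or kernel space depending on the segment
	 * flag, so a compat layer can reuse one code path for both.
	 */
	int
	copyinfrom(const void *src, void *dst, size_t len, int seg)
	{
		switch (seg) {
		case UIO_USERSPACE:
			return (copyin(src, dst, len));	/* may return EFAULT */
		case UIO_SYSSPACE:
			bcopy(src, dst, len);		/* kernel-to-kernel */
			return (0);
		default:
			panic("copyinfrom: bad seg %d", seg);
		}
	}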
* Implement a unified run queue and adjust priority levels accordingly.
  (jake, 2001-02-12, 1 file, -1/+1)
  - All processes go into the same array of queues, with different
    scheduling classes using different portions of the array. This allows
    user processes to have their priorities propagated up into interrupt
    thread range if need be.
  - I chose 64 run queues as an arbitrary number that is greater than 32.
    We used to have 4 separate arrays of 32 queues each, so this may not be
    optimal. The new run queue code was written with this in mind; changing
    the number of run queues only requires changing constants in runq.h and
    adjusting the priority levels.
  - The new run queue code takes the run queue as a parameter. This is
    intended to be used to create per-cpu run queues. Implement wrappers
    for compatibility with the old interface which pass in the global run
    queue structure.
  - Group the priority level, user priority, native priority (before
    propagation) and the scheduling class into a struct priority.
  - Change any hard coded priority levels that I found to use symbolic
    constants (TTIPRI and TTOPRI).
  - Remove the curpriority global variable and use that of curproc. This
    was used to detect when a process' priority had lowered and it should
    yield. We now effectively yield on every interrupt.
  - Activate propagate_priority(). It should now have the desired effect
    without needing to also propagate the scheduling class.
  - Temporarily comment out the call to vm_page_zero_idle() in the idle
    loop. It interfered with propagate_priority() because the idle process
    needed to do a non-blocking acquire of Giant and then other processes
    would try to propagate their priority onto it. The idle process should
    not do anything except idle. vm_page_zero_idle() will return in the
    form of an idle priority kernel thread which is woken up at appropriate
    times by the vm system.
  - Update struct kinfo_proc to the new priority interface. Deliberately
    change its size by adjusting the spare fields. It remained the same
    size, but the layout has changed, so userland processes that use it
    would parse the data incorrectly. The size constraint should really be
    changed to an arbitrary version number. Also add a debug.sizeof sysctl
    node for struct kinfo_proc.
* Change and clean the mutex lock interface. (bmilekic, 2001-02-09, 1 file, -2/+2)
  mtx_enter(lock, type) becomes:
    mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
    mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
  Similarly, for releasing a lock, we now have:
    mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
  (A before/after sketch follows.)
  We change the caller interface for the two different types of locks
  because the semantics are entirely different for each case, and this makes
  it explicitly clear and, at the same time, it rids us of the extra `type'
  argument.
  The enter->lock and exit->unlock change has been made with the idea that
  we're "locking data" and not "entering locked code" in mind.
  Further, remove all additional "flags" previously passed to the lock
  acquire/release routines with the exception of two: MTX_QUIET and
  MTX_NOSWITCH. The functionality of these flags is preserved and they can
  be passed to the lock/unlock routines by calling the corresponding
  wrappers:
    mtx_{lock, unlock}_flags(lock, flag(s)) and
    mtx_{lock, unlock}_spin_flags(lock, flag(s))
  for MTX_DEF and MTX_SPIN locks, respectively.
  Re-inline some lock acq/rel code; in the sleep lock case, we only inline
  the _obtain_lock()s in order to ensure that the inlined code fits into a
  cache line. In the spin lock case, we inline recursion and actually only
  perform a function call if we need to spin. This change has been made
  with the idea that we generally tend to avoid spin locks and that also
  the spin locks that we do have and are heavily used (i.e. sched_lock) do
  recurse, and therefore in an effort to reduce function call overhead for
  some architectures (such as alpha), we inline recursion for this case.
  Create a new malloc type for the witness code and retire from using the
  M_DEV type. The new type is called M_WITNESS and is only declared if
  WITNESS is enabled.
  Begin cleaning up some machdep/mutex.h code - specifically updated the
  "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and
  MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need
  those.
  Finally, caught up to the interface changes in all sys code.
  Contributors: jake, jhb, jasone (in no particular order)
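  A before/after sketch of the renamed interface, drawn directly from the
  description above; sc->sc_mtx is a placeholder lock.

	/* Old interface: one entry point, lock type passed as a flag. */
	mtx_enter(&sc->sc_mtx, MTX_DEF);
	/* ... critical section ... */
	mtx_exit(&sc->sc_mtx, MTX_DEF);

	/* New interface: the lock's type selects the function. */
	mtx_lock(&sc->sc_mtx);		/* sleep lock (MTX_DEF) */
	/* ... critical section ... */
	mtx_unlock(&sc->sc_mtx);

	mtx_lock_spin(&sched_lock);	/* spin lock (MTX_SPIN) */
	mtx_unlock_spin(&sched_lock);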
* Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables
  other than curproc. (jake, 2001-01-10, 1 file, -3/+3)
  (An illustration follows.)
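  A small illustration of the accessor style; the per-cpu member names
  below are assumptions for the example.

	/* Read a per-cpu scalar through the accessor macro. */
	int cpu = PCPU_GET(cpuid);

	/* Write a per-cpu scalar (member name assumed). */
	PCPU_SET(switchticks, ticks);

	/* Take a pointer to a per-cpu member (member name assumed). */
	struct timeval *st = PCPU_PTR(switchtime);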
* - Split the run queue and sleep queue linkage, so that a process may
    block on a mutex while on the sleep queue without corrupting it.
    (jake, 2000-11-17, 1 file, -1/+1)
  - Move dropping of Giant to after the acquire of sched_lock.
  Tested by: John Hay <jhay@icomtek.csir.co.za>, jhb
* Don't release and acquire Giant in mi_switch(). Instead, release and
  acquire Giant as needed in functions that call mi_switch().
  (jhb, 2000-11-16, 1 file, -0/+2)
  The releases need to be done outside of the sched_lock to avoid potential
  deadlocks from trying to acquire Giant while interrupts are disabled.
  Submitted by: witness
* Catch up to moving headers: (jhb, 2000-10-20, 1 file, -2/+1)
  - machine/ipl.h -> sys/ipl.h
  - machine/mutex.h -> sys/mutex.h
* GC vax-only code. (eivind, 2000-09-14, 1 file, -47/+0)
* Major update to the way synchronization is done in the kernel.
  (jasone, 2000-09-07, 1 file, -1/+6)
  Highlights include:
  * Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
    alpha port is still in transition and currently uses both.)
  * Per-CPU idle processes.
  * Interrupts are run in their own separate kernel threads and can be
    preempted (i386 only).
  Partially contributed by: BSDi (BSD/OS)
  Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh