summaryrefslogtreecommitdiffstats
path: root/sys/kern/sys_generic.c
Commit message (Collapse)AuthorAgeFilesLines
* - Remove stale comment.jeff2008-03-191-8/+0
| | | | | | | | - In the last revision the code was changed to use maxfilesperproc rather than the per-process file limit to restrict the size of the poll array. This eliminates a significant source of process lock contention in multithreaded programs and is cheaper. This had been committed with the wrong batch of changes.
* - Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice fromjeff2008-03-191-6/+1
| | | | | | | requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.
* Make ftruncate a 'struct file' operation rather than a vnode operation.jhb2008-01-071-1/+66
| | | | | | | | | | | | | | This makes it possible to support ftruncate() on non-vnode file types in the future. - 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on a given file descriptor. - ftruncate() moves to kern/sys_generic.c and now just fetches a file object and invokes fo_truncate(). - The vnode-specific portions of ftruncate() move to vn_truncate() in vfs_vnops.c which implements fo_truncate() for vnode file types. - Non-vnode file types return EINVAL in their fo_truncate() method. Submitted by: rwatson
* Remove explicit locking of struct file.jeff2007-12-301-8/+4
| | | | | | | | | | | | | - Introduce a finit() which is used to initailize the fields of struct file in such a way that the ops vector is only valid after the data, type, and flags are valid. - Protect f_flag and f_count with atomic operations. - Remove the global list of all files and associated accounting. - Rewrite the unp garbage collection such that it no longer requires the global list of all files and instead uses a list of all unp sockets. - Mark sockets in the accept queue so we don't incorrectly gc them. Tested by: kris, pho
* Refactor select to reduce contention and hide internal implementationjeff2007-12-161-168/+414
| | | | | | | | | | | | | | | | | | | | | details from consumers. - Track individual selecters on a per-descriptor basis such that there are no longer collisions and after sleeping for events only those descriptors which triggered events must be rescaned. - Protect the selinfo (per descriptor) structure with a mtx pool mutex. mtx pool mutexes were chosen to preserve api compatibility with existing code which does nothing but bzero() to setup selinfo structures. - Use a per-thread wait channel rather than a global wait channel. - Hide select implementation details in a seltd structure which is opaque to the rest of the kernel. - Provide a 'selsocket' interface for those kernel consumers who wish to select on a socket when they have no fd so they no longer have to be aware of select implementation details. Tested by: kris Reviewed on: arch
* generally we are interested in what thread did something asjulian2007-11-141-1/+1
| | | | | | opposed to what process. Since threads by default have teh name of the process unless over-written with more useful information, just print the thread name instead.
* Add freebsd6_ wrappers for mmap/lseek/pread/pwrite/truncate/ftruncatepeter2007-07-041-0/+28
| | | | Approved by: re (kensmith)
* Commit 14/14 of sched_lock decomposition.jeff2007-06-051-16/+16
| | | | | | | | | | | - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
* Remove unneeded include files.alc2007-05-011-2/+0
|
* Replace custom file descriptor array sleep lock constructed using a mutexrwatson2007-04-041-12/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | and flags with an sxlock. This leads to a significant and measurable performance improvement as a result of access to shared locking for frequent lookup operations, reduced general overhead, and reduced overhead in the event of contention. All of these are imported for threaded applications where simultaneous access to a shared file descriptor array occurs frequently. Kris has reported 2x-4x transaction rate improvements on 8-core MySQL benchmarks; smaller improvements can be expected for many workloads as a result of reduced overhead. - Generally eliminate the distinction between "fast" and regular acquisisition of the filedesc lock; the plan is that they will now all be fast. Change all locking instances to either shared or exclusive locks. - Correct a bug (pointed out by kib) in fdfree() where previously msleep() was called without the mutex held; sx_sleep() is now always called with the sxlock held exclusively. - Universally hold the struct file lock over changes to struct file, rather than the filedesc lock or no lock. Always update the f_ops field last. A further memory barrier is required here in the future (discussed with jhb). - Improve locking and reference management in linux_at(), which fails to properly acquire vnode references before using vnode pointers. Annotate improper use of vn_fullpath(), which will be replaced at a future date. In fcntl(), we conservatively acquire an exclusive lock, even though in some cases a shared lock may be sufficient, which should be revisited. The dropping of the filedesc lock in fdgrowtable() is no longer required as the sxlock can be held over the sleep operation; we should consider removing that (pointed out by attilio). Tested by: kris Discussed with: jhb, kris, attilio, jeff
* Further system call comment cleanup:rwatson2007-03-051-15/+0
| | | | | | | | | | - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.
* Remove 'MPSAFE' annotations from the comments above most system calls: allrwatson2007-03-041-45/+10
| | | | | | | | system calls now enter without Giant held, and then in some cases, acquire Giant explicitly. Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.
* Do not dispatch SIGPIPE from the generic write path for a socket; withbms2007-03-011-1/+1
| | | | | | | | | | | | this patch the code behaves according to the comment on the line above. Without this patch, a socket could cause SIGPIPE to be delivered to its process, once with SO_NOSIGPIPE set, and twice without. With this patch, the kernel now passes the sigpipe regression test. Tested by: Anton Yuzhaninov MFC after: 1 week
* Prevent IOC_IN with zero size argument (this is only supportedru2006-10-141-1/+2
| | | | | | | | | | | | if backward copatibility options are present) from attempting to free memory that wasn't allocated. This is an old bug, and previously it would attempt to free a null pointer. I noticed this bug when working on the previous revision, but forgot to fix it. Security: local DoS Reported by: Peter Holm MFC after: 3 days
* Fix our ioctl(2) implementation when the argument is "int". Newru2006-09-271-11/+15
| | | | | | | | | | | | | ioctls passing integer arguments should use the _IOWINT() macro. This fixes a lot of ioctl's not working on sparc64, most notable being keyboard/syscons ioctls. Full ABI compatibility is provided, with the bonus of fixing the handling of old ioctls on sparc64. Reviewed by: bde (with contributions) Tested by: emax, marius MFC after: 1 week
* - Split ioctl() up into ioctl() and kern_ioctl(). The kern_ioctl() assumesjhb2006-07-081-37/+44
| | | | | | | | | that the 'data' pointer is already setup to point to a valid KVM buffer or contains the copied-in data from userland as appropriate (ioctl(2) still does this). kern_ioctl() takes care of looking up a file pointer, implementing FIONCLEX and FIOCLEX, and calling fi_ioctl(). - Use kern_ioctl() to implement xenix_rdchk() instead of using the stackgap and mark xenix_rdchk() MPSAFE.
* Return error from fget_write() rather than hardcoding EBADF now thatjhb2006-01-061-2/+2
| | | | | | fget_write() DTRT. Requested by: bde
* Remove XXX comments complaining that write(2) on a read-only descriptorjhb2006-01-051-2/+2
| | | | | | | | | | | returns EBADF. That errno is correct and is mandated by POSIX. It also goes back to revision 1.1 of our CVS history (i.e. 4.4BSD). The _fget() function should probably also be upated as it currently returns EINVAL in that case rather than EBADF. (It does return EBADF for reads on a write-only descriptor without any XXX comments oddly enough.) Discussed with: scottl, grog, mjacob, bde
* - Add two new system calls: preadv() and pwritev() which are like readv()jhb2005-07-071-167/+198
| | | | | | | | | | | | | | | | | and writev() except that they take an additional offset argument and do not change the current file position. In SAT speak: preadv:readv::pread:read and pwritev:writev::pwrite:write. - Try to reduce code duplication some by merging most of the old kern_foov() and dofilefoo() functions into new dofilefoo() functions that are called by kern_foov() and kern_pfoov(). The non-v functions now all generate a simple uio on the stack from the passed in arguments and then call kern_foov(). For example, read() now just builds a uio and calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev(). PR: kern/80362 Submitted by: Marc Olzheim marcolz at stack dot nl (1) Approved by: re (scottl) MFC after: 1 week
* Conditionally weaken sys_generic.c rev 1.136 to allow certain dubiouspeter2005-06-301-2/+7
| | | | | | | | | | | | | ioctl numbers in backwards compatability mode. eg: an IOC_IN ioctl with a size of zero. Traditionally this was what you did before IOC_VOID existed, and we had some established users of this in the tree, namely procfs. Certain 3rd party drivers with binary userland components also have this too. This is necessary to have 4.x and 5.x binaries use these ioctl's. We found this at work when trying to run 4.x binaries. Approved by: re
* Implement kern_adjtime(), kern_readv(), kern_sched_rr_get_interval(),jhb2005-03-311-19/+32
| | | | | kern_settimeofday(), and kern_writev() to allow for further stackgap reduction in the compat ABIs.
* Declare "cnt" (a number of bytes to read or write) as an "ssize_t", notcperciva2005-02-101-2/+4
| | | | | | as a "long" in dofileread() and dofilewrite(). Discussed with: jhb
* Previously a read of zero bytes got handled in devfs:vop_read() but Iphk2005-01-251-0/+12
| | | | | | | | | missed that when the vnode bypass was introduced. Deal with zero length transfers before we even get to fo_ops->fo_read(). Found by: Slawa Olhovchenkov <slwzxy.spb.ru@zxy.spb.ru> PR: 75758
* Detect sign-extension bugs in the ioctl(2) command argument: Truncatephk2005-01-181-0/+6
| | | | to 32 bits and print warning.
* /* -> /*- for copyright notices, minor format tweaks as necessaryimp2005-01-061-1/+1
|
* Push Giant down through ioctl.phk2004-11-171-7/+0
| | | | | | | | | | | | | Don't grab Giant in the upper syscall/wrapper code NET_LOCK_GIANT in the socket code (sockets/fifos). mtx_lock(&Giant) in the vnode code. mtx_lock(&Giant) in the opencrypto code. (This may actually not be needed, but better safe than sorry). Devfs grabs Giant if the driver is marked as needing Giant.
* Push Giant down through select and poll.phk2004-11-171-12/+1
| | | | | | | | | | Don't grab Giant in the upper syscall/wrapper code NET_LOCK_GIANT in the socket code (sockets/fifos). mtx_lock(&Giant) in the vnode code. Devfs grabs Giant if the driver is marked as needing Giant.
* Polish code to correctly reflect structure.phk2004-11-161-19/+10
|
* Rearrange memory management for ioctl arguments to use stronger checksphk2004-11-141-25/+16
| | | | for illegal values and don't store them on the stack any more.
* style polish.phk2004-11-141-8/+6
|
* Introduce an alias for FILEDESC_{UN}LOCK() with the suffix _FAST.phk2004-11-131-6/+6
| | | | | | | | Use this in all the places where sleeping with the lock held is not an issue. The distinction will become significant once we finalize the exact lock-type to use for this kind of case.
* Poll() uses the array smallbits that is big enough to hold 32 structandre2004-08-271-3/+3
| | | | | | | | | | | | pollfd's to avoid calling malloc() on small numbers of fd's. Because smalltype's members have type char, its address might be misaligned for a struct pollfd. Change the array of char to an array of struct pollfd. PR: kern/58214 Submitted by: Stefan Farfeleder <stefan@fafoe.narf.at> Reviewed by: bde (a long time ago) MFC after: 3 days
* Clean up and wash struct iovec and struct uio handling.phk2004-07-101-154/+57
| | | | | | | | | | | | Add copyiniov() which copies a struct iovec array in from userland into a malloc'ed struct iovec. Caller frees. Change uiofromiov() to malloc the uio (caller frees) and name it copyinuio() which is more appropriate. Add cloneuio() which returns a malloc'ed copy. Caller frees. Use them throughout.
* Remove advertising clause from University of California Regent's license,imp2004-04-051-4/+0
| | | | | | per letter dated July 22, 1999. Approved by: core
* Add annotations to mtx_lock(&Giant) in kern_select() and poll() thatrwatson2004-03-131-0/+8
| | | | | | we always grab Giant, even if we're actually only polling objects that don't require giant. Once socket locking is merged, there will be strong motivation to fix this.
* Switch the sleep/wakeup and condition variable implementations to use thejhb2004-02-271-8/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | sleep queue interface: - Sleep queues attempt to merge some of the benefits of both sleep queues and condition variables. Having sleep qeueus in a hash table avoids having to allocate a queue head for each wait channel. Thus, struct cv has shrunk down to just a single char * pointer now. However, the hash table does not hold threads directly, but queue heads. This means that once you have located a queue in the hash bucket, you no longer have to walk the rest of the hash chain looking for threads. Instead, you have a list of all the threads sleeping on that wait channel. - Outside of the sleepq code and the sleep/cv code the kernel no longer differentiates between cv's and sleep/wakeup. For example, calls to abortsleep() and cv_abort() are replaced with a call to sleepq_abort(). Thus, the TDF_CVWAITQ flag is removed. Also, calls to unsleep() and cv_waitq_remove() have been replaced with calls to sleepq_remove(). - The sched_sleep() function no longer accepts a priority argument as sleep's no longer inherently bump the priority. Instead, this is soley a propery of msleep() which explicitly calls sched_prio() before blocking. - The TDF_ONSLEEPQ flag has been dropped as it was never used. The associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been dropped and replaced with a single explicit clearing of td_wchan. TD_SET_ONSLEEPQ() would really have only made sense if it had taken the wait channel and message as arguments anyway. Now that that only happens in one place, a macro would be overkill.
* Locking for the per-process resource limits structure.jhb2004-02-041-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
* pread/pwrite:ache2004-01-201-4/+9
| | | | follow lseek spirit - return EINVAL on negative offset for non-VCHR
* - Implement selwakeuppri() which allows raising the priority of atanimura2003-11-091-3/+24
| | | | | | | | | | | | | thread being waken up. The thread waken up can run at a priority as high as after tsleep(). - Replace selwakeup()s with selwakeuppri()s and pass appropriate priorities. - Add cv_broadcastpri() which raises the priority of the broadcast threads. Used by selwakeuppri() if collision occurs. Not objected in: -arch, -current
* Introduce no_poll() default method for device drivers. Have itphk2003-09-271-11/+0
| | | | | | | | | | | | | | | | | | | | do exactly the same as vop_nopoll() for consistency and put a comment in the two pointing at each other. Retire seltrue() in favour of no_poll(). Create private default functions in kern_conf.c instead of public ones. Change default strategy to return the bio with ENODEV instead of doing nothing which would lead the bio stranded. Retire public nullopen() and nullclose() as well as the entire band of public no{read,write,ioctl,mmap,kqfilter,strategy,poll,dump} funtions, they are the default actions now. Move the final two trivial functions from subr_xxx.c to kern_conf.c and retire the now empty subr_xxx.c
* Remove Giant from writev(2). Eliminate trivial style differences betweenalc2003-08-011-11/+4
| | | | writev(2) and readv(2).
* Introduce a new flag on a file descriptor: DFLAG_SEEKABLE and use thatphk2003-06-181-4/+4
| | | | rather than assume that only DTYPE_VNODE is seekable.
* Use __FBSDID().obrien2003-06-111-1/+3
|
* Deprecate machine/limits.h in favor of new sys/limits.h.kan2003-04-291-2/+1
| | | | | | | Change all in-tree consumers to include <sys/limits.h> Discussed on: standards@ Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
* Back out M_* changes, per decision of the TRB.imp2003-02-191-7/+7
| | | | Approved by: trb
* Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.alfred2003-01-211-7/+7
| | | | Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
* SCARGS removal take II.alfred2002-12-141-6/+6
|
* Backout removal SCARGS, the code freeze is only "selectively" over.alfred2002-12-131-6/+6
|
* Remove SCARGS.alfred2002-12-131-6/+6
| | | | Reviewed by: md5
* Be consistent about "static" functions: if the function is markedphk2002-09-281-1/+1
| | | | | | static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512
OpenPOWER on IntegriCloud