summaryrefslogtreecommitdiffstats
path: root/sys/kern/uipc_syscalls.c
Commit message (Collapse)AuthorAgeFilesLines
* Fixed bug in calculation of amount of file to send when nbytes !=0 anddg2002-01-221-3/+6
| | | | | | | | | headers or trailers are supplied. Reported by Vladislav Shabanov <vs@rambler-co.ru>. PR: 33771 Submitted by: Maxim Konovalov <maxim@macomnet.ru> MFC after: 3 days
* SMP Lock struct file, filedesc and the global file list.alfred2002-01-131-7/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Seigo Tanimura (tanimura) posted the initial delta. I've polished it quite a bit reducing the need for locking and adapting it for KSE. Locks: 1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked. 1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex. 1 sx lock for the global filelist. struct file * fhold(struct file *fp); /* increments reference count on a file */ struct file * fhold_locked(struct file *fp); /* like fhold but expects file to locked */ struct file * ffind_hold(struct thread *, int fd); /* finds the struct file in thread, adds one reference and returns it unlocked */ struct file * ffind_lock(struct thread *, int fd); /* ffind_hold, but returns file locked */ I still have to smp-safe the fget cruft, I'll get to that asap.
* Sockets are called 'so' not 'sp'.alfred2002-01-091-8/+8
|
* o Make the credential used by socreate() an explicit argument torwatson2001-12-311-3/+6
| | | | | | | | | | | | | | socreate(), rather than getting it implicitly from the thread argument. o Make NFS cache the credential provided at mount-time, and use the cached credential (nfsmount->nm_cred) when making calls to socreate() on initially connecting, or reconnecting the socket. This fixes bugs involving NFS over TCP and ipfw uid/gid rules, as well as bugs involving NFS and mandatory access control implementations. Reviewed by: freebsd-arch
* Give struct socket structures a ref counting interface similar todillon2001-11-171-134/+71
| | | | | | | vnodes. This will hopefully serve as a base from which we can expand the MP code. We currently do not attempt to obtain any mutex or SX locks, but the door is open to add them when we nail down exactly how that part of it is going to work.
* remove holdfp()dillon2001-11-141-17/+5
| | | | | | | | | | | | | | | | | | | Replace uses of holdfp() with fget*() or fgetvp*() calls as appropriate introduce fget(), fget_read(), fget_write() - these functions will take a thread and file descriptor and return a file pointer with its ref count bumped. introduce fgetvp(), fgetvp_read(), fgetvp_write() - these functions will take a thread and file descriptor and return a vref()'d vnode. *_read() requires that the file pointer be FREAD, *_write that it be FWRITE. This continues the cleanup of struct filedesc and struct file access routines which, when are all through with it, will allow us to then make the API calls MP safe and be able to move Giant down into the fo_* functions.
* KSE Milestone 2julian2001-09-121-176/+168
| | | | | | | | | | | | | | Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
* Giant pushdown syscalls in kern/uipc_syscalls.c. Affected calls:dillon2001-08-311-67/+230
| | | | | | recvmsg(), sendmsg(), recvfrom(), accept(), getpeername(), getsockname(), socket(), connect(), accept(), send(), recv(), bind(), setsockopt(), listen(), sendto(), shutdown(), socketpair(), sendfile()
* With Alfred's permission, remove vm_mtx in favor of a fine-grained approachdillon2001-07-041-13/+4
| | | | | | | | | (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
* Don't dereference a NULL pointer if we fail to get a sendfilebuf.dwmalone2001-06-241-1/+2
|
* Add vm locking to sendfile(2) and sf_buf_free().jhb2001-05-251-5/+13
| | | | | Reported by: Tamiji Homma <thomma@BayNetworks.com> Tested by: Tamiji Homma <thomma@BayNetworks.com>
* Undo part of the tangle of having sys/lock.h and sys/mutex.h included inmarkm2001-05-011-2/+5
| | | | | | | | | | | other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
* Revert consequences of changes to mount.h, part 2.grog2001-04-291-1/+0
| | | | Requested by: bde
* Sendfile is documented to return 0 on success, however if when aalfred2001-04-261-0/+7
| | | | | | | | | | | sf_hdtr is used to provide writev(2) style headers/trailers on the sent data the return value is actually either the result of writev(2) from the trailers or headers of no tailers are specified. Fix sendfile to comply with the documentation, by returning 0 on success. Ok'd by: dg
* Correct #includes to work with fixed sys/mount.h.grog2001-04-231-0/+1
|
* Fix is a similar race condition as existed in the mbuf code. When we gobmilekic2001-03-081-6/+7
| | | | | | | | | | | | into an interruptable sleep and we increment a sleep count, we make sure that we are the thread that will decrement the count when we wakeup. Otherwise, what happens is that if we get interrupted (signal) and we have to wake up, but before we get our mutex, some thread that wants to wake us up detects that the count is non-zero and so enters wakeup_one(), but there's nothing on the sleep queue and so we don't get woken up. The thread will still decrement the sleep count, which is bad because we will also decrement it again later (as we got interrupted) and are already off the sleep queue.
* Make the wait for sendfile buffers interruptable. Stops one processdwmalone2001-03-081-3/+24
| | | | | | | | consuming them all and then getting stuck. Reviewed by: dg Reviewed by: bmilekic Observed by: Andreas Persson <pap@garen.net>
* Grab the process lock while calling psignal and before calling psignal.jhb2001-03-071-1/+4
|
* Return ECONNABORTED from accept if connection is closed while on thejlemon2001-02-141-1/+14
| | | | | | | listen queue, as well as the current behavior of a zero-length sockaddr. Obtained from: KAME Reviewed by: -net
* Change and clean the mutex lock interface.bmilekic2001-02-091-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
* Fix the <sys/queue.h> abuse.phk2001-01-021-8/+7
| | | | | Submitted by: Dima Dorfman <dima@unixfreak.org> Reviewed by: /sbin/md5
* Add an XXX about a <sys/queue.h> transgression which needs cleaned up.phk2001-01-021-0/+1
|
* * Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.bmilekic2000-12-211-3/+3
| | | | | | | | | | | | | | | | | | This is because calls with M_WAIT (now M_TRYWAIT) may not wait forever when nothing is available for allocation, and may end up returning NULL. Hopefully we now communicate more of the right thing to developers and make it very clear that it's necessary to check whether calls with M_(TRY)WAIT also resulted in a failed allocation. M_TRYWAIT basically means "try harder, block if necessary, but don't necessarily wait forever." The time spent blocking is tunable with the kern.ipc.mbuf_wait sysctl. M_WAIT is now deprecated but still defined for the next little while. * Fix a typo in a comment in mbuf.h * Fix some code that was actually passing the mbuf subsystem's M_WAIT to malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the value of the M_WAIT flag, this could have became a big problem.
* Convert more malloc+bzero to malloc+M_ZERO.dwmalone2000-12-081-2/+2
| | | | | Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>
* Changed second argument in a call to sf_buf_free() to be NULL instead ofdg2000-12-031-1/+1
| | | | | | | PAGE_SIZE to match the prototype better. The argument is ignored, so this is just to silence the compile-time warning. Pointed out by: jhb
* Make sure to free the sf_buf if we've allocated it but fail to allocatebmilekic2000-12-021-0/+1
| | | | | | | an mbuf (ENOBUFS) before returning so that we don't leak sf_bufs in the case where we're out of mbufs. Submitted by: David Greenman (dg)
* This patchset fixes a large number of file descriptor race conditions.dillon2000-11-181-66/+151
| | | | | | | | | | | | Pre-rfork code assumed inherent locking of a process's file descriptor array. However, with the advent of rfork() the file descriptor table could be shared between processes. This patch closes over a dozen serious race conditions related to one thread manipulating the table (e.g. closing or dup()ing a descriptor) while another is blocked in an open(), close(), fcntl(), read(), write(), etc... PR: kern/11629 Discussed with: Alexander Viro <viro@math.psu.edu>
* Fixed a certain panic on IO error in sendfile(): Page must be set PG_BUSYdg2000-11-121-1/+3
| | | | before calling vm_page_free() on it.
* * Have m_pulldown() use the new M_WRITABLE() macro in order to determinebmilekic2000-11-111-1/+2
| | | | | | | | | | | | whether the given ext_buf is shared. * Have the sf_bufs be setup with the mbuf subsystem using MEXTADD() with the two new arguments. Note: m_pulldown() is somewhat crotchy; the added comment explains the situation. Reviewed by: jlemon
* Change the sf_bufs wakeups to be wakeup_one(), because we don't want tobmilekic2000-11-041-4/+5
| | | | | | | | | wakeup all of the sleeping threads when we free only one buffer. This avoids us having to needlessly try again (and fail, and go back to sleep) for all the threads sleeping. We will now only wakeup the thread we know will succeed. Reviewed by: green
* Setup and put to use the mutex lock for sf_freelist, the sendfile(2) bufsbmilekic2000-11-041-9/+18
| | | | | | | freelist. Should now be thread-friendly, in part. Note: More work is needed in uipc_syscalls.c, but it will have to wait until the socket locking issues are at least 80% implemented and committed.
* Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT.bp2000-09-121-2/+1
| | | | | | | They will be used by nullfs and other stacked filesystems to support full cache coherency. Reviewed in general by: mckusick, dillon
* Replace the mbuf external reference counting code with somethingdwmalone2000-08-191-43/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | that should be better. The old code counted references to mbuf clusters by using the offset of the cluster from the start of memory allocated for mbufs and clusters as an index into an array of chars, which did the reference counting. If the external storage was not a cluster then reference counting had to be done by the code using that external storage. NetBSD's system of linked lists of mbufs was cosidered, but Alfred felt it would have locking issues when the kernel was made more SMP friendly. The system implimented uses a pool of unions to track external storage. The union contains an int for counting the references and a pointer for forming a free list. The reference counts are incremented and decremented atomically and so should be SMP friendly. This system can track reference counts for any sort of external storage. Access to the reference counting stuff is now through macros defined in mbuf.h, so it should be easier to make changes to the system in the future. The possibility of storing the reference count in one of the referencing mbufs was considered, but was rejected 'cos it would often leave extra mbufs allocated. Storing the reference count in the cluster was also considered, but because the external storage may not be a cluster this isn't an option. The size of the pool of reference counters is available in the stats provided by "netstat -m". PR: 19866 Submitted by: Bosko Milekic <bmilekic@dsuper.net> Reviewed by: alfred (glanced at by others on -net)
* Modify ktrace's general I/O tracing, ktrgenio(), to use a struct uio *green2000-07-021-6/+14
| | | | | | | | | | | | | instead of a struct iovec * array and int len. Get rid of stupidly trying to allocate all of the memory and copyin()ing the entire iovec[], and instead just do the proper VOP_WRITE() in ktrwrite() using a copy of the struct uio that the syscall originally used. This solves the DoS which could easily be performed; to work around the DoS, one could also remove "options KTRACE" from the kernel. This is a very strong MFC candidate for 4.1. Found by: art@OpenBSD.org
* unstatic getfp() so that other subsystems can use it.alfred2000-06-121-3/+2
| | | | | | make sendfile() use it. Approved by: dg
* Back out the previous change to the queue(3) interface.jake2000-05-261-1/+1
| | | | | | It was not discussed and should probably not happen. Requested by: msmith and others
* Change the way that the queue(3) structures are declared; don't assume thatjake2000-05-231-1/+1
| | | | | | | | the type argument to *_HEAD and *_ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd
* Introduce kqueue() and kevent(), a kernel event notification facility.jlemon2000-04-161-0/+4
|
* This is Bosko Milekic's mbuf allocation waiting code. Basically, thisgreen1999-12-121-0/+4
| | | | | | | | means that running out of mbuf space isn't a panic anymore, and code which runs out of network memory will sleep to wait for it. Submitted by: Bosko Milekic <bmilekic@dsuper.net> Reviewed by: green, wollman
* General clean-up of socket.h and associated sources to synchronise upphk1999-11-241-1/+1
| | | | | | | | | | | | with NetBSD and the Single Unix Specification v2. This updates some structures with other, almost equivalent types and effort is under way to get the whole more consistent. Also removes a double definition of INET6 and some other clean-ups. Reviewed by: green, bde, phk Some part obtained from: NetBSD, SUSv2 specification
* This is a partial commit of the patch from PR 14914:phk1999-11-161-3/+3
| | | | | | | | | | | | | Alot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY structures for list operations. This patch makes all list operations in sys/kern use the queue(3) macros, rather than directly accessing the *Q_{HEAD,ENTRY} structures. This batch of changes compile to the same object files. Reviewed by: phk Submitted by: Jake Burkholder <jake@checker.org> PR: 14914
* useracc() the prequel:phk1999-10-291-1/+0
| | | | | | | | | | | Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.
* Add a missing spl lowering.green1999-10-141-0/+1
| | | | Submitted by: Ville-Pertti Keinonen <will@iki.fi>
* Trim unused options (or #ifdef for undoc options).peter1999-10-111-2/+0
| | | | Submitted by: phk
* Plug a potential filedescriptor leak. This will probably almostguido1999-09-301-1/+6
| | | | | | never be triggered. Reviewed by: David Greenman
* This is what was "fdfix2.patch," a fix for fd sharing. It's prettygreen1999-09-191-0/+4
| | | | | | | | | | | | | | | | | far-reaching in fd-land, so you'll want to consult the code for changes. The biggest change is that now, you don't use fp->f_ops->fo_foo(fp, bar) but instead fo_foo(fp, bar), which increments and decrements the fp refcount upon entry and exit. Two new calls, fhold() and fdrop(), are provided. Each does what it seems like it should, and if fdrop() brings the refcount to zero, the fd is freed as well. Thanks to peter ("to hell with it, it looks ok to me.") for his review. Thanks to msmith for keeping me from putting locks everywhere :) Reviewed by: peter
* $Id$ -> $FreeBSD$peter1999-08-281-1/+1
|
* Fix fd race conditions (during shared fd table usage.) Badfileops isgreen1999-08-041-14/+11
| | | | | | | | | | | | now used in f_ops in place of NULL, and modifications to the files are more carefully ordered. f_ops should also be set to &badfileops upon "close" of a file. This does not fix other problems mentioned in this PR than the first one. PR: 11629 Reviewed by: peter
* Fix warnings in preparation for adding -Wall -Wcast-qual to thedillon1999-01-271-3/+3
| | | | kernel compile
* Don't free the socket address if soaccept() / pru_accept() doesn'tfenner1999-01-251-2/+3
| | | | return one.
OpenPOWER on IntegriCloud