summaryrefslogtreecommitdiffstats
path: root/sys/kern/vfs_aio.c
Commit message (Collapse)AuthorAgeFilesLines
* Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() toattilio2013-02-201-2/+2
| | | | | | their "write" versions. Sponsored by: EMC / Isilon storage division
* Switch vm_object lock to be a rwlock.attilio2013-02-201-0/+1
| | | | | | | | * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations * VM_OBJECT_SLEEP() is introduced as a general purpose primitve to get a sleep operation using a VM_OBJECT_LOCK() as protection * The approach must bear with vm_pager.h namespace pollution so many files require including directly rwlock.h
* Remove the support for using non-mpsafe filesystem modules.kib2012-10-221-3/+0
| | | | | | | | | | | | In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
* Add 32-bit compat code for AIO kevent flags introduced in revision 230857.davidxu2012-02-051-0/+1
|
* If multiple threads call kevent() to get AIO events on same kqueue fd,davidxu2012-02-011-1/+7
| | | | | | | | | | | | | | | it is possible that a single AIO event will be reported to multiple threads, it is not threading friendly, and the existing API can not control this behavior. Allocate a kevent flags field sigev_notify_kevent_flags for AIO event notification in sigevent, and allow user to pass EV_CLEAR, EV_DISPATCH or EV_ONESHOT to AIO kernel code, user can control whether the event should be cleared once it is retrieved by a thread. This change should be comptaible with existing application, because the field should have already been zero-filled, and no additional action will be taken by kernel. PR: kern/156567
* When detaching an AIO or LIO requests grab the lock and tell knlist_removeambrisko2012-01-301-6/+12
| | | | | | | | that we have the lock now. This cleans up a locking panic ASSERT when knlist_empty is called without a lock when INVARIANTS etc. are turned. Reviewed by: kib jhb MFC after: 1 week
* Fix size check, that prevents getting negative after castingglebius2012-01-271-1/+1
| | | | | | to a signed type Reviewed by: bde
* Although aio_nbytes is size_t, later is is signed toglebius2012-01-261-0/+6
| | | | | | | | | casted types: to ssize_t in filesystem code and to int in buf code, thus supplying a negative argument leads to kernel panic later. To fix that check user supplied argument in the beginning of syscall. Submitted by: Maxim Dounin <mdounin mdounin.ru>, maxim@
* In order to maximize the re-usability of kernel code in user space thiskmacy2011-09-161-15/+15
| | | | | | | | | | | | | patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
* Second-to-last commit implementing Capsicum capabilities in the FreeBSDrwatson2011-08-111-10/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc
* Create a global thread hash table to speed up thread lookup, usedavidxu2010-10-091-4/+8
| | | | | | | | | | rwlock to protect the table. In old code, thread lookup is done with process lock held, to find a thread, kernel has to iterate through process and thread list, this is quite inefficient. With this change, test shows in extreme case performance is dramatically improved. Earlier patch was reviewed by: jhb, julian
* Convert aio syscall registration to SYSCALL_INIT_HELPER.kib2010-03-191-33/+59
| | | | | Reviewed by: jhb MFC after: 2 weeks
* Provide groundwork for 32-bit binary compatibility on non-x86 platforms,nwhitehorn2010-03-111-1/+1
| | | | | | | | | for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. Reviewed by: kib, jhb
* Use C99 initialization for struct filterops.rwatson2009-09-121-4/+12
| | | | | | Obtained from: Mac OS X Sponsored by: Apple Inc. MFC after: 3 weeks
* Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Usekib2009-06-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | vnode interlock to protect the knote fields [1]. The locking assumes that shared vnode lock is held, thus we get exclusive access to knote either by exclusive vnode lock protection, or by shared vnode lock + vnode interlock. Do not use kl_locked() method to assert either lock ownership or the fact that curthread does not own the lock. For shared locks, ownership is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared lock not owned by curthread, causing false positives in kqueue subsystem assertions about knlist lock. Remove kl_locked method from knlist lock vector, and add two separate assertion methods kl_assert_locked and kl_assert_unlocked, that are supposed to use proper asserts. Change knlist_init accordingly. Add convenience function knlist_init_mtx to reduce number of arguments for typical knlist initialization. Submitted by: jhb [1] Noted by: jhb [2] Reviewed by: jhb Tested by: rnoland
* Rework socket upcalls to close some races with setup/teardown of upcalls.jhb2009-06-011-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Each socket upcall is now invoked with the appropriate socket buffer locked. It is not permissible to call soisconnected() with this lock held; however, so socket upcalls now return an integer value. The two possible values are SU_OK and SU_ISCONNECTED. If an upcall returns SU_ISCONNECTED, then the soisconnected() will be invoked on the socket after the socket buffer lock is dropped. - A new API is provided for setting and clearing socket upcalls. The API consists of soupcall_set() and soupcall_clear(). - To simplify locking, each socket buffer now has a separate upcall. - When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from the receive socket buffer automatically. Note that a SO_SND upcall should never return SU_ISCONNECTED. - All this means that accept filters should now return SU_ISCONNECTED instead of calling soisconnected() directly. They also no longer need to explicitly clear the upcall on the new socket. - The HTTP accept filter still uses soupcall_set() to manage its internal state machine, but other accept filters no longer have any explicit knowlege of socket upcall internals aside from their return value. - The various RPC client upcalls currently drop the socket buffer lock while invoking soreceive() as a temporary band-aid. The plan for the future is to add a new flag to allow soreceive() to be called with the socket buffer locked. - The AIO callback for socket I/O is now also invoked with the socket buffer locked. Previously sowakeup() would drop the socket buffer lock only to call aio_swake() which immediately re-acquired the socket buffer lock for the duration of the function call. Discussed with: rwatson, rmacklem
* Use the correct type for the timeout parameter to the 32-bitjhb2009-01-231-1/+1
| | | | | | | | compat version aio_waitcomplete(). Reminded by: bz Submitted by: jamie MFC after: 3 days
* - Add 32-bit compat system calls for VFS_AIO. The system calls live in thejhb2008-12-101-126/+755
| | | | | | | | | | | | | | | | | | | | | | aio code and are registered via the recently added SYSCALL32_*() helpers. - Since the aio code likes to invoke fuword and suword a lot down in the "bowels" of system calls, add a structure holding a set of operations for things like storing errors, copying in the aiocb structure, storing status, etc. The 32-bit system calls use a separate operations vector to handle fuword32 vs fuword, etc. Also, the oldsigevent handling is now done by having seperate operation vectors with different aiocb copyin routines. - Split out kern_foo() functions for the various AIO system calls so the 32-bit front ends can manage things like copying in and converting timespec structures, etc. - For both the native and 32-bit aio_suspend() and lio_listio() calls, just use copyin() to read the array of aiocb pointers instead of using a for loop that iterated over fuword/fuword32. The error handling in the old case was incomplete (lio_listio() just ignored any aiocb's that it got an EFAULT trying to read rather than reporting an error), and possibly slower. MFC after: 1 month
* Use minimum of max_aio_procs and target_aio_procs when spawning newgonzo2008-06-211-1/+1
| | | | aiod since there should be no more then max_aio_procs processes.
* Use FEATURE() macro to advertise aio availability.rwatson2008-02-011-0/+2
|
* When asked to use kqueue, AIO stores its internal state in thedumbbell2008-01-241-4/+6
| | | | | | | | | | | | | `kn_sdata' member of the newly registered knote. The problem is that this member is overwritten by a call to kevent(2) with the EV_ADD flag, targetted at the same kevent/knote. For instance, a userland application may set the pointer to NULL, leading to a panic. A testcase was provided by the submitter. PR: kern/118911 Submitted by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp> MFC after: 1 day
* VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used inattilio2008-01-131-1/+1
| | | | | | | | | | | conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
* vn_lock() is currently only used with the 'curthread' passed as argument.attilio2008-01-101-1/+1
| | | | | | | | | | | | | | | | Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
* Rename the kthread_xxx (e.g. kthread_create()) callsjulian2007-10-201-2/+2
| | | | | | | | | | | to kproc_xxx as they actually make whole processes. Thos makes way for us to add REAL kthread_create() and friends that actually make theads. it turns out that most of these calls actually end up being moved back to the thread version when it's added. but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.
* Destroy the kaio_mtx on the freeing the struct kaioinfo in thekib2007-08-201-1/+5
| | | | | | | | | | aio_proc_rundown. Do not allow for zero-length read to be passed to the fo_read file method by aio. Reported and tested by: Peter Holm Approved by: re (kensmith)
* Remove unused variable.mjacob2007-06-101-1/+0
|
* - Move rusage from being per-process in struct pstats to per-thread injeff2007-06-011-10/+8
| | | | | | | | | | | | | | | | | | | td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)
* Further system call comment cleanup:rwatson2007-03-051-3/+3
| | | | | | | | | | - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.
* Merge posix4/* into normal kernel hierarchy.trhodes2006-11-111-1/+1
| | | | | Reviewed by: glanced at by jhb Approved by: silence on -arch@ and -standards@
* MFP4 (with some minor changes):netchild2006-10-151-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement the linux_io_* syscalls (AIO). They are only enabled if the native AIO code is available (either compiled in to the kernel or as a module) at the time the functions are used. If the AIO stuff is not available there will be a ENOSYS. From the submitter: ---snip--- DESIGN NOTES: 1. Linux permits a process to own multiple AIO queues (distinguished by "context"), but FreeBSD creates only one single AIO queue per process. My code maintains a request queue (STAILQ of queue(3)) per "context", and throws all AIO requests of all contexts owned by a process into the single FreeBSD per-process AIO queue. When the process calls io_destroy(2), io_getevents(2), io_submit(2) and io_cancel(2), my code can pick out requests owned by the specified context from the single FreeBSD per-process AIO queue according to the per-context request queues maintained by my code. 2. The request queue maintained by my code stores contrast information between Linux IO control blocks (struct linux_iocb) and FreeBSD IO control blocks (struct aiocb). FreeBSD IO control block actually exists in userland memory space, required by FreeBSD native aio_XXXXXX(2). 3. It is quite troubling that the function io_getevents() of libaio-0.3.105 needs to use Linux-specific "struct aio_ring", which is a partial mirror of context in user space. I would rather take the address of context in kernel as the context ID, but the io_getevents() of libaio forces me to take the address of the "ring" in user space as the context ID. To my surprise, one comment line in the file "io_getevents.c" of libaio-0.3.105 reads: Ben will hate me for this REFERENCE: 1. Linux kernel source code: http://www.kernel.org/pub/linux/kernel/v2.6/ (include/linux/aio_abi.h, fs/aio.c) 2. Linux manual pages: http://www.kernel.org/pub/linux/docs/manpages/ (io_setup(2), io_destroy(2), io_getevents(2), io_submit(2), io_cancel(2)) 3. Linux Scalability Effort: http://lse.sourceforge.net/io/aio.html The design notes: http://lse.sourceforge.net/io/aionotes.txt 4. The package libaio, both source and binary: http://rpmfind.net/linux/rpm2html/search.php?query=libaio Simple transparent interface to Linux AIO system calls. 5. Libaio-oracle: http://oss.oracle.com/projects/libaio-oracle/ POSIX AIO implementation based on Linux AIO system calls (depending on libaio). ---snip--- Submitted by: Li, Xiao <intron@intron.ac>
* hide kqueue_register from public view, and replace it w/ kqfd_register...jmg2006-09-241-33/+6
| | | | this eliminates a possible race in aio registering a kevent..
* Remove call to fdfree() for the AIO daemons to prevent kernel panicsmp2006-09-061-6/+0
| | | | | | | | with linprocfs. This call is not needed since file descriptor sharing was removed in v1.125. Reviewed by: alc, davidxu, ambrisko MFC after: 3 days
* - Change process_exec function handlers prototype to include structnetchild2006-08-151-1/+8
| | | | | | | | | | | | | image_params arg. - Change struct image_params to include struct sysentvec pointer and initialize it. - Change all consumers of process_exit/process_exec eventhandlers to new prototypes (includes splitting up into distinct exec/exit functions). - Add eventhandler to userret. Sponsored by: Google SoC 2006 Submitted by: rdivacky Parts suggested by: jhb (on hackers@)
* Make lio ident more consistant with aio ident.ambrisko2006-06-021-1/+1
|
* Use a dedicated mutex to protect aio queues, the movation is to reducedavidxu2006-05-091-51/+69
| | | | lock contention with other parts.
* 1. Move code for scanning pending I/O from aio_fsync to aio_aqueue,davidxu2006-03-241-77/+51
| | | | | it has less overhead. 2. Avoid scheduling task if maximum number of I/O threads is reached.
* Implement aio_fsync() syscall.davidxu2006-03-231-78/+244
|
* 1. Remove aio entry from lists earlier in aio_free_entry,davidxu2006-02-261-19/+17
| | | | | | | | | | | | | | | | | | so other threads can not see it if we unlock the proc lock (this can happen in knlist_delete). Don't do wakeup, it is not necessary. 2. Decrease kaio_buffer_count in biohelper rather than doing it in aio_bio_done_notify. 3. In aio_bio_done_notify, don't send notification if KAIO_RUNDOWN was set, because the process is already in single thread mode. 4. Use assignment to initialize aiothreadflags. 5. AIOCBLIST_RUNDOWN is not useful, axe the code using it. 6. use LIO_NOP instead of zero.
* If block size is zero, use normal file operations to do I/O,davidxu2006-02-221-0/+3
| | | | | | this eliminates a divided-by-zero fault. Recommended by: phk
* Just like dofilewrite(), call bwillwrite before fo_write.davidxu2006-01-271-0/+2
|
* return final error code in aio_return rather than a hardcoded 0.davidxu2006-01-271-1/+0
|
* in aio_aqueue, store same return code into job->_aiocb_private.error.davidxu2006-01-261-3/+5
| | | | in aio_return, unlock proc lock before suword.
* Add locking annotation and comments about socket, pipe, fifo problem.davidxu2006-01-241-125/+126
| | | | Temporarily fix a locking problem for socket I/O.
* Er, rescure a deleted comment line.davidxu2006-01-241-0/+1
|
* More cleanup for aio code:davidxu2006-01-241-11/+9
| | | | | | | | | 1) unregsiter kqueue filter for EVFILT_LIO. 2) free uma_zones. 3) call setsid directly to enter another session rather than implementing by itself. Submitted by: jhb
* Add bracket.davidxu2006-01-231-1/+1
|
* Verify all supported notification types.davidxu2006-01-231-3/+18
|
* 1) Merge _aio_aqueue and aio_aqueue, check quota in aio_aqueue,davidxu2006-01-231-45/+29
| | | | | so that lio_listio won't exceed the quota. 2) Remove lio_ref_count, it is no longer used.
* Fix a bogus panic.davidxu2006-01-221-1/+1
|
* Decrease kaio_active_count first, because user process may go awaydavidxu2006-01-221-2/+5
| | | | after we notified it.
OpenPOWER on IntegriCloud