path: root/sys/kern/vfs_aio.c
Commit message / Author / Age / Files / Lines
* Convert aio syscall registration to SYSCALL_INIT_HELPER.kib2010-03-191-33/+59
| | | | | Reviewed by: jhb MFC after: 2 weeks
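
For context, the registration pattern this converts to looks roughly like the sketch below. It is illustrative only: the real syscall table in vfs_aio.c is larger, and the exact SYSCALL_INIT_HELPER()/syscall_helper_register() signatures are assumed from the <sys/sysent.h> machinery of that era.

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/syscall.h>
    #include <sys/sysent.h>
    #include <sys/sysproto.h>

    /* One entry per syscall; the helper fills in the sysent for each name. */
    static struct syscall_helper_data aio_syscalls[] = {
            SYSCALL_INIT_HELPER(aio_read),
            SYSCALL_INIT_HELPER(aio_write),
            SYSCALL_INIT_HELPER(aio_return),
            SYSCALL_INIT_LAST
    };

    static int
    aio_load(void)
    {
            /* Registers every entry, backing all of them out on failure. */
            return (syscall_helper_register(aio_syscalls));
    }
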
* Provide groundwork for 32-bit binary compatibility on non-x86 platforms,nwhitehorn2010-03-111-1/+1
| | | | | | | | | for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. Reviewed by: kib, jhb
* Use C99 initialization for struct filterops.rwatson2009-09-121-4/+12
| | | | | | Obtained from: Mac OS X Sponsored by: Apple Inc. MFC after: 3 weeks
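
The change is purely syntactic; a minimal sketch of the designated-initializer form (field names from struct filterops in <sys/event.h>; the handlers here are stub placeholders, the real ones live in vfs_aio.c):

    #include <sys/param.h>
    #include <sys/event.h>

    /* Placeholder handlers; the real ones implement the AIO-specific logic. */
    static int  filt_aioattach(struct knote *kn) { return (0); }
    static void filt_aiodetach(struct knote *kn) { }
    static int  filt_aio(struct knote *kn, long hint) { return (0); }

    /* C99 style: each field named explicitly, immune to field reordering. */
    static struct filterops aio_filtops = {
            .f_isfd = 0,                    /* not keyed on a file descriptor */
            .f_attach = filt_aioattach,
            .f_detach = filt_aiodetach,
            .f_event = filt_aio,
    };
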
* Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Usekib2009-06-101-2/+2
| vnode interlock to protect the knote fields [1]. The locking assumes that the
| shared vnode lock is held; thus we get exclusive access to the knote either
| by exclusive vnode lock protection, or by the shared vnode lock plus the
| vnode interlock.
| Do not use the kl_locked() method to assert either lock ownership or the fact
| that curthread does not own the lock. For shared locks, ownership is not
| recorded; e.g. VOP_ISLOCKED can return LK_SHARED for a shared lock not owned
| by curthread, causing false positives in kqueue subsystem assertions about
| the knlist lock.
| Remove the kl_locked method from the knlist lock vector, and add two separate
| assertion methods, kl_assert_locked and kl_assert_unlocked, that are supposed
| to use proper asserts. Change knlist_init accordingly. Add a convenience
| function, knlist_init_mtx, to reduce the number of arguments for typical
| knlist initialization.
| Submitted by: jhb [1]
| Noted by: jhb [2]
| Reviewed by: jhb
| Tested by: rnoland
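
A sketch of the convenience initializer mentioned above, for a knlist protected by a private mutex (the object and lock names are hypothetical):

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <sys/event.h>

    struct my_obj {
            struct mtx      o_mtx;
            struct knlist   o_klist;        /* knotes watching this object */
    };

    static void
    my_obj_init(struct my_obj *o)
    {
            mtx_init(&o->o_mtx, "my_obj", NULL, MTX_DEF);
            /* One call instead of building a knlist lock vector by hand. */
            knlist_init_mtx(&o->o_klist, &o->o_mtx);
    }
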
* Rework socket upcalls to close some races with setup/teardown of upcalls.jhb2009-06-011-2/+1
| - Each socket upcall is now invoked with the appropriate socket buffer
|   locked. It is not permissible to call soisconnected() with this lock held,
|   however, so socket upcalls now return an integer value. The two possible
|   values are SU_OK and SU_ISCONNECTED. If an upcall returns SU_ISCONNECTED,
|   then soisconnected() will be invoked on the socket after the socket buffer
|   lock is dropped.
| - A new API is provided for setting and clearing socket upcalls. The API
|   consists of soupcall_set() and soupcall_clear().
| - To simplify locking, each socket buffer now has a separate upcall.
| - When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from the
|   receive socket buffer automatically. Note that a SO_SND upcall should never
|   return SU_ISCONNECTED.
| - All this means that accept filters should now return SU_ISCONNECTED instead
|   of calling soisconnected() directly. They also no longer need to explicitly
|   clear the upcall on the new socket.
| - The HTTP accept filter still uses soupcall_set() to manage its internal
|   state machine, but other accept filters no longer have any explicit
|   knowledge of socket upcall internals aside from their return value.
| - The various RPC client upcalls currently drop the socket buffer lock while
|   invoking soreceive() as a temporary band-aid. The plan for the future is to
|   add a new flag to allow soreceive() to be called with the socket buffer
|   locked.
| - The AIO callback for socket I/O is now also invoked with the socket buffer
|   locked. Previously sowakeup() would drop the socket buffer lock only to
|   call aio_swake(), which immediately re-acquired the socket buffer lock for
|   the duration of the function call.
| Discussed with: rwatson, rmacklem
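
A sketch of the new upcall API described above (the handler and attach names are hypothetical; the upcall runs with the receive socket buffer locked):

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <sys/socket.h>
    #include <sys/socketvar.h>

    /* Called with so->so_rcv locked; must not call soisconnected() itself. */
    static int
    my_rcv_upcall(struct socket *so, void *arg, int waitflag)
    {
            /*
             * Returning SU_ISCONNECTED asks the caller to invoke
             * soisconnected() after the socket buffer lock is dropped
             * (and clears the SO_RCV upcall automatically).
             */
            return (SU_OK);
    }

    static void
    my_watch_socket(struct socket *so, void *arg)
    {
            SOCKBUF_LOCK(&so->so_rcv);
            soupcall_set(so, SO_RCV, my_rcv_upcall, arg);
            SOCKBUF_UNLOCK(&so->so_rcv);
    }
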
* Use the correct type for the timeout parameter to the 32-bitjhb2009-01-231-1/+1
| | | | | | | | compat version of aio_waitcomplete(). Reminded by: bz Submitted by: jamie MFC after: 3 days
* - Add 32-bit compat system calls for VFS_AIO. The system calls live in thejhb2008-12-101-126/+755
| aio code and are registered via the recently added SYSCALL32_*() helpers.
| - Since the aio code likes to invoke fuword and suword a lot down in the
|   "bowels" of system calls, add a structure holding a set of operations for
|   things like storing errors, copying in the aiocb structure, storing status,
|   etc. The 32-bit system calls use a separate operations vector to handle
|   fuword32 vs. fuword, etc. Also, the oldsigevent handling is now done by
|   having separate operation vectors with different aiocb copyin routines.
| - Split out kern_foo() functions for the various AIO system calls so the
|   32-bit front ends can manage things like copying in and converting timespec
|   structures, etc.
| - For both the native and 32-bit aio_suspend() and lio_listio() calls, just
|   use copyin() to read the array of aiocb pointers instead of using a for
|   loop that iterated over fuword/fuword32. The error handling in the old case
|   was incomplete (lio_listio() just ignored any aiocb's that it got an EFAULT
|   trying to read rather than reporting an error), and possibly slower.
| MFC after: 1 month
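
The per-ABI operations vector described above might look roughly like the sketch below; the member names are illustrative, not a copy of the committed structure:

    struct aiocb;

    /* One instance for the native ABI (fuword/suword) and one for the
     * freebsd32 ABI (fuword32/suword32 plus aiocb conversion on copyin). */
    struct aiocb_ops {
            int     (*ops_copyin)(struct aiocb *ujob, struct aiocb *kjob);
            long    (*ops_fetch_status)(struct aiocb *ujob);
            long    (*ops_fetch_error)(struct aiocb *ujob);
            int     (*ops_store_status)(struct aiocb *ujob, long status);
            int     (*ops_store_error)(struct aiocb *ujob, long error);
    };
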
* Use minimum of max_aio_procs and target_aio_procs when spawning newgonzo2008-06-211-1/+1
| | | | aiod since there should be no more than max_aio_procs processes.
* Use FEATURE() macro to advertise aio availability.rwatson2008-02-011-0/+2
|
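
The mechanism is a single macro invocation; a sketch (the description string in vfs_aio.c may differ):

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/sysctl.h>

    /* Exposes kern.features.aio = 1 so userland can probe for AIO support. */
    FEATURE(aio, "Asynchronous I/O");
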
* When asked to use kqueue, AIO stores its internal state in thedumbbell2008-01-241-4/+6
| | | | | | | | | | | | | `kn_sdata' member of the newly registered knote. The problem is that this member is overwritten by a call to kevent(2) with the EV_ADD flag, targeted at the same kevent/knote. For instance, a userland application may set the pointer to NULL, leading to a panic. A testcase was provided by the submitter. PR: kern/118911 Submitted by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp> MFC after: 1 day
* VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used inattilio2008-01-131-1/+1
| | | | | | | | | conjunction with a 'thread' argument that is always curthread. Remove the useless extra argument and explicitly pass curthread to lower-layer functions when necessary. The KPI is broken by this change, which should affect several ports, so a version bump and manpage updates will be committed separately. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
* vn_lock() is currently only used with the 'curthread' passed as argument.attilio2008-01-101-1/+1
| | | | | | | | | | | | | | | Remove this argument and pass curthread directly to the underlying VOP_LOCK1() VFS method. This makes the code cleaner and, in particular, removes an annoying dependency, helping the upcoming lockmgr() cleanup. The KPI, obviously, changes. Manpages and FreeBSD_version will be updated through further commits. As a side note, upcoming commits will address a similar cleanup of the VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
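
The shape of the KPI change at a call site, as a sketch (LK_EXCLUSIVE | LK_RETRY is just a typical flag combination):

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <sys/vnode.h>

    static void
    touch_vnode(struct vnode *vp)
    {
            /* Before: vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, curthread); */
            vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
            /* ... operate on the locked vnode ... */
            /* Before: VOP_UNLOCK(vp, 0, curthread); */
            VOP_UNLOCK(vp, 0);
    }
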
* Rename the kthread_xxx (e.g. kthread_create()) callsjulian2007-10-201-2/+2
| | | | | | | | | | to kproc_xxx as they actually create whole processes. This makes way for us to add REAL kthread_create() and friends that actually create threads. It turns out that most of these calls will end up being moved back to the thread version when it is added, but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.
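
A sketch of the renamed interface as an AIO daemon spawner might use it (the format string, flags, and helper names are illustrative; the signature mirrors the old kthread_create()):

    #include <sys/param.h>
    #include <sys/kthread.h>
    #include <sys/proc.h>
    #include <sys/unistd.h>

    static void
    aio_daemon(void *arg)
    {
            /* daemon main loop would go here */
            kproc_exit(0);
    }

    static int
    start_aiod(int num)
    {
            struct proc *p;

            /* Formerly kthread_create(); the new name reflects that a whole
             * process, not merely a thread, is created. */
            return (kproc_create(aio_daemon, NULL, &p, RFNOWAIT, 0,
                "aiod%d", num));
    }
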
* Destroy the kaio_mtx when freeing the struct kaioinfo in thekib2007-08-201-1/+5
| | | | | | | | aio_proc_rundown. Do not allow a zero-length read to be passed to the fo_read file method by aio. Reported and tested by: Peter Holm Approved by: re (kensmith)
* Remove unused variable.mjacob2007-06-101-1/+0
|
* - Move rusage from being per-process in struct pstats to per-thread injeff2007-06-011-10/+8
| td_ru. This removes the requirement for per-process synchronization in
| statclock() and mi_switch(). This was previously supported by sched_lock,
| which is going away. All modifications to rusage are now done in the context
| of the owning thread. Reads proceed without locks.
| - Aggregate exiting threads' rusage in thread_exit() such that the exiting
|   thread's rusage is not lost.
| - Provide a new routine, rufetch(), to fetch an aggregate of all rusage
|   structures from all threads in a process. This routine must be used in any
|   place requiring a rusage from a process prior to its exit. The exited
|   process's rusage is still available via p_ru.
| - Aggregate tick statistics only on demand via rufetch() or when a thread
|   exits. Tick statistics are kept in the thread and protected by sched_lock
|   until it exits.
| Initial patch by: attilio
| Reviewed by: attilio, bde (some objections), arch (mostly silent)
* Further system call comment cleanup:rwatson2007-03-051-3/+3
| - Also remove "MP SAFE" after the prior "MPSAFE" pass. (suggested by bde)
| - Remove extra blank lines in some cases.
| - Add extra blank lines in some cases.
| - Remove no-op comments consisting solely of the function name, the word
|   "syscall", or the system call name.
| - Add punctuation.
| - Re-wrap some comments.
* Merge posix4/* into normal kernel hierarchy.trhodes2006-11-111-1/+1
| | | | | Reviewed by: glanced at by jhb Approved by: silence on -arch@ and -standards@
* MFP4 (with some minor changes):netchild2006-10-151-4/+4
| Implement the linux_io_* syscalls (AIO). They are only enabled if the native
| AIO code is available (either compiled into the kernel or as a module) at the
| time the functions are used. If the AIO code is not available, ENOSYS is
| returned.
| From the submitter:
| ---snip---
| DESIGN NOTES:
| 1. Linux permits a process to own multiple AIO queues (distinguished by
|    "context"), but FreeBSD creates only one single AIO queue per process. My
|    code maintains a request queue (STAILQ of queue(3)) per "context", and
|    throws all AIO requests of all contexts owned by a process into the single
|    FreeBSD per-process AIO queue. When the process calls io_destroy(2),
|    io_getevents(2), io_submit(2) and io_cancel(2), my code can pick out
|    requests owned by the specified context from the single FreeBSD
|    per-process AIO queue according to the per-context request queues
|    maintained by my code.
| 2. The request queue maintained by my code stores the mapping between Linux
|    IO control blocks (struct linux_iocb) and FreeBSD IO control blocks
|    (struct aiocb). The FreeBSD IO control block actually exists in userland
|    memory space, as required by the FreeBSD native aio_XXXXXX(2) calls.
| 3. It is quite troubling that the function io_getevents() of libaio-0.3.105
|    needs to use the Linux-specific "struct aio_ring", which is a partial
|    mirror of the context in user space. I would rather take the address of
|    the context in the kernel as the context ID, but the io_getevents() of
|    libaio forces me to take the address of the "ring" in user space as the
|    context ID. To my surprise, one comment line in the file "io_getevents.c"
|    of libaio-0.3.105 reads: Ben will hate me for this
| REFERENCE:
| 1. Linux kernel source code: http://www.kernel.org/pub/linux/kernel/v2.6/
|    (include/linux/aio_abi.h, fs/aio.c)
| 2. Linux manual pages: http://www.kernel.org/pub/linux/docs/manpages/
|    (io_setup(2), io_destroy(2), io_getevents(2), io_submit(2), io_cancel(2))
| 3. Linux Scalability Effort: http://lse.sourceforge.net/io/aio.html
|    The design notes: http://lse.sourceforge.net/io/aionotes.txt
| 4. The package libaio, both source and binary:
|    http://rpmfind.net/linux/rpm2html/search.php?query=libaio
|    Simple transparent interface to Linux AIO system calls.
| 5. Libaio-oracle: http://oss.oracle.com/projects/libaio-oracle/
|    POSIX AIO implementation based on Linux AIO system calls (depending on
|    libaio).
| ---snip---
| Submitted by: Li, Xiao <intron@intron.ac>
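
A sketch of the per-context bookkeeping from design note 1, using queue(3) STAILQs (all type and field names are illustrative, not those of the compat code):

    #include <sys/queue.h>

    struct lkm_aio_req {
            void    *lr_linux_iocb;         /* userland struct linux_iocb */
            void    *lr_fbsd_aiocb;         /* userland struct aiocb */
            STAILQ_ENTRY(lkm_aio_req) lr_link;
    };

    /* One queue per Linux AIO "context"; the requests themselves also sit on
     * the single FreeBSD per-process AIO queue. */
    struct lkm_aio_ctx {
            STAILQ_HEAD(, lkm_aio_req) lc_reqs;
    };

    static void
    lkm_ctx_init(struct lkm_aio_ctx *ctx)
    {
            STAILQ_INIT(&ctx->lc_reqs);
    }
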
* hide kqueue_register from public view, and replace it w/ kqfd_register...jmg2006-09-241-33/+6
| | | | This eliminates a possible race when aio registers a kevent.
* Remove call to fdfree() for the AIO daemons to prevent kernel panicsmp2006-09-061-6/+0
| | | | | | | | with linprocfs. This call is not needed since file descriptor sharing was removed in v1.125. Reviewed by: alc, davidxu, ambrisko MFC after: 3 days
* - Change process_exec function handlers prototype to include structnetchild2006-08-151-1/+8
| image_params arg.
| - Change struct image_params to include a struct sysentvec pointer and
|   initialize it.
| - Change all consumers of the process_exit/process_exec eventhandlers to the
|   new prototypes (includes splitting up into distinct exec/exit functions).
| - Add an eventhandler to userret.
| Sponsored by: Google SoC 2006
| Submitted by: rdivacky
| Parts suggested by: jhb (on hackers@)
* Make lio ident more consistent with aio ident.ambrisko2006-06-021-1/+1
|
* Use a dedicated mutex to protect aio queues; the motivation is to reducedavidxu2006-05-091-51/+69
| | | | lock contention with other parts.
* 1. Move code for scanning pending I/O from aio_fsync to aio_aqueue,davidxu2006-03-241-77/+51
| | | | | as it has less overhead. 2. Avoid scheduling a task if the maximum number of I/O threads is reached.
* Implement aio_fsync() syscall.davidxu2006-03-231-78/+244
|
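
A minimal userland sketch of the new call (standard POSIX interface; polling and error handling kept deliberately crude):

    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* Queue an fsync behind the AIO operations already queued on fd. */
    static ssize_t
    aio_sync_fd(int fd)
    {
            struct aiocb cb;

            memset(&cb, 0, sizeof(cb));
            cb.aio_fildes = fd;
            if (aio_fsync(O_SYNC, &cb) == -1)
                    return (-1);
            while (aio_error(&cb) == EINPROGRESS)
                    usleep(1000);
            return (aio_return(&cb));
    }
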
* 1. Remove aio entry from lists earlier in aio_free_entry,davidxu2006-02-261-19/+17
|    so other threads cannot see it if we unlock the proc lock (this can happen
|    in knlist_delete). Don't do a wakeup; it is not necessary.
| 2. Decrease kaio_buffer_count in biohelper rather than doing it in
|    aio_bio_done_notify.
| 3. In aio_bio_done_notify, don't send a notification if KAIO_RUNDOWN was set,
|    because the process is already in single-thread mode.
| 4. Use assignment to initialize aiothreadflags.
| 5. AIOCBLIST_RUNDOWN is not useful; axe the code using it.
| 6. Use LIO_NOP instead of zero.
* If block size is zero, use normal file operations to do I/O,davidxu2006-02-221-0/+3
| | | | | | which eliminates a divide-by-zero fault. Recommended by: phk
* Just like dofilewrite(), call bwillwrite before fo_write.davidxu2006-01-271-0/+2
|
* return final error code in aio_return rather than a hardcoded 0.davidxu2006-01-271-1/+0
|
* in aio_aqueue, store same return code into job->_aiocb_private.error.davidxu2006-01-261-3/+5
| | | | in aio_return, unlock proc lock before suword.
* Add locking annotation and comments about socket, pipe, fifo problem.davidxu2006-01-241-125/+126
| | | | Temporarily fix a locking problem for socket I/O.
* Er, rescue a deleted comment line.davidxu2006-01-241-0/+1
|
* More cleanup for aio code:davidxu2006-01-241-11/+9
| | | | | | 1) Unregister the kqueue filter for EVFILT_LIO. 2) Free the uma_zones. 3) Call setsid directly to enter another session rather than implementing it by itself. Submitted by: jhb
* Add bracket.davidxu2006-01-231-1/+1
|
* Verify all supported notification types.davidxu2006-01-231-3/+18
|
* 1) Merge _aio_aqueue and aio_aqueue, check quota in aio_aqueue,davidxu2006-01-231-45/+29
| | | | | so that lio_listio won't exceed the quota. 2) Remove lio_ref_count, it is no longer used.
* Fix a bogus panic.davidxu2006-01-221-1/+1
|
* Decrease kaio_active_count first, because user process may go awaydavidxu2006-01-221-2/+5
| | | | after we notified it.
* Make aio code MP safe.davidxu2006-01-221-843/+574
|
* Initialize ki to p->p_aioinfo after we know it's going to be referencingcsjp2006-01-151-2/+2
| | | | | | | a valid kaioinfo structure. This avoids a potential NULL pointer dereference. Found with: Coverity Prevent(tm) MFC after: 2 weeks
* Return error from fget_write() rather than hardcoding EBADF now thatjhb2006-01-061-1/+1
| | | | | | fget_write() DTRT. Requested by: bde
* In aio_waitcomplete, do not return EAGAIN if no other threadsdavidxu2005-11-081-1/+1
| have started aio; instead, initialize the aio management structure if that
| has not been done yet. The reason for this change is to be a bit friendlier
| to threaded programs: consider two threads, one submitting aio_write() and
| another just calling aio_waitcomplete() to wait for any I/O to complete and
| recycle the aio requests; before the submitter does any I/O, the recycler
| wants to wait in the kernel. This also fixes an inconsistency with other aio
| syscalls.
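
A sketch of the recycler thread described above, using FreeBSD's aio_waitcomplete(2) (thread setup and error handling omitted; assumes each aiocb was heap-allocated by the submitter):

    #include <aio.h>
    #include <stdlib.h>

    static void *
    recycler(void *arg)
    {
            struct aiocb *done;

            for (;;) {
                    /* NULL timeout: block until some request completes,
                     * even if none has been submitted yet. */
                    if (aio_waitcomplete(&done, NULL) == -1)
                            break;
                    free(done);             /* recycle the finished request */
            }
            return (NULL);
    }
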
* Various and sundry cleanups:jhb2005-11-081-80/+84
| - Use curthread for calls to knlist_delete() and add a big comment explaining
|   why, as well as appropriate assertions.
| - Use TAILQ_FOREACH and TAILQ_FOREACH_SAFE instead of handrolling them.
| - Use the fget() family of functions to look up file objects instead of
|   grovelling around in file descriptor tables.
| - Destroy the aio_freeproc mutex if we are unloaded.
| Tested on: i386
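
The safe-iteration idiom adopted here, sketched with hypothetical queue and field names:

    #include <sys/queue.h>

    struct job {
            TAILQ_ENTRY(job) j_list;
    };
    TAILQ_HEAD(jobhead, job);

    static void
    drain_jobs(struct jobhead *head)
    {
            struct job *job, *jobn;

            /* The _SAFE variant keeps a lookahead pointer, so the current
             * element may be removed (and freed) during the iteration. */
            TAILQ_FOREACH_SAFE(job, head, j_list, jobn) {
                    TAILQ_REMOVE(head, job, j_list);
                    /* ... release job ... */
            }
    }
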
* Fix name compatibility problem with the POSIX standard: the sigval_ptr anddavidxu2005-11-041-2/+2
| | | | | | sigval_int really should be sival_ptr and sival_int. Also, sigev_notify_function accepts a union sigval value, not a pointer.
* Support sending realtime signal information via signal queue, realtimedavidxu2005-11-031-8/+40
| | | | signal memory is pre-allocated, so the kernel can always notify user code.
* Push down Giant into fdfree() and remove it from two of the callers.jhb2005-11-011-2/+0
| | | | | | | Other callers such as some rfork() cases weren't locking Giant anyway. Reviewed by: csjp MFC after: 1 week
* Fix sigevent's POSIX incompatibility by adding member fieldsdavidxu2005-10-301-9/+63
| | | | | | | sigev_notify_function and sigev_notify_attributes. AIO syscalls use sigevent, so they have to be adjusted. Reviewed by: alc
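
With those fields present, the standard POSIX thread-notification form can be expressed; a userland sketch (completion_handler is a placeholder, and whether SIGEV_THREAD delivery was fully wired up at the time is outside this commit):

    #include <aio.h>
    #include <signal.h>
    #include <string.h>

    static void
    completion_handler(union sigval sv)
    {
            struct aiocb *cb = sv.sival_ptr;        /* request that completed */
            (void)cb;
    }

    static int
    submit_read(int fd, void *buf, size_t len, struct aiocb *cb)
    {
            memset(cb, 0, sizeof(*cb));
            cb->aio_fildes = fd;
            cb->aio_buf = buf;
            cb->aio_nbytes = len;
            cb->aio_sigevent.sigev_notify = SIGEV_THREAD;
            cb->aio_sigevent.sigev_notify_function = completion_handler;
            cb->aio_sigevent.sigev_notify_attributes = NULL;
            cb->aio_sigevent.sigev_value.sival_ptr = cb;
            return (aio_read(cb));
    }
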
* Fix the tinderbox build by removing incomplete/bad spl usage. Proper Giant-freeambrisko2005-10-121-6/+0
| | | | | | locking is required for aio. Pointed out by: imp
* Add in kqueue support to LIO event notification and fix how it handledambrisko2005-10-121-121/+198
| notifications when LIO operations completed. These were the problems with LIO
| event completion notification:
| - Move all LIO/AIO event notification into one general function so we don't
|   have bugs in different data paths. This unification got rid of several
|   notification bugs, one of which could send SIGILL to the process when
|   kqueue was used.
| - Change the LIO event accounting to count all AIO requests that could have
|   been split across the fast path and daemon mode. The prior accounting only
|   kept track of AIO ops in that mode and not the entire list of operations.
|   This could cause a bogus LIO event completion notification to occur when
|   all of the fast-path AIO ops completed but not the AIO ops that ended up
|   queued for the daemon.
| Suggestions from: alc
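
A userland sketch of LIO completion delivered through kqueue, as the SIGEV_KEVENT/EVFILT_LIO interface is understood here (field usage assumed; error handling omitted):

    #include <sys/types.h>
    #include <sys/event.h>
    #include <aio.h>
    #include <signal.h>
    #include <string.h>

    static int
    submit_batch(int kq, struct aiocb *list[], int nent)
    {
            struct sigevent sev;
            struct kevent ev;

            memset(&sev, 0, sizeof(sev));
            sev.sigev_notify = SIGEV_KEVENT;        /* notify via a kqueue */
            sev.sigev_notify_kqueue = kq;
            sev.sigev_value.sival_ptr = list;

            if (lio_listio(LIO_NOWAIT, list, nent, &sev) == -1)
                    return (-1);

            /* A single EVFILT_LIO event fires once the whole batch is done. */
            return (kevent(kq, NULL, 0, &ev, 1, NULL));
    }
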