author | kib <kib@FreeBSD.org> | 2012-05-30 16:42:08 +0000
---|---|---
committer | kib <kib@FreeBSD.org> | 2012-05-30 16:42:08 +0000
commit | 080f2e89d9c5e9af64996cec3dae98923bc88f93 |
tree | 45aa59c122d98cdc86599d748c432368e49b039c | /sys/sys/mount.h
parent | 6f4e16f8338923e9fd89009ec9cb4a5a3d770983 |
vn_io_fault() is a facility to prevent page faults while filesystems
perform copyin/copyout of file data to or from the usermode buffer.
Typical filesystems hold the vnode lock and some buffer locks across
the VOP_READ() and VOP_WRITE() operations, and since the page fault
handler may need to recurse into VFS to obtain the page contents, a
deadlock is possible.
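The recursion described above can be illustrated with a toy, single-threaded userland model. Everything here is hypothetical: the `toy_*` names are invented, the "vnode lock" is a simple owner field rather than the kernel's lock, and where a real sleep lock would block forever, the model returns `EDEADLK` so the situation is observable. It shows the simplest case: a read whose destination buffer is not resident (for example, mmap'd from the same file), so the copyout faults and the fault handler needs the vnode lock the thread already holds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, non-recursive "vnode lock"; not the kernel's lockmgr. */
struct toy_vnode {
    int owner;                  /* thread id holding the lock, or -1 */
};

/* Returns 0 on success, EDEADLK if the caller already owns the lock. */
static int
toy_vn_lock(struct toy_vnode *vp, int tid)
{
    if (vp->owner == tid)
        return (EDEADLK);       /* a real sleep lock would block forever */
    assert(vp->owner == -1);    /* the model has only one thread */
    vp->owner = tid;
    return (0);
}

static void
toy_vn_unlock(struct toy_vnode *vp)
{
    vp->owner = -1;
}

/* Page fault handler: reading the page from the file needs the vnode lock. */
static int
toy_fault(struct toy_vnode *vp, int tid)
{
    int error = toy_vn_lock(vp, tid);

    if (error == 0)
        toy_vn_unlock(vp);
    return (error);
}

/*
 * read(2) path: VOP_READ() runs with the vnode locked and then copies out
 * to the user buffer; a non-resident buffer page recurses into VFS.
 */
static int
toy_read(struct toy_vnode *vp, int tid, bool buf_resident)
{
    int error = toy_vn_lock(vp, tid);

    if (error != 0)
        return (error);
    if (!buf_resident)
        error = toy_fault(vp, tid);
    toy_vn_unlock(vp);
    return (error);
}
```

In the kernel the deadlock can also involve buffer locks and other threads taking locks in the opposite order; the model only captures the self-recursion case.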
The facility works by disabling page fault handling for the current
thread and attempting to execute the i/o while allowing uiomove() to
access the usermode mapping of the i/o buffer. If all buffer pages are
resident, uiomove() succeeds and the request is finished. If uiomove()
returns EFAULT, the pages backing the i/o buffer are faulted in and
held, and the second attempt of the VOP call performs the copyin/out
using uiomove_fromphys() over the held pages.
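The two-attempt scheme can be sketched in userland as follows. This is a hypothetical model, not kernel code: a "user buffer" is an array of `toy_page`s that may or may not be resident, `toy_copyout_nofault()` stands in for uiomove() with page faults disabled, and `toy_copyout_held()` stands in for uiomove_fromphys() over held pages. The return value of `toy_io_fault_read()` reports which path completed the request.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 4
#define TOY_NPAGES    4

/* Hypothetical user buffer split into pages that may not be resident. */
struct toy_page {
    bool resident;
    bool held;
    char data[TOY_PAGE_SIZE];
};

struct toy_ubuf {
    struct toy_page pages[TOY_NPAGES];
};

/* Stand-in for uiomove() with faults disabled: EFAULT on a missing page. */
static int
toy_copyout_nofault(struct toy_ubuf *ub, const char *src, size_t len)
{
    for (size_t off = 0; off < len; off += TOY_PAGE_SIZE) {
        struct toy_page *pg = &ub->pages[off / TOY_PAGE_SIZE];
        size_t n = len - off < TOY_PAGE_SIZE ? len - off : TOY_PAGE_SIZE;

        if (!pg->resident)
            return (EFAULT);
        memcpy(pg->data, src + off, n);
    }
    return (0);
}

/* Fault in and hold every page the request touches. */
static void
toy_fault_and_hold(struct toy_ubuf *ub, size_t len)
{
    for (size_t i = 0; i * TOY_PAGE_SIZE < len; i++) {
        ub->pages[i].resident = true;
        ub->pages[i].held = true;
    }
}

/* Stand-in for uiomove_fromphys(): copies through held pages only. */
static int
toy_copyout_held(struct toy_ubuf *ub, const char *src, size_t len)
{
    for (size_t off = 0; off < len; off += TOY_PAGE_SIZE) {
        struct toy_page *pg = &ub->pages[off / TOY_PAGE_SIZE];
        size_t n = len - off < TOY_PAGE_SIZE ? len - off : TOY_PAGE_SIZE;

        assert(pg->held);
        memcpy(pg->data, src + off, n);
    }
    return (0);
}

/* Returns 1 if the fast path succeeded, 2 if the held-pages retry ran. */
static int
toy_io_fault_read(struct toy_ubuf *ub, const char *src, size_t len)
{
    assert(len <= TOY_PAGE_SIZE * TOY_NPAGES);
    if (toy_copyout_nofault(ub, src, len) == 0)
        return (1);
    toy_fault_and_hold(ub, len);
    int error = toy_copyout_held(ub, src, len);
    assert(error == 0);
    return (2);
}
```

As the MNTK_NO_IOPF comment notes, a failed first attempt may leave partial i/o state behind; the model simply redoes the whole copy.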
Since pages are held in chunks, to prevent large i/o requests from
starving the free page pool, and since the vnode lock is only taken for
i/o over the current chunk, the vnode lock no longer protects the
atomicity of the whole i/o request. Use the newly added rangelocks to
provide the required atomicity of i/o with respect to other i/o and
truncations.
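Why a range lock restores atomicity when the vnode lock is dropped between chunks can be shown with a small model. The `toy_*` range lock below is hypothetical and much simpler than the kernel's rangelocks: it is single-threaded, treats every range as exclusive (no shared readers), and reports a conflict instead of sleeping. The request locks its whole byte range once, then processes chunks; between chunks, a competing truncation tries to lock from its new size to the end of the file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_RANGES 8

/* Hypothetical single-threaded range lock: a table of exclusive ranges. */
struct toy_rangelock {
    size_t start[TOY_MAX_RANGES];
    size_t end[TOY_MAX_RANGES];     /* exclusive */
    bool used[TOY_MAX_RANGES];
};

/* Lock [start, end); returns a slot, or -1 if an overlapping range is held. */
static int
toy_rl_trylock(struct toy_rangelock *rl, size_t start, size_t end)
{
    int slot = -1;

    for (int i = 0; i < TOY_MAX_RANGES; i++) {
        if (!rl->used[i]) {
            if (slot == -1)
                slot = i;
            continue;
        }
        if (start < rl->end[i] && rl->start[i] < end)
            return (-1);            /* a kernel rangelock would sleep */
    }
    assert(slot != -1);             /* table full: model limitation */
    rl->used[slot] = true;
    rl->start[slot] = start;
    rl->end[slot] = end;
    return (slot);
}

static void
toy_rl_unlock(struct toy_rangelock *rl, int slot)
{
    rl->used[slot] = false;
}

/*
 * Chunked i/o over [off, off + len): the range is locked once, while the
 * vnode lock (not modelled) would be held only per chunk.  Between chunks
 * a truncation to 'trunc_at' tries to lock [trunc_at, SIZE_MAX).
 * Returns how many of those truncation attempts were refused.
 */
static int
toy_chunked_io(struct toy_rangelock *rl, size_t off, size_t len,
    size_t chunk, size_t trunc_at, int *nchunks)
{
    int slot, t, refused = 0;

    assert(chunk > 0);
    slot = toy_rl_trylock(rl, off, off + len);
    assert(slot >= 0);
    *nchunks = 0;
    for (size_t done = 0; done < len; done += chunk) {
        (*nchunks)++;               /* one VOP call for this chunk */
        t = toy_rl_trylock(rl, trunc_at, SIZE_MAX);
        if (t >= 0)
            toy_rl_unlock(rl, t);   /* truncation could have proceeded */
        else
            refused++;
    }
    toy_rl_unlock(rl, slot);
    return (refused);
}
```

A truncation overlapping the request is refused at every chunk boundary, while one starting at or beyond the end of the request is not affected.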
Filesystems need to explicitly opt in to the scheme by setting the
MNTK_NO_IOPF struct mount flag, and optionally by using the
vn_io_fault_uiomove(9) helper, which takes care of calling uiomove() or
converting the uio into a request for uiomove_fromphys().
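The opt-in and the helper's choice can be sketched as below. Only the `MNTK_NO_IOPF` value comes from this commit; `toy_mount`, `toy_thread`, and `toy_io_fault_uiomove()` are hypothetical stand-ins, and the real vn_io_fault_uiomove(9) works on a `struct uio` rather than flat buffers. The sketch assumes the helper distinguishes the first attempt (copy straight to the user mapping) from the retry (copy through the held pages) using per-thread state, modelled here as a `held_buf` pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Flag value taken from the sys/sys/mount.h hunk of this commit. */
#define MNTK_NO_IOPF 0x00000100

/* Hypothetical stand-ins; the real struct mount and struct thread differ. */
struct toy_mount {
    unsigned int mnt_kern_flag;
};

struct toy_thread {
    char *held_buf;     /* non-NULL only during the held-pages retry */
};

enum toy_path { TOY_PLAIN_UIOMOVE, TOY_FROMPHYS };

/* Has this filesystem opted in to the vn_io_fault() scheme? */
static bool
toy_uses_io_fault(const struct toy_mount *mp)
{
    return ((mp->mnt_kern_flag & MNTK_NO_IOPF) != 0);
}

/* Pick the copy routine the way a vn_io_fault_uiomove()-like helper would. */
static enum toy_path
toy_io_fault_uiomove(const struct toy_thread *td, char *dst, const char *src,
    size_t n)
{
    assert(src != NULL);
    if (td->held_buf == NULL) {
        memcpy(dst, src, n);            /* uiomove() to the user mapping */
        return (TOY_PLAIN_UIOMOVE);
    }
    memcpy(td->held_buf, src, n);       /* uiomove_fromphys() over held pages */
    return (TOY_FROMPHYS);
}
```

A filesystem that calls such a helper for every copy does not need to know which attempt it is running in.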
Reviewed by: bf (comments), mdf, pjd (previous version)
Tested by: pho
Tested by: flo, Gustau Pérez <gperez entel upc edu> (previous version)
MFC after: 2 months
Diffstat (limited to 'sys/sys/mount.h')
-rw-r--r-- | sys/sys/mount.h | 3 |
1 files changed, 3 insertions, 0 deletions
diff --git a/sys/sys/mount.h b/sys/sys/mount.h
index b1cd913..4b83ba4 100644
--- a/sys/sys/mount.h
+++ b/sys/sys/mount.h
@@ -370,6 +370,9 @@ void __mnt_vnode_markerfree(struct vnode **mvp, struct mount *mp);
 #define MNTK_REFEXPIRE	0x00000020	/* refcount expiring is happening */
 #define MNTK_EXTENDED_SHARED	0x00000040 /* Allow shared locking for more ops */
 #define MNTK_SHARED_WRITES	0x00000080 /* Allow shared locking for writes */
+#define	MNTK_NO_IOPF	0x00000100 /* Disallow page faults during reads
+					and writes. Filesystem shall properly
+					handle i/o state on EFAULT. */
 #define MNTK_NOASYNC	0x00800000	/* disable async */
 #define MNTK_UNMOUNT	0x01000000	/* unmount in progress */
 #define MNTK_MWAIT	0x02000000	/* waiting for unmount to finish */
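The new flag occupies bit 0x00000100, the first free bit after MNTK_SHARED_WRITES. A quick compile-time sanity check of that choice is sketched below, using only the values visible in the hunk; flags defined elsewhere in mount.h are not checked. `TOY_IS_ONE_BIT` and `toy_set_no_iopf()` are hypothetical helpers, and `_Static_assert` assumes a C11 compiler.

```c
#include <assert.h>

/* Values copied from the context and added lines of the mount.h hunk. */
#define MNTK_REFEXPIRE        0x00000020
#define MNTK_EXTENDED_SHARED  0x00000040
#define MNTK_SHARED_WRITES    0x00000080
#define MNTK_NO_IOPF          0x00000100
#define MNTK_NOASYNC          0x00800000
#define MNTK_UNMOUNT          0x01000000
#define MNTK_MWAIT            0x02000000

/* A flag must be exactly one bit: nonzero, and x & (x - 1) == 0. */
#define TOY_IS_ONE_BIT(x) ((x) != 0 && ((x) & ((x) - 1)) == 0)

_Static_assert(TOY_IS_ONE_BIT(MNTK_NO_IOPF), "MNTK_NO_IOPF must be one bit");
_Static_assert((MNTK_NO_IOPF & (MNTK_REFEXPIRE | MNTK_EXTENDED_SHARED |
    MNTK_SHARED_WRITES | MNTK_NOASYNC | MNTK_UNMOUNT | MNTK_MWAIT)) == 0,
    "MNTK_NO_IOPF overlaps a neighbouring flag");

/* Set the opt-in bit on a kernel flag word without disturbing other bits. */
static unsigned int
toy_set_no_iopf(unsigned int kflags)
{
    return (kflags | MNTK_NO_IOPF);
}
```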