op-kernel-dev - Development kernel branch for OpenPOWER systems

author	Lee Schermerhorn <lee.schermerhorn@hp.com>	2008-10-18 20:26:49 -0700
committer	Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:31 -0700
commit	8edb08caf68184fb170f4f69c7445929e199eaea (patch)
tree	c0d8f24971c90e5627207f0f0cb7c06f9bdb5dc4 /mm/sparse.c
parent	fa07e787733416c42938a310a8e717295934e33c (diff)

mlock: downgrade mmap sem while populating mlocked regions

We need to hold the mmap_sem for write to initiate mlock()/munlock() because we may need to merge/split vmas. However, this can lead to very long lock hold times while attempting to fault in a large memory region to mlock it into memory. This can hold off other faults against the mm [multithreaded tasks] and other scans of the mm, such as via /proc. To alleviate this, downgrade the mmap_sem to read mode during the population of the region for locking. The long hold times are especially likely if we need to reclaim memory to lock down the region. We [probably?] don't need to do this for unlocking, as all of the pages should be resident: they're already mlocked.

Now, the callers of the mlock functions [mlock_fixup() and mlock_vma_pages_range()] expect the mmap_sem to be returned in write mode. Changing all callers appears to be way too much effort at this point. So, restore write mode before returning. Note that this opens a window where the mmap list could change in a multithreaded process. So, at least for mlock_fixup(), where we could be called in a loop over multiple vmas, we check that a vma still exists at the start address and that the vma still covers the page range [start,end). If not, we return an error, -EAGAIN, and let the caller deal with it.

Return -EAGAIN from mlock_vma_pages_range() and mlock_fixup() if the vma at 'start' disappears or changes so that the page range [start,end) is no longer contained in the vma. Again, let the caller deal with it. Looks like only sys_remap_file_pages() [via mmap_region()] should actually care.

With this patch, I no longer see processes like ps(1) blocked for seconds or minutes at a time waiting for a large [multiple gigabyte] region to be locked down. However, I occasionally see delays while unlocking or unmapping a large mlocked region. Should we also downgrade the mmap_sem for the unlock path?

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
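
The revalidation step described in the message (after write mode is restored, check that a vma still exists at 'start' and still covers [start,end), else return -EAGAIN) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the code from this patch: struct range, find_range() and revalidate_range() are hypothetical names standing in for a vma, a find_vma()-style lookup and the recheck, and the map is assumed to be sorted and non-overlapping.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a vma: the half-open address range [start, end). */
struct range {
	unsigned long start;
	unsigned long end;
};

/*
 * Model of a find_vma()-style lookup: return the first range whose end
 * lies above addr, or NULL if there is none. The returned range does not
 * necessarily contain addr. Assumes 'map' is sorted and non-overlapping.
 */
static const struct range *find_range(const struct range *map, size_t n,
				      unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr < map[i].end)
			return &map[i];
	return NULL;
}

/*
 * Recheck to run once the lock is held for write again: the mapping may
 * have changed while the lock was only held for read. Returns 0 if a
 * single range still covers [start, end), -EAGAIN otherwise.
 */
static int revalidate_range(const struct range *map, size_t n,
			    unsigned long start, unsigned long end)
{
	assert(start < end);

	const struct range *r = find_range(map, n, start);

	if (!r || r->start > start || r->end < end)
		return -EAGAIN;	/* range vanished, moved or shrank */
	return 0;
}
```

A caller that loops over several ranges would stop and propagate -EAGAIN as soon as one recheck fails, which matches the "let the caller deal with it" approach in the message.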

Diffstat (limited to 'mm/sparse.c')

0 files changed, 0 insertions, 0 deletions
