summaryrefslogtreecommitdiffstats
path: root/sys/vm/vm_mmap.c
Commit message (Collapse)AuthorAgeFilesLines
* Extend the support for exempting processes from being killed when swap isjhb2013-09-191-10/+7
| | | | | | | | | | | | | | | | | | | | | | exhausted. - Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants. - Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate similar to wait6(). - Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatability. - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc) the first bit of which is used to track if P_PROTECT should be inherited by new child processes. Reviewed by: kib, jilles (earlier version) Approved by: re (delphij) MFC after: 1 month
* Fix an off-by-one error when populating mincore(2) entries forjhb2013-09-121-2/+2
| | | | | | | | skipped entries. lastvecindex references the last valid byte, so the new bytes should come after it. Approved by: re (kib) MFC after: 1 week
* Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping usejhb2013-09-091-6/+25
| | | | | | | | | | | | | an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux. To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE. Reviewed by: alc Approved by: re (kib)
* Change the cap_rights_t type from uint64_t to a structure that we can extendpjd2013-09-051-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t *cap_rights_init(cap_rights_t *rights, ...); void cap_rights_set(cap_rights_t *rights, ...); void cap_rights_clear(cap_rights_t *rights, ...); bool cap_rights_is_set(const cap_rights_t *rights, ...); bool cap_rights_is_valid(const cap_rights_t *rights); void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src); void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src); bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation
* Add new mmap(2) flags to permit applications to request specific virtualjhb2013-08-161-8/+20
| | | | | | | | | | | | | | | | | | | | | | | | | address alignment of mappings. - MAP_ALIGNED(n) requests a mapping aligned on a boundary of (1 << n). Requests for n >= number of bits in a pointer or less than the size of a page fail with EINVAL. This matches the API provided by NetBSD. - MAP_ALIGNED_SUPER is a special case of MAP_ALIGNED. It can be used to optimize the chances of using large pages. By default it will align the mapping on a large page boundary (the system is free to choose any large page size to align to that seems best for the mapping request). However, if the object being mapped is already using large pages, then it will align the virtual mapping to match the existing large pages in the object instead. - Internally, VMFS_ALIGNED_SPACE is now renamed to VMFS_SUPER_SPACE, and VMFS_ALIGNED_SPACE(n) is repurposed for specifying a specific alignment. MAP_ALIGNED(n) maps to using VMFS_ALIGNED_SPACE(n), while MAP_ALIGNED_SUPER maps to VMFS_SUPER_SPACE. - mmap() of a device object now uses VMFS_OPTIMAL_SPACE rather than explicitly using VMFS_SUPER_SPACE. All device objects are forced to use a specific color on creation, so VMFS_OPTIMAL_SPACE is effectively equivalent. Reviewed by: alc MFC after: 1 month
* Fix previous commit when option RACCT is not used.jlh2013-07-221-0/+2
| | | | MFC after: 7 days
* Fix a panic in the racct code when munlock(2) is called with incorrect values.jlh2013-07-221-1/+4
| | | | | | | | | | | | | | | The racct code in sys_munlock() assumed that the boundaries provided by the userland were correct as long as vm_map_unwire() returned successfully. However the latter contains its own logic and sometimes manages to do something out of those boundaries, even if they are buggy. This change makes the racct code to use the accounting done by the vm layer, as it is done in other places such as vm_mlock(). Despite fixing the panic, Alan Cox pointed that this code is still race-y though: two simultaneous callers will produce incorrect values. Reviewed by: alc MFC after: 7 days
* Be more aggressive in using superpages in all mappings of objects:jhb2013-07-191-1/+2
| | | | | | | | | | | | | | - Add a new address space allocation method (VMFS_OPTIMAL_SPACE) for vm_map_find() that will try to alter the alignment of a mapping to match any existing superpage mappings of the object being mapped. If no suitable address range is found with the necessary alignment, vm_map_find() will fall back to using the simple first-fit strategy (VMFS_ANY_SPACE). - Change mmap() without MAP_FIXED, shmat(), and the GEM mapping ioctl to use VMFS_OPTIMAL_SPACE instead of VMFS_ANY_SPACE. Reviewed by: alc (earlier version) MFC after: 2 weeks
* Make sys_mlock() function just a wrapper around vm_mlock() functionglebius2013-06-081-5/+10
| | | | | | | that does all the job. Reviewed by: kib, jilles Sponsored by: Nginx, Inc.
* Add a hint suggesting why tmpfs does not need a special case there.kib2013-05-021-1/+1
|
* Make vm_object_page_clean() and vm_mmap_vnode() tolerate the vnode'kib2013-04-281-2/+9
| | | | | | | | | | | | | | | | | v_object of non OBJT_VNODE type. For vm_object_page_clean(), simply do not assert that object type must be OBJT_VNODE, and add a comment explaining how the check for OBJ_MIGHTBEDIRTY prevents the rest of function from operating on such objects. For vm_mmap_vnode(), if the object type is not OBJT_VNODE, require it to be for swap pager (or default), handle the bypass filesystems, and correctly acquire the object reference in this case. Reviewed by: alc Tested by: pho, bf MFC after: 1 week
* Release the v_writecount reference on the vnode in case of error,kib2013-03-281-0/+4
| | | | | | | | | | | | | before the vnode is vput() in vm_mmap_vnode(). Error return means that there is no use reference on the vnode from the vm object reference, and failing to restore v_writecount breaks the invariant that v_writecount is less or equal to the usecount. The situation observed when nfs client returns ESTALE for VOP_GETATTR() after the open. In collaboration with: pho MFC after: 1 week
* MFCattilio2013-03-021-3/+3
|\
| * Merge Capsicum overhaul:pjd2013-03-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK | CAP_READ) #define CAP_PWRITE (CAP_SEEK | CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ) #define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \ CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \ CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \ CAP_SETSOCKOPT | CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib
* | Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() toattilio2013-02-201-6/+6
| | | | | | | | | | | | their "write" versions. Sponsored by: EMC / Isilon storage division
* | Switch vm_object lock to be a rwlock.attilio2013-02-201-0/+1
|/ | | | | | | | * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations * VM_OBJECT_SLEEP() is introduced as a general purpose primitve to get a sleep operation using a VM_OBJECT_LOCK() as protection * The approach must bear with vm_pager.h namespace pollution so many files require including directly rwlock.h
* - Reduce kernel size by removing unnecessary pointer indirections.zont2013-01-101-8/+9
| | | | | | | | | GENERIC kernel size reduced in 16 bytes and RACCT kernel in 336 bytes. Suggested by: alc Reviewed by: alc Approved by: kib (mentor) MFC after: 1 week
* - Fix locked memory accounting for maps with MAP_WIREFUTURE flag.zont2012-12-181-11/+33
| | | | | | | | - Add sysctl vm.old_mlock which may turn such accounting off. Reviewed by: avg, trasz Approved by: kib (mentor) MFC after: 1 week
* Remove the support for using non-mpsafe filesystem modules.kib2012-10-221-9/+3
| | | | | | | | | | | | In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
* - Simplify VM code by using vmspace_wired_count() for counting wiredzont2012-09-051-4/+3
| | | | | | | | memory of a process. Reviewed by: avg Approved by: kib (mentor) MFC after: 2 weeks
* For old mmap syscall, when executing on amd64 or ia64, enforce thekib2012-08-141-0/+7
| | | | | | | | | | | | PROT_EXEC if prot is non-zero, process is 32bit and kern.elf32.i386_read_exec syscal is enabled. This workaround is needed for old i386 a.out binaries, where dynamic linker did not specified PROT_EXEC for mapping of the text. The kern.elf32.i386_read_exec MIB name looks weird for a.out binaries, but I reused the existing knob which already has the needed semantic. MFC after: 1 week
* Adjust the r205536, by allowing a non-zero offset for anonymouskib2012-08-141-5/+17
| | | | | | | | mappings for a.out binaries. Apparently, a.out ld.so from FreeBSD 1.1.5.1 can issue such requests. Reported and tested by: Dan Plassche <dplassche@gmail.com> MFC after: 1 week
* When MAP_STACK mapping is created, the map entry is created only tokib2012-04-211-2/+4
| | | | | | | | | | | | cover the initial stack size. For MCL_WIREFUTURE maps, the subsequent call to vm_map_wire() to wire the whole stack region fails due to VM_MAP_WIRE_NOHOLES flag. Use the VM_MAP_WIRE_HOLESOK to only wire mapped part of the stack. Reported and tested by: Sushanth Rai <sushanth_rai yahoo com> Reviewed by: alc MFC after: 1 week
* Fix mincore(2) so that it reports PG_CACHED pages as resident.alc2012-04-081-0/+3
| | | | MFC after: 2 weeks
* In vm_object_page_clean(), do not clean OBJ_MIGHTBEDIRTY object flagkib2012-03-171-0/+2
| | | | | | | | | | | | | | | | | | if the filesystem performed short write and we are skipping the page due to this. Propogate write error from the pager back to the callers of vm_pageout_flush(). Report the failure to write a page from the requested range as the FALSE return value from vm_object_page_clean(), and propagate it back to msync(2) to return EIO to usermode. While there, convert the clearobjflags variable in the vm_object_page_clean() and arguments of the helper functions to boolean. PR: kern/165927 Reviewed by: alc MFC after: 2 weeks
* Eliminate stale incorrect ARGSUSED comments.alc2012-03-021-3/+0
| | | | Submitted by: bde
* Simplify vm_mmap()'s control flow.alc2012-02-251-16/+19
| | | | | | | | Add a comment describing what vm_mmap_to_errno() does. Reviewed by: kib MFC after: 3 weeks X-MFC after: r232071
* Place the if() at the right location, to activate the v_writecountkib2012-02-241-4/+4
| | | | | | | | | accounting for shared writeable mappings for all filesystems, not only for the bypass layers. Submitted by: alc Pointy hat to: kib MFC after: 20 days
* Account the writeable shared mappings backed by file in the vnodekib2012-02-231-11/+39
| | | | | | | | | | | | | | | | | | | | | | | | v_writecount. Keep the amount of the virtual address space used by the mappings in the new vm_object un_pager.vnp.writemappings counter. The vnode v_writecount is incremented when writemappings gets non-zero value, and decremented when writemappings is returned to zero. Writeable shared vnode-backed mappings are accounted for in vm_mmap(), and vm_map_insert() is instructed to set MAP_ENTRY_VN_WRITECNT flag on the created map entry. During deferred map entry deallocation, vm_map_process_deferred() checks for MAP_ENTRY_VN_WRITECOUNT and decrements writemappings for the vm object. Now, the writeable mount cannot be demoted to read-only while writeable shared mappings of the vnodes from the mount point exist. Also, execve(2) fails for such files with ETXTBUSY, as it should be. Noted by: tegge Reviewed by: tegge (long time ago, early version), alc Tested by: pho MFC after: 3 weeks
* When vm_mmap() is used to map a vm object into a kernel vm_map, italc2012-02-161-10/+10
| | | | | | | makes no sense to check the size of the kernel vm_map against the user-level resource limits for the calling process. Reviewed by: kib
* Close a race due to dropping of the map lock between creating map entrykib2012-02-111-7/+3
| | | | | | | | | for a shared mapping and marking the entry for inheritance. Other thread might execute vmspace_fork() in between (e.g. by fork(2)), resulting in the mapping becoming private. Noted and reviewed by: alc MFC after: 1 week
* In order to maximize the re-usability of kernel code in user space thiskmacy2011-09-161-15/+15
| | | | | | | | | | | | | patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
* Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomickib2011-09-061-4/+4
| | | | | | | | | | | | | | | | | flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify afalgs. Document the changes to flags field to only require the page lock. Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)
* Second-to-last commit implementing Capsicum capabilities in the FreeBSDrwatson2011-08-111-4/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc
* Extract the code to translate VM error into errno, into an exportedkib2011-07-101-0/+7
| | | | | | | | | | function vm_mmap_to_errno(). It is useful for the drivers that implement mmap(2)-like functionality, to be able to return error codes consistent with mmap(2). Sponsored by: The FreeBSD Foundation No objections from: alc MFC after: 1 week
* Style.kib2011-07-101-1/+1
| | | | MFC after: 3 days
* All the racct_*() calls need to happen with the proc locked. Fixing thistrasz2011-07-061-0/+12
| | | | | | won't happen before 9.0. This commit adds "#ifdef RACCT" around all the "PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order to avoid useless locking/unlocking in kernels built without "options RACCT".
* Add accounting for most of the memory-related resources.trasz2011-04-051-3/+42
| | | | | Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)
* Remove sysctl vm.max_proc_mmap used to protect from KVA space exhaustion.pluknet2011-02-241-39/+0
| | | | | | | | | | | | | | As it was pointed out by Alan Cox, that no longer serves its purpose with the modern UMA allocator compared to the old one used in 4.x days. The removal of sysctl eliminates max_proc_mmap type overflow leading to the broken mmap(2) seen with large amount of physical memory on arches with factually unbound KVA space (such as amd64). It was found that slightly less than 256GB of physmem was enough to trigger the overflow. Reviewed by: alc, kib Approved by: avg (mentor) MFC after: 2 months
* Fix comment intentation.trasz2010-12-041-8/+8
|
* Do not use __FreeBSD_version prefix for the special osrel version.kib2010-11-141-1/+1
| | | | | | | | The ports/Mk/bsd.port.mk uses sys/param.h to fetch osrel, and cannot grok several constants with the prefix. Reported and tested by: swell.k gmail com MFC after: 1 week
* Use symbolic names instead of hardcoding values for magic p_osrel constants.kib2010-11-141-1/+1
| | | | MFC after: 1 week
* Allow a POSIX shared memory object that is opened for read but not foralc2010-09-191-1/+2
| | | | | | | | | | | | write to nonetheless be mapped PROT_WRITE and MAP_PRIVATE, i.e., copy-on-write. (This is a regression in the new implementation of POSIX shared memory objects that is used by HEAD and RELENG_8. This bug does not exist in RELENG_7's user-level, file-based implementation.) PR: 150260 MFC after: 3 weeks
* Fix a typo in r212281. uintptr -> uintptr_trstone2010-09-071-1/+1
| | | | | | | Pointy hat to: rstone Approved by: emaste (mentor) MFC after: 2 weeks
* In munmap() downgrade the vm_map_lock to a read lock before taking a readrstone2010-09-071-3/+11
| | | | | | | | | | | | | | | | lock on the pmc-sx lock. This prevents a deadlock with pmc_log_process_mappings, which has an exclusive lock on pmc-sx and tries to get a read lock on a vm_map. Downgrading the vm_map_lock in munmap allows pmc_log_process_mappings to continue, preventing the deadlock. Without this change I could cause a deadlock on a multicore 8.1-RELEASE system by having one thread constantly mmap'ing and then munmap'ing a PROT_EXEC mapping in a loop while I repeatedly invoked and stopped pmcstat in system-wide sampling mode. Reviewed by: fabient Approved by: emaste (mentor) MFC after: 2 weeks
* Add the MAP_PREFAULT_READ option to mmap(2).alc2010-08-281-2/+3
| | | | Reviewed by: jhb, kib
* Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that createdkib2010-08-061-7/+7
| | | | | | | | | cdev will never be destroyed. Propagate the flag to devfs vnodes as VV_ETERNVALDEV. Use the flags to avoid acquiring devmtx and taking a thread reference on such nodes. In collaboration with: pho MFC after: 1 month
* Fix commented out resource limit check in mlockall(2). It's still racy,trasz2010-07-271-2/+1
| | | | but at least less misleading.
* Push down page queues lock acquisition in pmap_enter_object() andalc2010-05-261-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pmap_is_referenced(). Eliminate the corresponding page queues lock acquisitions from vm_map_pmap_enter() and mincore(), respectively. In mincore(), this allows some additional cases to complete without ever acquiring the page queues lock. Assert that the page is managed in pmap_is_referenced(). On powerpc/aim, push down the page queues lock acquisition from moea*_is_modified() and moea*_is_referenced() into moea*_query_bit(). Again, this will allow some additional cases to complete without ever acquiring the page queues lock. Reorder a few statements in vm_page_dontneed() so that a race can't lead to an old reference persisting. This scenario is described in detail by a comment. Correct a spelling error in vm_page_dontneed(). Assert that the object is locked in vm_page_clear_dirty(), and restrict the page queues lock assertion to just those cases in which the page is currently writeable. Add object locking to vnode_pager_generic_putpages(). This was the one and only place where vm_page_clear_dirty() was being called without the object being locked. Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call to vm_page_clear_dirty(). Change vnode_pager_generic_putpages() to the modern-style of function definition. Also, change the name of one of the parameters to follow virtual memory system naming conventions. Reviewed by: kib
* Roughly half of a typical pmap_mincore() implementation is machine-alc2010-05-241-26/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)
OpenPOWER on IntegriCloud