summaryrefslogtreecommitdiffstats
path: root/sys/ufs
Commit message (Collapse)AuthorAgeFilesLines
* When ffs_realloccg() failed to allocate bigger fragment and, becausekib2010-02-131-1/+3
| | | | | | | | | | | | pending blocks are scheduled for removal, goes to retry the (re)allocation, clear the bp pointer. It might happen that meantime free space is really exhausted and we are entering nospace: label without bread()ing buffer, causing stale bp value to be brelse()d again. Tested by: pho (Producing a scenario to reliably reproduce the race appeared to be much harder then fixing the bug) MFC after: 1 week
* One last pass to get all the unsigned comparisons correct.mckusick2010-02-111-1/+1
|
* This fix corrects a problem in the file system that treats largemckusick2010-02-102-58/+64
| | | | | | | | | | | | | | | inode numbers as negative rather than unsigned. For a default (16K block) file system, this bug began to show up at a file system size above about 16Tb. To fully handle this problem, newfs must be updated to ensure that it will never create a filesystem with more than 2^32 inodes. That patch will be forthcoming soon. Reported by: Scott Burns, John Kilburg, Bruce Evans Followup by: Jeff Roberson PR: 133980 MFC after: 2 weeks
* Remove unused variable.trasz2010-02-101-3/+2
|
* Return proper error code.trasz2010-01-251-1/+1
| | | | Found with: clang
* Move out code that does POSIX.1e ACL inheritance into separate routines.trasz2010-01-241-186/+171
| | | | Reviewed by: rwatson
* Cast 64-bit quantity to intptr_t rather than int so as to work properlymckusick2010-01-111-2/+2
| | | | | | with 64-bit architectures (such as amd64). Reported by: bz
* Background:mckusick2010-01-113-9/+133
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When renaming a directory it passes through several intermediate states. First its new name will be created causing it to have two names (from possibly different parents). Next, if it has different parents, its value of ".." will be changed from pointing to the old parent to pointing to the new parent. Concurrently, its old name will be removed bringing it back into a consistent state. When fsck encounters an extra name for a directory, it offers to remove the "extraneous hard link"; when it finds that the names have been changed but the update to ".." has not happened, it offers to rewrite ".." to point at the correct parent. Both of these changes were considered unexpected so would cause fsck in preen mode or fsck in background mode to fail with the need to run fsck manually to fix these problems. Fsck running in preen mode or background mode now corrects these expected inconsistencies that arise during directory rename. The functionality added with this update is used by fsck running in background mode to make these fixes. Solution: This update adds three new fsck sysctl commands to support background fsck in correcting expected inconsistencies that arise from incomplete directory rename operations. They are: setcwd(dirinode) - set the current directory to dirinode in the filesystem associated with the snapshot. setdotdot(oldvalue, newvalue) - Verify that the inode number for ".." in the current directory is oldvalue then change it to newvalue. unlink(nameptr, oldvalue) - Verify that the inode number associated with nameptr in the current directory is oldvalue then unlink it. As with all other fsck sysctls, these new ones may only be used by processes with appropriate priviledge. Reported by: jeff Security issues: rwatson
* Remove extraneous semicolons, no functional changes.mbr2010-01-071-1/+1
| | | | | Submitted by: Marc Balmer <marc@msys.ch> MFC after: 1 week
* KASSERT that condition raised by Coverity cannot happen.mckusick2010-01-071-0/+1
| | | | | Found by: Coverity Prevent (tm) KASSERT by: sam
* Implement NFSv4 ACL support for UFS.trasz2009-12-216-72/+541
| | | | Reviewed by: rwatson
* VI_OBJDIRTY vnode flag mirrors the state of OBJ_MIGHTBEDIRTY vm objectkib2009-12-211-7/+8
| | | | | | | | | | | | | flag. Besides providing the redundand information, need to update both vnode and object flags causes more acquisition of vnode interlock. OBJ_MIGHTBEDIRTY is only checked for vnode-backed vm objects. Remove VI_OBJDIRTY and make sure that OBJ_MIGHTBEDIRTY is set only for vnode-backed vm objects. Suggested and reviewed by: alc Tested by: pho MFC after: 3 weeks
* Don't build ufs_gjournal.c at all if UFS_GJOURNAL option is not givenrdivacky2009-09-221-4/+0
| | | | | | | instead of building an almost empty C file. Approved by: pjd Approved by: ed (mentor, implicit)
* Allocate space for the group array in a static credential used inbrooks2009-09-171-0/+2
| | | | | | | | | the quota code. One case was correctly handled in r194498, but this one was missed. PR: kern/138657 Tested by: PR submitter MFC after: 3 days
* Remove useless variable assignment.trasz2009-09-081-3/+0
|
* insmntque_stddtr() clears vp->v_data and resets vp->v_op tokib2009-09-071-0/+1
| | | | | | | | | dead_vnodeops before calling vgone(). Revert r189706 and corresponding part of the r186560. Noted and reviewed by: tegge Approved by: des (pseudofs part) MFC after: 3 days
* The clear_remove() and clear_inodedeps() call vn_start_write(NULL, &mp,kib2009-09-061-5/+21
| | | | | | | | | | | V_NOWAIT) on the non-busied mount point. Unmount might free ufs-specific mp data, causing ffs_vgetf() to access freed memory. Busy mountpoint before dropping softdep lk. Noted and reviewed by: tegge Tested by: pho MFC after: 1 week
* When a UFS node is truncated to the zero length, e.g. by explicitkib2009-08-141-1/+8
| | | | | | | | | | | | | | | | | truncate(2) call, or by being removed or truncated on open, either new softupdate freeblks structure is allocated to track the freed blocks of the node, or truncation is done syncronously when too many SU dependencies are accumulated. The decision does not take into account the allocated freeblks dependencies, allowing workloads that do huge amount of truncations to exhaust the kernel memory. Take the number of allocated freeblks into consideration for softdep_slowdown(). Reported by: pluknet gmail com Diagnosed and tested by: pho Approved by: re (rwatson) MFC after: 1 month
* Fix fpathconf(3) on fifos, in effect making ls(1) properlytrasz2009-07-021-0/+25
| | | | | | | | | display '+' on them. Taken from kern/125613, with cosmetic changes. PR: kern/125613 Submitted by: Jaakko Heinonen <jh at saunalahti dot fi> Approved by: re (kib)
* In vn_vget_ino() and their inline equivalents, mnt_ref() the mount pointkib2009-07-021-0/+2
| | | | | | | | | | | around the sequence that drop vnode lock and then busies the mount point. Not having vlocked node or direct reference to the mp allows for the forced unmount to proceed, making mp unmounted or reused. Tested by: pho Reviewed by: jeff Approved by: re (kensmith) MFC after: 2 weeks
* Don't panic on attempt to set ACL on a block device file.trasz2009-07-011-6/+6
| | | | | | | | | This is just a part of kern/125613. PR: kern/125613 Submitted by: Jaakko Heinonen <jh at saunalahti dot fi> Reviewed by: rwatson Approved by: re (kib)
* For SU mounts, softdep_fsync() might drop vnode lock, allowing otherkib2009-06-301-4/+25
| | | | | | | | | | threads to put dirty buffers on the vnode bufobj list. For regular files and synchronous fsync requests, check for the condition and restart the fsync vop if a new dirty buffer arrived. Tested by: pho Approved by: re (kensmith) MFC after: 1 month
* Softdep_fsync() may need to lock parent directory of the synced vnode.kib2009-06-301-0/+18
| | | | | | | | | Use inlined (due to FFSV_FORCEINSMQ) version of vn_vget_ino() to prevent mountpoint from being unmounted and freed while no vnodes are locked. Tested by: pho Approved by: re (kensmith) MFC after: 1 month
* Fix a bug reported by pho@ where one can induce a panic by decreasingsnb2009-06-251-1/+4
| | | | | | | | | | | | | | | vfs.ufs.dirhash_maxmem below the current amount of memory used by dirhash. When ufsdirhash_build() is called with the memory in use greater than dirhash_maxmem, it attempts to free up memory by calling ufsdirhash_recycle(). If successful in freeing enough memory, ufsdirhash_recycle() leaves the dirhash list locked. But at this point in ufsdirhash_build(), the list is not explicitly unlocked after the call(s) to ufsdirhash_recycle(). When we next attempt to lock the dirhash list, we will get a "panic: _mtx_lock_sleep: recursed on non-recursive mutex dirhash list". Tested by: pho Approved by: dwmalone (mentor) MFC after: 3 weeks
* Rework the credential code to support larger values of NGROUPS andbrooks2009-06-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.) The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc. Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care for the details of allocating groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for insuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search. Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error. Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity. Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867
* Keep dirhash tailq locked throughout the entirety of ufsdirhash_destroy() to fixsnb2009-06-171-11/+11
| | | | | | | | | | a potential race pointed out by pjd. Also use TAILQ_FOREACH_SAFE to iterate over dirhashes in ufsdirhash_lowmem(), so that we can continue iterating even after a dirhash is destroyed. Suggested by: pjd Tested by: pho Approved by: dwmalone (mentor)
* Do not use casts (int *)0 and (struct thread *)0 for the arguments ofkib2009-06-162-2/+2
| | | | | | | vn_rdwr, use NULL. Reviewed by: jhb MFC after: 1 week
* Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERICrwatson2009-06-052-2/+0
| | | | | | | | and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
* Add vm_lowmem event handler for dirhash. This will cause dirhashes to besnb2009-06-032-26/+109
| | | | | | | | | | | | | deleted when the system is low on memory. This ought to allow an increase to vfs.ufs.dirhash_maxmem on machines that have lots of memory, without degrading performance by having too much memory reserved for dirhash when other things need it. The default value for dirhash_maxmem is being kept at 2MB for now, though. This work was mostly done during the 2008 Google Summer of Code. Approved by: dwmalone (mentor), re MFC after: 3 months
* Handle lock recursion differenty by always checking against LO_RECURSABLEattilio2009-06-021-2/+2
| | | | | | instead the lock own flag itself. Tested by: pho
* Add hierarchical jails. A jail may further virtualize its environmentjamie2009-05-271-1/+0
| | | | | | | | | | | | | | | | | | | | | | by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)
* Make 'struct acl' larger, as required to support NFSv4 ACLs. Providetrasz2009-05-221-126/+162
| | | | | | compatibility interfaces in both kernel and libc. Reviewed by: rwatson
* Introduce vfs_bio_set_valid() and use it from ffs_realloccg(). Thisalc2009-05-171-8/+6
| | | | | | eliminates the misuse of vfs_bio_clrbuf() by ffs_realloccg(). In collaboration with: tegge
* Remove the thread argument from the FSD (File-System Dependent) parts ofattilio2009-05-114-20/+26
| | | | | | | | | | | | | | | | | the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavilly changed by this commit so thirdy parts modules needs to be recompiled. Bump __FreeBSD_version in order to signal such situation.
* Do not embed struct ucred into larger netcred parent structures.kan2009-05-091-1/+0
| | | | | | | | | | | | | Credential might need to hang around longer than its parent and be used outside of mnt_explock scope controlling netcred lifetime. Use separate reference-counted ucred allocated separately instead. While there, extend mnt_explock coverage in vfs_stdexpcheck and clean-up some unused declarations in new NFS code. Reported by: John Hickey PR: kern/133439 Reviewed by: dfr, kib
* Change the semantics of i_modrev/va_filerev to what is required forrmacklem2009-04-273-6/+6
| | | | | | | | | | | | | | | | the nfsv4 Change attribute. There are 2 changes: 1 - The value now changes on metadata changes as well as data modifications (incremented for IN_CHANGE instead of IN_UPDATE). 2 - It is now saved in spare space in the on-disk i-node so that it survives a crash. Since va_filerev is not passed out into user space, the only current use of va_filerev is in the nfs server, which uses it as the directory cookie verifier. Since this verifier is only passed back to the server by a client verbatim and then the server doesn't check it, changing the semantics should not break anything currently in FreeBSD. Reviewed by: bde Approved by: kib (mentor)
* In ufs_checkpath(), recheck that '..' still points to the inode withkib2009-04-203-41/+55
| | | | | | | | | | | | | | | | | | the same inode number after VFS_VGET() and relock of the vp. If '..' changed, redo the lookup. To reduce code duplication, move the code to read '..' dirent into the static helper function ufs_dir_dd_ino(). Supply the source inode number as an argument to ufs_checkpath() instead of the source inode itself. The inode is unlocked, thus it might be reclaimed, causing accesses to the freed memory. Use vn_vget_ino() to get the '..' vnode by its inode number, instead of directly code VFS_VGET() and relock, to properly busy the mount point while vp lock is dropped. Noted and reviewed by: tegge Tested by: pho MFC after: 1 month
* When verifying '..' after VFS_VGET() in ufs_lookup(), do not returnkib2009-04-191-15/+17
| | | | | | | | | | | | | error if '..' is still there but changed between lookup and check. Start relookup instead. Rename is supposed to change '..' reference atomically, so transient failures introduced by r191137 are wrong. While rearranging the code to allow lookup restart in ufs_lookup(), remove the comment that only distracts the reader. Noted and reviewed by: tegge Also reported by: pho MFC after: 1 month
* Use acl_alloc() and acl_free() instead of using uma(9) directly.trasz2009-04-181-19/+19
| | | | | | This will make switching to malloc(9) easier; also, it would be neccessary to add these routines if/when we implement variable-size ACLs.
* Verify that '..' still exists with the same inode number afterkib2009-04-161-9/+35
| | | | | | | | | | | | | | | VFS_VGET() has returned in ufs_lookup(). If the '..' lookup started immediately before the parent directory was removed, we might return either cleared or unrelated inode otherwise. Ufs_lookup() is split into new function ufs_lookup_() that either does lookup, or verifies that directory entry exists and references supplied inode number. Reviewed by: tegge Tested by: pho, Andreas Tobler <andreast-list fgznet ch> (previous version) MFC after: 1 month
* Remove VOP_LEASE and supporting functions. This hasn't been used sincerwatson2009-04-101-1/+0
| | | | | | | | | | | | | | the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease- related code and interfaces. Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd. Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon
* When removing or renaming snaphost, do not delve into request_cleanup().kib2009-04-041-1/+1
| | | | | | | | The later may need blocks from the underlying device that belongs to normal files, that should not be locked while snap lock is held. Reported and tested by: pho MFC after: 1 month
* Correct typo.kib2009-03-271-2/+2
| | | | Noted by: kensmith
* Fix two issues with bufdaemon, often causing the processes to hang inkib2009-03-161-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the "nbufkv" sleep. First, ffs background cg group block write requests a new buffer for the shadow copy. When ffs_bufwrite() is called from the bufdaemon due to buffers shortage, requesting the buffer deadlock bufdaemon. Introduce a new flag for getnewbuf(), GB_NOWAIT_BD, to request getblk to not block while allocating the buffer, and return failure instead. Add a flag argument to the geteblk to allow to pass the flags to getblk(). Do not repeat the getnewbuf() call from geteblk if buffer allocation failed and either GB_NOWAIT_BD is specified, or geteblk() is called from bufdaemon (or its helper, see below). In ffs_bufwrite(), fall back to synchronous cg block write if shadow block allocation failed. Since r107847, buffer write assumes that vnode owning the buffer is locked. The second problem is that buffer cache may accumulate many buffers belonging to limited number of vnodes. With such workload, quite often threads that own the mentioned vnodes locks are trying to read another block from the vnodes, and, due to buffer cache exhaustion, are asking bufdaemon for help. Bufdaemon is unable to make any substantial progress because the vnodes are locked. Allow the threads owning vnode locks to help the bufdaemon by doing the flush pass over the buffer cache before getnewbuf() is going to uninterruptible sleep. Move the flushing code from buf_daemon() to new helper function buf_do_flush(), that is called from getnewbuf(). The number of buffers flushed by single call to buf_do_flush() from getnewbuf() is limited by new sysctl vfs.flushbufqtarget. Prevent recursive calls to buf_do_flush() by marking the bufdaemon and threads that temporarily help bufdaemon by TDP_BUFNEED flag. In collaboration with: pho Reviewed by: tegge (previous version) Tested by: glebius, yandex ... MFC after: 3 weeks
* The non-modifying EA VOPs are executed with only shared vnode lock taken.kib2009-03-123-63/+94
| | | | | | | | | | | | | Provide a custom lock around initializing and tearing down EA area, to prevent both memory leaks and double-free of it. Count the number of EA area accessors. Lock protocol requires either holding exclusive vnode lock to modify i_ea_area, or shared vnode lock and owning IN_EA_LOCKED flag in i_flag. Noted by: YAMAMOTO, Taku <taku tackymt homeip net> Tested by: pho (previous version) MFC after: 2 weeks
* Do not double-free the struct inode when insmntque failed. Defaultkib2009-03-111-1/+0
| | | | | | | insmntque destructor reclaims the vnode, and ufs_reclaim frees the memory. Reviewed by: tegge MFC after: 3 days
* Add a new internal mount flag (MNTK_EXTENDED_SHARED) to indicate that ajhb2009-03-111-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | filesystem supports additional operations using shared vnode locks. Currently this is used to enable shared locks for open() and close() of read-only file descriptors. - When an ISOPEN namei() request is performed with LOCKSHARED, use a shared vnode lock for the leaf vnode only if the mount point has the extended shared flag set. - Set LOCKSHARED in vn_open_cred() for requests that specify O_RDONLY but not O_CREAT. - Use a shared vnode lock around VOP_CLOSE() if the file was opened with O_RDONLY and the mountpoint has the extended shared flag set. - Adjust md(4) to upgrade the vnode lock on the vnode it gets back from vn_open() since it now may only have a shared vnode lock. - Don't enable shared vnode locks on FIFO vnodes in ZFS and UFS since FIFO's require exclusive vnode locks for their open() and close() routines. (My recent MPSAFE patches for UDF and cd9660 already included this change.) - Enable extended shared operations on UFS, cd9660, and UDF. Submitted by: ups Reviewed by: pjd (ZFS bits) MFC after: 1 month
* Adjust some variables (mostly related to the buffer cache) that holdjhb2009-03-092-3/+3
| | | | | | | | | | | | | | | | | | | address space sizes to be longs instead of ints. Specifically, the follow values are now longs: runningbufspace, bufspace, maxbufspace, bufmallocspace, maxbufmallocspace, lobufspace, hibufspace, lorunningspace, hirunningspace, maxswzone, maxbcache, and maxpipekva. Previously, a relatively small number (~ 44000) of buffers set in kern.nbuf would result in integer overflows resulting either in hangs or bogus values of hidirtybuffers and lodirtybuffers. Now one has to overflow a long to see such problems. There was a check for a nbuf setting that would cause overflows in the auto-tuning of nbuf. I've changed it to always check and cap nbuf but warn if a user-supplied tunable would cause overflow. Note that this changes the ABI of several sysctls that are used by things like top(1), etc., so any MFC would probably require a some gross shims to allow for that. MFC after: 1 month
* Right now, when trying to unmount a device that's already gone,trasz2009-02-231-2/+2
| | | | | | | | | | | | msdosfs_unmount() and ffs_unmount() exit early after getting ENXIO. However, dounmount() treats ENXIO as a success and proceeds with unmounting. In effect, the filesystem gets unmounted without closing GEOM provider etc. Reviewed by: kib Approved by: rwatson (mentor) Tested by: dho Sponsored by: FreeBSD Foundation
* Refactor, moving error checking outside of thetrasz2009-02-231-7/+7
| | | | | | | | | | 'if (mp->mnt_flag & MNT_SOFTDEP)' conditional. No functional changes. Reviewed by: kib Approved by: rwatson (mentor) Tested by: pho Sponsored by: FreeBSD Foundation
OpenPOWER on IntegriCloud