summaryrefslogtreecommitdiffstats
path: root/sys/fs
Commit message (Collapse)AuthorAgeFilesLines
* Introduce a new lock, hostname_mtx, and use it to synchronize accessrwatson2008-07-051-0/+2
| | | | | | | | | | | | to global hostname and domainname variables. Where necessary, copy to or from a stack-local buffer before performing copyin() or copyout(). A few uses, such as in cd9660 and daemon_saver, remain under-synchronized and will require further updates. Correct a bug in which a failed copyin() of domainname would leave domainname potentially corrupted. MFC after: 3 weeks
* The uniqdosname() function takes char[12] as it third argument.kib2008-07-041-1/+1
| | | | | | | Found by: -fstack-protector Reported by: dougb Tested by: dougb, Rainer Hurling <rhurlin gwdg de> MFC after: 3 days
* Remove unused 'td' arguments from smbfs_hash_lock() andrwatson2008-07-011-9/+9
| | | | | | smbfs_hash_unlock(). MFC after: 3 days
* Get pointer to devfs_ruleset struct after garbage collection has beengonzo2008-06-221-3/+3
| | | | | | | | performed. Otherwise if ruleset is used by given mountpoint and is empty it's freed by devfs_ruleset_reap and pointer becomes bogus. Submitted by: Mateusz Guzik <mjguzik@gmail.com> PR: kern/124853
* Struct cdev is always the member of the struct cdev_priv. When devfskib2008-06-163-9/+10
| | | | | | | | | | | needed to promote cdev to cdev_priv, the si_priv pointer was followed. Use member2struct() to calculate address of the wrapping cdev_priv. Rename si_priv to __si_reserved. Tested by: pho Reviewed by: ed MFC after: 2 weeks
* Do not redo the vnode tear-down work already done by insmntque() whenkib2008-06-151-4/+1
| | | | | | | | vnode cannot be put on the vnode list for mount. Reported and tested by: marck Guilty party: me MFC after: 3 days
* Don't enforce unique device minor number policy anymore.ed2008-06-111-3/+3
| | | | | | | | | | | | | | | | | | | | | | | Except for the case where we use the cloner library (clone_create() and friends), there is no reason to enforce a unique device minor number policy. There are various drivers in the source tree that allocate unr pools and such to provide minor numbers, without using them themselves. Because we still need to support unique device minor numbers for the cloner library, introduce a new flag called D_NEEDMINOR. All cdevsw's that are used in combination with the cloner library should be marked with this flag to make the cloning work. This means drivers can now freely use si_drv0 to store their own flags and state, making it effectively the same as si_drv1 and si_drv2. We still keep the minor() and dev2unit() routines around to make drivers happy. The NTFS code also used the minor number in its hash table. We should not do this anymore. If the si_drv0 field would be changed, it would no longer end up in the same list. Approved by: philip (mentor)
* In cd9660_readdir vop, always initialize the idp->uio_off member.kib2008-06-111-0/+1
| | | | | | | | | | The while loop that is assumed to initialize the uio_off later, may be not entered at all, causing uninitialized value to be returned in uio->uio_offset. PR: 122925 Submitted by: Jaakko Heinonen <jh saunalahti fi> MFC after: 1 weeks
* When devfs_allocv() committed to create new vnode, since de_vnode is NULL,kib2008-06-051-1/+1
| | | | | | | | | | | the dm_lock is held while the newly allocated vnode is locked. Since no other threads may try to lock the new vnode yet, the LOR there cannot result in the deadlock. Shut down the witness warning to note this fact. Tested by: pho Prodded by: attilio
* Revert the changes I made to devfs_setattr() in r179457.ed2008-06-011-4/+3
| | | | | | | | | | | As discussed with Robert Watson and John Baldwin, it would be better if PTY's are created with proper permissions, turning grantpt() into a no-op. Bypassing security frameworks like MAC by passing NOCRED to VOP_SETATTR() will only make things more complex. Approved by: philip (mentor)
* Merge back devfs changes from the mpsafetty branch.ed2008-05-311-6/+6
| | | | | | | | | | | | | | | | | | | | | | | In the mpsafetty branch, PTY's are allocated through the posix_openpt() system call. The controller side of a PTY now uses its own file descriptor type (just like sockets, vnodes, pipes, etc). To remain compatible with existing FreeBSD and Linux C libraries, we can still create PTY's by opening /dev/ptmx or /dev/ptyXX. These nodes implement d_fdopen(). Devfs has been slightly changed here, to allow finit() to be called from d_fdopen(). The routine grantpt() has also been moved into the kernel. This routine is a little odd, because it needs to bypass standard UNIX permissions. It needs to change the owner/group/mode of the slave device node, which may often not be possible. The old implementation solved this by spawning a setuid utility. When VOP_SETATTR() is called with NOCRED, devfs_setattr() dereferences ap->a_cred, causing a kernel panic. Change the de_{uid,gid,mode} code to allow changes when a->a_cred is set to NOCRED. Approved by: philip (mentor)
* - Add locking to all filesystem operations in fdescfs and flag it as MPSAFE.lulf2008-05-243-84/+199
| | | | | | | | | | | | | | | | | - Use proper synhronization primitives to protect the internal fdesc node cache used in fdescfs. - Properly initialize and uninitalize hash. - Remove unused functions. Since fdescfs might recurse on itself, adding proper locking to it needed some tricky workarounds in some parts to make it work. For instance, a descriptor in fdescfs could refer to an open descriptor to itself, thus forcing the thread to recurse on vnode locks. Because of this, other race conditions also had to be fixed. Tested by: pho Reviewed by: kib (mentor) Approved by: kib (mentor)
* When vget() fails (because the vnode has been reclaimed), there is nokib2008-05-231-3/+4
| | | | | | | | | | sense to loop trying to vget() the vnode again. PR: 122977 Submitted by: Arthur Hartwig <arthur.hartwig nokia com> Tested by: pho Reviewed by: jhb MFC after: 1 week
* Implement the per-open file data for the cdev.kib2008-05-212-1/+120
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch does not change the cdevsw KBI. Management of the data is provided by the functions int devfs_set_cdevpriv(void *priv, cdevpriv_dtr_t dtr); int devfs_get_cdevpriv(void **datap); void devfs_clear_cdevpriv(void); All of the functions are supposed to be called from the cdevsw method contexts. - devfs_set_cdevpriv assigns the priv as private data for the file descriptor which is used to initiate currently performed driver operation. dtr is the function that will be called when either the last refernce to the file goes away, the device is destroyed or devfs_clear_cdevpriv is called. - devfs_get_cdevpriv is the obvious accessor. - devfs_clear_cdevpriv allows to clear the private data for the still open file. Implementation keeps the driver-supplied pointers in the struct cdev_privdata, that is referenced both from the struct file and struct cdev, and cannot outlive any of the referee. Man pages will be provided after the KPI stabilizes. Reviewed by: jhb Useful suggestions from: jeff, antoine Debugging help and tested by: pho MFC after: 1 month
* Fix and speedup timestamp calculations which is roughly based on the patch inmarkus2008-05-161-22/+34
| | | | | | | | | | | | | | | | | | | | the mentioned PR: - bounds check time->month as it is used as an array index - fix usage of time->month as array index (month is 1-12) - fix calculation based on time->day (day is 1-31) - fix the speedup code as it doesn't calculate correct timestamps before the year 2000 and reduce the number of calculation in the year-by-year code - speedup month calculations by replacing the array content with cumulative values - add microseconds calculation - fix an endian problem PR: kern/97786 Submitted by: Andriy Gapon <avg@topspin.kiev.ua> Reviewed by: scottl (earlier version) Approved by: emax (mentor) MFC after: 1 week
* lockinit() can't accept LK_EXCLUSIVE as an initializaiton flag, so justattilio2008-05-151-1/+1
| | | | | | | drop it. Reported by: Josh Carroll <josh dot carroll at gmail dot com> Submitted by: jhb
* Don't explicitly drop Giant around d_open/d_fdopen/d_close for MPSAFEjhb2008-05-071-20/+5
| | | | | | | | drivers. Since devfs is already marked MPSAFE it shouldn't be held anyway. MFC after: 2 weeks Discussed with: phk
* - change function name from *_vdir to *_vnode becausedaichi2008-05-071-21/+33
| | | | | | | | | | VSOCK has been added as cache target. Now they process not only VDIR but also VSOCK. - fixed panic issue caused by cache incorrect free process by "umount -f" Submitted by: Masanori OZAWA <ozawa@ongs.co.jp> MFC after: 1 week
* o Fixed multi thread access issue reported by Alexander V. Chernikovdaichi2008-04-253-12/+13
| | | | | | | | | | (admin@su29.net) fixed: kern/109950 PR: kern/109950 Submitted by: Alexander V. Chernikov (admin@su29.net) Reviewed by: Masanori OZAWA (ozawa@ongs.co.jp) MFC after: 1 week
* o Improved unix socket connection issuedaichi2008-04-251-13/+28
| | | | | | | | fixed: kern/118346 PR: kern/118346 Submitted by: Masanori OZAWA (ozawa@ongs.co.jp) MFC after: 1 week
* o Fixed rename panic issuedaichi2008-04-251-11/+14
| | | | | Submitted by: Masanori OZAWA (ozawa@ongs.co.jp) MFC after: 1 week
* o Fixed inaccessible issue especially including devfs on unionfs case.daichi2008-04-252-8/+187
| | | | | | | | fixed also: kern/117829 PR: kern/117829 Submitted by: Masanori OZAWA (ozawa@ongs.co.jp) MFC after: 1 week
* o Added system hang-up process when VOP_READDIR of unionfs_nodeget()daichi2008-04-251-1/+7
| | | | | | | | returns not end of the file status on debug mode (DIAGNOSTIC defined) kernel. Submitted by: Masanori OZAWA (ozawa@ongs.co.jp) MFC after: 1 week
* Move the head of byte-level advisory lock list from thekib2008-04-167-77/+4
| | | | | | | | | | | | | | | | | | | | | | filesystem-specific vnode data to the struct vnode. Provide the default implementation for the vop_advlock and vop_advlockasync. Purge the locks on the vnode reclaim by using the lf_purgelocks(). The default implementation is augmented for the nfs and smbfs. In the nfs_advlock, push the Giant inside the nfs_dolock. Before the change, the vop_advlock and vop_advlockasync have taken the unlocked vnode and dereferenced the fs-private inode data, racing with with the vnode reclamation due to forced unmount. Now, the vop_getattr under the shared vnode lock is used to obtain the inode size, and later, in the lf_advlockasync, after locking the vnode interlock, the VI_DOOMED flag is checked to prevent an operation on the doomed vnode. The implementation of the lf_purgelocks() is submitted by dfr. Reported by: kris Tested by: kris, pho Discussed with: jeff, dfr MFC after: 2 weeks
* When calling lf_advlock to unlock a record, make sure that ap->a_fl->l_typedfr2008-04-141-0/+3
| | | | | | is F_UNLCK otherwise we trigger a LOCKF_DEBUG panic. MFC after: 3 days
* Optimize lockmgr in order to get rid of the pool mutex interlock, of theattilio2008-04-061-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | state transitioning flags and of msleep(9) callings. Use, instead, an algorithm very similar to what sx(9) and rwlock(9) alredy do and direct accesses to the sleepqueue(9) primitive. In order to avoid writer starvation a mechanism very similar to what rwlock(9) uses now is implemented, with the correspective per-thread shared lockmgrs counter. This patch also adds 2 new functions to lockmgr KPI: lockmgr_rw() and lockmgr_args_rw(). These two are like the 2 "normal" versions, but they both accept a rwlock as interlock. In order to realize this, the general lockmgr manager function "__lockmgr_args()" has been implemented through the generic lock layer. It supports all the blocking primitives, but currently only these 2 mappers live. The patch drops the support for WITNESS atm, but it will be probabilly added soon. Also, there is a little race in the draining code which is also present in the current CVS stock implementation: if some sharers, once they wakeup, are in the runqueue they can contend the lock with the exclusive drainer. This is hard to be fixed but the now committed code mitigate this issue a lot better than the (past) CVS version. In addition assertive KA_HELD and KA_UNHELD have been made mute assertions because they are dangerous and they will be nomore supported soon. In order to avoid namespace pollution, stack.h is splitted into two parts: one which includes only the "struct stack" definition (_stack.h) and one defining the KPI. In this way, newly added _lockmgr.h can just include _stack.h. Kernel ABI results heavilly changed by this commit (the now committed version of "struct lock" is a lot smaller than the previous one) and KPI results broken by lockmgr_rw() / lockmgr_args_rw() introduction, so manpages and __FreeBSD_version will be updated accordingly. Tested by: kris, pho, jeff, danger Reviewed by: jeff Sponsored by: Google, Summer of Code program 2007
* The temporary workaround for the call to the vget() without lock type inkib2008-04-041-1/+3
| | | | | | | | | the fdesc_allocvp(). The caller of the fdesc_allocvp() expects that the returned vnode is not reclaimed. Do lock the vnode exclusive and drop the lock after. Reported by: pho Reviewed by: jeff
* Add the support for the AT_FDCWD and fd-relative name lookups to thekib2008-03-314-0/+4
| | | | | | | | | namei(9). Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho
* - Simplify null_hashget() and null_hashins() by using vref() ratherjeff2008-03-291-21/+4
| | | | | than a complex series of steps involving vget() without a lock type to emulate the same thing.
* Add the new kernel-mode NFS Lock Manager. To use it instead of thedfr2008-03-262-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | user-mode lock manager, build a kernel with the NFSLOCKD option and add '-k' to 'rpc_lockd_flags' in rc.conf. Highlights include: * Thread-safe kernel RPC client - many threads can use the same RPC client handle safely with replies being de-multiplexed at the socket upcall (typically driven directly by the NIC interrupt) and handed off to whichever thread matches the reply. For UDP sockets, many RPC clients can share the same socket. This allows the use of a single privileged UDP port number to talk to an arbitrary number of remote hosts. * Single-threaded kernel RPC server. Adding support for multi-threaded server would be relatively straightforward and would follow approximately the Solaris KPI. A single thread should be sufficient for the NLM since it should rarely block in normal operation. * Kernel mode NLM server supporting cancel requests and granted callbacks. I've tested the NLM server reasonably extensively - it passes both my own tests and the NFS Connectathon locking tests running on Solaris, Mac OS X and Ubuntu Linux. * Userland NLM client supported. While the NLM server doesn't have support for the local NFS client's locking needs, it does have to field async replies and granted callbacks from remote NLMs that the local client has contacted. We relay these replies to the userland rpc.lockd over a local domain RPC socket. * Robust deadlock detection for the local lock manager. In particular it will detect deadlocks caused by a lock request that covers more than one blocking request. As required by the NLM protocol, all deadlock detection happens synchronously - a user is guaranteed that if a lock request isn't rejected immediately, the lock will eventually be granted. The old system allowed for a 'deferred deadlock' condition where a blocked lock request could wake up and find that some other deadlock-causing lock owner had beaten them to the lock. * Since both local and remote locks are managed by the same kernel locking code, local and remote processes can safely use file locks for mutual exclusion. Local processes have no fairness advantage compared to remote processes when contending to lock a region that has just been unlocked - the local lock manager enforces a strict first-come first-served model for both local and remote lockers. Sponsored by: Isilon Systems PR: 95247 107555 115524 116679 MFC after: 2 weeks
* - Complete part of the unfinished bufobj work by consistently usingjeff2008-03-221-0/+4
| | | | | | | | | | | | | | | | | BO_LOCK/UNLOCK/MTX when manipulating the bufobj. - Create a new lock in the bufobj to lock bufobj fields independently. This leaves the vnode interlock as an 'identity' lock while the bufobj is an io lock. The bufobj lock is ordered before the vnode interlock and also before the mnt ilock. - Exploit this new lock order to simplify softdep_check_suspend(). - A few sync related functions are marked with a new XXX to note that we may not properly interlock against a non-zero bv_cnt when attempting to sync all vnodes on a mountlist. I do not believe this race is important. If I'm wrong this will make these locations easier to find. Reviewed by: kib (earlier diff) Tested by: kris, pho (earlier diff)
* Do not dereference cdev->si_cdevsw, use the dev_refthread() to properlykib2008-03-201-5/+12
| | | | | | | | obtain the reference. In particular, this fixes the panic reported in the PR. Remove the comments stating that this needs to be done. PR: kern/119422 MFC after: 1 week
* Remove kernel support for M:N threading.jeff2008-03-124-21/+10
| | | | | | | | While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
* Replace lockmgr lock protecting nwfs vnode hash table with an sx lock.rwatson2008-03-021-12/+15
| | | | MFC after: 1 month
* Replace lockmgr lock protecting smbfs node hash table with sx lock.rwatson2008-03-023-9/+10
| | | | MFC after: 1 month
* - Handle buffer lock waiters count directly in the buffer cache insteadattilio2008-03-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | than rely on the lockmgr support [1]: * bump the waiters only if the interlock is held * let brelvp() return the waiters count * rely on brelvp() instead than BUF_LOCKWAITERS() in order to check for the waiters number - Remove a namespace pollution introduced recently with lockmgr.h including lock.h by including lock.h directly in the consumers and making it mandatory for using lockmgr. - Modify flags accepted by lockinit(): * introduce LK_NOPROFILE which disables lock profiling for the specified lockmgr * introduce LK_QUIET which disables ktr tracing for the specified lockmgr [2] * disallow LK_SLEEPFAIL and LK_NOWAIT to be passed there so that it can only be used on a per-instance basis - Remove BUF_LOCKWAITERS() and lockwaiters() as they are no longer used This patch breaks KPI so __FreBSD_version will be bumped and manpages updated by further commits. Additively, 'struct buf' changes results in a disturbed ABI also. [2] Really, currently there is no ktr tracing in the lockmgr, but it will be added soon. [1] Submitted by: kib Tested by: pho, Andrea Barberio <insomniac at slackware dot it>
* Rename fdescfs vnode from "fdesc" to "fdescfs" to avoid name collisionkib2008-02-261-1/+1
| | | | | of the vnode lock with the fdesc_mtx mutex. Having different kinds of locks with the same name confuses witness.
* Add "Make MPSAFE" to the Coda todo list.rwatson2008-02-261-0/+1
| | | | MFC after: 3 days
* Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it isattilio2008-02-258-41/+40
| | | | | | | | | always curthread. As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits. Tested by: Andrea Barberio <insomniac at slackware dot it>
* Introduce some functions in the vnode locks namespace and in the ffsattilio2008-02-243-3/+3
| | | | | | | | | | | | | | | namespace in order to handle lockmgr fields in a controlled way instead than spreading all around bogus stubs: - VN_LOCK_AREC() allows lock recursion for a specified vnode - VN_LOCK_ASHARE() allows lock sharing for a specified vnode In FFS land: - BUF_AREC() allows lock recursion for a specified buffer lock - BUF_NOREC() disallows recursion for a specified buffer lock Side note: union_subr.c::unionfs_node_update() is the only other function directly handling lockmgr fields. As this is not simple to fix, it has been left behind as "sole" exception.
* Don't check the bpbSecPerTrack and bpbHeads fields of the BPB.marcel2008-02-211-8/+7
| | | | | They are typically 0 on new ia64 systems. Since we don't use either field, there's no harm in not checking.
* Remove custom queue macros in Coda, replacing them with queue(9) tailqrwatson2008-02-174-105/+30
| | | | | | | | macros. The only semantic change was the need to add a vc_opened field to struct vcomm since we can no longer use the request queue returning to an uninitialized state to hold whether or not the device is open. MFC after: 1 month
* Remove namecache performance-tuning todo for Coda: we now use the FreeBSDrwatson2008-02-171-1/+0
| | | | | | name cache. MFC after: 1 month
* The possibly interruptible msleep in coda_call() means well, but isrwatson2008-02-151-1/+1
| | | | | | | | | | | | fundamentally fairly confused about how signals work and when it is appropriate for upcalls to be interrupted. In particular, we should be exempting certain upcalls from interruption, we should not always eventually time out sleeping on a upcall, and we should not be interrupting the sleep for certain signals that we currently are (including SIGINFO). This code needs to be reworked in the style of NFS interruptible mounts. MFC after: 1 month
* Spell replys as replies.rwatson2008-02-152-8/+8
| | | | MFC after: 1 month
* Reorder and clean up make_coda_node(), annotate weaknesses in therwatson2008-02-151-20/+25
| | | | | | implementation. MFC after: 1 month
* Remove debugging code under OLD_DIAGNOSTIC; this is all >10 years old andrwatson2008-02-142-32/+3
| | | | | | hasn't been used in that time. MFC after: 1 month
* In Coda, flush the attribute cache for a cnode when its fid isrwatson2008-02-141-1/+4
| | | | | | | changed, as its synthesized inode number may have changed and we want stat(2) to pick up the new inode number. MFC after: 1 month
* Update cache flushing behavior in light of recent namecache andrwatson2008-02-131-7/+0
| | | | | | | | | | | | | access cache improvements: - Flush just access control state on CODA_PURGEUSER, not the full namecache for /coda. - When replacing a fid on a cnode as a result of, e.g., reintegration after offline operation, we no longer need to purge the namecache entries associated with its vnode. MFC after: 1 month
* Implement a rudimentary access cache for the Coda kernel module,rwatson2008-02-133-28/+117
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | modeled on the access cache found in NFS, smbfs, and the Linux coda module. This is a positive access cache of a single entry per file, tracking recently granted rights, but unlike NFS and smbfs, supporting explicit invalidation by the distributed file system. For each cnode, maintain a C_ACCCACHE flag indicating the validity of the cache, and a cached uid and mode tracking recently granted positive access control decisions. Prefer the cache to venus_access() in VOP_ACCESS() if it is valid, and when we must fall back to venus_access(), update the cache. Allow Venus to clear the access cache, either the whole cache on CODA_FLUSH, or just entries for a specific uid on CODA_PURGEUSER. Unlike the Coda module on Linux, we don't flush all entries on a user purge using a generation number, we instead walk present cnodes and clear only entries for the specific user, meaning it is somewhat more expensive but won't hit all users. Since the Coda module is agressive about not keeping around unopened cnodes, the utility of the cache is somewhat limited for files, but works will for directories. We should make Coda less agressive about GCing cnodes in VOP_INACTIVE() in order to improve the effectiveness of in-kernel caching of attributes and access rights. MFC after: 1 month
OpenPOWER on IntegriCloud