path: root/sys/fs
Commit message  Author  Age  Files  Lines
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c andrwatson2009-08-011-1/+0
| | | | | | | | | | vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)
* Fix some LORs between vnode locks and filedescriptor table locks.jhb2009-07-311-2/+0
| | | | | | | | | | - Don't grab the filedesc lock just to read fd_cmask. - Drop vnode locks earlier when mounting the root filesystem and before sanitizing stdin/out/err file descriptors during execve(). Submitted by: kib Approved by: re (rwatson) MFC after: 1 week
* Fix the experimental nfs client so that it only calls ncl_vinvalbuf()rmacklem2009-07-291-5/+11
| | | | | | | | | for NFSv2 and not NFSv4 when nfscl_mustflush() returns 0. Since nfscl_mustflush() only returns 0 when there is a valid write delegation issued to the client, it only affects the case of an NFSv4 mount with callbacks/delegations enabled. Approved by: re (kensmith), kib (mentor)
* Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar tojhb2009-07-241-0/+1
| | | | | | | | | | | a device pager (OBJT_DEVICE) object in that it uses fictitious pages to provide aliases to other memory addresses. The primary difference is that it uses an sglist(9) to determine the physical addresses for a given offset into the object instead of invoking the d_mmap() method in a device driver. Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks
* When vfs.newnfs.callback_addr is set to an IPv4 address, thermacklem2009-07-221-1/+1
| | | | | | | | experimental NFSv4 client might try and use it as an IPv6 address, breaking callbacks. The fix simply initializes the isinet6 variable for this case. Approved by: re (kensmith), kib (mentor)
* Add changes to the experimental nfs client to use the PBDRY flag forrmacklem2009-07-223-7/+14
| | | | | | | | msleep(9) when a vnode lock or similar may be held. The changes are just a clone of the changes applied to the regular nfs client by r195703. Approved by: re (kensmith), kib (mentor)
* When using an NFSv4 mount in the experimental nfs client with delegationsrmacklem2009-07-221-1/+1
| | | | | | | | | | | being issued from the server, there was a case where an Open issued locally based on the delegation would be released before the associated vnode became inactive. If the delegation was recalled after the open was released, an Open against the server would not have been acquired and subsequent I/O operations would need to use the special stateid of all zeros. This patch fixes that case. Approved by: re (kensmith), kib (mentor)
* Fix two bugs in the experimental nfs client:rmacklem2009-07-191-13/+7
| | | | | | | | | | | | | | | | | | | - When the root vnode was acquired during mounting, mnt_stat.f_iosize was still set to 0, so getnewvnode() would set bo_bsize == 0. This would confuse getblk(), so that it always returned the first block causing the problem when the root directory of the mount point was greater than one block in size. It was fixed by setting mnt_stat.f_iosize to NFS_DIRBLKSIZ before calling ncl_nget() to acquire the root vnode. - NFSMNT_INT was being set temporarily while the initial connect to a server was being done. This erroneously configured the krpc for interruptible RPCs, which caused problems because signals weren't being masked off as they would have been for interruptible mounts. This code was deleted to fix the problem. Since mount_nfs does an NFS null RPC before the mount system call, connections to the server should work ok. Tested by: swell dot k at gmail dot com Approved by: re (kensmith), kib (mentor)
* Fix the experimental nfs client so that it does not cause armacklem2009-07-141-1/+2
| | | | | | | | | "share->excl" panic when doing a lookup of dotdot at the root of a server's file system. The patch avoids calling vn_lock() for that case, since nfscl_nget() has already acquired a lock for the vnode. Approved by: re (kensmith), kib (mentor)
* Build on Jeff Roberson's linker-set based dynamic per-CPU allocatorrwatson2009-07-143-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing them in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. 
Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
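The "reference copy plus per-instance copy plus accessor macro" idea from the commit above can be approximated in userland. This is only a rough sketch under stated assumptions: `toy_globals`, `toy_vnet`, and `TOY_VNET` are hypothetical names, a plain struct stands in for the linker-set region, and the accessor is hard-wired to `int` fields. The real kernel mechanism is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the linker-set region: every "virtualized
 * global" is a field, and toy_reference is the statically initialized
 * reference copy.
 */
struct toy_globals {
	int default_ttl;
	int forwarding;
};

static const struct toy_globals toy_reference = { 64, 0 };

/* One toy_vnet per (hypothetical) network stack instance. */
struct toy_vnet {
	unsigned char *base;	/* private copy of the reference region */
};

/* Create an instance by copying the reference region. Returns 0 or -1. */
int
toy_vnet_create(struct toy_vnet *v)
{
	v->base = malloc(sizeof(toy_reference));
	if (v->base == NULL)
		return (-1);
	memcpy(v->base, &toy_reference, sizeof(toy_reference));
	return (0);
}

void
toy_vnet_destroy(struct toy_vnet *v)
{
	free(v->base);
	v->base = NULL;
}

/*
 * Accessor: translate (instance, symbol) into the per-instance address
 * by adding the symbol's offset in the reference region to the base.
 * This sketch only supports int fields.
 */
#define TOY_VNET(v, field) \
	(*(int *)((v)->base + offsetof(struct toy_globals, field)))
```

Writing `TOY_VNET(&a, default_ttl) = 32;` changes only instance `a`; a second instance keeps the statically initialized value 64 copied from the reference region.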
* Add calls to the experimental nfs client for the case of an "intr" mount,rmacklem2009-07-121-1/+16
| | | | | | | so that signals that aren't supposed to terminate RPCs in progress are masked off during the RPC. Approved by: re (kensmith), kib (mentor)
* Fix the handling of dotdot in lookup for the experimental nfs clientrmacklem2009-07-121-0/+2
| | | | | | in a manner analogous to the change in r195294 for the regular nfs client. Approved by: re (kensmith), kib (mentor)
* Since the nfscl_getclose() function both decremented open counts and,rmacklem2009-07-093-183/+178
| | | | | | | | | | | | | optionally, created a separate list of NFSv4 opens to be closed, it was possible for the associated OpenOwner to be free'd before the Open was closed. The problem was that the Open was taken off the OpenOwner list before the Close RPC was done and OpenOwners can be free'd once the list is empty. This patch separates out the case of doing the Close RPC into a separate function called nfscl_doclose() and simplifies nfsrpc_doclose() so that it closes a single open instead of a list of them. This avoids removing the Open from the OpenOwner list before doing the Close RPC. Approved by: re (kensmith), kib (mentor)
* Fix poll(2) and select(2) for named pipes to return "ready for read"kib2009-07-071-20/+14
| | | | | | | | | | | | | | | | | when all writers, observed by reader, exited. Use writer generation counter for fifo, and store the snapshot of the fifo generation in the f_seqcount field of struct file, that is otherwise unused for fifos. Set FreeBSD-undocumented POLLINIGNEOF flag only when file f_seqcount is equal to fifo's fi_wgen, and revert r89376. Fix POLLINIGNEOF for sockets and pipes, and return POLLHUP for them. Note that the patch does not fix not returning POLLHUP for fifos. PR: kern/94772 Submitted by: bde (original version) Reviewed by: rwatson, jilles Approved by: re (kensmith) MFC after: 6 weeks (might be)
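A toy model may help in reading the generation-counter scheme described above. It is a sketch, not the kernel code: all `toy_*` names are hypothetical, buffered data is ignored, and the point at which the generation is bumped (here, when a writer opens) is an assumption, since the log does not say where the real code increments `fi_wgen`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fifo state: current writer count and writer generation. */
struct toy_fifo {
	int writers;
	int wgen;
};

/* Hypothetical per-open reader state: generation seen at open time. */
struct toy_reader {
	int seen_gen;
};

/* A reader snapshots the generation when it opens the fifo. */
void
toy_reader_open(struct toy_reader *r, const struct toy_fifo *f)
{
	r->seen_gen = f->wgen;
}

/* Assumption of this sketch: the generation changes when a writer opens. */
void
toy_writer_open(struct toy_fifo *f)
{
	f->writers++;
	f->wgen++;
}

void
toy_writer_close(struct toy_fifo *f)
{
	assert(f->writers > 0);
	f->writers--;
}

/*
 * Report "ready for read" because of EOF only if this reader has
 * observed at least one writer (snapshot differs from the current
 * generation) and no writers remain.
 */
bool
toy_poll_eof_ready(const struct toy_reader *r, const struct toy_fifo *f)
{
	bool ignore_eof = (r->seen_gen == f->wgen);

	return (f->writers == 0 && !ignore_eof);
}
```

In this model a reader that opens an idle fifo is not woken for EOF, while a reader that saw a writer come and go is.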
* In vn_vget_ino() and their inline equivalents, mnt_ref() the mount pointkib2009-07-021-0/+2
| | | | | | | | | | | around the sequence that drops the vnode lock and then busies the mount point. Not having vlocked node or direct reference to the mp allows for the forced unmount to proceed, making mp unmounted or reused. Tested by: pho Reviewed by: jeff Approved by: re (kensmith) MFC after: 2 weeks
* Change the type of uio_resid member of struct uio from int to ssize_t.kib2009-06-253-5/+5
| | | | | | | | Note that this does not actually enable full-range i/o requests for 64-bit architectures, and is done now to update KBI only. Tested by: pho Reviewed by: jhb, bde (as part of the review of the bigger patch)
* Add a new global rwlock, in_ifaddr_lock, which will synchronize use of therwatson2009-06-251-0/+4
| | | | | | | | | | | | | | | | | | | in_ifaddrhead and INADDR_HASH address lists. Previously, these lists were used unsynchronized as they were effectively never changed in steady state, but we've seen increasing reports of writer-writer races on very busy VPN servers as core count has gone up (and similar configurations where address lists change frequently and concurrently). For the time being, use rwlocks rather than rmlocks in order to take advantage of their better lock debugging support. As a result, we don't enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion is complete and a performance analysis has been done. This means that one class of reader-writer races still exists. MFC after: 6 weeks Reviewed by: bz
* Implement global and per-uid accounting of the anonymous memory. Addkib2009-06-232-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved for the uid. The accounting information (charge) is associated with either map entry, or vm object backing the entry, assuming the object is the first one in the shadow chain and entry does not require COW. Charge is moved from entry to object on allocation of the object, e.g. during the mmap, assuming the object is allocated, or on the first page fault on the entry. It moves back to the entry on forks due to COW setup. The per-entry granularity of accounting makes the charge process fair for processes that change uid during lifetime, and decrements charge for proper uid when region is unmapped. The interface of vm_pager_allocate(9) is extended by adding struct ucred *, that is used to charge appropriate uid when allocation is performed by kernel, e.g. md(4). Several syscalls, among them fork(2), may now return ENOMEM when global or per-uid limits are enforced. In collaboration with: pho Reviewed by: alc Approved by: re (kensmith)
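The charge-and-release bookkeeping behind a per-uid reservation limit can be sketched in a few lines. This is a simplified illustration only: `toy_uidinfo`, `toy_swap_reserve`, and `toy_swap_release` are hypothetical names, only the per-uid limit is modeled (no global limit), and the movement of the charge between map entries and VM objects is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-uid accounting record. Invariant: reserved <= limit. */
struct toy_uidinfo {
	size_t reserved;	/* bytes currently charged to this uid */
	size_t limit;		/* stand-in for the RLIMIT_SWAP value */
};

/*
 * Charge "bytes" to the uid. Returns 0 on success, or ENOMEM when the
 * reservation would exceed the limit; the charge is then left unchanged.
 */
int
toy_swap_reserve(struct toy_uidinfo *ui, size_t bytes)
{
	/* Compare against the remaining headroom to avoid overflow. */
	if (bytes > ui->limit - ui->reserved)
		return (ENOMEM);
	ui->reserved += bytes;
	return (0);
}

/* Return a previous charge, for example when a region is unmapped. */
void
toy_swap_release(struct toy_uidinfo *ui, size_t bytes)
{
	assert(bytes <= ui->reserved);
	ui->reserved -= bytes;
}
```

A caller that receives ENOMEM from `toy_swap_reserve()` would propagate it upward, which matches the note above that syscalls such as fork(2) may now fail with ENOMEM.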
* Add explicit struct ucred * argument for VOP_VPTOCNP, to be used bykib2009-06-211-1/+2
| | | | | | | | | | vn_open_cred in default implementation. Valid struct ucred is needed for audit and MAC, and curthread credentials may be wrong. This further requires modifying the interface of vn_fullpath(9), but it is out of scope of this change. Reviewed by: rwatson
* In non-debugging mode make this define (void)0 instead of nothing. Thisrdivacky2009-06-211-3/+3
| | | | | | | | | | helps to catch bugs like the below with clang. if (cond); <--- note the trailing ; something(); Approved by: ed (mentor) Discussed on: current@
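The idea can be illustrated outside the kernel. The sketch below uses a hypothetical macro `TOY_TRACE`, since the log does not name the macro that was changed. The assumed reasoning is that compilers such as clang can warn about an `if` with an empty body: when a disabled macro expands to nothing, every legitimate `if (cond) MACRO();` also has an empty body, whereas `(void)0` leaves a real statement there, so the warning stays meaningful for genuine stray semicolons.

```c
#include <assert.h>
#include <stdio.h>

/*
 * TOY_TRACE is a hypothetical debug macro used only for illustration.
 * With TOY_DEBUG undefined it expands to the expression (void)0, so
 * "if (cond) TOY_TRACE(...);" still has a statement as its body.
 */
#ifdef TOY_DEBUG
#define TOY_TRACE(msg)	fprintf(stderr, "%s\n", (msg))
#else
#define TOY_TRACE(msg)	((void)0)
#endif

/* Count the items greater than limit, tracing each one in debug builds. */
int
toy_count_above(const int *items, int n, int limit)
{
	int i, count = 0;

	assert(items != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (items[i] > limit)
			TOY_TRACE("item above limit");
		if (items[i] > limit)
			count++;
	}
	return (count);
}
```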
* Replace RPCAUTH_UNIXGIDS with NFS_MAXGRPS so that nfscbd.c will build.rmacklem2009-06-202-3/+3
| | | | Approved by: kib (mentor)
* Improve nested jail awareness of devfs by handling credentials.ed2009-06-201-0/+22
| | | | | | | | | | | | | | | | | | Now that we start to use credentials on character devices more often (because of MPSAFE TTY), move the prison-checks that are in place in the TTY code into devfs. Instead of strictly comparing the prisons, use the more common prison_check() function to compare credentials. This means that pseudo-terminals are only visible in devfs by processes within the same jail and parent jails. Even though regular users in parent jails can now interact with pseudo-terminals from child jails, this seems to be the right approach. These processes are also capable of interacting with the jailed processes anyway, through signals for example. Reviewed by: kib, rwatson (older version)
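The visibility rule described above (same jail or a parent jail may see the device) amounts to an ancestor walk. The following is a sketch with hypothetical names (`toy_prison`, `toy_prison_can_see`); it does not reproduce the real prison_check() interface or its return convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical jail record: only the parent link matters here. */
struct toy_prison {
	const struct toy_prison *parent;	/* NULL for the host */
};

/*
 * Return true if "viewer" is the same jail as "target" or one of its
 * ancestors, which is when a pseudo-terminal created in "target" would
 * be visible to a process in "viewer" under the rule above.
 */
bool
toy_prison_can_see(const struct toy_prison *viewer,
    const struct toy_prison *target)
{
	const struct toy_prison *p;

	for (p = target; p != NULL; p = p->parent) {
		if (p == viewer)
			return (true);
	}
	return (false);
}
```

Under this rule a child jail cannot see its parent's devices, and sibling jails cannot see each other's, while the host sees everything beneath it.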
* Change the size of the nfsc_groups[] array in the experimental nfsrmacklem2009-06-202-5/+5
| | | | | | | | client to RPCAUTH_UNIXGIDS + 1 (17), since that is what can go on the wire for AUTH_SYS authentication. Reviewed by: brooks Approved by: kib (mentor)
* Rework the credential code to support larger values of NGROUPS andbrooks2009-06-197-31/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NGROUPS_MAX, eliminate ABI dependencies on them, and raise them to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.) The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc. Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care of the details of allocating the groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for ensuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search. Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error. Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity. Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867
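The two ideas in this commit that lend themselves to a small illustration are the switch from a fixed array to a pointer and the "truncate rather than fail" policy. The sketch below uses hypothetical names (`toy_cred`, `toy_setgroups`, `TOY_NGROUPS`) and a deliberately tiny limit; it is not the crsetgroups() implementation, whose signature is not given in the log.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NGROUPS	4	/* hypothetical small maximum for the demo */

/* Hypothetical credential: the group list is a pointer, not an array. */
struct toy_cred {
	unsigned int *groups;
	int ngroups;
};

/*
 * Replace the group list with a copy of gids[0..n-1], truncating to
 * TOY_NGROUPS instead of reporting an error. Returns 0 on success and
 * -1 on a negative count or allocation failure (list left unchanged).
 */
int
toy_setgroups(struct toy_cred *cr, int n, const unsigned int *gids)
{
	unsigned int *copy;

	if (n < 0)
		return (-1);
	if (n > TOY_NGROUPS)
		n = TOY_NGROUPS;
	/* Allocate at least one slot so an empty list is still valid. */
	copy = malloc((size_t)(n > 0 ? n : 1) * sizeof(*copy));
	if (copy == NULL)
		return (-1);
	if (n > 0)
		memcpy(copy, gids, (size_t)n * sizeof(*copy));
	free(cr->groups);
	cr->groups = copy;
	cr->ngroups = n;
	return (0);
}
```

Because the list lives behind a pointer, raising `TOY_NGROUPS` later changes no structure layout, which is the ABI point the commit message makes about cr_groups.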
* Fix some of the style errors in *getpages().alc2009-06-181-18/+13
|
* Add the SVC_RELEASE(xprt), as required by r194407.rmacklem2009-06-172-0/+2
| | | | Approved by: kib (mentor)
* Add explicit includes for jail.h to the files that need them andbz2009-06-171-0/+1
| | | | remove the "hidden" one from vimage.h.
* Fix handling of ".." in nfs_lookup() for the forced dismount casermacklem2009-06-171-14/+36
| | | | | | by cribbing the change made to the regular nfs client in r194358. Approved by: kib (mentor)
* Add the explicit include of vimage.h to another five .c files stillbz2009-06-171-0/+1
| | | | | | | missing it. Remove the "hidden" kernel only include of vimage.h from ip_var.h added with the very first Vimage commit r181803 to avoid further kernel poisoning.
* Remove the "int *" typecast for the aresid argument to vn_rdwr()rmacklem2009-06-162-2/+2
| | | | | | | | and change the type of the argument from size_t to int. This should avoid issues on 64bit architectures. Suggested by: kib Approved by: kib (mentor)
* Eliminate unnecessary variables.alc2009-06-131-4/+2
|
* Rename the host-related prison fields to be the same as the host.*jamie2009-06-131-1/+2
| | | | | | | parameters they represent, and the variables they replaced, instead of abbreviated versions of them. Approved by: bz (mentor)
* Use getcredhostuuid instead of accessing the prison directly.jamie2009-06-131-5/+1
| | | | Approved by: bz (mentor)
* Update the inline version of vn_get_ino() for ".." lookups to match thejhb2009-06-121-6/+8
| | | | | | recentish changes to vn_get_ino(). MFC after: 1 week
* This commit is analogous to r193952, but for the experimental nfsrmacklem2009-06-101-8/+15
| | | | | | | | | | | subsystem. Add a test for VI_DOOMED just after ncl_upgrade_vnlock() in ncl_bioread_check_cons(). This is required since it is possible for the vnode to be vgonel()'d while in ncl_upgrade_vnlock() when a forced dismount is in progress. Also, move the check for VI_DOOMED in ncl_vinvalbuf() down to after ncl_upgrade_vnlock() and replace the out of date comment for it. Approved by: kib (mentor)
* For cd9660_ioctl, check for recycled vnode after locking it.kib2009-06-101-0/+4
| | | | | Noted by: Jaakko Heinonen <jh saunalahti fi> MFC after: 2 weeks
* Fix r193923 by noting that type of a_fp is struct file *, not int.kib2009-06-102-2/+2
| | | | | | | It was assumed that r193923 was a trivial change that cannot be done wrong. MFC after: 2 weeks
* s/a_fdidx/a_fp/ for VOP_OPEN comments that inline struct vop_open_argskib2009-06-103-3/+3
| | | | | | | definition. Discussed with: bde MFC after: 2 weeks
* Remove unused VOP_IOCTL and VOP_KQFILTER implementations for fifofs.kib2009-06-101-40/+2
| | | | MFC after: 2 weeks
* VOP_IOCTL takes unlocked vnode as an argument. Due to this, v_data maykib2009-06-103-14/+49
| | | | | | | | | | | | be NULL or dereferenced memory may become free at an arbitrary moment. Lock the vnode in cd9660, devfs and pseudofs implementation of VOP_IOCTL to prevent reclaim; check whether the vnode was already reclaimed after the lock is granted. Reported by: georg at dts su Reviewed by: des (pseudofs) MFC after: 2 weeks
* Since vn_lock() with the LK_RETRY flag never returns an errorrmacklem2009-06-091-7/+11
| | | | | | | | | for FreeBSD-CURRENT, the code that checked for and returned the error was broken. Change it to check for VI_DOOMED set after vn_lock() and return an error for that case. I believe this should only happen for forced dismounts. Approved by: kib (mentor)
* Fix nfscl_getcl() so that it doesn't crash when it is called tormacklem2009-06-081-20/+29
| | | | | | | do an NFSv4 Close operation with the cred argument NULL. Also, clarify what NULL arguments mean in the function's comment. Approved by: kib (mentor)
* Use #ifdef APPLE_MAC instead of #ifdef MAC to conditionalize Apple-specificrwatson2009-06-061-2/+2
| | | | | | | | | | behavior for unicode support in UDF so as not to conflict with the MAC Framework. Note that Apple's XNU kernel also uses #ifdef MAC for the MAC Framework. Suggested by: pjd MFC after: 3 days
* Drop Giant.des2009-06-061-12/+14
| | | | MFC after: 1 week
* Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERICrwatson2009-06-055-9/+3
| | | | | | | | and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
* Don't check MAC in the NFS server ACL set path, right now we aren'trwatson2009-06-051-4/+0
| | | | enforcing MAC for NFS clients.
* Re-add opt_mac.h include, which is required in order for MNT_MULTILABELrwatson2009-06-041-0/+2
| | | | | | | | to be set properly on devfs. Otherwise, it isn't possible to set labels on /dev nodes. Reported by: Sergio Rodriguez <sergiorr at yahoo.com> MFC after: 3 days
* nfs_write() can use the recently introduced vfs_bio_set_valid() instead ofalc2009-05-311-1/+1
| | | | | | vfs_bio_set_validclean(), thereby avoiding the page queues lock. Garbage collect vfs_bio_set_validclean(). Nothing uses it any longer.
* Unlock the pseudofs vnode before calling fill method for pfs_readlink().kib2009-05-311-1/+6
| | | | | | | | | The fill code may need to lock another vnode, e.g. procfs file implementation. Reviewed by: des Tested by: pho MFC after: 2 weeks
* Implement the bypass routine for VOP_VPTOCNP in nullfs.kib2009-05-311-1/+50
| | | | | | | | Among other things, this makes procfs <pid>/file work for executables started from nullfs mount. Tested by: pho PR: 94269, 104938