summaryrefslogtreecommitdiffstats
path: root/sys/ufs
Commit message (Collapse)AuthorAgeFilesLines
* We don't need to #include <sys/disklabel.h>.phk2002-09-201-2/+0
| | | | | | We don't need to #include <sys/disklabel.h> second time either. Sponsored by: DARPA & NAI Labs.
* VOP_FSYNC() requires that it's vnode argument be locked, which nfs_link()truckman2002-09-191-11/+4
| | | | | | | | | | wasn't doing. Rather than just lock and unlock the vnode around the call to VOP_FSYNC(), implement rwatson's suggestion to lock the file vnode in kern_link() before calling VOP_LINK(), since the other filesystems also locked the file vnode right away in their link methods. Remove the locking and and unlocking from the leaf filesystem link methods. Reviewed by: rwatson, bde (except for the unionfs_link() changes)
* intmax_t is printed with %jd, not %lld.obrien2002-09-191-7/+7
|
* Remove any VOP_PRINT that redundantly prints the tag.njl2002-09-181-4/+2
| | | | | | Move lockmgr_printinfo() into vprint() for everyone's benefit. Suggested by: bde
* Remove all use of vnode->v_tag, replacing with appropriate substitutes.njl2002-09-142-4/+4
| | | | | | | | | | | | v_tag is now const char * and should only be used for debugging. Additionally: 1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK 2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP. Suggested by: phk Reviewed by: bde, rwatson (earlier version)
* vfs_syscalls.c:bde2002-09-101-52/+6
| | | | | | | | | | | | | | | | | | | | | | Changed rename(2) to follow the letter of the POSIX spec. POSIX requires rename() to have no effect if its args "resolve to the same existing file". I think "file" can only reasonably be read as referring to the inode, although the rationale and "resolve" seem to say that sameness is at the level of (resolved) directory entries. ext2fs_vnops.c, ufs_vnops.c: Replaced code that gave the historical BSD behaviour of removing one link name by checks that this code is now unreachable. This fixes some races. All vnodes needed to be unlocked for the removal, and locking at another level using something like IN_RENAME was not even attempted, so it was possible for rename(x, y) to return with both x and y removed even without any unlink(2) syscalls (one process can remove x using rename(x, y) and another process can remove y using rename(y, x)). Prodded by: alfred MFC after: 8 weeks PR: 42617
* Implement the VOP_OPENEXTATTR() and VOP_CLOSEEXTATTR() methods.phk2002-09-052-31/+192
| | | | | | | | Use extattr_check_cred() to check access to EAs. This is still a WIP. Sponsored by: DARPA & NAI Labs.
* Use canonical extattr_check_cred() instead of private implementation of thephk2002-09-051-39/+3
| | | | | | same policy. Sponsored by: DARPA & NAI Labs.
* Fix credentials check: do not leak ENOATTR until we know if they'rephk2002-09-051-12/+15
| | | | | | supposed to know. Sponsored by: DARPA & NAI Labs.
* Include <sys/malloc.h> instead of depending on namespace pollution 2bde2002-09-051-9/+11
| | | | | | | | | layers deep in <sys/proc.h> or <sys/vnode.h>. Include <sys/vmmeter.h> instead of depending on namespace pollution in <sys/pcpu.h>. Sorted includes as much as possible.
* Since we have vp and td cached in local variables, use those insteadrwatson2002-09-011-1/+1
| | | | | | | of derefencing the VOP arguments again when calling the UFS code. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Correctly handle setting, getting and deleting EA's with zero length content.phk2002-08-301-12/+14
| | | | Sponsored by: DARPA & NAI Labs.
* Replace various spelling with FALLTHROUGH which is lint()ablecharnier2002-08-251-1/+1
|
* o Retire vm_page_zero_fill() and vm_page_zero_fill_area(). Ever sincealc2002-08-251-1/+1
| | | | | | pmap_zero_page() and pmap_zero_page_area() were modified to accept a struct vm_page * instead of a physical address, vm_page_zero_fill() and vm_page_zero_fill_area() have served no purpose.
* Implement list of EA return functionality.phk2002-08-201-9/+33
| | | | | | Correctly delete EA's when the content length is set to zero. Sponsored by: DARPA & NAI Labs.
* First snapshot of UFS2 EA support.phk2002-08-191-7/+228
| | | | Sponsored by: DARPA & NAI Labs.
* In order to better support flexible and extensible access control,rwatson2002-08-152-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | make a series of modifications to the credential arguments relating to file read and write operations to cliarfy which credential is used for what: - Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes largely in sys_generic.c. For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics: - badfo_readwrite() unchanged - kqueue_read/write() unchanged pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred - soo_read/write() unchanged - vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics. Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED. These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations. Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Expand the arguments to ffs_ext{read,write}() to their componentphk2002-08-131-41/+17
| | | | | | | | | parts rather than use vop_{read,write}_args. Access to these functions will ultimately not be available through the "vop_{read,write}+IO_EXT" API but this functionality is retained for debugging purposes for now. Sponsored by: DARPA & NAI Labs.
* Unravel the UFS_EXTATTR incest between FFS and UFS: UFS_EXTATTR is anphk2002-08-132-7/+65
| | | | | | | | | UFS only thing, and FFS should in principle not know if it is enabled or not. This commit cleans ffs_vnops.c for such knowledge, but not ffs_vfsops.c Sponsored by: DARPA and NAI Labs.
* Introduce typedefs for the member functions of struct vfsops and employphk2002-08-135-13/+13
| | | | | | | these in the main filesystems. This does not change the resulting code but makes the source a little bit more grepable. Sponsored by: DARPA and NAI Labs.
* Pass IO_NOMACCHECK to vn_rdwr() in the following checks to preventrwatson2002-08-122-5/+7
| | | | | | | | | | | | | | | | | | | | enforcement of MAC policy on the read or write operations: - In ext2fs, don't enforce MAC on loop-back reads and writes supporting directory read operations in lookup(), directory modifications in rename(), directory write operations in mkdir(), symlink write operations in symlink(). - In the NFS client locking code, perform vn_rdwr() on the NFS locking socket without enforcing MAC, since the write is done on behalf of the kernel NFS implementation rather than the user process. - In UFS, don't enforce MAC on loop-back reads and writes supporting directory read operations in lookup(), and symlink write operations in symlink(). Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Stop pretending that the FFS file ufs_readwrite.c is a UFS file.phk2002-08-122-1075/+1028
| | | | | | | Instead of #including it, pull it into ffs_vnops.c and name things correctly. Sponsored by: DARPA & NAI Labs.
* Fix a comment.phk2002-08-121-1/+1
|
* Don't call softdep_slowdown() if soft updates are not active on theiedowse2002-08-051-1/+1
| | | | | | | filesystem. This causes a panic for kernels compiled without softupdates. Reported by: luigi
* - Replace v_flag with v_iflag and v_vflagjeff2002-08-048-30/+55
| | | | | | | | | | | | | | | - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS
* Introduce support for Mandatory Access Control and extensiblerwatson2002-07-311-1/+31
| | | | | | | | | | | | | | | | | | | | kernel access control. Instrument UFS to support per-inode MAC labels. In particular, invoke MAC framework entry points for generically supporting the backing of MAC labels into extended attributes. This ends up introducing new vnode operation vector entries point at the MAC framework entry points, as well as some explicit entry point invocations for file and directory creation events so that the MAC framework can push labels to disk before the directory names become persistent (this will work better once EAs in UFS2 are hooked into soft updates). The generic EA MAC entry points support executing with the file system in either single label or multilabel operation, and will fall back to the mount label if multilabel is not specified at mount-time. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* I forgot this bit of uglyness in the fsck_ffs cleanup.phk2002-07-311-0/+1
|
* Fix braino in last commit.phk2002-07-301-3/+0
|
* Move ffs_isfreeblock() to ffs_alloc.c and make it static.phk2002-07-303-26/+25
| | | | Sponsored by: DARPA & NAI Labs.
* Lock page queue accesses by vm_page_free().alc2002-07-281-0/+8
|
* Add a missing argument to the stub for softdep_setup_freeblocks.benno2002-07-201-1/+2
| | | | Forgotten by: mckusick
* Fix a warning:peter2002-07-201-1/+2
| | | | ffs_softdep.c:1630: warning: int format, different type arg (arg 2)
* Add support to UFS2 to provide storage for extended attributes.mckusick2002-07-1913-211/+1043
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As this code is not actually used by any of the existing interfaces, it seems unlikely to break anything (famous last words). The internal kernel interface to manipulate these attributes is invoked using two new IO_ flags: IO_NORMAL and IO_EXT. These flags may be specified in the ioflags word of VOP_READ, VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that you want to do I/O to the normal data part of the file and IO_EXT means that you want to do I/O to the extended attributes part of the file. IO_NORMAL and IO_EXT are mutually exclusive for VOP_READ and VOP_WRITE, but may be specified individually or together in the case of VOP_TRUNCATE. For example, when removing a file, VOP_TRUNCATE is called with both IO_NORMAL and IO_EXT set. For backward compatibility, if neither IO_NORMAL nor IO_EXT is set, then IO_NORMAL is assumed. Note that the BA_ and IO_ flags have been `merged' so that they may both be used in the same flags word. This merger is possible by assigning the IO_ flags to the low sixteen bits and the BA_ flags the high sixteen bits. This works because the high sixteen bits of the IO_ word is reserved for read-ahead and help with write clustering so will never be used for flags. This merge lets us get away from code of the form: if (ioflags & IO_SYNC) flags |= BA_SYNC; For the future, I have considered adding a new field to the vattr structure, va_extsize. This addition could then be exported through the stat structure to allow applications to find out the size of the extended attribute storage and also would provide a more standard interface for truncating them (via VOP_SETATTR rather than VOP_TRUNCATE). I am also contemplating adding a pathconf parameter (for concreteness, lets call it _PC_MAX_EXTSIZE) which would let an application determine the maximum size of the extended atribute storage. Sponsored by: DARPA & NAI Labs.
* Change utimes to set the file creation time (for filesystems thatmckusick2002-07-171-1/+11
| | | | | | | | support creation times such as UFS2) to the value of the modification time if the value of the modification time is older than the current creation time. See utimes(2) for further details. Sponsored by: DARPA & NAI Labs.
* Change the name of st_createtime to st_birthtime. This change ismckusick2002-07-163-8/+8
| | | | | | | made to reduce confusion between st_ctime and st_createtime. Submitted by: Eric Allman <eric@sendmail.org> Sponsored by: DARPA & NAI Labs.
* Fix a type: s/your are/you are/trhodes2002-07-121-1/+1
|
* Fixed some printf format errors (4 new ones reported by gcc and 5 nearbybde2002-07-081-7/+7
| | | | old ones not reported by gcc). This helps unbreak LINT.
* Use indirect function pointer hooks instead of #ifdef SOFTUPDATESiedowse2002-07-011-0/+6
| | | | | | | | | direct calls for the two places where the kernel calls into soft updates code. Set up the hooks in softdep_initialize() and NULL them out in softdep_uninitialize(). This change allows soft updates to function correctly when ufs is loaded as a module. Reviewed by: mckusick
* Add the ffs bits necessary to support unloading of the ufs kerneliedowse2002-07-014-2/+38
| | | | | | | | module. This adds an ffs_uninit() function that calls ufs_uninit() and also calls a new softdep_uninitialize() function. Add a stub for softdep_uninitialize() to cover the non-SOFTUPDATES case. Reviewed by: mckusick
* Remove the bogus SYSINIT from ufs_dirhash.c and instead add a calliedowse2002-06-307-7/+65
| | | | | to ufsdirhash_init() from ufs_init(). Add uninit() functions corresponding the ufs, dirhash, quota and ihash init() functions.
* Remove the kernel file-size limit for UFS2, so that only the limitiedowse2002-06-261-5/+7
| | | | | | | | | | imposed by the filesystem structure itself remains. With 16k blocks, the maximum file size is now just over 128TB. For now, the UFS1 file size limit is left unchanged so as to remain consistent with RELENG_4, but it too could be removed in the future. Reviewed by: mckusick
* At long last, commit the zero copy sockets code.ken2002-06-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes. ti.4: Update the ti(4) man page to include information on the TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options, and also include information about the new character device interface and the associated ioctls. man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated links. jumbo.9: New man page describing the jumbo buffer allocator interface and operation. zero_copy.9: New man page describing the general characteristics of the zero copy send and receive code, and what an application author should do to take advantage of the zero copy functionality. NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS, TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT. conf/files: Add uipc_jumbo.c and uipc_cow.c. conf/options: Add the 5 options mentioned above. kern_subr.c: Receive side zero copy implementation. This takes "disposable" pages attached to an mbuf, gives them to a user process, and then recycles the user's page. This is only active when ZERO_COPY_SOCKETS is turned on and the kern.ipc.zero_copy.receive sysctl variable is set to 1. uipc_cow.c: Send side zero copy functions. Takes a page written by the user and maps it copy on write and assigns it kernel virtual address space. Removes copy on write mapping once the buffer has been freed by the network stack. uipc_jumbo.c: Jumbo disposable page allocator code. This allocates (optionally) disposable pages for network drivers that want to give the user the option of doing zero copy receive. uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are enabled if ZERO_COPY_SOCKETS is turned on. Add zero copy send support to sosend() -- pages get mapped into the kernel instead of getting copied if they meet size and alignment restrictions. uipc_syscalls.c:Un-staticize some of the sf* functions so that they can be used elsewhere. (uipc_cow.c) if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid calling malloc() with M_WAITOK. Return an error if the M_NOWAIT malloc fails. The ti(4) driver and the wi(4) driver, at least, call this with a mutex held. This causes witness warnings for 'ifconfig -a' with a wi(4) or ti(4) board in the system. (I've only verified for ti(4)). ip_output.c: Fragment large datagrams so that each segment contains a multiple of PAGE_SIZE amount of data plus headers. This allows the receiver to potentially do page flipping on receives. if_ti.c: Add zero copy receive support to the ti(4) driver. If TI_PRIVATE_JUMBOS is not defined, it now uses the jumbo(9) buffer allocator for jumbo receive buffers. Add a new character device interface for the ti(4) driver for the new debugging interface. This allows (a patched version of) gdb to talk to the Tigon board and debug the firmware. There are also a few additional debugging ioctls available through this interface. Add header splitting support to the ti(4) driver. Tweak some of the default interrupt coalescing parameters to more useful defaults. Add hooks for supporting transmit flow control, but leave it turned off with a comment describing why it is turned off. if_tireg.h: Change the firmware rev to 12.4.11, since we're really at 12.4.11 plus fixes from 12.4.13. Add defines needed for debugging. Remove the ti_stats structure, it is now defined in sys/tiio.h. ti_fw.h: 12.4.11 firmware. ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13, and my header splitting patches. Revision 12.4.13 doesn't handle 10/100 negotiation properly. (This firmware is the same as what was in the tree previously, with the addition of header splitting support.) sys/jumbo.h: Jumbo buffer allocator interface. sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to indicate that the payload buffer can be thrown away / flipped to a userland process. socketvar.h: Add prototype for socow_setup. tiio.h: ioctl interface to the character portion of the ti(4) driver, plus associated structure/type definitions. uio.h: Change prototype for uiomoveco() so that we'll know whether the source page is disposable. ufs_readwrite.c:Update for new prototype of uiomoveco(). vm_fault.c: In vm_fault(), check to see whether we need to do a page based copy on write fault. vm_object.c: Add a new function, vm_object_allocate_wait(). This does the same thing that vm_object allocate does, except that it gives the caller the opportunity to specify whether it should wait on the uma_zalloc() of the object structre. This allows vm objects to be allocated while holding a mutex. (Without generating WITNESS warnings.) vm_object_allocate() is implemented as a call to vm_object_allocate_wait() with the malloc flag set to M_WAITOK. vm_object.h: Add prototype for vm_object_allocate_wait(). vm_page.c: Add page-based copy on write setup, clear and fault routines. vm_page.h: Add page based COW function prototypes and variable in the vm_page structure. Many thanks to Drew Gallatin, who wrote the zero copy send and receive code, and to all the other folks who have tested and reviewed this code over the years.
* Force the quota update to be done when an inode is released inmckusick2002-06-251-1/+1
| | | | | ufs_inactive. This avoid a panic when checking a NULL credential in suser_cred().
* Prototype fixes (long newinum --> ino_t newinum).jlemon2002-06-241-2/+2
|
* Warning fixes for 64 bits platforms. This eliminates all themux2002-06-235-23/+23
| | | | | | warnings I have had in the FFS code on sparc64. Reviewed by: mckusick
* Rename the BALLOC flags from B_* to BA_* to avoid confusion with thedillon2002-06-237-48/+48
| | | | | | struct buf B_ flags. Approved by: mckusick
* This patch fixes a problem whereby filesystems that ranmckusick2002-06-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | out of inodes in a cylinder group would fail to check for free inodes in other cylinder groups. This bug was introduced in the UFS2 code merge two days ago. An inode is allocated by calling ffs_valloc which calls ffs_hashalloc to do the filesystem scan. Ffs_hashalloc walks around the cylinder groups calling its passed allocator (ffs_nodealloccg in this case) until the allocator returns a non-zero result. The bug is that ffs_hashalloc expects the passed allocator function to return a 64-bit ufs2_daddr_t. When allocating inodes, it calls ffs_nodealloccg which was returning a 32-bit ino_t. The ffs_hashalloc code checked a 64-bit return value and usually found random non-zero bits in the high 32-bits so decided that the allocation had succeeded (in this case in the only cylinder group that it checked). When the result was passed back to ffs_valloc it looked at only the bottom 32-bits, saw zero and declared the system out of inodes. But ffs_hashalloc had really only checked one cylinder group. The fix is to change ffs_nodealloccg to return 64-bit results. Sponsored by: DARPA & NAI Labs. Submitted by: Poul-Henning Kamp <phk@critter.freebsd.dk> Reviewed by: Maxime Henrion <mux@freebsd.org>
* This commit adds basic support for the UFS2 filesystem. The UFS2mckusick2002-06-2126-1068/+2494
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | filesystem expands the inode to 256 bytes to make space for 64-bit block pointers. It also adds a file-creation time field, an ability to use jumbo blocks per inode to allow extent like pointer density, and space for extended attributes (up to twice the filesystem block size worth of attributes, e.g., on a 16K filesystem, there is space for 32K of attributes). UFS2 fully supports and runs existing UFS1 filesystems. New filesystems built using newfs can be built in either UFS1 or UFS2 format using the -O option. In this commit UFS1 is the default format, so if you want to build UFS2 format filesystems, you must specify -O 2. This default will be changed to UFS2 when UFS2 proves itself to be stable. In this commit the boot code for reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c) as there is insufficient space in the boot block. Once the size of the boot block is increased, this code can be defined. Things to note: the definition of SBSIZE has changed to SBLOCKSIZE. The header file <ufs/ufs/dinode.h> must be included before <ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and ufs_lbn_t. Still TODO: Verify that the first level bootstraps work for all the architectures. Convert the utility ffsinfo to understand UFS2 and test growfs. Add support for the extended attribute storage. Update soft updates to ensure integrity of extended attribute storage. Switch the current extended attribute interfaces to use the extended attribute storage. Add the extent like functionality (framework is there, but is currently never used). Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@freebsd.org>
* In rev 1.72 a situation related to write/mmap was fixed which could resultdillon2002-06-191-7/+11
| | | | | | | | | | | | | | | | | | | | | | in a user process gaining visibility into the 'old' contents of a filesystem block. There were two cases: (1) when uiomove() fails (user process issues illegal write), and (2) when uiomove() overlaps a mmap() of the same file at the same offset (fault -> recursive buffer I/O reads contents of old block). Unfortunately 1.72 also had the unintended effect of forcing the filesystem to do a read-before-write in the case of a full-block-write (non append case), e.g. 'dd if=/dev/zero of=test.dat bs=1m count=256 conv=notrunc'. This destroys performance.. not only is a read forced for every write, but clustering breaks as well. The solution is to clear the buffer manually in the full-block case rather then asking BALLOC to do it (BALLOC issues the read-before-write). In the partial-block case we want BALLOC to do it because the read-before-write is necessary. This patch should greatly improve database and news-feed server performance. Found by: MKI <mki@mozone.net> MFC after: 3 days
* Fix a typo in my recently added comment: s/beleived/believed/semenu2002-06-061-1/+1
| | | | Submitted by: keramida
OpenPOWER on IntegriCloud