path: root/sys/kern
Commit message  Author  Age  Files  Lines
* When checking to see if another CPU is running its idle thread, examine  jhb  2008-11-18  1  -4/+4
| the thread running on the other CPU instead of the thread being placed on the run queue.
|
| Reported by: Ravi Murty @ Intel
| Reviewed by: jeff
* Obey signedness flag in %z case.  delphij  2008-11-17  1  -1/+1
| MFC after: 2 months
* Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes.  pjd  2008-11-17  6  -247/+367
| This brings a huge number of changes; I'll enumerate only the user-visible ones:
|
| - Delegated Administration
|   Allows regular users to perform ZFS operations, like file system creation, snapshot creation, etc.
| - L2ARC
|   Level 2 cache for ZFS - allows using additional disks for cache. Huge performance improvements, mostly for random reads of mostly static content.
| - slog
|   Allows using additional disks for the ZFS Intent Log to speed up operations like fsync(2).
| - vfs.zfs.super_owner
|   Allows regular users to perform privileged operations on files stored on ZFS file systems they own. Be very careful with this one.
| - chflags(2)
|   Not all the flags are supported. This still needs work.
| - ZFSBoot
|   Support for booting off of a ZFS pool. Not finished, AFAIK.
|   Submitted by: dfr
| - Snapshot properties
| - New failure modes
|   Previously, if a write request failed, the system panicked. Now one can select one of three failure modes:
|   - panic - panic on write error
|   - wait - wait for disk to reappear
|   - continue - serve read requests if possible, block write requests
| - Refquota, refreservation properties
|   Just like the quota and reservation properties, but don't count space consumed by children file systems, clones and snapshots.
| - Sparse volumes
|   ZVOLs that don't reserve space in the pool.
| - External attributes
|   Compatible with extattr(2).
| - NFSv4-ACLs
|   Not sure about the status, might not be complete yet.
|   Submitted by: trasz
| - Creation-time properties
| - Regression tests for zpool(8) command.
|
| Obtained from: OpenSolaris
* Revert r184118. There is actually code in the kernel, for instance in  kib  2008-11-16  1  -10/+1
| kern_unlinkat(), that expects that vn_start_write() actually fills the mp even when the call failed. As Tor noted, that pattern relies on the type stability of the mount points, as well as that suspended mount points are never freed and V_XSLEEP is always passed to vn_start_write() when called on a freed mount point.
|
| Reported by: stass
| Reviewed by: tegge
| PR: 123768
* Silence detach messages if the device has marked itself quiet (u3g).  n_hibma  2008-11-13  1  -1/+2
| MFC after: 3 weeks
* Don't forget to relock the TTY after uiomove() returns an error.  ed  2008-11-12  1  -4/+2
| Peter Holm just discovered this funny bug inside the TTY code: if uiomove() in ttydisc_write() returns an error, we forget to relock the TTY before jumping out of ttydisc_write(). Fix it by placing tty_unlock() and tty_lock() around uiomove().
|
| Submitted by: pho
* Several cleanups related to pipe(2).  ed  2008-11-11  1  -9/+22
| - Use `fildes[2]' instead of `*fildes' to make it clearer that pipe(2) fills an array with two descriptors.
| - Remove EFAULT from the manual page. Because of the current calling convention, pipe(2) raises a segmentation fault when an invalid address is passed.
| - Introduce kern_pipe() to make it easier for binary emulations to implement pipe(2).
| - Make Linux binary emulation use kern_pipe(), which means we don't have to recover td_retval after calling the FreeBSD system call.
|
| Approved by: rdivacky
| Discussed on: arch
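The `fildes[2]` convention from the first bullet can be illustrated with a minimal userland sketch. The helper name `pipe_roundtrip` is hypothetical and not part of the commit; it only shows that pipe(2) fills an array of two descriptors, with index 0 as the read end and index 1 as the write end.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): write msg into a fresh pipe
 * and read it back. Returns the number of bytes read, or -1 on error.
 * Intended for short messages that fit in the pipe buffer.
 */
ssize_t
pipe_roundtrip(const char *msg, char *buf, size_t buflen)
{
    int fildes[2];              /* fildes[0]: read end, fildes[1]: write end */
    size_t len = strlen(msg);
    ssize_t n;

    assert(buf != NULL && buflen >= len);

    if (pipe(fildes) == -1)
        return (-1);

    if (write(fildes[1], msg, len) != (ssize_t)len) {
        close(fildes[0]);
        close(fildes[1]);
        return (-1);
    }
    close(fildes[1]);           /* no more writers: reader sees EOF after data */

    n = read(fildes[0], buf, buflen);
    close(fildes[0]);
    return (n);
}
```

Declaring the argument as an array of two descriptors, rather than a bare pointer, documents at the call site how much storage the caller must provide.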
* Avoid scheduling firmware taskqs when cold.  gallatin  2008-11-11  1  -3/+7
| This prevents a panic which occurs when a driver attempts to load firmware at boot via firmware_get() when the firmware module has not been preloaded. firmware_get() will enqueue a task using a struct taskqueue allocated on the stack, and the machine will crash much later in the firmware taskq thread when taskqs are started and the struct taskqueue is garbage.
|
| Not objected to by: sam
* Regenerate system call tables for r184789.  ed  2008-11-09  3  -67/+8
|
* Mark uname(), getdomainname() and setdomainname() with COMPAT_FREEBSD4.  ed  2008-11-09  2  -40/+29
| Looking at our source code history, it seems the uname(), getdomainname() and setdomainname() system calls got deprecated somewhere after FreeBSD 1.1, but they have never been phased out properly. Because we don't have a COMPAT_FREEBSD1, just use COMPAT_FREEBSD4.
|
| Also fix the Linuxolator to build without the setdomainname() routine by just making it call userland_sysctl on kern.domainname. Also replace the setdomainname()'s implementation to use this approach, because we're duplicating code with sysctl_domainname().
|
| I wasn't able to keep these three routines working in our COMPAT_FREEBSD32, because that would require yet another keyword for syscalls.master (COMPAT4+NOPROTO). Because this routine is probably unused already, this won't be a problem in practice. If it turns out to be a problem, we'll just restore this functionality.
|
| Reviewed by: rdivacky, kib
* make kern.ipc.nmbclusters actually have a useful effect on nmbclusters et al.  kmacy  2008-11-09  1  -3/+4
| initialize pkthdr in field order
* Reduce the default baud rate of PTY's to 9600.  ed  2008-11-08  1  -1/+1
| On RELENG_6 (and probably RELENG_7) we see our syscons windows and pseudo-terminals have the following buffer sizes:
|
| | LINE   RAW  CAN  OUT  IHIWT  ILOWT  OHWT  LWT  COL  STATE   SESS   PGID  DISC
| | ttyv0    0    0    0   7680   6720  2052  256    7  OCcl    1146   1146  term
| | ttyp0    0    0    0   7680   6720  1296  256    0  OCc    82033  82033  term
|
| These buffer sizes make no sense, because we often have much more output than input, but I guess having higher input buffer sizes improves guarantees of the system.
|
| On MPSAFE TTY I just set both the input and output buffer sizes to 7 KB, which is pretty big on a standard FreeBSD install with 8 syscons windows and some PTY's. Reduce the baud rate to 9600 baud, which means we now have the following buffer sizes:
|
| | LINE    INQ  CAN  LIN  LOW  OUTQ  USE  LOW   COL  SESS  PGID  STATE
| | ttyv0  1920    0    0  192  1984    0  199     7  2401  2401  Oil
| | pts/0  1920    0    0  192  1984    0  199  5631  1305  2526  Oi
|
| This is a lot smaller, but for pseudo-devices this should be good enough. You need to do a lot of punching to fill up a 7.5 KB input buffer. If it turns out things don't work out this way, we'll just switch to 19200 baud.
* Merge latest DTrace changes from Perforce.  rodrigc  2008-11-05  2  -5/+19
| Approved by: jb
* Revert rev 184216 and 184199, due to the way the thread_lock works,  davidxu  2008-11-05  5  -15/+55
| it may cause a lockup.
|
| Noticed by: peter, jhb
* Use shared vnode locks for auditing vnode arguments as auditing only  jhb  2008-11-04  1  -4/+4
| does a VOP_GETATTR() which does not require an exclusive lock.
|
| Reviewed by: csjp, rwatson
* Don't bother calling setrunnable() and clearing the sleeping flag in  jhb  2008-11-04  1  -9/+12
| sleepq_resume_thread() if the thread isn't asleep.
* Remove unnecessary locking around vn_fullpath(). The vnode lock for the  jhb  2008-11-04  2  -8/+6
| vnode in question does not need to be held. All the data structures used during the name lookup are protected by the global name cache lock. Instead, the caller merely needs to ensure a reference is held on the vnode (such as vhold()) to keep it from being freed.
|
| In the case of procfs' <pid>/file entry, grab the process lock while we gain a new reference (via vhold()) on p_textvp to fully close races with execve(2).
|
| For the kern.proc.vmmap sysctl handler, use a shared vnode lock around the call to VOP_GETATTR() rather than an exclusive lock.
|
| MFC after: 1 month
* Remove redundant return value tests.  ed  2008-11-04  1  -6/+1
| There is no need to test whether the return value is non-zero here. Just return the error number directly.
* Adjust the license statement to more closely match a standard 3-clause BSD  jhb  2008-11-03  1  -12/+12
| license.
|
| MFC after: 3 days
* Use shared vnode locks instead of exclusive vnode locks for the access(),  jhb  2008-11-03  3  -16/+16
| chdir(), chroot(), eaccess(), fpathconf(), fstat(), fstatfs(), lseek() (when figuring out the current size of the file in the SEEK_END case), pathconf(), readlink(), and statfs() system calls.
|
| Submitted by: ups (mostly)
| Tested by: pho
| MFC after: 1 month
* Remove the mnt_holdcnt and mnt_holdcntwaiters because they are useless.  attilio  2008-11-03  2  -20/+3
| Really, the concept of holdcnt in the struct mount is represented by the mnt_ref (which prevents the type-stable structure from being "recycled") handled through vfs_ref() and vfs_rel(). From this perspective, switch the holdcnt acquisition into an emulated vfs_ref() (and subsequent release into vfs_rel()).
|
| Discussed with: kib
| Tested by: pho
* A few style nits.  jhb  2008-11-03  1  -1/+2
|
* Regen.  dfr  2008-11-03  3  -2/+21
|
* Implement support for RPCSEC_GSS authentication to both the NFS client  dfr  2008-11-03  3  -1/+33
| and server. This replaces the RPC implementation of the NFS client and server with the newer RPC implementation originally developed (actually ported from the userland sunrpc code) to support the NFS Lock Manager. I have tested this code extensively and I believe it is stable and that performance is at least equal to the legacy RPC implementation.
|
| The NFS code currently contains support for both the new RPC implementation and the older legacy implementation inherited from the original NFS codebase. The default is to use the new implementation - add the NFS_LEGACYRPC option to fall back to the old code. When I merge this support back to RELENG_7, I will probably change this so that users have to 'opt in' to get the new code.
|
| To use RPCSEC_GSS on either client or server, you must build a kernel which includes the KGSSAPI option and the crypto device. On the userland side, you must build at least a new libc, mountd, mount_nfs and gssd. You must install new versions of /etc/rc.d/gssd and /etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.
|
| As long as gssd is running, you should be able to mount an NFS filesystem from a server that requires RPCSEC_GSS authentication. The mount itself can happen without any kerberos credentials but all access to the filesystem will be denied unless the accessing user has a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There is currently no support for situations where the ticket file is in a different place, such as when the user logged in via SSH and has delegated credentials from that login. This restriction is also present in Solaris and Linux. In theory, we could improve this in future, possibly using Brooks Davis' implementation of variant symlinks.
|
| Supporting RPCSEC_GSS on a server is nearly as simple. You must create service creds for the server in the form 'nfs/<fqdn>@<REALM>' and install them in /etc/krb5.keytab. The standard heimdal utility ktutil makes this fairly easy. After the service creds have been created, you can add a '-sec=krb5' option to /etc/exports and restart both mountd and nfsd.
|
| The only other difference an administrator should notice is that nfsd doesn't fork to create service threads any more. In normal operation, there will be two nfsd processes, one in userland waiting for TCP connections and one in the kernel handling requests. The latter process will create as many kthreads as required - these should be visible via 'top -H'. The code has some support for varying the number of service threads according to load but initially at least, nfsd uses a fixed number of threads according to the value supplied to its '-n' option.
|
| Sponsored by: Isilon Systems
| MFC after: 1 month
* Increase the initial sbuf size for CPU topology dump to something more  ivoras  2008-11-02  1  -1/+1
| usable for newer CPUs. The new value allows 2 x quad core configuration dumps to fit within the initial buffer without reallocations.
|
| Approved by: gnn (mentor) (older version)
| Pointed out by: rdivacky
* Improve VFS locking:  attilio  2008-11-02  5  -62/+78
| - Implement real draining for vfs consumers by not relying on the mnt_lock and using instead a refcount in order to keep track of lock requesters.
| - Due to the change above, remove the mnt_lock lockmgr because it is now useless.
| - Due to the change above, vfs_busy() is no longer linked to a lockmgr. Change its KPI accordingly by removing the interlock argument and defining 2 new flags for it: MBF_NOWAIT, which basically replaces the LK_NOWAIT of the old version (which was unlinked from the lockmgr already), and MBF_MNTLSTLOCK, which provides the ability to drop the mountlist_mtx once the mnt interlock is held (ability still desired by most consumers).
| - The stub used in vfs_mount_destroy(), which allows overriding the mnt_ref if running for more than 3 seconds, makes it totally useless. Remove it, as it was thought to work in older versions. If a problem of "refcount held never going away" should appear, we will need to fix it properly rather than trust such a hackish solution.
| - Fix a bug where returning (with an error) from dounmount() was still leaving the MNTK_MWAIT flag on even if the waiters were actually woken up. Just one place in vfs_mount_destroy() is left because it is going to recycle the structure in any case, so it doesn't matter.
| - Remove the markercnt refcount as it is useless.
|
| This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and __FreeBSD_version will be modified accordingly.
|
| Discussed with: kib
| Tested by: pho
* Clamp the values of t_column to 5 digits in `pstat -t' and `show all ttys'.  ed  2008-11-01  1  -1/+1
| We often run into these very high column numbers when we run curses applications, because they don't print any newlines. This messes up the table output of `pstat -t'. If these numbers get really high, they aren't of any use to the reader anyway. Convert them to `99999' when they run out of bounds.
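The clamping described in this commit can be sketched in a few lines of userland C. The names `COLUMN_MAX`, `clamp_column` and `format_column` are hypothetical and chosen for illustration; the actual kernel change is a one-line edit whose exact form is not shown in this log.

```c
#include <assert.h>
#include <stdio.h>

/* Largest value that still fits in a 5-digit table column. */
#define COLUMN_MAX 99999u

/* Hypothetical helper: cap a column counter at 5 digits. */
unsigned int
clamp_column(unsigned int column)
{
    return (column > COLUMN_MAX ? COLUMN_MAX : column);
}

/*
 * Hypothetical helper: format the clamped column right-aligned in a
 * 5-character field. Returns the number of characters written.
 */
int
format_column(char *buf, size_t buflen, unsigned int column)
{
    assert(buf != NULL && buflen >= 6);     /* 5 digits + NUL */
    return (snprintf(buf, buflen, "%5u", clamp_column(column)));
}
```

With the clamp in place, the field never grows past five characters, so the remaining table columns stay aligned regardless of how far the counter has run.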
* Reimplement the /dev/console device node.  ed  2008-11-01  2  -244/+92
| One of the pieces of code that I had left alone during the development of the MPSAFE TTY layer, was tty_cons.c. This file actually has two different functions:
|
| - It contains low-level console input/output routines (cnputc(), etc).
| - It creates /dev/console and wraps all its cdevsw calls to the appropriate TTY.
|
| This commit reimplements the second set of functions by moving it directly into the TTY layer. /dev/console is now a character device node that's basically a regular TTY, but does a lookup of `si_drv1' each time you open it. d_write has also been changed to call log_console(). d_close() is not present, because we must make sure we don't revoke the TTY after writing a log message to it.
|
| Even though I'm not convinced this is in line with the future directions of our console code, it is a good move for now. It removes recursive locking from the top half of the TTY layer. The previous implementation called into the TTY layer with Giant held.
|
| I'm renaming tty_cons.c to kern_cons.c now. The code hardly contains any TTY related bits, so we'd better give it a less misleading name.
|
| Tested by: Andrzej Tobola <ato iem pw edu pl>, Carlos A.M. dos Santos <unixmania gmail com>, Eygene Ryabinkin <rea-fbsd codelabs ru>
* Add three extra fields to the kinfo_proc_vmmap data. kve_offset - the offset  peter  2008-10-31  1  -0/+10
| within an object that a mapping refers to. fileid and fsid are inode/dev for vnodes. (Linux procfs has these and valgrind is really unhappy without them.) I believe I didn't change the size of the struct.
* Make it possible to compile kernel with KTR but without DDB.  sobomax  2008-10-30  1  -1/+5
|
* Introduce a new sysctl, kern.sched.topology_spec, that returns an XML  ivoras  2008-10-29  1  -1/+87
| dump of detected ULE CPU topology. This dump can be used to check the topology detection and for general system information.
|
| An example of CPU topology dump is:
|
| kern.sched.topology_spec: <groups>
|  <group level="1" cache-level="0">
|   <cpu count="8" mask="0xff">0, 1, 2, 3, 4, 5, 6, 7</cpu>
|   <flags></flags>
|   <children>
|    <group level="2" cache-level="0">
|     <cpu count="4" mask="0xf">0, 1, 2, 3</cpu>
|     <flags></flags>
|    </group>
|    <group level="2" cache-level="0">
|     <cpu count="4" mask="0xf0">4, 5, 6, 7</cpu>
|     <flags></flags>
|    </group>
|   </children>
|  </group>
| </groups>
|
| Reviewed by: jeff
| Approved by: gnn (mentor)
* If threads limit is exceeded, increase the total number  davidxu  2008-10-29  1  -1/+4
| of failures.
* Rename a variable missed in previous accmode_t-related commits.  trasz  2008-10-28  1  -21/+21
| Approved by: rwatson (mentor)
* Introduce accmode_t. This is required for NFSv4 ACLs - it will be necessary  trasz  2008-10-28  7  -57/+60
| to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit.
|
| Approved by: rwatson (mentor)
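The motivation for a wider type can be shown with a small sketch. Everything here is a stand-in chosen for illustration: `narrow_mode_t` models the 16-bit mode_t mentioned in the commit, `wide_accmode_t` models a wider access-mode type, and `HYPOTHETICAL_VNEWFLAG` is an invented constant, not one of the real V* flags.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t narrow_mode_t;     /* stand-in for a 16-bit mode_t */
typedef int      wide_accmode_t;    /* stand-in for a wider access-mode type */

/* Invented flag occupying bit 16, i.e. one bit past a 16-bit type. */
#define HYPOTHETICAL_VNEWFLAG 0x00010000

/* Returns 1 if the flag value survives a round trip through 16 bits. */
int
survives_narrow(int flag)
{
    narrow_mode_t m;

    assert(flag >= 0);
    m = (narrow_mode_t)flag;        /* high bits are silently dropped */
    return ((int)m == flag);
}

/* Returns 1 if the flag value survives a round trip through the wide type. */
int
survives_wide(int flag)
{
    wide_accmode_t m;

    assert(flag >= 0);
    m = flag;
    return (m == flag);
}
```

Any flag above bit 15 is lost when stored in the 16-bit type, which is why variables carrying access-mode bits need their own wider type before more constants can be added.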
* Style return statements in vn_pollrecord().  kib  2008-10-28  1  -2/+2
|
* Protect check for v_pollinfo == NULL and assignment of the newly allocated  kib  2008-10-28  1  -14/+22
| vpollinfo with vnode interlock. Fully initialize vpollinfo before putting pointer to it into vp->v_pollinfo.
|
| Discussed with: dwhite
| Tested by: pho
| MFC after: 1 week
* Rename three MAC entry points from _proc_ to _cred_ to reflect the fact  rwatson  2008-10-28  1  -2/+2
| that they operate directly on credentials: mac_proc_create_swapper(), mac_proc_create_init(), and mac_proc_associate_nfsd(). Update policies.
|
| Obtained from: TrustedBSD Project
* After a machine has been up for a bit more than 20 days with HZ=1000,  peter  2008-10-28  1  -1/+1
| "ticks" goes negative. This breaks the signed comparison in softclock. This causes sleep() to never wake up, tcp to stop, etc etc. This is bad(TM). Use the SEQ_LT() method from tcp's sequence number comparisons.
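The failure mode and the fix can be reproduced in userland. The sketch below is not the softclock code; `naive_before`, `wrap_before` and `days_until_wrap` are hypothetical names, and a 32-bit `int` is assumed. `wrap_before` has the same shape as the TCP sequence comparison: subtract first, then test the sign of the difference.

```c
#include <assert.h>
#include <limits.h>

/* Plain signed comparison: gives the wrong answer across the wrap point. */
int
naive_before(int a, int b)
{
    return (a < b);
}

/*
 * Wrap-safe comparison in the style of SEQ_LT(): the unsigned difference is
 * reinterpreted as signed, so only the distance between a and b matters.
 * Valid while the two values are less than 2^31 ticks apart.
 */
int
wrap_before(unsigned int a, unsigned int b)
{
    return ((int)(a - b) < 0);
}

/* Whole days until a signed 32-bit tick counter overflows at a given rate. */
int
days_until_wrap(int hz)
{
    assert(hz > 0);
    return (INT_MAX / hz / 86400);
}
```

At 1000 ticks per second the counter overflows after 24 whole days, consistent with the "a bit more than 20 days" in the message. A deadline set shortly before the overflow lands on a negative value, so the naive test never sees the current time as earlier than the deadline, while the difference-based test still does.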
* - Whitespace fix for vop_poll.  jhb  2008-10-27  1  -2/+2
| - Use the right label for vop_vptofh lock assertions so they are enforced.
* vm_pnames should be "const char *const[]".  sobomax  2008-10-27  1  -1/+1
| Submitted by: Christoph Mallon
* vm_pnames has no reason to be global.  sobomax  2008-10-27  1  -1/+1
| MFC after: 2 weeks
* Default HZ value (1,000) on i386/amd64 is not very virtual machine friendly.  sobomax  2008-10-27  1  -1/+39
| Due to the nature of the beast it causes a lot of unproductive overhead. This is especially bad when running an SMP kernel on VMWare with several virtual processors - an idle FreeBSD guest with an SMP kernel takes 150% host CPU time on my dual-core MacBook Pro when I enable two virtual CPUs, making even the host not very usable.
|
| Detect when we are running in the sandbox and reduce HZ to 10 (can be adjusted via VM_HZ in the kernel config) in such cases. This brings host CPU usage of idle FreeBSD/SMP on two virtual processors down to 10%.
|
| Detect most popular VM platforms out there - VMWare, Parallels, VirtualBox and VirtualPC.
|
| MFC after: 2 weeks
* Don't rely on the value of *statep without first taking the vnode interlock.  dfr  2008-10-24  1  -1/+4
| Reviewed by: Mike Tancsa
| MFC after: 2 weeks
* Don't rearm callout if the process is exiting, it may leak a callout  davidxu  2008-10-24  1  -1/+2
| because callout_drain() only waits for a running callout, but does not disable it if it is rearmed.
* Partly revert revision 184199, because TDF_NEEDSIGCHK is persistent  davidxu  2008-10-24  1  -10/+5
| when the thread is in kernel mode, which can cause an endless loop. Now unlock the process lock after acquiring the sleep queue lock and thread lock to avoid the problem. This means TDF_NEEDSIGCHK and TDF_NEEDSUSPCHK must be set with the process lock and thread lock held at the same time.
* Whitespace fix.  jhb  2008-10-23  1  -1/+2
|
* Fix a number of style issues in the MALLOC / FREE commit. I've tried to  des  2008-10-23  5  -8/+8
| be careful not to fix anything that was already broken; the NFSv4 code is particularly bad in this respect.
* Retire the MALLOC and FREE macros. They are an abomination unto style(9).  des  2008-10-23  16  -76/+71
| MFC after: 3 months
* Actually, for signal and thread suspension, extra process spin lock is  davidxu  2008-10-23  5  -59/+20
| unnecessary, the normal process lock and thread lock are enough. The spin lock is still needed for process and thread exiting to mimic single sched_lock.
* Split the copyout of *base at the end of getdirentries() out leaving the  jhb  2008-10-22  1  -10/+23
| rest in kern_getdirentries(). Use kern_getdirentries() to implement freebsd32_getdirentries(). This fixes a bug where calls to getdirentries() in 32-bit binaries would trash the 4 bytes after the 'long base' in userland.
|
| Submitted by: ups
| MFC after: 1 week