path: root/sys/kern
Commit message, Author, Age, Files, Lines
* Handle taskqueue_drain(9) correctly on a threaded taskqueue:zml2010-04-301-5/+6
| | | | | | | | | | | | taskqueue_drain(9) will not correctly detect whether a task is currently running. The check is against a field in the taskqueue struct, but for a threaded queue with more than one thread, multiple threads can simultaneously be running a task, thus stomping over the tq_running field. Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: jhb Approved by: dfr (mentor)
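The problem described above can be illustrated with a small userland sketch; it is not the kernel code. A single shared "running" slot (playing the role of tq_running) is overwritten as soon as a second worker starts a task, while one slot per worker keeps both tasks visible. The names `demo_queue`, `demo_start` and the two `demo_is_running_*` helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NWORKERS 2

struct demo_task { int id; };

/* Hypothetical queue: one shared slot plus one slot per worker thread. */
struct demo_queue {
    const struct demo_task *shared_running;             /* like a single tq_running */
    const struct demo_task *per_worker[DEMO_NWORKERS];  /* one entry per worker */
};

/* Record that worker `w` has started running task `t`. */
static void demo_start(struct demo_queue *q, int w, const struct demo_task *t)
{
    assert(w >= 0 && w < DEMO_NWORKERS);
    q->shared_running = t;   /* overwrites whatever another worker stored */
    q->per_worker[w] = t;
}

/* Check against the single shared slot only. */
static bool demo_is_running_shared(const struct demo_queue *q,
    const struct demo_task *t)
{
    return q->shared_running == t;
}

/* Check every worker's slot. */
static bool demo_is_running_per_worker(const struct demo_queue *q,
    const struct demo_task *t)
{
    for (size_t i = 0; i < DEMO_NWORKERS; i++)
        if (q->per_worker[i] == t)
            return true;
    return false;
}
```

With two workers each running a different task, the shared-slot check reports the first task as idle even though it is still running; the per-worker check does not.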
* Avoid allocating MAXHOSTNAMELEN bytes on the stack in expand_name(),alfred2010-04-301-3/+19
| | | | | | | | use the heap instead. Obtained from: Juniper Networks Reviewed by: jhb
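A rough userland sketch of the same idea: the scratch buffer for a name is taken from the heap rather than declared as a large array on the stack. `DEMO_NAMELEN` and `demo_expand()` are hypothetical stand-ins and do not reproduce expand_name().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define DEMO_NAMELEN 256   /* stand-in for MAXHOSTNAMELEN; the value is illustrative */

/*
 * Build "<prefix>.<host>" in a heap buffer instead of a stack array.
 * Returns a malloc'ed string the caller must free, or NULL on failure.
 */
static char *demo_expand(const char *prefix, const char *host)
{
    char *buf = malloc(DEMO_NAMELEN);
    int n;

    if (buf == NULL)
        return NULL;
    n = snprintf(buf, DEMO_NAMELEN, "%s.%s", prefix, host);
    if (n < 0 || n >= DEMO_NAMELEN) {   /* formatting error or truncation */
        free(buf);
        return NULL;
    }
    return buf;
}
```

The cost is an allocation that can fail and must be freed, which is usually an acceptable trade when stack space is scarce.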
* Don't leak core_buf or gzfile if doing a compressed core file and wealfred2010-04-301-4/+7
| | | | | | hit an error condition. Obtained from: Juniper Networks
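One common way to avoid this kind of leak is a single cleanup label that every exit path goes through. The sketch below assumes two heap resources standing in for core_buf and gzfile; `demo_dump()` and the allocation counter are hypothetical and exist only to make the leak observable.

```c
#include <assert.h>
#include <stdlib.h>

static int demo_live_allocs;   /* outstanding allocations, for the demo only */

static void *demo_alloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        demo_live_allocs++;
    return p;
}

static void demo_free(void *p)
{
    if (p != NULL) {
        demo_live_allocs--;
        free(p);
    }
}

/* Returns 0 on success, -1 on failure; both resources are released on every path. */
static int demo_dump(int fail_midway)
{
    int error = -1;
    void *buf = NULL, *zstate = NULL;

    buf = demo_alloc(4096);      /* plays the role of core_buf */
    if (buf == NULL)
        goto done;
    zstate = demo_alloc(64);     /* plays the role of gzfile */
    if (zstate == NULL)
        goto done;
    if (fail_midway)
        goto done;               /* an early return here would leak both */
    error = 0;
done:
    demo_free(zstate);
    demo_free(buf);
    return error;
}
```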
* Do not set IO_NODELOCKED while writing to vnodes as our consumersalfred2010-04-301-4/+4
| | | | | | | | do not lock the vnodes. Obtained from: Juniper Networks Reviewed by: jhb
* On Alan's advice, rather than do a wholesale conversion on a singlekmacy2010-04-306-20/+34
| | | | | | | | | | | | architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib
* Remove caddr_t casts.kib2010-04-291-5/+3
| | | | | Requested by: bde MFC after: 10 days
* kern_ntptime: drop a comment that became stale after r207359avg2010-04-291-4/+0
| | | | | MFC after: 1 week X-MFC after: r207359
* periodically save system time to hardware time-of-day clockavg2010-04-291-0/+65
| | | | | | | | | | | | | | | This is done in kern_ntptime, perhaps not the best place. This is done using resettodr(). Some features: - make save period configurable via tunable and sysctl - period of zero disables saving, setting a non-zero period re-enables it or reschedules it - do saving only if system clock is ntp-synchronized - save on shutdown Discussed with: des, Peter Jeremy <peterjeremy@acm.org> X-Maybe: save time near seconds boundary for better precision MFC after: 2 weeks
* kern_ntptime: abstract time error check into a functionavg2010-04-291-23/+27
| | | | | | ... to avoid code duplication MFC after: 1 week
* - Rework the underlying ALQ storage to be a circular buffer, which amongst otherlstewart2010-04-261-89/+466
| | | | | | | | | | | | | | | | | | | | | | | things allows variable length messages to be easily supported. - Extend KPI with alq_writen() and alq_getn() to support variable length messages, which is enabled at ALQ creation time depending on the arguments passed to alq_open(). Also add variants of alq_open() and alq_post() that accept a flags argument. The KPI is still fully backwards compatible and shouldn't require any change in ALQ consumers unless they wish to utilise the new features. - Introduce the ALQ_NOACTIVATE and ALQ_ORDERED flags to allow ALQ consumers to have more control over IO scheduling and resource acquisition respectively. - Strengthen invariants checking. - Document ALQ changes in ALQ(9) man page. Sponsored by: FreeBSD Foundation Reviewed by: gnn, jeff, rpaulo, rwatson MFC after: 1 month
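To show why a circular buffer makes variable-length messages straightforward, here is a minimal byte ring in userland C. It is only a sketch of the general technique; `struct ring`, `ring_write()` and `ring_read()` are hypothetical and are not the ALQ KPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 16

struct ring {
    unsigned char data[RING_SIZE];
    size_t head;   /* next write index */
    size_t tail;   /* next read index */
    size_t used;   /* bytes currently stored */
};

/* Append a message of any length; all-or-nothing if space is short. */
static bool ring_write(struct ring *r, const void *src, size_t len)
{
    const unsigned char *p = src;

    if (len > RING_SIZE - r->used)
        return false;
    for (size_t i = 0; i < len; i++) {
        r->data[r->head] = p[i];
        r->head = (r->head + 1) % RING_SIZE;   /* wrap at the end */
    }
    r->used += len;
    return true;
}

/* Remove up to len bytes into dst; returns the number of bytes copied. */
static size_t ring_read(struct ring *r, void *dst, size_t len)
{
    unsigned char *p = dst;
    size_t n = len < r->used ? len : r->used;

    for (size_t i = 0; i < n; i++) {
        p[i] = r->data[r->tail];
        r->tail = (r->tail + 1) % RING_SIZE;
    }
    r->used -= n;
    return n;
}
```

Because space is accounted in bytes rather than fixed-size slots, a message may straddle the end of the array without any special casing by the caller.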
* Move the constants specifying the size of struct kinfo_proc intokib2010-04-241-0/+3
| | | | | | | | | | machine-specific header files. Add KINFO_PROC32_SIZE for struct kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add CTASSERT for the size of struct kinfo_proc32. Submitted by: pluknet Reviewed by: imp, jhb, nwhitehorn MFC after: 2 weeks
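A compile-time size check of this kind can be sketched in standard C11 with `_Static_assert`, which serves the same purpose as the kernel's CTASSERT. The structure and the `DEMO_REC32_SIZE` constant below are hypothetical and unrelated to the real kinfo_proc32 layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-layout record shared with 32-bit consumers. */
struct demo_rec32 {
    uint32_t pid;
    uint32_t flags;
    char     name[24];
};

#define DEMO_REC32_SIZE 32

/* The build fails if the structure layout ever drifts from the constant. */
_Static_assert(sizeof(struct demo_rec32) == DEMO_REC32_SIZE,
    "struct demo_rec32 changed size");
```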
* - Merge soft-updates journaling from projects/suj/head into head. Thisjeff2010-04-242-3/+26
| | | | | | | | brings in support for an optional intent log which eliminates the need for background fsck on unclean shutdown. Sponsored by: iXsystems, Yahoo!, and Juniper. With help from: McKusick and Peter Holm
* Remove one zero from the double-0.bz2010-04-231-2/+2
| | | | | | This code doesn't have a license to kill. MFC after: 3 days
* Fix typo.kib2010-04-211-1/+1
| | | | | | Submitted by: emaste Pointy hat to: kib (who needs much bigger wardrobe) MFC after: 1 week
* Provide compat32 shims for kinfo_proc sysctl. This allows 32bit ps(1) tokib2010-04-211-4/+130
| | | | | | | | | | mostly work on 64bit host. The work is based on an original patch submitted by emaste, obtained from Sandvine's source tree. Reviewed by: jhb MFC after: 1 week
* Make sure that we free the passed in data message if we don't actuallyimp2010-04-201-3/+12
| | | | | | | insert it onto the queue. Also, fix a mtx leak if someone turns off devctl while we're processing a message. MFC after: 5 days
* Fix compilation in the !SMP case.attilio2010-04-201-0/+6
| | | | | | | | Keep the interrupts disabled in order to avoid preemption problems. Reported by: tinderbox, b.f. <bf1783 at googlemail dot com> MFC: 2 weeks X-MFC: r206878
* The cache_enter(9) function shall not be called for doomed dvp.kib2010-04-201-0/+2
| | | | | | | | | | | | | | | | Assert this. In the reported panic, vdestroy() fired the assertion "vp has namecache for ..", because pseudofs may end up doing cache_enter() with reclaimed dvp, after dotdot lookup temporarily unlocked dvp. Similar problem exists in ufs_lookup() for "." lookup, when vnode lock needs to be upgraded. Verify that dvp is not reclaimed before calling cache_enter(). Reported and tested by: pho Reviewed by: kan MFC after: 2 weeks
* getblk lockmgr is mostly used as a msleep() and may lead too easily toattilio2010-04-191-0/+1
| | | | | | | false positives. Whitelist it. Reported by: Erik Cederstrand <erik at cederstrand dot dk>
* Fix a deadlock in the shutdown code:attilio2010-04-191-7/+12
| | | | | | | | | | | | | | | | | | | | | | | | | When performing a smp_rendezvous() or more likely, on amd64 and i386, a smp_tlb_shootdown() the caller will end up with the smp_ipi_mtx spinlock held, busy-waiting for other CPUs to acknowledge the operation. As long as CPUs are suspended (via cpu_reset()) between the active mask read and IPI sending there can be a deadlock where the caller will wait forever for a dead CPU to acknowledge the operation. Please note that on CPU0 that is going to be somewhat heavier because of the spinlocks being disabled earlier than quitting the machine. Fix this bug by calling cpu_reset() with the smp_ipi_mtx held. Note that it is very likely that a saner offline/online CPUs mechanism will help heavily in fixing similar cases as it is likely more bugs of this type may arise in the future. Reported by: rwatson Discussed with: jhb Tested by: rnoland, Giovanni Trematerra <giovanni dot trematerra at gmail dot com> MFC: 2 weeks Special dedication to: anyone who made it possible to have 16-ways machines in Netperf
* Fix typo.kib2010-04-151-1/+1
| | | | MFC after: 3 days
* Change the semantics of the debug.ktr.alq_enable control so that when youjulian2010-04-141-24/+30
| | | | | | | | disable alq, it acts as if alq had not been enabled in the build. In other words, the rest of ktr is still available for use. If you really don't want that as well, set the mask to 0. MFC after: 3 weeks
* Handle a case in kern_openat() when vn_open() changes file type fromkib2010-04-131-15/+2
| | | | | | | | | | | | | | DTYPE_VNODE. Only acquire locks for O_EXLOCK/O_SHLOCK if file type is still vnode, since we allow for fcntl(2) to process with advisory locks for DTYPE_VNODE only. Another reason is that all fo_close() routines need to check and release locks otherwise. For O_TRUNC, call fo_truncate() instead of truncating the vnode. Discussed with: rwatson MFC after: 2 weeks
* Remove XXX comment. Add another comment, describing why f_vnode assignmentkib2010-04-131-1/+6
| | | | | | is useful. MFC after: 3 days
* Initialize the virtual memory-related resource limits in a single place.alc2010-04-111-5/+12
| | | | | | | | | | | | | Previously, one of these limits was initialized in two places to a different value in each place. Moreover, because an unsigned int was used to represent the amount of pageable physical memory, some of these limits were incorrectly initialized on 64-bit architectures. (Currently, this error is masked by login.conf's default settings.) Make vm_thread_swapin() and vm_thread_swapout() static. Submitted by: bde (an earlier version) Reviewed by: kib
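One way an error of this kind can arise is sketched below, under the assumption that a byte count is squeezed through a 32-bit unsigned value at some point. `DEMO_PAGE_SIZE` and both helper functions are hypothetical and do not reproduce the code touched by the commit.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u   /* illustrative page size */

/* Buggy shape: the byte count passes through a 32-bit unsigned value. */
static uint64_t demo_limit_32(uint64_t pages)
{
    uint32_t bytes = (uint32_t)(pages * DEMO_PAGE_SIZE);   /* high bits are lost */
    return bytes;
}

/* Fixed shape: the arithmetic stays in a 64-bit type throughout. */
static uint64_t demo_limit_64(uint64_t pages)
{
    return pages * (uint64_t)DEMO_PAGE_SIZE;
}
```

For 2097152 pages (8 GiB with this page size) the 32-bit variant yields 0, while the 64-bit variant yields 8589934592.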
* - Introduce a blessed list for sxlocks that prevents the deadlkres toattilio2010-04-111-1/+31
| | | | | | | | | | panic on those ones. [0] - Fix ticks counter wrap-up Sponsored by: Sandvine Incorporated [0] Reported by: jilles [0] Tested by: jilles MFC: 1 week
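The general shape of a wrap-safe tick comparison can be sketched as follows. This is an illustration using unsigned arithmetic in userland C, not the deadlkres code; both function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Wrap-safe: unsigned subtraction is modular, so the elapsed count stays
 * correct across a counter wrap as long as the real interval is shorter
 * than UINT_MAX ticks.
 */
static bool demo_expired(unsigned int now, unsigned int start,
    unsigned int timeout)
{
    return (now - start) >= timeout;
}

/* Naive comparison: the deadline itself wraps, so this can misfire. */
static bool demo_expired_naive(unsigned int now, unsigned int start,
    unsigned int timeout)
{
    return now >= start + timeout;
}
```

With a start value six ticks before the wrap and a timeout of ten, the naive form reports expiry after only two ticks, because the computed deadline has wrapped to a small number.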
* Do not leak master pty or ptmx vnode.kib2010-04-081-0/+9
| | | | | | Report and test case by: Petr Salinger <Petr.Salinger seznam cz> Reviewed by: ed MFC after: 1 week
* When OOM searches for a process to kill, ignore the processes alreadykib2010-04-061-0/+1
| | | | | | | | | | | | | | | killed by OOM. When killed process waits for a page allocation, try to satisfy the request as fast as possible. This removes the often encountered deadlock, where OOM continuously selects the same victim process, that sleeps uninterruptibly waiting for a page. The killed process may still sleep if page cannot be obtained immediately, but testing has shown that system has much higher chance to survive in OOM situation with the patch. In collaboration with: pho Reviewed by: alc MFC after: 4 weeks
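The selection rule can be sketched as a loop that skips candidates already marked as killed. The `demo_proc` structure and its ranking by size are assumptions made for illustration; the real OOM code uses its own criteria.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_proc {
    int    pid;
    size_t size;     /* memory footprint used for ranking (assumed) */
    bool   killed;   /* already chosen by an earlier OOM pass */
};

/* Returns the largest process not yet killed, or NULL if none qualifies. */
static struct demo_proc *demo_pick_victim(struct demo_proc *procs, size_t n)
{
    struct demo_proc *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (procs[i].killed)
            continue;                 /* never reselect the same victim */
        if (best == NULL || procs[i].size > best->size)
            best = &procs[i];
    }
    return best;
}
```

Without the `killed` check, a victim that is still sleeping would win every pass and no other process would ever be chosen.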
* Add missing MNT_NFS4ACLS.jh2010-04-041-0/+1
* Make _vm_map_init() the one place where the vm map's pmap field isalc2010-04-031-3/+2
| | | | | | initialized. Reviewed by: kib
* Fix some whitespace nits.pjd2010-04-031-7/+5
* Add missing mnt_kern_flag flags in 'show mount' output.pjd2010-04-031-1/+5
* vn_stat: take into account va_blocksize when setting st_blksizeavg2010-04-031-3/+2
| | | | | | | | | | | | | | | | | | | As currently st_blksize is always PAGE_SIZE, it is playing safe to not use any smaller value. For some cases this might not be optimal, but at least nothing should get broken. Generally I don't expect this commit to change much for the following reasons (in case of VREG, VDIR): - application I/O and physical I/O are sufficiently decoupled by filesystem code, buffer cache code, cluster and read-ahead logic - not all applications use st_blksize as a hint, some use f_iosize, some use fixed block sizes I expect writes to the middle of files on ZFS to benefit the most from this change. Silence from: fs@ MFC after: 2 weeks
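Read literally, the rule above is "never report less than the page size, but honour a larger filesystem block size". A minimal sketch, assuming a 4096-byte page and a hypothetical helper name:

```c
#include <assert.h>

#define DEMO_PAGE_SIZE 4096UL   /* illustrative; the real value is platform-specific */

/* Preferred I/O size hint: at least one page, larger if the filesystem asks. */
static unsigned long demo_st_blksize(unsigned long va_blocksize)
{
    assert(DEMO_PAGE_SIZE > 0);
    return va_blocksize > DEMO_PAGE_SIZE ? va_blocksize : DEMO_PAGE_SIZE;
}
```

A filesystem reporting 131072 would therefore surface that value in st_blksize, while one reporting 512 would still yield the page size.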
* bo_bsize: revert r205860 and take an alternative approach in getblkavg2010-04-021-1/+1
| | | | | | | | | | | | | | | | | In r205860 I missed the fact that there is code that strongly assumes that devvp bo_bsize is equal to underlying provider's sectorsize. In those places it is hard to obtain the sectorsize in an alternative way if devvp bo_bsize is set to something else. So, I am reverting bo_bsize assignment in g_vfs_open. Instead, in getblk I use DEV_BSIZE block size for b_offset calculation if vp is a disk vp as reported by vn_isdisk. This should coincide with vp being a devvp. Reported by: Mykola Dzham <i@levsha.me> Tested by: Mykola Dzham <i@levsha.me> Pointyhat to: avg MFC after: 2 weeks X-ToDo: convert bread(devvp) in all fs to use bo_bsize-d blocks
* Supply default implementation of VOP_RENAME() that does necessarykib2010-04-021-0/+16
| | | | | | | | | | | | unlocks and unreferences for argument vnodes, as expected by kern_renameat(9), and returns EOPNOTSUPP. This fixes locks and reference leaks when rename is attempted on fs that does not implement rename. PR: kern/107439 Based on submission by: Mikolaj Golub <to.my.trociny gmail com> Tested by: Mikolaj Golub MFC after: 1 week
* Add function vop_rename_fail(9) that performs needed cleanup for lockskib2010-04-021-0/+14
| | | | | | | | and references of the VOP_RENAME(9) arguments. Use vop_rename_fail() in deadfs_rename(). Tested by: Mikolaj Golub MFC after: 1 week
* The ALQ should not be considered drained until it has been made inactive.lstewart2010-04-011-1/+1
| | | | | | | Sponsored by: FreeBSD Foundation Reviewed by: dwmalone, jeff, rpaulo, rwatson (as part of a larger patch) Approved by: kmacy (mentor) MFC after: 1 month
* According to SLEEP(9), msleep() is deprecated in favour of mtx_sleep().lstewart2010-04-011-3/+3
| | | | | | | Sponsored by: FreeBSD Foundation Reviewed by: dwmalone, jeff, rpaulo, rwatson (as part of a larger patch) Approved by: kmacy (mentor) MFC after: 1 month
* - Factor code to destroy an ALQ out of alq_close() into a private alq_destroy().lstewart2010-04-011-17/+20
| | | | | | | | | - Use the new alq_destroy() to properly handle a failure case in alq_open(). Sponsored by: FreeBSD Foundation Reviewed by: dwmalone, jeff, rpaulo, rwatson (as part of a larger patch) Approved by: kmacy (mentor) MFC after: 1 month
* Add support for ALQ(9) to be compiled and loaded as a kernel module.lstewart2010-03-311-1/+83
| | | | | | | Sponsored by: FreeBSD Foundation Reviewed by: dwmalone, jeff, rpaulo, rwatson Approved by: kmacy (mentor) MFC after: 1 month
* Defer freeing a kevent list until after dropping kqueue locks.jhb2010-03-301-4/+6
| | | | | | LOR: 185 Submitted by: Matthew Fleming @ Isilon MFC after: 1 week
* Rename st_*timespec fields to st_*tim for POSIX 2008 compliance.ed2010-03-287-29/+29
| | | | | | | | | | | | | | | A nice thing about POSIX 2008 is that it finally standardizes a way to obtain file access/modification/change times in sub-second precision, namely using struct timespec, which we have already had for a very long time. Unfortunately POSIX uses different names. This commit adds compatibility macros, so existing code should still build properly. Also change all source code in the kernel to work without any of the compatibility macros. This makes it all less ambiguous. I am also renaming st_birthtime to st_birthtim, even though it was a local extension anyway. It seems Cygwin also has a st_birthtim.
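The compatibility approach can be sketched with a stand-in structure: the fields carry the POSIX 2008 names, and macros map the old names onto them. `struct demo_stat` and `demo_mtime_nsec()` are hypothetical; the macro names follow the rename described above, and the guards are there only so the sketch does not clash with a system that already defines them.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stat-like structure using the POSIX 2008 field names. */
struct demo_stat {
    struct timespec st_atim;
    struct timespec st_mtim;
    struct timespec st_ctim;
};

/* Compatibility aliases so code written against the old names still builds. */
#ifndef st_atimespec
#define st_atimespec st_atim
#endif
#ifndef st_mtimespec
#define st_mtimespec st_mtim
#endif
#ifndef st_ctimespec
#define st_ctimespec st_ctim
#endif

/* Old-style accessor: compiles unchanged thanks to the alias. */
static long demo_mtime_nsec(const struct demo_stat *sb)
{
    return sb->st_mtimespec.tv_nsec;
}
```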
* Support only LOOKUP operation for "/" in relookup() because lookup()jh2010-03-261-11/+9
| | | | | | can't succeed for CREATE, DELETE and RENAME. Discussed with: bde
* Add the ELF relocation base to struct image_params. This will benwhitehorn2010-03-252-0/+2
| | | | | required to correctly relocate the executable entry point's function descriptor on powerpc64.
* Change the arguments of exec_setregs() so that it receives a pointernwhitehorn2010-03-251-4/+3
| | | | | | | | to the image_params struct instead of several members of that struct individually. This makes it easier to expand its arguments in the future without touching all platforms. Reviewed by: jhb
* Change the way text_addr and data_addr are computed to use thenwhitehorn2010-03-251-11/+6
| | | | | | | | | | | executable status of segments instead of detecting the main text segment by which segment contains the program entry point. This affects obreak() and is required for correct operation of that function on 64-bit PowerPC systems. The previous behavior was apparently required only for the Alpha, which is no longer supported. Reviewed by: jhb Tested on: amd64, sparc64, powerpc
* Print the pointer to the lock with the panic message. The previousbz2010-03-241-2/+2
| | | | | | | | | panic: rw lock not unlocked was not really helpful for debugging. Now one can at least call show lock <ptr> from ddb to learn more about the lock. MFC after: 3 days
* The nargvstr and nenvstr properties of arginfo are ints, not longs,nwhitehorn2010-03-241-2/+2
| | | | | | | | so should be copied to userspace with suword32() instead of suword(). This alleviates problems on 64-bit big-endian architectures, and is a no-op on all 32-bit architectures. Tested on: amd64, sparc64, powerpc64
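The big-endian effect can be reproduced without special hardware by serialising the value by hand. In this sketch, storing an int-sized count with an 8-byte store puts the payload in the last four bytes, so a reader expecting a 4-byte int at the same address sees zero. The helper names are hypothetical; they only model the byte layout and are not suword() or suword32().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store v as a big-endian integer of `width` bytes at dst. */
static void demo_store_be(unsigned char *dst, uint64_t v, size_t width)
{
    for (size_t i = 0; i < width; i++)
        dst[i] = (unsigned char)(v >> (8 * (width - 1 - i)));
}

/* Read a big-endian 32-bit integer, as a consumer expecting an int would. */
static uint32_t demo_load_be32(const unsigned char *src)
{
    return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];
}
```

The wide store also overwrites four bytes beyond the int, which is a second reason to match the store width to the field width.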
* Actually make O_DIRECTORY work.ed2010-03-212-0/+8
| | | | | | According to POSIX open() must return ENOTDIR when the path name does not refer to a directory. Change vn_open() to respect this flag. This also simplifies the Linuxolator a bit.
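From userland, the behaviour can be observed with a thin wrapper around open(2). This is a sketch assuming a POSIX 2008 system that exposes O_DIRECTORY; `demo_open_dir()` is a hypothetical helper, and <stdlib.h> and <unistd.h> are included only for callers that create a temporary file and close descriptors.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Open `path` only if it names a directory; returns an fd, or -1 with errno set. */
static int demo_open_dir(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_DIRECTORY);
}
```

Calling it on a regular file is expected to fail with errno set to ENOTDIR, while calling it on "/" should return a usable descriptor.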
* Split eventhandler_register() into an internal part and a wrapper functionbz2010-03-191-17/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | that provides the allocated and setup eventhandler entry. Add a new wrapper for VIMAGE that allocates extra space to hold the callback function and argument in addition to an extra wrapper function. While the wrapper function goes as normal callback function the argument points to the extra space allocated holding the original func and arg that the wrapper function can then call. Provide an iterator function for the virtual network stack (vnet) that will call the callback function for each network stack. Provide a new set of macros for VNET that in the non-VIMAGE case will just call eventhandler_register() while in the VIMAGE case it will use vimage_eventhandler_register() passing in the extra iterator function but will only register once rather than per-vnet. We need a special macro in case we are interested in the tag returned as we must check for curvnet and can neither simply assign the return value, nor not change it in the non-vnet0 case without that. Sponsored by: ISPsystem Discussed with: jhb Reviewed by: zec (earlier version), jhb MFC after: 1 month
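The wrapper arrangement described above can be sketched in userland C: the registered callback is an iterator whose argument points at extra space holding the original function and argument, and the iterator invokes the original once per context. All names here (`demo_wrap`, `demo_iterator`, `DEMO_NCTX`, `demo_current_ctx`) are hypothetical stand-ins and not the eventhandler(9) or VNET interfaces.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*demo_handler_t)(void *arg);

/* Extra space holding the original callback and its argument. */
struct demo_wrap {
    demo_handler_t func;
    void          *arg;
};

#define DEMO_NCTX 3            /* stand-in for the number of network stacks */

static int demo_current_ctx;   /* stand-in for a "current vnet" notion */

/* Registered once as the callback; calls the original once per context. */
static void demo_iterator(void *arg)
{
    struct demo_wrap *w = arg;

    for (int i = 0; i < DEMO_NCTX; i++) {
        demo_current_ctx = i;
        w->func(w->arg);
    }
}

/* Allocate the wrapper that the iterator will receive as its argument. */
static struct demo_wrap *demo_wrap_create(demo_handler_t func, void *arg)
{
    struct demo_wrap *w = malloc(sizeof(*w));

    if (w != NULL) {
        w->func = func;
        w->arg = arg;
    }
    return w;
}

/* Sample handler: adds (context index + 1) to the int that arg points to. */
static void demo_count(void *arg)
{
    *(int *)arg += demo_current_ctx + 1;
}
```

Registering the iterator once, rather than the handler once per context, keeps the registration list short and lets contexts come and go without touching it.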