path: root/sys/kern
Commit message | Author | Age | Files | Lines
* Supply some useful information to the started image using ELF aux vectors. (kib, 2010-08-17, 2 files, -3/+39)
  In particular, provide pagesize and pagesizes array, the canary value for SSP use, number of host CPUs and osreldate.
  Tested by: marius (sparc64)
  MFC after: 1 month
* Simplify taskqueue_drain() by using proved macros. (pjd, 2010-08-13, 1 file, -14/+7)
* Allow interrupt driven config hooks to be registered from config hook callbacks. (gibbs, 2010-08-12, 1 file, -8/+45)
  Interrupt driven configuration hooks serve two purposes: they are a mechanism for registering for a callback that is invoked once interrupt services are available, and they hold off root device selection so long as any configuration hooks are still active.
  Before this change, it was not possible to safely register additional hooks from the context of a configuration hook callback. The need for this feature arises when interrupts are required to discover new devices (e.g. access to the XenStore to find para-virtualized devices) which in turn also require the ability to hold off root device selection until some lengthy, interrupt driven, configuration task has completed (e.g. Xen front/back device driver negotiation).
  More specifically, the mutex protecting the list of active configuration hooks is never held during a callback, and static information is used to ensure proper ordering and only a single callback to each hook even when faced with registration or removal of a hook during an active run.
  Sponsored by: Spectra Logic Corporation
  MFC after: 1 week.
* Properly indent a continue statement. No functional changes. (gibbs, 2010-08-12, 1 file, -1/+1)
* Add half of the time-of-day clock resolution when we adjust system time from the time-of-day clock or vice versa. (jkim, 2010-08-12, 1 file, -1/+7)
  For x86 systems, RTC resolution is one second and we used to lose up to one second whenever we initialized system time from the RTC or wrote system time back to the RTC. With this change, the margin of error per conversion is roughly between -0.5 and +0.5 second rather than between -1 and 0 second. Note that it does not take care of errors from getnanotime(9) (which is up to 1/hz second) or CLOCK_GETTIME() latency. These are just too expensive to correct and it is not worth the cost.
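The rounding described in this message can be illustrated with a small userland sketch. It is not the committed kernel code: the helper names `to_rtc_truncate` and `to_rtc_round` are hypothetical, and a device resolution of exactly one second is assumed.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Hypothetical helper: store only whole seconds (the old behaviour).
 * The stored value trails the real time by up to one second.
 */
static int64_t
to_rtc_truncate(int64_t sec, long nsec)
{
	(void)nsec;		/* the sub-second part is simply lost */
	return (sec);
}

/*
 * Hypothetical helper: add half of the one-second resolution before
 * dropping the sub-second part, so the error is centred around zero.
 */
static int64_t
to_rtc_round(int64_t sec, long nsec)
{
	nsec += NSEC_PER_SEC / 2;
	if (nsec >= NSEC_PER_SEC)
		sec++;
	return (sec);
}
```

With a time of 100.9 s, truncation stores 100 (0.9 s too early) while rounding stores 101 (0.1 s too late).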
* Provide description for 'machdep.disable_rtc_set' sysctl. (jkim, 2010-08-12, 1 file, -19/+13)
  Clean up style(9) nits. Remove a redundant return statement and an unnecessary variable.
* The buffer's b_vflags field is not always properly protected by the bufobj lock. (kib, 2010-08-12, 2 files, -4/+59)
  If b_bufobj is not NULL, then the bufobj lock should be held when manipulating the flags. Not doing this sometimes leaves BV_BKGRDINPROG erroneously set, causing softdep's getdirtybuf() to get stuck indefinitely in the "getbuf" sleep, waiting for a background write to finish which is not actually being performed.
  Add BO_LOCK() in the cases where it was missed.
  In collaboration with: pho
  Tested by: bz
  Reviewed by: jeff
  MFC after: 1 month
* Rework memguard(9) to reserve significantly more KVA to detect use-after-free over a longer time. (mdf, 2010-08-11, 1 file, -18/+21)
  Also release the backing pages of a guarded allocation at free(9) time to reduce the overhead of using memguard(9). Allow setting and varying the malloc type at run-time. Add knobs to allow:
  - randomly guarding memory
  - adding un-backed KVA guard pages to detect underflow and overflow
  - a lower limit on the size of allocations that are guarded
  Reviewed by: alc
  Reviewed by: brueffer, Ulrich Spörlein <uqs spoerlein net> (man page)
  Silence from: -arch
  Approved by: zml (mentor)
  MFC after: 1 month
* Fix (hopefully) the spelling of "queuing." (ivoras, 2010-08-09, 1 file, -1/+1)
  Submitted by: bf1783 at gmail com
* Bump the read-ahead count once more, to a value equivalent to 512 KiB on most systems. (ivoras, 2010-08-09, 1 file, -1/+1)
  Based on benchmark results on a low-end fibre channel SAN under VMWare:
  vfs.read_max              read performance
  8 (historical default)    83 MB/s
  16 (recent bump)          131 MB/s
  32 (this version)         152 MB/s
  64                        157 MB/s
  (results are +/- 3 MB/s)
  As read-ahead is heuristic, based on past IO requests, it shouldn't be problematic.
  The new default is still smaller than in other OSes.
* Elaborate on how hirunningspace was chosen. (ivoras, 2010-08-09, 1 file, -2/+5)
* Add descriptions to a handful of sysctl nodes. (gavin, 2010-08-09, 3 files, -7/+13)
  PR: kern/148580
  Submitted by: Galimov Albert <wtfcrap mail.ru>
  MFC after: 1 week
* r208165 fixed a bug related to an unsigned integer overflow in the detection of the number of CPUs. (attilio, 2010-08-09, 1 file, -4/+1)
  However, that was not mentioned at all, the problem was not reported, the patch has not been MFCed and the fix is mostly improper.
  Fix the original overflow (caused when 32 CPUs must be detected) by just using a different mathematical computation (it also makes the size of the operands involved more explicit, which is good for the moment, while waiting for more complete support for a large number of CPUs).
  PR: kern/148698
  Submitted by: Joe Landers <jlanders at vmware dot com>
  Tested by: gianni
  MFC after: 10 days
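The class of overflow mentioned in this message can be shown in miniature. This is a sketch under assumptions, not the committed fix: it assumes a 32-bit mask type, and `all_cpus_mask` is a hypothetical name. Computing `(1 << ncpus) - 1` shifts by the full width of the type when `ncpus` is 32, which C leaves undefined; shifting an all-ones value to the right sidesteps that case.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a mask with the low 'ncpus' bits set.
 * The shift count stays in the range 0..31, so ncpus == 32 is safe.
 */
static uint32_t
all_cpus_mask(unsigned int ncpus)
{
	assert(ncpus >= 1 && ncpus <= 32);
	return (UINT32_MAX >> (32 - ncpus));
}
```

Spelling the operand as `UINT32_MAX` also makes the width of the computation explicit, which is the side benefit the message points out.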
* Back out r210974. (jamie, 2010-08-08, 1 file, -2/+6)
  Any convenience of not typing "persist" is outweighed by the possibility of unintended partially-formed jails.
* To help with sequential read UFS performance on modern systems, increase the vfs.read_max default. (ivoras, 2010-08-07, 1 file, -1/+1)
  For most systems this means going from 128 KiB to 256 KiB, which is still very conservative and lower than what most other operating systems use, but as a sane default should not interfere much with existing systems.
  For systems with RAID volumes and/or virtualization environments, where read performance is very important, increasing this sysctl tunable to 32 or even more will demonstrably yield additional performance benefits.
  If MAXPHYS ever gets bumped up, it will probably be a good idea to slave read_max to it.
* Fix a bug where MSG_TRUNC was not returned in all necessary cases for SOCK_DGRAM sockets. (tuexen, 2010-08-07, 1 file, -1/+6)
  MSG_TRUNC was only returned when some mbufs could not be copied to the application. If some data was left in the last mbuf, it was correctly discarded, but MSG_TRUNC was not set.
  Reviewed by: bz
  MFC after: 3 weeks
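The expected behaviour can be checked from userland with standard socket calls. The sketch below is an illustration only, with a hypothetical function name (`datagram_was_truncated`); it sends a datagram over a local socket pair, receives it into a smaller buffer, and reports whether the kernel set MSG_TRUNC in `msg_flags`.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Send 'sendlen' bytes as one datagram and receive it into a buffer of
 * 'recvlen' bytes.  Returns 1 if MSG_TRUNC was reported, 0 if it was
 * not, and -1 on any error.  Both lengths must be at most 64.
 */
static int
datagram_was_truncated(size_t sendlen, size_t recvlen)
{
	char sendbuf[64], recvbuf[64];
	struct iovec iov;
	struct msghdr msg;
	int sv[2], result;

	if (sendlen > sizeof(sendbuf) || recvlen > sizeof(recvbuf))
		return (-1);
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
		return (-1);

	memset(sendbuf, 'x', sizeof(sendbuf));
	result = -1;
	if (send(sv[0], sendbuf, sendlen, 0) == (ssize_t)sendlen) {
		memset(&msg, 0, sizeof(msg));
		iov.iov_base = recvbuf;
		iov.iov_len = recvlen;
		msg.msg_iov = &iov;
		msg.msg_iovlen = 1;
		/* The kernel fills in msg_flags on return. */
		if (recvmsg(sv[1], &msg, 0) != -1)
			result = (msg.msg_flags & MSG_TRUNC) != 0;
	}
	close(sv[0]);
	close(sv[1]);
	return (result);
}
```

A 10-byte datagram read into a 4-byte buffer fits in a single small buffer on the kernel side, which is the kind of case the message describes as previously losing the flag.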
* Implicitly make a new jail persistent if it's set not to attach. (jamie, 2010-08-06, 1 file, -6/+2)
  MFC after: 3 days
* Add a new ipi_cpu() function to the MI IPI API that can be used to send an IPI to a specific CPU by its cpuid. (jhb, 2010-08-06, 3 files, -8/+8)
  Replace calls to ipi_selected() that constructed a mask for a single CPU with calls to ipi_cpu() instead. This will matter more in the future when we transition from cpumask_t to cpuset_t for CPU masks in which case building a CPU mask is more expensive.
  Submitted by: peter, sbruno
  Reviewed by: rookie
  Obtained from: Yahoo! (x86)
  MFC after: 1 month
* Add Xen to the list of virtual vendors. (csjp, 2010-08-06, 1 file, -0/+1)
  In the non PV (HVM) case this fixes the virtualization detection, successfully disabling the clflush instruction. This fixes insta-panics for Xen HVM users when the hw.clflush_disable tunable is -1 or 0 (-1 by default).
  Discussed with: jhb
* Add "show cdev" ddb command. (kib, 2010-08-06, 1 file, -0/+68)
  In collaboration with: pho
  MFC after: 1 month
* Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that the created cdev will never be destroyed. (kib, 2010-08-06, 3 files, -48/+82)
  Propagate the flag to devfs vnodes as VV_ETERNALDEV. Use the flags to avoid acquiring devmtx and taking a thread reference on such nodes.
  In collaboration with: pho
  MFC after: 1 month
* In order for MAXVNODES_MAX to be an "int" on powerpc and sparc, we must cast PAGE_SIZE to an "int". (alc, 2010-08-04, 1 file, -1/+1)
  (Powerpc and sparc, unlike the other architectures, define PAGE_SIZE as a "long".)
  Submitted by: Andreas Tobler
* Update the "desiredvnodes" calculation. (alc, 2010-08-02, 1 file, -8/+19)
  In particular, make the part of the calculation that is based on the kernel's heap size more conservative. Hopefully, this will eliminate the need for MAXVNODES_MAX, but for the time being set MAXVNODES_MAX to a large value.
  Reviewed by: jhb@
  MFC after: 6 weeks
* Bump the witness pendlist to 768 to accommodate the increased number of spinlocks. (rpaulo, 2010-07-29, 1 file, -1/+1)
* Add MALLOC_DEBUG_MAXZONES debug malloc(9) option to use multiple uma zones for each malloc bucket size. (mdf, 2010-07-28, 1 file, -24/+124)
  The purpose is to isolate different malloc types into hash classes, so that any buffer overruns or use-after-free will usually only affect memory from malloc types in that hash class. This is purely a debugging tool; by varying the hash function and tracking which hash class was corrupted, the intersection of the hash classes from each instance will point to a single malloc type that is being misused. At this point inspection or memguard(9) can be used to catch the offending code.
  Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files. The suggestion to have this on by default came from Kostik Belousov on -arch.
  This code is based on work by Ron Steinke at Isilon Systems.
  Reviewed by: -arch (mostly silence)
  Reviewed by: zml
  Approved by: zml (mentor)
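The intersection idea in this message can be sketched in userland. Everything below is an assumption made for illustration: the hash function, the `seed` parameter used to vary it, and the names `hash_class` and `count_suspects` are hypothetical and are not taken from the kernel source.

```c
#include <stddef.h>

#define NZONES 8	/* mirrors MALLOC_DEBUG_MAXZONES=8 from the message */

/*
 * Hypothetical hash: map a malloc type name to one of NZONES hash
 * classes.  Changing 'seed' changes which types share a class.
 */
static unsigned int
hash_class(const char *name, unsigned int seed)
{
	unsigned int h = seed;

	for (; *name != '\0'; name++)
		h = h * 31 + (unsigned char)*name;
	return (h % NZONES);
}

/*
 * Count the types that fall into the corrupted class in every run.
 * corrupted[r] is the class observed to be corrupted when the hash
 * was seeded with seeds[r].
 */
static size_t
count_suspects(const char *const types[], size_t ntypes,
    const unsigned int seeds[], const unsigned int corrupted[], size_t nruns)
{
	size_t i, r, suspects = 0;

	for (i = 0; i < ntypes; i++) {
		for (r = 0; r < nruns; r++)
			if (hash_class(types[i], seeds[r]) != corrupted[r])
				break;
		if (r == nruns)		/* matched in every run */
			suspects++;
	}
	return (suspects);
}
```

Each additional run with a different seed can only shrink the suspect set, which is why repeating the experiment tends to narrow it down to a single malloc type.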
* The interpreter name should no longer be treated as a buffer that can be overwritten. (alc, 2010-07-28, 1 file, -0/+4)
  (This change should have been included in r210545.)
  Submitted by: kib
* Introduce exec_alloc_args(). (alc, 2010-07-27, 2 files, -15/+24)
  The objective is to encapsulate the details of the string buffer allocation in one place.
  Eliminate the portion of the string buffer that was dedicated to storing the interpreter name. The pointer to the interpreter name can simply be made to point to the appropriate argument string.
  Reviewed by: kib
* Change the order in which the file name, arguments, environment, and shell command are stored in exec*()'s demand-paged string buffer. (alc, 2010-07-25, 2 files, -13/+15)
  For a "buildworld" on an 8GB amd64 multiprocessor, the new order reduces the number of global TLB shootdowns by 31%. It also eliminates about 330k page faults on the kernel address space.
  Change exec_shell_imgact() to use "args->begin_argv" consistently as the start of the argument and environment strings. Previously, it would sometimes use "args->buf", which is the start of the overall buffer, but no longer the start of the argument and environment strings. While I'm here, eliminate unnecessary passing of "&length" to copystr(), where we don't actually care about the length of the copied string.
  Clean up the initialization of the exec map. In particular, use the correct size for an entry, and express that size in the same way that is used when an entry is allocated. The old size was one page too large. (This discrepancy originated in 2004 when I rewrote exec_map_first_page() to use sf_buf_alloc() instead of the exec map for mapping the first page of the executable.)
  Reviewed by: kib
* Eliminate a little bit of duplicated code. (alc, 2010-07-23, 1 file, -3/+2)
* completely ignore zero-sized elf sections in modules of elf object type (amd64) (avg, 2010-07-23, 1 file, -0/+6)
  Current code doesn't check the size of elf sections and may perform needless actions such as zero-sized memory allocation. The bigger issue is that the alignment requirement of a zero-sized section gets effectively applied to the next section if it has a smaller alignment requirement. But other tools, like gdb and consequently kgdb, completely ignore zero-sized sections and thus may map symbols to addresses differently.
  Zero-sized sections are not typical in general. Their typical (only, even) cause in FreeBSD modules is inline assembly that creates custom sections, which is found in pcpu.h and vnet.h. Mere inclusion of one of those header files produces a custom section in elf output. If there is no actual use for the section in a given module, then the section remains empty. A better solution is to avoid creating zero-sized sections altogether, which is planned.
  Preloaded modules are handled in boot code (load_elf_obj.c), while dynamically loaded modules are handled by the kernel (link_elf_obj.c).
  Based on code by: np
  MFC after: 3 weeks
* cpufreq: allocate long-lived buffer for handling of sysctl requests (avg, 2010-07-23, 1 file, -7/+6)
  At present the cpufreq sysctl handler for current level setting would allocate and deallocate a temporary buffer of 24KB even to handle a read-only query. This puts unnecessary load on memory subsystem when current level is checked frequently, e.g. when the likes of powerd and system monitoring software are running. Change the strategy to allocating a long-lived buffer for handling the requests.
  Reviewed by: njl
  MFC after: 2 weeks
* Make lorunningspace catch up with hirunningspace. (ivoras, 2010-07-23, 1 file, -1/+6)
  While there, add comment about the magic numbers.
  Prodded by: alc
* Remove unused variable that snuck in during development. (mdf, 2010-07-22, 1 file, -2/+1)
  Approved by: zml (mentor)
* Fix taskqueue_drain(9) to not have false negatives. (mdf, 2010-07-22, 1 file, -23/+38)
  For threaded taskqueues, more than one task can be running simultaneously.
  Also make taskqueue_run(9) static to the file, since there are no consumers in the base kernel and the function signature needs to change with this fix.
  Remove mention of taskqueue_run(9) and taskqueue_run_fast(9) from the taskqueue(9) man page.
  Reviewed by: jhb
  Approved by: zml (mentor)
* When compat32 binary asks for the value of hw.machine_arch, report the name of 32bit sibling architecture instead of the host one. (kib, 2010-07-22, 1 file, -3/+25)
  Do the same for hw.machine on amd64.
  Add a safety belt debug.adaptive_machine_arch sysctl, to turn the substitution off.
  Reviewed by: jhb, nwhitehorn
  MFC after: 2 weeks
* Remove spurious '/*-' marks and fix some other style problems. (trasz, 2010-07-22, 2 files, -5/+4)
  Submitted by: bde@
* Use proper sysctl type (quad) for et_frequency. It fixes output on sparc64. (mav, 2010-07-21, 1 file, -2/+2)
* Defaulting to KTR_GEN is probably not the right decision when KTR_MASK is not defined at all, because KTR_GEN is still a valid class and some traces may fit in. (attilio, 2010-07-21, 1 file, -1/+1)
  Default to 0, instead, and block any tracing.
  Since this is a POLA violation (some third-party code, even if that may be a questionable choice, could rely on that feature), the possibility of an MFC should be carefully evaluated.
  Sponsored by: Sandvine Incorporated
* Fix several un-/signedness bugs of r210290 and r210293. Add one more check. (mav, 2010-07-20, 1 file, -2/+3)
* Fix expression style. (ivoras, 2010-07-20, 1 file, -3/+2)
  Prodded by: jhb
* Extend the timer driver API to also report the minimal and maximal supported period lengths. (mav, 2010-07-20, 2 files, -11/+55)
  Make the MI wrapper code validate periods in requests. Make the kernel clock management code honor these hardware limitations while choosing hz, stathz and profhz values.
* Fix function name in error messages. (davidxu, 2010-07-20, 1 file, -2/+2)
* Revert r210225 - turns out I was wrong; the "/*-" is not a license-only thing; it's also used to indicate that the comment should not be automatically rewrapped. (trasz, 2010-07-18, 9 files, -26/+26)
  Explained by: cperciva@
* The "/*-" comment marker is supposed to denote copyrights. (trasz, 2010-07-18, 9 files, -26/+26)
  Remove non-copyright occurrences from sys/sys/ and sys/kern/.
* Remove outdated comment and move part of it into more applicable place. (trasz, 2010-07-18, 1 file, -5/+0)
* In keeping with the Age-of-the-fruitbat theme, scale up hirunningspace on machines which can clearly afford the memory. (ivoras, 2010-07-18, 1 file, -1/+3)
  This is a somewhat conservative version of the patch - more fine tuning may be necessary.
  Idea from: Thread on hackers@
  Discussed with: alc
* Retire td_syscalls now that it is no longer needed. (jhb, 2010-07-15, 2 files, -2/+0)
* A cosmetic change - don't output empty <flags>. (ivoras, 2010-07-15, 1 file, -2/+2)
* Rename timeevents.c to kern_clocksource.c. (mav, 2010-07-14, 1 file, -0/+0)
  Suggested by: jhb@
* - Document layout of KTR_STRUCT payload in a comment. (jhb, 2010-07-14, 1 file, -6/+4)
  - Simplify ktrstruct() calling convention by having ktrstruct() use strlen() rather than requiring the caller to hand-code the length of constant strings.
  MFC after: 1 month
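The calling-convention change can be pictured with a userland analogue. This is a sketch only: it assumes the payload consists of the NUL-terminated structure name followed by the raw structure bytes, and `build_struct_payload` is a hypothetical name, not the kernel function. The point is that the callee derives the name length with strlen() so callers do not have to pass it.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build "name\0" followed by 'datalen' bytes of
 * structure data.  Returns a malloc'd buffer and stores its size in
 * *lenp, or returns NULL if the allocation fails.
 */
static void *
build_struct_payload(const char *name, const void *data, size_t datalen,
    size_t *lenp)
{
	size_t namelen = strlen(name);	/* computed here, not by the caller */
	char *buf;

	buf = malloc(namelen + 1 + datalen);
	if (buf == NULL)
		return (NULL);
	memcpy(buf, name, namelen + 1);		/* name plus terminating NUL */
	memcpy(buf + namelen + 1, data, datalen);
	*lenp = namelen + 1 + datalen;
	return (buf);
}
```

A caller would then write something like `build_struct_payload("stat", &st, sizeof(st), &len)` instead of also passing a hand-counted length for the name.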