path: root/sys/kern/subr_param.c
Commit log, newest first: commit message, author, date, files changed, lines removed/added.
* kib, 2015-10-05, 1 file, -1/+3
  MFC r288068:
  Ensure that maxproc does not exceed pid_max, at the time of boot.
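A minimal C sketch of the boot-time rule in this entry, assuming the autotuned maxproc and pid_max are simply compared and the smaller one wins. The function name boot_maxproc() is hypothetical; in the kernel these values are globals computed during boot, not function arguments.

```c
#include <assert.h>

/*
 * Hypothetical helper: returns the maxproc value to use at boot.
 * Every live process needs a distinct pid, so a process limit larger
 * than pid_max would only advertise a limit that can never be reached.
 */
static int
boot_maxproc(int autotuned_maxproc, int pid_max)
{
	assert(pid_max > 0);
	if (autotuned_maxproc > pid_max)
		return (pid_max);
	return (autotuned_maxproc);
}
```

For example, an autotuned maxproc of 200000 on a system with pid_max = 99999 would be cut back to 99999, while a smaller autotuned value passes through unchanged.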
* jhb, 2015-02-10, 1 file, -59/+8
  MFC 273800:
  Rework virtual machine hypervisor detection.
  - Move the existing code to x86/x86/identcpu.c since it is x86-specific.
  - If the CPUID2_HV flag is set, assume a hypervisor is present and query the 0x40000000 leaf to determine the hypervisor vendor ID. Export the vendor ID and the highest supported hypervisor CPUID leaf via hv_vendor[] and hv_high variables, respectively. The hv_vendor[] array is also exported via the hw.hv_vendor sysctl.
  - Merge the VMWare detection code from tsc.c into the new probe in identcpu.c. Add a VM_GUEST_VMWARE to identify vmware and use that in the TSC code to identify VMWare.
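The vendor-ID query in the second item can be illustrated with a small, portable C sketch, assuming the caller has already executed CPUID with EAX = 0x40000000 and captured EBX, ECX and EDX (the instruction itself is omitted so the code runs on any host). The helper names are hypothetical; only the register order and the well-known "VMwareVMware" signature are taken as given.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Decode the 12-byte vendor signature that CPUID leaf 0x40000000 returns
 * in EBX, ECX and EDX (in that order).  Each register holds four ASCII
 * bytes, least significant byte first.  out must have room for 13 bytes.
 */
static void
hv_vendor_decode(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
	const uint32_t regs[3] = { ebx, ecx, edx };

	for (int i = 0; i < 3; i++)
		for (int j = 0; j < 4; j++)
			out[i * 4 + j] = (char)((regs[i] >> (8 * j)) & 0xff);
	out[12] = '\0';
}

/* Hypothetical classifier in the spirit of VM_GUEST_VMWARE. */
static bool
hv_vendor_is_vmware(const char *vendor)
{
	return (strcmp(vendor, "VMwareVMware") == 0);
}
```

Extracting the bytes with shifts rather than copying the registers with memcpy() keeps the decoding independent of the host's byte order.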
* pluknet, 2013-11-14, 1 file, -0/+2
  Merge r257996,r258001,r258069 from head: fixes for HyperV guest.
  - Set description string for VM_GUEST_HV (HyperV guest).
  - Add a brief comment about VM_GUEST and vm_guest_sysctl_names relationship.
  - CTASSERT that vm_guest range is covered by vm_guest_sysctl_names.
  Approved by: re (glebius)
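The enum/array pairing and the CTASSERT in this entry can be sketched in portable C11, where static_assert plays the role of the kernel's CTASSERT. The enum members and name strings below are an illustrative subset chosen for this sketch, and vm_guest_name()/vm_guest_from_name() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>
#include <string.h>

/* Illustrative subset; check the real header for the full member list. */
enum VM_GUEST { VM_GUEST_NO = 0, VM_GUEST_VM, VM_GUEST_XEN, VM_GUEST_HV, VM_LAST };

/* One string per enum value, in the same order, plus a NULL terminator. */
static const char *const vm_guest_sysctl_names[] = {
	"none",
	"generic",
	"xen",
	"hv",
	NULL
};

/* Compile-time check: adding an enum value without a name fails the build. */
static_assert(sizeof(vm_guest_sysctl_names) / sizeof(vm_guest_sysctl_names[0]) ==
    (size_t)VM_LAST + 1, "vm_guest_sysctl_names must cover every VM_GUEST value");

static const char *
vm_guest_name(enum VM_GUEST g)
{
	if ((unsigned)g >= (unsigned)VM_LAST)
		return ("unknown");
	return (vm_guest_sysctl_names[g]);
}

/* Reverse lookup; returns VM_LAST when the name is not recognized. */
static enum VM_GUEST
vm_guest_from_name(const char *name)
{
	for (int i = 0; i < VM_LAST; i++)
		if (strcmp(vm_guest_sysctl_names[i], name) == 0)
			return ((enum VM_GUEST)i);
	return (VM_LAST);
}
```

Keeping the array NULL-terminated costs one extra slot, which is why the assertion compares against VM_LAST + 1.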
* kib, 2013-10-31, 1 file, -1/+1
  MFC r257221:
  Fix typo.
  Approved by: re (glebius)
* kib, 2013-03-19, 1 file, -0/+7
  Implement the concept of the unmapped VMIO buffers, i.e. buffers which do not map the b_pages pages into buffer_map KVA. The use of the unmapped buffers eliminates the need to perform TLB shootdown for mapping on the buffer creation and reuse, greatly reducing the amount of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads.

  The unmapped buffer should be explicitly requested by the GB_UNMAPPED flag by the consumer. For unmapped buffer, no KVA reservation is performed at all. The consumer might request unmapped buffer which does have a KVA reserve, to manually map it without recursing into buffer cache and blocking, with the GB_KVAALLOC flag.

  When the mapped buffer is requested and unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation.

  Unmapped buffer is translated into unmapped bio in g_vfs_strategy(). Unmapped bio carries a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitly specify a readiness to accept unmapped bio, otherwise g_down geom thread performs the transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap.

  The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it could be manually tuned by kern.bio_transient_maxcnt tunable, in the units of the transient mappings. Eventually, the bio_transient_map could be removed after all geom classes and drivers can accept unmapped i/o requests.

  Unmapped support can be turned off by the vfs.unmapped_buf_allowed tunable; disabling it makes the buffer (or cluster) creation requests ignore the GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on the architectures where pmap_copy_page() was implemented and tested.

  In the rework, filesystem metadata is no longer subject to the maxbufspace limit. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, are accounted against maxbufspace, as before. Effectively, this means that the maxbufspace is forced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not work, because buffer_map fragmentation does not allow the limit to be reached.

  By Jeff Roberson request, the getnewbuf() function was split into smaller single-purpose functions.

  Sponsored by: The FreeBSD Foundation
  Discussed with: jeff (previous version)
  Tested by: pho, scottl (previous version), jhb, bf
  MFC after: 2 weeks
* andre, 2013-03-08, 1 file, -12/+0
  Move the auto-sizing of the callout array from init_param2() to kern_timeout_callwheel_alloc() where it is actually used.

  This is a mechanical move and no tuning parameters are changed.

  The pre-allocated callout array is only used for legacy timeout(9) calls and is only allocated and active on cpu0. Eventually all remaining users of timeout(9) should switch to the callout_* API.

  Reviewed by: davide
* davide, 2013-03-04, 1 file, -2/+6
  - Make callout(9) tickless, relying on eventtimers(4) as backend for precise time event generation. This greatly improves granularity of callouts, which are no longer constrained to wait for the next tick to be scheduled.
  - Extend the callout KPI introducing a set of callout_reset_sbt* functions, which take a sbintime_t as timeout argument. The new KPI also offers a way for consumers to specify the precision tolerance they allow, so that callout can coalesce events and reduce the number of interrupts as well as potentially avoid scheduling a SWI thread.
  - Introduce support for dispatching callouts directly from hardware interrupt context, specifying an additional flag. This feature should be used carefully, since interrupt context has some limitations (e.g. no sleeping locks can be held).
  - Enhance mechanisms to gather information about the callwheel, introducing a new sysctl to obtain stats.

  This change breaks the KBI. struct callout fields have been changed, in particular 'int ticks' (4 bytes) has been replaced with 'sbintime_t' (8 bytes) and another 'sbintime_t' field was added for precision.

  Together with: mav
  Reviewed by: attilio, bde, luigi, phk
  Sponsored by: Google Summer of Code 2012, iXsystems inc.
  Tested by: flo (amd64, sparc64), marius (sparc64), ian (arm), markj (amd64), mav, Fabian Keil
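To make the sbintime_t and precision-tolerance ideas concrete, here is a C sketch that assumes only what the message states (sbintime_t is 8 bytes, and a precision lets nearby events be merged) plus the 32.32 fixed-point layout FreeBSD uses for sbintime_t. The overlap test is a hypothetical illustration of coalescing, not the callout(9) algorithm.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed 32.32 fixed point: upper 32 bits are seconds, lower 32 bits a binary fraction. */
typedef int64_t sbintime_t;

/* Redefined here so the sketch builds anywhere; FreeBSD's <sys/time.h> has equivalents. */
#define SBT_1S	((sbintime_t)1 << 32)
#define SBT_1MS	(SBT_1S / 1000)

/* Whole milliseconds (truncating); assumes 0 <= sbt and an interval under ~24 days. */
static int64_t
sbt_to_ms(sbintime_t sbt)
{
	return ((sbt * 1000) >> 32);
}

/*
 * Hypothetical coalescing check: an event due at t with tolerance p may
 * fire anywhere in [t, t + p].  Two events can share one timer interrupt
 * if their windows overlap; the earliest common instant is returned.
 */
static bool
sbt_windows_overlap(sbintime_t t1, sbintime_t p1, sbintime_t t2,
    sbintime_t p2, sbintime_t *fire)
{
	sbintime_t lo = (t1 > t2) ? t1 : t2;
	sbintime_t hi = (t1 + p1 < t2 + p2) ? t1 + p1 : t2 + p2;

	if (lo > hi)
		return (false);
	*fire = lo;
	return (true);
}
```

With this model, an event due at 10 ms with 5 ms of slack and an exact event at 12 ms can both be served by one interrupt at 12 ms, whereas an exact event at 20 ms cannot be merged with the first.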
* andre, 2013-01-17, 1 file, -14/+0
  Move the mbuf memory limit calculations from init_param2() to tunable_mbinit() where it is next to where it is used later.

  Change the sysinit level of tunable_mbinit() from SI_SUB_TUNABLES to SI_SUB_KMEM after the VM is running. This allows better methods to be used to determine the physical and virtual memory effectively available to the kernel.

  Update comments.

  In a second step it can be merged into mbuf_init().
* alfred, 2013-01-15, 1 file, -1/+4
  Do not autotune ncallout to be greater than 18508.

  When maxusers was unrestricted and maxfiles was allowed to autotune much higher, the result was that ncallout, which was based on maxfiles and maxproc, grew much higher than was needed.

  To fix this, clip autotuning to the same number we would get with the old maxusers algorithm, which would stop scaling at 384 maxusers.

  Growing ncallout higher is not likely to be needed since most consumers of timeout(9) are gone and any higher value for ncallout causes the callwheel hashes to be much larger than will ever be needed for most applications.

  MFC after: 1 month
  Reviewed by: mav
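The clip is a one-line min(). The constant 18508 is consistent with 16 + maxproc + maxfiles at maxusers = 384 if one assumes the legacy formulas maxproc = 20 + 16 * maxusers and maxfiles = 2 * maxproc (6164 + 12328 + 16 = 18508); those formulas and all function names below are assumptions made for this sketch.

```c
/* Assumed legacy autotuning formulas (hypothetical names). */
static int
legacy_maxproc(int maxusers)
{
	return (20 + 16 * maxusers);
}

static int
legacy_maxfiles(int maxusers)
{
	return (2 * legacy_maxproc(maxusers));
}

/* ncallout grows with maxproc and maxfiles but never beyond 18508. */
static int
autotune_ncallout(int maxproc, int maxfiles)
{
	int n = 16 + maxproc + maxfiles;

	return (n < 18508 ? n : 18508);
}
```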
* zont, 2013-01-15, 1 file, -0/+2
  - Detect when we are in KVM.
  Silence on: emulation
  Approved by: kib (mentor)
  MFC after: 1 week
* neel, 2013-01-05, 1 file, -0/+1
  Teach the kernel to recognize that it is executing inside a bhyve virtual machine.
  Obtained from: NetApp
* andre, 2012-12-10, 1 file, -2/+2
  Prevent long type overflow of realmem calculation on ILP32 by forcing calculation to be in quad_t space. Fix style issue with second parameter to qmin().
  Reported by: alc
  Reviewed by: bde, alc
* andre, 2012-11-29, 1 file, -4/+4
  Using a long is the wrong type to represent the realmem and maxmbufmem variables as they may overflow on i386/PAE and i386 with > 2GB RAM.

  Use 64bit quad_t instead. It has broader kernel infrastructure support with TUNABLE_QUAD_FETCH() and qmin/qmax() than other available types.

  Pointed out by: alc, bde
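The overflow these two entries describe is easy to reproduce. The sketch below, assuming 4 KB pages, compares a 32-bit multiply (modeled with uint32_t so the wraparound is well defined; the original signed long overflow is undefined behavior) with one widened to 64 bits first. int64_t stands in for quad_t, and the function names are hypothetical.

```c
#include <stdint.h>

/* What an ILP32 build effectively computed: the product wraps modulo 2^32. */
static uint32_t
realmem_32bit(uint32_t physpages, uint32_t page_size)
{
	return (physpages * page_size);
}

/* Widen one operand first so the multiply happens in 64 bits. */
static int64_t
realmem_64bit(uint32_t physpages, uint32_t page_size)
{
	return ((int64_t)physpages * page_size);
}
```

With 1048576 pages of 4096 bytes (4 GB), the 32-bit product wraps to 0 while the widened one yields 4294967296. A signed 32-bit long already goes wrong past 2^31 bytes, which matches the "> 2GB RAM" case on i386.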
* andre, 2012-11-27, 1 file, -8/+31
  Base the mbuf related limits on the available physical memory or kernel memory, whichever is lower.

  The overall mbuf related memory limit must be set so that mbufs (and clusters of various sizes) can't exhaust physical RAM or KVM.

  The limit is set to half of the physical RAM or KVM (whichever is lower) as the baseline. In any normal scenario we want to leave at least half of the physmem/kvm for other kernel functions and userspace to prevent it from swapping too easily. Via a tunable kern.maxmbufmem the limit can be upped to at most 3/4 of physmem/kvm.

  At the same time divorce maxfiles from maxusers and set maxfiles to physpages / 8 with a floor based on maxusers. This way busy servers can make use of the significantly increased mbuf limits with a much larger number of open sockets.

  Tidy up ordering in init_param2() and check up on some users of those values calculated here.

  Out of the overall mbuf memory limit, 2K clusters and 4K (page size) clusters get 1/4 each because these are the most heavily used mbuf sizes. 2K clusters are used for MTU 1500 ethernet inbound packets. 4K clusters are used whenever possible for sends on sockets and thus outbound packets.

  The larger cluster sizes of 9K and 16K are limited to 1/6 of the overall mbuf memory limit. When jumbo MTU's are used these large clusters will end up only on the inbound path. They are not used on outbound, there it's still 4K. Yes, that will stay that way because otherwise we run into lots of complications in the stack. And it really isn't a problem, so don't make a scene.

  Normal mbufs (256B) weren't limited at all previously. This was problematic as there are certain places in the kernel that on allocation failure of clusters try to piece together their packet from smaller mbufs.

  The mbuf limit is the number of all other mbuf sizes together plus some more to allow for standalone mbufs (ACK for example) and to send off a copy of a cluster. Unfortunately there isn't a way to set an overall limit for all mbuf memory together as UMA doesn't support such a limiting.

  NB: Every cluster also has an mbuf associated with it.

  Two examples on the revised mbuf sizing limits:

  1GB KVM:
    512MB limit for mbufs
    419,430 mbufs
    65,536 2K mbuf clusters
    32,768 4K mbuf clusters
    9,709 9K mbuf clusters
    5,461 16K mbuf clusters

  16GB RAM:
    8GB limit for mbufs
    33,554,432 mbufs
    1,048,576 2K mbuf clusters
    524,288 4K mbuf clusters
    155,344 9K mbuf clusters
    87,381 16K mbuf clusters

  These defaults should be sufficient for even the most demanding network loads.

  MFC after: 1 month
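The cluster limits in the two examples above follow from simple divisions of the overall limit. The C sketch below reproduces them, assuming cluster sizes of 2048, 4096, 9 * 1024 and 16 * 1024 bytes (FreeBSD's MCLBYTES, MJUMPAGESIZE on 4 KB-page machines, MJUM9BYTES and MJUM16BYTES) and assuming a kern.maxmbufmem value is honored up to 3/4 of the smaller of physmem and KVM. The struct and function names are hypothetical, and the mbuf count itself is left out because the message gives only a loose rule for it.

```c
#include <stdint.h>

#define CL_2K	2048
#define CL_4K	4096
#define CL_9K	(9 * 1024)
#define CL_16K	(16 * 1024)

struct mbuf_limits {
	int64_t maxmbufmem;	/* overall byte limit */
	int64_t clusters_2k;
	int64_t clusters_4k;
	int64_t clusters_9k;
	int64_t clusters_16k;
};

/*
 * physmem and kvm are in bytes; tunable is a requested kern.maxmbufmem
 * value, or 0 to take the default of half the smaller of the two.
 */
static struct mbuf_limits
mbuf_limits_compute(int64_t physmem, int64_t kvm, int64_t tunable)
{
	struct mbuf_limits l;
	int64_t base = (physmem < kvm) ? physmem : kvm;
	int64_t cap = base / 4 * 3;	/* a tunable is honored up to 3/4 */

	l.maxmbufmem = base / 2;
	if (tunable > 0)
		l.maxmbufmem = (tunable < cap) ? tunable : cap;

	/* 2K and 4K clusters get 1/4 of the limit each; 9K and 16K get 1/6. */
	l.clusters_2k = l.maxmbufmem / CL_2K / 4;
	l.clusters_4k = l.maxmbufmem / CL_4K / 4;
	l.clusters_9k = l.maxmbufmem / CL_9K / 6;
	l.clusters_16k = l.maxmbufmem / CL_16K / 6;
	return (l);
}
```

Calling it with 1 GB of KVM (and more RAM) gives a 512 MB limit and 65536/32768/9709/5461 clusters, and with 16 GB of RAM (and ample KVM) gives an 8 GB limit and 1048576/524288/155344/87381, matching the figures in the message.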
* alfred, 2012-11-10, 1 file, -11/+11
  Allow maxusers to scale on machines with large address space.

  Some hooks are added to clamp down maxusers and nmbclusters for small address space systems.

  VM_MAX_AUTOTUNE_MAXUSERS - the max maxusers that will be autotuned based on physical memory.
  VM_MAX_AUTOTUNE_NMBCLUSTERS - max nmbclusters based on physical memory.

  These are set to the old values on i386 to preserve the clamping that was being done to all arches.

  Another macro VM_AUTOTUNE_NMBCLUSTERS is provided to allow an override for the calculation on a MD basis. Currently no arch defines this.

  Reviewed by: peter
  MFC after: 2 weeks
* alfred, 2012-10-25, 1 file, -2/+10
  Allow autotune maxusers > 384 on 64 bit machines

  A default install on large memory machines with multiple 10gigE interfaces was not being given enough mbufs to do full bandwidth TCP or NFS traffic.

  To keep the value somewhat reasonable, we scale back the number of maxusers by 1/6 past the 384 point.

  This gives us enough mbufs for most of our pretty basic 10gigE line-speed tests to complete.
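Reading "scale back ... by 1/6 past the 384 point" as "beyond 384, keep only one sixth of the excess", and adding the optional VM_MAX_AUTOTUNE_MAXUSERS-style ceiling from the entry above, the rule might look like this in C. The interpretation, the function name, and the use of 0 to mean "no ceiling" are assumptions of this sketch.

```c
/*
 * raw_maxusers: value derived from physical memory before scaling.
 * max_autotune: machine-dependent ceiling, or 0 for none (hypothetical).
 */
static int
autotune_maxusers(int raw_maxusers, int max_autotune)
{
	int mu = raw_maxusers;

	/* Past 384, only 1/6 of the excess is kept. */
	if (mu > 384)
		mu = 384 + (mu - 384) / 6;
	/* Small-address-space platforms (e.g. i386) clamp further. */
	if (max_autotune > 0 && mu > max_autotune)
		mu = max_autotune;
	return (mu);
}
```

Under this reading, a raw value of 1000 becomes 384 + 616 / 6 = 486 on a 64-bit machine, but stays at 384 where the ceiling is 384.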
* zont, 2012-09-03, 1 file, -7/+7
  - Mark some sysctls with CTLFLAG_TUN flag instead of CTLFLAG_RDTUN.
  Pointed out by: avg
  Approved by: kib (mentor)
  MFC after: 1 week
* zont, 2012-09-02, 1 file, -7/+7
  - Make kern.maxtsiz, kern.dfldsiz, kern.maxdsiz, kern.dflssiz, kern.maxssiz and kern.sgrowsiz sysctls writable.
  Approved by: kib (mentor)
* kib, 2012-08-16, 1 file, -0/+3
  As a safety measure, disable lowering pid_max too much.
  Requested by: Peter Jeremy <peter@rulingia.com>
  MFC after: 1 week
* kib, 2012-08-15, 1 file, -2/+11
  Add a sysctl kern.pid_max, which limits the maximum pid the system is allowed to allocate, and corresponding tunable with the same name. Note that existing processes with higher pids are left intact.
  MFC after: 1 week
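A sysctl handler for kern.pid_max has to reject values outside an allowed range before storing them. The C sketch below shows only that validation step; the bounds are parameters because the entries do not say how low is "too much", and the function name is hypothetical.

```c
#include <errno.h>

/*
 * Validate a requested pid_max and store it only if it lies within
 * [min_allowed, max_allowed].  Returns 0 or EINVAL, kernel style.
 * Processes that already have larger pids are not touched here.
 */
static int
pid_max_set(int requested, int min_allowed, int max_allowed, int *pid_max)
{
	if (requested < min_allowed || requested > max_allowed)
		return (EINVAL);
	*pid_max = requested;
	return (0);
}
```

On failure the current value is left unchanged, so a rejected write has no side effects.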
* alc, 2011-03-23, 1 file, -11/+8
  Modestly increase the maximum allowed size of the kmem map on i386. Also, express this new maximum as a fraction of the kernel's address space size rather than a constant so that increasing KVA_PAGES will automatically increase this maximum. As a side-effect of this change, kern.maxvnodes will automatically increase by a proportional amount.

  While I'm here ensure that this change doesn't result in an unintended increase in maxpipekva on i386. Calculate maxpipekva based upon the size of the kernel address space and the amount of physical memory instead of the size of the kmem map. The memory backing pipes is not allocated from the kmem map. It is allocated from its own submap of the kernel map. In short, it has no real connection to the kmem map. (In fact, the commit messages for the maxpipekva auto-sizing talk about using the kernel map size, cf. r117325 and r117391, even though the implementation actually used the kmem map size.) Although the calculation is now done differently, the resulting value for maxpipekva should remain almost the same on i386. However, on amd64, the value will be reduced by 2/3. This is intentional. The recent change to VM_KMEM_SIZE_SCALE on amd64 for the benefit of ZFS also had the unnecessary side-effect of increasing maxpipekva. This change is effectively restoring maxpipekva on amd64 to its prior value.

  Eliminate init_param3() since it is no longer used.
* pluknet, 2011-01-21, 1 file, -0/+7
  Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.
  Submitted by: perryh pluto.rain.com (previous version)
  Reviewed by: jhb
  Approved by: kib (mentor)
  Tested by: universe
* csjp, 2010-08-06, 1 file, -0/+1
  Add Xen to the list of virtual vendors. In the non PV (HVM) case this fixes the virtualization detection successfully disabling the clflush instruction. This fixes insta-panics for XEN hvm users when the hw.clflush_disable tunable is -1 or 0 (-1 by default).
  Discussed with: jhb
* nwhitehorn, 2010-06-24, 1 file, -3/+3
  Reverse the logic of the if statement that sets the default value of HZ; the list of 1000 Hz platforms was getting unwieldy.
  Suggested by: marcel
* nwhitehorn, 2010-06-23, 1 file, -1/+1
  Move default HZ from 100 to 1000 on powerpc.
  Reviewed by: marcel
  MFC after: 2 weeks
* ivoras, 2010-03-02, 1 file, -1/+1
  Document the VM detection type and sysctl a bit better.
* alc, 2010-02-27, 1 file, -4/+4
  When running as a guest operating system, the FreeBSD kernel must assume that the virtual machine monitor has enabled machine check exceptions. Unfortunately, on AMD Family 10h processors the machine check hardware has a bug (Erratum 383) that can result in a false machine check exception when a superpage promotion occurs. Thus, I am disabling superpage promotion when the FreeBSD kernel is running as a guest operating system on an AMD Family 10h processor.
  Reviewed by: jhb, kib
  MFC after: 3 days
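The decision in this entry combines three facts: the kernel is a guest, the CPU is AMD, and its family is 10h. The C sketch below assumes the standard CPUID family encoding (the extended family field is added when the base family is 0xF) and uses hypothetical function names; it is not the pmap code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Effective family from CPUID leaf 1 EAX: base family, plus extended family if base is 0xF. */
static unsigned
cpuid_family(uint32_t eax)
{
	unsigned family = (eax >> 8) & 0xf;

	if (family == 0xf)
		family += (eax >> 20) & 0xff;
	return (family);
}

/* Erratum 383 workaround: no superpage promotion for AMD Family 10h guests. */
static bool
superpage_promotion_allowed(bool is_guest, bool is_amd, uint32_t cpuid_eax)
{
	return (!(is_guest && is_amd && cpuid_family(cpuid_eax) == 0x10));
}
```

For a leaf-1 EAX of 0x00100F22 the base family is 0xF and the extended family is 1, giving family 10h, so promotion is refused only when the same CPU is also running as a guest.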
* brooks, 2010-02-24, 1 file, -2/+0
  Don't enforce an upper bound on kern.ngroups. The INT_MAX-1 limit was too high due to several overflows. The actual limit is somewhere in the neighborhood of INT_MAX/4 on 64-bit machines, but most systems could not support such a limit due to a lack of memory and the cost of duplicate credentials.
  Reported by: bde
* brooks, 2010-01-12, 1 file, -0/+14
  Replace the static NGROUPS=NGROUPS_MAX+1=1024 with a dynamic kern.ngroups+1. kern.ngroups can range from NGROUPS_MAX=1023 to INT_MAX-1. Given that the Windows group limit is 1024, this range should be sufficient for most applications.
  MFC after: 1 month
* silby, 2009-07-08, 1 file, -1/+1
  Increase HZ_VM from 10 to 100. While 10 hz saves cpu time under VM environments, it's too slow for FreeBSD to work properly. For example, ping at 10hz pings about every 600ms instead of about every second.
  Approved by: re (kib)
* jhb, 2009-03-23, 1 file, -10/+11
  Improve the description of a few sysctls.
  Submitted by: bde (partially)
  MFC after: 3 days
* jhb, 2009-03-12, 1 file, -2/+2
  Change the sysctls for maxbcache and maxswzone from int to long. I missed this earlier since these sysctls don't exist in 7.x yet.
* jhb, 2009-03-12, 1 file, -0/+6
  Export the current values of nbuf, ncallout, and nswbuf via read-only sysctls that match the tunable names.
  MFC after: 3 days
* jhb, 2009-03-10, 1 file, -2/+2
  - Make maxpipekva a signed long rather than an unsigned long as overflow is more likely to be noticed with signed types.
  - Make amountpipekva a long as well to match maxpipekva.
  Discussed with: bde
* jhb, 2009-03-09, 1 file, -6/+6
  Adjust some variables (mostly related to the buffer cache) that hold address space sizes to be longs instead of ints. Specifically, the following values are now longs: runningbufspace, bufspace, maxbufspace, bufmallocspace, maxbufmallocspace, lobufspace, hibufspace, lorunningspace, hirunningspace, maxswzone, maxbcache, and maxpipekva.

  Previously, a relatively small number (~ 44000) of buffers set in kern.nbuf would result in integer overflows resulting either in hangs or bogus values of hidirtybuffers and lodirtybuffers. Now one has to overflow a long to see such problems. There was a check for a nbuf setting that would cause overflows in the auto-tuning of nbuf. I've changed it to always check and cap nbuf but warn if a user-supplied tunable would cause overflow.

  Note that this changes the ABI of several sysctls that are used by things like top(1), etc., so any MFC would probably require some gross shims to allow for that.

  MFC after: 1 month
* ivoras, 2008-12-30, 1 file, -1/+3
  Document the relationship between enum VM_GUEST and the vm_guest_sysctl_names array.
  Approved by: gnn (original version)
* bz, 2008-12-27, 1 file, -7/+9
  Hide detect_virtual() along with the accompanying string arrays under #ifndef XEN to make XEN config compile again. In case of Xen vm_guest is hard coded.

  Move the list for the vm_guest sysctl out of the restrictive bounds as the sysctl is there in either case.
* ivoras, 2008-12-18, 1 file, -3/+27
  By popular request, stringify kern.vm_guest sysctl. Now it returns a short, self-documenting string describing the detected virtual environment.
  Approved by: gnn (mentor) (earlier version)
* ivoras, 2008-12-17, 1 file, -6/+15
  Introduce a sysctl kern.vm_guest that reflects what the kernel knows about it running under a virtual environment. This also introduces a globally accessible variable vm_guest that can be used where appropriate in the kernel to inspect this environment. To make it easier for the long run, an enum VM_GUEST is also introduced, which could possibly be factored out in a header somewhere (but the question is where - vm/vm_param.h? sys/param.h?) so it eventually becomes a part of the standard KPI. In any case, it's a start.

  The purpose of all this isn't to absolutely detect that the OS is running under a virtual environment (cf. "redpill") but to allow the parts of the kernel and the userland that care about this particular aspect and can do something useful depending on it to have a standardised interface. Reducing kern.hz is one example but there are other things that could be done like avoiding context switches, not using CPU instructions that are known to be slow in emulation, possibly different strategies in VM (memory) allocation, CPU scheduling, etc.

  It isn't clear if the JAILS/VIMAGE functionality should also be exposed by this particular mechanism (probably not since they're not "full" virtual hardware environments). Sometime in the future another sysctl and a variable could be introduced to reflect if the kernel supports any kind of virtual hosting (e.g. VMWare VMI, Xen dom0).

  Reviewed by: silence from src-committers@, virtualization@, kmacy@
  Approved by: gnn (mentor)
  Security: Obscurity doesn't help.
* jkim, 2008-12-08, 1 file, -12/+25
  - Detect Bochs BIOS variants and use HZ_VM as well.
  - Free kernel environment variable after its use.
  - Fix style(9) nits.
* sobomax, 2008-10-27, 1 file, -1/+1
  vm_pnames should be "const char *const[]".
  Submitted by: Christoph Mallon
* sobomax, 2008-10-27, 1 file, -1/+1
  vm_pnames has no reason to be global.
  MFC after: 2 weeks
* sobomax, 2008-10-27, 1 file, -1/+39
  Default HZ value (1,000) on i386/amd64 is not very virtual machine friendly. Due to the nature of the beast it causes a lot of unproductive overhead. This is especially bad when running SMP kernel on VMWare with several virtual processors - idle FreeBSD guest with SMP kernel takes 150% host CPU time on my dual-core MacBook Pro when I am enabling two virtual CPUs, making even the host not very usable.

  Detect when we are running in the sandbox and reduce HZ to 10 (can be adjusted via VM_HZ in the kernel config) in such cases. This brings host CPU usage of idle FreeBSD/SMP on two virtual processors down to 10%.

  Detect most popular VM platforms out there - VMWare, Parallels, VirtualBox and VirtualPC.

  MFC after: 2 weeks
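The detect-then-lower-HZ logic described here can be sketched as prefix matching of firmware strings against known VM names. This C sketch assumes the BIOS vendor and system product strings have already been read (FreeBSD's loader exposes SMBIOS data as kernel environment variables such as smbios.bios.vendor and smbios.system.product); the exact name strings are illustrative, drawn from the platforms mentioned in this log, and HZ_VM is set to 100 to reflect the later HZ_VM increase rather than the original 10. HZ_DEFAULT and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define HZ_DEFAULT	1000	/* i386/amd64 default per this log */
#define HZ_VM		100	/* the VM value after the 2009 increase */

/* Illustrative firmware strings; verify against real SMBIOS data. */
static const char *const vm_bios_vendors[] = { "QEMU", "Bochs", "Xen", NULL };
static const char *const vm_products[] = {
	"VMware Virtual Platform",
	"VirtualBox",
	"Parallels Virtual Platform",
	"Virtual Machine",		/* assumed Microsoft VirtualPC string */
	NULL
};

/* True if s begins with any string in the NULL-terminated list. */
static bool
matches_prefix(const char *s, const char *const *names)
{
	if (s == NULL)
		return (false);
	for (size_t i = 0; names[i] != NULL; i++)
		if (strncmp(s, names[i], strlen(names[i])) == 0)
			return (true);
	return (false);
}

static int
choose_hz(const char *bios_vendor, const char *system_product)
{
	bool guest = matches_prefix(bios_vendor, vm_bios_vendors) ||
	    matches_prefix(system_product, vm_products);

	return (guest ? HZ_VM : HZ_DEFAULT);
}
```

Matching by prefix rather than by exact string tolerates vendors that append version text, and a missing string (NULL) simply counts as "not a VM".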
* alc, 2008-07-04, 1 file, -2/+2
  Correct an error in the comments for init_param3().
  Discussed with: silby
* pjd, 2008-05-09, 1 file, -8/+17
  - Export HZ value via kern.hz sysctl (this is the same name as for the loader tunable).
  - Document other sysctls in this file and also mark them as loader tunable via CTLFLAG_RDTUN flag.
  Reviewed by: roberto
* alfred, 2007-10-16, 1 file, -0/+10
  Export maxswzone, maxbcache, maxtsiz, dfldsiz, maxdsiz, dflssiz, maxssiz, and sgrowsiz via sysctl.
  MFC after: 1 week
* kris, 2005-10-14, 1 file, -4/+4
  Partially revert revision 1.66, which contained a change that did not correspond to the commit log. It changed the maxswzone and maxbcache parameters from int to long, without changing the extern definitions in <sys/buf.h>. In fact it's a good thing it did not, because other parts of the system are not yet ready for this, and on large-memory sparc machines it causes severe filesystem damage if you try.

  The worst effect of the change was that the tunables controlling the above variables stopped working. These were necessary to allow such large sparc64 machines (with >12GB RAM) to boot, since sparc64 did not set a hard-coded upper limit on these parameters and they ended up overflowing an int, causing an infinite loop at boot in bufinit().

  Reviewed by: mlaier
* marius, 2005-04-16, 1 file, -1/+1
  Increase default HZ for sparc64 to 1000.
* imp, 2005-01-06, 1 file, -1/+1
  /* -> /*- for copyright notices, minor format tweaks as necessary
* bms, 2004-11-30, 1 file, -2/+2
  Fix the build.