summaryrefslogtreecommitdiffstats
path: root/sys/kern/kern_malloc.c
Commit message (Collapse)AuthorAgeFilesLines
* Use vm_offset_t for kmembase and kmemlimit rather than char *, avoidingrwatson2007-06-271-4/+4
| | | | | | | | unnecessary casts, and making it possible to compile kern_malloc.c with strict aliasing. Submitted by: rdivacky Approved by: re (kensmith)
* Spell statistics more correctly in comments.rwatson2007-06-141-1/+1
|
* Revert VMCNT_* operations introduction.attilio2007-05-311-5/+4
| | | | | | | | Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)
* Remove #if 0'd check for 0-size allocations, which if enabled, calledrwatson2007-05-271-4/+0
| | | | kdb_enter().
* - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulatingjeff2007-05-181-4/+5
| | | | | | | | vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>
* Add support for specifying a minimal size for vm.kmem_size in the loader viasepotvin2007-04-211-0/+12
| | | | | | | | vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem size will be at least 256mb (for example) without forcing a particular value via vm.kmem_size. Approved by: njl (mentor) Reviewed by: alc
* Increase usefulness of "show malloc" by moving from displaying the basicrwatson2006-10-261-5/+11
| | | | | | | | | | | | counters of allocs/frees/use for each malloc type to calculating InUse, MemUse, and Requests as displayed by the userspace vmstat -m. This is more useful when debugging malloc(9)-related memory leaks, where the count of allocs/frees may not usefully reflect that current memory allocation (i.e., when highly variable size allocations occur with the same malloc type, such as with contigmalloc). MFC after: 3 days Limitations observed by: scottl
* Remove old kern.malloc sysctl, which generated a text representation ofrwatson2006-07-231-104/+0
| | | | | | | | the kernel malloc(9) state for vmstat -m. libmemstat is now used to generate a machine-readable version which is converged by vmstat -m into a human-readable version. Not for MFC.
* Expand comments for malloc(9) to better describe the design andrwatson2006-07-231-8/+44
| | | | statistics / memory types model.
* Fix bug in malloc_uninit():ps2006-03-031-1/+3
| | | | | | | | | | | Releasing items from the mt_zone can not be done by a simple uma_zfree() call since mt_zone is allocated with the UMA_ZONE_MALLOC flag. Use uma_zfree_arg instead and supply the slab. This bug caused panics in low memory situations on unloading kernel modules containing MALLOC_DEFINE(..) statements. Submitted by: ups
* Add buffer corruption protection (RedZone) for kernel's malloc(9).pjd2006-01-311-1/+22
| | | | | | | | It detects both: buffer underflows and buffer overflows bugs at runtime (on free(9) and realloc(9)) and prints backtraces from where memory was allocated and from where it was freed. Tested by: kris
* Improve memguard a bit:pjd2005-12-301-13/+17
| | | | | | | | | | | | | | | | | - Provide tunable vm.memguard.desc, so one can specify memory type without changing the code and recompiling the kernel. - Allow to use memguard for kernel modules by providing sysctl vm.memguard.desc, which can be changed to short description of memory type before module is loaded. - Move as much memguard code as possible to memguard.c. - Add sysctl node vm.memguard. and move memguard-specific sysctl there. - Add malloc_desc2type() function for finding memory type based on its short description (ks_shortdesc field). - Memory type can be changed (via vm.memguard.desc sysctl) only if it doesn't exist (will be loaded later) or when no memory is allocated yet. If there is allocated memory for the given memory type, return EBUSY. - Implement two ways of memory types comparsion and make safer/slower the default.
* In realloc(9), determine size of the original block based onpjd2005-12-281-1/+1
| | | | | | | | | | | | | | | | UMA_SLAB_MALLOC flag. In some circumstances (I observed it when I was doing a lot of reallocs) UMA_SLAB_MALLOC can be set even if us_keg != NULL. If this is the case we have wonderful, silent data corruption, because less data is copied to the newly allocated region than should be. I'm not sure when this bug was introduced, it could be there undetected for years now, as we don't have a lot of realloc(9) consumers and it was hard to reproduce it... ...but what I know for sure, is that I don't want to know who introduce the bug:) It took me two/three days to track it down (of course most of the time I was looking for the bug in my own code).
* Detect memory leaks when memory type is being destroyed.pjd2005-11-031-0/+21
| | | | | | This is very helpful for detecting kernel modules memory leaks on unload. Discussed and reviewed by: rwatson
* Change format string for u_int64_t to %ju from %llu, in order to use therwatson2005-10-201-1/+1
| | | | | | correct format string on 64-bit systems. Pointed out by: pjd
* Add a "show malloc" command to DDB, which prints out the current stats forrwatson2005-10-201-0/+27
| | | | | | | available kernel malloc types. Quite useful for post-mortem debugging of memory leaks without a dump device configured on a panicked box. MFC after: 2 weeks
* Long overdue, keep up with mbuf.h,v 1.148.ru2005-08-021-3/+2
|
* Fix the way how "InUse" column in 'vmstat -m' output works:pjd2005-07-271-3/+6
| | | | | | | | | | | | - increase number of allocations count only on successfull malloc(9), so it doesn't confuse people; - because we need to check if 'size > 0', hide 'mtsp->mts_memalloced += size;' under the check as well, as for size=0 it is of course a no-op; - avoid critical_enter()/critical_exit() in case of failure in malloc_type_allocated() as there will be nothing to do. OK'ed by: rwatson MFC after: 2 days
* Correct build on 64-bit: cast u_int64_t to (unsigned long long) beforerwatson2005-07-141-1/+1
| | | | | | | | printfing as (unsigned long long). 32-bit build on i386 didn't notice this. Whoops. Reported by: arved Tested by: sledge
* Introduce a new sysctl, kern.malloc_stats, which exports kernel mallocrwatson2005-07-141-5/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | statistics via a binary structure stream: - Add structure 'malloc_type_stream_header', which defines a stream version, definition of MAXCPUS used in the stream, and a number of malloc_type records in the stream. - Add structure 'malloc_type_header', which defines the name of the malloc type being reported on. - When the sysctl is queried, return a stream header, followed by a series of type descriptions, each consisting of a type header followed by a series of MAXCPUS malloc_type_stats structures holding per-CPU allocation information. Typical values of MAXCPUS will be 1 (UP compiled kernel) and 16 (SMP compiled kernel). This query mechanism allows user space monitoring tools to extract memory allocation statistics in a machine-readable form, and to do so at a per-CPU granularity, allowing monitoring of allocation patterns across CPUs in order to better understand the distribution of work and memory flow over multiple CPUs. While here: - Bump statistics width to uint64_t, and hard code using fixed-width type in order to be more sure about structure layout in the stream. We allocate and free a lot of memory. - Add kmemcount, a counter of the number of registered malloc types, in order to avoid excessive manual counting of types. Export via a new sysctl to allow user-space code to better size buffers. - De-XXX comment on no longer maintaining the high watermark in old sysctl monitoring code. A follow-up commit of libmemstat(3), a library to monitor kernel memory allocation, will occur in the next few days. Likewise, similar changes to UMA.
* Remove a variable that became unused as a result of changes madekensmith2005-06-161-1/+0
| | | | | | | | in v1.139. This was only exposed if MALLOC_PROFILE was defined. Submitted by: Gary Jennejohn Pointy hat: rwatson Approved by: re (scottl)
* Fix typo.jkoshy2005-06-101-1/+1
| | | | Reviewed by: rwatson, sam
* Kernel malloc layers malloc_type allocation over one of two underlyingrwatson2005-05-291-130/+159
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | allocators: a set of power-of-two UMA zones for small allocations, and the VM page allocator for large allocations. In order to maintain unified statistics for specific malloc types, kernel malloc maintains a separate per-type statistics pool, which can be monitored using vmstat -m. Prior to this commit, each pool of per-type statistics was protected using a per-type mutex associated with the malloc type. This change modifies kernel malloc to maintain per-CPU statistics pools for each malloc type, and protects writing those statistics using critical sections. It also moves to unsynchronized reads of per-CPU statistics when generating coalesced statistics. To do this, several changes are implemented: - In the previous world order, the statistics memory was allocated by the owner of the malloc type structure, allocated statically using MALLOC_DEFINE(). This embedded the definition of the malloc_type structure into all kernel modules. Move to a model in which a pointer within struct malloc_type points at a UMA-allocated malloc_type_internal data structure owned and maintained by kern_malloc.c, and not part of the exported ABI/API to the rest of the kernel. For the purposes of easing a possible MFC, re-use an existing pointer in 'struct malloc_type', and maintain the current malloc_type structure size, as well as layout with respect to the fields reused outside of the malloc subsystem (such as ks_shortdesc). There are several unused fields as a result of no longer requiring the mutex in malloc_type. - Struct malloc_type_internal contains an array of malloc_type_stats, of size MAXCPU. The structure defined above avoids hard-coding a kernel compile-time value of MAXCPU into kernel modules that interact with malloc. - When accessing per-cpu statistics for a malloc type, surround read - modify - update requests with critical_enter()/critical_exit() in order to avoid races during write. The per-CPU fields are written only from the CPU that owns them. - Per-CPU stats now maintained "allocated" and "freed" counters for number of allocations/frees and bytes allocated/freed, since there is no longer a coherent global notion of the totals. When coalescing malloc stats, accept a slight race between reading stats across CPUs, and avoid showing the user a negative allocation count for the type in the event of a race. The global high watermark is no longer maintained for a malloc type, as there is no global notion of the number of allocations. - While tearing up the sysctl() path, also switch to using sbufs. The current "export as text" sysctl format is retained with the same syntax. We may want to change this in the future to export more per-CPU information, such as how allocations and frees are balanced across CPUs. This change results in a substantial speedup of kernel malloc and free paths on SMP, as critical sections (where usable) out-perform mutexes due to avoiding atomic/bus-locked operations. There is also a minor improvement on UP due to the slightly lower cost of critical sections there. The cost of the change to this approach is the loss of a continuous notion of total allocations that can be exploited to track per-type high watermarks, as well as increased complexity when monitoring statistics. Due to carefully avoiding changing the ABI, as well as hardening the ABI against future changes, it is not necessary to recompile kernel modules for this change. However, MFC'ing this change to RELENG_5 will require also MFC'ing optimizations for soft critical sections, which may modify exposed kernel ABIs. The internal malloc API is changed, and modifications to vmstat in order to restore "vmstat -m" on core dumps will follow shortly. Several improvements from: bde Statistics approach discussed with: ups Tested by: scottl, others
* Consistently style function declarations in kern_malloc.c.rwatson2005-04-121-23/+7
| | | | MFC after: 3 days
* Bring in MemGuard, a very simple and small replacement allocatorbmilekic2005-01-211-0/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | designed to help detect tamper-after-free scenarios, a problem more and more common and likely with multithreaded kernels where race conditions are more prevalent. Currently MemGuard can only take over malloc()/realloc()/free() for particular (a) malloc type(s) and the code brought in with this change manually instruments it to take over M_SUBPROC allocations as an example. If you are planning to use it, for now you must: 1) Put "options DEBUG_MEMGUARD" in your kernel config. 2) Edit src/sys/kern/kern_malloc.c manually, look for "XXX CHANGEME" and replace the M_SUBPROC comparison with the appropriate malloc type (this might require additional but small/simple code modification if, say, the malloc type is declared out of scope). 3) Build and install your kernel. Tune vm.memguard_divisor boot-time tunable which is used to scale how much of kmem_map you want to allott for MemGuard's use. The default is 10, so kmem_size/10. ToDo: 1) Bring in a memguard(9) man page. 2) Better instrumentation (e.g., boot-time) of MemGuard taking over malloc types. 3) Teach UMA about MemGuard to allow MemGuard to override zone allocations too. 4) Improve MemGuard if necessary. This work is partly based on some old patches from Ian Dowse.
* /* -> /*- for copyright notices, minor format tweaks as necessaryimp2005-01-061-1/+1
|
* Turn VM_KMEM_SIZE_MAX and VM_KMEM_SIZE_SCALE into tunables.des2004-09-291-4/+17
| | | | MFC after: 3 days
* Reimplement contigmalloc(9) with an algorithm which stands a greatly-green2004-07-191-27/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | improved chance of working despite pressure from running programs. Instead of trying to throw a bunch of pages out to swap and hope for the best, only a range that can potentially fulfill contigmalloc(9)'s request will have its contents paged out (potentially, not forcibly) at a time. The new contigmalloc operation still operates in three passes, but it could potentially be tuned to more or less. The first pass only looks at pages in the cache and free pages, so they would be thrown out without having to block. If this is not enough, the subsequent passes page out any unwired memory. To combat memory pressure refragmenting the section of memory being laundered, each page is removed from the systems' free memory queue once it has been freed so that blocking later doesn't cause the memory laundered so far to get reallocated. The page-out operations are now blocking, as it would make little sense to try to push out a page, then get its status immediately afterward to remove it from the available free pages queue, if it's unlikely to have been freed. Another change is that if KVA allocation fails, the allocated memory segment will be freed and not leaked. There is a sysctl/tunable, defaulting to on, which causes the old contigmalloc() algorithm to be used. Nonetheless, I have been using vm.old_contigmalloc=0 for over a month. It is safe to switch at run-time to see the difference it makes. A new interface has been used which does not require mapping the allocated pages into KVA: vm_page.h functions vm_page_alloc_contig() and vm_page_release_contig(). These are what vm.old_contigmalloc=0 uses internally, so the sysctl/tunable does not affect their operation. When using the contigmalloc(9) and contigfree(9) interfaces, memory is now tracked with malloc(9) stats. Several functions have been exported from kern_malloc.c to allow other subsystems to use these statistics, as well. This invalidates the BUGS section of the contigmalloc(9) manpage.
* Update for the KDB framework:marcel2004-07-101-2/+3
| | | | | | | | | | | | | | | | | | | | | | o Make debugging code conditional upon KDB instead of DDB. o Call kdb_enter() instead of Debugger(). o Call kdb_backtrace() instead of db_print_backtrace() or backtrace(). kern_mutex.c: o Replace checks for db_active with checks for kdb_active and make them unconditional. kern_shutdown.c: o s/DDB_UNATTENDED/KDB_UNATTENDED/g o s/DDB_TRACE/KDB_TRACE/g o Save the TID of the thread doing the kernel dump so the debugger knows which thread to select as the current when debugging the kernel core file. o Clear kdb_active instead of db_active and do so unconditionally. o Remove backtrace() implementation. kern_synch.c: o Call kdb_reenter() instead of db_error().
* Bring in mbuma to replace mballoc.bmilekic2004-05-311-17/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mbuma is an Mbuf & Cluster allocator built on top of a number of extensions to the UMA framework, all included herein. Extensions to UMA worth noting: - Better layering between slab <-> zone caches; introduce Keg structure which splits off slab cache away from the zone structure and allows multiple zones to be stacked on top of a single Keg (single type of slab cache); perhaps we should look into defining a subset API on top of the Keg for special use by malloc(9), for example. - UMA_ZONE_REFCNT zones can now be added, and reference counters automagically allocated for them within the end of the associated slab structures. uma_find_refcnt() does a kextract to fetch the slab struct reference from the underlying page, and lookup the corresponding refcnt. mbuma things worth noting: - integrates mbuf & cluster allocations with extended UMA and provides caches for commonly-allocated items; defines several zones (two primary, one secondary) and two kegs. - change up certain code paths that always used to do: m_get() + m_clget() to instead just use m_getcl() and try to take advantage of the newly defined secondary Packet zone. - netstat(1) and systat(1) quickly hacked up to do basic stat reporting but additional stats work needs to be done once some other details within UMA have been taken care of and it becomes clearer to how stats will work within the modified framework. From the user perspective, one implication is that the NMBCLUSTERS compile-time option is no longer used. The maximum number of clusters is still capped off according to maxusers, but it can be made unlimited by setting the kern.ipc.nmbclusters boot-time tunable to zero. Work should be done to write an appropriate sysctl handler allowing dynamic tuning of kern.ipc.nmbclusters at runtime. Additional things worth noting/known issues (READ): - One report of 'ips' (ServeRAID) driver acting really slow in conjunction with mbuma. Need more data. Latest report is that ips is equally sucking with and without mbuma. - Giant leak in NFS code sometimes occurs, can't reproduce but currently analyzing; brueffer is able to reproduce but THIS IS NOT an mbuma-specific problem and currently occurs even WITHOUT mbuma. - Issues in network locking: there is at least one code path in the rip code where one or more locks are acquired and we end up in m_prepend() with M_WAITOK, which causes WITNESS to whine from within UMA. Current temporary solution: force all UMA allocations to be M_NOWAIT from within UMA for now to avoid deadlocks unless WITNESS is defined and we can determine with certainty that we're not holding any locks when we're M_WAITOK. - I've seen at least one weird socketbuffer empty-but- mbuf-still-attached panic. I don't believe this to be related to mbuma but please keep your eyes open, turn on debugging, and capture crash dumps. This change removes more code than it adds. A paper is available detailing the change and considering various performance issues, it was presented at BSDCan2004: http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf Please read the paper for Future Work and implementation details, as well as credits. Testing and Debugging: rwatson, brueffer, Ketrien I. Saihr-Kesenchedra, ... Reviewed by: Lots of people (for different parts)
* Remove advertising clause from University of California Regent's license,imp2004-04-051-4/+0
| | | | | | per letter dated July 22, 1999. Approved by: core
* Rename the kern.vm.kmem.size tunable to the more logical vm.kmem_size. Todes2004-01-271-1/+7
| | | | | | | | | | | assure backward compatibility (conditional on !BURN_BRIDGES), look it up by its old name first, and log a warning (but accept the setting) if it was found. If both the old and new name are defined, the new name takes precedence. Also export vm.kmem_size as a read-only sysctl variable; I find it hard to tune a parameter when I don't know its default value, especially when that default value is computed at boot time.
* - Only use UMA to cache malloc requests up to PAGE_SIZE. Values larger thanjeff2003-09-191-1/+12
| | | | | this are requested very infrequently and waste memory when we cache spares.
* Revert stuff which accidentally ended up in the previous commit.phk2003-07-221-6/+3
|
* Don't attempt to inline large functions mb_alloc() and mb_free(),phk2003-07-221-3/+6
| | | | | | it more than doubles the text size of this file. GCC has wisely ignored us on this previously
* Add init_param3() to subr_param. This function is calledsilby2003-07-111-0/+5
| | | | | | | | immediately after the kernel map has been sized, and is the optimal place for the autosizing of memory allocations which occur within the kernel map to occur. Suggested by: bde
* Don't overflow when calculating vm_kmem_size. This fixes kmem_mapps2003-06-111-4/+4
| | | | | | | | too small panics on PAE machines which have odd > 4GB sizes (4.5 gig would render a 20MB of KVA for kmem_map instead of 200MB). Submitted by: John Cagle <john.cagle@hp.com>, jeff Reviewed by: jeff, peter, scottl, lots of USENIX folks
* Use __FBSDID().obrien2003-06-111-1/+3
|
* Don't pass NULL pointer to memset if we are compiled with DIAGNOSTICphk2003-05-121-4/+3
| | | | Approved by: re/rwatson
* Add two KASSERTS which trigger if free(9) would drag the "memuse" statisticphk2003-05-051-0/+6
| | | | | for a malloc bucket under zero. This typically happens if you malloc(9) from one bucket and free to another.
* Update the "last malloc failure timestamp" also for simulatedphk2003-04-251-0/+1
| | | | malloc errors.
* Permit debug.malloc.failure_rate to be specified using a tunable sorwatson2003-03-261-0/+1
| | | | | | | that the feature can be enabled during the boot process. Note the continued limitation that FreeBSD fails so rapidly with this setting enabled that it's hard to narrow down particular failures for correction; we really need per-malloc type failure rates.
* Add a new kernel option, MALLOC_MAKE_FAILURES, which compilesrwatson2003-03-261-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | in a debugging feature causing M_NOWAIT allocations to fail at a specified rate. This can be useful for detecting poor handling of M_NOWAIT: the most frequent problems I've bumped into are unconditional deference of the pointer even though it's NULL, and hangs as a result of a lost event where memory for the event couldn't be allocated. Two sysctls are added: debug.malloc.failure_rate How often to generate a failure: if set to 0 (default), this feature is disabled. Otherwise, the frequency of failures -- I've been using 10 (one in ten mallocs fails), but other popular settings might be much lower or much higher. debug.malloc.failure_count Number of times a coerced malloc failure has occurred as a result of this feature. Useful for tracking what might have happened and whether failures are being generated. Useful possible additions: tying failure rate to malloc type, printfs indicating the thread that experienced the coerced failure. Reviewed by: jeffr, jhb
* PHCC[1]:phk2003-03-101-2/+2
| | | | | | | | | | I had commented the #ifdef INVARIANTS checks out to make sure I ran this code in all kernels and forgot to comment the #ifdefs back in before I committed. Spotted by: bmilekic [1] PHCC = Pointy Hat Correction Commit
* Make malloc and mbuf allocation mode flags nonoverlapping.phk2003-03-101-1/+18
| | | | | | Under INVARIANTS whine if we get incompatible flags. Submitted by: imp
* o Allow "buckets" in mb_alloc to be differently sized (according tobmilekic2003-02-201-2/+1
| | | | | | | | | | | | | | | | compile-time constants). That is, a "bucket" now is not necessarily a page-worth of mbufs or clusters, but it is MBUF_BUCK_SZ, CLUS_BUCK_SZ worth of mbufs, clusters. o Rename {mbuf,clust}_limit to {mbuf,clust}_hiwm and introduce {mbuf,clust}_lowm, which currently has no effect but will be used to set the low watermarks. o Fix netstat so that it can deal with the differently-sized buckets and teach it about the low watermarks too. o Make sure the per-cpu stats for an absent CPU has mb_active set to 0, explicitly. o Get rid of the allocate refcounts from mbuf map mess. Instead, just malloc() the refcounts in one shot from mbuf_init() o Clean up / update comments in subr_mbuf.c
* Back out M_* changes, per decision of the TRB.imp2003-02-191-4/+4
| | | | Approved by: trb
* Under #ifdef DIAGNOSTIC, fill malloc(9) allocations which do not havephk2003-02-011-0/+8
| | | | M_ZERO specified with 0x70. (malloc_flags=J for the kernel :-)
* Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.alfred2003-01-211-4/+4
| | | | Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
* Introduce malloc_last_fail() which returns the number of seconds sincephk2002-11-011-0/+16
| | | | | | | | | malloc(9) failed last time. This is intended to help code adjust memory usage to the current circumstances. A typical use could be: if (malloc_last_fail() < 60) reduce_cache_by_one();
OpenPOWER on IntegriCloud