path: root/sys/kern/subr_mbuf.c
Commit message | Author | Age | Files | Lines
* PHCC[1]: | phk | 2003-03-10 | 1 | -2/+2
  I had commented the #ifdef INVARIANTS checks out to make sure I ran this code in all kernels and forgot to comment the #ifdefs back in before I committed.
  Spotted by: bmilekic
  [1] PHCC = Pointy Hat Correction Commit
* Make malloc and mbuf allocation mode flags nonoverlapping. | phk | 2003-03-10 | 1 | -0/+15
  Under INVARIANTS whine if we get incompatible flags.
  Submitted by: imp
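The idea of keeping the two flag sets disjoint and complaining about a mismatch can be sketched as follows. This is a minimal user-space illustration, not the kernel code: the EX_ names and bit values are hypothetical and chosen only so that the two sets share no bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values; the only property that matters is that the
 * malloc-style bits and the mbuf-style bits do not overlap. */
#define EX_MALLOC_NOWAIT 0x0001
#define EX_MALLOC_WAITOK 0x0002
#define EX_MBUF_DONTWAIT 0x0004
#define EX_MBUF_TRYWAIT  0x0008

#define EX_MALLOC_MASK (EX_MALLOC_NOWAIT | EX_MALLOC_WAITOK)
#define EX_MBUF_MASK   (EX_MBUF_DONTWAIT | EX_MBUF_TRYWAIT)

/* Compile-time proof that the two sets are disjoint. */
static_assert((EX_MALLOC_MASK & EX_MBUF_MASK) == 0,
    "malloc and mbuf mode flags must not overlap");

/* True when 'how' carries no malloc-style bits (valid for an mbuf call). */
static bool
ex_mbuf_how_ok(int how)
{
    return ((how & EX_MALLOC_MASK) == 0);
}

/* True when 'how' carries no mbuf-style bits (valid for a malloc call). */
static bool
ex_malloc_how_ok(int how)
{
    return ((how & EX_MBUF_MASK) == 0);
}
```

Because the sets are disjoint, a caller that passes a malloc-style flag to an mbuf routine (or the reverse) can be detected with a single mask test, which is the kind of check a debug-only build could afford to run on every call.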
* Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls | jhb | 2003-03-04 | 1 | -9/+2
  to WITNESS_WARN().
* o Allow "buckets" in mb_alloc to be differently sized (according to | bmilekic | 2003-02-20 | 1 | -73/+112
    compile-time constants). That is, a "bucket" now is not necessarily a page-worth of mbufs or clusters, but it is MBUF_BUCK_SZ, CLUS_BUCK_SZ worth of mbufs, clusters.
  o Rename {mbuf,clust}_limit to {mbuf,clust}_hiwm and introduce {mbuf,clust}_lowm, which currently has no effect but will be used to set the low watermarks.
  o Fix netstat so that it can deal with the differently-sized buckets and teach it about the low watermarks too.
  o Make sure the per-cpu stats for an absent CPU have mb_active set to 0, explicitly.
  o Get rid of the allocate refcounts from mbuf map mess. Instead, just malloc() the refcounts in one shot from mbuf_init().
  o Clean up / update comments in subr_mbuf.c
* Fix a serious bug when computing the index for the | bmilekic | 2003-02-20 | 1 | -1/+1
  reference counter array for mbuf clusters. I don't know how this got past early testing nor how it survived so long without getting caught. If anyone was seeing really really bizarre memory corruption in a few mbufs this would be why.
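For readers unfamiliar with the scheme, one common way to map a cluster to its slot in a reference counter array is sketched below. This is an assumption about the general technique, not the kernel's actual macro: the function name, the EX_MCLBYTES value, and the address arguments are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MCLBYTES 2048u   /* assumed cluster size in bytes */

/*
 * Map any address inside a cluster to that cluster's index in a
 * contiguous reference counter array. 'clust_base' is the start of
 * the (hypothetical) contiguous cluster region.
 */
static size_t
ex_clust_refcnt_index(uintptr_t clust_base, uintptr_t clust_addr)
{
    assert(clust_addr >= clust_base);
    return ((size_t)((clust_addr - clust_base) / EX_MCLBYTES));
}
```

An off-by-a-factor or wrong-base error in a calculation of this kind makes two clusters share one counter, which would produce exactly the sort of sporadic corruption the commit message describes.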
* Back out M_* changes, per decision of the TRB. | imp | 2003-02-19 | 1 | -22/+22
  Approved by: trb
* Make m_getm() always return the top of the newly allocated chain, as | bmilekic | 2003-02-14 | 1 | -4/+2
  opposed to returning the top of the old chain when there was one and the top of the newly allocated chain if there was no old chain.
  Actually, it should be noted that prior to this fix, although the comment above m_getm() advertised that m_getm() would return the top of the old chain (if an old chain was being passed in) it actually [wrongly] was returning the tail mbuf in the old chain instead. This is a bug but since the one use of m_getm() in the tree luckily did not depend on the behavior, it happened to work out without notice.
  Harti Brandt pointed out that the advertised behavior was actually not the real behavior and so this change makes m_getm() ALWAYS return the newly allocated chain (and fixes the comment). This is less confusing and is the best course of action as then the caller is always able to have both a reference to the top of the original chain (because it's passing it in in the call) and a reference to the newly attached chain.
  Although the API is slightly modified, I don't think that any third-party code uses m_getm() and if it does, it surely can't be working properly because the old behavior was bogus.
  API bug pointed out by: Harti Brandt <brandt@fokus.fraunhofer.de>
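The return-value contract described above can be illustrated with a small user-space model. It is a sketch only: struct ex_mbuf and ex_getm() are hypothetical stand-ins, allocation uses calloc(), and the real m_getm() takes different arguments.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf: only the chain link is modeled. */
struct ex_mbuf {
    struct ex_mbuf *m_next;
};

/*
 * Allocate 'count' buffers as a chain. If 'old' is non-NULL, append
 * the new chain to the tail of 'old'. In both cases return the top of
 * the NEWLY allocated chain, or NULL on allocation failure.
 */
static struct ex_mbuf *
ex_getm(struct ex_mbuf *old, int count)
{
    struct ex_mbuf *top = NULL, *tail = NULL, *m;
    int i;

    assert(count > 0);
    for (i = 0; i < count; i++) {
        m = calloc(1, sizeof(*m));
        if (m == NULL) {
            /* Undo the partial allocation; 'old' is left untouched. */
            while (top != NULL) {
                m = top->m_next;
                free(top);
                top = m;
            }
            return (NULL);
        }
        if (top == NULL)
            top = m;
        else
            tail->m_next = m;
        tail = m;
    }
    if (old != NULL) {
        while (old->m_next != NULL)
            old = old->m_next;
        old->m_next = top;      /* attach new chain after the old tail */
    }
    return (top);
}
```

With this contract the caller keeps its own pointer to the head of the original chain and receives a pointer to the first new buffer, so neither reference has to be rediscovered by walking the chain.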
* Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. | alfred | 2003-01-21 | 1 | -22/+22
  Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
* o Introduce a new external mbuf type, EXT_EXTREF. | gallatin | 2003-01-02 | 1 | -3/+8
  o Allow callers of m_extadd() to allocate their own reference m_ext.ref_cnt pointer, rather than having the mbuf system allocate it with a malloc() in the critical path. This speeds m_extadd() up, and also simplifies locking (malloc() may need Giant).
  A driver or subsystem wishing to use its own ref counter must initialize m_ext.ref_cnt to point to its ref counter prior to calling m_extadd(), and it must use EXT_EXTREF as its external type. Eg:
      m->m_ext.ref_cnt = my_ref_cnt_ptr;
      m_extadd(.....,EXT_EXTREF);
  Reviewed by: bosko
* o Initialise each mbuf's m_len to 0 in m_getm(); mb_put_mem() depends | tjr | 2002-11-27 | 1 | -1/+3
    on this.
  o Update the `cur' pointer in the cluster loop in m_getm() to avoid incorrect truncation and leaked mbufs.
  Reviewed by: bmilekic
  Approved by: re
* Fix a fairly subtle bug in mbuf_init() where the reference counter | bmilekic | 2002-10-16 | 1 | -1/+1
  contiguous space was being allocated from the clust_map instead of the mbuf_map as the comments indicated. This resulted in some address space wastage in mbuf_map.
  Submitted by: Rohit Jalan <rohjal@yahoo.co.in>
* Replace aux mbufs with packet tags: | sam | 2002-10-16 | 1 | -14/+7
  o instead of a list of mbufs use a list of m_tag structures a la openbsd
  o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit ABI/module number cookie
  o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and use this in defining openbsd-compatible m_tag_find and m_tag_get routines
  o rewrite KAME use of aux mbufs in terms of packet tags
  o eliminate the most heavily used aux mbufs by adding an additional struct inpcb parameter to ip_output and ip6_output to allow the IPsec code to locate the security policy to apply to outbound packets
  o bump __FreeBSD_version so code can be conditionalized
  o fixup ipfilter's call to ip_output based on __FreeBSD_version
  Reviewed by: julian, luigi (silent), -arch, -net, darren
  Approved by: julian, silence from everyone else
  Obtained from: openbsd (mostly)
  MFC after: 1 month
* Be consistent about "static" functions: if the function is marked | phk | 2002-09-28 | 1 | -1/+1
  static in its prototype, mark it static at the definition too.
  Inspired by: FlexeLint warning #512
* Make m_flags an int instead of a short, this is consistent with the | bmilekic | 2002-08-15 | 1 | -1/+1
  type of the 'flags' argument m_getcl() was using anyway; m_extadd() needed to be changed to accept an int instead of a short for 'flags.' This makes things more consistent and also gives us more bits to use for m_flags in the future (we have almost run out).
  Requested by: sam (Sam Leffler)
* Only my brain can fart while fixing a previous brain fart. | bmilekic | 2002-08-08 | 1 | -2/+1
* YIKES, I take the pointy-hat for a really big braino here. I | bmilekic | 2002-08-08 | 1 | -4/+3
  apologize to those of you who may have been seeing crashes in code that uses sendfile(2) or other types of external buffers with mbufs.
  Pointed out by, and provided trace: Niels Chr. Bank-Pedersen <ncbp at bank-pedersen.dk>
* Correct a bug introduced in 1.26: M_PKTHDR is set in the 'flags' | rwatson | 2002-08-07 | 1 | -1/+1
  argument, not the 'type' argument. As a result of the bug, the MAC label on some packet header mbufs might not be set in mbufs allocated using m_getcl(), resulting in a page fault.
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, NAI Labs
* Move the MAC label init/destroy stuff to more appropriate places so that | bmilekic | 2002-08-01 | 1 | -10/+20
  the inits/destroys are done without the cache locks held even in the persistent-lock calls. I may be cheating a little by using the MAC "already initialized" flag for now.
* Introduce support for Mandatory Access Control and extensible | rwatson | 2002-07-31 | 1 | -1/+22
  kernel access control.
  Invoke the necessary MAC entry points to maintain labels on header mbufs. In particular, invoke entry points during the two mbuf header allocation cases, and the mbuf freeing case. Pass the "how" argument at allocation time to the MAC framework so that it can determine if it is permitted to block (as with policy modules), and permit the initialization entry point to fail if it needs to allocate memory but is not permitted to, failing the mbuf allocation.
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, NAI Labs
* Make reference counting for mbuf clusters [only] work like in RELENG_4. | bmilekic | 2002-07-30 | 1 | -49/+37
  While I don't think this is the best solution, it certainly is the fastest and in trying to find bottlenecks in network related code I want this out of the way, so that I don't have to think about it. What this means, for mbuf clusters anyway is:
  - one less malloc() to do for every cluster allocation (replaced with a relatively quick calculation + assignment)
  - no more free() in the cluster free case (replaced with empty space) :-)
  This can offer a substantial throughput improvement, but it may not for all cases. Particularly noticeable for larger buffer sends/recvs. See http://people.freebsd.org/~bmilekic/code/measure2.txt for a rough idea.
* Move m_freem() from uipc_mbuf.c to subr_mbuf.c so it can take advantage | bmilekic | 2002-07-24 | 1 | -0/+48
  of the inlines, like its cousin, m_free(). Also, make a small (first step?) optimisation of m_free() to use the MBP_PERSIST{,ENT} interface to hold the lock across frees when possible. The thing is that right now, we can only do this easily for at most across one mbuf + one cluster free, as the comment mentions (it also explains why). Anyway, some basic tests revealed a 5-10% overall improvement. Some of the results can be found here: http://people.freebsd.org/~bmilekic/code/measure.txt
* Introduce mb_free() to the MBP_PERSIST{,ENT} interface. What this means | bmilekic | 2002-07-23 | 1 | -17/+70
  is that grouped frees will be done as often as possible without dropping the cache lock in between. So, for the most part, they'll be done without the lock being dropped. This is particularly true if you have something that does a grouped m_getm() or m_getcl() (a cluster and mbuf at the same time) - most likely getting the buffers from the same per-CPU cache - and then frees them with m_free{,m}(). Unless the buffers' underlying buckets were moved, the free will be done without the lock getting dropped in between.
  So far, only m_free() has been shown how to do this, and m_freem() will shortly follow. Since I'm here, I also fixed a small (but mostly harmless) type-mismatch introduced in the last commit.
* o Introduce new m_getcl() interface routine that allocates an mbuf | bmilekic | 2002-07-15 | 1 | -74/+395
    and a cluster in one shot.
  o Introduce MBP_PERSIST and MBP_PERSISTENT control bits to mb_alloc(); MBP_PERSIST means "if you can allocate, then keep the cache lock held on exit," and MBP_PERSISTENT means "a cache lock is already held on entry, so allocate from the specified (already locked) cache." They may be used in combination.
  o m_getcl() uses the MBP_PERSIST/MBP_PERSISTENT interface so that it doesn't drop the cache lock in between the mbuf and cluster allocations.
  o m_getm(), which takes a size and allocates an mbuf + cluster "best fit" chain, has been moved from uipc_mbuf.c to subr_mbuf.c and shown how to use MBP_PERSIST/MBP_PERSISTENT to attempt to do a grouped allocation without dropping the cache lock in between.
  Why this is good: much less bus-locked lock acquires/drops when they're not needed.
  Also, prototype for m_getcl():
      struct mbuf *
      m_getcl(int how, short type, int flags);
  "how" and "type" are self-explanatory. "flags" may be M_PKTHDR, in which case m_getcl() will make the mbuf a pkthdr-mbuf.
  While I'm in subr_mbuf.c:
  o Every exported routine now has a nice comment with a description of the expected arguments. Eventually, mbuf(9) needs to be re-vamped but there's still more code to write/finalize before I get to that.
  o internal macros have been changed a bit.
  o consistently use 'short' for "type." This somehow slipped through before (that 'type' was sometimes declared as int).
  Alfred has been pushing for the MBP_PERSIST{,ENT} thing for almost a year now. Luigi asked for m_getcl(), and will probably MFC that part of this commit.
  TODO [Related]: teach mb_free() about MBP_PERSIST{, ENT}.
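The two control bits can be modeled in a few lines of user-space C. This is a sketch under stated assumptions, not the mb_alloc() implementation: the EX_ names, struct ex_cache, and the boolean "lock" are hypothetical, and dropping the lock on a failed allocation is inferred from the "if you can allocate" wording rather than confirmed.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_PERSIST    0x1   /* on success, leave the cache lock held */
#define EX_PERSISTENT 0x2   /* caller already holds the cache lock */

/* Hypothetical per-CPU cache: a free count plus a modeled lock. */
struct ex_cache {
    bool locked;
    int  nfree;
    int  lock_ops;          /* counts acquires and releases */
};

static bool
ex_alloc(struct ex_cache *c, int flags)
{
    if (flags & EX_PERSISTENT) {
        assert(c->locked);          /* must be held on entry */
    } else {
        assert(!c->locked);
        c->locked = true;           /* acquire */
        c->lock_ops++;
    }
    if (c->nfree == 0) {
        c->locked = false;          /* assumed: always drop on failure */
        c->lock_ops++;
        return (false);
    }
    c->nfree--;
    if (!(flags & EX_PERSIST)) {
        c->locked = false;          /* release */
        c->lock_ops++;
    }
    return (true);
}
```

A grouped allocation would call ex_alloc(c, EX_PERSIST) for the first object and ex_alloc(c, EX_PERSISTENT) for the second, costing one acquire and one release in total instead of two of each, which is the saving the commit message attributes to m_getcl().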
* m_extadd takes a void (*freef)(void *, void *) now, not a | alfred | 2002-06-29 | 1 | -1/+1
  void (*freef)(caddr_t, void *).
* Set system_map for both mbuf_map and clust_map to 1, in mbuf_init(). | bmilekic | 2002-06-13 | 1 | -2/+4
  Submitted by: Tor Egge (tegge)
  Pointed out to me by: hsu
* Separate "seperate" from kernel source. | eric | 2002-05-16 | 1 | -1/+1
* Remove a printf(3) argument with no corresponding format specifier. | des | 2002-05-14 | 1 | -1/+1
* Change the mbuf exhaustion warning message to match the message | silby | 2002-05-09 | 1 | -1/+2
  in -stable.
* Change callers of mtx_init() to pass in an appropriate lock type name. In | jhb | 2002-04-04 | 1 | -2/+2
  most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.
  Tested on: i386, alpha, sparc64
* Fix bug in mb_alloc that made systems configured with | bmilekic | 2002-03-03 | 1 | -3/+1
  PAGE_SIZE / MCLBYTES == 1 crash. Fix them by changing the appropriate "allocate new page and bucket" code in mb_alloc to use the macro for properly grabbing an allocated object from a bucket, the one that checks whether the bucket is empty. This should allow ken to continue testing zero-copy stuff on -CURRENT.
  Noticed and provided debug info: ken
* On the first day of Christmas bde gave to me: | bmilekic | 2001-12-23 | 1 | -143/+132
  A [hopefully] conforming style(9) revamp of mb_alloc and related code. (This was possible due to bde's remarkable patience.)
  Submitted by: (in large part) bde
  Reviewed by: (the other part) bde
* Move prototype of _mext_free to mbuf.h, where it belongs, because it is | bmilekic | 2001-12-22 | 1 | -1/+0
  used in MEXTFREE and needs to be in scope for external MEXTFREE users.
  Pointed out by: Chad David <davidc@acns.ab.ca>
  Confirmed by: bde
* vm/vm_kern.c: rate limit (to once per second) diagnostic printf when | luigi | 2001-12-01 | 1 | -1/+14
  you run out of mbuf address space.
  kern/subr_mbuf.c: print a warning message when mb_alloc fails, again rate-limited to at most once per second. This covers other cases of mbuf allocation failures. Probably it also overlaps the one handled in vm/vm_kern.c, so maybe the latter should go away.
  This warning will let us gradually remove the printfs that are scattered across most network drivers to report mbuf allocation failures. Those are potentially dangerous, in that they are not rate-limited and can easily cause systems to panic.
  Unless there is disagreement (which does not seem to be the case judging from the discussion on -net so far), and because this is sort of a safety bugfix, I plan to commit a similar change to STABLE during the weekend (it affects kern/uipc_mbuf.c there).
  Discussed-with: jlemon, silby and -net
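A once-per-second limiter of the kind described can be sketched as below. This is an illustrative user-space version, not the code that was committed: struct ex_ratelimit and ex_warn_allowed() are hypothetical names, and the clock is passed in as a parameter so the behavior is easy to exercise.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical limiter state: when the last message was emitted. */
struct ex_ratelimit {
    time_t last;
    bool   used;    /* false until the first message has been emitted */
};

/*
 * Return true if a warning may be printed at time 'now' (in whole
 * seconds), and record it. A second request within the same second
 * is refused.
 */
static bool
ex_warn_allowed(struct ex_ratelimit *rl, time_t now)
{
    assert(rl != NULL);
    if (rl->used && now == rl->last)
        return (false);
    rl->last = now;
    rl->used = true;
    return (true);
}
```

A caller would wrap its diagnostic as `if (ex_warn_allowed(&rl, time(NULL))) fprintf(stderr, ...)`, so that a burst of allocation failures yields one line per second rather than one line per failure.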
* Context: | bmilekic | 2001-11-25 | 1 | -1/+2
  For an object type, we maintain a variable mb_mapfull. It is 0 by default and is only raised to 1 in one place: when an mb_pop_cont() fails for the first time, on the assumption that the reason for the failure is due to the underlying map for the object (e.g. clust_map, mbuf_map) being exhausted.
  Problem and Changes:
  Change how we define "mb_mapfull." It now means: "set to 1 when the first mb_pop_cont() fails only in the kmem_malloc()-ing of the object, and only if the call was with the M_TRYWAIT flag." This is a more conservative definition and should avoid odd [but theoretically possible] situations from occurring. i.e. we had set mb_mapfull to 1 thinking the map for the object was actually exhausted when we _actually_ failed in malloc()ing the space for the bucket structure managing the objects in the page we're allocating.
* Re-enable mbtypes statistics in the mbuf allocator. I disabled these | bmilekic | 2001-09-30 | 1 | -13/+71
  when I changed the allocator bits. This implements per-CPU mbtypes stats by keeping net number of decrements/increments of a given mbtype per-CPU and then summing all of the per-CPU mbtypes to produce the total net number of allocated mbufs of the given mbtype. Counters are carefully balanced to avoid/prevent underflows/overflows.
  mbtypes stats are re-enabled with the idea that we may occasionally (although very rarely) observe slight inconsistencies in the stat reporting. Most of the time, we should be fine, though.
  Also make appropriate modifications to netstat(1) and systat(1) to do the necessary reporting.
  Submitted by: Jiangyi Liu <jyliu@163.net>
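The per-CPU net counting described above can be sketched in user-space C. The names and sizes here (EX_NCPU, EX_NTYPES, struct ex_pcpu_stats) are hypothetical, and locking is omitted; in the scheme the message describes, each CPU's counters would be updated under that CPU's own lock.

```c
#include <assert.h>

#define EX_NCPU   4     /* assumed number of per-CPU stat areas */
#define EX_NTYPES 3     /* assumed number of mbuf types */

/* Net (allocations minus frees) count per type, as seen by one CPU.
 * Signed, because a CPU may free buffers that another CPU allocated. */
struct ex_pcpu_stats {
    long mbtypes[EX_NTYPES];
};

/* Record an allocation (+n) or a free (-n) of 'type' on one CPU. */
static void
ex_mbtype_adjust(struct ex_pcpu_stats *cpu, int type, long delta)
{
    assert(type >= 0 && type < EX_NTYPES);
    cpu->mbtypes[type] += delta;
}

/* Total outstanding buffers of 'type': the sum of all per-CPU nets. */
static long
ex_mbtype_total(const struct ex_pcpu_stats stats[], int ncpu, int type)
{
    long total = 0;
    int i;

    assert(type >= 0 && type < EX_NTYPES);
    for (i = 0; i < ncpu; i++)
        total += stats[i].mbtypes[type];
    return (total);
}
```

An individual CPU's counter can legitimately go negative (for example, two buffers allocated on CPU 0 and one of them freed on CPU 1); only the sum is meaningful, and a reader that sums without stopping all CPUs may briefly see the slight inconsistencies the message mentions.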
* KSE Milestone 2 | julian | 2001-09-12 | 1 | -1/+1
  Note ALL MODULES MUST BE RECOMPILED
  make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process.
  Sorry john! (your next MFC will be a doozy!)
  Reviewed by: peter@freebsd.org, dillon@freebsd.org
  X-MFC after: ha ha ha ha
* Rename mb_init() mbuf subsystem initialization routine to mbuf_init(), in | bmilekic | 2001-08-03 | 1 | -4/+4
  order to avoid namespace collision with subr_mchain.c's mb_init(). This wasn't "fatal" as the mbuf initialization routine mb_init() was local to subr_mbuf.c which in turn didn't pull in subr_mchain.c's mb_init() declaration, but it should definitely be changed now before it creates a headache.
* Move CPU_ABSENT() macro to smp.h, where it belongs anyway. It will be | bmilekic | 2001-08-01 | 1 | -15/+4
  defined to 0 in the non-SMP case, which very much makes sense as it permits its usage in per-CPU initialization loops (for an example, check out subr_mbuf.c). Further, on a UP system, make mb_alloc always use the first per-CPU container, regardless of cpuid (i.e. remove reliance on cpuid in the UP case).
  Requested by: alfred
* Use the tunable maxusers rather than the compile-time one. Evaluate and | peter | 2001-07-26 | 1 | -12/+18
  initialize in the right order to make derivative settings work right. eg: at compile time, nmbufs was double nmbclusters. For POLA this should work the same at runtime.
* - Do not handle the per-CPU containers in mbuf code as though the cpuids | bmilekic | 2001-07-26 | 1 | -6/+23
    were indices in a dense array. The cpuids are a sparse set and treat them as such, setting up containers only for CPUs activated during mb_init().
  - Fix netstat(1) and systat(1) to treat the per-CPU stats area as a sparse map, in accordance with the above.
  This allows us to properly boot with certain CPUs deactivated. However, if we later decide to re-activate said CPUs, we will barf until we decide to implement CPU spinon/spinoff callback hooks to allow for said CPUs' per-CPU containers to get configured on their activation.
  Reported by: mjacob
  Partially (sys/ diffs) Submitted by: mjacob
* Increase NMBCLUSTERS by 2x. | obrien | 2001-07-17 | 1 | -1/+1
  This takes a GENERIC kernel (MAXUSERS=32) from 1536 to 3072.
* Temporary fix at least- define NCPU_PRESENT which will be mp_ncpus for | mjacob | 2001-06-22 | 1 | -2/+11
  SMP kernels, one (1) for non-SMP.
* Introduce numerous SMP friendly changes to the mbuf allocator. Namely, | bmilekic | 2001-06-22 | 1 | -0/+1029
  introduce a modified allocation mechanism for mbufs and mbuf clusters; one which can scale under SMP and which offers the possibility of resource reclamation to be implemented in the future.
  Notable advantages:
  o Reduce contention for SMP by offering per-CPU pools and locks.
  o Better use of data cache due to per-CPU pools.
  o Much less code cache pollution due to excessively large allocation macros.
  o Framework for `grouping' objects from same page together so as to be able to possibly free wired-down pages back to the system if they are no longer needed by the network stacks.
  Additional things changed with this addition:
  - Moved some mbuf specific declarations and initializations from sys/conf/param.c into mbuf-specific code where they belong.
  - m_getclr() has been renamed to m_get_clrd() because the old name is really confusing. m_getclr() HAS been preserved though and is defined to the new name. No tree sweep has been done "to change the interface," as the old name will continue to be supported and is not deprecated. The change was merely done because m_getclr() sounds too much like "m_get a cluster."
  - TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and systat(1) (see TODO below).
  - Fixed systat(1) to display number of "free mbufs" based on new per-CPU stat structures.
  - Fixed netstat(1) to display new per-CPU stats based on sysctl-exported per-CPU stat structures. All infos are fetched via sysctl.
  TODO (in order of priority):
  - Re-enable mbtypes statistics in both netstat(1) and systat(1) after introducing an SMP friendly way to collect the mbtypes stats under the already introduced per-CPU locks (i.e. hopefully don't use atomic() - it seems too costly for a mere stat update, especially when other locks are already present).
  - Optionally have systat(1) display not only "total free mbufs" but also "total free mbufs per CPU pool."
  - Fix minor length-fetching issues in netstat(1) related to recently re-enabled option to read mbuf stats from a core file.
  - Move reference counters at least for mbuf clusters into an unused portion of the cluster itself, to save space and need to allocate a counter.
  - Look into introducing resource freeing possibly from a kproc.
  Reviewed by (in parts): jlemon, jake, silby, terry
  Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha)
  Preliminary performance measurements: jlemon (and me, obviously)
  URL: http://people.freebsd.org/~bmilekic/mb_alloc/