summaryrefslogtreecommitdiffstats
path: root/sys/kern/uipc_mbuf.c
Commit message (Collapse)AuthorAgeFilesLines
* Undo part of the tangle of having sys/lock.h and sys/mutex.h included inmarkm2001-05-011-2/+4
| | | | | | | | | | | other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
* Fix inconsistency in setup of kernel_map: we need to make sure thatbmilekic2001-04-181-4/+5
| | | | | | | | we also reserve _adequate_ space for the mb_map submap; i.e. we need space for nmbclusters, nmbufs, _and_ nmbcnt. Furthermore, we need to rounddown, and not roundup, so that we are consistent. Pointed out by: bde
* - Change the msleep()s to condition variables.bmilekic2001-04-031-21/+21
| | | | | | | | The mbuf and mcluster free lists now each "own" a condition variable, m_starved. - Clean up minor indentention issues in sys/mbuf.h caused by previous commit.
* Use only one mutex for the entire mbuf subsystem.alfred2001-04-031-44/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't use atomic operations for the stats updating, instead protect the counts with the mbuf mutex. Most twiddling of the stats was done right before or after releasing a mutex. By doing this we reduce the number of locked ops needed as well as allow a sysctl to gain a consitant view of the entire stats structure. In the future... This will allow us to chain common mbuf operations that would normally need to aquire/release 2 or 3 of the locks to build an mbuf with a cluster or external data attached into a single op requiring only one lock. Simplify the per-cpu locks that are planned. There's also some if (1) code that should check if the "how" operation specifies blocking/non-blocking behavior, we _could_ make it so that we hold onto the mutex through calls into kmem_alloc when non-blocking requests are made, but for safety reasons we currently drop and reaquire the mutex around the calls. Also, note that calling kmem_alloc is rare and only happens during a shortage so drop/re-getting the mutex will not be a common occurance. Remove some #define's that seemed to obfuscate the code to me. Remove an extranious comment. Remove an XXX, including mutex.h isn't a crime. Reviewed by: bmilekic
* Move the atomic() mbstat.m_drops incrementing to the MGET(HDR) andbmilekic2001-03-241-19/+5
| | | | | | | MCLGET macros in order to avoid incrementing the drop count twice. Otherwise, in some cases, we may increment m_drops once in m_mballoc() for example, and increment it again in m_mballoc_wait() if the wait fails.
* Fix a couple of things in the internal mbuf allocation interface:bmilekic2001-03-171-8/+8
| | | | | | | | | | | | | | | - Make sure that m_mballoc() really doesn't allow over nmbufs mbufs to be allocated from mb_map. In the case where nmbufs-reserved space is not an exact multiple of PAGE_SIZE (which it should be, but anyway...), we hold nmbufs as an absolute maximum which need not ever be reached. - Clean up m_clalloc(); make it more consistent in the sense that the first argument `ncl' really means "the number of clusters ensured to be allocated" and not "the number of pages worth of clusters to be allocated," as was previously the case. This also makes it consistent with m_mballoc() as well as the comment that preceeds it. Reviewed by: jlemon
* Fix parameter order in the calls to MGET().bp2001-02-211-2/+2
|
* Preserve alignment of first mbuf in m_copypacket.luigi2001-02-201-0/+4
| | | | | | | | This is useful when doing copies of packet where some leading space has been preallocated to insert protocol headers. Note that there are in fact almost no users of m_copypacket. MFC candidate.
* Implement m_getm() which will perform an "all or nothing" mbuf + clusterbmilekic2001-02-141-1/+67
| | | | | | | | | | | | | | | | | | | | | allocation, as required. If m_getm() receives NULL as a first argument, then it allocates `len' (second argument) bytes worth of mbufs + clusters and returns the chain only if it was able to allocate everything. If the first argument is non-NULL, then it should be an existing mbuf chain (e.g. pre-allocated mbuf sitting on a ring, on some list, etc.) and so it will allocate `len' bytes worth of clusters and mbufs, as needed, and append them to the tail of the passed in chain, only if it was able to allocate everything requested. If allocation fails, only what was allocated by the routine will be freed, and NULL will be returned. Also, get rid of existing m_getm() in netncp code and replace calls to it to calls to this new generic code. Heavily Reviewed by: bp
* Long awaited style fixup in mbuf code. Get rid of K&R style prototypingbmilekic2001-02-111-129/+87
| | | | | | and function argument declarations. Make sure that functions that are supposed to return a pointer return NULL in case of failure. Don't cast NULL. Finally, get rid of annoying `register' uses.
* Change and clean the mutex lock interface.bmilekic2001-02-091-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
* Don't bother with acquiring/releasing Giant around kmem_malloc() andjhb2001-02-081-40/+0
| | | | | | | | kmem_free() for now. Kmem_malloc() and kmem_free() now have appropriate assertions in place, and these checks aren't feasible until more of the networking code is locked down. Also, the extra assertions here should already be caught by the WITNESS code as lock order violations should mutex operations on Giant be reintroduced here later.
* When short of mbufs or mbuf clusters, we sleep on appropriate "counters."bmilekic2001-01-201-6/+5
| | | | | | | | | The counters are incremented when a thread goes to sleep and decremented either when a thread is woken up by another thread or when the sleep times out. There existed a race where the sleep count could be decremented twice resulting in an eventual underflow. Move the decrementing of the "counters" to the thread initiating the sleep and thus remedy the problem.
* Add some KASSERTs valid if WITNESS is defined to verify that the mbufbmilekic2001-01-161-4/+39
| | | | | | | | | | | | | | allocation routines are being called safely. Since we drop our relevant mbuf mutex and acquire Giant before we call kmem_malloc(), we have to make sure that this does not pave the way for a fatal lock order reversal. Check that either Giant is already held (in which case it's safe to grab it again and recurse on it) or, if Giant is not held, that no other locks are held before we try to acquire Giant. Similarily, add a KASSERT valid in the WITNESS case in m_reclaim() to nail callers who end up in m_reclaim() and hold a lock. Pointed out by: jhb
* In m_mballoc_wait(), drop the mmbfree mutex lock prior to callingbmilekic2001-01-091-12/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | m_reclaim() and re-acquire it when m_reclaim() returns. This means that we now call the drain routines without holding the mutex lock and recursing into it. This was done for mainly two reasons: (i) Avoid the long recursion; long recursions are typically bad and this is the case here because we block all other code from freeing mbufs if they need to. Doing that is kind of counter-productive, since we're really hoping that someone will free. (ii) More importantly, avoid a potential lock order reversal. Right now, not all the locks have been added to our networking code; but without this change, we're introducing the possibility for deadlock. Consider for example ip_drain(). We will likely eventually introduce a lock for ipq there, and so ip_freef() will be called with ipq lock held. But, ip_freef() calls m_freem() which in turn acquires the mmbfree lock. Since we were previously calling ip_drain() with mmbfree held, our lock order would be: mmbfree->ipq->mmbfree. Some other code may very well lock ipq first and then call ip_freef(). This would result in the regular lock order, ipq->mmbfree. Clearly, we have deadlock if one thread acquires the ipq lock and sits waiting for mmbfree while another thread calling m_reclaim() acquires mmbfree and sits waiting for the ipq lock. Also, make sure to add a comment above m_reclaim()'s definition briefly explaining this. Also document this above the call to m_reclaim() in m_mballoc_wait(). Suggested and reviewed by: alfred
* * Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.bmilekic2000-12-211-9/+9
| | | | | | | | | | | | | | | | | | This is because calls with M_WAIT (now M_TRYWAIT) may not wait forever when nothing is available for allocation, and may end up returning NULL. Hopefully we now communicate more of the right thing to developers and make it very clear that it's necessary to check whether calls with M_(TRY)WAIT also resulted in a failed allocation. M_TRYWAIT basically means "try harder, block if necessary, but don't necessarily wait forever." The time spent blocking is tunable with the kern.ipc.mbuf_wait sysctl. M_WAIT is now deprecated but still defined for the next little while. * Fix a typo in a comment in mbuf.h * Fix some code that was actually passing the mbuf subsystem's M_WAIT to malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the value of the M_WAIT flag, this could have became a big problem.
* Catch up to moving headers:jhb2000-10-201-1/+1
| | | | | - machine/ipl.h -> sys/ipl.h - machine/mutex.h -> sys/mutex.h
* Add nmbcnt sysctl and make it tunable at boottime; nmbcnt is thebmilekic2000-10-151-1/+5
| | | | | | | | | | | | | number of ext_buf counters that are possibly allocatable. Do this because: (i) It will make it easier to influence EXT_COUNTERS for if_sk, if_ti (or similar) users where the driver allocates its own ext_bufs and where it is important for the mbuf system to take it into account when reserving necessary space for counters. (ii) Facilitate some percentile calculation for netstat(1)
* Big mbuf subsystem diff #1: incorporate mutexes and fix things up somewhatbmilekic2000-09-301-254/+207
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to accomodate the changes. Here's a list of things that have changed (I may have left out a few); for a relatively complete list, see http://people.freebsd.org/~bmilekic/mtx_journal * Remove old (once useful) mcluster code for MCLBYTES > PAGE_SIZE which nobody uses anymore. It was great while it lasted, but now we're moving onto bigger and better things (Approved by: wollman). * Practically re-wrote the allocation macros in sys/sys/mbuf.h to accomodate new allocations which grab the necessary lock. * Make sure that necessary mbstat variables are manipulated with corresponding atomic() routines. * Changed the "wait" routines, cleaned it up, made one routine that does the job. * Generalized MWAKEUP() macro. Got rid of m_retry and m_retryhdr, as they are now included in the generalized "wait" routines. * Sleep routines now use msleep(). * Free lists have locks. * etc... probably other stuff I'm missing... Things to look out for and work on later: * find a better way to (dynamically) adjust EXT_COUNTERS * move necessity to recurse on a lock from drain routines by providing lock-free lower-level version of MFREE() (and possibly m_free()?). * checkout include of mutex.h in sys/sys/mbuf.h - probably violating general philosophy here. The code has been reviewed quite a bit, but problems may arise... please, don't panic! Send me Emails: bmilekic@freebsd.org Reviewed by: jlemon, cp, alfred, others?
* m_mballoc_wait() had a spl/tsleep race. mbufs can be freed in interruptpeter2000-08-251-0/+2
| | | | context, which can cause a wakeup.. which can race with this.
* Replace the mbuf external reference counting code with somethingdwmalone2000-08-191-27/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | that should be better. The old code counted references to mbuf clusters by using the offset of the cluster from the start of memory allocated for mbufs and clusters as an index into an array of chars, which did the reference counting. If the external storage was not a cluster then reference counting had to be done by the code using that external storage. NetBSD's system of linked lists of mbufs was cosidered, but Alfred felt it would have locking issues when the kernel was made more SMP friendly. The system implimented uses a pool of unions to track external storage. The union contains an int for counting the references and a pointer for forming a free list. The reference counts are incremented and decremented atomically and so should be SMP friendly. This system can track reference counts for any sort of external storage. Access to the reference counting stuff is now through macros defined in mbuf.h, so it should be easier to make changes to the system in the future. The possibility of storing the reference count in one of the referencing mbufs was considered, but was rejected 'cos it would often leave extra mbufs allocated. Storing the reference count in the cluster was also considered, but because the external storage may not be a cluster this isn't an option. The size of the pool of reference counters is available in the stats provided by "netstat -m". PR: 19866 Submitted by: Bosko Milekic <bmilekic@dsuper.net> Reviewed by: alfred (glanced at by others on -net)
* mbstat should be a read-only sysctl.alfred2000-07-311-1/+1
| | | | Submitted by: Bosko Milekic <bmilekic@dsuper.net>
* Make mbstat.m_mtypes seperate and viewable via sysctl, alsoalfred2000-07-151-0/+4
| | | | | | | expand the size from short to ulong Submitted by: Ian Dowse <iedowse@maths.tcd.ie> PR: kern/19809
* sync with kame tree as of july00. tons of bug fixes/improvements.itojun2000-07-041-0/+9
| | | | | | | API changes: - additional IPv6 ioctls - IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8). (also syntax change)
* Actively limit the allocation of mbufs to NMBUFS/nmbufs and mbuf clustersmsmith1999-12-281-2/+22
| | | | | | | | to NMBCLUSTERS/nmbclusters/kern.ipc.nmbclusters. Add a read-only sysctl kern.ipc.nmbufs matching kern.ipc.nmbclusters. Submitted by: Bosko Milekic <bmilekic@dsuper.net>
* Make m_print const correct (avoids a warning)eivind1999-12-201-1/+1
|
* Woops, I'm so sorry I forgot this! From the last mbuf.h change:green1999-12-181-2/+2
| | | | | | | m_mballoc_wakeup() (inline) -> MMBWAKEUP() (macro) m_clalloc_wakeup() (inline) -> MCLWAKEUP() (macro) Noticed by: peter
* Bug fix:green1999-12-141-0/+2
| | | | | | | | | | | The variables "m_mclalloc_wid" and "m_mballoc_wid" were not in the proper place. They should have been in uipc_mbuf.c and have been global, not in mbuf.h and local per each file that uses mbuf.h. Sorta bug fix: In mbuf.h, the definitions of various things for KERNEL and not KERNEL cases were very screwy. This fixes all of that which I could find.
* This is Bosko Milekic's mbuf allocation waiting code. Basically, thisgreen1999-12-121-22/+138
| | | | | | | | means that running out of mbuf space isn't a panic anymore, and code which runs out of network memory will sleep to wait for it. Submitted by: Bosko Milekic <bmilekic@dsuper.net> Reviewed by: green, wollman
* The functions m_copym() and m_copypacket() return read-only copies,archie1999-12-011-0/+78
| | | | | | | | | | | | because in the case of mbuf clusters they only increment the reference count rather than actually copying the data. Add comments to this effect, and add a new routine called m_dup() that returns a real, writable copy of an mbuf chain. This is preliminary work required for implementing 'ipfw tee'. Reviewed by: julian
* Fix a warning.peter1999-11-181-1/+1
|
* New function:phk1999-11-011-0/+16
| | | | | m_print(struct mbuf *); hexdumps a mbuf.
* change identical and "programming error" panic("mcopy*")'s intoalfred1999-10-131-12/+9
| | | | | | more verbose messages using KASSERT. Reviewed by: eivind, des
* $Id$ -> $FreeBSD$peter1999-08-281-1/+1
|
* Move the initialisation/tuning of nmbclusters from param.c/machdep.cmsmith1999-07-051-1/+11
| | | | | | | | | | | | | | | into uipc_mbuf.c. This reduces three sets of identical tunable code to one set, and puts the initialisation with the mbuf code proper. Make NMBUFs tunable as well. Move the nmbclusters sysctl here as well. Move the initialisation of maxsockets from param.c to uipc_socket2.c, next to its corresponding sysctl. Use the new tunable macros for the kern.vm.kmem.size tunable (this should have been in a separate commit, whoops).
* Slight reorganization of kernel thread/process creation. Instead of usingpeter1999-07-011-2/+2
| | | | | | | | | | | | | | | SYSINIT_KT() etc (which is a static, compile-time procedure), use a NetBSD-style kthread_create() interface. kproc_start is still available as a SYSINIT() hook. This allowed simplification of chunks of the sysinit code in the process. This kthread_create() is our old kproc_start internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work the same as the NetBSD one. One thing I'd like to do shortly is get rid of nfsiod as a user initiated process. It makes sense for the nfs client code to create them on the fly as needed up to a user settable limit. This means that nfsiod doesn't need to be in /sbin and is always "available". This is a fair bit easier to do outside of the SYSINIT_KT() framework.
* Typo in comment.des1999-04-121-2/+2
|
* * Change sysctl from using linker_set to construct its tree using SLISTs.dfr1999-02-161-1/+2
| | | | | | | | | | This makes it possible to change the sysctl tree at runtime. * Change KLD to find and register any sysctl nodes contained in the loaded file and to unregister them when the file is unloaded. Reviewed by: Archie Cobbs <archie@whistle.com>, Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
* Only call m_reclaim() if M_WAIT since calling it from an interrupt candg1998-07-271-3/+11
| | | | | cause problems. PR: 7403
* Update M_EXT support in m_copypacket().phk1998-07-031-3/+11
| | | | | | | PR: 7122 Reviewed by: phk Submitted by: Castor Fu <castor@geocast.com> Originally forgotten by: julian
* If we are out of mb_map space and we failed to m_reclaim() anything anddg1998-06-051-7/+15
| | | | | | | the alloc is not M_DONTWAIT, then panic with "Out of mbuf clusters". Callers that specify M_WAIT can't deal with getting a NULL buffer, so this is a more graceful failure than randomly page faulting in the socket code or elsewhere.
* Don't depend on "implicit int".bde1998-02-201-2/+2
|
* Restored used include of <sys/malloc.h>. malloc() is not usedbde1997-12-281-21/+22
| | | | | | | | | | | | | here, but kmem_malloc() is used and it takes the same "flags" as malloc(). Use the mbuf allocation "flags" M_WAIT and M_DONTWAIT consistently. There is really only one boolean flag, M_DONTWAIT, but the "flags" were always treated as enum-like values, except in some places here where the values are tacitly converted to boolean flags. Treat them as enum-like values everywhere, except where we tacitly assume that there are only two values in order to convert them to the corresponding two kmem_malloc() "flags".
* Removed unused #includes.bde1997-10-281-2/+1
|
* Last major round (Unless Bruce thinks of somthing :-) of malloc changes.phk1997-10-121-2/+1
| | | | | | | | Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them. A couple of finer points by: bde
* Removed unused #includes.bde1997-08-021-4/+1
|
* Create a new branch of the kernel MIB, kern.ipc, to storewollman1997-02-241-3/+19
| | | | | | | | | | | all of the configurables and instrumentation related to inter-process communication mechanisms. Some variables, like mbuf statistics, are instrumented here for the first time. For mbuf statistics: also keep track of m_copym() and m_pullup() failures, and provide for the user's inspection the compiled-in values of MSIZE, MHLEN, MCLBYTES, and MINCLSIZE.
* uipc_mbuf.c: do a better job of counting how often we have to waitwollman1997-02-181-6/+16
| | | | | | | for memory, or are denied a cluster. uipc_socket2.c: define some generic ``operation-not-supported'' entry points for pr_usrreqs.
* Provide an alternative mbuf cluster allocator which permits use ofwollman1997-02-131-1/+46
| | | | | | | clusters greater than one page in length by calling contigmalloc1(). This uses a helper process `mclalloc' to do the allocation if the system runs out at interrupt time to avoid calling contigmalloc at high spl. It is not yet clear to me whether this works.
* Fix bug related to map entry allocations where a sleep might be attempteddg1997-01-151-2/+2
| | | | | | | | when allocating memory for network buffers at interrupt time. This is due to inadequate checking for the new mcl_map. Fixed by merging mb_map and mcl_map into a single mb_map. Reviewed by: wollman
OpenPOWER on IntegriCloud