summaryrefslogtreecommitdiffstats
path: root/sys/kern/kern_mbuf.c
Commit message (Collapse)AuthorAgeFilesLines
* Add support to the virtual memory system for configuring machine-alc2009-07-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dependent memory attributes: Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the fact that there are machine-dependent memory attributes that have nothing to do with controlling the cache's behavior. Introduce vm_object_set_memattr() for setting the default memory attributes that will be given to an object's pages. Introduce and use pmap_page_{get,set}_memattr() for getting and setting a page's machine-dependent memory attributes. Add full support for these functions on amd64 and i386 and stubs for them on the other architectures. The function pmap_page_set_memattr() is also responsible for any other machine-dependent aspects of changing a page's memory attributes, such as flushing the cache or updating the direct map. The uses include kmem_alloc_contig(), vm_page_alloc(), and the device pager: kmem_alloc_contig() can now be used to allocate kernel memory with non-default memory attributes on amd64 and i386. vm_page_alloc() and the device pager will set the memory attributes for the real or fictitious page according to the object's default memory attributes. Update the various pmap functions on amd64 and i386 that map pages to incorporate each page's memory attributes in the mapping. Notes: (1) Inherent to this design are safety features that prevent the specification of inconsistent memory attributes by different mappings on amd64 and i386. In addition, the device pager provides a warning when a device driver creates a fictitious page with memory attributes that are inconsistent with the real page that the fictitious page is an alias for. (2) Storing the machine-dependent memory attributes for amd64 and i386 as a dedicated "int" in "struct md_page" represents a compromise between space efficiency and the ease of MFCing these changes to RELENG_7. In collaboration with: jhb Approved by: re (kib)
* This change is the next step in implementing the cache control functionalityalc2009-06-261-1/+1
| | | | | | | | | | | required by video card drivers. Specifically, this change introduces vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all architectures. In addition, this changes adds a vm_cache_mode_t parameter to kmem_alloc_contig() and vm_phys_alloc_contig(). These will be the interfaces for allocating mapped kernel memory and physical memory, respectively, with non-default cache modes. In collaboration with: jhb
* define helper routines for deferred mbuf initializationkmacy2009-06-191-0/+26
|
* Utilize the new function kmem_alloc_contig() to implement the UMA back-endalc2009-06-181-17/+4
| | | | | | | | | | allocator for the jumbo frames zones. This change has two benefits: (1) a custom back-end deallocator is no longer required. UMA's standard deallocator suffices. (2) It eliminates a potentially confusing artifact of using contigmalloc(): The malloc(9) statistics contain bogus information about the usage of jumbo frames. Specifically, the malloc(9) statistics report all jumbo frames in use whereas the UMA zone statistics report the "truth" about the number in use vs. the number free.
* Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERICrwatson2009-06-051-1/+0
| | | | | | | | and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
* Temporary workaround for the limitations of the mbuf flowid field: zerorwatson2009-01-011-0/+2
| | | | | | the field in the mbuf constructors, since otherwise we have no way to tell if they are valid. In the future, Kip has plans to add a flag specifically to indicate validity, which is the preferred model.
* Removed a comment made obsolete by revisions 157927 and 174292.ru2008-12-181-1/+0
|
* Make sure nmbclusters are initialized before maxsocketsbz2008-12-101-1/+6
| | | | | | | | | by running the tunable_mbinit() SYSINIT at SI_ORDER_MIDDLE before the init_maxsockets() SYSINT at SI_ORDER_ANY. Reviewed by: rwatson, zec Sponsored by: The FreeBSD Foundation MFC after: 4 weeks
* make kern.ipc.nmbclusters actually have a useful effect on nmbclusters et al.kmacy2008-11-091-3/+4
| | | | initialize pkthdr in field order
* Reintroduce UMA_SLAB_KMAP; however, change its spelling toalc2008-04-041-1/+2
| | | | | | | | | | | | | | | | | | | UMA_SLAB_KERNEL for consistency with its sibling UMA_SLAB_KMEM. (UMA_SLAB_KMAP met its original demise in revision 1.30 of vm/uma_core.c.) UMA_SLAB_KERNEL is now required by the jumbo frame allocators. Without it, UMA cannot correctly return pages from the jumbo frame zones to the VM system because it resets the pages' object field to NULL instead of the kernel object. In more detail, the jumbo frame zones are created with the option UMA_ZONE_REFCNT. This causes UMA to overwrite the pages' object field with the address of the slab. However, when UMA wants to release these pages, it doesn't know how to restore the object field, so it sets it to NULL. This change teaches UMA how to reset the object field to the kernel object. Crashes reported by: kris Fix tested by: kris Fix discussed with: jeff MFC after: 6 weeks
* In keeping with style(9)'s recommendations on macros, use a ';'rwatson2008-03-161-1/+1
| | | | | | | | | after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink
* Give MEXTADD() another argument to make both void pointers to thephk2008-02-011-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | free function controlable, instead of passing the KVA of the buffer storage as the first argument. Fix all conventional users of the API to pass the KVA of the buffer as the first argument, to make this a no-op commit. Likely break the only non-convetional user of the API, after informing the relevant committer. Update the mbuf(9) manual page, which was already out of sync on this point. Bump __FreeBSD_version to 800016 as there is no way to tell how many arguments a CPP macro needs any other way. This paves the way for giving sendfile(9) a way to wait for the passed storage to have been accessed before returning. This does not affect the memory layout or size of mbufs. Parental oversight by: sam and rwatson. No MFC is anticipated.
* - fix tab to space issue, hmm maybe I should use vi.rrs2007-12-151-1/+1
|
* - Puts default limits on 4k/9k and 16k zones for mbufs all basedrrs2007-12-051-6/+67
| | | | | | | | | on 1/2 of each of the successive limits tied to the limit for 2k clusters. - Adds real functionality in so that doing a sysctl to change these actually changes them :-) MFC after: 1 week
* Introduce an UMA backend page allocator for the jumbo frame zones thatalc2007-12-041-0/+33
| | | | | | | | allocates physically contiguous memory. MFC after: 3 months Requested and reviewed by: Kip Macy Tested by: Andrew Gallatin and Pyun YongHyeon
* style(9)obrien2007-10-261-11/+13
|
* Merge first in a series of TrustedBSD MAC Framework KPI changesrwatson2007-10-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
* This patch adds an M_NOFREE flag which allows one to mark an mbuf askmacy2007-10-061-0/+1
| | | | | | | | | | | | not being independently freeable. This allows one to embed an mbuf in the cluster itself. This confers the benefits of the packet zone on all cluster sizes. Embedded mbufs currently suffer from the same limitation that packet zone mbufs do in that one cannot disconnect them and pass them around independently of the cluster. It would likely be possible to eliminate this limitation in the future by adding a second reference for the mbuf itself. Approved by: re(gnn)
* Allow drivers to free an mbuf without having the mbuf be touched ifkmacy2007-10-061-2/+5
| | | | | | the driver has already freed any attached tags Approved by: re(gnn)
* Despite several examples in the kernel, the third argument ofdwmalone2007-06-041-1/+1
| | | | | | | | | | | | | sysctl_handle_int is not sizeof the int type you want to export. The type must always be an int or an unsigned int. Remove the instances where a sizeof(variable) is passed to stop people accidently cut and pasting these examples. In a few places this was sysctl_handle_int was being used on 64 bit types, which would truncate the value to be exported. In these cases use sysctl_handle_quad to export them and change the format to Q so that sysctl(1) can still print them.
* Fix mb_ctor_clust and mb_dtor_clust to reference the appropriate zone,kmacy2007-04-041-29/+37
| | | | | | | simplify setting refcnt Reviewed by: andre, rwatson, and glebius MFC after: 3 days
* Fix for problems that occur when all mbuf clusters migrate to the mbuf packetmohans2007-01-251-0/+8
| | | | | | | | | zone. Cluster allocations fail when this happens. Also processes that may have blocked on cluster allocations will never be woken up. Thanks to rwatson for an overview of the issue and pointers to the mbuma paper and his tool to dump out UMA zones. Reviewed by: andre@
* Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.hrwatson2006-10-221-1/+2
| | | | | | | | | | | | | begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA
* Remove VLAN mtag UMA zones and initialize ether_vtag and tso_segsz packetandre2006-09-171-25/+4
| | | | | | header fields to zero on mbuf allocation. Sponsored by: TCP/IP Optimization Fundraise 2005
* Move some functions and definitions from uipc_socket2.c to uipc_socket.c:rwatson2006-06-101-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | - Move sonewconn(), which creates new sockets for incoming connections on listen sockets, so that all socket allocate code is together in uipc_socket.c. - Move 'maxsockets' and associated sysctls to uipc_socket.c with the socket allocation code. - Move kern.ipc sysctl node to uipc_socket.c, add a SYSCTL_DECL() for it to sysctl.h and remove lots of scattered implementations in various IPC modules. - Sort sodealloc() after soalloc() in uipc_socket.c for dependency order reasons. Statisticize soalloc() and sodealloc() as they are now required only in uipc_socket.c, and are internal to the socket implementation. After this change, socket allocation and deallocation is entirely centralized in one file, and uipc_socket2.c consists entirely of socket buffer manipulation and default protocol switch functions. MFC after: 1 month
* Allow for nmbclusters and maxsockets to be increased via sysctl.ps2006-04-211-1/+19
| | | | | An eventhandler is used to update all the various zones that depend on these values.
* Properly handle the case when the packet secondary zone can't allocateandre2006-03-081-2/+2
| | | | | | | | | further mbuf clusters to attach to mbufs. Reported by: kris Tested by: kris Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days
* One more grammar nit.glebius2006-02-271-1/+1
| | | | Submitted by: ru
* Fix several typos and trim spaces at eol.glebius2006-02-261-7/+7
| | | | | PR: kern/93759 Submitted by: Antoine Brodin <antoine.brodin laposte.net>
* Replace the 4k fixed sized jumbo mbuf clusters with PAGE_SIZE sizedandre2006-02-171-11/+11
| | | | | | | | | | | | | | jumbo mbuf clusters. To make the variable size clear they are named MJUMPAGESIZE. Having jumbo clusters with the native PAGE_SIZE is more useful than a fixed 4k size according the device driver writers using this API. The 9k and 16k jumbo mbuf clusters remain unchanged. Requested by: glebius, gallatin Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days
* Merge the //depot/user/yar/vlan branch into CVS. It contains some collectiveglebius2006-01-301-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | work by yar, thompsa and myself. The checksum offloading part also involves work done by Mihail Balikov. The most important changes: o Instead of global linked list of all vlan softc use a per-trunk hash. The size of hash is dynamically adjusted, depending on number of entries. This changes struct ifnet, replacing counter of vlans with a pointer to trunk structure. This change is an improvement for setups with big number of VLANs, several interfaces and several CPUs. It is a small regression for a setup with a single VLAN interface. An alternative to dynamic hash is a per-trunk static array with 4096 entries, which is a compile time option - VLAN_ARRAY. In my experiments the array is not an improvement, probably because such a big trunk structure doesn't fit into CPU cache. o Introduce an UMA zone for VLAN tags. Since drivers depend on it, the zone is declared in kern_mbuf.c, not in optional vlan(4) driver. This change is a big improvement for any setup utilizing vlan(4). o Use rwlock(9) instead of mutex(9) for locking. We are the first ones to do this! :) o Some drivers can do hardware VLAN tagging + hardware checksum offloading. Add an infrastructure for this. Whenever vlan(4) is attached to a parent or parent configuration is changed, the flags on vlan(4) interface are updated. In collaboration with: yar, thompsa In collaboration with: Mihail Balikov <mihail.balikov interbgc.com>
* In mb_zinit_pack() explicitly ignore the return value of uma_zalloc_arg().andre2006-01-231-1/+1
| | | | | | | | | | The success of the cluster allocation is checked through a field in the mbuf structure. This change is non-functional but properly silences code inspection tools. Found by: Coverity Prevent(tm) Coverity ID: CID807 Sponsored by: TCP/IP Optimization Fundraise 2005
* Hide the 4k mbuf clusters if the normal clusters are defined to beandre2005-12-101-0/+2
| | | | | | | | 4k already. This unbreaks tinderbox. Submitted by: ru
* Add an API for jumbo mbuf cluster allocation and also provideandre2005-12-081-1/+19
| | | | | | | | | | | | | | | | | | | | | | 4k clusters in addition to 9k and 16k ones. struct mbuf *m_getjcl(int how, short type, int flags, int size) void *m_cljget(struct mbuf *m, int how, int size) m_getjcl() returns an mbuf with a cluster of the specified size attached like m_getcl() does for 2k clusters. m_cljget() is different from m_clget() as it can allocate clusters without attaching them to an mbuf. In that case the return value is the pointer to the cluster of the requested size. If an mbuf was specified, it gets the cluster attached to it and the return value can be safely ignored. For size both take MCLBYTES, MJUM4BYTES, MJUM9BYTES, MJUM16BYTES. Reviewed by: glebius Tested by: glebius Sponsored by: TCP/IP Optimization Fundraise 2005
* Fix panic string in last revision.glebius2005-11-061-1/+1
|
* Free only those mbuf+clusters back to the packet zone that were allocatedandre2005-11-051-1/+2
| | | | | | | | | | from there. All others get broken up and free'd individually to the mbuf and cluster zones. The packet zone is a secondary zone to the mbuf zone. There is currently a limitation in UMA which prevents decreasing the packet zone stock when the mbuf and cluster zone are drained and all their members are part of packets. When this is fixed this change may be reverted.
* Fix a logic error introduced with mandatory mbuf cluster refcounting andandre2005-11-041-4/+3
| | | | freeing of mbufs+clusters back to the packet zone.
* Mandatory mbuf cluster reference counting and groundwork for UMAandre2005-11-021-42/+151
| | | | | | | | | | | | | | | | | | | | | | | | based jumbo 9k and jumbo 16k cluster support. All mbuf's with external storage attached are mandatory reference counted. For clusters and jumbo clusters UMA provides the refcnt storage directly. It does not have to be separatly allocated. Any other type of external storage gets its own refcnt allocated from an UMA mbuf refcnt zone instead of normal kernel malloc. The refcount API MEXT_ADD_REF() and MEXT_REM_REF() is no longer publically accessible. The proper m_* functions have to be used. mb_ctor_clust() and mb_dtor_clust() both handle normal 2K as well as 9k and 16k clusters. Clusters and jumbo clusters may be obtained without attaching it immideatly to an mbuf. This is for high performance cluster allocation in network drivers where mbufs are attached after the cluster has been filled. Tested by: rwatson Sponsored by: TCP/IP Optimizations Fundraise 2005
* No longer maintain mbstat statistics for the mbuf allocator, UMArwatson2005-09-271-11/+0
| | | | | | statistics and libmemstat(3) are now used to track mbuf statistics. MFC after: 1 month
* Define four constants, MBUF_{,MEM,CLUSTER,PACKET,TAG}_MEM_NAME, whichrwatson2005-07-171-4/+6
| | | | | | | | | | | are string names for their respective UMA zones and malloc types, and are passed into uma_zcreate() and MALLOC_DEFINE(). Export them outside of _KERNEL in mbuf.h so that netstat can reference them. Change the names to improve consistency, with each zone/type associated with the mbuf allocator being prefixed mbuf_. MFC after: 1 week
* Fix the false memory modified after free messages some users have beensilby2005-06-291-0/+3
| | | | | | | | | | | | | reporting - in my previous change, I missed the case where a mbuf from the packet zone was freed back to the mbuf/packet keg, where it was subsequently put into the mbuf zone and found not to contain the expected trash. This change adds the necessary trash_dtor call inside mb_fini_pack so that everything is correct. Thanks for Bosko for finding the bug and showing me how secondary zones work. Approved by: re (dwhite)
* Change the mbuf, mbuf cluster, and mbuf packet allocation routines so thatsilby2005-06-231-0/+34
| | | | | | | | | the UMA "trash" allocator is used - this ensures that any writes to a freed mbuf should provoke a panic. Only enabled under INVARIANTS, of course. Approved by: re (scottl)
* Well, it seems that I pre-maturely removed the "All rights reserved"bmilekic2005-02-161-2/+2
| | | | | | | | | | | | | | statement from some files, so re-add it for the moment, until the related legalese is sorted out. This change affects: sys/kern/kern_mbuf.c sys/vm/memguard.c sys/vm/memguard.h sys/vm/uma.h sys/vm/uma_core.c sys/vm/uma_dbg.c sys/vm/uma_dbg.h sys/vm/uma_int.h
* Optimize the way reference counting is performed with Mbufs. Webmilekic2005-02-101-4/+2
| | | | | | | | | | | | | | do not need to perform an extra memory fetch in the Packet (Mbuf+Cluster) constructor to initialize the reference counter anymore. The reference counts are located in a separate memory region (in the slab header, because this zone is UMA_ZONE_REFCNT), so the memory fetch resulted very often in a cache miss. Additionally, and perhaps more significantly, optimize the free mbuf+cluster (packet) case, which is very common, to no longer require an atomic operation on free (to verify the reference counter) if the reference on the cluster has never been increased (also very common). Reduces an atomic on mbuf free on average. Original patch submitted by: Gerrit Nagelhout <gnagelhout@sandvine.com>
* Update copyright, remove "all rights reserved" (since they are notbmilekic2005-02-011-5/+1
| | | | | | all reserved, as the lisence makes clear), and strike the third clause (now this is a 2-clause liberal BSDL as are the rest of files I hold copyright over).
* CTASSERT that MSZIE is a power of 2 (otherwise dtom() breaks)brian2004-09-201-1/+4
| | | | | | | | | | Ask uma_zcreate() to align mbufs to MSIZE bytes (otherwise dtom() breaks) As it happens, uma_zalloc_arg() always returned mbufs aligned to MSIZE anyway, but that was an implementation side-effect.... KASSERT -> CTASSERT suggested by: dd@ Approved by: silence on -net
* * Add a "how" argument to uma_zone constructors and initialization functionsgreen2004-08-021-35/+32
| | | | | | | | | | | | | | | | | so that they know whether the allocation is supposed to be able to sleep or not. * Allow uma_zone constructors and initialation functions to return either success or error. Almost all of the ones in the tree currently return success unconditionally, but mbuf is a notable exception: the packet zone constructor wants to be able to fail if it cannot suballocate an mbuf cluster, and the mbuf allocators want to be able to fail in general in a MAC kernel if the MAC mbuf initializer fails. This fixes the panics people are seeing when they run out of memory for mbuf clusters. * Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing the default. Both bmilekic and jeff have reviewed the changes made to make failable zone allocations work.
* Fix a couple of bugs in the mbuf and packet ctors. In the latter case,bmilekic2004-06-011-5/+3
| | | | | | nextpkt within the m_hdr was not being initialized to NULL for !M_PKTHDR cases. *Maybe* this will fix weird socket buffer inconsistency panics, but we'll see.
* Bring in mbuma to replace mballoc.bmilekic2004-05-311-0/+385
mbuma is an Mbuf & Cluster allocator built on top of a number of extensions to the UMA framework, all included herein. Extensions to UMA worth noting: - Better layering between slab <-> zone caches; introduce Keg structure which splits off slab cache away from the zone structure and allows multiple zones to be stacked on top of a single Keg (single type of slab cache); perhaps we should look into defining a subset API on top of the Keg for special use by malloc(9), for example. - UMA_ZONE_REFCNT zones can now be added, and reference counters automagically allocated for them within the end of the associated slab structures. uma_find_refcnt() does a kextract to fetch the slab struct reference from the underlying page, and lookup the corresponding refcnt. mbuma things worth noting: - integrates mbuf & cluster allocations with extended UMA and provides caches for commonly-allocated items; defines several zones (two primary, one secondary) and two kegs. - change up certain code paths that always used to do: m_get() + m_clget() to instead just use m_getcl() and try to take advantage of the newly defined secondary Packet zone. - netstat(1) and systat(1) quickly hacked up to do basic stat reporting but additional stats work needs to be done once some other details within UMA have been taken care of and it becomes clearer to how stats will work within the modified framework. From the user perspective, one implication is that the NMBCLUSTERS compile-time option is no longer used. The maximum number of clusters is still capped off according to maxusers, but it can be made unlimited by setting the kern.ipc.nmbclusters boot-time tunable to zero. Work should be done to write an appropriate sysctl handler allowing dynamic tuning of kern.ipc.nmbclusters at runtime. Additional things worth noting/known issues (READ): - One report of 'ips' (ServeRAID) driver acting really slow in conjunction with mbuma. Need more data. Latest report is that ips is equally sucking with and without mbuma. - Giant leak in NFS code sometimes occurs, can't reproduce but currently analyzing; brueffer is able to reproduce but THIS IS NOT an mbuma-specific problem and currently occurs even WITHOUT mbuma. - Issues in network locking: there is at least one code path in the rip code where one or more locks are acquired and we end up in m_prepend() with M_WAITOK, which causes WITNESS to whine from within UMA. Current temporary solution: force all UMA allocations to be M_NOWAIT from within UMA for now to avoid deadlocks unless WITNESS is defined and we can determine with certainty that we're not holding any locks when we're M_WAITOK. - I've seen at least one weird socketbuffer empty-but- mbuf-still-attached panic. I don't believe this to be related to mbuma but please keep your eyes open, turn on debugging, and capture crash dumps. This change removes more code than it adds. A paper is available detailing the change and considering various performance issues, it was presented at BSDCan2004: http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf Please read the paper for Future Work and implementation details, as well as credits. Testing and Debugging: rwatson, brueffer, Ketrien I. Saihr-Kesenchedra, ... Reviewed by: Lots of people (for different parts)
OpenPOWER on IntegriCloud