FreeBSD-src - Raptor Engineering's fork of pfsense FreeBSD src with pfSense changes

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge 260488, r260508.	melifaro	2014-05-08	1	-78/+74
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	r260488: Split rt_newaddrmsg_fib() into two different functions. Adding/deleting interface addresses involves access to 3 different subsystems, int different parts of code. Each call can fail, so reporting successful operation by rtsock in the middle of the process error-prone. Further split routing notification API and actual rtsock calls via creating public-available rt_addrmsg() / rt_routemsg() functions with "private" rtsock_* backend. r260508: Simplify inet alias handling code: if we're adding/removing alias which has the same prefix as some other alias on the same interface, use newly-added rt_addrmsg() instead of hand-rolled in_addralias_rtmsg(). This eliminates the following rtsock messages: Pinned RTM_ADD for prefix (for alias addition). Pinned RTM_DELETE for prefix (for alias withdrawal). Example (got 10.0.0.1/24 on vlan4, playing with 10.0.0.2/24): before commit, addition: got message of size 116 on Fri Jan 10 14:13:15 2014 RTM_NEWADDR: address being added to iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 got message of size 192 on Fri Jan 10 14:13:15 2014 RTM_ADD: Add Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 10.0.0.2 (255) ffff ffff ff after commit, addition: got message of size 116 on Fri Jan 10 13:56:26 2014 RTM_NEWADDR: address being added to iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 14.0.0.2 14.0.0.255 before commit, wihdrawal: got message of size 192 on Fri Jan 10 13:58:59 2014 RTM_DELETE: Delete Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 10.0.0.2 (255) ffff ffff ff got message of size 116 on Fri Jan 10 13:58:59 2014 RTM_DELADDR: address being removed from iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 adter commit, withdrawal: got message of size 116 on Fri Jan 10 14:14:11 2014 RTM_DELADDR: address being removed from iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 Sending both RTM_ADD/RTM_DELETE messages to rtsock is completely wrong (and requires some hacks to keep prefix in route table on RTM_DELETE). I've tested this change with quagga (no change) and bird (). bird alias handling is already broken in BSD sysdep code, so nothing changes here, too. I'm going to MFC this change if there will be no complains about behavior change. While here, fix some style(9) bugs introduced by r260488 (pointed by glebius and bde).
*	Merge r260379, r260460.	melifaro	2014-05-08	1	-6/+5
\| \| \| \| \| \| \| \| \| \|	r260379: Partially fix IPv4 interface routes deletion in RADIX_MPATH. Noticed by: Nikolay Denev <ndenev at gmail.com> r260460: Constanly use RT_ALL_FIBS everywhere instead of -1.
*	Merge r259528, r259528, r260295.	melifaro	2014-05-08	1	-22/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	r259528: Simplify contiguous mask checking. Suggested by: glebius r260228: Remove useless register variable modifiers. Do some more style(9). r260295: Change semantics for rnh_lookup() function: now it performs exact match search, regardless of netmask existance. This simplifies most of rnh_lookup() consumers. Fix panic triggered by deleting non-existent host route. PR: kern/185092 Submitted by: Nikolay Denev <ndenev at gmail.com>
*	o Provide a compatibility shim for netstat(1) to obtain output queue	glebius	2014-04-03	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	drops via NET_RT_IFLISTL sysctl. The sysctl handler appends oqdrops at the end of struct if_msghdrl, and netstat(1) sees that as an additional field of struct if_data. This allows us to fetch the data keeping ABI and API compatibility. This is direct commit to stable/10. o Merge r263331 from head, to restore printing of queue drops. Sponsored by: Nginx, Inc. Sponsored by: Netflix
*	Merge r262763, r262767, r262771, r262806 from head:	glebius	2014-03-21	1	-35/+23
\| \| \| \| \| \| \| \| \| \|	- Remove rt_metrics_lite and simply put its members into rtentry. - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This removes another cache trashing ++ from packet forwarding path. - Create zini/fini methods for the rtentry UMA zone. Via initialize mutex and counter in them. - Fix reporting of rmx_pksent to routing socket. - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.
*	After r241616 properly export ifi_baudrate_pf in the 32bit compat case.	bz	2013-08-20	1	-1/+2
\| \| \| \|	MFC after: 3 days
*	sin6 should be assigned before the loop.	hrs	2013-07-28	1	-1/+1
\|
*	Add a leaf node CTL_NET.PF_ROUTE.0.AF.NET_RT_DUMP.0.FIB. This returns	hrs	2013-07-12	1	-2/+13
\| \| \| \|	routing table with the specified FIB number, not td->td_proc->p_fibnum.
*	Due to the routing related networking kernel redesign work	qingli	2013-06-25	1	-3/+14
\| \| \| \| \| \| \| \| \| \| \|	in FBSD 8.0, interface routes have been returened to the applications without the RTF_GATEWAY bit. This incompatibility has caused some issues with Zebra, Qugga and the like. This patch provides the RTF_GATEWAY flag bit in returned interface routes so to behave similarly to pre 8.0 systems. Reviewed by: hrs Verified by: mackn at opendns dot com
*	- Use m_getcl() instead of hand allocating.	glebius	2013-03-15	1	-11/+8
\| \| \| \| \| \| \| \|	- Convert panic() to KASSERT. - Remove superfluous cleaning of mbuf fields after allocation. - Add comment on possible use of m_get2() here. Sponsored by: Nginx, Inc.
*	- Move definition of V_deembed_scopeid to scope6_var.h.	hrs	2012-12-05	1	-63/+51
\| \| \| \| \| \|	- Deembed scope id in L3 address in in6_lltable_dump(). - Simplify scope id recovery in rtsock routines. - Remove embedded scope id handling in ndp(8) and route(8) completely.
*	Mechanically substitute flags from historic mbuf allocator with	glebius	2012-12-05	1	-2/+2
\| \| \| \| \| \| \| \| \|	malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually
*	- Fix LOR in sa6_recoverscope() in rt_msg2()[1].	hrs	2012-12-04	1	-39/+32
\| \| \| \| \| \| \|	- Check V_deembed_scopeid before checking if sa_family == AF_INET6. - Fix scope id handing in route(8)[2] and ifconfig(8). Reported by: rpaulo[1], Mateusz Guzik[1], peter[2]
*	Fix up a compile time warning if INET6 isn't defined.	adrian	2012-11-18	1	-1/+1
\|
*	Fill sin6_scope_id in sockaddr_in6 before passing it from the kernel to	hrs	2012-11-17	1	-0/+90
\| \| \| \| \| \| \| \| \| \| \| \|	userland via routing socket or sysctl. This eliminates the following KAME-specific sin6_scope_id handling routine from each userland utility: sin6.sin6_scope_id = ntohs((u_int16_t )&sin6.sin6_addr.s6_addr[2]); This behavior can be controlled by net.inet6.ip6.deembed_scopeid. This is set to 1 by default (sin6_scope_id will be filled in the kernel). Reviewed by: bz
*	Mechanically remove the last stray remains of spl* calls from net/.	andre	2012-10-18	1	-11/+1
\| \| \| \|	They have been Noop's for a long time now.
*	Do not require radix write lock to be held while dumping route table	melifaro	2012-04-22	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	via sysctl(4) interface. This permits router not to stop forwarding packets while route table is being written to user-supplied buffer. Reported by: Pawel Tyll <ptyll@nitronet.pl> Approved by: kib(mentor) MFC after: 1 week
*	Introduce a new NET_RT_IFLISTL API to query the address list. It works	bz	2012-02-11	1	-44/+203
\| \| \| \| \| \| \| \| \| \| \|	on extended and extensible structs if_msghdrl and ifa_msghdrl. This will allow us to extend both the msghdrl structs and eventually if_data in the future without breaking the ABI. Bump __FreeBSD_version to allow ports to more easily detect the new API. Reviewed by: glebius, brooks MFC after: 3 days
*	Backout changes from r228571. Remove if_data from struct ifa_msghdr again.	bz	2012-02-11	1	-4/+0
\| \| \| \| \| \| \|	While this breaks carp on HEAD temporary, it restores the upgrade path from stable, and head before 20111215. Reviewed by: glebius, brooks
*	Copy ifa->if_data to ifam->ifam_data. This was forgotten in r228571.	glebius	2012-01-08	1	-0/+1
\| \| \| \|	Submitted by: bz
*	Convert all users of IF_ADDR_LOCK to use new locking macros that specify	jhb	2012-01-05	1	-10/+10
\| \| \| \| \| \| \|	either a read lock or write lock. Reviewed by: bz MFC after: 2 weeks
*	A major overhaul of the CARP implementation. The ip_carp.c was started	glebius	2011-12-16	1	-1/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	from scratch, copying needed functionality from the old implemenation on demand, with a thorough review of all code. The main change is that interface layer has been removed from the CARP. Now redundant addresses are configured exactly on the interfaces, they run on. The CARP configuration itself is, as before, configured and read via SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or SIOCAIFADDR_IN6 may now be configured to a particular virtual host id, which makes the prefix redundant. ifconfig(8) semantics has been changed too: now one doesn't need to clone carpXX interface, he/she should directly configure a vhid on a Ethernet interface. To supply vhid data from the kernel to an application the getifaddrs(8) function had been changed to pass ifam_data with each address. [1] The new implementation definitely closes all PRs related to carp(4) being an interface, and may close several others. It also allows to run a single redundant IP per interface. Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for idea on using ifam_data and for several rounds of reviewing! PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448 Reviewed by: bz Submitted by: bz [1]
*	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.	ed	2011-11-07	1	-2/+2
\| \| \| \| \| \|	The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
*	Fix a use-after-free/redzone issue in the routing code.	mlaier	2011-11-03	1	-12/+14
\| \| \| \| \| \| \|	Reported by (repeatedly): Mike Tancsa Prodded by (repeatedly): bz Forgotten by (repeatedly): mlaier MFC after: 2 weeks
*	Pass the fibnum where we need filtering of the message on the	bz	2011-09-28	1	-3/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	rtsock allowing routing daemons to filter routing updates on an rtsock per FIB. Adjust raw_input() and split it into wrapper and a new function taking an optional callback argument even though we only have one consumer [1] to keep the hackish flags local to rtsock.c. PR: kern/134931 Submitted by: multiple (see PR) Suggested by: rwatson [1] Reviewed by: rwatson MFC after: 3 days
*	As info.rti_info[RTAX_DST] can point inside of rtm we must not free the rtm	mlaier	2011-02-10	1	-1/+3
\| \| \| \| \| \| \|	until rt_dispatch is done with the sockaddr. Found by: memguard MFC after: 3 days
*	Close a race acquiring the IF_ADDR_LOCK() for each entry while iterating	bz	2010-10-16	1	-0/+4
\| \| \| \| \| \| \| \| \|	over all interfaces to make sure the address will neither change nor be freed while we are working on it. PR: kern/146250 Submitted by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 1 week
*	Properly set ifi_datalen for compat32 struct if_data32.	kib	2010-08-03	1	-1/+1
\| \| \| \| \| \|	PR: kern/149240 Submitted by: Stef Walter <stef memberwebs com> MFC after: 1 weeks
*	This patch fixes the problem where proxy ARP entries cannot be added	qingli	2010-05-25	1	-5/+16
\| \| \| \| \| \|	over the if_ng interface. MFC after: 3 days
*	Provide 32bit compat shims for sysctl net.route NET_RT_IFLIST.	kib	2010-04-25	1	-1/+101
\| \| \| \| \| \| \| \| \|	This allows getifaddrs(3) to work for compat32 binaries. Submitted by: jhb (6.x version) Reviewed by: emaste Tested by: emaste and <pluknet gmail com> MFC after: 2 weeks
*	The proxy arp entries could not be added into the system over the	qingli	2009-12-30	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	IFF_POINTOPOINT link types. The reason was due to the routing entry returned from the kernel covering the remote end is of an interface type that does not support ARP. This patch fixes this problem by providing a hint to the kernel routing code, which indicates the prefix route instead of the PPP host route should be returned to the caller. Since a host route to the local end point is also added into the routing table, and there could be multiple such instantiations due to multiple PPP links can be created with the same local end IP address, this patch also fixes the loopback route installation failure problem observed prior to this patch. The reference count of loopback route to local end would be either incremented or decremented. The first instantiation would create the entry and the last removal would delete the route entry. MFC after: 5 days
*	Throughout the network stack we have a few places of	bz	2009-12-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	if (jailed(cred)) left. If you are running with a vnet (virtual network stack) those will return true and defer you to classic IP-jails handling and thus things will be "denied" or returned with an error. Work around this problem by introducing another "jailed()" function, jailed_without_vnet(), that also takes vnets into account, and permits the calls, should the jail from the given cred have its own virtual network stack. We cannot change the classic jailed() call to do that, as it is used outside the network stack as well. Discussed with: julian, zec, jamie, rwatson (back in Sept) MFC after: 5 days
*	As part of r196609, a call to "rtalloc" did not take the fib into	qingli	2009-08-31	1	-1/+1
\| \| \| \| \| \| \| \|	account. So call the appropriate "rtalloc_ign_fib()" instead of calling "rtalloc_ign()". Reviewed by:i pointed out by bz MFC after: immediately
*	In ip_output(), the flow-table module must not try to cache L2/L3	qingli	2009-08-28	1	-1/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	information for interface of IFF_POINTOPOINT or IFF_LOOPBACK type. Since the L2 information (rt_lle) is invalid for these interface types, accidental caching attempt will trigger panic when the invalid rt_lle reference is accessed. When installing a new route, or when updating an existing route, the user supplied gateway address may be an interface address (this is particularly true for point-to-point interface related modules such as ppp, if_tun, if_gif). Currently the routing command handler always set the RTF_GATEWAY flag if the gateway address is given as part of the command paramters. Therefore the gateway address must be verified against interface addresses or else the route would be treated as an indirect route, thus making that route unusable. Reviewed by: kmacy, julia, rwatson Verified by: marcus MFC after: 3 days
*	Put multiple instructions into a block when iterating; unbreaks	bz	2009-08-13	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \|	NET_RT_DUMP, which otherwise only returned information of AF_MAX. This was broken in r193232 (save your time - my bug, my fix). PR: kern/137700 Reported by: Larry Baird (lab gta.com) Tested by: Larry Baird (lab gta.com) Reviewed by: zec, lstewart, qing Approved by: re (kib)
*	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and	rwatson	2009-08-01	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)
*	Introduce and use a sysinit-based initialization scheme for virtual	rwatson	2009-07-23	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	network stacks, VNET_SYSINIT: - Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will occur each time a network stack is instantiated and destroyed. In the !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT. For the VIMAGE case, we instead use SYSINIT's to track their order and properties on registration, using them for each vnet when created/ destroyed, or immediately on module load for already-started vnets. - Remove vnet_modinfo mechanism that existed to serve this purpose previously, as well as its dependency scheme: we now just use the SYSINIT ordering scheme. - Implement VNET_DOMAIN_SET() to allow protocol domains to declare that they want init functions to be called for each virtual network stack rather than just once at boot, compiling down to DOMAIN_SET() in the non-VIMAGE case. - Walk all virtualized kernel subsystems and make use of these instead of modinfo or DOMAIN_SET() for init/uninit events. In some cases, convert modular components from using modevent to using sysinit (where appropriate). In some cases, do minor rejuggling of SYSINIT ordering to make room for or better manage events. Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup) Discussed with: jhb, bz, julian, zec Reviewed by: bz Approved by: re (VIMAGE blanket)
*	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator	rwatson	2009-07-14	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
*	Modify most routines returning 'struct ifaddr *' to return references	rwatson	2009-06-23	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	rather than pointers, requiring callers to properly dispose of those references. The following routines now return references: ifaddr_byindex ifa_ifwithaddr ifa_ifwithbroadaddr ifa_ifwithdstaddr ifa_ifwithnet ifaof_ifpforaddr ifa_ifwithroute ifa_ifwithroute_fib rt_getifa rt_getifa_fib IFP_TO_IA ip_rtaddr in6_ifawithifp in6ifa_ifpforlinklocal in6ifa_ifpwithaddr in6_ifadd carp_iamatch6 ip6_getdstifaddr Remove unused macro which didn't have required referencing: IFP_TO_IA6 This closes many small races in which changes to interface or address lists while an ifaddr was in use could lead to use of freed memory (etc). In a few cases, add missing if_addr_list locking required to safely acquire references. Because of a lack of deep copying support, we accept a race in which an in6_ifaddr pointed to by mbuf tags and extracted with ip6_getdstifaddr() doesn't hold a reference while in transmit. Once we have mbuf tag deep copy support, this can be fixed. Reviewed by: bz Obtained from: Apple, Inc. (portions) MFC after: 6 weeks (portions)
*	Clean up common ifaddr management:	rwatson	2009-06-21	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Unify reference count and lock initialization in a single function, ifa_init(). - Move tear-down from a macro (IFAFREE) to a function ifa_free(). - Move reference count bump from a macro (IFAREF) to a function ifa_ref(). - Instead of using a u_int protected by a mutex to refcount(9) for reference count management. The ifa_mtx is now used for exactly one ioctl, and possibly should be removed. MFC after: 3 weeks
*	SCTP needs either IPv4 or IPv6 as lower layer[1].	bz	2009-06-10	1	-0/+4
\| \| \| \| \| \| \| \|	So properly hide the already #ifdef SCTP code with #if defined(INET) \|\| defined(INET6) as well to get us closer to a non-INET/INET6 kernel. Discussed with: tuexen [1]
*	After r193232 rt_tables in vnet.h are no longer indirectly dependent on	bz	2009-06-08	1	-1/+0
\| \| \| \| \| \| \| \| \|	the ROUTETABLES kernel option thus there is no need to include opt_route.h anymore in all consumers of vnet.h and no longer depend on it for module builds. Remove the hidden include in flowtable.h as well and leave the two explicit #includes in ip_input.c and ip_output.c.
*	Convert the two dimensional array to be malloced and introduce	bz	2009-06-01	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	an accessor function to get the correct rnh pointer back. Update netstat to get the correct pointer using kvm_read() as well. This not only fixes the ABI problem depending on the kernel option but also permits the tunable to overwrite the kernel option at boot time up to MAXFIBS, enlarging the number of FIBs without having to recompile. So people could just use GENERIC now. Reviewed by: julian, rwatson, zec X-MFC: not possible
*	Reimplement the netisr framework in order to support parallel netisr	rwatson	2009-06-01	1	-8/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	threads: - Support up to one netisr thread per CPU, each processings its own workstream, or set of per-protocol queues. Threads may be bound to specific CPUs, or allowed to migrate, based on a global policy. In the future it would be desirable to support topology-centric policies, such as "one netisr per package". - Allow each protocol to advertise an ordering policy, which can currently be one of: NETISR_POLICY_SOURCE: packets must maintain ordering with respect to an implicit or explicit source (such as an interface or socket). NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work, as well as allowing protocols to provide a flow generation function for mbufs without flow identifers (m2flow). Falls back on NETISR_POLICY_SOURCE if now flow ID is available. NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for each packet handled by netisr (m2cpuid). - Provide utility functions for querying the number of workstreams being used, as well as a mapping function from workstream to CPU ID, which protocols may use in work placement decisions. - Add explicit interfaces to get and set per-protocol queue limits, and get and clear drop counters, which query data or apply changes across all workstreams. - Add a more extensible netisr registration interface, in which protocols declare 'struct netisr_handler' structures for each registered NETISR_ type. These include name, handler function, optional mbuf to flow ID function, optional mbuf to CPU ID function, queue limit, and ordering policy. Padding is present to allow these to be expanded in the future. If no queue limit is declared, then a default is used. - Queue limits are now per-workstream, and raised from the previous IFQ_MAXLEN default of 50 to 256. - All protocols are updated to use the new registration interface, and with the exception of netnatm, default queue limits. Most protocols register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use NETISR_POLICY_FLOW, and will therefore take advantage of driver- generated flow IDs if present. - Formalize a non-packet based interface between interface polling and the netisr, rather than having polling pretend to be two protocols. Provide two explicit hooks in the netisr worker for start and end events for runs: netisr_poll() and netisr_pollmore(), as well as a function, netisr_sched_poll(), to allow the polling code to schedule netisr execution. DEVICE_POLLING still embeds single-netisr assumptions in its implementation, so for now if it is compiled into the kernel, a single and un-bound netisr thread is enforced regardless of tunable configuration. In the default configuration, the new netisr implementation maintains the same basic assumptions as the previous implementation: a single, un-bound worker thread processes all deferred work, and direct dispatch is enabled by default wherever possible. Performance measurement shows a marginal performance improvement over the old implementation due to the use of batched dequeue. An rmlock is used to synchronize use and registration/unregistration using the framework; currently, synchronized use is disabled (replicating current netisr policy) due to a measurable 3%-6% hit in ping-pong micro-benchmarking. It will be enabled once further rmlock optimization has taken place. However, in practice, netisrs are rarely registered or unregistered at runtime. A new man page for netisr will follow, but since one doesn't currently exist, it hasn't been updated. This change is not appropriate for MFC, although the polling shutdown handler should be merged to 7-STABLE. Bump __FreeBSD_version. Reviewed by: bz
*	Add hierarchical jails. A jail may further virtualize its environment	jamie	2009-05-27	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)
*	Change the curvnet variable from a global const struct vnet *,	zec	2009-05-05	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	previously always pointing to the default vnet context, to a dynamically changing thread-local one. The currvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discuouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_* macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, whith an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typicall cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)
*	In preparation for turning on options VIMAGE in next commits,	zec	2009-04-26	1	-1/+0
\| \| \| \| \| \| \| \|	rearrange / replace / adjust several INIT_VNET_* initializer macros, all of which currently resolve to whitespace. Reviewed by: bz (an older version of the patch) Approved by: julian (mentor)
*	Acquire address list lock before walking an interface's address list to	rwatson	2009-04-20	1	-0/+4
\| \| \| \| \| \|	identify possible jail addresses on it for IPv4 and IPv6. MFC after: 2 weeks
*	Extend route command:	kmacy	2009-04-14	1	-6/+8
\| \| \| \| \| \| \| \| \| \|	- add show as alias for get - add weights to allow mpath to do more than equal cost - add sticky / nostick to disable / re-enable per-connection load balancing This adds a field to rt_metrics_lite so network bits of world will need to be re-built. Reviewed by: jeli & qingli
*	Call prison_if from rtm_get_jailed, instead of splitting it out into	jamie	2009-02-05	1	-90/+63
\| \| \| \| \| \| \|	prison_check_ip4 and prison_check_ip6. As prison_if includes a jailed() check, remove that check before calling rtm_get_jailed. Approved by: bz (mentor)