summaryrefslogtreecommitdiffstats
path: root/net/bridge
Commit message (Collapse)AuthorAgeFilesLines
* bridge: use time_before() in br_fdb_cleanup()Fabio Checconi2008-03-201-1/+1
| | | | | | | | | In br_fdb_cleanup() next_timer and this_timer are in jiffies, so they should be compared using the time_after() macro. Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: fix ebtable targets returnJoonwoo Park2008-02-233-3/+3
| | | | | | | | The function ebt_do_table doesn't take NF_DROP as a verdict from the targets. Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: Fix incorrect use of skb_make_writableJoonwoo Park2008-02-193-3/+3
| | | | | | | | | http://bugzilla.kernel.org/show_bug.cgi?id=9920 The function skb_make_writable returns true or false. Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: ebtables: mark matches, targets and watchers __read_mostlyJan Engelhardt2008-01-3115-27/+15
| | | | | | Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: ebtables: Update modules' descriptionsJan Engelhardt2008-01-3116-4/+16
| | | | | | | | Update the MODULES_DESCRIPTION() tags for all Ebtables modules. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: ebtables: remove casts, use constsJan Engelhardt2008-01-3116-63/+85
| | | | | | Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: bridge netfilter: remove nf_bridge_info read-only netoutdev memberPatrick McHardy2008-01-311-4/+0
| | | | | | | | | | | Before the removal of the deferred output hooks, netoutdev was used in case of VLANs on top of a bridge to store the VLAN device, so the deferred hooks would see the correct output device. This isn't necessary anymore since we're calling the output hooks for the correct device directly in the IP stack. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add namespace parameter to ip_route_output_key.Denis V. Lunev2008-01-281-1/+1
| | | | | | | Needed to propagate it down to the ip_route_output_flow. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Consolidate kernel netlink socket destruction.Denis V. Lunev2008-01-281-2/+2
| | | | | | | | | | Create a specific helper for netlink kernel socket disposal. This just let the code look better and provides a ground for proper disposal inside a namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [BRIDGE]: Remove unused include of a header file in ebtables.cRami Rosen2008-01-281-2/+0
| | | | | | | | | In net/bridge/netfilter/ebtables.c, - remove unused include of a header file (linux/tty.h) and remove the corresponding comment above it. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [BRIDGE]: Remove unused macros from ebt_vlan.cRami Rosen2008-01-281-2/+0
| | | | | | | | Remove two unused macros, INV_FLAG and SET_BITMASK from net/bridge/netfilter/ebt_vlan.c. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Simple ctl_table to ctl_path conversions.Pavel Emelyanov2008-01-281-19/+5
| | | | | | | | | | | This patch includes many places, that only required replacing the ctl_table-s with appropriate ctl_paths and call register_sysctl_paths(). Nothing special was done with them. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: Add CONFIG_NETFILTER_ADVANCED optionPatrick McHardy2008-01-281-1/+1
| | | | | | | | | | | The NETFILTER_ADVANCED option hides lots of the rather obscure netfilter options when disabled and provides defaults (M) that should allow to run a distribution firewall without further thinking. Defaults to 'y' to avoid breaking current configurations. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: nf_log: constify struct nf_logger and nf_log_packet loginfo argPatrick McHardy2008-01-282-2/+2
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: nf_log: move logging stuff to seperate headerPatrick McHardy2008-01-282-0/+2
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [BRIDGE]: Use cpu_to_be16() where appropriate.YOSHIFUJI Hideaki2008-01-281-1/+1
| | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: Mark hooks __read_mostlyPatrick McHardy2008-01-283-3/+3
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Make rtnetlink infrastructure network namespace aware (v3)Denis V. Lunev2008-01-281-2/+2
| | | | | | | | | | | | | | | | | After this patch none of the netlink callback support anything except the initial network namespace but the rtnetlink infrastructure now handles multiple network namespaces. Changes from v2: - IPv6 addrlabel processing Changes from v1: - no need for special rtnl_unlock handling - fixed IPv6 ndisc Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Modify all rtnetlink methods to only work in the initial namespace (v2)Denis V. Lunev2008-01-281-0/+9
| | | | | | | | | | | | | | | | Before I can enable rtnetlink to work in all network namespaces I need to be certain that something won't break. So this patch deliberately disables all of the rtnletlink methods in everything except the initial network namespace. After the methods have been audited this extra check can be disabled. Changes from v1: - added IPv6 addrlabel protection Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* [NETFILTER]: Introduce NF_INET_ hook valuesPatrick McHardy2008-01-281-6/+6
| | | | | | | | | | | The IPv4 and IPv6 hook values are identical, yet some code tries to figure out the "correct" value by looking at the address family. Introduce NF_INET_* values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__ section for userspace compatibility. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* Kobject: convert remaining kobject_unregister() to kobject_put()Greg Kroah-Hartman2008-01-241-1/+1
| | | | | | | | | | | There is no need for kobject_unregister() anymore, thanks to Kay's kobject cleanup changes, so replace all instances of it with kobject_put(). Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Kobject: convert net/bridge/br_if.c to use kobject_init/add_ng()Greg Kroah-Hartman2008-01-241-7/+3
| | | | | | | | | This converts the code to use the new kobject functions, cleaning up the logic in doing so. Cc: Stephen Hemminger <shemminger@linux-foundation.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Kobject: change net/bridge to use kobject_create_and_addGreg Kroah-Hartman2008-01-244-13/+7
| | | | | | | | | | | The kobject in the bridge code is only used for registering with sysfs, not for any lifespan rules. This patch changes it to be only a pointer and use the simpler api for this kind of thing. Cc: Stephen Hemminger <shemminger@linux-foundation.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* [NETFILTER]: bridge-netfilter: fix net_device refcnt leaksPatrick McHardy2008-01-201-0/+27
| | | | | | | | | | | | | | | | | | | | When packets are flood-forwarded to multiple output devices, the bridge-netfilter code reuses skb->nf_bridge for each clone to store the bridge port. When queueing packets using NFQUEUE netfilter takes a reference to skb->nf_bridge->physoutdev, which is overwritten when the packet is forwarded to the second port. This causes refcount unterflows for the first device and refcount leaks for all others. Additionally this provides incorrect data to the iptables physdev match. Unshare skb->nf_bridge by copying it if it is shared before assigning the physoutdev device. Reported, tested and based on initial patch by Jan Christoph Nordholz <hesso@pool.math.tu-berlin.de>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: bridge: fix double POST_ROUTING invocationPatrick McHardy2008-01-111-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | | The bridge code incorrectly causes two POST_ROUTING hook invocations for DNATed packets that end up on the same bridge device. This happens because packets with a changed destination address are passed to dst_output() to make them go through the neighbour output function again to build a new destination MAC address, before they will continue through the IP hooks simulated by bridge netfilter. The resulting hook order is: PREROUTING (bridge netfilter) POSTROUTING (dst_output -> ip_output) FORWARD (bridge netfilter) POSTROUTING (bridge netfilter) The deferred hooks used to abort the first POST_ROUTING invocation, but since the only thing bridge netfilter actually really wants is a new MAC address, we can avoid going through the IP stack completely by simply calling the neighbour output function directly. Tested, reported and lots of data provided by: Damien Thebault <damien.thebault@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [BRIDGE]: Assign random address.Stephen Hemminger2007-12-161-2/+1
| | | | | | | | | | | | | | Assigning a valid random address to bridge device solves problems when bridge device is brought up before adding real device to bridge. When the first real device is added to the bridge, it's address will overide the bridges random address. Note: any device added to a bridge must already have a valid ethernet address. br_add_if -> br_fdb_insert -> fdb_insert -> is_valid_ether_addr Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [BRIDGE]: Section fix.Andrew Morton2007-12-071-1/+1
| | | | | | | WARNING: vmlinux.o(.init.text+0x204e2): Section mismatch: reference to .exit.text:br_fdb_fini (between 'br_init' and 'br_fdb_init') Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [BRIDGE]: Properly dereference the br_should_route_hookPavel Emelyanov2007-11-292-5/+6
| | | | | | | | | | | | | | | | This hook is protected with the RCU, so simple if (br_should_route_hook) br_should_route_hook(...) is not enough on some architectures. Use the rcu_dereference/rcu_assign_pointer in this case. Fixed Stephen's comment concerning using the typeof(). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* [BRIDGE]: Lost call to br_fdb_fini() in br_init() error pathPavel Emelyanov2007-11-291-1/+3
| | | | | | | | | In case the br_netfilter_init() (or any subsequent call) fails, the br_fdb_fini() must be called to free the allocated in br_fdb_init() br_fdb_cache kmem cache. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* [BRIDGE]: Add missing "space"Joe Perches2007-11-191-1/+1
| | | | | Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: bridge: fix double POSTROUTING hook invocationPatrick McHardy2007-11-131-0/+3
| | | | | | | | | Packets routed between bridges have the POST_ROUTING hook invoked twice since bridging mistakes them for bridged packets because they have skb->nf_bridge set. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: ebt_arp: fix --arp-gratuitous matching dependence on ↵Bart De Schuymer2007-11-071-1/+1
| | | | | | | | | | | --arp-ip-{src,dst} Fix --arp-gratuitous matching dependence on --arp-ip-{src,dst} Signed-off-by: Bart De Schuymer <bdschuym@pandora.be> Signed-off-by: Lutz Preßler <Lutz.Pressler@SerNet.DE> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* Convert files to UTF-8 and some cleanupsJan Engelhardt2007-10-191-1/+1
| | | | | | | | | | | | | | | | | | * Convert files to UTF-8. * Also correct some people's names (one example is Eißfeldt, which was found in a source file. Given that the author used an ß at all in a source file indicates that the real name has in fact a 'ß' and not an 'ss', which is commonly used as a substitute for 'ß' when limited to 7bit.) * Correct town names (Goettingen -> Göttingen) * Update Eberhard Mönkeberg's address (http://lkml.org/lkml/2007/1/8/313) Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Adrian Bunk <bunk@kernel.org>
* [BRIDGE]: Remove SKB share checks in br_nf_pre_routing().Patrick McHardy2007-10-151-3/+0
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: Replace sk_buff ** with sk_buff *Herbert Xu2007-10-1512-57/+49
| | | | | | | | | With all the users of the double pointers removed, this patch mops up by finally replacing all occurances of sk_buff ** in the netfilter API by sk_buff *. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: Avoid skb_copy/pskb_copy/skb_realloc_headroomHerbert Xu2007-10-153-30/+9
| | | | | | | | | | | This patch replaces unnecessary uses of skb_copy, pskb_copy and skb_realloc_headroom by functions such as skb_make_writable and pskb_expand_head. This allows us to remove the double pointers later. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [BRIDGE]: Unshare skb upon entryHerbert Xu2007-10-151-0/+4
| | | | | | | | | | Due to the special location of the bridging hook, it should never see a shared packet anyway (certainly not with any in-kernel code). So it makes sense to unshare the skb there if necessary as that will greatly simplify the code below it (in particular, netfilter). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* kobjects: fix up improper use of the kobject name fieldGreg Kroah-Hartman2007-10-121-1/+1
| | | | | | | A number of different drivers incorrect access the kobject name field directly. This is not correct as the name might not be in the array. Use the proper accessor function instead.
* [NETFILTER]: bridge: remove broken netfilter binary sysctlsJoseph Fannin2007-10-101-5/+0
| | | | | | | | | | | | | | | | | | | The netfilter sysctls in the bridging code don't set strategy routines: sysctl table check failed: /net/bridge/bridge-nf-call-arptables .3.10.1 Missing strategy sysctl table check failed: /net/bridge/bridge-nf-call-iptables .3.10.2 Missing strategy sysctl table check failed: /net/bridge/bridge-nf-call-ip6tables .3.10.3 Missing strategy sysctl table check failed: /net/bridge/bridge-nf-filter-vlan-tagged .3.10.4 Missing strategy sysctl table check failed: /net/bridge/bridge-nf-filter-pppoe-tagged .3.10.5 Missing strategy These binary sysctls can't work. The binary sysctl numbers of other netfilter sysctls with this problem are being removed. These need to go as well. Signed-off-by: Joseph Fannin <jfannin@gmail.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [ETHTOOL] Provide default behaviors for a few ethtool sub-ioctlsJeff Garzik2007-10-101-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | For the operations get-tx-csum get-sg get-tso get-ufo the default ethtool_op_xxx behavior is fine for all drivers, so we permit op==NULL to imply the default behavior. This provides a more uniform behavior across all drivers, eliminating ethtool(8) "ioctl not supported" errors on older drivers that had not been updated for the latest sub-ioctls. The ethtool_op_xxx() functions are left exported, in case anyone wishes to call them directly from a driver-private implementation -- a not-uncommon case. Should an ethtool_op_xxx() helper remain unused for a while, except by net/core/ethtool.c, we can un-export it at a later date. [ Resolved conflicts with set/get value ethtool patch... -DaveM ] Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Nuke SET_MODULE_OWNER macro.Ralf Baechle2007-10-101-1/+0
| | | | | | | | | | | | It's been a useless no-op for long enough in 2.6 so I figured it's time to remove it. The number of people that could object because they're maintaining unified 2.4 and 2.6 drivers is probably rather small. [ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ] Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Make the device list and device lookups per namespace.Eric W. Biederman2007-10-104-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anthing in the core of the network stack that was affected to by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundametally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Support multiple network namespaces with netlinkEric W. Biederman2007-10-101-2/+3
| | | | | | | | | | | | | | | | | | | | | Each netlink socket will live in exactly one network namespace, this includes the controlling kernel sockets. This patch updates all of the existing netlink protocols to only support the initial network namespace. Request by clients in other namespaces will get -ECONREFUSED. As they would if the kernel did not have the support for that netlink protocol compiled in. As each netlink protocol is updated to be multiple network namespace safe it can register multiple kernel sockets to acquire a presence in the rest of the network namespaces. The implementation in af_netlink is a simple filter implementation at hash table insertion and hash table look up time. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Make device event notification network namespace safeEric W. Biederman2007-10-101-0/+4
| | | | | | | | | | | | | | | | | | Every user of the network device notifiers is either a protocol stack or a pseudo device. If a protocol stack that does not have support for multiple network namespaces receives an event for a device that is not in the initial network namespace it quite possibly can get confused and do the wrong thing. To avoid problems until all of the protocol stacks are converted this patch modifies all netdev event handlers to ignore events on devices that are not in the initial network namespace. As the rest of the code is made network namespace aware these checks can be removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Make packet reception network namespace safeEric W. Biederman2007-10-101-0/+4
| | | | | | | | | | | | | | This patch modifies every packet receive function registered with dev_add_pack() to drop packets if they are not from the initial network namespace. This should ensure that the various network stacks do not receive packets in a anything but the initial network namespace until the code has been converted and is ready for them. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: DIV_ROUND_UP cleanup (part two)Ilpo Järvinen2007-10-101-1/+1
| | | | | | | | | | | Hopefully captured all single statement cases under net/. I'm not too sure if there is some policy about #includes that are "guaranteed" (ie., in the current tree) to be available through some other #included header, so I just added linux/kernel.h to each changed file that didn't #include it previously. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET] skbuff: Add skb_cow_headHerbert Xu2007-09-161-1/+1
| | | | | | | | | | | This patch adds an optimised version of skb_cow that avoids the copy if the header can be modified even if the rest of the payload is cloned. This can be used in encapsulating paths where we only need to modify the header. As it is, this can be used in PPPOE and bridging. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [BRIDGE]: Kill clone argument to br_flood_*Herbert Xu2007-09-164-50/+31
| | | | | | | | | The clone argument is only used by one caller and that caller can clone the packet itself. This patch moves the clone call into the caller and kills the clone argument. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: Fix/improve deadlock condition on module removal netfilterNeil Horman2007-09-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So I've had a deadlock reported to me. I've found that the sequence of events goes like this: 1) process A (modprobe) runs to remove ip_tables.ko 2) process B (iptables-restore) runs and calls setsockopt on a netfilter socket, increasing the ip_tables socket_ops use count 3) process A acquires a file lock on the file ip_tables.ko, calls remove_module in the kernel, which in turn executes the ip_tables module cleanup routine, which calls nf_unregister_sockopt 4) nf_unregister_sockopt, seeing that the use count is non-zero, puts the calling process into uninterruptible sleep, expecting the process using the socket option code to wake it up when it exits the kernel 4) the user of the socket option code (process B) in do_ipt_get_ctl, calls ipt_find_table_lock, which in this case calls request_module to load ip_tables_nat.ko 5) request_module forks a copy of modprobe (process C) to load the module and blocks until modprobe exits. 6) Process C. forked by request_module process the dependencies of ip_tables_nat.ko, of which ip_tables.ko is one. 7) Process C attempts to lock the request module and all its dependencies, it blocks when it attempts to lock ip_tables.ko (which was previously locked in step 3) Theres not really any great permanent solution to this that I can see, but I've developed a two part solution that corrects the problem Part 1) Modifies the nf_sockopt registration code so that, instead of using a use counter internal to the nf_sockopt_ops structure, we instead use a pointer to the registering modules owner to do module reference counting when nf_sockopt calls a modules set/get routine. This prevents the deadlock by preventing set 4 from happening. Part 2) Enhances the modprobe utilty so that by default it preforms non-blocking remove operations (the same way rmmod does), and add an option to explicity request blocking operation. So if you select blocking operation in modprobe you can still cause the above deadlock, but only if you explicity try (and since root can do any old stupid thing it would like.... :) ). Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [BRIDGE]: Fix OOPS when bridging device without ethtool.Stephen Hemminger2007-08-301-8/+8
| | | | | | | | | | | | | | | Bridge code calls ethtool to get speed. The conversion to using only ethtool_ops broke the case of devices without ethtool_ops. This is a new regression in 2.6.23. Rearranged the switch to a logical order, and use gcc initializer. Ps: speed should have been part of the network device structure from the start rather than burying it in ethtool. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Acked-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud