summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* pipe: switch wait_for_partner() and wake_up_partner() to pipe_inode_infoAl Viro2013-04-091-9/+9
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* pipe: fold file_operations instances in oneAl Viro2013-04-094-195/+38
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fold fifo.c into pipe.cAl Viro2013-04-093-154/+139
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* lift sb_start_write out of ->splice_write()Al Viro2013-04-091-6/+4
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* lift sb_start_write into default_file_splice_write()Al Viro2013-04-091-2/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* lift sb_start_write() out of ->write()Al Viro2013-04-098-13/+26
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch compat readv/writev variants to COMPAT_SYSCALL_DEFINEAl Viro2013-04-095-210/+197
| | | | | | ... and take to fs/read_write.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* f2fs: use mnt_want_write_file() in ioctlAl Viro2013-04-091-2/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* lift sb_start_write/sb_end_write out of ->aio_write()Al Viro2013-04-0911-22/+28
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* hpfs: move setting hpfs-private i_dirty to ->write_end()Al Viro2013-04-091-16/+20
| | | | | | ... so that writev(2) doesn't miss it. Get rid of hpfs_file_write(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* reiserfs: don't wank with EFBIG before calling do_sync_write()Al Viro2013-04-091-60/+1
| | | | | | look for file_capable() in there... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fold release_mounts() into namespace_unlock()Al Viro2013-04-091-23/+30
| | | | | | | | | ... and provide namespace_lock() as a trivial wrapper; switch to those two consistently. Result is patterned after rtnl_lock/rtnl_unlock pair. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch unlock_mount() to namespace_unlock(), convert all umount_tree() callersAl Viro2013-04-093-24/+16
| | | | | | | which allows to kill the last argument of umount_tree() and make release_mounts() static. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* more conversions to namespace_unlock()Al Viro2013-04-091-14/+6
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of the second argument of shrink_submounts()Al Viro2013-04-091-4/+4
| | | | | | ... it's always &unmounted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* saner umount_tree()/release_mounts(), part 1Al Viro2013-04-091-4/+13
| | | | | | | | | global list of release_mounts() fodder, protected by namespace_sem; eventually, all umount_tree() callers will use it as kill list. Helper picking the contents of that list, releasing namespace_sem and doing release_mounts() on what it got. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of full-hash scan on detaching vfsmountsAl Viro2013-04-094-97/+149
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* mnt: release locks on error path in do_loopbackAndrey Vagin2013-04-091-1/+1
| | | | | | | | | | | | | | | | | do_loopback calls lock_mount(path) and forget to unlock_mount if clone_mnt or copy_mnt fails. [ 77.661566] ================================================ [ 77.662939] [ BUG: lock held when returning to user space! ] [ 77.664104] 3.9.0-rc5+ #17 Not tainted [ 77.664982] ------------------------------------------------ [ 77.666488] mount/514 is leaving the kernel with locks still held! [ 77.668027] 2 locks held by mount/514: [ 77.668817] #0: (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [<ffffffff811cca22>] lock_mount+0x32/0xe0 [ 77.671755] #1: (&namespace_sem){+++++.}, at: [<ffffffff811cca3a>] lock_mount+0x4a/0xe0 Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* palinfo fixesAl Viro2013-04-091-64/+13
| | | | | | | | | * check for proc_mkdir() failures * fix buffer overrun - sizeof(format string) is *not* enough to hold sprintf() result. * use proc_remove_subtree(); life's much easier with it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* procfs: add proc_remove_subtree()Al Viro2013-04-092-30/+91
| | | | | | | | | just what it sounds like; do that only to procfs subtrees you've created - doing that to something shared with another driver is not only antisocial, but might cause interesting races with proc_create() and its ilk. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ecryptfs: close rmmod raceAl Viro2013-04-091-12/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of ↵Linus Torvalds2013-03-265-8/+48
|\ | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "stable fodder; assorted deadlock fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vt: synchronize_rcu() under spinlock is not nice... Nest rename_lock inside vfsmount_lock Don't bother with redoing rw_verify_area() from default_file_splice_from()
| * vt: synchronize_rcu() under spinlock is not nice...Al Viro2013-03-261-2/+4
| | | | | | | | | | | | | | | | | | vcs_poll_data_free() calls unregister_vt_notifier(), which calls atomic_notifier_chain_unregister(), which calls synchronize_rcu(). Do it *after* we'd dropped ->f_lock. Cc: stable@vger.kernel.org (all kernels since 2.6.37) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * Nest rename_lock inside vfsmount_lockAl Viro2013-03-261-5/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... lest we get livelocks between path_is_under() and d_path() and friends. The thing is, wrt fairness lglocks are more similar to rwsems than to rwlocks; it is possible to have thread B spin on attempt to take lock shared while thread A is already holding it shared, if B is on lower-numbered CPU than A and there's a thread C spinning on attempt to take the same lock exclusive. As the result, we need consistent ordering between vfsmount_lock (lglock) and rename_lock (seq_lock), even though everything that takes both is going to take vfsmount_lock only shared. Spotted-by: Brad Spengler <spender@grsecurity.net> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * Don't bother with redoing rw_verify_area() from default_file_splice_from()Al Viro2013-03-213-1/+33
| | | | | | | | | | | | | | | | | | | | default_file_splice_from() ends up calling vfs_write() (via very convoluted callchain). It's an overkill, since we already have done rw_verify_area() in the caller by the time we call vfs_write() we are under set_fs(KERNEL_DS), so access_ok() is also pointless. Add a new helper (__kernel_write()), use it instead of kernel_write() in there. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2013-03-2669-459/+540
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Always increment IPV4 ID field in encapsulated GSO packets, even when DF is set. Regression fix from Pravin B Shelar. 2) Fix per-net subsystem initialization in netfilter conntrack, otherwise we may access dynamically allocated memory before it is actually allocated. From Gao Feng. 3) Fix DMA buffer lengths in iwl3945 driver, from Stanislaw Gruszka. 4) Fix race between submission of sync vs async commands in mwifiex driver, from Amitkumar Karwar. 5) Add missing cancel of command timer in mwifiex driver, from Bing Zhao. 6) Missing SKB free in rtlwifi USB driver, from Jussi Kivilinna. 7) Thermal layer tries to use a genetlink multicast string that is longer than the 16 character limit. Fix it and add a BUG check to prevent this kind of thing from happening in the future. From Masatake YAMATO. 8) Fix many bugs in the handling of the teardown of L2TP connections, UDP encapsulation instances, and sockets. From Tom Parkin. 9) Missing socket release in IRDA, from Kees Cook. 10) Fix fec driver modular build, from Fabio Estevam. 11) Erroneous use of kfree() instead of free_netdev() in lantiq_etop, from Wei Yongjun. 12) Fix bugs in handling of queue numbers and steering rules in mlx4 driver, from Moshe Lazer, Hadar Hen Zion, and Or Gerlitz. 13) Some FOO_DIAG_MAX constants were defined off by one, fix from Andrey Vagin. 14) TCP segmentation deferral is unintentionally done too strongly, breaking ACK clocking. Fix from Eric Dumazet. 15) net_enable_timestamp() can legitimately be invoked from software interrupts, and in a way that is safe, so remove the WARN_ON(). Also from Eric Dumazet. 16) Fix use after free in VLANs, from Cong Wang. 17) Fix TCP slow start retransmit storms after SACK reneging, from Yuchung Cheng. 18) Unix socket release should mark a socket dead before NULL'ing out sock->sk, otherwise we can race. Fix from Paul Moore. 19) IPV6 addrconf code can try to free static memory, from Hong Zhiguo. 20) Fix register mis-programming, NULL pointer derefs, and wrong PHC clock frequency in IGB driver. From Lior LevyAlex Williamson, Jiri Benc, and Jeff Kirsher. 21) skb->ip_summed logic in pch_gbe driver is reversed, breaking packet forwarding. Fix from Veaceslav Falico. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits) ipv4: Fix ip-header identification for gso packets. bonding: remove already created master sysfs link on failure af_unix: dont send SCM_CREDENTIAL when dest socket is NULL pch_gbe: fix ip_summed checksum reporting on rx igb: fix PHC stopping on max freq igb: make sensor info static igb: SR-IOV init reordering igb: Fix null pointer dereference igb: fix i350 anti spoofing config ixgbevf: don't release the soft entries ipv6: fix bad free of addrconf_init_net unix: fix a race condition in unix_release() tcp: undo spurious timeout after SACK reneging bnx2x: fix assignment of signed expression to unsigned variable bridge: fix crash when set mac address of br interface 8021q: fix a potential use-after-free net: remove a WARN_ON() in net_enable_timestamp() tcp: preserve ACK clocking in TSO net: fix *_DIAG_MAX constants net/mlx4_core: Disallow releasing VF QPs which have steering rules ...
| * | ipv4: Fix ip-header identification for gso packets.Pravin B Shelar2013-03-262-12/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ip-header id needs to be incremented even if IP_DF flag is set. This behaviour was changed in commit 490ab08127cebc25e3a26 (IP_GRE: Fix IP-Identification). Following patch fixes it so that identification is always incremented. Reported-by: Cong Wang <amwang@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bonding: remove already created master sysfs link on failureVeaceslav Falico2013-03-261-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | If slave sysfs symlink failes to be created - we end up without removing the master sysfs symlink. Remove it in case of failure. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | af_unix: dont send SCM_CREDENTIAL when dest socket is NULLdingtianhong2013-03-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SCM_SCREDENTIALS should apply to write() syscalls only either source or destination socket asserted SOCK_PASSCRED. The original implememtation in maybe_add_creds is wrong, and breaks several LSB testcases ( i.e. /tset/LSB.os/netowkr/recvfrom/T.recvfrom). Origionally-authored-by: Karel Srot <ksrot@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'master' of ↵David S. Miller2013-03-265-22/+43
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net Jeff Kirsher says: ==================== This series contains updates to ixgbevf and igb. The ixgbevf calls to pci_disable_msix() and to free the msix_entries memory should not occur if device open fails. Instead they should be called during device driver removal to balance with the call to pci_enable_msix() and the call to allocate msix_entries memory during the device probe and driver load. The remaining 4 of 5 igb patches are simple 1-3 line patches to fix several issues such as possible null pointer dereference, PHC stopping on max frequency, make sensor info static and SR-IOV initialization reordering. The remaining igb patch to fix anti-spoofing config fixes a problem in i350 where anti spoofing configuration was written into a wrong register. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | igb: fix PHC stopping on max freqJiri Benc2013-03-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For 82576 MAC type, max_adj is reported as 1000000000 ppb. However, if this value is passed to igb_ptp_adjfreq_82576, incvalue overflows out of INCVALUE_82576_MASK, resulting in setting of zero TIMINCA.incvalue, stopping the PHC (instead of going at twice the nominal speed). Fix the advertised max_adj value to the largest value hardware can handle. As there is no min_adj value available (-max_adj is used instead), this will also prevent stopping the clock intentionally. It's probably not a big deal, other igb MAC types don't support stopping the clock, either. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| | * | igb: make sensor info staticStephen Hemminger2013-03-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Trivial sparse warning. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| | * | igb: SR-IOV init reorderingAlex Williamson2013-03-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | igb is ineffective at setting a lower total VFs because: int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs) { ... /* Shouldn't change if VFs already enabled */ if (dev->sriov->ctrl & PCI_SRIOV_CTRL_VFE) return -EBUSY; Swap init ordering. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| | * | igb: Fix null pointer dereferenceAlex Williamson2013-03-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The max_vfs= option has always been self limiting to the number of VFs supported by the device. fa44f2f1 added SR-IOV configuration via sysfs, but in the process broke this self correction factor. The failing path is: igb_probe igb_sw_init if (max_vfs > 7) { adapter->vfs_allocated_count = 7; ... igb_probe_vfs igb_enable_sriov(, max_vfs) if (num_vfs > 7) { err = -EPERM; ... This leaves vfs_allocated_count = 7 and vf_data = NULL, so we bomb out when igb_probe finally calls igb_reset. It seems like a really bad idea, and somewhat pointless, to set vfs_allocated_count separate from vf_data, but limiting max_vfs is enough to avoid the null pointer. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| | * | igb: fix i350 anti spoofing configLior Levy2013-03-261-14/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a problem in i350 where anti spoofing configuration was written into a wrong register. Signed-off-by: Lior Levy <lior.levy@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| | * | ixgbevf: don't release the soft entriesxunleer2013-03-261-4/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the ixgbevf driver is opened the request to allocate MSIX irq vectors may fail. In that case the driver will call ixgbevf_down() which will call ixgbevf_irq_disable() to clear the HW interrupt registers and calls synchronize_irq() using the msix_entries pointer in the adapter structure. However, when the function to request the MSIX irq vectors failed it had already freed the msix_entries which causes an OOPs from using the NULL pointer in synchronize_irq(). The calls to pci_disable_msix() and to free the msix_entries memory should not occur if device open fails. Instead they should be called during device driver removal to balance with the call to pci_enable_msix() and the call to allocate msix_entries memory during the device probe and driver load. Signed-off-by: Li Xun <xunleer.li@huawei.com> Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | | pch_gbe: fix ip_summed checksum reporting on rxVeaceslav Falico2013-03-261-2/+2
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | skb->ip_summed should be CHECKSUM_UNNECESSARY when the driver reports that checksums were correct and CHECKSUM_NONE in any other case. They're currently placed vice versa, which breaks the forwarding scenario. Fix it by placing them as described above. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: fix bad free of addrconf_init_netHong Zhiguo2013-03-251-16/+10
| | | | | | | | | | | | | | | Signed-off-by: Hong Zhiguo <honkiko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | unix: fix a race condition in unix_release()Paul Moore2013-03-251-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As reported by Jan, and others over the past few years, there is a race condition caused by unix_release setting the sock->sk pointer to NULL before properly marking the socket as dead/orphaned. This can cause a problem with the LSM hook security_unix_may_send() if there is another socket attempting to write to this partially released socket in between when sock->sk is set to NULL and it is marked as dead/orphaned. This patch fixes this by only setting sock->sk to NULL after the socket has been marked as dead; I also take the opportunity to make unix_release_sock() a void function as it only ever returned 0/success. Dave, I think this one should go on the -stable pile. Special thanks to Jan for coming up with a reproducer for this problem. Reported-by: Jan Stancek <jan.stancek@gmail.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: undo spurious timeout after SACK renegingYuchung Cheng2013-03-241-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On SACK reneging the sender immediately retransmits and forces a timeout but disables Eifel (undo). If the (buggy) receiver does not drop any packet this can trigger a false slow-start retransmit storm driven by the ACKs of the original packets. This can be detected with undo and TCP timestamps. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bnx2x: fix assignment of signed expression to unsigned variableKumar Amit Mehta2013-03-241-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | fix for incorrect assignment of signed expression to unsigned variable. Signed-off-by: Kumar Amit Mehta <gmate.amit@gmail.com> Acked-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bridge: fix crash when set mac address of br interfaceHong zhi guo2013-03-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When I tried to set mac address of a bridge interface to a mac address which already learned on this bridge, I got system hang. The cause is straight forward: function br_fdb_change_mac_address calls fdb_insert with NULL source nbp. Then an fdb lookup is performed. If an fdb entry is found and it's local, it's OK. But if it's not local, source is dereferenced for printk without NULL check. Signed-off-by: Hong Zhiguo <honkiko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | 8021q: fix a potential use-after-freeCong Wang2013-03-241-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vlan_vid_del() could possibly free ->vlan_info after a RCU grace period, however, we may still refer to the freed memory area by 'grp' pointer. Found by code inspection. This patch moves vlan_vid_del() as behind as possible. Cc: Patrick McHardy <kaber@trash.net> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: remove a WARN_ON() in net_enable_timestamp()Eric Dumazet2013-03-241-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The WARN_ON(in_interrupt()) in net_enable_timestamp() can get false positive, in socket clone path, run from softirq context : [ 3641.624425] WARNING: at net/core/dev.c:1532 net_enable_timestamp+0x7b/0x80() [ 3641.668811] Call Trace: [ 3641.671254] <IRQ> [<ffffffff80286817>] warn_slowpath_common+0x87/0xc0 [ 3641.677871] [<ffffffff8028686a>] warn_slowpath_null+0x1a/0x20 [ 3641.683683] [<ffffffff80742f8b>] net_enable_timestamp+0x7b/0x80 [ 3641.689668] [<ffffffff80732ce5>] sk_clone_lock+0x425/0x450 [ 3641.695222] [<ffffffff8078db36>] inet_csk_clone_lock+0x16/0x170 [ 3641.701213] [<ffffffff807ae449>] tcp_create_openreq_child+0x29/0x820 [ 3641.707663] [<ffffffff807d62e2>] ? ipt_do_table+0x222/0x670 [ 3641.713354] [<ffffffff807aaf5b>] tcp_v4_syn_recv_sock+0xab/0x3d0 [ 3641.719425] [<ffffffff807af63a>] tcp_check_req+0x3da/0x530 [ 3641.724979] [<ffffffff8078b400>] ? inet_hashinfo_init+0x60/0x80 [ 3641.730964] [<ffffffff807ade6f>] ? tcp_v4_rcv+0x79f/0xbe0 [ 3641.736430] [<ffffffff807ab9bd>] tcp_v4_do_rcv+0x38d/0x4f0 [ 3641.741985] [<ffffffff807ae14a>] tcp_v4_rcv+0xa7a/0xbe0 Its safe at this point because the parent socket owns a reference on the netstamp_needed, so we cant have a 0 -> 1 transition, which requires to lock a mutex. Instead of refining the check, lets remove it, as all known callers are safe. If it ever changes in the future, static_key_slow_inc() will complain anyway. Reported-by: Laurent Chavey <chavey@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: preserve ACK clocking in TSOEric Dumazet2013-03-221-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A long standing problem with TSO is the fact that tcp_tso_should_defer() rearms the deferred timer, while it should not. Current code leads to following bad bursty behavior : 20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119 20:11:24.484337 IP B > A: . ack 263721 win 1117 20:11:24.485086 IP B > A: . ack 265241 win 1117 20:11:24.485925 IP B > A: . ack 266761 win 1117 20:11:24.486759 IP B > A: . ack 268281 win 1117 20:11:24.487594 IP B > A: . ack 269801 win 1117 20:11:24.488430 IP B > A: . ack 271321 win 1117 20:11:24.489267 IP B > A: . ack 272841 win 1117 20:11:24.490104 IP B > A: . ack 274361 win 1117 20:11:24.490939 IP B > A: . ack 275881 win 1117 20:11:24.491775 IP B > A: . ack 277401 win 1117 20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119 20:11:24.492620 IP B > A: . ack 278921 win 1117 20:11:24.493448 IP B > A: . ack 280441 win 1117 20:11:24.494286 IP B > A: . ack 281961 win 1117 20:11:24.495122 IP B > A: . ack 283481 win 1117 20:11:24.495958 IP B > A: . ack 285001 win 1117 20:11:24.496791 IP B > A: . ack 286521 win 1117 20:11:24.497628 IP B > A: . ack 288041 win 1117 20:11:24.498459 IP B > A: . ack 289561 win 1117 20:11:24.499296 IP B > A: . ack 291081 win 1117 20:11:24.500133 IP B > A: . ack 292601 win 1117 20:11:24.500970 IP B > A: . ack 294121 win 1117 20:11:24.501388 IP B > A: . ack 295641 win 1117 20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119 While the expected behavior is more like : 20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119 20:19:49.260446 IP B > A: . ack 154281 win 1212 20:19:49.261282 IP B > A: . ack 155801 win 1212 20:19:49.262125 IP B > A: . ack 157321 win 1212 20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119 20:19:49.262958 IP B > A: . ack 158841 win 1212 20:19:49.263795 IP B > A: . ack 160361 win 1212 20:19:49.264628 IP B > A: . ack 161881 win 1212 20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119 20:19:49.265465 IP B > A: . ack 163401 win 1212 20:19:49.265886 IP B > A: . ack 164921 win 1212 20:19:49.266722 IP B > A: . ack 166441 win 1212 20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119 20:19:49.267559 IP B > A: . ack 167961 win 1212 20:19:49.268394 IP B > A: . ack 169481 win 1212 20:19:49.269232 IP B > A: . ack 171001 win 1212 20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Van Jacobson <vanj@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: fix *_DIAG_MAX constantsAndrey Vagin2013-03-212-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Follow the common pattern and define *_DIAG_MAX like: [...] __XXX_DIAG_MAX, }; Because everyone is used to do: struct nlattr *attrs[XXX_DIAG_MAX+1]; nla_parse([...], XXX_DIAG_MAX, [...] Reported-by: Thomas Graf <tgraf@suug.ch> Cc: "David S. Miller" <davem@davemloft.net> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Eric Dumazet <edumazet@google.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'mlx4'David S. Miller2013-03-213-22/+47
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Or Gerlitz says: ==================== Here's a batch of mlx4 driver fixes for 3.9, mostly SRIOV/Flow-steering related. Series done against the net tree as of commit 5a3da1f "inet: limit length of fragment queue hash table bucket lists ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net/mlx4_core: Disallow releasing VF QPs which have steering rulesHadar Hen Zion2013-03-211-8/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VF QPs must not be released when they have steering rules attached to them. For that end, introduce a reference count field to the QP object in the SRIOV resource tracker which is incremented/decremented when steering rules are attached/detached to it. QPs can be released by VF only when their ref count is zero. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net/mlx4_core: Always use 64 bit resource ID when doing lookupHadar Hen Zion2013-03-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One of the resource tracker code paths was wrongly using int and not u64 for resource tracking IDs, fix it. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net/mlx4_en: Remove ethtool flow steering rules before releasing QPsHadar Hen Zion2013-03-211-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the ethtool flow steering rules cleanup to be carried out before releasing the RX QPs. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud