op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	ext4: add support for lazy inode table initialization	Lukas Czerner	2010-10-27	4	-3/+611
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the lazy_itable_init extended option is passed to mke2fs, it considerably speeds up filesystem creation because inode tables are not zeroed out. The fact that parts of the inode table are uninitialized is not a problem so long as the block group descriptors, which contain information regarding how much of the inode table has been initialized, has not been corrupted However, if the block group checksums are not valid, e2fsck must scan the entire inode table, and the the old, uninitialized data could potentially cause e2fsck to report false problems. Hence, it is important for the inode tables to be initialized as soon as possble. This commit adds this feature so that mke2fs can safely use the lazy inode table initialization feature to speed up formatting file systems. This is done via a new new kernel thread called ext4lazyinit, which is created on demand and destroyed, when it is no longer needed. There is only one thread for all ext4 filesystems in the system. When the first filesystem with inititable mount option is mounted, ext4lazyinit thread is created, then the filesystem can register its request in the request list. This thread then walks through the list of requests picking up scheduled requests and invoking ext4_init_inode_table(). Next schedule time for the request is computed by multiplying the time it took to zero out last inode table with wait multiplier, which can be set with the (init_itable=n) mount option (default is 10). We are doing this so we do not take the whole I/O bandwidth. When the thread is no longer necessary (request list is empty) it frees the appropriate structures and exits (and can be created later later by another filesystem). We do not disturb regular inode allocations in any way, it just do not care whether the inode table is, or is not zeroed. But when zeroing, we have to skip used inodes, obviously. Also we should prevent new inode allocations from the group, while zeroing is on the way. For that we take write alloc_sem lock in ext4_init_inode_table() and read alloc_sem in the ext4_claim_inode, so when we are unlucky and allocator hits the group which is currently being zeroed, it just has to wait. This can be suppresed using the mount option no_init_itable. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	Add helper function for blkdev_issue_zeroout (sb_issue_discard)	Lukas Czerner	2010-10-27	1	-0/+8
\| \| \| \| \| \| \| \|	This is done the same way as helper sb_issue_discard for blkdev_issue_discard. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	jbd2: Add sanity check for attempts to start handle during umount	Theodore Ts'o	2010-10-27	2	-0/+11
\| \| \| \| \| \| \| \| \|	An attempt to modify the file system during the call to jbd2_destroy_journal() can lead to a system lockup. So add some checking to make it much more obvious when this happens to and to determine where the offending code is located. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: fix NULL pointer dereference in print_daily_error_info()	Sergey Senozhatsky	2010-10-27	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	Fix NULL pointer dereference in print_daily_error_info, when called on unmounted fs (EXT4_SB(sb) returns NULL), by removing error reporting timer in ext4_put_super. Google-Bug-Id: 3017663 Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: don't hold spinlock while calling ext4_issue_discard()	Lukas Czerner	2010-10-27	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	We can't hold the block group spinlock because we ext4_issue_discard() calls wait and hence can get rescheduled. Google-Bug-Id: 3017678 Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: check for negative error code from sb_issue_discard	Lukas Czerner	2010-10-27	1	-1/+1
\| \| \| \| \| \| \| \|	sb_issue_discard() is returning negative error code, so check for -EOPNOTSUPP. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: don't bump up LONG_MAX nr_to_write by a factor of 8	Eric Sandeen	2010-10-27	1	-3/+6
\| \| \| \| \| \| \| \| \|	I'm uneasy with lots of stuff going on in ext4_da_writepages(), but bumping nr_to_write from LLONG_MAX to -8 clearly isn't making anything better, so avoid the multiplier in that case. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: stop looping in ext4_num_dirty_pages when max_pages reached	Eric Sandeen	2010-10-27	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Today we simply break out of the inner loop when we have accumulated max_pages; this keeps scanning forwad and doing pagevec_lookup_tag() in the while (!done) loop, this does potentially a lot of work with no net effect. When we have accumulated max_pages, just clean up and return. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: use dedicated slab caches for group_info structures	Curt Wohlgemuth	2010-10-27	2	-21/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ext4_group_info structures are currently allocated with kmalloc(). With a typical 4K block size, these are 136 bytes each -- meaning they'll each consume a 256-byte slab object. On a system with many ext4 large partitions, that's a lot of wasted kernel slab space. (E.g., a single 1TB partition will have about 8000 block groups, using about 2MB of slab, of which nearly 1MB is wasted.) This patch creates an array of slab pointers created as needed -- depending on the superblock block size -- and uses these slabs to allocate the group info objects. Google-Bug-Id: 2980809 Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: avoid null dereference in trace_ext4_mballoc_discard	Wen Congyang	2010-10-27	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ac->inode is set to null in function ext4_mb_release_group_pa(), and then trace_ext4_mballoc_discard(ac) is called, the kernel will panic. BUG: unable to handle kernel NULL pointer dereference at 000000a4 IP: [<f87e1714>] ftrace_raw_event_ext4__mballoc+0x54/0xc0 [ext4] pdpt = 0000000000abd001 pde = 0000000000000000 Oops: 0000 [#1] SMP Pid: 550, comm: flush-8:16 Not tainted 2.6.36-rc1 #1 SE7320EP2/Altos G530 EIP: 0060:[<f87e1714>] EFLAGS: 00010206 CPU: 1 EIP is at ftrace_raw_event_ext4__mballoc+0x54/0xc0 [ext4] EAX: f32ac840 EBX: f3f1cf88 ECX: f32ac840 EDX: 00000000 ESI: f32ac83c EDI: f880b9d8 EBP: 00000000 ESP: f4b77ae4 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process flush-8:16 (pid: 550, ti=f4b76000 task=f613e540 task.ti=f4b76000) Call Trace: [<f87f5ac1>] ? ext4_mb_release_group_pa+0x121/0x150 [ext4] [<f87f8356>] ? ext4_mb_discard_group_preallocations+0x336/0x400 [ext4] [<f87fb7f1>] ? ext4_mb_new_blocks+0x3d1/0x4f0 [ext4] [<c05a6c5b>] ? __make_request+0x10b/0x440 [<f87f1fb4>] ? ext4_ext_map_blocks+0x1334/0x1980 [ext4] [<c04ac78a>] ? rb_reserve_next_event+0xaa/0x3b0 [<f87d18d6>] ? ext4_map_blocks+0xd6/0x1d0 [ext4] [<f87d2da7>] ? mpage_da_map_blocks+0xc7/0x8a0 [ext4] [<c04c8a68>] ? find_get_pages_tag+0x38/0x110 [<c04d23a5>] ? __pagevec_release+0x15/0x20 [<f87d3ca5>] ? ext4_da_writepages+0x2b5/0x5d0 [ext4] [<c04cfbe0>] ? __writepage+0x0/0x30 [<c04d0e34>] ? do_writepages+0x14/0x30 [<c0526600>] ? writeback_single_inode+0xa0/0x240 [<c0526971>] ? writeback_sb_inodes+0xc1/0x180 [<c0526ab8>] ? writeback_inodes_wb+0x88/0x140 [<c0526d7b>] ? wb_writeback+0x20b/0x320 [<c045aca7>] ? lock_timer_base+0x27/0x50 [<c0526fe0>] ? wb_do_writeback+0x150/0x190 [<c05270a8>] ? bdi_writeback_thread+0x88/0x1f0 [<c043b680>] ? complete+0x40/0x60 [<c0527020>] ? bdi_writeback_thread+0x0/0x1f0 [<c0469474>] ? kthread+0x74/0x80 [<c0469400>] ? kthread+0x0/0x80 [<c040a23e>] ? kernel_thread_helper+0x6/0x10 Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	jbd2: Fix I/O hang in jbd2_journal_release_jbd_inode	Brian King	2010-10-27	3	-6/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a hang seen in jbd2_journal_release_jbd_inode on a lot of Power 6 systems running with ext4. When we get in the hung state, all I/O to the disk in question gets blocked where we stay indefinitely. Looking at the task list, I can see we are stuck in jbd2_journal_release_jbd_inode waiting on a wake up. I added some debug code to detect this scenario and dump additional data if we were stuck in jbd2_journal_release_jbd_inode for longer than 30 minutes. When it hit, I was able to see that i_flags was 0, suggesting we missed the wake up. This patch changes i_flags to be an unsigned long, uses bit operators to access it, and adds barriers around the accesses. Prior to applying this patch, we were regularly hitting this hang on numerous systems in our test environment. After applying the patch, the hangs no longer occur. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: fix EOFBLOCKS_FL handling	Theodore Ts'o	2010-10-27	1	-29/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	It turns out we have several problems with how EOFBLOCKS_FL is handled. First of all, there was a fencepost error where we were not clearing the EOFBLOCKS_FL when fill in the last uninitialized block, but rather when we allocate the next block _after_ the uninitalized block. Secondly we were not testing to see if we needed to clear the EOFBLOCKS_FL when writing to the file O_DIRECT or when were converting an uninitialized block (which is the most common case). Google-Bug-Id: 2928259 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	Linux 2.6.36-rc6v2.6.36-rc6	Linus Torvalds	2010-09-28	1	-1/+1
\|
*	MN10300: Handle missing sys_cacheflush() when caching disabled	David Howells	2010-09-28	2	-8/+27
\| \| \| \| \| \| \| \| \| \| \| \|	When caching is disabled on the MN10300 arch, the sys_cacheflush() function is removed by conditional stuff in the makefiles, but is still referred to by the syscall table. Provide a null version that just returns 0 when caching is disabled (or -EINVAL if the arguments are silly). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	alpha: fix compile problem in arch/alpha/kernel/signal.c	Linus Torvalds	2010-09-28	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	Tssk. Apparently Al hadn't checked commit c52c2ddc1dfa ("alpha: switch osf_sigprocmask() to use of sigprocmask()") at all. It doesn't compile. Fixed as per suggestions from Michael Cree. Reported-by: Michael Cree <mcree@orcon.net.nz> Cc: Al Viro <viro@ftp.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge branch 'upstream-linus' of ↵	Linus Torvalds	2010-09-28	4	-14/+24
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: ahci: fix module refcount breakage introduced by libahci split
\| *	ahci: fix module refcount breakage introduced by libahci split	Tejun Heo	2010-09-28	4	-14/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	libata depends on scsi_host_template for module reference counting and sht's should be owned by each low level driver. During libahci split, the sht was left with libahci.ko leaving the actual low level drivers not reference counted. This made ahci and ahci_platform always unloadable even while they're being actively used. Fix it by defining AHCI_SHT() macro in ahci.h and defining a sht for each low level ahci driver. stable: only applicable to 2.6.35. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Pedro Francisco <pedrogfrancisco@gmail.com> Tested-by: Michael Tokarev <mjt@tls.msk.ru> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* \|	Merge branch 'hwmon-for-linus' of ↵	Linus Torvalds	2010-09-28	1	-0/+1
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging * 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging: hwmon (coretemp): Fix build breakage if SMP is undefined
\| * \|	hwmon (coretemp): Fix build breakage if SMP is undefined	Guenter Roeck	2010-09-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit e40cc4bdfd4b89813f072f72bd9c7055814d3f0f introduced a build breakage if CONFIG_SMP is undefined. This commit fixes the problem. This fix is only a workaround. For a real fix, cpu_sibling_mask() should be defined in UP include code, eg in linux/smp.h, and asm/smp.h should not be included directly. This fix is currently not possible because asm/smp.h defines cpu_sibling_mask() unconditionally and is included directly from many source files. Reported-by: Ingo Molnar <mingo@elte.hu> Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com> Cc: Fenghua Yu <fenghua.yu@intel.com>
* \| \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2010-09-28	2	-3/+4
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI: fix pci_resource_alignment prototype
\| * \| \|	PCI: fix pci_resource_alignment prototype	Cam Macdonell	2010-09-09	2	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes the prototype for both pci_resource_alignment() and pci_sriov_resource_alignment(). Patch started as debugging effort from Cam Macdonell. Cc: Cam Macdonell <cam@cs.ualberta.ca> Cc: Avi Kivity <avi@redhat.com> [chrisw: add iov bits] Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* \| \| \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds	2010-09-28	53	-196/+444
\|\ \ \ \ \| \|_\|_\|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits) tcp: Fix >4GB writes on 64-bit. net/9p: Mount only matching virtio channels de2104x: fix ethtool tproxy: check for transparent flag in ip_route_newports ipv6: add IPv6 to neighbour table overflow warning tcp: fix TSO FACK loss marking in tcp_mark_head_lost 3c59x: fix regression from patch "Add ethtool WOL support" ipv6: add a missing unregister_pernet_subsys call s390: use free_netdev(netdev) instead of kfree() sgiseeq: use free_netdev(netdev) instead of kfree() rionet: use free_netdev(netdev) instead of kfree() ibm_newemac: use free_netdev(netdev) instead of kfree() smsc911x: Add MODULE_ALIAS() net: reset skb queue mapping when rx'ing over tunnel br2684: fix scheduling while atomic de2104x: fix TP link detection de2104x: fix power management de2104x: disable autonegotiation on broken hardware net: fix a lockdep splat e1000e: 82579 do not gate auto config of PHY by hardware during nominal use ...
\| * \| \|	tcp: Fix >4GB writes on 64-bit.	David S. Miller	2010-09-27	3	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes kernel bugzilla #16603 tcp_sendmsg() truncates iov_len to an 'int' which a 4GB write to write zero bytes, for example. There is also the problem higher up of how verify_iovec() works. It wants to prevent the total length from looking like an error return value. However it does this using 'int', but syscalls return 'long' (and thus signed 64-bit on 64-bit machines). So it could trigger false-positives on 64-bit as written. So fix it to use 'long'. Reported-by: Olaf Bonorden <bono@onlinehome.de> Reported-by: Daniel Büse <dbuese@gmx.de> Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	net/9p: Mount only matching virtio channels	Sven Eckelmann	2010-09-27	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	p9_virtio_create will only compare the the channel's tag characters against the device name till the end of the channel's tag but not till the end of the device name. This means that if a user defines channels with the tags foo and foobar then he would mount foo when he requested foonot and may mount foo when he requested foobar. Thus it is necessary to check both string lengths against each other in case of a successful partial string match. Signed-off-by: Sven Eckelmann <sven.eckelmann@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	de2104x: fix ethtool	Ondrej Zary	2010-09-27	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the interface is up, using ethtool breaks it because: a) link is put down but media_timer interval is not shortened to NO_LINK b) rxtx is stopped but not restarted Also manual 10baseT-HD (and probably FD too - untested) mode does not work - the link is forced up, packets are transmitted but nothing is received. Changing CSR14 value to match documentation (not disabling link check) fixes this. Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	Merge branch 'vhost-net' of ↵	David S. Miller	2010-09-27	1	-3/+4
\| \|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
\| \| * \| \|	vhost: fix log ctx signalling	Michael S. Tsirkin	2010-09-22	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The log eventfd signalling got put in dead code. We didn't notice because qemu currently does polling instead of eventfd select. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
\| * \| \| \|	tproxy: check for transparent flag in ip_route_newports	Ulrich Weber	2010-09-27	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	as done in ip_route_connect() Signed-off-by: Ulrich Weber <uweber@astaro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	ipv6: add IPv6 to neighbour table overflow warning	Ulrich Weber	2010-09-27	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	IPv4 and IPv6 have separate neighbour tables, so the warning messages should be distinguishable. [ Add a suitable message prefix on the ipv4 side as well -DaveM ] Signed-off-by: Ulrich Weber <uweber@astaro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	tcp: fix TSO FACK loss marking in tcp_mark_head_lost	Yuchung Cheng	2010-09-27	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When TCP uses FACK algorithm to mark lost packets in tcp_mark_head_lost(), if the number of packets in the (TSO) skb is greater than the number of packets that should be marked lost, TCP incorrectly exits the loop and marks no packets lost in the skb. This underestimates tp->lost_out and affects the recovery/retransmission. This patch fargments the skb and marks the correct amount of packets lost. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	3c59x: fix regression from patch "Add ethtool WOL support"	Jan Beulich	2010-09-27	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch (commit 690a1f2002a3091bd18a501f46c9530f10481463) added a new call site for acpi_set_WOL() without checking that the function is actually suitable to be called via vortex_set_wol+0xcd/0xe0 [3c59x] dev_ethtool+0xa5a/0xb70 dev_ioctl+0x2e0/0x4b0 T.961+0x49/0x50 sock_ioctl+0x47/0x290 do_vfs_ioctl+0x7f/0x340 sys_ioctl+0x80/0xa0 system_call_fastpath+0x16/0x1b i.e. outside of code paths run when the device is not yet enabled or already disabled. In particular, putting the device into D3hot is a pretty bad idea when it was already brought up. Furthermore, all prior callers of the function made sure they're actually dealing with a PCI device, while the newly added one didn't. In the same spirit, the .get_wol handler shouldn't indicate support for WOL for non-PCI devices. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	ipv6: add a missing unregister_pernet_subsys call	Neil Horman	2010-09-26	3	-3/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Clean up a missing exit path in the ipv6 module init routines. In addrconf_init we call ipv6_addr_label_init which calls register_pernet_subsys for the ipv6_addr_label_ops structure. But if module loading fails, or if the ipv6 module is removed, there is no corresponding unregister_pernet_subsys call, which leaves a now-bogus address on the pernet_list, leading to oopses in subsequent registrations. This patch cleans up both the failed load path and the unload path. Tested by myself with good results. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> include/net/addrconf.h \| 1 + net/ipv6/addrconf.c \| 11 ++++++++--- net/ipv6/addrlabel.c \| 5 +++++ 3 files changed, 14 insertions(+), 3 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	s390: use free_netdev(netdev) instead of kfree()	Vasiliy Kulikov	2010-09-26	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Freeing netdev without free_netdev() leads to net, tx leaks. I might lead to dereferencing freed pointer. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) @@ struct net_device* dev; @@ -kfree(dev) +free_netdev(dev) Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	sgiseeq: use free_netdev(netdev) instead of kfree()	Kulikov Vasiliy	2010-09-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Freeing netdev without free_netdev() leads to net, tx leaks. I might lead to dereferencing freed pointer. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) @@ struct net_device* dev; @@ -kfree(dev) +free_netdev(dev) Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	rionet: use free_netdev(netdev) instead of kfree()	Kulikov Vasiliy	2010-09-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Freeing netdev without free_netdev() leads to net, tx leaks. I might lead to dereferencing freed pointer. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) @@ struct net_device* dev; @@ -kfree(dev) +free_netdev(dev) Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	ibm_newemac: use free_netdev(netdev) instead of kfree()	Kulikov Vasiliy	2010-09-26	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Freeing netdev without free_netdev() leads to net, tx leaks. I might lead to dereferencing freed pointer. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) @@ struct net_device* dev; @@ -kfree(dev) +free_netdev(dev) Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	smsc911x: Add MODULE_ALIAS()	Vincent Stehlé	2010-09-26	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This enables auto loading for the smsc911x ethernet driver. Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	net: reset skb queue mapping when rx'ing over tunnel	Tom Herbert	2010-09-26	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reset queue mapping when an skb is reentering the stack via a tunnel. On second pass, the queue mapping from the original device is no longer valid. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	br2684: fix scheduling while atomic	Karl Hiramoto	2010-09-26	1	-10/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	You can't call atomic_notifier_chain_unregister() while in atomic context. Fix, call un/register_atmdevice_notifier in module __init and __exit. Bug report: http://comments.gmane.org/gmane.linux.network/172603 Reported-by: Mikko Vinni <mmvinni@yahoo.com> Tested-by: Mikko Vinni <mmvinni@yahoo.com> Signed-off-by: Karl Hiramoto <karl@hiramoto.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	de2104x: fix TP link detection	Ondrej Zary	2010-09-26	1	-2/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Compex FreedomLine 32 PnP-PCI2 cards have only TP and BNC connectors but the SROM contains AUI port too. When TP loses link, the driver switches to non-existing AUI port (which reports that carrier is always present). Connecting TP back generates LinkPass interrupt but de_media_interrupt() is broken - it only updates the link state of currently connected media, ignoring the fact that LinkPass and LinkFail bits of MacStatus register belong to the TP port only (the chip documentation says that). This patch changes de_media_interrupt() to switch media to TP when link goes up (and media type is not locked) and also to update the link state only when the TP port is used. Also the NonselPortActive (and also SelPortActive) bits of SIAStatus register need to be cleared (by writing 1) after reading or they're useless. Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	de2104x: fix power management	Ondrej Zary	2010-09-26	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	At least my 21041 cards come out of suspend with bus mastering disabled so they did not work after resume(no data transferred). After adding pci_set_master(), the driver oopsed immediately on resume - because de_clean_rings() is called on suspend but de_init_rings() call was missing in resume. Also disable link (reset SIA) before sleep (de4x5 does this too). Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	de2104x: disable autonegotiation on broken hardware	Ondrej Zary	2010-09-24	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	At least on older 21041-AA chips (mine is rev. 11), TP duplex autonegotiation causes the card not to work at all (link is up but no packets are transmitted). de4x5 disables autonegotiation completely. But it seems to work on newer (21041-PA rev. 21) so disable it only on rev<20 chips. Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	net: fix a lockdep splat	Eric Dumazet	2010-09-24	6	-26/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have for each socket : One spinlock (sk_slock.slock) One rwlock (sk_callback_lock) Possible scenarios are : (A) (this is used in net/sunrpc/xprtsock.c) read_lock(&sk->sk_callback_lock) (without blocking BH) <BH> spin_lock(&sk->sk_slock.slock); ... read_lock(&sk->sk_callback_lock); ... (B) write_lock_bh(&sk->sk_callback_lock) stuff write_unlock_bh(&sk->sk_callback_lock) (C) spin_lock_bh(&sk->sk_slock) ... write_lock_bh(&sk->sk_callback_lock) stuff write_unlock_bh(&sk->sk_callback_lock) spin_unlock_bh(&sk->sk_slock) This (C) case conflicts with (A) : CPU1 [A] CPU2 [C] read_lock(callback_lock) <BH> spin_lock_bh(slock) <wait to spin_lock(slock)> <wait to write_lock_bh(callback_lock)> We have one problematic (C) use case in inet_csk_listen_stop() : local_bh_disable(); bh_lock_sock(child); // spin_lock_bh(&sk->sk_slock) WARN_ON(sock_owned_by_user(child)); ... sock_orphan(child); // write_lock_bh(&sk->sk_callback_lock) lockdep is not happy with this, as reported by Tetsuo Handa It seems only way to deal with this is to use read_lock_bh(callbacklock) everywhere. Thanks to Jarek for pointing a bug in my first attempt and suggesting this solution. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> Tested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	e1000e: 82579 do not gate auto config of PHY by hardware during nominal use	Bruce Allan	2010-09-22	1	-9/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For non-managed versions of 82579, set the bit that prevents the hardware from automatically configuring the PHY after resets only when the driver performs a reset, clear the bit after resets. This is so the hardware can configure the PHY automatically when the part is reset in a manner that is not controlled by the driver (e.g. in a virtual environment via PCI FLR) otherwise the PHY will be mis-configured causing issues such as failing to link at 1000Mbps. For managed versions of 82579, keep the previous behavior since the manageability firmware will handle the PHY configuration. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	e1000e: 82579 jumbo frame workaround causing CRC errors	Bruce Allan	2010-09-22	2	-21/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The subject workaround was causing CRC errors due to writing the wrong register with updates of the RCTL register. It was also found that the workaround function which modifies the RCTL register was being called in the middle of a read-modify-write operation of the RCTL register, so the function call has been moved appropriately. Lastly, jumbo frames must not be allowed when CRC stripping is disabled by a module parameter because the workaround requires the CRC be stripped. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	e1000e: 82579 unaccounted missed packets	Bruce Allan	2010-09-22	2	-0/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On 82579, there is a hardware bug that can cause received packets to not get transferred from the PHY to the MAC due to K1 (a power saving feature of the PHY-MAC interconnect similar to ASPM L1). Since the MAC controls the accounting of missed packets, these will go unnoticed. Workaround the issue by setting the K1 beacon duration according to the link speed. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	e1000e: 82566DC fails to get link	Bruce Allan	2010-09-22	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Two recent patches to cleanup the reset[1] and initial PHY configuration[2] code paths for ICH/PCH devices inadvertently left out a 10msec delay and device ID check respectively which are necessary for the 82566DC (device id 0x104b) to be configured properly, otherwise it will not get link. [1] commit e98cac447cc1cc418dff1d610a5c79c4f2bdec7f [2] commit 3f0c16e84438d657d29446f85fe375794a93f159 CC: stable@kernel.org Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	e1000e: 82579 SMBus address and LEDs incorrect after device reset	Bruce Allan	2010-09-22	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since the hardware is prevented from performing automatic PHY configuration (the driver does it instead), the OEM_WRITE_ENABLE bit in the EXTCNF_CTRL register will not get cleared preventing the SMBus address and the LED configuration to be written to the PHY registers. On 82579, do not check the OEM_WRITE_ENABLE bit. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	e1000e: 82577/8/9 issues with device in Sx	Bruce Allan	2010-09-22	1	-8/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When going to Sx, disable gigabit in PHY (e1000_oem_bits_config_ich8lan) in addition to the MAC before configuring PHY wakeup otherwise the PHY configuration writes might be missed. Also write the LED configuration and SMBus address to the PHY registers (e1000_oem_bits_config_ich8lan and e1000_write_smbus_addr, respectively). The reset is no longer needed since re-auto-negotiation is forced in e1000_oem_bits_config_ich8lan and leaving it in causes issues with auto-negotiating the link. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	xfrm4: strip ECN bits from tos field	Ulrich Weber	2010-09-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	otherwise ECT(1) bit will get interpreted as RTO_ONLINK and routing will fail with XfrmOutBundleGenError. Signed-off-by: Ulrich Weber <uweber@astaro.com> Signed-off-by: David S. Miller <davem@davemloft.net>