op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message	Author	Age	Files	Lines
*	ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessary	Theodore Ts'o	2011-01-10	7	-25/+74
\| \| \| \| \| \| \| \| \| \|	Replace the jbd2_inode structure (which is 48 bytes) with a pointer and only allocate the jbd2_inode when it is needed --- that is, when the file system has a journal present and the inode has been opened for writing. This allows us to further slim down the ext4_inode_info structure. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: drop i_state_flags on architectures with 64-bit longs	Theodore Ts'o	2011-01-10	3	-9/+25
\|	We can store the dynamic inode state flags in the high bits of EXT4_I(inode)->i_flags, and eliminate i_state_flags. This saves 8 bytes from the size of the ext4_inode_info structure, which, when multiplied by the number of inodes in the inode cache, can save a lot of memory. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
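The idea of keeping dynamic state bits in the upper half of a 64-bit flags word can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct demo_inode_info`, `DEMO_STATE_SHIFT` and the `demo_*` helpers are hypothetical names, not the kernel's, and the real code uses atomic bit operations on an unsigned long rather than plain shifts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the inode info; only the flags word matters. */
struct demo_inode_info {
	uint64_t i_flags;	/* low 32 bits: regular flags, high 32 bits: dynamic state */
};

/* Assumed split point: state bits live above bit 31. */
#define DEMO_STATE_SHIFT 32

/* Set dynamic state bit `bit` (0..31) without touching the low 32 bits. */
void demo_set_state_flag(struct demo_inode_info *ei, unsigned int bit)
{
	ei->i_flags |= (uint64_t)1 << (bit + DEMO_STATE_SHIFT);
}

/* Clear dynamic state bit `bit` (0..31). */
void demo_clear_state_flag(struct demo_inode_info *ei, unsigned int bit)
{
	ei->i_flags &= ~((uint64_t)1 << (bit + DEMO_STATE_SHIFT));
}

/* Report whether dynamic state bit `bit` (0..31) is set. */
bool demo_test_state_flag(const struct demo_inode_info *ei, unsigned int bit)
{
	return (ei->i_flags >> (bit + DEMO_STATE_SHIFT)) & 1;
}
```

Because both kinds of flags share one word, a separate state field (and its padding) is no longer needed on 64-bit targets.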
*	ext4: reorder ext4_inode_info structure elements to remove unneeded padding	Theodore Ts'o	2011-01-10	1	-3/+4
\| \| \| \| \| \| \|	By reordering the elements in the ext4_inode_info structure, we can reduce the padding needed on an x86_64 system by 16 bytes. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
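The effect of member ordering on padding can be illustrated with two small structures. The structs below are hypothetical and unrelated to the real ext4_inode_info layout; on a typical x86_64 (LP64) ABI the first is expected to occupy 32 bytes and the second 24, but the exact sizes are ABI-dependent.

```c
#include <stddef.h>

/* Small members interleaved with 8-byte members: each char forces padding. */
struct demo_padded {
	char a;
	long b;
	char c;
	long d;
};

/* Same members, largest first: the two chars share one padded tail. */
struct demo_reordered {
	long b;
	long d;
	char a;
	char c;
};

/* Size of the interleaved layout on the current ABI. */
size_t demo_padded_size(void)
{
	return sizeof(struct demo_padded);
}

/* Size of the reordered layout on the current ABI. */
size_t demo_reordered_size(void)
{
	return sizeof(struct demo_reordered);
}
```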
*	ext4: drop ec_type from the ext4_ext_cache structure	Theodore Ts'o	2011-01-10	3	-28/+18
\| \| \| \| \| \| \| \| \| \| \|	We can encode the ec_type information by using ee_len == 0 to denote EXT4_EXT_CACHE_NO, ee_start == 0 to denote EXT4_EXT_CACHE_GAP, and if neither is true, then the cache type must be EXT4_EXT_CACHE_EXTENT. This allows us to reduce the size of ext4_ext_inode by another 8 bytes. (ec_type is 4 bytes, plus another 4 bytes of padding) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
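The encoding described in this message can be sketched as a small classifier. The structure and enum below are hypothetical stand-ins using the field names from the message (ee_len, ee_start); the real structure layout and the exact kernel code are not shown on this page.

```c
#include <stdint.h>

/* Hypothetical names mirroring EXT4_EXT_CACHE_NO / _GAP / _EXTENT. */
enum demo_cache_type {
	DEMO_CACHE_NO,
	DEMO_CACHE_GAP,
	DEMO_CACHE_EXTENT
};

/* Hypothetical cache entry without an explicit type field. */
struct demo_ext_cache {
	uint64_t ee_start;	/* first physical block, 0 for a gap */
	uint32_t ee_block;	/* first logical block */
	uint32_t ee_len;	/* number of blocks, 0 when the cache is empty */
};

/* Derive the cache type from ee_len and ee_start, as the message describes. */
enum demo_cache_type demo_cache_type(const struct demo_ext_cache *c)
{
	if (c->ee_len == 0)
		return DEMO_CACHE_NO;
	if (c->ee_start == 0)
		return DEMO_CACHE_GAP;
	return DEMO_CACHE_EXTENT;
}
```

The order of the checks matters: an empty cache is recognized by its length before the start block is examined.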
*	ext4: use ext4_lblk_t instead of sector_t for logical blocks	Theodore Ts'o	2011-01-10	4	-5/+5
\| \| \| \| \| \| \| \| \| \|	This fixes a number of places where we used sector_t instead of ext4_lblk_t for logical blocks, which for ext4 are still 32-bit data types. No point wasting space in the ext4_inode_info structure, and requiring 64-bit arithmetic on 32-bit systems, when it isn't necessary. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVED	Theodore Ts'o	2011-01-10	5	-8/+9
\|	Remove the short element i_delalloc_reserved_flag from the ext4_inode_info structure and replace it with a new bit in i_state_flags. Since we have an ext4_inode_info for every ext4 inode cached in the inode cache, any savings we can produce here is a very good thing from a memory utilization perspective. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: fix 32bit overflow in ext4_ext_find_goal()	Kazuya Mio	2011-01-10	1	-4/+26
\|	ext4_ext_find_goal() returns an ideal physical block number that the block allocator tries to allocate first. However, if a required file offset is smaller than the existing extent's one, ext4_ext_find_goal() returns a wrong block number because it may overflow at "block - le32_to_cpu(ex->ee_block)". This patch fixes the problem.
\|	ext4_ext_find_goal() will also return a wrong block number in case a file offset of the existing extent is too big. In this case, the ideal physical block number is fixed in ext4_mb_initialize_context(), so it's no problem.
\|	reproduce:
\|	# dd if=/dev/zero of=/mnt/mp1/tmp bs=127M count=1 oflag=sync
\|	# dd if=/dev/zero of=/mnt/mp1/file bs=512K count=1 seek=1 oflag=sync
\|	# filefrag -v /mnt/mp1/file
\|	Filesystem type is: ef53
\|	File size of /mnt/mp1/file is 1048576 (256 blocks, blocksize 4096)
\|	 ext logical physical expected length flags
\|	   0     128    67456             128 eof
\|	/mnt/mp1/file: 2 extents found
\|	# rm -rf /mnt/mp1/tmp
\|	# echo $((512*4096)) > /sys/fs/ext4/loop0/mb_stream_req
\|	# dd if=/dev/zero of=/mnt/mp1/file bs=512K count=1 oflag=sync conv=notrunc
\|	result (linux-2.6.37-rc2 + ext4 patch queue):
\|	# filefrag -v /mnt/mp1/file
\|	Filesystem type is: ef53
\|	File size of /mnt/mp1/file is 1048576 (256 blocks, blocksize 4096)
\|	 ext logical physical expected length flags
\|	   0       0    33280             128
\|	   1     128    67456    33407    128 eof
\|	/mnt/mp1/file: 2 extents found
\|	result (apply this patch):
\|	# filefrag -v /mnt/mp1/file
\|	Filesystem type is: ef53
\|	File size of /mnt/mp1/file is 1048576 (256 blocks, blocksize 4096)
\|	 ext logical physical expected length flags
\|	   0       0    66560             128
\|	   1     128    67456    66687    128 eof
\|	/mnt/mp1/file: 2 extents found
\|	Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
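The wraparound described here comes from subtracting two unsigned 32-bit logical block numbers when the requested block precedes the extent's first block. A minimal user-space sketch follows; `demo_goal_naive` and `demo_goal_checked` are hypothetical helpers, and the checked variant is one way to avoid the wraparound, not necessarily the exact code of the patch.

```c
#include <stdint.h>

/*
 * Naive goal: extent's physical start plus the logical distance.
 * When block < ext_lblk the 32-bit subtraction wraps to a huge value.
 */
uint64_t demo_goal_naive(uint64_t ext_pblk, uint32_t ext_lblk, uint32_t block)
{
	return ext_pblk + (uint32_t)(block - ext_lblk);
}

/*
 * Checked goal: choose the direction first, then add or subtract.
 * Assumes ext_pblk is at least as large as the backward distance.
 */
uint64_t demo_goal_checked(uint64_t ext_pblk, uint32_t ext_lblk, uint32_t block)
{
	if (block >= ext_lblk)
		return ext_pblk + (block - ext_lblk);
	return ext_pblk - (ext_lblk - block);
}
```

With the extent from the reproducer (logical block 128 at physical block 67456) and a request for logical block 0, the naive form yields a value far beyond the 32-bit range, while the checked form yields 67328. The physical blocks the allocator finally chooses, as shown in the filefrag output, depend on more than this goal value.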
*	ext4: add more error checks to ext4_mkdir()	Namhyung Kim	2011-01-10	1	-7/+14
\| \| \| \| \| \| \| \| \|	Check return value of ext4_journal_get_write_access, ext4_journal_dirty_metadata and ext4_mark_inode_dirty. Move brelse() under 'out_stop' to release bh properly in case of journal error. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: ext4_ext_migrate should use NULL not 0	Eric Paris	2011-01-10	1	-1/+1
\| \| \| \| \| \| \| \| \|	ext4_ext_migrate() calls ext4_new_inode() and passes 0 instead of a pointer to a struct qstr. This patch uses NULL, to make it obvious to the caller that this was a pointer. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Use ext4_error_file() to print the pathname to the corrupted inode	Theodore Ts'o	2011-01-10	4	-34/+49
\| \| \| \| \| \| \|	Where the file pointer is available, use ext4_error_file() instead of ext4_error_inode(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: use IS_ERR() to check for errors in ext4_error_file	Dan Carpenter	2011-01-10	1	-1/+1
\| \| \| \| \| \| \| \|	d_path() returns an ERR_PTR and it doesn't return NULL. This is in ext4_error_file() and no one actually calls ext4_error_file(). Signed-off-by: Dan Carpenter <error27@gmail.com>
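Why a NULL check is the wrong test for an ERR_PTR-style return can be shown with a user-space imitation of the convention. The `demo_*` helpers and `DEMO_MAX_ERRNO` below are hypothetical re-creations for illustration only; kernel code should use the real IS_ERR(), PTR_ERR() and ERR_PTR().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bound: error codes occupy the top 4095 values of the address space. */
#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True when the pointer value falls in the reserved error range. */
int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-DEMO_MAX_ERRNO);
}

/* Recover the negative errno value from an error pointer. */
long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}
```

An encoded error is never NULL, so a test such as `if (!path)` silently accepts it; only the range check catches it.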
*	ext4: test the correct variable in ext4_init_pageio()	Dan Carpenter	2011-01-10	1	-1/+1
\|	This is a copy and paste error. The intent was to check "io_end_cachep". We tested "io_page_cachep" earlier. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext2: remove dead code in ext2_xattr_get	Wang Sheng-Hui	2011-01-10	1	-8/+0
\| \| \| \| \| \|	Reviewed-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Wang Sheng-Hui <crosslonelyover@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext2,ext3,ext4: clarify comment for extN_xattr_set_handle	Wang Sheng-Hui	2011-01-10	3	-3/+3
\| \| \| \| \|	Signed-off-by: Wang Sheng-Hui <crosslonelyover@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: clean up ext4_xattr_list()'s error code checking and return strategy	Theodore Ts'o	2011-01-10	1	-13/+13
\| \| \| \| \| \| \|	Any time you see code that tries to add error codes together, you should want to claw your eyes out... Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: remove warning message from ext4_issue_discard helper	Lukas Czerner	2011-01-10	1	-12/+10
\|	ext4_issue_discard is supposed to be a helper for calling discard; however, if the underlying device does not support discard, it prints a warning message and clears the DISCARD t_mount_opt flag. Since it can be (and is) used by others, it should do neither and should let the caller handle the error case. This commit removes the warning message and the flag setting from ext4_issue_discard and performs them only where they are really needed (release_blocks_on_commit). The FITRIM ioctl should neither set any flags nor print warning messages, so get rid of the warning there as well. Signed-off-by: Lukas Czerner <lczerner@redhat.com>
*	ext4: fix possible overflow in ext4_trim_fs()	Lukas Czerner	2011-01-10	1	-0/+6
\|	When determining the last group through ext4_get_group_no_and_offset(), the result may be wrong when range->start and range->len are too big, because summing those two numbers may overflow. Fix that by checking range->len and limiting its value to ext4_blocks_count(). This commit was tested by myself with the expected result. Signed-off-by: Lukas Czerner <lczerner@redhat.com>
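The overflow arises because start + len is computed in 64 bits and can wrap before it is compared against the file system size. A minimal sketch of clamping the length first is given below; `demo_clamp_trim_len` is a hypothetical helper, and it clamps to the blocks remaining after `start`, which is slightly stricter than the limit to ext4_blocks_count() that the message describes.

```c
#include <stdint.h>

/*
 * Return a length such that start + length cannot wrap and cannot
 * run past blocks_count. Returns 0 when start is already out of range.
 */
uint64_t demo_clamp_trim_len(uint64_t start, uint64_t len, uint64_t blocks_count)
{
	if (start >= blocks_count)
		return 0;
	if (len > blocks_count - start)
		len = blocks_count - start;
	return len;
}
```

The comparison is written as `len > blocks_count - start` rather than `start + len > blocks_count` so that the check itself cannot overflow.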
*	ext4: Add error checking to kmem_cache_alloc() call in ext4_free_blocks()	Theodore Ts'o	2010-12-20	1	-1/+5
\| \| \| \|	Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Use printf extension %pV	Joe Perches	2010-12-19	1	-17/+23
\| \| \| \| \| \| \| \| \| \| \|	Using %pV reduces the number of printk calls and eliminates any possible message interleaving from other printk calls. In function __ext4_grp_locked_error also added KERN_CONT to some printks. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Use vzalloc in ext4_fill_flex_info()	Joe Perches	2010-12-19	1	-8/+7
\| \| \| \| \|	Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: zero out nanosecond timestamps for small inodes	Eric Sandeen	2010-12-19	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	When nanosecond timestamp resolution isn't supported on an ext4 partition (inode size = 128), stat() appears to be returning uninitialized garbage in the nanosecond component of timestamps. EXT4_INODE_GET_XTIME should zero out tv_nsec when EXT4_FITS_IN_INODE evaluates to false. Reported-by: Jordan Russell <jr-list-2010@quo.to> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
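The fix amounts to giving tv_nsec a defined value on the path where the extra timestamp field does not fit in the on-disk inode. A simplified sketch follows; `struct demo_timespec` and `demo_get_xtime` are hypothetical, the `fits_in_inode` argument stands in for the EXT4_FITS_IN_INODE check, and the two-bit shift assumes the usual ext4 layout in which the low two bits of the extra field extend the epoch.

```c
#include <stdint.h>

/* Hypothetical user-space timestamp. */
struct demo_timespec {
	int64_t tv_sec;
	long tv_nsec;
};

/*
 * Decode an on-disk timestamp. raw_extra is only meaningful when the
 * inode is large enough to hold it; otherwise nanoseconds must be zeroed
 * instead of being left at whatever value the caller's struct contained.
 */
void demo_get_xtime(struct demo_timespec *ts, uint32_t raw_sec,
		    uint32_t raw_extra, int fits_in_inode)
{
	ts->tv_sec = (int32_t)raw_sec;
	if (fits_in_inode)
		ts->tv_nsec = (long)(raw_extra >> 2);
	else
		ts->tv_nsec = 0;
}
```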
*	ext4: optimize ext4_check_dir_entry() with unlikely() annotations	Theodore Ts'o	2010-12-19	3	-25/+32
\| \| \| \| \| \| \| \|	This function gets called a lot for large directories, and the answer is almost always "no, no, there's no problem". This means using unlikely() is a good thing. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: use kmem_cache_zalloc() in ext4_init_io_end()	Jesper Juhl	2010-12-19	1	-4/+1
\|	Take advantage of kmem_cache_zalloc() to remove a memset() call in ext4_init_io_end() and save a few bytes.
\|	Before:
\|	[jj@dragon linux-2.6]$ size fs/ext4/page-io.o
\|	   text    data     bss     dec     hex filename
\|	   3016       0     624    3640     e38 fs/ext4/page-io.o
\|	After:
\|	[jj@dragon linux-2.6]$ size fs/ext4/page-io.o
\|	   text    data     bss     dec     hex filename
\|	   3000       0     624    3624     e28 fs/ext4/page-io.o
\|	Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Remove redundant unlikely()	Tobias Klauser	2010-12-19	1	-1/+1
\| \| \| \| \| \| \|	IS_ERR() already implies unlikely(), so it can be omitted here. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	jbd2: simplify return path of journal_init_common	Theodore Ts'o	2010-12-18	1	-4/+2
\| \| \| \|	Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	jbd2: move debug message into debug #ifdef	Theodore Ts'o	2010-12-18	1	-1/+1
\| \| \| \| \| \| \|	This is a port to jbd2 of a patch which Namhyung Kim <namhyung@gmail.com> originally made to fs/jbd. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	jbd2: remove unnecessary goto statement	Theodore Ts'o	2010-12-18	1	-2/+0
\| \| \| \| \| \| \|	This is a port to jbd2 of a patch which Namhyung Kim <namhyung@gmail.com> originally made to fs/jbd. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	jbd2: use offset_in_page() instead of manual calculation	Theodore Ts'o	2010-12-18	1	-1/+1
\| \| \| \| \| \| \|	This is a port to jbd2 of a patch which Namhyung Kim <namhyung@gmail.com> originally made to fs/jbd. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	jbd2: Fix a debug message in do_get_write_access()	Theodore Ts'o	2010-12-18	1	-1/+1
\| \| \| \| \| \| \| \| \|	'buffer_head' should be 'journal_head' This is a port of a patch which Namhyung Kim <namhyung@gmail.com> made to fs/jbd to jbd2. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	jbd2: Use pr_notice_ratelimited() in journal_alloc_journal_head()	Theodore Ts'o	2010-12-17	1	-6/+2
\| \| \| \| \| \| \| \| \|	We had an open-coded version of printk_ratelimited(); use the provided abstraction to make the code cleaner and easier to understand. Based on a similar patch for fs/jbd from Namhyung Kim <namhyung@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Use pr_warning_ratelimited() instead of printk_ratelimit()	Theodore Ts'o	2010-12-17	1	-2/+2
\| \| \| \| \| \| \|	printk_ratelimit() is deprecated since it is a global instead of a per-printk ratelimit. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Fix up comments in inode.c	Theodore Ts'o	2010-12-16	1	-4/+13
\| \| \| \| \| \| \| \|	This fixes up some broken argument descriptions that Namhyung Kim had originally submitted for ext3. This fixes the comments that were still applicable in ext4. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Add second mount options field since the s_mount_opt is full up	Theodore Ts'o	2010-12-15	2	-2/+13
\| \| \| \|	Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Move struct ext4_mount_options from ext4.h to super.c	Theodore Ts'o	2010-12-15	2	-16/+15
\| \| \| \| \| \| \|	Move the ext4_mount_options structure definition from ext4.h, since it is only used in super.c. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Simplify the usage of clear_opt() and set_opt() macros	Theodore Ts'o	2010-12-15	3	-84/+86
\| \| \| \| \| \| \| \|	Change clear_opt() and set_opt() to take a superblock pointer instead of a pointer to EXT4_SB(sb)->s_mount_opt. This makes it easier for us to support a second mount option field. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	Linux 2.6.37-rc6 v2.6.37-rc6	Linus Torvalds	2010-12-15	1	-1/+1
\|
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6	Linus Torvalds	2010-12-15	1	-0/+1
\|\ \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: ghash-intel - ghash-clmulni-intel_glue needs err.h
\| *	crypto: ghash-intel - ghash-clmulni-intel_glue needs err.h	Randy Dunlap	2010-12-15	1	-0/+1
\| \|	Add missing header file:
\| \|	arch/x86/crypto/ghash-clmulni-intel_glue.c:256: error: implicit declaration of function 'IS_ERR'
\| \|	arch/x86/crypto/ghash-clmulni-intel_glue.c:257: error: implicit declaration of function 'PTR_ERR'
\| \|	Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* \|	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4	Linus Torvalds	2010-12-15	4	-4/+18
\|\ \	* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
\| \| \|	  ext4: fix typo which broke '..' detection in ext4_find_entry()
\| \| \|	  ext4: Turn off multiple page-io submission by default
\| * \|	ext4: fix typo which broke '..' detection in ext4_find_entry()	Aaro Koskinen	2010-12-14	1	-1/+1
\| \| \|	There should be a check for the NUL character instead of '0'. Fortunately the only thing that cares about this is NFS serving, which is why we didn't notice this in the merge window testing. Reported-by: Phil Carmody <ext-phil.2.carmody@nokia.com> Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
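The difference between the NUL character and the digit '0' in a ".." test can be shown with two tiny predicates. Both functions are hypothetical illustrations and are not the kernel's code; they assume a NUL-terminated name of at least two characters.

```c
/* Correct test: the name is exactly "..", i.e. terminated after two dots. */
int demo_is_dotdot(const char *name)
{
	return name[0] == '.' && name[1] == '.' && name[2] == '\0';
}

/* The typo: compares the third byte against the digit '0' (0x30). */
int demo_is_dotdot_typo(const char *name)
{
	return name[0] == '.' && name[1] == '.' && name[2] == '0';
}
```

The mistyped variant never matches "..", and instead matches any name that starts with "..0".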
\| * \|	ext4: Turn off multiple page-io submission by default	Theodore Ts'o	2010-12-14	3	-3/+17
\| \| \|	Jon Nelson has found a test case which causes postgresql to fail with the error:
\| \| \|	psql:t.sql:4: ERROR: invalid page header in block 38269 of relation base/16384/16581
\| \| \|	Under memory pressure, it looks like part of a file can end up getting replaced by zeros. Until we can figure out the cause, we'll roll back the change and use block_write_full_page() instead of ext4_bio_write_page(). The new, more efficient writing function can be used via the mount option mblk_io_submit, so we can test and fix the new page I/O code.
\| \| \|	To reproduce the problem, install postgres 8.4 or 9.0, and pin enough memory such that the system is just at the point of triggering writeback before running the following sql script:
\| \| \|	begin;
\| \| \|	create temporary table foo as select x as a, ARRAY[x] as b FROM generate_series(1, 10000000 ) AS x;
\| \| \|	create index foo_a_idx on foo (a);
\| \| \|	create index foo_b_idx on foo USING GIN (b);
\| \| \|	rollback;
\| \| \|	If the temporary table is created on a hard drive partition which is encrypted using dm_crypt, then under memory pressure, approximately 30-40% of the time, pgsql will issue the above failure. This patch should fix this problem, and the problem will come back if the file system is mounted with the mblk_io_submit mount option.
\| \| \|	Reported-by: Jon Nelson <jnelson@jamponi.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* \| \|	xen: Provide a variant of __RING_SIZE() that is an integer constant expression	Jeremy Fitzhardinge	2010-12-15	3	-5/+12
\| \| \|	Without this, gcc 4.5 won't compile xen-netfront and xen-blkfront, where this is being used to specify array sizes. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: David Miller <davem@davemloft.net> Cc: Stable Kernel <stable@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \|	MAINTAINERS: update MSM git tree	Daniel Walker	2010-12-15	1	-1/+1
\| \| \|	The MSM main git tree has changed over to this new address. Signed-off-by: Daniel Walker <dwalker@codeaurora.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \|	install_special_mapping skips security_file_mmap check.	Tavis Ormandy	2010-12-15	2	-4/+17
\| \|/ \|/\|	The install_special_mapping routine (used, for example, to set up the vdso) skips the security check before insert_vm_struct, allowing a local attacker to bypass the mmap_min_addr security restriction by limiting the available pages for special mappings.
\| \|	bprm_mm_init() also skips the check, and although I don't think this can be used to bypass any restrictions, I don't see any reason not to have the security check.
\| \|	$ uname -m
\| \|	x86_64
\| \|	$ cat /proc/sys/vm/mmap_min_addr
\| \|	65536
\| \|	$ cat install_special_mapping.s
\| \|	section .bss
\| \|	    resb BSS_SIZE
\| \|	section .text
\| \|	    global _start
\| \|	    _start:
\| \|	        mov     eax, __NR_pause
\| \|	        int     0x80
\| \|	$ nasm -D__NR_pause=29 -DBSS_SIZE=0xfffed000 -f elf -o install_special_mapping.o install_special_mapping.s
\| \|	$ ld -m elf_i386 -Ttext=0x10000 -Tbss=0x11000 -o install_special_mapping install_special_mapping.o
\| \|	$ ./install_special_mapping &
\| \|	[1] 14303
\| \|	$ cat /proc/14303/maps
\| \|	0000f000-00010000 r-xp 00000000 00:00 0 [vdso]
\| \|	00010000-00011000 r-xp 00001000 00:19 2453665 /home/taviso/install_special_mapping
\| \|	00011000-ffffe000 rwxp 00000000 00:00 0 [stack]
\| \|	It's worth noting that Red Hat are shipping with mmap_min_addr set to 4096.
\| \|	Signed-off-by: Tavis Ormandy <taviso@google.com> Acked-by: Kees Cook <kees@ubuntu.com> Acked-by: Robert Swiecki <swiecki@google.com> [ Changed to not drop the error code - akpm ] Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq	Linus Torvalds	2010-12-14	2	-3/+13
\|\ \	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
\| \| \|	  workqueue: It is likely that WORKER_NOT_RUNNING is true
\| \| \|	  MAINTAINERS: Add workqueue entry
\| \| \|	  workqueue: check the allocation of system_unbound_wq
\| * \|	workqueue: It is likely that WORKER_NOT_RUNNING is true	Steven Rostedt	2010-12-14	1	-2/+2
\| \| \|	Running the annotate branch profiler on three boxes, including my main box that runs firefox, evolution, xchat, and is part of the distcc farm, showed this with the likelys in the workqueue code:
\| \| \|	 correct incorrect  %        Function                  File              Line
\| \| \|	 ------- ---------  -        --------                  ----              ----
\| \| \|	      96    996253  99 wq_worker_sleeping             workqueue.c          703
\| \| \|	      96    996247  99 wq_worker_waking_up            workqueue.c          677
\| \| \|	The likely()s in this case were assuming that WORKER_NOT_RUNNING will most likely be false. But this is not the case. The reason is (and shown by adding trace_printks and testing it) that most of the time WORKER_PREP is set. In worker_thread() we have:
\| \| \|	    worker_clr_flags(worker, WORKER_PREP);
\| \| \|	    [ do work stuff ]
\| \| \|	    worker_set_flags(worker, WORKER_PREP, false);
\| \| \|	(that 'false' means not to wake up an idle worker)
\| \| \|	The wq_worker_sleeping() is called from schedule when a worker thread is putting itself to sleep, which happens most of the time outside of that [ do work stuff ]. The wq_worker_waking_up is called by the wakeup worker code, which is also called outside that [ do work stuff ]. Thus, the likely and unlikely used by those two functions are actually backwards. Remove the annotation and let gcc figure it out.
\| \| \|	Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Tejun Heo <tj@kernel.org>
\| * \|	MAINTAINERS: Add workqueue entry	Tejun Heo	2010-12-14	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Tejun Heo <tj@kernel.org>
\| * \|	workqueue: check the allocation of system_unbound_wq	Hitoshi Mitake	2010-11-26	1	-1/+2
\| \| \|	I found a trivial bug in the initialization of workqueues. The current init_workqueues doesn't check the result of the allocation of system_unbound_wq; this should be checked like the other queues. Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* \| \|	Merge branch 'for-linus' of git://neil.brown.name/md	Linus Torvalds	2010-12-14	2	-20/+21
\|\ \ \	* 'for-linus' of git://neil.brown.name/md:
\| \| \| \|	  md: protect against NULL reference when waiting to start a raid10.
\| \| \| \|	  md: fix bug with re-adding of partially recovered device.
\| \| \| \|	  md: fix possible deadlock in handling flush requests.
\| \| \| \|	  md: move code in to submit_flushes.
\| \| \| \|	  md: remove handling of flush_pending in md_submit_flush_data
\| * \| \|	md: protect against NULL reference when waiting to start a raid10.	NeilBrown	2010-12-09	2	-4/+3
\| \| \| \|	When we fail to start a raid10 for some reason, we call md_unregister_thread to kill the thread that was created. Unfortunately md_thread() will then make one call into the handler (raid10d) even though md_wakeup_thread has not been called. This is not safe and as md_unregister_thread is called after mddev->private has been set to NULL, it will definitely cause a NULL dereference.
\| \| \| \|	So fix this at both ends:
\| \| \| \|	 - md_thread should only call the handler if THREAD_WAKEUP has been set.
\| \| \| \|	 - raid10 should call md_unregister_thread before setting things to NULL just like all the other raid modules do.
\| \| \| \|	This is applicable to 2.6.35 and later.
\| \| \| \|	Cc: stable@kernel.org Reported-by: "Citizen" <citizen_lee@thecus.com> Signed-off-by: NeilBrown <neilb@suse.de>