op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	fix setpriority(PRIO_PGRP) thread iterator breakage	Ken Chen	2008-08-20	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When user calls sys_setpriority(PRIO_PGRP ...) on a NPTL style multi-LWP process, only the task leader of the process is affected, all other sibling LWP threads didn't receive the setting. The problem was that the iterator used in sys_setpriority() only iteartes over one task for each process, ignoring all other sibling thread. Introduce a new macro do_each_pid_thread / while_each_pid_thread to walk each thread of a process. Convert 4 call sites in {set/get}priority and ioprio_{set/get}. Signed-off-by: Ken Chen <kenchen@google.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	binfmt_misc: fix false -ENOEXEC when coupled with other binary handlers	Pavel Emelyanov	2008-08-20	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In case the binfmt_misc binary handler is registered before the e.g. script one (when for example being compiled as a module) the following situation may occur: 1. user launches a script, whose interpreter is a misc binary; 2. the load_misc_binary sets the misc_bang and returns -ENOEVEC, since the binary is a script; 3. the load_script_binary loads one and calls for search_binary_hander to run the interpreter; 4. the load_misc_binary is called again, but refuses to load the binary due to misc_bang bit set. The fix is to move the misc_bang setting lower - prior to the actual call to the search_binary_handler. Caused by the commit 3a2e7f47 (binfmt_misc.c: avoid potential kernel stack overflow) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reported-by: Kirill A. Shutemov <kirill@shutemov.name> Tested-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: <stable@kernel.org> [2.6.26.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	/proc/self/maps doesn't display the real file offset	Clement Calmels	2008-08-20	2	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This addresses http://bugzilla.kernel.org/show_bug.cgi?id=11318 In function show_map (file: fs/proc/task_mmu.c), if vma->vm_pgoff > 2^20 than (vma->vm_pgoff << PAGE_SIZE) is greater than 2^32 (with PAGE_SIZE equal to 4096 (i.e. 2^12). The next seq_printf use an unsigned long for the conversion of (vma->vm_pgoff << PAGE_SIZE), as a result the offset value displayed in /proc/self/maps is truncated if the page offset is greater than 2^20. A test that shows this issue: #define _GNU_SOURCE #include <sys/types.h> #include <sys/stat.h> #include <sys/mman.h> #include <stdlib.h> #include <stdio.h> #include <fcntl.h> #include <unistd.h> #include <string.h> #define PAGE_SIZE (getpagesize()) #if __i386__ # define U64_STR "%llx" #elif __x86_64 # define U64_STR "%lx" #else # error "Architecture Unsupported" #endif int main(int argc, char argv[]) { int fd; char addr; off64_t offset = 0x10000000; char filename = "/dev/zero"; fd = open(filename, O_RDONLY); if (fd < 0) { perror("open"); return 1; } offset = 0x10; printf("offset = " U64_STR "\n", offset); addr = (char)mmap64(NULL, PAGE_SIZE, PROT_READ, MAP_PRIVATE, fd, offset); if ((void)addr == MAP_FAILED) { perror("mmap64"); return 1; } { FILE fmaps; char line = NULL; size_t len = 0; ssize_t read; size_t filename_len = strlen(filename); fmaps = fopen("/proc/self/maps", "r"); if (!fmaps) { perror("fopen"); return 1; } while ((read = getline(&line, &len, fmaps)) != -1) { if ((read > filename_len + 1) && (strncmp(&line[read - filename_len - 1], filename, filename_len) == 0)) printf("%s", line); } if (line) free(line); fclose(fmaps); } close(fd); return 0; } [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Clement Calmels <cboulte@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge branch 'sh/for-2.6.27' of ↵	Linus Torvalds	2008-08-20	1	-1/+3
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 * 'sh/for-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: sh: Provide a FLAT_PLAT_INIT() definition. binfmt_flat: Stub in a FLAT_PLAT_INIT(). video: export sh_mobile_lcdc panel size sh: select memchunk size using kernel cmdline sh: export sh7723 VEU as VEU2H input: migor_ts compile and detection fix sh: remove MSTPCR defines from Migo-R header file sh: Update sh7763rdp defconfig sh: Add support sh7760fb to sh7763rdp board sh: Add support sh_eth to sh7763rdp board sh: Disable 64kB hugetlbpage size when using 64kB PAGE_SIZE. sh: Don't export __{s,u}divsi3_i4i from SH-2 libgcc. fix SH7705_CACHE_32KB compilation sh: mach-x3proto: Fix up smc91x platform data.
\| *	binfmt_flat: Stub in a FLAT_PLAT_INIT().	Takashi YOSHII	2008-08-11	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This provides a FLAT_PLAT_INIT() arch hook for platforms that need to set up specific register state prior to calling in to the process, as per ELF_PLAT_INIT(). Signed-off-by: Takashi YOSHII <yoshii.takashi@renesas.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* \|	vfat: fix 'sync' mount deadlock due to BKL->lock_super conversion	Linus Torvalds	2008-08-20	1	-7/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There was another FAT BKL conversion deadlock reported by Bart Trojanowski due to the BKL being used as a recursive lock by FAT, which was missed because it only triggers with 'sync' (or 'dirsync') mounts. The recursion worked for the BKL, but after the conversion to lock_super (which uses a mutex), it just deadlocks. Thanks to Bart for debugging this and testing the fix. The lock debugging information from the original report: ============================================= [ INFO: possible recursive locking detected ] 2.6.27-rc3-bisect-00448-ga7f5aaf #16 --------------------------------------------- mv/4020 is trying to acquire lock: (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20 but task is already holding lock: (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20 other info that might help us debug this: 3 locks held by mv/4020: #0: (&sb->s_type->i_mutex_key#9/1){--..}, at: [<c01b2336>] do_unlinkat+0x66/0x140 #1: (&sb->s_type->i_mutex_key#9){--..}, at: [<c01b0954>] vfs_unlink+0x84/0x110 #2: (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20 stack backtrace: Pid: 4020, comm: mv Not tainted 2.6.27-rc3-bisect-00448-ga7f5aaf #16 [<c014e694>] validate_chain+0x984/0xea0 [<c0108d70>] ? native_sched_clock+0x0/0xf0 [<c014ee9c>] __lock_acquire+0x2ec/0x9b0 [<c014f5cf>] lock_acquire+0x6f/0x90 [<c01a90fe>] ? lock_super+0x1e/0x20 [<c044e5fd>] mutex_lock_nested+0xad/0x300 [<c01a90fe>] ? lock_super+0x1e/0x20 [<c01a90fe>] ? lock_super+0x1e/0x20 [<c01a90fe>] lock_super+0x1e/0x20 [<f8b3a700>] fat_write_inode+0x60/0x2b0 [fat] [<c0450878>] ? _spin_unlock_irqrestore+0x48/0x80 [<f8b3a953>] ? fat_sync_inode+0x3/0x20 [fat] [<f8b3a962>] fat_sync_inode+0x12/0x20 [fat] [<f8b37c7e>] fat_remove_entries+0xbe/0x120 [fat] [<f8b422ef>] vfat_unlink+0x5f/0x90 [vfat] [<f8b42290>] ? vfat_unlink+0x0/0x90 [vfat] [<c01b0968>] vfs_unlink+0x98/0x110 [<c01b2400>] do_unlinkat+0x130/0x140 [<c016a8f5>] ? audit_syscall_entry+0x105/0x150 [<c01b253b>] sys_unlinkat+0x3b/0x40 [<c01040d3>] sysenter_do_call+0x12/0x3f ======================= where the deadlock is due to the nesting of lock_super from vfat_unlink to fat_write_inode: - do_unlinkat - vfs_unlink - vfat_unlink * lock_super - fat_remove_entries - fat_sync_inode - fat_write_inode * lock_super and the fix is to simply remove the use of lock_super() in fat_write_inode. The lock_super() there had been just an automatic conversion of the kernel lock to the superblock lock, but no locking was actually needed there, since the code in fat_write_inode already protected all relevant accesses with a spinlock (sbi->inode_hash_lock to be exact). The only code inside the BKL (and thus the superblock lock) was accesses tp local variables or calls to functions that have long been SMP-safe (i.e. sb_bread, mark_buffe_dirty and brlese). Bart reports: "Looks good. I ran 10 parallel processes creating 1M files truncating them, writing to them again and then deleting them. This patch fixes the issue I ran into. Signed-off-by: Bart Trojanowski <bart@jukie.net>" Reported-and-tested-by: Bart Trojanowski <bart@jukie.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6	Linus Torvalds	2008-08-15	2	-0/+3
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] mount of IPC$ breaks with iget patch [CIFS] remove trailing whitespace [CIFS] if get root inode fails during mount, cleanup tree connection
\| * \|	[CIFS] mount of IPC$ breaks with iget patch	Steve French	2008-08-14	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In looking at network named pipe support on cifs, I noticed that Dave Howell's iget patch: iget: stop CIFS from using iget() and read_inode() broke mounts to IPC$ (the interprocess communication share), and don't handle the error case (when getting info on the root inode fails). Thanks to Gunter who noted a typo in a debug line in the original version of this patch. CC: David Howells <dhowells@redhat.com> CC: Gunter Kukkukk <linux@kukkukk.com> CC: Stable Kernel <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
\| * \|	[CIFS] remove trailing whitespace	Steve French	2008-08-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Steve French <sfrench@us.ibm.com>
\| * \|	[CIFS] if get root inode fails during mount, cleanup tree connection	Steve French	2008-08-11	1	-0/+2
\| \|/ \| \| \| \| \| \| \| \|	CC: Stable Kernel <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
* \|	Merge branch 'linux-next' of git://git.infradead.org/~dedekind/ubifs-2.6	Linus Torvalds	2008-08-15	17	-243/+328
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'linux-next' of git://git.infradead.org/~dedekind/ubifs-2.6: (29 commits) UBIFS: xattr bugfixes UBIFS: remove unneeded check UBIFS: few commentary fixes UBIFS: fix budgeting request alignment in xattr code UBIFS: improve arguments checking in debugging messages UBIFS: always set i_generation to 0 UBIFS: correct spelling of "thrice". UBIFS: support splice_write UBIFS: minor tweaks in commit UBIFS: reserve more space for index UBIFS: print pid in dump function UBIFS: align inode data to eight UBIFS: improve budgeting checks UBIFS: correct orphan deletion order UBIFS: fix typos in comments UBIFS: do not union creat_sqnum and del_cmtno UBIFS: optimize deletions UBIFS: increment commit number earlier UBIFS: remove another unneeded function parameter UBIFS: remove unneeded function parameter ...
\| * \|	UBIFS: xattr bugfixes	Artem Bityutskiy	2008-08-14	2	-28/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Xattr code has not been tested for a while and there were serveral bugs. One of them is using wrong inode in 'ubifs_jnl_change_xattr()'. The other is a deadlock in 'ubifs_setxattr()': the i_mutex is locked in 'cap_inode_need_killpriv()' path, so deadlock happens when 'ubifs_setxattr()' tries to lock it again. Thanks to Zoltan Sogor for finding these bugs. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: remove unneeded check	Artem Bityutskiy	2008-08-13	1	-9/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit d70b67c8bc72ee23b55381bd6a884f4796692f77 fixed VFS and it never calls FS lookup function in deleted directories now. We may remove corresponding UBIFS check. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: few commentary fixes	Artem Bityutskiy	2008-08-13	2	-6/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: fix budgeting request alignment in xattr code	Zoltan Sogor	2008-08-13	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Data length has to be aligned in the budgeting request. Code in xattr.c did not do this. Signed-off-by: Zoltan Sogor <weth@inf.u-szeged.hu> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: improve arguments checking in debugging messages	Artem Bityutskiy	2008-08-13	1	-74/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use "if (0) printk()" construct in debugging print macros to make the debugging messages be checked even if debugging is off. This patch also removes some unneeded spaces and blank lines. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: always set i_generation to 0	Adrian Hunter	2008-08-13	3	-6/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	UBIFS does not presently re-use inode numbers, so leaving i_generation zero is most appropriate for now. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: correct spelling of "thrice".	Adrian Hunter	2008-08-13	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
\| * \|	UBIFS: support splice_write	Zoltan Sogor	2008-08-13	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Zoltan Sogor <weth@inf.u-szeged.hu> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: minor tweaks in commit	Artem Bityutskiy	2008-08-13	1	-19/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	No functional changes, just lessen the amount of indentations. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: reserve more space for index	Artem Bityutskiy	2008-08-13	4	-9/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	At the moment UBIFS reserves twice old index size space for the index. But this is not enough in some cases, because if the indexing node are very fragmented and there are many small gaps, while the dirty index has big znodes - in-the-gaps method would fail. Thus, reserve trise as more, in which case we are guaranteed that we can commit in any case. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: print pid in dump function	Artem Bityutskiy	2008-08-13	1	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Useful when something fails and there are many processes racing. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: align inode data to eight	Artem Bityutskiy	2008-08-13	5	-9/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	UBIFS aligns node lengths to 8, so budgeting has to do the same. Well, direntry, inode, and page budgets are already aligned, but not inode data budget (e.g., data in special devices or symlinks). Do this for inode data as well. Also, add corresponding debugging checks. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: improve budgeting checks	Artem Bityutskiy	2008-08-13	2	-1/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Budgeting is a crucial UBIFS subsystem - add more assertions to improve requests checking. This is not compiled in when UBIFS debugging is disabled. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: correct orphan deletion order	Adrian Hunter	2008-08-13	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The debug function that checks orphans, does so using the TNC mutex. That means it will not see a correct picture if the inode is removed from the orphan tree before it is removed from TNC. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
\| * \|	UBIFS: fix typos in comments	Adrian Hunter	2008-08-13	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
\| * \|	UBIFS: do not union creat_sqnum and del_cmtno	Adrian Hunter	2008-08-13	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The values in these two fields need to be preserved independently and so a union cannot be used. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
\| * \|	UBIFS: optimize deletions	Artem Bityutskiy	2008-08-13	3	-5/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Every time anything is deleted, UBIFS writes the deletion inode node twice - once in 'ubifs_jnl_update()' and the second time in 'ubifs_jnl_write_inode()'. However, the second write is not needed if no commit happened after 'ubifs_jnl_update()'. This patch checks that condition and avoids writing the deletion inode for the second time. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: increment commit number earlier	Artem Bityutskiy	2008-08-13	4	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Increment the commit number at the beginnig of the commit, instead of doing this after the commit. This is needed for further optimizations. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: remove another unneeded function parameter	Artem Bityutskiy	2008-08-13	1	-16/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The 'last_reference' parameter of 'pack_inode()' is not really needed because 'inode->i_nlink' may be tested instead. Zap it. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: remove unneeded function parameter	Artem Bityutskiy	2008-08-13	3	-16/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Simplify 'ubifs_jnl_write_inode()' by removing the 'deletion' parameter which is not really needed because we may test inode->i_nlink and check whether this is a deletion or not. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: do not write orphans back	Artem Bityutskiy	2008-08-13	1	-5/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Orphan inodes are deleted inodes which will disappear after FS re-mount. There is not need to write orphan inodes back, because they are not needed on the flash media. So optimize orphans a little by not writing them back. Just mark them as clean, free the budget, and report success to VFS. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: make ubifs_ro_mode() not inline	Adrian Hunter	2008-08-13	3	-14/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We use ubifs_ro_mode() quite a lot, and not in fast-path, so there is no reason to blow the code up by having it inlined. Also, we usually want R/O mode change to be seen to other CPUs as soon as possible, so when we make this a function call, we will automatically have a memory barrier. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: ensure UBIFS switches to read-only on error	Adrian Hunter	2008-08-13	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	UBI transparently handles write errors by automatically copying and remapping the affected eraseblock. If UBI is unable to do that, for example its pool of eraseblocks reserved for bad block handling is empty, then the error is propagated to UBIFS. UBIFS must protect the media from falling into an inconsistent state by immediately switching to read-only mode. In the case of log updates, this was not being done. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
\| * \|	UBIFS: fix error return in failure mode	Adrian Hunter	2008-08-13	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	UBIFS recovery testing debug facility simulates media failures. When simulating an IO error, the error code returned must be -EIO but it was not always if the user switched off the debug recovery testing option at the same time. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
\| * \|	UBIFS: free budget in delete_inode as well	Artem Bityutskiy	2008-08-13	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Although the inode is marked as clean when it is being deleted, it might stay and be used as orphan, and be marked as dirty. So we have to free the budget when we delete it. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: improve debugging	Artem Bityutskiy	2008-08-13	2	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. Print inode mode in some of debugging messages 2. Add few more useful assertions Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: fix budgeting calculations	Artem Bityutskiy	2008-08-13	2	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The 'ubifs_release_dirty_inode_budget()' was buggy and incorrectly freed the budget, which led to not freeing all dirty data budget. This patch fixes that. Also, this patch fixes ubifs_mkdir() which passed 1 in dirty_ino_d, which makes no sense. Well, it is harmless though. Also, add few more useful assertions. And improve few debugging messages. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBIFS: print volume name as well	Artem Bityutskiy	2008-08-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We encouredge people to mount using volume name, not device numbers. So print the name of the mounted UBI volume, not just IDs. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
* \| \|	omfs: fix oops when file metadata is corrupted	Bob Copeland	2008-08-15	2	-9/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A fuzzed fileystem image failed with OMFS when the extent count was used in a loop without being checked against the max number of extents. It also provoked a signed division for an array index that was checked as if unsigned, leading to index by -1. omfsck will be updated to fix these cases, in the meantime bail out gracefully. Reported-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \|	omfs: fix potential oops when directory size is corrupted	Bob Copeland	2008-08-15	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Testing with a modified fsfuzzer reveals a couple of locations in omfs where filesystem variables are ultimately used as loop counters with insufficient sanity checking. In this case, dir->i_size is used to compute the number of buckets in the directory hash. If too large, readdir will overrun a buffer. Since it's an invariant that dir->i_size is equal to the sysblock size, and we already sanity check that, just use that value instead. This fixes the following oops: BUG: unable to handle kernel paging request at c978e004 IP: [<c032298e>] omfs_readdir+0x18e/0x32f Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC Modules linked in: Pid: 4796, comm: ls Not tainted (2.6.27-rc2 #12) EIP: 0060:[<c032298e>] EFLAGS: 00010287 CPU: 0 EIP is at omfs_readdir+0x18e/0x32f EAX: c978d000 EBX: 00000000 ECX: cbfcfaf8 EDX: cb2cf100 ESI: 00001000 EDI: 00000800 EBP: cb2d3f68 ESP: cb2d3f0c DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 Process ls (pid: 4796, ti=cb2d3000 task=cb175f40 task.ti=cb2d3000) Stack: 00000002 00000000 00000000 c018a820 cb2d3f94 cb2cf100 cbfb0000 ffffff10 cbfb3b80 cbfcfaf8 000001c9 00000a09 00000000 00000000 00000000 cbfcfbc8 c9697000 cbfb3b80 22222222 00001000 c08e6cd0 cb2cf100 cbfb3b80 cb2d3f88 Call Trace: [<c018a820>] ? filldir64+0x0/0xcd [<c018a9f2>] ? vfs_readdir+0x56/0x82 [<c018a820>] ? filldir64+0x0/0xcd [<c018aa7c>] ? sys_getdents64+0x5e/0xa0 [<c01038bd>] ? sysenter_do_call+0x12/0x31 ======================= Code: 00 89 f0 89 f3 0f ac f8 14 81 e3 ff ff 0f 00 48 8d 14 c5 b8 01 00 00 89 45 cc 89 55 f0 e9 8c 01 00 00 8b 4d c8 8b 75 f0 8b 41 18 <8b> 54 30 04 8b 04 30 31 f6 89 5d dc 89 d1 8b 55 b8 0f c8 0f c9 Reported-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \|	fs/inode.c: properly init address_space->writeback_index	Chris Mason	2008-08-15	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	write_cache_pages() uses i_mapping->writeback_index to pick up where it left off the last time a given inode was found by pdflush or balance_dirty_pages (or anyone else who sets wbc->range_cyclic) alloc_inode() should set it to a sane value so that writeback doesn't start in the middle of a file. It is somewhat difficult to notice the bug since write_cache_pages will loop around to the start of the file and the elevator helps hide the resulting seeks. For whatever reason, Btrfs hits this often. Unpatched, untarring 30 copies of the linux kernel in series runs at 47MB/s on a single sata drive. With this fix, it jumps to 62MB/s. Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \|	CRED: Introduce credential access wrappers	David Howells	2008-08-14	3	-8/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The patches that are intended to introduce copy-on-write credentials for 2.6.28 require abstraction of access to some fields of the task structure, particularly for the case of one task accessing another's credentials where RCU will have to be observed. Introduced here are trivial no-op versions of the desired accessors for current and other tasks so that other subsystems can start to be converted over more easily. Wrappers are introduced into a new header (linux/cred.h) for UID/GID, EUID/EGID, SUID/SGID, FSUID/FSGID, cap_effective and current's subscribed user_struct. These wrappers are macros because the ordering between header files mitigates against making them inline functions. linux/cred.h is #included from linux/sched.h. Further, XFS is modified such that it no longer defines and uses parameterised versions of current_fs[ug]id(), thus getting rid of the namespace collision otherwise incurred. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
* \| \|	Merge git://oss.sgi.com:8090/xfs/linux-2.6	Linus Torvalds	2008-08-13	61	-1326/+838
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://oss.sgi.com:8090/xfs/linux-2.6: (45 commits) [XFS] Fix use after free in xfs_log_done(). [XFS] Make xfs_bmap__count_leaves void. [XFS] Use KM_NOFS for debug trace buffers [XFS] use KM_MAYFAIL in xfs_mountfs [XFS] refactor xfs_mount_free [XFS] don't call xfs_freesb from xfs_unmountfs [XFS] xfs_unmountfs should return void [XFS] cleanup xfs_mountfs [XFS] move root inode IRELE into xfs_unmountfs [XFS] stop using file_update_time [XFS] optimize xfs_ichgtime [XFS] update timestamp in xfs_ialloc manually [XFS] remove the sema_t from XFS. [XFS] replace dquot flush semaphore with a completion [XFS] replace inode flush semaphore with a completion [XFS] extend completions to provide XFS object flush requirements [XFS] replace the XFS buf iodone semaphore with a completion [XFS] clean up stale references to semaphores [XFS] use get_unaligned_ helpers [XFS] Fix compile failure in xfs_buf_trace() ...
\| * \| \|	[XFS] Fix use after free in xfs_log_done().	Lachlan McIlroy	2008-08-13	1	-8/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ticket allocation code got reworked in 2.6.26 and we now free tickets whereas before we used to cache them so the use-after-free went undetected. SGI-PV: 985525 SGI-Modid: xfs-linux-melb:xfs-kern:31877a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
\| * \| \|	[XFS] Make xfs_bmap_*_count_leaves void.	Ruben Porras	2008-08-13	1	-19/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xfs_bmap_count_leaves and xfs_bmap_disk_count_leaves always return always 0, make them void. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31844a Signed-off-by: Ruben Porras <ruben.porras@linworks.de> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \| \|	[XFS] Use KM_NOFS for debug trace buffers	Lachlan McIlroy	2008-08-13	6	-11/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use KM_NOFS to prevent recursion back into the filesystem which can cause deadlocks. In the case of xfs_iread() we hold the lock on the inode cluster buffer while allocating memory for the trace buffers. If we recurse back into XFS to flush data that may require a transaction to allocate extents which needs log space. This can deadlock with the xfsaild thread which can't push the tail of the log because it is trying to get the inode cluster buffer lock. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31838a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
\| * \| \|	[XFS] use KM_MAYFAIL in xfs_mountfs	Christoph Hellwig	2008-08-13	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use KM_MAYFAIL for the m_perag allocation, we can deal with the error easily and blocking forever during mount is not a good idea either. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31837a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \| \|	[XFS] refactor xfs_mount_free	Christoph Hellwig	2008-08-13	1	-16/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xfs_mount_free mostly frees the perag data, which is something that is duplicated in the mount error path. Move the XFS_QM_DONE call to the caller and remove the useless mutex_destroy/spinlock_destroy calls so that we can re-use it for the mount error path. Also rename it to xfs_free_perag to reflect what it does. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31836a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \| \|	[XFS] don't call xfs_freesb from xfs_unmountfs	Christoph Hellwig	2008-08-13	2	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xfs_readsb is called before xfs_mount so xfs_freesb should be called after xfs_unmountfs, too. This means it now happens after a few things during the of xfs_unmount which all have nothing to do with the superblock. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31835a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>