op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	[PATCH] fs: error case fix in __generic_file_aio_read	Tejun Heo	2005-10-30	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When __generic_file_aio_read() hits an error during reading, it reports the error iff nothing has successfully been read yet. This is condition - when an error occurs, if nothing has been read/written, report the error code; otherwise, report the amount of bytes successfully transferred upto that point. This corner case can be exposed by performing readv(2) with the following iov. iov[0] = len0 @ ptr0 iov[1] = len1 @ NULL (or any other invalid pointer) iov[2] = len2 @ ptr2 When file size is enough, performing above readv(2) results in len0 bytes from file_pos @ ptr0 len2 bytes from file_pos + len0 @ ptr2 And the return value is len0 + len2. Test program is attached to this mail. This patch makes __generic_file_aio_read()'s error handling identical to other functions. #include <stdio.h> #include <stdlib.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> #include <sys/uio.h> #include <errno.h> #include <string.h> int main(int argc, char *argv) { const char path; struct stat stbuf; size_t len0, len1; void buf0, buf1; struct iovec iov[3]; int fd, i; ssize_t ret; if (argc < 2) { fprintf(stderr, "Usage: testreadv path (better be a " "small text file)\n"); return 1; } path = argv[1]; if (stat(path, &stbuf) < 0) { perror("stat"); return 1; } len0 = stbuf.st_size / 2; len1 = stbuf.st_size - len0; if (!len0 \|\| !len1) { fprintf(stderr, "Dude, file is too small\n"); return 1; } if ((fd = open(path, O_RDONLY)) < 0) { perror("open"); return 1; } if (!(buf0 = malloc(len0)) \|\| !(buf1 = malloc(len1))) { perror("malloc"); return 1; } memset(buf0, 0, len0); memset(buf1, 0, len1); iov[0].iov_base = buf0; iov[0].iov_len = len0; iov[1].iov_base = NULL; iov[1].iov_len = len1; iov[2].iov_base = buf1; iov[2].iov_len = len1; printf("vector "); for (i = 0; i < 3; i++) printf("%p:%zu ", iov[i].iov_base, iov[i].iov_len); printf("\n"); ret = readv(fd, iov, 3); if (ret < 0) perror("readv"); printf("readv returned %zd\nbuf0 = [%s]\nbuf1 = [%s]\n", ret, (char )buf0, (char )buf1); return 0; } Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] mm/filemap.c:filemap_populate(): move export.	Nikita Danilov	2005-10-29	1	-1/+1
\| \| \| \| \| \| \| \| \|	move EXPORT_SYMBOL(filemap_populate) to the proper place: just after function itself: it's easy to miss that function is exported otherwise. Signed-off-by: Nikita Danilov <nikita@clusterfs.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] mm: update comments to pte lock	Hugh Dickins	2005-10-29	1	-3/+3
\| \| \| \| \| \| \| \|	Updated several references to page_table_lock in common code comments. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] mm: split page table lock	Hugh Dickins	2005-10-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with a many-threaded application which concurrently initializes different parts of a large anonymous area. This patch corrects that, by using a separate spinlock per page table page, to guard the page table entries in that page, instead of using the mm's single page_table_lock. (But even then, page_table_lock is still used to guard page table allocation, and anon_vma allocation.) In this implementation, the spinlock is tucked inside the struct page of the page table page: with a BUILD_BUG_ON in case it overflows - which it would in the case of 32-bit PA-RISC with spinlock debugging enabled. Splitting the lock is not quite for free: another cacheline access. Ideally, I suppose we would use split ptlock only for multi-threaded processes on multi-cpu machines; but deciding that dynamically would have its own costs. So for now enable it by config, at some number of cpus - since the Kconfig language doesn't support inequalities, let preprocessor compare that with NR_CPUS. But I don't think it's worth being user-configurable: for good testing of both split and unsplit configs, split now at 4 cpus, and perhaps change that to 8 later. There is a benefit even for singly threaded processes: kswapd can be attacking one part of the mm while another part is busy faulting. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] mm: page fault handlers tidyup	Hugh Dickins	2005-10-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Impose a little more consistency on the page fault handlers do_wp_page, do_swap_page, do_anonymous_page, do_no_page, do_file_page: why not pass their arguments in the same order, called the same names? break_cow is all very well, but what it did was inlined elsewhere: easier to compare if it's brought back into do_wp_page. do_file_page's fallback to do_no_page dates from a time when we were testing pte_file by using it wherever possible: currently it's peculiar to nonlinear vmas, so just check that. BUG_ON if not? Better not, it's probably page table corruption, so just show the pte: hmm, there's a pte_ERROR macro, let's use that for do_wp_page's invalid pfn too. Hah! Someone in the ppc64 world noticed pte_ERROR was unused so removed it: restored (and say "pud" not "pmd" in its pud_ERROR). Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] gfp_t: mm/* (easy parts)	Al Viro	2005-10-28	1	-4/+4
\| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] mm/filemap.c: make two functions static	Adrian Bunk	2005-09-10	1	-7/+10
\| \| \| \| \| \| \| \| \| \| \|	With Nick Piggin <npiggin@suse.de> Give some things static scope. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] shmem_populate: avoid an useless check, and some comments	Paolo 'Blaisorblade' Giarrusso	2005-09-05	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Either shmem_getpage returns a failure, or it found a page, or it was told it couldn't do any I/O. So it's useless to check nonblock in the else branch. We could add a BUG() there but I preferred to comment the offending function. This was taken out from one Ingo Molnar's old patch I'm resurrecting. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Ingo Molnar <mingo@elte.hu> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] swap: swap_lock replace list+device	Hugh Dickins	2005-09-05	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The idea of a swap_device_lock per device, and a swap_list_lock over them all, is appealing; but in practice almost every holder of swap_device_lock must already hold swap_list_lock, which defeats the purpose of the split. The only exceptions have been swap_duplicate, valid_swaphandles and an untrodden path in try_to_unuse (plus a few places added in this series). valid_swaphandles doesn't show up high in profiles, but swap_duplicate does demand attention. However, with the hold time in get_swap_pages so much reduced, I've not yet found a load and set of swap device priorities to show even swap_duplicate benefitting from the split. Certainly the split is mere overhead in the common case of a single swap device. So, replace swap_list_lock and swap_device_lock by spinlock_t swap_lock (generally we seem to prefer an _ in the name, and not hide in a macro). If someone can show a regression in swap_duplicate, then probably we should add a hashlock for the swap_map entries alone (shorts being anatomic), so as to help the case of the single swap device too. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] fix for generic_file_write iov problem	Badari Pulavarty	2005-06-25	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Here is the fix for the problem described in http://bugzilla.kernel.org/show_bug.cgi?id=4721 Basically, problem is generic_file_buffered_write() is accessing beyond end of the iov[] vector after handling the last vector. If we happen to cross page boundary, we get a fault. I think this simple patch is good enough. If we really don't want to depend on the "count", then we need pass nr_segs to filemap_set_next_iovec() and decrement it and check it. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] Fix the error handling in direct I/O	Hifumi Hisashi	2005-06-25	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix a bug on error handling in the direct I/O function. Currently, if a file is opened with the O_DIRECT\|O_SYNC flag, the write() syscall cannot receive the EIO error after an I/O error (SCSI cable is disconnected etc.). Return values of other points that call generic_osync_inode() are treated appropriately. Signed-off-by: Hisashi Hifumi <hifumi.hisashi@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] xip: fs/mm: execute in place	Carsten Otte	2005-06-24	1	-72/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- generic_file* file operations do no longer have a xip/non-xip split - filemap_xip.c implements a new set of fops that require get_xip_page aop to work proper. all new fops are exported GPL-only (don't like to see whatever code use those except GPL modules) - __xip_unmap now uses page_check_address, which is no longer static in rmap.c, and defined in linux/rmap.h - mm/filemap.h is now much more clean, plainly having just Linus' inline funcs moved here from filemap.c - fix includes in filemap_xip to make it build cleanly on i386 Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] Remove f_error field from struct file	Christoph Lameter	2005-06-23	1	-6/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The following patch removes the f_error field and all checks of f_error. Trond said: f_error was introduced for NFS, and made sense when we were guaranteed always to have a file pointer around when write errors occurred. Since then, we have (for various reasons) had to introduce the nfs_open_context in order to track the file read/write state, and it made sense to move our f_error tracking there too. Signed-off-by: Christoph Lameter <christoph@lameter.com> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] broken fault_in_pages_readable call in generic_file_buffered_write()	Martin Schwidefsky	2005-06-06	1	-1/+7
\| \| \| \| \| \| \| \| \| \|	fault_in_pages_readable() is being passed an incorrect `end' address, which can result in writes accidentally faulting in pages which will not be affected by the write() call. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] fix for __generic_file_aio_read() to return 0 on EOF	Suparna Bhattacharya	2005-05-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I came across the following problem while running ltp-aiodio testcases from ltp-full-20050405 on linux-2.6.12-rc3-mm3. I tried running the tests with EXT3 as well as JFS filesystems. One or two fsx-linux testcases were hung after some time. These testcases were hanging at wait_for_all_aios(). Debugging shows that there were some iocbs which were not getting completed eventhough the last retry for those returned -EIOCBQUEUED. Also all such pending iocbs represented READ operation. Further debugging revealed that all such iocbs hit EOF in the DIO layer. To be more precise, the "pos" from which they were trying to read was greater than the "size" of the file. So the generic_file_direct_IO returned 0. This happens rarely as there is already a check in __generic_file_aio_read(), for whether "pos" < "size" before calling direct IO routine. >size = i_size_read(inode); >if (pos < size) { > retval = generic_file_direct_IO(READ, iocb, > iov, pos, nr_segs); But for READ, we are taking the inode->i_sem only in the DIO layer. So it is possible that some other process can change the size of the file before we take the i_sem. In such a case ( when "pos" > "size"), the __generic_file_aio_read() would return -EIOCBQUEUED even though there were no I/O requests submitted by the DIO layer. This would cause the AIO layer to expect aio_complete() for THE iocb, which doesnot happen. And thus the test hangs forever, waiting for an I/O completion, where there are no requests submitted at all. The following patch makes __generic_file_aio_read() return 0 (instead of returning -EIOCBQUEUED), on getting 0 from generic_file_direct_IO(), so that the AIO layer does the aio_complete(). Testing: I have tested the patch on a SMP machine(with 2 Pentium 4 (HT)) running linux-2.6.12-rc3-mm3. I ran the ltp-aiodio testcases and none of the fsx-linux tests hung. Also the aio-stress tests ran without any problem. Signed-off-by: Suzuki K P <suzuki@in.ibm.com> Signed-off-by: Suparna Bhattacharya <suparna@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] remove outdated comments from filemap.c	Christoph Hellwig	2005-05-05	1	-5/+0
\| \| \| \| \|	Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] DocBook: fix some descriptions	Martin Waitz	2005-05-01	1	-8/+9
\| \| \| \| \| \| \| \| \|	Some KernelDoc descriptions are updated to match the current code. No code changes. Signed-off-by: Martin Waitz <tali@admingilde.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] Exterminate PAGE_BUG	Matt Mackall	2005-05-01	1	-2/+1
\| \| \| \| \| \| \| \|	Remove PAGE_BUG - repalce it with BUG and BUG_ON. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] sync_page() smp_mb() comment	William Lee Irwin III	2005-05-01	1	-1/+19
\| \| \| \| \| \| \| \| \| \|	The smp_mb() is becaus sync_page() doesn't have PG_locked while it accesses page_mapping(page). The comments in the patch (the entire patch is the addition of this comment) try to explain further how and why smp_mb() is used. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] generic_file_buffered_write fixes	akpm@osdl.org	2005-05-01	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Anton Altaparmakov <aia21@cam.ac.uk> points out: - It calls fault_in_pages_readable() which is completely bogus if @nr_segs > 1. It needs to be replaced by a to be written "fault_in_pages_readable_iovec()". - It increments @buf even in the iovec case thus @buf can point to random memory really quickly (in the iovec case) and then it calls fault_in_pages_readable() on this random memory. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] filemap_getpage can block when MAP_NONBLOCK specified	Jeff Moyer	2005-04-16	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We will return NULL from filemap_getpage when a page does not exist in the page cache and MAP_NONBLOCK is specified, here: page = find_get_page(mapping, pgoff); if (!page) { if (nonblock) return NULL; goto no_cached_page; } But we forget to do so when the page in the cache is not uptodate. The following could result in a blocking call: /* * Ok, found a page in the page cache, now we need to check * that it's up-to-date. */ if (!PageUptodate(page)) goto page_not_uptodate; Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	Linux-2.6.12-rc2v2.6.12-rc2	Linus Torvalds	2005-04-16	1	-0/+2306
	Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!