summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* [PATCH] page migration cleanup: group functionsChristoph Lameter2006-06-231-70/+72
| | | | | | | | | Reorder functions in migrate.c. Group all migration functions for struct address_space_operations together. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] page migration cleanup: rename "ignrefs" to "migration"Christoph Lameter2006-06-231-9/+9
| | | | | | | | migrate is a better name since it is only used by page migration. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] writeback: fix range handlingOGAWA Hirofumi2006-06-239-31/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a writeback_control's `start' and `end' fields are used to indicate a one-byte-range starting at file offset zero, the required values of .start=0,.end=0 mean that the ->writepages() implementation has no way of telling that it is being asked to perform a range request. Because we're currently overloading (start == 0 && end == 0) to mean "this is not a write-a-range request". To make all this sane, the patch changes range of writeback_control. So caller does: If it is calling ->writepages() to write pages, it sets range (range_start/end or range_cyclic) always. And if range_cyclic is true, ->writepages() thinks the range is cyclic, otherwise it just uses range_start and range_end. This patch does, - Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h -1 is usually ok for range_end (type is long long). But, if someone did, range_end += val; range_end is "val - 1" u64val = range_end >> bits; u64val is "~(0ULL)" or something, they are wrong. So, this adds LLONG_MAX to avoid nasty things, and uses LLONG_MAX for range_end. - All callers of ->writepages() sets range_start/end or range_cyclic. - Fix updates of ->writeback_index. It seems already bit strange. If it starts at 0 and ended by check of nr_to_write, this last index may reduce chance to scan end of file. So, this updates ->writeback_index only if range_cyclic is true or whole-file is scanned. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Nathan Scott <nathans@sgi.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Steven French <sfrench@us.ibm.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] buglet in radix_tree_tag_setPeter Zijlstra2006-06-231-2/+1
| | | | | | | | | | | | | | | | | | | | | | | The comment states: 'Setting a tag on a not-present item is a BUG.' Hence if 'index' is larger than the maxindex; the item _cannot_ be presen; it should also be a BUG. Also, this allows the following statement (assume a fresh tree): radix_tree_tag_set(root, 16, 1); to fail silently, but when preceded by: radix_tree_insert(root, 32, item); it would BUG, because the height has been extended by the insert. In neither case was 16 present. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: redzone double-free detectionPekka Enberg2006-06-231-9/+23
| | | | | | | | | | | | | | | | At present our slab debugging tells us that it detected a double-free or corruption - it does not distinguish between them. Sometimes it's useful to be able to differentiate between these two types of information. Add double-free detection to redzone verification when freeing an object. As explained by Manfred, when we are freeing an object, both redzones should be RED_ACTIVE. However, if both are RED_INACTIVE, we are trying to free an object that was already free'd. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] likely cleanup: remove unlikely in sys_mprotect()Hua Zhong2006-06-231-2/+1
| | | | | | | | | With likely/unlikely profiling on my not-so-busy-typical-developmentsystem there are 5k misses vs 2k hits. So I guess we should remove the unlikely. Signed-off-by: Hua Zhong <hzhong@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] radix-tree: smallNick Piggin2006-06-231-1/+1
| | | | | | | | | | Reduce radix tree node memory usage by about a factor of 4 for small files (< 64K). There are pointer traversal and memory usage costs for large files with dense pagecache. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] radix-tree: direct dataNick Piggin2006-06-232-83/+114
| | | | | | | | | | | | | | | | | | | | | | | | | | | The ability to have height 0 radix trees (a direct pointer to the data item rather than going through a full node->slot) quietly disappeared with old-2.6-bkcvs commit ffee171812d51652f9ba284302d9e5c5cc14bdfd. On 64-bit machines this causes nearly 600 bytes to be used for every <= 4K file in pagecache. Re-introduce this feature, root tags stored in spare ->gfp_mask bits. Simplify radix_tree_delete's complex tag clearing arrangement (which would become even more complex) by just falling back to tag clearing functions (the pagecache radix-tree never uses this path anyway, so the icache savings will mean it's actually a speedup). On my 4GB G5, this saves 8MB RAM per kernel kernel source+object tree in pagecache. Pagecache lookup, insertion, and removal speed for small files will also be improved. This makes RCU radix tree harder, but it's worth it. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] change gen_pool allocator to not touch managed memoryDean Nelson2006-06-234-261/+252
| | | | | | | | | | | | | | | | | | | | | | | Modify the gen_pool allocator (lib/genalloc.c) to utilize a bitmap scheme instead of the buddy scheme. The purpose of this change is to eliminate the touching of the actual memory being allocated. Since the change modifies the interface, a change to the uncached allocator (arch/ia64/kernel/uncached.c) is also required. Both Andrey Volkov and Jes Sorenson have expressed a desire that the gen_pool allocator not write to the memory being managed. See the following: http://marc.theaimsgroup.com/?l=linux-kernel&m=113518602713125&w=2 http://marc.theaimsgroup.com/?l=linux-kernel&m=113533568827916&w=2 Signed-off-by: Dean Nelson <dcn@sgi.com> Cc: Andrey Volkov <avolkov@varma-el.com> Acked-by: Jes Sorensen <jes@trained-monkey.org> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] mm: introduce remap_vmalloc_range()Nick Piggin2006-06-232-2/+128
| | | | | | | | | Add remap_vmalloc_range, vmalloc_user, and vmalloc_32_user so that drivers can have a nice interface for remapping vmalloc memory. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Unify pxm_to_node() and node_to_pxm()Yasunori Goto2006-06-2312-71/+102
| | | | | | | | | | | | | Consolidate the various arch-specific implementations of pxm_to_node() and node_to_pxm() into a single generic version. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] swsusp: rework memory shrinkerRafael J. Wysocki2006-06-232-57/+172
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rework the swsusp's memory shrinker in the following way: - Simplify balance_pgdat() by removing all of the swsusp-related code from it. - Make shrink_all_memory() use shrink_slab() and a new function shrink_all_zones() which calls shrink_active_list() and shrink_inactive_list() directly for each zone in a way that's optimized for suspend. In shrink_all_memory() we try to free exactly as many pages as the caller asks for, preferably in one shot, starting from easier targets.  If slab caches are huge, they are most likely to have enough pages to reclaim.  The inactive lists are next (the zones with more inactive pages go first) etc. Each time shrink_all_memory() attempts to shrink the active and inactive lists for each zone in 5 passes.  In the first pass, only the inactive lists are taken into consideration.  In the next two passes the active lists are also shrunk, but mapped pages are not reclaimed.  In the last two passes the active and inactive lists are shrunk and mapped pages are reclaimed as well. The aim of this is to alter the reclaim logic to choose the best pages to keep on resume and improve the responsiveness of the resumed system. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Con Kolivas <kernel@kolivas.org> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: stop using list_for_eachChristoph Hellwig2006-06-231-27/+11
| | | | | | | | Use the _entry variant everywhere to clean the code up a tiny bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: clean up kmem_getpagesChristoph Hellwig2006-06-231-16/+14
| | | | | | | | | | | | | | The last ifdef addition hit the ugliness treshold on this functions, so: - rename the variable i to nr_pages so it's somewhat descriptive - remove the addr variable and do the page_address call at the very end - instead of ifdef'ing the whole alloc_pages_node call just make the __GFP_COMP addition to flags conditional - rewrite the __GFP_COMP comment to make sense Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] tightening hugetlb strict accountingChen, Kenneth W2006-06-233-138/+173
| | | | | | | | | | | | | | | | | | | | | Current hugetlb strict accounting for shared mapping always assume mapping starts at zero file offset and reserves pages between zero and size of the file. This assumption often reserves (or lock down) a lot more pages then necessary if application maps at none zero file offset. libhugetlbfs is one example that requires proper reservation on shared mapping starts at none zero offset. This patch extends the reservation and hugetlb strict accounting to support any arbitrary pair of (offset, len), resulting a much more robust and accurate scheme. More importantly, it won't lock down any hugetlb pages outside file mapping. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Acked-by: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] reserve space for swap labelAndreas Dilger2006-06-231-6/+8
| | | | | | | | | | | | | | | | | Reserve space in the swap disk header for a LABEL and UUID to be specified. This has been possible with util-linux-2.12b (via e2fsprogs 1.36 libblkid), and is used by at least FC3 and later. The kernel doesn't really care about this, but the space shouldn't accidentally be used by something else either. Also make the on-disk structures be fixed-size types, instead of "int", though I don't know of any architecture in use where an "int" isn't the same size as a "__u32" (all current kernel arches have it as "unsigned int"). Signed-off-by: Andreas Dilger <adilger@shaw.ca> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] mm: fix typos in comments in mm/oom_kill.cDave Peterson2006-06-231-3/+3
| | | | | | | | This fixes a few typos in the comments in mm/oom_kill.c. Signed-off-by: David S. Peterson <dsp@llnl.gov> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] support for panic at OOMKAMEZAWA Hiroyuki2006-06-234-0/+26
| | | | | | | | | | | | | | | | | This patch adds panic_on_oom sysctl under sys.vm. When sysctl vm.panic_on_oom = 1, the kernel panics intead of killing rogue processes. And if vm.panic_on_oom is 0 the kernel will do oom_kill() in the same way as it does today. Of course, the default value is 0 and only root can modifies it. In general, oom_killer works well and kill rogue processes. So the whole system can survive. But there are environments where panic is preferable rather than kill some processes. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] squash duplicate page_to_pfn and pfn_to_pageAndy Whitcroft2006-06-232-42/+17
| | | | | | | | | | | | We have architectures where the size of page_to_pfn and pfn_to_page are significant enough to overall image size that they wish to push them out of line. However, in the process we have grown a second copy of the implementation of each of these routines for each memory model. Share the implmentation exposing it either inline or out-of-line as required. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] wait_table and zonelist initializing for memory hotadd: update zonelistsYasunori Goto2006-06-232-5/+33
| | | | | | | | | | | | | | | | | | | | | | In current code, zonelist is considered to be build once, no modification. But MemoryHotplug can add new zone/pgdat. It must be updated. This patch modifies build_all_zonelists(). By this, build_all_zonelist() can reconfig pgdat's zonelists. To update them safety, this patch use stop_machine_run(). Other cpus don't touch among updating them by using it. In old version (V2 of node hotadd), kernel updated them after zone initialization. But present_page of its new zone is still 0, because online_page() is not called yet at this time. Build_zonelists() checks present_pages to find present zone. It was too early. So, I changed it after online_pages(). Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] wait_table and zonelist initializing for memory hotadd: wait_table ↵Yasunori Goto2006-06-231-6/+53
| | | | | | | | | | | | | | | initialization Wait_table is initialized according to zone size at boot time. But, we cannot know the maixmum zone size when memory hotplug is enabled. It can be changed.... And resizing of wait_table is hard. So kernel allocate and initialzie wait_table as its maximum size. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] wait_table and zonelist initializing for memory hotadd: add return ↵Yasunori Goto2006-06-233-5/+24
| | | | | | | | | | | | | | | | code for init_current_empty_zone When add_zone() is called against empty zone (not populated zone), we have to initialize the zone which didn't initialize at boot time. But, init_currently_empty_zone() may fail due to allocation of wait table. So, this patch is to catch its error code. Changes against wait_table is in the next patch. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] wait_table and zonelist initializing for memory hotadd: change to ↵Yasunori Goto2006-06-232-11/+11
| | | | | | | | | | | | | meminit for build_zonelist Change definitions of some functions and data from __init to __meminit. These functions and data can be used after bootup by this patch to be used for hot-add codes. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] wait_table and zonelist initializing for memory hotadd: change name ↵Yasunori Goto2006-06-232-7/+9
| | | | | | | | | | of wait_table_size() This is just to rename from wait_table_size() to wait_table_hash_nr_entries(). Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] migration: remove unnecessary PageSwapCache checksChristoph Lameter2006-06-232-15/+0
| | | | | | | | | | Remove two unnecessary PageSwapCache checks. The page refcount is raised and therefore page migration cannot occur in both functions. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: page mapping cleanupPekka Enberg2006-06-231-11/+16
| | | | | | | | | | | Clean up slab allocator page mapping a bit. The memory allocated for a slab is physically contiguous so it is okay to assume struct pages are too so kill the long-standing comment. Furthermore, rename set_slab_attr to slab_map_pages and add a comment explaining why its needed. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] PG_uncached is ia64 onlyAndrew Morton2006-06-231-1/+13
| | | | | | | | | | | | | | As Nick points out, only ia64 uses PG_uncached. So we can push it up into the higher bits of the lower half of page->flags and make room for another flag on 32-bit machines. Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Jesse Barnes <jbarnes@sgi.com> Cc: Jes Sorensen <jes@trained-monkey.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: extract cache_free_alien from __cache_freePekka Enberg2006-06-231-35/+42
| | | | | | | | | | Move alien object freeing to cache_free_alien() to reduce #ifdef clutter in __cache_free(). Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Page Migration: Make do_swap_page redo the faultChristoph Lameter2006-06-231-7/+0
| | | | | | | | | | | | | | | | It is better to redo the complete fault if do_swap_page() finds that the page is not in PageSwapCache() because the page migration code may have replaced the swap pte already with a pte pointing to valid memory. do_swap_page() may interpret an invalid swap entry without this patch because we do not reload the pte if we are looping back. The page migration code may already have reused the swap entry referenced by our local swp_entry. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] zone handle unaligned zone boundariesAndy Whitcroft2006-06-232-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The buddy allocator has a requirement that boundaries between contigious zones occur aligned with the the MAX_ORDER ranges. Where they do not we will incorrectly merge pages cross zone boundaries. This can lead to pages from the wrong zone being handed out. Originally the buddy allocator would check that buddies were in the same zone by referencing the zone start and end page frame numbers. This was removed as it became very expensive and the buddy allocator already made the assumption that zones boundaries were aligned. It is clear that not all configurations and architectures are honouring this alignment requirement. Therefore it seems safest to reintroduce support for non-aligned zone boundaries. This patch introduces a new check when considering a page a buddy it compares the zone_table index for the two pages and refuses to merge the pages where they do not match. The zone_table index is unique for each node/zone combination when FLATMEM/DISCONTIGMEM is enabled and for each section/zone combination when SPARSEMEM is enabled (a SPARSEMEM section is at least a MAX_ORDER size). Signed-off-by: Andy Whitcroft <apw@shadowen.org> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] for_each_possible_cpu: xfsKAMEZAWA Hiroyuki2006-06-232-3/+3
| | | | | | | | | | | | | | | | | | for_each_cpu() actually iterates across all possible CPUs. We've had mistakes in the past where people were using for_each_cpu() where they should have been iterating across only online or present CPUs. This is inefficient and possibly buggy. We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the future. This patch replaces for_each_cpu with for_each_possible_cpu. in xfs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] XFS: Use the dentry passed to statfs() to limit the scope of the resultsDavid Howells2006-06-231-1/+2
| | | | | | | | | | Enable XFS to limit the statfs() results to the project quota covering the dentry used as a base for call. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] VFS: Permit filesystem to perform statfs with a known root dentryDavid Howells2006-06-2360-144/+175
| | | | | | | | | | | | | | | | | | | | | Give the statfs superblock operation a dentry pointer rather than a superblock pointer. This complements the get_sb() patch. That reduced the significance of sb->s_root, allowing NFS to place a fake root there. However, NFS does require a dentry to use as a target for the statfs operation. This permits the root in the vfsmount to be used instead. linux/mount.h has been added where necessary to make allyesconfig build successfully. Interest has also been expressed for use with the FUSE and XFS filesystems. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Nathan Scott <nathans@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] VFS: Permit filesystem to override root dentry on mountDavid Howells2006-06-2383-431/+482
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the get_sb() filesystem operation to take an extra argument that permits the VFS to pass in the target vfsmount that defines the mountpoint. The filesystem is then required to manually set the superblock and root dentry pointers. For most filesystems, this should be done with simple_set_mnt() which will set the superblock pointer and then set the root dentry to the superblock's s_root (as per the old default behaviour). The get_sb() op now returns an integer as there's now no need to return the superblock pointer. This patch permits a superblock to be implicitly shared amongst several mount points, such as can be done with NFS to avoid potential inode aliasing. In such a case, simple_set_mnt() would not be called, and instead the mnt_root and mnt_sb would be set directly. The patch also makes the following changes: (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount pointer argument and return an integer, so most filesystems have to change very little. (*) If one of the convenience function is not used, then get_sb() should normally call simple_set_mnt() to instantiate the vfsmount. This will always return 0, and so can be tail-called from get_sb(). (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the dcache upon superblock destruction rather than shrink_dcache_anon(). This is required because the superblock may now have multiple trees that aren't actually bound to s_root, but that still need to be cleaned up. The currently called functions assume that the whole tree is rooted at s_root, and that anonymous dentries are not the roots of trees which results in dentries being left unculled. However, with the way NFS superblock sharing are currently set to be implemented, these assumptions are violated: the root of the filesystem is simply a dummy dentry and inode (the real inode for '/' may well be inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries with child trees. [*] Anonymous until discovered from another tree. (*) The documentation has been adjusted, including the additional bit of changing ext2_* into foo_* in the documentation. [akpm@osdl.org: convert ipath_fs, do other stuff] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Nathan Scott <nathans@sgi.com> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Fix cdrom being confused on using kdumpRachita Kothiyal2006-06-231-1/+4
| | | | | | | | | | | | | | | | | | I have seen the cdrom drive appearing confused on using kdump on certain x86_64 systems. During the booting up of the second kernel, the following message would keep flooding the console, and the booting would not proceed any further. hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01) In this patch, whenever we are hitting a confused state in the interrupt handler with the DRQ set, we end the request and return ide_stopped. Using this I dont see the status error. Signed-off-by: Rachita Kothiyal <rachita@in.ibm.com> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6Linus Torvalds2006-06-222-8/+13
|\ | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6: [PATCH] Driver core: fix locking issues with the devices that are attached to classes [PATCH] USB: get USB suspend to work again
| * [PATCH] Driver core: fix locking issues with the devices that are attached ↵Greg Kroah-Hartman2006-06-221-8/+11
| | | | | | | | | | | | | | | | to classes Doh, that was foolish... Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * [PATCH] USB: get USB suspend to work againGreg Kroah-Hartman2006-06-221-0/+2
| | | | | | | | | | | | | | | | | | | | Yeah, it's a hack, but it is only temporary until Alan's patches reworking this area make it in. We really should not care what devices below us are doing, especially when we do not really know what type of devices they are. This patch relies on the fact that the endpoint devices do not have a driver assigned to us. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-mmcLinus Torvalds2006-06-224-83/+74
|\ \ | | | | | | | | | | | | | | | * 'devel' of master.kernel.org:/home/rmk/linux-2.6-mmc: [ARM] 3565/1: AT91RM9200 MMC update [MMC] Convert all hosts except mmci to use data->blksz
| * | [ARM] 3565/1: AT91RM9200 MMC updateAndrew Victor2006-06-191-74/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch from Andrew Victor This patch includes code cleanups and minor fixes to the AT91RM9200 MMC driver. 1. Replace calls to DBG() with pr_debug(). 2. 'host' can never be null, so don't bother checking for that case. 3. Remove SA_SAMPLE_RANDOM from request_irq(). [Patch from Matt Mackall] 4. clk_get() doesn't return NULL on error - need to test returned value with IS_ERR(). 5. Free resources if clk_get() or request_irq() fails. Signed-off-by: Andrew Victor <andrew@sanpeople.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | [MMC] Convert all hosts except mmci to use data->blkszRussell King2006-06-194-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The MMC specification allows non-power of two block sizes. As such, we should not pass the log2 block size to host drivers, but instead pass the byte size. However, ARM MMCI can only work with log2 block size, so continue to pass both the log2 block size and byte block size. This means that for the moment, the byte block size must remain a power of two, but this is the first stage of removing this restriction for other hosts. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* | | Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds2006-06-2247-497/+1946
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (21 commits) [ARM] 3629/1: S3C24XX: fix missing bracket in regs-dsc.h [ARM] 3537/1: Rework DMA-bounce locking for finer granularity [ARM] 3601/1: i.MX/MX1 DMA error handling for signaled channels only [ARM] 3597/1: ixp4xx/nslu2: Board support for new LED subsystem [ARM] 3595/1: ixp4xx/nas100d: Board support for new LED subsystem [ARM] 3626/1: ARM EABI: fix syscall restarting [ARM] 3628/1: S3C24XX: add get_rate call to struct clk [ARM] 3627/1: S3C24XX: split s3c2410 clocks from core clocks [ARM] 3613/1: S3C2410: Add sysdev and sysclass [ARM] 3624/1: Report true modem control line states [ARM] 3620/2: ixp23xx: add uengine loader support [ARM] 3618/1: add defconfig for logicpd pxa270 card engine [ARM] 3617/1: ep93xx: fix slightly incorrect timer tick rate [ARM] 3616/1: fix timer handler wrap logic for a number of platforms [ARM] 3615/1: ixp23xx: use platform devices for physmap flash [ARM] 3614/1: ep93xx: use platform devices for physmap flash [ARM] 3621/1: fix compilation breakage for pnx4008 [ARM] 3623/1: pnx4008: move GPIO-related defines to gpio.h [ARM] 3622/1: pnx4008: remove clk_use/clk_unuse [ARM] Enable VFP to be built when non-VFP capable CPUs are selected ...
| * | | [ARM] 3629/1: S3C24XX: fix missing bracket in regs-dsc.hBen Dooks2006-06-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch from Ben Dooks Fix missing bracket in include/asm-arm/arch-s3c2410/regs-dsc.h Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | [ARM] 3537/1: Rework DMA-bounce locking for finer granularityKevin Hilman2006-06-221-46/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch from Kevin Hilman This time with IRQ versions of locks. Rework also enables compatability with realtime-preemption patch. With the current locking via interrupt disabling, under RT, potentially sleeping functions can be called with interrupts disabled. Signed-off-by: Kevin Hilman <khilman@mvista.com> Signed-off-by: Deepak Saxena <dsaxena@plexity.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | [ARM] 3601/1: i.MX/MX1 DMA error handling for signaled channels onlyPavel Pisa2006-06-222-28/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch from Pavel Pisa There has been bug, that dma_err_handler() touches even channels not signaling error condition. Problem noticed by Andrea Paterniani. Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | [ARM] 3597/1: ixp4xx/nslu2: Board support for new LED subsystemRod Whitby2006-06-221-1/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch from Rod Whitby This patch implements NEW_LEDS support for the Linksys NSLU2. The NSLU2 has four LED indicators, which are the only form of output for an unmodified device - there is no keyboard or display on an NSLU2. For an NSLU2 which has been modified to bring out the serial port console, it is important to register that device first separately, to enable debugging of other device support. Signed-off-by: John Bowler <jbowler@acm.org> Signed-off-by: Rod Whitby <rod@whitby.id.au> Signed-off-by: Deepak Saxena <dsaxena@plexity.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | [ARM] 3595/1: ixp4xx/nas100d: Board support for new LED subsystemRod Whitby2006-06-221-1/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch from Rod Whitby This patch implements NEW_LEDS support for the IOMega NAS100d. The NAS100d has three LED indicators, which are the only form of output for an unmodified device - there is no keyboard or display on an NAS100d. For an NAS100d which has been modified to bring out the serial port console, it is important to register that device first separately, to enable debugging of other device support. Signed-off-by: John Bowler <jbowler@acm.org> Signed-off-by: Rod Whitby <rod@whitby.id.au> Signed-off-by: Deepak Saxena <dsaxena@plexity.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | [ARM] 3626/1: ARM EABI: fix syscall restartingNicolas Pitre2006-06-221-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch from Nicolas Pitre The RESTARTBLOCK case currently store some code on the stack to invoke sys_restart_syscall. However this is ABI dependent and there is a mismatch with the way __NR_restart_syscall gets defined when the kernel is compiled for EABI. There is also a long standing bug in the thumb case since with OABI the __NR_restart_syscall value includes __NR_SYSCALL_BASE which should not be the case for Thumb syscalls. Credits to Yauheni Kaliuta <yauheni.kaliuta@gmail.com> for finding the EABI bug. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | [ARM] 3628/1: S3C24XX: add get_rate call to struct clkBen Dooks2006-06-222-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch from Ben Dooks Add a get_rate call to allow an given clock to over-ride the clk_get_rate() call. This provides support for clocks which rely on division of their parent to correctly report their frequency when the parent can also change. Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | [ARM] 3627/1: S3C24XX: split s3c2410 clocks from core clocksBen Dooks2006-06-2210-218/+299
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch from Ben Dooks Split the s3c2410 specific clocks from the core clock code, as part of the work to support more of the Samsung line of SoCs. The patch does not use the sysdev mechanism as the clocks are needed for the timer init, which is very early in the kernel init sequence. Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
OpenPOWER on IntegriCloud