summaryrefslogtreecommitdiffstats
path: root/include/linux/highmem.h
Commit message (Collapse)AuthorAgeFilesLines
* mm: fix PageUptodate data raceNick Piggin2008-02-051-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After running SetPageUptodate, preceeding stores to the page contents to actually bring it uptodate may not be ordered with the store to set the page uptodate. Therefore, another CPU which checks PageUptodate is true, then reads the page contents can get stale data. Fix this by having an smp_wmb before SetPageUptodate, and smp_rmb after PageUptodate. Many places that test PageUptodate, do so with the page locked, and this would be enough to ensure memory ordering in those places if SetPageUptodate were only called while the page is locked. Unfortunately that is not always the case for some filesystems, but it could be an idea for the future. Also bring the handling of anonymous page uptodateness in line with that of file backed page management, by marking anon pages as uptodate when they _are_ uptodate, rather than when our implementation requires that they be marked as such. Doing allows us to get rid of the smp_wmb's in the page copying functions, which were especially added for anonymous pages for an analogous memory ordering problem. Both file and anonymous pages are handled with the same barriers. FAQ: Q. Why not do this in flush_dcache_page? A. Firstly, flush_dcache_page handles only one side (the smb side) of the ordering protocol; we'd still need smp_rmb somewhere. Secondly, hiding away memory barriers in a completely unrelated function is nasty; at least in the PageUptodate macros, they are located together with (half) the operations involved in the ordering. Thirdly, the smp_wmb is only required when first bringing the page uptodate, wheras flush_dcache_page should be called each time it is written to through the kernel mapping. It is logically the wrong place to put it. Q. Why does this increase my text size / reduce my performance / etc. A. Because it is adding the necessary instructions to eliminate the data-race. Q. Can it be improved? A. Yes, eg. if you were to create a rule that all SetPageUptodate operations run under the page lock, we could avoid the smp_rmb places where PageUptodate is queried under the page lock. Requires audit of all filesystems and at least some would need reworking. That's great you're interested, I'm eagerly awaiting your patches. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Pagecache zeroing: zero_user_segment, zero_user_segments and zero_userChristoph Lameter2008-02-051-18/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Simplify page cache zeroing of segments of pages through 3 functions zero_user_segments(page, start1, end1, start2, end2) Zeros two segments of the page. It takes the position where to start and end the zeroing which avoids length calculations and makes code clearer. zero_user_segment(page, start, end) Same for a single segment. zero_user(page, start, length) Length variant for the case where we know the length. We remove the zero_user_page macro. Issues: 1. Its a macro. Inline functions are preferable. 2. The KM_USER0 macro is only defined for HIGHMEM. Having to treat this special case everywhere makes the code needlessly complex. The parameter for zeroing is always KM_USER0 except in one single case that we open code. Avoiding KM_USER0 makes a lot of code not having to be dealing with the special casing for HIGHMEM anymore. Dealing with kmap is only necessary for HIGHMEM configurations. In those configurations we use KM_USER0 like we do for a series of other functions defined in highmem.h. Since KM_USER0 is depends on HIGHMEM the existing zero_user_page function could not be a macro. zero_user_* functions introduced here can be be inline because that constant is not used when these functions are called. Also extract the flushing of the caches to be outside of the kmap. [akpm@linux-foundation.org: fix nfs and ntfs build] [akpm@linux-foundation.org: fix ntfs build some more] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Remove alloc_zeroed_user_highpage()Mel Gorman2007-07-191-15/+0
| | | | | | | | | | alloc_zeroed_user_highpage() has no in-tree users and it is not exported. As it is not exported, it can simply be removed. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Add __GFP_MOVABLE for callers to flag allocations from high memory that may ↵Mel Gorman2007-07-171-2/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | be migrated It is often known at allocation time whether a page may be migrated or not. This patch adds a flag called __GFP_MOVABLE and a new mask called GFP_HIGH_MOVABLE. Allocations using the __GFP_MOVABLE can be either migrated using the page migration mechanism or reclaimed by syncing with backing storage and discarding. An API function very similar to alloc_zeroed_user_highpage() is added for __GFP_MOVABLE allocations called alloc_zeroed_user_highpage_movable(). The flags used by alloc_zeroed_user_highpage() are not changed because it would change the semantics of an existing API. After this patch is applied there are no in-kernel users of alloc_zeroed_user_highpage() so it probably should be marked deprecated if this patch is merged. Note that this patch includes a minor cleanup to the use of __GFP_ZERO in shmem.c to keep all flag modifications to inode->mapping in the shmem_dir_alloc() helper function. This clean-up suggestion is courtesy of Hugh Dickens. Additional credit goes to Christoph Lameter and Linus Torvalds for shaping the concept. Credit to Hugh Dickens for catching issues with shmem swap vector and ramfs allocations. [akpm@linux-foundation.org: build fix] [hugh@veritas.com: __GFP_ZERO cleanup] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs: deprecate memclear_highpage_flushNate Diller2007-05-091-2/+1
| | | | | | | | | Now that all the in-tree users are converted over to zero_user_page(), deprecate the old memclear_highpage_flush() call. Signed-off-by: Nate Diller <nate.diller@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs: convert core functions to zero_user_pageNate Diller2007-05-091-9/+19
| | | | | | | | | | | | | | | | | | | | | | It's very common for file systems to need to zero part or all of a page, the simplist way is just to use kmap_atomic() and memset(). There's actually a library function in include/linux/highmem.h that does exactly that, but it's confusingly named memclear_highpage_flush(), which is descriptive of *how* it does the work rather than what the *purpose* is. So this patchset renames the function to zero_user_page(), and calls it from the various places that currently open code it. This first patch introduces the new function call, and converts all the core kernel callsites, both the open-coded ones and the old memclear_highpage_flush() ones. Following this patch is a series of conversions for each file system individually, per AKPM, and finally a patch deprecating the old call. The diffstat below shows the entire patchset. [akpm@linux-foundation.org: fix a few things] Signed-off-by: Nate Diller <nate.diller@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6Linus Torvalds2007-05-051-0/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (231 commits) [PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall [PATCH] i386: type may be unused [PATCH] i386: Some additional chipset register values validation. [PATCH] i386: Add missing !X86_PAE dependincy to the 2G/2G split. [PATCH] x86-64: Don't exclude asm-offsets.c in Documentation/dontdiff [PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu [PATCH] i386: white space fixes in i387.h [PATCH] i386: Drop noisy e820 debugging printks [PATCH] x86-64: Fix allnoconfig error in genapic_flat.c [PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems [PATCH] x86-64: Share identical video.S between i386 and x86-64 [PATCH] x86-64: Remove CONFIG_REORDER [PATCH] x86-64: Print type and size correctly for unknown compat ioctls [PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0) [PATCH] i386: Little cleanups in smpboot.c [PATCH] x86-64: Don't enable NUMA for a single node in K8 NUMA scanning [PATCH] x86: Use RDTSCP for synchronous get_cycles if possible [PATCH] i386: Add X86_FEATURE_RDTSCP [PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386 [PATCH] i386: Implement alternative_io for i386 ... Fix up trivial conflict in include/linux/highmem.h manually. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * [PATCH] i386: PARAVIRT: add kmap_atomic_pte for mapping highpte pagesJeremy Fitzhardinge2007-05-021-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Xen and VMI both have special requirements when mapping a highmem pte page into the kernel address space. These can be dealt with by adding a new kmap_atomic_pte() function for mapping highptes, and hooking it into the paravirt_ops infrastructure. Xen specifically wants to map the pte page RO, so this patch exposes a helper function, kmap_atomic_prot, which maps the page with the specified page protections. This also adds a kmap_flush_unused() function to clear out the cached kmap mappings. Xen needs this to clear out any potential stray RW mappings of pages which will become part of a pagetable. [ Zach - vmi.c will need some attention after this patch. It wasn't immediately obvious to me what needs to be done. ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com>
* | Convert non-highmem kmap_atomic() to static inline functionGeert Uytterhoeven2007-05-041-2/+8
|/ | | | | | | | | Convert kmap_atomic() in the non-highmem case from a macro to a static inline function, for better type-checking and the ability to pass void pointers instead of struct page pointers. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [ARM] pass vma for flush_anon_page()Russell King2007-01-081-1/+1
| | | | | | | | | | | | | | | Since get_user_pages() may be used with processes other than the current process and calls flush_anon_page(), flush_anon_page() has to cope in some way with non-current processes. It may not be appropriate, or even desirable to flush a region of virtual memory cache in the current process when that is different to the process that we want the flush to occur for. Therefore, pass the vma into flush_anon_page() so that the architecture can work out whether the 'vmaddr' is for the current process or not. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* [PATCH] Pass vma argument to copy_user_highpage().Atsushi Nemoto2006-12-131-1/+2
| | | | | | | | | | | | | To allow a more effective copy_user_highpage() on certain architectures, a vma argument is added to the function and cow_user_page() allowing the implementation of these functions to check for the VM_EXEC bit. The main part of this patch was originally written by Ralf Baechle; Atushi Nemoto did the the debugging. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Fix COW D-cache aliasing on forkAtsushi Nemoto2006-12-131-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: 1. There is a process containing two thread (T1 and T2). The thread T1 calls fork(). Then dup_mmap() function called on T1 context. static inline int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm) ... flush_cache_mm(current->mm); ... /* A */ (write-protect all Copy-On-Write pages) ... /* B */ flush_tlb_mm(current->mm); ... 2. When preemption happens between A and B (or on SMP kernel), the thread T2 can run and modify data on COW pages without page fault (modified data will stay in cache). 3. Some time after fork() completed, the thread T2 may cause a page fault by write-protect on a COW page. 4. Then data of the COW page will be copied to newly allocated physical page (copy_cow_page()). It reads data via kernel mapping. The kernel mapping can have different 'color' with user space mapping of the thread T2 (dcache aliasing). Therefore copy_cow_page() will copy stale data. Then the modified data in cache will be lost. In order to allow architecture code to deal with this problem allow architecture code to override copy_user_highpage() by defining __HAVE_ARCH_COPY_USER_HIGHPAGE in <asm/page.h>. The main part of this patch was originally written by Ralf Baechle; Atushi Nemoto did the the debugging. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] mm: k{,um}map_atomic() vs in_atomic()Peter Zijlstra2006-12-071-3/+5
| | | | | | | | | | Make kmap_atomic/kunmap_atomic denote a pagefault disabled scope. All non trivial implementations already do this anyway. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] reduce MAX_NR_ZONES: move HIGHMEM counters into highmem.c/.hChristoph Lameter2006-09-261-0/+3
| | | | | | | | | | | | | Move totalhigh_pages and nr_free_highpages() into highmem.c/.h Move the totalhigh_pages definition into highmem.c/.h. Move the nr_free_highpages function into highmem.c [yoichi_yuasa@tripeaks.co.jp: build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] update to the kernel kmap/kunmap APIJames Bottomley2006-09-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Give non-highmem architectures access to the kmap API for the purposes of overriding (this is what the attached patch does). The proposal is that we should now require all architectures with coherence issues to manage data coherence via the kmap/kunmap API. Thus driver writers never have to write code like kmap(page) modify data in page flush_kernel_dcache_page(page) kunmap(page) instead, kmap/kunmap will manage the coherence and driver (and filesystem) writers don't need to worry about how to flush between kmap and kunmap. For most architectures, the page only needs to be flushed if it was actually written to *and* there are user mappings of it, so the best implementation looks to be: clear the page dirty pte bit in the kernel page tables on kmap and on kunmap, check page->mappings for user maps, and then the dirty bit, and only flush if it both has user mappings and is dirty. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Don't include linux/config.h from anywhere else in include/David Woodhouse2006-04-261-1/+0
| | | | Signed-off-by: David Woodhouse <dwmw2@infradead.org>
* [PATCH] Add flush_kernel_dcache_page() APIJames Bottomley2006-03-261-0/+6
| | | | | | | | | | | | | | | | | | | | | We have a problem in a lot of emulated storage in that it takes a page from get_user_pages() and does something like kmap_atomic(page) modify page kunmap_atomic(page) However, nothing has flushed the kernel cache view of the page before the kunmap. We need a lightweight API to do this, so this new API would specifically be for flushing the kernel cache view of a user page which the kernel has modified. The driver would need to add flush_kernel_dcache_page(page) before the final kunmap. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Add API for flushing Anon pagesJames Bottomley2006-03-261-0/+6
| | | | | | | | | | | | | | | | Currently, get_user_pages() returns fully coherent pages to the kernel for anything other than anonymous pages. This is a problem for things like fuse and the SCSI generic ioctl SG_IO which can potentially wish to do DMA to anonymous pages passed in by users. The fix is to add a new memory management API: flush_anon_page() which is used in get_user_pages() to make anonymous pages coherent. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] kdump: Routines for copying dump pagesVivek Goyal2005-06-251-0/+1
| | | | | | | | | | | This patch provides the interfaces necessary to read the dump contents, treating it as a high memory device. Signed off by Hariprasad Nellitheertha <hari@in.ibm.com> Signed-off-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Linux-2.6.12-rc2v2.6.12-rc2Linus Torvalds2005-04-161-0/+104
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
OpenPOWER on IntegriCloud