| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
linkages"
Commit 05f144a0d5c2 ("mm: mempolicy: Let vma_merge and vma_split handle
vma->vm_policy linkages") removed vma->vm_policy updates code but it is
the purpose of mbind_range(). Now, mbind_range() is virtually a no-op
and while it does not allow memory corruption it is not the right fix.
This patch is a revert.
[mgorman@suse.de: Edited changelog]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
made available
While compaction is migrating pages to free up large contiguous blocks
for allocation it races with other allocation requests that may steal
these blocks or break them up. This patch alters direct compaction to
capture a suitable free page as soon as it becomes available to reduce
this race. It uses similar logic to split_free_page() to ensure that
watermarks are still obeyed.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
failures
If allocation fails after compaction then compaction may be deferred for
a number of allocation attempts. If there are subsequent failures,
compact_defer_shift is increased to defer for longer periods. This
patch uses that information to scale the number of pages reclaimed with
compact_defer_shift until allocations succeed again. The rationale is
that reclaiming the normal number of pages still allowed compaction to
fail and its success depends on the number of pages. If it's failing,
reclaim more pages until it succeeds again.
Note that this is not implying that VM reclaim is not reclaiming enough
pages or that its logic is broken. try_to_free_pages() always asks for
SWAP_CLUSTER_MAX pages to be reclaimed regardless of order and that is
what it does. Direct reclaim stops normally with this check.
if (sc->nr_reclaimed >= sc->nr_to_reclaim)
goto out;
should_continue_reclaim delays when that check is made until a minimum
number of pages for reclaim/compaction are reclaimed. It is possible
that this patch could instead set nr_to_reclaim in try_to_free_pages()
and drive it from there but that's behaves differently and not
necessarily for the better. If driven from do_try_to_free_pages(), it
is also possible that priorities will rise.
When they reach DEF_PRIORITY-2, it will also start stalling and setting
pages for immediate reclaim which is more disruptive than not desirable
in this case. That is a more wide-reaching change that could cause
another regression related to THP requests causing interactive jitter.
[akpm@linux-foundation.org: fix build]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allocation success rates have been far lower since 3.4 due to commit
fe2c2a106663 ("vmscan: reclaim at order 0 when compaction is enabled").
This commit was introduced for good reasons and it was known in advance
that the success rates would suffer but it was justified on the grounds
that the high allocation success rates were achieved by aggressive
reclaim. Success rates are expected to suffer even more in 3.6 due to
commit 7db8889ab05b ("mm: have order > 0 compaction start off where it
left") which testing has shown to severely reduce allocation success
rates under load - to 0% in one case.
This series aims to improve the allocation success rates without
regressing the benefits of commit fe2c2a106663. The series is based on
latest mmotm and takes into account the __GFP_NO_KSWAPD flag is going
away.
Patch 1 updates a stale comment seeing as I was in the general area.
Patch 2 updates reclaim/compaction to reclaim pages scaled on the number
of recent failures.
Patch 3 captures suitable high-order pages freed by compaction to reduce
races with parallel allocation requests.
Patch 4 fixes the upstream commit [7db8889a: mm: have order > 0 compaction
start off where it left] to enable compaction again
Patch 5 identifies when compacion is taking too long due to contention
and aborts.
STRESS-HIGHALLOC
3.6-rc1-akpm full-series
Pass 1 36.00 ( 0.00%) 51.00 (15.00%)
Pass 2 42.00 ( 0.00%) 63.00 (21.00%)
while Rested 86.00 ( 0.00%) 86.00 ( 0.00%)
From
http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__stress-highalloc-performance-ext3/hydra/comparison.html
I know that the allocation success rates in 3.3.6 was 78% in comparison
to 36% in in the current akpm tree. With the full series applied, the
success rates are up to around 51% with some variability in the results.
This is not as high a success rate but it does not reclaim excessively
which is a key point.
MMTests Statistics: vmstat
Page Ins 3050912 3078892
Page Outs 8033528 8039096
Swap Ins 0 0
Swap Outs 0 0
Note that swap in/out rates remain at 0. In 3.3.6 with 78% success rates
there were 71881 pages swapped out.
Direct pages scanned 70942 122976
Kswapd pages scanned 1366300 1520122
Kswapd pages reclaimed 1366214 1484629
Direct pages reclaimed 70936 105716
Kswapd efficiency 99% 97%
Kswapd velocity 1072.550 1182.615
Direct efficiency 99% 85%
Direct velocity 55.690 95.672
The kswapd velocity changes very little as expected. kswapd velocity is
around the 1000 pages/sec mark where as in kernel 3.3.6 with the high
allocation success rates it was 8140 pages/second. Direct velocity is
higher as a result of patch 2 of the series but this is expected and is
acceptable. The direct reclaim and kswapd velocities change very little.
If these get accepted for merging then there is a difficulty in how they
should be handled. 7db8889a ("mm: have order > 0 compaction start off
where it left") is broken but it is already in 3.6-rc1 and needs to be
fixed. However, if just patch 4 from this series is applied then Jim
Schutt's workload is known to break again as his workload also requires
patch 5. While it would be preferred to have all these patches in 3.6 to
improve compaction in general, it would at least be acceptable if just
patches 4 and 5 were merged to 3.6 to fix a known problem without breaking
compaction completely. On the face of it, that would force
__GFP_NO_KSWAPD patches to be merged at the same time but I can do a
version of this series with __GFP_NO_KSWAPD change reverted and then
rebase it on top of this series. That might be best overall because I
note that the __GFP_NO_KSWAPD patch should have removed
deferred_compaction from page_alloc.c but it didn't but fixing that causes
collisions with this series.
This patch:
The comment about order applied when the check was order >
PAGE_ALLOC_COSTLY_ORDER which has not been the case since c5a73c3d ("thp:
use compaction for all allocation orders"). Fixing the comment while I'm
in the general area.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
People get confused by find_vma_prepare(), because it doesn't care about
what it returns in its output args, when its callers won't be interested.
Clarify by passing in end-of-range address too, and returning failure if
any existing vma overlaps the new range: instead of returning an ambiguous
vma which most callers then must check. find_vma_links() is a clearer
name.
This does revert 2.6.27's dfe195fb79e88 ("mm: fix uninitialized variables
for find_vma_prepare callers"), but it looks like gcc 4.3.0 was one of
those releases too eager to shout about uninitialized variables: only
copy_vma() warns with 4.5.1 and 4.7.1, which a BUG on error silences.
[hughd@google.com: fix warning, remove BUG()]
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Benny Halevy <bhalevy@tonian.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When writing a new file with 2048 bytes buffer, such as write(fd, buffer,
2048), it will call generic_perform_write() twice for every page:
write_begin
mark_page_accessed(page)
write_end
write_begin
mark_page_accessed(page)
write_end
Pages 1-13 will be added to lru-pvecs in write_begin() and will *NOT* be
added to active_list even they have be accessed twice because they are not
PageLRU(page). But when page 14th comes, all pages in lru-pvecs will be
moved to inactive_list (by __lru_cache_add() ) in first write_begin(), now
page 14th *is* PageLRU(page). And after second write_end() only page 14th
will be in active_list.
In Hadoop environment, we do comes to this situation: after writing a
file, we find out that only 14th, 28th, 42th... page are in active_list
and others in inactive_list. Now kswapd works, shrinks the inactive_list,
the file only have 14th, 28th...pages in memory, the readahead request
size will be broken to only 52k (13*4k), system's performance falls
dramatically.
This problem can also replay by below steps (the machine has 8G memory):
1. dd if=/dev/zero of=/test/file.out bs=1024 count=1048576
2. cat another 7.5G file to /dev/null
3. vmtouch -m 1G -v /test/file.out, it will show:
/test/file.out
[oooooooooooooooooooOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO] 187847/262144
the 'o' means same pages are in memory but same are not.
The solution for this problem is simple: the 14th page should be added to
lru_add_pvecs before mark_page_accessed() just as other pages.
[akpm@linux-foundation.org: tweak comment]
[akpm@linux-foundation.org: grab better comment from the v3 patch]
Signed-off-by: Robin Dong <sanbai@taobao.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
currently it lost original meaning but still has some effects:
| effect | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
This patch removes reserved_vm counter from mm_struct. Seems like nobody
cares about it, it does not exported into userspace directly, it only
reduces total_vm showed in proc.
Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.
remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.
[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rename VM_NODUMP into VM_DONTDUMP: this name matches other negative flags:
VM_DONTEXPAND, VM_DONTCOPY. Currently this flag used only for
sys_madvise. The next patch will use it for replacing the outdated flag
VM_RESERVED.
Also forbid madvise(MADV_DODUMP) for special kernel mappings VM_SPECIAL
(VM_IO | VM_DONTEXPAND | VM_RESERVED | VM_PFNMAP)
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the kernel sets mm->exe_file during sys_execve() and then tracks
number of vmas with VM_EXECUTABLE flag in mm->num_exe_file_vmas, as soon
as this counter drops to zero kernel resets mm->exe_file to NULL. Plus it
resets mm->exe_file at last mmput() when mm->mm_users drops to zero.
VMA with VM_EXECUTABLE flag appears after mapping file with flag
MAP_EXECUTABLE, such vmas can appears only at sys_execve() or after vma
splitting, because sys_mmap ignores this flag. Usually binfmt module sets
mm->exe_file and mmaps executable vmas with this file, they hold
mm->exe_file while task is running.
comment from v2.6.25-6245-g925d1c4 ("procfs task exe symlink"),
where all this stuff was introduced:
> The kernel implements readlink of /proc/pid/exe by getting the file from
> the first executable VMA. Then the path to the file is reconstructed and
> reported as the result.
>
> Because of the VMA walk the code is slightly different on nommu systems.
> This patch avoids separate /proc/pid/exe code on nommu systems. Instead of
> walking the VMAs to find the first executable file-backed VMA we store a
> reference to the exec'd file in the mm_struct.
>
> That reference would prevent the filesystem holding the executable file
> from being unmounted even after unmapping the VMAs. So we track the number
> of VM_EXECUTABLE VMAs and drop the new reference when the last one is
> unmapped. This avoids pinning the mounted filesystem.
exe_file's vma accounting is hooked into every file mmap/unmmap and vma
split/merge just to fix some hypothetical pinning fs from umounting by mm,
which already unmapped all its executable files, but still alive.
Seems like currently nobody depends on this behaviour. We can try to
remove this logic and keep mm->exe_file until final mmput().
mm->exe_file is still protected with mm->mmap_sem, because we want to
change it via new sys_prctl(PR_SET_MM_EXE_FILE). Also via this syscall
task can change its mm->exe_file and unpin mountpoint explicitly.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move actual pte filling for non-linear file mappings into the new special
vma operation: ->remap_pages().
Filesystems must implement this method to get non-linear mapping support,
if it uses filemap_fault() then generic_file_remap_pages() can be used.
Now device drivers can implement this method and obtain nonlinear vma support.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com> #arch/tile
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Merge VM_INSERTPAGE into VM_MIXEDMAP. VM_MIXEDMAP VMA can mix pure-pfn
ptes, special ptes and normal ptes.
Now copy_page_range() always copies VM_MIXEDMAP VMA on fork like
VM_PFNMAP. If driver populates whole VMA at mmap() it probably not
expects page-faults.
This patch removes special check from vma_wants_writenotify() which
disables pages write tracking for VMA populated via vm_instert_page().
BDI below mapped file should not use dirty-accounting, moreover
do_wp_page() can handle this.
vm_insert_page() still marks vma after first usage. Usually it is called
from f_op->mmap() handler under mm->mmap_sem write-lock, so it able to
change vma->vm_flags. Caller must set VM_MIXEDMAP at mmap time if it
wants to call this function from other places, for example from page-fault
handler.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Combine several arch-specific vma flags into one.
before patch:
0x00000200 0x01000000 0x20000000 0x40000000
x86 VM_NOHUGEPAGE VM_HUGEPAGE - VM_PAT
powerpc - - VM_SAO -
parisc VM_GROWSUP - - -
ia64 VM_GROWSUP - - -
nommu - VM_MAPPED_COPY - -
others - - - -
after patch:
0x00000200 0x01000000 0x20000000 0x40000000
x86 - VM_PAT VM_HUGEPAGE VM_NOHUGEPAGE
powerpc - VM_SAO - -
parisc - VM_GROWSUP - -
ia64 - VM_GROWSUP - -
nommu - VM_MAPPED_COPY - -
others - VM_ARCH_1 - -
And voila! One completely free bit.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace the generic vma-flag VM_PFN_AT_MMAP with x86-only VM_PAT.
We can toss mapping address from remap_pfn_range() into
track_pfn_vma_new(), and collect all PAT-related logic together in
arch/x86/.
This patch also restores orignal frustration-free is_cow_mapping() check
in remap_pfn_range(), as it was before commit v2.6.28-rc8-88-g3c8bb73
("x86: PAT: store vm_pgoff for all linear_over_vma_region mappings - v3")
is_linear_pfn_mapping() checks can be removed from mm/huge_memory.c,
because it already handled by VM_PFNMAP in VM_NO_THP bit-mask.
[suresh.b.siddha@intel.com: Reset the VM_PAT flag as part of untrack_pfn_vma()]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vm_insert_pfn
With PAT enabled, vm_insert_pfn() looks up the existing pfn memory
attribute and uses it. Expectation is that the driver reserves the
memory attributes for the pfn before calling vm_insert_pfn().
remap_pfn_range() (when called for the whole vma) will setup a new
attribute (based on the prot argument) for the specified pfn range.
This addresses the legacy usage which typically calls remap_pfn_range()
with a desired memory attribute. For ranges smaller than the vma size
(which is typically not the case), remap_pfn_range() will use the
existing memory attribute for the pfn range.
Expose two different API's for these different behaviors.
track_pfn_insert() for tracking the pfn attribute set by vm_insert_pfn()
and track_pfn_remap() for the remap_pfn_range().
This cleanup also prepares the ground for the track/untrack pfn vma
routines to take over the ownership of setting PAT specific vm_flag in
the 'vma'.
[khlebnikov@openvz.org: Clear checks in track_pfn_remap()]
[akpm@linux-foundation.org: tweak a few comments]
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When transparent huge pages were introduced, memory compaction and swap
storms were an issue, and the kernel had to be careful to not make THP
allocations cause pageout or compaction.
Now that we have working compaction deferral, kswapd is smart enough to
invoke compaction and the quadratic behaviour around isolate_free_pages
has been fixed, it should be safe to remove __GFP_NO_KSWAPD.
[minchan@kernel.org: Comment fix]
[mgorman@suse.de: Avoid direct reclaim for deferred compaction]
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
Pull SLAB changes from Pekka Enberg:
"New and noteworthy:
* More SLAB allocator unification patches from Christoph Lameter and
others. This paves the way for slab memcg patches that hopefully
will land in v3.8.
* SLAB tracing improvements from Ezequiel Garcia.
* Kernel tainting upon SLAB corruption from Dave Jones.
* Miscellanous SLAB allocator bug fixes and improvements from various
people."
* 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (43 commits)
slab: Fix build failure in __kmem_cache_create()
slub: init_kmem_cache_cpus() and put_cpu_partial() can be static
mm/slab: Fix kmem_cache_alloc_node_trace() declaration
Revert "mm/slab: Fix kmem_cache_alloc_node_trace() declaration"
mm, slob: fix build breakage in __kmalloc_node_track_caller
mm/slab: Fix kmem_cache_alloc_node_trace() declaration
mm/slab: Fix typo _RET_IP -> _RET_IP_
mm, slub: Rename slab_alloc() -> slab_alloc_node() to match SLAB
mm, slab: Rename __cache_alloc() -> slab_alloc()
mm, slab: Match SLAB and SLUB kmem_cache_alloc_xxx_trace() prototype
mm, slab: Replace 'caller' type, void* -> unsigned long
mm, slob: Add support for kmalloc_track_caller()
mm, slab: Remove silly function slab_buffer_size()
mm, slob: Use NUMA_NO_NODE instead of -1
mm, sl[au]b: Taint kernel when we detect a corrupted slab
slab: Only define slab_error for DEBUG
slab: fix the DEADLOCK issue on l3 alien lock
slub: Zero initial memory segment for kmem_cache and kmem_cache_node
Revert "mm/sl[aou]b: Move sysfs_slab_add to common"
mm/sl[aou]b: Move kmem_cache refcounting to common code
...
|
| |\ |
|
| | |
| | |
| | |
| | |
| | | |
This reverts commit 1e5965bf1f018cc30a4659fa3f1a40146e4276f6. Ezequiel
Garcia has a better fix.
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
On Sat, 8 Sep 2012, Ezequiel Garcia wrote:
> @@ -454,15 +455,35 @@ void *__kmalloc_node(size_t size, gfp_t gfp, int node)
> gfp |= __GFP_COMP;
> ret = slob_new_pages(gfp, order, node);
>
> - trace_kmalloc_node(_RET_IP_, ret,
> + trace_kmalloc_node(caller, ret,
> size, PAGE_SIZE << order, gfp, node);
> }
>
> kmemleak_alloc(ret, size, 1, gfp);
> return ret;
> }
> +
> +void *__kmalloc_node(size_t size, gfp_t gfp, int node)
> +{
> + return __do_kmalloc_node(size, gfp, node, _RET_IP_);
> +}
> EXPORT_SYMBOL(__kmalloc_node);
>
> +#ifdef CONFIG_TRACING
> +void *__kmalloc_track_caller(size_t size, gfp_t gfp, unsigned long caller)
> +{
> + return __do_kmalloc_node(size, gfp, NUMA_NO_NODE, caller);
> +}
> +
> +#ifdef CONFIG_NUMA
> +void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
> + int node, unsigned long caller)
> +{
> + return __do_kmalloc_node(size, gfp, node, caller);
> +}
> +#endif
This breaks Pekka's slab/next tree with this:
mm/slob.c: In function '__kmalloc_node_track_caller':
mm/slob.c:488: error: 'gfp' undeclared (first use in this function)
mm/slob.c:488: error: (Each undeclared identifier is reported only once
mm/slob.c:488: error: for each function it appears in.)
mm, slob: fix build breakage in __kmalloc_node_track_caller
"mm, slob: Add support for kmalloc_track_caller()" breaks the build
because gfp is undeclared. Fix it.
Acked-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The bug was introduced in commit 4052147c0afa ("mm, slab: Match SLAB
and SLUB kmem_cache_alloc_xxx_trace() prototype").
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The bug was introduced by commit 7c0cb9c64f83 ("mm, slab: Replace
'caller' type, void* -> unsigned long").
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Fix up a trivial conflict with NUMA_NO_NODE cleanups.
Conflicts:
mm/slob.c
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
On Tue, 11 Sep 2012, Stephen Rothwell wrote:
> After merging the final tree, today's linux-next build (sparc64 defconfig)
> produced this warning:
>
> mm/slab.c:808:13: warning: '__slab_error' defined but not used [-Wunused-function]
>
> Introduced by commit 945cf2b6199b ("mm/sl[aou]b: Extract a common
> function for kmem_cache_destroy"). All uses of slab_error() are now
> guarded by DEBUG.
There is no use case left for slab builds without DEBUG.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Tony Luck reported the following problem on IA-64:
Worked fine yesterday on next-20120905, crashes today. First sign of
trouble was an unaligned access, then a NULL dereference. SL*B related
bits of my config:
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_SLABINFO=y
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set
And he console log.
PID hash table entries: 4096 (order: 1, 32768 bytes)
Dentry cache hash table entries: 262144 (order: 7, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 6, 1048576 bytes)
Memory: 2047920k/2086064k available (13992k code, 38144k reserved,
6012k data, 880k init)
kernel unaligned access to 0xca2ffc55fb373e95, ip=0xa0000001001be550
swapper[0]: error during unaligned kernel access
-1 [1]
Modules linked in:
Pid: 0, CPU 0, comm: swapper
psr : 00001010084a2018 ifs : 800000000000060f ip :
[<a0000001001be550>] Not tainted (3.6.0-rc4-zx1-smp-next-20120906)
ip is at new_slab+0x90/0x680
unat: 0000000000000000 pfs : 000000000000060f rsc : 0000000000000003
rnat: 9666960159966a59 bsps: a0000001001441c0 pr : 9666960159965a59
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a0000001001be500 b6 : a00000010112cb20 b7 : a0000001011660a0
f6 : 0fff7f0f0f0f0e54f0000 f7 : 0ffe8c5c1000000000000
f8 : 1000d8000000000000000 f9 : 100068800000000000000
f10 : 10005f0f0f0f0e54f0000 f11 : 1003e0000000000000078
r1 : a00000010155eef0 r2 : 0000000000000000 r3 : fffffffffffc1638
r8 : e0000040600081b8 r9 : ca2ffc55fb373e95 r10 : 0000000000000000
r11 : e000004040001646 r12 : a000000101287e20 r13 : a000000101280000
r14 : 0000000000004000 r15 : 0000000000000078 r16 : ca2ffc55fb373e75
r17 : e000004040040000 r18 : fffffffffffc1646 r19 : e000004040001646
r20 : fffffffffffc15f8 r21 : 000000000000004d r22 : a00000010132fa68
r23 : 00000000000000ed r24 : 0000000000000000 r25 : 0000000000000000
r26 : 0000000000000001 r27 : a0000001012b8500 r28 : a00000010135f4a0
r29 : 0000000000000000 r30 : 0000000000000000 r31 : 0000000000000001
Unable to handle kernel NULL pointer dereference (address
0000000000000018)
swapper[0]: Oops 11003706212352 [2]
Modules linked in:
Pid: 0, CPU 0, comm: swapper
psr : 0000121008022018 ifs : 800000000000cc18 ip :
[<a0000001004dc8f1>] Not tainted (3.6.0-rc4-zx1-smp-next-20120906)
ip is at __copy_user+0x891/0x960
unat: 0000000000000000 pfs : 0000000000000813 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr : 9666960159961765
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a00000010004b550 b6 : a00000010004b740 b7 : a00000010000c750
f6 : 000000000000000000000 f7 : 1003e9e3779b97f4a7c16
f8 : 1003e0a00000010001550 f9 : 100068800000000000000
f10 : 10005f0f0f0f0e54f0000 f11 : 1003e0000000000000078
r1 : a00000010155eef0 r2 : a0000001012870b0 r3 : a0000001012870b8
r8 : 0000000000000298 r9 : 0000000000000013 r10 : 0000000000000000
r11 : 9666960159961a65 r12 : a000000101287010 r13 : a000000101280000
r14 : a000000101287068 r15 : a000000101287080 r16 : 0000000000000298
r17 : 0000000000000010 r18 : 0000000000000018 r19 : a000000101287310
r20 : 0000000000000290 r21 : 0000000000000000 r22 : 0000000000000000
r23 : a000000101386f58 r24 : 0000000000000000 r25 : 000000007fffffff
r26 : a000000101287078 r27 : a0000001013c69b0 r28 : 0000000000000000
r29 : 0000000000000014 r30 : 0000000000000000 r31 : 0000000000000813
Sedat Dilek and Hugh Dickins reported similar problems as well.
Earlier patches in the common set moved the zeroing of the kmem_cache
structure into common code. See "Move allocation of kmem_cache into
common code".
The allocation for the two special structures is still done from SLUB
specific code but no zeroing is done since the cache creation functions
used to zero. This now needs to be updated so that the structures are
zeroed during allocation in kmem_cache_init(). Otherwise random pointer
values may be followed.
Reported-by: Tony Luck <tony.luck@intel.com>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reported-by: Hugh Dickins <hughd@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This reverts commit 96d17b7be0a9849d381442030886211dbb2a7061 which
caused the following errors at boot:
[ 1.114885] kobject (ffff88001a802578): tried to init an initialized object, something is seriously wrong.
[ 1.114885] Pid: 1, comm: swapper/0 Tainted: G W 3.6.0-rc1+ #6
[ 1.114885] Call Trace:
[ 1.114885] [<ffffffff81273f37>] kobject_init+0x87/0xa0
[ 1.115555] [<ffffffff8127426a>] kobject_init_and_add+0x2a/0x90
[ 1.115555] [<ffffffff8127c870>] ? sprintf+0x40/0x50
[ 1.115555] [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
[ 1.115555] [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
[ 1.115555] [<ffffffff81cf24cd>] ? md_init+0x144/0x144
[ 1.115555] [<ffffffff81cf25b6>] local_init+0xa4/0x11b
[ 1.115555] [<ffffffff81cf24e1>] dm_init+0x14/0x45
[ 1.115836] [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
[ 1.116834] [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
[ 1.117835] [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
[ 1.117835] [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
[ 1.118401] [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
[ 1.119832] [<ffffffff8171aff0>] ? gs_change+0xb/0xb
[ 1.120325] ------------[ cut here ]------------
[ 1.120835] WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0xc1/0xf0()
[ 1.121437] sysfs: cannot create duplicate filename '/kernel/slab/:t-0000016'
[ 1.121831] Modules linked in:
[ 1.122138] Pid: 1, comm: swapper/0 Tainted: G W 3.6.0-rc1+ #6
[ 1.122831] Call Trace:
[ 1.123074] [<ffffffff81195ce1>] ? sysfs_add_one+0xc1/0xf0
[ 1.123833] [<ffffffff8103adfa>] warn_slowpath_common+0x7a/0xb0
[ 1.124405] [<ffffffff8103aed1>] warn_slowpath_fmt+0x41/0x50
[ 1.124832] [<ffffffff81195ce1>] sysfs_add_one+0xc1/0xf0
[ 1.125337] [<ffffffff81195eb3>] create_dir+0x73/0xd0
[ 1.125832] [<ffffffff81196221>] sysfs_create_dir+0x81/0xe0
[ 1.126363] [<ffffffff81273d3d>] kobject_add_internal+0x9d/0x210
[ 1.126832] [<ffffffff812742a3>] kobject_init_and_add+0x63/0x90
[ 1.127406] [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
[ 1.127832] [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
[ 1.128384] [<ffffffff81cf24cd>] ? md_init+0x144/0x144
[ 1.128833] [<ffffffff81cf25b6>] local_init+0xa4/0x11b
[ 1.129831] [<ffffffff81cf24e1>] dm_init+0x14/0x45
[ 1.130305] [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
[ 1.130831] [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
[ 1.131351] [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
[ 1.131830] [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
[ 1.132392] [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
[ 1.132830] [<ffffffff8171aff0>] ? gs_change+0xb/0xb
[ 1.133315] ---[ end trace 2703540871c8fab7 ]---
[ 1.133830] ------------[ cut here ]------------
[ 1.134274] WARNING: at lib/kobject.c:196 kobject_add_internal+0x1f5/0x210()
[ 1.134829] kobject_add_internal failed for :t-0000016 with -EEXIST, don't try to register things with the same name in the same directory.
[ 1.135829] Modules linked in:
[ 1.136135] Pid: 1, comm: swapper/0 Tainted: G W 3.6.0-rc1+ #6
[ 1.136828] Call Trace:
[ 1.137071] [<ffffffff81273e95>] ? kobject_add_internal+0x1f5/0x210
[ 1.137830] [<ffffffff8103adfa>] warn_slowpath_common+0x7a/0xb0
[ 1.138402] [<ffffffff8103aed1>] warn_slowpath_fmt+0x41/0x50
[ 1.138830] [<ffffffff811955a3>] ? release_sysfs_dirent+0x73/0xf0
[ 1.139419] [<ffffffff81273e95>] kobject_add_internal+0x1f5/0x210
[ 1.139830] [<ffffffff812742a3>] kobject_init_and_add+0x63/0x90
[ 1.140429] [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
[ 1.140830] [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
[ 1.141829] [<ffffffff81cf24cd>] ? md_init+0x144/0x144
[ 1.142307] [<ffffffff81cf25b6>] local_init+0xa4/0x11b
[ 1.142829] [<ffffffff81cf24e1>] dm_init+0x14/0x45
[ 1.143307] [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
[ 1.143829] [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
[ 1.144352] [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
[ 1.144829] [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
[ 1.145405] [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
[ 1.145828] [<ffffffff8171aff0>] ? gs_change+0xb/0xb
[ 1.146313] ---[ end trace 2703540871c8fab8 ]---
Conflicts:
mm/slub.c
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Get rid of the refcount stuff in the allocators and do that part of
kmem_cache management in the common code.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Do the initial settings of the fields in common code. This will allow us
to push more processing into common code later and improve readability.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Shift the allocations to common code. That way the allocation and
freeing of the kmem_cache structures is handled by common code.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Simplify locking by moving the slab_add_sysfs after all locks have been
dropped. Eases the upcoming move to provide sysfs support for all
allocators.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The slab aliasing logic causes some strange contortions in slub. So add
a call to deal with aliases to slab_common.c but disable it for other
slab allocators by providng stubs that fail to create aliases.
Full general support for aliases will require additional cleanup passes
and more standardization of fields in kmem_cache.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Duping of the slabname has to be done by each slab. Moving this code to
slab_common avoids duplicate implementations.
With this patch we have common string handling for all slab allocators.
Strings passed to kmem_cache_create() are copied internally. Subsystems
can create temporary strings to create slab caches.
Slabs allocated in early states of bootstrap will never be freed (and
those can never be freed since they are essential to slab allocator
operations). During bootstrap we therefore do not have to worry about
duping names.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
What is done there can be done in __kmem_cache_shutdown.
This affects RCU handling somewhat. On rcu free all slab allocators do
not refer to other management structures than the kmem_cache structure.
Therefore these other structures can be freed before the rcu deferred
free to the page allocator occurs.
Reviewed-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The freeing action is basically the same in all slab allocators.
Move to the common kmem_cache_destroy() function.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Make all allocators use the "kmem_cache" slabname for the "kmem_cache"
structure.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
kmem_cache_destroy does basically the same in all allocators.
Extract common code which is easy since we already have common mutex
handling.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Move the code to append the new kmem_cache to the list of slab caches to
the kmem_cache_create code in the shared code.
This is possible now since the acquisition of the mutex was moved into
kmem_cache_create().
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Instead of using s == NULL use an errorcode. This allows much more
detailed diagnostics as to what went wrong. As we add more functionality
from the slab allocators to the common kmem_cache_create() function we will
also add more error conditions.
Print the error code during the panic as well as in a warning if the module
can handle failure. The API for kmem_cache_create() currently does not allow
the returning of an error code. Return NULL but log the cause of the problem
in the syslog.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Do not use kmalloc() but kmem_cache_alloc() for the allocation
of the kmem_cache structures in slub.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Add additional debugging to check that the objects is actually from the cache
the caller claims. Doing so currently trips up some other debugging code. It
takes a lot to infer from that what was happening.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
[ penberg@kernel.org: Use pr_err() ]
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| |\ \ \ |
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Fix build failure with CONFIG_DEBUG_SLAB=y && CONFIG_DEBUG_PAGEALLOC=y caused
by commit 8a13a4cc "mm/sl[aou]b: Shrink __kmem_cache_create() parameter lists".
mm/slab.c: In function '__kmem_cache_create':
mm/slab.c:2474: error: 'align' undeclared (first use in this function)
mm/slab.c:2474: error: (Each undeclared identifier is reported only once
mm/slab.c:2474: error: for each function it appears in.)
make[1]: *** [mm/slab.o] Error 1
make: *** [mm] Error 2
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |/
| | |/|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Acked-by: Glauber Costa <glommer@parallels.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This patch does not fix anything, and its only goal is to enable us
to obtain some common code between SLAB and SLUB.
Neither behavior nor produced code is affected.
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This patch does not fix anything and its only goal is to
produce common code between SLAB and SLUB.
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This long (seemingly unnecessary) patch does not fix anything and
its only goal is to produce common code between SLAB and SLUB.
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This allows to use _RET_IP_ instead of builtin_address(0), thus
achiveing implementation consistency in all three allocators.
Though maybe a nitpick, the real goal behind this patch is
to be able to obtain common code between SLAB and SLUB.
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently slob falls back to regular kmalloc for this case.
With this patch kmalloc_track_caller() is correctly implemented,
thus tracing the specified caller.
This is important to trace accurately allocations performed by
krealloc, kstrdup, kmemdup, etc.
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This function is seldom used, and can be simply replaced with cachep->size.
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
It doesn't seem worth adding a new taint flag for this, so just re-use
the one from 'bad page'
Acked-by: Christoph Lameter <cl@linux.com> # SLUB
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|