author | Linus Torvalds <torvalds@linux-foundation.org> | 2012-12-13 13:11:15 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2012-12-13 13:11:15 -0800 |
commit | f6e858a00af788bab0fd4c0b7f5cd788000edc18 (patch) | |
tree | f9403ca3671be9821dbf83e726e61dbe75fbca6b /Documentation | |
parent | 193c0d682525987db59ac3a24531a77e4947aa95 (diff) | |
parent | 98870901cce098bbe94d90d2c41d8d1fa8d94392 (diff) | |
Merge branch 'akpm' (Andrew's patch-bomb)

Merge misc VM changes from Andrew Morton:
"The rest of most-of-MM. The other MM bits await a slab merge.

This patch includes the addition of a huge zero_page. Not a
performance boost but it can save large amounts of physical memory in
some situations.

Also a bunch of Fujitsu engineers are working on memory hotplug.
Which, as it turns out, was badly broken. About half of their patches
are included here; the remainder are 3.8 material."

However, this merge disables CONFIG_MOVABLE_NODE, which was totally
broken. We don't add new features with "default y", nor do we add
Kconfig questions that are incomprehensible to most people without any
help text. Does the feature even make sense without compaction or
memory hotplug?

* akpm: (54 commits)
mm/bootmem.c: remove unused wrapper function reserve_bootmem_generic()
mm/memory.c: remove unused code from do_wp_page()
asm-generic, mm: pgtable: consolidate zero page helpers
mm/hugetlb.c: fix warning on freeing hwpoisoned hugepage
hwpoison, hugetlbfs: fix RSS-counter warning
hwpoison, hugetlbfs: fix "bad pmd" warning in unmapping hwpoisoned hugepage
mm: protect against concurrent vma expansion
memcg: do not check for mm in __mem_cgroup_count_vm_event
tmpfs: support SEEK_DATA and SEEK_HOLE (reprise)
mm: provide more accurate estimation of pages occupied by memmap
fs/buffer.c: remove redundant initialization in alloc_page_buffers()
fs/buffer.c: do not inline exported function
writeback: fix a typo in comment
mm: introduce new field "managed_pages" to struct zone
mm, oom: remove statically defined arch functions of same name
mm, oom: remove redundant sleep in pagefault oom handler
mm, oom: cleanup pagefault oom handler
memory_hotplug: allow online/offline memory to result movable node
numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
mm, memcg: avoid unnecessary function call when memcg is disabled
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/cgroups/cpusets.txt | 2 |
-rw-r--r-- | Documentation/memory-hotplug.txt | 5 |
-rw-r--r-- | Documentation/vm/transhuge.txt | 19 |
3 files changed, 22 insertions, 4 deletions
diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt
index cefd3d8..12e01d4 100644
--- a/Documentation/cgroups/cpusets.txt
+++ b/Documentation/cgroups/cpusets.txt
@@ -218,7 +218,7 @@ and name space for cpusets, with a minimum of additional kernel code.
 The cpus and mems files in the root (top_cpuset) cpuset are
 read-only. The cpus file automatically tracks the value of
 cpu_online_mask using a CPU hotplug notifier, and the mems file
-automatically tracks the value of node_states[N_HIGH_MEMORY]--i.e.,
+automatically tracks the value of node_states[N_MEMORY]--i.e.,
 nodes with memory--using the cpuset_track_online_nodes() hook.

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index c6f993d..8e5eacb 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -390,6 +390,7 @@ struct memory_notify {
 	unsigned long start_pfn;
 	unsigned long nr_pages;
 	int status_change_nid_normal;
+	int status_change_nid_high;
 	int status_change_nid;
 }

@@ -397,7 +398,9 @@ start_pfn is start_pfn of online/offline memory.
 nr_pages is # of pages of online/offline memory.
 status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
 is (will be) set/clear, if this is -1, then nodemask status is not changed.
-status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
+status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask
+is (will be) set/clear, if this is -1, then nodemask status is not changed.
+status_change_nid is set node id when N_MEMORY of nodemask is (will be)
 set/clear. It means a new(memoryless) node gets new memory by online and a
 node loses all memory. If this is -1, then nodemask status is not changed.
 If status_changed_nid* >= 0, callback should create/discard structures for the
diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
index f734bb2..8785fb8 100644
--- a/Documentation/vm/transhuge.txt
+++ b/Documentation/vm/transhuge.txt
@@ -116,6 +116,13 @@ echo always >/sys/kernel/mm/transparent_hugepage/defrag
 echo madvise >/sys/kernel/mm/transparent_hugepage/defrag
 echo never >/sys/kernel/mm/transparent_hugepage/defrag

+By default kernel tries to use huge zero page on read page fault.
+It's possible to disable huge zero page by writing 0 or enable it
+back by writing 1:
+
+echo 0 >/sys/kernel/mm/transparent_hugepage/khugepaged/use_zero_page
+echo 1 >/sys/kernel/mm/transparent_hugepage/khugepaged/use_zero_page
+
 khugepaged will be automatically started when
 transparent_hugepage/enabled is set to "always" or "madvise, and it'll
 be automatically shutdown if it's set to "never".
@@ -197,6 +204,14 @@ thp_split is incremented every time a huge page is split into base
 	pages. This can happen for a variety of reasons but a common
 	reason is that a huge page is old and is being reclaimed.

+thp_zero_page_alloc is incremented every time a huge zero page is
+	successfully allocated. It includes allocations which where
+	dropped due race with other allocation. Note, it doesn't count
+	every map of the huge zero page, only its allocation.
+
+thp_zero_page_alloc_failed is incremented if kernel fails to allocate
+	huge zero page and falls back to using small pages.
+
 As the system ages, allocating huge pages may be expensive as the
 system uses memory compaction to copy data around memory to free a
 huge page for use. There are some counters in /proc/vmstat to help
@@ -276,7 +291,7 @@ unaffected. libhugetlbfs will also work fine as usual.
 == Graceful fallback ==

 Code walking pagetables but unware about huge pmds can simply call
-split_huge_page_pmd(mm, pmd) where the pmd is the one returned by
+split_huge_page_pmd(vma, addr, pmd) where the pmd is the one returned by
 pmd_offset. It's trivial to make the code transparent hugepage aware
 by just grepping for "pmd_offset" and adding split_huge_page_pmd where
 missing after pmd_offset returns the pmd. Thanks to the graceful
@@ -299,7 +314,7 @@ diff --git a/mm/mremap.c b/mm/mremap.c
 		return NULL;

 	pmd = pmd_offset(pud, addr);
-+	split_huge_page_pmd(mm, pmd);
++	split_huge_page_pmd(vma, addr, pmd);
 	if (pmd_none_or_clear_bad(pmd))
 		return NULL;
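
The transhuge.txt hunk above documents two new /proc/vmstat counters, thp_zero_page_alloc and thp_zero_page_alloc_failed. As a rough userspace illustration (not part of this merge), the sketch below pulls those two counters out of vmstat-formatted text. The helper names and the sample values are hypothetical; only the counter names and the /proc/vmstat location come from the documentation change.

```python
# Sketch only: extract the huge zero page counters from /proc/vmstat text.
# Counter names are taken from the transhuge.txt change; the function names
# and the sample numbers below are made up for illustration.

ZERO_PAGE_COUNTERS = ("thp_zero_page_alloc", "thp_zero_page_alloc_failed")


def parse_zero_page_counters(vmstat_text):
    """Return {counter_name: int} for the huge zero page counters present."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        # Each /proc/vmstat line has the form "<name> <value>".
        if len(parts) == 2 and parts[0] in ZERO_PAGE_COUNTERS:
            counters[parts[0]] = int(parts[1])
    return counters


def read_zero_page_counters(path="/proc/vmstat"):
    """Read the counters from a live system; absent counters are omitted."""
    with open(path) as f:
        return parse_zero_page_counters(f.read())


if __name__ == "__main__":
    # Hypothetical sample so the sketch also runs on kernels without
    # these counters.
    sample = "thp_split 7\nthp_zero_page_alloc 3\nthp_zero_page_alloc_failed 0\n"
    print(parse_zero_page_counters(sample))
```

On a kernel that lacks these counters, read_zero_page_counters() simply returns an empty dict, so a caller can tell "feature not present" apart from "zero allocations".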