summaryrefslogtreecommitdiffstats
path: root/Documentation/controllers/memory.txt
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/controllers/memory.txt')
-rw-r--r--Documentation/controllers/memory.txt135
1 files changed, 125 insertions, 10 deletions
diff --git a/Documentation/controllers/memory.txt b/Documentation/controllers/memory.txt
index 1c07547..e150196 100644
--- a/Documentation/controllers/memory.txt
+++ b/Documentation/controllers/memory.txt
@@ -137,7 +137,32 @@ behind this approach is that a cgroup that aggressively uses a shared
page will eventually get charged for it (once it is uncharged from
the cgroup that brought it in -- this will happen on memory pressure).
-2.4 Reclaim
+Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used..
+When you do swapoff and make swapped-out pages of shmem(tmpfs) to
+be backed into memory in force, charges for pages are accounted against the
+caller of swapoff rather than the users of shmem.
+
+
+2.4 Swap Extension (CONFIG_CGROUP_MEM_RES_CTLR_SWAP)
+Swap Extension allows you to record charge for swap. A swapped-in page is
+charged back to original page allocator if possible.
+
+When swap is accounted, following files are added.
+ - memory.memsw.usage_in_bytes.
+ - memory.memsw.limit_in_bytes.
+
+usage of mem+swap is limited by memsw.limit_in_bytes.
+
+Note: why 'mem+swap' rather than swap.
+The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
+to move account from memory to swap...there is no change in usage of
+mem+swap.
+
+In other words, when we want to limit the usage of swap without affecting
+global LRU, mem+swap limit is better than just limiting swap from OS point
+of view.
+
+2.5 Reclaim
Each cgroup maintains a per cgroup LRU that consists of an active
and inactive list. When a cgroup goes over its limit, we first try
@@ -207,12 +232,6 @@ exceeded.
The memory.stat file gives accounting information. Now, the number of
caches, RSS and Active pages/Inactive pages are shown.
-The memory.force_empty gives an interface to drop *all* charges by force.
-
-# echo 1 > memory.force_empty
-
-will drop all charges in cgroup. Currently, this is maintained for test.
-
4. Testing
Balbir posted lmbench, AIM9, LTP and vmmstress results [10] and [11].
@@ -242,10 +261,106 @@ reclaimed.
A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a
cgroup might have some charge associated with it, even though all
-tasks have migrated away from it. Such charges are automatically dropped at
-rmdir() if there are no tasks.
+tasks have migrated away from it.
+Such charges are freed(at default) or moved to its parent. When moved,
+both of RSS and CACHES are moved to parent.
+If both of them are busy, rmdir() returns -EBUSY. See 5.1 Also.
+
+Charges recorded in swap information is not updated at removal of cgroup.
+Recorded information is discarded and a cgroup which uses swap (swapcache)
+will be charged as a new owner of it.
+
+
+5. Misc. interfaces.
+
+5.1 force_empty
+ memory.force_empty interface is provided to make cgroup's memory usage empty.
+ You can use this interface only when the cgroup has no tasks.
+ When writing anything to this
+
+ # echo 0 > memory.force_empty
+
+ Almost all pages tracked by this memcg will be unmapped and freed. Some of
+ pages cannot be freed because it's locked or in-use. Such pages are moved
+ to parent and this cgroup will be empty. But this may return -EBUSY in
+ some too busy case.
+
+ Typical use case of this interface is that calling this before rmdir().
+ Because rmdir() moves all pages to parent, some out-of-use page caches can be
+ moved to the parent. If you want to avoid that, force_empty will be useful.
+
+5.2 stat file
+ memory.stat file includes following statistics (now)
+ cache - # of pages from page-cache and shmem.
+ rss - # of pages from anonymous memory.
+ pgpgin - # of event of charging
+ pgpgout - # of event of uncharging
+ active_anon - # of pages on active lru of anon, shmem.
+ inactive_anon - # of pages on active lru of anon, shmem
+ active_file - # of pages on active lru of file-cache
+ inactive_file - # of pages on inactive lru of file cache
+ unevictable - # of pages cannot be reclaimed.(mlocked etc)
+
+ Below is depend on CONFIG_DEBUG_VM.
+ inactive_ratio - VM inernal parameter. (see mm/page_alloc.c)
+ recent_rotated_anon - VM internal parameter. (see mm/vmscan.c)
+ recent_rotated_file - VM internal parameter. (see mm/vmscan.c)
+ recent_scanned_anon - VM internal parameter. (see mm/vmscan.c)
+ recent_scanned_file - VM internal parameter. (see mm/vmscan.c)
+
+ Memo:
+ recent_rotated means recent frequency of lru rotation.
+ recent_scanned means recent # of scans to lru.
+ showing for better debug please see the code for meanings.
+
+
+5.3 swappiness
+ Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only.
+
+ Following cgroup's swapiness can't be changed.
+ - root cgroup (uses /proc/sys/vm/swappiness).
+ - a cgroup which uses hierarchy and it has child cgroup.
+ - a cgroup which uses hierarchy and not the root of hierarchy.
+
+
+6. Hierarchy support
+
+The memory controller supports a deep hierarchy and hierarchical accounting.
+The hierarchy is created by creating the appropriate cgroups in the
+cgroup filesystem. Consider for example, the following cgroup filesystem
+hierarchy
+
+ root
+ / | \
+ / | \
+ a b c
+ | \
+ | \
+ d e
+
+In the diagram above, with hierarchical accounting enabled, all memory
+usage of e, is accounted to its ancestors up until the root (i.e, c and root),
+that has memory.use_hierarchy enabled. If one of the ancestors goes over its
+limit, the reclaim algorithm reclaims from the tasks in the ancestor and the
+children of the ancestor.
+
+6.1 Enabling hierarchical accounting and reclaim
+
+The memory controller by default disables the hierarchy feature. Support
+can be enabled by writing 1 to memory.use_hierarchy file of the root cgroup
+
+# echo 1 > memory.use_hierarchy
+
+The feature can be disabled by
+
+# echo 0 > memory.use_hierarchy
+
+NOTE1: Enabling/disabling will fail if the cgroup already has other
+cgroups created below it.
+
+NOTE2: This feature can be enabled/disabled per subtree.
-5. TODO
+7. TODO
1. Add support for accounting huge pages (as a separate controller)
2. Make per-cgroup scanner reclaim not-shared pages first
OpenPOWER on IntegriCloud