From 21c13a4f7bc185552c4b402b792c3bbb9aa69df0 Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Tue, 7 Jun 2011 11:35:52 -0400 Subject: usb-storage: redo incorrect reads Some USB mass-storage devices have bugs that cause them not to handle the first READ(10) command they receive correctly. The Corsair Padlock v2 returns completely bogus data for its first read (possibly it returns the data in encrypted form even though the device is supposed to be unlocked). The Feiya SD/SDHC card reader fails to complete the first READ(10) command after it is plugged in or after a new card is inserted, returning a status code that indicates it thinks the command was invalid, which prevents the kernel from retrying the read. Since the first read of a new device or a new medium is for the partition sector, the kernel is unable to retrieve the device's partition table. Users have to manually issue an "hdparm -z" or "blockdev --rereadpt" command before they can access the device. This patch (as1470) works around the problem. It adds a new quirk flag, US_FL_INVALID_READ10, indicating that the first READ(10) should always be retried immediately, as should any failing READ(10) commands (provided the preceding READ(10) command succeeded, to avoid getting stuck in a loop). The patch also adds appropriate unusual_devs entries containing the new flag. Signed-off-by: Alan Stern Tested-by: Sven Geggus Tested-by: Paul Hartman CC: Matthew Dharm CC: Signed-off-by: Greg Kroah-Hartman --- Documentation/kernel-parameters.txt | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index d9a203b..fd248a31 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2598,6 +2598,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted. unlock ejectable media); m = MAX_SECTORS_64 (don't transfer more than 64 sectors = 32 KB at a time); + n = INITIAL_READ10 (force a retry of the + initial READ(10) command); o = CAPACITY_OK (accept the capacity reported by the device); r = IGNORE_RESIDUE (the device reports -- cgit v1.1 From f699bf2328521cc3e20c412fcdb9ffe1255c360f Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Thu, 9 Jun 2011 11:43:04 +1000 Subject: md:Documentation/md.txt - fix typo Reported-by: CoolCold Signed-off-by: NeilBrown --- Documentation/md.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/md.txt b/Documentation/md.txt index 2366b1c..f0eee83 100644 --- a/Documentation/md.txt +++ b/Documentation/md.txt @@ -555,7 +555,7 @@ also have sync_min sync_max The two values, given as numbers of sectors, indicate a range - withing the array where 'check'/'repair' will operate. Must be + within the array where 'check'/'repair' will operate. Must be a multiple of chunk_size. When it reaches "sync_max" it will pause, rather than complete. You can use 'select' or 'poll' on "sync_completed" to wait for -- cgit v1.1 From 09223371deac67d08ca0b70bd18787920284c967 Mon Sep 17 00:00:00 2001 From: Shaohua Li Date: Tue, 14 Jun 2011 13:26:25 +0800 Subject: rcu: Use softirq to address performance regression Commit a26ac2455ffcf3(rcu: move TREE_RCU from softirq to kthread) introduced performance regression. In an AIM7 test, this commit degraded performance by about 40%. The commit runs rcu callbacks in a kthread instead of softirq. We observed high rate of context switch which is caused by this. Out test system has 64 CPUs and HZ is 1000, so we saw more than 64k context switch per second which is caused by RCU's per-CPU kthread. A trace showed that most of the time the RCU per-CPU kthread doesn't actually handle any callbacks, but instead just does a very small amount of work handling grace periods. This means that RCU's per-CPU kthreads are making the scheduler do quite a bit of work in order to allow a very small amount of RCU-related processing to be done. Alex Shi's analysis determined that this slowdown is due to lock contention within the scheduler. Unfortunately, as Peter Zijlstra points out, the scheduler's real-time semantics require global action, which means that this contention is inherent in real-time scheduling. (Yes, perhaps someone will come up with a workaround -- otherwise, -rt is not going to do well on large SMP systems -- but this patch will work around this issue in the meantime. And "the meantime" might well be forever.) This patch therefore re-introduces softirq processing to RCU, but only for core RCU work. RCU callbacks are still executed in kthread context, so that only a small amount of RCU work runs in softirq context in the common case. This should minimize ksoftirqd execution, allowing us to skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels. Signed-off-by: Shaohua Li Tested-by: "Alex,Shi" Signed-off-by: Paul E. McKenney --- Documentation/filesystems/proc.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index f481780..db3b1ab 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -843,6 +843,7 @@ Provides counts of softirq handlers serviced since boot time, for each cpu. TASKLET: 0 0 0 290 SCHED: 27035 26983 26971 26746 HRTIMER: 0 0 0 0 + RCU: 1678 1769 2178 2250 1.3 IDE devices in /proc/ide -- cgit v1.1 From a59ec1e7ff98cc4365d5b1bff4e7102e86b5716b Mon Sep 17 00:00:00 2001 From: Michael Hennerich Date: Wed, 15 Jun 2011 15:08:11 -0700 Subject: backlight: new driver for the ADP8870 backlight devices Signed-off-by: Michael Hennerich Signed-off-by: Mike Frysinger Cc: Richard Purdie Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- .../testing/sysfs-class-backlight-driver-adp8870 | 56 ++++++++++++++++++++++ 1 file changed, 56 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-class-backlight-driver-adp8870 (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-class-backlight-driver-adp8870 b/Documentation/ABI/testing/sysfs-class-backlight-driver-adp8870 new file mode 100644 index 0000000..aa11dbd --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-backlight-driver-adp8870 @@ -0,0 +1,56 @@ +What: /sys/class/backlight//_max +What: /sys/class/backlight//l1_daylight_max +What: /sys/class/backlight//l2_bright_max +What: /sys/class/backlight//l3_office_max +What: /sys/class/backlight//l4_indoor_max +What: /sys/class/backlight//l5_dark_max +Date: Mai 2011 +KernelVersion: 2.6.40 +Contact: device-drivers-devel@blackfin.uclinux.org +Description: + Control the maximum brightness for + on this . Values are between 0 and 127. This file + will also show the brightness level stored for this + . + +What: /sys/class/backlight//_dim +What: /sys/class/backlight//l2_bright_dim +What: /sys/class/backlight//l3_office_dim +What: /sys/class/backlight//l4_indoor_dim +What: /sys/class/backlight//l5_dark_dim +Date: Mai 2011 +KernelVersion: 2.6.40 +Contact: device-drivers-devel@blackfin.uclinux.org +Description: + Control the dim brightness for + on this . Values are between 0 and 127, typically + set to 0. Full off when the backlight is disabled. + This file will also show the dim brightness level stored for + this . + +What: /sys/class/backlight//ambient_light_level +Date: Mai 2011 +KernelVersion: 2.6.40 +Contact: device-drivers-devel@blackfin.uclinux.org +Description: + Get conversion value of the light sensor. + This value is updated every 80 ms (when the light sensor + is enabled). Returns integer between 0 (dark) and + 8000 (max ambient brightness) + +What: /sys/class/backlight//ambient_light_zone +Date: Mai 2011 +KernelVersion: 2.6.40 +Contact: device-drivers-devel@blackfin.uclinux.org +Description: + Get/Set current ambient light zone. Reading returns + integer between 1..5 (1 = daylight, 2 = bright, ..., 5 = dark). + Writing a value between 1..5 forces the backlight controller + to enter the corresponding ambient light zone. + Writing 0 returns to normal/automatic ambient light level + operation. The ambient light sensing feature on these devices + is an extension to the API documented in + Documentation/ABI/stable/sysfs-class-backlight. + It can be enabled by writing the value stored in + /sys/class/backlight//max_brightness to + /sys/class/backlight//brightness. \ No newline at end of file -- cgit v1.1 From 50c35e5ba255fd8428cef8ff076da8d23bfd4909 Mon Sep 17 00:00:00 2001 From: Ying Han Date: Wed, 15 Jun 2011 15:08:16 -0700 Subject: memcg: add documentation for the memory.numastat API [akpm@linux-foundation.org: rework text, fit it into 80-cols] Signed-off-by: Ying Han Reviewed-by: KOSAKI Motohiro Acked-by: Balbir Singh Acked-by: KAMEZAWA Hiroyuki Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cgroups/memory.txt | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) (limited to 'Documentation') diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index 7c16347..510d645 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -70,6 +70,7 @@ Brief summary of control files. (See sysctl's vm.swappiness) memory.move_charge_at_immigrate # set/show controls of moving charges memory.oom_control # set/show oom controls. + memory.numa_stat # show the number of memory usage per numa node 1. History @@ -464,6 +465,24 @@ value for efficient access. (Of course, when necessary, it's synchronized.) If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP) value in memory.stat(see 5.2). +5.6 numa_stat + +This is similar to numa_maps but operates on a per-memcg basis. This is +useful for providing visibility into the numa locality information within +an memcg since the pages are allowed to be allocated from any physical +node. One of the usecases is evaluating application performance by +combining this information with the application's cpu allocation. + +We export "total", "file", "anon" and "unevictable" pages per-node for +each memcg. The ouput format of memory.numa_stat is: + +total= N0= N1= ... +file= N0= N1= ... +anon= N0= N1= ... +unevictable= N0= N1= ... + +And we have total = file + anon + unevictable. + 6. Hierarchy support The memory controller supports a deep hierarchy and hierarchical accounting. -- cgit v1.1 From 31b5f8eeece4c0d70b649bfac7759cf7e3f915dd Mon Sep 17 00:00:00 2001 From: "akpm@linux-foundation.org" Date: Wed, 15 Jun 2011 15:08:52 -0700 Subject: Documentation/feature-removal-schedule.txt: remove ns_cgroup from feature-removal-schedule.txt Commit a77aea92010acf ("cgroup: remove the ns_cgroup") removed the ns_cgroup but it forgot to remove the related doc in feature-removal-schedule.txt. Signed-off-by: WANG Cong Cc: Daniel Lezcano Cc: Serge E. Hallyn Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/feature-removal-schedule.txt | 17 ----------------- 1 file changed, 17 deletions(-) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 1a9446b..72e2384 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -481,23 +481,6 @@ Who: FUJITA Tomonori ---------------------------- -What: namespace cgroup (ns_cgroup) -When: 2.6.38 -Why: The ns_cgroup leads to some problems: - * cgroup creation is out-of-control - * cgroup name can conflict when pids are looping - * it is not possible to have a single process handling - a lot of namespaces without falling in a exponential creation time - * we may want to create a namespace without creating a cgroup - - The ns_cgroup is replaced by a compatibility flag 'clone_children', - where a newly created cgroup will copy the parent cgroup values. - The userspace has to manually create a cgroup and add a task to - the 'tasks' file. -Who: Daniel Lezcano - ----------------------------- - What: iwlwifi disable_hw_scan module parameters When: 2.6.40 Why: Hareware scan is the prefer method for iwlwifi devices for -- cgit v1.1 From 04c55715cbd5de526046745bca3d3b6ffa6641c6 Mon Sep 17 00:00:00 2001 From: Andrew Murray Date: Wed, 15 Jun 2011 12:57:09 -0700 Subject: Documentation: update printk-formats.txt This patch updates the incomplete documentation concerning the printk extended format specifiers. Signed-off-by: Andrew Murray Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds --- Documentation/printk-formats.txt | 119 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 117 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt index 1b5a5dd..5df176e 100644 --- a/Documentation/printk-formats.txt +++ b/Documentation/printk-formats.txt @@ -9,7 +9,121 @@ If variable is of Type, use printk format specifier: size_t %zu or %zx ssize_t %zd or %zx -Raw pointer value SHOULD be printed with %p. +Raw pointer value SHOULD be printed with %p. The kernel supports +the following extended format specifiers for pointer types: + +Symbols/Function Pointers: + + %pF versatile_init+0x0/0x110 + %pf versatile_init + %pS versatile_init+0x0/0x110 + %ps versatile_init + %pB prev_fn_of_versatile_init+0x88/0x88 + + For printing symbols and function pointers. The 'S' and 's' specifiers + result in the symbol name with ('S') or without ('s') offsets. Where + this is used on a kernel without KALLSYMS - the symbol address is + printed instead. + + The 'B' specifier results in the symbol name with offsets and should be + used when printing stack backtraces. The specifier takes into + consideration the effect of compiler optimisations which may occur + when tail-call's are used and marked with the noreturn GCC attribute. + + On ia64, ppc64 and parisc64 architectures function pointers are + actually function descriptors which must first be resolved. The 'F' and + 'f' specifiers perform this resolution and then provide the same + functionality as the 'S' and 's' specifiers. + +Kernel Pointers: + + %pK 0x01234567 or 0x0123456789abcdef + + For printing kernel pointers which should be hidden from unprivileged + users. The behaviour of %pK depends on the kptr_restrict sysctl - see + Documentation/sysctl/kernel.txt for more details. + +Struct Resources: + + %pr [mem 0x60000000-0x6fffffff flags 0x2200] or + [mem 0x0000000060000000-0x000000006fffffff flags 0x2200] + %pR [mem 0x60000000-0x6fffffff pref] or + [mem 0x0000000060000000-0x000000006fffffff pref] + + For printing struct resources. The 'R' and 'r' specifiers result in a + printed resource with ('R') or without ('r') a decoded flags member. + +MAC/FDDI addresses: + + %pM 00:01:02:03:04:05 + %pMF 00-01-02-03-04-05 + %pm 000102030405 + + For printing 6-byte MAC/FDDI addresses in hex notation. The 'M' and 'm' + specifiers result in a printed address with ('M') or without ('m') byte + separators. The default byte separator is the colon (':'). + + Where FDDI addresses are concerned the 'F' specifier can be used after + the 'M' specifier to use dash ('-') separators instead of the default + separator. + +IPv4 addresses: + + %pI4 1.2.3.4 + %pi4 001.002.003.004 + %p[Ii][hnbl] + + For printing IPv4 dot-separated decimal addresses. The 'I4' and 'i4' + specifiers result in a printed address with ('i4') or without ('I4') + leading zeros. + + The additional 'h', 'n', 'b', and 'l' specifiers are used to specify + host, network, big or little endian order addresses respectively. Where + no specifier is provided the default network/big endian order is used. + +IPv6 addresses: + + %pI6 0001:0002:0003:0004:0005:0006:0007:0008 + %pi6 00010002000300040005000600070008 + %pI6c 1:2:3:4:5:6:7:8 + + For printing IPv6 network-order 16-bit hex addresses. The 'I6' and 'i6' + specifiers result in a printed address with ('I6') or without ('i6') + colon-separators. Leading zeros are always used. + + The additional 'c' specifier can be used with the 'I' specifier to + print a compressed IPv6 address as described by + http://tools.ietf.org/html/rfc5952 + +UUID/GUID addresses: + + %pUb 00010203-0405-0607-0809-0a0b0c0d0e0f + %pUB 00010203-0405-0607-0809-0A0B0C0D0E0F + %pUl 03020100-0504-0706-0809-0a0b0c0e0e0f + %pUL 03020100-0504-0706-0809-0A0B0C0E0E0F + + For printing 16-byte UUID/GUIDs addresses. The additional 'l', 'L', + 'b' and 'B' specifiers are used to specify a little endian order in + lower ('l') or upper case ('L') hex characters - and big endian order + in lower ('b') or upper case ('B') hex characters. + + Where no additional specifiers are used the default little endian + order with lower case hex characters will be printed. + +struct va_format: + + %pV + + For printing struct va_format structures. These contain a format string + and va_list as follows: + + struct va_format { + const char *fmt; + va_list *va; + }; + + Do not use this feature without some mechanism to verify the + correctness of the format string and va_list arguments. u64 SHOULD be printed with %llu/%llx, (unsigned long long): @@ -32,4 +146,5 @@ Reminder: sizeof() result is of type size_t. Thank you for your cooperation and attention. -By Randy Dunlap +By Randy Dunlap and +Andrew Murray -- cgit v1.1 From 06a2c45d6b4a7586eba7cd20dd656b08d8b63c2f Mon Sep 17 00:00:00 2001 From: "Maxin B. John" Date: Wed, 15 Jun 2011 12:58:29 -0700 Subject: Documentation: update kmemleak supported archs Instead of listing the architectures that are supported by kmemleak in Documentation/kmemleak.txt, just refer people to the list of supported architecutures in lib/Kconfig.debug so that Documentation/kmemleak.txt does not need more updates for this. Signed-off-by: Maxin B. John Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds --- Documentation/kmemleak.txt | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/kmemleak.txt b/Documentation/kmemleak.txt index 090e6ee..51063e6 100644 --- a/Documentation/kmemleak.txt +++ b/Documentation/kmemleak.txt @@ -11,7 +11,9 @@ with the difference that the orphan objects are not freed but only reported via /sys/kernel/debug/kmemleak. A similar method is used by the Valgrind tool (memcheck --leak-check) to detect the memory leaks in user-space applications. -Kmemleak is supported on x86, arm, powerpc, sparc, sh, microblaze and tile. + +Please check DEBUG_KMEMLEAK dependencies in lib/Kconfig.debug for supported +architectures. Usage ----- -- cgit v1.1 From f6e07d38078e82a6aeaae00bb134591ef5ac1167 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?J=C3=B6rg=20Sommer?= Date: Wed, 15 Jun 2011 12:59:45 -0700 Subject: Documentation: update cgroupfs mount point MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit According to commit 676db4af0430 ("cgroupfs: create /sys/fs/cgroup to mount cgroupfs on") the canonical mountpoint for the cgroup filesystem is /sys/fs/cgroup. Hence, this should be used in the documentation. Signed-off-by: Jörg Sommer Acked-by: Paul Menage Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds --- Documentation/accounting/cgroupstats.txt | 4 +- Documentation/cgroups/blkio-controller.txt | 29 +++++++------- Documentation/cgroups/cgroups.txt | 58 +++++++++++++++++----------- Documentation/cgroups/cpuacct.txt | 21 +++++----- Documentation/cgroups/cpusets.txt | 28 +++++++------- Documentation/cgroups/devices.txt | 6 +-- Documentation/cgroups/freezer-subsystem.txt | 20 +++++----- Documentation/cgroups/memory.txt | 17 ++++---- Documentation/scheduler/sched-design-CFS.txt | 7 ++-- Documentation/scheduler/sched-rt-group.txt | 7 ++-- Documentation/vm/hwpoison.txt | 6 +-- 11 files changed, 109 insertions(+), 94 deletions(-) (limited to 'Documentation') diff --git a/Documentation/accounting/cgroupstats.txt b/Documentation/accounting/cgroupstats.txt index eda40fd..d16a984 100644 --- a/Documentation/accounting/cgroupstats.txt +++ b/Documentation/accounting/cgroupstats.txt @@ -21,7 +21,7 @@ information will not be available. To extract cgroup statistics a utility very similar to getdelays.c has been developed, the sample output of the utility is shown below -~/balbir/cgroupstats # ./getdelays -C "/cgroup/a" +~/balbir/cgroupstats # ./getdelays -C "/sys/fs/cgroup/a" sleeping 1, blocked 0, running 1, stopped 0, uninterruptible 0 -~/balbir/cgroupstats # ./getdelays -C "/cgroup" +~/balbir/cgroupstats # ./getdelays -C "/sys/fs/cgroup" sleeping 155, blocked 0, running 1, stopped 0, uninterruptible 2 diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt index 465351d..b1b1bfa 100644 --- a/Documentation/cgroups/blkio-controller.txt +++ b/Documentation/cgroups/blkio-controller.txt @@ -28,16 +28,19 @@ cgroups. Here is what you can do. - Enable group scheduling in CFQ CONFIG_CFQ_GROUP_IOSCHED=y -- Compile and boot into kernel and mount IO controller (blkio). +- Compile and boot into kernel and mount IO controller (blkio); see + cgroups.txt, Why are cgroups needed?. - mount -t cgroup -o blkio none /cgroup + mount -t tmpfs cgroup_root /sys/fs/cgroup + mkdir /sys/fs/cgroup/blkio + mount -t cgroup -o blkio none /sys/fs/cgroup/blkio - Create two cgroups - mkdir -p /cgroup/test1/ /cgroup/test2 + mkdir -p /sys/fs/cgroup/blkio/test1/ /sys/fs/cgroup/blkio/test2 - Set weights of group test1 and test2 - echo 1000 > /cgroup/test1/blkio.weight - echo 500 > /cgroup/test2/blkio.weight + echo 1000 > /sys/fs/cgroup/blkio/test1/blkio.weight + echo 500 > /sys/fs/cgroup/blkio/test2/blkio.weight - Create two same size files (say 512MB each) on same disk (file1, file2) and launch two dd threads in different cgroup to read those files. @@ -46,12 +49,12 @@ cgroups. Here is what you can do. echo 3 > /proc/sys/vm/drop_caches dd if=/mnt/sdb/zerofile1 of=/dev/null & - echo $! > /cgroup/test1/tasks - cat /cgroup/test1/tasks + echo $! > /sys/fs/cgroup/blkio/test1/tasks + cat /sys/fs/cgroup/blkio/test1/tasks dd if=/mnt/sdb/zerofile2 of=/dev/null & - echo $! > /cgroup/test2/tasks - cat /cgroup/test2/tasks + echo $! > /sys/fs/cgroup/blkio/test2/tasks + cat /sys/fs/cgroup/blkio/test2/tasks - At macro level, first dd should finish first. To get more precise data, keep on looking at (with the help of script), at blkio.disk_time and @@ -68,13 +71,13 @@ Throttling/Upper Limit policy - Enable throttling in block layer CONFIG_BLK_DEV_THROTTLING=y -- Mount blkio controller - mount -t cgroup -o blkio none /cgroup/blkio +- Mount blkio controller (see cgroups.txt, Why are cgroups needed?) + mount -t cgroup -o blkio none /sys/fs/cgroup/blkio - Specify a bandwidth rate on particular device for root group. The format for policy is ": ". - echo "8:16 1048576" > /cgroup/blkio/blkio.read_bps_device + echo "8:16 1048576" > /sys/fs/cgroup/blkio/blkio.read_bps_device Above will put a limit of 1MB/second on reads happening for root group on device having major/minor number 8:16. @@ -149,7 +152,7 @@ Proportional weight policy files Following is the format. - #echo dev_maj:dev_minor weight > /path/to/cgroup/blkio.weight_device + # echo dev_maj:dev_minor weight > blkio.weight_device Configure weight=300 on /dev/sdb (8:16) in this cgroup # echo 8:16 300 > blkio.weight_device # cat blkio.weight_device diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 0ed99f0..15bca10 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -138,7 +138,7 @@ With the ability to classify tasks differently for different resources the admin can easily set up a script which receives exec notifications and depending on who is launching the browser he can - # echo browser_pid > /mnt///tasks + # echo browser_pid > /sys/fs/cgroup///tasks With only a single hierarchy, he now would potentially have to create a separate cgroup for every browser launched and associate it with @@ -153,9 +153,9 @@ apps enhanced CPU power, With ability to write pids directly to resource classes, it's just a matter of : - # echo pid > /mnt/network//tasks + # echo pid > /sys/fs/cgroup/network//tasks (after some time) - # echo pid > /mnt/network//tasks + # echo pid > /sys/fs/cgroup/network//tasks Without this ability, he would have to split the cgroup into multiple separate ones and then associate the new cgroups with the @@ -310,21 +310,24 @@ subsystem, this is the case for the cpuset. To start a new job that is to be contained within a cgroup, using the "cpuset" cgroup subsystem, the steps are something like: - 1) mkdir /dev/cgroup - 2) mount -t cgroup -ocpuset cpuset /dev/cgroup - 3) Create the new cgroup by doing mkdir's and write's (or echo's) in - the /dev/cgroup virtual file system. - 4) Start a task that will be the "founding father" of the new job. - 5) Attach that task to the new cgroup by writing its pid to the - /dev/cgroup tasks file for that cgroup. - 6) fork, exec or clone the job tasks from this founding father task. + 1) mount -t tmpfs cgroup_root /sys/fs/cgroup + 2) mkdir /sys/fs/cgroup/cpuset + 3) mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset + 4) Create the new cgroup by doing mkdir's and write's (or echo's) in + the /sys/fs/cgroup virtual file system. + 5) Start a task that will be the "founding father" of the new job. + 6) Attach that task to the new cgroup by writing its pid to the + /sys/fs/cgroup/cpuset/tasks file for that cgroup. + 7) fork, exec or clone the job tasks from this founding father task. For example, the following sequence of commands will setup a cgroup named "Charlie", containing just CPUs 2 and 3, and Memory Node 1, and then start a subshell 'sh' in that cgroup: - mount -t cgroup cpuset -ocpuset /dev/cgroup - cd /dev/cgroup + mount -t tmpfs cgroup_root /sys/fs/cgroup + mkdir /sys/fs/cgroup/cpuset + mount -t cgroup cpuset -ocpuset /sys/fs/cgroup/cpuset + cd /sys/fs/cgroup/cpuset mkdir Charlie cd Charlie /bin/echo 2-3 > cpuset.cpus @@ -345,7 +348,7 @@ Creating, modifying, using the cgroups can be done through the cgroup virtual filesystem. To mount a cgroup hierarchy with all available subsystems, type: -# mount -t cgroup xxx /dev/cgroup +# mount -t cgroup xxx /sys/fs/cgroup The "xxx" is not interpreted by the cgroup code, but will appear in /proc/mounts so may be any useful identifying string that you like. @@ -354,23 +357,32 @@ Note: Some subsystems do not work without some user input first. For instance, if cpusets are enabled the user will have to populate the cpus and mems files for each new cgroup created before that group can be used. +As explained in section `1.2 Why are cgroups needed?' you should create +different hierarchies of cgroups for each single resource or group of +resources you want to control. Therefore, you should mount a tmpfs on +/sys/fs/cgroup and create directories for each cgroup resource or resource +group. + +# mount -t tmpfs cgroup_root /sys/fs/cgroup +# mkdir /sys/fs/cgroup/rg1 + To mount a cgroup hierarchy with just the cpuset and memory subsystems, type: -# mount -t cgroup -o cpuset,memory hier1 /dev/cgroup +# mount -t cgroup -o cpuset,memory hier1 /sys/fs/cgroup/rg1 To change the set of subsystems bound to a mounted hierarchy, just remount with different options: -# mount -o remount,cpuset,blkio hier1 /dev/cgroup +# mount -o remount,cpuset,blkio hier1 /sys/fs/cgroup/rg1 Now memory is removed from the hierarchy and blkio is added. Note this will add blkio to the hierarchy but won't remove memory or cpuset, because the new options are appended to the old ones: -# mount -o remount,blkio /dev/cgroup +# mount -o remount,blkio /sys/fs/cgroup/rg1 To Specify a hierarchy's release_agent: # mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \ - xxx /dev/cgroup + xxx /sys/fs/cgroup/rg1 Note that specifying 'release_agent' more than once will return failure. @@ -379,17 +391,17 @@ when the hierarchy consists of a single (root) cgroup. Supporting the ability to arbitrarily bind/unbind subsystems from an existing cgroup hierarchy is intended to be implemented in the future. -Then under /dev/cgroup you can find a tree that corresponds to the -tree of the cgroups in the system. For instance, /dev/cgroup +Then under /sys/fs/cgroup/rg1 you can find a tree that corresponds to the +tree of the cgroups in the system. For instance, /sys/fs/cgroup/rg1 is the cgroup that holds the whole system. If you want to change the value of release_agent: -# echo "/sbin/new_release_agent" > /dev/cgroup/release_agent +# echo "/sbin/new_release_agent" > /sys/fs/cgroup/rg1/release_agent It can also be changed via remount. -If you want to create a new cgroup under /dev/cgroup: -# cd /dev/cgroup +If you want to create a new cgroup under /sys/fs/cgroup/rg1: +# cd /sys/fs/cgroup/rg1 # mkdir my_cgroup Now you want to do something with this cgroup. diff --git a/Documentation/cgroups/cpuacct.txt b/Documentation/cgroups/cpuacct.txt index 8b93094..9ad85df 100644 --- a/Documentation/cgroups/cpuacct.txt +++ b/Documentation/cgroups/cpuacct.txt @@ -10,26 +10,25 @@ directly present in its group. Accounting groups can be created by first mounting the cgroup filesystem. -# mkdir /cgroups -# mount -t cgroup -ocpuacct none /cgroups - -With the above step, the initial or the parent accounting group -becomes visible at /cgroups. At bootup, this group includes all the -tasks in the system. /cgroups/tasks lists the tasks in this cgroup. -/cgroups/cpuacct.usage gives the CPU time (in nanoseconds) obtained by -this group which is essentially the CPU time obtained by all the tasks +# mount -t cgroup -ocpuacct none /sys/fs/cgroup + +With the above step, the initial or the parent accounting group becomes +visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in +the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup. +/sys/fs/cgroup/cpuacct.usage gives the CPU time (in nanoseconds) obtained +by this group which is essentially the CPU time obtained by all the tasks in the system. -New accounting groups can be created under the parent group /cgroups. +New accounting groups can be created under the parent group /sys/fs/cgroup. -# cd /cgroups +# cd /sys/fs/cgroup # mkdir g1 # echo $$ > g1 The above steps create a new group g1 and move the current shell process (bash) into it. CPU time consumed by this bash and its children can be obtained from g1/cpuacct.usage and the same is accumulated in -/cgroups/cpuacct.usage also. +/sys/fs/cgroup/cpuacct.usage also. cpuacct.stat file lists a few statistics which further divide the CPU time obtained by the cgroup into user and system times. Currently diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt index 98a3082..5b0d78e 100644 --- a/Documentation/cgroups/cpusets.txt +++ b/Documentation/cgroups/cpusets.txt @@ -661,21 +661,21 @@ than stress the kernel. To start a new job that is to be contained within a cpuset, the steps are: - 1) mkdir /dev/cpuset - 2) mount -t cgroup -ocpuset cpuset /dev/cpuset + 1) mkdir /sys/fs/cgroup/cpuset + 2) mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset 3) Create the new cpuset by doing mkdir's and write's (or echo's) in - the /dev/cpuset virtual file system. + the /sys/fs/cgroup/cpuset virtual file system. 4) Start a task that will be the "founding father" of the new job. 5) Attach that task to the new cpuset by writing its pid to the - /dev/cpuset tasks file for that cpuset. + /sys/fs/cgroup/cpuset tasks file for that cpuset. 6) fork, exec or clone the job tasks from this founding father task. For example, the following sequence of commands will setup a cpuset named "Charlie", containing just CPUs 2 and 3, and Memory Node 1, and then start a subshell 'sh' in that cpuset: - mount -t cgroup -ocpuset cpuset /dev/cpuset - cd /dev/cpuset + mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset + cd /sys/fs/cgroup/cpuset mkdir Charlie cd Charlie /bin/echo 2-3 > cpuset.cpus @@ -710,14 +710,14 @@ Creating, modifying, using the cpusets can be done through the cpuset virtual filesystem. To mount it, type: -# mount -t cgroup -o cpuset cpuset /dev/cpuset +# mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset -Then under /dev/cpuset you can find a tree that corresponds to the -tree of the cpusets in the system. For instance, /dev/cpuset +Then under /sys/fs/cgroup/cpuset you can find a tree that corresponds to the +tree of the cpusets in the system. For instance, /sys/fs/cgroup/cpuset is the cpuset that holds the whole system. -If you want to create a new cpuset under /dev/cpuset: -# cd /dev/cpuset +If you want to create a new cpuset under /sys/fs/cgroup/cpuset: +# cd /sys/fs/cgroup/cpuset # mkdir my_cpuset Now you want to do something with this cpuset. @@ -765,12 +765,12 @@ wrapper around the cgroup filesystem. The command -mount -t cpuset X /dev/cpuset +mount -t cpuset X /sys/fs/cgroup/cpuset is equivalent to -mount -t cgroup -ocpuset,noprefix X /dev/cpuset -echo "/sbin/cpuset_release_agent" > /dev/cpuset/release_agent +mount -t cgroup -ocpuset,noprefix X /sys/fs/cgroup/cpuset +echo "/sbin/cpuset_release_agent" > /sys/fs/cgroup/cpuset/release_agent 2.2 Adding/removing cpus ------------------------ diff --git a/Documentation/cgroups/devices.txt b/Documentation/cgroups/devices.txt index 57ca4c8..16624a7f8 100644 --- a/Documentation/cgroups/devices.txt +++ b/Documentation/cgroups/devices.txt @@ -22,16 +22,16 @@ removed from the child(ren). An entry is added using devices.allow, and removed using devices.deny. For instance - echo 'c 1:3 mr' > /cgroups/1/devices.allow + echo 'c 1:3 mr' > /sys/fs/cgroup/1/devices.allow allows cgroup 1 to read and mknod the device usually known as /dev/null. Doing - echo a > /cgroups/1/devices.deny + echo a > /sys/fs/cgroup/1/devices.deny will remove the default 'a *:* rwm' entry. Doing - echo a > /cgroups/1/devices.allow + echo a > /sys/fs/cgroup/1/devices.allow will add the 'a *:* rwm' entry to the whitelist. diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt index 41f37fe..c21d777 100644 --- a/Documentation/cgroups/freezer-subsystem.txt +++ b/Documentation/cgroups/freezer-subsystem.txt @@ -59,28 +59,28 @@ is non-freezable. * Examples of usage : - # mkdir /containers - # mount -t cgroup -ofreezer freezer /containers - # mkdir /containers/0 - # echo $some_pid > /containers/0/tasks + # mkdir /sys/fs/cgroup/freezer + # mount -t cgroup -ofreezer freezer /sys/fs/cgroup/freezer + # mkdir /sys/fs/cgroup/freezer/0 + # echo $some_pid > /sys/fs/cgroup/freezer/0/tasks to get status of the freezer subsystem : - # cat /containers/0/freezer.state + # cat /sys/fs/cgroup/freezer/0/freezer.state THAWED to freeze all tasks in the container : - # echo FROZEN > /containers/0/freezer.state - # cat /containers/0/freezer.state + # echo FROZEN > /sys/fs/cgroup/freezer/0/freezer.state + # cat /sys/fs/cgroup/freezer/0/freezer.state FREEZING - # cat /containers/0/freezer.state + # cat /sys/fs/cgroup/freezer/0/freezer.state FROZEN to unfreeze all tasks in the container : - # echo THAWED > /containers/0/freezer.state - # cat /containers/0/freezer.state + # echo THAWED > /sys/fs/cgroup/freezer/0/freezer.state + # cat /sys/fs/cgroup/freezer/0/freezer.state THAWED This is the basic mechanism which should do the right thing for user space task diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index 510d645..ffec241 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -264,16 +264,17 @@ b. Enable CONFIG_RESOURCE_COUNTERS c. Enable CONFIG_CGROUP_MEM_RES_CTLR d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension) -1. Prepare the cgroups -# mkdir -p /cgroups -# mount -t cgroup none /cgroups -o memory +1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?) +# mount -t tmpfs none /sys/fs/cgroup +# mkdir /sys/fs/cgroup/memory +# mount -t cgroup none /sys/fs/cgroup/memory -o memory 2. Make the new group and move bash into it -# mkdir /cgroups/0 -# echo $$ > /cgroups/0/tasks +# mkdir /sys/fs/cgroup/memory/0 +# echo $$ > /sys/fs/cgroup/memory/0/tasks Since now we're in the 0 cgroup, we can alter the memory limit: -# echo 4M > /cgroups/0/memory.limit_in_bytes +# echo 4M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo, mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.) @@ -281,11 +282,11 @@ mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.) NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited). NOTE: We cannot set limits on the root cgroup any more. -# cat /cgroups/0/memory.limit_in_bytes +# cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes 4194304 We can check the usage: -# cat /cgroups/0/memory.usage_in_bytes +# cat /sys/fs/cgroup/memory/0/memory.usage_in_bytes 1216512 A successful write to this file does not guarantee a successful set of diff --git a/Documentation/scheduler/sched-design-CFS.txt b/Documentation/scheduler/sched-design-CFS.txt index 9996199..91ecff0 100644 --- a/Documentation/scheduler/sched-design-CFS.txt +++ b/Documentation/scheduler/sched-design-CFS.txt @@ -223,9 +223,10 @@ When CONFIG_FAIR_GROUP_SCHED is defined, a "cpu.shares" file is created for each group created using the pseudo filesystem. See example steps below to create task groups and modify their CPU share using the "cgroups" pseudo filesystem. - # mkdir /dev/cpuctl - # mount -t cgroup -ocpu none /dev/cpuctl - # cd /dev/cpuctl + # mount -t tmpfs cgroup_root /sys/fs/cgroup + # mkdir /sys/fs/cgroup/cpu + # mount -t cgroup -ocpu none /sys/fs/cgroup/cpu + # cd /sys/fs/cgroup/cpu # mkdir multimedia # create "multimedia" group of tasks # mkdir browser # create "browser" group of tasks diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt index 605b0d4..71b54d5 100644 --- a/Documentation/scheduler/sched-rt-group.txt +++ b/Documentation/scheduler/sched-rt-group.txt @@ -129,9 +129,8 @@ priority! Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real CPU bandwidth to task groups. -This uses the /cgroup virtual file system and -"/cgroup//cpu.rt_runtime_us" to control the CPU time reserved for each -control group. +This uses the cgroup virtual file system and "/cpu.rt_runtime_us" +to control the CPU time reserved for each control group. For more information on working with control groups, you should read Documentation/cgroups/cgroups.txt as well. @@ -150,7 +149,7 @@ For now, this can be simplified to just the following (but see Future plans): =============== There is work in progress to make the scheduling period for each group -("/cgroup//cpu.rt_period_us") configurable as well. +("/cpu.rt_period_us") configurable as well. The constraint on the period is that a subgroup must have a smaller or equal period to its parent. But realistically its not very useful _yet_ diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt index 12f9ba2..5500684 100644 --- a/Documentation/vm/hwpoison.txt +++ b/Documentation/vm/hwpoison.txt @@ -129,12 +129,12 @@ Limit injection to pages owned by memgroup. Specified by inode number of the memcg. Example: - mkdir /cgroup/hwpoison + mkdir /sys/fs/cgroup/mem/hwpoison usemem -m 100 -s 1000 & - echo `jobs -p` > /cgroup/hwpoison/tasks + echo `jobs -p` > /sys/fs/cgroup/mem/hwpoison/tasks - memcg_ino=$(ls -id /cgroup/hwpoison | cut -f1 -d' ') + memcg_ino=$(ls -id /sys/fs/cgroup/mem/hwpoison | cut -f1 -d' ') echo $memcg_ino > /debug/hwpoison/corrupt-filter-memcg page-types -p `pidof init` --hwpoison # shall do nothing -- cgit v1.1 From 67de0162fbb78713fcb23cb2502b380faa8bde73 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?J=C3=B6rg=20Sommer?= Date: Wed, 15 Jun 2011 13:00:47 -0700 Subject: Documentation: fix cgroup typos and formatting MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix format and spelling. Signed-off-by: Jörg Sommer Acked-by: Paul Menage Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds --- Documentation/cgroups/blkio-controller.txt | 2 +- Documentation/cgroups/cgroups.txt | 2 +- Documentation/cgroups/memory.txt | 22 +++++++++++----------- 3 files changed, 13 insertions(+), 13 deletions(-) (limited to 'Documentation') diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt index b1b1bfa..cd45c8e 100644 --- a/Documentation/cgroups/blkio-controller.txt +++ b/Documentation/cgroups/blkio-controller.txt @@ -111,7 +111,7 @@ Hierarchical Cgroups CFQ and throttling will practically treat all groups at same level. pivot - / | \ \ + / / \ \ root test1 test2 test3 Down the line we can implement hierarchical accounting/control support diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 15bca10..cd67e90 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -142,7 +142,7 @@ and depending on who is launching the browser he can With only a single hierarchy, he now would potentially have to create a separate cgroup for every browser launched and associate it with -approp network and other resource class. This may lead to +appropriate network and other resource class. This may lead to proliferation of such cgroups. Also lets say that the administrator would like to give enhanced network diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index ffec241..06eb6d9 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -1,8 +1,8 @@ Memory Resource Controller -NOTE: The Memory Resource Controller has been generically been referred - to as the memory controller in this document. Do not confuse memory - controller used here with the memory controller that is used in hardware. +NOTE: The Memory Resource Controller has generically been referred to as the + memory controller in this document. Do not confuse memory controller + used here with the memory controller that is used in hardware. (For editors) In this document: @@ -182,7 +182,7 @@ behind this approach is that a cgroup that aggressively uses a shared page will eventually get charged for it (once it is uncharged from the cgroup that brought it in -- this will happen on memory pressure). -Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used.. +Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used. When you do swapoff and make swapped-out pages of shmem(tmpfs) to be backed into memory in force, charges for pages are accounted against the caller of swapoff rather than the users of shmem. @@ -214,7 +214,7 @@ affecting global LRU, memory+swap limit is better than just limiting swap from OS point of view. * What happens when a cgroup hits memory.memsw.limit_in_bytes -When a cgroup his memory.memsw.limit_in_bytes, it's useless to do swap-out +When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out in this cgroup. Then, swap-out will not be done by cgroup routine and file caches are dropped. But as mentioned above, global LRU can do swapout memory from it for sanity of the system's memory management state. You can't forbid @@ -491,13 +491,13 @@ The hierarchy is created by creating the appropriate cgroups in the cgroup filesystem. Consider for example, the following cgroup filesystem hierarchy - root + root / | \ - / | \ - a b c - | \ - | \ - d e + / | \ + a b c + | \ + | \ + d e In the diagram above, with hierarchical accounting enabled, all memory usage of e, is accounted to its ancestors up until the root (i.e, c and root), -- cgit v1.1 From a9e758634f464ffb09344821a9f0b5a5c6df2b3e Mon Sep 17 00:00:00 2001 From: Sarah Sharp Date: Thu, 16 Jun 2011 13:06:04 -0700 Subject: USB: Fix up URB error codes to reflect implementation. Documentation/usb/error-codes.txt mentions that urb->status can be set to -EXDEV, if the isochronous transfer was not fully completed. However, in practice, EHCI, UHCI, and OHCI all only set -EXDEV in the individual frame status, never in the URB status. Those host controller actually always pass in a zero status to usb_hcd_giveback_urb, and rely on the core to set the appropriate status value. The xHCI driver ran into issues with the uvcvideo driver when it tried to set -EXDEV in urb->status, because the driver refused to submit URBs, and the userspace camera application's video froze. Clean up the documentation to reflect the actual implementation. Signed-off-by: Sarah Sharp Acked-by: Alan Stern --- Documentation/usb/error-codes.txt | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/usb/error-codes.txt b/Documentation/usb/error-codes.txt index d83703e..b3f606b 100644 --- a/Documentation/usb/error-codes.txt +++ b/Documentation/usb/error-codes.txt @@ -76,6 +76,13 @@ A transfer's actual_length may be positive even when an error has been reported. That's because transfers often involve several packets, so that one or more packets could finish before an error stops further endpoint I/O. +For isochronous URBs, the urb status value is non-zero only if the URB is +unlinked, the device is removed, the host controller is disabled, or the total +transferred length is less than the requested length and the URB_SHORT_NOT_OK +flag is set. Completion handlers for isochronous URBs should only see +urb->status set to zero, -ENOENT, -ECONNRESET, -ESHUTDOWN, or -EREMOTEIO. +Individual frame descriptor status fields may report more status codes. + 0 Transfer completed successfully @@ -132,7 +139,7 @@ one or more packets could finish before an error stops further endpoint I/O. device removal events immediately. -EXDEV ISO transfer only partially completed - look at individual frame status for details + (only set in iso_frame_desc[n].status, not urb->status) -EINVAL ISO madness, if this happens: Log off and go home -- cgit v1.1 From 129b656a0de9a229a72fe4bb6bacd134a1477b44 Mon Sep 17 00:00:00 2001 From: Kevin Hilman Date: Fri, 10 Jun 2011 16:05:51 -0700 Subject: PM / Runtime: Update doc: usage count no longer incremented across system PM Commit e8665002477f0278f84f898145b1f141ba26ee26 (PM: Allow pm_runtime_suspend() to succeed during system suspend) removed usage count increment across system PM. Update doc to reflect this. Signed-off-by: Kevin Hilman Signed-off-by: Rafael J. Wysocki --- Documentation/power/runtime_pm.txt | 5 ----- 1 file changed, 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt index 654097b..22accb3 100644 --- a/Documentation/power/runtime_pm.txt +++ b/Documentation/power/runtime_pm.txt @@ -566,11 +566,6 @@ to do this is: pm_runtime_set_active(dev); pm_runtime_enable(dev); -The PM core always increments the run-time usage counter before calling the -->prepare() callback and decrements it after calling the ->complete() callback. -Hence disabling run-time PM temporarily like this will not cause any run-time -suspend callbacks to be lost. - 7. Generic subsystem callbacks Subsystems may wish to conserve code space by using the set of generic power -- cgit v1.1 From 78420884e680da8fbc3240de2d3106437042381e Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Sat, 18 Jun 2011 19:53:57 +0200 Subject: PM: Update documentation regarding sysdevs The part of Documentation/power/devices.txt regarding sysdevs is not valid any more after commit 2e711c04dbbf7a7732a3f7073b1fc285d12b369d (PM: Remove sysdev suspend, resume and shutdown operations), so remove it. Signed-off-by: Rafael J. Wysocki --- Documentation/power/devices.txt | 26 -------------------------- 1 file changed, 26 deletions(-) (limited to 'Documentation') diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt index 8888083..ff923fe 100644 --- a/Documentation/power/devices.txt +++ b/Documentation/power/devices.txt @@ -549,32 +549,6 @@ callbacks. The other platforms need not implement it or take it into account in any way. -System Devices --------------- -System devices (sysdevs) follow a slightly different API, which can be found in - - include/linux/sysdev.h - drivers/base/sys.c - -System devices will be suspended with interrupts disabled, and after all other -devices have been suspended. On resume, they will be resumed before any other -devices, and also with interrupts disabled. These things occur in special -"sysdev_driver" phases, which affect only system devices. - -Thus, after the suspend_noirq (or freeze_noirq or poweroff_noirq) phase, when -the non-boot CPUs are all offline and IRQs are disabled on the remaining online -CPU, then a sysdev_driver.suspend phase is carried out, and the system enters a -sleep state (or a system image is created). During resume (or after the image -has been created or loaded) a sysdev_driver.resume phase is carried out, IRQs -are enabled on the only online CPU, the non-boot CPUs are enabled, and the -resume_noirq (or thaw_noirq or restore_noirq) phase begins. - -Code to actually enter and exit the system-wide low power state sometimes -involves hardware details that are only known to the boot firmware, and -may leave a CPU running software (from SRAM or flash memory) that monitors -the system and manages its wakeup sequence. - - Device Low Power (suspend) States --------------------------------- Device low-power states aren't standard. One device might only handle -- cgit v1.1 From ca9c6890b598997165a7c85c001f382c910f12b0 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Tue, 21 Jun 2011 23:25:32 +0200 Subject: PM / Domains: Update documentation Commit 4d27e9dcff00a6425d779b065ec8892e4f391661 (PM: Make power domain callbacks take precedence over subsystem ones) forgot to update the device power management documentation to take changes made by it into account. Correct that mistake. Signed-off-by: Rafael J. Wysocki --- Documentation/power/devices.txt | 41 ++++++++++++++--------------------------- 1 file changed, 14 insertions(+), 27 deletions(-) (limited to 'Documentation') diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt index ff923fe..64565aa 100644 --- a/Documentation/power/devices.txt +++ b/Documentation/power/devices.txt @@ -520,33 +520,20 @@ Support for power domains is provided through the pwr_domain field of struct device. This field is a pointer to an object of type struct dev_power_domain, defined in include/linux/pm.h, providing a set of power management callbacks analogous to the subsystem-level and device driver callbacks that are executed -for the given device during all power transitions, in addition to the respective -subsystem-level callbacks. Specifically, the power domain "suspend" callbacks -(i.e. ->runtime_suspend(), ->suspend(), ->freeze(), ->poweroff(), etc.) are -executed after the analogous subsystem-level callbacks, while the power domain -"resume" callbacks (i.e. ->runtime_resume(), ->resume(), ->thaw(), ->restore, -etc.) are executed before the analogous subsystem-level callbacks. Error codes -returned by the "suspend" and "resume" power domain callbacks are ignored. - -Power domain ->runtime_idle() callback is executed before the subsystem-level -->runtime_idle() callback and the result returned by it is not ignored. Namely, -if it returns error code, the subsystem-level ->runtime_idle() callback will not -be called and the helper function rpm_idle() executing it will return error -code. This mechanism is intended to help platforms where saving device state -is a time consuming operation and should only be carried out if all devices -in the power domain are idle, before turning off the shared power resource(s). -Namely, the power domain ->runtime_idle() callback may return error code until -the pm_runtime_idle() helper (or its asychronous version) has been called for -all devices in the power domain (it is recommended that the returned error code -be -EBUSY in those cases), preventing the subsystem-level ->runtime_idle() -callback from being run prematurely. - -The support for device power domains is only relevant to platforms needing to -use the same subsystem-level (e.g. platform bus type) and device driver power -management callbacks in many different power domain configurations and wanting -to avoid incorporating the support for power domains into the subsystem-level -callbacks. The other platforms need not implement it or take it into account -in any way. +for the given device during all power transitions, instead of the respective +subsystem-level callbacks. Specifically, if a device's pm_domain pointer is +not NULL, the ->suspend() callback from the object pointed to by it will be +executed instead of its subsystem's (e.g. bus type's) ->suspend() callback and +anlogously for all of the remaining callbacks. In other words, power management +domain callbacks, if defined for the given device, always take precedence over +the callbacks provided by the device's subsystem (e.g. bus type). + +The support for device power management domains is only relevant to platforms +needing to use the same device driver power management callbacks in many +different power domain configurations and wanting to avoid incorporating the +support for power domains into subsystem-level callbacks, for example by +modifying the platform bus type. Other platforms need not implement it or take +it into account in any way. Device Low Power (suspend) States -- cgit v1.1 From 5efb54cc3fc104585cda81c44676f05115bd9ddd Mon Sep 17 00:00:00 2001 From: Kevin Hilman Date: Thu, 30 Jun 2011 15:07:31 -0700 Subject: PM: Documentation: fix typo: pm_runtime_idle_sync() doesn't exist. Replace reference to pm_runtime_idle_sync() in the driver core with pm_runtime_put_sync() which is used in the code. Signed-off-by: Kevin Hilman Signed-off-by: Rafael J. Wysocki --- Documentation/power/runtime_pm.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt index 22accb3..518d9be 100644 --- a/Documentation/power/runtime_pm.txt +++ b/Documentation/power/runtime_pm.txt @@ -506,7 +506,7 @@ pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts, they will fail returning -EAGAIN, because the device's usage counter is incremented by the core before executing ->probe() and ->remove(). Still, it may be desirable to suspend the device as soon as ->probe() or ->remove() has -finished, so the PM core uses pm_runtime_idle_sync() to invoke the +finished, so the PM core uses pm_runtime_put_sync() to invoke the subsystem-level idle callback for the device at that time. The user space can effectively disallow the driver of the device to power manage -- cgit v1.1 From f5da24dbed213d103f00aa9ef26e010b50d2db24 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Sat, 2 Jul 2011 14:27:11 +0200 Subject: PM / Runtime: Update documentation regarding driver removal Commit e1866b33b1e89f077b7132daae3dfd9a594e9a1a (PM / Runtime: Rework runtime PM handling during driver removal) forgot to update the documentation in Documentation/power/runtime_pm.txt to match the new code in drivers/base/dd.c. Update that documentation to match the code it describes. Signed-off-by: Rafael J. Wysocki Reviewed-by: Kevin Hilman --- Documentation/power/runtime_pm.txt | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt index 518d9be..b24875b 100644 --- a/Documentation/power/runtime_pm.txt +++ b/Documentation/power/runtime_pm.txt @@ -501,13 +501,29 @@ helper functions described in Section 4. In that case, pm_runtime_resume() should be used. Of course, for this purpose the device's run-time PM has to be enabled earlier by calling pm_runtime_enable(). -If the device bus type's or driver's ->probe() or ->remove() callback runs +If the device bus type's or driver's ->probe() callback runs pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts, they will fail returning -EAGAIN, because the device's usage counter is -incremented by the core before executing ->probe() and ->remove(). Still, it -may be desirable to suspend the device as soon as ->probe() or ->remove() has -finished, so the PM core uses pm_runtime_put_sync() to invoke the -subsystem-level idle callback for the device at that time. +incremented by the driver core before executing ->probe(). Still, it may be +desirable to suspend the device as soon as ->probe() has finished, so the driver +core uses pm_runtime_put_sync() to invoke the subsystem-level idle callback for +the device at that time. + +Moreover, the driver core prevents runtime PM callbacks from racing with the bus +notifier callback in __device_release_driver(), which is necessary, because the +notifier is used by some subsystems to carry out operations affecting the +runtime PM functionality. It does so by calling pm_runtime_get_sync() before +driver_sysfs_remove() and the BUS_NOTIFY_UNBIND_DRIVER notifications. This +resumes the device if it's in the suspended state and prevents it from +being suspended again while those routines are being executed. + +To allow bus types and drivers to put devices into the suspended state by +calling pm_runtime_suspend() from their ->remove() routines, the driver core +executes pm_runtime_put_sync() after running the BUS_NOTIFY_UNBIND_DRIVER +notifications in __device_release_driver(). This requires bus types and +drivers to make their ->remove() callbacks avoid races with runtime PM directly, +but also it allows of more flexibility in the handling of devices during the +removal of their drivers. The user space can effectively disallow the driver of the device to power manage it at run time by changing the value of its /sys/devices/.../power/control -- cgit v1.1 From 5da556e33fc53179a5bec10b5698e262cf68e26d Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Sun, 3 Jul 2011 13:32:53 +0200 Subject: hwmon: (f71882fg) Add support for the F71869A The F71869A is almost the same as the F71869F/E, except that it has the normal number of temp and pwm zones for a F71882FG derived chip, rather then the limited number of the F71869F/E. Signed-off-by: Hans de Goede Tested-by: Max Baldwin Acked-by: Guenter Roeck Signed-off-by: Jean Delvare --- Documentation/hwmon/f71882fg | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'Documentation') diff --git a/Documentation/hwmon/f71882fg b/Documentation/hwmon/f71882fg index 84d2623..de91c0d 100644 --- a/Documentation/hwmon/f71882fg +++ b/Documentation/hwmon/f71882fg @@ -22,6 +22,10 @@ Supported chips: Prefix: 'f71869' Addresses scanned: none, address read from Super I/O config space Datasheet: Available from the Fintek website + * Fintek F71869A + Prefix: 'f71869a' + Addresses scanned: none, address read from Super I/O config space + Datasheet: Not public * Fintek F71882FG and F71883FG Prefix: 'f71882fg' Addresses scanned: none, address read from Super I/O config space -- cgit v1.1 From af75d5b771269f764999a67511e7d0c995d1a185 Mon Sep 17 00:00:00 2001 From: Clemens Ladisch Date: Sun, 3 Jul 2011 13:32:54 +0200 Subject: hwmon: (k10temp) Update documentation for Fam12h Add some CPU series IDs and links to the Fam12h datasheets. Signed-off-by: Clemens Ladisch Signed-off-by: Jean Delvare --- Documentation/hwmon/k10temp | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/hwmon/k10temp b/Documentation/hwmon/k10temp index 0393c89..a10f736 100644 --- a/Documentation/hwmon/k10temp +++ b/Documentation/hwmon/k10temp @@ -9,8 +9,8 @@ Supported chips: Socket S1G3: Athlon II, Sempron, Turion II * AMD Family 11h processors: Socket S1G2: Athlon (X2), Sempron (X2), Turion X2 (Ultra) -* AMD Family 12h processors: "Llano" -* AMD Family 14h processors: "Brazos" (C/E/G-Series) +* AMD Family 12h processors: "Llano" (E2/A4/A6/A8-Series) +* AMD Family 14h processors: "Brazos" (C/E/G/Z-Series) * AMD Family 15h processors: "Bulldozer" Prefix: 'k10temp' @@ -20,12 +20,16 @@ Supported chips: http://support.amd.com/us/Processor_TechDocs/31116.pdf BIOS and Kernel Developer's Guide (BKDG) for AMD Family 11h Processors: http://support.amd.com/us/Processor_TechDocs/41256.pdf + BIOS and Kernel Developer's Guide (BKDG) for AMD Family 12h Processors: + http://support.amd.com/us/Processor_TechDocs/41131.pdf BIOS and Kernel Developer's Guide (BKDG) for AMD Family 14h Models 00h-0Fh Processors: http://support.amd.com/us/Processor_TechDocs/43170.pdf Revision Guide for AMD Family 10h Processors: http://support.amd.com/us/Processor_TechDocs/41322.pdf Revision Guide for AMD Family 11h Processors: http://support.amd.com/us/Processor_TechDocs/41788.pdf + Revision Guide for AMD Family 12h Processors: + http://support.amd.com/us/Processor_TechDocs/44739.pdf Revision Guide for AMD Family 14h Models 00h-0Fh Processors: http://support.amd.com/us/Processor_TechDocs/47534.pdf AMD Family 11h Processor Power and Thermal Data Sheet for Notebooks: -- cgit v1.1 From 316b3799880c55bb20f6d2db904245eccc30e25f Mon Sep 17 00:00:00 2001 From: Jesper Juhl Date: Wed, 6 Jul 2011 11:27:17 -0700 Subject: Documentation: update CodingStyle memory allocators The list of available general purpose memory allocators in Documentation/CodingStyle chapter 14 is incomplete. This patch adds the missing vzalloc() to the list. Signed-off-by: Jesper Juhl Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds --- Documentation/CodingStyle | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle index 58b0bf9..fa6e25b 100644 --- a/Documentation/CodingStyle +++ b/Documentation/CodingStyle @@ -680,8 +680,8 @@ ones already enabled by DEBUG. Chapter 14: Allocating memory The kernel provides the following general purpose memory allocators: -kmalloc(), kzalloc(), kcalloc(), and vmalloc(). Please refer to the API -documentation for further information about them. +kmalloc(), kzalloc(), kcalloc(), vmalloc(), and vzalloc(). Please refer to +the API documentation for further information about them. The preferred form for passing a size of a struct is the following: -- cgit v1.1 From 9b61fc4cf3a7232ecc39f573a1e68148ef40ea49 Mon Sep 17 00:00:00 2001 From: Andrea Righi Date: Wed, 6 Jul 2011 11:26:26 -0700 Subject: Documentation: fix cgroup blkio throttle filenames All the blkio.throttle.* file names are incorrectly reported without ".throttle" in the documentation. Fix it. Signed-off-by: Andrea Righi Signed-off-by: Randy Dunlap Acked-by: Vivek Goyal Signed-off-by: Linus Torvalds --- Documentation/cgroups/blkio-controller.txt | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt index cd45c8e..84f0a15 100644 --- a/Documentation/cgroups/blkio-controller.txt +++ b/Documentation/cgroups/blkio-controller.txt @@ -77,7 +77,7 @@ Throttling/Upper Limit policy - Specify a bandwidth rate on particular device for root group. The format for policy is ": ". - echo "8:16 1048576" > /sys/fs/cgroup/blkio/blkio.read_bps_device + echo "8:16 1048576" > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device Above will put a limit of 1MB/second on reads happening for root group on device having major/minor number 8:16. @@ -90,7 +90,7 @@ Throttling/Upper Limit policy 1024+0 records out 4194304 bytes (4.2 MB) copied, 4.0001 s, 1.0 MB/s - Limits for writes can be put using blkio.write_bps_device file. + Limits for writes can be put using blkio.throttle.write_bps_device file. Hierarchical Cgroups ==================== @@ -286,28 +286,28 @@ Throttling/Upper limit policy files specified in bytes per second. Rules are per deivce. Following is the format. - echo ": " > /cgrp/blkio.read_bps_device + echo ": " > /cgrp/blkio.throttle.read_bps_device - blkio.throttle.write_bps_device - Specifies upper limit on WRITE rate to the device. IO rate is specified in bytes per second. Rules are per deivce. Following is the format. - echo ": " > /cgrp/blkio.write_bps_device + echo ": " > /cgrp/blkio.throttle.write_bps_device - blkio.throttle.read_iops_device - Specifies upper limit on READ rate from the device. IO rate is specified in IO per second. Rules are per deivce. Following is the format. - echo ": " > /cgrp/blkio.read_iops_device + echo ": " > /cgrp/blkio.throttle.read_iops_device - blkio.throttle.write_iops_device - Specifies upper limit on WRITE rate to the device. IO rate is specified in io per second. Rules are per deivce. Following is the format. - echo ": " > /cgrp/blkio.write_iops_device + echo ": " > /cgrp/blkio.throttle.write_iops_device Note: If both BW and IOPS rules are specified for a device, then IO is subjectd to both the constraints. -- cgit v1.1 From 2d43f671c8fb0fd72e896a8b31e15b98916f707d Mon Sep 17 00:00:00 2001 From: Henrique de Moraes Holschuh Date: Sun, 5 Jun 2011 16:22:34 -0300 Subject: thinkpad-acpi: handle some new HKEY 0x60xx events Handle some user interface events from the newer Lenovo models. We are likely to do something smart with these events in the future, for now, hide the ones we are already certain about from the user and userspace both. * Events 0x6000 and 0x6005 are key-related. 0x6005 is not properly identified yet. Ignore these events, and do not report them. * Event 0x6040 has not been properly identified yet, and we don't know if it is important (looks like it isn't, but still...). Keep reporting it. * Change the message the driver outputs on unknown 0x6xxx events, as all recent events are not related to thermal alarms. Degrade log level from ALERT to WARNING. Thanks to all users who reported these events or asked about them in a number of mailing lists. Your help is highly appreciated, even if I did took a lot of time to act on them. For that I apologise. I will list those that identified the reasons for the events as "reported-by", and I apologise in advance if I leave anyone out: it was not done on purpose, I made the mistake of not properly tagging all event report emails separately, and might have missed some. Signed-off-by: Henrique de Moraes Holschuh Reported-by: Markus Malkusch Reported-by: Peter Giles Signed-off-by: Matthew Garrett --- Documentation/laptops/thinkpad-acpi.txt | 3 +++ 1 file changed, 3 insertions(+) (limited to 'Documentation') diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt index 1565eefd..4bc92ea 100644 --- a/Documentation/laptops/thinkpad-acpi.txt +++ b/Documentation/laptops/thinkpad-acpi.txt @@ -534,6 +534,8 @@ Events that are never propagated by the driver: 0x2404 System is waking up from hibernation to undock 0x2405 System is waking up from hibernation to eject bay 0x5010 Brightness level changed/control event +0x6000 KEYBOARD: Numlock key pressed +0x6005 KEYBOARD: Fn key pressed (TO BE VERIFIED) Events that are propagated by the driver to userspace: @@ -552,6 +554,7 @@ Events that are propagated by the driver to userspace: 0x6021 ALARM: a sensor is too hot 0x6022 ALARM: a sensor is extremely hot 0x6030 System thermal table changed +0x6040 Nvidia Optimus/AC adapter related (TO BE VERIFIED) Battery nearly empty alarms are a last resort attempt to get the operating system to hibernate or shutdown cleanly (0x2313), or shutdown -- cgit v1.1 From a50245af782ea85b1d5ad23e1015e0ac52996b27 Mon Sep 17 00:00:00 2001 From: Henrique de Moraes Holschuh Date: Sun, 5 Jun 2011 16:22:35 -0300 Subject: thinkpad-acpi: handle HKEY 0x4010, 0x4011 events Handle events 0x4010 and 0x4011 so that we do not pester users about them. These events report when the thinkpad is docked/undocked to a native hotplug dock (i.e. one that does not need ACPI handling, nor is represented in the ACPI device tree). Such docks are based on USB 2.0/3.0, and also work as port replicators. We really want a proper dock class to report these, or at least new input EV_SW events. Since it is not clear which one to use yet, keep reporting them as vendor-specific ThinkPad events. WARNING: As defined by the thinkpad-acpi sysfs ABI rules of engagement, the vendor-specific events will be REMOVED as soon as generic events are made available (duplicate events are a big problem), with an appropriate update to the thinkpad-acpi sysfs/event ABI versioning. Userspace is already prepared to provide easy backwards compatibility for such changes when convenient to the distro (see acpi-fakekey). * Event 0x4010: docking to hotplug dock/port replicator * Event 0x4011: undocking from hotplug dock/port replicator Typical usecase would be to trigger display reconfiguration. Reports mention T410, T510, and series 3 docks/port replicators. Special thanks to Robert de Rooy for his extensive report and analysis of the situation. http://www.thinkwiki.org/wiki/ThinkPad_Port_Replicator_Series_3 http://www.thinkwiki.org/wiki/ThinkPad_Mini_Dock_Series_3 http://www.thinkwiki.org/wiki/ThinkPad_Mini_Dock_Plus_Series_3 http://www.thinkwiki.org/wiki/ThinkPad_Mini_Dock_Plus_Series_3_for_Mobile_Workstations http://lenovoblogs.com/insidethebox/?p=290 Signed-off-by: Henrique de Moraes Holschuh Cc: Matthew Garrett Reported-by: Claudius Hubig Reported-by: Doctor Bill Reported-by: Korte Noack Reported-by: Robert de Rooy Reported-by: Sebastian Will Signed-off-by: Matthew Garrett --- Documentation/laptops/thinkpad-acpi.txt | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation') diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt index 4bc92ea..6181548 100644 --- a/Documentation/laptops/thinkpad-acpi.txt +++ b/Documentation/laptops/thinkpad-acpi.txt @@ -547,6 +547,8 @@ Events that are propagated by the driver to userspace: 0x3006 Bay hotplug request (hint to power up SATA link when the optical drive tray is ejected) 0x4003 Undocked (see 0x2x04), can sleep again +0x4010 Docked into hotplug port replicator (non-ACPI dock) +0x4011 Undocked from hotplug port replicator (non-ACPI dock) 0x500B Tablet pen inserted into its storage bay 0x500C Tablet pen removed from its storage bay 0x6011 ALARM: battery is too hot -- cgit v1.1 From 6293698277f04eb1623536887651381ed3abc8d0 Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Mon, 13 Jun 2011 09:38:54 -0300 Subject: [media] feature-removal-schedule: change in how radio device nodes are handled Radio devices have weird side-effects when used with combined TV/radio tuners and the V4L2 spec is ambiguous on how it should work. This results in inconsistent driver behavior which makes life hard for everyone. Be more strict in when and how the switch between radio and tv mode takes place and make sure all drivers behave the same. Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab --- Documentation/feature-removal-schedule.txt | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 1a9446b..9df0e09 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -600,3 +600,25 @@ Why: Superseded by the UVCIOC_CTRL_QUERY ioctl. Who: Laurent Pinchart ---------------------------- + +What: For VIDIOC_S_FREQUENCY the type field must match the device node's type. + If not, return -EINVAL. +When: 3.2 +Why: It makes no sense to switch the tuner to radio mode by calling + VIDIOC_S_FREQUENCY on a video node, or to switch the tuner to tv mode by + calling VIDIOC_S_FREQUENCY on a radio node. This is the first step of a + move to more consistent handling of tv and radio tuners. +Who: Hans Verkuil + +---------------------------- + +What: Opening a radio device node will no longer automatically switch the + tuner mode from tv to radio. +When: 3.3 +Why: Just opening a V4L device should not change the state of the hardware + like that. It's very unexpected and against the V4L spec. Instead, you + switch to radio mode by calling VIDIOC_S_FREQUENCY. This is the second + and last step of the move to consistent handling of tv and radio tuners. +Who: Hans Verkuil + +---------------------------- -- cgit v1.1 From c902ce1bfb40d8b049bd2319b388b4b68b04bc27 Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 7 Jul 2011 12:19:48 +0100 Subject: FS-Cache: Add a helper to bulk uncache pages on an inode Add an FS-Cache helper to bulk uncache pages on an inode. This will only work for the circumstance where the pages in the cache correspond 1:1 with the pages attached to an inode's page cache. This is required for CIFS and NFS: When disabling inode cookie, we were returning the cookie and setting cifsi->fscache to NULL but failed to invalidate any previously mapped pages. This resulted in "Bad page state" errors and manifested in other kind of errors when running fsstress. Fix it by uncaching mapped pages when we disable the inode cookie. This patch should fix the following oops and "Bad page state" errors seen during fsstress testing. ------------[ cut here ]------------ kernel BUG at fs/cachefiles/namei.c:201! invalid opcode: 0000 [#1] SMP Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles] RSP: 0018:ffff88002ce6dd00 EFLAGS: 00010282 RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282 RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300 R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840 R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0 FS: 00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0) Stack: 0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00 ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380 ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56 Call Trace: cachefiles_lookup_object+0x78/0xd4 [cachefiles] fscache_lookup_object+0x131/0x16d [fscache] fscache_object_work_func+0x1bc/0x669 [fscache] process_one_work+0x186/0x298 worker_thread+0xda/0x15d kthread+0x84/0x8c kernel_thread_helper+0x4/0x10 RIP cachefiles_walk_to_object+0x436/0x745 [cachefiles] ---[ end trace 1d481c9af1804caa ]--- I tested the uncaching by the following means: (1) Create a big file on my NFS server (104857600 bytes). (2) Read the file into the cache with md5sum on the NFS client. Look in /proc/fs/fscache/stats: Pages : mrk=25601 unc=0 (3) Open the file for read/write ("bash 5<>/warthog/bigfile"). Look in proc again: Pages : mrk=25601 unc=25601 Reported-by: Jeff Layton Signed-off-by: David Howells Reviewed-and-Tested-by: Suresh Jayaraman cc: stable@kernel.org Signed-off-by: Linus Torvalds --- Documentation/filesystems/caching/netfs-api.txt | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'Documentation') diff --git a/Documentation/filesystems/caching/netfs-api.txt b/Documentation/filesystems/caching/netfs-api.txt index a167ab8..7cc6bf2 100644 --- a/Documentation/filesystems/caching/netfs-api.txt +++ b/Documentation/filesystems/caching/netfs-api.txt @@ -673,6 +673,22 @@ storage request to complete, or it may attempt to cancel the storage request - in which case the page will not be stored in the cache this time. +BULK INODE PAGE UNCACHE +----------------------- + +A convenience routine is provided to perform an uncache on all the pages +attached to an inode. This assumes that the pages on the inode correspond on a +1:1 basis with the pages in the cache. + + void fscache_uncache_all_inode_pages(struct fscache_cookie *cookie, + struct inode *inode); + +This takes the netfs cookie that the pages were cached with and the inode that +the pages are attached to. This function will wait for pages to finish being +written to the cache and for the cache to finish with the page generally. No +error is returned. + + ========================== INDEX AND DATA FILE UPDATE ========================== -- cgit v1.1 From 06b8fc5d308b15e853a68c3d4854fb2ac33db867 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Fri, 8 Jul 2011 09:31:31 -0700 Subject: net: Fix default in docs for tcp_orphan_retries. Default should be listed at 8 instead of 7. Reported-by: Denys Fedoryshchenko Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index d3d653a..bfe9242 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -346,7 +346,7 @@ tcp_orphan_retries - INTEGER when RTO retransmissions remain unacknowledged. See tcp_retries2 for more details. - The default value is 7. + The default value is 8. If your machine is a loaded WEB server, you should think about lowering this value, such sockets may consume significant resources. Cf. tcp_max_orphans. -- cgit v1.1 From f483d3923dc3a6394c483e28ccb3fe700bdf399e Mon Sep 17 00:00:00 2001 From: Ram Pai Date: Thu, 7 Jul 2011 11:19:10 -0700 Subject: PCI: conditional resource-reallocation through kernel parameter pci=realloc Multiple attempts to dynamically reallocate pci resources have unfortunately lead to regressions. Though we continue to fix the regressions and fine tune the dynamic-reallocation behavior, we have not reached a acceptable state yet. This patch provides a interim solution. It disables dynamic reallocation by default, but adds the ability to enable it through pci=realloc kernel command line parameter. Tested-by: Oliver Hartkopp Signed-off-by: Ram Pai Signed-off-by: Jesse Barnes --- Documentation/kernel-parameters.txt | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index fd248a31..aa47be7 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2015,6 +2015,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted. the default. off: Turn ECRC off on: Turn ECRC on. + realloc reallocate PCI resources if allocations done by BIOS + are erroneous. pcie_aspm= [PCIE] Forcibly enable or disable PCIe Active State Power Management. -- cgit v1.1 From 05801817845b308e1cf0fb8e2700b15dab79afc5 Mon Sep 17 00:00:00 2001 From: Muthu Kumar Date: Mon, 11 Jul 2011 11:04:58 -0700 Subject: Documentation/spinlocks.txt: Remove reference to sti()/cli() Since we removed sti()/cli() and related, how about removing it from Documentation/spinlocks.txt? Signed-off-by: Muthukumar R Signed-off-by: Linus Torvalds --- Documentation/spinlocks.txt | 45 +++++++-------------------------------------- 1 file changed, 7 insertions(+), 38 deletions(-) (limited to 'Documentation') diff --git a/Documentation/spinlocks.txt b/Documentation/spinlocks.txt index 2e3c64b..9dbe885 100644 --- a/Documentation/spinlocks.txt +++ b/Documentation/spinlocks.txt @@ -13,18 +13,8 @@ static DEFINE_SPINLOCK(xxx_lock); The above is always safe. It will disable interrupts _locally_, but the spinlock itself will guarantee the global lock, so it will guarantee that there is only one thread-of-control within the region(s) protected by that -lock. This works well even under UP. The above sequence under UP -essentially is just the same as doing - - unsigned long flags; - - save_flags(flags); cli(); - ... critical section ... - restore_flags(flags); - -so the code does _not_ need to worry about UP vs SMP issues: the spinlocks -work correctly under both (and spinlocks are actually more efficient on -architectures that allow doing the "save_flags + cli" in one operation). +lock. This works well even under UP also, so the code does _not_ need to +worry about UP vs SMP issues: the spinlocks work correctly under both. NOTE! Implications of spin_locks for memory are further described in: @@ -36,27 +26,7 @@ The above is usually pretty simple (you usually need and want only one spinlock for most things - using more than one spinlock can make things a lot more complex and even slower and is usually worth it only for sequences that you _know_ need to be split up: avoid it at all cost if you -aren't sure). HOWEVER, it _does_ mean that if you have some code that does - - cli(); - .. critical section .. - sti(); - -and another sequence that does - - spin_lock_irqsave(flags); - .. critical section .. - spin_unlock_irqrestore(flags); - -then they are NOT mutually exclusive, and the critical regions can happen -at the same time on two different CPU's. That's fine per se, but the -critical regions had better be critical for different things (ie they -can't stomp on each other). - -The above is a problem mainly if you end up mixing code - for example the -routines in ll_rw_block() tend to use cli/sti to protect the atomicity of -their actions, and if a driver uses spinlocks instead then you should -think about issues like the above. +aren't sure). This is really the only really hard part about spinlocks: once you start using spinlocks they tend to expand to areas you might not have noticed @@ -120,11 +90,10 @@ Lesson 3: spinlocks revisited. The single spin-lock primitives above are by no means the only ones. They are the most safe ones, and the ones that work under all circumstances, -but partly _because_ they are safe they are also fairly slow. They are -much faster than a generic global cli/sti pair, but slower than they'd -need to be, because they do have to disable interrupts (which is just a -single instruction on a x86, but it's an expensive one - and on other -architectures it can be worse). +but partly _because_ they are safe they are also fairly slow. They are slower +than they'd need to be, because they do have to disable interrupts +(which is just a single instruction on a x86, but it's an expensive one - +and on other architectures it can be worse). If you have a case where you have to protect a data structure across several CPU's and you want to use spinlocks you can potentially use -- cgit v1.1 From 5adaf851d2073c76bb98bc5cdb13d5754b7f0983 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Mon, 11 Jul 2011 16:48:38 -0700 Subject: Documentation/Changes: remove some really obsolete text That file harkens back to the days of the big 2.4 -> 2.6 version jump, and was based even then on older versions. Some of it is just obsolete, and Jesper Juhl points out that it talks about kernel versions 2.6 and should be updated to 3.0. Remove some obsolete text, and re-phrase some other to not be 2.6-specific. Reported-by: Jesper Juhl Signed-off-by: Linus Torvalds --- Documentation/Changes | 43 ++++++++++++++++++------------------------- 1 file changed, 18 insertions(+), 25 deletions(-) (limited to 'Documentation') diff --git a/Documentation/Changes b/Documentation/Changes index 5f4828a0..b175808 100644 --- a/Documentation/Changes +++ b/Documentation/Changes @@ -2,13 +2,7 @@ Intro ===== This document is designed to provide a list of the minimum levels of -software necessary to run the 2.6 kernels, as well as provide brief -instructions regarding any other "Gotchas" users may encounter when -trying life on the Bleeding Edge. If upgrading from a pre-2.4.x -kernel, please consult the Changes file included with 2.4.x kernels for -additional information; most of that information will not be repeated -here. Basically, this document assumes that your system is already -functional and running at least 2.4.x kernels. +software necessary to run the 3.0 kernels. This document is originally based on my "Changes" file for 2.0.x kernels and therefore owes credit to the same people as that file (Jared Mauch, @@ -22,11 +16,10 @@ Upgrade to at *least* these software revisions before thinking you've encountered a bug! If you're unsure what version you're currently running, the suggested command should tell you. -Again, keep in mind that this list assumes you are already -functionally running a Linux 2.4 kernel. Also, not all tools are -necessary on all systems; obviously, if you don't have any ISDN -hardware, for example, you probably needn't concern yourself with -isdn4k-utils. +Again, keep in mind that this list assumes you are already functionally +running a Linux kernel. Also, not all tools are necessary on all +systems; obviously, if you don't have any ISDN hardware, for example, +you probably needn't concern yourself with isdn4k-utils. o Gnu C 3.2 # gcc --version o Gnu make 3.80 # make --version @@ -114,12 +107,12 @@ Ksymoops If the unthinkable happens and your kernel oopses, you may need the ksymoops tool to decode it, but in most cases you don't. -In the 2.6 kernel it is generally preferred to build the kernel with -CONFIG_KALLSYMS so that it produces readable dumps that can be used as-is -(this also produces better output than ksymoops). -If for some reason your kernel is not build with CONFIG_KALLSYMS and -you have no way to rebuild and reproduce the Oops with that option, then -you can still decode that Oops with ksymoops. +It is generally preferred to build the kernel with CONFIG_KALLSYMS so +that it produces readable dumps that can be used as-is (this also +produces better output than ksymoops). If for some reason your kernel +is not build with CONFIG_KALLSYMS and you have no way to rebuild and +reproduce the Oops with that option, then you can still decode that Oops +with ksymoops. Module-Init-Tools ----------------- @@ -261,8 +254,8 @@ needs to be recompiled or (preferably) upgraded. NFS-utils --------- -In 2.4 and earlier kernels, the nfs server needed to know about any -client that expected to be able to access files via NFS. This +In ancient (2.4 and earlier) kernels, the nfs server needed to know +about any client that expected to be able to access files via NFS. This information would be given to the kernel by "mountd" when the client mounted the filesystem, or by "exportfs" at system startup. exportfs would take information about active clients from /var/lib/nfs/rmtab. @@ -272,11 +265,11 @@ which is not always easy, particularly when trying to implement fail-over. Even when the system is working well, rmtab suffers from getting lots of old entries that never get removed. -With 2.6 we have the option of having the kernel tell mountd when it -gets a request from an unknown host, and mountd can give appropriate -export information to the kernel. This removes the dependency on -rmtab and means that the kernel only needs to know about currently -active clients. +With modern kernels we have the option of having the kernel tell mountd +when it gets a request from an unknown host, and mountd can give +appropriate export information to the kernel. This removes the +dependency on rmtab and means that the kernel only needs to know about +currently active clients. To enable this new functionality, you need to: -- cgit v1.1 From 11e48feebe8b5d2ae348893a6e0d47c46204bee2 Mon Sep 17 00:00:00 2001 From: Darren Hart Date: Mon, 11 Jul 2011 20:27:40 -0700 Subject: x86, doc only: Correct real-mode kernel header offset for init_size The real-mode kernel header init_size field is located at 0x260 per the field listing in th e"REAL-MODE KERNEL HEADER" section. It is listed as 0x25c in the "DETAILS OF HEADER FIELDS" section, which overlaps with pref_address. Correct the details listing to 0x260. Signed-off-by: Darren Hart Link: http://lkml.kernel.org/r/541cf88e2dfe5b8186d8b96b136d892e769a68c1.1310441260.git.dvhart@linux.intel.com CC: H. Peter Anvin Signed-off-by: H. Peter Anvin --- Documentation/x86/boot.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt index 9b7221a..7c3a880 100644 --- a/Documentation/x86/boot.txt +++ b/Documentation/x86/boot.txt @@ -674,7 +674,7 @@ Protocol: 2.10+ Field name: init_size Type: read -Offset/size: 0x25c/4 +Offset/size: 0x260/4 This field indicates the amount of linear contiguous memory starting at the kernel runtime start address that the kernel needs before it -- cgit v1.1 From 3a36199114de0e60984534776732e0a7a220e29e Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi Date: Sun, 19 Jun 2011 16:56:29 +0900 Subject: nilfs2: remove resize from unsupported features list Resize feature was supported by the commit 4e33f9eab07e but it was not reflected to the list of unsupported features in nilfs2.txt file. This updates the list to fix discrepancy. Signed-off-by: Ryusuke Konishi --- Documentation/filesystems/nilfs2.txt | 1 - 1 file changed, 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/nilfs2.txt b/Documentation/filesystems/nilfs2.txt index d5c0cef..873a2ab 100644 --- a/Documentation/filesystems/nilfs2.txt +++ b/Documentation/filesystems/nilfs2.txt @@ -40,7 +40,6 @@ Features which NILFS2 does not support yet: - POSIX ACLs - quotas - fsck - - resize - defragmentation Mount options -- cgit v1.1