Merge branch 'perf/core' into perf/probes

Resolved merge conflict in tools/perf/Makefile

Merge reason: we want to queue up a dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
author: Ingo Molnar <mingo@elte.hu> 2009-11-17 10:16:43 +0100
committer: Ingo Molnar <mingo@elte.hu> 2009-11-17 10:17:47 +0100
commit: a7b63425a41cd6a8d50f76fef0660c5110f97e91
tree: be17ee121f1c8814d8d39c9f3e0205d9397fab54 /Documentation
parent: 35039eb6b199749943547c8572be6604edf00229
parent: 3726cc75e581c157202da93bb2333cce25c15c98
12 files changed, 598 insertions, 228 deletions
diff --git a/Documentation/ABI/testing/sysfs-devices-cache_disable b/Documentation/ABI/testing/sysfs-devices-cache_disable
deleted file mode 100644
index 175bb4f..0000000
--- a/Documentation/ABI/testing/sysfs-devices-cache_disable
+++ /dev/null
@@ -1,18 +0,0 @@
-What:      /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X
-Date:      August 2008
-KernelVersion:	2.6.27
-Contact:	mark.langsdorf@amd.com
-Description:	These files exist in every cpu's cache index directories.
-		There are currently 2 cache_disable_# files in each
-		directory.  Reading from these files on a supported
-		processor will return that cache disable index value
-		for that processor and node.  Writing to one of these
-		files will cause the specificed cache index to be disabled.
-
-		Currently, only AMD Family 10h Processors support cache index
-		disable, and only for their L3 caches.  See the BIOS and
-		Kernel Developer's Guide at
-		http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116-Public-GH-BKDG_3.20_2-4-09.pdf
-		for formatting information and other details on the
-		cache index disable.
-Users:    joachim.deguara@amd.com
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
new file mode 100644
index 0000000..a703b9e
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -0,0 +1,156 @@
+What:		/sys/devices/system/cpu/
+Date:		pre-git history
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:
+		A collection of both global and individual CPU attributes
+
+		Individual CPU attributes are contained in subdirectories
+		named by the kernel's logical CPU number, e.g.:
+
+		/sys/devices/system/cpu/cpu#/
+
+What:		/sys/devices/system/cpu/sched_mc_power_savings
+		/sys/devices/system/cpu/sched_smt_power_savings
+Date:		June 2006
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Discover and adjust the kernel's multi-core scheduler support.
+
+		Possible values are:
+
+		0 - No power saving load balance (default value)
+		1 - Fill one thread/core/package first for long running threads
+		2 - Also bias task wakeups to semi-idle cpu package for power
+		    savings
+
+		sched_mc_power_savings is dependent upon SCHED_MC, which is
+		itself architecture dependent.
+
+		sched_smt_power_savings is dependent upon SCHED_SMT, which
+		is itself architecture dependent.
+
+		The two files are independent of each other. It is possible
+		that one file may be present without the other.
+
+		Introduced by git commit 5c45bf27.
+
+
+What:		/sys/devices/system/cpu/kernel_max
+		/sys/devices/system/cpu/offline
+		/sys/devices/system/cpu/online
+		/sys/devices/system/cpu/possible
+		/sys/devices/system/cpu/present
+Date:		December 2008
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	CPU topology files that describe kernel limits related to
+		hotplug. Briefly:
+
+		kernel_max: the maximum cpu index allowed by the kernel
+		configuration.
+
+		offline: cpus that are not online because they have been
+		HOTPLUGGED off or exceed the limit of cpus allowed by the
+		kernel configuration (kernel_max above).
+
+		online: cpus that are online and being scheduled.
+
+		possible: cpus that have been allocated resources and can be
+		brought online if they are present.
+
+		present: cpus that have been identified as being present in
+		the system.
+
+		See Documentation/cputopology.txt for more information.
+
+
+
+What:		/sys/devices/system/cpu/cpu#/node
+Date:		October 2009
+Contact:	Linux memory management mailing list <linux-mm@kvack.org>
+Description:	Discover NUMA node a CPU belongs to
+
+		When CONFIG_NUMA is enabled, a symbolic link that points
+		to the corresponding NUMA node directory.
+
+		For example, the following symlink is created for cpu42
+		in NUMA node 2:
+
+		/sys/devices/system/cpu/cpu42/node2 -> ../../node/node2
+
+
+What:		/sys/devices/system/cpu/cpu#/topology/core_id
+		/sys/devices/system/cpu/cpu#/topology/core_siblings
+		/sys/devices/system/cpu/cpu#/topology/core_siblings_list
+		/sys/devices/system/cpu/cpu#/topology/physical_package_id
+		/sys/devices/system/cpu/cpu#/topology/thread_siblings
+		/sys/devices/system/cpu/cpu#/topology/thread_siblings_list
+Date:		December 2008
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	CPU topology files that describe a logical CPU's relationship
+		to other cores and threads in the same physical package.
+
+		One cpu# directory is created per logical CPU in the system,
+		e.g. /sys/devices/system/cpu/cpu42/.
+
+		Briefly, the files above are:
+
+		core_id: the CPU core ID of cpu#. Typically it is the
+		hardware platform's identifier (rather than the kernel's).
+		The actual value is architecture and platform dependent.
+
+		core_siblings: internal kernel map of cpu#'s hardware threads
+		within the same physical_package_id.
+
+		core_siblings_list: human-readable list of the logical CPU
+		numbers within the same physical_package_id as cpu#.
+
+		physical_package_id: physical package id of cpu#. Typically
+		corresponds to a physical socket number, but the actual value
+		is architecture and platform dependent.
+
+		thread_siblings: internal kernel map of cpu#'s hardware
+		threads within the same core as cpu#
+
+		thread_siblings_list: human-readable list of cpu#'s hardware
+		threads within the same core as cpu#
+
+		See Documentation/cputopology.txt for more information.
+
+
+What:		/sys/devices/system/cpu/cpuidle/current_driver
+		/sys/devices/system/cpu/cpuidle/current_governor_ro
+Date:		September 2007
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Discover cpuidle policy and mechanism
+
+		Various CPUs today support multiple idle levels that are
+		differentiated by varying exit latencies and power
+		consumption during idle.
+
+		Idle policy (governor) is differentiated from idle mechanism
+		(driver)
+
+		current_driver: displays current idle mechanism
+
+		current_governor_ro: displays current idle policy
+
+		See files in Documentation/cpuidle/ for more information.
+
+
+What:      /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X
+Date:      August 2008
+KernelVersion:	2.6.27
+Contact:	mark.langsdorf@amd.com
+Description:	These files exist in every cpu's cache index directories.
+		There are currently 2 cache_disable_# files in each
+		directory.  Reading from these files on a supported
+		processor will return that cache disable index value
+		for that processor and node.  Writing to one of these
+		files will cause the specified cache index to be disabled.
+
+		Currently, only AMD Family 10h Processors support cache index
+		disable, and only for their L3 caches.  See the BIOS and
+		Kernel Developer's Guide at
+		http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116-Public-GH-BKDG_3.20_2-4-09.pdf
+		for formatting information and other details on the
+		cache index disable.
+Users:    joachim.deguara@amd.com
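
Note on the topology files documented in sysfs-devices-system-cpu above: a
minimal user-space sketch of how they might be consumed, grouping logical
CPUs by (physical_package_id, core_id). The function names and the idea of
passing the sysfs root as a parameter are assumptions made for illustration;
only the file names come from the ABI text.

```python
import os
from collections import defaultdict


def read_attr(cpu_dir, name):
    """Return the stripped contents of topology/<name> under one cpu# directory."""
    with open(os.path.join(cpu_dir, "topology", name)) as f:
        return f.read().strip()


def cores_by_package(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Map (physical_package_id, core_id) -> sorted list of logical CPU numbers."""
    groups = defaultdict(list)
    for entry in os.listdir(sysfs_cpu_root):
        # Only cpu<N> directories describe logical CPUs; skip cpuidle, online, etc.
        if not (entry.startswith("cpu") and entry[3:].isdigit()):
            continue
        cpu_dir = os.path.join(sysfs_cpu_root, entry)
        # Skip any cpu<N> entry that does not carry a topology directory.
        if not os.path.isdir(os.path.join(cpu_dir, "topology")):
            continue
        key = (int(read_attr(cpu_dir, "physical_package_id")),
               int(read_attr(cpu_dir, "core_id")))
        groups[key].append(int(entry[3:]))
    return {key: sorted(cpus) for key, cpus in groups.items()}
```

Logical CPUs that end up under the same key are hardware threads of one
core, which is the relationship thread_siblings_list reports. Because the
ID values are architecture and platform dependent, they are best treated as
opaque keys rather than socket or core ordinals.
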
diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
index b41f3e5..f1c5c4b 100644
--- a/Documentation/cputopology.txt
+++ b/Documentation/cputopology.txt
@@ -1,15 +1,28 @@
 
-Export cpu topology info via sysfs. Items (attributes) are similar
+Export CPU topology info via sysfs. Items (attributes) are similar
 to /proc/cpuinfo.
 
 1) /sys/devices/system/cpu/cpuX/topology/physical_package_id:
-represent the physical package id of  cpu X;
+
+	physical package id of cpuX. Typically corresponds to a physical
+	socket number, but the actual value is architecture and platform
+	dependent.
+
 2) /sys/devices/system/cpu/cpuX/topology/core_id:
-represent the cpu core id to cpu X;
+
+	the CPU core ID of cpuX. Typically it is the hardware platform's
+	identifier (rather than the kernel's).  The actual value is
+	architecture and platform dependent.
+
 3) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
-represent the thread siblings to cpu X in the same core;
+
+	internal kernel map of cpuX's hardware threads within the same
+	core as cpuX
+
 4) /sys/devices/system/cpu/cpuX/topology/core_siblings:
-represent the thread siblings to cpu X in the same physical package;
+
+	internal kernel map of cpuX's hardware threads within the same
+	physical_package_id.
 
 To implement it in an architecture-neutral way, a new source file,
 drivers/base/topology.c, is to export the 4 attributes.
@@ -32,32 +45,32 @@ not defined by include/asm-XXX/topology.h:
 3) thread_siblings: just the given CPU
 4) core_siblings: just the given CPU
 
-Additionally, cpu topology information is provided under
+Additionally, CPU topology information is provided under
 /sys/devices/system/cpu and includes these files.  The internal
 source for the output is in brackets ("[]").
 
-    kernel_max: the maximum cpu index allowed by the kernel configuration.
+    kernel_max: the maximum CPU index allowed by the kernel configuration.
 		[NR_CPUS-1]
 
-    offline:	cpus that are not online because they have been
+    offline:	CPUs that are not online because they have been
 		HOTPLUGGED off (see cpu-hotplug.txt) or exceed the limit
-		of cpus allowed by the kernel configuration (kernel_max
+		of CPUs allowed by the kernel configuration (kernel_max
 		above). [~cpu_online_mask + cpus >= NR_CPUS]
 
-    online:	cpus that are online and being scheduled [cpu_online_mask]
+    online:	CPUs that are online and being scheduled [cpu_online_mask]
 
-    possible:	cpus that have been allocated resources and can be
+    possible:	CPUs that have been allocated resources and can be
 		brought online if they are present. [cpu_possible_mask]
 
-    present:	cpus that have been identified as being present in the
+    present:	CPUs that have been identified as being present in the
 		system. [cpu_present_mask]
 
 The format for the above output is compatible with cpulist_parse()
 [see <linux/cpumask.h>].  Some examples follow.
 
-In this example, there are 64 cpus in the system but cpus 32-63 exceed
+In this example, there are 64 CPUs in the system but CPUs 32-63 exceed
 the kernel max which is limited to 0..31 by the NR_CPUS config option
-being 32.  Note also that cpus 2 and 4-31 are not online but could be
+being 32.  Note also that CPUs 2 and 4-31 are not online but could be
 brought online as they are both present and possible.
 
      kernel_max: 31
@@ -67,8 +80,8 @@ brought online as they are both present and possible.
         present: 0-31
 
 In this example, the NR_CPUS config option is 128, but the kernel was
-started with possible_cpus=144.  There are 4 cpus in the system and cpu2
-was manually taken offline (and is the only cpu that can be brought
+started with possible_cpus=144.  There are 4 CPUs in the system and cpu2
+was manually taken offline (and is the only CPU that can be brought
 online.)
 
      kernel_max: 127
@@ -78,4 +91,4 @@ online.)
         present: 0-3
 
 See cpu-hotplug.txt for the possible_cpus=NUM kernel start parameter
-as well as more information on the various cpumask's.
+as well as more information on the various cpumasks.
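
Note on the list format used by kernel_max, offline, online, possible and
present in cputopology.txt above: a minimal sketch of expanding such a list
in user space. It assumes the file holds only comma-separated decimal
numbers and inclusive ranges, as in the examples in the text; it is not a
port of cpulist_parse(), and the function name is hypothetical.

```python
def parse_cpulist(text):
    """Expand a cpulist string such as "0-1,3" into a sorted list of CPU numbers."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # an empty string yields an empty list
        first, _, last = chunk.partition("-")
        # A bare number such as "3" has no "-", so it is a one-element range.
        cpus.update(range(int(first), int(last or first) + 1))
    return sorted(cpus)
```

With the first example described in the text (CPUs 0-31 possible, CPUs 2
and 4-31 not online), subtracting parse_cpulist("0-1,3") from
parse_cpulist("0-31") leaves CPUs 2 and 4-31, the ones that could still be
brought online.
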
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 04e6c81..bc693ff 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -418,6 +418,14 @@ When:	2.6.33
 Why:	Should be implemented in userspace, policy daemon.
 Who:	Johannes Berg <johannes@sipsolutions.net>
 
+---------------------------
+
+What:	CONFIG_INOTIFY
+When:	2.6.33
+Why:	last user (audit) will be converted to the newer more generic
+	and more easily maintained fsnotify subsystem
+Who:	Eric Paris <eparis@redhat.com>
+
 ----------------------------
 
 What:	lock_policy_rwsem_* and unlock_policy_rwsem_* will not be
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index bf4f4b7..6d94e06 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -134,9 +134,15 @@ ro                   	Mount filesystem read only. Note that ext4 will
                      	mount options "ro,noload" can be used to prevent
 		     	writes to the filesystem.
 
+journal_checksum	Enable checksumming of the journal transactions.
+			This will allow the recovery code in e2fsck and the
+			kernel to detect corruption in the kernel.  It is a
+			compatible change and will be ignored by older kernels.
+
 journal_async_commit	Commit block can be written to disk without waiting
 			for descriptor blocks. If enabled older kernels cannot
-			mount the device.
+			mount the device. This will enable 'journal_checksum'
+			internally.
 
 journal=update		Update the ext4 file system's journal to the current
 			format.
diff --git a/Documentation/hwmon/sysfs-interface b/Documentation/hwmon/sysfs-interface
index dcbd502..82def88 100644
--- a/Documentation/hwmon/sysfs-interface
+++ b/Documentation/hwmon/sysfs-interface
@@ -353,10 +353,20 @@ power[1-*]_average		Average power use
 				Unit: microWatt
 				RO
 
-power[1-*]_average_interval	Power use averaging interval
+power[1-*]_average_interval	Power use averaging interval.  A poll
+				notification is sent to this file if the
+				hardware changes the averaging interval.
 				Unit: milliseconds
 				RW
 
+power[1-*]_average_interval_max	Maximum power use averaging interval
+				Unit: milliseconds
+				RO
+
+power[1-*]_average_interval_min	Minimum power use averaging interval
+				Unit: milliseconds
+				RO
+
 power[1-*]_average_highest	Historical average maximum power use
 				Unit: microWatt
 				RO
@@ -365,6 +375,18 @@ power[1-*]_average_lowest	Historical average minimum power use
 				Unit: microWatt
 				RO
 
+power[1-*]_average_max		A poll notification is sent to
+				power[1-*]_average when power use
+				rises above this value.
+				Unit: microWatt
+				RW
+
+power[1-*]_average_min		A poll notification is sent to
+				power[1-*]_average when power use
+				sinks below this value.
+				Unit: microWatt
+				RW
+
 power[1-*]_input		Instantaneous power use
 				Unit: microWatt
 				RO
@@ -381,6 +403,39 @@ power[1-*]_reset_history	Reset input_highest, input_lowest,
 				average_highest and average_lowest.
 				WO
 
+power[1-*]_accuracy		Accuracy of the power meter.
+				Unit: Percent
+				RO
+
+power[1-*]_alarm		1 if the system is drawing more power than the
+				cap allows; 0 otherwise.  A poll notification is
+				sent to this file when the power use exceeds the
+				cap.  This file only appears if the cap is known
+				to be enforced by hardware.
+				RO
+
+power[1-*]_cap			If power use rises above this limit, the
+				system should take action to reduce power use.
+				A poll notification is sent to this file if the
+				cap is changed by the hardware.  The *_cap
+				files only appear if the cap is known to be
+				enforced by hardware.
+				Unit: microWatt
+				RW
+
+power[1-*]_cap_hyst		Margin of hysteresis built around capping and
+				notification.
+				Unit: microWatt
+				RW
+
+power[1-*]_cap_max		Maximum cap that can be set.
+				Unit: microWatt
+				RO
+
+power[1-*]_cap_min		Minimum cap that can be set.
+				Unit: microWatt
+				RO
+
 **********
 * Energy *
 **********
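
Note on the poll notifications added to hwmon/sysfs-interface above: a
minimal user-space sketch of waiting for one. It assumes the usual sysfs
convention that a notification is reported as POLLPRI | POLLERR, and that
the attribute has to be read once before polling and re-read from offset 0
afterwards. The function name is hypothetical, and any path passed in (for
example a power1_average file under some hwmon device) is the caller's
assumption about the system at hand.

```python
import os
import select


def wait_for_update(path, timeout_ms):
    """Read an integer attribute, wait for a poll notification, read it again.

    Returns (notified, value). `notified` is True if poll() reported an
    event before the timeout expired; `value` is the attribute's contents
    after the wait, as an int.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.read(fd, 64)  # initial read, done before polling
        poller = select.poll()
        poller.register(fd, select.POLLPRI | select.POLLERR)
        notified = bool(poller.poll(timeout_ms))
        os.lseek(fd, 0, os.SEEK_SET)  # rewind so the next read sees fresh contents
        value = int(os.read(fd, 64).decode().strip())
        return notified, value
    finally:
        os.close(fd)
```

For the power attributes above the returned value is in microWatt, so
dividing by 1,000,000 gives watts. A monitoring loop would call this
repeatedly and treat a timeout (notified is False) as "no change reported".
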
diff --git a/Documentation/i2c/busses/i2c-piix4 b/Documentation/i2c/busses/i2c-piix4
index c5b37c5..ac540c7 100644
--- a/Documentation/i2c/busses/i2c-piix4
+++ b/Documentation/i2c/busses/i2c-piix4
@@ -8,7 +8,7 @@ Supported adapters:
     Datasheet: Only available via NDA from ServerWorks
   * ATI IXP200, IXP300, IXP400, SB600, SB700 and SB800 southbridges
     Datasheet: Not publicly available
-  * AMD SB900
+  * AMD Hudson-2
     Datasheet: Not publicly available
   * Standard Microsystems (SMSC) SLC90E66 (Victory66) southbridge
     Datasheet: Publicly available at the SMSC website http://www.smsc.com
diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
index ba9373f..098de5b 100644
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -42,7 +42,6 @@
 #include <signal.h>
 #include "linux/lguest_launcher.h"
 #include "linux/virtio_config.h"
-#include <linux/virtio_ids.h>
 #include "linux/virtio_net.h"
 #include "linux/virtio_blk.h"
 #include "linux/virtio_console.h"
diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt
index 1c8eb45..fd9a2f6 100644
--- a/Documentation/sound/alsa/ALSA-Configuration.txt
+++ b/Documentation/sound/alsa/ALSA-Configuration.txt
@@ -522,7 +522,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
     pcm_devs       - Number of PCM devices assigned to each card
                      (default = 1, up to 4)
     pcm_substreams - Number of PCM substreams assigned to each PCM
-                     (default = 8, up to 16)
+                     (default = 8, up to 128)
     hrtimer        - Use hrtimer (=1, default) or system timer (=0)
     fake_buffer    - Fake buffer allocations (default = 1)
 
diff --git a/Documentation/thermal/sysfs-api.txt b/Documentation/thermal/sysfs-api.txt
index 70d68ce..a87dc27 100644
--- a/Documentation/thermal/sysfs-api.txt
+++ b/Documentation/thermal/sysfs-api.txt
@@ -1,5 +1,5 @@
 Generic Thermal Sysfs driver How To
-=========================
+===================================
 
 Written by Sujith Thomas <sujith.thomas@intel.com>, Zhang Rui <rui.zhang@intel.com>
 
@@ -10,20 +10,20 @@ Copyright (c)  2008 Intel Corporation
 
 0. Introduction
 
-The generic thermal sysfs provides a set of interfaces for thermal zone devices (sensors)
-and thermal cooling devices (fan, processor...) to register with the thermal management
-solution and to be a part of it.
+The generic thermal sysfs provides a set of interfaces for thermal zone
+devices (sensors) and thermal cooling devices (fan, processor...) to register
+with the thermal management solution and to be a part of it.
 
-This how-to focuses on enabling new thermal zone and cooling devices to participate
-in thermal management.
-This solution is platform independent and any type of thermal zone devices and
-cooling devices should be able to make use of the infrastructure.
+This how-to focuses on enabling new thermal zone and cooling devices to
+participate in thermal management.
+This solution is platform independent and any type of thermal zone devices
+and cooling devices should be able to make use of the infrastructure.
 
-The main task of the thermal sysfs driver is to expose thermal zone attributes as well
-as cooling device attributes to the user space.
-An intelligent thermal management application can make decisions based on inputs
-from thermal zone attributes (the current temperature and trip point temperature)
-and throttle appropriate devices.
+The main task of the thermal sysfs driver is to expose thermal zone attributes
+as well as cooling device attributes to the user space.
+An intelligent thermal management application can make decisions based on
+inputs from thermal zone attributes (the current temperature and trip point
+temperature) and throttle appropriate devices.
 
 [0-*]	denotes any positive number starting from 0
 [1-*]	denotes any positive number starting from 1
@@ -31,77 +31,77 @@ and throttle appropriate devices.
 1. thermal sysfs driver interface functions
 
 1.1 thermal zone device interface
-1.1.1 struct thermal_zone_device *thermal_zone_device_register(char *name, int trips,
-				void *devdata, struct thermal_zone_device_ops *ops)
-
-	This interface function adds a new thermal zone device (sensor) to
-	/sys/class/thermal folder as thermal_zone[0-*].
-	It tries to bind all the thermal cooling devices registered at the same time.
-
-	name: the thermal zone name.
-	trips: the total number of trip points this thermal zone supports.
-	devdata: device private data
-	ops: thermal zone device call-backs.
-		.bind: bind the thermal zone device with a thermal cooling device.
-		.unbind: unbind the thermal zone device with a thermal cooling device.
-		.get_temp: get the current temperature of the thermal zone.
-		.get_mode: get the current mode (user/kernel) of the thermal zone.
-			   "kernel" means thermal management is done in kernel.
-			   "user" will prevent kernel thermal driver actions upon trip points
-			   so that user applications can take charge of thermal management.
-		.set_mode: set the mode (user/kernel) of the thermal zone.
-		.get_trip_type: get the type of certain trip point.
-		.get_trip_temp: get the temperature above which the certain trip point
-				will be fired.
+1.1.1 struct thermal_zone_device *thermal_zone_device_register(char *name,
+		int trips, void *devdata, struct thermal_zone_device_ops *ops)
+
+    This interface function adds a new thermal zone device (sensor) to
+    /sys/class/thermal folder as thermal_zone[0-*]. It tries to bind all the
+    thermal cooling devices registered at the same time.
+
+    name: the thermal zone name.
+    trips: the total number of trip points this thermal zone supports.
+    devdata: device private data
+    ops: thermal zone device call-backs.
+	.bind: bind the thermal zone device with a thermal cooling device.
+	.unbind: unbind the thermal zone device with a thermal cooling device.
+	.get_temp: get the current temperature of the thermal zone.
+	.get_mode: get the current mode (user/kernel) of the thermal zone.
+	    - "kernel" means thermal management is done in kernel.
+	    - "user" will prevent kernel thermal driver actions upon trip points
+	      so that user applications can take charge of thermal management.
+	.set_mode: set the mode (user/kernel) of the thermal zone.
+	.get_trip_type: get the type of certain trip point.
+	.get_trip_temp: get the temperature above which the certain trip point
+			will be fired.
 
 1.1.2 void thermal_zone_device_unregister(struct thermal_zone_device *tz)
 
-	This interface function removes the thermal zone device.
-	It deletes the corresponding entry form /sys/class/thermal folder and unbind all
-	the thermal cooling devices it uses.
+    This interface function removes the thermal zone device.
+    It deletes the corresponding entry from /sys/class/thermal folder and
+    unbinds all the thermal cooling devices it uses.
 
 1.2 thermal cooling device interface
 1.2.1 struct thermal_cooling_device *thermal_cooling_device_register(char *name,
-					void *devdata, struct thermal_cooling_device_ops *)
-
-	This interface function adds a new thermal cooling device (fan/processor/...) to
-	/sys/class/thermal/ folder as cooling_device[0-*].
-	It tries to bind itself to all the thermal zone devices register at the same time.
-	name: the cooling device name.
-	devdata: device private data.
-	ops: thermal cooling devices call-backs.
-		.get_max_state: get the Maximum throttle state of the cooling device.
-		.get_cur_state: get the Current throttle state of the cooling device.
-		.set_cur_state: set the Current throttle state of the cooling device.
+		void *devdata, struct thermal_cooling_device_ops *)
+
+    This interface function adds a new thermal cooling device (fan/processor/...)
+    to /sys/class/thermal/ folder as cooling_device[0-*]. It tries to bind itself
+    to all the thermal zone devices registered at the same time.
+    name: the cooling device name.
+    devdata: device private data.
+    ops: thermal cooling devices call-backs.
+	.get_max_state: get the Maximum throttle state of the cooling device.
+	.get_cur_state: get the Current throttle state of the cooling device.
+	.set_cur_state: set the Current throttle state of the cooling device.
 
 1.2.2 void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
 
-	This interface function remove the thermal cooling device.
-	It deletes the corresponding entry form /sys/class/thermal folder and unbind
-	itself from all	the thermal zone devices using it.
+    This interface function removes the thermal cooling device.
+    It deletes the corresponding entry from /sys/class/thermal folder and
+    unbinds itself from all the thermal zone devices using it.
 
 1.3 interface for binding a thermal zone device with a thermal cooling device
 1.3.1 int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
-			int trip, struct thermal_cooling_device *cdev);
+		int trip, struct thermal_cooling_device *cdev);
 
-	This interface function bind a thermal cooling device to the certain trip point
-	of a thermal zone device.
-	This function is usually called in the thermal zone device .bind callback.
-	tz: the thermal zone device
-	cdev: thermal cooling device
-	trip: indicates which trip point the cooling devices is associated with
-		 in this thermal zone.
+    This interface function binds a thermal cooling device to a certain trip
+    point of a thermal zone device.
+    This function is usually called in the thermal zone device .bind callback.
+    tz: the thermal zone device
+    cdev: thermal cooling device
+    trip: indicates which trip point the cooling device is associated with
+	  in this thermal zone.
 
 1.3.2 int thermal_zone_unbind_cooling_device(struct thermal_zone_device *tz,
-				int trip, struct thermal_cooling_device *cdev);
+		int trip, struct thermal_cooling_device *cdev);
 
-	This interface function unbind a thermal cooling device from the certain trip point
-	of a thermal zone device.
-	This function is usually called in the thermal zone device .unbind callback.
-	tz: the thermal zone device
-	cdev: thermal cooling device
-	trip: indicates which trip point the cooling devices is associated with
-		in this thermal zone.
+    This interface function unbinds a thermal cooling device from a certain
+    trip point of a thermal zone device. This function is usually called in
+    the thermal zone device .unbind callback.
+    tz: the thermal zone device
+    cdev: thermal cooling device
+    trip: indicates which trip point the cooling device is associated with
+	  in this thermal zone.
 
 2. sysfs attributes structure
 
@@ -114,153 +114,166 @@ if hwmon is compiled in or built as a module.
 
 Thermal zone device sys I/F, created once it's registered:
 /sys/class/thermal/thermal_zone[0-*]:
-	|-----type:			Type of the thermal zone
-	|-----temp:			Current temperature
-	|-----mode:			Working mode of the thermal zone
-	|-----trip_point_[0-*]_temp:	Trip point temperature
-	|-----trip_point_[0-*]_type:	Trip point type
+    |---type:			Type of the thermal zone
+    |---temp:			Current temperature
+    |---mode:			Working mode of the thermal zone
+    |---trip_point_[0-*]_temp:	Trip point temperature
+    |---trip_point_[0-*]_type:	Trip point type
 
 Thermal cooling device sys I/F, created once it's registered:
 /sys/class/thermal/cooling_device[0-*]:
-	|-----type :			Type of the cooling device(processor/fan/...)
-	|-----max_state:		Maximum cooling state of the cooling device
-	|-----cur_state:		Current cooling state of the cooling device
+    |---type:			Type of the cooling device(processor/fan/...)
+    |---max_state:		Maximum cooling state of the cooling device
+    |---cur_state:		Current cooling state of the cooling device
 
 
-These two dynamic attributes are created/removed in pairs.
-They represent the relationship between a thermal zone and its associated cooling device.
-They are created/removed for each
-thermal_zone_bind_cooling_device/thermal_zone_unbind_cooling_device successful execution.
+The next two dynamic attributes are created/removed in pairs. They represent
+the relationship between a thermal zone and its associated cooling device.
+They are created/removed for each successful execution of
+thermal_zone_bind_cooling_device/thermal_zone_unbind_cooling_device.
 
-/sys/class/thermal/thermal_zone[0-*]
-	|-----cdev[0-*]:		The [0-*]th cooling device in the current thermal zone
-	|-----cdev[0-*]_trip_point:	Trip point that cdev[0-*] is associated with
+/sys/class/thermal/thermal_zone[0-*]:
+    |---cdev[0-*]:		[0-*]th cooling device in current thermal zone
+    |---cdev[0-*]_trip_point:	Trip point that cdev[0-*] is associated with
 
 Besides the thermal zone device sysfs I/F and cooling device sysfs I/F,
-the generic thermal driver also creates a hwmon sysfs I/F for each _type_ of
-thermal zone device. E.g. the generic thermal driver registers one hwmon class device
-and build the associated hwmon sysfs I/F for all the registered ACPI thermal zones.
+the generic thermal driver also creates a hwmon sysfs I/F for each _type_
+of thermal zone device. E.g. the generic thermal driver registers one hwmon
+class device and builds the associated hwmon sysfs I/F for all the registered
+ACPI thermal zones.
+
 /sys/class/hwmon/hwmon[0-*]:
-	|-----name:			The type of the thermal zone devices.
-	|-----temp[1-*]_input:		The current temperature of thermal zone [1-*].
-	|-----temp[1-*]_critical:	The critical trip point of thermal zone [1-*].
+    |---name:			The type of the thermal zone devices
+    |---temp[1-*]_input:	The current temperature of thermal zone [1-*]
+    |---temp[1-*]_critical:	The critical trip point of thermal zone [1-*]
+
 Please read Documentation/hwmon/sysfs-interface for additional information.
 
 ***************************
 * Thermal zone attributes *
 ***************************
 
-type				Strings which represent the thermal zone type.
-				This is given by thermal zone driver as part of registration.
-				Eg: "acpitz" indicates it's an ACPI thermal device.
-				In order to keep it consistent with hwmon sys attribute,
-				this should be a short, lowercase string,
-				not containing spaces nor dashes.
-				RO
-				Required
-
-temp				Current temperature as reported by thermal zone (sensor)
-				Unit: millidegree Celsius
-				RO
-				Required
-
-mode				One of the predefined values in [kernel, user]
-				This file gives information about the algorithm
-				that is currently managing the thermal zone.
-				It can be either default kernel based algorithm
-				or user space application.
-				RW
-				Optional
-				kernel	= Thermal management in kernel thermal zone driver.
-				user	= Preventing kernel thermal zone driver actions upon
-					  trip points so that user application can take full
-					  charge of the thermal management.
-
-trip_point_[0-*]_temp		The temperature above which trip point will be fired
-				Unit: millidegree Celsius
-				RO
-				Optional
-
-trip_point_[0-*]_type 		Strings which indicate the type of the trip point
-				E.g. it can be one of critical, hot, passive,
-				    active[0-*] for ACPI thermal zone.
-				RO
-				Optional
-
-cdev[0-*]			Sysfs link to the thermal cooling device node where the sys I/F
-				for cooling device throttling control represents.
-				RO
-				Optional
-
-cdev[0-*]_trip_point		The trip point with which cdev[0-*] is associated in this thermal zone
-				-1 means the cooling device is not associated with any trip point.
-				RO
-				Optional
-
-******************************
-* Cooling device  attributes *
-******************************
-
-type				String which represents the type of device
-				eg: For generic ACPI: this should be "Fan",
-				"Processor" or "LCD"
-				eg. For memory controller device on intel_menlow platform:
-				this should be "Memory controller"
-				RO
-				Required
-
-max_state			The maximum permissible cooling state of this cooling device.
-				RO
-				Required
-
-cur_state			The current cooling state of this cooling device.
-				the value can any integer numbers between 0 and max_state,
-				cur_state == 0 means no cooling
-				cur_state == max_state means the maximum cooling.
-				RW
-				Required
+type
+	Strings which represent the thermal zone type.
+	This is given by thermal zone driver as part of registration.
+	E.g: "acpitz" indicates it's an ACPI thermal device.
+	In order to keep it consistent with hwmon sys attribute, this should
+	be a short, lowercase string, not containing spaces nor dashes.
+	RO, Required
+
+temp
+	Current temperature as reported by thermal zone (sensor).
+	Unit: millidegree Celsius
+	RO, Required
+
+mode
+	One of the predefined values in [kernel, user].
+	This file gives information about the algorithm that is currently
+	managing the thermal zone. It can be either default kernel based
+	algorithm or user space application.
+	kernel	= Thermal management in kernel thermal zone driver.
+	user	= Preventing kernel thermal zone driver actions upon
+		  trip points so that user application can take full
+		  charge of the thermal management.
+	RW, Optional
+
+trip_point_[0-*]_temp
+	The temperature above which trip point will be fired.
+	Unit: millidegree Celsius
+	RO, Optional
+
+trip_point_[0-*]_type
+	Strings which indicate the type of the trip point.
+	E.g. it can be one of critical, hot, passive, active[0-*] for ACPI
+	thermal zone.
+	RO, Optional
+
+cdev[0-*]
+	Sysfs link to the thermal cooling device node that provides the
+	sys I/F for cooling device throttling control.
+	RO, Optional
+
+cdev[0-*]_trip_point
+	The trip point with which cdev[0-*] is associated in this thermal
+	zone; -1 means the cooling device is not associated with any trip
+	point.
+	RO, Optional
+
+passive
+	Attribute is only present for zones in which the passive cooling
+	policy is not supported by native thermal driver. Default is zero
+	and can be set to a temperature (in millidegrees) to enable a
+	passive trip point for the zone. Activation is done by polling with
+	an interval of 1 second.
+	Unit: millidegrees Celsius
+	RW, Optional
+
+*****************************
+* Cooling device attributes *
+*****************************
+
+type
+	String which represents the type of device, e.g:
+	- for generic ACPI: should be "Fan", "Processor" or "LCD"
+	- for memory controller device on intel_menlow platform:
+	  should be "Memory controller".
+	RO, Required
+
+max_state
+	The maximum permissible cooling state of this cooling device.
+	RO, Required
+
+cur_state
+	The current cooling state of this cooling device.
+	The value can be any integer between 0 and max_state:
+	- cur_state == 0 means no cooling
+	- cur_state == max_state means the maximum cooling.
+	RW, Required
 
 3. A simple implementation
 
-ACPI thermal zone may support multiple trip points like critical/hot/passive/active.
-If an ACPI thermal zone supports critical, passive, active[0] and active[1] at the same time,
-it may register itself as a thermal_zone_device (thermal_zone1) with 4 trip points in all.
-It has one processor and one fan, which are both registered as thermal_cooling_device.
-If the processor is listed in _PSL method, and the fan is listed in _AL0 method,
-the sys I/F structure will be built like this:
+ACPI thermal zone may support multiple trip points like critical, hot,
+passive, active. If an ACPI thermal zone supports critical, passive,
+active[0] and active[1] at the same time, it may register itself as a
+thermal_zone_device (thermal_zone1) with 4 trip points in all.
+It has one processor and one fan, which are both registered as
+thermal_cooling_device.
+
+If the processor is listed in _PSL method, and the fan is listed in _AL0
+method, the sys I/F structure will be built like this:
 
 /sys/class/thermal:
 
 |thermal_zone1:
-	|-----type:			acpitz
-	|-----temp:			37000
-	|-----mode:			kernel
-	|-----trip_point_0_temp:	100000
-	|-----trip_point_0_type:	critical
-	|-----trip_point_1_temp:	80000
-	|-----trip_point_1_type:	passive
-	|-----trip_point_2_temp:	70000
-	|-----trip_point_2_type:	active0
-	|-----trip_point_3_temp:	60000
-	|-----trip_point_3_type:	active1
-	|-----cdev0:			--->/sys/class/thermal/cooling_device0
-	|-----cdev0_trip_point:		1	/* cdev0 can be used for passive */
-	|-----cdev1:			--->/sys/class/thermal/cooling_device3
-	|-----cdev1_trip_point:		2	/* cdev1 can be used for active[0]*/
+    |---type:			acpitz
+    |---temp:			37000
+    |---mode:			kernel
+    |---trip_point_0_temp:	100000
+    |---trip_point_0_type:	critical
+    |---trip_point_1_temp:	80000
+    |---trip_point_1_type:	passive
+    |---trip_point_2_temp:	70000
+    |---trip_point_2_type:	active0
+    |---trip_point_3_temp:	60000
+    |---trip_point_3_type:	active1
+    |---cdev0:			--->/sys/class/thermal/cooling_device0
+    |---cdev0_trip_point:	1	/* cdev0 can be used for passive */
+    |---cdev1:			--->/sys/class/thermal/cooling_device3
+    |---cdev1_trip_point:	2	/* cdev1 can be used for active[0]*/
 
 |cooling_device0:
-	|-----type:			Processor
-	|-----max_state:		8
-	|-----cur_state:		0
+    |---type:			Processor
+    |---max_state:		8
+    |---cur_state:		0
 
 |cooling_device3:
-	|-----type:			Fan
-	|-----max_state:		2
-	|-----cur_state:		0
+    |---type:			Fan
+    |---max_state:		2
+    |---cur_state:		0
 
 /sys/class/hwmon:
 
 |hwmon0:
-	|-----name:			acpitz
-	|-----temp1_input:		37000
-	|-----temp1_crit:		100000
+    |---name:			acpitz
+    |---temp1_input:		37000
+    |---temp1_crit:		100000
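
The temperature attributes in the tree above (temp, trip_point_[0-*]_temp) are plain integers in millidegree Celsius. As a rough sketch of how a user space program might read one of them, the following C helpers parse such a value and compare it against a trip point. The function names (parse_millicelsius, read_millicelsius, above_trip) are hypothetical and are not part of the thermal sysfs interface; only the file layout and the unit come from the documentation above.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a sysfs temperature string such as "37000\n" (millidegree Celsius).
 * Returns 0 on success, -1 if the text is not a plain integer. */
int parse_millicelsius(const char *text, long *out)
{
	char *end;
	long value;

	errno = 0;
	value = strtol(text, &end, 10);
	if (errno != 0 || end == text)
		return -1;
	/* sysfs values normally end with a newline; accept that or end of string */
	if (*end != '\0' && *end != '\n')
		return -1;
	*out = value;
	return 0;
}

/* Read one temperature attribute (e.g. temp or trip_point_N_temp) from path. */
int read_millicelsius(const char *path, long *out)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_millicelsius(buf, out);
}

/* Hypothetical helper: is the zone temperature above the given trip point? */
int above_trip(long temp_mc, long trip_mc)
{
	return temp_mc > trip_mc;
}
```

With the example tree above, reading /sys/class/thermal/thermal_zone1/temp would be expected to yield 37000 (37 degrees Celsius), which is below the passive trip point at 80000.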
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 957b22f..8179692 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -1231,6 +1231,7 @@ something like this simple program:
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <unistd.h>
+#include <string.h>
 
 #define _STR(x) #x
 #define STR(x) _STR(x)
@@ -1265,6 +1266,7 @@ const char *find_debugfs(void)
                return NULL;
        }
 
+       strcat(debugfs, "/tracing/");
        debugfs_found = 1;
 
        return debugfs;
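
The hunk above appends "/tracing/" to the debugfs mount point with strcat(), which relies on the buffer being large enough. A minimal bounded variant could look like the sketch below. The helper name append_tracing_dir and its size parameter are assumptions made for illustration; they do not appear in ftrace.txt.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded variant of the strcat() call in find_debugfs():
 * append "/tracing/" to the mount point stored in buf (size bytes in total).
 * Returns 0 on success, -1 if the result would not fit; buf is left
 * unchanged on failure. */
int append_tracing_dir(char *buf, size_t size)
{
	static const char suffix[] = "/tracing/";
	size_t len = strlen(buf);

	/* sizeof(suffix) already counts the terminating NUL */
	if (len + sizeof(suffix) > size)
		return -1;
	memcpy(buf + len, suffix, sizeof(suffix));
	return 0;
}
```

Called on a buffer holding "/sys/kernel/debug", this leaves "/sys/kernel/debug/tracing/" in the buffer, provided at least 27 bytes are available.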
diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt
new file mode 100644
index 0000000..3ffadf8
--- /dev/null
+++ b/Documentation/vm/hwpoison.txt
@@ -0,0 +1,136 @@
+What is hwpoison?
+
+Upcoming Intel CPUs have support for recovering from some memory errors
+(``MCA recovery''). This requires the OS to declare a page "poisoned",
+kill the processes associated with it and avoid using it in the future.
+
+This patchkit implements the necessary infrastructure in the VM.
+
+To quote the overview comment:
+
+ * High level machine check handler. Handles pages reported by the
+ * hardware as being corrupted usually due to a 2bit ECC memory or cache
+ * failure.
+ *
+ * This focusses on pages detected as corrupted in the background.
+ * When the current CPU tries to consume corruption the currently
+ * running process can just be killed directly instead. This implies
+ * that if the error cannot be handled for some reason it's safe to
+ * just ignore it because no corruption has been consumed yet. Instead
+ * when that happens another machine check will happen.
+ *
+ * Handles page cache pages in various states. The tricky part
+ * here is that we can access any page asynchronous to other VM
+ * users, because memory failures could happen anytime and anywhere,
+ * possibly violating some of their assumptions. This is why this code
+ * has to be extremely careful. Generally it tries to use normal locking
+ * rules, as in get the standard locks, even if that means the
+ * error handling takes potentially a long time.
+ *
+ * Some of the operations here are somewhat inefficient and have non
+ * linear algorithmic complexity, because the data structures have not
+ * been optimized for this case. This is in particular the case
+ * for the mapping from a vma to a process. Since this case is expected
+ * to be rare we hope we can get away with this.
+
+The code consists of the high level handler in mm/memory-failure.c,
+a new page poison bit and various checks in the VM to handle poisoned
+pages.
+
+The main target right now is KVM guests, but it works for all kinds
+of applications. KVM support requires a recent qemu-kvm release.
+
+For the KVM use there was need for a new signal type so that
+KVM can inject the machine check into the guest with the proper
+address. This in theory allows other applications to handle
+memory failures too. The expectation is that nearly all applications
+won't do that, but some very specialized ones might.
+
+---
+
+There are two (actually three) modes memory failure recovery can be in:
+
+vm.memory_failure_recovery sysctl set to zero:
+	All memory failures cause a panic. Do not attempt recovery.
+	(on x86 this can be also affected by the tolerant level of the
+	MCE subsystem)
+
+early kill
+	(can be controlled globally and per process)
+	Send SIGBUS to the application as soon as the error is detected.
+	This allows applications that can process memory errors to handle
+	them in a gentle way (e.g. drop affected object).
+	This is the mode used by KVM qemu.
+
+late kill
+	Send SIGBUS when the application runs into the corrupted page.
+	This is best for memory error unaware applications and is the default.
+	Note some pages are always handled as late kill.
+
+---
+
+User control:
+
+vm.memory_failure_recovery
+	See sysctl.txt
+
+vm.memory_failure_early_kill
+	Enable early kill mode globally
+
+PR_MCE_KILL
+	Set early/late kill mode/revert to system default
+	arg1: PR_MCE_KILL_CLEAR: Revert to system default
+	arg1: PR_MCE_KILL_SET: arg2 defines thread specific mode
+		PR_MCE_KILL_EARLY: Early kill
+		PR_MCE_KILL_LATE:  Late kill
+		PR_MCE_KILL_DEFAULT: Use system global default
+PR_MCE_KILL_GET
+	return current mode
+
+
+---
+
+Testing:
+
+madvise(MADV_HWPOISON, ....)
+	(as root)
+	Poison a page in the process for testing
+
+
+hwpoison-inject module through debugfs
+	/sys/debug/hwpoison/corrupt-pfn
+
+Inject hwpoison fault at PFN echoed into this file
+
+
+Architecture specific MCE injector
+
+x86 has mce-inject, mce-test
+
+Some portable hwpoison test programs in mce-test, see below.
+
+---
+
+References:
+
+http://halobates.de/mce-lc09-2.pdf
+	Overview presentation from LinuxCon 09
+
+git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git
+	Test suite (hwpoison specific portable tests in tsrc)
+
+git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git
+	x86 specific injector
+
+
+---
+
+Limitations:
+
+- Not all page types are supported and never will be. Most kernel internal
+objects cannot be recovered, only LRU pages for now.
+- Right now hugepage support is missing.
+
+---
+Andi Kleen, Oct 2009
+
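
The PR_MCE_KILL and PR_MCE_KILL_GET controls listed under "User control" are reached through prctl(2). The sketch below shows one way a process might select early kill for itself and query the result. The wrapper names (set_mce_kill_mode, clear_mce_kill_mode, get_mce_kill_mode, mce_kill_mode_name) are hypothetical; the PR_* constants are assumed to be provided by <sys/prctl.h> on a kernel and libc that include this patchkit.

```c
#include <sys/prctl.h>

/* Select a per-thread policy: PR_MCE_KILL_EARLY, PR_MCE_KILL_LATE or
 * PR_MCE_KILL_DEFAULT. The document's "arg1"/"arg2" are the second and
 * third prctl() arguments here; the remaining arguments are passed as zero.
 * Returns 0 on success, -1 on error (errno set by prctl). */
int set_mce_kill_mode(int mode)
{
	return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, mode, 0, 0);
}

/* Revert to the system default policy. */
int clear_mce_kill_mode(void)
{
	return prctl(PR_MCE_KILL, PR_MCE_KILL_CLEAR, 0, 0, 0);
}

/* Return the current mode (one of the PR_MCE_KILL_* mode values),
 * or -1 on error. */
int get_mce_kill_mode(void)
{
	return prctl(PR_MCE_KILL_GET, 0, 0, 0, 0);
}

/* Hypothetical helper for logging: map a mode value to a short name. */
const char *mce_kill_mode_name(int mode)
{
	switch (mode) {
	case PR_MCE_KILL_EARLY:
		return "early";
	case PR_MCE_KILL_LATE:
		return "late";
	case PR_MCE_KILL_DEFAULT:
		return "default";
	default:
		return "unknown";
	}
}
```

An application that wants SIGBUS as soon as an error is detected would call set_mce_kill_mode(PR_MCE_KILL_EARLY) during startup and install a SIGBUS handler; the handler itself is outside the scope of this sketch.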