From ea3acb199a5d7e4da1de0a4288eba993b29f33b9 Mon Sep 17 00:00:00 2001 From: Jason Wessel Date: Thu, 24 Sep 2009 09:08:30 -0500 Subject: x86: earlyprintk: Fix regression to handle serial,ttySn as 1 arg Commit c953094 ("early_printk: Allow more than one early console") introduced a regression in the parsing of the earlyprintk= kernel arguments. If you specify "earlyprintk=serial,ttyS0,115200" as a kernel argument, the "serial,ttyS" should be parsed as a single argument and not as "serial" and then "ttyS". Also update the documentation to reflect you can specify the ttyS directly without the "serial" argument. Signed-off-by: Jason Wessel Cc: Len Brown Cc: Greg KH Cc: Linus Torvalds Cc: Andrew Morton Cc: Johannes Weiner LKML-Reference: <4ABB7D5E.6000301@windriver.com> Signed-off-by: Ingo Molnar --- Documentation/kernel-parameters.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 6fa7292..9107b38 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -671,6 +671,7 @@ and is between 256 and 4096 characters. It is defined in the file earlyprintk= [X86,SH,BLACKFIN] earlyprintk=vga earlyprintk=serial[,ttySn[,baudrate]] + earlyprintk=ttySn[,baudrate] earlyprintk=dbgp[debugController#] Append ",keep" to not disable it when the real console -- cgit v1.1 From 3bfc13c239fd56ebc1ac98a914c6c6b8b0045478 Mon Sep 17 00:00:00 2001 From: HighPoint Linux Team Date: Fri, 11 Sep 2009 17:21:27 +0800 Subject: [SCSI] hptiop: Add RR44xx adapter support Most code changes were made to support RR44xx adapters. - add more PCI device ID. - using PCI BAR[2] to access RR44xx IOP. - using PCI BAR[0] to check and clear RR44xx IRQ. Signed-off-by: HighPoint Linux Team Signed-off-by: James Bottomley --- Documentation/scsi/hptiop.txt | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/scsi/hptiop.txt b/Documentation/scsi/hptiop.txt index a6eb4ad..9605179 100644 --- a/Documentation/scsi/hptiop.txt +++ b/Documentation/scsi/hptiop.txt @@ -3,6 +3,25 @@ HIGHPOINT ROCKETRAID 3xxx/4xxx ADAPTER DRIVER (hptiop) Controller Register Map ------------------------- +For RR44xx Intel IOP based adapters, the controller IOP is accessed via PCI BAR0 and BAR2: + + BAR0 offset Register + 0x11C5C Link Interface IRQ Set + 0x11C60 Link Interface IRQ Clear + + BAR2 offset Register + 0x10 Inbound Message Register 0 + 0x14 Inbound Message Register 1 + 0x18 Outbound Message Register 0 + 0x1C Outbound Message Register 1 + 0x20 Inbound Doorbell Register + 0x24 Inbound Interrupt Status Register + 0x28 Inbound Interrupt Mask Register + 0x30 Outbound Interrupt Status Register + 0x34 Outbound Interrupt Mask Register + 0x40 Inbound Queue Port + 0x44 Outbound Queue Port + For Intel IOP based adapters, the controller IOP is accessed via PCI BAR0: BAR0 offset Register @@ -93,7 +112,7 @@ The driver exposes following sysfs attributes: ----------------------------------------------------------------------------- -Copyright (C) 2006-2007 HighPoint Technologies, Inc. All Rights Reserved. +Copyright (C) 2006-2009 HighPoint Technologies, Inc. All Rights Reserved. This file is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of -- cgit v1.1 From 211a641770c2ae191c9589fee69a8fb67f67fffd Mon Sep 17 00:00:00 2001 From: "Justin P. Mattock" Date: Tue, 29 Sep 2009 15:13:58 -0700 Subject: ieee1394: update URLs in debugging-via-ohci1394.txt Update URLs of the userspace tools to use ohci1394_dma=early for debugging. Seems the address ftp://ftp.suse.de/private/bk/firewire/tools/* is not very helpful. After a quick search, seems this was talked about: http://www.mail-archive.com/kgdb-bugreport@lists.sourceforge.net/msg02761.html (can't find the original thread). Signed-off-by: Justin P. Mattock Signed-off-by: Stefan Richter --- Documentation/debugging-via-ohci1394.txt | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/debugging-via-ohci1394.txt b/Documentation/debugging-via-ohci1394.txt index 59a91e5..611f5a5 100644 --- a/Documentation/debugging-via-ohci1394.txt +++ b/Documentation/debugging-via-ohci1394.txt @@ -64,14 +64,14 @@ be used to view the printk buffer of a remote machine, even with live update. Bernhard Kaindl enhanced firescope to support accessing 64-bit machines from 32-bit firescope and vice versa: -- ftp://ftp.suse.de/private/bk/firewire/tools/firescope-0.2.2.tar.bz2 +- http://halobates.de/firewire/firescope-0.2.2.tar.bz2 and he implemented fast system dump (alpha version - read README.txt): -- ftp://ftp.suse.de/private/bk/firewire/tools/firedump-0.1.tar.bz2 +- http://halobates.de/firewire/firedump-0.1.tar.bz2 There is also a gdb proxy for firewire which allows to use gdb to access data which can be referenced from symbols found by gdb in vmlinux: -- ftp://ftp.suse.de/private/bk/firewire/tools/fireproxy-0.33.tar.bz2 +- http://halobates.de/firewire/fireproxy-0.33.tar.bz2 The latest version of this gdb proxy (fireproxy-0.34) can communicate (not yet stable) with kgdb over an memory-based communication module (kgdbom). @@ -178,7 +178,7 @@ Step-by-step instructions for using firescope with early OHCI initialization: Notes ----- -Documentation and specifications: ftp://ftp.suse.de/private/bk/firewire/docs +Documentation and specifications: http://halobates.de/firewire/ FireWire is a trademark of Apple Inc. - for more information please refer to: http://en.wikipedia.org/wiki/FireWire -- cgit v1.1 From f58ee00f1547ceb17b610ecfce2aa9097f1f9737 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Sun, 4 Oct 2009 02:28:42 +0200 Subject: HWPOISON: Add brief hwpoison description to Documentation Signed-off-by: Andi Kleen --- Documentation/vm/hwpoison.txt | 136 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 136 insertions(+) create mode 100644 Documentation/vm/hwpoison.txt (limited to 'Documentation') diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt new file mode 100644 index 0000000..3ffadf8 --- /dev/null +++ b/Documentation/vm/hwpoison.txt @@ -0,0 +1,136 @@ +What is hwpoison? + +Upcoming Intel CPUs have support for recovering from some memory errors +(``MCA recovery''). This requires the OS to declare a page "poisoned", +kill the processes associated with it and avoid using it in the future. + +This patchkit implements the necessary infrastructure in the VM. + +To quote the overview comment: + + * High level machine check handler. Handles pages reported by the + * hardware as being corrupted usually due to a 2bit ECC memory or cache + * failure. + * + * This focusses on pages detected as corrupted in the background. + * When the current CPU tries to consume corruption the currently + * running process can just be killed directly instead. This implies + * that if the error cannot be handled for some reason it's safe to + * just ignore it because no corruption has been consumed yet. Instead + * when that happens another machine check will happen. + * + * Handles page cache pages in various states. The tricky part + * here is that we can access any page asynchronous to other VM + * users, because memory failures could happen anytime and anywhere, + * possibly violating some of their assumptions. This is why this code + * has to be extremely careful. Generally it tries to use normal locking + * rules, as in get the standard locks, even if that means the + * error handling takes potentially a long time. + * + * Some of the operations here are somewhat inefficient and have non + * linear algorithmic complexity, because the data structures have not + * been optimized for this case. This is in particular the case + * for the mapping from a vma to a process. Since this case is expected + * to be rare we hope we can get away with this. + +The code consists of a the high level handler in mm/memory-failure.c, +a new page poison bit and various checks in the VM to handle poisoned +pages. + +The main target right now is KVM guests, but it works for all kinds +of applications. KVM support requires a recent qemu-kvm release. + +For the KVM use there was need for a new signal type so that +KVM can inject the machine check into the guest with the proper +address. This in theory allows other applications to handle +memory failures too. The expection is that near all applications +won't do that, but some very specialized ones might. + +--- + +There are two (actually three) modi memory failure recovery can be in: + +vm.memory_failure_recovery sysctl set to zero: + All memory failures cause a panic. Do not attempt recovery. + (on x86 this can be also affected by the tolerant level of the + MCE subsystem) + +early kill + (can be controlled globally and per process) + Send SIGBUS to the application as soon as the error is detected + This allows applications who can process memory errors in a gentle + way (e.g. drop affected object) + This is the mode used by KVM qemu. + +late kill + Send SIGBUS when the application runs into the corrupted page. + This is best for memory error unaware applications and default + Note some pages are always handled as late kill. + +--- + +User control: + +vm.memory_failure_recovery + See sysctl.txt + +vm.memory_failure_early_kill + Enable early kill mode globally + +PR_MCE_KILL + Set early/late kill mode/revert to system default + arg1: PR_MCE_KILL_CLEAR: Revert to system default + arg1: PR_MCE_KILL_SET: arg2 defines thread specific mode + PR_MCE_KILL_EARLY: Early kill + PR_MCE_KILL_LATE: Late kill + PR_MCE_KILL_DEFAULT: Use system global default +PR_MCE_KILL_GET + return current mode + + +--- + +Testing: + +madvise(MADV_POISON, ....) + (as root) + Poison a page in the process for testing + + +hwpoison-inject module through debugfs + /sys/debug/hwpoison/corrupt-pfn + +Inject hwpoison fault at PFN echoed into this file + + +Architecture specific MCE injector + +x86 has mce-inject, mce-test + +Some portable hwpoison test programs in mce-test, see blow. + +--- + +References: + +http://halobates.de/mce-lc09-2.pdf + Overview presentation from LinuxCon 09 + +git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git + Test suite (hwpoison specific portable tests in tsrc) + +git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git + x86 specific injector + + +--- + +Limitations: + +- Not all page types are supported and never will. Most kernel internal +objects cannot be recovered, only LRU pages for now. +- Right now hugepage support is missing. + +--- +Andi Kleen, Oct 2009 + -- cgit v1.1 From 896a7cf8d846a9e86fb823be16f4f14ffeb7f074 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 2 Oct 2009 20:24:59 +0000 Subject: pktgen: Fix multiqueue handling It is not currently possible to instruct pktgen to use one selected tx queue. When Robert added multiqueue support in commit 45b270f8, he added an interval (queue_map_min, queue_map_max), and his code doesnt take into account the case of min = max, to select one tx queue exactly. I suspect a high performance setup on a eight txqueue device wants to use exactly eight cpus, and assign one tx queue to each sender. This patchs makes pktgen select the right tx queue, not the first one. Also updates Documentation to reflect Robert changes. Signed-off-by: Eric Dumazet Signed-off-by: Robert Olsson Signed-off-by: David S. Miller --- Documentation/networking/pktgen.txt | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'Documentation') diff --git a/Documentation/networking/pktgen.txt b/Documentation/networking/pktgen.txt index c6cf4a3..61bb645 100644 --- a/Documentation/networking/pktgen.txt +++ b/Documentation/networking/pktgen.txt @@ -90,6 +90,11 @@ Examples: pgset "dstmac 00:00:00:00:00:00" sets MAC destination address pgset "srcmac 00:00:00:00:00:00" sets MAC source address + pgset "queue_map_min 0" Sets the min value of tx queue interval + pgset "queue_map_max 7" Sets the max value of tx queue interval, for multiqueue devices + To select queue 1 of a given device, + use queue_map_min=1 and queue_map_max=1 + pgset "src_mac_count 1" Sets the number of MACs we'll range through. The 'minimum' MAC is what you set with srcmac. @@ -101,6 +106,9 @@ Examples: IPDST_RND, UDPSRC_RND, UDPDST_RND, MACSRC_RND, MACDST_RND MPLS_RND, VID_RND, SVID_RND + QUEUE_MAP_RND # queue map random + QUEUE_MAP_CPU # queue map mirrors smp_processor_id() + pgset "udp_src_min 9" set UDP source port min, If < udp_src_max, then cycle through the port range. -- cgit v1.1 From f1af9f58546e2d98ef078fa30b2ef80a9042131e Mon Sep 17 00:00:00 2001 From: Tilman Schmidt Date: Tue, 6 Oct 2009 12:18:00 +0000 Subject: Documentation: expand isdn/INTERFACE.CAPI document - Note that send_message() may be called in interrupt context. - Describe the storage of CAPI messages and payload data in SKBs. - Add more details to the description of the _cmsg structure. - Describe kernelcapi debugging output. Impact: documentation Signed-off-by: Tilman Schmidt Signed-off-by: David S. Miller --- Documentation/isdn/INTERFACE.CAPI | 83 +++++++++++++++++++++++++++++++-------- 1 file changed, 67 insertions(+), 16 deletions(-) (limited to 'Documentation') diff --git a/Documentation/isdn/INTERFACE.CAPI b/Documentation/isdn/INTERFACE.CAPI index 686e107..5fe8de5 100644 --- a/Documentation/isdn/INTERFACE.CAPI +++ b/Documentation/isdn/INTERFACE.CAPI @@ -60,10 +60,9 @@ open() operation on regular files or character devices. After a successful return from register_appl(), CAPI messages from the application may be passed to the driver for the device via calls to the -send_message() callback function. The CAPI message to send is stored in the -data portion of an skb. Conversely, the driver may call Kernel CAPI's -capi_ctr_handle_message() function to pass a received CAPI message to Kernel -CAPI for forwarding to an application, specifying its ApplID. +send_message() callback function. Conversely, the driver may call Kernel +CAPI's capi_ctr_handle_message() function to pass a received CAPI message to +Kernel CAPI for forwarding to an application, specifying its ApplID. Deregistration requests (CAPI operation CAPI_RELEASE) from applications are forwarded as calls to the release_appl() callback function, passing the same @@ -142,6 +141,7 @@ u16 (*send_message)(struct capi_ctr *ctrlr, struct sk_buff *skb) to accepting or queueing the message. Errors occurring during the actual processing of the message should be signaled with an appropriate reply message. + May be called in process or interrupt context. Calls to this function are not serialized by Kernel CAPI, ie. it must be prepared to be re-entered. @@ -154,7 +154,8 @@ read_proc_t *ctr_read_proc system entry, /proc/capi/controllers/; will be called with a pointer to the device's capi_ctr structure as the last (data) argument -Note: Callback functions are never called in interrupt context. +Note: Callback functions except send_message() are never called in interrupt +context. - to be filled in before calling capi_ctr_ready(): @@ -171,14 +172,40 @@ u8 serial[CAPI_SERIAL_LEN] value to return for CAPI_GET_SERIAL -4.3 The _cmsg Structure +4.3 SKBs + +CAPI messages are passed between Kernel CAPI and the driver via send_message() +and capi_ctr_handle_message(), stored in the data portion of a socket buffer +(skb). Each skb contains a single CAPI message coded according to the CAPI 2.0 +standard. + +For the data transfer messages, DATA_B3_REQ and DATA_B3_IND, the actual +payload data immediately follows the CAPI message itself within the same skb. +The Data and Data64 parameters are not used for processing. The Data64 +parameter may be omitted by setting the length field of the CAPI message to 22 +instead of 30. + + +4.4 The _cmsg Structure (declared in ) The _cmsg structure stores the contents of a CAPI 2.0 message in an easily -accessible form. It contains members for all possible CAPI 2.0 parameters, of -which only those appearing in the message type currently being processed are -actually used. Unused members should be set to zero. +accessible form. It contains members for all possible CAPI 2.0 parameters, +including subparameters of the Additional Info and B Protocol structured +parameters, with the following exceptions: + +* second Calling party number (CONNECT_IND) + +* Data64 (DATA_B3_REQ and DATA_B3_IND) + +* Sending complete (subparameter of Additional Info, CONNECT_REQ and INFO_REQ) + +* Global Configuration (subparameter of B Protocol, CONNECT_REQ, CONNECT_RESP + and SELECT_B_PROTOCOL_REQ) + +Only those parameters appearing in the message type currently being processed +are actually used. Unused members should be set to zero. Members are named after the CAPI 2.0 standard names of the parameters they represent. See for the exact spelling. Member data @@ -190,18 +217,19 @@ u16 for CAPI parameters of type 'word' u32 for CAPI parameters of type 'dword' -_cstruct for CAPI parameters of type 'struct' not containing any - variably-sized (struct) subparameters (eg. 'Called Party Number') +_cstruct for CAPI parameters of type 'struct' The member is a pointer to a buffer containing the parameter in CAPI encoding (length + content). It may also be NULL, which will be taken to represent an empty (zero length) parameter. + Subparameters are stored in encoded form within the content part. -_cmstruct for CAPI parameters of type 'struct' containing 'struct' - subparameters ('Additional Info' and 'B Protocol') +_cmstruct alternative representation for CAPI parameters of type 'struct' + (used only for the 'Additional Info' and 'B Protocol' parameters) The representation is a single byte containing one of the values: - CAPI_DEFAULT: the parameter is empty - CAPI_COMPOSE: the values of the subparameters are stored - individually in the corresponding _cmsg structure members + CAPI_DEFAULT: The parameter is empty/absent. + CAPI_COMPOSE: The parameter is present. + Subparameter values are stored individually in the corresponding + _cmsg structure members. Functions capi_cmsg2message() and capi_message2cmsg() are provided to convert messages between their transport encoding described in the CAPI 2.0 standard @@ -297,3 +325,26 @@ char *capi_cmd2str(u8 Command, u8 Subcommand) be NULL if the command/subcommand is not one of those defined in the CAPI 2.0 standard. + +7. Debugging + +The module kernelcapi has a module parameter showcapimsgs controlling some +debugging output produced by the module. It can only be set when the module is +loaded, via a parameter "showcapimsgs=" to the modprobe command, either on +the command line or in the configuration file. + +If the lowest bit of showcapimsgs is set, kernelcapi logs controller and +application up and down events. + +In addition, every registered CAPI controller has an associated traceflag +parameter controlling how CAPI messages sent from and to tha controller are +logged. The traceflag parameter is initialized with the value of the +showcapimsgs parameter when the controller is registered, but can later be +changed via the MANUFACTURER_REQ command KCAPI_CMD_TRACE. + +If the value of traceflag is non-zero, CAPI messages are logged. +DATA_B3 messages are only logged if the value of traceflag is > 2. + +If the lowest bit of traceflag is set, only the command/subcommand and message +length are logged. Otherwise, kernelcapi logs a readable representation of +the entire message. -- cgit v1.1 From aa07a99412f56ad56faecbaa683f3bc0ae99abc2 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Wed, 7 Oct 2009 15:35:55 -0700 Subject: IB: Fix typo in udev rule documentation The proper syntax for udev rules is KERNEL==... instead of KERNEL=... Signed-off-by: Bart Van Assche Reported-by: Lukasz Jurewicz Signed-off-by: Roland Dreier --- Documentation/infiniband/user_mad.txt | 4 ++-- Documentation/infiniband/user_verbs.txt | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/infiniband/user_mad.txt b/Documentation/infiniband/user_mad.txt index 744687d..8a36695 100644 --- a/Documentation/infiniband/user_mad.txt +++ b/Documentation/infiniband/user_mad.txt @@ -128,8 +128,8 @@ Setting IsSM Capability Bit To create the appropriate character device files automatically with udev, a rule like - KERNEL="umad*", NAME="infiniband/%k" - KERNEL="issm*", NAME="infiniband/%k" + KERNEL=="umad*", NAME="infiniband/%k" + KERNEL=="issm*", NAME="infiniband/%k" can be used. This will create device nodes named diff --git a/Documentation/infiniband/user_verbs.txt b/Documentation/infiniband/user_verbs.txt index f847501..afe3f8d 100644 --- a/Documentation/infiniband/user_verbs.txt +++ b/Documentation/infiniband/user_verbs.txt @@ -58,7 +58,7 @@ Memory pinning To create the appropriate character device files automatically with udev, a rule like - KERNEL="uverbs*", NAME="infiniband/%k" + KERNEL=="uverbs*", NAME="infiniband/%k" can be used. This will create device nodes named -- cgit v1.1 From c73602ad31cdcf7e6651f43d12f65b5b9b825b6f Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Wed, 7 Oct 2009 16:32:22 -0700 Subject: ksm: more on default values Adjust the max_kernel_pages default to a quarter of totalram_pages, instead of nr_free_buffer_pages() / 4: the KSM pages themselves come from highmem, and even on a 16GB PAE machine, 4GB of KSM pages would only be pinning 32MB of lowmem with their rmap_items, so no need for the more obscure calculation (nor for its own special init function). There is no way for the user to switch KSM on if CONFIG_SYSFS is not enabled, so in that case default run to KSM_RUN_MERGE. Update KSM Documentation and Kconfig to reflect the new defaults. Signed-off-by: Hugh Dickins Cc: Izik Eidus Cc: Andrea Arcangeli Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/ksm.txt | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/ksm.txt b/Documentation/vm/ksm.txt index 72a22f6..262d8e6 100644 --- a/Documentation/vm/ksm.txt +++ b/Documentation/vm/ksm.txt @@ -52,15 +52,15 @@ The KSM daemon is controlled by sysfs files in /sys/kernel/mm/ksm/, readable by all but writable only by root: max_kernel_pages - set to maximum number of kernel pages that KSM may use - e.g. "echo 2000 > /sys/kernel/mm/ksm/max_kernel_pages" + e.g. "echo 100000 > /sys/kernel/mm/ksm/max_kernel_pages" Value 0 imposes no limit on the kernel pages KSM may use; but note that any process using MADV_MERGEABLE can cause KSM to allocate these pages, unswappable until it exits. - Default: 2000 (chosen for demonstration purposes) + Default: quarter of memory (chosen to not pin too much) pages_to_scan - how many present pages to scan before ksmd goes to sleep - e.g. "echo 200 > /sys/kernel/mm/ksm/pages_to_scan" - Default: 200 (chosen for demonstration purposes) + e.g. "echo 100 > /sys/kernel/mm/ksm/pages_to_scan" + Default: 100 (chosen for demonstration purposes) sleep_millisecs - how many milliseconds ksmd should sleep before next scan e.g. "echo 20 > /sys/kernel/mm/ksm/sleep_millisecs" @@ -70,7 +70,8 @@ run - set 0 to stop ksmd from running but keep merged pages, set 1 to run ksmd e.g. "echo 1 > /sys/kernel/mm/ksm/run", set 2 to stop ksmd and unmerge all pages currently merged, but leave mergeable areas registered for next run - Default: 1 (for immediate use by apps which register) + Default: 0 (must be changed to 1 to activate KSM, + except if CONFIG_SYSFS is disabled) The effectiveness of KSM and MADV_MERGEABLE is shown in /sys/kernel/mm/ksm/: @@ -86,4 +87,4 @@ pages_volatile embraces several different kinds of activity, but a high proportion there would also indicate poor use of madvise MADV_MERGEABLE. Izik Eidus, -Hugh Dickins, 30 July 2009 +Hugh Dickins, 24 Sept 2009 -- cgit v1.1 From 7823da36ce8e42d66941887eb922768d259763f2 Mon Sep 17 00:00:00 2001 From: Paul Menage Date: Wed, 7 Oct 2009 16:32:26 -0700 Subject: cgroups: update documentation of cgroups tasks and procs files Update documentation of cgroups tasks and procs files Document the cgroup.procs file. Clarify the semantics of the cgroup.procs and tasks files. Although the current cgroup.procs interface returns a sorted and uniqified list of pids, potential future performance enhancements could result in those properties being removed - explicitly document this aspect of the API. There are no existing users of cgroup.procs, so compatibility isn't an issue. There are users of the "tasks" file, but none that would appear to break in the event of the sorted property being broken. The standard "libcpuset" explicitly sorts the results of reading from the tasks file, and "libcg" and other users don't appear to care about ordering. Signed-off-by: Paul Menage Reviewed-by: Li Zefan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cgroups/cgroups.txt | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 455d4e6..0b33bfe 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -227,7 +227,14 @@ as the path relative to the root of the cgroup file system. Each cgroup is represented by a directory in the cgroup file system containing the following files describing that cgroup: - - tasks: list of tasks (by pid) attached to that cgroup + - tasks: list of tasks (by pid) attached to that cgroup. This list + is not guaranteed to be sorted. Writing a thread id into this file + moves the thread into this cgroup. + - cgroup.procs: list of tgids in the cgroup. This list is not + guaranteed to be sorted or free of duplicate tgids, and userspace + should sort/uniquify the list if this property is required. + Writing a tgid into this file moves all threads with that tgid into + this cgroup. - notify_on_release flag: run the release agent on exit? - release_agent: the path to use for release notifications (this file exists in the top cgroup only) @@ -374,7 +381,7 @@ Now you want to do something with this cgroup. In this directory you can find several files: # ls -notify_on_release tasks +cgroup.procs notify_on_release tasks (plus whatever files added by the attached subsystems) Now attach your shell to this cgroup: -- cgit v1.1 From 253fb02d62571e5455eedc9e39b9d660e86a40f0 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 7 Oct 2009 16:32:27 -0700 Subject: pagemap: export KPF_HWPOISON This flag indicates a hardware detected memory corruption on the page. Any future access of the page data may bring down the machine. Signed-off-by: Wu Fengguang Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 2 ++ Documentation/vm/pagemap.txt | 4 ++++ 2 files changed, 6 insertions(+) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index fa1a30d..87f5722 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -69,6 +69,7 @@ #define KPF_COMPOUND_TAIL 16 #define KPF_HUGE 17 #define KPF_UNEVICTABLE 18 +#define KPF_HWPOISON 19 #define KPF_NOPAGE 20 /* [32-] kernel hacking assistances */ @@ -116,6 +117,7 @@ static char *page_flag_names[] = { [KPF_COMPOUND_TAIL] = "T:compound_tail", [KPF_HUGE] = "G:huge", [KPF_UNEVICTABLE] = "u:unevictable", + [KPF_HWPOISON] = "X:hwpoison", [KPF_NOPAGE] = "n:nopage", [KPF_RESERVED] = "r:reserved", diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt index 600a304..2fdd84a 100644 --- a/Documentation/vm/pagemap.txt +++ b/Documentation/vm/pagemap.txt @@ -57,6 +57,7 @@ There are three components to pagemap: 16. COMPOUND_TAIL 16. HUGE 18. UNEVICTABLE + 19. HWPOISON 20. NOPAGE Short descriptions to the page flags: @@ -86,6 +87,9 @@ Short descriptions to the page flags: 17. HUGE this is an integral part of a HugeTLB page +19. HWPOISON + hardware detected memory corruption on this page: don't touch the data! + 20. NOPAGE no page frame exists at the requested address -- cgit v1.1 From a1bbb5ec39042fb762e7f4bcc634da0d87834193 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 7 Oct 2009 16:32:28 -0700 Subject: pagemap: document KPF_KSM and show it in page-types It indicates to the system admin that processes mapping such pages may be eating less physical memory than the reported numbers by legacy tools. Signed-off-by: Wu Fengguang Cc: Hugh Dickins Cc: Izik Eidus Acked-by: Chris Wright Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 2 ++ Documentation/vm/pagemap.txt | 4 ++++ 2 files changed, 6 insertions(+) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index 87f5722..9899fa2 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -71,6 +71,7 @@ #define KPF_UNEVICTABLE 18 #define KPF_HWPOISON 19 #define KPF_NOPAGE 20 +#define KPF_KSM 21 /* [32-] kernel hacking assistances */ #define KPF_RESERVED 32 @@ -119,6 +120,7 @@ static char *page_flag_names[] = { [KPF_UNEVICTABLE] = "u:unevictable", [KPF_HWPOISON] = "X:hwpoison", [KPF_NOPAGE] = "n:nopage", + [KPF_KSM] = "x:ksm", [KPF_RESERVED] = "r:reserved", [KPF_MLOCKED] = "m:mlocked", diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt index 2fdd84a..df09b96 100644 --- a/Documentation/vm/pagemap.txt +++ b/Documentation/vm/pagemap.txt @@ -59,6 +59,7 @@ There are three components to pagemap: 18. UNEVICTABLE 19. HWPOISON 20. NOPAGE + 21. KSM Short descriptions to the page flags: @@ -93,6 +94,9 @@ Short descriptions to the page flags: 20. NOPAGE no page frame exists at the requested address +21. KSM + identical memory pages dynamically shared between one or more processes + [IO related page flags] 1. ERROR IO error occurred 3. UPTODATE page has up-to-date data -- cgit v1.1 From 0c57effe27eb6544eb44d5fac563b7334e3bc771 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 7 Oct 2009 16:32:28 -0700 Subject: page-types: add GPL note Signed-off-by: Wu Fengguang Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index 9899fa2..e46fb5b 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -2,7 +2,10 @@ * page-types: Tool for querying page flags * * Copyright (C) 2009 Intel corporation - * Copyright (C) 2009 Wu Fengguang + * + * Authors: Wu Fengguang + * + * Released under the General Public License (GPL). */ #define _LARGEFILE64_SOURCE -- cgit v1.1 From 31bbf66eaaaf50ba79e50ab7d3c89531b31c0614 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 7 Oct 2009 16:32:29 -0700 Subject: page-types: introduce checked_open() This helps merge duplicate code (now and future) and outstand the main logic. Signed-off-by: Wu Fengguang Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index e46fb5b..6bdcc06 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -214,6 +214,18 @@ static void fatal(const char *x, ...) exit(EXIT_FAILURE); } +int checked_open(const char *pathname, int flags) +{ + int fd = open(pathname, flags); + + if (fd < 0) { + perror(pathname); + exit(EXIT_FAILURE); + } + + return fd; +} + /* * page flag names @@ -534,11 +546,7 @@ static void walk_addr_ranges(void) { int i; - kpageflags_fd = open(PROC_KPAGEFLAGS, O_RDONLY); - if (kpageflags_fd < 0) { - perror(PROC_KPAGEFLAGS); - exit(EXIT_FAILURE); - } + kpageflags_fd = checked_open(PROC_KPAGEFLAGS, O_RDONLY); if (!nr_addr_ranges) add_addr_range(0, ULONG_MAX); @@ -631,11 +639,7 @@ static void parse_pid(const char *str) opt_pid = parse_number(str); sprintf(buf, "/proc/%d/pagemap", opt_pid); - pagemap_fd = open(buf, O_RDONLY); - if (pagemap_fd < 0) { - perror(buf); - exit(EXIT_FAILURE); - } + pagemap_fd = checked_open(buf, O_RDONLY); sprintf(buf, "/proc/%d/maps", opt_pid); file = fopen(buf, "r"); -- cgit v1.1 From 4a1b6726feee3132905996ae4cd1c7dc2104e721 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 7 Oct 2009 16:32:30 -0700 Subject: page-types: make standalone pagemap/kpageflags read routines Refactor the code to be more modular and easier to reuse. Signed-off-by: Wu Fengguang Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 154 ++++++++++++++++++++++++------------------ 1 file changed, 89 insertions(+), 65 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index 6bdcc06..18e891a 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -161,8 +161,6 @@ static unsigned long pg_start[MAX_VMAS]; static unsigned long pg_end[MAX_VMAS]; static unsigned long voffset; -static int pagemap_fd; - #define MAX_BIT_FILTERS 64 static int nr_bit_filters; static uint64_t opt_mask[MAX_BIT_FILTERS]; @@ -170,7 +168,7 @@ static uint64_t opt_bits[MAX_BIT_FILTERS]; static int page_size; -#define PAGES_BATCH (64 << 10) /* 64k pages */ +static int pagemap_fd; static int kpageflags_fd; #define HASH_SHIFT 13 @@ -226,6 +224,62 @@ int checked_open(const char *pathname, int flags) return fd; } +/* + * pagemap/kpageflags routines + */ + +static unsigned long do_u64_read(int fd, char *name, + uint64_t *buf, + unsigned long index, + unsigned long count) +{ + long bytes; + + if (index > ULONG_MAX / 8) + fatal("index overflow: %lu\n", index); + + if (lseek(fd, index * 8, SEEK_SET) < 0) { + perror(name); + exit(EXIT_FAILURE); + } + + bytes = read(fd, buf, count * 8); + if (bytes < 0) { + perror(name); + exit(EXIT_FAILURE); + } + if (bytes % 8) + fatal("partial read: %lu bytes\n", bytes); + + return bytes / 8; +} + +static unsigned long kpageflags_read(uint64_t *buf, + unsigned long index, + unsigned long pages) +{ + return do_u64_read(kpageflags_fd, PROC_KPAGEFLAGS, buf, index, pages); +} + +static unsigned long pagemap_read(uint64_t *buf, + unsigned long index, + unsigned long pages) +{ + return do_u64_read(pagemap_fd, "/proc/pid/pagemap", buf, index, pages); +} + +static unsigned long pagemap_pfn(uint64_t val) +{ + unsigned long pfn; + + if (val & PM_PRESENT) + pfn = PM_PFRAME(val); + else + pfn = 0; + + return pfn; +} + /* * page flag names @@ -432,79 +486,53 @@ static void add_page(unsigned long offset, uint64_t flags) total_pages++; } +#define KPAGEFLAGS_BATCH (64 << 10) /* 64k pages */ static void walk_pfn(unsigned long index, unsigned long count) { + uint64_t buf[KPAGEFLAGS_BATCH]; unsigned long batch; - unsigned long n; + unsigned long pages; unsigned long i; - if (index > ULONG_MAX / KPF_BYTES) - fatal("index overflow: %lu\n", index); - - lseek(kpageflags_fd, index * KPF_BYTES, SEEK_SET); - while (count) { - uint64_t kpageflags_buf[KPF_BYTES * PAGES_BATCH]; - - batch = min_t(unsigned long, count, PAGES_BATCH); - n = read(kpageflags_fd, kpageflags_buf, batch * KPF_BYTES); - if (n == 0) + batch = min_t(unsigned long, count, KPAGEFLAGS_BATCH); + pages = kpageflags_read(buf, index, batch); + if (pages == 0) break; - if (n < 0) { - perror(PROC_KPAGEFLAGS); - exit(EXIT_FAILURE); - } - - if (n % KPF_BYTES != 0) - fatal("partial read: %lu bytes\n", n); - n = n / KPF_BYTES; - for (i = 0; i < n; i++) - add_page(index + i, kpageflags_buf[i]); + for (i = 0; i < pages; i++) + add_page(index + i, buf[i]); - index += batch; - count -= batch; + index += pages; + count -= pages; } } - -#define PAGEMAP_BATCH 4096 -static unsigned long task_pfn(unsigned long pgoff) +#define PAGEMAP_BATCH (64 << 10) +static void walk_vma(unsigned long index, unsigned long count) { - static uint64_t buf[PAGEMAP_BATCH]; - static unsigned long start; - static long count; - uint64_t pfn; + uint64_t buf[PAGEMAP_BATCH]; + unsigned long batch; + unsigned long pages; + unsigned long pfn; + unsigned long i; - if (pgoff < start || pgoff >= start + count) { - if (lseek64(pagemap_fd, - (uint64_t)pgoff * PM_ENTRY_BYTES, - SEEK_SET) < 0) { - perror("pagemap seek"); - exit(EXIT_FAILURE); - } - count = read(pagemap_fd, buf, sizeof(buf)); - if (count == 0) - return 0; - if (count < 0) { - perror("pagemap read"); - exit(EXIT_FAILURE); - } - if (count % PM_ENTRY_BYTES) { - fatal("pagemap read not aligned.\n"); - exit(EXIT_FAILURE); - } - count /= PM_ENTRY_BYTES; - start = pgoff; - } + while (count) { + batch = min_t(unsigned long, count, PAGEMAP_BATCH); + pages = pagemap_read(buf, index, batch); + if (pages == 0) + break; - pfn = buf[pgoff - start]; - if (pfn & PM_PRESENT) - pfn = PM_PFRAME(pfn); - else - pfn = 0; + for (i = 0; i < pages; i++) { + pfn = pagemap_pfn(buf[i]); + voffset = index + i; + if (pfn) + walk_pfn(pfn, 1); + } - return pfn; + index += pages; + count -= pages; + } } static void walk_task(unsigned long index, unsigned long count) @@ -524,11 +552,7 @@ static void walk_task(unsigned long index, unsigned long count) index = min_t(unsigned long, pg_end[i], end); assert(voffset < index); - for (; voffset < index; voffset++) { - unsigned long pfn = task_pfn(voffset); - if (pfn) - walk_pfn(pfn, 1); - } + walk_vma(voffset, index - voffset); } } -- cgit v1.1 From e577ebde9fb161bdc87511763c75dfad291059e2 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 7 Oct 2009 16:32:30 -0700 Subject: page-types: make voffset local variables Signed-off-by: Wu Fengguang Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 39 +++++++++++++++++++++------------------ 1 file changed, 21 insertions(+), 18 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index 18e891a..56f33b6 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -159,7 +159,6 @@ static unsigned long opt_size[MAX_ADDR_RANGES]; static int nr_vmas; static unsigned long pg_start[MAX_VMAS]; static unsigned long pg_end[MAX_VMAS]; -static unsigned long voffset; #define MAX_BIT_FILTERS 64 static int nr_bit_filters; @@ -328,7 +327,8 @@ static char *page_flag_longname(uint64_t flags) * page list and summary */ -static void show_page_range(unsigned long offset, uint64_t flags) +static void show_page_range(unsigned long voffset, + unsigned long offset, uint64_t flags) { static uint64_t flags0; static unsigned long voff; @@ -354,7 +354,8 @@ static void show_page_range(unsigned long offset, uint64_t flags) count = 1; } -static void show_page(unsigned long offset, uint64_t flags) +static void show_page(unsigned long voffset, + unsigned long offset, uint64_t flags) { if (opt_pid) printf("%lx\t", voffset); @@ -435,7 +436,6 @@ static uint64_t well_known_flags(uint64_t flags) return flags; } - /* * page frame walker */ @@ -467,7 +467,8 @@ static int hash_slot(uint64_t flags) exit(EXIT_FAILURE); } -static void add_page(unsigned long offset, uint64_t flags) +static void add_page(unsigned long voffset, + unsigned long offset, uint64_t flags) { flags = expand_overloaded_flags(flags); @@ -478,16 +479,18 @@ static void add_page(unsigned long offset, uint64_t flags) return; if (opt_list == 1) - show_page_range(offset, flags); + show_page_range(voffset, offset, flags); else if (opt_list == 2) - show_page(offset, flags); + show_page(voffset, offset, flags); nr_pages[hash_slot(flags)]++; total_pages++; } #define KPAGEFLAGS_BATCH (64 << 10) /* 64k pages */ -static void walk_pfn(unsigned long index, unsigned long count) +static void walk_pfn(unsigned long voffset, + unsigned long index, + unsigned long count) { uint64_t buf[KPAGEFLAGS_BATCH]; unsigned long batch; @@ -501,7 +504,7 @@ static void walk_pfn(unsigned long index, unsigned long count) break; for (i = 0; i < pages; i++) - add_page(index + i, buf[i]); + add_page(voffset + i, index + i, buf[i]); index += pages; count -= pages; @@ -525,9 +528,8 @@ static void walk_vma(unsigned long index, unsigned long count) for (i = 0; i < pages; i++) { pfn = pagemap_pfn(buf[i]); - voffset = index + i; if (pfn) - walk_pfn(pfn, 1); + walk_pfn(index + i, pfn, 1); } index += pages; @@ -537,8 +539,9 @@ static void walk_vma(unsigned long index, unsigned long count) static void walk_task(unsigned long index, unsigned long count) { - int i = 0; const unsigned long end = index + count; + unsigned long start; + int i = 0; while (index < end) { @@ -548,11 +551,11 @@ static void walk_task(unsigned long index, unsigned long count) if (pg_start[i] >= end) return; - voffset = max_t(unsigned long, pg_start[i], index); - index = min_t(unsigned long, pg_end[i], end); + start = max_t(unsigned long, pg_start[i], index); + index = min_t(unsigned long, pg_end[i], end); - assert(voffset < index); - walk_vma(voffset, index - voffset); + assert(start < index); + walk_vma(start, index - start); } } @@ -577,7 +580,7 @@ static void walk_addr_ranges(void) for (i = 0; i < nr_addr_ranges; i++) if (!opt_pid) - walk_pfn(opt_offset[i], opt_size[i]); + walk_pfn(0, opt_offset[i], opt_size[i]); else walk_task(opt_offset[i], opt_size[i]); @@ -879,7 +882,7 @@ int main(int argc, char *argv[]) walk_addr_ranges(); if (opt_list == 1) - show_page_range(0, 0); /* drain the buffer */ + show_page_range(0, 0, 0); /* drain the buffer */ if (opt_no_summary) return 0; -- cgit v1.1 From 48640d69f5f06fe951a4de065a7f374cb9ee6040 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 7 Oct 2009 16:32:31 -0700 Subject: page-types: introduce kpageflags_flags() Signed-off-by: Wu Fengguang Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index 56f33b6..11b9a03 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -436,6 +436,16 @@ static uint64_t well_known_flags(uint64_t flags) return flags; } +static uint64_t kpageflags_flags(uint64_t flags) +{ + flags = expand_overloaded_flags(flags); + + if (!opt_raw) + flags = well_known_flags(flags); + + return flags; +} + /* * page frame walker */ @@ -470,10 +480,7 @@ static int hash_slot(uint64_t flags) static void add_page(unsigned long voffset, unsigned long offset, uint64_t flags) { - flags = expand_overloaded_flags(flags); - - if (!opt_raw) - flags = well_known_flags(flags); + flags = kpageflags_flags(flags); if (!bit_mask_ok(flags)) return; -- cgit v1.1 From a54fed9f70a2765f4476e1ce3d691a2f31df258f Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 7 Oct 2009 16:32:32 -0700 Subject: page-types: add hwpoison/unpoison feature For hwpoison stress testing. The debugfs mount point is assumed to be /debug/. Signed-off-by: Wu Fengguang Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 73 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 72 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index 11b9a03..3ec4f2a 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -170,6 +170,13 @@ static int page_size; static int pagemap_fd; static int kpageflags_fd; +static int opt_hwpoison; +static int opt_unpoison; + +static char *hwpoison_debug_fs = "/debug/hwpoison"; +static int hwpoison_inject_fd; +static int hwpoison_forget_fd; + #define HASH_SHIFT 13 #define HASH_SIZE (1 << HASH_SHIFT) #define HASH_MASK (HASH_SIZE - 1) @@ -447,6 +454,53 @@ static uint64_t kpageflags_flags(uint64_t flags) } /* + * page actions + */ + +static void prepare_hwpoison_fd(void) +{ + char buf[100]; + + if (opt_hwpoison && !hwpoison_inject_fd) { + sprintf(buf, "%s/corrupt-pfn", hwpoison_debug_fs); + hwpoison_inject_fd = checked_open(buf, O_WRONLY); + } + + if (opt_unpoison && !hwpoison_forget_fd) { + sprintf(buf, "%s/renew-pfn", hwpoison_debug_fs); + hwpoison_forget_fd = checked_open(buf, O_WRONLY); + } +} + +static int hwpoison_page(unsigned long offset) +{ + char buf[100]; + int len; + + len = sprintf(buf, "0x%lx\n", offset); + len = write(hwpoison_inject_fd, buf, len); + if (len < 0) { + perror("hwpoison inject"); + return len; + } + return 0; +} + +static int unpoison_page(unsigned long offset) +{ + char buf[100]; + int len; + + len = sprintf(buf, "0x%lx\n", offset); + len = write(hwpoison_forget_fd, buf, len); + if (len < 0) { + perror("hwpoison forget"); + return len; + } + return 0; +} + +/* * page frame walker */ @@ -485,6 +539,11 @@ static void add_page(unsigned long voffset, if (!bit_mask_ok(flags)) return; + if (opt_hwpoison) + hwpoison_page(offset); + if (opt_unpoison) + unpoison_page(offset); + if (opt_list == 1) show_page_range(voffset, offset, flags); else if (opt_list == 2) @@ -624,6 +683,8 @@ static void usage(void) " -l|--list Show page details in ranges\n" " -L|--list-each Show page details one by one\n" " -N|--no-summary Don't show summay info\n" +" -X|--hwpoison hwpoison pages\n" +" -x|--unpoison unpoison pages\n" " -h|--help Show this usage message\n" "addr-spec:\n" " N one page at offset N (unit: pages)\n" @@ -833,6 +894,8 @@ static struct option opts[] = { { "list" , 0, NULL, 'l' }, { "list-each" , 0, NULL, 'L' }, { "no-summary", 0, NULL, 'N' }, + { "hwpoison" , 0, NULL, 'X' }, + { "unpoison" , 0, NULL, 'x' }, { "help" , 0, NULL, 'h' }, { NULL , 0, NULL, 0 } }; @@ -844,7 +907,7 @@ int main(int argc, char *argv[]) page_size = getpagesize(); while ((c = getopt_long(argc, argv, - "rp:f:a:b:lLNh", opts, NULL)) != -1) { + "rp:f:a:b:lLNXxh", opts, NULL)) != -1) { switch (c) { case 'r': opt_raw = 1; @@ -870,6 +933,14 @@ int main(int argc, char *argv[]) case 'N': opt_no_summary = 1; break; + case 'X': + opt_hwpoison = 1; + prepare_hwpoison_fd(); + break; + case 'x': + opt_unpoison = 1; + prepare_hwpoison_fd(); + break; case 'h': usage(); exit(0); -- cgit v1.1 From d0153ca35d344d9b640dc305031b0703ba3f30f0 Mon Sep 17 00:00:00 2001 From: Alok Kataria Date: Tue, 29 Sep 2009 10:25:24 -0700 Subject: x86, vmi: Mark VMI deprecated and schedule it for removal Add text in feature-removal.txt indicating that VMI will be removed in the 2.6.37 timeframe. Signed-off-by: Alok N Kataria Acked-by: Chris Wright LKML-Reference: <1254193238.13456.48.camel@ank32.eng.vmware.com> [ removed a bogus Kconfig change, marked (DEPRECATED) in Kconfig ] Signed-off-by: H. Peter Anvin Signed-off-by: Ingo Molnar --- Documentation/feature-removal-schedule.txt | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 89a47b5..04e6c81 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -451,3 +451,33 @@ Why: OSS sound_core grabs all legacy minors (0-255) of SOUND_MAJOR will also allow making ALSA OSS emulation independent of sound_core. The dependency will be broken then too. Who: Tejun Heo + +---------------------------- + +What: Support for VMware's guest paravirtuliazation technique [VMI] will be + dropped. +When: 2.6.37 or earlier. +Why: With the recent innovations in CPU hardware acceleration technologies + from Intel and AMD, VMware ran a few experiments to compare these + techniques to guest paravirtualization technique on VMware's platform. + These hardware assisted virtualization techniques have outperformed the + performance benefits provided by VMI in most of the workloads. VMware + expects that these hardware features will be ubiquitous in a couple of + years, as a result, VMware has started a phased retirement of this + feature from the hypervisor. We will be removing this feature from the + Kernel too. Right now we are targeting 2.6.37 but can retire earlier if + technical reasons (read opportunity to remove major chunk of pvops) + arise. + + Please note that VMI has always been an optimization and non-VMI kernels + still work fine on VMware's platform. + Latest versions of VMware's product which support VMI are, + Workstation 7.0 and VSphere 4.0 on ESX side, future maintainence + releases for these products will continue supporting VMI. + + For more details about VMI retirement take a look at this, + http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html + +Who: Alok N Kataria + +---------------------------- -- cgit v1.1 From 6dbce5218298c0c77a740b3ffe97cb8c304e1710 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Thu, 17 Sep 2009 17:37:12 +0200 Subject: ext3: Update documentation about ext3 quota mount options Signed-off-by: Jan Kara --- Documentation/filesystems/ext3.txt | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt index 570f9bd..05d5cf1 100644 --- a/Documentation/filesystems/ext3.txt +++ b/Documentation/filesystems/ext3.txt @@ -123,10 +123,18 @@ resuid=n The user ID which may use the reserved blocks. sb=n Use alternate superblock at this location. -quota -noquota -grpquota -usrquota +quota These options are ignored by the filesystem. They +noquota are used only by quota tools to recognize volumes +grpquota where quota should be turned on. See documentation +usrquota in the quota-tools package for more details + (http://sourceforge.net/projects/linuxquota). + +jqfmt= These options tell filesystem details about quota +usrjquota= so that quota information can be properly updated +grpjquota= during journal replay. They replace the above + quota options. See documentation in the quota-tools + package for more details + (http://sourceforge.net/projects/linuxquota). bh (*) ext3 associates buffer heads to data pages to nobh (a) cache disk block mapping information -- cgit v1.1 From 54930531a00af5a1c33361a02e67dd1802110465 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Sun, 11 Oct 2009 17:38:29 +0200 Subject: ALSA: hda - Fix mute sound with STAC9227/9228 codecs On FSC laptops, the sound gets muted gradually when the volume is chnaged. This is due to the wrong volume-knob widget setup. The delta bit (bit 7) shouldn't be set for these devices. This patch adds a new quirk to set the value 0x7f to the widget 0x24 instead of 0xff. Reference: Novell bnc#546006 http://bugzilla.novell.com/show_bug.cgi?id=546006 Signed-off-by: Takashi Iwai --- Documentation/sound/alsa/HD-Audio-Models.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt index a2643cf..4bf953b 100644 --- a/Documentation/sound/alsa/HD-Audio-Models.txt +++ b/Documentation/sound/alsa/HD-Audio-Models.txt @@ -359,6 +359,7 @@ STAC9227/9228/9229/927x 5stack-no-fp D965 5stack without front panel dell-3stack Dell Dimension E520 dell-bios Fixes with Dell BIOS setup + volknob Fixes with volume-knob widget 0x24 auto BIOS setup (default) STAC92HD71B* -- cgit v1.1 From 99b830aa553668a2c433e4cbff130224a75c74bb Mon Sep 17 00:00:00 2001 From: David Vrabel Date: Mon, 12 Oct 2009 15:45:13 +0000 Subject: USB: rename Documentation/ABI/.../sysfs-class-usb_host The usb_host class is no more. Rename its documentation file (which only contained WUSB specific files) to .../sysfs-class-uwb_rc-wusbhc. Signed-off-by: David Vrabel Signed-off-by: Greg Kroah-Hartman --- Documentation/ABI/testing/sysfs-class-usb_host | 25 ---------------------- .../ABI/testing/sysfs-class-uwb_rc-wusbhc | 25 ++++++++++++++++++++++ 2 files changed, 25 insertions(+), 25 deletions(-) delete mode 100644 Documentation/ABI/testing/sysfs-class-usb_host create mode 100644 Documentation/ABI/testing/sysfs-class-uwb_rc-wusbhc (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-class-usb_host b/Documentation/ABI/testing/sysfs-class-usb_host deleted file mode 100644 index 46b66ad1..0000000 --- a/Documentation/ABI/testing/sysfs-class-usb_host +++ /dev/null @@ -1,25 +0,0 @@ -What: /sys/class/usb_host/usb_hostN/wusb_chid -Date: July 2008 -KernelVersion: 2.6.27 -Contact: David Vrabel -Description: - Write the CHID (16 space-separated hex octets) for this host controller. - This starts the host controller, allowing it to accept connection from - WUSB devices. - - Set an all zero CHID to stop the host controller. - -What: /sys/class/usb_host/usb_hostN/wusb_trust_timeout -Date: July 2008 -KernelVersion: 2.6.27 -Contact: David Vrabel -Description: - Devices that haven't sent a WUSB packet to the host - within 'wusb_trust_timeout' ms are considered to have - disconnected and are removed. The default value of - 4000 ms is the value required by the WUSB - specification. - - Since this relates to security (specifically, the - lifetime of PTKs and GTKs) it should not be changed - from the default. diff --git a/Documentation/ABI/testing/sysfs-class-uwb_rc-wusbhc b/Documentation/ABI/testing/sysfs-class-uwb_rc-wusbhc new file mode 100644 index 0000000..4e8106f --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-uwb_rc-wusbhc @@ -0,0 +1,25 @@ +What: /sys/class/uwb_rc/uwbN/wusbhc/wusb_chid +Date: July 2008 +KernelVersion: 2.6.27 +Contact: David Vrabel +Description: + Write the CHID (16 space-separated hex octets) for this host controller. + This starts the host controller, allowing it to accept connection from + WUSB devices. + + Set an all zero CHID to stop the host controller. + +What: /sys/class/uwb_rc/uwbN/wusbhc/wusb_trust_timeout +Date: July 2008 +KernelVersion: 2.6.27 +Contact: David Vrabel +Description: + Devices that haven't sent a WUSB packet to the host + within 'wusb_trust_timeout' ms are considered to have + disconnected and are removed. The default value of + 4000 ms is the value required by the WUSB + specification. + + Since this relates to security (specifically, the + lifetime of PTKs and GTKs) it should not be changed + from the default. -- cgit v1.1 From 1243ba98e3abecd12e9e90dd390801b95ef070f2 Mon Sep 17 00:00:00 2001 From: Jonathan Corbet Date: Wed, 14 Oct 2009 12:43:22 -0600 Subject: Update flex_arrays.txt The 2.6.32 merge window brought a number of changes to the flexible array API; this patch updates the documentation to match the new state of affairs. Acked-by: David Rientjes Signed-off-by: Jonathan Corbet --- Documentation/flexible-arrays.txt | 43 +++++++++++++++++++++++++++++---------- 1 file changed, 32 insertions(+), 11 deletions(-) (limited to 'Documentation') diff --git a/Documentation/flexible-arrays.txt b/Documentation/flexible-arrays.txt index 84eb268..cb8a3a0 100644 --- a/Documentation/flexible-arrays.txt +++ b/Documentation/flexible-arrays.txt @@ -1,5 +1,5 @@ Using flexible arrays in the kernel -Last updated for 2.6.31 +Last updated for 2.6.32 Jonathan Corbet Large contiguous memory allocations can be unreliable in the Linux kernel. @@ -40,6 +40,13 @@ argument is passed directly to the internal memory allocation calls. With the current code, using flags to ask for high memory is likely to lead to notably unpleasant side effects. +It is also possible to define flexible arrays at compile time with: + + DEFINE_FLEX_ARRAY(name, element_size, total); + +This macro will result in a definition of an array with the given name; the +element size and total will be checked for validity at compile time. + Storing data into a flexible array is accomplished with a call to: int flex_array_put(struct flex_array *array, unsigned int element_nr, @@ -76,16 +83,30 @@ particular element has never been allocated. Note that it is possible to get back a valid pointer for an element which has never been stored in the array. Memory for array elements is allocated one page at a time; a single allocation could provide memory for several -adjacent elements. The flexible array code does not know if a specific -element has been written; it only knows if the associated memory is -present. So a flex_array_get() call on an element which was never stored -in the array has the potential to return a pointer to random data. If the -caller does not have a separate way to know which elements were actually -stored, it might be wise, at least, to add GFP_ZERO to the flags argument -to ensure that all elements are zeroed. - -There is no way to remove a single element from the array. It is possible, -though, to remove all elements with a call to: +adjacent elements. Flexible array elements are normally initialized to the +value FLEX_ARRAY_FREE (defined as 0x6c in ), so errors +involving that number probably result from use of unstored array entries. +Note that, if array elements are allocated with __GFP_ZERO, they will be +initialized to zero and this poisoning will not happen. + +Individual elements in the array can be cleared with: + + int flex_array_clear(struct flex_array *array, unsigned int element_nr); + +This function will set the given element to FLEX_ARRAY_FREE and return +zero. If storage for the indicated element is not allocated for the array, +flex_array_clear() will return -EINVAL instead. Note that clearing an +element does not release the storage associated with it; to reduce the +allocated size of an array, call: + + int flex_array_shrink(struct flex_array *array); + +The return value will be the number of pages of memory actually freed. +This function works by scanning the array for pages containing nothing but +FLEX_ARRAY_FREE bytes, so (1) it can be expensive, and (2) it will not work +if the array's pages are allocated with __GFP_ZERO. + +It is possible to remove all elements of an array with a call to: void flex_array_free_parts(struct flex_array *array); -- cgit v1.1 From cdc321ff0af78e818c97d4787f62bf52bdf9db2a Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Mon, 29 Jun 2009 11:13:30 -0400 Subject: inotify: deprecate the inotify kernel interface In 2.6.33 there will be no users of the inotify interface. Mark it for removal as fsnotify is more generic and is easier to use. Signed-off-by: Eric Paris --- Documentation/feature-removal-schedule.txt | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 04e6c81..bc693ff 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -418,6 +418,14 @@ When: 2.6.33 Why: Should be implemented in userspace, policy daemon. Who: Johannes Berg +--------------------------- + +What: CONFIG_INOTIFY +When: 2.6.33 +Why: last user (audit) will be converted to the newer more generic + and more easily maintained fsnotify subsystem +Who: Eric Paris + ---------------------------- What: lock_policy_rwsem_* and unlock_policy_rwsem_* will not be -- cgit v1.1 From e95646c3ec33c8ec0693992da4332a6b32eb7e31 Mon Sep 17 00:00:00 2001 From: Christian Borntraeger Date: Wed, 30 Sep 2009 11:17:21 +0200 Subject: virtio: let header files include virtio_ids.h Rusty, commit 3ca4f5ca73057a617f9444a91022d7127041970a virtio: add virtio IDs file moved all device IDs into a single file. While the change itself is a very good one, it can break userspace applications. For example if a userspace tool wanted to get the ID of virtio_net it used to include virtio_net.h. This does no longer work, since virtio_net.h does not include virtio_ids.h. This patch moves all "#include " from the C files into the header files, making the header files compatible with the old ones. In addition, this patch exports virtio_ids.h to userspace. CC: Fernando Luis Vazquez Cao Signed-off-by: Christian Borntraeger Signed-off-by: Rusty Russell --- Documentation/lguest/lguest.c | 1 - 1 file changed, 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c index ba9373f..098de5b 100644 --- a/Documentation/lguest/lguest.c +++ b/Documentation/lguest/lguest.c @@ -42,7 +42,6 @@ #include #include "linux/lguest_launcher.h" #include "linux/virtio_config.h" -#include #include "linux/virtio_net.h" #include "linux/virtio_blk.h" #include "linux/virtio_console.h" -- cgit v1.1 From 115a57c5b31ab560574fe1a09deaba2ae89e77b5 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Mon, 26 Oct 2009 16:50:07 -0700 Subject: hwmon: enhance the sysfs API for power meters Augment the documentation of the hwmon sysfs API to accomodate ACPI power meters and the current desired behavior of power capping hardware drivers. Signed-off-by: Darrick J. Wong Cc: Zhang Rui Cc: Pavel Machek Cc: Len Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/hwmon/sysfs-interface | 57 ++++++++++++++++++++++++++++++++++++- 1 file changed, 56 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/hwmon/sysfs-interface b/Documentation/hwmon/sysfs-interface index dcbd502..82def88 100644 --- a/Documentation/hwmon/sysfs-interface +++ b/Documentation/hwmon/sysfs-interface @@ -353,10 +353,20 @@ power[1-*]_average Average power use Unit: microWatt RO -power[1-*]_average_interval Power use averaging interval +power[1-*]_average_interval Power use averaging interval. A poll + notification is sent to this file if the + hardware changes the averaging interval. Unit: milliseconds RW +power[1-*]_average_interval_max Maximum power use averaging interval + Unit: milliseconds + RO + +power[1-*]_average_interval_min Minimum power use averaging interval + Unit: milliseconds + RO + power[1-*]_average_highest Historical average maximum power use Unit: microWatt RO @@ -365,6 +375,18 @@ power[1-*]_average_lowest Historical average minimum power use Unit: microWatt RO +power[1-*]_average_max A poll notification is sent to + power[1-*]_average when power use + rises above this value. + Unit: microWatt + RW + +power[1-*]_average_min A poll notification is sent to + power[1-*]_average when power use + sinks below this value. + Unit: microWatt + RW + power[1-*]_input Instantaneous power use Unit: microWatt RO @@ -381,6 +403,39 @@ power[1-*]_reset_history Reset input_highest, input_lowest, average_highest and average_lowest. WO +power[1-*]_accuracy Accuracy of the power meter. + Unit: Percent + RO + +power[1-*]_alarm 1 if the system is drawing more power than the + cap allows; 0 otherwise. A poll notification is + sent to this file when the power use exceeds the + cap. This file only appears if the cap is known + to be enforced by hardware. + RO + +power[1-*]_cap If power use rises above this limit, the + system should take action to reduce power use. + A poll notification is sent to this file if the + cap is changed by the hardware. The *_cap + files only appear if the cap is known to be + enforced by hardware. + Unit: microWatt + RW + +power[1-*]_cap_hyst Margin of hysteresis built around capping and + notification. + Unit: microWatt + RW + +power[1-*]_cap_max Maximum cap that can be set. + Unit: microWatt + RO + +power[1-*]_cap_min Minimum cap that can be set. + Unit: microWatt + RO + ********** * Energy * ********** -- cgit v1.1 From 468727ab1273a0f95562befa611a3ce39778599c Mon Sep 17 00:00:00 2001 From: Alex Chiang Date: Wed, 21 Oct 2009 21:45:15 -0600 Subject: Documentation: ABI: rename sysfs-devices-cache_disable properly Rename sysfs-devices-cache_disable to sysfs-devices-system-cpu, in order to keep a stricter correlation between a sysfs directory and its documentation. Reported-by: David Rientjes Signed-off-by: Alex Chiang Acked-by: David Rientjes Signed-off-by: Greg Kroah-Hartman --- Documentation/ABI/testing/sysfs-devices-cache_disable | 18 ------------------ Documentation/ABI/testing/sysfs-devices-system-cpu | 18 ++++++++++++++++++ 2 files changed, 18 insertions(+), 18 deletions(-) delete mode 100644 Documentation/ABI/testing/sysfs-devices-cache_disable create mode 100644 Documentation/ABI/testing/sysfs-devices-system-cpu (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-devices-cache_disable b/Documentation/ABI/testing/sysfs-devices-cache_disable deleted file mode 100644 index 175bb4f..0000000 --- a/Documentation/ABI/testing/sysfs-devices-cache_disable +++ /dev/null @@ -1,18 +0,0 @@ -What: /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X -Date: August 2008 -KernelVersion: 2.6.27 -Contact: mark.langsdorf@amd.com -Description: These files exist in every cpu's cache index directories. - There are currently 2 cache_disable_# files in each - directory. Reading from these files on a supported - processor will return that cache disable index value - for that processor and node. Writing to one of these - files will cause the specificed cache index to be disabled. - - Currently, only AMD Family 10h Processors support cache index - disable, and only for their L3 caches. See the BIOS and - Kernel Developer's Guide at - http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116-Public-GH-BKDG_3.20_2-4-09.pdf - for formatting information and other details on the - cache index disable. -Users: joachim.deguara@amd.com diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu new file mode 100644 index 0000000..175bb4f --- /dev/null +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -0,0 +1,18 @@ +What: /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X +Date: August 2008 +KernelVersion: 2.6.27 +Contact: mark.langsdorf@amd.com +Description: These files exist in every cpu's cache index directories. + There are currently 2 cache_disable_# files in each + directory. Reading from these files on a supported + processor will return that cache disable index value + for that processor and node. Writing to one of these + files will cause the specificed cache index to be disabled. + + Currently, only AMD Family 10h Processors support cache index + disable, and only for their L3 caches. See the BIOS and + Kernel Developer's Guide at + http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116-Public-GH-BKDG_3.20_2-4-09.pdf + for formatting information and other details on the + cache index disable. +Users: joachim.deguara@amd.com -- cgit v1.1 From 2ceb3fb0a78c671892b01319ac8e3baede33a78c Mon Sep 17 00:00:00 2001 From: Alex Chiang Date: Wed, 21 Oct 2009 21:45:20 -0600 Subject: Documentation: ABI: document /sys/devices/system/cpu/ This interface has been around for a long time, but hasn't been officially documented. Document the top level sysfs directory for CPU attributes. Signed-off-by: Alex Chiang Signed-off-by: Greg Kroah-Hartman --- Documentation/ABI/testing/sysfs-devices-system-cpu | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 175bb4f..86126b1 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -1,3 +1,15 @@ +What: /sys/devices/system/cpu/ +Date: pre-git history +Contact: Linux kernel mailing list +Description: + A collection of both global and individual CPU attributes + + Individual CPU attributes are contained in subdirectories + named by the kernel's logical CPU number, e.g.: + + /sys/devices/system/cpu/cpu#/ + + What: /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X Date: August 2008 KernelVersion: 2.6.27 -- cgit v1.1 From d93fc863d2d2cea1057996c39cef368f41741448 Mon Sep 17 00:00:00 2001 From: Alex Chiang Date: Wed, 21 Oct 2009 21:45:25 -0600 Subject: Documentation: ABI: /sys/devices/system/cpu/ topology files Add brief descriptions for the following sysfs files: /sys/devices/system/cpu/kernel_max /sys/devices/system/cpu/offline /sys/devices/system/cpu/online /sys/devices/system/cpu/possible /sys/devices/system/cpu/present Excerpted the relevant information from Documentation/cputopology.txt and pointed back to cputopology.txt as the authoritative source of information. Cc: Mike Travis Cc: Rusty Russell Signed-off-by: Alex Chiang Signed-off-by: Greg Kroah-Hartman --- Documentation/ABI/testing/sysfs-devices-system-cpu | 28 ++++++++++++++++++++++ 1 file changed, 28 insertions(+) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 86126b1..871acdb 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -10,6 +10,34 @@ Description: /sys/devices/system/cpu/cpu#/ +What: /sys/devices/system/cpu/kernel_max + /sys/devices/system/cpu/offline + /sys/devices/system/cpu/online + /sys/devices/system/cpu/possible + /sys/devices/system/cpu/present +Date: December 2008 +Contact: Linux kernel mailing list +Description: CPU topology files that describe kernel limits related to + hotplug. Briefly: + + kernel_max: the maximum cpu index allowed by the kernel + configuration. + + offline: cpus that are not online because they have been + HOTPLUGGED off or exceed the limit of cpus allowed by the + kernel configuration (kernel_max above). + + online: cpus that are online and being scheduled. + + possible: cpus that have been allocated resources and can be + brought online if they are present. + + present: cpus that have been identified as being present in + the system. + + See Documentation/cputopology.txt for more information. + + What: /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X Date: August 2008 KernelVersion: 2.6.27 -- cgit v1.1 From 663fb2fc733006f685400fb44551303b72b61a88 Mon Sep 17 00:00:00 2001 From: Alex Chiang Date: Wed, 21 Oct 2009 21:45:31 -0600 Subject: Documentation: ABI: /sys/devices/system/cpu/cpu#/ topology files Add brief descriptions for the following sysfs files: /sys/devices/system/cpu/cpu#/topology/core_id /sys/devices/system/cpu/cpu#/topology/core_siblings /sys/devices/system/cpu/cpu#/topology/core_siblings_list /sys/devices/system/cpu/cpu#/topology/physical_package_id /sys/devices/system/cpu/cpu#/topology/thread_siblings /sys/devices/system/cpu/cpu#/topology/thread_siblings_list The descriptions in Documentation/cputopology.txt weren't very informative, so I attempted a better description based on code reading and hopeful guessing. Updated Documentation/cputopology.txt with the better descriptions and fixed some style issues. Cc: Mike Travis Cc: Rusty Russell Signed-off-by: Alex Chiang Signed-off-by: Greg Kroah-Hartman --- Documentation/ABI/testing/sysfs-devices-system-cpu | 39 ++++++++++++++++++ Documentation/cputopology.txt | 47 ++++++++++++++-------- 2 files changed, 69 insertions(+), 17 deletions(-) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 871acdb..2ade5c0 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -38,6 +38,45 @@ Description: CPU topology files that describe kernel limits related to See Documentation/cputopology.txt for more information. +What: /sys/devices/system/cpu/cpu#/topology/core_id + /sys/devices/system/cpu/cpu#/topology/core_siblings + /sys/devices/system/cpu/cpu#/topology/core_siblings_list + /sys/devices/system/cpu/cpu#/topology/physical_package_id + /sys/devices/system/cpu/cpu#/topology/thread_siblings + /sys/devices/system/cpu/cpu#/topology/thread_siblings_list +Date: December 2008 +Contact: Linux kernel mailing list +Description: CPU topology files that describe a logical CPU's relationship + to other cores and threads in the same physical package. + + One cpu# directory is created per logical CPU in the system, + e.g. /sys/devices/system/cpu/cpu42/. + + Briefly, the files above are: + + core_id: the CPU core ID of cpu#. Typically it is the + hardware platform's identifier (rather than the kernel's). + The actual value is architecture and platform dependent. + + core_siblings: internal kernel map of cpu#'s hardware threads + within the same physical_package_id. + + core_siblings_list: human-readable list of the logical CPU + numbers within the same physical_package_id as cpu#. + + physical_package_id: physical package id of cpu#. Typically + corresponds to a physical socket number, but the actual value + is architecture and platform dependent. + + thread_siblings: internel kernel map of cpu#'s hardware + threads within the same core as cpu# + + thread_siblings_list: human-readable list of cpu#'s hardware + threads within the same core as cpu# + + See Documentation/cputopology.txt for more information. + + What: /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X Date: August 2008 KernelVersion: 2.6.27 diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt index b41f3e5..f1c5c4b 100644 --- a/Documentation/cputopology.txt +++ b/Documentation/cputopology.txt @@ -1,15 +1,28 @@ -Export cpu topology info via sysfs. Items (attributes) are similar +Export CPU topology info via sysfs. Items (attributes) are similar to /proc/cpuinfo. 1) /sys/devices/system/cpu/cpuX/topology/physical_package_id: -represent the physical package id of cpu X; + + physical package id of cpuX. Typically corresponds to a physical + socket number, but the actual value is architecture and platform + dependent. + 2) /sys/devices/system/cpu/cpuX/topology/core_id: -represent the cpu core id to cpu X; + + the CPU core ID of cpuX. Typically it is the hardware platform's + identifier (rather than the kernel's). The actual value is + architecture and platform dependent. + 3) /sys/devices/system/cpu/cpuX/topology/thread_siblings: -represent the thread siblings to cpu X in the same core; + + internel kernel map of cpuX's hardware threads within the same + core as cpuX + 4) /sys/devices/system/cpu/cpuX/topology/core_siblings: -represent the thread siblings to cpu X in the same physical package; + + internal kernel map of cpuX's hardware threads within the same + physical_package_id. To implement it in an architecture-neutral way, a new source file, drivers/base/topology.c, is to export the 4 attributes. @@ -32,32 +45,32 @@ not defined by include/asm-XXX/topology.h: 3) thread_siblings: just the given CPU 4) core_siblings: just the given CPU -Additionally, cpu topology information is provided under +Additionally, CPU topology information is provided under /sys/devices/system/cpu and includes these files. The internal source for the output is in brackets ("[]"). - kernel_max: the maximum cpu index allowed by the kernel configuration. + kernel_max: the maximum CPU index allowed by the kernel configuration. [NR_CPUS-1] - offline: cpus that are not online because they have been + offline: CPUs that are not online because they have been HOTPLUGGED off (see cpu-hotplug.txt) or exceed the limit - of cpus allowed by the kernel configuration (kernel_max + of CPUs allowed by the kernel configuration (kernel_max above). [~cpu_online_mask + cpus >= NR_CPUS] - online: cpus that are online and being scheduled [cpu_online_mask] + online: CPUs that are online and being scheduled [cpu_online_mask] - possible: cpus that have been allocated resources and can be + possible: CPUs that have been allocated resources and can be brought online if they are present. [cpu_possible_mask] - present: cpus that have been identified as being present in the + present: CPUs that have been identified as being present in the system. [cpu_present_mask] The format for the above output is compatible with cpulist_parse() [see ]. Some examples follow. -In this example, there are 64 cpus in the system but cpus 32-63 exceed +In this example, there are 64 CPUs in the system but cpus 32-63 exceed the kernel max which is limited to 0..31 by the NR_CPUS config option -being 32. Note also that cpus 2 and 4-31 are not online but could be +being 32. Note also that CPUs 2 and 4-31 are not online but could be brought online as they are both present and possible. kernel_max: 31 @@ -67,8 +80,8 @@ brought online as they are both present and possible. present: 0-31 In this example, the NR_CPUS config option is 128, but the kernel was -started with possible_cpus=144. There are 4 cpus in the system and cpu2 -was manually taken offline (and is the only cpu that can be brought +started with possible_cpus=144. There are 4 CPUs in the system and cpu2 +was manually taken offline (and is the only CPU that can be brought online.) kernel_max: 127 @@ -78,4 +91,4 @@ online.) present: 0-3 See cpu-hotplug.txt for the possible_cpus=NUM kernel start parameter -as well as more information on the various cpumask's. +as well as more information on the various cpumasks. -- cgit v1.1 From e6dcfa7c61c4d31797a12d738bfe0bdec0ca2be1 Mon Sep 17 00:00:00 2001 From: Alex Chiang Date: Wed, 21 Oct 2009 21:45:36 -0600 Subject: Documentation: ABI: /sys/devices/system/cpu/sched_[mc|smt]_power_savings Document sched_[mc|smt]_power_savings by reading existing code and git logs. Cc: Suresh Siddha Cc: Ingo Molnar Signed-off-by: Alex Chiang Signed-off-by: Greg Kroah-Hartman --- Documentation/ABI/testing/sysfs-devices-system-cpu | 24 ++++++++++++++++++++++ 1 file changed, 24 insertions(+) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 2ade5c0..8cdda1c 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -9,6 +9,30 @@ Description: /sys/devices/system/cpu/cpu#/ +What: /sys/devices/system/cpu/sched_mc_power_savings + /sys/devices/system/cpu/sched_smt_power_savings +Date: June 2006 +Contact: Linux kernel mailing list +Description: Discover and adjust the kernel's multi-core scheduler support. + + Possible values are: + + 0 - No power saving load balance (default value) + 1 - Fill one thread/core/package first for long running threads + 2 - Also bias task wakeups to semi-idle cpu package for power + savings + + sched_mc_power_savings is dependent upon SCHED_MC, which is + itself architecture dependent. + + sched_smt_power_savings is dependent upon SCHED_SMT, which + is itself architecture dependent. + + The two files are independent of each other. It is possible + that one file may be present without the other. + + Introduced by git commit 5c45bf27. + What: /sys/devices/system/cpu/kernel_max /sys/devices/system/cpu/offline -- cgit v1.1 From c1fb5c475126b77b47ba762f5b48535cd0420d24 Mon Sep 17 00:00:00 2001 From: Alex Chiang Date: Wed, 21 Oct 2009 21:45:41 -0600 Subject: Documentation: ABI: /sys/devices/system/cpu/cpuidle/ Document cpuidle sysfs attributes by reading code, Documentation/cpuidle/, and git logs. Cc: Venki Pallipadi Cc: Len Brown Signed-off-by: Alex Chiang Signed-off-by: Greg Kroah-Hartman --- Documentation/ABI/testing/sysfs-devices-system-cpu | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 8cdda1c..968d8ba 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -101,6 +101,26 @@ Description: CPU topology files that describe a logical CPU's relationship See Documentation/cputopology.txt for more information. +What: /sys/devices/system/cpu/cpuidle/current_driver + /sys/devices/system/cpu/cpuidle/current_governer_ro +Date: September 2007 +Contact: Linux kernel mailing list +Description: Discover cpuidle policy and mechanism + + Various CPUs today support multiple idle levels that are + differentiated by varying exit latencies and power + consumption during idle. + + Idle policy (governor) is differentiated from idle mechanism + (driver) + + current_driver: displays current idle mechanism + + current_governor_ro: displays current idle policy + + See files in Documentation/cpuidle/ for more information. + + What: /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X Date: August 2008 KernelVersion: 2.6.27 -- cgit v1.1 From 657348a056eea4a27be20cf8e22c98a252597447 Mon Sep 17 00:00:00 2001 From: Alex Chiang Date: Wed, 21 Oct 2009 22:15:30 -0600 Subject: Documentation: ABI: /sys/devices/system/cpu/cpu#/node Describe NUMA node symlink created for CPUs when CONFIG_NUMA is set. Cc: Randy Dunlap Signed-off-by: Alex Chiang Signed-off-by: Greg Kroah-Hartman --- Documentation/ABI/testing/sysfs-devices-system-cpu | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 968d8ba..a703b9e 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -62,6 +62,21 @@ Description: CPU topology files that describe kernel limits related to See Documentation/cputopology.txt for more information. + +What: /sys/devices/system/cpu/cpu#/node +Date: October 2009 +Contact: Linux memory management mailing list +Description: Discover NUMA node a CPU belongs to + + When CONFIG_NUMA is enabled, a symbolic link that points + to the corresponding NUMA node directory. + + For example, the following symlink is created for cpu42 + in NUMA node 2: + + /sys/devices/system/cpu/cpu42/node2 -> ../../node/node2 + + What: /sys/devices/system/cpu/cpu#/topology/core_id /sys/devices/system/cpu/cpu#/topology/core_siblings /sys/devices/system/cpu/cpu#/topology/core_siblings_list -- cgit v1.1