summaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/kernel-parameters.txt3
-rw-r--r--Documentation/memory-barriers.txt129
-rw-r--r--Documentation/scheduler/sched-rt-group.txt20
-rw-r--r--Documentation/trace/ftrace.txt15
-rw-r--r--Documentation/x86/boot.txt122
5 files changed, 276 insertions, 13 deletions
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index fd5cac0..11648c1 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1575,6 +1575,9 @@ and is between 256 and 4096 characters. It is defined in the file
noinitrd [RAM] Tells the kernel not to load any configured
initial RAM disk.
+ nointremap [X86-64, Intel-IOMMU] Do not enable interrupt
+ remapping.
+
nointroute [IA-64]
nojitter [IA64] Disables jitter checking for ITC timers.
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index f5b7127..7f5809e 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -31,6 +31,7 @@ Contents:
- Locking functions.
- Interrupt disabling functions.
+ - Sleep and wake-up functions.
- Miscellaneous functions.
(*) Inter-CPU locking barrier effects.
@@ -1217,6 +1218,132 @@ barriers are required in such a situation, they must be provided from some
other means.
+SLEEP AND WAKE-UP FUNCTIONS
+---------------------------
+
+Sleeping and waking on an event flagged in global data can be viewed as an
+interaction between two pieces of data: the task state of the task waiting for
+the event and the global data used to indicate the event. To make sure that
+these appear to happen in the right order, the primitives to begin the process
+of going to sleep, and the primitives to initiate a wake up imply certain
+barriers.
+
+Firstly, the sleeper normally follows something like this sequence of events:
+
+ for (;;) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ if (event_indicated)
+ break;
+ schedule();
+ }
+
+A general memory barrier is interpolated automatically by set_current_state()
+after it has altered the task state:
+
+ CPU 1
+ ===============================
+ set_current_state();
+ set_mb();
+ STORE current->state
+ <general barrier>
+ LOAD event_indicated
+
+set_current_state() may be wrapped by:
+
+ prepare_to_wait();
+ prepare_to_wait_exclusive();
+
+which therefore also imply a general memory barrier after setting the state.
+The whole sequence above is available in various canned forms, all of which
+interpolate the memory barrier in the right place:
+
+ wait_event();
+ wait_event_interruptible();
+ wait_event_interruptible_exclusive();
+ wait_event_interruptible_timeout();
+ wait_event_killable();
+ wait_event_timeout();
+ wait_on_bit();
+ wait_on_bit_lock();
+
+
+Secondly, code that performs a wake up normally follows something like this:
+
+ event_indicated = 1;
+ wake_up(&event_wait_queue);
+
+or:
+
+ event_indicated = 1;
+ wake_up_process(event_daemon);
+
+A write memory barrier is implied by wake_up() and co. if and only if they wake
+something up. The barrier occurs before the task state is cleared, and so sits
+between the STORE to indicate the event and the STORE to set TASK_RUNNING:
+
+ CPU 1 CPU 2
+ =============================== ===============================
+ set_current_state(); STORE event_indicated
+ set_mb(); wake_up();
+ STORE current->state <write barrier>
+ <general barrier> STORE current->state
+ LOAD event_indicated
+
+The available waker functions include:
+
+ complete();
+ wake_up();
+ wake_up_all();
+ wake_up_bit();
+ wake_up_interruptible();
+ wake_up_interruptible_all();
+ wake_up_interruptible_nr();
+ wake_up_interruptible_poll();
+ wake_up_interruptible_sync();
+ wake_up_interruptible_sync_poll();
+ wake_up_locked();
+ wake_up_locked_poll();
+ wake_up_nr();
+ wake_up_poll();
+ wake_up_process();
+
+
+[!] Note that the memory barriers implied by the sleeper and the waker do _not_
+order multiple stores before the wake-up with respect to loads of those stored
+values after the sleeper has called set_current_state(). For instance, if the
+sleeper does:
+
+ set_current_state(TASK_INTERRUPTIBLE);
+ if (event_indicated)
+ break;
+ __set_current_state(TASK_RUNNING);
+ do_something(my_data);
+
+and the waker does:
+
+ my_data = value;
+ event_indicated = 1;
+ wake_up(&event_wait_queue);
+
+there's no guarantee that the change to event_indicated will be perceived by
+the sleeper as coming after the change to my_data. In such a circumstance, the
+code on both sides must interpolate its own memory barriers between the
+separate data accesses. Thus the above sleeper ought to do:
+
+ set_current_state(TASK_INTERRUPTIBLE);
+ if (event_indicated) {
+ smp_rmb();
+ do_something(my_data);
+ }
+
+and the waker should do:
+
+ my_data = value;
+ smp_wmb();
+ event_indicated = 1;
+ wake_up(&event_wait_queue);
+
+
MISCELLANEOUS FUNCTIONS
-----------------------
@@ -1366,7 +1493,7 @@ WHERE ARE MEMORY BARRIERS NEEDED?
Under normal operation, memory operation reordering is generally not going to
be a problem as a single-threaded linear piece of code will still appear to
-work correctly, even if it's in an SMP kernel. There are, however, three
+work correctly, even if it's in an SMP kernel. There are, however, four
circumstances in which reordering definitely _could_ be a problem:
(*) Interprocessor interaction.
diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt
index 5ba4d3f..1df7f9c 100644
--- a/Documentation/scheduler/sched-rt-group.txt
+++ b/Documentation/scheduler/sched-rt-group.txt
@@ -4,6 +4,7 @@
CONTENTS
========
+0. WARNING
1. Overview
1.1 The problem
1.2 The solution
@@ -14,6 +15,23 @@ CONTENTS
3. Future plans
+0. WARNING
+==========
+
+ Fiddling with these settings can result in an unstable system, the knobs are
+ root only and assumes root knows what he is doing.
+
+Most notable:
+
+ * very small values in sched_rt_period_us can result in an unstable
+ system when the period is smaller than either the available hrtimer
+ resolution, or the time it takes to handle the budget refresh itself.
+
+ * very small values in sched_rt_runtime_us can result in an unstable
+ system when the runtime is so small the system has difficulty making
+ forward progress (NOTE: the migration thread and kstopmachine both
+ are real-time processes).
+
1. Overview
===========
@@ -169,7 +187,7 @@ get their allocated time.
Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
the biggest challenge as the current linux PI infrastructure is geared towards
-the limited static priority levels 0-139. With deadline scheduling you need to
+the limited static priority levels 0-99. With deadline scheduling you need to
do deadline inheritance (since priority is inversely proportional to the
deadline delta (deadline - now).
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index fd9a3e6..e362f50 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -518,9 +518,18 @@ priority with zero (0) being the highest priority and the nice
values starting at 100 (nice -20). Below is a quick chart to map
the kernel priority to user land priorities.
- Kernel priority: 0 to 99 ==> user RT priority 99 to 0
- Kernel priority: 100 to 139 ==> user nice -20 to 19
- Kernel priority: 140 ==> idle task priority
+ Kernel Space User Space
+ ===============================================================
+ 0(high) to 98(low) user RT priority 99(high) to 1(low)
+ with SCHED_RR or SCHED_FIFO
+ ---------------------------------------------------------------
+ 99 sched_priority is not used in scheduling
+ decisions(it must be specified as 0)
+ ---------------------------------------------------------------
+ 100(high) to 139(low) user nice -20(high) to 19(low)
+ ---------------------------------------------------------------
+ 140 idle task priority
+ ---------------------------------------------------------------
The task states are:
diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
index e020366..8da3a79 100644
--- a/Documentation/x86/boot.txt
+++ b/Documentation/x86/boot.txt
@@ -50,6 +50,10 @@ Protocol 2.08: (Kernel 2.6.26) Added crc32 checksum and ELF format
Protocol 2.09: (Kernel 2.6.26) Added a field of 64-bit physical
pointer to single linked list of struct setup_data.
+Protocol 2.10: (Kernel 2.6.31) Added a protocol for relaxed alignment
+ beyond the kernel_alignment added, new init_size and
+ pref_address fields. Added extended boot loader IDs.
+
**** MEMORY LAYOUT
The traditional memory map for the kernel loader, used for Image or
@@ -168,12 +172,13 @@ Offset Proto Name Meaning
021C/4 2.00+ ramdisk_size initrd size (set by boot loader)
0220/4 2.00+ bootsect_kludge DO NOT USE - for bootsect.S use only
0224/2 2.01+ heap_end_ptr Free memory after setup end
-0226/2 N/A pad1 Unused
+0226/1 2.02+(3 ext_loader_ver Extended boot loader version
+0227/1 2.02+(3 ext_loader_type Extended boot loader ID
0228/4 2.02+ cmd_line_ptr 32-bit pointer to the kernel command line
022C/4 2.03+ ramdisk_max Highest legal initrd address
0230/4 2.05+ kernel_alignment Physical addr alignment required for kernel
0234/1 2.05+ relocatable_kernel Whether kernel is relocatable or not
-0235/1 N/A pad2 Unused
+0235/1 2.10+ min_alignment Minimum alignment, as a power of two
0236/2 N/A pad3 Unused
0238/4 2.06+ cmdline_size Maximum size of the kernel command line
023C/4 2.07+ hardware_subarch Hardware subarchitecture
@@ -182,6 +187,8 @@ Offset Proto Name Meaning
024C/4 2.08+ payload_length Length of kernel payload
0250/8 2.09+ setup_data 64-bit physical pointer to linked list
of struct setup_data
+0258/8 2.10+ pref_address Preferred loading address
+0260/4 2.10+ init_size Linear memory required during initialization
(1) For backwards compatibility, if the setup_sects field contains 0, the
real value is 4.
@@ -190,6 +197,8 @@ Offset Proto Name Meaning
field are unusable, which means the size of a bzImage kernel
cannot be determined.
+(3) Ignored, but safe to set, for boot protocols 2.02-2.09.
+
If the "HdrS" (0x53726448) magic number is not found at offset 0x202,
the boot protocol version is "old". Loading an old kernel, the
following parameters should be assumed:
@@ -343,18 +352,32 @@ Protocol: 2.00+
0xTV here, where T is an identifier for the boot loader and V is
a version number. Otherwise, enter 0xFF here.
+ For boot loader IDs above T = 0xD, write T = 0xE to this field and
+ write the extended ID minus 0x10 to the ext_loader_type field.
+ Similarly, the ext_loader_ver field can be used to provide more than
+ four bits for the bootloader version.
+
+ For example, for T = 0x15, V = 0x234, write:
+
+ type_of_loader <- 0xE4
+ ext_loader_type <- 0x05
+ ext_loader_ver <- 0x23
+
Assigned boot loader ids:
0 LILO (0x00 reserved for pre-2.00 bootloader)
1 Loadlin
2 bootsect-loader (0x20, all other values reserved)
- 3 SYSLINUX
- 4 EtherBoot
+ 3 Syslinux
+ 4 Etherboot/gPXE
5 ELILO
7 GRUB
- 8 U-BOOT
+ 8 U-Boot
9 Xen
A Gujin
B Qemu
+ C Arcturus Networks uCbootloader
+ E Extended (see ext_loader_type)
+ F Special (0xFF = undefined)
Please contact <hpa@zytor.com> if you need a bootloader ID
value assigned.
@@ -453,6 +476,35 @@ Protocol: 2.01+
Set this field to the offset (from the beginning of the real-mode
code) of the end of the setup stack/heap, minus 0x0200.
+Field name: ext_loader_ver
+Type: write (optional)
+Offset/size: 0x226/1
+Protocol: 2.02+
+
+ This field is used as an extension of the version number in the
+ type_of_loader field. The total version number is considered to be
+ (type_of_loader & 0x0f) + (ext_loader_ver << 4).
+
+ The use of this field is boot loader specific. If not written, it
+ is zero.
+
+ Kernels prior to 2.6.31 did not recognize this field, but it is safe
+ to write for protocol version 2.02 or higher.
+
+Field name: ext_loader_type
+Type: write (obligatory if (type_of_loader & 0xf0) == 0xe0)
+Offset/size: 0x227/1
+Protocol: 2.02+
+
+ This field is used as an extension of the type number in
+ type_of_loader field. If the type in type_of_loader is 0xE, then
+ the actual type is (ext_loader_type + 0x10).
+
+ This field is ignored if the type in type_of_loader is not 0xE.
+
+ Kernels prior to 2.6.31 did not recognize this field, but it is safe
+ to write for protocol version 2.02 or higher.
+
Field name: cmd_line_ptr
Type: write (obligatory)
Offset/size: 0x228/4
@@ -482,11 +534,19 @@ Protocol: 2.03+
0x37FFFFFF, you can start your ramdisk at 0x37FE0000.)
Field name: kernel_alignment
-Type: read (reloc)
+Type: read/modify (reloc)
Offset/size: 0x230/4
-Protocol: 2.05+
+Protocol: 2.05+ (read), 2.10+ (modify)
+
+ Alignment unit required by the kernel (if relocatable_kernel is
+ true.) A relocatable kernel that is loaded at an alignment
+ incompatible with the value in this field will be realigned during
+ kernel initialization.
- Alignment unit required by the kernel (if relocatable_kernel is true.)
+ Starting with protocol version 2.10, this reflects the kernel
+ alignment preferred for optimal performance; it is possible for the
+ loader to modify this field to permit a lesser alignment. See the
+ min_alignment and pref_address field below.
Field name: relocatable_kernel
Type: read (reloc)
@@ -498,6 +558,22 @@ Protocol: 2.05+
After loading, the boot loader must set the code32_start field to
point to the loaded code, or to a boot loader hook.
+Field name: min_alignment
+Type: read (reloc)
+Offset/size: 0x235/1
+Protocol: 2.10+
+
+ This field, if nonzero, indicates as a power of two the minimum
+ alignment required, as opposed to preferred, by the kernel to boot.
+ If a boot loader makes use of this field, it should update the
+ kernel_alignment field with the alignment unit desired; typically:
+
+ kernel_alignment = 1 << min_alignment
+
+ There may be a considerable performance cost with an excessively
+ misaligned kernel. Therefore, a loader should typically try each
+ power-of-two alignment from kernel_alignment down to this alignment.
+
Field name: cmdline_size
Type: read
Offset/size: 0x238/4
@@ -582,6 +658,36 @@ Protocol: 2.09+
sure to consider the case where the linked list already contains
entries.
+Field name: pref_address
+Type: read (reloc)
+Offset/size: 0x258/8
+Protocol: 2.10+
+
+ This field, if nonzero, represents a preferred load address for the
+ kernel. A relocating bootloader should attempt to load at this
+ address if possible.
+
+ A non-relocatable kernel will unconditionally move itself and to run
+ at this address.
+
+Field name: init_size
+Type: read
+Offset/size: 0x25c/4
+
+ This field indicates the amount of linear contiguous memory starting
+ at the kernel runtime start address that the kernel needs before it
+ is capable of examining its memory map. This is not the same thing
+ as the total amount of memory the kernel needs to boot, but it can
+ be used by a relocating boot loader to help select a safe load
+ address for the kernel.
+
+ The kernel runtime start address is determined by the following algorithm:
+
+ if (relocatable_kernel)
+ runtime_start = align_up(load_address, kernel_alignment)
+ else
+ runtime_start = pref_address
+
**** THE IMAGE CHECKSUM
OpenPOWER on IntegriCloud