summaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/configfs-iio21
-rw-r--r--Documentation/ABI/testing/configfs-usb-gadget-sourcesink2
-rw-r--r--Documentation/ABI/testing/sysfs-bus-iio-ina2xx-adc24
-rw-r--r--Documentation/ABI/testing/sysfs-bus-usb27
-rw-r--r--Documentation/ABI/testing/sysfs-class-net-cdc_ncm19
-rw-r--r--Documentation/ABI/testing/sysfs-class-net-mesh4
-rw-r--r--Documentation/ABI/testing/sysfs-class-net-qmi23
-rw-r--r--Documentation/arm64/silicon-errata.txt58
-rw-r--r--Documentation/cgroup-v1/00-INDEX (renamed from Documentation/cgroups/00-INDEX)2
-rw-r--r--Documentation/cgroup-v1/blkio-controller.txt (renamed from Documentation/cgroups/blkio-controller.txt)82
-rw-r--r--Documentation/cgroup-v1/cgroups.txt (renamed from Documentation/cgroups/cgroups.txt)0
-rw-r--r--Documentation/cgroup-v1/cpuacct.txt (renamed from Documentation/cgroups/cpuacct.txt)0
-rw-r--r--Documentation/cgroup-v1/cpusets.txt (renamed from Documentation/cgroups/cpusets.txt)0
-rw-r--r--Documentation/cgroup-v1/devices.txt (renamed from Documentation/cgroups/devices.txt)0
-rw-r--r--Documentation/cgroup-v1/freezer-subsystem.txt (renamed from Documentation/cgroups/freezer-subsystem.txt)0
-rw-r--r--Documentation/cgroup-v1/hugetlb.txt (renamed from Documentation/cgroups/hugetlb.txt)0
-rw-r--r--Documentation/cgroup-v1/memcg_test.txt (renamed from Documentation/cgroups/memcg_test.txt)0
-rw-r--r--Documentation/cgroup-v1/memory.txt (renamed from Documentation/cgroups/memory.txt)0
-rw-r--r--Documentation/cgroup-v1/net_cls.txt (renamed from Documentation/cgroups/net_cls.txt)0
-rw-r--r--Documentation/cgroup-v1/net_prio.txt (renamed from Documentation/cgroups/net_prio.txt)0
-rw-r--r--Documentation/cgroup-v1/pids.txt (renamed from Documentation/cgroups/pids.txt)0
-rw-r--r--Documentation/cgroup-v2.txt1293
-rw-r--r--Documentation/cgroups/unified-hierarchy.txt647
-rw-r--r--Documentation/cpu-freq/intel-pstate.txt241
-rw-r--r--Documentation/cpu-freq/pcc-cpufreq.txt4
-rw-r--r--Documentation/devicetree/bindings/arm/cpus.txt17
-rw-r--r--Documentation/devicetree/bindings/arm/l2c2x0.txt (renamed from Documentation/devicetree/bindings/arm/l2cc.txt)24
-rw-r--r--Documentation/devicetree/bindings/arm/pmu.txt5
-rw-r--r--Documentation/devicetree/bindings/cpufreq/cpufreq-st.txt91
-rw-r--r--Documentation/devicetree/bindings/crypto/rockchip-crypto.txt29
-rw-r--r--Documentation/devicetree/bindings/dma/renesas,usb-dmac.txt10
-rw-r--r--Documentation/devicetree/bindings/dma/stm32-dma.txt82
-rw-r--r--Documentation/devicetree/bindings/dma/ti-dma-crossbar.txt6
-rw-r--r--Documentation/devicetree/bindings/extcon/extcon-arizona.txt60
-rw-r--r--Documentation/devicetree/bindings/extcon/extcon-max3355.txt21
-rw-r--r--Documentation/devicetree/bindings/i2c/trivial-devices.txt1
-rw-r--r--Documentation/devicetree/bindings/iio/accel/mma8452.txt6
-rw-r--r--Documentation/devicetree/bindings/iio/adc/imx7d-adc.txt22
-rw-r--r--Documentation/devicetree/bindings/iio/adc/mcp320x.txt30
-rw-r--r--Documentation/devicetree/bindings/iio/adc/mcp3422.txt3
-rw-r--r--Documentation/devicetree/bindings/iio/adc/palmas-gpadc.txt48
-rw-r--r--Documentation/devicetree/bindings/iio/adc/ti-adc128s052.txt4
-rw-r--r--Documentation/devicetree/bindings/iio/adc/ti-ads8688.txt20
-rw-r--r--Documentation/devicetree/bindings/iio/health/max30100.txt21
-rw-r--r--Documentation/devicetree/bindings/iio/light/us5182d.txt11
-rw-r--r--Documentation/devicetree/bindings/iio/st-sensors.txt1
-rw-r--r--Documentation/devicetree/bindings/input/touchscreen/goodix.txt14
-rw-r--r--Documentation/devicetree/bindings/input/touchscreen/pixcir_i2c_ts.txt4
-rw-r--r--Documentation/devicetree/bindings/input/touchscreen/ts4800-ts.txt11
-rw-r--r--Documentation/devicetree/bindings/net/dsa/dsa.txt3
-rw-r--r--Documentation/devicetree/bindings/net/hisilicon-hns-mdio.txt7
-rw-r--r--Documentation/devicetree/bindings/net/ieee802154/adf7242.txt18
-rw-r--r--Documentation/devicetree/bindings/net/macb.txt8
-rw-r--r--Documentation/devicetree/bindings/net/micrel-ksz90x1.txt17
-rw-r--r--Documentation/devicetree/bindings/net/nfc/st95hf.txt50
-rw-r--r--Documentation/devicetree/bindings/net/renesas,ravb.txt12
-rw-r--r--Documentation/devicetree/bindings/net/socfpga-dwmac.txt2
-rw-r--r--Documentation/devicetree/bindings/net/stmmac.txt25
-rw-r--r--Documentation/devicetree/bindings/opp/opp.txt132
-rw-r--r--Documentation/devicetree/bindings/phy/brcm,brcmstb-sata-phy.txt1
-rw-r--r--Documentation/devicetree/bindings/phy/phy-hi6220-usb.txt16
-rw-r--r--Documentation/devicetree/bindings/phy/rcar-gen3-phy-usb2.txt39
-rw-r--r--Documentation/devicetree/bindings/phy/rockchip-usb-phy.txt6
-rw-r--r--Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt1
-rw-r--r--Documentation/devicetree/bindings/phy/ti-phy.txt20
-rw-r--r--Documentation/devicetree/bindings/serial/renesas,sci-serial.txt44
-rw-r--r--Documentation/devicetree/bindings/staging/ion/hi6220-ion.txt31
-rw-r--r--Documentation/devicetree/bindings/usb/dwc2.txt1
-rw-r--r--Documentation/devicetree/bindings/usb/dwc3-xilinx.txt33
-rw-r--r--Documentation/devicetree/bindings/usb/mt8173-xhci.txt51
-rw-r--r--Documentation/devicetree/bindings/usb/renesas_usb3.txt23
-rw-r--r--Documentation/devicetree/bindings/usb/renesas_usbhs.txt22
-rw-r--r--Documentation/devicetree/bindings/usb/usb-xhci.txt4
-rw-r--r--Documentation/dmaengine/client.txt59
-rw-r--r--Documentation/dmaengine/provider.txt20
-rw-r--r--Documentation/fault-injection/notifier-error-inject.txt25
-rw-r--r--Documentation/features/seccomp/seccomp-filter/arch-support.txt2
-rw-r--r--Documentation/features/time/irq-time-acct/arch-support.txt2
-rw-r--r--Documentation/filesystems/configfs/configfs.txt57
-rw-r--r--Documentation/iio/iio_configfs.txt93
-rw-r--r--Documentation/kernel-parameters.txt22
-rw-r--r--Documentation/networking/batman-adv.txt9
-rw-r--r--Documentation/networking/ip-sysctl.txt31
-rw-r--r--Documentation/networking/switchdev.txt8
-rw-r--r--Documentation/power/pci.txt2
-rw-r--r--Documentation/power/runtime_pm.txt6
-rw-r--r--Documentation/printk-formats.txt6
-rw-r--r--Documentation/usb/chipidea.txt4
-rw-r--r--Documentation/usb/gadget-testing.txt4
-rw-r--r--Documentation/usb/power-management.txt11
-rw-r--r--Documentation/virtual/kvm/api.txt41
-rw-r--r--Documentation/virtual/kvm/devices/vm.txt3
-rw-r--r--Documentation/virtual/kvm/mmu.txt4
93 files changed, 3016 insertions, 947 deletions
diff --git a/Documentation/ABI/testing/configfs-iio b/Documentation/ABI/testing/configfs-iio
new file mode 100644
index 0000000..2483756
--- /dev/null
+++ b/Documentation/ABI/testing/configfs-iio
@@ -0,0 +1,21 @@
+What: /config/iio
+Date: October 2015
+KernelVersion: 4.4
+Contact: linux-iio@vger.kernel.org
+Description:
+ This represents Industrial IO configuration entry point
+ directory. It contains sub-groups corresponding to IIO
+ objects.
+
+What: /config/iio/triggers
+Date: October 2015
+KernelVersion: 4.4
+Description:
+ Industrial IO software triggers directory.
+
+What: /config/iio/triggers/hrtimers
+Date: October 2015
+KernelVersion: 4.4
+Description:
+ High resolution timers directory. Creating a directory here
+ will result in creating a hrtimer trigger in the IIO subsystem.
diff --git a/Documentation/ABI/testing/configfs-usb-gadget-sourcesink b/Documentation/ABI/testing/configfs-usb-gadget-sourcesink
index bc7ff73..f56335a 100644
--- a/Documentation/ABI/testing/configfs-usb-gadget-sourcesink
+++ b/Documentation/ABI/testing/configfs-usb-gadget-sourcesink
@@ -10,3 +10,5 @@ Description:
isoc_mult - 0..2 (hs/ss only)
isoc_maxburst - 0..15 (ss only)
buflen - buffer length
+ bulk_qlen - depth of queue for bulk
+ iso_qlen - depth of queue for iso
diff --git a/Documentation/ABI/testing/sysfs-bus-iio-ina2xx-adc b/Documentation/ABI/testing/sysfs-bus-iio-ina2xx-adc
new file mode 100644
index 0000000..8916f7e
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-iio-ina2xx-adc
@@ -0,0 +1,24 @@
+What: /sys/bus/iio/devices/iio:deviceX/in_allow_async_readout
+Date: December 2015
+KernelVersion: 4.4
+Contact: linux-iio@vger.kernel.org
+Description:
+ By default (value '0'), the capture thread checks for the Conversion
+ Ready Flag to being set prior to committing a new value to the sample
+ buffer. This synchronizes the in-chip conversion rate with the
+ in-driver readout rate at the cost of an additional register read.
+
+ Writing '1' will remove the polling for the Conversion Ready Flags to
+ save the additional i2c transaction, which will improve the bandwidth
+ available for reading data. However, samples can be occasionally skipped
+ or repeated, depending on the beat between the capture and conversion
+ rates.
+
+What: /sys/bus/iio/devices/iio:deviceX/in_shunt_resistor
+Date: December 2015
+KernelVersion: 4.4
+Contact: linux-iio@vger.kernel.org
+Description:
+ The value of the shunt resistor may be known only at runtime fom an
+ eeprom content read by a client application. This attribute allows to
+ set its value in ohms.
diff --git a/Documentation/ABI/testing/sysfs-bus-usb b/Documentation/ABI/testing/sysfs-bus-usb
index 3a4abfc..0bd731c 100644
--- a/Documentation/ABI/testing/sysfs-bus-usb
+++ b/Documentation/ABI/testing/sysfs-bus-usb
@@ -134,19 +134,21 @@ Description:
enabled for the device. Developer can write y/Y/1 or n/N/0 to
the file to enable/disable the feature.
-What: /sys/bus/usb/devices/.../power/usb3_hardware_lpm
-Date: June 2015
+What: /sys/bus/usb/devices/.../power/usb3_hardware_lpm_u1
+ /sys/bus/usb/devices/.../power/usb3_hardware_lpm_u2
+Date: November 2015
Contact: Kevin Strasser <kevin.strasser@linux.intel.com>
+ Lu Baolu <baolu.lu@linux.intel.com>
Description:
If CONFIG_PM is set and a USB 3.0 lpm-capable device is plugged
in to a xHCI host which supports link PM, it will check if U1
and U2 exit latencies have been set in the BOS descriptor; if
- the check is is passed and the host supports USB3 hardware LPM,
+ the check is passed and the host supports USB3 hardware LPM,
USB3 hardware LPM will be enabled for the device and the USB
- device directory will contain a file named
- power/usb3_hardware_lpm. The file holds a string value (enable
- or disable) indicating whether or not USB3 hardware LPM is
- enabled for the device.
+ device directory will contain two files named
+ power/usb3_hardware_lpm_u1 and power/usb3_hardware_lpm_u2. These
+ files hold a string value (enable or disable) indicating whether
+ or not USB3 hardware LPM U1 or U2 is enabled for the device.
What: /sys/bus/usb/devices/.../removable
Date: February 2012
@@ -187,6 +189,17 @@ Description:
The file will read "hotplug", "wired" and "not used" if the
information is available, and "unknown" otherwise.
+What: /sys/bus/usb/devices/.../(hub interface)/portX/usb3_lpm_permit
+Date: November 2015
+Contact: Lu Baolu <baolu.lu@linux.intel.com>
+Description:
+ Some USB3.0 devices are not friendly to USB3 LPM. usb3_lpm_permit
+ attribute allows enabling/disabling usb3 lpm of a port. It takes
+ effect both before and after a usb device is enumerated. Supported
+ values are "0" if both u1 and u2 are NOT permitted, "u1" if only u1
+ is permitted, "u2" if only u2 is permitted, "u1_u2" if both u1 and
+ u2 are permitted.
+
What: /sys/bus/usb/devices/.../power/usb2_lpm_l1_timeout
Date: May 2013
Contact: Mathias Nyman <mathias.nyman@linux.intel.com>
diff --git a/Documentation/ABI/testing/sysfs-class-net-cdc_ncm b/Documentation/ABI/testing/sysfs-class-net-cdc_ncm
index 5cedf72d..f7be0e8 100644
--- a/Documentation/ABI/testing/sysfs-class-net-cdc_ncm
+++ b/Documentation/ABI/testing/sysfs-class-net-cdc_ncm
@@ -19,6 +19,25 @@ Description:
Set to 0 to pad all frames. Set greater than tx_max to
disable all padding.
+What: /sys/class/net/<iface>/cdc_ncm/ndp_to_end
+Date: Dec 2015
+KernelVersion: 4.5
+Contact: Bjørn Mork <bjorn@mork.no>
+Description:
+ Boolean attribute showing the status of the "NDP to
+ end" quirk. Defaults to 'N', except for devices
+ already known to need it enabled.
+
+ The "NDP to end" quirk makes the driver place the NDP
+ (the packet index table) after the payload. The NCM
+ specification does not mandate this, but some devices
+ are known to be more restrictive. Write 'Y' to this
+ attribute for temporary testing of a suspect device
+ failing to work with the default driver settings.
+
+ A device entry should be added to the driver if this
+ quirk is found to be required.
+
What: /sys/class/net/<iface>/cdc_ncm/rx_max
Date: May 2014
KernelVersion: 3.16
diff --git a/Documentation/ABI/testing/sysfs-class-net-mesh b/Documentation/ABI/testing/sysfs-class-net-mesh
index c464062..c2b956d 100644
--- a/Documentation/ABI/testing/sysfs-class-net-mesh
+++ b/Documentation/ABI/testing/sysfs-class-net-mesh
@@ -8,7 +8,7 @@ Description:
What: /sys/class/net/<mesh_iface>/mesh/<vlan_subdir>/ap_isolation
Date: May 2011
-Contact: Antonio Quartulli <antonio@meshcoding.com>
+Contact: Antonio Quartulli <a@unstable.cc>
Description:
Indicates whether the data traffic going from a
wireless client to another wireless client will be
@@ -70,7 +70,7 @@ Description:
What: /sys/class/net/<mesh_iface>/mesh/isolation_mark
Date: Nov 2013
-Contact: Antonio Quartulli <antonio@meshcoding.com>
+Contact: Antonio Quartulli <a@unstable.cc>
Description:
Defines the isolation mark (and its bitmask) which
is used to classify clients as "isolated" by the
diff --git a/Documentation/ABI/testing/sysfs-class-net-qmi b/Documentation/ABI/testing/sysfs-class-net-qmi
new file mode 100644
index 0000000..fa5a00b
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-net-qmi
@@ -0,0 +1,23 @@
+What: /sys/class/net/<iface>/qmi/raw_ip
+Date: Dec 2015
+KernelVersion: 4.4
+Contact: Bjørn Mork <bjorn@mork.no>
+Description:
+ Boolean. Default: 'N'
+
+ Set this to 'Y' to change the network device link
+ framing from '802.3' to 'raw-ip'.
+
+ The netdev will change to reflect the link framing
+ mode. The netdev is an ordinary ethernet device in
+ '802.3' mode, and the driver expects to exchange
+ frames with an ethernet header over the USB link. The
+ netdev is a headerless p-t-p device in 'raw-ip' mode,
+ and the driver expects to echange IPv4 or IPv6 packets
+ without any L2 header over the USB link.
+
+ Userspace is in full control of firmware configuration
+ through the delegation of the QMI protocol. Userspace
+ is responsible for coordination of driver and firmware
+ link framing mode, changing this setting to 'Y' if the
+ firmware is configured for 'raw-ip' mode.
diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
new file mode 100644
index 0000000..58b71dd
--- /dev/null
+++ b/Documentation/arm64/silicon-errata.txt
@@ -0,0 +1,58 @@
+ Silicon Errata and Software Workarounds
+ =======================================
+
+Author: Will Deacon <will.deacon@arm.com>
+Date : 27 November 2015
+
+It is an unfortunate fact of life that hardware is often produced with
+so-called "errata", which can cause it to deviate from the architecture
+under specific circumstances. For hardware produced by ARM, these
+errata are broadly classified into the following categories:
+
+ Category A: A critical error without a viable workaround.
+ Category B: A significant or critical error with an acceptable
+ workaround.
+ Category C: A minor error that is not expected to occur under normal
+ operation.
+
+For more information, consult one of the "Software Developers Errata
+Notice" documents available on infocenter.arm.com (registration
+required).
+
+As far as Linux is concerned, Category B errata may require some special
+treatment in the operating system. For example, avoiding a particular
+sequence of code, or configuring the processor in a particular way. A
+less common situation may require similar actions in order to declassify
+a Category A erratum into a Category C erratum. These are collectively
+known as "software workarounds" and are only required in the minority of
+cases (e.g. those cases that both require a non-secure workaround *and*
+can be triggered by Linux).
+
+For software workarounds that may adversely impact systems unaffected by
+the erratum in question, a Kconfig entry is added under "Kernel
+Features" -> "ARM errata workarounds via the alternatives framework".
+These are enabled by default and patched in at runtime when an affected
+CPU is detected. For less-intrusive workarounds, a Kconfig option is not
+available and the code is structured (preferably with a comment) in such
+a way that the erratum will not be hit.
+
+This approach can make it slightly onerous to determine exactly which
+errata are worked around in an arbitrary kernel source tree, so this
+file acts as a registry of software workarounds in the Linux Kernel and
+will be updated when new workarounds are committed and backported to
+stable kernels.
+
+| Implementor | Component | Erratum ID | Kconfig |
++----------------+-----------------+-----------------+-------------------------+
+| ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 |
+| ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 |
+| ARM | Cortex-A53 | #824069 | ARM64_ERRATUM_824069 |
+| ARM | Cortex-A53 | #819472 | ARM64_ERRATUM_819472 |
+| ARM | Cortex-A53 | #845719 | ARM64_ERRATUM_845719 |
+| ARM | Cortex-A53 | #843419 | ARM64_ERRATUM_843419 |
+| ARM | Cortex-A57 | #832075 | ARM64_ERRATUM_832075 |
+| ARM | Cortex-A57 | #852523 | N/A |
+| ARM | Cortex-A57 | #834220 | ARM64_ERRATUM_834220 |
+| | | | |
+| Cavium | ThunderX ITS | #22375, #24313 | CAVIUM_ERRATUM_22375 |
+| Cavium | ThunderX GICv3 | #23154 | CAVIUM_ERRATUM_23154 |
diff --git a/Documentation/cgroups/00-INDEX b/Documentation/cgroup-v1/00-INDEX
index 3f5a40f..6ad425f 100644
--- a/Documentation/cgroups/00-INDEX
+++ b/Documentation/cgroup-v1/00-INDEX
@@ -24,7 +24,5 @@ net_prio.txt
- Network priority cgroups details and usages.
pids.txt
- Process number cgroups details and usages.
-resource_counter.txt
- - Resource Counter API.
unified-hierarchy.txt
- Description the new/next cgroup interface.
diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroup-v1/blkio-controller.txt
index 52fa9f3..673dc34 100644
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroup-v1/blkio-controller.txt
@@ -84,8 +84,7 @@ Throttling/Upper Limit policy
- Run dd to read a file and see if rate is throttled to 1MB/s or not.
- # dd if=/mnt/common/zerofile of=/dev/null bs=4K count=1024
- # iflag=direct
+ # dd iflag=direct if=/mnt/common/zerofile of=/dev/null bs=4K count=1024
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB) copied, 4.0001 s, 1.0 MB/s
@@ -374,82 +373,3 @@ One can experience an overall throughput drop if you have created multiple
groups and put applications in that group which are not driving enough
IO to keep disk busy. In that case set group_idle=0, and CFQ will not idle
on individual groups and throughput should improve.
-
-Writeback
-=========
-
-Page cache is dirtied through buffered writes and shared mmaps and
-written asynchronously to the backing filesystem by the writeback
-mechanism. Writeback sits between the memory and IO domains and
-regulates the proportion of dirty memory by balancing dirtying and
-write IOs.
-
-On traditional cgroup hierarchies, relationships between different
-controllers cannot be established making it impossible for writeback
-to operate accounting for cgroup resource restrictions and all
-writeback IOs are attributed to the root cgroup.
-
-If both the blkio and memory controllers are used on the v2 hierarchy
-and the filesystem supports cgroup writeback, writeback operations
-correctly follow the resource restrictions imposed by both memory and
-blkio controllers.
-
-Writeback examines both system-wide and per-cgroup dirty memory status
-and enforces the more restrictive of the two. Also, writeback control
-parameters which are absolute values - vm.dirty_bytes and
-vm.dirty_background_bytes - are distributed across cgroups according
-to their current writeback bandwidth.
-
-There's a peculiarity stemming from the discrepancy in ownership
-granularity between memory controller and writeback. While memory
-controller tracks ownership per page, writeback operates on inode
-basis. cgroup writeback bridges the gap by tracking ownership by
-inode but migrating ownership if too many foreign pages, pages which
-don't match the current inode ownership, have been encountered while
-writing back the inode.
-
-This is a conscious design choice as writeback operations are
-inherently tied to inodes making strictly following page ownership
-complicated and inefficient. The only use case which suffers from
-this compromise is multiple cgroups concurrently dirtying disjoint
-regions of the same inode, which is an unlikely use case and decided
-to be unsupported. Note that as memory controller assigns page
-ownership on the first use and doesn't update it until the page is
-released, even if cgroup writeback strictly follows page ownership,
-multiple cgroups dirtying overlapping areas wouldn't work as expected.
-In general, write-sharing an inode across multiple cgroups is not well
-supported.
-
-Filesystem support for cgroup writeback
----------------------------------------
-
-A filesystem can make writeback IOs cgroup-aware by updating
-address_space_operations->writepage[s]() to annotate bio's using the
-following two functions.
-
-* wbc_init_bio(@wbc, @bio)
-
- Should be called for each bio carrying writeback data and associates
- the bio with the inode's owner cgroup. Can be called anytime
- between bio allocation and submission.
-
-* wbc_account_io(@wbc, @page, @bytes)
-
- Should be called for each data segment being written out. While
- this function doesn't care exactly when it's called during the
- writeback session, it's the easiest and most natural to call it as
- data segments are added to a bio.
-
-With writeback bio's annotated, cgroup support can be enabled per
-super_block by setting MS_CGROUPWB in ->s_flags. This allows for
-selective disabling of cgroup writeback support which is helpful when
-certain filesystem features, e.g. journaled data mode, are
-incompatible.
-
-wbc_init_bio() binds the specified bio to its cgroup. Depending on
-the configuration, the bio may be executed at a lower priority and if
-the writeback session is holding shared resources, e.g. a journal
-entry, may lead to priority inversion. There is no one easy solution
-for the problem. Filesystems can try to work around specific problem
-cases by skipping wbc_init_bio() or using bio_associate_blkcg()
-directly.
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroup-v1/cgroups.txt
index c6256ae..c6256ae 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroup-v1/cgroups.txt
diff --git a/Documentation/cgroups/cpuacct.txt b/Documentation/cgroup-v1/cpuacct.txt
index 9d73cc0..9d73cc0 100644
--- a/Documentation/cgroups/cpuacct.txt
+++ b/Documentation/cgroup-v1/cpuacct.txt
diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroup-v1/cpusets.txt
index fdf7dff..fdf7dff 100644
--- a/Documentation/cgroups/cpusets.txt
+++ b/Documentation/cgroup-v1/cpusets.txt
diff --git a/Documentation/cgroups/devices.txt b/Documentation/cgroup-v1/devices.txt
index 3c1095c..3c1095c 100644
--- a/Documentation/cgroups/devices.txt
+++ b/Documentation/cgroup-v1/devices.txt
diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroup-v1/freezer-subsystem.txt
index e831cb2..e831cb2 100644
--- a/Documentation/cgroups/freezer-subsystem.txt
+++ b/Documentation/cgroup-v1/freezer-subsystem.txt
diff --git a/Documentation/cgroups/hugetlb.txt b/Documentation/cgroup-v1/hugetlb.txt
index 106245c..106245c 100644
--- a/Documentation/cgroups/hugetlb.txt
+++ b/Documentation/cgroup-v1/hugetlb.txt
diff --git a/Documentation/cgroups/memcg_test.txt b/Documentation/cgroup-v1/memcg_test.txt
index 8870b02..8870b02 100644
--- a/Documentation/cgroups/memcg_test.txt
+++ b/Documentation/cgroup-v1/memcg_test.txt
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroup-v1/memory.txt
index ff71e16..ff71e16 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroup-v1/memory.txt
diff --git a/Documentation/cgroups/net_cls.txt b/Documentation/cgroup-v1/net_cls.txt
index ec18234..ec18234 100644
--- a/Documentation/cgroups/net_cls.txt
+++ b/Documentation/cgroup-v1/net_cls.txt
diff --git a/Documentation/cgroups/net_prio.txt b/Documentation/cgroup-v1/net_prio.txt
index a82cbd2..a82cbd2 100644
--- a/Documentation/cgroups/net_prio.txt
+++ b/Documentation/cgroup-v1/net_prio.txt
diff --git a/Documentation/cgroups/pids.txt b/Documentation/cgroup-v1/pids.txt
index 1a078b5..1a078b5 100644
--- a/Documentation/cgroups/pids.txt
+++ b/Documentation/cgroup-v1/pids.txt
diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
new file mode 100644
index 0000000..31d1f7b
--- /dev/null
+++ b/Documentation/cgroup-v2.txt
@@ -0,0 +1,1293 @@
+
+Control Group v2
+
+October, 2015 Tejun Heo <tj@kernel.org>
+
+This is the authoritative documentation on the design, interface and
+conventions of cgroup v2. It describes all userland-visible aspects
+of cgroup including core and specific controller behaviors. All
+future changes must be reflected in this document. Documentation for
+v1 is available under Documentation/cgroup-legacy/.
+
+CONTENTS
+
+1. Introduction
+ 1-1. Terminology
+ 1-2. What is cgroup?
+2. Basic Operations
+ 2-1. Mounting
+ 2-2. Organizing Processes
+ 2-3. [Un]populated Notification
+ 2-4. Controlling Controllers
+ 2-4-1. Enabling and Disabling
+ 2-4-2. Top-down Constraint
+ 2-4-3. No Internal Process Constraint
+ 2-5. Delegation
+ 2-5-1. Model of Delegation
+ 2-5-2. Delegation Containment
+ 2-6. Guidelines
+ 2-6-1. Organize Once and Control
+ 2-6-2. Avoid Name Collisions
+3. Resource Distribution Models
+ 3-1. Weights
+ 3-2. Limits
+ 3-3. Protections
+ 3-4. Allocations
+4. Interface Files
+ 4-1. Format
+ 4-2. Conventions
+ 4-3. Core Interface Files
+5. Controllers
+ 5-1. CPU
+ 5-1-1. CPU Interface Files
+ 5-2. Memory
+ 5-2-1. Memory Interface Files
+ 5-2-2. Usage Guidelines
+ 5-2-3. Memory Ownership
+ 5-3. IO
+ 5-3-1. IO Interface Files
+ 5-3-2. Writeback
+P. Information on Kernel Programming
+ P-1. Filesystem Support for Writeback
+D. Deprecated v1 Core Features
+R. Issues with v1 and Rationales for v2
+ R-1. Multiple Hierarchies
+ R-2. Thread Granularity
+ R-3. Competition Between Inner Nodes and Threads
+ R-4. Other Interface Issues
+ R-5. Controller Issues and Remedies
+ R-5-1. Memory
+
+
+1. Introduction
+
+1-1. Terminology
+
+"cgroup" stands for "control group" and is never capitalized. The
+singular form is used to designate the whole feature and also as a
+qualifier as in "cgroup controllers". When explicitly referring to
+multiple individual control groups, the plural form "cgroups" is used.
+
+
+1-2. What is cgroup?
+
+cgroup is a mechanism to organize processes hierarchically and
+distribute system resources along the hierarchy in a controlled and
+configurable manner.
+
+cgroup is largely composed of two parts - the core and controllers.
+cgroup core is primarily responsible for hierarchically organizing
+processes. A cgroup controller is usually responsible for
+distributing a specific type of system resource along the hierarchy
+although there are utility controllers which serve purposes other than
+resource distribution.
+
+cgroups form a tree structure and every process in the system belongs
+to one and only one cgroup. All threads of a process belong to the
+same cgroup. On creation, all processes are put in the cgroup that
+the parent process belongs to at the time. A process can be migrated
+to another cgroup. Migration of a process doesn't affect already
+existing descendant processes.
+
+Following certain structural constraints, controllers may be enabled or
+disabled selectively on a cgroup. All controller behaviors are
+hierarchical - if a controller is enabled on a cgroup, it affects all
+processes which belong to the cgroups consisting the inclusive
+sub-hierarchy of the cgroup. When a controller is enabled on a nested
+cgroup, it always restricts the resource distribution further. The
+restrictions set closer to the root in the hierarchy can not be
+overridden from further away.
+
+
+2. Basic Operations
+
+2-1. Mounting
+
+Unlike v1, cgroup v2 has only single hierarchy. The cgroup v2
+hierarchy can be mounted with the following mount command.
+
+ # mount -t cgroup2 none $MOUNT_POINT
+
+cgroup2 filesystem has the magic number 0x63677270 ("cgrp"). All
+controllers which support v2 and are not bound to a v1 hierarchy are
+automatically bound to the v2 hierarchy and show up at the root.
+Controllers which are not in active use in the v2 hierarchy can be
+bound to other hierarchies. This allows mixing v2 hierarchy with the
+legacy v1 multiple hierarchies in a fully backward compatible way.
+
+A controller can be moved across hierarchies only after the controller
+is no longer referenced in its current hierarchy. Because per-cgroup
+controller states are destroyed asynchronously and controllers may
+have lingering references, a controller may not show up immediately on
+the v2 hierarchy after the final umount of the previous hierarchy.
+Similarly, a controller should be fully disabled to be moved out of
+the unified hierarchy and it may take some time for the disabled
+controller to become available for other hierarchies; furthermore, due
+to inter-controller dependencies, other controllers may need to be
+disabled too.
+
+While useful for development and manual configurations, moving
+controllers dynamically between the v2 and other hierarchies is
+strongly discouraged for production use. It is recommended to decide
+the hierarchies and controller associations before starting using the
+controllers after system boot.
+
+
+2-2. Organizing Processes
+
+Initially, only the root cgroup exists to which all processes belong.
+A child cgroup can be created by creating a sub-directory.
+
+ # mkdir $CGROUP_NAME
+
+A given cgroup may have multiple child cgroups forming a tree
+structure. Each cgroup has a read-writable interface file
+"cgroup.procs". When read, it lists the PIDs of all processes which
+belong to the cgroup one-per-line. The PIDs are not ordered and the
+same PID may show up more than once if the process got moved to
+another cgroup and then back or the PID got recycled while reading.
+
+A process can be migrated into a cgroup by writing its PID to the
+target cgroup's "cgroup.procs" file. Only one process can be migrated
+on a single write(2) call. If a process is composed of multiple
+threads, writing the PID of any thread migrates all threads of the
+process.
+
+When a process forks a child process, the new process is born into the
+cgroup that the forking process belongs to at the time of the
+operation. After exit, a process stays associated with the cgroup
+that it belonged to at the time of exit until it's reaped; however, a
+zombie process does not appear in "cgroup.procs" and thus can't be
+moved to another cgroup.
+
+A cgroup which doesn't have any children or live processes can be
+destroyed by removing the directory. Note that a cgroup which doesn't
+have any children and is associated only with zombie processes is
+considered empty and can be removed.
+
+ # rmdir $CGROUP_NAME
+
+"/proc/$PID/cgroup" lists a process's cgroup membership. If legacy
+cgroup is in use in the system, this file may contain multiple lines,
+one for each hierarchy. The entry for cgroup v2 is always in the
+format "0::$PATH".
+
+ # cat /proc/842/cgroup
+ ...
+ 0::/test-cgroup/test-cgroup-nested
+
+If the process becomes a zombie and the cgroup it was associated with
+is removed subsequently, " (deleted)" is appended to the path.
+
+ # cat /proc/842/cgroup
+ ...
+ 0::/test-cgroup/test-cgroup-nested (deleted)
+
+
+2-3. [Un]populated Notification
+
+Each non-root cgroup has a "cgroup.events" file which contains
+"populated" field indicating whether the cgroup's sub-hierarchy has
+live processes in it. Its value is 0 if there is no live process in
+the cgroup and its descendants; otherwise, 1. poll and [id]notify
+events are triggered when the value changes. This can be used, for
+example, to start a clean-up operation after all processes of a given
+sub-hierarchy have exited. The populated state updates and
+notifications are recursive. Consider the following sub-hierarchy
+where the numbers in the parentheses represent the numbers of processes
+in each cgroup.
+
+ A(4) - B(0) - C(1)
+ \ D(0)
+
+A, B and C's "populated" fields would be 1 while D's 0. After the one
+process in C exits, B and C's "populated" fields would flip to "0" and
+file modified events will be generated on the "cgroup.events" files of
+both cgroups.
+
+
+2-4. Controlling Controllers
+
+2-4-1. Enabling and Disabling
+
+Each cgroup has a "cgroup.controllers" file which lists all
+controllers available for the cgroup to enable.
+
+ # cat cgroup.controllers
+ cpu io memory
+
+No controller is enabled by default. Controllers can be enabled and
+disabled by writing to the "cgroup.subtree_control" file.
+
+ # echo "+cpu +memory -io" > cgroup.subtree_control
+
+Only controllers which are listed in "cgroup.controllers" can be
+enabled. When multiple operations are specified as above, either they
+all succeed or fail. If multiple operations on the same controller
+are specified, the last one is effective.
+
+Enabling a controller in a cgroup indicates that the distribution of
+the target resource across its immediate children will be controlled.
+Consider the following sub-hierarchy. The enabled controllers are
+listed in parentheses.
+
+ A(cpu,memory) - B(memory) - C()
+ \ D()
+
+As A has "cpu" and "memory" enabled, A will control the distribution
+of CPU cycles and memory to its children, in this case, B. As B has
+"memory" enabled but not "CPU", C and D will compete freely on CPU
+cycles but their division of memory available to B will be controlled.
+
+As a controller regulates the distribution of the target resource to
+the cgroup's children, enabling it creates the controller's interface
+files in the child cgroups. In the above example, enabling "cpu" on B
+would create the "cpu." prefixed controller interface files in C and
+D. Likewise, disabling "memory" from B would remove the "memory."
+prefixed controller interface files from C and D. This means that the
+controller interface files - anything which doesn't start with
+"cgroup." are owned by the parent rather than the cgroup itself.
+
+
+2-4-2. Top-down Constraint
+
+Resources are distributed top-down and a cgroup can further distribute
+a resource only if the resource has been distributed to it from the
+parent. This means that all non-root "cgroup.subtree_control" files
+can only contain controllers which are enabled in the parent's
+"cgroup.subtree_control" file. A controller can be enabled only if
+the parent has the controller enabled and a controller can't be
+disabled if one or more children have it enabled.
+
+
+2-4-3. No Internal Process Constraint
+
+Non-root cgroups can only distribute resources to their children when
+they don't have any processes of their own. In other words, only
+cgroups which don't contain any processes can have controllers enabled
+in their "cgroup.subtree_control" files.
+
+This guarantees that, when a controller is looking at the part of the
+hierarchy which has it enabled, processes are always only on the
+leaves. This rules out situations where child cgroups compete against
+internal processes of the parent.
+
+The root cgroup is exempt from this restriction. Root contains
+processes and anonymous resource consumption which can't be associated
+with any other cgroups and requires special treatment from most
+controllers. How resource consumption in the root cgroup is governed
+is up to each controller.
+
+Note that the restriction doesn't get in the way if there is no
+enabled controller in the cgroup's "cgroup.subtree_control". This is
+important as otherwise it wouldn't be possible to create children of a
+populated cgroup. To control resource distribution of a cgroup, the
+cgroup must create children and transfer all its processes to the
+children before enabling controllers in its "cgroup.subtree_control"
+file.
+
+
+2-5. Delegation
+
+2-5-1. Model of Delegation
+
+A cgroup can be delegated to a less privileged user by granting write
+access of the directory and its "cgroup.procs" file to the user. Note
+that resource control interface files in a given directory control the
+distribution of the parent's resources and thus must not be delegated
+along with the directory.
+
+Once delegated, the user can build sub-hierarchy under the directory,
+organize processes as it sees fit and further distribute the resources
+it received from the parent. The limits and other settings of all
+resource controllers are hierarchical and regardless of what happens
+in the delegated sub-hierarchy, nothing can escape the resource
+restrictions imposed by the parent.
+
+Currently, cgroup doesn't impose any restrictions on the number of
+cgroups in or nesting depth of a delegated sub-hierarchy; however,
+this may be limited explicitly in the future.
+
+
+2-5-2. Delegation Containment
+
+A delegated sub-hierarchy is contained in the sense that processes
+can't be moved into or out of the sub-hierarchy by the delegatee. For
+a process with a non-root euid to migrate a target process into a
+cgroup by writing its PID to the "cgroup.procs" file, the following
+conditions must be met.
+
+- The writer's euid must match either uid or suid of the target process.
+
+- The writer must have write access to the "cgroup.procs" file.
+
+- The writer must have write access to the "cgroup.procs" file of the
+ common ancestor of the source and destination cgroups.
+
+The above three constraints ensure that while a delegatee may migrate
+processes around freely in the delegated sub-hierarchy it can't pull
+in from or push out to outside the sub-hierarchy.
+
+For an example, let's assume cgroups C0 and C1 have been delegated to
+user U0 who created C00, C01 under C0 and C10 under C1 as follows and
+all processes under C0 and C1 belong to U0.
+
+ ~~~~~~~~~~~~~ - C0 - C00
+ ~ cgroup ~ \ C01
+ ~ hierarchy ~
+ ~~~~~~~~~~~~~ - C1 - C10
+
+Let's also say U0 wants to write the PID of a process which is
+currently in C10 into "C00/cgroup.procs". U0 has write access to the
+file and uid match on the process; however, the common ancestor of the
+source cgroup C10 and the destination cgroup C00 is above the points
+of delegation and U0 would not have write access to its "cgroup.procs"
+files and thus the write will be denied with -EACCES.
+
+
+2-6. Guidelines
+
+2-6-1. Organize Once and Control
+
+Migrating a process across cgroups is a relatively expensive operation
+and stateful resources such as memory are not moved together with the
+process. This is an explicit design decision as there often exist
+inherent trade-offs between migration and various hot paths in terms
+of synchronization cost.
+
+As such, migrating processes across cgroups frequently as a means to
+apply different resource restrictions is discouraged. A workload
+should be assigned to a cgroup according to the system's logical and
+resource structure once on start-up. Dynamic adjustments to resource
+distribution can be made by changing controller configuration through
+the interface files.
+
+
+2-6-2. Avoid Name Collisions
+
+Interface files for a cgroup and its children cgroups occupy the same
+directory and it is possible to create children cgroups which collide
+with interface files.
+
+All cgroup core interface files are prefixed with "cgroup." and each
+controller's interface files are prefixed with the controller name and
+a dot. A controller's name is composed of lower case alphabets and
+'_'s but never begins with an '_' so it can be used as the prefix
+character for collision avoidance. Also, interface file names won't
+start or end with terms which are often used in categorizing workloads
+such as job, service, slice, unit or workload.
+
+cgroup doesn't do anything to prevent name collisions and it's the
+user's responsibility to avoid them.
+
+
+3. Resource Distribution Models
+
+cgroup controllers implement several resource distribution schemes
+depending on the resource type and expected use cases. This section
+describes major schemes in use along with their expected behaviors.
+
+
+3-1. Weights
+
+A parent's resource is distributed by adding up the weights of all
+active children and giving each the fraction matching the ratio of its
+weight against the sum. As only children which can make use of the
+resource at the moment participate in the distribution, this is
+work-conserving. Due to the dynamic nature, this model is usually
+used for stateless resources.
+
+All weights are in the range [1, 10000] with the default at 100. This
+allows symmetric multiplicative biases in both directions at fine
+enough granularity while staying in the intuitive range.
+
+As long as the weight is in range, all configuration combinations are
+valid and there is no reason to reject configuration changes or
+process migrations.
+
+"cpu.weight" proportionally distributes CPU cycles to active children
+and is an example of this type.
+
+
+3-2. Limits
+
+A child can only consume upto the configured amount of the resource.
+Limits can be over-committed - the sum of the limits of children can
+exceed the amount of resource available to the parent.
+
+Limits are in the range [0, max] and defaults to "max", which is noop.
+
+As limits can be over-committed, all configuration combinations are
+valid and there is no reason to reject configuration changes or
+process migrations.
+
+"io.max" limits the maximum BPS and/or IOPS that a cgroup can consume
+on an IO device and is an example of this type.
+
+
+3-3. Protections
+
+A cgroup is protected to be allocated upto the configured amount of
+the resource if the usages of all its ancestors are under their
+protected levels. Protections can be hard guarantees or best effort
+soft boundaries. Protections can also be over-committed in which case
+only upto the amount available to the parent is protected among
+children.
+
+Protections are in the range [0, max] and defaults to 0, which is
+noop.
+
+As protections can be over-committed, all configuration combinations
+are valid and there is no reason to reject configuration changes or
+process migrations.
+
+"memory.low" implements best-effort memory protection and is an
+example of this type.
+
+
+3-4. Allocations
+
+A cgroup is exclusively allocated a certain amount of a finite
+resource. Allocations can't be over-committed - the sum of the
+allocations of children can not exceed the amount of resource
+available to the parent.
+
+Allocations are in the range [0, max] and defaults to 0, which is no
+resource.
+
+As allocations can't be over-committed, some configuration
+combinations are invalid and should be rejected. Also, if the
+resource is mandatory for execution of processes, process migrations
+may be rejected.
+
+"cpu.rt.max" hard-allocates realtime slices and is an example of this
+type.
+
+
+4. Interface Files
+
+4-1. Format
+
+All interface files should be in one of the following formats whenever
+possible.
+
+ New-line separated values
+ (when only one value can be written at once)
+
+ VAL0\n
+ VAL1\n
+ ...
+
+ Space separated values
+ (when read-only or multiple values can be written at once)
+
+ VAL0 VAL1 ...\n
+
+ Flat keyed
+
+ KEY0 VAL0\n
+ KEY1 VAL1\n
+ ...
+
+ Nested keyed
+
+ KEY0 SUB_KEY0=VAL00 SUB_KEY1=VAL01...
+ KEY1 SUB_KEY0=VAL10 SUB_KEY1=VAL11...
+ ...
+
+For a writable file, the format for writing should generally match
+reading; however, controllers may allow omitting later fields or
+implement restricted shortcuts for most common use cases.
+
+For both flat and nested keyed files, only the values for a single key
+can be written at a time. For nested keyed files, the sub key pairs
+may be specified in any order and not all pairs have to be specified.
+
+
+4-2. Conventions
+
+- Settings for a single feature should be contained in a single file.
+
+- The root cgroup should be exempt from resource control and thus
+ shouldn't have resource control interface files. Also,
+ informational files on the root cgroup which end up showing global
+ information available elsewhere shouldn't exist.
+
+- If a controller implements weight based resource distribution, its
+ interface file should be named "weight" and have the range [1,
+ 10000] with 100 as the default. The values are chosen to allow
+ enough and symmetric bias in both directions while keeping it
+ intuitive (the default is 100%).
+
+- If a controller implements an absolute resource guarantee and/or
+ limit, the interface files should be named "min" and "max"
+ respectively. If a controller implements best effort resource
+ guarantee and/or limit, the interface files should be named "low"
+ and "high" respectively.
+
+ In the above four control files, the special token "max" should be
+ used to represent upward infinity for both reading and writing.
+
+- If a setting has a configurable default value and keyed specific
+ overrides, the default entry should be keyed with "default" and
+ appear as the first entry in the file.
+
+ The default value can be updated by writing either "default $VAL" or
+ "$VAL".
+
+ When writing to update a specific override, "default" can be used as
+ the value to indicate removal of the override. Override entries
+ with "default" as the value must not appear when read.
+
+ For example, a setting which is keyed by major:minor device numbers
+ with integer values may look like the following.
+
+ # cat cgroup-example-interface-file
+ default 150
+ 8:0 300
+
+ The default value can be updated by
+
+ # echo 125 > cgroup-example-interface-file
+
+ or
+
+ # echo "default 125" > cgroup-example-interface-file
+
+ An override can be set by
+
+ # echo "8:16 170" > cgroup-example-interface-file
+
+ and cleared by
+
+ # echo "8:0 default" > cgroup-example-interface-file
+ # cat cgroup-example-interface-file
+ default 125
+ 8:16 170
+
+- For events which are not very high frequency, an interface file
+ "events" should be created which lists event key value pairs.
+ Whenever a notifiable event happens, file modified event should be
+ generated on the file.
+
+
+4-3. Core Interface Files
+
+All cgroup core files are prefixed with "cgroup."
+
+ cgroup.procs
+
+ A read-write new-line separated values file which exists on
+ all cgroups.
+
+ When read, it lists the PIDs of all processes which belong to
+ the cgroup one-per-line. The PIDs are not ordered and the
+ same PID may show up more than once if the process got moved
+ to another cgroup and then back or the PID got recycled while
+ reading.
+
+ A PID can be written to migrate the process associated with
+ the PID to the cgroup. The writer should match all of the
+ following conditions.
+
+ - Its euid is either root or must match either uid or suid of
+ the target process.
+
+ - It must have write access to the "cgroup.procs" file.
+
+ - It must have write access to the "cgroup.procs" file of the
+ common ancestor of the source and destination cgroups.
+
+ When delegating a sub-hierarchy, write access to this file
+ should be granted along with the containing directory.
+
+ cgroup.controllers
+
+ A read-only space separated values file which exists on all
+ cgroups.
+
+ It shows space separated list of all controllers available to
+ the cgroup. The controllers are not ordered.
+
+ cgroup.subtree_control
+
+ A read-write space separated values file which exists on all
+ cgroups. Starts out empty.
+
+ When read, it shows space separated list of the controllers
+ which are enabled to control resource distribution from the
+ cgroup to its children.
+
+ Space separated list of controllers prefixed with '+' or '-'
+ can be written to enable or disable controllers. A controller
+ name prefixed with '+' enables the controller and '-'
+ disables. If a controller appears more than once on the list,
+ the last one is effective. When multiple enable and disable
+ operations are specified, either all succeed or all fail.
+
+ cgroup.events
+
+ A read-only flat-keyed file which exists on non-root cgroups.
+ The following entries are defined. Unless specified
+ otherwise, a value change in this file generates a file
+ modified event.
+
+ populated
+
+ 1 if the cgroup or its descendants contains any live
+ processes; otherwise, 0.
+
+
+5. Controllers
+
+5-1. CPU
+
+[NOTE: The interface for the cpu controller hasn't been merged yet]
+
+The "cpu" controllers regulates distribution of CPU cycles. This
+controller implements weight and absolute bandwidth limit models for
+normal scheduling policy and absolute bandwidth allocation model for
+realtime scheduling policy.
+
+
+5-1-1. CPU Interface Files
+
+All time durations are in microseconds.
+
+ cpu.stat
+
+ A read-only flat-keyed file which exists on non-root cgroups.
+
+ It reports the following six stats.
+
+ usage_usec
+ user_usec
+ system_usec
+ nr_periods
+ nr_throttled
+ throttled_usec
+
+ cpu.weight
+
+ A read-write single value file which exists on non-root
+ cgroups. The default is "100".
+
+ The weight in the range [1, 10000].
+
+ cpu.max
+
+ A read-write two value file which exists on non-root cgroups.
+ The default is "max 100000".
+
+ The maximum bandwidth limit. It's in the following format.
+
+ $MAX $PERIOD
+
+ which indicates that the group may consume upto $MAX in each
+ $PERIOD duration. "max" for $MAX indicates no limit. If only
+ one number is written, $MAX is updated.
+
+ cpu.rt.max
+
+ [NOTE: The semantics of this file is still under discussion and the
+ interface hasn't been merged yet]
+
+ A read-write two value file which exists on all cgroups.
+ The default is "0 100000".
+
+ The maximum realtime runtime allocation. Over-committing
+ configurations are disallowed and process migrations are
+ rejected if not enough bandwidth is available. It's in the
+ following format.
+
+ $MAX $PERIOD
+
+ which indicates that the group may consume upto $MAX in each
+ $PERIOD duration. If only one number is written, $MAX is
+ updated.
+
+
+5-2. Memory
+
+The "memory" controller regulates distribution of memory. Memory is
+stateful and implements both limit and protection models. Due to the
+intertwining between memory usage and reclaim pressure and the
+stateful nature of memory, the distribution model is relatively
+complex.
+
+While not completely water-tight, all major memory usages by a given
+cgroup are tracked so that the total memory consumption can be
+accounted and controlled to a reasonable extent. Currently, the
+following types of memory usages are tracked.
+
+- Userland memory - page cache and anonymous memory.
+
+- Kernel data structures such as dentries and inodes.
+
+- TCP socket buffers.
+
+The above list may expand in the future for better coverage.
+
+
+5-2-1. Memory Interface Files
+
+All memory amounts are in bytes. If a value which is not aligned to
+PAGE_SIZE is written, the value may be rounded up to the closest
+PAGE_SIZE multiple when read back.
+
+ memory.current
+
+ A read-only single value file which exists on non-root
+ cgroups.
+
+ The total amount of memory currently being used by the cgroup
+ and its descendants.
+
+ memory.low
+
+ A read-write single value file which exists on non-root
+ cgroups. The default is "0".
+
+ Best-effort memory protection. If the memory usages of a
+ cgroup and all its ancestors are below their low boundaries,
+ the cgroup's memory won't be reclaimed unless memory can be
+ reclaimed from unprotected cgroups.
+
+ Putting more memory than generally available under this
+ protection is discouraged.
+
+ memory.high
+
+ A read-write single value file which exists on non-root
+ cgroups. The default is "max".
+
+ Memory usage throttle limit. This is the main mechanism to
+ control memory usage of a cgroup. If a cgroup's usage goes
+ over the high boundary, the processes of the cgroup are
+ throttled and put under heavy reclaim pressure.
+
+ Going over the high limit never invokes the OOM killer and
+ under extreme conditions the limit may be breached.
+
+ memory.max
+
+ A read-write single value file which exists on non-root
+ cgroups. The default is "max".
+
+ Memory usage hard limit. This is the final protection
+ mechanism. If a cgroup's memory usage reaches this limit and
+ can't be reduced, the OOM killer is invoked in the cgroup.
+ Under certain circumstances, the usage may go over the limit
+ temporarily.
+
+ This is the ultimate protection mechanism. As long as the
+ high limit is used and monitored properly, this limit's
+ utility is limited to providing the final safety net.
+
+ memory.events
+
+ A read-only flat-keyed file which exists on non-root cgroups.
+ The following entries are defined. Unless specified
+ otherwise, a value change in this file generates a file
+ modified event.
+
+ low
+
+ The number of times the cgroup is reclaimed due to
+ high memory pressure even though its usage is under
+ the low boundary. This usually indicates that the low
+ boundary is over-committed.
+
+ high
+
+ The number of times processes of the cgroup are
+ throttled and routed to perform direct memory reclaim
+ because the high memory boundary was exceeded. For a
+ cgroup whose memory usage is capped by the high limit
+ rather than global memory pressure, this event's
+ occurrences are expected.
+
+ max
+
+ The number of times the cgroup's memory usage was
+ about to go over the max boundary. If direct reclaim
+ fails to bring it down, the OOM killer is invoked.
+
+ oom
+
+ The number of times the OOM killer has been invoked in
+ the cgroup. This may not exactly match the number of
+ processes killed but should generally be close.
+
+
+5-2-2. General Usage
+
+"memory.high" is the main mechanism to control memory usage.
+Over-committing on high limit (sum of high limits > available memory)
+and letting global memory pressure to distribute memory according to
+usage is a viable strategy.
+
+Because breach of the high limit doesn't trigger the OOM killer but
+throttles the offending cgroup, a management agent has ample
+opportunities to monitor and take appropriate actions such as granting
+more memory or terminating the workload.
+
+Determining whether a cgroup has enough memory is not trivial as
+memory usage doesn't indicate whether the workload can benefit from
+more memory. For example, a workload which writes data received from
+network to a file can use all available memory but can also operate as
+performant with a small amount of memory. A measure of memory
+pressure - how much the workload is being impacted due to lack of
+memory - is necessary to determine whether a workload needs more
+memory; unfortunately, memory pressure monitoring mechanism isn't
+implemented yet.
+
+
+5-2-3. Memory Ownership
+
+A memory area is charged to the cgroup which instantiated it and stays
+charged to the cgroup until the area is released. Migrating a process
+to a different cgroup doesn't move the memory usages that it
+instantiated while in the previous cgroup to the new cgroup.
+
+A memory area may be used by processes belonging to different cgroups.
+To which cgroup the area will be charged is in-deterministic; however,
+over time, the memory area is likely to end up in a cgroup which has
+enough memory allowance to avoid high reclaim pressure.
+
+If a cgroup sweeps a considerable amount of memory which is expected
+to be accessed repeatedly by other cgroups, it may make sense to use
+POSIX_FADV_DONTNEED to relinquish the ownership of memory areas
+belonging to the affected files to ensure correct memory ownership.
+
+
+5-3. IO
+
+The "io" controller regulates the distribution of IO resources. This
+controller implements both weight based and absolute bandwidth or IOPS
+limit distribution; however, weight based distribution is available
+only if cfq-iosched is in use and neither scheme is available for
+blk-mq devices.
+
+
+5-3-1. IO Interface Files
+
+ io.stat
+
+ A read-only nested-keyed file which exists on non-root
+ cgroups.
+
+ Lines are keyed by $MAJ:$MIN device numbers and not ordered.
+ The following nested keys are defined.
+
+ rbytes Bytes read
+ wbytes Bytes written
+ rios Number of read IOs
+ wios Number of write IOs
+
+ An example read output follows.
+
+ 8:16 rbytes=1459200 wbytes=314773504 rios=192 wios=353
+ 8:0 rbytes=90430464 wbytes=299008000 rios=8950 wios=1252
+
+ io.weight
+
+ A read-write flat-keyed file which exists on non-root cgroups.
+ The default is "default 100".
+
+ The first line is the default weight applied to devices
+ without specific override. The rest are overrides keyed by
+ $MAJ:$MIN device numbers and not ordered. The weights are in
+ the range [1, 10000] and specifies the relative amount IO time
+ the cgroup can use in relation to its siblings.
+
+ The default weight can be updated by writing either "default
+ $WEIGHT" or simply "$WEIGHT". Overrides can be set by writing
+ "$MAJ:$MIN $WEIGHT" and unset by writing "$MAJ:$MIN default".
+
+ An example read output follows.
+
+ default 100
+ 8:16 200
+ 8:0 50
+
+ io.max
+
+ A read-write nested-keyed file which exists on non-root
+ cgroups.
+
+ BPS and IOPS based IO limit. Lines are keyed by $MAJ:$MIN
+ device numbers and not ordered. The following nested keys are
+ defined.
+
+ rbps Max read bytes per second
+ wbps Max write bytes per second
+ riops Max read IO operations per second
+ wiops Max write IO operations per second
+
+ When writing, any number of nested key-value pairs can be
+ specified in any order. "max" can be specified as the value
+ to remove a specific limit. If the same key is specified
+ multiple times, the outcome is undefined.
+
+ BPS and IOPS are measured in each IO direction and IOs are
+ delayed if limit is reached. Temporary bursts are allowed.
+
+ Setting read limit at 2M BPS and write at 120 IOPS for 8:16.
+
+ echo "8:16 rbps=2097152 wiops=120" > io.max
+
+ Reading returns the following.
+
+ 8:16 rbps=2097152 wbps=max riops=max wiops=120
+
+ Write IOPS limit can be removed by writing the following.
+
+ echo "8:16 wiops=max" > io.max
+
+ Reading now returns the following.
+
+ 8:16 rbps=2097152 wbps=max riops=max wiops=max
+
+
+5-3-2. Writeback
+
+Page cache is dirtied through buffered writes and shared mmaps and
+written asynchronously to the backing filesystem by the writeback
+mechanism. Writeback sits between the memory and IO domains and
+regulates the proportion of dirty memory by balancing dirtying and
+write IOs.
+
+The io controller, in conjunction with the memory controller,
+implements control of page cache writeback IOs. The memory controller
+defines the memory domain that dirty memory ratio is calculated and
+maintained for and the io controller defines the io domain which
+writes out dirty pages for the memory domain. Both system-wide and
+per-cgroup dirty memory states are examined and the more restrictive
+of the two is enforced.
+
+cgroup writeback requires explicit support from the underlying
+filesystem. Currently, cgroup writeback is implemented on ext2, ext4
+and btrfs. On other filesystems, all writeback IOs are attributed to
+the root cgroup.
+
+There are inherent differences in memory and writeback management
+which affects how cgroup ownership is tracked. Memory is tracked per
+page while writeback per inode. For the purpose of writeback, an
+inode is assigned to a cgroup and all IO requests to write dirty pages
+from the inode are attributed to that cgroup.
+
+As cgroup ownership for memory is tracked per page, there can be pages
+which are associated with different cgroups than the one the inode is
+associated with. These are called foreign pages. The writeback
+constantly keeps track of foreign pages and, if a particular foreign
+cgroup becomes the majority over a certain period of time, switches
+the ownership of the inode to that cgroup.
+
+While this model is enough for most use cases where a given inode is
+mostly dirtied by a single cgroup even when the main writing cgroup
+changes over time, use cases where multiple cgroups write to a single
+inode simultaneously are not supported well. In such circumstances, a
+significant portion of IOs are likely to be attributed incorrectly.
+As memory controller assigns page ownership on the first use and
+doesn't update it until the page is released, even if writeback
+strictly follows page ownership, multiple cgroups dirtying overlapping
+areas wouldn't work as expected. It's recommended to avoid such usage
+patterns.
+
+The sysctl knobs which affect writeback behavior are applied to cgroup
+writeback as follows.
+
+ vm.dirty_background_ratio
+ vm.dirty_ratio
+
+ These ratios apply the same to cgroup writeback with the
+ amount of available memory capped by limits imposed by the
+ memory controller and system-wide clean memory.
+
+ vm.dirty_background_bytes
+ vm.dirty_bytes
+
+ For cgroup writeback, this is calculated into ratio against
+ total available memory and applied the same way as
+ vm.dirty[_background]_ratio.
+
+
+P. Information on Kernel Programming
+
+This section contains kernel programming information in the areas
+where interacting with cgroup is necessary. cgroup core and
+controllers are not covered.
+
+
+P-1. Filesystem Support for Writeback
+
+A filesystem can support cgroup writeback by updating
+address_space_operations->writepage[s]() to annotate bio's using the
+following two functions.
+
+ wbc_init_bio(@wbc, @bio)
+
+ Should be called for each bio carrying writeback data and
+ associates the bio with the inode's owner cgroup. Can be
+ called anytime between bio allocation and submission.
+
+ wbc_account_io(@wbc, @page, @bytes)
+
+ Should be called for each data segment being written out.
+ While this function doesn't care exactly when it's called
+ during the writeback session, it's the easiest and most
+ natural to call it as data segments are added to a bio.
+
+With writeback bio's annotated, cgroup support can be enabled per
+super_block by setting SB_I_CGROUPWB in ->s_iflags. This allows for
+selective disabling of cgroup writeback support which is helpful when
+certain filesystem features, e.g. journaled data mode, are
+incompatible.
+
+wbc_init_bio() binds the specified bio to its cgroup. Depending on
+the configuration, the bio may be executed at a lower priority and if
+the writeback session is holding shared resources, e.g. a journal
+entry, may lead to priority inversion. There is no one easy solution
+for the problem. Filesystems can try to work around specific problem
+cases by skipping wbc_init_bio() or using bio_associate_blkcg()
+directly.
+
+
+D. Deprecated v1 Core Features
+
+- Multiple hierarchies including named ones are not supported.
+
+- All mount options and remounting are not supported.
+
+- The "tasks" file is removed and "cgroup.procs" is not sorted.
+
+- "cgroup.clone_children" is removed.
+
+- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file
+ at the root instead.
+
+
+R. Issues with v1 and Rationales for v2
+
+R-1. Multiple Hierarchies
+
+cgroup v1 allowed an arbitrary number of hierarchies and each
+hierarchy could host any number of controllers. While this seemed to
+provide a high level of flexibility, it wasn't useful in practice.
+
+For example, as there is only one instance of each controller, utility
+type controllers such as freezer which can be useful in all
+hierarchies could only be used in one. The issue is exacerbated by
+the fact that controllers couldn't be moved to another hierarchy once
+hierarchies were populated. Another issue was that all controllers
+bound to a hierarchy were forced to have exactly the same view of the
+hierarchy. It wasn't possible to vary the granularity depending on
+the specific controller.
+
+In practice, these issues heavily limited which controllers could be
+put on the same hierarchy and most configurations resorted to putting
+each controller on its own hierarchy. Only closely related ones, such
+as the cpu and cpuacct controllers, made sense to be put on the same
+hierarchy. This often meant that userland ended up managing multiple
+similar hierarchies repeating the same steps on each hierarchy
+whenever a hierarchy management operation was necessary.
+
+Furthermore, support for multiple hierarchies came at a steep cost.
+It greatly complicated cgroup core implementation but more importantly
+the support for multiple hierarchies restricted how cgroup could be
+used in general and what controllers was able to do.
+
+There was no limit on how many hierarchies there might be, which meant
+that a thread's cgroup membership couldn't be described in finite
+length. The key might contain any number of entries and was unlimited
+in length, which made it highly awkward to manipulate and led to
+addition of controllers which existed only to identify membership,
+which in turn exacerbated the original problem of proliferating number
+of hierarchies.
+
+Also, as a controller couldn't have any expectation regarding the
+topologies of hierarchies other controllers might be on, each
+controller had to assume that all other controllers were attached to
+completely orthogonal hierarchies. This made it impossible, or at
+least very cumbersome, for controllers to cooperate with each other.
+
+In most use cases, putting controllers on hierarchies which are
+completely orthogonal to each other isn't necessary. What usually is
+called for is the ability to have differing levels of granularity
+depending on the specific controller. In other words, hierarchy may
+be collapsed from leaf towards root when viewed from specific
+controllers. For example, a given configuration might not care about
+how memory is distributed beyond a certain level while still wanting
+to control how CPU cycles are distributed.
+
+
+R-2. Thread Granularity
+
+cgroup v1 allowed threads of a process to belong to different cgroups.
+This didn't make sense for some controllers and those controllers
+ended up implementing different ways to ignore such situations but
+much more importantly it blurred the line between API exposed to
+individual applications and system management interface.
+
+Generally, in-process knowledge is available only to the process
+itself; thus, unlike service-level organization of processes,
+categorizing threads of a process requires active participation from
+the application which owns the target process.
+
+cgroup v1 had an ambiguously defined delegation model which got abused
+in combination with thread granularity. cgroups were delegated to
+individual applications so that they can create and manage their own
+sub-hierarchies and control resource distributions along them. This
+effectively raised cgroup to the status of a syscall-like API exposed
+to lay programs.
+
+First of all, cgroup has a fundamentally inadequate interface to be
+exposed this way. For a process to access its own knobs, it has to
+extract the path on the target hierarchy from /proc/self/cgroup,
+construct the path by appending the name of the knob to the path, open
+and then read and/or write to it. This is not only extremely clunky
+and unusual but also inherently racy. There is no conventional way to
+define transaction across the required steps and nothing can guarantee
+that the process would actually be operating on its own sub-hierarchy.
+
+cgroup controllers implemented a number of knobs which would never be
+accepted as public APIs because they were just adding control knobs to
+system-management pseudo filesystem. cgroup ended up with interface
+knobs which were not properly abstracted or refined and directly
+revealed kernel internal details. These knobs got exposed to
+individual applications through the ill-defined delegation mechanism
+effectively abusing cgroup as a shortcut to implementing public APIs
+without going through the required scrutiny.
+
+This was painful for both userland and kernel. Userland ended up with
+misbehaving and poorly abstracted interfaces and kernel exposing and
+locked into constructs inadvertently.
+
+
+R-3. Competition Between Inner Nodes and Threads
+
+cgroup v1 allowed threads to be in any cgroups which created an
+interesting problem where threads belonging to a parent cgroup and its
+children cgroups competed for resources. This was nasty as two
+different types of entities competed and there was no obvious way to
+settle it. Different controllers did different things.
+
+The cpu controller considered threads and cgroups as equivalents and
+mapped nice levels to cgroup weights. This worked for some cases but
+fell flat when children wanted to be allocated specific ratios of CPU
+cycles and the number of internal threads fluctuated - the ratios
+constantly changed as the number of competing entities fluctuated.
+There also were other issues. The mapping from nice level to weight
+wasn't obvious or universal, and there were various other knobs which
+simply weren't available for threads.
+
+The io controller implicitly created a hidden leaf node for each
+cgroup to host the threads. The hidden leaf had its own copies of all
+the knobs with "leaf_" prefixed. While this allowed equivalent
+control over internal threads, it was with serious drawbacks. It
+always added an extra layer of nesting which wouldn't be necessary
+otherwise, made the interface messy and significantly complicated the
+implementation.
+
+The memory controller didn't have a way to control what happened
+between internal tasks and child cgroups and the behavior was not
+clearly defined. There were attempts to add ad-hoc behaviors and
+knobs to tailor the behavior to specific workloads which would have
+led to problems extremely difficult to resolve in the long term.
+
+Multiple controllers struggled with internal tasks and came up with
+different ways to deal with it; unfortunately, all the approaches were
+severely flawed and, furthermore, the widely different behaviors
+made cgroup as a whole highly inconsistent.
+
+This clearly is a problem which needs to be addressed from cgroup core
+in a uniform way.
+
+
+R-4. Other Interface Issues
+
+cgroup v1 grew without oversight and developed a large number of
+idiosyncrasies and inconsistencies. One issue on the cgroup core side
+was how an empty cgroup was notified - a userland helper binary was
+forked and executed for each event. The event delivery wasn't
+recursive or delegatable. The limitations of the mechanism also led
+to in-kernel event delivery filtering mechanism further complicating
+the interface.
+
+Controller interfaces were problematic too. An extreme example is
+controllers completely ignoring hierarchical organization and treating
+all cgroups as if they were all located directly under the root
+cgroup. Some controllers exposed a large amount of inconsistent
+implementation details to userland.
+
+There also was no consistency across controllers. When a new cgroup
+was created, some controllers defaulted to not imposing extra
+restrictions while others disallowed any resource usage until
+explicitly configured. Configuration knobs for the same type of
+control used widely differing naming schemes and formats. Statistics
+and information knobs were named arbitrarily and used different
+formats and units even in the same controller.
+
+cgroup v2 establishes common conventions where appropriate and updates
+controllers so that they expose minimal and consistent interfaces.
+
+
+R-5. Controller Issues and Remedies
+
+R-5-1. Memory
+
+The original lower boundary, the soft limit, is defined as a limit
+that is per default unset. As a result, the set of cgroups that
+global reclaim prefers is opt-in, rather than opt-out. The costs for
+optimizing these mostly negative lookups are so high that the
+implementation, despite its enormous size, does not even provide the
+basic desirable behavior. First off, the soft limit has no
+hierarchical meaning. All configured groups are organized in a global
+rbtree and treated like equal peers, regardless where they are located
+in the hierarchy. This makes subtree delegation impossible. Second,
+the soft limit reclaim pass is so aggressive that it not just
+introduces high allocation latencies into the system, but also impacts
+system performance due to overreclaim, to the point where the feature
+becomes self-defeating.
+
+The memory.low boundary on the other hand is a top-down allocated
+reserve. A cgroup enjoys reclaim protection when it and all its
+ancestors are below their low boundaries, which makes delegation of
+subtrees possible. Secondly, new cgroups have no reserve per default
+and in the common case most cgroups are eligible for the preferred
+reclaim pass. This allows the new low boundary to be efficiently
+implemented with just a minor addition to the generic reclaim code,
+without the need for out-of-band data structures and reclaim passes.
+Because the generic reclaim code considers all cgroups except for the
+ones running low in the preferred first reclaim pass, overreclaim of
+individual groups is eliminated as well, resulting in much better
+overall workload performance.
+
+The original high boundary, the hard limit, is defined as a strict
+limit that can not budge, even if the OOM killer has to be called.
+But this generally goes against the goal of making the most out of the
+available memory. The memory consumption of workloads varies during
+runtime, and that requires users to overcommit. But doing that with a
+strict upper limit requires either a fairly accurate prediction of the
+working set size or adding slack to the limit. Since working set size
+estimation is hard and error prone, and getting it wrong results in
+OOM kills, most users tend to err on the side of a looser limit and
+end up wasting precious resources.
+
+The memory.high boundary on the other hand can be set much more
+conservatively. When hit, it throttles allocations by forcing them
+into direct reclaim to work off the excess, but it never invokes the
+OOM killer. As a result, a high boundary that is chosen too
+aggressively will not terminate the processes, but instead it will
+lead to gradual performance degradation. The user can monitor this
+and make corrections until the minimal memory footprint that still
+gives acceptable performance is found.
+
+In extreme cases, with many concurrent allocations and a complete
+breakdown of reclaim progress within the group, the high boundary can
+be exceeded. But even then it's mostly better to satisfy the
+allocation from the slack available in other groups or the rest of the
+system than killing the group. Otherwise, memory.max is there to
+limit this type of spillover and ultimately contain buggy or even
+malicious applications.
diff --git a/Documentation/cgroups/unified-hierarchy.txt b/Documentation/cgroups/unified-hierarchy.txt
deleted file mode 100644
index 781b1d4..0000000
--- a/Documentation/cgroups/unified-hierarchy.txt
+++ /dev/null
@@ -1,647 +0,0 @@
-
-Cgroup unified hierarchy
-
-April, 2014 Tejun Heo <tj@kernel.org>
-
-This document describes the changes made by unified hierarchy and
-their rationales. It will eventually be merged into the main cgroup
-documentation.
-
-CONTENTS
-
-1. Background
-2. Basic Operation
- 2-1. Mounting
- 2-2. cgroup.subtree_control
- 2-3. cgroup.controllers
-3. Structural Constraints
- 3-1. Top-down
- 3-2. No internal tasks
-4. Delegation
- 4-1. Model of delegation
- 4-2. Common ancestor rule
-5. Other Changes
- 5-1. [Un]populated Notification
- 5-2. Other Core Changes
- 5-3. Controller File Conventions
- 5-3-1. Format
- 5-3-2. Control Knobs
- 5-4. Per-Controller Changes
- 5-4-1. io
- 5-4-2. cpuset
- 5-4-3. memory
-6. Planned Changes
- 6-1. CAP for resource control
-
-
-1. Background
-
-cgroup allows an arbitrary number of hierarchies and each hierarchy
-can host any number of controllers. While this seems to provide a
-high level of flexibility, it isn't quite useful in practice.
-
-For example, as there is only one instance of each controller, utility
-type controllers such as freezer which can be useful in all
-hierarchies can only be used in one. The issue is exacerbated by the
-fact that controllers can't be moved around once hierarchies are
-populated. Another issue is that all controllers bound to a hierarchy
-are forced to have exactly the same view of the hierarchy. It isn't
-possible to vary the granularity depending on the specific controller.
-
-In practice, these issues heavily limit which controllers can be put
-on the same hierarchy and most configurations resort to putting each
-controller on its own hierarchy. Only closely related ones, such as
-the cpu and cpuacct controllers, make sense to put on the same
-hierarchy. This often means that userland ends up managing multiple
-similar hierarchies repeating the same steps on each hierarchy
-whenever a hierarchy management operation is necessary.
-
-Unfortunately, support for multiple hierarchies comes at a steep cost.
-Internal implementation in cgroup core proper is dazzlingly
-complicated but more importantly the support for multiple hierarchies
-restricts how cgroup is used in general and what controllers can do.
-
-There's no limit on how many hierarchies there may be, which means
-that a task's cgroup membership can't be described in finite length.
-The key may contain any varying number of entries and is unlimited in
-length, which makes it highly awkward to handle and leads to addition
-of controllers which exist only to identify membership, which in turn
-exacerbates the original problem.
-
-Also, as a controller can't have any expectation regarding what shape
-of hierarchies other controllers would be on, each controller has to
-assume that all other controllers are operating on completely
-orthogonal hierarchies. This makes it impossible, or at least very
-cumbersome, for controllers to cooperate with each other.
-
-In most use cases, putting controllers on hierarchies which are
-completely orthogonal to each other isn't necessary. What usually is
-called for is the ability to have differing levels of granularity
-depending on the specific controller. In other words, hierarchy may
-be collapsed from leaf towards root when viewed from specific
-controllers. For example, a given configuration might not care about
-how memory is distributed beyond a certain level while still wanting
-to control how CPU cycles are distributed.
-
-Unified hierarchy is the next version of cgroup interface. It aims to
-address the aforementioned issues by having more structure while
-retaining enough flexibility for most use cases. Various other
-general and controller-specific interface issues are also addressed in
-the process.
-
-
-2. Basic Operation
-
-2-1. Mounting
-
-Currently, unified hierarchy can be mounted with the following mount
-command. Note that this is still under development and scheduled to
-change soon.
-
- mount -t cgroup -o __DEVEL__sane_behavior cgroup $MOUNT_POINT
-
-All controllers which support the unified hierarchy and are not bound
-to other hierarchies are automatically bound to unified hierarchy and
-show up at the root of it. Controllers which are enabled only in the
-root of unified hierarchy can be bound to other hierarchies. This
-allows mixing unified hierarchy with the traditional multiple
-hierarchies in a fully backward compatible way.
-
-A controller can be moved across hierarchies only after the controller
-is no longer referenced in its current hierarchy. Because per-cgroup
-controller states are destroyed asynchronously and controllers may
-have lingering references, a controller may not show up immediately on
-the unified hierarchy after the final umount of the previous
-hierarchy. Similarly, a controller should be fully disabled to be
-moved out of the unified hierarchy and it may take some time for the
-disabled controller to become available for other hierarchies;
-furthermore, due to dependencies among controllers, other controllers
-may need to be disabled too.
-
-While useful for development and manual configurations, dynamically
-moving controllers between the unified and other hierarchies is
-strongly discouraged for production use. It is recommended to decide
-the hierarchies and controller associations before starting using the
-controllers.
-
-
-2-2. cgroup.subtree_control
-
-All cgroups on unified hierarchy have a "cgroup.subtree_control" file
-which governs which controllers are enabled on the children of the
-cgroup. Let's assume a hierarchy like the following.
-
- root - A - B - C
- \ D
-
-root's "cgroup.subtree_control" file determines which controllers are
-enabled on A. A's on B. B's on C and D. This coincides with the
-fact that controllers on the immediate sub-level are used to
-distribute the resources of the parent. In fact, it's natural to
-assume that resource control knobs of a child belong to its parent.
-Enabling a controller in a "cgroup.subtree_control" file declares that
-distribution of the respective resources of the cgroup will be
-controlled. Note that this means that controller enable states are
-shared among siblings.
-
-When read, the file contains a space-separated list of currently
-enabled controllers. A write to the file should contain a
-space-separated list of controllers with '+' or '-' prefixed (without
-the quotes). Controllers prefixed with '+' are enabled and '-'
-disabled. If a controller is listed multiple times, the last entry
-wins. The specific operations are executed atomically - either all
-succeed or fail.
-
-
-2-3. cgroup.controllers
-
-Read-only "cgroup.controllers" file contains a space-separated list of
-controllers which can be enabled in the cgroup's
-"cgroup.subtree_control" file.
-
-In the root cgroup, this lists controllers which are not bound to
-other hierarchies and the content changes as controllers are bound to
-and unbound from other hierarchies.
-
-In non-root cgroups, the content of this file equals that of the
-parent's "cgroup.subtree_control" file as only controllers enabled
-from the parent can be used in its children.
-
-
-3. Structural Constraints
-
-3-1. Top-down
-
-As it doesn't make sense to nest control of an uncontrolled resource,
-all non-root "cgroup.subtree_control" files can only contain
-controllers which are enabled in the parent's "cgroup.subtree_control"
-file. A controller can be enabled only if the parent has the
-controller enabled and a controller can't be disabled if one or more
-children have it enabled.
-
-
-3-2. No internal tasks
-
-One long-standing issue that cgroup faces is the competition between
-tasks belonging to the parent cgroup and its children cgroups. This
-is inherently nasty as two different types of entities compete and
-there is no agreed-upon obvious way to handle it. Different
-controllers are doing different things.
-
-The cpu controller considers tasks and cgroups as equivalents and maps
-nice levels to cgroup weights. This works for some cases but falls
-flat when children should be allocated specific ratios of CPU cycles
-and the number of internal tasks fluctuates - the ratios constantly
-change as the number of competing entities fluctuates. There also are
-other issues. The mapping from nice level to weight isn't obvious or
-universal, and there are various other knobs which simply aren't
-available for tasks.
-
-The io controller implicitly creates a hidden leaf node for each
-cgroup to host the tasks. The hidden leaf has its own copies of all
-the knobs with "leaf_" prefixed. While this allows equivalent control
-over internal tasks, it's with serious drawbacks. It always adds an
-extra layer of nesting which may not be necessary, makes the interface
-messy and significantly complicates the implementation.
-
-The memory controller currently doesn't have a way to control what
-happens between internal tasks and child cgroups and the behavior is
-not clearly defined. There have been attempts to add ad-hoc behaviors
-and knobs to tailor the behavior to specific workloads. Continuing
-this direction will lead to problems which will be extremely difficult
-to resolve in the long term.
-
-Multiple controllers struggle with internal tasks and came up with
-different ways to deal with it; unfortunately, all the approaches in
-use now are severely flawed and, furthermore, the widely different
-behaviors make cgroup as whole highly inconsistent.
-
-It is clear that this is something which needs to be addressed from
-cgroup core proper in a uniform way so that controllers don't need to
-worry about it and cgroup as a whole shows a consistent and logical
-behavior. To achieve that, unified hierarchy enforces the following
-structural constraint:
-
- Except for the root, only cgroups which don't contain any task may
- have controllers enabled in their "cgroup.subtree_control" files.
-
-Combined with other properties, this guarantees that, when a
-controller is looking at the part of the hierarchy which has it
-enabled, tasks are always only on the leaves. This rules out
-situations where child cgroups compete against internal tasks of the
-parent.
-
-There are two things to note. Firstly, the root cgroup is exempt from
-the restriction. Root contains tasks and anonymous resource
-consumption which can't be associated with any other cgroup and
-requires special treatment from most controllers. How resource
-consumption in the root cgroup is governed is up to each controller.
-
-Secondly, the restriction doesn't take effect if there is no enabled
-controller in the cgroup's "cgroup.subtree_control" file. This is
-important as otherwise it wouldn't be possible to create children of a
-populated cgroup. To control resource distribution of a cgroup, the
-cgroup must create children and transfer all its tasks to the children
-before enabling controllers in its "cgroup.subtree_control" file.
-
-
-4. Delegation
-
-4-1. Model of delegation
-
-A cgroup can be delegated to a less privileged user by granting write
-access of the directory and its "cgroup.procs" file to the user. Note
-that the resource control knobs in a given directory concern the
-resources of the parent and thus must not be delegated along with the
-directory.
-
-Once delegated, the user can build sub-hierarchy under the directory,
-organize processes as it sees fit and further distribute the resources
-it got from the parent. The limits and other settings of all resource
-controllers are hierarchical and regardless of what happens in the
-delegated sub-hierarchy, nothing can escape the resource restrictions
-imposed by the parent.
-
-Currently, cgroup doesn't impose any restrictions on the number of
-cgroups in or nesting depth of a delegated sub-hierarchy; however,
-this may in the future be limited explicitly.
-
-
-4-2. Common ancestor rule
-
-On the unified hierarchy, to write to a "cgroup.procs" file, in
-addition to the usual write permission to the file and uid match, the
-writer must also have write access to the "cgroup.procs" file of the
-common ancestor of the source and destination cgroups. This prevents
-delegatees from smuggling processes across disjoint sub-hierarchies.
-
-Let's say cgroups C0 and C1 have been delegated to user U0 who created
-C00, C01 under C0 and C10 under C1 as follows.
-
- ~~~~~~~~~~~~~ - C0 - C00
- ~ cgroup ~ \ C01
- ~ hierarchy ~
- ~~~~~~~~~~~~~ - C1 - C10
-
-C0 and C1 are separate entities in terms of resource distribution
-regardless of their relative positions in the hierarchy. The
-resources the processes under C0 are entitled to are controlled by
-C0's ancestors and may be completely different from C1. It's clear
-that the intention of delegating C0 to U0 is allowing U0 to organize
-the processes under C0 and further control the distribution of C0's
-resources.
-
-On traditional hierarchies, if a task has write access to "tasks" or
-"cgroup.procs" file of a cgroup and its uid agrees with the target, it
-can move the target to the cgroup. In the above example, U0 will not
-only be able to move processes in each sub-hierarchy but also across
-the two sub-hierarchies, effectively allowing it to violate the
-organizational and resource restrictions implied by the hierarchical
-structure above C0 and C1.
-
-On the unified hierarchy, let's say U0 wants to write the pid of a
-process which has a matching uid and is currently in C10 into
-"C00/cgroup.procs". U0 obviously has write access to the file and
-migration permission on the process; however, the common ancestor of
-the source cgroup C10 and the destination cgroup C00 is above the
-points of delegation and U0 would not have write access to its
-"cgroup.procs" and thus be denied with -EACCES.
-
-
-5. Other Changes
-
-5-1. [Un]populated Notification
-
-cgroup users often need a way to determine when a cgroup's
-subhierarchy becomes empty so that it can be cleaned up. cgroup
-currently provides release_agent for it; unfortunately, this mechanism
-is riddled with issues.
-
-- It delivers events by forking and execing a userland binary
- specified as the release_agent. This is a long deprecated method of
- notification delivery. It's extremely heavy, slow and cumbersome to
- integrate with larger infrastructure.
-
-- There is single monitoring point at the root. There's no way to
- delegate management of a subtree.
-
-- The event isn't recursive. It triggers when a cgroup doesn't have
- any tasks or child cgroups. Events for internal nodes trigger only
- after all children are removed. This again makes it impossible to
- delegate management of a subtree.
-
-- Events are filtered from the kernel side. A "notify_on_release"
- file is used to subscribe to or suppress release events. This is
- unnecessarily complicated and probably done this way because event
- delivery itself was expensive.
-
-Unified hierarchy implements "populated" field in "cgroup.events"
-interface file which can be used to monitor whether the cgroup's
-subhierarchy has tasks in it or not. Its value is 0 if there is no
-task in the cgroup and its descendants; otherwise, 1. poll and
-[id]notify events are triggered when the value changes.
-
-This is significantly lighter and simpler and trivially allows
-delegating management of subhierarchy - subhierarchy monitoring can
-block further propagation simply by putting itself or another process
-in the subhierarchy and monitor events that it's interested in from
-there without interfering with monitoring higher in the tree.
-
-In unified hierarchy, the release_agent mechanism is no longer
-supported and the interface files "release_agent" and
-"notify_on_release" do not exist.
-
-
-5-2. Other Core Changes
-
-- None of the mount options is allowed.
-
-- remount is disallowed.
-
-- rename(2) is disallowed.
-
-- The "tasks" file is removed. Everything should at process
- granularity. Use the "cgroup.procs" file instead.
-
-- The "cgroup.procs" file is not sorted. pids will be unique unless
- they got recycled in-between reads.
-
-- The "cgroup.clone_children" file is removed.
-
-- /proc/PID/cgroup keeps reporting the cgroup that a zombie belonged
- to before exiting. If the cgroup is removed before the zombie is
- reaped, " (deleted)" is appeneded to the path.
-
-
-5-3. Controller File Conventions
-
-5-3-1. Format
-
-In general, all controller files should be in one of the following
-formats whenever possible.
-
-- Values only files
-
- VAL0 VAL1...\n
-
-- Flat keyed files
-
- KEY0 VAL0\n
- KEY1 VAL1\n
- ...
-
-- Nested keyed files
-
- KEY0 SUB_KEY0=VAL00 SUB_KEY1=VAL01...
- KEY1 SUB_KEY0=VAL10 SUB_KEY1=VAL11...
- ...
-
-For a writeable file, the format for writing should generally match
-reading; however, controllers may allow omitting later fields or
-implement restricted shortcuts for most common use cases.
-
-For both flat and nested keyed files, only the values for a single key
-can be written at a time. For nested keyed files, the sub key pairs
-may be specified in any order and not all pairs have to be specified.
-
-
-5-3-2. Control Knobs
-
-- Settings for a single feature should generally be implemented in a
- single file.
-
-- In general, the root cgroup should be exempt from resource control
- and thus shouldn't have resource control knobs.
-
-- If a controller implements ratio based resource distribution, the
- control knob should be named "weight" and have the range [1, 10000]
- and 100 should be the default value. The values are chosen to allow
- enough and symmetric bias in both directions while keeping it
- intuitive (the default is 100%).
-
-- If a controller implements an absolute resource guarantee and/or
- limit, the control knobs should be named "min" and "max"
- respectively. If a controller implements best effort resource
- gurantee and/or limit, the control knobs should be named "low" and
- "high" respectively.
-
- In the above four control files, the special token "max" should be
- used to represent upward infinity for both reading and writing.
-
-- If a setting has configurable default value and specific overrides,
- the default settings should be keyed with "default" and appear as
- the first entry in the file. Specific entries can use "default" as
- its value to indicate inheritance of the default value.
-
-- For events which are not very high frequency, an interface file
- "events" should be created which lists event key value pairs.
- Whenever a notifiable event happens, file modified event should be
- generated on the file.
-
-
-5-4. Per-Controller Changes
-
-5-4-1. io
-
-- blkio is renamed to io. The interface is overhauled anyway. The
- new name is more in line with the other two major controllers, cpu
- and memory, and better suited given that it may be used for cgroup
- writeback without involving block layer.
-
-- Everything including stat is always hierarchical making separate
- recursive stat files pointless and, as no internal node can have
- tasks, leaf weights are meaningless. The operation model is
- simplified and the interface is overhauled accordingly.
-
- io.stat
-
- The stat file. The reported stats are from the point where
- bio's are issued to request_queue. The stats are counted
- independent of which policies are enabled. Each line in the
- file follows the following format. More fields may later be
- added at the end.
-
- $MAJ:$MIN rbytes=$RBYTES wbytes=$WBYTES rios=$RIOS wrios=$WIOS
-
- io.weight
-
- The weight setting, currently only available and effective if
- cfq-iosched is in use for the target device. The weight is
- between 1 and 10000 and defaults to 100. The first line
- always contains the default weight in the following format to
- use when per-device setting is missing.
-
- default $WEIGHT
-
- Subsequent lines list per-device weights of the following
- format.
-
- $MAJ:$MIN $WEIGHT
-
- Writing "$WEIGHT" or "default $WEIGHT" changes the default
- setting. Writing "$MAJ:$MIN $WEIGHT" sets per-device weight
- while "$MAJ:$MIN default" clears it.
-
- This file is available only on non-root cgroups.
-
- io.max
-
- The maximum bandwidth and/or iops setting, only available if
- blk-throttle is enabled. The file is of the following format.
-
- $MAJ:$MIN rbps=$RBPS wbps=$WBPS riops=$RIOPS wiops=$WIOPS
-
- ${R|W}BPS are read/write bytes per second and ${R|W}IOPS are
- read/write IOs per second. "max" indicates no limit. Writing
- to the file follows the same format but the individual
- settings may be omitted or specified in any order.
-
- This file is available only on non-root cgroups.
-
-
-5-4-2. cpuset
-
-- Tasks are kept in empty cpusets after hotplug and take on the masks
- of the nearest non-empty ancestor, instead of being moved to it.
-
-- A task can be moved into an empty cpuset, and again it takes on the
- masks of the nearest non-empty ancestor.
-
-
-5-4-3. memory
-
-- use_hierarchy is on by default and the cgroup file for the flag is
- not created.
-
-- The original lower boundary, the soft limit, is defined as a limit
- that is per default unset. As a result, the set of cgroups that
- global reclaim prefers is opt-in, rather than opt-out. The costs
- for optimizing these mostly negative lookups are so high that the
- implementation, despite its enormous size, does not even provide the
- basic desirable behavior. First off, the soft limit has no
- hierarchical meaning. All configured groups are organized in a
- global rbtree and treated like equal peers, regardless where they
- are located in the hierarchy. This makes subtree delegation
- impossible. Second, the soft limit reclaim pass is so aggressive
- that it not just introduces high allocation latencies into the
- system, but also impacts system performance due to overreclaim, to
- the point where the feature becomes self-defeating.
-
- The memory.low boundary on the other hand is a top-down allocated
- reserve. A cgroup enjoys reclaim protection when it and all its
- ancestors are below their low boundaries, which makes delegation of
- subtrees possible. Secondly, new cgroups have no reserve per
- default and in the common case most cgroups are eligible for the
- preferred reclaim pass. This allows the new low boundary to be
- efficiently implemented with just a minor addition to the generic
- reclaim code, without the need for out-of-band data structures and
- reclaim passes. Because the generic reclaim code considers all
- cgroups except for the ones running low in the preferred first
- reclaim pass, overreclaim of individual groups is eliminated as
- well, resulting in much better overall workload performance.
-
-- The original high boundary, the hard limit, is defined as a strict
- limit that can not budge, even if the OOM killer has to be called.
- But this generally goes against the goal of making the most out of
- the available memory. The memory consumption of workloads varies
- during runtime, and that requires users to overcommit. But doing
- that with a strict upper limit requires either a fairly accurate
- prediction of the working set size or adding slack to the limit.
- Since working set size estimation is hard and error prone, and
- getting it wrong results in OOM kills, most users tend to err on the
- side of a looser limit and end up wasting precious resources.
-
- The memory.high boundary on the other hand can be set much more
- conservatively. When hit, it throttles allocations by forcing them
- into direct reclaim to work off the excess, but it never invokes the
- OOM killer. As a result, a high boundary that is chosen too
- aggressively will not terminate the processes, but instead it will
- lead to gradual performance degradation. The user can monitor this
- and make corrections until the minimal memory footprint that still
- gives acceptable performance is found.
-
- In extreme cases, with many concurrent allocations and a complete
- breakdown of reclaim progress within the group, the high boundary
- can be exceeded. But even then it's mostly better to satisfy the
- allocation from the slack available in other groups or the rest of
- the system than killing the group. Otherwise, memory.max is there
- to limit this type of spillover and ultimately contain buggy or even
- malicious applications.
-
-- The original control file names are unwieldy and inconsistent in
- many different ways. For example, the upper boundary hit count is
- exported in the memory.failcnt file, but an OOM event count has to
- be manually counted by listening to memory.oom_control events, and
- lower boundary / soft limit events have to be counted by first
- setting a threshold for that value and then counting those events.
- Also, usage and limit files encode their units in the filename.
- That makes the filenames very long, even though this is not
- information that a user needs to be reminded of every time they type
- out those names.
-
- To address these naming issues, as well as to signal clearly that
- the new interface carries a new configuration model, the naming
- conventions in it necessarily differ from the old interface.
-
-- The original limit files indicate the state of an unset limit with a
- Very High Number, and a configured limit can be unset by echoing -1
- into those files. But that very high number is implementation and
- architecture dependent and not very descriptive. And while -1 can
- be understood as an underflow into the highest possible value, -2 or
- -10M etc. do not work, so it's not consistent.
-
- memory.low, memory.high, and memory.max will use the string "max" to
- indicate and set the highest possible value.
-
-6. Planned Changes
-
-6-1. CAP for resource control
-
-Unified hierarchy will require one of the capabilities(7), which is
-yet to be decided, for all resource control related knobs. Process
-organization operations - creation of sub-cgroups and migration of
-processes in sub-hierarchies may be delegated by changing the
-ownership and/or permissions on the cgroup directory and
-"cgroup.procs" interface file; however, all operations which affect
-resource control - writes to a "cgroup.subtree_control" file or any
-controller-specific knobs - will require an explicit CAP privilege.
-
-This, in part, is to prevent the cgroup interface from being
-inadvertently promoted to programmable API used by non-privileged
-binaries. cgroup exposes various aspects of the system in ways which
-aren't properly abstracted for direct consumption by regular programs.
-This is an administration interface much closer to sysctl knobs than
-system calls. Even the basic access model, being filesystem path
-based, isn't suitable for direct consumption. There's no way to
-access "my cgroup" in a race-free way or make multiple operations
-atomic against migration to another cgroup.
-
-Another aspect is that, for better or for worse, the cgroup interface
-goes through far less scrutiny than regular interfaces for
-unprivileged userland. The upside is that cgroup is able to expose
-useful features which may not be suitable for general consumption in a
-reasonable time frame. It provides a relatively short path between
-internal details and userland-visible interface. Of course, this
-shortcut comes with high risk. We go through what we go through for
-general kernel APIs for good reasons. It may end up leaking internal
-details in a way which can exert significant pain by locking the
-kernel into a contract that can't be maintained in a reasonable
-manner.
-
-Also, due to the specific nature, cgroup and its controllers don't
-tend to attract attention from a wide scope of developers. cgroup's
-short history is already fraught with severely mis-designed
-interfaces, unnecessary commitments to and exposing of internal
-details, broken and dangerous implementations of various features.
-
-Keeping cgroup as an administration interface is both advantageous for
-its role and imperative given its nature. Some of the cgroup features
-may make sense for unprivileged access. If deemed justified, those
-must be further abstracted and implemented as a different interface,
-be it a system call or process-private filesystem, and survive through
-the scrutiny that any interface for general consumption is required to
-go through.
-
-Requiring CAP is not a complete solution but should serve as a
-significant deterrent against spraying cgroup usages in non-privileged
-programs.
diff --git a/Documentation/cpu-freq/intel-pstate.txt b/Documentation/cpu-freq/intel-pstate.txt
index be8d400..f7b12c0 100644
--- a/Documentation/cpu-freq/intel-pstate.txt
+++ b/Documentation/cpu-freq/intel-pstate.txt
@@ -1,61 +1,131 @@
-Intel P-state driver
+Intel P-State driver
--------------------
-This driver provides an interface to control the P state selection for
-SandyBridge+ Intel processors. The driver can operate two different
-modes based on the processor model, legacy mode and Hardware P state (HWP)
-mode.
-
-In legacy mode, the Intel P-state implements two internal governors,
-performance and powersave, that differ from the general cpufreq governors of
-the same name (the general cpufreq governors implement target(), whereas the
-internal Intel P-state governors implement setpolicy()). The internal
-performance governor sets the max_perf_pct and min_perf_pct to 100; that is,
-the governor selects the highest available P state to maximize the performance
-of the core. The internal powersave governor selects the appropriate P state
-based on the current load on the CPU.
-
-In HWP mode P state selection is implemented in the processor
-itself. The driver provides the interfaces between the cpufreq core and
-the processor to control P state selection based on user preferences
-and reporting frequency to the cpufreq core. In this mode the
-internal Intel P-state governor code is disabled.
-
-In addition to the interfaces provided by the cpufreq core for
-controlling frequency the driver provides sysfs files for
-controlling P state selection. These files have been added to
-/sys/devices/system/cpu/intel_pstate/
-
- max_perf_pct: limits the maximum P state that will be requested by
- the driver stated as a percentage of the available performance. The
- available (P states) performance may be reduced by the no_turbo
+This driver provides an interface to control the P-State selection for the
+SandyBridge+ Intel processors.
+
+The following document explains P-States:
+http://events.linuxfoundation.org/sites/events/files/slides/LinuxConEurope_2015.pdf
+As stated in the document, P-State doesn’t exactly mean a frequency. However, for
+the sake of the relationship with cpufreq, P-State and frequency are used
+interchangeably.
+
+Understanding the cpufreq core governors and policies are important before
+discussing more details about the Intel P-State driver. Based on what callbacks
+a cpufreq driver provides to the cpufreq core, it can support two types of
+drivers:
+- with target_index() callback: In this mode, the drivers using cpufreq core
+simply provide the minimum and maximum frequency limits and an additional
+interface target_index() to set the current frequency. The cpufreq subsystem
+has a number of scaling governors ("performance", "powersave", "ondemand",
+etc.). Depending on which governor is in use, cpufreq core will call for
+transitions to a specific frequency using target_index() callback.
+- setpolicy() callback: In this mode, drivers do not provide target_index()
+callback, so cpufreq core can't request a transition to a specific frequency.
+The driver provides minimum and maximum frequency limits and callbacks to set a
+policy. The policy in cpufreq sysfs is referred to as the "scaling governor".
+The cpufreq core can request the driver to operate in any of the two policies:
+"performance: and "powersave". The driver decides which frequency to use based
+on the above policy selection considering minimum and maximum frequency limits.
+
+The Intel P-State driver falls under the latter category, which implements the
+setpolicy() callback. This driver decides what P-State to use based on the
+requested policy from the cpufreq core. If the processor is capable of
+selecting its next P-State internally, then the driver will offload this
+responsibility to the processor (aka HWP: Hardware P-States). If not, the
+driver implements algorithms to select the next P-State.
+
+Since these policies are implemented in the driver, they are not same as the
+cpufreq scaling governors implementation, even if they have the same name in
+the cpufreq sysfs (scaling_governors). For example the "performance" policy is
+similar to cpufreq’s "performance" governor, but "powersave" is completely
+different than the cpufreq "powersave" governor. The strategy here is similar
+to cpufreq "ondemand", where the requested P-State is related to the system load.
+
+Sysfs Interface
+
+In addition to the frequency-controlling interfaces provided by the cpufreq
+core, the driver provides its own sysfs files to control the P-State selection.
+These files have been added to /sys/devices/system/cpu/intel_pstate/.
+Any changes made to these files are applicable to all CPUs (even in a
+multi-package system).
+
+ max_perf_pct: Limits the maximum P-State that will be requested by
+ the driver. It states it as a percentage of the available performance. The
+ available (P-State) performance may be reduced by the no_turbo
setting described below.
- min_perf_pct: limits the minimum P state that will be requested by
- the driver stated as a percentage of the max (non-turbo)
+ min_perf_pct: Limits the minimum P-State that will be requested by
+ the driver. It states it as a percentage of the max (non-turbo)
performance level.
- no_turbo: limits the driver to selecting P states below the turbo
+ no_turbo: Limits the driver to selecting P-State below the turbo
frequency range.
- turbo_pct: displays the percentage of the total performance that
- is supported by hardware that is in the turbo range. This number
+ turbo_pct: Displays the percentage of the total performance that
+ is supported by hardware that is in the turbo range. This number
is independent of whether turbo has been disabled or not.
- num_pstates: displays the number of pstates that are supported
- by hardware. This number is independent of whether turbo has
+ num_pstates: Displays the number of P-States that are supported
+ by hardware. This number is independent of whether turbo has
been disabled or not.
+For example, if a system has these parameters:
+ Max 1 core turbo ratio: 0x21 (Max 1 core ratio is the maximum P-State)
+ Max non turbo ratio: 0x17
+ Minimum ratio : 0x08 (Here the ratio is called max efficiency ratio)
+
+Sysfs will show :
+ max_perf_pct:100, which corresponds to 1 core ratio
+ min_perf_pct:24, max_efficiency_ratio / max 1 Core ratio
+ no_turbo:0, turbo is not disabled
+ num_pstates:26 = (max 1 Core ratio - Max Efficiency Ratio + 1)
+ turbo_pct:39 = (max 1 core ratio - max non turbo ratio) / num_pstates
+
+Refer to "Intel® 64 and IA-32 Architectures Software Developer’s Manual
+Volume 3: System Programming Guide" to understand ratios.
+
+cpufreq sysfs for Intel P-State
+
+Since this driver registers with cpufreq, cpufreq sysfs is also presented.
+There are some important differences, which need to be considered.
+
+scaling_cur_freq: This displays the real frequency which was used during
+the last sample period instead of what is requested. Some other cpufreq driver,
+like acpi-cpufreq, displays what is requested (Some changes are on the
+way to fix this for acpi-cpufreq driver). The same is true for frequencies
+displayed at /proc/cpuinfo.
+
+scaling_governor: This displays current active policy. Since each CPU has a
+cpufreq sysfs, it is possible to set a scaling governor to each CPU. But this
+is not possible with Intel P-States, as there is one common policy for all
+CPUs. Here, the last requested policy will be applicable to all CPUs. It is
+suggested that one use the cpupower utility to change policy to all CPUs at the
+same time.
+
+scaling_setspeed: This attribute can never be used with Intel P-State.
+
+scaling_max_freq/scaling_min_freq: This interface can be used similarly to
+the max_perf_pct/min_perf_pct of Intel P-State sysfs. However since frequencies
+are converted to nearest possible P-State, this is prone to rounding errors.
+This method is not preferred to limit performance.
+
+affected_cpus: Not used
+related_cpus: Not used
+
For contemporary Intel processors, the frequency is controlled by the
-processor itself and the P-states exposed to software are related to
+processor itself and the P-State exposed to software is related to
performance levels. The idea that frequency can be set to a single
-frequency is fiction for Intel Core processors. Even if the scaling
-driver selects a single P state the actual frequency the processor
+frequency is fictional for Intel Core processors. Even if the scaling
+driver selects a single P-State, the actual frequency the processor
will run at is selected by the processor itself.
-For legacy mode debugfs files have also been added to allow tuning of
-the internal governor algorythm. These files are located at
-/sys/kernel/debug/pstate_snb/ These files are NOT present in HWP mode.
+Tuning Intel P-State driver
+
+When HWP mode is not used, debugfs files have also been added to allow the
+tuning of the internal governor algorithm. These files are located at
+/sys/kernel/debug/pstate_snb/. The algorithm uses a PID (Proportional
+Integral Derivative) controller. The PID tunable parameters are:
deadband
d_gain_pct
@@ -63,3 +133,90 @@ the internal governor algorythm. These files are located at
p_gain_pct
sample_rate_ms
setpoint
+
+To adjust these parameters, some understanding of driver implementation is
+necessary. There are some tweeks described here, but be very careful. Adjusting
+them requires expert level understanding of power and performance relationship.
+These limits are only useful when the "powersave" policy is active.
+
+-To make the system more responsive to load changes, sample_rate_ms can
+be adjusted (current default is 10ms).
+-To make the system use higher performance, even if the load is lower, setpoint
+can be adjusted to a lower number. This will also lead to faster ramp up time
+to reach the maximum P-State.
+If there are no derivative and integral coefficients, The next P-State will be
+equal to:
+ current P-State - ((setpoint - current cpu load) * p_gain_pct)
+
+For example, if the current PID parameters are (Which are defaults for the core
+processors like SandyBridge):
+ deadband = 0
+ d_gain_pct = 0
+ i_gain_pct = 0
+ p_gain_pct = 20
+ sample_rate_ms = 10
+ setpoint = 97
+
+If the current P-State = 0x08 and current load = 100, this will result in the
+next P-State = 0x08 - ((97 - 100) * 0.2) = 8.6 (rounded to 9). Here the P-State
+goes up by only 1. If during next sample interval the current load doesn't
+change and still 100, then P-State goes up by one again. This process will
+continue as long as the load is more than the setpoint until the maximum P-State
+is reached.
+
+For the same load at setpoint = 60, this will result in the next P-State
+= 0x08 - ((60 - 100) * 0.2) = 16
+So by changing the setpoint from 97 to 60, there is an increase of the
+next P-State from 9 to 16. So this will make processor execute at higher
+P-State for the same CPU load. If the load continues to be more than the
+setpoint during next sample intervals, then P-State will go up again till the
+maximum P-State is reached. But the ramp up time to reach the maximum P-State
+will be much faster when the setpoint is 60 compared to 97.
+
+Debugging Intel P-State driver
+
+Event tracing
+To debug P-State transition, the Linux event tracing interface can be used.
+There are two specific events, which can be enabled (Provided the kernel
+configs related to event tracing are enabled).
+
+# cd /sys/kernel/debug/tracing/
+# echo 1 > events/power/pstate_sample/enable
+# echo 1 > events/power/cpu_frequency/enable
+# cat trace
+gnome-terminal--4510 [001] ..s. 1177.680733: pstate_sample: core_busy=107
+ scaled=94 from=26 to=26 mperf=1143818 aperf=1230607 tsc=29838618
+ freq=2474476
+cat-5235 [002] ..s. 1177.681723: cpu_frequency: state=2900000 cpu_id=2
+
+
+Using ftrace
+
+If function level tracing is required, the Linux ftrace interface can be used.
+For example if we want to check how often a function to set a P-State is
+called, we can set ftrace filter to intel_pstate_set_pstate.
+
+# cd /sys/kernel/debug/tracing/
+# cat available_filter_functions | grep -i pstate
+intel_pstate_set_pstate
+intel_pstate_cpu_init
+...
+
+# echo intel_pstate_set_pstate > set_ftrace_filter
+# echo function > current_tracer
+# cat trace | head -15
+# tracer: function
+#
+# entries-in-buffer/entries-written: 80/80 #P:4
+#
+# _-----=> irqs-off
+# / _----=> need-resched
+# | / _---=> hardirq/softirq
+# || / _--=> preempt-depth
+# ||| / delay
+# TASK-PID CPU# |||| TIMESTAMP FUNCTION
+# | | | |||| | |
+ Xorg-3129 [000] ..s. 2537.644844: intel_pstate_set_pstate <-intel_pstate_timer_func
+ gnome-terminal--4510 [002] ..s. 2537.649844: intel_pstate_set_pstate <-intel_pstate_timer_func
+ gnome-shell-3409 [001] ..s. 2537.650850: intel_pstate_set_pstate <-intel_pstate_timer_func
+ <idle>-0 [000] ..s. 2537.654843: intel_pstate_set_pstate <-intel_pstate_timer_func
diff --git a/Documentation/cpu-freq/pcc-cpufreq.txt b/Documentation/cpu-freq/pcc-cpufreq.txt
index 9e3c3b3..0a94224 100644
--- a/Documentation/cpu-freq/pcc-cpufreq.txt
+++ b/Documentation/cpu-freq/pcc-cpufreq.txt
@@ -159,8 +159,8 @@ to be strictly associated with a P-state.
2.2 cpuinfo_transition_latency:
-------------------------------
-The cpuinfo_transition_latency field is 0. The PCC specification does
-not include a field to expose this value currently.
+The cpuinfo_transition_latency field is CPUFREQ_ETERNAL. The PCC specification
+does not include a field to expose this value currently.
2.3 cpuinfo_cur_freq:
---------------------
diff --git a/Documentation/devicetree/bindings/arm/cpus.txt b/Documentation/devicetree/bindings/arm/cpus.txt
index 3a07a87..6aca64f 100644
--- a/Documentation/devicetree/bindings/arm/cpus.txt
+++ b/Documentation/devicetree/bindings/arm/cpus.txt
@@ -242,6 +242,23 @@ nodes to be present and contain the properties described below.
Definition: Specifies the syscon node controlling the cpu core
power domains.
+ - dynamic-power-coefficient
+ Usage: optional
+ Value type: <prop-encoded-array>
+ Definition: A u32 value that represents the running time dynamic
+ power coefficient in units of mW/MHz/uVolt^2. The
+ coefficient can either be calculated from power
+ measurements or derived by analysis.
+
+ The dynamic power consumption of the CPU is
+ proportional to the square of the Voltage (V) and
+ the clock frequency (f). The coefficient is used to
+ calculate the dynamic power as below -
+
+ Pdyn = dynamic-power-coefficient * V^2 * f
+
+ where voltage is in uV, frequency is in MHz.
+
Example 1 (dual-cluster big.LITTLE system 32-bit):
cpus {
diff --git a/Documentation/devicetree/bindings/arm/l2cc.txt b/Documentation/devicetree/bindings/arm/l2c2x0.txt
index 06c88a4..fe0398c 100644
--- a/Documentation/devicetree/bindings/arm/l2cc.txt
+++ b/Documentation/devicetree/bindings/arm/l2c2x0.txt
@@ -1,7 +1,8 @@
* ARM L2 Cache Controller
-ARM cores often have a separate level 2 cache controller. There are various
-implementations of the L2 cache controller with compatible programming models.
+ARM cores often have a separate L2C210/L2C220/L2C310 (also known as PL210/PL220/
+PL310 and variants) based level 2 cache controller. All these various implementations
+of the L2 cache controller have compatible programming models (Note 1).
Some of the properties that are just prefixed "cache-*" are taken from section
3.7.3 of the ePAPR v1.1 specification which can be found at:
https://www.power.org/wp-content/uploads/2012/06/Power_ePAPR_APPROVED_v1.1.pdf
@@ -67,12 +68,17 @@ Optional properties:
disable if zero.
- arm,prefetch-offset : Override prefetch offset value. Valid values are
0-7, 15, 23, and 31.
-- arm,shared-override : The default behavior of the pl310 cache controller with
- respect to the shareable attribute is to transform "normal memory
- non-cacheable transactions" into "cacheable no allocate" (for reads) or
- "write through no write allocate" (for writes).
+- arm,shared-override : The default behavior of the L220 or PL310 cache
+ controllers with respect to the shareable attribute is to transform "normal
+ memory non-cacheable transactions" into "cacheable no allocate" (for reads)
+ or "write through no write allocate" (for writes).
On systems where this may cause DMA buffer corruption, this property must be
specified to indicate that such transforms are precluded.
+- arm,parity-enable : enable parity checking on the L2 cache (L220 or PL310).
+- arm,parity-disable : disable parity checking on the L2 cache (L220 or PL310).
+- arm,outer-sync-disable : disable the outer sync operation on the L2 cache.
+ Some core tiles, especially ARM PB11MPCore have a faulty L220 cache that
+ will randomly hang unless outer sync operations are disabled.
- prefetch-data : Data prefetch. Value: <0> (forcibly disable), <1>
(forcibly enable), property absent (retain settings set by firmware)
- prefetch-instr : Instruction prefetch. Value: <0> (forcibly disable),
@@ -91,3 +97,9 @@ L2: cache-controller {
cache-level = <2>;
interrupts = <45>;
};
+
+Note 1: The description in this document doesn't apply to integrated L2
+ cache controllers as found in e.g. Cortex-A15/A7/A57/A53. These
+ integrated L2 controllers are assumed to be all preconfigured by
+ early secure boot code. Thus no need to deal with their configuration
+ in the kernel at all.
diff --git a/Documentation/devicetree/bindings/arm/pmu.txt b/Documentation/devicetree/bindings/arm/pmu.txt
index 97ba45a..5651883 100644
--- a/Documentation/devicetree/bindings/arm/pmu.txt
+++ b/Documentation/devicetree/bindings/arm/pmu.txt
@@ -9,8 +9,9 @@ Required properties:
- compatible : should be one of
"apm,potenza-pmu"
"arm,armv8-pmuv3"
- "arm.cortex-a57-pmu"
- "arm.cortex-a53-pmu"
+ "arm,cortex-a72-pmu"
+ "arm,cortex-a57-pmu"
+ "arm,cortex-a53-pmu"
"arm,cortex-a17-pmu"
"arm,cortex-a15-pmu"
"arm,cortex-a12-pmu"
diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-st.txt b/Documentation/devicetree/bindings/cpufreq/cpufreq-st.txt
new file mode 100644
index 0000000..d91a02a
--- /dev/null
+++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-st.txt
@@ -0,0 +1,91 @@
+Binding for ST's CPUFreq driver
+===============================
+
+ST's CPUFreq driver attempts to read 'process' and 'version' attributes
+from the SoC, then supplies the OPP framework with 'prop' and 'supported
+hardware' information respectively. The framework is then able to read
+the DT and operate in the usual way.
+
+For more information about the expected DT format [See: ../opp/opp.txt].
+
+Frequency Scaling only
+----------------------
+
+No vendor specific driver required for this.
+
+Located in CPU's node:
+
+- operating-points : [See: ../power/opp.txt]
+
+Example [safe]
+--------------
+
+cpus {
+ cpu@0 {
+ /* kHz uV */
+ operating-points = <1500000 0
+ 1200000 0
+ 800000 0
+ 500000 0>;
+ };
+};
+
+Dynamic Voltage and Frequency Scaling (DVFS)
+--------------------------------------------
+
+This requires the ST CPUFreq driver to supply 'process' and 'version' info.
+
+Located in CPU's node:
+
+- operating-points-v2 : [See ../power/opp.txt]
+
+Example [unsafe]
+----------------
+
+cpus {
+ cpu@0 {
+ operating-points-v2 = <&cpu0_opp_table>;
+ };
+};
+
+cpu0_opp_table: opp_table {
+ compatible = "operating-points-v2";
+
+ /* ############################################################### */
+ /* # WARNING: Do not attempt to copy/replicate these nodes, # */
+ /* # they are only to be supplied by the bootloader !!! # */
+ /* ############################################################### */
+ opp0 {
+ /* Major Minor Substrate */
+ /* 2 all all */
+ opp-supported-hw = <0x00000004 0xffffffff 0xffffffff>;
+ opp-hz = /bits/ 64 <1500000000>;
+ clock-latency-ns = <10000000>;
+
+ opp-microvolt-pcode0 = <1200000>;
+ opp-microvolt-pcode1 = <1200000>;
+ opp-microvolt-pcode2 = <1200000>;
+ opp-microvolt-pcode3 = <1200000>;
+ opp-microvolt-pcode4 = <1170000>;
+ opp-microvolt-pcode5 = <1140000>;
+ opp-microvolt-pcode6 = <1100000>;
+ opp-microvolt-pcode7 = <1070000>;
+ };
+
+ opp1 {
+ /* Major Minor Substrate */
+ /* all all all */
+ opp-supported-hw = <0xffffffff 0xffffffff 0xffffffff>;
+ opp-hz = /bits/ 64 <1200000000>;
+ clock-latency-ns = <10000000>;
+
+ opp-microvolt-pcode0 = <1110000>;
+ opp-microvolt-pcode1 = <1150000>;
+ opp-microvolt-pcode2 = <1100000>;
+ opp-microvolt-pcode3 = <1080000>;
+ opp-microvolt-pcode4 = <1040000>;
+ opp-microvolt-pcode5 = <1020000>;
+ opp-microvolt-pcode6 = <980000>;
+ opp-microvolt-pcode7 = <930000>;
+ };
+};
diff --git a/Documentation/devicetree/bindings/crypto/rockchip-crypto.txt b/Documentation/devicetree/bindings/crypto/rockchip-crypto.txt
new file mode 100644
index 0000000..096df34
--- /dev/null
+++ b/Documentation/devicetree/bindings/crypto/rockchip-crypto.txt
@@ -0,0 +1,29 @@
+Rockchip Electronics And Security Accelerator
+
+Required properties:
+- compatible: Should be "rockchip,rk3288-crypto"
+- reg: Base physical address of the engine and length of memory mapped
+ region
+- interrupts: Interrupt number
+- clocks: Reference to the clocks about crypto
+- clock-names: "aclk" used to clock data
+ "hclk" used to clock data
+ "sclk" used to clock crypto accelerator
+ "apb_pclk" used to clock dma
+- resets: Must contain an entry for each entry in reset-names.
+ See ../reset/reset.txt for details.
+- reset-names: Must include the name "crypto-rst".
+
+Examples:
+
+ crypto: cypto-controller@ff8a0000 {
+ compatible = "rockchip,rk3288-crypto";
+ reg = <0xff8a0000 0x4000>;
+ interrupts = <GIC_SPI 48 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&cru ACLK_CRYPTO>, <&cru HCLK_CRYPTO>,
+ <&cru SCLK_CRYPTO>, <&cru ACLK_DMAC1>;
+ clock-names = "aclk", "hclk", "sclk", "apb_pclk";
+ resets = <&cru SRST_CRYPTO>;
+ reset-names = "crypto-rst";
+ status = "okay";
+ };
diff --git a/Documentation/devicetree/bindings/dma/renesas,usb-dmac.txt b/Documentation/devicetree/bindings/dma/renesas,usb-dmac.txt
index 040f365..e7780a1 100644
--- a/Documentation/devicetree/bindings/dma/renesas,usb-dmac.txt
+++ b/Documentation/devicetree/bindings/dma/renesas,usb-dmac.txt
@@ -1,7 +1,13 @@
* Renesas USB DMA Controller Device Tree bindings
Required Properties:
-- compatible: must contain "renesas,usb-dmac"
+-compatible: "renesas,<soctype>-usb-dmac", "renesas,usb-dmac" as fallback.
+ Examples with soctypes are:
+ - "renesas,r8a7790-usb-dmac" (R-Car H2)
+ - "renesas,r8a7791-usb-dmac" (R-Car M2-W)
+ - "renesas,r8a7793-usb-dmac" (R-Car M2-N)
+ - "renesas,r8a7794-usb-dmac" (R-Car E2)
+ - "renesas,r8a7795-usb-dmac" (R-Car H3)
- reg: base address and length of the registers block for the DMAC
- interrupts: interrupt specifiers for the DMAC, one for each entry in
interrupt-names.
@@ -15,7 +21,7 @@ Required Properties:
Example: R8A7790 (R-Car H2) USB-DMACs
usb_dmac0: dma-controller@e65a0000 {
- compatible = "renesas,usb-dmac";
+ compatible = "renesas,r8a7790-usb-dmac", "renesas,usb-dmac";
reg = <0 0xe65a0000 0 0x100>;
interrupts = <0 109 IRQ_TYPE_LEVEL_HIGH
0 109 IRQ_TYPE_LEVEL_HIGH>;
diff --git a/Documentation/devicetree/bindings/dma/stm32-dma.txt b/Documentation/devicetree/bindings/dma/stm32-dma.txt
new file mode 100644
index 0000000..70cd13f
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/stm32-dma.txt
@@ -0,0 +1,82 @@
+* STMicroelectronics STM32 DMA controller
+
+The STM32 DMA is a general-purpose direct memory access controller capable of
+supporting 8 independent DMA channels. Each channel can have up to 8 requests.
+
+Required properties:
+- compatible: Should be "st,stm32-dma"
+- reg: Should contain DMA registers location and length. This should include
+ all of the per-channel registers.
+- interrupts: Should contain all of the per-channel DMA interrupts in
+ ascending order with respect to the DMA channel index.
+- clocks: Should contain the input clock of the DMA instance.
+- #dma-cells : Must be <4>. See DMA client paragraph for more details.
+
+Optional properties:
+- resets: Reference to a reset controller asserting the DMA controller
+- st,mem2mem: boolean; if defined, it indicates that the controller supports
+ memory-to-memory transfer
+
+Example:
+
+ dma2: dma-controller@40026400 {
+ compatible = "st,stm32-dma";
+ reg = <0x40026400 0x400>;
+ interrupts = <56>,
+ <57>,
+ <58>,
+ <59>,
+ <60>,
+ <68>,
+ <69>,
+ <70>;
+ clocks = <&clk_hclk>;
+ #dma-cells = <4>;
+ st,mem2mem;
+ resets = <&rcc 150>;
+ };
+
+* DMA client
+
+DMA clients connected to the STM32 DMA controller must use the format
+described in the dma.txt file, using a five-cell specifier for each
+channel: a phandle plus four integer cells.
+The four cells in order are:
+
+1. The channel id
+2. The request line number
+3. A 32bit mask specifying the DMA channel configuration which are device
+ dependent:
+ -bit 9: Peripheral Increment Address
+ 0x0: no address increment between transfers
+ 0x1: increment address between transfers
+ -bit 10: Memory Increment Address
+ 0x0: no address increment between transfers
+ 0x1: increment address between transfers
+ -bit 15: Peripheral Increment Offset Size
+ 0x0: offset size is linked to the peripheral bus width
+ 0x1: offset size is fixed to 4 (32-bit alignment)
+ -bit 16-17: Priority level
+ 0x0: low
+ 0x1: medium
+ 0x2: high
+ 0x3: very high
+5. A 32bit mask specifying the DMA FIFO threshold configuration which are device
+ dependent:
+ -bit 0-1: Fifo threshold
+ 0x0: 1/4 full FIFO
+ 0x1: 1/2 full FIFO
+ 0x2: 3/4 full FIFO
+ 0x3: full FIFO
+
+Example:
+
+ usart1: serial@40011000 {
+ compatible = "st,stm32-usart", "st,stm32-uart";
+ reg = <0x40011000 0x400>;
+ interrupts = <37>;
+ clocks = <&clk_pclk2>;
+ dmas = <&dma2 2 4 0x10400 0x3>,
+ <&dma2 7 5 0x10200 0x3>;
+ dma-names = "rx", "tx";
+ };
diff --git a/Documentation/devicetree/bindings/dma/ti-dma-crossbar.txt b/Documentation/devicetree/bindings/dma/ti-dma-crossbar.txt
index b152a75..aead586 100644
--- a/Documentation/devicetree/bindings/dma/ti-dma-crossbar.txt
+++ b/Documentation/devicetree/bindings/dma/ti-dma-crossbar.txt
@@ -14,6 +14,10 @@ The DMA controller node need to have the following poroperties:
Optional properties:
- ti,dma-safe-map: Safe routing value for unused request lines
+- ti,reserved-dma-request-ranges: DMA request ranges which should not be used
+ when mapping xbar input to DMA request, they are either
+ allocated to be used by for example the DSP or they are used as
+ memcpy channels in eDMA.
Notes:
When requesting channel via ti,dra7-dma-crossbar, the DMA clinet must request
@@ -46,6 +50,8 @@ sdma_xbar: dma-router@4a002b78 {
#dma-cells = <1>;
dma-requests = <205>;
ti,dma-safe-map = <0>;
+ /* Protect the sDMA request ranges: 10-14 and 100-126 */
+ ti,reserved-dma-request-ranges = <10 5>, <100 27>;
dma-masters = <&sdma>;
};
diff --git a/Documentation/devicetree/bindings/extcon/extcon-arizona.txt b/Documentation/devicetree/bindings/extcon/extcon-arizona.txt
index e1705fa..e27341f 100644
--- a/Documentation/devicetree/bindings/extcon/extcon-arizona.txt
+++ b/Documentation/devicetree/bindings/extcon/extcon-arizona.txt
@@ -13,3 +13,63 @@ Optional properties:
ARIZONA_ACCDET_MODE_HPR or 2 - Headphone detect mode is set to HPDETR
If this node is not mentioned or if the value is unknown, then
headphone detection mode is set to HPDETL.
+
+ - wlf,use-jd2 : Use the additional JD input along with JD1 for dual pin jack
+ detection.
+ - wlf,use-jd2-nopull : Internal pull on JD2 is disabled when used for
+ jack detection.
+ - wlf,jd-invert : Invert the polarity of the jack detection switch
+
+ - wlf,micd-software-compare : Use a software comparison to determine mic
+ presence
+ - wlf,micd-detect-debounce : Additional software microphone detection
+ debounce specified in milliseconds.
+ - wlf,micd-pol-gpio : GPIO specifier for the GPIO controlling the headset
+ polarity if one exists.
+ - wlf,micd-bias-start-time : Time allowed for MICBIAS to startup prior to
+ performing microphone detection, specified as per the ARIZONA_MICD_TIME_XXX
+ defines.
+ - wlf,micd-rate : Delay between successive microphone detection measurements,
+ specified as per the ARIZONA_MICD_TIME_XXX defines.
+ - wlf,micd-dbtime : Microphone detection hardware debounces specified as the
+ number of measurements to take, valid values being 2 and 4.
+ - wlf,micd-timeout-ms : Timeout for microphone detection, specified in
+ milliseconds.
+ - wlf,micd-force-micbias : Force MICBIAS continuously on during microphone
+ detection.
+ - wlf,micd-configs : Headset polarity configurations (generally used for
+ detection of CTIA / OMTP headsets), the field can be of variable length
+ but should always be a multiple of 3 cells long, each three cell group
+ represents one polarity configuration.
+ The first cell defines the accessory detection pin, zero will use MICDET1
+ and all other values will use MICDET2.
+ The second cell represents the MICBIAS to be used.
+ The third cell represents the value of the micd-pol-gpio pin.
+
+ - wlf,gpsw : Settings for the general purpose switch
+
+Example:
+
+codec: wm8280@0 {
+ compatible = "wlf,wm8280";
+ reg = <0>;
+ ...
+
+ wlf,use-jd2;
+ wlf,use-jd2-nopull;
+ wlf,jd-invert;
+
+ wlf,micd-software-compare;
+ wlf,micd-detect-debounce = <0>;
+ wlf,micd-pol-gpio = <&codec 2 0>;
+ wlf,micd-rate = <ARIZONA_MICD_TIME_8MS>;
+ wlf,micd-dbtime = <4>;
+ wlf,micd-timeout-ms = <100>;
+ wlf,micd-force-micbias;
+ wlf,micd-configs = <
+ 0 1 0 /* MICDET1 MICBIAS1 GPIO=low */
+ 1 2 1 /* MICDET2 MICBIAS2 GPIO=high */
+ >;
+
+ wlf,gpsw = <0>;
+};
diff --git a/Documentation/devicetree/bindings/extcon/extcon-max3355.txt b/Documentation/devicetree/bindings/extcon/extcon-max3355.txt
new file mode 100644
index 0000000..f2288ea
--- /dev/null
+++ b/Documentation/devicetree/bindings/extcon/extcon-max3355.txt
@@ -0,0 +1,21 @@
+Maxim Integrated MAX3355 USB OTG chip
+-------------------------------------
+
+MAX3355 integrates a charge pump and comparators to enable a system with an
+integrated USB OTG dual-role transceiver to function as a USB OTG dual-role
+device.
+
+Required properties:
+- compatible: should be "maxim,max3355";
+- maxim,shdn-gpios: should contain a phandle and GPIO specifier for the GPIO pin
+ connected to the MAX3355's SHDN# pin;
+- id-gpios: should contain a phandle and GPIO specifier for the GPIO pin
+ connected to the MAX3355's ID_OUT pin.
+
+Example:
+
+ usb-otg {
+ compatible = "maxim,max3355";
+ maxim,shdn-gpios = <&gpio2 4 GPIO_ACTIVE_LOW>;
+ id-gpios = <&gpio5 31 GPIO_ACTIVE_HIGH>;
+ };
diff --git a/Documentation/devicetree/bindings/i2c/trivial-devices.txt b/Documentation/devicetree/bindings/i2c/trivial-devices.txt
index c50cf13..f6fec95 100644
--- a/Documentation/devicetree/bindings/i2c/trivial-devices.txt
+++ b/Documentation/devicetree/bindings/i2c/trivial-devices.txt
@@ -20,6 +20,7 @@ adi,adt7476 +/-1C TDM Extended Temp Range I.C
adi,adt7490 +/-1C TDM Extended Temp Range I.C
adi,adxl345 Three-Axis Digital Accelerometer
adi,adxl346 Three-Axis Digital Accelerometer (backward-compatibility value "adi,adxl345" must be listed too)
+ams,iaq-core AMS iAQ-Core VOC Sensor
at,24c08 i2c serial eeprom (24cxx)
atmel,24c00 i2c serial eeprom (24cxx)
atmel,24c01 i2c serial eeprom (24cxx)
diff --git a/Documentation/devicetree/bindings/iio/accel/mma8452.txt b/Documentation/devicetree/bindings/iio/accel/mma8452.txt
index e3c3746..3c10e85 100644
--- a/Documentation/devicetree/bindings/iio/accel/mma8452.txt
+++ b/Documentation/devicetree/bindings/iio/accel/mma8452.txt
@@ -7,13 +7,18 @@ Required properties:
* "fsl,mma8453"
* "fsl,mma8652"
* "fsl,mma8653"
+
- reg: the I2C address of the chip
Optional properties:
- interrupt-parent: should be the phandle for the interrupt controller
+
- interrupts: interrupt mapping for GPIO IRQ
+ - interrupt-names: should contain "INT1" and/or "INT2", the accelerometer's
+ interrupt line in use.
+
Example:
mma8453fc@1d {
@@ -21,4 +26,5 @@ Example:
reg = <0x1d>;
interrupt-parent = <&gpio1>;
interrupts = <5 0>;
+ interrupt-names = "INT2";
};
diff --git a/Documentation/devicetree/bindings/iio/adc/imx7d-adc.txt b/Documentation/devicetree/bindings/iio/adc/imx7d-adc.txt
new file mode 100644
index 0000000..5c184b9
--- /dev/null
+++ b/Documentation/devicetree/bindings/iio/adc/imx7d-adc.txt
@@ -0,0 +1,22 @@
+Freescale imx7d ADC bindings
+
+The devicetree bindings are for the ADC driver written for
+imx7d SoC.
+
+Required properties:
+- compatible: Should be "fsl,imx7d-adc"
+- reg: Offset and length of the register set for the ADC device
+- interrupts: The interrupt number for the ADC device
+- clocks: The root clock of the ADC controller
+- clock-names: Must contain "adc", matching entry in the clocks property
+- vref-supply: The regulator supply ADC reference voltage
+
+Example:
+adc1: adc@30610000 {
+ compatible = "fsl,imx7d-adc";
+ reg = <0x30610000 0x10000>;
+ interrupts = <GIC_SPI 98 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&clks IMX7D_ADC_ROOT_CLK>;
+ clock-names = "adc";
+ vref-supply = <&reg_vcc_3v3_mcu>;
+};
diff --git a/Documentation/devicetree/bindings/iio/adc/mcp320x.txt b/Documentation/devicetree/bindings/iio/adc/mcp320x.txt
index 2a1f3af..bcd3ac8 100644
--- a/Documentation/devicetree/bindings/iio/adc/mcp320x.txt
+++ b/Documentation/devicetree/bindings/iio/adc/mcp320x.txt
@@ -10,16 +10,28 @@ must be specified.
Required properties:
- compatible: Must be one of the following, depending on the
model:
- "mcp3001"
- "mcp3002"
- "mcp3004"
- "mcp3008"
- "mcp3201"
- "mcp3202"
- "mcp3204"
- "mcp3208"
- "mcp3301"
+ "mcp3001" (DEPRECATED)
+ "mcp3002" (DEPRECATED)
+ "mcp3004" (DEPRECATED)
+ "mcp3008" (DEPRECATED)
+ "mcp3201" (DEPRECATED)
+ "mcp3202" (DEPRECATED)
+ "mcp3204" (DEPRECATED)
+ "mcp3208" (DEPRECATED)
+ "mcp3301" (DEPRECATED)
+ "microchip,mcp3001"
+ "microchip,mcp3002"
+ "microchip,mcp3004"
+ "microchip,mcp3008"
+ "microchip,mcp3201"
+ "microchip,mcp3202"
+ "microchip,mcp3204"
+ "microchip,mcp3208"
+ "microchip,mcp3301"
+
+ NOTE: The use of the compatibles with no vendor prefix
+ is deprecated and only listed because old DT use them.
Examples:
spi_controller {
diff --git a/Documentation/devicetree/bindings/iio/adc/mcp3422.txt b/Documentation/devicetree/bindings/iio/adc/mcp3422.txt
index 333139c..dcae4cc 100644
--- a/Documentation/devicetree/bindings/iio/adc/mcp3422.txt
+++ b/Documentation/devicetree/bindings/iio/adc/mcp3422.txt
@@ -1,7 +1,8 @@
-* Microchip mcp3422/3/4/6/7/8 chip family (ADC)
+* Microchip mcp3421/2/3/4/6/7/8 chip family (ADC)
Required properties:
- compatible: Should be
+ "microchip,mcp3421" or
"microchip,mcp3422" or
"microchip,mcp3423" or
"microchip,mcp3424" or
diff --git a/Documentation/devicetree/bindings/iio/adc/palmas-gpadc.txt b/Documentation/devicetree/bindings/iio/adc/palmas-gpadc.txt
new file mode 100644
index 0000000..4bb9a86
--- /dev/null
+++ b/Documentation/devicetree/bindings/iio/adc/palmas-gpadc.txt
@@ -0,0 +1,48 @@
+* Palmas general purpose ADC IP block devicetree bindings
+
+Channels list:
+ 0 battery type
+ 1 battery temp NTC (optional current source)
+ 2 GP
+ 3 temp (with ext. diode, optional current source)
+ 4 GP
+ 5 GP
+ 6 VBAT_SENSE
+ 7 VCC_SENSE
+ 8 Backup Battery voltage
+ 9 external charger (VCHG)
+ 10 VBUS
+ 11 DC-DC current probe (how does this work?)
+ 12 internal die temp
+ 13 internal die temp
+ 14 USB ID pin voltage
+ 15 test network
+
+Required properties:
+- compatible : Must be "ti,palmas-gpadc".
+- #io-channel-cells: Should be set to <1>.
+
+Optional sub-nodes:
+ti,channel0-current-microamp: Channel 0 current in uA.
+ Values are rounded to derive 0uA, 5uA, 15uA, 20uA.
+ti,channel3-current-microamp: Channel 3 current in uA.
+ Values are rounded to derive 0uA, 10uA, 400uA, 800uA.
+ti,enable-extended-delay: Enable extended delay.
+
+Example:
+
+pmic {
+ compatible = "ti,twl6035-pmic", "ti,palmas-pmic";
+ ...
+ gpadc {
+ compatible = "ti,palmas-gpadc";
+ interrupts = <18 0
+ 16 0
+ 17 0>;
+ #io-channel-cells = <1>;
+ ti,channel0-current-microamp = <5>;
+ ti,channel3-current-microamp = <10>;
+ };
+ };
+ ...
+};
diff --git a/Documentation/devicetree/bindings/iio/adc/ti-adc128s052.txt b/Documentation/devicetree/bindings/iio/adc/ti-adc128s052.txt
index 15ca6b4..daa2b2c 100644
--- a/Documentation/devicetree/bindings/iio/adc/ti-adc128s052.txt
+++ b/Documentation/devicetree/bindings/iio/adc/ti-adc128s052.txt
@@ -1,7 +1,7 @@
-* Texas Instruments' ADC128S052 and ADC122S021 ADC chip
+* Texas Instruments' ADC128S052, ADC122S021 and ADC124S021 ADC chip
Required properties:
- - compatible: Should be "ti,adc128s052" or "ti,adc122s021"
+ - compatible: Should be "ti,adc128s052", "ti,adc122s021" or "ti,adc124s021"
- reg: spi chip select number for the device
- vref-supply: The regulator supply for ADC reference voltage
diff --git a/Documentation/devicetree/bindings/iio/adc/ti-ads8688.txt b/Documentation/devicetree/bindings/iio/adc/ti-ads8688.txt
new file mode 100644
index 0000000..a02337d
--- /dev/null
+++ b/Documentation/devicetree/bindings/iio/adc/ti-ads8688.txt
@@ -0,0 +1,20 @@
+* Texas Instruments' ADS8684 and ADS8688 ADC chip
+
+Required properties:
+ - compatible: Should be "ti,ads8684" or "ti,ads8688"
+ - reg: spi chip select number for the device
+
+Recommended properties:
+ - spi-max-frequency: Definition as per
+ Documentation/devicetree/bindings/spi/spi-bus.txt
+
+Optional properties:
+ - vref-supply: The regulator supply for ADC reference voltage
+
+Example:
+adc@0 {
+ compatible = "ti,ads8688";
+ reg = <0>;
+ vref-supply = <&vdd_supply>;
+ spi-max-frequency = <1000000>;
+};
diff --git a/Documentation/devicetree/bindings/iio/health/max30100.txt b/Documentation/devicetree/bindings/iio/health/max30100.txt
new file mode 100644
index 0000000..f6fbac6
--- /dev/null
+++ b/Documentation/devicetree/bindings/iio/health/max30100.txt
@@ -0,0 +1,21 @@
+Maxim MAX30100 heart rate and pulse oximeter sensor
+
+* https://datasheets.maximintegrated.com/en/ds/MAX30100.pdf
+
+Required properties:
+ - compatible: must be "maxim,max30100"
+ - reg: the I2C address of the sensor
+ - interrupt-parent: should be the phandle for the interrupt controller
+ - interrupts: the sole interrupt generated by the device
+
+ Refer to interrupt-controller/interrupts.txt for generic
+ interrupt client node bindings.
+
+Example:
+
+max30100@057 {
+ compatible = "maxim,max30100";
+ reg = <57>;
+ interrupt-parent = <&gpio1>;
+ interrupts = <16 2>;
+};
diff --git a/Documentation/devicetree/bindings/iio/light/us5182d.txt b/Documentation/devicetree/bindings/iio/light/us5182d.txt
index 6f0a530..a619799 100644
--- a/Documentation/devicetree/bindings/iio/light/us5182d.txt
+++ b/Documentation/devicetree/bindings/iio/light/us5182d.txt
@@ -7,13 +7,24 @@ Required properties:
Optional properties:
- upisemi,glass-coef: glass attenuation factor - compensation factor of
resolution 1000 for material transmittance.
+
- upisemi,dark-ths: array of 8 elements containing 16-bit thresholds (adc
counts) corresponding to every scale.
+
- upisemi,upper-dark-gain: 8-bit dark gain compensation factor(4 int and 4
fractional bits - Q4.4) applied when light > threshold
+
- upisemi,lower-dark-gain: 8-bit dark gain compensation factor(4 int and 4
fractional bits - Q4.4) applied when light < threshold
+- upisemi,continuous: This chip has two power modes: one-shot (chip takes one
+ measurement and then shuts itself down) and continuous (
+ chip takes continuous measurements). The one-shot mode is
+ more power-friendly but the continuous mode may be more
+ reliable. If this property is specified the continuous
+ mode will be used instead of the default one-shot one for
+ raw reads.
+
If the optional properties are not specified these factors will default to the
values in the below example.
The glass-coef defaults to no compensation for the covering material.
diff --git a/Documentation/devicetree/bindings/iio/st-sensors.txt b/Documentation/devicetree/bindings/iio/st-sensors.txt
index d3ccdb1..d4b87cc 100644
--- a/Documentation/devicetree/bindings/iio/st-sensors.txt
+++ b/Documentation/devicetree/bindings/iio/st-sensors.txt
@@ -36,6 +36,7 @@ Accelerometers:
- st,lsm303dlm-accel
- st,lsm330-accel
- st,lsm303agr-accel
+- st,lis2dh12-accel
Gyroscopes:
- st,l3g4200d-gyro
diff --git a/Documentation/devicetree/bindings/input/touchscreen/goodix.txt b/Documentation/devicetree/bindings/input/touchscreen/goodix.txt
index 8ba98ee..c98757a 100644
--- a/Documentation/devicetree/bindings/input/touchscreen/goodix.txt
+++ b/Documentation/devicetree/bindings/input/touchscreen/goodix.txt
@@ -13,6 +13,17 @@ Required properties:
- interrupt-parent : Interrupt controller to which the chip is connected
- interrupts : Interrupt to which the chip is connected
+Optional properties:
+
+ - irq-gpios : GPIO pin used for IRQ. The driver uses the
+ interrupt gpio pin as output to reset the device.
+ - reset-gpios : GPIO pin used for reset
+
+ - touchscreen-inverted-x : X axis is inverted (boolean)
+ - touchscreen-inverted-y : Y axis is inverted (boolean)
+ - touchscreen-swapped-x-y : X and Y axis are swapped (boolean)
+ (swapping is done after inverting the axis)
+
Example:
i2c@00000000 {
@@ -23,6 +34,9 @@ Example:
reg = <0x5d>;
interrupt-parent = <&gpio>;
interrupts = <0 0>;
+
+ irq-gpios = <&gpio1 0 0>;
+ reset-gpios = <&gpio1 1 0>;
};
/* ... */
diff --git a/Documentation/devicetree/bindings/input/touchscreen/pixcir_i2c_ts.txt b/Documentation/devicetree/bindings/input/touchscreen/pixcir_i2c_ts.txt
index 8eb240a..697a3e7 100644
--- a/Documentation/devicetree/bindings/input/touchscreen/pixcir_i2c_ts.txt
+++ b/Documentation/devicetree/bindings/input/touchscreen/pixcir_i2c_ts.txt
@@ -9,7 +9,9 @@ Required properties:
- touchscreen-size-y: vertical resolution of touchscreen (in pixels)
Optional properties:
-- reset-gpio: GPIO connected to the RESET line of the chip
+- reset-gpios: GPIO connected to the RESET line of the chip
+- enable-gpios: GPIO connected to the ENABLE line of the chip
+- wake-gpios: GPIO connected to the WAKE line of the chip
Example:
diff --git a/Documentation/devicetree/bindings/input/touchscreen/ts4800-ts.txt b/Documentation/devicetree/bindings/input/touchscreen/ts4800-ts.txt
new file mode 100644
index 0000000..4c1c092
--- /dev/null
+++ b/Documentation/devicetree/bindings/input/touchscreen/ts4800-ts.txt
@@ -0,0 +1,11 @@
+* TS-4800 Touchscreen bindings
+
+Required properties:
+- compatible: must be "technologic,ts4800-ts"
+- reg: physical base address of the controller and length of memory mapped
+ region.
+- syscon: phandle / integers array that points to the syscon node which
+ describes the FPGA's syscon registers.
+ - phandle to FPGA's syscon
+ - offset to the touchscreen register
+ - offset to the touchscreen enable bit
diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
index 04e6bef..5fdbbcd 100644
--- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
+++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
@@ -31,6 +31,8 @@ A switch child node has the following optional property:
switch. Must be set if the switch can not detect
the presence and/or size of a connected EEPROM,
otherwise optional.
+- reset-gpios : phandle and specifier to a gpio line connected to
+ reset pin of the switch chip.
A switch may have multiple "port" children nodes
@@ -114,6 +116,7 @@ Example:
#size-cells = <0>;
reg = <17 1>; /* MDIO address 17, switch 1 in tree */
mii-bus = <&mii_bus1>;
+ reset-gpios = <&gpio5 1 GPIO_ACTIVE_LOW>;
switch1port0: port@0 {
reg = <0>;
diff --git a/Documentation/devicetree/bindings/net/hisilicon-hns-mdio.txt b/Documentation/devicetree/bindings/net/hisilicon-hns-mdio.txt
index 9c23fdf..4a7ede9 100644
--- a/Documentation/devicetree/bindings/net/hisilicon-hns-mdio.txt
+++ b/Documentation/devicetree/bindings/net/hisilicon-hns-mdio.txt
@@ -1,7 +1,12 @@
Hisilicon MDIO bus controller
Properties:
-- compatible: "hisilicon,mdio","hisilicon,hns-mdio".
+- compatible: can be one of:
+ "hisilicon,hns-mdio"
+ "hisilicon,mdio"
+ "hisilicon,hns-mdio" is recommended to be used for hip05 and later SOCs,
+ while "hisilicon,mdio" is optional for backwards compatibility only on
+ hip04 Soc.
- reg: The base address of the MDIO bus controller register bank.
- #address-cells: Must be <1>.
- #size-cells: Must be <0>. MDIO addresses have no size component.
diff --git a/Documentation/devicetree/bindings/net/ieee802154/adf7242.txt b/Documentation/devicetree/bindings/net/ieee802154/adf7242.txt
new file mode 100644
index 0000000..dea5124
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/ieee802154/adf7242.txt
@@ -0,0 +1,18 @@
+* ADF7242 IEEE 802.15.4 *
+
+Required properties:
+ - compatible: should be "adi,adf7242"
+ - spi-max-frequency: maximal bus speed (12.5 MHz)
+ - reg: the chipselect index
+ - interrupts: the interrupt generated by the device via pin IRQ1.
+ IRQ_TYPE_LEVEL_HIGH (4) or IRQ_TYPE_EDGE_FALLING (1)
+
+Example:
+
+ adf7242@0 {
+ compatible = "adi,adf7242";
+ spi-max-frequency = <10000000>;
+ reg = <0>;
+ interrupts = <98 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-parent = <&gpio3>;
+ };
diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt
index b5d7976..5c397ca 100644
--- a/Documentation/devicetree/bindings/net/macb.txt
+++ b/Documentation/devicetree/bindings/net/macb.txt
@@ -4,6 +4,7 @@ Required properties:
- compatible: Should be "cdns,[<chip>-]{macb|gem}"
Use "cdns,at91sam9260-macb" for Atmel at91sam9 SoCs or the 10/100Mbit IP
available on sama5d3 SoCs.
+ Use "cdns,np4-macb" for NP4 SoC devices.
Use "cdns,at32ap7000-macb" for other 10/100 usage or use the generic form: "cdns,macb".
Use "cdns,pc302-gem" for Picochip picoXcell pc302 and later devices based on
the Cadence GEM, or the generic form: "cdns,gem".
@@ -19,6 +20,9 @@ Required properties:
Optional elements: 'tx_clk'
- clocks: Phandles to input clocks.
+Optional properties for PHY child node:
+- reset-gpios : Should specify the gpio for phy reset
+
Examples:
macb0: ethernet@fffc4000 {
@@ -29,4 +33,8 @@ Examples:
local-mac-address = [3a 0e 03 04 05 06];
clock-names = "pclk", "hclk", "tx_clk";
clocks = <&clkc 30>, <&clkc 30>, <&clkc 13>;
+ ethernet-phy@1 {
+ reg = <0x1>;
+ reset-gpios = <&pioE 6 1>;
+ };
};
diff --git a/Documentation/devicetree/bindings/net/micrel-ksz90x1.txt b/Documentation/devicetree/bindings/net/micrel-ksz90x1.txt
index 692076f..f9c32ad 100644
--- a/Documentation/devicetree/bindings/net/micrel-ksz90x1.txt
+++ b/Documentation/devicetree/bindings/net/micrel-ksz90x1.txt
@@ -1,8 +1,9 @@
Micrel KSZ9021/KSZ9031 Gigabit Ethernet PHY
-Some boards require special tuning values, particularly when it comes to
-clock delays. You can specify clock delay values by adding
-micrel-specific properties to an Ethernet OF device node.
+Some boards require special tuning values, particularly when it comes
+to clock delays. You can specify clock delay values in the PHY OF
+device node. Deprecated, but still supported, these properties can
+also be added to an Ethernet OF device node.
Note that these settings are applied after any phy-specific fixup from
phy_fixup_list (see phy_init_hw() from drivers/net/phy/phy_device.c),
@@ -57,16 +58,6 @@ KSZ9031:
Examples:
- /* Attach to an Ethernet device with autodetected PHY */
- &enet {
- rxc-skew-ps = <3000>;
- rxdv-skew-ps = <0>;
- txc-skew-ps = <3000>;
- txen-skew-ps = <0>;
- status = "okay";
- };
-
- /* Attach to an explicitly-specified PHY */
mdio {
phy0: ethernet-phy@0 {
rxc-skew-ps = <3000>;
diff --git a/Documentation/devicetree/bindings/net/nfc/st95hf.txt b/Documentation/devicetree/bindings/net/nfc/st95hf.txt
new file mode 100644
index 0000000..ea3178b
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/nfc/st95hf.txt
@@ -0,0 +1,50 @@
+* STMicroelectronics : NFC Transceiver ST95HF
+
+ST NFC Transceiver is required to attach with SPI bus.
+ST95HF node should be defined in DT as SPI slave device of SPI
+master with which ST95HF transceiver is physically connected.
+The properties defined below are required to be the part of DT
+to include ST95HF transceiver into the platform.
+
+Required properties:
+===================
+- reg: Address of SPI slave "ST95HF transceiver" on SPI master bus.
+
+- compatible: should be "st,st95hf" for ST95HF NFC transceiver
+
+- spi-max-frequency: Max. operating SPI frequency for ST95HF
+ transceiver.
+
+- enable-gpio: GPIO line to enable ST95HF transceiver.
+
+- interrupt-parent : Standard way to specify the controller to which
+ ST95HF transceiver's interrupt is routed.
+
+- interrupts : Standard way to define ST95HF transceiver's out
+ interrupt.
+
+Optional property:
+=================
+- st95hfvin-supply : This is an optional property. It contains a
+ phandle to ST95HF transceiver's regulator supply node in DT.
+
+Example:
+=======
+spi@9840000 {
+ reg = <0x9840000 0x110>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ cs-gpios = <&pio0 4>;
+ status = "okay";
+
+ st95hf@0{
+ reg = <0>;
+ compatible = "st,st95hf";
+ status = "okay";
+ spi-max-frequency = <1000000>;
+ enable-gpio = <&pio4 0>;
+ interrupt-parent = <&pio0>;
+ interrupts = <7 IRQ_TYPE_EDGE_FALLING>;
+ };
+
+};
diff --git a/Documentation/devicetree/bindings/net/renesas,ravb.txt b/Documentation/devicetree/bindings/net/renesas,ravb.txt
index b486f3f..81a9f9e 100644
--- a/Documentation/devicetree/bindings/net/renesas,ravb.txt
+++ b/Documentation/devicetree/bindings/net/renesas,ravb.txt
@@ -5,8 +5,18 @@ interface contains.
Required properties:
- compatible: "renesas,etheravb-r8a7790" if the device is a part of R8A7790 SoC.
+ "renesas,etheravb-r8a7791" if the device is a part of R8A7791 SoC.
+ "renesas,etheravb-r8a7792" if the device is a part of R8A7792 SoC.
+ "renesas,etheravb-r8a7793" if the device is a part of R8A7793 SoC.
"renesas,etheravb-r8a7794" if the device is a part of R8A7794 SoC.
"renesas,etheravb-r8a7795" if the device is a part of R8A7795 SoC.
+ "renesas,etheravb-rcar-gen2" for generic R-Car Gen 2 compatible interface.
+ "renesas,etheravb-rcar-gen3" for generic R-Car Gen 3 compatible interface.
+
+ When compatible with the generic version, nodes must list the
+ SoC-specific version corresponding to the platform first
+ followed by the generic version.
+
- reg: offset and length of (1) the register block and (2) the stream buffer.
- interrupts: A list of interrupt-specifiers, one for each entry in
interrupt-names.
@@ -37,7 +47,7 @@ Optional properties:
Example:
ethernet@e6800000 {
- compatible = "renesas,etheravb-r8a7795";
+ compatible = "renesas,etheravb-r8a7795", "renesas,etheravb-rcar-gen3";
reg = <0 0xe6800000 0 0x800>, <0 0xe6a00000 0 0x10000>;
interrupt-parent = <&gic>;
interrupts = <GIC_SPI 39 IRQ_TYPE_LEVEL_HIGH>,
diff --git a/Documentation/devicetree/bindings/net/socfpga-dwmac.txt b/Documentation/devicetree/bindings/net/socfpga-dwmac.txt
index 3a9d679..72d82d6 100644
--- a/Documentation/devicetree/bindings/net/socfpga-dwmac.txt
+++ b/Documentation/devicetree/bindings/net/socfpga-dwmac.txt
@@ -11,6 +11,8 @@ Required properties:
designware version numbers documented in stmmac.txt
- altr,sysmgr-syscon : Should be the phandle to the system manager node that
encompasses the glue register, the register offset, and the register shift.
+ - altr,f2h_ptp_ref_clk use f2h_ptp_ref_clk instead of default eosc1 clock
+ for ptp ref clk. This affects all emacs as the clock is common.
Optional properties:
altr,emac-splitter: Should be the phandle to the emac splitter soft IP node if
diff --git a/Documentation/devicetree/bindings/net/stmmac.txt b/Documentation/devicetree/bindings/net/stmmac.txt
index f34fc3c..e862a92 100644
--- a/Documentation/devicetree/bindings/net/stmmac.txt
+++ b/Documentation/devicetree/bindings/net/stmmac.txt
@@ -35,18 +35,18 @@ Optional properties:
- reset-names: Should contain the reset signal name "stmmaceth", if a
reset phandle is given
- max-frame-size: See ethernet.txt file in the same directory
-- clocks: If present, the first clock should be the GMAC main clock and
- the second clock should be peripheral's register interface clock. Further
- clocks may be specified in derived bindings.
-- clock-names: One name for each entry in the clocks property, the
- first one should be "stmmaceth" and the second one should be "pclk".
-- clk_ptp_ref: this is the PTP reference clock; in case of the PTP is
- available this clock is used for programming the Timestamp Addend Register.
- If not passed then the system clock will be used and this is fine on some
- platforms.
+- clocks: If present, the first clock should be the GMAC main clock
+ The optional second clock should be peripheral's register interface clock.
+ The third optional clock should be the ptp reference clock.
+ Further clocks may be specified in derived bindings.
+- clock-names: One name for each entry in the clocks property.
+ The first one should be "stmmaceth".
+ The optional second one should be "pclk".
+ The optional third one should be "clk_ptp_ref".
- snps,burst_len: The AXI burst lenth value of the AXI BUS MODE register.
- tx-fifo-depth: See ethernet.txt file in the same directory
- rx-fifo-depth: See ethernet.txt file in the same directory
+- mdio: with compatible = "snps,dwmac-mdio", create and register mdio bus.
Examples:
@@ -65,4 +65,11 @@ Examples:
tx-fifo-depth = <16384>;
clocks = <&clock>;
clock-names = "stmmaceth";
+ mdio0 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "snps,dwmac-mdio";
+ phy1: ethernet-phy@0 {
+ };
+ };
};
diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
index 0cb44dc..601256f 100644
--- a/Documentation/devicetree/bindings/opp/opp.txt
+++ b/Documentation/devicetree/bindings/opp/opp.txt
@@ -45,21 +45,10 @@ Devices supporting OPPs must set their "operating-points-v2" property with
phandle to a OPP table in their DT node. The OPP core will use this phandle to
find the operating points for the device.
-Devices may want to choose OPP tables at runtime and so can provide a list of
-phandles here. But only *one* of them should be chosen at runtime. This must be
-accompanied by a corresponding "operating-points-names" property, to uniquely
-identify the OPP tables.
-
If required, this can be extended for SoC vendor specfic bindings. Such bindings
should be documented as Documentation/devicetree/bindings/power/<vendor>-opp.txt
and should have a compatible description like: "operating-points-v2-<vendor>".
-Optional properties:
-- operating-points-names: Names of OPP tables (required if multiple OPP
- tables are present), to uniquely identify them. The same list must be present
- for all the CPUs which are sharing clock/voltage rails and hence the OPP
- tables.
-
* OPP Table Node
This describes the OPPs belonging to a device. This node can have following
@@ -100,6 +89,14 @@ Optional properties:
Entries for multiple regulators must be present in the same order as
regulators are specified in device's DT node.
+- opp-microvolt-<name>: Named opp-microvolt property. This is exactly similar to
+ the above opp-microvolt property, but allows multiple voltage ranges to be
+ provided for the same OPP. At runtime, the platform can pick a <name> and
+ matching opp-microvolt-<name> property will be enabled for all OPPs. If the
+ platform doesn't pick a specific <name> or the <name> doesn't match with any
+ opp-microvolt-<name> properties, then opp-microvolt property shall be used, if
+ present.
+
- opp-microamp: The maximum current drawn by the device in microamperes
considering system specific parameters (such as transients, process, aging,
maximum operating temperature range etc.) as necessary. This may be used to
@@ -112,6 +109,9 @@ Optional properties:
for few regulators, then this should be marked as zero for them. If it isn't
required for any regulator, then this property need not be present.
+- opp-microamp-<name>: Named opp-microamp property. Similar to
+ opp-microvolt-<name> property, but for microamp instead.
+
- clock-latency-ns: Specifies the maximum possible transition latency (in
nanoseconds) for switching to this OPP from any other OPP.
@@ -123,6 +123,26 @@ Optional properties:
- opp-suspend: Marks the OPP to be used during device suspend. Only one OPP in
the table should have this.
+- opp-supported-hw: This enables us to select only a subset of OPPs from the
+ larger OPP table, based on what version of the hardware we are running on. We
+ still can't have multiple nodes with the same opp-hz value in OPP table.
+
+ It's an user defined array containing a hierarchy of hardware version numbers,
+ supported by the OPP. For example: a platform with hierarchy of three levels
+ of versions (A, B and C), this field should be like <X Y Z>, where X
+ corresponds to Version hierarchy A, Y corresponds to version hierarchy B and Z
+ corresponds to version hierarchy C.
+
+ Each level of hierarchy is represented by a 32 bit value, and so there can be
+ only 32 different supported version per hierarchy. i.e. 1 bit per version. A
+ value of 0xFFFFFFFF will enable the OPP for all versions for that hierarchy
+ level. And a value of 0x00000000 will disable the OPP completely, and so we
+ never want that to happen.
+
+ If 32 values aren't sufficient for a version hierarchy, than that version
+ hierarchy can be contained in multiple 32 bit values. i.e. <X Y Z1 Z2> in the
+ above example, Z1 & Z2 refer to the version hierarchy Z.
+
- status: Marks the node enabled/disabled.
Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together.
@@ -157,20 +177,20 @@ Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together.
compatible = "operating-points-v2";
opp-shared;
- opp00 {
+ opp@1000000000 {
opp-hz = /bits/ 64 <1000000000>;
opp-microvolt = <970000 975000 985000>;
opp-microamp = <70000>;
clock-latency-ns = <300000>;
opp-suspend;
};
- opp01 {
+ opp@1100000000 {
opp-hz = /bits/ 64 <1100000000>;
opp-microvolt = <980000 1000000 1010000>;
opp-microamp = <80000>;
clock-latency-ns = <310000>;
};
- opp02 {
+ opp@1200000000 {
opp-hz = /bits/ 64 <1200000000>;
opp-microvolt = <1025000>;
clock-latency-ns = <290000>;
@@ -236,20 +256,20 @@ independently.
* independently.
*/
- opp00 {
+ opp@1000000000 {
opp-hz = /bits/ 64 <1000000000>;
opp-microvolt = <970000 975000 985000>;
opp-microamp = <70000>;
clock-latency-ns = <300000>;
opp-suspend;
};
- opp01 {
+ opp@1100000000 {
opp-hz = /bits/ 64 <1100000000>;
opp-microvolt = <980000 1000000 1010000>;
opp-microamp = <80000>;
clock-latency-ns = <310000>;
};
- opp02 {
+ opp@1200000000 {
opp-hz = /bits/ 64 <1200000000>;
opp-microvolt = <1025000>;
opp-microamp = <90000;
@@ -312,20 +332,20 @@ DVFS state together.
compatible = "operating-points-v2";
opp-shared;
- opp00 {
+ opp@1000000000 {
opp-hz = /bits/ 64 <1000000000>;
opp-microvolt = <970000 975000 985000>;
opp-microamp = <70000>;
clock-latency-ns = <300000>;
opp-suspend;
};
- opp01 {
+ opp@1100000000 {
opp-hz = /bits/ 64 <1100000000>;
opp-microvolt = <980000 1000000 1010000>;
opp-microamp = <80000>;
clock-latency-ns = <310000>;
};
- opp02 {
+ opp@1200000000 {
opp-hz = /bits/ 64 <1200000000>;
opp-microvolt = <1025000>;
opp-microamp = <90000>;
@@ -338,20 +358,20 @@ DVFS state together.
compatible = "operating-points-v2";
opp-shared;
- opp10 {
+ opp@1300000000 {
opp-hz = /bits/ 64 <1300000000>;
opp-microvolt = <1045000 1050000 1055000>;
opp-microamp = <95000>;
clock-latency-ns = <400000>;
opp-suspend;
};
- opp11 {
+ opp@1400000000 {
opp-hz = /bits/ 64 <1400000000>;
opp-microvolt = <1075000>;
opp-microamp = <100000>;
clock-latency-ns = <400000>;
};
- opp12 {
+ opp@1500000000 {
opp-hz = /bits/ 64 <1500000000>;
opp-microvolt = <1010000 1100000 1110000>;
opp-microamp = <95000>;
@@ -378,7 +398,7 @@ Example 4: Handling multiple regulators
compatible = "operating-points-v2";
opp-shared;
- opp00 {
+ opp@1000000000 {
opp-hz = /bits/ 64 <1000000000>;
opp-microvolt = <970000>, /* Supply 0 */
<960000>, /* Supply 1 */
@@ -391,7 +411,7 @@ Example 4: Handling multiple regulators
/* OR */
- opp00 {
+ opp@1000000000 {
opp-hz = /bits/ 64 <1000000000>;
opp-microvolt = <970000 975000 985000>, /* Supply 0 */
<960000 965000 975000>, /* Supply 1 */
@@ -404,7 +424,7 @@ Example 4: Handling multiple regulators
/* OR */
- opp00 {
+ opp@1000000000 {
opp-hz = /bits/ 64 <1000000000>;
opp-microvolt = <970000 975000 985000>, /* Supply 0 */
<960000 965000 975000>, /* Supply 1 */
@@ -417,7 +437,8 @@ Example 4: Handling multiple regulators
};
};
-Example 5: Multiple OPP tables
+Example 5: opp-supported-hw
+(example: three level hierarchy of versions: cuts, substrate and process)
/ {
cpus {
@@ -426,40 +447,73 @@ Example 5: Multiple OPP tables
...
cpu-supply = <&cpu_supply>
- operating-points-v2 = <&cpu0_opp_table_slow>, <&cpu0_opp_table_fast>;
- operating-points-names = "slow", "fast";
+ operating-points-v2 = <&cpu0_opp_table_slow>;
};
};
- cpu0_opp_table_slow: opp_table_slow {
+ opp_table {
compatible = "operating-points-v2";
status = "okay";
opp-shared;
- opp00 {
+ opp@600000000 {
+ /*
+ * Supports all substrate and process versions for 0xF
+ * cuts, i.e. only first four cuts.
+ */
+ opp-supported-hw = <0xF 0xFFFFFFFF 0xFFFFFFFF>
opp-hz = /bits/ 64 <600000000>;
+ opp-microvolt = <900000 915000 925000>;
...
};
- opp01 {
+ opp@800000000 {
+ /*
+ * Supports:
+ * - cuts: only one, 6th cut (represented by 6th bit).
+ * - substrate: supports 16 different substrate versions
+ * - process: supports 9 different process versions
+ */
+ opp-supported-hw = <0x20 0xff0000ff 0x0000f4f0>
opp-hz = /bits/ 64 <800000000>;
+ opp-microvolt = <900000 915000 925000>;
...
};
};
+};
+
+Example 6: opp-microvolt-<name>, opp-microamp-<name>:
+(example: device with two possible microvolt ranges: slow and fast)
- cpu0_opp_table_fast: opp_table_fast {
+/ {
+ cpus {
+ cpu@0 {
+ compatible = "arm,cortex-a7";
+ ...
+
+ operating-points-v2 = <&cpu0_opp_table>;
+ };
+ };
+
+ cpu0_opp_table: opp_table0 {
compatible = "operating-points-v2";
- status = "okay";
opp-shared;
- opp10 {
+ opp@1000000000 {
opp-hz = /bits/ 64 <1000000000>;
- ...
+ opp-microvolt-slow = <900000 915000 925000>;
+ opp-microvolt-fast = <970000 975000 985000>;
+ opp-microamp-slow = <70000>;
+ opp-microamp-fast = <71000>;
};
- opp11 {
- opp-hz = /bits/ 64 <1100000000>;
- ...
+ opp@1200000000 {
+ opp-hz = /bits/ 64 <1200000000>;
+ opp-microvolt-slow = <900000 915000 925000>, /* Supply vcc0 */
+ <910000 925000 935000>; /* Supply vcc1 */
+ opp-microvolt-fast = <970000 975000 985000>, /* Supply vcc0 */
+ <960000 965000 975000>; /* Supply vcc1 */
+ opp-microamp = <70000>; /* Will be used for both slow/fast */
};
};
};
diff --git a/Documentation/devicetree/bindings/phy/brcm,brcmstb-sata-phy.txt b/Documentation/devicetree/bindings/phy/brcm,brcmstb-sata-phy.txt
index 7f81ef9..d87ab7c 100644
--- a/Documentation/devicetree/bindings/phy/brcm,brcmstb-sata-phy.txt
+++ b/Documentation/devicetree/bindings/phy/brcm,brcmstb-sata-phy.txt
@@ -2,6 +2,7 @@
Required properties:
- compatible: should be one or more of
+ "brcm,bcm7425-sata-phy"
"brcm,bcm7445-sata-phy"
"brcm,phy-sata3"
- address-cells: should be 1
diff --git a/Documentation/devicetree/bindings/phy/phy-hi6220-usb.txt b/Documentation/devicetree/bindings/phy/phy-hi6220-usb.txt
new file mode 100644
index 0000000..f17a56e
--- /dev/null
+++ b/Documentation/devicetree/bindings/phy/phy-hi6220-usb.txt
@@ -0,0 +1,16 @@
+Hisilicon hi6220 usb PHY
+-----------------------
+
+Required properties:
+- compatible: should be "hisilicon,hi6220-usb-phy"
+- #phy-cells: must be 0
+- hisilicon,peripheral-syscon: phandle of syscon used to control phy.
+Refer to phy/phy-bindings.txt for the generic PHY binding properties
+
+Example:
+ usb_phy: usbphy {
+ compatible = "hisilicon,hi6220-usb-phy";
+ #phy-cells = <0>;
+ phy-supply = <&fixed_5v_hub>;
+ hisilicon,peripheral-syscon = <&sys_ctrl>;
+ };
diff --git a/Documentation/devicetree/bindings/phy/rcar-gen3-phy-usb2.txt b/Documentation/devicetree/bindings/phy/rcar-gen3-phy-usb2.txt
new file mode 100644
index 0000000..2390e4e
--- /dev/null
+++ b/Documentation/devicetree/bindings/phy/rcar-gen3-phy-usb2.txt
@@ -0,0 +1,39 @@
+* Renesas R-Car generation 3 USB 2.0 PHY
+
+This file provides information on what the device node for the R-Car generation
+3 USB 2.0 PHY contains.
+
+Required properties:
+- compatible: "renesas,usb2-phy-r8a7795" if the device is a part of an R8A7795
+ SoC.
+- reg: offset and length of the partial USB 2.0 Host register block.
+- reg-names: must be "usb2_host".
+- clocks: clock phandle and specifier pair(s).
+- #phy-cells: see phy-bindings.txt in the same directory, must be <0>.
+
+Optional properties:
+To use a USB channel where USB 2.0 Host and HSUSB (USB 2.0 Peripheral) are
+combined, the device tree node should set HSUSB properties to reg and reg-names
+properties. This is because HSUSB has registers to select USB 2.0 host or
+peripheral at that channel:
+- reg: offset and length of the partial HSUSB register block.
+- reg-names: must be "hsusb".
+- interrupts: interrupt specifier for the PHY.
+
+Example (R-Car H3):
+
+ usb-phy@ee080200 {
+ compatible = "renesas,usb2-phy-r8a7795";
+ reg = <0 0xee080200 0 0x700>, <0 0xe6590100 0 0x100>;
+ reg-names = "usb2_host", "hsusb";
+ interrupts = <GIC_SPI 108 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&mstp7_clks R8A7795_CLK_EHCI0>,
+ <&mstp7_clks R8A7795_CLK_HSUSB>;
+ };
+
+ usb-phy@ee0a0200 {
+ compatible = "renesas,usb2-phy-r8a7795";
+ reg = <0 0xee0a0200 0 0x700>;
+ reg-names = "usb2_host";
+ clocks = <&mstp7_clks R8A7795_CLK_EHCI0>;
+ };
diff --git a/Documentation/devicetree/bindings/phy/rockchip-usb-phy.txt b/Documentation/devicetree/bindings/phy/rockchip-usb-phy.txt
index 826454a..68498d5 100644
--- a/Documentation/devicetree/bindings/phy/rockchip-usb-phy.txt
+++ b/Documentation/devicetree/bindings/phy/rockchip-usb-phy.txt
@@ -1,7 +1,10 @@
ROCKCHIP USB2 PHY
Required properties:
- - compatible: rockchip,rk3288-usb-phy
+ - compatible: matching the soc type, one of
+ "rockchip,rk3066a-usb-phy"
+ "rockchip,rk3188-usb-phy"
+ "rockchip,rk3288-usb-phy"
- rockchip,grf : phandle to the syscon managing the "general
register files"
- #address-cells: should be 1
@@ -21,6 +24,7 @@ required properties:
Optional Properties:
- clocks : phandle + clock specifier for the phy clocks
- clock-names: string, clock name, must be "phyclk"
+- #clock-cells: for users of the phy-pll, should be 0
Example:
diff --git a/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt b/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt
index 0cebf74..95736d7 100644
--- a/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt
+++ b/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt
@@ -9,6 +9,7 @@ Required properties:
* allwinner,sun7i-a20-usb-phy
* allwinner,sun8i-a23-usb-phy
* allwinner,sun8i-a33-usb-phy
+ * allwinner,sun8i-h3-usb-phy
- reg : a list of offset + length pairs
- reg-names :
* "phy_ctrl"
diff --git a/Documentation/devicetree/bindings/phy/ti-phy.txt b/Documentation/devicetree/bindings/phy/ti-phy.txt
index 9cf9446..a3b3945 100644
--- a/Documentation/devicetree/bindings/phy/ti-phy.txt
+++ b/Documentation/devicetree/bindings/phy/ti-phy.txt
@@ -31,6 +31,8 @@ OMAP USB2 PHY
Required properties:
- compatible: Should be "ti,omap-usb2"
+ Should be "ti,dra7x-usb2-phy2" for the 2nd instance of USB2 PHY
+ in DRA7x
- reg : Address and length of the register set for the device.
- #phy-cells: determine the number of cells that should be given in the
phandle while referencing this phy.
@@ -40,10 +42,14 @@ Required properties:
* "wkupclk" - wakeup clock.
* "refclk" - reference clock (optional).
-Optional properties:
+Deprecated properties:
- ctrl-module : phandle of the control module used by PHY driver to power on
the PHY.
+Recommended properies:
+- syscon-phy-power : phandle/offset pair. Phandle to the system control
+ module and the register offset to power on/off the PHY.
+
This is usually a subnode of ocp2scp to which it is connected.
usb2phy@4a0ad080 {
@@ -77,14 +83,22 @@ Required properties:
* "div-clk" - apll clock
Optional properties:
- - ctrl-module : phandle of the control module used by PHY driver to power on
- the PHY.
- id: If there are multiple instance of the same type, in order to
differentiate between each instance "id" can be used (e.g., multi-lane PCIe
PHY). If "id" is not provided, it is set to default value of '1'.
- syscon-pllreset: Handle to system control region that contains the
CTRL_CORE_SMA_SW_0 register and register offset to the CTRL_CORE_SMA_SW_0
register that contains the SATA_PLL_SOFT_RESET bit. Only valid for sata_phy.
+ - syscon-pcs : phandle/offset pair. Phandle to the system control module and the
+ register offset to write the PCS delay value.
+
+Deprecated properties:
+ - ctrl-module : phandle of the control module used by PHY driver to power on
+ the PHY.
+
+Recommended properies:
+ - syscon-phy-power : phandle/offset pair. Phandle to the system control
+ module and the register offset to power on/off the PHY.
This is usually a subnode of ocp2scp to which it is connected.
diff --git a/Documentation/devicetree/bindings/serial/renesas,sci-serial.txt b/Documentation/devicetree/bindings/serial/renesas,sci-serial.txt
index 73f825e..401b1b3 100644
--- a/Documentation/devicetree/bindings/serial/renesas,sci-serial.txt
+++ b/Documentation/devicetree/bindings/serial/renesas,sci-serial.txt
@@ -2,7 +2,7 @@
Required properties:
- - compatible: Must contain one of the following:
+ - compatible: Must contain one or more of the following:
- "renesas,scif-r7s72100" for R7S72100 (RZ/A1H) SCIF compatible UART.
- "renesas,scifa-r8a73a4" for R8A73A4 (R-Mobile APE6) SCIFA compatible UART.
@@ -15,10 +15,14 @@ Required properties:
- "renesas,scifa-r8a7790" for R8A7790 (R-Car H2) SCIFA compatible UART.
- "renesas,scifb-r8a7790" for R8A7790 (R-Car H2) SCIFB compatible UART.
- "renesas,hscif-r8a7790" for R8A7790 (R-Car H2) HSCIF compatible UART.
- - "renesas,scif-r8a7791" for R8A7791 (R-Car M2) SCIF compatible UART.
- - "renesas,scifa-r8a7791" for R8A7791 (R-Car M2) SCIFA compatible UART.
- - "renesas,scifb-r8a7791" for R8A7791 (R-Car M2) SCIFB compatible UART.
- - "renesas,hscif-r8a7791" for R8A7791 (R-Car M2) HSCIF compatible UART.
+ - "renesas,scif-r8a7791" for R8A7791 (R-Car M2-W) SCIF compatible UART.
+ - "renesas,scifa-r8a7791" for R8A7791 (R-Car M2-W) SCIFA compatible UART.
+ - "renesas,scifb-r8a7791" for R8A7791 (R-Car M2-W) SCIFB compatible UART.
+ - "renesas,hscif-r8a7791" for R8A7791 (R-Car M2-W) HSCIF compatible UART.
+ - "renesas,scif-r8a7793" for R8A7793 (R-Car M2-N) SCIF compatible UART.
+ - "renesas,scifa-r8a7793" for R8A7793 (R-Car M2-N) SCIFA compatible UART.
+ - "renesas,scifb-r8a7793" for R8A7793 (R-Car M2-N) SCIFB compatible UART.
+ - "renesas,hscif-r8a7793" for R8A7793 (R-Car M2-N) HSCIF compatible UART.
- "renesas,scif-r8a7794" for R8A7794 (R-Car E2) SCIF compatible UART.
- "renesas,scifa-r8a7794" for R8A7794 (R-Car E2) SCIFA compatible UART.
- "renesas,scifb-r8a7794" for R8A7794 (R-Car E2) SCIFB compatible UART.
@@ -27,6 +31,14 @@ Required properties:
- "renesas,hscif-r8a7795" for R8A7795 (R-Car H3) HSCIF compatible UART.
- "renesas,scifa-sh73a0" for SH73A0 (SH-Mobile AG5) SCIFA compatible UART.
- "renesas,scifb-sh73a0" for SH73A0 (SH-Mobile AG5) SCIFB compatible UART.
+ - "renesas,rcar-gen1-scif" for R-Car Gen1 SCIF compatible UART,
+ - "renesas,rcar-gen2-scif" for R-Car Gen2 SCIF compatible UART,
+ - "renesas,rcar-gen3-scif" for R-Car Gen3 SCIF compatible UART,
+ - "renesas,rcar-gen2-scifa" for R-Car Gen2 SCIFA compatible UART,
+ - "renesas,rcar-gen2-scifb" for R-Car Gen2 SCIFB compatible UART,
+ - "renesas,rcar-gen1-hscif" for R-Car Gen1 HSCIF compatible UART,
+ - "renesas,rcar-gen2-hscif" for R-Car Gen2 HSCIF compatible UART,
+ - "renesas,rcar-gen3-hscif" for R-Car Gen3 HSCIF compatible UART,
- "renesas,scif" for generic SCIF compatible UART.
- "renesas,scifa" for generic SCIFA compatible UART.
- "renesas,scifb" for generic SCIFB compatible UART.
@@ -34,15 +46,26 @@ Required properties:
- "renesas,sci" for generic SCI compatible UART.
When compatible with the generic version, nodes must list the
- SoC-specific version corresponding to the platform first followed by the
- generic version.
+ SoC-specific version corresponding to the platform first, followed by the
+ family-specific and/or generic versions.
- reg: Base address and length of the I/O registers used by the UART.
- interrupts: Must contain an interrupt-specifier for the SCIx interrupt.
- clocks: Must contain a phandle and clock-specifier pair for each entry
in clock-names.
- - clock-names: Must contain "sci_ick" for the SCIx UART interface clock.
+ - clock-names: Must contain "fck" for the SCIx UART functional clock.
+ Apart from the divided functional clock, there may be other possible
+ sources for the sampling clock, depending on SCIx variant.
+ On (H)SCI(F) and some SCIFA, an additional clock may be specified:
+ - "hsck" for the optional external clock input (on HSCIF),
+ - "sck" for the optional external clock input (on other variants).
+ On UARTs equipped with a Baud Rate Generator for External Clock (BRG)
+ (some SCIF and HSCIF), additional clocks may be specified:
+ - "brg_int" for the optional internal clock source for the frequency
+ divider (typically the (AXI or SHwy) bus clock),
+ - "scif_clk" for the optional external clock source for the frequency
+ divider (SCIF_CLK).
Note: Each enabled SCIx UART should have an alias correctly numbered in the
"aliases" node.
@@ -58,12 +81,13 @@ Example:
};
scifa0: serial@e6c40000 {
- compatible = "renesas,scifa-r8a7790", "renesas,scifa";
+ compatible = "renesas,scifa-r8a7790",
+ "renesas,rcar-gen2-scifa", "renesas,scifa";
reg = <0 0xe6c40000 0 64>;
interrupt-parent = <&gic>;
interrupts = <0 144 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&mstp2_clks R8A7790_CLK_SCIFA0>;
- clock-names = "sci_ick";
+ clock-names = "fck";
dmas = <&dmac0 0x21>, <&dmac0 0x22>;
dma-names = "tx", "rx";
};
diff --git a/Documentation/devicetree/bindings/staging/ion/hi6220-ion.txt b/Documentation/devicetree/bindings/staging/ion/hi6220-ion.txt
new file mode 100644
index 0000000..c59e27c
--- /dev/null
+++ b/Documentation/devicetree/bindings/staging/ion/hi6220-ion.txt
@@ -0,0 +1,31 @@
+Hi6220 SoC ION
+===================================================================
+Required properties:
+- compatible : "hisilicon,hi6220-ion"
+- list of the ION heaps
+ - heap name : maybe heap_sys_user@0
+ - heap id : id should be unique in the system.
+ - heap base : base ddr address of the heap,0 means that
+ it is dynamic.
+ - heap size : memory size and 0 means it is dynamic.
+ - heap type : the heap type of the heap, please also
+ see the define in ion.h(drivers/staging/android/uapi/ion.h)
+-------------------------------------------------------------------
+Example:
+ hi6220-ion {
+ compatible = "hisilicon,hi6220-ion";
+ heap_sys_user@0 {
+ heap-name = "sys_user";
+ heap-id = <0x0>;
+ heap-base = <0x0>;
+ heap-size = <0x0>;
+ heap-type = "ion_system";
+ };
+ heap_sys_contig@0 {
+ heap-name = "sys_contig";
+ heap-id = <0x1>;
+ heap-base = <0x0>;
+ heap-size = <0x0>;
+ heap-type = "ion_system_contig";
+ };
+ };
diff --git a/Documentation/devicetree/bindings/usb/dwc2.txt b/Documentation/devicetree/bindings/usb/dwc2.txt
index fd132cb..2213682 100644
--- a/Documentation/devicetree/bindings/usb/dwc2.txt
+++ b/Documentation/devicetree/bindings/usb/dwc2.txt
@@ -4,6 +4,7 @@ Platform DesignWare HS OTG USB 2.0 controller
Required properties:
- compatible : One of:
- brcm,bcm2835-usb: The DWC2 USB controller instance in the BCM2835 SoC.
+ - hisilicon,hi6220-usb: The DWC2 USB controller instance in the hi6220 SoC.
- rockchip,rk3066-usb: The DWC2 USB controller instance in the rk3066 Soc;
- "rockchip,rk3188-usb", "rockchip,rk3066-usb", "snps,dwc2": for rk3188 Soc;
- "rockchip,rk3288-usb", "rockchip,rk3066-usb", "snps,dwc2": for rk3288 Soc;
diff --git a/Documentation/devicetree/bindings/usb/dwc3-xilinx.txt b/Documentation/devicetree/bindings/usb/dwc3-xilinx.txt
new file mode 100644
index 0000000..30361b3
--- /dev/null
+++ b/Documentation/devicetree/bindings/usb/dwc3-xilinx.txt
@@ -0,0 +1,33 @@
+Xilinx SuperSpeed DWC3 USB SoC controller
+
+Required properties:
+- compatible: Should contain "xlnx,zynqmp-dwc3"
+- clocks: A list of phandles for the clocks listed in clock-names
+- clock-names: Should contain the following:
+ "bus_clk" Master/Core clock, have to be >= 125 MHz for SS
+ operation and >= 60MHz for HS operation
+
+ "ref_clk" Clock source to core during PHY power down
+
+Required child node:
+A child node must exist to represent the core DWC3 IP block. The name of
+the node is not important. The content of the node is defined in dwc3.txt.
+
+Example device node:
+
+ usb@0 {
+ #address-cells = <0x2>;
+ #size-cells = <0x1>;
+ status = "okay";
+ compatible = "xlnx,zynqmp-dwc3";
+ clock-names = "bus_clk" "ref_clk";
+ clocks = <&clk125>, <&clk125>;
+ ranges;
+
+ dwc3@fe200000 {
+ compatible = "snps,dwc3";
+ reg = <0x0 0xfe200000 0x40000>;
+ interrupts = <0x0 0x41 0x4>;
+ dr_mode = "host";
+ };
+ };
diff --git a/Documentation/devicetree/bindings/usb/mt8173-xhci.txt b/Documentation/devicetree/bindings/usb/mt8173-xhci.txt
new file mode 100644
index 0000000..b3a7ffa
--- /dev/null
+++ b/Documentation/devicetree/bindings/usb/mt8173-xhci.txt
@@ -0,0 +1,51 @@
+MT8173 xHCI
+
+The device node for Mediatek SOC USB3.0 host controller
+
+Required properties:
+ - compatible : should contain "mediatek,mt8173-xhci"
+ - reg : specifies physical base address and size of the registers,
+ the first one for MAC, the second for IPPC
+ - interrupts : interrupt used by the controller
+ - power-domains : a phandle to USB power domain node to control USB's
+ mtcmos
+ - vusb33-supply : regulator of USB avdd3.3v
+
+ - clocks : a list of phandle + clock-specifier pairs, one for each
+ entry in clock-names
+ - clock-names : must contain
+ "sys_ck": for clock of xHCI MAC
+ "wakeup_deb_p0": for USB wakeup debounce clock of port0
+ "wakeup_deb_p1": for USB wakeup debounce clock of port1
+
+ - phys : a list of phandle + phy specifier pairs
+
+Optional properties:
+ - mediatek,wakeup-src : 1: ip sleep wakeup mode; 2: line state wakeup
+ mode;
+ - mediatek,syscon-wakeup : phandle to syscon used to access USB wakeup
+ control register, it depends on "mediatek,wakeup-src".
+ - vbus-supply : reference to the VBUS regulator;
+ - usb3-lpm-capable : supports USB3.0 LPM
+
+Example:
+usb30: usb@11270000 {
+ compatible = "mediatek,mt8173-xhci";
+ reg = <0 0x11270000 0 0x1000>,
+ <0 0x11280700 0 0x0100>;
+ interrupts = <GIC_SPI 115 IRQ_TYPE_LEVEL_LOW>;
+ power-domains = <&scpsys MT8173_POWER_DOMAIN_USB>;
+ clocks = <&topckgen CLK_TOP_USB30_SEL>,
+ <&pericfg CLK_PERI_USB0>,
+ <&pericfg CLK_PERI_USB1>;
+ clock-names = "sys_ck",
+ "wakeup_deb_p0",
+ "wakeup_deb_p1";
+ phys = <&phy_port0 PHY_TYPE_USB3>,
+ <&phy_port1 PHY_TYPE_USB2>;
+ vusb33-supply = <&mt6397_vusb_reg>;
+ vbus-supply = <&usb_p1_vbus>;
+ usb3-lpm-capable;
+ mediatek,syscon-wakeup = <&pericfg>;
+ mediatek,wakeup-src = <1>;
+};
diff --git a/Documentation/devicetree/bindings/usb/renesas_usb3.txt b/Documentation/devicetree/bindings/usb/renesas_usb3.txt
new file mode 100644
index 0000000..8d52766
--- /dev/null
+++ b/Documentation/devicetree/bindings/usb/renesas_usb3.txt
@@ -0,0 +1,23 @@
+Renesas Electronics USB3.0 Peripheral driver
+
+Required properties:
+ - compatible: Must contain one of the following:
+ - "renesas,r8a7795-usb3-peri"
+ - reg: Base address and length of the register for the USB3.0 Peripheral
+ - interrupts: Interrupt specifier for the USB3.0 Peripheral
+ - clocks: clock phandle and specifier pair
+
+Example:
+ usb3_peri0: usb@ee020000 {
+ compatible = "renesas,r8a7795-usb3-peri";
+ reg = <0 0xee020000 0 0x400>;
+ interrupts = <GIC_SPI 104 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&cpg CPG_MOD 328>;
+ };
+
+ usb3_peri1: usb@ee060000 {
+ compatible = "renesas,r8a7795-usb3-peri";
+ reg = <0 0xee060000 0 0x400>;
+ interrupts = <GIC_SPI 100 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&cpg CPG_MOD 327>;
+ };
diff --git a/Documentation/devicetree/bindings/usb/renesas_usbhs.txt b/Documentation/devicetree/bindings/usb/renesas_usbhs.txt
index 7d48f63..b604056 100644
--- a/Documentation/devicetree/bindings/usb/renesas_usbhs.txt
+++ b/Documentation/devicetree/bindings/usb/renesas_usbhs.txt
@@ -1,11 +1,21 @@
Renesas Electronics USBHS driver
Required properties:
- - compatible: Must contain one of the following:
- - "renesas,usbhs-r8a7790"
- - "renesas,usbhs-r8a7791"
- - "renesas,usbhs-r8a7794"
- - "renesas,usbhs-r8a7795"
+ - compatible: Must contain one or more of the following:
+
+ - "renesas,usbhs-r8a7790" for r8a7790 (R-Car H2) compatible device
+ - "renesas,usbhs-r8a7791" for r8a7791 (R-Car M2-W) compatible device
+ - "renesas,usbhs-r8a7792" for r8a7792 (R-Car V2H) compatible device
+ - "renesas,usbhs-r8a7793" for r8a7793 (R-Car M2-N) compatible device
+ - "renesas,usbhs-r8a7794" for r8a7794 (R-Car E2) compatible device
+ - "renesas,usbhs-r8a7795" for r8a7795 (R-Car H3) compatible device
+ - "renesas,rcar-gen2-usbhs" for R-Car Gen2 compatible device
+ - "renesas,rcar-gen3-usbhs" for R-Car Gen3 compatible device
+
+ When compatible with the generic version, nodes must list the
+ SoC-specific version corresponding to the platform first followed
+ by the generic version.
+
- reg: Base address and length of the register for the USBHS
- interrupts: Interrupt specifier for the USBHS
- clocks: A list of phandle + clock specifier pairs
@@ -22,7 +32,7 @@ Optional properties:
Example:
usbhs: usb@e6590000 {
- compatible = "renesas,usbhs-r8a7790";
+ compatible = "renesas,usbhs-r8a7790", "renesas,rcar-gen2-usbhs";
reg = <0 0xe6590000 0 0x100>;
interrupts = <0 107 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&mstp7_clks R8A7790_CLK_HSUSB>;
diff --git a/Documentation/devicetree/bindings/usb/usb-xhci.txt b/Documentation/devicetree/bindings/usb/usb-xhci.txt
index 86f67f0..0825732 100644
--- a/Documentation/devicetree/bindings/usb/usb-xhci.txt
+++ b/Documentation/devicetree/bindings/usb/usb-xhci.txt
@@ -3,8 +3,8 @@ USB xHCI controllers
Required properties:
- compatible: should be one of "generic-xhci",
"marvell,armada-375-xhci", "marvell,armada-380-xhci",
- "renesas,xhci-r8a7790", "renesas,xhci-r8a7791" (deprecated:
- "xhci-platform").
+ "renesas,xhci-r8a7790", "renesas,xhci-r8a7791", "renesas,xhci-r8a7793",
+ "renesas,xhci-r8a7795" (deprecated: "xhci-platform").
- reg: should contain address and length of the standard XHCI
register set for the device.
- interrupts: one XHCI interrupt should be described here.
diff --git a/Documentation/dmaengine/client.txt b/Documentation/dmaengine/client.txt
index 11fb87f..9e33189 100644
--- a/Documentation/dmaengine/client.txt
+++ b/Documentation/dmaengine/client.txt
@@ -22,25 +22,14 @@ The slave DMA usage consists of following steps:
Channel allocation is slightly different in the slave DMA context,
client drivers typically need a channel from a particular DMA
controller only and even in some cases a specific channel is desired.
- To request a channel dma_request_channel() API is used.
+ To request a channel dma_request_chan() API is used.
Interface:
- struct dma_chan *dma_request_channel(dma_cap_mask_t mask,
- dma_filter_fn filter_fn,
- void *filter_param);
- where dma_filter_fn is defined as:
- typedef bool (*dma_filter_fn)(struct dma_chan *chan, void *filter_param);
+ struct dma_chan *dma_request_chan(struct device *dev, const char *name);
- The 'filter_fn' parameter is optional, but highly recommended for
- slave and cyclic channels as they typically need to obtain a specific
- DMA channel.
-
- When the optional 'filter_fn' parameter is NULL, dma_request_channel()
- simply returns the first channel that satisfies the capability mask.
-
- Otherwise, the 'filter_fn' routine will be called once for each free
- channel which has a capability in 'mask'. 'filter_fn' is expected to
- return 'true' when the desired DMA channel is found.
+ Which will find and return the 'name' DMA channel associated with the 'dev'
+ device. The association is done via DT, ACPI or board file based
+ dma_slave_map matching table.
A channel allocated via this interface is exclusive to the caller,
until dma_release_channel() is called.
@@ -128,7 +117,7 @@ The slave DMA usage consists of following steps:
transaction.
For cyclic DMA, a callback function may wish to terminate the
- DMA via dmaengine_terminate_all().
+ DMA via dmaengine_terminate_async().
Therefore, it is important that DMA engine drivers drop any
locks before calling the callback function which may cause a
@@ -166,12 +155,29 @@ The slave DMA usage consists of following steps:
Further APIs:
-1. int dmaengine_terminate_all(struct dma_chan *chan)
+1. int dmaengine_terminate_sync(struct dma_chan *chan)
+ int dmaengine_terminate_async(struct dma_chan *chan)
+ int dmaengine_terminate_all(struct dma_chan *chan) /* DEPRECATED */
This causes all activity for the DMA channel to be stopped, and may
discard data in the DMA FIFO which hasn't been fully transferred.
No callback functions will be called for any incomplete transfers.
+ Two variants of this function are available.
+
+ dmaengine_terminate_async() might not wait until the DMA has been fully
+ stopped or until any running complete callbacks have finished. But it is
+ possible to call dmaengine_terminate_async() from atomic context or from
+ within a complete callback. dmaengine_synchronize() must be called before it
+ is safe to free the memory accessed by the DMA transfer or free resources
+ accessed from within the complete callback.
+
+ dmaengine_terminate_sync() will wait for the transfer and any running
+ complete callbacks to finish before it returns. But the function must not be
+ called from atomic context or from within a complete callback.
+
+ dmaengine_terminate_all() is deprecated and should not be used in new code.
+
2. int dmaengine_pause(struct dma_chan *chan)
This pauses activity on the DMA channel without data loss.
@@ -197,3 +203,20 @@ Further APIs:
a running DMA channel. It is recommended that DMA engine users
pause or stop (via dmaengine_terminate_all()) the channel before
using this API.
+
+5. void dmaengine_synchronize(struct dma_chan *chan)
+
+ Synchronize the termination of the DMA channel to the current context.
+
+ This function should be used after dmaengine_terminate_async() to synchronize
+ the termination of the DMA channel to the current context. The function will
+ wait for the transfer and any running complete callbacks to finish before it
+ returns.
+
+ If dmaengine_terminate_async() is used to stop the DMA channel this function
+ must be called before it is safe to free memory accessed by previously
+ submitted descriptors or to free any resources accessed within the complete
+ callback of previously submitted descriptors.
+
+ The behavior of this function is undefined if dma_async_issue_pending() has
+ been called between dmaengine_terminate_async() and this function.
diff --git a/Documentation/dmaengine/provider.txt b/Documentation/dmaengine/provider.txt
index 67d4ce4..122b7f4 100644
--- a/Documentation/dmaengine/provider.txt
+++ b/Documentation/dmaengine/provider.txt
@@ -327,8 +327,24 @@ supported.
* device_terminate_all
- Aborts all the pending and ongoing transfers on the channel
- - This command should operate synchronously on the channel,
- terminating right away all the channels
+ - For aborted transfers the complete callback should not be called
+ - Can be called from atomic context or from within a complete
+ callback of a descriptor. Must not sleep. Drivers must be able
+ to handle this correctly.
+ - Termination may be asynchronous. The driver does not have to
+ wait until the currently active transfer has completely stopped.
+ See device_synchronize.
+
+ * device_synchronize
+ - Must synchronize the termination of a channel to the current
+ context.
+ - Must make sure that memory for previously submitted
+ descriptors is no longer accessed by the DMA controller.
+ - Must make sure that all complete callbacks for previously
+ submitted descriptors have finished running and none are
+ scheduled to run.
+ - May sleep.
+
Misc notes (stuff that should be documented, but don't really know
where to put them)
diff --git a/Documentation/fault-injection/notifier-error-inject.txt b/Documentation/fault-injection/notifier-error-inject.txt
index 09adabe..83d3f4e 100644
--- a/Documentation/fault-injection/notifier-error-inject.txt
+++ b/Documentation/fault-injection/notifier-error-inject.txt
@@ -10,6 +10,7 @@ modules that can be used to test the following notifiers.
* PM notifier
* Memory hotplug notifier
* powerpc pSeries reconfig notifier
+ * Netdevice notifier
CPU notifier error injection module
-----------------------------------
@@ -87,6 +88,30 @@ Possible pSeries reconfig notifier events to be failed are:
* PSERIES_DRCONF_MEM_ADD
* PSERIES_DRCONF_MEM_REMOVE
+Netdevice notifier error injection module
+----------------------------------------------
+This feature is controlled through debugfs interface
+/sys/kernel/debug/notifier-error-inject/netdev/actions/<notifier event>/error
+
+Netdevice notifier events which can be failed are:
+
+ * NETDEV_REGISTER
+ * NETDEV_CHANGEMTU
+ * NETDEV_CHANGENAME
+ * NETDEV_PRE_UP
+ * NETDEV_PRE_TYPE_CHANGE
+ * NETDEV_POST_INIT
+ * NETDEV_PRECHANGEMTU
+ * NETDEV_PRECHANGEUPPER
+ * NETDEV_CHANGEUPPER
+
+Example: Inject netdevice mtu change error (-22 == -EINVAL)
+
+ # cd /sys/kernel/debug/notifier-error-inject/netdev
+ # echo -22 > actions/NETDEV_CHANGEMTU/error
+ # ip link set eth0 mtu 1024
+ RTNETLINK answers: Invalid argument
+
For more usage examples
-----------------------
There are tools/testing/selftests using the notifier error injection features
diff --git a/Documentation/features/seccomp/seccomp-filter/arch-support.txt b/Documentation/features/seccomp/seccomp-filter/arch-support.txt
index 76d39d6..4f66ec1 100644
--- a/Documentation/features/seccomp/seccomp-filter/arch-support.txt
+++ b/Documentation/features/seccomp/seccomp-filter/arch-support.txt
@@ -33,7 +33,7 @@
| sh: | TODO |
| sparc: | TODO |
| tile: | ok |
- | um: | TODO |
+ | um: | ok |
| unicore32: | TODO |
| x86: | ok |
| xtensa: | TODO |
diff --git a/Documentation/features/time/irq-time-acct/arch-support.txt b/Documentation/features/time/irq-time-acct/arch-support.txt
index e633162..4199ffec 100644
--- a/Documentation/features/time/irq-time-acct/arch-support.txt
+++ b/Documentation/features/time/irq-time-acct/arch-support.txt
@@ -9,7 +9,7 @@
| alpha: | .. |
| arc: | TODO |
| arm: | ok |
- | arm64: | .. |
+ | arm64: | ok |
| avr32: | TODO |
| blackfin: | TODO |
| c6x: | TODO |
diff --git a/Documentation/filesystems/configfs/configfs.txt b/Documentation/filesystems/configfs/configfs.txt
index af68efd..e5fe521 100644
--- a/Documentation/filesystems/configfs/configfs.txt
+++ b/Documentation/filesystems/configfs/configfs.txt
@@ -51,15 +51,27 @@ configfs tree is always there, whether mounted on /config or not.
An item is created via mkdir(2). The item's attributes will also
appear at this time. readdir(3) can determine what the attributes are,
read(2) can query their default values, and write(2) can store new
-values. Like sysfs, attributes should be ASCII text files, preferably
-with only one value per file. The same efficiency caveats from sysfs
-apply. Don't mix more than one attribute in one attribute file.
-
-Like sysfs, configfs expects write(2) to store the entire buffer at
-once. When writing to configfs attributes, userspace processes should
-first read the entire file, modify the portions they wish to change, and
-then write the entire buffer back. Attribute files have a maximum size
-of one page (PAGE_SIZE, 4096 on i386).
+values. Don't mix more than one attribute in one attribute file.
+
+There are two types of configfs attributes:
+
+* Normal attributes, which similar to sysfs attributes, are small ASCII text
+files, with a maximum size of one page (PAGE_SIZE, 4096 on i386). Preferably
+only one value per file should be used, and the same caveats from sysfs apply.
+Configfs expects write(2) to store the entire buffer at once. When writing to
+normal configfs attributes, userspace processes should first read the entire
+file, modify the portions they wish to change, and then write the entire
+buffer back.
+
+* Binary attributes, which are somewhat similar to sysfs binary attributes,
+but with a few slight changes to semantics. The PAGE_SIZE limitation does not
+apply, but the whole binary item must fit in single kernel vmalloc'ed buffer.
+The write(2) calls from user space are buffered, and the attributes'
+write_bin_attribute method will be invoked on the final close, therefore it is
+imperative for user-space to check the return code of close(2) in order to
+verify that the operation finished successfully.
+To avoid a malicious user OOMing the kernel, there's a per-binary attribute
+maximum buffer value.
When an item needs to be destroyed, remove it with rmdir(2). An
item cannot be destroyed if any other item has a link to it (via
@@ -171,6 +183,7 @@ among other things. For that, it needs a type.
struct configfs_item_operations *ct_item_ops;
struct configfs_group_operations *ct_group_ops;
struct configfs_attribute **ct_attrs;
+ struct configfs_bin_attribute **ct_bin_attrs;
};
The most basic function of a config_item_type is to define what
@@ -201,6 +214,32 @@ be called whenever userspace asks for a read(2) on the attribute. If an
attribute is writable and provides a ->store method, that method will be
be called whenever userspace asks for a write(2) on the attribute.
+[struct configfs_bin_attribute]
+
+ struct configfs_attribute {
+ struct configfs_attribute cb_attr;
+ void *cb_private;
+ size_t cb_max_size;
+ };
+
+The binary attribute is used when the one needs to use binary blob to
+appear as the contents of a file in the item's configfs directory.
+To do so add the binary attribute to the NULL-terminated array
+config_item_type->ct_bin_attrs, and the item appears in configfs, the
+attribute file will appear with the configfs_bin_attribute->cb_attr.ca_name
+filename. configfs_bin_attribute->cb_attr.ca_mode specifies the file
+permissions.
+The cb_private member is provided for use by the driver, while the
+cb_max_size member specifies the maximum amount of vmalloc buffer
+to be used.
+
+If binary attribute is readable and the config_item provides a
+ct_item_ops->read_bin_attribute() method, that method will be called
+whenever userspace asks for a read(2) on the attribute. The converse
+will happen for write(2). The reads/writes are bufferred so only a
+single read/write will occur; the attributes' need not concern itself
+with it.
+
[struct config_group]
A config_item cannot live in a vacuum. The only way one can be created
diff --git a/Documentation/iio/iio_configfs.txt b/Documentation/iio/iio_configfs.txt
new file mode 100644
index 0000000..f0add35
--- /dev/null
+++ b/Documentation/iio/iio_configfs.txt
@@ -0,0 +1,93 @@
+Industrial IIO configfs support
+
+1. Overview
+
+Configfs is a filesystem-based manager of kernel objects. IIO uses some
+objects that could be easily configured using configfs (e.g.: devices,
+triggers).
+
+See Documentation/filesystems/configfs/configfs.txt for more information
+about how configfs works.
+
+2. Usage
+
+In order to use configfs support in IIO we need to select it at compile
+time via CONFIG_IIO_CONFIGFS config option.
+
+Then, mount the configfs filesystem (usually under /config directory):
+
+$ mkdir /config
+$ mount -t configfs none /config
+
+At this point, all default IIO groups will be created and can be accessed
+under /config/iio. Next chapters will describe available IIO configuration
+objects.
+
+3. Software triggers
+
+One of the IIO default configfs groups is the "triggers" group. It is
+automagically accessible when the configfs is mounted and can be found
+under /config/iio/triggers.
+
+IIO software triggers implementation offers support for creating multiple
+trigger types. A new trigger type is usually implemented as a separate
+kernel module following the interface in include/linux/iio/sw_trigger.h:
+
+/*
+ * drivers/iio/trigger/iio-trig-sample.c
+ * sample kernel module implementing a new trigger type
+ */
+#include <linux/iio/sw_trigger.h>
+
+
+static struct iio_sw_trigger *iio_trig_sample_probe(const char *name)
+{
+ /*
+ * This allocates and registers an IIO trigger plus other
+ * trigger type specific initialization.
+ */
+}
+
+static int iio_trig_hrtimer_remove(struct iio_sw_trigger *swt)
+{
+ /*
+ * This undoes the actions in iio_trig_sample_probe
+ */
+}
+
+static const struct iio_sw_trigger_ops iio_trig_sample_ops = {
+ .probe = iio_trig_sample_probe,
+ .remove = iio_trig_sample_remove,
+};
+
+static struct iio_sw_trigger_type iio_trig_sample = {
+ .name = "trig-sample",
+ .owner = THIS_MODULE,
+ .ops = &iio_trig_sample_ops,
+};
+
+module_iio_sw_trigger_driver(iio_trig_sample);
+
+Each trigger type has its own directory under /config/iio/triggers. Loading
+iio-trig-sample module will create 'trig-sample' trigger type directory
+/config/iio/triggers/trig-sample.
+
+We support the following interrupt sources (trigger types):
+ * hrtimer, uses high resolution timers as interrupt source
+
+3.1 Hrtimer triggers creation and destruction
+
+Loading iio-trig-hrtimer module will register hrtimer trigger types allowing
+users to create hrtimer triggers under /config/iio/triggers/hrtimer.
+
+e.g:
+
+$ mkdir /config/triggers/hrtimer/instance1
+$ rmdir /config/triggers/hrtimer/instance1
+
+Each trigger can have one or more attributes specific to the trigger type.
+
+3.2 "hrtimer" trigger types attributes
+
+"hrtimer" trigger type doesn't have any configurable attribute from /config dir.
+It does introduce the sampling_frequency attribute to trigger directory.
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1a8169b..b7d4487 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -730,16 +730,17 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
uart[8250],io,<addr>[,options]
uart[8250],mmio,<addr>[,options]
+ uart[8250],mmio16,<addr>[,options]
uart[8250],mmio32,<addr>[,options]
uart[8250],0x<addr>[,options]
Start an early, polled-mode console on the 8250/16550
UART at the specified I/O port or MMIO address,
switching to the matching ttyS device later.
MMIO inter-register address stride is either 8-bit
- (mmio) or 32-bit (mmio32).
- If none of [io|mmio|mmio32], <addr> is assumed to be
- equivalent to 'mmio'. 'options' are specified in the
- same format described for ttyS above; if unspecified,
+ (mmio), 16-bit (mmio16), or 32-bit (mmio32).
+ If none of [io|mmio|mmio16|mmio32], <addr> is assumed
+ to be equivalent to 'mmio'. 'options' are specified in
+ the same format described for ttyS above; if unspecified,
the h/w is not re-initialized.
hvc<n> Use the hypervisor console device <n>. This is for
@@ -1011,10 +1012,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
unspecified, the h/w is not initialized.
pl011,<addr>
+ pl011,mmio32,<addr>
Start an early, polled-mode console on a pl011 serial
port at the specified address. The pl011 serial port
must already be setup and configured. Options are not
- yet supported.
+ yet supported. If 'mmio32' is specified, then only
+ the driver will use only 32-bit accessors to read/write
+ the device registers.
msm_serial,<addr>
Start an early, polled-mode console on an msm serial
@@ -2584,8 +2588,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
notsc [BUGS=X86-32] Disable Time Stamp Counter
- nousb [USB] Disable the USB subsystem
-
nowatchdog [KNL] Disable both lockup detectors, i.e.
soft-lockup and NMI watchdog (hard-lockup).
@@ -3900,6 +3902,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
usbcore.usbfs_snoop=
[USB] Set to log all usbfs traffic (default 0 = off).
+ usbcore.usbfs_snoop_max=
+ [USB] Maximum number of bytes to snoop in each URB
+ (default = 65536).
+
usbcore.blinkenlights=
[USB] Set to cycle leds on hubs (default 0 = off).
@@ -3920,6 +3926,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
USB_REQ_GET_DESCRIPTOR request in milliseconds
(default 5000 = 5.0 seconds).
+ usbcore.nousb [USB] Disable the USB subsystem
+
usbhid.mousepoll=
[USBHID] The interval which mice are to be polled at.
diff --git a/Documentation/networking/batman-adv.txt b/Documentation/networking/batman-adv.txt
index 58e4904..ff23b75 100644
--- a/Documentation/networking/batman-adv.txt
+++ b/Documentation/networking/batman-adv.txt
@@ -115,14 +115,17 @@ The "bat0" interface can be used like any other regular inter-
face. It needs an IP address which can be either statically con-
figured or dynamically (by using DHCP or similar services):
-# NodeA: ifconfig bat0 192.168.0.1
-# NodeB: ifconfig bat0 192.168.0.2
+# NodeA: ip link set up dev bat0
+# NodeA: ip addr add 192.168.0.1/24 dev bat0
+
+# NodeB: ip link set up dev bat0
+# NodeB: ip addr add 192.168.0.2/24 dev bat0
# NodeB: ping 192.168.0.1
Note: In order to avoid problems remove all IP addresses previ-
ously assigned to interfaces now used by batman advanced, e.g.
-# ifconfig eth0 0.0.0.0
+# ip addr flush dev eth0
LOGGING/DEBUGGING
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 2ea4c45..ceb44a0 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -335,6 +335,14 @@ tcp_keepalive_intvl - INTEGER
after probes started. Default value: 75sec i.e. connection
will be aborted after ~11 minutes of retries.
+tcp_l3mdev_accept - BOOLEAN
+ Enables child sockets to inherit the L3 master device index.
+ Enabling this option allows a "global" listen socket to work
+ across L3 master domains (e.g., VRFs) with connected sockets
+ derived from the listen socket to be bound to the L3 domain in
+ which the packets originated. Only valid when the kernel was
+ compiled with CONFIG_NET_L3_MASTER_DEV.
+
tcp_low_latency - BOOLEAN
If set, the TCP stack makes decisions that prefer lower
latency as opposed to higher throughput. By default, this
@@ -1723,6 +1731,25 @@ addip_enable - BOOLEAN
Default: 0
+pf_enable - INTEGER
+ Enable or disable pf (pf is short for potentially failed) state. A value
+ of pf_retrans > path_max_retrans also disables pf state. That is, one of
+ both pf_enable and pf_retrans > path_max_retrans can disable pf state.
+ Since pf_retrans and path_max_retrans can be changed by userspace
+ application, sometimes user expects to disable pf state by the value of
+ pf_retrans > path_max_retrans, but occasionally the value of pf_retrans
+ or path_max_retrans is changed by the user application, this pf state is
+ enabled. As such, it is necessary to add this to dynamically enable
+ and disable pf state. See:
+ https://datatracker.ietf.org/doc/draft-ietf-tsvwg-sctp-failover for
+ details.
+
+ 1: Enable pf.
+
+ 0: Disable pf.
+
+ Default: 1
+
addip_noauth_enable - BOOLEAN
Dynamic Address Reconfiguration (ADD-IP) requires the use of
authentication to protect the operations of adding or removing new
@@ -1799,7 +1826,9 @@ pf_retrans - INTEGER
having to reduce path_max_retrans to a very low value. See:
http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
for details. Note also that a value of pf_retrans > path_max_retrans
- disables this feature
+ disables this feature. Since both pf_retrans and path_max_retrans can
+ be changed by userspace application, a variable pf_enable is used to
+ disable pf state.
Default: 0
diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
index 9199413..fad6313 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -304,8 +304,12 @@ certain netdevs from flooding unicast traffic for which there is no FDB entry.
IGMP Snooping
^^^^^^^^^^^^^
-XXX: complete this section
-
+In order to support IGMP snooping, the port netdevs should trap to the bridge
+driver all IGMP join and leave messages.
+The bridge multicast module will notify port netdevs on every multicast group
+changed whether it is static configured or dynamically joined/leave.
+The hardware implementation should be forwarding all registered multicast
+traffic groups only to the configured ports.
L3 Routing Offload
------------------
diff --git a/Documentation/power/pci.txt b/Documentation/power/pci.txt
index b0e911e..4455888 100644
--- a/Documentation/power/pci.txt
+++ b/Documentation/power/pci.txt
@@ -999,7 +999,7 @@ from its probe routine to make runtime PM work for the device.
It is important to remember that the driver's runtime_suspend() callback
may be executed right after the usage counter has been decremented, because
-user space may already have cuased the pm_runtime_allow() helper function
+user space may already have caused the pm_runtime_allow() helper function
unblocking the runtime PM of the device to run via sysfs, so the driver must
be prepared to cope with that.
diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt
index 0784bc3..7328cf8 100644
--- a/Documentation/power/runtime_pm.txt
+++ b/Documentation/power/runtime_pm.txt
@@ -371,6 +371,12 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
- increment the device's usage counter, run pm_runtime_resume(dev) and
return its result
+ int pm_runtime_get_if_in_use(struct device *dev);
+ - return -EINVAL if 'power.disable_depth' is nonzero; otherwise, if the
+ runtime PM status is RPM_ACTIVE and the runtime PM usage counter is
+ nonzero, increment the counter and return 1; otherwise return 0 without
+ changing the counter
+
void pm_runtime_put_noidle(struct device *dev);
- decrement the device's usage counter
diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt
index b784c27..6389551 100644
--- a/Documentation/printk-formats.txt
+++ b/Documentation/printk-formats.txt
@@ -250,6 +250,12 @@ dentry names:
Passed by reference.
+block_device names:
+
+ %pg sda, sda1 or loop0p1
+
+ For printing name of block_device pointers.
+
struct va_format:
%pV
diff --git a/Documentation/usb/chipidea.txt b/Documentation/usb/chipidea.txt
index 3f848c1..05f735a 100644
--- a/Documentation/usb/chipidea.txt
+++ b/Documentation/usb/chipidea.txt
@@ -7,8 +7,8 @@ with 2 Freescale i.MX6Q sabre SD boards.
---------------------------------------
Select CONFIG_USB_OTG_FSM, rebuild kernel Image and modules.
If you want to check some internal variables for otg fsm,
-select CONFIG_USB_CHIPIDEA_DEBUG, there are 2 files which
-can show otg fsm variables and some controller registers value:
+mount debugfs, there are 2 files which can show otg fsm
+variables and some controller registers value:
cat /sys/kernel/debug/ci_hdrc.0/otg
cat /sys/kernel/debug/ci_hdrc.0/registers
diff --git a/Documentation/usb/gadget-testing.txt b/Documentation/usb/gadget-testing.txt
index b24d3ef..5819605 100644
--- a/Documentation/usb/gadget-testing.txt
+++ b/Documentation/usb/gadget-testing.txt
@@ -434,7 +434,7 @@ On host: serialc -v <vendorID> -p <productID> -i<interface#> -a1 -s1024 \
where seriald and serialc are Felipe's utilities found here:
-https://git.gitorious.org/usb/usb-tools.git master
+https://github.com/felipebalbi/usb-tools.git master
12. PHONET function
===================
@@ -579,6 +579,8 @@ The SOURCESINK function provides these attributes in its function directory:
isoc_mult - 0..2 (hs/ss only)
isoc_maxburst - 0..15 (ss only)
bulk_buflen - buffer length
+ bulk_qlen - depth of queue for bulk
+ iso_qlen - depth of queue for iso
Testing the SOURCESINK function
-------------------------------
diff --git a/Documentation/usb/power-management.txt b/Documentation/usb/power-management.txt
index 4a15c90..0a94ffe 100644
--- a/Documentation/usb/power-management.txt
+++ b/Documentation/usb/power-management.txt
@@ -537,17 +537,18 @@ relevant attribute files are usb2_hardware_lpm and usb3_hardware_lpm.
can write y/Y/1 or n/N/0 to the file to enable/disable
USB2 hardware LPM manually. This is for test purpose mainly.
- power/usb3_hardware_lpm
+ power/usb3_hardware_lpm_u1
+ power/usb3_hardware_lpm_u2
When a USB 3.0 lpm-capable device is plugged in to a
xHCI host which supports link PM, it will check if U1
and U2 exit latencies have been set in the BOS
descriptor; if the check is is passed and the host
supports USB3 hardware LPM, USB3 hardware LPM will be
- enabled for the device and this file will be created.
- The file holds a string value (enable or disable)
- indicating whether or not USB3 hardware LPM is
- enabled for the device.
+ enabled for the device and these files will be created.
+ The files hold a string value (enable or disable)
+ indicating whether or not USB3 hardware LPM U1 or U2
+ is enabled for the device.
USB Port Power Control
----------------------
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 092ee9f..053f613 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1451,6 +1451,7 @@ struct kvm_irq_routing_entry {
struct kvm_irq_routing_irqchip irqchip;
struct kvm_irq_routing_msi msi;
struct kvm_irq_routing_s390_adapter adapter;
+ struct kvm_irq_routing_hv_sint hv_sint;
__u32 pad[8];
} u;
};
@@ -1459,6 +1460,7 @@ struct kvm_irq_routing_entry {
#define KVM_IRQ_ROUTING_IRQCHIP 1
#define KVM_IRQ_ROUTING_MSI 2
#define KVM_IRQ_ROUTING_S390_ADAPTER 3
+#define KVM_IRQ_ROUTING_HV_SINT 4
No flags are specified so far, the corresponding field must be set to zero.
@@ -1482,6 +1484,10 @@ struct kvm_irq_routing_s390_adapter {
__u32 adapter_id;
};
+struct kvm_irq_routing_hv_sint {
+ __u32 vcpu;
+ __u32 sint;
+};
4.53 KVM_ASSIGN_SET_MSIX_NR (deprecated)
@@ -3331,6 +3337,28 @@ the userspace IOAPIC should process the EOI and retrigger the interrupt if
it is still asserted. Vector is the LAPIC interrupt vector for which the
EOI was received.
+ struct kvm_hyperv_exit {
+#define KVM_EXIT_HYPERV_SYNIC 1
+ __u32 type;
+ union {
+ struct {
+ __u32 msr;
+ __u64 control;
+ __u64 evt_page;
+ __u64 msg_page;
+ } synic;
+ } u;
+ };
+ /* KVM_EXIT_HYPERV */
+ struct kvm_hyperv_exit hyperv;
+Indicates that the VCPU exits into userspace to process some tasks
+related to Hyper-V emulation.
+Valid values for 'type' are:
+ KVM_EXIT_HYPERV_SYNIC -- synchronously notify user-space about
+Hyper-V SynIC state change. Notification is used to remap SynIC
+event/message pages and to enable/disable SynIC messages/events processing
+in userspace.
+
/* Fix the size of the union. */
char padding[256];
};
@@ -3685,3 +3713,16 @@ available, means that that the kernel has an implementation of the
H_RANDOM hypercall backed by a hardware random-number generator.
If present, the kernel H_RANDOM handler can be enabled for guest use
with the KVM_CAP_PPC_ENABLE_HCALL capability.
+
+8.2 KVM_CAP_HYPERV_SYNIC
+
+Architectures: x86
+This capability, if KVM_CHECK_EXTENSION indicates that it is
+available, means that that the kernel has an implementation of the
+Hyper-V Synthetic interrupt controller(SynIC). Hyper-V SynIC is
+used to support Windows Hyper-V based guest paravirt drivers(VMBus).
+
+In order to use SynIC, it has to be activated by setting this
+capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
+will disable the use of APIC hardware virtualization even if supported
+by the CPU, as it's incompatible with SynIC auto-EOI behavior.
diff --git a/Documentation/virtual/kvm/devices/vm.txt b/Documentation/virtual/kvm/devices/vm.txt
index 2d09d1e..f083a16 100644
--- a/Documentation/virtual/kvm/devices/vm.txt
+++ b/Documentation/virtual/kvm/devices/vm.txt
@@ -37,7 +37,8 @@ Returns: -EFAULT if the given address is not accessible
Allows userspace to query the actual limit and set a new limit for
the maximum guest memory size. The limit will be rounded up to
2048 MB, 4096 GB, 8192 TB respectively, as this limit is governed by
-the number of page table levels.
+the number of page table levels. In the case that there is no limit we will set
+the limit to KVM_S390_NO_MEM_LIMIT (U64_MAX).
2. GROUP: KVM_S390_VM_CPU_MODEL
Architectures: s390
diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index 3a4d681..daf9c0f 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -203,10 +203,10 @@ Shadow pages contain the following information:
page cannot be destroyed. See role.invalid.
parent_ptes:
The reverse mapping for the pte/ptes pointing at this page's spt. If
- parent_ptes bit 0 is zero, only one spte points at this pages and
+ parent_ptes bit 0 is zero, only one spte points at this page and
parent_ptes points at this single spte, otherwise, there exists multiple
sptes pointing at this page and (parent_ptes & ~0x1) points at a data
- structure with a list of parent_ptes.
+ structure with a list of parent sptes.
unsync:
If true, then the translations in this page may not match the guest's
translation. This is equivalent to the state of the tlb when a pte is
OpenPOWER on IntegriCloud