diff options
Diffstat (limited to 'Documentation')
28 files changed, 697 insertions, 81 deletions
diff --git a/Documentation/ABI/obsolete/proc-sys-vm-nr_pdflush_threads b/Documentation/ABI/obsolete/proc-sys-vm-nr_pdflush_threads new file mode 100644 index 0000000..b0b0eeb --- /dev/null +++ b/Documentation/ABI/obsolete/proc-sys-vm-nr_pdflush_threads @@ -0,0 +1,5 @@ +What: /proc/sys/vm/nr_pdflush_threads +Date: June 2012 +Contact: Wanpeng Li <liwp@linux.vnet.ibm.com> +Description: Since pdflush is replaced by per-BDI flusher, the interface of old pdflush + exported in /proc/sys/vm/ should be removed. diff --git a/Documentation/ABI/testing/sysfs-devices-platform-sh_mobile_lcdc_fb b/Documentation/ABI/testing/sysfs-devices-platform-sh_mobile_lcdc_fb new file mode 100644 index 0000000..2107082 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-devices-platform-sh_mobile_lcdc_fb @@ -0,0 +1,44 @@ +What: /sys/devices/platform/sh_mobile_lcdc_fb.[0-3]/graphics/fb[0-9]/ovl_alpha +Date: May 2012 +Contact: Laurent Pinchart <laurent.pinchart@ideasonboard.com> +Description: + This file is only available on fb[0-9] devices corresponding + to overlay planes. + + Stores the alpha blending value for the overlay. Values range + from 0 (transparent) to 255 (opaque). The value is ignored if + the mode is not set to Alpha Blending. + +What: /sys/devices/platform/sh_mobile_lcdc_fb.[0-3]/graphics/fb[0-9]/ovl_mode +Date: May 2012 +Contact: Laurent Pinchart <laurent.pinchart@ideasonboard.com> +Description: + This file is only available on fb[0-9] devices corresponding + to overlay planes. + + Selects the composition mode for the overlay. Possible values + are + + 0 - Alpha Blending + 1 - ROP3 + +What: /sys/devices/platform/sh_mobile_lcdc_fb.[0-3]/graphics/fb[0-9]/ovl_position +Date: May 2012 +Contact: Laurent Pinchart <laurent.pinchart@ideasonboard.com> +Description: + This file is only available on fb[0-9] devices corresponding + to overlay planes. + + Stores the x,y overlay position on the display in pixels. The + position format is `[0-9]+,[0-9]+'. + +What: /sys/devices/platform/sh_mobile_lcdc_fb.[0-3]/graphics/fb[0-9]/ovl_rop3 +Date: May 2012 +Contact: Laurent Pinchart <laurent.pinchart@ideasonboard.com> +Description: + This file is only available on fb[0-9] devices corresponding + to overlay planes. + + Stores the raster operation (ROP3) for the overlay. Values + range from 0 to 255. The value is ignored if the mode is not + set to ROP3. diff --git a/Documentation/ABI/testing/sysfs-platform-ideapad-laptop b/Documentation/ABI/testing/sysfs-platform-ideapad-laptop index 814b013..b31e782 100644 --- a/Documentation/ABI/testing/sysfs-platform-ideapad-laptop +++ b/Documentation/ABI/testing/sysfs-platform-ideapad-laptop @@ -5,4 +5,15 @@ Contact: "Ike Panhc <ike.pan@canonical.com>" Description: Control the power of camera module. 1 means on, 0 means off. +What: /sys/devices/platform/ideapad/fan_mode +Date: June 2012 +KernelVersion: 3.6 +Contact: "Maxim Mikityanskiy <maxtram95@gmail.com>" +Description: + Change fan mode + There are four available modes: + * 0 -> Super Silent Mode + * 1 -> Standard Mode + * 2 -> Dust Cleaning + * 4 -> Efficient Thermal Dissipation Mode diff --git a/Documentation/DocBook/filesystems.tmpl b/Documentation/DocBook/filesystems.tmpl index 3fca32c..25b58ef 100644 --- a/Documentation/DocBook/filesystems.tmpl +++ b/Documentation/DocBook/filesystems.tmpl @@ -224,8 +224,8 @@ all your transactions. </para> <para> -Then at umount time , in your put_super() (2.4) or write_super() (2.5) -you can then call journal_destroy() to clean up your in-core journal object. +Then at umount time , in your put_super() you can then call journal_destroy() +to clean up your in-core journal object. </para> <para> diff --git a/Documentation/IRQ-domain.txt b/Documentation/IRQ-domain.txt index 27dcaab..1401cec 100644 --- a/Documentation/IRQ-domain.txt +++ b/Documentation/IRQ-domain.txt @@ -93,6 +93,7 @@ Linux IRQ number into the hardware. Most drivers cannot use this mapping. ==== Legacy ==== +irq_domain_add_simple() irq_domain_add_legacy() irq_domain_add_legacy_isa() @@ -115,3 +116,7 @@ The legacy map should only be used if fixed IRQ mappings must be supported. For example, ISA controllers would use the legacy map for mapping Linux IRQs 0-15 so that existing ISA drivers get the correct IRQ numbers. + +Most users of legacy mappings should use irq_domain_add_simple() which +will use a legacy domain only if an IRQ range is supplied by the +system and will otherwise use a linear domain mapping. diff --git a/Documentation/block/queue-sysfs.txt b/Documentation/block/queue-sysfs.txt index d8147b3..6518a55 100644 --- a/Documentation/block/queue-sysfs.txt +++ b/Documentation/block/queue-sysfs.txt @@ -38,6 +38,13 @@ read or write requests. Note that the total allocated number may be twice this amount, since it applies only to reads or writes (not the accumulated sum). +To avoid priority inversion through request starvation, a request +queue maintains a separate request pool per each cgroup when +CONFIG_BLK_CGROUP is enabled, and this parameter applies to each such +per-block-cgroup request pool. IOW, if there are N block cgroups, +each request queue may have upto N request pools, each independently +regulated by nr_requests. + read_ahead_kb (RW) ------------------ Maximum number of kilobytes to read-ahead for filesystems on this block diff --git a/Documentation/cgroups/hugetlb.txt b/Documentation/cgroups/hugetlb.txt new file mode 100644 index 0000000..a9faaca --- /dev/null +++ b/Documentation/cgroups/hugetlb.txt @@ -0,0 +1,45 @@ +HugeTLB Controller +------------------- + +The HugeTLB controller allows to limit the HugeTLB usage per control group and +enforces the controller limit during page fault. Since HugeTLB doesn't +support page reclaim, enforcing the limit at page fault time implies that, +the application will get SIGBUS signal if it tries to access HugeTLB pages +beyond its limit. This requires the application to know beforehand how much +HugeTLB pages it would require for its use. + +HugeTLB controller can be created by first mounting the cgroup filesystem. + +# mount -t cgroup -o hugetlb none /sys/fs/cgroup + +With the above step, the initial or the parent HugeTLB group becomes +visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in +the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup. + +New groups can be created under the parent group /sys/fs/cgroup. + +# cd /sys/fs/cgroup +# mkdir g1 +# echo $$ > g1/tasks + +The above steps create a new group g1 and move the current shell +process (bash) into it. + +Brief summary of control files + + hugetlb.<hugepagesize>.limit_in_bytes # set/show limit of "hugepagesize" hugetlb usage + hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded + hugetlb.<hugepagesize>.usage_in_bytes # show current res_counter usage for "hugepagesize" hugetlb + hugetlb.<hugepagesize>.failcnt # show the number of allocation failure due to HugeTLB limit + +For a system supporting two hugepage size (16M and 16G) the control +files include: + +hugetlb.16GB.limit_in_bytes +hugetlb.16GB.max_usage_in_bytes +hugetlb.16GB.usage_in_bytes +hugetlb.16GB.failcnt +hugetlb.16MB.limit_in_bytes +hugetlb.16MB.max_usage_in_bytes +hugetlb.16MB.usage_in_bytes +hugetlb.16MB.failcnt diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index dd88540..4372e6b 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -73,6 +73,8 @@ Brief summary of control files. memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation + memory.kmem.tcp.failcnt # show the number of tcp buf memory usage hits limits + memory.kmem.tcp.max_usage_in_bytes # show max tcp buf memory usage recorded 1. History @@ -187,12 +189,12 @@ the cgroup that brought it in -- this will happen on memory pressure). But see section 8.2: when moving a task to another cgroup, its pages may be recharged to the new cgroup, if move_charge_at_immigrate has been chosen. -Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used. +Exception: If CONFIG_CGROUP_CGROUP_MEMCG_SWAP is not used. When you do swapoff and make swapped-out pages of shmem(tmpfs) to be backed into memory in force, charges for pages are accounted against the caller of swapoff rather than the users of shmem. -2.4 Swap Extension (CONFIG_CGROUP_MEM_RES_CTLR_SWAP) +2.4 Swap Extension (CONFIG_MEMCG_SWAP) Swap Extension allows you to record charge for swap. A swapped-in page is charged back to original page allocator if possible. @@ -259,7 +261,7 @@ When oom event notifier is registered, event will be delivered. per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by zone->lru_lock, it has no lock of its own. -2.7 Kernel Memory Extension (CONFIG_CGROUP_MEM_RES_CTLR_KMEM) +2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM) With the Kernel memory extension, the Memory Controller is able to limit the amount of kernel memory used by the system. Kernel memory is fundamentally @@ -286,8 +288,8 @@ per cgroup, instead of globally. a. Enable CONFIG_CGROUPS b. Enable CONFIG_RESOURCE_COUNTERS -c. Enable CONFIG_CGROUP_MEM_RES_CTLR -d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension) +c. Enable CONFIG_MEMCG +d. Enable CONFIG_MEMCG_SWAP (to use swap extension) 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?) # mount -t tmpfs none /sys/fs/cgroup diff --git a/Documentation/device-mapper/dm-raid.txt b/Documentation/device-mapper/dm-raid.txt index 946c733..1c18449 100644 --- a/Documentation/device-mapper/dm-raid.txt +++ b/Documentation/device-mapper/dm-raid.txt @@ -27,6 +27,10 @@ The target is named "raid" and it accepts the following parameters: - rotating parity N (right-to-left) with data restart raid6_nc RAID6 N continue - rotating parity N (right-to-left) with data continuation + raid10 Various RAID10 inspired algorithms chosen by additional params + - RAID10: Striped Mirrors (aka 'Striping on top of mirrors') + - RAID1E: Integrated Adjacent Stripe Mirroring + - and other similar RAID10 variants Reference: Chapter 4 of http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf @@ -59,6 +63,28 @@ The target is named "raid" and it accepts the following parameters: logical size of the array. The bitmap records the device synchronisation state for each region. + [raid10_copies <# copies>] + [raid10_format near] + These two options are used to alter the default layout of + a RAID10 configuration. The number of copies is can be + specified, but the default is 2. There are other variations + to how the copies are laid down - the default and only current + option is "near". Near copies are what most people think of + with respect to mirroring. If these options are left + unspecified, or 'raid10_copies 2' and/or 'raid10_format near' + are given, then the layouts for 2, 3 and 4 devices are: + 2 drives 3 drives 4 drives + -------- ---------- -------------- + A1 A1 A1 A1 A2 A1 A1 A2 A2 + A2 A2 A2 A3 A3 A3 A3 A4 A4 + A3 A3 A4 A4 A5 A5 A5 A6 A6 + A4 A4 A5 A6 A6 A7 A7 A8 A8 + .. .. .. .. .. .. .. .. .. + The 2-device layout is equivalent 2-way RAID1. The 4-device + layout is what a traditional RAID10 would look like. The + 3-device layout is what might be called a 'RAID1E - Integrated + Adjacent Stripe Mirroring'. + <#raid_devs>: The number of devices composing the array. Each device consists of two entries. The first is the device containing the metadata (if any); the second is the one containing the diff --git a/Documentation/devicetree/bindings/arm/mrvl/intc.txt b/Documentation/devicetree/bindings/arm/mrvl/intc.txt index 80b9a94..8b53273 100644 --- a/Documentation/devicetree/bindings/arm/mrvl/intc.txt +++ b/Documentation/devicetree/bindings/arm/mrvl/intc.txt @@ -38,3 +38,23 @@ Example: reg-names = "mux status", "mux mask"; mrvl,intc-nr-irqs = <2>; }; + +* Marvell Orion Interrupt controller + +Required properties +- compatible : Should be "marvell,orion-intc". +- #interrupt-cells: Specifies the number of cells needed to encode an + interrupt source. Supported value is <1>. +- interrupt-controller : Declare this node to be an interrupt controller. +- reg : Interrupt mask address. A list of 4 byte ranges, one per controller. + One entry in the list represents 32 interrupts. + +Example: + + intc: interrupt-controller { + compatible = "marvell,orion-intc", "marvell,intc"; + interrupt-controller; + #interrupt-cells = <1>; + reg = <0xfed20204 0x04>, + <0xfed20214 0x04>; + }; diff --git a/Documentation/devicetree/bindings/ata/marvell.txt b/Documentation/devicetree/bindings/ata/marvell.txt new file mode 100644 index 0000000..b5cdd20 --- /dev/null +++ b/Documentation/devicetree/bindings/ata/marvell.txt @@ -0,0 +1,16 @@ +* Marvell Orion SATA + +Required Properties: +- compatibility : "marvell,orion-sata" +- reg : Address range of controller +- interrupts : Interrupt controller is using +- nr-ports : Number of SATA ports in use. + +Example: + + sata@80000 { + compatible = "marvell,orion-sata"; + reg = <0x80000 0x5000>; + interrupts = <21>; + nr-ports = <2>; + } diff --git a/Documentation/devicetree/bindings/gpio/mrvl-gpio.txt b/Documentation/devicetree/bindings/gpio/mrvl-gpio.txt index 05428f3..e137874 100644 --- a/Documentation/devicetree/bindings/gpio/mrvl-gpio.txt +++ b/Documentation/devicetree/bindings/gpio/mrvl-gpio.txt @@ -27,3 +27,26 @@ Example: interrupt-controller; #interrupt-cells = <1>; }; + +* Marvell Orion GPIO Controller + +Required properties: +- compatible : Should be "marvell,orion-gpio" +- reg : Address and length of the register set for controller. +- gpio-controller : So we know this is a gpio controller. +- ngpio : How many gpios this controller has. +- interrupts : Up to 4 Interrupts for the controller. + +Optional properties: +- mask-offset : For SMP Orions, offset for Nth CPU + +Example: + + gpio0: gpio@10100 { + compatible = "marvell,orion-gpio"; + #gpio-cells = <2>; + gpio-controller; + reg = <0x10100 0x40>; + ngpio = <32>; + interrupts = <35>, <36>, <37>, <38>; + }; diff --git a/Documentation/devicetree/bindings/regulator/tps6586x.txt b/Documentation/devicetree/bindings/regulator/tps6586x.txt index d156e1b..da80c2a 100644 --- a/Documentation/devicetree/bindings/regulator/tps6586x.txt +++ b/Documentation/devicetree/bindings/regulator/tps6586x.txt @@ -9,9 +9,9 @@ Required properties: - regulators: list of regulators provided by this controller, must have property "regulator-compatible" to match their hardware counterparts: sm[0-2], ldo[0-9] and ldo_rtc -- sm0-supply: The input supply for the SM0. -- sm1-supply: The input supply for the SM1. -- sm2-supply: The input supply for the SM2. +- vin-sm0-supply: The input supply for the SM0. +- vin-sm1-supply: The input supply for the SM1. +- vin-sm2-supply: The input supply for the SM2. - vinldo01-supply: The input supply for the LDO1 and LDO2 - vinldo23-supply: The input supply for the LDO2 and LDO3 - vinldo4-supply: The input supply for the LDO4 @@ -30,9 +30,9 @@ Example: #gpio-cells = <2>; gpio-controller; - sm0-supply = <&some_reg>; - sm1-supply = <&some_reg>; - sm2-supply = <&some_reg>; + vin-sm0-supply = <&some_reg>; + vin-sm1-supply = <&some_reg>; + vin-sm2-supply = <&some_reg>; vinldo01-supply = <...>; vinldo23-supply = <...>; vinldo4-supply = <...>; diff --git a/Documentation/devicetree/bindings/watchdog/marvel.txt b/Documentation/devicetree/bindings/watchdog/marvel.txt new file mode 100644 index 0000000..0b2503a --- /dev/null +++ b/Documentation/devicetree/bindings/watchdog/marvel.txt @@ -0,0 +1,14 @@ +* Marvell Orion Watchdog Time + +Required Properties: + +- Compatibility : "marvell,orion-wdt" +- reg : Address of the timer registers + +Example: + + wdt@20300 { + compatible = "marvell,orion-wdt"; + reg = <0x20300 0x28>; + status = "okay"; + }; diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 09b132d..afaff31 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -13,6 +13,14 @@ Who: Jim Cromie <jim.cromie@gmail.com>, Jason Baron <jbaron@redhat.com> --------------------------- +What: /proc/sys/vm/nr_pdflush_threads +When: 2012 +Why: Since pdflush is deprecated, the interface exported in /proc/sys/vm/ + should be removed. +Who: Wanpeng Li <liwp@linux.vnet.ibm.com> + +--------------------------- + What: CONFIG_APM_CPU_IDLE, and its ability to call APM BIOS in idle When: 2012 Why: This optional sub-feature of APM is of dubious reliability, @@ -70,20 +78,6 @@ Who: Luis R. Rodriguez <lrodriguez@atheros.com> --------------------------- -What: IRQF_SAMPLE_RANDOM -Check: IRQF_SAMPLE_RANDOM -When: July 2009 - -Why: Many of IRQF_SAMPLE_RANDOM users are technically bogus as entropy - sources in the kernel's current entropy model. To resolve this, every - input point to the kernel's entropy pool needs to better document the - type of entropy source it actually is. This will be replaced with - additional add_*_randomness functions in drivers/char/random.c - -Who: Robin Getz <rgetz@blackfin.uclinux.org> & Matt Mackall <mpm@selenic.com> - ---------------------------- - What: The ieee80211_regdom module parameter When: March 2010 / desktop catchup @@ -632,3 +626,14 @@ Why: New drivers should use new V4L2_CAP_VIDEO_M2M capability flag Who: Sylwester Nawrocki <s.nawrocki@samsung.com> ---------------------------- + +What: OMAP private DMA implementation +When: 2013 +Why: We have a DMA engine implementation; all users should be updated + to use this rather than persisting with the old APIs. The old APIs + block merging the old DMA engine implementation into the DMA + engine driver. +Who: Russell King <linux@arm.linux.org.uk>, + Santosh Shilimkar <santosh.shilimkar@ti.com> + +---------------------------- diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index e0cce2a..e540a24 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -114,7 +114,6 @@ prototypes: int (*drop_inode) (struct inode *); void (*evict_inode) (struct inode *); void (*put_super) (struct super_block *); - void (*write_super) (struct super_block *); int (*sync_fs)(struct super_block *sb, int wait); int (*freeze_fs) (struct super_block *); int (*unfreeze_fs) (struct super_block *); @@ -136,10 +135,9 @@ write_inode: drop_inode: !!!inode->i_lock!!! evict_inode: put_super: write -write_super: read sync_fs: read -freeze_fs: read -unfreeze_fs: read +freeze_fs: write +unfreeze_fs: write statfs: maybe(read) (see below) remount_fs: write umount_begin: no @@ -206,6 +204,8 @@ prototypes: int (*launder_page)(struct page *); int (*is_partially_uptodate)(struct page *, read_descriptor_t *, unsigned long); int (*error_remove_page)(struct address_space *, struct page *); + int (*swap_activate)(struct file *); + int (*swap_deactivate)(struct file *); locking rules: All except set_page_dirty and freepage may block @@ -229,6 +229,8 @@ migratepage: yes (both) launder_page: yes is_partially_uptodate: yes error_remove_page: yes +swap_activate: no +swap_deactivate: no ->write_begin(), ->write_end(), ->sync_page() and ->readpage() may be called from the request handler (/dev/loop). @@ -330,6 +332,15 @@ cleaned, or an error value if not. Note that in order to prevent the page getting mapped back in and redirtied, it needs to be kept locked across the entire operation. + ->swap_activate will be called with a non-zero argument on +files backing (non block device backed) swapfiles. A return value +of zero indicates success, in which case this file can be used for +backing swapspace. The swapspace operations will be proxied to the +address space operations. + + ->swap_deactivate() will be called in the sys_swapoff() +path after ->swap_activate() returned success. + ----------------------- file_lock_operations ------------------------------ prototypes: void (*fl_copy_lock)(struct file_lock *, struct file_lock *); @@ -346,7 +357,6 @@ prototypes: int (*lm_compare_owner)(struct file_lock *, struct file_lock *); void (*lm_notify)(struct file_lock *); /* unblock callback */ int (*lm_grant)(struct file_lock *, struct file_lock *, int); - void (*lm_release_private)(struct file_lock *); void (*lm_break)(struct file_lock *); /* break_lease callback */ int (*lm_change)(struct file_lock **, int); @@ -355,7 +365,6 @@ locking rules: lm_compare_owner: yes no lm_notify: yes no lm_grant: no no -lm_release_private: maybe no lm_break: yes no lm_change yes no diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting index 2bef2b3..0742fee 100644 --- a/Documentation/filesystems/porting +++ b/Documentation/filesystems/porting @@ -94,9 +94,8 @@ protected. --- [mandatory] -BKL is also moved from around sb operations. ->write_super() Is now called -without BKL held. BKL should have been shifted into individual fs sb_op -functions. If you don't need it, remove it. +BKL is also moved from around sb operations. BKL should have been shifted into +individual fs sb_op functions. If you don't need it, remove it. --- [informational] diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index aa754e0..2ee133e 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -216,7 +216,6 @@ struct super_operations { void (*drop_inode) (struct inode *); void (*delete_inode) (struct inode *); void (*put_super) (struct super_block *); - void (*write_super) (struct super_block *); int (*sync_fs)(struct super_block *sb, int wait); int (*freeze_fs) (struct super_block *); int (*unfreeze_fs) (struct super_block *); @@ -273,9 +272,6 @@ or bottom half). put_super: called when the VFS wishes to free the superblock (i.e. unmount). This is called with the superblock lock held - write_super: called when the VFS superblock needs to be written to - disc. This method is optional - sync_fs: called when VFS is writing out all dirty data associated with a superblock. The second parameter indicates whether the method should wait until the write out has been completed. Optional. @@ -592,6 +588,8 @@ struct address_space_operations { int (*migratepage) (struct page *, struct page *); int (*launder_page) (struct page *); int (*error_remove_page) (struct mapping *mapping, struct page *page); + int (*swap_activate)(struct file *); + int (*swap_deactivate)(struct file *); }; writepage: called by the VM to write a dirty page to backing store. @@ -760,6 +758,16 @@ struct address_space_operations { Setting this implies you deal with pages going away under you, unless you have them locked or reference counts increased. + swap_activate: Called when swapon is used on a file to allocate + space if necessary and pin the block lookup information in + memory. A return value of zero indicates success, + in which case this file can be used to back swapspace. The + swapspace operations will be proxied to this address space's + ->swap_{out,in} methods. + + swap_deactivate: Called during swapoff on files where swap_activate + was successful. + The File Object =============== diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt index 915f28c..849b771 100644 --- a/Documentation/ioctl/ioctl-number.txt +++ b/Documentation/ioctl/ioctl-number.txt @@ -88,6 +88,7 @@ Code Seq#(hex) Include File Comments and kernel/power/user.c '8' all SNP8023 advanced NIC card <mailto:mcr@solidum.com> +';' 64-7F linux/vfio.h '@' 00-0F linux/radeonfb.h conflict! '@' 00-0F drivers/video/aty/aty128fb.c conflict! 'A' 00-1F linux/apm_bios.h conflict! diff --git a/Documentation/laptops/laptop-mode.txt b/Documentation/laptops/laptop-mode.txt index 0bf25ee..4ebbfc3 100644 --- a/Documentation/laptops/laptop-mode.txt +++ b/Documentation/laptops/laptop-mode.txt @@ -262,9 +262,9 @@ MINIMUM_BATTERY_MINUTES=10 # # Allowed dirty background ratio, in percent. Once DIRTY_RATIO has been -# exceeded, the kernel will wake pdflush which will then reduce the amount -# of dirty memory to dirty_background_ratio. Set this nice and low, so once -# some writeout has commenced, we do a lot of it. +# exceeded, the kernel will wake flusher threads which will then reduce the +# amount of dirty memory to dirty_background_ratio. Set this nice and low, +# so once some writeout has commenced, we do a lot of it. # #DIRTY_BACKGROUND_RATIO=5 @@ -384,9 +384,9 @@ CPU_MAXFREQ=${CPU_MAXFREQ:-'slowest'} # # Allowed dirty background ratio, in percent. Once DIRTY_RATIO has been -# exceeded, the kernel will wake pdflush which will then reduce the amount -# of dirty memory to dirty_background_ratio. Set this nice and low, so once -# some writeout has commenced, we do a lot of it. +# exceeded, the kernel will wake flusher threads which will then reduce the +# amount of dirty memory to dirty_background_ratio. Set this nice and low, +# so once some writeout has commenced, we do a lot of it. # DIRTY_BACKGROUND_RATIO=${DIRTY_BACKGROUND_RATIO:-'5'} diff --git a/Documentation/networking/netconsole.txt b/Documentation/networking/netconsole.txt index 8d02207..2e9e0ae2 100644 --- a/Documentation/networking/netconsole.txt +++ b/Documentation/networking/netconsole.txt @@ -51,8 +51,23 @@ Built-in netconsole starts immediately after the TCP stack is initialized and attempts to bring up the supplied dev at the supplied address. -The remote host can run either 'netcat -u -l -p <port>', -'nc -l -u <port>' or syslogd. +The remote host has several options to receive the kernel messages, +for example: + +1) syslogd + +2) netcat + + On distributions using a BSD-based netcat version (e.g. Fedora, + openSUSE and Ubuntu) the listening port must be specified without + the -p switch: + + 'nc -u -l -p <port>' / 'nc -u -l <port>' or + 'netcat -u -l -p <port>' / 'netcat -u -l <port>' + +3) socat + + 'socat udp-recv:<port> -' Dynamic reconfiguration: ======================== diff --git a/Documentation/pinctrl.txt b/Documentation/pinctrl.txt index e40f4b4..1479aca 100644 --- a/Documentation/pinctrl.txt +++ b/Documentation/pinctrl.txt @@ -840,9 +840,9 @@ static unsigned long i2c_pin_configs[] = { static struct pinctrl_map __initdata mapping[] = { PIN_MAP_MUX_GROUP("foo-i2c.0", PINCTRL_STATE_DEFAULT, "pinctrl-foo", "i2c0", "i2c0"), - PIN_MAP_MUX_CONFIGS_GROUP("foo-i2c.0", PINCTRL_STATE_DEFAULT, "pinctrl-foo", "i2c0", i2c_grp_configs), - PIN_MAP_MUX_CONFIGS_PIN("foo-i2c.0", PINCTRL_STATE_DEFAULT, "pinctrl-foo", "i2c0scl", i2c_pin_configs), - PIN_MAP_MUX_CONFIGS_PIN("foo-i2c.0", PINCTRL_STATE_DEFAULT, "pinctrl-foo", "i2c0sda", i2c_pin_configs), + PIN_MAP_CONFIGS_GROUP("foo-i2c.0", PINCTRL_STATE_DEFAULT, "pinctrl-foo", "i2c0", i2c_grp_configs), + PIN_MAP_CONFIGS_PIN("foo-i2c.0", PINCTRL_STATE_DEFAULT, "pinctrl-foo", "i2c0scl", i2c_pin_configs), + PIN_MAP_CONFIGS_PIN("foo-i2c.0", PINCTRL_STATE_DEFAULT, "pinctrl-foo", "i2c0sda", i2c_pin_configs), }; Finally, some devices expect the mapping table to contain certain specific diff --git a/Documentation/security/Yama.txt b/Documentation/security/Yama.txt index e369de2..dd908cf 100644 --- a/Documentation/security/Yama.txt +++ b/Documentation/security/Yama.txt @@ -46,14 +46,13 @@ restrictions, it can call prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, ...) so that any otherwise allowed process (even those in external pid namespaces) may attach. -These restrictions do not change how ptrace via PTRACE_TRACEME operates. - -The sysctl settings are: +The sysctl settings (writable only with CAP_SYS_PTRACE) are: 0 - classic ptrace permissions: a process can PTRACE_ATTACH to any other process running under the same uid, as long as it is dumpable (i.e. did not transition uids, start privileged, or have called - prctl(PR_SET_DUMPABLE...) already). + prctl(PR_SET_DUMPABLE...) already). Similarly, PTRACE_TRACEME is + unchanged. 1 - restricted ptrace: a process must have a predefined relationship with the inferior it wants to call PTRACE_ATTACH on. By default, @@ -61,12 +60,13 @@ The sysctl settings are: classic criteria is also met. To change the relationship, an inferior can call prctl(PR_SET_PTRACER, debugger, ...) to declare an allowed debugger PID to call PTRACE_ATTACH on the inferior. + Using PTRACE_TRACEME is unchanged. 2 - admin-only attach: only processes with CAP_SYS_PTRACE may use ptrace - with PTRACE_ATTACH. + with PTRACE_ATTACH, or through children calling PTRACE_TRACEME. -3 - no attach: no processes may use ptrace with PTRACE_ATTACH. Once set, - this sysctl cannot be changed to a lower value. +3 - no attach: no processes may use ptrace with PTRACE_ATTACH nor via + PTRACE_TRACEME. Once set, this sysctl value cannot be changed. The original children-only logic was based on the restrictions in grsecurity. diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt index 7456360..a92bba8 100644 --- a/Documentation/sound/alsa/HD-Audio-Models.txt +++ b/Documentation/sound/alsa/HD-Audio-Models.txt @@ -53,6 +53,7 @@ ALC882/883/885/888/889 acer-aspire-8930g Acer Aspire 8330G/6935G acer-aspire Acer Aspire others inv-dmic Inverted internal mic workaround + no-primary-hp VAIO Z workaround (for fixed speaker DAC) ALC861/660 ========== @@ -273,6 +274,10 @@ STAC92HD83* dell-s14 Dell laptop dell-vostro-3500 Dell Vostro 3500 laptop hp-dv7-4000 HP dv-7 4000 + hp_cNB11_intquad HP CNB models with 4 speakers + hp-zephyr HP Zephyr + hp-led HP with broken BIOS for mute LED + hp-inv-led HP with broken BIOS for inverted mute LED auto BIOS setup (default) STAC9872 diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt index 8c235b6..88152f2 100644 --- a/Documentation/sysctl/fs.txt +++ b/Documentation/sysctl/fs.txt @@ -32,6 +32,8 @@ Currently, these files are in /proc/sys/fs: - nr_open - overflowuid - overflowgid +- protected_hardlinks +- protected_symlinks - suid_dumpable - super-max - super-nr @@ -157,6 +159,46 @@ The default is 65534. ============================================================== +protected_hardlinks: + +A long-standing class of security issues is the hardlink-based +time-of-check-time-of-use race, most commonly seen in world-writable +directories like /tmp. The common method of exploitation of this flaw +is to cross privilege boundaries when following a given hardlink (i.e. a +root process follows a hardlink created by another user). Additionally, +on systems without separated partitions, this stops unauthorized users +from "pinning" vulnerable setuid/setgid files against being upgraded by +the administrator, or linking to special files. + +When set to "0", hardlink creation behavior is unrestricted. + +When set to "1" hardlinks cannot be created by users if they do not +already own the source file, or do not have read/write access to it. + +This protection is based on the restrictions in Openwall and grsecurity. + +============================================================== + +protected_symlinks: + +A long-standing class of security issues is the symlink-based +time-of-check-time-of-use race, most commonly seen in world-writable +directories like /tmp. The common method of exploitation of this flaw +is to cross privilege boundaries when following a given symlink (i.e. a +root process follows a symlink belonging to another user). For a likely +incomplete list of hundreds of examples across the years, please see: +http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp + +When set to "0", symlink following behavior is unrestricted. + +When set to "1" symlinks are permitted to be followed only when outside +a sticky world-writable directory, or when the uid of the symlink and +follower match, or when the directory owner matches the symlink's owner. + +This protection is based on the restrictions in Openwall and grsecurity. + +============================================================== + suid_dumpable: This value can be used to query and set the core dump mode for setuid diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 96f0ee8..078701f 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt @@ -42,7 +42,6 @@ Currently, these files are in /proc/sys/vm: - mmap_min_addr - nr_hugepages - nr_overcommit_hugepages -- nr_pdflush_threads - nr_trim_pages (only if CONFIG_MMU=n) - numa_zonelist_order - oom_dump_tasks @@ -77,8 +76,8 @@ huge pages although processes will also directly compact memory as required. dirty_background_bytes -Contains the amount of dirty memory at which the pdflush background writeback -daemon will start writeback. +Contains the amount of dirty memory at which the background kernel +flusher threads will start writeback. Note: dirty_background_bytes is the counterpart of dirty_background_ratio. Only one of them may be specified at a time. When one sysctl is written it is @@ -90,7 +89,7 @@ other appears as 0 when read. dirty_background_ratio Contains, as a percentage of total system memory, the number of pages at which -the pdflush background writeback daemon will start writing out dirty data. +the background kernel flusher threads will start writing out dirty data. ============================================================== @@ -113,9 +112,9 @@ retained. dirty_expire_centisecs This tunable is used to define when dirty data is old enough to be eligible -for writeout by the pdflush daemons. It is expressed in 100'ths of a second. -Data which has been dirty in-memory for longer than this interval will be -written out next time a pdflush daemon wakes up. +for writeout by the kernel flusher threads. It is expressed in 100'ths +of a second. Data which has been dirty in-memory for longer than this +interval will be written out next time a flusher thread wakes up. ============================================================== @@ -129,7 +128,7 @@ data. dirty_writeback_centisecs -The pdflush writeback daemons will periodically wake up and write `old' data +The kernel flusher threads will periodically wake up and write `old' data out to disk. This tunable expresses the interval between those wakeups, in 100'ths of a second. @@ -426,16 +425,6 @@ See Documentation/vm/hugetlbpage.txt ============================================================== -nr_pdflush_threads - -The current number of pdflush threads. This value is read-only. -The value changes according to the number of dirty pages in the system. - -When necessary, additional pdflush threads are created, one per second, up to -nr_pdflush_threads_max. - -============================================================== - nr_trim_pages This is available only on NOMMU kernels. @@ -502,9 +491,10 @@ oom_dump_tasks Enables a system-wide task dump (excluding kernel threads) to be produced when the kernel performs an OOM-killing and includes such -information as pid, uid, tgid, vm size, rss, cpu, oom_adj score, and -name. This is helpful to determine why the OOM killer was invoked -and to identify the rogue task that caused it. +information as pid, uid, tgid, vm size, rss, nr_ptes, swapents, +oom_score_adj score, and name. This is helpful to determine why the +OOM killer was invoked, to identify the rogue task that caused it, +and to determine why the OOM killer chose the task it did to kill. If this is set to zero, this information is suppressed. On very large systems with thousands of tasks it may not be feasible to dump @@ -574,16 +564,24 @@ of physical RAM. See above. page-cluster -page-cluster controls the number of pages which are written to swap in -a single attempt. The swap I/O size. +page-cluster controls the number of pages up to which consecutive pages +are read in from swap in a single attempt. This is the swap counterpart +to page cache readahead. +The mentioned consecutivity is not in terms of virtual/physical addresses, +but consecutive on swap space - that means they were swapped out together. It is a logarithmic value - setting it to zero means "1 page", setting it to 1 means "2 pages", setting it to 2 means "4 pages", etc. +Zero disables swap readahead completely. The default value is three (eight pages at a time). There may be some small benefits in tuning this to a different value if your workload is swap-intensive. +Lower values mean lower latencies for initial faults, but at the same time +extra faults and I/O delays for following faults if they would have been part of +that consecutive pages readahead would have brought in. + ============================================================= panic_on_oom diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt new file mode 100644 index 0000000..0cb6685 --- /dev/null +++ b/Documentation/vfio.txt @@ -0,0 +1,314 @@ +VFIO - "Virtual Function I/O"[1] +------------------------------------------------------------------------------- +Many modern system now provide DMA and interrupt remapping facilities +to help ensure I/O devices behave within the boundaries they've been +allotted. This includes x86 hardware with AMD-Vi and Intel VT-d, +POWER systems with Partitionable Endpoints (PEs) and embedded PowerPC +systems such as Freescale PAMU. The VFIO driver is an IOMMU/device +agnostic framework for exposing direct device access to userspace, in +a secure, IOMMU protected environment. In other words, this allows +safe[2], non-privileged, userspace drivers. + +Why do we want that? Virtual machines often make use of direct device +access ("device assignment") when configured for the highest possible +I/O performance. From a device and host perspective, this simply +turns the VM into a userspace driver, with the benefits of +significantly reduced latency, higher bandwidth, and direct use of +bare-metal device drivers[3]. + +Some applications, particularly in the high performance computing +field, also benefit from low-overhead, direct device access from +userspace. Examples include network adapters (often non-TCP/IP based) +and compute accelerators. Prior to VFIO, these drivers had to either +go through the full development cycle to become proper upstream +driver, be maintained out of tree, or make use of the UIO framework, +which has no notion of IOMMU protection, limited interrupt support, +and requires root privileges to access things like PCI configuration +space. + +The VFIO driver framework intends to unify these, replacing both the +KVM PCI specific device assignment code as well as provide a more +secure, more featureful userspace driver environment than UIO. + +Groups, Devices, and IOMMUs +------------------------------------------------------------------------------- + +Devices are the main target of any I/O driver. Devices typically +create a programming interface made up of I/O access, interrupts, +and DMA. Without going into the details of each of these, DMA is +by far the most critical aspect for maintaining a secure environment +as allowing a device read-write access to system memory imposes the +greatest risk to the overall system integrity. + +To help mitigate this risk, many modern IOMMUs now incorporate +isolation properties into what was, in many cases, an interface only +meant for translation (ie. solving the addressing problems of devices +with limited address spaces). With this, devices can now be isolated +from each other and from arbitrary memory access, thus allowing +things like secure direct assignment of devices into virtual machines. + +This isolation is not always at the granularity of a single device +though. Even when an IOMMU is capable of this, properties of devices, +interconnects, and IOMMU topologies can each reduce this isolation. +For instance, an individual device may be part of a larger multi- +function enclosure. While the IOMMU may be able to distinguish +between devices within the enclosure, the enclosure may not require +transactions between devices to reach the IOMMU. Examples of this +could be anything from a multi-function PCI device with backdoors +between functions to a non-PCI-ACS (Access Control Services) capable +bridge allowing redirection without reaching the IOMMU. Topology +can also play a factor in terms of hiding devices. A PCIe-to-PCI +bridge masks the devices behind it, making transaction appear as if +from the bridge itself. Obviously IOMMU design plays a major factor +as well. + +Therefore, while for the most part an IOMMU may have device level +granularity, any system is susceptible to reduced granularity. The +IOMMU API therefore supports a notion of IOMMU groups. A group is +a set of devices which is isolatable from all other devices in the +system. Groups are therefore the unit of ownership used by VFIO. + +While the group is the minimum granularity that must be used to +ensure secure user access, it's not necessarily the preferred +granularity. In IOMMUs which make use of page tables, it may be +possible to share a set of page tables between different groups, +reducing the overhead both to the platform (reduced TLB thrashing, +reduced duplicate page tables), and to the user (programming only +a single set of translations). For this reason, VFIO makes use of +a container class, which may hold one or more groups. A container +is created by simply opening the /dev/vfio/vfio character device. + +On its own, the container provides little functionality, with all +but a couple version and extension query interfaces locked away. +The user needs to add a group into the container for the next level +of functionality. To do this, the user first needs to identify the +group associated with the desired device. This can be done using +the sysfs links described in the example below. By unbinding the +device from the host driver and binding it to a VFIO driver, a new +VFIO group will appear for the group as /dev/vfio/$GROUP, where +$GROUP is the IOMMU group number of which the device is a member. +If the IOMMU group contains multiple devices, each will need to +be bound to a VFIO driver before operations on the VFIO group +are allowed (it's also sufficient to only unbind the device from +host drivers if a VFIO driver is unavailable; this will make the +group available, but not that particular device). TBD - interface +for disabling driver probing/locking a device. + +Once the group is ready, it may be added to the container by opening +the VFIO group character device (/dev/vfio/$GROUP) and using the +VFIO_GROUP_SET_CONTAINER ioctl, passing the file descriptor of the +previously opened container file. If desired and if the IOMMU driver +supports sharing the IOMMU context between groups, multiple groups may +be set to the same container. If a group fails to set to a container +with existing groups, a new empty container will need to be used +instead. + +With a group (or groups) attached to a container, the remaining +ioctls become available, enabling access to the VFIO IOMMU interfaces. +Additionally, it now becomes possible to get file descriptors for each +device within a group using an ioctl on the VFIO group file descriptor. + +The VFIO device API includes ioctls for describing the device, the I/O +regions and their read/write/mmap offsets on the device descriptor, as +well as mechanisms for describing and registering interrupt +notifications. + +VFIO Usage Example +------------------------------------------------------------------------------- + +Assume user wants to access PCI device 0000:06:0d.0 + +$ readlink /sys/bus/pci/devices/0000:06:0d.0/iommu_group +../../../../kernel/iommu_groups/26 + +This device is therefore in IOMMU group 26. This device is on the +pci bus, therefore the user will make use of vfio-pci to manage the +group: + +# modprobe vfio-pci + +Binding this device to the vfio-pci driver creates the VFIO group +character devices for this group: + +$ lspci -n -s 0000:06:0d.0 +06:0d.0 0401: 1102:0002 (rev 08) +# echo 0000:06:0d.0 > /sys/bus/pci/devices/0000:06:0d.0/driver/unbind +# echo 1102 0002 > /sys/bus/pci/drivers/vfio/new_id + +Now we need to look at what other devices are in the group to free +it for use by VFIO: + +$ ls -l /sys/bus/pci/devices/0000:06:0d.0/iommu_group/devices +total 0 +lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:00:1e.0 -> + ../../../../devices/pci0000:00/0000:00:1e.0 +lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:06:0d.0 -> + ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 +lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:06:0d.1 -> + ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 + +This device is behind a PCIe-to-PCI bridge[4], therefore we also +need to add device 0000:06:0d.1 to the group following the same +procedure as above. Device 0000:00:1e.0 is a bridge that does +not currently have a host driver, therefore it's not required to +bind this device to the vfio-pci driver (vfio-pci does not currently +support PCI bridges). + +The final step is to provide the user with access to the group if +unprivileged operation is desired (note that /dev/vfio/vfio provides +no capabilities on its own and is therefore expected to be set to +mode 0666 by the system). + +# chown user:user /dev/vfio/26 + +The user now has full access to all the devices and the iommu for this +group and can access them as follows: + + int container, group, device, i; + struct vfio_group_status group_status = + { .argsz = sizeof(group_status) }; + struct vfio_iommu_x86_info iommu_info = { .argsz = sizeof(iommu_info) }; + struct vfio_iommu_x86_dma_map dma_map = { .argsz = sizeof(dma_map) }; + struct vfio_device_info device_info = { .argsz = sizeof(device_info) }; + + /* Create a new container */ + container = open("/dev/vfio/vfio, O_RDWR); + + if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION) + /* Unknown API version */ + + if (!ioctl(container, VFIO_CHECK_EXTENSION, VFIO_X86_IOMMU)) + /* Doesn't support the IOMMU driver we want. */ + + /* Open the group */ + group = open("/dev/vfio/26", O_RDWR); + + /* Test the group is viable and available */ + ioctl(group, VFIO_GROUP_GET_STATUS, &group_status); + + if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE)) + /* Group is not viable (ie, not all devices bound for vfio) */ + + /* Add the group to the container */ + ioctl(group, VFIO_GROUP_SET_CONTAINER, &container); + + /* Enable the IOMMU model we want */ + ioctl(container, VFIO_SET_IOMMU, VFIO_X86_IOMMU) + + /* Get addition IOMMU info */ + ioctl(container, VFIO_IOMMU_GET_INFO, &iommu_info); + + /* Allocate some space and setup a DMA mapping */ + dma_map.vaddr = mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, 0, 0); + dma_map.size = 1024 * 1024; + dma_map.iova = 0; /* 1MB starting at 0x0 from device view */ + dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE; + + ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map); + + /* Get a file descriptor for the device */ + device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0"); + + /* Test and setup the device */ + ioctl(device, VFIO_DEVICE_GET_INFO, &device_info); + + for (i = 0; i < device_info.num_regions; i++) { + struct vfio_region_info reg = { .argsz = sizeof(reg) }; + + reg.index = i; + + ioctl(device, VFIO_DEVICE_GET_REGION_INFO, ®); + + /* Setup mappings... read/write offsets, mmaps + * For PCI devices, config space is a region */ + } + + for (i = 0; i < device_info.num_irqs; i++) { + struct vfio_irq_info irq = { .argsz = sizeof(irq) }; + + irq.index = i; + + ioctl(device, VFIO_DEVICE_GET_IRQ_INFO, ®); + + /* Setup IRQs... eventfds, VFIO_DEVICE_SET_IRQS */ + } + + /* Gratuitous device reset and go... */ + ioctl(device, VFIO_DEVICE_RESET); + +VFIO User API +------------------------------------------------------------------------------- + +Please see include/linux/vfio.h for complete API documentation. + +VFIO bus driver API +------------------------------------------------------------------------------- + +VFIO bus drivers, such as vfio-pci make use of only a few interfaces +into VFIO core. When devices are bound and unbound to the driver, +the driver should call vfio_add_group_dev() and vfio_del_group_dev() +respectively: + +extern int vfio_add_group_dev(struct iommu_group *iommu_group, + struct device *dev, + const struct vfio_device_ops *ops, + void *device_data); + +extern void *vfio_del_group_dev(struct device *dev); + +vfio_add_group_dev() indicates to the core to begin tracking the +specified iommu_group and register the specified dev as owned by +a VFIO bus driver. The driver provides an ops structure for callbacks +similar to a file operations structure: + +struct vfio_device_ops { + int (*open)(void *device_data); + void (*release)(void *device_data); + ssize_t (*read)(void *device_data, char __user *buf, + size_t count, loff_t *ppos); + ssize_t (*write)(void *device_data, const char __user *buf, + size_t size, loff_t *ppos); + long (*ioctl)(void *device_data, unsigned int cmd, + unsigned long arg); + int (*mmap)(void *device_data, struct vm_area_struct *vma); +}; + +Each function is passed the device_data that was originally registered +in the vfio_add_group_dev() call above. This allows the bus driver +an easy place to store its opaque, private data. The open/release +callbacks are issued when a new file descriptor is created for a +device (via VFIO_GROUP_GET_DEVICE_FD). The ioctl interface provides +a direct pass through for VFIO_DEVICE_* ioctls. The read/write/mmap +interfaces implement the device region access defined by the device's +own VFIO_DEVICE_GET_REGION_INFO ioctl. + +------------------------------------------------------------------------------- + +[1] VFIO was originally an acronym for "Virtual Function I/O" in its +initial implementation by Tom Lyon while as Cisco. We've since +outgrown the acronym, but it's catchy. + +[2] "safe" also depends upon a device being "well behaved". It's +possible for multi-function devices to have backdoors between +functions and even for single function devices to have alternative +access to things like PCI config space through MMIO registers. To +guard against the former we can include additional precautions in the +IOMMU driver to group multi-function PCI devices together +(iommu=group_mf). The latter we can't prevent, but the IOMMU should +still provide isolation. For PCI, SR-IOV Virtual Functions are the +best indicator of "well behaved", as these are designed for +virtualization usage models. + +[3] As always there are trade-offs to virtual machine device +assignment that are beyond the scope of VFIO. It's expected that +future IOMMU technologies will reduce some, but maybe not all, of +these trade-offs. + +[4] In this case the device is below a PCI bridge, so transactions +from either function of the device are indistinguishable to the iommu: + +-[0000:00]-+-1e.0-[06]--+-0d.0 + \-0d.1 + +00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90) diff --git a/Documentation/w1/slaves/w1_therm b/Documentation/w1/slaves/w1_therm index 0403aaa..874a8ca 100644 --- a/Documentation/w1/slaves/w1_therm +++ b/Documentation/w1/slaves/w1_therm @@ -3,6 +3,7 @@ Kernel driver w1_therm Supported chips: * Maxim ds18*20 based temperature sensors. + * Maxim ds1825 based temperature sensors. Author: Evgeniy Polyakov <johnpol@2ka.mipt.ru> @@ -15,6 +16,7 @@ supported family codes: W1_THERM_DS18S20 0x10 W1_THERM_DS1822 0x22 W1_THERM_DS18B20 0x28 +W1_THERM_DS1825 0x3B Support is provided through the sysfs w1_slave file. Each open and read sequence will initiate a temperature conversion then provide two |