op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
...
\| *	PM: Allow SCSI devices to suspend/resume asynchronously	Rafael J. Wysocki	2010-02-26	2	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Set power.async_suspend for all SCSI devices, targets and hosts, so that they can be suspended and resumed in parallel with the main suspend/resume thread and possibly with other devices they don't depend on in a known way (i.e. devices which are not their parents or children). The power.async_suspend flag is also set for devices that don't have suspend or resume callbacks, because otherwise they would make the main suspend/resume thread wait for their "asynchronous" children (during suspend) or parents (during resume), effectively negating the possible gains from executing these devices' suspend and resume callbacks asynchronously. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
\| *	PM: Allow USB devices to suspend/resume asynchronously	Rafael J. Wysocki	2010-02-26	3	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Set power.async_suspend for USB devices, endpoints and interfaces, allowing them to be suspended and resumed asynchronously during system sleep transitions. The power.async_suspend flag is also set for devices that don't have suspend or resume callbacks, because otherwise they would make the main suspend/resume thread wait for their "asynchronous" children (during suspend) or parents (during resume), effectively negating the possible gains from executing these devices' suspend and resume callbacks asynchronously. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
\| *	USB: implement non-tree resume ordering constraints for PCI host controllers	Alan Stern	2010-02-26	3	-1/+135
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch (as1331) adds non-tree ordering constraints needed for proper resume of PCI USB host controllers from hibernation. The main issue is that non-high-speed devices must not be resumed before the high-speed root hub, because it is the ehci_bus_resume() routine which takes care of handing the device connection over to the companion controller. If the device resume is attempted before the handover then the device won't be found and it will be treated as though it had disconnected. The patch adds a new field to the usb_bus structure; for each full/low-speed bus this field will contain a pointer to the companion high-speed bus (if one exists). It is used during normal device resume; if the hs_companion pointer isn't NULL then we wait for the root-hub device on the hs_companion bus. A secondary issue is that an EHCI controlller shouldn't be resumed before any of its companions. On some machines I have observed handovers failing if the companion controller is reinitialized after the handover. Thus, the EHCI resume routine must wait for the companion controllers to be resumed. The patch also fixes a small bug in usb_hcd_pci_probe(); an error path jumps to the wrong label, causing a memory leak. [rjw: Fixed compilation for CONFIG_PM_SLEEP unset.] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
\| *	PM: Allow PCI devices to suspend/resume asynchronously	Rafael J. Wysocki	2010-02-26	3	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Set power.async_suspend for all PCI devices and PCIe port services, so that they can be suspended and resumed in parallel with other devices they don't depend on in a known way (i.e. devices which are not their parents or children). This only affects the "regular" suspend and resume stages, which means in particular that the restoration of the PCI devices' standard configuration registers during resume will still be carried out synchronously (at the "early" resume stage). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
\| *	PM / Hibernate: Swap, remove useless check from swsusp_read()	Jiri Slaby	2010-02-26	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It will never reach here if the sws_resume_bdev is erratic. swsusp_read() is called only from software_resume(), but after swsusp_check() which would catch the error state. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
\| *	PM / Hibernate: Really deprecate deprecated user ioctls	Jiri Slaby	2010-02-26	2	-4/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	They were deprecated and removed from exported headers more than 2 years ago. Inform users about their removal in the future now. (Switch cases needed to be reorderded for an easy fall through.) And add an entry to feature-removal-schedule. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
\| *	PM: Allow device drivers to use dpm_wait()	Rafael J. Wysocki	2010-02-26	2	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are some dependencies between devices (in particular, between EHCI USB controllers and their OHCI/UHCI siblings) which are not reflected by the structure of the device tree. With synchronous suspend and resume these dependencies are taken into accout automatically, because the devices in question are always registered in the right order, but to meet these constraints with asynchronous suspend and resume the drivers of these devices will need to use dpm_wait() in their suspend/resume routines, so introduce a helper function allowing them to do that. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
\| *	PM: Start asynchronous resume threads upfront	Rafael J. Wysocki	2010-02-26	1	-19/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It has been shown by testing that total device resume time can be reduced significantly (by as much as 50% or more) if the async threads executing some devices' resume routines are all started before the main resume thread starts to handle the "synchronous" devices. This is a consequence of the fact that the slowest devices tend to be located at the end of dpm_list, so their resume routines are started very late. Consequently, they have to wait for all the preceding "synchronous" devices before their resume routines can be started by the main resume thread, even if they are "asynchronous". By starting their async threads upfront we effectively move those devices towards the beginning of dpm_list, without breaking their ordering with respect to their parents and children. As a result, their resume routines are started much earlier and we are able to save much more device resume time this way. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
\| *	PM: Add facility for advanced testing of async suspend/resume	Rafael J. Wysocki	2010-02-26	4	-0/+100
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add configuration switch CONFIG_PM_ADVANCED_DEBUG for compiling in extra PM debugging/testing code allowing one to access some PM-related attributes of devices from the user space via sysfs. If CONFIG_PM_ADVANCED_DEBUG is set, add sysfs attribute power/async for every device allowing the user space to access the device's power.async_suspend flag and modify it, if desired. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
\| *	PM: Add a switch for disabling/enabling asynchronous suspend/resume	Rafael J. Wysocki	2010-02-26	4	-7/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add sysfs attribute /sys/power/pm_async allowing the user space to disable/enable asynchronous suspend/resume of devices. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
\| *	PM: Asynchronous suspend and resume of devices	Rafael J. Wysocki	2010-02-26	4	-6/+125
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Theoretically, the total time of system sleep transitions (suspend to RAM, hibernation) can be reduced by running suspend and resume callbacks of device drivers in parallel with each other. However, there are dependencies between devices such that we're not allowed to suspend the parent of a device before suspending the device itself. Analogously, we're not allowed to resume a device before resuming its parent. The most straightforward way to take these dependencies into accout is to start the async threads used for suspending and resuming devices at the core level, so that async_schedule() is called for each suspend and resume callback supposed to be executed asynchronously. For this purpose, introduce a new device flag, power.async_suspend, used to mark the devices whose suspend and resume callbacks are to be executed asynchronously (ie. in parallel with the main suspend/resume thread and possibly in parallel with each other) and helper function device_enable_async_suspend() allowing one to set power.async_suspend for given device (power.async_suspend is unset by default for all devices). For each device with the power.async_suspend flag set the PM core will use async_schedule() to execute its suspend and resume callbacks. The async threads started for different devices as a result of calling async_schedule() are synchronized with each other and with the main suspend/resume thread with the help of completions, in the following way: (1) There is a completion, power.completion, for each device object. (2) Each device's completion is reset before calling async_schedule() for the device or, in the case of devices with the power.async_suspend flags unset, before executing the device's suspend and resume callbacks. (3) During suspend, right before running the bus type, device type and device class suspend callbacks for the device, the PM core waits for the completions of all the device's children to be completed. (4) During resume, right before running the bus type, device type and device class resume callbacks for the device, the PM core waits for the completion of the device's parent to be completed. (5) The PM core completes power.completion for each device right after the bus type, device type and device class suspend (or resume) callbacks executed for the device have returned. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
\| *	PM: Add parent information to timing messages	Rafael J. Wysocki	2010-02-26	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add parent information to the messages printed by the suspend/resume core when initcall_debug is set. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
\| *	PM: Document device power attributes in sysfs	Rafael J. Wysocki	2010-02-26	1	-0/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are sysfs attributes in /sys/devices/.../power/ that haven't been documented yet in Documentation/ABI/. Document them as appropriate. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
\| *	PM / Runtime: Add sysfs switch for disabling device run-time PM	Rafael J. Wysocki	2010-02-26	4	-0/+101
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add new device sysfs attribute, power/control, allowing the user space to block the run-time power management of the devices. If this attribute is set to "on", the driver of the device won't be able to power manage it at run time (without breaking the rules) and the device will always be in the full power state (except when the entire system goes into a sleep state). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Alan Stern <stern@rowland.harvard.edu>
* \|	Remove EXPERIMENTAL from NFS_FSCACHE	Christian Kujau	2010-02-26	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There's currently an open Ubuntu bug[0], with the intent to compile NFS_FSCACHE (and possibly AFS_FSCACHE, 9P_FSCACHE) into the standard Ubuntu kernel. However, since *_FSCACHE still depends on EXPERIMENTAL, this won't happen. As Arjan van de Ven pointed out[1], the EXPERIMENTAL flag doesn't mean that much any more, I propose the following patch to fs/nfs/Kconfig. I'd do the same for fs/9p/Kconfig and fs/afs/Kconfig, but as I did not test 9p or AFS, I feel it would not be appropriate for me to remove the flag. [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/440522/comments/5 [1] http://lkml.org/lkml/2010/1/23/145 Signed-off-by: Christian Kujau <lists@nerdbynature.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	Merge branch 'for_linus' of ↵	Linus Torvalds	2010-02-26	7	-196/+575
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86: toshiba_acpi: Add full hotkey support hp-wmi: Add support for tablet rotation key dell-laptop: Add another Dell laptop to the DMI whitelist classmate-laptop: use a single MODULE_DEVICE_TABLE to get correct aliases dell-laptop: Pay attention to which devices the hardware switch controls dell-laptop: Use buffer with 32-bit physical address dell-laptop: Blacklist machines not supporting dell-laptop dell-laptop: Block software state changes when rfkill hard blocked dell-laptop: Fix small memory leak dell-laptop: Fix platform device unregistration dell-laptop: Update rfkill state on kill switch compal-laptop: Replace sysfs support with rfkill support compal-laptop: Add support for known Compal made Dell laptops MAINTAINERS: update drivers/platform/x86 information
\| * \|	toshiba_acpi: Add full hotkey support	Matthew Garrett	2010-02-25	1	-7/+199
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Calling the ENAB method on Toshiba laptops results in notifications being sent when laptop hotkeys are pressed. This patch simply calls that method and sets up an input device if it's successful. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
\| * \|	hp-wmi: Add support for tablet rotation key	Matthew Garrett	2010-02-25	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The HP touchsmart tablet has a key for rotating the UI from landscape to portrait. Add support for it. Signed-off-by: Matthew Garrett <mjg@redhat.com>
\| * \|	dell-laptop: Add another Dell laptop to the DMI whitelist	Erik Andren	2010-02-25	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Latitude C640 has another variation of dell in its DMI vendor entry. Add it to the whitelist in order to enjoy the sweet fruits of software backlight toggling. Signed-off-by: Erik Andren <erik.andren@gmail.com>
\| * \|	classmate-laptop: use a single MODULE_DEVICE_TABLE to get correct aliases	Thadeu Lima de Souza Cascardo	2010-02-25	1	-10/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of a MODULE_DEVICE_TABLE for every acpi_driver ids table, we create a table containing all ids to export to get a module alias for each one. This will fix automatic loading of the driver when one of the ACPI devices is not present (like the accelerometer, which is not present in some models). Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
\| * \|	dell-laptop: Pay attention to which devices the hardware switch controls	Matthew Garrett	2010-02-25	1	-2/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Right now, we assume that the hardware rfkill switch on Dells toggles all radio devices. In fact, this can be configured in the BIOS and so right now we may mark a device as hardware killed even when it isn't. Add code to query the devices controlled by the switch, and use this when determining the hardware kill state of a radio. Signed-off-by: Matthew Garrett <mjg@redhat.com>
\| * \|	dell-laptop: Use buffer with 32-bit physical address	Stuart Hayes	2010-02-25	1	-39/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Calls to communicate with system firmware via a SMI (using dcdbas) need to use a buffer that has a physical address of 4GB or less. Currently the dell-laptop driver does not guarantee this, and when the buffer address is higher than 4GB, the address is truncated to 32 bits and the SMI handler writes to the wrong memory address. Signed-off-by: Stuart Hayes <stuart_hayes@dell.com> Acked-by: Matthew Garrett <mjg@redhat.com>
\| * \|	dell-laptop: Blacklist machines not supporting dell-laptop	Mario Limonciello	2010-02-25	1	-0/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Mini family doesn't support smbios 17,11 although it reports it does. Signed-off-by: Mario Limonciello <superm1@ubuntu.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
\| * \|	dell-laptop: Block software state changes when rfkill hard blocked	Mario Limonciello	2010-02-25	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The "hardware" switch is tied directly to a BIOS interface that will connect and disconnect the hardware from the bus. If you use the software interface to request the BIOS to make these changes, the HW switch will be in an inconsistent state and LEDs may not reflect the state of the HW. Signed-off-by: Mario Limonciello <Mario_Limonciello@Dell.com>
\| * \|	dell-laptop: Fix small memory leak	Matthew Garrett	2010-02-25	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	da_tokens was not being freed by dell-laptop on unload. Fix that. Signed-off-by: Matthew Garrett <mjg@redhat.com>
\| * \|	dell-laptop: Fix platform device unregistration	Matthew Garrett	2010-02-25	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dell-laptop currently fails to clean up its platform device correctly. Make sure that it's unregistered. Signed-off-by: Matthew Garrett <mjg@redhat.com>
\| * \|	dell-laptop: Update rfkill state on kill switch	Matthew Garrett	2010-02-25	2	-0/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The rfkill interface on Dells only sends a notification that the switch has been changed via the keyboard controller. Add a filter so we can pick these notifications up and update the rfkill state appropriately. Signed-off-by: Matthew Garrett <mjg@redhat.com>
\| * \|	compal-laptop: Replace sysfs support with rfkill support	Mario Limonciello	2010-02-25	1	-138/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This drops the support for manually groking the files in sysfs to turn on and off the WLAN and BT for Compal laptops in favor of platform rfkill support. It has been combined into a single patch to not introduce regressions in the process of simply adding rfkill support Signed-off-by: Mario Limonciello <Mario_Limonciello@Dell.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Matthew Garrett <mjg@redhat.com> Cc: Cezary Jackiewicz <cezary.jackiewicz@gmail.com>
\| * \|	compal-laptop: Add support for known Compal made Dell laptops	Mario Limonciello	2010-02-25	1	-0/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The following Dell laptops are known to have been manufacturer by Compal and are supported by the compal-laptop platform driver - Mini 9 - Mini 10 - Mini 12 - Mini 10v - Inspiron 11z Signed-off-by: Mario Limonciello <Mario_Limonciello@Dell.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Matthew Garrett <mjg@redhat.com> Cc: Cezary Jackiewicz <cezary.jackiewicz@gmail.com>
\| * \|	MAINTAINERS: update drivers/platform/x86 information	Matthew Garrett	2010-02-25	1	-4/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Many of the drivers/platform/x86 drivers have nothing to do with ACPI, so it's kind of inappropriate for them to be stuck under the ACPI mailing list. Add a new mailing list (platform-driver-x86@vger.kernel.org) and, with Len's blessing, add myself as subsystem maintainer. Signed-off-by: Matthew Garrett <mjg@redhat.com> Cc: Anisse Astier <anisse@astier.eu> Cc: Carlos Corbacho <carlos@strangeworlds.co.uk> Cc: Cezary Jackiewicz <cezary.jackiewicz@gmail.com> Cc: Corentin Chary <corentincj@iksaif.net> Cc: Daniel Oliveira Nascimento <don@syst.com.br> Cc: Harald Welte <laforge@gnumonks.org> Cc: Henrique de Moraes Holschuh <ibm-acpi@hmh.eng.br> Cc: Herton Ronaldo Krzesinski <herton@mandriva.com.br> Cc: Jonathan Woithe <jwoithe@physics.adelaide.edu.au> Cc: Karol Kozimor <sziwan@users.sourceforge.net> Cc: Len Brown <lenb@kernel.org> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Mattia Dongili <malattia@linux.it> Cc: Peter Feuerer <peter@piie.net> Cc: Sujith Thomas <sujith.thomas@intel.com> Cc: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
* \| \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2010-02-26	8	-58/+180
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: use bastmode in debugfs output dlm: Send lockspace name with uevents dlm: send reply before bast dlm: fix ordering of bast and cast
\| * \| \|	dlm: use bastmode in debugfs output	David Teigland	2010-02-26	2	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The bast mode that appears in the debugfs output should be useful on both master and process nodes. lkb_highbast is currently printed, and is only useful on the master node. lkb_bastmode is only useful on the process node. This patch sets lkb_bastmode on the master node as well, and uses that value in the debugfs print. Signed-off-by: David Teigland <teigland@redhat.com>
\| * \| \|	dlm: Send lockspace name with uevents	Steven Whitehouse	2010-02-26	1	-1/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Although it is possible to get this information from the path, its much easier to provide the lockspace as a seperate env variable. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
\| * \| \|	dlm: send reply before bast	David Teigland	2010-02-26	1	-26/+84
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the lock master processes a successful operation (request, convert, cancel, or unlock), it will process the effects of the change before sending the reply for the operation. The "effects" of the operation are: - blocking callbacks (basts) for any newly granted locks - waiting or converting locks that can now be granted The cast is queued on the local node when the reply from the lock master is received. This means that a lock holder can receive a bast for a lock mode that is doesn't yet know has been granted. Signed-off-by: David Teigland <teigland@redhat.com>
\| * \| \|	dlm: fix ordering of bast and cast	David Teigland	2010-02-24	6	-28/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When both blocking and completion callbacks are queued for lock, the dlm would always deliver the completion callback (cast) first. In some cases the blocking callback (bast) is queued before the cast, though, and should be delivered first. This patch keeps track of the order in which they were queued and delivers them in that order. This patch also keeps track of the granted mode in the last cast and eliminates the following bast if the bast mode is compatible with the preceding cast mode. This happens when a remotely mastered lock is demoted, e.g. EX->NL, in which case the local node queues a cast immediately after sending the demote message. In this way a cast can be queued for a mode, e.g. NL, that makes an in-transit bast extraneous. Signed-off-by: David Teigland <teigland@redhat.com>
* \| \| \|	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs	Linus Torvalds	2010-02-26	79	-1596/+1666
\|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'for-linus' of git://oss.sgi.com/xfs/xfs: (52 commits) fs/xfs: Correct NULL test xfs: optimize log flushing in xfs_fsync xfs: only clear the suid bit once in xfs_write xfs: kill xfs_bawrite xfs: log changed inodes instead of writing them synchronously xfs: remove invalid barrier optimization from xfs_fsync xfs: kill the unused XFS_QMOPT_* flush flags V2 xfs: Use delay write promotion for dquot flushing xfs: Sort delayed write buffers before dispatch xfs: Don't issue buffer IO direct from AIL push V2 xfs: Use delayed write for inodes rather than async V2 xfs: Make inode reclaim states explicit xfs: more reserved blocks fixups xfs: turn off sign warnings xfs: don't hold onto reserved blocks on remount,ro xfs: quota limit statvfs available blocks xfs: replace KM_LARGE with explicit vmalloc use xfs: cleanup up xfs_log_force calling conventions xfs: kill XLOG_VEC_SET_TYPE xfs: remove duplicate buffer flags ...
\| * \ \ \	Merge branch 'linux-2.6.33'	Alex Elder	2010-02-26	1237	-10826/+29011
\| \|\ \ \ \
\| * \| \| \| \|	fs/xfs: Correct NULL test	Julia Lawall	2010-02-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Test the value that was just allocated rather than the previously tested one. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r@ expression x; expression e; identifier l; @@ if (x == NULL \|\| ...) { ... when forall return ...; } ... when != goto l; when != x = e when != &x x == NULL // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
\| * \| \| \| \|	xfs: optimize log flushing in xfs_fsync	Christoph Hellwig	2010-02-12	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we have a pinned inode it must have a log item attached to it. Usually that log item will have ili_last_lsn already set, in which case we only need to flush the log up to that LSN instead of doing a full log force. This gives speedups of about 5% in some fsync heavy workloads. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
\| * \| \| \| \|	xfs: only clear the suid bit once in xfs_write	Christoph Hellwig	2010-02-12	3	-55/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	file_remove_suid already calls into ->setattr to clear the suid and sgid bits if needed, no need to start a second transaction to do it ourselves. Note that xfs_write_clear_setuid issues a sync transaction while the path through ->setattr doesn't, but that is consistant with the other filesystems. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
\| * \| \| \| \|	xfs: kill xfs_bawrite	Dave Chinner	2010-02-04	2	-20/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are no more users of this function left in the XFS code now that we've switched everything to delayed write flushing. Remove it. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| * \| \| \| \|	xfs: log changed inodes instead of writing them synchronously	Christoph Hellwig	2010-02-09	1	-29/+82
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When an inode has already be flushed delayed write, xfs_inode_clean() returns true and hence xfs_fs_write_inode() can return on a synchronous inode write without having written the inode. Currently these sycnhronous writes only come sync(1), unmount, a sycnhronous NFS export and cachefiles so should be relatively rare and out of common performance paths. Realistically, a synchronous inode write is not necessary here; we can avoid writing the inode by logging any non-transactional changes that are pending. This needs to be done with synchronous transactions, but it avoids seeking between the log and inode clusters as we do now. We don't force the log if the inode is pinned, though, so this differs from the fsync case. For normal sys_sync and unmount behaviour this is fine because we do a synchronous log force in xfs_sync_data which is called from the ->sync_fs code. It does however break the NFS synchronous export guarantees for now, but work is under way to fix this at a higher level or for the higher level to provide an additional flag in the writeback control to tell us that a log force is needed. Portions of this patch are based on work from Dave Chinner. Signed-off-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Alex Elder <aelder@sgi.com>
\| * \| \| \| \|	xfs: remove invalid barrier optimization from xfs_fsync	Christoph Hellwig	2010-02-02	1	-10/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We always need to flush the disk write cache and can't skip it just because the no inode attributes have changed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>
\| * \| \| \| \|	xfs: kill the unused XFS_QMOPT_* flush flags V2	Dave Chinner	2010-02-04	4	-23/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dquots are never flushed asynchronously. Remove the flag and the async write support from the flush function. Make the default flush a delwri flush to make the inode flush code, which leaves the XFS_QMOPT_SYNC the only flag remaining. Convert that to use SYNC_WAIT instead, just like the inode flush code. V2: - just pass flush flags straight through Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| * \| \| \| \|	xfs: Use delay write promotion for dquot flushing	Dave Chinner	2010-01-26	1	-15/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xfs_qm_dqflock_pushbuf_wait() does a very similar trick to item pushing used to do to flush out delayed write dquot buffers. Change it to use the new promotion method rather than an async flush. Also, xfs_qm_dqflock_pushbuf_wait() can return without the flush lock held, yet the callers make the assumption that after this call the flush lock is held. Always return with the flush lock held. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| * \| \| \| \|	xfs: Sort delayed write buffers before dispatch	Dave Chinner	2010-01-26	1	-27/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently when the xfsbufd writes delayed write buffers, it pushes them to disk in the order they come off the delayed write list. If there are lots of buffers ѕpread widely over the disk, this results in overwhelming the elevator sort queues in the block layer and we end up losing the posibility of merging adjacent buffers to minimise the number of IOs. Use the new generic list_sort function to sort the delwri dispatch queue before issue to ensure that the buffers are pushed in the most friendly order possible to the lower layers. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| * \| \| \| \|	xfs: Don't issue buffer IO direct from AIL push V2	Dave Chinner	2010-02-02	10	-203/+102
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All buffers logged into the AIL are marked as delayed write. When the AIL needs to push the buffer out, it issues an async write of the buffer. This means that IO patterns are dependent on the order of buffers in the AIL. Instead of flushing the buffer, promote the buffer in the delayed write list so that the next time the xfsbufd is run the buffer will be flushed by the xfsbufd. Return the state to the xfsaild that the buffer was promoted so that the xfsaild knows that it needs to cause the xfsbufd to run to flush the buffers that were promoted. Using the xfsbufd for issuing the IO allows us to dispatch all buffer IO from the one queue. This means that we can make much more enlightened decisions on what order to flush buffers to disk as we don't have multiple places issuing IO. Optimisations to xfsbufd will be in a future patch. Version 2 - kill XFS_ITEM_FLUSHING as it is now unused. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| * \| \| \| \|	xfs: Use delayed write for inodes rather than async V2	Dave Chinner	2010-02-06	6	-115/+102
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently do background inode flush asynchronously, resulting in inodes being written in whatever order the background writeback issues them. Not only that, there are also blocking and non-blocking asynchronous inode flushes, depending on where the flush comes from. This patch completely removes asynchronous inode writeback. It removes all the strange writeback modes and replaces them with either a synchronous flush or a non-blocking delayed write flush. That is, inode flushes will only issue IO directly if they are synchronous, and background flushing may do nothing if the operation would block (e.g. on a pinned inode or buffer lock). Delayed write flushes will now result in the inode buffer sitting in the delwri queue of the buffer cache to be flushed by either an AIL push or by the xfsbufd timing out the buffer. This will allow accumulation of dirty inode buffers in memory and allow optimisation of inode cluster writeback at the xfsbufd level where we have much greater queue depths than the block layer elevators. We will also get adjacent inode cluster buffer IO merging for free when a later patch in the series allows sorting of the delayed write buffers before dispatch. This effectively means that any inode that is written back by background writeback will be seen as flush locked during AIL pushing, and will result in the buffers being pushed from there. This writeback path is currently non-optimal, but the next patch in the series will fix that problem. A side effect of this delayed write mechanism is that background inode reclaim will no longer directly flush inodes, nor can it wait on the flush lock. The result is that inode reclaim must leave the inode in the reclaimable state until it is clean. Hence attempts to reclaim a dirty inode in the background will simply skip the inode until it is clean and this allows other mechanisms (i.e. xfsbufd) to do more optimal writeback of the dirty buffers. As a result, the inode reclaim code has been rewritten so that it no longer relies on the ambiguous return values of xfs_iflush() to determine whether it is safe to reclaim an inode. Portions of this patch are derived from patches by Christoph Hellwig. Version 2: - cleanup reclaim code as suggested by Christoph - log background reclaim inode flush errors - just pass sync flags to xfs_iflush Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| * \| \| \| \|	xfs: Make inode reclaim states explicit	Dave Chinner	2010-02-06	3	-29/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A.K.A.: don't rely on xfs_iflush() return value in reclaim We have gradually been moving checks out of the reclaim code because they are duplicated in xfs_iflush(). We've had a history of problems in this area, and many of them stem from the overloading of the return values from xfs_iflush() and interaction with inode flush locking to determine if the inode is safe to reclaim. With the desire to move to delayed write flushing of inodes and non-blocking inode tree reclaim walks, the overloading of the return value of xfs_iflush makes it very difficult to determine the correct thing to do next. This patch explicitly re-adds the checks to the inode reclaim code, removing the reliance on the return value of xfs_iflush() to determine what to do next. It also means that we can clearly document all the inode states that reclaim must handle and hence we can easily see that we handled all the necessary cases. This also removes the need for the xfs_inode_clean() check in xfs_iflush() as all callers now check this first (safely). Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| * \| \| \| \|	xfs: more reserved blocks fixups	Eric Sandeen	2010-02-08	4	-25/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This mangles the reserved blocks counts a little more. 1) add a helper function for the default reserved count 2) add helper functions to save/restore counts on ro/rw 3) save/restore reserved blocks on freeze/thaw 4) disallow changing reserved count while readonly V2: changed field name to match Dave's changes Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Alex Elder <aelder@sgi.com>