summaryrefslogtreecommitdiffstats
path: root/Documentation/power
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/power')
-rw-r--r--Documentation/power/00-INDEX40
-rw-r--r--Documentation/power/apm-acpi.txt32
-rw-r--r--Documentation/power/basic-pm-debugging.txt204
-rw-r--r--Documentation/power/devices.txt512
-rw-r--r--Documentation/power/drivers-testing.txt46
-rw-r--r--Documentation/power/freezing-of-tasks.txt178
-rw-r--r--Documentation/power/interface.txt75
-rw-r--r--Documentation/power/notifiers.txt58
-rw-r--r--Documentation/power/pci.txt299
-rw-r--r--Documentation/power/pm_qos_interface.txt64
-rw-r--r--Documentation/power/power_supply_class.txt173
-rw-r--r--Documentation/power/regulator/consumer.txt182
-rw-r--r--Documentation/power/regulator/machine.txt93
-rw-r--r--Documentation/power/regulator/overview.txt171
-rw-r--r--Documentation/power/regulator/regulator.txt30
-rw-r--r--Documentation/power/s2ram.txt74
-rw-r--r--Documentation/power/states.txt80
-rw-r--r--Documentation/power/swsusp-and-swap-files.txt60
-rw-r--r--Documentation/power/swsusp-dmcrypt.txt138
-rw-r--r--Documentation/power/swsusp.txt407
-rw-r--r--Documentation/power/tricks.txt27
-rw-r--r--Documentation/power/userland-swsusp.txt165
-rw-r--r--Documentation/power/video.txt185
-rw-r--r--Documentation/power/video_extension.txt37
24 files changed, 3330 insertions, 0 deletions
diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
new file mode 100644
index 0000000..fb742c2
--- /dev/null
+++ b/Documentation/power/00-INDEX
@@ -0,0 +1,40 @@
+00-INDEX
+ - This file
+apm-acpi.txt
+ - basic info about the APM and ACPI support.
+basic-pm-debugging.txt
+ - Debugging suspend and resume
+devices.txt
+ - How drivers interact with system-wide power management
+drivers-testing.txt
+ - Testing suspend and resume support in device drivers
+freezing-of-tasks.txt
+ - How processes and controlled during suspend
+interface.txt
+ - Power management user interface in /sys/power
+notifiers.txt
+ - Registering suspend notifiers in device drivers
+pci.txt
+ - How the PCI Subsystem Does Power Management
+pm_qos_interface.txt
+ - info on Linux PM Quality of Service interface
+power_supply_class.txt
+ - Tells userspace about battery, UPS, AC or DC power supply properties
+s2ram.txt
+ - How to get suspend to ram working (and debug it when it isn't)
+states.txt
+ - System power management states
+swsusp-and-swap-files.txt
+ - Using swap files with software suspend (to disk)
+swsusp-dmcrypt.txt
+ - How to use dm-crypt and software suspend (to disk) together
+swsusp.txt
+ - Goals, implementation, and usage of software suspend (ACPI S3)
+tricks.txt
+ - How to trick software suspend (to disk) into working when it isn't
+userland-swsusp.txt
+ - Experimental implementation of software suspend in userspace
+video_extension.txt
+ - ACPI video extensions
+video.txt
+ - Video issues during resume from suspend
diff --git a/Documentation/power/apm-acpi.txt b/Documentation/power/apm-acpi.txt
new file mode 100644
index 0000000..1bd799d
--- /dev/null
+++ b/Documentation/power/apm-acpi.txt
@@ -0,0 +1,32 @@
+APM or ACPI?
+------------
+If you have a relatively recent x86 mobile, desktop, or server system,
+odds are it supports either Advanced Power Management (APM) or
+Advanced Configuration and Power Interface (ACPI). ACPI is the newer
+of the two technologies and puts power management in the hands of the
+operating system, allowing for more intelligent power management than
+is possible with BIOS controlled APM.
+
+The best way to determine which, if either, your system supports is to
+build a kernel with both ACPI and APM enabled (as of 2.3.x ACPI is
+enabled by default). If a working ACPI implementation is found, the
+ACPI driver will override and disable APM, otherwise the APM driver
+will be used.
+
+No, sorry, you cannot have both ACPI and APM enabled and running at
+once. Some people with broken ACPI or broken APM implementations
+would like to use both to get a full set of working features, but you
+simply cannot mix and match the two. Only one power management
+interface can be in control of the machine at once. Think about it..
+
+User-space Daemons
+------------------
+Both APM and ACPI rely on user-space daemons, apmd and acpid
+respectively, to be completely functional. Obtain both of these
+daemons from your Linux distribution or from the Internet (see below)
+and be sure that they are started sometime in the system boot process.
+Go ahead and start both. If ACPI or APM is not available on your
+system the associated daemon will exit gracefully.
+
+ apmd: http://worldvisions.ca/~apenwarr/apmd/
+ acpid: http://acpid.sf.net/
diff --git a/Documentation/power/basic-pm-debugging.txt b/Documentation/power/basic-pm-debugging.txt
new file mode 100644
index 0000000..1555001
--- /dev/null
+++ b/Documentation/power/basic-pm-debugging.txt
@@ -0,0 +1,204 @@
+Debugging hibernation and suspend
+ (C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
+
+1. Testing hibernation (aka suspend to disk or STD)
+
+To check if hibernation works, you can try to hibernate in the "reboot" mode:
+
+# echo reboot > /sys/power/disk
+# echo disk > /sys/power/state
+
+and the system should create a hibernation image, reboot, resume and get back to
+the command prompt where you have started the transition. If that happens,
+hibernation is most likely to work correctly. Still, you need to repeat the
+test at least a couple of times in a row for confidence. [This is necessary,
+because some problems only show up on a second attempt at suspending and
+resuming the system.] Moreover, hibernating in the "reboot" and "shutdown"
+modes causes the PM core to skip some platform-related callbacks which on ACPI
+systems might be necessary to make hibernation work. Thus, if you machine fails
+to hibernate or resume in the "reboot" mode, you should try the "platform" mode:
+
+# echo platform > /sys/power/disk
+# echo disk > /sys/power/state
+
+which is the default and recommended mode of hibernation.
+
+Unfortunately, the "platform" mode of hibernation does not work on some systems
+with broken BIOSes. In such cases the "shutdown" mode of hibernation might
+work:
+
+# echo shutdown > /sys/power/disk
+# echo disk > /sys/power/state
+
+(it is similar to the "reboot" mode, but it requires you to press the power
+button to make the system resume).
+
+If neither "platform" nor "shutdown" hibernation mode works, you will need to
+identify what goes wrong.
+
+a) Test modes of hibernation
+
+To find out why hibernation fails on your system, you can use a special testing
+facility available if the kernel is compiled with CONFIG_PM_DEBUG set. Then,
+there is the file /sys/power/pm_test that can be used to make the hibernation
+core run in a test mode. There are 5 test modes available:
+
+freezer
+- test the freezing of processes
+
+devices
+- test the freezing of processes and suspending of devices
+
+platform
+- test the freezing of processes, suspending of devices and platform
+ global control methods(*)
+
+processors
+- test the freezing of processes, suspending of devices, platform
+ global control methods(*) and the disabling of nonboot CPUs
+
+core
+- test the freezing of processes, suspending of devices, platform global
+ control methods(*), the disabling of nonboot CPUs and suspending of
+ platform/system devices
+
+(*) the platform global control methods are only available on ACPI systems
+ and are only tested if the hibernation mode is set to "platform"
+
+To use one of them it is necessary to write the corresponding string to
+/sys/power/pm_test (eg. "devices" to test the freezing of processes and
+suspending devices) and issue the standard hibernation commands. For example,
+to use the "devices" test mode along with the "platform" mode of hibernation,
+you should do the following:
+
+# echo devices > /sys/power/pm_test
+# echo platform > /sys/power/disk
+# echo disk > /sys/power/state
+
+Then, the kernel will try to freeze processes, suspend devices, wait 5 seconds,
+resume devices and thaw processes. If "platform" is written to
+/sys/power/pm_test , then after suspending devices the kernel will additionally
+invoke the global control methods (eg. ACPI global control methods) used to
+prepare the platform firmware for hibernation. Next, it will wait 5 seconds and
+invoke the platform (eg. ACPI) global methods used to cancel hibernation etc.
+
+Writing "none" to /sys/power/pm_test causes the kernel to switch to the normal
+hibernation/suspend operations. Also, when open for reading, /sys/power/pm_test
+contains a space-separated list of all available tests (including "none" that
+represents the normal functionality) in which the current test level is
+indicated by square brackets.
+
+Generally, as you can see, each test level is more "invasive" than the previous
+one and the "core" level tests the hardware and drivers as deeply as possible
+without creating a hibernation image. Obviously, if the "devices" test fails,
+the "platform" test will fail as well and so on. Thus, as a rule of thumb, you
+should try the test modes starting from "freezer", through "devices", "platform"
+and "processors" up to "core" (repeat the test on each level a couple of times
+to make sure that any random factors are avoided).
+
+If the "freezer" test fails, there is a task that cannot be frozen (in that case
+it usually is possible to identify the offending task by analysing the output of
+dmesg obtained after the failing test). Failure at this level usually means
+that there is a problem with the tasks freezer subsystem that should be
+reported.
+
+If the "devices" test fails, most likely there is a driver that cannot suspend
+or resume its device (in the latter case the system may hang or become unstable
+after the test, so please take that into consideration). To find this driver,
+you can carry out a binary search according to the rules:
+- if the test fails, unload a half of the drivers currently loaded and repeat
+(that would probably involve rebooting the system, so always note what drivers
+have been loaded before the test),
+- if the test succeeds, load a half of the drivers you have unloaded most
+recently and repeat.
+
+Once you have found the failing driver (there can be more than just one of
+them), you have to unload it every time before hibernation. In that case please
+make sure to report the problem with the driver.
+
+It is also possible that the "devices" test will still fail after you have
+unloaded all modules. In that case, you may want to look in your kernel
+configuration for the drivers that can be compiled as modules (and test again
+with these drivers compiled as modules). You may also try to use some special
+kernel command line options such as "noapic", "noacpi" or even "acpi=off".
+
+If the "platform" test fails, there is a problem with the handling of the
+platform (eg. ACPI) firmware on your system. In that case the "platform" mode
+of hibernation is not likely to work. You can try the "shutdown" mode, but that
+is rather a poor man's workaround.
+
+If the "processors" test fails, the disabling/enabling of nonboot CPUs does not
+work (of course, this only may be an issue on SMP systems) and the problem
+should be reported. In that case you can also try to switch the nonboot CPUs
+off and on using the /sys/devices/system/cpu/cpu*/online sysfs attributes and
+see if that works.
+
+If the "core" test fails, which means that suspending of the system/platform
+devices has failed (these devices are suspended on one CPU with interrupts off),
+the problem is most probably hardware-related and serious, so it should be
+reported.
+
+A failure of any of the "platform", "processors" or "core" tests may cause your
+system to hang or become unstable, so please beware. Such a failure usually
+indicates a serious problem that very well may be related to the hardware, but
+please report it anyway.
+
+b) Testing minimal configuration
+
+If all of the hibernation test modes work, you can boot the system with the
+"init=/bin/bash" command line parameter and attempt to hibernate in the
+"reboot", "shutdown" and "platform" modes. If that does not work, there
+probably is a problem with a driver statically compiled into the kernel and you
+can try to compile more drivers as modules, so that they can be tested
+individually. Otherwise, there is a problem with a modular driver and you can
+find it by loading a half of the modules you normally use and binary searching
+in accordance with the algorithm:
+- if there are n modules loaded and the attempt to suspend and resume fails,
+unload n/2 of the modules and try again (that would probably involve rebooting
+the system),
+- if there are n modules loaded and the attempt to suspend and resume succeeds,
+load n/2 modules more and try again.
+
+Again, if you find the offending module(s), it(they) must be unloaded every time
+before hibernation, and please report the problem with it(them).
+
+c) Advanced debugging
+
+In case that hibernation does not work on your system even in the minimal
+configuration and compiling more drivers as modules is not practical or some
+modules cannot be unloaded, you can use one of the more advanced debugging
+techniques to find the problem. First, if there is a serial port in your box,
+you can boot the kernel with the 'no_console_suspend' parameter and try to log
+kernel messages using the serial console. This may provide you with some
+information about the reasons of the suspend (resume) failure. Alternatively,
+it may be possible to use a FireWire port for debugging with firescope
+(ftp://ftp.firstfloor.org/pub/ak/firescope/). On x86 it is also possible to
+use the PM_TRACE mechanism documented in Documentation/s2ram.txt .
+
+2. Testing suspend to RAM (STR)
+
+To verify that the STR works, it is generally more convenient to use the s2ram
+tool available from http://suspend.sf.net and documented at
+http://en.opensuse.org/s2ram . However, before doing that it is recommended to
+carry out STR testing using the facility described in section 1.
+
+Namely, after writing "freezer", "devices", "platform", "processors", or "core"
+into /sys/power/pm_test (available if the kernel is compiled with
+CONFIG_PM_DEBUG set) the suspend code will work in the test mode corresponding
+to given string. The STR test modes are defined in the same way as for
+hibernation, so please refer to Section 1 for more information about them. In
+particular, the "core" test allows you to test everything except for the actual
+invocation of the platform firmware in order to put the system into the sleep
+state.
+
+Among other things, the testing with the help of /sys/power/pm_test may allow
+you to identify drivers that fail to suspend or resume their devices. They
+should be unloaded every time before an STR transition.
+
+Next, you can follow the instructions at http://en.opensuse.org/s2ram to test
+the system, but if it does not work "out of the box", you may need to boot it
+with "init=/bin/bash" and test s2ram in the minimal configuration. In that
+case, you may be able to search for failing drivers by following the procedure
+analogous to the one described in section 1. If you find some failing drivers,
+you will have to unload them every time before an STR transition (ie. before
+you run s2ram), and please report the problems with them.
diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt
new file mode 100644
index 0000000..421e7d0
--- /dev/null
+++ b/Documentation/power/devices.txt
@@ -0,0 +1,512 @@
+Most of the code in Linux is device drivers, so most of the Linux power
+management code is also driver-specific. Most drivers will do very little;
+others, especially for platforms with small batteries (like cell phones),
+will do a lot.
+
+This writeup gives an overview of how drivers interact with system-wide
+power management goals, emphasizing the models and interfaces that are
+shared by everything that hooks up to the driver model core. Read it as
+background for the domain-specific work you'd do with any specific driver.
+
+
+Two Models for Device Power Management
+======================================
+Drivers will use one or both of these models to put devices into low-power
+states:
+
+ System Sleep model:
+ Drivers can enter low power states as part of entering system-wide
+ low-power states like "suspend-to-ram", or (mostly for systems with
+ disks) "hibernate" (suspend-to-disk).
+
+ This is something that device, bus, and class drivers collaborate on
+ by implementing various role-specific suspend and resume methods to
+ cleanly power down hardware and software subsystems, then reactivate
+ them without loss of data.
+
+ Some drivers can manage hardware wakeup events, which make the system
+ leave that low-power state. This feature may be disabled using the
+ relevant /sys/devices/.../power/wakeup file; enabling it may cost some
+ power usage, but let the whole system enter low power states more often.
+
+ Runtime Power Management model:
+ Drivers may also enter low power states while the system is running,
+ independently of other power management activity. Upstream drivers
+ will normally not know (or care) if the device is in some low power
+ state when issuing requests; the driver will auto-resume anything
+ that's needed when it gets a request.
+
+ This doesn't have, or need much infrastructure; it's just something you
+ should do when writing your drivers. For example, clk_disable() unused
+ clocks as part of minimizing power drain for currently-unused hardware.
+ Of course, sometimes clusters of drivers will collaborate with each
+ other, which could involve task-specific power management.
+
+There's not a lot to be said about those low power states except that they
+are very system-specific, and often device-specific. Also, that if enough
+drivers put themselves into low power states (at "runtime"), the effect may be
+the same as entering some system-wide low-power state (system sleep) ... and
+that synergies exist, so that several drivers using runtime pm might put the
+system into a state where even deeper power saving options are available.
+
+Most suspended devices will have quiesced all I/O: no more DMA or irqs, no
+more data read or written, and requests from upstream drivers are no longer
+accepted. A given bus or platform may have different requirements though.
+
+Examples of hardware wakeup events include an alarm from a real time clock,
+network wake-on-LAN packets, keyboard or mouse activity, and media insertion
+or removal (for PCMCIA, MMC/SD, USB, and so on).
+
+
+Interfaces for Entering System Sleep States
+===========================================
+Most of the programming interfaces a device driver needs to know about
+relate to that first model: entering a system-wide low power state,
+rather than just minimizing power consumption by one device.
+
+
+Bus Driver Methods
+------------------
+The core methods to suspend and resume devices reside in struct bus_type.
+These are mostly of interest to people writing infrastructure for busses
+like PCI or USB, or because they define the primitives that device drivers
+may need to apply in domain-specific ways to their devices:
+
+struct bus_type {
+ ...
+ int (*suspend)(struct device *dev, pm_message_t state);
+ int (*suspend_late)(struct device *dev, pm_message_t state);
+
+ int (*resume_early)(struct device *dev);
+ int (*resume)(struct device *dev);
+};
+
+Bus drivers implement those methods as appropriate for the hardware and
+the drivers using it; PCI works differently from USB, and so on. Not many
+people write bus drivers; most driver code is a "device driver" that
+builds on top of bus-specific framework code.
+
+For more information on these driver calls, see the description later;
+they are called in phases for every device, respecting the parent-child
+sequencing in the driver model tree. Note that as this is being written,
+only the suspend() and resume() are widely available; not many bus drivers
+leverage all of those phases, or pass them down to lower driver levels.
+
+
+/sys/devices/.../power/wakeup files
+-----------------------------------
+All devices in the driver model have two flags to control handling of
+wakeup events, which are hardware signals that can force the device and/or
+system out of a low power state. These are initialized by bus or device
+driver code using device_init_wakeup(dev,can_wakeup).
+
+The "can_wakeup" flag just records whether the device (and its driver) can
+physically support wakeup events. When that flag is clear, the sysfs
+"wakeup" file is empty, and device_may_wakeup() returns false.
+
+For devices that can issue wakeup events, a separate flag controls whether
+that device should try to use its wakeup mechanism. The initial value of
+device_may_wakeup() will be true, so that the device's "wakeup" file holds
+the value "enabled". Userspace can change that to "disabled" so that
+device_may_wakeup() returns false; or change it back to "enabled" (so that
+it returns true again).
+
+
+EXAMPLE: PCI Device Driver Methods
+-----------------------------------
+PCI framework software calls these methods when the PCI device driver bound
+to a device device has provided them:
+
+struct pci_driver {
+ ...
+ int (*suspend)(struct pci_device *pdev, pm_message_t state);
+ int (*suspend_late)(struct pci_device *pdev, pm_message_t state);
+
+ int (*resume_early)(struct pci_device *pdev);
+ int (*resume)(struct pci_device *pdev);
+};
+
+Drivers will implement those methods, and call PCI-specific procedures
+like pci_set_power_state(), pci_enable_wake(), pci_save_state(), and
+pci_restore_state() to manage PCI-specific mechanisms. (PCI config space
+could be saved during driver probe, if it weren't for the fact that some
+systems rely on userspace tweaking using setpci.) Devices are suspended
+before their bridges enter low power states, and likewise bridges resume
+before their devices.
+
+
+Upper Layers of Driver Stacks
+-----------------------------
+Device drivers generally have at least two interfaces, and the methods
+sketched above are the ones which apply to the lower level (nearer PCI, USB,
+or other bus hardware). The network and block layers are examples of upper
+level interfaces, as is a character device talking to userspace.
+
+Power management requests normally need to flow through those upper levels,
+which often use domain-oriented requests like "blank that screen". In
+some cases those upper levels will have power management intelligence that
+relates to end-user activity, or other devices that work in cooperation.
+
+When those interfaces are structured using class interfaces, there is a
+standard way to have the upper layer stop issuing requests to a given
+class device (and restart later):
+
+struct class {
+ ...
+ int (*suspend)(struct device *dev, pm_message_t state);
+ int (*resume)(struct device *dev);
+};
+
+Those calls are issued in specific phases of the process by which the
+system enters a low power "suspend" state, or resumes from it.
+
+
+Calling Drivers to Enter System Sleep States
+============================================
+When the system enters a low power state, each device's driver is asked
+to suspend the device by putting it into state compatible with the target
+system state. That's usually some version of "off", but the details are
+system-specific. Also, wakeup-enabled devices will usually stay partly
+functional in order to wake the system.
+
+When the system leaves that low power state, the device's driver is asked
+to resume it. The suspend and resume operations always go together, and
+both are multi-phase operations.
+
+For simple drivers, suspend might quiesce the device using the class code
+and then turn its hardware as "off" as possible with late_suspend. The
+matching resume calls would then completely reinitialize the hardware
+before reactivating its class I/O queues.
+
+More power-aware drivers drivers will use more than one device low power
+state, either at runtime or during system sleep states, and might trigger
+system wakeup events.
+
+
+Call Sequence Guarantees
+------------------------
+To ensure that bridges and similar links needed to talk to a device are
+available when the device is suspended or resumed, the device tree is
+walked in a bottom-up order to suspend devices. A top-down order is
+used to resume those devices.
+
+The ordering of the device tree is defined by the order in which devices
+get registered: a child can never be registered, probed or resumed before
+its parent; and can't be removed or suspended after that parent.
+
+The policy is that the device tree should match hardware bus topology.
+(Or at least the control bus, for devices which use multiple busses.)
+In particular, this means that a device registration may fail if the parent of
+the device is suspending (ie. has been chosen by the PM core as the next
+device to suspend) or has already suspended, as well as after all of the other
+devices have been suspended. Device drivers must be prepared to cope with such
+situations.
+
+
+Suspending Devices
+------------------
+Suspending a given device is done in several phases. Suspending the
+system always includes every phase, executing calls for every device
+before the next phase begins. Not all busses or classes support all
+these callbacks; and not all drivers use all the callbacks.
+
+The phases are seen by driver notifications issued in this order:
+
+ 1 class.suspend(dev, message) is called after tasks are frozen, for
+ devices associated with a class that has such a method. This
+ method may sleep.
+
+ Since I/O activity usually comes from such higher layers, this is
+ a good place to quiesce all drivers of a given type (and keep such
+ code out of those drivers).
+
+ 2 bus.suspend(dev, message) is called next. This method may sleep,
+ and is often morphed into a device driver call with bus-specific
+ parameters and/or rules.
+
+ This call should handle parts of device suspend logic that require
+ sleeping. It probably does work to quiesce the device which hasn't
+ been abstracted into class.suspend() or bus.suspend_late().
+
+ 3 bus.suspend_late(dev, message) is called with IRQs disabled, and
+ with only one CPU active. Until the bus.resume_early() phase
+ completes (see later), IRQs are not enabled again. This method
+ won't be exposed by all busses; for message based busses like USB,
+ I2C, or SPI, device interactions normally require IRQs. This bus
+ call may be morphed into a driver call with bus-specific parameters.
+
+ This call might save low level hardware state that might otherwise
+ be lost in the upcoming low power state, and actually put the
+ device into a low power state ... so that in some cases the device
+ may stay partly usable until this late. This "late" call may also
+ help when coping with hardware that behaves badly.
+
+The pm_message_t parameter is currently used to refine those semantics
+(described later).
+
+At the end of those phases, drivers should normally have stopped all I/O
+transactions (DMA, IRQs), saved enough state that they can re-initialize
+or restore previous state (as needed by the hardware), and placed the
+device into a low-power state. On many platforms they will also use
+clk_disable() to gate off one or more clock sources; sometimes they will
+also switch off power supplies, or reduce voltages. Drivers which have
+runtime PM support may already have performed some or all of the steps
+needed to prepare for the upcoming system sleep state.
+
+When any driver sees that its device_can_wakeup(dev), it should make sure
+to use the relevant hardware signals to trigger a system wakeup event.
+For example, enable_irq_wake() might identify GPIO signals hooked up to
+a switch or other external hardware, and pci_enable_wake() does something
+similar for PCI's PME# signal.
+
+If a driver (or bus, or class) fails it suspend method, the system won't
+enter the desired low power state; it will resume all the devices it's
+suspended so far.
+
+Note that drivers may need to perform different actions based on the target
+system lowpower/sleep state. At this writing, there are only platform
+specific APIs through which drivers could determine those target states.
+
+
+Device Low Power (suspend) States
+---------------------------------
+Device low-power states aren't very standard. One device might only handle
+"on" and "off, while another might support a dozen different versions of
+"on" (how many engines are active?), plus a state that gets back to "on"
+faster than from a full "off".
+
+Some busses define rules about what different suspend states mean. PCI
+gives one example: after the suspend sequence completes, a non-legacy
+PCI device may not perform DMA or issue IRQs, and any wakeup events it
+issues would be issued through the PME# bus signal. Plus, there are
+several PCI-standard device states, some of which are optional.
+
+In contrast, integrated system-on-chip processors often use irqs as the
+wakeup event sources (so drivers would call enable_irq_wake) and might
+be able to treat DMA completion as a wakeup event (sometimes DMA can stay
+active too, it'd only be the CPU and some peripherals that sleep).
+
+Some details here may be platform-specific. Systems may have devices that
+can be fully active in certain sleep states, such as an LCD display that's
+refreshed using DMA while most of the system is sleeping lightly ... and
+its frame buffer might even be updated by a DSP or other non-Linux CPU while
+the Linux control processor stays idle.
+
+Moreover, the specific actions taken may depend on the target system state.
+One target system state might allow a given device to be very operational;
+another might require a hard shut down with re-initialization on resume.
+And two different target systems might use the same device in different
+ways; the aforementioned LCD might be active in one product's "standby",
+but a different product using the same SOC might work differently.
+
+
+Meaning of pm_message_t.event
+-----------------------------
+Parameters to suspend calls include the device affected and a message of
+type pm_message_t, which has one field: the event. If driver does not
+recognize the event code, suspend calls may abort the request and return
+a negative errno. However, most drivers will be fine if they implement
+PM_EVENT_SUSPEND semantics for all messages.
+
+The event codes are used to refine the goal of suspending the device, and
+mostly matter when creating or resuming system memory image snapshots, as
+used with suspend-to-disk:
+
+ PM_EVENT_SUSPEND -- quiesce the driver and put hardware into a low-power
+ state. When used with system sleep states like "suspend-to-RAM" or
+ "standby", the upcoming resume() call will often be able to rely on
+ state kept in hardware, or issue system wakeup events.
+
+ PM_EVENT_HIBERNATE -- Put hardware into a low-power state and enable wakeup
+ events as appropriate. It is only used with hibernation
+ (suspend-to-disk) and few devices are able to wake up the system from
+ this state; most are completely powered off.
+
+ PM_EVENT_FREEZE -- quiesce the driver, but don't necessarily change into
+ any low power mode. A system snapshot is about to be taken, often
+ followed by a call to the driver's resume() method. Neither wakeup
+ events nor DMA are allowed.
+
+ PM_EVENT_PRETHAW -- quiesce the driver, knowing that the upcoming resume()
+ will restore a suspend-to-disk snapshot from a different kernel image.
+ Drivers that are smart enough to look at their hardware state during
+ resume() processing need that state to be correct ... a PRETHAW could
+ be used to invalidate that state (by resetting the device), like a
+ shutdown() invocation would before a kexec() or system halt. Other
+ drivers might handle this the same way as PM_EVENT_FREEZE. Neither
+ wakeup events nor DMA are allowed.
+
+To enter "standby" (ACPI S1) or "Suspend to RAM" (STR, ACPI S3) states, or
+the similarly named APM states, only PM_EVENT_SUSPEND is used; the other event
+codes are used for hibernation ("Suspend to Disk", STD, ACPI S4).
+
+There's also PM_EVENT_ON, a value which never appears as a suspend event
+but is sometimes used to record the "not suspended" device state.
+
+
+Resuming Devices
+----------------
+Resuming is done in multiple phases, much like suspending, with all
+devices processing each phase's calls before the next phase begins.
+
+The phases are seen by driver notifications issued in this order:
+
+ 1 bus.resume_early(dev) is called with IRQs disabled, and with
+ only one CPU active. As with bus.suspend_late(), this method
+ won't be supported on busses that require IRQs in order to
+ interact with devices.
+
+ This reverses the effects of bus.suspend_late().
+
+ 2 bus.resume(dev) is called next. This may be morphed into a device
+ driver call with bus-specific parameters; implementations may sleep.
+
+ This reverses the effects of bus.suspend().
+
+ 3 class.resume(dev) is called for devices associated with a class
+ that has such a method. Implementations may sleep.
+
+ This reverses the effects of class.suspend(), and would usually
+ reactivate the device's I/O queue.
+
+At the end of those phases, drivers should normally be as functional as
+they were before suspending: I/O can be performed using DMA and IRQs, and
+the relevant clocks are gated on. The device need not be "fully on"; it
+might be in a runtime lowpower/suspend state that acts as if it were.
+
+However, the details here may again be platform-specific. For example,
+some systems support multiple "run" states, and the mode in effect at
+the end of resume() might not be the one which preceded suspension.
+That means availability of certain clocks or power supplies changed,
+which could easily affect how a driver works.
+
+
+Drivers need to be able to handle hardware which has been reset since the
+suspend methods were called, for example by complete reinitialization.
+This may be the hardest part, and the one most protected by NDA'd documents
+and chip errata. It's simplest if the hardware state hasn't changed since
+the suspend() was called, but that can't always be guaranteed.
+
+Drivers must also be prepared to notice that the device has been removed
+while the system was powered off, whenever that's physically possible.
+PCMCIA, MMC, USB, Firewire, SCSI, and even IDE are common examples of busses
+where common Linux platforms will see such removal. Details of how drivers
+will notice and handle such removals are currently bus-specific, and often
+involve a separate thread.
+
+
+Note that the bus-specific runtime PM wakeup mechanism can exist, and might
+be defined to share some of the same driver code as for system wakeup. For
+example, a bus-specific device driver's resume() method might be used there,
+so it wouldn't only be called from bus.resume() during system-wide wakeup.
+See bus-specific information about how runtime wakeup events are handled.
+
+
+System Devices
+--------------
+System devices follow a slightly different API, which can be found in
+
+ include/linux/sysdev.h
+ drivers/base/sys.c
+
+System devices will only be suspended with interrupts disabled, and after
+all other devices have been suspended. On resume, they will be resumed
+before any other devices, and also with interrupts disabled.
+
+That is, IRQs are disabled, the suspend_late() phase begins, then the
+sysdev_driver.suspend() phase, and the system enters a sleep state. Then
+the sysdev_driver.resume() phase begins, followed by the resume_early()
+phase, after which IRQs are enabled.
+
+Code to actually enter and exit the system-wide low power state sometimes
+involves hardware details that are only known to the boot firmware, and
+may leave a CPU running software (from SRAM or flash memory) that monitors
+the system and manages its wakeup sequence.
+
+
+Runtime Power Management
+========================
+Many devices are able to dynamically power down while the system is still
+running. This feature is useful for devices that are not being used, and
+can offer significant power savings on a running system. These devices
+often support a range of runtime power states, which might use names such
+as "off", "sleep", "idle", "active", and so on. Those states will in some
+cases (like PCI) be partially constrained by a bus the device uses, and will
+usually include hardware states that are also used in system sleep states.
+
+However, note that if a driver puts a device into a runtime low power state
+and the system then goes into a system-wide sleep state, it normally ought
+to resume into that runtime low power state rather than "full on". Such
+distinctions would be part of the driver-internal state machine for that
+hardware; the whole point of runtime power management is to be sure that
+drivers are decoupled in that way from the state machine governing phases
+of the system-wide power/sleep state transitions.
+
+
+Power Saving Techniques
+-----------------------
+Normally runtime power management is handled by the drivers without specific
+userspace or kernel intervention, by device-aware use of techniques like:
+
+ Using information provided by other system layers
+ - stay deeply "off" except between open() and close()
+ - if transceiver/PHY indicates "nobody connected", stay "off"
+ - application protocols may include power commands or hints
+
+ Using fewer CPU cycles
+ - using DMA instead of PIO
+ - removing timers, or making them lower frequency
+ - shortening "hot" code paths
+ - eliminating cache misses
+ - (sometimes) offloading work to device firmware
+
+ Reducing other resource costs
+ - gating off unused clocks in software (or hardware)
+ - switching off unused power supplies
+ - eliminating (or delaying/merging) IRQs
+ - tuning DMA to use word and/or burst modes
+
+ Using device-specific low power states
+ - using lower voltages
+ - avoiding needless DMA transfers
+
+Read your hardware documentation carefully to see the opportunities that
+may be available. If you can, measure the actual power usage and check
+it against the budget established for your project.
+
+
+Examples: USB hosts, system timer, system CPU
+----------------------------------------------
+USB host controllers make interesting, if complex, examples. In many cases
+these have no work to do: no USB devices are connected, or all of them are
+in the USB "suspend" state. Linux host controller drivers can then disable
+periodic DMA transfers that would otherwise be a constant power drain on the
+memory subsystem, and enter a suspend state. In power-aware controllers,
+entering that suspend state may disable the clock used with USB signaling,
+saving a certain amount of power.
+
+The controller will be woken from that state (with an IRQ) by changes to the
+signal state on the data lines of a given port, for example by an existing
+peripheral requesting "remote wakeup" or by plugging a new peripheral. The
+same wakeup mechanism usually works from "standby" sleep states, and on some
+systems also from "suspend to RAM" (or even "suspend to disk") states.
+(Except that ACPI may be involved instead of normal IRQs, on some hardware.)
+
+System devices like timers and CPUs may have special roles in the platform
+power management scheme. For example, system timers using a "dynamic tick"
+approach don't just save CPU cycles (by eliminating needless timer IRQs),
+but they may also open the door to using lower power CPU "idle" states that
+cost more than a jiffie to enter and exit. On x86 systems these are states
+like "C3"; note that periodic DMA transfers from a USB host controller will
+also prevent entry to a C3 state, much like a periodic timer IRQ.
+
+That kind of runtime mechanism interaction is common. "System On Chip" (SOC)
+processors often have low power idle modes that can't be entered unless
+certain medium-speed clocks (often 12 or 48 MHz) are gated off. When the
+drivers gate those clocks effectively, then the system idle task may be able
+to use the lower power idle modes and thereby increase battery life.
+
+If the CPU can have a "cpufreq" driver, there also may be opportunities
+to shift to lower voltage settings and reduce the power cost of executing
+a given number of instructions. (Without voltage adjustment, it's rare
+for cpufreq to save much power; the cost-per-instruction must go down.)
diff --git a/Documentation/power/drivers-testing.txt b/Documentation/power/drivers-testing.txt
new file mode 100644
index 0000000..7f7a737
--- /dev/null
+++ b/Documentation/power/drivers-testing.txt
@@ -0,0 +1,46 @@
+Testing suspend and resume support in device drivers
+ (C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
+
+1. Preparing the test system
+
+Unfortunately, to effectively test the support for the system-wide suspend and
+resume transitions in a driver, it is necessary to suspend and resume a fully
+functional system with this driver loaded. Moreover, that should be done
+several times, preferably several times in a row, and separately for hibernation
+(aka suspend to disk or STD) and suspend to RAM (STR), because each of these
+cases involves slightly different operations and different interactions with
+the machine's BIOS.
+
+Of course, for this purpose the test system has to be known to suspend and
+resume without the driver being tested. Thus, if possible, you should first
+resolve all suspend/resume-related problems in the test system before you start
+testing the new driver. Please see Documentation/power/basic-pm-debugging.txt
+for more information about the debugging of suspend/resume functionality.
+
+2. Testing the driver
+
+Once you have resolved the suspend/resume-related problems with your test system
+without the new driver, you are ready to test it:
+
+a) Build the driver as a module, load it and try the test modes of hibernation
+ (see: Documents/power/basic-pm-debugging.txt, 1).
+
+b) Load the driver and attempt to hibernate in the "reboot", "shutdown" and
+ "platform" modes (see: Documents/power/basic-pm-debugging.txt, 1).
+
+c) Compile the driver directly into the kernel and try the test modes of
+ hibernation.
+
+d) Attempt to hibernate with the driver compiled directly into the kernel
+ in the "reboot", "shutdown" and "platform" modes.
+
+e) Try the test modes of suspend (see: Documents/power/basic-pm-debugging.txt,
+ 2). [As far as the STR tests are concerned, it should not matter whether or
+ not the driver is built as a module.]
+
+f) Attempt to suspend to RAM using the s2ram tool with the driver loaded
+ (see: Documents/power/basic-pm-debugging.txt, 2).
+
+Each of the above tests should be repeated several times and the STD tests
+should be mixed with the STR tests. If any of them fails, the driver cannot be
+regarded as suspend/resume-safe.
diff --git a/Documentation/power/freezing-of-tasks.txt b/Documentation/power/freezing-of-tasks.txt
new file mode 100644
index 0000000..38b5724
--- /dev/null
+++ b/Documentation/power/freezing-of-tasks.txt
@@ -0,0 +1,178 @@
+Freezing of tasks
+ (C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
+
+I. What is the freezing of tasks?
+
+The freezing of tasks is a mechanism by which user space processes and some
+kernel threads are controlled during hibernation or system-wide suspend (on some
+architectures).
+
+II. How does it work?
+
+There are four per-task flags used for that, PF_NOFREEZE, PF_FROZEN, TIF_FREEZE
+and PF_FREEZER_SKIP (the last one is auxiliary). The tasks that have
+PF_NOFREEZE unset (all user space processes and some kernel threads) are
+regarded as 'freezable' and treated in a special way before the system enters a
+suspend state as well as before a hibernation image is created (in what follows
+we only consider hibernation, but the description also applies to suspend).
+
+Namely, as the first step of the hibernation procedure the function
+freeze_processes() (defined in kernel/power/process.c) is called. It executes
+try_to_freeze_tasks() that sets TIF_FREEZE for all of the freezable tasks and
+either wakes them up, if they are kernel threads, or sends fake signals to them,
+if they are user space processes. A task that has TIF_FREEZE set, should react
+to it by calling the function called refrigerator() (defined in
+kernel/power/process.c), which sets the task's PF_FROZEN flag, changes its state
+to TASK_UNINTERRUPTIBLE and makes it loop until PF_FROZEN is cleared for it.
+Then, we say that the task is 'frozen' and therefore the set of functions
+handling this mechanism is referred to as 'the freezer' (these functions are
+defined in kernel/power/process.c and include/linux/freezer.h). User space
+processes are generally frozen before kernel threads.
+
+It is not recommended to call refrigerator() directly. Instead, it is
+recommended to use the try_to_freeze() function (defined in
+include/linux/freezer.h), that checks the task's TIF_FREEZE flag and makes the
+task enter refrigerator() if the flag is set.
+
+For user space processes try_to_freeze() is called automatically from the
+signal-handling code, but the freezable kernel threads need to call it
+explicitly in suitable places or use the wait_event_freezable() or
+wait_event_freezable_timeout() macros (defined in include/linux/freezer.h)
+that combine interruptible sleep with checking if TIF_FREEZE is set and calling
+try_to_freeze(). The main loop of a freezable kernel thread may look like the
+following one:
+
+ set_freezable();
+ do {
+ hub_events();
+ wait_event_freezable(khubd_wait,
+ !list_empty(&hub_event_list) ||
+ kthread_should_stop());
+ } while (!kthread_should_stop() || !list_empty(&hub_event_list));
+
+(from drivers/usb/core/hub.c::hub_thread()).
+
+If a freezable kernel thread fails to call try_to_freeze() after the freezer has
+set TIF_FREEZE for it, the freezing of tasks will fail and the entire
+hibernation operation will be cancelled. For this reason, freezable kernel
+threads must call try_to_freeze() somewhere or use one of the
+wait_event_freezable() and wait_event_freezable_timeout() macros.
+
+After the system memory state has been restored from a hibernation image and
+devices have been reinitialized, the function thaw_processes() is called in
+order to clear the PF_FROZEN flag for each frozen task. Then, the tasks that
+have been frozen leave refrigerator() and continue running.
+
+III. Which kernel threads are freezable?
+
+Kernel threads are not freezable by default. However, a kernel thread may clear
+PF_NOFREEZE for itself by calling set_freezable() (the resetting of PF_NOFREEZE
+directly is strongly discouraged). From this point it is regarded as freezable
+and must call try_to_freeze() in a suitable place.
+
+IV. Why do we do that?
+
+Generally speaking, there is a couple of reasons to use the freezing of tasks:
+
+1. The principal reason is to prevent filesystems from being damaged after
+hibernation. At the moment we have no simple means of checkpointing
+filesystems, so if there are any modifications made to filesystem data and/or
+metadata on disks, we cannot bring them back to the state from before the
+modifications. At the same time each hibernation image contains some
+filesystem-related information that must be consistent with the state of the
+on-disk data and metadata after the system memory state has been restored from
+the image (otherwise the filesystems will be damaged in a nasty way, usually
+making them almost impossible to repair). We therefore freeze tasks that might
+cause the on-disk filesystems' data and metadata to be modified after the
+hibernation image has been created and before the system is finally powered off.
+The majority of these are user space processes, but if any of the kernel threads
+may cause something like this to happen, they have to be freezable.
+
+2. Next, to create the hibernation image we need to free a sufficient amount of
+memory (approximately 50% of available RAM) and we need to do that before
+devices are deactivated, because we generally need them for swapping out. Then,
+after the memory for the image has been freed, we don't want tasks to allocate
+additional memory and we prevent them from doing that by freezing them earlier.
+[Of course, this also means that device drivers should not allocate substantial
+amounts of memory from their .suspend() callbacks before hibernation, but this
+is e separate issue.]
+
+3. The third reason is to prevent user space processes and some kernel threads
+from interfering with the suspending and resuming of devices. A user space
+process running on a second CPU while we are suspending devices may, for
+example, be troublesome and without the freezing of tasks we would need some
+safeguards against race conditions that might occur in such a case.
+
+Although Linus Torvalds doesn't like the freezing of tasks, he said this in one
+of the discussions on LKML (http://lkml.org/lkml/2007/4/27/608):
+
+"RJW:> Why we freeze tasks at all or why we freeze kernel threads?
+
+Linus: In many ways, 'at all'.
+
+I _do_ realize the IO request queue issues, and that we cannot actually do
+s2ram with some devices in the middle of a DMA. So we want to be able to
+avoid *that*, there's no question about that. And I suspect that stopping
+user threads and then waiting for a sync is practically one of the easier
+ways to do so.
+
+So in practice, the 'at all' may become a 'why freeze kernel threads?' and
+freezing user threads I don't find really objectionable."
+
+Still, there are kernel threads that may want to be freezable. For example, if
+a kernel that belongs to a device driver accesses the device directly, it in
+principle needs to know when the device is suspended, so that it doesn't try to
+access it at that time. However, if the kernel thread is freezable, it will be
+frozen before the driver's .suspend() callback is executed and it will be
+thawed after the driver's .resume() callback has run, so it won't be accessing
+the device while it's suspended.
+
+4. Another reason for freezing tasks is to prevent user space processes from
+realizing that hibernation (or suspend) operation takes place. Ideally, user
+space processes should not notice that such a system-wide operation has occurred
+and should continue running without any problems after the restore (or resume
+from suspend). Unfortunately, in the most general case this is quite difficult
+to achieve without the freezing of tasks. Consider, for example, a process
+that depends on all CPUs being online while it's running. Since we need to
+disable nonboot CPUs during the hibernation, if this process is not frozen, it
+may notice that the number of CPUs has changed and may start to work incorrectly
+because of that.
+
+V. Are there any problems related to the freezing of tasks?
+
+Yes, there are.
+
+First of all, the freezing of kernel threads may be tricky if they depend one
+on another. For example, if kernel thread A waits for a completion (in the
+TASK_UNINTERRUPTIBLE state) that needs to be done by freezable kernel thread B
+and B is frozen in the meantime, then A will be blocked until B is thawed, which
+may be undesirable. That's why kernel threads are not freezable by default.
+
+Second, there are the following two problems related to the freezing of user
+space processes:
+1. Putting processes into an uninterruptible sleep distorts the load average.
+2. Now that we have FUSE, plus the framework for doing device drivers in
+userspace, it gets even more complicated because some userspace processes are
+now doing the sorts of things that kernel threads do
+(https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012309.html).
+
+The problem 1. seems to be fixable, although it hasn't been fixed so far. The
+other one is more serious, but it seems that we can work around it by using
+hibernation (and suspend) notifiers (in that case, though, we won't be able to
+avoid the realization by the user space processes that the hibernation is taking
+place).
+
+There are also problems that the freezing of tasks tends to expose, although
+they are not directly related to it. For example, if request_firmware() is
+called from a device driver's .resume() routine, it will timeout and eventually
+fail, because the user land process that should respond to the request is frozen
+at this point. So, seemingly, the failure is due to the freezing of tasks.
+Suppose, however, that the firmware file is located on a filesystem accessible
+only through another device that hasn't been resumed yet. In that case,
+request_firmware() will fail regardless of whether or not the freezing of tasks
+is used. Consequently, the problem is not really related to the freezing of
+tasks, since it generally exists anyway.
+
+A driver must have all firmwares it may need in RAM before suspend() is called.
+If keeping them is not practical, for example due to their size, they must be
+requested early enough using the suspend notifier API described in notifiers.txt.
diff --git a/Documentation/power/interface.txt b/Documentation/power/interface.txt
new file mode 100644
index 0000000..e67211f
--- /dev/null
+++ b/Documentation/power/interface.txt
@@ -0,0 +1,75 @@
+Power Management Interface
+
+
+The power management subsystem provides a unified sysfs interface to
+userspace, regardless of what architecture or platform one is
+running. The interface exists in /sys/power/ directory (assuming sysfs
+is mounted at /sys).
+
+/sys/power/state controls system power state. Reading from this file
+returns what states are supported, which is hard-coded to 'standby'
+(Power-On Suspend), 'mem' (Suspend-to-RAM), and 'disk'
+(Suspend-to-Disk).
+
+Writing to this file one of those strings causes the system to
+transition into that state. Please see the file
+Documentation/power/states.txt for a description of each of those
+states.
+
+
+/sys/power/disk controls the operating mode of the suspend-to-disk
+mechanism. Suspend-to-disk can be handled in several ways. We have a
+few options for putting the system to sleep - using the platform driver
+(e.g. ACPI or other suspend_ops), powering off the system or rebooting the
+system (for testing).
+
+Additionally, /sys/power/disk can be used to turn on one of the two testing
+modes of the suspend-to-disk mechanism: 'testproc' or 'test'. If the
+suspend-to-disk mechanism is in the 'testproc' mode, writing 'disk' to
+/sys/power/state will cause the kernel to disable nonboot CPUs and freeze
+tasks, wait for 5 seconds, unfreeze tasks and enable nonboot CPUs. If it is
+in the 'test' mode, writing 'disk' to /sys/power/state will cause the kernel
+to disable nonboot CPUs and freeze tasks, shrink memory, suspend devices, wait
+for 5 seconds, resume devices, unfreeze tasks and enable nonboot CPUs. Then,
+we are able to look in the log messages and work out, for example, which code
+is being slow and which device drivers are misbehaving.
+
+Reading from this file will display all supported modes and the currently
+selected one in brackets, for example
+
+ [shutdown] reboot test testproc
+
+Writing to this file will accept one of
+
+ 'platform' (only if the platform supports it)
+ 'shutdown'
+ 'reboot'
+ 'testproc'
+ 'test'
+
+/sys/power/image_size controls the size of the image created by
+the suspend-to-disk mechanism. It can be written a string
+representing a non-negative integer that will be used as an upper
+limit of the image size, in bytes. The suspend-to-disk mechanism will
+do its best to ensure the image size will not exceed that number. However,
+if this turns out to be impossible, it will try to suspend anyway using the
+smallest image possible. In particular, if "0" is written to this file, the
+suspend image will be as small as possible.
+
+Reading from this file will display the current image size limit, which
+is set to 500 MB by default.
+
+/sys/power/pm_trace controls the code which saves the last PM event point in
+the RTC across reboots, so that you can debug a machine that just hangs
+during suspend (or more commonly, during resume). Namely, the RTC is only
+used to save the last PM event point if this file contains '1'. Initially it
+contains '0' which may be changed to '1' by writing a string representing a
+nonzero integer into it.
+
+To use this debugging feature you should attempt to suspend the machine, then
+reboot it and run
+
+ dmesg -s 1000000 | grep 'hash matches'
+
+CAUTION: Using it will cause your machine's real-time (CMOS) clock to be
+set to a random invalid time after a resume.
diff --git a/Documentation/power/notifiers.txt b/Documentation/power/notifiers.txt
new file mode 100644
index 0000000..ae1b7ec
--- /dev/null
+++ b/Documentation/power/notifiers.txt
@@ -0,0 +1,58 @@
+Suspend notifiers
+ (C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
+
+There are some operations that device drivers may want to carry out in their
+.suspend() routines, but shouldn't, because they can cause the hibernation or
+suspend to fail. For example, a driver may want to allocate a substantial amount
+of memory (like 50 MB) in .suspend(), but that shouldn't be done after the
+swsusp's memory shrinker has run.
+
+Also, there may be some operations, that subsystems want to carry out before a
+hibernation/suspend or after a restore/resume, requiring the system to be fully
+functional, so the drivers' .suspend() and .resume() routines are not suitable
+for this purpose. For example, device drivers may want to upload firmware to
+their devices after a restore from a hibernation image, but they cannot do it by
+calling request_firmware() from their .resume() routines (user land processes
+are frozen at this point). The solution may be to load the firmware into
+memory before processes are frozen and upload it from there in the .resume()
+routine. Of course, a hibernation notifier may be used for this purpose.
+
+The subsystems that have such needs can register suspend notifiers that will be
+called upon the following events by the suspend core:
+
+PM_HIBERNATION_PREPARE The system is going to hibernate or suspend, tasks will
+ be frozen immediately.
+
+PM_POST_HIBERNATION The system memory state has been restored from a
+ hibernation image or an error occured during the
+ hibernation. Device drivers' .resume() callbacks have
+ been executed and tasks have been thawed.
+
+PM_RESTORE_PREPARE The system is going to restore a hibernation image.
+ If all goes well the restored kernel will issue a
+ PM_POST_HIBERNATION notification.
+
+PM_POST_RESTORE An error occurred during the hibernation restore.
+ Device drivers' .resume() callbacks have been executed
+ and tasks have been thawed.
+
+PM_SUSPEND_PREPARE The system is preparing for a suspend.
+
+PM_POST_SUSPEND The system has just resumed or an error occured during
+ the suspend. Device drivers' .resume() callbacks have
+ been executed and tasks have been thawed.
+
+It is generally assumed that whatever the notifiers do for
+PM_HIBERNATION_PREPARE, should be undone for PM_POST_HIBERNATION. Analogously,
+operations performed for PM_SUSPEND_PREPARE should be reversed for
+PM_POST_SUSPEND. Additionally, all of the notifiers are called for
+PM_POST_HIBERNATION if one of them fails for PM_HIBERNATION_PREPARE, and
+all of the notifiers are called for PM_POST_SUSPEND if one of them fails for
+PM_SUSPEND_PREPARE.
+
+The hibernation and suspend notifiers are called with pm_mutex held. They are
+defined in the usual way, but their last argument is meaningless (it is always
+NULL). To register and/or unregister a suspend notifier use the functions
+register_pm_notifier() and unregister_pm_notifier(), respectively, defined in
+include/linux/suspend.h . If you don't need to unregister the notifier, you can
+also use the pm_notifier() macro defined in include/linux/suspend.h .
diff --git a/Documentation/power/pci.txt b/Documentation/power/pci.txt
new file mode 100644
index 0000000..dd8fe43
--- /dev/null
+++ b/Documentation/power/pci.txt
@@ -0,0 +1,299 @@
+
+PCI Power Management
+~~~~~~~~~~~~~~~~~~~~
+
+An overview of the concepts and the related functions in the Linux kernel
+
+Patrick Mochel <mochel@transmeta.com>
+(and others)
+
+---------------------------------------------------------------------------
+
+1. Overview
+2. How the PCI Subsystem Does Power Management
+3. PCI Utility Functions
+4. PCI Device Drivers
+5. Resources
+
+1. Overview
+~~~~~~~~~~~
+
+The PCI Power Management Specification was introduced between the PCI 2.1 and
+PCI 2.2 Specifications. It a standard interface for controlling various
+power management operations.
+
+Implementation of the PCI PM Spec is optional, as are several sub-components of
+it. If a device supports the PCI PM Spec, the device will have an 8 byte
+capability field in its PCI configuration space. This field is used to describe
+and control the standard PCI power management features.
+
+The PCI PM spec defines 4 operating states for devices (D0 - D3) and for buses
+(B0 - B3). The higher the number, the less power the device consumes. However,
+the higher the number, the longer the latency is for the device to return to
+an operational state (D0).
+
+There are actually two D3 states. When someone talks about D3, they usually
+mean D3hot, which corresponds to an ACPI D2 state (power is reduced, the
+device may lose some context). But they may also mean D3cold, which is an
+ACPI D3 state (power is fully off, all state was discarded); or both.
+
+Bus power management is not covered in this version of this document.
+
+Note that all PCI devices support D0 and D3cold by default, regardless of
+whether or not they implement any of the PCI PM spec.
+
+The possible state transitions that a device can undergo are:
+
++---------------------------+
+| Current State | New State |
++---------------------------+
+| D0 | D1, D2, D3|
++---------------------------+
+| D1 | D2, D3 |
++---------------------------+
+| D2 | D3 |
++---------------------------+
+| D1, D2, D3 | D0 |
++---------------------------+
+
+Note that when the system is entering a global suspend state, all devices will
+be placed into D3 and when resuming, all devices will be placed into D0.
+However, when the system is running, other state transitions are possible.
+
+2. How The PCI Subsystem Handles Power Management
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The PCI suspend/resume functionality is accessed indirectly via the Power
+Management subsystem. At boot, the PCI driver registers a power management
+callback with that layer. Upon entering a suspend state, the PM layer iterates
+through all of its registered callbacks. This currently takes place only during
+APM state transitions.
+
+Upon going to sleep, the PCI subsystem walks its device tree twice. Both times,
+it does a depth first walk of the device tree. The first walk saves each of the
+device's state and checks for devices that will prevent the system from entering
+a global power state. The next walk then places the devices in a low power
+state.
+
+The first walk allows a graceful recovery in the event of a failure, since none
+of the devices have actually been powered down.
+
+In both walks, in particular the second, all children of a bridge are touched
+before the actual bridge itself. This allows the bridge to retain power while
+its children are being accessed.
+
+Upon resuming from sleep, just the opposite must be true: all bridges must be
+powered on and restored before their children are powered on. This is easily
+accomplished with a breadth-first walk of the PCI device tree.
+
+
+3. PCI Utility Functions
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+These are helper functions designed to be called by individual device drivers.
+Assuming that a device behaves as advertised, these should be applicable in most
+cases. However, results may vary.
+
+Note that these functions are never implicitly called for the driver. The driver
+is always responsible for deciding when and if to call these.
+
+
+pci_save_state
+--------------
+
+Usage:
+ pci_save_state(struct pci_dev *dev);
+
+Description:
+ Save first 64 bytes of PCI config space, along with any additional
+ PCI-Express or PCI-X information.
+
+
+pci_restore_state
+-----------------
+
+Usage:
+ pci_restore_state(struct pci_dev *dev);
+
+Description:
+ Restore previously saved config space.
+
+
+pci_set_power_state
+-------------------
+
+Usage:
+ pci_set_power_state(struct pci_dev *dev, pci_power_t state);
+
+Description:
+ Transition device to low power state using PCI PM Capabilities
+ registers.
+
+ Will fail under one of the following conditions:
+ - If state is less than current state, but not D0 (illegal transition)
+ - Device doesn't support PM Capabilities
+ - Device does not support requested state
+
+
+pci_enable_wake
+---------------
+
+Usage:
+ pci_enable_wake(struct pci_dev *dev, pci_power_t state, int enable);
+
+Description:
+ Enable device to generate PME# during low power state using PCI PM
+ Capabilities.
+
+ Checks whether if device supports generating PME# from requested state
+ and fail if it does not, unless enable == 0 (request is to disable wake
+ events, which is implicit if it doesn't even support it in the first
+ place).
+
+ Note that the PMC Register in the device's PM Capabilities has a bitmask
+ of the states it supports generating PME# from. D3hot is bit 3 and
+ D3cold is bit 4. So, while a value of 4 as the state may not seem
+ semantically correct, it is.
+
+
+4. PCI Device Drivers
+~~~~~~~~~~~~~~~~~~~~~
+
+These functions are intended for use by individual drivers, and are defined in
+struct pci_driver:
+
+ int (*suspend) (struct pci_dev *dev, pm_message_t state);
+ int (*resume) (struct pci_dev *dev);
+
+
+suspend
+-------
+
+Usage:
+
+if (dev->driver && dev->driver->suspend)
+ dev->driver->suspend(dev,state);
+
+A driver uses this function to actually transition the device into a low power
+state. This should include disabling I/O, IRQs, and bus-mastering, as well as
+physically transitioning the device to a lower power state; it may also include
+calls to pci_enable_wake().
+
+Bus mastering may be disabled by doing:
+
+pci_disable_device(dev);
+
+For devices that support the PCI PM Spec, this may be used to set the device's
+power state to match the suspend() parameter:
+
+pci_set_power_state(dev,state);
+
+The driver is also responsible for disabling any other device-specific features
+(e.g blanking screen, turning off on-card memory, etc).
+
+The driver should be sure to track the current state of the device, as it may
+obviate the need for some operations.
+
+The driver should update the current_state field in its pci_dev structure in
+this function, except for PM-capable devices when pci_set_power_state is used.
+
+resume
+------
+
+Usage:
+
+if (dev->driver && dev->driver->resume)
+ dev->driver->resume(dev)
+
+The resume callback may be called from any power state, and is always meant to
+transition the device to the D0 state.
+
+The driver is responsible for reenabling any features of the device that had
+been disabled during previous suspend calls, such as IRQs and bus mastering,
+as well as calling pci_restore_state().
+
+If the device is currently in D3, it may need to be reinitialized in resume().
+
+ * Some types of devices, like bus controllers, will preserve context in D3hot
+ (using Vcc power). Their drivers will often want to avoid re-initializing
+ them after re-entering D0 (perhaps to avoid resetting downstream devices).
+
+ * Other kinds of devices in D3hot will discard device context as part of a
+ soft reset when re-entering the D0 state.
+
+ * Devices resuming from D3cold always go through a power-on reset. Some
+ device context can also be preserved using Vaux power.
+
+ * Some systems hide D3cold resume paths from drivers. For example, on PCs
+ the resume path for suspend-to-disk often runs BIOS powerup code, which
+ will sometimes re-initialize the device.
+
+To handle resets during D3 to D0 transitions, it may be convenient to share
+device initialization code between probe() and resume(). Device parameters
+can also be saved before the driver suspends into D3, avoiding re-probe.
+
+If the device supports the PCI PM Spec, it can use this to physically transition
+the device to D0:
+
+pci_set_power_state(dev,0);
+
+Note that if the entire system is transitioning out of a global sleep state, all
+devices will be placed in the D0 state, so this is not necessary. However, in
+the event that the device is placed in the D3 state during normal operation,
+this call is necessary. It is impossible to determine which of the two events is
+taking place in the driver, so it is always a good idea to make that call.
+
+The driver should take note of the state that it is resuming from in order to
+ensure correct (and speedy) operation.
+
+The driver should update the current_state field in its pci_dev structure in
+this function, except for PM-capable devices when pci_set_power_state is used.
+
+
+
+A reference implementation
+-------------------------
+.suspend()
+{
+ /* driver specific operations */
+
+ /* Disable IRQ */
+ free_irq();
+ /* If using MSI */
+ pci_disable_msi();
+
+ pci_save_state();
+ pci_enable_wake();
+ /* Disable IO/bus master/irq router */
+ pci_disable_device();
+ pci_set_power_state(pci_choose_state());
+}
+
+.resume()
+{
+ pci_set_power_state(PCI_D0);
+ pci_restore_state();
+ /* device's irq possibly is changed, driver should take care */
+ pci_enable_device();
+ pci_set_master();
+
+ /* if using MSI, device's vector possibly is changed */
+ pci_enable_msi();
+
+ request_irq();
+ /* driver specific operations; */
+}
+
+This is a typical implementation. Drivers can slightly change the order
+of the operations in the implementation, ignore some operations or add
+more driver specific operations in it, but drivers should do something like
+this on the whole.
+
+5. Resources
+~~~~~~~~~~~~
+
+PCI Local Bus Specification
+PCI Bus Power Management Interface Specification
+
+ http://www.pcisig.com
+
diff --git a/Documentation/power/pm_qos_interface.txt b/Documentation/power/pm_qos_interface.txt
new file mode 100644
index 0000000..c40866e
--- /dev/null
+++ b/Documentation/power/pm_qos_interface.txt
@@ -0,0 +1,64 @@
+PM Quality Of Service Interface.
+
+This interface provides a kernel and user mode interface for registering
+performance expectations by drivers, subsystems and user space applications on
+one of the parameters.
+
+Currently we have {cpu_dma_latency, network_latency, network_throughput} as the
+initial set of pm_qos parameters.
+
+Each parameters have defined units:
+ * latency: usec
+ * timeout: usec
+ * throughput: kbs (kilo bit / sec)
+
+The infrastructure exposes multiple misc device nodes one per implemented
+parameter. The set of parameters implement is defined by pm_qos_power_init()
+and pm_qos_params.h. This is done because having the available parameters
+being runtime configurable or changeable from a driver was seen as too easy to
+abuse.
+
+For each parameter a list of performance requirements is maintained along with
+an aggregated target value. The aggregated target value is updated with
+changes to the requirement list or elements of the list. Typically the
+aggregated target value is simply the max or min of the requirement values held
+in the parameter list elements.
+
+From kernel mode the use of this interface is simple:
+pm_qos_add_requirement(param_id, name, target_value):
+Will insert a named element in the list for that identified PM_QOS parameter
+with the target value. Upon change to this list the new target is recomputed
+and any registered notifiers are called only if the target value is now
+different.
+
+pm_qos_update_requirement(param_id, name, new_target_value):
+Will search the list identified by the param_id for the named list element and
+then update its target value, calling the notification tree if the aggregated
+target is changed. with that name is already registered.
+
+pm_qos_remove_requirement(param_id, name):
+Will search the identified list for the named element and remove it, after
+removal it will update the aggregate target and call the notification tree if
+the target was changed as a result of removing the named requirement.
+
+
+From user mode:
+Only processes can register a pm_qos requirement. To provide for automatic
+cleanup for process the interface requires the process to register its
+parameter requirements in the following way:
+
+To register the default pm_qos target for the specific parameter, the process
+must open one of /dev/[cpu_dma_latency, network_latency, network_throughput]
+
+As long as the device node is held open that process has a registered
+requirement on the parameter. The name of the requirement is "process_<PID>"
+derived from the current->pid from within the open system call.
+
+To change the requested target value the process needs to write a s32 value to
+the open device node. This translates to a pm_qos_update_requirement call.
+
+To remove the user mode request for a target value simply close the device
+node.
+
+
+
diff --git a/Documentation/power/power_supply_class.txt b/Documentation/power/power_supply_class.txt
new file mode 100644
index 0000000..c6cd495
--- /dev/null
+++ b/Documentation/power/power_supply_class.txt
@@ -0,0 +1,173 @@
+Linux power supply class
+========================
+
+Synopsis
+~~~~~~~~
+Power supply class used to represent battery, UPS, AC or DC power supply
+properties to user-space.
+
+It defines core set of attributes, which should be applicable to (almost)
+every power supply out there. Attributes are available via sysfs and uevent
+interfaces.
+
+Each attribute has well defined meaning, up to unit of measure used. While
+the attributes provided are believed to be universally applicable to any
+power supply, specific monitoring hardware may not be able to provide them
+all, so any of them may be skipped.
+
+Power supply class is extensible, and allows to define drivers own attributes.
+The core attribute set is subject to the standard Linux evolution (i.e.
+if it will be found that some attribute is applicable to many power supply
+types or their drivers, it can be added to the core set).
+
+It also integrates with LED framework, for the purpose of providing
+typically expected feedback of battery charging/fully charged status and
+AC/USB power supply online status. (Note that specific details of the
+indication (including whether to use it at all) are fully controllable by
+user and/or specific machine defaults, per design principles of LED
+framework).
+
+
+Attributes/properties
+~~~~~~~~~~~~~~~~~~~~~
+Power supply class has predefined set of attributes, this eliminates code
+duplication across drivers. Power supply class insist on reusing its
+predefined attributes *and* their units.
+
+So, userspace gets predictable set of attributes and their units for any
+kind of power supply, and can process/present them to a user in consistent
+manner. Results for different power supplies and machines are also directly
+comparable.
+
+See drivers/power/ds2760_battery.c and drivers/power/pda_power.c for the
+example how to declare and handle attributes.
+
+
+Units
+~~~~~
+Quoting include/linux/power_supply.h:
+
+ All voltages, currents, charges, energies, time and temperatures in µV,
+ µA, µAh, µWh, seconds and tenths of degree Celsius unless otherwise
+ stated. It's driver's job to convert its raw values to units in which
+ this class operates.
+
+
+Attributes/properties detailed
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+~ ~ ~ ~ ~ ~ ~ Charge/Energy/Capacity - how to not confuse ~ ~ ~ ~ ~ ~ ~
+~ ~
+~ Because both "charge" (µAh) and "energy" (µWh) represents "capacity" ~
+~ of battery, this class distinguish these terms. Don't mix them! ~
+~ ~
+~ CHARGE_* attributes represents capacity in µAh only. ~
+~ ENERGY_* attributes represents capacity in µWh only. ~
+~ CAPACITY attribute represents capacity in *percents*, from 0 to 100. ~
+~ ~
+~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
+
+Postfixes:
+_AVG - *hardware* averaged value, use it if your hardware is really able to
+report averaged values.
+_NOW - momentary/instantaneous values.
+
+STATUS - this attribute represents operating status (charging, full,
+discharging (i.e. powering a load), etc.). This corresponds to
+BATTERY_STATUS_* values, as defined in battery.h.
+
+HEALTH - represents health of the battery, values corresponds to
+POWER_SUPPLY_HEALTH_*, defined in battery.h.
+
+VOLTAGE_MAX_DESIGN, VOLTAGE_MIN_DESIGN - design values for maximal and
+minimal power supply voltages. Maximal/minimal means values of voltages
+when battery considered "full"/"empty" at normal conditions. Yes, there is
+no direct relation between voltage and battery capacity, but some dumb
+batteries use voltage for very approximated calculation of capacity.
+Battery driver also can use this attribute just to inform userspace
+about maximal and minimal voltage thresholds of a given battery.
+
+VOLTAGE_MAX, VOLTAGE_MIN - same as _DESIGN voltage values except that
+these ones should be used if hardware could only guess (measure and
+retain) the thresholds of a given power supply.
+
+CHARGE_FULL_DESIGN, CHARGE_EMPTY_DESIGN - design charge values, when
+battery considered full/empty.
+
+ENERGY_FULL_DESIGN, ENERGY_EMPTY_DESIGN - same as above but for energy.
+
+CHARGE_FULL, CHARGE_EMPTY - These attributes means "last remembered value
+of charge when battery became full/empty". It also could mean "value of
+charge when battery considered full/empty at given conditions (temperature,
+age)". I.e. these attributes represents real thresholds, not design values.
+
+CHARGE_COUNTER - the current charge counter (in µAh). This could easily
+be negative; there is no empty or full value. It is only useful for
+relative, time-based measurements.
+
+ENERGY_FULL, ENERGY_EMPTY - same as above but for energy.
+
+CAPACITY - capacity in percents.
+
+TEMP - temperature of the power supply.
+TEMP_AMBIENT - ambient temperature.
+
+TIME_TO_EMPTY - seconds left for battery to be considered empty (i.e.
+while battery powers a load)
+TIME_TO_FULL - seconds left for battery to be considered full (i.e.
+while battery is charging)
+
+
+Battery <-> external power supply interaction
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Often power supplies are acting as supplies and supplicants at the same
+time. Batteries are good example. So, batteries usually care if they're
+externally powered or not.
+
+For that case, power supply class implements notification mechanism for
+batteries.
+
+External power supply (AC) lists supplicants (batteries) names in
+"supplied_to" struct member, and each power_supply_changed() call
+issued by external power supply will notify supplicants via
+external_power_changed callback.
+
+
+QA
+~~
+Q: Where is POWER_SUPPLY_PROP_XYZ attribute?
+A: If you cannot find attribute suitable for your driver needs, feel free
+ to add it and send patch along with your driver.
+
+ The attributes available currently are the ones currently provided by the
+ drivers written.
+
+ Good candidates to add in future: model/part#, cycle_time, manufacturer,
+ etc.
+
+
+Q: I have some very specific attribute (e.g. battery color), should I add
+ this attribute to standard ones?
+A: Most likely, no. Such attribute can be placed in the driver itself, if
+ it is useful. Of course, if the attribute in question applicable to
+ large set of batteries, provided by many drivers, and/or comes from
+ some general battery specification/standard, it may be a candidate to
+ be added to the core attribute set.
+
+
+Q: Suppose, my battery monitoring chip/firmware does not provides capacity
+ in percents, but provides charge_{now,full,empty}. Should I calculate
+ percentage capacity manually, inside the driver, and register CAPACITY
+ attribute? The same question about time_to_empty/time_to_full.
+A: Most likely, no. This class is designed to export properties which are
+ directly measurable by the specific hardware available.
+
+ Inferring not available properties using some heuristics or mathematical
+ model is not subject of work for a battery driver. Such functionality
+ should be factored out, and in fact, apm_power, the driver to serve
+ legacy APM API on top of power supply class, uses a simple heuristic of
+ approximating remaining battery capacity based on its charge, current,
+ voltage and so on. But full-fledged battery model is likely not subject
+ for kernel at all, as it would require floating point calculation to deal
+ with things like differential equations and Kalman filters. This is
+ better be handled by batteryd/libbattery, yet to be written.
diff --git a/Documentation/power/regulator/consumer.txt b/Documentation/power/regulator/consumer.txt
new file mode 100644
index 0000000..82b7a43
--- /dev/null
+++ b/Documentation/power/regulator/consumer.txt
@@ -0,0 +1,182 @@
+Regulator Consumer Driver Interface
+===================================
+
+This text describes the regulator interface for consumer device drivers.
+Please see overview.txt for a description of the terms used in this text.
+
+
+1. Consumer Regulator Access (static & dynamic drivers)
+=======================================================
+
+A consumer driver can get access to it's supply regulator by calling :-
+
+regulator = regulator_get(dev, "Vcc");
+
+The consumer passes in it's struct device pointer and power supply ID. The core
+then finds the correct regulator by consulting a machine specific lookup table.
+If the lookup is successful then this call will return a pointer to the struct
+regulator that supplies this consumer.
+
+To release the regulator the consumer driver should call :-
+
+regulator_put(regulator);
+
+Consumers can be supplied by more than one regulator e.g. codec consumer with
+analog and digital supplies :-
+
+digital = regulator_get(dev, "Vcc"); /* digital core */
+analog = regulator_get(dev, "Avdd"); /* analog */
+
+The regulator access functions regulator_get() and regulator_put() will
+usually be called in your device drivers probe() and remove() respectively.
+
+
+2. Regulator Output Enable & Disable (static & dynamic drivers)
+====================================================================
+
+A consumer can enable it's power supply by calling:-
+
+int regulator_enable(regulator);
+
+NOTE: The supply may already be enabled before regulator_enabled() is called.
+This may happen if the consumer shares the regulator or the regulator has been
+previously enabled by bootloader or kernel board initialization code.
+
+A consumer can determine if a regulator is enabled by calling :-
+
+int regulator_is_enabled(regulator);
+
+This will return > zero when the regulator is enabled.
+
+
+A consumer can disable it's supply when no longer needed by calling :-
+
+int regulator_disable(regulator);
+
+NOTE: This may not disable the supply if it's shared with other consumers. The
+regulator will only be disabled when the enabled reference count is zero.
+
+Finally, a regulator can be forcefully disabled in the case of an emergency :-
+
+int regulator_force_disable(regulator);
+
+NOTE: this will immediately and forcefully shutdown the regulator output. All
+consumers will be powered off.
+
+
+3. Regulator Voltage Control & Status (dynamic drivers)
+======================================================
+
+Some consumer drivers need to be able to dynamically change their supply
+voltage to match system operating points. e.g. CPUfreq drivers can scale
+voltage along with frequency to save power, SD drivers may need to select the
+correct card voltage, etc.
+
+Consumers can control their supply voltage by calling :-
+
+int regulator_set_voltage(regulator, min_uV, max_uV);
+
+Where min_uV and max_uV are the minimum and maximum acceptable voltages in
+microvolts.
+
+NOTE: this can be called when the regulator is enabled or disabled. If called
+when enabled, then the voltage changes instantly, otherwise the voltage
+configuration changes and the voltage is physically set when the regulator is
+next enabled.
+
+The regulators configured voltage output can be found by calling :-
+
+int regulator_get_voltage(regulator);
+
+NOTE: get_voltage() will return the configured output voltage whether the
+regulator is enabled or disabled and should NOT be used to determine regulator
+output state. However this can be used in conjunction with is_enabled() to
+determine the regulator physical output voltage.
+
+
+4. Regulator Current Limit Control & Status (dynamic drivers)
+===========================================================
+
+Some consumer drivers need to be able to dynamically change their supply
+current limit to match system operating points. e.g. LCD backlight driver can
+change the current limit to vary the backlight brightness, USB drivers may want
+to set the limit to 500mA when supplying power.
+
+Consumers can control their supply current limit by calling :-
+
+int regulator_set_current_limit(regulator, min_uV, max_uV);
+
+Where min_uA and max_uA are the minimum and maximum acceptable current limit in
+microamps.
+
+NOTE: this can be called when the regulator is enabled or disabled. If called
+when enabled, then the current limit changes instantly, otherwise the current
+limit configuration changes and the current limit is physically set when the
+regulator is next enabled.
+
+A regulators current limit can be found by calling :-
+
+int regulator_get_current_limit(regulator);
+
+NOTE: get_current_limit() will return the current limit whether the regulator
+is enabled or disabled and should not be used to determine regulator current
+load.
+
+
+5. Regulator Operating Mode Control & Status (dynamic drivers)
+=============================================================
+
+Some consumers can further save system power by changing the operating mode of
+their supply regulator to be more efficient when the consumers operating state
+changes. e.g. consumer driver is idle and subsequently draws less current
+
+Regulator operating mode can be changed indirectly or directly.
+
+Indirect operating mode control.
+--------------------------------
+Consumer drivers can request a change in their supply regulator operating mode
+by calling :-
+
+int regulator_set_optimum_mode(struct regulator *regulator, int load_uA);
+
+This will cause the core to recalculate the total load on the regulator (based
+on all it's consumers) and change operating mode (if necessary and permitted)
+to best match the current operating load.
+
+The load_uA value can be determined from the consumers datasheet. e.g.most
+datasheets have tables showing the max current consumed in certain situations.
+
+Most consumers will use indirect operating mode control since they have no
+knowledge of the regulator or whether the regulator is shared with other
+consumers.
+
+Direct operating mode control.
+------------------------------
+Bespoke or tightly coupled drivers may want to directly control regulator
+operating mode depending on their operating point. This can be achieved by
+calling :-
+
+int regulator_set_mode(struct regulator *regulator, unsigned int mode);
+unsigned int regulator_get_mode(struct regulator *regulator);
+
+Direct mode will only be used by consumers that *know* about the regulator and
+are not sharing the regulator with other consumers.
+
+
+6. Regulator Events
+===================
+Regulators can notify consumers of external events. Events could be received by
+consumers under regulator stress or failure conditions.
+
+Consumers can register interest in regulator events by calling :-
+
+int regulator_register_notifier(struct regulator *regulator,
+ struct notifier_block *nb);
+
+Consumers can uregister interest by calling :-
+
+int regulator_unregister_notifier(struct regulator *regulator,
+ struct notifier_block *nb);
+
+Regulators use the kernel notifier framework to send event to thier interested
+consumers.
diff --git a/Documentation/power/regulator/machine.txt b/Documentation/power/regulator/machine.txt
new file mode 100644
index 0000000..ce3487d
--- /dev/null
+++ b/Documentation/power/regulator/machine.txt
@@ -0,0 +1,93 @@
+Regulator Machine Driver Interface
+===================================
+
+The regulator machine driver interface is intended for board/machine specific
+initialisation code to configure the regulator subsystem.
+
+Consider the following machine :-
+
+ Regulator-1 -+-> Regulator-2 --> [Consumer A @ 1.8 - 2.0V]
+ |
+ +-> [Consumer B @ 3.3V]
+
+The drivers for consumers A & B must be mapped to the correct regulator in
+order to control their power supply. This mapping can be achieved in machine
+initialisation code by creating a struct regulator_consumer_supply for
+each regulator.
+
+struct regulator_consumer_supply {
+ struct device *dev; /* consumer */
+ const char *supply; /* consumer supply - e.g. "vcc" */
+};
+
+e.g. for the machine above
+
+static struct regulator_consumer_supply regulator1_consumers[] = {
+{
+ .dev = &platform_consumerB_device.dev,
+ .supply = "Vcc",
+},};
+
+static struct regulator_consumer_supply regulator2_consumers[] = {
+{
+ .dev = &platform_consumerA_device.dev,
+ .supply = "Vcc",
+},};
+
+This maps Regulator-1 to the 'Vcc' supply for Consumer B and maps Regulator-2
+to the 'Vcc' supply for Consumer A.
+
+Constraints can now be registered by defining a struct regulator_init_data
+for each regulator power domain. This structure also maps the consumers
+to their supply regulator :-
+
+static struct regulator_init_data regulator1_data = {
+ .constraints = {
+ .min_uV = 3300000,
+ .max_uV = 3300000,
+ .valid_modes_mask = REGULATOR_MODE_NORMAL,
+ },
+ .num_consumer_supplies = ARRAY_SIZE(regulator1_consumers),
+ .consumer_supplies = regulator1_consumers,
+};
+
+Regulator-1 supplies power to Regulator-2. This relationship must be registered
+with the core so that Regulator-1 is also enabled when Consumer A enables it's
+supply (Regulator-2). The supply regulator is set by the supply_regulator_dev
+field below:-
+
+static struct regulator_init_data regulator2_data = {
+ .supply_regulator_dev = &platform_regulator1_device.dev,
+ .constraints = {
+ .min_uV = 1800000,
+ .max_uV = 2000000,
+ .valid_ops_mask = REGULATOR_CHANGE_VOLTAGE,
+ .valid_modes_mask = REGULATOR_MODE_NORMAL,
+ },
+ .num_consumer_supplies = ARRAY_SIZE(regulator2_consumers),
+ .consumer_supplies = regulator2_consumers,
+};
+
+Finally the regulator devices must be registered in the usual manner.
+
+static struct platform_device regulator_devices[] = {
+{
+ .name = "regulator",
+ .id = DCDC_1,
+ .dev = {
+ .platform_data = &regulator1_data,
+ },
+},
+{
+ .name = "regulator",
+ .id = DCDC_2,
+ .dev = {
+ .platform_data = &regulator2_data,
+ },
+},
+};
+/* register regulator 1 device */
+platform_device_register(&wm8350_regulator_devices[0]);
+
+/* register regulator 2 device */
+platform_device_register(&wm8350_regulator_devices[1]);
diff --git a/Documentation/power/regulator/overview.txt b/Documentation/power/regulator/overview.txt
new file mode 100644
index 0000000..bdcb332
--- /dev/null
+++ b/Documentation/power/regulator/overview.txt
@@ -0,0 +1,171 @@
+Linux voltage and current regulator framework
+=============================================
+
+About
+=====
+
+This framework is designed to provide a standard kernel interface to control
+voltage and current regulators.
+
+The intention is to allow systems to dynamically control regulator power output
+in order to save power and prolong battery life. This applies to both voltage
+regulators (where voltage output is controllable) and current sinks (where
+current limit is controllable).
+
+(C) 2008 Wolfson Microelectronics PLC.
+Author: Liam Girdwood <lg@opensource.wolfsonmicro.com>
+
+
+Nomenclature
+============
+
+Some terms used in this document:-
+
+ o Regulator - Electronic device that supplies power to other devices.
+ Most regulators can enable and disable their output whilst
+ some can control their output voltage and or current.
+
+ Input Voltage -> Regulator -> Output Voltage
+
+
+ o PMIC - Power Management IC. An IC that contains numerous regulators
+ and often contains other susbsystems.
+
+
+ o Consumer - Electronic device that is supplied power by a regulator.
+ Consumers can be classified into two types:-
+
+ Static: consumer does not change it's supply voltage or
+ current limit. It only needs to enable or disable it's
+ power supply. It's supply voltage is set by the hardware,
+ bootloader, firmware or kernel board initialisation code.
+
+ Dynamic: consumer needs to change it's supply voltage or
+ current limit to meet operation demands.
+
+
+ o Power Domain - Electronic circuit that is supplied it's input power by the
+ output power of a regulator, switch or by another power
+ domain.
+
+ The supply regulator may be behind a switch(s). i.e.
+
+ Regulator -+-> Switch-1 -+-> Switch-2 --> [Consumer A]
+ | |
+ | +-> [Consumer B], [Consumer C]
+ |
+ +-> [Consumer D], [Consumer E]
+
+ That is one regulator and three power domains:
+
+ Domain 1: Switch-1, Consumers D & E.
+ Domain 2: Switch-2, Consumers B & C.
+ Domain 3: Consumer A.
+
+ and this represents a "supplies" relationship:
+
+ Domain-1 --> Domain-2 --> Domain-3.
+
+ A power domain may have regulators that are supplied power
+ by other regulators. i.e.
+
+ Regulator-1 -+-> Regulator-2 -+-> [Consumer A]
+ |
+ +-> [Consumer B]
+
+ This gives us two regulators and two power domains:
+
+ Domain 1: Regulator-2, Consumer B.
+ Domain 2: Consumer A.
+
+ and a "supplies" relationship:
+
+ Domain-1 --> Domain-2
+
+
+ o Constraints - Constraints are used to define power levels for performance
+ and hardware protection. Constraints exist at three levels:
+
+ Regulator Level: This is defined by the regulator hardware
+ operating parameters and is specified in the regulator
+ datasheet. i.e.
+
+ - voltage output is in the range 800mV -> 3500mV.
+ - regulator current output limit is 20mA @ 5V but is
+ 10mA @ 10V.
+
+ Power Domain Level: This is defined in software by kernel
+ level board initialisation code. It is used to constrain a
+ power domain to a particular power range. i.e.
+
+ - Domain-1 voltage is 3300mV
+ - Domain-2 voltage is 1400mV -> 1600mV
+ - Domain-3 current limit is 0mA -> 20mA.
+
+ Consumer Level: This is defined by consumer drivers
+ dynamically setting voltage or current limit levels.
+
+ e.g. a consumer backlight driver asks for a current increase
+ from 5mA to 10mA to increase LCD illumination. This passes
+ to through the levels as follows :-
+
+ Consumer: need to increase LCD brightness. Lookup and
+ request next current mA value in brightness table (the
+ consumer driver could be used on several different
+ personalities based upon the same reference device).
+
+ Power Domain: is the new current limit within the domain
+ operating limits for this domain and system state (e.g.
+ battery power, USB power)
+
+ Regulator Domains: is the new current limit within the
+ regulator operating parameters for input/ouput voltage.
+
+ If the regulator request passes all the constraint tests
+ then the new regulator value is applied.
+
+
+Design
+======
+
+The framework is designed and targeted at SoC based devices but may also be
+relevant to non SoC devices and is split into the following four interfaces:-
+
+
+ 1. Consumer driver interface.
+
+ This uses a similar API to the kernel clock interface in that consumer
+ drivers can get and put a regulator (like they can with clocks atm) and
+ get/set voltage, current limit, mode, enable and disable. This should
+ allow consumers complete control over their supply voltage and current
+ limit. This also compiles out if not in use so drivers can be reused in
+ systems with no regulator based power control.
+
+ See Documentation/power/regulator/consumer.txt
+
+ 2. Regulator driver interface.
+
+ This allows regulator drivers to register their regulators and provide
+ operations to the core. It also has a notifier call chain for propagating
+ regulator events to clients.
+
+ See Documentation/power/regulator/regulator.txt
+
+ 3. Machine interface.
+
+ This interface is for machine specific code and allows the creation of
+ voltage/current domains (with constraints) for each regulator. It can
+ provide regulator constraints that will prevent device damage through
+ overvoltage or over current caused by buggy client drivers. It also
+ allows the creation of a regulator tree whereby some regulators are
+ supplied by others (similar to a clock tree).
+
+ See Documentation/power/regulator/machine.txt
+
+ 4. Userspace ABI.
+
+ The framework also exports a lot of useful voltage/current/opmode data to
+ userspace via sysfs. This could be used to help monitor device power
+ consumption and status.
+
+ See Documentation/ABI/testing/regulator-sysfs.txt
diff --git a/Documentation/power/regulator/regulator.txt b/Documentation/power/regulator/regulator.txt
new file mode 100644
index 0000000..4200acc
--- /dev/null
+++ b/Documentation/power/regulator/regulator.txt
@@ -0,0 +1,30 @@
+Regulator Driver Interface
+==========================
+
+The regulator driver interface is relatively simple and designed to allow
+regulator drivers to register their services with the core framework.
+
+
+Registration
+============
+
+Drivers can register a regulator by calling :-
+
+struct regulator_dev *regulator_register(struct device *dev,
+ struct regulator_desc *regulator_desc);
+
+This will register the regulators capabilities and operations to the regulator
+core.
+
+Regulators can be unregistered by calling :-
+
+void regulator_unregister(struct regulator_dev *rdev);
+
+
+Regulator Events
+================
+Regulators can send events (e.g. over temp, under voltage, etc) to consumer
+drivers by calling :-
+
+int regulator_notifier_call_chain(struct regulator_dev *rdev,
+ unsigned long event, void *data);
diff --git a/Documentation/power/s2ram.txt b/Documentation/power/s2ram.txt
new file mode 100644
index 0000000..2ebdc60
--- /dev/null
+++ b/Documentation/power/s2ram.txt
@@ -0,0 +1,74 @@
+ How to get s2ram working
+ ~~~~~~~~~~~~~~~~~~~~~~~~
+ 2006 Linus Torvalds
+ 2006 Pavel Machek
+
+1) Check suspend.sf.net, program s2ram there has long whitelist of
+ "known ok" machines, along with tricks to use on each one.
+
+2) If that does not help, try reading tricks.txt and
+ video.txt. Perhaps problem is as simple as broken module, and
+ simple module unload can fix it.
+
+3) You can use Linus' TRACE_RESUME infrastructure, described below.
+
+ Using TRACE_RESUME
+ ~~~~~~~~~~~~~~~~~~
+
+I've been working at making the machines I have able to STR, and almost
+always it's a driver that is buggy. Thank God for the suspend/resume
+debugging - the thing that Chuck tried to disable. That's often the _only_
+way to debug these things, and it's actually pretty powerful (but
+time-consuming - having to insert TRACE_RESUME() markers into the device
+driver that doesn't resume and recompile and reboot).
+
+Anyway, the way to debug this for people who are interested (have a
+machine that doesn't boot) is:
+
+ - enable PM_DEBUG, and PM_TRACE
+
+ - use a script like this:
+
+ #!/bin/sh
+ sync
+ echo 1 > /sys/power/pm_trace
+ echo mem > /sys/power/state
+
+ to suspend
+
+ - if it doesn't come back up (which is usually the problem), reboot by
+ holding the power button down, and look at the dmesg output for things
+ like
+
+ Magic number: 4:156:725
+ hash matches drivers/base/power/resume.c:28
+ hash matches device 0000:01:00.0
+
+ which means that the last trace event was just before trying to resume
+ device 0000:01:00.0. Then figure out what driver is controlling that
+ device (lspci and /sys/devices/pci* is your friend), and see if you can
+ fix it, disable it, or trace into its resume function.
+
+For example, the above happens to be the VGA device on my EVO, which I
+used to run with "radeonfb" (it's an ATI Radeon mobility). It turns out
+that "radeonfb" simply cannot resume that device - it tries to set the
+PLL's, and it just _hangs_. Using the regular VGA console and letting X
+resume it instead works fine.
+
+NOTE
+====
+pm_trace uses the system's Real Time Clock (RTC) to save the magic number.
+Reason for this is that the RTC is the only reliably available piece of
+hardware during resume operations where a value can be set that will
+survive a reboot.
+
+Consequence is that after a resume (even if it is successful) your system
+clock will have a value corresponding to the magic mumber instead of the
+correct date/time! It is therefore advisable to use a program like ntp-date
+or rdate to reset the correct date/time from an external time source when
+using this trace option.
+
+As the clock keeps ticking it is also essential that the reboot is done
+quickly after the resume failure. The trace option does not use the seconds
+or the low order bits of the minutes of the RTC, but a too long delay will
+corrupt the magic value.
diff --git a/Documentation/power/states.txt b/Documentation/power/states.txt
new file mode 100644
index 0000000..34800cc
--- /dev/null
+++ b/Documentation/power/states.txt
@@ -0,0 +1,80 @@
+
+System Power Management States
+
+
+The kernel supports three power management states generically, though
+each is dependent on platform support code to implement the low-level
+details for each state. This file describes each state, what they are
+commonly called, what ACPI state they map to, and what string to write
+to /sys/power/state to enter that state
+
+
+State: Standby / Power-On Suspend
+ACPI State: S1
+String: "standby"
+
+This state offers minimal, though real, power savings, while providing
+a very low-latency transition back to a working system. No operating
+state is lost (the CPU retains power), so the system easily starts up
+again where it left off.
+
+We try to put devices in a low-power state equivalent to D1, which
+also offers low power savings, but low resume latency. Not all devices
+support D1, and those that don't are left on.
+
+A transition from Standby to the On state should take about 1-2
+seconds.
+
+
+State: Suspend-to-RAM
+ACPI State: S3
+String: "mem"
+
+This state offers significant power savings as everything in the
+system is put into a low-power state, except for memory, which is
+placed in self-refresh mode to retain its contents.
+
+System and device state is saved and kept in memory. All devices are
+suspended and put into D3. In many cases, all peripheral buses lose
+power when entering STR, so devices must be able to handle the
+transition back to the On state.
+
+For at least ACPI, STR requires some minimal boot-strapping code to
+resume the system from STR. This may be true on other platforms.
+
+A transition from Suspend-to-RAM to the On state should take about
+3-5 seconds.
+
+
+State: Suspend-to-disk
+ACPI State: S4
+String: "disk"
+
+This state offers the greatest power savings, and can be used even in
+the absence of low-level platform support for power management. This
+state operates similarly to Suspend-to-RAM, but includes a final step
+of writing memory contents to disk. On resume, this is read and memory
+is restored to its pre-suspend state.
+
+STD can be handled by the firmware or the kernel. If it is handled by
+the firmware, it usually requires a dedicated partition that must be
+setup via another operating system for it to use. Despite the
+inconvenience, this method requires minimal work by the kernel, since
+the firmware will also handle restoring memory contents on resume.
+
+For suspend-to-disk, a mechanism called swsusp called 'swsusp' (Swap
+Suspend) is used to write memory contents to free swap space.
+swsusp has some restrictive requirements, but should work in most
+cases. Some, albeit outdated, documentation can be found in
+Documentation/power/swsusp.txt. Alternatively, userspace can do most
+of the actual suspend to disk work, see userland-swsusp.txt.
+
+Once memory state is written to disk, the system may either enter a
+low-power state (like ACPI S4), or it may simply power down. Powering
+down offers greater savings, and allows this mechanism to work on any
+system. However, entering a real low-power state allows the user to
+trigger wake up events (e.g. pressing a key or opening a laptop lid).
+
+A transition from Suspend-to-Disk to the On state should take about 30
+seconds, though it's typically a bit more with the current
+implementation.
diff --git a/Documentation/power/swsusp-and-swap-files.txt b/Documentation/power/swsusp-and-swap-files.txt
new file mode 100644
index 0000000..f281886
--- /dev/null
+++ b/Documentation/power/swsusp-and-swap-files.txt
@@ -0,0 +1,60 @@
+Using swap files with software suspend (swsusp)
+ (C) 2006 Rafael J. Wysocki <rjw@sisk.pl>
+
+The Linux kernel handles swap files almost in the same way as it handles swap
+partitions and there are only two differences between these two types of swap
+areas:
+(1) swap files need not be contiguous,
+(2) the header of a swap file is not in the first block of the partition that
+holds it. From the swsusp's point of view (1) is not a problem, because it is
+already taken care of by the swap-handling code, but (2) has to be taken into
+consideration.
+
+In principle the location of a swap file's header may be determined with the
+help of appropriate filesystem driver. Unfortunately, however, it requires the
+filesystem holding the swap file to be mounted, and if this filesystem is
+journaled, it cannot be mounted during resume from disk. For this reason to
+identify a swap file swsusp uses the name of the partition that holds the file
+and the offset from the beginning of the partition at which the swap file's
+header is located. For convenience, this offset is expressed in <PAGE_SIZE>
+units.
+
+In order to use a swap file with swsusp, you need to:
+
+1) Create the swap file and make it active, eg.
+
+# dd if=/dev/zero of=<swap_file_path> bs=1024 count=<swap_file_size_in_k>
+# mkswap <swap_file_path>
+# swapon <swap_file_path>
+
+2) Use an application that will bmap the swap file with the help of the
+FIBMAP ioctl and determine the location of the file's swap header, as the
+offset, in <PAGE_SIZE> units, from the beginning of the partition which
+holds the swap file.
+
+3) Add the following parameters to the kernel command line:
+
+resume=<swap_file_partition> resume_offset=<swap_file_offset>
+
+where <swap_file_partition> is the partition on which the swap file is located
+and <swap_file_offset> is the offset of the swap header determined by the
+application in 2) (of course, this step may be carried out automatically
+by the same application that determines the swap file's header offset using the
+FIBMAP ioctl)
+
+OR
+
+Use a userland suspend application that will set the partition and offset
+with the help of the SNAPSHOT_SET_SWAP_AREA ioctl described in
+Documentation/power/userland-swsusp.txt (this is the only method to suspend
+to a swap file allowing the resume to be initiated from an initrd or initramfs
+image).
+
+Now, swsusp will use the swap file in the same way in which it would use a swap
+partition. In particular, the swap file has to be active (ie. be present in
+/proc/swaps) so that it can be used for suspending.
+
+Note that if the swap file used for suspending is deleted and recreated,
+the location of its header need not be the same as before. Thus every time
+this happens the value of the "resume_offset=" kernel command line parameter
+has to be updated.
diff --git a/Documentation/power/swsusp-dmcrypt.txt b/Documentation/power/swsusp-dmcrypt.txt
new file mode 100644
index 0000000..59931b4
--- /dev/null
+++ b/Documentation/power/swsusp-dmcrypt.txt
@@ -0,0 +1,138 @@
+Author: Andreas Steinmetz <ast@domdv.de>
+
+
+How to use dm-crypt and swsusp together:
+========================================
+
+Some prerequisites:
+You know how dm-crypt works. If not, visit the following web page:
+http://www.saout.de/misc/dm-crypt/
+You have read Documentation/power/swsusp.txt and understand it.
+You did read Documentation/initrd.txt and know how an initrd works.
+You know how to create or how to modify an initrd.
+
+Now your system is properly set up, your disk is encrypted except for
+the swap device(s) and the boot partition which may contain a mini
+system for crypto setup and/or rescue purposes. You may even have
+an initrd that does your current crypto setup already.
+
+At this point you want to encrypt your swap, too. Still you want to
+be able to suspend using swsusp. This, however, means that you
+have to be able to either enter a passphrase or that you read
+the key(s) from an external device like a pcmcia flash disk
+or an usb stick prior to resume. So you need an initrd, that sets
+up dm-crypt and then asks swsusp to resume from the encrypted
+swap device.
+
+The most important thing is that you set up dm-crypt in such
+a way that the swap device you suspend to/resume from has
+always the same major/minor within the initrd as well as
+within your running system. The easiest way to achieve this is
+to always set up this swap device first with dmsetup, so that
+it will always look like the following:
+
+brw------- 1 root root 254, 0 Jul 28 13:37 /dev/mapper/swap0
+
+Now set up your kernel to use /dev/mapper/swap0 as the default
+resume partition, so your kernel .config contains:
+
+CONFIG_PM_STD_PARTITION="/dev/mapper/swap0"
+
+Prepare your boot loader to use the initrd you will create or
+modify. For lilo the simplest setup looks like the following
+lines:
+
+image=/boot/vmlinuz
+initrd=/boot/initrd.gz
+label=linux
+append="root=/dev/ram0 init=/linuxrc rw"
+
+Finally you need to create or modify your initrd. Lets assume
+you create an initrd that reads the required dm-crypt setup
+from a pcmcia flash disk card. The card is formatted with an ext2
+fs which resides on /dev/hde1 when the card is inserted. The
+card contains at least the encrypted swap setup in a file
+named "swapkey". /etc/fstab of your initrd contains something
+like the following:
+
+/dev/hda1 /mnt ext3 ro 0 0
+none /proc proc defaults,noatime,nodiratime 0 0
+none /sys sysfs defaults,noatime,nodiratime 0 0
+
+/dev/hda1 contains an unencrypted mini system that sets up all
+of your crypto devices, again by reading the setup from the
+pcmcia flash disk. What follows now is a /linuxrc for your
+initrd that allows you to resume from encrypted swap and that
+continues boot with your mini system on /dev/hda1 if resume
+does not happen:
+
+#!/bin/sh
+PATH=/sbin:/bin:/usr/sbin:/usr/bin
+mount /proc
+mount /sys
+mapped=0
+noresume=`grep -c noresume /proc/cmdline`
+if [ "$*" != "" ]
+then
+ noresume=1
+fi
+dmesg -n 1
+/sbin/cardmgr -q
+for i in 1 2 3 4 5 6 7 8 9 0
+do
+ if [ -f /proc/ide/hde/media ]
+ then
+ usleep 500000
+ mount -t ext2 -o ro /dev/hde1 /mnt
+ if [ -f /mnt/swapkey ]
+ then
+ dmsetup create swap0 /mnt/swapkey > /dev/null 2>&1 && mapped=1
+ fi
+ umount /mnt
+ break
+ fi
+ usleep 500000
+done
+killproc /sbin/cardmgr
+dmesg -n 6
+if [ $mapped = 1 ]
+then
+ if [ $noresume != 0 ]
+ then
+ mkswap /dev/mapper/swap0 > /dev/null 2>&1
+ fi
+ echo 254:0 > /sys/power/resume
+ dmsetup remove swap0
+fi
+umount /sys
+mount /mnt
+umount /proc
+cd /mnt
+pivot_root . mnt
+mount /proc
+umount -l /mnt
+umount /proc
+exec chroot . /sbin/init $* < dev/console > dev/console 2>&1
+
+Please don't mind the weird loop above, busybox's msh doesn't know
+the let statement. Now, what is happening in the script?
+First we have to decide if we want to try to resume, or not.
+We will not resume if booting with "noresume" or any parameters
+for init like "single" or "emergency" as boot parameters.
+
+Then we need to set up dmcrypt with the setup data from the
+pcmcia flash disk. If this succeeds we need to reset the swap
+device if we don't want to resume. The line "echo 254:0 > /sys/power/resume"
+then attempts to resume from the first device mapper device.
+Note that it is important to set the device in /sys/power/resume,
+regardless if resuming or not, otherwise later suspend will fail.
+If resume starts, script execution terminates here.
+
+Otherwise we just remove the encrypted swap device and leave it to the
+mini system on /dev/hda1 to set the whole crypto up (it is up to
+you to modify this to your taste).
+
+What then follows is the well known process to change the root
+file system and continue booting from there. I prefer to unmount
+the initrd prior to continue booting but it is up to you to modify
+this.
diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt
new file mode 100644
index 0000000..9d60ab7
--- /dev/null
+++ b/Documentation/power/swsusp.txt
@@ -0,0 +1,407 @@
+Some warnings, first.
+
+ * BIG FAT WARNING *********************************************************
+ *
+ * If you touch anything on disk between suspend and resume...
+ * ...kiss your data goodbye.
+ *
+ * If you do resume from initrd after your filesystems are mounted...
+ * ...bye bye root partition.
+ * [this is actually same case as above]
+ *
+ * If you have unsupported (*) devices using DMA, you may have some
+ * problems. If your disk driver does not support suspend... (IDE does),
+ * it may cause some problems, too. If you change kernel command line
+ * between suspend and resume, it may do something wrong. If you change
+ * your hardware while system is suspended... well, it was not good idea;
+ * but it will probably only crash.
+ *
+ * (*) suspend/resume support is needed to make it safe.
+ *
+ * If you have any filesystems on USB devices mounted before software suspend,
+ * they won't be accessible after resume and you may lose data, as though
+ * you have unplugged the USB devices with mounted filesystems on them;
+ * see the FAQ below for details. (This is not true for more traditional
+ * power states like "standby", which normally don't turn USB off.)
+
+You need to append resume=/dev/your_swap_partition to kernel command
+line. Then you suspend by
+
+echo shutdown > /sys/power/disk; echo disk > /sys/power/state
+
+. If you feel ACPI works pretty well on your system, you might try
+
+echo platform > /sys/power/disk; echo disk > /sys/power/state
+
+. If you have SATA disks, you'll need recent kernels with SATA suspend
+support. For suspend and resume to work, make sure your disk drivers
+are built into kernel -- not modules. [There's way to make
+suspend/resume with modular disk drivers, see FAQ, but you probably
+should not do that.]
+
+If you want to limit the suspend image size to N bytes, do
+
+echo N > /sys/power/image_size
+
+before suspend (it is limited to 500 MB by default).
+
+
+Article about goals and implementation of Software Suspend for Linux
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Author: G‚ábor Kuti
+Last revised: 2003-10-20 by Pavel Machek
+
+Idea and goals to achieve
+
+Nowadays it is common in several laptops that they have a suspend button. It
+saves the state of the machine to a filesystem or to a partition and switches
+to standby mode. Later resuming the machine the saved state is loaded back to
+ram and the machine can continue its work. It has two real benefits. First we
+save ourselves the time machine goes down and later boots up, energy costs
+are real high when running from batteries. The other gain is that we don't have to
+interrupt our programs so processes that are calculating something for a long
+time shouldn't need to be written interruptible.
+
+swsusp saves the state of the machine into active swaps and then reboots or
+powerdowns. You must explicitly specify the swap partition to resume from with
+``resume='' kernel option. If signature is found it loads and restores saved
+state. If the option ``noresume'' is specified as a boot parameter, it skips
+the resuming.
+
+In the meantime while the system is suspended you should not add/remove any
+of the hardware, write to the filesystems, etc.
+
+Sleep states summary
+====================
+
+There are three different interfaces you can use, /proc/acpi should
+work like this:
+
+In a really perfect world:
+echo 1 > /proc/acpi/sleep # for standby
+echo 2 > /proc/acpi/sleep # for suspend to ram
+echo 3 > /proc/acpi/sleep # for suspend to ram, but with more power conservative
+echo 4 > /proc/acpi/sleep # for suspend to disk
+echo 5 > /proc/acpi/sleep # for shutdown unfriendly the system
+
+and perhaps
+echo 4b > /proc/acpi/sleep # for suspend to disk via s4bios
+
+Frequently Asked Questions
+==========================
+
+Q: well, suspending a server is IMHO a really stupid thing,
+but... (Diego Zuccato):
+
+A: You bought new UPS for your server. How do you install it without
+bringing machine down? Suspend to disk, rearrange power cables,
+resume.
+
+You have your server on UPS. Power died, and UPS is indicating 30
+seconds to failure. What do you do? Suspend to disk.
+
+
+Q: Maybe I'm missing something, but why don't the regular I/O paths work?
+
+A: We do use the regular I/O paths. However we cannot restore the data
+to its original location as we load it. That would create an
+inconsistent kernel state which would certainly result in an oops.
+Instead, we load the image into unused memory and then atomically copy
+it back to it original location. This implies, of course, a maximum
+image size of half the amount of memory.
+
+There are two solutions to this:
+
+* require half of memory to be free during suspend. That way you can
+read "new" data onto free spots, then cli and copy
+
+* assume we had special "polling" ide driver that only uses memory
+between 0-640KB. That way, I'd have to make sure that 0-640KB is free
+during suspending, but otherwise it would work...
+
+suspend2 shares this fundamental limitation, but does not include user
+data and disk caches into "used memory" by saving them in
+advance. That means that the limitation goes away in practice.
+
+Q: Does linux support ACPI S4?
+
+A: Yes. That's what echo platform > /sys/power/disk does.
+
+Q: What is 'suspend2'?
+
+A: suspend2 is 'Software Suspend 2', a forked implementation of
+suspend-to-disk which is available as separate patches for 2.4 and 2.6
+kernels from swsusp.sourceforge.net. It includes support for SMP, 4GB
+highmem and preemption. It also has a extensible architecture that
+allows for arbitrary transformations on the image (compression,
+encryption) and arbitrary backends for writing the image (eg to swap
+or an NFS share[Work In Progress]). Questions regarding suspend2
+should be sent to the mailing list available through the suspend2
+website, and not to the Linux Kernel Mailing List. We are working
+toward merging suspend2 into the mainline kernel.
+
+Q: What is the freezing of tasks and why are we using it?
+
+A: The freezing of tasks is a mechanism by which user space processes and some
+kernel threads are controlled during hibernation or system-wide suspend (on some
+architectures). See freezing-of-tasks.txt for details.
+
+Q: What is the difference between "platform" and "shutdown"?
+
+A:
+
+shutdown: save state in linux, then tell bios to powerdown
+
+platform: save state in linux, then tell bios to powerdown and blink
+ "suspended led"
+
+"platform" is actually right thing to do where supported, but
+"shutdown" is most reliable (except on ACPI systems).
+
+Q: I do not understand why you have such strong objections to idea of
+selective suspend.
+
+A: Do selective suspend during runtime power management, that's okay. But
+it's useless for suspend-to-disk. (And I do not see how you could use
+it for suspend-to-ram, I hope you do not want that).
+
+Lets see, so you suggest to
+
+* SUSPEND all but swap device and parents
+* Snapshot
+* Write image to disk
+* SUSPEND swap device and parents
+* Powerdown
+
+Oh no, that does not work, if swap device or its parents uses DMA,
+you've corrupted data. You'd have to do
+
+* SUSPEND all but swap device and parents
+* FREEZE swap device and parents
+* Snapshot
+* UNFREEZE swap device and parents
+* Write
+* SUSPEND swap device and parents
+
+Which means that you still need that FREEZE state, and you get more
+complicated code. (And I have not yet introduce details like system
+devices).
+
+Q: There don't seem to be any generally useful behavioral
+distinctions between SUSPEND and FREEZE.
+
+A: Doing SUSPEND when you are asked to do FREEZE is always correct,
+but it may be unneccessarily slow. If you want your driver to stay simple,
+slowness may not matter to you. It can always be fixed later.
+
+For devices like disk it does matter, you do not want to spindown for
+FREEZE.
+
+Q: After resuming, system is paging heavily, leading to very bad interactivity.
+
+A: Try running
+
+cat `cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u` > /dev/null
+
+after resume. swapoff -a; swapon -a may also be useful.
+
+Q: What happens to devices during swsusp? They seem to be resumed
+during system suspend?
+
+A: That's correct. We need to resume them if we want to write image to
+disk. Whole sequence goes like
+
+ Suspend part
+ ~~~~~~~~~~~~
+ running system, user asks for suspend-to-disk
+
+ user processes are stopped
+
+ suspend(PMSG_FREEZE): devices are frozen so that they don't interfere
+ with state snapshot
+
+ state snapshot: copy of whole used memory is taken with interrupts disabled
+
+ resume(): devices are woken up so that we can write image to swap
+
+ write image to swap
+
+ suspend(PMSG_SUSPEND): suspend devices so that we can power off
+
+ turn the power off
+
+ Resume part
+ ~~~~~~~~~~~
+ (is actually pretty similar)
+
+ running system, user asks for suspend-to-disk
+
+ user processes are stopped (in common case there are none, but with resume-from-initrd, noone knows)
+
+ read image from disk
+
+ suspend(PMSG_FREEZE): devices are frozen so that they don't interfere
+ with image restoration
+
+ image restoration: rewrite memory with image
+
+ resume(): devices are woken up so that system can continue
+
+ thaw all user processes
+
+Q: What is this 'Encrypt suspend image' for?
+
+A: First of all: it is not a replacement for dm-crypt encrypted swap.
+It cannot protect your computer while it is suspended. Instead it does
+protect from leaking sensitive data after resume from suspend.
+
+Think of the following: you suspend while an application is running
+that keeps sensitive data in memory. The application itself prevents
+the data from being swapped out. Suspend, however, must write these
+data to swap to be able to resume later on. Without suspend encryption
+your sensitive data are then stored in plaintext on disk. This means
+that after resume your sensitive data are accessible to all
+applications having direct access to the swap device which was used
+for suspend. If you don't need swap after resume these data can remain
+on disk virtually forever. Thus it can happen that your system gets
+broken in weeks later and sensitive data which you thought were
+encrypted and protected are retrieved and stolen from the swap device.
+To prevent this situation you should use 'Encrypt suspend image'.
+
+During suspend a temporary key is created and this key is used to
+encrypt the data written to disk. When, during resume, the data was
+read back into memory the temporary key is destroyed which simply
+means that all data written to disk during suspend are then
+inaccessible so they can't be stolen later on. The only thing that
+you must then take care of is that you call 'mkswap' for the swap
+partition used for suspend as early as possible during regular
+boot. This asserts that any temporary key from an oopsed suspend or
+from a failed or aborted resume is erased from the swap device.
+
+As a rule of thumb use encrypted swap to protect your data while your
+system is shut down or suspended. Additionally use the encrypted
+suspend image to prevent sensitive data from being stolen after
+resume.
+
+Q: Can I suspend to a swap file?
+
+A: Generally, yes, you can. However, it requires you to use the "resume=" and
+"resume_offset=" kernel command line parameters, so the resume from a swap file
+cannot be initiated from an initrd or initramfs image. See
+swsusp-and-swap-files.txt for details.
+
+Q: Is there a maximum system RAM size that is supported by swsusp?
+
+A: It should work okay with highmem.
+
+Q: Does swsusp (to disk) use only one swap partition or can it use
+multiple swap partitions (aggregate them into one logical space)?
+
+A: Only one swap partition, sorry.
+
+Q: If my application(s) causes lots of memory & swap space to be used
+(over half of the total system RAM), is it correct that it is likely
+to be useless to try to suspend to disk while that app is running?
+
+A: No, it should work okay, as long as your app does not mlock()
+it. Just prepare big enough swap partition.
+
+Q: What information is useful for debugging suspend-to-disk problems?
+
+A: Well, last messages on the screen are always useful. If something
+is broken, it is usually some kernel driver, therefore trying with as
+little as possible modules loaded helps a lot. I also prefer people to
+suspend from console, preferably without X running. Booting with
+init=/bin/bash, then swapon and starting suspend sequence manually
+usually does the trick. Then it is good idea to try with latest
+vanilla kernel.
+
+Q: How can distributions ship a swsusp-supporting kernel with modular
+disk drivers (especially SATA)?
+
+A: Well, it can be done, load the drivers, then do echo into
+/sys/power/disk/resume file from initrd. Be sure not to mount
+anything, not even read-only mount, or you are going to lose your
+data.
+
+Q: How do I make suspend more verbose?
+
+A: If you want to see any non-error kernel messages on the virtual
+terminal the kernel switches to during suspend, you have to set the
+kernel console loglevel to at least 4 (KERN_WARNING), for example by
+doing
+
+ # save the old loglevel
+ read LOGLEVEL DUMMY < /proc/sys/kernel/printk
+ # set the loglevel so we see the progress bar.
+ # if the level is higher than needed, we leave it alone.
+ if [ $LOGLEVEL -lt 5 ]; then
+ echo 5 > /proc/sys/kernel/printk
+ fi
+
+ IMG_SZ=0
+ read IMG_SZ < /sys/power/image_size
+ echo -n disk > /sys/power/state
+ RET=$?
+ #
+ # the logic here is:
+ # if image_size > 0 (without kernel support, IMG_SZ will be zero),
+ # then try again with image_size set to zero.
+ if [ $RET -ne 0 -a $IMG_SZ -ne 0 ]; then # try again with minimal image size
+ echo 0 > /sys/power/image_size
+ echo -n disk > /sys/power/state
+ RET=$?
+ fi
+
+ # restore previous loglevel
+ echo $LOGLEVEL > /proc/sys/kernel/printk
+ exit $RET
+
+Q: Is this true that if I have a mounted filesystem on a USB device and
+I suspend to disk, I can lose data unless the filesystem has been mounted
+with "sync"?
+
+A: That's right ... if you disconnect that device, you may lose data.
+In fact, even with "-o sync" you can lose data if your programs have
+information in buffers they haven't written out to a disk you disconnect,
+or if you disconnect before the device finished saving data you wrote.
+
+Software suspend normally powers down USB controllers, which is equivalent
+to disconnecting all USB devices attached to your system.
+
+Your system might well support low-power modes for its USB controllers
+while the system is asleep, maintaining the connection, using true sleep
+modes like "suspend-to-RAM" or "standby". (Don't write "disk" to the
+/sys/power/state file; write "standby" or "mem".) We've not seen any
+hardware that can use these modes through software suspend, although in
+theory some systems might support "platform" modes that won't break the
+USB connections.
+
+Remember that it's always a bad idea to unplug a disk drive containing a
+mounted filesystem. That's true even when your system is asleep! The
+safest thing is to unmount all filesystems on removable media (such USB,
+Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays)
+before suspending; then remount them after resuming.
+
+There is a work-around for this problem. For more information, see
+Documentation/usb/persist.txt.
+
+Q: Can I suspend-to-disk using a swap partition under LVM?
+
+A: No. You can suspend successfully, but you'll not be able to
+resume. uswsusp should be able to work with LVM. See suspend.sf.net.
+
+Q: I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were
+compiled with the similar configuration files. Anyway I found that
+suspend to disk (and resume) is much slower on 2.6.16 compared to
+2.6.15. Any idea for why that might happen or how can I speed it up?
+
+A: This is because the size of the suspend image is now greater than
+for 2.6.15 (by saving more data we can get more responsive system
+after resume).
+
+There's the /sys/power/image_size knob that controls the size of the
+image. If you set it to 0 (eg. by echo 0 > /sys/power/image_size as
+root), the 2.6.15 behavior should be restored. If it is still too
+slow, take a look at suspend.sf.net -- userland suspend is faster and
+supports LZF compression to speed it up further.
diff --git a/Documentation/power/tricks.txt b/Documentation/power/tricks.txt
new file mode 100644
index 0000000..3b26bb5
--- /dev/null
+++ b/Documentation/power/tricks.txt
@@ -0,0 +1,27 @@
+ swsusp/S3 tricks
+ ~~~~~~~~~~~~~~~~
+Pavel Machek <pavel@suse.cz>
+
+If you want to trick swsusp/S3 into working, you might want to try:
+
+* go with minimal config, turn off drivers like USB, AGP you don't
+ really need
+
+* turn off APIC and preempt
+
+* use ext2. At least it has working fsck. [If something seems to go
+ wrong, force fsck when you have a chance]
+
+* turn off modules
+
+* use vga text console, shut down X. [If you really want X, you might
+ want to try vesafb later]
+
+* try running as few processes as possible, preferably go to single
+ user mode.
+
+* due to video issues, swsusp should be easier to get working than
+ S3. Try that first.
+
+When you make it work, try to find out what exactly was it that broke
+suspend, and preferably fix that.
diff --git a/Documentation/power/userland-swsusp.txt b/Documentation/power/userland-swsusp.txt
new file mode 100644
index 0000000..7b99636
--- /dev/null
+++ b/Documentation/power/userland-swsusp.txt
@@ -0,0 +1,165 @@
+Documentation for userland software suspend interface
+ (C) 2006 Rafael J. Wysocki <rjw@sisk.pl>
+
+First, the warnings at the beginning of swsusp.txt still apply.
+
+Second, you should read the FAQ in swsusp.txt _now_ if you have not
+done it already.
+
+Now, to use the userland interface for software suspend you need special
+utilities that will read/write the system memory snapshot from/to the
+kernel. Such utilities are available, for example, from
+<http://suspend.sourceforge.net>. You may want to have a look at them if you
+are going to develop your own suspend/resume utilities.
+
+The interface consists of a character device providing the open(),
+release(), read(), and write() operations as well as several ioctl()
+commands defined in include/linux/suspend_ioctls.h . The major and minor
+numbers of the device are, respectively, 10 and 231, and they can
+be read from /sys/class/misc/snapshot/dev.
+
+The device can be open either for reading or for writing. If open for
+reading, it is considered to be in the suspend mode. Otherwise it is
+assumed to be in the resume mode. The device cannot be open for simultaneous
+reading and writing. It is also impossible to have the device open more than
+once at a time.
+
+The ioctl() commands recognized by the device are:
+
+SNAPSHOT_FREEZE - freeze user space processes (the current process is
+ not frozen); this is required for SNAPSHOT_CREATE_IMAGE
+ and SNAPSHOT_ATOMIC_RESTORE to succeed
+
+SNAPSHOT_UNFREEZE - thaw user space processes frozen by SNAPSHOT_FREEZE
+
+SNAPSHOT_CREATE_IMAGE - create a snapshot of the system memory; the
+ last argument of ioctl() should be a pointer to an int variable,
+ the value of which will indicate whether the call returned after
+ creating the snapshot (1) or after restoring the system memory state
+ from it (0) (after resume the system finds itself finishing the
+ SNAPSHOT_CREATE_IMAGE ioctl() again); after the snapshot
+ has been created the read() operation can be used to transfer
+ it out of the kernel
+
+SNAPSHOT_ATOMIC_RESTORE - restore the system memory state from the
+ uploaded snapshot image; before calling it you should transfer
+ the system memory snapshot back to the kernel using the write()
+ operation; this call will not succeed if the snapshot
+ image is not available to the kernel
+
+SNAPSHOT_FREE - free memory allocated for the snapshot image
+
+SNAPSHOT_PREF_IMAGE_SIZE - set the preferred maximum size of the image
+ (the kernel will do its best to ensure the image size will not exceed
+ this number, but if it turns out to be impossible, the kernel will
+ create the smallest image possible)
+
+SNAPSHOT_GET_IMAGE_SIZE - return the actual size of the hibernation image
+
+SNAPSHOT_AVAIL_SWAP_SIZE - return the amount of available swap in bytes (the
+ last argument should be a pointer to an unsigned int variable that will
+ contain the result if the call is successful).
+
+SNAPSHOT_ALLOC_SWAP_PAGE - allocate a swap page from the resume partition
+ (the last argument should be a pointer to a loff_t variable that
+ will contain the swap page offset if the call is successful)
+
+SNAPSHOT_FREE_SWAP_PAGES - free all swap pages allocated by
+ SNAPSHOT_ALLOC_SWAP_PAGE
+
+SNAPSHOT_SET_SWAP_AREA - set the resume partition and the offset (in <PAGE_SIZE>
+ units) from the beginning of the partition at which the swap header is
+ located (the last ioctl() argument should point to a struct
+ resume_swap_area, as defined in kernel/power/suspend_ioctls.h,
+ containing the resume device specification and the offset); for swap
+ partitions the offset is always 0, but it is different from zero for
+ swap files (see Documentation/swsusp-and-swap-files.txt for details).
+
+SNAPSHOT_PLATFORM_SUPPORT - enable/disable the hibernation platform support,
+ depending on the argument value (enable, if the argument is nonzero)
+
+SNAPSHOT_POWER_OFF - make the kernel transition the system to the hibernation
+ state (eg. ACPI S4) using the platform (eg. ACPI) driver
+
+SNAPSHOT_S2RAM - suspend to RAM; using this call causes the kernel to
+ immediately enter the suspend-to-RAM state, so this call must always
+ be preceded by the SNAPSHOT_FREEZE call and it is also necessary
+ to use the SNAPSHOT_UNFREEZE call after the system wakes up. This call
+ is needed to implement the suspend-to-both mechanism in which the
+ suspend image is first created, as though the system had been suspended
+ to disk, and then the system is suspended to RAM (this makes it possible
+ to resume the system from RAM if there's enough battery power or restore
+ its state on the basis of the saved suspend image otherwise)
+
+The device's read() operation can be used to transfer the snapshot image from
+the kernel. It has the following limitations:
+- you cannot read() more than one virtual memory page at a time
+- read()s accross page boundaries are impossible (ie. if ypu read() 1/2 of
+ a page in the previous call, you will only be able to read()
+ _at_ _most_ 1/2 of the page in the next call)
+
+The device's write() operation is used for uploading the system memory snapshot
+into the kernel. It has the same limitations as the read() operation.
+
+The release() operation frees all memory allocated for the snapshot image
+and all swap pages allocated with SNAPSHOT_ALLOC_SWAP_PAGE (if any).
+Thus it is not necessary to use either SNAPSHOT_FREE or
+SNAPSHOT_FREE_SWAP_PAGES before closing the device (in fact it will also
+unfreeze user space processes frozen by SNAPSHOT_UNFREEZE if they are
+still frozen when the device is being closed).
+
+Currently it is assumed that the userland utilities reading/writing the
+snapshot image from/to the kernel will use a swap parition, called the resume
+partition, or a swap file as storage space (if a swap file is used, the resume
+partition is the partition that holds this file). However, this is not really
+required, as they can use, for example, a special (blank) suspend partition or
+a file on a partition that is unmounted before SNAPSHOT_CREATE_IMAGE and
+mounted afterwards.
+
+These utilities MUST NOT make any assumptions regarding the ordering of
+data within the snapshot image. The contents of the image are entirely owned
+by the kernel and its structure may be changed in future kernel releases.
+
+The snapshot image MUST be written to the kernel unaltered (ie. all of the image
+data, metadata and header MUST be written in _exactly_ the same amount, form
+and order in which they have been read). Otherwise, the behavior of the
+resumed system may be totally unpredictable.
+
+While executing SNAPSHOT_ATOMIC_RESTORE the kernel checks if the
+structure of the snapshot image is consistent with the information stored
+in the image header. If any inconsistencies are detected,
+SNAPSHOT_ATOMIC_RESTORE will not succeed. Still, this is not a fool-proof
+mechanism and the userland utilities using the interface SHOULD use additional
+means, such as checksums, to ensure the integrity of the snapshot image.
+
+The suspending and resuming utilities MUST lock themselves in memory,
+preferrably using mlockall(), before calling SNAPSHOT_FREEZE.
+
+The suspending utility MUST check the value stored by SNAPSHOT_CREATE_IMAGE
+in the memory location pointed to by the last argument of ioctl() and proceed
+in accordance with it:
+1. If the value is 1 (ie. the system memory snapshot has just been
+ created and the system is ready for saving it):
+ (a) The suspending utility MUST NOT close the snapshot device
+ _unless_ the whole suspend procedure is to be cancelled, in
+ which case, if the snapshot image has already been saved, the
+ suspending utility SHOULD destroy it, preferrably by zapping
+ its header. If the suspend is not to be cancelled, the
+ system MUST be powered off or rebooted after the snapshot
+ image has been saved.
+ (b) The suspending utility SHOULD NOT attempt to perform any
+ file system operations (including reads) on the file systems
+ that were mounted before SNAPSHOT_CREATE_IMAGE has been
+ called. However, it MAY mount a file system that was not
+ mounted at that time and perform some operations on it (eg.
+ use it for saving the image).
+2. If the value is 0 (ie. the system state has just been restored from
+ the snapshot image), the suspending utility MUST close the snapshot
+ device. Afterwards it will be treated as a regular userland process,
+ so it need not exit.
+
+The resuming utility SHOULD NOT attempt to mount any file systems that could
+be mounted before suspend and SHOULD NOT attempt to perform any operations
+involving such file systems.
+
+For details, please refer to the source code.
diff --git a/Documentation/power/video.txt b/Documentation/power/video.txt
new file mode 100644
index 0000000..2b35849
--- /dev/null
+++ b/Documentation/power/video.txt
@@ -0,0 +1,185 @@
+
+ Video issues with S3 resume
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ 2003-2006, Pavel Machek
+
+During S3 resume, hardware needs to be reinitialized. For most
+devices, this is easy, and kernel driver knows how to do
+it. Unfortunately there's one exception: video card. Those are usually
+initialized by BIOS, and kernel does not have enough information to
+boot video card. (Kernel usually does not even contain video card
+driver -- vesafb and vgacon are widely used).
+
+This is not problem for swsusp, because during swsusp resume, BIOS is
+run normally so video card is normally initialized. It should not be
+problem for S1 standby, because hardware should retain its state over
+that.
+
+We either have to run video BIOS during early resume, or interpret it
+using vbetool later, or maybe nothing is necessary on particular
+system because video state is preserved. Unfortunately different
+methods work on different systems, and no known method suits all of
+them.
+
+Userland application called s2ram has been developed; it contains long
+whitelist of systems, and automatically selects working method for a
+given system. It can be downloaded from CVS at
+www.sf.net/projects/suspend . If you get a system that is not in the
+whitelist, please try to find a working solution, and submit whitelist
+entry so that work does not need to be repeated.
+
+Currently, VBE_SAVE method (6 below) works on most
+systems. Unfortunately, vbetool only runs after userland is resumed,
+so it makes debugging of early resume problems
+hard/impossible. Methods that do not rely on userland are preferable.
+
+Details
+~~~~~~~
+
+There are a few types of systems where video works after S3 resume:
+
+(1) systems where video state is preserved over S3.
+
+(2) systems where it is possible to call the video BIOS during S3
+ resume. Unfortunately, it is not correct to call the video BIOS at
+ that point, but it happens to work on some machines. Use
+ acpi_sleep=s3_bios.
+
+(3) systems that initialize video card into vga text mode and where
+ the BIOS works well enough to be able to set video mode. Use
+ acpi_sleep=s3_mode on these.
+
+(4) on some systems s3_bios kicks video into text mode, and
+ acpi_sleep=s3_bios,s3_mode is needed.
+
+(5) radeon systems, where X can soft-boot your video card. You'll need
+ a new enough X, and a plain text console (no vesafb or radeonfb). See
+ http://www.doesi.gmxhome.de/linux/tm800s3/s3.html for more information.
+ Alternatively, you should use vbetool (6) instead.
+
+(6) other radeon systems, where vbetool is enough to bring system back
+ to life. It needs text console to be working. Do vbetool vbestate
+ save > /tmp/delme; echo 3 > /proc/acpi/sleep; vbetool post; vbetool
+ vbestate restore < /tmp/delme; setfont <whatever>, and your video
+ should work.
+
+(7) on some systems, it is possible to boot most of kernel, and then
+ POSTing bios works. Ole Rohne has patch to do just that at
+ http://dev.gentoo.org/~marineam/patch-radeonfb-2.6.11-rc2-mm2.
+
+(8) on some systems, you can use the video_post utility mentioned here:
+ http://bugzilla.kernel.org/show_bug.cgi?id=3670. Do echo 3 > /sys/power/state
+ && /usr/sbin/video_post - which will initialize the display in console mode.
+ If you are in X, you can switch to a virtual terminal and back to X using
+ CTRL+ALT+F1 - CTRL+ALT+F7 to get the display working in graphical mode again.
+
+Now, if you pass acpi_sleep=something, and it does not work with your
+bios, you'll get a hard crash during resume. Be careful. Also it is
+safest to do your experiments with plain old VGA console. The vesafb
+and radeonfb (etc) drivers have a tendency to crash the machine during
+resume.
+
+You may have a system where none of above works. At that point you
+either invent another ugly hack that works, or write proper driver for
+your video card (good luck getting docs :-(). Maybe suspending from X
+(proper X, knowing your hardware, not XF68_FBcon) might have better
+chance of working.
+
+Table of known working notebooks:
+
+Model hack (or "how to do it")
+------------------------------------------------------------------------------
+Acer Aspire 1406LC ole's late BIOS init (7), turn off DRI
+Acer TM 230 s3_bios (2)
+Acer TM 242FX vbetool (6)
+Acer TM C110 video_post (8)
+Acer TM C300 vga=normal (only suspend on console, not in X), vbetool (6) or video_post (8)
+Acer TM 4052LCi s3_bios (2)
+Acer TM 636Lci s3_bios,s3_mode (4)
+Acer TM 650 (Radeon M7) vga=normal plus boot-radeon (5) gets text console back
+Acer TM 660 ??? (*)
+Acer TM 800 vga=normal, X patches, see webpage (5) or vbetool (6)
+Acer TM 803 vga=normal, X patches, see webpage (5) or vbetool (6)
+Acer TM 803LCi vga=normal, vbetool (6)
+Arima W730a vbetool needed (6)
+Asus L2400D s3_mode (3)(***) (S1 also works OK)
+Asus L3350M (SiS 740) (6)
+Asus L3800C (Radeon M7) s3_bios (2) (S1 also works OK)
+Asus M6887Ne vga=normal, s3_bios (2), use radeon driver instead of fglrx in x.org
+Athlon64 desktop prototype s3_bios (2)
+Compal CL-50 ??? (*)
+Compaq Armada E500 - P3-700 none (1) (S1 also works OK)
+Compaq Evo N620c vga=normal, s3_bios (2)
+Dell 600m, ATI R250 Lf none (1), but needs xorg-x11-6.8.1.902-1
+Dell D600, ATI RV250 vga=normal and X, or try vbestate (6)
+Dell D610 vga=normal and X (possibly vbestate (6) too, but not tested)
+Dell Inspiron 4000 ??? (*)
+Dell Inspiron 500m ??? (*)
+Dell Inspiron 510m ???
+Dell Inspiron 5150 vbetool needed (6)
+Dell Inspiron 600m ??? (*)
+Dell Inspiron 8200 ??? (*)
+Dell Inspiron 8500 ??? (*)
+Dell Inspiron 8600 ??? (*)
+eMachines athlon64 machines vbetool needed (6) (someone please get me model #s)
+HP NC6000 s3_bios, may not use radeonfb (2); or vbetool (6)
+HP NX7000 ??? (*)
+HP Pavilion ZD7000 vbetool post needed, need open-source nv driver for X
+HP Omnibook XE3 athlon version none (1)
+HP Omnibook XE3GC none (1), video is S3 Savage/IX-MV
+HP Omnibook XE3L-GF vbetool (6)
+HP Omnibook 5150 none (1), (S1 also works OK)
+IBM TP T20, model 2647-44G none (1), video is S3 Inc. 86C270-294 Savage/IX-MV, vesafb gets "interesting" but X work.
+IBM TP A31 / Type 2652-M5G s3_mode (3) [works ok with BIOS 1.04 2002-08-23, but not at all with BIOS 1.11 2004-11-05 :-(]
+IBM TP R32 / Type 2658-MMG none (1)
+IBM TP R40 2722B3G ??? (*)
+IBM TP R50p / Type 1832-22U s3_bios (2)
+IBM TP R51 none (1)
+IBM TP T30 236681A ??? (*)
+IBM TP T40 / Type 2373-MU4 none (1)
+IBM TP T40p none (1)
+IBM TP R40p s3_bios (2)
+IBM TP T41p s3_bios (2), switch to X after resume
+IBM TP T42 s3_bios (2)
+IBM ThinkPad T42p (2373-GTG) s3_bios (2)
+IBM TP X20 ??? (*)
+IBM TP X30 s3_bios, s3_mode (4)
+IBM TP X31 / Type 2672-XXH none (1), use radeontool (http://fdd.com/software/radeon/) to turn off backlight.
+IBM TP X32 none (1), but backlight is on and video is trashed after long suspend. s3_bios,s3_mode (4) works too. Perhaps that gets better results?
+IBM Thinkpad X40 Type 2371-7JG s3_bios,s3_mode (4)
+IBM TP 600e none(1), but a switch to console and back to X is needed
+Medion MD4220 ??? (*)
+Samsung P35 vbetool needed (6)
+Sharp PC-AR10 (ATI rage) none (1), backlight does not switch off
+Sony Vaio PCG-C1VRX/K s3_bios (2)
+Sony Vaio PCG-F403 ??? (*)
+Sony Vaio PCG-GRT995MP none (1), works with 'nv' X driver
+Sony Vaio PCG-GR7/K none (1), but needs radeonfb, use radeontool (http://fdd.com/software/radeon/) to turn off backlight.
+Sony Vaio PCG-N505SN ??? (*)
+Sony Vaio vgn-s260 X or boot-radeon can init it (5)
+Sony Vaio vgn-S580BH vga=normal, but suspend from X. Console will be blank unless you return to X.
+Sony Vaio vgn-FS115B s3_bios (2),s3_mode (4)
+Toshiba Libretto L5 none (1)
+Toshiba Libretto 100CT/110CT vbetool (6)
+Toshiba Portege 3020CT s3_mode (3)
+Toshiba Satellite 4030CDT s3_mode (3) (S1 also works OK)
+Toshiba Satellite 4080XCDT s3_mode (3) (S1 also works OK)
+Toshiba Satellite 4090XCDT ??? (*)
+Toshiba Satellite P10-554 s3_bios,s3_mode (4)(****)
+Toshiba M30 (2) xor X with nvidia driver using internal AGP
+Uniwill 244IIO ??? (*)
+
+Known working desktop systems
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Mainboard Graphics card hack (or "how to do it")
+------------------------------------------------------------------------------
+Asus A7V8X nVidia RIVA TNT2 model 64 s3_bios,s3_mode (4)
+
+
+(*) from http://www.ubuntulinux.org/wiki/HoaryPMResults, not sure
+ which options to use. If you know, please tell me.
+
+(***) To be tested with a newer kernel.
+
+(****) Not with SMP kernel, UP only.
diff --git a/Documentation/power/video_extension.txt b/Documentation/power/video_extension.txt
new file mode 100644
index 0000000..b2f9b15
--- /dev/null
+++ b/Documentation/power/video_extension.txt
@@ -0,0 +1,37 @@
+ACPI video extensions
+~~~~~~~~~~~~~~~~~~~~~
+
+This driver implement the ACPI Extensions For Display Adapters for
+integrated graphics devices on motherboard, as specified in ACPI 2.0
+Specification, Appendix B, allowing to perform some basic control like
+defining the video POST device, retrieving EDID information or to
+setup a video output, etc. Note that this is an ref. implementation
+only. It may or may not work for your integrated video device.
+
+Interfaces exposed to userland through /proc/acpi/video:
+
+VGA/info : display the supported video bus device capability like Video ROM, CRT/LCD/TV.
+VGA/ROM : Used to get a copy of the display devices' ROM data (up to 4k).
+VGA/POST_info : Used to determine what options are implemented.
+VGA/POST : Used to get/set POST device.
+VGA/DOS : Used to get/set ownership of output switching:
+ Please refer ACPI spec B.4.1 _DOS
+VGA/CRT : CRT output
+VGA/LCD : LCD output
+VGA/TVO : TV output
+VGA/*/brightness : Used to get/set brightness of output device
+
+Notify event through /proc/acpi/event:
+
+#define ACPI_VIDEO_NOTIFY_SWITCH 0x80
+#define ACPI_VIDEO_NOTIFY_PROBE 0x81
+#define ACPI_VIDEO_NOTIFY_CYCLE 0x82
+#define ACPI_VIDEO_NOTIFY_NEXT_OUTPUT 0x83
+#define ACPI_VIDEO_NOTIFY_PREV_OUTPUT 0x84
+
+#define ACPI_VIDEO_NOTIFY_CYCLE_BRIGHTNESS 0x82
+#define ACPI_VIDEO_NOTIFY_INC_BRIGHTNESS 0x83
+#define ACPI_VIDEO_NOTIFY_DEC_BRIGHTNESS 0x84
+#define ACPI_VIDEO_NOTIFY_ZERO_BRIGHTNESS 0x85
+#define ACPI_VIDEO_NOTIFY_DISPLAY_OFF 0x86
+
OpenPOWER on IntegriCloud