summaryrefslogtreecommitdiffstats
path: root/Documentation/power/devices.txt
blob: f987afe43e28e1327ee76ab4f3fdb617abfe0829 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298

Device Power Management


Device power management encompasses two areas - the ability to save
state and transition a device to a low-power state when the system is
entering a low-power state; and the ability to transition a device to
a low-power state while the system is running (and independently of
any other power management activity). 


Methods

The methods to suspend and resume devices reside in struct bus_type: 

struct bus_type {
       ...
       int             (*suspend)(struct device * dev, pm_message_t state);
       int             (*resume)(struct device * dev);
};

Each bus driver is responsible implementing these methods, translating
the call into a bus-specific request and forwarding the call to the
bus-specific drivers. For example, PCI drivers implement suspend() and
resume() methods in struct pci_driver. The PCI core is simply
responsible for translating the pointers to PCI-specific ones and
calling the low-level driver.

This is done to a) ease transition to the new power management methods
and leverage the existing PM code in various bus drivers; b) allow
buses to implement generic and default PM routines for devices, and c)
make the flow of execution obvious to the reader. 


System Power Management

When the system enters a low-power state, the device tree is walked in
a depth-first fashion to transition each device into a low-power
state. The ordering of the device tree is guaranteed by the order in
which devices get registered - children are never registered before
their ancestors, and devices are placed at the back of the list when
registered. By walking the list in reverse order, we are guaranteed to
suspend devices in the proper order. 

Devices are suspended once with interrupts enabled. Drivers are
expected to stop I/O transactions, save device state, and place the
device into a low-power state. Drivers may sleep, allocate memory,
etc. at will. 

Some devices are broken and will inevitably have problems powering
down or disabling themselves with interrupts enabled. For these
special cases, they may return -EAGAIN. This will put the device on a
list to be taken care of later. When interrupts are disabled, before
we enter the low-power state, their drivers are called again to put
their device to sleep. 

On resume, the devices that returned -EAGAIN will be called to power
themselves back on with interrupts disabled. Once interrupts have been
re-enabled, the rest of the drivers will be called to resume their
devices. On resume, a driver is responsible for powering back on each
device, restoring state, and re-enabling I/O transactions for that
device. 

System devices follow a slightly different API, which can be found in

	include/linux/sysdev.h
	drivers/base/sys.c

System devices will only be suspended with interrupts disabled, and
after all other devices have been suspended. On resume, they will be
resumed before any other devices, and also with interrupts disabled.


Runtime Power Management

Many devices are able to dynamically power down while the system is
still running. This feature is useful for devices that are not being
used, and can offer significant power savings on a running system. 

In each device's directory, there is a 'power' directory, which
contains at least a 'state' file. Reading from this file displays what
power state the device is currently in. Writing to this file initiates
a transition to the specified power state, which must be a decimal in
the range 1-3, inclusive; or 0 for 'On'.

The PM core will call the ->suspend() method in the bus_type object
that the device belongs to if the specified state is not 0, or
->resume() if it is. 

Nothing will happen if the specified state is the same state the
device is currently in. 

If the device is already in a low-power state, and the specified state
is another, but different, low-power state, the ->resume() method will
first be called to power the device back on, then ->suspend() will be
called again with the new state. 

The driver is responsible for saving the working state of the device
and putting it into the low-power state specified. If this was
successful, it returns 0, and the device's power_state field is
updated. 

The driver must take care to know whether or not it is able to
properly resume the device, including all step of reinitialization
necessary. (This is the hardest part, and the one most protected by
NDA'd documents). 

The driver must also take care not to suspend a device that is
currently in use. It is their responsibility to provide their own
exclusion mechanisms.

The runtime power transition happens with interrupts enabled. If a
device cannot support being powered down with interrupts, it may
return -EAGAIN (as it would during a system power management
transition),  but it will _not_ be called again, and the transaction
will fail.

There is currently no way to know what states a device or driver
supports a priori. This will change in the future. 

pm_message_t meaning

pm_message_t has two fields. event ("major"), and flags.  If driver
does not know event code, it aborts the request, returning error. Some
drivers may need to deal with special cases based on the actual type
of suspend operation being done at the system level. This is why
there are flags.

Event codes are:

ON -- no need to do anything except special cases like broken
HW.

# NOTIFICATION -- pretty much same as ON?

FREEZE -- stop DMA and interrupts, and be prepared to reinit HW from
scratch. That probably means stop accepting upstream requests, the
actual policy of what to do with them beeing specific to a given
driver. It's acceptable for a network driver to just drop packets
while a block driver is expected to block the queue so no request is
lost. (Use IDE as an example on how to do that). FREEZE requires no
power state change, and it's expected for drivers to be able to
quickly transition back to operating state.

SUSPEND -- like FREEZE, but also put hardware into low-power state. If
there's need to distinguish several levels of sleep, additional flag
is probably best way to do that.

Transitions are only from a resumed state to a suspended state, never
between 2 suspended states. (ON -> FREEZE or ON -> SUSPEND can happen,
FREEZE -> SUSPEND or SUSPEND -> FREEZE can not).

All events are:

[NOTE NOTE NOTE: If you are driver author, you should not care; you
should only look at event, and ignore flags.]

#Prepare for suspend -- userland is still running but we are going to
#enter suspend state. This gives drivers chance to load firmware from
#disk and store it in memory, or do other activities taht require
#operating userland, ability to kmalloc GFP_KERNEL, etc... All of these
#are forbiden once the suspend dance is started.. event = ON, flags =
#PREPARE_TO_SUSPEND

Apm standby -- prepare for APM event. Quiesce devices to make life
easier for APM BIOS. event = FREEZE, flags = APM_STANDBY

Apm suspend -- same as APM_STANDBY, but it we should probably avoid
spinning down disks. event = FREEZE, flags = APM_SUSPEND

System halt, reboot -- quiesce devices to make life easier for BIOS. event
= FREEZE, flags = SYSTEM_HALT or SYSTEM_REBOOT

System shutdown -- at least disks need to be spun down, or data may be
lost. Quiesce devices, just to make life easier for BIOS. event =
FREEZE, flags = SYSTEM_SHUTDOWN

Kexec    -- turn off DMAs and put hardware into some state where new
kernel can take over. event = FREEZE, flags = KEXEC

Powerdown at end of swsusp -- very similar to SYSTEM_SHUTDOWN, except wake
may need to be enabled on some devices. This actually has at least 3
subtypes, system can reboot, enter S4 and enter S5 at the end of
swsusp. event = FREEZE, flags = SWSUSP and one of SYSTEM_REBOOT,
SYSTEM_SHUTDOWN, SYSTEM_S4

Suspend to ram  -- put devices into low power state. event = SUSPEND,
flags = SUSPEND_TO_RAM

Freeze for swsusp snapshot -- stop DMA and interrupts. No need to put
devices into low power mode, but you must be able to reinitialize
device from scratch in resume method. This has two flavors, its done
once on suspending kernel, once on resuming kernel. event = FREEZE,
flags = DURING_SUSPEND or DURING_RESUME

Device detach requested from /sys -- deinitialize device; proably same as
SYSTEM_SHUTDOWN, I do not understand this one too much. probably event
= FREEZE, flags = DEV_DETACH.

#These are not really events sent:
#
#System fully on -- device is working normally; this is probably never
#passed to suspend() method... event = ON, flags = 0
#
#Ready after resume -- userland is now running, again. Time to free any
#memory you ate during prepare to suspend... event = ON, flags =
#READY_AFTER_RESUME
#


pm_message_t meaning

pm_message_t has two fields. event ("major"), and flags.  If driver
does not know event code, it aborts the request, returning error. Some
drivers may need to deal with special cases based on the actual type
of suspend operation being done at the system level. This is why
there are flags.

Event codes are:

ON -- no need to do anything except special cases like broken
HW.

# NOTIFICATION -- pretty much same as ON?

FREEZE -- stop DMA and interrupts, and be prepared to reinit HW from
scratch. That probably means stop accepting upstream requests, the
actual policy of what to do with them being specific to a given
driver. It's acceptable for a network driver to just drop packets
while a block driver is expected to block the queue so no request is
lost. (Use IDE as an example on how to do that). FREEZE requires no
power state change, and it's expected for drivers to be able to
quickly transition back to operating state.

SUSPEND -- like FREEZE, but also put hardware into low-power state. If
there's need to distinguish several levels of sleep, additional flag
is probably best way to do that.

Transitions are only from a resumed state to a suspended state, never
between 2 suspended states. (ON -> FREEZE or ON -> SUSPEND can happen,
FREEZE -> SUSPEND or SUSPEND -> FREEZE can not).

All events are:

[NOTE NOTE NOTE: If you are driver author, you should not care; you
should only look at event, and ignore flags.]

#Prepare for suspend -- userland is still running but we are going to
#enter suspend state. This gives drivers chance to load firmware from
#disk and store it in memory, or do other activities taht require
#operating userland, ability to kmalloc GFP_KERNEL, etc... All of these
#are forbiden once the suspend dance is started.. event = ON, flags =
#PREPARE_TO_SUSPEND

Apm standby -- prepare for APM event. Quiesce devices to make life
easier for APM BIOS. event = FREEZE, flags = APM_STANDBY

Apm suspend -- same as APM_STANDBY, but it we should probably avoid
spinning down disks. event = FREEZE, flags = APM_SUSPEND

System halt, reboot -- quiesce devices to make life easier for BIOS. event
= FREEZE, flags = SYSTEM_HALT or SYSTEM_REBOOT

System shutdown -- at least disks need to be spun down, or data may be
lost. Quiesce devices, just to make life easier for BIOS. event =
FREEZE, flags = SYSTEM_SHUTDOWN

Kexec    -- turn off DMAs and put hardware into some state where new
kernel can take over. event = FREEZE, flags = KEXEC

Powerdown at end of swsusp -- very similar to SYSTEM_SHUTDOWN, except wake
may need to be enabled on some devices. This actually has at least 3
subtypes, system can reboot, enter S4 and enter S5 at the end of
swsusp. event = FREEZE, flags = SWSUSP and one of SYSTEM_REBOOT,
SYSTEM_SHUTDOWN, SYSTEM_S4

Suspend to ram  -- put devices into low power state. event = SUSPEND,
flags = SUSPEND_TO_RAM

Freeze for swsusp snapshot -- stop DMA and interrupts. No need to put
devices into low power mode, but you must be able to reinitialize
device from scratch in resume method. This has two flavors, its done
once on suspending kernel, once on resuming kernel. event = FREEZE,
flags = DURING_SUSPEND or DURING_RESUME

Device detach requested from /sys -- deinitialize device; proably same as
SYSTEM_SHUTDOWN, I do not understand this one too much. probably event
= FREEZE, flags = DEV_DETACH.

#These are not really events sent:
#
#System fully on -- device is working normally; this is probably never
#passed to suspend() method... event = ON, flags = 0
#
#Ready after resume -- userland is now running, again. Time to free any
#memory you ate during prepare to suspend... event = ON, flags =
#READY_AFTER_RESUME
#
OpenPOWER on IntegriCloud