summaryrefslogtreecommitdiffstats
path: root/drivers/block
Commit message (Collapse)AuthorAgeFilesLines
* Fix common misspellingsLucas De Marchi2011-03-319-17/+17
| | | | | | Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
* drbd: fix up merge errorLinus Torvalds2011-03-281-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | In commit 95a0f10cddbf ("drbd: store in-core bitmap little endian, regardless of architecture") drbd had made the sane choice to use little-endian bitmap functions everywhere. However, it used the horrible old functions names from <asm-generic/bitops/le.h>, that were never really meant to be exported. In the meantime, things got cleaned up, and in commit c4945b9ed472 ("asm-generic: rename generic little-endian bitops functions") we renamed the LE bitops to something sane, exactly so that they could be used in random code without people gouging their eyes out when seeing the crazy jumble of letters that were the old internal names. As a result the drbd thing merged cleanly (commit 8d49a77568d1: "Merge branch 'for-2.6.39/drivers' of git://git.kernel.dk/linux-2.6-block"), since there was no data conflict - but the end result obviously doesn't actually compile. Reported-and-tested-by: Ingo Molnar <mingo@elte.hu> Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-2.6.39/drivers' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2011-03-2716-1395/+2214
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-2.6.39/drivers' of git://git.kernel.dk/linux-2.6-block: (122 commits) cciss: fix lost command issue drbd: need include for bitops functions declarations Revert "cciss: Add missing allocation in scsi_cmd_stack_setup and corresponding deallocation" cciss: fix missed command status value CMD_UNABORTABLE cciss: remove unnecessary casts cciss: Mask off error bits of c->busaddr in cmd_special_free when calling pci_free_consistent cciss: Inform controller we are using 32-bit tags. cciss: hoist tag masking out of loop cciss: Add missing allocation in scsi_cmd_stack_setup and corresponding deallocation cciss: export resettable host attribute drbd: drop code present under #ifdef which is relevant to 2.6.28 and below drbd: Fixed handling of read errors on a 'VerifyS' node drbd: Fixed handling of read errors on a 'VerifyT' node drbd: Implemented real timeout checking for request processing time drbd: Remove unused function atodb_endio() drbd: improve log message if received sector offset exceeds local capacity drbd: kill dead code drbd: don't BUG_ON, if bio_add_page of a single page to an empty bio fails drbd: Removed left over, now wrong comments drbd: serialize admin requests for new verify run with pending bitmap io ...
| * cciss: fix lost command issueBud Brown2011-03-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Under certain workloads a command may seem to get lost. IOW, the Smart Array thinks all commands have been completed but we still have commands in our completion queue. This may lead to system instability, filesystems going read-only, or even panics depending on the affected filesystem. We add an extra read to force the write to complete. Testing shows this extra read avoids the problem. Signed-off-by: Mike Miller <mike.miller@hp.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * drbd: need include for bitops functions declarationsStephen Rothwell2011-03-171-0/+3
| | | | | | | | | | Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * Revert "cciss: Add missing allocation in scsi_cmd_stack_setup and ↵Jens Axboe2011-03-121-9/+0
| | | | | | | | | | | | | | | | | | | | | | corresponding deallocation" This reverts commit 978eb516a4e1a1b47163518d6f5d5e81ab27a583. The commit was broken, relying on other changes that have not been committed yet. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * cciss: fix missed command status value CMD_UNABORTABLEStephen M. Cameron2011-03-122-2/+22
| | | | | | | | | | | | | | and fix a nearby typo, "do" that should have been "due" Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * cciss: remove unnecessary castsStephen M. Cameron2011-03-121-2/+1
| | | | | | | | | | Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * cciss: Mask off error bits of c->busaddr in cmd_special_free when calling ↵Stephen M. Cameron2011-03-121-3/+3
| | | | | | | | | | | | | | pci_free_consistent Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * cciss: Inform controller we are using 32-bit tags.Stephen M. Cameron2011-03-122-10/+16
| | | | | | | | | | | | | | | | Controller will DMA only 32-bits of the tag per command on completion if it knows we are only using 32-bit tags. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * cciss: hoist tag masking out of loopStephen M. Cameron2011-03-121-3/+1
| | | | | | | | | | | | | | | | In process_nonindexed_cmd, hoist figuring of masked tag out of loop since it is the same throughout. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * cciss: Add missing allocation in scsi_cmd_stack_setup and corresponding ↵Stephen M. Cameron2011-03-111-0/+9
| | | | | | | | | | | | | | | | | | | | deallocation This bit got lost somewhere along the way. Without this, panic. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * cciss: export resettable host attributeStephen M. Cameron2011-03-111-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | This attribute, requested by Redhat, allows kexec-tools to know whether the controller can honor the reset_devices kernel parameter and actually reset the controller. For kdump to work properly it is necessary that the reset_devices parameter be honored. This attribute enables kexec-tools to warn the user if they attempt to designate a non-resettable controller as the dump device. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * drbd: drop code present under #ifdef which is relevant to 2.6.28 and belowOr Gerlitz2011-03-101-5/+1
| | | | | | | | | | | | Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Fixed handling of read errors on a 'VerifyS' nodePhilipp Reisner2011-03-101-4/+0
| | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Fixed handling of read errors on a 'VerifyT' nodePhilipp Reisner2011-03-101-13/+15
| | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Implemented real timeout checking for request processing timePhilipp Reisner2011-03-105-0/+47
| | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Remove unused function atodb_endio()Andreas Gruenbacher2011-03-102-36/+6
| | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: improve log message if received sector offset exceeds local capacityLars Ellenberg2011-03-101-1/+2
| | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: kill dead codeLars Ellenberg2011-03-101-93/+0
| | | | | | | | | | | | | | | | This code became obsolete and unused last December with drbd: bitmap keep track of changes vs on-disk bitmap Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: don't BUG_ON, if bio_add_page of a single page to an empty bio failsLars Ellenberg2011-03-102-18/+34
| | | | | | | | | | | | | | | | | | Just deal with it more gracefully, if we fail to add even a single page to an empty bio. We used to BUG_ON() there, but it has been observed in some Xen deployment, so we need to handle that case more robustly now. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Removed left over, now wrong commentsPhilipp Reisner2011-03-101-7/+1
| | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: serialize admin requests for new verify run with pending bitmap ioLars Ellenberg2011-03-101-0/+5
| | | | | | | | | | | | | | | | | | | | This is an addendum to drbd: serialize admin requests for new resync with pending bitmap io It avoids a race that could trigger "FIXME" assert log messages. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: fix potential imbalance of ap_in_flightLars Ellenberg2011-03-102-29/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we receive a barrier ack, we walk the ring list of drbd requests in the transfer log of the respective epoch, do some housekeeping, and free those objects. We tried to keep epochs of mirrored and unmirrored drbd requests separate, and assert that no local-only requests are present in a barrier_acked epoch. It turns out that this has quite a number of corner cases and would add bloated code without functional benefit. We now revert the (insufficient) commits drbd: Fixed an issue with AHEAD -> SYNC_SOURCE transitions drbd: Ensure that an epoch contains only requests of one kind and instead fix the processing of barrier acks to cope with a mix of local-only and mirrored requests. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: silence some noisy log messages during disconnectLars Ellenberg2011-03-102-20/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we fail to send the information that we lost our disk, we have no connection, and no disk: no access to data anymore. That is either expected (deconfiguration), or there will be so much noise in the logs that "Sending state failed" is not useful at all. Drop it. If the reason for a shorter than expected receive was a signal, which we sent because we already decided to disconnect, these additional log messages are confusing and useless. This patch follows this pattern: - dev_warn(DEV, "short read expecting header on sock: r=%d\n", r); + if (!signal_pending(current)) + dev_warn(DEV, "short read expecting header on sock: r=%d\n", r); Also make them all dev_warn for consistency. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: describe bitmap locking for bulk operation in finer detailLars Ellenberg2011-03-105-63/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we do no longer in-place endian-swap the bitmap, we allow selected bitmap operations (testing bits, sometimes even settting bits) during some bulk operations. This caused us to hit a lot of FIXME asserts similar to FIXME asender in drbd_bm_count_bits, bitmap locked for 'write from resync_finished' by worker Which now is nonsense: looking at the bitmap is perfectly legal as long as it is not being resized. This cosmetic patch defines some flags to describe expectations in finer detail, so the asserts in e.g. bm_change_bits_to() can be skipped if appropriate. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: log UUIDs whenever they changeLars Ellenberg2011-03-105-51/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | All decisions about sync, sync direction, and wether or not to allow a connect or attach are based on our set of UUIDs to tag a data generation. Log changes to the UUIDs whenever they occur, logging "new current UUID P:Q:R:S" is more useful than "Creating new current UUID". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: We can not process BIOs with a size of 0Philipp Reisner2011-03-101-0/+1
| | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Provide hints with the error message when clearing the sync pause flagPhilipp Reisner2011-03-101-2/+10
| | | | | | | | | | | | | | | | When the user clears the sync-pause flag, and sync stays in pause state, give hints to the user, why it still is in pause state. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: queue bitmap writeout more intelligentlyLars Ellenberg2011-03-102-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "lazy writeout" of cleared bitmap pages happens during resync, and should happen again once the resync finishes cleanly, or is aborted. If resync finished cleanly, or was aborted because of peer disk failure, we trigger the writeout from worker context in the after state change work. If resync was aborted because of connection failure, we should not immediately trigger bitmap writeout, but rather postpone the writeout to after the connection cleanup happened. We now do it in the receiver context from drbd_disconnect(). If resync was aborted because of local disk failure, well, there is nothing to write to anymore. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: don't pointlessly queue bitmap send, if we lost connectionLars Ellenberg2011-03-101-2/+7
| | | | | | | | | | | | | | | | | | This is a minor optimization and cleanup, and also considerably reduces some harmless (but noisy) race with the connection cleanup code. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: serialize admin requests for new resync with pending bitmap ioLars Ellenberg2011-03-101-1/+8
| | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: only generate and send a new sync uuid after a successful state changeLars Ellenberg2011-03-101-13/+12
| | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: cleaned up __set_current_state() followed by schedule_timeout() callsPhilipp Reisner2011-03-103-10/+5
| | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Ensure that an epoch contains only requests of one kindPhilipp Reisner2011-03-103-26/+28
| | | | | | | | | | | | | | | | | | The assert in drbd_req.c:755 forces us to have only requests of one kind in an epoch. The two kinds we distinguish here are: local-only or mirrored. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Fixed P_NEG_ACK processing for protocol A and BPhilipp Reisner2011-03-101-12/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | Protocol A has no P_WRITE_ACKs, but has P_NEG_ACKs. The master bio might already be completed, therefore the request is no longer in the collision hash. => Do not try to validate block_id as request In Protocol B we might already have got a P_RECV_ACK but then get a P_NEG_ACK after wards. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Killed an assert that is no longer validPhilipp Reisner2011-03-101-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The point is that drbd_disconnect() can be called with a cstate of WFConnection. That happens if the user issues "drbdsetup disconnect" while the drbd_connect() function executes. Then drbdd_init() will call drbdd(), which in turn will return without receiving any packets. Then drbdd_init() will end up calling drbd_disconnect() with a cstate of WFConnection. Bottom line: This assertion is wrong as it is, and we do not see value in fixing it. => Removing it. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Do not drop net config if sending in drbd_send_protocol() failsPhilipp Reisner2011-03-102-2/+2
| | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Work on the Ahead -> SyncSource transitionPhilipp Reisner2011-03-104-6/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The test if rs_pending_cnt == 0 was too weak. Using Test for unacked_cnt == 0 instead. Moved that into the worker. Since unacked_cnt gets already increased when an P_RS_DATA_REQ comes in. Also using a timer to make Ahead -> SyncSource -> Ahead cycles slower... Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Nothing should stop SyncSource -> Ahead transitionsPhilipp Reisner2011-03-101-1/+1
| | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Do not full sync if a P_SYNC_UUID packet gets lostPhilipp Reisner2011-03-103-15/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | See also commit from 2009-08-15 "drbd_uuid_compare(): Do not full sync in case a P_SYNC_UUID packet gets lost." We saw cases where the History UUIDs where not as expected. So the detection of the special case did not trigger. With the sync UUID no longer being a random number, but deducible from the previous bitmap UUID, the detection of this special case becomes more reliable. The SyncUUID now is the previous bitmap UUID + 0x1000000000000. Rule 5a: Cs = H1p & H1p + Offset = Bp Connection was lost before SyncUUID Packet came through. Corrent (peer) UUIDs: Bp = H1p H1p = H2p H2p = 0 Become Sync target. Rule 7a: Cp = H1s & H1s + Offset = Bs Connection was lost before SyncUUID Packet came through. Correct (own) UUIDs: Bs = H1s H1s = H2s H2s = 0 Become Sync source. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Corrected off-by-one error in DRBD_MINOR_COUNT_MAXPhilipp Reisner2011-03-101-3/+4
| | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Remove useless / wrong commentsAndreas Gruenbacher2011-03-101-10/+0
| | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Cleaned up the resync timer logicPhilipp Reisner2011-03-103-39/+13
| | | | | | | | | | | | | | | | | | Besides removed a few lines of code, this moves the inspection of the state from before the queuing process to after the queuing. I.e. more closely to the actual invocation of the work. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Be more careful with SyncSource -> Ahead transitionsPhilipp Reisner2011-03-102-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | We may not get from SyncSource to Ahead if we have sent some P_RS_DATA_REPLY packets to the peer and are waiting for P_WRITE_ACK. Again, this is not relevant for proper tuned systems, but makes sure that the not-tuned system does not get diverging bitmaps. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: No longer answer P_RS_DATA_REQUEST packets when in C_AHEAD modePhilipp Reisner2011-03-103-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the sync source node replies to a P_RS_DATA_REQUEST packet when it is already in ahead mode. I.e. those two packets crossed each other on the wire, that may lead to diverging bitmaps. This never happens in a well-tuned-system. In a well-tuned- system the resync controller has reduced the resync speed to zero long before we got into ahead-mode. But we have to be prepared for the not-well-tuned-system of course as well. Because -> diverging bitmaps = non terminating resync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Fixed an issue with AHEAD -> SYNC_SOURCE transitionsPhilipp Reisner2011-03-101-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | Create a new barrier when leaving the AHEAD mode. Otherwise we trigger the assertion in req_mod(, barrier_acked) D_ASSERT(req->rq_state & RQ_NET_SENT); The new barrier is created by recycling the newest existing one. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: ratelimit io error messagesLars Ellenberg2011-03-101-4/+5
| | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: There might be a resync after unfreezing IO due to no disk [Bugz 332]Philipp Reisner2011-03-101-7/+5
| | | | | | | | | | | | | | | | When on-no-data-accessible is set to suspend-io, also consider that a Primary, SyncTarget node losses its connection. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: fix potential access of on-stack wait_queue_head_t after returnLars Ellenberg2011-03-101-16/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I run into something declaring itself as "spinlock deadlock", BUG: spinlock lockup on CPU#1, kjournald/27816, ffff88000ad6bca0 Pid: 27816, comm: kjournald Tainted: G W 2.6.34.6 #2 Call Trace: <IRQ> [<ffffffff811ba0aa>] do_raw_spin_lock+0x11e/0x14d [<ffffffff81340fde>] _raw_spin_lock_irqsave+0x6a/0x81 [<ffffffff8103b694>] ? __wake_up+0x22/0x50 [<ffffffff8103b694>] __wake_up+0x22/0x50 [<ffffffffa07ff661>] bm_async_io_complete+0x258/0x299 [drbd] but the call traces do not fit at all, all other cpus are cpu_idle. I think it may be this race: drbd_bm_write_page wait_queue_head_t io_wait; atomic_t in_flight; bm_async_io submit_bio bm_async_io_complete if (atomic_dec_and_test(in_flight)) wait_event(io_wait, atomic_read(in_flight) == 0) return wake_up(io_wait) The wake_up now accesses the wait_queue_head_t spinlock, which is no longer valid, since the stack frame of drbd_bm_write_page has been clobbered now. Fix this by using struct completion, which does both the condition test as well as the wake_up inside its spinlock, so this race cannot happen. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
OpenPOWER on IntegriCloud