summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* blkio: Add io controller stats likeDivyesh Shah2010-04-023-25/+194
| | | | | | | | | | | | | | - io_service_time - io_wait_time - io_serviced - io_service_bytes These stats are accumulated per operation type helping us to distinguish between read and write, and sync and async IO. This patch does not increment any of these stats. Signed-off-by: Divyesh Shah<dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* blkio: Remove per-cfqq nr_sectors as we'll be passingDivyesh Shah2010-04-023-14/+5
| | | | | | | | | that info at request dispatch with other stats now. This patch removes the existing support for accounting sectors for a blkio_group. This will be added back differently in the next two patches. Signed-off-by: Divyesh Shah<dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* Merge branch 'for-linus' into for-2.6.35Jens Axboe2010-04-0233-176/+461
|\
| * Block: Fix block/elevator.c elevator_get() off-by-one errorwzt.wzt@gmail.com2010-04-021-1/+1
| | | | | | | | | | | | | | | | | | elevator_get() not check the name length, if the name length > sizeof(elv), elv will miss the '\0'. And elv buffer will be replace "-iosched" as something like aaaaaaaaa, then call request_module() can load an not trust module. Signed-off-by: Zhitong Wang <zhitong.wangzt@alibaba-inc.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * drbd: lc_element_by_index() never returns NULLPhilipp Reisner2010-04-021-1/+1
| | | | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * cciss: unlock on error pathDan Carpenter2010-04-021-0/+1
| | | | | | | | | | | | | | | | We take the spin_lock again in fail_all_cmds() so we need to unlock here. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * cfq-iosched: Do not merge queues of BE and IDLE classesDivyesh Shah2010-03-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Even if they are found to be co-operating. The prio_trees do not have any IDLE cfqqs on them. cfq_close_cooperator() is called from cfq_select_queue() and cfq_completed_request(). The latter ensures that the close cooperator code does not get invoked if the current cfqq is of class IDLE but the former doesn't seem to have any such checks. So an IDLE cfqq may get merged with a BE cfqq from the same group which should be avoided. Signed-off-by: Divyesh Shah<dpshah@google.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * cfq-iosched: Add additional blktrace log messages in CFQ for easier debuggingDivyesh Shah2010-03-251-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These have helped us debug some issues we've noticed in earlier IO controller versions and should be useful now as well. The extra logging covers: - idling behavior. Since there are so many conditions based on which we decide to idle or not, this patch adds a log message for some conditions that we've found useful. - workload slices and current prio and workload type Changelog from v1: o moved log message from cfq_set_active_queue() to __cfq_set_active_queue() o changed queue_count to st->count Signed-off-by: Divyesh Shah<dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * i2o: Remove the dangerous kobj_to_i2o_device macroFerenc Wagner2010-03-241-1/+0
| | | | | | | | | | | | | | | | | | | | This macro worked only when applied to variables named 'kobj'. While this could have been fixed by simply renaming the macro argument, a more type-safe replacement by an inline function would be preferred. However, nobody uses this macro, so it's simpler to just remove it. Signed-off-by: Ferenc Wagner <wferi@niif.hu> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * block: remove 16 bytes of padding from struct request on 64bitsRichard Kennedy2010-03-191-5/+6
| | | | | | | | | | | | | | | | | | Remove alignment padding to shrink struct request from 336 to 320 bytes so needing one fewer cacheline and therefore removing 48 bytes from struct request_queue. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * Merge branch 'master' into for-linusJens Axboe2010-03-191756-49229/+63811
| |\ | | | | | | | | | | | | | | | | | | Conflicts: block/Kconfig Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | cfq-iosched: fix a kbuild regressionShaohua Li2010-03-191-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alex Shi reported a kbuild regression which is about 10% performance lost. He bisected to this commit: 3dde36ddea3e07dd025c4c1ba47edec91606fec0. The reason is cfqq_close() can't find close cooperator. Restoring cfq_rq_close()'s threshold to original value makes the regression go away. Since for_preempt parameter isn't used anymore, this patch deletes it. Reported-by: Alex Shi <alex.shi@intel.com> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | block: make CONFIG_BLK_CGROUP visibleLi Zefan2010-03-161-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Make the config visible, so we can choose from CONFIG_BLK_CGROUP=y and CONFIG_BLK_CGROUP=m when CONFIG_IOSCHED_CFQ=m. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | Remove GENHD_FL_DRIVERFSNeilBrown2010-03-162-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This flag is not used, so best discarded. Signed-off-by: NeilBrown <neilb@suse.de> -- Hi Jens, I came across this recently - these are the only two occurances of "GENHD_FL_DRIVERFS" in the kernel, so it cannot be needed. NeilBrown Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | block: Export max number of segments and max segment size in sysfsMartin K. Petersen2010-03-151-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | These two values are useful when debugging issues surrounding maximum I/O size. Put them in sysfs with the rest of the queue limits. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | block: Finalize conversion of block limits functionsMartin K. Petersen2010-03-153-28/+2
| | | | | | | | | | | | | | | | | | | | | Remove compatibility wrappers and update remaining drivers. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | block: Fix overrun in lcm() and move it to libMartin K. Petersen2010-03-154-11/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lcm() was defined to take integer-sized arguments. The supplied arguments are multiplied, however, causing us to overflow given sufficiently large input. That in turn led to incorrect optimal I/O size reporting in some cases (RAID over RAID). Switch lcm() over to unsigned long similar to gcd() and move the function from blk-settings.c to lib. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | vfs: improve writeback_inodes_wb()Edward Shishkin2010-03-122-60/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Do not pin/unpin superblock for every inode in writeback_inodes_wb(), pin it for the whole group of inodes which belong to the same superblock and call writeback_sb_inodes() handler for them. Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | paride: fix off-by-one testRoel Kluin2010-03-123-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With `while (j++ < PX_SPIN)' j reaches PX_SPIN + 1 after the loop. This is probably unlikely to produce a problem. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | drbd: fix al-to-on-disk-bitmap for 4k logical_block_sizeLars Ellenberg2010-03-111-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up to now, applying the in-core activity-log to the on-disk bitmap did not care for logical_block_size. On logical_block_size != 512 byte, this very likely results in misalligned block access and spurious "io errors". We now simply always submit aligned whole 4k blocks, fixing this for logical block sizes of 512, 1024, 2048 and 4096. For even larger logical block sizes, this won't work. But I'm not aware of devices with such properties being available. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | drbd: Renamed overwrite_peer to primary_forcePhilipp Reisner2010-03-112-2/+2
| | | | | | | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | drbd: Forcing primary should also work for Consistent disks [Bugz 266]Philipp Reisner2010-03-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Up to now this only worked for Outdated and Inconsistent disks, that it did not worked for Consistent disks was an inconsistent omission. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | drbd: Make sure we do not send state updates during an empty resync [Bugz 271]Philipp Reisner2010-03-111-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | This is a race condition that existed for ages. The previous commit reduces the window, this one closes it. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | drbd: Reduce the time an empty resync takes usuallyPhilipp Reisner2010-03-113-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | This mitigates changes introduced with commit: http://git.drbd.org/?p=drbd-8.3.git;a=commit;h=4b6803a3276652da3737 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | drbd: add missing drbd command names to avoid <NULL> in error messagesLars Ellenberg2010-03-111-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cmdname() should map command number to its human readable representation. The string table was incomplete, though. Maybe rather do a switch() block, and let the compiler help us to keep it complete? Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | drbd_disconnect: grab meta.socket mutex as wellLars Ellenberg2010-03-112-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes a race and potential kernel panic if e.g. the worker was just about to send a few P_RS_IS_IN_SYNC via the meta socket for checksum based resync, while the receiver destroys the sockets in drbd_disconnect. To make sure no-one is using the meta socket, it is not enough to stop the asender... Grab the meta socket mutex before destroying it. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | fix unit of rs_same_csums accountingLars Ellenberg2010-03-111-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Depending on resync request size, we need to account for more than one bit. Impact: cosmetic If SyncTarget reported correctly 100% equal checksums, the SyncSource usually reported 12% equal checksums instead, because it only counted requests, we typically do 32k resync requests, and the bitmap granularity is still 4k. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | drbd: fix broken state change after split-brain attach while connectedLars Ellenberg2010-03-111-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Situation: we have diverging data sets, i.e. we had a split brain somewhen, but currently are connected, one node diskless. Then we try to attach that disk, figure it is consistent, but has a diverging data set, we refuse to attach. This led to strange state changes: 22:18:35 bb drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> Connected) pdsk( DUnknown -> UpToDate ) 22:19:30 bb drbd1: disk( Diskless -> Attaching ) 22:19:30 bb drbd1: disk( Attaching -> Negotiating ) 22:19:30 bb drbd1: drbd_sync_handshake: 22:19:30 bb drbd1: self 97BF25798B9D5222:F33D1F62ADE698DD:4269796F9D027C83:AC45D8B5C3C1BF93 bits:19449 flags:0 22:19:30 bb drbd1: peer 280DFB6E125465D3:F33D1F62ADE698DC:4269796F9D027C82:AC45D8B5C3C1BF93 bits:2575806 flags:0 22:19:30 bb drbd1: uuid_compare()=100 by rule 90 22:19:30 bb drbd1: Split-Brain detected, dropping connection! 22:19:30 bb drbd1: disk( Negotiating -> Diskless ) while the other side says: 22:19:30 aa drbd1: Split-Brain detected, dropping connection! 22:19:30 aa drbd1: Disk attach process on the peer node was aborted. 22:19:30 aa drbd1: conn( Connected -> TOO_LARGE ) pdsk( Diskless -> Consistent ) This should be fixed now. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | drbd: fix NULL pointer dereference on 4k hard sect sizeLars Ellenberg2010-03-111-19/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | we still don't support 4k 'physical' sectors 'natively', but use a read-modify-write workaround. And we even tried to use the extra page before we allocated it :( Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | drbd: --dry-run option for drbdsetup net ( drbdadm -- --dry-run connect <res> )Philipp Reisner2010-03-115-6/+43
| | | | | | | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | block: drbd: Convert semaphore to mutexThomas Gleixner2010-03-111-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | The bm_change semaphore is semantically a mutex. Convert it to a real mutex. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
| * | Add DocBook documentation for the block tracepoints.William Cohen2010-03-092-0/+177
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a simple description of the various block tracepoints available in the kernel. Signed-off-by: William Cohen <wcohen@redhat.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | Documentation: fix block/biodoc.txt dma mapping descriptionFUJITA Tomonori2010-03-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - It looks incorrect to use blk_rq_map_sg with pci_map_page here about DMA mappings. dma_map_sg? - better to use dma_map_page instead of pci_map_page. http://marc.info/?l=linux-kernel&m=126596737604808&w=2 Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | blkdev: fix merge_bvec_fn return value checks v2Dmitry Monakhov2010-03-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | merge_bvec_fn() returns bvec->bv_len on success. So we have to check against this value. But in case of fs_optimization merge we compare with wrong value. This patch must be included in b428cd6da7e6559aca69aa2e3a526037d3f20403 But accidentally i've forgot to add this in the initial patch. To make things straight let's replace all such checks. In fact this makes code easy to understand. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* | | Linux 2.6.34-rc3v2.6.34-rc3Linus Torvalds2010-03-301-1/+1
| | |
* | | KEYS: Add MAINTAINERS recordDavid Howells2010-03-301-0/+10
| | | | | | | | | | | | | | | | | | | | | Add a MAINTAINERS record for the key management facility. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2010-03-301-1/+5
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: CRED: Fix memory leak in error handling
| * | | CRED: Fix memory leak in error handlingMathieu Desnoyers2010-03-301-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a memory leak on an OOM condition in prepare_usermodehelper_creds(). Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfsLinus Torvalds2010-03-307-36/+67
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs: [LogFS] Erase new journal segments [LogFS] Move reserved segments with journal [LogFS] Clear PagePrivate when moving journal Simplify and fix pad_wbuf Prevent data corruption in logfs_rewrite_block() Use deactivate_locked_super Fix logfs_get_sb_final error path Write out both superblocks on mismatch Prevent schedule while atomic in __logfs_readdir Plug memory leak in writeseg_end_io Limit max_pages for insane devices Open segment file before using it
| * | | | [LogFS] Erase new journal segmentsJoern Engel2010-03-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the device contains on old logfs image and the journal is moved to segment that have never been used by the current logfs and not all journal segments are erased before the next mount, the old content can confuse mount code. To prevent this, always erase the new journal segments. Signed-off-by: Joern Engel <joern@logfs.org>
| * | | | [LogFS] Move reserved segments with journalJoern Engel2010-03-291-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes a GC livelock. Signed-off-by: Joern Engel <joern@logfs.org>
| * | | | [LogFS] Clear PagePrivate when moving journalJoern Engel2010-03-283-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | do_logfs_journal_wl_pass() must call freeseg(), thereby clear PagePrivate on all pages of the current journal segment. Signed-off-by: Joern Engel <joern@logfs.org>
| * | | | Simplify and fix pad_wbufJoern Engel2010-03-281-22/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A comment in the old code read: /* The math in this function can surely use some love */ And indeed it did. In the case that area->a_used_bytes is exactly 4096 bytes below segment size it fell apart. pad_wbuf is now split into two helpers that are significantly less complicated. Signed-off-by: Joern Engel <joern@logfs.org>
| * | | | Prevent data corruption in logfs_rewrite_block()Joern Engel2010-03-281-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The comment was correct, so make the code match the comment. As the new comment indicates, we might be able to do a little less work. But for the current -rc series let's keep it simple and just fix the bug. Signed-off-by: Joern Engel <joern@logfs.org>
| * | | | Use deactivate_locked_superJoern Engel2010-03-271-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Found by Al Viro. Signed-off-by: Joern Engel <joern@logfs.org>
| * | | | Fix logfs_get_sb_final error pathJoern Engel2010-03-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rootdir was already allocated, so we must iput it again. Found by Al Viro. Signed-off-by: Joern Engel <joern@logfs.org>
| * | | | Write out both superblocks on mismatchJoern Engel2010-03-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the first superblock is wrong and the second gets written, there will still be a mismatch on next mount. Write both to make sure they match. Signed-off-by: Joern Engel <joern@logfs.org>
| * | | | Prevent schedule while atomic in __logfs_readdirJoern Engel2010-03-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Apparently filldir can sleep, which forbids kmap_atomic. Signed-off-by: Joern Engel <joern@logfs.org>
| * | | | Plug memory leak in writeseg_end_ioJoern Engel2010-03-271-0/+1
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Joern Engel <joern@logfs.org>
| * | | | Limit max_pages for insane devicesJoern Engel2010-03-271-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intel SSDs have a limit of 0xffff as queue_max_hw_sectors(q). Such a limit may make sense from a hardware pov, but it causes bio_alloc() to return NULL. Signed-off-by: Joern Engel <joern@logfs.org>
OpenPOWER on IntegriCloud