summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* dma-debug: hash_bucket_find needs to allow for offsets within an entryNeil Horman2011-08-231-7/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Users of the pci_dma_sync_single_* api allow users to sync address ranges within the range of a mapped entry (i.e. you can dma map address X to dma_addr_t A and then pci_dma_sync_single on dma_addr_t A+1. The dma-debug library however assume dma syncs will always occur using the base address of a mapped region, and uses that assumption to find entries in its hash table. Since thats often (but not always the case), the dma debug library can give us false errors about missing entries, which are reported as syncing of memory not allocated by the driver. This was noted in the cxgb3 driver as this error: WARNING: at lib/dma-debug.c:902 check_sync+0xdd/0x48c() Hardware name: To be filled by O.E.M. cxgb3 0000:01:00.0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x00000000fff97800] [size=1984 bytes] Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 uinput snd_hda_codec_intelhdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer e1000e snd soundcore r8169 cxgb3 iTCO_wdt snd_page_alloc mii shpchp i2c_i801 iTCO_vendor_support mdio microcode firewire_ohci firewire_core crc_itu_t ata_generic pata_acpi i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan] Pid: 1818, comm: ifconfig Not tainted 2.6.35-0.23.rc3.git6.fc14.x86_64 #1 Call Trace: [<ffffffff81050f71>] warn_slowpath_common+0x85/0x9d [<ffffffff8105102c>] warn_slowpath_fmt+0x46/0x48 [<ffffffff8124658e>] ? check_sync+0x39/0x48c [<ffffffff8107c470>] ? trace_hardirqs_on+0xd/0xf [<ffffffff81246632>] check_sync+0xdd/0x48c [<ffffffff81246ca6>] debug_dma_sync_single_for_device+0x3f/0x41 [<ffffffffa011615c>] ? pci_map_page+0x84/0x97 [cxgb3] [<ffffffffa0117bc3>] pci_dma_sync_single_for_device.clone.0+0x65/0x6e [cxgb3] [<ffffffffa0117ed1>] refill_fl+0x305/0x30a [cxgb3] [<ffffffffa011857d>] t3_sge_alloc_qset+0x6a7/0x821 [cxgb3] [<ffffffffa010a07b>] cxgb_up+0x4d0/0xe62 [cxgb3] [<ffffffff81086037>] ? __module_text_address+0x12/0x58 [<ffffffffa010aa4c>] cxgb_open+0x3f/0x309 [cxgb3] [<ffffffff813e9f6c>] __dev_open+0x8e/0xbc [<ffffffff813e7ca5>] __dev_change_flags+0xbe/0x142 [<ffffffff813e9ea8>] dev_change_flags+0x21/0x57 [<ffffffff81445937>] devinet_ioctl+0x29a/0x54b [<ffffffff811f9a87>] ? inode_has_perm+0xaa/0xce [<ffffffff81446ed2>] inet_ioctl+0x8f/0xa7 [<ffffffff813d683a>] sock_do_ioctl+0x29/0x48 [<ffffffff813d6c83>] sock_ioctl+0x213/0x222 [<ffffffff81137f78>] vfs_ioctl+0x32/0xa6 [<ffffffff811384e2>] do_vfs_ioctl+0x47a/0x4b3 [<ffffffff81138571>] sys_ioctl+0x56/0x79 [<ffffffff81009c32>] system_call_fastpath+0x16/0x1b ---[ end trace 69a4d4cc77b58004 ]--- (some edits by Joerg Roedel) Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Jay Fenalson <fenlason@redhat.com> CC: Divy LeRay <divy@chelsio.com> CC: Stanislaw Gruszka <sgruszka@redhat.com> CC: Joerg Roedel <joerg.roedel@amd.com> CC: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* Linux 3.1-rc3v3.1-rc3Linus Torvalds2011-08-221-2/+2
|
* Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2011-08-226-8/+26
|\ | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf tools: Add group event scheduling option to perf record/stat MAINTAINERS: Fix list of perf events source files perf tools: Fix build against newer glibc perf tools: Fix error handling of unknown events perf evlist: Fix missing event name init for default event perf list: Fix exit value
| * Merge branch 'perf/urgent' of ↵Ingo Molnar2011-08-186-8/+26
| |\ | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
| | * perf tools: Add group event scheduling option to perf record/statLin Ming2011-08-182-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Group event scheduling command line option is missing in perf record/stat. Add it to perf record/stat, which is same as in perf top. Reported-by: Andi Kleen <andi@firstfloor.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1313577727.2754.5.camel@hp6530s Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * MAINTAINERS: Fix list of perf events source filesGeunsik Lim2011-08-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent changes made kernel/perf_event.c be split and moved to kernel/events/. Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiri Kosina <trivial@kernel.org> Cc: Joe Perches <joe@perches.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1313653497-27263-1-git-send-email-leemgs1@gmail.com Signed-off-by: Geunsik Lim <geunsik.lim@samsung.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * perf tools: Fix build against newer glibcJosh Boyer2011-08-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream glibc commit 295e904 added a definition for __attribute_const__ to cdefs.h. This causes the following error when building perf: util/include/linux/compiler.h:8:0: error: "__attribute_const__" redefined [-Werror] /usr/include/sys/cdefs.h:226:0: note: this is the location of the previous definition Wrap __attribute_const__ in #ifndef as we do for __always_inline. Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110818113720.GL2227@zod.bos.redhat.com Signed-off-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * perf tools: Fix error handling of unknown eventsStephane Eranian2011-08-181-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was a problem with the parse_events() code not printing the correct event name when an event was unknown and starting with an 'r'. The source of the problem was the way raw notation was parsed. Without the patch: $ perf stat -e retired_foo invalid event modifier: 'tired_foo' With the patch: $ perf stat -e retired_foo invalid or unsupported event: 'retired_foo' This also covers the case where the name of the event was not printed at all when perf was linked with libpfm4. Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20110723021043.GA20178@quad Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * perf evlist: Fix missing event name init for default eventStephane Eranian2011-08-181-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When no event is given to perf record, perf top, a default event is initialized (cycles). However, perf_evlist__add_default() was not setting the symbolic name for the event. Perf top worked simply because it was reconstructing the name from the event code. But it should not have to do this. This patch initializes the evsel->name field properly. This second version improves the code flow on the non error path. Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20110607161936.GA8163@quad Signed-off-by: Stephane Eranian <eranian@google.com> [committer note: Use perf_evsel__delete() instead of plain free()] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * perf list: Fix exit valueStephane Eranian2011-08-181-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes an issue with the exit value of perf list: $ perf list; echo $? 129 perf list returns an error exit code even though there is no error. There was a stray exit(129) in print_events(). This patch removes this exit(). $ perf list; echo $? 0 $ perf list hw sw cpu-cycles OR cycles [Hardware event] stalled-cycles-frontend OR idle-cycles-frontend [Hardware event] stalled-cycles-backend OR idle-cycles-backend [Hardware event] instructions [Hardware event] cache-references [Hardware event] cache-misses [Hardware event] branch-instructions OR branches [Hardware event] branch-misses [Hardware event] bus-cycles [Hardware event] cpu-clock [Software event] task-clock [Software event] page-faults OR faults [Software event] minor-faults [Software event] major-faults [Software event] context-switches OR cs [Software event] cpu-migrations OR migrations [Software event] alignment-faults [Software event] emulation-faults [Software event] $ echo $? 0 Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20110523123917.GA31060@quad Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | | Merge branch 'stable/bug.fixes' of ↵Linus Torvalds2011-08-225-11/+15
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/tracing: Fix tracing config option properly xen: Do not enable PV IPIs when vector callback not present xen/x86: replace order-based range checking of M2P table by linear one xen: xen-selfballoon.c needs more header files
| * | | xen/tracing: Fix tracing config option properlyJeremy Fitzhardinge2011-08-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Steven Rostedt says we should use CONFIG_EVENT_TRACING. Cc:Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xen: Do not enable PV IPIs when vector callback not presentStefano Stabellini2011-08-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix regression for HVM case on older (<4.1.1) hypervisors caused by commit 99bbb3a84a99cd04ab16b998b20f01a72cfa9f4f Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Date: Thu Dec 2 17:55:10 2010 +0000 xen: PV on HVM: support PV spinlocks and IPIs This change replaced the SMP operations with event based handlers without taking into account that this only works when the hypervisor supports callback vectors. This causes unexplainable hangs early on boot for HVM guests with more than one CPU. BugLink: http://bugs.launchpad.net/bugs/791850 CC: stable@kernel.org Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Tested-and-Reported-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xen/x86: replace order-based range checking of M2P table by linear oneJan Beulich2011-08-173-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The order-based approach is not only less efficient (requiring a shift and a compare, typical generated code looking like this mov eax, [machine_to_phys_order] mov ecx, eax shr ebx, cl test ebx, ebx jnz ... whereas a direct check requires just a compare, like in cmp ebx, [machine_to_phys_nr] jae ... ), but also slightly dangerous in the 32-on-64 case - the element address calculation can wrap if the next power of two boundary is sufficiently far away from the actual upper limit of the table, and hence can result in user space addresses being accessed (with it being unknown what may actually be mapped there). Additionally, the elimination of the mistaken use of fls() here (should have been __fls()) fixes a latent issue on x86-64 that would trigger if the code was run on a system with memory extending beyond the 44-bit boundary. CC: stable@kernel.org Signed-off-by: Jan Beulich <jbeulich@novell.com> [v1: Based on Jeremy's feedback] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xen: xen-selfballoon.c needs more header filesRandy Dunlap2011-08-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix build errors (found when CONFIG_SYSFS is not enabled): drivers/xen/xen-selfballoon.c:446: warning: data definition has no type or storage class drivers/xen/xen-selfballoon.c:446: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL' drivers/xen/xen-selfballoon.c:446: warning: parameter names (without types) in function declaration drivers/xen/xen-selfballoon.c:485: error: expected declaration specifiers or '...' before string constant drivers/xen/xen-selfballoon.c:485: warning: data definition has no type or storage class drivers/xen/xen-selfballoon.c:485: warning: type defaults to 'int' in declaration of 'MODULE_LICENSE' drivers/xen/xen-selfballoon.c:485: warning: function declaration isn't a prototype Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | | Merge branch 'fixes' of ↵Linus Torvalds2011-08-211-5/+10
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: firewire: core: handle ack_busy when fetching the Config ROM
| * | | | firewire: core: handle ack_busy when fetching the Config ROMStefan Richter2011-08-131-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some older Panasonic made camcorders (Panasonic AG-EZ30 and NV-DX110, Grundig Scenos DLC 2000) reject requests with ack_busy_X if a request is sent immediately after they sent a response to a prior transaction. This causes firewire-core to fail probing of the camcorder with "giving up on config rom for node id ...". Consequently, programs like kino or dvgrab are unaware of the presence of a camcorder. Such transaction failures happen also with the ieee1394 driver stack (of the 2.4...2.6 kernel series until 2.6.36 inclusive) but with a lower likelihood, such that kino or dvgrab are generally able to use these camcorders via the older driver stack. The cause for firewire-ohci's or firewire-core's worse behavior is not yet known. Gap count optimization in firewire-core is not the cause. Perhaps the slightly higher latency of transaction completion in the older stack plays a role. (ieee1394: AR-resp DMA context tasklet -> packet completion ktread -> user process; firewire-core: tasklet -> user process.) This change introduces retries and delays after ack_busy_X into firewire-core's Config ROM reader, such that at least firewire-core's probing and /dev/fw* creation are successful. This still leaves the problem that userland processes are facing transaction failures. gscanbus's built-in retry routines deal with them successfully, but neither kino's nor dvgrab's do ever succeed. But at least DV capture with "dvgrab -noavc -card 0" works now. Live video preview in kino works too, but not actual capture. One way to prevent Configuration ROM reading failures in application programs is to modify libraw1394 to synthesize read responses by means of firewire-core's Configuration ROM cache. This would only leave CMP and FCP transaction failures as a potential problem source for applications. Reported-and-tested-by: Thomas Seilund <tps@netmaster.dk> Reported-and-tested-by: René Fritz <rene@colorcube.de> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
* | | | | Btrfs: fix 64 bit divide problemJosef Bacik2011-08-211-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a regression introduced by commit cdcb725c05fe ("Btrfs: check if there is enough space for balancing smarter"). We can't do 64-bit divides on 32-bit architectures. In cases where we need to divide/multiply by 2 we should just left/right shift respectively, and in cases where theres N number of devices use do_div. Also make the counters u64 to match up with rw_devices. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Acked-and-tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge branch 'for_linus' of ↵Linus Torvalds2011-08-215-9/+37
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: flush any pending end_io requests before DIO reads w/dioread_nolock ext4: fix nomblk_io_submit option so it correctly converts uninit blocks ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN. ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode ext4: Fix ext4_should_writeback_data() for no-journal mode
| * | | | | ext4: flush any pending end_io requests before DIO reads w/dioread_nolockJiaying Zhang2011-08-191-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a race between ext4 buffer write and direct_IO read with dioread_nolock mount option enabled. The problem is that we clear PageWriteback flag during end_io time but will do uninitialized-to-initialized extent conversion later with dioread_nolock. If an O_direct read request comes in during this period, ext4 will return zero instead of the recently written data. This patch checks whether there are any pending uninitialized-to-initialized extent conversion requests before doing O_direct read to close the race. Note that this is just a bandaid fix. The fundamental issue is that we clear PageWriteback flag before we really complete an IO, which is problem-prone. To fix the fundamental issue, we may need to implement an extent tree cache that we can use to look up pending to-be-converted extents. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
| * | | | | ext4: fix nomblk_io_submit option so it correctly converts uninit blocksTheodore Ts'o2011-08-131-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bug discovered by Jan Kara: Finally, commit 1449032be17abb69116dbc393f67ceb8bd034f92 returned back the old IO submission code but apparently it forgot to return the old handling of uninitialized buffers so we unconditionnaly call block_write_full_page() without specifying end_io function. So AFAICS we never convert unwritten extents to written in some cases. For example when I mount the fs as: mount -t ext4 -o nomblk_io_submit,dioread_nolock /dev/ubdb /mnt and do int fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0600); char buf[1024]; memset(buf, 'a', sizeof(buf)); fallocate(fd, 0, 0, 16384); write(fd, buf, sizeof(buf)); I get a file full of zeros (after remounting the filesystem so that pagecache is dropped) instead of seeing the first KB contain 'a's. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
| * | | | | ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN.Tao Ma2011-08-132-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | EXT4_IO_END_UNWRITTEN flag set and the increase of i_aiodio_unwritten should be done simultaneously since ext4_end_io_nolock always clear the flag and decrease the counter in the same time. We don't increase i_aiodio_unwritten when setting EXT4_IO_END_UNWRITTEN so it will go nagative and causes some process to wait forever. Part of the patch came from Eric in his e-mail, but it doesn't fix the problem met by Michael actually. http://marc.info/?l=linux-ext4&m=131316851417460&w=2 Reported-and-Tested-by: Michael Tokarev<mjt@tls.msk.ru> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
| * | | | | ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inodeJiaying Zhang2011-08-132-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Flush inode's i_completed_io_list before calling ext4_io_wait to prevent the following deadlock scenario: A page fault happens while some process is writing inode A. During page fault, shrink_icache_memory is called that in turn evicts another inode B. Inode B has some pending io_end work so it calls ext4_ioend_wait() that waits for inode B's i_ioend_count to become zero. However, inode B's ioend work was queued behind some of inode A's ioend work on the same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten thread on that cpu is processing inode A's ioend work, it tries to grab inode A's i_mutex lock. Since the i_mutex lock of inode A is still hold before the page fault happened, we enter a deadlock. Also moves ext4_flush_completed_IO and ext4_ioend_wait from ext4_destroy_inode() to ext4_evict_inode(). During inode deleteion, ext4_evict_inode() is called before ext4_destroy_inode() and in ext4_evict_inode(), we may call ext4_truncate() without holding i_mutex lock. As a result, there is a race between flush_completed_IO that is called from ext4_ext_truncate() and ext4_end_io_work, which may cause corruption on an io_end structure. This change moves ext4_flush_completed_IO and ext4_ioend_wait from ext4_destroy_inode() to ext4_evict_inode() to resolve the race between ext4_truncate() and ext4_end_io_work during inode deletion. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
| * | | | | ext4: Fix ext4_should_writeback_data() for no-journal modeCurt Wohlgemuth2011-08-132-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ext4_should_writeback_data() had an incorrect sequence of tests to determine if it should return 0 or 1: in particular, even in no-journal mode, 0 was being returned for a non-regular-file inode. This meant that, in non-journal mode, we would use ext4_journalled_aops for directories, symlinks, and other non-regular files. However, calling journalled aop callbacks when there is no valid handle, can cause problems. This would cause a kernel crash with Jan Kara's commit 2d859db3e4 ("ext4: fix data corruption in inodes with journalled data"), because we now dereference 'handle' in ext4_journalled_write_end(). I also added BUG_ONs to check for a valid handle in the obviously journal-only aops callbacks. I tested this running xfstests with a scratch device in these modes: - no-journal - data=ordered - data=writeback - data=journal All work fine; the data=journal run has many failures and a crash in xfstests 074, but this is no different from a vanilla kernel. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
* | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2011-08-219-39/+76
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: ALSA: sound/aoa/fabrics/layout.c: remove unneeded kfree ALSA: hda - Fix error check from snd_hda_get_conn_index() in patch_cirrus.c ALSA: hda - Don't spew too many ELD errors ALSA: usb-audio - Fix missing mixer dB information ALSA: hda - Add "PCM" volume to vmaster slave list ALSA: hda - Fix duplicated capture-volume creation for ALC268 models ALSA: ac97: Add HP Compaq dc5100 SFF(PT003AW) to Headphone Jack Sense whitelist ALSA: snd_usb_caiaq: track submitted output urbs
| * | | | | | ALSA: sound/aoa/fabrics/layout.c: remove unneeded kfreeJulia Lawall2011-08-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The label outnodev is only used when kzalloc has not yet taken place or has failed, so there is no need for the call for kfree under this label. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier x; expression E1!=0,E2,E3,E4; statement S; iterator I; @@ ( if (...) { ... when != kfree(x) when != x = E3 when != E3 = x * return ...; } ... when != x = E2 when != I(...,x,...) S if (...) { ... when != x = E4 kfree(x); ... return ...; } ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Takashi Iwai <tiwai@suse.de>
| * | | | | | ALSA: hda - Fix error check from snd_hda_get_conn_index() in patch_cirrus.cTakashi Iwai2011-08-201-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | snd_hda_get_conn_index() returns a negative value while the current code stores it in an unsigned int. It must be stored in a signed integer. Reported-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Takashi Iwai <tiwai@suse.de>
| * | | | | | ALSA: hda - Don't spew too many ELD errorsTakashi Iwai2011-08-201-12/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently HD-audio driver shows the all error ELD byte as an error in the kernel message. This is annoying when the video driver doesn't set the correct ELD from the beginning. e.g. radeon sends a zero-byte data, but we still check ELD with the fixed 128 byte as a workaround for some broken devices, it spews 128-times errors. For avoiding this, the driver aborts reading when the first byte is invalid. In such a case, the whole data is certainly invalid. Signed-off-by: Takashi Iwai <tiwai@suse.de>
| * | | | | | ALSA: usb-audio - Fix missing mixer dB informationTakashi Iwai2011-08-191-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent fix for testing dB range at the mixer creation time seems to cause regressions in some devices. In such devices, reading the dB info at probing time gives an error, thus both dBmin and dBmax are still zero, and TLV flag isn't set although the later read of dB info succeeds. This patch adds a workaround for such a case by assuming that the later read will succeed. In future, a similar test should be performed in a case where a wrong dB range is seen even in the later read. Signed-off-by: Takashi Iwai <tiwai@suse.de> Cc: <stable@kernel.org>
| * | | | | | ALSA: hda - Add "PCM" volume to vmaster slave listTakashi Iwai2011-08-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new parser may use "PCM" volume, but it was missing the vmaster slave list, thus "Master" volume didn't control it. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=41342 Signed-off-by: Takashi Iwai <tiwai@suse.de>
| * | | | | | ALSA: hda - Fix duplicated capture-volume creation for ALC268 modelsTakashi Iwai2011-08-161-18/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the duplicated creation of capture-mixer elements for some static ALC268 configurations. The capture mixers must be put to cap_mixer field instead of mixers array. Signed-off-by: Takashi Iwai <tiwai@suse.de>
| * | | | | | ALSA: ac97: Add HP Compaq dc5100 SFF(PT003AW) to Headphone Jack Sense whitelistDaniel T Chen2011-08-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: https://bugs.launchpad.net/bugs/826081 The original reporter needs 'Headphone Jack Sense' enabled to have audible audio, so add his PCI SSID to the whitelist. Reported-and-tested-by: Muhammad Khurram Khan Cc: <stable@kernel.org> Signed-off-by: Daniel T Chen <crimsun@ubuntu.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
| * | | | | | ALSA: snd_usb_caiaq: track submitted output urbsDaniel Mack2011-08-142-4/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The snd_usb_caiaq driver currently assumes that output urbs are serviced in time and doesn't track when and whether they are given back by the USB core. That usually works fine, but due to temporary limitations of the XHCI stack, we faced that urbs were submitted more than once with this approach. As it's no good practice to fire and forget urbs anyway, this patch introduces a proper bit mask to track which requests have been submitted and given back. That alone however doesn't make the driver work in case the host controller is broken and doesn't give back urbs at all, and the output stream will stop once all pre-allocated output urbs are consumed. But it does prevent crashes of the controller stack in such cases. See http://bugzilla.kernel.org/show_bug.cgi?id=40702 for more details. Signed-off-by: Daniel Mack <zonque@gmail.com> Reported-and-tested-by: Matej Laitl <matej@laitl.cz> Cc: Sarah Sharp <sarah.a.sharp@linux.intel.com> Cc: stable@kernel.org Signed-off-by: Takashi Iwai <tiwai@suse.de>
* | | | | | | pci: fix new kernel-doc warning in pci.cRandy Dunlap2011-08-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix new kernel-doc warning in pci.c: Warning(drivers/pci/pci.c:3259): No description found for parameter 'mps' Warning(drivers/pci/pci.c:3259): Excess function parameter 'rq' description in 'pcie_set_mps' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2011-08-192-0/+8
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/drm-intel * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/drm-intel: drm/i915: set GFX_MODE to pre-Ivybridge default value even on Ivybridge
| * | | | | | | drm/i915: set GFX_MODE to pre-Ivybridge default value even on IvybridgeJesse Barnes2011-08-192-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to Ivybridge, the GFX_MODE would default to 0x800, meaning that MI_FLUSH would flush the TLBs in addition to the rest of the caches indicated in the MI_FLUSH command. However starting with Ivybridge, the register defaults to 0x2800 out of reset, meaning that to invalidate the TLB we need to use PIPE_CONTROL. Since we're not doing that yet, go back to the old default so things work. v2: don't forget to actually *clear* the new bit Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* | | | | | | | Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2011-08-1926-135/+800
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.dk/linux-block: (23 commits) Revert "cfq: Remove special treatment for metadata rqs." block: fix flush machinery for stacking drivers with differring flush flags block: improve rq_affinity placement blktrace: add FLUSH/FUA support Move some REQ flags to the common bio/request area allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH xen/blkback: Make description more obvious. cfq-iosched: Add documentation about idling block: Make rq_affinity = 1 work as expected block: swim3: fix unterminated of_device_id table block/genhd.c: remove useless cast in diskstats_show() drivers/cdrom/cdrom.c: relax check on dvd manufacturer value drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parse bsg-lib: add module.h include cfq-iosched: Reduce linked group count upon group destruction blk-throttle: correctly determine sync bio loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices loop: add management interface for on-demand device allocation loop: replace linked list of allocated devices with an idr index ...
| * | | | | | | | Revert "cfq: Remove special treatment for metadata rqs."Jens Axboe2011-08-191-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a kernel build regression since 3.1-rc1, which is about 10% regression. The kernel source is in an ext3 filesystem. Alex Shi bisect it to commit: commit a07405b7802691d29ab3b23bdc76ee6d006aad0b Author: Justin TerAvest <teravest@google.com> Date: Sun Jul 10 22:09:19 2011 +0200 cfq: Remove special treatment for metadata rqs. Apparently this is caused by lack metadata preemption, where ext3/ext4 do use READ_META. I didn't see a way to fix the issue, so suggest reverting the patch. This reverts commit a07405b7802691d29ab3b23bdc76ee6d006aad0b. Reported-by: Alex Shi<alex.shi@intel.com> Reported-by: Shaohua Li<shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | block: fix flush machinery for stacking drivers with differring flush flagsJeff Moyer2011-08-154-6/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ae1b1539622fb46e51b4d13b3f9e5f4c713f86ae, block: reimplement FLUSH/FUA to support merge, introduced a performance regression when running any sort of fsyncing workload using dm-multipath and certain storage (in our case, an HP EVA). The test I ran was fs_mark, and it dropped from ~800 files/sec on ext4 to ~100 files/sec. It turns out that dm-multipath always advertised flush+fua support, and passed commands on down the stack, where those flags used to get stripped off. The above commit changed that behavior: static inline struct request *__elv_next_request(struct request_queue *q) { struct request *rq; while (1) { - while (!list_empty(&q->queue_head)) { + if (!list_empty(&q->queue_head)) { rq = list_entry_rq(q->queue_head.next); - if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) || - (rq->cmd_flags & REQ_FLUSH_SEQ)) - return rq; - rq = blk_do_flush(q, rq); - if (rq) - return rq; + return rq; } Note that previously, a command would come in here, have REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush: struct request *blk_do_flush(struct request_queue *q, struct request *rq) { unsigned int fflags = q->flush_flags; /* may change, cache it */ bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA; bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH); bool do_postflush = has_flush && !has_fua && (rq->cmd_flags & REQ_FUA); unsigned skip = 0; ... if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) { rq->cmd_flags &= ~REQ_FLUSH; if (!has_fua) rq->cmd_flags &= ~REQ_FUA; return rq; } So, the flush machinery was bypassed in such cases (q->flush_flags == 0 && rq->cmd_flags & (REQ_FLUSH|REQ_FUA)). Now, however, we don't get into the flush machinery at all. Instead, __elv_next_request just hands a request with flush and fua bits set to the scsi_request_fn, even if the underlying request_queue does not support flush or fua. The agreed upon approach is to fix the flush machinery to allow stacking. While this isn't used in practice (since there is only one request-based dm target, and that target will now reflect the flush flags of the underlying device), it does future-proof the solution, and make it function as designed. In order to make this work, I had to add a field to the struct request, inside the flush structure (to store the original req->end_io). Shaohua had suggested overloading the union with rb_node and completion_data, but the completion data is used by device mapper and can also be used by other drivers. So, I didn't see a way around the additional field. I tested this patch on an HP EVA with both ext4 and xfs, and it recovers the lost performance. Comments and other testers, as always, are appreciated. Cheers, Jeff Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | block: improve rq_affinity placementShaohua Li2011-08-111-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch reverts commit 35ae66e0a09ab70ed(block: Make rq_affinity = 1 work as expected). The purpose is to avoid an unnecessary IPI. Let's take an example. My test box has cpu 0-7, one socket. Say request is added from CPU 1, blk_complete_request() occurs at CPU 7. Without the reverted patch, softirq will be done at CPU 7. With it, an IPI will be directed to CPU 0, and softirq will be done at CPU 0. In this case, doing softirq at CPU 0 and CPU 7 have no difference from cache sharing point view and we can avoid an ipi if doing it in CPU 7. An immediate concern is this is just like QUEUE_FLAG_SAME_FORCE, but actually not. blk_complete_request() is running in interrupt handler, and currently I/O controller doesn't support multiple interrupts (I checked several LSI cards and AHCI), so only one CPU can run blk_complete_request(). This is still quite different as QUEUE_FLAG_SAME_FORCE. Since only one CPU runs softirq, the only difference with below patch is softirq not always runs at the first CPU of a group. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | blktrace: add FLUSH/FUA supportNamhyung Kim2011-08-113-16/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add FLUSH/FUA support to blktrace. As FLUSH precedes WRITE and/or FUA follows WRITE, use the same 'F' flag for both cases and distinguish them by their (relative) position. The end results look like (other flags might be shown also): - WRITE: W - WRITE_FLUSH: FW - WRITE_FUA: WF - WRITE_FLUSH_FUA: FWF Note that we reuse TC_BARRIER due to lack of bit space of act_mask so that the older versions of blktrace tools will report flush requests as barriers from now on. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Namhyung Kim <namhyung@gmail.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | Move some REQ flags to the common bio/request areaMatthew Wilcox2011-08-111-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | REQ_SECURE, REQ_FLUSH and REQ_FUA may all be set on a bio as well as on a request, so relocate them to the shared part of the enum. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@gmail.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | Merge branch 'stable/for-jens' of ↵Jens Axboe2011-08-092-4/+4
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus
| | * | | | | | | | xen/blkback: Make description more obvious.Konrad Rzeszutek Wilk2011-08-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the frontend having Xen but the backend not, it just looks odd: <*> Xen virtual block device support <*> Block-device backend driver Fix it to have the 'Xen' in front of it. Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | | | | | | xen-blkfront: Fix one off warning about name clashStefan Bader2011-07-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid telling users to use xvde and onwards when using xvde. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | | | | | | xen-blkfront: Drop name and minor adjustments for emulated scsi devicesStefan Bader2011-07-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These were intended to avoid the namespace clash when representing emulated IDE and SCSI devices. However that seems to confuse users more than expected (a disk defined as sda becomes xvde). So for now go back to the scheme which does no adjustments. This will break when mixing IDE and SCSI names in the configuration of guests but should be by now expected. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | | | | | | allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSHJeff Moyer2011-08-091-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | blk_insert_flush has the following check: /* * If there's data but flush is not necessary, the request can be * processed directly without going through flush machinery. Queue * for normal execution. */ if ((policy & REQ_FSEQ_DATA) && !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) { list_add_tail(&rq->queuelist, &q->queue_head); return; } However, blk_flush_policy will not return with policy set to only REQ_FSEQ_DATA: static unsigned int blk_flush_policy(unsigned int fflags, struct request *rq) { unsigned int policy = 0; if (fflags & REQ_FLUSH) { if (rq->cmd_flags & REQ_FLUSH) policy |= REQ_FSEQ_PREFLUSH; if (blk_rq_sectors(rq)) policy |= REQ_FSEQ_DATA; if (!(fflags & REQ_FUA) && (rq->cmd_flags & REQ_FUA)) policy |= REQ_FSEQ_POSTFLUSH; } return policy; } Notice that REQ_FSEQ_DATA is only set if REQ_FLUSH is set. Fix this mismatch by moving the setting of REQ_FSEQ_DATA outside of the REQ_FLUSH check. Tejun notes: Hmmm... yes, this can become a correctness issue if (and only if) blk_queue_flush() is called to change q->flush_flags while requests are in-flight; otherwise, requests wouldn't reach the function at all. Also, I think it would be a generally good idea to always set FSEQ_DATA if the request has data. Cheers, Jeff Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | cfq-iosched: Add documentation about idlingVivek Goyal2011-08-051-0/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are always questions about why CFQ is idling on various conditions. Recent ones is Christoph asking again why to idle on REQ_NOIDLE. His assertion is that XFS is relying more and more on workqueues and is concerned that CFQ idling on IO from every workqueue will impact XFS badly. So he suggested that I add some more documentation about CFQ idling and that can provide more clarity on the topic and also gives an opprotunity to poke a hole in theory and lead to improvements. So here is my attempt at that. Any comments are welcome. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | block: Make rq_affinity = 1 work as expectedTao Ma2011-08-051-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 5757a6d76c introduced a new rq_affinity = 2 so as to make the request completed in the __make_request cpu. But it makes the old rq_affinity = 1 not work any more. The root cause is that if the 'cpu' and 'req->cpu' is in the same group and cpu != req->cpu, ccpu will be the same as group_cpu, so the completion will be excuted in the 'cpu' not 'group_cpu'. This patch fix problem by simpling removing group_cpu and the codes are more explicit now. If ccpu == cpu, we complete in cpu, otherwise we raise_blk_irq to ccpu. Cc: Christoph Hellwig <hch@infradead.org> Cc: Roland Dreier <roland@purestorage.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Reviewed-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | block: swim3: fix unterminated of_device_id tableAxel Lin2011-08-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of_device_id structures need a NULL terminating entry, add it. Signed-off-by: Axel Lin <axel.lin@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
OpenPOWER on IntegriCloud