summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* libata: clean up ZPODD when a port is detachedLevente Kurusa2014-05-071-0/+9
| | | | | | | | | | | | | | When a ZPODD device is unbound via sysfs, the ACPI notify handler is not removed. This causes panics as observed in Bug #74601. The panic only happens when the wake happens from outside the kernel (i.e. inserting a media or pressing a button). Add a loop to ata_port_detach which loops through the port's devices and checks if zpodd is enabled, if so call zpodd_exit. Cc: stable@vger.kernel.org Reviewed-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Levente Kurusa <levex@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* ahci: imx: software workaround for phy reset issue in resumeShawn Guo2014-05-041-0/+161
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When suspending imx6q systems which have rootfs on SATA, the following error will likely be seen in resume. The SATA link will fail to come up, and it results in an unusable system across the suspend/resume cycle. $ echo mem > /sys/power/state PM: Syncing filesystems ... done. PM: Preparing system for mem sleep Freezing user space processes ... (elapsed 0.002 seconds) done. Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done. PM: Entering mem sleep sd 0:0:0:0: [sda] Synchronizing SCSI cache sd 0:0:0:0: [sda] Stopping disk PM: suspend of devices complete after 61.914 msecs PM: suspend devices took 0.070 seconds PM: late suspend of devices complete after 4.906 msecs PM: noirq suspend of devices complete after 4.521 msecs Disabling non-boot CPUs ... CPU1: shutdown CPU2: shutdown CPU3: shutdown Enabling non-boot CPUs ... CPU1: Booted secondary processor CPU1 is up CPU2: Booted secondary processor CPU2 is up CPU3: Booted secondary processor CPU3 is up PM: noirq resume of devices complete after 10.486 msecs PM: early resume of devices complete after 4.679 msecs sd 0:0:0:0: [sda] Starting disk PM: resume of devices complete after 22.674 msecs PM: resume devices took 0.030 seconds PM: Finishing wakeup. Restarting tasks ... done. $ ata1: SATA link down (SStatus 1 SControl 300) ata1: SATA link down (SStatus 1 SControl 300) ata1: limiting SATA link speed to 1.5 Gbps ata1: SATA link down (SStatus 1 SControl 310) ata1.00: disabled ata1: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen t4 ata1: irq_stat 0x00000040, connection status changed ata1: SError: { CommWake DevExch } ata1: hard resetting link sd 0:0:0:0: rejecting I/O to offline device sd 0:0:0:0: killing request sd 0:0:0:0: rejecting I/O to offline device Aborting journal on device sda2-8. sd 0:0:0:0: rejecting I/O to offline device EXT4-fs warning (device sda2): ext4_end_bio:317: I/O error writing to inode 132577 (offset 0 size 0 starting block 26235) Buffer I/O error on device sda2, logical block 10169 ... It's caused by a silicon issue that SATA phy does not get reset by controller when coming back from LPM. The patch adds a software workaround for this issue. It enforces a software reset on SATA phy in imx_sata_enable() function, so that we can ensure SATA link will come up properly in both power-on and resume. The software reset is implemented by writing phy reset register through the phy control register bus interface. Functions imx_phy_reg_[addressing|write|read]() implement this bus interface, while imx_sata_phy_reset() performs the actually reset operation. Signed-off-by: Richard Zhu <r65037@freescale.com> Signed-off-by: Shawn Guo <shawn.guo@freescale.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* ahci: imx: add namespace for register enumsShawn Guo2014-05-041-7/+9
| | | | | | | | Update register enums a little bit to add proper namespace prefix, and have the names match i.MX reference manual. Signed-off-by: Shawn Guo <shawn.guo@freescale.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* ahci: disable DEVSLP for Intel ValleyviewJacob Pan2014-04-243-0/+23
| | | | | | | | | | | | | | On Intel Valleyview SoC, SATA device sleep is not reliable. When DEVSLP is attempted on certain SSDs, port_devslp write would fail and result in malfunction of AHCI controller. AHCI controller may be not shown in PCI enumeration after reset. Complete power source removal may be required to recover from this failure. So we blacklist this device and override host device reported capabilities such that device LPM will only attempt slumber but not DEVSLP. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* libata/ahci: accommodate tag ordered controllersDan Williams2014-04-182-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The AHCI spec allows implementations to issue commands in tag order rather than FIFO order: 5.3.2.12 P:SelectCmd HBA sets pSlotLoc = (pSlotLoc + 1) mod (CAP.NCS + 1) or HBA selects the command to issue that has had the PxCI bit set to '1' longer than any other command pending to be issued. The result is that commands posted sequentially (time-wise) may play out of sequence when issued by hardware. This behavior has likely been hidden by drives that arrange for commands to complete in issue order. However, it appears recent drives (two from different vendors that we have found so far) inflict out-of-order completions as a matter of course. So, we need to take care to maintain ordered submission, otherwise we risk triggering a drive to fall out of sequential-io automation and back to random-io processing, which incurs large latency and degrades throughput. This issue was found in simple benchmarks where QD=2 seq-write performance was 30-50% *greater* than QD=32 seq-write performance. Tagging for -stable and making the change globally since it has a low risk-to-reward ratio. Also, word is that recent versions of an unnamed OS also does it this way now. So, drives in the field are already experienced with this tag ordering scheme. Cc: <stable@vger.kernel.org> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ed Ciechanowski <ed.ciechanowski@intel.com> Reviewed-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* ahci: Do not receive interrupts sent by dummy portsAlexander Gordeev2014-04-181-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | In multiple MSI mode all AHCI ports (including dummy) get assigned separate MSI vectors and (as result of execution pci_enable_msi_exact() function) separate IRQ numbers, (mapped to the MSI vectors). Therefore, although interrupts from dummy ports are not desired they are still enabled. We do not request IRQs for dummy ports, but that only means we do not assign AHCI-specific ISRs to corresponding IRQ numbers. As result, dummy port interrupts still could come and traverse all the way from the PCI device to the kernel, causing unnecessary overhead. This update disables IRQs for dummy ports and prevents the described issue. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Tested-by: David Milburn <dmilburn@redhat.com> Cc: linux-ide@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 5ca72c4f7c41 ("AHCI: Support multiple MSIs")
* ahci: Use pci_enable_msi_exact() instead of pci_enable_msi_range()Alexander Gordeev2014-04-171-4/+4
| | | | | | | | | The driver calls pci_enable_msi_range() function with the range of [nvec..nvec] which is what pci_enable_msi_exact() function is for. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: linux-ide@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org>
* ahci: Ensure "MSI Revert to Single Message" mode is not enforcedAlexander Gordeev2014-04-172-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The AHCI specification allows hardware to choose to revert to single MSI mode when fewer messages are allocated than requested. Yet, at least ICH10 chipset reverts to single MSI mode even when enough messages are allocated in some cases (see below). This update forces the driver to not rely on initialization of multiple MSIs mode alone and always check if "MSI Revert to Single Message" (MRSM) mode was enforced by the controller and fallback to the single MSI mode in case it did. That prevents a situation when the driver configured multiple per-port IRQ handlers, but the controller sends all port's interrupts to a single IRQ, which could easily screw up the interrupt handling and lead to delays and possibly crashes. The fix was tested on a 6-port controller that successfully reverted to the single MSI mode: 00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller (prog-if 01 [AHCI 1.0]) Subsystem: Super Micro Computer Inc Device 10a7 Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 101 I/O ports at f110 [size=8] I/O ports at f100 [size=4] I/O ports at f0f0 [size=8] I/O ports at f0e0 [size=4] I/O ports at f020 [size=32] Memory at fbf00000 (32-bit, non-prefetchable) [size=2K] Capabilities: [80] MSI: Enable+ Count=1/16 Maskable- 64bit- Capabilities: [70] Power Management version 3 Capabilities: [a8] SATA HBA v1.0 Capabilities: [b0] PCI Advanced Features Kernel driver in use: ahci With 6 ports just 8 MSI vectors should be enough, but the adapter enforces the MRSM mode when less than 16 vectors are written to the Multiple Messages Enable PCI register. I instigated MRSM mode by forcing @nvec to 8 in ahci_init_interrupts(). Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: linux-ide@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org>
* ahci: do not request irq for dummy portDavid Milburn2014-04-161-8/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | System may crash in ahci_hw_interrupt() or ahci_thread_fn() when accessing the interrupt status in a port's private_data if the port is actually a DUMMY port. 00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller <snip console output for linux-3.15-rc1> [ 9.352080] ahci 0000:00:1f.2: AHCI 0001.0200 32 slots 6 ports 3 Gbps 0x1 impl SATA mode [ 9.352084] ahci 0000:00:1f.2: flags: 64bit ncq sntf pm led clo pio slum part ccc [ 9.368155] Console: switching to colour frame buffer device 128x48 [ 9.439759] mgag200 0000:11:00.0: fb0: mgadrmfb frame buffer device [ 9.446765] mgag200 0000:11:00.0: registered panic notifier [ 9.470166] scsi1 : ahci [ 9.479166] scsi2 : ahci [ 9.488172] scsi3 : ahci [ 9.497174] scsi4 : ahci [ 9.506175] scsi5 : ahci [ 9.515174] scsi6 : ahci [ 9.518181] ata1: SATA max UDMA/133 abar m2048@0x95c00000 port 0x95c00100 irq 91 [ 9.526448] ata2: DUMMY [ 9.529182] ata3: DUMMY [ 9.531916] ata4: DUMMY [ 9.534650] ata5: DUMMY [ 9.537382] ata6: DUMMY [ 9.576196] [drm] Initialized mgag200 1.0.0 20110418 for 0000:11:00.0 on minor 0 [ 9.845257] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 9.865161] ata1.00: ATAPI: Optiarc DVD RW AD-7580S, FX04, max UDMA/100 [ 9.891407] ata1.00: configured for UDMA/100 [ 9.900525] scsi 1:0:0:0: CD-ROM Optiarc DVD RW AD-7580S FX04 PQ: 0 ANSI: 5 [ 10.247399] iTCO_vendor_support: vendor-support=0 [ 10.261572] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.11 [ 10.269764] iTCO_wdt: unable to reset NO_REBOOT flag, device disabled by hardware/BIOS [ 10.301932] sd 0:2:0:0: [sda] 570310656 512-byte logical blocks: (291 GB/271 GiB) [ 10.317085] sd 0:2:0:0: [sda] Write Protect is off [ 10.328326] sd 0:2:0:0: [sda] Write cache: disabled, read cache: disabled, supports DPO and FUA [ 10.375452] BUG: unable to handle kernel NULL pointer dereference at 000000000000003c [ 10.384217] IP: [<ffffffffa0133df0>] ahci_hw_interrupt+0x100/0x130 [libahci] [ 10.392101] PGD 0 [ 10.394353] Oops: 0000 [#1] SMP [ 10.397978] Modules linked in: sr_mod(+) cdrom sd_mod iTCO_wdt crc_t10dif iTCO_vendor_support crct10dif_common ahci libahci libata lpc_ich mfd_core mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm i2c_core megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [ 10.426499] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.15.0-rc1 #1 [ 10.433495] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S013.032920111005 03/29/2011 [ 10.443886] task: ffffffff81906460 ti: ffffffff818f0000 task.ti: ffffffff818f0000 [ 10.452239] RIP: 0010:[<ffffffffa0133df0>] [<ffffffffa0133df0>] ahci_hw_interrupt+0x100/0x130 [libahci] [ 10.462838] RSP: 0018:ffff880033c03d98 EFLAGS: 00010046 [ 10.468767] RAX: 0000000000a400a4 RBX: ffff880029a6bc18 RCX: 00000000fffffffa [ 10.476731] RDX: 00000000000000a4 RSI: ffff880029bb0000 RDI: ffff880029a6bc18 [ 10.484696] RBP: ffff880033c03dc8 R08: 0000000000000000 R09: ffff88002f800490 [ 10.492661] R10: 0000000000000000 R11: 0000000000000005 R12: 0000000000000000 [ 10.500625] R13: ffff880029a6bd98 R14: 0000000000000000 R15: ffffc90000194000 [ 10.508590] FS: 0000000000000000(0000) GS:ffff880033c00000(0000) knlGS:0000000000000000 [ 10.517623] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 10.524035] CR2: 000000000000003c CR3: 00000000328ff000 CR4: 00000000000007b0 [ 10.531999] Stack: [ 10.534241] 0000000000000017 ffff880031ba7d00 000000000000005c ffff880031ba7d00 [ 10.542535] 0000000000000000 000000000000005c ffff880033c03e10 ffffffff810c2a1e [ 10.550827] ffff880031ae2900 000000008108fb4f ffff880031ae2900 ffff880031ae2984 [ 10.559121] Call Trace: [ 10.561849] <IRQ> [ 10.563994] [<ffffffff810c2a1e>] handle_irq_event_percpu+0x3e/0x1a0 [ 10.571309] [<ffffffff810c2bbd>] handle_irq_event+0x3d/0x60 [ 10.577631] [<ffffffff810c4fdd>] try_one_irq.isra.6+0x8d/0xf0 [ 10.584142] [<ffffffff810c5313>] note_interrupt+0x173/0x1f0 [ 10.590460] [<ffffffff810c2a8e>] handle_irq_event_percpu+0xae/0x1a0 [ 10.597554] [<ffffffff810c2bbd>] handle_irq_event+0x3d/0x60 [ 10.603872] [<ffffffff810c5727>] handle_edge_irq+0x77/0x130 [ 10.610199] [<ffffffff81014b8f>] handle_irq+0xbf/0x150 [ 10.616040] [<ffffffff8109ff4e>] ? vtime_account_idle+0xe/0x50 [ 10.622654] [<ffffffff815fca1a>] ? atomic_notifier_call_chain+0x1a/0x20 [ 10.630140] [<ffffffff816038cf>] do_IRQ+0x4f/0xf0 [ 10.635490] [<ffffffff815f8aed>] common_interrupt+0x6d/0x6d [ 10.641805] <EOI> [ 10.643950] [<ffffffff8149ca9f>] ? cpuidle_enter_state+0x4f/0xc0 [ 10.650972] [<ffffffff8149ca98>] ? cpuidle_enter_state+0x48/0xc0 [ 10.657775] [<ffffffff8149cb47>] cpuidle_enter+0x17/0x20 [ 10.663807] [<ffffffff810b0070>] cpu_startup_entry+0x2c0/0x3d0 [ 10.670423] [<ffffffff815dfcc7>] rest_init+0x77/0x80 [ 10.676065] [<ffffffff81a60f47>] start_kernel+0x40f/0x41a [ 10.682190] [<ffffffff81a60941>] ? repair_env_string+0x5c/0x5c [ 10.688799] [<ffffffff81a60120>] ? early_idt_handlers+0x120/0x120 [ 10.695699] [<ffffffff81a605ee>] x86_64_start_reservations+0x2a/0x2c [ 10.702889] [<ffffffff81a60733>] x86_64_start_kernel+0x143/0x152 [ 10.709689] Code: a0 fc ff 85 c0 8b 4d d4 74 c3 48 8b 7b 08 89 ca 48 c7 c6 60 66 13 a0 31 c0 e8 9d 70 28 e1 8b 4d d4 eb aa 0f 1f 84 00 00 00 00 00 <45> 8b 64 24 3c 48 89 df e8 23 47 4c e1 41 83 fc 01 19 c0 48 83 [ 10.731470] RIP [<ffffffffa0133df0>] ahci_hw_interrupt+0x100/0x130 [libahci] [ 10.739441] RSP <ffff880033c03d98> [ 10.743333] CR2: 000000000000003c [ 10.747032] ---[ end trace b6e82636970e2690 ]--- [ 10.760190] Kernel panic - not syncing: Fatal exception in interrupt [ 10.767291] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff) Cc: Alexander Gordeev <agordeev@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-of-by: David Milburn <dmilburn@redhat.com> Fixes: 5ca72c4f7c41 ("AHCI: Support multiple MSIs")
* pata_samsung_cf: fix ata_host_activate() failure handlingBartlomiej Zolnierkiewicz2014-04-151-3/+7
| | | | | | | | | | Add missing clk_disable() call to ata_host_activate() failure path. Cc: Ben Dooks <ben-linux@fluff.org> Cc: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Reviewed-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* pata_arasan_cf: fix ata_host_activate() failure handlingBartlomiej Zolnierkiewicz2014-04-141-2/+5
| | | | | | | | | | | Add missing cf_exit() and clk_put() calls to ata_host_activate() failure path. Cc: Viresh Kumar <viresh.linux@gmail.com> Cc: Shiraz Hashim <shiraz.hashim@st.com> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Tejun Heo <tj@kernel.org>
* ata: fix i.MX AHCI driver dependenciesJean Delvare2014-04-081-1/+1
| | | | | | | | | | | The ahci_imx driver is only needed on Freescale i.MX platforms so don't let it be built on other platforms, except for build test purpose. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Tejun Heo <tj@kernel.org> Cc: Richard Zhu <r65037@freescale.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* pata_at91: fix ata_host_activate() failure handlingBartlomiej Zolnierkiewicz2014-04-021-5/+6
| | | | | | | | | | | | | | | | | | | Add missing clk_put() call to ata_host_activate() failure path. Sergei says, "Hm, I have once fixed that (see that *if* (!ret)) but looks like a later commit 477c87e90853d136b188c50c0e4a93d01cad872e (ARM: at91/pata: use gpio_is_valid to check the gpio) broke it again. :-( Would be good if the changelog did mention that..." Cc: Andrew Victor <linux@maxim.org.za> Cc: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
* libata: Update queued trim blacklist for M5x0 drivesMartin K. Petersen2014-04-021-2/+4
| | | | | | | | | | | | | | | | Crucial/Micron M500 drives properly support queued DSM TRIM starting with firmware MU05. Update the blacklist so we only disable queued trim for older firmware releases. Early M550 series drives suffer from the same issue as M500. A bugfix firmware is in the pipeline but not ready yet. Until then, blacklist queued trim for M550. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Chris Samuel <chris@csamuel.org> Cc: Marc MERLIN <marc@merlins.org> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
* libata: make AHCI_XGENE depend on PHY_XGENETejun Heo2014-04-021-2/+1
| | | | | | | | | | | | | | | | | | | | AHCI_XGENE is only applicable on ARM64 but it can also be enabled for compile testing; however, AHCI_XGENE selects PHY_XGENE which has other arch specific dependencies. This leads to the following warning when enabling it on other archs for compile testing. warning: (AHCI_XGENE) selects PHY_XGENE which has unmet direct dependencies (HAS_IOMEM && OF && (ARM64 || COMPILE_TEST)) Selecting a config option which itself has dependencies can easily lead to broken configurations. For now, let's just make AHCI_XGENE depend on PHY_XGENE which has all the necessary dependencies already. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Loc Ho <lho@apm.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Kishon Vijay Abraham I <kishon@ti.com>
* Merge branch 'for-3.15/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds2014-04-0147-6279/+6603
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block driver update from Jens Axboe: "On top of the core pull request, here's the pull request for the driver related changes for 3.15. It contains: - Improvements for msi-x registration for block drivers (mtip32xx, skd, cciss, nvme) from Alexander Gordeev. - A round of cleanups and improvements for drbd from Andreas Gruenbacher and Rashika Kheria. - A round of clanups and improvements for bcache from Kent. - Removal of sleep_on() and friends in DAC960, ataflop, swim3 from Arnd Bergmann. - Bug fix for a bug in the mtip32xx async completion code from Sam Bradshaw. - Bug fix for accidentally bouncing IO on 32-bit platforms with mtip32xx from Felipe Franciosi" * 'for-3.15/drivers' of git://git.kernel.dk/linux-block: (103 commits) bcache: remove nested function usage bcache: Kill bucket->gc_gen bcache: Kill unused freelist bcache: Rework btree cache reserve handling bcache: Kill btree_io_wq bcache: btree locking rework bcache: Fix a race when freeing btree nodes bcache: Add a real GC_MARK_RECLAIMABLE bcache: Add bch_keylist_init_single() bcache: Improve priority_stats bcache: Better alloc tracepoints bcache: Kill dead cgroup code bcache: stop moving_gc marking buckets that can't be moved. bcache: Fix moving_pred() bcache: Fix moving_gc deadlocking with a foreground write bcache: Fix discard granularity bcache: Fix another bug recovering from unclean shutdown bcache: Fix a bug recovering from unclean shutdown bcache: Fix a journalling reclaim after recovery bug bcache: Fix a null ptr deref in journal replay ...
| * Merge branch 'bcache-for-3.15' of git://evilpiepirate.org/~kent/linux-bcache ↵Jens Axboe2014-03-1818-784/+664
| |\ | | | | | | | | | | | | | | | | | | | | | | | | into for-3.15/drivers Kent writes: Jens, here's the bcache changes for 3.15. Lots of bugfixes, and some refactoring and cleanups.
| | * bcache: remove nested function usageJohn Sheu2014-03-182-72/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Uninlined nested functions can cause crashes when using ftrace, as they don't follow the normal calling convention and confuse the ftrace function graph tracer as it examines the stack. Also, nested functions are supported as a gcc extension, but may fail on other compilers (e.g. llvm). Signed-off-by: John Sheu <john.sheu@gmail.com>
| | * bcache: Kill bucket->gc_genKent Overstreet2014-03-184-11/+9
| | | | | | | | | | | | | | | | | | | | | | | | gc_gen was a temporary used to recalculate last_gc, but since we only need bucket->last_gc when gc isn't running (gc_mark_valid = 1), we can just update last_gc directly. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: Kill unused freelistKent Overstreet2014-03-186-129/+112
| | | | | | | | | | | | | | | | | | | | | | | | This was originally added as at optimization that for various reasons isn't needed anymore, but it does add a lot of nasty corner cases (and it was responsible for some recently fixed bugs). Just get rid of it now. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: Rework btree cache reserve handlingKent Overstreet2014-03-186-139/+145
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This changes the bucket allocation reserves to use _real_ reserves - separate freelists - instead of watermarks, which if nothing else makes the current code saner to reason about and is going to be important in the future when we add support for multiple btrees. It also adds btree_check_reserve(), which checks (and locks) the reserves for both bucket allocation and memory allocation for btree nodes; the old code just kinda sorta assumed that since (e.g. for btree node splits) it had the root locked and that meant no other threads could try to make use of the same reserve; this technically should have been ok for memory allocation (we should always have a reserve for memory allocation (the btree node cache is used as a reserve and we preallocate it)), but multiple btrees will mean that locking the root won't be sufficient anymore, and for the bucket allocation reserve it was technically possible for the old code to deadlock. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: Kill btree_io_wqKent Overstreet2014-03-183-24/+2
| | | | | | | | | | | | | | | | | | | | | | | | With the locking rework in the last patch, this shouldn't be needed anymore - btree_node_write_work() only takes b->write_lock which is never held for very long. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: btree locking reworkKent Overstreet2014-03-184-52/+133
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new lock, b->write_lock, which is required to actually modify - or write - a btree node; this lock is only held for short durations. This means we can write out a btree node without taking b->lock, which _is_ held for long durations - solving a deadlock when btree_flush_write() (from the journalling code) is called with a btree node locked. Right now just occurs in bch_btree_set_root(), but with an upcoming journalling rework is going to happen a lot more. This also turns b->lock is now more of a read/intent lock instead of a read/write lock - but not completely, since it still blocks readers. May turn it into a real intent lock at some point in the future. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: Fix a race when freeing btree nodesKent Overstreet2014-03-181-33/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This isn't a bulletproof fix; btree_node_free() -> bch_bucket_free() puts the bucket on the unused freelist, where it can be reused right away without any ordering requirements. It would be better to wait on at least a journal write to go down before reusing the bucket. bch_btree_set_root() does this, and inserting into non leaf nodes is completely synchronous so we should be ok, but future patches are just going to get rid of the unused freelist - it was needed in the past for various reasons but shouldn't be anymore. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: Add a real GC_MARK_RECLAIMABLEKent Overstreet2014-03-184-14/+21
| | | | | | | | | | | | | | | | | | | | | This means the garbage collection code can better check for data and metadata pointers to the same buckets. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: Add bch_keylist_init_single()Kent Overstreet2014-03-182-4/+7
| | | | | | | | | | | | | | | | | | | | | This will potentially save us an allocation when we've got inode/dirent bkeys that don't fit in the keylist's inline keys. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: Improve priority_statsKent Overstreet2014-03-181-6/+20
| | | | | | | | | | | | | | | | | | Break down data into clean data/dirty data/metadata. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: Better alloc tracepointsKent Overstreet2014-03-183-19/+46
| | | | | | | | | | | | | | | | | | | | | Change the invalidate tracepoint to indicate how much data we're invalidating, and change the alloc tracepoints to indicate what offset they're for. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: Kill dead cgroup codeKent Overstreet2014-03-185-202/+0
| | | | | | | | | | | | | | | | | | This hasn't been used or even enabled in ages. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: stop moving_gc marking buckets that can't be moved.Nicholas Swenson2014-03-181-1/+4
| | | | | | | | | | | | Signed-off-by: Nicholas Swenson <nks@daterainc.com>
| | * bcache: Fix moving_pred()Kent Overstreet2014-03-181-5/+3
| | | | | | | | | | | | | | | | | | Avoid a potential null pointer deref (e.g. from check keys for cache misses) Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: Fix moving_gc deadlocking with a foreground writeNicholas Swenson2014-03-185-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Deadlock happened because a foreground write slept, waiting for a bucket to be allocated. Normally the gc would mark buckets available for invalidation. But the moving_gc was stuck waiting for outstanding writes to complete. These writes used the bcache_wq, the same queue foreground writes used. This fix gives moving_gc its own work queue, so it was still finish moving even if foreground writes are stuck waiting for allocation. It also makes work queue a parameter to the data_insert path, so moving_gc can use its workqueue for writes. Signed-off-by: Nicholas Swenson <nks@daterainc.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: Fix discard granularityKent Overstreet2014-03-181-0/+1
| | | | | | | | | | | | | | | | | | blk_stack_limits() doesn't like a discard granularity of 0. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: Fix another bug recovering from unclean shutdownKent Overstreet2014-03-183-65/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The on disk bucket gens are allowed to be out of date, when we reuse buckets that didn't have any live data in them. To deal with this, the initial gc has to update the bucket gen when we find a pointer gen newer than the bucket's gen. Unfortunately we weren't doing this for pointers in the journal that we're about to replay. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: Fix a bug recovering from unclean shutdownKent Overstreet2014-03-181-2/+2
| | | | | | | | | | | | | | | | | | | | | The code to fixup incorrect bucket prios incorrectly did not skip btree node freeing keys Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: Fix a journalling reclaim after recovery bugKent Overstreet2014-03-181-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | On recovery we weren't correctly keeping track of what journal buckets had open journal entries, thus it was possible for them to be overwritten until we'd written all new journal entries. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: Fix a null ptr deref in journal replayKent Overstreet2014-03-171-1/+5
| | | | | | | | | | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: Fix a lockdep splat in an error pathKent Overstreet2014-03-171-3/+5
| | | | | | | | | | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: Fix a shutdown bugKent Overstreet2014-02-253-2/+12
| | | | | | | | | | | | | | | | | | Shutdown wasn't cancelling/waiting on journal_write_work() Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: Fix flash_dev_cache_miss() for real this timeKent Overstreet2014-02-251-14/+5
| | | | | | | | | | | | | | | | | | | | | The code was using sectors to count the number of sectors it was zeroing... but then it passed it to bio_advance()... after it had been set to 0. Amusing... Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| | * bcache: Fix another compiler warning on m68kKent Overstreet2014-02-181-2/+2
| | | | | | | | | | | | | | | | | | | | | Use a bigger hammer this time Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org>
| * | mtip32xx: mtip_async_complete() bug fixesSam Bradshaw2014-03-132-39/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes 2 issues in the fast completion path: 1) Possible double completions / double dma_unmap_sg() calls due to lack of atomicity in the check and subsequent dereference of the upper layer callback function. Fixed with cmpxchg before unmap and callback. 2) Regression in unaligned IO constraining workaround for p420m devices. Fixed by checking if IO is unaligned and using proper semaphore if so. Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
| * | mtip32xx: Unmap the DMA segments before completing the IO requestFelipe Franciosi2014-03-131-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | If the buffers are unmapped after completing a request, then stale data might be in the request. Signed-off-by: Felipe Franciosi <felipe@paradoxo.org> Cc: stable@kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
| * | mtip32xx: Set queue bounce limitFelipe Franciosi2014-03-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | We need to set the queue bounce limit during the device initialization to prevent excessive bouncing on 32 bit architectures. Signed-off-by: Felipe Franciosi <felipe@paradoxo.org> Cc: stable@kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
| * | nvme: Use pci_enable_msi_range() and pci_enable_msix_range()Alexander Gordeev2014-03-131-24/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As result of deprecation of MSI-X/MSI enablement functions pci_enable_msix() and pci_enable_msi_block() all drivers using these two interfaces need to be updated to use the new pci_enable_msi_range() or pci_enable_msi_exact() and pci_enable_msix_range() or pci_enable_msix_exact() interfaces. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-nvme@lists.infradead.org Cc: linux-pci@vger.kernel.org Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | cciss: Fallback to MSI rather than to INTx if MSI-X failedAlexander Gordeev2014-03-131-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the driver falls back to INTx mode when MSI-X initialization failed. This is a suboptimal behaviour for chips that also support MSI. This update changes that behaviour and falls back to MSI mode in case MSI-X mode initialization failed. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Mike Miller <mike.miller@hp.com> Cc: iss_storagedev@hp.com Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-pci@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
| * | swim3: fix interruptible_sleep_on raceArnd Bergmann2014-03-131-7/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | interruptible_sleep_on is racy and going away. This replaces the one caller in the swim3 driver with the equivalent race-free wait_event_interruptible call. Since we're here already, this also fixes the case where we get interrupted from atomic context, which used to just spin in the loop. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | ataflop: fix sleep_on racesArnd Bergmann2014-03-131-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sleep_on() is inherently racy, and has been deprecated for a long time. This fixes two instances in the atari floppy driver: * fdc_wait/fdc_busy becomes an open-coded mutex. We cannot use the regular mutex since it gets released in interrupt context. The open-coded version using wait_event() and cmpxchg() is equivalent to the existing code but does the checks atomically, and we can now safely check the condition with irqs enabled. * format_wait becomes a completion, which is the natural structure here. The format ioctl waits for the background task to either complete or abort. This does not attempt to fix the preexisting bug of calling schedule with local interrupts disabled. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michael Schmitz <schmitz@biophys.uni-duesseldorf.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | DAC960: remove sleep_on usageArnd Bergmann2014-03-131-18/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sleep_on and its variants are going away. The use of sleep_on() in DAC960_V2_ExecuteUserCommand seems to be bogus because the command by the time we get there, the command has completed already and we just enter the timeout. Based on this interpretation, I concluded that we can replace it with a simple msleep(1000) and rearrange the code around it slightly. The interruptible_sleep_on_timeout in DAC960_gam_ioctl seems equivalent to the race-free version using wait_event_interruptible_timeout. I left the driver to return -EINTR rather than -ERESTARTSYS to preserve the timeout behavior. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | mtip32xx: Use pci_enable_msi() instead of pci_enable_msi_range()Alexander Gordeev2014-03-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit "mtip32xx: Use pci_enable_msix_range() instead of pci_enable_msix()" was unnecessary, since pci_enable_msi() function is not deprecated and is still preferable for enabling the single MSI mode. This update reverts usage of pci_enable_msi() function. Besides, the changelog for that commit was bogus, since mtip32xx driver uses MSI interrupt, not MSI-X. Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: linux-pci@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
OpenPOWER on IntegriCloud