When a ZPODD device is unbound via sysfs, the ACPI notify handler
is not removed. This causes panics as observed in Bug #74601. The
panic only happens when the wake-up comes from outside the kernel
(i.e. inserting media or pressing a button). Add a loop to
ata_port_detach() that walks the port's devices, checks whether
ZPODD is enabled on each one, and if so calls zpodd_exit().
Cc: stable@vger.kernel.org
Reviewed-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Levente Kurusa <levex@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
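The detach-time loop described above can be modelled in userspace. This is a sketch under stated assumptions: `struct model_dev`, `struct model_port`, the `zpodd_enabled`/`notify_installed` flags and `zpodd_exit_stub()` are hypothetical stand-ins, not libata's types. The kernel version would walk the port with libata's own link and device iterators.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVS 2 /* hypothetical: devices per port in this model */

/* Hypothetical device model: only the state the detach loop needs. */
struct model_dev {
	bool zpodd_enabled;    /* ZPODD was set up for this device */
	bool notify_installed; /* ACPI notify handler still registered */
};

struct model_port {
	struct model_dev devs[MAX_DEVS];
	int ndevs;
};

/* Stand-in for zpodd_exit(): drop the notify handler and ZPODD state. */
static void zpodd_exit_stub(struct model_dev *dev)
{
	dev->notify_installed = false;
	dev->zpodd_enabled = false;
}

/* Detach-time cleanup: visit every device on the port and undo ZPODD setup. */
static void port_detach_zpodd_cleanup(struct model_port *ap)
{
	for (int i = 0; i < ap->ndevs; i++) {
		struct model_dev *dev = &ap->devs[i];

		if (dev->zpodd_enabled)
			zpodd_exit_stub(dev);
	}
}
```

Without the loop, a device unbound via sysfs would keep `notify_installed == true`, which is the stale-handler state the commit describes.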
When suspending imx6q systems that have the rootfs on SATA, the following
error will likely be seen on resume. The SATA link fails to come
up, which results in an unusable system after the suspend/resume
cycle.
$ echo mem > /sys/power/state
PM: Syncing filesystems ... done.
PM: Preparing system for mem sleep
Freezing user space processes ... (elapsed 0.002 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
PM: Entering mem sleep
sd 0:0:0:0: [sda] Synchronizing SCSI cache
sd 0:0:0:0: [sda] Stopping disk
PM: suspend of devices complete after 61.914 msecs
PM: suspend devices took 0.070 seconds
PM: late suspend of devices complete after 4.906 msecs
PM: noirq suspend of devices complete after 4.521 msecs
Disabling non-boot CPUs ...
CPU1: shutdown
CPU2: shutdown
CPU3: shutdown
Enabling non-boot CPUs ...
CPU1: Booted secondary processor
CPU1 is up
CPU2: Booted secondary processor
CPU2 is up
CPU3: Booted secondary processor
CPU3 is up
PM: noirq resume of devices complete after 10.486 msecs
PM: early resume of devices complete after 4.679 msecs
sd 0:0:0:0: [sda] Starting disk
PM: resume of devices complete after 22.674 msecs
PM: resume devices took 0.030 seconds
PM: Finishing wakeup.
Restarting tasks ... done.
$ ata1: SATA link down (SStatus 1 SControl 300)
ata1: SATA link down (SStatus 1 SControl 300)
ata1: limiting SATA link speed to 1.5 Gbps
ata1: SATA link down (SStatus 1 SControl 310)
ata1.00: disabled
ata1: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen t4
ata1: irq_stat 0x00000040, connection status changed
ata1: SError: { CommWake DevExch }
ata1: hard resetting link
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: killing request
sd 0:0:0:0: rejecting I/O to offline device
Aborting journal on device sda2-8.
sd 0:0:0:0: rejecting I/O to offline device
EXT4-fs warning (device sda2): ext4_end_bio:317: I/O error writing to inode 132577 (offset 0 size 0 starting block 26235)
Buffer I/O error on device sda2, logical block 10169
...
It's caused by a silicon issue: the SATA PHY does not get reset by the
controller when coming back from LPM. The patch adds a software
workaround for this issue. It enforces a software reset of the SATA PHY
in the imx_sata_enable() function, so that we can ensure the SATA link will
come up properly on both power-on and resume.
The software reset is implemented by writing the PHY reset register through
the PHY control register bus interface. Functions
imx_phy_reg_[addressing|write|read]() implement this bus interface, while
imx_sata_phy_reset() performs the actual reset operation.
Signed-off-by: Richard Zhu <r65037@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Update the register enums a little to add a proper namespace prefix, and
make the names match the i.MX reference manual.
Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
On Intel Valleyview SoC, SATA device sleep is not reliable. When
DEVSLP is attempted on certain SSDs, the port_devslp write fails
and leaves the AHCI controller malfunctioning. The AHCI controller may
not show up in PCI enumeration after reset, and completely removing
power may be required to recover from this failure. So we blacklist
this device and override the capabilities reported by the host so that
device LPM will only attempt slumber, not DEVSLP.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
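The capability override can be pictured as masking bits out of the saved CAP2 value before the LPM code looks at it. This is a sketch: the bit positions follow the AHCI 1.3 CAP2 layout (SDS at bit 3, SADM at bit 4), while `FLAG_NO_DEVSLP` and `mask_devslp_caps()` are hypothetical names and not necessarily what the patch uses.

```c
#include <assert.h>
#include <stdint.h>

/* AHCI CAP2 bits (AHCI 1.3) relevant to device sleep. */
#define CAP2_SDS  (1u << 3) /* Supports Device Sleep */
#define CAP2_SADM (1u << 4) /* Supports Aggressive Device Sleep Management */

/* Hypothetical quirk flag set for the blacklisted controller. */
#define FLAG_NO_DEVSLP (1u << 0)

/*
 * Return the CAP2 value the driver should act on: for a quirked
 * controller, hide DEVSLP so link power management only uses slumber.
 */
static uint32_t mask_devslp_caps(uint32_t cap2, unsigned int quirk_flags)
{
	if ((quirk_flags & FLAG_NO_DEVSLP) && (cap2 & CAP2_SDS))
		cap2 &= ~(CAP2_SDS | CAP2_SADM);
	return cap2;
}
```

Other CAP2 bits pass through untouched, so only the DEVSLP path is disabled.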
The AHCI spec allows implementations to issue commands in tag order
rather than FIFO order:
5.3.2.12 P:SelectCmd
HBA sets pSlotLoc = (pSlotLoc + 1) mod (CAP.NCS + 1)
or HBA selects the command to issue that has had the
PxCI bit set to '1' longer than any other command
pending to be issued.
The result is that commands posted sequentially (time-wise) may play out
of sequence when issued by hardware.
This behavior has likely been hidden by drives that arrange for commands
to complete in issue order. However, it appears recent drives (two from
different vendors that we have found so far) inflict out-of-order
completions as a matter of course. So, we need to take care to maintain
ordered submission, otherwise we risk triggering a drive to fall out of
sequential-io automation and back to random-io processing, which incurs
large latency and degrades throughput.
This issue was found in simple benchmarks where QD=2 seq-write
performance was 30-50% *greater* than QD=32 seq-write performance.
Tagging for -stable and making the change globally since it has a low
risk-to-reward ratio. Also, word is that recent versions of an unnamed
OS now do it this way too, so drives in the field are already
experienced with this tag ordering scheme.
Cc: <stable@vger.kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ed Ciechanowski <ed.ciechanowski@intel.com>
Reviewed-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
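To keep tag order aligned with submission order, a tag allocator can hand out the next free tag after the last one allocated (round-robin), instead of always picking the lowest free tag. This is a standalone sketch of that idea; `struct tag_alloc`, `tag_get()` and `tag_put()` are hypothetical and this is not libata's code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TAGS 32 /* e.g. CAP.NCS + 1 command slots */

/* Hypothetical allocator state: a busy flag per tag plus the last tag handed out. */
struct tag_alloc {
	bool busy[MAX_TAGS];
	int last_tag;
};

static void tag_alloc_init(struct tag_alloc *ta)
{
	for (int i = 0; i < MAX_TAGS; i++)
		ta->busy[i] = false;
	ta->last_tag = MAX_TAGS - 1; /* so the first search starts at tag 0 */
}

/*
 * Search from just after the last allocated tag and wrap around, so
 * sequential submissions get ascending tags even when lower tags were
 * freed in the meantime. Returns -1 if every tag is in use.
 */
static int tag_get(struct tag_alloc *ta)
{
	for (int i = 0; i < MAX_TAGS; i++) {
		int tag = (ta->last_tag + 1 + i) % MAX_TAGS;

		if (!ta->busy[tag]) {
			ta->busy[tag] = true;
			ta->last_tag = tag;
			return tag;
		}
	}
	return -1;
}

static void tag_put(struct tag_alloc *ta, int tag)
{
	assert(ta->busy[tag]); /* freeing an idle tag would be a bug */
	ta->busy[tag] = false;
}
```

With a lowest-free-tag policy, allocating after freeing tag 0 would return 0 again and place a newer command in a lower slot. An HBA issuing in slot order (pSlotLoc) would then run it first.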
In multiple MSI mode, all AHCI ports (including dummy ones) are assigned
separate MSI vectors and, as a result of calling
pci_enable_msi_exact(), separate IRQ numbers mapped to those
MSI vectors.
Therefore, although interrupts from dummy ports are not desired, they
are still enabled. We do not request IRQs for dummy ports, but that
only means we do not assign AHCI-specific ISRs to the corresponding IRQ
numbers.
As a result, dummy port interrupts can still arrive and travel all the
way from the PCI device to the kernel, causing unnecessary overhead.
This update disables IRQs for dummy ports and so prevents the described
issue.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: David Milburn <dmilburn@redhat.com>
Cc: linux-ide@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 5ca72c4f7c41 ("AHCI: Support multiple MSIs")
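A model of per-port vector setup in which a dummy port's vector is disabled, not merely left without a handler. `struct model_vec`, `setup_port_vectors()` and the `is_dummy` array are hypothetical. In the kernel, the equivalent steps would be IRQ registration for real ports and something like `disable_irq()` for dummy ones; this sketch only records the decision.

```c
#include <assert.h>
#include <stdbool.h>

#define NPORTS 6

/* Hypothetical per-vector state: one MSI vector per port in multi-MSI mode. */
struct model_vec {
	bool handler_installed; /* a per-port ISR was registered */
	bool enabled;           /* the IRQ line is left enabled */
};

/*
 * Real ports get a handler. Dummy ports get no handler and a disabled
 * vector, so stray interrupts are not delivered to the IRQ core.
 */
static void setup_port_vectors(struct model_vec vec[NPORTS],
			       const bool is_dummy[NPORTS])
{
	for (int i = 0; i < NPORTS; i++) {
		if (is_dummy[i]) {
			vec[i].handler_installed = false;
			vec[i].enabled = false;
			continue;
		}
		vec[i].handler_installed = true;
		vec[i].enabled = true;
	}
}
```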
The driver calls pci_enable_msi_range() with the range
[nvec..nvec], which is exactly what pci_enable_msi_exact() is for.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: linux-ide@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
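The relationship between the two calls can be modelled in userspace. In this sketch, `fake_enable_msi_range()` is a hypothetical stand-in for the PCI core that assumes the device can grant up to `avail` vectors. The exact variant is a range request with min == max that returns 0 on success rather than a vector count, as the kernel's inline wrapper did in this era.

```c
#include <assert.h>
#include <errno.h>

/*
 * Stand-in for pci_enable_msi_range(): grant as many vectors as possible
 * in [minvec, maxvec], given that the device supports `avail`.
 * Returns the number granted, or a negative errno.
 */
static int fake_enable_msi_range(int avail, int minvec, int maxvec)
{
	if (minvec < 1 || maxvec < minvec)
		return -EINVAL;

	int granted = maxvec < avail ? maxvec : avail;

	if (granted < minvec)
		return -ENOSPC;
	return granted;
}

/* Exact request: the range [nvec..nvec]; 0 on success, negative errno otherwise. */
static int fake_enable_msi_exact(int avail, int nvec)
{
	int rc = fake_enable_msi_range(avail, nvec, nvec);

	if (rc < 0)
		return rc;
	return 0;
}
```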
The AHCI specification allows hardware to choose to revert to
single MSI mode when fewer messages are allocated than requested.
Yet the ICH10 chipset, at least, reverts to single MSI mode in some
cases even when enough messages are allocated (see below).
This update makes the driver not rely on the initialization of
multiple MSI mode alone, but always check whether "MSI Revert to
Single Message" (MRSM) mode was enforced by the controller and
fall back to single MSI mode if it was.
That prevents a situation where the driver has configured multiple
per-port IRQ handlers but the controller sends all ports'
interrupts to a single IRQ, which could easily break
interrupt handling and lead to delays and possibly crashes.
The fix was tested on a 6-port controller that successfully
reverted to the single MSI mode:
00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA
AHCI Controller (prog-if 01 [AHCI 1.0])
Subsystem: Super Micro Computer Inc Device 10a7
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 101
I/O ports at f110 [size=8]
I/O ports at f100 [size=4]
I/O ports at f0f0 [size=8]
I/O ports at f0e0 [size=4]
I/O ports at f020 [size=32]
Memory at fbf00000 (32-bit, non-prefetchable) [size=2K]
Capabilities: [80] MSI: Enable+ Count=1/16 Maskable- 64bit-
Capabilities: [70] Power Management version 3
Capabilities: [a8] SATA HBA v1.0
Capabilities: [b0] PCI Advanced Features
Kernel driver in use: ahci
With 6 ports, 8 MSI vectors should be enough, but the adapter
enforces MRSM mode when fewer than 16 vectors are written to
the Multiple Message Enable PCI register. I triggered MRSM mode
by forcing @nvec to 8 in ahci_init_interrupts().
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: linux-ide@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
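The fallback decision comes down to one register check after multi-MSI setup. This sketch assumes the AHCI Global HBA Control (GHC) layout, where MRSM is bit 2. `choose_irq_mode()` is hypothetical; a real driver reads GHC from the controller's MMIO space instead of taking it as a parameter.

```c
#include <assert.h>
#include <stdint.h>

#define GHC_IE   (1u << 1)  /* Interrupt Enable */
#define GHC_MRSM (1u << 2)  /* MSI Revert to Single Message (read-only) */
#define GHC_AE   (1u << 31) /* AHCI Enable */

enum irq_mode { IRQ_SINGLE_MSI, IRQ_MULTI_MSI };

/*
 * After allocating `nvec` MSI vectors, trust multi-MSI mode only if the
 * controller did not set MRSM. Otherwise all ports share one vector.
 */
static enum irq_mode choose_irq_mode(uint32_t ghc, int nvec)
{
	if (nvec > 1 && !(ghc & GHC_MRSM))
		return IRQ_MULTI_MSI;
	return IRQ_SINGLE_MSI;
}
```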
System may crash in ahci_hw_interrupt() or ahci_thread_fn() when
accessing the interrupt status in a port's private_data if the port is
actually a DUMMY port.
00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller
<snip console output for linux-3.15-rc1>
[ 9.352080] ahci 0000:00:1f.2: AHCI 0001.0200 32 slots 6 ports 3 Gbps 0x1 impl SATA mode
[ 9.352084] ahci 0000:00:1f.2: flags: 64bit ncq sntf pm led clo pio slum part ccc
[ 9.368155] Console: switching to colour frame buffer device 128x48
[ 9.439759] mgag200 0000:11:00.0: fb0: mgadrmfb frame buffer device
[ 9.446765] mgag200 0000:11:00.0: registered panic notifier
[ 9.470166] scsi1 : ahci
[ 9.479166] scsi2 : ahci
[ 9.488172] scsi3 : ahci
[ 9.497174] scsi4 : ahci
[ 9.506175] scsi5 : ahci
[ 9.515174] scsi6 : ahci
[ 9.518181] ata1: SATA max UDMA/133 abar m2048@0x95c00000 port 0x95c00100 irq 91
[ 9.526448] ata2: DUMMY
[ 9.529182] ata3: DUMMY
[ 9.531916] ata4: DUMMY
[ 9.534650] ata5: DUMMY
[ 9.537382] ata6: DUMMY
[ 9.576196] [drm] Initialized mgag200 1.0.0 20110418 for 0000:11:00.0 on minor 0
[ 9.845257] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 9.865161] ata1.00: ATAPI: Optiarc DVD RW AD-7580S, FX04, max UDMA/100
[ 9.891407] ata1.00: configured for UDMA/100
[ 9.900525] scsi 1:0:0:0: CD-ROM Optiarc DVD RW AD-7580S FX04 PQ: 0 ANSI: 5
[ 10.247399] iTCO_vendor_support: vendor-support=0
[ 10.261572] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.11
[ 10.269764] iTCO_wdt: unable to reset NO_REBOOT flag, device disabled by hardware/BIOS
[ 10.301932] sd 0:2:0:0: [sda] 570310656 512-byte logical blocks: (291 GB/271 GiB)
[ 10.317085] sd 0:2:0:0: [sda] Write Protect is off
[ 10.328326] sd 0:2:0:0: [sda] Write cache: disabled, read cache: disabled, supports DPO and FUA
[ 10.375452] BUG: unable to handle kernel NULL pointer dereference at 000000000000003c
[ 10.384217] IP: [<ffffffffa0133df0>] ahci_hw_interrupt+0x100/0x130 [libahci]
[ 10.392101] PGD 0
[ 10.394353] Oops: 0000 [#1] SMP
[ 10.397978] Modules linked in: sr_mod(+) cdrom sd_mod iTCO_wdt crc_t10dif iTCO_vendor_support crct10dif_common ahci libahci libata lpc_ich mfd_core mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm i2c_core megaraid_sas dm_mirror dm_region_hash dm_log dm_mod
[ 10.426499] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.15.0-rc1 #1
[ 10.433495] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S013.032920111005 03/29/2011
[ 10.443886] task: ffffffff81906460 ti: ffffffff818f0000 task.ti: ffffffff818f0000
[ 10.452239] RIP: 0010:[<ffffffffa0133df0>] [<ffffffffa0133df0>] ahci_hw_interrupt+0x100/0x130 [libahci]
[ 10.462838] RSP: 0018:ffff880033c03d98 EFLAGS: 00010046
[ 10.468767] RAX: 0000000000a400a4 RBX: ffff880029a6bc18 RCX: 00000000fffffffa
[ 10.476731] RDX: 00000000000000a4 RSI: ffff880029bb0000 RDI: ffff880029a6bc18
[ 10.484696] RBP: ffff880033c03dc8 R08: 0000000000000000 R09: ffff88002f800490
[ 10.492661] R10: 0000000000000000 R11: 0000000000000005 R12: 0000000000000000
[ 10.500625] R13: ffff880029a6bd98 R14: 0000000000000000 R15: ffffc90000194000
[ 10.508590] FS: 0000000000000000(0000) GS:ffff880033c00000(0000) knlGS:0000000000000000
[ 10.517623] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 10.524035] CR2: 000000000000003c CR3: 00000000328ff000 CR4: 00000000000007b0
[ 10.531999] Stack:
[ 10.534241] 0000000000000017 ffff880031ba7d00 000000000000005c ffff880031ba7d00
[ 10.542535] 0000000000000000 000000000000005c ffff880033c03e10 ffffffff810c2a1e
[ 10.550827] ffff880031ae2900 000000008108fb4f ffff880031ae2900 ffff880031ae2984
[ 10.559121] Call Trace:
[ 10.561849] <IRQ>
[ 10.563994] [<ffffffff810c2a1e>] handle_irq_event_percpu+0x3e/0x1a0
[ 10.571309] [<ffffffff810c2bbd>] handle_irq_event+0x3d/0x60
[ 10.577631] [<ffffffff810c4fdd>] try_one_irq.isra.6+0x8d/0xf0
[ 10.584142] [<ffffffff810c5313>] note_interrupt+0x173/0x1f0
[ 10.590460] [<ffffffff810c2a8e>] handle_irq_event_percpu+0xae/0x1a0
[ 10.597554] [<ffffffff810c2bbd>] handle_irq_event+0x3d/0x60
[ 10.603872] [<ffffffff810c5727>] handle_edge_irq+0x77/0x130
[ 10.610199] [<ffffffff81014b8f>] handle_irq+0xbf/0x150
[ 10.616040] [<ffffffff8109ff4e>] ? vtime_account_idle+0xe/0x50
[ 10.622654] [<ffffffff815fca1a>] ? atomic_notifier_call_chain+0x1a/0x20
[ 10.630140] [<ffffffff816038cf>] do_IRQ+0x4f/0xf0
[ 10.635490] [<ffffffff815f8aed>] common_interrupt+0x6d/0x6d
[ 10.641805] <EOI>
[ 10.643950] [<ffffffff8149ca9f>] ? cpuidle_enter_state+0x4f/0xc0
[ 10.650972] [<ffffffff8149ca98>] ? cpuidle_enter_state+0x48/0xc0
[ 10.657775] [<ffffffff8149cb47>] cpuidle_enter+0x17/0x20
[ 10.663807] [<ffffffff810b0070>] cpu_startup_entry+0x2c0/0x3d0
[ 10.670423] [<ffffffff815dfcc7>] rest_init+0x77/0x80
[ 10.676065] [<ffffffff81a60f47>] start_kernel+0x40f/0x41a
[ 10.682190] [<ffffffff81a60941>] ? repair_env_string+0x5c/0x5c
[ 10.688799] [<ffffffff81a60120>] ? early_idt_handlers+0x120/0x120
[ 10.695699] [<ffffffff81a605ee>] x86_64_start_reservations+0x2a/0x2c
[ 10.702889] [<ffffffff81a60733>] x86_64_start_kernel+0x143/0x152
[ 10.709689] Code: a0 fc ff 85 c0 8b 4d d4 74 c3 48 8b 7b 08 89 ca 48 c7 c6 60 66 13 a0 31 c0 e8 9d 70 28 e1 8b 4d d4 eb aa 0f 1f 84 00 00 00 00 00 <45> 8b 64 24 3c 48 89 df e8 23 47 4c e1 41 83 fc 01 19 c0 48 83
[ 10.731470] RIP [<ffffffffa0133df0>] ahci_hw_interrupt+0x100/0x130 [libahci]
[ 10.739441] RSP <ffff880033c03d98>
[ 10.743333] CR2: 000000000000003c
[ 10.747032] ---[ end trace b6e82636970e2690 ]---
[ 10.760190] Kernel panic - not syncing: Fatal exception in interrupt
[ 10.767291] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Milburn <dmilburn@redhat.com>
Fixes: 5ca72c4f7c41 ("AHCI: Support multiple MSIs")
Add missing clk_disable() call to ata_host_activate() failure path.
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Reviewed-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
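The missing-cleanup fixes in this group of commits follow the usual kernel unwinding idiom. Each acquisition gets a label, and a failure jumps to the label that releases everything acquired so far, in reverse order. This userspace sketch uses hypothetical `clk_*_stub()` and `host_activate_stub()` functions in place of the clock API and ata_host_activate(), with counters so leaks are visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical resource counters so the sketch can detect leaks. */
static int clk_refs;    /* clk_get()/clk_put() balance */
static int clk_enabled; /* clk_enable()/clk_disable() balance */

static int clk_get_stub(void)      { clk_refs++; return 0; }
static void clk_put_stub(void)     { assert(clk_refs > 0); clk_refs--; }
static int clk_enable_stub(void)   { clk_enabled++; return 0; }
static void clk_disable_stub(void) { clk_enabled--; }

/* Stand-in for ata_host_activate(); `fail` simulates an error. */
static int host_activate_stub(bool fail) { return fail ? -1 : 0; }

static int probe(bool activate_fails)
{
	int ret;

	ret = clk_get_stub();
	if (ret)
		return ret;

	ret = clk_enable_stub();
	if (ret)
		goto err_put;

	ret = host_activate_stub(activate_fails);
	if (ret)
		goto err_disable; /* the unwinding step these fixes add */

	return 0;

err_disable:
	clk_disable_stub();
err_put:
	clk_put_stub();
	return ret;
}
```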
Add missing cf_exit() and clk_put() calls to ata_host_activate()
failure path.
Cc: Viresh Kumar <viresh.linux@gmail.com>
Cc: Shiraz Hashim <shiraz.hashim@st.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
The ahci_imx driver is only needed on Freescale i.MX platforms so
don't let it be built on other platforms, except for build test
purpose.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Richard Zhu <r65037@freescale.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Add missing clk_put() call to ata_host_activate() failure path.
Sergei says,
"Hm, I have once fixed that (see that *if* (!ret)) but looks like a
later commit 477c87e90853d136b188c50c0e4a93d01cad872e (ARM:
at91/pata: use gpio_is_valid to check the gpio) broke it again. :-(
Would be good if the changelog did mention that..."
Cc: Andrew Victor <linux@maxim.org.za>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Crucial/Micron M500 drives properly support queued DSM TRIM starting
with firmware MU05. Update the blacklist so we only disable queued trim
for older firmware releases.
Early M550 series drives suffer from the same issue as M500. A bugfix
firmware is in the pipeline but not ready yet. Until then, blacklist
queued trim for M550.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Chris Samuel <chris@csamuel.org>
Cc: Marc MERLIN <marc@merlins.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
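A firmware-aware blacklist can be expressed as a table of (model pattern, firmware pattern) pairs, where a missing firmware pattern matches every revision. This sketch uses POSIX `fnmatch()` for the shell-style patterns. The table entries are illustrative only (libata's real entries and its own matcher may differ): M500 is blacklisted only for firmware MU01 to MU04, and M550 for all firmware.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* One entry: shell-style patterns; fw_pattern == NULL matches any firmware. */
struct trim_quirk {
	const char *model_pattern;
	const char *fw_pattern;
};

/* Illustrative entries only: M500 before MU05, and every M550 firmware. */
static const struct trim_quirk no_queued_trim[] = {
	{ "Micron_M500*", "MU0[1-4]*" },
	{ "Micron_M550*", NULL },
};

static bool queued_trim_blacklisted(const char *model, const char *fw)
{
	size_t n = sizeof(no_queued_trim) / sizeof(no_queued_trim[0]);

	for (size_t i = 0; i < n; i++) {
		const struct trim_quirk *q = &no_queued_trim[i];

		if (fnmatch(q->model_pattern, model, 0) != 0)
			continue;
		if (q->fw_pattern == NULL || fnmatch(q->fw_pattern, fw, 0) == 0)
			return true;
	}
	return false;
}
```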
AHCI_XGENE is only applicable on ARM64 but it can also be enabled for
compile testing; however, AHCI_XGENE selects PHY_XGENE which has other
arch specific dependencies. This leads to the following warning when
enabling it on other archs for compile testing.
warning: (AHCI_XGENE) selects PHY_XGENE which has unmet direct
dependencies (HAS_IOMEM && OF && (ARM64 || COMPILE_TEST))
Selecting a config option which itself has dependencies can easily
lead to broken configurations. For now, let's just make AHCI_XGENE
depend on PHY_XGENE which has all the necessary dependencies already.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Loc Ho <lho@apm.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Kishon Vijay Abraham I <kishon@ti.com>
Pull block driver update from Jens Axboe:
"On top of the core pull request, here's the pull request for the
driver related changes for 3.15. It contains:
- Improvements for msi-x registration for block drivers (mtip32xx,
skd, cciss, nvme) from Alexander Gordeev.
- A round of cleanups and improvements for drbd from Andreas
Gruenbacher and Rashika Kheria.
- A round of cleanups and improvements for bcache from Kent.
- Removal of sleep_on() and friends in DAC960, ataflop, swim3 from
Arnd Bergmann.
- Bug fix for a bug in the mtip32xx async completion code from Sam
Bradshaw.
- Bug fix for accidentally bouncing IO on 32-bit platforms with
mtip32xx from Felipe Franciosi"
* 'for-3.15/drivers' of git://git.kernel.dk/linux-block: (103 commits)
bcache: remove nested function usage
bcache: Kill bucket->gc_gen
bcache: Kill unused freelist
bcache: Rework btree cache reserve handling
bcache: Kill btree_io_wq
bcache: btree locking rework
bcache: Fix a race when freeing btree nodes
bcache: Add a real GC_MARK_RECLAIMABLE
bcache: Add bch_keylist_init_single()
bcache: Improve priority_stats
bcache: Better alloc tracepoints
bcache: Kill dead cgroup code
bcache: stop moving_gc marking buckets that can't be moved.
bcache: Fix moving_pred()
bcache: Fix moving_gc deadlocking with a foreground write
bcache: Fix discard granularity
bcache: Fix another bug recovering from unclean shutdown
bcache: Fix a bug recovering from unclean shutdown
bcache: Fix a journalling reclaim after recovery bug
bcache: Fix a null ptr deref in journal replay
...
into for-3.15/drivers
Kent writes:
Jens, here are the bcache changes for 3.15. Lots of bugfixes, and some
refactoring and cleanups.
Uninlined nested functions can cause crashes when using ftrace, as they don't
follow the normal calling convention and confuse the ftrace function graph
tracer as it examines the stack.
Also, nested functions are only supported as a GCC extension and may fail on
other compilers (e.g. LLVM).
Signed-off-by: John Sheu <john.sheu@gmail.com>
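The usual refactoring is to hoist the nested function to file scope and pass the state it used to capture through an explicit context argument. This is a generic before/after sketch, not the actual bcache code; `sum_if_above()`, `visit()` and `for_each()` are hypothetical. A plain static function has a normal frame and calling convention, and needs none of the stack trampolines GCC builds when a nested function's address is taken.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Before (GCC-only): a nested function capturing `threshold` and `sum`
 * from the enclosing frame, e.g.
 *
 *     int sum_if_above(const int *v, size_t n, int threshold) {
 *             int sum = 0;
 *             void visit(int x) { if (x > threshold) sum += x; }
 *             for_each_int(v, n, visit);
 *             return sum;
 *     }
 *
 * After: the callback lives at file scope and receives its state explicitly.
 */
struct sum_ctx {
	int threshold;
	int sum;
};

static void visit(void *arg, int x)
{
	struct sum_ctx *ctx = arg;

	if (x > ctx->threshold)
		ctx->sum += x;
}

/* Generic iterator that calls fn(arg, element) for each element. */
static void for_each(const int *v, size_t n, void (*fn)(void *, int), void *arg)
{
	for (size_t i = 0; i < n; i++)
		fn(arg, v[i]);
}

static int sum_if_above(const int *v, size_t n, int threshold)
{
	struct sum_ctx ctx = { .threshold = threshold, .sum = 0 };

	for_each(v, n, visit, &ctx);
	return ctx.sum;
}
```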
gc_gen was a temporary used to recalculate last_gc, but since we only need
bucket->last_gc when gc isn't running (gc_mark_valid = 1), we can just update
last_gc directly.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
This was originally added as an optimization that for various reasons isn't
needed anymore, but it does add a lot of nasty corner cases (and it was
responsible for some recently fixed bugs). Just get rid of it now.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
This changes the bucket allocation reserves to use _real_ reserves (separate
freelists) instead of watermarks, which, if nothing else, makes the current code
saner to reason about and is going to be important in the future when we add
support for multiple btrees.
It also adds btree_check_reserve(), which checks (and locks) the reserves for
both bucket allocation and memory allocation for btree nodes. The old code just
kinda sorta assumed that, since it had the root locked (e.g. for btree node
splits), no other thread could try to make use of the same reserve. This
technically should have been ok for memory allocation (we should always have a
reserve for memory allocation, since the btree node cache is used as a reserve
and we preallocate it), but multiple btrees will mean that locking the root
won't be sufficient anymore, and for the bucket allocation reserve it was
technically possible for the old code to deadlock.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
With the locking rework in the last patch, this shouldn't be needed anymore -
btree_node_write_work() only takes b->write_lock which is never held for very
long.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Add a new lock, b->write_lock, which is required to actually modify (or write)
a btree node; this lock is only held for short durations.
This means we can write out a btree node without taking b->lock, which _is_ held
for long durations. That solves a deadlock when btree_flush_write() (from the
journalling code) is called with a btree node locked.
Right now this only occurs in bch_btree_set_root(), but with an upcoming
journalling rework it is going to happen a lot more.
This also turns b->lock into more of a read/intent lock instead of a
read/write lock, but not completely, since it still blocks readers. It may be
turned into a real intent lock at some point in the future.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
This isn't a bulletproof fix; btree_node_free() -> bch_bucket_free() puts the
bucket on the unused freelist, where it can be reused right away without any
ordering requirements. It would be better to wait on at least a journal write to
go down before reusing the bucket. bch_btree_set_root() does this, and inserting
into non leaf nodes is completely synchronous so we should be ok, but future
patches are just going to get rid of the unused freelist - it was needed in the
past for various reasons but shouldn't be anymore.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
This means the garbage collection code can better check for data and metadata
pointers to the same buckets.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
This will potentially save us an allocation when we've got inode/dirent bkeys
that don't fit in the keylist's inline keys.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Break down data into clean data/dirty data/metadata.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Change the invalidate tracepoint to indicate how much data we're invalidating,
and change the alloc tracepoints to indicate what offset they're for.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
This hasn't been used or even enabled in ages.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Avoid a potential null pointer deref (e.g. from check keys for cache misses)
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
The deadlock happened because a foreground write slept, waiting for a bucket
to be allocated. Normally the gc would mark buckets available for invalidation,
but the moving_gc was stuck waiting for outstanding writes to complete.
These writes used bcache_wq, the same queue that foreground writes used.
This fix gives moving_gc its own work queue, so it can still finish moving
even if foreground writes are stuck waiting for allocation. It also makes the
work queue a parameter to the data_insert path, so moving_gc can use its own
workqueue for writes.
Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
blk_stack_limits() doesn't like a discard granularity of 0.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
The on-disk bucket gens are allowed to be out of date when we reuse buckets
that didn't have any live data in them. To deal with this, the initial gc has to
update the bucket gen when we find a pointer gen newer than the bucket's gen.
Unfortunately we weren't doing this for pointers in the journal that we're about
to replay.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
The code to fix up incorrect bucket prios wrongly failed to skip btree node
freeing keys.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
On recovery we weren't correctly keeping track of which journal buckets had open
journal entries, so it was possible for them to be overwritten before we'd
written all new journal entries.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Shutdown wasn't cancelling/waiting on journal_write_work()
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
The code was using sectors to count the number of sectors it was zeroing... but
then it passed it to bio_advance()... after it had been set to 0. Amusing...
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Use a bigger hammer this time
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org>
This patch fixes 2 issues in the fast completion path:
1) Possible double completions / double dma_unmap_sg() calls due to lack
of atomicity in the check and subsequent dereference of the upper layer
callback function. Fixed with cmpxchg before unmap and callback.
2) Regression in unaligned IO constraining workaround for p420m devices.
Fixed by checking if IO is unaligned and using proper semaphore if so.
Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
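The double-completion fix works by making "check the callback, then use it" one atomic step: whichever path swaps the callback pointer to NULL first owns the completion. This is a C11 sketch of that pattern using `atomic_exchange()` in place of the kernel's `cmpxchg()`. `struct model_cmd`, `complete_cmd()` and the counters are hypothetical, and `unmap_calls` only stands in for `dma_unmap_sg()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*done_fn)(int status);

/* Hypothetical command: the upper-layer callback is claimed atomically. */
struct model_cmd {
	_Atomic(done_fn) done;
};

static int unmap_calls;    /* stands in for dma_unmap_sg() */
static int callback_calls; /* how many times the upper layer was notified */

static void upper_layer_done(int status)
{
	(void)status;
	callback_calls++;
}

/*
 * May be reached from two completion paths. Only the caller that swaps
 * the callback out (non-NULL -> NULL) unmaps and completes. The other
 * sees NULL and backs off, so neither step runs twice.
 */
static void complete_cmd(struct model_cmd *cmd, int status)
{
	done_fn fn = atomic_exchange(&cmd->done, (done_fn)NULL);

	if (fn == NULL)
		return;
	unmap_calls++;
	fn(status);
}
```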
If the buffers are unmapped after completing a request, then stale data
might be in the request.
Signed-off-by: Felipe Franciosi <felipe@paradoxo.org>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
We need to set the queue bounce limit during the device initialization to
prevent excessive bouncing on 32-bit architectures.
Signed-off-by: Felipe Franciosi <felipe@paradoxo.org>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
As a result of the deprecation of the MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block(), all drivers
using these two interfaces need to be updated to use the
new pci_enable_msi_range() or pci_enable_msi_exact()
and pci_enable_msix_range() or pci_enable_msix_exact()
interfaces.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-nvme@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Currently the driver falls back to INTx mode when MSI-X
initialization fails. This is suboptimal behaviour
for chips that also support MSI. This update changes that
behaviour and falls back to MSI mode if MSI-X
initialization fails.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Mike Miller <mike.miller@hp.com>
Cc: iss_storagedev@hp.com
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
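After this change, interrupt-mode selection is an ordered fallback chain. In this sketch, `try_msix()` and `try_msi()` are hypothetical stand-ins, driven by two flags, for the real enablement calls.

```c
#include <assert.h>
#include <stdbool.h>

enum intr_mode { MODE_INTX, MODE_MSI, MODE_MSIX };

/* Hypothetical device capabilities consulted by the stubs below. */
struct model_dev {
	bool msix_ok; /* MSI-X enablement would succeed */
	bool msi_ok;  /* MSI enablement would succeed */
};

static int try_msix(const struct model_dev *d) { return d->msix_ok ? 0 : -1; }
static int try_msi(const struct model_dev *d)  { return d->msi_ok ? 0 : -1; }

/* Prefer MSI-X, then MSI, and use legacy INTx only when both fail. */
static enum intr_mode pick_intr_mode(const struct model_dev *d)
{
	if (try_msix(d) == 0)
		return MODE_MSIX;
	if (try_msi(d) == 0)
		return MODE_MSI;
	return MODE_INTX;
}
```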
interruptible_sleep_on is racy and going away. This replaces the one
caller in the swim3 driver with the equivalent race-free
wait_event_interruptible call. Since we're here already, this
also fixes the case where we get interrupted from atomic context,
which used to just spin in the loop.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@fb.com>
sleep_on() is inherently racy, and has been deprecated for a long time.
This fixes two instances in the atari floppy driver:
* fdc_wait/fdc_busy becomes an open-coded mutex. We cannot use the
regular mutex since it gets released in interrupt context. The
open-coded version using wait_event() and cmpxchg() is equivalent
to the existing code but does the checks atomically, and we can
now safely check the condition with irqs enabled.
* format_wait becomes a completion, which is the natural structure
here. The format ioctl waits for the background task to either
complete or abort.
This does not attempt to fix the preexisting bug of calling schedule
with local interrupts disabled.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michael Schmitz <schmitz@biophys.uni-duesseldorf.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
sleep_on and its variants are going away. The use of sleep_on() in
DAC960_V2_ExecuteUserCommand seems to be bogus because, by the time
we get there, the command has already completed and
we just enter the timeout. Based on this interpretation, I concluded
that we can replace it with a simple msleep(1000) and rearrange the
code around it slightly.
The interruptible_sleep_on_timeout in DAC960_gam_ioctl seems equivalent
to the race-free version using wait_event_interruptible_timeout.
I left the driver to return -EINTR rather than -ERESTARTSYS to preserve
the timeout behavior.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@fb.com>
Commit "mtip32xx: Use pci_enable_msix_range() instead of
pci_enable_msix()" was unnecessary, since the pci_enable_msi()
function is not deprecated and is still preferable for
enabling single MSI mode. This update reverts to using the
pci_enable_msi() function.
Besides, the changelog for that commit was bogus, since the
mtip32xx driver uses MSI interrupts, not MSI-X.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>