path: root/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
Commit message | Author | Age | Files | Lines
* MFC r314913: MFV r314911: 7867 ARC space accounting leak | avg | 2017-03-23 | 1 | -0/+6
* MFC r314274: l2arc: fix write size calculation broken by Compressed ARC commit | avg | 2017-03-11 | 1 | -18/+18
* MFC 313879 | jpaetzel | 2017-03-07 | 1 | -3/+8
    MFV: 313876

    7504 kmem_reap hangs spa_sync and administrative tasks

    illumos/illumos-gate@405a5a0f5c3ab36cb76559467d1a62ba648bd809
    https://github.com/illumos/illumos-gate/commit/405a5a0f5c3ab36cb76559467d1a62ba648bd80
    https://www.illumos.org/issues/7504

    We see long spa_sync(). We are waiting to hold dp_config_rwlock for writer. Some other thread holds dp_config_rwlock for reader, then calls arc_get_data_buf(), which finds that arc_is_overflowing()==B_TRUE. So it waits (while holding dp_config_rwlock for reader) for arc_reclaim_thread to signal arc_reclaim_waiters_cv. Before signaling, arc_reclaim_thread does arc_kmem_reap_now(), which takes ~seconds.

    Author: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Prakash Surya <prakash.surya@delphix.com>
    Approved by: Dan McDonald <danmcd@omniti.com>
* MFC r313687: remove l2_padding_needed statistic from zfs arc | avg | 2017-02-21 | 1 | -2/+0
* Revert r308753: some unrelated changes were included into the commit | avg | 2016-11-17 | 1 | -16/+16
* MFC r308040,308479: nap time between pats is forced to be at most half of the timeout | avg | 2016-11-17 | 1 | -16/+16
    Note that in this branch the default nap period is 1 second, unlike head, where the period is 10 seconds.
* MFC r305561: MFV r305560: | mav | 2016-10-14 | 1 | -1/+3
    7278 tuning zfs_arc_max does not impact arc_c_min

    When changing zfs_arc_max (e.g. as zdb does), it may be set to less than the default arc_c_min. arc_c_min should decrease to not be more than arc_c_max, but it doesn't; therefore tuning of arc_c_max is ineffective.

    Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
    Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
    Reviewed by: Prakash Surya <prakash.surya@delphix.com>
    Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
    Author: Matthew Ahrens <mahrens@delphix.com>

    openzfs/openzfs@608764beadaf4bb71c5d8fe1818e8392ac66a61b
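The invariant described in issue 7278 (the minimum must never exceed the maximum) can be sketched as follows. This is only an illustration: `struct sketch_arc_limits` and `sketch_set_arc_max()` are hypothetical names, not the real arc.c state or the actual upstream patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the ARC size limits; not the real arc.c state. */
struct sketch_arc_limits {
	uint64_t c_min;	/* lower bound on the ARC target size, in bytes */
	uint64_t c_max;	/* upper bound on the ARC target size, in bytes */
};

/*
 * Apply a new maximum and pull the minimum down with it if needed,
 * so that c_min <= c_max always holds afterwards.
 */
static void
sketch_set_arc_max(struct sketch_arc_limits *l, uint64_t new_max)
{
	assert(l != NULL && new_max > 0);
	l->c_max = new_max;
	if (l->c_min > l->c_max)
		l->c_min = l->c_max;
}
```

Without the final clamp, a maximum set below the default minimum would leave the minimum as the effective floor, which is the ineffective tuning the issue describes.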
* MFC r305323: MFV r302991: 6950 ARC should cache compressed data | mav | 2016-10-14 | 1 | -1662/+1826
    illumos/illumos-gate@dcbf3bd6a1f1360fc1afcee9e22c6dcff7844bf2
    https://github.com/illumos/illumos-gate/commit/dcbf3bd6a1f1360fc1afcee9e22c6dcff7844bf2
    https://www.illumos.org/issues/6950

    When reading compressed data from disk, the ARC should keep the compressed block cached and only decompress it when consumers access the block. The uncompressed data should be short-lived, allowing the ARC to cache a much larger amount of data. The DMU would also maintain a smaller cache of uncompressed blocks to minimize the impact of decompressing frequently accessed blocks.

    Reviewed by: Prakash Surya <prakash.surya@delphix.com>
    Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
    Reviewed by: Matt Ahrens <mahrens@delphix.com>
    Reviewed by: Paul Dagnelie <pcd@delphix.com>
    Reviewed by: Don Brady <don.brady@intel.com>
    Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>
    Author: George Wilson <george.wilson@delphix.com>
* MFC r302838: 6513 partially filled holes lose birth time | avg | 2016-08-15 | 1 | -3/+17
* MFC r301873: l2arc: reset b_tmp_cdata to NULL in the case of unset b_daddr | avg | 2016-07-13 | 1 | -0/+1
* MFC r302265, r302382 | smh | 2016-07-13 | 1 | -11/+92
    Allow ZFS ARC min / max to be tuned at runtime

    Relnotes: YES
    Sponsored by: Multiplay
* MFC r300870,r300884: | ngie | 2016-06-08 | 1 | -1/+0
    r300870:

    Unbreak the zfs(4) build

    vm/vm_pageout.h grew a dependency on the bool typedef in r300865
    arc.c didn't include sys/types.h, which included the definition for the typedef

    Other items (ofed, drm2) might need to be chased for this commit.

    Pointyhat to: alc

    r300884:

    Fix up r300870

    The sys/types.h fix I proposed was only tested with zfs(4), not with libzpool, which is where the build failure actually existed

    Remove vm/vm_pageout.h from arc.c and zfs_vnops.c because they're both unneeded

    In collaboration with: kib
* MFC r297848: l2arc: make sure that all writes honor ashift of a cache device | avg | 2016-05-17 | 1 | -93/+151
    Note: no MFC to stable/9 because it has become quite out of date with head, so the merge would be quite laborious and, thus, risky.
* MFC r296530: MFV r296529: | mav | 2016-03-21 | 1 | -5/+5
    6672 arc_reclaim_thread() should use gethrtime() instead of ddi_get_lbolt()
    6673 want a macro to convert seconds to nanoseconds and vice-versa

    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Prakash Surya <prakash.surya@delphix.com>
    Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Approved by: Dan McDonald <danmcd@omniti.com>
    Author: Eli Rosenthal <eli.rosenthal@delphix.com>

    illumos/illumos-gate@a8f6344fa0921599e1f4511e41b5f9a25c38c0f9
* MFC r294809: MFV r294808: | mav | 2016-03-20 | 1 | -0/+2
    6421 Add missing multilist_destroy calls to arc_fini

    Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Jorgen Lundman <lundman@lundman.net>
    Approved by: Robert Mustacchi <rm@joyent.com>
    Author: Prakash Surya <prakash.surya@delphix.com>

    illumos/illumos-gate@57deb2328260c447bf1db25fe74e0eece102733e
* MFC r277300 (by smh): Mechanically convert cddl sun #ifdef's to illumos | mav | 2016-03-20 | 1 | -10/+10
    Since the upstream for cddl code is now illumos, not sun, mechanically convert all sun #ifdef's to illumos #ifdef's, which have been used in all newer code for some time.

    Also do a manual pass to correct the use of #ifdef comments as per style(9), as well as a few uses of #if defined(__FreeBSD__) vs #ifndef illumos.
* MFC r290191 (by avg): | mav | 2015-11-13 | 1 | -10/+21
    l2arc: do not call trim_map_free() for blocks with zero b_asize

    b_asize can be zero if the block is compressed into an empty block (ZIO_COMPRESS_EMPTY) and the trim code asserts that meaningless zero-sized trimming is not attempted.

    The logic for calling trim_map_free() is extracted into a new function l2arc_trim() to minimize code duplication.

    PR: 203473
    Reported by: Willem Jan Withagen <wjw@digiware.nl>
    Tested by: Willem Jan Withagen <wjw@digiware.nl>
* MFC r289422: | mav | 2015-11-13 | 1 | -3/+3
    4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R

    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Prakash Surya <prakash.surya@delphix.com>
    Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
    Reviewed by: Richard Lowe <richlowe@richlowe.net>
    Approved by: Garrett D'Amore <garrett@damore.org>
    Author: Matthew Ahrens <mahrens@delphix.com>

    illumos/illumos-gate@45818ee124adeaaf947698996b4f4c722afc6d1f

    This is only a partial merge of the respective ZFS infrastructure changes. At this moment the FreeBSD kernel has none of those crypto algorithms, so the parts of the code that enable them are commented out. When they are implemented, it will be trivial to plug them in.
* MFC r289305: 6293 ztest failure: error == 28 (0xc == 0x1c) in ztest_tx_assign() | mav | 2015-11-13 | 1 | -0/+10
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Prakash Surya <prakash.surya@delphix.com>
    Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>
    Author: Matthew Ahrens <mahrens@delphix.com>

    illumos/illumos-gate@8fe00bfb8790ad51653f67b01d5ac14256cbb404
* MFC r289295: 5219 l2arc_write_buffers() may write beyond target_sz | mav | 2015-11-13 | 1 | -1/+3
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Saso Kiselkov <skiselkov@gmail.com>
    Reviewed by: George Wilson <george@delphix.com>
    Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk>
    Reviewed by: Justin Gibbs <gibbs@FreeBSD.org>
    Approved by: Matthew Ahrens <mahrens@delphix.com>
    Author: Andriy Gapon <avg@freebsd.org>

    illumos/illumos-gate@d7d9a6d919f92d74ea0510a53f8441396048e800
* MFC r288064 (by avg): 6220 memleak in l2arc on debug build | mav | 2015-10-03 | 1 | -0/+7
    illumos/illumos-gate/commit/c546f36aa898d913ff77674fb5ff97f15b2e08b4
    https://www.illumos.org/issues/6220

    5408 introduced a memleak in l2arc, namely the member b_thawed gets leaked when an arc_hdr is realloced from full to l2only.

    Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
    Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
    Reviewed by: George Wilson <george@delphix.com>
    Approved by: Robert Mustacchi <rm@joyent.com>
    Author: Arne Jansen <sensille@gmx.net>
* MFC r287706 (by delphij): | mav | 2015-10-03 | 1 | -19/+14
    6214 zpools going south

    In r286570 (MFV of r277426) an unprotected write to b_flags to set the compression mode was introduced. This would open a race window where data is partially decompressed, modified, checksummed and written to the pool, resulting in pool corruption due to the partial decompression.

    Prevent this by reintroducing b_compress

    illumos/illumos-gate@d4cd038c92c36fd0ae35945831a8fc2975b5272c

    Illumos issues:
    6214 zpools going south
    https://www.illumos.org/issues/6214
* MFC r287702: 5987 zfs prefetch code needs work | mav | 2015-10-03 | 1 | -9/+69
    Rewrite the ZFS prefetch code to detect only forward, sequential streams.

    The following kstats have been added:

    kstat.zfs.misc.arcstats.sync_wait_for_async
        How many sync reads have waited for an async read to complete. (less is better)
    kstat.zfs.misc.arcstats.demand_hit_predictive_prefetch
        How many demand reads didn't have to wait for I/O because of predictive prefetch. (more is better)

    zfetch kstats have been simplified to hits, misses, and max_streams, with max_streams representing times when we were not able to create a new stream because we already have the maximum number of sequences for a file.

    The sysctl variable/loader tunable vfs.zfs.zfetch.block_cap has been replaced by vfs.zfs.zfetch.max_distance, which controls the maximum bytes to prefetch per stream.

    illumos/illumos-gate@cf6106c8a0d6598b045811f9650d66e07eb332af

    Illumos ZFS issues:
    5987 zfs prefetch code needs work
    https://www.illumos.org/issues/5987
* MFC r287283 (by delphij): | mav | 2015-10-03 | 1 | -8/+9
    Fix a buffer overrun which may lead to data corruption, introduced in r286951 by reinstating changes in r274628.

    In l2arc_compress_buf(), we allocate a buffer of l2hdr->b_asize bytes, 'cdata', to stash away the compressed data. We then ask zio_compress_data() to compress the buffer, b_l1hdr.b_tmp_cdata, which is of l2hdr->b_asize bytes, and have the compressed size (or original size, if compression didn't gain enough) stored in csize.

    To pad the buffer to fit the optimal write size, we round up the compressed size to the L2 device's vdev_ashift.

    Illumos code rounds up the size by at most SPA_MINBLOCKSIZE. Because we know csize <= b_asize, and b_asize is an integer multiple of SPA_MINBLOCKSIZE, we are guaranteed that the rounded up csize would be <= b_asize. However, this is not necessarily true when we round up to 1 << vdev_ashift, because it could be larger than SPA_MINBLOCKSIZE.

    So, in the worst case scenario, we are overwriting at most ((1 << vdev_ashift) - SPA_MINBLOCKSIZE) bytes of memory next to the compressed data buffer.

    Andriy's original change in r274628 reorganized the code a little bit, by moving the padding to after we determined that the compression was beneficial. At that point, we would check the rounded size against the allocated buffer size, and the buffer overrun would not be possible.
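The rounding argument in the message above can be checked with a small sketch. The helper names (`sketch_roundup`, `sketch_pad_overrun`) are hypothetical and are not functions from arc.c; the 512-byte constant assumes the usual SPA_MINBLOCKSIZE value.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of SPA_MINBLOCKSIZE (512 bytes); illustration only. */
#define	SKETCH_MINBLOCKSIZE	512ULL

/* Round x up to the next multiple of align; align must be a power of two. */
static uint64_t
sketch_roundup(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((x + align - 1) & ~(align - 1));
}

/*
 * Return how many bytes past a b_asize-byte buffer the padding would
 * reach if csize were rounded up to (1 << ashift). Zero means the
 * padded data still fits in the buffer.
 */
static uint64_t
sketch_pad_overrun(uint64_t csize, uint64_t b_asize, unsigned ashift)
{
	assert(csize <= b_asize && ashift < 64);
	assert(b_asize % SKETCH_MINBLOCKSIZE == 0);
	uint64_t rounded = sketch_roundup(csize, 1ULL << ashift);
	return (rounded > b_asize ? rounded - b_asize : 0);
}
```

With ashift 9 the alignment equals the minimum block size, so the result is always zero; with ashift 12 a 512-byte buffer would be padded to 4096 bytes, which matches the worst case of (1 << vdev_ashift) - SPA_MINBLOCKSIZE bytes.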
* MFC r286951: Restore part of r274628, reverted at r286776. | mav | 2015-10-03 | 1 | -1/+2
* MFC r286776: Remove some random accumulated diff from Illumos. | mav | 2015-10-03 | 1 | -19/+13
* MFC r286774: 2618 arc.c mistypes in the comments | mav | 2015-10-03 | 1 | -3/+3
    Reviewed by: Jason King <jason.brian.king@gmail.com>
    Reviewed by: Josef Sipek <jeffpc@josefsipek.net>
    Approved by: Richard Lowe <richlowe@richlowe.net>
    Author: Bart Coddens <bart.coddens@gmail.com>

    illumos/illumos-gate@fc98fea58e89224f6f13d7fae246d6cb5dfa35ea
* MFC r286770: Fix r286766 build with debug. | mav | 2015-10-03 | 1 | -6/+6
* MFC r286767: Fix minor mismerge from some time earlier. | mav | 2015-10-03 | 1 | -4/+4
* MFC r286766: 5817 change type of arcs_size from uint64_t to refcount_t | mav | 2015-10-03 | 1 | -25/+108
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
    Reviewed by: Adam Leventhal <ahl@delphix.com>
    Reviewed by: Alex Reece <alex@delphix.com>
    Reviewed by: Richard Elling <richard.elling@richardelling.com>
    Approved by: Garrett D'Amore <garrett@damore.org>
    Author: Prakash Surya <prakash.surya@delphix.com>

    illumos/illumos-gate@2fd872a734cf486007a8dba532cec52bfb4d40e5

    As a way to make it more difficult to introduce bugs into the ARC, and to make it easier to diagnose issues when bugs do creep in, it would be beneficial to change the type of the arc_state_t's arcs_size field to be a refcount_t instead of a uint64_t. This would allow us to make stricter checks when incrementing and decrementing the value with debugging enabled, but still fall back to simple, fast atomic operations when debugging is disabled.
* MFC r286764: 6033 arc_adjust() should search MFU lists for oldest buffer when adjusting MFU size. | mav | 2015-10-03 | 1 | -2/+2
    illumos/illumos-gate@31c46cf23cd1cf4d66390a983dc5072d7d299ba2
    https://www.illumos.org/issues/6033

    When we're looking for the list containing the oldest buffer we never actually look at the MFU lists, even when we try to evict from MFU. This looks like a copy-paste error; the fix is here:

    Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
    Reviewed by: Xin Li <delphij@delphij.net>
    Reviewed by: Prakash Surya <me@prakashsurya.com>
    Approved by: Matthew Ahrens <mahrens@delphix.com>
    Author: Alek Pinchuk <alek@nexenta.com>

    Obtained from: illumos
* MFC r286763: 5497 lock contention on arcs_mtx | mav | 2015-10-03 | 1 | -635/+1131
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Richard Elling <richard.elling@richardelling.com>
    Approved by: Dan McDonald <danmcd@omniti.com>
    Author: Prakash Surya <prakash.surya@delphix.com>

    illumos/illumos-gate@244781f10dcd82684fd8163c016540667842f203

    This patch attempts to reduce lock contention on the current arc_state_t mutexes. These mutexes are used liberally to protect the number of LRU lists within the ARC (e.g. ARC_mru, ARC_mfu, etc). The granularity at which these locks are acquired has been shown to greatly affect the performance of highly concurrent, cached workloads.
* MFC r286762: Revert part of r205231, introducing multiple ARC state locks. | mav | 2015-10-03 | 1 | -278/+148
    This local implementation will be replaced by one from Illumos to reduce code divergence and make further merges easier.
* MFC r286655: Fix set of sign extension bugs in r286625. | mav | 2015-10-03 | 1 | -4/+5
* MFC r286647: Fix assertion panic caused by combination of r286598 and TRIM. | mav | 2015-10-03 | 1 | -4/+8
* MFC r286628: Fix r286625 build on i386. | mav | 2015-10-03 | 1 | -1/+1
* MFC r286626: Fix minor mismerge in r286574. | mav | 2015-10-03 | 1 | -42/+42
* MFC r286625: | mav | 2015-10-03 | 1 | -95/+158
    5376 arc_kmem_reap_now() should not result in clearing arc_no_grow

    Reviewed by: Christopher Siden <christopher.siden@delphix.com>
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Steven Hartland <killing@multiplay.co.uk>
    Reviewed by: Richard Elling <richard.elling@richardelling.com>
    Approved by: Dan McDonald <danmcd@omniti.com>
    Author: Matthew Ahrens <mahrens@delphix.com>

    illumos/illumos-gate@2ec99e3e987d8aa273f1e9ba2b983557d058198c
* MFC r286623: Remove extra lock, that IMO only creates potential problems now. | mav | 2015-10-03 | 1 | -10/+2
* MFC r286598: 5701 zpool list reports incorrect "alloc" value for cache devices | mav | 2015-10-03 | 1 | -45/+142
* MFC r286576: Fix r286570 build with debug. | mav | 2015-10-03 | 1 | -0/+2
* MFC r286574: 5445 Add more visibility via arcstats; specifically arc_state_t stats and differentiate between "data" and "metadata" | mav | 2015-10-03 | 1 | -14/+223
    Reviewed by: Basil Crow <basil.crow@delphix.com>
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Bayard Bell <bayard.bell@nexenta.com>
    Approved by: Robert Mustacchi <rm@joyent.com>
    Author: Prakash Surya <prakash.surya@delphix.com>

    illumos/illumos-gate@4076b1bf41cfd9f968a33ed54a7ae76d9e996fe8
* MFC r286570: 5408 managing ZFS cache devices requires lots of RAM | mav | 2015-10-03 | 1 | -577/+880
    Reviewed by: Christopher Siden <christopher.siden@delphix.com>
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Don Brady <dev.fs.zfs@gmail.com>
    Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
    Approved by: Garrett D'Amore <garrett@damore.org>
    Author: Chris Williamson <Chris.Williamson@delphix.com>

    illumos/illumos-gate@89c86e32293a30cdd7af530c38b2073fee01411c

    Currently, every buffer cached in the L2ARC is accompanied by a 240-byte header in memory, leading to very high memory consumption when using very large cache devices. These changes significantly reduce this overhead.

    Currently:
        L1-only header = 176 bytes
        L1 + L2 or L2-only header = 176 bytes + 32 byte checksum + 32 byte l2hdr = 240 bytes

    Memory-optimized:
        L1-only header = 176 bytes
        L1 + L2 header = 176 bytes + 32 byte checksum = 208 bytes
        L2-only header = 96 bytes + 32 byte checksum = 128 bytes

    So overall:

                   Trunk    Optimized
                 +---------+---------+
        L1-only  |  176 B  |  176 B  |  (same)
                 +---------+---------+
        L1 & L2  |  240 B  |  208 B  |  (saved 32 bytes)
                 +---------+---------+
        L2-only  |  240 B  |  128 B  |  (saved 112 bytes)
                 +---------+---------+

    For an average blocksize of 8KB, this means that for the L2ARC, the ratio of metadata to data has gone down from about 2.92% to 1.56%. For a 'storage optimized' EC2 instance with 1600GB of SSD and 60GB of RAM, this means that we expect a completely full L2ARC to use (1600 GB * 0.0156) / 60GB = 41% of the available memory, down from 78%.

    Relnotes: yes
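The percentages quoted in the message above follow from dividing the header size by the block size and scaling by the device-to-RAM ratio. The sketch below reproduces that arithmetic; the function names are hypothetical and the inputs are the figures from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Fraction of in-memory header bytes per byte of cached L2ARC data. */
static double
sketch_hdr_ratio(uint64_t hdr_bytes, uint64_t block_bytes)
{
	assert(block_bytes > 0);
	return ((double)hdr_bytes / (double)block_bytes);
}

/*
 * Fraction of RAM consumed by headers when a cache device of l2arc_gb
 * gigabytes is completely full, on a machine with ram_gb of memory.
 */
static double
sketch_ram_fraction(double l2arc_gb, double ram_gb, double hdr_ratio)
{
	assert(ram_gb > 0.0);
	return (l2arc_gb * hdr_ratio / ram_gb);
}
```

For 8KB blocks, a 128-byte L2-only header gives a ratio of 128 / 8192 = 0.015625, and 1600 GB of cache on a 60 GB machine then needs roughly 0.42 of RAM; the older 240-byte header gives roughly 0.78.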
* MFC r281109: Add DTrace probe to the new ARC reclaim cause added in r281026. | mav | 2015-10-03 | 1 | -1/+5
* MFC r277826 (by delphij): | mav | 2015-10-02 | 1 | -0/+1
    Diff reduction with upstream. The actual change was merged in r272483 already.
* MFC r277452 (by will): Fix arc__shrink DTrace probe's to_free argument. | mav | 2015-10-02 | 1 | -5/+1
    Remove the unnecessary #ifdef _KERNEL, which did not differ in the true or false cases.

    Actually set the value of to_free before using it.
* MFC r275780 (by delphij): | mav | 2015-10-02 | 1 | -1/+55
    Add a loader tunable, vfs.zfs.arc_meta_min, which controls how much metadata ZFS should keep in ARC at minimum.

    In arc_evict(), when doing recycle, take more factors into account by applying the following policy:

    1. If no evictable data, evict metadata;
    2. If no evictable metadata, evict data;
    3. If we hit arc_meta_limit, evict metadata;
    4. If we haven't hit arc_meta_min, evict data;
    5* (Illumos only, not present in new FreeBSD code, yet) evict the oldest cached element from data and metadata.
       (FreeBSD) evict the data type specified by caller, which is the existing behavior.

    Note that because of our split locks (implemented in r205231 to improve scalability by reducing lock contention), implementing the fifth Illumos behavior will not be cheap, so for now just implement 1-4 and fall back to the current behavior for 5.

    Illumos issue:
    5368 ARC should cache more metadata
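The eviction policy listed above can be read as an ordered chain of checks. The sketch below is one possible reading, not the code from arc_evict(): the enum, the function name, and the exact comparison operators (`>=` for the limit, `<=` for the minimum) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical buffer content types; not the real arc_buf_contents_t. */
enum sketch_buf_type { SKETCH_DATA, SKETCH_METADATA };

/*
 * Pick which type to evict, applying rules 1-4 in order and falling
 * back to the caller's requested type (the FreeBSD variant of rule 5).
 */
static enum sketch_buf_type
sketch_pick_evict_type(uint64_t evictable_data, uint64_t evictable_meta,
    uint64_t meta_used, uint64_t meta_limit, uint64_t meta_min,
    enum sketch_buf_type requested)
{
	assert(meta_min <= meta_limit);
	if (evictable_data == 0)	/* rule 1 */
		return (SKETCH_METADATA);
	if (evictable_meta == 0)	/* rule 2 */
		return (SKETCH_DATA);
	if (meta_used >= meta_limit)	/* rule 3 (assumed comparison) */
		return (SKETCH_METADATA);
	if (meta_used <= meta_min)	/* rule 4 (assumed comparison) */
		return (SKETCH_DATA);
	return (requested);		/* rule 5, FreeBSD behavior */
}
```

Because the checks are ordered, an earlier rule wins when several apply; for example, with no evictable data the function returns metadata even if metadata usage is still below the minimum.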
* MFC r287099: account for ashift when gathering buffers to be written to l2arc device | avg | 2015-09-11 | 1 | -12/+33
    The change differs from that in head because of other changes that have not been MFC-ed yet.
* MFC r284513: l2arc: pass correct size to trim requests | avg | 2015-09-11 | 1 | -3/+3
* MFC 278040: | jpaetzel | 2015-07-20 | 1 | -1/+1
    Prevent inlining txg_quiesce

    This allows dtrace to monitor the calls to txg_quiesce, which can be really helpful.

    Also standardize __noinline order for arc_kmem_reap_now.

    Sponsored by: Multiplay
    Approved by: re