summaryrefslogtreecommitdiffstats
path: root/sys/cddl
Commit message (Collapse)AuthorAgeFilesLines
* Switch zfs_panic_recover to panic for bad DVAsmh2015-11-061-1/+1
| | | | | | | | | | | As reported by Coverity a null pointer de-reference panic would be triggered when zfs_recover was set so switch to straight panic as it can never be recovered. Reported by: Coverity Scan MFC after: 1 X-MFC-With: r290401 Sponsored by: Multiplay
* Provide information about bad DVAsmh2015-11-051-1/+7
| | | | | | | Provide information about which vdev has an issue with a bad DVA. MFC after: 1 week Sponsored by: Multiplay
* Allow zfs_recover to be changed at runtimesmh2015-11-051-1/+1
| | | | | MFC after: 1 week Sponsored by: Multiplay
* Fix the open solaris atomic functions on arm64. Without this we may use theandrew2015-11-051-0/+87
| | | | | | | | | | | | | | | wrong value in the comparison, leading to incorrectly setting the new value. This has been observed in the ZFS code. Without this we can lose track of the reference count in a zrlock object. We should move to use the generic atomic functions, however as this has been observed I would prefer to have this working, then move to the generic functions. PR: 204037 Sponsored by: ABT Systems Ltd
* zfs: allow the lookup of extended attributes of an unlinked fileavg2015-11-021-1/+1
| | | | | | | That's required for extattr_get_fd(2) and the like to work properly. PR: 203201 MFC after: 17 days
* l2arc: do not call trim_map_free() for blocks with zero b_asizeavg2015-10-301-10/+21
| | | | | | | | | | | | | b_asize can be zero if the block is compressed into an empty block (ZIO_COMPRESS_EMPTY) and the trim code asserts that meaningless zero-sized trimming is not attempted. The logic for calling trim_map_free() is extracted into a new function l2arc_trim() to minimize code duplication. PR: 203473 Reported by: Willem Jan Withagen <wjw@digiware.nl> Tested by: Willem Jan Withagen <wjw@digiware.nl> MFC after: 11 days
* Rename remaining linux32 symbols such as linux_sysent[] andjhb2015-10-221-10/+34
| | | | | | | | | | | | | | | | | linux_syscallnames[] from linux_* to linux32_* to avoid conflicts with linux64.ko. While here, add support for linux64 binaries to systrace. - Update NOPROTO entries in amd64/linux/syscalls.master to match the main table to fix systrace build. - Add a special case for union l_semun arguments to the systrace generation. - The systrace_linux32 module now only builds the systrace_linux32.ko. module on amd64. - Add a new systrace_linux module that builds on both i386 and amd64. For i386 it builds the existing systrace_linux.ko. For amd64 it builds a systrace_linux.ko for 64-bit binaries. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D3954
* MFV r289561: 6328 Fix cstyle errors in zfs codebasemav2015-10-1916-46/+44
| | | | | | | | | | | Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed by: Jorgen Lundman <lundman@lundman.net> Approved by: Robert Mustacchi <rm@joyent.com> Author: Paul Dagnelie <pcd@delphix.com> illumos/illumos-gate@9a686fbc186e8e2a64e9a5094d44c7d6fa0ea167
* MFV r289526:mav2015-10-181-2/+0
| | | | | | | | | | | | | | 5561 support root pools on EFI/GPT partitioned disks 5125 update zpool/libzfs to manage bootable whole disk pools (EFI/GPT labeled disks) Reviewed by: Jean McCormack <jean.mccormack@nexenta.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Approved by: Dan McDonald <danmcd@omniti.com> Author: Hans Rosenfeld <hans.rosenfeld@nexenta.com> illumos/illumos-gate@1a902ef8628b0dffd6df5442354ab59bb8530962 This is NOP changes for FreeBSD.
* Fix ZFS ABI compat shims for `zfs receive` after r289362.mav2015-10-171-0/+10
| | | | Difference appeared much less drammatic then seemed originally.
* MFV r289310:mav2015-10-1624-78/+650
| | | | | | | | | | | | | | | | | | 4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Garrett D'Amore <garrett@damore.org> Author: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@45818ee124adeaaf947698996b4f4c722afc6d1f This is only a partial merge of respective ZFS infrastructure changes. At this moment FreeBSD kernel has no those crypto algorithms, so the parts of the code to enable them are commented out. When they are implemented, it will be trivial to plug them in.
* MFV r289312: 2605 want to resume interrupted zfs sendmav2015-10-1515-163/+900
| | | | | | | | | | | | | | | | | | | | | | | | | Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed by: Xin Li <delphij@freebsd.org> Reviewed by: Arne Jansen <sensille@gmx.net> Approved by: Dan McDonald <danmcd@omniti.com> Author: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@9c3fd1216fa7fb02cfbc78a2518a686d54b48ab8 For more info, see: - slides http://www.slideshare.net/MatthewAhrens/openzfs-send-and-receive - video https://www.youtube.com/watch?v=iY44jPMvxog - manpage changes (for zfs resume -s and zfs send -t) - upcoming talk at the OpenZFS Developer Summit The TL;DR is: Use "zfs receive -s" to save the partially received state on failure. On failure, get the receive token with "zfs get receive_resume_token <fs>" Resume the send with "zfs send -t <token_value>" Relnotes: yes
* MFV r289308: 6267 dn_bonus evicted too earlymav2015-10-145-43/+38
| | | | | | | | | | Reviewed by: Richard Yao <ryao@gentoo.org> Reviewed by: Xin LI <delphij@freebsd.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Justin T. Gibbs <gibbs@FreeBSD.org> illumos/illumos-gate@d2058105c61ec61df3a2dd3f839fed8c3fe7bfd6
* MFV r289306: 6295 metaslab_condense's dbgmsg should include vdev idmav2015-10-141-5/+6
| | | | | | | | | | | | Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Andriy Gapon <avg@freebsd.org> Reviewed by: Xin Li <delphij@freebsd.org> Reviewed by: Justin Gibbs <gibbs@scsiguy.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Joe Stein <joe.stein@delphix.com> illumos/illumos-gate@daec38ecb4fb5e73e4ca9e99be84f6b8c50c02fa
* MFV r289304: 6293 ztest failure: error == 28 (0xc == 0x1c) in ztest_tx_assign()mav2015-10-141-0/+10
| | | | | | | | | | Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@8fe00bfb8790ad51653f67b01d5ac14256cbb404
* MFV r289298: 6286 ZFS internal error when set large block on bootfsmav2015-10-141-3/+3
| | | | | | | | | | Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Andriy Gapon <avg@FreeBSD.org> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@6de9bb5603e65b16816b7ab29e39bac820e2da2b
* MFV r289296: 6288 dmu_buf_will_dirty could be fastermav2015-10-141-10/+51
| | | | | | | | | | | Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Justin Gibbs <gibbs@scsiguy.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Matthew Ahrens <mahrens@delphix.com> illumos/illumos-gate@0f2e7d03b8f588387cb8dd8dd500cbe5ff4484e0
* MFV r289294: 5219 l2arc_write_buffers() may write beyond target_szmav2015-10-141-1/+3
| | | | | | | | | | | | Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Saso Kiselkov <skiselkov@gmail.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk> Reviewed by: Justin Gibbs <gibbs@FreeBSD.org> Approved by: Matthew Ahrens <mahrens@delphix.com> Author: Andriy Gapon <avg@freebsd.org> illumos/illumos-gate@d7d9a6d919f92d74ea0510a53f8441396048e800
* FreeBSD-specific addition to r289191.mav2015-10-121-0/+3
|
* MFV r289188: 6281 prefetching should apply to 1MB readsmav2015-10-121-2/+2
| | | | | | | | | | | | | Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Alexander Motin <mav@freebsd.org> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Justin Gibbs <gibbs@scsiguy.com> Reviewed by: Xin Li <delphij@freebsd.org> Approved by: Gordon Ross <gordon.ross@nexenta.com> Author: George Wilson <george.wilson@delphix.com> illumos/illumos-gate@632802744ef6d17e06d6980a95f631615c3b060f
* MFV r289187: 6251 add tunable to disable free_bpobj processingmav2015-10-121-2/+8
| | | | | | | | | | | | | Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed by: Albert Lee <trisk@omniti.com> Reviewed by: Xin Li <delphij@freebsd.org> Approved by: Garrett D'Amore <garrett@damore.org> Author: George Wilson <george.wilson@delphix.com> illumos/illumos-gate@139510fb6efa97dbe5f5479594b308d940cab8d1
* MFV r289185: 6250 zvol_dump_init() can hold txg openmav2015-10-121-44/+63
| | | | | | | | | | | Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Albert Lee <trisk@omniti.com> Reviewed by: Xin Li <delphij@freebsd.org> Approved by: Garrett D'Amore <garrett@damore.org> Author: George Wilson <george.wilson@delphix.com> illumos/illumos-gate@b10bba72460aeaa53119c76ff5e647fd5585bece
* Restore original array_rd_sz semantics.mav2015-10-031-1/+1
| | | | | | | | | | | Before r278702 prefetch was blocked for I/Os > 1MB, after -- >= 1MB. 1MB I/Os are used for bulk operations in CTL (XCOPY, VERIFY), and disabling prefetch for them reduced the performance. This is temporary local patch, that should be replaced when upstreamed. Discussed with: mahrens MFC after: 3 days
* MFV r288408:markj2015-09-302-16/+70
| | | | | | | | | | | | | 6266 harden dtrace_difo_chunksize() with respect to malicious DIF illumos/illumos-gate@395c7a3dcfc66b8b671dc4b3c4a2f0ca26449922 Reviewed by: Alex Wilson <alex.wilson@joyent.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Garrett D'Amore <garrett@damore.org> Author: Bryan Cantrill <bryan@joyent.com> MFC after: 1 week
* sdt: static-ize couple of variablesavg2015-09-291-2/+2
| | | | MFC after: 11 days
* sdt module does not seem to actually use any symbol from opensolaris moduleavg2015-09-291-1/+0
| | | | MFC after: 11 days
* std: it is important that func name is never an empty stringavg2015-09-291-0/+2
| | | | | | | | | | | | | | otherwise DTRACE_ANCHORED() returns false and that makes stack() insert a bogus frame at the top. For example: dtrace -n 'test:dtrace_test::sdttest { stack(); } This change is not really a solution, but just a work-around. The real solution is to record the probe's call site and to use that for resolving a function name. PR: 195222 MFC after: 22 days
* sdt: start checking version field when parsing probe definitionsavg2015-09-291-0/+6
| | | | | | This is an extra safety measure. MFC after: 21 days
* dtrace_getarg: remove stray return statement on amd64, powerpcavg2015-09-292-2/+0
| | | | MFC after: 10 days
* define aok in libnvpair which is linked to all zfs libraries that need aokavg2015-09-281-0/+8
| | | | | | | | This removes the circular dependency of libnvpair on libzfs / libzpool. PR: 199811 Obtained from: bapt MFC after: 23 days
* MFV r288063: make dataset property de-registration operation O(1)delphij2015-09-258-203/+171
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A change to a property on a dataset must be propagated to its descendants in case that property is inherited. For datasets whose information is not currently loaded into memory (e.g. a snapshot that isn't currently mounted), there is nothing to do; the property change will take effect the next time that dataset is loaded. To handle updates to datasets that are in-core, ZFS registers a callback entry for each property of each loaded dataset with the dsl directory that holds that dataset. There is a dsl directory associated with each live dataset that references both the live dataset and any snapshots of the live dataset. A property change is effected by doing a traversal of the tree of dsl directories for a pool, starting at the directory sourcing the change, and invoking these callbacks. The current implementation both registers and de-registers properties individually for each loaded dataset. While registration for a property is O(1) (insert into a list), de-registration is O(n) (search list and then remove). The 'n' for de-registration, however, is not limited to the size (number of snapshots + 1) of the dsl directory. The eviction portion of the life cycle for the in core state of datasets is asynchronous, which allows multiple copies of the dataset information to be in-core at once. Only one of these copies is active at any time with the rest going through tear down processing, but all copies contribute to the cost of performing a dsl_prop_unregister(). One way to create multiple, in-flight copies of dataset information is by performing "zfs list" operations from multiple threads concurrently. In-core dataset information is loaded on demand and then evicted when reference counts drops to zero. For datasets that are not mounted, there is no persistent reference count to keep them resident. So, a list operation will load them, compute the information required to do the list operation, and then evict them. When performing this operation from multiple threads it is possible that some of the in-core dataset information will be reused, but also possible to lose the race and load the dataset again, even while the same information is being torn down. Compounding the performance issue further is a change made for illumos issue 5056 which made dataset eviction single threaded. In environments using automation to manage ZFS datasets, it is now possible to create enough of a backlog of dataset evictions to consume excessive amounts of kernel memory and to bog down the system. The fix employed here is to make property de-registration O(1). With this change in place, it is hoped that a single thread is more than sufficient to handle eviction processing. If it isn't, the problem can be solved by increasing the number of threads devoted to the eviction taskq. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_prop.h: Associate dsl property callback records with both the dsl directory and the dsl dataset that is registering the callback. Both connections are protected by the dsl directory's "dd_lock". When linking callbacks into a dsl directory, group them by the property type. This helps reduce the space penalty for the double association (the property name pointer is stored once per dsl_dir instead of in each record) and reduces the number of strcmp() calls required to do callback processing when updating a single property. Property types are stored in a linked list since currently ZFS registers a maximum of 10 property types for each dataset. Note that the property buckets/records associated with a dsl directory are created on demand, but only freed when the dsl directory is freed. Given the static nature of property types and their small number, there is no benefit to freeing the few bytes of memory used to represent the property record earlier. When a property record becomes empty, the dsl directory is either going to become unreferenced a little later in this thread of execution, or there is a high chance that another dataset is going to be loaded that would recreate the bucket anyway. Replace dsl_prop_unregister() with dsl_prop_unregister_all(). All callers of dsl_prop_unregister() are trying to remove all property registrations for a given dsl dataset anyway. By changing the API, we can avoid doing any lookups of callbacks by property type and just traverse the list of all callbacks for the dataset and free each one. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c: Replace use of dsl_prop_unregister() with the new dsl_prop_unregister_all() API. illumos/illumos-gate@03bad06fbb261fd4a7151a70dfeff2f5041cce1f Author: Justin Gibbs <gibbs@scsiguy.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Illumos issue: 6171 dsl_prop_unregister() slows down dataset eviction https://www.illumos.org/issues/6171 MFC after: 2 weeks
* MFV r287817: 6220 memleak in l2arc on debug buildavg2015-09-211-0/+7
| | | | | | | | | | | | | https://github.com/illumos/illumos-gate/commit/c546f36aa898d913ff77674fb5ff97f15b2e08b4 https://www.illumos.org/issues/6220 5408 introduced a memleak in l2arc, namely the member b_thawed gets leaked when an arc_hdr is realloced from full to l2only. Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Reviewed by: George Wilson <george@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: Arne Jansen <sensille@gmx.net>
* MFV r287623: 5997 FRU field not set during pool creation and neverdelphij2015-09-133-6/+28
| | | | | | | | | | | | | | | | | | | | | | | | | updated ZFS already supports storing the vdev FRU in a vdev property. There is code in libzfs to work with this property, and there is code in the zfs-retire FMA module that looks for that information. But there is no code actually setting or updating the FRU. To address this, ZFS is changed to send a handful of new events whenever a vdev is added, attached, cleared, or onlined, as well as when a pool is created or imported. Note that syseventd is not currently available on FreeBSD and thus some work is needed to actually support the new ZFS events (e.g. in zfsd) to actually use this capability, this changeset is mostly a diff reduction from upstream. illumos/illumos-gate@1437283407f89cab03860accf49408f94559bc34 Illumos issues: 5997 FRU field not set during pool creation and never updated https://www.illumos.org/issues/5997
* Note r286552 as merged and reduce diff against upstream.delphij2015-09-131-0/+1
|
* MFV r287699: 6214 zpools going southdelphij2015-09-122-34/+14
| | | | | | | | | | | | | | | | | In r286570 (MFV of r277426) an unprotected write to b_flags to set the compression mode was introduced. This would open a race window where data is partially decompressed, modified, checksummed and written to the pool, resulting in pool corruption due to the partial decompression. Prevent this by reintroducing b_compress illumos/illumos-gate@d4cd038c92c36fd0ae35945831a8fc2975b5272c Illumos issues: 6214 zpools going south https://www.illumos.org/issues/6214
* MFV r287684: 6091 avl_add doesn't assert on non-debug buildsdelphij2015-09-121-3/+7
| | | | | | | | | | | | Use assfail() from libuutil instead of ASSERT() in userland AVL avl_add. illumos/illumos-gate@faa2b6be2fc102adf9ed584fc1a667b4ddf50d78 Illumos issues: 6091 avl_add doesn't assert on non-debug builds https://www.illumos.org/issues/6091
* MFV r287624: 5987 zfs prefetch code needs workdelphij2015-09-128-668/+267
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rewrite the ZFS prefetch code to detect only forward, sequential streams. The following kstats have been added: kstat.zfs.misc.arcstats.sync_wait_for_async How many sync reads have waited for async read to complete. (less is better) kstat.zfs.misc.arcstats.demand_hit_predictive_prefetch How many demand read didn't have to wait for I/O because of predictive prefetch. (more is better) zfetch kstats have been similified to hits, misses, and max_streams, with max_streams representing times when we were not able to create new stream because we already have the maximum number of sequences for a file. The sysctl variable/loader tunable vfs.zfs.zfetch.block_cap have been replaced by vfs.zfs.zfetch.max_distance, which controls maximum bytes to prefetch per stream. illumos/illumos-gate@cf6106c8a0d6598b045811f9650d66e07eb332af Illumos ZFS issues: 5987 zfs prefetch code needs work https://www.illumos.org/issues/5987
* Remove the arg0 field from struct amd64_frame. Its existence was a bug,markj2015-09-111-1/+1
| | | | | | | | | | | | since on amd64 the first argument to a function is generally not on the stack. Revert an old DTrace bug fix to some code that assumed that sizeof(struct amd64_frame) == 16. Reviewed by: jhb, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3255
* MFV r283513:markj2015-09-111-2/+12
| | | | | | | | | | 5930 fasttrap_pid_enable() panics when prfind() fails in forking process Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Gordon Ross <gordon.ross@nexenta.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Bryan Cantrill <bryan@joyent.com> illumos/illumos-gate@9df7e4e12eb093557252d3bec029b5c382613e36
* MFV r283512:markj2015-09-111-3/+3
| | | | | | | | | | 3599 dtrace_dynvar tail calls can blow stack Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Gordon Ross <gordon.ross@nexenta.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Bryan Cantrill <bryan@joyent.com> illumos/illumos-gate@d47448f09aae3aa1a87fc450a0c44638e7ce7b51
* Expose an interface to determine if an ACE is inherited.delphij2015-09-042-1/+4
| | | | | | | Submitted by: sef Reviewed by: trasz MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3540
* Apply the noline attribute to vdev_queue_max_async_writesallanjude2015-08-311-1/+1
| | | | | | | | | | | | | This makes it possible to analyze the performance of the new ZFS write throttle with dtrace PR: 200316 Submitted by: Lacey Powers <lacey.leanne@gmail.com> Reviewed by: avg, smh, delphij (no objection) Approved by: bapt (mentor) MFC after: 1 month Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D3472
* Fix a buffer overrun which may lead to data corruption, introduced indelphij2015-08-291-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | r286951 by reinstating changes in r274628. In l2arc_compress_buf(), we allocate a buffer to stash away the compressed data in 'cdata', allocated of l2hdr->b_asize bytes. We then ask zio_compress_data() to compress the buffer, b_l1hdr.b_tmp_cdata, which is of l2hdr->b_asize bytes, and have the compressed size (or original size, if compress didn't gain enough) stored in csize. To pad the buffer to fit the optimal write size, we round up the compressed size to L2 device's vdev_ashift. Illumos code rounds up the size by at most SPA_MINBLOCKSIZE. Because we know csize <= b_asize, and b_asize is integer multiple of SPA_MINBLOCKSIZE, we are guaranteed that the rounded up csize would be <= b_asize. However, this is not necessarily true when we round up to 1 << vdev_ashift, because it could be larger than SPA_MINBLOCKSIZE. So, in the worst case scenario, we are overwriting at most (1 << vdev_ashift - SPA_MINBLOCKSIZE) bytes of memory next to the compressed data buffer. Andriy's original change in r274628 reorganized the code a little bit, by moving the padding to after we determined that the compression was beneficial. At which point, we would check rounded size against the allocated buffer size, and the buffer overrun would not be possible.
* In r286705 (Illumos 5960/a2cdcdd), a separate thread is created with curprocdelphij2015-08-291-2/+2
| | | | | | | | | | as parent. In the case of a send or receive, the curproc would be the userland application that issues the ioctl. This would trigger an assertion failure introduced in Solaris compatibility shims in r196458 when kernel is compiled with INVARIANTS. Fix this by using p0 (proc0 or kernel) as the parent thread when creating the kernel threads.
* MFV (partial) r286889: 5692 expose the number of hole blocks in a fileavg2015-08-243-13/+96
| | | | | | | | | | | | | | | | | | | | | | FreeBSD porting notes: - only kernel-side changes are merged - the new ioctl is not actually implemented yet - thus, the goal is to synchronize DMU code illumos/illumos-gate@2bcf0248e992f292c7b814458bcdce2f004925d6 https://www.illumos.org/issues/5692 we would like to expose the number of hole (sparse) blocks in a file. this can be useful to for example if you want to fill in the holes with some data; knowing the number of holes in advances allows you to report progress on hole filling. We could use SEEK_HOLE to do that but it would be O(n) where n is the number of holes present in the file. Author: Max Grossman <max.grossman@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Boris Protopopov <bprotopopov@hotmail.com> Approved by: Richard Lowe <richlowe@richlowe.net>
* spa_import_rootpool: prevent lock and resource leakavg2015-08-241-0/+2
| | | | | | | | The lock leak could lead to a deadlock later. PR: 198563 Submitted by: Fabian Keil <fk@fabiankeil.de> MFC after: 1 week
* account for ashift when gathering buffers to be written to l2arc deviceavg2015-08-241-13/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The change that introduced the L2ARC compression support also introduced a bug where the on-disk size of the selected buffers could end up larger than the target size if the ashift is greater than 9. This was because the buffer selection could did not take into account the fact that on-disk size could be larger than the in-memory buffer size due to the alignment requirements. At the moment b_asize is a misnomer as it does not always represent the allocated size: if a buffer is compressed, then the compressed size is properly rounded (on FreeBSD), but if the compression fails or it is not applied, then the original size is kept and it could be smaller than what ashift requires. For the same reasons arcstat_l2_asize and the reported used space on the cache device could be smaller than the actual allocated size if ashift > 9. That problem is not fixed by this change. This change only ensures that l2ad_hand is not advanced by more than target_sz. Otherwise we would overwrite active (unevicted) L2ARC buffers. That problem is manifested as growing l2_cksum_bad and l2_io_error counters. This change also changes 'p' prefix to 'a' prefix in a few places where variables represent allocated rather than physical size. The resolved problem could also result in the reported allocated size being greater than the cache device's capacity, because of the overwritten buffers (more than one buffer claiming the same disk space). This change is already in ZFS-on-Linux: zfsonlinux/zfs@ef56b0780c80ebb0b1e637b8b8c79530a8ab3201 PR: 198242 PR: 195746 (possibly related) Reviewed by: mahrens (https://reviews.csiden.org/r/229/) Tested by: gkontos@aicom.gr (most recently) MFC after: 15 days X-MFC note: patch does not apply as is at the moment Relnotes: yes Sponsored by: ClusterHQ Differential Revision: https://reviews.freebsd.org/D2764 Reviewed by: noone (@FreeBSD.org)
* try to fix lor between z_teardown_lock and spa_namespace_lockavg2015-08-211-10/+16
| | | | | | | | | | | | | | | | | | | The lock order reversal and a resulting deadlock were introduced in r285021 / D2865. The problem is that zfs_register_callbacks() calls dsl_prop_get_integer() that has to acquire spa_namespace_lock. At the same time, spa_config_sync() is called with spa_namespace_lock held and then it performs ZFS vnode operations that acquire z_teardown_lock in the reader mode. So, fix the problem by using dsl_prop_get_int_ds() instead of dsl_prop_get_integer(). The former does not need to look up the pool and the dataset by name. Reported by: many Reviewed by: delphij Tested by: delphij, Jens Schweikhardt <schweikh@schweikhardt.net> MFC after: 5 days X-MFC with: r285021
* fix a mismerge in r286539 (MFV 286538: 5562 ZFS sa_handle's violate...)avg2015-08-211-1/+1
| | | | | | PR: 202358 X-MFC with: r286539 X-MFC attn: mav
* Restore part of r274628, reverted at r286776.mav2015-08-201-1/+2
| | | | Submitted by: avg
OpenPOWER on IntegriCloud