Set the NOCACHE flag for CREATE namei() calls, and do not specially handle
MAKEENTRY in VOP_LOOKUP().
| |
Use %d instead of %u for the error number. This way DTrace shows ERESTART
as -1 rather than 4294967295.
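A small C illustration of why the conversion matters: FreeBSD defines ERESTART as -1 inside the kernel, and printing that int with %u reinterprets the same bits as unsigned. The sketch redefines the value locally (SKETCH_ERESTART is our name) so it builds in userland; format_error() is a hypothetical helper, not the DTrace code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Kernel-only ERESTART is -1 on FreeBSD; redefined here for userland. */
#define SKETCH_ERESTART (-1)

/*
 * Format an error number into buf, either as signed (%d) or as the same
 * bits reinterpreted as unsigned (%u). Returns snprintf()'s result.
 */
static int
format_error(char *buf, size_t len, int error, int use_unsigned)
{
	if (use_unsigned)
		return (snprintf(buf, len, "%u", (unsigned int)error));
	return (snprintf(buf, len, "%d", error));
}
```

With a 32-bit unsigned int, the %u form prints 4294967295 while the %d form prints -1.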
| |
ZFS large block support. The default recordsize remains at 128KB.
A new tunable/sysctl, vfs.zfs.max_recordsize (zfs_max_recordsize), is added
to adjust the maximum permitted record size; it defaults to 1MB. ZFS will
not allow setting recordsize greater than zfs_max_recordsize as a safety
belt, because a larger recordsize means greater read and write latency and
more memory usage.
Please note that booting from datasets that have recordsize greater
than 128KB is not supported (but it is okay to enable the feature on
the pool).
A limited safety belt is provided for the mounted root filesystem, but use
caution when setting a larger value.
Illumos issue:
5027 zfs large block support
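A minimal sketch of the safety-belt check, assuming a recordsize must be a power of two between 512 bytes and the tunable's current value. recordsize_ok() and sketch_max_recordsize are hypothetical names, and the real check also involves the large_blocks pool feature, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MINBLOCKSIZE (1ULL << 9)	/* 512 bytes */

/* Hypothetical stand-in for vfs.zfs.max_recordsize (default 1MB). */
static uint64_t sketch_max_recordsize = 1ULL << 20;

/* Accept a recordsize only if it is in range and a power of two. */
static bool
recordsize_ok(uint64_t rs)
{
	if (rs < SKETCH_MINBLOCKSIZE || rs > sketch_max_recordsize)
		return (false);
	return ((rs & (rs - 1)) == 0);
}
```

Raising the tunable is what lets larger values through; the check itself does not change.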
| |
It is implemented for LUNs backed by ZVOLs in "dev" mode and by files.
GEOM has no such API, so for LUNs backed by raw devices all LBAs will
be reported as mapped/unknown.
Sponsored by: iXsystems, Inc.
| |
Fix l2arc compression buffer leak
We have observed that arc_release() can be called concurrently with an
in-flight l2arc write.
Also, we have observed that arc_hdr_destroy() can be called from
arc_write_done() for a zio with ZIO_FLAG_IO_REWRITE flag in similar
circumstances.
Previously the l2arc headers would be freed while leaking their
associated compression buffers. Now the buffers are placed on
l2arc_free_on_write list for delayed freeing. This is similar to what
was already done to arc buffers that were supposed to be freed
concurrently with in-flight writes of those buffers.
In addition to fixing the discovered leaks this change also adds some
protective code to assert that a compression buffer associated with an
l2arc header is never leaked.
A new kstat l2_cdata_free_on_write is added. It keeps a count of
delayed compression buffer frees which previously would have been leaks.
Tested by: Vitalij Satanivskij <satan@ukr.net> et al
Requested by: many
Sponsored by: HybridCluster / ClusterHQ
This is a 10.1-RELEASE errata candidate.
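The delayed-free scheme can be sketched in C as follows. The structures, list, and function names are hypothetical simplifications, not the real arc.c types: a buffer still referenced by an in-flight write is parked on a list and freed only when the write completes, and the counter plays the role of the new l2_cdata_free_on_write kstat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the real ZFS structures. */
struct pending_free {
	void *buf;
	struct pending_free *next;
};

struct l2hdr {
	void *cdata;		/* compressed copy of the data */
	bool write_in_flight;	/* an l2arc write still reads cdata */
};

static struct pending_free *free_on_write;
static unsigned long cdata_free_on_write;	/* mirrors the new kstat */

/* Release a header's compression buffer, deferring the free if an
 * in-flight write may still be reading it. */
static void
release_cdata(struct l2hdr *hdr)
{
	if (hdr->cdata == NULL)
		return;
	if (hdr->write_in_flight) {
		struct pending_free *pf = malloc(sizeof(*pf));

		assert(pf != NULL);
		pf->buf = hdr->cdata;
		pf->next = free_on_write;
		free_on_write = pf;
		cdata_free_on_write++;
	} else {
		free(hdr->cdata);
	}
	hdr->cdata = NULL;	/* the header no longer owns the buffer */
}

/* Called once in-flight writes complete: free everything deferred. */
static void
write_done_drain(void)
{
	while (free_on_write != NULL) {
		struct pending_free *pf = free_on_write;

		free_on_write = pf->next;
		free(pf->buf);
		free(pf);
	}
}
```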
| |
Add a tunable for spa_slop_shift, which controls how much space is
reserved by default. Tuning is not recommended.
Relnotes: yes
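As a rough illustration, assuming the reservation is pool space shifted right by spa_slop_shift (default 5, that is 1/32 of the pool). slop_space() and sketch_slop_shift are hypothetical names, and the real code also appears to enforce a minimum reservation that this sketch omits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of spa_slop_shift; 5 reserves 1/32 of the pool. */
static int sketch_slop_shift = 5;

/* Space held back from allocation for a pool of the given size. */
static uint64_t
slop_space(uint64_t pool_space)
{
	return (pool_space >> sketch_slop_shift);
}
```

For a 1 TiB pool the default reserves 32 GiB; raising the shift by one halves the reservation.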
| |
Improve zdb -b performance:
- Call gethrtime() only once every 100 blkptrs;
- Skip manipulating the size-ordered tree;
- Issue more (10, previously 3) async reads;
- Use lighter weight testing in traverse_visitbp();
Illumos issue:
5243 zdb -b could be much faster
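The first item, sampling the clock rather than reading it for every block pointer, looks roughly like this. sketch_gethrtime() stands in for gethrtime() using POSIX clock_gettime(), and the loop body is a placeholder; this is a sketch of the idea, not zdb's traversal code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

static unsigned long clock_reads;	/* counts clock reads, for illustration */

/* Stand-in for gethrtime(): monotonic time in nanoseconds. */
static uint64_t
sketch_gethrtime(void)
{
	struct timespec ts;

	clock_reads++;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec);
}

/* Visit nblkptrs block pointers, refreshing the progress timestamp
 * only once every 100 visits. */
static uint64_t
visit_blkptrs(unsigned long nblkptrs)
{
	uint64_t now = sketch_gethrtime();

	for (unsigned long i = 1; i <= nblkptrs; i++) {
		/* ... per-blkptr accounting would go here ... */
		if (i % 100 == 0)
			now = sketch_gethrtime();
	}
	return (now);
}
```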
| |
Disable TRIM on file-backed ZFS vdevs and fix TRIM on init
Sponsored by: Multiplay
| |
Add support to CTL for logical block provisioning threshold notifications.
For ZVOL-backed LUNs this allows CTL to inform initiators when the storage's
used or available space rises above or falls below the configured thresholds.
Sponsored by: iXsystems, Inc.
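One plausible way to report only crossings, sketched under the assumption that the LUN remembers which side of each threshold it last reported. struct lbp_threshold and lbp_check() are hypothetical, not CTL's actual interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-LUN threshold state; not CTL's real structures. */
struct lbp_threshold {
	uint64_t threshold;	/* in blocks */
	bool above;		/* side of the threshold last reported */
};

/*
 * Returns +1 when usage rises above the threshold, -1 when it falls
 * back to or below it, and 0 otherwise, so that initiators are
 * notified on crossings rather than on every update.
 */
static int
lbp_check(struct lbp_threshold *t, uint64_t used)
{
	bool now_above = used > t->threshold;

	if (now_above == t->above)
		return (0);
	t->above = now_above;
	return (now_above ? 1 : -1);
}
```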
| |
variable
| |
This change addresses 4 bugs in ZFS exposed by Richard Kojedzinszky's
crash.sh script attached to FreeNAS bug 4109:
https://bugs.freenas.org/issues/4109
Three are in the snapshot layer:
a) AVG explains in his notes: https://wiki.freebsd.org/AvgVfsSolarisVsFreeBSD
"VOP_INACTIVE must not do any destructive actions to a vnode
and its filesystem node, nor invalidate them in any way."
gfs_vop_inactive and zfsctl_snapshot_inactive did just that. In
OpenSolaris VOP_INACTIVE is much closer to FreeBSD's VOP_RECLAIM.
Rename & move them to gfs_vop_reclaim and zfsctl_snapshot_reclaim
and merge in the requisite vnode_destroy from zfsctl_common_reclaim.
b) gfs_lookup_dot and various zfsctl functions do not honor the
FreeBSD VFS convention of only locking from the root downward. When
looking up ".." the convention is to drop the current leaf vnode lock before
acquiring the directory vnode and then subsequently re-acquiring the lock on the
leaf vnode. This fixes that in all the places that are exercised by crash.sh.
c) The snapshot may already be unmounted when the directory vnode is reclaimed.
Check for this case and return.
One in the common layer:
d) Callers of traverse expect the reference to the vnode passed in to be
maintained. Don't release it.
This last one may be an unclear contract. There may in fact be some callers that
do expect the reference to be dropped on success, in addition to callers that
expect it to be maintained. In that case a further audit of the callers and a
consensus on the correct behavior are needed.
PR: 184677
Submitted by: kmacy
Reviewed by: delphij, will, avg
Sponsored by: iXsystems
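Point b) can be illustrated with plain pthread mutexes standing in for vnode locks. This is a sketch: struct sketch_vnode and lookup_dotdot() are hypothetical, and real FreeBSD code uses vn_lock() and must re-check that the leaf vnode was not reclaimed while its lock was dropped, which is omitted here.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical vnode stand-in: only the lock and parent link matter. */
struct sketch_vnode {
	pthread_mutex_t lock;
	struct sketch_vnode *parent;
};

/*
 * Look up ".." from a locked leaf without violating the root-downward
 * lock order: drop the leaf lock, lock the parent, then re-lock the
 * leaf. Returns the parent; both vnodes are locked on return.
 */
static struct sketch_vnode *
lookup_dotdot(struct sketch_vnode *leaf)
{
	struct sketch_vnode *dvp = leaf->parent;

	pthread_mutex_unlock(&leaf->lock);
	pthread_mutex_lock(&dvp->lock);
	pthread_mutex_lock(&leaf->lock);
	return (dvp);
}
```

Locking the parent while still holding the leaf would take locks bottom-up and could deadlock against a thread walking the same path top-down.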
| |
Add a tunable for arc_shrink_shift (vfs.zfs.arc_shrink_shift) that
controls what fraction of the ARC, 1/2^arc_shrink_shift, should be reclaimed
when there is memory pressure.
Submitted by: Richard Kojedzinszky <krichy at tvnetwork.hu>
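A minimal sketch of the shrink step, assuming the ARC target (arc_c) is reduced by arc_c >> arc_shrink_shift but never below a floor (arc_c_min). The variables carry a _sketch suffix to mark them as hypothetical, and the shift of 5 is chosen for illustration, not taken from this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirrors of the ARC target, its floor, and the tunable. */
static uint64_t arc_c_sketch;
static uint64_t arc_c_min_sketch;
static int arc_shrink_shift_sketch = 5;

/* Shrink the target by 1/2^shift, clamping at the minimum. */
static void
arc_shrink_sketch(void)
{
	uint64_t to_free = arc_c_sketch >> arc_shrink_shift_sketch;

	if (arc_c_sketch > arc_c_min_sketch + to_free)
		arc_c_sketch -= to_free;
	else
		arc_c_sketch = arc_c_min_sketch;
}
```

A larger shift reclaims a smaller slice per pass; a smaller shift reacts more aggressively.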
| |
Add tunable vfs.zfs.space_map_blksz for the space map's maximum block size.
| |
- De-vnet hash sizes and hash masks.
- Fix multiple issues related to arguments passed to SYSCTL macros.
Sponsored by: Mellanox Technologies
| |
Refactor the code and stop restore_object from creating two transactions.
Illumos issue:
3693 restore_object uses at least two transactions to restore an object
| |
Illumos issue:
5175 implement dmu_read_uio_dbuf() to improve cached read performance
| |
Use loaned ARC buffer for zfs receive to avoid copy.
Illumos issue:
5162 zfs recv should use loaned arc buffer to avoid copy
| |
Split the godfather zio into one per CPU to reduce lock
contention.
Illumos issue:
5176 lock contention on godfather zio
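The per-CPU split can be illustrated with an array of independently locked roots indexed by CPU. SKETCH_NCPU, struct godfather and godfather_add_child() are hypothetical; in the real zio code the roots are zios and the CPU index comes from the kernel, not a parameter.

```c
#include <assert.h>
#include <pthread.h>

#define SKETCH_NCPU 4	/* hypothetical CPU count */

/* One root per CPU, each with its own lock, instead of one shared root. */
struct godfather {
	pthread_mutex_t lock;
	unsigned long nchildren;
};

static struct godfather godfathers[SKETCH_NCPU];

static void
godfathers_init(void)
{
	for (int i = 0; i < SKETCH_NCPU; i++) {
		pthread_mutex_init(&godfathers[i].lock, NULL);
		godfathers[i].nchildren = 0;
	}
}

/* Attach an async I/O to the root for the (non-negative) issuing CPU;
 * only I/Os issued on the same CPU contend on the same lock. */
static struct godfather *
godfather_add_child(int cpu)
{
	struct godfather *gf = &godfathers[cpu % SKETCH_NCPU];

	pthread_mutex_lock(&gf->lock);
	gf->nchildren++;
	pthread_mutex_unlock(&gf->lock);
	return (gf);
}
```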
| |
Illumos issue:
5177 remove dead code from dsl_scan.c
| |
Illumos issue:
5174 add sdt probe for blocked read in dbuf_read()
| |
Add a new sysctl, vfs.zfs.vol.unmap_enabled, which allows the system
administrator to toggle whether ZFS should ignore UNMAP requests.
Illumos issue:
5149 zvols need a way to ignore DKIOCFREE
| |
Add tunable for number of metaslabs per vdev
(vfs.zfs.vdev.metaslabs_per_vdev). The default remains
at 200.
Illumos issue:
5161 add tunable for number of metaslabs per vdev
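A sketch of how a per-vdev metaslab count can translate into a metaslab size, modeled on our reading of vdev_metaslab_set_size(): the size is rounded up to a power of two via a highbit64()-style helper, so a vdev ends up with roughly between half and all of the requested count. The function names and the min_shift bound are assumptions; verify against the source before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based index of the highest set bit (0 for 0), like highbit64(). */
static int
highbit64_sketch(uint64_t x)
{
	int h = 0;

	while (x != 0) {
		h++;
		x >>= 1;
	}
	return (h);
}

/* Hypothetical: log2 of the metaslab size for a vdev of asize bytes,
 * targeting metaslabs_per_vdev metaslabs, but no smaller than
 * 2^min_shift bytes. */
static int
metaslab_shift(uint64_t asize, uint64_t metaslabs_per_vdev, int min_shift)
{
	int shift = highbit64_sketch(asize / metaslabs_per_vdev);

	return (shift > min_shift ? shift : min_shift);
}
```

With the default of 200, a 1 TiB vdev gets 8 GiB metaslabs, i.e. 128 of them.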
| |
Make space_map_truncate() always do space_map_reallocate(). Without
this, setting space_map_max_blksz would cause a panic for an existing pool,
as dmu_objset_set_blocksize would fail if the object has multiple blocks.
Illumos issues:
5164 space_map_max_blksz causes panic, does not work
5165 zdb fails assertion when run on pool with recently-enabled
spacemap_histogram feature
| |
Don't inherit flags other than DS_FLAG_CI_DATASET and DS_FLAG_INCONSISTENT
when cloning. This prevents DS_FLAG_DEFER_DESTROY from being inherited from a
snapshot that is marked for deferred destroy, which caused snapshots of the
clone to be destroyed when taking a hold on them or cloning them.
Illumos issue:
5150 zfs clone of a defer_destroy snapshot causes strangeness
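The fix amounts to masking the origin's flags. The DS_FLAG_* values below are as we recall them from dsl_dataset.h and should be treated as illustrative; clone_inherited_flags() is a hypothetical helper, not the actual dsl_dataset.c code.

```c
#include <assert.h>
#include <stdint.h>

/* Values as recalled from dsl_dataset.h; illustrative only. */
#define DS_FLAG_INCONSISTENT	(1ULL << 0)
#define DS_FLAG_NOPROMOTE	(1ULL << 1)
#define DS_FLAG_UNIQUE_ACCURATE	(1ULL << 2)
#define DS_FLAG_DEFER_DESTROY	(1ULL << 3)
#define DS_FLAG_CI_DATASET	(1ULL << 16)

/* Flags a new clone may inherit from its origin; everything else,
 * notably DS_FLAG_DEFER_DESTROY, is masked off. */
static uint64_t
clone_inherited_flags(uint64_t origin_flags)
{
	return (origin_flags & (DS_FLAG_CI_DATASET | DS_FLAG_INCONSISTENT));
}
```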
| |
Don't make nested definition for range_seg_cache.
Reported by: ian
| |
In arc_kmem_reap_now(), reap range_seg_cache too, to reclaim memory in
response to memory pressure.
Illumos issue:
5163 arc should reap range_seg_cache
| |
Use write_psize instead of write_asize when doing vdev_space_update.
Without this change the accounting of L2ARC usage would be wrong and
report 16EB of free space, because the value became negative and wrapped around.
Obtained from: FreeNAS (issue #6239)
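The 16EB figure is simply what a negative 64-bit count looks like when held in an unsigned type: 2^64 bytes is 16 EiB. This hypothetical sketch (struct space_sketch, space_update() and space_free() are our names) shows free space wrapping once allocated space is charged past the device size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical space accounting for a cache device. */
struct space_sketch {
	uint64_t size;
	uint64_t alloc;
};

/* Apply a signed delta; unsigned arithmetic wraps modulo 2^64. */
static void
space_update(struct space_sketch *vs, int64_t delta)
{
	vs->alloc += (uint64_t)delta;
}

/* Free space as size - alloc; wraps to a huge value if alloc > size. */
static uint64_t
space_free(const struct space_sketch *vs)
{
	return (vs->size - vs->alloc);
}
```

Overcharging by just 4096 bytes already yields a free-space value just under 2^64.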
| |
Prevent ZFS leaking pool free space
Early MFC approved by re@
Approved by: re@ (glebius)
Sponsored by: Multiplay
| |
Continue the crusade towards a dev_clone()-free kernel, removing its
usage from dtrace. The dtrace code already uses cdevpriv(9) since FreeBSD
8, so this change is quite harmless.
Originally by: davide
Reviewed by: markj
| |
Fix various issues with zvols
Sponsored by: Multiplay
| |
Added missing ZFS sysctls
This also includes small additional direct changes as it still uses the old
way of handling tunables.
Sponsored by: Multiplay
| |
Remove unused ZFS ARC functions
Sponsored by: Multiplay
| |
Add dtrace probe support for zfs SET_ERROR(..)
MFC r271873:
Fix static kernel build with options ZFS
MFC r271819:
Remove sys/types.h include as per style(9)
Sponsored by: Multiplay
| |
Refactor ZFS ARC reclaim logic to be more VM cooperative
MFC r270861:
Ensure that ZFS ARC free memory checks include cached pages
MFC r272483:
Refactor ZFS ARC reclaim checks and limits
Sponsored by: Multiplay
| |
avoid caching the file's state indefinitely. The va_filerev is what is sent
to the client as the "change" attribute; the client periodically fetches
the attributes, and without this option the attribute remains some garbage
value.
Phabric: D905
Reported by: Kevin Buhr <buhr@asaurus.net>
Reviewed by: rmacklem, delphij
Approved by: delphij
Obtained from: r272467
Sponsored by: QNAP Systems Inc.
| |
Add a new tunable/sysctl, vfs.zfs.free_max_blocks, which can be used to
limit how many blocks can be freed before a new transaction group is
created. The default is no limit (infinite), but we should probably have
a lower default, e.g. 100,000.
With this limit, we can guard against the case where ZFS could run out of
memory when destroying large numbers of blocks in a single transaction
group, as the entire DDT needs to be brought into memory.
Illumos issue:
5138 add tunable for maximum number of blocks freed in one txg
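A sketch of the limit, assuming frees beyond the cap simply stay pending for the next txg. free_max_blocks_sketch and free_blocks_in_txg() are hypothetical; UINT64_MAX stands in for the commit's "no limit" default.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for vfs.zfs.free_max_blocks; UINT64_MAX = no limit. */
static uint64_t free_max_blocks_sketch = UINT64_MAX;

/* Free up to the cap from *pending in the current txg. Returns the
 * number freed now; the remainder stays pending for a later txg. */
static uint64_t
free_blocks_in_txg(uint64_t *pending)
{
	uint64_t n = *pending;

	if (n > free_max_blocks_sketch)
		n = free_max_blocks_sketch;
	/* ... free n blocks, bringing in DDT entries as needed ... */
	*pending -= n;
	return (n);
}
```

With a cap of 100,000, destroying 250,000 blocks is spread over three txgs, bounding how much DDT state each one touches.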
| |
Make ZVOL writes in device mode support the IO_SYNC flag.
| |
Illumos issue:
5136 fix write throttle comment in dsl_pool.c
Approved by: re (gjb)
| |
Diff reduction with kernel code: instruct the compiler that the data of
these types may be unaligned to their "normal" alignment and exercise
caution when accessing them.
PR: 194071
Approved by: re (gjb)
| |
Enforce 4K as smallest indirect block size (previously the smallest
indirect block size was 1K but that was never used).
This makes some space estimates more accurate and uses less memory
for some data structures.
Illumos issue:
5141 zfs minimum indirect block size is 4K
Approved by: re (gjb)
| |
Correctly report hole at end of file.
When asked to find a hole, the DMU sees that there are no holes in the
object, and returns ESRCH. The ZPL interprets this as "no holes before
the end of the file", and therefore inserts the "virtual hole" at the
end of the file. Because DMU and ZPL have different ideas of where the
end of an object/file is, we will end up returning the end of file,
which is generally larger, instead of returning the end of object.
The fix is to handle the "virtual hole" in the DMU. If no hole is found,
the DMU will return a hole at the end of the file, rather than an error.
Illumos issue:
5139 SEEK_HOLE failed to report a hole at end of file
Approved by: re (gjb)
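From userland the fixed behavior can be observed with lseek(2) and SEEK_HOLE: for a file with no holes, the first hole reported is the virtual hole at end of file, i.e. the file size. This is a sketch; first_hole_of_dense_file() and the /tmp path are our choices, and results on other filesystems depend on their SEEK_HOLE support.

```c
#define _GNU_SOURCE	/* exposes SEEK_HOLE on glibc; harmless elsewhere */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write a temporary file of len bytes with no holes and return the
 * offset of the first hole at or after offset 0, or -1 on error. */
static off_t
first_hole_of_dense_file(size_t len)
{
	char path[] = "/tmp/seekholeXXXXXX";
	char buf[4096];
	off_t hole = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return (-1);
	unlink(path);	/* file disappears once fd is closed */
	memset(buf, 'x', sizeof(buf));
	while (len > 0) {
		size_t n = len < sizeof(buf) ? len : sizeof(buf);

		if (write(fd, buf, n) != (ssize_t)n)
			goto out;
		len -= n;
	}
	hole = lseek(fd, 0, SEEK_HOLE);
out:
	close(fd);
	return (hole);
}
```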
| |
In zil_claim, don't issue a warning if we get EBUSY (inconsistent) when
opening an objset; instead, ignore it silently.
Illumos issue:
5140 message about "%recv could not be opened" is printed when
booting after crash
Approved by: re (gjb)
| |
Persist vdev_resilver_txg changes to avoid a panic caused by validation
against a vdev_resilver_txg value from a previous resilver.
Approved by: re (glebius)
Sponsored by: Multiplay
| |
Don't treat TRIM requests returning ENOTSUP as an unexpected error
Approved by: re (gjb)
Sponsored by: Multiplay
| |
Add sysctls for ZFS dirty data tuning.
MFC r266533:
Improve sysctl descriptions for new ZFS sysctls.
Approved by: re (marius)
Sponsored by: Multiplay
| |
In dnode_sync(), do dnode_increase_indirection() before processing
the dn_next_nblkptr.
Illumos issue:
5117 space map reallocation can cause corruption
Approved by: re (gjb)