| Commit message | Author | Age | Files | Lines |
| |
Initialize zfs vnode v_hash when the vnode is allocated.
| |
Reduce lock contention on the z_teardown_lock under a heavily cached
read workload by splitting the single teardown rrw lock into
RRM_NUM_LOCKS (17) of them.
Read acquisitions are distributed pseudo-randomly among these locks based
on the curthread pointer. Write acquisitions take all of the locks,
which should be rare given how this type of lock is used.
Illumos issue:
5008 lock contention (rrw_exit) while running a read only load
| |
When a sync task is waiting for a txg to complete, we should hurry it along
by increasing the number of outstanding async writes (i.e. make
vdev_queue_max_async_writes() return a larger number).
Illumos issue:
4753 increase number of outstanding async writes when sync task is waiting
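The rule can be sketched as follows, modeled loosely on the illumos vdev_queue_max_async_writes(): normally the allowed number of concurrent async writes scales linearly with the amount of dirty data between two thresholds, and a pending sync task short-circuits straight to the maximum. The struct, its fields and max_async_writes() are hypothetical stand-ins for the corresponding ZFS tunables and function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the zfs_vdev_async_write_* tunables. */
typedef struct async_write_tunables {
	uint64_t dirty_data_max;	/* bytes */
	uint64_t min_dirty_pct;		/* below this, use min_active */
	uint64_t max_dirty_pct;		/* above this, use max_active */
	int min_active;
	int max_active;
} async_write_tunables_t;

static int
max_async_writes(const async_write_tunables_t *t, uint64_t dirty,
    bool sync_task_pending)
{
	uint64_t min_bytes = t->dirty_data_max * t->min_dirty_pct / 100;
	uint64_t max_bytes = t->dirty_data_max * t->max_dirty_pct / 100;

	assert(min_bytes < max_bytes && t->min_active <= t->max_active);

	/* A waiting sync task wants the txg pushed out as fast as possible. */
	if (sync_task_pending)
		return (t->max_active);
	if (dirty < min_bytes)
		return (t->min_active);
	if (dirty > max_bytes)
		return (t->max_active);

	/* Linear interpolation between the two thresholds. */
	return ((int)((dirty - min_bytes) *
	    (uint64_t)(t->max_active - t->min_active) /
	    (max_bytes - min_bytes)) + t->min_active);
}
```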
| |
Change the interaction between the DMU and ARC so that when the DMU is
shutting down an objset, we do not evict the data from the ARC. Instead
we simply coordinate the destruction of the DMU's data with the ARC.
The only case where we actually need to explicitly evict from the ARC is
when dbuf_rele_and_unlock() determines that the administrator has requested
that it not be kept in memory, via the primarycache/secondarycache properties.
In this case, we evict the data from the ARC by its blkptr_t, the same way
as when a block is freed we explicitly evict it from the ARC.
Illumos issue:
4631 zvol_get_stats triggering too many reads
| |
Instead of asserting that all zios are properly aligned, only assert
on the logical ones.
Cap uberblocks at 8k; otherwise, with ashift=17, there would be
only one uberblock.
This fixes a problem where zdb would trip an assertion on pools with
ashift >= 0xe (16k).
While there, also change the code so that it only attempts to condense
a space map when the uncondensed size consumes more than
zfs_metaslab_condense_block_threshold blocks.
Illumos issue:
4958 zdb trips assert on pools with ashift >= 0xe
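The uberblock arithmetic can be checked with a small sketch. It assumes the illumos layout constants: a 128 KiB uberblock ring per vdev label (VDEV_UBERBLOCK_RING), a minimum uberblock slot of 1 KiB (UBERBLOCK_SHIFT = 10) and, with this change, a maximum slot of 8 KiB (MAX_UBERBLOCK_SHIFT = 13). Each slot is at least one sector, so its size follows ashift until the cap applies. The helper functions are hypothetical.

```c
#include <assert.h>

/* Assumed illumos constants; see the lead-in. */
#define VDEV_UBERBLOCK_RING	(128U << 10)
#define UBERBLOCK_SHIFT		10U
#define MAX_UBERBLOCK_SHIFT	13U

/* log2 of the uberblock slot size for a given ashift. */
static unsigned
uberblock_shift(unsigned ashift, int capped)
{
	unsigned shift = ashift > UBERBLOCK_SHIFT ? ashift : UBERBLOCK_SHIFT;

	if (capped && shift > MAX_UBERBLOCK_SHIFT)
		shift = MAX_UBERBLOCK_SHIFT;
	return (shift);
}

/* Number of uberblock slots that fit in the ring. */
static unsigned
uberblock_count(unsigned ashift, int capped)
{
	unsigned shift = uberblock_shift(ashift, capped);

	assert(shift <= 17);	/* the ring is only 2^17 bytes */
	return (VDEV_UBERBLOCK_RING >> shift);
}
```

With ashift=17 the uncapped slot would span the whole 128 KiB ring, leaving a single uberblock; capping the slot at 8 KiB keeps 16 of them.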
| |
DTrace's pid provider works by inserting breakpoint instructions at probe
sites and installing a hook at the kernel's trap handler. The fasttrap code
will emulate the overwritten instruction in some common cases, but otherwise
copies it out into some scratch space in the traced process' address space
and ensures that it's executed after returning from the trap.
In Solaris and illumos, this (per-thread) scratch space comes from some
reserved space in TLS, accessible via the fs segment register. This
approach is somewhat unappealing on FreeBSD since it would require some
modifications to rtld and jemalloc (for static TLS) to ensure that TLS is
executable, and would thus introduce dependencies on their implementation
details. I think it would also be impossible to safely trace static binaries
compiled without these modifications.
This change implements the functionality in a different way, by having
fasttrap map pages into the target process' address space on demand. Each
page is divided into 64-byte chunks for use by individual threads, and
fasttrap's process descriptor struct has been extended to keep track of
any scratch space allocated for the corresponding process.
With this change it's possible to trace all libc functions in a program,
e.g. with
pid$target:libc.so.*::entry {@[probefunc] = count();}
Previously this would generally cause the victim process to crash, as
tracing memcpy on amd64 requires the functionality described above.
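A per-page chunk allocator of the kind described might look like the following. It is a sketch under stated assumptions, not the fasttrap code: it assumes 4 KiB pages, so each page holds 4096 / 64 = 64 chunks and the page's occupancy fits in one 64-bit bitmap. scratch_page_t, scratch_alloc() and scratch_free() are hypothetical names, and mapping the page into the target process is left out.

```c
#include <assert.h>
#include <stdint.h>

#define SCRATCH_PAGE_SIZE	4096U	/* assumed page size */
#define SCRATCH_CHUNK_SIZE	64U	/* per-thread chunk, from the text */
#define SCRATCH_CHUNKS		(SCRATCH_PAGE_SIZE / SCRATCH_CHUNK_SIZE)

_Static_assert(SCRATCH_CHUNKS == 64, "one bit per chunk in a uint64_t");

typedef struct scratch_page {
	uintptr_t base;	/* user address the page is mapped at (nonzero) */
	uint64_t used;	/* bit i set: chunk i is owned by some thread */
} scratch_page_t;

/* Hand out the first free chunk, or 0 if the page is full. */
static uintptr_t
scratch_alloc(scratch_page_t *pg)
{
	assert(pg->base != 0);
	for (unsigned i = 0; i < SCRATCH_CHUNKS; i++) {
		uint64_t bit = UINT64_C(1) << i;

		if ((pg->used & bit) == 0) {
			pg->used |= bit;
			return (pg->base + (uintptr_t)i * SCRATCH_CHUNK_SIZE);
		}
	}
	return (0);
}

/* Return a chunk, e.g. when its owning thread exits. */
static void
scratch_free(scratch_page_t *pg, uintptr_t addr)
{
	assert(addr >= pg->base && addr < pg->base + SCRATCH_PAGE_SIZE);
	assert((addr - pg->base) % SCRATCH_CHUNK_SIZE == 0);
	pg->used &= ~(UINT64_C(1) << ((addr - pg->base) / SCRATCH_CHUNK_SIZE));
}
```

When scratch_alloc() returns 0 on every existing page, the process descriptor would need another page mapped on demand.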
| |
Improve extreme rewind import.
When doing an "extreme rewind" import ("zpool import -XF"), we attempt
to verify all data in the pool, essentially scrubbing the entire pool.
The problem is that spa_load_verify_cb() issues an unbounded number of
concurrent scrub i/os. This can lead to all of memory being used for
these zio's, wedging the system. Like normal scrub, we need to put a
cap on the number of outstanding i/os, and have the traverse thread
block when we reach this cap.
For this purpose the cap can be very large (10,000) to optimize the
elevator algorithm. Three kernel tunables have been added:
vfs.zfs.spa_load_verify_maxinflight
vfs.zfs.spa_load_verify_metadata
vfs.zfs.spa_load_verify_data
The latter two tunables control whether metadata and/or user data
are verified when doing an extreme rewind.
Make 'zpool import -T' imply scrub.
Make zpool import -T <txg> accept hexadecimal values for the txg when
prefixed with 0x.
Skip txgs for which there is no uberblock when doing an extreme rewind.
Avoid reading all user data twice by skipping prefetches when doing
extreme rewinds, as we do not access the data via the ARC.
Illumos issues:
4970 need controls on i/o issued by zpool import -XF
4971 zpool import -T should accept hex values
4972 zpool import -T implies extreme rewind, and thus a scrub
4973 spa_load_retry retries the same txg
4974 spa_load_verify() reads all data twice
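The cap on outstanding verification reads can be sketched as a counter guarded by a mutex and condition variable: the traversal thread blocks in the issue path once the limit is reached, and each completion wakes it. This is a hypothetical illustration with pthreads; inflight_cap_t and its functions are not ZFS names, and max plays the role of vfs.zfs.spa_load_verify_maxinflight (10,000 by default per the text).

```c
#include <assert.h>
#include <pthread.h>

typedef struct inflight_cap {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	unsigned inflight;	/* reads issued but not yet completed */
	unsigned max;		/* cap, e.g. 10000 */
} inflight_cap_t;

static void
inflight_cap_init(inflight_cap_t *c, unsigned max)
{
	assert(max > 0);
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cv, NULL);
	c->inflight = 0;
	c->max = max;
}

/* Traversal thread: wait for room, then account for one more read. */
static void
inflight_cap_issue(inflight_cap_t *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->inflight >= c->max)
		pthread_cond_wait(&c->cv, &c->lock);
	c->inflight++;
	pthread_mutex_unlock(&c->lock);
}

/* I/O completion: release one slot and wake a waiting issuer. */
static void
inflight_cap_done(inflight_cap_t *c)
{
	pthread_mutex_lock(&c->lock);
	assert(c->inflight > 0);
	c->inflight--;
	pthread_cond_signal(&c->cv);
	pthread_mutex_unlock(&c->lock);
}
```

Bounding the count this way caps the memory tied up in outstanding zios, while a large cap still leaves plenty of queued I/O for the elevator to sort.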
| |
Add missing *_destroy() calls in various places within ZFS.
Illumos issue:
4975 missing mutex_destroy() calls in zfs
| |
Remove IO_SYNC flag when writing extended file attributes on ZFS.
While it is possible to create and write a file, modify its permissions,
etc. without ever doing a sync, it looks odd that a sync is required for
setting extended file attributes on ZFS. UFS does not sync there either.
Samba uses these extended attributes to store some of its data, and writing
them synchronously reduces file creation performance many times over on
systems without a SLOG device.
| |
Use reserved space for ZFS administrative commands.
| |
Explicitly mark file removal transactions as "presumed to result
in a net free of space" so they will not fail with ENOSPC.
Illumos issue: 4950 files sometimes can't be removed from a full
filesystem
| |
- Fix handling of "new" style of ioctl in compatibility mode [1];
- Reorganize code and reduce diff from upstream;
- Improve forward compatibility shims for previous kernels;
Reported by: sbruno [1]
| |
MFV r260708
4427 pid provider rejects probes with valid UTF-8 names
This makes use of Solaris' u8_validate(), which we have been
using for ZFS since r185029.
Use of u8_textprep.c required -Wno-cast-qual for powerpc.
Illumos Revision: 1444d846b126463eb1059a572ff114d51f7562e5
Reference:
https://www.illumos.org/issues/4427
Obtained from: Illumos
| |
4929 want prevsnap property
| |
4924 LZ4 Compression for metadata
| |
4914 zfs on-disk bookmark structure should be named *_phys_t
| |
4756 metaslab_group_preload() could deadlock
| |
4897 Space accounting mismatch in L2ARC/zpool
| |
4881 zfs send performance degradation when embedded block pointers are
encountered
| |
4390 i/o errors when deleting filesystem/zvol can lead to space map
corruption
| |
4757 ZFS embedded-data block pointers ("zero block compression")
4913 zfs release should not be subject to space checks
| |
Add a new zfs property, "redundant_metadata" which can have values "all" or
"most". The default will be "all", which is the current behavior. When set
to all, ZFS stores an extra copy of all metadata. If a single on-disk block
is corrupt, at worst a single block of user data (which is recordsize bytes
long) can be lost.
Setting to "most" will cause us to only store 1 copy of level-1 indirect
blocks of user data files. This can improve performance of random writes,
because less metadata has to be written. In practice, at worst about
100 blocks (of recordsize bytes each) of user data can be lost if a single
on-disk block is corrupt.
The exact behavior of which metadata blocks are stored redundantly may change
in future releases.
Illumos issue: 3835 zfs need not store 2 copies of all metadata
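The copy-count rule described above can be sketched as a small policy function. This is a hypothetical illustration of the described behavior, not the actual dmu_write_policy() code: "all" adds an extra copy to every metadata block, while "most" skips the extra copy only for level-1 indirect blocks of user data objects. For the "about 100 blocks" figure, assuming 16 KiB indirect blocks and 128-byte block pointers, one level-1 indirect block references 16384 / 128 = 128 data blocks.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
	REDUNDANT_METADATA_ALL,
	REDUNDANT_METADATA_MOST
} redundant_metadata_t;

/* Assumed cap: a block pointer holds at most three DVAs. */
#define MAX_COPIES 3

/*
 * copies: the dataset's "copies" setting.
 * user_data_object: true for the contents of plain files or zvols.
 * level: 0 for data blocks, 1 for the indirect blocks above them, ...
 */
static int
block_copies(int copies, bool user_data_object, int level,
    redundant_metadata_t rm)
{
	bool is_metadata = !user_data_object || level > 0;

	assert(copies >= 1 && level >= 0);
	/* "most" drops the extra copy only for level-1 user data indirects. */
	if (is_metadata &&
	    (rm == REDUNDANT_METADATA_ALL || !user_data_object || level >= 2))
		copies++;
	return (copies > MAX_COPIES ? MAX_COPIES : copies);
}
```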
| |
Merge from OpenSolaris (24-Jul-2010):
6679140 asymmetric alloc/dealloc activity can induce dynamic variable drops
6679193 dtrace_dynvar walker produces flood of dtrace_dynhash_sink
This finishes a set of merges from the older OpenSolaris releases.
The FreeBSD port still has many differences that are difficult to
account for, but that seems normal given that the kernels are different.
Obtained from: OpenSolaris (through Illumos)
| |
2915 DTrace in a zone should see "cpu", "curpsinfo", et al
2916 DTrace in a zone should be able to access fds[]
2917 DTrace in a zone should have limited provider access
4477 DTrace should speak JSON
Add stubs for CTF functions which are not yet implemented.
4474 DTrace Userland CTF Support
4475 DTrace userland Keyword
4476 DTrace tests should be better citizens
4479 pid provider types
4480 dof emulation is missing checks
4471 DTrace count() with histogram
4472 DTrace full width distribution histograms
4473 DTrace frequency trails
|
| |
Various DTrace Merges from OpenSolaris/Illumos:
15-Sep-2008:
6735480 race between probe enabling and provider registration
20-Apr-2008:
6822482 DOF validation needs to handle loadable sections flagged as unloadable
22-Apr-2009:
6823388 DTrace ioctl handlers must validate all structure members
30-Jun-2009:
6851093 system drops to kmdb with anonymous dtrace probes + kmdb
Obtained from: OpenSolaris
| |
Small merges from OpenSolaris:
These have no effect on FreeBSD (in fact, they are ifdef'ed out),
but they make future merges easier:
6699767 panic in spec_open()
6718877 crgetzoneid() use can cause problems when forking processes with
USDT providers in a non global zone
| |
Fix bug in sync control in new "dev" mode of ZVOL (r265678).
Don't check the ZVOL_WCE flag, used in Solaris to control the device "write cache".
It is not applicable on FreeBSD and is set to "disable" by default.
| |
Reduce some warnings in the Solaris unicode support.
Clean up some parenthesization warnings and minor style issues.
| |
Replace gethrtime() with cpu_ticks() as the source of randomness for taskqueue
selection. In our port gethrtime() is updated only at the HZ rate, which makes
it unusable for this purpose and completely negates the benefit of multiple
taskqueues.
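Why a tick-rate clock defeats the selection can be shown with a short model. It assumes the queue index is the clock value modulo the number of taskqueues and that a burst of dispatches finishes within one HZ tick, so a tick-rate clock returns the same value for every dispatch in the burst (step 0), while a cycle counter such as cpu_ticks() advances between reads (modeled as step 1). select_taskq() and queues_used() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Pick one of ntaskqs queues from a clock-derived value. */
static unsigned
select_taskq(uint64_t clock_value, unsigned ntaskqs)
{
	assert(ntaskqs > 0);
	return ((unsigned)(clock_value % ntaskqs));
}

/*
 * Count how many distinct queues a burst of ndispatch dispatches lands on
 * when the clock advances by 'step' between consecutive reads.
 */
static unsigned
queues_used(uint64_t start, uint64_t step, unsigned ndispatch,
    unsigned ntaskqs)
{
	uint64_t seen = 0;	/* one bit per queue; needs ntaskqs <= 64 */
	unsigned n = 0;

	assert(ntaskqs <= 64);
	for (unsigned i = 0; i < ndispatch; i++) {
		unsigned q = select_taskq(start + (uint64_t)i * step, ntaskqs);

		if ((seen & (UINT64_C(1) << q)) == 0) {
			seen |= UINT64_C(1) << q;
			n++;
		}
	}
	return (n);
}
```

With a frozen clock every dispatch in the burst lands on the same queue, so the other queues sit idle.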
| |
3897 zfs filesystem and snapshot limits (fix leak)
4901 zfs filesystem/snapshot limit leaks
| |
Eliminate duplicate checks in vdev_geom_io_intr error handling
Sponsored by: Multiplay
| |
Define the KM_NORMALPRI flag for kmem_alloc(), as it is used in some
upstream DTrace code.
MFC r262330:
1452 DTrace buffer autoscaling should be less violent
illumos/illumos-gate@6fb4854bed54ce82bd8610896b64ddebcd4af706
| |
Add the ability to set a minimum ashift size for ZFS pool creation or root level
vdev addition.
Change max_auto_ashift sysctl to error when an invalid value is requested instead
of silently limiting it.
Sponsored by: Multiplay
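The "error instead of silently limiting" behavior can be sketched as setter functions that validate against both the fixed bounds and the other tunable. The bounds 9 and 13, the struct and all function names here are hypothetical placeholders chosen for illustration; the real limits and sysctl handlers live in the ZFS sources.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical bounds, for illustration only. */
#define ASHIFT_MIN_SKETCH 9
#define ASHIFT_MAX_SKETCH 13

typedef struct ashift_tunables {
	uint64_t min_auto_ashift;
	uint64_t max_auto_ashift;
} ashift_tunables_t;

/* Reject invalid values with EINVAL instead of clamping them. */
static int
set_max_auto_ashift(ashift_tunables_t *t, uint64_t val)
{
	if (val > ASHIFT_MAX_SKETCH || val < t->min_auto_ashift)
		return (EINVAL);
	t->max_auto_ashift = val;
	return (0);
}

static int
set_min_auto_ashift(ashift_tunables_t *t, uint64_t val)
{
	if (val < ASHIFT_MIN_SKETCH || val > t->max_auto_ashift)
		return (EINVAL);
	t->min_auto_ashift = val;
	return (0);
}

/* Ashift for a new root-level vdev: at least the configured minimum. */
static uint64_t
effective_ashift(const ashift_tunables_t *t, uint64_t device_ashift)
{
	assert(t->min_auto_ashift <= t->max_auto_ashift);
	return (device_ashift > t->min_auto_ashift ?
	    device_ashift : t->min_auto_ashift);
}
```

Returning EINVAL leaves the tunable unchanged, so the administrator sees the rejected value instead of a different one being applied quietly.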
| |
Expose a few DTrace parameters as sysctls under kern.dtrace and add
descriptions for several existing sysctls.
PR: 187027
| |
Import George Wilson's change for Illumos #4730:
4730 metaslab group taskq should be destroyed in metaslab_group_destroy()
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Original author: George Wilson
| |
4745 fix AVL code misspellings
| |
3897 zfs filesystem and snapshot limits
| |
4754 io issued to near-full luns even after setting noalloc threshold
4755 mg_alloc_failures is no longer needed
illumos/illumos@b6240e830b871f59c22a3918aebb3b36c872edba
| |
4374 dn_free_ranges should use range_tree_t
illumos/illumos-gate@bf16b11e8deb633dd6c4296d46e92399d1582df4
| |
Add property and sysctl to control how ZVOLs are exposed to OS.
New ZFS property volmode and sysctl vfs.zfs.vol.mode allow switching ZVOL
between three modes:
geom -- existing fully functional behavior (default);
dev -- exposing volumes only as raw disk device file in devfs;
none -- not exposing volumes outside ZFS.
The "dev" mode is less functional (it can't be partitioned, mounted, etc.),
but it is faster, and in some scenarios with untrusted consumers, safer.
It can be useful for NAS, VM block storage, etc.
The "none" mode may be convenient for backup servers, etc. that don't
need direct data access.
Due to the way ZVOL is integrated with the main ZFS code, the property
and sysctl are checked only during pool import and volume creation.
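How a per-volume property and a global sysctl might combine can be sketched as below. Only the three mode names come from the text; the VOLMODE_UNSET value (property not set on the volume), the precedence rule that the property wins over vfs.zfs.vol.mode, and all function names are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

typedef enum {
	VOLMODE_UNSET,	/* hypothetical: property not set on the volume */
	VOLMODE_GEOM,
	VOLMODE_DEV,
	VOLMODE_NONE
} volmode_t;

/* Parse a volmode property value; returns -1 for unknown strings. */
static int
volmode_parse(const char *s, volmode_t *out)
{
	static const struct {
		const char *name;
		volmode_t mode;
	} table[] = {
		{ "geom", VOLMODE_GEOM },
		{ "dev", VOLMODE_DEV },
		{ "none", VOLMODE_NONE },
	};

	for (size_t i = 0; i < sizeof (table) / sizeof (table[0]); i++) {
		if (strcmp(s, table[i].name) == 0) {
			*out = table[i].mode;
			return (0);
		}
	}
	return (-1);
}

/* Assumed precedence: the per-volume property wins over the sysctl. */
static volmode_t
volmode_effective(volmode_t prop, volmode_t sysctl_mode)
{
	assert(sysctl_mode != VOLMODE_UNSET);
	return (prop != VOLMODE_UNSET ? prop : sysctl_mode);
}

/* How the volume is exposed, following the list above. */
static const char *
volmode_exposure(volmode_t mode)
{
	switch (mode) {
	case VOLMODE_GEOM:
		return ("GEOM provider");
	case VOLMODE_DEV:
		return ("raw disk device in devfs");
	case VOLMODE_NONE:
		return ("not exposed");
	default:
		return ("invalid");
	}
}
```

Because the value is only consulted at pool import and volume creation, changing it later would take effect only after a re-import.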
| |
3580 Want zvols to return volblocksize when queried for physical block size
illumos/illumos-gate@a0b60564dfc644f4bfaef1ce26d343b44cf68bc5
It is irrelevant for FreeBSD; this just reduces the diff.
| |
Fix emulation of call and jmp instructions on i386 and for 32-bit processes
on amd64.
| |
Move some files that are identical on i386 and amd64 to an x86 subdirectory
rather than keeping duplicate copies.
| |
4248 dtrace(1M) should never create DOF with empty probes section
4249 Only probes from the first DTrace object file will be included
Illumos Revision: 4a20ab41aadcb81c53e72fc65886e964e9add59
Reference:
https://www.illumos.org/issues/4248
https://www.illumos.org/issues/4249
Obtained from: Illumos
| |
Fix ZIO reordering issue which could cause data loss / corruption.
Sponsored by: Multiplay
| |
4478 dtrace_dof_maxsize is far too small
illumos/illumos-gate@d339a29bb4765c4b6883a935cf69b669cd05bca0
| |
In addition to r264077, tell GEOM that we do support BIO_DELETE now.
| |
Add BIO_DELETE support to ZVOL.
It is an adapted merge from the vendor branch of:
701 UNMAP support for COMSTAR (in part related to ZFS)
2130 zvol DKIOCFREE uses nested DMU transactions
| |
Create zvol devices on zfs clone.
While the big and shiny patch is not ready yet, it is better to have something.
PR: kern/178999
| |
Report ZVOL block size as GEOM stripesize.