| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
"NOW" priority
MFC r265321 - Fix double fault panic when returning EOPNOTSUPP
MFC r269407 - Don't return ZIO_PIPELINE_CONTINUE from vdev_op_io_start methods
Sponsored by: Multiplay
|
|
|
|
|
| |
Return 0 for the PPID of threads in process 0, as process 0 doesn't have a
parent process.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update dis_tables.c to the latest Illumos version.
This includes decodes of recent Intel instructions, in particular
VT-x and related instructions. This allows the FBT provider to
locate the exit points of routines that include these new
instructions.
Illumos issues:
3414 Need a new word of AT_SUN_HWCAP bits
3415 Add isainfo support for f16c and rdrand
3416 Need disassembler support for rdrand and f16c
3413 isainfo -v overflows 80 columns
3417 mdb disassembler confuses rdtscp for invlpg
1518 dis should support AMD SVM/AMD-V/Pacifica instructions
1096 i386 disassembler should understand complex nops
1362 add kvmstat for monitoring of KVM statistics
1363 add vmregs[] variable to DTrace
1364 need disassembler support for VMX instructions
1365 mdb needs 16-bit disassembler support
This corresponds to Illumos-gate (github) version
eb23829ff08a873c612ac45d191d559394b4b408
|
|
|
|
|
|
|
|
| |
In vdev_get_stats, check that the vdev is not a hole before computing the
fragmentation. This fixes a panic when removing log device.
Illumos issue:
5049 panic when removing log device
|
|
|
|
|
|
|
|
|
|
|
|
| |
In dnode_children_t, use C99's "[]" idiom for declaring the variable
sized array dnc_children at the end of the structure.
This prevents the compiler from mistakenly optimizing away accesses
beyond the array's defined size.
Illumos issue:
5038 Remove "old-style" flexible array usage in ZFS.
Author: Justin T. Gibbs <justing@spectralogic.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Increase default ARC buf_hash_table size. When typical block size is small,
the hash table could be too small, which would lead to long hash chains and
limit performance for cached reads.
A new loader tunable, vfs.zfs.arc_average_blocksize, have been added which
allows users to override the default assumption of average (typical) block
size. Old default was 65536 (64 KiB) and new default is 8192 (8 KiB).
Illumos issue:
5034 ARC's buf_hash_table is too small
|
|
|
|
|
|
|
| |
Change dn->dn_dbufs from linked list to AVL tree.
Illumos issues:
4873 zvol unmap calls can take a very long time for larger datasets
|
|
|
|
|
|
| |
Add 64-bit atomic ops for armv6, and also for armv4 only in kernel code.
Use the new ops in the cddl code (and avoid defining functions with the
same names locally).
|
|
|
|
| |
Add two sysctls for newly added tunables.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Import Illumos changes to address the following Illumos issues:
4976 zfs should only avoid writing to a failing non-redundant
top-level vdev
4978 ztest fails in get_metaslab_refcount()
4979 extend free space histogram to device and pool
4980 metaslabs should have a fragmentation metric
4981 remove fragmented ops vector from block allocator
4982 space_map object should proactively upgrade when feature
is enabled
4984 device selection should use fragmentation metric
|
|
|
|
| |
Correct the check for errors from proc_rwmem().
|
|
|
|
|
| |
Transform the I/O when vdev_physical_ashift is greater than
SPA_MINBLOCKSHIFT.
|
|
|
|
|
|
|
|
| |
As of r268075, the responsibility of rounding up buffer to optimal size have
been transferred from zio_compress_data to its caller. Therefore, passing
the 'minblocksize' down will be a no-op.
Eliminate the parameter to reduce diff against upstream.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r267759:
Fix a couple of bugs on amd64 when fetching probe arguments beyond the
first five for probes entered through a UD fault (i.e. FBT probes).
Specifically, handle the fact that dtrace_invop_callsite must be
16 byte-aligned and thus may not immediately follow the call to
dtrace_invop() in dtrace_invop_start(). Also fetch register arguments and
the stack pointer through a struct trapframe instead of a struct reg.
r267761:
Fix some bugs when fetching probe arguments in i386. Firstly ensure that
the 4 byte-aligned dtrace_invop_callsite can be found and that it
immediately follows the call to dtrace_invop(). Secondly, fix some pointer
arithmetic to account for differences between struct i386_frame and illumos'
struct frame. Finally, ensure that dtrace_getarg() isn't inlined. It works
by following a fixed number of frame pointers to the probe site, so inlining
breaks it.
PR: 191260
|
|
|
|
|
|
| |
Allow creation of SDT probes from a module in which no providers are
defined. This ensures that the sdt:zfs:: probes appear despite the fact
the sdt provider is defined in the kernel rather than in zfs.ko.
|
|
|
|
|
|
|
|
|
| |
When fetching function arguments out of a frame on amd64, explicitly select
the register based on the argument index rather than relying on the fields
in struct reg to be in the right order. This assumption is incorrect on
FreeBSD and generally led to bogus argument values for the sixth argument
of PID and USDT probes; the first five are passed directly to dtrace_probe()
via the fasttrap trap handler and so were correctly handled.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a function, memstr, which can be used to convert a buffer of
null-separated strings to a single string. This can be used to print the
full arguments of a process using execsnoop (from the DTrace toolkit) or
with the following one-liner:
dtrace -n 'syscall::execve:return {trace(curpsinfo->pr_psargs);}'
Note that this relies on the process arguments being cached via the struct
proc, which means that it will not work for argvs longer than
kern.ps_arg_cache_limit. However, the following rather non-portable
script can be used to extract any argv at exec time:
fbt::kern_execve:entry
{
printf("%s", memstr(args[1]->begin_argv, ' ',
args[1]->begin_envv - args[1]->begin_argv));
}
The debug.dtrace.memstr_max sysctl limits the maximum argument size to
memstr().
|
|
|
|
| |
Initialize zfs vnode v_hash when the vnode is allocated.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reduce lock contention on the z_teardown_lock under heavily cached
read workload by splitting the single teardown rrw lock into
RRM_NUM_LOCKS (17) of them.
Read acquisitions are randomly distributed among these locks based
on curthread pointer. Write acquisitions are going to all the
locks, which for the usage of this type of lock should be rare.
Illumos issue:
5008 lock contention (rrw_exit) while running a read only load
|
|
|
|
|
|
|
|
|
| |
When a sync task is waiting for a txg to complete, we should hurry it along
by increasing the number of outstanding async writes (i.e. make
vdev_queue_max_async_writes() return a larger number).
Illumos issue:
4753 increase number of outstanding async writes when sync task is waiting
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change the interaction between the DMU and ARC so that when the DMU is
shutting down an objset, we do not evict the data from the ARC. Instead
we simply coordinate the destruction of the DMU's data with the ARC.
The only case where we actually need to explicitly evict from the ARC is
when dbuf_rele_and_unlock() determines that the administrator has requested
that it not be kept in memory, via the primarycache/secondarycache properties.
In this case, we evict the data from the ARC by its blkptr_t, the same way
as when a block is freed we explicitly evict it from the ARC.
Illumos issue:
4631 zvol_get_stats triggering too many reads
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of asserting all zio's be properly aligned, only assert
on the logical ones.
Cap uberblocks at 8k, otherwise with ashift=17, there would be
only one uberblock.
This fixes a problem that zdb would trip assert on pools with
ashift >= 0xe (8k).
While there, also change the code so it only attempt to condense
space map unless the uncondensed size consumes greater than
zfs_metaslab_condense_block_threshold blocks.
Illumos issue:
4958 zdb trips assert on pools with ashift >= 0xe
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DTrace's pid provider works by inserting breakpoint instructions at probe
sites and installing a hook at the kernel's trap handler. The fasttrap code
will emulate the overwritten instruction in some common cases, but otherwise
copies it out into some scratch space in the traced process' address space
and ensures that it's executed after returning from the trap.
In Solaris and illumos, this (per-thread) scratch space comes from some
reserved space in TLS, accessible via the fs segment register. This
approach is somewhat unappealing on FreeBSD since it would require some
modifications to rtld and jemalloc (for static TLS) to ensure that TLS is
executable, and would thus introduce dependencies on their implementation
details. I think it would also be impossible to safely trace static binaries
compiled without these modifications.
This change implements the functionality in a different way, by having
fasttrap map pages into the target process' address space on demand. Each
page is divided into 64-byte chunks for use by individual threads, and
fasttrap's process descriptor struct has been extended to keep track of
any scratch space allocated for the corresponding process.
With this change it's possible to trace all libc functions in a program,
e.g. with
pid$target:libc.so.*::entry {@[probefunc] = count();}
Previously this would generally cause the victim process to crash, as
tracing memcpy on amd64 requires the functionality described above.
|
|
|
|
|
| |
Ensure that all eight syscall arguments are available to dtrace_probe(),
rather than just the first five.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Improve extreme rewind import.
When doing an "extreme rewind" import ("zpool import -XF"), we attempt
to verify all data in the pool, essentially scrubbing the entire pool.
The problem is that spa_load_verify_cb() issues an unbounded number of
concurrent scrub i/os. This can lead to all of memory being used for
these zio's, wedging the system. Like normal scrub, we need to put a
cap on the number of outstanding i/os, and have the traverse thread
block when we reach this cap.
For this purpose the cap can be very large (10,000) to optimize the
elevator algorithm. Three kernel tunables have been added:
vfs.zfs.spa_load_verify_maxinflight
vfs.zfs.spa_load_verify_metadata
vfs.zfs.spa_load_verify_data
The latter two tunables controls whether metadata and/or user data
when doing extreme rewind.
Make 'zpool import -T' imply scrub.
Make zpool import -T <txg> accept hexadecimal values for the txg when
prefixed with 0x.
Skip txg's for which there is no uberblock when doing extreme rewind.
Skip reading all user data twice by skipping prefetches when doing
extreme rewinds as we do not access via the ARC.
Illumos issues:
4970 need controls on i/o issued by zpool import -XF
4971 zpool import -T should accept hex values
4972 zpool import -T implies extreme rewind, and thus a scrub
4973 spa_load_retry retries the same txg
4974 spa_load_verify() reads all data twice
|
|
|
|
|
|
|
| |
Add missing *_destroy() calls in various places with ZFS.
Illumos issue:
4975 missing mutex_destroy() calls in zfs
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove IO_SYNC flag when writing extended file attributes on ZFS.
While it is possible to create and write file, modify its permissions, etc.
without ever doing sync, it looks odd that it is required for setting
extended file attributes on ZFS. UFS does not do sync there too.
Samba uses those extended attributes to store some its data, and doing it
synchronously by many times reduces file creation performance for systems
without SLOG device.
|
|
|
|
| |
Use reserved space for ZFS administrative commands.
|
|
|
|
|
|
|
|
| |
Explicitly mark file removal transactions as "presumed to result
in a net free of space" so they will not fail with ENOSPC.
Illumos issue: 4950 files sometimes can't be removed from a full
filesystem
|
|
|
|
|
|
|
|
| |
- Fix handling of "new" style of ioctl in compatiblity mode [1];
- Reorganize code and reduce diff from upstream;
- Improve forward compatibility shims for previous kernel;
Reported by: sbruno [1]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MFV r260708
4427 pid provider rejects probes with valid UTF-8 names
This make use of Solaris' u8_validate() which we happen to
use since r185029 for ZFS.
Use of u8_textprep.c required -Wno-cast-qual for powerpc.
Illumos Revision: 1444d846b126463eb1059a572ff114d51f7562e5
Reference:
https://www.illumos.org/issues/4427
Obtained from: Illumos
|
|
|
|
| |
4929 want prevsnap property
|
|
|
|
| |
4924 LZ4 Compression for metadata
|
|
|
|
| |
4914 zfs on-disk bookmark structure should be named *_phys_t
|
|
|
|
| |
4756 metaslab_group_preload() could deadlock
|
|
|
|
| |
4897 Space accounting mismatch in L2ARC/zpool
|
|
|
|
|
| |
4881 zfs send performance degradation when embedded block pointers are
encountered
|
|
|
|
|
| |
4390 i/o errors when deleting filesystem/zvol can lead to space map
corruption
|
|
|
|
|
| |
4757 ZFS embedded-data block pointers ("zero block compression")
4913 zfs release should not be subject to space checks
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new zfs property, "redundant_metadata" which can have values "all" or
"most". The default will be "all", which is the current behavior. When set
to all, ZFS stores an extra copy of all metadata. If a single on-disk block
is corrupt, at worst a single block of user data (which is recordsize bytes
long) can be lost.
Setting to "most" will cause us to only store 1 copy of level-1 indirect
blocks of user data files. This can improve performance of random writes,
because less metadata has to be written. In practice, at worst about
100 blocks (of recordsize bytes each) of user data can be lost if a single
on-disk block is corrupt.
The exact behavior of which metadata blocks are stored redundantly may change
in future releases.
Illumos issue: 3835 zfs need not store 2 copies of all metadata
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Merge from OpenSolaris (24-Jul-2010):
6679140 asymmetric alloc/dealloc activity can induce dynamic variable drops
6679193 dtrace_dynvar walker produces flood of dtrace_dynhash_sink
This finishes a set of merges from the older OpenSolaris releases.
Still the FreeBSD port has many differences that are difficult to
account for but that seems normal given that the kernels are different.
Obtained from: OpenSolaris (through Illumos)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
2915 DTrace in a zone should see "cpu", "curpsinfo", et al
2916 DTrace in a zone should be able to access fds[]
2917 DTrace in a zone should have limited provider access
4477 DTrace should speak JSON
Add stubs for CTF functions which are not yet implemented.
4474 DTrace Userland CTF Support
4475 DTrace userland Keyword
4476 DTrace tests should be better citizens
4479 pid provider types
4480 dof emulation is missing checks
4471 DTrace count() with histogram
4472 DTrace full width distribution histograms
4473 DTrace frequency trails
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Various DTrace Merges from OpenSolaris/Illumos:
15-Sep-2008:
6735480 race between probe enabling and provider registration
20-Apr-2008:
6822482 DOF validation needs to handle loadable sections flagged as unloadable
22-Apr-2009:
6823388 DTrace ioctl handlers must validate all structure members
30-Jun-2009:
6851093 system drops to kmdb with anonymous dtrace probes + kmdb
Obtained from: OpenSolaris
|
|
|
|
|
|
|
|
|
|
|
|
| |
Small merges from OpenSolaris:
These have no effect on FreeBSD, in fact they are ifdef'ed,
but make easier future merges:
6699767 panic in spec_open()
6718877 crgetzoneid() use can cause problems when forking processes with
USDT providers in a non global zone
|
|
|
|
|
|
|
| |
Fix bug in sync control in new "dev" mode of ZVOL (r265678).
Don't check ZVOL_WCE flag, used in Solaris to control device "write cache".
It is not applicable on FreeBSD and by default set to "disable".
|
|
|
|
|
|
| |
Reduce some warnings in the Solaris unicode support.
Clean some warnings from parenthesis and minor style issues.
|
|
|
|
|
|
| |
Replace gethrtime() with cpu_ticks(), as source of random for the taskqueue
selection. gethrtime() in our port updated with HZ rate, so unusable for
this specific purpose, completely draining benefit of multiple taskqueues.
|
|
|
|
|
| |
3897 zfs filesystem and snapshot limits (fix leak)
4901 zfs filesystem/snapshot limit leaks
|
|
|
|
|
|
| |
Eliminate duplicate checks in vdev_geom_io_intr error handling
Sponsored by: Multiplay
|
|
|
|
|
|
|
|
|
|
| |
Define the KM_NORMALPRI flag for kmem_alloc(), as it is used in some
upstream DTrace code.
MFC r262330:
1452 DTrace buffer autoscaling should be less violent
illumos/illumos-gate@6fb4854bed54ce82bd8610896b64ddebcd4af706
|