path: root/sys/cddl
Commit message, Author, Age, Files, Lines
* MFC r265152 - Reintroduce priority for the TRIM ZIOs instead of using the "NOW" priority smh 2014-08-21 12 -153/+194
| | | | | | | | | MFC r265321 - Fix double fault panic when returning EOPNOTSUPP. MFC r269407 - Don't return ZIO_PIPELINE_CONTINUE from vdev_op_io_start methods. Sponsored by: Multiplay
* MFC r269525:markj2014-08-211-1/+4
| | | | | Return 0 for the PPID of threads in process 0, as process 0 doesn't have a parent process.
* MFC r266103grehan2014-08-191-17/+188
| | | | | | | | | | | | | | | | | | | | | | | | | Update dis_tables.c to the latest Illumos version. This includes decodes of recent Intel instructions, in particular VT-x and related instructions. This allows the FBT provider to locate the exit points of routines that include these new instructions. Illumos issues: 3414 Need a new word of AT_SUN_HWCAP bits 3415 Add isainfo support for f16c and rdrand 3416 Need disassembler support for rdrand and f16c 3413 isainfo -v overflows 80 columns 3417 mdb disassembler confuses rdtscp for invlpg 1518 dis should support AMD SVM/AMD-V/Pacifica instructions 1096 i386 disassembler should understand complex nops 1362 add kvmstat for monitoring of KVM statistics 1363 add vmregs[] variable to DTrace 1364 need disassembler support for VMX instructions 1365 mdb needs 16-bit disassembler support This corresponds to Illumos-gate (github) version eb23829ff08a873c612ac45d191d559394b4b408
* MFC r269543: MFV r269542:delphij2014-08-181-1/+2
| | | | | | | | In vdev_get_stats, check that the vdev is not a hole before computing the fragmentation. This fixes a panic when removing log device. Illumos issue: 5049 panic when removing log device
* MFC r269431: MFV r269427:delphij2014-08-182-4/+4
| | | | | | | | | | | | In dnode_children_t, use C99's "[]" idiom for declaring the variable sized array dnc_children at the end of the structure. This prevents the compiler from mistakenly optimizing away accesses beyond the array's defined size. Illumos issue: 5038 Remove "old-style" flexible array usage in ZFS. Author: Justin T. Gibbs <justing@spectralogic.com>
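The flexible array idiom mentioned in the entry above can be illustrated with a minimal sketch. The struct and function names here are hypothetical stand-ins, not the actual dnode_children_t definition; only the C99 `[]` declaration itself is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for dnode_children_t; field names are illustrative. */
struct children {
    size_t count;   /* number of valid entries in slots[] */
    int    slots[]; /* C99 flexible array member: no declared bound */
};

/* Allocate the header plus room for n trailing slots in one block. */
static struct children *
children_alloc(size_t n)
{
    struct children *c = calloc(1, sizeof(*c) + n * sizeof(c->slots[0]));

    if (c != NULL)
        c->count = n;
    return c;
}

/* Sum every slot; the compiler cannot assume a one-element bound here. */
static int
children_sum(const struct children *c)
{
    int total = 0;

    assert(c != NULL);
    for (size_t i = 0; i < c->count; i++)
        total += c->slots[i];
    return total;
}
```

With the older `slots[1]` style, a compiler may reason that any index above 0 is out of bounds; the unsized declaration removes that assumption.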
* MFC r269230: MFV r269224:delphij2014-08-121-3/+9
| | | | | | | | | | | | | Increase default ARC buf_hash_table size. When typical block size is small, the hash table could be too small, which would lead to long hash chains and limit performance for cached reads. A new loader tunable, vfs.zfs.arc_average_blocksize, has been added which allows users to override the default assumption of average (typical) block size. Old default was 65536 (64 KiB) and new default is 8192 (8 KiB). Illumos issue: 5034 ARC's buf_hash_table is too small
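To see why the assumed average block size matters, here is a rough sketch of how a bucket count might be derived from physical memory. The function name, the 2^12 starting size, and the doubling loop are assumptions for illustration and are not quoted from arc.c; only the two default block sizes come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the smallest power-of-two bucket count (at least 2^12) such that
 * buckets * avg_blocksize covers physmem_bytes. Hypothetical helper.
 */
static uint64_t
hash_buckets(uint64_t physmem_bytes, uint64_t avg_blocksize)
{
    uint64_t hsize = 1ULL << 12;

    assert(avg_blocksize != 0);
    while (hsize * avg_blocksize < physmem_bytes)
        hsize <<= 1;
    return hsize;
}
```

Under these assumptions, a machine with 16 GiB of memory gets 262144 buckets with the old 64 KiB default and 2097152 buckets with the new 8 KiB default, so chains are roughly eight times shorter when blocks are small.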
* MFC r269229,269404,269466: MFV r269223:delphij2014-08-127-39/+134
| | | | | | | Change dn->dn_dbufs from linked list to AVL tree. Illumos issues: 4873 zvol unmap calls can take a very long time for larger datasets
* MFC r269403, r269405, r269410, r269414:ian2014-08-112-3/+3
| | | | | | Add 64-bit atomic ops for armv6, and also for armv4 only in kernel code. Use the new ops in the cddl code (and avoid defining functions with the same names locally).
* MFC r269138:delphij2014-08-101-0/+11
| | | | Add two sysctls for newly added tunables.
* MFC r269118: MFV r269010:delphij2014-08-1011-207/+614
| | | | | | | | | | | | | Import Illumos changes to address the following Illumos issues: 4976 zfs should only avoid writing to a failing non-redundant top-level vdev 4978 ztest fails in get_metaslab_refcount() 4979 extend free space histogram to device and pool 4980 metaslabs should have a fragmentation metric 4981 remove fragmented ops vector from block allocator 4982 space_map object should proactively upgrade when feature is enabled 4984 device selection should use fragmentation metric
* MFC r259211:markj2014-08-092-2/+2
| | | | Correct the check for errors from proc_rwmem().
* MFC r269093:delphij2014-08-081-1/+2
| | | | | Transform the I/O when vdev_physical_ashift is greater than SPA_MINBLOCKSHIFT.
* MFC r269086:delphij2014-08-084-7/+4
| | | | | | | | As of r268075, the responsibility of rounding up the buffer to optimal size has been transferred from zio_compress_data to its caller. Therefore, passing the 'minblocksize' down will be a no-op. Eliminate the parameter to reduce diff against upstream.
* MFC r267759, r267761markj2014-08-054-20/+20
| | | | | | | | | | | | | | | | | | | | | | r267759: Fix a couple of bugs on amd64 when fetching probe arguments beyond the first five for probes entered through a UD fault (i.e. FBT probes). Specifically, handle the fact that dtrace_invop_callsite must be 16 byte-aligned and thus may not immediately follow the call to dtrace_invop() in dtrace_invop_start(). Also fetch register arguments and the stack pointer through a struct trapframe instead of a struct reg. r267761: Fix some bugs when fetching probe arguments in i386. Firstly ensure that the 4 byte-aligned dtrace_invop_callsite can be found and that it immediately follows the call to dtrace_invop(). Secondly, fix some pointer arithmetic to account for differences between struct i386_frame and illumos' struct frame. Finally, ensure that dtrace_getarg() isn't inlined. It works by following a fixed number of frame pointers to the probe site, so inlining breaks it. PR: 191260
* MFC r267706:markj2014-08-051-16/+17
| | | | | | Allow creation of SDT probes from a module in which no providers are defined. This ensures that the sdt:zfs:: probes appear despite the fact that the sdt provider is defined in the kernel rather than in zfs.ko.
* MFC r256822:markj2014-08-042-2/+35
| | | | | | | | | When fetching function arguments out of a frame on amd64, explicitly select the register based on the argument index rather than relying on the fields in struct reg to be in the right order. This assumption is incorrect on FreeBSD and generally led to bogus argument values for the sixth argument of PID and USDT probes; the first five are passed directly to dtrace_probe() via the fasttrap trap handler and so were correctly handled.
* MFC r256571:markj2014-08-043-0/+49
| | | | | | | | | | | | | | | | | | | | | | | Add a function, memstr, which can be used to convert a buffer of null-separated strings to a single string. This can be used to print the full arguments of a process using execsnoop (from the DTrace toolkit) or with the following one-liner: dtrace -n 'syscall::execve:return {trace(curpsinfo->pr_psargs);}' Note that this relies on the process arguments being cached via the struct proc, which means that it will not work for argvs longer than kern.ps_arg_cache_limit. However, the following rather non-portable script can be used to extract any argv at exec time: fbt::kern_execve:entry { printf("%s", memstr(args[1]->begin_argv, ' ', args[1]->begin_envv - args[1]->begin_argv)); } The debug.dtrace.memstr_max sysctl limits the maximum argument size to memstr().
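The behavior attributed to memstr above can be approximated in userland C. This is a sketch of the idea only: the function name, the heap allocation, and the handling of the final NUL are assumptions, and the in-kernel DTrace subroutine may differ in its details.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy 'len' bytes of a NUL-separated buffer into a fresh C string,
 * replacing each interior NUL with 'sub'. A NUL in the last position is
 * kept as the terminator. Caller frees the result. Hypothetical helper.
 */
static char *
memstr_sketch(const char *buf, char sub, size_t len)
{
    char *out;

    assert(buf != NULL);
    out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        out[i] = (buf[i] == '\0' && i + 1 < len) ? sub : buf[i];
    out[len] = '\0';
    return out;
}
```

Called on an argv-style buffer holding "ls", "-l" and "/tmp" back to back with a space as the substitute, this yields the single string "ls -l /tmp".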
* MFC r269189:kib2014-08-042-4/+3
| | | | Initialize zfs vnode v_hash when the vnode is allocated.
* MFC r268865: MFV r268852:delphij2014-08-027-17/+130
| | | | | | | | | | | | | Reduce lock contention on the z_teardown_lock under heavily cached read workloads by splitting the single teardown rrw lock into RRM_NUM_LOCKS (17) of them. Read acquisitions are randomly distributed among these locks based on the curthread pointer. Write acquisitions take all of the locks, which should be rare for the usage of this type of lock. Illumos issue: 5008 lock contention (rrw_exit) while running a read only load
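The distribution of readers across the 17 locks can be sketched as a simple hash of the thread pointer. The function name and the right shift by 8 (to skip low bits that tend to be identical for aligned thread structures) are assumptions for illustration; only the lock count of 17 and the use of the thread pointer come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define RRM_NUM_LOCKS 17u /* value quoted in the commit message */

/*
 * Map a thread pointer to one of the RRM_NUM_LOCKS reader locks.
 * A reader would take only locks[index]; a writer would take all of them.
 */
static unsigned
rrm_lock_index(const void *thread_ptr)
{
    uint32_t bits = (uint32_t)(uintptr_t)thread_ptr;

    return (bits >> 8) % RRM_NUM_LOCKS;
}
```

Because each reader touches only one lock, unrelated readers rarely share a cache line, while the rarer writer pays the cost of acquiring every lock.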
* MFC r268859: MFV r268851:delphij2014-08-025-6/+44
| | | | | | | | | When a sync task is waiting for a txg to complete, we should hurry it along by increasing the number of outstanding async writes (i.e. make vdev_queue_max_async_writes() return a larger number). Illumos issue: 4753 increase number of outstanding async writes when sync task is waiting
* MFC r268858: MFV r268850:delphij2014-08-023-81/+64
| | | | | | | | | | | | | | | Change the interaction between the DMU and ARC so that when the DMU is shutting down an objset, we do not evict the data from the ARC. Instead we simply coordinate the destruction of the DMU's data with the ARC. The only case where we actually need to explicitly evict from the ARC is when dbuf_rele_and_unlock() determines that the administrator has requested that it not be kept in memory, via the primarycache/secondarycache properties. In this case, we evict the data from the ARC by its blkptr_t, the same way as when a block is freed we explicitly evict it from the ARC. Illumos issue: 4631 zvol_get_stats triggering too many reads
* MFC r268855: MFV r268848:delphij2014-08-027-39/+93
| | | | | | | | | | | | | | | | | | Instead of asserting that all zios are properly aligned, only assert on the logical ones. Cap uberblocks at 8k; otherwise, with ashift=17, there would be only one uberblock. This fixes a problem where zdb would trip an assert on pools with ashift >= 0xe (8k). While there, also change the code so that it only attempts to condense the space map if the uncondensed size consumes more than zfs_metaslab_condense_block_threshold blocks. Illumos issue: 4958 zdb trips assert on pools with ashift >= 0xe
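The arithmetic behind the uberblock cap can be worked through in a few lines. The 128 KiB ring size and the 1 KiB minimum uberblock size are assumptions about the on-disk label layout, and the function names are hypothetical; the 8k cap and the ashift=17 case are from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define RING_BYTES   (128u << 10) /* assumed size of the uberblock ring */
#define MIN_UB_SHIFT 10u          /* assumed floor: 1 KiB uberblocks */

/* Clamp the per-uberblock shift between the floor and 'max_shift'. */
static unsigned
uberblock_shift(unsigned ashift, unsigned max_shift)
{
    unsigned shift = ashift > MIN_UB_SHIFT ? ashift : MIN_UB_SHIFT;

    return shift < max_shift ? shift : max_shift;
}

/* Number of uberblock slots that fit in the ring for a given ashift. */
static unsigned
uberblock_count(unsigned ashift, unsigned max_shift)
{
    unsigned shift = uberblock_shift(ashift, max_shift);

    assert(shift < 32);
    return RING_BYTES >> shift;
}
```

With no effective cap (max_shift of 17), ashift=17 leaves a single slot; capping the shift at 13 (8 KiB) keeps 16 slots available.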
* MFC r264434:markj2014-07-314-11/+251
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DTrace's pid provider works by inserting breakpoint instructions at probe sites and installing a hook at the kernel's trap handler. The fasttrap code will emulate the overwritten instruction in some common cases, but otherwise copies it out into some scratch space in the traced process' address space and ensures that it's executed after returning from the trap. In Solaris and illumos, this (per-thread) scratch space comes from some reserved space in TLS, accessible via the fs segment register. This approach is somewhat unappealing on FreeBSD since it would require some modifications to rtld and jemalloc (for static TLS) to ensure that TLS is executable, and would thus introduce dependencies on their implementation details. I think it would also be impossible to safely trace static binaries compiled without these modifications. This change implements the functionality in a different way, by having fasttrap map pages into the target process' address space on demand. Each page is divided into 64-byte chunks for use by individual threads, and fasttrap's process descriptor struct has been extended to keep track of any scratch space allocated for the corresponding process. With this change it's possible to trace all libc functions in a program, e.g. with pid$target:libc.so.*::entry {@[probefunc] = count();} Previously this would generally cause the victim process to crash, as tracing memcpy on amd64 requires the functionality described above.
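The scratch-space scheme described above (pages carved into 64-byte chunks handed out to threads) can be sketched with a bitmap allocator. The 4096-byte page size, the struct, and the function names are assumptions for illustration; the real fasttrap code tracks its chunks in its own process descriptor structures.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES      4096u /* assumed page size */
#define CHUNK_BYTES     64u   /* chunk size from the commit message */
#define CHUNKS_PER_PAGE (PAGE_BYTES / CHUNK_BYTES)

/* Hypothetical per-page bookkeeping: one bit per 64-byte chunk. */
struct scratch_page {
    uintptr_t base; /* address of the page in the traced process */
    uint64_t  used; /* bit i set means chunk i is allocated */
};

/* Return the address of a free chunk, or 0 if the page is full. */
static uintptr_t
scratch_alloc(struct scratch_page *p)
{
    for (unsigned i = 0; i < CHUNKS_PER_PAGE; i++) {
        if ((p->used & (1ULL << i)) == 0) {
            p->used |= 1ULL << i;
            return p->base + (uintptr_t)i * CHUNK_BYTES;
        }
    }
    return 0;
}

/* Release a chunk previously returned by scratch_alloc(). */
static void
scratch_free(struct scratch_page *p, uintptr_t addr)
{
    uintptr_t i = (addr - p->base) / CHUNK_BYTES;

    assert(addr >= p->base && i < CHUNKS_PER_PAGE);
    p->used &= ~(1ULL << i);
}
```

Under the 4096-byte assumption, one page serves up to 64 threads before another page has to be mapped into the target process.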
* MFC r264435:markj2014-07-291-1/+7
| | | | | Ensure that all eight syscall arguments are available to dtrace_probe(), rather than just the first five.
* MFC r268720: MFV r268714:delphij2014-07-291-13/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improve extreme rewind import. When doing an "extreme rewind" import ("zpool import -XF"), we attempt to verify all data in the pool, essentially scrubbing the entire pool. The problem is that spa_load_verify_cb() issues an unbounded number of concurrent scrub i/os. This can lead to all of memory being used for these zios, wedging the system. Like normal scrub, we need to put a cap on the number of outstanding i/os, and have the traverse thread block when we reach this cap. For this purpose the cap can be very large (10,000) to optimize the elevator algorithm. Three kernel tunables have been added: vfs.zfs.spa_load_verify_maxinflight, vfs.zfs.spa_load_verify_metadata, vfs.zfs.spa_load_verify_data. The latter two tunables control whether metadata and/or user data are verified when doing extreme rewind. Make 'zpool import -T' imply scrub. Make zpool import -T <txg> accept hexadecimal values for the txg when prefixed with 0x. Skip txgs for which there is no uberblock when doing extreme rewind. Skip reading all user data twice by skipping prefetches when doing extreme rewinds, as we do not access via the ARC. Illumos issues: 4970 need controls on i/o issued by zpool import -XF 4971 zpool import -T should accept hex values 4972 zpool import -T implies extreme rewind, and thus a scrub 4973 spa_load_retry retries the same txg 4974 spa_load_verify() reads all data twice
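Accepting a txg in either decimal or 0x-prefixed hexadecimal, as described above, is commonly done by letting strtoull() detect the base. The helper below is a hypothetical sketch and is not taken from zpool's option parsing; note that base 0 also treats a leading 0 as octal.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a txg given in decimal or 0x-prefixed hex.
 * Returns 0 and stores the value on success, -1 on malformed input.
 */
static int
parse_txg(const char *s, uint64_t *out)
{
    char *end;
    unsigned long long v;

    assert(s != NULL && out != NULL);
    errno = 0;
    v = strtoull(s, &end, 0); /* base 0: auto-detect 0x and leading 0 */
    if (errno != 0 || end == s || *end != '\0')
        return -1;
    *out = (uint64_t)v;
    return 0;
}
```

Rejecting trailing characters keeps a typo such as "0x1g" from being silently read as txg 1.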
* MFC r268713: MFV r268702:delphij2014-07-295-1/+19
| | | | | | | Add missing *_destroy() calls in various places with ZFS. Illumos issue: 4975 missing mutex_destroy() calls in zfs
* MFC r268420:mav2014-07-241-1/+1
| | | | | | | | | | | | Remove the IO_SYNC flag when writing extended file attributes on ZFS. While it is possible to create and write a file, modify its permissions, etc. without ever doing a sync, it looks odd that one is required for setting extended file attributes on ZFS. UFS does not sync there either. Samba uses those extended attributes to store some of its data, and doing so synchronously reduces file creation performance many times over on systems without a SLOG device.
* MFC r268473: MFV r268455:delphij2014-07-2319-70/+160
| | | | Use reserved space for ZFS administrative commands.
* MFC r268464: MFV r268452:delphij2014-07-237-21/+66
| | | | | | | | Explicitly mark file removal transactions as "presumed to result in a net free of space" so they will not fail with ENOSPC. Illumos issue: 4950 files sometimes can't be removed from a full filesystem
* MFC r268116:delphij2014-07-172-73/+104
| | | | | | | | - Fix handling of "new" style of ioctl in compatibility mode [1]; - Reorganize code and reduce diff from upstream; - Improve forward compatibility shims for previous kernel. Reported by: sbruno [1]
* MFC r268097:pfg2014-07-161-15/+13
| | | | | | | | | | | | | | | | MFV r260708 4427 pid provider rejects probes with valid UTF-8 names. This makes use of Solaris' u8_validate(), which we have happened to use since r185029 for ZFS. Use of u8_textprep.c required -Wno-cast-qual for powerpc. Illumos Revision: 1444d846b126463eb1059a572ff114d51f7562e5 Reference: https://www.illumos.org/issues/4427 Obtained from: Illumos
* MFC r268128: MFV r268122:delphij2014-07-153-1/+10
| | | | 4929 want prevsnap property
* MFC r268126: MFV r268121:delphij2014-07-154-35/+32
| | | | 4924 LZ4 Compression for metadata
* MFC r268123: MFV r268119:delphij2014-07-1524-109/+109
| | | | 4914 zfs on-disk bookmark structure should be named *_phys_t
* MFC r268086: MFV r267570:delphij2014-07-151-3/+22
| | | | 4756 metaslab_group_preload() could deadlock
* MFC r268085: MFV r267569:delphij2014-07-151-5/+13
| | | | 4897 Space accounting mismatch in L2ARC/zpool
* MFC r268082: MFV r267567:delphij2014-07-151-15/+20
| | | | | 4881 zfs send performance degradation when embedded block pointers are encountered
* MFC r268079: MFV r267566:delphij2014-07-1516-157/+294
| | | | | 4390 i/o errors when deleting filesystem/zvol can lead to space map corruption
* MFC r268075: MFV r267565:delphij2014-07-1539-236/+1313
| | | | | 4757 ZFS embedded-data block pointers ("zero block compression") 4913 zfs release should not be subject to space checks
* MFC r266771: MFV r266766:delphij2014-07-156-22/+76
| | | | | | | | | | | | | | | | | | | Add a new zfs property, "redundant_metadata" which can have values "all" or "most". The default will be "all", which is the current behavior. When set to all, ZFS stores an extra copy of all metadata. If a single on-disk block is corrupt, at worst a single block of user data (which is recordsize bytes long) can be lost. Setting to "most" will cause us to only store 1 copy of level-1 indirect blocks of user data files. This can improve performance of random writes, because less metadata has to be written. In practice, at worst about 100 blocks (of recordsize bytes each) of user data can be lost if a single on-disk block is corrupt. The exact behavior of which metadata blocks are stored redundantly may change in future releases. Illumos issue: 3835 zfs need not store 2 copies of all metadata
* MFC r268290:pfg2014-07-131-14/+64
| | | | | | | | | | | | | Merge from OpenSolaris (24-Jul-2010): 6679140 asymmetric alloc/dealloc activity can induce dynamic variable drops 6679193 dtrace_dynvar walker produces flood of dtrace_dynhash_sink This finishes a set of merges from the older OpenSolaris releases. The FreeBSD port still has many differences that are difficult to account for, but that seems normal given that the kernels are different. Obtained from: OpenSolaris (through Illumos)
* MFC 267929, 267937, 267939, 267940, 267941, 267942, 267987, 268006:rpaulo2014-07-126-80/+1071
| | | | | | | | | | | | | | | | 2915 DTrace in a zone should see "cpu", "curpsinfo", et al 2916 DTrace in a zone should be able to access fds[] 2917 DTrace in a zone should have limited provider access 4477 DTrace should speak JSON Add stubs for CTF functions which are not yet implemented. 4474 DTrace Userland CTF Support 4475 DTrace userland Keyword 4476 DTrace tests should be better citizens 4479 pid provider types 4480 dof emulation is missing checks 4471 DTrace count() with histogram 4472 DTrace full width distribution histograms 4473 DTrace frequency trails
* MFC r268130, r268224, r268230, r268231:pfg2014-07-123-9/+48
| | | | | | | | | | | | | | | | | | Various DTrace Merges from OpenSolaris/Illumos: 15-Sep-2008: 6735480 race between probe enabling and provider registration 20-Apr-2008: 6822482 DOF validation needs to handle loadable sections flagged as unloadable 22-Apr-2009: 6823388 DTrace ioctl handlers must validate all structure members 30-Jun-2009: 6851093 system drops to kmdb with anonymous dtrace probes + kmdb Obtained from: OpenSolaris
* MFC r268125:pfg2014-07-061-2/+4
| | | | | | | | | | | | Small merges from OpenSolaris: These have no effect on FreeBSD (in fact they are ifdef'ed), but they make future merges easier: 6699767 panic in spec_open() 6718877 crgetzoneid() use can cause problems when forking processes with USDT providers in a non global zone
* MFC r268178:mav2014-07-051-0/+4
| | | | | | | Fix bug in sync control in new "dev" mode of ZVOL (r265678). Don't check ZVOL_WCE flag, used in Solaris to control device "write cache". It is not applicable on FreeBSD and by default set to "disable".
* MFC r268014:pfg2014-07-021-7/+6
| | | | | | Reduce some warnings in the Solaris unicode support. Clean up some warnings about parentheses and minor style issues.
* MFC r267029, r267038:mav2014-06-171-0/+4
| | | | | | Replace gethrtime() with cpu_ticks() as the source of randomness for taskqueue selection. gethrtime() in our port is updated at HZ rate, so it is unusable for this specific purpose and completely negates the benefit of multiple taskqueues.
* MFC r266915: MFV 266913+266914:delphij2014-06-061-5/+12
| | | | | 3897 zfs filesystem and snapshot limits (fix leak) 4901 zfs filesystem/snapshot limit leaks
* MFC r264885smh2014-05-261-26/+30
| | | | | | Eliminate duplicate checks in vdev_geom_io_intr error handling Sponsored by: Multiplay
* MFC r262329:markj2014-05-252-15/+34
| | | | | | | | | | Define the KM_NORMALPRI flag for kmem_alloc(), as it is used in some upstream DTrace code. MFC r262330: 1452 DTrace buffer autoscaling should be less violent illumos/illumos-gate@6fb4854bed54ce82bd8610896b64ddebcd4af706