summaryrefslogtreecommitdiffstats
path: root/sys/cddl
Commit message (Collapse)AuthorAgeFilesLines
* MFV r251619:delphij2013-06-1111-12/+151
| | | | | | | | | ZFS needs better comments. Illumos ZFS issues: 3741 zfs needs better comments MFC after: 2 weeks
* MFV r251519:delphij2013-06-083-1/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Illumos ZFS issue #3805 arc shouldn't cache freed blocks Quote from the Illumos issue: ZFS should proactively evict freed blocks from the cache. Even though these freed blocks will never be used again, and thus will eventually be evicted, this causes us to use memory inefficiently for 2 reasons: 1. A block that is freed has no chance of being accessed again, but will be kept in memory preferentially to a block that was accessed before it (and is thus older) but has not been freed and thus has at least some chance of being accessed again. 2. We partition the ARC into several buckets: user data that has been accessed only once (MRU) metadata that has been accessed only once (MRU) user data that has been accessed more than once (MFU) metadata that has been accessed more than once (MFU) The user data vs metadata split is somewhat arbitrary, and the primary control on how much memory is used to cache data vs metadata is to simply try to keep the proportion the same as it has been in the past (each bucket "evicts against" itself). The secondary control is to evict data before evicting metadata. Because of this bucketing, we may end up with one bucket mostly containing freed blocks that are very old, while another bucket has more recently accessed, still-allocated blocks. Data in the useful bucket (with still-allocated blocks) may be evicted in preference to data in the useless bucket (with old, freed blocks). On dcenter, we saw that the MFU metadata bucket was 230MB, while the MFU data bucket was 27GB and the MRU metadata bucket was 256GB. However, the vast majority of data in the MRU metadata bucket (256GB) was freed blocks, and thus useless. Meanwhile, the MFU metadata bucket (230MB) was constantly evicting useful blocks that will be soon needed. The problem of cache segmentation is a larger problem that needs more investigation. However, if we stop caching freed blocks, it should reduce the impact of this more fundamental issue. MFC after: 2 weeks
* MFV r251474:delphij2013-06-067-79/+415
| | | | | | | | | | | | | * Illumos zfs issue #3137 L2ARC compression Whether or not to compress buffers entering the L2ARC is controlled by "compression" setting on the dataset, when compression is not "off", L2ARC compression is enabled. The compress method is always LZ4 for L2ARC when enabled because it works best for the scenario. MFC after: 2 weeks
* SDT probes can directly pass up to five arguments as arguments tomarkj2013-06-022-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dtrace_probe(). Arguments beyond these five must be obtained in an architecture-specific way; this can be done through the getargval provider method, and through dtrace_getarg() if getargval isn't overridden. This change fixes two off-by-one bugs in the way these arguments are fetched in FreeBSD's DTrace implementation. First, the SDT provider must set the aframes parameter to 1 when creating a probe. The aframes parameter controls the number of frames that dtrace_getarg() will step over in order to find the frame containing the extra arguments. On FreeBSD, dtrace_getarg() is called in SDT probe context via dtrace_probe()->dtrace_dif_emulate()->dtrace_dif_variable->dtrace_getarg() so aframes must be 3 since the arguments are in dtrace_probe()'s frame; it was previously being called with a value of 2 instead. illumos uses a different aframes value for SDT probes, but this is because illumos SDT probes fire by triggering the #UD fault handler rather than calling dtrace_probe() directly. The second bug has to do with the way arguments are grabbed out dtrace_probe()'s frame on amd64. The code currently jumps over the first stack argument and retrieves the rest of them using a pointer into the stack. This works on i386 because all of dtrace_probe()'s arguments will be on the stack and the first argument is the probe ID, which should be ignored. However, it is incorrect to ignore the first stack argument on amd64, so we correct the pointer used to access the arguments. MFC after: 2 weeks
* Port the SDT test now that it's possible to create SDT probes that takemarkj2013-06-021-0/+37
| | | | | | | | | | | | seven arguments. The original test uses Solaris' uadmin system call to trigger the test probe; this change adds a sysctl to the dtrace_test module and gets the test program to trigger the test probe via the sysctl handler. The test is currently failing on amd64 because of some bugs in the way that probe arguments beyond the first five are obtained - these bugs will be fixed in a separate change.
* The fasttrap provider cleans up probes asynchronously when a process withmarkj2013-05-241-64/+49
| | | | | | | | | | | | | | | USDT probes exits. This was previously done with a callout; however, it is possible to sleep while holding the DTrace mutexes, so a panic will occur on INVARIANTS kernels if the callout handler can't immediately acquire one of these mutexes. This panic will be frequently triggered on systems where a USDT-enabled program (perl, for instance) is often run. This revision changes the fasttrap cleanup mechanism so that a dedicated thread is used instead of a callout. The old behaviour is otherwise preserved. Reviewed by: rpaulo MFC after: 1 month
* Bring back part of r249367 by adding DTrace's temporal option, which allowsmarkj2013-05-124-131/+192
| | | | | | | | | | | | | | | | | | | | | | | | users to guarantee that the output of DTrace scripts will be time-ordered. This option is enabled by adding the line #pragma D option temporal to the beginning of a script, or by adding '-x temporal' to the arguments of dtrace(1). This change fixes a bug in the original port of the temporal option. This bug was causing some assertions to fail, so they had been disabled; in this revision the assertions are working properly and are enabled. The DTrace version number has been bumped from 1.9.0 to 1.9.1 to reflect the language change that's being introduced. This change corresponds to part of illumos-gate commit e5803b76927480: 3021 option for time-ordered output from dtrace(1M) Reviewed by: pfg Obtained from: illumos MFC after: 1 month
* In case ZFS doesn't use UMA for buffers there's no need to waste memorydavide2013-05-011-0/+3
| | | | | | creating zones that will remain empty. Reviewed by: pjd
* Changed ZFS TRIM sysctl from vfs.zfs.trim_disable -> vfs.zfs.trim.enabledsmh2013-04-264-15/+16
| | | | | | | | Enabled ZFS TRIM by default Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks
* MFV r249857:mm2013-04-245-14/+78
| | | | | | | | | | | | | Merge vendor bugfix for a possible deadlock related to async destroy and improve write performance by introducing a new lock protecting tx_open_txg. Illumos ZFS issues: 3642 dsl_scan_active() should not issue I/O to determine if async destroying is active 3643 txg_delay should not hold the tc_lock MFC after: 1 week
* The zfs synctask code restructuring introduced a new bug that makes itmm2013-04-232-14/+24
| | | | | | | | | | | | impossible to set quota and reservation on pools lower than version 22. Problem has been reported and a solution discussed with vendor. Illumos ZFS issues: 3739 cannot set zfs quota or reservation on pool version < 22 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reported by: Steve Wills <swills@FreeBSD.org> MFC after: 3 days
* DTrace: Revert r249367pfg2013-04-173-189/+131
| | | | | | | | | | | | | | | | | | | | | | The following change from illumos brought caused DTrace to pause in an interactive environment: 3026 libdtrace should set LD_NOLAZYLOAD=1 to help the pid provider This was not detected during testing because it doesn't affect scripts. We shouldn't be changing the environment, especially since the LD_NOLAZYLOAD option doesn't apply to our (GNU) ld. Unfortunately the change from upstream was made in such a way that it is very difficult to separate this change from the others so, at least for now, it's better to just revert everything. Reference: https://www.illumos.org/issues/3026 Reported by: Navdeep Parhar and Mark Johnston
* DTrace: option for time-ordered outputpfg2013-04-113-131/+189
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge changes from illumos: 3021 option for time-ordered output from dtrace(1M) 3022 DTrace: keys should not affect the sort order when sorting by value 3023 it should be possible to dereference dynamic variables 3024 D integer narrowing needs some work 3025 register leak in D code generation 3026 libdtrace should set LD_NOLAZYLOAD=1 to help the pid provider This brings yet another feature implemented in upstream DTrace. A complete description is available here: http://dtrace.org/blogs/ahl/2012/07/28/my-new-dtrace-favorite/ This change bumps the DT_VERS_* number to 1.9.1 in accordance to what is done in illumos. This change was somewhat complicated because upstream is mixed many changes in an individual commit and some of the tests don't really apply to us. There are also appear to be differences in timestamping with Solaris so we had to workaround some assertions making sure no regression happened. Special thanks to Fabian Keil for changes and testing. Illumos Revisions: 13758:23432da34147 Reference: https://www.illumos.org/issues/3021 https://www.illumos.org/issues/3022 https://www.illumos.org/issues/3023 https://www.illumos.org/issues/3024 https://www.illumos.org/issues/3025 https://www.illumos.org/issues/1694 Tested by: Fabian Keil Obtained from: Illumos MFC after: 1 months
* MFV r249354:mm2013-04-111-2/+2
| | | | | | | | | | | Merge bugfixes accepted and integrated by vendor. Underlying problems have been reported by us and fixed in r240942 and r249196. Illumos ZFS issues: 3645 dmu_send_impl: possibilty of pool hold leak 3692 Panic on zfs receive of a recursive deduplicated stream MFC after: 8 days
* Cast to (void *)(uintptr_t) on copyout and copyin of zfs_iocparm_t.zfs_cmdmm2013-04-101-2/+2
| | | | MFC after: 9 days
* ZFS expects a copyout of zfs_cmd_t on an ioctl error. Our sys_ioctl()mm2013-04-093-26/+85
| | | | | | | | | | | | | | | doesn't copyout in this case. To solve this issue a new struct zfs_iocparm_t is introduced consisting of: - zfs_ioctl_version (future backwards compatibility purposes) - user space pointer to zfs_cmd_t (copyin and copyout) - size of zfs_cmd_t (verification purposes) The copyin and copyout of zfs_cmd_t is now done the illumos (vendor) way what makes porting of new changes easier and ensures correct behavior if returning an error. MFC after: 10 days
* MFV r249186:mm2013-04-062-3/+35
| | | | | | | | | | | Do not list read-only pools in zpool.cache Reduce diff against vendor in unused vdev_disk.c Illumos ZFS issues: 3639 zpool.cache should skip over readonly pools 3640 want automatic devid updates MFC after: 1 week
* MFV r248660:mm2013-04-064-12/+13
| | | | | | | | | Merge vendor change - modify time processing in deadman thread. Illumos ZFS issues: 3618 ::zio dcmd does not show timestamp data MFC after: 3 weeks
* Provide a fix for kernel panic if receiving recursive deduplicated streams.mm2013-04-061-4/+5
| | | | | | | | | Problem reported to vendor. Illumos ZFS issues: 3692 Panic on zfs receive of a recursive deduplicated stream MFC after: 2 weeks
* MFV r248217:mm2013-04-0656-852/+995
| | | | | | | | | | Merge change from vendor to reduce diff only. ZFS dtrace probes are not supported on FreeBSD yet. Illumos ZFS issues: 3598 want to dtrace when errors are generated in zfs MFC after: 3 weeks
* MFV r242816:mm2013-04-061-1/+3
| | | | | | | Import vendor change to reduce diff, no effect on FreeBSD. Illumos ZFS issues: 3517 importing pool with autoreplace=on and "hole" vdevs crashes syseventd
* spa_open_common: fix argument to zvol_create_minorsavg2013-04-031-1/+1
| | | | | | | | | | | | | Prior to r248571 spa_open was always called with a bare pool name, but now it is called with a dataset name instead (spa_lookup handles that). So, when a ZFS root is mounted spa_open is called with a name of a root dataset, which can very well be different from the pool name. But zvol_create_minors should be called with the pool name, because it performs a recursive traversal of all datasets under the name to find all those that are volumes. MFC after: 7 days
* Fix possible pool hold leak in dmu_send_impl()mm2013-04-031-3/+3
| | | | | | | | Problem reported to vendor: https://www.illumos.org/issues/3645 Reported by: Andriy Gapon <avg@FreeBSD.org> MFC after: 15 days
* Do not check against uninitialized rc and comment out vendor codemm2013-04-021-2/+5
| | | | MFC after: 16 days
* Dtrace: enablings on defunct providers prevent providers from unregisteringpfg2013-04-013-15/+171
| | | | | | | | | | | | | | | | | | | | | | Merge change from illumos: 1368 enablings on defunct providers prevent providers from unregistering We try to address some underlying differences between the Solaris and FreeBSD implementations: dtrace_attach() / dtrace_detach() are currently unimplemented in FreeBSD but the new code from illumos makes use of taskq so some adaptations were made to dtrace_open() and dtrace_close() to handle them appropriately. Illumos Revision: r13430:8e6add739e38 Reference: https://www.illumos.org/issues/1368 Reviewed by: gnn Tested by: Fabian Keil Obtained from: Illumos MFC after: 3 weeks
* Call dmu_snapshot_list_next() in zvol.c with dsl_pool_config lock heldmm2013-04-011-0/+2
| | | | | Submitted by: Andriy Gapon <avg@FreeBSD.org> MFC after: 17 days
* Dtrace: dtrace.c erroneously checks for memory alignment on amd64.pfg2013-03-261-1/+1
| | | | | | | | | | | | | | Merge change from illumos: 3511 dtrace.c erroneously checks for memory alignment on amd64 Illumos Revision: c93cc65 Reference: https://www.illumos.org/issues/3511 Obtained from: Illumos MFC after: 3 weeks
* Dtrace: Add SUN MDB-like type-aware print() action.pfg2013-03-251-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | Merge change from illumos: 1694 Add type-aware print() action This is a very nice feature implemented in upstream Dtrace. A complete description is available here: http://dtrace.org/blogs/eschrock/2011/10/26/your-mdb-fell-into-my-dtrace/ This change bumps the DT_VERS_* number to 1.9.0 in accordance to what is done in illumos. While here also include some minor cleanups to ease further merging and appease clang with a fix by Fabian Keil. Illumos Revisions: 13501:c3a7090dbc16 13483:f413e6c5d297 Reference: https://www.illumos.org/issues/1560 https://www.illumos.org/issues/1694 Tested by: Fabian Keil Obtained from: Illumos MFC after: 1 month
* Dtrace: add toupper()/tolower() and enhancements to lltostr().pfg2013-03-252-13/+84
| | | | | | | | | | | | | | | | | | | | | | | | | Merge changes from illumos: 1451 DTrace needs toupper()/tolower() subroutines 1457 lltostr() D subroutine should take an optional base This change bumps the DT_VERS_* number to 1.8.1 in accordance to what is done in illumos. The test suite we currently include is outdated and doesnt support some updates in tst.subr.d which had to be left out for now. Illumos Revisions: r13458 5e394d8db762 r13459 c3454574dd1a Reference: https://www.illumos.org/issues/1451 https://www.illumos.org/issues/1457 Tested by: Fabian Keil Obtained from: Illumos MFC after: 1 month
* Dtrace: add optional size argument to tracemem().pfg2013-03-242-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | Merge change from illumos: 1455 DTrace tracemem() should take an optional size argument Our local enhancements to dt_print_bytes were equivalent to those in illumos but we made it match the illumos version to ease further code merges. For now leave out tst.smallsize.d and tst.smallsize.d.out since those don't seem to work cleanly on FreeBSD. This change bumps the DT_VERS_* number to 1.7.1 in accordance to what is done in illumos. Illumos Revision: 13457:571b0355c2e3 Reference: https://www.illumos.org/issues/1455 Tested by: Fabian Keil Obtained from: Illumos MFC after: 1 month
* ZFS: Fix a panic while unmounting a busy filesystem.will2013-03-231-0/+2
| | | | | | | | | | | | | | | | This particular scenario was easily reproduced using a NFS export. When the first 'zfs unmount' occurred, it returned EBUSY via this path, while vflush() had flushed references on the filesystem's root vnode, which in turn caused its v_interlock to be destroyed. The next time 'zfs unmount' was called, vflush() tried to obtain this lock, which caused this panic. Since vflush() on FreeBSD is a definitive call, there is no need to check vfsp->vfs_count after it completes. Simply #ifdef sun this check. Submitted by: avg Reviewed by: avg Approved by: ken (mentor) MFC after: 1 month
* fbt_getargdesc: correctly handle types for return probesavg2013-03-231-5/+17
| | | | MFC after: 6 days
* fbt_typoff_init: fix an off by one in determining required memory sizeavg2013-03-231-0/+2
| | | | | | | | | | | This issue would be silent most of the time, but if the requested memory is a multiple of a page size, then accessing one element beyond the end would lead to a kernel page fault. Otherwise, the unlucky last type would just be inaccessible. Reported by: glebius Tested by: glebius MFC after: 6 days
* Fix for building libzpool under i386.smh2013-03-211-1/+1
| | | | | | Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks
* Add missing descriptions for ZFS sysctlssmh2013-03-211-2/+3
| | | | | | Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks
* Optimisation of TRIM processing.smh2013-03-211-34/+97
| | | | | | | | | | | | | | | | | | | | | | | | | Previously TRIM processing was very bursty. This was made worse by the fact that TRIM requests on SSD's are typically much slower than reads or writes. This often resulted in stalls while large numbers of TRIM's where processed. In addition due to the way the TRIM thread was only woken by writes, deletes could stall in the queue for extensive periods of time. This patch adds a number of controls to how often the TRIM thread for each SPA processes its outstanding delete requests. vfs.zfs.trim.timeout: Delay TRIMs by up to this many seconds vfs.zfs.trim.txg_delay: Delay TRIMs by up to this many TXGs (reduced to 32) vfs.zfs.vdev.trim_max_bytes: Maximum pending TRIM bytes for a vdev vfs.zfs.vdev.trim_max_pending: Maximum pending TRIM segments for a vdev vfs.zfs.trim.max_interval: Maximum interval between TRIM queue processing (seconds) Given the most common TRIM implementation is ATA TRIM the current defaults are targeted at that. Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks
* Names the ZFS TRIM threadsmh2013-03-211-0/+5
| | | | | | Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks
* TRIM cache devices based on time instead of TXGs.smh2013-03-212-7/+28
| | | | | | | | | | | | | | | | | | | | | | Currently, the trim module uses the same algorithm for data and cache devices when deciding to issue TRIM requests, based on how far in the past the TXG is. Unfortunately, this is not ideal for cache devices, because the L2ARC doesn't use the concept of TXGs at all. In fact, when using a pool for reading only, the L2ARC is written but the TXG counter doesn't increase, and so no new TRIM requests are issued to the cache device. This patch fixes the issue by using time instead of the TXG number as the criteria for trimming on cache devices. The basic delay principle stays the same, but parameters are expressed in seconds instead of TXGs. The new parameters are named trim_l2arc_limit and trim_l2arc_batch, and both default to 30 second. Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: https://github.com/dechamps/zfs/commit/17122c31ac7f82875e837019205c21651c05f8cd MFC after: 2 weeks
* Improve TXG handling in the TRIM module.smh2013-03-214-10/+9
| | | | | | | | | | | | | | | | | | | | This patch adds some improvements to the way the trim module considers TXGs: - Free ZIOs are registered with the TXG from the ZIO itself, not the current SPA syncing TXG (which may be out of date); - L2ARC are registered with a zero TXG number, as L2ARC has no concept of TXGs; - The TXG limit for issuing TRIMs is now computed from the last synced TXG, not the currently syncing TXG. Indeed, under extremely unlikely race conditions, there is a risk we could trim blocks which have been freed in a TXG that has not finished syncing, resulting in potential data corruption in case of a crash. Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: https://github.com/dechamps/zfs/commit/5b46ad40d9081d75505d6f3bf04ac652445df366 MFC after: 2 weeks
* Don't register repair writes in the trim map.smh2013-03-211-6/+11
| | | | | | | | | | | | | | | | | The trim map inflight writes tree assumes non-conflicting writes, i.e. that there will never be two simultaneous write I/Os to the same range on the same vdev. This seemed like a sane assumption; however, in actual testing, it appears that repair I/Os can very well conflict with "normal" writes. I'm not quite sure if these conflicting writes are supposed to happen or not, but in the mean time, let's ignore repair writes for now. This should be safe considering that, by definition, we never repair blocks that are freed. Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: Source: https://github.com/dechamps/zfs/commit/6a3cebaf7c5fcc92007280b5d403c15d0e61dfe3
* Add TRIM support for L2ARC.smh2013-03-214-8/+14
| | | | | | | | | | | | | This adds TRIM support to cache vdevs. When ARC buffers are removed from the L2ARC in arc_hdr_destroy(), arc_release() or l2arc_evict(), the size previously occupied by the buffer gets scheduled for TRIMming. As always, actual TRIMs are only issued to the L2ARC after txg_trim_limit. Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: https://github.com/dechamps/zfs/commit/31aae373994fd112256607edba7de2359da3e9dc MFC after: 2 weeks
* Release hold on pool before calling zvol_create_minor()mm2013-03-201-1/+4
|
* Run zvol_create_minors() only if in non-error casemm2013-03-191-4/+6
|
* Run zvol_create_minors() on snapshot creationmm2013-03-191-0/+9
|
* MFV r247580:mm2013-03-1964-5523/+5846
| | | | | | | | | | | Merge synctask code restructuring from vendor. Modify forward and backward compatibility to support new change. Illumos ZFS issues: 3464 zfs synctask code needs restructuring Sponsored by: Hybrid Logic Ltd.
* MFC @248493mm2013-03-192-2/+3
|\
| * Plug memory leak in dsl_check_snap_cb()mm2013-03-191-1/+2
| | | | | | | | | | | | This was unnoticed because the function is very rarely used. MFC after: 3 days
| * Partially revert r195702. Deferring stops is now implemented via a set ofjhb2013-03-181-1/+1
| | | | | | | | | | | | | | | | calls to toggle TDF_SBDRY rather than passing PBDRY to individual sleep calls. - Remove the stop_allowed parameters from cursig() and issignal(). issignal() checks TDF_SBDRY directly. - Remove the PBDRY and SLEEPQ_STOP_ON_BDRY flags.
* | Add missing zvol_create_mirrors() on zfs_ioc_create()mm2013-03-181-0/+4
| |
* | MFC @248461mm2013-03-186-46/+1434
|\ \ | |/
OpenPOWER on IntegriCloud