| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Thanks to dim and pjd for the pointer to zfs_context.h for building
userland.
|
|
|
|
| |
building on universe.
|
|
|
|
| |
Commit the zfs piece.
|
|
|
|
|
|
|
|
|
| |
Fix a race by defining two tasks in the zio structure
as we can still be returning from issue task when interrupt task is used.
Tested by: pjd
Approved by: pjd, delphij (mentor)
MFC after: 3 days
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In this case we call target function only on a single CPU and do not
need any synchronization at the setup stage.
It's a bit non-obvious but setup function of NULL means that
smp_rendezvous_cpus waits for all CPUs to arrive at the rendezvous
point, but without doing any actual setup. While using
smp_no_rendevous_barrier means that each CPU proceeds on its own
schedule without any synchronization whatsoever.
MFC after: 3 weeks
|
| |
|
|
|
|
|
|
|
|
| |
detected ashift does not support this. With this change, pools
created while stripesize=512 could not be imported when stripesize
becomes larger (on the same drive).
Noticed by: pjd
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The dealock was caused in the following way:
- thread T1 on CPU C1 holds a spin mutex, IPIs CPU C2 and waits for the
IPI to be handled
- C2 executes timer interrupt filter, thus has interrupts disabled, and
gets blocked on the spin mutex held by T1
The problem seems to have been introduced by simplifications made to
OpenSolaris code during porting.
The problem is fixed by reorganizing the code to more closely resemble
the upstream version. Interrupt filter (cyclic_fire) now doesn't
acquire any locks, all per-CPU data accesses are performed on a
target CPU with preemption and interrupts disabled thus precluding
concurrent access to the data.
cyp_mtx spin mutex is used to disable preemtion and interrupts; it's not
used for classical mutual exclusion, because xcall already serializes
calls to a CPU. It's an emulation of OpenSolaris
cyb_set_level(CY_HIGH_LEVEL) call, the spin mutexes could probably be
reduced to just a spinlock_enter()/_exit() pair.
Diff with upstream version is now reduced by ~500 lines, however it still
remains quite large - many things that are not needed (at the moment) or
are irrelevant on FreeBSD were simply ripped out during porting.
Examples of such things:
- support for CPU onlining/offlining
- support for suspend/resume
- support for running callouts at soft interrupt levels
- support for callout rebinding from CPU to CPU
- support for CPU partitions
Tested by: Artem Belevich <fbsdlist@src.cx>
MFC after: 3 weeks
X-MFC with: r216252
|
|
|
|
|
|
|
| |
smp_rendezvous_cpus already properly handles current CPU case
and non-SMP case.
MFC after: 3 weeks
|
|
|
|
|
|
|
| |
smp_rendezvous_cpus alreadt does the right thing in a very similar
fashion, so the code was kind of duplicating that.
MFC after: 3 weeks
|
|
|
|
|
|
|
| |
Also use pc_cpumask to be future-friendly.
Reviewed by: jhb
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
| |
alignment on drives with large sector sizes (e.g. 4 KiB) but the
implementation might need to be revisited if devices with large stripesizes
appear (e.g. if RAID controllers or flash drives start using the field),
probably by introducing a physsectorsize field in GEOM providers.
Discussed with: mav, mostly silence on freebsd-geom@ and freebsd-fs@
|
|
|
|
|
|
|
|
|
| |
with filesystems created under MacOS X ZFS port. This is kind of filesystem
corruption (we don't allow for setting empty ACLs), so make acl_get_file(3)
and related syscalls fail with EINVAL in that case. In theory, we could
return empty ACL to userland, but I'm afraid this would break some code.
MFC after: 3 days
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
kern_sendfile() uses vm_rdwr() to read-ahead blocks of data to populate
page cache. When sendfile stumbles upon a page that is not populated
yet, it sends out all the mbufs that it collected so far. This
resulted in very poor performance with ZFS when file data is not in the
page cache, because ZFS vop_read for UIO_NOCOPY case populated only
those pages that are already in cache, but not valid. Which means that
most of the time it populated only the first requested page in the
described above scenario.
Reported by: Alexander Zagrebin <alexz@visp.ru>
Tested by: Alexander Zagrebin <alexz@visp.ru>,
Artemiev Igor <ai@kliksys.ru>
MFC after: 12 days
|
|
|
|
|
| |
Reported by: Daniel Braniss <danny@cs.huji.ac.il>
MFC after: 3 days
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
and VFS_RELE on a non-existing hold on snapshot parent's z_vfs.
This disables the changes from OpenSolaris onnv-revision 9234:bffdc4fc05c4
(bug IDs: 6792139, 6794830) - not applicable to FreeBSD.
This fixes the process hang if umounting a manually mounted snapshot.
Reported by: Alexander Zagrebin <alexz@visp.ru>
Approved by: delphij (mentor)
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
|
| |
what we have. Without the check the kernel could accessing memory that
does not belong to the request struct.
Note that we do not test if the struct equals in size at this time, which
may faciliate forward compatibility with newer binaries.
Reviewed by: pjd at MeetBSD CA '2010
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
|
| |
OpenSolaris onnv-revision: 10209:91f47f0e7728
6830541 zfs_get_data_trips on a verify
6696242 multiple zfs_fillpage() zfs: accessing past end of object panics
6785914 zfs fails to drop dn_struct_rwlock in recovery code path
Approved by: delphij (mentor)
Obtained from: OpenSolaris (Bug ID 6830541, 6696242, 6785914)
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should make vnode_pager_getpages path a bit shorter and clearer.
Also this should eliminate problems with partially valid pages.
Having this method opens room for future optimizations.
To do: try to satisfy other pages besides the required one taking into
account tradeofs between number of page faults, read throughput and read
latency. Also, eventually vop_putpages should be added too.
Reviewed by: kib, mm, pjd
MFC after: 3 weeks
|
|
|
|
| |
Found with: clang
|
|
|
|
| |
Found with: clang
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since r212650 and before this change sendfile(2) could produce
a partially valid page for a trailing portion of a ZFS vnode.
vm_fault() always wants to see a fully valid page even if it's the last
page that partially extends beyond vnode's end. Otherwise it calls
vop_getpages() to bring in the page. In the case of ZFS this means
that the data is read from the page into the same page and this breaks
checks in ZFS mappedread() - a thread that set VPO_BUSY on the page in
vm_fault() will get blocked forever waiting for it to be cleared.
Many thanks to Kai and Jeremy for reproducing the issue and providing
important debugging information and help.
Reported by: Kai Gallasch <gallasch@free.de>,
Jeremy Chadwick <freebsd@jdc.parodius.com>
Tested by: Kai Gallasch <gallasch@free.de>,
Jeremy Chadwick <freebsd@jdc.parodius.com>
Reviewed by: kib
MFC after: 3 days
To-Do: apply the same treatment to tmpfs + sendfile
|
|
|
|
|
|
|
|
| |
VFS to OpenSolaris-specific ioflag expected by ZFS. Use it for read and write
operations.
Reviewed by: mm
MFC after: 1 week
|
|
|
|
|
|
|
|
|
| |
This corrects writing to append-only files on ZFS.
PR: kern/149495 [1], kern/151082 [2]
Submitted by: Daniel Zhelev <daniel@zhelev.biz> [1], Michael Naef <cal@linu.gs> [2]
Approved by: delphij (mentor)
MFC after: 1 week
|
|
|
|
|
|
|
|
|
| |
physical memory
This is needed to correctly autotune ZFS ARC size when vm_kmem_size is
set to value larger than available physical memory.
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Retry IO once with ZIO_FLAG_TRYHARD before declaring a pool faulted
OpenSolaris revision and Bug IDs:
9725:0bf7402e8022
6843014 ZFS B_FAILFAST handling is broken
Approved by: delphij (mentor)
Obtained from: OpenSolaris (Bug ID 6843014)
MFC after: 3 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OpenSolaris revision and Bug IDs:
9701:cc5b64682e64
6803605 should be able to offline log devices
6726045 vdev_deflate_ratio is not set when offlining a log device
6599442 zpool import has faults in the display
Approved by: delphij (mentor)
Obtained from: OpenSolaris (Bug ID 6803605, 6726045, 6599442)
MFC after: 3 weeks
|
|
|
|
|
|
|
|
| |
zfs_map_page/zfs_unmap_page are mostly called around potential I/O paths
and it seems to be a not very good idea to do cpu pinning there.
Suggested by: kib
MFC after: 2 weeks
|
|
|
|
| |
MFC after: 2 weeks
|
|
|
|
|
|
| |
PR: kern/146410, kern/138790
MFC after: 3 weeks
X-MFC with: r212780
|
|
|
|
|
|
| |
Picked from analogous code in tmpfs.
MFC after: 1 week
|
|
|
|
|
|
|
|
| |
Those checks are not present in upstream code and they are enforced in
actual calculations of delta by which ARC size can be grown or should be
reduced.
MFC after: 3 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vm_paging_target() is not a trigger of any kind for pageademon, but
rather a "soft" target for it when it's already triggered.
Thus, trying to keep 2048 pages above that level at the expense of ARC
was simply driving ARC size into the ground even with normal memory
loads.
Instead, use a threshold at which a pagedaemon scan is triggered, so
that ARC reclaiming helps with pagedaemon's task, but the latter still
recycles active and inactive pages.
PR: kern/146410, kern/138790
MFC after: 3 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix possible loss of correct error return code in ZFS mount
OpenSolaris revisions and Bug IDs:
11824:53128e5db7cf
6863610 ZFS mount can lose correct error return
12079:13822b941977
6939941 problem with moving files in zfs (142901-12)
Approved by: delphij (mentor)
Obtained from: OpenSolaris (Bug ID 6863610, 6939941)
MFC after: 3 days
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This mirrors code in tmpfs.
This changge shouldn't affect much read path, it may cause unnecessary
vm_page_lookup calls in the case where v_object has no active or inactive
pages but has some cache pages. I believe this situation to be non-essential.
In write path this change should allow us to properly detect the above
case and free a cache page when we write to a range that corresponds to it.
If this situation is undetected then we could have a discrepancy between
data in page cache and in ARC or on disk.
This change allows us to re-enable vn_has_cached_data() check in zfs_write.
NOTE: strictly speaking resident_page_count and cache fields of v_object
should be exmined under VM_OBJECT_LOCK, but for this particular usage
we may get away with it.
Discussed with: alc, kib
Approved by: pjd
Tested with: tools/regression/fsx
MFC after: 3 weeks
|
|
|
|
|
|
|
|
| |
uint64_t, int64_t were redundant there
Approved by: pjd
Tested by: tools/regression/fsx
MFC after: 2 weeks
|
|
|
|
|
|
|
| |
Reviewed by: alc
Approved by: pjd
Tested by: tools/regression/fsx
MFC after: 2 weeks
|
|
|
|
|
|
|
| |
Reviewed by: alc
Approved by: pjd
Tested by: tools/regression/fsx
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise, adding insult to injury, in addition to double-caching of data
we would always copy the data into a vnode's vm object page from backend.
This is specific to sendfile case only (VOP_READ with UIO_NOCOPY).
PR: kern/141305
Reported by: Wiktor Niesiobedzki <bsd@vink.pl>
Reviewed by: alc
Tested by: tools/regression/sockets/sendfile
MFC after: 2 weeks
|
|
|
|
|
|
| |
PR: kern/150544
Approved by: delphij (mentor)
MFC after: 1 day
|
|
|
|
|
|
|
|
|
| |
Add missing locks around VOP_READDIR and VOP_GETATTR with z_shares_dir
PR: kern/150544
Approved by: delphij (mentor)
Obtained from: perforce (pjd)
MFC after: 1 day
|
|
|
|
| |
Reviewed by: alc
|
|
|
|
|
|
|
|
|
| |
* processes now can't go away while we are inserting probes (fixes a panic)
* if a trap happens, we won't be holding the process lock (fixes a hang)
* fix a LOR between the process lock and the fasttrap bucket list lock
Thanks to kib for pointing some problems.
Sponsored by: The FreeBSD Foundation
|
|
|
|
|
|
| |
fasttrap_tracepoint_enable().
Sponsored by: The FreeBSD Foundation
|
|
|
|
|
|
| |
code associated with overflow or with the drain function. While this
function is not expected to be used often, it produces more information
in the form of an errno that sbuf_overflowed() did.
|
|
|
|
|
| |
Reported by: keramida
MFC after: 2 weeks
|
|
|
|
|
|
| |
vdevs, so don't deny adding new vdevs if bootfs property is set.
MFC after: 2 weeks
|
|
|
|
|
|
|
| |
* when the process exits, remove the associated USDT probes
* when the process forks, duplicate the USDT probes.
Sponsored by: The FreeBSD Foundation
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the BIO_ORDERED flag for struct bio and update bio clients to use it.
The barrier semantics of bioq_insert_tail() were broken in two ways:
o In bioq_disksort(), an added bio could be inserted at the head of
the queue, even when a barrier was present, if the sort key for
the new entry was less than that of the last queued barrier bio.
o The last_offset used to generate the sort key for newly queued bios
did not stay at the position of the barrier until either the
barrier was de-queued, or a new barrier (which updates last_offset)
was queued. When a barrier is in effect, we know that the disk
will pass through the barrier position just before the
"blocked bios" are released, so using the barrier's offset for
last_offset is the optimal choice.
sys/geom/sched/subr_disk.c:
sys/kern/subr_disk.c:
o Update last_offset in bioq_insert_tail().
o Only update last_offset in bioq_remove() if the removed bio is
at the head of the queue (typically due to a call via
bioq_takefirst()) and no barrier is active.
o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
set prev to the barrier and cur to it's next element. Now that
last_offset is kept at the barrier position, this change isn't
strictly necessary, but since we have to take a decision branch
anyway, it does avoid one, no-op, loop iteration in the while
loop that immediately follows.
o In bioq_disksort(), bypass the normal sort for bios with the
BIO_ORDERED attribute and instead insert them into the queue
with bioq_insert_tail(). bioq_insert_tail() not only gives
the desired command order during insertion, but also provides
barrier semantics so that commands disksorted in the future
cannot pass the just enqueued transaction.
sys/sys/bio.h:
Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.
sys/cam/ata/ata_da.c:
sys/cam/scsi/scsi_da.c
Use an ordered command for SCSI/ATA-NCQ commands issued in
response to bios with the BIO_ORDERED flag set.
sys/cam/scsi/scsi_da.c
Use an ordered tag when issuing a synchronize cache command.
Wrap some lines to 80 columns.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
sys/geom/geom_io.c
Mark bios with the BIO_FLUSH command as BIO_ORDERED.
Sponsored by: Spectra Logic Corporation
MFC after: 1 month
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
programs that refuse to run as root (pgsql) to install probes when their
user is part of the wheel group.
Sponsored by: The FreeBSD Foundation
> Description of fields to fill in above: 76 columns --|
> PR: If a GNATS PR is affected by the change.
> Submitted by: If someone else sent in the change.
> Reviewed by: If someone else reviewed your modification.
> Approved by: If you needed approval for this commit.
> Obtained from: If the change is from a third party.
> MFC after: N [day[s]|week[s]|month[s]]. Request a reminder email.
> Security: Vulnerability reference (one per line) or description.
> Empty fields above will be automatically removed.
M dev/dtrace/dtrace_load.c
|