| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
Add `gmirror create` subcommand, alike to gstripe, gconcat, etc.
It is quite specific mode of operation without storing on-disk metadata.
It can be useful in some cases in combination with some external control
tools handling mirror creation and disks hot-plug.
Sponsored by: iXsystems, Inc.
|
|
|
|
| |
Refine r301173 a bit.
|
|
|
|
| |
Avoid sleeping when the mirror I/O queue is non-empty.
|
|
|
|
| |
Use providergone method to cover race between destroy and g_access().
|
|
|
|
|
| |
Introduce internal counter to track opens. Using provider's counters is
not very successfull after calling g_wither_provider().
|
|
|
|
| |
gmirror: Use bool instead of boolean_t.
|
|
|
|
|
| |
It is just a helper function combining G_PF_WITHER setting with
g_orphan_provider().
|
|
|
|
| |
Don't treat an error from g_mirror_clear_metadata() as fatal.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will result in lock recursion and is more generally incorrect since
the completion handlers will just reinsert the BIOs into the queue we're
trying to drain.
Reviewed by: imp, ngie
Approved by: re (gjb)
MFC after: 3 weeks
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D6908
|
|
|
|
|
|
|
|
|
|
| |
otherwise the system will hang.
This is a temporarily least intrusive crutch to get certain panicing systems
dumping. The proper fix should question is g_mirror_destroy() should be called
on a panicing system at all.
Discussed with: mav
|
|
|
|
| |
Sponsored by: The FreeBSD Foundation
|
|
|
|
| |
No functional change.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In struct:gctl_req, nargs is unsigned.
In mirror:
g_mirror_syncreqs is unsigned.
In raid:
in struct:g_raid_volume, v_disks_count is unsigned.
In virstor:
in struct:g_virstor_softc, n_components is unsigned.
MFC after: 2 weeks
|
|
|
|
| |
Differential Revision: https://reviews.freebsd.org/D5784
|
|
|
|
|
|
|
|
|
| |
for all struct bio you get back from g_{new,alloc}_bio. Temporary
bios that you create on the stack or elsewhere should use this before
first use of the bio, and between uses of the bio. At the moment, it
is nothing more than a wrapper around bzero, but that may change in
the future. The wrapper also removes one place where we encode the
size of struct bio in the KBI.
|
|
|
|
|
|
|
|
|
|
| |
years for head. However, it is continuously misused as the mpsafe argument
for callout_init(9). Deprecate the flag and clean up callout_init() calls
to make them more consistent.
Differential Revision: https://reviews.freebsd.org/D2613
Reviewed by: jhb
MFC after: 2 weeks
|
|
|
|
| |
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
|
|
| |
When CPU is not busy, those queues are typically empty. When CPU is busy,
then one more extra sorting is the last thing it needs. If specific device
(HDD) really needs sorting, then it will be done later by CAM.
This supposed to fix livelock reported for mirror of two SSDs, when UFS
fires zillion of BIO_DELETE requests, that totally blocks I/O subsystem by
pointless sorting of requests and responses under single mutex lock.
MFC after: 2 weeks
|
|
|
|
|
|
| |
While there, use bioq_takefirst() in place where it is convenient.
MFC after: 1 week
|
|
|
|
|
|
|
| |
Do not report GEOM::candelete if none of providers support BIO_DELETE.
If consumer still requests BIO_DELETE, report error instead of hanging.
MFC after: 2 weeks
|
| |
|
|
|
|
|
|
|
|
|
|
| |
These changes prevent sysctl(8) from returning proper output,
such as:
1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.
Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
|
|
|
|
|
| |
Sponsored by: EMC / Isilon Storage Division
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes the problem, when gmirror starts again just after stop.
The problem occurs when gmirror's component has geom label with equal size.
E.g. gpt and gptid have the same size as partition, diskid has the same
size as entire disk. When gmirror's geom has been destroyed, glabel
creates its providers and this initiate retaste.
Now "gmirror destroy" command is available. It destroys geom and also
erases gmirror's metadata.
MFC after: 2 weeks
|
|
|
|
|
| |
PR: 184985
MFC after: 1 week
|
|
|
|
|
|
|
|
|
| |
Now it is easy to expand the size of the mirror when all its components
are replaced. Also add g_resize method to geom_mirror class. It will write
updated metadata to new last sector, when parent provider is resized.
Silence from: geom@
MFC after: 1 month
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When safety requirements are met, it allows to avoid passing I/O requests
to GEOM g_up/g_down thread, executing them directly in the caller context.
That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid
several context switches per I/O.
The defined now safety requirements are:
- caller should not hold any locks and should be reenterable;
- callee should not depend on GEOM dual-threaded concurency semantics;
- on the way down, if request is unmapped while callee doesn't support it,
the context should be sleepable;
- kernel thread stack usage should be below 50%.
To keep compatibility with GEOM classes not meeting above requirements
new provider and consumer flags added:
- G_CF_DIRECT_SEND -- consumer code meets caller requirements (request);
- G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done);
- G_PF_DIRECT_SEND -- provider code meets caller requirements (done);
- G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request).
Capable GEOM class can set them, allowing direct dispatch in cases where
it is safe. If any of requirements are not met, request is queued to
g_up or g_down thread same as before.
Such GEOM classes were reviewed and updated to support direct dispatch:
CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE,
VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL,
MAP, FLASHMAP, etc).
To declare direct completion capability disk(9) KPI got new flag equivalent
to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. da(4) and ada(4) disk
drivers got it set now thanks to earlier CAM locking work.
This change more then twice increases peak block storage performance on
systems with manu CPUs, together with earlier CAM locking changes reaching
more then 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to
256 user-level threads).
Sponsored by: iXsystems, Inc.
MFC after: 2 months
|
|
|
|
|
| |
The G_MIRROR_DEBUG() macro already appends a newline. Also, most of the
log messages emitted by gmirror start with an uppercase letter.
|
|
|
|
| |
Submitted by: Brenden Fabeny
|
|
|
|
|
| |
Obtained from: Netflix
MFC after: 3 days
|
|
|
|
|
|
|
|
|
|
|
| |
This allows to use gmirror e.g. on top of ZVOLs.
PR: kern/175323
Submitted by: Alexei.Volkov@softlynx.ru, mav
Reported by: Alexei.Volkov@softlynx.ru
Tested by: Alexei.Volkov@softlynx.ru
Reviewed by: ae, mav, pjd
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
|
|
| |
as clean on shutdown and move that action from shutdown_pre_sync stage to
shutdown_post_sync to avoid extra flapping.
ZFS tends to not close devices on shutdown, that doesn't allow GEOM RAID
to shutdown gracefully. To handle that, mark volume as clean just when
shutdown time comes and there are no active writes.
PR: kern/113957
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
bytes syncronized.
The rationale behind this is the following: for large disks the
percent synchronisation counter ticks too seldom, and monitoring
software (as well as human operator) can't tell whether
synchronisation goes on or one of disks got stuck. On an idle
server one can look into gstat and see whether synchronisation goes
on or not, but on a busy server that won't work. Also, new value
monitored can be differentiated obtaining the synchronisation speed
quite precisely.
Submitted by: Konstantin Kukushkin <dark ramtel.ru>
Reviewed by: pjd
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
we need to pass BIO_DELETE requests down to providers that support
it. Also, we need to announce our support for BIO_DELETE to upper
consumer. This requires:
- In g_mirror_start() return true for "GEOM::candelete" request.
- In g_mirror_init_disk() probe below provider for "GEOM::candelete"
attribute, and mark disk with a flag if it does support BIO_DELETE.
- In g_mirror_register_request() distribute BIO_DELETE requests only
to those disks, that do support it.
Note that we announce "GEOM::candelete" as true unconditionally of
whether we have TRIM-capable media down below or not. This is made
intentionally, because upper consumer (usually UFS) requests the
attribite only once at mount time. And if user ever migrates his
mirror from HDDs to SSDs, then he/she would get TRIM working without
remounting filesystem.
Reviewed by: pjd
|
|
|
|
|
|
|
| |
BIO_DELETE requests same way as BIO_WRITE removing them from
queue. This fixes panic with BIO_DELETE operations on geom_mirror.
Reviewed by: pjd
|
|
|
|
|
|
| |
PR: kern/154860
Reviewed by: pjd
MFC after: 1 week
|
|
|
|
|
|
|
|
| |
protect geom from destroying while it is tasting.
PR: kern/154860
Reviewed by: pjd
MFC after: 1 week
|
|
|
|
|
|
| |
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
|
|
|
|
| |
Reviewed by: pjd
|
|
|
|
|
| |
ignore adX/adaY difference in both directions to simplify migration to
the CAM-based ATA or back.
|
|
|
|
|
|
|
|
|
|
|
| |
No FreeBSD version bump, the userland application to query the features will
be committed last and can serve as an indication of the availablility if
needed.
Sponsored by: Google Summer of Code 2010
Submitted by: kibab
Reviewed by: silence on geom@ during 2 weeks
X-MFC after: to be determined in last commit with code from this project
|
|
|
|
| |
- Make optional string values always an empty string.
|
|
|
|
| |
MFC after: 1 week
|
| |
|
|
|
|
| |
components, hoping others fit, if they are not equal.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Instead of measuring last request execution time for each drive and
choosing one with smallest time, use averaged number of requests, running
on each drive. This information is more accurate and timely. It allows to
distribute load between drives in more even and predictable way.
- For each drive track offset of the last submitted request. If new request
offset matches previous one or close for some drive, prefer that drive.
It allows to significantly speedup simultaneous sequential reads.
PR: kern/113885
Reviewed by: sobomax
|
|
|
|
| |
Submitted by: Mel Flynn
|
|
|
|
|
|
|
| |
The geom and CAM changes for root_hold are the wrong solution for USB design
quirks.
Requested by: scottl
|
|
|
|
| |
in situations where sleeping isnt allowed.
|