| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Change knlist_destroy() to assertion.
|
|
|
|
| |
Style.
|
|
|
|
| |
Some optimizations for kqueue timers.
|
|
|
|
| |
Some style.
|
|
|
|
|
|
| |
Do not clear KN_INFLUX when not owning influx state.
PR: 214923
|
|
|
|
| |
Switch from stdatomic.h to atomic.h for kernel.
|
|
|
|
|
|
| |
Provide high precision conversion from ns,us,ms -> sbintime in kevent.
Tested by: ian
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Define SBT_MAX.
MFC r267896 (by davide):
Improve r264388.
MFC note. The SBT_MAX definition already existed on stable/10, but without
the refinement from r267896. Also, consumers of SBT_MAX were not converted,
since r264388 was not merged properly.
Reviewed by: mav
|
|
|
|
| |
Explicitely check for the valid range of file descriptor values.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
kqueue EVFILT_PROC: avoid collision between NOTE_CHILD and NOTE_EXIT
NOTE_CHILD and NOTE_EXIT return something in kevent.data: the parent
pid (ppid) for NOTE_CHILD and the exit status for NOTE_EXIT.
Do not let the two events be combined, since one would overwrite
the other's data.
PR: 180385
Submitted by: David A. Bright <david_a_bright@dell.com>
Sponsored by: Dell Inc.
|
|
|
|
|
|
|
|
|
|
|
|
| |
For future use in the Linuxulator:
1. Add a kern_kqueue() counterpart for kqueue() with flags parameter.
2. Be a bit secure. To avoid a double fp lookup add a kern_kevent_fp()
counterpart for kern_kevent() with file pointer parameter instead
of file descriptor an pass the buck to it.
Suggested by: mjg [2]
|
|
|
|
| |
Use SLIST_FOREACH_SAFE() to fix iteration.
|
|
|
|
|
|
|
|
|
| |
Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.
Sponsored by: Google, Inc.
|
|
|
|
| |
Extend kqueue's EVFILT_TIMER by adding precision unit flags support.
|
|
|
|
|
|
|
|
|
| |
Fix overflow for timeout values of more than 68 years, which is the maximum
covered by sbintime (LONG_MAX seconds).
MFC r259633 (by se):
Fix compilation on 32 bit architectures and use INT64_MAX instead of
LONG_MAX for the upper bound check.
|
| |
|
|
|
|
|
|
| |
Fix a race between kqueue_register() and kqueue_scan() setting KN_INFLUX
flag while knlist is not locked, which caused lost notifications from
parallel knote().
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r258148:
Add a note that this file is compiled as part of the kernel and libc.
Requested by: kib
r258149:
Change cap_rights_merge(3) and cap_rights_remove(3) to return pointer
to the destination cap_rights_t structure.
This already matches manual page.
r258150:
Sync return value with actual implementation.
r258151:
Style.
r258152:
Precisely document capability rights here too (they are already documented
in rights(4)).
r258153:
The CAP_LINKAT, CAP_MKDIRAT, CAP_MKFIFOAT, CAP_MKNODAT, CAP_RENAMEAT,
CAP_SYMLINKAT and CAP_UNLINKAT capability rights make no sense without
the CAP_LOOKUP right, so include this rights.
r258154:
- Move CAP_EXTATTR_* and CAP_ACL_* rights to index 1 to have more room
in index 0 for the future.
- Move CAP_BINDAT and CAP_CONNECTAT rights to index 0 so we can include
CAP_LOOKUP right in them.
- Shuffle the bits around so there are no gaps. This is last chance to do
that as all moved rights are not used yet.
r258181:
Replace CAP_POLL_EVENT and CAP_POST_EVENT capability rights (which I had
a very hard time to fully understand) with much more intuitive rights:
CAP_EVENT - when set on descriptor, the descriptor can be monitored
with syscalls like select(2), poll(2), kevent(2).
CAP_KQUEUE_EVENT - When set on a kqueue descriptor, the kevent(2)
syscall can be called on this kqueue to with the eventlist
argument set to non-NULL value; in other words the given
kqueue descriptor can be used to monitor other descriptors.
CAP_KQUEUE_CHANGE - When set on a kqueue descriptor, the kevent(2)
syscall can be called on this kqueue to with the changelist
argument set to non-NULL value; in other words it allows to
modify events monitored with the given kqueue descriptor.
Add alias CAP_KQUEUE, which allows for both CAP_KQUEUE_EVENT and
CAP_KQUEUE_CHANGE.
Add backward compatibility define CAP_POLL_EVENT which is equal to CAP_EVENT.
r258182:
Correct right names.
Sponsored by: The FreeBSD Foundation
Approved by: re (kib)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
negative timeout both before and after the conversion to sbintime_t.
For periodic kqueue timer, convert zero timeout into 1ms, to avoid
interrupt storm on fast event timers.
Reported and tested by: pho
Discussed with: mav
Reviewed by: davide
Sponsored by: The FreeBSD Foundation
Approved by: re (marius)
|
|
|
|
|
|
|
|
|
|
|
|
| |
code could need to remove a kqueue from the filedesc list. Global
lock is already locked, which causes sleepable after non-sleepable
lock acquisition.
Reported and tested by: pho
Reviewed by: jmg
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
|
|
|
|
| |
Approved by: re (delphij)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to implement epoll subset of functionality. The kqueue user data are 32bit
on i386 which is not enough for epoll user data so this patch overrides
kqueue fileops to maintain enough space in struct file.
Initial patch developed by me in 2007 and then extended and finished
by Yuri Victorovich.
Approved by: re (delphij)
Sponsored by: Google Summer of Code
Submitted by: Yuri Victorovich <yuri at rawbw dot com>
Tested by: Yuri Victorovich <yuri at rawbw dot com>
|
|
|
|
|
|
|
|
|
|
| |
time removal on kqueue close.
Reported and tested by: pho
Reviewed by: jmg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (delphij)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in the future in a backward compatible (API and ABI) way.
The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.
The structure definition looks like this:
struct cap_rights {
uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
};
The initial CAP_RIGHTS_VERSION is 0.
The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.
The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.
To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.
#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)
We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:
#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
#define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)
#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)
There is new API to manage the new cap_rights_t structure:
cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
void cap_rights_set(cap_rights_t *rights, ...);
void cap_rights_clear(cap_rights_t *rights, ...);
bool cap_rights_is_set(const cap_rights_t *rights, ...);
bool cap_rights_is_valid(const cap_rights_t *rights);
void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);
Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:
cap_rights_t rights;
cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);
There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:
#define cap_rights_set(rights, ...) \
__cap_rights_set((rights), __VA_ARGS__, 0ULL)
void __cap_rights_set(cap_rights_t *rights, ...);
Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:
cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);
Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.
This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.
Sponsored by: The FreeBSD Foundation
|
|
|
|
| |
MFC after: 3 days
|
|
|
|
|
|
|
|
| |
vnode backed file descriptors have this method implemented.
Reviewed by: kib
Sponsored by: Nginx, Inc.
Sponsored by: Netflix
|
|
|
|
|
|
|
|
|
|
|
| |
- Set NOTE_TRACKERR before running filt_proc(). If the knote did not
have NOTE_FORK set in fflags when registered, then the TRACKERR event
could miss being posted.
- Don't pass the pid in to filt_proc() for NOTE_FORK events. The special
handling for pids is done knote_fork() directly and no longer in
filt_proc().
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
if NOTE_EXIT is not being monitored. The rationale is that a listener
should only get an event for exit() if they registered interest via
NOTE_EXIT. This matches the behavior on OS X.
- Don't save the exit status on process exit unless NOTE_EXIT is being
monitored.
- Add an internal EV_DROP flag that requests kqueue_scan() to free the
knote without signalling it to userland and use this when a process
exits but the fflags in the knote is zero.
Reviewed by: jmg
MFC after: 1 month
|
|
|
|
|
|
|
|
|
| |
In order to get some coverage of C11 atomics in kernelspace, switch at
least one piece of code in kernelspace to use C11 atomics instead of
<machine/atomic.h>.
While there, slightly improve the code by adding an assertion to prevent
the use count from going negative.
|
|
|
|
|
|
| |
optimize it out.
Submitted by: bde
|
|
|
|
| |
Submitted by: bde
|
|
|
|
|
|
| |
select(), nanosleep() and kevent() functions after calloutng changes.
Reported by: bde
|
|
|
|
|
|
|
|
|
| |
- Rewrite kevent() timeout implementation to allow sub-tick precision.
- Make the interval timings for EVFILT_TIMER more accurate. This also
removes an hack introduced in r238424.
Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo, marius, ian, markj, Fabian Keil
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
adds an extra tick to account for the current partial clock tick. However,
that is not appropriate for a repeating timer when the exact tvtohz() value
should be used for subsequent intervals. Fix repeating callouts for
EVFILT_TIMER by subtracting 1 tick from the tvtohz() result similar to the
fix used in realitexpire() for interval timers.
While here, update a few comments to note that if the EVFILT_TIMER code
were to move out of kern_event.c, it should move to kern_time.c (where the
interval timer code it mimics lives) rather than kern_timeout.c.
MFC after: 1 month
|
|
|
|
| |
MFC after: 1 month
|
|
|
|
|
|
|
|
|
|
|
| |
Function acquired reader lock if needed.
Assert check for reader or writer lock (RA_LOCKED / RA_UNLOCKED)
- While here, add knlist_init_mtx.9 to MLINKS and fix some style(9) issues
Reviewed by: glebius
Approved by: ae(mentor)
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.
Reviewed by: rwatson
Approved by: re (bz)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a selinfo object is recorded (via selrecord()) and then it is
quickly destroyed, with the waiters missing the opportunity to awake,
at the next iteration they will find the selinfo object destroyed,
causing a PF#.
That happens because the selinfo interface has no way to drain the
waiters before to destroy the registered selinfo object. Also this
race is quite rare to get in practice, because it would require a
selrecord(), a poll request by another thread and a quick destruction
of the selrecord()'ed selinfo object.
Fix this by adding the seldrain() routine which should be called
before to destroy the selinfo objects (in order to avoid such case),
and fix the present cases where it might have already been called.
Sometimes, the context is safe enough to prevent this type of race,
like it happens in device drivers which installs selinfo objects on
poll callbacks. There, the destruction of the selinfo object happens
at driver detach time, when all the filedescriptors should be already
closed, thus there cannot be a race.
For this case, mfi(4) device driver can be set as an example, as it
implements a full correct logic for preventing this from happening.
Sponsored by: Sandvine Incorporated
Reported by: rstone
Tested by: pluknet
Reviewed by: jhb, kib
Approved by: re (bz)
MFC after: 3 weeks
|
|
|
|
|
|
|
|
|
|
| |
to implement fchown(2) and fchmod(2) support for several file types
that previously lacked it. Add MAC entries for chown/chmod done on
posix shared memory and (old) in-kernel posix semaphores.
Based on the submission by: glebius
Reviewed by: rwatson
Approved by: re (bz)
|
|
|
|
|
|
|
|
| |
Change the names of a couple of capability rights to be less
FreeBSD-specific.
Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
kernel for FreeBSD 9.0:
Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *. With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.
Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.
In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.
Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.
Approved by: re (bz)
Submitted by: jonathan
Sponsored by: Google Inc
|
|
|
|
|
|
|
|
| |
and remove the falloc() version that lacks flag argument. This is done
to reduce the KPI bloat.
Requested by: jhb
X-MFC-note: do not
|
|
|
|
|
|
| |
LOR: 185
Submitted by: Matthew Fleming @ Isilon
MFC after: 1 week
|
|
|
|
|
| |
Bumped into by: ed
MFC after: 3 days
|
|
|
|
|
|
| |
at add it again.
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
|
|
| |
r195175. Remove all definitions, documentation, and usage.
fifo_misc.c:
Remove all kqueue tests as fifo_io.c performs all those that
would have remained.
Reviewed by: rwatson
MFC after: 3 weeks
X-MFC note: don't change vlan_link_state() function signature
|
|
|
|
|
|
|
|
|
|
| |
unlocked. fdrop() closes file descriptor when reference count goes to
zero. Close method for vnodes locks the vnode, resulting in "sleepable
after non-sleepable". For pipes, pipe mutex is before kqueue lock,
causing LOR.
Reported and tested by: pho
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
| |
contained only SLIST_HEAD as its member, thus sizeof(struct klist) would
equal to sizeof(struct klist *), so this change makes the code more
correct in terms of semantics, but should be a no-op to compiler at this
time.
Reported by: MQ <antinvidia at gmail com>
|
|
|
|
|
| |
Requested by: bde
Approved by: ed (mentor)
|
|
|
|
| |
Approved by: ed (mentor, implicit)
|