path: root/sys/kern
Commit message [Author, Date, Files changed, Lines removed/added]
* Revisit the capability failure trace points. The initial implementation only logged instances where an operation on a file descriptor required capabilities which the file descriptor did not have. By adding a type enum to struct ktr_cap_fail, we can catch other types of capability failures as well, such as disallowed system calls or attempts to wrap a file descriptor with more capabilities than it had to begin with.
  [des, 2011-10-18, 3 files, -5/+27]
* Fix double vision syndrome (read: double output) when in the debugger without a panic.
  [marcel, 2011-10-16, 1 file, -13/+7]
* Control the execution permission of the readable segments for i386 binaries on amd64 and ia64 with a sysctl, instead of unconditionally enabling it.
  Reviewed by: marcel
  [kib, 2011-10-15, 1 file, -1/+9]
* In elf32_trans_prot(), when compiling for amd64 or ia64, add PROT_EXECUTE when PROT_READ is needed. By default i386 allows execution when reading is allowed, and JDK 1.4.x depends on that.
  [marcel, 2011-10-13, 1 file, -0/+6]
* Make memguard(9) capable of guarding uma(9) allocations.
  [glebius, 2011-10-12, 1 file, -1/+1]
* Correct a bug in export of capability-related information from the sysctls supporting procstat -f: properly provide capability rights information to userspace. The bug resulted from a merge-o during upstreaming (or rather, a failure to properly merge FreeBSD-side changes downstream).
  Spotted by: des, kibab
  MFC after: 3 days
  [rwatson, 2011-10-12, 1 file, -12/+20]
* Don't call fixup_filename() on each witness lock call.
  This has been irking me for a while. It causes significant CPU use on bottlenecked CPUs (e.g. my older EEEPC with an earlier Celeron CPU and my MIPS24k boards) when they're passing a lot of traffic.
  Since the file/line values are only used for printing, this should only affect display. It should cause no operational change in the code, besides reducing CPU use.
  [adrian, 2011-10-12, 1 file, -41/+63]
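The change above can be sketched in userland: the lock path records the raw file pointer, and the name is trimmed only when a message is formatted. This sketch assumes fixup_filename() just skips leading "../" components of __FILE__ strings; struct lock_record, record_lock() and format_lock() are hypothetical names, and assert() stands in for KASSERT().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed behaviour: skip leading "../" components of a build path. */
static const char *
fixup_filename(const char *file)
{
	if (file == NULL)
		return (NULL);
	while (strncmp(file, "../", 3) == 0)
		file += 3;
	return (file);
}

/* Hypothetical per-acquisition record: keeps the raw file and line. */
struct lock_record {
	const char	*file;
	int		 line;
};

/* Hot path: store the pointer as-is, with no string scanning. */
static void
record_lock(struct lock_record *rec, const char *file, int line)
{
	assert(rec != NULL && file != NULL);	/* stands in for KASSERT() */
	rec->file = file;
	rec->line = line;
}

/* Cold path: trim the name only when a message is actually printed. */
static int
format_lock(char *buf, size_t len, const struct lock_record *rec)
{
	return (snprintf(buf, len, "%s:%d", fixup_filename(rec->file),
	    rec->line));
}
```

For example, recording "../../../kern/kern_mutex.c" at line 42 stores the pointer unchanged, and format_lock() later produces "kern/kern_mutex.c:42".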
* Add a new trace point, KTRFAC_CAPFAIL, which traces capability check failures. It is included in the default set for ktrace(1) and kdump(1).
  [des, 2011-10-11, 2 files, -1/+30]
* When unmounting a filesystem, always wait for the vfs_busy lock to clear so that if no vnodes in the filesystem are actively in use the unmount will succeed rather than failing with EBUSY.
  Reported by: Garrett Cooper
  Reviewed by: Attilio Rao and Kostik Belousov
  Tested by: Garrett Cooper
  PR: kern/161016
  MFC after: 3 weeks
  [mckusick, 2011-10-11, 1 file, -12/+0]
* In device_get_children(), avoid malloc(0) in order to increase portability to other operating systems.
  PR: 154287
  [marius, 2011-10-09, 1 file, -0/+5]
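A sketch of the pattern behind this change, using a hypothetical struct fake_device and get_children() rather than the real newbus types: an empty child list is reported as (NULL, 0) without calling malloc(0), whose result the C standard leaves implementation-defined (a null pointer or a unique pointer).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device with a fixed array of children. */
struct fake_device {
	struct fake_device	**children;
	int			  nchildren;
};

/*
 * Copy the child pointers into a freshly allocated array. When there
 * are no children, report (NULL, 0) instead of calling malloc(0).
 */
static int
get_children(const struct fake_device *dev, struct fake_device ***listp,
    int *countp)
{
	struct fake_device **list;
	int i;

	assert(dev != NULL && listp != NULL && countp != NULL);
	if (dev->nchildren == 0) {
		*listp = NULL;
		*countp = 0;
		return (0);
	}
	list = malloc((size_t)dev->nchildren * sizeof(*list));
	if (list == NULL)
		return (ENOMEM);
	for (i = 0; i < dev->nchildren; i++)
		list[i] = dev->children[i];
	*listp = list;
	*countp = dev->nchildren;
	return (0);
}
```

Because free(NULL) is a no-op, callers can free the returned list unconditionally.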
* Fix the handling of an empty kmem map by sysctl_kmem_map_free(). In the unlikely event that sysctl_kmem_map_free() was performed on an empty kmem map, it would incorrectly report the free space as zero.
  Discussed with: avg
  MFC after: 1 week
  [alc, 2011-10-08, 1 file, -2/+2]
* Change one printf() to log().
  As noted in kern/159780, printf() is not very jail-friendly, since it can't be easily monitored by jail management tools. This patch reports an error via log() instead, which, if nobody is watching the log file, still prints to the console.
  Approved by: mentor (rwatson)
  Submitted by: Eugene Grosbein <eugen@eg.sd.rdtc.ru>
  MFC after: 5 days
  [jonathan, 2011-10-07, 1 file, -1/+1]
* Disallow various debug.kdb sysctls when securelevel is raised.
  PR: 161350
  [obrien, 2011-10-07, 1 file, -9/+14]
* Return a proper errno when we hit an error during the sanity check.
  This fixes dtrace crashes when a module is not compiled with CTF data.
  Submitted by: Paul Ambrose ambrosehua at gmail.com
  MFC after: 1 week
  [delphij, 2011-10-07, 1 file, -4/+22]
* - Currently, sched_balance_pair() may cause a CPU to send an IPI_PREEMPT to itself, which sparc64 hardware doesn't support. One way to solve this would be to directly call sched_preempt() instead of issuing a self-IPI. However, quoting jhb@: "On the other hand, you can probably just skip the IPI entirely if we are going to send it to the current CPU. Presumably, once this routine finishes, the current CPU will exit softclock (or will do so "soon") and will then pick the next thread to run based on the adjustments made in this routine, so there's no need to IPI the CPU running this routine anyway. I think this is the better solution. Right now what is probably happening on other platforms is as soon as this routine finishes the CPU processes its self-IPI and causes mi_switch() which will just switch back to the softclock thread it is already running."
  - With r226054 and the above change in place, sparc64 is no longer incompatible with ULE and vice versa. However, powerpc/E500 still is.
  Submitted by: jhb
  Reviewed by: jeff
  [marius, 2011-10-06, 1 file, -4/+9]
* Remove assertion against empty NFSv4 ACLs. An empty ACL is not exactly valid (we don't allow setting it on a file, for example), but it's not something we should assert on. For a STABLE kernel, it changes nothing, because it's not compiled with INVARIANTS. If it were, it would fix crashes. It also fixes an assert in libc encountered with NFSv4 without nfsuserd(8) running.
  Submitted by: Yuri Pankov (earlier version)
  MFC after: 1 month
  [trasz, 2011-10-05, 1 file, -4/+0]
* Supply a unique (st_dev, st_ino) value pair for fstat(2) done on pipes.
  Reviewed by: jhb, Peter Jeremy <peterjeremy acm org>
  MFC after: 2 weeks
  [kib, 2011-10-05, 1 file, -2/+26]
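The property this commit provides can be checked from userland: descriptors from two different pipes should not report the same (st_dev, st_ino) pair, which is what programs using that pair as a file identity rely on. The helper same_identity() is hypothetical; pipe(2) and fstat(2) are standard POSIX calls.

```c
#define	_POSIX_C_SOURCE	200809L
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Return 1 if two open descriptors report the same (st_dev, st_ino)
 * identity, 0 if they differ, and -1 if fstat(2) fails.
 */
static int
same_identity(int fd1, int fd2)
{
	struct stat sb1, sb2;

	assert(fd1 >= 0 && fd2 >= 0);
	if (fstat(fd1, &sb1) == -1 || fstat(fd2, &sb2) == -1)
		return (-1);
	return (sb1.st_dev == sb2.st_dev && sb1.st_ino == sb2.st_ino);
}
```

The check deliberately compares ends of different pipes only; whether the two ends of one pipe share an identity is not something this sketch relies on.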
* Move parts of the commit log for r166167, where Tor explained the interaction between vnode locks and vfs_busy(), into a comment.
  MFC after: 1 week
  [kib, 2011-10-04, 1 file, -0/+32]
* Actually enforce limit for inheritable resources on fork.
  MFC after: 3 days
  [trasz, 2011-10-04, 1 file, -6/+6]
* Move some code inside racct_proc_fork(); it spares a few lock operations and it's more logical this way.
  MFC after: 3 days
  [trasz, 2011-10-03, 2 files, -20/+22]
* Assert that an exiting process does not return to usermode.
  Reviewed by: avg, jhb
  MFC after: 1 week
  [kib, 2011-10-03, 1 file, -0/+2]
* Fix another bug introduced in r225641, which caused rctl to access certain fields in 'struct proc' before they got initialized in do_fork().
  MFC after: 3 days
  [trasz, 2011-10-03, 3 files, -11/+40]
* Fix a bug introduced in r225641, which would cause a panic if racct_proc_fork() returned an error: racct_destroy_locked() would get called twice.
  MFC after: 3 days
  [trasz, 2011-10-03, 1 file, -18/+1]
* The sigwait(3) function shall not return EINTR, according to POSIX/SUSvN. The sigwait(2) syscall does return EINTR, and libc.so.7 contains the wrapper sigwait(3), which hides EINTR from callers. The EINTR return is used by libthr to handle the required cancellation point in sigwait(3).
  To give binaries linked against pre-libc.so.7 libraries (i.e. RELENG_6 and earlier) the right ABI for sigwait(3), transform an EINTR return from sigwait(2) into ERESTART.
  Discussed with: davidxu
  MFC after: 1 week
  [kib, 2011-10-01, 1 file, -0/+2]
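The POSIX contract preserved here can be seen from userland: block a signal, make it pending, and collect it synchronously; sigwait(3) returns 0 or an error number, not EINTR. The helper wait_for_raised() is a hypothetical illustration, not code from libc or libthr, and it assumes a single-threaded process (multithreaded code would use pthread_sigmask(3) instead of sigprocmask(2)).

```c
#define	_POSIX_C_SOURCE	200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/*
 * Block sig, make it pending, then collect it with sigwait(3).
 * Returns the sigwait() result (0 on success, otherwise an error
 * number) and stores the received signal in *got.
 */
static int
wait_for_raised(int sig, int *got)
{
	sigset_t set;

	assert(got != NULL);
	sigemptyset(&set);
	sigaddset(&set, sig);
	/* Block first so the signal stays pending instead of being delivered. */
	if (sigprocmask(SIG_BLOCK, &set, NULL) == -1)
		return (-1);
	raise(sig);
	return (sigwait(&set, got));
}
```

Note that sigwait() reports failure through its return value rather than errno, which is why a caller never has to loop on EINTR.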
* Fix handling of corrupt compress(1)ed data. [11:04]
  Add missing length checks on unix socket addresses. [11:05]
  Approved by: so (cperciva)
  Approved by: re (kensmith)
  Security: FreeBSD-SA-11:04.compress
  Security: CVE-2011-2895 [11:04]
  Security: FreeBSD-SA-11:05.unix
  [bz, 2011-09-28, 1 file, -0/+4]
* Revert r225372:
  wdog_kern_pat() acquires the eventhandler mutex, thus it cannot work in kernel context (from where kdb_trap() runs). The right way to fix this is to offer both the cpu-stop-on-panic-and-skip-locking logic and a context for KDB to officially run in. We can re-enable this (or a similar) improvement when these 2 patches hit the tree.
  Sponsored by: Sandvine Incorporated
  Discussed with: emaste, rstone
  MFC after: immediately
  [attilio, 2011-09-27, 1 file, -14/+0]
* Do not deliver SIGTRAP on exec as a normal signal; use ptracestop() on the syscall exit path instead. Otherwise, if SIGTRAP is ignored, tdsendsignal() declines to deliver the signal, and the debugger never gets a notification of exec.
  Found and tested by: Anton Yuzhaninov <citrin citrin ru>
  Discussed with: jhb
  MFC after: 2 weeks
  [kib, 2011-09-27, 2 files, -11/+9]
* Fix interrupt counters dumping on SW_WATCHDOG fire.
  [mav, 2011-09-27, 1 file, -1/+1]
* Fix an error handling bug that would prevent MAC structures from getting freed properly if a resource limit got exceeded.
  Approved by: re (kib)
  [trasz, 2011-09-17, 1 file, -20/+18]
* Fix long-standing thinko regarding maxproc accounting. Basically, we were accounting the newly created process to its parent instead of the child itself. This caused problems later, when the child changed its credentials: the per-uid, per-jail etc. counters were not properly updated, because the maxproc counter in the child process was 0.
  Approved by: re (kib)
  [trasz, 2011-09-17, 2 files, -37/+7]
* Auto-generated code from the sys_ prefixing makesyscalls.sh change.
  Approved by: re (bz)
  [kmacy, 2011-09-16, 1 file, -323/+323]
* In order to maximize the re-usability of kernel code in user space, this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal to kern_psignal(). By introducing this change now, we will ease future MFCs that change syscalls.
  Reviewed by: rwatson
  Approved by: re (bz)
  [kmacy, 2011-09-16, 54 files, -409/+427]
* Ensure that ta_pending doesn't overflow u_short by capping its value at USHRT_MAX. If it overflows before the taskqueue can run, the task will be re-added to the taskqueue and cause a loop in the task list.
  Reported by: Arnaud Lacombe <lacombar@gmail.com>
  Submitted by: Ryan Stone <rysto32@gmail.com>
  Reviewed by: jhb
  Approved by: re (kib)
  MFC after: 1 day
  [adrian, 2011-09-15, 1 file, -1/+3]
* Modify vfs_register() to use a hash calculation on vfc_name to set vfc_typenum, so that vfc_typenum doesn't change when file systems are loaded in different orders. This keeps NFS file handles from changing, for file systems that use vfc_typenum in their fsid. This change is controlled via a loader.conf variable called vfs.typenumhash, since vfc_typenum will change once when this is enabled. It defaults to 1 for 9.0, but will default to 0 when MFC'd to stable/8.
  Tested by: hrs
  Reviewed by: jhb, pjd (earlier version)
  Approved by: re (kib)
  MFC after: 1 month
  [rmacklem, 2011-09-13, 1 file, -1/+44]
* dump_write() returns ENXIO if the dump is trying to be written outside of the device boundary. While this is generally OK, the problem is that all the consumers handle similar cases (and expect to catch) ENOSPC for this (for reference, look at the minidumpsys() and dumpsys() constructions). That ends up in consumers not recognizing the issue, and amd64 failing to retry if the number of pages grows during minidump. Fix this by returning ENOSPC in dump_write() and, while here, add some more diagnostics on the involved values.
  Sponsored by: Sandvine Incorporated
  In collaboration with: emaste
  Approved by: re (kib)
  MFC after: 10 days
  [attilio, 2011-09-12, 1 file, -2/+5]
* Fix error return codes for ioctls on init/lock state devices.
  In revision 223722 we introduced support for driver ioctls on init/lock state devices. Unfortunately, the call to ttydevsw_cioctl() clobbers the value of the error variable, meaning that in many cases ioctl() will now return ENOTTY, even though the ioctl() was processed properly.
  Reported by: Boris Samorodov <bsam ipt ru>
  Patch by: jilles@
  Approved by: re@ (kib@)
  [ed, 2011-09-12, 1 file, -1/+2]
* Inline syscallenter() and syscallret(). This reduces the time measured by the syscall entry speed microbenchmarks by ~10% on amd64.
  Submitted by: jhb
  Approved by: re (bz)
  MFC after: 2 weeks
  [kib, 2011-09-11, 2 files, -162/+213]
* Improve the information reported in case of busy buffers during shutdown:
  - Axe the SHOW_BUSYBUFS option and use a tunable to selectively enable/disable it; it defaults to not printing anything (0) but can be changed to print (1) or to be verbose (2).
  - Improve the information output: right now, there is no track of the actual struct buf object or vnode referenced by the shutdown process; instead, the related struct bufobj object is printed, which is not really helpful.
  - Add more verbosity about the state of the struct buf lock and the vnode information, with the latter to be activated separately by the sysctl.
  Sponsored by: Sandvine Incorporated
  Reviewed by: emaste, kib
  Approved by: re (ksmith)
  MFC after: 10 days
  [attilio, 2011-09-08, 2 files, -9/+21]
* Fix whitespace.
  Submitted by: amdmi3
  Approved by: re (rwatson)
  [trasz, 2011-09-07, 1 file, -1/+1]
* Work around a kernel panic triggered by a forkbomb with an rctl rule such as j:name:maxproc:sigkill=100. The proper fix, deferring psignal to a taskqueue, is somewhat complicated and thus will happen after 9.0.
  Approved by: re (kib)
  [trasz, 2011-09-06, 1 file, -0/+11]
* Interrupts are disabled/enabled when entering and exiting the KDB context. While this is generally good, it brings along a series of problems, like clocks going out of sync and, in the presence of SW_WATCHDOG, watchdogs firing without a good reason (missed hardclock wdog tick updates). Fix the latter by kicking the watchdog just before re-enabling the interrupts. Also, while here, do not rely on users to stop the watchdog manually when entering DDB, but do that when entering the KDB context.
  Sponsored by: Sandvine Incorporated
  Reviewed by: emaste, rstone
  Approved by: re (kib)
  MFC after: 1 week
  [attilio, 2011-09-04, 1 file, -0/+14]
* Since r224036 the cputime and wallclock are supposed to be in seconds, not microseconds. Make it so.
  Approved by: re (kib)
  [trasz, 2011-09-04, 1 file, -2/+2]
* Fix a panic that happens when fork(2) fails due to a limit other than the rctl one; for example, it happens when someone reaches the maximum number of processes in the system.
  Approved by: re (kib)
  [trasz, 2011-09-03, 1 file, -7/+12]
* Correct several issues in the integration of POSIX shared memory objects and the new setmode and setowner fileops in FreeBSD 9.0:
  - Add new MAC Framework entry point mac_posixshm_check_create() to allow MAC policies to authorise shared memory use. Provide a stub policy and test policy templates.
  - Add missing Biba and MLS implementations of mac_posixshm_check_setmode() and mac_posixshm_check_setowner().
  - Add 'accmode' argument to mac_posixshm_check_open(): unlike the mac_posixsem_check_open() entry point it was modeled on, the access mode is required as shared memory access can be read-only as well as writable; this isn't true of POSIX semaphores.
  - Implement full range of POSIX shared memory entry points for Biba and MLS.
  Sponsored by: Google Inc.
  Obtained from: TrustedBSD Project
  Approved by: re (kib)
  [rwatson, 2011-09-02, 1 file, -12/+16]
* Attempt to make break-to-debugger and alternative break-to-debugger more accessible:
  (1) Always compile in support for breaking into the debugger if options KDB is present in the kernel.
  (2) Disable both by default, but allow them to be enabled via tunables and sysctls debug.kdb.break_to_debugger and debug.kdb.alt_break_to_debugger.
  (3) options BREAK_TO_DEBUGGER and options ALT_BREAK_TO_DEBUGGER continue to behave as before; only now, instead of compiling in break-to-debugger support, they change the default values of the above sysctls to enable those features by default. Current kernel configurations should, therefore, continue to behave as expected.
  (4) Migrate alternative break-to-debugger state machine logic out of individual device drivers into centralised KDB code. This has a number of upsides, but also one downside: it's now tricky to release sio spin locks when entering the debugger, so we don't. However, similar logic does not exist in other device drivers, including uart.
  (5) dcons requires some special handling; unlike other console types, it allows overriding KDB's own debugger selection, so we need a new interface to KDB to allow that to work.
  GENERIC kernels in -CURRENT will now support break-to-debugger as long as appropriate boot/run-time options are set, which should improve the debuggability of BETA kernels significantly.
  MFC after: 3 weeks
  Reviewed by: kib, nwhitehorn
  Approved by: re (bz)
  [rwatson, 2011-08-26, 1 file, -1/+82]
* Fix format strings for KTR_STATE in the 4BSD and ULE schedulers.
  Submitted by: Ivan Klymenko <fidaj@ukr.net>
  PR: kern/159904, kern/159905
  MFC after: 2 weeks
  Approved by: re (kib)
  [delphij, 2011-08-26, 2 files, -4/+4]
* Delay the recursive decrement of pr_uref when jails are made invisible but not removed; decrement it instead when the child jail actually goes away. This avoids letting the counter go below zero in the case where dying (pr_uref==0) jails are "resurrected", and an associated KASSERT panic.
  Submitted by: Steven Hartland
  Approved by: re (bz)
  MFC after: 1 week
  [jamie, 2011-08-26, 1 file, -26/+5]
* Fix a deficiency in the selinfo interface:
  If a selinfo object is recorded (via selrecord()) and then quickly destroyed, with the waiters missing the opportunity to wake up, at the next iteration they will find the selinfo object destroyed, causing a page fault. That happens because the selinfo interface has no way to drain the waiters before destroying the registered selinfo object. This race is quite rare to hit in practice, because it requires a selrecord(), a poll request by another thread, and a quick destruction of the selrecord()'ed selinfo object.
  Fix this by adding the seldrain() routine, which should be called before destroying selinfo objects (in order to avoid such a case), and fix the present cases where it might have already been called. Sometimes the context is safe enough to prevent this type of race, as happens in device drivers which install selinfo objects on poll callbacks. There, the destruction of the selinfo object happens at driver detach time, when all the file descriptors should already be closed, thus there cannot be a race. For this case, the mfi(4) device driver can serve as an example, as it implements fully correct logic for preventing this from happening.
  Sponsored by: Sandvine Incorporated
  Reported by: rstone
  Tested by: pluknet
  Reviewed by: jhb, kib
  Approved by: re (bz)
  MFC after: 3 weeks
  [attilio, 2011-08-25, 8 files, -0/+28]
* Increase the defaults for the maximum socket buffer limit, and the maximum TCP send and receive buffer limits, from 256kB to 2MB. For sb_max_adj we need to add the cast, as already used in the sysctl handler, to not overflow the type doing the maths. Note that these are just the defaults. They will allow more memory to be consumed per socket/connection if needed but not change the default "idle" memory consumption. All values are still tunable by sysctls.
  Suggested by: gnn
  Discussed on: arch (Mar and Aug 2011)
  MFC after: 3 weeks
  Approved by: re (kib)
  [bz, 2011-08-25, 1 file, -1/+1]
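The sb_max_adj overflow mentioned above can be reproduced with plain arithmetic. This sketch assumes the adjusted limit is computed as sb_max * MCLBYTES / (MSIZE + MCLBYTES) with MSIZE = 256 and MCLBYTES = 2048, and models a 32-bit u_long with uint32_t; the macro and function names are hypothetical. At the new 2MB default the 32-bit product is exactly 2^32 and wraps to zero, while widening before the multiply gives 1864135.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants: 256-byte mbufs and 2048-byte clusters. */
#define	MSIZE_SKETCH	256u
#define	MCLBYTES_SKETCH	2048u

/* Adjusted limit computed in 32-bit arithmetic: the product can wrap. */
static uint32_t
sb_max_adj_32(uint32_t sb_max)
{
	return (sb_max * MCLBYTES_SKETCH / (MSIZE_SKETCH + MCLBYTES_SKETCH));
}

/* Same formula with the multiplication widened to 64 bits first. */
static uint32_t
sb_max_adj_64(uint32_t sb_max)
{
	assert(sb_max > 0);
	return ((uint32_t)((uint64_t)sb_max * MCLBYTES_SKETCH /
	    (MSIZE_SKETCH + MCLBYTES_SKETCH)));
}
```

At the old 256kB default the product (2^29) still fits in 32 bits, which is why the missing cast only became visible with the larger limit.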
* Generalize ffs_pages_remove() into vn_pages_remove().
  Remove mapped pages for all dataset vnodes in zfs_rezget() using the new vn_pages_remove() to fix mmapped files changed by zfs rollback or zfs receive -F.
  PR: kern/160035, kern/156933
  Reviewed by: kib, pjd
  Approved by: re (kib)
  MFC after: 1 week
  [mm, 2011-08-25, 1 file, -0/+15]
OpenPOWER on IntegriCloud