path: root/sys/kern
Commit message [Author, Age, Files, Lines]
* MFC r285715: [ed, 2015-08-17, 1 file, -64/+61]
    Add an API for easily creating userspace threads in kernelspace.
    This change refactors the existing create_thread() function to be more generic. It replaces almost all of its arguments by a callback that can be used to extract the thread ID and copy it out to the right place, but also to perform additional initialization steps, such as setting the trapframe. This also makes the difference between thr_new() and thr_create() more clear in my opinion.
    This function is going to be used by the CloudABI compatibility layer.
    It looks like the OpenSolaris compatibility framework already provides a function called thread_create(). Rename this function to do_thread_create() and use a macro to deal with the namespacing conflict. A similar approach is already used for thread_exit().
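The callback-based shape described in r285715 can be sketched in userspace C. This is only an illustration under assumptions: `toy_thread`, `toy_thread_create()` and `toy_copyout_tid()` are hypothetical names invented here, not the kernel's interfaces, and the real function does far more work.

```c
#include <stddef.h>

/* Hypothetical stand-in for a kernel thread object. */
struct toy_thread {
    long tid;
    int  initialized;
};

/* Callback type: receives the new thread plus caller-supplied context. */
typedef int (*toy_init_fn)(struct toy_thread *td, void *arg);

static long toy_next_tid = 100001;

/*
 * Generic creation routine: instead of taking many arguments, it hands
 * the half-built thread to a callback that finishes the setup.
 */
int toy_thread_create(struct toy_thread *td, toy_init_fn init, void *arg)
{
    int error;

    td->tid = toy_next_tid++;
    td->initialized = 0;
    error = init(td, arg);      /* caller-specific initialization */
    if (error != 0)
        return error;
    td->initialized = 1;
    return 0;
}

/* Example callback: report the thread ID back to the caller. */
int toy_copyout_tid(struct toy_thread *td, void *arg)
{
    long *out = arg;

    *out = td->tid;
    return 0;
}
```

A caller would pass `toy_copyout_tid` together with a pointer to a `long`. A different entry point could pass another callback that also performs extra setup, which is the flexibility the commit message describes.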
* MFC r286106: [kib, 2015-08-07, 1 file, -0/+62]
    Provide prefaulting for the userspace i/o buffers, disabled by default.
* MFC r285384: [kib, 2015-08-07, 2 files, -3/+9]
    Do not allow creation of the dirty buffers for the dead buffer objects.
* Make the kern.racct.tunable actually work. [trasz, 2015-08-05, 1 file, -0/+1]
    This is a direct commit to 10-STABLE - 11-CURRENT is not affected, because tunables are automatically fetched there.
    MFC after: ASAP
    Sponsored by: The FreeBSD Foundation
* MFC r285888: [ae, 2015-08-05, 1 file, -1/+1]
    Build debug version of rmlock's methods only when LOCK_DEBUG > 0.
    Currently LOCK_DEBUG is always defined in sys/lock.h (0 or 1). This means that debugging code is always built. In addition the kernel modules have always defined LOCK_DEBUG as 1. So, debugging rmlock code is always used by kernel modules.
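The difference between testing whether a macro is defined and testing its value can be shown in a few lines of C. This sketch assumes the earlier guard only checked that LOCK_DEBUG was defined, which the commit message implies but does not state. The two function names are hypothetical.

```c
/* As in the commit message: the macro is always defined, and its
 * value (0 or 1) is what carries the meaning. */
#define LOCK_DEBUG 0

/* Returns 1 if a "defined" test would compile the debug path in. */
int debug_built_with_ifdef(void)
{
#ifdef LOCK_DEBUG
    return 1;   /* taken even though LOCK_DEBUG is 0 */
#else
    return 0;
#endif
}

/* Returns 1 if a value test would compile the debug path in. */
int debug_built_with_value_test(void)
{
#if LOCK_DEBUG > 0
    return 1;
#else
    return 0;   /* taken: debug code is left out when LOCK_DEBUG is 0 */
#endif
}
```

With LOCK_DEBUG defined as 0, only the value test leaves the debug code out.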
* MFC r282086: [trasz, 2015-08-03, 1 file, -1/+1]
    Make setproctitle(3) work in Capsicum capability mode.
    This makes ctld(8) child processes indicate initiator address and name in their titles, similar to what iscsid(8) child processes do.
    PR: 181352
    Sponsored by: The FreeBSD Foundation
* Fix ia64 to not override the call stack bottom address with the register stack bottom address, after the merge of r284956 in r285967. [kib, 2015-08-03, 1 file, -2/+1]
    Note: this is a direct commit to stable/10.
    Reported and tested by: clusteradm (peter)
    Sponsored by: The FreeBSD Foundation
* MFC: r285839 [marius, 2015-07-30, 1 file, -0/+3]
    o Revert the other functional half of r239864, i. e. the merge of r134227 from x86 to use smp_ipi_mtx spin lock not only for smp_rendezvous_cpus() but also for the MD cache invalidation, TLB demapping and remote register reading IPIs due to the following reasons:
      - The cross-IPI SMP deadlock x86 otherwise is subject to can't happen on sparc64. That's because on sparc64, spin locks don't disable interrupts completely but only raise the processor interrupt level to PIL_TICK. This means that IPIs still get delivered and direct dispatch IPIs such as the cache invalidation etc. IPIs in question are still executed.
      - In smp_rendezvous_cpus(), smp_ipi_mtx is held not only while sending an IPI_RENDEZVOUS, but until all CPUs have processed smp_rendezvous_action(). Consequently, smp_ipi_mtx may be locked for an extended amount of time as queued IPIs (as opposed to the direct ones) such as IPI_RENDEZVOUS are scheduled via a soft interrupt. Moreover, given that this soft interrupt is only delivered at PIL_RENDEZVOUS, processing of smp_rendezvous_action() on a target may be interrupted by f. e. a tick interrupt at PIL_TICK, in turn leading to the target in question trying to send an IPI by itself while IPI_RENDEZVOUS isn't fully handled, yet, and, thus, resulting in a deadlock.
    o As mentioned in the commit message of r245850, on at least some sun4u platforms concurrent sending of IPIs by different CPUs is fatal. Therefore, hold the reintroduced MD ipi_mtx also while delivering cross-traps via MI helpers, i. e. ipi_{all_but_self,cpu,selected}().
    o Akin to x86, let the last CPU to process cpu_mp_bootstrap() set smp_started instead of the BSP in cpu_mp_unleash(). This ensures that all APs actually are started, when smp_started is no longer 0.
    o In all MD and MI IPI helpers, check for smp_started == 1 rather than for smp_cpus > 1 or nothing at all. This avoids races during boot causing IPIs trying to be delivered to APs that in fact aren't up and running, yet. While at it, move setting of the cpu_ipi_{selected,single}() pointers to the appropriate delivery functions from mp_init() to cpu_mp_start() where it's better suited and allows to get rid of the global isjbus variable.
    o Given that now concurrent IPI delivery no longer is possible, also nuke the delays before completely disabling interrupts again in the CPU-specific cross-trap delivery functions, previously giving other CPUs a window for sending IPIs on their part. Actually, we now should be able to entirely get rid of completely disabling interrupts in these functions. Such a change needs more testing, though.
    o In {s,}tick_get_timecount_mp(), make the {s,}tick variable static. While not necessary for correctness, this avoids page faults when accessing the stack of a foreign CPU as {s,}tick now is locked into the TLBs as part of static kernel data. Hence, {s,}tick_get_timecount_mp() always execute as fast as possible, avoiding jitter.
    PR: 201245
* MFC r285483: pipe_direct_write: Fix mismatched pipelock/unlock [cem, 2015-07-28, 1 file, -2/+3]
    If a signal is caught in pipelock, causing it to fail, pipe_direct_write should not try to pipeunlock.
    Approved by: markj (mentor)
    Sponsored by: EMC / Isilon Storage Division
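The rule in r285483, that a failed lock attempt must not be followed by an unlock, can be sketched with a toy lock. The `toy_` names and the `interrupted` field are hypothetical and only simulate a signal arriving during the lock attempt. This is not the pipe code itself.

```c
#include <errno.h>

struct toy_pipe {
    int locked;              /* 1 while the lock is held */
    int interrupted;         /* simulate a caught signal on the next lock attempt */
    int unbalanced_unlocks;  /* counts unlocks issued without a held lock */
};

/* Fails with EINTR when "interrupted", leaving the lock not held. */
int toy_pipelock(struct toy_pipe *p)
{
    if (p->interrupted)
        return EINTR;
    p->locked = 1;
    return 0;
}

void toy_pipeunlock(struct toy_pipe *p)
{
    if (!p->locked)
        p->unbalanced_unlocks++;   /* the bug class the commit fixes */
    p->locked = 0;
}

int toy_direct_write(struct toy_pipe *p)
{
    int error = toy_pipelock(p);

    if (error != 0)
        return error;   /* lock was never taken, so do not unlock */
    /* ... the data transfer would happen here ... */
    toy_pipeunlock(p);
    return 0;
}
```

Returning early on the error path keeps every unlock paired with a successful lock.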
* MFC r284956: [kib, 2015-07-28, 1 file, -1/+1]
    Do not calculate the stack's bottom address twice.
* MFC r285039: [kib, 2015-07-28, 1 file, -3/+0]
    Remove asserts which might reference freed memory.
* MFC r285134 (by mjg): [kib, 2015-07-28, 1 file, -28/+24]
    fd: de-k&r-ify functions + some whitespace fixes
    MFC r285269: Handle copyout for the fcntl(F_OGETLK) using oflock structure.
* MFC r285663, r285664, r285667: [markj, 2015-07-21, 4 files, -28/+37]
    Ensure that lockstat_nsecs() has no effect when lockstat probes are not enabled or when the profiled lock carries the LO_NOPROFILE flag.
    PR: 201642, 201517
    Approved by: re (gjb)
    Tested by: Jason Unovitch
* Revert r284178 and r284256. [kib, 2015-07-21, 1 file, -68/+41]
    Approved by: re (gjb)
* MFC r285424 (ian): [delphij, 2015-07-15, 1 file, -5/+5]
    Use the monotonic (uptime) counter rather than time-of-day to measure elapsed time between ntp_adjtime() clock offset adjustments. This eliminates spurious frequency steering after a large clock step (such as a 1970->2015 step on a system with no battery-backed clock hardware).
    This problem was discovered after the import of ntpd 4.2.8, which does things in a slightly different (but still correct) order than the 4.2.4 we had previously. In particular, 4.2.4 would step the clock then immediately after use ntp_adjtime() to set the frequency and offset to zero, which captured the post-step time-of-day as a side effect. In 4.2.8, ntpd sets frequency and offset to zero before any initial clock step, capturing the time as 1970-ish, then when it next calls ntp_adjtime() it's with a non-zero offset measurement. This non-zero value gets multiplied by the apparent 45-year interval, which blows up into a completely bogus frequency steer. That gets clamped to 500ppm, but that's still enough to make the clock drift so fast that ntpd has to keep stepping it every few minutes to compensate.
    Approved by: re (gjb)
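The arithmetic behind the bogus steer in r285424 can be illustrated with a toy model. This is not the kernel's formula: the `gain` constant and the `toy_` names are assumptions chosen only to show how an offset multiplied by a 45-year interval saturates a 500 ppm clamp, while the same offset over a short interval does not.

```c
#define TOY_MAX_FREQ_PPM 500.0

/* Limit a frequency correction to the range [-500, +500] ppm. */
double toy_clamp_ppm(double ppm)
{
    if (ppm > TOY_MAX_FREQ_PPM)
        return TOY_MAX_FREQ_PPM;
    if (ppm < -TOY_MAX_FREQ_PPM)
        return -TOY_MAX_FREQ_PPM;
    return ppm;
}

/*
 * Toy frequency steer: proportional to the measured offset times the
 * elapsed interval since the previous update, then clamped.
 */
double toy_freq_steer_ppm(double offset_us, double interval_s)
{
    const double gain = 1e-6;   /* hypothetical scaling factor */

    return toy_clamp_ppm(gain * offset_us * interval_s);
}
```

In this model a 1000 microsecond offset over 64 seconds gives a small correction, whereas the same offset over roughly 45 years hits the clamp. Measuring the interval with a monotonic counter keeps the interval small even after a large clock step.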
* MFC r284887: [kib, 2015-07-11, 1 file, -1/+21]
    Handle errors from background write of the cylinder group blocks.
    MFC r284927: Simplify code.
    Approved by: re (gjb)
* MFC r284297: several lockstat improvements [avg, 2015-07-01, 3 files, -38/+103]
* MFC r284495: [kib, 2015-07-01, 1 file, -19/+27]
    Keep a vnode which is freed but still owing inactivation, on the active list. This closes a race where such vnode is not msync-ed until reboot.
* MFC r284719: [kib, 2015-06-30, 1 file, -7/+9]
    Only take previous buffer queue lock (olock) when needed for REMFREE in binsfree().
* MFC r279444: [neel, 2015-06-28, 1 file, -8/+10]
    Allow passthrough devices to be hinted.
    MFC r279683: When ICW1 is issued the edge sense circuit is reset which means that following an initialization a low-to-high transition is necessary to generate an interrupt.
    MFC r279925: Add -p parameter to list PCI device to pass through to the guest.
    MFC r281559: Fix handling of BUS_PROBE_NOWILDCARD in 'device_probe_child()'.
    MFC r280447: When fetching an instruction in non-64bit mode, consider the value of the code segment base address.
    MFC r280725: Move legacy interrupt allocation for virtio devices to common code.
    MFC r280775: Fix the RTC device model to operate correctly in 12-hour mode.
    MFC r280929: Fix "MOVS" instruction memory to MMIO emulation.
    MFC r280968: Display instruction bytes and %rip prior to aborting due to an instruction emulation error.
    MFC r281145: Enhance the support for Group 1 Extended opcodes for CMP, AND, OR instructions.
    MFC r281542: Initialize 'error' before use (Coverity IDs 1249748, 1249747, 1249751, 1249749)
    MFC r281561: Prior to aborting due to an ioport error, it is always interesting to see what the guest's %rip is.
    MFC r281611: If the number of guest vcpus is less than '1' then flag it as an error.
    MFC r281612: Prefer 'vcpu_should_yield()' over checking 'curthread->td_flags' directly.
    MFC r281630: Relax the check on which vectors can be delivered through the APIC. According to the Intel SDM vectors 16 through 255 are allowed to be delivered via the local APIC.
    MFC r281879: Missing break in switch case (Coverity ID 1292499)
    MFC r281946: Don't allow guest to modify readonly bits in the PCI config 'status' register.
    MFC r281987: STOS/STOSB/STOSW/STOSD/STOSQ instruction emulation.
    MFC r282206: Implement the century byte in the RTC.
* When using KTRACE, set a variable to the appropriate value and don't leave it initialized at NULL. [tuexen, 2015-06-22, 1 file, -0/+8]
    Since the affected functions were moved from sys/kern/uipc_syscalls.c to sys/netinet/sctp_syscalls.c it was not possible to MFC r284613. Therefore, this is a direct commit with the corresponding changes of r284613.
    Reported by: Coverity
    CID: 1018058, 1018060
* MFC r282213: [trasz, 2015-06-21, 12 files, -80/+287]
    Add kern.racct.enable tunable and RACCT_DISABLED config option. The point of this is to be able to add RACCT (with RACCT_DISABLED) to GENERIC, to avoid having to rebuild the kernel to use rctl(8).
    MFC r282901: Build GENERIC with RACCT/RCTL support by default. Note that it still needs to be enabled by adding "kern.racct.enable=1" to /boot/loader.conf.
    Note those two are MFC-ed together, because the latter one changes the name of RACCT_DISABLED option to RACCT_DEFAULT_TO_DISABLED. Should have committed the renaming separately...
    Relnotes: yes
    Sponsored by: The FreeBSD Foundation
* MFC r284127: [markj, 2015-06-21, 1 file, -4/+13]
    witness: don't warn about matrix inconsistencies without holding the mutex
    Lock order checking is done without the witness mutex held, so multiple threads that are racing to establish a new lock order may read matrix entries that are in an inconsistent state. Don't print a warning in this case, but instead just redo the check after taking the witness lock.
* MFC r284178: [kib, 2015-06-18, 1 file, -41/+68]
    Add barriers when updating and reading th_generation.
    MFC r284256: Tweaks for r284178.
* MFC, r284192: [ken, 2015-06-16, 1 file, -0/+63]
    ------------------------------------------------------------------------
    r284192 | ken | 2015-06-09 15:39:38 -0600 (Tue, 09 Jun 2015) | 102 lines
    Add support for reading MAM attributes to camcontrol(8) and libcam(3).
    MAM is Medium Auxiliary Memory and is most commonly found as flash chips on tapes.
    This includes support for reading attributes and decoding most known attributes, but does not yet include support for writing attributes or reporting attributes in XML format.
    libsbuf/Makefile: Add subr_prf.c for the new sbuf_hexdump() function. This function is essentially the same function.
    libsbuf/Symbol.map: Add a new shared library minor version, and include the sbuf_hexdump() function.
    libsbuf/Version.def: Add version 1.4 of the libsbuf library.
    libutil/hexdump.3: Document sbuf_hexdump() alongside hexdump(3), since it is essentially the same function.
    camcontrol/Makefile: Add attrib.c.
    camcontrol/attrib.c: Implementation of READ ATTRIBUTE support for camcontrol(8).
    camcontrol/camcontrol.8: Document the new 'camcontrol attrib' subcommand.
    camcontrol/camcontrol.c: Add the new 'camcontrol attrib' subcommand.
    camcontrol/camcontrol.h: Add a function prototype for scsiattrib().
    share/man/man9/sbuf.9: Document the existence of sbuf_hexdump() and point users to the hexdump(3) man page for more details.
    sys/cam/scsi/scsi_all.c: Add a table of known attributes, text descriptions and handler functions. Add a new scsi_attrib_sbuf() function along with a number of other related functions that help decode attributes. scsi_attrib_ascii_sbuf() decodes ASCII format attributes. scsi_attrib_int_sbuf() decodes binary format attributes, and will pass them off to scsi_attrib_hexdump_sbuf() if they're bigger than 8 bytes. scsi_attrib_vendser_sbuf() decodes the vendor and drive serial number attribute. scsi_attrib_volcoh_sbuf() decodes the Volume Coherency Information attribute that LTFS writes out.
    sys/cam/scsi/scsi_all.h: Add a number of attribute-related structure definitions and other defines. Add function prototypes for all of the functions added in scsi_all.c.
    sys/kern/subr_prf.c: Add a new function, sbuf_hexdump(). This is the same as the existing hexdump(9) function, except that it puts the result in an sbuf. This also changes subr_prf.c so that it can be compiled in userland for inclusion in libsbuf. We should work to change this so that the kernel hexdump implementation is a wrapper around sbuf_hexdump() with a statically allocated sbuf with a drain. That will require a drain function that goes to the kernel printf() buffer that can take a non-NUL terminated string as input. That is because an sbuf isn't NUL-terminated until it is finished, and we don't want to finish it while we're still using it. We should also work to consolidate the userland hexdump and kernel hexdump implementations, which are currently separate. This would also mean making applications that currently link in libutil link in libsbuf.
    sys/sys/sbuf.h: Add the prototype for sbuf_hexdump(), and add another copy of the hexdump flag values if they aren't already defined. Ideally the flags should be defined in one place but the implementation makes it difficult to do properly. (See above.)
    Sponsored by: Spectra Logic Corporation
    ------------------------------------------------------------------------
* MFC r283889,r283891: [delphij, 2015-06-15, 1 file, -0/+1]
    Clear p_stops when doing PT_DETACH and PROCFS_CTL_DETACH. Without this, if a process was being traced by truss(1), which uses different p_stops bits than gdb(1), the latter would misbehave because of the unexpected bits.
    Reported by: jceel
    Submitted by: sef
    Sponsored by: iXsystems, Inc.
* MFC 283546: [jhb, 2015-06-13, 4 files, -1/+88]
    Add KTR tracing for some MI ptrace events.
* Add chunk missed in the r284199. [kib, 2015-06-10, 1 file, -0/+1]
* MFC r283602: [kib, 2015-06-10, 2 files, -16/+17]
    Prevent dounmount() from acting on the freed (although type-stable) memory by changing the interface to require the mount point to be referenced.
    MFC r283629: Add missed {}.
* MFC r283601: [kib, 2015-06-10, 1 file, -10/+10]
    Add V_MNTREF flag, to indicate that caller of vn_start*_write() already owns a reference on the mount point, and the functions can consume it.
* MFC r283600: [kib, 2015-06-10, 4 files, -0/+15]
    Perform SU cleanup in the AST handler. Do not sleep waiting for SU cleanup while owning vnode lock.
    On MFC, for KBI stability, td_su member was moved to the end of the struct thread.
* MFC r283115 [asomers, 2015-06-09, 1 file, -5/+5]
    Properly null-terminate strings in a kernel dump header.
    A version string longer than 192 bytes will cause the version field of a dump header to overflow. strncpy doesn't null terminate it, so savecore will print a corrupted info file. Using strlcpy fixes the bug.
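The termination problem in r283115 is easy to reproduce in portable C. In this sketch `toy_strlcpy()` is a hypothetical helper written here to behave like strlcpy(3) on systems that lack it, and the 8-byte field is deliberately tiny (the commit concerns a 192-byte field).

```c
#include <string.h>

#define TOY_FIELD_SIZE 8   /* deliberately small for illustration */

/* Plain strncpy: fills the field but adds no NUL if src is too long. */
void toy_copy_strncpy(char *dst, const char *src, size_t dstsize)
{
    strncpy(dst, src, dstsize);
}

/* Bounded copy that always terminates; returns strlen(src). */
size_t toy_strlcpy(char *dst, const char *src, size_t dstsize)
{
    size_t srclen = strlen(src);

    if (dstsize > 0) {
        size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}

/* Returns 1 if a NUL byte appears within the first size bytes. */
int toy_is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

Copying a string longer than the field with `toy_copy_strncpy()` leaves no terminator inside the field, while `toy_strlcpy()` truncates and terminates. Its return value also lets the caller detect the truncation.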
* MFC r283735: [kib, 2015-06-05, 4 files, -11/+3]
    Remove several write-only variables.
* MFC r283745: [kib, 2015-06-05, 1 file, -3/+6]
    Do not raise priority of the idle thread on signal delivery.
* MFC r259438 by pjd: Fix syscalls that can be loaded as kernel modules [emaste, 2015-06-03, 2 files, -17/+17]
    They were not given the flag allowing to call them from capability mode sandbox.
    And regenerate init_sysent.c
    Sponsored by: The FreeBSD Foundation
* MFC r261220 by csjp: Allow sigwait(2) in capabilities mode. [emaste, 2015-06-03, 2 files, -1/+2]
    It's common for multi-threaded processes to create a thread for the purpose of synchronously processing signals. Allow such processes to utilize a capabilities sandbox.
* MFC r259436,259437 by pjd: Allow for pselect(2) in capability mode. [emaste, 2015-06-03, 2 files, -2/+3]
* Regen for r283940. [emaste, 2015-06-03, 1 file, -1/+1]
* MFC r257736 (by pjd): [emaste, 2015-06-03, 1 file, -8/+1]
    - Remove mac_get_fd/mac_set_fd - those are not syscalls. The __mac_get_fd() and __mac_set_fd() syscalls are listed earlier.
    - Correct typo in syscall name. It should be sched_rr_get_interval, not sched_rr_getinterval.
* MFC r283320: [kib, 2015-05-30, 1 file, -5/+2]
    Always obey thread request to not stop on non-boundary.
* MFC r281915: [markj, 2015-05-29, 1 file, -2/+1]
    Make vpanic() externally visible.
    MFC r281916: Fix DTrace's panic() action.
* MFC r282708: [kib, 2015-05-24, 1 file, -47/+58]
    On exec, single-threading must be enforced before arguments space is allocated from exec_map.
* MFC r279728, r279729, r279756, r279773, r282424, r281367: [ian, 2015-05-24, 1 file, -4/+36]
    Add mutex support to the pps_ioctl() API in the kernel.
    Add PPS support to USB serial drivers.
    Use correct mode variable for PPS support.
    Switch polarity of USB serial PPS events.
    The ftdi "get latency" and "get bitmode" device commands are read operations, not writes.
    Implement a mechanism for making changes in the kernel<->driver PPS interface without breaking ABI or API compatibility with existing drivers.
    Bump version number to indicate the new PPS ABI version changes in the pps_state structure.
* MFC r274711: [ian, 2015-05-23, 1 file, -0/+7]
    Stop using early_putc immediately after configuring console with cninit()
* MFC r282690: [kib, 2015-05-23, 1 file, -2/+4]
    Call uma_reclaim() from the additional pagedaemon thread to reclaim kmem arena address space.
* MFC r282944: [kib, 2015-05-22, 1 file, -21/+30]
    Decrement p_boundary_count in the single-threading thread, while making other threads runnable. This guarantees that upon return from the thread_single_end(), p_boundary_count is zero.
* MFC r282594: [ae, 2015-05-21, 1 file, -0/+1]
    m_dup() is supposed to give a writable copy of an mbuf chain. It uses m_dup_pkthdr(), that uses M_COPYFLAGS mask to copy m_flags field. If original mbuf chain has M_RDONLY flag, its copy also will have it. Reset this flag explicitly.
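The flag handling described in r282594 comes down to a mask followed by an explicit clear. The sketch below uses hypothetical `TOY_` flag names and values rather than the real mbuf definitions, so it only illustrates the bit manipulation.

```c
/* Hypothetical flag bits, standing in for the real m_flags values. */
#define TOY_M_PKTHDR    0x02u
#define TOY_M_EOR       0x04u
#define TOY_M_RDONLY    0x08u

/* Flags that a header copy carries over from the source. */
#define TOY_M_COPYFLAGS (TOY_M_PKTHDR | TOY_M_EOR | TOY_M_RDONLY)

/* Header copy: keeps every source flag that is in the copy mask. */
unsigned toy_copy_flags(unsigned src_flags)
{
    return src_flags & TOY_M_COPYFLAGS;
}

/* Duplicate meant to be writable: same copy, then drop read-only. */
unsigned toy_dup_flags(unsigned src_flags)
{
    unsigned flags = toy_copy_flags(src_flags);

    flags &= ~TOY_M_RDONLY;   /* reset the flag explicitly */
    return flags;
}
```

Because the read-only bit is part of the copy mask, the plain copy inherits it, and the duplicate has to clear it afterwards.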
* MFC r280495: [hselasky, 2015-05-21, 1 file, -16/+47]
    Implement a simple OID number garbage collector. Given the increasing number of dynamically created and destroyed SYSCTLs during runtime it is very likely that the current new OID number limit of 0x7fffffff can be reached. Especially if dynamic OID creation and destruction results from automatic tests.
    Additional changes:
    - Optimize the typical use case by decrementing the next automatic OID sequence number instead of incrementing it. This saves searching time when inserting new OIDs into a fresh parent OID node.
    - Add simple check for duplicate non-automatic OID numbers.
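The ideas in r280495 (automatic numbers that count down, wrap around, and skip numbers still in use, plus a duplicate check for explicit numbers) can be sketched with a small fixed-size table. Everything named `toy_` or `TOY_` is hypothetical, including the range bounds, and the kernel's actual data structures and search are not reproduced here.

```c
#define TOY_AUTO_MIN     0x100        /* assumed lower bound of the automatic range */
#define TOY_AUTO_MAX     0x7fffffff   /* limit mentioned in the commit message */
#define TOY_OID_AUTO     (-1)         /* request an automatic number */
#define TOY_OID_FAIL     (-2)         /* duplicate number or table full */
#define TOY_MAX_CHILDREN 16

struct toy_parent {
    int used[TOY_MAX_CHILDREN];   /* numbers currently assigned */
    int count;
    int next_auto;                /* next automatic candidate, counts down */
};

static int toy_in_use(const struct toy_parent *p, int number)
{
    for (int i = 0; i < p->count; i++)
        if (p->used[i] == number)
            return 1;
    return 0;
}

/* Step down by one, wrapping to the top so that old numbers get reused. */
static int toy_prev(int number)
{
    return number > TOY_AUTO_MIN ? number - 1 : TOY_AUTO_MAX;
}

/* Returns the assigned number, or TOY_OID_FAIL. */
int toy_oid_register(struct toy_parent *p, int requested)
{
    int number = requested;

    if (p->count == TOY_MAX_CHILDREN)
        return TOY_OID_FAIL;
    if (requested == TOY_OID_AUTO) {
        while (toy_in_use(p, p->next_auto))   /* skip live numbers */
            p->next_auto = toy_prev(p->next_auto);
        number = p->next_auto;
        p->next_auto = toy_prev(number);
    } else if (toy_in_use(p, requested)) {
        return TOY_OID_FAIL;                  /* duplicate explicit number */
    }
    p->used[p->count++] = number;
    return number;
}
```

A parent initialized with `next_auto` set to `TOY_AUTO_MAX` hands out the top number first and works downward, and a second registration of the same explicit number is rejected.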
* MFC r282679: [kib, 2015-05-16, 1 file, -0/+23]
    Do not return from thread_single(SINGLE_BOUNDARY) until all stopped threads are guaranteed to be removed from the processors.
* MFC: r281960 [rmacklem, 2015-05-14, 1 file, -6/+8]
    MAXBSIZE defines both the largest UFS block size and the largest size for a buffer in the buffer cache. This patch defines a new constant MAXBCACHEBUF, which is the largest size for a buffer in the buffer cache. Having a separate constant allows MAXBCACHEBUF to be set larger than MAXBSIZE on a per-architecture basis, so that NFS can do larger read/writes for these architectures. It modifies sys/param.h so that BKVASIZE can also be set on a per-architecture basis. A couple of cases where NFS used MAXBSIZE instead of NFS_MAXBSIZE are fixed as well.
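A per-architecture override of the kind described in r281960 is commonly written as a default that applies only when nothing was defined earlier. The snippet below is a sketch of that pattern with an illustrative value for MAXBSIZE. The exact contents of sys/param.h are not shown in this log, and `toy_fits_in_buffer_cache()` and the `#error` check are additions made here for illustration.

```c
/* Illustrative value; an architecture header could define
 * MAXBCACHEBUF before this point to raise the cache limit. */
#define MAXBSIZE 65536

/* Default: the buffer cache limit equals the UFS block size limit. */
#ifndef MAXBCACHEBUF
#define MAXBCACHEBUF MAXBSIZE
#endif

/* Sanity check added for this sketch: the cache limit may exceed
 * MAXBSIZE but should not fall below it. */
#if MAXBCACHEBUF < MAXBSIZE
#error "MAXBCACHEBUF must be at least MAXBSIZE"
#endif

/* Hypothetical helper: can a request of this size use one buffer? */
int toy_fits_in_buffer_cache(unsigned long size)
{
    return size <= MAXBCACHEBUF;
}
```

With no override the two limits coincide. Defining a larger MAXBCACHEBUF ahead of the default changes only the buffer cache limit and leaves MAXBSIZE alone.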