path: root/sys/kern
Commit message | Author | Age | Files | Lines
* Fix nits in previous commit: | marcel | 2014-10-11 | 1 | -12/+11
  1. Remove initializer for badstack_sbuf_size; it gets set unconditionally.
  2. Remove meaningless comment.
  3. Group witness_count and its sysctl together.
  4. Fix spacing in for statements (space after for and within condition).
  5. Change *all* M_NOWAIT usages in witness_initialize() to M_WAITOK; not
     just those that were newly introduced -- the allocation is assumed to
     succeed for all allocations.
  6. Avoid using uint8_t as the base type in sizeof() expressions; use the
     variable name (w_rmatrix) as much as possible.
  Pointed out by: jhb@ (thanks!)
* Turn WITNESS_COUNT into a tunable and sysctl. This allows adjusting | marcel | 2014-10-11 | 1 | -11/+27
  the value without recompiling the kernel. This is useful when
  recompiling is not possible as an immediate solution. When we run out
  of witness objects, witness is completely disabled. Not having an
  immediate solution can therefore be problematic.
  Submitted by: Sreekanth Rupavatharam <rupavath@juniper.net>
  Obtained from: Juniper Networks, Inc.
* Regenerate after r272823: | marcel | 2014-10-09 | 2 | -6/+6
  Move the SCTP syscalls to netinet with the rest of the SCTP code.
  Submitted by: Steve Kiernan <stevek@juniper.net>
  Reviewed by: tuexen, rrs
  Obtained from: Juniper Networks, Inc.
* Move the SCTP syscalls to netinet with the rest of the SCTP code. The | marcel | 2014-10-09 | 2 | -494/+5
  syscalls themselves are tightly coupled with the network stack and
  therefore should not be in the generic socket code.
  The following four syscalls have been marked as NOSTD so they can be
  dynamically registered in sctp_syscalls_init() function:
    sys_sctp_peeloff
    sys_sctp_generic_sendmsg
    sys_sctp_generic_sendmsg_iov
    sys_sctp_generic_recvmsg
  The syscalls are also set up to be dynamically registered when COMPAT32
  option is configured.
  As a side effect of moving the SCTP syscalls, getsock_cap needs to be
  made available outside of the uipc_syscalls.c source file. A proper
  prototype has been added to the sys/socketvar.h header file.
  API tests from the SCTP reference implementation have been run to
  ensure compatibility.
  (http://code.google.com/p/sctp-refimpl/source/checkout)
  Submitted by: Steve Kiernan <stevek@juniper.net>
  Reviewed by: tuexen, rrs
  Obtained from: Juniper Networks, Inc.
* Add a bus method to fetch the VM domain for the given device/bus. | adrian | 2014-10-09 | 2 | -0/+44
  * Add a bus_if.m method - get_domain() - returning the VM domain or
    ENOENT if the device isn't in a VM domain;
  * Add bus methods to print out the domain of the device if appropriate;
  * Add code in srat.c to save the PXM -> VM domain mapping that's done
    and expose a function to translate VM domain -> PXM;
  * Add ACPI and ACPI PCI methods to check if the bus has a _PXM attribute
    and if so map it to the VM domain;
  * (.. yes, this works recursively.)
  * Have the pci bus glue print out the device VM domain if present.
  Note: this is just the plumbing to start enumerating information -
  it doesn't at all modify behaviour.
  Differential Revision: D906
  Reviewed by: jhb
  Sponsored by: Norse Corp
* Fix draining in ttydev_leave(): | marcel | 2014-10-09 | 1 | -10/+25
  1. ERESTART is not only returned when the revoke count changed. It is
     also returned when a signal is received. While a change in the
     revoke count should be ignored, a signal should not.
  2. Waiting until the output queue is entirely drained can cause a hang
     when the underlying device is stuck or broken.
  Have tty_drain() take care of this by telling it when we're leaving.
  When leaving, tty_drain() will use a timed wait to address point 2
  above and it will check the revoke count to handle point 1 above.
  The timeout is set to 1 second, which is arbitrary and long enough to
  expect a change in the output queue.
  Discussed with: jilles@
  Reported by: Yamagi Burmeister <lists@yamagi.org>
* Apply r269126 to tty_timedwait(): | marcel | 2014-10-09 | 1 | -4/+4
  Don't return ERESTART when the device is gone.
* Add schedgraph traces for callout handlers. Specifically, a callwheel logs | jhb | 2014-10-08 | 1 | -4/+10
  a running event each time it executes a callout function. The event
  includes the function pointer, argument, and whether or not it was run
  from hardware interrupt context. The callwheel is marked idle when each
  handler completes. This effectively logs the duration of each callout
  routine in the graph.
* Make kern.nswbuf tunable from loader. | jkim | 2014-10-07 | 1 | -3/+6
  MFC after: 1 week
* Convert racct stubs to inline functions. | mjg | 2014-10-06 | 1 | -84/+0
  This saves some symbols and function calls for kernels without RACCT.
  MFC after: 1 week
* filedesc: fix up breakage introduced in 272505 | mjg | 2014-10-05 | 1 | -5/+5
  Include sequence counter support unconditionally [1]. This fixes
  reported build problems with e.g. the nvidia driver due to missing
  opt_capsicum.h.
  Replace fishy looking sizeof with offsetof. Make fde_seq the last
  member in order to simplify calculations.
  Suggested by: kib [1]
  X-MFC: with 272505
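  The sizeof-to-offsetof change above can be illustrated with a minimal
  userland sketch. The struct, its members, and the helper name are
  hypothetical stand-ins, not the kernel's definitions; the only idea
  taken from the commit is that placing the counter last lets
  offsetof() give the size of everything that precedes it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor entry; the sequence counter is the last member. */
struct entry {
	void		*file;
	unsigned	 rights;
	unsigned char	 flags;
	unsigned	 seq;	/* must stay last: excluded from payload copies */
};

/* Size of everything before 'seq', padding included. */
#define	ENTRY_PAYLOAD_SIZE	offsetof(struct entry, seq)

/* Copy the payload of 'src' into 'dst' without touching dst->seq. */
static void
entry_copy_payload(struct entry *dst, const struct entry *src)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, ENTRY_PAYLOAD_SIZE);
}
```

  With the counter last, no arithmetic of the form sizeof(struct) minus
  sizeof(member) is needed, which is easy to get wrong when padding is
  involved.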
* On error, sbuf_bcat() returns -1. Some callers returned this -1 to | kib | 2014-10-05 | 2 | -9/+15
  the upper layers, which interpret it as an errno value, which happens
  to be ERESTART. The result was spurious restarts of the sysctls in a
  loop, e.g. kern.proc.proc, instead of returning ENOMEM to the caller.
  Convert -1 from sbuf_bcat() to ENOMEM, when returning to the callers
  expecting errno.
  In collaboration with: pho
  Sponsored by: The FreeBSD Foundation (kib)
  MFC after: 1 week
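  The conversion described above can be sketched in userland as follows.
  append_bytes() is a hypothetical stand-in for a routine that, like
  sbuf_bcat(), signals failure with -1; it is not the sbuf(9) API, and
  the wrapper name is likewise an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical append routine: returns 0 on success and -1 when the
 * fixed-size buffer cannot hold the new bytes.
 */
static int
append_bytes(char *dst, size_t cap, size_t *len, const char *src, size_t n)
{
	assert(dst != NULL && len != NULL && src != NULL && *len <= cap);
	if (n > cap - *len)
		return (-1);
	for (size_t i = 0; i < n; i++)
		dst[*len + i] = src[i];
	*len += n;
	return (0);
}

/*
 * Wrapper for callers that pass the result up as an errno value:
 * the raw -1 is translated to ENOMEM instead of being returned as-is.
 */
static int
append_bytes_errno(char *dst, size_t cap, size_t *len, const char *src,
    size_t n)
{
	return (append_bytes(dst, cap, len, src, n) == 0 ? 0 : ENOMEM);
}
```

  The translation happens at the boundary where the return convention
  changes, so the lower-level routine keeps its own convention.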
* Avoid unnecessary ppeers_lock acquisition in exit1. | mjg | 2014-10-05 | 1 | -10/+12
  MFC after: 1 week
* Get rid of crshared. | mjg | 2014-10-05 | 1 | -11/+1
* Slightly reword comment. Move code, which is described by the | kib | 2014-10-04 | 1 | -5/+4
  comment, after it.
  Discussed with: bde
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
* Add kernel option KSTACK_USAGE_PROF to sample the stack depth on | kib | 2014-10-04 | 1 | -0/+5
  interrupts and report the largest value seen as sysctl
  debug.max_kstack_used. Useful to estimate how close the kernel stack
  size is to overflow.
  In collaboration with: Larry Baird <lab@gta.com>
  Sponsored by: The FreeBSD Foundation (kib)
  MFC after: 1 week
* Fixes for i/o during coredumping: | kib | 2014-10-04 | 2 | -22/+15
  - Do not dump into system files.
  - Do not acquire write reference to the mount point where img.core is
    written, in the coredump(). The vn_rdwr() calls from ELF imgact
    request the write ref from vn_rdwr(). Recursive acquisition of the
    write ref deadlocks with the unmount.
  - Instead, take the range lock for the whole core file. This prevents
    parallel dumping from two processes executing the same image,
    converting the useless interleaved dump into sequential dumping,
    with the second core overwriting the first.
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
* Add IO_RANGELOCKED flag for vn_rdwr(9), which specifies that vnode is | kib | 2014-10-04 | 1 | -7/+10
  not locked, but range is.
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
* Make kevent(2) periodic timer events more reliably periodic. The event | ian | 2014-10-04 | 1 | -5/+9
  callout is now scheduled using the C_ABSOLUTE flag, and the absolute
  time of each event is calculated as the time the previous event was
  scheduled for plus the interval. This ensures that latency in
  processing a given event doesn't perturb the arrival time of any
  subsequent events.
  Reviewed by: jhb
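  The difference between the two scheduling rules can be shown with a
  small simulation. This is an illustrative sketch only: the function
  names are hypothetical, times are abstract ticks, and no callout(9)
  interface is used.

```c
#include <assert.h>
#include <stdint.h>

/* Relative rule: the next deadline is measured from when the handler ran. */
static int64_t
next_relative(int64_t handled_at, int64_t interval)
{
	return (handled_at + interval);
}

/* Absolute rule: the next deadline is the previous deadline plus the interval. */
static int64_t
next_absolute(int64_t scheduled_for, int64_t interval)
{
	return (scheduled_for + interval);
}

/*
 * Return the deadline of the n-th event when every handler runs
 * 'latency' ticks after its deadline.
 */
static int64_t
nth_deadline(int n, int64_t interval, int64_t latency, int absolute)
{
	int64_t scheduled = interval;	/* deadline of the first event */

	assert(n >= 1 && interval > 0 && latency >= 0);
	for (int i = 1; i < n; i++) {
		int64_t handled = scheduled + latency;

		scheduled = absolute ?
		    next_absolute(scheduled, interval) :
		    next_relative(handled, interval);
	}
	return (scheduled);
}
```

  Under the relative rule each handler's latency accumulates into every
  later deadline; under the absolute rule the deadlines stay on exact
  multiples of the interval.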
* Plug capability races. | mjg | 2014-10-04 | 1 | -7/+52
  fp and appropriate capability lookups were not atomic, which could
  result in improper capabilities being checked.
  This could result either in protection bypass or in a spurious
  ENOTCAPABLE.
  Make fp + capability check atomic with the help of sequence counters.
  Reviewed by: kib
  MFC after: 3 weeks
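  The general sequence-counter technique referred to above can be
  sketched with C11 atomics. This is not the kernel's implementation:
  the struct, field names, and functions are hypothetical, and the
  sketch assumes a single writer (or writers serialized by an external
  lock).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical entry pairing a file identifier with its rights. */
struct entry {
	atomic_uint	seq;		/* odd while a writer is mid-update */
	atomic_int	file_id;
	atomic_uint	rights;
};

static void
entry_init(struct entry *e)
{
	atomic_init(&e->seq, 0);
	atomic_init(&e->file_id, -1);
	atomic_init(&e->rights, 0);
}

/* Writer: bump the counter to odd, update both fields, bump it to even. */
static void
entry_write(struct entry *e, int file_id, unsigned rights)
{
	unsigned s = atomic_load_explicit(&e->seq, memory_order_relaxed);

	assert((s & 1) == 0);	/* single-writer assumption */
	atomic_store_explicit(&e->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&e->file_id, file_id, memory_order_relaxed);
	atomic_store_explicit(&e->rights, rights, memory_order_relaxed);
	atomic_store_explicit(&e->seq, s + 2, memory_order_release);
}

/*
 * Reader: one attempt at a consistent snapshot. Returns false if a
 * writer was active or the counter changed while reading; the caller
 * is expected to retry.
 */
static bool
entry_try_read(struct entry *e, int *file_id, unsigned *rights)
{
	unsigned s0 = atomic_load_explicit(&e->seq, memory_order_acquire);

	if (s0 & 1)
		return (false);
	int f = atomic_load_explicit(&e->file_id, memory_order_relaxed);
	unsigned r = atomic_load_explicit(&e->rights, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
	if (atomic_load_explicit(&e->seq, memory_order_relaxed) != s0)
		return (false);
	*file_id = f;
	*rights = r;
	return (true);
}
```

  A reader that gets true knows both fields came from the same update,
  which is the property the commit needs for the file pointer and its
  capability rights.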
* Require p_cansched() for changing a process' protection status via | jhb | 2014-10-02 | 1 | -1/+1
  procctl() rather than p_cansee().
  Submitted by: rwatson
  MFC after: 3 days
* In the syncer, drop the sync mutex while patting the watchdog. | will | 2014-10-01 | 1 | -1/+8
  Some watchdog drivers (like ipmi) need to sleep while patting the
  watchdog. See sys/dev/ipmi/ipmi.c:ipmi_wd_event(), which calls
  malloc(M_WAITOK).
  Submitted by: asomers
  MFC after: 1 month
  Sponsored by: Spectra Logic
  MFSpectraBSD: 637548 on 2012/10/04
* Test for absence of M_NOFREE before attempting to purge the mbuf's tags. | np | 2014-09-30 | 1 | -1/+1
  This will leave more state intact should the assertion go off.
  MFC after: 1 month
* Use bzero instead of explicitly zeroing stuff in do_execve. | mjg | 2014-09-29 | 1 | -22/+1
  While strictly speaking this is not correct since some fields are
  pointers, it makes no difference on all supported archs and we already
  rely on it doing the right thing in other places.
  No functional changes.
* tty_rel_free() can be called more than once for the same tty so make sure | neel | 2014-09-28 | 1 | -6/+6
  that the tty is dequeued from 'tty_list' only the first time.
  The panic below was seen when a revoke(2) was issued on an nmdm device.
  In this case there was also a thread that was blocked on a read(2) on
  the device. The revoke(2) woke up the blocked thread which would
  typically return an error to userspace. In this case the reader also
  held the last reference on the file descriptor so fdrop() ended up
  calling tty_rel_free() via ttydev_close().
  tty_rel_free() then tried to dequeue 'tp' again which led to the panic.
  panic: Bad link elm 0xfffff80042602400 prev->next != elm
  cpuid = 1
  KDB: stack backtrace:
  db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00f9c90460
  kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe00f9c90510
  vpanic() at vpanic+0x189/frame 0xfffffe00f9c90590
  panic() at panic+0x43/frame 0xfffffe00f9c905f0
  tty_rel_free() at tty_rel_free+0x29b/frame 0xfffffe00f9c90640
  ttydev_close() at ttydev_close+0x1f9/frame 0xfffffe00f9c90690
  devfs_close() at devfs_close+0x298/frame 0xfffffe00f9c90720
  VOP_CLOSE_APV() at VOP_CLOSE_APV+0x13c/frame 0xfffffe00f9c90770
  vn_close() at vn_close+0x194/frame 0xfffffe00f9c90810
  vn_closefile() at vn_closefile+0x48/frame 0xfffffe00f9c90890
  devfs_close_f() at devfs_close_f+0x2c/frame 0xfffffe00f9c908c0
  _fdrop() at _fdrop+0x29/frame 0xfffffe00f9c908e0
  sys_read() at sys_read+0x63/frame 0xfffffe00f9c90980
  amd64_syscall() at amd64_syscall+0x2b3/frame 0xfffffe00f9c90ab0
  Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe00f9c90ab0
  --- syscall (3, FreeBSD ELF64, sys_read), rip = 0x800b78d8a, rsp = 0x7fffffbfdaf8, rbp = 0x7fffffbfdb30 ---
  CR: https://reviews.freebsd.org/D851
  Reviewed by: glebius, ed
  Reported by: Leon Dang
  Sponsored by: Nahanni Systems
  MFC after: 1 week
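  The "dequeue only the first time" idea can be sketched with a small
  doubly linked list. The types, the on_list flag, and the function
  names are hypothetical; the commit does not say how the kernel tracks
  whether the tty is still queued, so the flag is an assumption made
  for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node	*prev, *next;
	bool		 on_list;	/* true while linked into a list */
};

/* Circular list with a sentinel head. */
struct list {
	struct node	head;
};

static void
list_init(struct list *l)
{
	l->head.prev = l->head.next = &l->head;
	l->head.on_list = false;
}

static void
list_insert(struct list *l, struct node *n)
{
	assert(!n->on_list);
	n->next = l->head.next;
	n->prev = &l->head;
	l->head.next->prev = n;
	l->head.next = n;
	n->on_list = true;
}

/*
 * Release a node. Safe to call more than once: only the first call
 * unlinks. Returns true if this call performed the unlink.
 */
static bool
node_release(struct node *n)
{
	if (!n->on_list)
		return (false);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
	n->on_list = false;
	return (true);
}

static size_t
list_length(const struct list *l)
{
	size_t count = 0;

	for (const struct node *n = l->head.next; n != &l->head; n = n->next)
		count++;
	return (count);
}
```

  Without the guard, the second release would follow stale prev and
  next pointers, which is the kind of corruption the "Bad link elm"
  panic reports.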
* - Remove empty wrappers ether_poll_[de]register_drv(). [1] | glebius | 2014-09-28 | 1 | -15/+2
  - Move polling(9) declarations out of ifq.h back to if_var.h; they are
    absolutely unrelated to queues.
  Submitted by: Mikhail <mp lenta.ru> [1]
* Make do_dup() static and move relevant macros to kern_descrip.c | mjg | 2014-09-26 | 1 | -1/+8
  No functional changes.
* Don't panic if a resource is allocated twice. Instead, print a warning and | jhb | 2014-09-26 | 1 | -1/+4
  fail the allocation request. Allocations of "reserved" resources such
  as PCI BARs already fail the request instead of panic'ing in this case.
  MFC after: 1 week
* Fix fcntl(2) compat32 after r270691. The copyin and copyout of the | kib | 2014-09-25 | 1 | -15/+20
  struct flock are done in the sys_fcntl(), which means that compat32
  used direct access to userland pointers.
  Move code from sys_fcntl() to new wrapper, kern_fcntl_freebsd(), which
  performs necessary userland memory accesses, and use it from both
  native and compat32 fcntl syscalls.
  Reported by: jhibbits
  Sponsored by: The FreeBSD Foundation
  MFC after: 3 days
* In kern_linkat() and kern_renameat(), do not call namei(9) while | kib | 2014-09-25 | 1 | -29/+56
  holding a write reference on the filesystem. Try to get the write
  reference in a non-blocking way after all vnodes are resolved; if that
  fails, drop all locks and retry after waiting for the suspension to end.
  The VFS_UNMOUNT() methods for UFS and tmpfs try to establish suspension
  on unmount, while the covered vnode is locked by VFS, which prevents
  namei() from stepping over the mount point. The thread doing namei()
  sleeps on the covered vnode lock, owning the write ref.
  Reported by: bdrewery
  Tested by: bdrewery (previous version), pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
* Stage one of multipass suspend/resume | jhibbits | 2014-09-23 | 2 | -3/+59
  Summary:
  Add the beginnings of multipass suspend/resume, by introducing
  BUS_SUSPEND_CHILD/BUS_RESUME_CHILD, and move the PCI driver to this.
  Reviewers: jhb
  Reviewed By: jhb
  Differential Revision: https://reviews.freebsd.org/D590
* Add a new fo_fill_kinfo fileops method to add type-specific information to | jhb | 2014-09-22 | 10 | -626/+463
  struct kinfo_file.
  - Move the various fill_*_info() methods out of kern_descrip.c and into
    the various file type implementations.
  - Rework the support for kinfo_ofile to generate a suitable kinfo_file
    object for each file and then convert that to a kinfo_ofile structure
    rather than keeping a second, different set of code that directly
    manipulates type-specific file information.
  - Remove the shm_path() and ksem_info() layering violations.
  Differential Revision: https://reviews.freebsd.org/D775
  Reviewed by: kib, glebius (earlier version)
* Convert from timeout(9) to callout(9). | jhb | 2014-09-22 | 1 | -1/+9
* Improve TCP segmentation offload (TSO) algorithm in general. | hselasky | 2014-09-22 | 1 | -0/+31
  The current TSO limitation feature only takes the total number of bytes
  in an mbuf chain into account and does not limit by the number of mbufs
  in a chain. Some kinds of hardware are limited by two factors. One is
  the fragment length and the second is the fragment count. Both of these
  limits need to be taken into account when doing TSO. Otherwise some
  kinds of hardware might have to drop completely valid mbuf chains
  because they cannot be loaded into the given hardware's DMA engine.
  The new way of doing TSO limitation has been made backwards compatible,
  following input from other FreeBSD developers, and will use defaults
  for values not set.
  Reviewed by: adrian, rmacklem
  Sponsored by: Mellanox Technologies
  MFC after: 1 week
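  The three limits named above (total bytes, fragment count, fragment
  length) can be checked together as in the sketch below. The struct
  and function names are hypothetical and do not mirror the mbuf or
  ifnet interfaces; the sketch only shows why a byte limit alone is not
  enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fragment chain standing in for an mbuf chain. */
struct frag {
	size_t			 len;
	const struct frag	*next;
};

/* Hypothetical per-device limits. */
struct tso_limits {
	size_t	max_bytes;	/* total payload per chain */
	size_t	max_frags;	/* number of fragments per chain */
	size_t	max_frag_len;	/* size of any single fragment */
};

/* Return true if the chain satisfies all three limits. */
static bool
chain_fits(const struct frag *f, const struct tso_limits *lim)
{
	size_t bytes = 0, count = 0;

	assert(lim != NULL);
	for (; f != NULL; f = f->next) {
		if (f->len > lim->max_frag_len)
			return (false);
		bytes += f->len;
		count++;
		if (bytes > lim->max_bytes || count > lim->max_frags)
			return (false);
	}
	return (true);
}
```

  A chain of many small fragments can be well under the byte limit and
  still exceed what a DMA engine with a fixed number of descriptors per
  packet can accept.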
* svn revisions r269964 and r269963 seemed to have impaired small memory | sbruno | 2014-09-22 | 1 | -0/+2
  footprint systems (32M/64M) and didn't leave enough free memory to load
  modules when it was setting up page tables for sizes that are never
  used on these smallish boards.
  Set kmem_zmax to PAGE_SIZE on these smaller systems (< 128M) to keep
  this from happening.
  Verified on mips32 h/w.
  PR: 193465
  Submitted by: delphij
  Reviewed by: adrian
* Rephrase r271616 comments. | mav | 2014-09-17 | 1 | -2/+2
  Submitted by: alc
  MFC after: 1 month
* Migrate ie->ie_assign_cpu and associated code to use an int for CPU rather | adrian | 2014-09-17 | 1 | -3/+3
  than u_char.
  Migrate post_filter to use an int for a CPU rather than u_char.
  Change intr_event_bind() to use an int for CPU rather than u_char.
  It touches the ppc, sparc64, arm and mips machdep code but it should
  (hah!) be a no-op.
  Tested:
  * i386, AMD64 laptops
  Reviewed by: jhb
* Modify cpuset_setithread() to take a CPU ID as an integer, not a char. | adrian | 2014-09-16 | 1 | -1/+1
  We're going to end up having > 254 CPUs at some point.
* Validate the mode argument in access, eaccess, and faccessat for optional | ngie | 2014-09-16 | 1 | -0/+3
  POSIX compliance and to improve compatibility with Linux and NetBSD.
  The issue was identified with lib/libc/sys/t_access:access_inval from
  NetBSD.
  Update the manpage accordingly.
  PR: 181155
  Reviewed by: jilles (code), jmmv (code), wblock (manpage), wollman (code)
  MFC after: 4 weeks
  Phabric: D678 (code), D786 (manpage)
  Sponsored by: EMC / Isilon Storage Division
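  A mode check of the kind described above might look like the
  following userland sketch. The function name is hypothetical and the
  exact kernel test is not shown in the commit message; the sketch
  assumes the POSIX rule that the mode is either F_OK or a bitwise OR
  of R_OK, W_OK, and X_OK.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>	/* F_OK, R_OK, W_OK, X_OK */

/*
 * Return 0 if 'amode' is a valid access(2) mode, EINVAL otherwise.
 * Any bit outside R_OK | W_OK | X_OK makes the mode invalid.
 */
static int
check_access_mode(int amode)
{
	if (amode != F_OK && (amode & ~(R_OK | W_OK | X_OK)) != 0)
		return (EINVAL);
	return (0);
}
```

  Rejecting unknown bits up front means a caller passing garbage gets
  EINVAL rather than a result computed from whichever bits happened to
  be recognized.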
* Add comments describing r271604 change. | mav | 2014-09-15 | 1 | -0/+12
  MFC after: 3 days
* Add a couple of memory barriers to serialize tdq_cpu_idle and tdq_load accesses. | mav | 2014-09-14 | 1 | -0/+2
  This change fixes transient performance drops in some of my benchmarks,
  which vanished as soon as I tried to collect any stats from the
  scheduler. It looks like reordered access to those variables sometimes
  caused loss of IPI_PREEMPT, which delayed thread execution until some
  later interrupt.
  MFC after: 3 days
* Fix error handling in cpuset_setithread() introduced in r267716. | melifaro | 2014-09-13 | 1 | -6/+17
  Noted by: kib
  MFC after: 1 week
* Fix various issues with invalid file operations: | jhb | 2014-09-12 | 7 | -225/+55
  - Add invfo_rdwr() (for read and write), invfo_ioctl(), invfo_poll(),
    and invfo_kqfilter() for use by file types that do not support the
    respective operations. Home-grown versions of invfo_poll() were
    universally broken (they returned an errno value, invfo_poll() uses
    poll_no_poll() to return an appropriate event mask). Home-grown ioctl
    routines also tended to return an incorrect errno (invfo_ioctl
    returns ENOTTY).
  - Use the invfo_*() functions instead of local versions for unsupported
    file operations.
  - Reorder fileops members to match the order in the structure
    definition to make it easier to spot missing members.
  - Add several missing methods to linuxfileops used by the OFED shim
    layer: fo_write(), fo_truncate(), fo_kqfilter(), and fo_stat(). Most
    of these used invfo_*(), but a dummy fo_stat() implementation was
    added.
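  The pattern of sharing one "invalid operation" handler across file
  types can be sketched as below. All names and signatures here are
  hypothetical and much simpler than the kernel's struct fileops; the
  only detail taken from the commit is that the shared ioctl handler
  reports ENOTTY.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct obj;

/* Hypothetical operations table; every slot must be filled in. */
struct obj_ops {
	int	(*op_read)(struct obj *, void *buf, size_t len);
	int	(*op_ioctl)(struct obj *, unsigned long cmd, void *data);
};

struct obj {
	const struct obj_ops	*ops;
};

/* Shared handler for object types that do not support ioctl. */
static int
inv_ioctl(struct obj *o, unsigned long cmd, void *data)
{
	(void)o;
	(void)cmd;
	(void)data;
	return (ENOTTY);
}

/* A type that supports read but not ioctl. */
static int
sem_read(struct obj *o, void *buf, size_t len)
{
	(void)o;
	(void)buf;
	(void)len;
	return (0);
}

/* Members are listed in the same order as in the structure definition. */
static const struct obj_ops sem_ops = {
	.op_read = sem_read,
	.op_ioctl = inv_ioctl,
};

/* Dispatch through the table; no NULL checks per call site are needed. */
static int
obj_ioctl(struct obj *o, unsigned long cmd, void *data)
{
	assert(o != NULL && o->ops != NULL && o->ops->op_ioctl != NULL);
	return (o->ops->op_ioctl(o, cmd, data));
}
```

  Using one shared handler per unsupported operation keeps the error
  value consistent, where home-grown copies tend to drift.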
* Tweak pipe_truncate() to more closely match pipe_chown() and pipe_chmod() | jhb | 2014-09-12 | 1 | -4/+8
  by checking PIPE_NAMED and using invfo_truncate() for unnamed pipes.
* Simplify vntype_to_kinfo() by returning when the desired value is found | jhb | 2014-09-12 | 1 | -5/+2
  instead of breaking out of the loop and then immediately checking the
  loop index to decide whether the loop was exited early, so that the
  proper value can be returned. While here, use nitems().
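  The simplification described above follows a common table-lookup
  shape, sketched here in userland. The table contents and function
  names are hypothetical; nitems() is defined locally because the
  kernel macro of that name is not available outside FreeBSD's
  sys/param.h.

```c
#include <assert.h>
#include <stddef.h>

#ifndef nitems
#define	nitems(x)	(sizeof((x)) / sizeof((x)[0]))
#endif

#define	TYPE_UNKNOWN	(-1)

struct type_pair {
	int	from;
	int	to;
};

/* Hypothetical mapping table. */
static const struct type_pair type_map[] = {
	{ 1, 10 },
	{ 2, 20 },
	{ 3, 30 },
};

/* Return as soon as a match is found; no post-loop index check needed. */
static int
lookup(const struct type_pair *tbl, size_t n, int from)
{
	assert(tbl != NULL);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].from == from)
			return (tbl[i].to);
	return (TYPE_UNKNOWN);
}

static int
map_type(int from)
{
	return (lookup(type_map, nitems(type_map), from));
}
```

  Returning from inside the loop removes the need to compare the index
  against the table size afterwards, and nitems() keeps the bound in
  step with the table.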
* Remove unused arguments for VOP_GETPAGES(), VOP_PUTPAGES(). | glebius | 2014-09-10 | 2 | -4/+0
* Avoid unlocking unlocked mutex in RCTL jail code. Specific test case | trasz | 2014-09-09 | 1 | -2/+4
  is attached to PR.
  PR: 193457
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
* - Make hhook_run_socket() vnet-aware instead of adding CURVNET_SET() around | hrs | 2014-09-08 | 1 | -31/+20
    the function calls.
  - Fix a memory leak and stats in the case that hhook_run_socket() fails
    in soalloc().
  PR: 193265
* pause_sbt(): Take the cold path (ie. use DELAY()) if KDB is active | dumbbell | 2014-09-08 | 1 | -1/+1
  This fixes a panic in the i915 driver when one uses debug.kdb.enter=1
  under vt(4).
  PR: 193269
  Reported by: emaste@
  Submitted by: avg@
  MFC after: 3 days
* Fix for r271182. | glebius | 2014-09-07 | 1 | -4/+6
  Submitted by: mjg
  Pointy hat to: me, submitter and everyone who urged me to commit