path: root/sys/kern
Commit message | Author | Age | Files | Lines
* Introduce the sysclock_getsnapshot() and sysclock_snap2bintime() KPIs. The | lstewart | 2011-12-24 | 2 | -6/+142
  sysclock_getsnapshot() function allows the caller to obtain a snapshot of all
  the system clock and timecounter state required to create time stamps at a
  later point. The sysclock_snap2bintime() function converts a previously
  obtained snapshot into a bintime time stamp according to the specified flags,
  e.g. which system clock, uptime vs absolute time, etc.

  These KPIs enable useful functionality, including direct comparison of the
  feedback and feed-forward system clocks and generation of multiple time
  stamps with different formats from a single timecounter read.

  Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
  Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
  Clock Synchronization Algorithms" project.

  For more information, see http://www.synclab.org/radclock/

  In collaboration with: Julien Ridoux (jridoux at unimelb edu au)
* Add post-VOP hooks for VOP_DELETEEXTATTR() and VOP_SETEXTATTR() and use | jhb | 2011-12-23 | 2 | -0/+20
  these to trigger a NOTE_ATTRIB EVFILT_VNODE kevent when the extended
  attributes of a vnode are changed.

  Note that OS X already implements this behavior.

  Reviewed by: rwatson
  MFC after: 2 weeks
* Use TASK_INITIALIZER() for dev_dtr_task rather than a dedicated SYSINIT(). | jhb | 2011-12-22 | 1 | -10/+2
* ule: ensure that batch timeshare threads are scheduled fairly | avg | 2011-12-19 | 1 | -2/+2
  With the previous code, if the range of priorities for timeshare batch
  threads was greater than RQ_NQS, then the threads with low priorities in the
  part of the range above RQ_NQS would be scheduled to the run-queues as if
  they had high priorities at the beginning of the range. In other words,
  threads with a nice level of +N could be scheduled as if they had a nice
  level of -M.

  Reported by: George Mitchell <george@m5p.com>
  Reviewed by: jhb
  Tested by: George Mitchell <george@m5p.com> (earlier version)
  MFC after: 1 week
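The wraparound described in this message can be illustrated with a small userland model. This is only a sketch: the constants `NQS` and `BATCH_RANGE` and both function names are hypothetical stand-ins, and the two mappings are simplified illustrations of "divide, then wrap" versus "scale the whole range", not the scheduler's actual code.

```c
/* Hypothetical constants chosen for illustration; not the kernel's values. */
enum {
	NQS = 64,		/* number of run queues (stands in for RQ_NQS) */
	BATCH_RANGE = 88	/* number of batch timeshare priorities */
};

/*
 * Wrapping model: divide by priorities-per-queue, then wrap modulo NQS.
 * When BATCH_RANGE > NQS, offsets at the end of the range land on the
 * same queues as offsets at the beginning of the range.
 */
static int
queue_index_wrapping(int offset)
{
	int ppq = BATCH_RANGE / NQS;	/* integer division yields 1 here */

	return ((offset / ppq) % NQS);
}

/* Scaled model: spread the whole priority range across the NQS queues. */
static int
queue_index_scaled(int offset)
{
	return (NQS * offset / BATCH_RANGE);
}
```

In this model, offset 70 (a low priority) wraps onto a queue ahead of offset 10, whereas the scaled mapping keeps queue order consistent with priority order and never exceeds `NQS - 1`.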
* Fix style and white spaces. | trociny | 2011-12-17 | 1 | -14/+14
  MFC after: 1 week
* On start most of sysctl_kern_proc functions use the same pattern: | trociny | 2011-12-17 | 1 | -112/+80
  locate a process calling pfind() and do some additional checks like
  p_candebug(). To reduce this code duplication a new function pget() is
  introduced and used.

  As the function may be useful not only in kern_proc.c it is in the kernel
  name space.

  Suggested by: kib
  Reviewed by: kib
  MFC after: 2 weeks
* belatedly transfer copyrights from libkern/gets.c to kern_cons.c | avg | 2011-12-17 | 1 | -0/+3
  MFC after: 2 months
  MFC with: r228642
* replace uses of libkern gets with cngets | avg | 2011-12-17 | 1 | -2/+2
  MFC after: 2 months
* introduce cngets, a method for kernel to read a string from console | avg | 2011-12-17 | 1 | -0/+49
  This is intended as a replacement for libkern's gets and mostly borrows its
  implementation. It uses cngrab/cnungrab to delimit kernel's access to
  console input.

  Note: libkern's gets obviously doesn't share any bits of implementation
  with libc's gets. They also have different APIs and the former doesn't have
  the overflow problems of the latter.

  Inspired by: bde
  MFC after: 2 months
* introduce cngrab/cnungrab stub calls in some places where they make sense | avg | 2011-12-17 | 3 | -0/+10
  MFC after: 2 months
* kern cons: introduce infrastructure for console grabbing by kernel | avg | 2011-12-17 | 1 | -0/+26
  At the moment grab and ungrab methods of all console drivers are no-ops.

  Current intended meaning of the calls is that the kernel takes control of
  console input. In the future the semantics may be extended to mean that the
  calling thread takes full ownership of the console (e.g. console output from
  other threads could be suspended).

  Inspired by: bde
  MFC after: 2 months
* Fire a kevent if necessary after seeking on a regular file. This fixes a | jhb | 2011-12-16 | 1 | -0/+1
  case where a kevent would not fire on a regular file if an application read
  to EOF and then seeked backwards into the file.

  Reviewed by: kib
  MFC after: 2 weeks
* Use vm_mmap_to_errno(). | jhb | 2011-12-15 | 1 | -9/+2
  Submitted by: kib
* Fix select/poll/kqueue for write on reverse direction before first write. | jilles | 2011-12-14 | 1 | -2/+4
  The reverse direction of a pipe is lazily allocated on the first write in
  that direction (because pipes are usually used in one direction only). A
  special case is needed to ensure the pipe appears writable before the first
  write because there are 0 bytes of pending data in 0 bytes of buffer space
  at that point, leaving 0 bytes of data that can be written with the normal
  code.

  Note that the first write returns [ENOMEM] if kern.ipc.maxpipekva is
  exceeded and does not block or return [EAGAIN], so selecting true for write
  is correct even in that case.

  PR: kern/93685
  Submitted by: gianni
  MFC after: 2 weeks
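The "0 bytes of pending data in 0 bytes of buffer space" situation can be modelled in a few lines of userland C. This is a sketch under stated assumptions: `struct sketch_pipebuf`, `SKETCH_PIPE_BUF`, and both predicates are hypothetical names invented for illustration, and the threshold value is arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PIPE_BUF 512	/* hypothetical "enough room to write" threshold */

struct sketch_pipebuf {
	size_t size;	/* allocated buffer space; 0 until the first write */
	size_t cnt;	/* bytes of pending data */
};

/* Normal rule: writable only if enough free space remains. */
static bool
writable_naive(const struct sketch_pipebuf *pb)
{
	return (pb->size - pb->cnt >= SKETCH_PIPE_BUF);
}

/* Same rule plus the special case for a not-yet-allocated buffer. */
static bool
writable_with_special_case(const struct sketch_pipebuf *pb)
{
	return (pb->size == 0 || pb->size - pb->cnt >= SKETCH_PIPE_BUF);
}
```

With `size == 0` and `cnt == 0`, the naive predicate reports "not writable" forever, since no write ever happens to trigger the allocation; the special case breaks that cycle.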
* Add a helper API to allow in-kernel code to map portions of shared memory | jhb | 2011-12-14 | 1 | -0/+119
  objects created by shm_open(2) into the kernel's address space. This
  provides a convenient way for creating shared memory buffers between
  userland and the kernel without requiring custom character devices.
* Match other formatting. | obrien | 2011-12-14 | 1 | -4/+4
* Disallow various debug.kdb sysctl's when securelevel is raised. | obrien | 2011-12-13 | 1 | -4/+6
  PR: 161350
* - Add a sysctl to allow non-root users the ability to set idle | eadler | 2011-12-13 | 1 | -25/+33
    priorities.
  - While here fix up some style nits.

  Discussed with: cperciva (briefly)
  Reviewed by: pjd (earlier version)
  Reviewed by: bde
  Approved by: jhb
  MFC after: 1 month
* Document a large number of currently undocumented sysctls. While here | eadler | 2011-12-13 | 5 | -11/+16
  fix some style(9) issues and reduce redundancy.

  PR: kern/155491
  PR: kern/155490
  PR: kern/155489
  Submitted by: Galimov Albert <wtfcrap@mail.ru>
  Approved by: bde
  Reviewed by: jhb
  MFC after: 1 week
* put sys/systm.h at its proper place or add it if missing | avg | 2011-12-12 | 2 | -2/+2
  Reported by: lstewart, tinderbox
  Pointyhat to: avg, attilio
  MFC after: 1 week
  MFC with: r228430
* kern_racct: move sys/systm.h inclusion to its proper place | avg | 2011-12-12 | 1 | -2/+1
  This should fix the build failure introduced with r228424.
  Also remove duplicate inclusion of sys/param.h.

  Pointyhat to: avg
  MFC after: 1 week
* panic: add a switch and infrastructure for stopping other CPUs in SMP case | avg | 2011-12-11 | 10 | -18/+181
  Historical behavior of letting other CPUs merrily go on is a default for
  time being. The new behavior can be switched on via
  kern.stop_scheduler_on_panic tunable and sysctl.

  Stopping of the CPUs has (at least) the following benefits:
  - more of the system state at panic time is preserved intact
  - threads and interrupts do not interfere with dumping of the system state

  Only one thread runs uninterrupted after panic if stop_scheduler_on_panic is
  set. That thread might call code that is also used in normal context and
  that code might use locks to prevent concurrent execution of certain parts.
  Those locks might be held by the stopped threads and would never be
  released. To work around this issue, it was decided that instead of explicit
  checks for panic context, we would rather put those checks inside the
  locking primitives.

  This change has substantial portions written and re-written by attilio and
  kib at various times. Other changes are heavily based on the ideas and
  patches submitted by jhb and mdf. bde has provided many insights into the
  details and history of the current code.

  The new behavior may cause problems for systems that use a USB keyboard for
  interfacing with system console. This is because of some unusual locking
  patterns in the ukbd code which have to be used because on one hand ukbd is
  below syscons, but on the other hand it has to interface with other usb code
  that uses regular mutexes/Giant for its concurrency protection. Dumping to
  USB-connected disks may also be affected.

  PR: amd64/139614 (at least)
  In cooperation with: attilio, jhb, kib, mdf
  Discussed with: arch@, bde
  Tested by: Eugene Grosbein <eugen@grosbein.net>, gnn,
             Steven Hartland <killing@multiplay.co.uk>, glebius,
             Andrew Boyer <aboyer@averesystems.com>
             (various versions of the patch)
  MFC after: 3 months (or never)
* Move cpu_set_upcall(newtd, td) up before the first call of | pho | 2011-12-09 | 1 | -2/+2
  thread_free(newtd). This is to avoid a possible page fault in
  cpu_thread_clean() as seen on amd64 with syscall fuzzing.

  Reviewed by: kib
  MFC after: 1 week
* - Fix ktrace leakage if error is set | eadler | 2011-12-08 | 1 | -1/+1
  PR: kern/163098
  Submitted by: Loganaden Velvindron <loganaden@devio.us>
  Approved by: sbruno@
  MFC after: 1 month
* Eliminate stale numbers from a comment. | alc | 2011-12-07 | 1 | -5/+2
* Eliminate the possibility of 32-bit arithmetic overflow in the calculation | alc | 2011-12-07 | 1 | -4/+4
  of vm_kmem_size that may occur if the system administrator has specified a
  vm.vm_kmem_size tunable value that exceeds the hard cap.

  PR: 162741
  Submitted by: Adam McDougall
  Reviewed by: bde@
  MFC after: 3 weeks
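The general class of bug named here, a product that wraps in 32-bit arithmetic, can be shown with a short userland sketch. It does not reproduce the actual vm_kmem_size expression, which this message does not quote; `SKETCH_PAGE_SIZE` and both functions are hypothetical names used only for illustration.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* hypothetical page size */

/* 32-bit arithmetic: the product wraps once it exceeds UINT32_MAX. */
static uint32_t
cap_bytes_32(uint32_t pages)
{
	return (2u * pages * SKETCH_PAGE_SIZE);
}

/* Widen before multiplying so the intermediate result cannot wrap. */
static uint64_t
cap_bytes_64(uint32_t pages)
{
	return (2 * (uint64_t)pages * SKETCH_PAGE_SIZE);
}
```

For 1048576 pages the 32-bit version wraps to 0, while the widened version yields 8589934592 bytes.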
* Most users of pipe(2) do not call fstat(2) on the returned pipe descriptors. | kib | 2011-12-06 | 1 | -9/+28
  Optimize for the case, by lazily allocating the pipe inode number at the
  fstat(2) time. If alloc_unr(9) returns failure, do not fail fstat(2), since
  uses of inode numbers are even rarer than fstat(2), but provide zero inode
  forever. Note that alloc_unr() failure is unlikely due to total number of
  pipes in the system limited by the number of file descriptors.

  Based on the submission by: gianni
  MFC after: 2 weeks
* Really protect kern.proc.ps_strings sysctls with p_candebug(). This | trociny | 2011-12-06 | 1 | -1/+1
  was intended to be in r228288.

  Spotted by: many
  MFC after: 1 week
* Protect kern.proc.auxv and kern.proc.ps_strings sysctls with p_candebug(). | trociny | 2011-12-05 | 1 | -2/+4
  Citing jilles:

    If we are ever going to do ASLR, the AUXV information tells an attacker
    where the stack, executable and RTLD are located, which defeats much of
    the point of randomizing the addresses in the first place.

    Given that the AUXV information seems to be used by debuggers only
    anyway, I think it would be good to move it to p_candebug() now.

    The full virtual memory maps (KERN_PROC_VMMAP, procstat -v) are already
    under p_candebug().

  Suggested by: jilles
  Discussed with: rwatson
  MFC after: 1 week
* Add a missing curly bracket | kevlo | 2011-12-05 | 1 | -0/+1
* critical_exit: ignore td_owepreempt if kdb_active is set | avg | 2011-12-04 | 1 | -1/+1
  calling mi_switch in such a context results in a recursion via kdb_switch

  Suggested by: jhb
  Reviewed by: jhb
  MFC after: 5 weeks
* In sysctl_kern_proc_ps_strings() there is not much sense in checking | trociny | 2011-12-04 | 1 | -8/+0
  for P_WEXIT and P_SYSTEM flags.

  Reviewed by: kib
* Make sure the description of pause() is | hselasky | 2011-12-03 | 1 | -1/+2
  equivalent to its implementation. No code change.

  Suggested by: Bruce Evans
  MFC after: 3 days
* - Fix typos s/(more|less) then|\1 than/ | eadler | 2011-12-03 | 1 | -5/+5
  Submitted by: Davide Italiano <davide.italiano@gmail.com>
  Approved by: brucec
  MFC after: 3 days
* Use umtx_copyin_timeout() to copy and check timeout parameter. | pho | 2011-12-03 | 1 | -5/+1
  In collaboration with: kib
  MFC after: 1 week
* Add umtx_copyin_timeout() and move parameter checks here. | pho | 2011-12-03 | 1 | -53/+25
  In collaboration with: kib
  MFC after: 1 week
* Rename copyin_timeout32 to umtx_copyin_timeout32 and move parameter | pho | 2011-12-03 | 1 | -42/+18
  check here. Include check for negative seconds value.

  In collaboration with: kib
  MFC after: 1 week
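The kind of parameter check these three commits centralize can be sketched in userland C. This is an assumption-laden illustration: `check_timeout()` is a hypothetical helper, it omits the copy from user space entirely, and the exact set of conditions in the kernel may differ; only the negative-seconds check is named by the message above.

```c
#include <errno.h>
#include <time.h>

/*
 * Hypothetical validation helper: reject negative seconds and
 * nanosecond values outside [0, 1e9).
 */
static int
check_timeout(const struct timespec *ts)
{
	if (ts->tv_sec < 0 || ts->tv_nsec < 0 || ts->tv_nsec >= 1000000000L)
		return (EINVAL);
	return (0);
}
```

Putting the check in one helper means every caller rejects the same malformed timeouts, rather than each call site carrying its own slightly different copy.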
* It doesn't make much sense to check whether child is NULL after already | marius | 2011-12-02 | 1 | -4/+5
  having dereferenced it. We either should generally check the device_t's
  supplied to bus functions before using them (which we seem to virtually
  never do) or just assume that they are not NULL.
  While at it make this code fit 78 columns.

  Found with: Coverity Prevent(tm)
  CID: 4230
* - In device_probe_child(9) check the return value of device_set_driver(9) | marius | 2011-12-02 | 1 | -11/+15
    when actually setting a driver as especially ENOMEM is fatal in these
    cases.
  - Annotate other calls to device_set_devclass(9) and device_set_driver(9)
    without the return value being checked and that are okay to fail.

  Reviewed by: yongari (slightly earlier version)
* When changing the user priority of a thread, change the real priority | jhb | 2011-12-02 | 1 | -2/+3
  in addition to the user priority for threads whose current real priority is
  equal to the previous user priority or if the new priority is a real-time
  priority. This allows priority changes of other threads to have an
  immediate effect.

  MFC after: 2 weeks
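The rule stated in this message can be modelled with a tiny userland sketch. Everything here is hypothetical: `struct sketch_thread`, `set_user_pri()`, and the `new_is_realtime` flag are invented for illustration and do not mirror the kernel's data structures or locking.

```c
#include <stdbool.h>

struct sketch_thread {
	int user_pri;	/* priority requested for the thread */
	int real_pri;	/* priority the scheduler currently uses */
};

/*
 * Update the user priority; also update the real priority when it
 * still equals the old user priority, or when the new priority is
 * a real-time one.
 */
static void
set_user_pri(struct sketch_thread *td, int newpri, bool new_is_realtime)
{
	int oldpri = td->user_pri;

	td->user_pri = newpri;
	if (td->real_pri == oldpri || new_is_realtime)
		td->real_pri = newpri;
}
```

A thread whose real priority differs from its user priority (for example, because it was temporarily raised) keeps its real priority unless the new priority is real-time.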
* If alloc_unr() call in the pipe_create() failed, then pipe->pipe_ino is | kib | 2011-12-01 | 1 | -2/+2
  -1. But, because ino_t is unsigned, this case was not covered by the test
  ino > 0 in pipeclose(), leading to the free_unr(-1). Fix it by explicitly
  comparing with 0 and -1. [1]

  Do not access freed memory, the inode number was cached to prevent access
  to cpipe after it possibly was freed, but I failed to commit the right
  patch.

  Noted by: gianni [1]
  Pointy hat to: kib
  MFC after: 3 days
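Why `ino > 0` fails to exclude -1 for an unsigned type can be seen in a short userland sketch. The type `sketch_ino_t`, the macro `SKETCH_INO_NONE`, and both predicates are hypothetical names; the sketch assumes only what the message states, namely that the inode type is unsigned and that -1 marks a failed allocation.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t sketch_ino_t;			/* unsigned, like ino_t */
#define SKETCH_INO_NONE ((sketch_ino_t)-1)	/* marker for failed allocation */

/* Buggy test: -1 converted to unsigned is a large positive value. */
static bool
should_free_naive(sketch_ino_t ino)
{
	return (ino > 0);
}

/* Fixed test: compare explicitly against both 0 and -1. */
static bool
should_free_fixed(sketch_ino_t ino)
{
	return (ino != 0 && ino != SKETCH_INO_NONE);
}
```

The naive predicate returns true for `SKETCH_INO_NONE`, which is how a release call on an unallocated number can slip through.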
* Revise the sysctl handling code and restructure the hierarchy of sysctls | lstewart | 2011-12-01 | 1 | -45/+51
  introduced when feed-forward clock support is enabled in the kernel:

  - Rename the "choice" variable to "available".
  - Streamline the implementation of the "active" variable's sysctl handler
    function.
  - Create a kern.sysclock sysctl node for general sysclock related
    configuration options. Place the "available" and "active" variables under
    this node.
  - Create a kern.sysclock.ffclock sysctl node for feed-forward clock specific
    configuration options. Place the "version" and "ffcounter_bypass"
    variables under this node.
  - Tweak some of the description strings.

  Discussed with: Julien Ridoux (jridoux at unimelb edu au)
* Rename vm_page_set_valid() to vm_page_set_valid_range(). | kib | 2011-11-30 | 2 | -3/+3
  The vm_page_set_valid() is the most reasonable name for the m->valid
  accessor.

  Reviewed by: attilio, alc
* Make sysclock_active publicly available to external consumers. | lstewart | 2011-11-29 | 1 | -2/+0
  Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
  Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
  Clock Synchronization Algorithms" project.

  For more information, see http://www.synclab.org/radclock/

  Discussed with: Julien Ridoux (jridoux at unimelb edu au)
  Submitted by: Julien Ridoux (jridoux at unimelb edu au)
* Do away with the somewhat clunky sysclock_ops structure and associated code, | lstewart | 2011-11-29 | 1 | -90/+12
  reimplementing the [get]{bin,nano,micro}[up]time() wrapper functions in
  terms of the new "fromclock" API instead.

  Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
  Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
  Clock Synchronization Algorithms" project.

  For more information, see http://www.synclab.org/radclock/

  Discussed with: Julien Ridoux (jridoux at unimelb edu au)
  Submitted by: Julien Ridoux (jridoux at unimelb edu au)
* Make the fbclock_[get]{bin,nano,micro}[up]time() function prototypes public so | lstewart | 2011-11-29 | 1 | -12/+12
  that new APIs with some performance sensitivity can be built on top of them.
  These functions should not be called directly except in special
  circumstances.

  Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
  Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
  Clock Synchronization Algorithms" project.

  For more information, see http://www.synclab.org/radclock/

  Discussed with: Julien Ridoux (jridoux at unimelb edu au)
  Submitted by: Julien Ridoux (jridoux at unimelb edu au)
* Fix an oversight in r227747 by calling fbclock_bin{up}time() directly from the | lstewart | 2011-11-29 | 1 | -5/+5
  fbclock_{nanouptime|microuptime|bintime|nanotime|microtime}() functions to
  avoid indirecting through a sysclock_ops wrapper function.

  Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
  Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
  Clock Synchronization Algorithms" project.

  For more information, see http://www.synclab.org/radclock/

  Submitted by: Julien Ridoux (jridoux at unimelb edu au)
* Add sysctl to retrieve ps_strings structure location of another process. | trociny | 2011-11-27 | 1 | -0/+57
  Suggested by: kib
  Reviewed by: kib
* In sysctl_kern_proc_auxv the process was released too early: we still | trociny | 2011-11-27 | 1 | -5/+7
  need to hold it when checking process sv_flags.

  MFC after: 2 weeks
* Export the "ffclock" feature for kernels compiled with feed-forward clock | lstewart | 2011-11-26 | 1 | -0/+2
  support.

  Suggested by: netchild
  Reviewed by: netchild