path: root/sys/kern
Commit message | Author | Age | Files | Lines
...
* Implement flexible BPF timestamping framework.
  - Allow setting format, resolution and accuracy of BPF time stamps per listener. Previously, we were only able to use microtime(9). Now we can set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command. Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP command. Document all supported options in bpf(4) and their uses.
  - Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'. The new time stamp has both 64-bit second and fractional parts. bpf_xhdr has this time stamp instead of 'struct timeval' for bh_tstamp. The new structures let us use bh_tstamp of same size on both 32-bit and 64-bit platforms without adding additional shims for 32-bit binaries. On 64-bit platforms, size of BPF header does not change compared to bpf_hdr as its members are already all 64-bit long. On 32-bit platforms, the size may increase by 8 bytes. For backward compatibility, struct bpf_hdr with struct timeval is still the default header unless new time stamp format is explicitly requested. However, the behaviour may change in the future and all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now.
  - Add experimental support for tagging mbufs with time stamps from a lower layer, e.g., device driver. Currently, mbuf_tags(9) is used to tag mbufs. The time stamps must be uptime in 'struct bintime' format as binuptime(9) and getbinuptime(9) do.
  Reviewed by: net@
  Author: jkim; Age: 2010-06-15; Files: 1; Lines: -1/+1
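The entry above mentions time stamps with a 64-bit fractional part in 'struct bintime' format. As a minimal sketch, assuming the fraction is a binary fraction of a second (value / 2^64, as in struct bintime), it could be turned into nanoseconds roughly like this. The function name is hypothetical and not taken from the commit.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a 64-bit binary fraction of a second
 * (frac / 2^64) into nanoseconds. Only the top 32 bits of the fraction
 * are used so that the multiplication cannot overflow 64 bits.
 */
uint32_t
bintime_frac_to_nsec(uint64_t frac)
{
	return ((uint32_t)(((uint64_t)1000000000 * (uint32_t)(frac >> 32)) >> 32));
}
```

Dropping the low 32 bits costs well under a nanosecond of precision, which is why the truncation is acceptable for this purpose.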
* Virtualize pci_remap_msi_irq() call from general MSI code. It allows MSI (FSB interrupts) to be used by non-PCI devices, such as HPET.
  Author: mav; Age: 2010-06-14; Files: 1; Lines: -0/+22
* Add another variation of make_dev(9), make_dev_p(9), that is allowed to fail and can return a useful error code.
  Requested by: jh
  Reviewed by: imp, jh
  MFC after: 3 weeks
  Author: kib; Age: 2010-06-12; Files: 1; Lines: -24/+53
* When make_dev_credf(MAKEDEV_WAITOK) is called, use devctl_notify_f(M_WAITOK) for devfs notifications.
  Suggested by: jh
  Reviewed by: imp, jh
  MFC after: 3 weeks
  Author: kib; Age: 2010-06-12; Files: 1; Lines: -4/+4
* Add modifications of devctl_notify(9) functions that take flags. Use flags to specify M_WAITOK/M_NOWAIT. M_WAITOK allows devctl to sleep for the memory allocation. As Warner noted, allowing the functions to sleep might cause reordering of the queued notifications.
  Reviewed by: imp, jh
  MFC after: 3 weeks
  Author: kib; Age: 2010-06-12; Files: 1; Lines: -6/+21
* Fix a few cases where a string is passed via the format argument instead of via %s. Most of the cases looked harmless, but this is done for the sake of correctness. In one case it even allowed dropping an intermediate buffer.
  Found by: clang
  MFC after: 2 weeks
  Author: avg; Age: 2010-06-11; Files: 2; Lines: -2/+2
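The class of bug described above can be illustrated in userland C. This is a minimal sketch with a hypothetical function name, not the code that was changed: passing the string through "%s" keeps any '%' characters in it from being interpreted as conversions.

```c
#include <stdio.h>

/*
 * Hypothetical helper: copy a caller-supplied name into buf.
 * Writing snprintf(buf, len, name) instead would treat any '%' in
 * name as the start of a conversion specification.
 */
int
copy_name(char *buf, size_t len, const char *name)
{
	return (snprintf(buf, len, "%s", name));
}
```

With the "%s" form, a name such as "100%d" is copied literally instead of consuming a nonexistent argument.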
* Update several places that iterate over CPUs to use CPU_FOREACH().
  Author: jhb; Age: 2010-06-11; Files: 8; Lines: -27/+12
* Add INVARIANTS checking that numfreebufs values are sane. Also add a per-buf flag to catch if a buf is double-counted in the free count. This code was useful to debug an instance where a local patch at Isilon was incorrectly managing numfreebufs for a new buf state.
  Reviewed by: jeff
  Approved by: zml (mentor)
  Author: mdf; Age: 2010-06-11; Files: 1; Lines: -10/+54
* In another move to join with the age of the Fruitbat, increase SYSV shared resources defaults beyond absolute minimums. The new values are chosen mostly by magic. They are still fairly small and will need increasing for large installations (especially SHMMAX). However, they are now enough to e.g. start PostgreSQL installations with ~300 users and nearly 512 MB of shared buffers.
  Reviewed by: A short discussion on hackers@
  Author: ivoras; Age: 2010-06-11; Files: 2; Lines: -5/+5
* Store interrupt trap frame into struct thread. This allows an interrupt handler to obtain both the trap frame and the opaque argument submitted on registration. After the kernel and all drivers get used to it, the legacy hack can be removed.
  Reviewed by: jhb@
  Author: mav; Age: 2010-06-10; Files: 1; Lines: -0/+8
* Unconfuse THREAD and SMT flags
  Author: ivoras; Age: 2010-06-10; Files: 1; Lines: -1/+3
* Cosmetic change to XML - less ugly newlines
  Author: ivoras; Age: 2010-06-10; Files: 1; Lines: -2/+2
* Reorganize the code in bdwrite() which handles move of dirtiness from the buffer pages to buffer. Combine the code to set buffer dirty range (previously in vfs_setdirty()) and to clean the pages (vfs_clean_pages()) into new function vfs_clean_pages_dirty_buf(). Now the vm object lock is acquired only once.
  Drain the VPO_BUSY bit of the buffer pages before setting valid and clean bits in vfs_clean_pages_dirty_buf() with new helper vfs_drain_busy_pages(). pmap_clear_modify() asserts that page is not busy.
  In vfs_busy_pages(), move the wait for draining of VPO_BUSY before the dirtiness handling, to follow the structure of vfs_clean_pages_dirty_buf().
  Reported and tested by: pho
  Suggested and reviewed by: alc
  MFC after: 2 weeks
  Author: kib; Age: 2010-06-08; Files: 1; Lines: -70/+65
* Fix a sign bug that caused adaptive spinning in sx_xlock() to not work properly. Among other things it did not drop Giant while spinning leading to livelocks.
  Reviewed by: rookie, kib, jmallett
  MFC after: 3 days
  Author: jhb; Age: 2010-06-08; Files: 1; Lines: -1/+1
* Call BUS_PROBE_NOMATCH() when device detached due to driver unload. This allows the bus to power down the device when the driver is unloaded on the fly.
  Author: mav; Age: 2010-06-07; Files: 1; Lines: -0/+4
* Declare ip6 as (struct in6_addr *) instead of (struct in_addr *). This is a harmless bug since we never actually use ip6 as anything other than an opaque pointer.
  Found with: Coverity Prevent(tm)
  CID: 4319
  MFC after: 1 month
  Author: cperciva; Age: 2010-06-04; Files: 1; Lines: -1/+1
* Assert that the thread lock is held in sched_pctcpu() instead of recursively acquiring it. All of the current callers already hold the lock.
  MFC after: 1 month
  Author: jhb; Age: 2010-06-03; Files: 2; Lines: -2/+2
* The 'acl_cnt' field is unsigned; no point in checking if it's >= 0.
  Found with: Coverity Prevent
  CID: 3688
  Author: trasz; Age: 2010-06-03; Files: 1; Lines: -1/+1
* The 'acl_cnt' field is unsigned; no point in checking if it's >= 0.
  Found with: Coverity Prevent
  CID: 3684
  Author: trasz; Age: 2010-06-03; Files: 1; Lines: -1/+1
* The acl_cnt field is unsigned; no point in checking if it's >= 0.
  Found with: Coverity Prevent
  CID: 3683
  Author: trasz; Age: 2010-06-03; Files: 1; Lines: -1/+0
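The three acl_cnt entries above remove a comparison that can never fail. A minimal sketch of the remaining check, assuming a hypothetical upper limit (the real limit is not stated in this log):

```c
/* Hypothetical limit, chosen only for illustration. */
#define	ACL_MAX_ENTRIES_SKETCH	254u

/*
 * For an unsigned count, "acl_cnt >= 0" is always true, so only the
 * upper bound carries information. A negative value stored into the
 * field wraps to a very large number and is rejected by this check.
 */
int
acl_cnt_ok(unsigned int acl_cnt)
{
	return (acl_cnt <= ACL_MAX_ENTRIES_SKETCH);
}
```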
* Sometimes vnodes share the lock despite being different vnodes on different mount points, e.g. the nullfs vnode and the covered vnode from the lower filesystem. In this case, the existing assertion in vop_rename_pre() may be triggered. Check for equality of the vnode locks instead of the vnodes themselves to not trip over the situation.
  Submitted by: Mikolaj Golub <to.my.trociny@gmail.com>
  Tested by: pho
  MFC after: 2 weeks
  Author: kib; Age: 2010-06-03; Files: 1; Lines: -2/+3
* Minimize the use of the page queues lock for synchronizing access to the page's dirty field. With the exception of one case, access to this field is now synchronized by the object lock.
  Author: alc; Age: 2010-06-02; Files: 2; Lines: -4/+0
* Add a facility to dynamically adjust or unconfigure p1003_1b mib. Use it to allow tuning sem_nsem_max at runtime, only when the sem.ko module is present in the kernel.
  Requested and tested by: amdmi3
  Reviewed by: jhb
  MFC after: 3 days
  Author: kib; Age: 2010-06-02; Files: 2; Lines: -5/+37
* Revert taskqueue(9) related commits until mdf@ is approved and can resolve issues.
  This reverts commits r207439, r208623, r208624
  Author: zml; Age: 2010-06-01; Files: 1; Lines: -14/+6
* Avoid a wakeup(9) if we can be sure no one is waiting on the task.
  Submitted by: Matthew Fleming <matthew.fleming@isilon.com>
  Reviewed by: zml, jhb
  Author: zml; Age: 2010-05-28; Files: 1; Lines: -3/+11
* Revert r207439 and solve the problem differently. The task handler ta_func may free the task structure, so no references to its members are valid after the handler has been called. Using a per-queue member and having waits longer than strictly necessary was suggested by jhb.
  Submitted by: Matthew Fleming <matthew.fleming@isilon.com>
  Reviewed by: zml, jhb
  Author: zml; Age: 2010-05-28; Files: 1; Lines: -6/+5
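The constraint described above (the handler may free its own task) can be sketched in userland C. This is an illustration under stated assumptions, not the committed code: the struct and function names ending in _sketch are hypothetical, and only the idea of a per-queue member is taken from the message.

```c
#include <stdlib.h>

typedef void task_fn_t(void *context, int pending);

/* Hypothetical, simplified task and queue structures. */
struct task_sketch {
	task_fn_t	*ta_func;
	void		*ta_context;
	int		 ta_pending;
};

struct taskqueue_sketch {
	struct task_sketch *tq_running;	/* per-queue member, compared but never dereferenced */
	int		    tq_completed;
};

void
taskqueue_run_one(struct taskqueue_sketch *q, struct task_sketch *t)
{
	/* Copy everything needed out of the task before calling the handler. */
	task_fn_t *func = t->ta_func;
	void *context = t->ta_context;
	int pending = t->ta_pending;

	t->ta_pending = 0;
	q->tq_running = t;
	func(context, pending);		/* may free t */
	/* From here on only the queue is touched, never *t. */
	q->tq_running = NULL;
	q->tq_completed++;
}

/* A handler that frees its own task, passed in through the context. */
void
free_self_handler(void *context, int pending)
{
	(void)pending;
	free(context);
}

/* Run one self-freeing task and report how many tasks completed. */
int
demo_self_freeing_task(void)
{
	struct taskqueue_sketch q = { NULL, 0 };
	struct task_sketch *t = malloc(sizeof(*t));

	if (t == NULL)
		return (-1);
	t->ta_func = free_self_handler;
	t->ta_context = t;
	t->ta_pending = 1;
	taskqueue_run_one(&q, t);
	return (q.tq_completed);
}
```

Keeping the "currently running" state in the queue rather than in the task is what makes it safe for the handler to release the task's memory.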
* When close() is called on a connected socket pair, SO_ISCONNECTED might be set but be cleared before the call to sodisconnect(). In this case, ENOTCONN is returned: suppress this error rather than returning it to userspace so that close() doesn't report an error improperly.
  PR: kern/144061
  Reported by: Matt Reimer <mreimer at vpop.net>, Nikolay Denev <ndenev at gmail.com>, Mikolaj Golub <to.my.trociny at gmail.com>
  MFC after: 3 days
  Author: rwatson; Age: 2010-05-27; Files: 1; Lines: -1/+4
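The error handling described above amounts to filtering one specific errno value. A minimal sketch, with a hypothetical function name that does not appear in the commit:

```c
#include <errno.h>

/*
 * Hypothetical filter applied to the result of a disconnect attempt
 * made during close(): a socket that was already disconnected by its
 * peer is not an error from the caller's point of view.
 */
int
close_filter_disconnect_error(int error)
{
	if (error == ENOTCONN)
		return (0);
	return (error);
}
```

All other errors are passed through unchanged, so real failures are still reported to userspace.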
* Add the support for reporting the NOCOREDUMP flag from sysctl_kern_proc_vmmap().
  Sponsored by: Sandvine Incorporated
  Reviewed by: kib, emaste
  MFC after: 1 week
  Author: attilio; Age: 2010-05-27; Files: 1; Lines: -0/+4
* Allow to use syscallname(9) outside subr_trap.c.
  MFC after: 1 month
  Author: kib; Age: 2010-05-26; Files: 1; Lines: -2/+1
* Ignore the 'addr' argument passed to PT_STEP (it is required to be '1' for PT_STEP which means "ignore") and PT_DETACH.
  PR: kern/146167
  MFC after: 1 week
  Author: jhb; Age: 2010-05-25; Files: 1; Lines: -14/+20
* Eliminate the acquisition and release of the page queues lock from vfs_busy_pages(). It is no longer needed.
  Submitted by: kib
  Author: alc; Age: 2010-05-25; Files: 1; Lines: -5/+0
* Roughly half of a typical pmap_mincore() implementation is machine-independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore().
  Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page.
  Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information.
  Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock.
  Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed().
  Reviewed by: kib (an earlier version)
  Author: alc; Age: 2010-05-24; Files: 1; Lines: -2/+0
* - Implement MI helper functions, dividing one or two timer interrupts with arbitrary frequencies into hardclock(), statclock() and profclock() calls. The same code with minor variations was duplicated several times over the tree for different timer drivers and architectures.
  - Switch all x86 archs to new functions, simplifying the code and removing extra logic from timer drivers. Other archs are also welcome.
  Author: mav; Age: 2010-05-24; Files: 1; Lines: -0/+52
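One way to divide a single timer interrupt of arbitrary frequency into slower periodic calls is a running-remainder accumulator. The sketch below shows that technique only; it is not the committed helper, and all names are hypothetical. It assumes the derived rate does not exceed the timer rate.

```c
/* Hypothetical divider state; names are illustrative only. */
struct clock_divider {
	unsigned int	timer_hz;	/* rate of the hardware timer interrupt */
	unsigned int	target_hz;	/* desired rate of the derived clock (<= timer_hz) */
	unsigned int	acc;		/* running remainder */
	unsigned long	fired;		/* number of derived clock calls so far */
};

/* Returns 1 when the derived clock should run on this timer interrupt. */
int
clock_divider_tick(struct clock_divider *d)
{
	d->acc += d->target_hz;
	if (d->acc >= d->timer_hz) {
		d->acc -= d->timer_hz;
		d->fired++;
		return (1);
	}
	return (0);
}

/* Feed 'ticks' timer interrupts into a fresh divider; return the derived call count. */
unsigned long
clock_divider_run(unsigned int timer_hz, unsigned int target_hz,
    unsigned int ticks)
{
	struct clock_divider d = { timer_hz, target_hz, 0, 0 };
	unsigned int i;

	for (i = 0; i < ticks; i++)
		(void)clock_divider_tick(&d);
	return (d.fired);
}
```

A timer handler would keep one such divider per derived clock (for example one for hardclock() and one for statclock()) and call the corresponding function whenever the tick returns 1. The accumulator keeps the long-run rate exact even when the frequencies do not divide evenly.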
* Fix the double counting of the last process thread td_incruntime on exit, that is done once in thread_exit() and the second time in proc_reap(), by clearing td_incruntime. Use the opportunity to revert to the pre-RUSAGE_THREAD exporting of ruxagg() instead of ruxagg_locked() and use it from thread_exit().
  Diagnosed and tested by: neel
  MFC after: 3 days
  Author: kib; Age: 2010-05-24; Files: 2; Lines: -4/+4
* Reorganize syscall entry and leave handling.
  Extend struct sysvec with three new elements:
  sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-dependent (this might be reconsidered after all architectures are converted).
  sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value.
  sv_syscallnames - the table of syscall names.
  Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval().
  The new functions syscallenter(9) and syscallret(9) are provided that use sv_*syscall* pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers.
  Syscallenter() fetches arguments, calls the syscall implementation from the ABI sysent table, and sets up the return frame. The end of syscall bookkeeping is done by syscallret().
  Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls.
  The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation.
  Reviewed by: jhb, marcel, marius, nwhitehorn, stas
  Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips)
  MFC after: 1 month
  Author: kib; Age: 2010-05-23; Files: 6; Lines: -4/+197
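The dispatch structure described above (ABI-specific hooks called from one machine-independent entry routine) can be sketched in userland C. Everything here is a simplified, hypothetical stand-in: the types, the hook signatures, and the names ending in _sketch are assumptions for illustration and do not reproduce the kernel definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct syscall_args {
	int	code;		/* syscall number */
	long	args[2];	/* arguments copied in from "usermode" */
};

struct thread_sketch {
	long	user_regs[3];	/* pretend register file: number, arg0, arg1 */
	long	retval;		/* value handed back to usermode */
	int	error;
};

typedef int (*sy_call_t)(struct thread_sketch *, const long *, long *);

struct sysvec_sketch {
	int	(*sv_fetch_syscall_args)(struct thread_sketch *,
		    struct syscall_args *);
	void	(*sv_set_syscall_retval)(struct thread_sketch *, int, long);
	const char **sv_syscallnames;
	const sy_call_t *sv_table;
	int	sv_size;
};

/* Machine-independent entry: fetch arguments, dispatch, set the return value. */
int
syscallenter_sketch(struct thread_sketch *td, const struct sysvec_sketch *sv)
{
	struct syscall_args sa;
	long result = 0;
	int error;

	error = sv->sv_fetch_syscall_args(td, &sa);
	if (error == 0) {
		if (sa.code < 0 || sa.code >= sv->sv_size)
			error = ENOSYS;
		else
			error = sv->sv_table[sa.code](td, sa.args, &result);
	}
	sv->sv_set_syscall_retval(td, error, result);
	return (error);
}

/* A toy ABI: one syscall, arguments taken from the pretend registers. */
int
sys_add(struct thread_sketch *td, const long *args, long *result)
{
	(void)td;
	*result = args[0] + args[1];
	return (0);
}

int
demo_fetch_args(struct thread_sketch *td, struct syscall_args *sa)
{
	sa->code = (int)td->user_regs[0];
	sa->args[0] = td->user_regs[1];
	sa->args[1] = td->user_regs[2];
	return (0);
}

void
demo_set_retval(struct thread_sketch *td, int error, long result)
{
	td->error = error;
	td->retval = (error != 0) ? -1 : result;
}

const sy_call_t demo_table[] = { sys_add };
const char *demo_names[] = { "add" };
const struct sysvec_sketch demo_sysvec = {
	demo_fetch_args, demo_set_retval, demo_names, demo_table, 1
};

/* Issue one toy syscall and return what "usermode" would see. */
long
demo_syscall(long code, long a, long b)
{
	struct thread_sketch td = { { code, a, b }, 0, 0 };

	(void)syscallenter_sketch(&td, &demo_sysvec);
	return (td.retval);
}
```

The point of the design is that the entry routine contains no ABI knowledge: a different ABI supplies different hooks and a different table while the common code stays the same.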
* - Adjust the whitespace for the lines that output fields in 'show pcpu' in DDB so that all the fields line up.
  - Print out the tid of the per-CPU idlethread instead of the pid since the idle process is now shared across all idle threads.
  MFC after: 1 month
  Author: jhb; Age: 2010-05-21; Files: 1; Lines: -4/+3
* Assert that the thread passed to sched_bind() and sched_unbind() is curthread as those routines are only supported for curthread currently.
  MFC after: 1 month
  Author: jhb; Age: 2010-05-21; Files: 2; Lines: -3/+5
* Allow a const char * to be passed as the process name to kproc_kthread_add() without generating a warning.
  MFC after: 1 month
  Author: jhb; Age: 2010-05-21; Files: 1; Lines: -1/+1
* Remove POLLHUP from the flags used to test whether to set exceptfds fd_set bits in select(2). It seems that the historical behaviour is to not report an exception on EOF, and several applications are broken.
  Reported by: Yoshihiko Sarumaru <ysarumaru gmail com>
  Discussed with: bde
  PR: ports/140934
  MFC after: 2 weeks
  Author: kib; Age: 2010-05-21; Files: 1; Lines: -1/+1
* The page queues lock is no longer required by vm_page_set_invalid(), so eliminate it.
  Assert that the object containing the page is locked in vm_page_test_dirty(). Perform some style clean up while I'm here.
  Reviewed by: kib
  Author: alc; Age: 2010-05-18; Files: 1; Lines: -2/+0
* This pushes all of JC's patches that I have in place. I am now able to run 32 cores OK, but I still hang on buildworld with an NFS problem. I suspect I am missing a patch for the netlogic rge driver.
  JC, check and see if I am missing anything except your core-mask changes.
  Obtained from: JC
  Author: rrs; Age: 2010-05-16; Files: 2; Lines: -2/+5
* Fix an issue with the dynamic pcpu/vnet data allocators. We cannot expect that modspace is the last entry in the linker set and thus that modspace + possible extra space up to PAGE_SIZE would be contiguous. For the moment do not support more than *_MODMIN space and ignore the extra space (*).
  (*) We know how to get it back but it'll need testing.
  Discussed with: jeff, rwatson (briefly)
  Reviewed by: jeff
  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  MFC after: 4 days
  Author: bz; Age: 2010-05-14; Files: 1; Lines: -1/+1
* Add VOP_ADVLOCKPURGE so that the file system is called when purging locks (in the case where the VFS impl isn't using lf_*)
  Submitted by: Matthew Fleming <matthew.fleming@isilon.com>
  Reviewed by: zml, dfr
  Author: zml; Age: 2010-05-12; Files: 3; Lines: -1/+19
* When there is no memory or KVA, try to help by reclaiming some vnodes. This helps with 'kmem_map too small' panics.
  No objections from: kib
  Tested by: Alexander V. Ribchansky <shurik@zk.informjust.ua>
  MFC after: 1 week
  Author: pjd; Age: 2010-05-12; Files: 1; Lines: -0/+13
* I added vfs_lowvnodes event, but it was only used for a short while and now it is totally unused. Remove it.
  MFC after: 3 days
  Author: pjd; Age: 2010-05-11; Files: 1; Lines: -1/+0
* Right now, WITNESS just blindly pipes all the output to the (TOCONS | TOLOG) mask even when called from DDB points. That breaks several outputs, the most notable being textdump output. Fix this by having configurable callbacks passed to witness_list_locks() and witness_display_spinlock() for printing out data.
  Reported by: several broken textdump outputs
  Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
  MFC after: 7 days
  X-MFC: r207922
  Author: attilio; Age: 2010-05-11; Files: 3; Lines: -16/+20
* There is not a good reason to have a different prototype for db_printf() when compared to printf(). Unify it by returning the number of characters displayed for db_printf() as well.
  MFC after: 7 days
  Author: attilio; Age: 2010-05-11; Files: 1; Lines: -6/+6
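The prototype change described above follows the usual variadic wrapper pattern. A minimal userland sketch, assuming a hypothetical wrapper name and using standard vprintf(3) in place of the debugger's own output routine:

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical printf-compatible wrapper: forwards its arguments and
 * returns the number of characters written, as printf() does.
 */
int
db_printf_sketch(const char *fmt, ...)
{
	va_list ap;
	int retval;

	va_start(ap, fmt);
	retval = vprintf(fmt, ap);
	va_end(ap);
	return (retval);
}
```

Matching the printf() prototype lets such a function be passed wherever a printf-style callback is expected.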
* Fix a hang introduced in r206878 for kernels compiled with SMP support but not running on actual SMP hardware, and similar situations, by always initializing the smp ipi mutex.
  Reported by: marius
  MFC after: 3 days
  X-MFC: r206878
  Author: attilio; Age: 2010-05-11; Files: 1; Lines: -1/+2
* Update a comment: It no longer makes sense to talk about the page queues lock here.
  Author: alc; Age: 2010-05-08; Files: 1; Lines: -4/+1
* Push down the page queues lock into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write().
  Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.)
  Switch to a per-processor counter for the total number of pages cached.
  Author: alc; Age: 2010-05-08; Files: 2; Lines: -5/+0