path: root/sys/kern
Commit message | Author | Age | Files | Lines
* Make sure the clone lists are sorted in the right order.
  Explosion triggered by: pjd
  MFC: 3 days
  [phk | 2005-10-01 | 1 file | -3/+5]
* Big polling(4) cleanup.
  o Axe poll in trap.
  o Axe IFF_POLLING flag from if_flags.
  o Rework revision 1.21 (Giant removal) in such a way that poll_mtx is
    not dropped during the call to the polling handler. This fixes a
    problem with idle polling.
  o Make registration and deregistration from polling happen in a
    functional way, instead of at the next tick/interrupt.
  o Obsolete kern.polling.enable. Polling is turned on/off with ifconfig.

  Detailed kern_poll.c changes:
  - Remove polling handler flags, introduced in 1.21. They are not needed
    now.
  - Forget and do not check if_flags, if_capenable and if_drv_flags.
  - Call all registered polling handlers unconditionally.
  - Do not drop poll_mtx when entering polling handlers.
  - In ether_poll(), NET_LOCK_GIANT prior to locking poll_mtx.
  - In netisr_poll(), axe the block where the polling code asks drivers to
    unregister.
  - In netisr_poll() and ether_poll(), always poll if any handlers are
    present.
  - In ether_poll_[de]register(), remove a lot of error-hiding code.
    Assert that arguments are correct instead.
  - In ether_poll_[de]register(), use standard return values in case of
    error or success.
  - Introduce poll_switch(), a sysctl handler for kern.polling.enable.
    poll_switch() goes through the interface list and enables/disables
    polling. A message that kern.polling.enable is deprecated is printed.

  Detailed driver changes:
  - On attach, the driver announces IFCAP_POLLING in if_capabilities, but
    not in if_capenable.
  - On detach, the driver calls ether_poll_deregister() if polling is
    enabled.
  - In the polling handler, the driver obtains its lock and checks the
    IFF_DRV_RUNNING flag. If the flag is not set, the driver unlocks and
    returns.
  - In the ioctl handler, the driver checks whether the IFCAP_POLLING flag
    is requested to be set or cleared. The driver first calls
    ether_poll_[de]register(), then obtains the driver lock and
    [dis/en]ables interrupts.
  - In the interrupt handler, the driver checks the IFCAP_POLLING flag in
    if_capenable. If it is present, the handler returns. This is important
    to protect against spurious interrupts.

  Reviewed by: ru, sam, jhb
  [glebius | 2005-10-01 | 1 file | -100/+88]
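The driver-side protocol listed above can be modelled in userland. This is a minimal sketch, not driver code: a pthread mutex stands in for the driver lock, `poll_register()`/`poll_deregister()` are hypothetical stubs for ether_poll_[de]register(), and `CAP_POLLING`/`DRV_RUNNING` are invented stand-ins for IFCAP_POLLING and IFF_DRV_RUNNING.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for IFCAP_POLLING and IFF_DRV_RUNNING. */
#define CAP_POLLING 0x1
#define DRV_RUNNING 0x2

struct softc {
    pthread_mutex_t lock;   /* models the driver lock */
    int capenable;          /* models if_capenable */
    int drv_flags;          /* models if_drv_flags */
    bool intr_enabled;      /* models the hardware interrupt mask */
    int registered;         /* models registration with the poller */
    int polls, intrs;       /* counters, only for observing the model */
};

/* Hypothetical stubs for ether_poll_register()/ether_poll_deregister(). */
static int poll_register(struct softc *sc)   { sc->registered = 1; return 0; }
static int poll_deregister(struct softc *sc) { sc->registered = 0; return 0; }

/* ioctl path: (de)register first, then take the lock and flip interrupts. */
static int set_polling(struct softc *sc, bool on)
{
    int error = on ? poll_register(sc) : poll_deregister(sc);

    if (error != 0)
        return error;
    pthread_mutex_lock(&sc->lock);
    if (on) {
        sc->capenable |= CAP_POLLING;
        sc->intr_enabled = false;
    } else {
        sc->capenable &= ~CAP_POLLING;
        sc->intr_enabled = true;
    }
    pthread_mutex_unlock(&sc->lock);
    return 0;
}

/* Polling handler: take the lock and do nothing unless the interface runs. */
static void poll_handler(struct softc *sc)
{
    pthread_mutex_lock(&sc->lock);
    if ((sc->drv_flags & DRV_RUNNING) != 0)
        sc->polls++;        /* RX/TX processing would happen here */
    pthread_mutex_unlock(&sc->lock);
}

/* Interrupt handler: ignore (possibly spurious) interrupts while polling. */
static void intr_handler(struct softc *sc)
{
    pthread_mutex_lock(&sc->lock);
    if ((sc->capenable & CAP_POLLING) == 0)
        sc->intrs++;        /* normal interrupt processing */
    pthread_mutex_unlock(&sc->lock);
}
```

In this model, `set_polling()` touches the registration before taking the driver lock, matching the ioctl ordering described in the commit, so registration never happens with the driver lock held.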
* Copy new process argument list in do_execve() before grabbing PROC_LOCK
  to avoid touching pageable memory while holding a mutex. Simplify
  argument list replacement logic.
  PR: kern/84935
  Submitted by: "Antoine Pelisse" apelisse AT gmail.com (in a different form)
  MFC after: 3 days
  [truckman | 2005-10-01 | 1 file | -10/+10]
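Doing the copy before taking the lock is a general pattern: do everything that may block or fault first, then hold the mutex only for the pointer swap. A minimal userland sketch, assuming a pthread mutex in place of PROC_LOCK and a hypothetical `struct proc_model`; malloc()/memcpy() stand in for the copy from pageable memory. This is not the do_execve() code itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a process and its saved argument list. */
struct proc_model {
    pthread_mutex_t lock;   /* stands in for PROC_LOCK */
    char *args;             /* argument list, replaced on exec */
    size_t args_len;
};

/*
 * Replace p->args with a copy of src. Allocation and copying happen
 * before the lock is taken; only the pointer swap happens while it is
 * held, and the old list is freed after the lock is dropped.
 */
static int replace_args(struct proc_model *p, const char *src, size_t len)
{
    char *newargs = malloc(len);
    char *old;

    if (newargs == NULL)
        return -1;
    memcpy(newargs, src, len);

    pthread_mutex_lock(&p->lock);
    old = p->args;
    p->args = newargs;
    p->args_len = len;
    pthread_mutex_unlock(&p->lock);

    free(old);              /* free(NULL) is a no-op */
    return 0;
}
```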
* Un-staticize waitrunningbufspace() and call it before returning from
  ffs_copyonwrite() if any async writes were launched. Restore the
  thread's previous TDP_NORUNNINGBUF state before returning from
  ffs_copyonwrite().
  [truckman | 2005-09-30 | 1 file | -1/+1]
* Fix a LOR of sleep and sched_lock by using a timeout wait when a
  process reaches the maximum number of threads.
  MFC after: 3 days
  [davidxu | 2005-09-30 | 2 files | -8/+1]
* Un-staticize runningbufwakeup() and staticize updateproc.

  Add a new private thread flag to indicate that the thread should not
  sleep if runningbufspace is too large. Set this flag on the bufdaemon
  and syncer threads so that they skip the waitrunningbufspace() call in
  bufwrite() rather than checking the proc pointer vs. the known proc
  pointers for these two threads. A way of preventing these threads from
  being starved for I/O but still placing limits on their outstanding I/O
  would be desirable.

  Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from
  blocking on the runningbufspace check while holding snaplk. This
  prevents snaplk from being held for an arbitrarily long period of time
  if runningbufspace is high and greatly reduces the contention for
  snaplk. The disadvantage is that ffs_copyonwrite() can start a large
  amount of I/O if there are a large number of snapshots, which could
  cause a deadlock in other parts of the code.

  Call runningbufwakeup() in ffs_copyonwrite() to decrement
  runningbufspace before attempting to grab snaplk so that I/O requests
  waiting on snaplk are not counted in runningbufspace as being
  in-progress. Increment runningbufspace again before actually launching
  the original I/O request.

  Prior to the above two changes, the system could deadlock if enough I/O
  requests were blocked by snaplk to prevent runningbufspace from falling
  below lorunningspace and one of the bawrite() calls in
  ffs_copyonwrite() blocked in waitrunningbufspace() while holding
  snaplk.

  See <http://www.holm.cc/stress/log/cons143.html>
  [truckman | 2005-09-30 | 2 files | -4/+5]
* Trim a couple of unneeded includes.
  [jhb | 2005-09-29 | 1 file | -1/+0]
* Close a race in biodone(), whereby the bio_done field of the passed bio
  may have been freed and reassigned by the wakeup before being tested
  after releasing the bdonelock. There's a non-zero chance this is the
  cause of a few of the crashes knocking around with biodone() sitting in
  the stack backtrace.
  Reviewed By: phk@
  [peadar | 2005-09-29 | 1 file | -3/+5]
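The race comes from touching the bio after the point where a sleeper may wake up and free it. One way to avoid it, sketched here in userland with hypothetical names (`struct bio_model`, `bio_complete()`, `BIO_DONE_FLAG`), is to read the callback pointer into a local while the lock is still held and never dereference the bio again on the wakeup path. A pthread condition variable stands in for wakeup(); this illustrates the pattern and is not the committed biodone() diff.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define BIO_DONE_FLAG 0x1   /* hypothetical stand-in for BIO_DONE */

struct bio_model {
    int flags;
    void (*done)(struct bio_model *);   /* completion callback, may be NULL */
};

static pthread_mutex_t donelock = PTHREAD_MUTEX_INITIALIZER; /* models bdonelock */
static pthread_cond_t donecv = PTHREAD_COND_INITIALIZER;     /* models wakeup() */

/*
 * Complete a request. The callback pointer is copied while the lock is
 * held; on the wakeup path bp is not touched after the unlock, because
 * the woken waiter may already have freed or reused it.
 */
static void bio_complete(struct bio_model *bp)
{
    void (*done)(struct bio_model *);

    pthread_mutex_lock(&donelock);
    bp->flags |= BIO_DONE_FLAG;
    done = bp->done;
    if (done == NULL)
        pthread_cond_broadcast(&donecv);
    pthread_mutex_unlock(&donelock);
    if (done != NULL)
        done(bp);           /* callback owns bp; no sleeper is waiting */
}

/* Waiter side: sleep until the request is marked done. */
static void bio_wait(struct bio_model *bp)
{
    pthread_mutex_lock(&donelock);
    while ((bp->flags & BIO_DONE_FLAG) == 0)
        pthread_cond_wait(&donecv, &donelock);
    pthread_mutex_unlock(&donelock);
}

/* Example callback used to exercise the callback path. */
static int ncallbacks;
static void example_done(struct bio_model *bp) { (void)bp; ncallbacks++; }
```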
* puc(4) does strange things to resources in order to fool the subdrivers
  into hooking up. It should probably be rewritten to implement a simple
  bus to which the subdrivers attach using some kind of hint. Until then,
  provide a couple of crutch functions with big warning signs so it can
  survive the recent changes to struct resource.
  [phk | 2005-09-28 | 1 file | -0/+25]
* Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57,
  osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60,
  svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18,
  svr4_misc.c:1.81, svr4_signal.c:1.34, svr4_stat.c:1.21,
  svr4_stream.c:1.55, svr4_termios.c:1.13, svr4_ttold.c:1.15,
  svr4_util.h:1.10, ext2_alloc.c:1.43, i386/i386/trap.c:1.279,
  vm86.c:1.58, unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133:

  Now that Giant is acquired in uprintf() and tprintf(), the caller no
  longer needs to acquire Giant unless it also holds another mutex that
  would generate a lock order reversal when calling into these functions.
  Specifically not backed out is the acquisition of Giant in nfs_socket.c
  and rpcclnt.c, where local mutexes are held and would otherwise violate
  the lock order with Giant.

  This aligns this code more with the eventual locking of ttys.
  Suggested by: bde
  [rwatson | 2005-09-28 | 1 file | -6/+0]
* Push Giant down in jails. Pass the MPSAFE flag to NDINIT, and keep track
  of whether or not Giant was picked up by the filesystem. Add
  VFS_LOCK_GIANT macros around vrele, as it's possible that this can call
  into the filesystem-specific VOP_INACTIVE code.

  Also, while we are here, remove the Giant assertion from the sysctl
  handler; we do not actually require Giant here, so we shouldn't assert
  it. Doing so will just complicate things when Giant is removed from the
  sysctl framework.
  [csjp | 2005-09-28 | 1 file | -16/+15]
* If KDB_STOP_NMI is compiled into the kernel, default
  debug.kdb.stop_cpus_with_nmi to 1 rather than 0.
  MFC after: 3 days
  [rwatson | 2005-09-27 | 1 file | -1/+1]
* In lockstatus(), don't lock and unlock the interlock when testing the
  sleep lock status while kdb_active, or we risk contending with the
  mutex on another CPU, resulting in a panic when using
  "show lockedvnods" while in DDB.
  MFC after: 3 days
  Reviewed by: jhb
  Reported by: kris
  [rwatson | 2005-09-27 | 1 file | -2/+8]
* No longer maintain mbstat statistics for the mbuf allocator; UMA
  statistics and libmemstat(3) are now used to track mbuf statistics.
  MFC after: 1 month
  [rwatson | 2005-09-27 | 1 file | -11/+0]
* Use the refcount API to manage the reference count for user credentials
  rather than using pool mutexes.
  Tested on: i386, alpha, sparc64
  [jhb | 2005-09-27 | 2 files | -18/+9]
* Use the reference count API to manage the reference counts for process
  limit structures rather than using pool mutexes to protect the
  reference counts.
  Tested on: i386, alpha, sparc64
  [jhb | 2005-09-27 | 1 file | -11/+4]
* Use the refcount API to implement reference counts on process argument
  structures rather than using a global mutex to protect the reference
  counts.
  Tested on: i386, alpha, sparc64
  [jhb | 2005-09-27 | 1 file | -11/+4]
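The three refcount commits above replace mutex-protected counters with atomic reference counts. A minimal userland sketch of that shape using C11 atomics: the `ref_*` helpers are hypothetical and only mirror the init/acquire/release split of FreeBSD's refcount(9) interface, and `struct cred_model` stands in for struct ucred.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

static inline void ref_init(atomic_uint *count, unsigned value)
{
    atomic_init(count, value);
}

static inline void ref_acquire(atomic_uint *count)
{
    /* The caller already holds a reference, so relaxed ordering suffices. */
    atomic_fetch_add_explicit(count, 1, memory_order_relaxed);
}

/* Returns true when the caller dropped the last reference. */
static inline bool ref_release(atomic_uint *count)
{
    unsigned old = atomic_fetch_sub_explicit(count, 1, memory_order_acq_rel);

    assert(old > 0);        /* releasing an unreferenced object is a bug */
    return old == 1;
}

/* Hypothetical credential object, standing in for struct ucred. */
struct cred_model {
    atomic_uint ref;
    int uid;
};

static struct cred_model *cred_alloc(int uid)
{
    struct cred_model *cr = malloc(sizeof(*cr));

    if (cr != NULL) {
        ref_init(&cr->ref, 1);
        cr->uid = uid;
    }
    return cr;
}

static struct cred_model *cred_hold(struct cred_model *cr)
{
    ref_acquire(&cr->ref);
    return cr;
}

/* Frees the object on the final release; no mutex is involved. */
static bool cred_free(struct cred_model *cr)
{
    if (ref_release(&cr->ref)) {
        free(cr);
        return true;
    }
    return false;
}
```

The release path uses acquire-release ordering so that writes made by other holders are visible before the final owner frees the object.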
* Update the "created from" section to reflect the most recent version of
  syscalls.master.
  Requested by: jhb
  [csjp | 2005-09-27 | 2 files | -2/+2]
* Mark the extended attribute syscalls as being MP safe.
  Requested by: jhb
  [csjp | 2005-09-27 | 1 file | -13/+13]
* Add the spin lock used by the binary nvidia driver to the static lock
  order list so that WITNESS and the driver play together nicely.
  Tested by: Harald Schmalzbauer
  MFC after: 3 days
  [jhb | 2005-09-26 | 1 file | -0/+1]
* Add "show allpcpu" to DDB, which prints the current CPU id followed by
  the per-cpu data for all CPUs. This is easier to ask users to do than
  "figure out how many CPUs you have, now run show pcpu, then run it once
  for each CPU you have".
  MFC after: 3 days
  [rwatson | 2005-09-26 | 1 file | -12/+36]
* Reorder statements to avoid accessing unknown memory. In theory,
  invoking kenv with a very long string can panic the kernel.
  [davidxu | 2005-09-26 | 1 file | -2/+2]
* Acquire Giant in uprintf() and tprintf() rather than asserting it. In
  the vast majority of cases, these functions are called without mutexes
  held, meaning that in all but two cases, there will be no ordering
  issues with doing this, and it will eliminate the need for changes in
  the caller. In two cases, mutexes are held, so Giant must be acquired
  before those mutexes such that uprintf() and tprintf() recurse Giant
  rather than generating a lock order reversal.
  Suggested by: bde
  [rwatson | 2005-09-26 | 1 file | -6/+11]
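Acquiring Giant inside the function only avoids deadlock for callers that already hold it because Giant may be recursed. A minimal userland sketch, assuming a POSIX recursive mutex as a stand-in for Giant; `model_uprintf()` and `tty_writes` are hypothetical names, and the tty output is reduced to a counter.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

/* A recursive mutex standing in for Giant. */
static pthread_mutex_t giant;

static void giant_setup(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&giant, &attr);
    pthread_mutexattr_destroy(&attr);
}

static int tty_writes;      /* counts calls into the "non-MPSAFE" tty layer */

/*
 * Models the new uprintf(): instead of asserting that the caller holds
 * Giant, it acquires Giant itself. A caller that already holds Giant
 * simply recurses on it.
 */
static void model_uprintf(const char *msg)
{
    (void)msg;
    pthread_mutex_lock(&giant);
    tty_writes++;           /* the tty code would run here */
    pthread_mutex_unlock(&giant);
}
```

A caller that also holds some other mutex would have to take Giant before that mutex, as the commit notes; the recursive lock only removes the need to change callers that hold nothing or hold Giant alone.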
* Add rman_is_region_manager() for the benefit of an alpha hack.
  [phk | 2005-09-25 | 1 file | -0/+7]
* Implement new world order in VFS locking for extended attributes. This
  will remove the unconditional acquisition of Giant for extended
  attribute related operations. If the file system is set as being MP
  safe and debug.mpsafevfs is 1, do not pick up Giant.

  Mark the following system calls as being MP safe so we no longer pick
  up Giant in the system call handler:
  o extattrctl
  o extattr_set_file
  o extattr_get_file
  o extattr_delete_file
  o extattr_set_fd
  o extattr_get_fd
  o extattr_delete_fd
  o extattr_set_link
  o extattr_get_link
  o extattr_delete_link
  o extattr_list_file
  o extattr_list_link
  o extattr_list_fd

  - Pass MPSAFE flags to namei(9) lookups and introduce a vfslocked
    variable which will keep track of any Giant acquisitions.
  - Wrap any fd operations which manipulate vnodes in
    VFS_{UN}LOCK_GIANT.
  - Drop VFS_ASSERT_GIANT into functions which operate on vnodes to
    ensure that we are sufficiently protected.

  I've tested these changes with various TrustedBSD MAC policies which
  use extended attributes heavily, on SMP and UP systems (thanks to Scott
  Long for making some SMP hardware available to me for testing).
  Discussed with: jeff
  Requested by: jhb, rwatson
  [csjp | 2005-09-24 | 3 files | -71/+137]
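The `vfslocked` pattern described above, reduced to a userland sketch. `vfs_lock_giant()`/`vfs_unlock_giant()` are hypothetical models of the VFS_LOCK_GIANT/VFS_UNLOCK_GIANT macros, `mpsafevfs` models the debug.mpsafevfs sysctl, and `struct mount_model` is invented; the point is that the unlock side releases Giant only if the lock side actually took it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t giant = PTHREAD_MUTEX_INITIALIZER;
static int mpsafevfs = 1;   /* models the debug.mpsafevfs sysctl */
static int giant_depth;     /* how many times Giant is held, for observation */

struct mount_model {
    bool mpsafe;            /* models an "MP safe file system" flag */
};

/* Models VFS_LOCK_GIANT(): take Giant only when the file system needs it. */
static int vfs_lock_giant(const struct mount_model *mp)
{
    if (mpsafevfs && mp->mpsafe)
        return 0;
    pthread_mutex_lock(&giant);
    giant_depth++;
    return 1;
}

/* Models VFS_UNLOCK_GIANT(): release only what vfs_lock_giant() took. */
static void vfs_unlock_giant(int vfslocked)
{
    if (vfslocked) {
        giant_depth--;
        pthread_mutex_unlock(&giant);
    }
}

/* A syscall body following the pattern: record, operate, release. */
static int extattr_op(const struct mount_model *mp)
{
    int vfslocked = vfs_lock_giant(mp);
    int held = giant_depth; /* the vnode operation would run here */

    vfs_unlock_giant(vfslocked);
    return held;
}
```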
* Split struct resource in an external and internal part.

  The external part is still called 'struct resource' but its contents
  are now visible to drivers etc. This makes it part of the device driver
  ABI, so it should not be changed lightly. A comment to this effect is
  in place.

  The internal part is called 'struct resource_i' and contains its
  external counterpart as one field.

  Move the bus_space tag+handle into the external struct resource; this
  removes the need for device drivers to even know about these fields in
  order to use bus_space to access hardware. (More in following commit.)
  [phk | 2005-09-24 | 1 file | -40/+57]
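Embedding the public struct as a field of the private one lets the resource manager hand drivers a pointer to the public part and recover the private wrapper later. A minimal sketch of that layout with a container-of style helper; the struct and field names are hypothetical and do not reproduce the real struct resource/struct resource_i definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Public part, visible to drivers (hypothetical fields). */
struct resource_pub {
    unsigned long start, end;
    void *bustag;           /* models the bus_space tag */
    void *bushandle;        /* models the bus_space handle */
};

/* Private part; embeds the public struct as one field. */
struct resource_priv {
    struct resource_pub r;  /* handed out to drivers as &priv->r */
    int flags;              /* manager-only state */
    struct resource_priv *next;
};

/* Recover the private wrapper from the public pointer (container-of). */
static struct resource_priv *to_priv(struct resource_pub *r)
{
    return (struct resource_priv *)((char *)r -
        offsetof(struct resource_priv, r));
}

/* Driver-facing helper: only touches the public fields. */
static unsigned long resource_size(const struct resource_pub *r)
{
    return r->end - r->start + 1;
}
```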
* Add two convenience functions for device drivers: bus_alloc_resources()
  and bus_free_resources(). These functions take a list of resources and
  handle them all in one go. A flag makes it possible to mark a resource
  as optional. A typical device driver can save 10-30 lines of code by
  using these. Usage examples will follow RSN.
  MFC: A good idea, eventually.
  [phk | 2005-09-24 | 1 file | -0/+31]
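The list-driven allocation can be sketched as follows. This is a userland model loosely shaped after the description, not the kernel functions: `struct spec`, `SPEC_OPTIONAL`, `alloc_one()` and the fake pool are hypothetical. A missing optional entry is left NULL, and a missing mandatory entry releases everything already allocated.

```c
#include <assert.h>
#include <stddef.h>

#define SPEC_OPTIONAL 0x1   /* hypothetical "this resource is optional" flag */

struct spec {
    int type;               /* resource type; -1 terminates the list */
    int rid;
    int flags;
};

/* Hypothetical single-resource allocator: only rids 0 and 1 "exist". */
#define POOL_AVAIL 2
static int fake_pool[POOL_AVAIL];

static int *alloc_one(const struct spec *s)
{
    if (s->rid < 0 || s->rid >= POOL_AVAIL)
        return NULL;
    return &fake_pool[s->rid];
}

static void free_one(int *r) { (void)r; /* nothing to release in this model */ }

/* Free every non-NULL entry corresponding to the spec list. */
static void free_all(const struct spec *specs, int **res)
{
    for (size_t i = 0; specs[i].type != -1; i++) {
        if (res[i] != NULL) {
            free_one(res[i]);
            res[i] = NULL;
        }
    }
}

/* Allocate every resource in the list, undoing all work on failure. */
static int alloc_all(const struct spec *specs, int **res)
{
    size_t i;

    for (i = 0; specs[i].type != -1; i++)
        res[i] = NULL;
    for (i = 0; specs[i].type != -1; i++) {
        res[i] = alloc_one(&specs[i]);
        if (res[i] == NULL && (specs[i].flags & SPEC_OPTIONAL) == 0) {
            free_all(specs, res);
            return -1;
        }
    }
    return 0;
}
```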
* Canonicalize the UNIX domain socket copyright layout: original holders
  before more recent holders.
  MFC after: 3 days
  [rwatson | 2005-09-23 | 1 file | -2/+3]
* Don't pretend to be thread0 when calling sync(). It confuses the lock
  manager since in some places thread0 is then used for vnode locking
  while curthread is used for vnode unlocking.
  Found by: Yahoo!
  Reviewed by: ps@,jhb@
  MFC after: 3 days
  [ups | 2005-09-22 | 1 file | -2/+2]
* Temporarily disable nice threshold detection code, as it can starve a
  thread holding a critical resource, e.g. a mutex or other implicit
  synchronization flags. Give a thread which exceeds the nice threshold a
  minimum time slice.
  PR: kern/86087
  [davidxu | 2005-09-22 | 1 file | -1/+3]
* Use correct VFS locking rather than unconditionally grabbing Giant
  around namei() calls in kern_alternate_path().
  Reviewed by: csjp
  MFC after: 1 week
  [jhb | 2005-09-21 | 1 file | -12/+8]
* Pass 'curthread' into VFS_STATFS() from acctwatch(), rather than
  passing NULL. The NFS client expects that a thread will always be
  present for a VOP so that it can check for signal conditions, and will
  dereference a NULL pointer if one isn't present.
  MFC after: 3 days
  [rwatson | 2005-09-21 | 1 file | -2/+2]
* Correct an incorrect comment from the dawn of time: neither tprintf()
  nor uprintf() is believed to perform tsleep() or msleep() as written,
  as ttycheckoutq() is called with '0' as its sleep argument.

  Remove recently added WITNESS warnings for sleep as the comment was
  incorrect. This should silence a warning from the nfs_timer() code.
  Discussed with: bde
  [rwatson | 2005-09-20 | 1 file | -10/+2]
* Start time_uptime with 1 instead of 0.
  Discussed with: phk
  [andre | 2005-09-19 | 1 file | -1/+1]
* Revamp DEVFS internals pretty severely [1].

  Give DEVFS a proper inode called struct cdev_priv. It is important to
  keep in mind that this "inode" is shared between all DEVFS mountpoints,
  therefore it is protected by the global device mutex.

  Link the cdev_priv's into a list, protected by the global device mutex.
  Keep track of each cdev_priv's state with a flag bit and of references
  from mountpoints with a dedicated usecount.

  Reap the benefits of the much improved kernel memory allocator and the
  generally better defined device driver APIs to get rid of the tables of
  pointers + serial numbers, their overflow tables, the atomics to muck
  about in them and all the trouble that resulted in. This makes RAM the
  only limit on how many devices we can have.

  The cdev_priv is actually a super struct containing the normal cdev as
  the "public" part, and therefore allocation and freeing has moved to
  devfs_devs.c from kern_conf.c.

  The overall responsibility is (to be) split such that kern/kern_conf.c
  is the stuff that deals with drivers and struct cdev and fs/devfs
  handles filesystems and struct cdev_priv and their private liaison
  exposed only in devfs_int.h.

  Move the inode number from cdev to cdev_priv and allocate inode numbers
  properly with unr. Local dirents in the mountpoints (directories,
  symlinks) allocate inodes from the same pool to guarantee against
  overlaps.

  Various other fields are going to migrate from cdev to cdev_priv in the
  future in order to hide them. A few fields may migrate from
  devfs_dirent to cdev_priv as well.

  Protect the DEVFS mountpoint with an sx lock instead of lockmgr; this
  lock also protects the directory tree of the mountpoint.

  Give each mountpoint a unique integer index, allocated with unr. Use it
  as an index into an array of devfs_dirent pointers in each cdev_priv.
  Initially the array points to a single element also inside cdev_priv,
  but as more devfs instances are mounted, the array is extended with
  malloc(9) as necessary when the filesystem populates its directory
  tree.

  Retire the cdev alias lists; the cdev_priv now knows about all the
  relevant devfs_dirents (and their vnodes) and devfs_revoke() will pick
  them up from there. We still spelunk into other mountpoints and fondle
  their data without 100% good locking. It may make better sense to
  vector the revoke event into the tty code and there do a
  destroy_dev/make_dev on the tty's devices, but that's for further
  study.

  Lots of shuffling of stuff and churn of bits for no good reason [2].

  XXX: There is still nothing preventing the dev_clone EVENTHANDLER from
  being invoked at the same time in two devfs mountpoints. It is not
  obvious what the best course of action is here.

  XXX: Comment out an if statement that lost its body, until I can find
  out what should go there so it doesn't do damage in the meantime.

  XXX: Leave in a few extra malloc types and KASSERTs to help track down
  any remaining issues.

  Much testing provided by: Kris
  Much confusion caused by (races in): md(4)

  [1] You are not supposed to understand anything past this point.
  [2] This line should simplify life for the peanut gallery.
  [phk | 2005-09-19 | 1 file | -39/+27]
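The per-mountpoint array described above starts out pointing at storage inside the cdev_priv and only moves to the heap once a higher mount index is needed. A minimal sketch, with hypothetical names (`struct cdev_priv_model`, `cdp_grow()`) and calloc()/free() standing in for malloc(9); allocation of the mount index itself (unr in the commit) is not modelled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct dirent_model { int mount_idx; };     /* hypothetical per-mount entry */

struct cdev_priv_model {
    struct dirent_model **maps;     /* indexed by mountpoint index */
    int nmaps;                      /* number of slots in maps */
    struct dirent_model *single;    /* inline slot for the first mount */
};

static void cdp_init(struct cdev_priv_model *cdp)
{
    cdp->single = NULL;
    cdp->maps = &cdp->single;       /* initially one slot, inside the struct */
    cdp->nmaps = 1;
}

/* Make sure slot idx exists, moving the array to the heap if needed. */
static int cdp_grow(struct cdev_priv_model *cdp, int idx)
{
    struct dirent_model **n;

    if (idx < cdp->nmaps)
        return 0;
    n = calloc((size_t)idx + 1, sizeof(*n));  /* new slots start as NULL */
    if (n == NULL)
        return -1;
    memcpy(n, cdp->maps, (size_t)cdp->nmaps * sizeof(*n));
    if (cdp->maps != &cdp->single)
        free(cdp->maps);
    cdp->maps = n;
    cdp->nmaps = idx + 1;
    return 0;
}

static void cdp_destroy(struct cdev_priv_model *cdp)
{
    if (cdp->maps != &cdp->single)
        free(cdp->maps);
    cdp_init(cdp);
}
```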
* Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and
  tprintf(), as they both interact with the tty code (!MPSAFE) and may
  sleep if the tty buffer is full (per comment).

  Modify all consumers of uprintf() and tprintf() to hold Giant around
  calls into these functions. In most cases, this means adding an
  acquisition of Giant immediately around the function. In some cases
  (nfs_timer()), it means acquiring Giant higher up in the callout.

  With these changes, UFS no longer panics on SMP when either blocks are
  exhausted or inodes are exhausted under load due to races in the tty
  code when running without Giant.

  NB: Some reduction in calls to uprintf() in the svr4 code is probably
  desirable.

  NB: In the case of nfs_timer(), calling uprintf() while holding a
  mutex, or even in a callout at all, is a bad idea, and will generate
  warnings and potential upset. This needs to be fixed, but was a problem
  before this change.

  NB: uprintf()/tprintf() sleeping is generally a bad idea, as is having
  non-MPSAFE tty code.

  MFC after: 1 week
  [rwatson | 2005-09-19 | 2 files | -0/+14]
* Remove mac_create_root_mount() and mpo_create_root_mount(), which
  provided access to the root file system before the start of the init
  process. This was used briefly by SEBSD before it knew about preloading
  data in the loader, and using that method to gain access to data
  earlier results in fewer inconsistencies in the approach. Policy
  modules still have access to the root file system creation event
  through the mac_create_mount() entry point.

  Removed now, and will be removed from RELENG_6, in order to avoid
  gaining third party policy dependencies on the entry point for the
  lifetime of the 6.x branch.

  MFC after: 3 days
  Submitted by: Chris Vance <Christopher dot Vance at SPARTA dot com>
  Sponsored by: SPARTA
  [rwatson | 2005-09-19 | 1 file | -4/+0]
* Move the UUID generator into its own function, called kern_uuidgen(),
  so that UUIDs can be generated from within the kernel. The uuidgen(2)
  syscall now allocates kernel memory, calls the generator, and does a
  copyout() for the whole UUID store. This change is in support of GPT.
  [marcel | 2005-09-18 | 1 file | -29/+39]
* Add three new read-only socket options, which allow regression tests
  and other applications to query the state of the stack regarding the
  accept queue on a listen socket:

  SO_LISTENQLIMIT   Return the value of so_qlimit (socket backlog)
  SO_LISTENQLEN     Return the value of so_qlen (complete sockets)
  SO_LISTENINCQLEN  Return the value of so_incqlen (incomplete sockets)

  Minor white space tweaks to existing socket options to make them
  consistent.

  Discussed with: andre
  MFC after: 1 week
  [rwatson | 2005-09-18 | 1 file | -0/+17]
* Fix spelling in a comment.
  MFC after: 3 days
  [rwatson | 2005-09-18 | 1 file | -1/+1]
* Re-comment sbcompress() to explain what it does; it took me quite a
  bit of reading to figure it out, and I want to avoid figuring it out
  again.

  Convert an if (foo) else printf("this is almost a panic") into a
  KASSERT.
  MFC after: 3 days
  [rwatson | 2005-09-18 | 2 files | -14/+40]
* MFp4: Expose device_probe_child()
  [imp | 2005-09-18 | 1 file | -1/+1]
* Implement new world order in VFS locking for ACLs. This will remove the
  unconditional acquisition of Giant for ACL related operations. If the
  file system is set as being MP safe and debug.mpsafevfs is 1, do not
  pick up Giant.

  For any operations which require namei(9) lookups:
  o __acl_get_file
  o __acl_get_link
  o __acl_set_file
  o __acl_set_link
  o __acl_delete_file
  o __acl_delete_link
  o __acl_aclcheck_file
  o __acl_aclcheck_link

  - Set the MPSAFE flag in NDINIT.
  - Initialize the vfslocked variable using the NDHASGIANT macro.

  For functions which operate on fds, make sure the operations are
  locked:
  o __acl_get_fd
  o __acl_set_fd
  o __acl_delete_fd
  o __acl_aclcheck_fd

  - Initialize vfslocked using VFS_LOCK_GIANT before we manipulate the
    vnode.

  Discussed with: jeff
  [csjp | 2005-09-17 | 3 files | -132/+135]
* Break out of loop if next buffer pointer has become invalid while
  flushing current buffer.
  Reviewed by: kan
  [tegge | 2005-09-16 | 1 file | -0/+15]
* Fix race condition that caused activation of an event to be ignored
  immediately after it was deactivated.
  Found by: Yahoo!
  MFC after: 3 days
  [ups | 2005-09-15 | 1 file | -2/+4]
* Oops, missed adding the required include.
  Pointy hat to: jhb
  [jhb | 2005-09-15 | 1 file | -0/+1]
* Replace the dont_sleep_in_callout mutex hack (similar to g_x{up,down})
  with the disallow sleeping facility.
  [jhb | 2005-09-15 | 1 file | -8/+2]
* Don't disallow sleeping for handlers on swi's since some swi handlers
  (like CAM) do sleep in their handlers.
  Requested by: scottl
  [jhb | 2005-09-15 | 1 file | -2/+4]
* - Enforce an implicit lock order that Giant cannot be locked while
    holding any other non-sleepable lock. In plain English: Giant comes
    before all other mutexes.
  - Add some extra description to the lock order reversal printf's to
    indicate when a reversal is triggered by a hard-coded implicit rule.
  Requested by: truckman (2)
  MFC after: 1 week
  [jhb | 2005-09-15 | 1 file | -1/+17]
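The implicit rule can be expressed as a check made before Giant is acquired. A minimal userland sketch with hypothetical names: a thread-local counter tracks other (non-sleepable) mutexes held, and the checked lock function returns -1 instead of printing a WITNESS-style warning so that the behavior is easy to observe.

```c
#include <assert.h>
#include <pthread.h>

/* Per-thread bookkeeping; hypothetical, WITNESS keeps far richer state. */
static _Thread_local int nonsleep_held;     /* other mutexes held */
static _Thread_local int giant_held;

static pthread_mutex_t giant = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t other = PTHREAD_MUTEX_INITIALIZER;

static void other_lock(void)   { pthread_mutex_lock(&other); nonsleep_held++; }
static void other_unlock(void) { nonsleep_held--; pthread_mutex_unlock(&other); }

/* Returns 0 on success, -1 if the implicit rule would be violated. */
static int giant_lock_checked(void)
{
    if (nonsleep_held > 0)
        return -1;          /* Giant must come before all other mutexes */
    pthread_mutex_lock(&giant);
    giant_held++;
    return 0;
}

static void giant_unlock(void)
{
    giant_held--;
    pthread_mutex_unlock(&giant);
}
```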
* - Add a new simple facility for marking the current thread as being in
    a state where sleeping on a sleep queue is not allowed. The facility
    doesn't support recursion but uses a simple private per-thread flag
    (TDP_NOSLEEPING). The sleepq_add() function will panic if the flag
    is set and INVARIANTS is enabled.
  - Use this new facility to replace the g_xup and g_xdown mutexes that
    were (ab)used to achieve similar behavior.
  - Disallow sleeping in interrupt threads when invoking interrupt
    handlers.
  MFC after: 1 week
  Reviewed by: phk
  [jhb | 2005-09-15 | 2 files | -0/+6]
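A minimal userland sketch of the no-sleeping facility: a thread-local flag word stands in for the per-thread private flags, and `model_sleep()` counts refused sleeps where the kernel would panic under INVARIANTS. All names here are hypothetical; as in the description, nesting is not supported, which the asserts enforce.

```c
#include <assert.h>

#define P_NOSLEEPING 0x1    /* hypothetical stand-in for TDP_NOSLEEPING */

static _Thread_local int td_pflags;     /* models per-thread private flags */
static int sleeps_refused;

/* Enter a no-sleeping section; recursion is not supported. */
static void no_sleeping(void)
{
    assert((td_pflags & P_NOSLEEPING) == 0);
    td_pflags |= P_NOSLEEPING;
}

static void allow_sleeping(void)
{
    assert((td_pflags & P_NOSLEEPING) != 0);
    td_pflags &= ~P_NOSLEEPING;
}

/* Models the check in sleepq_add(): refuse (here: count) instead of panic. */
static int model_sleep(void)
{
    if (td_pflags & P_NOSLEEPING) {
        sleeps_refused++;
        return -1;
    }
    return 0;               /* a real sleep would happen here */
}

/* An example handler that (wrongly) tries to sleep. */
static void sleepy_handler(void) { (void)model_sleep(); }

/* Models an interrupt thread invoking a handler with sleeping disallowed. */
static void run_intr_handler(void (*handler)(void))
{
    no_sleeping();
    handler();
    allow_sleeping();
}
```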