path: root/sys/kern
Commit message (author, date, files changed, lines -removed/+added)
* Make the DIAGNOSTIC code which complains about long {call|time}out(9)
  functions less noisy: we printf if a new function took longer than the
  previous record holder, or if the previous record holder took more than
  twice as long as the current record.
  (phk, 2003-12-07, 1 file, -5/+11)
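A minimal userland sketch of one reading of the rule above: a report is due when a callout beats the current record and is either a different function or has more than doubled its own previous record. The names `record_fn`, `record_ns` and `callout_time_check()` are hypothetical, functions are identified by address, times are plain nanosecond counts, and the sketch only decides whether to print; it is not the kern_timeout.c code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record-holder state (illustrative names only). */
static const void *record_fn;	/* function that set the current record */
static uint64_t record_ns;	/* its longest observed run time, in ns */

/*
 * Called after a callout function `fn` ran for `ns` nanoseconds.
 * Returns true when a diagnostic should be printed; always keeps
 * the record up to date.
 */
static bool
callout_time_check(const void *fn, uint64_t ns)
{
	bool report = false;

	if (ns > record_ns) {
		/*
		 * Report a new record holder, or the same holder once it
		 * has more than doubled its own previous record.
		 */
		if (fn != record_fn || ns > 2 * record_ns)
			report = true;
		record_fn = fn;
		record_ns = ns;
	}
	return (report);
}
```

Printing only on these transitions keeps a slowly creeping record holder from emitting a message on every small increase.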
* Regen due to kse_switchin(2).
  (marcel, 2003-12-07, 2 files, -2/+4)
* Add kse_switchin(2). This syscall can be used by KSE implementations
  to have the kernel switch to a new thread, instead of doing it in
  userland. It is in fact needed on ia64, where syscall restarts do not
  return to userland first. It's completely handled inside the kernel.
  As such, any context created by the kernel as part of an upcall and
  caused by some syscall needs to be restored by the kernel.
  (marcel, 2003-12-07, 3 files, -0/+50)
* rqb_bits[] may be an int64_t (e.g. on alpha, and recently on amd64).
  Be sure to shift (long)1 << 33 and higher, not (int)1. Otherwise bad
  things happen(TM). This is why beast.freebsd.org panicked with ULE.
  Reviewed by: jeff
  (peter, 2003-12-07, 1 file, -1/+1)
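A sketch of the widening fix, assuming 64-bit status words. The macro names mirror those used for run-queue status bits, but these definitions are illustrative, not copied from the tree. With an `int` constant, `1 << 33` is undefined behavior in C; casting the constant to the 64-bit word type first makes every bit position 0..63 reachable.

```c
#include <stdbool.h>
#include <stdint.h>

#define RQB_BPW	64		/* bits per status word (assumed 64-bit) */
#define RQB_LEN	4		/* words; covers 256 priorities */

typedef uint64_t rqb_word_t;

/*
 * Widen the constant before shifting: (int)1 << 33 is undefined,
 * whereas (rqb_word_t)1 << 33 sets bit 33 of the 64-bit word.
 */
#define RQB_BIT(pri)	((rqb_word_t)1 << ((pri) & (RQB_BPW - 1)))
#define RQB_WORD(pri)	((pri) / RQB_BPW)

/* Mark priority `pri` as having runnable threads. */
static void
rqb_setbit(rqb_word_t bits[], int pri)
{
	bits[RQB_WORD(pri)] |= RQB_BIT(pri);
}

/* Test whether priority `pri` is marked. */
static bool
rqb_isset(const rqb_word_t bits[], int pri)
{
	return ((bits[RQB_WORD(pri)] & RQB_BIT(pri)) != 0);
}
```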
* Re-arrange and consolidate some random debugging stuff.
  (scottl, 2003-12-07, 1 file, -0/+53)
* - Giant is no longer required by vm_thread_new().
  (alc, 2003-12-07, 2 files, -4/+0)
* Rename the mac_create_cred() MAC Framework entry point to
  mac_copy_cred(), and the mpo_create_cred() MAC policy entry point to
  mpo_copy_cred_label(). This is more consistent with similar entry
  points for creation and label copying, as mac_create_cred() was called
  from crdup() as opposed to during process creation. For a number of
  policies, this removes the requirement for special handling when
  copying credential labels, and improves consistency.
  Approved by: re (scottl)
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
  (rwatson, 2003-12-06, 1 file, -1/+1)
* Fix all users of mp_maxid to use the same semantics, namely:
  1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1.
  2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid.
  Approved by: re (scottl)
  Tested on: i386, amd64, alpha
  (jhb, 2003-12-03, 1 file, -1/+1)
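A sketch of the iteration idiom these semantics allow, using simulated stand-ins for `all_cpus`, `mp_maxid` and `CPU_ABSENT()` (the mask-based definition here is an assumption for illustration). Because mp_maxid is itself a valid CPU ID, the loop bound is `<=`, and sparse CPU IDs are skipped rather than assumed to be contiguous.

```c
#include <stdint.h>

#define MAXCPU	32		/* assumed compile-time bound on CPU IDs */

/* Simulated globals standing in for the kernel's (set by the caller). */
static uint32_t all_cpus;	/* bitmask of present CPUs */
static int mp_maxid;		/* highest valid CPU ID, 0 .. MAXCPU - 1 */

#define CPU_ABSENT(id)	((all_cpus & (1u << (id))) == 0)

/* Visit every possible CPU ID, skipping holes in the ID space. */
static int
count_present_cpus(void)
{
	int id, n = 0;

	for (id = 0; id <= mp_maxid; id++) {
		if (CPU_ABSENT(id))
			continue;
		n++;
	}
	return (n);
}
```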
* Export a few SMP-related symbols in UP kernels as well. This is needed
  so that other kernel code, especially code which can be in a module
  such as the acpi_cpu(4) driver, works properly with both SMP and UP
  kernels. The exported symbols include mp_ncpus, all_cpus, mp_maxid,
  smp_started, and the smp_rendezvous() function. This also means that
  CPU_ABSENT() is now always implemented the same way on all kernels.
  Approved by: re (scottl)
  (jhb, 2003-12-03, 1 file, -0/+36)
* Fixed a bug in sendfile(2) where the sent data would be corrupted due
  to sendfile(2) being erroneously restarted automatically after a
  signal was delivered. Fixed by converting ERESTART to EINTR prior to
  exiting. Updated the manual page to document the potential EINTR
  error, its cause and its consequences.
  Approved by: re@freebsd.org
  (dg, 2003-12-01, 1 file, -0/+5)
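A minimal sketch of the exit-path conversion. `sendfile_exit_error()` is a hypothetical name, and the fallback `ERESTART` value is only a placeholder for systems whose userland headers do not define it (in FreeBSD it is kernel-internal). Rewriting the error stops the system call from being transparently re-executed with its original offset, which would send already-queued bytes a second time.

```c
#include <errno.h>

#ifndef ERESTART
#define ERESTART	(-1)	/* placeholder; kernel-internal on FreeBSD */
#endif

/*
 * Hypothetical exit path for a sendfile-like call.  Part of the file
 * may already be queued on the socket, so an automatic restart with
 * the original offset would duplicate data.  EINTR instead returns
 * control to the caller, which can resume from the reported count.
 */
static int
sendfile_exit_error(int error)
{
	if (error == ERESTART)
		error = EINTR;
	return (error);
}
```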
* In dounmount(), only call checkdirs() prior to VFS_UNMOUNT() in the
  forced unmount case. Otherwise, a file system that is referenced only
  by process fd_cdir/fd_rdir references to the file system root vnode
  will be successfully unmounted without the MNT_FORCE flag. The
  previous behaviour was not compatible with the unmount semantics
  required by amd(8), so file systems could be unexpectedly unmounted
  while there were still references to the file system root directory.
  Reported by: Erez Zadok <ezk@cs.sunysb.edu>
  Approved by: re (scottl)
  (iedowse, 2003-11-30, 1 file, -3/+7)
* - Don't forget to unlock the vnode interlock in the LK_NOWAIT case.
  Submitted by: Stephan Uphoff <ups@stups.com>
  Approved by: re (rwatson)
  (jeff, 2003-11-30, 1 file, -1/+2)
* Do not attempt to destroy a NULL vfs options list.
  Approved by: re (scottl)
  Reported by: Christian Laursen <xi at borderworlds dot dk>
  (kan, 2003-11-23, 1 file, -1/+1)
* - Split cpu_mp_probe() into two parts. cpu_mp_setmaxid() is still
    called very early (SI_SUB_TUNABLES - 1) and is responsible for
    setting mp_maxid. cpu_mp_probe() is now called at SI_SUB_CPU; it
    determines whether SMP is actually present and sets mp_ncpus and
    all_cpus. Splitting these up allows an architecture to probe CPUs
    later than SI_SUB_TUNABLES by just setting mp_maxid to MAXCPU in
    cpu_mp_setmaxid(). This could allow the CPU probing code to live in
    a module, for example, since SYSINITs in modules cannot be invoked
    prior to SI_SUB_KLD. This is needed to re-enable the ACPI module on
    i386.
  - For the alpha SMP probing code, use LOCATE_PCS() instead of
    duplicating its contents in a few places. Also, add a
    smp_cpu_enabled() function to avoid duplicating some code. There is
    room for further code reduction later, since much of this code is
    also present in cpu_mp_start().
  - All archs besides i386 still set mp_maxid to the same values they
    set it to before this change. i386 now sets mp_maxid to MAXCPU.
  Tested on: alpha, amd64, i386, ia64, sparc64
  Approved by: re (scottl)
  (jhb, 2003-11-21, 1 file, -6/+6)
* Fix a major faux pas of mine. I was causing 2 very bad things to
  happen in interrupt context: 1) sleep locks, and 2) malloc/free calls.
  1) is fixed by using spin locks instead.
  2) is fixed by preallocating a FIFO (implemented with a STAILQ) and
  using elements from this FIFO instead. This turns out to be rather
  fast.
  OK'ed by: re (scottl)
  Thanks to: peter, jhb, rwatson, jake
  Apologies to: *
  (markm, 2003-11-20, 2 files, -4/+2)
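One way to build the preallocated FIFO described above, sketched in userland with the `<sys/queue.h>` STAILQ macros (available on the BSDs and in glibc). The pool size, struct layout and function names are hypothetical. Entries cycle between an empty list and a full list, so the producer never calls malloc(); in the kernel each list operation would run under a spin lock, which this single-threaded sketch omits.

```c
#include <stddef.h>
#include <sys/queue.h>

#define HARVEST_POOL	8	/* hypothetical number of preallocated slots */

/* One queued entropy sample; fields are illustrative. */
struct harvest {
	unsigned int value;
	STAILQ_ENTRY(harvest) next;
};

static struct harvest pool[HARVEST_POOL];	/* allocated once, up front */
static STAILQ_HEAD(, harvest) emptyfifo;	/* slots ready to be filled */
static STAILQ_HEAD(, harvest) fullfifo;		/* samples waiting to be used */

static void
harvest_init(void)
{
	int i;

	STAILQ_INIT(&emptyfifo);
	STAILQ_INIT(&fullfifo);
	for (i = 0; i < HARVEST_POOL; i++)
		STAILQ_INSERT_TAIL(&emptyfifo, &pool[i], next);
}

/* Producer (interrupt context): no malloc, no sleeping; drops on overflow. */
static int
harvest_add(unsigned int value)
{
	struct harvest *h;

	if (STAILQ_EMPTY(&emptyfifo))
		return (0);		/* pool exhausted: sample dropped */
	h = STAILQ_FIRST(&emptyfifo);
	STAILQ_REMOVE_HEAD(&emptyfifo, next);
	h->value = value;
	STAILQ_INSERT_TAIL(&fullfifo, h, next);
	return (1);
}

/* Consumer: take the oldest sample and recycle its slot. */
static int
harvest_take(unsigned int *value)
{
	struct harvest *h;

	if (STAILQ_EMPTY(&fullfifo))
		return (0);
	h = STAILQ_FIRST(&fullfifo);
	STAILQ_REMOVE_HEAD(&fullfifo, next);
	*value = h->value;
	STAILQ_INSERT_TAIL(&emptyfifo, h, next);
	return (1);
}
```

Dropping a sample when the pool is full trades a little entropy for a bounded, allocation-free interrupt path.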
* Hackfix to patch around a kernel panic I introduced. Real fix to
  follow. In the meantime, we are not harvesting interrupt entropy.
  Approved by: re (jhb)
  (markm, 2003-11-18, 1 file, -0/+4)
* Introduce a MAC label reference in 'struct inpcb', which caches the
  MAC label referenced from 'struct socket' in the IPv4 and IPv6-based
  protocols. This permits MAC labels to be checked during network
  delivery operations without dereferencing inp->inp_socket to get to
  so->so_label, which will eventually avoid our having to grab the
  socket lock during delivery at the network layer.
  This change introduces 'struct inpcb' as a labeled object to the MAC
  Framework, along with the normal circus of entry points:
  initialization, creation from socket, destruction, as well as a
  delivery access control check.
  For most policies, the inpcb label will simply be a cache of the
  socket label, so a new protocol switch method, pr_sosetlabel(), is
  introduced to notify protocols that the socket layer label has been
  updated so that the cache can be updated while holding appropriate
  locks. Most protocols implement this using pru_sosetlabel_null(), but
  IPv4/IPv6 protocols using inpcbs use the worker function
  in_pcbsosetlabel(), which calls into the MAC Framework to perform a
  cache update.
  Biba, LOMAC, and MLS implement these entry points, as do the stub
  policy and the test policy.
  Reviewed by: sam, bms
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
  (rwatson, 2003-11-18, 3 files, -1/+21)
* Add a sysctl, security.bsd.see_other_gids, similar in semantics to
  see_other_uids but with the logical conversion from user IDs to group
  IDs. This is based on (but not identical to) the patch submitted by
  Samy Al Bahra.
  Submitted by: Samy Al Bahra <samy@kerneled.com>
  (rwatson, 2003-11-17, 1 file, -2/+51)
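A sketch of what a group-based visibility check of this kind might look like, using a simplified credential structure. `struct cred`, `groupmember()` and `cr_seeothergids_sketch()` are hypothetical stand-ins, and returning ESRCH (so that hidden objects look nonexistent) is assumed by analogy with the uid-based check. When the sysctl is 0, an unprivileged caller sees another credential's objects only if the two share at least one group.

```c
#include <errno.h>
#include <stdbool.h>

#define NGROUPS_SKETCH	16		/* illustrative bound */

/* Simplified stand-in for struct ucred; fields are hypothetical. */
struct cred {
	int ngroups;
	unsigned int groups[NGROUPS_SKETCH];
	bool privileged;		/* stands in for a superuser check */
};

static int see_other_gids = 1;		/* models security.bsd.see_other_gids */

/* Is `gid` one of the groups in credential `c`? */
static bool
groupmember(unsigned int gid, const struct cred *c)
{
	for (int i = 0; i < c->ngroups; i++)
		if (c->groups[i] == gid)
			return (true);
	return (false);
}

/* 0 if u1 may see objects owned by u2, otherwise ESRCH. */
static int
cr_seeothergids_sketch(const struct cred *u1, const struct cred *u2)
{
	if (see_other_gids || u1->privileged)
		return (0);
	for (int i = 0; i < u1->ngroups; i++)
		if (groupmember(u1->groups[i], u2))
			return (0);
	return (ESRCH);
}
```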
* Initial landing of SMP support for FreeBSD/amd64.
  - This is heavily derived from John Baldwin's apic/pci cleanup on
    i386.
  - I have completely rewritten or drastically cleaned up some other
    parts (in particular, bootstrap).
  - This is still a WIP. It seems that there are some highly bogus
    BIOSes on nVidia nForce3-150 boards. I can't stress how broken these
    boards are. I have a workaround in mind, but right now the Asus SK8N
    is broken. The Gigabyte K8NPro (nVidia based) is also
    mind-numbingly hosed.
  - Most of my testing has been with SCHED_ULE. SCHED_4BSD works.
  - The apic and acpi components are 'standard'.
  - If you have an nVidia nForce3-150 board, you are stuck with
    'device atpic' in addition, because they somehow managed to forget
    to connect the 8254 timer to the apic, even though it's in the same
    silicon! ARGH! This directly violates the ACPI spec.
  (peter, 2003-11-17, 2 files, -3/+3)
* - Mark ksq_assigned as volatile so that when this code is used without
    sched_lock we can be sure that we'll pick up the new value.
  (jeff, 2003-11-17, 1 file, -3/+3)
* - Remove long-dead code. rslices hasn't been used in some time, and
    neither has sched_pickcpu().
  (jeff, 2003-11-17, 1 file, -52/+4)
* Expand the argument to the ithread enable/disable helper hooks from an
  int to something big enough to hold a pointer. amd64 needs this.
  (peter, 2003-11-17, 1 file, -3/+3)
* Implement sockets support for the __mac_get_fd() and __mac_set_fd()
  system calls, and prefer these calls over getsockopt()/setsockopt()
  for ABI reasons. When addressing UNIX domain sockets, these calls
  retrieve and modify the socket label, not the label of the rendezvous
  vnode.
  - Create the mac_copy_socket_label() entry point based on the
    mac_copy_pipe_label() entry point, intended to copy the socket label
    into temporary storage that doesn't require a socket lock to be held
    (currently Giant).
  - Implement mac_copy_socket_label() for various policies.
  - Expose the socket label allocation, free, internalize, and
    externalize entry points as non-static from mac_net.c.
  - Use mac_socket_label_set() in __mac_set_fd().
  MAC-aware applications may now use mac_get_fd(), mac_set_fd(), and
  mac_get_peer() to retrieve and set various socket labels without
  directly invoking the getsockopt() interface.
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
  (rwatson, 2003-11-16, 1 file, -0/+30)
* Reduce gratuitous redundancy and length in function names:
    mac_setsockopt_label_set()     -> mac_setsockopt_label()
    mac_getsockopt_label_get()     -> mac_getsockopt_label()
    mac_getsockopt_peerlabel_get() -> mac_getsockopt_peerlabel()
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
  (rwatson, 2003-11-16, 1 file, -7/+5)
* - Modify alpha's sf_buf implementation to use the direct
    virtual-to-physical mapping.
  - Move the sf_buf API to its own header file; make struct sf_buf's
    definition machine dependent. In this commit, we remove an
    unnecessary field from struct sf_buf on the alpha, amd64, and ia64.
    Ultimately, we may eliminate struct sf_buf on those architectures
    except as an opaque pointer that references a vm page.
  (alc, 2003-11-16, 2 files, -6/+9)
* When implementing getsockopt() for SO_LABEL and SO_PEERLABEL, make
  sure to sooptcopyin() the (struct mac) so that the MAC Framework knows
  which label types are being requested. This fixes process queries of
  socket labels.
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
  (rwatson, 2003-11-16, 1 file, -0/+8)
* Localized the cy driver's locking.
  (bde, 2003-11-16, 1 file, -3/+0)
* Rename the debugging mutex "callout_no_sleep" to
  "dont_sleep_in_callout".
  (phk, 2003-11-15, 1 file, -4/+4)
* Initialize sequence numbers to 0 in seminit() instead of using
  whatever garbage happens to be in memory. This did not seem to cause
  any problems except making semaphore IDs unpredictable (and ugly in
  ipcs(1) output).
  (tjr, 2003-11-15, 1 file, -0/+1)
* Send B_PHYS out to pasture; it no longer serves any function.
  (phk, 2003-11-15, 3 files, -11/+1)
* - Remove the remaining now unnecessary checks for the buf's b_object
    being NULL. See revision 1.421 for more detail.
  - Remove GIANT_REQUIRED from vfs_unbusy_pages().
  Discussed with: jeff
  (alc, 2003-11-15, 1 file, -10/+4)
* - Introduce kseq_runq_{add,rem}(), which are used to insert and remove
    KSEs from the run queues. Also, on SMP, we track the transferable
    count here. Threads are transferable only as long as they are on the
    run queue.
  - Previously, we adjusted our load balancing based on the transferable
    count minus the number of actual CPUs. This was done to account for
    the threads which were likely to be running. All of this logic is
    simpler now that transferable accounts for only those threads which
    can actually be taken. Updated various places in sched_add() and
    kseq_balance() to account for this.
  - Rename kseq_{add,rem} to kseq_load_{add,rem} to reflect what they're
    really doing. The load is accounted for separately from the runq
    because the load is accounted for even while the thread is running.
  - Fix a bug in sched_class() where we weren't properly using the
    PRI_BASE() version of the kg_pri_class.
  - Add a large comment that describes the impact of a seemingly simple
    conditional in sched_add().
  - Also in sched_add(), check the transferable count and
    KSE_CAN_MIGRATE() prior to checking kseq_idle. This reduces the
    frequency of access for kseq_idle, which is a shared resource.
  (jeff, 2003-11-15, 1 file, -61/+83)
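A heavily simplified sketch of the split accounting described above. The structure fields and signatures are hypothetical reductions of ULE's, and `ke_can_migrate` stands in for KSE_CAN_MIGRATE(). The transferable count changes only on run-queue insertion and removal, while the load count stays up for as long as the KSE is running.

```c
#include <stdbool.h>

/* Simplified stand-ins for ULE's per-CPU queue and KSE (hypothetical). */
struct kseq {
	int ksq_load;		/* queued plus running KSEs on this CPU */
	int ksq_transferable;	/* queued KSEs another CPU may take */
};

struct kse {
	bool ke_can_migrate;	/* stands in for KSE_CAN_MIGRATE() */
};

/* Run-queue membership: only queued KSEs can be stolen. */
static void
kseq_runq_add(struct kseq *kq, const struct kse *ke)
{
	if (ke->ke_can_migrate)
		kq->ksq_transferable++;
	/* ... insert into the run queue here ... */
}

static void
kseq_runq_rem(struct kseq *kq, const struct kse *ke)
{
	if (ke->ke_can_migrate)
		kq->ksq_transferable--;
	/* ... remove from the run queue here ... */
}

/* Load persists while the KSE runs, so it is tracked separately. */
static void
kseq_load_add(struct kseq *kq)
{
	kq->ksq_load++;
}

static void
kseq_load_rem(struct kseq *kq)
{
	kq->ksq_load--;
}
```

With this split, a balancer can read `ksq_transferable` directly instead of estimating how many queued threads are really available by subtracting the CPU count.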
* Better fix than my previous commit: in exit1(), make sure the p_klist
  is empty after sending NOTE_EXIT. The process won't report fork() or
  execve() and won't be able to handle NOTE_SIGNAL knotes anyway. This
  fixes some race conditions with do_tdsignal() calling knote() while
  the process is exiting.
  Reported by: Stefan Farfeleder <stefan@fafoe.narf.at>
  MFC after: 1 week
  (cognet, 2003-11-14, 2 files, -8/+9)
* Fix a number of style(9) bugs introduced in r1.113 by me.
  Suggested by: bde
  (kan, 2003-11-14, 1 file, -47/+46)
* - regen.
  (jeff, 2003-11-14, 2 files, -3/+3)
* - Revision 1.156 marked ptrace() SMP safe. Unfortunately, alpha
    implements parts of ptrace using proc_rwmem(). proc_rwmem() requires
    Giant, and Giant must be acquired prior to the proc lock, so ptrace
    must still require Giant.
  (jeff, 2003-11-14, 1 file, -1/+1)
* Various minor details:
  Give the HZ/overflow check a 10% margin.
  Eliminate a bogus newline.
  If timecounters have equal quality, prefer the higher frequency.
  Some inspiration from: bde
  (phk, 2003-11-13, 1 file, -8/+17)
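A sketch of the tie-breaking rule, using a minimal stand-in for struct timecounter that keeps only the `tc_name`, `tc_quality` and `tc_frequency` fields (the values in the usage check are made up). Quality decides first; frequency only breaks ties, favoring the counter with finer resolution.

```c
#include <stdint.h>

/* Minimal stand-in for struct timecounter (illustrative). */
struct tc {
	const char *tc_name;
	int tc_quality;
	uint64_t tc_frequency;		/* Hz */
};

/*
 * Return the preferred of two timecounters: higher quality wins;
 * on equal quality, the higher-frequency counter wins.
 */
static const struct tc *
tc_prefer(const struct tc *cur, const struct tc *cand)
{
	if (cand->tc_quality > cur->tc_quality)
		return (cand);
	if (cand->tc_quality == cur->tc_quality &&
	    cand->tc_frequency > cur->tc_frequency)
		return (cand);
	return (cur);
}
```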
* - Close a race where a thread on another CPU could release a contested
    lock and empty its turnstile while the blocking threads still
    pointed to the turnstile. If the thread on the first CPU blocked on
    a lock owned by one of the threads blocked on the turnstile just
    woken up, then the first CPU could try to manipulate a bogus thread
    queue in the turnstile during priority propagation.
  - Update locking notes for ts_owner and always clear ts_owner, not
    just under INVARIANTS.
  Tested by: sam (1)
  (jhb, 2003-11-12, 1 file, -4/+12)
* At the request of several developers, restore the DIAGNOSTIC code
  deleted in 1.81. Increase the initial timeout limit to 2ms to
  eliminate spurious messages about excessive timeouts in the NFS client
  code.
  Requested by: Poul-Henning Kamp <phk@phk.freebsd.dk>
  Requested by: Mike Silbersack <silby@silby.com>
  Requested by: Sam Leffler <sam@errno.com>
  (mckusick, 2003-11-12, 1 file, -0/+28)
* Mark __mac_get_pid() as MPSAFE in the comment, as it runs without
  Giant and is also MPSAFE. Push Giant further down into __mac_get_fd()
  and __mac_set_fd(), grabbing it only for constrained regions dealing
  with VFS, and dropping it entirely for operations related to labeling
  of pipes.
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
  (rwatson, 2003-11-12, 1 file, -37/+15)
* MNAMELEN is back to an int again after Kirk's statfs commit:
    kern/vfs_mount.c:1305: warning: signed size_t format, different type arg (arg 4)
    *** Error code 1
  (peter, 2003-11-12, 1 file, -1/+1)
* Fix a typo in a comment.
  Submitted by: das
  (jhb, 2003-11-12, 1 file, -1/+1)
* Replace the B_PHYS conditional assignment to bio_offset with a KASSERT
  check that the originating code already did it right.
  (phk, 2003-11-12, 1 file, -2/+7)
* Update the five files derived from /sys/kern/syscalls.master after the
  additions made for the new statfs structure (version 1.157). These
  must be updated in a separate checkin after syscalls.master has been
  checked in so that they reflect its new CVS identity. As these are
  purely derived files, it is not clear to me why they are under CVS at
  all. I presume that it has something to do with having `make world'
  operate properly.
  (mckusick, 2003-11-12, 2 files, -18/+18)
* Update the statfs structure with 64-bit fields to allow accurate
  reporting of multi-terabyte filesystem sizes.
  You should build and boot a new kernel BEFORE doing a `make world', as
  the new kernel will know about binaries using the old statfs
  structure, but an old kernel will not know about the new system calls
  that support the new statfs structure.
  Running an old kernel after a `make world' will cause programs such as
  `df' that do a statfs system call to fail with a bad system call.
  Reviewed by: Bruce Evans <bde@zeta.org.au>
  Reviewed by: Tim Robbins <tjr@freebsd.org>
  Reviewed by: Julian Elischer <julian@elischer.org>
  Reviewed by: the hordes of <arch@freebsd.org>
  Sponsored by: DARPA & NAI Labs.
  (mckusick, 2003-11-12, 5 files, -38/+606)
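Some quick arithmetic shows why the wider fields matter. The sketch assumes the old block-count fields were 32-bit signed `long`s, as on i386, and uses a 2 KiB block size purely for illustration: a 5 TiB filesystem then has 5 × 2^29 = 2,684,354,560 blocks, which exceeds INT32_MAX (2,147,483,647). The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Would a block count fit in a 32-bit signed field, as the old
 * statfs `long` fields were on ILP32 platforms such as i386?
 */
static bool
fits_old_field(uint64_t blocks)
{
	return (blocks <= INT32_MAX);
}

/* Total size from 64-bit fields, with no intermediate truncation. */
static uint64_t
fs_size_bytes(uint64_t blocks, uint64_t bsize)
{
	return (blocks * bsize);
}
```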
* Modify the MAC Framework so that instead of embedding a (struct label)
  in various kernel objects to represent security data, we embed a
  (struct label *) pointer, which now references labels allocated using
  a UMA zone (mac_label.c). This allows the size and shape of struct
  label to be varied without changing the size and shape of these kernel
  objects, which become part of the frozen ABI with 5-STABLE. This opens
  the door for boot-time selection of the number of label slots, and
  hence changes to the bound on the number of simultaneous labeled
  policies at boot-time instead of compile-time. This also makes it
  easier to embed label references in new objects as required for
  locking/caching with fine-grained network stack locking, such as
  inpcb structures.
  This change also moves us further in the direction of hiding the
  structure of kernel objects from MAC policy modules, not to mention
  dramatically reducing the number of '&' symbols appearing in both the
  MAC Framework and MAC policy modules, and improving readability.
  While this results in minimal performance change with MAC enabled, it
  will observably shrink the size of a number of critical kernel data
  structures (e.g., struct vnode, struct socket) for the !MAC case, and
  should have a small (but measurable) performance benefit due to memory
  conservation and the reduced cost of zeroing memory.
  NOTE: Users of MAC must recompile their kernel and all MAC modules as
  a result of this change. Because this is an API change, third party
  MAC modules will also need to be updated to make less use of the '&'
  symbol.
  Suggestions from: bmilekic
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
  (rwatson, 2003-11-12, 2 files, -75/+68)
* 1. Consolidate mount struct allocation/destruction into common code in
     the vfs_mount_alloc/vfs_mount_destroy functions and take care to
     completely destroy the mount point along with its locks. The mount
     struct has grown in complexity recently, and depending on each
     failure path to destroy it completely isn't working anymore.
  2. Eliminate the largely identical vfs_mount and vfs_unmount code by
     moving the code to handle both cases into a newly introduced
     vfs_domount function.
  3. Simplify nfs_mount_diskless to always expect an allocated mount
     struct and never attempt an allocation/destruction itself. The
     vfs_allocroot allocation was there to support 'magic' swap space
     configuration for diskless clients that was already removed by PHK
     some time ago.
  4. Include vfs_buildopts cleanups by Peter Edwards to validate the
     sanity of nmount parameters passed from userland.
  Submitted by: (4) Peter Edwards <peter.edwards@openet-telecom.com>
  Reviewed by: rwatson
  (kan, 2003-11-12, 1 file, -429/+171)
* Add an implementation of turnstiles and change the sleep mutex code to
  use turnstiles to implement blocking instead of implementing a thread
  queue directly. These turnstiles are somewhat similar to those used in
  Solaris 7 as described in Solaris Internals but are also different.
  Turnstiles do not come out of a fixed-sized pool. Rather, each thread
  is assigned a turnstile when it is created that it frees when it is
  destroyed. When a thread blocks on a lock, it donates its turnstile to
  that lock to serve as a queue of blocked threads. The queue associated
  with a given lock is found by a lookup in a simple hash table. The
  turnstile itself is protected by a lock associated with its entry in
  the hash table. This means that sched_lock is no longer needed to
  contest on a mutex. Instead, sched_lock is only used when manipulating
  run queues or thread priorities. Turnstiles also implement priority
  propagation inherently.
  Currently turnstiles only support mutexes. Eventually, however,
  turnstiles may grow two queues to support a non-sleepable
  reader/writer lock implementation. For more details, see the comments
  in sys/turnstile.h and kern/subr_turnstile.c.
  The two primary advantages of the turnstile code are: 1) the size of
  struct mutex shrinks by four pointers, as it no longer stores the
  thread queue linkages directly, and 2) less contention on sched_lock
  in SMP systems, including the ability for multiple CPUs to contend on
  different locks simultaneously (not that this last detail is
  necessarily that much of a big win). Note that 1) means that this
  commit is a kernel ABI breaker, so don't mix old modules with a new
  kernel and vice versa.
  Tested on: i386 SMP, sparc64 SMP, alpha SMP
  (jhb, 2003-11-11, 5 files, -977/+510)
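A single-threaded sketch of turnstile donation and hash lookup, loosely modeled on the description above. The table size, hash shift and all names are assumptions; the per-chain spin lock and priority propagation are omitted, and the real subr_turnstile.c differs in detail. The first thread to block on a lock lends its turnstile as the lock's queue, later blockers park their turnstiles on its free list, and each woken thread takes one back, so every thread leaves with exactly one turnstile.

```c
#include <stddef.h>
#include <stdint.h>

#define TC_TABLESIZE	128	/* assumed; must be a power of two */
#define TC_SHIFT	8	/* assumed; drops low address bits */
#define TC_HASH(lock)							\
	((((uintptr_t)(lock)) >> TC_SHIFT) & (TC_TABLESIZE - 1))

struct turnstile {
	const void *ts_lockobj;		/* lock this turnstile serves */
	struct turnstile *ts_hash;	/* next turnstile in the hash chain */
	struct turnstile *ts_free;	/* spares donated by later blockers */
	int ts_nblocked;		/* threads queued here */
};

struct thread {
	struct turnstile *td_turnstile;	/* NULL while blocked */
};

/* Hash chains; a real chain would also carry its own spin lock. */
static struct turnstile *ts_chains[TC_TABLESIZE];

static struct turnstile *
ts_lookup(const void *lock)
{
	struct turnstile *ts;

	for (ts = ts_chains[TC_HASH(lock)]; ts != NULL; ts = ts->ts_hash)
		if (ts->ts_lockobj == lock)
			return (ts);
	return (NULL);
}

/* Block td on lock, donating td's own turnstile. */
static struct turnstile *
ts_block(struct thread *td, const void *lock)
{
	struct turnstile *ts, *mine;

	mine = td->td_turnstile;
	td->td_turnstile = NULL;
	ts = ts_lookup(lock);
	if (ts == NULL) {
		/* First waiter: our turnstile becomes the lock's queue. */
		ts = mine;
		ts->ts_lockobj = lock;
		ts->ts_free = NULL;
		ts->ts_hash = ts_chains[TC_HASH(lock)];
		ts_chains[TC_HASH(lock)] = ts;
	} else {
		/* The lock already has a queue: park ours as a spare. */
		mine->ts_free = ts->ts_free;
		ts->ts_free = mine;
	}
	ts->ts_nblocked++;
	return (ts);
}

/* Wake td from ts and hand it a turnstile to own again. */
static void
ts_unblock(struct turnstile *ts, struct thread *td)
{
	struct turnstile **pp;

	ts->ts_nblocked--;
	if (ts->ts_free != NULL) {
		td->td_turnstile = ts->ts_free;
		ts->ts_free = ts->ts_free->ts_free;
		return;
	}
	/* Last waiter: unhash the active turnstile and give it away. */
	pp = &ts_chains[TC_HASH(ts->ts_lockobj)];
	while (*pp != ts)
		pp = &(*pp)->ts_hash;
	*pp = ts->ts_hash;
	ts->ts_lockobj = NULL;
	td->td_turnstile = ts;
}
```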
* Bound the number of iterations a thread can perform inside
  ktr_resize_pool(); this eliminates a potential livelock. Return ENOSPC
  only if we encountered an out-of-memory condition when trying to
  increase the pool size.
  Reviewed by: jhb, bde (style)
  (jkoshy, 2003-11-11, 1 file, -6/+8)
* Have utrace(2) return ENOMEM if malloc() fails. Document this error
  return in its manual page.
  Reviewed by: jhb
  (jkoshy, 2003-11-11, 1 file, -1/+1)