path: root/sys/kern
Commit message  Author  Age  Files  Lines
* o Rename "namespace" argument to "attrnamespace" as namespace is a C++rwatson2001-03-192-2/+2
| | | | | | | | reserved word. Part 2 of syscalls.master commit to catch rebuilt files. Submitted by: jkh Obtained from: TrustedBSD Project
* o Rename "namespace" argument to "attrnamespace" as namespace is a C++rwatson2001-03-196-25/+25
| | | | | | | reserved word. Submitted by: jkh Obtained from: TrustedBSD Project
* Fix a couple of things in the internal mbuf allocation interface:bmilekic2001-03-171-8/+8
| - Make sure that m_mballoc() really doesn't allow over nmbufs mbufs to be allocated from mb_map. In the case where nmbufs-reserved space is not an exact multiple of PAGE_SIZE (which it should be, but anyway...), we hold nmbufs as an absolute maximum which need not ever be reached.
| - Clean up m_clalloc(); make it more consistent in the sense that the first argument `ncl' really means "the number of clusters ensured to be allocated" and not "the number of pages worth of clusters to be allocated," as was previously the case. This also makes it consistent with m_mballoc() as well as the comment that precedes it.
| Reviewed by: jlemon
* Use a generic implementation of the Fowler/Noll/Vo hash (FNV hash).peter2001-03-171-13/+6
| | | | | | | | | | | | | | | | | Make the name cache hash as well as the nfsnode hash use it. As a special tweak, create an unsigned version of register_t. This allows us to use a special tweak for the 64 bit versions that significantly speeds up the i386 version (ie: int64 XOR int64 is slower than int64 XOR int32). The code layout is a little strange for the string function, but I was able to get between 5 to 10% improvement over the original version I started with. The layout affects gcc code generation choices and this way was fastest on x86 and alpha. Note that 'CPUTYPE=p3' etc makes a fair difference to this. It is around 45% faster with -march=pentiumpro on a p6 cpu.
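For readers unfamiliar with the hash named above, the following is a minimal userland sketch of the 32-bit FNV-1 variant (multiply by the FNV prime, then XOR in each byte). The names `fnv1_32_buf`, `FNV1_32_INIT` and `FNV_32_PRIME` are illustrative and are not taken from the commit, and the register_t-width tweak described in the message is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 32-bit FNV parameters: offset basis and prime. */
#define FNV1_32_INIT	UINT32_C(0x811c9dc5)
#define FNV_32_PRIME	UINT32_C(0x01000193)

/*
 * Hypothetical helper: fold len bytes of buf into the running hash hval.
 * Pass FNV1_32_INIT as hval to start a new hash.
 */
static uint32_t
fnv1_32_buf(const void *buf, size_t len, uint32_t hval)
{
	const unsigned char *p = buf;

	assert(buf != NULL || len == 0);
	while (len-- > 0) {
		hval *= FNV_32_PRIME;	/* FNV-1 multiplies first ... */
		hval ^= *p++;		/* ... then XORs in the next byte */
	}
	return (hval);
}
```

Because the running value is passed in and returned, a caller can hash several buffers in sequence by feeding each result back in as `hval`.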
* When doing a recv(.. MSG_WAITALL) for a message which is larger thanjlemon2001-03-161-0/+6
| the socket buffer size, the receive is done in sections. After completing a read, call pru_rcvd on the underlying protocol before blocking again. This allows the protocol to take appropriate action, such as sending a TCP window update to the peer, if the window happened to close because the socket buffer was filled. If the protocol is not notified, a TCP transfer may stall until the remote end sends a window probe.
* Kill the 4MB kernel limit dead. [I hope :-)].peter2001-03-151-3/+9
| | | | | | | | | | | | | | | For UP, we were using $tmp_stk as a stack from the data section. If the kernel text section grew beyond ~3MB, the data section would be pushed beyond the temporary 4MB P==V mapping. This would cause the trampoline up to high memory to fault. The hack workaround I did was to use all of the page table pages that we already have while preparing the initial P==V mapping, instead of just the first one. For SMP, the AP bootstrap process suffered the same sort of problem and got the same treatment. MFC candidate - this breaks on 4.x just the same.. Thanks to: Richard Todd <rmtodd@ichotolot.servalan.com>
* Jake essentially rewrote this. It is not by any stretch of thepeter2001-03-151-2/+0
| | | | imagination a derivative of what I did before.
* Regenerate after rwatson's commit to syscalls.master (rev 1.85)peter2001-03-152-2/+2
|
* o Change the API and ABI of the Extended Attribute kernel interfaces torwatson2001-03-156-34/+129
| introduce a new argument, "namespace", rather than relying on a first-character namespace indicator. This is in line with more recent thinking on EA interfaces on various mailing lists, including the posix1e, Linux acl-devel, and trustedbsd-discuss forums. Two namespaces are defined by default, EXTATTR_NAMESPACE_SYSTEM and EXTATTR_NAMESPACE_USER, where the primary distinction lies in the access control model: user EAs are accessible based on the normal MAC and DAC file/directory protections, and system attributes are limited to kernel-originated or appropriately privileged userland requests.
| o These API changes occur at several levels: the namespace argument is introduced in the extattr_{get,set}_file() system call interfaces, at the vnode operation level in the vop_{get,set}extattr() interfaces, and in the UFS extended attribute implementation. Changes are also introduced in the VFS extattrctl() interface (system call, VFS, and UFS implementation), where the arguments are modified to include a namespace field, as well as modified to avoid direct access to userspace variables from below the VFS layer (in the style of recent changes to mount by adrian@FreeBSD.org). This required some cleanup and bug fixing regarding VFS locks and the VFS interface, as a vnode pointer may now be optionally submitted to the VFS_EXTATTRCTL() call. Updated documentation for the VFS interface will be committed shortly.
| o In the near future, the auto-starting feature will be updated to search two sub-directories to the ".attribute" directory in appropriate file systems: "user" and "system" to locate attributes intended for those namespaces, as the single filename is no longer sufficient to indicate what namespace the attribute is intended for. Until this is committed, all attributes auto-started by UFS will be placed in the EXTATTR_NAMESPACE_SYSTEM namespace.
| o The default POSIX.1e attribute names for ACLs and Capabilities have been updated to no longer include the '$' in their filename. As such, if you're using these features, you'll need to rename the attribute backing files to the same names without '$' symbols in front.
| o Note that these changes will require changes in userland, which will be committed shortly. These include modifications to the extended attribute utilities, as well as to libutil for new namespace string conversion routines. Once the matching userland changes are committed, a buildworld is recommended to update all the necessary include files and verify that the kernel and userland environments are in sync. Note: If you do not use extended attributes (most people won't), upgrading is not imperative, although since the system call API has changed, the new userland extended attribute code will no longer compile with old include files.
| o Couple of minor cleanups while I'm there: make more code compilation conditional on FFS_EXTATTR, which should recover a bit of space on kernels running without EA's, as well as update copyright dates.
| Obtained from: TrustedBSD Project
* Don't call device close and ioctl functions if device has disappeared.sos2001-03-131-2/+5
| | | | Reviewed by: phk
* Assert that the process we're trying to enqueue isn't already there.des2001-03-111-0/+21
|
* When aio_read/write() is used on a raw device, physical buffers arealc2001-03-101-7/+26
| used for up to "vfs.aio.max_buf_aio" of the requests. If a request size is MAXPHYS, but the request base isn't page aligned, vmapbuf() will map the end of the user space buffer into the start of the kva allocated for the next physical buffer. Don't use a physical buffer in this case. (This change addresses problem report 25617.)
| When an aio_read/write() on a raw device has completed, timeout() is used to schedule a signal to the process. Thus, the reporting is delayed up to 10 ms (assuming hz is 100). The process might have terminated in the meantime, causing a trap 12 when attempting to deliver the signal. Thus, the timeout must be cancelled when removing the job.
| aio jobs in state JOBST_JOBQGLOBAL should be removed from the kaio_jobqueue list during process rundown.
| During process rundown, some aio jobs might move from one list to a different list that has already been "emptied", causing the rundown to be incomplete. Retry the rundown.
| A call to BUF_KERNPROC() is needed after obtaining a physical buffer to disassociate the lock from the running process since it can return to userland without releasing that lock.
| PR: 25617
| Submitted by: tegge
* Don't call malloc with M_WAITOK while holding a mutex.alfred2001-03-091-22/+21
|
* Push the test for a disconnected socket when accept()ing down to thejlemon2001-03-091-4/+1
| | | | | protocol layer. Not all protocols behave identically. This fixes the brokenness observed with unix-domain sockets (and postfix)
* Fix mtx_legal2block. The only time that it is bad to block on a mutex isjhb2001-03-094-6/+28
| if we hold a spin mutex, since we can trivially get into deadlocks if we start switching out of processes that hold spinlocks. Checking to see if interrupts were disabled was a sort of cheap way of doing this since most of the time interrupts were only disabled when holding a spin lock. At least on the i386.
| To fix this properly, use a per-process counter p_spinlocks that counts the number of spin locks currently held, and instead of checking to see if interrupts are disabled in the witness code, check to see if we hold any spin locks.
| Since child processes always start up with the sched lock magically held in fork_exit(), we initialize p_spinlocks to 1 for child processes. Note that proc0 doesn't go through fork_exit(), so it starts with no spin locks held.
| Consulting from: cp
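The counter-based check described in this message can be sketched as follows. Only the field name p_spinlocks comes from the commit; `struct proc_sketch` and the three helper functions are hypothetical stand-ins, not the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-process state described above. */
struct proc_sketch {
	int p_spinlocks;	/* number of spin locks currently held */
};

/* Bookkeeping done when a spin lock is acquired. */
static void
spin_acquired(struct proc_sketch *p)
{
	p->p_spinlocks++;
}

/* Bookkeeping done when a spin lock is released. */
static void
spin_released(struct proc_sketch *p)
{
	assert(p->p_spinlocks > 0);
	p->p_spinlocks--;
}

/* Blocking on a mutex is legal only while no spin locks are held. */
static bool
legal_to_block(const struct proc_sketch *p)
{
	return (p->p_spinlocks == 0);
}
```

A child process would start with the counter at 1 to account for the sched lock held in fork_exit(), so blocking only becomes legal for it after that lock is released.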
* Use the kthread API to create and destroy AIO daemons.alc2001-03-091-9/+7
| | | | Submitted by: jhb
* Add a new informative KASSERT to ensure that a process is in the SRUN statejhb2001-03-091-0/+3
| | | | before we return it to cpu_switch().
* Fix a race condition similar to the one that existed in the mbuf code. When we gobmilekic2001-03-081-6/+7
| into an interruptible sleep and we increment a sleep count, we make sure that we are the thread that will decrement the count when we wake up. Otherwise, if we get interrupted (signal) and have to wake up, then before we get our mutex, some thread that wants to wake us up may detect that the count is non-zero and enter wakeup_one(); there is nothing on the sleep queue, so we don't get woken up. That thread will still decrement the sleep count, which is bad because we will also decrement it again later (as we got interrupted) and are already off the sleep queue.
* Make the wait for sendfile buffers interruptible. Stops one processdwmalone2001-03-081-3/+24
| consuming them all and then getting stuck.
| Reviewed by: dg
| Reviewed by: bmilekic
| Observed by: Andreas Persson <pap@garen.net>
* Make the SYSCTL_OUT handlers sysctl_old_user() and sysctl_old_kernel()tmm2001-03-081-4/+10
| more robust. They would correctly return ENOMEM the first time the buffer was exhausted, but subsequent calls in this case could cause writes outside of the buffer bounds.
| Approved by: rwatson
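One way to get the behaviour described above is to clamp the remaining space to zero before every copy. The sketch below is a userland illustration under that assumption; `struct req_sketch` and `old_copyout` are hypothetical names, not the kernel's handlers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request state: destination buffer and progress so far. */
struct req_sketch {
	char	*oldptr;	/* destination buffer */
	size_t	 oldlen;	/* capacity of oldptr */
	size_t	 oldidx;	/* bytes the handlers have tried to emit */
};

/*
 * Copy at most the space that is left, always advance oldidx by the full
 * length, and report ENOMEM whenever the total no longer fits.  Clamping
 * the remaining space to zero keeps later calls from writing anything.
 */
static int
old_copyout(struct req_sketch *req, const void *p, size_t l)
{
	size_t room, n;

	assert(req != NULL && p != NULL);
	room = req->oldidx < req->oldlen ? req->oldlen - req->oldidx : 0;
	n = l < room ? l : room;
	if (n > 0)
		memcpy(req->oldptr + req->oldidx, p, n);
	req->oldidx += l;
	return (req->oldidx > req->oldlen ? ENOMEM : 0);
}
```

The important detail is that `room` is computed with an explicit comparison rather than an unsigned subtraction, which would wrap around once `oldidx` has passed `oldlen`.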
* Fixes to track snapshot copy-on-write checking in the specinfomckusick2001-03-072-8/+1
| | | | | | structure rather than assuming that the device vnode would reside in the FFS filesystem (which is obviously a broken assumption with the device filesystem).
* Bitch more loudly when someone botches changes to kinfo_procmckusick2001-03-071-3/+12
| in the hopes that they will actually *read* the comment above it and *follow* the instructions so as to cause all the rest of us a lot less grief.
* - Don't hold the proc lock across VREF and the fd* functions to avoid lockjhb2001-03-071-4/+21
| order reversals.
| - Add some preliminary locking in the !RF_PROC case.
| - Protect p_estcpu with sched_lock.
* - Release Giant a bit earlier on syscall exit.jhb2001-03-071-20/+14
| - Don't try to grab Giant before postsig() in userret() as it is no longer needed.
| - Don't grab Giant before psignal() in ast() but get the proc lock instead.
* Grab the process lock while calling psignal and before calling psignal.jhb2001-03-074-14/+46
|
* Proc locking including using proc lock in place of proctree wherejhb2001-03-071-10/+27
| | | | appropriate and locking processes while we signal them.
* Proc locking.jhb2001-03-071-3/+9
|
* Use the proc lock to protect access to p_sigacts->ps_sigintr.jhb2001-03-071-4/+4
|
* - Proc locking.jhb2001-03-071-17/+29
| | | | - Remove some unneeded spl()'s.
* Lock the process while sending it SIGALRM and updating p_realtimer.jhb2001-03-071-0/+4
|
* - Proc locking.jhb2001-03-071-25/+7
| | | | - Remove unneeded spl()'s.
* - Proc locking. Most of signal handling is now MP safe and doesn't requirejhb2001-03-071-70/+162
| Giant. The only exception is the CANSIGNAL() macro. Unlocking the proc lock around sendsig() in trapsignal() is also questionable. Note that the functions sigexit(), psignal(), and issignal() must be called with the proc lock of the process in question held. postsig() and trapsignal() should not be called with the proc lock held, but they also do not require Giant anymore either.
| - Remove spl's that are now no longer needed as they are fully replaced.
* Lock initproc when we send SIGINT to init during shutdown.jhb2001-03-071-0/+2
|
* - Add an extra check in priority_propagation() for UP systems to ensure wejhb2001-03-073-3/+27
| don't end up back at ourselves which would indicate deadlock.
| - Add the proc lock to the witness dup_list as we may hold more than one process lock at a time.
| - Don't assert a mutex is owned in _mtx_unlock_sleep() as that is too late. We do the checks in the macros instead.
* - Use _PHOLD and move it before a PROC_UNLOCK to reduce the number ofjhb2001-03-071-7/+18
| mutex operations in kthread_create().
| - Lock a kthread's proc before changing its parent via proc_reparent().
| - Test P_KTHREAD not P_SYSTEM in kthread_suspend() and kthread_resume(). P_SYSTEM just means that the process shouldn't be swapped and is used for vinum's daemon for example.
| - Lock all the signal state used for suspending and resuming kthreads with the proc lock.
* - Lock the forklist with an sx lock.jhb2001-03-071-14/+57
| - Add proc locking to fork1(). Always lock the child process (new process) first when both processes need to be locked at the same time.
| - Remove unneeded spl()'s as the data they protected is now locked.
| - Ensure that the proctree is exclusively locked and the new process is locked when setting up the parent process pointer.
| - Lock the check for P_KTHREAD in p_flag in fork_exit().
* Check to see if p_fd is NULL before dereferencing it in checkdirs(). It'sjhb2001-03-072-0/+4
| | | | | | possible for us to see a process in the early stages of fork before p_fd has been initialized. Ideally, we wouldn't stick a process on the allproc list until it was fully created however.
* - Call proc_reparent() when handing a process off to init in exit ratherjhb2001-03-071-18/+23
| than dinking around in the process lists explicitly.
| - Hold both the proctree lock and proc lock of the child process when reparenting a process via proc_reparent.
| - Lock processes while sending them signals.
| - Miscellaneous proc locking.
| - proc_reparent() now asserts that the child is locked in addition to an exclusive proctree lock.
* In order to avoid recursing on the backing mutex for sx locks in thejhb2001-03-061-2/+2
| | | | | | | INVARIANTS case, define the actual KASSERT() in _SX_ASSERT_[SX]LOCKED macros that are used in the sx code itself and convert the SX_ASSERT_[SX]LOCKED macros to simple wrappers that grab the mutex for the duration of the check.
* Make the KASSERTs report the correct function names.des2001-03-061-18/+11
| | | | | Fix two off-by-one errors that would sometimes cause the final length of the sbuf to include the trailing zero.
* o Introduce filesystem-independent POSIX.1e ACL utility routines torwatson2001-03-063-15/+1320
| support implementations of ACLs in file systems. Introduce the following new functions:
|   vaccess_acl_posix1e()        vaccess() that accepts an ACL
|   acl_posix1e_mode_to_perm()   Convert mode bits to ACL rights
|   acl_posix1e_mode_to_entry()  Build ACL entry from mode/uid/gid
|   acl_posix1e_perms_to_mode()  Generate file mode from ACL
|   acl_posix1e_check()          Syntax verification for ACL
| These functions allow a file system to rely on central ACL evaluation and syntax checking, as well as providing useful utilities to allow ACL-based file systems to generate mode/owner/etc information to return via VOP_GETATTR(), and to support file systems that split their ACL information over their existing inode storage (mode, uid, gid) and extended ACL into extended attributes (additional users, groups, ACL mask).
| o Add prototypes for exported functions to sys/acl.h, sys/vnode.h
| Reviewed by: trustedbsd-discuss, freebsd-arch
| Obtained from: TrustedBSD Project
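To make the mode/ACL conversions listed above concrete, here is a simplified sketch of both directions. All identifiers prefixed with `sk_` or `SK_` are hypothetical, the permission values are assumptions, and the ACL mask entry that a full POSIX.1e implementation must consider is deliberately ignored.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Illustrative permission bits and entry tags (names and values assumed). */
#define SK_ACL_EXECUTE	0x1
#define SK_ACL_WRITE	0x2
#define SK_ACL_READ	0x4

enum sk_tag { SK_USER_OBJ, SK_GROUP_OBJ, SK_OTHER };

/* Mode bits for each tag, in read/write/execute order. */
static const mode_t sk_bits[3][3] = {
	{ S_IRUSR, S_IWUSR, S_IXUSR },
	{ S_IRGRP, S_IWGRP, S_IXGRP },
	{ S_IROTH, S_IWOTH, S_IXOTH },
};
static const int sk_perms[3] = { SK_ACL_READ, SK_ACL_WRITE, SK_ACL_EXECUTE };

/* Express the owner, group, or other triplet of mode as ACL permissions. */
static int
sk_mode_to_perm(enum sk_tag tag, mode_t mode)
{
	int i, perm = 0;

	assert(tag == SK_USER_OBJ || tag == SK_GROUP_OBJ || tag == SK_OTHER);
	for (i = 0; i < 3; i++)
		if (mode & sk_bits[tag][i])
			perm |= sk_perms[i];
	return (perm);
}

/* Inverse direction: rebuild the nine mode bits from three permission sets. */
static mode_t
sk_perms_to_mode(int user, int group, int other)
{
	const int perm[3] = { user, group, other };
	mode_t mode = 0;
	int t, i;

	for (t = 0; t < 3; t++)
		for (i = 0; i < 3; i++)
			if (perm[t] & sk_perms[i])
				mode |= sk_bits[t][i];
	return (mode);
}
```

A file system that keeps the owner, group and other entries in its inode could use conversions of this kind to answer VOP_GETATTR() without consulting the extended attribute that holds the remaining entries.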
* Add a missing splx() to aio_fphysio(). (This change is a no-op in -5.0,alc2001-03-061-12/+6
| | | | | | | | but potentially significant in -4.x.) Eliminate a pointless parameter to aio_fphysio(). Remove unnecessary casts from aio_fphysio() and aio_physwakeup().
* - Add sx_descr description member to sx lock structurebmilekic2001-03-061-6/+26
| - Add sx_xholder member to sx struct which is used for INVARIANTS-enabled assertions. It indicates the thread that presently owns the xlock.
| - Add some assertions to the sx lock code that will detect the fatal API abuse:
|     xlock --> xlock
|     xlock --> slock
|   which now works thanks to sx_xholder. Notice that the remaining two problematic cases:
|     slock --> xlock
|     slock --> slock (a little less problematic, but still recursion)
|   will need to be handled by witness eventually, as they are more involved.
| Reviewed by: jhb, jake, jasone
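The two detectable cases can be illustrated with a single-threaded model of the bookkeeping. Only the field name sx_xholder is taken from the commit message; the struct layout, the meaning of `sx_cnt`, and the `*_sketch` functions are assumptions, and a real lock would sleep where this model simply asserts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the lock state; no actual blocking is done. */
struct sx_sketch {
	int		 sx_cnt;	/* >0 shared holders, -1 exclusive, 0 free */
	const void	*sx_xholder;	/* owner of the exclusive lock, or NULL */
};

/* True if td already owns the xlock, i.e. xlock --> xlock or xlock --> slock. */
static bool
sx_would_recurse(const struct sx_sketch *sx, const void *td)
{
	return (sx->sx_xholder != NULL && sx->sx_xholder == td);
}

static void
sx_slock_sketch(struct sx_sketch *sx, const void *td)
{
	assert(!sx_would_recurse(sx, td));	/* xlock --> slock */
	assert(sx->sx_cnt >= 0);		/* a real lock would sleep here */
	sx->sx_cnt++;
}

static void
sx_xlock_sketch(struct sx_sketch *sx, const void *td)
{
	assert(!sx_would_recurse(sx, td));	/* xlock --> xlock */
	assert(sx->sx_cnt == 0);		/* a real lock would sleep here */
	sx->sx_cnt = -1;
	sx->sx_xholder = td;
}

static void
sx_xunlock_sketch(struct sx_sketch *sx, const void *td)
{
	assert(sx->sx_cnt == -1 && sx->sx_xholder == td);
	sx->sx_cnt = 0;
	sx->sx_xholder = NULL;
}
```

The model also shows why the slock-first cases are harder: shared holders are only counted, not recorded, so there is nothing to compare the caller against.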
* Implement shared/exclusive locks.jasone2001-03-051-0/+160
| | | | Reviewed by: bmilekic, jake, jhb
* Eliminate the aio_freejobs list. Its purpose was to store freealc2001-03-051-40/+30
| aiocb's allocated by zalloc(). In other words, zfree() was never called. Now, we call zfree(). Why eliminate this micro-optimization? At some later point, when we multithread the AIO system, we would need a mutex to synchronize access to aio_freejobs, making its use nearly indistinguishable in cost from zalloc() and zfree().
| Remove unnecessary fhold() and fdrop() calls from aio_qphysio(), undoing a part of revision 1.86. The reference count on the file structure is already incremented by _aio_aqueue() before it calls aio_qphysio(). (Update the comments to document this fact.)
| Remove unnecessary casts from _aio_aqueue(), aio_read(), aio_write() and aio_waitcomplete().
| Remove an unnecessary "return;" from aio_process().
| Add "static" in various places.
* Do not set a default ELF syscall ABI fallback.obrien2001-03-042-6/+8
| If one runs an un-branded Linux static binary that calls Linux's fcntl, the machine will reboot when the call is interpreted by the FreeBSD syscall ABI.
* implement OCRNL, ONOCR, and ONLRETassar2001-03-041-1/+11
| | | | Obtained from: NetBSD
* Remove the field privatemodes from struct __aiocb_private and thealc2001-03-041-137/+2
| | | | | | | related code from aio_read() and aio_write(). This field was intended, but never used, to allow a mythical user-level library to make an aio_read() or aio_write() behave like an ordinary read() or write(), i.e., a blocking I/O operation.
* Mismatched MFSNAMELEN and MNAMELEN with fstype / fspath.adrian2001-03-022-4/+4
| | | | Submitted by: Naoki Kobayashi <shibata@geo.titech.ac.jp>
* Ok, the kernel will panic in kmem_malloc() if the kernel map is full, sojhb2001-03-021-4/+0
| | | | | | | malloc with M_WAITOK can't actually return NULL. I wish I could get two people to give me the same answer about this when I ask... Submitted by: jake