path: root/sys/kern
Commit log: [author, date, files changed, lines -removed/+added] commit message
* [jhb, 2009-01-21, 1 file, -2/+1] Fix a few style bogons.
  Submitted by: bde
* [kib, 2009-01-21, 1 file, -0/+32] Move the code from ufs_lookup.c used to do dotdot lookup, into the helper function. It is supposed to be useful for any filesystem that has to unlock dvp to walk to the ".." entry in lookup routine.
  Requested by: jhb
  Tested by: pho
  MFC after: 1 month
* [jhb, 2009-01-21, 2 files, -5/+7] Move the VA_MARKATIME flag for VOP_SETATTR() out into its own VOP: VOP_MARKATIME() since unlike the rest of VOP_SETATTR(), VA_MARKATIME can be performed while holding a shared vnode lock (the same functionality is done internally by VOP_READ which can run with a shared vnode lock). Add missing locking of the vnode interlock to the ufs implementation and remove a special note and test from the NFS client about not supporting the feature.
  Inspired by: ups
  Tested by: pho
* [thompsa, 2009-01-21, 1 file, -0/+49] Add functions to WITNESS so it can be asserted that the lock is not released for a section of code; this uses WITNESS_NORELEASE() and WITNESS_RELEASEOK() to mark the boundaries. Both functions require the lock to be held when calling. This is intended for scenarios like a bus asserting that the bus lock is not dropped during a driver call. There doesn't appear to be a man page to document this in.
  Reviewed by: jhb
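The "no release" assertion described in the WITNESS commit above can be illustrated with a small user-space sketch. It is not the kernel implementation: `struct toy_lock` and the `toy_*` functions are hypothetical names invented for this illustration, and a failed release is reported through a return value rather than a WITNESS panic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy lock; not the kernel's lock_object or WITNESS state. */
struct toy_lock {
	bool held;       /* the lock is currently owned */
	bool norelease;  /* releasing the lock is currently forbidden */
};

static void
toy_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

/* Start of a section in which the lock must stay held. */
static void
toy_norelease(struct toy_lock *l)
{
	assert(l->held);  /* the lock must be held when marking */
	l->norelease = true;
}

/* End of the section; releasing is allowed again. */
static void
toy_releaseok(struct toy_lock *l)
{
	assert(l->held);  /* the lock must still be held here */
	l->norelease = false;
}

/* Returns false (instead of panicking) if the release is not allowed. */
static bool
toy_release(struct toy_lock *l)
{
	if (!l->held || l->norelease)
		return (false);
	l->held = false;
	return (true);
}
```

A caller such as a bus would bracket the driver call with `toy_norelease()` and `toy_releaseok()`; any `toy_release()` attempted in between fails.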
* [kib, 2009-01-20, 1 file, -1/+1] FFS puts the extended attributes blocks at the negative blocks for the vnode, from -1 down. When vinvalbuf(vp, V_ALT) is done for the vnode, it incorrectly does vm_object_page_remove(0, 0), removing all pages from the underlying vm object, not only the pages that back the extended attributes data.
  Change vinvalbuf() to not remove any pages from the object when V_NORMAL or V_ALT are specified. Instead, the only in-tree caller in ffs_inode.c:ffs_truncate() that specifies V_ALT explicitly removes the corresponding page range. The V_NORMAL caller does vnode_pager_setsize(vp, 0) immediately after the call to vinvalbuf(V_NORMAL) already.
  Reported by: csjp
  Reviewed by: ups
  MFC after: 3 weeks
* [mckay, 2009-01-20, 1 file, -0/+6] Add a limit on namecache entries.
  In normal operation, the number of cache entries is roughly equal to the number of active vnodes. However, when most of the recently accessed vnodes have many hard links, the number of cache entries can be 32000 times as large, exhausting kernel memory and provoking a panic in kmem_malloc().
  MFC after: 2 weeks
* [mav, 2009-01-18, 1 file, -0/+4] Teach m_copyback() to use trailing space of the last mbuf in chain.
* [jeff, 2009-01-17, 5 files, -51/+132]
  - Implement generic macros for producing KTR records that are compatible with src/tools/sched/schedgraph.py. This allows developers to quickly create a graphical view of ktr data for any resource in the system.
  - Add sched_tdname() and the pcpu field 'name' for quickly and uniformly identifying records associated with a thread or cpu.
  - Reimplement the KTR_SCHED traces using the new generic facility.
  Obtained from: attilio
  Discussed with: jhb
  Sponsored by: Nokia
* [kib, 2009-01-15, 1 file, -0/+4] Lock the semaphore identifier lock during semaphore initialization to guarantee atomicity of the operation for other semaphore consumers. In particular, this should guard against access to the semaphore with not done or partially done MAC label assignment.
  Reviewed by: rwatson
  MFC after: 1 month
* [kib, 2009-01-14, 1 file, -152/+125] It seems that there are at least three issues with IPC_RMID operation on SysV semaphores.
  The squeeze of the semaphore array in the kern_semctl() modifies sem_base for the semaphores with sem_base greater than sem_base of the removed semaphore, as well as the values of the semaphores, without locking their mutex. This can lead to (killable) hangs or unexpected behaviour of the processes performing any sem operations while another process does IPC_RMID.
  The semexit_myhook() eventhandler unlocks SEMUNDO_LOCK() while accessing *suptr. This allows for IPC_RMID for the sem id to be performed in parallel with undo hook referenced by the current undo structure. This leads to the panic("semexit - semid not allocated") [1].
  The semaphore creation is protected by Giant, while IPC_RMID is done while only semaphore mutex is held. This seems to result in invalid values for semtot, causing random ENOSPC error returns [2].
  Redo the locking of the semaphores lifetime cycle. Delegate the sem_mtx to the sole purpose of protecting semget() and semctl(IPC_RMID). Introduce new sem_undo_mtx to protect SEM_UNDO handling. Remove the Giant remnants from the code. Note that mac_sysvsem_check_semget() and mac_sysvsem_create() are now called while sem_mtx is held, as well as mac_sysvsem_cleanup() [3].
  When semaphore is removed, acquire semaphore locks for all semaphores with sem_base that is going to be changed by squeeze of the sema array. The lock order is not important there, because the region is protected by sem_mtx.
  Organize both used and free sem_undo structures into the lists, protected by sem_undo_mtx. In semexit_myhook(), remove sem_undo structure that is being processed, from used list, without putting it onto the free to prevent modifications by other threads. This allows for sem_undo_lock to be dropped to acquire individual semaphore locks without violating lock order. Since IPC_RMID may no longer find this sem_undo, do tolerate references to unallocated semaphores in undo structure, and check sequential number to not undo unrelated semaphore with the same id.
  While there, convert functions definitions to ANSI C and fix small style(9) glitches.
  Reported by: Omer Faruk Sen <omerfsen gmail com> [1], pho [2]
  Reviewed by: rwatson [3]
  Tested by: pho
  MFC after: 1 month
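The last point of the semaphore commit above (tolerate references to unallocated semaphores and compare the sequence number before undoing) can be sketched in user-space C. The structures and `toy_apply_undo()` are hypothetical simplifications made up for this note; they do not mirror the real sysv_sem.c data structures or locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified semaphore slot. */
struct toy_sem {
	bool allocated;      /* the slot is currently in use */
	unsigned short seq;  /* bumped each time the slot is reused */
	int val;             /* semaphore value */
};

/* Hypothetical undo record, remembering which incarnation it refers to. */
struct toy_undo {
	int semid;           /* index of the semaphore slot */
	unsigned short seq;  /* sequence number seen when the record was made */
	int adj;             /* adjustment to apply at process exit */
};

/*
 * Apply an undo record. The record is skipped (not treated as fatal) when
 * the slot was removed, or when the slot was reused by an unrelated
 * semaphore, which shows up as a different sequence number.
 */
static bool
toy_apply_undo(struct toy_sem *sems, int nsems, const struct toy_undo *u)
{
	struct toy_sem *sem;

	if (u->semid < 0 || u->semid >= nsems)
		return (false);
	sem = &sems[u->semid];
	if (!sem->allocated || sem->seq != u->seq)
		return (false);
	sem->val += u->adj;
	return (true);
}
```

In this sketch a stale record is ignored silently, which corresponds to tolerating the reference rather than panicking.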
* [jhb, 2009-01-13, 1 file, -0/+1] Add a new KTR tracepoint in the KTR_CALLOUT class to note when a callout routine finishes executing.
  MFC after: 1 week
* [kib, 2009-01-08, 1 file, -18/+31] Do not call namei() while having another user-controlled vnode locked. Lookup could attempt to recursively lock that vnode.
  Do not call vn_start_write(V_WAIT) while vnode is locked, this may result in a deadlock with suspension.
  vfs_busy() the mountpoint before dropping vnode lock for vnode that was used to look up the mountpoint, to prevent unmount in between.
  Reported and tested by: pho
  Reviewed by: rwatson
  MFC after: 3 weeks
* [ed, 2009-01-04, 1 file, -9/+9] Remove Giant locking from domains list.
  During boot, the domain list is locked with Giant. It is not possible to register any protocols after the system has booted, so the lock is only used to protect insertion of entries.
  There is already a mutex in uipc_domain.c called dom_mtx. Use this mutex to lock the list, instead of using Giant. It won't matter anything with respect to performance, but we'll never get rid of Giant if we don't remove it from places where we don't need it.
  Approved by: rwatson
  MFC after: 3 weeks
* [rwatson, 2009-01-04, 2 files, -3/+0] Remove two further uses (debugging and NULLing) of pr_ousrreq, missed due to svn commit in the wrong directory.
  Spotted by: bz
* [bz, 2009-01-04, 1 file, -2/+0] Back out r186615; the sanitizing of the pointers in the error case is not needed and seems that it will not be needed either.
  Pointy hat: mine, mine, mine and not pho's
* [kib, 2009-01-03, 1 file, -1/+5] Extend the struct vm_page wire_count to u_int to avoid the overflow of the counter, that may happen when too many sendfile(2) calls are being executed with this vnode [1].
  To keep the size of the struct vm_page and offsets of the fields accessed by out-of-tree modules, swap the types and locations of the wire_count and cow fields. Add safety checks to detect cow overflow and force fallback to the normal copy code for zero-copy sockets. [2]
  Reported by: Anton Yuzhaninov <citrin citrin ru> [1]
  Suggested by: alc [2]
  Reviewed by: alc
  MFC after: 2 weeks
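The counter overflow behind the wire_count commit above can be reproduced in a few lines of user-space C. This is only a sketch: the commit does not state the previous width of wire_count, so the 16-bit type here is an assumption, and `toy_wire16()`, `toy_wire32()` and `toy_cow_try_inc()` are hypothetical helpers rather than kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count n wirings in a 16-bit counter (assumed narrow width); it wraps. */
static uint16_t
toy_wire16(unsigned n)
{
	uint16_t count = 0;

	for (unsigned i = 0; i < n; i++)
		count++;  /* wraps to 0 after 65535 */
	return (count);
}

/* The same count in a 32-bit counter keeps the true value. */
static uint32_t
toy_wire32(unsigned n)
{
	uint32_t count = 0;

	for (unsigned i = 0; i < n; i++)
		count++;
	return (count);
}

/*
 * Overflow check for the counter that stays narrow: refuse the increment
 * at the maximum so the caller can fall back to a slower path.
 */
static bool
toy_cow_try_inc(uint16_t *cow)
{
	if (*cow == UINT16_MAX)
		return (false);
	(*cow)++;
	return (true);
}
```

A caller that gets `false` from `toy_cow_try_inc()` would take the ordinary copy path instead of the zero-copy one, which is the kind of fallback the commit message describes.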
* [ed, 2009-01-02, 1 file, -1/+2] Fix a corner case in my previous commit.
  Even though there are not many setups that have absolutely no console device, make sure a close() on a TTY doesn't dereference a null pointer.
* [ed, 2009-01-02, 1 file, -0/+7] Don't let /dev/console be revoked if the TTY below is being closed.
  During startup some of the syscons TTY's are used to set attributes like the screensaver and mouse options. These actions cause /dev/console to be rendered unusable.
  Fix the issue by leaving the TTY opened when it is used as the console device.
  Reported by: imp
* [rwatson, 2009-01-01, 1 file, -2/+2] White space and comment tweaks.
  MFC after: 3 weeks
* [rwatson, 2009-01-01, 1 file, -0/+2] Temporary workaround for the limitations of the mbuf flowid field: zero the field in the mbuf constructors, since otherwise we have no way to tell if they are valid. In the future, Kip has plans to add a flag specifically to indicate validity, which is the preferred model.
* [ed, 2009-01-01, 1 file, -2/+5] Don't clobber sysctl_root()'s error number.
  When sysctl() is being called with a buffer that is too small, it will return ENOMEM. Unfortunately the changes I made the other day set the error number to 0, because it just returns the error number of the copyout(). Revert this part of the change.
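The error-clobbering pattern described in the sysctl commit above is easy to show outside the kernel. The following is a sketch under stated assumptions: every `toy_*` function is a hypothetical stand-in (for the oid handler, for copyout(), and for the two variants of the caller), and the way the second variant combines the two error values is one plausible fix, not necessarily what the commit does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the handler: ENOMEM when the caller's buffer is too small. */
static int
toy_handler(char *buf, size_t buflen, const char *val, size_t *needed)
{
	*needed = strlen(val) + 1;
	if (buflen < *needed)
		return (ENOMEM);
	memcpy(buf, val, *needed);
	return (0);
}

/* Stand-in for copying the required length back to the caller. */
static int
toy_copyout_len(size_t *dst, size_t len)
{
	*dst = len;
	return (0);
}

/* Clobbering variant: the second assignment overwrites ENOMEM with 0. */
static int
toy_sysctl_clobbering(char *buf, size_t buflen, const char *val, size_t *lenp)
{
	size_t needed;
	int error;

	error = toy_handler(buf, buflen, val, &needed);
	error = toy_copyout_len(lenp, needed);
	return (error);
}

/* Preserving variant: the handler's error wins over the copy-out result. */
static int
toy_sysctl_preserving(char *buf, size_t buflen, const char *val, size_t *lenp)
{
	size_t needed;
	int error, error2;

	error = toy_handler(buf, buflen, val, &needed);
	error2 = toy_copyout_len(lenp, needed);
	return (error != 0 ? error : error2);
}
```

With a two-byte buffer and the value "FreeBSD", the first variant reports success while the second reports ENOMEM; both still hand back the required length.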
* [ivoras, 2008-12-30, 1 file, -1/+3] Document the relationship between enum VM_GUEST and the vm_guest_sysctl_names array.
  Approved by: gnn (original version)
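The relationship that commit documents, an enum whose values index a parallel array of names, is a common C idiom and can be sketched as follows. The enumerators and strings are illustrative assumptions (the log does not list the actual values of enum VM_GUEST or the contents of vm_guest_sysctl_names), and all identifiers carry a `toy_` prefix to mark them as hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical guest types; the order must match the name table below. */
enum toy_vm_guest {
	TOY_GUEST_NO = 0,
	TOY_GUEST_VM,
	TOY_GUEST_XEN,
	TOY_GUEST_COUNT  /* number of entries, not a guest type */
};

/* Hypothetical names, indexed by enum toy_vm_guest. */
static const char *const toy_vm_guest_names[] = {
	"none",     /* TOY_GUEST_NO */
	"generic",  /* TOY_GUEST_VM */
	"xen"       /* TOY_GUEST_XEN */
};

/* Fail the build if the enum and the table get out of step. */
_Static_assert(sizeof(toy_vm_guest_names) / sizeof(toy_vm_guest_names[0]) ==
    TOY_GUEST_COUNT, "name table must match enum toy_vm_guest");

/* Map a guest type to its string; out-of-range values yield "unknown". */
static const char *
toy_vm_guest_name(int guest)
{
	if (guest < 0 || guest >= TOY_GUEST_COUNT)
		return ("unknown");
	return (toy_vm_guest_names[guest]);
}
```

The compile-time check goes one step beyond a comment: adding an enumerator without a matching string stops the build.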
* [pho, 2008-12-30, 1 file, -0/+2] Added missing second part of cleaning j->ip[46] as requested by bz
  Approved by: kib (mentor)
  Pointy hat: pho
* [pho, 2008-12-30, 1 file, -2/+4] Make sure that unused j->ip[46] are cleared
  Reviewed by: bz
  Approved by: kib (mentor)
* [rwatson, 2008-12-30, 1 file, -3/+3] Rename mbcnt to mbcnt_delta in uipc_send() -- unlike other local variables named mbcnt in uipc_usrreq.c, this instance is a delta rather than a cache of sb_mbcnt.
  MFC after: 3 weeks
* [kib, 2008-12-30, 1 file, -6/+8] Clear the pointers to the file in the struct filedesc before file is closed in fdfree. Otherwise, sysctl_kern_proc_filedesc may dereference stale struct file * values.
  Reported and tested by: pho
  MFC after: 1 month
* [kib, 2008-12-30, 1 file, -22/+11] In r185557, the check for existing negative entry for the given name did not compare nc_dvp with supplied parent directory vnode pointer. Add the check and note that now branches for vp != NULL and vp == NULL are the same, thus can be merged.
  Reported and reviewed by: kan
  Tested by: pho
  MFC after: 2 weeks
* [ed, 2008-12-29, 2 files, -212/+208] Fix compilation. Also move ogetkerninfo() to kern_xxx.c.
  It seems I forgot to remove `int error' from a single piece of code. I'm also moving ogetkerninfo() to kern_xxx.c, because it belongs to the class of compat system information system calls, not the generic sysctl code.
* [ed, 2008-12-29, 2 files, -53/+39] Push down Giant inside sysctl. Also add some more assertions to the code.
  In the existing code we didn't really enforce that callers hold Giant before calling userland_sysctl(), even though there is no guarantee it is safe. Fix this by just placing Giant locks around the call to the oid handler. This also means we only pick up Giant for a very short period of time. Maybe we should add MPSAFE flags to sysctl or phase it out altogether.
  I've also added SYSCTL_LOCK_ASSERT(). We have to make sure sysctl_root() and name2oid() are called with the sysctl lock held.
  Reviewed by: Jille Timmermans <jille quis cx>
* [kib, 2008-12-29, 1 file, -2/+2] vm_map_lock_read() does not increment map->timestamp, so we should compare map->timestamp with saved timestamp after map read lock is reacquired, not with saved timestamp + 1. The only consequence of the +1 was unconditional lookup of the next map entry, though.
  Tested by: pho
  Approved by: des
  MFC after: 2 weeks
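The timestamp comparison in the vm_map commit above follows a general revalidation pattern: save a generation counter, drop the lock, retake it, and trust cached state only if the counter is unchanged. A minimal user-space sketch is shown below. The `toy_*` names are hypothetical, real locking is omitted, and the write-lock path bumping the counter is an assumption of this toy; the commit itself only states that the read-lock path does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical map with a generation counter. */
struct toy_map {
	unsigned timestamp;
};

/* Toy read lock: leaves the timestamp alone. */
static void
toy_lock_read(struct toy_map *m)
{
	(void)m;
}

/* Toy write lock: assumed to bump the timestamp. */
static void
toy_lock_write(struct toy_map *m)
{
	m->timestamp++;
}

/* Old-style check: compares against saved + 1, so an untouched map looks changed. */
static bool
toy_changed_plus_one(const struct toy_map *m, unsigned saved)
{
	return (m->timestamp != saved + 1);
}

/* Corrected check: compares against the saved value itself. */
static bool
toy_changed(const struct toy_map *m, unsigned saved)
{
	return (m->timestamp != saved);
}
```

With the old-style check an untouched map always appears modified after a read relock, so the cached entry is looked up again every time. That is wasted work rather than a correctness bug, which matches the commit's remark about the only consequence.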
* [kmacy, 2008-12-28, 1 file, -0/+1] drop rnh lock before destroying it
* [bz, 2008-12-27, 1 file, -7/+9] Hide detect_virtual() along with the accompanying string arrays under #ifndef XEN to make XEN config compile again. In case of Xen vm_guest is hard coded.
  Move the list for the vm_guest sysctl out of the restrictive bounds as the sysctl is there in either case.
* [pho, 2008-12-27, 1 file, -0/+2] Prevent overflow of uio_resid.
  Approved by: kib
* [rwatson, 2008-12-25, 1 file, -0/+22] Following the recent security advisory, add a comment describing our invariants and approach for protocol switch methods in protsw_init(), and also some KASSERT's for non-domain init entries in protocol switch tables: pru_abort and pru_send must both be implemented.
  For now, leave those assertions #if 0'd, since there are a few protocols that violate them in non-harmful ways.
  Whether or not we should enforce pru_abort being implemented for non-stream protocols is an interesting question: currently abort is only invoked on stream sockets in situations where un-accepted sockets must be abruptly closed (i.e., close() on a listen socket with pending connections), but in principle it is useful for datagram sockets and most datagram socket types implement it.
  MFC after: 3 weeks
* [marcus, 2008-12-23, 1 file, -1/+1] Do not KASSERT when vp->v_dd is NULL. Only directories which have had ".." looked up would have v_dd set to a non-NULL value. This fixes a panic seen when running installworld on a diskless system with a separate /usr file system.
  Submitted by: cracauer
  Approved by: kib
* [kib, 2008-12-23, 1 file, -1/+1] Keep the hold on the vnode during VOP_VPTOCNP() call, allowing the vop implementation to drop vnode lock, if needed.
  Reported and tested by: pho
* [ivoras, 2008-12-23, 1 file, -2/+2] Add missing newlines to flags tags of CPU topology, for prettier output.
  Reviewed by: jeff (original version)
  Approved by: gnn (mentor) (original version)
* [cperciva, 2008-12-23, 1 file, -0/+5] Prevent cross-site forgery attacks on ftpd(8) due to splitting long commands into multiple requests. [08:12]
  Avoid calling uninitialized function pointers in protocol switch code. [08:13]
  Merry Christmas everybody...
  Approved by: so (cperciva)
  Approved by: re (kensmith)
  Security: FreeBSD-SA-08:12.ftpd, FreeBSD-SA-08:13.protosw
* [ed, 2008-12-21, 1 file, -2/+10] Revert r185891.
  In r185891 I removed the newlines from messages written to /dev/console, because it made startup messages from rc-scripts harder to read. This, unfortunately, causes the kernel message that is printed after a non-terminated log message to be concatenated. This could be fixed, but on short term it's better to just revert the change.
  Reported by: Jaakko Heinonen <jh saunalahti fi>
* [ed, 2008-12-21, 1 file, -2/+1] Set PTS_FINISHED before waking up any threads.
  Inside ptsdrv_{in,out}wakeup() we call KNOTE_LOCKED() to wake up any kevent(2) users. Because the kqueue handlers are executed synchronously, we must set PTS_FINISHED before calling ptsdrv_{in,out}wakeup().
  Discovered by: nork
* [ed, 2008-12-20, 1 file, -3/+3] Let wchan names more closely match pre-MPSAFE TTY behaviour.
  Right now the wchan strings "ttyinp" and "ttybgw" differ by only one character from the strings we used prior to MPSAFE TTY. Just rename them back to their pre-MPSAFE TTY counterparts. Also rename "ttylck" to "ttymtx", which should make it more clear that a process is blocked on the TTY mutex, not some other form of locking.
* [nwhitehorn, 2008-12-20, 1 file, -25/+34] Modularize the Open Firmware client interface to allow run-time switching of OFW access semantics, in order to allow future support for real-mode OF access and flattened device trees. OF client interface modules are implemented using KOBJ, in a similar way to the PPC PMAP modules.
  Because we need Open Firmware to be available before mutexes can be used on sparc64, changes are also included to allow KOBJ to be used very early in the boot process by only using the mutex once we know it has been initialized.
  Reviewed by: marius, grehan
* [ivoras, 2008-12-19, 1 file, -4/+4] Further beautify the lock strings to be more pleasing to the eye and self documenting within 6 characters.
  Reviewed by: ed (older version)
  Approved by: gnn (older version)
* [ru, 2008-12-18, 1 file, -1/+0] Removed a comment made obsolete by revisions 157927 and 174292.
* [ivoras, 2008-12-18, 1 file, -3/+27] By popular request, stringify kern.vm_guest sysctl. Now it returns a short, self-documenting string describing the detected virtual environment.
  Approved by: gnn (mentor) (earlier version)
* [ivoras, 2008-12-18, 1 file, -5/+5] Remove spaces in wait object names to make top(1) output prettier and unbreak scripts that examine ps(1) output.
  Reviewed by: ed
  Approved by: gnn (mentor)
* [kib, 2008-12-18, 1 file, -6/+18] The quotactl, statfs and fstatfs syscall implementations may dereference NULL pointer to struct mount if the looked up vnode is reclaimed. Also, these syscalls only mnt_ref() the mp, still allowing it to be unmounted; only struct mount memory is kept from being reused.
  Lock the vnode when doing name lookup, then reference its mount point, unlock the vnode and vfs_busy the mountpoint. This sequence shall take care of both races.
  Reported and tested by: pho
  Discussed with: attilio
  MFC after: 1 month
* [kib, 2008-12-18, 1 file, -0/+4] Do not return success and doomed vnode from lookup. LK_UPGRADE allows the vnode to be reclaimed.
  Tested by: pho
  MFC after: 1 month
* [ivoras, 2008-12-17, 1 file, -6/+15] Introduce a sysctl kern.vm_guest that reflects what the kernel knows about it running under a virtual environment. This also introduces a globally accessible variable vm_guest that can be used where appropriate in the kernel to inspect this environment.
  To make it easier for the long run, an enum VM_GUEST is also introduced, which could possibly be factored out in a header somewhere (but the question is where - vm/vm_param.h? sys/param.h?) so it eventually becomes a part of the standard KPI. In any case, it's a start.
  The purpose of all this isn't to absolutely detect that the OS is running under a virtual environment (cf. "redpill") but to allow the parts of the kernel and the userland that care about this particular aspect and can do something useful depending on it to have a standardised interface. Reducing kern.hz is one example but there are other things that could be done like avoiding context switches, not using CPU instructions that are known to be slow in emulation, possibly different strategies in VM (memory) allocation, CPU scheduling, etc.
  It isn't clear if the JAILS/VIMAGE functionality should also be exposed by this particular mechanism (probably not since they're not "full" virtual hardware environments). Sometime in the future another sysctl and a variable could be introduced to reflect if the kernel supports any kind of virtual hosting (e.g. VMWare VMI, Xen dom0).
  Reviewed by: silence from src-commiters@, virtualization@, kmacy@
  Approved by: gnn (mentor)
  Security: Obscurity doesn't help.
* [peter, 2008-12-17, 1 file, -4/+0] Remove sysctl debug.elf_trace and the trace field in auxargs. They go nowhere. It used to be the equivalent of $LD_DEBUG in rtld-elf. Elf_Auxargs is an internal structure.