summaryrefslogtreecommitdiffstats
path: root/sys/kern/uipc_socket.c
Commit message (Collapse)AuthorAgeFilesLines
...
* Give struct socket structures a ref counting interface similar todillon2001-11-171-6/+27
| | | | | | | vnodes. This will hopefully serve as a base from which we can expand the MP code. We currently do not attempt to obtain any mutex or SX locks, but the door is open to add them when we nail down exactly how that part of it is going to work.
* Remove EOL whitespace.keramida2001-11-121-8/+8
| | | | Reviewed by: alfred
* Make KASSERT's print the values that triggered a panic.keramida2001-11-121-2/+3
| | | | Reviewed by: alfred
* Change the kernel's ucred API as follows:jhb2001-10-111-2/+1
| | | | | | | | - crhold() returns a reference to the ucred whose refcount it bumps. - crcopy() now simply copies the credentials from one credential to another and has no return value. - a new crshared() primitive is added which returns true if a ucred's refcount is > 1 and false (0) otherwise.
* - Combine kern.ps_showallprocs and kern.ipc.showallsockets intorwatson2001-10-091-19/+0
| | | | | | | | | | | | | | | | | | | | | | | a single kern.security.seeotheruids_permitted, describes as: "Unprivileged processes may see subjects/objects with different real uid" NOTE: kern.ps_showallprocs exists in -STABLE, and therefore there is an API change. kern.ipc.showallsockets does not. - Check kern.security.seeotheruids_permitted in cr_cansee(). - Replace visibility calls to socheckuid() with cr_cansee() (retain the change to socheckuid() in ipfw, where it is used for rule-matching). - Remove prison_unpcb() and make use of cr_cansee() against the UNIX domain socket credential instead of comparing root vnodes for the UDS and the process. This allows multiple jails to share the same chroot() and not see each others UNIX domain sockets. - Remove unused socheckproc(). Now that cr_cansee() is used universally for socket visibility, a variety of policies are more consistently enforced, including uid-based restrictions and jail-based restrictions. This also better-supports the introduction of additional MAC models. Reviewed by: ps, billf Obtained from: TrustedBSD Project
* Only allow users to see their own socket connections ifps2001-10-051-0/+30
| | | | | | | | | kern.ipc.showallsockets is set to 0. Submitted by: billf (with modifications by me) Inspired by: Dave McKay (aka pm aka Packet Magnet) Reviewed by: peter MFC after: 2 weeks
* Hopefully improve control message passing over Unix domain sockets.dwmalone2001-10-041-14/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1) Allow the sending of more than one control message at a time over a unix domain socket. This should cover the PR 29499. 2) This requires that unp_{ex,in}ternalize and unp_scan understand mbufs with more than one control message at a time. 3) Internalize and externalize used to work on the mbuf in-place. This made life quite complicated and the code for sizeof(int) < sizeof(file *) could end up doing the wrong thing. The patch always create a new mbuf/cluster now. This resulted in the change of the prototype for the domain externalise function. 4) You can now send SCM_TIMESTAMP messages. 5) Always use CMSG_DATA(cm) to determine the start where the data in unp_{ex,in}ternalize. It was using ((struct cmsghdr *)cm + 1) in some places, which gives the wrong alignment on the alpha. (NetBSD made this fix some time ago). This results in an ABI change for discriptor passing and creds passing on the alpha. (Probably on the IA64 and Spare ports too). 6) Fix userland programs to use CMSG_* macros too. 7) Be more careful about freeing mbufs containing (file *)s. This is made possible by the prototype change of externalise. PR: 29499 MFC after: 6 weeks
* Use the passed in thread to selrecord() instead of curthread.jhb2001-09-211-2/+2
|
* KSE Milestone 2julian2001-09-121-34/+34
| | | | | | | | | | | | | | Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
* Undo part of the tangle of having sys/lock.h and sys/mutex.h included inmarkm2001-05-011-0/+3
| | | | | | | | | | | other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
* Actually show the values that tripped the assertion "receive 1"alfred2001-04-271-1/+3
|
* When doing a recv(.. MSG_WAITALL) for a message which is larger thanjlemon2001-03-161-0/+6
| | | | | | | | | | | the socket buffer size, the receive is done in sections. After completing a read, call pru_rcvd on the underlying protocol before blocking again. This allows the the protocol to take appropriate action, such as sending a TCP window update to the peer, if the window happened to close because the socket buffer was filled. If the protocol is not notified, a TCP transfer may stall until the remote end sends a window probe.
* Push the test for a disconnected socket when accept()ing down to thejlemon2001-03-091-4/+1
| | | | | protocol layer. Not all protocols behave identically. This fixes the brokenness observed with unix-domain sockets (and postfix)
* In soshutdown(), use SHUT_{RD,WR,RDWR} instead of FREAD and FWRITE.ru2001-02-271-3/+5
| | | | Also, return EINVAL if `how' is invalid, as required by POSIX spec.
* Introduce a NOTE_LOWAT flag for use with the read/write filters, whichjlemon2001-02-241-0/+4
| | | | | | | | | | allow the watermark to be passed in via the data field during the EV_ADD operation. Hook this up to the socket read/write filters; if specified, it overrides the so_{rcv|snd}.sb_lowat values in the filter. Inspired by: "Ronald F. Guilmette" <rfg@monkeys.com>
* When returning EV_EOF for the socket read/write filters, also returnjlemon2001-02-241-0/+2
| | | | | | | the current socket error in fflags. This may be useful for determining why a connect() request fails. Inspired by: "Jonathan Graehl" <jonathan@graehl.org>
* o Move per-process jail pointer (p->pr_prison) to inside of the subjectrwatson2001-02-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | credential structure, ucred (cr->cr_prison). o Allow jail inheritence to be a function of credential inheritence. o Abstract prison structure reference counting behind pr_hold() and pr_free(), invoked by the similarly named credential reference management functions, removing this code from per-ABI fork/exit code. o Modify various jail() functions to use struct ucred arguments instead of struct proc arguments. o Introduce jailed() function to determine if a credential is jailed, rather than directly checking pointers all over the place. o Convert PRISON_CHECK() macro to prison_check() function. o Move jail() function prototypes to jail.h. o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the flag in the process flags field itself. o Eliminate that "const" qualifier from suser/p_can/etc to reflect mutex use. Notes: o Some further cleanup of the linux/jail code is still required. o It's now possible to consider resolving some of the process vs credential based permission checking confusion in the socket code. o Mutex protection of struct prison is still not present, and is required to protect the reference count plus some fields in the structure. Reviewed by: freebsd-arch Obtained from: TrustedBSD Project
* Extend kqueue down to the device layer.jlemon2001-02-151-27/+28
| | | | Backwards compatible approach suggested by: peter
* Return ECONNABORTED from accept if connection is closed while on thejlemon2001-02-141-5/+2
| | | | | | | listen queue, as well as the current behavior of a zero-length sockaddr. Obtained from: KAME Reviewed by: -net
* First step towards an MP-safe zone allocator:des2001-01-211-2/+2
| | | | | | | - have zalloc() and zfree() always lock the vm_zone. - remove zalloci() and zfreei(), which are now redundant. Reviewed by: bmilekic, jasone
* * Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.bmilekic2000-12-211-9/+9
| | | | | | | | | | | | | | | | | | This is because calls with M_WAIT (now M_TRYWAIT) may not wait forever when nothing is available for allocation, and may end up returning NULL. Hopefully we now communicate more of the right thing to developers and make it very clear that it's necessary to check whether calls with M_(TRY)WAIT also resulted in a failed allocation. M_TRYWAIT basically means "try harder, block if necessary, but don't necessarily wait forever." The time spent blocking is tunable with the kern.ipc.mbuf_wait sysctl. M_WAIT is now deprecated but still defined for the next little while. * Fix a typo in a comment in mbuf.h * Fix some code that was actually passing the mbuf subsystem's M_WAIT to malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the value of the M_WAIT flag, this could have became a big problem.
* Convert more malloc+bzero to malloc+M_ZERO.dwmalone2000-12-081-4/+2
| | | | | Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>
* Accept filters broke kernels compiled without options INET.alfred2000-11-201-6/+19
| | | | | | | Make accept filters conditional on INET support to fix. Pointed out by: bde Tested and assisted by: Stephen J. Kiernan <sab@vegamuse.org>
* Check so_error in filt_so{read|write} in order to detect UDP errors.jlemon2000-09-281-0/+4
| | | | PR: 21601
* Remove uidinfo hash table lookup and maintenance out of chgproccnt() andtruckman2000-09-051-2/+2
| | | | | | | | | | | | | | chgsbsize(), which are called rather frequently and may be called from an interrupt context in the case of chgsbsize(). Instead, do the hash table lookup and maintenance when credentials are changed, which is a lot less frequent. Add pointers to the uidinfo structures to the ucred and pcred structures for fast access. Pass a pointer to the credential to chgproccnt() and chgsbsize() instead of passing the uid. Add a reference count to the uidinfo structure and use it to decide when to free the structure rather than freeing the structure when the resource consumption drops to zero. Move the resource tracking code from kern_proc.c to kern_resource.c. Move some duplicate code sequences in kern_prot.c to separate helper functions. Change KASSERTs in this code to unconditional tests and calls to panic().
* Remove any possibility of hiwat-related race conditions by changinggreen2000-08-291-2/+2
| | | | | | | the chgsbsize() call to use a "subject" pointer (&sb.sb_hiwat) and a u_long target to set it to. The whole thing is splnet(). This fixes a problem that jdp has been able to provoke.
* Make the kqueue socket read filter honor the SO_RCVLOWAT value.jlemon2000-08-071-1/+1
| | | | Spotted by: "Steve M." <stevem@redlinenetworks.com>
* only allow accept filter modifications on listening socketsalfred2000-07-201-0/+8
| | | | Submitted by: ps
* fix races in the uidinfo subsystem, several problems existed:alfred2000-06-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | 1) while allocating a uidinfo struct malloc is called with M_WAITOK, it's possible that while asleep another process by the same user could have woken up earlier and inserted an entry into the uid hash table. Having redundant entries causes inconsistancies that we can't handle. fix: do a non-waiting malloc, and if that fails then do a blocking malloc, after waking up check that no one else has inserted an entry for us already. 2) Because many checks for sbsize were done as "test then set" in a non atomic manner it was possible to exceed the limits put up via races. fix: instead of querying the count then setting, we just attempt to set the count and leave it up to the function to return success or failure. 3) The uidinfo code was inlining and repeating, lookups and insertions and deletions needed to be in their own functions for clarity. Reviewed by: green
* return of the accept filter part IIalfred2000-06-201-0/+101
| | | | | | | | | | | accept filters are now loadable as well as able to be compiled into the kernel. two accept filters are provided, one that returns sockets when data arrives the other when an http request is completed (doesn't work with 0.9 requests) Reviewed by: jmg
* backout accept optimizations.alfred2000-06-181-4/+0
| | | | Requested by: jmg, dcs, jdp, nate
* add socketoptions DELAYACCEPT and HTTPACCEPT which will not allow an accept()alfred2000-06-151-0/+4
| | | | | | | | | | | | until the incoming connection has either data waiting or what looks like a HTTP request header already in the socketbuffer. This ought to reduce the context switch time and overhead for processing requests. The initial idea and code for HTTPACCEPT came from Yahoo engineers and has been cleaned up and a more lightweight DELAYACCEPT for non-http servers has been added Reviewed by: silence on hackers.
* Fix panic by moving the prp == 0 check up the order of sanity checks.asmodai2000-06-131-2/+3
| | | | | Submitted by: Bart Thate <freebsd@1st.dudi.org> on -current Approved by: rwatson
* o Modify jail to limit creation of sockets to UNIX domain sockets,rwatson2000-06-041-0/+9
| | | | | | | | | | | | | | | | | TCP/IP (v4) sockets, and routing sockets. Previously, interaction with IPv6 was not well-defined, and might be inappropriate for some environments. Similarly, sysctl MIB entries providing interface information also give out only addresses from those protocol domains. For the time being, this functionality is enabled by default, and toggleable using the sysctl variable jail.socket_unixiproute_only. In the future, protocol domains will be able to determine whether or not they are ``jail aware''. o Further limitations on process use of getpriority() and setpriority() by jailed processes. Addresses problem described in kern/17878. Reviewed by: phk, jmg
* Back out the previous change to the queue(3) interface.jake2000-05-261-2/+2
| | | | | | It was not discussed and should probably not happen. Requested by: msmith and others
* Change the way that the queue(3) structures are declared; don't assume thatjake2000-05-231-2/+2
| | | | | | | | the type argument to *_HEAD and *_ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd
* Introduce kqueue() and kevent(), a kernel event notification facility.jlemon2000-04-161-0/+109
|
* Make sure to free the socket in soabort() if the protocol couldn'tfenner2000-03-181-1/+7
| | | | | free it (this could happen if the protocol already freed its part and we just kept the socket around to make sure accept(2) didn't block)
* Add aio_waitcomplete(). Make aio work correctly for socket descriptors.jasone2000-01-141-0/+1
| | | | | | | | Make gratuitous style(9) fixes (me, not the submitter) to make the aio code more readable. PR: kern/12053 Submitted by: Chris Sedore <cmsedore@maxwell.syr.edu>
* Correct an uninitialized variable use, which, unlike most times, isgreen1999-12-271-4/+2
| | | | | | | actually a bug this time. Submitted by: bde Reviewed by: bde
* This is Bosko Milekic's mbuf allocation waiting code. Basically, thisgreen1999-12-121-0/+12
| | | | | | | | means that running out of mbuf space isn't a panic anymore, and code which runs out of network memory will sleep to wait for it. Submitted by: Bosko Milekic <bmilekic@dsuper.net> Reviewed by: green, wollman
* KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCPshin1999-11-221-1/+112
| | | | | | | | | | for IPv6 yet) With this patch, you can assigne IPv6 addr automatically, and can reply to IPv6 ping. Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
* This is a partial commit of the patch from PR 14914:phk1999-11-161-5/+6
| | | | | | | | | | | | | Alot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY structures for list operations. This patch makes all list operations in sys/kern use the queue(3) macros, rather than directly accessing the *Q_{HEAD,ENTRY} structures. This batch of changes compile to the same object files. Reviewed by: phk Submitted by: Jake Burkholder <jake@checker.org> PR: 14914
* Implement RLIMIT_SBSIZE in the kernel. This is a per-uid sockbuf totalgreen1999-10-091-4/+10
| | | | usage limit.
* Change so_cred's type to a ucred, not a pcred. THis makes more sense, actually.green1999-09-191-8/+4
| | | | | | Make a sonewconn3() which takes an extra argument (proc) so new sockets created with sonewconn() from a user's system call get the correct credentials, not just the parent's credentials.
* $Id$ -> $FreeBSD$peter1999-08-281-1/+1
|
* Reviewed by: the cast of thousandsgreen1999-06-171-4/+11
| | | | | | | | | This is the change to struct sockets that gets rid of so_uid and replaces it with a much more useful struct pcred *so_cred. This is here to be able to do socket-level credential checks (i.e. IPFW uid/gid support, to be added to HEAD soon). Along with this comes an update to pidentd which greatly simplifies the code necessary to get a uid from a socket. Soon to come: a sysctl() interface to finding individual sockets' credentials.
* Plug a mbuf leak in tcp_usr_send(). pru_send() routines are expectedpeter1999-06-041-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | to either enqueue or free their mbuf chains, but tcp_usr_send() was dropping them on the floor if the tcpcb/inpcb has been torn down in the middle of a send/write attempt. This has been responsible for a wide variety of mbuf leak patterns, ranging from slow gradual leakage to rather rapid exhaustion. This has been a problem since before 2.2 was branched and appears to have been fixed in rev 1.16 and lost in 1.23/1.28. Thanks to Jayanth Vijayaraghavan <jayanth@yahoo-inc.com> for checking (extensively) into this on a live production 2.2.x system and that it was the actual cause of the leak and looks like it fixes it. The machine in question was loosing (from memory) about 150 mbufs per hour under load and a change similar to this stopped it. (Don't blame Jayanth for this patch though) An alternative approach to this would be to recheck SS_CANTSENDMORE etc inside the splnet() right before calling pru_send() after all the potential sleeps, interrupts and delays have happened. However, this would mean exposing knowledge of the tcp stack's reset handling and removal of the pcb to the generic code. There are other things that call pru_send() directly though. Problem originally noted by: John Plevyak <jplevyak@inktomi.com>
* Realy fix overflow on SO_*TIMEOache1999-05-211-4/+12
| | | | Submitted by: bde
* Add sysctl descriptions to many SYSCTL_XXXsbillf1999-05-031-3/+3
| | | | | | | PR: kern/11197 Submitted by: Adrian Chadd <adrian@FreeBSD.org> Reviewed by: billf(spelling/style/minor nits) Looked at by: bde(style)
OpenPOWER on IntegriCloud