summaryrefslogtreecommitdiffstats
path: root/sys/kern/uipc_syscalls.c
Commit message (Collapse)AuthorAgeFilesLines
* Properly handle compat32 calls to sctp generic sendmsd/recvmsg functions thatkib2010-03-191-4/+19
| | | | | | | take iov. Reviewed by: tuexen MFC after: 2 weeks
* Remove dead statement.kib2010-03-191-1/+0
| | | | | Reviewed by: tuexen MFC after: 2 weeks
* Fix two style issues.kib2010-03-191-2/+2
| | | | MFC after: 2 weeks
* Use NULL instead of 0 when setting up pointer.pjd2010-02-181-2/+2
|
* Fix argument order in a call to mtx_init.mjacob2009-12-171-1/+1
| | | | MFC after: 1 week
* If socket buffer space appears to be lower then sum of count of alreadykib2009-11-031-9/+1
| | | | | | | | | | | | | | prepared bytes and next portion of transfer, inner loop of kern_sendfile() aborts, not preparing next mbuf for socket buffer, and not modifying any outer loop invariants. The thread loops in the outer loop forever. Instead of breaking from inner loop, prepare only bytes that fit into the socket buffer space. In collaboration with: pho Reviewed by: bz PR: kern/138999 MFC after: 2 weeks
* Fix style issue.kib2009-10-291-1/+1
|
* Do not dereference vp->v_mount without holding vnode lock and checkingkib2009-10-011-2/+5
| | | | | | | that the vnode is not reclaimed. Noted by: Igor Sysoev <is rambler-co ru> MFC after: 1 week
* Get SCTP working in combination with VIMAGE.tuexen2009-09-191-0/+9
| | | | | | Contains code from bz. Approved by: rrs (mentor) MFC after: 1 month.
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c andrwatson2009-08-011-1/+2
| | | | | | | | | | vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)
* Audit file descriptor numbers for various socket-related system calls.rwatson2009-07-011-0/+17
| | | | | Approved by: re (audit argument blanket) MFC after: 3 days
* Define missing audit argument macro AUDIT_ARG_SOCKET(), andrwatson2009-07-011-0/+3
| | | | | | | | capture the domain, type, and protocol arguments to socket(2) and socketpair(2). Approved by: re (audit argument blanket) MFC after: 3 days
* SCTP needs either IPv4 or IPv6 as lower layer[1].bz2009-06-101-4/+8
| | | | | | | | So properly hide the already #ifdef SCTP code with #if defined(INET) || defined(INET6) as well to get us closer to a non-INET/INET6 kernel. Discussed with: tuexen [1]
* Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERICrwatson2009-06-051-1/+0
| | | | | | | | and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
* Add internal 'mac_policy_count' counter to the MAC Framework, which is arwatson2009-06-021-36/+12
| | | | | | | | | | | | | | | | | | count of the number of registered policies. Rather than unconditionally locking sockets before passing them into MAC, lock them in the MAC entry points only if mac_policy_count is non-zero. This avoids locking overhead for a number of socket system calls when no policies are registered, eliminating measurable overhead for the MAC Framework for the socket subsystem when there are no active policies. Possibly socket locks should be acquired by policies if they are required for socket labels, which would further avoid locking overhead when there are policies but they don't require labeling of sockets, or possibly don't even implement socket controls. Obtained from: TrustedBSD Project
* Split native socketpair() syscall onto kern_socketpair() which shoulddchagin2009-05-311-24/+29
| | | | | | | be used by kernel consumers and socketpair() itself. Approved by: kib (mentor) MFC after: 1 month
* - Implement a lockless file descriptor lookup algorithm injeff2009-05-141-16/+9
| | | | | | | | | | | | fget_unlocked(). - Save old file descriptor tables created on expansion until the entire descriptor table is freed so that pointers may be followed without regard for expanders. - Mark the file zone as NOFREE so we may attempt to reference potentially freed files. - Convert several fget_locked() users to fget_unlocked(). This requires us to manage reference counts explicitly but reduces locking overhead in the common case.
* A NOP change: style / whitespace cleanup of the noise that slippedzec2009-05-081-1/+1
| | | | | | | into r191816. Spotted by: bz Approved by: julian (mentor) (an earlier version of the diff)
* Change the curvnet variable from a global const struct vnet *,zec2009-05-051-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | previously always pointing to the default vnet context, to a dynamically changing thread-local one. The currvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discuouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_* macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, whith an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typicall cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)
* sendfile doesn't modify the vnode - acquire vnode lock sharedkmacy2009-04-121-1/+1
| | | | Reviewed by: ups, jeffr
* Retire the MALLOC and FREE macros. They are an abomination unto style(9).des2008-10-231-5/+5
| | | | MFC after: 3 months
* When sendto(2) is called with an explicit destination addressrwatson2008-05-221-1/+5
| | | | | | | | | | | argument, call mac_socket_check_connect() on that address before proceeding with the send. Otherwise policies instrumenting the connect entry point for the purposes of checking destination addresses will not have the opportunity to check implicit connect requests. MFC after: 3 weeks Sponsored by: nCircle Network Security, Inc.
* When writing trailers in sendfile(2), don't call kern_writev()rwatson2008-04-271-3/+4
| | | | | | | | | | | | | | | | | | while holding the socket buffer lock. These leads to an immediate panic due to recursing the socket buffer lock. This bug was introduced in uipc_syscalls.c:1.240, but masked by another bug until that was fixed in uipc_syscalls.c:1.269. Note that the current fix isn't perfect, but better than panicking: normally we guarantee that simultaneous invocations of a system call to write on a stream socket won't be interlaced, which is ensured by use of the socket buffer sleep lock. This is guaranteed for the sendfile headers, but not trailers. In practice, this is likely not a problem, but should be fixed. MFC after: 3 days Pointy hat to: andre (1.240), cperciva (1.269)
* Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT.ru2008-03-251-20/+8
| | | | | | | | | | Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true since the advent of MBUMA. Reviewed by: arch There are ongoing disputes as to whether we want to switch to directly using UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
* After finishing sending file data in sendfile(2), don't forget to sendcperciva2008-02-241-1/+3
| | | | | | | | | the provided trailers. This has been broken since revision 1.240. Submitted by: Dan Nelson PR: kern/120948 "sounds ok to me" from: phk MFC after: 3 days
* This patch adds a new ktrace(2) record type, KTR_STRUCT, whose payloaddes2008-02-231-0/+36
| | | | | | | | | | | | | | | | | | | | | | | consists of the null-terminated name and the contents of any structure you wish to record. A new ktrstruct() function constructs and emits a KTR_STRUCT record. It is accompanied by convenience macros for struct stat and struct sockaddr. In kdump(1), KTR_STRUCT records are handled by a dispatcher function that runs stringent sanity checks on its contents before handing it over to individual decoding funtions for each type of structure. Currently supported structures are struct stat and struct sockaddr for the AF_INET, AF_INET6 and AF_UNIX families; support for AF_APPLETALK and AF_IPX is present but disabled, as I am unable to test it properly. Since 's' was already taken, the letter 't' is used by ktrace(1) to enable KTR_STRUCT trace points, and in kdump(1) to enable their decoding. Derived from patches by Andrew Li <andrew2.li@citi.com>. PR: kern/117836 MFC after: 3 weeks
* Fix sendfile(2) write-only file permission bypass.simon2008-02-141-14/+17
| | | | | Security: FreeBSD-SA-08:03.sendfile Submitted by: kib
* Give sendfile(2) a SF_SYNC flag which makes it wait until all mbufsphk2008-02-031-1/+42
| | | | | | | | referencing the files VM pages are returned from the network stack, making changes to the file safe. This flag does not guarantee that the data has been transmitted to the other end.
* Give MEXTADD() another argument to make both void pointers to thephk2008-02-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | free function controlable, instead of passing the KVA of the buffer storage as the first argument. Fix all conventional users of the API to pass the KVA of the buffer as the first argument, to make this a no-op commit. Likely break the only non-convetional user of the API, after informing the relevant committer. Update the mbuf(9) manual page, which was already out of sync on this point. Bump __FreeBSD_version to 800016 as there is no way to tell how many arguments a CPP macro needs any other way. This paves the way for giving sendfile(9) a way to wait for the passed storage to have been accessed before returning. This does not affect the memory layout or size of mbufs. Parental oversight by: sam and rwatson. No MFC is anticipated.
* Correct two problems relating to sorflush(), which is called to flushrwatson2008-01-311-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | read socket buffers in shutdown() and close(): - Call socantrcvmore() before sblock() to dislodge any threads that might be sleeping (potentially indefinitely) while holding sblock(), such as a thread blocked in recv(). - Flag the sblock() call as non-interruptible so that a signal delivered to the thread calling sorflush() doesn't cause sblock() to fail. The sblock() is required to ensure that all other socket consumer threads have, in fact, left, and do not enter, the socket buffer until we're done flushin it. To implement the latter, change the 'flags' argument to sblock() to accept two flags, SBL_WAIT and SBL_NOINTR, rather than one M_WAITOK flag. When SBL_NOINTR is set, it forces a non-interruptible sx acquisition, regardless of the setting of the disposition of SB_NOINTR on the socket buffer; without this change it would be possible for another thread to clear SB_NOINTR between when the socket buffer mutex is released and sblock() is invoked. Reviewed by: bz, kmacy Reported by: Jos Backus <jos at catnook dot com>
* VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used inattilio2008-01-131-2/+2
| | | | | | | | | | | conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
* vn_lock() is currently only used with the 'curthread' passed as argument.attilio2008-01-101-2/+2
| | | | | | | | | | | | | | | | Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
* Remove explicit locking of struct file.jeff2007-12-301-28/+5
| | | | | | | | | | | | | - Introduce a finit() which is used to initailize the fields of struct file in such a way that the ops vector is only valid after the data, type, and flags are valid. - Protect f_flag and f_count with atomic operations. - Remove the global list of all files and associated accounting. - Rewrite the unp garbage collection such that it no longer requires the global list of all files and instead uses a list of all unp sockets. - Mark sockets in the accept queue so we don't incorrectly gc them. Tested by: kris, pho
* Merge first in a series of TrustedBSD MAC Framework KPI changesrwatson2007-10-241-12/+12
| | | | | | | | | | | | | | | | | | | | | | | from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
* - During shutdown pending, when the last sack came in andrrs2007-08-271-8/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the last message on the send stream was "null" but still there, a state we allow, we could get hung and not clean it up and wait for the shutdown guard timer to clear the association without a graceful close. Fix this so that that we properly clean up. - Added support for Multiple ASCONF per new RFC. We only (so far) accept input of these and cannot yet generate a multi-asconf. - Sysctl'd support for experimental Fast Handover feature. Always disabled unless sysctl or socket option changes to enable. - Error case in add-ip where the peer supports AUTH and ADD-IP but does NOT require AUTH of ASCONF/ASCONF-ACK. We need to ABORT in this case. - According to the Kyoto summit of socket api developers (Solaris, Linux, BSD). We need to have: o non-eeor mode messages be atomic - Fixed o Allow implicit setup of an assoc in 1-2-1 model if using the sctp_**() send calls - Fixed o Get rid of HAVE_XXX declarations - Done o add a sctp_pr_policy in hole in sndrcvinfo structure - Done o add a PR_SCTP_POLICY_VALID type flag - yet to-do in a future patch! - Optimize sctp6 calls to reuse code in sctp_usrreq. Also optimize when we close sending out the data and disabling Nagle. - Change key concatenation order to match the auth RFC - When sending OOTB shutdown_complete always do csum. - Don't send PKT-DROP to a PKT-DROP - For abort chunks just always checksums same for shutdown-complete. - inpcb_free front state had a bug where in queue data could wedge an assoc. We need to just abandon ones in front states (free_assoc). - If a peer sends us a 64k abort, we would try to assemble a response packet which may be larger than 64k. This then would be dropped by IP. Instead make a "minimum" size for us 64k-2k (we want at least 2k for our initack). If we receive such an init discard it early without all the processing. - When we peel off we must increment the tcb ref count to keep it from being freed from underneath us. - handling fwd-tsn had bugs that caused memory overwrites when given faulty data, fixed so can't happen and we also stop at the first bad stream no. - Fixed so comm-up generates the adaption indication. - peeloff did not get the hmac params copied. - fix it so we lock the addr list when doing src-addr selection (in future we need to use a multi-reader/one writer lock here) - During lowlevel output, we could end up with a _l_addr set to null if the iterator is calling the output routine. This means we would possibly crash when we gather the MTU info. Fix so we only do the gather where we have a src address cached. - we need to be sure to set abort flag on conn state when we receive an abort. - peeloff could leak a socket. Moved code so the close will find the socket if the peeloff fails (uipc_syscalls.c) Approved by: re@freebsd.org(Ken Smith)
* Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, whichrwatson2007-08-061-54/+13
| | | | | | | | | | | | | | | previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error-handling in the socket layer, eliminated quite a bit of unwinding of locking in error cases. While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency. Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)
* - Add some needed error checking on bad fd passing in the sctprrs2007-07-021-6/+10
| | | | | | syscalls. Approved by: re@freebsd.org (Ken Smith) Obtained from: Weongyo Jeong (weongyo.jeong@gmail.com)
* In kern_sendfile() adjust byte accounting of the file sending loop toandre2007-05-191-13/+37
| | | | | | | | | | | | | | | | | | | | | | ignore the size of any headers that were passed with the sendfile(2) system call. Otherwise the file sent will be truncated by the header size if the nbytes parameter was provided. The bug doesn't show up when either nbytes is zero, meaning send the whole file, or no header iovec is provided. Resolve a potential error aliasing of errors from the VM and sf_buf parts and the protocol send parts where an error of the latter over- writes one of the former. Update comments. The byte accounting bug wasn't seen in earlier because none of the popular sendfile(2) consumers, Apache, lighttpd and our ftpd(8) use it in modes that trigger it. The varnish HTTP proxy makes full use of it and exposed the problem. Bug found by: phk Tested by: phk
* Generally migrate to ANSI function headers, and remove 'register' use.rwatson2007-05-161-21/+21
|
* sblock() implements a sleep lock by interlocking SB_WANT and SB_LOCK flagsrwatson2007-05-031-4/+0
| | | | | | | | | | | | | | | | | | | | | | | on each socket buffer with the socket buffer's mutex. This sleep lock is used to serialize I/O on sockets in order to prevent I/O interlacing. This change replaces the custom sleep lock with an sx(9) lock, which results in marginally better performance, better handling of contention during simultaneous socket I/O across multiple threads, and a cleaner separation between the different layers of locking in socket buffers. Specifically, the socket buffer mutex is now solely responsible for serializing simultaneous operation on the socket buffer data structure, and not for I/O serialization. While here, fix two historic bugs: (1) a bug allowing I/O to be occasionally interlaced during long I/O operations (discovere by Isilon). (2) a bug in which failed non-blocking acquisition of the socket buffer I/O serialization lock might be ignored (discovered by sam). SCTP portion of this patch submitted by rrs.
* Don't reinvent vm_page_grab().pjd2007-04-201-23/+3
| | | | Reviewed by: ups
* Fix a bug in sendfile(2) when files larger than page size and nbytes=0.pjd2007-04-191-2/+2
| | | | | | | When nbytes=0, sendfile(2) should use file size. Because of the bug, it was sending half of a file. The bug is that 'off' variable can't be used for size calculation, because it changes inside the loop, so we should use uap->offset instead.
* Remove XXX comment that changes to file fields should be protected withrwatson2007-04-061-5/+0
| | | | | | | the file lock rather than the filedesc lock: I fixed this in the last revision. Spotted by: kris
* Replace custom file descriptor array sleep lock constructed using a mutexrwatson2007-04-041-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | and flags with an sxlock. This leads to a significant and measurable performance improvement as a result of access to shared locking for frequent lookup operations, reduced general overhead, and reduced overhead in the event of contention. All of these are imported for threaded applications where simultaneous access to a shared file descriptor array occurs frequently. Kris has reported 2x-4x transaction rate improvements on 8-core MySQL benchmarks; smaller improvements can be expected for many workloads as a result of reduced overhead. - Generally eliminate the distinction between "fast" and regular acquisisition of the filedesc lock; the plan is that they will now all be fast. Change all locking instances to either shared or exclusive locks. - Correct a bug (pointed out by kib) in fdfree() where previously msleep() was called without the mutex held; sx_sleep() is now always called with the sxlock held exclusively. - Universally hold the struct file lock over changes to struct file, rather than the filedesc lock or no lock. Always update the f_ops field last. A further memory barrier is required here in the future (discussed with jhb). - Improve locking and reference management in linux_at(), which fails to properly acquire vnode references before using vnode pointers. Annotate improper use of vn_fullpath(), which will be replaced at a future date. In fcntl(), we conservatively acquire an exclusive lock, even though in some cases a shared lock may be sufficient, which should be revisited. The dropping of the filedesc lock in fdgrowtable() is no longer required as the sxlock can be held over the sleep operation; we should consider removing that (pointed out by attilio). Tested by: kris Discussed with: jhb, kris, attilio, jeff
* Fix a fd leak in socketpair():jhb2007-04-021-2/+7
| | | | | | - Close the new file objects created during socketpair() if the copyout of the new file descriptors fails. - Add a test to the socketpair regression test for this edge case.
* Further system call comment cleanup:rwatson2007-03-051-3/+1
| | | | | | | | | | - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.
* Remove 'MPSAFE' annotations from the comments above most system calls: allrwatson2007-03-041-78/+3
| | | | | | | | system calls now enter without Giant held, and then in some cases, acquire Giant explicitly. Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.
* Fixes the MSG_PEEK for sctp_generic_recvmsg() the msg_flagsrrs2007-01-241-2/+10
| | | | | | were not being copied in properly so PEEK and any other msg_flags input operation were not being performed right. Approved by: gnn
* In kern_sendfile() fix the calculation of sbytes (the total number of bytesandre2006-11-121-23/+19
| | | | | | | | | | | | | | | written to the socket). The rewrite in revision 1.240 got confused by the FreeBSD 4.x bug compatibility code. For some reason lighttpd, that was used for testing the new sendfile code, was not affected by the problem but apache and others using headers/trailers in the sendfile call received incorrect sbytes values after return from non- blocking sockets. This then lead to restarts with wrong offsets and thus mixed up file contents when the socket was writeable again. All programs not using headers/trailers, like ftpd, were not affected by the bug. Reported by: Pawel Worach <pawel.worach-at-gmail.com> Tested by: Pawel Worach <pawel.worach-at-gmail.com>
* Style cleanups to the sctp_* syscall functions.andre2006-11-071-107/+95
|
OpenPOWER on IntegriCloud