summaryrefslogtreecommitdiffstats
path: root/sys/netinet
Commit message (Collapse)AuthorAgeFilesLines
* Fix handling of packets which matched an "ipfw fwd" rule on the input side.luigi2002-08-031-0/+13
|
* When preserving the IP header in extra mbuf in the IP forwardingrwatson2002-08-021-0/+7
| | | | | | | | case, also preserve the MAC label. Note that this mbuf allocation is fairly non-optimal, but not my fault. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Work to fix LINT build.rwatson2002-08-021-0/+1
| | | | Reported by: phk
* Introduce support for Mandatory Access Control and extensiblerwatson2002-08-011-4/+29
| | | | | | | | | | | | kernel access control. Add MAC support for the UDP protocol. Invoke appropriate MAC entry points to label packets that are generated by local UDP sockets, and to authorize delivery of mbufs to local sockets both in the multicast/broadcast case and the unicast case. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Document the undocumented assumption that at least one of the PCBrwatson2002-08-012-0/+4
| | | | | | | | | pointer and incoming mbuf pointer will be non-NULL in tcp_respond(). This is relied on by the MAC code for correctness, as well as existing code. Obtained from: TrustedBSD PRoject Sponsored by: DARPA, NAI Labs
* Introduce support for Mandatory Access Control and extensiblerwatson2002-08-011-0/+5
| | | | | | | | | | | | | kernel access control. Add support for labeling most out-going ICMP messages using an appropriate MAC entry point. Currently, we do not explicitly label packet reflect (timestamp, echo request) ICMP events, implicitly using the originating packet label since the mbuf is reused. This will be made explicit at some point. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Introduce support for Mandatory Access Control and extensiblerwatson2002-07-316-0/+73
| | | | | | | | | | | | | | | | | | kernel access control. Instrument the TCP socket code for packet generation and delivery: label outgoing mbufs with the label of the socket, and check socket and mbuf labels before permitting delivery to a socket. Assign labels to newly accepted connections when the syncache/cookie code has done its business. Also set peer labels as convenient. Currently, MAC policies cannot influence the PCB matching algorithm, so cannot implement polyinstantiation. Note that there is at least one case where a PCB is not available due to the TCP packet not being associated with any socket, so we don't label in that case, but need to handle it in a special manner. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Introduce support for Mandatory Access Control and extensiblerwatson2002-07-311-17/+41
| | | | | | | | | | | | | | | kernel access control. Instrument the raw IP socket code for packet generation and delivery: label outgoing mbufs with the label of the socket, and check the socket and mbuf labels before permitting delivery to a socket, permitting MAC policies to selectively allow delivery of raw IP mbufs to various raw IP sockets that may be open. Restructure the policy checking code to compose IPsec and MAC results in a more readable manner. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Introduce support for Mandatory Access Control and extensiblerwatson2002-07-311-0/+8
| | | | | | | | | | | | | | | kernel access control. When fragmenting an IP datagram, invoke an appropriate MAC entry point so that MAC labels may be copied (...) to the individual IP fragment mbufs by MAC policies. When IP options are inserted into an IP datagram when leaving a host, preserve the label if we need to reallocate the mbuf for alignment or size reasons. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Introduce support for Mandatory Access Control and extensiblerwatson2002-07-311-0/+17
| | | | | | | | | | | | | | | | kernel access control. Instrument the code managing IP fragment reassembly queues (struct ipq) to invoke appropriate MAC entry points to maintain a MAC label on each queue. Permit MAC policies to associate information with a queue based on the mbuf that caused it to be created, update that information based on further mbufs accepted by the queue, influence the decision making process by which mbufs are accepted to the queue, and set the label of the mbuf holding the reassembled datagram following reassembly completetion. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Introduce support for Mandatory Access Control and extensiblerwatson2002-07-311-0/+6
| | | | | | | | | | | kernel access control. When generating an IGMP message, invoke a MAC entry point to permit the MAC framework to label its mbuf appropriately for the target interface. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Introduce support for Mandatory Access Control and extensiblerwatson2002-07-311-0/+5
| | | | | | | | | | kernel access control. When generating an ARP query, invoke a MAC entry point to permit the MAC framework to label its mbuf appropriately for the interface. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Introduce support for Mandatory Access Control and extensiblerwatson2002-07-311-0/+6
| | | | | | | | | | | kernel access control. Invoke the MAC framework to label mbuf created using divert sockets. These labels may later be used for access control on delivery to another socket, or to an interface. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI LAbs
* Introduce support for Mandatory Access Control and extensiblerwatson2002-07-301-0/+1
| | | | | | | | | | | | | | kernel access control. Label IP fragment reassembly queues, permitting security features to be maintained on those objects. ipq_label will be used to manage the reassembly of fragments into IP datagrams using security properties. This permits policies to deny the reassembly of fragments, as well as influence the resulting label of a datagram following reassembly. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Use a common way to release locks before exit.maxim2002-07-291-2/+4
| | | | Reviewed by: hsu
* Wire the sysctl output buffer before grabbing any locks to preventtruckman2002-07-283-0/+9
| | | | | | | SYSCTL_OUT() from blocking while locks are held. This should only be done when it would be inconvenient to make a temporary copy of the data and defer calling SYSCTL_OUT() until after the locks are released.
* make setsockopt(IPV6_V6ONLY, 0) actuall work for tcp6.ume2002-07-251-3/+3
| | | | MFC after: 1 week
* cleanup usage of ip6_mapped_addr_on and ip6_v6only. now,ume2002-07-252-6/+4
| | | | | | ip6_mapped_addr_on is unified into ip6_v6only. MFC after: 1 week
* Only log things net.inet.ip.fw.verbose is setluigi2002-07-241-1/+2
|
* Don't forget to recalculate the IP checksum of the originalru2002-07-231-4/+12
| | | | | | | IP datagram embedded into ICMP error message. Spotted by: tcpdump 3.7.1 (-vvv) MFC after: 3 days
* Don't shrink socket buffers in tcp_mss(), application might have alreadyru2002-07-222-4/+8
| | | | | | | configured them with setsockopt(SO_*BUF), for RFC1323's scaled windows. PR: kern/11966 MFC after: 1 week
* do not refer to IN6P_BINDV6ONLY anymore.ume2002-07-221-1/+0
| | | | | Obtained from: KAME MFC after: 1 week
* Fix overflows in intermediate calculations in sysctl_msec_to_ticks().jdp2002-07-201-2/+2
| | | | | | | At hz values of 1000 and above the overflows caused net.inet.tcp.keepidle to be reported as negative. MFC after: 3 days
* Don't export 'struct ipq' from kernel, instead #ifdef _KERNEL. As kernelrwatson2002-07-201-0/+2
| | | | | | | | | | data structures pick up security and synchronization primitives, it becomes increasingly desirable not to arbitrarily export them via include files to userland, as the userland applications pick up new #include dependencies. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Add the tcps_sndrexmitbad statistic, keep track of late acks that causeddillon2002-07-193-0/+3
| | | | unnecessary retransmissions.
* Introduce two new sysctl's:dillon2002-07-184-6/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | net.inet.tcp.rexmit_min (default 3 ticks equiv) This sysctl is the retransmit timer RTO minimum, specified in milliseconds. This value is designed for algorithmic stability only. net.inet.tcp.rexmit_slop (default 200ms) This sysctl is the retransmit timer RTO slop which is added to every retransmit timeout and is designed to handle protocol stack overheads and delayed ack issues. Note that the *original* code applied a 1-second RTO minimum but never applied real slop to the RTO calculation, so any RTO calculation over one second would have no slop and thus not account for protocol stack overheads (TCP timestamps are not a measure of protocol turnaround!). Essentially, the original code made the RTO calculation almost completely irrelevant. Please note that the 200ms slop is debateable. This commit is not meant to be a line in the sand, and if the community winds up deciding that increasing it is the correct solution then it's easy to do. Note that larger values will destroy performance on lossy networks while smaller values may result in a greater number of unnecessary retransmits.
* Move IPFW2 definition before including ip_fw.hluigi2002-07-181-32/+30
| | | | Make indentation of new parts consistent with the style used for this file.
* I don't know how the minimum retransmit timeout managed to get set todillon2002-07-171-1/+7
| | | | | | | | one second but it badly breaks throughput on networks with minor packet loss. Complaints by: at least two people tracked down to this. MFC after: 3 days
* Fix a panic when doing "ipfw add pipe 1 log ..."luigi2002-07-172-6/+31
| | | | | Also synchronize ip_dummynet.c with the version in RELENG_4 to ease MFC's.
* Implement keepalives for dynamic rules, so they will not expireluigi2002-07-142-107/+150
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | just because you leave your session idle. Also, put in a fix for 64-bit architectures (to be revised). In detail: ip_fw.h * Reorder fields in struct ip_fw to avoid alignment problems on 64-bit machines. This only masks the problem, I am still not sure whether I am doing something wrong in the code or there is a problem elsewhere (e.g. different aligmnent of structures between userland and kernel because of pragmas etc.) * added fields in dyn_rule to store ack numbers, so we can generate keepalives when the dynamic rule is about to expire ip_fw2.c * use a local function, send_pkt(), to generate TCP RST for Reset rules; * save about 250 bytes by cleaning up the various snprintf() in ipfw_log() ... * ... and use twice as many bytes to implement keepalives (this seems to be working, but i have not tested it extensively). Keepalives are generated once every 5 seconds for the last 20 seconds of the lifetime of a dynamic rule for an established TCP flow. The packets are sent to both sides, so if at least one of the endpoints is responding, the timeout is refreshed and the rule will not expire. You can disable this feature with sysctl net.inet.ip.fw.dyn_keepalive=0 (the default is 1, to have them enabled). MFC after: 1 day (just kidding... I will supply an updated version of ipfw2 for RELENG_4 tomorrow).
* Avoid dereferencing a null pointer in ro_rt.luigi2002-07-121-2/+3
| | | | | | | | | This was always broken in HEAD (the offending statement was introduced in rev. 1.123 for HEAD, while RELENG_4 included this fix (in rev. 1.99.2.12 for RELENG_4) and I inadvertently deleted it in 1.99.2.30. So I am also restoring these two lines in RELENG_4 now. We might need another few things from 1.99.2.30.
* Back out the previous change, since it looks like locking udbinfo providestruckman2002-07-121-8/+1
| | | | sufficient protection.
* Lock inp while we're accessing it.truckman2002-07-121-1/+8
|
* Defer calling SYSCTL_OUT() until after the locks have been released.truckman2002-07-113-5/+10
|
* Reduce the nesting level of a code block that doesn't need to be intruckman2002-07-112-26/+20
| | | | an else clause.
* Change one variable to make it easier to switch between ipfw and ipfw2luigi2002-07-091-5/+3
|
* Fix a bug caused by dereferencing an invalid pointer whenluigi2002-07-081-62/+65
| | | | | | | | | no punch_fw was used. Fix another couple of bugs which prevented rules from being installed properly. On passing, use IPFW2 instead of NEW_IPFW to compile the new code, and slightly simplify the instruction generation code.
* No functional changes, but:luigi2002-07-081-279/+267
| | | | | | | | | | | | | | Following Darren's suggestion, make Dijkstra happy and rewrite the ipfw_chk() main loop removing a lot of goto's and using instead a variable to store match status. Add a lot of comments to explain what instructions are supposed to do and how -- this should ease auditing of the code and make people more confident with it. In terms of code size: the entire file takes about 12700 bytes of text, about 3K of which are for the main function, ipfw_chk(), and 2K (ouch!) for ipfw_log().
* Remove one unused command name.luigi2002-07-081-1/+0
|
* Forgot to update one field name in one of the latest commits.luigi2002-07-081-2/+2
|
* Implement the last 2-3 missing instructions for ipfw,luigi2002-07-052-36/+111
| | | | | | | | | | | | | | | | now it should support all the instructions of the old ipfw. Fix some bugs in the user interface, /sbin/ipfw. Please check this code against your rulesets, so i can fix the remaining bugs (if any, i think they will be mostly in /sbin/ipfw). Once we have done a bit of testing, this code is ready to be MFC'ed, together with a bunch of other changes (glue to ipfw, and also the removal of some global variables) which have been in -current for a couple of weeks now. MFC after: 7 days
* Remove trailing whitespacebrian2002-07-0110-142/+142
|
* Extend the effect of the sysctl net.inet.tcp.icmp_may_rstjesper2002-06-302-2/+2
| | | | | | | | so that, if we recieve a ICMP "time to live exceeded in transit", (type 11, code 0) for a TCP connection on SYN-SENT state, close the connection. MFC after: 2 weeks
* One possible code path for syncache_respond() is:jlemon2002-06-281-1/+7
| | | | | | | | | | syncache_respond(A), ip_output(), ip_input(), tcp_input(), syncache_badack(B) Which winds up deleting a different entry from the syncache. Handle this by not utilizing the next entry in the timer chain until after syncache_respond() completes. The case of A == B should not be possible. Problem found by: Don Bowman <don@sandvine.com>
* Fix warning.dfr2002-06-281-1/+1
| | | | Reviewed by: luigi
* The new ipfw code.luigi2002-06-274-276/+2952
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This code makes use of variable-size kernel representation of rules (exactly the same concept of BPF instructions, as used in the BSDI's firewall), which makes firewall operation a lot faster, and the code more readable and easier to extend and debug. The interface with the rest of the system is unchanged, as witnessed by this commit. The only extra kernel files that I am touching are if_fw.h and ip_dummynet.c, which is quite tied to ipfw. In userland I only had to touch those programs which manipulate the internal representation of firewall rules). The code is almost entirely new (and I believe I have written the vast majority of those sections which were taken from the former ip_fw.c), so rather than modifying the old ip_fw.c I decided to create a new file, sys/netinet/ip_fw2.c . Same for the user interface, which is in sbin/ipfw/ipfw2.c (it still compiles to /sbin/ipfw). The old files are still there, and will be removed in due time. I have not renamed the header file because it would have required touching a one-line change to a number of kernel files. In terms of user interface, the new "ipfw" is supposed to accepts the old syntax for ipfw rules (and produce the same output with "ipfw show". Only a couple of the old options (out of some 30 of them) has not been implemented, but they will be soon. On the other hand, the new code has some very powerful extensions. First, you can put "or" connectives between match fields (and soon also between options), and write things like ipfw add allow ip from { 1.2.3.4/27 or 5.6.7.8/30 } 10-23,25,1024-3000 to any This should make rulesets slightly more compact (and lines longer!), by condensing 2 or more of the old rules into single ones. Also, as an example of how easy the rules can be extended, I have implemented an 'address set' match pattern, where you can specify an IP address in a format like this: 10.20.30.0/26{18,44,33,22,9} which will match the set of hosts listed in braces belonging to the subnet 10.20.30.0/26 . The match is done using a bitmap, so it is essentially a constant time operation requiring a handful of CPU instructions (and a very small amount of memmory -- for a full /24 subnet, the instruction only consumes 40 bytes). Again, in this commit I have focused on functionality and tried to minimize changes to the other parts of the system. Some performance improvement can be achieved with minor changes to the interface of ip_fw_chk_t. This will be done later when this code is settled. The code is meant to compile unmodified on RELENG_4 (once the PACKET_TAG_* changes have been merged), for this reason you will see #ifdef __FreeBSD_version in a couple of places. This should minimize errors when (hopefully soon) it will be time to do the MFC.
* Warning fixes for 64 bits platforms. With this last fix,mux2002-06-272-2/+2
| | | | | | I can build a GENERIC sparc64 kernel with -Werror. Reviewed by: luigi
* Just a comment on some additional consistency checks that couldluigi2002-06-261-0/+5
| | | | be added here.
* At long last, commit the zero copy sockets code.ken2002-06-261-2/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes. ti.4: Update the ti(4) man page to include information on the TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options, and also include information about the new character device interface and the associated ioctls. man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated links. jumbo.9: New man page describing the jumbo buffer allocator interface and operation. zero_copy.9: New man page describing the general characteristics of the zero copy send and receive code, and what an application author should do to take advantage of the zero copy functionality. NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS, TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT. conf/files: Add uipc_jumbo.c and uipc_cow.c. conf/options: Add the 5 options mentioned above. kern_subr.c: Receive side zero copy implementation. This takes "disposable" pages attached to an mbuf, gives them to a user process, and then recycles the user's page. This is only active when ZERO_COPY_SOCKETS is turned on and the kern.ipc.zero_copy.receive sysctl variable is set to 1. uipc_cow.c: Send side zero copy functions. Takes a page written by the user and maps it copy on write and assigns it kernel virtual address space. Removes copy on write mapping once the buffer has been freed by the network stack. uipc_jumbo.c: Jumbo disposable page allocator code. This allocates (optionally) disposable pages for network drivers that want to give the user the option of doing zero copy receive. uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are enabled if ZERO_COPY_SOCKETS is turned on. Add zero copy send support to sosend() -- pages get mapped into the kernel instead of getting copied if they meet size and alignment restrictions. uipc_syscalls.c:Un-staticize some of the sf* functions so that they can be used elsewhere. (uipc_cow.c) if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid calling malloc() with M_WAITOK. Return an error if the M_NOWAIT malloc fails. The ti(4) driver and the wi(4) driver, at least, call this with a mutex held. This causes witness warnings for 'ifconfig -a' with a wi(4) or ti(4) board in the system. (I've only verified for ti(4)). ip_output.c: Fragment large datagrams so that each segment contains a multiple of PAGE_SIZE amount of data plus headers. This allows the receiver to potentially do page flipping on receives. if_ti.c: Add zero copy receive support to the ti(4) driver. If TI_PRIVATE_JUMBOS is not defined, it now uses the jumbo(9) buffer allocator for jumbo receive buffers. Add a new character device interface for the ti(4) driver for the new debugging interface. This allows (a patched version of) gdb to talk to the Tigon board and debug the firmware. There are also a few additional debugging ioctls available through this interface. Add header splitting support to the ti(4) driver. Tweak some of the default interrupt coalescing parameters to more useful defaults. Add hooks for supporting transmit flow control, but leave it turned off with a comment describing why it is turned off. if_tireg.h: Change the firmware rev to 12.4.11, since we're really at 12.4.11 plus fixes from 12.4.13. Add defines needed for debugging. Remove the ti_stats structure, it is now defined in sys/tiio.h. ti_fw.h: 12.4.11 firmware. ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13, and my header splitting patches. Revision 12.4.13 doesn't handle 10/100 negotiation properly. (This firmware is the same as what was in the tree previously, with the addition of header splitting support.) sys/jumbo.h: Jumbo buffer allocator interface. sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to indicate that the payload buffer can be thrown away / flipped to a userland process. socketvar.h: Add prototype for socow_setup. tiio.h: ioctl interface to the character portion of the ti(4) driver, plus associated structure/type definitions. uio.h: Change prototype for uiomoveco() so that we'll know whether the source page is disposable. ufs_readwrite.c:Update for new prototype of uiomoveco(). vm_fault.c: In vm_fault(), check to see whether we need to do a page based copy on write fault. vm_object.c: Add a new function, vm_object_allocate_wait(). This does the same thing that vm_object allocate does, except that it gives the caller the opportunity to specify whether it should wait on the uma_zalloc() of the object structre. This allows vm objects to be allocated while holding a mutex. (Without generating WITNESS warnings.) vm_object_allocate() is implemented as a call to vm_object_allocate_wait() with the malloc flag set to M_WAITOK. vm_object.h: Add prototype for vm_object_allocate_wait(). vm_page.c: Add page-based copy on write setup, clear and fault routines. vm_page.h: Add page based COW function prototypes and variable in the vm_page structure. Many thanks to Drew Gallatin, who wrote the zero copy send and receive code, and to all the other folks who have tested and reviewed this code over the years.
* Avoid unlocking the inp twice if badport_bandlim() returns -1.hsu2002-06-242-4/+8
| | | | Reported by: jlemon
OpenPOWER on IntegriCloud