path: root/sys/netinet
Commit message [Author, Age, Files, Lines]
* Implement the last 2-3 missing instructions for ipfw, [luigi, 2002-07-05, 2 files, -36/+111]
| now it should support all the instructions of the old ipfw.
| Fix some bugs in the user interface, /sbin/ipfw.
|
| Please check this code against your rulesets, so I can fix the
| remaining bugs (if any, I think they will be mostly in /sbin/ipfw).
|
| Once we have done a bit of testing, this code is ready to be MFC'ed,
| together with a bunch of other changes (glue to ipfw, and also the
| removal of some global variables) which have been in -current
| for a couple of weeks now.
|
| MFC after: 7 days
* Remove trailing whitespace [brian, 2002-07-01, 10 files, -142/+142]
|
* Extend the effect of the sysctl net.inet.tcp.icmp_may_rst [jesper, 2002-06-30, 2 files, -2/+2]
| so that, if we receive an ICMP "time to live exceeded in transit"
| (type 11, code 0) for a TCP connection in SYN-SENT state, we close
| the connection.
|
| MFC after: 2 weeks
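The condition described above can be sketched as a small predicate. This is only an illustration, not the code from the commit: the function name, the reduced state enum, and the boolean standing in for the sysctl are all hypothetical, and the sketch covers only the newly added case (type 11, code 0 in SYN-SENT), not whatever other ICMP errors the sysctl already governed.

```c
#include <assert.h>
#include <stdbool.h>

/* ICMP type and code values as given in the commit message (RFC 792). */
#define ICMP_TIMXCEED_SKETCH          11  /* time exceeded */
#define ICMP_TIMXCEED_INTRANS_SKETCH   0  /* TTL exceeded in transit */

/* Hypothetical, reduced set of TCP states for this sketch. */
enum tcp_state_sketch { ST_CLOSED, ST_SYN_SENT, ST_ESTABLISHED };

/*
 * Return true when the ICMP error should close the connection:
 * the sysctl is enabled, the error is "time exceeded in transit",
 * and the connection is still in SYN-SENT.
 */
static bool
icmp_should_drop(bool icmp_may_rst, int type, int code,
    enum tcp_state_sketch st)
{
	assert(type >= 0 && code >= 0);
	return icmp_may_rst && type == ICMP_TIMXCEED_SKETCH &&
	    code == ICMP_TIMXCEED_INTRANS_SKETCH && st == ST_SYN_SENT;
}
```

An established connection, or a disabled sysctl, leaves the connection untouched in this sketch.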
* One possible code path for syncache_respond() is: [jlemon, 2002-06-28, 1 file, -1/+7]
|
|     syncache_respond(A), ip_output(), ip_input(), tcp_input(), syncache_badack(B)
|
| which winds up deleting a different entry from the syncache.
|
| Handle this by not utilizing the next entry in the timer chain
| until after syncache_respond() completes. The case of A == B
| should not be possible.
|
| Problem found by: Don Bowman <don@sandvine.com>
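The hazard and the fix can be illustrated with a plain singly linked list. This is a minimal sketch under assumptions: the struct, the `victim` field, and the function names are hypothetical stand-ins, not the syncache data structures. The point is only that the successor pointer is read after the callback returns, so a deletion of a different entry during the callback cannot leave the walk holding a stale pointer.

```c
#include <assert.h>
#include <stddef.h>

struct entry {
	int id;
	struct entry *next;
	struct entry *victim;	/* entry the callback deletes, or NULL */
	int visited;
};

/* Remove e from the list rooted at *head, if present. */
static void
unlink_entry(struct entry **head, struct entry *e)
{
	struct entry **pp;

	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			e->next = NULL;
			return;
		}
	}
}

/* Stand-in for syncache_respond(): may delete a different entry. */
static void
respond(struct entry **head, struct entry *e)
{
	e->visited = 1;
	if (e->victim != NULL) {
		assert(e->victim != e);	/* A == B should not happen */
		unlink_entry(head, e->victim);
	}
}

/* Walk the chain; returns the number of entries processed. */
static int
walk(struct entry **head)
{
	struct entry *e, *next;
	int n = 0;

	for (e = *head; e != NULL; e = next) {
		respond(head, e);
		next = e->next;	/* read only after respond() returns */
		n++;
	}
	return n;
}
```

Had `next` been saved before calling `respond()`, a walk over a, b, c in which processing a deletes b would continue from the unlinked b and never reach c.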
* Fix warning. [dfr, 2002-06-28, 1 file, -1/+1]
|
| Reviewed by: luigi
* The new ipfw code. [luigi, 2002-06-27, 4 files, -276/+2952]
|
| This code makes use of a variable-size kernel representation of rules
| (exactly the same concept as BPF instructions, as used in BSDI's
| firewall), which makes firewall operation a lot faster, and the
| code more readable and easier to extend and debug.
|
| The interface with the rest of the system is unchanged, as witnessed
| by this commit. The only extra kernel files that I am touching
| are if_fw.h and ip_dummynet.c, which is quite tied to ipfw. In
| userland I only had to touch those programs which manipulate the
| internal representation of firewall rules.
|
| The code is almost entirely new (and I believe I have written the
| vast majority of those sections which were taken from the former
| ip_fw.c), so rather than modifying the old ip_fw.c I decided to
| create a new file, sys/netinet/ip_fw2.c . Same for the user
| interface, which is in sbin/ipfw/ipfw2.c (it still compiles to
| /sbin/ipfw). The old files are still there, and will be removed
| in due time.
|
| I have not renamed the header file because it would have required
| a one-line change to a number of kernel files.
|
| In terms of user interface, the new "ipfw" is supposed to accept
| the old syntax for ipfw rules (and produce the same output with
| "ipfw show"). Only a couple of the old options (out of some 30 of
| them) have not been implemented, but they will be soon.
|
| On the other hand, the new code has some very powerful extensions.
| First, you can put "or" connectives between match fields (and soon
| also between options), and write things like
|
|     ipfw add allow ip from { 1.2.3.4/27 or 5.6.7.8/30 } 10-23,25,1024-3000 to any
|
| This should make rulesets slightly more compact (and lines longer!),
| by condensing 2 or more of the old rules into single ones.
|
| Also, as an example of how easily the rules can be extended, I have
| implemented an 'address set' match pattern, where you can specify
| an IP address in a format like this:
|
|     10.20.30.0/26{18,44,33,22,9}
|
| which will match the set of hosts listed in braces belonging to the
| subnet 10.20.30.0/26 . The match is done using a bitmap, so it is
| essentially a constant time operation requiring a handful of CPU
| instructions (and a very small amount of memory -- for a full /24
| subnet, the instruction only consumes 40 bytes).
|
| Again, in this commit I have focused on functionality and tried
| to minimize changes to the other parts of the system. Some performance
| improvement can be achieved with minor changes to the interface of
| ip_fw_chk_t. This will be done later when this code is settled.
|
| The code is meant to compile unmodified on RELENG_4 (once the
| PACKET_TAG_* changes have been merged), for this reason you will
| see #ifdef __FreeBSD_version in a couple of places. This should
| minimize errors when (hopefully soon) it will be time to do the MFC.
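The bitmap idea behind the 'address set' pattern can be sketched as follows. This is an illustration under assumptions, not the layout used in ip_fw2.c: the struct, field names, and helper functions are hypothetical. It only shows why the lookup is constant time: one mask-and-compare for the subnet, then one bit test indexed by the host part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: subnet, netmask, and one bit per host. */
struct addr_set {
	uint32_t base;		/* network address, host byte order */
	uint32_t mask;		/* netmask, host byte order */
	uint32_t bits[8];	/* 256 bits, enough for a full /24 */
};

/* Mark host number `host` (offset within the subnet) as a member. */
static void
addr_set_add(struct addr_set *s, uint32_t host)
{
	assert(host < 256 && host <= (uint32_t)~s->mask);
	s->bits[host / 32] |= (uint32_t)1 << (host % 32);
}

/* Constant-time membership test for a full IPv4 address. */
static bool
addr_set_match(const struct addr_set *s, uint32_t addr)
{
	uint32_t host;

	if ((addr & s->mask) != s->base)
		return false;		/* not in the subnet at all */
	host = addr & ~s->mask;
	if (host >= 256)
		return false;		/* outside the bitmap in this sketch */
	return (s->bits[host / 32] >> (host % 32)) & 1;
}
```

For 10.20.30.0/26{18,44,33,22,9}, the set would be built by calling `addr_set_add()` once per listed host number with `base` set to 10.20.30.0 and `mask` to 255.255.255.192.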
* Warning fixes for 64-bit platforms. With this last fix, [mux, 2002-06-27, 2 files, -2/+2]
| I can build a GENERIC sparc64 kernel with -Werror.
|
| Reviewed by: luigi
* Just a comment on some additional consistency checks that could [luigi, 2002-06-26, 1 file, -0/+5]
| be added here.
* At long last, commit the zero copy sockets code. [ken, 2002-06-26, 1 file, -2/+44]
|
| MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes.
|
| ti.4: Update the ti(4) man page to include information on the
|     TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
|     and also include information about the new character
|     device interface and the associated ioctls.
|
| man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated
|     links.
|
| jumbo.9: New man page describing the jumbo buffer allocator
|     interface and operation.
|
| zero_copy.9: New man page describing the general characteristics of
|     the zero copy send and receive code, and what an
|     application author should do to take advantage of the
|     zero copy functionality.
|
| NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
|     TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.
|
| conf/files: Add uipc_jumbo.c and uipc_cow.c.
|
| conf/options: Add the 5 options mentioned above.
|
| kern_subr.c: Receive side zero copy implementation. This takes
|     "disposable" pages attached to an mbuf, gives them to
|     a user process, and then recycles the user's page.
|     This is only active when ZERO_COPY_SOCKETS is turned on
|     and the kern.ipc.zero_copy.receive sysctl variable is
|     set to 1.
|
| uipc_cow.c: Send side zero copy functions. Takes a page written
|     by the user and maps it copy on write and assigns it
|     kernel virtual address space. Removes copy on write
|     mapping once the buffer has been freed by the network
|     stack.
|
| uipc_jumbo.c: Jumbo disposable page allocator code. This allocates
|     (optionally) disposable pages for network drivers that
|     want to give the user the option of doing zero copy
|     receive.
|
| uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are
|     enabled if ZERO_COPY_SOCKETS is turned on.
|
|     Add zero copy send support to sosend() -- pages get
|     mapped into the kernel instead of getting copied if
|     they meet size and alignment restrictions.
|
| uipc_syscalls.c: Un-staticize some of the sf* functions so that they
|     can be used elsewhere. (uipc_cow.c)
|
| if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
|     calling malloc() with M_WAITOK. Return an error if
|     the M_NOWAIT malloc fails.
|
|     The ti(4) driver and the wi(4) driver, at least, call
|     this with a mutex held. This causes witness warnings
|     for 'ifconfig -a' with a wi(4) or ti(4) board in the
|     system. (I've only verified for ti(4)).
|
| ip_output.c: Fragment large datagrams so that each segment contains
|     a multiple of PAGE_SIZE amount of data plus headers.
|     This allows the receiver to potentially do page
|     flipping on receives.
|
| if_ti.c: Add zero copy receive support to the ti(4) driver. If
|     TI_PRIVATE_JUMBOS is not defined, it now uses the
|     jumbo(9) buffer allocator for jumbo receive buffers.
|
|     Add a new character device interface for the ti(4)
|     driver for the new debugging interface. This allows
|     (a patched version of) gdb to talk to the Tigon board
|     and debug the firmware. There are also a few additional
|     debugging ioctls available through this interface.
|
|     Add header splitting support to the ti(4) driver.
|
|     Tweak some of the default interrupt coalescing
|     parameters to more useful defaults.
|
|     Add hooks for supporting transmit flow control, but
|     leave it turned off with a comment describing why it
|     is turned off.
|
| if_tireg.h: Change the firmware rev to 12.4.11, since we're really
|     at 12.4.11 plus fixes from 12.4.13.
|
|     Add defines needed for debugging.
|
|     Remove the ti_stats structure, it is now defined in
|     sys/tiio.h.
|
| ti_fw.h: 12.4.11 firmware.
|
| ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13,
|     and my header splitting patches. Revision 12.4.13
|     doesn't handle 10/100 negotiation properly. (This
|     firmware is the same as what was in the tree previously,
|     with the addition of header splitting support.)
|
| sys/jumbo.h: Jumbo buffer allocator interface.
|
| sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to
|     indicate that the payload buffer can be thrown away /
|     flipped to a userland process.
|
| socketvar.h: Add prototype for socow_setup.
|
| tiio.h: ioctl interface to the character portion of the ti(4)
|     driver, plus associated structure/type definitions.
|
| uio.h: Change prototype for uiomoveco() so that we'll know
|     whether the source page is disposable.
|
| ufs_readwrite.c: Update for new prototype of uiomoveco().
|
| vm_fault.c: In vm_fault(), check to see whether we need to do a page
|     based copy on write fault.
|
| vm_object.c: Add a new function, vm_object_allocate_wait(). This
|     does the same thing that vm_object_allocate() does, except
|     that it gives the caller the opportunity to specify whether
|     it should wait on the uma_zalloc() of the object structure.
|
|     This allows vm objects to be allocated while holding a
|     mutex. (Without generating WITNESS warnings.)
|
|     vm_object_allocate() is implemented as a call to
|     vm_object_allocate_wait() with the malloc flag set to
|     M_WAITOK.
|
| vm_object.h: Add prototype for vm_object_allocate_wait().
|
| vm_page.c: Add page-based copy on write setup, clear and fault
|     routines.
|
| vm_page.h: Add page based COW function prototypes and variable in
|     the vm_page structure.
|
| Many thanks to Drew Gallatin, who wrote the zero copy send and receive
| code, and to all the other folks who have tested and reviewed this code
| over the years.
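The ip_output.c item above (fragments carrying a multiple of PAGE_SIZE of data) comes down to a little arithmetic, sketched here. The function name and the fallback rule are assumptions made for illustration, not the logic from the commit: when a whole page does not fit under the MTU, the sketch falls back to the usual multiple of 8 bytes that IP fragment offsets require.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Payload bytes to carry in each non-final fragment.
 * mtu:       link MTU in bytes
 * hlen:      IP header length in bytes
 * page_size: page size in bytes (assumed to be a multiple of 8)
 */
static size_t
frag_payload_sketch(size_t mtu, size_t hlen, size_t page_size)
{
	size_t room;

	assert(mtu > hlen);
	assert(page_size > 0 && page_size % 8 == 0);

	room = mtu - hlen;
	if (room >= page_size)
		return room - room % page_size;	/* whole pages only */
	return room & ~(size_t)7;		/* ordinary 8-byte rounding */
}
```

With a 9000-byte MTU, a 20-byte header, and 4096-byte pages, this gives 8192 bytes of payload per fragment, so a receiver could in principle flip two whole pages; with a 1500-byte MTU no page fits and the result is the ordinary 1480.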
* Avoid unlocking the inp twice if badport_bandlim() returns -1. [hsu, 2002-06-24, 2 files, -4/+8]
|
| Reported by: jlemon
* Style bug: fix 4 space indentations that should have been tabs. [hsu, 2002-06-24, 2 files, -10/+10]
|
| Submitted by: jlemon
* Slightly restructure the #ifdef INET6 sections to make the code [luigi, 2002-06-23, 1 file, -31/+19]
| more readable.
|
| Remove the six "register" attributes from variables in tcp_output(),
| the compiler surely knows well how to allocate them.
* Move two global variables to automatic variables within the [luigi, 2002-06-23, 2 files, -4/+6]
| only function where they are used (they are used with TCPDEBUG only).
* Move some global variables to more appropriate places. [luigi, 2002-06-23, 1 file, -3/+28]
| Add XXX comments to mark places which need to be taken care of
| if we want to remove this part of the kernel from Giant.
|
| Add a comment on a potential performance problem with ip_forward()
* fix bad indentation and whitespace resulting from cut&paste [luigi, 2002-06-23, 2 files, -26/+25]
|
* fix indentation of a comment [luigi, 2002-06-23, 1 file, -1/+1]
|
* fix a typo in a comment [luigi, 2002-06-23, 1 file, -1/+1]
|
* Remove ip_fw_fwd_addr (forgotten in previous commit) [luigi, 2002-06-23, 1 file, -7/+5]
| remove some extra whitespace.
* Remove (almost all) global variables that were used to hold [luigi, 2002-06-22, 10 files, -410/+442]
| packet forwarding state ("annotations") during ip processing.
| The code is considerably cleaner now.
|
| The variables removed by this change are:
|
|     ip_divert_cookie    used by divert sockets
|     ip_fw_fwd_addr      used for transparent ip redirection
|     last_pkt            used by dynamic pipes in dummynet
|
| Removal of the first two has been done by carrying the annotations
| into volatile structs prepended to the mbuf chains, and adding
| appropriate code to add/remove annotations in the routines which
| make use of them, i.e. ip_input(), ip_output(), tcp_input(),
| bdg_forward(), ether_demux(), ether_output_frame(), div_output().
|
| In passing, remove a bug in divert handling of fragmented packets.
| Now it is the fragment at offset 0 which sets the divert status of
| the whole packet, whereas formerly it was the last incoming fragment
| to decide.
|
| Removal of last_pkt required a change in the interface of ip_fw_chk()
| and dummynet_io(). In passing, use the same mechanism for dummynet
| annotations and for divert/forward annotations.
|
| option IPFIREWALL_FORWARD is effectively useless, the code to
| implement it is very small and is now in by default to avoid the
| obfuscation of conditionally compiled code.
|
| NOTES:
|  * there is at least one global variable left, sro_fwd, in ip_output().
|    I am not sure if/how this can be removed.
|
|  * I have deliberately avoided gratuitous style changes in this commit
|    to avoid cluttering the diffs. Minor style cleanup will likely be
|    necessary.
|
|  * this commit only focused on the IP layer. I am sure there is a
|    number of global variables used in the TCP and maybe UDP stack.
|
|  * despite the number of files touched, there are absolutely no API's
|    or data structures changed by this commit (except the interfaces of
|    ip_fw_chk() and dummynet_io(), which are internal anyways), so
|    an MFC is quite safe and unintrusive (and desirable, given the
|    improved readability of the code).
|
| MFC after: 10 days
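The divert fix for fragmented packets can be pictured with a tiny sketch. The struct and function below are hypothetical and do not mirror the reassembly code; they only show the rule stated in the commit message: the fragment at offset 0 decides the divert status of the whole packet, whatever order the fragments arrive in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-fragment record, stored in arrival order. */
struct frag_sketch {
	unsigned int off;	/* fragment offset in bytes */
	int divert_port;	/* 0 means "not diverted" */
};

/*
 * Divert port for the reassembled packet: taken from the fragment
 * at offset 0, or 0 if that fragment has not been seen yet.
 */
static int
packet_divert_port(const struct frag_sketch *frags, size_t n)
{
	size_t i;

	assert(frags != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (frags[i].off == 0)
			return frags[i].divert_port;
	return 0;
}
```

Under the former behaviour described above, the answer would instead have depended on whichever fragment happened to arrive last.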
* Fix logic which resulted in missing a call to INP_UNLOCK(). [hsu, 2002-06-21, 1 file, -5/+2]
|
| Submitted by: jlemon, mux
* TCP notify functions can change the pcb list. [hsu, 2002-06-21, 2 files, -4/+4]
|
* Solve the 'unregistered netisr 18' information notice with a sledgehammer. [peter, 2002-06-20, 1 file, -4/+7]
| Register the ISR early, but do not actually kick off the timer until we
| see some activity. This still saves us from running the arp timers on
| a system with no network cards.
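The "register early, start the timer lazily" pattern can be sketched in a few lines. All names here are hypothetical and the flags merely stand in for the real netisr registration and timer machinery, which this sketch does not call.

```c
#include <assert.h>
#include <stdbool.h>

static bool isr_registered;	/* stands in for the netisr registration */
static bool timer_running;	/* stands in for the periodic ARP timer */
static int timer_starts;	/* how many times the timer was started */

/* Called once at initialisation: always register the handler. */
static void
arp_init_sketch(void)
{
	isr_registered = true;
}

/* Called whenever ARP activity is seen: start the timer on first use. */
static void
arp_activity_sketch(void)
{
	assert(isr_registered);
	if (!timer_running) {
		timer_running = true;
		timer_starts++;
	}
}
```

A system that never sees ARP activity never starts the timer, while the handler is registered from the start, which is the trade-off the commit message describes.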
* Remove so*_locked(), which were backed out by mistake. [tanimura, 2002-06-18, 4 files, -6/+6]
|
* Notify functions can destroy the pcb, so they have to return an [hsu, 2002-06-14, 7 files, -40/+65]
| indication of whether this happened so the calling function
| knows whether or not to unlock the pcb.
|
| Submitted by: Jennifer Yang (yangjihui@yahoo.com)
| Bug reported by: Sid Carter (sidcarter@symonds.net)
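One way to express that contract is for the notify function to return the pcb when it survives and NULL when it was destroyed. The sketch below assumes that convention purely for illustration; the struct, the lock counter, and the function names are hypothetical, and the "destroyed" pcb is only flagged rather than freed so that the example stays inspectable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pcb_sketch {
	int lock_depth;		/* 1 while locked, 0 otherwise */
	bool destroyed;
};

static void
pcb_lock(struct pcb_sketch *p)
{
	p->lock_depth++;
}

static void
pcb_unlock(struct pcb_sketch *p)
{
	assert(p->lock_depth > 0);
	p->lock_depth--;
}

/*
 * Notify callback: returns p if it still exists, or NULL if the
 * callback destroyed it (which also disposes of its lock).
 */
static struct pcb_sketch *
notify_sketch(struct pcb_sketch *p, bool fatal)
{
	if (fatal) {
		p->lock_depth = 0;
		p->destroyed = true;
		return NULL;
	}
	return p;
}

/* Caller: unlock only when the pcb survived the notification. */
static void
deliver_error(struct pcb_sketch *p, bool fatal)
{
	pcb_lock(p);
	p = notify_sketch(p, fatal);
	if (p != NULL)
		pcb_unlock(p);
}
```

Without the return value, the caller would have to unlock unconditionally and would touch a pcb that no longer exists in the fatal case.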
* Re-commit w/fix: [silby, 2002-06-14, 3 files, -3/+18]
|
| Ensure that the syn cache's syn-ack packets contain the same
| ip_tos, ip_ttl, and DF bits as all other tcp packets.
|
| PR: 39141
| MFC after: 2 weeks
|
| This time, make sure that ipv4 specific code (aka all of the above)
| is only run in the ipv4 case.
* Back out ip_tos/ip_ttl/DF "fix", it just panic'd my box. :) [silby, 2002-06-14, 3 files, -20/+1]
|
| Pointy-hat to: silby
* Ensure that the syn cache's syn-ack packets contain the same [silby, 2002-06-14, 3 files, -1/+20]
| ip_tos, ip_ttl, and DF bits as all other tcp packets.
|
| PR: 39141
| MFC after: 2 weeks
* Because we're holding an exclusive write lock on the head, references to [hsu, 2002-06-13, 1 file, -3/+0]
| the new inp cannot leak out even though it has been placed on the head list.
* The UDP head was unlocked too early in one unicast case. [hsu, 2002-06-12, 1 file, -10/+10]
|
| Submitted by: bug reported by arr
* Fix logic which resulted in missing a call to INP_UNLOCK(). [hsu, 2002-06-12, 2 files, -10/+4]
|
* Fix typo where INP_INFO_RLOCK should be INP_INFO_RUNLOCK. [hsu, 2002-06-12, 1 file, -4/+2]
| Submitted by: tegge, jlemon
|
| Prefer LIST_FOREACH macro.
| Submitted by: jlemon
* Remember to initialize the control block head mutex. [hsu, 2002-06-11, 2 files, -0/+2]
|
* Fix typo. [hsu, 2002-06-11, 1 file, -2/+2]
|
| Submitted by: Kyunghwan Kim <redjade@atropos.snu.ac.kr>
* Every array elt is initialized in the following loop, so remove [hsu, 2002-06-10, 1 file, -1/+1]
| unnecessary M_ZERO.
* Lock up inpcb. [hsu, 2002-06-10, 14 files, -117/+665]
|
| Submitted by: Jennifer Yang <yangjihui@yahoo.com>
* Back out my last commit of locking down a socket, it conflicts with hsu's work. [tanimura, 2002-05-31, 15 files, -499/+120]
|
| Requested by: hsu
* Avoid unintentional trigraph. [wollman, 2002-05-30, 1 file, -1/+1]
|
* - Change the newly turned INVARIANTS #ifdef blocks (they were changed from [arr, 2002-05-21, 3 files, -24/+16]
|   DIAGNOSTIC yesterday) into KASSERT()'s as these help to increase code
|   readability.
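The readability gain is easiest to see side by side. The sketch below is a userland stand-in, not kernel code: `KASSERT_SKETCH` is a hypothetical macro that mimics the calling shape of the kernel's KASSERT(expression, (format, args)) by printing and aborting, and the checked function is invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Userland stand-in for a KASSERT-style macro. The second argument
 * is a parenthesised printf argument list, as in KASSERT(exp, (fmt, ...)).
 */
#define KASSERT_SKETCH(exp, msg) do {		\
	if (!(exp)) {				\
		printf msg;			\
		printf("\n");			\
		abort();			\
	}					\
} while (0)

static int
queue_len_sketch(int len)
{
	/*
	 * The older style needed three lines plus a preprocessor block:
	 *
	 *   #ifdef INVARIANTS
	 *   if (len < 0) panic("queue_len_sketch: negative length");
	 *   #endif
	 *
	 * The assertion form states the invariant in one statement.
	 */
	KASSERT_SKETCH(len >= 0,
	    ("queue_len_sketch: negative length %d", len));
	return len;
}
```

In the kernel the check is compiled in only when INVARIANTS is defined; this sketch always performs it.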
* - Turn a few DIAGNOSTIC into INVARIANTS since they are really sanity [arr, 2002-05-20, 1 file, -3/+3]
|   checks.
* - Turn a DIAGNOSTIC into an INVARIANTS since it's a sanity check. Use [arr, 2002-05-20, 1 file, -2/+3]
|   proper ``if'' statement style.
* - Turn a #ifdef DIAGNOSTIC to #ifdef INVARIANTS as the code from this line [arr, 2002-05-20, 1 file, -1/+1]
|   through the #endif is really a sanity check.
|
| Reviewed by: jake
* Lock down a socket, milestone 1. [tanimura, 2002-05-20, 15 files, -120/+499]
|
| o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
|   socket buffer. The mutex in the receive buffer also protects the data
|   in struct socket.
|
| o Determine the lock strategy for each member in struct socket.
|
| o Lock down the following members:
|
|   - so_count
|   - so_options
|   - so_linger
|   - so_state
|
| o Remove *_locked() socket APIs. Make the following socket APIs
|   touching the members above now require a locked socket:
|
|   - sodisconnect()
|   - soisconnected()
|   - soisconnecting()
|   - soisdisconnected()
|   - soisdisconnecting()
|   - sofree()
|   - soref()
|   - sorele()
|   - sorwakeup()
|   - sotryfree()
|   - sowakeup()
|   - sowwakeup()
|
| Reviewed by: alfred
* Reset token-ring source routing control field on receipt of ethernet frame [kbyanc, 2002-05-15, 1 file, -0/+1]
| without source routing information. This restores the behaviour in this
| scenario to what it was prior to my last commit.
* Modify the arguments to syncache_socket() to include the mbuf (m) that [rwatson, 2002-05-14, 1 file, -4/+6]
| results in the syncache entry being turned into a socket. While it's
| not used in the main tree, this is required in the MAC tree so that
| labels can be propagated from the mbuf to the socket. This is also
| useful if you're doing things like transparent IP connection hijacking
| and you want to use the syncache/cookie mechanism, but we won't go
| there.
|
| Obtained from: TrustedBSD Project
| Sponsored by: DARPA, NAI Labs
* Add ipfw hooks to ether_demux() and ether_output_frame(). [luigi, 2002-05-13, 2 files, -5/+29]
| Ipfw processing of frames at layer 2 can be enabled by the sysctl variable
|
|     net.link.ether.ipfw=1
|
| Consider this feature experimental, because right now, the firewall
| is invoked in the places indicated below, and controlled by the
| sysctl variables listed on the right. As a consequence, a packet
| can be filtered from 1 to 4 times depending on the path it follows,
| which might make a ruleset a bit hard to follow.
|
| I will add an ipfw option to tell if we want a given rule to apply
| to ether_demux() and ether_output_frame(), but we have run out of flags
| in the struct ip_fw so I need to think a bit on how to implement this.
|
|            to upper layers
|                 |     |
|                 +----------->-----------+
|                 ^                       V
|             [ip_input]              [ip_output]          net.inet.ip.fw.enable=1
|                 |                       |
|                 ^                       V
|             [ether_demux]      [ether_output_frame]      net.link.ether.ipfw=1
|                 |                       |
|                 +->- [bdg_forward]-->---+                net.link.ether.bridge_ipfw=1
|                 ^                       V
|                 |                       |
|                        to devices
* Remove custom definitions (IP_FW_TCPF_SYN etc.) of TCP header flags [luigi, 2002-05-13, 2 files, -12/+1]
| which are the same as the original ones (TH_SYN etc.)
* Add code to match MAC header fields (at the moment supported on [luigi, 2002-05-12, 1 file, -62/+108]
| bridged packets only, soon to come also for packets on ordinary
| ether_input() and ether_output() paths). The syntax is
|
|     ipfw add <action> MAC dst src type
|
| where dst and src can be "any" or a MAC address optionally followed
| by a mask, e.g.
|
|     10:20:30:40:50
|     10:20:30:40:50/32
|     10:20:30:40:50&ff:ff:ff:f0:ff:0f
|
| and type can be a single ethernet type, a range, or a type followed by
| a mask (values are always in hexadecimal), e.g.
|
|     0800
|     0800-0806
|     0800/8
|     0800&03ff
|
| Note, I am still uncertain on what is the best format for inputting
| these values, having the values in hexadecimal is convenient in most
| cases but can be confusing sometimes. Suggestions welcome.
|
| Implement suggestion from PR 37778 to allow "not me" on destination
| and source IP. The code in the PR was slightly wrong and interfered
| with the normal handling of IP addresses. This version hopefully is
| correct.
|
| Minor cleanup of the code, in some places moving the indentation to 4
| spaces because the code was becoming too deep. Eventually, in a
| separate commit, I will move the whole file to 4 space indent.
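The three kinds of comparison described above (address with mask, type range, type with mask) reduce to a few bitwise operations. The sketch below is illustrative only: the function names are hypothetical, it takes already-parsed masks and bounds, and it makes no claim about how ipfw parses the "/N" or "&mask" text forms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAC_LEN_SKETCH 6

/* Compare two MAC addresses under a per-byte mask. */
static bool
mac_match(const uint8_t addr[MAC_LEN_SKETCH],
    const uint8_t want[MAC_LEN_SKETCH],
    const uint8_t mask[MAC_LEN_SKETCH])
{
	int i;

	for (i = 0; i < MAC_LEN_SKETCH; i++)
		if ((addr[i] & mask[i]) != (want[i] & mask[i]))
			return false;
	return true;
}

/* Ethernet type given as an inclusive range, e.g. 0800-0806. */
static bool
type_match_range(uint16_t type, uint16_t lo, uint16_t hi)
{
	assert(lo <= hi);
	return type >= lo && type <= hi;
}

/* Ethernet type given as value and mask, e.g. 0800&03ff. */
static bool
type_match_mask(uint16_t type, uint16_t want, uint16_t mask)
{
	return (type & mask) == (want & mask);
}
```

A pattern of "any" corresponds to an all-zero mask in this sketch, since every address then compares equal.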
* s/demon/daemon/ [dd, 2002-05-12, 2 files, -3/+3]
|
* Remove some duplicate types that should have been removed as part of [mike, 2002-05-11, 1 file, -40/+0]
| the rearranging in the previous revision.
|
| Pointy hat to: cvs update (merging), mike (for not noticing)
* Cleanup the interface to ip_fw_chk, two of the input arguments [luigi, 2002-05-09, 4 files, -104/+108]
| were totally useless and have been removed.
|
| ip_input.c, ip_output.c:
|     Properly initialize the "ip" pointer in case the firewall does an
|     m_pullup() on the packet.
|
|     Remove some debugging code forgotten long ago.
|
| ip_fw.[ch], bridge.c:
|     Prepare the grounds for matching MAC header fields in bridged packets,
|     so we can have 'etherfw' functionality without a lot of kernel and
|     userland bloat.