summaryrefslogtreecommitdiffstats
path: root/sys/netinet/ip_output.c
Commit message (Collapse)AuthorAgeFilesLines
* Eliminate MAC entry point mac_create_mbuf_from_mbuf(), which isrwatson2005-07-051-1/+1
| | | | | | | | | | | redundant with respect to existing mbuf copy label routines. Expose a new mac_copy_mbuf() routine at the top end of the Framework and use that; use the existing mpo_copy_mbuf_label() routine on the bottom end. Obtained from: TrustedBSD Project Sponsored by: SPARTA, SPAWAR Approved by: re (scottl)
* Stop embedding struct ifnet at the top of driver softcs. Instead thebrooks2005-06-101-2/+2
| | | | | | | | | | | | | | | | | | | | struct ifnet or the layer 2 common structure it was embedded in have been replaced with a struct ifnet pointer to be filled by a call to the new function, if_alloc(). The layer 2 common structure is also allocated via if_alloc() based on the interface type. It is hung off the new struct ifnet member, if_l2com. This change removes the size of these structures from the kernel ABI and will allow us to better manage them as interfaces come and go. Other changes of note: - Struct arpcom is no longer referenced in normal interface code. Instead the Ethernet address is accessed via the IFP2ENADDR() macro. To enforce this ac_enaddr has been renamed to _ac_enaddr. - The second argument to ether_ifattach is now always the mac address from driver private storage rather than sometimes being ac_enaddr. Reviewed by: sobomax, sam
* Bring back the full packet destination manipulation for 'ipfw fwd'andre2005-02-221-1/+5
| | | | | | | | | | | | | | | | | | | | with the kernel compile time option: options IPFIREWALL_FORWARD_EXTENDED This option has to be specified in addition to IPFIRWALL_FORWARD. With this option even packets targeted for an IP address local to the host can be redirected. All restrictions to ensure proper behaviour for locally generated packets are turned off. Firewall rules have to be carefully crafted to make sure that things like PMTU discovery do not break. Document the two kernel options. PR: kern/71910 PR: kern/73129 MFC after: 1 week
* Correctly move the packet header in ip_insertoptions().alc2005-01-231-1/+2
| | | | | | Reported by: Anupam Chanda Reviewed by: sam@ MFC after: 2 weeks
* /* -> /*- for license, minor formatting changesimp2005-01-071-1/+1
|
* Remove an errant blank line apparently introduced inrwatson2004-12-251-1/+0
| | | | ip_output.c:1.194.
* Pass the inpcb reference into ip_getmoptions() rather than just therwatson2004-12-051-6/+14
| | | | | | | | | | inp->inp_moptions pointer, so that ip_getmoptions() can perform necessary locking when doing non-atomic reads. Lock the inpcb by default to copy any data to local variables, then unlock before performing sooptcopyout(). MFC after: 2 weeks
* Push the inpcb argument into ip_setmoptions() when setting IP multicastrwatson2004-12-051-10/+8
| | | | socket options, so that it is available for locking.
* Start working through inpcb locking for ip_ctloutput() by cleaning uprwatson2004-12-051-10/+13
| | | | | | | | | | | | modifications to the inpcb IP options mbuf: - Lock the inpcb before passing it into ip_pcbopts() in order to prevent simulatenous reads and read-modify-writes that could result in races. - Pass the inpcb reference into ip_pcbopts() instead of the option chain pointer in the inpcb. - Assert the inpcb lock in ip_pcbots. - Convert one or two uses of a pointer as a boolean or an integer comparison to a comparison with NULL for readability.
* Add an additional struct inpcb * argument to pfil(9) in order to enablemlaier2004-09-291-1/+1
| | | | | | | | | | | | | | | | | | | passing along socket information. This is required to work around a LOR with the socket code which results in an easy reproducible hard lockup with debug.mpsafenet=1. This commit does *not* fix the LOR, but enables us to do so later. The missing piece is to turn the filter locking into a leaf lock and will follow in a seperate (later) commit. This will hopefully be MT5'ed in order to fix the problem for RELENG_5 in forseeable future. Suggested by: rwatson A lot of work by: csjp (he'd be even more helpful w/o mentor-reviews ;) Reviewed by: rwatson, csjp Tested by: -pf, -ipfw, LINT, csjp and myself MFC after: 3 days LOR IDs: 14 - 17 (not fixed yet)
* Make comments more clear for the packet changed cases after pfil hooks.andre2004-09-131-1/+2
|
* revert comment from rev1.158 now that rev1.225 backed it out..jmg2004-09-061-3/+1
| | | | MFC after: 3 days
* In the case the destination of a packet was changed by the packet filterandre2004-08-271-2/+2
| | | | | | | | | | | | to point to a local IP address; and the packet was sourced from this host we fill in the m_pkthdr.rcvif with a pointer to the loopback interface. Before the function ifunit("lo0") was used to obtain the ifp. However this is sub-optimal from a performance point of view and might be dangerous if the loopback interface has been renamed. Use the global variable 'loif' instead which always points to the loopback interface. Submitted by: brooks
* Always compile PFIL_HOOKS into the kernel and remove the associated kernelandre2004-08-271-17/+7
| | | | | | | | | | | compile option. All FreeBSD packet filters now use the PFIL_HOOKS API and thus it becomes a standard part of the network stack. If no hooks are connected the entire packet filter hooks section and related activities are jumped over. This removes any performance impact if no hooks are active. Both OpenBSD and DragonFlyBSD have integrated PFIL_HOOKS permanently as well.
* Allow early drop for non-ALTQ enabled queues in an ALTQ-enabled kernel.mlaier2004-08-221-13/+14
| | | | | | | | | | Previously the early drop was disabled unconditionally for ALTQ-enabled kernels. This should give some benefit for the normal gateway + LAN-server case with a busy LAN leg and an ALTQ managed uplink. Reviewed and style help from: cperciva, pjd
* Make the kernel compile again if you are not using PFIL_HOOKSpeter2004-08-181-0/+4
|
* Convert ipfw to use PFIL_HOOKS. This is change is transparent to userlandandre2004-08-171-282/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | and preserves the ipfw ABI. The ipfw core packet inspection and filtering functions have not been changed, only how ipfw is invoked is different. However there are many changes how ipfw is and its add-on's are handled: In general ipfw is now called through the PFIL_HOOKS and most associated magic, that was in ip_input() or ip_output() previously, is now done in ipfw_check_[in|out]() in the ipfw PFIL handler. IPDIVERT is entirely handled within the ipfw PFIL handlers. A packet to be diverted is checked if it is fragmented, if yes, ip_reass() gets in for reassembly. If not, or all fragments arrived and the packet is complete, divert_packet is called directly. For 'tee' no reassembly attempt is made and a copy of the packet is sent to the divert socket unmodified. The original packet continues its way through ip_input/output(). ipfw 'forward' is done via m_tag's. The ipfw PFIL handlers tag the packet with the new destination sockaddr_in. A check if the new destination is a local IP address is made and the m_flags are set appropriately. ip_input() and ip_output() have some more work to do here. For ip_input() the m_flags are checked and a packet for us is directly sent to the 'ours' section for further processing. Destination changes on the input path are only tagged and the 'srcrt' flag to ip_forward() is set to disable destination checks and ICMP replies at this stage. The tag is going to be handled on output. ip_output() again checks for m_flags and the 'ours' tag. If found, the packet will be dropped back to the IP netisr where it is going to be picked up by ip_input() again and the directly sent to the 'ours' section. When only the destination changes, the route's 'dst' is overwritten with the new destination from the forward m_tag. Then it jumps back at the route lookup again and skips the firewall check because it has been marked with M_SKIP_FIREWALL. ipfw 'forward' has to be compiled into the kernel with 'option IPFIREWALL_FORWARD' to enable it. DUMMYNET is entirely handled within the ipfw PFIL handlers. A packet for a dummynet pipe or queue is directly sent to dummynet_io(). Dummynet will then inject it back into ip_input/ip_output() after it has served its time. Dummynet packets are tagged and will continue from the next rule when they hit the ipfw PFIL handlers again after re-injection. BRIDGING and IPFW_ETHER are not changed yet and use ipfw_chk() directly as they did before. Later this will be changed to dedicated ETHER PFIL_HOOKS. More detailed changes to the code: conf/files Add netinet/ip_fw_pfil.c. conf/options Add IPFIREWALL_FORWARD option. modules/ipfw/Makefile Add ip_fw_pfil.c. net/bridge.c Disable PFIL_HOOKS if ipfw for bridging is active. Bridging ipfw is still directly invoked to handle layer2 headers and packets would get a double ipfw when run through PFIL_HOOKS as well. netinet/ip_divert.c Removed divert_clone() function. It is no longer used. netinet/ip_dummynet.[ch] Neither the route 'ro' nor the destination 'dst' need to be stored while in dummynet transit. Structure members and associated macros are removed. netinet/ip_fastfwd.c Removed all direct ipfw handling code and replace it with the new 'ipfw forward' handling code. netinet/ip_fw.h Removed 'ro' and 'dst' from struct ip_fw_args. netinet/ip_fw2.c (Re)moved some global variables and the module handling. netinet/ip_fw_pfil.c New file containing the ipfw PFIL handlers and module initialization. netinet/ip_input.c Removed all direct ipfw handling code and replace it with the new 'ipfw forward' handling code. ip_forward() does not longer require the 'next_hop' struct sockaddr_in argument. Disable early checks if 'srcrt' is set. netinet/ip_output.c Removed all direct ipfw handling code and replace it with the new 'ipfw forward' handling code. netinet/ip_var.h Add ip_reass() as general function. (Used from ipfw PFIL handlers for IPDIVERT.) netinet/raw_ip.c Directly check if ipfw and dummynet control pointers are active. netinet/tcp_input.c Rework the 'ipfw forward' to local code to work with the new way of forward tags. netinet/tcp_sack.c Remove include 'opt_ipfw.h' which is not needed here. sys/mbuf.h Remove m_claim_next() macro which was exclusively for ipfw 'forward' and is no longer needed. Approved by: re (scottl)
* Get rid of the RANDOM_IP_ID option and make it a sysctl. NetBSDdwmalone2004-08-141-6/+1
| | | | | | | | | | | | | | | | | | | | | have already done this, so I have styled the patch on their work: 1) introduce a ip_newid() static inline function that checks the sysctl and then decides if it should return a sequential or random IP ID. 2) named the sysctl net.inet.ip.random_id 3) IPv6 flow IDs and fragment IDs are now always random. Flow IDs and frag IDs are significantly less common in the IPv6 world (ie. rarely generated per-packet), so there should be smaller performance concerns. The sysctl defaults to 0 (sequential IP IDs). Reviewed by: andre, silby, mlaier, ume Based on: NetBSD MFC after: 2 months
* Consistently use NULL for pointer comparisons.andre2004-08-111-10/+10
|
* Make a comment that "ipfw forward" is not SMP and PREEMPTION safe.andre2004-08-091-0/+1
|
* o Delayed checksums are now calculated in divert_packet() for diverted packetsandre2004-08-031-10/+0
| | | | Remove the XXX-escaped code that did it in ip_output()'s IPHACK section.
* In ip_ctloutput(), acquire the inpcb lock around some of the basicrwatson2004-06-241-5/+10
| | | | inpcb flag and status updates.
* Link ALTQ to the build and break with ABI for struct ifnet. Please recompilemlaier2004-06-131-0/+7
| | | | | | | | | | | | your (network) modules as well as any userland that might make sense of sizeof(struct ifnet). This does not change the queueing yet. These changes will follow in a seperate commit. Same with the driver changes, which need case by case evaluation. __FreeBSD_version bump will follow. Tested-by: (i386)LINT
* o Calculate a number of bytes to copy (cnt) correctly:maxim2004-05-111-1/+1
| | | | | | | | | | | | | | | | | | +----+-+-+-+-+----+----+- - - - - - - - - - - - -+----+ | | |C| | | | | | | | IP |N|O|L|P| | IP | | IP | | #1 |O|D|E|T| | #2 | | #n | | |P|E|N|R| | | | | +----+-+-+-+-+----+----+- - - - - - - - - - - - -+----+ ^ ^<---- cnt - (IPOPT_MINOFF - 1) ---->| | | src | +-- cp[IPOPT_OFF + 1] + sizeof(struct in_addr) | dst +-- cp[IPOPT_OFF + 1] PR: kern/66386 Submitted by: Andrei Iltchenko MFC after: 3 weeks
* Rename m_claim_next_hop() to m_claim_next(), as suggested by Max Laier.darrenr2004-05-021-1/+1
|
* Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra argdarrenr2004-05-021-1/+1
| | | | | (the type of tag to claim) and push it out of ip_var.h into mbuf.h alongside all of the other macros that work ok mbuf's and tag's.
* In an effort to simplify the routing code, try to deprecate rtalloc()luigi2004-04-141-1/+1
| | | | | | | | in favour of rtalloc_ign(), which is what would end up being called anyways. There are 25 more instances of rtalloc() in net*/ and about 10 instances of rtalloc_ign()
* Remove advertising clause from University of California Regent'simp2004-04-071-4/+0
| | | | | | | license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson
* Fixed a bug in previous revision: compute the payload checksum beforeru2004-04-071-8/+8
| | | | | | | | | | | | | | we convert ip_len into a network byte order; in_delayed_cksum() still expects it in host byte order. The symtom was the ``in_cksum_skip: out of data by %d'' complaints from the kernel. To add to the previous commit log. These fixes make tcpdump(1) happy by not complaining about UDP/TCP checksum being bad for looped back IP multicast when multicast router is deactivated. Reported by: Vsevolod Lobko
* Untangle IP multicast routing interaction with delayed payload checksums.ru2004-03-251-13/+3
| | | | | | | | | | Compute the payload checksum for a locally originated IP multicast where God intended, in ip_mloopback(), rather than doing it in ip_output() and only when multicast router is active. This is more correct as we do not fool ip_input() that the packet has the correct payload checksum when in fact it does not (when multicast router is inactive). This is also more efficient if we don't join the multicast group we send to, thus allowing the hardware to checksum the payload.
* Two minor follow-ups on the MT_TAG removal:mlaier2004-03-021-3/+2
| | | | | | | | ifp is now passed explicitly to ether_demux; no need to look it up again. Make mtag a global var in ip_input. Noticed by: rwatson Approved by: bms(mentor)
* Re-remove MT_TAGs. The problems with dummynet have been fixed now.mlaier2004-02-251-51/+48
| | | | | Tested by: -current, bms(mentor), me Approved by: bms(mentor), sam
* Backout MT_TAG removal (i.e. bring back MT_TAGs) for now, as dummynet ismlaier2004-02-181-60/+53
| | | | | | not working properly with the patch in place. Approved by: bms(mentor)
* don't update outgoing ifp, if ipsec tunnel mode encapsulationume2004-02-161-3/+5
| | | | | | was not made. Obtained from: KAME
* This set of changes eliminates the use of MT_TAG "pseudo mbufs", replacingmlaier2004-02-131-53/+60
| | | | | | | | | | | them mostly with packet tags (one case is handled by using an mbuf flag since the linkage between "caller" and "callee" is direct and there's no need to incur the overhead of a packet tag). This is (mostly) work from: sam Silence from: -arch Approved by: bms(mentor), sam, rwatson
* Initial import of RFC 2385 (TCP-MD5) digest support.bms2004-02-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the first of two commits; bringing in the kernel support first. This can be enabled by compiling a kernel with options TCP_SIGNATURE and FAST_IPSEC. For the uninitiated, this is a TCP option which provides for a means of authenticating TCP sessions which came into being before IPSEC. It is still relevant today, however, as it is used by many commercial router vendors, particularly with BGP, and as such has become a requirement for interconnect at many major Internet points of presence. Several parts of the TCP and IP headers, including the segment payload, are digested with MD5, including a shared secret. The PF_KEY interface is used to manage the secrets using security associations in the SADB. There is a limitation here in that as there is no way to map a TCP flow per-port back to an SPI without polluting tcpcb or using the SPD; the code to do the latter is unstable at this time. Therefore this code only supports per-host keying granularity. Whilst FAST_IPSEC is mutually exclusive with KAME IPSEC (and thus IPv6), TCP_SIGNATURE applies only to IPv4. For the vast majority of prospective users of this feature, this will not pose any problem. This implementation is output-only; that is, the option is honoured when responding to a host initiating a TCP session, but no effort is made [yet] to authenticate inbound traffic. This is, however, sufficient to interwork with Cisco equipment. Tested with a Cisco 2501 running IOS 12.0(27), and Quagga 0.96.4 with local patches. Patches for tcpdump to validate TCP-MD5 sessions are also available from me upon request. Sponsored by: sentex.net
* pass pcb rather than so. it is expected that per socket policyume2004-02-031-8/+2
| | | | works again.
* Do not set the ip_id to zero when DF is set on packet andandre2004-01-081-12/+6
| | | | | | | | | | | | | | restore the general pre-randomid behaviour. Setting the ip_id to zero causes several problems with packet reassembly when a device along the path removes the DF bit for some reason. Other BSD and Linux have found and fixed the same issues. PR: kern/60889 Tested by: Richard Wendland <richard@wendland.org.uk> Approved by: re (scottl)
* Introduce tcp_hostcache and remove the tcp specific metrics fromandre2003-11-201-16/+9
| | | | | | | | | | | | | | | | | | | | | | | the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache. It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve. tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address. It removes significant locking requirements from the tcp stack with regard to the routing table. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)
* Remove RTF_PRCLONING from routing table and adjust users of itandre2003-11-201-2/+2
| | | | | | | | | | | | accordingly. The define is left intact for ABI compatibility with userland. This is a pre-step for the introduction of tcp_hostcache. The network stack remains fully useable with this change. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)
* Remove the global one-level rtcache variable and associatedandre2003-11-141-17/+8
| | | | | | | | complex locking and rework ip_rtaddr() to do its own rtlookup. Adopt all its callers to this and make ip_output() callable with NULL rt pointer. Reviewed by: sam (mentor)
* Introduce ip_fastforward and remove ip_flow.andre2003-11-141-0/+1
| | | | | | | | | | | | | | | Short description of ip_fastforward: o adds full direct process-to-completion IPv4 forwarding code o handles ip fragmentation incl. hw support (ip_flow did not) o sends icmp needfrag to source if DF is set (ip_flow did not) o supports ipfw and ipfilter (ip_flow did not) o supports divert, ipfw fwd and ipfilter nat (ip_flow did not) o returns anything it can't handle back to normal ip_input Enable with sysctl -w net.inet.ip.fastforwarding=1 Reviewed by: sam (mentor)
* Do not fragment a packet with hardware assistance if it has the DFandre2003-11-121-1/+2
| | | | | | bit set. Reviewed by: sam (mentor)
* assert optional inpcb is passed in lockedsam2003-11-081-0/+2
| | | | Supported by: FreeBSD Foundation
* - cleanup SP refcnt issue.ume2003-11-041-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - share policy-on-socket for listening socket. - don't copy policy-on-socket at all. secpolicy no longer contain spidx, which saves a lot of memory. - deep-copy pcb policy if it is an ipsec policy. assign ID field to all SPD entries. make it possible for racoon to grab SPD entry on pcb. - fixed the order of searching SA table for packets. - fixed to get a security association header. a mode is always needed to compare them. - fixed that the incorrect time was set to sadb_comb_{hard|soft}_usetime. - disallow port spec for tunnel mode policy (as we don't reassemble). - an user can define a policy-id. - clear enc/auth key before freeing. - fixed that the kernel crashed when key_spdacquire() was called because key_spdacquire() had been implemented imcopletely. - preparation for 64bit sequence number. - maintain ordered list of SA, based on SA id. - cleanup secasvar management; refcnt is key.c responsibility; alloc/free is keydb.c responsibility. - cleanup, avoid double-loop. - use hash for spi-based lookup. - mark persistent SP "persistent". XXX in theory refcnt should do the right thing, however, we have "spdflush" which would touch all SPs. another solution would be to de-register persistent SPs from sptree. - u_short -> u_int16_t - reduce kernel stack usage by auto variable secasindex. - clarify function name confusion. ipsec_*_policy -> ipsec_*_pcbpolicy. - avoid variable name confusion. (struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct secpolicy *) - count number of ipsec encapsulations on ipsec4_output, so that we can tell ip_output() how to handle the packet further. - When the value of the ul_proto is ICMP or ICMPV6, the port field in "src" of the spidx specifies ICMP type, and the port field in "dst" of the spidx specifies ICMP code. - avoid from applying IPsec transport mode to the packets when the kernel forwards the packets. Tested by: nork Obtained from: KAME
* Note that when ip_output() is called from ip_forward(), it will alreadyrwatson2003-11-031-0/+2
| | | | | have its options inserted, so the opt argument to ip_output() must be NULL.
* Locking for updates to routing table entries. Each rtentry gets a mutexsam2003-10-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | that covers updates to the contents. Note this is separate from holding a reference and/or locking the routing table itself. Other/related changes: o rtredirect loses the final parameter by which an rtentry reference may be returned; this was never used and added unwarranted complexity for locking. o minor style cleanups to routing code (e.g. ansi-fy function decls) o remove the logic to bump the refcnt on the parent of cloned routes, we assume the parent will remain as long as the clone; doing this avoids a circularity in locking during delete o convert some timeouts to MPSAFE callouts Notes: 1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level applications cannot/do-no know about mutex's. Doing this requires that the mutex be the last element in the structure. A better solution is to introduce an externalized version of struct rtentry but this is a major task because of the intertwining of rtentry and other data structures that are visible to user applications. 2. There are known LOR's that are expected to go away with forthcoming work to eliminate many held references. If not these will be resolved prior to release. 3. ATM changes are untested. Sponsored by: FreeBSD Foundation Obtained from: BSD/OS (partly)
* o update PFIL_HOOKS support to current API used by netbsdsam2003-09-231-19/+8
| | | | | | | | | | | o revamp IPv4+IPv6+bridge usage to match API changes o remove pfil_head instances from protosw entries (no longer used) o add locking o bump FreeBSD version for 3rd party modules Heavy lifting by: "Max Laier" <max@love2party.net> Supported by: FreeBSD Foundation Obtained from: NetBSD (bits of pfil.h and pfil.c)
* Implement MBUF_STRESS_TEST mark II.silby2003-09-011-18/+2
| | | | | | | | | | | | | | | | | | | Changes from the original implementation: - Fragmentation is handled by the function m_fragment, which can be called from whereever fragmentation is needed. Note that this function is wrapped in #ifdef MBUF_STRESS_TEST to discourage non-testing use. - m_fragment works slightly differently from the old fragmentation code in that it allocates a seperate mbuf cluster for each fragment. This defeats dma_map_load_mbuf/buffer's feature of coalescing adjacent fragments. While that is a nice feature in practice, it nerfed the usefulness of mbuf_stress_test. - Add two modes of random fragmentation. Chains with fragments all of the same random length and chains with fragments that are each uniquely random in length may now be requested.
* Add the IP_ONESBCAST option, to enable undirected IP broadcasts to be sent onbms2003-08-201-0/+12
| | | | | | | | | | specific interfaces. This is required by aodvd, and may in future help us in getting rid of the requirement for BPF from our import of isc-dhcp. Suggested by: fenestro Obtained from: BSD/OS Reviewed by: mini, sam Approved by: jake (mentor)
OpenPOWER on IntegriCloud