summaryrefslogtreecommitdiffstats
path: root/sys/netinet6/ip6_output.c
Commit message (Collapse)AuthorAgeFilesLines
* pf: Fix possible incorrect IPv6 fragmentationkp2017-04-201-0/+2
| | | | | | | | | | | | | | | | | | | When forwarding pf tracks the size of the largest fragment in a fragmented packet, and refragments based on this size. It failed to ensure that this size was a multiple of 8 (as is required for all but the last fragment), so it could end up generating incorrect fragments. For example, if we received an 8 byte and 12 byte fragment pf would emit a first fragment with 12 bytes of payload and the final fragment would claim to be at offset 8 (not 12). We now assert that the fragment size is a multiple of 8 in ip6_fragment(), so other users won't make the same mistake. Reported by: Antonios Atlasis <aatlasis at secfu net> MFC after: 3 days (cherry picked from commit 4f3397263b95a45dd58e2be3a566029f8841cace)
* MFC r275710:Luiz Otavio O Souza2015-10-201-1/+2
| | | | | | | | | | | | | | | | Remove flag/flags argument from the following functions: ipsec_getpolicybyaddr() ipsec4_checkpolicy() ip_ipsec_output() ip6_ipsec_output() The only flag used here was IP_FORWARDING. Obtained from: Yandex LLC Sponsored by: Yandex LLC TAG: IPSEC-HEAD Issue: #4841
* Revert IPSEC patches.Luiz Otavio O Souza2015-10-201-54/+47
| | | | | | | | | Revert "Importing pfSense patch IPSEC_sysctl.RELENG_10.diff" This reverts commit 1a5bcc816de96758225aa0a4d2b5ddc7b88b6b58. TAG: IPSEC-HEAD Issue: #4841
* Importing pfSense patch IPSEC_sysctl.RELENG_10.diffRenato Botelho2015-08-171-47/+54
|
* Merge r280955kp2015-06-181-3/+4
| | | | | | | | | | | | | | | Preserve IPv6 fragment IDs accross reassembly and refragmentation When forwarding fragmented IPv6 packets and filtering with PF we reassemble and refragment. That means we generate new fragment headers and a new fragment ID. We already save the fragment IDs so we can do the reassembly so it's straightforward to apply the incoming fragment ID on the refragmented packets. Differential Revision: https://reviews.freebsd.org/D2817 Reviewed by: gnn
* Merge r278842kp2015-06-181-48/+63
| | | | | | | Factor out ip6_fragment() function, to be used in IPv6 stack and pf(4). Differential Revision: https://reviews.freebsd.org/D2815 Reviewed by: gnn
* MFC r282578:ae2015-05-141-9/+5
| | | | | | | | Mark data checksum as valid for multicast packets, that we send back to myself via simloop. Also remove duplicate check under #ifdef DIAGNOSTIC. PR: 180065
* MFC r279588:ae2015-03-121-12/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Fix deadlock in IPv6 PCB code. When several threads are trying to send datagram to the same destination, but fragmentation is disabled and datagram size exceeds link MTU, ip6_output() calls pfctlinput2(PRC_MSGSIZE). It does notify all sockets wanted to know MTU to this destination. And since all threads hold PCB lock while sending, taking the lock for each PCB in the in6_pcbnotify() leads to deadlock. RFC 3542 p.11.3 suggests notify all application wanted to receive IPV6_PATHMTU ancillary data for each ICMPv6 packet too big message. But it doesn't require this, when we don't receive ICMPv6 message. Change ip6_notify_pmtu() function to be able use it directly from ip6_output() to notify only one socket, and to notify all sockets when ICMPv6 packet too big message received. MFC r279684: tcp6_ctlinput() doesn't pass MTU value to in6_pcbnotify(). Check cmdarg isn't NULL before dereference, this check was in the ip6_notify_pmtu() before r279588. PR: 197059 Sponsored by: Yandex LLC
* MFC r266800 by vanhu:ae2014-11-051-171/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | IPv4-in-IPv6 and IPv6-in-IPv4 IPsec tunnels. For IPv6-in-IPv4, you may need to do the following command on the tunnel interface if it is configured as IPv4 only: ifconfig <interface> inet6 -ifdisabled Code logic inspired from NetBSD. PR: kern/169438 MC r266822 by bz: Use IPv4 statistics in ipsec4_process_packet() rather than the IPv6 version. This also unbreaks the NOINET6 builds after r266800. MFC r268083 by zec: The assumption in ipsec4_process_packet() that the payload may be only IPv4 is wrong, so check the IP version before mangling the payload header. MFC r272394: Do not strip outer header when operating in transport mode. Instead requeue mbuf back to IPv4 protocol handler. If there is one extra IP-IP encapsulation, it will be handled with tunneling interface. And thus proper interface will be exposed into mbuf's rcvif. Also, tcpdump that listens on tunneling interface will see packets in both directions. PR: 194761
* Merge r262763, r262767, r262771, r262806 from head:glebius2014-03-211-6/+6
| | | | | | | | | | - Remove rt_metrics_lite and simply put its members into rtentry. - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This removes another cache trashing ++ from packet forwarding path. - Create zini/fini methods for the rtentry UMA zone. Via initialize mutex and counter in them. - Fix reporting of rmx_pksent to routing socket. - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.
* Merge r261582, r261601, r261610, r261613, r261627, r261640, r261641, r261823,glebius2014-03-041-13/+2
| | | | | | | | | | r261825, r261859, r261875, r261883, r261911, r262027, r262028, r262029, r262030, r262162 from head. Large flowtable revamp. See commit messages for merged revisions for details. Sponsored by: Netflix
* Restructure the mbuf pkthdr to make it fit for upcoming capabilities andandre2013-08-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | features. The changes in particular are: o Remove rarely used "header" pointer and replace it with a 64bit protocol/ layer specific union PH_loc for local use. Protocols can flexibly overlay their own 8 to 64 bit fields to store information while the packet is worked on. o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc instead of pkthdr.header. o Extend csum_flags to 64bits to allow for additional future offload information to be carried (e.g. iSCSI, IPsec offload, and others). o Move the RSS hash type enumerator from abusing m_flags to its own 8bit rsstype field. Adjust accessor macros. o Add cosqos field to store Class of Service / Quality of Service information with the packet. It is not yet supported in any drivers but allows us to get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with a modernized ALTQ. o Add four 8 bit fields l[2-5]hlen to store the relative header offsets from the start of the packet. This is important for various offload capabilities and to relieve the drivers from having to parse the packet and protocol headers to find out location of checksums and other information. Header parsing in drivers is a lot of copy-paste and unhandled corner cases which we want to avoid. o Add another flexible 64bit union to map various additional persistent packet information, like ether_vtag, tso_segsz and csum fields. Depending on the csum_flags settings some fields may have different usage making it very flexible and adaptable to future capabilities. o Restructure the CSUM flags to better signify their outbound (down the stack) and inbound (up the stack) use. The CSUM flags used to be a bit chaotic and rather poorly documented leading to incorrect use in many places. Bring clarity into their use through better naming. Compatibility mappings are provided to preserve the API. The drivers can be corrected one by one and MFC'd without issue. o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures). Sponsored by: The FreeBSD Foundation
* In r227207, to fix the issue with possible NULL inp_socket pointertrociny2013-07-041-7/+4
| | | | | | | | | | | | | | | | | | | | | dereferencing, when checking for SO_REUSEPORT option (and SO_REUSEADDR for multicast), INP_REUSEPORT flag was introduced to cache the socket option. It was decided then that one flag would be enough to cache both SO_REUSEPORT and SO_REUSEADDR: when processing SO_REUSEADDR setsockopt(2), it was checked if it was called for a multicast address and INP_REUSEPORT was set accordingly. Unfortunately that approach does not work when setsockopt(2) is called before binding to a multicast address: the multicast check fails and INP_REUSEPORT is not set. Fix this by adding INP_REUSEADDR flag to unconditionally cache SO_REUSEADDR. PR: 179901 Submitted by: Michael Gmelin freebsd grem.de (initial version) Reviewed by: rwatson MFC after: 1 week
* Finally change the mbuf to have its own fib field instead of stealingjulian2013-05-161-1/+2
| | | | | | 4 flag bits. This was supposed to happen in 8.0, and again in 2012.. MFC after: never
* Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats.ae2013-04-091-11/+11
| | | | MFC after: 1 week
* - Use m_getcl() instead of hand allocating.glebius2013-03-151-24/+12
| | | | | | | | | - Do not calculate constant length values at run time, CTASSERT() their sanity. - Remove superfluous cleaning of mbuf fields after allocation. - Replace compat macros with function calls. Sponsored by: Nginx, Inc.
* - Use m_getcl() instead of hand allocating.glebius2013-03-151-7/+4
| | | | | | | - Use m_get()/m_gethdr() instead of macros. - Remove superfluous cleaning of mbuf fields after allocation. Sponsored by: Nginx, Inc.
* When we have some address to forward (e.g. it was specified with ipfw fwd),ae2012-12-191-7/+9
| | | | | | | we should pass it as first argument into in6_selectroute_fib function to initiate new route lookup. MFC after: 1 week
* Make dst_sa initialization only when it is actually needed.ae2012-12-191-9/+12
| | | | MFC after: 1 week
* The selectroute functions does own account of EHOSTUNREACH errors,ae2012-12-191-8/+0
| | | | | | no need to do it twice. MFC after: 1 week
* Mechanically substitute flags from historic mbuf allocator withglebius2012-12-051-9/+9
| | | | | | | | | malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually
* Remove the recently added sysctl variable net.pfil.forward.ae2012-11-021-4/+3
| | | | | | | | | Instead, add protocol specific mbuf flags M_IP_NEXTHOP and M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup only when this flag is set. Suggested by: andre
* Remove the IPFIREWALL_FORWARD kernel option and make possible to turnae2012-10-251-4/+2
| | | | | | | | | on the related functionality in the runtime via the sysctl variable net.pfil.forward. It is turned off by default. Sponsored by: Yandex LLC Discussed with: net@ MFC after: 2 weeks
* Remove __P.delphij2012-10-221-10/+10
| | | | | | Submitted by: kevlo Reviewed by: md5(1) MFC after: 2 months
* In ip6_ctloutput() guard inp_flags modifications with INP_WLOCK.trociny2012-08-191-0/+6
| | | | MFC after: 2 weeks
* In case of IPsec he have to do delayed checksum calculations beforebz2012-07-311-0/+14
| | | | | | | | | adding any extension header, or rather before calling into IPsec processing as we may send the packet and not return to IPv6 output processing here. PR: kern/170116 MFC After: 3 days
* Improve the should-never-hit printf to ease debugging in case we'd ever hitbz2012-07-311-2/+3
| | | | | | it again when doing the delayed IPv6 checksum calculations. MFC after: 3 days
* For consistency put the IPsec comment iside the #fidef section.bz2012-07-291-1/+1
| | | | MFC after: 3 days
* When ip_output()/ip6_output() is supplied a struct route *ro argument,glebius2012-07-041-14/+11
| | | | | | | | | | | | | | | | | | | | | | | | | it skips FLOWTABLE lookup. However, the non-NULL ro has dual meaning here: it may be supplied to provide route, and it may be supplied to store and return to caller the route that ip_output()/ip6_output() finds. In the latter case skipping FLOWTABLE lookup is pessimisation. The difference between struct route filled by FLOWTABLE and filled by rtalloc() family is that the former doesn't hold a reference on its rtentry. Reference is hold by flow entry, and it is about to be released in future. Thus, route filled by FLOWTABLE shouldn't be passed to RTFREE() macro. - Introduce new flag for struct route/route_in6, that marks route not holding a reference on rtentry. - Introduce new macro RO_RTFREE() that cleans up a struct route depending on its kind. - All callers to ip_output()/ip6_output() that do supply non-NULL but empty route should use RO_RTFREE() to free results of lookup. - ip_output()/ip6_output() now do FLOWTABLE lookup always when ro->ro_rt == NULL. Tested by: tuexen (SCTP part)
* Seperate SCTP checksum offloading for IPv4 and IPv6.tuexen2012-05-301-11/+11
| | | | | | | While there: remove some trainling whitespaces. MFC after: 3 days X-MFC with: 236170
* It turns out that too many drivers are not only parsing the L2/3/4bz2012-05-281-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | headers for TSO but also for generic checksum offloading. Ideally we would only have one common function shared amongst all drivers, and perhaps when updating them for IPv6 we should introduce that. Eventually we should provide the meta information along with mbufs to avoid (re-)parsing entirely. To not break IPv6 (checksums and offload) and to be able to MFC the changes without risking to hurt 3rd party drivers, duplicate the v4 framework, as other OSes have done as well. Introduce interface capability flags for TX/RX checksum offload with IPv6, to allow independent toggling (where possible). Add CSUM_*_IPV6 flags for UDP/TCP over IPv6, and reserve further for SCTP, and IPv6 fragmentation. Define CSUM_DELAY_DATA_IPV6 as we do for legacy IP and add an alias for CSUM_DATA_VALID_IPV6. This pretty much brings IPv6 handling in line with IPv4. TSO is still handled in a different way and not via if_hwassist. Update ifconfig to allow (un)setting of the new capability flags. Update loopback to announce the new capabilities and if_hwassist flags. Individual driver updates will have to follow, as will SCTP. Reported by: gallatin, dim, .. Reviewed by: gallatin (glanced at?) MFC after: 3 days X-MFC with: r235961,235959,235958
* Correctly get the payload length in host byte order. While webz2012-05-261-4/+4
| | | | | | | | | | already plan to support >64k payload here, the IPv6 header payload length obviously is only 16 bit and the calculations need to be right. Reported by: dim Tested by: dim MFC after: 1 day X-MFC: with r235958
* MFp4 bz_ipv6_fast:bz2012-05-251-10/+63
| | | | | | | | | | | | | | | | Add support for delayed checksum calculations in the IPv6 output path. We currently cannot offload to the card if we add extension headers (which incl. fragmentation). Fix two SCTP offload support copy&paste bugs: calculate checksums if fragmenting and no need to flag IPv4 header checksums in the IPv6 forwarding path. Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days
* In selectroute() add a missing fibnum argument to an in6_rtalloc()bz2012-02-241-2/+2
| | | | | | | | | | | | call in an #if 0 section. In in6_selecthlim() optimize a case where in6p cannot be NULL due to an earlier check. More consistently use u_int instead of int for fibnum function arguments. Sponsored by: Cisco Systems, Inc. MFC after: 3 days
* Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:bz2012-02-171-8/+18
| | | | | | | | | | | | Extend the so far IPv4-only support for multiple routing tables (FIBs) introduced in r178888 to IPv6 providing feature parity. This includes an extended rtalloc(9) KPI for IPv6, the necessary adjustments to the network stack, and user land support as in netstat. Sponsored by: Cisco Systems, Inc. Reviewed by: melifaro (basically) MFC after: 10 days
* Cache SO_REUSEPORT socket option in inpcb-layer in order to avoidtrociny2011-11-061-3/+32
| | | | | | | | | | | | | | inp_socket->so_options dereference when we may not acquire the lock on the inpcb. This fixes the crash due to NULL pointer dereference in in_pcbbind_setup() when inp_socket->so_options in a pcb returned by in_pcblookup_local() was checked. Reported by: dave jones <s.dave.jones@gmail.com>, Arnaud Lacombe <lacombar@gmail.com> Suggested by: rwatson Glanced by: rwatson Tested by: dave jones <s.dave.jones@gmail.com>
* Copy ip6po_minmtu and ip6po_prefer_tempaddr in ip6_copypktopts(). This fixeshrs2011-09-201-0/+2
| | | | | | | | inconsistency when options are specified by both setsockopt() and ancillary data types. PR: kern/158307 Approved by: re (bz)
* Add support for IPv6 to ipfw fwd:bz2011-08-201-2/+34
| | | | | | | | | | | | | | | | | | | Distinguish IPv4 and IPv6 addresses and optional port numbers in user space to set the option for the correct protocol family. Add support in the kernel for carrying the new IPv6 destination address and port. Add support to TCP and UDP for IPv6 and fix UDP IPv4 to not change the address in the IP header. Add support for IPv6 forwarding to a non-local destination. Add a regession test uitilizing VIMAGE to check all 20 possible combinations I could think of. Obtained from: David Dolson at Sandvine Incorporated (original version for ipfw fwd IPv6 support) Sponsored by: Sandvine Incorporated PR: bin/117214 MFC after: 4 weeks Approved by: re (kib)
* Fix more continuous/contiguous typos (cf. r215955)brucec2010-11-271-2/+2
|
* IP_BINDANY is not correctly handled in getsockopt() case.attilio2010-09-241-0/+1
| | | | | | | | | Fix it by specifying the correct bits. Sponsored by: Sandvine Incorporated Reviewed by: bz, emaste, rstone Obtained from: Sandvine Incorporated MFC after: 10 days
* try working around panic by validating rt and llekmacy2010-05-121-1/+2
| | | | MFC after: 3 days
* Add flowtable support to IPv6kmacy2010-05-091-6/+30
| | | | | | | Tested by: qingli@ Reviewed by: qingli@ MFC after: 3 days
* The proper fix for the delayed SCTP checksum is torrs2010-03-121-1/+1
| | | | | | | | | | have the delayed function take an argument as to the offset to the SCTP header. This allows it to work for V4 and V6. This of course means changing all callers of the function to either pass the header len, if they have it, or create it (ip_hl << 2 or sizeof(ip6_hdr)). PR: 144529 MFC after: 2 weeks
* With the recent change of the sctp checksum to support offload,rrs2010-03-121-0/+19
| | | | | | | | | | | no delayed checksum was added to the ip6 output code. This causes cards that do not support SCTP checksum offload to have SCTP packets that are IPv6 NOT have the sctp checksum performed. Thus you could not communicate with a peer. This adds the missing bits to make the checksum happen for these cards. PR: 144529 MFC after: 2 weeks
* Virtualize the pfil hooks so that different jails may chose differentjulian2009-10-111-2/+2
| | | | | | | | packet filters. ALso allows ipfw to be enabled on on ejail and disabled on another. In 8.0 it's a global setting. Sitting aroung in tree waiting to commit for: 2 months MFC after: 2 months
* This patch fixes the following issues:qingli2009-09-051-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | - Interface link-local address is not reachable within the node that owns the interface, this is due to the mismatch in address scope as the result of the installed interface address loopback route. Therefore for each interface address loopback route, the rt_gateway field (of AF_LINK type) will be used to track which interface a given address belongs to. This will aid the address source to use the proper interface for address scope/zone validation. - The loopback address is not reachable. The root cause is the same as the above. - Empty nd6 entries are created for the IPv6 loopback addresses only for validation reason. Doing so will eliminate as much of the special case (loopback addresses) handling code as possible, however, these empty nd6 entries should not be returned to the userland applications such as the "ndp" command. Since both of the above issues contain common files, these files are committed together. Reviewed by: bz MFC after: immediately
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c andrwatson2009-08-011-1/+0
| | | | | | | | | | vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)
* Build on Jeff Roberson's linker-set based dynamic per-CPU allocatorrwatson2009-07-141-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
* Modify most routines returning 'struct ifaddr *' to return referencesrwatson2009-06-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rather than pointers, requiring callers to properly dispose of those references. The following routines now return references: ifaddr_byindex ifa_ifwithaddr ifa_ifwithbroadaddr ifa_ifwithdstaddr ifa_ifwithnet ifaof_ifpforaddr ifa_ifwithroute ifa_ifwithroute_fib rt_getifa rt_getifa_fib IFP_TO_IA ip_rtaddr in6_ifawithifp in6ifa_ifpforlinklocal in6ifa_ifpwithaddr in6_ifadd carp_iamatch6 ip6_getdstifaddr Remove unused macro which didn't have required referencing: IFP_TO_IA6 This closes many small races in which changes to interface or address lists while an ifaddr was in use could lead to use of freed memory (etc). In a few cases, add missing if_addr_list locking required to safely acquire references. Because of a lack of deep copying support, we accept a race in which an in6_ifaddr pointed to by mbuf tags and extracted with ip6_getdstifaddr() doesn't hold a reference while in transmit. Once we have mbuf tag deep copy support, this can be fixed. Reviewed by: bz Obtained from: Apple, Inc. (portions) MFC after: 6 weeks (portions)
* After r193232 rt_tables in vnet.h are no longer indirectly dependent onbz2009-06-081-1/+0
| | | | | | | | | the ROUTETABLES kernel option thus there is no need to include opt_route.h anymore in all consumers of vnet.h and no longer depend on it for module builds. Remove the hidden include in flowtable.h as well and leave the two explicit #includes in ip_input.c and ip_output.c.
OpenPOWER on IntegriCloud