path: root/sys/netpfil/pf/pf.c
Commit message  Author  Age  Files  Lines
* MFC r312943  kp  2017-07-20  1  -0/+8
  Do not run the pf purge thread while the VNET variables are not initialized; this can cause a divide by zero (if the VNET initialization takes too long to complete).
  PR: 220830
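  The failure mode described in this commit can be illustrated with a minimal userspace sketch. It assumes, hypothetically, that the purge pass divides by a per-VNET interval that stays zero until initialization finishes. The names `purge_ctx`, `interval`, `hash_size`, and `purge_batch` are invented for illustration and are not taken from pf.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-VNET state; not the real pf structures. */
struct purge_ctx {
	bool	 initialized;	/* set once VNET setup has completed */
	unsigned interval;	/* purge interval, 0 until initialized */
	size_t	 hash_size;	/* number of hash rows to scan */
};

/*
 * Return how many hash rows to scan in one pass.  Without the
 * 'initialized' check, a zero interval would divide by zero.
 */
static size_t
purge_batch(const struct purge_ctx *ctx)
{
	if (!ctx->initialized || ctx->interval == 0)
		return (0);	/* skip this pass; nothing is set up yet */
	return (ctx->hash_size / ctx->interval);
}
```

  The guard simply skips the pass for a context that is not ready, which is one plausible shape for such a fix; the actual change in r312943 may differ.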
* MFC r316355  kp  2017-04-08  1  -1/+1
  pf: Fix leak of pf_state_keys
  If we hit the state limit we returned from pf_create_state() without cleaning up.
  PR: 217997
  Submitted by: Max <maximos@als.nnov.ru>
* MFC 315529  kp  2017-03-26  1  -0/+3
  pf: Fix rule evaluation after inet6 route-to
  In pf_route6() we re-run the ruleset with PF_FWD if the packet goes out of a different interface. pf_test6() needs to know that the packet was forwarded (in case it needs to refragment so it knows whether to call ip6_output() or ip6_forward()). This led pf_test6() to try to evaluate rules against the PF_FWD direction, which isn't supported, so it needs to treat PF_FWD as PF_OUT. Once fwdir is set correctly the correct output/forward function will be called.
  PR: 217883
  Submitted by: Kajetan Staszkiewicz
  Sponsored by: InnoGames GmbH
* MFC r304152:  kp  2016-08-19  1  -6/+6
  pf: Add missing byte-order swap to pf_match_addr_range
  Without this, rules using address ranges (e.g. "10.1.1.1 - 10.1.1.5") did not match addresses correctly on little-endian systems.
  PR: 211796
  Obtained from: OpenBSD (sthen)
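  Why the swap matters can be shown with a small userspace sketch. It assumes IPv4 addresses stored as 32-bit values in network byte order; the function name `addr_in_range` and its signature are hypothetical and do not reproduce the code in pf.c.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative range check for IPv4 addresses held in network byte order.
 * The function name and signature are hypothetical, not the pf.c ones.
 */
static bool
addr_in_range(uint32_t lo_net, uint32_t hi_net, uint32_t addr_net)
{
	/* Convert to host order first so that numeric comparison is valid. */
	uint32_t lo = ntohl(lo_net);
	uint32_t hi = ntohl(hi_net);
	uint32_t a = ntohl(addr_net);

	return (a >= lo && a <= hi);
}
```

  On a little-endian host, comparing the raw network-order values would read 10.1.2.3 as 0x0302010A, which falls between 0x0101010A (10.1.1.1) and 0x0501010A (10.1.1.5), so an address outside the range would wrongly match. Converting with ntohl() before comparing avoids that.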
* Update pf(4) and pflog(4) to survive basic VNET testing, which includes  bz  2016-06-23  1  -36/+41
  proper virtualisation, teardown, avoiding use-after-free, race conditions, no longer creating a thread per VNET (which could easily be a couple of thousand threads), gracefully ignoring global events (e.g., eventhandlers) on teardown, clearing various globally cached pointers and checking them before use.
  Reviewed by: kp
  Approved by: re (gjb)
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D6924
* pf: Filter on and set vlan PCP values  kp  2016-06-17  1  -1/+71
  Adopt the OpenBSD syntax for setting and filtering on VLAN PCP values. This introduces two new keywords: 'set prio' to set the PCP value, and 'prio' to filter on it.
  Reviewed by: allanjude, araujo
  Approved by: re (gjb)
  Obtained from: OpenBSD (mostly)
  Differential Revision: https://reviews.freebsd.org/D6786
* pf: Fix more ICMP mistranslation  kp  2016-05-23  1  -1/+1
  In the default case fix the substitution of the destination address.
  PR: 201519
  Submitted by: Max <maximos@als.nnov.ru>
  MFC after: 1 week
* pf: Fix ICMP translation  kp  2016-05-23  1  -10/+5
  Fix ICMP source address rewriting in rdr scenarios.
  PR: 201519
  Submitted by: Max <maximos@als.nnov.ru>
  MFC after: 1 week
* sys/net*: minor spelling fixes.  pfg  2016-05-03  1  -1/+1
  No functional change.
* pf: Improve forwarding detection  kp  2016-03-16  1  -4/+6
  When we guess the nature of the outbound packet (output vs. forwarding) we need to take bridges into account. When bridging the input interface does not match the output interface, but we're not forwarding. Similarly, it's possible for the interface to actually be the bridge interface itself (and not a member interface).
  PR: 202351
  MFC after: 2 weeks
* in pf_print_state_parts, do not use skw->proto to print the protocol but our  kp  2016-02-20  1  -1/+1
  local copy proto that we very carefully set beforehand. skw being NULL is perfectly valid there.
  Obtained from: OpenBSD (henning)
* Convert pf(4) to the new routing API.  melifaro  2016-01-07  1  -42/+89
  Differential Revision: https://reviews.freebsd.org/D4763
* Bring back the ability of passing cached route via nd6_output_ifp().  melifaro  2015-11-15  1  -1/+1
* pf: Fix broken rule skip calculation  kp  2015-11-07  1  -2/+2
  r289932 accidentally broke the rule skip calculation. The address family argument to PF_ANEQ() is now important, and because it was set to 0 the macro always evaluated to false. This resulted in incorrect skip values, which in turn broke the rule evaluations.
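  The effect of passing 0 as the address family can be sketched as follows. This is a simplified model of a family-aware "not equal" test, written as a function rather than a macro; `struct sketch_addr` and `addr_neq` are hypothetical names, and the real PF_ANEQ() definition may differ in detail.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Simplified address type; the real pf structure offers more views. */
struct sketch_addr {
	uint32_t addr32[4];
};

/*
 * Family-aware "addresses not equal" test, modeled on the behavior the
 * commit message describes.  An unknown family (such as 0) matches
 * neither branch, so the result is always false.
 */
static int
addr_neq(const struct sketch_addr *a, const struct sketch_addr *b, int af)
{
	if (af == AF_INET)
		return (a->addr32[0] != b->addr32[0]);
	if (af == AF_INET6)
		return (memcmp(a->addr32, b->addr32, sizeof(a->addr32)) != 0);
	return (0);
}
```

  With af set to 0, two different addresses compare as "not unequal", so a skip calculation built on this test would treat neighboring rules as sharing an address even when they do not.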
* pf: Fix IPv6 checksums with route-to.  kp  2015-10-29  1  -0/+7
  When using route-to (or reply-to) pf sends the packet directly to the output interface. If that interface doesn't support checksum offloading the checksum has to be calculated in software. That was already done in the IPv4 case, but not for the IPv6 case. As a result we'd emit packets with pseudo-header checksums (i.e. incorrect checksums).
  This issue was exposed by the changes in r289316 when pf stopped performing full checksum calculations for all packets.
  Submitted by: Luoqi Chen
  MFC after: 1 week
* Eliminate last rtalloc_ign() caller.  melifaro  2015-10-27  1  -3/+0
  Differential Revision: https://reviews.freebsd.org/D3927
* pf: Fix TSO issues  kp  2015-10-14  1  -31/+81
  In certain configurations (mostly but not exclusively as a VM on Xen) pf produced packets with an invalid TCP checksum.
  The problem was that pf could only handle packets with a full checksum. The FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only addresses, length and protocol). Certain network interfaces expect to see the pseudo-header checksum, so they end up producing packets with invalid checksums.
  To fix this stop calculating the full checksum and teach pf to only update TCP checksums if TSO is disabled or the change affects the pseudo-header checksum.
  PR: 154428, 193579, 198868
  Reviewed by: sbruno
  MFC after: 1 week
  Relnotes: yes
  Sponsored by: RootBSD
  Differential Revision: https://reviews.freebsd.org/D3779
* Simplify the way of attaching IPv6 link-layer header.  melifaro  2015-09-16  1  -1/+1
  Problem description:
  How do we currently perform layer 2 resolution and header imposition:
  For IPv4 we have the following chain:
    ip_output() -> (ether|atm|whatever)_output() -> arpresolve()
  Lookup is done in proper place (link-layer output routine) and it is possible to provide cached lle data.
  For IPv6 situation is more complex:
    ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_storelladdr()
  We have ip6_output() which calls nd6_output() instead of link output routine. nd6_output() does the following:
    * checks if lle exists, creates it if needed (similar to arpresolve())
    * performs lle state transitions (similar to arpresolve())
    * calls nd6_output_ifp() which pushes packets to link output routine along with running SeND/MAC hooks regardless of lle state (e.g. works as run-hooks placeholder).
  After that, iface output routine like ether_output() calls nd6_storelladdr() which performs lle lookup once again. As a result, we perform lookup twice for each outgoing packet for most types of interfaces. We also need to maintain runtime-checked table of 'nd6-free' interfaces (see nd6_need_cache()).
  Fix this behavior by eliminating first ND lookup. To be more specific:
    * make all nd6_output() consumers use nd6_output_ifp() instead
    * rename nd6_output[_slow]() to nd6_resolve_[slow]()
    * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics, e.g. copy L2 address to buffer instead of pushing packet towards lower layers
    * Make all nd6_storelladdr() users use nd6_resolve()
    * eliminate nd6_storelladdr()
  The resulting callchain is the following:
    ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve()
  Error handling:
  Currently sending packet to non-existing la results in ip6_<output|forward> -> nd6_output() -> nd6_output_lle() which returns 0. In new scenario packet is propagated to <ether|whatever>_output() -> nd6_resolve() which will return EWOULDBLOCK, and that result will be converted to 0. (And EWOULDBLOCK is actually used by IB/TOE code).
  Sponsored by: Yandex LLC
  Differential Revision: https://reviews.freebsd.org/D1469
* pf: Fix misdetection of forwarding when net.link.bridge.pfil_bridge is set  kp  2015-09-01  1  -1/+11
  If net.link.bridge.pfil_bridge is set we can end up thinking we're forwarding in pf_test6() because the rcvif and the ifp (output interface) are different. In that case we're bridging though, and the rcvif is the bridge member on which the packet was received and ifp is the bridge itself. If we'd set dir to PF_FWD we'd end up calling ip6_forward() which is incorrect.
  Instead check if the rcvif is a member of the ifp bridge. (In other words, the if_bridge is the ifp's softc). If that's the case we're not forwarding but bridging.
  PR: 202351
  Reviewed by: eri
  Differential Revision: https://reviews.freebsd.org/D3534
* Simplify logic added in r285945 as suggested by glebius  garga  2015-07-28  1  -4/+2
  Approved by: glebius
  MFC after: 3 days
  Sponsored by: Netgate
* Respect pf rule log option before log dropped packets with IP options or  garga  2015-07-28  1  -2/+4
  dangerous v6 headers
  Reviewed by: gnn, eri
  Approved by: gnn
  Obtained from: pfSense
  MFC after: 3 days
  Sponsored by: Netgate
  Differential Revision: https://reviews.freebsd.org/D3222
* Fix a typo in r280169. Of course we are interested in deleting nsn only  glebius  2015-07-28  1  -1/+1
  if we have just created it and we were the last reference.
  Submitted by: dhartmei
* ALTQ FAIRQ discipline import from DragonFLY  eri  2015-06-24  1  -0/+18
  Differential Revision: https://reviews.freebsd.org/D2847
  Reviewed by: glebius, wblock(manpage)
  Approved by: gnn(mentor)
  Obtained from: pfSense
  Sponsored by: Netgate
* Use MTX_SYSINIT() instead of mtx_init() to separate mutex initialization  glebius  2015-05-19  1  -9/+5
  from associated structures initialization. The mutexes are global, while the structures are per-vnet.
  Submitted by: Nikos Vassiliadis <nvass gmx.com>
* A miss from r283061: don't dereference NULL if pf_get_mtag() fails.  glebius  2015-05-18  1  -2/+4
  PR: 200222
  Submitted by: Franco Fichtner <franco opnsense.org>
* Don't dereference NULL if pf_get_mtag() fails.  glebius  2015-05-18  1  -12/+14
  PR: 200222
  Submitted by: Franco Fichtner <franco opnsense.org>
* pf: Fix forwarding detection  kp  2015-04-14  1  -1/+1
  If the direction is not PF_OUT we can never be forwarding. Some input packets have rcvif != ifp (looped back packets), which led us to ip6_forward() inbound packets, causing panics. Equally, we need to ensure that packets were really received and not locally generated before trying to ip6_forward() them.
  Differential Revision: https://reviews.freebsd.org/D2286
  Approved by: gnn(mentor)
* Always lock the hash row of a source node when updating its 'states' counter.  glebius  2015-03-17  1  -55/+56
  PR: 182401
  Sponsored by: Nginx, Inc.
* Reset mbuf pointer to NULL in fastroute case to indicate that mbuf was  ae  2015-03-12  1  -0/+1
  consumed by filter. This fixes several panics due to accessing to mbuf after free.
  Submitted by: Kristof Provost
  MFC after: 1 week
* In the forwarding case refragment the reassembled packets with the same  glebius  2015-02-16  1  -1/+11
  size as they arrived in. This allows the sender to determine the optimal fragment size by Path MTU Discovery. Roughly based on the OpenBSD work by Alexander Bluhm.
  Submitted by: Kristof Provost
  Differential Revision: D1767
* Update the pf fragment handling code to closer match recent OpenBSD.  glebius  2015-02-16  1  -0/+39
  That partially fixes IPv6 fragment handling. Thanks to Kristof for working on that.
  Submitted by: Kristof Provost
  Tested by: peter
  Differential Revision: D1765
* Back out r276841, r276756, r276747, r276746. The change in r276747 is very  glebius  2015-01-22  1  -30/+69
  very questionable, since it makes vimages more dependent on each other. But the reason for the backout is that it screwed up shutting down the pf purge threads, and now kernel immediately panics on pf module unload. Although module unloading isn't an advertised feature of pf, it is very important for development process.
  I'd like to not backout r276746, since in general it is good. But since it has introduced numerous build breakages, that later were addressed in r276841, r276756, r276747, I need to back it out as well. Better replay it in clean fashion from scratch.
* Reapply previous patch to fix build.  rodrigc  2015-01-06  1  -9/+6
  PR: 194515
* Instead of creating a purge thread for every vnet, create  rodrigc  2015-01-06  1  -58/+24
  a single purge thread and clean up all vnets from this thread.
  PR: 194515
  Differential Revision: D1315
  Submitted by: Nikos Vassiliadis <nvass@gmx.com>
* Merge: r258322 from projects/pf branch  rodrigc  2015-01-06  1  -2/+0
  Split functions that initialize various pf parts into their vimage parts and global parts. Since global parts appeared to be only mutex initializations, just abandon them and use MTX_SYSINIT() instead. Kill my incorrect VNET_FOREACH() iterator and instead use correct approach with VNET_SYSINIT().
  PR: 194515
  Differential Revision: D1309
  Submitted by: glebius, Nikos Vassiliadis <nvass@gmx.com>
  Reviewed by: trociny, zec, gnn
* Finish r274315: remove union 'u' from struct pf_send_entry.  melifaro  2014-11-09  1  -16/+11
  Suggested by: kib
* Remove unused 'struct route' fields.  melifaro  2014-11-09  1  -2/+0
* Fix multiple incorrect SYSCTL arguments in the kernel:  hselasky  2014-10-21  1  -2/+2
  - Wrong integer type was specified.
  - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes.
  - Logical OR where binary OR was expected.
  - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs.
  - Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function.
  - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement.
  - Updated "EXAMPLES" section in SYSCTL manual page.
  MFC after: 3 days
  Sponsored by: Mellanox Technologies
* Add a complete implementation of MurmurHash3. Tweak both implementations  des  2014-10-18  1  -7/+7
  so they match the established idiom. Document them in hash(9).
  MFC after: 1 month
  MFC with: r272906
* Change the PF hash from Jenkins to Murmur3. In forwarding tests  gnn  2014-10-10  1  -7/+7
  this showed a conservative 3% increase in PPS.
  Differential Revision: https://reviews.freebsd.org/D461
  Submitted by: des
  Reviewed by: emaste
  MFC after: 1 month
* Clean up unused CSUM_FRAGMENT.  glebius  2014-09-03  1  -2/+1
  Sponsored by: Nginx, Inc.
* Explicitly free packet on PF_DROP, otherwise a "quick" rule with  glebius  2014-09-01  1  -0/+8
  "route-to" may still forward it.
  PR: 177808
  Submitted by: Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de>
  Sponsored by: InnoGames GmbH
* Do not lookup source node twice when pf_map_addr() is used.  glebius  2014-08-15  1  -2/+0
  PR: 184003
  Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
  Sponsored by: InnoGames GmbH
* pf_map_addr() can fail and in this case we should drop the packet,  glebius  2014-08-15  1  -28/+13
  otherwise bad consequences including a routing loop can occur.
  Move pf_set_rt_ifp() earlier in state creation sequence and inline it, cutting some extra code.
  PR: 183997
  Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
  Sponsored by: InnoGames GmbH
* Fix synproxy with IPv6. pf_test6() was missing a check for M_SKIP_FIREWALL.  glebius  2014-08-15  1  -0/+3
  PR: 127920
  Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
  Sponsored by: InnoGames GmbH
* - Count global pf(4) statistics in counter(9).  glebius  2014-08-14  1  -18/+21
  - Do not count global number of states and of src_nodes, use uma_zone_get_cur() to obtain values.
  - Struct pf_status becomes merely an ioctl API structure, and moves to netpfil/pf/pf.h with its constants.
  - V_pf_status is now of type struct pf_kstatus.
  Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
  Sponsored by: InnoGames GmbH
* Pull in r267961 and r267973 again. Fix for issues reported will follow.  hselasky  2014-06-28  1  -2/+0
* Revert r267961, r267973:  gjb  2014-06-27  1  -0/+2
  These changes prevent sysctl(8) from returning proper output, such as:
    1) no output from sysctl(8)
    2) erroneously returning ENOMEM with tools like truss(1) or uname(1)
       truss: can not get etype: Cannot allocate memory
* Extend the meaning of the CTLFLAG_TUN flag to automatically check if  hselasky  2014-06-27  1  -2/+0
  there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel.
  Other changes:
  - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask".
  - Removed redundant TUNABLE statements throughout the kernel.
  - Some minor code rewrites in connection to removing not needed TUNABLE statements.
  - Added a missing SYSCTL_DECL().
  - Wrapped two very long lines.
  - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered.
  - Bumped FreeBSD version to indicate SYSCTL API change.
  MFC after: 2 weeks
  Sponsored by: Mellanox Technologies
* Fix pf(4) to build with MAXCPU set to 256. MAXCPU is actually a count,  jhb  2014-05-29  1  -1/+1
  not a maximum ID value (so it is a cap on mp_ncpus, not mp_maxid).