From fdfdadb612a4b077d62094c7d4aa65de3524cf62 Mon Sep 17 00:00:00 2001
From: rwatson
Date: Mon, 30 May 2011 09:43:55 +0000
Subject: Decompose the current single inpcbinfo lock into two locks:

- The existing ipi_lock continues to protect the global inpcb list and
  inpcb counter. This lock is now relegated to a small number of
  allocation and free operations, and occasional operations that walk
  all connections (including, awkwardly, certain UDP multicast receive
  operations -- something to revisit).

- A new ipi_hash_lock protects the two inpcbinfo hash tables for
  looking up connections and bound sockets, manipulated using new
  INP_HASH_*() macros. This lock, combined with inpcb locks, protects
  the 4-tuple address space.

Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.

A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:

  INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
  INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb

Callers must pass exactly one of these flags (for the time being).

Some notes:

- All protocols are updated to work within the new regime; especially,
  TCP, UDPv4, and UDPv6.
  pcbinfo ipi_lock acquisitions are largely eliminated, and global hash
  lock hold times are dramatically reduced compared to previous locking.

- The TCP syncache still relies on the pcbinfo lock, something that we
  may want to revisit.

- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
  is no longer available -- hash lookup locks are now held only very
  briefly during inpcb lookup, rather than for potentially extended
  periods. However, the pcbinfo ipi_lock will still be acquired if a
  connection state might change such that a connection is added or
  removed.

- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
  due to maintaining their own hash tables.

- The interface in6_pcblookup_hash_locked() is maintained, which allows
  callers to acquire hash locks and perform one or more lookups
  atomically with 4-tuple allocation: this is required only for TCPv6,
  as there is no in6_pcbconnect_setup(), which there should be.

- UDPv6 locking remains significantly more conservative than UDPv4
  locking, which relates to source address selection. This needs
  attention, as it likely significantly reduces parallelism in this code
  for multithreaded socket use (such as in BIND).

- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
  somewhat, as they relied on ipi_lock to stabilise 4-tuple matches,
  which is no longer sufficient. A second check once the inpcb lock is
  held should do the trick, keeping the general case from requiring the
  inpcb lock for every inpcb visited.

- This work reminds us that we need to revisit locking of the v4/v6
  flags, which may be accessed lock-free both before and after this
  change.

- Right now, a single lock name is used for the pcbhash lock -- this is
  undesirable, and probably another argument is required to take care of
  this (or a char array name field in the pcbinfo?).

This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics.
It's possible some of these issues could be worked around with
compatibility wrappers, if necessary.

Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
---
 sys/contrib/pf/net/pf.c | 31 ++++++++++++++-----------------
 1 file changed, 14 insertions(+), 17 deletions(-)

(limited to 'sys/contrib/pf/net/pf.c')

diff --git a/sys/contrib/pf/net/pf.c b/sys/contrib/pf/net/pf.c
index d65ab8d..756ad3a 100644
--- a/sys/contrib/pf/net/pf.c
+++ b/sys/contrib/pf/net/pf.c
@@ -3034,16 +3034,14 @@ pf_socket_lookup(int direction, struct pf_pdesc *pd)
 #ifdef INET
 	case AF_INET:
 #ifdef __FreeBSD__
-		INP_INFO_RLOCK(pi);	/* XXX LOR */
-		inp = in_pcblookup_hash(pi, saddr->v4, sport, daddr->v4,
-			dport, 0, NULL);
+		inp = in_pcblookup(pi, saddr->v4, sport, daddr->v4,
+			dport, INPLOOKUP_RLOCKPCB, NULL);
 		if (inp == NULL) {
-			inp = in_pcblookup_hash(pi, saddr->v4, sport,
-			   daddr->v4, dport, INPLOOKUP_WILDCARD, NULL);
-			if(inp == NULL) {
-				INP_INFO_RUNLOCK(pi);
+			inp = in_pcblookup(pi, saddr->v4, sport,
+			   daddr->v4, dport, INPLOOKUP_WILDCARD |
+			   INPLOOKUP_RLOCKPCB, NULL);
+			if (inp == NULL)
 				return (-1);
-			}
 		}
 #else
 		inp = in_pcbhashlookup(tb, saddr->v4, sport, daddr->v4, dport);
@@ -3058,16 +3056,14 @@ pf_socket_lookup(int direction, struct pf_pdesc *pd)
 #ifdef INET6
 	case AF_INET6:
 #ifdef __FreeBSD__
-		INP_INFO_RLOCK(pi);
-		inp = in6_pcblookup_hash(pi, &saddr->v6, sport,
-			&daddr->v6, dport, 0, NULL);
+		inp = in6_pcblookup(pi, &saddr->v6, sport,
+			&daddr->v6, dport, INPLOOKUP_RLOCKPCB, NULL);
 		if (inp == NULL) {
-			inp = in6_pcblookup_hash(pi, &saddr->v6, sport,
-			&daddr->v6, dport, INPLOOKUP_WILDCARD, NULL);
-			if (inp == NULL) {
-				INP_INFO_RUNLOCK(pi);
+			inp = in6_pcblookup(pi, &saddr->v6, sport,
+			    &daddr->v6, dport, INPLOOKUP_WILDCARD |
+			    INPLOOKUP_RLOCKPCB, NULL);
+			if (inp == NULL)
 				return (-1);
-			}
 		}
 #else
 		inp = in6_pcbhashlookup(tb, &saddr->v6, sport, &daddr->v6,
@@ -3085,9 +3081,10 @@ pf_socket_lookup(int direction, struct pf_pdesc *pd)
 		return (-1);
 	}
 #ifdef __FreeBSD__
+	INP_RLOCK_ASSERT(inp);
 	pd->lookup.uid = inp->inp_cred->cr_uid;
 	pd->lookup.gid = inp->inp_cred->cr_groups[0];
-	INP_INFO_RUNLOCK(pi);
+	INP_RUNLOCK(inp);
 #else
 	pd->lookup.uid = inp->inp_socket->so_euid;
 	pd->lookup.gid = inp->inp_socket->so_egid;
--
cgit v1.1

From 90dbe667c5ed0f8123234e0f8b196f31befc4cf7 Mon Sep 17 00:00:00 2001
From: bz
Date: Tue, 31 May 2011 15:05:29 +0000
Subject: Remove some further INET related symbols from pf to allow the module
 to not only compile but load as well for testing with IPv6-only kernels.

For the moment we ignore the csum change in pf_ioctl.c given the pending
update to pf45.

Reported by: dru
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 20 days
---
 sys/contrib/pf/net/pf.c | 8 ++++++++
 1 file changed, 8 insertions(+)

(limited to 'sys/contrib/pf/net/pf.c')

diff --git a/sys/contrib/pf/net/pf.c b/sys/contrib/pf/net/pf.c
index 756ad3a..2ce254f 100644
--- a/sys/contrib/pf/net/pf.c
+++ b/sys/contrib/pf/net/pf.c
@@ -6132,9 +6132,11 @@ pf_routable(struct pf_addr *addr, sa_family_t af, struct pfi_kif *kif)
 #ifdef __FreeBSD__
 /* XXX MRT not always INET */
 /* stick with table 0 though */
+#ifdef INET
 	if (af == AF_INET)
 		in_rtalloc_ign((struct route *)&ro, 0, 0);
 	else
+#endif
 		rtalloc_ign((struct route *)&ro, 0);
 #else /* ! __FreeBSD__ */
 	rtalloc_noclone((struct route *)&ro, NO_CLONING);
@@ -6214,9 +6216,11 @@ pf_rtlabel_match(struct pf_addr *addr, sa_family_t af, struct pf_addr_wrap *aw)
 # ifdef RTF_PRCLONING
 	rtalloc_ign((struct route *)&ro, (RTF_CLONING|RTF_PRCLONING));
 # else /* !RTF_PRCLONING */
+#ifdef INET
 	if (af == AF_INET)
 		in_rtalloc_ign((struct route *)&ro, 0, 0);
 	else
+#endif
 		rtalloc_ign((struct route *)&ro, 0);
 # endif
 #else /* ! __FreeBSD__ */
@@ -6789,11 +6793,13 @@ pf_check_proto_cksum(struct mbuf *m, int off, int len, u_int8_t p, sa_family_t a
 			KMOD_UDPSTAT_INC(udps_badsum);
 			break;
 		}
+#ifdef INET
 		case IPPROTO_ICMP:
 		{
 			KMOD_ICMPSTAT_INC(icps_checksum);
 			break;
 		}
+#endif
 #ifdef INET6
 		case IPPROTO_ICMPV6:
 		{
@@ -6889,9 +6895,11 @@ pf_check_proto_cksum(struct mbuf *m, int off, int len, u_int8_t p,
 		case IPPROTO_UDP:
 			KMOD_UDPSTAT_INC(udps_badsum);
 			break;
+#ifdef INET
 		case IPPROTO_ICMP:
 			KMOD_ICMPSTAT_INC(icps_checksum);
 			break;
+#endif
 #ifdef INET6
 		case IPPROTO_ICMPV6:
 			KMOD_ICMP6STAT_INC(icp6s_checksum);
--
cgit v1.1

From e9eb5d3b9cabfc492871c5e6a6b40f13063f17f9 Mon Sep 17 00:00:00 2001
From: rwatson
Date: Sat, 4 Jun 2011 16:33:06 +0000
Subject: Add _mbuf() variants of various inpcb-related interfaces, including
 lookup, hash install, etc.

For now, these arguments are unused, but as we add RSS support, we will
want to use hashes extracted from mbufs, rather than manually calculated
hashes of header fields, due to the expense of the software version of
Toeplitz (and similar hashes).

Add notes that it would be nice to be able to pass mbufs into lookup
routines in pf(4), optimising firewall lookup in the same way, but the
code structure there doesn't facilitate that currently.

(In principle there is no reason this couldn't be MFCed -- the change
extends rather than modifies the KBI. However, it won't be useful
without other previous possibly less MFCable changes.)

Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
---
 sys/contrib/pf/net/pf.c | 8 ++++++++
 1 file changed, 8 insertions(+)

(limited to 'sys/contrib/pf/net/pf.c')

diff --git a/sys/contrib/pf/net/pf.c b/sys/contrib/pf/net/pf.c
index 2ce254f..135d734 100644
--- a/sys/contrib/pf/net/pf.c
+++ b/sys/contrib/pf/net/pf.c
@@ -3034,6 +3034,10 @@ pf_socket_lookup(int direction, struct pf_pdesc *pd)
 #ifdef INET
 	case AF_INET:
 #ifdef __FreeBSD__
+		/*
+		 * XXXRW: would be nice if we had an mbuf here so that we
+		 * could use in_pcblookup_mbuf().
+		 */
 		inp = in_pcblookup(pi, saddr->v4, sport, daddr->v4,
 			dport, INPLOOKUP_RLOCKPCB, NULL);
 		if (inp == NULL) {
@@ -3056,6 +3060,10 @@ pf_socket_lookup(int direction, struct pf_pdesc *pd)
 #ifdef INET6
 	case AF_INET6:
 #ifdef __FreeBSD__
+		/*
+		 * XXXRW: would be nice if we had an mbuf here so that we
+		 * could use in6_pcblookup_mbuf().
+		 */
 		inp = in6_pcblookup(pi, &saddr->v6, sport,
 			&daddr->v6, dport, INPLOOKUP_RLOCKPCB, NULL);
 		if (inp == NULL) {
--
cgit v1.1
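
Illustrative note (not part of the patches above): the lookup flow described in the first commit message (hold the hash lock only for the lookup, acquire a reference, drop the hash lock, then take the inpcb lock named by the flag) and the exact-then-wildcard pattern in pf_socket_lookup() can be modelled in userspace roughly as follows. This is a single-threaded sketch under stated assumptions, not kernel code: toy_rwlock only counts holders and provides no real synchronisation, every toy_*, model_* and M_* name is hypothetical, ports stand in for the full 4-tuple, and the model omits any handling of a pcb being freed between the two steps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping lock: records holders, provides no real locking. */
struct toy_rwlock { int readers; int writer; };

static void toy_rlock(struct toy_rwlock *l) { assert(l->writer == 0); l->readers++; }
static void toy_runlock(struct toy_rwlock *l) { assert(l->readers > 0); l->readers--; }
static void toy_wlock(struct toy_rwlock *l)
{
	assert(l->writer == 0 && l->readers == 0);
	l->writer = 1;
}

/* Hypothetical stand-ins for INPLOOKUP_WILDCARD / _RLOCKPCB / _WLOCKPCB. */
enum { M_WILDCARD = 0x1, M_RLOCKPCB = 0x2, M_WLOCKPCB = 0x4 };

struct model_pcb {
	struct toy_rwlock lock;		/* stands in for the per-inpcb lock */
	int refcount;			/* stands in for the inpcb reference count */
	unsigned short lport;		/* local port */
	unsigned short fport;		/* foreign port; 0 means unconnected */
	unsigned int uid;		/* stands in for inp_cred->cr_uid */
	struct model_pcb *next;
};

struct model_pcbinfo {
	struct toy_rwlock hash_lock;	/* stands in for ipi_hash_lock */
	struct model_pcb *head;		/* one bucket is enough for the sketch */
};

/* Exact match on (fport, lport); caller holds the hash lock. */
static struct model_pcb *
model_find(struct model_pcbinfo *pi, unsigned short fport, unsigned short lport)
{
	struct model_pcb *p;

	for (p = pi->head; p != NULL; p = p->next)
		if (p->lport == lport && p->fport == fport)
			return (p);
	return (NULL);
}

/* Returns the pcb locked as requested, or NULL; the caller takes no locks first. */
static struct model_pcb *
model_lookup(struct model_pcbinfo *pi, unsigned short fport,
    unsigned short lport, int flags)
{
	struct model_pcb *p;
	int r = (flags & M_RLOCKPCB) != 0, w = (flags & M_WLOCKPCB) != 0;

	if (r == w)			/* exactly one lock flag must be passed */
		return (NULL);

	toy_rlock(&pi->hash_lock);	/* held only for the duration of the lookup */
	p = model_find(pi, fport, lport);
	if (p == NULL && (flags & M_WILDCARD) != 0)
		p = model_find(pi, 0, lport);
	if (p != NULL)
		p->refcount++;		/* reference taken under the hash lock */
	toy_runlock(&pi->hash_lock);
	if (p == NULL)
		return (NULL);

	if (w)				/* pcb lock taken after the hash lock is dropped */
		toy_wlock(&p->lock);
	else
		toy_rlock(&p->lock);
	p->refcount--;			/* the lock now pins the pcb; drop the reference */
	return (p);
}

/* Mirrors the shape of the patched pf_socket_lookup(): exact, then wildcard. */
static int
model_socket_uid(struct model_pcbinfo *pi, unsigned short fport,
    unsigned short lport, unsigned int *uid)
{
	struct model_pcb *p;

	p = model_lookup(pi, fport, lport, M_RLOCKPCB);
	if (p == NULL) {
		p = model_lookup(pi, fport, lport, M_WILDCARD | M_RLOCKPCB);
		if (p == NULL)
			return (-1);
	}
	assert(p->lock.readers > 0);	/* analogue of INP_RLOCK_ASSERT(inp) */
	*uid = p->uid;
	toy_runlock(&p->lock);		/* analogue of INP_RUNLOCK(inp) */
	return (0);
}
```

A caller would build a model_pcbinfo with a connected pcb and a listener on the same local port, then call model_socket_uid(): the connected pcb answers for its own foreign port, the listener answers for any other foreign port, and no lock remains held after the call returns.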