author	Linus Torvalds <torvalds@linux-foundation.org>	2013-07-09 18:24:39 -0700
committer	Linus Torvalds <torvalds@linux-foundation.org>	2013-07-09 18:24:39 -0700
commit	496322bc91e35007ed754184dcd447a02b6dd685 (patch)
tree	f5298d0a74c0a6e65c0e98050b594b8d020904c1 /net/openvswitch
parent	2e17c5a97e231f3cb426f4b7895eab5be5c5442e (diff)
parent	56e0ef527b184b3de2d7f88c6190812b2b2ac6bf (diff)
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 "This is a re-do of the net-next pull request for the current merge window. The only difference from the one I made the other day is that this has Eliezer's interface renames and the timeout handling changes made based upon your feedback, as well as a few bug fixes that have trickled in.

  Highlights:

  1) Low latency device polling, eliminating the cost of interrupt handling and context switches. Allows direct polling of a network device from socket operations, such as recvmsg() and poll(). Currently ixgbe, mlx4, and bnx2x support this feature. Full high level description, performance numbers, and design in commit 0a4db187a999 ("Merge branch 'll_poll'"). From Eliezer Tamir.

  2) With the routing cache removed, ip_check_mc_rcu() gets exercised more than ever before in the case where we have lots of multicast addresses. Use a hash table instead of a simple linked list, from Eric Dumazet.

  3) Add driver for Atheros QCA98xx 802.11ac wireless devices, from Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski, Marek Puzyniak, Michal Kazior, and Sujith Manoharan.

  4) Support reporting the TUN device persist flag to userspace, from Pavel Emelyanov.

  5) Allow controlling network device VF link state using netlink, from Rony Efraim.

  6) Support GRE tunneling in openvswitch, from Pravin B Shelar.

  7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from Daniel Borkmann and Eric Dumazet.

  8) Allow controlling of TCP quickack behavior on a per-route basis, from Cong Wang.

  9) Several bug fixes and improvements to vxlan from Stephen Hemminger, Pravin B Shelar, and Mike Rapoport. In particular, support receiving on multiple UDP ports.

  10) Major cleanups, particularly in the area of debugging and cookie lifetime handling, to the SCTP protocol code. From Daniel Borkmann.

  11) Allow packets to cross network namespaces when traversing tunnel devices. From Nicolas Dichtel.

  12) Allow monitoring netlink traffic via AF_PACKET sockets, in a manner akin to how we monitor real network traffic via ptype_all. From Daniel Borkmann.

  13) Several bug fixes and improvements for the new alx device driver, from Johannes Berg.

  14) Fix scalability issues in the netem packet scheduler's time queue, by using an rbtree. From Eric Dumazet.

  15) Several bug fixes in TCP loss recovery handling, from Yuchung Cheng.

  16) Add support for GSO segmentation of MPLS packets, from Simon Horman.

  17) Make network notifiers have a real data type for the opaque pointer that's passed into them. Use this to properly handle network device flag changes in arp_netdev_event(). From Jiri Pirko and Timo Teräs.

  18) Convert several drivers over to module_pci_driver(), from Peter Huewe.

  19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use an O(1) calculation instead. From Eric Dumazet.

  20) Support setting of explicit tunnel peer addresses in ipv6, just like ipv4. From Nicolas Dichtel.

  21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.

  22) Prevent a single high rate flow from overrunning an individual cpu during RX packet processing via selective flow shedding. From Willem de Bruijn.

  23) Don't use spinlocks in TCP md5 signing fast paths, from Eric Dumazet.

  24) Don't just drop GSO packets which are above the TBF scheduler's burst limit, chop them up so they are in-bounds instead. Also from Eric Dumazet.

  25) VLAN offloads are missed when configured on top of a bridge, fix from Vlad Yasevich.

  26) Support IPV6 in ping sockets. From Lorenzo Colitti.

  27) Receive flow steering targets should be updated at poll() time too, from David Majnemer.

  28) Fix several corner case regressions in PMTU/redirect handling due to the routing cache removal, from Timo Teräs.

  29) We have to be mindful of ipv4 mapped ipv6 sockets in udp_v6_push_pending_frames(). From Hannes Frederic Sowa.

  30) Fix L2TP sequence number handling bugs, from James Chapman."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
  drivers/net: caif: fix wrong rtnl_is_locked() usage
  drivers/net: enic: release rtnl_lock on error-path
  vhost-net: fix use-after-free in vhost_net_flush
  net: mv643xx_eth: do not use port number as platform device id
  net: sctp: confirm route during forward progress
  virtio_net: fix race in RX VQ processing
  virtio: support unlocked queue poll
  net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
  Documentation: Fix references to defunct linux-net@vger.kernel.org
  net/fs: change busy poll time accounting
  net: rename low latency sockets functions to busy poll
  bridge: fix some kernel warning in multicast timer
  sfc: Fix memory leak when discarding scattered packets
  sit: fix tunnel update via netlink
  dt:net:stmmac: Add dt specific phy reset callback support.
  dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
  dt:net:stmmac: Allocate platform data only if its NULL.
  net:stmmac: fix memleak in the open method
  ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
  net: ipv6: fix wrong ping_v6_sendmsg return value
  ...
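For highlight 1, the user-visible result of the busy-poll rename (see "net: rename low latency sockets functions to busy poll" in the shortlog) is a plain socket option. A minimal userspace sketch, assuming a libc that already exposes SO_BUSY_POLL; the 50-microsecond budget is an illustrative value, not a recommendation:

#include <sys/socket.h>

/* Opt one socket in to busy polling: the kernel will spin for up to
 * 'usecs' microseconds waiting for new packets before falling back
 * to ordinary interrupt-driven blocking. */
static int enable_busy_poll(int fd)
{
	int usecs = 50;

	return setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL,
			  &usecs, sizeof(usecs));
}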
Diffstat (limited to 'net/openvswitch')
-rw-r--r--  net/openvswitch/Kconfig                 14
-rw-r--r--  net/openvswitch/Makefile                 3
-rw-r--r--  net/openvswitch/actions.c                8
-rw-r--r--  net/openvswitch/datapath.c             371
-rw-r--r--  net/openvswitch/datapath.h               4
-rw-r--r--  net/openvswitch/dp_notify.c              2
-rw-r--r--  net/openvswitch/flow.c                 211
-rw-r--r--  net/openvswitch/flow.h                  47
-rw-r--r--  net/openvswitch/vport-gre.c            275
-rw-r--r--  net/openvswitch/vport-internal_dev.c     3
-rw-r--r--  net/openvswitch/vport-netdev.c           9
-rw-r--r--  net/openvswitch/vport-netdev.h           1
-rw-r--r--  net/openvswitch/vport.c                 34
-rw-r--r--  net/openvswitch/vport.h                 23
14 files changed, 884 insertions, 121 deletions
diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
index d9ea33c..27ee56b 100644
--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -26,3 +26,17 @@ config OPENVSWITCH
called openvswitch.
If unsure, say N.
+
+config OPENVSWITCH_GRE
+ bool "Open vSwitch GRE tunneling support"
+ depends on INET
+ depends on OPENVSWITCH
+ depends on NET_IPGRE_DEMUX && !(OPENVSWITCH=y && NET_IPGRE_DEMUX=m)
+ default y
+ ---help---
+ If you say Y here, then Open vSwitch will be able to create GRE
+ vports.
+
+ Say N to exclude this support and reduce the binary size.
+
+ If unsure, say Y.
diff --git a/net/openvswitch/Makefile b/net/openvswitch/Makefile
index 15e7384..01bddb2 100644
--- a/net/openvswitch/Makefile
+++ b/net/openvswitch/Makefile
@@ -10,5 +10,6 @@ openvswitch-y := \
dp_notify.o \
flow.o \
vport.o \
+ vport-gre.o \
vport-internal_dev.o \
- vport-netdev.o \
+ vport-netdev.o
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 894b6cb..22c5f39 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -130,9 +130,13 @@ static int set_eth_addr(struct sk_buff *skb,
if (unlikely(err))
return err;
+ skb_postpull_rcsum(skb, eth_hdr(skb), ETH_ALEN * 2);
+
memcpy(eth_hdr(skb)->h_source, eth_key->eth_src, ETH_ALEN);
memcpy(eth_hdr(skb)->h_dest, eth_key->eth_dst, ETH_ALEN);
+ ovs_skb_postpush_rcsum(skb, eth_hdr(skb), ETH_ALEN * 2);
+
return 0;
}
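The pull/push pair above is standard CHECKSUM_COMPLETE bookkeeping: skb->csum sums the packet bytes, so rewriting the MAC addresses requires subtracting their old contribution first and adding the new one afterwards. A condensed sketch of the pattern (the helper name is hypothetical; the push half mirrors ovs_skb_postpush_rcsum(), added to vport.h later in this diff):

/* Rewrite 'len' bytes at 'at' in skb linear data while keeping a
 * CHECKSUM_COMPLETE skb->csum valid. */
static void csum_safe_rewrite(struct sk_buff *skb, u8 *at,
			      const u8 *src, unsigned int len)
{
	skb_postpull_rcsum(skb, at, len);	/* csum -= old bytes */
	memcpy(at, src, len);
	if (skb->ip_summed == CHECKSUM_COMPLETE)
		skb->csum = csum_add(skb->csum,	/* csum += new bytes */
				     csum_partial(at, len, 0));
}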
@@ -432,6 +436,10 @@ static int execute_set_action(struct sk_buff *skb,
skb->mark = nla_get_u32(nested_attr);
break;
+ case OVS_KEY_ATTR_IPV4_TUNNEL:
+ OVS_CB(skb)->tun_key = nla_data(nested_attr);
+ break;
+
case OVS_KEY_ATTR_ETHERNET:
err = set_eth_addr(skb, nla_data(nested_attr));
break;
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index d12d6b8..f7e3a0d 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -362,6 +362,14 @@ static int queue_gso_packets(struct net *net, int dp_ifindex,
static size_t key_attr_size(void)
{
return nla_total_size(4) /* OVS_KEY_ATTR_PRIORITY */
+ + nla_total_size(0) /* OVS_KEY_ATTR_TUNNEL */
+ + nla_total_size(8) /* OVS_TUNNEL_KEY_ATTR_ID */
+ + nla_total_size(4) /* OVS_TUNNEL_KEY_ATTR_IPV4_SRC */
+ + nla_total_size(4) /* OVS_TUNNEL_KEY_ATTR_IPV4_DST */
+ + nla_total_size(1) /* OVS_TUNNEL_KEY_ATTR_TOS */
+ + nla_total_size(1) /* OVS_TUNNEL_KEY_ATTR_TTL */
+ + nla_total_size(0) /* OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT */
+ + nla_total_size(0) /* OVS_TUNNEL_KEY_ATTR_CSUM */
+ nla_total_size(4) /* OVS_KEY_ATTR_IN_PORT */
+ nla_total_size(4) /* OVS_KEY_ATTR_SKB_MARK */
+ nla_total_size(12) /* OVS_KEY_ATTR_ETHERNET */
@@ -464,16 +472,89 @@ static int flush_flows(struct datapath *dp)
return 0;
}
-static int validate_actions(const struct nlattr *attr,
- const struct sw_flow_key *key, int depth);
+static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa, int attr_len)
+{
+
+ struct sw_flow_actions *acts;
+ int new_acts_size;
+ int req_size = NLA_ALIGN(attr_len);
+ int next_offset = offsetof(struct sw_flow_actions, actions) +
+ (*sfa)->actions_len;
+
+ if (req_size <= (ksize(*sfa) - next_offset))
+ goto out;
+
+ new_acts_size = ksize(*sfa) * 2;
+
+ if (new_acts_size > MAX_ACTIONS_BUFSIZE) {
+ if ((MAX_ACTIONS_BUFSIZE - next_offset) < req_size)
+ return ERR_PTR(-EMSGSIZE);
+ new_acts_size = MAX_ACTIONS_BUFSIZE;
+ }
+
+ acts = ovs_flow_actions_alloc(new_acts_size);
+ if (IS_ERR(acts))
+ return (void *)acts;
+
+ memcpy(acts->actions, (*sfa)->actions, (*sfa)->actions_len);
+ acts->actions_len = (*sfa)->actions_len;
+ kfree(*sfa);
+ *sfa = acts;
+
+out:
+ (*sfa)->actions_len += req_size;
+ return (struct nlattr *) ((unsigned char *)(*sfa) + next_offset);
+}
+
+static int add_action(struct sw_flow_actions **sfa, int attrtype, void *data, int len)
+{
+ struct nlattr *a;
+
+ a = reserve_sfa_size(sfa, nla_attr_size(len));
+ if (IS_ERR(a))
+ return PTR_ERR(a);
+
+ a->nla_type = attrtype;
+ a->nla_len = nla_attr_size(len);
+
+ if (data)
+ memcpy(nla_data(a), data, len);
+ memset((unsigned char *) a + a->nla_len, 0, nla_padlen(len));
+
+ return 0;
+}
+
+static inline int add_nested_action_start(struct sw_flow_actions **sfa, int attrtype)
+{
+ int used = (*sfa)->actions_len;
+ int err;
+
+ err = add_action(sfa, attrtype, NULL, 0);
+ if (err)
+ return err;
+
+ return used;
+}
-static int validate_sample(const struct nlattr *attr,
- const struct sw_flow_key *key, int depth)
+static inline void add_nested_action_end(struct sw_flow_actions *sfa, int st_offset)
+{
+ struct nlattr *a = (struct nlattr *) ((unsigned char *)sfa->actions + st_offset);
+
+ a->nla_len = sfa->actions_len - st_offset;
+}
+
+static int validate_and_copy_actions(const struct nlattr *attr,
+ const struct sw_flow_key *key, int depth,
+ struct sw_flow_actions **sfa);
+
+static int validate_and_copy_sample(const struct nlattr *attr,
+ const struct sw_flow_key *key, int depth,
+ struct sw_flow_actions **sfa)
{
const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1];
const struct nlattr *probability, *actions;
const struct nlattr *a;
- int rem;
+ int rem, start, err, st_acts;
memset(attrs, 0, sizeof(attrs));
nla_for_each_nested(a, attr, rem) {
@@ -492,7 +573,26 @@ static int validate_sample(const struct nlattr *attr,
actions = attrs[OVS_SAMPLE_ATTR_ACTIONS];
if (!actions || (nla_len(actions) && nla_len(actions) < NLA_HDRLEN))
return -EINVAL;
- return validate_actions(actions, key, depth + 1);
+
+ /* validation done, copy sample action. */
+ start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SAMPLE);
+ if (start < 0)
+ return start;
+ err = add_action(sfa, OVS_SAMPLE_ATTR_PROBABILITY, nla_data(probability), sizeof(u32));
+ if (err)
+ return err;
+ st_acts = add_nested_action_start(sfa, OVS_SAMPLE_ATTR_ACTIONS);
+ if (st_acts < 0)
+ return st_acts;
+
+ err = validate_and_copy_actions(actions, key, depth + 1, sfa);
+ if (err)
+ return err;
+
+ add_nested_action_end(*sfa, st_acts);
+ add_nested_action_end(*sfa, start);
+
+ return 0;
}
static int validate_tp_port(const struct sw_flow_key *flow_key)
@@ -508,8 +608,30 @@ static int validate_tp_port(const struct sw_flow_key *flow_key)
return -EINVAL;
}
+static int validate_and_copy_set_tun(const struct nlattr *attr,
+ struct sw_flow_actions **sfa)
+{
+ struct ovs_key_ipv4_tunnel tun_key;
+ int err, start;
+
+ err = ovs_ipv4_tun_from_nlattr(nla_data(attr), &tun_key);
+ if (err)
+ return err;
+
+ start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SET);
+ if (start < 0)
+ return start;
+
+ err = add_action(sfa, OVS_KEY_ATTR_IPV4_TUNNEL, &tun_key, sizeof(tun_key));
+ add_nested_action_end(*sfa, start);
+
+ return err;
+}
+
static int validate_set(const struct nlattr *a,
- const struct sw_flow_key *flow_key)
+ const struct sw_flow_key *flow_key,
+ struct sw_flow_actions **sfa,
+ bool *set_tun)
{
const struct nlattr *ovs_key = nla_data(a);
int key_type = nla_type(ovs_key);
@@ -519,18 +641,27 @@ static int validate_set(const struct nlattr *a,
return -EINVAL;
if (key_type > OVS_KEY_ATTR_MAX ||
- nla_len(ovs_key) != ovs_key_lens[key_type])
+ (ovs_key_lens[key_type] != nla_len(ovs_key) &&
+ ovs_key_lens[key_type] != -1))
return -EINVAL;
switch (key_type) {
const struct ovs_key_ipv4 *ipv4_key;
const struct ovs_key_ipv6 *ipv6_key;
+ int err;
case OVS_KEY_ATTR_PRIORITY:
case OVS_KEY_ATTR_SKB_MARK:
case OVS_KEY_ATTR_ETHERNET:
break;
+ case OVS_KEY_ATTR_TUNNEL:
+ *set_tun = true;
+ err = validate_and_copy_set_tun(a, sfa);
+ if (err)
+ return err;
+ break;
+
case OVS_KEY_ATTR_IPV4:
if (flow_key->eth.type != htons(ETH_P_IP))
return -EINVAL;
@@ -606,8 +737,24 @@ static int validate_userspace(const struct nlattr *attr)
return 0;
}
-static int validate_actions(const struct nlattr *attr,
- const struct sw_flow_key *key, int depth)
+static int copy_action(const struct nlattr *from,
+ struct sw_flow_actions **sfa)
+{
+ int totlen = NLA_ALIGN(from->nla_len);
+ struct nlattr *to;
+
+ to = reserve_sfa_size(sfa, from->nla_len);
+ if (IS_ERR(to))
+ return PTR_ERR(to);
+
+ memcpy(to, from, totlen);
+ return 0;
+}
+
+static int validate_and_copy_actions(const struct nlattr *attr,
+ const struct sw_flow_key *key,
+ int depth,
+ struct sw_flow_actions **sfa)
{
const struct nlattr *a;
int rem, err;
@@ -627,12 +774,14 @@ static int validate_actions(const struct nlattr *attr,
};
const struct ovs_action_push_vlan *vlan;
int type = nla_type(a);
+ bool skip_copy;
if (type > OVS_ACTION_ATTR_MAX ||
(action_lens[type] != nla_len(a) &&
action_lens[type] != (u32)-1))
return -EINVAL;
+ skip_copy = false;
switch (type) {
case OVS_ACTION_ATTR_UNSPEC:
return -EINVAL;
@@ -661,20 +810,26 @@ static int validate_actions(const struct nlattr *attr,
break;
case OVS_ACTION_ATTR_SET:
- err = validate_set(a, key);
+ err = validate_set(a, key, sfa, &skip_copy);
if (err)
return err;
break;
case OVS_ACTION_ATTR_SAMPLE:
- err = validate_sample(a, key, depth);
+ err = validate_and_copy_sample(a, key, depth, sfa);
if (err)
return err;
+ skip_copy = true;
break;
default:
return -EINVAL;
}
+ if (!skip_copy) {
+ err = copy_action(a, sfa);
+ if (err)
+ return err;
+ }
}
if (rem > 0)
@@ -739,24 +894,18 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
if (err)
goto err_flow_free;
- err = ovs_flow_metadata_from_nlattrs(&flow->key.phy.priority,
- &flow->key.phy.skb_mark,
- &flow->key.phy.in_port,
- a[OVS_PACKET_ATTR_KEY]);
- if (err)
- goto err_flow_free;
-
- err = validate_actions(a[OVS_PACKET_ATTR_ACTIONS], &flow->key, 0);
+ err = ovs_flow_metadata_from_nlattrs(flow, key_len, a[OVS_PACKET_ATTR_KEY]);
if (err)
goto err_flow_free;
-
- flow->hash = ovs_flow_hash(&flow->key, key_len);
-
- acts = ovs_flow_actions_alloc(a[OVS_PACKET_ATTR_ACTIONS]);
+ acts = ovs_flow_actions_alloc(nla_len(a[OVS_PACKET_ATTR_ACTIONS]));
err = PTR_ERR(acts);
if (IS_ERR(acts))
goto err_flow_free;
+
+ err = validate_and_copy_actions(a[OVS_PACKET_ATTR_ACTIONS], &flow->key, 0, &acts);
rcu_assign_pointer(flow->sf_acts, acts);
+ if (err)
+ goto err_flow_free;
OVS_CB(packet)->flow = flow;
packet->priority = flow->key.phy.priority;
@@ -846,6 +995,99 @@ static struct genl_multicast_group ovs_dp_flow_multicast_group = {
.name = OVS_FLOW_MCGROUP
};
+static int actions_to_attr(const struct nlattr *attr, int len, struct sk_buff *skb);
+static int sample_action_to_attr(const struct nlattr *attr, struct sk_buff *skb)
+{
+ const struct nlattr *a;
+ struct nlattr *start;
+ int err = 0, rem;
+
+ start = nla_nest_start(skb, OVS_ACTION_ATTR_SAMPLE);
+ if (!start)
+ return -EMSGSIZE;
+
+ nla_for_each_nested(a, attr, rem) {
+ int type = nla_type(a);
+ struct nlattr *st_sample;
+
+ switch (type) {
+ case OVS_SAMPLE_ATTR_PROBABILITY:
+ if (nla_put(skb, OVS_SAMPLE_ATTR_PROBABILITY, sizeof(u32), nla_data(a)))
+ return -EMSGSIZE;
+ break;
+ case OVS_SAMPLE_ATTR_ACTIONS:
+ st_sample = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS);
+ if (!st_sample)
+ return -EMSGSIZE;
+ err = actions_to_attr(nla_data(a), nla_len(a), skb);
+ if (err)
+ return err;
+ nla_nest_end(skb, st_sample);
+ break;
+ }
+ }
+
+ nla_nest_end(skb, start);
+ return err;
+}
+
+static int set_action_to_attr(const struct nlattr *a, struct sk_buff *skb)
+{
+ const struct nlattr *ovs_key = nla_data(a);
+ int key_type = nla_type(ovs_key);
+ struct nlattr *start;
+ int err;
+
+ switch (key_type) {
+ case OVS_KEY_ATTR_IPV4_TUNNEL:
+ start = nla_nest_start(skb, OVS_ACTION_ATTR_SET);
+ if (!start)
+ return -EMSGSIZE;
+
+ err = ovs_ipv4_tun_to_nlattr(skb, nla_data(ovs_key));
+ if (err)
+ return err;
+ nla_nest_end(skb, start);
+ break;
+ default:
+ if (nla_put(skb, OVS_ACTION_ATTR_SET, nla_len(a), ovs_key))
+ return -EMSGSIZE;
+ break;
+ }
+
+ return 0;
+}
+
+static int actions_to_attr(const struct nlattr *attr, int len, struct sk_buff *skb)
+{
+ const struct nlattr *a;
+ int rem, err;
+
+ nla_for_each_attr(a, attr, len, rem) {
+ int type = nla_type(a);
+
+ switch (type) {
+ case OVS_ACTION_ATTR_SET:
+ err = set_action_to_attr(a, skb);
+ if (err)
+ return err;
+ break;
+
+ case OVS_ACTION_ATTR_SAMPLE:
+ err = sample_action_to_attr(a, skb);
+ if (err)
+ return err;
+ break;
+ default:
+ if (nla_put(skb, type, nla_len(a), nla_data(a)))
+ return -EMSGSIZE;
+ break;
+ }
+ }
+
+ return 0;
+}
+
static size_t ovs_flow_cmd_msg_size(const struct sw_flow_actions *acts)
{
return NLMSG_ALIGN(sizeof(struct ovs_header))
@@ -863,6 +1105,7 @@ static int ovs_flow_cmd_fill_info(struct sw_flow *flow, struct datapath *dp,
{
const int skb_orig_len = skb->len;
const struct sw_flow_actions *sf_acts;
+ struct nlattr *start;
struct ovs_flow_stats stats;
struct ovs_header *ovs_header;
struct nlattr *nla;
@@ -916,10 +1159,19 @@ static int ovs_flow_cmd_fill_info(struct sw_flow *flow, struct datapath *dp,
* This can only fail for dump operations because the skb is always
* properly sized for single flows.
*/
- err = nla_put(skb, OVS_FLOW_ATTR_ACTIONS, sf_acts->actions_len,
- sf_acts->actions);
- if (err < 0 && skb_orig_len)
- goto error;
+ start = nla_nest_start(skb, OVS_FLOW_ATTR_ACTIONS);
+ if (start) {
+ err = actions_to_attr(sf_acts->actions, sf_acts->actions_len, skb);
+ if (!err)
+ nla_nest_end(skb, start);
+ else {
+ if (skb_orig_len)
+ goto error;
+
+ nla_nest_cancel(skb, start);
+ }
+ } else if (skb_orig_len)
+ goto nla_put_failure;
return genlmsg_end(skb, ovs_header);
@@ -964,6 +1216,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
struct sk_buff *reply;
struct datapath *dp;
struct flow_table *table;
+ struct sw_flow_actions *acts = NULL;
int error;
int key_len;
@@ -977,9 +1230,14 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
/* Validate actions. */
if (a[OVS_FLOW_ATTR_ACTIONS]) {
- error = validate_actions(a[OVS_FLOW_ATTR_ACTIONS], &key, 0);
- if (error)
+ acts = ovs_flow_actions_alloc(nla_len(a[OVS_FLOW_ATTR_ACTIONS]));
+ error = PTR_ERR(acts);
+ if (IS_ERR(acts))
goto error;
+
+ error = validate_and_copy_actions(a[OVS_FLOW_ATTR_ACTIONS], &key, 0, &acts);
+ if (error)
+ goto err_kfree;
} else if (info->genlhdr->cmd == OVS_FLOW_CMD_NEW) {
error = -EINVAL;
goto error;
@@ -994,8 +1252,6 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
table = ovsl_dereference(dp->table);
flow = ovs_flow_tbl_lookup(table, &key, key_len);
if (!flow) {
- struct sw_flow_actions *acts;
-
/* Bail out if we're not allowed to create a new flow. */
error = -ENOENT;
if (info->genlhdr->cmd == OVS_FLOW_CMD_SET)
@@ -1019,19 +1275,12 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
error = PTR_ERR(flow);
goto err_unlock_ovs;
}
- flow->key = key;
clear_stats(flow);
- /* Obtain actions. */
- acts = ovs_flow_actions_alloc(a[OVS_FLOW_ATTR_ACTIONS]);
- error = PTR_ERR(acts);
- if (IS_ERR(acts))
- goto error_free_flow;
rcu_assign_pointer(flow->sf_acts, acts);
/* Put flow in bucket. */
- flow->hash = ovs_flow_hash(&key, key_len);
- ovs_flow_tbl_insert(table, flow);
+ ovs_flow_tbl_insert(table, flow, &key, key_len);
reply = ovs_flow_cmd_build_info(flow, dp, info->snd_portid,
info->snd_seq,
@@ -1039,7 +1288,6 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
} else {
/* We found a matching flow. */
struct sw_flow_actions *old_acts;
- struct nlattr *acts_attrs;
/* Bail out if we're not allowed to modify an existing flow.
* We accept NLM_F_CREATE in place of the intended NLM_F_EXCL
@@ -1054,21 +1302,8 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
/* Update actions. */
old_acts = ovsl_dereference(flow->sf_acts);
- acts_attrs = a[OVS_FLOW_ATTR_ACTIONS];
- if (acts_attrs &&
- (old_acts->actions_len != nla_len(acts_attrs) ||
- memcmp(old_acts->actions, nla_data(acts_attrs),
- old_acts->actions_len))) {
- struct sw_flow_actions *new_acts;
-
- new_acts = ovs_flow_actions_alloc(acts_attrs);
- error = PTR_ERR(new_acts);
- if (IS_ERR(new_acts))
- goto err_unlock_ovs;
-
- rcu_assign_pointer(flow->sf_acts, new_acts);
- ovs_flow_deferred_free_acts(old_acts);
- }
+ rcu_assign_pointer(flow->sf_acts, acts);
+ ovs_flow_deferred_free_acts(old_acts);
reply = ovs_flow_cmd_build_info(flow, dp, info->snd_portid,
info->snd_seq, OVS_FLOW_CMD_NEW);
@@ -1089,10 +1324,10 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
ovs_dp_flow_multicast_group.id, PTR_ERR(reply));
return 0;
-error_free_flow:
- ovs_flow_free(flow);
err_unlock_ovs:
ovs_unlock();
+err_kfree:
+ kfree(acts);
error:
return error;
}
@@ -1812,10 +2047,11 @@ static int ovs_vport_cmd_set(struct sk_buff *skb, struct genl_info *info)
if (IS_ERR(vport))
goto exit_unlock;
- err = 0;
if (a[OVS_VPORT_ATTR_TYPE] &&
- nla_get_u32(a[OVS_VPORT_ATTR_TYPE]) != vport->ops->type)
+ nla_get_u32(a[OVS_VPORT_ATTR_TYPE]) != vport->ops->type) {
err = -EINVAL;
+ goto exit_unlock;
+ }
reply = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
if (!reply) {
@@ -1823,10 +2059,11 @@ static int ovs_vport_cmd_set(struct sk_buff *skb, struct genl_info *info)
goto exit_unlock;
}
- if (!err && a[OVS_VPORT_ATTR_OPTIONS])
+ if (a[OVS_VPORT_ATTR_OPTIONS]) {
err = ovs_vport_set_options(vport, a[OVS_VPORT_ATTR_OPTIONS]);
- if (err)
- goto exit_free;
+ if (err)
+ goto exit_free;
+ }
if (a[OVS_VPORT_ATTR_UPCALL_PID])
vport->upcall_portid = nla_get_u32(a[OVS_VPORT_ATTR_UPCALL_PID]);
@@ -1867,8 +2104,8 @@ static int ovs_vport_cmd_del(struct sk_buff *skb, struct genl_info *info)
goto exit_unlock;
}
- reply = ovs_vport_cmd_build_info(vport, info->snd_portid, info->snd_seq,
- OVS_VPORT_CMD_DEL);
+ reply = ovs_vport_cmd_build_info(vport, info->snd_portid,
+ info->snd_seq, OVS_VPORT_CMD_DEL);
err = PTR_ERR(reply);
if (IS_ERR(reply))
goto exit_unlock;
@@ -1897,8 +2134,8 @@ static int ovs_vport_cmd_get(struct sk_buff *skb, struct genl_info *info)
if (IS_ERR(vport))
goto exit_unlock;
- reply = ovs_vport_cmd_build_info(vport, info->snd_portid, info->snd_seq,
- OVS_VPORT_CMD_NEW);
+ reply = ovs_vport_cmd_build_info(vport, info->snd_portid,
+ info->snd_seq, OVS_VPORT_CMD_NEW);
err = PTR_ERR(reply);
if (IS_ERR(reply))
goto exit_unlock;
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 16b8406..a914864 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -88,9 +88,12 @@ struct datapath {
/**
* struct ovs_skb_cb - OVS data in skb CB
* @flow: The flow associated with this packet. May be %NULL if no flow.
+ * @tun_key: Key for the tunnel that encapsulated this packet. NULL if the
+ * packet is not being tunneled.
*/
struct ovs_skb_cb {
struct sw_flow *flow;
+ struct ovs_key_ipv4_tunnel *tun_key;
};
#define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb)
@@ -119,6 +122,7 @@ struct dp_upcall_info {
struct ovs_net {
struct list_head dps;
struct work_struct dp_notify_work;
+ struct vport_net vport_net;
};
extern int ovs_net_id;
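Read together with the flow.c and vport changes elsewhere in this diff, the new tun_key pointer has a short, well-defined lifecycle. The lines below are condensed from code in this series, not a new API:

/* 1. gre_rcv() (vport-gre.c) fills a stack-local key and hands it in;
 *    ovs_vport_receive() stores it as OVS_CB(skb)->tun_key. */
ovs_vport_receive(vport, skb, &tun_key);

/* 2. ovs_flow_extract() (flow.c) copies it into the flow key: */
if (OVS_CB(skb)->tun_key)
	memcpy(&key->tun_key, OVS_CB(skb)->tun_key, sizeof(key->tun_key));

/* 3. On output, execute_set_action() repoints it at the action's
 *    tunnel key, which gre_tnl_send() reads to build the outer
 *    IP/GRE headers. */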
diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
index ef4feec..c323567 100644
--- a/net/openvswitch/dp_notify.c
+++ b/net/openvswitch/dp_notify.c
@@ -78,7 +78,7 @@ static int dp_device_event(struct notifier_block *unused, unsigned long event,
void *ptr)
{
struct ovs_net *ovs_net;
- struct net_device *dev = ptr;
+ struct net_device *dev = netdev_notifier_info_to_dev(ptr);
struct vport *vport = NULL;
if (!ovs_is_internal_dev(dev))
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index b15321a..5c519b1 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -40,6 +40,7 @@
#include <linux/icmpv6.h>
#include <linux/rculist.h>
#include <net/ip.h>
+#include <net/ip_tunnels.h>
#include <net/ipv6.h>
#include <net/ndisc.h>
@@ -198,20 +199,18 @@ void ovs_flow_used(struct sw_flow *flow, struct sk_buff *skb)
spin_unlock(&flow->lock);
}
-struct sw_flow_actions *ovs_flow_actions_alloc(const struct nlattr *actions)
+struct sw_flow_actions *ovs_flow_actions_alloc(int size)
{
- int actions_len = nla_len(actions);
struct sw_flow_actions *sfa;
- if (actions_len > MAX_ACTIONS_BUFSIZE)
+ if (size > MAX_ACTIONS_BUFSIZE)
return ERR_PTR(-EINVAL);
- sfa = kmalloc(sizeof(*sfa) + actions_len, GFP_KERNEL);
+ sfa = kmalloc(sizeof(*sfa) + size, GFP_KERNEL);
if (!sfa)
return ERR_PTR(-ENOMEM);
- sfa->actions_len = actions_len;
- nla_memcpy(sfa->actions, actions, actions_len);
+ sfa->actions_len = 0;
return sfa;
}
@@ -354,6 +353,14 @@ struct sw_flow *ovs_flow_tbl_next(struct flow_table *table, u32 *bucket, u32 *la
return NULL;
}
+static void __flow_tbl_insert(struct flow_table *table, struct sw_flow *flow)
+{
+ struct hlist_head *head;
+ head = find_bucket(table, flow->hash);
+ hlist_add_head_rcu(&flow->hash_node[table->node_ver], head);
+ table->count++;
+}
+
static void flow_table_copy_flows(struct flow_table *old, struct flow_table *new)
{
int old_ver;
@@ -370,7 +377,7 @@ static void flow_table_copy_flows(struct flow_table *old, struct flow_table *new
head = flex_array_get(old->buckets, i);
hlist_for_each_entry(flow, head, hash_node[old_ver])
- ovs_flow_tbl_insert(new, flow);
+ __flow_tbl_insert(new, flow);
}
old->keep_flows = true;
}
@@ -590,10 +597,10 @@ out:
* - skb->network_header: just past the Ethernet header, or just past the
* VLAN header, to the first byte of the Ethernet payload.
*
- * - skb->transport_header: If key->dl_type is ETH_P_IP or ETH_P_IPV6
+ * - skb->transport_header: If key->eth.type is ETH_P_IP or ETH_P_IPV6
* on output, then just past the IP header, if one is present and
* of a correct length, otherwise the same as skb->network_header.
- * For other key->dl_type values it is left untouched.
+ * For other key->eth.type values it is left untouched.
*/
int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key,
int *key_lenp)
@@ -605,6 +612,8 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key,
memset(key, 0, sizeof(*key));
key->phy.priority = skb->priority;
+ if (OVS_CB(skb)->tun_key)
+ memcpy(&key->tun_key, OVS_CB(skb)->tun_key, sizeof(key->tun_key));
key->phy.in_port = in_port;
key->phy.skb_mark = skb->mark;
@@ -618,6 +627,9 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key,
memcpy(key->eth.dst, eth->h_dest, ETH_ALEN);
__skb_pull(skb, 2 * ETH_ALEN);
+ /* We are going to push all headers that we pull, so no need to
+ * update skb->csum here.
+ */
if (vlan_tx_tag_present(skb))
key->eth.tci = htons(skb->vlan_tci);
@@ -759,9 +771,18 @@ out:
return error;
}
-u32 ovs_flow_hash(const struct sw_flow_key *key, int key_len)
+static u32 ovs_flow_hash(const struct sw_flow_key *key, int key_start, int key_len)
+{
+ return jhash2((u32 *)((u8 *)key + key_start),
+ DIV_ROUND_UP(key_len - key_start, sizeof(u32)), 0);
+}
+
+static int flow_key_start(struct sw_flow_key *key)
{
- return jhash2((u32 *)key, DIV_ROUND_UP(key_len, sizeof(u32)), 0);
+ if (key->tun_key.ipv4_dst)
+ return 0;
+ else
+ return offsetof(struct sw_flow_key, phy);
}
struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *table,
@@ -769,28 +790,31 @@ struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *table,
{
struct sw_flow *flow;
struct hlist_head *head;
+ u8 *_key;
+ int key_start;
u32 hash;
- hash = ovs_flow_hash(key, key_len);
+ key_start = flow_key_start(key);
+ hash = ovs_flow_hash(key, key_start, key_len);
+ _key = (u8 *) key + key_start;
head = find_bucket(table, hash);
hlist_for_each_entry_rcu(flow, head, hash_node[table->node_ver]) {
if (flow->hash == hash &&
- !memcmp(&flow->key, key, key_len)) {
+ !memcmp((u8 *)&flow->key + key_start, _key, key_len - key_start)) {
return flow;
}
}
return NULL;
}
-void ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow)
+void ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow,
+ struct sw_flow_key *key, int key_len)
{
- struct hlist_head *head;
-
- head = find_bucket(table, flow->hash);
- hlist_add_head_rcu(&flow->hash_node[table->node_ver], head);
- table->count++;
+ flow->hash = ovs_flow_hash(key, flow_key_start(key), key_len);
+ memcpy(&flow->key, key, sizeof(flow->key));
+ __flow_tbl_insert(table, flow);
}
void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow)
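Since tun_key is now the first member of struct sw_flow_key (see flow.h below), a flow with no tunnel metadata would otherwise hash and memcmp() a leading block of zeroes on every lookup. flow_key_start() avoids that by starting both operations at ->phy unless tun_key.ipv4_dst is set. A short sketch of the invariant, using only functions from this hunk:

/* Non-tunneled key: start == offsetof(struct sw_flow_key, phy),
 * so the zeroed tunnel prefix is skipped entirely. */
int start = flow_key_start(key);
u32 hash  = ovs_flow_hash(key, start, key_len);
bool same = !memcmp((u8 *)&flow->key + start, (u8 *)key + start,
		    key_len - start);
/* Tunneled key (tun_key.ipv4_dst != 0): start == 0, so the tunnel
 * fields participate fully in hashing and comparison. */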
@@ -817,6 +841,7 @@ const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
[OVS_KEY_ATTR_ICMPV6] = sizeof(struct ovs_key_icmpv6),
[OVS_KEY_ATTR_ARP] = sizeof(struct ovs_key_arp),
[OVS_KEY_ATTR_ND] = sizeof(struct ovs_key_nd),
+ [OVS_KEY_ATTR_TUNNEL] = -1,
};
static int ipv4_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_len,
@@ -954,6 +979,105 @@ static int parse_flow_nlattrs(const struct nlattr *attr,
return 0;
}
+int ovs_ipv4_tun_from_nlattr(const struct nlattr *attr,
+ struct ovs_key_ipv4_tunnel *tun_key)
+{
+ struct nlattr *a;
+ int rem;
+ bool ttl = false;
+
+ memset(tun_key, 0, sizeof(*tun_key));
+
+ nla_for_each_nested(a, attr, rem) {
+ int type = nla_type(a);
+ static const u32 ovs_tunnel_key_lens[OVS_TUNNEL_KEY_ATTR_MAX + 1] = {
+ [OVS_TUNNEL_KEY_ATTR_ID] = sizeof(u64),
+ [OVS_TUNNEL_KEY_ATTR_IPV4_SRC] = sizeof(u32),
+ [OVS_TUNNEL_KEY_ATTR_IPV4_DST] = sizeof(u32),
+ [OVS_TUNNEL_KEY_ATTR_TOS] = 1,
+ [OVS_TUNNEL_KEY_ATTR_TTL] = 1,
+ [OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT] = 0,
+ [OVS_TUNNEL_KEY_ATTR_CSUM] = 0,
+ };
+
+ if (type > OVS_TUNNEL_KEY_ATTR_MAX ||
+ ovs_tunnel_key_lens[type] != nla_len(a))
+ return -EINVAL;
+
+ switch (type) {
+ case OVS_TUNNEL_KEY_ATTR_ID:
+ tun_key->tun_id = nla_get_be64(a);
+ tun_key->tun_flags |= TUNNEL_KEY;
+ break;
+ case OVS_TUNNEL_KEY_ATTR_IPV4_SRC:
+ tun_key->ipv4_src = nla_get_be32(a);
+ break;
+ case OVS_TUNNEL_KEY_ATTR_IPV4_DST:
+ tun_key->ipv4_dst = nla_get_be32(a);
+ break;
+ case OVS_TUNNEL_KEY_ATTR_TOS:
+ tun_key->ipv4_tos = nla_get_u8(a);
+ break;
+ case OVS_TUNNEL_KEY_ATTR_TTL:
+ tun_key->ipv4_ttl = nla_get_u8(a);
+ ttl = true;
+ break;
+ case OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT:
+ tun_key->tun_flags |= TUNNEL_DONT_FRAGMENT;
+ break;
+ case OVS_TUNNEL_KEY_ATTR_CSUM:
+ tun_key->tun_flags |= TUNNEL_CSUM;
+ break;
+ default:
+ return -EINVAL;
+
+ }
+ }
+ if (rem > 0)
+ return -EINVAL;
+
+ if (!tun_key->ipv4_dst)
+ return -EINVAL;
+
+ if (!ttl)
+ return -EINVAL;
+
+ return 0;
+}
+
+int ovs_ipv4_tun_to_nlattr(struct sk_buff *skb,
+ const struct ovs_key_ipv4_tunnel *tun_key)
+{
+ struct nlattr *nla;
+
+ nla = nla_nest_start(skb, OVS_KEY_ATTR_TUNNEL);
+ if (!nla)
+ return -EMSGSIZE;
+
+ if (tun_key->tun_flags & TUNNEL_KEY &&
+ nla_put_be64(skb, OVS_TUNNEL_KEY_ATTR_ID, tun_key->tun_id))
+ return -EMSGSIZE;
+ if (tun_key->ipv4_src &&
+ nla_put_be32(skb, OVS_TUNNEL_KEY_ATTR_IPV4_SRC, tun_key->ipv4_src))
+ return -EMSGSIZE;
+ if (nla_put_be32(skb, OVS_TUNNEL_KEY_ATTR_IPV4_DST, tun_key->ipv4_dst))
+ return -EMSGSIZE;
+ if (tun_key->ipv4_tos &&
+ nla_put_u8(skb, OVS_TUNNEL_KEY_ATTR_TOS, tun_key->ipv4_tos))
+ return -EMSGSIZE;
+ if (nla_put_u8(skb, OVS_TUNNEL_KEY_ATTR_TTL, tun_key->ipv4_ttl))
+ return -EMSGSIZE;
+ if ((tun_key->tun_flags & TUNNEL_DONT_FRAGMENT) &&
+ nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT))
+ return -EMSGSIZE;
+ if ((tun_key->tun_flags & TUNNEL_CSUM) &&
+ nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_CSUM))
+ return -EMSGSIZE;
+
+ nla_nest_end(skb, nla);
+ return 0;
+}
+
/**
* ovs_flow_from_nlattrs - parses Netlink attributes into a flow key.
* @swkey: receives the extracted flow key.
@@ -996,6 +1120,14 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
attrs &= ~(1 << OVS_KEY_ATTR_SKB_MARK);
}
+ if (attrs & (1 << OVS_KEY_ATTR_TUNNEL)) {
+ err = ovs_ipv4_tun_from_nlattr(a[OVS_KEY_ATTR_TUNNEL], &swkey->tun_key);
+ if (err)
+ return err;
+
+ attrs &= ~(1 << OVS_KEY_ATTR_TUNNEL);
+ }
+
/* Data attributes. */
if (!(attrs & (1 << OVS_KEY_ATTR_ETHERNET)))
return -EINVAL;
@@ -1122,10 +1254,9 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
/**
* ovs_flow_metadata_from_nlattrs - parses Netlink attributes into a flow key.
- * @priority: receives the skb priority
- * @mark: receives the skb mark
- * @in_port: receives the extracted input port.
- * @key: Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink attribute
+ * @flow: Receives extracted in_port, priority, tun_key and skb_mark.
+ * @key_len: Length of key in @flow. Used for calculating flow hash.
+ * @attr: Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink attribute
* sequence.
*
* This parses a series of Netlink attributes that form a flow key, which must
@@ -1133,42 +1264,56 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
* get the metadata, that is, the parts of the flow key that cannot be
* extracted from the packet itself.
*/
-int ovs_flow_metadata_from_nlattrs(u32 *priority, u32 *mark, u16 *in_port,
- const struct nlattr *attr)
+int ovs_flow_metadata_from_nlattrs(struct sw_flow *flow, int key_len,
+ const struct nlattr *attr)
{
+ struct ovs_key_ipv4_tunnel *tun_key = &flow->key.tun_key;
const struct nlattr *nla;
int rem;
- *in_port = DP_MAX_PORTS;
- *priority = 0;
- *mark = 0;
+ flow->key.phy.in_port = DP_MAX_PORTS;
+ flow->key.phy.priority = 0;
+ flow->key.phy.skb_mark = 0;
+ memset(tun_key, 0, sizeof(flow->key.tun_key));
nla_for_each_nested(nla, attr, rem) {
int type = nla_type(nla);
if (type <= OVS_KEY_ATTR_MAX && ovs_key_lens[type] > 0) {
+ int err;
+
if (nla_len(nla) != ovs_key_lens[type])
return -EINVAL;
switch (type) {
case OVS_KEY_ATTR_PRIORITY:
- *priority = nla_get_u32(nla);
+ flow->key.phy.priority = nla_get_u32(nla);
+ break;
+
+ case OVS_KEY_ATTR_TUNNEL:
+ err = ovs_ipv4_tun_from_nlattr(nla, tun_key);
+ if (err)
+ return err;
break;
case OVS_KEY_ATTR_IN_PORT:
if (nla_get_u32(nla) >= DP_MAX_PORTS)
return -EINVAL;
- *in_port = nla_get_u32(nla);
+ flow->key.phy.in_port = nla_get_u32(nla);
break;
case OVS_KEY_ATTR_SKB_MARK:
- *mark = nla_get_u32(nla);
+ flow->key.phy.skb_mark = nla_get_u32(nla);
break;
}
}
}
if (rem)
return -EINVAL;
+
+ flow->hash = ovs_flow_hash(&flow->key,
+ flow_key_start(&flow->key), key_len);
+
return 0;
}
@@ -1181,6 +1326,10 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, swkey->phy.priority))
goto nla_put_failure;
+ if (swkey->tun_key.ipv4_dst &&
+ ovs_ipv4_tun_to_nlattr(skb, &swkey->tun_key))
+ goto nla_put_failure;
+
if (swkey->phy.in_port != DP_MAX_PORTS &&
nla_put_u32(skb, OVS_KEY_ATTR_IN_PORT, swkey->phy.in_port))
goto nla_put_failure;
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 0875fde..66ef722 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -40,7 +40,38 @@ struct sw_flow_actions {
struct nlattr actions[];
};
+/* Used to memset ovs_key_ipv4_tunnel padding. */
+#define OVS_TUNNEL_KEY_SIZE \
+ (offsetof(struct ovs_key_ipv4_tunnel, ipv4_ttl) + \
+ FIELD_SIZEOF(struct ovs_key_ipv4_tunnel, ipv4_ttl))
+
+struct ovs_key_ipv4_tunnel {
+ __be64 tun_id;
+ __be32 ipv4_src;
+ __be32 ipv4_dst;
+ __be16 tun_flags;
+ u8 ipv4_tos;
+ u8 ipv4_ttl;
+};
+
+static inline void ovs_flow_tun_key_init(struct ovs_key_ipv4_tunnel *tun_key,
+ const struct iphdr *iph, __be64 tun_id,
+ __be16 tun_flags)
+{
+ tun_key->tun_id = tun_id;
+ tun_key->ipv4_src = iph->saddr;
+ tun_key->ipv4_dst = iph->daddr;
+ tun_key->ipv4_tos = iph->tos;
+ tun_key->ipv4_ttl = iph->ttl;
+ tun_key->tun_flags = tun_flags;
+
+ /* clear struct padding. */
+ memset((unsigned char *) tun_key + OVS_TUNNEL_KEY_SIZE, 0,
+ sizeof(*tun_key) - OVS_TUNNEL_KEY_SIZE);
+}
+
struct sw_flow_key {
+ struct ovs_key_ipv4_tunnel tun_key; /* Encapsulating tunnel key. */
struct {
u32 priority; /* Packet QoS priority. */
u32 skb_mark; /* SKB mark. */
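The OVS_TUNNEL_KEY_SIZE memset above exists because flow keys are compared with memcmp() (see the lookup in flow.c): sizeof(struct ovs_key_ipv4_tunnel) is padded past ipv4_ttl for alignment, and uninitialized padding would make logically identical keys compare unequal. A sketch of the failure it prevents, with iph, tun_id, and flags assumed in scope:

struct ovs_key_ipv4_tunnel a, b;	/* stack garbage, incl. padding */

ovs_flow_tun_key_init(&a, iph, tun_id, flags);
ovs_flow_tun_key_init(&b, iph, tun_id, flags);

/* Holds only because the trailing padding was zeroed: */
BUG_ON(memcmp(&a, &b, sizeof(a)) != 0);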
@@ -130,7 +161,7 @@ struct sw_flow *ovs_flow_alloc(void);
void ovs_flow_deferred_free(struct sw_flow *);
void ovs_flow_free(struct sw_flow *flow);
-struct sw_flow_actions *ovs_flow_actions_alloc(const struct nlattr *);
+struct sw_flow_actions *ovs_flow_actions_alloc(int actions_len);
void ovs_flow_deferred_free_acts(struct sw_flow_actions *);
int ovs_flow_extract(struct sk_buff *, u16 in_port, struct sw_flow_key *,
@@ -141,10 +172,10 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies);
int ovs_flow_to_nlattrs(const struct sw_flow_key *, struct sk_buff *);
int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
const struct nlattr *);
-int ovs_flow_metadata_from_nlattrs(u32 *priority, u32 *mark, u16 *in_port,
- const struct nlattr *);
+int ovs_flow_metadata_from_nlattrs(struct sw_flow *flow, int key_len,
+ const struct nlattr *attr);
-#define MAX_ACTIONS_BUFSIZE (16 * 1024)
+#define MAX_ACTIONS_BUFSIZE (32 * 1024)
#define TBL_MIN_BUCKETS 1024
struct flow_table {
@@ -173,11 +204,15 @@ void ovs_flow_tbl_deferred_destroy(struct flow_table *table);
struct flow_table *ovs_flow_tbl_alloc(int new_size);
struct flow_table *ovs_flow_tbl_expand(struct flow_table *table);
struct flow_table *ovs_flow_tbl_rehash(struct flow_table *table);
-void ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow);
+void ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow,
+ struct sw_flow_key *key, int key_len);
void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow);
-u32 ovs_flow_hash(const struct sw_flow_key *key, int key_len);
struct sw_flow *ovs_flow_tbl_next(struct flow_table *table, u32 *bucket, u32 *idx);
extern const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1];
+int ovs_ipv4_tun_from_nlattr(const struct nlattr *attr,
+ struct ovs_key_ipv4_tunnel *tun_key);
+int ovs_ipv4_tun_to_nlattr(struct sk_buff *skb,
+ const struct ovs_key_ipv4_tunnel *tun_key);
#endif /* flow.h */
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
new file mode 100644
index 0000000..493e977
--- /dev/null
+++ b/net/openvswitch/vport-gre.c
@@ -0,0 +1,275 @@
+/*
+ * Copyright (c) 2007-2013 Nicira, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ */
+
+#ifdef CONFIG_OPENVSWITCH_GRE
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/if.h>
+#include <linux/skbuff.h>
+#include <linux/ip.h>
+#include <linux/if_tunnel.h>
+#include <linux/if_vlan.h>
+#include <linux/in.h>
+#include <linux/if_vlan.h>
+#include <linux/in.h>
+#include <linux/in_route.h>
+#include <linux/inetdevice.h>
+#include <linux/jhash.h>
+#include <linux/list.h>
+#include <linux/kernel.h>
+#include <linux/workqueue.h>
+#include <linux/rculist.h>
+#include <net/route.h>
+#include <net/xfrm.h>
+
+#include <net/icmp.h>
+#include <net/ip.h>
+#include <net/ip_tunnels.h>
+#include <net/gre.h>
+#include <net/net_namespace.h>
+#include <net/netns/generic.h>
+#include <net/protocol.h>
+
+#include "datapath.h"
+#include "vport.h"
+
+/* Returns the least-significant 32 bits of a __be64. */
+static __be32 be64_get_low32(__be64 x)
+{
+#ifdef __BIG_ENDIAN
+ return (__force __be32)x;
+#else
+ return (__force __be32)((__force u64)x >> 32);
+#endif
+}
+
+static __be16 filter_tnl_flags(__be16 flags)
+{
+ return flags & (TUNNEL_CSUM | TUNNEL_KEY);
+}
+
+static struct sk_buff *__build_header(struct sk_buff *skb,
+ int tunnel_hlen)
+{
+ const struct ovs_key_ipv4_tunnel *tun_key = OVS_CB(skb)->tun_key;
+ struct tnl_ptk_info tpi;
+
+ skb = gre_handle_offloads(skb, !!(tun_key->tun_flags & TUNNEL_CSUM));
+ if (IS_ERR(skb))
+ return NULL;
+
+ tpi.flags = filter_tnl_flags(tun_key->tun_flags);
+ tpi.proto = htons(ETH_P_TEB);
+ tpi.key = be64_get_low32(tun_key->tun_id);
+ tpi.seq = 0;
+ gre_build_header(skb, &tpi, tunnel_hlen);
+
+ return skb;
+}
+
+static __be64 key_to_tunnel_id(__be32 key, __be32 seq)
+{
+#ifdef __BIG_ENDIAN
+ return (__force __be64)((__force u64)seq << 32 | (__force u32)key);
+#else
+ return (__force __be64)((__force u64)key << 32 | (__force u32)seq);
+#endif
+}
+
+/* Called with rcu_read_lock and BH disabled. */
+static int gre_rcv(struct sk_buff *skb,
+ const struct tnl_ptk_info *tpi)
+{
+ struct ovs_key_ipv4_tunnel tun_key;
+ struct ovs_net *ovs_net;
+ struct vport *vport;
+ __be64 key;
+
+ ovs_net = net_generic(dev_net(skb->dev), ovs_net_id);
+ vport = rcu_dereference(ovs_net->vport_net.gre_vport);
+ if (unlikely(!vport))
+ return PACKET_REJECT;
+
+ key = key_to_tunnel_id(tpi->key, tpi->seq);
+ ovs_flow_tun_key_init(&tun_key, ip_hdr(skb), key,
+ filter_tnl_flags(tpi->flags));
+
+ ovs_vport_receive(vport, skb, &tun_key);
+ return PACKET_RCVD;
+}
+
+static int gre_tnl_send(struct vport *vport, struct sk_buff *skb)
+{
+ struct net *net = ovs_dp_get_net(vport->dp);
+ struct flowi4 fl;
+ struct rtable *rt;
+ int min_headroom;
+ int tunnel_hlen;
+ __be16 df;
+ int err;
+
+ if (unlikely(!OVS_CB(skb)->tun_key)) {
+ err = -EINVAL;
+ goto error;
+ }
+
+ /* Route lookup */
+ memset(&fl, 0, sizeof(fl));
+ fl.daddr = OVS_CB(skb)->tun_key->ipv4_dst;
+ fl.saddr = OVS_CB(skb)->tun_key->ipv4_src;
+ fl.flowi4_tos = RT_TOS(OVS_CB(skb)->tun_key->ipv4_tos);
+ fl.flowi4_mark = skb->mark;
+ fl.flowi4_proto = IPPROTO_GRE;
+
+ rt = ip_route_output_key(net, &fl);
+ if (IS_ERR(rt))
+ return PTR_ERR(rt);
+
+ tunnel_hlen = ip_gre_calc_hlen(OVS_CB(skb)->tun_key->tun_flags);
+
+ min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
+ + tunnel_hlen + sizeof(struct iphdr)
+ + (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
+ if (skb_headroom(skb) < min_headroom || skb_header_cloned(skb)) {
+ int head_delta = SKB_DATA_ALIGN(min_headroom -
+ skb_headroom(skb) +
+ 16);
+ err = pskb_expand_head(skb, max_t(int, head_delta, 0),
+ 0, GFP_ATOMIC);
+ if (unlikely(err))
+ goto err_free_rt;
+ }
+
+ if (vlan_tx_tag_present(skb)) {
+ if (unlikely(!__vlan_put_tag(skb,
+ skb->vlan_proto,
+ vlan_tx_tag_get(skb)))) {
+ err = -ENOMEM;
+ goto err_free_rt;
+ }
+ skb->vlan_tci = 0;
+ }
+
+ /* Push Tunnel header. */
+ skb = __build_header(skb, tunnel_hlen);
+ if (unlikely(!skb)) {
+ err = 0;
+ goto err_free_rt;
+ }
+
+ df = OVS_CB(skb)->tun_key->tun_flags & TUNNEL_DONT_FRAGMENT ?
+ htons(IP_DF) : 0;
+
+ skb->local_df = 1;
+
+ return iptunnel_xmit(net, rt, skb, fl.saddr,
+ OVS_CB(skb)->tun_key->ipv4_dst, IPPROTO_GRE,
+ OVS_CB(skb)->tun_key->ipv4_tos,
+ OVS_CB(skb)->tun_key->ipv4_ttl, df);
+err_free_rt:
+ ip_rt_put(rt);
+error:
+ return err;
+}
+
+static struct gre_cisco_protocol gre_protocol = {
+ .handler = gre_rcv,
+ .priority = 1,
+};
+
+static int gre_ports;
+static int gre_init(void)
+{
+ int err;
+
+ gre_ports++;
+ if (gre_ports > 1)
+ return 0;
+
+ err = gre_cisco_register(&gre_protocol);
+ if (err)
+ pr_warn("cannot register gre protocol handler\n");
+
+ return err;
+}
+
+static void gre_exit(void)
+{
+ gre_ports--;
+ if (gre_ports > 0)
+ return;
+
+ gre_cisco_unregister(&gre_protocol);
+}
+
+static const char *gre_get_name(const struct vport *vport)
+{
+ return vport_priv(vport);
+}
+
+static struct vport *gre_create(const struct vport_parms *parms)
+{
+ struct net *net = ovs_dp_get_net(parms->dp);
+ struct ovs_net *ovs_net;
+ struct vport *vport;
+ int err;
+
+ err = gre_init();
+ if (err)
+ return ERR_PTR(err);
+
+ ovs_net = net_generic(net, ovs_net_id);
+ if (ovsl_dereference(ovs_net->vport_net.gre_vport)) {
+ vport = ERR_PTR(-EEXIST);
+ goto error;
+ }
+
+ vport = ovs_vport_alloc(IFNAMSIZ, &ovs_gre_vport_ops, parms);
+ if (IS_ERR(vport))
+ goto error;
+
+ strncpy(vport_priv(vport), parms->name, IFNAMSIZ);
+ rcu_assign_pointer(ovs_net->vport_net.gre_vport, vport);
+ return vport;
+
+error:
+ gre_exit();
+ return vport;
+}
+
+static void gre_tnl_destroy(struct vport *vport)
+{
+ struct net *net = ovs_dp_get_net(vport->dp);
+ struct ovs_net *ovs_net;
+
+ ovs_net = net_generic(net, ovs_net_id);
+
+ rcu_assign_pointer(ovs_net->vport_net.gre_vport, NULL);
+ ovs_vport_deferred_free(vport);
+ gre_exit();
+}
+
+const struct vport_ops ovs_gre_vport_ops = {
+ .type = OVS_VPORT_TYPE_GRE,
+ .create = gre_create,
+ .destroy = gre_tnl_destroy,
+ .get_name = gre_get_name,
+ .send = gre_tnl_send,
+};
+
+#endif /* OPENVSWITCH_GRE */
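A quick sanity check on the two endian helpers in this file: they are inverses, with the 32-bit GRE key occupying the low half of the logical 64-bit OVS tunnel id and the GRE sequence number the high half. A worked round-trip with an illustrative value:

__be32 key = htonl(42);
__be64 tun_id = key_to_tunnel_id(key, 0);	/* be64_to_cpu(tun_id) == 42 */

/* __build_header() recovers the key on transmit the same way: */
BUG_ON(be64_get_low32(tun_id) != key);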
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 84e0a03..98d3edb 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -67,7 +67,7 @@ static struct rtnl_link_stats64 *internal_dev_get_stats(struct net_device *netde
static int internal_dev_xmit(struct sk_buff *skb, struct net_device *netdev)
{
rcu_read_lock();
- ovs_vport_receive(internal_dev_priv(netdev)->vport, skb);
+ ovs_vport_receive(internal_dev_priv(netdev)->vport, skb, NULL);
rcu_read_unlock();
return 0;
}
@@ -221,6 +221,7 @@ static int internal_dev_recv(struct vport *vport, struct sk_buff *skb)
skb->dev = netdev;
skb->pkt_type = PACKET_HOST;
skb->protocol = eth_type_trans(skb, netdev);
+ skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
netif_rx(skb);
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 4f01c6d..5982f3f 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -49,7 +49,9 @@ static void netdev_port_receive(struct vport *vport, struct sk_buff *skb)
return;
skb_push(skb, ETH_HLEN);
- ovs_vport_receive(vport, skb);
+ ovs_skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
+
+ ovs_vport_receive(vport, skb, NULL);
return;
error:
@@ -170,7 +172,7 @@ static int netdev_send(struct vport *vport, struct sk_buff *skb)
net_warn_ratelimited("%s: dropped over-mtu packet: %d > %d\n",
netdev_vport->dev->name,
packet_length(skb), mtu);
- goto error;
+ goto drop;
}
skb->dev = netdev_vport->dev;
@@ -179,9 +181,8 @@ static int netdev_send(struct vport *vport, struct sk_buff *skb)
return len;
-error:
+drop:
kfree_skb(skb);
- ovs_vport_record_error(vport, VPORT_E_TX_DROPPED);
return 0;
}
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index a3cb3a3..dd298b5 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -39,6 +39,5 @@ netdev_vport_priv(const struct vport *vport)
}
const char *ovs_netdev_get_name(const struct vport *);
-const char *ovs_netdev_get_config(const struct vport *);
#endif /* vport_netdev.h */
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 7206231..d4c7fa0 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -38,6 +38,10 @@
static const struct vport_ops *vport_ops_list[] = {
&ovs_netdev_vport_ops,
&ovs_internal_vport_ops,
+
+#ifdef CONFIG_OPENVSWITCH_GRE
+ &ovs_gre_vport_ops,
+#endif
};
/* Protected by RCU read lock for reading, ovs_mutex for writing. */
@@ -325,7 +329,8 @@ int ovs_vport_get_options(const struct vport *vport, struct sk_buff *skb)
* Must be called with rcu_read_lock. The packet cannot be shared and
* skb->data should point to the Ethernet header.
*/
-void ovs_vport_receive(struct vport *vport, struct sk_buff *skb)
+void ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
+ struct ovs_key_ipv4_tunnel *tun_key)
{
struct pcpu_tstats *stats;
@@ -335,6 +340,7 @@ void ovs_vport_receive(struct vport *vport, struct sk_buff *skb)
stats->rx_bytes += skb->len;
u64_stats_update_end(&stats->syncp);
+ OVS_CB(skb)->tun_key = tun_key;
ovs_dp_process_received_packet(vport, skb);
}
@@ -351,7 +357,7 @@ int ovs_vport_send(struct vport *vport, struct sk_buff *skb)
{
int sent = vport->ops->send(vport, skb);
- if (likely(sent)) {
+ if (likely(sent > 0)) {
struct pcpu_tstats *stats;
stats = this_cpu_ptr(vport->percpu_stats);
@@ -360,7 +366,12 @@ int ovs_vport_send(struct vport *vport, struct sk_buff *skb)
stats->tx_packets++;
stats->tx_bytes += sent;
u64_stats_update_end(&stats->syncp);
- }
+ } else if (sent < 0) {
+ ovs_vport_record_error(vport, VPORT_E_TX_ERROR);
+ kfree_skb(skb);
+ } else
+ ovs_vport_record_error(vport, VPORT_E_TX_DROPPED);
+
return sent;
}
@@ -371,7 +382,7 @@ int ovs_vport_send(struct vport *vport, struct sk_buff *skb)
* @err_type: one of enum vport_err_type types to indicate the error type
*
* If using the vport generic stats layer indicate that an error of the given
- * type has occured.
+ * type has occurred.
*/
void ovs_vport_record_error(struct vport *vport, enum vport_err_type err_type)
{
@@ -397,3 +408,18 @@ void ovs_vport_record_error(struct vport *vport, enum vport_err_type err_type)
spin_unlock(&vport->stats_lock);
}
+
+static void free_vport_rcu(struct rcu_head *rcu)
+{
+ struct vport *vport = container_of(rcu, struct vport, rcu);
+
+ ovs_vport_free(vport);
+}
+
+void ovs_vport_deferred_free(struct vport *vport)
+{
+ if (!vport)
+ return;
+
+ call_rcu(&vport->rcu, free_vport_rcu);
+}
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index 68a377b..376045c 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -34,6 +34,11 @@ struct vport_parms;
/* The following definitions are for users of the vport subsytem: */
+/* The following definitions are for users of the vport subsytem: */
+struct vport_net {
+ struct vport __rcu *gre_vport;
+};
+
int ovs_vport_init(void);
void ovs_vport_exit(void);
@@ -123,9 +128,8 @@ struct vport_parms {
* existing vport to a &struct sk_buff. May be %NULL for a vport that does not
* have any configuration.
* @get_name: Get the device's name.
- * @get_config: Get the device's configuration.
- * May be null if the device does not have an ifindex.
- * @send: Send a packet on the device. Returns the length of the packet sent.
+ * @send: Send a packet on the device. Returns the length of the packet sent,
+ * zero for dropped packets or negative for error.
*/
struct vport_ops {
enum ovs_vport_type type;
@@ -139,7 +143,6 @@ struct vport_ops {
/* Called with rcu_read_lock or ovs_mutex. */
const char *(*get_name)(const struct vport *);
- void (*get_config)(const struct vport *, void *);
int (*send)(struct vport *, struct sk_buff *);
};
@@ -154,6 +157,7 @@ enum vport_err_type {
struct vport *ovs_vport_alloc(int priv_size, const struct vport_ops *,
const struct vport_parms *);
void ovs_vport_free(struct vport *);
+void ovs_vport_deferred_free(struct vport *vport);
#define VPORT_ALIGN 8
@@ -186,12 +190,21 @@ static inline struct vport *vport_from_priv(const void *priv)
return (struct vport *)(priv - ALIGN(sizeof(struct vport), VPORT_ALIGN));
}
-void ovs_vport_receive(struct vport *, struct sk_buff *);
+void ovs_vport_receive(struct vport *, struct sk_buff *,
+ struct ovs_key_ipv4_tunnel *);
void ovs_vport_record_error(struct vport *, enum vport_err_type err_type);
/* List of statically compiled vport implementations. Don't forget to also
* add yours to the list at the top of vport.c. */
extern const struct vport_ops ovs_netdev_vport_ops;
extern const struct vport_ops ovs_internal_vport_ops;
+extern const struct vport_ops ovs_gre_vport_ops;
+
+static inline void ovs_skb_postpush_rcsum(struct sk_buff *skb,
+ const void *start, unsigned int len)
+{
+ if (skb->ip_summed == CHECKSUM_COMPLETE)
+ skb->csum = csum_add(skb->csum, csum_partial(start, len, 0));
+}
#endif /* vport.h */