1. 19 Feb, 2018 1 commit
  2. 16 Feb, 2018 4 commits
    • ofp-meter: Fix use-after-free for decoding meter mods. · f4775bbd
      Ben Pfaff authored
      ofputil_pull_bands() may change bands->data.
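      This bug comes from keeping a pointer into a buffer across a call that
      may reallocate that buffer. The sketch below is not the OVS code:
      struct buf, buf_put() and decode_bands() are hypothetical stand-ins
      for struct ofpbuf and ofputil_pull_bands(). It saves an offset before
      the calls that may reallocate and re-derives the pointer from b->data
      only afterward.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer standing in for struct ofpbuf. */
struct buf {
    unsigned char *data;
    size_t size;        /* Bytes in use. */
    size_t allocated;   /* Bytes allocated. */
};

/* Appends 'n' bytes, reallocating when full.  Any pointer previously
 * derived from b->data may be dangling after this returns. */
static int buf_put(struct buf *b, const void *p, size_t n)
{
    if (b->size + n > b->allocated) {
        size_t new_alloc = (b->size + n) * 2;
        unsigned char *d = realloc(b->data, new_alloc);
        if (!d) {
            return -1;
        }
        b->data = d;
        b->allocated = new_alloc;
    }
    memcpy(b->data + b->size, p, n);
    b->size += n;
    return 0;
}

/* Hypothetical decoder: appends 'n' band records one at a time and
 * returns a pointer to the first one.  Computing 'b->data + start'
 * before the loop would reproduce the use-after-free; saving the offset
 * and deriving the pointer after the last buf_put() is safe. */
static const unsigned char *decode_bands(struct buf *b,
                                         const unsigned char *bands, size_t n)
{
    size_t start = b->size;

    for (size_t i = 0; i < n; i++) {
        if (buf_put(b, &bands[i], 1)) {
            return NULL;
        }
    }
    return b->data + start;
}
```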
      
      Found by libfuzzer-ngram.
      Reported-by: Bhargava Shastry <bshastry@sect.tu-berlin.de>
      Signed-off-by: Ben Pfaff <blp@ovn.org>
      Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
    • datapath: Remove padding from packet before L3+ conntrack processing · a948fa4b
      Ed Swierk authored
      Upstream commit:
          commit 9382fe71c0058465e942a633869629929102843d
          Author: Ed Swierk <eswierk@skyportsystems.com>
          Date:   Wed Jan 31 18:48:02 2018 -0800
      
          openvswitch: Remove padding from packet before L3+ conntrack processing
      
          IPv4 and IPv6 packets may arrive with lower-layer padding that is not
          included in the L3 length. For example, a short IPv4 packet may have
          up to 6 bytes of padding following the IP payload when received on an
          Ethernet device with a minimum packet length of 64 bytes.
      
          Higher-layer processing functions in netfilter (e.g. nf_ip_checksum(),
          and help() in nf_conntrack_ftp) assume skb->len reflects the length of
          the L3 header and payload, rather than referring back to
          ip_hdr->tot_len or ipv6_hdr->payload_len, and get confused by
          lower-layer padding.
      
          In the normal IPv4 receive path, ip_rcv() trims the packet to
          ip_hdr->tot_len before invoking netfilter hooks. In the IPv6 receive
          path, ip6_rcv() does the same using ipv6_hdr->payload_len. Similarly
          in the br_netfilter receive path, br_validate_ipv4() and
          br_validate_ipv6() trim the packet to the L3 length before invoking
          netfilter hooks.
      
          Currently in the OVS conntrack receive path, ovs_ct_execute() pulls
          the skb to the L3 header but does not trim it to the L3 length before
          calling nf_conntrack_in(NF_INET_PRE_ROUTING). When
          nf_conntrack_proto_tcp encounters a packet with lower-layer padding,
          nf_ip_checksum() fails, causing a "nf_ct_tcp: bad TCP checksum" log
          message. While extra zero bytes don't affect the checksum, the length
          in the IP pseudoheader does. That length is based on skb->len, and
          without trimming, it doesn't match the length the sender used when
          computing the checksum.
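      Here is a small illustration of why zero padding leaves the sum
      unchanged but the length does not. It is a sketch of the TCP checksum
      over an IPv4 pseudoheader, written from the RFC 793 / RFC 1071
      definitions rather than from netfilter's nf_ip_checksum(). The segment
      length serves both as the pseudoheader length field and as the number
      of bytes summed, which mirrors taking the length from skb->len.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's complement sum (RFC 1071), not yet complemented. */
static uint32_t ones_sum(const uint8_t *p, size_t len, uint32_t sum)
{
    while (len > 1) {
        sum += (uint32_t) ((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len) {
        sum += (uint32_t) (p[0] << 8);   /* Odd trailing byte, zero-padded. */
    }
    return sum;
}

/* Folds carries back in and complements the result. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* TCP checksum: pseudoheader (src, dst, zero, protocol 6, TCP length)
 * followed by 'seg_len' bytes of segment starting at 'seg'. */
static uint16_t tcp4_checksum(uint32_t src, uint32_t dst,
                              const uint8_t *seg, size_t seg_len)
{
    uint32_t sum = 0;

    sum += src >> 16;
    sum += src & 0xffff;
    sum += dst >> 16;
    sum += dst & 0xffff;
    sum += 6;                    /* IPPROTO_TCP */
    sum += (uint32_t) seg_len;   /* Pseudoheader length field. */
    return fold(ones_sum(seg, seg_len, sum));
}
```

      With a 20-byte segment followed by 6 bytes of zero padding, summing
      the extra zeros adds nothing. The pseudoheader length still differs
      by 6, so the checksum computed over the untrimmed length disagrees
      with the sender's.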
      
          In ovs_ct_execute(), trim the skb to the L3 length before higher-layer
          processing.
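      The kernel change operates on an skb. As an illustration only, the same
      length arithmetic can be sketched on a plain byte buffer. The helper
      names l3_declared_len() and trim_to_l3() are hypothetical. For IPv4
      the declared L3 length is tot_len. For IPv6 it is the 40-byte fixed
      header plus payload_len, which already counts any extension headers.
      Jumbograms are ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the L3 length (header plus payload)
 * declared by the IPv4 or IPv6 header at 'l3', or 0 if it cannot be
 * determined from the 'avail' bytes present. */
static size_t l3_declared_len(const uint8_t *l3, size_t avail)
{
    if (avail >= 20 && (l3[0] >> 4) == 4) {
        size_t tot_len = (size_t) ((l3[2] << 8) | l3[3]);
        return tot_len >= 20 ? tot_len : 0;
    }
    if (avail >= 40 && (l3[0] >> 4) == 6) {
        /* Fixed header plus payload_len. */
        return 40 + (size_t) ((l3[4] << 8) | l3[5]);
    }
    return 0;
}

/* Hypothetical helper: shrinks '*len' (bytes from the L3 header to the
 * end of the buffer) to the declared L3 length, discarding lower-layer
 * padding.  Returns -1 if the buffer is shorter than the header claims. */
static int trim_to_l3(const uint8_t *l3, size_t *len)
{
    size_t l3_len = l3_declared_len(l3, *len);

    if (l3_len == 0 || l3_len > *len) {
        return -1;
    }
    *len = l3_len;
    return 0;
}
```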
      Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      
      Cc: Ed Swierk <eswierk@skyportsystems.com>
      Signed-off-by: Greg Rose <gvrose8192@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
    • datapath: Fix pop_vlan action for double tagged frames · fcb77796
      Eric Garver authored
      Upstream commit:
          commit c48e74736fccf25fb32bb015426359e1c2016e3b
          Author: Eric Garver <e@erig.me>
          Date:   Wed Dec 20 15:09:22 2017 -0500
      
          openvswitch: Fix pop_vlan action for double tagged frames
      
          skb_vlan_pop() expects skb->protocol to be a valid TPID for double
          tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop()
          shift the true ethertype into position for us.
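      At the byte level, popping the outer tag of a double-tagged frame
      removes the 4-byte TPID/TCI pair that follows the MAC addresses. This
      moves the inner TPID, not the final ethertype, into the ethertype
      position. The sketch below shows that shift. It uses a hypothetical
      vlan_pop() on a raw frame rather than the kernel's skb_vlan_pop(), and
      accepts 0x8100 (802.1Q) and 0x88a8 (802.1ad) as TPIDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDRS_LEN 12   /* Destination MAC + source MAC. */
#define VLAN_TAG_LEN  4    /* TPID (2 bytes) + TCI (2 bytes). */

static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

/* Pops the outermost VLAN tag from 'frame' in place.  Returns the new
 * length, or 0 if the frame carries no tag.  On success '*proto' holds
 * the value now in the ethertype position: the inner TPID for a
 * double-tagged frame, or the real ethertype for a single-tagged one. */
static size_t vlan_pop(uint8_t *frame, size_t len, uint16_t *proto)
{
    if (len < ETH_ADDRS_LEN + VLAN_TAG_LEN + 2) {
        return 0;
    }
    uint16_t tpid = get_be16(frame + ETH_ADDRS_LEN);
    if (tpid != 0x8100 && tpid != 0x88a8) {
        return 0;
    }
    /* Shift everything after the tag left by 4 bytes. */
    memmove(frame + ETH_ADDRS_LEN, frame + ETH_ADDRS_LEN + VLAN_TAG_LEN,
            len - ETH_ADDRS_LEN - VLAN_TAG_LEN);
    *proto = get_be16(frame + ETH_ADDRS_LEN);
    return len - VLAN_TAG_LEN;
}
```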
      
          Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets")
      Signed-off-by: Eric Garver <e@erig.me>
      Reviewed-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      
      Cc: Eric Garver <e@erig.me>
      Fixes: a27c454e ("datapath: add processing of L3 packets")
      Signed-off-by: Greg Rose <gvrose8192@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
    • datapath: do not propagate headroom updates to internal port · 893b9906
      paolo abeni authored
      Upstream commit:
          commit 183dea5818315c0a172d21ecbcd2554894bf01e3
          Author: Paolo Abeni <pabeni@redhat.com>
          Date:   Thu Nov 30 15:35:33 2017 +0100
      
          openvswitch: do not propagate headroom updates to internal port
      
          After commit 3a927bc7cf9d ("ovs: propagate per dp max headroom to
          all vports"), the needed_headroom for the internal vport is updated
          according to the max needed headroom in its datapath.
      
          That avoids the pskb_expand_head() costs when sending/forwarding
          packets towards tunnel devices, at least for some scenarios.
      
          We still require such a copy when using the ovs-preferred configuration
          for vxlan tunnels:
      
              br_int
            /       \
          tap      vxlan
                     (remote_ip:X)
      
          br_phy
               \
              NIC
      
          where the route towards the IP 'X' is via 'br_phy'.
      
          When forwarding traffic from the tap towards the vxlan device, we
          will call pskb_expand_head() in vxlan_build_skb() because
          br-phy->needed_headroom is equal to tun->needed_headroom.
      
          With this change we avoid updating the internal vport needed_headroom,
          so that in the above scenario no head copy is needed, giving a 5%
          performance improvement in a UDP throughput test.
      
          As a trade-off, packets sent from the internal port towards a tunnel
          device will now experience the head copy overhead. The rationale is
          that the latter use-case is less relevant performance-wise.
      Signed-off-by: paolo abeni <pabeni@redhat.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      
      Cc: paolo abeni <pabeni@redhat.com>
      Signed-off-by: Greg Rose <gvrose8192@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
  3. 15 Feb, 2018 3 commits
  4. 13 Feb, 2018 2 commits
    • netdev-dpdk: Reintroduce shared mempools. · 6f202d85
      Ian Stokes authored
      This commit manually reverts the current per port mempool model to the
      previous shared mempool model for DPDK ports.
      
      OVS previously used a shared mempool model for ports with the same MTU
      configuration. This was replaced by a per port mempool model to address
      issues flagged by users such as:
      
      https://mail.openvswitch.org/pipermail/ovs-discuss/2016-September/042560.html
      
      However the per port model has a number of issues including:
      
      1. Requires an increase in memory resource requirements to support the same
      number of ports as the shared port model.
      2. Incorrect algorithm for mbuf provisioning for each mempool.
      
      These are considered blocking factors for current deployments of OVS when
      upgrading to OVS 2.9, as a user may have to provision more memory for the
      same deployment configuration. This may not be possible for some users.
      
      For clarity, the commits whose changes are removed include the
      following:
      
      netdev-dpdk: Create separate memory pool for each port: d555d9bd
      netdev-dpdk: fix management of pre-existing mempools: b6b26021
      Fix mempool names to reflect socket id: f06546a5
      netdev-dpdk: skip init for existing mempools: 837c1761
      netdev-dpdk: manage failure in mempool name creation: 65056fd7
      netdev-dpdk: Reword mp_size as n_mbufs: ad9b5b9b
      netdev-dpdk: Rename dpdk_mp_put as dpdk_mp_free: a08a115d
      netdev-dpdk: Fix mp_name leak on snprintf failure: ec6edc8c
      netdev-dpdk: Fix dpdk_mp leak in case of EEXIST: 173ef76b
      netdev-dpdk: Factor out struct dpdk_mp: 24e78f93
      netdev-dpdk: Remove unused MAX_NB_MBUF: bc57ed90
      netdev-dpdk: Fix mempool creation with large MTU: af5b0dad
      
      Due to the number of commits and period of time they were introduced
      over, a simple revert was not possible. All code from the commits above
      is removed and the shared mempool code reintroduced as it was before its
      replacement.
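      Here is a rough sketch of the shared model. The names are hypothetical
      and this is not the OVS or DPDK API. Mempools are kept in a list keyed
      by MTU and, as an assumption here, NUMA socket, with a reference
      count. Ports with the same configuration reuse one pool, and the pool
      is freed when the last port releases it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a DPDK mempool shared by all ports with the
 * same MTU on the same socket. */
struct shared_mp {
    int socket_id;
    int mtu;
    int refcount;
    struct shared_mp *next;
};

static struct shared_mp *mp_list;   /* All live shared mempools. */

/* Returns the mempool for (socket_id, mtu), reusing an existing one if
 * another port already created it; otherwise creates it. */
static struct shared_mp *mp_get(int socket_id, int mtu)
{
    for (struct shared_mp *mp = mp_list; mp; mp = mp->next) {
        if (mp->socket_id == socket_id && mp->mtu == mtu) {
            mp->refcount++;
            return mp;
        }
    }
    struct shared_mp *mp = calloc(1, sizeof *mp);
    if (!mp) {
        return NULL;
    }
    mp->socket_id = socket_id;
    mp->mtu = mtu;
    mp->refcount = 1;
    mp->next = mp_list;
    mp_list = mp;
    return mp;
}

/* Drops one reference and frees the mempool once no port uses it. */
static void mp_put(struct shared_mp *mp)
{
    if (!mp || --mp->refcount > 0) {
        return;
    }
    for (struct shared_mp **p = &mp_list; *p; p = &(*p)->next) {
        if (*p == mp) {
            *p = mp->next;
            break;
        }
    }
    free(mp);
}
```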
      
      Code introduced by commit
      
      netdev-dpdk: Add debug appctl to get mempool information: be481733
      
      has been modified to work with the shared mempool model.
      
      Cc: Antonio Fischetti <antonio.fischetti@gmail.com>
      Cc: Ilya Maximets <i.maximets@samsung.com>
      Cc: Kevin Traynor <ktraynor@redhat.com>
      Cc: Jan Scheurich <jan.scheurich@ericsson.com>
      Signed-off-by: Ian Stokes <ian.stokes@intel.com>
      Acked-by: Kevin Traynor <ktraynor@redhat.com>
      Tested-by: Kevin Traynor <ktraynor@redhat.com>
    • docs: Update supported DPDK versions. · c261dc72
      Ian Stokes authored
      Update the OVS to DPDK release table to use the latest stable
      DPDK 16.11.4 for OVS 2.7.
      Signed-off-by: Ian Stokes <ian.stokes@intel.com>
      Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
  5. 12 Feb, 2018 10 commits
  6. 09 Feb, 2018 1 commit
    • ovn: Allow DNS lookups over IPv6 · 0451eb1e
      Mark Michelson authored
      There was a bug in DNS request handling where the incoming packet was
      assumed to be IPv4.
      
      The result was that for the outgoing packet, we would attempt to write
      the IPv4 checksum and total length into what was actually an IPv6
      header. This resulted in the source IPv6 address getting corrupted.
      Later, the source and destination IPv6 addresses would get swapped,
      resulting in the DNS response being sent to a nonsense destination.
      
      With this change, we check the ethertype of the packet to determine what
      L3 information to write, and where to write it. A test is also included
      that verifies that this works as expected.
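      The fix can be sketched as checking the ethertype before touching any
      L3 field. In the IPv6 fixed header, bytes 8-23 hold the source
      address, so an IPv4 checksum written at offset 10 lands inside it.
      That matches the corruption described above. The helper
      fix_l3_lengths() is hypothetical and works on a raw header rather than
      on OVN's actual packet structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_TYPE_IP   0x0800
#define ETH_TYPE_IPV6 0x86dd

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t) (v >> 8);
    p[1] = (uint8_t) v;
}

/* IPv4 header checksum over 'hlen' bytes. */
static uint16_t ipv4_hdr_csum(const uint8_t *h, size_t hlen)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < hlen; i += 2) {
        sum += (uint32_t) ((h[i] << 8) | h[i + 1]);
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Hypothetical helper: after building a reply with 'l4_len' bytes of
 * UDP, update the L3 length fields according to the ethertype instead
 * of assuming IPv4.  Returns -1 for an unsupported ethertype. */
static int fix_l3_lengths(uint16_t eth_type, uint8_t *l3, size_t l4_len)
{
    if (eth_type == ETH_TYPE_IP) {
        size_t ihl = (size_t) (l3[0] & 0x0f) * 4;
        put_be16(l3 + 2, (uint16_t) (ihl + l4_len));   /* tot_len */
        put_be16(l3 + 10, 0);                          /* Clear checksum. */
        put_be16(l3 + 10, ipv4_hdr_csum(l3, ihl));
        return 0;
    }
    if (eth_type == ETH_TYPE_IPV6) {
        /* No header checksum in IPv6; only payload_len changes. */
        put_be16(l3 + 4, (uint16_t) l4_len);
        return 0;
    }
    return -1;
}
```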
      
      Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1539608
      Signed-off-by: Mark Michelson <mmichels@redhat.com>
      Signed-off-by: Ben Pfaff <blp@ovn.org>
  7. 08 Feb, 2018 5 commits
  8. 05 Feb, 2018 2 commits
  9. 01 Feb, 2018 3 commits
  10. 31 Jan, 2018 1 commit
    • netdev-dpdk: Add support for vHost dequeue zero copy (experimental) · a0b62aac
      Ciara Loftus authored
      Zero copy is disabled by default. To enable it, set the 'dq-zero-copy'
      option to 'true' when configuring the Interface:
      
      ovs-vsctl set Interface dpdkvhostuserclient0 \
          options:vhost-server-path=/tmp/dpdkvhostuserclient0 \
          options:dq-zero-copy=true
      
      When packets from a vHost device with zero copy enabled are destined for
      a single 'dpdk' port, the number of tx descriptors on that 'dpdk' port
      must be set to a smaller value. 128 is recommended. This can be achieved
      like so:
      
      ovs-vsctl set Interface dpdkport options:n_txq_desc=128
      
      Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send
      to should not exceed 128. Due to this requirement, the feature is
      considered 'experimental'.
      
      Testing of the patch showed a ~8% improvement when switching 512B
      packets between vHost devices on different VMs on the same host when
      zero copy was enabled on the transmitting device.
      Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
      Acked-by: Ilya Maximets <i.maximets@samsung.com>
      Signed-off-by: Ian Stokes <ian.stokes@intel.com>
  11. 26 Jan, 2018 7 commits
    • netdev-dpdk: Fix xstats leak on port destruction. · 448a1845
      Ilya Maximets authored
      CC: Michal Weglicki <michalx.weglicki@intel.com>
      Fixes: 971f4b39 ("netdev: Custom statistics.")
      Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
      Signed-off-by: Ian Stokes <ian.stokes@intel.com>
    • netdev-dpdk: Fix memory leak in netdev_dpdk_configure_xstats(). · 0804663d
      Ilya Maximets authored
      CC: Michal Weglicki <michalx.weglicki@intel.com>
      Fixes: 971f4b39 ("netdev: Custom statistics.")
      Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
      Signed-off-by: Ian Stokes <ian.stokes@intel.com>
    • netdev-dpdk: Fix memory leak in netdev_dpdk_get_custom_stats(). · 51be17b4
      Ilya Maximets authored
      CC: Michal Weglicki <michalx.weglicki@intel.com>
      Fixes: 971f4b39 ("netdev: Custom statistics.")
      Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
      Signed-off-by: Ian Stokes <ian.stokes@intel.com>
    • vswitchd: show DPDK version · 1de83cb0
      Matteo Croce authored
      Show DPDK version if Open vSwitch is compiled with DPDK support.
      Version can be retrieved with `ovs-vswitchd --version` or from OVS logs.
      A small change in ovs-ctl avoids breakage caused by the output change.
      Signed-off-by: Matteo Croce <mcroce@redhat.com>
      Acked-by: Kevin Traynor <ktraynor@redhat.com>
      Signed-off-by: Ian Stokes <ian.stokes@intel.com>
    • netdev-dpdk: fix port addition for ports sharing same PCI id · c7f98a77
      Yuanhan Liu authored
      Some NICs have only one PCI address associated with multiple ports. This
      patch extends the dpdk-devargs option's format to cater for such devices.
      
      To achieve that, this patch uses a new syntax that will be adopted and
      implemented in a future DPDK release (likely v18.05):
          http://dpdk.org/ml/archives/dev/2017-December/084234.html
      
      And since it's DPDK's duty to parse the (complete and full) syntax
      and this patch is more likely to serve as an intermediate workaround,
      here I take a simpler and shorter syntax from it (note that providing
      only one category is allowed):
          class=eth,mac=00:11:22:33:44:55
      
      Also, backward compatibility is kept: users can still use the PCI id
      to add a port (if that's enough for them), so this patch will not
      break anything.
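      DPDK is expected to parse the full syntax. Purely as an illustration,
      the sketch below tells the old PCI-id form apart from the short
      class=eth,mac=... form. parse_devargs() and struct devargs are
      hypothetical and are not the OVS or DPDK implementation. In this
      sketch, the MAC must have exactly six octets and class=eth may be
      omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parse result. */
struct devargs {
    bool by_mac;              /* True: "class=eth,mac=..." form. */
    unsigned char mac[6];
    char pci[32];             /* Otherwise: plain PCI address. */
};

/* Returns 0 on success, -1 on a malformed or unsupported string. */
static int parse_devargs(const char *s, struct devargs *out)
{
    memset(out, 0, sizeof *out);
    if (!strchr(s, '=')) {                   /* Old syntax: PCI id. */
        if (strlen(s) >= sizeof out->pci) {
            return -1;
        }
        strcpy(out->pci, s);
        return 0;
    }

    char buf[128];
    if (strlen(s) >= sizeof buf) {
        return -1;
    }
    strcpy(buf, s);
    for (char *tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
        if (!strcmp(tok, "class=eth")) {
            continue;
        }
        unsigned int m[6];
        char extra;
        /* Exactly six hex octets; a trailing character means failure. */
        if (sscanf(tok, "mac=%x:%x:%x:%x:%x:%x%c", &m[0], &m[1], &m[2],
                   &m[3], &m[4], &m[5], &extra) != 6) {
            return -1;
        }
        for (int i = 0; i < 6; i++) {
            if (m[i] > 0xff) {
                return -1;
            }
            out->mac[i] = (unsigned char) m[i];
        }
        out->by_mac = true;
    }
    return out->by_mac ? 0 : -1;             /* Need something to match on. */
}
```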
      
      This patch is largely based on the one from Ciara:
          https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339496.html
      
      Cc: Loftus Ciara <ciara.loftus@intel.com>
      Cc: Thomas Monjalon <thomas@monjalon.net>
      Cc: Kevin Traynor <ktraynor@redhat.com>
      Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
      Signed-off-by: Ian Stokes <ian.stokes@intel.com>
    • netdev-dpdk: Fix requested MTU size validation. · fa02b5bf
      Ian Stokes authored
      This commit replaces MTU_TO_FRAME_LEN(mtu) with MTU_TO_MAX_FRAME_LEN(mtu)
      in netdev_dpdk_set_mtu(), in order to determine if the total length of
      the L2 frame with an MTU of ’mtu’ exceeds NETDEV_DPDK_MAX_PKT_LEN.
      
      When setting an MTU we first check if the requested total frame length
      (which includes associated L2 overhead) will exceed the maximum
      frame length supported in netdev_dpdk_set_mtu(). The frame length is
      calculated by MTU_TO_FRAME_LEN as mtu + ETHER_HDR_LEN + ETHER_CRC_LEN. The MTU
      for the device will be set at a later stage in dpdk_eth_dev_init() using
      rte_eth_dev_set_mtu(mtu).
      
      However when using rte_eth_dev_set_mtu(mtu) the calculation used to check
      that the frame does not exceed the max frame length for that device varies
      between DPDK device drivers. For example, the ixgbe driver calculates the
      frame length for a given MTU as
      
      mtu + ETHER_HDR_LEN + ETHER_CRC_LEN
      
      the i40e driver calculates it as
      
      mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + I40E_VLAN_TAG_SIZE * 2
      
      and the em driver calculates it as
      
      mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_SIZE
      
      Currently it is possible to set an MTU for a netdev_dpdk device that exceeds
      the upper limit MTU for that device's DPDK driver. This leads to a segfault.
      This is because the frame length comparison, as currently implemented, does
      not account for the VLAN tag overhead expected by the drivers. The
      netdev_dpdk_set_mtu() call will incorrectly succeed but the subsequent
      dpdk_eth_dev_init() will fail before the queues have been created for the
      DPDK device. This coupled with assumptions regarding reconfiguration
      requirements for the netdev will lead to a segfault when the rxq is polled
      for this device.
      
      A simple way to avoid this is by using MTU_TO_MAX_FRAME_LEN(mtu) when
      validating a requested MTU in netdev_dpdk_set_mtu().
      MTU_TO_MAX_FRAME_LEN(mtu) is equivalent to the following:
      
      mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + (2 * VLAN_HEADER_LEN)
      
      By using MTU_TO_MAX_FRAME_LEN at the netdev_dpdk_set_mtu() stage, OVS
      now takes into account the maximum L2 overhead that a DPDK driver could
      allow for in its frame size calculation. This allows OVS to flag an error
      rather than the DPDK driver if the frame length exceeds the max DPDK frame
      length. OVS can fail gracefully at this point and use the default MTU of
      1500 to continue to configure the port.
      
      Note: this fix is a workaround; a better approach would be for DPDK
      devices to report the maximum MTU value that can be requested on a
      per-device basis. However, this capability is not currently available.
      A downside of this patch is that the MTU upper limit is reduced by 8
      bytes for DPDK devices that do not account for VLAN tags in their
      driver frame length calculations; e.g., from the OVS point of view the
      upper MTU limit for ixgbe devices is reduced from 9710 to 9702.
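      A short sketch can check the arithmetic. The header sizes are the
      standard Ethernet values named above. NETDEV_DPDK_MAX_PKT_LEN is set
      to 9728 here as an assumption, chosen because it reproduces the 9710
      and 9702 limits quoted above. mtu_ok_old() and mtu_ok_new() are
      hypothetical helpers that mirror the two checks.

```c
#include <assert.h>
#include <stdbool.h>

#define ETHER_HDR_LEN           14
#define ETHER_CRC_LEN           4
#define VLAN_HEADER_LEN         4
#define NETDEV_DPDK_MAX_PKT_LEN 9728   /* Assumed value, see above. */

#define MTU_TO_FRAME_LEN(mtu)     ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
#define MTU_TO_MAX_FRAME_LEN(mtu) (MTU_TO_FRAME_LEN(mtu) + 2 * VLAN_HEADER_LEN)

/* Old check: ignores the VLAN overhead some drivers add. */
static bool mtu_ok_old(int mtu)
{
    return MTU_TO_FRAME_LEN(mtu) <= NETDEV_DPDK_MAX_PKT_LEN;
}

/* New check: assumes the worst case of two VLAN tags. */
static bool mtu_ok_new(int mtu)
{
    return MTU_TO_MAX_FRAME_LEN(mtu) <= NETDEV_DPDK_MAX_PKT_LEN;
}
```

      The old check accepts MTU 9710 (9710 + 18 = 9728). A driver that adds
      two VLAN tags would then compute a 9736-byte frame. The new check
      caps the MTU at 9702 (9702 + 26 = 9728).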
      
      CC: Mark Kavanagh <mark.b.kavanagh@intel.com>
      Fixes: 0072e931 ("netdev-dpdk: add support for jumbo frames")
      Signed-off-by: Ian Stokes <ian.stokes@intel.com>
      Co-authored-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
      Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
      Acked-by: Flavio Leitner <fbl@sysclose.org>
    • ofproto: Fix double-unref of temporary rule when learning. · a8b629f8
      Ben Pfaff authored
      When ofproto_flow_mod_init() accepts a rule, it takes ownership of it and
      either unrefs it on error or transfers ownership to the struct it
      initializes on success, but ofproto_flow_mod_init_for_learn() was unref-ing
      it a second time if it reported an error.
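      The ownership rule can be sketched with a minimal reference-counted
      type. The names below (struct rule, flow_mod_init(),
      flow_mod_init_for_learn()) are hypothetical simplifications, not the
      ofproto API. Once flow_mod_init() has been called, the caller must not
      unref the rule on either the success or the error path.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted rule. */
struct rule {
    int refcount;
};

struct flow_mod {
    struct rule *rule;   /* Owned after a successful init. */
};

static int n_freed;      /* Counts frees, for demonstration only. */

static void rule_unref(struct rule *r)
{
    if (r && --r->refcount == 0) {
        free(r);
        n_freed++;
    }
}

/* Always takes ownership of 'r': unrefs it on error, stores it in 'fm'
 * on success. */
static int flow_mod_init(struct flow_mod *fm, struct rule *r, int fail)
{
    if (fail) {
        rule_unref(r);
        return -1;
    }
    fm->rule = r;
    return 0;
}

/* The buggy pattern also called rule_unref(r) when flow_mod_init()
 * failed, releasing the rule twice.  This version leaves 'r' alone. */
static int flow_mod_init_for_learn(struct flow_mod *fm, int fail)
{
    struct rule *r = malloc(sizeof *r);
    if (!r) {
        return -1;
    }
    r->refcount = 1;
    return flow_mod_init(fm, r, fail);   /* Ownership transferred. */
}
```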
      Signed-off-by: Ben Pfaff <blp@ovn.org>
      Acked-by: William Tu <u9012063@gmail.com>
  12. 25 Jan, 2018 1 commit