1. 04 Feb, 2015 10 commits
    • tipc: add reference count to struct tipc_link · 2d72d495
      Jon Paul Maloy authored
      When a bearer is disabled, all pertaining links will be reset and
      deleted. However, if there is a second active link towards a killed
      link's destination, the delete has to be postponed until the failover
      is finished. During this interval, we currently put the link in zombie
      mode, i.e., we take it out of traffic, delete its timer, but leave it
      attached to the owner node structure until all missing packets have
      been received.  When this is done, we detach the link from its node
      and delete it, assuming that the synchronous timer deletion that was
      initiated earlier in a different thread has finished.
      This is unsafe, as the failover may finish before del_timer_sync()
      has returned in the other thread.
      We fix this by adding an atomic reference counter of type kref in
      struct tipc_link. The counter keeps track of the references kept
      to the link by the owner node and the timer. We then do a conditional
      delete, based on the reference counter, both after the failover has
      been finished and when the timer expires, if applicable. Whoever
      comes last, will actually delete the link. This approach also implies
      that we can make the deletion of the timer asynchronous.
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
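      The "whoever comes last deletes the link" rule can be sketched as
      follows. This is a toy model in Python, not the kernel code: the
      class Link and its get()/put() methods are hypothetical stand-ins
      for struct tipc_link and the kref helpers, and a lock stands in for
      the atomic counter.

```python
import threading


class Link:
    """Toy stand-in for struct tipc_link with a kref-like counter."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for the atomic counter
        self._refs = 0
        self.deleted = False

    def get(self):
        # Take one reference (e.g. for the owner node or the timer).
        with self._lock:
            self._refs += 1

    def put(self):
        # Drop one reference; only the last holder performs the delete.
        with self._lock:
            self._refs -= 1
            last = self._refs == 0
        if last:
            self.deleted = True  # the kernel would free the link here
        return last


link = Link()
link.get()  # reference held by the owner node
link.get()  # reference held by the timer
failover_was_last = link.put()  # failover finished, node detaches
timer_was_last = link.put()     # timer expires afterwards
```

      Neither path needs to wait for the other; swapping the order of the
      two put() calls gives the same end state.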
    • Merge tag 'mac80211-next-for-davem-2015-02-03' of... · 940288b6
      David S. Miller authored
      Merge tag 'mac80211-next-for-davem-2015-02-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
      Last round of updates for net-next:
       * revert a patch that caused a regression with mesh userspace (Bob)
       * fix a number of suspend/resume related races
         (from Emmanuel, Luca and myself - we'll look at backporting later)
       * add software implementations for new ciphers (Jouni)
       * add a new ACPI ID for Broadcom's rfkill (Mika)
       * allow using netns FD for wireless (Vadim)
       * some other cleanups (various)
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • csiostor: Use firmware version from cxgb4/t4fw_version.h · 541c571f
      Praveen Madhavan authored
      This patch uses the firmware version macros from t4fw_version.h
      and also enables the 40G T5 adapter.
      Signed-off-by: Praveen Madhavan <praveenm@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tlan: msecs_to_jiffies conversion · b5057dd7
      Nicholas Mc Guire authored
      This is only an API consolidation and should make things more readable;
      it replaces var * HZ / 1000 with msecs_to_jiffies(var).
      As there is a discrepancy between the code and the comments, this is in
      a separate patch.
      Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tlan: use msecs_to_jiffies for conversion · 51fd9471
      Nicholas Mc Guire authored
      This is only an API consolidation and should make things more readable;
      it replaces var * HZ / 1000 with msecs_to_jiffies(var).
      Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
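      To make the two tlan conversions concrete, here is a small Python
      model of both expressions. It is a sketch under assumptions: HZ = 250
      is chosen only for illustration, and msecs_to_jiffies_model() covers
      only the case where HZ divides 1000 evenly, in which the kernel
      helper rounds up to whole jiffies rather than truncating.

```python
def open_coded(ms, hz):
    # The expression the patch replaces: var * HZ / 1000 (C integer division).
    return ms * hz // 1000


def msecs_to_jiffies_model(ms, hz):
    # Simplified model, valid only when hz divides 1000 evenly:
    # round up to the next whole jiffy instead of truncating.
    assert 1000 % hz == 0
    ms_per_jiffy = 1000 // hz
    return (ms + ms_per_jiffy - 1) // ms_per_jiffy


HZ = 250  # assumed tick rate for illustration
whole = (open_coded(1000, HZ), msecs_to_jiffies_model(1000, HZ))
partial = (open_coded(10, HZ), msecs_to_jiffies_model(10, HZ))
```

      For delays that are a whole number of jiffies the two agree; for
      other values the helper may yield one jiffy more than the open-coded
      form.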
    • Merge branch 'for-upstream' of... · 45e826fd
      David S. Miller authored
      Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
      Johan Hedberg says:
      pull request: bluetooth-next 2015-02-03
      Here's what's likely the last bluetooth-next pull request for 3.20.
      Notable changes include:
       - xHCI workaround + a new id for the ath3k driver
       - Several new ids for the btusb driver
       - Support for new Intel Bluetooth controllers
       - Minor cleanups to ieee802154 code
       - Nested sleep warning fix in socket accept() code path
       - Fixes for Out of Band pairing handling
       - Support for LE scan restarting for HCI_QUIRK_STRICT_DUPLICATE_FILTER
       - Improvements to data we expose through debugfs
       - Proper handling of Hardware Error HCI events
      Please let me know if there are any issues pulling. Thanks.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add skb functions to process remote checksum offload · dcdc8994
      Tom Herbert authored
      This patch adds skb_remcsum_process and skb_gro_remcsum_process to
      perform the appropriate adjustments to the skb when receiving
      remote checksum offload.
      Updated vxlan and gue to use these functions.
      Tested: Ran TCP_RR and TCP_STREAM netperf for VXLAN and GUE, did
      not see any change in performance.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: Let bridge not age 'externally' learnt FDB entries, they are removed... · 9a05dde5
      Siva Mannem authored
      bridge: Let bridge not age 'externally' learnt FDB entries, they are removed when 'external' entity notifies the aging
      When 'learned_sync' flag is turned on, the offloaded switch
      port syncs learned MAC addresses to bridge's FDB via switchdev notifier
      (NETDEV_SWITCH_FDB_ADD). Currently, FDB entries learnt via this mechanism are
      wrongly being deleted by bridge aging logic. This patch ensures that FDB
      entries synced from offloaded switch ports are not deleted by bridging logic.
      Such entries can only be deleted via switchdev notifier
      Signed-off-by: Siva Mannem <siva.mannem.lnx@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
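      The aging rule described in the bridge commit can be illustrated
      with a short Python sketch. The FdbEntry class, its field names and
      the age_out() helper are hypothetical and exist only for this
      example; the real bridge code is C and is organized differently.

```python
from dataclasses import dataclass


@dataclass
class FdbEntry:
    mac: str
    last_used: float
    added_by_external_learn: bool = False  # hypothetical flag name


def age_out(entries, now, ageing_time):
    """Return the entries that survive one pass of the ageing logic."""
    kept = []
    for e in entries:
        expired = now - e.last_used > ageing_time
        if expired and not e.added_by_external_learn:
            continue  # locally learnt and stale: the bridge deletes it
        kept.append(e)  # fresh, or owned by the external entity
    return kept


fdb = [
    FdbEntry("00:00:00:00:00:01", last_used=0.0),
    FdbEntry("00:00:00:00:00:02", last_used=0.0, added_by_external_learn=True),
    FdbEntry("00:00:00:00:00:03", last_used=290.0),
]
survivors = [e.mac for e in age_out(fdb, now=301.0, ageing_time=300.0)]
```

      Only the stale, locally learnt entry is dropped; the externally
      learnt one stays until the external entity asks for its removal.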
    • net: fs_enet: Implement NETIF_F_SG feature · 4fc9b87b
      LEROY Christophe authored
      Freescale ethernet controllers have the capability to re-assemble fragmented
      data into a single ethernet frame. This patch uses this capability and
      implements the NETIF_F_SG feature in the fs_enet ethernet driver.
      On an MPC885, I get a 53% performance improvement on an FTP transfer of a 15Mb file:
        * Without the patch: 2.8 Mbps
        * With the patch: 4.3 Mbps
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
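      As a quick check of the quoted figure, the relative improvement can
      be recomputed from the two throughput numbers in the fs_enet commit
      message. The variable names are chosen for this example only.

```python
without_patch = 2.8  # Mbps, from the commit message
with_patch = 4.3     # Mbps, from the commit message

# Relative gain of the patched driver over the unpatched one, in percent.
improvement = (with_patch - without_patch) / without_patch * 100
```

      The result is a little above 53%, consistent with the stated
      improvement.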
    • xps: fix xps for stacked devices · 2bd82484
      Eric Dumazet authored
      A typical qdisc setup is the following :
      bond0 : bonding device, using HTB hierarchy
      eth1/eth2 : slaves, multiqueue NIC, using MQ + FQ qdisc
      XPS allows spreading packets across specific tx queues, based on the cpu
      doing the send.
      Problem is that dequeues from bond0 qdisc can happen on random cpus,
      due to the fact that qdisc_run() can dequeue a batch of packets.
      CPUA -> queue packet P1 on bond0 qdisc, P1->ooo_okay=1
      CPUA -> queue packet P2 on bond0 qdisc, P2->ooo_okay=0
      CPUB -> dequeue packet P1 from bond0
              enqueue packet on eth1/eth2
      CPUC -> dequeue packet P2 from bond0
              enqueue packet on eth1/eth2 using sk cache (ooo_okay is 0)
      get_xps_queue() then might select wrong queue for P1, since current cpu
      might be different than CPUA.
      P2 might be sent on the old queue (stored in sk->sk_tx_queue_mapping),
      if CPUC runs a bit faster (or CPUB spins a bit on qdisc lock)
      The effect of this bug is TCP reordering and, more generally, suboptimal
      TX queue placement. (A victim bulk flow can be migrated to the wrong TX
      queue for a while.)
      To fix this, we have to record sender cpu number the first time
      dev_queue_xmit() is called for one tx skb.
      We can union napi_id (used on receive path) and sender_cpu,
      granted we clear sender_cpu in skb_scrub_packet() (credit to Willem for
      this union idea)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
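      The xps fix, recording the sender cpu on the first dev_queue_xmit()
      call, can be sketched in Python as below. This is an illustration
      under assumptions: the Skb class, the "cpu + 1 so that 0 means
      unset" encoding and the single shared xps_map are simplifications
      made for this example, not a copy of the kernel code.

```python
class Skb:
    def __init__(self):
        self.sender_cpu = 0  # 0 means "not recorded yet" in this sketch


def dev_queue_xmit(skb, current_cpu):
    """Record the sending CPU only on the first transmit of this skb."""
    if skb.sender_cpu == 0:
        skb.sender_cpu = current_cpu + 1  # store cpu + 1 so 0 stays "unset"
    return skb.sender_cpu - 1  # CPU the queue selection should be based on


def pick_tx_queue(skb, current_cpu, xps_map):
    cpu = dev_queue_xmit(skb, current_cpu)
    return xps_map[cpu]


xps_map = {0: "txq0", 1: "txq1", 2: "txq2"}  # hypothetical cpu -> queue map
p1 = Skb()
on_bond = pick_tx_queue(p1, current_cpu=0, xps_map=xps_map)   # CPUA queues on bond0
on_slave = pick_tx_queue(p1, current_cpu=1, xps_map=xps_map)  # CPUB dequeues later
```

      Because the choice follows the recorded cpu rather than the cpu that
      happens to run the dequeue, P1 maps to the same queue on both passes.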
  2. 03 Feb, 2015 8 commits
  3. 02 Feb, 2015 22 commits
    • net: sctp: Deletion of an unnecessary check before the function call "kfree" · 7d37d0c1
      Markus Elfring authored
      The kfree() function tests whether its argument is NULL and then
      returns immediately. Thus the test around the call is not needed.
      This issue was detected by using the Coccinelle software.
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Acked-By: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'udpv6_lockless_send' · 193cdc4a
      David S. Miller authored
      Vladislav Yasevich says:
      ipv6: Add lockless UDP send path
      This series introduces a lockless UDPv6 send path similar to
      what Herbert Xu did for IPv4 a while ago.
      There are some differences from IPv4.  IPv6 caching for the flow
      label is a bit different, and it requires another cork
      structure that holds the IPv6 ancillary data.
      Please take a look.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Allow for partial checksums on non-ufo packets · 32dce968
      Vlad Yasevich authored
      Currently, if we are not doing UFO on the packet, all UDP
      packets will start with CHECKSUM_NONE and thus perform full
      checksum computations in software even if the device supports
      IPv6 checksum offloading.
      Let's start with CHECKSUM_PARTIAL if the device
      supports it and we are sending only a single packet at
      or below mtu size.
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
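      The selection rule in the partial checksum commit can be written
      down as a small decision function. This Python sketch is a
      simplification: the function name and parameters are hypothetical,
      and the real code path may check further conditions that the commit
      message does not mention.

```python
CHECKSUM_NONE = "CHECKSUM_NONE"
CHECKSUM_PARTIAL = "CHECKSUM_PARTIAL"


def initial_csum_mode(length, mtu, is_ufo, dev_has_ipv6_csum):
    """Pick the starting checksum mode for a UDPv6 send (simplified)."""
    single_packet = length <= mtu
    if not is_ufo and single_packet and dev_has_ipv6_csum:
        return CHECKSUM_PARTIAL  # let the device finish the checksum
    return CHECKSUM_NONE  # software computes the full checksum
```

      For example, a 1200-byte packet on a 1500-byte MTU device with IPv6
      checksum offload starts with CHECKSUM_PARTIAL, while the same packet
      on a device without offload stays at CHECKSUM_NONE.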
    • udpv6: Add lockless sendmsg() support · 03485f2a
      Vlad Yasevich authored
      This commit adds the same functionality to IPv6 that
      commit 903ab86d
      Author: Herbert Xu <herbert@gondor.apana.org.au>
      Date:   Tue Mar 1 02:36:48 2011 +0000
          udp: Add lockless transmit path
      added to IPv4.
      UDP transmit path can now run without a socket lock,
      thus allowing multiple threads to send to a single socket
      more efficiently.
      This is only used when corking/MSG_MORE is not used.
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Introduce udpv6_send_skb() · d39d938c
      Vlad Yasevich authored
      Now that we can individually construct IPv6 skbs to send, add a
      udpv6_send_skb() function to populate the udp header and send the
      skb.  This allows udp_v6_push_pending_frames() to re-use this
      function as well as enables us to add lockless sendmsg() support.
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: introduce ipv6_make_skb · 6422398c
      Vlad Yasevich authored
      This commit is very similar to
      commit 1c32c5ad
      Author: Herbert Xu <herbert@gondor.apana.org.au>
      Date:   Tue Mar 1 02:36:47 2011 +0000
          inet: Add ip_make_skb and ip_finish_skb
      It adds the IPv6 versions of the helpers, ip6_make_skb and ip6_finish_skb.
      The job of ip6_make_skb is to collect messages into an IPv6 packet
      and populate the IPv6 header.  The job of ip6_finish_skb is to transmit
      the generated skb.  Together they replicate the job of
      ip6_push_pending_frames() while also providing the capability to be
      called independently.  This will be needed to add lockless UDP sendmsg
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Append sending data to arbitrary queue · 0bbe84a6
      Vlad Yasevich authored
      Add the ability to append data to arbitrary queue.  This
      will be needed later to implement lockless UDP sends.
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: pull cork initialization into its own function. · 366e41d9
      Vlad Yasevich authored
      Pull IPv6 cork initialization into its own function that
      can be re-used.  IPv6 specific cork data did not have an
      explicit data structure.  This patch creates one so that
      just the IPv6 cork data can be passed as arguments.  Also, since
      IPv6 tries to save the flow label into the inet_cork_full
      structure, pass the full cork.
      Adjust ip6_cork_release() to take cork data structures.
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4 : Improve IEEE DCBx support, other minor open-lldp fixes · ba0c39cb
      Anish Bhatt authored
      * Add support for IEEE ets & pfc api.
      * Fix bug that resulted in incorrect bandwidth percentage being returned for
        CEE peers
      * Convert pfc enabled info from firmware format to what dcbnl expects before
      Signed-off-by: Anish Bhatt <anish@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/tulip: don't warn about unknown ARM architecture · 98830dd0
      Arnd Bergmann authored
      ARM has 32-byte cache lines, which according to the comment in
      the init registers function seems to work best with the default
      value of 0x4800 that is also used on sparc and parisc.
      This adds ARM to the same list, to use that default but no
      longer warn about it.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Grant Grundler <grundler@parisc-linux.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hip04: add missing MODULE_LICENSE · 4c0c46be
      Arnd Bergmann authored
      The hip04 ethernet driver causes a new compile-time warning
      when built as a loadable module:
      WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/hisilicon/hip04_eth.o
      see include/linux/module.h for more information
      This adds the license as "GPL", which matches the header of the file.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dctcp: loosen requirement to assert ECT(0) during 3WHS · 843c2fdf
      Florian Westphal authored
      One deployment requirement of DCTCP is to be able to run
      in a DC setting along with TCP traffic. As Glenn Judd's
      NSDI'15 paper "Attaining the Promise and Avoiding the Pitfalls
      of TCP in the Datacenter" [1] (tba) explains, one way to
      solve this on switch side is to split DCTCP and TCP traffic
      in two queues per switch port based on the DSCP: one queue
      solely intended for DCTCP traffic and one for non-DCTCP traffic.
      For the DCTCP queue, there's the marking threshold K as
      explained in commit e3118e83 ("net: tcp: add DCTCP congestion
      control algorithm") for RED marking ECT(0) packets with CE.
      For the non-DCTCP queue, there's e.g. a classic tail drop queue.
      As already explained in e3118e83, running DCTCP at scale
      when not marking SYN/SYN-ACK packets with ECT(0) has severe
      consequences as for non-ECT(0) packets, traversing the RED
      marking DCTCP queue will result in a severe reduction of
      connection probability.
      This is due to the DCTCP queue being dominated by ECT(0) traffic
      and switches handle non-ECT traffic in the RED marking queue
      after passing K as drops, where K is usually a low watermark
      in order to leave enough tailroom for bursts. Splitting DCTCP
      traffic among several queues (ECN and non-ECN queue) is being
      considered a terrible idea in the network community as it
      splits single flows across multiple network paths.
      Therefore, commit e3118e83 implements this on Linux as
      ECT(0) marked traffic, as we argue that marking all packets
      of a DCTCP flow is the only viable solution and also doesn't
      speak against the draft.
      However, a DCTCP implementation for FreeBSD recently also landed in
      their mainline kernel [2]. In order to let it play well
      together with Linux' DCTCP, we would need to loosen the
      requirement that ECT(0) has to be asserted during the 3WHS, as
      this is not implemented in FreeBSD. This simplifies the ECN test and
      lets DCTCP work together with FreeBSD.
      Joint work with Daniel Borkmann.
        [1] https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/judd
        [2] https://github.com/freebsd/freebsd/commit/8ad879445281027858a7fa706d13e458095b595f
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Glenn Judd <glenn.judd@morganstanley.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-timestamp' · 69422416
      David S. Miller authored
      Willem de Bruijn says:
      net-timestamp: blinding
        (v2 -> v3)
        - rebase only: v2 did not make it to patchwork / netdev
        (v1 -> v2)
        - fix capability check in patch 2
            this could be moved into net/core/sock.c as sk_capable_nouser()
        (rfc -> v1)
        - dropped patch 4: timestamp batching
            due to complexity, as discussed
        - dropped patch 5: default mode
            because it does not really cover all use cases, as discussed
        - added documentation
        - minor fix, see patch 2
      Two issues were raised during recent timestamping discussions:
      1. looping full packets on the error queue exposes packet headers
      2. TCP timestamping with retransmissions generates many timestamps
      This RFC patchset is an attempt at addressing both without breaking
      legacy behavior.
      Patch 1 reintroduces the "no payload" timestamp option, which loops
      timestamps onto an empty skb. This reduces the pressure on SO_RCVBUF
      from looping many timestamps. It does not reduce the number of recv()
      calls needed to process them. The timestamp cookie mechanism developed
      in http://patchwork.ozlabs.org/patch/427213/ did, but this is
      considerably simpler.
      Patch 2 then gives administrators the power to block all timestamp
      requests that contain data by unprivileged users. I proposed this
      earlier as a backward compatible workaround in the discussion of
        net-timestamp: pull headers for SOCK_STREAM
      Patch 3 only updates the txtimestamp example to test this option.
      Verified that with option '-n', length is zero in all cases and
      option '-I' (PKTINFO) stops working.
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-timestamp: no-payload option in txtimestamp test · 23685923
      Willem de Bruijn authored
      Demonstrate how SOF_TIMESTAMPING_OPT_TSONLY can be used and
      test the implementation.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-timestamp: no-payload only sysctl · b245be1f
      Willem de Bruijn authored
      Tx timestamps are looped onto the error queue on top of an skb. This
      mechanism leaks packet headers to processes unless the no-payload
      option SOF_TIMESTAMPING_OPT_TSONLY is set.
      Add a sysctl that optionally drops looped timestamp with data. This
      only affects processes without CAP_NET_RAW.
      The policy is checked when timestamps are generated in the stack.
      It is possible for timestamps with data to be reported after the
      sysctl is set, if these were queued internally earlier.
      No vulnerability is immediately known that exploits knowledge
      gleaned from packet headers, but it may still be preferable to allow
      administrators to lock down this path at the cost of possible
      breakage of legacy applications.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
        (v1 -> v2)
        - test socket CAP_NET_RAW instead of capable(CAP_NET_RAW)
        (rfc -> v1)
        - document the sysctl in Documentation/sysctl/net.txt
        - fix access control race: read .._OPT_TSONLY only once,
              use same value for permission check and skb generation.
      Signed-off-by: David S. Miller <davem@davemloft.net>
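      The policy described in the no-payload sysctl commit can be
      summarized as a small predicate. This Python sketch is only an
      illustration: the function and parameter names are assumptions made
      for this example, and the commit message does not spell out the
      exact kernel check.

```python
def deliver_tx_timestamp(tsonly, sysctl_allows_data, has_cap_net_raw):
    """Return True if the looped timestamp should reach the error queue."""
    if tsonly:
        return True  # no payload or headers are looped, nothing to leak
    if sysctl_allows_data:
        return True  # administrator permits timestamps with data
    return has_cap_net_raw  # otherwise only privileged sockets get them
```

      With the sysctl locked down, an unprivileged socket that did not set
      SOF_TIMESTAMPING_OPT_TSONLY gets no looped timestamp, which is the
      "possible breakage of legacy applications" the message warns about.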
    • net-timestamp: no-payload option · 49ca0d8b
      Willem de Bruijn authored
      Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
      timestamps, this loops timestamps on top of empty packets.
      Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
      cmsg reception (aside from timestamps) are no longer possible. This
      works together with a follow on patch that allows administrators to
      only allow tx timestamping if it does not loop payload or metadata.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Changes (rfc -> v1)
        - add documentation
        - remove unnecessary skb->len test (thanks to Richard Cochran)
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Bluetooth: Remove mgmt_rp_read_local_oob_ext_data struct · 66f096f7
      Johan Hedberg authored
      This extended return parameters struct conflicts with the new Read Local
      OOB Extended Data command definition. To avoid the conflict simply
      rename the old "extended" version to the normal one and update the code
      appropriately to take into account the two possible response PDU sizes.
      Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    • Marcel Holtmann authored
      The Intel Snowfield Peak Bluetooth controllers use a strict scanning
      filter policy that filters based on Bluetooth device addresses and
      not on RSSI.
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
      Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
    • Bluetooth: Add restarting to service discovery · 4b0e0ced
      Jakub Pawlowski authored
      When using LE_SCAN_FILTER_DUP_ENABLE, some controllers would send
      an advertising report from each LE device only once. That means that we
      don't get any updates on the RSSI value, which makes Service Discovery very
      slow. This patch restarts the scan when, during Service Discovery, a
      device with a filtered UUID is found but is not yet in RSSI range to send
      an event. This way if the device moves into range, we will quickly get RSSI
      Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    • Bluetooth: Add le_scan_restart work for LE scan restarting · 2d28cfe7
      Jakub Pawlowski authored
      Currently there is no way to restart an LE scan, and it's needed in
      the service scan method. The way it works: it disables and then re-enables the LE
      scan on the controller.
      During the restart, we must remember when the scan was started, and
      its duration, to later re-schedule the le_scan_disable work that was
      stopped during the stop scan phase.
      Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
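      The bookkeeping needed for the LE scan restart, remembering the scan
      start and duration so that the disable work can be re-scheduled, can
      be sketched as below. The function name and the plain numeric time
      units are assumptions for this example; the kernel works in jiffies
      and with delayed work items.

```python
def remaining_scan_time(scan_start, scan_duration, now):
    """Time left before the original le_scan_disable deadline."""
    elapsed = now - scan_start
    # Never return a negative delay: if the deadline already passed,
    # the disable work should simply run as soon as possible.
    return max(scan_duration - elapsed, 0)
```

      A scan started at t=100 with a duration of 50 and restarted at t=120
      would have its disable work re-scheduled 30 time units later, so the
      restart does not extend the overall scan window.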
    • net: rocker: Add support for retrieving port level statistics · 9766e97a
      David Ahern authored
      Add support for retrieving port level statistics from device.
      Hook is added for ethtool's stats functionality. For example,
      $ ethtool -S eth3
      NIC statistics:
           rx_packets: 12
           rx_bytes: 2790
           rx_dropped: 0
           rx_errors: 0
           tx_packets: 8
           tx_bytes: 728
           tx_dropped: 0
           tx_errors: 0
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Scott Feldman <sfeldma@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'switchdev_offload_flags' · fe3ef616
      David S. Miller authored
      Roopa Prabhu says:
      switchdev offload flags
      This patch series introduces new offload flags for switchdev.
      Kernel network subsystems can use this flag to accelerate
      network functions by offloading to hw.
      I expect that there will be need for subsystem specific feature
      flag in the future.
      This patch series currently only addresses bridge driver link
      attribute offloads to hardware.
      Looking at the current state of bridge l2 offload in the kernel,
          - flag 'self' is the way to directly manage the bridge device in hw via
            the ndo_bridge_setlink/ndo_bridge_getlink calls
          - flag 'master' is always used to manage the in kernel bridge devices
            via the same ndo_bridge_setlink/ndo_bridge_getlink calls
      Today these are used separately. The nic offloads use hwmode "vepa/veb" to go
      directly to hw with the "self" flag.
      At this point I am trying not to introduce any new user facing flags/attributes.
      In the model where we want the kernel bridging to be accelerated with
      hardware, we very much want the bridge driver to be involved.
      In this proposal,
      - The offload flag/bit helps switch asic drivers to indicate that they
        accelerate the kernel networking objects/functions
      - The user does not have to specify a new flag to do so. A bridge created with
        switch asic ports will be accelerated if the switch driver supports it.
      - The user can continue to directly manage l2 in nics (ixgbe) using the
        existing hwmode/self flags
      - It also does not stop users from using the 'self' flag to talk to the
        switch asic driver directly
      - Involving the bridge driver makes sure the add/del notifications to user
        space go out after both kernel and hardware are programmed
      (To selectively offload bridge port attributes,
      example learning in hw only etc, we can introduce offload bits for
      per bridge port flag attribute as in my previous patch
      https://patchwork.ozlabs.org/patch/413211/. I have not included that in this
         - try a different name for the offload flag/bit
         - tries to solve the stacked netdev case by traversing the lowerdev
           list to reach the switch port
      v3 -
          - Tested with bond as bridge port for the stacked device case.
            Includes a bond_fix_features change to not ignore the
          - Some checkpatch fixes
      v4 -
          - rename flag to NETIF_F_HW_SWITCH_OFFLOAD
          - add ndo_bridge_setlink/dellink handlers in bond and team drivers as
            suggested by jiri.
          - introduce default ndo_dflt_netdev_switch_port_bridge_setlink/dellink
          handlers that masters can use to call offload api on lowerdevs.
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>