1. 11 Dec, 2014 14 commits
    • David S. Miller's avatar
      Merge branch 'mlx4-next' · efef7939
      David S. Miller authored
      Or Gerlitz says:
      mlx4 driver update
      This series from Matan, Jenny, Dotan and myself is mostly about adding
      support to a new performance optimized flow steering mode (patches 4-10).
      The 1st two patches are small fixes (one for VXLAN and one for SRIOV),
      and the third patch is a fix to avoid hard-lockup situation when many
      (hunderds) processes holding user-space QPs/CQs get events.
      Matan and Or.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Matan Barak's avatar
      net/mlx4: Add support for A0 steering · 7d077cd3
      Matan Barak authored
      Add the required firmware commands for A0 steering and a way to enable
      that. The firmware support focuses on INIT_HCA, QUERY_HCA, QUERY_PORT,
      QUERY_DEV_CAP and QUERY_FUNC_CAP commands. Those commands are used
      to configure and query the device.
      The different A0 DMFS (steering) modes are:
      Static - optimized performance, but flow steering rules are
      limited. This mode should be choosed explicitly by the user
      in order to be used.
      Dynamic - this mode should be explicitly choosed by the user.
      In this mode, the FW works in optimized steering mode as long as
      it can and afterwards automatically drops to classic (full) DMFS.
      Disable - this mode should be explicitly choosed by the user.
      The user instructs the system not to use optimized steering, even if
      the FW supports Dynamic A0 DMFS (and thus will be able to use optimized
      steering in Default A0 DMFS mode).
      Default - this mode is implicitly choosed. In this mode, if the FW
      supports Dynamic A0 DMFS, it'll work in this mode. Otherwise, it'll
      work at Disable A0 DMFS mode.
      Under SRIOV configuration, when the A0 steering mode is enabled,
      older guest VF drivers who aren't using the RX QP allocation flag
      (MLX4_RESERVE_A0_QP) will get a QP from the general range and
      fail when attempting to register a steering rule. To avoid that,
      the PF context behaviour is changed once on A0 static mode, to
      require support for the allocation flag in VF drivers too.
      In order to enable A0 steering, we use log_num_mgm_entry_size param.
      If the value of the parameter is not positive, we treat the absolute
      value of log_num_mgm_entry_size as a bit field. Setting bit 2 of this
      bit field enables static A0 steering.
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Matan Barak's avatar
      net/mlx4: Refactor QUERY_PORT · 431df8c7
      Matan Barak authored
      Currently QUERY_PORT is done as a part of QUERY_DEV_CAP firmware command.
      Since we would like to use it without querying all device capabilities,
      extract this part to be a function of its own.
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Matan Barak's avatar
      net/mlx4_core: Add explicit error message when rule doesn't meet configuration · 579d059b
      Matan Barak authored
      When a given flow steering rule is invalid in respect to the current
      steering configuration, print the correct error message to the system log.
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Matan Barak's avatar
      net/mlx4: Add A0 hybrid steering · d57febe1
      Matan Barak authored
      A0 hybrid steering is a form of high performance flow steering.
      By using this mode, mlx4 cards use a fast limited table based steering,
      in order to enable fast steering of unicast packets to a QP.
      In order to implement A0 hybrid steering we allocate resources
      from different zones:
      (1) General range
      (2) Special MAC-assigned QPs [RSS, Raw-Ethernet] each has its own region.
      When we create a rss QP or a raw ethernet (A0 steerable and BF ready) QP,
      we try hard to allocate the QP from range (2). Otherwise, we try hard not
      to allocate from this  range. However, when the system is pushed to its
      limits and one needs every resource, the allocator uses every region it can.
      Meaning, when we run out of raw-eth qps, the allocator allocates from the
      general range (and the special-A0 area is no longer active). If we run out
      of RSS qps, the mechanism tries to allocate from the raw-eth QP zone. If that
      is also exhausted, the allocator will allocate from the general range
      (and the A0 region is no longer active).
      Note that if a raw-eth qp is allocated from the general range, it attempts
      to allocate the range such that bits 6 and 7 (blueflame bits) in the
      QP number are not set.
      When the feature is used in SRIOV, the VF has to notify the PF what
      kind of QP attributes it needs. In order to do that, along with the
      "Eth QP blueflame" bit, we reserve a new "A0 steerable QP". According
      to the combination of these bits, the PF tries to allocate a suitable QP.
      In order to maintain backward compatibility (with older PFs), the PF
      notifies which QP attributes it supports via QUERY_FUNC_CAP command.
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Matan Barak's avatar
      net/mlx4: Add mlx4_bitmap zone allocator · 7a89399f
      Matan Barak authored
      The zone allocator is a mechanism which manages a few mlx4_bitmaps.
      When allocating a resource, the user indicates the desired zone of
      which this resource will be allocated from. If possible, the resource
      will be allocated from this zone. Otherwise, the resource will be
      allocated from a less-than, equal-to, higher-than priority zone,
      according to the desired zone's properties with that respective
      allocation order.
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Dotan Barak's avatar
      net/mlx4: Add a check if there are too many reserved QPs · ab256e5a
      Dotan Barak authored
      The number of reserved QPs is affected both from the firmware and
      from the driver's requirements. This patch adds a check that
      validates that this number is indeed feasable.
      Signed-off-by: default avatarDotan Barak <dotanb@dev.mellanox.co.il>
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Eugenia Emantayev's avatar
      net/mlx4: Change QP allocation scheme · ddae0349
      Eugenia Emantayev authored
      When using BF (Blue-Flame), the QPN overrides the VLAN, CV, and SV fields
      in the WQE. Thus, BF may only be used for QPNs with bits 6,7 unset.
      The current Ethernet driver code reserves a Tx QP range with 256b alignment.
      This is wrong because if there are more than 64 Tx QPs in use,
      QPNs >= base + 65 will have bits 6/7 set.
      This problem is not specific for the Ethernet driver, any entity that
      tries to reserve more than 64 BF-enabled QPs should fail. Also, using
      ranges is not necessary here and is wasteful.
      The new mechanism introduced here will support reservation for
      "Eth QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs
      (when hypervisors support WC in VMs). The flow we use is:
      1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation,
         and request "BF enabled QPs" if BF is supported for the function
      2. In the ALLOC_RES FW command, change param1 to:
      a. param1[23:0]  - number of QPs
      b. param1[31-24] - flags controlling QPs reservation
      Bit 31 refers to Eth blueflame supported QPs. Those QPs must have
      bits 6 and 7 unset in order to be used in Ethernet.
      Bits 24-30 of the flags are currently reserved.
      When a function tries to allocate a QP, it states the required attributes
      for this QP. Those attributes are considered "best-effort". If an attribute,
      such as Ethernet BF enabled QP, is a must-have attribute, the function has
      to check that attribute is supported before trying to do the allocation.
      In a lower layer of the code, mlx4_qp_reserve_range masks out the bits
      which are unsupported. If SRIOV is used, the PF validates those attributes
      and masks out unsupported attributes as well. In order to notify VFs which
      attributes are supported, the VF uses QUERY_FUNC_CAP command. This command's
      mailbox is filled by the PF, which notifies which QP allocation attributes
      it supports.
      Signed-off-by: default avatarEugenia Emantayev <eugenia@mellanox.co.il>
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Matan Barak's avatar
      net/mlx4_core: Use tasklet for user-space CQ completion events · 3dca0f42
      Matan Barak authored
      Previously, we've fired all our completion callbacks straight from our ISR.
      Some of those callbacks were lightweight (for example, mlx4_en's and
      IPoIB napi callbacks), but some of them did more work (for example,
      the user-space RDMA stack uverbs' completion handler). Besides that,
      doing more than the minimal work in ISR is generally considered wrong,
      it could even lead to a hard lockup of the system. Since when a lot
      of completion events are generated by the hardware, the loop over those
      events could be so long, that we'll get into a hard lockup by the system
      In order to avoid that, add a new way of invoking completion events
      callbacks. In the interrupt itself, we add the CQs which receive completion
      event to a per-EQ list and schedule a tasklet. In the tasklet context
      we loop over all the CQs in the list and invoke the user callback.
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Or Gerlitz's avatar
      net/mlx4_core: Mask out host side virtualization features for guests · 383677da
      Or Gerlitz authored
      When VFs (guests in this context) issue the QUERY_DEV_CAP command, they
      need not be told that host side virtualization features such as VST, FSM
      (MAC anti-spoofing) and running > 80 VFs are supported by the device.
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Or Gerlitz's avatar
      net/mlx4_en: Set csum level for encapsulated packets · c58942f2
      Or Gerlitz authored
      This was dropped by mistake for the napi_gro_frags flow, fix that.
      Fixes: dd65beac
       ('net/mlx4_en: Extend usage of napi_gro_frags')
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Sriharsha Basavapatna's avatar
      be2net: Export tunnel offloads only when a VxLAN tunnel is created · 630f4b70
      Sriharsha Basavapatna authored
      The encapsulated offload flags shouldn't be unconditionally exported
      to the stack. The stack expects offloading to work across all tunnel
      types when those flags are set. This would break other tunnels (like
      GRE) since be2net currently supports tunnel offload for VxLAN only.
      Also, with VxLANs Skyhawk-R can offload only 1 UDP dport. If more
      than 1 UDP port is added, we should disable offloads in that case too.
      Signed-off-by: default avatarSriharsha Basavapatna <sriharsha.basavapatna@emulex.com>
      Signed-off-by: default avatarSathya Perla <sathya.perla@emulex.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Kevin Hao's avatar
      gianfar: Fix dma check map error when DMA_API_DEBUG is enabled · 0a4b5a24
      Kevin Hao authored
      We need to use dma_mapping_error() to check the dma address returned
      by dma_map_single/page(). Otherwise we would get warning like this:
        WARNING: at lib/dma-debug.c:1140
        Modules linked in:
        CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc2-next-20141029 #196
        task: c0834300 ti: effe6000 task.ti: c0874000
        NIP: c02b2c98 LR: c02b2c98 CTR: c030abc4
        REGS: effe7d70 TRAP: 0700   Not tainted  (3.18.0-rc2-next-20141029)
        MSR: 00021000 <CE,ME>  CR: 22044022  XER: 20000000
        GPR00: c02b2c98 effe7e20 c0834300 00000098 00021000 00000000 c030b898 00000003
        GPR08: 00000001 00000000 00000001 749eec9d 22044022 1001abe0 00000020 ef278678
        GPR16: ef278670 ef278668 ef278660 070a8040 c087f99c c08cdc60 00029000 c0840d44
        GPR24: c08be6e8 c0840000 effe7e78 ef041340 00000600 ef114e10 00000000 c08be6e0
        NIP [c02b2c98] check_unmap+0x51c/0x9e4
        LR [c02b2c98] check_unmap+0x51c/0x9e4
        Call Trace:
        [effe7e20] [c02b2c98] check_unmap+0x51c/0x9e4 (unreliable)
        [effe7e70] [c02b31d8] debug_dma_unmap_page+0x78/0x8c
        [effe7ed0] [c03d1640] gfar_clean_rx_ring+0x208/0x488
        [effe7f40] [c03d1a9c] gfar_poll_rx_sq+0x3c/0xa8
        [effe7f60] [c04f8714] net_rx_action+0xc0/0x178
        [effe7f90] [c00435a0] __do_softirq+0x100/0x1fc
        [effe7fe0] [c0043958] irq_exit+0xa4/0xc8
        [effe7ff0] [c000d14c] call_do_irq+0x24/0x3c
        [c0875e90] [c00048a0] do_IRQ+0x8c/0xf8
        [c0875eb0] [c000ed10] ret_from_except+0x0/0x18
      For TX, we need to unmap the pages which has already been mapped and
      free the skb before return.
      For RX, move the dma mapping and error check to gfar_new_skb(). We
      would reuse the original skb in the rx ring when either allocating
      skb failure or dma mapping error.
      Signed-off-by: default avatarKevin Hao <haokexin@gmail.com>
      Signed-off-by: default avatarClaudiu Manoil <claudiu.manoil@freescale.com>
      Reviewed-by: default avatarClaudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Hariprasad Shenai's avatar
      cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call · 666224d4
      Hariprasad Shenai authored
      Remove use of calls into t4_fw_hello() with MASTER_MUST, which results in
      FW_HELLO_CMD_MASTERFORCE being set. The firmware doesn't support this and of
      course any existing PF Drivers will totally go for a toss.
      Signed-off-by: default avatarHariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  2. 10 Dec, 2014 26 commits