1. 21 Mar, 2016 1 commit
  2. 04 Mar, 2016 1 commit
  3. 01 Mar, 2016 1 commit
  4. 29 Feb, 2016 2 commits
    • IB/core: Add don't trap flag to flow creation · a3100a78
      Marina Varshaver authored
      The don't trap flag (i.e. IB_FLOW_ATTR_FLAGS_DONT_TRAP) indicates that a
      QP will receive traffic, but will not steal it.
      
      When a packet matches a flow steering rule that was created with
      the don't trap flag, the QPs assigned to this rule will get this
      packet, but matching will continue to other equal/lower priority
      rules. This lets the QPs assigned to those rules get the packet
      too.
      
      If a don't trap rule and other rules have the same priority
      and match the same packet, the behavior is undefined.
      
      The don't trap flag can't be set with default rule types
      (i.e. IB_FLOW_ATTR_ALL_DEFAULT, IB_FLOW_ATTR_MC_DEFAULT), as default
      rules have no rules after them, so don't trap has no meaning there.
      Signed-off-by: Marina Varshaver <marinav@mellanox.com>
      Reviewed-by: Matan Barak <matanb@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
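      
      The matching behaviour described in a3100a78 can be illustrated with a
      small userspace model. This is only a sketch under stated assumptions,
      not the kernel or hardware implementation: struct flow_rule,
      steer_packet() and the integer match_id are hypothetical names, and the
      rules are assumed to be pre-sorted from highest to lowest priority with
      no two matching rules sharing a priority (the undefined case).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a flow steering rule. */
struct flow_rule {
    int priority;       /* assumption: lower value means higher priority */
    unsigned match_id;  /* stand-in for the real match criteria */
    bool dont_trap;     /* models IB_FLOW_ATTR_FLAGS_DONT_TRAP */
    int qp;             /* QP number that receives matching packets */
};

/*
 * Collect the QPs that receive a packet. `rules` must be sorted from
 * highest to lowest priority. Returns the number of QPs written to `out`.
 */
size_t steer_packet(const struct flow_rule *rules, size_t n,
                    unsigned packet_id, int *out, size_t out_cap)
{
    size_t hits = 0;

    for (size_t i = 0; i < n && hits < out_cap; i++) {
        if (rules[i].match_id != packet_id)
            continue;
        out[hits++] = rules[i].qp;
        if (!rules[i].dont_trap)
            break;      /* a regular rule steals the packet */
    }
    return hits;
}
```

      With a don't trap rule at the highest priority followed by two regular
      rules matching the same packet, the first two QPs get the packet and
      the third never sees it.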
    • IB: new common API for draining queues · 765d6774
      Steve Wise authored
      Add provider-specific drain_sq/drain_rq functions for providers needing
      special drain logic.
      
      Add static functions __ib_drain_sq() and __ib_drain_rq() which post noop
      WRs to the SQ or RQ and block until their completions are processed.
      This ensures the application's completions for work requests posted prior
      to the drain work request have all been processed.
      
      Add API functions ib_drain_sq(), ib_drain_rq(), and ib_drain_qp().
      
      For the drain logic to work, the caller must:
      
      - ensure there is room in the CQ(s) and QP for the drain work request
        and completion.
      
      - allocate the CQ using ib_alloc_cq(), with a CQ poll context other
        than IB_POLL_DIRECT.
      
      - ensure that no other contexts are posting WRs concurrently.
        Otherwise the drain is not guaranteed.
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
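      
      The drain idea from 765d6774 (post a marker WR, then wait until its
      completion has been processed) can be sketched with a toy FIFO queue.
      This is a single-threaded userspace model, not the kernel code: the
      toy_* names and DRAIN_WR_ID are hypothetical, completions are assumed
      to arrive in posting order, and the "blocking" is modelled by polling
      in the caller rather than by waiting on another context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_DEPTH 8
#define DRAIN_WR_ID (-1)   /* marker value chosen for this sketch only */

/* Hypothetical model of a queue whose WRs complete in FIFO order. */
struct toy_queue {
    int wr_ids[QUEUE_DEPTH];
    size_t head;        /* next WR to complete */
    size_t tail;        /* next free slot */
    size_t completed;   /* completions processed so far */
};

/* Post a WR; fails when the queue is full, so callers must leave room. */
bool toy_post(struct toy_queue *q, int wr_id)
{
    if (q->tail - q->head == QUEUE_DEPTH)
        return false;
    q->wr_ids[q->tail++ % QUEUE_DEPTH] = wr_id;
    return true;
}

/* Process one completion and return its wr_id; queue must be non-empty. */
int toy_poll(struct toy_queue *q)
{
    int id = q->wr_ids[q->head++ % QUEUE_DEPTH];

    q->completed++;
    return id;
}

/* Post the marker WR, then process completions until the marker is seen. */
bool toy_drain(struct toy_queue *q)
{
    if (!toy_post(q, DRAIN_WR_ID))
        return false;   /* no room for the drain work request */
    while (toy_poll(q) != DRAIN_WR_ID)
        ;
    return true;
}
```

      The model also shows why the caller must leave room for the drain WR:
      on a full queue toy_drain() cannot post its marker and reports failure.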
  5. 19 Jan, 2016 1 commit
  6. 23 Dec, 2015 12 commits
  7. 22 Dec, 2015 2 commits
  8. 11 Dec, 2015 1 commit
    • IB: add a proper completion queue abstraction · 14d3a3b2
      Christoph Hellwig authored
      This adds an abstraction that allows ULPs to simply pass a completion
      object and completion callback with each submitted WR and let the RDMA
      core handle the nitty gritty details of how to handle completion
      interrupts and poll the CQ.
      
      In detail there is a new ib_cqe structure which just contains the
      completion callback, and which can be used to get at the containing
      object using container_of.  It is pointed to by the WR and WC as an
      alternative to the wr_id field, similar to how many ULPs already use
      the field to store a pointer using casts.
      
      A driver using the new completion callbacks allocates its CQs using
      the new ib_alloc_cq API, which in addition to the number of CQEs and
      the completion vectors also takes a mode on how we poll for CQEs.
      Three modes are available: direct for drivers that never take CQ
      interrupts and just poll for them, softirq to poll from softirq context
      using the to-be-renamed blk-iopoll infrastructure, which takes care of
      rearming and budgeting, or a workqueue for consumers who want to be
      called from user context.
      
      Thanks a lot to Sagi Grimberg who helped reviewing the API, wrote
      the current version of the workqueue code because my two previous
      attempts sucked too much and converted the iSER initiator to the new
      API.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
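      
      The container_of pattern that 14d3a3b2 describes for ib_cqe can be
      shown in plain userspace C. The sketch below is an illustration only:
      toy_cqe, toy_wc, toy_request and toy_dispatch() are hypothetical
      stand-ins, the callback signature is simplified, and container_of is
      redefined locally because the kernel macro is not available outside
      the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_wc;

/* Modeled on ib_cqe: it carries nothing but the completion callback. */
struct toy_cqe {
    void (*done)(struct toy_wc *wc);
};

/* Stand-in for a work completion that points back at the cqe. */
struct toy_wc {
    struct toy_cqe *wr_cqe;
    int status;
};

/* Hypothetical ULP request that embeds its completion entry. */
struct toy_request {
    int tag;
    int result;
    struct toy_cqe cqe;
};

/* Completion callback: recover the request from the embedded cqe. */
void toy_request_done(struct toy_wc *wc)
{
    struct toy_request *req =
        container_of(wc->wr_cqe, struct toy_request, cqe);

    req->result = wc->status;
}

/* What a core polling loop would do for each completion it reaps. */
void toy_dispatch(struct toy_wc *wc)
{
    wc->wr_cqe->done(wc);
}
```

      Embedding the cqe in the request avoids casting a pointer through an
      integer wr_id field, which is the practice the commit aims to replace.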
  9. 07 Dec, 2015 1 commit
  10. 30 Oct, 2015 1 commit
  11. 29 Oct, 2015 1 commit
  12. 28 Oct, 2015 1 commit
    • IB/core: Introduce new fast registration API · 4c67e2bf
      Sagi Grimberg authored
      The new fast registration verb ib_map_mr_sg receives a scatterlist
      and converts it to a page list under the verbs API, thus hiding
      the specific HW mapping details from the consumer.
      
      The provider drivers are provided with a generic helper ib_sg_to_pages
      that converts a scatterlist into a vector of page addresses. The
      drivers can still perform any HW specific page address setting
      by passing a set_page function pointer which will be invoked for
      each page address. This lets drivers avoid keeping shadow page
      vectors and converting them to HW specific translations with
      extra copies.
      
      This API will allow ULPs to remove the duplicated code of constructing
      a page vector from a given sg list.
      
      The send work request ib_reg_wr also shrinks, as it will contain only
      the mr, key and access flags in addition to the base send WR.
      Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
      Tested-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
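      
      The set_page callback scheme from 4c67e2bf can be modelled in
      userspace as follows. This is a simplified sketch, not the kernel's
      ib_sg_to_pages: toy_sg, toy_mr, toy_sg_to_pages() and toy_set_page()
      are hypothetical names, every entry is assumed to have a non-zero
      length and to occupy its own pages, and the merging and alignment
      checks a real helper needs are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatterlist entry: a DMA address and a byte length. */
struct toy_sg {
    uint64_t addr;
    uint32_t len;
};

/* Driver-supplied hook, called once per page address; non-zero stops. */
typedef int (*set_page_fn)(void *ctx, uint64_t page_addr);

/*
 * Walk the entries and hand each page address to set_page. page_size
 * must be a power of two. Returns the number of entries fully mapped.
 */
size_t toy_sg_to_pages(const struct toy_sg *sg, size_t nents,
                       uint64_t page_size, set_page_fn set_page, void *ctx)
{
    uint64_t mask = ~(page_size - 1);

    for (size_t i = 0; i < nents; i++) {
        uint64_t page = sg[i].addr & mask;
        uint64_t end = sg[i].addr + sg[i].len;

        for (; page < end; page += page_size)
            if (set_page(ctx, page))
                return i;
    }
    return nents;
}

/* Example driver state: the page vector is written in place, no copy. */
struct toy_mr {
    uint64_t pages[8];
    size_t npages;
};

int toy_set_page(void *ctx, uint64_t page_addr)
{
    struct toy_mr *mr = ctx;

    if (mr->npages == 8)
        return -1;      /* page vector is full */
    mr->pages[mr->npages++] = page_addr;
    return 0;
}
```

      Because the driver's callback stores each address directly in its own
      format, no intermediate shadow page vector is needed.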
  13. 21 Oct, 2015 2 commits
  14. 08 Oct, 2015 2 commits
  15. 28 Sep, 2015 1 commit
  16. 30 Aug, 2015 10 commits
    • IB/uverbs: Enable device removal when there are active user space applications · 036b1063
      Yishai Hadas authored
      Enables uverbs_remove_one to succeed even when there are running IB
      applications working with the given ib device.  This functionality
      enables a HW device to be unbound/reset even when there are running
      user space applications using it.
      
      It exposes a new IB kernel API named 'disassociate_ucontext' which lets
      a driver detach its HW resources from a given user context without
      crashing/terminating the application. If a driver implements the
      above API and registers with ib_uverbs, there will be no dependency
      between its device and its uverbs_device. Upon calling remove_one of
      ib_uverbs, the call should return after disassociating the open HW
      resources without waiting for clients to disconnect. If the driver
      doesn't implement this API there will be no change to current behaviour
      and uverbs_remove_one will return only when the last client has
      disconnected and the reference count on the uverbs device has become 0.
      
      If the lower driver device was removed, any application will continue
      working over a zombie HCA, and further calls will end with an
      immediate error.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Shachar Raindel <raindel@mellanox.com>
      Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/core: Make ib_dealloc_pd return void · 7dd78647
      Jason Gunthorpe authored
      The majority of callers never check the return value, and even if they
      did, they can't do anything about a failure.
      
      All possible failure cases represent a bug in the caller, so just
      WARN_ON inside the function instead.
      
      This fixes a few random errors:
       net/rds/iw.c infinite loops while it fails. (racing with EBUSY?)
      
      This also lays the groundwork to get rid of error return from the
      drivers. Most drivers do not error, the few that do are broken since
      it cannot be handled.
      
      Since uverbs can legitimately make use of EBUSY, open code the
      check.
      Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/core: Guarantee that a local_dma_lkey is available · 96249d70
      Jason Gunthorpe authored
      Every single ULP requires a local_dma_lkey to do anything with
      a QP, so let us ensure one exists for every PD created.
      
      If the driver can supply a global local_dma_lkey then use that, otherwise
      ask the driver to create a local-use, all-physical-memory MR associated
      with the new PD.
      Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Reviewed-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
      Acked-by: Christoph Hellwig <hch@infradead.org>
      Reviewed-by: Steve Wise <swise@opengridcomputing.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Tested-by: Ira Weiny <ira.weiny@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx4: Implement ib_device callbacks · e26be1bf
      Moni Shoua authored
      get_netdev: get the net_device on the physical port of the IB transport port. In
      port aggregation mode it is required to return the netdev of the active port.
      
      modify_gid: notification of a change in the RoCE GID cache. Handle this by
      writing to the hardware GID table. The indexes in the cache and hardware
      tables may not match, so a translation is required when modifying a QP or
      creating an address handle.
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/core: Add RoCE GID table management · 03db3a2d
      Matan Barak authored
      RoCE GIDs are based on IP addresses configured on Ethernet net-devices
      which relate to the RDMA (RoCE) device port.
      
      Currently, each of the low-level drivers that support RoCE (ocrdma,
      mlx4) manages its own RoCE port GID table. As there's nothing which is
      essentially vendor specific, we generalize that, and enhance the RDMA
      core GID cache to do this job.
      
      In order to populate the GID table, we listen for events:
      
      (a) netdev up/down/change_addr events - if a netdev is built onto
          our RoCE device, we need to add/delete its IPs. This involves
          adding all GIDs related to this ndev, adding default GIDs, etc.
      
      (b) inet events - add new GIDs (according to the IP addresses)
          to the table.
      
      For programming the port RoCE GID table, providers must implement
      the add_gid and del_gid callbacks.
      
      RoCE GID management requires us to state the associated net_device
      alongside the GID. This information is necessary in order to manage
      the GID table. For example, when a net_device is removed, its
      associated GIDs need to be removed as well.
      
      RoCE mandates generating a default GID for each port, based on the
      related net-device's IPv6 link local. In contrast to the GID based on
      the regular IPv6 link-local (as we generate GID per IP address),
      the default GID is also available when the net device is down (in
      order to support loopback).
      
      Locking is done as follows:
      The patch modifies the GID table code both for new RoCE drivers
      implementing the add_gid/del_gid callbacks and for current RoCE and
      IB drivers that do not. The flows for updating the table are
      different, so the locking requirements are too.
      
      While updating the RoCE GID table, protection against multiple writers is
      achieved via mutex_lock(&table->lock). Since writing to a table
      requires us to find an entry (possibly a free entry) in the table and
      then modify it, this mutex protects both the find_gid and write_gid,
      ensuring the atomicity of the action.
      Each entry in the GID cache is protected by rwlock. In RoCE, writing
      (which usually results from a netdev notifier) involves invoking the
      vendor's add_gid and del_gid callbacks, which could sleep.
      Therefore, an invalid flag is added for each entry. Updates for RoCE are
      done via a workqueue, thus sleeping is permitted.
      
      In IB, updates are done in write_lock_irq(&device->cache.lock), thus
      write_gid isn't allowed to sleep and add_gid/del_gid are not called.
      
      When passing net-device into/out-of the GID cache, the device
      is always passed held (dev_hold).
      
      The code uses a single work item for updating all RDMA devices,
      following a netdev or inet notifier.
      
      The patch moves the cache from being a client (which was incorrect,
      as the cache is part of the IB infrastructure) to being explicitly
      initialized/freed when a device is registered/removed.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
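      
      The writer-side locking described in 03db3a2d (one mutex held across
      both the find and the write) can be sketched in userspace with a
      pthread mutex. This is a rough model under stated assumptions, not the
      kernel code: all toy_* names are hypothetical, the pthread mutex stands
      in for table->lock, and the per-entry rwlock, the invalid flag and the
      vendor add_gid/del_gid callbacks are not modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define TOY_GID_LEN 16
#define TOY_TABLE_SIZE 4

struct toy_gid_entry {
    unsigned char gid[TOY_GID_LEN];
    bool valid;
};

struct toy_gid_table {
    pthread_mutex_t lock;   /* stands in for the kernel's table->lock */
    struct toy_gid_entry entries[TOY_TABLE_SIZE];
};

void toy_table_init(struct toy_gid_table *t)
{
    memset(t->entries, 0, sizeof(t->entries));
    pthread_mutex_init(&t->lock, NULL);
}

/*
 * Index of the valid entry holding `gid`, or of the first free entry
 * when `gid` is NULL; -1 if there is none. Caller must hold t->lock.
 */
static int toy_find(const struct toy_gid_table *t, const unsigned char *gid)
{
    for (int i = 0; i < TOY_TABLE_SIZE; i++) {
        const struct toy_gid_entry *e = &t->entries[i];

        if (gid ? (e->valid && !memcmp(e->gid, gid, TOY_GID_LEN))
                : !e->valid)
            return i;
    }
    return -1;
}

/* Find-then-write under one lock, so two writers cannot pick one slot. */
int toy_add_gid(struct toy_gid_table *t, const unsigned char *gid)
{
    pthread_mutex_lock(&t->lock);
    int ix = toy_find(t, gid);
    if (ix < 0) {
        ix = toy_find(t, NULL);
        if (ix >= 0) {
            memcpy(t->entries[ix].gid, gid, TOY_GID_LEN);
            t->entries[ix].valid = true;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return ix;              /* slot index, or -1 if the table is full */
}

int toy_del_gid(struct toy_gid_table *t, const unsigned char *gid)
{
    pthread_mutex_lock(&t->lock);
    int ix = toy_find(t, gid);
    if (ix >= 0)
        t->entries[ix].valid = false;
    pthread_mutex_unlock(&t->lock);
    return ix;              /* freed slot index, or -1 if not found */
}
```

      A sleeping lock is acceptable here for the same reason the commit
      gives: RoCE updates run from a workqueue, where sleeping is permitted.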
    • IB/core: Drop ib_alloc_fast_reg_mr · d9f272c5
      Sagi Grimberg authored
      Fully replaced by a more generic and suitable
      ib_alloc_mr.
      Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB: Modify ib_create_mr API · 9bee178b
      Sagi Grimberg authored
      Use ib_alloc_mr with specific parameters.
      Change the existing callers.
      Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/core: Get rid of redundant verb ib_destroy_mr · 8b91ffc1
      Sagi Grimberg authored
      This was added with the idea of uniting all mr allocation
      and deallocation routines, but the fact is we have a single
      deallocation routine already, ib_dereg_mr.
      
      And, move mlx5_ib_destroy_mr specific logic into mlx5_ib_dereg_mr
      (includes only signature stuff for now).
      
      And, fixup the only callers (iser/isert) accordingly.
      Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/core: Find the network device matching connection parameters · 9268f72d
      Yotam Kenneth authored
      In the case of IPoIB, and maybe in other cases, the network device is
      managed by an upper-layer protocol (ULP). In order to expose this
      network device to other users of the IB device, let ULPs implement
      a callback that returns the network device according to connection
      parameters.
      
      The IB device and port, together with the P_Key and the GID should
      be enough to uniquely identify the ULP net device. However, in current
      kernels there can be multiple IPoIB interfaces created with the same GID.
      Furthermore, such configuration may be desirable to support ipvlan-like
      configurations for RDMA CM with IPoIB.  To resolve the device in these
      cases the code will also take the IP address as an additional input.
      Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: Haggai Eran <haggaie@mellanox.com>
      Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
      Signed-off-by: Shachar Raindel <raindel@mellanox.com>
      Signed-off-by: Guy Shapiro <guysh@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/core: lock client data with lists_rwsem · 7c1eb45a
      Haggai Eran authored
      An ib_client callback that is called with the lists_rwsem locked only for
      read is protected from changes to the IB client lists, but not from
      ib_unregister_device() freeing its client data. This is because
      ib_unregister_device() will remove the device from the device list with
      lists_rwsem locked for write, but perform the rest of the cleanup,
      including the call to remove() without that lock.
      
      Mark client data that is undergoing de-registration with a new going_down
      flag in the client data context. Lock the client data list with lists_rwsem
      for write in addition to using the spinlock, so that functions calling the
      callback would be able to lock only lists_rwsem for read and let callbacks
      sleep.
      
      Since ib_unregister_client() now marks the client data context, no need for
      remove() to search the context again, so pass the client data directly to
      remove() callbacks.
      Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: Haggai Eran <haggaie@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>