1. 04 Feb, 2015 1 commit
  2. 30 Jan, 2015 1 commit
  3. 26 Jan, 2015 1 commit
  4. 15 Jan, 2015 1 commit
    • Ying Xue's avatar
      rhashtable: Fix race in rhashtable_destroy() and use regular work_struct · 57699a40
      Ying Xue authored
      When we put our declared work task in the global workqueue with
      schedule_delayed_work(), its delay parameter is always zero.
      Therefore, we should define a regular work in rhashtable structure
      instead of a delayed work.
      
      By the way, we add a condition to check whether resizing functions
      are NULL before cancelling the work, avoiding to cancel an
      uninitialized work.
      
      Lastly, while we wait for all work items we submitted before to run
      to completion with cancel_delayed_work(), ht->mutex has been taken in
      rhashtable_destroy(). Moreover, cancel_delayed_work() doesn't return
      until all work items are accomplished, and when work items are
      scheduled, the work's function - rht_deferred_worker() will be called.
      However, as rht_deferred_worker() also needs to acquire the lock,
      deadlock might happen at the moment as the lock is already held before.
      So if the cancel work function is moved out of the lock covered scope,
      this will avoid the deadlock.
      
      Fixes: 97defe1e ("rhashtable: Per bucket locks & deferred expansion/shrinking")
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      57699a40
  5. 13 Jan, 2015 2 commits
  6. 08 Jan, 2015 6 commits
  7. 03 Jan, 2015 7 commits
    • Thomas Graf's avatar
      rhashtable: Supports for nulls marker · f89bd6f8
      Thomas Graf authored
      In order to allow for wider usage of rhashtable, use a special nulls
      marker to terminate each chain. The reason for not using the existing
      nulls_list is that the prev pointer usage would not be valid as entries
      can be linked in two different buckets at the same time.
      
      The 4 nulls base bits can be set through the rhashtable_params structure
      like this:
      
      struct rhashtable_params params = {
              [...]
              .nulls_base = (1U << RHT_BASE_SHIFT),
      };
      
      This reduces the hash length from 32 bits to 27 bits.
      Signed-off-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f89bd6f8
    • Thomas Graf's avatar
      rhashtable: Per bucket locks & deferred expansion/shrinking · 97defe1e
      Thomas Graf authored
      Introduces an array of spinlocks to protect bucket mutations. The number
      of spinlocks per CPU is configurable and selected based on the hash of
      the bucket. This allows for parallel insertions and removals of entries
      which do not share a lock.
      
      The patch also defers expansion and shrinking to a worker queue which
      allows insertion and removal from atomic context. Insertions and
      deletions may occur in parallel to it and are only held up briefly
      while the particular bucket is linked or unzipped.
      
      Mutations of the bucket table pointer is protected by a new mutex, read
      access is RCU protected.
      
      In the event of an expansion or shrinking, the new bucket table allocated
      is exposed as a so called future table as soon as the resize process
      starts.  Lookups, deletions, and insertions will briefly use both tables.
      The future table becomes the main table after an RCU grace period and
      initial linking of the old to the new table was performed. Optimization
      of the chains to make use of the new number of buckets follows only the
      new table is in use.
      
      The side effect of this is that during that RCU grace period, a bucket
      traversal using any rht_for_each() variant on the main table will not see
      any insertions performed during the RCU grace period which would at that
      point land in the future table. The lookup will see them as it searches
      both tables if needed.
      
      Having multiple insertions and removals occur in parallel requires nelems
      to become an atomic counter.
      Signed-off-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      97defe1e
    • Thomas Graf's avatar
      nft_hash: Remove rhashtable_remove_pprev() · 897362e4
      Thomas Graf authored
      The removal function of nft_hash currently stores a reference to the
      previous element during lookup which is used to optimize removal later
      on. This was possible because a lock is held throughout calling
      rhashtable_lookup() and rhashtable_remove().
      
      With the introdution of deferred table resizing in parallel to lookups
      and insertions, the nftables lock will no longer synchronize all
      table mutations and the stored pprev may become invalid.
      
      Removing this optimization makes removal slightly more expensive on
      average but allows taking the resize cost out of the insert and
      remove path.
      Signed-off-by: default avatarThomas Graf <tgraf@suug.ch>
      Cc: netfilter-devel@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      897362e4
    • Thomas Graf's avatar
      rhashtable: Factor out bucket_tail() function · b8e1943e
      Thomas Graf authored
      Subsequent patches will require access to the bucket tail. Access
      to the tail is relatively cheap as the automatic resizing of the
      table should keep the number of entries per bucket to no more
      than 0.75 on average.
      Signed-off-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b8e1943e
    • Thomas Graf's avatar
      rhashtable: Convert bucket iterators to take table and index · 88d6ed15
      Thomas Graf authored
      This patch is in preparation to introduce per bucket spinlocks. It
      extends all iterator macros to take the bucket table and bucket
      index. It also introduces a new rht_dereference_bucket() to
      handle protected accesses to buckets.
      
      It introduces a barrier() to the RCU iterators to the prevent
      the compiler from caching the first element.
      
      The lockdep verifier is introduced as stub which always succeeds
      and properly implement in the next patch when the locks are
      introduced.
      Signed-off-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      88d6ed15
    • Thomas Graf's avatar
    • Thomas Graf's avatar
      rhashtable: Do hashing inside of rhashtable_lookup_compare() · 8d24c0b4
      Thomas Graf authored
      Hash the key inside of rhashtable_lookup_compare() like
      rhashtable_lookup() does. This allows to simplify the hashing
      functions and keep them private.
      Signed-off-by: default avatarThomas Graf <tgraf@suug.ch>
      Cc: netfilter-devel@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8d24c0b4
  8. 10 Dec, 2014 1 commit
    • Daniel Borkmann's avatar
      net: replace remaining users of arch_fast_hash with jhash · 87545899
      Daniel Borkmann authored
      This patch effectively reverts commit 500f8087 ("net: ovs: use CRC32
      accelerated flow hash if available"), and other remaining arch_fast_hash()
      users such as from nfsd via commit 6282cd56 ("NFSD: Don't hand out
      delegations for 30 seconds after recalling them.") where it has been used
      as a hash function for bloom filtering.
      
      While we think that these users are actually not much of concern, it has
      been requested to remove the arch_fast_hash() library bits that arose
      from [1] entirely as per recent discussion [2]. The main argument is that
      using it as a hash may introduce bias due to its linearity (see avalanche
      criterion) and thus makes it less clear (though we tried to document that)
      when this security/performance trade-off is actually acceptable for a
      general purpose library function.
      
      Lets therefore avoid any further confusion on this matter and remove it to
      prevent any future accidental misuse of it. For the time being, this is
      going to make hashing of flow keys a bit more expensive in the ovs case,
      but future work could reevaluate a different hashing discipline.
      
        [1] https://patchwork.ozlabs.org/patch/299369/
        [2] https://patchwork.ozlabs.org/patch/418756/
      
      Cc: Neil Brown <neilb@suse.de>
      Cc: Francesco Fusco <fusco@ntop.org>
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      87545899
  9. 24 Nov, 2014 1 commit
  10. 13 Nov, 2014 4 commits
  11. 19 Sep, 2014 1 commit
  12. 03 Sep, 2014 2 commits
    • Ying Xue's avatar
      lib/rhashtable: allow user to set the minimum shifts of shrinking · 94000176
      Ying Xue authored
      Although rhashtable library allows user to specify a quiet big size
      for user's created hash table, the table may be shrunk to a
      very small size - HASH_MIN_SIZE(4) after object is removed from
      the table at the first time. Subsequently, even if the total amount
      of objects saved in the table is quite lower than user's initial
      setting in a long time, the hash table size is still dynamically
      adjusted by rhashtable_shrink() or rhashtable_expand() each time
      object is inserted or removed from the table. However, as
      synchronize_rcu() has to be called when table is shrunk or
      expanded by the two functions, we should permit user to set the
      minimum table size through configuring the minimum number of shifts
      according to user specific requirement, avoiding these expensive
      actions of shrinking or expanding because of calling synchronize_rcu().
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      94000176
    • Pablo Neira Ayuso's avatar
      rhashtable: fix lockdep splat in rhashtable_destroy() · ae82ddcf
      Pablo Neira Ayuso authored
      No need for rht_dereference() from rhashtable_destroy() since the
      existing callers don't hold the mutex when invoking this function
      from:
      
      1) Netlink, this is called in case of memory allocation errors in the
         initialization path, no nl_sk_hash_lock is held.
      2) Netfilter, this is called from the rcu callback, no nfnl_lock is
         held either.
      
      I think it's reasonable to assume that the caller has to make sure
      that no hash resizing may happen before releasing the bucket array.
      Therefore, the caller should be responsible for releasing this in a
      safe way, document this to make people aware of it.
      
      This resolves a rcu lockdep splat in nft_hash:
      
      ===============================
      [ INFO: suspicious RCU usage. ]
      3.16.0+ #178 Not tainted
      -------------------------------
      lib/rhashtable.c:596 suspicious rcu_dereference_protected() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 1, debug_locks = 1
      1 lock held by ksoftirqd/2/18:
       #0:  (rcu_callback){......}, at: [<ffffffff810918fd>] rcu_process_callbacks+0x27e/0x4c7
      
      stack backtrace:
      CPU: 2 PID: 18 Comm: ksoftirqd/2 Not tainted 3.16.0+ #178
      Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012
       0000000000000001 ffff88011706bb68 ffffffff8143debc 0000000000000000
       ffff880117062610 ffff88011706bb98 ffffffff81077515 ffff8800ca041a50
       0000000000000004 ffff8800ca386480 ffff8800ca041a00 ffff88011706bbb8
      Call Trace:
       [<ffffffff8143debc>] dump_stack+0x4e/0x68
       [<ffffffff81077515>] lockdep_rcu_suspicious+0xfa/0x103
       [<ffffffff81228b1b>] rhashtable_destroy+0x46/0x52
       [<ffffffffa06f21a7>] nft_hash_destroy+0x73/0x82 [nft_hash]
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      ae82ddcf
  13. 26 Aug, 2014 1 commit
  14. 14 Aug, 2014 2 commits
  15. 02 Aug, 2014 1 commit