1. 28 Feb, 2012 1 commit
  2. 24 Feb, 2012 1 commit
    • Jozsef Kadlecsik's avatar
      netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2) · 7d367e06
      Jozsef Kadlecsik authored
      Marcell Zambo and Janos Farago noticed and reported that when
      new conntrack entries are added via netlink and the conntrack table
      gets full, soft lockup happens. This is because the nf_conntrack_lock
      is held while nf_conntrack_alloc is called, which is in turn wants
      to lock nf_conntrack_lock while evicting entries from the full table.
      
      The patch fixes the soft lockup with limiting the holding of the
      nf_conntrack_lock to the minimum, where it's absolutely required.
      It required to extend (and thus change) nf_conntrack_hash_insert
      so that it makes sure conntrack and ctnetlink do not add the same entry
      twice to the conntrack table.
      Signed-off-by: default avatarJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      7d367e06
  3. 21 Feb, 2012 1 commit
    • Greg Rose's avatar
      rtnetlink: Fix problem with buffer allocation · 115c9b81
      Greg Rose authored
      Implement a new netlink attribute type IFLA_EXT_MASK.  The mask
      is a 32 bit value that can be used to indicate to the kernel that
      certain extended ifinfo values are requested by the user application.
      At this time the only mask value defined is RTEXT_FILTER_VF to
      indicate that the user wants the ifinfo dump to send information
      about the VFs belonging to the interface.
      
      This patch fixes a bug in which certain applications do not have
      large enough buffers to accommodate the extra information returned
      by the kernel with large numbers of SR-IOV virtual functions.
      Those applications will not send the new netlink attribute with
      the interface info dump request netlink messages so they will
      not get unexpectedly large request buffers returned by the kernel.
      
      Modifies the rtnl_calcit function to traverse the list of net
      devices and compute the minimum buffer size that can hold the
      info dumps of all matching devices based upon the filter passed
      in via the new netlink attribute filter mask.  If no filter
      mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE.
      
      With this change it is possible to add yet to be defined netlink
      attributes to the dump request which should make it fairly extensible
      in the future.
      Signed-off-by: default avatarGreg Rose <gregory.v.rose@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      115c9b81
  4. 15 Feb, 2012 6 commits
    • Ulisses Furquim's avatar
      Bluetooth: Remove usage of __cancel_delayed_work() · 6de32750
      Ulisses Furquim authored
      __cancel_delayed_work() is being used in some paths where we cannot
      sleep waiting for the delayed work to finish. However, that function
      might return while the timer is running and the work will be queued
      again. Replace the calls with safer cancel_delayed_work() version
      which spins until the timer handler finishes on other CPUs and
      cancels the delayed work.
      Signed-off-by: default avatarUlisses Furquim <ulisses@profusion.mobi>
      Acked-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarJohan Hedberg <johan.hedberg@intel.com>
      6de32750
    • Andre Guedes's avatar
      Bluetooth: Fix potential deadlock · a51cd2be
      Andre Guedes authored
      We don't need to use the _sync variant in hci_conn_hold and
      hci_conn_put to cancel conn->disc_work delayed work. This way
      we avoid potential deadlocks like this one reported by lockdep.
      
      ======================================================
      [ INFO: possible circular locking dependency detected ]
      3.2.0+ #1 Not tainted
      -------------------------------------------------------
      kworker/u:1/17 is trying to acquire lock:
       (&hdev->lock){+.+.+.}, at: [<ffffffffa0041155>] hci_conn_timeout+0x62/0x158 [bluetooth]
      
      but task is already holding lock:
       ((&(&conn->disc_work)->work)){+.+...}, at: [<ffffffff81035751>] process_one_work+0x11a/0x2bf
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #2 ((&(&conn->disc_work)->work)){+.+...}:
             [<ffffffff81057444>] lock_acquire+0x8a/0xa7
             [<ffffffff81034ed1>] wait_on_work+0x3d/0xaa
             [<ffffffff81035b54>] __cancel_work_timer+0xac/0xef
             [<ffffffff81035ba4>] cancel_delayed_work_sync+0xd/0xf
             [<ffffffffa00554b0>] smp_chan_create+0xde/0xe6 [bluetooth]
             [<ffffffffa0056160>] smp_conn_security+0xa3/0x12d [bluetooth]
             [<ffffffffa0053640>] l2cap_connect_cfm+0x237/0x2e8 [bluetooth]
             [<ffffffffa004239c>] hci_proto_connect_cfm+0x2d/0x6f [bluetooth]
             [<ffffffffa0046ea5>] hci_event_packet+0x29d1/0x2d60 [bluetooth]
             [<ffffffffa003dde3>] hci_rx_work+0xd0/0x2e1 [bluetooth]
             [<ffffffff810357af>] process_one_work+0x178/0x2bf
             [<ffffffff81036178>] worker_thread+0xce/0x152
             [<ffffffff81039a03>] kthread+0x95/0x9d
             [<ffffffff812e7754>] kernel_thread_helper+0x4/0x10
      
      -> #1 (slock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+...}:
             [<ffffffff81057444>] lock_acquire+0x8a/0xa7
             [<ffffffff812e553a>] _raw_spin_lock_bh+0x36/0x6a
             [<ffffffff81244d56>] lock_sock_nested+0x24/0x7f
             [<ffffffffa004d96f>] lock_sock+0xb/0xd [bluetooth]
             [<ffffffffa0052906>] l2cap_chan_connect+0xa9/0x26f [bluetooth]
             [<ffffffffa00545f8>] l2cap_sock_connect+0xb3/0xff [bluetooth]
             [<ffffffff81243b48>] sys_connect+0x69/0x8a
             [<ffffffff812e6579>] system_call_fastpath+0x16/0x1b
      
      -> #0 (&hdev->lock){+.+.+.}:
             [<ffffffff81056d06>] __lock_acquire+0xa80/0xd74
             [<ffffffff81057444>] lock_acquire+0x8a/0xa7
             [<ffffffff812e3870>] __mutex_lock_common+0x48/0x38e
             [<ffffffff812e3c75>] mutex_lock_nested+0x2a/0x31
             [<ffffffffa0041155>] hci_conn_timeout+0x62/0x158 [bluetooth]
             [<ffffffff810357af>] process_one_work+0x178/0x2bf
             [<ffffffff81036178>] worker_thread+0xce/0x152
             [<ffffffff81039a03>] kthread+0x95/0x9d
             [<ffffffff812e7754>] kernel_thread_helper+0x4/0x10
      
      other info that might help us debug this:
      
      Chain exists of:
        &hdev->lock --> slock-AF_BLUETOOTH-BTPROTO_L2CAP --> (&(&conn->disc_work)->work)
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock((&(&conn->disc_work)->work));
                                     lock(slock-AF_BLUETOOTH-BTPROTO_L2CAP);
                                     lock((&(&conn->disc_work)->work));
        lock(&hdev->lock);
      
       *** DEADLOCK ***
      
      2 locks held by kworker/u:1/17:
       #0:  (hdev->name){.+.+.+}, at: [<ffffffff81035751>] process_one_work+0x11a/0x2bf
       #1:  ((&(&conn->disc_work)->work)){+.+...}, at: [<ffffffff81035751>] process_one_work+0x11a/0x2bf
      
      stack backtrace:
      Pid: 17, comm: kworker/u:1 Not tainted 3.2.0+ #1
      Call Trace:
       [<ffffffff812e06c6>] print_circular_bug+0x1f8/0x209
       [<ffffffff81056d06>] __lock_acquire+0xa80/0xd74
       [<ffffffff81021ef2>] ? arch_local_irq_restore+0x6/0xd
       [<ffffffff81022bc7>] ? vprintk+0x3f9/0x41e
       [<ffffffff81057444>] lock_acquire+0x8a/0xa7
       [<ffffffffa0041155>] ? hci_conn_timeout+0x62/0x158 [bluetooth]
       [<ffffffff812e3870>] __mutex_lock_common+0x48/0x38e
       [<ffffffffa0041155>] ? hci_conn_timeout+0x62/0x158 [bluetooth]
       [<ffffffff81190fd6>] ? __dynamic_pr_debug+0x6d/0x6f
       [<ffffffffa0041155>] ? hci_conn_timeout+0x62/0x158 [bluetooth]
       [<ffffffff8105320f>] ? trace_hardirqs_off+0xd/0xf
       [<ffffffff812e3c75>] mutex_lock_nested+0x2a/0x31
       [<ffffffffa0041155>] hci_conn_timeout+0x62/0x158 [bluetooth]
       [<ffffffff810357af>] process_one_work+0x178/0x2bf
       [<ffffffff81035751>] ? process_one_work+0x11a/0x2bf
       [<ffffffff81055af3>] ? lock_acquired+0x1d0/0x1df
       [<ffffffffa00410f3>] ? hci_acl_disconn+0x65/0x65 [bluetooth]
       [<ffffffff81036178>] worker_thread+0xce/0x152
       [<ffffffff810407ed>] ? finish_task_switch+0x45/0xc5
       [<ffffffff810360aa>] ? manage_workers.isra.25+0x16a/0x16a
       [<ffffffff81039a03>] kthread+0x95/0x9d
       [<ffffffff812e7754>] kernel_thread_helper+0x4/0x10
       [<ffffffff812e5db4>] ? retint_restore_args+0x13/0x13
       [<ffffffff8103996e>] ? __init_kthread_worker+0x55/0x55
       [<ffffffff812e7750>] ? gs_change+0x13/0x13
      Signed-off-by: default avatarAndre Guedes <andre.guedes@openbossa.org>
      Signed-off-by: default avatarVinicius Costa Gomes <vinicius.gomes@openbossa.org>
      Reviewed-by: default avatarUlisses Furquim <ulisses@profusion.mobi>
      Acked-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarJohan Hedberg <johan.hedberg@intel.com>
      a51cd2be
    • Octavian Purdila's avatar
      Bluetooth: silence lockdep warning · b5a30dda
      Octavian Purdila authored
      Since bluetooth uses multiple protocols types, to avoid lockdep
      warnings, we need to use different lockdep classes (one for each
      protocol type).
      
      This is already done in bt_sock_create but it misses a couple of cases
      when new connections are created. This patch corrects that to fix the
      following warning:
      
      <4>[ 1864.732366] =======================================================
      <4>[ 1864.733030] [ INFO: possible circular locking dependency detected ]
      <4>[ 1864.733544] 3.0.16-mid3-00007-gc9a0f62 #3
      <4>[ 1864.733883] -------------------------------------------------------
      <4>[ 1864.734408] t.android.btclc/4204 is trying to acquire lock:
      <4>[ 1864.734869]  (rfcomm_mutex){+.+.+.}, at: [<c14970ea>] rfcomm_dlc_close+0x15/0x30
      <4>[ 1864.735541]
      <4>[ 1864.735549] but task is already holding lock:
      <4>[ 1864.736045]  (sk_lock-AF_BLUETOOTH){+.+.+.}, at: [<c1498bf7>] lock_sock+0xa/0xc
      <4>[ 1864.736732]
      <4>[ 1864.736740] which lock already depends on the new lock.
      <4>[ 1864.736750]
      <4>[ 1864.737428]
      <4>[ 1864.737437] the existing dependency chain (in reverse order) is:
      <4>[ 1864.738016]
      <4>[ 1864.738023] -> #1 (sk_lock-AF_BLUETOOTH){+.+.+.}:
      <4>[ 1864.738549]        [<c1062273>] lock_acquire+0x104/0x140
      <4>[ 1864.738977]        [<c13d35c1>] lock_sock_nested+0x58/0x68
      <4>[ 1864.739411]        [<c1493c33>] l2cap_sock_sendmsg+0x3e/0x76
      <4>[ 1864.739858]        [<c13d06c3>] __sock_sendmsg+0x50/0x59
      <4>[ 1864.740279]        [<c13d0ea2>] sock_sendmsg+0x94/0xa8
      <4>[ 1864.740687]        [<c13d0ede>] kernel_sendmsg+0x28/0x37
      <4>[ 1864.741106]        [<c14969ca>] rfcomm_send_frame+0x30/0x38
      <4>[ 1864.741542]        [<c1496a2a>] rfcomm_send_ua+0x58/0x5a
      <4>[ 1864.741959]        [<c1498447>] rfcomm_run+0x441/0xb52
      <4>[ 1864.742365]        [<c104f095>] kthread+0x63/0x68
      <4>[ 1864.742742]        [<c14d5182>] kernel_thread_helper+0x6/0xd
      <4>[ 1864.743187]
      <4>[ 1864.743193] -> #0 (rfcomm_mutex){+.+.+.}:
      <4>[ 1864.743667]        [<c1061ada>] __lock_acquire+0x988/0xc00
      <4>[ 1864.744100]        [<c1062273>] lock_acquire+0x104/0x140
      <4>[ 1864.744519]        [<c14d2c70>] __mutex_lock_common+0x3b/0x33f
      <4>[ 1864.744975]        [<c14d303e>] mutex_lock_nested+0x2d/0x36
      <4>[ 1864.745412]        [<c14970ea>] rfcomm_dlc_close+0x15/0x30
      <4>[ 1864.745842]        [<c14990d9>] __rfcomm_sock_close+0x5f/0x6b
      <4>[ 1864.746288]        [<c1499114>] rfcomm_sock_shutdown+0x2f/0x62
      <4>[ 1864.746737]        [<c13d275d>] sys_socketcall+0x1db/0x422
      <4>[ 1864.747165]        [<c14d42f0>] syscall_call+0x7/0xb
      Signed-off-by: default avatarOctavian Purdila <octavian.purdila@intel.com>
      Acked-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarJohan Hedberg <johan.hedberg@intel.com>
      b5a30dda
    • Vinicius Costa Gomes's avatar
      Bluetooth: Fix using an absolute timeout on hci_conn_put() · 33166063
      Vinicius Costa Gomes authored
      queue_delayed_work() expects a relative time for when that work
      should be scheduled.
      Signed-off-by: default avatarVinicius Costa Gomes <vinicius.gomes@openbossa.org>
      Acked-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarJohan Hedberg <johan.hedberg@intel.com>
      33166063
    • Andrzej Kaczmarek's avatar
      Bluetooth: l2cap_set_timer needs jiffies as timeout value · 6e1da683
      Andrzej Kaczmarek authored
      After moving L2CAP timers to workqueues l2cap_set_timer expects timeout
      value to be specified in jiffies but constants defined in miliseconds
      are used. This makes timeouts unreliable when CONFIG_HZ is not set to
      1000.
      
      __set_chan_timer macro still uses jiffies as input to avoid multiple
      conversions from/to jiffies for sk_sndtimeo value which is already
      specified in jiffies.
      Signed-off-by: default avatarAndrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
      Ackec-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarJohan Hedberg <johan.hedberg@intel.com>
      6e1da683
    • Johan Hedberg's avatar
      Bluetooth: Remove bogus inline declaration from l2cap_chan_connect · 4aa832c2
      Johan Hedberg authored
      As reported by Dan Carpenter this function causes a Sparse warning and
      shouldn't be declared inline:
      
      include/net/bluetooth/l2cap.h:837:30 error: marked inline, but without a
      definition"
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarJohan Hedberg <johan.hedberg@intel.com>
      Acked-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      4aa832c2
  5. 10 Feb, 2012 1 commit
  6. 09 Feb, 2012 1 commit
  7. 04 Feb, 2012 1 commit
    • Julian Anastasov's avatar
      ipv4: reset flowi parameters on route connect · e6b45241
      Julian Anastasov authored
      Eric Dumazet found that commit 813b3b5d
      (ipv4: Use caller's on-stack flowi as-is in output
      route lookups.) that comes in 3.0 added a regression.
      The problem appears to be that resulting flowi4_oif is
      used incorrectly as input parameter to some routing lookups.
      The result is that when connecting to local port without
      listener if the IP address that is used is not on a loopback
      interface we incorrectly assign RTN_UNICAST to the output
      route because no route is matched by oif=lo. The RST packet
      can not be sent immediately by tcp_v4_send_reset because
      it expects RTN_LOCAL.
      
      	So, change ip_route_connect and ip_route_newports to
      update the flowi4 fields that are input parameters because
      we do not want unnecessary binding to oif.
      
      	To make it clear what are the input parameters that
      can be modified during lookup and to show which fields of
      floiw4 are reused add a new function to update the flowi4
      structure: flowi4_update_output.
      
      Thanks to Yurij M. Plotnikov for providing a bug report including a
      program to reproduce the problem.
      
      Thanks to Eric Dumazet for tracking the problem down to
      tcp_v4_send_reset and providing initial fix.
      Reported-by: default avatarYurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Acked-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e6b45241
  8. 01 Feb, 2012 1 commit
    • Arun Sharma's avatar
      net: Disambiguate kernel message · efcdbf24
      Arun Sharma authored
      Some of our machines were reporting:
      
      TCP: too many of orphaned sockets
      
      even when the number of orphaned sockets was well below the
      limit.
      
      We print a different message depending on whether we're out
      of TCP memory or there are too many orphaned sockets.
      
      Also move the check out of line and cleanup the messages
      that were printed.
      Signed-off-by: default avatarArun Sharma <asharma@fb.com>
      Suggested-by: default avatarMohan Srinivasan <mohan@fb.com>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: David Miller <davem@davemloft.net>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      efcdbf24
  9. 30 Jan, 2012 1 commit
  10. 27 Jan, 2012 1 commit
  11. 26 Jan, 2012 1 commit
  12. 24 Jan, 2012 1 commit
  13. 23 Jan, 2012 1 commit
  14. 22 Jan, 2012 4 commits
  15. 17 Jan, 2012 1 commit
  16. 12 Jan, 2012 1 commit
    • Eric Dumazet's avatar
      net_sched: sfq: add optional RED on top of SFQ · ddecf0f4
      Eric Dumazet authored
      Adds an optional Random Early Detection on each SFQ flow queue.
      
      Traditional SFQ limits count of packets, while RED permits to also
      control number of bytes per flow, and adds ECN capability as well.
      
      1) We dont handle the idle time management in this RED implementation,
      since each 'new flow' begins with a null qavg. We really want to address
      backlogged flows.
      
      2) if headdrop is selected, we try to ecn mark first packet instead of
      currently enqueued packet. This gives faster feedback for tcp flows
      compared to traditional RED [ marking the last packet in queue ]
      
      Example of use :
      
      tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 4sec sfq \
      	limit 3000 headdrop flows 512 divisor 16384 \
      	redflowlimit 100000 min 8000 max 60000 probability 0.20 ecn
      
      qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop
      flows 512/16384 divisor 16384
       ewma 6 min 8000b max 60000b probability 0.2 ecn
       prob_mark 0 prob_mark_head 4876 prob_drop 6131
       forced_mark 0 forced_mark_head 0 forced_drop 0
       Sent 1175211782 bytes 777537 pkt (dropped 6131, overlimits 11007
      requeues 0)
       rate 99483Kbit 8219pps backlog 689392b 456p requeues 0
      
      In this test, with 64 netperf TCP_STREAM sessions, 50% using ECN enabled
      flows, we can see number of packets CE marked is smaller than number of
      drops (for non ECN flows)
      
      If same test is run, without RED, we can check backlog is much bigger.
      
      qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop
      flows 512/16384 divisor 16384
       Sent 1148683617 bytes 795006 pkt (dropped 0, overlimits 0 requeues 0)
       rate 98429Kbit 8521pps backlog 1221290b 841p requeues 0
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      CC: Dave Taht <dave.taht@gmail.com>
      Tested-by: default avatarDave Taht <dave.taht@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ddecf0f4
  17. 09 Jan, 2012 1 commit
  18. 07 Jan, 2012 1 commit
    • Glauber Costa's avatar
      net: fix sock_clone reference mismatch with tcp memcontrol · f3f511e1
      Glauber Costa authored
      Sockets can also be created through sock_clone. Because it copies
      all data in the sock structure, it also copies the memcg-related pointer,
      and all should be fine. However, since we now use reference counts in
      socket creation, we are left with some sockets that have no reference
      counts. It matters when we destroy them, since it leads to a mismatch.
      Signed-off-by: default avatarGlauber Costa <glommer@parallels.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: Greg Thelen <gthelen@google.com>
      CC: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com>
      CC: Laurent Chavey <chavey@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f3f511e1
  19. 05 Jan, 2012 2 commits
    • Eric Dumazet's avatar
      net_sched: red: split red_parms into parms and vars · eeca6688
      Eric Dumazet authored
      This patch splits the red_parms structure into two components.
      
      One holding the RED 'constant' parameters, and one containing the
      variables.
      
      This permits a size reduction of GRED qdisc, and is a preliminary step
      to add an optional RED unit to SFQ.
      
      SFQRED will have a single red_parms structure shared by all flows, and a
      private red_vars per flow.
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      CC: Dave Taht <dave.taht@gmail.com>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eeca6688
    • Joe Perches's avatar
      9p: Reduce object size with CONFIG_NET_9P_DEBUG · 5d385153
      Joe Perches authored
      Reduce object size by deduplicating formats.
      
      Use vsprintf extension %pV.
      Rename P9_DPRINTK uses to p9_debug, align arguments.
      Add function for _p9_debug and macro to add __func__.
      Add missing "\n"s to p9_debug uses.
      Remove embedded function names as p9_debug adds it.
      Remove P9_EPRINTK macro and convert use to pr_<level>.
      Add and use pr_fmt and pr_<level>.
      
      $ size fs/9p/built-in.o*
         text	   data	    bss	    dec	    hex	filename
        62133	    984	  16000	  79117	  1350d	fs/9p/built-in.o.new
        67342	    984	  16928	  85254	  14d06	fs/9p/built-in.o.old
      $ size net/9p/built-in.o*
         text	   data	    bss	    dec	    hex	filename
        88792	   4148	  22024	 114964	  1c114	net/9p/built-in.o.new
        94072	   4148	  23232	 121452	  1da6c	net/9p/built-in.o.old
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarEric Van Hensbergen <ericvh@gmail.com>
      5d385153
  20. 04 Jan, 2012 4 commits
  21. 02 Jan, 2012 1 commit
  22. 31 Dec, 2011 1 commit
  23. 30 Dec, 2011 2 commits
    • Josh Hunt's avatar
      IPv6: Avoid taking write lock for /proc/net/ipv6_route · 32b293a5
      Josh Hunt authored
      During some debugging I needed to look into how /proc/net/ipv6_route
      operated and in my digging I found its calling fib6_clean_all() which uses
      "write_lock_bh(&table->tb6_lock)" before doing the walk of the table. I
      found this on 2.6.32, but reading the code I believe the same basic idea
      exists currently. Looking at the rtnetlink code they are only calling
      "read_lock_bh(&table->tb6_lock);" via fib6_dump_table(). While I realize
      reading from proc isn't the recommended way of fetching the ipv6 route
      table; taking a write lock seems unnecessary and would probably cause
      network performance issues.
      
      To verify this I loaded up the ipv6 route table and then ran iperf in 3
      cases:
        * doing nothing
        * reading ipv6 route table via proc
          (while :; do cat /proc/net/ipv6_route > /dev/null; done)
        * reading ipv6 route table via rtnetlink
          (while :; do ip -6 route show table all > /dev/null; done)
      
      * Load the ipv6 route table up with:
        * for ((i = 0;i < 4000;i++)); do ip route add unreachable 2000::$i; done
      
      * iperf commands:
        * client: iperf -i 1 -V -c <ipv6 addr>
        * server: iperf -V -s
      
      * iperf results - 3 runs each (in Mbits/sec)
        * nothing: client: 927,927,927 server: 927,927,927
        * proc: client: 179,97,96,113 server: 142,112,133
        * iproute: client: 928,927,928 server: 927,927,927
      
      lock_stat shows taking the write lock is causing the slowdown. Using this
      info I decided to write a version of fib6_clean_all() which replaces
      write_lock_bh(&table->tb6_lock) with read_lock_bh(&table->tb6_lock). With
      this new function I see the same results as with my rtnetlink iperf test.
      Signed-off-by: default avatarJosh Hunt <joshhunt00@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      32b293a5
    • Pavel Emelyanov's avatar
      af_unix: Move CINQ/COUTQ code to helpers · 885ee74d
      Pavel Emelyanov authored
      Currently tcp diag reports rqlen and wqlen values similar to how
      the CINQ/COUTQ iotcls do. To make unix diag report these values
      in the same way move the respective code into helpers.
      Signed-off-by: default avatarPavel Emelyanov <xemul@parallels.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      885ee74d
  24. 28 Dec, 2011 3 commits
  25. 23 Dec, 2011 1 commit