1. 03 Dec, 2012 2 commits
    • tun: only queue packets on device · 5d097109
      Michael S. Tsirkin authored
      Historically tun supported two modes of operation:
      - in default mode, a small number of packets would get queued
        at the device, the rest would be queued in qdisc
      - in one queue mode, all packets would get queued at the device
      This might have made sense up to a point where we made the
      queue depth for both modes the same and set it to
      a huge value (500) so unless the consumer
      is stuck the chance of losing packets is small.
      Thus in practice both modes behave the same, but the
      default mode has some problems:
      - if packets are never consumed, fragments are never orphaned,
        which causes a DoS for senders using zero copy transmit
      - overrun errors are hard to diagnose: the fifo error counter is incremented
        only once, so you cannot distinguish between
        userspace that is stuck and a transient failure, and
        tcpdump on the device does not show any traffic
      Userspace solves this simply by enabling IFF_ONE_QUEUE
      but there seems to be little point in not doing the
      right thing for everyone, by default.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tuntap: attach queue 0 before registering netdevice · eb0fb363
      Jason Wang authored
      Currently we attach queue 0 after registering the netdevice. This leads to
      calling netif_set_real_num_{tx|rx}_queues() after the netdevice is registered.
      Since we allow tun/tap to have a maximum of 1024 queues, this may cause a huge
      number of uevents to be injected into userspace, since we create 2048 kobjects
      and then remove 2046. Solve this problem by attaching queue 0 and setting the
      real number of queues before registering the netdevice.
      Reported-by: Jiri Slaby <jslaby@suse.cz>
      Tested-by: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 26 Nov, 2012 2 commits
  3. 23 Nov, 2012 1 commit
  4. 19 Nov, 2012 1 commit
  5. 02 Nov, 2012 1 commit
  6. 01 Nov, 2012 6 commits
    • tuntap: choose the txq based on rxq · 96442e42
      Jason Wang authored
      This patch implements a simple multiqueue flow steering policy for tun/tap:
      tx follows rx. The idea is simple: choose the txq based on the rxq the flow
      came from. Flows are identified through the rxhash of an skb, and the
      hash-to-queue mappings are recorded in an hlist with an ageing timer to retire
      stale mappings. A mapping is created when tun receives a packet from userspace
      and is queried in .ndo_select_queue().
      I ran a concurrent TCP_CRR test and didn't see any mapping manipulation helpers
      in perf top, so the overhead can be neglected.
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
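The tx-follows-rx policy described in this commit can be modelled in userspace as a small sketch. Everything here is an assumption made for illustration: the names `flow_table`, `flow_record`, `flow_age` and `flow_select_queue`, the table size, and the ageing period are hypothetical, and a direct-mapped array (a collision simply overwrites the older entry) stands in for the hlist the commit message mentions. It is not the driver's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FLOW_BUCKETS 1024 /* hypothetical table size */
#define FLOW_MAX_AGE 30   /* hypothetical ageing period, in ticks */

struct flow_entry {
    uint32_t rxhash;  /* flow hash carried by the packet */
    uint16_t queue;   /* queue the flow was last received on */
    uint64_t updated; /* tick of the most recent update */
    int in_use;       /* 0 once the entry has been retired */
};

struct flow_table {
    struct flow_entry buckets[FLOW_BUCKETS];
    uint16_t num_queues;
};

void flow_table_init(struct flow_table *t, uint16_t num_queues)
{
    assert(num_queues > 0);
    memset(t, 0, sizeof(*t));
    t->num_queues = num_queues;
}

/* Record (or refresh) the mapping when a packet of this flow arrives on `queue`. */
void flow_record(struct flow_table *t, uint32_t rxhash, uint16_t queue,
                 uint64_t now)
{
    struct flow_entry *e = &t->buckets[rxhash % FLOW_BUCKETS];

    e->rxhash = rxhash;
    e->queue = queue;
    e->updated = now;
    e->in_use = 1;
}

/* Retire mappings that have been idle for more than FLOW_MAX_AGE ticks. */
void flow_age(struct flow_table *t, uint64_t now)
{
    for (size_t i = 0; i < FLOW_BUCKETS; i++) {
        struct flow_entry *e = &t->buckets[i];

        if (e->in_use && now - e->updated > FLOW_MAX_AGE)
            e->in_use = 0;
    }
}

/* Pick the transmit queue: the recorded queue if a live mapping exists,
 * otherwise fall back to spreading flows by hash. */
uint16_t flow_select_queue(const struct flow_table *t, uint32_t rxhash)
{
    const struct flow_entry *e = &t->buckets[rxhash % FLOW_BUCKETS];

    if (e->in_use && e->rxhash == rxhash && e->queue < t->num_queues)
        return e->queue;
    return (uint16_t)(rxhash % t->num_queues);
}
```

In this model a caller would invoke `flow_record()` on the receive path, `flow_select_queue()` where the driver would run `.ndo_select_queue()`, and `flow_age()` from a periodic timer.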
    • tuntap: add ioctl to attach or detach a file from tuntap device · cde8b15f
      Jason Wang authored
      Sometimes userspace may need to activate/deactivate a queue; this can be done
      by detaching a file from, or attaching it to, the tuntap device.
      This patch introduces a new ioctl, TUNSETQUEUE, which can be used to do
      this. The flag IFF_ATTACH_QUEUE was introduced to do the attaching, while
      IFF_DETACH_QUEUE was introduced to do the detaching.
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
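A minimal userspace sketch of how TUNSETQUEUE might be called, assuming a Linux system whose `linux/if_tun.h` provides `TUNSETQUEUE`, `IFF_ATTACH_QUEUE` and `IFF_DETACH_QUEUE`. The helper name `tun_set_queue` is chosen here for illustration, and `fd` is assumed to be a descriptor already bound to a tun/tap device.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/* Enable (attach) or disable (detach) the queue behind an open tun/tap fd.
 * Returns 0 on success, -1 with errno set on failure. */
int tun_set_queue(int fd, int enable)
{
    struct ifreq ifr;

    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = enable ? IFF_ATTACH_QUEUE : IFF_DETACH_QUEUE;
    return ioctl(fd, TUNSETQUEUE, &ifr);
}
```

Calling `tun_set_queue(fd, 0)` would deactivate the queue and `tun_set_queue(fd, 1)` would reactivate it; an invalid descriptor simply makes the ioctl fail with `EBADF`.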
    • tuntap: multiqueue support · c8d68e6b
      Jason Wang authored
      This patch converts tun/tap to a multiqueue device and exposes the queues
      as multiple file descriptors to userspace. Internally, each tun_file is
      abstracted as a queue, and an array of pointers to tun_file structures is
      stored in the tun_struct device, so multiple tun_files can be
      attached to the device as multiple queues.
      When choosing the txq, we first try to identify a flow through its rxhash; if it
      does not have one, we try the recorded rxq and use that to choose
      the transmit queue. This policy may be changed in the future.
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
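From userspace, "multiple file descriptors as multiple queues" could look roughly like the sketch below. It assumes the `IFF_MULTI_QUEUE` flag and `TUNSETIFF` ioctl from the mainline `linux/if_tun.h` header (the flag is not named in the commit message above), and the helper name `tun_open_queues` is hypothetical. Opening `/dev/net/tun` normally requires suitable privileges.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/* Open `queues` file descriptors on one tap device; each fd is one queue.
 * Returns the number of queues opened, or -1 on error (with every fd that
 * was opened so far closed again). */
int tun_open_queues(const char *name, int queues, int *fds)
{
    struct ifreq ifr;
    int i;

    if (!name || !fds || queues < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_MULTI_QUEUE;
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

    for (i = 0; i < queues; i++) {
        int fd = open("/dev/net/tun", O_RDWR);

        if (fd < 0)
            goto fail;
        /* Binding another fd to the same name adds one more queue. */
        if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
            close(fd);
            goto fail;
        }
        fds[i] = fd;
    }
    return queues;

fail:
    while (--i >= 0)
        close(fds[i]);
    return -1;
}
```

Each returned descriptor can then be read and written independently, for example by one worker thread per queue.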
    • tuntap: RCUify dereferencing between tun_struct and tun_file · 6e914fc7
      Jason Wang authored
      RCU is introduced in this patch to synchronize the dereferences between
      tun_struct and tun_file. All tun_{get|put} calls are replaced with RCU; the
      dereference from one to the other must be done under the rtnl lock or inside an
      RCU read-side critical section.
      This is needed for the following patches, since one of the goals of multiqueue
      tuntap is to allow adding or removing queues during workload. Without RCU, the
      control path would hold tx locks when adding or removing queues (which may cause
      some delay), and it is hard to change the number of queues without stopping the
      net device. With the help of RCU, there is also no need for tun_file to hold a
      refcnt to tun_struct.
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tuntap: move socket to tun_file · 54f968d6
      Jason Wang authored
      Currently tuntap uses the socket receive queue as its tx queue. To
      implement multiple tx queues for tuntap and enable adding and
      removing queues during workload, the first step is to move the socket related
      structures to tun_file. Then multiple fds/sockets can be attached to
      the tuntap device.
      This patch removes tun_sock and moves socket related structures from tun_sock or
      tun_struct to tun_file. Two exceptions are tap_filter and sock_fprog: they are
      still kept in tun_struct since they are used to filter packets for the net
      device rather than per transmit queue (at least I see no requirement for
      that). After these changes, the socket is created and destroyed during file open
      and close (instead of device creation and destruction), and the socket structures
      can be dereferenced from tun_file instead of from the file of the tun_struct
      structure.
      For a persistent device, since we purge during detaching and don't queue any
      packets when no interface is attached, there is no behaviour change before and
      after this patch, so the changes are transparent to userspace. To keep
      attributes such as sndbuf, socket filter and vnet header, these are
      re-initialized after a new interface is attached to a persistent device.
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 26 Oct, 2012 2 commits
    • cgroup: net_cls: Rework update socket logic · 6a328d8c
      Daniel Wagner authored
      The cgroup logic part of net_cls is very similar to the one in
      net_prio. Let's streamline the net_cls logic with the net_prio one.
      The net_prio update logic was changed by the following commit (note there
      were some changes necessary later on):
      commit 406a3c63
      Author: John Fastabend <john.r.fastabend@intel.com>
      Date:   Fri Jul 20 10:39:25 2012 +0000
          net: netprio_cgroup: rework update socket logic
          Instead of updating the sk_cgrp_prioidx struct field on every send
          this only updates the field when a task is moved via cgroup
          This allows sockets that may be used by a kernel worker thread
          to be managed. For example in the iscsi case today a user can
          put iscsid in a netprio cgroup and control traffic will be sent
          with the correct sk_cgrp_prioidx value set but as soon as data
          is sent the kernel worker thread issues a send and sk_cgrp_prioidx
          is updated with the kernel worker thread's value which is the
          default case.
          It seems more correct to only update the field when the user
          explicitly sets it via control group infrastructure. This allows
          the users to manage sockets that may be used with other threads.
      Since classid is now updated when the task is moved between the
      cgroups, we don't have to call sock_update_classid() from various
      places to ensure we are always using the latest classid value.
      [v2: Use iterate_fd() instead of open coding]
      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: <netdev@vger.kernel.org>
      Cc: <cgroups@vger.kernel.org>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cgroup: net_cls: Pass in task to sock_update_classid() · fd9a08a7
      Daniel Wagner authored
      sock_update_classid() assumes that the update operation is always
      applied to the current task. sock_update_classid() needs to know
      which task to work on in order to be able to migrate tasks between
      cgroups using the struct cgroup_subsys attach() callback.
      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: <netdev@vger.kernel.org>
      Cc: <cgroups@vger.kernel.org>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 14 Sep, 2012 1 commit
  9. 14 Aug, 2012 1 commit
  10. 09 Aug, 2012 1 commit
  11. 30 Jul, 2012 2 commits
  12. 22 Jul, 2012 2 commits
  13. 20 Jul, 2012 1 commit
    • tun: fix a crash bug and a memory leak · b09e786b
      Mikulas Patocka authored
      This patch fixes a crash
      tun_chr_close -> netdev_run_todo -> tun_free_netdev -> sk_release_kernel ->
      sock_release -> iput(SOCK_INODE(sock))
      introduced by commit 1ab5ecb9
      The problem is that this socket is embedded in struct tun_struct and has
      no inode, so iput is called on an invalid inode, which modifies invalid memory
      and may cause a crash.
      sock_release also decrements sockets_in_use, this causes a bug that
      "sockets: used" field in /proc/*/net/sockstat keeps on decreasing when
      creating and closing tun devices.
      This patch introduces a flag SOCK_EXTERNALLY_ALLOCATED that instructs
      sock_release to not free the inode and not decrement sockets_in_use,
      fixing both memory corruption and sockets_in_use underflow.
      It should be backported to 3.3 and 3.4 stable.
      Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: stable@kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 16 Jul, 2012 1 commit
  15. 10 May, 2012 1 commit
    • drivers/net: Convert compare_ether_addr to ether_addr_equal · 2e42e474
      Joe Perches authored
      Use the new bool function ether_addr_equal to add
      some clarity and reduce the likelihood for misuse
      of compare_ether_addr for sorting.
      Done via cocci script:
      $ cat compare_ether_addr.cocci
      @@
      expression a,b;
      @@
      -	!compare_ether_addr(a, b)
      +	ether_addr_equal(a, b)
      @@
      expression a,b;
      @@
      -	compare_ether_addr(a, b)
      +	!ether_addr_equal(a, b)
      @@
      expression a,b;
      @@
      -	!ether_addr_equal(a, b) == 0
      +	ether_addr_equal(a, b)
      @@
      expression a,b;
      @@
      -	!ether_addr_equal(a, b) != 0
      +	!ether_addr_equal(a, b)
      @@
      expression a,b;
      @@
      -	ether_addr_equal(a, b) == 0
      +	!ether_addr_equal(a, b)
      @@
      expression a,b;
      @@
      -	ether_addr_equal(a, b) != 0
      +	ether_addr_equal(a, b)
      @@
      expression a,b;
      @@
      -	!!ether_addr_equal(a, b)
      +	ether_addr_equal(a, b)
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
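The semantic difference behind this conversion can be illustrated with a userspace sketch. The two functions below are stand-ins written for this example (hence the `_sketch` suffix) and use `memcmp` rather than the kernel's own implementations; only the return-value conventions are meant to match: the old helper yields zero on equality and carries no ordering information, while the new one is a plain boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6 /* length of an Ethernet MAC address in bytes */

/* Old-style convention: 0 when equal, non-zero otherwise.
 * The result is NOT an ordering, so it is unsuitable as a sort comparator. */
unsigned compare_ether_addr_sketch(const unsigned char *a,
                                   const unsigned char *b)
{
    assert(a && b);
    return memcmp(a, b, ETH_ALEN) != 0;
}

/* New-style convention: true when the two addresses are equal. */
bool ether_addr_equal_sketch(const unsigned char *a, const unsigned char *b)
{
    assert(a && b);
    return memcmp(a, b, ETH_ALEN) == 0;
}
```

With these conventions, `!compare_ether_addr_sketch(a, b)` and `ether_addr_equal_sketch(a, b)` agree for every pair of addresses, which is the first rewrite rule in the script above.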
  16. 28 Mar, 2012 1 commit
  17. 12 Mar, 2012 1 commit
    • tun: don't hold network namespace by tun sockets · 1ab5ecb9
      Stanislav Kinsbursky authored
      v3: added previously removed sock_put() to the tun_release() callback, because
      sk_release_kernel() doesn't drop the socket reference.
      v2: sk_release_kernel() used for socket release. Dummy tun_release() is
      required for sk_release_kernel() ---> sock_release() ---> sock->ops->release()
      TUN was designed to destroy its socket on network namespace shutdown. But this
      will never happen for a persistent device, because its socket holds the network
      namespace.
      This patch removes the holding of the network namespace by the TUN socket and
      replaces it by creating the socket in init_net and then changing its net to the
      desired one. On shutdown the socket is moved back to init_net prior to the
      final put.
      Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 15 Feb, 2012 1 commit
  19. 22 Nov, 2011 1 commit
  20. 16 Nov, 2011 2 commits
  21. 17 Aug, 2011 1 commit
  22. 27 Jul, 2011 1 commit
    • net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared · 550fd08c
      Neil Horman authored
      After the last patch, we are left in a state in which only drivers calling
      ether_setup have IFF_TX_SKB_SHARING set (we assume that drivers touching real
      hardware call ether_setup for their net_devices and don't hold any state in
      their skbs).  There are a handful of drivers that violate this assumption of
      course, and need to be fixed up.  This patch identifies those drivers, and marks
      them as not being able to support the safe transmission of skbs by clearing the
      IFF_TX_SKB_SHARING flag in priv_flags.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Karsten Keil <isdn@linux-pingi.de>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Krzysztof Halasa <khc@pm.waw.pl>
      CC: "John W. Linville" <linville@tuxdriver.com>
      CC: Greg Kroah-Hartman <gregkh@suse.de>
      CC: Marcel Holtmann <marcel@holtmann.org>
      CC: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 16 Jun, 2011 1 commit
    • tun: teach the tun/tap driver to support netpoll · bebd097a
      Neil Horman authored
      Commit 8d8fc29d changed the behavior of slave
      devices with regard to netpoll.  Specifically it created a mutually exclusive
      relationship between being a slave and a netpoll-capable device.  This creates
      problems for KVM because guests relied on netconsole being active on a slave
      device of a bridge.  Ideally libvirtd could just attach netconsole to the bridge
      device instead, but that's currently infeasible, because while the bridge device
      supports netpoll, it requires that all slave interfaces also support it, and the
      tun/tap driver currently does not.  The most direct solution is to teach tun/tap
      to support netpoll, which is implemented by the patch below.
      I've not tested this yet, but it's pretty straightforward.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Reported-by: Rik van Riel <riel@redhat.com>
      CC: Rik van Riel <riel@redhat.com>
      CC: Maxim Krasnyansky <maxk@qualcomm.com>
      CC: Cong Wang <amwang@redhat.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Tested-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: WANG Cong <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@conan.davemloft.net>
  24. 11 Jun, 2011 1 commit
    • virtio_net: introduce VIRTIO_NET_HDR_F_DATA_VALID · 10a8d94a
      Jason Wang authored
      There's no need for the guest to validate the checksum if it has been
      validated by host NICs. So this patch introduces a new flag,
      VIRTIO_NET_HDR_F_DATA_VALID, which is used to bypass checksum
      examination in the guest. The backend (tap/macvtap) may set this flag when
      it meets skbs with CHECKSUM_UNNECESSARY to save CPU utilization.
      No feature negotiation is needed as old drivers just ignore this flag.
      Iperf shows 12%-30% performance improvement for UDP traffic. For TCP,
      when GRO is on there is no difference, as it produces skbs with partial
      checksum. But when GRO is disabled, a 20% or even higher improvement
      can be measured by netperf.
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
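How the flag travels from backend to guest can be sketched in userspace, assuming the `struct virtio_net_hdr` and `VIRTIO_NET_HDR_F_DATA_VALID` definitions from `linux/virtio_net.h`. The `csum_state` enum and both function names are hypothetical stand-ins for the kernel's per-skb checksum state and the real tap/virtio_net code paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <linux/virtio_net.h>

/* Stand-in for the kernel's per-skb checksum state (hypothetical names). */
enum csum_state { CSUM_NONE, CSUM_UNNECESSARY };

/* Backend side: build the header and advertise that the checksum was
 * already validated, for example by the host NIC. */
void backend_fill_hdr(struct virtio_net_hdr *hdr, enum csum_state state)
{
    assert(hdr != NULL);
    memset(hdr, 0, sizeof(*hdr));
    if (state == CSUM_UNNECESSARY)
        hdr->flags |= VIRTIO_NET_HDR_F_DATA_VALID;
}

/* Guest side: the checksum check can be skipped when the flag is set.
 * A driver that does not know the flag never tests it and keeps
 * validating as before, which is why no feature negotiation is needed. */
bool guest_can_skip_csum(const struct virtio_net_hdr *hdr)
{
    return (hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID) != 0;
}
```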
  25. 09 Jun, 2011 3 commits
  26. 05 Jun, 2011 1 commit
  27. 05 May, 2011 1 commit