1. 12 Mar, 2015 1 commit
    • Eric W. Biederman's avatar
      net: Introduce possible_net_t · 0c5c9fb5
      Eric W. Biederman authored
      Having to say
      > #ifdef CONFIG_NET_NS
      > 	struct net *net;
      > #endif
      in structures is a little bit wordy and a little bit error prone.
      Instead it is possible to say:
      > typedef struct {
      > #ifdef CONFIG_NET_NS
      >       struct net *net;
      > #endif
      > } possible_net_t;
      And then in a header say:
      > 	possible_net_t net;
      Which is cleaner and easier to use and easier to test, as the
      possible_net_t is always there no matter what the compile options.
      Further this allows read_pnet and write_pnet to be functions in all
      cases which is better at catching typos.
      This change adds possible_net_t, updates the definitions of read_pnet
      and write_pnet, updates optional struct net * variables that
      write_pnet uses on to have the type possible_net_t, and finally fixes
      up the b0rked users of read_pnet and write_pnet.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  2. 09 Mar, 2015 1 commit
    • Francesco Ruggeri's avatar
      net: delete stale packet_mclist entries · 82f17091
      Francesco Ruggeri authored
      When an interface is deleted from a net namespace the ifindex in the
      corresponding entries in PF_PACKET sockets' mclists becomes stale.
      This can create inconsistencies if later an interface with the same ifindex
      is moved from a different namespace (not that unlikely since ifindexes are
      In particular we saw problems with dev->promiscuity, resulting
      in "promiscuity touches roof, set promiscuity failed. promiscuity
      feature of device might be broken" warnings and EOVERFLOW failures of
      This patch deletes the mclist entries for interfaces that are deleted.
      Since this now causes setsockopt(PACKET_DROP_MEMBERSHIP) to fail with
      EADDRNOTAVAIL if called after the interface is deleted, also make
      packet_mc_drop not fail.
      Signed-off-by: default avatarFrancesco Ruggeri <fruggeri@arista.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  3. 02 Mar, 2015 1 commit
  4. 01 Mar, 2015 3 commits
  5. 24 Feb, 2015 1 commit
    • Alexander Drozdov's avatar
      af_packet: don't pass empty blocks for PACKET_V3 · 41a50d62
      Alexander Drozdov authored
      Before da413eec ("packet: Fixed TPACKET V3 to signal poll when block is
      closed rather than every packet") poll listening for an af_packet socket was
      not signaled if there was no packets to process. After the patch poll is
      signaled evety time when block retire timer expires. That happens because
      af_packet closes the current block on timeout even if the block is empty.
      Passing empty blocks to the user not only wastes CPU but also wastes ring
      buffer space increasing probability of packets dropping on small timeouts.
      Signed-off-by: default avatarAlexander Drozdov <al.drozdov@gmail.com>
      Cc: Dan Collins <dan@dcollins.co.nz>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Guy Harris <guy@alum.mit.edu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  6. 21 Feb, 2015 1 commit
  7. 17 Jan, 2015 1 commit
    • Johannes Berg's avatar
      netlink: make nlmsg_end() and genlmsg_end() void · 053c095a
      Johannes Berg authored
      Contrary to common expectations for an "int" return, these functions
      return only a positive value -- if used correctly they cannot even
      return 0 because the message header will necessarily be in the skb.
      This makes the very common pattern of
        if (genlmsg_end(...) < 0) { ... }
      be a whole bunch of dead code. Many places also simply do
        return nlmsg_end(...);
      and the caller is expected to deal with it.
      This also commonly (at least for me) causes errors, because it is very
      common to write
        if (my_function(...))
          /* error condition */
      and if my_function() does "return nlmsg_end()" this is of course wrong.
      Additionally, there's not a single place in the kernel that actually
      needs the message length returned, and if anyone needs it later then
      it'll be very easy to just use skb->len there.
      Remove this, and make the functions void. This removes a bunch of dead
      code as described above. The patch adds lines because I did
      -	return nlmsg_end(...);
      +	nlmsg_end(...);
      +	return 0;
      I could have preserved all the function's return values by returning
      skb->len, but instead I've audited all the places calling the affected
      functions and found that none cared. A few places actually compared
      the return value with <= 0 in dump functionality, but that could just
      be changed to < 0 with no change in behaviour, so I opted for the more
      efficient version.
      One instance of the error I've made numerous times now is also present
      in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
      check for <0 or <=0 and thus broke out of the loop every single time.
      I've preserved this since it will (I think) have caused the messages to
      userspace to be formatted differently with just a single message for
      every SKB returned to userspace. It's possible that this isn't needed
      for the tools that actually use this, but I don't even know what they
      are so couldn't test that changing this behaviour would be acceptable.
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  8. 13 Jan, 2015 1 commit
  9. 12 Jan, 2015 1 commit
  10. 11 Jan, 2015 1 commit
  11. 22 Dec, 2014 1 commit
    • Dan Collins's avatar
      packet: Fixed TPACKET V3 to signal poll when block is closed rather than every packet · da413eec
      Dan Collins authored
      Make TPACKET_V3 signal poll when block is closed rather than for every
      packet. Side effect is that poll will be signaled when block retire
      timer expires which didn't previously happen. Issue was visible when
      sending packets at a very low frequency such that all blocks are retired
      before packets are received by TPACKET_V3. This caused avoidable packet
      loss. The fix ensures that the signal is sent when blocks are closed
      which covers the normal path where the block is filled as well as the
      path where the timer expires. The case where a block is filled without
      moving to the next block (ie. all blocks are full) will still cause poll
      to be signaled.
      Signed-off-by: default avatarDan Collins <dan@dcollins.co.nz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  12. 09 Dec, 2014 2 commits
  13. 24 Nov, 2014 4 commits
  14. 21 Nov, 2014 1 commit
  15. 05 Nov, 2014 1 commit
    • David S. Miller's avatar
      net: Add and use skb_copy_datagram_msg() helper. · 51f3d02b
      David S. Miller authored
      This encapsulates all of the skb_copy_datagram_iovec() callers
      with call argument signature "skb, offset, msghdr->msg_iov, length".
      When we move to iov_iters in the networking, the iov_iter object will
      sit in the msghdr.
      Having a helper like this means there will be less places to touch
      during that transformation.
      Based upon descriptions and patch from Al Viro.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  16. 01 Sep, 2014 2 commits
  17. 29 Aug, 2014 1 commit
  18. 25 Aug, 2014 1 commit
  19. 21 Aug, 2014 1 commit
  20. 29 Jul, 2014 1 commit
  21. 15 Jul, 2014 1 commit
  22. 24 Apr, 2014 2 commits
  23. 22 Apr, 2014 1 commit
  24. 11 Apr, 2014 1 commit
    • David S. Miller's avatar
      net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      David S. Miller authored
      Several spots in the kernel perform a sequence like:
      	skb_queue_tail(&sk->s_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      But at the moment we place the SKB onto the socket receive queue it
      can be consumed and freed up.  So this skb->len access is potentially
      to freed up memory.
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      And finally, no actual implementation of this callback actually uses
      the length argument.  And since nobody actually cared about it's
      value, lots of call sites pass arbitrary values in such as '0' and
      even '1'.
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  25. 03 Apr, 2014 2 commits
  26. 28 Mar, 2014 1 commit
    • Daniel Borkmann's avatar
      packet: respect devices with LLTX flag in direct xmit · 43279500
      Daniel Borkmann authored
      Quite often it can be useful to test with dummy or similar
      devices as a blackhole sink for skbs. Such devices are only
      equipped with a single txq, but marked as NETIF_F_LLTX as
      they do not require locking their internal queues on xmit
      (or implement locking themselves). Therefore, rather use
      HARD_TX_{UN,}LOCK API, so that NETIF_F_LLTX will be respected.
      trafgen mmap/TX_RING example against dummy device with config
      foo: { fill(0xff, 64) } results in the following performance
      improvements for such scenarios on an ordinary Core i7/2.80GHz:
       Performance counter stats for 'trafgen -i foo -o du0 -n100000000' (10 runs):
         160,975,944,159 instructions:k            #    0.55  insns per cycle          ( +-  0.09% )
         293,319,390,278 cycles:k                  #    0.000 GHz                      ( +-  0.35% )
             192,501,104 branch-misses:k                                               ( +-  1.63% )
                     831 context-switches:k                                            ( +-  9.18% )
                       7 cpu-migrations:k                                              ( +-  7.40% )
                  69,382 cache-misses:k            #    0.010 % of all cache refs      ( +-  2.18% )
             671,552,021 cache-references:k                                            ( +-  1.29% )
            22.856401569 seconds time elapsed                                          ( +-  0.33% )
       Performance counter stats for 'trafgen -i foo -o du0 -n100000000' (10 runs):
         133,788,739,692 instructions:k            #    0.92  insns per cycle          ( +-  0.06% )
         145,853,213,256 cycles:k                  #    0.000 GHz                      ( +-  0.17% )
              59,867,100 branch-misses:k                                               ( +-  4.72% )
                     384 context-switches:k                                            ( +-  3.76% )
                       6 cpu-migrations:k                                              ( +-  6.28% )
                  70,304 cache-misses:k            #    0.077 % of all cache refs      ( +-  1.73% )
              90,879,408 cache-references:k                                            ( +-  1.35% )
            11.719372413 seconds time elapsed                                          ( +-  0.24% )
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  27. 26 Mar, 2014 1 commit
  28. 28 Feb, 2014 1 commit
    • Daniel Borkmann's avatar
      packet: allow to transmit +4 byte in TX_RING slot for VLAN case · 52f1454f
      Daniel Borkmann authored
      Commit 57f89bfa ("network: Allow af_packet to transmit +4 bytes
      for VLAN packets.") added the possibility for non-mmaped frames to
      send extra 4 byte for VLAN header so the MTU increases from 1500 to
      1504 byte, for example.
      Commit cbd89acb ("af_packet: fix for sending VLAN frames via
      packet_mmap") attempted to fix that for the mmap part but was
      reverted as it caused regressions while using eth_type_trans()
      on output path.
      Lets just act analogous to 57f89bfa and add a similar logic
      to TX_RING. We presume size_max as overcharged with +4 bytes and
      later on after skb has been built by tpacket_fill_skb() check
      for ETH_P_8021Q header on packets larger than normal MTU. Can
      be easily reproduced with a slightly modified trafgen in mmap(2)
      mode, test cases:
       { fill(0xff, 12) const16(0x8100) fill(0xff, <1504|1505>) }
       { fill(0xff, 12) const16(0x0806) fill(0xff, <1500|1501>) }
      Note that we need to do the test right after tpacket_fill_skb()
      as sockets can have PACKET_LOSS set where we would not fail but
      instead just continue to traverse the ring.
      Reported-by: default avatarMathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Cc: Ben Greear <greearb@candelatech.com>
      Cc: Phil Sutter <phil@nwl.cc>
      Tested-by: default avatarMathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  29. 18 Feb, 2014 1 commit
  30. 16 Feb, 2014 1 commit
    • Daniel Borkmann's avatar
      packet: check for ndo_select_queue during queue selection · 0fd5d57b
      Daniel Borkmann authored
      Mathias reported that on an AMD Geode LX embedded board (ALiX)
      with ath9k driver PACKET_QDISC_BYPASS, introduced in commit
      d346a3fa ("packet: introduce PACKET_QDISC_BYPASS socket
      option"), triggers a WARN_ON() coming from the driver itself
      via 066dae93 ("ath9k: rework tx queue selection and fix
      queue stopping/waking").
      The reason why this happened is that ndo_select_queue() call
      is not invoked from direct xmit path i.e. for ieee80211 subsystem
      that sets queue and TID (similar to 802.1d tag) which is being
      put into the frame through 802.11e (WMM, QoS). If that is not
      set, pending frame counter for e.g. ath9k can get messed up.
      So the WARN_ON() in ath9k is absolutely legitimate. Generally,
      the hw queue selection in ieee80211 depends on the type of
      traffic, and priorities are set according to ieee80211_ac_numbers
      mapping; working in a similar way as DiffServ only on a lower
      layer, so that the AP can favour frames that have "real-time"
      requirements like voice or video data frames.
      Therefore, check for presence of ndo_select_queue() in netdev
      ops and, if available, invoke it with a fallback handler to
      __packet_pick_tx_queue(), so that driver such as bnx2x, ixgbe,
      or mlx4 can still select a hw queue for transmission in
      relation to the current CPU while e.g. ieee80211 subsystem
      can make their own choices.
      Reported-by: default avatarMathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  31. 22 Jan, 2014 1 commit