1. 29 Sep, 2015 2 commits
    • Ayala Beker's avatar
      cfg80211: allow changing station capabilities for unassociated stations · 47edb11b
      Ayala Beker authored
      Currently, cfg80211 rejects capability updates for existing entries
      and as a result it's impossible to update entries that were added
      unassociated, but that is necessary to go through the full station
      states from userspace, adding a station before authentication etc.
      Fix this by allowing updates to capabilities for stations that the
      driver (or mac80211) assigned unassociated state. Drivers setting
      the full station state support flag must use the new station type
      for proper operation.
      Signed-off-by: default avatarAyala Beker <ayala.beker@intel.com>
      Signed-off-by: default avatarLuca Coelho <luciano.coelho@intel.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
    • Helmut Schaa's avatar
      mac80211: Copy tx'ed beacons to monitor mode · 35afa588
      Helmut Schaa authored
      When debugging wireless powersave issues on the AP side it's quite helpful
      to see our own beacons that are transmitted by the hardware/driver. However,
      this is not that easy since beacons don't pass through the regular TX queues.
      Preferably drivers would call ieee80211_tx_status also for tx'ed beacons
      but that's not always possible. Hence, just send a copy of each beacon
      generated by ieee80211_beacon_get_tim to monitor devices when they are
      getting fetched by the driver.
      Also add a HW flag IEEE80211_HW_BEACON_TX_STATUS that can be used by
      drivers to indicate that they report TX status for beacons.
      Signed-off-by: default avatarHelmut Schaa <helmut.schaa@googlemail.com>
      (with a fix from Christian Lamparted rolled in)
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
  2. 22 Sep, 2015 5 commits
  3. 26 Aug, 2015 2 commits
  4. 25 Aug, 2015 4 commits
    • Jiri Benc's avatar
      vxlan: fix multiple inclusion of vxlan.h · 48e92c44
      Jiri Benc authored
      The vxlan_get_sk_family inline function was added after the last #endif,
      making multiple inclusion of net/vxlan.h fail. Move it to the proper place.
      Reported-by: default avatarMark Rustad <mark.d.rustad@intel.com>
      Fixes: 705cc62f ("vxlan: provide access function for vxlan socket address family")
      Signed-off-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • David Ahern's avatar
      inetpeer: remove dead code · 2c0027cd
      David Ahern authored
      Remove various inlined functions not referenced in the kernel.
      Signed-off-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Eric Dumazet's avatar
      tcp: refine pacing rate determination · 43e122b0
      Eric Dumazet authored
      When TCP pacing was added back in linux-3.12, we chose
      to apply a fixed ratio of 200 % against current rate,
      to allow probing for optimal throughput even during
      slow start phase, where cwnd can be doubled every other gRTT.
      At Google, we found it was better applying a different ratio
      while in Congestion Avoidance phase.
      This ratio was set to 120 %.
      We've used the normal tcp_in_slow_start() helper for a while,
      then tuned the condition to select the conservative ratio
      as soon as cwnd >= ssthresh/2 :
      - After cwnd reduction, it is safer to ramp up more slowly,
        as we approach optimal cwnd.
      - Initial ramp up (ssthresh == INFINITY) still allows doubling
        cwnd every other RTT.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Eric Dumazet's avatar
      tcp: fix slow start after idle vs TSO/GSO · 6f021c62
      Eric Dumazet authored
      slow start after idle might reduce cwnd, but we perform this
      after first packet was cooked and sent.
      With TSO/GSO, it means that we might send a full TSO packet
      even if cwnd should have been reduced to IW10.
      Moving the SSAI check in skb_entail() makes sense, because
      we slightly reduce number of times this check is done,
      especially for large send() and TCP Small queue callbacks from
      softirq context.
      As Neal pointed out, we also need to perform the check
      if/when receive window opens.
      Following packetdrill test demonstrates the problem
      // Test of slow start after idle
      `sysctl -q net.ipv4.tcp_slow_start_after_idle=1`
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0    setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0    bind(3, ..., ...) = 0
      +0    listen(3, 1) = 0
      +0    < S 0:0(0) win 65535 <mss 1000,sackOK,nop,nop,nop,wscale 7>
      +0    > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
      +.100 < . 1:1(0) ack 1 win 511
      +0    accept(3, ..., ...) = 4
      +0    setsockopt(4, SOL_SOCKET, SO_SNDBUF, [200000], 4) = 0
      +0    write(4, ..., 26000) = 26000
      +0    > . 1:5001(5000) ack 1
      +0    > . 5001:10001(5000) ack 1
      +0    %{ assert tcpi_snd_cwnd == 10 }%
      +.100 < . 1:1(0) ack 10001 win 511
      +0    %{ assert tcpi_snd_cwnd == 20, tcpi_snd_cwnd }%
      +0    > . 10001:20001(10000) ack 1
      +0    > P. 20001:26001(6000) ack 1
      +.100 < . 1:1(0) ack 26001 win 511
      +0    %{ assert tcpi_snd_cwnd == 36, tcpi_snd_cwnd }%
      +4 write(4, ..., 20000) = 20000
      // If slow start after idle works properly, we should send 5 MSS here (cwnd/2)
      +0    > . 26001:31001(5000) ack 1
      +0    %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%
      +0    > . 31001:36001(5000) ack 1
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  5. 24 Aug, 2015 1 commit
  6. 23 Aug, 2015 2 commits
  7. 20 Aug, 2015 12 commits
  8. 19 Aug, 2015 2 commits
    • Nikolay Aleksandrov's avatar
      vrf: vrf_master_ifindex_rcu is not always called with rcu read lock · 18041e31
      Nikolay Aleksandrov authored
      While running net-next I hit this:
      [  634.073119] ===============================
      [  634.073150] [ INFO: suspicious RCU usage. ]
      [  634.073182] 4.2.0-rc6+ #45 Not tainted
      [  634.073213] -------------------------------
      [  634.073244] include/net/vrf.h:38 suspicious rcu_dereference_check()
      [  634.073274]
                     other info that might help us debug this:
      [  634.073307]
                     rcu_scheduler_active = 1, debug_locks = 1
      [  634.073338] 2 locks held by swapper/0/0:
      [  634.073369]  #0:  (((&n->timer))){+.-...}, at: [<ffffffff8112bc35>]
      [  634.073412]  #1:  (slock-AF_INET){+.-...}, at: [<ffffffff8174f0f5>]
      [  634.073450]
                     stack backtrace:
      [  634.073483] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc6+ #45
      [  634.073514] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
      VirtualBox 12/01/2006
      [  634.073545]  0000000000000000 0593ba8242d9ace4 ffff88002fc03b48
      [  634.073612]  0000000000000000 ffffffff81e12500 ffff88002fc03b78
      [  634.073642]  0000000000000000 ffff88002ec4e600 ffffffff81f00f80
      [  634.073669] Call Trace:
      [  634.073694]  <IRQ>  [<ffffffff81803f1b>] dump_stack+0x4c/0x65
      [  634.073728]  [<ffffffff811003c5>] lockdep_rcu_suspicious+0xc5/0x100
      [  634.073763]  [<ffffffff8174eb56>] icmp_route_lookup+0x176/0x5c0
      [  634.073793]  [<ffffffff8174f2fb>] ? icmp_send+0x35b/0x5f0
      [  634.073818]  [<ffffffff8174f274>] ? icmp_send+0x2d4/0x5f0
      [  634.073844]  [<ffffffff8174f3ce>] icmp_send+0x42e/0x5f0
      [  634.073873]  [<ffffffff8170b662>] ipv4_link_failure+0x22/0xa0
      [  634.073899]  [<ffffffff8174bdda>] arp_error_report+0x3a/0x80
      [  634.073926]  [<ffffffff816d6100>] ? neigh_lookup+0x2c0/0x2c0
      [  634.073952]  [<ffffffff816d396e>] neigh_invalidate+0x8e/0x110
      [  634.073984]  [<ffffffff816d62ae>] neigh_timer_handler+0x1ae/0x290
      [  634.074013]  [<ffffffff816d6100>] ? neigh_lookup+0x2c0/0x2c0
      [  634.074013]  [<ffffffff8112bce3>] call_timer_fn+0xb3/0x480
      [  634.074013]  [<ffffffff8112bc35>] ? call_timer_fn+0x5/0x480
      [  634.074013]  [<ffffffff816d6100>] ? neigh_lookup+0x2c0/0x2c0
      [  634.074013]  [<ffffffff8112c2bc>] run_timer_softirq+0x20c/0x430
      [  634.074013]  [<ffffffff810af50e>] __do_softirq+0xde/0x630
      [  634.074013]  [<ffffffff810afc97>] irq_exit+0x117/0x120
      [  634.074013]  [<ffffffff81810976>] smp_apic_timer_interrupt+0x46/0x60
      [  634.074013]  [<ffffffff8180e950>] apic_timer_interrupt+0x70/0x80
      [  634.074013]  <EOI>  [<ffffffff8106b9d6>] ? native_safe_halt+0x6/0x10
      [  634.074013]  [<ffffffff81101d8d>] ? trace_hardirqs_on+0xd/0x10
      [  634.074013]  [<ffffffff81027d43>] default_idle+0x23/0x200
      [  634.074013]  [<ffffffff8102852f>] arch_cpu_idle+0xf/0x20
      [  634.074013]  [<ffffffff810f89ba>] default_idle_call+0x2a/0x40
      [  634.074013]  [<ffffffff810f8dcc>] cpu_startup_entry+0x39c/0x4c0
      [  634.074013]  [<ffffffff817f9cad>] rest_init+0x13d/0x150
      [  634.074013]  [<ffffffff81f69038>] start_kernel+0x4a8/0x4c9
      [  634.074013]  [<ffffffff81f68120>] ?
      [  634.074013]  [<ffffffff81f68339>] x86_64_start_reservations+0x2a/0x2c
      [  634.074013]  [<ffffffff81f68485>] x86_64_start_kernel+0x14a/0x16d
      It would seem vrf_master_ifindex_rcu() can be called without RCU held in
      other contexts as well so introduce a new helper which acquires rcu and
      returns the ifindex.
      Also add curly braces around both the "if" and "else" parts as per the
      style guide.
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Ying Xue's avatar
      lwtunnel: Fix the sparse warnings in fib_encap_match · 824e7383
      Ying Xue authored
      When CONFIG_LWTUNNEL config is not enabled, the lwtstate_free() is not
      declared in lwtunnel.h at all. However, even in this case, the function
      is still referenced in fib_semantics.c so that there appears the
      following sparse warnings:
      net/ipv4/fib_semantics.c:553:17: error: undefined identifier 'lwtstate_free'
        CC      net/ipv4/fib_semantics.o
        net/ipv4/fib_semantics.c: In function ‘fib_encap_match’:
        net/ipv4/fib_semantics.c:553:3: error: implicit declaration of function ‘lwtstate_free’ [-Werror=implicit-function-declaration]
        cc1: some warnings being treated as errors
        make[1]: *** [net/ipv4/fib_semantics.o] Error 1
        make: *** [net/ipv4/fib_semantics.o] Error 2
      To eliminate the error, we define an empty function for lwtstate_free()
      in lwtunnel.h when CONFIG_LWTUNNEL is disabled.
      Fixes: df383e62 ("lwtunnel: fix memory leak")
      Cc: Jiri Benc <jbenc@redhat.com>
      Reported-by: default avatarkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Acked-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  9. 18 Aug, 2015 3 commits
  10. 17 Aug, 2015 6 commits
    • Tom Herbert's avatar
      net: Add inet_proto_csum_replace_by_diff utility function · abc5d1ff
      Tom Herbert authored
      This function updates a checksum field value and skb->csum based on
      a value which is the difference between the old and new checksum.
      Signed-off-by: default avatarTom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Tom Herbert's avatar
      net: Change pseudohdr argument of inet_proto_csum_replace* to be a bool · 4b048d6d
      Tom Herbert authored
      inet_proto_csum_replace4,2,16 take a pseudohdr argument which indicates
      the checksum field carries a pseudo header. This argument should be a
      boolean instead of an int.
      Signed-off-by: default avatarTom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Tom Herbert's avatar
      lwt: Add support to redirect dst.input · 25368623
      Tom Herbert authored
      This patch adds the capability to redirect dst input in the same way
      that dst output is redirected by LWT.
      Also, save the original dst.input and and dst.out when setting up
      lwtunnel redirection. These can be called by the client as a pass-
      Signed-off-by: default avatarTom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Daniel Borkmann's avatar
      netfilter: nf_conntrack: add efficient mark to zone mapping · 5e8018fc
      Daniel Borkmann authored
      This work adds the possibility of deriving the zone id from the skb->mark
      field in a scalable manner. This allows for having only a single template
      serving hundreds/thousands of different zones, for example, instead of the
      need to have one match for each zone as an extra CT jump target.
      Note that we'd need to have this information attached to the template as at
      the time when we're trying to lookup a possible ct object, we already need
      to know zone information for a possible match when going into
      __nf_conntrack_find_get(). This work provides a minimal implementation for
      a possible mapping.
      In order to not add/expose an extra ct->status bit, the zone structure has
      been extended to carry a flag for deriving the mark.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
    • Daniel Borkmann's avatar
      netfilter: nf_conntrack: add direction support for zones · deedb590
      Daniel Borkmann authored
      This work adds a direction parameter to netfilter zones, so identity
      separation can be performed only in original/reply or both directions
      (default). This basically opens up the possibility of doing NAT with
      conflicting IP address/port tuples from multiple, isolated tenants
      on a host (e.g. from a netns) without requiring each tenant to NAT
      twice resp. to use its own dedicated IP address to SNAT to, meaning
      overlapping tuples can be made unique with the zone identifier in
      original direction, where the NAT engine will then allocate a unique
      tuple in the commonly shared default zone for the reply direction.
      In some restricted, local DNAT cases, also port redirection could be
      used for making the reply traffic unique w/o requiring SNAT.
      The consensus we've reached and discussed at NFWS and since the initial
      implementation [1] was to directly integrate the direction meta data
      into the existing zones infrastructure, as opposed to the ct->mark
      approach we proposed initially.
      As we pass the nf_conntrack_zone object directly around, we don't have
      to touch all call-sites, but only those, that contain equality checks
      of zones. Thus, based on the current direction (original or reply),
      we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID.
      CT expectations are direction-agnostic entities when expectations are
      being compared among themselves, so we can only use the identifier
      in this case.
      Note that zone identifiers can not be included into the hash mix
      anymore as they don't contain a "stable" value that would be equal
      for both directions at all times, f.e. if only zone->id would
      unconditionally be xor'ed into the table slot hash, then replies won't
      find the corresponding conntracking entry anymore.
      If no particular direction is specified when configuring zones, the
      behaviour is exactly as we expect currently (both directions).
      Support has been added for the CT netlink interface as well as the
      x_tables raw CT target, which both already offer existing interfaces
      to user space for the configuration of zones.
      Below a minimal, simplified collision example (script in [2]) with
      netperf sessions:
        +--- tenant-1 ---+   mark := 1
        |    netperf     |--+
        +----------------+  |                CT zone := mark [ORIGINAL]
         [ip,sport] := X   +--------------+  +--- gateway ---+
                           | mark routing |--|     SNAT      |-- ... +
                           +--------------+  +---------------+       |
        +--- tenant-2 ---+  |                                     ~~~|~~~
        |    netperf     |--+                +-----------+           |
        +----------------+   mark := 2       | netserver |------ ... +
         [ip,sport] := X                     +-----------+
                                              [ip,port] := Y
      On the gateway netns, example:
        iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
        iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully
        iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
        iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark
      conntrack dump from gateway netns:
        netperf -H -t TCP_STREAM -l60 -p12865,5555 from each tenant netns
        tcp 6 431995 ESTABLISHED src= dst= sport=5555 dport=12865 zone-orig=1
                                 src= dst= sport=12865 dport=1024
                     [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
        tcp 6 431994 ESTABLISHED src= dst= sport=5555 dport=12865 zone-orig=2
                                 src= dst= sport=12865 dport=5555
                     [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1
        tcp 6 299 ESTABLISHED src= dst= sport=39438 dport=33768 zone-orig=1
                              src= dst= sport=33768 dport=39438
                     [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
        tcp 6 300 ESTABLISHED src= dst= sport=32889 dport=40206 zone-orig=2
                              src= dst= sport=40206 dport=32889
                     [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2
      Taking this further, test script in [2] creates 200 tenants and runs
      original-tuple colliding netperf sessions each. A conntrack -L dump in
      the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED
      state as expected.
      I also did run various other tests with some permutations of the script,
      to mention some: SNAT in random/random-fully/persistent mode, no zones (no
      overlaps), static zones (original, reply, both directions), etc.
        [1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
        [2] https://paste.fedoraproject.org/242835/65657871/Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
    • David Ahern's avatar
      inet: Move VRF table lookup to inlined function · dc028da5
      David Ahern authored
      Table lookup compiles out when VRF is not enabled.
      Signed-off-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  11. 13 Aug, 2015 1 commit
    • David Ahern's avatar
      net: Fix up inet_addr_type checks · 30bbaa19
      David Ahern authored
      Currently inet_addr_type and inet_dev_addr_type expect local addresses
      to be in the local table. With the VRF device local routes for devices
      associated with a VRF will be in the table associated with the VRF.
      Provide an alternate inet_addr lookup to use a specific table rather
      than defaulting to the local table.
      inet_addr_type_dev_table keeps the same semantics as inet_addr_type but
      if the passed in device is enslaved to a VRF then the table for that VRF
      is used for the lookup.
      Signed-off-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>