- 29 Sep, 2015 2 commits
-
-
Ayala Beker authored
Currently, cfg80211 rejects capability updates for existing entries and as a result it's impossible to update entries that were added unassociated, but that is necessary to go through the full station states from userspace, adding a station before authentication etc. Fix this by allowing updates to capabilities for stations that the driver (or mac80211) assigned unassociated state. Drivers setting the full station state support flag must use the new station type for proper operation. Signed-off-by:
Ayala Beker <ayala.beker@intel.com> Signed-off-by:
Luca Coelho <luciano.coelho@intel.com> Signed-off-by:
Johannes Berg <johannes.berg@intel.com>
-
Helmut Schaa authored
When debugging wireless powersave issues on the AP side it's quite helpful to see our own beacons that are transmitted by the hardware/driver. However, this is not that easy since beacons don't pass through the regular TX queues. Preferably drivers would call ieee80211_tx_status also for tx'ed beacons but that's not always possible. Hence, just send a copy of each beacon generated by ieee80211_beacon_get_tim to monitor devices when they are getting fetched by the driver. Also add a HW flag IEEE80211_HW_BEACON_TX_STATUS that can be used by drivers to indicate that they report TX status for beacons. Signed-off-by:
Helmut Schaa <helmut.schaa@googlemail.com> (with a fix from Christian Lamparted rolled in) Signed-off-by:
Johannes Berg <johannes.berg@intel.com>
-
- 22 Sep, 2015 5 commits
-
-
Johannes Berg authored
This reverts commit f9a060f4. No driver has turned up needing this functionality, and I've just implemented the functionality I wanted this for in a different way. Thus, remove it again, until somebody shows up with a need for having it. Signed-off-by:
Johannes Berg <johannes.berg@intel.com>
-
Emmanuel Grumbach authored
Drivers may be interested in receiving A-MSDU within A-MDPU. Not all the devices may be able to do so, make it configurable. Signed-off-by:
Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by:
Johannes Berg <johannes.berg@intel.com>
-
Emmanuel Grumbach authored
Advertise the capability to send A-MSDU within A-MPDU in the AddBA request sent by mac80211. Let the driver know about the peer's capabilities. Signed-off-by:
Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by:
Johannes Berg <johannes.berg@intel.com>
-
Andrei Otcheretianski authored
Currently the cfg80211's frame registration api receives wdev, however mac80211 assumes per device filter configuration and ignores wdev. Per device filtering is too wasteful, especially for multi-channel devices. Introduce new per vif frame registration API and use it for probe request registrations in ieee80211_mgmt_frame_register() Also call directly to ieee80211_configure_filter instead of using a work since it is now allowed to sleep in ieee80211_mgmt_frame_register. Signed-off-by:
Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by:
Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by:
Johannes Berg <johannes.berg@intel.com>
-
Johannes Berg authored
In order to transfer many items in vendor commands, support the dumpit netlink method for them. Signed-off-by:
Johannes Berg <johannes.berg@intel.com>
-
- 26 Aug, 2015 2 commits
-
-
Alexei Starovoitov authored
Similar to act_gact/act_mirred, act_bpf can be lockless in packet processing with extra care taken to free bpf programs after rcu grace period. Replacement of existing act_bpf (very rare) is done with synchronize_rcu() and final destruction is done from tc_action_ops->cleanup() callback that is called from tcf_exts_destroy()->tcf_action_destroy()->__tcf_hash_release() when bind and refcnt reach zero which is only possible when classifier is destroyed. Previous two patches fixed the last two classifiers (tcindex and rsvp) to call tcf_exts_destroy() from rcu callback. Similar to gact/mirred there is a race between prog->filter and prog->tcf_action. Meaning that the program being replaced may use previous default action if it happened to return TC_ACT_UNSPEC. act_mirred race betwen tcf_action and tcfm_dev is similar. In all cases the race is harmless. Long term we may want to improve the situation by replacing the whole tc_action->priv as single pointer instead of updating inner fields one by one. Signed-off-by:
Alexei Starovoitov <ast@plumgrid.com> Acked-by:
Daniel Borkmann <daniel@iogearbox.net> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Alexei Starovoitov authored
tcf_hash_destroy() used once. Make it static. Signed-off-by:
Alexei Starovoitov <ast@plumgrid.com> Acked-by:
Daniel Borkmann <daniel@iogearbox.net> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 25 Aug, 2015 4 commits
-
-
Jiri Benc authored
The vxlan_get_sk_family inline function was added after the last #endif, making multiple inclusion of net/vxlan.h fail. Move it to the proper place. Reported-by:
Mark Rustad <mark.d.rustad@intel.com> Fixes: 705cc62f ("vxlan: provide access function for vxlan socket address family") Signed-off-by:
Jiri Benc <jbenc@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David Ahern authored
Remove various inlined functions not referenced in the kernel. Signed-off-by:
David Ahern <dsa@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
When TCP pacing was added back in linux-3.12, we chose to apply a fixed ratio of 200 % against current rate, to allow probing for optimal throughput even during slow start phase, where cwnd can be doubled every other gRTT. At Google, we found it was better applying a different ratio while in Congestion Avoidance phase. This ratio was set to 120 %. We've used the normal tcp_in_slow_start() helper for a while, then tuned the condition to select the conservative ratio as soon as cwnd >= ssthresh/2 : - After cwnd reduction, it is safer to ramp up more slowly, as we approach optimal cwnd. - Initial ramp up (ssthresh == INFINITY) still allows doubling cwnd every other RTT. Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by:
Neal Cardwell <ncardwell@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
slow start after idle might reduce cwnd, but we perform this after first packet was cooked and sent. With TSO/GSO, it means that we might send a full TSO packet even if cwnd should have been reduced to IW10. Moving the SSAI check in skb_entail() makes sense, because we slightly reduce number of times this check is done, especially for large send() and TCP Small queue callbacks from softirq context. As Neal pointed out, we also need to perform the check if/when receive window opens. Tested: Following packetdrill test demonstrates the problem // Test of slow start after idle `sysctl -q net.ipv4.tcp_slow_start_after_idle=1` 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 65535 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6> +.100 < . 1:1(0) ack 1 win 511 +0 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_SOCKET, SO_SNDBUF, [200000], 4) = 0 +0 write(4, ..., 26000) = 26000 +0 > . 1:5001(5000) ack 1 +0 > . 5001:10001(5000) ack 1 +0 %{ assert tcpi_snd_cwnd == 10 }% +.100 < . 1:1(0) ack 10001 win 511 +0 %{ assert tcpi_snd_cwnd == 20, tcpi_snd_cwnd }% +0 > . 10001:20001(10000) ack 1 +0 > P. 20001:26001(6000) ack 1 +.100 < . 1:1(0) ack 26001 win 511 +0 %{ assert tcpi_snd_cwnd == 36, tcpi_snd_cwnd }% +4 write(4, ..., 20000) = 20000 // If slow start after idle works properly, we should send 5 MSS here (cwnd/2) +0 > . 26001:31001(5000) ack 1 +0 %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }% +0 > . 31001:36001(5000) ack 1 Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by:
Neal Cardwell <ncardwell@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 24 Aug, 2015 1 commit
-
-
Tom Herbert authored
Add cfg and family arguments to lwt build state functions. cfg is a void pointer and will either be a pointer to a fib_config or fib6_config structure. The family parameter indicates which one (either AF_INET or AF_INET6). LWT encpasulation implementation may use the fib configuration to build the LWT state. Signed-off-by:
Tom Herbert <tom@herbertland.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 23 Aug, 2015 2 commits
-
-
Jiri Benc authored
__recnt and related fields need to be in its own cacheline for performance reasons. Commit 61adedf3 ("route: move lwtunnel state to dst_entry") broke that on 32bit archs, causing BUILD_BUG_ON in dst_hold to be triggered. This patch fixes the breakage by moving the lwtunnel state to the end of dst_entry on 32bit archs. Unfortunately, this makes it share the cacheline with __refcnt and may affect performance, thus further patches may be needed. Reported-by:
kbuild test robot <fengguang.wu@intel.com> Fixes: 61adedf3 ("route: move lwtunnel state to dst_entry") Signed-off-by:
Jiri Benc <jbenc@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
Add calls to gro_cells infrastructure to do GRO when receiving on a tunnel. Testing: Ran 200 netperf TCP_STREAM instance - With fix (GRO enabled on VXLAN interface) Verify GRO is happening. 9084 MBps tput 3.44% CPU utilization - Without fix (GRO disabled on VXLAN interface) Verified no GRO is happening. 9084 MBps tput 5.54% CPU utilization Signed-off-by:
Tom Herbert <tom@herbertland.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 20 Aug, 2015 12 commits
-
-
Jiri Benc authored
Use flowi_tunnel in flowi6 similarly to what is done with IPv4. This complements commit 1b7179d3 ("route: Extend flow representation with tunnel key"). Signed-off-by:
Jiri Benc <jbenc@redhat.com> Acked-by:
Thomas Graf <tgraf@suug.ch> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
Signed-off-by:
Jiri Benc <jbenc@redhat.com> Acked-by:
Thomas Graf <tgraf@suug.ch> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
If output device wants to see the dst, inherit the dst of the original skb in the ndisc request. This is an IPv6 counterpart of commit 0accfc26 ("arp: Inherit metadata dst when creating ARP requests"). Signed-off-by:
Jiri Benc <jbenc@redhat.com> Acked-by:
Thomas Graf <tgraf@suug.ch> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
Currently, the lwtunnel state resides in per-protocol data. This is a problem if we encapsulate ipv6 traffic in an ipv4 tunnel (or vice versa). The xmit function of the tunnel does not know whether the packet has been routed to it by ipv4 or ipv6, yet it needs the lwtstate data. Moving the lwtstate data to dst_entry makes such inter-protocol tunneling possible. As a bonus, this brings a nice diffstat. Signed-off-by:
Jiri Benc <jbenc@redhat.com> Acked-by:
Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by:
Thomas Graf <tgraf@suug.ch> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
Rename the ipv4_tos and ipv4_ttl fields to just 'tos' and 'ttl', as they'll be used with IPv6 tunnels, too. Signed-off-by:
Jiri Benc <jbenc@redhat.com> Acked-by:
Thomas Graf <tgraf@suug.ch> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
Add the IPv6 addresses as an union with IPv4 ones. When using IPv4, the newly introduced padding after the IPv4 addresses needs to be zeroed out. Signed-off-by:
Jiri Benc <jbenc@redhat.com> Acked-by:
Thomas Graf <tgraf@suug.ch> Acked-by:
Alexei Starovoitov <ast@plumgrid.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
Signed-off-by:
Jiri Benc <jbenc@redhat.com> Acked-by:
Thomas Graf <tgraf@suug.ch> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
The ip_tunnels.h include file uses mixture of __u16 and u16 (etc.) types. Unify it to the non-underscore variants. Signed-off-by:
Jiri Benc <jbenc@redhat.com> Acked-by:
Thomas Graf <tgraf@suug.ch> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
The custom alignment of struct ip_tunnel_key is unnecessary. In struct sw_flow_key, it starts at offset 256, in struct ip_tunnel_info it's the first field. The structure is also packed even without the __packed keyword. Signed-off-by:
Jiri Benc <jbenc@redhat.com> Acked-by:
Thomas Graf <tgraf@suug.ch> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Christophe Ricard authored
A proprietary vendor command may send back useful data to the user application. For example, the field level applied on the NFC router antenna. Still based on net/wireless/nl80211.c implementation, add nfc_vendor_cmd_alloc_reply_skb and nfc_vendor_cmd_reply in order to send back over netlink data generated by a proprietary command. Signed-off-by:
Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by:
Samuel Ortiz <sameo@linux.intel.com>
-
Robert Baldyga authored
Some drivers needs to have ability to reinit NCI core, for example after updating firmware in setup() of post_setup() callback. This patch makes nci_core_reset() and nci_core_init() functions public, to make it possible. Signed-off-by:
Robert Baldyga <r.baldyga@samsung.com> Signed-off-by:
Samuel Ortiz <sameo@linux.intel.com>
-
Robert Baldyga authored
Some drivers require non-standard configuration after NCI_CORE_INIT request, because they need to know ndev->manufact_specific_info or ndev->manufact_id. This patch adds post_setup handler allowing to do such custom configuration. Signed-off-by:
Robert Baldyga <r.baldyga@samsung.com> Signed-off-by:
Samuel Ortiz <sameo@linux.intel.com>
-
- 19 Aug, 2015 2 commits
-
-
Nikolay Aleksandrov authored
While running net-next I hit this: [ 634.073119] =============================== [ 634.073150] [ INFO: suspicious RCU usage. ] [ 634.073182] 4.2.0-rc6+ #45 Not tainted [ 634.073213] ------------------------------- [ 634.073244] include/net/vrf.h:38 suspicious rcu_dereference_check() usage! [ 634.073274] other info that might help us debug this: [ 634.073307] rcu_scheduler_active = 1, debug_locks = 1 [ 634.073338] 2 locks held by swapper/0/0: [ 634.073369] #0: (((&n->timer))){+.-...}, at: [<ffffffff8112bc35>] call_timer_fn+0x5/0x480 [ 634.073412] #1: (slock-AF_INET){+.-...}, at: [<ffffffff8174f0f5>] icmp_send+0x155/0x5f0 [ 634.073450] stack backtrace: [ 634.073483] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc6+ #45 [ 634.073514] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 634.073545] 0000000000000000 0593ba8242d9ace4 ffff88002fc03b48 ffffffff81803f1b [ 634.073612] 0000000000000000 ffffffff81e12500 ffff88002fc03b78 ffffffff811003c5 [ 634.073642] 0000000000000000 ffff88002ec4e600 ffffffff81f00f80 ffff88002fc03cf0 [ 634.073669] Call Trace: [ 634.073694] <IRQ> [<ffffffff81803f1b>] dump_stack+0x4c/0x65 [ 634.073728] [<ffffffff811003c5>] lockdep_rcu_suspicious+0xc5/0x100 [ 634.073763] [<ffffffff8174eb56>] icmp_route_lookup+0x176/0x5c0 [ 634.073793] [<ffffffff8174f2fb>] ? icmp_send+0x35b/0x5f0 [ 634.073818] [<ffffffff8174f274>] ? icmp_send+0x2d4/0x5f0 [ 634.073844] [<ffffffff8174f3ce>] icmp_send+0x42e/0x5f0 [ 634.073873] [<ffffffff8170b662>] ipv4_link_failure+0x22/0xa0 [ 634.073899] [<ffffffff8174bdda>] arp_error_report+0x3a/0x80 [ 634.073926] [<ffffffff816d6100>] ? neigh_lookup+0x2c0/0x2c0 [ 634.073952] [<ffffffff816d396e>] neigh_invalidate+0x8e/0x110 [ 634.073984] [<ffffffff816d62ae>] neigh_timer_handler+0x1ae/0x290 [ 634.074013] [<ffffffff816d6100>] ? neigh_lookup+0x2c0/0x2c0 [ 634.074013] [<ffffffff8112bce3>] call_timer_fn+0xb3/0x480 [ 634.074013] [<ffffffff8112bc35>] ? call_timer_fn+0x5/0x480 [ 634.074013] [<ffffffff816d6100>] ? neigh_lookup+0x2c0/0x2c0 [ 634.074013] [<ffffffff8112c2bc>] run_timer_softirq+0x20c/0x430 [ 634.074013] [<ffffffff810af50e>] __do_softirq+0xde/0x630 [ 634.074013] [<ffffffff810afc97>] irq_exit+0x117/0x120 [ 634.074013] [<ffffffff81810976>] smp_apic_timer_interrupt+0x46/0x60 [ 634.074013] [<ffffffff8180e950>] apic_timer_interrupt+0x70/0x80 [ 634.074013] <EOI> [<ffffffff8106b9d6>] ? native_safe_halt+0x6/0x10 [ 634.074013] [<ffffffff81101d8d>] ? trace_hardirqs_on+0xd/0x10 [ 634.074013] [<ffffffff81027d43>] default_idle+0x23/0x200 [ 634.074013] [<ffffffff8102852f>] arch_cpu_idle+0xf/0x20 [ 634.074013] [<ffffffff810f89ba>] default_idle_call+0x2a/0x40 [ 634.074013] [<ffffffff810f8dcc>] cpu_startup_entry+0x39c/0x4c0 [ 634.074013] [<ffffffff817f9cad>] rest_init+0x13d/0x150 [ 634.074013] [<ffffffff81f69038>] start_kernel+0x4a8/0x4c9 [ 634.074013] [<ffffffff81f68120>] ? early_idt_handler_array+0x120/0x120 [ 634.074013] [<ffffffff81f68339>] x86_64_start_reservations+0x2a/0x2c [ 634.074013] [<ffffffff81f68485>] x86_64_start_kernel+0x14a/0x16d It would seem vrf_master_ifindex_rcu() can be called without RCU held in other contexts as well so introduce a new helper which acquires rcu and returns the ifindex. Also add curly braces around both the "if" and "else" parts as per the style guide. Signed-off-by:
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Ying Xue authored
When CONFIG_LWTUNNEL config is not enabled, the lwtstate_free() is not declared in lwtunnel.h at all. However, even in this case, the function is still referenced in fib_semantics.c so that there appears the following sparse warnings: net/ipv4/fib_semantics.c:553:17: error: undefined identifier 'lwtstate_free' CC net/ipv4/fib_semantics.o net/ipv4/fib_semantics.c: In function ‘fib_encap_match’: net/ipv4/fib_semantics.c:553:3: error: implicit declaration of function ‘lwtstate_free’ [-Werror=implicit-function-declaration] cc1: some warnings being treated as errors make[1]: *** [net/ipv4/fib_semantics.o] Error 1 make: *** [net/ipv4/fib_semantics.o] Error 2 To eliminate the error, we define an empty function for lwtstate_free() in lwtunnel.h when CONFIG_LWTUNNEL is disabled. Fixes: df383e62 ("lwtunnel: fix memory leak") Cc: Jiri Benc <jbenc@redhat.com> Reported-by:
kbuild test robot <fengguang.wu@intel.com> Signed-off-by:
Ying Xue <ying.xue@windriver.com> Acked-by:
Jiri Benc <jbenc@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 18 Aug, 2015 3 commits
-
-
Nikolay Aleksandrov authored
slave_queue has a num_slaves member which is unused, drop it. Signed-off-by:
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
The built lwtunnel_state struct has to be freed after comparison. Fixes: 571e7226 ("ipv4: support for fib route lwtunnel encap attributes") Signed-off-by:
Jiri Benc <jbenc@redhat.com> Acked-by:
Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
Add an inline helper for determining is a port is a DSA port. Signed-off-by:
Andrew Lunn <andrew@lunn.ch> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 17 Aug, 2015 6 commits
-
-
Tom Herbert authored
This function updates a checksum field value and skb->csum based on a value which is the difference between the old and new checksum. Signed-off-by:
Tom Herbert <tom@herbertland.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
inet_proto_csum_replace4,2,16 take a pseudohdr argument which indicates the checksum field carries a pseudo header. This argument should be a boolean instead of an int. Signed-off-by:
Tom Herbert <tom@herbertland.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
This patch adds the capability to redirect dst input in the same way that dst output is redirected by LWT. Also, save the original dst.input and and dst.out when setting up lwtunnel redirection. These can be called by the client as a pass- through. Signed-off-by:
Tom Herbert <tom@herbertland.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
This work adds the possibility of deriving the zone id from the skb->mark field in a scalable manner. This allows for having only a single template serving hundreds/thousands of different zones, for example, instead of the need to have one match for each zone as an extra CT jump target. Note that we'd need to have this information attached to the template as at the time when we're trying to lookup a possible ct object, we already need to know zone information for a possible match when going into __nf_conntrack_find_get(). This work provides a minimal implementation for a possible mapping. In order to not add/expose an extra ct->status bit, the zone structure has been extended to carry a flag for deriving the mark. Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Daniel Borkmann authored
This work adds a direction parameter to netfilter zones, so identity separation can be performed only in original/reply or both directions (default). This basically opens up the possibility of doing NAT with conflicting IP address/port tuples from multiple, isolated tenants on a host (e.g. from a netns) without requiring each tenant to NAT twice resp. to use its own dedicated IP address to SNAT to, meaning overlapping tuples can be made unique with the zone identifier in original direction, where the NAT engine will then allocate a unique tuple in the commonly shared default zone for the reply direction. In some restricted, local DNAT cases, also port redirection could be used for making the reply traffic unique w/o requiring SNAT. The consensus we've reached and discussed at NFWS and since the initial implementation [1] was to directly integrate the direction meta data into the existing zones infrastructure, as opposed to the ct->mark approach we proposed initially. As we pass the nf_conntrack_zone object directly around, we don't have to touch all call-sites, but only those, that contain equality checks of zones. Thus, based on the current direction (original or reply), we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID. CT expectations are direction-agnostic entities when expectations are being compared among themselves, so we can only use the identifier in this case. Note that zone identifiers can not be included into the hash mix anymore as they don't contain a "stable" value that would be equal for both directions at all times, f.e. if only zone->id would unconditionally be xor'ed into the table slot hash, then replies won't find the corresponding conntracking entry anymore. If no particular direction is specified when configuring zones, the behaviour is exactly as we expect currently (both directions). Support has been added for the CT netlink interface as well as the x_tables raw CT target, which both already offer existing interfaces to user space for the configuration of zones. Below a minimal, simplified collision example (script in [2]) with netperf sessions: +--- tenant-1 ---+ mark := 1 | netperf |--+ +----------------+ | CT zone := mark [ORIGINAL] [ip,sport] := X +--------------+ +--- gateway ---+ | mark routing |--| SNAT |-- ... + +--------------+ +---------------+ | +--- tenant-2 ---+ | ~~~|~~~ | netperf |--+ +-----------+ | +----------------+ mark := 2 | netserver |------ ... + [ip,sport] := X +-----------+ [ip,port] := Y On the gateway netns, example: iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark conntrack dump from gateway netns: netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1 src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024 [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2 src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555 [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1 src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438 [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2 src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889 [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2 Taking this further, test script in [2] creates 200 tenants and runs original-tuple colliding netperf sessions each. A conntrack -L dump in the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED state as expected. I also did run various other tests with some permutations of the script, to mention some: SNAT in random/random-fully/persistent mode, no zones (no overlaps), static zones (original, reply, both directions), etc. [1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/ [2] https://paste.fedoraproject.org/242835/65657871/Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
David Ahern authored
Table lookup compiles out when VRF is not enabled. Signed-off-by:
David Ahern <dsa@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 13 Aug, 2015 1 commit
-
-
David Ahern authored
Currently inet_addr_type and inet_dev_addr_type expect local addresses to be in the local table. With the VRF device local routes for devices associated with a VRF will be in the table associated with the VRF. Provide an alternate inet_addr lookup to use a specific table rather than defaulting to the local table. inet_addr_type_dev_table keeps the same semantics as inet_addr_type but if the passed in device is enslaved to a VRF then the table for that VRF is used for the lookup. Signed-off-by:
David Ahern <dsa@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-