- 22 Sep, 2014 1 commit
-
-
Florian Fainelli authored
Add an abstraction layer to suspend/resume switch devices, doing the following split: - suspend/resume the slave network devices and their corresponding PHY devices - suspend/resume the switch hardware using switch driver callbacks Signed-off-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 19 Sep, 2014 5 commits
-
-
Sabrina Dubroca authored
ptr used to be a non __percpu pointer (result of a this_cpu_ptr assignment, 7d720c3e ("percpu: add __percpu sparse annotations to net")). Since d25398df ("net: avoid reloads in SNMP_UPD_PO_STATS"), that's no longer the case, SNMP_UPD_PO_STATS uses this_cpu_add and ptr is now __percpu. Silence sparse warnings by preserving the original type and annotation, and remove the out-of-date comment. warning: incorrect type in initializer (different address spaces) expected unsigned long long *ptr got unsigned long long [noderef] <asn:3>*<noident> warning: incorrect type in initializer (different address spaces) expected void const [noderef] <asn:3>*__vpp_verify got unsigned long long *<noident> warning: incorrect type in initializer (different address spaces) expected void const [noderef] <asn:3>*__vpp_verify got unsigned long long *<noident> Signed-off-by:
Sabrina Dubroca <sd@queasysnail.net> Acked-by:
Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
This patch changes IP tunnel to support (secondary) encapsulation, Foo-over-UDP. Changes include: 1) Adding tun_hlen as the tunnel header length, encap_hlen as the encapsulation header length, and hlen becomes the grand total of these. 2) Added common netlink define to support FOU encapsulation. 3) Routines to perform FOU encapsulation. Signed-off-by:
Tom Herbert <therbert@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
Some switch drivers (e.g: bcm_sf2) may have to communicate specific workarounds or flags towards the PHY device driver. Allow switches driver to be delegated that task by introducing a get_phy_flags() callback which will do just that. Signed-off-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Andy Zhou authored
Added a few more UDP tunnel APIs that can be shared by UDP based tunnel protocol implementation. The main ones are highlighted below. setup_udp_tunnel_sock() configures UDP listener socket for receiving UDP encapsulated packets. udp_tunnel_xmit_skb() and upd_tunnel6_xmit_skb() transmit skb using UDP encapsulation. udp_tunnel_sock_release() closes the UDP tunnel listener socket. Signed-off-by:
Andy Zhou <azhou@nicira.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Andy Zhou authored
Add ip6_udp_tunnel.c for ipv6 UDP tunnel functions to avoid ifdefs in udp_tunnel.c Signed-off-by:
Andy Zhou <azhou@nicira.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 15 Sep, 2014 2 commits
-
-
Alexander Duyck authored
This change makes it so that instead of passing and storing a mii_bus we instead pass and store a host_dev. From there we can test to determine the exact type of device, and can verify it is the correct device for our switch. So for example it would be possible to pass a device pointer from a pci_dev and instead of checking for a PHY ID we could check for a vendor and/or device ID. Signed-off-by:
Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Alexander Duyck authored
This change addresses several issues. First, it was possible to set tag_protocol without setting the ops pointer. To correct that I have reordered things so that rcv is now populated before we set tag_protocol. Second, it didn't make much sense to keep setting the device ops each time a new slave was registered. So by moving the receive portion out into root switch initialization that issue should be addressed. Third, I wanted to avoid sending tags if the rcv pointer was not registered so I changed the tag check to verify if the rcv function pointer is set on the root tree. If it is then we start sending DSA tagged frames. Finally I split the device ops pointer in the structures into two spots. I placed the rcv function pointer in the root switch since this makes it easiest to access from there, and I placed the xmit function pointer in the slave for the same reason. Signed-off-by:
Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 13 Sep, 2014 6 commits
-
-
Florian Fainelli authored
Now that we introduced an additional multiplexing/demultiplexing layer with commit 3e8a72d1 ("net: dsa: reduce number of protocol hooks") that lives within the DSA code, we no longer need to have a given switch driver tag_protocol be an actual ethertype value, instead, we can replace it with an enum: dsa_tag_protocol. Do this replacement in the drivers, which allows us to get rid of the cpu_to_be16()/htons() dance, and remove ETH_P_BRCMTAG since we do not need it anymore. Suggested-by:
Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
WANG Cong authored
Make it accept inet6_dev, and rename it to __ipv6_dev_ac_inc() to reflect this change. Signed-off-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
rcu'ify tcf_proto this allows calling tc_classify() without holding any locks. Updaters are protected by RTNL. This patch prepares the core net_sched infrastracture for running the classifier/action chains without holding the qdisc lock however it does nothing to ensure cls_xxx and act_xxx types also work without locking. Additional patches are required to address the fall out. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Add __rcu notation to qdisc handling by doing this we can make smatch output more legible. And anyways some of the cases should be using rcu_dereference() see qdisc_all_tx_empty(), qdisc_tx_chainging(), and so on. Also *wake_queue() API is commonly called from driver timer routines without rcu lock or rtnl lock. So I added rcu_read_lock() blocks around netif_wake_subqueue and netif_tx_wake_queue. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
rcu'ify tcf_proto this allows calling tc_classify() without holding any locks. Updaters are protected by RTNL. This patch prepares the core net_sched infrastracture for running the classifier/action chains without holding the qdisc lock however it does nothing to ensure cls_xxx and act_xxx types also work without locking. Additional patches are required to address the fall out. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Add __rcu notation to qdisc handling by doing this we can make smatch output more legible. And anyways some of the cases should be using rcu_dereference() see qdisc_all_tx_empty(), qdisc_tx_chainging(), and so on. Also *wake_queue() API is commonly called from driver timer routines without rcu lock or rtnl lock. So I added rcu_read_lock() blocks around netif_wake_subqueue and netif_tx_wake_queue. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 09 Sep, 2014 6 commits
-
-
Willem de Bruijn authored
Few packets have timestamping enabled. Exit sock_tx_timestamp quickly in this common case. Signed-off-by:
Willem de Bruijn <willemb@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Vincent Bernat authored
net.ipv4.ip_nonlocal_bind sysctl was global to all network namespaces. This patch allows to set a different value for each network namespace. Signed-off-by:
Vincent Bernat <vincent@bernat.im> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Arturo Borrero authored
The nft_masq expression is intended to perform NAT in the masquerade flavour. We decided to have the masquerade functionality in a separated expression other than nft_nat. Signed-off-by:
Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Arturo Borrero authored
Let's refactor the code so we can reach the masquerade functionality from outside the xt context (ie. nftables). The patch includes the addition of an atomic counter to the masquerade notifier: the stuff to be done by the notifier is the same for xt and nftables. Therefore, only one notification handler is needed. This factorization only involves IPv6; a similar patch exists to handle IPv4. Signed-off-by:
Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Arturo Borrero authored
Let's refactor the code so we can reach the masquerade functionality from outside the xt context (ie. nftables). The patch includes the addition of an atomic counter to the masquerade notifier: the stuff to be done by the notifier is the same for xt and nftables. Therefore, only one notification handler is needed. This factorization only involves IPv4; a similar patch follows to handle IPv6. Signed-off-by:
Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Pablo Neira Ayuso authored
Move the specific NAT IPv6 core functions that are called from the hooks from ip6table_nat.c to nf_nat_l3proto_ipv6.c. This prepares the ground to allow iptables and nft to use the same NAT engine code that comes in a follow up patch. This also renames nf_nat_ipv6_fn to nft_nat_ipv6_fn in net/ipv6/netfilter/nft_chain_nat_ipv6.c to avoid a compilation breakage. Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
- 08 Sep, 2014 1 commit
-
-
Willem de Bruijn authored
inetpeer sequence numbers are no longer incremented, so no need to check and flush the tree. The function that increments the sequence number was already dead code and removed in in "ipv4: remove unused function" (068a6e18). Remove the code that checks for a change, too. Verifying that v4_seq and v6_seq are never incremented and thus that flush_check compares bp->flush_seq to 0 is trivial. The second part of the change removes flush_check completely even though bp->flush_seq is exactly !0 once, at initialization. This change is correct because the time this branch is true is when bp->root == peer_avl_empty_rcu, in which the branch and inetpeer_invalidate_tree are a NOOP. Signed-off-by:
Willem de Bruijn <willemb@google.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 05 Sep, 2014 9 commits
-
-
Eric Dumazet authored
After commit 740b0f18 ("tcp: switch rtt estimations to usec resolution"), we no longer need to maintain timestamps in two different fields. TCP_SKB_CB(skb)->when can be removed, as same information sits in skb_mstamp.stamp_jiffies Signed-off-by:
Eric Dumazet <edumazet@google.com> Acked-by:
Yuchung Cheng <ycheng@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
TCP_SKB_CB(skb)->when has different meaning in output and input paths. In output path, it contains a timestamp. In input path, it contains an ISN, chosen by tcp_timewait_state_process() Lets add a different name to ease code comprehension. Note that 'when' field will disappear in following patch, as skb_mstamp already contains timestamp, the anonymous union will promptly disappear as well. Signed-off-by:
Eric Dumazet <edumazet@google.com> Acked-by:
Yuchung Cheng <ycheng@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Alexander Duyck authored
This patch updates some of the flow_dissector api so that it can be used to parse the length of ethernet buffers stored in fragments. Most of the changes needed were to __skb_get_poff as it needed to be updated to support sending a linear buffer instead of a skb. I have split __skb_get_poff into two functions, the first is skb_get_poff and it retains the functionality of the original __skb_get_poff. The other function is __skb_get_poff which now works much like __skb_flow_dissect in relation to skb_flow_dissect in that it provides the same functionality but works with just a data buffer and hlen instead of needing an skb. Signed-off-by:
Alexander Duyck <alexander.h.duyck@intel.com> Acked-by:
Alexei Starovoitov <ast@plumgrid.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Alexander Duyck authored
Since sock_efree and sock_demux are essentially the same code for non-TCP sockets and the case where CONFIG_INET is not defined we can combine the code or replace the call to sock_edemux in several spots. As a result we can avoid a bit of unnecessary code or code duplication. Signed-off-by:
Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Alexander Duyck authored
The phy timestamping takes a different path than the regular timestamping does in that it will create a clone first so that the packets needing to be timestamped can be placed in a queue, or the context block could be used. In order to support these use cases I am pulling the core of the code out so it can be used in other drivers beyond just phy devices. In addition I have added a destructor named sock_efree which is meant to provide a simple way for dropping the reference to skb exceptions that aren't part of either the receive or send windows for the socket, and I have removed some duplication in spots where this destructor could be used in place of sock_edemux. Signed-off-by:
Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Lets make this hash function a bit secure, as ICMP attacks are still in the wild. Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Masanari Iida authored
This patch fix spelling typo found in DocBook/networking.xml. It is because the neworking.xml is generated from comments in the source, I have to fix typo in comments within the source. Signed-off-by:
Masanari Iida <standby24x7@gmail.com> Acked-by:
Randy Dunlap <rdunlap@infradead.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
nh_exceptions is effectively used under rcu, but lacks proper barriers. Between kzalloc() and setting of nh->nh_exceptions(), we need a proper memory barrier. Signed-off-by:
Eric Dumazet <edumazet@google.com> Fixes: 4895c771 ("ipv4: Add FIB nexthop exceptions.") Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Willem de Bruijn authored
The timestamping API has separate bits for generating and reporting timestamps. A software timestamp should only be reported for a packet when the packet has the relevant generation flag (SKBTX_..) set and the socket has reporting bit SOF_TIMESTAMPING_SOFTWARE set. The second check was accidentally removed. Reinstitute the original behavior. Tested: Without this patch, Documentation/networking/txtimestamp reports timestamps regardless of whether SOF_TIMESTAMPING_SOFTWARE is set. After the patch, it only reports them when the flag is set. Fixes: f24b9be5 ("net-timestamp: extend SCM_TIMESTAMPING ancillary data struct") Signed-off-by:
Willem de Bruijn <willemb@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 04 Sep, 2014 1 commit
-
-
Hannes Frederic Sowa authored
This patch adds a new sysctl_mld_qrv knob to configure the mldv1/v2 query robustness variable. It specifies how many retransmit of unsolicited mld retransmit should happen. Admins might want to tune this on lossy links. Also reset mld state on interface down/up, so we pick up new sysctl settings during interface up event. IPv6 certification requests this knob to be available. I didn't make this knob netns specific, as it is mostly a setting in a physical environment and should be per host. Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by:
Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by:
Flavio Leitner <fbl@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 02 Sep, 2014 1 commit
-
-
Pablo Neira Ayuso authored
Move the specific NAT IPv4 core functions that are called from the hooks from iptable_nat.c to nf_nat_l3proto_ipv4.c. This prepares the ground to allow iptables and nft to use the same NAT engine code that comes in a follow up patch. Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
- 01 Sep, 2014 1 commit
-
-
Willem de Bruijn authored
sk->sk_error_queue is dequeued in four locations. All share the exact same logic. Deduplicate. Also collapse the two critical sections for dequeue (at the top of the recv handler) and signal (at the bottom). This moves signal generation for the next packet forward, which should be harmless. It also changes the behavior if the recv handler exits early with an error. Previously, a signal for follow-up packets on the errqueue would then not be scheduled. The new behavior, to always signal, is arguably a bug fix. For rxrpc, the change causes the same function to be called repeatedly for each queued packet (because the recv handler == sk_error_report). It is likely that all packets will fail for the same reason (e.g., memory exhaustion). This code runs without sk_lock held, so it is not safe to trust that sk->sk_err is immutable inbetween releasing q->lock and the subsequent test. Introduce int err just to avoid this potential race. Signed-off-by:
Willem de Bruijn <willemb@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 29 Aug, 2014 1 commit
-
-
Daniel Borkmann authored
Since SCTP day 1, that is, 19b55a2af145 ("Initial commit") from lksctp tree, the official <netinet/sctp.h> header carries a copy of enum sctp_sstat_state that looks like (compared to the current in-kernel enumeration): User definition: Kernel definition: enum sctp_sstat_state { typedef enum { SCTP_EMPTY = 0, <removed> SCTP_CLOSED = 1, SCTP_STATE_CLOSED = 0, SCTP_COOKIE_WAIT = 2, SCTP_STATE_COOKIE_WAIT = 1, SCTP_COOKIE_ECHOED = 3, SCTP_STATE_COOKIE_ECHOED = 2, SCTP_ESTABLISHED = 4, SCTP_STATE_ESTABLISHED = 3, SCTP_SHUTDOWN_PENDING = 5, SCTP_STATE_SHUTDOWN_PENDING = 4, SCTP_SHUTDOWN_SENT = 6, SCTP_STATE_SHUTDOWN_SENT = 5, SCTP_SHUTDOWN_RECEIVED = 7, SCTP_STATE_SHUTDOWN_RECEIVED = 6, SCTP_SHUTDOWN_ACK_SENT = 8, SCTP_STATE_SHUTDOWN_ACK_SENT = 7, }; } sctp_state_t; This header was later on also placed into the uapi, so that user space programs can compile without having <netinet/sctp.h>, but the shipped with <linux/sctp.h> instead. While RFC6458 under 8.2.1.Association Status (SCTP_STATUS) says that sstat_state can range from SCTP_CLOSED to SCTP_SHUTDOWN_ACK_SENT, we nevertheless have a what it appears to be dummy SCTP_EMPTY state from the very early days. While it seems to do just nothing, commit 0b8f9e25 ("sctp: remove completely unsed EMPTY state") did the right thing and removed this dead code. That however, causes an off-by-one when the user asks the SCTP stack via SCTP_STATUS API and checks for the current socket state thus yielding possibly undefined behaviour in applications as they expect the kernel to tell the right thing. The enumeration had to be changed however as based on the current socket state, we access a function pointer lookup-table through this. Therefore, I think the best way to deal with this is just to add a helper function sctp_assoc_to_state() to encapsulate the off-by-one quirk. Reported-by:
Tristan Su <sooqing@gmail.com> Fixes: 0b8f9e25 ("sctp: remove completely unsed EMPTY state") Signed-off-by:
Daniel Borkmann <dborkman@redhat.com> Acked-by:
Vlad Yasevich <vyasevich@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 27 Aug, 2014 6 commits
-
-
Florian Fainelli authored
Add support for the 4-bytes Broadcom tag that built-in switches such as the Starfighter 2 might insert when receiving packets, or that we need to insert while targetting specific switch ports. We use a fake local EtherType value for this 4-bytes switch tag: ETH_P_BRCMTAG to make sure we can assign DSA-specific network operations within the DSA drivers. Signed-off-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
Allow switch drivers to hook a PHY link update callback to perform port-specific link work. Signed-off-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
Whenever libphy determines that the link status of a given PHY/port has changed, allow to call into the switch driver link adjustment callback so proper actions can be taken care of by the switch driver upon link notification. Signed-off-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
In case switch port tagging is disabled (voluntarily, or the switch just does not support it), allow us to continue using the defined set of dsa_device_ops in net/dsa/slave.c. We introduce dsa_protocol_is_tagged() to check whether we need to override skb->protocol and go through the DSA-specifif packet_type function, or if we just go on and receive the SKB through the normal path. Signed-off-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
Modify the DSA slave interface to be bound to an arbitray PHY, not just the ones that are available as child PHY devices of the switch MDIO bus. This allows us for instance to have external PHYs connected to a separate MDIO bus, but yet also connected to a given switch port. Under certain configurations, the physical port mask might not be a 1:1 mapping to the MII PHYs mask. This is the case, if e.g: Port 1 of the switch is used and connects to a PHY at a MDIO address different than 1. Introduce a phys_mii_mask variable which allows driver to implement and divert their own MDIO read/writes operations for a subset of the MDIO PHY addresses. Signed-off-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
We will later use the per-port device_node pointer to fetch a bunch of port-specific properties. Signed-off-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-