Commit 35a9ad8a authored by Linus Torvalds's avatar Linus Torvalds
Browse files

Merge git://

Pull networking updates from David Miller:
 "Most notable changes in here:

   1) By far the biggest accomplishment, thanks to a large range of
      contributors, is the addition of multi-send for transmit.  This is
      the result of discussions back in Chicago, and the hard work of
      several individuals.

      Now, when the ->ndo_start_xmit() method of a driver sees
      skb->xmit_more as true, it can choose to defer the doorbell
      telling the driver to start processing the new TX queue entires.

      skb->xmit_more means that the generic networking is guaranteed to
      call the driver immediately with another SKB to send.

      There is logic added to the qdisc layer to dequeue multiple
      packets at a time, and the handling mis-predicted offloads in
      software is now done with no locks held.

      Finally, pktgen is extended to have a "burst" parameter that can
      be used to test a multi-send implementation.

      Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4,

      Adding support is almost trivial, so export more drivers to
      support this optimization soon.

      I want to thank, in no particular or implied order, Jesper
      Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal
      Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann,
      David Tat, Hannes Frederic Sowa, and Rusty Russell.

   2) PTP and timestamping support in bnx2x, from Michal Kalderon.

   3) Allow adjusting the rx_copybreak threshold for a driver via
      ethtool, and add rx_copybreak support to enic driver.  From
      Govindarajulu Varadarajan.

   4) Significant enhancements to the generic PHY layer and the bcm7xxx
      driver in particular (EEE support, auto power down, etc.) from
      Florian Fainelli.

   5) Allow raw buffers to be used for flow dissection, allowing drivers
      to determine the optimal "linear pull" size for devices that DMA
      into pools of pages.  The objective is to get exactly the
      necessary amount of headers into the linear SKB area pre-pulled,
      but no more.  The new interface drivers use is eth_get_headlen().
      From WANG Cong, with driver conversions (several had their own
      by-hand duplicated implementations) by Alexander Duyck and Eric

   6) Support checksumming more smoothly and efficiently for
      encapsulations, and add "foo over UDP" facility.  From Tom

   7) Add Broadcom SF2 switch driver to DSA layer, from Florian

   8) eBPF now can load programs via a system call and has an extensive
      testsuite.  Alexei Starovoitov and Daniel Borkmann.

   9) Major overhaul of the packet scheduler to use RCU in several major
      areas such as the classifiers and rate estimators.  From John

  10) Add driver for Intel FM10000 Ethernet Switch, from Alexander

  11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric

  12) Add Datacenter TCP congestion control algorithm support, From
      Florian Westphal.

  13) Reorganize sk_buff so that __copy_skb_header() is significantly
      faster.  From Eric Dumazet"

* git:// (1558 commits)
  netlabel: directly return netlbl_unlabel_genl_init()
  net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers
  net: description of dma_cookie cause make xmldocs warning
  cxgb4: clean up a type issue
  cxgb4: potential shift wrapping bug
  i40e: skb->xmit_more support
  net: fs_enet: Add NAPI TX
  net: fs_enet: Remove non NAPI RX
  r8169:add support for RTL8168EP
  net_sched: copy exts->type in tcf_exts_change()
  wimax: convert printk to pr_foo()
  af_unix: remove 0 assignment on static
  ipv6: Do not warn for informational ICMP messages, regardless of type.
  Update Intel Ethernet Driver maintainers list
  bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
  tipc: fix bug in multicast congestion handling
  net: better IFF_XMIT_DST_RELEASE support
  net/mlx4_en: remove NETDEV_TX_BUSY
  3c59x: fix bad split of cpu_to_le32(pci_map_single())
  net: bcmgenet: fix Tx ring priority programming
parents d5935b07 64b1f00a
Driver for ARM AXI Bus with Broadcom Plugins (bcma)
Required properties:
- compatible : brcm,bus-axi
- reg : iomem address range of chipcommon core
The cores on the AXI bus are automatically detected by bcma with the
memory ranges they are using and they get registered afterwards.
The top-level axi bus may contain children representing attached cores
(devices). This is needed since some hardware details can't be auto
detected (e.g. IRQ numbers). Also some of the cores may be responsible
for extra things, e.g. ChipCommon providing access to the GPIO chip.
axi@18000000 {
compatible = "brcm,bus-axi";
reg = <0x18000000 0x1000>;
ranges = <0x00000000 0x18000000 0x00100000>;
#address-cells = <1>;
#size-cells = <1>;
chipcommon {
reg = <0x00000000 0x1000>;
#gpio-cells = <2>;
* Broadcom UniMAC MDIO bus controller
Required properties:
- compatible: should one from "brcm,genet-mdio-v1", "brcm,genet-mdio-v2",
"brcm,genet-mdio-v3", "brcm,genet-mdio-v4" or "brcm,unimac-mdio"
- reg: address and length of the regsiter set for the device, first one is the
base register, and the second one is optional and for indirect accesses to
larger than 16-bits MDIO transactions
- reg-names: name(s) of the register must be "mdio" and optional "mdio_indir_rw"
- #size-cells: must be 1
- #address-cells: must be 0
Optional properties:
- interrupts: must be one if the interrupt is shared with the Ethernet MAC or
Ethernet switch this MDIO block is integrated from, or must be two, if there
are two separate interrupts, first one must be "mdio done" and second must be
for "mdio error"
- interrupt-names: must be "mdio_done_error" when there is a share interrupt fed
to this hardware block, or must be "mdio_done" for the first interrupt and
"mdio_error" for the second when there are separate interrupts
Child nodes of this MDIO bus controller node are standard Ethernet PHY device
nodes as described in Documentation/devicetree/bindings/net/phy.txt
mdio@403c0 {
compatible = "brcm,unimac-mdio";
reg = <0x403c0 0x8 0x40300 0x18>;
reg-names = "mdio", "mdio_indir_rw";
#size-cells = <1>;
#address-cells = <0>;
phy@0 {
compatible = "ethernet-phy-ieee802.3-c22";
reg = <0>;
* Broadcom Starfighter 2 integrated swich
Required properties:
- compatible: should be "brcm,bcm7445-switch-v4.0"
- reg: addresses and length of the register sets for the device, must be 6
pairs of register addresses and lengths
- interrupts: interrupts for the devices, must be two interrupts
- dsa,mii-bus: phandle to the MDIO bus controller, see dsa/dsa.txt
- dsa,ethernet: phandle to the CPU network interface controller, see dsa/dsa.txt
- #size-cells: must be 0
- #address-cells: must be 2, see dsa/dsa.txt
The integrated switch subnode should be specified according to the binding
described in dsa/dsa.txt.
Optional properties:
- reg-names: litteral names for the device base register addresses, when present
must be: "core", "reg", "intrl2_0", "intrl2_1", "fcb", "acb"
- interrupt-names: litternal names for the device interrupt lines, when present
must be: "switch_0" and "switch_1"
- brcm,num-gphy: specify the maximum number of integrated gigabit PHYs in the
- brcm,num-rgmii-ports: specify the maximum number of RGMII interfaces supported
by the switch
- brcm,fcb-pause-override: boolean property, if present indicates that the switch
supports Failover Control Block pause override capability
- brcm,acb-packets-inflight: boolean property, if present indicates that the switch
Admission Control Block supports reporting the number of packets in-flight in a
switch queue
switch_top@f0b00000 {
compatible = "simple-bus";
#size-cells = <1>;
#address-cells = <1>;
ranges = <0 0xf0b00000 0x40804>;
ethernet_switch@0 {
compatible = "brcm,bcm7445-switch-v4.0";
#size-cells = <0>;
#address-cells = <2>;
reg = <0x0 0x40000
0x40000 0x110
0x40340 0x30
0x40380 0x30
0x40400 0x34
0x40600 0x208>;
interrupts = <0 0x18 0
0 0x19 0>;
brcm,num-gphy = <1>;
brcm,num-rgmii-ports = <2>;
switch@0 {
reg = <0 0>;
#size-cells = <0>;
#address-cells <1>;
port@0 {
label = "gphy";
reg = <0>;
Bosch MCAN controller Device Tree Bindings
Required properties:
- compatible : Should be "bosch,m_can" for M_CAN controllers
- reg : physical base address and size of the M_CAN
registers map and Message RAM
- reg-names : Should be "m_can" and "message_ram"
- interrupts : Should be the interrupt number of M_CAN interrupt
line 0 and line 1, could be same if sharing
the same interrupt.
- interrupt-names : Should contain "int0" and "int1"
- clocks : Clocks used by controller, should be host clock
and CAN clock.
- clock-names : Should contain "hclk" and "cclk"
- pinctrl-<n> : Pinctrl states as described in bindings/pinctrl/pinctrl-bindings.txt
- pinctrl-names : Names corresponding to the numbered pinctrl states
- bosch,mram-cfg : Message RAM configuration data.
Multiple M_CAN instances can share the same Message
RAM and each element(e.g Rx FIFO or Tx Buffer and etc)
number in Message RAM is also configurable,
so this property is telling driver how the shared or
private Message RAM are used by this M_CAN controller.
The format should be as follows:
<offset sidf_elems xidf_elems rxf0_elems rxf1_elems
rxb_elems txe_elems txb_elems>
The 'offset' is an address offset of the Message RAM
where the following elements start from. This is
usually set to 0x0 if you're using a private Message
RAM. The remain cells are used to specify how many
elements are used for each FIFO/Buffer.
M_CAN includes the following elements according to user manual:
11-bit Filter 0-128 elements / 0-128 words
29-bit Filter 0-64 elements / 0-128 words
Rx FIFO 0 0-64 elements / 0-1152 words
Rx FIFO 1 0-64 elements / 0-1152 words
Rx Buffers 0-64 elements / 0-1152 words
Tx Event FIFO 0-32 elements / 0-64 words
Tx Buffers 0-32 elements / 0-576 words
Please refer to 2.4.1 Message RAM Configuration in
Bosch M_CAN user manual for details.
SoC dtsi:
m_can1: can@020e8000 {
compatible = "bosch,m_can";
reg = <0x020e8000 0x4000>, <0x02298000 0x4000>;
reg-names = "m_can", "message_ram";
interrupts = <0 114 0x04>,
<0 114 0x04>;
interrupt-names = "int0", "int1";
clocks = <&clks IMX6SX_CLK_CANFD>,
clock-names = "hclk", "cclk";
bosch,mram-cfg = <0x0 0 0 32 0 0 0 1>;
status = "disabled";
Board dts:
&m_can1 {
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_m_can1>;
status = "enabled";
Renesas R-Car CAN controller Device Tree Bindings
Required properties:
- compatible: "renesas,can-r8a7778" if CAN controller is a part of R8A7778 SoC.
"renesas,can-r8a7779" if CAN controller is a part of R8A7779 SoC.
"renesas,can-r8a7790" if CAN controller is a part of R8A7790 SoC.
"renesas,can-r8a7791" if CAN controller is a part of R8A7791 SoC.
- reg: physical base address and size of the R-Car CAN register map.
- interrupts: interrupt specifier for the sole interrupt.
- clocks: phandles and clock specifiers for 3 CAN clock inputs.
- clock-names: 3 clock input name strings: "clkp1", "clkp2", "can_clk".
- pinctrl-0: pin control group to be used for this controller.
- pinctrl-names: must be "default".
Optional properties:
- renesas,can-clock-select: R-Car CAN Clock Source Select. Valid values are:
<0x0> (default) : Peripheral clock (clkp1)
<0x1> : Peripheral clock (clkp2)
<0x3> : Externally input clock
SoC common .dtsi file:
can0: can@e6e80000 {
compatible = "renesas,can-r8a7791";
reg = <0 0xe6e80000 0 0x1000>;
interrupts = <0 186 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&mstp9_clks R8A7791_CLK_RCAN0>,
<&cpg_clocks R8A7791_CLK_RCAN>, <&can_clk>;
clock-names = "clkp1", "clkp2", "can_clk";
status = "disabled";
Board specific .dts file:
&can0 {
pinctrl-0 = <&can0_pins>;
pinctrl-names = "default";
status = "okay";
......@@ -24,15 +24,17 @@ Optional properties:
- ti,hwmods : Must be "cpgmac0"
- no_bd_ram : Must be 0 or 1
- dual_emac : Specifies Switch to act as Dual EMAC
- syscon : Phandle to the system control device node, which is
the control module device of the am33x
Slave Properties:
Required properties:
- phy_id : Specifies slave phy id
- phy-mode : See ethernet.txt file in the same directory
- mac-address : See ethernet.txt file in the same directory
Optional properties:
- dual_emac_res_vlan : Specifies VID to be used to segregate the ports
- mac-address : See ethernet.txt file in the same directory
Note: "ti,hwmods" field is used to fetch the base address and irq
resources from TI, omap hwmod data base during device registration.
......@@ -57,6 +59,7 @@ Examples:
active_slave = <0>;
cpts_clock_mult = <0x80000000>;
cpts_clock_shift = <29>;
syscon = <&cm>;
cpsw_emac0: slave@0 {
phy_id = <&davinci_mdio>, <0>;
phy-mode = "rgmii-txid";
......@@ -85,6 +88,7 @@ Examples:
active_slave = <0>;
cpts_clock_mult = <0x80000000>;
cpts_clock_shift = <29>;
syscon = <&cm>;
cpsw_emac0: slave@0 {
phy_id = <&davinci_mdio>, <0>;
phy-mode = "rgmii-txid";
......@@ -39,6 +39,22 @@ Optionnal property:
This property is only used when switches are being
chained/cascaded together.
- phy-handle : Phandle to a PHY on an external MDIO bus, not the
switch internal one. See
for details.
- phy-mode : String representing the connection to the designated
PHY node specified by the 'phy-handle' property. See
for details.
Optional subnodes:
- fixed-link : Fixed-link subnode describing a link to a non-MDIO
managed entity. See
for details.
dsa@0 {
......@@ -58,6 +74,7 @@ Example:
port@0 {
reg = <0>;
label = "lan1";
phy-handle = <&phy0>;
port@1 {
* ARC EMAC 10/100 Ethernet platform driver for Rockchip Rk3066/RK3188 SoCs
Required properties:
- compatible: Should be "rockchip,rk3066-emac" or "rockchip,rk3188-emac"
according to the target SoC.
- reg: Address and length of the register set for the device
- interrupts: Should contain the EMAC interrupts
- rockchip,grf: phandle to the syscon grf used to control speed and mode
for emac.
- phy: see ethernet.txt file in the same directory.
- phy-mode: see ethernet.txt file in the same directory.
Optional properties:
- phy-supply: phandle to a regulator if the PHY needs one
Clock handling:
- clocks: Must contain an entry for each entry in clock-names.
- clock-names: Shall be "hclk" for the host clock needed to calculate and set
polling period of EMAC and "macref" for the reference clock needed to transfer
data to and from the phy.
Child nodes of the driver are the individual PHY devices connected to the
MDIO bus. They must have a "reg" property given the PHY address on the MDIO bus.
ethernet@10204000 {
compatible = "rockchip,rk3188-emac";
reg = <0xc0fc2000 0x3c>;
interrupts = <6>;
mac-address = [ 00 11 22 33 44 55 ];
clocks = <&cru HCLK_EMAC>, <&cru SCLK_MAC>;
clock-names = "hclk", "macref";
pinctrl-names = "default";
pinctrl-0 = <&emac_xfer>, <&emac_mdio>, <&phy_int>;
rockchip,grf = <&grf>;
phy = <&phy0>;
phy-mode = "rmii";
phy-supply = <&vcc_rmii>;
#address-cells = <1>;
#size-cells = <0>;
phy0: ethernet-phy@0 {
reg = <1>;
......@@ -16,6 +16,12 @@ Optional properties:
- phy-handle : phandle to the PHY device connected to this device.
- fixed-link : Assume a fixed link. See fixed-link.txt in the same directory.
Use instead of phy-handle.
- fsl,num-tx-queues : The property is valid for enet-avb IP, which supports
hw multi queues. Should specify the tx queue number, otherwise set tx queue
number to 1.
- fsl,num-rx-queues : The property is valid for enet-avb IP, which supports
hw multi queues. Should specify the rx queue number, otherwise set rx queue
number to 1.
Optional subnodes:
- mdio : specifies the mdio bus in the FEC, used as a container for phy nodes
* Marvell PXA168 Ethernet Controller
Required properties:
- compatible: should be "marvell,pxa168-eth".
- reg: address and length of the register set for the device.
- interrupts: interrupt for the device.
- clocks: pointer to the clock for the device.
Optional properties:
- port-id: Ethernet port number. Should be '0','1' or '2'.
- #address-cells: must be 1 when using sub-nodes.
- #size-cells: must be 0 when using sub-nodes.
- phy-handle: see ethernet.txt file in the same directory.
- local-mac-address: see ethernet.txt file in the same directory.
Each PHY can be represented as a sub-node. This is not mandatory.
Sub-nodes required properties:
- reg: the MDIO address of the PHY.
eth0: ethernet@f7b90000 {
compatible = "marvell,pxa168-eth";
reg = <0xf7b90000 0x10000>;
clocks = <&chip CLKID_GETH0>;
interrupts = <GIC_SPI 24 IRQ_TYPE_LEVEL_HIGH>;
#address-cells = <1>;
#size-cells = <0>;
phy-handle = <&ethphy0>;
ethphy0: ethernet-phy@0 {
reg = <0>;
* Amlogic Meson DWMAC Ethernet controller
The device inherits all the properties of the dwmac/stmmac devices
described in the file net/stmmac.txt with the following changes.
Required properties:
- compatible: should be "amlogic,meson6-dwmac" along with "snps,dwmac"
and any applicable more detailed version number
described in net/stmmac.txt
- reg: should contain a register range for the dwmac controller and
another one for the Amlogic specific configuration
ethmac: ethernet@c9410000 {
compatible = "amlogic,meson6-dwmac", "snps,dwmac";
reg = <0xc9410000 0x10000
0xc1108108 0x4>;
interrupts = <0 8 1>;
interrupt-names = "macirq";
clocks = <&clk81>;
clock-names = "stmmaceth";
......@@ -26,7 +26,7 @@ Example (for ARM-based BeagleBoard xM with ST21NFCB on I2C2):
clock-frequency = <400000>;
interrupt-parent = <&gpio5>;
interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
interrupts = <2 IRQ_TYPE_LEVEL_HIGH>;
reset-gpios = <&gpio5 29 GPIO_ACTIVE_HIGH>;
......@@ -13,6 +13,11 @@ Optional SoC Specific Properties:
- pinctrl-names: Contains only one value - "default".
- pintctrl-0: Specifies the pin control groups used for this controller.
- autosuspend-delay: Specify autosuspend delay in milliseconds.
- vin-voltage-override: Specify voltage of VIN pin in microvolts.
- irq-status-read-quirk: Specify that the trf7970a being used has the
"IRQ Status Read" erratum.
- en2-rf-quirk: Specify that the trf7970a being used has the "EN2 RF"
Example (for ARM-based BeagleBone with TRF7970A on SPI1):
......@@ -30,7 +35,10 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
ti,enable-gpios = <&gpio2 2 GPIO_ACTIVE_LOW>,
<&gpio2 5 GPIO_ACTIVE_LOW>;
vin-supply = <&ldo3_reg>;
vin-voltage-override = <5000000>;
autosuspend-delay = <30000>;
status = "okay";
* Qualcomm QCA7000 (Ethernet over SPI protocol)
Note: The QCA7000 is useable as a SPI device. In this case it must be defined
as a child of a SPI master in the device tree.
Required properties:
- compatible : Should be "qca,qca7000"
- reg : Should specify the SPI chip select
- interrupts : The first cell should specify the index of the source interrupt
and the second cell should specify the trigger type as rising edge
- spi-cpha : Must be set
- spi-cpol: Must be set
Optional properties:
- interrupt-parent : Specify the pHandle of the source interrupt
- spi-max-frequency : Maximum frequency of the SPI bus the chip can operate at.
Numbers smaller than 1000000 or greater than 16000000 are invalid. Missing
the property will set the SPI frequency to 8000000 Hertz.
- local-mac-address: 6 bytes, MAC address
- qca,legacy-mode : Set the SPI data transfer of the QCA7000 to legacy mode.
In this mode the SPI master must toggle the chip select between each data
word. In burst mode these gaps aren't necessary, which is faster.
This setting depends on how the QCA7000 is setup via GPIO pin strapping.
If the property is missing the driver defaults to burst mode.
/* Freescale i.MX28 SPI master*/
ssp2: spi@80014000 {
#address-cells = <1>;
#size-cells = <0>;
compatible = "fsl,imx28-spi";
pinctrl-names = "default";
pinctrl-0 = <&spi2_pins_a>;
status = "okay";
qca7000: ethernet@0 {
compatible = "qca,qca7000";
reg = <0x0>;
interrupt-parent = <&gpio3>; /* GPIO Bank 3 */
interrupts = <25 0x1>; /* Index: 25, rising edge */
spi-cpha; /* SPI mode: CPHA=1 */
spi-cpol; /* SPI mode: CPOL=1 */
spi-max-frequency = <8000000>; /* freq: 8 MHz */
local-mac-address = [ A0 B0 C0 D0 E0 F0 ];
......@@ -12,6 +12,10 @@ Required properties:
- altr,sysmgr-syscon : Should be the phandle to the system manager node that
encompasses the glue register, the register offset, and the register shift.
Optional properties:
altr,emac-splitter: Should be the phandle to the emac splitter soft IP node if
DWMAC controller is connected emac splitter.
gmac0: ethernet@ff700000 {
DCTCP (DataCenter TCP)
DCTCP is an enhancement to the TCP congestion control algorithm for data
center networks and leverages Explicit Congestion Notification (ECN) in
the data center network to provide multi-bit feedback to the end hosts.
To enable it on end hosts:
sysctl -w net.ipv4.tcp_congestion_control=dctcp
All switches in the data center network running DCTCP must support ECN
marking and be configured for marking when reaching defined switch buffer
thresholds. The default ECN marking threshold heuristic for DCTCP on
switches is 20 packets (30KB) at 1Gbps, and 65 packets (~100KB) at 10Gbps,
but might need further careful tweaking.
For more details, see below documents:
The algorithm is further described in detail in the following two
i) Mohammad Alizadeh, Albert Greenberg, David A. Maltz, Jitendra Padhye,
Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, and Murari Sridharan:
"Data Center TCP (DCTCP)", Data Center Networks session
Proc. ACM SIGCOMM, New Delhi, 2010.
ii) Mohammad Alizadeh, Adel Javanmard, and Balaji Prabhakar:
"Analysis of DCTCP: Stability, Convergence, and Fairness"
Proc. ACM SIGMETRICS, San Jose, 2011.
IETF informational draft:
DCTCP site:
......@@ -951,7 +951,7 @@ Size modifier is one of ...
Mode modifier is one of:
BPF_IMM 0x00 /* classic BPF only, reserved in eBPF */
BPF_IMM 0x00 /* used for 32-bit mov in classic BPF and 64-bit in eBPF */
BPF_ABS 0x20
BPF_IND 0x40
BPF_MEM 0x60
......@@ -995,6 +995,275 @@ BPF_XADD | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg
Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW. Note that 1 and
2 byte atomic increments are not supported.
eBPF has one 16-byte instruction: BPF_LD | BPF_DW | BPF_IMM which consists
of two consecutive 'struct bpf_insn' 8-byte blocks and interpreted as single
instruction that loads 64-bit immediate value into a dst_reg.
Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM which loads
32-bit immediate value into a register.