Commit 6c373ca8 authored by Linus Torvalds

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

Pull networking updates from David Miller:

 1) Add BQL support to via-rhine, from Tino Reichardt.

 2) Integrate SWITCHDEV layer support into the DSA layer, so DSA drivers
    can support hw switch offloading.  From Florian Fainelli.

 3) Allow 'ip address' commands to initiate multicast group join/leave,
    from Madhu Challa.

 4) Many ipv4 FIB lookup optimizations from Alexander Duyck.

 5) Support EBPF in cls_bpf classifier and act_bpf action, from Daniel
    Borkmann.

 6) Remove the ugly compat support in ARP for ugly layers like ax25,
    rose, etc.  And use this to clean up the neigh layer, then use it to
    implement MPLS support.  All from Eric Biederman.

 7) Support L3 forwarding offloading in switches, from Scott Feldman.

 8) Collapse the LOCAL and MAIN ipv4 FIB tables when possible, to speed
    up route lookups even further.  From Alexander Duyck.

 9) Many improvements and bug fixes to the rhashtable implementation,
    from Herbert Xu and Thomas Graf.  In particular, in the case where
    an rhashtable user bulk adds a large number of items into an empty
    table, we expand the table much more sanely.

10) Don't make the tcp_metrics hash table per-namespace, from Eric
    Biederman.

11) Extend EBPF to access SKB fields, from Alexei Starovoitov.

12) Split out new connection request sockets so that they can be
    established in the main hash table.  Much less false sharing since
    hash lookups go direct to the request sockets instead of having to
    go first to the listener then to the request socks hashed
    underneath.  From Eric Dumazet.

13) Add async I/O support for crypto AF_ALG sockets, from Tadeusz Struk.

14) Support stable privacy address generation for RFC7217 in IPV6.  From
    Hannes Frederic Sowa.

15) Hash network namespace into IP frag IDs, also from Hannes Frederic
    Sowa.

16) Convert PTP get/set methods to use 64-bit time, from Richard
    Cochran.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1816 commits)
  fm10k: Bump driver version to 0.15.2
  fm10k: corrected VF multicast update
  fm10k: mbx_update_max_size does not drop all oversized messages
  fm10k: reset head instead of calling update_max_size
  fm10k: renamed mbx_tx_dropped to mbx_tx_oversized
  fm10k: update xcast mode before synchronizing multicast addresses
  fm10k: start service timer on probe
  fm10k: fix function header comment
  fm10k: comment next_vf_mbx flow
  fm10k: don't handle mailbox events in iov_event path and always process mailbox
  fm10k: use separate workqueue for fm10k driver
  fm10k: Set PF queues to unlimited bandwidth during virtualization
  fm10k: expose tx_timeout_count as an ethtool stat
  fm10k: only increment tx_timeout_count in Tx hang path
  fm10k: remove extraneous "Reset interface" message
  fm10k: separate PF only stats so that VF does not display them
  fm10k: use hw->mac.max_queues for stats
  fm10k: only show actual queues, not the maximum in hardware
  fm10k: allow creation of VLAN on default vid
  fm10k: fix unused warnings
  ...
parents bb0fd7ab 9f915141

......@@ -188,6 +188,14 @@ Description:
Indicates the interface unique physical port identifier within
the NIC, as a string.
What: /sys/class/net/<iface>/phys_port_name
Date: March 2015
KernelVersion: 4.0
Contact: netdev@vger.kernel.org
Description:
Indicates the interface physical port name within the NIC,
as a string.
What: /sys/class/net/<iface>/speed
Date: October 2009
KernelVersion: 2.6.33
......
......@@ -24,6 +24,14 @@ Description:
Indicates the number of transmit timeout events seen by this
network interface transmit queue.
What: /sys/class/<iface>/queues/tx-<queue>/tx_maxrate
Date: March 2015
KernelVersion: 4.1
Contact: netdev@vger.kernel.org
Description:
The maximum rate, in Mbps, set for the queue. A value of zero means
disabled; the default is disabled.
What: /sys/class/<iface>/queues/tx-<queue>/xps_cpus
Date: November 2010
KernelVersion: 2.6.38
......
......@@ -14,7 +14,11 @@ Required properties for all the ethernet interfaces:
- "enet_csr": Ethernet control and status register address space
- "ring_csr": Descriptor ring control and status register address space
- "ring_cmd": Descriptor ring command register address space
- interrupts: Ethernet main interrupt
- interrupts: Two interrupt specifiers can be specified.
- First is the Rx interrupt. This irq is mandatory.
- Second is the Tx completion interrupt.
This is supported only on SGMII based 1GbE and 10GbE interfaces.
- port-id: Port number (0 or 1)
- clocks: Reference to the clock entry.
- local-mac-address: MAC address assigned to this device
- phy-connection-type: Interface type between ethernet device and PHY device
......@@ -49,6 +53,7 @@ Example:
<0x0 0X10000000 0x0 0X200>;
reg-names = "enet_csr", "ring_csr", "ring_cmd";
interrupts = <0x0 0x3c 0x4>;
port-id = <0>;
clocks = <&menetclk 0>;
local-mac-address = [00 01 73 00 00 01];
phy-connection-type = "rgmii";
......
......@@ -6,11 +6,14 @@ Required properties:
- spi-max-frequency: maximal bus speed, should be set to 7500000 depends
sync or async operation mode
- reg: the chipselect index
- interrupts: the interrupt generated by the device
- interrupts: the interrupt generated by the device. Trigger types other
than high-level can cause deadlocks while handling the ISR.
Optional properties:
- reset-gpio: GPIO spec for the rstn pin
- sleep-gpio: GPIO spec for the slp_tr pin
- xtal-trim: u8 value for fine tuning the internal capacitance
arrays of xtal pins: 0 = +0 pF, 0xf = +4.5 pF
Example:
......@@ -18,6 +21,7 @@ Example:
compatible = "atmel,at86rf231";
spi-max-frequency = <7500000>;
reg = <0>;
interrupts = <19 1>;
interrupts = <19 4>;
interrupt-parent = <&gpio3>;
xtal-trim = /bits/ 8 <0x06>;
};
......@@ -13,11 +13,15 @@ Required properties:
- cca-gpio: GPIO spec for the CCA pin
- vreg-gpio: GPIO spec for the VREG pin
- reset-gpio: GPIO spec for the RESET pin
Optional properties:
- amplified: include if the CC2520 is connected to a CC2591 amplifier
Example:
cc2520@0 {
compatible = "ti,cc2520";
reg = <0>;
spi-max-frequency = <4000000>;
amplified;
pinctrl-names = "default";
pinctrl-0 = <&cc2520_cape_pins>;
fifo-gpio = <&gpio1 18 0>;
......
......@@ -49,6 +49,7 @@ Required properties:
- compatible: Should be "ti,netcp-1.0"
- clocks: phandle to the reference clocks for the subsystem.
- dma-id: Navigator packet dma instance id.
- ranges: address range of NetCP (includes, Ethernet SS, PA and SA)
Optional properties:
- reg: register location and the size for the following register
......@@ -64,10 +65,30 @@ NetCP device properties: Device specification for NetCP sub-modules.
1Gb/10Gb (gbe/xgbe) ethernet switch sub-module specifications.
Required properties:
- label: Must be "netcp-gbe" for 1Gb & "netcp-xgbe" for 10Gb.
- compatible: Must be one of below:-
"ti,netcp-gbe" for 1GbE on NetCP 1.4
"ti,netcp-gbe-5" for 1GbE N NetCP 1.5 (N=5)
"ti,netcp-gbe-9" for 1GbE N NetCP 1.5 (N=9)
"ti,netcp-gbe-2" for 1GbE N NetCP 1.5 (N=2)
"ti,netcp-xgbe" for 10 GbE
- reg: register location and the size for the following register
regions in the specified order.
- subsystem registers
- serdes registers
- switch subsystem registers
- sgmii port3/4 module registers (only for NetCP 1.4)
- switch module registers
- serdes registers (only for 10G)
NetCP 1.4 ethss, here is the order
index #0 - switch subsystem registers
index #1 - sgmii port3/4 module registers
index #2 - switch module registers
NetCP 1.5 ethss 9 port, 5 port and 2 port
index #0 - switch subsystem registers
index #1 - switch module registers
index #2 - serdes registers
- tx-channel: the navigator packet dma channel name for tx.
- tx-queue: the navigator queue number associated with the tx dma channel.
- interfaces: specification for each of the switch port to be registered as a
......@@ -120,14 +141,13 @@ Optional properties:
Example binding:
netcp: netcp@2090000 {
netcp: netcp@2000000 {
reg = <0x2620110 0x8>;
reg-names = "efuse";
compatible = "ti,netcp-1.0";
#address-cells = <1>;
#size-cells = <1>;
ranges;
ranges = <0 0x2000000 0xfffff>;
clocks = <&papllclk>, <&clkcpgmac>, <&chipclk12>;
dma-coherent;
/* big-endian; */
......@@ -137,9 +157,9 @@ netcp: netcp@2090000 {
#address-cells = <1>;
#size-cells = <1>;
ranges;
gbe@0x2090000 {
gbe@90000 {
label = "netcp-gbe";
reg = <0x2090000 0xf00>;
reg = <0x90000 0x300>, <0x90400 0x400>, <0x90800 0x700>;
/* enable-ale; */
tx-queue = <648>;
tx-channel = <8>;
......
......@@ -2,10 +2,13 @@
Required properties:
- compatible: Should be "cdns,[<chip>-]{macb|gem}"
Use "cdns,at91sam9260-macb" Atmel at91sam9260 and at91sam9263 SoCs.
Use "cdns,at91sam9260-macb" for Atmel at91sam9 SoCs or the 10/100Mbit IP
available on sama5d3 SoCs.
Use "cdns,at32ap7000-macb" for other 10/100 usage or use the generic form: "cdns,macb".
Use "cdns,pc302-gem" for Picochip picoXcell pc302 and later devices based on
the Cadence GEM, or the generic form: "cdns,gem".
Use "cdns,sama5d3-gem" for the Gigabit IP available on Atmel sama5d3 SoCs.
Use "cdns,sama5d4-gem" for the Gigabit IP available on Atmel sama5d4 SoCs.
- reg: Address and length of the register set for the device
- interrupts: Should contain macb interrupt
- phy-mode: See ethernet.txt file in the same directory.
......
* NXP Semiconductors NXP NCI NFC Controllers
Required properties:
- compatible: Should be "nxp,nxp-nci-i2c".
- clock-frequency: I²C work frequency.
- reg: address on the bus
- interrupt-parent: phandle for the interrupt gpio controller
- interrupts: GPIO interrupt to which the chip is connected
- enable-gpios: Output GPIO pin used for enabling/disabling the chip
- firmware-gpios: Output GPIO pin used to enter firmware download mode
Optional SoC Specific Properties:
- pinctrl-names: Contains only one value - "default".
- pinctrl-0: Specifies the pin control groups used for this controller.
Example (for ARM-based BeagleBone with NPC100 NFC controller on I2C2):
&i2c2 {
status = "okay";
npc100: npc100@29 {
compatible = "nxp,nxp-nci-i2c";
reg = <0x29>;
clock-frequency = <100000>;
interrupt-parent = <&gpio1>;
interrupts = <29 GPIO_ACTIVE_HIGH>;
enable-gpios = <&gpio0 30 GPIO_ACTIVE_HIGH>;
firmware-gpios = <&gpio0 31 GPIO_ACTIVE_HIGH>;
};
};
......@@ -35,10 +35,11 @@ Optional properties:
- reset-names: Should contain the reset signal name "stmmaceth", if a
reset phandle is given
- max-frame-size: See ethernet.txt file in the same directory
- clocks: If present, the first clock should be the GMAC main clock,
further clocks may be specified in derived bindings.
- clocks: If present, the first clock should be the GMAC main clock and
the second clock should be peripheral's register interface clock. Further
clocks may be specified in derived bindings.
- clock-names: One name for each entry in the clocks property, the
first one should be "stmmaceth".
first one should be "stmmaceth" and the second one should be "pclk".
- clk_ptp_ref: this is the PTP reference clock; in case of the PTP is
available this clock is used for programming the Timestamp Addend Register.
If not passed then the system clock will be used and this is fine on some
......
......@@ -22,7 +22,8 @@ This file contains
4.1.3 RAW socket option CAN_RAW_LOOPBACK
4.1.4 RAW socket option CAN_RAW_RECV_OWN_MSGS
4.1.5 RAW socket option CAN_RAW_FD_FRAMES
4.1.6 RAW socket returned message flags
4.1.6 RAW socket option CAN_RAW_JOIN_FILTERS
4.1.7 RAW socket returned message flags
4.2 Broadcast Manager protocol sockets (SOCK_DGRAM)
4.2.1 Broadcast Manager operations
4.2.2 Broadcast Manager message flags
......@@ -601,7 +602,22 @@ solution for a couple of reasons:
CAN FD frames by checking if the device maximum transfer unit is CANFD_MTU.
The CAN device MTU can be retrieved e.g. with a SIOCGIFMTU ioctl() syscall.
4.1.6 RAW socket returned message flags
4.1.6 RAW socket option CAN_RAW_JOIN_FILTERS
The CAN_RAW socket can set multiple CAN identifier specific filters that
lead to multiple filters in the af_can.c filter processing. These filters
are independent from each other which leads to logical OR'ed filters when
applied (see 4.1.1).
This socket option joins the given CAN filters in the way that only CAN
frames are passed to user space that matched *all* given CAN filters. The
semantic for the applied filters is therefore changed to a logical AND.
This is useful especially when the filterset is a combination of filters
where the CAN_INV_FILTER flag is set in order to notch single CAN IDs or
CAN ID ranges from the incoming traffic.
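The difference between the default OR semantics and the joined AND semantics can be illustrated with a small user-space model. This is only a sketch, not the af_can.c implementation: it assumes the match rule from section 4.1.1 (a filter matches when the received ID and the filter ID agree under the mask, inverted when CAN_INV_FILTER is set in the filter ID), and the helper names filter_matches and frame_passes are hypothetical.

```python
# Simplified model of CAN_RAW filter evaluation; not kernel code.
CAN_INV_FILTER = 0x20000000  # flag in can_id that inverts the filter
CAN_SFF_MASK = 0x000007FF    # standard frame format identifier bits


def filter_matches(rx_id, can_id, can_mask):
    """Evaluate one filter against a received CAN ID."""
    inverted = bool(can_id & CAN_INV_FILTER)
    # The inversion flag itself takes no part in the ID comparison.
    can_id &= ~CAN_INV_FILTER
    can_mask &= ~CAN_INV_FILTER
    hit = (rx_id & can_mask) == (can_id & can_mask)
    return hit != inverted


def frame_passes(rx_id, filters, join_filters=False):
    """OR the filter results by default, AND them when joined."""
    results = [filter_matches(rx_id, cid, mask) for cid, mask in filters]
    return all(results) if join_filters else any(results)


# Two inverted filters meant to notch out IDs 0x123 and 0x456.
notch = [
    (0x123 | CAN_INV_FILTER, CAN_SFF_MASK),
    (0x456 | CAN_INV_FILTER, CAN_SFF_MASK),
]
```

With the default OR semantics, frame_passes(0x123, notch) is true because the second inverted filter accepts the frame, so the notch has no effect. With join_filters=True the same frame is rejected, while an unrelated ID such as 0x200 still passes.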
4.1.7 RAW socket returned message flags
When using recvmsg() call, the msg->msg_flags may contain following flags:
......
......@@ -280,7 +280,8 @@ Possible BPF extensions are shown in the following table:
rxhash skb->hash
cpu raw_smp_processor_id()
vlan_tci skb_vlan_tag_get(skb)
vlan_pr skb_vlan_tag_present(skb)
vlan_avail skb_vlan_tag_present(skb)
vlan_tpid skb->vlan_proto
rand prandom_u32()
These extensions can also be prefixed with '#'.
......
......@@ -42,10 +42,10 @@ Additional Configurations
Jumbo Frames
------------
Jumbo Frames support is enabled by changing the MTU to a value larger than
the default of 1500. Use the ifconfig command to increase the MTU size.
the default of 1500. Use the ip command to increase the MTU size.
For example:
ifconfig eth<x> mtu 9000 up
ip link set dev eth<x> mtu 9000
This setting is not saved across reboots.
......
......@@ -388,6 +388,16 @@ tcp_mtu_probing - INTEGER
1 - Disabled by default, enabled when an ICMP black hole detected
2 - Always enabled, use initial MSS of tcp_base_mss.
tcp_probe_interval - INTEGER
Controls how often to start TCP Packetization-Layer Path MTU
Discovery reprobe. The default is reprobing every 10 minutes as
per RFC4821.
tcp_probe_threshold - INTEGER
Controls when TCP Packetization-Layer Path MTU Discovery probing
will stop in respect to the width of search range in bytes. Default
is 8 bytes.
tcp_no_metrics_save - BOOLEAN
By default, TCP saves various connection metrics in the route cache
when the connection closes, so that connections established in the
......@@ -1116,11 +1126,23 @@ arp_accept - BOOLEAN
gratuitous arp frame, the arp table will be updated regardless
if this setting is on or off.
mcast_solicit - INTEGER
The maximum number of multicast probes in INCOMPLETE state,
when the associated hardware address is unknown. Defaults
to 3.
ucast_solicit - INTEGER
The maximum number of unicast probes in PROBE state, when
the hardware address is being reconfirmed. Defaults to 3.
app_solicit - INTEGER
The maximum number of probes to send to the user space ARP daemon
via netlink before dropping back to multicast probes (see
mcast_solicit). Defaults to 0.
mcast_resolicit). Defaults to 0.
mcast_resolicit - INTEGER
The maximum number of multicast probes after unicast and
app probes in PROBE state. Defaults to 0.
disable_policy - BOOLEAN
Disable IPSEC policy (SPD) for this interface
......@@ -1198,6 +1220,17 @@ anycast_src_echo_reply - BOOLEAN
FALSE: disabled
Default: FALSE
idgen_delay - INTEGER
Controls the delay in seconds after which stable privacy
address generation is retried if a DAD conflict is
detected.
Default: 1 (as specified in RFC7217)
idgen_retries - INTEGER
Controls the number of retries to generate a stable privacy
address if a DAD conflict is detected.
Default: 3 (as specified in RFC7217)
mld_qrv - INTEGER
Controls the MLD query robustness variable (see RFC3810 9.1).
Default: 2 (as specified by RFC3810 9.1)
......@@ -1518,6 +1551,20 @@ use_optimistic - BOOLEAN
0: disabled (default)
1: enabled
stable_secret - IPv6 address
This IPv6 address will be used as a secret to generate IPv6
addresses for link-local addresses and autoconfigured
ones. All addresses generated after setting this secret will
be stable privacy ones by default. This can be changed via the
addrgenmode ip-link. conf/default/stable_secret is used as the
secret for the namespace, the interface specific ones can
overwrite that. Writes to conf/all/stable_secret are refused.
It is recommended to generate this secret during installation
of a system and keep it stable after that.
By default the stable secret is unset.
icmp/*:
ratelimit - INTEGER
Limit the maximal rates for sending ICMPv6 packets.
......
......@@ -22,6 +22,27 @@ backup_only - BOOLEAN
If set, disable the director function while the server is
in backup mode to avoid packet loops for DR/TUN methods.
conn_reuse_mode - INTEGER
1 - default
Controls how ipvs will deal with connections that are detected
as port reuse. It is a bitmap, with the values being:
0: disable any special handling on port reuse. The new
connection will be delivered to the same real server that was
servicing the previous connection. This will effectively
disable expire_nodest_conn.
bit 1: enable rescheduling of new connections when it is safe.
That is, whenever expire_nodest_conn and for TCP sockets, when
the connection is in TIME_WAIT state (which is only possible if
you use NAT mode).
bit 2: it is bit 1 plus, for TCP connections, when connections
are in FIN_WAIT state, as this is the last state seen by load
balancer in Direct Routing mode. This bit helps on adding new
real servers to a very busy cluster.
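The bitmap can be read as in the following sketch. It is an interpretation of the description above, not ipvs code: it assumes that "bit 1" corresponds to the value 1 and "bit 2" to the value 2, and that any non-zero value enables the "bit 1" behaviour (since bit 2 is described as "bit 1 plus"). The function name and dictionary keys are hypothetical.

```python
def describe_conn_reuse_mode(mode):
    """Decode a conn_reuse_mode value under the assumptions stated above."""
    if mode == 0:
        # No special handling: new connections stay on the same real server.
        return {"reschedule_when_safe": False, "reschedule_fin_wait": False}
    return {
        # Any non-zero value enables rescheduling when it is safe.
        "reschedule_when_safe": True,
        # Value 2 additionally covers TCP connections in FIN_WAIT.
        "reschedule_fin_wait": bool(mode & 2),
    }
```

Under these assumptions the default of 1 enables rescheduling only when it is safe, and a value of 2 or 3 also covers FIN_WAIT.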
conntrack - BOOLEAN
0 - disabled (default)
not 0 - enabled
......
......@@ -39,7 +39,7 @@ Channel Bonding documentation can be found in the Linux kernel source:
The driver information previously displayed in the /proc filesystem is not
supported in this release. Alternatively, you can use ethtool (version 1.6
or later), lspci, and ifconfig to obtain the same information.
or later), lspci, and iproute2 to obtain the same information.
Instructions on updating ethtool can be found in the section "Additional
Configurations" later in this document.
......@@ -90,7 +90,7 @@ select m for "Intel(R) PRO/10GbE support" located at:
3. Assign an IP address to the interface by entering the following, where
x is the interface number:
ifconfig ethx <IP_address>
ip addr add <IP_address> dev ethx
4. Verify that the interface works. Enter the following, where <IP_address>
is the IP address for another machine on the same subnet as the interface
......@@ -177,7 +177,7 @@ NOTE: These changes are only suggestions, and serve as a starting point for
tuning your network performance.
The changes are made in three major ways, listed in order of greatest effect:
- Use ifconfig to modify the mtu (maximum transmission unit) and the txqueuelen
- Use ip link to modify the mtu (maximum transmission unit) and the txqueuelen
parameter.
- Use sysctl to modify /proc parameters (essentially kernel tuning)
- Use setpci to modify the MMRBC field in PCI-X configuration space to increase
......@@ -202,7 +202,7 @@ setpci -d 8086:1a48 e6.b=2e
# to change as well.
# set the txqueuelen
# your ixgb adapter should be loaded as eth1 for this to work, change if needed
ifconfig eth1 mtu 9000 txqueuelen 1000 up
ip li set dev eth1 mtu 9000 txqueuelen 1000 up
# call the sysctl utility to modify /proc/sys entries
sysctl -p ./sysctl_ixgb.conf
- END ixgb_perf.sh
......@@ -297,10 +297,10 @@ Additional Configurations
------------
The driver supports Jumbo Frames for all adapters. Jumbo Frames support is
enabled by changing the MTU to a value larger than the default of 1500.
The maximum value for the MTU is 16114. Use the ifconfig command to
The maximum value for the MTU is 16114. Use the ip command to
increase the MTU size. For example:
ifconfig ethx mtu 9000 up
ip li set dev ethx mtu 9000
The maximum MTU setting for Jumbo Frames is 16114. This value coincides
with the maximum Jumbo Frames size of 16128.
......
......@@ -70,10 +70,10 @@ Avago 1000BASE-T SFP ABCU-5710RZ
82599-based adapters support all passive and active limiting direct attach
cables that comply with SFF-8431 v4.1 and SFF-8472 v10.4 specifications.
Laser turns off for SFP+ when ifconfig down
Laser turns off for SFP+ when device is down
-------------------------------------------
"ifconfig down" turns off the laser for 82599-based SFP+ fiber adapters.
"ifconfig up" turns on the laser.
"ip link set down" turns off the laser for 82599-based SFP+ fiber adapters.
"ip link set up" turns on the laser.
82598-BASED ADAPTERS
......@@ -213,13 +213,13 @@ Additional Configurations
------------
The driver supports Jumbo Frames for all adapters. Jumbo Frames support is
enabled by changing the MTU to a value larger than the default of 1500.
The maximum value for the MTU is 16110. Use the ifconfig command to
The maximum value for the MTU is 16110. Use the ip command to
increase the MTU size. For example:
ifconfig ethx mtu 9000 up
ip link set dev ethx mtu 9000
The maximum MTU setting for Jumbo Frames is 16110. This value coincides
with the maximum Jumbo Frames size of 16128.
The maximum MTU setting for Jumbo Frames is 9710. This value coincides
with the maximum Jumbo Frames size of 9728.
Generic Receive Offload, aka GRO
--------------------------------
......
/proc/sys/net/mpls/* Variables:
platform_labels - INTEGER
Number of entries in the platform label table. It is not
possible to configure forwarding for label values equal to or
greater than the number of platform labels.
A dense utilization of the entries in the platform label table
is possible and expected as the platform labels are locally
allocated.
If the number of platform label table entries is set to 0 no
label will be recognized by the kernel and mpls forwarding
will be disabled.
Reducing this value will remove all label routing entries that
no longer fit in the table.
Possible values: 0 - 1048575
Default: 0
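The relationship between platform_labels and configurable label values can be sketched as a simple range check. This models only the rule stated above (labels equal to or greater than platform_labels cannot be configured) and is not the kernel's validation logic; the function and constant names are hypothetical.

```python
MAX_PLATFORM_LABELS = 1048575  # upper bound of the possible values above


def can_configure_label(label, platform_labels):
    """Return True if forwarding could be configured for this label value."""
    if not 0 <= platform_labels <= MAX_PLATFORM_LABELS:
        raise ValueError("platform_labels out of range")
    # Only labels strictly below the table size fit in the table.
    return 0 <= label < platform_labels
```

With the default of 0 no label passes the check, which corresponds to MPLS forwarding being disabled.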
......@@ -440,9 +440,10 @@ and the following flags apply:
+++ Capture process:
from include/linux/if_packet.h
#define TP_STATUS_COPY 2
#define TP_STATUS_LOSING 4
#define TP_STATUS_CSUMNOTREADY 8
#define TP_STATUS_COPY (1 << 1)
#define TP_STATUS_LOSING (1 << 2)
#define TP_STATUS_CSUMNOTREADY (1 << 3)
#define TP_STATUS_CSUM_VALID (1 << 7)
TP_STATUS_COPY : This flag indicates that the frame (and associated
meta information) has been truncated because it's
......@@ -466,6 +467,12 @@ TP_STATUS_CSUMNOTREADY: currently it's used for outgoing IP packets which
reading the packet we should not try to check the
checksum.
TP_STATUS_CSUM_VALID : This flag indicates that at least the transport
header checksum of the packet has been already
validated on the kernel side. If the flag is not set
then we are free to check the checksum by ourselves
provided that TP_STATUS_CSUMNOTREADY is also not set.
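How a reader of the ring might combine the two checksum flags can be sketched as follows. The flag values are copied from the defines above; the function name and the returned strings are hypothetical, and the sketch only restates the rule in the flag descriptions.

```python
TP_STATUS_COPY = 1 << 1
TP_STATUS_LOSING = 1 << 2
TP_STATUS_CSUMNOTREADY = 1 << 3
TP_STATUS_CSUM_VALID = 1 << 7


def checksum_action(tp_status):
    """Decide what to do about the transport checksum of a captured frame."""
    if tp_status & TP_STATUS_CSUM_VALID:
        return "trust"   # already validated on the kernel side
    if tp_status & TP_STATUS_CSUMNOTREADY:
        return "skip"    # checksum not filled in yet, do not check it
    return "verify"      # free to check the checksum in user space
```

For example, a frame whose status has neither flag set yields "verify", while a status with only TP_STATUS_CSUMNOTREADY set yields "skip".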
for convenience there are also the following defines:
#define TP_STATUS_KERNEL 0
......
......@@ -3,13 +3,11 @@
HOWTO for the linux packet generator
------------------------------------
Date: 041221
Enable CONFIG_NET_PKTGEN to compile and build pktgen.o either in kernel
or as module. Module is preferred. insmod pktgen if needed. Once running
pktgen creates a thread on each CPU where each thread has affinity to its CPU.
Monitoring and controlling is done via /proc. Easiest to select a suitable
a sample script and configure.
Enable CONFIG_NET_PKTGEN to compile and build pktgen either in-kernel
or as a module. A module is preferred; modprobe pktgen if needed. Once
running, pktgen creates a thread for each CPU with affinity to that CPU.
Monitoring and controlling is done via /proc. It is easiest to select a
suitable sample script and configure that.
On a dual CPU:
......@@ -27,7 +25,7 @@ For monitoring and control pktgen creates:
Tuning NIC for max performance
==============================
The default NIC setting are (likely) not tuned for pktgen's artificial
The default NIC settings are (likely) not tuned for pktgen's artificial
overload type of benchmarking, as this could hurt the normal use-case.
Specifically increasing the TX ring buffer in the NIC:
......@@ -35,20 +33,20 @@ Specifically increasing the TX ring buffer in the NIC:
A larger TX ring can improve pktgen's performance, while it can hurt
in the general case, 1) because the TX ring buffer might get larger
than the CPUs L1/L2 cache, 2) because it allow more queueing in the
than the CPU's L1/L2 cache, 2) because it allows more queueing in the
NIC HW layer (which is bad for bufferbloat).
One should be careful to conclude, that packets/descriptors in the HW
One should hesitate to conclude that packets/descriptors in the HW
TX ring cause delay. Drivers usually delay cleaning up the
ring-buffers (for various performance reasons), thus packets stalling
the TX ring, might just be waiting for cleanup.
ring-buffers for various performance reasons, and packets stalling
the TX ring might just be waiting for cleanup.
This cleanup issues is specifically the case, for the driver ixgbe
(Intel 82599 chip). This driver (ixgbe) combine TX+RX ring cleanups,
This cleanup issue is specifically the case for the driver ixgbe
(Intel 82599 chip). This driver (ixgbe) combines TX+RX ring cleanups,
and the cleanup interval is affected by the ethtool --coalesce setting
of parameter "rx-usecs".
For ixgbe use e.g "30" resulting in approx 33K interrupts/sec (1/30*10^6):
For ixgbe use e.g. "30" resulting in approx 33K interrupts/sec (1/30*10^6):
# ethtool -C ethX rx-usecs 30
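The interrupt rate quoted above follows directly from the coalescing interval. A minimal sketch of the arithmetic, assuming one interrupt per rx-usecs interval as the text does (the function name is hypothetical):

```python
def interrupts_per_second(rx_usecs):
    """Approximate interrupt rate for a given rx-usecs coalescing value."""
    return 1_000_000 / rx_usecs  # microseconds per second / interval
```

For rx-usecs 30 this gives roughly 33333 interrupts per second, matching the "approx 33K" figure.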
......@@ -60,15 +58,16 @@ Running:
Stopped: eth1
Result: OK: max_before_softirq=10000
Most important the devices assigned to thread. Note! A device can only belong
to one thread.
Most important are the devices assigned to the thread. Note that a
device can only belong to one thread.
Viewing devices
===============
Parm section holds configured info. Current hold running stats.
Result is printed after run or after interruption. Example:
The Params section holds configured information. The Current section
holds running statistics. The Result is printed after a run or after
interruption. Example:
/proc/net/pktgen/eth1
......@@ -93,7 +92,8 @@ Result: OK: 13101142(c12220741+d880401) usec, 10000000 (60byte,0frags)