Commit ba4e58ec authored by Gerrit Renker, committed by David S. Miller

[NET]: Supporting UDP-Lite (RFC 3828) in Linux



This is a revision of the previously submitted patch, which alters
the way files are organized and compiled in the following manner:

	* UDP and UDP-Lite now use separate object files
	* source file dependencies resolved via header files
	  net/ipv{4,6}/udp_impl.h
	* order of inclusion files in udp.c/udplite.c adapted
	  accordingly

[NET/IPv4]: Support for the UDP-Lite protocol (RFC 3828)

This patch adds support for UDP-Lite to the IPv4 stack, provided as an
extension to the existing UDPv4 code:
        * generic routines are all located in net/ipv4/udp.c
        * UDP-Lite specific routines are in net/ipv4/udplite.c
        * MIB/statistics support in /proc/net/snmp and /proc/net/udplite
        * shared API with extensions for partial checksum coverage

[NET/IPv6]: Extension for UDP-Lite over IPv6

It extends the existing UDPv6 code base with support for UDP-Lite
in the same manner as per UDPv4. In particular,
        * UDPv6 generic and shared code is in net/ipv6/udp.c
        * UDP-Litev6 specific extensions are in net/ipv6/udplite.c
        * MIB/statistics support in /proc/net/snmp6 and /proc/net/udplite6
        * support for IPV6_ADDRFORM
        * aligned the coding style of protocol initialisation with af_inet6.c
        * made the error handling in udpv6_queue_rcv_skb consistent;
          to return `-1' on error on all error cases
        * consolidation of shared code

[NET]: UDP-Lite Documentation and basic XFRM/Netfilter support

The UDP-Lite patch further provides
        * API documentation for UDP-Lite
        * basic xfrm support
        * basic netfilter support for IPv4 and IPv6 (LOG target)
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
parent 6051e2f4
===========================================================================
The UDP-Lite protocol (RFC 3828)
===========================================================================
UDP-Lite is a Standards-Track IETF transport protocol whose characteristic
is a variable-length checksum. This has advantages for transport of multimedia
(video, VoIP) over wireless networks, as partly damaged packets can still be
fed into the codec instead of being discarded due to a failed checksum test.
This file briefly describes the existing kernel support and the socket API.
For in-depth information, you can consult:
o The UDP-Lite Homepage: http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/
From here you can also download some example application source code.
o The UDP-Lite HOWTO on
http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/UDP-Lite-HOWTO.txt
o The Wireshark UDP-Lite Wiki (with capture files):
http://wiki.wireshark.org/Lightweight_User_Datagram_Protocol
o The Protocol Spec, RFC 3828, http://www.ietf.org/rfc/rfc3828.txt
I) APPLICATIONS
Several applications have been ported successfully to UDP-Lite. Ethereal
(now called Wireshark) has UDP-Litev4/v6 support by default. The tarball on
http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/udplite_linux.tar.gz
has source code for several v4/v6 client-server and network testing examples.
Porting applications to UDP-Lite is straightforward: only socket level and
IPPROTO need to be changed; senders additionally set the checksum coverage
length (default = header length = 8). Details are in the next section.
II) PROGRAMMING API
UDP-Lite provides a connectionless, unreliable datagram service and hence
uses the same socket type as UDP. In fact, porting from UDP to UDP-Lite is
very easy: simply add `IPPROTO_UDPLITE' as the last argument of the socket(2)
call so that the statement looks like:
s = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDPLITE);
or, respectively,
s = socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDPLITE);
With just the above change you are able to run UDP-Lite services or connect
to UDP-Lite servers. The kernel will assume that you are not interested in
using partial checksum coverage and so emulate UDP mode (full coverage).
Making use of the partial checksum coverage facilities requires setting a
single socket option, which takes an integer specifying the coverage length:
* Sender checksum coverage: UDPLITE_SEND_CSCOV
For example,
int val = 20;
setsockopt(s, SOL_UDPLITE, UDPLITE_SEND_CSCOV, &val, sizeof(int));
sets the checksum coverage length to 20 bytes (12 bytes of data + 8 bytes of
header). Of each packet only the first 20 bytes (plus the pseudo-header) will
be checksummed. This is useful for RTP applications which have a 12-byte
base header.
* Receiver checksum coverage: UDPLITE_RECV_CSCOV
This option is the receiver-side analogue. It is truly optional, i.e. not
required to enable traffic with partial checksum coverage. Its function is
that of a traffic filter: when enabled, it instructs the kernel to drop
all packets which have a coverage _less_ than this value. For example, if
RTP and UDP headers are to be protected, a receiver can enforce that only
packets with a minimum coverage of 20 are admitted:
int min = 20;
setsockopt(s, SOL_UDPLITE, UDPLITE_RECV_CSCOV, &min, sizeof(int));
The calls to getsockopt(2) are analogous. Because UDP-Lite is an extension of
UDP and not a stand-alone protocol, all socket options known from UDP can be
used in exactly the same manner as before, e.g. UDP_CORK or UDP_ENCAP.
A detailed discussion of UDP-Lite checksum coverage options is in section IV.
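The socket calls from this section can be combined into one small helper. The following is a minimal sketch, not code from the patch: the function name udplite_open() is hypothetical, and the fallback constants are the values this document lists for its `mini' header, used only when the system headers do not already provide them.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fallback values as listed in this document's `mini' header; they are
 * only used when the installed system headers do not define them. */
#ifndef IPPROTO_UDPLITE
#define IPPROTO_UDPLITE 136
#endif
#ifndef SOL_UDPLITE
#define SOL_UDPLITE 136
#endif
#ifndef UDPLITE_SEND_CSCOV
#define UDPLITE_SEND_CSCOV 10
#endif
#ifndef UDPLITE_RECV_CSCOV
#define UDPLITE_RECV_CSCOV 11
#endif

/*
 * Hypothetical helper (illustration only): open a UDP-Lite socket for
 * `family' (AF_INET or AF_INET6) and set sender and receiver checksum
 * coverage. Returns the descriptor, or -1 with errno set.
 */
static int udplite_open(int family, int send_cov, int recv_cov)
{
    int fd = socket(family, SOCK_DGRAM, IPPROTO_UDPLITE);

    if (fd < 0)
        return -1;  /* e.g. the running kernel has no UDP-Lite support */

    if (setsockopt(fd, SOL_UDPLITE, UDPLITE_SEND_CSCOV,
                   &send_cov, sizeof(send_cov)) < 0 ||
        setsockopt(fd, SOL_UDPLITE, UDPLITE_RECV_CSCOV,
                   &recv_cov, sizeof(recv_cov)) < 0) {
        int saved = errno;  /* keep the setsockopt error across close() */

        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

A call such as udplite_open(AF_INET, 20, 20) would correspond to the RTP case described above: the sender protects the UDP-Lite and RTP headers, and the receiver rejects packets with less coverage than that.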
III) HEADER FILES
The socket API requires support through header files in /usr/include:
* /usr/include/netinet/in.h
to define IPPROTO_UDPLITE
* /usr/include/netinet/udplite.h
for UDP-Lite header fields and protocol constants
For testing purposes, the following can serve as a `mini' header file:
#define IPPROTO_UDPLITE 136
#define SOL_UDPLITE 136
#define UDPLITE_SEND_CSCOV 10
#define UDPLITE_RECV_CSCOV 11
Ready-made header files for various distros are in the UDP-Lite tarball.
IV) KERNEL BEHAVIOUR WITH REGARD TO THE VARIOUS SOCKET OPTIONS
To enable debugging messages, the log level needs to be set to 8, as most
messages use the KERN_DEBUG level (7).
1) Sender Socket Options
If the sender specifies a value of 0 as coverage length, the module
assumes full coverage and transmits a packet with a coverage length of 0
and the corresponding checksum. If the sender specifies a nonzero
coverage less than 8, the kernel assumes 8 as the default value. Finally,
if the specified coverage length exceeds the packet length, the packet
length is used instead as coverage length.
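The three sender rules can be written down as a small pure function. This is a sketch of the behaviour described in the paragraph above, not the kernel's implementation; sender_covered_bytes() is a hypothetical name, and packet_len is assumed to count the 8-byte UDP-Lite header plus the payload.

```c
#include <stddef.h>

#define UDPLITE_HDR_LEN 8  /* UDP-Lite header length, as stated in the text */

/*
 * Hypothetical helper (illustration only): number of bytes that end up
 * checksummed in one packet of packet_len bytes (header + payload) when
 * the sender requested the coverage value `requested'.
 */
static size_t sender_covered_bytes(unsigned int requested, size_t packet_len)
{
    if (requested == 0)
        return packet_len;            /* 0 requests full coverage */
    if (requested < UDPLITE_HDR_LEN)
        requested = UDPLITE_HDR_LEN;  /* 1..7 is raised to 8 */
    if (requested > packet_len)
        return packet_len;            /* coverage is capped at the packet */
    return requested;
}
```

For a 100-byte packet, requested values of 0, 5, 20 and 200 would give 100, 8, 20 and 100 covered bytes respectively.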
2) Receiver Socket Options
The receiver specifies the minimum value of the coverage length it
is willing to accept. A value of 0 here indicates that the receiver
always wants the whole of the packet covered. In this case, all
partially covered packets are dropped and an error is logged.
Illegal values (negative, or nonzero and less than 8) cannot take
effect; in these cases the default of 8 is assumed.
All packets arriving with a coverage value less than the specified
threshold are discarded; these events are also logged.
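The receiver-side filter can be modelled in the same way. The sketch below assumes that UDPLITE_RECV_CSCOV has been set on the socket; receiver_accepts() is a hypothetical name, and the validity check on the coverage field follows the error cases this document lists for InErrors.

```c
#include <stdbool.h>
#include <stddef.h>

#define UDPLITE_HDR_LEN 8  /* UDP-Lite header length, as stated in the text */

/*
 * Hypothetical model (illustration only) of the receiver filter.
 * min_cov: value set via UDPLITE_RECV_CSCOV
 * pkt_cov: coverage field of the incoming packet (0 = full coverage)
 * pkt_len: received UDP-Lite packet length, header included
 */
static bool receiver_accepts(unsigned int min_cov, unsigned int pkt_cov,
                             size_t pkt_len)
{
    size_t covered = (pkt_cov == 0) ? pkt_len : pkt_cov;

    /* Coverage shorter than the header or longer than the packet is bad. */
    if (covered < UDPLITE_HDR_LEN || covered > pkt_len)
        return false;

    if (min_cov == 0)
        return covered == pkt_len;    /* only fully covered packets pass */
    if (min_cov < UDPLITE_HDR_LEN)
        min_cov = UDPLITE_HDR_LEN;    /* illegal values fall back to 8 */
    return covered >= min_cov;
}
```

With a threshold of 20, a 100-byte packet announcing a coverage of 12 would be dropped, while one announcing 20 (or 0, meaning full coverage) would be admitted.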
3) Disabling the Checksum Computation
On both sender and receiver, checksumming will always be performed
and cannot be disabled using SO_NO_CHECK. Thus
setsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, ... );
will always be ignored, while the value of
getsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, &value, ...);
is meaningless (as in TCP). Packets with a zero checksum field are
illegal (cf. RFC 3828, sec. 3.1) and will be silently discarded.
4) Fragmentation
The checksum computation respects both buffer size and MTU. The size
of UDP-Lite packets is determined by the size of the send buffer. The
minimum size of the send buffer is 2048 (defined as SOCK_MIN_SNDBUF
in include/net/sock.h); the default value is configurable via
net.core.wmem_default, and the per-socket value via the SO_SNDBUF
socket(7) option. The upper bound for the send buffer is determined
by net.core.wmem_max.
Given a payload size larger than the send buffer size, UDP-Lite will
split the payload into several individual packets, filling up the
send buffer size in each case.
The precise value also depends on the interface MTU. The interface MTU,
in turn, may trigger IP fragmentation. In this case, the generated
UDP-Lite packet is split into several IP packets, of which only the
first one contains the L4 header.
The send buffer size has implications on the checksum coverage length.
Consider the following example:
Payload: 1536 bytes Send Buffer: 1024 bytes
MTU: 1500 bytes Coverage Length: 856 bytes
UDP-Lite will ship the 1536 bytes in two separate packets:
Packet 1: 1024 payload + 8 byte header + 20 byte IP header = 1052 bytes
Packet 2: 512 payload + 8 byte header + 20 byte IP header = 540 bytes
The checksum coverage includes the UDP-Lite header and 848 bytes of the
payload in the first packet; the second packet is fully covered. Note
that for the second packet, the coverage length exceeds the packet
length. The kernel always re-adjusts the coverage length to the packet
length in such cases.
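The arithmetic of this example can be reproduced with a short helper. This is a sketch under the assumptions of the example (IPv4 header without options, payload split at send-buffer boundaries); describe_packet() and struct pkt_info are hypothetical names introduced for illustration.

```c
#include <stddef.h>

#define UDPLITE_HDR_LEN 8   /* UDP-Lite header length */
#define IPV4_HDR_LEN    20  /* IPv4 header without options, as in the example */

struct pkt_info {
    size_t payload;  /* payload bytes carried by this packet */
    size_t ip_len;   /* payload + UDP-Lite header + IPv4 header */
    size_t covered;  /* checksummed bytes, counted from the UDP-Lite header */
};

/*
 * Hypothetical helper (illustration only): describe packet number idx
 * (0-based) when `payload' bytes are sent through a send buffer of
 * `sndbuf' bytes with requested coverage `cov'.
 * Returns 0 on success, -1 if idx lies past the last packet.
 */
static int describe_packet(size_t payload, size_t sndbuf, size_t cov,
                           size_t idx, struct pkt_info *out)
{
    size_t offset = idx * sndbuf;
    size_t chunk, l4_len;

    if (sndbuf == 0 || offset >= payload)
        return -1;

    chunk = payload - offset;
    if (chunk > sndbuf)
        chunk = sndbuf;
    l4_len = chunk + UDPLITE_HDR_LEN;

    if (cov != 0 && cov < UDPLITE_HDR_LEN)
        cov = UDPLITE_HDR_LEN;        /* nonzero values below 8 become 8 */

    out->payload = chunk;
    out->ip_len  = l4_len + IPV4_HDR_LEN;
    /* 0 means full coverage; larger values are cut to the packet length. */
    out->covered = (cov == 0 || cov > l4_len) ? l4_len : cov;
    return 0;
}
```

For payload 1536, send buffer 1024 and coverage 856, packet 0 comes out as 1024 payload bytes, 1052 bytes on the IP level and 856 covered bytes; packet 1 as 512, 540 and 520 (fully covered).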
The following example shows what happens when one UDP-Lite packet is
split into several tiny fragments.
Payload: 1024 bytes Send buffer size: 1024 bytes
MTU: 300 bytes Coverage length: 575 bytes
+-+-----------+--------------+--------------+--------------+
|8|    272    |      280     |     280      |     192      |
+-+-----------+--------------+--------------+--------------+
             280            560            840          1032
                                  ^
*****checksum coverage*************
The UDP-Lite module generates one 1032 byte packet (1024 + 8 byte
header). According to the interface MTU, this is split into 4 IP
packets (up to 280 bytes of IP payload + 20 byte IP header each; the
last one carries the remaining 192 bytes). The kernel module sums the
contents of the entire first two packets, plus 15 bytes of the third
packet, before releasing the fragments to the IP module.
To see the analogous case for IPv6 fragmentation, consider a link
MTU of 1280 bytes and a write buffer of 3356 bytes. If the checksum
coverage is less than 1232 bytes (MTU minus IPv6/fragment header
lengths), only the first fragment needs to be considered. When using
larger checksum coverage lengths, each eligible fragment needs to be
checksummed. Suppose we have a checksum coverage of 3062. The buffer
of 3356 bytes will be split into the following fragments:
Fragment 1: 1280 bytes carrying 1232 bytes of UDP-Lite data
Fragment 2: 1280 bytes carrying 1232 bytes of UDP-Lite data
Fragment 3: 948 bytes carrying 900 bytes of UDP-Lite data
The first two fragments have to be checksummed in full, of the last
fragment only 598 (= 3062 - 2*1232) bytes are checksummed.
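Both fragmentation examples follow the same rule: the coverage is counted from the start of the UDP-Lite header and consumed fragment by fragment. The sketch below expresses that rule as a pure function; frag_covered_bytes() is a hypothetical name, and frag_cap stands for the number of UDP-Lite bytes one fragment can carry (280 in the IPv4 example, 1232 in the IPv6 example).

```c
#include <stddef.h>

/*
 * Hypothetical helper (illustration only): number of bytes of fragment
 * idx (0-based) that fall inside the checksum coverage.
 * l4_len:   UDP-Lite packet length (header + payload)
 * frag_cap: UDP-Lite bytes carried by one full fragment
 * cov:      coverage length from the start of the header (0 = full)
 */
static size_t frag_covered_bytes(size_t l4_len, size_t frag_cap,
                                 size_t cov, size_t idx)
{
    size_t start = idx * frag_cap;
    size_t end;

    if (frag_cap == 0 || start >= l4_len)
        return 0;                     /* no such fragment */

    end = start + frag_cap;
    if (end > l4_len)
        end = l4_len;                 /* last fragment may be shorter */

    if (cov == 0 || cov > l4_len)
        cov = l4_len;                 /* full coverage */
    if (cov <= start)
        return 0;                     /* coverage ends before this fragment */

    return (cov < end ? cov : end) - start;
}
```

With l4_len 1032, frag_cap 280 and cov 575, fragments 0 to 3 yield 280, 280, 15 and 0 bytes. With l4_len 3364 (3356 + 8), frag_cap 1232 and cov 3062, fragments 0 to 2 yield 1232, 1232 and 598 bytes.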
While it is important that such cases are dealt with correctly, they
are (annoyingly) rare: UDP-Lite is designed for optimising multimedia
performance over wireless (or generally noisy) links and thus smaller
coverage lengths are likely to be expected.
V) UDP-LITE RUNTIME STATISTICS AND THEIR MEANING
Exceptional and error conditions are logged to syslog at the KERN_DEBUG
level. Live statistics about UDP-Lite are available in /proc/net/snmp
and can (with newer versions of netstat) be viewed using
netstat -svu
This displays UDP-Lite statistics variables, whose meaning is as follows.
InDatagrams:  Total number of received datagrams.
NoPorts:      Number of packets received for an unknown port.
              These cases are counted separately (not as InErrors).
InErrors:     Number of erroneous UDP-Lite packets. Errors include:
              * internal socket queue receive errors
              * packet too short (less than 8 bytes or stated
                coverage length exceeds received length)
              * xfrm4_policy_check() returned with error
              * application has specified larger min. coverage
                length than that of incoming packet
              * checksum coverage violated
              * bad checksum
OutDatagrams: Total number of sent datagrams.
These statistics derive from the UDP MIB (RFC 2013).
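For programs that want to read these counters directly, the sketch below looks up one counter by name. It rests on assumptions not spelled out in this document: that /proc/net/snmp reports each protocol as a pair of lines (a line of counter names followed by a line of values) and that both lines carry the tag "UdpLite:". The function name snmp_counter() is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): `names' and `values' are the
 * two lines for one protocol, both assumed to start with the same tag
 * (e.g. "UdpLite:"). Stores the counter called `key' in *out.
 * Returns 0 on success, -1 if the lines do not match or key is absent.
 */
static int snmp_counter(const char *names, const char *values,
                        const char *key, unsigned long *out)
{
    char name[64], tag_n[64], tag_v[64];
    int used_n, used_v;
    unsigned long val;

    /* Both lines must start with the same protocol tag. */
    if (sscanf(names, "%63s%n", tag_n, &used_n) != 1 ||
        sscanf(values, "%63s%n", tag_v, &used_v) != 1 ||
        strcmp(tag_n, tag_v) != 0)
        return -1;
    names += used_n;
    values += used_v;

    /* Walk both lines in step: the i-th name labels the i-th value. */
    while (sscanf(names, "%63s%n", name, &used_n) == 1 &&
           sscanf(values, "%lu%n", &val, &used_v) == 1) {
        if (strcmp(name, key) == 0) {
            *out = val;
            return 0;
        }
        names += used_n;
        values += used_v;
    }
    return -1;
}
```

A caller would read the two matching lines from /proc/net/snmp (for instance with fgets) and pass them in, e.g. snmp_counter(names_line, values_line, "InErrors", &n).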
VI) IPTABLES
There is packet match support for UDP-Lite as well as support for the LOG target.
If you copy and paste the following line into /etc/protocols,
udplite 136 UDP-Lite # UDP-Lite [RFC 3828]
then
iptables -A INPUT -p udplite -j LOG
will produce logging output to syslog. Dropping and rejecting packets also works.
VII) MAINTAINER ADDRESS
The UDP-Lite patch was developed at
University of Aberdeen
Electronics Research Group
Department of Engineering
Fraser Noble Building
Aberdeen AB24 3UE; UK
The current maintainer is Gerrit Renker, <gerrit@erg.abdn.ac.uk>. Initial
code was developed by William Stanislaus, <william@erg.abdn.ac.uk>.
......@@ -45,6 +45,7 @@ enum {
IPPROTO_COMP = 108, /* Compression Header protocol */
IPPROTO_SCTP = 132, /* Stream Control Transport Protocol */
IPPROTO_UDPLITE = 136, /* UDP-Lite (RFC 3828) */
IPPROTO_RAW = 255, /* Raw IP packets */
IPPROTO_MAX
......
......@@ -264,6 +264,7 @@ struct ucred {
#define SOL_IPV6 41
#define SOL_ICMPV6 58
#define SOL_SCTP 132
#define SOL_UDPLITE 136 /* UDP-Lite (RFC 3828) */
#define SOL_RAW 255
#define SOL_IPX 256
#define SOL_AX25 257
......
......@@ -38,6 +38,7 @@ struct udphdr {
#include <linux/types.h>
#include <net/inet_sock.h>
#define UDP_HTABLE_SIZE 128
struct udp_sock {
/* inet_sock has to be the first member */
......@@ -50,12 +51,23 @@ struct udp_sock {
* when the socket is uncorked.
*/
__u16 len; /* total length of pending frames */
/*
* Fields specific to UDP-Lite.
*/
__u16 pcslen;
__u16 pcrlen;
/* indicator bits used by pcflag: */
#define UDPLITE_BIT 0x1 /* set by udplite proto init function */
#define UDPLITE_SEND_CC 0x2 /* set via udplite setsockopt */
#define UDPLITE_RECV_CC 0x4 /* set via udplite setsockopt */
__u8 pcflag; /* marks socket as UDP-Lite if > 0 */
};
static inline struct udp_sock *udp_sk(const struct sock *sk)
{
return (struct udp_sock *)sk;
}
#define IS_UDPLITE(__sk) (udp_sk(__sk)->pcflag)
#endif
......
......@@ -158,9 +158,13 @@ DECLARE_SNMP_STAT(struct icmpv6_mib, icmpv6_statistics);
SNMP_INC_STATS_OFFSET_BH(icmpv6_statistics, field, _offset); \
})
DECLARE_SNMP_STAT(struct udp_mib, udp_stats_in6);
#define UDP6_INC_STATS(field) SNMP_INC_STATS(udp_stats_in6, field)
#define UDP6_INC_STATS_BH(field) SNMP_INC_STATS_BH(udp_stats_in6, field)
#define UDP6_INC_STATS_USER(field) SNMP_INC_STATS_USER(udp_stats_in6, field)
DECLARE_SNMP_STAT(struct udp_mib, udplite_stats_in6);
#define UDP6_INC_STATS_BH(field, is_udplite) do { \
if (is_udplite) SNMP_INC_STATS_BH(udplite_stats_in6, field); \
else SNMP_INC_STATS_BH(udp_stats_in6, field); } while(0)
#define UDP6_INC_STATS_USER(field, is_udplite) do { \
if (is_udplite) SNMP_INC_STATS_USER(udplite_stats_in6, field); \
else SNMP_INC_STATS_USER(udp_stats_in6, field); } while(0)
int snmp6_register_dev(struct inet6_dev *idev);
int snmp6_unregister_dev(struct inet6_dev *idev);
......@@ -604,6 +608,8 @@ extern int tcp6_proc_init(void);
extern void tcp6_proc_exit(void);
extern int udp6_proc_init(void);
extern void udp6_proc_exit(void);
extern int udplite6_proc_init(void);
extern void udplite6_proc_exit(void);
extern int ipv6_misc_proc_init(void);
extern void ipv6_misc_proc_exit(void);
......
......@@ -11,6 +11,7 @@
extern struct proto rawv6_prot;
extern struct proto udpv6_prot;
extern struct proto udplitev6_prot;
extern struct proto tcpv6_prot;
struct flowi;
......@@ -24,6 +25,7 @@ extern void ipv6_destopt_init(void);
/* transport protocols */
extern void rawv6_init(void);
extern void udpv6_init(void);
extern void udplitev6_init(void);
extern void tcpv6_init(void);
extern int udpv6_connect(struct sock *sk,
......
......@@ -26,9 +26,28 @@
#include <net/inet_sock.h>
#include <net/sock.h>
#include <net/snmp.h>
#include <net/ip.h>
#include <linux/ipv6.h>
#include <linux/seq_file.h>
#define UDP_HTABLE_SIZE 128
/**
* struct udp_skb_cb - UDP(-Lite) private variables
*
* @header: private variables used by IPv4/IPv6
* @cscov: checksum coverage length (UDP-Lite only)
* @partial_cov: if set indicates partial csum coverage
*/
struct udp_skb_cb {
union {
struct inet_skb_parm h4;
#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
struct inet6_skb_parm h6;
#endif
} header;
__u16 cscov;
__u8 partial_cov;
};
#define UDP_SKB_CB(__skb) ((struct udp_skb_cb *)((__skb)->cb))
extern struct hlist_head udp_hash[UDP_HTABLE_SIZE];
extern rwlock_t udp_hash_lock;
......@@ -47,6 +66,62 @@ extern struct proto udp_prot;
struct sk_buff;
/*
* Generic checksumming routines for UDP(-Lite) v4 and v6
*/
static inline u16 __udp_lib_checksum_complete(struct sk_buff *skb)
{
if (! UDP_SKB_CB(skb)->partial_cov)
return __skb_checksum_complete(skb);
return csum_fold(skb_checksum(skb, 0, UDP_SKB_CB(skb)->cscov,
skb->csum));
}
static __inline__ int udp_lib_checksum_complete(struct sk_buff *skb)
{
return skb->ip_summed != CHECKSUM_UNNECESSARY &&
__udp_lib_checksum_complete(skb);
}
/**
* udp_csum_outgoing - compute UDPv4/v6 checksum over fragments
* @sk: socket we are writing to
* @skb: sk_buff containing the filled-in UDP header
* (checksum field must be zeroed out)
*/
static inline u32 udp_csum_outgoing(struct sock *sk, struct sk_buff *skb)
{
u32 csum = csum_partial(skb->h.raw, sizeof(struct udphdr), 0);
skb_queue_walk(&sk->sk_write_queue, skb) {
csum = csum_add(csum, skb->csum);
}
return csum;
}
/* hash routines shared between UDPv4/6 and UDP-Litev4/6 */
static inline void udp_lib_hash(struct sock *sk)
{
BUG();
}
static inline void udp_lib_unhash(struct sock *sk)
{
write_lock_bh(&udp_hash_lock);
if (sk_del_node_init(sk)) {
inet_sk(sk)->num = 0;
sock_prot_dec_use(sk->sk_prot);
}
write_unlock_bh(&udp_hash_lock);
}
static inline void udp_lib_close(struct sock *sk, long timeout)
{
sk_common_release(sk);
}
/* net/ipv4/udp.c */
extern int udp_get_port(struct sock *sk, unsigned short snum,
int (*saddr_cmp)(const struct sock *, const struct sock *));
extern void udp_err(struct sk_buff *, u32);
......@@ -61,21 +136,29 @@ extern unsigned int udp_poll(struct file *file, struct socket *sock,
poll_table *wait);
DECLARE_SNMP_STAT(struct udp_mib, udp_statistics);
#define UDP_INC_STATS(field) SNMP_INC_STATS(udp_statistics, field)
#define UDP_INC_STATS_BH(field) SNMP_INC_STATS_BH(udp_statistics, field)
#define UDP_INC_STATS_USER(field) SNMP_INC_STATS_USER(udp_statistics, field)
/*
* SNMP statistics for UDP and UDP-Lite
*/
#define UDP_INC_STATS_USER(field, is_udplite) do { \
if (is_udplite) SNMP_INC_STATS_USER(udplite_statistics, field); \
else SNMP_INC_STATS_USER(udp_statistics, field); } while(0)
#define UDP_INC_STATS_BH(field, is_udplite) do { \
if (is_udplite) SNMP_INC_STATS_BH(udplite_statistics, field); \
else SNMP_INC_STATS_BH(udp_statistics, field); } while(0)
/* /proc */
struct udp_seq_afinfo {
struct module *owner;
char *name;
sa_family_t family;
struct hlist_head *hashtable;
int (*seq_show) (struct seq_file *m, void *v);
struct file_operations *seq_fops;
};
struct udp_iter_state {
sa_family_t family;
struct hlist_head *hashtable;
int bucket;
struct seq_operations seq_ops;
};
......
/*
* Definitions for the UDP-Lite (RFC 3828) code.
*/
#ifndef _UDPLITE_H
#define _UDPLITE_H
/* UDP-Lite socket options */
#define UDPLITE_SEND_CSCOV 10 /* sender partial coverage (as sent) */
#define UDPLITE_RECV_CSCOV 11 /* receiver partial coverage (threshold ) */
extern struct proto udplite_prot;
extern struct hlist_head udplite_hash[UDP_HTABLE_SIZE];
/* UDP-Lite does not have a standardized MIB yet, so we inherit from UDP */
DECLARE_SNMP_STAT(struct udp_mib, udplite_statistics);
/*
* Checksum computation is all in software, hence simpler getfrag.
*/
static __inline__ int udplite_getfrag(void *from, char *to, int offset,
int len, int odd, struct sk_buff *skb)
{
return memcpy_fromiovecend(to, (struct iovec *) from, offset, len);
}
/* Designate sk as UDP-Lite socket */
static inline int udplite_sk_init(struct sock *sk)
{
udp_sk(sk)->pcflag = UDPLITE_BIT;
return 0;
}
/*
* Checksumming routines
*/
static inline int udplite_checksum_init(struct sk_buff *skb, struct udphdr *uh)
{
u16 cscov;
/* In UDPv4 a zero checksum means that the transmitter generated no
* checksum. UDP-Lite (like IPv6) mandates checksums, hence packets
* with a zero checksum field are illegal. */
if (uh->check == 0) {
LIMIT_NETDEBUG(KERN_DEBUG "UDPLITE: zeroed checksum field\n");
return 1;
}
UDP_SKB_CB(skb)->partial_cov = 0;
cscov = ntohs(uh->len);
if (cscov == 0) /* Indicates that full coverage is required. */
cscov = skb->len;
else if (cscov < 8 || cscov > skb->len) {
/*
* Coverage length violates RFC 3828: log and discard silently.
*/
LIMIT_NETDEBUG(KERN_DEBUG "UDPLITE: bad csum coverage %d/%d\n",
cscov, skb->len);
return 1;
} else if (cscov < skb->len)
UDP_SKB_CB(skb)->partial_cov = 1;
UDP_SKB_CB(skb)->cscov = cscov;
/*
* There is no known NIC manufacturer supporting UDP-Lite yet,
* hence ip_summed is always (re-)set to CHECKSUM_NONE.
*/
skb->ip_summed = CHECKSUM_NONE;
return 0;
}
static __inline__ int udplite4_csum_init(struct sk_buff *skb, struct udphdr *uh)
{
int rc = udplite_checksum_init(skb, uh);
if (!rc)
skb->csum = csum_tcpudp_nofold(skb->nh.iph->saddr,
skb->nh.iph->daddr,
skb->len, IPPROTO_UDPLITE, 0);
return rc;
}
static __inline__ int udplite6_csum_init(struct sk_buff *skb, struct udphdr *uh)
{
int rc = udplite_checksum_init(skb, uh);
if (!rc)
skb->csum = ~csum_ipv6_magic(&skb->nh.ipv6h->saddr,
&skb->nh.ipv6h->daddr,
skb->len, IPPROTO_UDPLITE, 0);
return rc;
}
static inline int udplite_sender_cscov(struct udp_sock *up, struct udphdr *uh)
{
int cscov = up->len;
/*
* Sender has set `partial coverage' option on UDP-Lite socket
*/
if (up->pcflag & UDPLITE_SEND_CC) {
if (up->pcslen < up->len) {
/* up->pcslen == 0 means that full coverage is required,
* partial coverage only if 0 < up->pcslen < up->len */
if (0 < up->pcslen) {
cscov = up->pcslen;
}
uh->len = htons(up->pcslen);
}
/*
* NOTE: Causes for the error case `up->pcslen > up->len':
* (i) Application error (will not be penalized).
* (ii) Payload too big for send buffer: data is split
* into several packets, each with its own header.
* In this case (e.g. last segment), coverage may
* exceed packet length.
* Since packets with coverage length > packet length are
* illegal, we fall back to the defaults here.
*/
}
return cscov;
}
static inline u32 udplite_csum_outgoing(struct sock *