1. 30 Nov, 2012 1 commit
    • Eric Dumazet's avatar
      net: move inet_dport/inet_num in sock_common · ce43b03e
      Eric Dumazet authored
      commit 68835aba (net: optimize INET input path further)
      moved some fields used for tcp/udp sockets lookup in the first cache
      line of struct sock_common.
      
      This patch moves inet_dport/inet_num as well, filling a 32bit hole
      on 64 bit arches and reducing number of cache line misses in lookups.
      
      Also change INET_MATCH()/INET_TW_MATCH() to perform the ports match
      before addresses match, as this check is more discriminant.
      
      Remove the hash check from MATCH() macros because we dont need to
      re validate the hash value after taking a refcount on socket, and
      use likely/unlikely compiler hints, as the sk_hash/hash check
      makes the following conditional tests 100% predicted by cpu.
      
      Introduce skc_addrpair/skc_portpair pair values to better
      document the alignment requirements of the port/addr pairs
      used in the various MATCH() macros, and remove some casts.
      
      The namespace check can also be done at last.
      
      This slightly improves TCP/UDP lookup times.
      
      IP/TCP early demux needs inet->rx_dst_ifindex and
      TCP needs inet->min_ttl, lets group them together in same cache line.
      
      With help from Ben Hutchings & Joe Perches.
      
      Idea of this patch came after Ling Ma proposal to move skc_hash
      to the beginning of struct sock_common, and should allow him
      to submit a final version of his patch. My tests show an improvement
      doing so.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Ling Ma <ling.ma.program@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ce43b03e
  2. 13 Nov, 2012 1 commit
  3. 13 Oct, 2012 1 commit
  4. 29 Aug, 2012 1 commit
    • Patrick McHardy's avatar
      netfilter: nf_conntrack_ipv6: improve fragmentation handling · 4cdd3408
      Patrick McHardy authored
      The IPv6 conntrack fragmentation currently has a couple of shortcomings.
      Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
      defragmented packet is then passed to conntrack, the resulting conntrack
      information is attached to each original fragment and the fragments then
      continue their way through the stack.
      
      Helper invocation occurs in the POSTROUTING hook, at which point only
      the original fragments are available. The result of this is that
      fragmented packets are never passed to helpers.
      
      This patch improves the situation in the following way:
      
      - If a reassembled packet belongs to a connection that has a helper
        assigned, the reassembled packet is passed through the stack instead
        of the original fragments.
      
      - During defragmentation, the largest received fragment size is stored.
        On output, the packet is refragmented if required. If the largest
        received fragment size exceeds the outgoing MTU, a "packet too big"
        message is generated, thus behaving as if the original fragments
        were passed through the stack from an outside point of view.
      
      - The ipv6_helper() hook function can't receive fragments anymore for
        connections using a helper, so it is switched to use ipv6_skip_exthdr()
        instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
        reassembled packets are passed to connection tracking helpers.
      
      The result of this is that we can properly track fragmented packets, but
      still generate ICMPv6 Packet too big messages if we would have before.
      
      This patch is also required as a precondition for IPv6 NAT, where NAT
      helpers might enlarge packets up to a point that they require
      fragmentation. In that case we can't generate Packet too big messages
      since the proper MTU can't be calculated in all cases (f.i. when
      changing textual representation of a variable amount of addresses),
      so the packet is transparently fragmented iff the original packet or
      fragments would have fit the outgoing MTU.
      
      IPVS parts by Jesper Dangaard Brouer <brouer@redhat.com>.
      Signed-off-by: default avatarPatrick McHardy <kaber@trash.net>
      4cdd3408
  5. 06 Aug, 2012 1 commit
  6. 18 Jul, 2012 1 commit
  7. 11 Jul, 2012 1 commit
  8. 12 Feb, 2012 2 commits
  9. 08 Feb, 2012 1 commit
  10. 11 Dec, 2011 1 commit
  11. 24 Nov, 2010 1 commit
  12. 21 Oct, 2010 1 commit
  13. 22 Aug, 2010 1 commit
  14. 19 Jul, 2010 1 commit
    • David S. Miller's avatar
      ipv6: Make IP6CB(skb)->nhoff 16-bit. · e7c38157
      David S. Miller authored
      Even with jumbograms I cannot see any way in which we would need
      to records a larger than 65535 valued next-header offset.
      
      The maximum extension header length is (256 << 3) == 2048.
      There are only a handful of extension headers specified which
      we'd even accept (say 5 or 6), therefore the largest next-header
      offset we'd ever have to contend with is something less than
      say 16k.
      
      Therefore make it a u16 instead of a u32.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e7c38157
  15. 03 Jun, 2010 1 commit
  16. 11 May, 2010 1 commit
    • Patrick McHardy's avatar
      ipv6: ip6mr: support multiple tables · d1db275d
      Patrick McHardy authored
      This patch adds support for multiple independant multicast routing instances,
      named "tables".
      
      Userspace multicast routing daemons can bind to a specific table instance by
      issuing a setsockopt call using a new option MRT6_TABLE. The table number is
      stored in the raw socket data and affects all following ip6mr setsockopt(),
      getsockopt() and ioctl() calls. By default, a single table (RT6_TABLE_DFLT)
      is created with a default routing rule pointing to it. Newly created pim6reg
      devices have the table number appended ("pim6regX"), with the exception of
      devices created in the default table, which are named just "pim6reg" for
      compatibility reasons.
      
      Packets are directed to a specific table instance using routing rules,
      similar to how regular routing rules work. Currently iif, oif and mark
      are supported as keys, source and destination addresses could be supported
      additionally.
      
      Example usage:
      
      - bind pimd/xorp/... to a specific table:
      
      uint32_t table = 123;
      setsockopt(fd, SOL_IPV6, MRT6_TABLE, &table, sizeof(table));
      
      - create routing rules directing packets to the new table:
      
      # ip -6 mrule add iif eth0 lookup 123
      # ip -6 mrule add oif eth0 lookup 123
      Signed-off-by: default avatarPatrick McHardy <kaber@trash.net>
      d1db275d
  17. 24 Apr, 2010 2 commits
  18. 22 Apr, 2010 1 commit
    • Stephen Hemminger's avatar
      IPv6: Generic TTL Security Mechanism (final version) · e802af9c
      Stephen Hemminger authored
      This patch adds IPv6 support for RFC5082 Generalized TTL Security Mechanism.  
      
      Not to users of mapped address; the IPV6 and IPV4 socket options are seperate.
      The server does have to deal with both IPv4 and IPv6 socket options
      and the client has to handle the different for each family.
      
      On client:
      	int ttl = 255;
      	getaddrinfo(argv[1], argv[2], &hint, &result);
      
      	for (rp = result; rp != NULL; rp = rp->ai_next) {
      		s = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
      		if (s < 0) continue;
      
      		if (rp->ai_family == AF_INET) {
      			setsockopt(s, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl));
      		} else if (rp->ai_family == AF_INET6) {
      			setsockopt(s, IPPROTO_IPV6,  IPV6_UNICAST_HOPS, 
      					&ttl, sizeof(ttl)))
      		}
      			
      		if (connect(s, rp->ai_addr, rp->ai_addrlen) == 0) {
      		   ...
      
      On server:
      	int minttl = 255 - maxhops;
         
      	getaddrinfo(NULL, port, &hints, &result);
      	for (rp = result; rp != NULL; rp = rp->ai_next) {
      		s = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
      		if (s < 0) continue;
      
      		if (rp->ai_family == AF_INET6)
      			setsockopt(s, IPPROTO_IPV6,  IPV6_MINHOPCOUNT,
      					&minttl, sizeof(minttl));
      		setsockopt(s, IPPROTO_IP, IP_MINTTL, &minttl, sizeof(minttl));
      			
      		if (bind(s, rp->ai_addr, rp->ai_addrlen) == 0)
      			break
      ...
      Signed-off-by: default avatarStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e802af9c
  19. 13 Apr, 2010 1 commit
  20. 18 Oct, 2009 1 commit
    • Eric Dumazet's avatar
      inet: rename some inet_sock fields · c720c7e8
      Eric Dumazet authored
      In order to have better cache layouts of struct sock (separate zones
      for rx/tx paths), we need this preliminary patch.
      
      Goal is to transfert fields used at lookup time in the first
      read-mostly cache line (inside struct sock_common) and move sk_refcnt
      to a separate cache line (only written by rx path)
      
      This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
      sport and id fields. This allows a future patch to define these
      fields as macros, like sk_refcnt, without name clashes.
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c720c7e8
  21. 08 Oct, 2009 1 commit
    • Jin Dongming's avatar
      ipv6: Fix the size overflow of addrconf_sysctl array · e0e6f55d
      Jin Dongming authored
      (This patch fixes bug of commit f7734fdf
       title "make TLLAO option for NA packets configurable")
      
      When the IPV6 conf is used, the function sysctl_set_parent is called and the
      array addrconf_sysctl is used as a parameter of the function.
      
      The above patch added new conf "force_tllao" into the array addrconf_sysctl,
      but the size of the array was not modified, the static allocated size is
      DEVCONF_MAX + 1 but the real size is DEVCONF_MAX + 2, so the problem is
      that the function sysctl_set_parent accessed wrong address.
      
      I got the following information.
      Call Trace:
          [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
          [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
          [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
          [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
          [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
          [<ffffffff810622d5>] __register_sysctl_paths+0xde/0x272
          [<ffffffff8110892d>] ? __kmalloc_track_caller+0x16e/0x180
          [<ffffffffa00cfac3>] ? __addrconf_sysctl_register+0xc5/0x144 [ipv6]
          [<ffffffff8141f2c9>] register_net_sysctl_table+0x48/0x4b
          [<ffffffffa00cfaf5>] __addrconf_sysctl_register+0xf7/0x144 [ipv6]
          [<ffffffffa00cfc16>] addrconf_init_net+0xd4/0x104 [ipv6]
          [<ffffffff8139195f>] setup_net+0x35/0x82
          [<ffffffff81391f6c>] copy_net_ns+0x76/0xe0
          [<ffffffff8107ad60>] create_new_namespaces+0xf0/0x16e
          [<ffffffff8107afee>] copy_namespaces+0x65/0x9f
          [<ffffffff81056dff>] copy_process+0xb2c/0x12c3
          [<ffffffff810576e1>] do_fork+0x14b/0x2d2
          [<ffffffff8107ac4e>] ? up_read+0xe/0x10
          [<ffffffff81438e73>] ? do_page_fault+0x27a/0x2aa
          [<ffffffff8101044b>] sys_clone+0x28/0x2a
          [<ffffffff81011fb3>] stub_clone+0x13/0x20
          [<ffffffff81011c72>] ? system_call_fastpath+0x16/0x1b
      
      And the information of IPV6 in .config is as following.
      IPV6 in .config:
          CONFIG_IPV6=m
          CONFIG_IPV6_PRIVACY=y
          CONFIG_IPV6_ROUTER_PREF=y
          CONFIG_IPV6_ROUTE_INFO=y
          CONFIG_IPV6_OPTIMISTIC_DAD=y
          CONFIG_IPV6_MIP6=m
          CONFIG_IPV6_SIT=m
          # CONFIG_IPV6_SIT_6RD is not set
          CONFIG_IPV6_NDISC_NODETYPE=y
          CONFIG_IPV6_TUNNEL=m
          CONFIG_IPV6_MULTIPLE_TABLES=y
          CONFIG_IPV6_SUBTREES=y
          CONFIG_IPV6_MROUTE=y
          CONFIG_IPV6_PIMSM_V2=y
          # CONFIG_IP_VS_IPV6 is not set
          CONFIG_NF_CONNTRACK_IPV6=m
          CONFIG_IP6_NF_MATCH_IPV6HEADER=m
      
      I confirmed this patch fixes this problem.
      Signed-off-by: default avatarJin Dongming <jin.dongming@np.css.fujitsu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e0e6f55d
  22. 07 Oct, 2009 1 commit
    • Octavian Purdila's avatar
      make TLLAO option for NA packets configurable · f7734fdf
      Octavian Purdila authored
      On Friday 02 October 2009 20:53:51 you wrote:
      
      > This is good although I would have shortened the name.
      
      Ah, I knew I forgot something :) Here is v4.
      
      tavi
      
      >From 24d96d825b9fa832b22878cc6c990d5711968734 Mon Sep 17 00:00:00 2001
      From: Octavian Purdila <opurdila@ixiacom.com>
      Date: Fri, 2 Oct 2009 00:51:15 +0300
      Subject: [PATCH] ipv6: new sysctl for sending TLLAO with unicast NAs
      
      Neighbor advertisements responding to unicast neighbor solicitations
      did not include the target link-layer address option. This patch adds
      a new sysctl option (disabled by default) which controls whether this
      option should be sent even with unicast NAs.
      
      The need for this arose because certain routers expect the TLLAO in
      some situations even as a response to unicast NS packets.
      
      Moreover, RFC 2461 recommends sending this to avoid a race condition
      (section 4.4, Target link-layer address)
      Signed-off-by: default avatarCosmin Ratiu <cratiu@ixiacom.com>
      Signed-off-by: default avatarOctavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f7734fdf
  23. 01 Jun, 2009 1 commit
  24. 30 Jan, 2009 1 commit
  25. 16 Dec, 2008 1 commit
    • Yang Hongyang's avatar
      ipv6: Add IPV6_PKTINFO sticky option support to setsockopt() · b24a2516
      Yang Hongyang authored
      There are three reasons for me to add this support:
      1.When no interface is specified in an IPV6_PKTINFO ancillary data
        item, the interface specified in an IPV6_PKTINFO sticky optionis 
        is used.
      
      RFC3542:
      6.7.  Summary of Outgoing Interface Selection
      
         This document and [RFC-3493] specify various methods that affect the
         selection of the packet's outgoing interface.  This subsection
         summarizes the ordering among those in order to ensure deterministic
         behavior.
      
         For a given outgoing packet on a given socket, the outgoing interface
         is determined in the following order:
      
         1. if an interface is specified in an IPV6_PKTINFO ancillary data
            item, the interface is used.
      
         2. otherwise, if an interface is specified in an IPV6_PKTINFO sticky
            option, the interface is used.
      
      2.When no IPV6_PKTINFO ancillary data is received,getsockopt() should 
        return the sticky option value which set with setsockopt().
      
      RFC 3542:
         Issuing getsockopt() for the above options will return the sticky
         option value i.e., the value set with setsockopt().  If no sticky
         option value has been set getsockopt() will return the following
         values:
      
      3.Make the setsockopt implementation POSIX compliant.
      Signed-off-by: default avatarYang Hongyang <yanghy@cn.fujitsu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b24a2516
  26. 21 Jul, 2008 1 commit
  27. 03 Jul, 2008 2 commits
  28. 10 Jun, 2008 1 commit
  29. 14 Apr, 2008 1 commit
  30. 05 Apr, 2008 1 commit
  31. 25 Mar, 2008 1 commit
  32. 24 Mar, 2008 4 commits
  33. 31 Jan, 2008 2 commits