1. 13 Sep, 2012 1 commit
  2. 10 Sep, 2012 2 commits
  3. 07 Sep, 2012 1 commit
  4. 05 Sep, 2012 2 commits
  5. 31 Aug, 2012 2 commits
    • tcp: TCP Fast Open Server - support TFO listeners · 8336886f
      Jerry Chu authored
      
      
      This patch builds on top of the previous patch to add the support
      for TFO listeners. This includes -
      
      1. allocating, properly initializing, and managing the per listener
      fastopen_queue structure when TFO is enabled
      
      2. changes to the inet_csk_accept code to support TFO. E.g., the
      request_sock can no longer be freed upon accept(), not until 3WHS
      finishes
      
      3. allowing a TCP_SYN_RECV socket to properly poll() and sendmsg()
      if it's a TFO socket
      
      4. properly closing a TFO listener, and a TFO socket before 3WHS
      finishes
      
      5. supporting TCP_FASTOPEN socket option
      
      6. modifying tcp_check_req() to check a TFO socket as well
      as request_sock
      
      7. supporting TCP's TFO cookie option
      
      8. adding a new SYN-ACK retransmit handler to use the timer directly
      off the TFO socket rather than the listener socket. Note that TFO
      server side will not retransmit anything other than SYN-ACK until
      the 3WHS is completed.
      
      The patch also contains an important function
      "reqsk_fastopen_remove()" to manage the somewhat complex relation
      between a listener, its request_sock, and the corresponding child
      socket. See the comment above the function for the detail.
      Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
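From the application side, item 5 above (the TCP_FASTOPEN socket option) can be sketched roughly as follows. This is a minimal userspace illustration, not code from the patch: the helper name `demo_tfo_listener`, the loopback address, the ephemeral port, and the backlog of 16 are assumptions, and the fallback `#define` only applies when the libc header does not provide the constant. The option is set after listen() because it applies to a listening socket.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_FASTOPEN
#define TCP_FASTOPEN 23   /* Linux value; assumed when the libc header lacks it */
#endif

/* Hypothetical helper: create a loopback listener on an ephemeral port and
 * enable TFO on it. Returns the listening fd, or -1 on any failure. */
static int demo_tfo_listener(int qlen)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the kernel pick a port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0 ||
        /* qlen bounds the number of pending TFO requests on this listener */
        setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would use `demo_tfo_listener(16)` and then accept() as usual. Whether data in the SYN is actually accepted also depends on system-wide TFO settings, which this sketch does not touch.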
    • ipv6: remove some deadcode · eb7e0575
      Sorin Dumitru authored
      
      
      __ipv6_regen_rndid no longer returns anything other than 0
      so there's no point in verifying what it returns
      Signed-off-by: Sorin Dumitru <sdumitru@ixiacom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 29 Aug, 2012 8 commits
  7. 23 Aug, 2012 1 commit
  8. 22 Aug, 2012 2 commits
    • ipv6: gre: fix ip6gre_err() · b87fb39e
      Eric Dumazet authored
      
      
      ip6gre_err() miscomputes grehlen (sizeof(ipv6h) is 4 or 8,
      not 40 as expected), and should take into account 'offset' parameter.
      
      Also use pskb_may_pull() to cope with some fragged skbs.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Dmitry Kozlov <xeb@mail.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
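The sizeof mistake described in this commit is easy to reproduce outside the kernel. The sketch below uses a hypothetical 40-byte stand-in struct (`demo_ipv6hdr`, not the kernel's definition) to show that applying sizeof to a pointer yields the pointer size, while dereferencing it first yields the header size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 40-byte stand-in for the fixed IPv6 header. */
struct demo_ipv6hdr {
    uint8_t  version_class_flow[4];
    uint16_t payload_len;
    uint8_t  nexthdr;
    uint8_t  hop_limit;
    uint8_t  saddr[16];
    uint8_t  daddr[16];
};

/* Buggy pattern: measures the pointer itself, so the result is 4 or 8. */
static size_t demo_hdr_len_buggy(const struct demo_ipv6hdr *ipv6h)
{
    return sizeof(ipv6h);
}

/* Intended pattern: measures the pointed-to header, 40 bytes. */
static size_t demo_hdr_len_fixed(const struct demo_ipv6hdr *ipv6h)
{
    return sizeof(*ipv6h);
}
```

On a 64-bit build the buggy helper returns 8 and the fixed one returns 40, which is the gap the commit message refers to.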
    • net: remove delay at device dismantle · 0115e8e3
      Eric Dumazet authored
      
      
      I noticed an extra one-second delay in device dismantle, tracked down to
      a call to dst_dev_event() while some call_rcu() callbacks are still in
      RCU queues.

      These call_rcu() were posted by rt_free(struct rtable *rt) calls.

      We then wait a little (one second, in fact) in netdev_wait_allrefs()
      before kicking NETDEV_UNREGISTER again.

      As the call_rcu() callbacks have completed by then, dst_dev_event() can
      do the needed device swap on busy dst.
      
      To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called
      after a rcu_barrier(), but outside of RTNL lock.
      
      Use NETDEV_UNREGISTER_FINAL with care !
      
      Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL
      
      Also remove NETDEV_UNREGISTER_BATCH, as it's not used anymore after
      IP cache removal.
      
      With help from Gao feng
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 20 Aug, 2012 3 commits
    • net: tcp: move sk_rx_dst_set call after tcp_create_openreq_child() · fae6ef87
      Neal Cardwell authored
      
      
      This commit removes the sk_rx_dst_set calls from
      tcp_create_openreq_child(), because at that point the icsk_af_ops
      field of ipv6_mapped TCP sockets has not been set to its proper final
      value.
      
      Instead, to make sure we get the right sk_rx_dst_set variant
      appropriate for the address family of the new connection, we have
      tcp_v{4,6}_syn_recv_sock() directly call the appropriate function
      shortly after the call to tcp_create_openreq_child() returns.
      
      This also moves inet6_sk_rx_dst_set() to avoid a forward declaration
      with the new approach.
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Reported-by: Artem Savkov <artem.savkov@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: fix oops in inet_putpeer() · 9d7b0fc1
      Patrick McHardy authored
      Commit 97bab73f (inet: Hide route peer accesses behind helpers.) introduced
      a bug in xfrm6_policy_destroy(). The xfrm_dst's _rt6i_peer member is not
      initialized, causing a false positive result from inetpeer_ptr_is_peer(),
      which in turn causes a NULL pointer dereference in inet_putpeer().
      
      Pid: 314, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #17 To Be Filled By O.E.M. To Be Filled By O.E.M./P4S800D-X
      EIP: 0060:[<c03abf93>] EFLAGS: 00010246 CPU: 0
      EIP is at inet_putpeer+0xe/0x16
      EAX: 00000000 EBX: f3481700 ECX: 00000000 EDX: 000dd641
      ESI: f3481700 EDI: c05e949c EBP: f551def4 ESP: f551def4
       DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
      CR0: 8005003b CR2: 00000070 CR3: 3243d000 CR4: 00000750
      DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
      DR6: ffff0ff0 DR7: 00000400
       f551df04 c0423de1 00000000 f3481700 f551df18 c038d5f7 f254b9f8 f551df28
       f34f85d8 f551df20 c03ef48d f551df3c c0396870 f30697e8 f24e1738 c05e98f4
       f5509540 c05cd2b4 f551df7c c0142d2b c043feb5 f5509540 00000000 c05cd2e8
       [<c0423de1>] xfrm6_dst_destroy+0x42/0xdb
       [<c038d5f7>] dst_destroy+0x1d/0xa4
       [<c03ef48d>] xfrm_bundle_flo_delete+0x2b/0x36
       [<c0396870>] flow_cache_gc_task+0x85/0x9f
       [<c0142d2b>] process_one_work+0x122/0x441
       [<c043feb5>] ? apic_timer_interrupt+0x31/0x38
       [<c03967eb>] ? flow_cache_new_hashrnd+0x2b/0x2b
       [<c0143e2d>] worker_thread+0x113/0x3cc
      
      Fix by adding a init_dst() callback to struct xfrm_policy_afinfo to
      properly initialize the dst's peer pointer.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gre: information leak in ip6_tnl_ioctl() · 5ef5d6c5
      Dan Carpenter authored
      There is a one byte hole between p->hop_limit and p->flowinfo where
      stack memory is leaked to the user.  This was introduced in c12b395a
      "gre: Support GRE over IPv6".
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
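The padding hole described in this commit can be illustrated with a small userspace sketch. The struct `demo_parm` below is a hypothetical layout (three one-byte fields followed by a 32-bit field), not the kernel's parameter struct. One common remedy, assumed here rather than taken from the patch, is to zero the whole struct before filling in its fields, so that the padding never carries stale stack contents.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: three one-byte fields followed by a 32-bit field. */
struct demo_parm {
    int32_t  link;
    uint8_t  proto;
    uint8_t  encap_limit;
    uint8_t  hop_limit;
    /* the compiler typically inserts one padding byte here */
    uint32_t flowinfo;
};

/* Fill the struct the safe way: zero everything, padding included, first. */
static void demo_fill(struct demo_parm *p)
{
    memset(p, 0, sizeof(*p));
    p->link = 1;
    p->proto = 47;
    p->encap_limit = 4;
    p->hop_limit = 64;
    p->flowinfo = 0;
}

/* Number of padding bytes between hop_limit and flowinfo on this ABI. */
static size_t demo_hole_size(void)
{
    return offsetof(struct demo_parm, flowinfo)
         - (offsetof(struct demo_parm, hop_limit) + sizeof(uint8_t));
}

/* Return 1 if every padding byte in that hole is zero, 0 otherwise. */
static int demo_hole_is_clear(const struct demo_parm *p)
{
    const unsigned char *raw = (const unsigned char *)p;
    size_t start = offsetof(struct demo_parm, hop_limit) + sizeof(uint8_t);
    size_t end = offsetof(struct demo_parm, flowinfo);

    for (size_t i = start; i < end; i++)
        if (raw[i] != 0)
            return 0;
    return 1;
}
```

Without the memset(), the byte counted by `demo_hole_size()` would keep whatever happened to be on the stack, and copying the struct out wholesale would expose it.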
  10. 16 Aug, 2012 1 commit
  11. 14 Aug, 2012 5 commits
  12. 13 Aug, 2012 1 commit
    • netfilter: PTR_RET can be used · 19e303d6
      Wu Fengguang authored
      
      
      This quiets the coccinelle warnings:
      
      net/bridge/netfilter/ebtable_filter.c:107:1-3: WARNING: PTR_RET can be used
      net/bridge/netfilter/ebtable_nat.c:107:1-3: WARNING: PTR_RET can be used
      net/ipv6/netfilter/ip6table_filter.c:65:1-3: WARNING: PTR_RET can be used
      net/ipv6/netfilter/ip6table_mangle.c:100:1-3: WARNING: PTR_RET can be used
      net/ipv6/netfilter/ip6table_raw.c:44:1-3: WARNING: PTR_RET can be used
      net/ipv6/netfilter/ip6table_security.c:62:1-3: WARNING: PTR_RET can be used
      net/ipv4/netfilter/iptable_filter.c:72:1-3: WARNING: PTR_RET can be used
      net/ipv4/netfilter/iptable_mangle.c:107:1-3: WARNING: PTR_RET can be used
      net/ipv4/netfilter/iptable_raw.c:51:1-3: WARNING: PTR_RET can be used
      net/ipv4/netfilter/iptable_security.c:70:1-3: WARNING: PTR_RET can be used
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
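For readers unfamiliar with the warning, the pattern it targets can be sketched in userspace. All `demo_` names below are hypothetical re-implementations written for illustration; they mimic the kernel's error-pointer convention (negative errno values stored in the top 4095 addresses) but are not the kernel's own macros.

```c
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *demo_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* A pointer is an "error pointer" if it falls in the top 4095 addresses. */
static int demo_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)(-DEMO_MAX_ERRNO);
}

/* Recover the errno value from an error pointer. */
static long demo_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Long form of the check that the warning flags. */
static long demo_check_long(const void *ptr)
{
    if (demo_is_err(ptr))
        return demo_ptr_err(ptr);
    return 0;
}

/* Equivalent one-line helper in the spirit of PTR_RET. */
static long demo_ptr_ret(const void *ptr)
{
    return demo_is_err(ptr) ? demo_ptr_err(ptr) : 0;
}
```

Both forms return the same value for any input; the shorter helper simply removes the repeated if/return boilerplate at each call site.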
  13. 09 Aug, 2012 3 commits
    • net: tcp: ipv6_mapped needs sk_rx_dst_set method · 63d02d15
      Eric Dumazet authored
      Commit 5d299f3d (net: ipv6: fix TCP early demux) added a
      regression for the ipv6_mapped case.
      
      [   67.422369] SELinux: initialized (dev autofs, type autofs), uses
      genfs_contexts
      [   67.449678] SELinux: initialized (dev autofs, type autofs), uses
      genfs_contexts
      [   92.631060] BUG: unable to handle kernel NULL pointer dereference at
      (null)
      [   92.631435] IP: [<          (null)>]           (null)
      [   92.631645] PGD 0
      [   92.631846] Oops: 0010 [#1] SMP
      [   92.632095] Modules linked in: autofs4 sunrpc ipv6 dm_mirror
      dm_region_hash dm_log dm_multipath dm_mod video sbs sbshc battery ac lp
      parport sg snd_hda_intel snd_hda_codec snd_seq_oss snd_seq_midi_event
      snd_seq snd_seq_device pcspkr snd_pcm_oss snd_mixer_oss snd_pcm
      snd_timer serio_raw button floppy snd i2c_i801 i2c_core soundcore
      snd_page_alloc shpchp ide_cd_mod cdrom microcode ehci_hcd ohci_hcd
      uhci_hcd
      [   92.634294] CPU 0
      [   92.634294] Pid: 4469, comm: sendmail Not tainted 3.6.0-rc1 #3
      [   92.634294] RIP: 0010:[<0000000000000000>]  [<          (null)>]
      (null)
      [   92.634294] RSP: 0018:ffff880245fc7cb0  EFLAGS: 00010282
      [   92.634294] RAX: ffffffffa01985f0 RBX: ffff88024827ad00 RCX:
      0000000000000000
      [   92.634294] RDX: 0000000000000218 RSI: ffff880254735380 RDI:
      ffff88024827ad00
      [   92.634294] RBP: ffff880245fc7cc8 R08: 0000000000000001 R09:
      0000000000000000
      [   92.634294] R10: 0000000000000000 R11: ffff880245fc7bf8 R12:
      ffff880254735380
      [   92.634294] R13: ffff880254735380 R14: 0000000000000000 R15:
      7fffffffffff0218
      [   92.634294] FS:  00007f4516ccd6f0(0000) GS:ffff880256600000(0000)
      knlGS:0000000000000000
      [   92.634294] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [   92.634294] CR2: 0000000000000000 CR3: 0000000245ed1000 CR4:
      00000000000007f0
      [   92.634294] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
      0000000000000000
      [   92.634294] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
      0000000000000400
      [   92.634294] Process sendmail (pid: 4469, threadinfo ffff880245fc6000,
      task ffff880254b8cac0)
      [   92.634294] Stack:
      [   92.634294]  ffffffff813837a7 ffff88024827ad00 ffff880254b6b0e8
      ffff880245fc7d68
      [   92.634294]  ffffffff81385083 00000000001d2680 ffff8802547353a8
      ffff880245fc7d18
      [   92.634294]  ffffffff8105903a ffff88024827ad60 0000000000000002
      00000000000000ff
      [   92.634294] Call Trace:
      [   92.634294]  [<ffffffff813837a7>] ? tcp_finish_connect+0x2c/0xfa
      [   92.634294]  [<ffffffff81385083>] tcp_rcv_state_process+0x2b6/0x9c6
      [   92.634294]  [<ffffffff8105903a>] ? sched_clock_cpu+0xc3/0xd1
      [   92.634294]  [<ffffffff81059073>] ? local_clock+0x2b/0x3c
      [   92.634294]  [<ffffffff8138caf3>] tcp_v4_do_rcv+0x63a/0x670
      [   92.634294]  [<ffffffff8133278e>] release_sock+0x128/0x1bd
      [   92.634294]  [<ffffffff8139f060>] __inet_stream_connect+0x1b1/0x352
      [   92.634294]  [<ffffffff813325f5>] ? lock_sock_nested+0x74/0x7f
      [   92.634294]  [<ffffffff8104b333>] ? wake_up_bit+0x25/0x25
      [   92.634294]  [<ffffffff813325f5>] ? lock_sock_nested+0x74/0x7f
      [   92.634294]  [<ffffffff8139f223>] ? inet_stream_connect+0x22/0x4b
      [   92.634294]  [<ffffffff8139f234>] inet_stream_connect+0x33/0x4b
      [   92.634294]  [<ffffffff8132e8cf>] sys_connect+0x78/0x9e
      [   92.634294]  [<ffffffff813fd407>] ? sysret_check+0x1b/0x56
      [   92.634294]  [<ffffffff81088503>] ? __audit_syscall_entry+0x195/0x1c8
      [   92.634294]  [<ffffffff811cc26e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [   92.634294]  [<ffffffff813fd3e2>] system_call_fastpath+0x16/0x1b
      [   92.634294] Code:  Bad RIP value.
      [   92.634294] RIP  [<          (null)>]           (null)
      [   92.634294]  RSP <ffff880245fc7cb0>
      [   92.634294] CR2: 0000000000000000
      [   92.648982] ---[ end trace 24e2bed94314c8d9 ]---
      [   92.649146] Kernel panic - not syncing: Fatal exception in interrupt
      
      Fix this using inet_sk_rx_dst_set(), and export this function in case
      IPv6 is modular.
      Reported-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Loopback ifindex is constant now · 1fb9489b
      Pavel Emelyanov authored
      
      
      As pointed out, there are places that access net->loopback_dev->ifindex,
      and now that ifindex generation is per-net, this value is a constant
      equal to 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and
      use it where appropriate.
      Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • time: jiffies_delta_to_clock_t() helper to the rescue · a399a805
      Eric Dumazet authored
      Various /proc/net files sometimes report crazy timer values, expressed
      in clock_t units.
      
      This happens when an expired timer delta (expires - jiffies) is passed
      to jiffies_to_clock_t().
      
      This function has an overflow in:

      return div_u64((u64)x * TICK_NSEC, NSEC_PER_SEC / USER_HZ);

      Commit cbbc719f (time: Change jiffies_to_clock_t() argument type
      to unsigned long) only worked around the problem.

      As we can't output negative values in /proc/net/tcp without breaking
      various tools, I suggest adding a jiffies_delta_to_clock_t() wrapper
      that caps the negative delta to a 0 value.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Maciej Żenczykowski <maze@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: hank <pyu@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
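The clamping wrapper proposed in this commit can be sketched in userspace as follows. The tick rates `DEMO_HZ` and `DEMO_USER_HZ` and the simplified conversion are assumptions made for the example; the kernel's conversion uses TICK_NSEC and div_u64() as quoted above.

```c
#include <stdint.h>

/* Assumed tick rates for this sketch only. */
#define DEMO_HZ       1000L
#define DEMO_USER_HZ  100L

/* Simplified unsigned conversion from jiffies to clock_t units. */
static uint64_t demo_jiffies_to_clock_t(unsigned long x)
{
    return (uint64_t)x * DEMO_USER_HZ / DEMO_HZ;
}

/* Wrapper: clamp an expired (negative) delta to 0 before converting. */
static uint64_t demo_jiffies_delta_to_clock_t(long delta)
{
    if (delta < 0)
        delta = 0;
    return demo_jiffies_to_clock_t((unsigned long)delta);
}
```

Passing an expired delta such as -5 straight to the unsigned conversion produces a very large value, which is the kind of "crazy" timer reading the commit describes; the wrapper reports 0 instead.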
  14. 06 Aug, 2012 1 commit
  15. 31 Jul, 2012 2 commits
  16. 30 Jul, 2012 2 commits
    • net: TCP early demux cleanup · cca32e4b
      Eric Dumazet authored
      
      
      early_demux() handlers should be called in RCU context, and as we
      use skb_dst_set_noref(skb, dst), caller must not exit from RCU context
      before dst use (skb_dst(skb)) or release (skb_drop(dst))
      
      Therefore, rcu_read_lock()/rcu_read_unlock() pairs around
      ->early_demux() are confusing and not needed:
      
      Protocol handlers are already in an RCU read lock section.
      (__netif_receive_skb() does the rcu_read_lock() )
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: fix incorrect route 'expires' value passed to userspace · 8253947e
      Li Wei authored
      
      
      When userspace uses RTM_GETROUTE to dump the route table, an already
      expired route entry always reports an 'expires' value (2147157)
      calculated based on INT_MAX.

      The cause of this problem is the following statement:
      	rt->dst.expires - jiffies < INT_MAX
      gcc promotes both sides of '<' to unsigned long, so
      a small negative value is considered greater than INT_MAX.
      
      With the help of Eric Dumazet, do the out of bound checks in
      rtnl_put_cacheinfo(), _after_ conversion to clock_t.
      Signed-off-by: Li Wei <lw@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
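The promotion issue in the quoted statement can be reproduced with a short userspace sketch. The helper names are hypothetical, and the signed-delta variant is shown only as one way to see the difference; the commit itself moves the bounds check into rtnl_put_cacheinfo() after conversion to clock_t.

```c
#include <limits.h>

/* Same shape as the quoted statement: the subtraction is unsigned, so an
 * already expired entry (expires < now) wraps to a huge value and the
 * comparison against INT_MAX fails. */
static int demo_in_range_buggy(unsigned long expires, unsigned long now)
{
    return expires - now < INT_MAX;
}

/* Illustrative alternative: take the difference as a signed value first,
 * so a small negative delta stays small and negative. */
static int demo_in_range_signed(unsigned long expires, unsigned long now)
{
    long delta = (long)(expires - now);

    return delta < INT_MAX;
}
```

With `expires = 1000` and `now = 1010`, the buggy check returns 0 (the delta looks larger than INT_MAX) while the signed version returns 1.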
  17. 26 Jul, 2012 1 commit
  18. 23 Jul, 2012 1 commit
    • tcp: dont drop MTU reduction indications · 563d34d0
      Eric Dumazet authored
      ICMP messages generated in the output path when the frame length exceeds
      the MTU are actually lost because the socket is owned by the user (doing
      the xmit).

      One example is ipgre_tunnel_xmit() calling
      icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));

      We had a similar case fixed in commit a34a101e (ipv6: disable GSO on
      sockets hitting dst_allfrag).

      The problem with that fix is that it relied on retransmit timers, so short
      TCP sessions paid too high a latency price.
      
      This patch uses the tcp_release_cb() infrastructure so that MTU
      reduction messages (ICMP messages) are not lost, and no extra delay
      is added in TCP transmits.
      Reported-by: Maciej Żenczykowski <maze@google.com>
      Diagnosed-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Tore Anderson <tore@fud.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
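The deferral idea in this commit (remember the MTU reduction while the socket is busy, then act on it when the owner releases the socket) can be modeled with a small userspace sketch. Everything here is hypothetical: `demo_sock`, its fields, the flag name, and the helpers are stand-ins chosen for illustration and do not reproduce the kernel's structures or the tcp_release_cb() code.

```c
#include <stdint.h>

#define DEMO_MTU_REDUCED_DEFERRED 0x1u

/* Hypothetical, heavily simplified socket model. */
struct demo_sock {
    int      owned_by_user;   /* nonzero while a sender holds the socket */
    unsigned deferred;        /* work postponed until release */
    uint32_t mtu_info;        /* MTU carried by the pending ICMP message */
    uint32_t pmtu;            /* path MTU currently in effect */
};

/* Apply a recorded MTU reduction; never increases the path MTU. */
static void demo_apply_mtu(struct demo_sock *sk)
{
    if (sk->mtu_info < sk->pmtu)
        sk->pmtu = sk->mtu_info;
}

/* ICMP "fragmentation needed" arrives: act now, or defer if the socket is
 * currently owned by the user instead of dropping the indication. */
static void demo_icmp_frag_needed(struct demo_sock *sk, uint32_t mtu)
{
    sk->mtu_info = mtu;
    if (sk->owned_by_user)
        sk->deferred |= DEMO_MTU_REDUCED_DEFERRED;
    else
        demo_apply_mtu(sk);
}

/* Release path: run any deferred work as the owner lets go of the socket. */
static void demo_release(struct demo_sock *sk)
{
    sk->owned_by_user = 0;
    if (sk->deferred & DEMO_MTU_REDUCED_DEFERRED) {
        sk->deferred &= ~DEMO_MTU_REDUCED_DEFERRED;
        demo_apply_mtu(sk);
    }
}
```

In this model the reduced MTU takes effect as soon as the sender releases the socket, rather than waiting for a retransmit timer, which is the latency benefit the commit message describes.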
  19. 20 Jul, 2012 1 commit