1. 03 Mar, 2009 1 commit
    • netns: Remove net_alive · 17edde52
      Eric W. Biederman authored
      It turns out that net_alive is unnecessary, and the original problem
      that led to it being added was simply that the icmp code thought
      it was a network device and wound up being unable to handle packets
      while there were still packets in the network namespace.
      
      Now that icmp and tcp have been fixed to properly register themselves
      this problem is no longer present and we have a stronger guarantee
      that packets will not arrive in a network namespace than that provided
      by net_alive in netif_receive_skb.  So remove net_alive, allowing
      packet reception to run a little faster.
      
      Additionally document the strong reason why network namespace cleanup
      is safe so that if something happens again someone else will have
      a chance of figuring it out.
      Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 22 Feb, 2009 1 commit
    • netns: Remove net_alive · ce16c533
      Eric W. Biederman authored
      It turns out that net_alive is unnecessary, and the original problem
      that led to it being added was simply that the icmp code thought
      it was a network device and wound up being unable to handle packets
      while there were still packets in the network namespace.
      
      Now that icmp and tcp have been fixed to properly register themselves
      this problem is no longer present and we have a stronger guarantee
      that packets will not arrive in a network namespace than that provided
      by net_alive in netif_receive_skb.  So remove net_alive, allowing
      packet reception to run a little faster.
      
      Additionally document the strong reason why network namespace cleanup
      is safe so that if something happens again someone else will have
      a chance of figuring it out.
      Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 25 Nov, 2008 1 commit
  4. 12 Nov, 2008 1 commit
  5. 31 Oct, 2008 1 commit
  6. 08 Oct, 2008 1 commit
  7. 26 Jul, 2008 1 commit
    • [PATCH] beginning of sysctl cleanup - ctl_table_set · 73455092
      Al Viro authored
      New object: set of sysctls [currently - root and per-net-ns].
      Contains: pointer to parent set, list of tables and "should I see this set?"
      method (->is_seen(set)).
      Current lists of tables are subsumed by that; net-ns contains such a beast.
      ->lookup() for ctl_table_root returns pointer to ctl_table_set instead of
      that to ->list of that ctl_table_set.
      
      [folded compile fixes by rdd for configs without sysctl]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  8. 18 Jul, 2008 1 commit
  9. 20 Jun, 2008 1 commit
    • netns: Don't receive new packets in a dead network namespace. · b9f75f45
      Eric W. Biederman authored
      Alexey Dobriyan <adobriyan@gmail.com> writes:
      > Subject: ICMP sockets destruction vs ICMP packets oops
      
      > After icmp_sk_exit() nuked ICMP sockets, we get an interrupt.
      > icmp_reply() wants ICMP socket.
      >
      > Steps to reproduce:
      >
      > 	launch shell in new netns
      > 	move real NIC to netns
      > 	setup routing
      > 	ping -i 0
      > 	exit from shell
      >
      > BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      > IP: [<ffffffff803fce17>] icmp_sk+0x17/0x30
      > PGD 17f3cd067 PUD 17f3ce067 PMD 0 
      > Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
      > CPU 0 
      > Modules linked in: usblp usbcore
      > Pid: 0, comm: swapper Not tainted 2.6.26-rc6-netns-ct #4
      > RIP: 0010:[<ffffffff803fce17>]  [<ffffffff803fce17>] icmp_sk+0x17/0x30
      > RSP: 0018:ffffffff8057fc30  EFLAGS: 00010286
      > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff81017c7db900
      > RDX: 0000000000000034 RSI: ffff81017c7db900 RDI: ffff81017dc41800
      > RBP: ffffffff8057fc40 R08: 0000000000000001 R09: 000000000000a815
      > R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8057fd28
      > R13: ffffffff8057fd00 R14: ffff81017c7db938 R15: ffff81017dc41800
      > FS:  0000000000000000(0000) GS:ffffffff80525000(0000) knlGS:0000000000000000
      > CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      > CR2: 0000000000000000 CR3: 000000017fcda000 CR4: 00000000000006e0
      > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      > Process swapper (pid: 0, threadinfo ffffffff8053a000, task ffffffff804fa4a0)
      > Stack:  0000000000000000 ffff81017c7db900 ffffffff8057fcf0 ffffffff803fcfe4
      >  ffffffff804faa38 0000000000000246 0000000000005a40 0000000000000246
      >  000000000001ffff ffff81017dd68dc0 0000000000005a40 0000000055342436
      > Call Trace:
      >  <IRQ>  [<ffffffff803fcfe4>] icmp_reply+0x44/0x1e0
      >  [<ffffffff803d3a0a>] ? ip_route_input+0x23a/0x1360
      >  [<ffffffff803fd645>] icmp_echo+0x65/0x70
      >  [<ffffffff803fd300>] icmp_rcv+0x180/0x1b0
      >  [<ffffffff803d6d84>] ip_local_deliver+0xf4/0x1f0
      >  [<ffffffff803d71bb>] ip_rcv+0x33b/0x650
      >  [<ffffffff803bb16a>] netif_receive_skb+0x27a/0x340
      >  [<ffffffff803be57d>] process_backlog+0x9d/0x100
      >  [<ffffffff803bdd4d>] net_rx_action+0x18d/0x250
      >  [<ffffffff80237be5>] __do_softirq+0x75/0x100
      >  [<ffffffff8020c97c>] call_softirq+0x1c/0x30
      >  [<ffffffff8020f085>] do_softirq+0x65/0xa0
      >  [<ffffffff80237af7>] irq_exit+0x97/0xa0
      >  [<ffffffff8020f198>] do_IRQ+0xa8/0x130
      >  [<ffffffff80212ee0>] ? mwait_idle+0x0/0x60
      >  [<ffffffff8020bc46>] ret_from_intr+0x0/0xf
      >  <EOI>  [<ffffffff80212f2c>] ? mwait_idle+0x4c/0x60
      >  [<ffffffff80212f23>] ? mwait_idle+0x43/0x60
      >  [<ffffffff8020a217>] ? cpu_idle+0x57/0xa0
      >  [<ffffffff8040f380>] ? rest_init+0x70/0x80
      > Code: 10 5b 41 5c 41 5d 41 5e c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53
      > 48 83 ec 08 48 8b 9f 78 01 00 00 e8 2b c7 f1 ff 89 c0 <48> 8b 04 c3 48 83 c4 08
      > 5b c9 c3 66 66 66 66 66 2e 0f 1f 84 00
      > RIP  [<ffffffff803fce17>] icmp_sk+0x17/0x30
      >  RSP <ffffffff8057fc30>
      > CR2: 0000000000000000
      > ---[ end trace ea161157b76b33e8 ]---
      > Kernel panic - not syncing: Aiee, killing interrupt handler!
      
      Receiving packets while we are cleaning up a network namespace is a
      racy proposition. It is possible that when the packet arrives we have
      removed some but not all of the state we need to fully process it.  We
      have the choice of either playing whack-a-mole with the cleanup routines
      or simply dropping packets when we don't have a network namespace to
      handle them.
      
      Since the check looks inexpensive in netif_receive_skb let's just
      drop the incoming packets.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
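The trade-off described in this commit message (drop early rather than harden every cleanup routine) can be illustrated with a small userspace model. This is only a sketch: the struct and function names below are hypothetical stand-ins, not the kernel's, and the real check sat in netif_receive_skb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these names exist in the kernel. */
enum rx_result { RX_SUCCESS = 0, RX_DROP = 1 };

struct netns_model {
    bool alive;      /* cleared when namespace cleanup begins */
    int delivered;   /* packets handed to the protocol layers */
};

struct packet_model {
    struct netns_model *net;   /* namespace the packet arrived in */
};

/* Stand-in for the liveness test mentioned in the commit message. */
bool netns_model_alive(const struct netns_model *net)
{
    return net != NULL && net->alive;
}

/* Drop the packet up front if its namespace is being torn down. */
enum rx_result receive_packet_model(struct packet_model *pkt)
{
    assert(pkt != NULL);
    if (!netns_model_alive(pkt->net))
        return RX_DROP;        /* partially destroyed state is never touched */
    pkt->net->delivered++;     /* normal delivery path */
    return RX_SUCCESS;
}
```

The point of the single early check is that no later stage (such as the ICMP reply path in the oops above) ever runs against a namespace whose sockets have already been destroyed.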
  10. 19 May, 2008 1 commit
  11. 16 Apr, 2008 1 commit
  12. 15 Apr, 2008 2 commits
    • [NETNS]: The generic per-net pointers. · dec827d1
      Pavel Emelyanov authored
      Add an elastic array of void * pointers to struct net.
      The access rules are simple:
      
       1. register the ops with register_pernet_gen_device to get
          the id of your private pointer
       2. call net_assign_generic() to put the private data on the
          struct net (most preferably this should be done in the
          ->init callback of the ops registered)
       3. do not store any private reference on the net_generic array;
       4. do not change this pointer while the net is alive;
       5. use the net_generic() to get the pointer.
      
      When adding a new pointer, I copy the old array, replace it
      with a new one and schedule the old for kfree after an RCU
      grace period.
      
      Since net_generic() reads the net->gen array inside an RCU
      read-side section, and the net->gen->ptr[x] pointer never
      changes once set, this gives us safe access to the generic pointers.
      
      Quoting Paul: "... RCU is protecting -only- the net_generic 
      structure that net_generic() is traversing, and the [pointer]
      returned by net_generic() is protected by a reference counter 
      in the upper-level struct net."
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
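The access rules listed in this commit message can be sketched as a userspace model. This is an illustration under stated assumptions, not the kernel implementation: every name ending in _model is hypothetical, ids are assumed to start at 1, and the old array is freed immediately where the commit message says the kernel schedules it for kfree after an RCU grace period.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace model only; names are hypothetical, not the kernel API. */
struct gen_array_model {
    size_t len;     /* number of slots */
    void *ptr[];    /* slot id - 1 holds the pointer registered under id */
};

struct net_model {
    struct gen_array_model *gen;
};

static int next_model_id = 1;   /* stand-in for the id generator */

/* Rule 1: registering yields the id of a private slot. */
int register_model_ops(void)
{
    return next_model_id++;
}

/* Rule 2: store private data under an id, growing by copy-and-replace. */
int net_assign_generic_model(struct net_model *net, int id, void *data)
{
    struct gen_array_model *old, *ng;
    size_t old_len;

    assert(net != NULL);
    if (id < 1)
        return -1;
    old = net->gen;
    old_len = old ? old->len : 0;
    if ((size_t)id <= old_len) {
        old->ptr[id - 1] = data;    /* slot already exists: fill it in place */
        return 0;
    }
    ng = calloc(1, sizeof(*ng) + (size_t)id * sizeof(void *));
    if (ng == NULL)
        return -1;
    ng->len = (size_t)id;
    if (old != NULL)
        memcpy(ng->ptr, old->ptr, old_len * sizeof(void *));
    ng->ptr[id - 1] = data;
    net->gen = ng;                  /* publish the new array */
    free(old);                      /* model shortcut: no grace period here */
    return 0;
}

/* Rule 5: look the pointer up by id; NULL if nothing is stored. */
void *net_generic_lookup_model(const struct net_model *net, int id)
{
    if (net->gen == NULL || id < 1 || (size_t)id > net->gen->len)
        return NULL;
    return net->gen->ptr[id - 1];
}
```

Rules 3 and 4 are conventions for callers and are not enforced by this model. The copy-and-replace step is what lets readers keep using the old array safely until it is released.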
    • [NETNS]: The net-subsys IDs generator. · c93cf61f
      Pavel Emelyanov authored
      To make some per-net generic pointers, we need some way to address
      them, i.e. IDs. This is a simple IDA-based ID generator for pernet
      subsystems.
      
      Addressing questions about potential checkpoint/restart problems: 
      these IDs are "lite-offsets" within the net structure and are by no 
      means supposed to be exported to the userspace.
      
      Since it will be used in the near future by devices only (tun,
      vlan, tunnels, bridge, etc), I make it resemble the functionality
      of register_pernet_device().
      
      The new id is stored in the *id pointer _before_ calling the init
      callback to make this id available in this callback.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
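The ordering described in the last paragraph of this commit message (store the id first, then call init) can be shown with a short userspace model. All names here are hypothetical; a plain counter stands in for the IDA-based allocator, and the callback signature is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; names are hypothetical, not the kernel API. */
struct pernet_gen_ops_model {
    int (*init)(void *ctx);   /* called once the id has been stored */
};

static int next_subsys_id = 1;   /* counter standing in for the IDA */

/* Store the new id through *id first, then run the init callback. */
int register_gen_ops_model(int *id, const struct pernet_gen_ops_model *ops,
                           void *ctx)
{
    assert(id != NULL && ops != NULL);
    *id = next_subsys_id++;       /* published before init runs */
    if (ops->init != NULL)
        return ops->init(ctx);    /* init may already read *id */
    return 0;
}

/* Demo subsystem: its init callback records the id it observes. */
int demo_id;
int id_seen_in_init;

int demo_init(void *ctx)
{
    (void)ctx;
    id_seen_in_init = demo_id;
    return 0;
}
```

If the id were written only after init returned, demo_init would observe a stale value, which is the situation the commit message is guarding against.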
  13. 13 Apr, 2008 1 commit
  14. 03 Apr, 2008 1 commit
  15. 02 Apr, 2008 2 commits
  16. 31 Mar, 2008 1 commit
  17. 25 Mar, 2008 1 commit
  18. 07 Mar, 2008 1 commit
    • [NET]: Make /proc/net a symlink on /proc/self/net (v3) · e9720acd
      Pavel Emelyanov authored
      The current /proc/net is implemented with so-called "shadows", but the
      current implementation is broken and has little chance of being fixed.
      
      The problem is that dentries subtree of /proc/net directory has
      fancy revalidation rules to make processes living in different
      net namespaces see different entries in /proc/net subtree, but
      currently, tasks see in the /proc/net subdir the contents of any
      other namespace, depending on who opened the file first.
      
      The proposed fix is to turn /proc/net into a symlink, which points
      to /proc/self/net, which in turn shows what previously was in
      /proc/net - the network-related info, from the net namespace the
      appropriate task lives in.
      
      # ls -l /proc/net
      lrwxrwxrwx  1 root root 8 Mar  5 15:17 /proc/net -> self/net
      
      In other words - this behaves like /proc/mounts, but unlike
      "mounts", "net" is not a file, but a directory.
      
      Changes from v2:
      * Fixed discrepancy of /proc/net nlink count and selinux labeling
        screwup pointed out by Stephen.
      
        To get the correct nlink count the ->getattr callback for /proc/net
        is overridden to read one from the net->proc_net entry.
      
        To make selinux still work the net->proc_net entry is initialized
        properly, i.e. with the "net" name and the proc_net parent.
      
      Selinux fixes are
      Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
      
      Changes from v1:
      * Fixed a task_struct leak in get_proc_task_net, pointed out by Paul.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
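As a small illustration of the symlink arrangement described in this commit message, the following C sketch reads a symlink's target with readlink(2). The helper name link_target is hypothetical. On a kernel carrying this change, passing "/proc/net" would be expected to yield "self/net", matching the ls output quoted in the message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy the target of the symlink at `path` into `buf` as a C string.
 * Returns 0 on success, -1 if `path` is not a readable symlink or the
 * target may have been truncated to fit the buffer.
 */
int link_target(const char *path, char *buf, size_t buflen)
{
    ssize_t n;

    assert(path != NULL && buf != NULL);
    if (buflen < 2)
        return -1;
    n = readlink(path, buf, buflen - 1);   /* readlink does not NUL-terminate */
    if (n < 0 || (size_t)n >= buflen - 1)
        return -1;
    buf[n] = '\0';
    return 0;
}
```

Because the target is relative ("self/net"), it resolves through /proc/self, so each task is led to the view of its own network namespace.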
  19. 31 Jan, 2008 1 commit
  20. 28 Jan, 2008 12 commits
  21. 13 Nov, 2007 1 commit
  22. 01 Nov, 2007 1 commit
  23. 26 Oct, 2007 1 commit
    • [NET]: Marking struct pernet_operations __net_initdata was inappropriate · 2b008b0a
      Eric W. Biederman authored
      It is not safe to place struct pernet_operations in a special section.
      We need struct pernet_operations to last until we call
      unregister_pernet_subsys, which doesn't happen until module unload.
      
      So marking struct pernet_operations __net_initdata is a disaster for
      modules in two ways.
      - We discard it before we call the exit method it points to.
      - Because I keep struct pernet_operations on a linked list, discarding
        it for compiled-in code removes elements from the middle of a linked
        list and does horrible things to list insertion.
      
      So this looks safe assuming __exit_refok is not discarded
      for modules.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 10 Oct, 2007 4 commits
    • [NETNS]: Move some code into __init section when CONFIG_NET_NS=n · 4665079c
      Pavel Emelyanov authored
      With the net namespaces, a lot of code left the __init section,
      thus making the kernel occupy more memory than it did before.
      Since we have a config option that prohibits namespace
      creation, the functions that initialize/finalize some netns
      stuff are simply not needed and can be freed after boot.
      
      Currently, this is barely noticeable, since few calls
      are no longer in __init, but once the namespaces are
      merged it will be possible to free more code. I propose to
      use the __net_init, __net_exit and __net_initdata "attributes"
      for functions/variables that are not used if CONFIG_NET_NS
      is not set, to save more space in memory.
      
      The exit functions cannot just reside in the __exit section,
      as noticed by David, since the init section will have
      references to them and the compilation will fail due to modpost
      checks. These references can exist, since the init namespace
      never dies and the exit callbacks are never called. So I
      introduce the __exit_refok attribute just like it is already
      done with the __init_refok.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NETNS]: Simplify the network namespace list locking rules. · f4618d39
      Eric W. Biederman authored
      Denis V. Lunev <den@sw.ru> noticed that the locking rules
      for the network namespace list are overcomplicated and broken.
      
      In particular, register_netdev_notifier currently
      does not take any lock, making the for_each_net iteration racy
      with network namespace creation and destruction. Oops.
      
      The fact that we need to use for_each_net in rtnl_unlock() when
      the rtnetlink support becomes per network namespace makes designing
      the proper locking tricky.  In addition we need to be able to call
      rtnl_lock() and rtnl_unlock() when we have the net_mutex held.
      
      After thinking about it and looking at the alternatives carefully
      it looks like the simplest and most maintainable solution is
      to remove net_list_mutex altogether, and to use the rtnl_mutex instead.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Make the loopback device per network namespace. · 2774c7ab
      Eric W. Biederman authored
      This patch makes loopback_dev per network namespace, adding
      code to create a different loopback device for each network
      namespace and code to free a loopback device
      when a network namespace exits.
      
      This patch modifies all users of loopback_dev so they
      access it as init_net.loopback_dev, keeping all of the
      code compiling and working.  A later pass will be needed to
      update the users to use something other than the initial network
      namespace.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Add network namespace clone & unshare support. · 9dd776b6
      Eric W. Biederman authored
      This patch allows you to create a new network namespace
      using sys_clone, or sys_unshare.
      
      As the network namespace is still experimental and under development
      clone and unshare support is only made available when CONFIG_NET_NS is
      selected at compile time.
      
      As this patch introduces network namespace support into code paths
      that exist when CONFIG_NET is not selected, there are a few
      additions made to net_namespace.h to allow a few more functions
      to be used when the networking stack is not compiled in.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>