1. 28 Jan, 2008 3 commits
  2. 13 Nov, 2007 1 commit
  3. 01 Nov, 2007 1 commit
  4. 26 Oct, 2007 1 commit
    • [NET]: Marking struct pernet_operations __net_initdata was inappropriate · 2b008b0a
      Eric W. Biederman authored
      
      
      It is not safe to place struct pernet_operations in a special section.
      We need struct pernet_operations to last until we call
      unregister_pernet_subsys, which doesn't happen until module unload.
      
      So marking struct pernet_operations __net_initdata is a disaster for
      modules in two ways.
      - We discard it before we call the exit method it points to.
      - Because I keep struct pernet_operations on a linked list, discarding
        it for compiled-in code removes elements from the middle of the
        linked list and does horrible things to later list insertions.
      
      So this looks safe assuming __exit_refok is not discarded
      for modules.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
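The lifetime problem described in this message can be illustrated with a minimal userspace sketch. It assumes a simplified registry: `pernet_ops_sketch`, `register_ops_sketch` and `unregister_ops_sketch` are hypothetical stand-ins, not the kernel's `struct pernet_operations` API. The point it shows is that the list stores the caller's pointer, so the structure has to stay valid until it is unregistered.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct pernet_operations. */
struct pernet_ops_sketch {
    struct pernet_ops_sketch *next;   /* intrusive list link */
    int  (*init)(void);
    void (*exit)(void);
};

/* Head of the registration list (an assumption of this sketch). */
static struct pernet_ops_sketch *ops_list;

/* Link ops onto the list. The list keeps the caller's pointer, not a copy,
 * so the memory behind ops must outlive the registration. */
static void register_ops_sketch(struct pernet_ops_sketch *ops)
{
    ops->next = ops_list;
    ops_list = ops;
}

/* Unlink ops and run its exit method. Both steps dereference ops, which is
 * why discarding the structure early would be fatal. Returns 0 on success,
 * -1 if ops was not registered. */
static int unregister_ops_sketch(struct pernet_ops_sketch *ops)
{
    struct pernet_ops_sketch **pp;

    for (pp = &ops_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == ops) {
            *pp = ops->next;
            ops->next = NULL;
            if (ops->exit)
                ops->exit();
            return 0;
        }
    }
    return -1;
}

/* Demo registration used to exercise the sketch. */
static int demo_exit_calls;
static void demo_exit(void) { demo_exit_calls++; }
static struct pernet_ops_sketch demo_ops = { NULL, NULL, demo_exit };
```

If `demo_ops` lived in memory that is freed after boot, both the walk over `next` pointers and the call through `exit` would touch discarded memory, which matches the two failure modes listed in the message.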
  5. 10 Oct, 2007 9 commits
    • [NETNS]: Move some code into __init section when CONFIG_NET_NS=n · 4665079c
      Pavel Emelyanov authored
      
      
      With the net namespaces, a lot of code has left the __init section,
      making the kernel occupy more memory than it did before.
      Since we have a config option that prohibits namespace
      creation, the functions that initialize/finalize some netns
      stuff are simply not needed and can be freed after boot.
      
      Currently this is barely noticeable, since few calls
      are no longer in __init, but once the namespaces are
      merged it will be possible to free more code. I propose to
      use the __net_init, __net_exit and __net_initdata "attributes"
      for functions/variables that are not used if CONFIG_NET_NS
      is not set, to save more space in memory.
      
      The exit functions cannot just reside in the __exit section,
      as noticed by David, since the init section will have
      references to it and the compilation will fail due to modpost
      checks. These references can exist, since the init namespace
      never dies and the exit callbacks are never called. So I
      introduce the __exit_refok attribute, just as was already
      done with __init_refok.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
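The proposal above can be sketched as a set of conditional macro definitions. This is a userspace approximation under stated assumptions: the `__init`, `__initdata` and `__exit_refok` stand-ins below expand to nothing so the example compiles outside the kernel, and the mapping shown in the `#else` branch is an assumed shape for the definitions, not a copy of the patch. The `demo_*` names are hypothetical.

```c
/* Userspace stand-ins: in the kernel these annotations place code or data
 * in sections that can be discarded; here they expand to nothing. */
#define __init
#define __initdata
#define __exit_refok

/* Assumed shape of the proposed attributes: no-ops when namespaces can be
 * created at run time, discardable-section markers otherwise. */
#ifdef CONFIG_NET_NS
#define __net_init
#define __net_exit
#define __net_initdata
#else
#define __net_init      __init
#define __net_exit      __exit_refok
#define __net_initdata  __initdata
#endif

/* Hypothetical per-namespace setup and teardown using the attributes. */
static int __net_initdata demo_default = 1;
static int demo_state;

static int __net_init demo_net_init(void)
{
    demo_state = demo_default;
    return 0;
}

static void __net_exit demo_net_exit(void)
{
    demo_state = 0;
}
```

With CONFIG_NET_NS unset only the initial namespace ever exists, so `demo_net_init` runs once at boot and `demo_net_exit` is never called, which is why both can live in memory that is reclaimed later.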
    • [NETNS]: Simplify the network namespace list locking rules. · f4618d39
      Eric W. Biederman authored
      
      
      Denis V. Lunev <den@sw.ru> noticed that the locking rules
      for the network namespace list are overcomplicated and broken.
      
      In particular, register_netdev_notifier currently
      does not take any lock, making the for_each_net iteration racy
      with network namespace creation and destruction. Oops.
      
      The fact that we need to use for_each_net in rtnl_unlock() when
      the rtnetlink support becomes per network namespace makes designing
      the proper locking tricky.  In addition we need to be able to call
      rtnl_lock() and rtnl_unlock() when we have the net_mutex held.
      
      After thinking about it and looking at the alternatives carefully
      it looks like the simplest and most maintainable solution is
      to remove net_list_mutex altogether, and to use the rtnl_mutex instead.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Make the loopback device per network namespace. · 2774c7ab
      Eric W. Biederman authored
      
      
      This patch makes loopback_dev per network namespace, adding
      code to create a different loopback device for each network
      namespace and code to free a loopback device
      when a network namespace exits.
      
      This patch modifies all users of loopback_dev so they
      access it as init_net.loopback_dev, keeping all of the
      code compiling and working.  A later pass will be needed to
      update the users to use something other than the initial network
      namespace.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Add network namespace clone & unshare support. · 9dd776b6
      Eric W. Biederman authored
      
      
      This patch allows you to create a new network namespace
      using sys_clone or sys_unshare.
      
      As the network namespace is still experimental and under development,
      clone and unshare support is only made available when CONFIG_NET_NS is
      selected at compile time.
      
      As this patch introduces network namespace support into code paths
      that exist when CONFIG_NET is not selected, there are a few
      additions made to net_namespace.h to allow a few more functions
      to be used when the networking stack is not compiled in.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Fix race when opening a proc file while a network namespace is exiting. · 077130c0
      Eric W. Biederman authored
      
      
      The problem:  proc_net files remember which network namespace they are
      associated with but do not hold a reference count (as that would pin
      the network namespace).  So we currently have a small window where
      the reference count on a network namespace may be incremented when opening
      a /proc file when it has already gone to zero.
      
      To fix this, introduce maybe_get_net and get_proc_net.
      
      maybe_get_net increments the network namespace reference count only if it is
      greater than zero, ensuring we don't increment a reference count after it
      has gone to zero.
      
      get_proc_net handles all of the magic to go from a proc inode to the network
      namespace instance and call maybe_get_net on it.
      
      PROC_NET, the old accessor, is removed so that we don't get confused and use
      the wrong helper function.
      
      Then I fix up the callers to use get_proc_net and handle the case
      where get_proc_net returns NULL.  In that case I return -ENXIO because
      effectively the network namespace has already gone away, so the files
      we are trying to access don't exist anymore.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
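The "increment only if greater than zero" rule described in this message can be sketched in userspace with C11 atomics. This is an illustration of the technique, not the kernel code: `net_ref_sketch`, `maybe_get_net_sketch` and `open_proc_net_sketch` are hypothetical names, and the compare-and-exchange loop is an assumed way to implement the conditional increment.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net; only the reference count is modelled. */
struct net_ref_sketch {
    atomic_int count;
};

/* Take a reference only while the count is still above zero.
 * Returns net on success, NULL if the namespace is already on its way out. */
static struct net_ref_sketch *maybe_get_net_sketch(struct net_ref_sketch *net)
{
    int old = atomic_load(&net->count);

    while (old > 0) {
        /* On failure, old is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(&net->count, &old, old + 1))
            return net;
    }
    return NULL;
}

/* Caller-side handling as described in the message: a namespace that has
 * already gone away is reported as -ENXIO. */
static int open_proc_net_sketch(struct net_ref_sketch *net)
{
    if (maybe_get_net_sketch(net) == NULL)
        return -ENXIO;
    return 0;
}
```

A plain unconditional increment would let an opener revive a count that had already reached zero; the loop above refuses in that case, so the opener sees an error instead.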
    • [NETNS]: Fix allnoconfig compilation error. · 4fabcd71
      Daniel Lezcano authored
      
      
      When CONFIG_NET=no, init_net is unresolved because net_namespace.c
      is not compiled while the include still pulls in the init_net definition.
      
      This problem is very similar to the one with the ipc namespace, where
      the kernel can be compiled without SYSV ipc.
      
      This patch fixes that by defining a macro which simply removes the init_net
      initialization from the nsproxy namespace aggregator.
      
      Compiled and booted on qemu-i386 with CONFIG_NET=no and CONFIG_NET=yes.
      Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Make the device list and device lookups per namespace. · 881d966b
      Eric W. Biederman authored
      
      
      This patch makes most of the generic device layer network
      namespace safe.  This patch makes dev_base_head a
      network namespace variable, and then it picks up
      a few associated variables.  The functions:
      dev_getbyhwaddr
      dev_getfirsthwbytype
      dev_get_by_flags
      dev_get_by_name
      __dev_get_by_name
      dev_get_by_index
      __dev_get_by_index
      dev_ioctl
      dev_ethtool
      dev_load
      wireless_process_ioctl
      
      were modified to take a network namespace argument, and
      deal with it.
      
      vlan_ioctl_set and brioctl_set were modified so their
      hooks will receive a network namespace argument.
      
      So basically anything in the core of the network stack that was
      affected by the change of dev_base was modified to handle
      multiple network namespaces.  The rest of the network stack was
      simply modified to explicitly use &init_net, the initial network
      namespace.  This can be fixed when those components of the network
      stack are modified to handle multiple network namespaces.
      
      For now the ifindex generator is left global.
      
      Fundamentally ifindex numbers are per namespace, or else
      we will have corner case problems with migration when
      we get that far.
      
      At the same time there are assumptions in the network stack
      that the ifindex of a network device won't change.  Making
      the ifindex number global seems a good compromise until
      the network stack can cope with ifindex changes when
      you change namespaces, and the like.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
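The idea of a per-namespace device list with lookups that take a namespace argument can be sketched as follows. This is a simplified userspace model: `netns_sketch`, `netdev_sketch`, `netns_add_dev` and `dev_get_by_name_sketch` are hypothetical names, and a singly linked list stands in for the kernel's list and hash structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a network device. */
struct netdev_sketch {
    const char *name;
    int ifindex;                    /* kept globally unique in this sketch */
    struct netdev_sketch *next;
};

/* Hypothetical stand-in for struct net: each namespace owns its device list. */
struct netns_sketch {
    struct netdev_sketch *dev_base_head;
};

/* Add a device to one namespace's list. */
static void netns_add_dev(struct netns_sketch *net, struct netdev_sketch *dev)
{
    dev->next = net->dev_base_head;
    net->dev_base_head = dev;
}

/* Look a device up by name within the given namespace only. */
static struct netdev_sketch *dev_get_by_name_sketch(struct netns_sketch *net,
                                                    const char *name)
{
    struct netdev_sketch *dev;

    for (dev = net->dev_base_head; dev != NULL; dev = dev->next) {
        if (strcmp(dev->name, name) == 0)
            return dev;
    }
    return NULL;
}
```

Because the list head lives in the namespace, two namespaces can each hold a device called "lo" without the lookups interfering, while the ifindex values stay distinct across namespaces as long as the generator remains global.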
    • [NET]: Make /proc/net per network namespace · 457c4cbc
      Eric W. Biederman authored
      
      
      This patch makes /proc/net per network namespace.  It modifies the global
      variables proc_net and proc_net_stat to be per network namespace.
      The proc_net file helpers are modified to take a network namespace argument,
      and all of their callers are fixed to pass &init_net for that argument.
      This ensures that all of the /proc/net files are only visible and
      usable in the initial network namespace until the code behind them
      has been updated to handle multiple network namespaces.
      
      Making /proc/net per namespace is necessary as at least some files
      in /proc/net depend upon the set of network devices which is per
      network namespace, and even more files in /proc/net have contents
      that are relevant to a single network namespace.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Basic network namespace infrastructure. · 5f256bec
      Eric W. Biederman authored
      
      
      This is the basic infrastructure needed to support network
      namespaces.  This infrastructure is:
      - Registration functions to support initializing per network
        namespace data when a network namespace is created or destroyed.
      
      - struct net.  The network namespace data structure.
        This structure will grow as variables are made per network
        namespace but this is the minimal starting point.
      
      - Functions to grab a reference to the network namespace.
        I provide get/put functions that keep a network namespace
        from being freed, and hold/release functions that serve as weak
        references and will warn if their count is not zero when the data
        structure is freed.  The latter are useful for dealing with more
        complicated data structures like the ipv4 route cache.
      
      - A list of all of the network namespaces so we can iterate over them.
      
      - A slab for the network namespace data structure allowing leaks
        to be spotted.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
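The two kinds of reference described in the list above can be sketched with two counters. This is a single-threaded userspace model with hypothetical names (`net_refs_sketch` and the `*_sketch` functions); the real code would need atomic counters, and the choice here to skip the free when weak references remain is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct net with both kinds of reference. */
struct net_refs_sketch {
    int count;         /* get/put: keeps the namespace from being freed */
    int use_count;     /* hold/release: weak, only checked at free time */
    bool freed;
    bool leak_warned;
};

/* Called when the last strong reference is dropped. */
static void free_net_sketch(struct net_refs_sketch *net)
{
    if (net->use_count != 0) {
        /* Weak references remain: warn and, in this sketch, skip the free. */
        fprintf(stderr, "namespace still has %d weak reference(s)\n",
                net->use_count);
        net->leak_warned = true;
        return;
    }
    net->freed = true;
}

static void get_net_sketch(struct net_refs_sketch *net)
{
    net->count++;
}

static void put_net_sketch(struct net_refs_sketch *net)
{
    if (--net->count == 0)
        free_net_sketch(net);
}

static void hold_net_sketch(struct net_refs_sketch *net)
{
    net->use_count++;
}

static void release_net_sketch(struct net_refs_sketch *net)
{
    net->use_count--;
}
```

A structure such as a cache entry would use hold/release: it does not keep the namespace alive, but a forgotten release shows up as a warning when the namespace is torn down.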