  1. 16 Feb, 2016 1 commit
    • cgroup: introduce cgroup namespaces · a79a908f
      Aditya Kali authored
      Introduce the ability to create new cgroup namespace. The newly created
      cgroup namespace remembers the cgroup of the process at the point
      of creation of the cgroup namespace (referred to as cgroupns-root).
      The main purpose of cgroup namespace is to virtualize the contents
      of /proc/self/cgroup file. Processes inside a cgroup namespace
      are only able to see paths relative to their namespace root
      (unless they are moved outside of their cgroupns-root, at which point
       they will see a relative path from their cgroupns-root).
      For a correctly set up container this enables container tools
      (like libcontainer, lxc, lmctfy, etc.) to create completely virtualized
      containers without leaking system level cgroup hierarchy to the task.
      This patch only implements the 'unshare' part of the cgroupns.
      Signed-off-by: Aditya Kali <adityakali@google.com>
      Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
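The path virtualization described in the cgroup namespace commit can be illustrated with a small userspace model. This is only a sketch of the visible behaviour, not the kernel code: the function name `virtualized_path` and the example cgroup paths are assumptions made for illustration.

```python
import posixpath


def virtualized_path(cgroup_path, cgroupns_root):
    """Return the cgroup path as a task inside the namespace would see it.

    Model only: paths are absolute cgroup paths in the host hierarchy.
    """
    rel = posixpath.relpath(cgroup_path, cgroupns_root)
    if rel == ".":
        # The task sits exactly at its cgroupns-root.
        return "/"
    # Inside the root this yields "/sub"; outside it yields "/../other".
    return "/" + rel


if __name__ == "__main__":
    root = "/batchjobs/container_id1"
    for path in (root, root + "/sub", "/batchjobs/container_id2"):
        print(path, "->", virtualized_path(path, root))
```

A task moved outside its cgroupns-root still gets a path relative to that root, which is why the last case contains a `..` component.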
  2. 20 Jan, 2016 1 commit
    • ptrace: use fsuid, fsgid, effective creds for fs access checks · caaee623
      Jann Horn authored
      By checking the effective credentials instead of the real UID / permitted
      capabilities, ensure that the calling process actually intended to use its
      credentials.
      
      To ensure that all ptrace checks use the correct caller credentials (e.g.
      in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
      flag), use two new flags and require one of them to be set.
      
      The problem was that when a privileged task had temporarily dropped its
      privileges, e.g. by calling setreuid(0, user_uid), with the intent to
      perform subsequent syscalls with the credentials of a user, it still passed
      ptrace access checks that the user would not be able to pass.
      
      While an attacker should not be able to convince the privileged task to
      perform a ptrace() syscall, this is a problem because the ptrace access
      check is reused for things in procfs.
      
      In particular, the following somewhat interesting procfs entries only rely
      on ptrace access checks:
      
       /proc/$pid/stat - uses the check for determining whether pointers
           should be visible, useful for bypassing ASLR
       /proc/$pid/maps - also useful for bypassing ASLR
       /proc/$pid/cwd - useful for gaining access to restricted
           directories that contain files with lax permissions, e.g. in
           this scenario:
           lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
           drwx------ root root /root
           drwxr-xr-x root root /root/foobar
           -rw-r--r-- root root /root/foobar/secret
      
      Therefore, on a system where a root-owned mode 6755 binary changes its
      effective credentials as described and then dumps a user-specified file,
      this could be used by an attacker to reveal the memory layout of root's
      processes or reveal the contents of files he is not allowed to access
      (through /proc/$pid/cwd).
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: Jann Horn <jann@thejh.net>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
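The difference between checking real and filesystem credentials in the ptrace commit can be modelled in a few lines. This is a heavily simplified sketch: the flag values, the `Creds` class, and the rule that UID 0 always passes are assumptions of the model, and the real check also considers GIDs, capabilities, and LSM hooks.

```python
from dataclasses import dataclass

# Flag values are chosen for this model only.
PTRACE_MODE_READ = 0x01
PTRACE_MODE_FSCREDS = 0x08
PTRACE_MODE_REALCREDS = 0x10


@dataclass
class Creds:
    ruid: int
    euid: int
    fsuid: int


def setreuid(creds, ruid, euid):
    # Simplification: the fsuid follows the effective UID.
    return Creds(ruid=ruid, euid=euid, fsuid=euid)


def may_access(caller, target_uid, mode):
    use_fs = bool(mode & PTRACE_MODE_FSCREDS)
    use_real = bool(mode & PTRACE_MODE_REALCREDS)
    if use_fs == use_real:
        # Neither or both flags set: the caller did not say which creds apply.
        raise ValueError("exactly one of FSCREDS/REALCREDS must be set")
    caller_uid = caller.fsuid if use_fs else caller.ruid
    # Model: UID 0 stands in for holding the relevant capability.
    return caller_uid == 0 or caller_uid == target_uid


if __name__ == "__main__":
    dropped = setreuid(Creds(0, 0, 0), 0, 1000)  # setreuid(0, user_uid)
    print(may_access(dropped, 0, PTRACE_MODE_READ | PTRACE_MODE_REALCREDS))
    print(may_access(dropped, 0, PTRACE_MODE_READ | PTRACE_MODE_FSCREDS))
```

In this model the task that dropped to UID 1000 still passes a real-credentials check against a root-owned target, but fails once the check uses the filesystem credentials, which is the behaviour the commit wants for procfs accesses.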
  3. 30 Dec, 2015 1 commit
  4. 08 Dec, 2015 1 commit
    • replace ->follow_link() with new method that could stay in RCU mode · 6b255391
      Al Viro authored
      new method: ->get_link(); replacement of ->follow_link().  The differences
      are:
      	* inode and dentry are passed separately
      	* might be called both in RCU and non-RCU mode;
      	  the former is indicated by passing it a NULL dentry.
      	* when called that way it isn't allowed to block
      	  and should return ERR_PTR(-ECHILD) if it needs to be called
      	  in non-RCU mode.
      
      It's a flagday change - the old method is gone, all in-tree instances
      converted.  Conversion isn't hard; that said, so far very few instances
      do not immediately bail out when called in RCU mode.  That'll change
      in the next commits.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
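The ->get_link() calling convention described in the commit above can be sketched as a userspace model. Everything here is illustrative: `Inode`, `get_link`, and `resolve` are hypothetical Python stand-ins, a `None` dentry plays the role of the NULL dentry, and raising `OSError(ECHILD)` stands in for returning ERR_PTR(-ECHILD).

```python
import errno


class Inode:
    def __init__(self, body_on_disk):
        self.body_on_disk = body_on_disk
        self.cached_body = None  # filled in by the first blocking lookup


def get_link(dentry, inode):
    if dentry is None:
        # RCU mode: must not block, so only a cached body can be returned.
        if inode.cached_body is None:
            raise OSError(errno.ECHILD, "retry in non-RCU mode")
        return inode.cached_body
    # Non-RCU mode: allowed to block, e.g. to read the body from storage.
    if inode.cached_body is None:
        inode.cached_body = inode.body_on_disk
    return inode.cached_body


def resolve(dentry, inode):
    """Try RCU mode first, fall back to non-RCU mode on -ECHILD."""
    try:
        return get_link(None, inode), "rcu"
    except OSError as exc:
        if exc.errno != errno.ECHILD:
            raise
    return get_link(dentry, inode), "ref-walk"
```

In this model the first lookup of an uncached symlink falls back to the blocking path, and later lookups can be answered without leaving RCU mode.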
  5. 10 May, 2015 2 commits
    • don't pass nameidata to ->follow_link() · 6e77137b
      Al Viro authored
      its only use is getting passed to nd_jump_link(), which can obtain
      it from current->nameidata
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • new ->follow_link() and ->put_link() calling conventions · 680baacb
      Al Viro authored
      a) instead of storing the symlink body (via nd_set_link()) and returning
      an opaque pointer later passed to ->put_link(), ->follow_link() _stores_
      that opaque pointer (into void * passed by address by caller) and returns
      the symlink body.  Returning ERR_PTR() on error, NULL on jump (procfs magic
      symlinks) and pointer to symlink body for normal symlinks.  Stored pointer
      is ignored in all cases except the last one.
      
      Storing NULL for opaque pointer (or not storing it at all) means no call
      of ->put_link().
      
      b) the body used to be passed to ->put_link() implicitly (via nameidata).
      Now only the opaque pointer is.  In the cases when we used the symlink body
      to free stuff, ->follow_link() now should store it as opaque pointer in addition
      to returning it.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  6. 15 Apr, 2015 1 commit
  7. 10 Dec, 2014 2 commits
    • kill proc_ns completely · 3d3d35b1
      Al Viro authored
      procfs inodes need only the ns_ops part; nsfs inodes don't need it at all
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • take the targets of /proc/*/ns/* symlinks to separate fs · e149ed2b
      Al Viro authored
      New pseudo-filesystem: nsfs.  Targets of /proc/*/ns/* live there now.
      It's not mountable (not even registered, so it's not in /proc/filesystems,
      etc.).  Files on it *are* bindable - we explicitly permit that in do_loopback().
      
      This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
      get_proc_ns() is a macro now (it's simply returning ->i_private; would
      have been an inline, if not for header ordering headache).
      proc_ns_inode() is an ex-parrot.  The interface used in procfs is
      ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).
      
      Dentries and inodes are never hashed; a non-counting reference to dentry
      is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
      if present.  See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
      of that mechanism.
      
      As the result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
      it does nd_jump_link() on a consistent <vfsmount,dentry> pair it gets
      from ns_get_path().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  8. 04 Dec, 2014 2 commits
  9. 01 Apr, 2014 1 commit
  10. 15 Nov, 2013 1 commit
  11. 29 Jun, 2013 2 commits
  12. 01 May, 2013 1 commit
  13. 09 Mar, 2013 1 commit
    • proc: Use nd_jump_link in proc_ns_follow_link · db04dc67
      Eric W. Biederman authored
      Update proc_ns_follow_link to use nd_jump_link instead of just
      manually updating nd.path.dentry.
      
      This fixes the BUG_ON(nd->inode != parent->d_inode) reported by Dave
      Jones and reproduced trivially with mkdir /proc/self/ns/uts/a.
      
      Sigh, it looks like the VFS change to require use of nd_jump_link
      happened while proc_ns_follow_link was baking and since the common case
      of proc_ns_follow_link continued to work without problems the need for
      making this change was overlooked.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  14. 20 Nov, 2012 3 commits
    • proc: Usable inode numbers for the namespace file descriptors. · 98f842e6
      Eric W. Biederman authored
      Assign a unique proc inode to each namespace, and use that
      inode number to ensure we only allocate at most one proc
      inode for every namespace in proc.
      
      A single proc inode per namespace allows userspace to test
      to see if two processes are in the same namespace.
      
      This has been a long requested feature and only blocked because
      a naive implementation would put the id in a global space and
      would ultimately require having a namespace for the names of
      namespaces, making migration and certain virtualization tricks
      impossible.
      
      We still don't have per superblock inode numbers for proc, which
      appears necessary for application unaware checkpoint/restart and
      migrations (if the application is using namespace file descriptors)
      but that is now allowed by the design if it becomes important.
      
      I have preallocated the ipc and uts initial proc inode numbers so
      their structures can be statically initialized.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
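The userspace test that the inode-number commit enables, checking whether two processes share a namespace, can be sketched as follows. The helper names `ns_path` and `same_namespace` are made up for this example; the comparison of device and inode numbers is the part that matters.

```python
import os


def ns_path(pid, ns_type):
    """Build the procfs path of a namespace file, e.g. /proc/1234/ns/uts."""
    return f"/proc/{pid}/ns/{ns_type}"


def same_namespace(path_a, path_b):
    """True when both paths resolve to the same inode on the same device."""
    a = os.stat(path_a)
    b = os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


if __name__ == "__main__":
    # Needs a Linux /proc and permission to inspect the other process.
    print(same_namespace(ns_path("self", "uts"), ns_path(os.getppid(), "uts")))
```

Because each namespace has exactly one inode, equal device and inode numbers mean the two processes are in the same namespace.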
    • proc: Fix the namespace inode permission checks. · bf056bfa
      Eric W. Biederman authored
      Change the proc namespace files into symlinks so that
      we won't cache the dentries for the namespace files
      which can bypass the ptrace_may_access checks.
      
      To support the symlinks create an additional namespace
      inode with its own set of operations distinct from the
      proc pid inode and dentry methods as those no longer
      make sense.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • userns: Implent proc namespace operations · cde1975b
      Eric W. Biederman authored
      This allows entering a user namespace, and the ability
      to store a reference to a user namespace with a bind
      mount.
      
      Addition of missing userns_ns_put in userns_install
      from Gao feng <gaofeng@cn.fujitsu.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  15. 19 Nov, 2012 2 commits
    • vfs: Add setns support for the mount namespace · 8823c079
      Eric W. Biederman authored
      setns support for the mount namespace is a little tricky as an
      arbitrary decision must be made about what to set fs->root and
      fs->pwd to, as there is no expectation of a relationship between
      the two mount namespaces.  Therefore I arbitrarily find the root
      mount point, and follow every mount on top of it to find the top
      of the mount stack.  Then I set fs->root and fs->pwd to that
      location.  The topmost root of the mount stack seems like a
      reasonable place to be.
      
      Bind mount support for the mount namespace inodes has the
      possibility of creating circular dependencies between mount
      namespaces.  Circular dependencies can result in loops that
      prevent mount namespaces from ever being freed.  I avoid
      creating those circular dependencies by adding a sequence number
      to the mount namespace and requiring that all bind mounts be of a
      younger mount namespace into an older mount namespace.
      
      Add a helper function proc_ns_inode so it is possible to
      detect when we are attempting to bind mount a namespace inode.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
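The sequence-number rule from the mount namespace commit can be modelled in a few lines. This is a sketch of the ordering argument only: `MountNamespace` and `may_bind_mount` are hypothetical names, and the counter stands in for whatever the kernel uses to number namespaces at creation.

```python
import itertools

_next_seq = itertools.count(1)


class MountNamespace:
    def __init__(self):
        # Later-created (younger) namespaces get strictly larger numbers.
        self.seq = next(_next_seq)


def may_bind_mount(mounted_ns, into_ns):
    """Allow binding a namespace file only into an older namespace."""
    return mounted_ns.seq > into_ns.seq


if __name__ == "__main__":
    older, younger = MountNamespace(), MountNamespace()
    print(may_bind_mount(younger, older))  # allowed
    print(may_bind_mount(older, younger))  # refused: could close a loop
```

Every permitted bind mount makes an older namespace hold a reference to a strictly younger one, so following references always increases the sequence number and can never return to the starting namespace.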
    • pidns: Add setns support · 57e8391d
      Eric W. Biederman authored
      - Pid namespaces are designed to be inescapable so verify that the
        passed in pid namespace is a child of the currently active
        pid namespace or the currently active pid namespace itself.
      
        Allowing the currently active pid namespace is important so
        the effects of an earlier setns can be cancelled.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
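The rule in the pid namespace commit, that the target must be the active namespace or one of its descendants, amounts to an ancestor walk. The model below is a sketch under that reading; `PidNamespace` and `may_enter` are names invented for the example.

```python
class PidNamespace:
    def __init__(self, parent=None):
        self.parent = parent
        self.level = 0 if parent is None else parent.level + 1


def may_enter(active, target):
    """True if target is the active pid namespace or nested below it."""
    if target.level < active.level:
        return False  # target is closer to the root: escaping is refused
    ancestor = target
    while ancestor.level > active.level:
        ancestor = ancestor.parent
    return ancestor is active


if __name__ == "__main__":
    init_ns = PidNamespace()
    child = PidNamespace(init_ns)
    print(may_enter(init_ns, child))  # descendant: allowed
    print(may_enter(child, init_ns))  # parent: refused
```

Allowing `target is active` matches the commit's note that re-selecting the currently active pid namespace must work so that an earlier setns can be cancelled.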
  16. 14 Jul, 2012 2 commits
  17. 28 Mar, 2012 1 commit
  18. 23 Mar, 2012 1 commit
  19. 03 Jan, 2012 1 commit
  20. 15 Jun, 2011 1 commit
  21. 24 May, 2011 1 commit
  22. 10 May, 2011 4 commits