1. 22 Mar, 2016 1 commit
    • Arnd Bergmann's avatar
      cred/userns: define current_user_ns() as a function · 0335695d
      Arnd Bergmann authored
      The current_user_ns() macro currently returns &init_user_ns when user
      namespaces are disabled, and that causes several warnings when building
      with gcc-6.0 in code that compares the result of the macro to
      &init_user_ns itself:
        fs/xfs/xfs_ioctl.c: In function 'xfs_ioctl_setattr_check_projid':
        fs/xfs/xfs_ioctl.c:1249:22: error: self-comparison always evaluates to true [-Werror=tautological-compare]
          if (current_user_ns() == &init_user_ns)
      This is a legitimate warning in principle, but here it isn't really
      helpful, so I'm reprasing the definition in a way that shuts up the
      warning.  Apparently gcc only warns when comparing identical literals,
      but it can figure out that the result of an inline function can be
      identical to a constant expression in order to optimize a condition yet
      not warn about the fact that the condition is known at compile time.
      This is exactly what we want here, and it looks reasonable because we
      generally prefer inline functions over macros anyway.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarSerge Hallyn <serge.hallyn@canonical.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  2. 04 Sep, 2015 1 commit
    • Andy Lutomirski's avatar
      capabilities: ambient capabilities · 58319057
      Andy Lutomirski authored
      Credit where credit is due: this idea comes from Christoph Lameter with
      a lot of valuable input from Serge Hallyn.  This patch is heavily based
      on Christoph's patch.
      ===== The status quo =====
      On Linux, there are a number of capabilities defined by the kernel.  To
      perform various privileged tasks, processes can wield capabilities that
      they hold.
      Each task has four capability masks: effective (pE), permitted (pP),
      inheritable (pI), and a bounding set (X).  When the kernel checks for a
      capability, it checks pE.  The other capability masks serve to modify
      what capabilities can be in pE.
      Any task can remove capabilities from pE, pP, or pI at any time.  If a
      task has a capability in pP, it can add that capability to pE and/or pI.
      If a task has CAP_SETPCAP, then it can add any capability to pI, and it
      can remove capabilities from X.
      Tasks are not the only things that can have capabilities; files can also
      have capabilities.  A file can have no capabilty information at all [1].
      If a file has capability information, then it has a permitted mask (fP)
      and an inheritable mask (fI) as well as a single effective bit (fE) [2].
      File capabilities modify the capabilities of tasks that execve(2) them.
      A task that successfully calls execve has its capabilities modified for
      the file ultimately being excecuted (i.e.  the binary itself if that
      binary is ELF or for the interpreter if the binary is a script.) [3] In
      the capability evolution rules, for each mask Z, pZ represents the old
      value and pZ' represents the new value.  The rules are:
        pP' = (X & fP) | (pI & fI)
        pI' = pI
        pE' = (fE ? pP' : 0)
        X is unchanged
      For setuid binaries, fP, fI, and fE are modified by a moderately
      complicated set of rules that emulate POSIX behavior.  Similarly, if
      euid == 0 or ruid == 0, then fP, fI, and fE are modified differently
      (primary, fP and fI usually end up being the full set).  For nonroot
      users executing binaries with neither setuid nor file caps, fI and fP
      are empty and fE is false.
      As an extra complication, if you execute a process as nonroot and fE is
      set, then the "secure exec" rules are in effect: AT_SECURE gets set,
      LD_PRELOAD doesn't work, etc.
      This is rather messy.  We've learned that making any changes is
      dangerous, though: if a new kernel version allows an unprivileged
      program to change its security state in a way that persists cross
      execution of a setuid program or a program with file caps, this
      persistent state is surprisingly likely to allow setuid or file-capped
      programs to be exploited for privilege escalation.
      ===== The problem =====
      Capability inheritance is basically useless.
      If you aren't root and you execute an ordinary binary, fI is zero, so
      your capabilities have no effect whatsoever on pP'.  This means that you
      can't usefully execute a helper process or a shell command with elevated
      capabilities if you aren't root.
      On current kernels, you can sort of work around this by setting fI to
      the full set for most or all non-setuid executable files.  This causes
      pP' = pI for nonroot, and inheritance works.  No one does this because
      it's a PITA and it isn't even supported on most filesystems.
      If you try this, you'll discover that every nonroot program ends up with
      secure exec rules, breaking many things.
      This is a problem that has bitten many people who have tried to use
      capabilities for anything useful.
      ===== The proposed change =====
      This patch adds a fifth capability mask called the ambient mask (pA).
      pA does what most people expect pI to do.
      pA obeys the invariant that no bit can ever be set in pA if it is not
      set in both pP and pI.  Dropping a bit from pP or pI drops that bit from
      pA.  This ensures that existing programs that try to drop capabilities
      still do so, with a complication.  Because capability inheritance is so
      broken, setting KEEPCAPS, using setresuid to switch to nonroot uids, and
      then calling execve effectively drops capabilities.  Therefore,
      setresuid from root to nonroot conditionally clears pA unless
      SECBIT_NO_SETUID_FIXUP is set.  Processes that don't like this can
      re-add bits to pA afterwards.
      The capability evolution rules are changed:
        pA' = (file caps or setuid or setgid ? 0 : pA)
        pP' = (X & fP) | (pI & fI) | pA'
        pI' = pI
        pE' = (fE ? pP' : pA')
        X is unchanged
      If you are nonroot but you have a capability, you can add it to pA.  If
      you do so, your children get that capability in pA, pP, and pE.  For
      example, you can set pA = CAP_NET_BIND_SERVICE, and your children can
      automatically bind low-numbered ports.  Hallelujah!
      Unprivileged users can create user namespaces, map themselves to a
      nonzero uid, and create both privileged (relative to their namespace)
      and unprivileged process trees.  This is currently more or less
      impossible.  Hallelujah!
      You cannot use pA to try to subvert a setuid, setgid, or file-capped
      program: if you execute any such program, pA gets cleared and the
      resulting evolution rules are unchanged by this patch.
      Users with nonzero pA are unlikely to unintentionally leak that
      capability.  If they run programs that try to drop privileges, dropping
      privileges will still work.
      It's worth noting that the degree of paranoia in this patch could
      possibly be reduced without causing serious problems.  Specifically, if
      we allowed pA to persist across executing non-pA-aware setuid binaries
      and across setresuid, then, naively, the only capabilities that could
      leak as a result would be the capabilities in pA, and any attacker
      *already* has those capabilities.  This would make me nervous, though --
      setuid binaries that tried to privilege-separate might fail to do so,
      and putting CAP_DAC_READ_SEARCH or CAP_DAC_OVERRIDE into pA could have
      unexpected side effects.  (Whether these unexpected side effects would
      be exploitable is an open question.) I've therefore taken the more
      paranoid route.  We can revisit this later.
      An alternative would be to require PR_SET_NO_NEW_PRIVS before setting
      ambient capabilities.  I think that this would be annoying and would
      make granting otherwise unprivileged users minor ambient capabilities
      (CAP_NET_BIND_SERVICE or CAP_NET_RAW for example) much less useful than
      it is with this patch.
      ===== Footnotes =====
      [1] Files that are missing the "security.capability" xattr or that have
      unrecognized values for that xattr end up with has_cap set to false.
      The code that does that appears to be complicated for no good reason.
      [2] The libcap capability mask parsers and formatters are dangerously
      misleading and the documentation is flat-out wrong.  fE is *not* a mask;
      it's a single bit.  This has probably confused every single person who
      has tried to use file capabilities.
      [3] Linux very confusingly processes both the script and the interpreter
      if applicable, for reasons that elude me.  The results from thinking
      about a script's file capabilities and/or setuid bits are mostly
      Preliminary userspace code is here, but it needs updating:
      Here is a test program that can be used to verify the functionality
      (from Christoph):
       * Test program for the ambient capabilities. This program spawns a shell
       * that allows running processes with a defined set of capabilities.
       * (C) 2015 Christoph Lameter <cl@linux.com>
       * Released under: GPL v3 or later.
       * Compile using:
       *	gcc -o ambient_test ambient_test.o -lcap-ng
       * This program must have the following capabilities to run properly:
       * Permissions for CAP_NET_RAW, CAP_NET_ADMIN, CAP_SYS_NICE
       * A command to equip the binary with the right caps is:
       *	setcap cap_net_raw,cap_net_admin,cap_sys_nice+p ambient_test
       * To get a shell with additional caps that can be inherited by other processes:
       *	./ambient_test /bin/bash
       * Verifying that it works:
       * From the bash spawed by ambient_test run
       *	cat /proc/$$/status
       * and have a look at the capabilities.
      #include <stdlib.h>
      #include <stdio.h>
      #include <errno.h>
      #include <cap-ng.h>
      #include <sys/prctl.h>
      #include <linux/capability.h>
       * Definitions from the kernel header files. These are going to be removed
       * when the /usr/include files have these defined.
      #define PR_CAP_AMBIENT 47
      #define PR_CAP_AMBIENT_IS_SET 1
      #define PR_CAP_AMBIENT_RAISE 2
      #define PR_CAP_AMBIENT_LOWER 3
      #define PR_CAP_AMBIENT_CLEAR_ALL 4
      static void set_ambient_cap(int cap)
      	int rc;
      	rc = capng_update(CAPNG_ADD, CAPNG_INHERITABLE, cap);
      	if (rc) {
      		printf("Cannot add inheritable cap\n");
      	/* Note the two 0s at the end. Kernel checks for these */
      	if (prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, cap, 0, 0)) {
      		perror("Cannot set cap");
      int main(int argc, char **argv)
      	int rc;
      	printf("Ambient_test forking shell\n");
      	if (execv(argv[1], argv + 1))
      		perror("Cannot exec");
      	return 0;
      Signed-off-by: Christoph Lameter <cl@linux.com> # Original author
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Acked-by: default avatarSerge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Aaron Jones <aaronmdjones@gmail.com>
      Cc: Ted Ts'o <tytso@mit.edu>
      Cc: Andrew G. Morgan <morgan@kernel.org>
      Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
      Cc: Markku Savela <msa@moth.iki.fi>
      Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  3. 15 Apr, 2015 1 commit
    • Iulia Manda's avatar
      kernel: conditionally support non-root users, groups and capabilities · 2813893f
      Iulia Manda authored
      There are a lot of embedded systems that run most or all of their
      functionality in init, running as root:root.  For these systems,
      supporting multiple users is not necessary.
      This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for
      non-root users, non-root groups, and capabilities optional.  It is enabled
      under CONFIG_EXPERT menu.
      When this symbol is not defined, UID and GID are zero in any possible case
      and processes always have all capabilities.
      The following syscalls are compiled out: setuid, setregid, setgid,
      setreuid, setresuid, getresuid, setresgid, getresgid, setgroups,
      getgroups, setfsuid, setfsgid, capget, capset.
      Also, groups.c is compiled out completely.
      In kernel/capability.c, capable function was moved in order to avoid
      adding two ifdef blocks.
      This change saves about 25 KB on a defconfig build.  The most minimal
      kernels have total text sizes in the high hundreds of kB rather than
      low MB.  (The 25k goes down a bit with allnoconfig, but not that much.
      The kernel was booted in Qemu.  All the common functionalities work.
      Adding users/groups is not possible, failing with -ENOSYS.
      Bloat-o-meter output:
      add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650)
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarIulia Manda <iulia.manda21@gmail.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Acked-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Tested-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  4. 05 Dec, 2014 1 commit
    • Eric W. Biederman's avatar
      groups: Consolidate the setgroups permission checks · 7ff4d90b
      Eric W. Biederman authored
      Today there are 3 instances of setgroups and due to an oversight their
      permission checking has diverged.  Add a common function so that
      they may all share the same permission checking code.
      This corrects the current oversight in the current permission checks
      and adds a helper to avoid this in the future.
      A user namespace security fix will update this new helper, shortly.
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
  5. 17 Jul, 2014 1 commit
    • Jeff Layton's avatar
      nfsd: silence sparse warning about accessing credentials · ae4b884f
      Jeff Layton authored
      sparse says:
          fs/nfsd/auth.c:31:38: warning: incorrect type in argument 1 (different address spaces)
          fs/nfsd/auth.c:31:38:    expected struct cred const *cred
          fs/nfsd/auth.c:31:38:    got struct cred const [noderef] <asn:4>*real_cred
      Add a new accessor for the ->real_cred and use that to fetch the
      pointer. Accessing current->real_cred directly is actually quite safe
      since we know that they can't go away so this is mostly a cosmetic fixup
      to silence sparse.
      Signed-off-by: default avatarJeff Layton <jlayton@primarydata.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
  6. 03 Apr, 2014 1 commit
  7. 09 Jan, 2013 1 commit
  8. 20 Nov, 2012 1 commit
    • Eric W. Biederman's avatar
      userns: Kill task_user_ns · 4c44aaaf
      Eric W. Biederman authored
      The task_user_ns function hides the fact that it is getting the user
      namespace from struct cred on the task.  struct cred may go away as
      soon as the rcu lock is released.  This leads to a race where we
      can dereference a stale user namespace pointer.
      To make it obvious a struct cred is involved kill task_user_ns.
      To kill the race modify the users of task_user_ns to only
      reference the user namespace while the rcu lock is held.
      Cc: Kees Cook <keescook@chromium.org>
      Cc: James Morris <james.l.morris@oracle.com>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Acked-by: default avatarSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
  9. 02 Oct, 2012 1 commit
    • David Howells's avatar
      KEYS: Make the session and process keyrings per-thread · 3a50597d
      David Howells authored
      Make the session keyring per-thread rather than per-process, but still
      inherited from the parent thread to solve a problem with PAM and gdm.
      The problem is that join_session_keyring() will reject attempts to change the
      session keyring of a multithreaded program but gdm is now multithreaded before
      it gets to the point of starting PAM and running pam_keyinit to create the
      session keyring.  See:
      The reason that join_session_keyring() will only change the session keyring
      under a single-threaded environment is that it's hard to alter the other
      thread's credentials to effect the change in a multi-threaded program.  The
      problems are such as:
       (1) How to prevent two threads both running join_session_keyring() from
       (2) Another thread's credentials may not be modified directly by this process.
       (3) The number of threads is uncertain whilst we're not holding the
           appropriate spinlock, making preallocation slightly tricky.
       (4) We could use TIF_NOTIFY_RESUME and key_replace_session_keyring() to get
           another thread to replace its keyring, but that means preallocating for
           each thread.
      A reasonable way around this is to make the session keyring per-thread rather
      than per-process and just document that if you want a common session keyring,
      you must get it before you spawn any threads - which is the current situation
      Whilst we're at it, we can the process keyring behave in the same way.  This
      means we can clean up some of the ickyness in the creds code.
      Basically, after this patch, the session, process and thread keyrings are about
      inheritance rules only and not about sharing changes of keyring.
      Reported-by: default avatarMantas M. <grawity@gmail.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarRay Strode <rstrode@redhat.com>
  10. 31 May, 2012 1 commit
    • Oleg Nesterov's avatar
      cred: remove task_is_dead() from __task_cred() validation · 43e13cc1
      Oleg Nesterov authored
      Commit 8f92054e ("CRED: Fix __task_cred()'s lockdep check and banner
          add the following validation condition:
              task->exit_state >= 0
          to permit the access if the target task is dead and therefore
          unable to change its own credentials.
      OK, but afaics currently this can only help wait_task_zombie() which calls
      __task_cred() without rcu lock.
      Remove this validation and change wait_task_zombie() to use task_uid()
      instead.  This means we do rcu_read_lock() only to shut up the lockdep,
      but we already do the same in, say, wait_task_stopped().
      task_is_dead() should die, task->exit_state != 0 means that this task has
      passed exit_notify(), only do_wait-like code paths should use this.
      Unfortunately, we can't kill task_is_dead() right now, it has already
      acquired buggy users in drivers/staging.  The fix already exists.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: default avatarDavid Howells <dhowells@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  11. 03 May, 2012 3 commits
  12. 07 Apr, 2012 2 commits
  13. 05 Jan, 2012 1 commit
    • Eric Paris's avatar
      capabilities: remove task_ns_* functions · f1c84dae
      Eric Paris authored
      task_ in the front of a function, in the security subsystem anyway, means
      to me at least, that we are operating with that task as the subject of the
      security decision.  In this case what it means is that we are using current as
      the subject but we use the task to get the right namespace.  Who in the world
      would ever realize that's what task_ns_capability means just by the name?  This
      patch eliminates the task_ns functions entirely and uses the has_ns_capability
      function instead.  This means we explicitly open code the ns in question in
      the caller.  I think it makes the caller a LOT more clear what is going on.
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      Acked-by: default avatarSerge E. Hallyn <serge.hallyn@canonical.com>
  14. 08 Aug, 2011 2 commits
  15. 07 Aug, 2011 1 commit
  16. 26 Jul, 2011 1 commit
  17. 08 Jul, 2011 1 commit
  18. 19 May, 2011 1 commit
    • Randy Dunlap's avatar
      Create Documentation/security/, · d410fa4e
      Randy Dunlap authored
      move LSM-, credentials-, and keys-related files from Documentation/
        to Documentation/security/,
      add Documentation/security/00-INDEX, and
      update all occurrences of Documentation/<moved_file>
        to Documentation/security/<moved_file>.
  19. 13 May, 2011 1 commit
    • Serge E. Hallyn's avatar
      Cache user_ns in struct cred · 47a150ed
      Serge E. Hallyn authored
      If !CONFIG_USERNS, have current_user_ns() defined to (&init_user_ns).
      Get rid of _current_user_ns.  This requires nsown_capable() to be
      defined in capability.c rather than as static inline in capability.h,
      so do that.
      Request_key needs init_user_ns defined at current_user_ns if
      !CONFIG_USERNS, so forward-declare that in cred.h if !CONFIG_USERNS
      at current_user_ns() define.
      Compile-tested with and without CONFIG_USERNS.
      Signed-off-by: default avatarSerge E. Hallyn <serge.hallyn@canonical.com>
      [ This makes a huge performance difference for acl_permission_check(),
        up to 30%.  And that is one of the hottest kernel functions for loads
        that are pathname-lookup heavy.  ]
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  20. 23 Mar, 2011 1 commit
    • Serge E. Hallyn's avatar
      userns: security: make capabilities relative to the user namespace · 3486740a
      Serge E. Hallyn authored
      - Introduce ns_capable to test for a capability in a non-default
        user namespace.
      - Teach cap_capable to handle capabilities in a non-default
        user namespace.
      The motivation is to get to the unprivileged creation of new
      namespaces.  It looks like this gets us 90% of the way there, with
      only potential uid confusion issues left.
      I still need to handle getting all caps after creation but otherwise I
      think I have a good starter patch that achieves all of your goals.
      	11/05/2010: [serge] add apparmor
      	12/14/2010: [serge] fix capabilities to created user namespaces
      	Without this, if user serge creates a user_ns, he won't have
      	capabilities to the user_ns he created.  THis is because we
      	were first checking whether his effective caps had the caps
      	he needed and returning -EPERM if not, and THEN checking whether
      	he was the creator.  Reverse those checks.
      	12/16/2010: [serge] security_real_capable needs ns argument in !security case
      	01/11/2011: [serge] add task_ns_capable helper
      	01/11/2011: [serge] add nsown_capable() helper per Bastian Blank suggestion
      	02/16/2011: [serge] fix a logic bug: the root user is always creator of
      		    init_user_ns, but should not always have capabilities to
      		    it!  Fix the check in cap_capable().
      	02/21/2011: Add the required user_ns parameter to security_capable,
      		    fixing a compile failure.
      	02/23/2011: Convert some macros to functions as per akpm comments.  Some
      		    couldn't be converted because we can't easily forward-declare
      		    them (they are inline if !SECURITY, extern if SECURITY).  Add
      		    a current_user_ns function so we can use it in capability.h
      		    without #including cred.h.  Move all forward declarations
      		    together to the top of the #ifdef __KERNEL__ section, and use
      		    kernel-doc format.
      	02/23/2011: Per dhowells, clean up comment in cap_capable().
      	02/23/2011: Per akpm, remove unreachable 'return -EPERM' in cap_capable.
      (Original written and signed off by Eric;  latest, modified version
      acked by him)
      [akpm@linux-foundation.org: fix build]
      [akpm@linux-foundation.org: export current_user_ns() for ecryptfs]
      [serge.hallyn@canonical.com: remove unneeded extra argument in selinux's task_has_capability]
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarSerge E. Hallyn <serge.hallyn@canonical.com>
      Acked-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: default avatarDaniel Lezcano <daniel.lezcano@free.fr>
      Acked-by: default avatarDavid Howells <dhowells@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: default avatarSerge E. Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  21. 19 Aug, 2010 1 commit
  22. 29 Jul, 2010 2 commits
    • David Howells's avatar
      CRED: Fix __task_cred()'s lockdep check and banner comment · 8f92054e
      David Howells authored
      Fix __task_cred()'s lockdep check by removing the following validation
      as commit_creds() does not take the tasklist_lock, and nor do most of the
      functions that call it, so this check is pointless and it can prevent
      detection of the RCU lock not being held if the tasklist_lock is held.
      Instead, add the following validation condition:
      	task->exit_state >= 0
      to permit the access if the target task is dead and therefore unable to change
      its own credentials.
      Fix __task_cred()'s comment to:
       (1) discard the bit that says that the caller must prevent the target task
           from being deleted.  That shouldn't need saying.
       (2) Add a comment indicating the result of __task_cred() should not be passed
           directly to get_cred(), but rather than get_task_cred() should be used
      Also put a note into the documentation to enforce this point there too.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • David Howells's avatar
      CRED: Fix get_task_cred() and task_state() to not resurrect dead credentials · de09a977
      David Howells authored
      It's possible for get_task_cred() as it currently stands to 'corrupt' a set of
      credentials by incrementing their usage count after their replacement by the
      task being accessed.
      What happens is that get_task_cred() can race with commit_creds():
      	TASK_1			TASK_2			RCU_CLEANER
      	__cred = __task_cred(TASK_2)
      				old_cred = TASK_2->real_cred
      				TASK_2->real_cred = ...
      		[__cred->usage == 0]
      		[__cred->usage == 1]
      							[__cred->usage == 1]
      However, since a tasks credentials are generally not changed very often, we can
      reasonably make use of a loop involving reading the creds pointer and using
      atomic_inc_not_zero() to attempt to increment it if it hasn't already hit zero.
      If successful, we can safely return the credentials in the knowledge that, even
      if the task we're accessing has released them, they haven't gone to the RCU
      cleanup code.
      We then change task_state() in procfs to use get_task_cred() rather than
      calling get_cred() on the result of __task_cred(), as that suffers from the
      same problem.
      Without this change, a BUG_ON in __put_cred() or in put_cred_rcu() can be
      tripped when it is noticed that the usage count is not zero as it ought to be,
      for example:
      kernel BUG at kernel/cred.c:168!
      invalid opcode: 0000 [#1] SMP
      last sysfs file: /sys/kernel/mm/ksm/run
      CPU 0
      Pid: 2436, comm: master Not tainted #1 0HR330/OptiPlex
      RIP: 0010:[<ffffffff81069881>]  [<ffffffff81069881>] __put_cred+0xc/0x45
      RSP: 0018:ffff88019e7e9eb8  EFLAGS: 00010202
      RAX: 0000000000000001 RBX: ffff880161514480 RCX: 00000000ffffffff
      RDX: 00000000ffffffff RSI: ffff880140c690c0 RDI: ffff880140c690c0
      RBP: ffff88019e7e9eb8 R08: 00000000000000d0 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000040 R12: ffff880140c690c0
      R13: ffff88019e77aea0 R14: 00007fff336b0a5c R15: 0000000000000001
      FS:  00007f12f50d97c0(0000) GS:ffff880007400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f8f461bc000 CR3: 00000001b26ce000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process master (pid: 2436, threadinfo ffff88019e7e8000, task ffff88019e77aea0)
       ffff88019e7e9ec8 ffffffff810698cd ffff88019e7e9ef8 ffffffff81069b45
      <0> ffff880161514180 ffff880161514480 ffff880161514180 0000000000000000
      <0> ffff88019e7e9f28 ffffffff8106aace 0000000000000001 0000000000000246
      Call Trace:
       [<ffffffff810698cd>] put_cred+0x13/0x15
       [<ffffffff81069b45>] commit_creds+0x16b/0x175
       [<ffffffff8106aace>] set_current_groups+0x47/0x4e
       [<ffffffff8106ac89>] sys_setgroups+0xf6/0x105
       [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
      Code: 48 8d 71 ff e8 7e 4e 15 00 85 c0 78 0b 8b 75 ec 48 89 df e8 ef 4a 15 00
      48 83 c4 18 5b c9 c3 55 8b 07 8b 07 48 89 e5 85 c0 74 04 <0f> 0b eb fe 65 48 8b
      04 25 00 cc 00 00 48 3b b8 58 04 00 00 75
      RIP  [<ffffffff81069881>] __put_cred+0xc/0x45
       RSP <ffff88019e7e9eb8>
      ---[ end trace df391256a100ebdd ]---
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  23. 27 May, 2010 1 commit
  24. 04 Mar, 2010 1 commit
  25. 25 Feb, 2010 1 commit
    • Paul E. McKenney's avatar
      sched: Use lockdep-based checking on rcu_dereference() · d11c563d
      Paul E. McKenney authored
      Update the rcu_dereference() usages to take advantage of the new
      lockdep-based checking.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-6-git-send-email-paulmck@linux.vnet.ibm.com>
      [ -v2: fix allmodconfig missing symbol export build failure on x86 ]
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
  26. 23 Sep, 2009 1 commit
    • Andrew Morton's avatar
      include/linux/cred.h: fix build · 74908a00
      Andrew Morton authored
      mips allmodconfig:
      include/linux/cred.h: In function `creds_are_invalid':
      include/linux/cred.h:187: error: `PAGE_SIZE' undeclared (first use in this function)
      include/linux/cred.h:187: error: (Each undeclared identifier is reported only once
      include/linux/cred.h:187: error: for each function it appears in.)
      commit b6dff3ec
      Author:     David Howells <dhowells@redhat.com>
      AuthorDate: Fri Nov 14 10:39:16 2008 +1100
      Commit:     James Morris <jmorris@namei.org>
      CommitDate: Fri Nov 14 10:39:16 2008 +1100
          CRED: Separate task security context from task_struct
      I think.
      It's way too large to be inlined anyway.
      Dunno if this needs an EXPORT_SYMBOL() yet.
      Cc: David Howells <dhowells@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
  27. 13 Sep, 2009 1 commit
  28. 02 Sep, 2009 2 commits
    • David Howells's avatar
      KEYS: Add a keyctl to install a process's session keyring on its parent [try #6] · ee18d64c
      David Howells authored
      Add a keyctl to install a process's session keyring onto its parent.  This
      replaces the parent's session keyring.  Because the COW credential code does
      not permit one process to change another process's credentials directly, the
      change is deferred until userspace next starts executing again.  Normally this
      will be after a wait*() syscall.
      To support this, three new security hooks have been provided:
      cred_alloc_blank() to allocate unset security creds, cred_transfer() to fill in
      the blank security creds and key_session_to_parent() - which asks the LSM if
      the process may replace its parent's session keyring.
      The replacement may only happen if the process has the same ownership details
      as its parent, and the process has LINK permission on the session keyring, and
      the session keyring is owned by the process, and the LSM permits it.
      Note that this requires alteration to each architecture's notify_resume path.
      This has been done for all arches barring blackfin, m68k* and xtensa, all of
      which need assembly alteration to support TIF_NOTIFY_RESUME.  This allows the
      replacement to be performed at the point the parent process resumes userspace
      This allows the userspace AFS pioctl emulation to fully emulate newpag() and
      the VIOCSETTOK and VIOCSETTOK2 pioctls, all of which require the ability to
      alter the parent process's PAG membership.  However, since kAFS doesn't use
      PAGs per se, but rather dumps the keys into the session keyring, the session
      keyring of the parent must be replaced if, for example, VIOCSETTOK is passed
      the newpag flag.
      This can be tested with the following program:
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <keyutils.h>
      	#define KEYCTL_SESSION_TO_PARENT	18
      	#define OSERROR(X, S) do { if ((long)(X) == -1) { perror(S); exit(1); } } while(0)
      	int main(int argc, char **argv)
      		key_serial_t keyring, key;
      		long ret;
      		keyring = keyctl_join_session_keyring(argv[1]);
      		OSERROR(keyring, "keyctl_join_session_keyring");
      		key = add_key("user", "a", "b", 1, keyring);
      		OSERROR(key, "add_key");
      		ret = keyctl(KEYCTL_SESSION_TO_PARENT);
      		return 0;
      Compiled and linked with -lkeyutils, you should see something like:
      	[dhowells@andromeda ~]$ keyctl show
      	Session Keyring
      	       -3 --alswrv   4043  4043  keyring: _ses
      	355907932 --alswrv   4043    -1   \_ keyring: _uid.4043
      	[dhowells@andromeda ~]$ /tmp/newpag
      	[dhowells@andromeda ~]$ keyctl show
      	Session Keyring
      	       -3 --alswrv   4043  4043  keyring: _ses
      	1055658746 --alswrv   4043  4043   \_ user: a
      	[dhowells@andromeda ~]$ /tmp/newpag hello
      	[dhowells@andromeda ~]$ keyctl show
      	Session Keyring
      	       -3 --alswrv   4043  4043  keyring: hello
      	340417692 --alswrv   4043  4043   \_ user: a
      Where the test program creates a new session keyring, sticks a user key named
      'a' into it and then installs it on its parent.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
    • David Howells's avatar
      CRED: Add some configurable debugging [try #6] · e0e81739
      David Howells authored
      Add a config option (CONFIG_DEBUG_CREDENTIALS) to turn on some debug checking
      for credential management.  The additional code keeps track of the number of
      pointers from task_structs to any given cred struct, and checks to see that
      this number never exceeds the usage count of the cred struct (which includes
      all references, not just those from task_structs).
      Furthermore, if SELinux is enabled, the code also checks that the security
      pointer in the cred struct is never seen to be invalid.
      This attempts to catch the bug whereby inode_has_perm() faults in an nfsd
      kernel thread on seeing cred->security be a NULL pointer (it appears that the
      credential struct has been previously released):
      	http://www.kerneloops.org/oops.php?number=252883Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
  29. 19 Jul, 2009 1 commit
  30. 29 May, 2009 1 commit
  31. 24 Nov, 2008 1 commit
    • Serge Hallyn's avatar
      User namespaces: set of cleanups (v2) · 18b6e041
      Serge Hallyn authored
      The user_ns is moved from nsproxy to user_struct, so that a struct
      cred by itself is sufficient to determine access (which it otherwise
      would not be).  Corresponding ecryptfs fixes (by David Howells) are
      here as well.
      Fix refcounting.  The following rules now apply:
              1. The task pins the user struct.
              2. The user struct pins its user namespace.
              3. The user namespace pins the struct user which created it.
      User namespaces are cloned during copy_creds().  Unsharing a new user_ns
      is no longer possible.  (We could re-add that, but it'll cause code
      duplication and doesn't seem useful if PAM doesn't need to clone user
      When a user namespace is created, its first user (uid 0) gets empty
      keyrings and a clean group_info.
      This incorporates a previous patch by David Howells.  Here
      is his original patch description:
      >I suggest adding the attached incremental patch.  It makes the following
      > (1) Provides a current_user_ns() macro to wrap accesses to current's user
      >     namespace.
      > (2) Fixes eCryptFS.
      > (3) Renames create_new_userns() to create_user_ns() to be more consistent
      >     with the other associated functions and because the 'new' in the name is
      >     superfluous.
      > (4) Moves the argument and permission checks made for CLONE_NEWUSER to the
      >     beginning of do_fork() so that they're done prior to making any attempts
      >     at allocation.
      > (5) Calls create_user_ns() after prepare_creds(), and gives it the new creds
      >     to fill in rather than have it return the new root user.  I don't imagine
      >     the new root user being used for anything other than filling in a cred
      >     struct.
      >     This also permits me to get rid of a get_uid() and a free_uid(), as the
      >     reference the creds were holding on the old user_struct can just be
      >     transferred to the new namespace's creator pointer.
      > (6) Makes create_user_ns() reset the UIDs and GIDs of the creds under
      >     preparation rather than doing it in copy_creds().
      >Signed-off-by: David Howells <dhowells@redhat.com>
      	Oct 20: integrate dhowells comments
      		1. leave thread_keyring alone
      		2. use current_user_ns() in set_user()
      Signed-off-by: default avatarSerge Hallyn <serue@us.ibm.com>
  32. 13 Nov, 2008 3 commits
    • David Howells's avatar
      CRED: Allow kernel services to override LSM settings for task actions · 3a3b7ce9
      David Howells authored
      Allow kernel services to override LSM settings appropriate to the actions
      performed by a task by duplicating a set of credentials, modifying it and then
      using task_struct::cred to point to it when performing operations on behalf of
      a task.
      This is used, for example, by CacheFiles which has to transparently access the
      cache on behalf of a process that thinks it is doing, say, NFS accesses with a
      potentially inappropriate (with respect to accessing the cache) set of
      This patch provides two LSM hooks for modifying a task security record:
       (*) security_kernel_act_as() which allows modification of the security datum
           with which a task acts on other objects (most notably files).
       (*) security_kernel_create_files_as() which allows modification of the
           security datum that is used to initialise the security data on a file that
           a task creates.
      The patch also provides four new credentials handling functions, which wrap the
      LSM functions:
       (1) prepare_kernel_cred()
           Prepare a set of credentials for a kernel service to use, based either on
           a daemon's credentials or on init_cred.  All the keyrings are cleared.
       (2) set_security_override()
           Set the LSM security ID in a set of credentials to a specific security
           context, assuming permission from the LSM policy.
       (3) set_security_override_from_ctx()
           As (2), but takes the security context as a string.
       (4) set_create_files_as()
           Set the file creation LSM security ID in a set of credentials to be the
           same as that on a particular inode.
      Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [Smack changes]
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
    • David Howells's avatar
      CRED: Differentiate objective and effective subjective credentials on a task · 3b11a1de
      David Howells authored
      Differentiate the objective and real subjective credentials from the effective
      subjective credentials on a task by introducing a second credentials pointer
      into the task_struct.
      task_struct::real_cred then refers to the objective and apparent real
      subjective credentials of a task, as perceived by the other tasks in the
      task_struct::cred then refers to the effective subjective credentials of a
      task, as used by that task when it's actually running.  These are not visible
      to the other tasks in the system.
      __task_cred(task) then refers to the objective/real credentials of the task in
      current_cred() refers to the effective subjective credentials of the current
      prepare_creds() uses the objective creds as a base and commit_creds() changes
      both pointers in the task_struct (indeed commit_creds() requires them to be the
      override_creds() and revert_creds() change the subjective creds pointer only,
      and the former returns the old subjective creds.  These are used by NFSD,
      faccessat() and do_coredump(), and will by used by CacheFiles.
      In SELinux, current_has_perm() is provided as an alternative to
      task_has_perm().  This uses the effective subjective context of current,
      whereas task_has_perm() uses the objective/real context of the subject.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
    • David Howells's avatar
      CRED: Documentation · 98870ab0
      David Howells authored
      Document credentials and the new credentials API.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>