    audit: try harder to send to auditd upon netlink failure · 32a1dbae
    Richard Guy Briggs authored
    There are several reports of the kernel losing contact with auditd when
    it is, in fact, still running.  When this happens, kernel syslogs show:
    	"audit: *NO* daemon at audit_pid=<pid>"
    although auditd is still running, and is apparently happy, listening on
    the netlink socket. The pid in the "*NO* daemon" message matches the pid
    of the running auditd process.  Restarting auditd solves this.
    
    The problem appears to happen randomly, and doesn't seem to be strongly
    correlated to the rate of audit events being logged.  The problem
    happens fairly regularly (every few days), but has not yet been
    reproduced on demand.
    
    On production kernels, BUG_ON() is a no-op, so any send error, not only
    the expected one, will trigger this.
    
    Commit 34eab0a7 ("audit: prevent an older auditd shutdown from
    orphaning a newer auditd startup") eliminates one possible cause.  This
    isn't the case here, since the PID in the error message and the PID of
    the running auditd match.
    
    The primary expected cause of error here is -ECONNREFUSED, returned when
    the audit daemon goes away and netlink_getsockbyportid() can't find the
    auditd portid entry in the netlink audit table (or there is no receive
    function).  If -EPERM is returned, that situation isn't likely to be
    resolved in a timely fashion without administrator intervention.  In
    both cases, reset the audit_pid.  This does not rule out a race
    condition.  SELinux is expected to return zero since this isn't an INET
    or INET6 socket.  Other LSMs may have other return codes.  Log the error
    code for better diagnosis in the future.
    
    In the case of -ENOMEM, the situation could be temporary, based on local
    or general availability of buffers.  -EAGAIN should never happen since
    the netlink audit (kernel) socket's send timeout is set to
    MAX_SCHEDULE_TIMEOUT.
    -ERESTARTSYS and -EINTR are not expected since this kernel thread is not
    expected to receive signals.  In these cases (or any other unexpected
    ones for now), report the error and re-schedule the thread, retrying up
    to 5 times.
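    A minimal userspace sketch of the error handling described above
    follows.  It assumes a hypothetical send callback in place of
    netlink_unicast() and a plain global in place of the kernel's
    audit_pid; send_with_retry(), AUDITD_RETRIES, fake_sock and fake_send()
    are illustrative names, not the patch's code.  It resets the pid at
    once on -ECONNREFUSED or -EPERM, and otherwise gives up after five
    failed attempts in total (one reading of "retrying up to 5 times").
    Where the kernel thread would re-schedule itself, the sketch simply
    loops.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define AUDITD_RETRIES 5 /* hypothetical name for the limit of 5 attempts */

/* Hypothetical stand-in for netlink_unicast(): >= 0 on success,
 * negative errno on failure. */
typedef int (*send_fn)(void *ctx);

static int audit_pid = 1234; /* hypothetical pid of the running auditd */

/* Returns 0 once a send succeeds, or the last negative errno after
 * resetting audit_pid. */
static int send_with_retry(send_fn send, void *ctx)
{
	int attempts = 0;
	int err;

	assert(send != NULL);
	for (;;) {
		err = send(ctx);
		if (err >= 0)
			return 0;
		/* Log the error code for better diagnosis later. */
		fprintf(stderr, "send to audit_pid=%d returned error: %d\n",
			audit_pid, err);
		/* Daemon gone, permission denied, or out of retries: reset. */
		if (err == -ECONNREFUSED || err == -EPERM ||
		    ++attempts >= AUDITD_RETRIES) {
			fprintf(stderr, "audit_pid=%d reset\n", audit_pid);
			audit_pid = 0;
			return err;
		}
		/* -ENOMEM, -EAGAIN, -EINTR or anything unexpected: the kernel
		 * thread would re-schedule here; this sketch retries at once. */
	}
}

/* Hypothetical fake sender: fails with `err` for the first `failures`
 * calls, then succeeds. */
struct fake_sock {
	int err;
	int failures;
	int calls;
};

static int fake_send(void *ctx)
{
	struct fake_sock *s = ctx;

	s->calls++;
	return s->calls <= s->failures ? s->err : 0;
}
```

    With this split, a transient -ENOMEM that clears on the third call is
    delivered without touching audit_pid, while a single -ECONNREFUSED
    resets it immediately.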
    
    v2:
    	Removed BUG_ON().
    	Moved comma in pr_*() statements.
    	Removed audit_strerror() text.
    
    Reported-by: Vipin Rathor <v.rathor@gmail.com>
    Reported-by: <ctcard@hotmail.com>
    Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
    [PM: applied rgb's fixup patch to correct audit_log_lost() format issues]
    Signed-off-by: Paul Moore <pmoore@redhat.com>