    bpf, maps: flush own entries on perf map release · 3b1efb19
    Daniel Borkmann authored
    The behavior of perf event arrays is quite different from all
    others, as they are tightly coupled to perf event fds, f.e. shown
    recently by commit e03e7ee3 ("perf/bpf: Convert perf_event_array
    to use struct file") to make refcounting on perf events more robust.
    A remaining issue in the current code is that additions to the
    perf event array take a reference on the struct file via
    perf_event_get(), and that reference is only released via fput()
    (which eventually cleans up the perf event via
    perf_event_release_kernel()) when the element is either manually
    removed from the map from user space or automatically when the
    last reference on the perf event map is dropped. This leads to
    dangling struct files when the map is pinned and the application
    owning the perf event descriptor exits: since the struct file
    reference will in such a case only be dropped manually or via
    removal of the pinned file, the perf event lives longer than
    necessary, needlessly consuming resources for that time.
    
    Relations between perf event fds and bpf perf event map fds can be
    rather complex. F.e. maps can act as demuxers among different perf
    event fds that can possibly be owned by different threads and based
    on the index selection from the program, events get dispatched to
    one of the per-cpu fd endpoints. One perf event fd (or, rather a
    per-cpu set of them) can also live in multiple perf event maps at
    the same time, listening for events. Another requirement is that
    perf event fds can be closed from the application side after they
    have been attached to the perf event map, so that on exit the perf
    event map takes care of dropping their references eventually.
    Likewise, when such maps are pinned, the intended behavior is that
    a user application does bpf_obj_get(), puts its fds in there, and
    on exit, when the map fd is released, they are dropped from the map
    again, so the map acts rather as a connector endpoint. This also
    makes perf event maps
    inherently different from program arrays as described in more detail
    in commit c9da161c ("bpf: fix clearing on persistent program array
    maps").
    
    To tackle this, map entries are marked with the map struct file that
    added the element to the map. When the last reference to that map
    struct file is released from user space, the tracked entries
    are purged from the map. This is okay, because new map struct file
    instances, or rather frontends to the anon inode, are provided via
    bpf_map_new_fd(), which is called when we invoke bpf_obj_get_user()
    for retrieving a pinned map, but also when an initial instance is
    created via map_create(). The rest is resolved by the vfs layer
    automatically for us by keeping a reference count on the map's
    struct file. Any concurrent updates on the map slot are fine as
    well; it just means that perf_event_fd_array_release() needs to
    delete fewer of its own entries.
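
    The ownership tracking described above can be pictured with a small
    user-space model. This is only a sketch under stated assumptions,
    not the kernel code: struct event_file, struct map_file, struct
    perf_map, map_update() and map_file_release() are hypothetical
    names made up for illustration, and a plain integer stands in for
    the struct file reference count. map_file_release() plays the role
    that the text assigns to perf_event_fd_array_release().

```c
#include <assert.h>
#include <stddef.h>

#define MAP_SLOTS 4

/* Hypothetical stand-in for a perf event's struct file.
 * refcount models the references taken and dropped on it. */
struct event_file {
	int refcount;
};

/* Hypothetical stand-in for one map struct file, i.e. one
 * user-space frontend to the shared map. */
struct map_file {
	int id;
};

/* One array slot: the event it holds plus the map file that added it. */
struct slot {
	struct event_file *event;
	const struct map_file *owner;
};

struct perf_map {
	struct slot slots[MAP_SLOTS];
};

/* Drop the slot's reference on its event, if it holds one. */
static void slot_clear(struct slot *s)
{
	if (s->event) {
		s->event->refcount--;
		s->event = NULL;
		s->owner = NULL;
	}
}

/* Install an event through a given map file. The slot takes a
 * reference and remembers which map file added the entry. */
static void map_update(struct perf_map *m, const struct map_file *mf,
		       size_t idx, struct event_file *ev)
{
	assert(idx < MAP_SLOTS);
	slot_clear(&m->slots[idx]);	/* replace any previous entry */
	ev->refcount++;
	m->slots[idx].event = ev;
	m->slots[idx].owner = mf;
}

/* Called when the last reference to a map file goes away: purge only
 * the entries that this map file added. Returns the number purged. */
static int map_file_release(struct perf_map *m, const struct map_file *mf)
{
	int purged = 0;

	for (size_t i = 0; i < MAP_SLOTS; i++) {
		if (m->slots[i].event && m->slots[i].owner == mf) {
			slot_clear(&m->slots[i]);
			purged++;
		}
	}
	return purged;
}
```

    In this model, if one map file adds an entry and a second map file
    later overwrites the same slot, releasing the first map file finds
    nothing of its own there and purges fewer entries, which mirrors
    the concurrent-update remark above.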
    
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>