1. 07 Apr, 2016 1 commit
    • Stefan Hajnoczi's avatar
      virtio: add VIRTIO_CONFIG_S_NEEDS_RESET device status bit · c00bbcf8
      Stefan Hajnoczi authored
      The VIRTIO 1.0 specification added the DEVICE_NEEDS_RESET device status
      bit in "VIRTIO-98: Add DEVICE_NEEDS_RESET".  This patch defines the
      device status bit in the uapi header file so that both the kernel and
      userspace applications can use it.
      
      The bit is currently unused by the virtio guest drivers and vhost.
      According to the spec "a good implementation will try to recover by
      issuing a reset".  This is not attempted here because it requires
      auditing the virtio drivers to ensure there are no resource leaks or
      crashes if the device needs to be reset mid-operation.
      
      See "2.1 Device Status Field" in the VIRTIO 1.0 specification for
      details.
      Signed-off-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      c00bbcf8
  2. 30 Mar, 2016 2 commits
  3. 29 Mar, 2016 1 commit
  4. 22 Mar, 2016 3 commits
    • Dmitry Vyukov's avatar
      kernel: add kcov code coverage · 5c9a8750
      Dmitry Vyukov authored
      kcov provides code coverage collection for coverage-guided fuzzing
      (randomized testing).  Coverage-guided fuzzing is a testing technique
      that uses coverage feedback to determine new interesting inputs to a
      system.  A notable user-space example is AFL
      (http://lcamtuf.coredump.cx/afl/).  However, this technique is not
      widely used for kernel testing due to missing compiler and kernel
      support.
      
      kcov does not aim to collect as much coverage as possible.  It aims to
      collect more or less stable coverage that is function of syscall inputs.
      To achieve this goal it does not collect coverage in soft/hard
      interrupts and instrumentation of some inherently non-deterministic or
      non-interesting parts of kernel is disbled (e.g.  scheduler, locking).
      
      Currently there is a single coverage collection mode (tracing), but the
      API anticipates additional collection modes.  Initially I also
      implemented a second mode which exposes coverage in a fixed-size hash
      table of counters (what Quentin used in his original patch).  I've
      dropped the second mode for simplicity.
      
      This patch adds the necessary support on kernel side.  The complimentary
      compiler support was added in gcc revision 231296.
      
      We've used this support to build syzkaller system call fuzzer, which has
      found 90 kernel bugs in just 2 months:
      
        https://github.com/google/syzkaller/wiki/Found-Bugs
      
      We've also found 30+ bugs in our internal systems with syzkaller.
      Another (yet unexplored) direction where kcov coverage would greatly
      help is more traditional "blob mutation".  For example, mounting a
      random blob as a filesystem, or receiving a random blob over wire.
      
      Why not gcov.  Typical fuzzing loop looks as follows: (1) reset
      coverage, (2) execute a bit of code, (3) collect coverage, repeat.  A
      typical coverage can be just a dozen of basic blocks (e.g.  an invalid
      input).  In such context gcov becomes prohibitively expensive as
      reset/collect coverage steps depend on total number of basic
      blocks/edges in program (in case of kernel it is about 2M).  Cost of
      kcov depends only on number of executed basic blocks/edges.  On top of
      that, kernel requires per-thread coverage because there are always
      background threads and unrelated processes that also produce coverage.
      With inlined gcov instrumentation per-thread coverage is not possible.
      
      kcov exposes kernel PCs and control flow to user-space which is
      insecure.  But debugfs should not be mapped as user accessible.
      
      Based on a patch by Quentin Casasnovas.
      
      [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
      [akpm@linux-foundation.org: unbreak allmodconfig]
      [akpm@linux-foundation.org: follow x86 Makefile layout standards]
      Signed-off-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Cc: syzkaller <syzkaller@googlegroups.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Tavis Ormandy <taviso@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: David Drysdale <drysdale@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5c9a8750
    • Alexandre Bounine's avatar
      rapidio: add mport char device driver · e8de3701
      Alexandre Bounine authored
      Add mport character device driver to provide user space interface to
      basic RapidIO subsystem operations.
      
      See included Documentation/rapidio/mport_cdev.txt for more details.
      
      [akpm@linux-foundation.org: fix printk warning on i386]
      [dan.carpenter@oracle.com: mport_cdev: fix some error codes]
      Signed-off-by: default avatarAlexandre Bounine <alexandre.bounine@idt.com>
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Tested-by: default avatarBarry Wood <barry.wood@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
      Cc: Barry Wood <barry.wood@idt.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e8de3701
    • David Decotigny's avatar
      ethtool: minor doc update · 5f2d4724
      David Decotigny authored
      Updates: commit 793cf87d ("ethtool: Set cmd field in
               ETHTOOL_GLINKSETTINGS response to wrong nwords")
      Signed-off-by: default avatarDavid Decotigny <decot@googlers.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5f2d4724
  5. 21 Mar, 2016 2 commits
  6. 18 Mar, 2016 1 commit
  7. 17 Mar, 2016 4 commits
  8. 16 Mar, 2016 1 commit
  9. 15 Mar, 2016 2 commits
  10. 14 Mar, 2016 3 commits
    • Jarno Rajahalme's avatar
      openvswitch: Interface with NAT. · 05752523
      Jarno Rajahalme authored
      Extend OVS conntrack interface to cover NAT.  New nested
      OVS_CT_ATTR_NAT attribute may be used to include NAT with a CT action.
      A bare OVS_CT_ATTR_NAT only mangles existing and expected connections.
      If OVS_NAT_ATTR_SRC or OVS_NAT_ATTR_DST is included within the nested
      attributes, new (non-committed/non-confirmed) connections are mangled
      according to the rest of the nested attributes.
      
      The corresponding OVS userspace patch series includes test cases (in
      tests/system-traffic.at) that also serve as example uses.
      
      This work extends on a branch by Thomas Graf at
      https://github.com/tgraf/ovs/tree/nat.
      Signed-off-by: default avatarJarno Rajahalme <jarno@ovn.org>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Acked-by: default avatarJoe Stringer <joe@ovn.org>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      05752523
    • Jarno Rajahalme's avatar
      netfilter: Remove IP_CT_NEW_REPLY definition. · bfa3f9d7
      Jarno Rajahalme authored
      Remove the definition of IP_CT_NEW_REPLY from the kernel as it does
      not make sense.  This allows the definition of IP_CT_NUMBER to be
      simplified as well.
      Signed-off-by: default avatarJarno Rajahalme <jarno@ovn.org>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      bfa3f9d7
    • Martin KaFai Lau's avatar
      tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In · a44d6eac
      Martin KaFai Lau authored
      Per RFC4898, they count segments sent/received
      containing a positive length data segment (that includes
      retransmission segments carrying data).  Unlike
      tcpi_segs_out/in, tcpi_data_segs_out/in excludes segments
      carrying no data (e.g. pure ack).
      
      The patch also updates the segs_in in tcp_fastopen_add_skb()
      so that segs_in >= data_segs_in property is kept.
      
      Together with retransmission data, tcpi_data_segs_out
      gives a better signal on the rxmit rate.
      
      v6: Rebase on the latest net-next
      
      v5: Eric pointed out that checking skb->len is still needed in
      tcp_fastopen_add_skb() because skb can carry a FIN without data.
      Hence, instead of open coding segs_in and data_segs_in, tcp_segs_in()
      helper is used.  Comment is added to the fastopen case to explain why
      segs_in has to be reset and tcp_segs_in() has to be called before
      __skb_pull().
      
      v4: Add comment to the changes in tcp_fastopen_add_skb()
      and also add remark on this case in the commit message.
      
      v3: Add const modifier to the skb parameter in tcp_segs_in()
      
      v2: Rework based on recent fix by Eric:
      commit a9d99ce2 ("tcp: fix tcpi_segs_in after connection establishment")
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Cc: Chris Rapier <rapier@psc.edu>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a44d6eac
  11. 13 Mar, 2016 2 commits
  12. 12 Mar, 2016 1 commit
  13. 11 Mar, 2016 3 commits
  14. 10 Mar, 2016 6 commits
  15. 09 Mar, 2016 2 commits
  16. 08 Mar, 2016 6 commits
    • Alexei Starovoitov's avatar
      bpf: pre-allocate hash map elements · 6c905981
      Alexei Starovoitov authored
      If kprobe is placed on spin_unlock then calling kmalloc/kfree from
      bpf programs is not safe, since the following dead lock is possible:
      kfree->spin_lock(kmem_cache_node->lock)...spin_unlock->kprobe->
      bpf_prog->map_update->kmalloc->spin_lock(of the same kmem_cache_node->lock)
      and deadlocks.
      
      The following solutions were considered and some implemented, but
      eventually discarded
      - kmem_cache_create for every map
      - add recursion check to slow-path of slub
      - use reserved memory in bpf_map_update for in_irq or in preempt_disabled
      - kmalloc via irq_work
      
      At the end pre-allocation of all map elements turned out to be the simplest
      solution and since the user is charged upfront for all the memory, such
      pre-allocation doesn't affect the user space visible behavior.
      
      Since it's impossible to tell whether kprobe is triggered in a safe
      location from kmalloc point of view, use pre-allocation by default
      and introduce new BPF_F_NO_PREALLOC flag.
      
      While testing of per-cpu hash maps it was discovered
      that alloc_percpu(GFP_ATOMIC) has odd corner cases and often
      fails to allocate memory even when 90% of it is free.
      The pre-allocation of per-cpu hash elements solves this problem as well.
      
      Turned out that bpf_map_update() quickly followed by
      bpf_map_lookup()+bpf_map_delete() is very common pattern used
      in many of iovisor/bcc/tools, so there is additional benefit of
      pre-allocation, since such use cases are must faster.
      
      Since all hash map elements are now pre-allocated we can remove
      atomic increment of htab->count and save few more cycles.
      
      Also add bpf_map_precharge_memlock() to check rlimit_memlock early to avoid
      large malloc/free done by users who don't have sufficient limits.
      
      Pre-allocation is done with vmalloc and alloc/free is done
      via percpu_freelist. Here are performance numbers for different
      pre-allocation algorithms that were implemented, but discarded
      in favor of percpu_freelist:
      
      1 cpu:
      pcpu_ida	2.1M
      pcpu_ida nolock	2.3M
      bt		2.4M
      kmalloc		1.8M
      hlist+spinlock	2.3M
      pcpu_freelist	2.6M
      
      4 cpu:
      pcpu_ida	1.5M
      pcpu_ida nolock	1.8M
      bt w/smp_align	1.7M
      bt no/smp_align	1.1M
      kmalloc		0.7M
      hlist+spinlock	0.2M
      pcpu_freelist	2.0M
      
      8 cpu:
      pcpu_ida	0.7M
      bt w/smp_align	0.8M
      kmalloc		0.4M
      pcpu_freelist	1.5M
      
      32 cpu:
      kmalloc		0.13M
      pcpu_freelist	0.49M
      
      pcpu_ida nolock is a modified percpu_ida algorithm without
      percpu_ida_cpu locks and without cross-cpu tag stealing.
      It's faster than existing percpu_ida, but not as fast as pcpu_freelist.
      
      bt is a variant of block/blk-mq-tag.c simlified and customized
      for bpf use case. bt w/smp_align is using cache line for every 'long'
      (similar to blk-mq-tag). bt no/smp_align allocates 'long'
      bitmasks continuously to save memory. It's comparable to percpu_ida
      and in some cases faster, but slower than percpu_freelist
      
      hlist+spinlock is the simplest free list with single spinlock.
      As expeceted it has very bad scaling in SMP.
      
      kmalloc is existing implementation which is still available via
      BPF_F_NO_PREALLOC flag. It's significantly slower in single cpu and
      in 8 cpu setup it's 3 times slower than pre-allocation with pcpu_freelist,
      but saves memory, so in cases where map->max_entries can be large
      and number of map update/delete per second is low, it may make
      sense to use it.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6c905981
    • Daniel Borkmann's avatar
      bpf: support for access to tunnel options · 14ca0751
      Daniel Borkmann authored
      After eBPF being able to programmatically access/manage tunnel key meta
      data via commit d3aa45ce ("bpf: add helpers to access tunnel metadata")
      and more recently also for IPv6 through c6c33454 ("bpf: support ipv6
      for bpf_skb_{set,get}_tunnel_key"), this work adds two complementary
      helpers to generically access their auxiliary tunnel options.
      
      Geneve and vxlan support this facility. For geneve, TLVs can be pushed,
      and for the vxlan case its GBP extension. I.e. setting tunnel key for geneve
      case only makes sense, if we can also read/write TLVs into it. In the GBP
      case, it provides the flexibility to easily map the group policy ID in
      combination with other helpers or maps.
      
      I chose to model this as two separate helpers, bpf_skb_{set,get}_tunnel_opt(),
      for a couple of reasons. bpf_skb_{set,get}_tunnel_key() is already rather
      complex by itself, and there may be cases for tunnel key backends where
      tunnel options are not always needed. If we would have integrated this
      into bpf_skb_{set,get}_tunnel_key() nevertheless, we are very limited with
      remaining helper arguments, so keeping compatibility on structs in case of
      passing in a flat buffer gets more cumbersome. Separating both also allows
      for more flexibility and future extensibility, f.e. options could be fed
      directly from a map, etc.
      
      Moreover, change geneve's xmit path to test only for info->options_len
      instead of TUNNEL_GENEVE_OPT flag. This makes it more consistent with vxlan's
      xmit path and allows for avoiding to specify a protocol flag in the API on
      xmit, so it can be protocol agnostic. Having info->options_len is enough
      information that is needed. Tested with vxlan and geneve.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      14ca0751
    • Daniel Borkmann's avatar
      bpf: allow to propagate df in bpf_skb_set_tunnel_key · 22080870
      Daniel Borkmann authored
      Added by 9a628224 ("ip_tunnel: Add dont fragment flag."), allow to
      feed df flag into tunneling facilities (currently supported on TX by
      vxlan, geneve and gre) as a hint from eBPF's bpf_skb_set_tunnel_key()
      helper.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      22080870
    • Daniel Borkmann's avatar
      bpf: add flags to bpf_skb_store_bytes for clearing hash · 8afd54c8
      Daniel Borkmann authored
      When overwriting parts of the packet with bpf_skb_store_bytes() that
      were fed previously into skb->hash calculation, we should clear the
      current hash with skb_clear_hash(), so that a next skb_get_hash() call
      can determine the correct hash related to this skb.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8afd54c8
    • Lionel Landwerlin's avatar
      drm: introduce pipe color correction properties · 5488dc16
      Lionel Landwerlin authored
      Patch based on a previous series by Shashank Sharma.
      
      This introduces optional properties to enable color correction at the
      pipe level. It relies on 3 transformations applied to every pixels
      displayed. First a lookup into a degamma table, then a multiplication
      of the rgb components by a 3x3 matrix and finally another lookup into
      a gamma table.
      
      The following properties can be added to a pipe :
        - DEGAMMA_LUT : blob containing degamma LUT
        - DEGAMMA_LUT_SIZE : number of elements in DEGAMMA_LUT
        - CTM : transformation matrix applied after the degamma LUT
        - GAMMA_LUT : blob containing gamma LUT
        - GAMMA_LUT_SIZE : number of elements in GAMMA_LUT
      
      DEGAMMA_LUT_SIZE and GAMMA_LUT_SIZE are read only properties, set by
      the driver to tell userspace applications what sizes should be the
      lookup tables in DEGAMMA_LUT and GAMMA_LUT.
      
      A helper is also provided so legacy gamma correction is redirected
      through these new properties.
      
      v2: Register LUT size properties as range
      
      v3: Fix round in drm_color_lut_get_value() helper
          More docs on how degamma/gamma properties are used
      
      v4: Update contributors
      
      v5: Rename CTM_MATRIX property to CTM (Doh!)
          Add legacy gamma_set atomic helper
          Describe CTM/LUT acronyms in the kernel doc
      
      v6: Fix missing blob unref in drm_atomic_helper_crtc_reset
      Signed-off-by: default avatarShashank Sharma <shashank.sharma@intel.com>
      Signed-off-by: default avatarKumar, Kiran S <kiran.s.kumar@intel.com>
      Signed-off-by: default avatarKausal Malladi <kausalmalladi@gmail.com>
      Signed-off-by: default avatarLionel Landwerlin <lionel.g.landwerlin@intel.com>
      Reviewed-by: default avatarMatt Roper <matthew.d.roper@intel.com>
      Acked-by: default avatarRob Bradford <robert.bradford@intel.com>
      [danvet: CrOS maintainers are also happy with the userspacde side:
      https://codereview.chromium.org/1182063002/ ]
      Reviewed-by: default avatarDaniel Stone <daniels@collabora.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/1456506302-640-4-git-send-email-lionel.g.landwerlin@intel.com
      5488dc16
    • Martin Koegler's avatar
      ALSA: seq: Provide card number / PID via sequencer client info · a1ce94d0
      Martin Koegler authored
      rawmidi devices expose the card number via IOCTLs, which allows to
      find the corresponding device in sysfs.
      
      The sequencer provides no identifing data. Chromium works around this
      issue by scanning rawmidi as well as sequencer devices and matching
      them by using assumtions, how the kernel register sequencer devices.
      
      This changes adds support for exposing the card number for kernel clients
      as well as the PID for user client.
      
      The minor of the API version is changed to distinguish between the zero
      initialised reserved field and card number 0.
      
      [minor coding style fixes by tiwai]
      Signed-off-by: default avatarMartin Koegler <martin.koegler@chello.at>
      Acked-by: default avatarClemens Ladisch <clemens@ladisch.de>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      a1ce94d0