1. 25 Oct, 2016 21 commits
    • Muktesh Khole's avatar
      Async Code, builds successfully - untested · 71bacb46
      Muktesh Khole authored
      71bacb46
    • Charlie Jacobsen's avatar
      Two simple IPC tests are passing. Still getting mysterious hang. · c667fea5
      Charlie Jacobsen authored
      Working on moving Linux's mm into lcd's.
      
      I gave up trying to debug the hang. Confirmed the pages for the lcd's
      vm are the ones I expect. Turned on red zones. All tests are passing.
      Hang happens after insmod/rmmod of the lcd module about 10 - 20 times, it
      varies. Sometimes one core just silently dies / doesn't even respond to
      an NMI. Sometimes the ethernet driver complains (this could be an
      unrelated bug that was fixed upstream).
      
      Few things in this commit:
      
        1
      =====
      
      Updated documentation in Documentation/lcd-domains/.
      
        2
      =====
      
      Baby version of lib kernel, inside arch/x86/lcd-domains/liblcd.c.
      Unfortunately due to the recursive make, this needs to be textually
      included inside the modules destined for lcd's, for now.
      
        3
      =====
      
      Added new test modules and modified directory structure and
      build system. See documentation in Documentation/lcd-domains.
      
        4
      =====
      
      A few tweaks to the nmi handler to print a backtrace. May remove that in
      the future, as it's probably not safe to do inside an nmi handler (but if
      we're in that error state, we might be desperate to know what's happening ...).
      
        5
      =====
      
      Changed interrupt handling in arch-dependent code. The KVM code we were using
      is probably wrong for 64-bit - it doesn't properly switch stacks, etc., which
      is super important for 64-bit and may be impossible to emulate in
      software. I think this could be stale code inside KVM, but not sure. Dune
      doesn't use it. KVM doesn't ack external interrutps on vm exit, so I think
      this interrupt emulation code is always skipped (at least for non-nested
      VMs).
      
      Instead, we're not ack'ing interrupts on exit, and letting the native code
      do the right thing, like Dune.
      
      I was thinking this might be the source of the bad hang (stack
      overflow, e.g.), but not true.
      
      Conflicts:
      	include/linux/sched.h
      	kernel/watchdog.c
      	virt/lcd-domains/lcd-cspace-tests2.c
      Resolved-by: Vikram Narayanan's avatarVikram Narayanan <vikram186@gmail.com>
      c667fea5
    • Charlie Jacobsen's avatar
      Muktesh's capabilities fully incorporated. Capsicum-style enter/exit. · 6ee9a51f
      Charlie Jacobsen authored
      Builds, but not fully tested. Good tests for capability subsystem, some tests
      for kliblcd.
      
      Non-isolated kernel threads can "enter" the lcd system by doing
      klcd_enter / klcd_exit. They can create other lcd's, set them up, etc. They
      use the same interface that regular lcd's will use, so such code could be
      moved to an lcd, as we had planned. Will document this in Documentation folder
      tomorrow ( == today ).
      
      Capability system does checks now when a capability is deleted/revoked: for
      example, if it's for a page, the microkernel checks if the page is mapped, and
      unmaps it. If the last capability goes away, the page is freed. Documentation
      is in Documentation/lcd-domains/cap.txt.
      
      IPC code is in place, but not tested yet (pray for me).
      
      Debug is taking some time. Sometimes requires a power cycle which adds an
      extra 5 - 10 minutes. Build is slow the first time after reboot. Give me a user
      level program and I'll debug it in 30 seconds! argc
      
      Main arch-independent files:
      
          include/lcd-domains/kliblcd.h, types.h
      
             This is what non-isolated kernel code should include to use the
             kliblcd interface to the microkernel.
      
          virt/lcd-domains/main.c, kliblcd.c, cap.c, ipc.c, internal.h
      
             The microkernel, broken up into pieces.
      
          virt/lcd-domains/tests/
      
             The tests, in progress.
      
      Some old files are still hanging around in virt/lcd-domains and will be
      incorprated/cleaned up soon.
      
      I couldn't squash over the merge from the decomposition branch, so there's a
      bunch of junk commits coming over. (I should've just copied Muktesh's files.)
      
      Conflicts:
      	drivers/Kconfig
      	drivers/lcd-cspace/test.h
      	include/lcd-domains/cap.h
      	include/lcd-prototype/lcd.h
      	include/lcd/console.h
      	include/lcd/elfnote.h
      	include/linux/init_task.h
      	include/linux/module.h
      	include/linux/sched.h
      	virt/lcd-domains/cap.c
      	virt/lcd-domains/ipc.c
      	virt/lcd-domains/lcd-cspace-tests2.c
      Resolved-by: Vikram Narayanan's avatarVikram Narayanan <vikram186@gmail.com>
      6ee9a51f
    • Charles Jacobsen's avatar
      Fixed build errors, module init tools. · 0f690a86
      Charles Jacobsen authored
      Getting nasty runtime bugs though.
      0f690a86
    • Charlie Jacobsen's avatar
      Finished the majority of the arch-indep code changes needed. · d71afeb4
      Charlie Jacobsen authored
      Refactored lcd's into lcd and lcd_thread. Still need to test/update
      tests.
      d71afeb4
    • Charlie Jacobsen's avatar
    • root's avatar
      Working lookup, insert, and delete · a5d6ea85
      root authored
      a5d6ea85
    • Charlie Jacobsen's avatar
      Restructuring prototype code. · 963eb782
      Charlie Jacobsen authored
      963eb782
    • Charlie Jacobsen's avatar
      Basic lcd module create, run, and destroy. · e0193fa4
      Charlie Jacobsen authored
      This code is ugly, but it's working.
      
      Tested with basic module, and appears to be working
      properly. I will soon incorporate the patched
      modprobe into the kernel tree, and then this code
      will be usable by everyone.
      
      The ipc code is still unimplemented. The only
      hypercall handled is yield. Also note that other
      exit conditions (e.g. external interrupt) have not
      been fully tested.
      
      Overview:
      -- kernel code calls lcd_create_as_module with
         the module's name
      -- lcd_create_as_module loads the module using
         request_lcd_module (request_lcd_module calls
         the patched modprobe to load the module, and
         the patched modprobe calls back into the lcd
         driver via the ioctrl interface to load the
         module)
      -- lcd_create_as_module then finds the loaded
         module, spawns a kernel thread and passes off
         the module to it
      -- the kernel thread initializes the lcd and
         maps the module inside it, then suspends itself
      -- lcd_run_as_module wakes up the kernel thread
         and tells it to run
      -- lcd_delete_as_module stops the kernel thread
         and deletes the module from the host kernel
      
      File-by-file details:
      
      arch/x86/include/asm/lcd-domains-arch.h
      arch/x86/lcd-domains/lcd-domains-arch-tests.c
      arch/x86/lcd-domains/lcd-domains-arch.c
      -- lcd was not running in 64-bit mode, and my
         checks had one subtle bug
      -- fixed %cr3 load to properly load vmcs first
      -- fixed set program counter to use guest virtual
         rather than guest physical address
      
      include/linux/sched.h
      -- added struct lcd to task_struct
      
      include/linux/init_task.h
      -- lcd pointer set to null when task_struct is
         initialized
      
      include/linux/module.h
      kernel/module.c
      -- made init_module and delete_module system calls
         callable from kernel code
      -- available in module.h via do_sys_init_module and
         do_sys_delete_module
      -- simply moved the majority of the guts of the
         system calls into a non-system call, exported
         routine
      -- take an extra flag, for_lcd; when set, the init
         code skips over running (and deallocating) the
         module's init code, and the delete code skips
         over running the module exit
      -- system calls from user code set for_lcd = 0; this
         ensures existing code still works
      
      include/linux/kmod.h
      kernel/kmod.c
      kernel/sysctl.c
      -- changed __request_module to __do_request_module; takes
         one extra argument, for_lcd
      -- __request_module   ==>  __do_request_module with for_lcd = 0
      -- request_lcd_module ==>  __do_request_module with for_lcd = 1
      -- call_modprobe conditionally uses lcd_modprobe_path, the path
         to a patched modprobe accessible via sysfs
      
      include/lcd-domains/lcd-domains.h
      -- added lcd status enum; see source code doc
      -- three routines for creating/running/destroying
         lcd's that use modules; see source code doc
      
      include/uapi/linux/lcd-domains.h
      -- added interface defns for patched modprobe to call into
         lcd driver for module init; lcd driver loads
         module (via slightly refactored module.c code) on behalf
         of modprobe
      
      virt/lcd-domains/lcd-domains.c
      -- implementation of routines for modules inside lcd's
      -- implementation of module init / delete for lcd's
         (uses patched module.c code)
      
      virt/lcd-domains/Kconfig
      virt/lcd-domains/Makefile
      virt/lcd-domains/lcd-module-load-test.c
      virt/lcd-domains/lcd-tests.c
      -- added test module for lcd module code
      -- test runs automatically when lcd module is inserted
      e0193fa4
    • Anton Burtsev's avatar
    • Anton Burtsev's avatar
      Kicked out Muktesh's code, wrote my own stub · 6085bcd5
      Anton Burtsev authored
      -- Mine is primitive but works, Muktesh will have to come
         back and rewrite
      -- Cspace is an array, only insert and lookup are supported
      6085bcd5
    • Anton Burtsev's avatar
    • Anton Burtsev's avatar
      6404e62e
    • Anton Burtsev's avatar
      Cleaned up (sort of) capability code · 1d9c6dfb
      Anton Burtsev authored
      1d9c6dfb
    • Charles Jacobsen's avatar
      Fixed build system for lcd, and most compiler errors. · 7c05c7a0
      Charles Jacobsen authored
      Two components to the lcd build now:
      -- arch/x86/lcd/Makefile: for building x86 lcd code
      -- virt/lcd/Makefile: for building arch-indep lcd code
      
      Modified the build system just slightly for a cleaner
      build:
      -- virt/ directory treated like ipc/, usr/, etc. directories
      -- added Kconfig and Makefile to virt/, mirroring drivers/
      -- updated top-level Makefile to include virt/ as vmlinux
         directory / dependency, so build system will recur into
         virt/
      -- updated arch/x86/Kconfig to include virt/Kconfig, so it
         will be included as a menu item
      -- updated arch/x86/Kbuild to include arch/x86/lcd/
      
      Removed old capabilities code in cap/.
      
      Removed lcd syscall.
      
      Temporarily turned off build for drivers/lcd.
      
      Fixed most bugs in lcd-vmx (still need to do lcd_vmx_exit).
      -- minor naming issues in lcd-vmx.h
      -- straight copy over of vmx_disable_intercepts_for_msr,
         but with more doc
      -- removed VMX_EPT_INDIVIDUAL_ADDR macro from vmx.h (where
         did this come from? it's not documented in the intel manual,
         nor is it used in kvm)
      7c05c7a0
    • Anton Burtsev's avatar
      Clean up user-level includes, build inskern4lcd tool · 2d55c00d
      Anton Burtsev authored
      It took me a while to figure out a suggested layout for Linux
      user-visible headers. I ended up with this:
      
        ./include/lcd/lcd.h -- includes #include <uapi/linux/lcd.h>
        ./include/uapi/linux/lcd.h -- this file will be installed to
            /usr/linux/lcd.h
      
      Note on installing user-visible header files
      
       1) Add header-y += lcd.h line to include/uapi/linux/Kbuild
      
       2) Install sanitized header files into ./usr/include
      
          sudo make install_headers
      
       3) Copy ./usr/include to /usr/include
      
          sudo find usr/include \( -name .install -o -name ..install.cmd \) -delete
          sudo cp -rv usr/include/* /usr/include
      2d55c00d
    • Anton Burtsev's avatar
      ioctl interface to LCD · 552bd976
      Anton Burtsev authored
      552bd976
    • Anton Burtsev's avatar
      Working on ioctl interface for LCD · 3cbf87c0
      Anton Burtsev authored
        -- A skeleton for the IOCTL interface
        -- Public include file <linux/lcd-domains.h>
        -- Moved module load/unload from arch/x86/lcd into drivers/lcd
        -- Build crashes for modules (built-in works)
      3cbf87c0
    • Anton Burtsev's avatar
      Move things around · 0ec257ef
      Anton Burtsev authored
      Not ideal. Some problems:
      
        - The arch dependent and independent parts are not completely
          separated.
      
        - The module builds in arch/x86/lcd and this is a bit strange
      
        - I can't build with LCD buildin, but build works if I build it
          as a module
      
        - There is still a bunch of old warrnings from Weibin's code
      0ec257ef
    • Sarah Spall's avatar
    • Muktesh Khole's avatar
      Adding cspace initialization to INIT_TASK macro · d6dfb02c
      Muktesh Khole authored
      Conflicts:
      	include/linux/init_task.h
      	include/linux/sched.h
      Resolved-by: Vikram Narayanan's avatarVikram Narayanan <vikram186@gmail.com>
      d6dfb02c
  2. 22 Oct, 2016 4 commits
    • Miklos Szeredi's avatar
      vfs: move permission checking into notify_change() for utimes(NULL) · 7bf99896
      Miklos Szeredi authored
      commit f2b20f6ee842313a0d681dbbf7f87b70291a6a3b upstream.
      
      This fixes a bug where the permission was not properly checked in
      overlayfs.  The testcase is ltp/utimensat01.
      
      It is also cleaner and safer to do the permission checking in the vfs
      helper instead of the caller.
      
      This patch introduces an additional ia_valid flag ATTR_TOUCH (since
      touch(1) is the most obvious user of utimes(NULL)) that is passed into
      notify_change whenever the conditions for this special permission checking
      mode are met.
      Reported-by: 's avatarAihua Zhang <zhangaihua1@huawei.com>
      Signed-off-by: 's avatarMiklos Szeredi <mszeredi@redhat.com>
      Tested-by: 's avatarAihua Zhang <zhangaihua1@huawei.com>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7bf99896
    • Manfred Spraul's avatar
      ipc/sem.c: fix complex_count vs. simple op race · a9d465be
      Manfred Spraul authored
      commit 5864a2fd3088db73d47942370d0f7210a807b9bc upstream.
      
      Commit 6d07b68c ("ipc/sem.c: optimize sem_lock()") introduced a
      race:
      
      sem_lock has a fast path that allows parallel simple operations.
      There are two reasons why a simple operation cannot run in parallel:
       - a non-simple operations is ongoing (sma->sem_perm.lock held)
       - a complex operation is sleeping (sma->complex_count != 0)
      
      As both facts are stored independently, a thread can bypass the current
      checks by sleeping in the right positions.  See below for more details
      (or kernel bugzilla 105651).
      
      The patch fixes that by creating one variable (complex_mode)
      that tracks both reasons why parallel operations are not possible.
      
      The patch also updates stale documentation regarding the locking.
      
      With regards to stable kernels:
      The patch is required for all kernels that include the
      commit 6d07b68c ("ipc/sem.c: optimize sem_lock()") (3.10?)
      
      The alternative is to revert the patch that introduced the race.
      
      The patch is safe for backporting, i.e. it makes no assumptions
      about memory barriers in spin_unlock_wait().
      
      Background:
      Here is the race of the current implementation:
      
      Thread A: (simple op)
      - does the first "sma->complex_count == 0" test
      
      Thread B: (complex op)
      - does sem_lock(): This includes an array scan. But the scan can't
        find Thread A, because Thread A does not own sem->lock yet.
      - the thread does the operation, increases complex_count,
        drops sem_lock, sleeps
      
      Thread A:
      - spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
      - sleeps before the complex_count test
      
      Thread C: (complex op)
      - does sem_lock (no array scan, complex_count==1)
      - wakes up Thread B.
      - decrements complex_count
      
      Thread A:
      - does the complex_count test
      
      Bug:
      Now both thread A and thread C operate on the same array, without
      any synchronization.
      
      Fixes: 6d07b68c ("ipc/sem.c: optimize sem_lock()")
      Link: http://lkml.kernel.org/r/1469123695-5661-1-git-send-email-manfred@colorfullife.com
      Reported-by: <felixh@informatik.uni-bremen.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <1vier1@web.de>
      Signed-off-by: 's avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: 's avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a9d465be
    • Johannes Weiner's avatar
      mm: filemap: don't plant shadow entries without radix tree node · 7d5d3b13
      Johannes Weiner authored
      commit d3798ae8c6f3767c726403c2ca6ecc317752c9dd upstream.
      
      When the underflow checks were added to workingset_node_shadow_dec(),
      they triggered immediately:
      
        kernel BUG at ./include/linux/swap.h:276!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: isofs usb_storage fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_reject_ipv6
         soundcore wmi acpi_als pinctrl_sunrisepoint kfifo_buf tpm_tis industrialio acpi_pad pinctrl_intel tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_crypt
        CPU: 0 PID: 20929 Comm: blkid Not tainted 4.8.0-rc8-00087-gbe67d60b #1
        Hardware name: System manufacturer System Product Name/Z170-K, BIOS 1803 05/06/2016
        task: ffff8faa93ecd940 task.stack: ffff8faa7f478000
        RIP: page_cache_tree_insert+0xf1/0x100
        Call Trace:
          __add_to_page_cache_locked+0x12e/0x270
          add_to_page_cache_lru+0x4e/0xe0
          mpage_readpages+0x112/0x1d0
          blkdev_readpages+0x1d/0x20
          __do_page_cache_readahead+0x1ad/0x290
          force_page_cache_readahead+0xaa/0x100
          page_cache_sync_readahead+0x3f/0x50
          generic_file_read_iter+0x5af/0x740
          blkdev_read_iter+0x35/0x40
          __vfs_read+0xe1/0x130
          vfs_read+0x96/0x130
          SyS_read+0x55/0xc0
          entry_SYSCALL_64_fastpath+0x13/0x8f
        Code: 03 00 48 8b 5d d8 65 48 33 1c 25 28 00 00 00 44 89 e8 75 19 48 83 c4 18 5b 41 5c 41 5d 41 5e 5d c3 0f 0b 41 bd ef ff ff ff eb d7 <0f> 0b e8 88 68 ef ff 0f 1f 84 00
        RIP  page_cache_tree_insert+0xf1/0x100
      
      This is a long-standing bug in the way shadow entries are accounted in
      the radix tree nodes. The shrinker needs to know when radix tree nodes
      contain only shadow entries, no pages, so node->count is split in half
      to count shadows in the upper bits and pages in the lower bits.
      
      Unfortunately, the radix tree implementation doesn't know of this and
      assumes all entries are in node->count. When there is a shadow entry
      directly in root->rnode and the tree is later extended, the radix tree
      implementation will copy that entry into the new node and and bump its
      node->count, i.e. increases the page count bits. Once the shadow gets
      removed and we subtract from the upper counter, node->count underflows
      and triggers the warning. Afterwards, without node->count reaching 0
      again, the radix tree node is leaked.
      
      Limit shadow entries to when we have actual radix tree nodes and can
      count them properly. That means we lose the ability to detect refaults
      from files that had only the first page faulted in at eviction time.
      
      Fixes: 449dd698 ("mm: keep page cache radix tree nodes in check")
      Signed-off-by: 's avatarJohannes Weiner <hannes@cmpxchg.org>
      Reported-and-tested-by: 's avatarLinus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: 's avatarJan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: 's avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7d5d3b13
    • Christian Lamparter's avatar
      debugfs: introduce a public file_operations accessor · 7c646634
      Christian Lamparter authored
      commit 86f0e06767dda7863d6d2a8f0b3b857e6ea876a0 upstream.
      
      This patch introduces an accessor which can be used
      by the users of debugfs (drivers, fs, ...) to get the
      original file_operations struct. It also removes the
      REAL_FOPS_DEREF macro in file.c and converts the code
      to use the public version.
      
      Previously, REAL_FOPS_DEREF was only available within
      the file.c of debugfs. But having a public getter
      available for debugfs users is important as some
      drivers (carl9170 and b43) use the pointer of the
      original file_operations in conjunction with container_of()
      within their debugfs implementations.
      Reviewed-by: 's avatarNicolai Stange <nicstange@gmail.com>
      Signed-off-by: 's avatarChristian Lamparter <chunkeey@gmail.com>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c646634
  3. 20 Oct, 2016 1 commit
    • Linus Torvalds's avatar
      mm: remove gup_flags FOLL_WRITE games from __get_user_pages() · 89eeba15
      Linus Torvalds authored
      commit 19be0eaffa3ac7d8eb6784ad9bdbc7d67ed8e619 upstream.
      
      This is an ancient bug that was actually attempted to be fixed once
      (badly) by me eleven years ago in commit 4ceb5db9 ("Fix
      get_user_pages() race for write access") but that was then undone due to
      problems on s390 by commit f33ea7f4 ("fix get_user_pages bug").
      
      In the meantime, the s390 situation has long been fixed, and we can now
      fix it by checking the pte_dirty() bit properly (and do it better).  The
      s390 dirty bit was implemented in abf09bed ("s390/mm: implement
      software dirty bits") which made it into v3.9.  Earlier kernels will
      have to look at the page state itself.
      
      Also, the VM has become more scalable, and what used a purely
      theoretical race back then has become easier to trigger.
      
      To fix it, we introduce a new internal FOLL_COW flag to mark the "yes,
      we already did a COW" rather than play racy games with FOLL_WRITE that
      is very fundamental, and then use the pte dirty flag to validate that
      the FOLL_COW flag is still valid.
      Reported-and-tested-by: 's avatarPhil "not Paul" Oester <kernel@linuxace.com>
      Acked-by: 's avatarHugh Dickins <hughd@google.com>
      Reviewed-by: 's avatarMichal Hocko <mhocko@suse.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Greg Thelen <gthelen@google.com>
      Signed-off-by: 's avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      89eeba15
  4. 16 Oct, 2016 1 commit
  5. 07 Oct, 2016 1 commit
    • Linus Torvalds's avatar
      Using BUG_ON() as an assert() is _never_ acceptable · 0b09f2d4
      Linus Torvalds authored
      commit 21f54ddae449f4bdd9f1498124901d67202243d9 upstream.
      
      That just generally kills the machine, and makes debugging only much
      harder, since the traces may long be gone.
      
      Debugging by assert() is a disease.  Don't do it.  If you can continue,
      you're much better off doing so with a live machine where you have a
      much higher chance that the report actually makes it to the system logs,
      rather than result in a machine that is just completely dead.
      
      The only valid situation for BUG_ON() is when continuing is not an
      option, because there is massive corruption.  But if you are just
      verifying that something is true, you warn about your broken assumptions
      (preferably just once), and limp on.
      
      Fixes: 22f2ac51 ("mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()")
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: 's avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0b09f2d4
  6. 30 Sep, 2016 2 commits
    • John Youn's avatar
      include/linux/property.h: fix typo/compile error · 37aa7271
      John Youn authored
      This fixes commit d76eebfa ("include/linux/property.h: fix build
      issues with gcc-4.4.4").
      
      With that commit we get the following compile error when using the
      PROPERTY_ENTRY_INTEGER_ARRAY macro.
      
       include/linux/property.h:201:39: error: `u32_data' undeclared (first
                       use in this function)
        PROPERTY_ENTRY_INTEGER_ARRAY(_name_, u32, _val_)
                                             ^
       include/linux/property.h:193:17: note: in definition of macro
                       `PROPERTY_ENTRY_INTEGER_ARRAY'
        { .pointer = { _type_##_data = _val_ } },  \
                       ^
      
      This needs a '.' to reference the union member.  It seems this was just
      overlooked here since it is done correctly in similar constructs in
      other parts of the original commit.
      
      This fix is in preparation of upcoming commits that will use this macro.
      
      Fixes: commit d76eebfa ("include/linux/property.h: fix build issues with gcc-4.4.4")
      Link: http://lkml.kernel.org/r/2de3b929290d88a723ed829a3e3cbd02044714df.1475114627.git.johnyoun@synopsys.comSigned-off-by: 's avatarJohn Youn <johnyoun@synopsys.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Signed-off-by: 's avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: 's avatarLinus Torvalds <torvalds@linux-foundation.org>
      37aa7271
    • Johannes Weiner's avatar
      mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page() · 22f2ac51
      Johannes Weiner authored
      Antonio reports the following crash when using fuse under memory pressure:
      
        kernel BUG at /build/linux-a2WvEb/linux-4.4.0/mm/workingset.c:346!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: all of them
        CPU: 2 PID: 63 Comm: kswapd0 Not tainted 4.4.0-36-generic #55-Ubuntu
        Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
        task: ffff88040cae6040 ti: ffff880407488000 task.ti: ffff880407488000
        RIP: shadow_lru_isolate+0x181/0x190
        Call Trace:
          __list_lru_walk_one.isra.3+0x8f/0x130
          list_lru_walk_one+0x23/0x30
          scan_shadow_nodes+0x34/0x50
          shrink_slab.part.40+0x1ed/0x3d0
          shrink_zone+0x2ca/0x2e0
          kswapd+0x51e/0x990
          kthread+0xd8/0xf0
          ret_from_fork+0x3f/0x70
      
      which corresponds to the following sanity check in the shadow node
      tracking:
      
        BUG_ON(node->count & RADIX_TREE_COUNT_MASK);
      
      The workingset code tracks radix tree nodes that exclusively contain
      shadow entries of evicted pages in them, and this (somewhat obscure)
      line checks whether there are real pages left that would interfere with
      reclaim of the radix tree node under memory pressure.
      
      While discussing ways how fuse might sneak pages into the radix tree
      past the workingset code, Miklos pointed to replace_page_cache_page(),
      and indeed there is a problem there: it properly accounts for the old
      page being removed - __delete_from_page_cache() does that - but then
      does a raw raw radix_tree_insert(), not accounting for the replacement
      page.  Eventually the page count bits in node->count underflow while
      leaving the node incorrectly linked to the shadow node LRU.
      
      To address this, make sure replace_page_cache_page() uses the tracked
      page insertion code, page_cache_tree_insert().  This fixes the page
      accounting and makes sure page-containing nodes are properly unlinked
      from the shadow node LRU again.
      
      Also, make the sanity checks a bit less obscure by using the helpers for
      checking the number of pages and shadows in a radix tree node.
      
      Fixes: 449dd698 ("mm: keep page cache radix tree nodes in check")
      Link: http://lkml.kernel.org/r/20160919155822.29498-1-hannes@cmpxchg.orgSigned-off-by: 's avatarJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: 's avatarAntonio SJ Musumeci <trapexit@spawn.link>
      Debugged-by: 's avatarMiklos Szeredi <miklos@szeredi.hu>
      Cc: <stable@vger.kernel.org>	[3.15+]
      Signed-off-by: 's avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: 's avatarLinus Torvalds <torvalds@linux-foundation.org>
      22f2ac51
  7. 28 Sep, 2016 1 commit
  8. 25 Sep, 2016 2 commits
    • Nikolay Aleksandrov's avatar
      ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route · 2cf75070
      Nikolay Aleksandrov authored
      Since the commit below the ipmr/ip6mr rtnl_unicast() code uses the portid
      instead of the previous dst_pid which was copied from in_skb's portid.
      Since the skb is new the portid is 0 at that point so the packets are sent
      to the kernel and we get scheduling while atomic or a deadlock (depending
      on where it happens) by trying to acquire rtnl two times.
      Also since this is RTM_GETROUTE, it can be triggered by a normal user.
      
      Here's the sleeping while atomic trace:
      [ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
      [ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
      [ 7858.212881] 2 locks held by swapper/0/0:
      [ 7858.213013]  #0:  (((&mrt->ipmr_expire_timer))){+.-...}, at: [<ffffffff810fbbf5>] call_timer_fn+0x5/0x350
      [ 7858.213422]  #1:  (mfc_unres_lock){+.....}, at: [<ffffffff8161e005>] ipmr_expire_process+0x25/0x130
      [ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
      [ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [ 7858.214108]  0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
      [ 7858.214412]  ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
      [ 7858.214716]  000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
      [ 7858.215251] Call Trace:
      [ 7858.215412]  <IRQ>  [<ffffffff813a7804>] dump_stack+0x85/0xc1
      [ 7858.215662]  [<ffffffff810a4a72>] ___might_sleep+0x192/0x250
      [ 7858.215868]  [<ffffffff810a4b9f>] __might_sleep+0x6f/0x100
      [ 7858.216072]  [<ffffffff8165bea3>] mutex_lock_nested+0x33/0x4d0
      [ 7858.216279]  [<ffffffff815a7a5f>] ? netlink_lookup+0x25f/0x460
      [ 7858.216487]  [<ffffffff8157474b>] rtnetlink_rcv+0x1b/0x40
      [ 7858.216687]  [<ffffffff815a9a0c>] netlink_unicast+0x19c/0x260
      [ 7858.216900]  [<ffffffff81573c70>] rtnl_unicast+0x20/0x30
      [ 7858.217128]  [<ffffffff8161cd39>] ipmr_destroy_unres+0xa9/0xf0
      [ 7858.217351]  [<ffffffff8161e06f>] ipmr_expire_process+0x8f/0x130
      [ 7858.217581]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217785]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217990]  [<ffffffff810fbc95>] call_timer_fn+0xa5/0x350
      [ 7858.218192]  [<ffffffff810fbbf5>] ? call_timer_fn+0x5/0x350
      [ 7858.218415]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.218656]  [<ffffffff810fde10>] run_timer_softirq+0x260/0x640
      [ 7858.218865]  [<ffffffff8166379b>] ? __do_softirq+0xbb/0x54f
      [ 7858.219068]  [<ffffffff816637c8>] __do_softirq+0xe8/0x54f
      [ 7858.219269]  [<ffffffff8107a948>] irq_exit+0xb8/0xc0
      [ 7858.219463]  [<ffffffff81663452>] smp_apic_timer_interrupt+0x42/0x50
      [ 7858.219678]  [<ffffffff816625bc>] apic_timer_interrupt+0x8c/0xa0
      [ 7858.219897]  <EOI>  [<ffffffff81055f16>] ? native_safe_halt+0x6/0x10
      [ 7858.220165]  [<ffffffff810d64dd>] ? trace_hardirqs_on+0xd/0x10
      [ 7858.220373]  [<ffffffff810298e3>] default_idle+0x23/0x190
      [ 7858.220574]  [<ffffffff8102a20f>] arch_cpu_idle+0xf/0x20
      [ 7858.220790]  [<ffffffff810c9f8c>] default_idle_call+0x4c/0x60
      [ 7858.221016]  [<ffffffff810ca33b>] cpu_startup_entry+0x39b/0x4d0
      [ 7858.221257]  [<ffffffff8164f995>] rest_init+0x135/0x140
      [ 7858.221469]  [<ffffffff81f83014>] start_kernel+0x50e/0x51b
      [ 7858.221670]  [<ffffffff81f82120>] ? early_idt_handler_array+0x120/0x120
      [ 7858.221894]  [<ffffffff81f8243f>] x86_64_start_reservations+0x2a/0x2c
      [ 7858.222113]  [<ffffffff81f8257c>] x86_64_start_kernel+0x13b/0x14a
      
      Fixes: 2942e900 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
      Signed-off-by: 's avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      2cf75070
    • Dave Chinner's avatar
      fault_in_multipages_readable() throws set-but-unused error · 90b75db6
      Dave Chinner authored
      When building XFS with -Werror, it now fails with:
      
        include/linux/pagemap.h: In function 'fault_in_multipages_readable':
        include/linux/pagemap.h:602:16: error: variable 'c' set but not used [-Werror=unused-but-set-variable]
          volatile char c;
                        ^
      
      This is a regression caused by commit e23d4159 ("fix
      fault_in_multipages_...() on architectures with no-op access_ok()").
      Fix it by re-adding the "(void)c" trick taht was previously used to make
      the compiler think the variable is used.
      Signed-off-by: 's avatarDave Chinner <dchinner@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: 's avatarLinus Torvalds <torvalds@linux-foundation.org>
      90b75db6
  9. 22 Sep, 2016 1 commit
  10. 20 Sep, 2016 1 commit
    • Al Viro's avatar
      fix fault_in_multipages_...() on architectures with no-op access_ok() · e23d4159
      Al Viro authored
      Switching iov_iter fault-in to multipages variants has exposed an old
      bug in underlying fault_in_multipages_...(); they break if the range
      passed to them wraps around.  Normally access_ok() done by callers will
      prevent such (and it's a guaranteed EFAULT - ERR_PTR() values fall into
      such a range and they should not point to any valid objects).
      
      However, on architectures where userland and kernel live in different
      MMU contexts (e.g. s390) access_ok() is a no-op and on those a range
      with a wraparound can reach fault_in_multipages_...().
      
      Since any wraparound means EFAULT there, the fix is trivial - turn
      those
      
          while (uaddr <= end)
      	    ...
      into
      
          if (unlikely(uaddr > end))
      	    return -EFAULT;
          do
      	    ...
          while (uaddr <= end);
      Reported-by: 's avatarJan Stancek <jstancek@redhat.com>
      Tested-by: 's avatarJan Stancek <jstancek@redhat.com>
      Cc: stable@vger.kernel.org # v3.5+
      Signed-off-by: 's avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: 's avatarLinus Torvalds <torvalds@linux-foundation.org>
      e23d4159
  11. 19 Sep, 2016 2 commits
  12. 17 Sep, 2016 1 commit
  13. 14 Sep, 2016 1 commit
  14. 13 Sep, 2016 1 commit