  1. Apr 04, 2011
  2. Mar 31, 2011
  3. Mar 24, 2011
  4. Mar 23, 2011
    • bitops: introduce CONFIG_GENERIC_FIND_BIT_LE · 0664996b
      Akinobu Mita authored
      
      This introduces CONFIG_GENERIC_FIND_BIT_LE to tell whether to use generic
      implementation of find_*_bit_le() in lib/find_next_bit.c or not.
      
      For now we select CONFIG_GENERIC_FIND_BIT_LE for all architectures which
      enable CONFIG_GENERIC_FIND_NEXT_BIT.
      
      However, m68knommu wants to define its own faster find_next_zero_bit_le()
      while continuing to use the generic find_next_{,zero_}bit()
      (CONFIG_GENERIC_FIND_NEXT_BIT and !CONFIG_GENERIC_FIND_BIT_LE).
      
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0664996b
    • asm-generic: change little-endian bitops to take any pointer types · a56560b3
      Akinobu Mita authored
      
      This makes the little-endian bitops take any pointer types by changing the
      prototypes and adding casts in the preprocessor macros.
      
      That would seem to at least make all the filesystem code happier, and they
      can continue to do just something like
      
        #define ext2_set_bit __test_and_set_bit_le
      
      (or whatever the exact sequence ends up being).
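
      The effect of accepting any pointer type can be sketched in userspace
      as follows. This is only an illustration, not the kernel's
      implementation: the sketch_* names are hypothetical, the operation is
      non-atomic, and the bitmap is addressed byte by byte so that bit
      numbering is little-endian on any host.

```c
/*
 * Hypothetical userspace stand-in for a little-endian test-and-set.
 * Bit nr lives in byte nr / 8 at bit position nr % 8, regardless of
 * the host's endianness. Taking "void *" lets callers pass any buffer.
 */
static inline int sketch_test_and_set_bit_le(int nr, void *addr)
{
	unsigned char *p = (unsigned char *)addr + (nr >> 3);
	unsigned char mask = (unsigned char)(1u << (nr & 7));
	int old = (*p & mask) != 0;

	*p |= mask;
	return old;
}

/* A filesystem can alias the helper without caring about buffer types. */
#define sketch_ext2_set_bit sketch_test_and_set_bit_le
```

      Because the parameter is "void *", both an "unsigned int" array and a
      "char" buffer can be passed without a cast at the call site.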
      
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a56560b3
    • asm-generic: rename generic little-endian bitops functions · c4945b9e
      Akinobu Mita authored
      
      In preparation for providing little-endian bitops for all architectures,
      this renames the generic implementation of the little-endian bitops
      (removing the "generic_" prefix and moving "_le" to the end of the name):
      
      s/generic_find_next_le_bit/find_next_bit_le/
      s/generic_find_next_zero_le_bit/find_next_zero_bit_le/
      s/generic_find_first_zero_le_bit/find_first_zero_bit_le/
      s/generic___test_and_set_le_bit/__test_and_set_bit_le/
      s/generic___test_and_clear_le_bit/__test_and_clear_bit_le/
      s/generic_test_le_bit/test_bit_le/
      s/generic___set_le_bit/__set_bit_le/
      s/generic___clear_le_bit/__clear_bit_le/
      s/generic_test_and_set_le_bit/test_and_set_bit_le/
      s/generic_test_and_clear_le_bit/test_and_clear_bit_le/
      
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c4945b9e
  5. Mar 22, 2011
    • zlib: slim down zlib_deflate() workspace when possible · 565d76cb
      Jim Keniston authored
      
      Instead of always creating a huge (268K) deflate_workspace with the
      maximum compression parameters (windowBits=15, memLevel=8), allow the
      caller to obtain a smaller workspace by specifying smaller parameter
      values.
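
      To get a feel for how the two parameters drive memory use, the rough
      estimate documented in zlib's zconf.h can be coded up as below. The
      function name is hypothetical, and the estimate deliberately leaves
      out the fixed-size deflate state, which is presumably why the 268K
      figure above is larger than the 256K this yields for the maximum
      parameters.

```c
#include <stddef.h>

/*
 * Rough deflate memory estimate from zlib's zconf.h:
 *   (1 << (windowBits + 2)) + (1 << (memLevel + 9)) bytes,
 * not counting the fixed-size state structures.
 */
static size_t sketch_deflate_mem(int window_bits, int mem_level)
{
	/* A negative windowBits selects raw deflate; the size uses its magnitude. */
	if (window_bits < 0)
		window_bits = -window_bits;

	return ((size_t)1 << (window_bits + 2)) +
	       ((size_t)1 << (mem_level + 9));
}
```

      With windowBits=15 and memLevel=8 this gives 128K + 128K; dropping to,
      say, windowBits=11 and memLevel=4 gives 8K + 8K.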
      
      For example, when capturing oops and panic reports to a medium with
      limited capacity, such as NVRAM, compression may be the only way to
      capture the whole report.  In this case, a small workspace (24K works
      fine) is a win, whether you allocate the workspace when you need it (i.e.,
      during an oops or panic) or at boot time.
      
      I've verified that this patch works with all accepted values of windowBits
      (positive and negative), memLevel, and compression level.
      
      Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      565d76cb
    • kstrto*: converting strings to integers done (hopefully) right · 33ee3b2e
      Alexey Dobriyan authored
      
      1. simple_strto*() do not contain overflow checks and use the crufty
         libc way to indicate failure.
      2. strict_strto*() also do not have overflow checks, but the name and
         comments pretend they do.
      3. Both families have only "long long" and "long" variants,
         but users want strtou8().
      4. Both the "simple" and "strict" prefixes are wrong:
         "simple" doesn't say what exactly is so simple, and "strict" should
         not exist because conversion should be strict by default.
      
      The solution is to use the "k" prefix and add converters for more types.
      Enter
      	kstrtoull()
      	kstrtoll()
      	kstrtoul()
      	kstrtol()
      	kstrtouint()
      	kstrtoint()
      
      	kstrtou64()
      	kstrtos64()
      	kstrtou32()
      	kstrtos32()
      	kstrtou16()
      	kstrtos16()
      	kstrtou8()
      	kstrtos8()
      
      Include runtime testsuite (somewhat incomplete) as well.
      
      strict_strto*() become deprecated, stubbed to kstrto*() and
      eventually will be removed altogether.
      
      Use kstrto*() in code today!
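
      The behaviour being asked for (reject garbage, reject overflow, report
      both through the return value) can be approximated in userspace as
      below. This is a sketch built on strtoull(), not the kernel code;
      sketch_strtou8() is a hypothetical name, and accepting one trailing
      newline is an assumption modelled on how such helpers are typically
      used with sysfs-style input.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical userspace approximation of a strict string-to-u8 helper.
 * Returns 0 on success, -EINVAL on malformed input, -ERANGE on overflow.
 * *res is written only on success.
 */
static int sketch_strtou8(const char *s, int base, unsigned char *res)
{
	char *end;
	unsigned long long val;

	if (*s == '+')
		s++;
	/* strtoull() would skip blanks and accept '-'; a strict parser must not. */
	if (!isalnum((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	val = strtoull(s, &end, base);
	if (end == s)
		return -EINVAL;		/* no digits consumed */
	if (*end == '\n')
		end++;			/* tolerate a single trailing newline */
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */
	if (errno == ERANGE || val > UCHAR_MAX)
		return -ERANGE;		/* does not fit in the target type */

	*res = (unsigned char)val;
	return 0;
}
```

      The point of the design is that the caller checks a single return
      code instead of inspecting an end pointer and guessing about overflow.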
      
      Note: on some archs _kstrtoul() and _kstrtol() are left in the tree, even
            if they'll be unused at runtime. This is a temporary solution,
            because I don't want to hardcode the list of archs where these
            functions aren't needed. The current solution with sizeof() and
            __alignof__ at least always works.
      
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      33ee3b2e
    • printk: allow setting DEFAULT_MESSAGE_LEVEL via Kconfig · 5af5bcb8
      Mandeep Singh Baines authored
      
      We've been burned by regressions/bugs which we later realized could have
      been triaged quicker if only we'd paid closer attention to dmesg.  To make
      it easier to audit dmesg, we'd like to make DEFAULT_MESSAGE_LEVEL
      Kconfig-settable.  That way we can set it to KERN_NOTICE and audit any
      messages <= KERN_WARNING.
      
      Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Joe Perches <joe@perches.com>
      Cc: Olof Johansson <olofj@chromium.org>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5af5bcb8
    • printk: use %pK for /proc/kallsyms and /proc/modules · 9f36e2c4
      Kees Cook authored
      
      In an effort to reduce kernel address leaks that might be used to help
      target kernel privilege escalation exploits, this patch uses %pK when
      displaying addresses in /proc/kallsyms, /proc/modules, and
      /sys/module/*/sections/*.
      
      Note that this changes %x to %p, so some legitimately 0 values in
      /proc/kallsyms would have changed from 00000000 to "(null)".  To avoid
      this, "(null)" is not used when using the "K" format.  Anything that was
      already successfully parsing "(null)" in addition to full hex digits
      should have no problem with this change.  (Thanks to Joe Perches for the
      suggestion.) Due to the %x to %p, "void *" casts are needed since these
      addresses are already "unsigned long" everywhere internally, due to their
      starting life as ELF section offsets.
      
      Signed-off-by: Kees Cook <kees.cook@canonical.com>
      Cc: Eugene Teo <eugene@redhat.com>
      Cc: Dan Rosenberg <drosenberg@vsecurity.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9f36e2c4
    • vsprintf: neaten %pK kptr_restrict, save a bit of code space · 26297607
      Joe Perches authored
      
      If kptr restrictions are on, just set the passed pointer to NULL.
      
      $ size lib/vsprintf.o.*
         text	   data	    bss	    dec	    hex	filename
         8247	      4	      2	   8253	   203d	lib/vsprintf.o.new
         8282	      4	      2	   8288	   2060	lib/vsprintf.o.old
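
      The simplification can be pictured with a small userspace sketch:
      rather than having a separate output path for the restricted case,
      the pointer is replaced with NULL and the ordinary formatting code
      runs. The function name and the "restricted" flag are hypothetical
      stand-ins, and printing NULL as zero-padded hex (rather than
      "(null)") mirrors what the %pK commit message describes.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical illustration: when restrictions apply, swap in NULL and
 * fall through to the one normal formatting path.
 */
static int sketch_format_kptr(char *buf, size_t len, const void *ptr,
			      int restricted)
{
	if (restricted)
		ptr = NULL;

	/* Fixed-width hex, so a hidden pointer reads as all zeros. */
	return snprintf(buf, len, "%0*" PRIxPTR,
			(int)(2 * sizeof(void *)), (uintptr_t)ptr);
}
```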
      
      Signed-off-by: Joe Perches <joe@perches.com>
      Cc: Dan Rosenberg <drosenberg@vsecurity.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      26297607
    • kernel/watchdog.c: allow hardlockup to panic by default · fef2c9bc
      Don Zickus authored
      
      When a cpu is considered stuck, instead of limping along and just printing
      a warning, it is sometimes preferred to just panic, let kdump capture the
      vmcore and reboot.  This gets the machine back into a stable state quickly
      while saving the info that got it into a stuck state to begin with.
      
      Add a Kconfig option to allow users to set the hardlockup to panic
      by default.  Also add in a 'nmi_watchdog=nopanic' to override this.
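
      How such an override might be parsed is sketched below in userspace
      C. All names are hypothetical, the default value stands in for
      whatever the Kconfig option selects, and accepting "panic" as the
      opposite setting is an assumption. Note that each strncmp() length
      has to match the literal it compares against, which is the kind of
      detail the fix noted below concerns.

```c
#include <string.h>

/* Stand-in for a Kconfig-selected default; 1 means "panic on hard lockup". */
static int sketch_hardlockup_panic = 1;

/* Hypothetical parser for the value following "nmi_watchdog=". */
static void sketch_nmi_watchdog_setup(const char *str)
{
	if (!strncmp(str, "nopanic", 7))
		sketch_hardlockup_panic = 0;
	else if (!strncmp(str, "panic", 5))
		sketch_hardlockup_panic = 1;
	/* Anything else leaves the configured default untouched. */
}
```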
      
      [akpm@linux-foundation.org: fix strncmp length]
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fef2c9bc
    • oom: suppress nodes that are not allowed from meminfo on oom kill · ddd588b5
      David Rientjes authored
      
      The oom killer is extremely verbose for machines with a large number of
      cpus and/or nodes.  This verbosity can often be harmful if it causes other
      important messages to be scrolled from the kernel log and incurs a
      significant time delay, specifically for kernels with CONFIG_NODES_SHIFT >
      8.
      
      This patch causes memory information to be displayed only for nodes that
      are allowed by current's cpuset when dumping the VM state.  Information
      for all other nodes is irrelevant to the oom condition; we don't care if
      there's an abundance of memory elsewhere if we can't access it.
      
      This only affects the behavior of dumping memory information when an oom
      is triggered.  Other dumps, such as for sysrq+m, still display the
      unfiltered form when using the existing show_mem() interface.
      
      Additionally, the per-cpu pageset statistics are extremely verbose in oom
      killer output, so they are now suppressed.  This removes
      
      	nodes_weight(current->mems_allowed) * (1 + nr_cpus)
      
      lines from the oom killer output.
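
      As a worked example of that formula, the sketch below counts the set
      bits in a node mask (a stand-in for nodes_weight()) and multiplies by
      (1 + nr_cpus). The function names and the use of a single
      "unsigned long" as the node mask are simplifying assumptions.

```c
/* Count set bits in a node mask; a stand-in for nodes_weight(). */
static unsigned int sketch_nodes_weight(unsigned long mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)	/* clear the lowest set bit each pass */
		n++;
	return n;
}

/* Lines dropped from the oom report: one header plus one line per cpu, per node. */
static unsigned long sketch_suppressed_lines(unsigned long mems_allowed,
					     unsigned int nr_cpus)
{
	return (unsigned long)sketch_nodes_weight(mems_allowed) * (1 + nr_cpus);
}
```

      For a task allowed on 4 nodes of a 64-cpu machine this comes to
      4 * 65 = 260 lines.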
      
      Callers may use __show_mem(SHOW_MEM_FILTER_NODES) to filter disallowed
      nodes.
      
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ddd588b5
  6. Mar 21, 2011
  7. Mar 11, 2011
    • plist: Add priority list test · 6d55da53
      Lai Jiangshan authored
      
      Add test code for checking plist when the kernel is booting.
      
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4D107986.1010302@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      6d55da53
    • plist: Shrink struct plist_head · bf6a9b83
      Lai Jiangshan authored
      
      struct plist_head is used in struct task_struct as well as struct
      rtmutex. If we can make it smaller, these structures will become
      smaller as well.
      
      The field prio_list in struct plist_head is seldom used and we can get
      its information from the plist_nodes. Removing this field will decrease
      the size of plist_head by half.
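
      The size claim is easy to check with a stripped-down userspace model.
      The struct names below are hypothetical and any debug-only members
      are ignored; the sketch only shows that a head made of one list
      anchor is half the size of a head made of two.

```c
/* Minimal doubly linked list anchor, modelled on a kernel-style list head. */
struct sketch_list_head {
	struct sketch_list_head *next, *prev;
};

/* Before: the head carries both a priority list and a node list. */
struct sketch_plist_head_old {
	struct sketch_list_head prio_list;
	struct sketch_list_head node_list;
};

/* After: only the node list remains in the head. */
struct sketch_plist_head_new {
	struct sketch_list_head node_list;
};
```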
      
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4D107982.9090700@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      bf6a9b83
    • lib: add shared BCH ECC library · 437aa565
      Ivan Djelic authored
      
      This is a new software BCH encoding/decoding library, similar to the shared
      Reed-Solomon library.
      
      Binary BCH (Bose-Chaudhuri-Hocquenghem) codes are widely used to correct
      errors in NAND flash devices requiring more than 1-bit ecc correction; they
      are generally better suited for NAND flash than RS codes because NAND bit
      errors do not occur in bursts. Latest SLC NAND devices typically require at
      least 4-bit ecc protection per 512 bytes block.
      
      This library provides software encoding/decoding, but may also be used with
      ASIC/SoC hardware BCH engines to perform error correction. It is currently
      being used for this purpose on an OMAP3630 board (4bit/8bit HW BCH). It
      has also been used to decode raw dumps of NAND devices with on-die BCH ecc
      engines (e.g. Micron 4bit ecc SLC devices).
      
      Latest NAND devices (including SLC) can exhibit high error rates (typically
      a dozen or more bitflips per hour during stress tests); in order to
      minimize the performance impact of error correction, this library
      implements recently developed algorithms for fast polynomial root finding
      (see bch.c header for details) instead of the traditional exhaustive Chien
      root search; a few performance figures are provided below:
      
      Platform: arm926ejs @ 468 MHz, 32 KiB icache, 16 KiB dcache
      BCH ecc : 4-bit per 512 bytes
      
      Encoding average throughput: 250 Mbits/s
      
      Error correction time (compared with Chien search):
      
              average   worst      average (Chien)  worst (Chien)
      ----------------------------------------------------------
      1 bit    8.5 µs   11 µs         200 µs           383 µs
      2 bit    9.7 µs   12.5 µs       477 µs           728 µs
      3 bit   18.1 µs   20.6 µs       758 µs          1010 µs
      4 bit   19.5 µs   23 µs        1028 µs          1280 µs
      
      In the above figures, "worst" is meant in terms of error pattern, not in
      terms of cache miss / page faults effects (not taken into account here).
      
      The library has been extensively tested on the following platforms: x86,
      x86_64, arm926ejs, omap3630, qemu-ppc64, qemu-mips.
      
      Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      437aa565
  8. Mar 08, 2011
    • debugobjects: Add hint for better object identification · 99777288
      Stanislaw Gruszka authored
      
      In complex subsystems like mac80211 structures can contain several
      timers and work structs, so identifying a specific instance from the
      call trace and object type output of debugobjects can be hard.
      
      Allow the subsystems which support debugobjects to provide a hint
      function. This function returns a pointer to a kernel address
      (preferably the object's callback function) which is printed along
      with the debugobjects type.
      
      Add hint methods for timer_list, work_struct and hrtimer.
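
      The shape of such a hint can be sketched in userspace as below. Every
      name here is a hypothetical stand-in (the descriptor, the timer type,
      the sample callback and the report format); the sketch only shows an
      optional per-type callback that maps an object to an address worth
      printing next to the type name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-type descriptor with an optional hint callback. */
struct sketch_obj_descr {
	const char *name;
	void *(*debug_hint)(void *addr);
};

/* Hypothetical timer-like object that carries a callback. */
struct sketch_timer {
	void (*function)(unsigned long data);
};

/* Sample callback, so the hint has something to point at. */
static void sketch_sample_callback(unsigned long data)
{
	(void)data;
}

/* Hint for timers: the callback identifies which timer this is. */
static void *sketch_timer_hint(void *addr)
{
	struct sketch_timer *t = addr;

	return (void *)(uintptr_t)t->function;
}

/* Report the type name plus the hint, if the type provides one. */
static int sketch_report(char *buf, size_t len,
			 const struct sketch_obj_descr *d, void *obj)
{
	void *hint = d->debug_hint ? d->debug_hint(obj) : NULL;

	return snprintf(buf, len, "object type: %s hint: %p", d->name, hint);
}
```

      With several timers embedded in one structure, the printed callback
      address is what tells them apart.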
      
      [ tglx: Massaged changelog, made it compile ]
      
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      LKML-Reference: <20110307085809.GA9334@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      99777288
  9. Mar 05, 2011
    • BKL: That's all, folks · 4ba8216c
      Arnd Bergmann authored
      
      This removes the implementation of the big kernel lock,
      at last. A lot of people have worked on this in the
      past, so the credit for this patch should be with
      everyone who participated in the hunt.
      
      The names on the Cc list are the people that were the
      most active in this, according to the recorded git
      history, in alphabetical order.
      
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Alan Cox <alan@linux.intel.com>
      Cc: Alessio Igor Bogani <abogani@texware.it>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Hendry <andrew.hendry@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Hans Verkuil <hverkuil@xs4all.nl>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Blunck <jblunck@infradead.org>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Oliver Neukum <oliver@neukum.org>
      Cc: Paul Menage <menage@google.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      4ba8216c
  10. Mar 04, 2011
  11. Mar 01, 2011
  12. Feb 28, 2011
  13. Feb 25, 2011
  14. Feb 24, 2011
  15. Feb 18, 2011
    • Expand CONFIG_DEBUG_LIST to several other list operations · 3c18d4de
      Linus Torvalds authored
      
      When list debugging is enabled, we aim to readably show list corruption
      errors, and the basic list_add/list_del operations end up having extra
      debugging code in them to do some basic validation of the list entries.
      
      However, "list_del_init()" and "list_move[_tail]()" ended up avoiding
      the debug code due to how they were written. This fixes that.
      
      So the _next_ time we have list_move() problems with stale list entries,
      we'll hopefully have an easier time finding them..
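
      The idea can be sketched in userspace: put the validation in one
      shared unlink helper and build the "del_init" and "move" variants on
      top of it, so none of them can bypass the check. The dbg_list names
      are hypothetical, and returning false stands in for the warning a
      debug build would print.

```c
#include <stdbool.h>

struct dbg_list {
	struct dbg_list *next, *prev;
};

static void dbg_list_init(struct dbg_list *h)
{
	h->next = h->prev = h;
}

static void dbg_list_add(struct dbg_list *entry, struct dbg_list *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Shared unlink step: refuse to touch the list if the neighbours disagree. */
static bool dbg_list_del_entry(struct dbg_list *e)
{
	if (e->prev->next != e || e->next->prev != e)
		return false;	/* stale or corrupted entry */
	e->next->prev = e->prev;
	e->prev->next = e->next;
	return true;
}

/* Both variants go through the checked helper above. */
static bool dbg_list_del_init(struct dbg_list *e)
{
	if (!dbg_list_del_entry(e))
		return false;
	dbg_list_init(e);
	return true;
}

static bool dbg_list_move(struct dbg_list *e, struct dbg_list *head)
{
	if (!dbg_list_del_entry(e))
		return false;
	dbg_list_add(e, head);
	return true;
}
```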
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3c18d4de
  16. Feb 07, 2011
  17. Feb 03, 2011
  18. Jan 27, 2011
  19. Jan 25, 2011
    • radix_tree: radix_tree_gang_lookup_tag_slot() may never return · ac15ee69
      Toshiyuki Okajima authored
      
      Executed command: fsstress -d /mnt -n 600 -p 850
      
        crash> bt
        PID: 7947   TASK: ffff880160546a70  CPU: 0   COMMAND: "fsstress"
         #0 [ffff8800dfc07d00] machine_kexec at ffffffff81030db9
         #1 [ffff8800dfc07d70] crash_kexec at ffffffff810a7952
         #2 [ffff8800dfc07e40] oops_end at ffffffff814aa7c8
         #3 [ffff8800dfc07e70] die_nmi at ffffffff814aa969
         #4 [ffff8800dfc07ea0] do_nmi_callback at ffffffff8102b07b
         #5 [ffff8800dfc07f10] do_nmi at ffffffff814aa514
         #6 [ffff8800dfc07f50] nmi at ffffffff814a9d60
            [exception RIP: __lookup_tag+100]
            RIP: ffffffff812274b4  RSP: ffff88016056b998  RFLAGS: 00000287
            RAX: 0000000000000000  RBX: 0000000000000002  RCX: 0000000000000006
            RDX: 000000000000001d  RSI: ffff88016056bb18  RDI: ffff8800c85366e0
            RBP: ffff88016056b9c8   R8: ffff88016056b9e8   R9: 0000000000000000
            R10: 000000000000000e  R11: ffff8800c8536908  R12: 0000000000000010
            R13: 0000000000000040  R14: ffffffffffffffc0  R15: ffff8800c85366e0
            ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
        <NMI exception stack>
         #7 [ffff88016056b998] __lookup_tag at ffffffff812274b4
         #8 [ffff88016056b9d0] radix_tree_gang_lookup_tag_slot at ffffffff81227605
         #9 [ffff88016056ba20] find_get_pages_tag at ffffffff810fc110
        #10 [ffff88016056ba80] pagevec_lookup_tag at ffffffff81105e85
        #11 [ffff88016056baa0] write_cache_pages at ffffffff81104c47
        #12 [ffff88016056bbd0] generic_writepages at ffffffff81105014
        #13 [ffff88016056bbe0] do_writepages at ffffffff81105055
        #14 [ffff88016056bbf0] __filemap_fdatawrite_range at ffffffff810fb2cb
        #15 [ffff88016056bc40] filemap_write_and_wait_range at ffffffff810fb32a
        #16 [ffff88016056bc70] generic_file_direct_write at ffffffff810fb3dc
        #17 [ffff88016056bce0] __generic_file_aio_write at ffffffff810fcee5
        #18 [ffff88016056bda0] generic_file_aio_write at ffffffff810fd085
        #19 [ffff88016056bdf0] do_sync_write at ffffffff8114f9ea
        #20 [ffff88016056bf00] vfs_write at ffffffff8114fcf8
        #21 [ffff88016056bf30] sys_write at ffffffff81150691
        #22 [ffff88016056bf80] system_call_fastpath at ffffffff8100c0b2
      
      I think the root cause is the following:
      
       radix_tree_range_tag_if_tagged() always tags the root tag with settag
       if the root tag is set with iftag even if there are no iftag tags
       in the specified range (Of course, there are some iftag tags
       outside the specified range).
      
      ===============================================================================
      [[[Detailed description]]]
      
      (1) Why can radix_tree_gang_lookup_tag_slot() never return?

      __lookup_tag() can:
       - return 0, and
       - return, as the next index, an index which is not bigger than the one
         passed in as the input parameter.

      Therefore the following "while" loop repeats forever: under the above
      conditions "ret" is not updated and cur_index never advances to a
      bigger value.

      (So radix_tree_gang_lookup_tag_slot() never returns.)
      
      radix_tree_gang_lookup_tag_slot():
      1178         while (ret < max_items) {
      1179                 unsigned int slots_found;
      1180                 unsigned long next_index;       /* Index of next search */
      1181
      1182                 if (cur_index > max_index)
      1183                         break;
      1184                 slots_found = __lookup_tag(node, results + ret,
      1185                                 cur_index, max_items - ret, &next_index, tag);
      1186                 ret += slots_found;
      			// cannot update ret because slots_found == 0.
      			// so, this while loops forever.
      1187                 if (next_index == 0)
      1188                         break;
      1189                 cur_index = next_index;
      1190         }
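
      The hang can be reproduced in miniature with a userspace sketch. The
      stub below is hypothetical and simply models the bad case described
      above (nothing found, next index not advanced); an iteration cap is
      added only so the demonstration terminates.

```c
/*
 * Hypothetical stub modelling the bad case: no slots are found and the
 * reported next index is the same as the index that was passed in.
 */
static unsigned int sketch_stub_lookup(unsigned long cur,
				       unsigned long *next_index)
{
	*next_index = cur;
	return 0;
}

/*
 * Same loop shape as the listing above, plus an iteration cap.
 * Returns how many iterations ran before the loop stopped.
 */
static unsigned long sketch_gang_lookup(unsigned long cur_index,
					unsigned int max_items,
					unsigned long max_iters)
{
	unsigned int ret = 0;
	unsigned long iters = 0;

	while (ret < max_items && iters < max_iters) {
		unsigned long next_index;

		iters++;
		ret += sketch_stub_lookup(cur_index, &next_index);
		if (next_index == 0)
			break;
		cur_index = next_index;	/* no progress: same value again */
	}
	return iters;
}
```

      Starting from a non-zero index the loop only stops because of the
      cap; without it, neither exit condition can ever become true.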
      
      (2) Why does __lookup_tag() return 0 without updating the index?

      Assume the following:
        - one of the slots in a radix_tree_node is NULL.
        - the tag which corresponds to that slot is set with
          PAGECACHE_TAG_TOWRITE or another tag.
        - at a certain height (!= 0), the corresponding index is 0.
      
      a) __lookup_tag() notices that the tag is set.
      
      1005 static unsigned int
      1006 __lookup_tag(struct radix_tree_node *slot, void ***results, unsigned long index,
      1007         unsigned int max_items, unsigned long *next_index, unsigned int tag)
      1008 {
      1009         unsigned int nr_found = 0;
      1010         unsigned int shift, height;
      1011
      1012         height = slot->height;
      1013         if (height == 0)
      1014                 goto out;
      1015         shift = (height-1) * RADIX_TREE_MAP_SHIFT;
      1016
      1017         while (height > 0) {
      1018                 unsigned long i = (index >> shift) & RADIX_TREE_MAP_MASK ;
      1019
      1020                 for (;;) {
      1021                         if (tag_get(slot, tag, i))
      1022                                 break;
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      * the index is not updated yet.
      
      b) __lookup_tag() notices that the slot is NULL.
      
      1023                         index &= ~((1UL << shift) - 1);
      1024                         index += 1UL << shift;
      1025                         if (index == 0)
      1026                                 goto out;       /* 32-bit wraparound */
      1027                         i++;
      1028                         if (i == RADIX_TREE_MAP_SIZE)
      1029                                 goto out;
      1030                 }
      1031                 height--;
      1032                 if (height == 0) {      /* Bottom level: grab some items */
      ...
      1055                 }
      1056                 shift -= RADIX_TREE_MAP_SHIFT;
      1057                 slot = rcu_dereference_raw(slot->slots[i]);
      1058                 if (slot == NULL)
      1059                         break;
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      
      c) __lookup_tag() doesn't update the index and returns 0.
      
      1060         }
      1061 out:
      1062         *next_index = index;
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      1063         return nr_found;
      1064 }
      
      (3) Why is the slot NULL even if the tag is set?
      
      Because radix_tree_range_tag_if_tagged() always sets the root tag with
      PAGECACHE_TAG_TOWRITE if the root tag is set with PAGECACHE_TAG_DIRTY,
      even if there is no tag which can be set with PAGECACHE_TAG_TOWRITE
      in the specified range (from *first_indexp to last_index). Of course,
      some PAGECACHE_TAG_DIRTY nodes must exist outside the specified range.
      (radix_tree_range_tag_if_tagged() is called only from tag_pages_for_writeback())
      
       640 unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root,
       641                 unsigned long *first_indexp, unsigned long last_index,
       642                 unsigned long nr_to_tag,
       643                 unsigned int iftag, unsigned int settag)
       644 {
       645         unsigned int height = root->height;
       646         struct radix_tree_path path[height];
       647         struct radix_tree_path *pathp = path;
       648         struct radix_tree_node *slot;
       649         unsigned int shift;
       650         unsigned long tagged = 0;
       651         unsigned long index = *first_indexp;
       652
       653         last_index = min(last_index, radix_tree_maxindex(height));
       654         if (index > last_index)
       655                 return 0;
       656         if (!nr_to_tag)
       657                 return 0;
       658         if (!root_tag_get(root, iftag)) {
       659                 *first_indexp = last_index + 1;
       660                 return 0;
       661         }
       662         if (height == 0) {
       663                 *first_indexp = last_index + 1;
       664                 root_tag_set(root, settag);
       665                 return 1;
       666         }
      ...
       733         root_tag_set(root, settag);
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       734         *first_indexp = index;
       735
       736         return tagged;
       737 }
      
      As a result, no radix_tree_node is tagged with PAGECACHE_TAG_TOWRITE,
      but the root tag (radix_tree_root) is set with PAGECACHE_TAG_TOWRITE.
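
      One way to keep the root tag consistent with the nodes below it is to
      set it only when something in the range was actually tagged. The
      sketch below illustrates that guard in userspace; it is an
      assumption made for illustration and is not taken from the text of
      this commit, and the sketch_root type and bit layout (bit 0 DIRTY,
      bit 2 TOWRITE, matching the figure's legend) are hypothetical.

```c
/* Hypothetical root with one bit per tag (bit 0 DIRTY, bit 2 TOWRITE). */
struct sketch_root {
	unsigned int tags;
};

/*
 * Final step of a range-tagging pass: propagate settag to the root only
 * if at least one entry in the range received it.
 */
static unsigned long sketch_finish_tagging(struct sketch_root *root,
					   unsigned int settag,
					   unsigned long tagged)
{
	if (tagged > 0)
		root->tags |= 1u << settag;
	return tagged;
}
```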
      
      [figure: inside radix_tree]
      (Please see the figure with typewriter font)
      ===========================================
                [roottag = DIRTY]
                       |             tag=0:NOTHING
               tag[0 0 0 1]              1:DIRTY
                  [x x x +]              2:WRITEBACK
                         |               3:DIRTY,WRITEBACK
                         p               4:TOWRITE
                   <--->                 5:DIRTY,TOWRITE ...
           specified range (index: 0 to 2)
      
      * There is no DIRTY tag within the specified range.
       (But there is a DIRTY tag outside that range.)
      
                  | | | | | | | | |
          after calling tag_pages_for_writeback()
                  | | | | | | | | |
                  v v v v v v v v v
      
                [roottag = DIRTY,TOWRITE]
                       |                 p is "page".
               tag[0 0 0 1]              x is NULL.
                  [x x x +]              +- is a pointer to "page".
                         |
                         p
      
      * But TOWRITE tag is set on the root tag.
      ============================================
      
      After that, radix_tree_extend() is called via radix_tree_insert()
      when the page is added.
      This function sets PAGECACHE_TAG_TOWRITE on the new radix_tree_node
      so that it inherits the status of the root tag.
      
       246 static int radix_tree_extend(struct radix_tree_root *root, unsigned long index)
       247 {
       248         struct radix_tree_node *node;
       249         unsigned int height;
       250         int tag;
       251
       252         /* Figure out what the height should be.  */
       253         height = root->height + 1;
       254         while (index > radix_tree_maxindex(height))
       255                 height++;
       256
       257         if (root->rnode == NULL) {
       258                 root->height = height;
       259                 goto out;
       260         }
       261
       262         do {
       263                 unsigned int newheight;
       264                 if (!(node = radix_tree_node_alloc(root)))
       265                         return -ENOMEM;
       266
       267                 /* Increase the height.  */
       268                 node->slots[0] = radix_tree_indirect_to_ptr(root->rnode);
       269
       270                 /* Propagate the aggregated tag info into the new root */
       271                 for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) {
       272                         if (root_tag_get(root, tag))
       273                                 tag_set(node, tag, 0);
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       274                 }
      
      ===========================================
                [roottag = DIRTY,TOWRITE]
                       |     :
               tag[0 0 0 1] [0 0 0 0]
                  [x x x +] [+ x x x]
                         |   |
                         p   p (new page)
      
                  | | | | | | | | |
          after calling radix_tree_insert
                  | | | | | | | | |
                  v v v v v v v v v
      
                [roottag = DIRTY,TOWRITE]
                       |
               tag [5 0 0 0]    *  DIRTY and TOWRITE tags are
                   [+ + x x]       inherited by the new node.
                    | |
        tag [0 0 0 1] [0 0 0 0]
            [x x x +] [+ x x x]
                   |   |
                   p   p
      ============================================
      
      After that, the page at index 3 is released by remove_from_page_cache().
      This produces a state in which a tag has PAGECACHE_TAG_TOWRITE set
      while the slot that corresponds to that tag is NULL.
      ===========================================
                [roottag = DIRTY,TOWRITE]
                       |
               tag [5 0 0 0]
                   [+ + x x]
                    | |
        tag [0 0 0 1] [0 0 0 0]
            [x x x +] [+ x x x]
                   |   |
                   p   p
               (remove)
      
                  | | | | | | | | |
          after calling remove_from_page_cache()
                  | | | | | | | | |
                  v v v v v v v v v
      
                [roottag = DIRTY,TOWRITE]
                       |
               tag [4 0 0 0]      * Only the DIRTY tag is cleared
                   [x + x x]        because no TOWRITE tag exists
                      |             in the bottom node.
                      [0 0 0 0]
                      [+ x x x]
                       |
                       p
      ============================================
      
      To solve this problem, change radix_tree_range_tag_if_tagged() so that it
      does not set the root tag unless it has set at least one tag within the
      specified range, like this:
      ============================================
      unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root,
                      unsigned long *first_indexp, unsigned long last_index,
                      unsigned long nr_to_tag,
                      unsigned int iftag, unsigned int settag)
      {
              unsigned long tagged = 0;
      ...
              if (tagged)
              ^^^^^^^^^^^
                      root_tag_set(root, settag);
              *first_indexp = index;

              return tagged;
      }
      
      ============================================
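The effect of the fix can be sketched with a toy model. This is not the kernel's radix tree: `struct toy_tree`, `toy_range_tag_if_tagged()` and the flat 8-slot layout are hypothetical simplifications that keep only the per-slot tags and the aggregate root tag, to show the root tag being left alone when nothing in the range was tagged.

```c
#include <stddef.h>

#define NSLOTS      8   /* toy tree: a single flat level of 8 slots */
#define TAG_DIRTY   0
#define TAG_TOWRITE 1

/* Hypothetical, simplified stand-in for a tagged tree (not the kernel's). */
struct toy_tree {
	void *slots[NSLOTS];
	unsigned int slot_tags[NSLOTS];	/* per-slot tag bits */
	unsigned int root_tags;		/* aggregate tag bits on the root */
};

/*
 * Set settag on every populated slot in [first, last] that carries iftag.
 * The root tag is set only when at least one slot was actually tagged.
 */
static unsigned long toy_range_tag_if_tagged(struct toy_tree *t,
		size_t first, size_t last,
		unsigned int iftag, unsigned int settag)
{
	unsigned long tagged = 0;
	size_t i;

	for (i = first; i <= last && i < NSLOTS; i++) {
		if (t->slots[i] && (t->slot_tags[i] & (1u << iftag))) {
			t->slot_tags[i] |= 1u << settag;
			tagged++;
		}
	}

	if (tagged)
		t->root_tags |= 1u << settag;

	return tagged;
}
```

With a single DIRTY page at index 3, tagging the range 0 to 2 returns 0 and leaves the root tag as DIRTY only, so a node added later has no stale TOWRITE tag to inherit.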
      
      Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ac15ee69
    • Jesper Dangaard Brouer's avatar
      textsearch: doc - fix spelling in lib/textsearch.c. · de0368d5
      Jesper Dangaard Brouer authored
      
      Found the following spelling errors while reading the textsearch code:
        "facitilies"  -> "facilities"
        "continously" -> "continuously"
        "arbitary"    -> "arbitrary"
        "patern"      -> "pattern"
        "occurences"  -> "occurrences"
      
      I'll try to push this patch through DaveM, given that the only users
      of textsearch are in the net/ tree (nf_conntrack_amanda.c, xt_string.c
      and em_text.c)
      
      Signed-off-by: Jesper Sander <sander.contrib@gmail.com>
      Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      de0368d5
  20. Jan 24, 2011
  21. Jan 20, 2011
    • David Rientjes's avatar
      kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT · 6a108a14
      David Rientjes authored
      
      The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
      is used to configure any non-standard kernel with a much larger scope than
      only small devices.
      
      This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
      references to the option throughout the kernel.  A new CONFIG_EMBEDDED
      option is added that automatically selects CONFIG_EXPERT when enabled and
      can be used in the future to isolate options that should only be
      considered for embedded systems (RISC architectures, SLOB, etc).
      
      Calling the option "EXPERT" more accurately represents its intention: only
      expert users who understand the impact of the configuration changes they
      are making should enable it.
      
      Reviewed-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: David Woodhouse <david.woodhouse@intel.com>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Greg KH <gregkh@suse.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Robin Holt <holt@sgi.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6a108a14
  22. Jan 13, 2011
    • Lasse Collin's avatar
      decompressors: check input size in decompress_inflate.c · 1da914e0
      Lasse Collin authored
      
      Check for end of the input buffer when skipping over the filename field in
      the .gz file header.
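A minimal sketch of such a check, assuming a whole .gz member header in one buffer. The function name `gz_skip_header()` and its return convention are hypothetical and this is not the code from decompress_inflate.c; it handles only the optional filename field (FLG bit 0x08) and ignores the other optional header fields.

```c
#include <stddef.h>

#define GZ_FIXED_HDR  10	/* ID1 ID2 CM FLG MTIME(4) XFL OS */
#define GZ_FLG_OFFSET 3		/* position of the FLG byte */
#define GZ_FNAME      0x08	/* FLG bit: NUL-terminated file name follows */

/*
 * Return the offset of the first byte after the header, or -1 if the
 * input ends before the header (including the file name) is complete.
 */
static long gz_skip_header(const unsigned char *buf, size_t len)
{
	size_t pos = GZ_FIXED_HDR;

	if (len < GZ_FIXED_HDR)
		return -1;

	if (buf[GZ_FLG_OFFSET] & GZ_FNAME) {
		for (;;) {
			/* Check for end of input before every read. */
			if (pos >= len)
				return -1;
			if (buf[pos++] == 0)
				break;
		}
	}

	return (long)pos;
}
```

Without the `pos >= len` test, a truncated file whose name field lacks the terminating NUL would make the loop read past the end of the buffer.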
      
      Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alain Knaff <alain@knaff.lu>
      Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
      Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1da914e0
    • Lasse Collin's avatar
      decompressors: add boot-time XZ support · 3ebe1243
      Lasse Collin authored
      
      This implements the API defined in <linux/decompress/generic.h> which is
      used for kernel, initramfs, and initrd decompression.  This patch together
      with the first patch is enough for XZ-compressed initramfs and initrd;
      XZ-compressed kernel will need arch-specific changes.
      
      The buffering requirements described in decompress_unxz.c are stricter
      than with gzip, so the relevant changes should be done to the
      arch-specific code when adding support for XZ-compressed kernel.
      Similarly, the heap size in arch-specific pre-boot code may need to be
      increased (30 KiB is enough).
      
      The XZ decompressor needs memmove(), memeq() (memcmp() == 0), and
      memzero() (memset(ptr, 0, size)), which aren't available in all
      arch-specific pre-boot environments.  I'm including simple versions in
      decompress_unxz.c, but a cleaner solution would naturally be nicer.
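For illustration, byte-wise versions of the two non-standard helpers could look like the sketch below. The `sketch_` names are hypothetical and this is not the code shipped in decompress_unxz.c; memmove() is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Equivalent of memcmp(a, b, size) == 0. */
static bool sketch_memeq(const void *a, const void *b, size_t size)
{
	const uint8_t *x = a;
	const uint8_t *y = b;
	size_t i;

	for (i = 0; i < size; i++)
		if (x[i] != y[i])
			return false;

	return true;
}

/* Equivalent of memset(buf, 0, size). */
static void sketch_memzero(void *buf, size_t size)
{
	uint8_t *b = buf;
	size_t i;

	for (i = 0; i < size; i++)
		b[i] = 0;
}
```

Plain byte loops like these favour small size and portability over speed, which is the usual trade-off in a pre-boot environment.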
      
      Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alain Knaff <alain@knaff.lu>
      Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
      Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3ebe1243
    • Lasse Collin's avatar
      decompressors: add XZ decompressor module · 24fa0402
      Lasse Collin authored
      In userspace, the .lzma format has become mostly a legacy file format that
      got superseded by the .xz format.  Similarly, LZMA Utils was superseded by
      XZ Utils.
      
      These patches add support for XZ decompression into the kernel.  Most of
      the code is as is from XZ Embedded <http://tukaani.org/xz/embedded.html>.
      It was written for the Linux kernel but is usable in other projects too.
      
      Advantages of XZ over the current LZMA code in the kernel:
        - Nice API that can be used by other kernel modules; it's
          not limited to kernel, initramfs, and initrd decompression.
        - Integrity check support (CRC32)
        - BCJ filters improve compression of executable code on
          certain architectures. These together with LZMA2 can
          produce a few percent smaller kernel or Squashfs images
          than plain LZMA without making the decompression slower.
      
      This patch: Add the main decompression code (xz_dec), testing module
      (xz_dec_test), wrapper script (xz_wrap.sh) for the xz command line tool,
      and documentation.  The xz_dec module is enough to have a usable XZ
      decompressor e.g.  for Squashfs.
      
      Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alain Knaff <alain@knaff.lu>
      Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
      Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      24fa0402
    • Lasse Collin's avatar
      Decompressors: fix callback-to-callback mode in decompress_unlzo.c · fb7fa589
      Lasse Collin authored
      
      Callback-to-callback decompression mode is used for initrd (not
      initramfs).  The LZO wrapper is broken for this use case for two reasons:
      
        - The argument validation is needlessly strict: it requires
          that "posp" is non-NULL when "fill" is non-NULL.
      
        - The buffer handling code didn't work at all for this
          use case.
      
      I tested with LZO-compressed kernel, initramfs, initrd, and corrupt
      (truncated) initramfs and initrd images.
      
      Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alain Knaff <alain@knaff.lu>
      Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
      Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fb7fa589
    • Lasse Collin's avatar
      Decompressors: check input size in decompress_unlzo.c · 5a3f81a7
      Lasse Collin authored
      
      The code assumes that the input is valid and not truncated.  Add checks to
      avoid reading past the end of the input buffer.  Change the type of "skip"
      from u8 to int to fix a possible integer overflow.
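Why the narrow type matters can be seen in a small sketch. The helpers below are hypothetical and do not reproduce the header parsing in decompress_unlzo.c; they only assume that a skip count is accumulated from several 8-bit length fields.

```c
#include <stdint.h>

/* Accumulate in an 8-bit counter: the sum is reduced modulo 256. */
static uint8_t total_skip_u8(const uint8_t *lens, int n)
{
	uint8_t skip = 0;
	int i;

	for (i = 0; i < n; i++)
		skip += lens[i];	/* truncated back to 8 bits */

	return skip;
}

/* Accumulate in an int: the full sum is preserved. */
static int total_skip_int(const uint8_t *lens, int n)
{
	int skip = 0;
	int i;

	for (i = 0; i < n; i++)
		skip += lens[i];

	return skip;
}
```

For the lengths 200 and 100, the 8-bit version returns 44 while the int version returns 300, so a parser using the narrow type would skip too few bytes and misinterpret the rest of the input.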
      
      Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alain Knaff <alain@knaff.lu>
      Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
      Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5a3f81a7