1. 09 Dec, 2006 1 commit
  2. 08 Dec, 2006 1 commit
  3. 06 Dec, 2006 4 commits
    • Adrian Bunk's avatar
      [PATCH] i386: Clean up smp_tune_scheduling() · d9408cef
      Adrian Bunk authored
      
      
      - remove the write-only local variable "bandwidth"
      - don't set "max_cache_size" in the (cachesize < 0) case:
        that's already handled in kernel/sched.c:measure_migration_cost()
      Signed-off-by: default avatarAdrian Bunk <bunk@stusta.de>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      d9408cef
    • Siddha, Suresh B's avatar
      [PATCH] x86: fix the irqbalance quirk for E7320/E7520/E7525 · b0d0a4ba
      Siddha, Suresh B authored
      Move the irqbalance quirks for E7320/E7520/E7525(Errata 23 in
      http://download.intel.com/design/chipsets/specupdt/30304203.pdf
      
      ) to early
      quirks.
      
      And add a PCI quirk for these platforms to check(which happens very late
      during the boot) if the APIC routing is indeed set to default flat mode.
      
      This fixes the breakage(in x86_64) of this quirk due to cpu hotplug which
      selects physical mode instead of the logical flat(as needed for this errata
      workaround).
      Signed-off-by: default avatarSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "Li, Shaohua" <shaohua.li@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      b0d0a4ba
    • Rusty Russell's avatar
      [PATCH] paravirt: header and stubs for paravirtualisation · d3561b7f
      Rusty Russell authored
      
      
      Create a paravirt.h header for all the critical operations which need to be
      replaced with hypervisor calls, and include that instead of defining native
      operations, when CONFIG_PARAVIRT.
      
      This patch does the dumbest possible replacement of paravirtualized
      instructions: calls through a "paravirt_ops" structure.  Currently these are
      function implementations of native hardware: hypervisors will override the ops
      structure with their own variants.
      
      All the pv-ops functions are declared "fastcall" so that a specific
      register-based ABI is used, to make inlining assember easier.
      
      And:
      
      +From: Andy Whitcroft <apw@shadowen.org>
      
      The paravirt ops introduce a 'weak' attribute onto memory_setup().
      Code ordering leads to the following warnings on x86:
      
          arch/i386/kernel/setup.c:651: warning: weak declaration of
                      `memory_setup' after first use results in unspecified behavior
      
      Move memory_setup() to avoid this.
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarChris Wright <chrisw@sous-sol.org>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Zachary Amsden <zach@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarAndy Whitcroft <apw@shadowen.org>
      d3561b7f
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: Initialize the per-CPU data area · 62111195
      Jeremy Fitzhardinge authored
      
      
      When a CPU is brought up, a PDA and GDT are allocated for it.  The GDT's
      __KERNEL_PDA entry is pointed to the allocated PDA memory, so that all
      references using this segment descriptor will refer to the PDA.
      
      This patch rearranges CPU initialization a bit, so that the GDT/PDA are set up
      as early as possible in cpu_init().  Also for secondary CPUs, GDT+PDA are
      preallocated and initialized so all the secondary CPU needs to do is set up
      the ldt and load %gs.  This will be important once smp_processor_id() and
      current use the PDA.
      
      In all cases, the PDA is set up in head.S, before a CPU starts running C code,
      so the PDA is always available.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Chuck Ebbert <76306.1226@compuserve.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: James Bottomley <James.Bottomley@SteelEye.com>
      Cc: Matt Tolentino <matthew.e.tolentino@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      62111195
  4. 22 Nov, 2006 1 commit
  5. 03 Oct, 2006 1 commit
  6. 02 Oct, 2006 1 commit
  7. 01 Oct, 2006 1 commit
  8. 29 Sep, 2006 1 commit
    • keith mannthey's avatar
      [PATCH] convert i386 Summit subarch to use SRAT info for apicid_to_node calls · 3b08606d
      keith mannthey authored
      
      
      Convert the i386 summit subarch apicid_to_node to use node information
      provided by the SRAT.  It was discussed a little on LKML a few weeks ago
      and was seen as an acceptable fix.  The current way of obtaining the nodeid
      
       static inline int apicid_to_node(int logical_apicid)
       {
         return logical_apicid >> 5;
       }
      
      is just not correct for all summit systems/bios.  Assuming the apicid
      matches the Linux node number require a leap of faith that the bios mapped
      out the apicids a set way.  Modern summit HW (IBM x460) does not layout its
      bios in the manner for various reasons and is unable to boot i386 numa.
      
      The best way to get the correct apicid to node information is from the SRAT
      table during boot.  It lays out what apicid belongs to what node.  I use
      this information to create a table for use at run time.
      Signed-off-by: default avatarKeith Mannthey <kmannth@us.ibm.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      3b08606d
  9. 26 Sep, 2006 3 commits
    • Dave Jones's avatar
      [PATCH] i386: don't taint UP K7's running SMP kernels. · 3ca113ea
      Dave Jones authored
      
      
      We have a test that looks for invalid pairings of certain athlon/durons
      that weren't designed for SMP, and taint accordingly (with 'S') if we find
      such a configuration.  However, this test shouldn't fire if there's only
      a single CPU present. It's perfectly valid for an SMP kernel to boot on UP
      hardware for example.
      
      AK: changed to num_possible_cpus()
      Signed-off-by: default avatarDave Jones <davej@redhat.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      3ca113ea
    • Rusty Russell's avatar
      [PATCH] i386: Replace i386 open-coded cmdline parsing with · 1a3f239d
      Rusty Russell authored
      
      
      This patch replaces the open-coded early commandline parsing
      throughout the i386 boot code with the generic mechanism (already used
      by ppc, powerpc, ia64 and s390).  The code was inconsistent with
      whether it deletes the option from the cmdline or not, meaning some of
      these will get passed through the environment into init.
      
      This transformation is mainly mechanical, but there are some notable
      parts:
      
      1) Grammar: s/linux never set's it up/linux never sets it up/
      
      2) Remove hacked-in earlyprintk= option scanning.  When someone
         actually implements CONFIG_EARLY_PRINTK, then they can use
         early_param().
      [AK: actually it is implemented, but I'm adding the early_param it in the next
      x86-64 patch]
      
      3) Move declaration of generic_apic_probe() from setup.c into asm/apic.h
      
      4) Various parameters now moved into their appropriate files (thanks Andi).
      
      5) All parse functions which examine arg need to check for NULL,
         except one where it has subtle humor value.
      
      AK: readded acpi_sci handling which was completely dropped
      AK: moved some more variables into acpi/boot.c
      
      Cc: len.brown@intel.com
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      1a3f239d
    • Shaohua Li's avatar
      [PATCH] i386/x86-64: Fix NMI watchdog suspend/resume · 4038f901
      Shaohua Li authored
      
      
      Making NMI suspend/resume work with SMP. We use CPU hotplug to offline
      APs in SMP suspend/resume. Only BSP executes sysdev's .suspend/.resume
      method. APs should follow CPU hotplug code path.
      
      And:
      
      +From: Don Zickus <dzickus@redhat.com>
      
      Makes the start/stop paths of nmi watchdog more robust to handle the
      suspend/resume cases more gracefully.
      
      AK: I merged the two patches together
      Signed-off-by: default avatarShaohua Li <shaohua.li@intel.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      4038f901
  10. 25 Sep, 2006 1 commit
  11. 31 Jul, 2006 1 commit
  12. 30 Jun, 2006 1 commit
  13. 27 Jun, 2006 3 commits
  14. 26 Jun, 2006 1 commit
  15. 25 Jun, 2006 1 commit
  16. 28 Apr, 2006 1 commit
  17. 27 Mar, 2006 1 commit
    • Siddha, Suresh B's avatar
      [PATCH] sched: new sched domain for representing multi-core · 1e9f28fa
      Siddha, Suresh B authored
      
      
      Add a new sched domain for representing multi-core with shared caches
      between cores.  Consider a dual package system, each package containing two
      cores and with last level cache shared between cores with in a package.  If
      there are two runnable processes, with this appended patch those two
      processes will be scheduled on different packages.
      
      On such systems, with this patch we have observed 8% perf improvement with
      specJBB(2 warehouse) benchmark and 35% improvement with CFP2000 rate(with 2
      users).
      
      This new domain will come into play only on multi-core systems with shared
      caches.  On other systems, this sched domain will be removed by domain
      degeneration code.  This new domain can be also used for implementing power
      savings policy (see OLS 2005 CMP kernel scheduler paper for more details..
      I will post another patch for power savings policy soon)
      
      Most of the arch/* file changes are for cpu_coregroup_map() implementation.
      Signed-off-by: default avatarSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      1e9f28fa
  18. 25 Mar, 2006 1 commit
  19. 23 Mar, 2006 1 commit
    • Gerd Hoffmann's avatar
      [PATCH] x86: SMP alternatives · 9a0b5817
      Gerd Hoffmann authored
      
      
      Implement SMP alternatives, i.e.  switching at runtime between different
      code versions for UP and SMP.  The code can patch both SMP->UP and UP->SMP.
      The UP->SMP case is useful for CPU hotplug.
      
      With CONFIG_CPU_HOTPLUG enabled the code switches to UP at boot time and
      when the number of CPUs goes down to 1, and switches to SMP when the number
      of CPUs goes up to 2.
      
      Without CONFIG_CPU_HOTPLUG or on non-SMP-capable systems the code is
      patched once at boot time (if needed) and the tables are released
      afterwards.
      
      The changes in detail:
      
        * The current alternatives bits are moved to a separate file,
          the SMP alternatives code is added there.
      
        * The patch adds some new elf sections to the kernel:
          .smp_altinstructions
      	like .altinstructions, also contains a list
      	of alt_instr structs.
          .smp_altinstr_replacement
      	like .altinstr_replacement, but also has some space to
      	save original instruction before replaving it.
          .smp_locks
      	list of pointers to lock prefixes which can be nop'ed
      	out on UP.
          The first two are used to replace more complex instruction
          sequences such as spinlocks and semaphores.  It would be possible
          to deal with the lock prefixes with that as well, but by handling
          them as special case the table sizes become much smaller.
      
       * The sections are page-aligned and padded up to page size, so they
         can be free if they are not needed.
      
       * Splitted the code to release init pages to a separate function and
         use it to release the elf sections if they are unused.
      Signed-off-by: default avatarGerd Hoffmann <kraxel@suse.de>
      Signed-off-by: default avatarChuck Ebbert <76306.1226@compuserve.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      9a0b5817
  20. 17 Mar, 2006 1 commit
  21. 24 Feb, 2006 1 commit
    • James Bottomley's avatar
      [PATCH] x86: fix broken SMP boot sequence · 2b932f6c
      James Bottomley authored
      
      
      Recent GDT changes broke the SMP boot sequence if the booting CPU is
      numbered anything other than zero.  There's also a subtle source of error
      in that the boot time CPU now uses cpu_gdt_table (which is actually the GDT
      for booting CPUs in head.S).  This patch fixes both problems by making GDT
      descriptors themselves allocated from a per_cpu area and switching to them
      in cpu_init(), which now means that cpu_gdt_table is exclusively used for
      booting CPUs again.
      Signed-off-by: default avatarJames Bottomley <James.Bottomley@SteelEye.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Matt Tolentino <metolent@snoqualmie.dp.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      2b932f6c
  22. 10 Feb, 2006 1 commit
    • Andrew Morton's avatar
      [PATCH] x86: don't initialise cpu_possible_map to all ones · 7a8ef1cb
      Andrew Morton authored
      
      
      Initialising cpu_possible_map to all-ones with CONFIG_HOTPLUG_CPU means that
      
      a) All for_each_cpu() loops will iterate across all NR_CPUS CPUs, rather
         than over possible ones.  That can be quite expensive.
      
      b) Soon we'll be allocating per-cpu areas only for possible CPUs.  So with
         CPU_MASK_ALL, we'll be wasting memory.
      
      I also switched voyager over to not use CPU_MASK_ALL in the non-CPU-hotplug
      case.  Should be OK..
      
      I note that parisc is also using CPU_MASK_ALL.  Suggest that it stop doing
      that.
      
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Zwane Mwaikambo <zwane@linuxpower.ca>
      Cc: Paul Jackson <pj@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      7a8ef1cb
  23. 12 Jan, 2006 2 commits
    • akpm@osdl.org's avatar
      [PATCH] i386: fix task_pt_regs() · 07b047fc
      akpm@osdl.org authored
      
      
      )
      
      From: Al Viro <viro@ftp.linux.org.uk>
      
      task_pt_regs() needs the same offset-by-8 to match copy_thread()
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      07b047fc
    • akpm@osdl.org's avatar
      [PATCH] scheduler cache-hot-autodetect · 198e2f18
      akpm@osdl.org authored
      
      
      )
      
      From: Ingo Molnar <mingo@elte.hu>
      
      This is the latest version of the scheduler cache-hot-auto-tune patch.
      
      The first problem was that detection time scaled with O(N^2), which is
      unacceptable on larger SMP and NUMA systems. To solve this:
      
      - I've added a 'domain distance' function, which is used to cache
        measurement results. Each distance is only measured once. This means
        that e.g. on NUMA distances of 0, 1 and 2 might be measured, on HT
        distances 0 and 1, and on SMP distance 0 is measured. The code walks
        the domain tree to determine the distance, so it automatically follows
        whatever hierarchy an architecture sets up. This cuts down on the boot
        time significantly and removes the O(N^2) limit. The only assumption
        is that migration costs can be expressed as a function of domain
        distance - this covers the overwhelming majority of existing systems,
        and is a good guess even for more assymetric systems.
      
        [ People hacking systems that have assymetries that break this
          assumption (e.g. different CPU speeds) should experiment a bit with
          the cpu_distance() function. Adding a ->migration_distance factor to
          the domain structure would be one possible solution - but lets first
          see the problem systems, if they exist at all. Lets not overdesign. ]
      
      Another problem was that only a single cache-size was used for measuring
      the cost of migration, and most architectures didnt set that variable
      up. Furthermore, a single cache-size does not fit NUMA hierarchies with
      L3 caches and does not fit HT setups, where different CPUs will often
      have different 'effective cache sizes'. To solve this problem:
      
      - Instead of relying on a single cache-size provided by the platform and
        sticking to it, the code now auto-detects the 'effective migration
        cost' between two measured CPUs, via iterating through a wide range of
        cachesizes. The code searches for the maximum migration cost, which
        occurs when the working set of the test-workload falls just below the
        'effective cache size'. I.e. real-life optimized search is done for
        the maximum migration cost, between two real CPUs.
      
        This, amongst other things, has the positive effect hat if e.g. two
        CPUs share a L2/L3 cache, a different (and accurate) migration cost
        will be found than between two CPUs on the same system that dont share
        any caches.
      
      (The reliable measurement of migration costs is tricky - see the source
      for details.)
      
      Furthermore i've added various boot-time options to override/tune
      migration behavior.
      
      Firstly, there's a blanket override for autodetection:
      
      	migration_cost=1000,2000,3000
      
      will override the depth 0/1/2 values with 1msec/2msec/3msec values.
      
      Secondly, there's a global factor that can be used to increase (or
      decrease) the autodetected values:
      
      	migration_factor=120
      
      will increase the autodetected values by 20%. This option is useful to
      tune things in a workload-dependent way - e.g. if a workload is
      cache-insensitive then CPU utilization can be maximized by specifying
      migration_factor=0.
      
      I've tested the autodetection code quite extensively on x86, on 3
      P3/Xeon/2MB, and the autodetected values look pretty good:
      
      Dual Celeron (128K L2 cache):
      
       ---------------------
       migration cost matrix (max_cache_size: 131072, cpu: 467 MHz):
       ---------------------
                 [00]    [01]
       [00]:     -     1.7(1)
       [01]:   1.7(1)    -
       ---------------------
       cacheflush times [2]: 0.0 (0) 1.7 (1784008)
       ---------------------
      
      Here the slow memory subsystem dominates system performance, and even
      though caches are small, the migration cost is 1.7 msecs.
      
      Dual HT P4 (512K L2 cache):
      
       ---------------------
       migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz):
       ---------------------
                 [00]    [01]    [02]    [03]
       [00]:     -     0.4(1)  0.0(0)  0.4(1)
       [01]:   0.4(1)    -     0.4(1)  0.0(0)
       [02]:   0.0(0)  0.4(1)    -     0.4(1)
       [03]:   0.4(1)  0.0(0)  0.4(1)    -
       ---------------------
       cacheflush times [2]: 0.0 (33900) 0.4 (448514)
       ---------------------
      
      Here it can be seen that there is no migration cost between two HT
      siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory
      system makes inter-physical-CPU migration pretty cheap: 0.4 msecs.
      
      8-way P3/Xeon [2MB L2 cache]:
      
       ---------------------
       migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz):
       ---------------------
                 [00]    [01]    [02]    [03]    [04]    [05]    [06]    [07]
       [00]:     -    19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
       [01]:  19.2(1)    -    19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
       [02]:  19.2(1) 19.2(1)    -    19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
       [03]:  19.2(1) 19.2(1) 19.2(1)    -    19.2(1) 19.2(1) 19.2(1) 19.2(1)
       [04]:  19.2(1) 19.2(1) 19.2(1) 19.2(1)    -    19.2(1) 19.2(1) 19.2(1)
       [05]:  19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)    -    19.2(1) 19.2(1)
       [06]:  19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)    -    19.2(1)
       [07]:  19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)    -
       ---------------------
       cacheflush times [2]: 0.0 (0) 19.2 (19281756)
       ---------------------
      
      This one has huge caches and a relatively slow memory subsystem - so the
      migration cost is 19 msecs.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAshok Raj <ashok.raj@intel.com>
      Signed-off-by: default avatarKen Chen <kenneth.w.chen@intel.com>
      Cc: <wilder@us.ibm.com>
      Signed-off-by: default avatarJohn Hawkes <hawkes@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      198e2f18
  24. 06 Jan, 2006 1 commit
    • Zachary Amsden's avatar
      [PATCH] x86: GDT alignment fix · 7c4cb60e
      Zachary Amsden authored
      
      
      Make GDT page aligned and page padded to support running inside of a
      hypervisor.  This prevents false sharing of the GDT page with other hot
      data, which is not allowed in Xen, and causes performance problems in
      VMware.
      
      Rather than go back to the old method of statically allocating the GDT
      (which wastes unneded space for non-present CPUs), the GDT for APs is
      allocated dynamically.
      Signed-off-by: default avatarZachary Amsden <zach@vmware.com>
      Cc: "Seth, Rohit" <rohit.seth@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      7c4cb60e
  25. 12 Dec, 2005 1 commit
  26. 14 Nov, 2005 1 commit
  27. 09 Nov, 2005 1 commit
    • Nick Piggin's avatar
      [PATCH] sched: disable preempt in idle tasks · 5bfb5d69
      Nick Piggin authored
      
      
      Run idle threads with preempt disabled.
      
      Also corrected a bugs in arm26's cpu_idle (make it actually call schedule()).
      How did it ever work before?
      
      Might fix the CPU hotplugging hang which Nigel Cunningham noted.
      
      We think the bug hits if the idle thread is preempted after checking
      need_resched() and before going to sleep, then the CPU offlined.
      
      After calling stop_machine_run, the CPU eventually returns from preemption and
      into the idle thread and goes to sleep.  The CPU will continue executing
      previous idle and have no chance to call play_dead.
      
      By disabling preemption until we are ready to explicitly schedule, this bug is
      fixed and the idle threads generally become more robust.
      
      From: alexs <ashepard@u.washington.edu>
      
        PPC build fix
      
      From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
      
        MIPS build fix
      Signed-off-by: default avatarNick Piggin <npiggin@suse.de>
      Signed-off-by: default avatarYoichi Yuasa <yuasa@hh.iij4u.or.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      5bfb5d69
  28. 07 Nov, 2005 2 commits
  29. 31 Oct, 2005 1 commit
  30. 30 Oct, 2005 2 commits