1. 25 Dec, 2008 14 commits
  2. 30 Nov, 2008 1 commit
  3. 27 Nov, 2008 5 commits
    • Martin Schwidefsky's avatar
      abd94219
    • [S390] Fix alignment of initial kernel stack. · 0778dc3a
      Heiko Carstens authored
      
      
      We need an alignment of 16384 bytes for the initial kernel stack if
      the kernel is configured for 16384 bytes stacks but the linker script
      currently guarantees only an alignment of 8192 bytes.
      
      So fix this and simply use THREAD_SIZE as alignment value which will
      always do the right thing.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
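One generic reason this kind of alignment matters: code that finds the start of a stack by masking a stack pointer with `~(THREAD_SIZE - 1)` only works if the stack begins on a THREAD_SIZE boundary. The sketch below illustrates that in plain C. It is not the s390 linker script or kernel code, and `stack_base()`, `mask_finds_start()` and the addresses are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round a stack pointer down to the start of its
 * stack, assuming the stack is aligned to its own power-of-two size. */
static uint64_t stack_base(uint64_t sp, uint64_t thread_size)
{
	/* The mask trick only works for a power-of-two size. */
	assert(thread_size != 0 && (thread_size & (thread_size - 1)) == 0);
	return sp & ~(thread_size - 1);
}

/* Returns 1 if masking the first and the last address inside
 * [start, start + thread_size) yields start in both cases, which is
 * true exactly when start is aligned to thread_size. */
static int mask_finds_start(uint64_t start, uint64_t thread_size)
{
	uint64_t last = start + thread_size - 1;

	return stack_base(start, thread_size) == start &&
	       stack_base(last, thread_size) == start;
}
```

With a 16384 byte stack, `mask_finds_start(0x4000, 16384)` holds, while a stack that is only 8192 byte aligned, such as one starting at 0x2000, fails the check.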
    • [S390] pgtable.h: Fix oops in unmap_vmas for KVM processes · 2944a5c9
      Christian Borntraeger authored
      
      
      When running several kvm processes with lots of memory overcommitment,
      we have seen an oops during process shutdown:
      ------------[ cut here ]------------
      Kernel BUG at 0000000000193434 [verbose debug info unavailable]
      addressing exception: 0005 [#1] PREEMPT SMP
      Modules linked in: kvm sunrpc qeth_l2 dm_mod qeth ccwgroup
      CPU: 10 Not tainted 2.6.28-rc4-kvm-bigiron-00521-g0ccca08-dirty #8
      Process kuli (pid: 14460, task: 0000000149822338, ksp: 0000000024f57650)
      Krnl PSW : 0704e00180000000 0000000000193434 (unmap_vmas+0x884/0xf10)
      R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 EA:3
      Krnl GPRS: 0000000000000002 0000000000000000 000000051008d000 000003e05e6034e0
                 00000000001933f6 00000000000001e9 0000000407259e0a 00000002be88c400
                 00000200001c1000 0000000407259608 0000000407259e08 0000000024f577f0
                 0000000407259e09 0000000000445fa8 00000000001933f6 0000000024f577f0
      Krnl Code: 0000000000193426: eb22000c000d sllg %r2,%r2,12
                 000000000019342c: a7180000 lhi %r1,0
                 0000000000193430: b2290012 iske %r1,%r2
                >0000000000193434: a7110002 tmll %r1,2
                 0000000000193438: a7840006 brc 8,193444
                 000000000019343c: 9602c000 oi 0(%r12),2
                 0000000000193440: 96806000 oi 0(%r6),128
                 0000000000193444: a7110004 tmll %r1,4
      Call Trace:
      ([<00000000001933f6>] unmap_vmas+0x846/0xf10)
      [<0000000000199680>] exit_mmap+0x210/0x458
      [<000000000012a8f8>] mmput+0x54/0xfc
      [<000000000012f714>] exit_mm+0x134/0x144
      [<000000000013120c>] do_exit+0x240/0x878
      [<00000000001318dc>] do_group_exit+0x98/0xc8
      [<000000000013e6b0>] get_signal_to_deliver+0x30c/0x358
      [<000000000010bee0>] do_signal+0xec/0x860
      [<0000000000112e30>] sysc_sigpending+0xe/0x22
      [<000002000013198a>] 0x2000013198a
      INFO: lockdep is turned off.
      Last Breaking-Event-Address:
      [<00000000001a68d0>] free_swap_and_cache+0x1a0/0x1a4
      <4>---[ end trace bc19f1d51ac9db7c ]---
      
      The faulting instruction is the storage key operation (iske) in
      ptep_rcp_copy (called by pte_clear, called by unmap_vmas). iske
      reads dirty and reference bit information for a physical page and
      requires a valid physical address. Since we are in pte_clear, we
      cannot rely on the pte containing a valid address. Fortunately we
      don't need this information in pte_clear - after all there is no
      mapping. The best fix is to remove the needless call to ptep_rcp_copy
      that contains the iske.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • [S390] fix/cleanup sched_clock · 8107d829
      Christian Borntraeger authored
      
      
      CONFIG_PRINTK_TIME reveals that sched_clock has a wrong offset during boot:
      ..
      [    0.000000]   Movable zone: 0 pages used for memmap
      [    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 775679
      [    0.000000] Kernel command line: dasd=4b6c root=/dev/dasda1 ro noinitrd
      [    0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
      [6920575.975232] console [ttyS0] enabled
      [6920575.987586] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
      [6920575.991404] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
      ..
      
      The s390 implementation of sched_clock uses the store clock instruction and
      subtracts jiffies_timer_cc.
      jiffies_timer_cc is a local variable in arch/s390/kernel/time.c and only used
      for sched_clock and monotonic clock. For historical reasons there is an offset
      on that value. With today's code this offset is unnecessary. By removing that
      offset we can get a sched_clock which returns the nanoseconds after time_init.
      This improves CONFIG_PRINTK_TIME.
      
      Since sched_clock is the only user, I have also renamed jiffies_timer_cc to
      sched_clock_base_cc. In addition, the local variable init_timer_cc is redundant
      and can be removed as well.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
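The arithmetic behind a sched_clock of this kind can be sketched as follows. One unit of the s390 TOD clock is 1/4096 of a microsecond, so nanoseconds are units * 1000 / 4096 = units * 125 / 512. The function names are hypothetical, and the clock value is passed in as a parameter instead of being read with the store clock instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a TOD clock difference to nanoseconds.
 * 4096 TOD units make one microsecond, so ns = units * 125 / 512.
 * Note: the multiplication overflows for very large differences. */
static uint64_t tod_to_ns(uint64_t tod_units)
{
	return (tod_units * 125) >> 9;
}

/* Hypothetical sched_clock: nanoseconds elapsed since the base value
 * that was recorded at time_init (sched_clock_base_cc in the commit). */
static uint64_t sched_clock_sketch(uint64_t tod_now, uint64_t base_cc)
{
	assert(tod_now >= base_cc);
	return tod_to_ns(tod_now - base_cc);
}
```

Because the base is subtracted without any additional offset, the first value after time_init is close to zero, which is what makes the CONFIG_PRINTK_TIME timestamps start near 0.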
    • [S390] fix system call parameter functions. · 59da2139
      Martin Schwidefsky authored
      
      
      syscall_get_nr() currently returns a valid result only if the call
      chain of the traced process includes do_syscall_trace_enter(). But
      since collect_syscall() can be called for any sleeping task, the result
      of syscall_get_nr() is in general completely bogus.
      
      To make syscall_get_nr() work for any sleeping task the traps field
      in pt_regs is replaced with svcnr - the system call number the process
      is executing. If svcnr == 0 the process is not on a system call path.
      
      The syscall_get_arguments and syscall_set_arguments use regs->gprs[2]
      for the first system call parameter. This is incorrect since gprs[2]
      may have been overwritten with the system call number if the call
      chain includes do_syscall_trace_enter. Use regs->orig_gprs2 instead.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
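A minimal sketch of the idea described above, using a cut-down stand-in for pt_regs. The struct layout, the `_sketch` names and the -1 return value for "not in a system call" are assumptions for illustration. Only svcnr, gprs[2] and orig_gprs2 come from the commit message.

```c
#include <assert.h>

/* Cut-down, hypothetical stand-in for the s390 pt_regs. */
struct pt_regs_sketch {
	unsigned long gprs[16];
	unsigned long orig_gprs2;	/* first syscall argument, saved on entry */
	unsigned short svcnr;		/* 0: not on a system call path */
};

/* Works for any sleeping task because it only looks at svcnr. */
static long syscall_get_nr_sketch(const struct pt_regs_sketch *regs)
{
	return regs->svcnr ? (long)regs->svcnr : -1;
}

/* gprs[2] may already hold the system call number, so the first
 * argument is taken from the saved copy instead. */
static unsigned long syscall_first_arg_sketch(const struct pt_regs_sketch *regs)
{
	assert(regs->svcnr != 0);
	return regs->orig_gprs2;
}
```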
  4. 23 Nov, 2008 1 commit
  5. 14 Nov, 2008 6 commits
  6. 28 Oct, 2008 6 commits
    • [S390] s390: Fix build for !CONFIG_S390_GUEST + CONFIG_VIRTIO_CONSOLE · ea4bfdf5
      Christian Borntraeger authored
      The s390 kernel does not compile if virtio console is enabled, but guest
      support is disabled:
      
        LD      .tmp_vmlinux1
      arch/s390/kernel/built-in.o: In function `setup_arch':
      /space/linux-2.5/arch/s390/kernel/setup.c:773: undefined reference to
      `s390_virtio_console_init'
      
      The fix is related to commit 99e65c92:

      Author: Christian Borntraeger <borntraeger@de.ibm.com>
      Date:   Fri Jul 25 15:50:04 2008 +0200
          KVM: s390: Fix guest kconfig

      That commit changed the build process to build kvm_virtio.c only if
      CONFIG_S390_GUEST is set. We must ifdef the prototype in the header file
      accordingly.
      Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
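A common way to ifdef a prototype so that callers still link is to pair it with an inline stub. The sketch below shows that pattern. It is not the actual header change: the stub, its -ENODEV return value and `setup_console_sketch()` are assumptions for illustration.

```c
#include <errno.h>

/* CONFIG_S390_GUEST would normally come from the kernel configuration.
 * It is left undefined here, so the stub branch is used. */
#ifdef CONFIG_S390_GUEST
int s390_virtio_console_init(void);	/* provided by kvm_virtio.c */
#else
static inline int s390_virtio_console_init(void)
{
	return -ENODEV;	/* assumed value meaning "no guest support" */
}
#endif

/* Hypothetical caller standing in for the call site in setup_arch(). */
static int setup_console_sketch(void)
{
	return s390_virtio_console_init();
}
```

With the stub in place the caller compiles and links in both configurations, and no undefined reference to `s390_virtio_console_init` remains.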
    • [S390] No more 4kb stacks. · 7f5a8ba6
      Heiko Carstens authored
      
      
      We got a stack overflow with a small stack configuration on a 32 bit
      system. It looks as if 4kb isn't enough and is too dangerous.
      So let's get rid of 4kb stacks on 32 bit.
      
      But one thing I completely dislike about the call trace below is that
      just for debugging or tracing purposes sprintf gets called (cio_start_key):
      
      	/* process condition code */
      	sprintf(dbf_txt, "ccode:%d", ccode);
      	CIO_TRACE_EVENT(4, dbf_txt);
      
      But maybe it's just me who thinks that this could be done better.
      
          <4>Kernel stack overflow.
          <4>Modules linked in: dm_multipath sunrpc bonding qeth_l2 dm_mod qeth ccwgroup vmur
          <4>CPU: 1 Not tainted 2.6.27-30.x.20081015-s390default #1
          <4>Process httpd (pid: 3807, task: 20ae2df8, ksp: 1666fb78)
          <4>Krnl PSW : 040c0000 8027098a (number+0xe/0x348)
          <4>           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0
          <4>Krnl GPRS: 00d43318 0027097c 1666f277 9666f270
          <4>           00000000 00000000 0000000a ffffffff
          <4>           9666f270 1666f228 1666f277 1666f098
          <4>           00000002 80270982 80271016 1666f098
          <4>Krnl Code: 8027097e: f0340dd0a7f1	srp	3536(4,%r0),2033(%r10),4
          <4>           80270984: 0f00		clcl	%r0,%r0
          <4>           80270986: a7840001		brc	8,80270988
          <4>          >8027098a: 18ef		lr	%r14,%r15
          <4>           8027098c: a7faff68		ahi	%r15,-152
          <4>           80270990: 18bf		lr	%r11,%r15
          <4>           80270992: 18a2		lr	%r10,%r2
          <4>           80270994: 1893		lr	%r9,%r3
      
      Modified calltrace with annotated stackframe size of each function:
      
      stackframe size
          |
       0 304 vsnprintf+850 [0x271016]
       1  72 sprintf+74 [0x271522]
       2  56 cio_start_key+262 [0x2d4c16]
       3  56 ccw_device_start_key+222 [0x2dfe92]
       4  56 ccw_device_start+40 [0x2dff28]
       5  48 raw3215_start_io+104 [0x30b0f8]
       6  56 raw3215_write+494 [0x30ba0a]
       7  40 con3215_write+68 [0x30bafc]
       8  40 __call_console_drivers+146 [0x12b0fa]
       9  32 _call_console_drivers+102 [0x12b192]
      10  64 release_console_sem+268 [0x12b614]
      11 168 vprintk+462 [0x12bca6]
      12  72 printk+68 [0x12bfd0]
      13 256 __print_symbol+50 [0x15a882]
      14  56 __show_trace+162 [0x103d06]
      15  32 show_trace+224 [0x103e70]
      16  48 show_stack+152 [0x103f20]
      17  56 dump_stack+126 [0x104612]
      18  96 __alloc_pages_internal+592 [0x175004]
      19  80 cache_alloc_refill+776 [0x196f3c]
      20  40 __kmalloc+258 [0x1972ae]
      21  40 __alloc_skb+94 [0x328086]
      22  32 pskb_copy+50 [0x328252]
      23  32 skb_realloc_headroom+110 [0x328a72]
      24 104 qeth_l2_hard_start_xmit+378 [0x7803bfde]
      25  56 dev_hard_start_xmit+450 [0x32ef6e]
      26  56 __qdisc_run+390 [0x3425d6]
      27  48 dev_queue_xmit+410 [0x331e06]
      28  40 ip_finish_output+308 [0x354ac8]
      29  56 ip_output+218 [0x355b6e]
      30  24 ip_local_out+56 [0x354584]
      31 120 ip_queue_xmit+300 [0x355cec]
      32  96 tcp_transmit_skb+812 [0x367da8]
      33  40 tcp_push_one+158 [0x369fda]
      34 112 tcp_sendmsg+852 [0x35d5a0]
      35 240 sock_sendmsg+164 [0x32035c]
      36  56 kernel_sendmsg+86 [0x32064a]
      37  88 sock_no_sendpage+98 [0x322b22]
      38 104 tcp_sendpage+70 [0x35cc1e]
      39  48 sock_sendpage+74 [0x31eb66]
      40  64 pipe_to_sendpage+102 [0x1c4b2e]
      41  64 __splice_from_pipe+120 [0x1c5340]
      42  72 splice_from_pipe+90 [0x1c57e6]
      43  56 generic_splice_sendpage+38 [0x1c5832]
      44  48 do_splice_from+104 [0x1c4c38]
      45  48 direct_splice_actor+52 [0x1c4c88]
      46  80 splice_direct_to_actor+180 [0x1c4f80]
      47  72 do_splice_direct+70 [0x1c5112]
      48  64 do_sendfile+360 [0x19de18]
      49  72 sys_sendfile64+126 [0x19df32]
      50 336 sysc_do_restart+18 [0x111a1a]
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
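Adding up the annotated stackframe sizes from the call trace shows how much stack this single call chain consumed. The sketch below copies the 51 frame sizes from the table and sums them. The helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stackframe sizes of frames 0..50, copied from the annotated trace. */
static const unsigned int frame_sizes[] = {
	304,  72,  56,  56,  56,  48,  56,  40,  40,  32,
	 64, 168,  72, 256,  56,  32,  48,  56,  96,  80,
	 40,  40,  32,  32, 104,  56,  56,  48,  40,  56,
	 24, 120,  96,  40, 112, 240,  56,  88, 104,  48,
	 64,  64,  72,  56,  48,  48,  80,  72,  64,  72,
	336,
};

#define NR_FRAMES (sizeof(frame_sizes) / sizeof(frame_sizes[0]))

/* Sum the sizes of n stackframes. */
static unsigned int total_stack_usage(const unsigned int *sizes, size_t n)
{
	unsigned int total = 0;

	assert(sizes != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += sizes[i];
	return total;
}
```

`total_stack_usage(frame_sizes, NR_FRAMES)` comes to 4096 bytes, so the frames listed in the trace fill a 4kb stack completely.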
    • [S390] Change default IPL method to IPL_VM. · 46e7951f
      Heiko Carstens authored
      
      
      allyesconfig and allmodconfig built kernels have a tape IPL record.
      A vmreader record makes much more sense, since hardly anybody will
      ever IPL a kernel from tape. So change the default.
      As a side effect I can test these kernels without fiddling around with
      the kernel config ;)
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • [S390] appldata: unsigned ops->size cannot be negative · 13f8b7c5
      Roel Kluin authored
      
      
      unsigned ops->size cannot be negative
      Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
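The point of the commit can be illustrated in a few lines of C: a "less than zero" test on an unsigned value can never be true, and a negative value converted to unsigned shows up as a very large number that an upper-bound check catches. The helper names and the limit of 4096 used in the note below are hypothetical and not taken from the appldata code.

```c
#include <limits.h>

/* Converting a negative int to unsigned wraps around to a large value. */
static unsigned int from_int(int value)
{
	return (unsigned int)value;
}

/* "size < 0" would always be false for an unsigned size, so the upper
 * bound is the only check that can reject anything. */
static int size_in_range(unsigned int size, unsigned int max_size)
{
	return size <= max_size;
}
```

For example, `from_int(-1)` equals `UINT_MAX`, and `size_in_range(from_int(-1), 4096)` is false without any explicit negative check.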
    • [S390] Fix sysdev class file creation. · da5aae70
      Heiko Carstens authored
      
      
      Use sysdev_class_create_file() to create sysdev class attributes
      instead of sysfs_create_file(). Using sysfs_create_file() wasn't a very
      good idea since the show and store functions have a different number of
      parameters for sysfs files and sysdev class files.
      In particular, the pointer to the buffer is the last argument, so the
      functions ended up accessing random memory regions.
      It still worked surprisingly well until we got a kernel panic.
      
      Cc: stable@kernel.org
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • [S390] pgtables: Fix race in enable_sie vs. page table ops · 250cf776
      Christian Borntraeger authored
      
      
      The current enable_sie code sets the mm->context.pgstes bit to tell
      dup_mm that the new mm should have extended page tables. This bit is also
      used by the s390 specific page table primitives to decide about the page
      table layout - which means context.pgstes has two meanings. This can cause
      all kinds of bugs. For example, shrink_zone can call
      ptep_clear_flush_young while enable_sie is running. ptep_clear_flush_young
      will test for context.pgstes. Since enable_sie changed that value of the old
      struct mm without changing the page table layout ptep_clear_flush_young will
      do the wrong thing.
      The solution is to split pgstes into two bits
      - one for the allocation
      - one for the current state
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
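The split into two bits can be sketched as follows. The struct, field and function names are hypothetical stand-ins. The sketch only models the flag handling, not real page tables: one flag tells the duplication step what the new mm should get, the other describes the layout the current mm actually has.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of mm->context. */
struct mm_context_sketch {
	bool alloc_pgste;	/* mms duplicated from this one get pgstes */
	bool has_pgste;		/* this mm's page tables have pgstes now */
};

/* What enable_sie wants to express: only the allocation request
 * changes, the current layout flag of the old mm stays untouched. */
static void request_pgstes(struct mm_context_sketch *ctx)
{
	ctx->alloc_pgste = true;
}

/* Duplication step: the new mm is built with the requested layout. */
static struct mm_context_sketch dup_context(const struct mm_context_sketch *old)
{
	struct mm_context_sketch new_ctx = {
		.alloc_pgste = old->alloc_pgste,
		.has_pgste = old->alloc_pgste,
	};

	return new_ctx;
}

/* Page table primitives look only at the current state. */
static bool page_table_has_pgstes(const struct mm_context_sketch *ctx)
{
	return ctx->has_pgste;
}
```

A concurrent page table operation on the old mm keeps seeing `has_pgste == false` while the request is pending, so it uses the layout that the old page tables really have.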
  7. 20 Oct, 2008 3 commits
    • container freezer: implement freezer cgroup subsystem · dc52ddc0
      Matt Helsley authored
      
      
      This patch implements a new freezer subsystem in the control groups
      framework.  It provides a way to stop and resume execution of all tasks in
      a cgroup by writing in the cgroup filesystem.
      
      The freezer subsystem in the container filesystem defines a file named
      freezer.state.  Writing "FROZEN" to the state file will freeze all tasks
      in the cgroup.  Subsequently writing "RUNNING" will unfreeze the tasks in
      the cgroup.  Reading will return the current state.
      
      * Examples of usage :
      
         # mkdir /containers/freezer
         # mount -t cgroup -ofreezer freezer  /containers
         # mkdir /containers/0
         # echo $some_pid > /containers/0/tasks
      
      to get status of the freezer subsystem :
      
         # cat /containers/0/freezer.state
         RUNNING
      
      to freeze all tasks in the container :
      
         # echo FROZEN > /containers/0/freezer.state
         # cat /containers/0/freezer.state
         FREEZING
         # cat /containers/0/freezer.state
         FROZEN
      
      to unfreeze all tasks in the container :
      
         # echo RUNNING > /containers/0/freezer.state
         # cat /containers/0/freezer.state
         RUNNING
      
      This is the basic mechanism which should do the right thing for user space
      tasks in a simple scenario.
      
      It's important to note that freezing can be incomplete.  In that case we
      return EBUSY.  This means that some tasks in the cgroup are busy doing
      something that prevents us from completely freezing the cgroup at this
      time.  After EBUSY, the cgroup will remain partially frozen -- reflected
      by freezer.state reporting "FREEZING" when read.  The state will remain
      "FREEZING" until one of these things happens:
      
      	1) Userspace cancels the freezing operation by writing "RUNNING" to
      		the freezer.state file
      	2) Userspace retries the freezing operation by writing "FROZEN" to
      		the freezer.state file (writing "FREEZING" is not legal
      		and returns EIO)
      	3) The tasks that blocked the cgroup from entering the "FROZEN"
      		state disappear from the cgroup's set of tasks.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: export thaw_process]
      Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
      Acked-by: Serge E. Hallyn <serue@us.ibm.com>
      Tested-by: Matt Helsley <matthltc@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
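The freezer.state transitions described in this commit message can be modelled as a small state machine. The sketch below is a simplified user-space model, not the cgroup implementation: the names are hypothetical, the number of tasks that currently cannot be frozen is passed in as a parameter, and returning -EIO for strings other than "FREEZING" is an assumption.

```c
#include <errno.h>
#include <string.h>

enum freezer_state { CG_RUNNING, CG_FREEZING, CG_FROZEN };

/* Text that a read of freezer.state would return in this model. */
static const char *state_name(enum freezer_state s)
{
	switch (s) {
	case CG_RUNNING:  return "RUNNING";
	case CG_FREEZING: return "FREEZING";
	default:          return "FROZEN";
	}
}

/* Model of a write to freezer.state. unfreezable_tasks stands in for
 * the tasks that are busy and cannot be frozen right now. */
static int freezer_write_sketch(enum freezer_state *state, const char *buf,
				unsigned int unfreezable_tasks)
{
	if (strcmp(buf, "RUNNING") == 0) {
		*state = CG_RUNNING;	/* cancels or undoes freezing */
		return 0;
	}
	if (strcmp(buf, "FROZEN") == 0) {
		if (unfreezable_tasks > 0) {
			*state = CG_FREEZING;	/* partially frozen */
			return -EBUSY;
		}
		*state = CG_FROZEN;
		return 0;
	}
	/* Writing "FREEZING" is not legal and returns EIO; treating all
	 * other strings the same way is an assumption of this sketch. */
	return -EIO;
}
```

In this model a first write of "FROZEN" with busy tasks returns -EBUSY and leaves the state at FREEZING. A later retry with no busy tasks reaches FROZEN, and writing "RUNNING" returns the cgroup to RUNNING.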
    • container freezer: add TIF_FREEZE flag to all architectures · 83224b08
      Matt Helsley authored
      
      
      This patch series introduces a cgroup subsystem that utilizes the swsusp
      freezer to freeze a group of tasks.  It's immediately useful for batch job
      management scripts.  It should also be useful in the future for
      implementing container checkpoint/restart.
      
      The freezer subsystem in the container filesystem defines a cgroup file
      named freezer.state.  Reading freezer.state will return the current state
      of the cgroup.  Writing "FROZEN" to the state file will freeze all tasks
      in the cgroup.  Subsequently writing "RUNNING" will unfreeze the tasks in
      the cgroup.
      
      * Examples of usage :
      
         # mkdir /containers/freezer
         # mount -t cgroup -ofreezer freezer  /containers
         # mkdir /containers/0
         # echo $some_pid > /containers/0/tasks
      
      to get status of the freezer subsystem :
      
         # cat /containers/0/freezer.state
         RUNNING
      
      to freeze all tasks in the container :
      
         # echo FROZEN > /containers/0/freezer.state
         # cat /containers/0/freezer.state
         FREEZING
         # cat /containers/0/freezer.state
         FROZEN
      
      to unfreeze all tasks in the container :
      
         # echo RUNNING > /containers/0/freezer.state
         # cat /containers/0/freezer.state
         RUNNING
      
      This patch:
      
      The first step in making the refrigerator() available to all
      architectures, even for those without power management.
      
      The purpose of such a change is to be able to use the refrigerator() in a
      new control group subsystem which will implement a control group freezer.
      
      [akpm@linux-foundation.org: fix sparc]
      Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
      Acked-by: Pavel Machek <pavel@suse.cz>
      Acked-by: Serge E. Hallyn <serue@us.ibm.com>
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Nigel Cunningham <nigel@tuxonice.net>
      Tested-by: Matt Helsley <matthltc@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: cleanup to make remove_memory() arch-neutral · 71088785
      Badari Pulavarty authored
      
      
      There is nothing architecture specific about remove_memory().
      The remove_memory() function is common to all architectures which support
      hotplug memory remove.  Instead of duplicating it in every architecture,
      collapse the copies into one arch-neutral function.
      
      [akpm@linux-foundation.org: fix the export]
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Gary Hade <garyhade@us.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 16 Oct, 2008 4 commits