- 29 Jul, 2009 5 commits
-
-
Dave Hansen authored
Once a structure goes over PAGE_SIZE*2, we see occasional allocation failures. Some people have chosen to switch over to things like vmalloc() that will let them keep array-like access to such a large structures. But, vmalloc() has plenty of downsides. Here's an alternative. I think it's what Andrew was suggesting here: http://lkml.org/lkml/2009/7/2/518 I call it a flexible array. It does all of its work in PAGE_SIZE bits, so never does an order>0 allocation. The base level has PAGE_SIZE-2*sizeof(int) bytes of storage for pointers to the second level. So, with a 32-bit arch, you get about 4MB (4183112 bytes) of total storage when the objects pack nicely into a page. It is half that on 64-bit because the pointers are twice the size. There's a table detailing this in the code. There are kerneldocs for the functions, but here's an overview: flex_array_alloc() - dynamically allocate a base structure flex_array_free() - free the array and all of the second-level pages flex_array_free_parts() - free the second-level pages, but not the base (for static bases) flex_array_put() - copy into the array at the given index flex_array_get() - copy out of the array at the given index flex_array_prealloc() - preallocate the second-level pages between the given indexes to guarantee no allocs will occur at put() time. We could also potentially just pass the "element_size" into each of the API functions instead of storing it internally. That would get us one more base pointer on 32-bit. I've been testing this by running it in userspace. The header and patch that I've been using are here, as well as the little script I'm using to generate the size table which goes in the kerneldocs. http://sr71.net/~dave/linux/flexarray/ [akpm@linux-foundation.org: coding-style fixes] Signed-off-by:
Dave Hansen <dave@linux.vnet.ibm.com> Reviewed-by:
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Dave Jones authored
Found with make headers_check /usr/include/linux/pps.h:52: found __[us]{8,16,32,64} type without #include <linux/types.h> Signed-off-by:
Dave Jones <davej@redhat.com> Cc: Rodolfo Giometti <giometti@linux.it> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
KAMEZAWA Hiroyuki authored
After commit ec64f515 ("cgroup: fix frequent -EBUSY at rmdir"), cgroup's rmdir (especially against memcg) doesn't return -EBUSY by temporary ref counts. That commit expects all refs after pre_destroy() is temporary but...it wasn't. Then, rmdir can wait permanently. This patch tries to fix that and change followings. - set CGRP_WAIT_ON_RMDIR flag before pre_destroy(). - clear CGRP_WAIT_ON_RMDIR flag when the subsys finds racy case. if there are sleeping ones, wakes them up. - rmdir() sleeps only when CGRP_WAIT_ON_RMDIR flag is set. Tested-by:
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Reported-by:
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Reviewed-by:
Paul Menage <menage@google.com> Acked-by:
Balbir Sigh <balbir@linux.vnet.ibm.com> Signed-off-by:
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Li Zefan authored
The bug was introduced by commit cc31edce ("cgroups: convert tasks file to use a seq_file with shared pid array"). We cache a pid array for all threads that are opening the same "tasks" file, but the pids in the array are always from the namespace of the last process that opened the file, so all other threads will read pids from that namespace instead of their own namespaces. To fix it, we maintain a list of pid arrays, which is keyed by pid_ns. The list will be of length 1 at most time. Reported-by:
Paul Menage <menage@google.com> Idea-by:
Paul Menage <menage@google.com> Signed-off-by:
Li Zefan <lizf@cn.fujitsu.com> Reviewed-by:
Serge Hallyn <serue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
OGAWA Hirofumi authored
We really don't want to mark the pty as a low-latency device, because as Alan points out, the ->write method can be called from an IRQ (ppp?), and that means we can't use ->low_latency=1 as we take mutexes in the low_latency case. So rather than using low_latency to force the written data to be pushed to the ldisc handling at 'write()' time, just make the reader side (or the poll function) do the flush when it checks whether there is data to be had. This also fixes the problem with lost data in an emacs compile buffer (bugzilla 13815), and we can thus revert the low_latency pty hack (commit 3a542974: "pty: quickfix for the pty ENXIO timing problems"). Signed-off-by:
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Tested-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [ Modified to do the tty_flush_to_ldisc() inside input_available_p() so that it triggers for both read and poll() - Linus] Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 28 Jul, 2009 1 commit
-
-
Tejun Heo authored
On certain configurations, HPA isn't or can't be unlocked during probing but it somehow ends up unlocked afterwards. In the following thread, the problem can be reliably reproduced after resuming from STR. The BIOS turns on HPA during boot but forgets to do it during resume. http://thread.gmane.org/gmane.linux.kernel/858310 This patch updates libata revalidation such that it considers native n_sectors. If the device size has increased to match native n_sectors, it's assumed that HPA has been unlocked involuntarily and the device is recognized as the same one. This should be fairly safe while nicely working around the problem. Signed-off-by:
Tejun Heo <tj@kernel.org> Reported-by:
Christof Warlich <christof@warlich.name> Signed-off-by:
Jeff Garzik <jgarzik@redhat.com>
-
- 24 Jul, 2009 1 commit
-
-
Brian Johnson authored
Signed-off-by:
Brian Johnson <brijohn@gmail.com> Signed-off-by:
Jean-Francois Moine <moinejf@free.fr> Signed-off-by:
Mauro Carvalho Chehab <mchehab@redhat.com>
-
- 23 Jul, 2009 1 commit
-
-
Mike Snitzer authored
Incorrect device area lengths are being passed to device_area_is_valid(). The regression appeared in 2.6.31-rc1 through commit 754c5fc7. With the dm-stripe target, the size of the target (ti->len) was used instead of the stripe_width (ti->len/#stripes). An example of a consequent incorrect error message is: device-mapper: table: 254:0: sdb too small for target Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
- 22 Jul, 2009 3 commits
-
-
Anton Vorontsov authored
Fixed-link support is broken for the ucc_eth, gianfar, and fs_enet device drivers. The "OF MDIO rework" patches removed most of the support. Instead of re-adding fixed-link stuff to the drivers, this patch adds a support function for parsing the fixed-link property and obtaining a dummy phy to match. Note: the dummy phy handling in arch/powerpc is a bit of a hack and needs to be reworked. This function is being added now to solve the regression in the Ethernet drivers, but it should be considered a temporary measure until the fixed link handling can be reworked. Signed-off-by:
Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by:
Grant Likely <grant.likely@secretlab.ca> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Peter Zijlstra authored
Anton noted that for inherited counters the counter-id as provided by PERF_SAMPLE_ID isn't mappable to the id found through PERF_RECORD_ID because each inherited counter gets its own id. His suggestion was to always return the parent counter id, since that is the primary counter id as exposed. However, these inherited counters have a unique identifier so that events like PERF_EVENT_PERIOD and PERF_EVENT_THROTTLE can be specific about which counter gets modified, which is important when trying to normalize the sample streams. This patch removes PERF_EVENT_PERIOD in favour of PERF_SAMPLE_PERIOD, which is more useful anyway, since changing periods became a lot more common than initially thought -- rendering PERF_EVENT_PERIOD the less useful solution (also, PERF_SAMPLE_PERIOD reports the more accurate value, since it reports the value used to trigger the overflow, whereas PERF_EVENT_PERIOD simply reports the requested period changed, which might only take effect on the next cycle). This still leaves us PERF_EVENT_THROTTLE to consider, but since that _should_ be a rare occurrence, and linking it to a primary id is the most useful bit to diagnose the problem, we introduce a PERF_SAMPLE_STREAM_ID, for those few cases where the full reconstruction is important. [Does change the ABI a little, but I see no other way out] Suggested-by:
Anton Blanchard <anton@samba.org> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1248095846.15751.8781.camel@twins>
-
Peter Zijlstra authored
commit ca109491 (hrtimer: removing all ur callback modes) moved all hrtimer callbacks into hard interrupt context when high resolution timers are active. That breaks code which relied on the assumption that the callback happens in softirq context. Provide a generic infrastructure which combines tasklets and hrtimers together to provide an in-softirq hrtimer experience. Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: torvalds@linux-foundation.org Cc: kaber@trash.net Cc: David Miller <davem@davemloft.net> LKML-Reference: <1248265724.27058.1366.camel@twins> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
- 21 Jul, 2009 3 commits
-
-
Eric Paris authored
inotify can have a watchs removed under filesystem reclaim. ================================= [ INFO: inconsistent lock state ] 2.6.31-rc2 #16 --------------------------------- inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage. khubd/217 [HC0[0]:SC0[0]:HE1:SE1] takes: (iprune_mutex){+.+.?.}, at: [<c10ba899>] invalidate_inodes+0x20/0xe3 {IN-RECLAIM_FS-W} state was registered at: [<c10536ab>] __lock_acquire+0x2c9/0xac4 [<c1053f45>] lock_acquire+0x9f/0xc2 [<c1308872>] __mutex_lock_common+0x2d/0x323 [<c1308c00>] mutex_lock_nested+0x2e/0x36 [<c10ba6ff>] shrink_icache_memory+0x38/0x1b2 [<c108bfb6>] shrink_slab+0xe2/0x13c [<c108c3e1>] kswapd+0x3d1/0x55d [<c10449b5>] kthread+0x66/0x6b [<c1003fdf>] kernel_thread_helper+0x7/0x10 [<ffffffff>] 0xffffffff Two things are needed to fix this. First we need a method to tell fsnotify_create_event() to use GFP_NOFS and second we need to stop using one global IN_IGNORED event and allocate them one at a time. This solves current issues with multiple IN_IGNORED on a queue having tail drop problems and simplifies the allocations since we don't have to worry about two tasks opperating on the IGNORED event concurrently. Signed-off-by:
Eric Paris <eparis@redhat.com>
-
Alan Jenkins authored
Some drivers don't need the return value of rfkill_set_hw_state(), so it should not be marked as __must_check. Signed-off-by:
Alan Jenkins <alan-jenkins@tuffmail.co.uk> Acked-by:
Johannes Berg <johannes@sipsolutions.net> Signed-off-by:
John W. Linville <linville@tuxdriver.com>
-
Thomas Gleixner authored
irq_set_thread_affinity() calls set_cpus_allowed_ptr() which might sleep, but irq_set_thread_affinity() is called with desc->lock held and can be called from hard interrupt context as well. The code has another bug as it does not hold a ref on the task struct as required by set_cpus_allowed_ptr(). Just set the IRQTF_AFFINITY bit in action->thread_flags. The next time the thread runs it migrates itself. Solves all of the above problems nicely. Add kerneldoc to irq_set_thread_affinity() while at it. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission>
-
- 18 Jul, 2009 1 commit
-
-
Thomas Gleixner authored
commit e3c8ca83 (sched: do not count frozen tasks toward load) broke the nr_uninterruptible accounting on freeze/thaw. On freeze the task is excluded from accounting with a check for (task->flags & PF_FROZEN), but that flag is cleared before the task is thawed. So while we prevent that the task with state TASK_UNINTERRUPTIBLE is accounted to nr_uninterruptible on freeze we decrement nr_uninterruptible on thaw. Use a separate flag which is handled by the freezing task itself. Set it before calling the scheduler with TASK_UNINTERRUPTIBLE state and clear it after we return from frozen state. Cc: <stable@kernel.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
- 17 Jul, 2009 2 commits
-
-
Alex Williamson authored
Qemu added support for a few extra RX modes that Linux doesn't currently make use of. Sync the headers to maintain consistency. Signed-off-by:
Alex Williamson <alex.williamson@hp.com> Signed-off-by:
Rusty Russell <rusty@rustcorp.com.au>
-
Matias Zabaljauregui authored
fix: "make Guest" was complaining about duplicated G:032 Signed-off-by:
Matias Zabaljauregui <zabaljauregui@gmail.com> Signed-off-by:
Rusty Russell <rusty@rustcorp.com.au>
-
- 16 Jul, 2009 1 commit
-
-
Johannes Weiner authored
Bootmem is not used for the vt screen buffer anymore as slab is now available at the time the console is initialized. Get rid of the now superfluous distinction between slab and bootmem, it's always slab. This also fixes a kmalloc leak which Catalin described thusly: Commit a5f4f52e ("vt: use kzalloc() instead of the bootmem allocator") replaced the alloc_bootmem() with kzalloc() but didn't set vc_kmalloced to 1 and the memory block is later leaked. The corresponding kmemleak trace: unreferenced object 0xdf828000 (size 8192): comm "swapper", pid 0, jiffies 4294937296 backtrace: [<c006d473>] __save_stack_trace+0x17/0x1c [<c000d869>] log_early+0x55/0x84 [<c01cfa4b>] kmemleak_alloc+0x33/0x3c [<c006c013>] __kmalloc+0xd7/0xe4 [<c00108c7>] con_init+0xbf/0x1b8 [<c0010149>] console_init+0x11/0x20 [<c0008797>] start_kernel+0x137/0x1e4 Signed-off-by:
Johannes Weiner <hannes@cmpxchg.org> Reviewed-by:
Pekka Enberg <penberg@cs.helsinki.fi> Tested-by:
Catalin Marinas <catalin.marinas@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 15 Jul, 2009 1 commit
-
-
Jan Kara authored
Get rid of extenddisksize parameter of ext3_get_blocks_handle(). This seems to be a relict from some old days and setting disksize in this function does not make much sence. Currently it was set only by ext3_getblk(). Since the parameter has some effect only if create == 1, it is easy to check that the three callers which end up calling ext3_getblk() with create == 1 (ext3_append, ext3_quota_write, ext3_mkdir) do the right thing and set disksize themselves. Signed-off-by:
Jan Kara <jack@suse.cz>
-
- 14 Jul, 2009 2 commits
-
-
Tejun Heo authored
PIONEER DVD-RW DVRTD08 times out SETXFER if no media is present. The device is SATA and simply skipping SETXFER works around the problem. Implement ATA_HORKAGE_NOSETXFER and apply it to the device. Reported by Moritz Rigler in the following thread. http://thread.gmane.org/gmane.linux.ide/36790 and by Lars in bko#9540. Updated to whine and ignore NOSETXFER if PATA component is detected as suggested by Alan Cox. Signed-off-by:
Tejun Heo <tj@kernel.org> Reported-by:
Moritz Rigler <linux-ide@momail.e4ward.com> Reported-by:
Lars <lars21ce@gmx.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by:
Jeff Garzik <jgarzik@redhat.com>
-
Tobias Klauser authored
Use the correct function call for skb_reserve in the comment for NET_IP_ALIGN. Signed-off-by:
Tobias Klauser <klto@zhaw.ch> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 12 Jul, 2009 5 commits
-
-
Randy Dunlap authored
Fix usb.h kernel-doc warnings: Warning(include/linux/usb.h:918): Excess struct/union/enum/typedef member 'nodename' description in 'usb_device_driver' Warning(include/linux/usb.h:939): No description found for parameter 'nodename' Warning(include/linux/usb.h:1219): No description found for parameter 'sg' Warning(include/linux/usb.h:1219): No description found for parameter 'num_sgs' Signed-off-by:
Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Greg Kroah-Hartman authored
This reverts commit 453f7755. The driver should not have been accepted as the MSRT code is not in the main kernel yet, which this depends on. Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Hao Wu <hao.wu@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Kay Sievers authored
The name size limit is gone from the driver-core, this is the removal of the last left-over. Signed-off-by:
Kay Sievers <kay.sievers@vrfy.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Alexey Dobriyan authored
* Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by:
Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Julien Tinnes authored
We have found that the current PER_CLEAR_ON_SETID mask on Linux doesn't include neither ADDR_COMPAT_LAYOUT, nor MMAP_PAGE_ZERO. The current mask is READ_IMPLIES_EXEC|ADDR_NO_RANDOMIZE. We believe it is important to add MMAP_PAGE_ZERO, because by using this personality it is possible to have the first page mapped inside a process running as setuid root. This could be used in those scenarios: - Exploiting a NULL pointer dereference issue in a setuid root binary - Bypassing the mmap_min_addr restrictions of the Linux kernel: by running a setuid binary that would drop privileges before giving us control back (for instance by loading a user-supplied library), we could get the first page mapped in a process we control. By further using mremap and mprotect on this mapping, we can then completely bypass the mmap_min_addr restrictions. Less importantly, we believe ADDR_COMPAT_LAYOUT should also be added since on x86 32bits it will in practice disable most of the address space layout randomization (only the stack will remain randomized). Signed-off-by:
Julien Tinnes <jt@cr0.org> Signed-off-by:
Tavis Ormandy <taviso@sdf.lonestar.org> Cc: stable@kernel.org Acked-by:
Christoph Hellwig <hch@infradead.org> Acked-by:
Kees Cook <kees@ubuntu.com> Acked-by:
Eugene Teo <eugene@redhat.com> [ Shortened lines and fixed whitespace as per Christophs' suggestion ] Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 11 Jul, 2009 1 commit
-
-
Trond Myklebust authored
Move the definition of BLK_RW_ASYNC/BLK_RW_SYNC into linux/backing-dev.h so that it is available to all callers of set/clear_bdi_congested(). This replaces commit 097041e5 ("fuse: Fix build error"), which will be reverted. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by:
Larry Finger <Larry.Finger@lwfinger.net> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 10 Jul, 2009 7 commits
-
-
Alan Cox authored
The sysrq code acquired a kref leak. Fix it by passing the tty separately from the caller (thus effectively using the callers kref which all the callers hold anyway) Signed-off-by:
Alan Cox <alan@linux.intel.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Zijlstra authored
Optimize cond_resched() by removing one conditional. Currently cond_resched() checks system_state == SYSTEM_RUNNING in order to avoid scheduling before the scheduler is running. We can however, as per suggestion of Matt, use PREEMPT_ACTIVE to accomplish that very same. Suggested-by:
Matt Mackall <mpm@selenic.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by:
Matt Mackall <mpm@selenic.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Zijlstra authored
Pull the initial preempt_count value into a single definition site. Maintainers for: alpha, ia64 and m68k, please have a look, your arch code is funny. The header magic is a bit odd, but similar to the KERNEL_DS one, CPP waits with expanding these macros until the INIT_THREAD_INFO macro itself is expanded, which is in arch/*/kernel/init_task.c where we've already included sched.h so we're good. Cc: tony.luck@intel.com Cc: rth@twiddle.net Cc: geert@linux-m68k.org Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by:
Matt Mackall <mpm@selenic.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
FUJITA Tomonori authored
I overlooked SG_DXFER_TO_FROM_DEV support when I converted sg to use the block layer mapping API (2.6.28). Douglas Gilbert explained SG_DXFER_TO_FROM_DEV: http://www.spinics.net/lists/linux-scsi/msg37135.html = The semantics of SG_DXFER_TO_FROM_DEV were: - copy user space buffer to kernel (LLD) buffer - do SCSI command which is assumed to be of the DATA_IN (data from device) variety. This would overwrite some or all of the kernel buffer - copy kernel (LLD) buffer back to the user space. The idea was to detect short reads by filling the original user space buffer with some marker bytes ("0xec" it would seem in this report). The "resid" value is a better way of detecting short reads but that was only added this century and requires co-operation from the LLD. = This patch changes the block layer mapping API to support this semantics. This simply adds another field to struct rq_map_data and enables __bio_copy_iov() to copy data from user space even with READ requests. It's better to add the flags field and kills null_mapped and the new from_user fields in struct rq_map_data but that approach makes it difficult to send this patch to stable trees because st and osst drivers use struct rq_map_data (they were converted to use the block layer in 2.6.29 and 2.6.30). Well, I should clean up the block layer mapping API. zhou sf reported this regiression and tested this patch: http://www.spinics.net/lists/linux-scsi/msg37128.html http://www.spinics.net/lists/linux-scsi/msg37168.htmlReported-by:
zhou sf <sxzzsf@gmail.com> Tested-by:
zhou sf <sxzzsf@gmail.com> Cc: stable@kernel.org Signed-off-by:
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Commit 1faa16d2 accidentally broke the bdi congestion wait queue logic, causing us to wait on congestion for WRITE (== 1) when we really wanted BLK_RW_ASYNC (== 0) instead. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Heiko Carstens authored
git commit 507e1231 "timer stats: Optimize by adding quick check to avoid function calls" added one wrong check so that one unnecessary function call isn't elimated. time_stats_account_hrtimer() checks if timer->start_pid isn't initialized in order to find out if timer_stats_update_stats() should be called. However start_pid is initialized with -1 instead of 0, so that the function call always happens. Check timer->start_site like in timer_stats_account_timer() to fix this. Signed-off-by:
Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
The timer migration expiry check should prevent the migration of a timer to another CPU when the timer expires before the next event is scheduled on the other CPU. Migrating the timer might delay it because we can not reprogram the clock event device on the other CPU. But the code implementing that check has two flaws: - for !HIGHRES the check compares the expiry value with the clock events device expiry value which is wrong for CLOCK_REALTIME based timers. - the check is racy. It holds the hrtimer base lock of the target CPU, but the clock event device expiry value can be modified nevertheless, e.g. by an timer interrupt firing. The !HIGHRES case is easy to fix as we can enqueue the timer on the cpu which was selected by the load balancer. It runs the idle balancing code once per jiffy anyway. So the maximum delay for the timer is the same as when we keep the tick on the current cpu going. In the HIGHRES case we can get the next expiry value from the hrtimer cpu_base of the target CPU and serialize the update with the cpu_base lock. This moves the lock section in hrtimer_interrupt() so we can set next_event to KTIME_MAX while we are handling the expired timers and set it to the next expiry value after we handled the timers under the base lock. While the expired timers are processed timer migration is blocked because the expiry time of the timer is always <= KTIME_MAX. Also remove the now useless clockevents_get_next_event() function. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
- 09 Jul, 2009 1 commit
-
-
Jiri Olsa authored
Adding smp_mb__after_lock define to be used as a smp_mb call after a lock. Making it nop for x86, since {read|write|spin}_lock() on x86 are full memory barriers. Signed-off-by:
Jiri Olsa <jolsa@redhat.com> Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 08 Jul, 2009 5 commits
-
-
Jaswinder Singh Rajput authored
fix the following 'make includecheck' warning: include/linux/rfkill.h: linux/types.h is included more than once. Signed-off-by:
Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by:
John W. Linville <linville@tuxdriver.com>
-
Alexey Dobriyan authored
Fix various silly problems wrt mnt_namespace.h: - exit_mnt_ns() isn't used, remove it - done that, sched.h and nsproxy.h inclusions aren't needed - mount.h inclusion was need for vfsmount_lock, but no longer - remove mnt_namespace.h inclusion from files which don't use anything from mnt_namespace.h Signed-off-by:
Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Parag Warudkar authored
Commit a65e7bfc broke the UML build with the following error - In file included from fs/proc/kcore.c:17: include/linux/elfcore.h: In function 'elf_core_copy_task_regs': include/linux/elfcore.h:129: error: implicit declaration of function 'task_pt_regs' Fix this by restoring the previous behavior of returning 0 for all arches like UML that don't define task_pt_regs. Signed-off-by:
Parag Warudkar <parag.lkml@gmail.com> Acked-by:
Amerigo Wang <xiyou.wangcong@gmail.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Catalin Marinas authored
Functions like free_bootmem() are allowed to free only part of a memory block. This patch adds support for this via the kmemleak_free_part() callback which removes the original object and creates one or two additional objects as a result of the memory block split. Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by:
Pekka Enberg <penberg@cs.helsinki.fi>
-
Catalin Marinas authored
The kmalloc_large() and kmalloc_large_node() functions were missed when adding the kmemleak hooks to the slub allocator. However, they should be traced to avoid false positives. Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Lameter <cl@linux-foundation.org> Acked-by:
Pekka Enberg <penberg@cs.helsinki.fi>
-