Return client directly from dm_kcopyd_client_create, not through a
parameter, making it consistent with dm_io_client_create.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Reserve just the minimum of pages needed to process one job.
Because we allocate pages from page allocator, we don't need to reserve
a large number of pages. The maximum job size is SUB_JOB_SIZE and we
calculate the number of reserved pages based on this.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Replace the arbitrary calculation of an initial io struct mempool size
with a constant.
The code calculated the number of reserved structures based on the request
size and used a "magic" multiplication constant of 4. This patch changes
it to reserve a fixed number - itself still chosen quite arbitrarily.
Further testing might show if there is a better number to choose.
Note that if there is no memory pressure, we can still allocate an
arbitrary number of "struct io" structures. One structure is enough to
process the whole request.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch changes dm-kcopyd so that it allocates pages from the main
page allocator with __GFP_NOWARN | __GFP_NORETRY flags (so that it can
fail in case of memory pressure). If the allocation fails, dm-kcopyd
allocates pages from its own reserve.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Introduce a parameter for gfp flags to alloc_pl() for use in following
patches.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Remove the spinlock protecting the pages allocation. The spinlock is only
taken on initialization or from single-threaded workqueue. Therefore, the
spinlock is useless.
The spinlock is taken in kcopyd_get_pages and kcopyd_put_pages.
kcopyd_get_pages is only called from run_pages_job, which is only
called from process_jobs called from do_work.
kcopyd_put_pages is called from client_alloc_pages (which is initialization
function) or from run_complete_job. run_complete_job is only called from
process_jobs called from do_work.
Another spinlock, kc->job_lock is taken each time someone pushes or pops
some work for the worker thread. Once we take kc->job_lock, we
guarantee that any written memory is visible to the other CPUs.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
There's a possible theoretical deadlock in dm-kcopyd because multiple
allocations from the same mempool are required to finish a request.
Avoid this by preallocating sub jobs.
There is a mempool of 512 entries. Each request requires up to 9
entries from the mempool. If we have at least 57 concurrent requests
running, the mempool may overflow and mempool allocations may start
blocking until another entry is freed to the mempool. Because the same
thread is used to free entries to the mempool and allocate entries from
the mempool, this may result in a deadlock.
This patch changes it so that one mempool entry contains all 9 "struct
kcopyd_job" required to fulfill the whole request. The allocation is
done only once in dm_kcopyd_copy and no further mempool allocations are
done during request processing.
If dm_kcopyd_copy is not run in the completion thread, this
implementation is deadlock-free.
MIN_JOBS needs reducing accordingly and we've chosen to reduce it
further to 8.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Don't split SUB_JOB_SIZE jobs
If the job size equals SUB_JOB_SIZE, there is no point in splitting it.
Splitting it just unnecessarily wastes time, because the split job size
is SUB_JOB_SIZE too.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Integrity errors need to be passed to the owner of the integrity
metadata for processing. Consequently EILSEQ should be passed up the
stack.
Cc: stable@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch adds a check that a block device has a request function
defined before it is used. Otherwise, misconfiguration can cause an oops.
Because we are allowing devices with zero size e.g. an offline multipath
device as in commit 2cd54d9bed
("dm: allow offline devices") there needs to be an additional check
to ensure devices are initialised. Some block devices, like a loop
device without a backing file, exist but have no request function.
Reproducer is trivial: dm-mirror on unbound loop device
(no backing file on loop devices)
dmsetup create x --table "0 8 mirror core 2 8 sync 2 /dev/loop0 0 /dev/loop1 0"
and mirror resync will immediatelly cause OOps.
BUG: unable to handle kernel NULL pointer dereference at (null)
? generic_make_request+0x2bd/0x590
? kmem_cache_alloc+0xad/0x190
submit_bio+0x53/0xe0
? bio_add_page+0x3b/0x50
dispatch_io+0x1ca/0x210 [dm_mod]
? read_callback+0x0/0xd0 [dm_mirror]
dm_io+0xbb/0x290 [dm_mod]
do_mirror+0x1e0/0x748 [dm_mirror]
Signed-off-by: Milan Broz <mbroz@redhat.com>
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Permit a target to support discards regardless of whether or not all its
underlying devices do.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin:
Blackfin: debug-mmrs: include RSI_PID[4567] MMRs
Blackfin: bf51x: fix up RSI_PID# MMR defines
Blackfin: bf52x/bf54x: fix up usb MMR defines
Blackfin: debug-mmrs: fix typos with gptimers/mdma/ppi
Blackfin: gptimers: add structure for hardware register layout
Blackfin: wire up new sendmmsg syscall
Blackfin: mach/bfin_serial_5xx.h: punt now-unused header
Blackfin: bfin_serial.h: turn default port wrappers into stubs
Fix kernel-doc warnings in scsi_proc.c:
Warning(drivers/scsi/scsi_proc.c:390): No description found for parameter 'dev'
Warning(drivers/scsi/scsi_proc.c:390): No description found for parameter 'data'
Warning(drivers/scsi/scsi_proc.c:390): Excess function parameter 's' description in 'always_match'
Warning(drivers/scsi/scsi_proc.c:390): Excess function parameter 'p' description in 'always_match'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On one machine I've been getting hangs, a page fault's anon_vma_prepare()
waiting in anon_vma_lock(), other processes waiting for that page's lock.
This is a replay of last year's f18194275c "mm: fix hang on
anon_vma->root->lock".
The new page_lock_anon_vma() places too much faith in its refcount: when
it has acquired the mutex_trylock(), it's possible that a racing task in
anon_vma_alloc() has just reallocated the struct anon_vma, set refcount
to 1, and is about to reset its anon_vma->root.
Fix this by saving anon_vma->root, and relying on the usual page_mapped()
check instead of a refcount check: if page is still mapped, the anon_vma
is still ours; if page is not still mapped, we're no longer interested.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I've hit the "address >= vma->vm_end" check in do_page_add_anon_rmap()
just once. The stack showed khugepaged allocation trying to compact
pages: the call to page_add_anon_rmap() coming from remove_migration_pte().
That path holds anon_vma lock, but does not hold mmap_sem: it can
therefore race with a split_vma(), and in commit 5f70b962cc "mmap:
avoid unnecessary anon_vma lock" we just took away the anon_vma lock
protection when adjusting vma->vm_end.
I don't think that particular BUG_ON ever caught anything interesting,
so better replace it by a comment, than reinstate the anon_vma locking.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While running fsx on tmpfs with a memhog then swapoff, swapoff was hanging
(interruptibly), repeatedly failing to locate the owner of a 0xff entry in
the swap_map.
Although shmem_writepage() does abandon when it sees incoming page index
is beyond eof, there was still a window in which shmem_truncate_range()
could come in between writepage's dropping lock and updating swap_map,
find the half-completed swap_map entry, and in trying to free it,
leave it in a state that swap_shmem_alloc() could not correct.
Arguably a bug in __swap_duplicate()'s and swap_entry_free()'s handling
of the different cases, but easiest to fix by moving swap_shmem_alloc()
under cover of the lock.
More interesting than the bug: it's been there since 2.6.33, why could
I not see it with earlier kernels? The mmotm of two weeks ago seems to
have some magic for generating races, this is just one of three I found.
With yesterday's git I first saw this in mainline, bisected in search of
that magic, but the easy reproducibility evaporated. Oh well, fix the bug.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The documentation is a little iffy as to whether these are actual MMRs,
but reading them on the hardware works, and the previous version of this
logic (the SDH) had PID[4567]. So add it for RSI too.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Looks like the copying of MMR defines from the SDH block missed updating
the addresses of the RSI_PID# registers. So tweak them to reflect the
actual hardware.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
The bf52x/bf54x have the incorrect addresses for USB_EP_NI7_RXINTERVAL
and USB_EP_NI7_TXCOUNT, so adjust those.
Further, the bf54x header puts the USB defines in the wrong place, so
shuffle them back to the right grouping.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This code was mostly developed against a BF54x, so some BF537-specific
issues were missed.
The PPI block starts at PPI_CONTROL, not PPI_STATUS (which is the reverse
of the EPPI block).
The MDMA block starts at MDMA_NEXT_DESC_PTR, not MDMA_CONFIG. Seems the
sim does not catch misreads here so that'll need to get fixed.
The gptimer block is mostly 32bit regs, not 16bit. Use the gptimer struct
to figure that out rather than hardcoding it locally.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Now that the serial code has been unified in bfin_serial.h, and the
Blackfin UART driver pushed its resources to the boards files, we
don't need these headers anymore.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Any consumer that needs to access the MMRs has to provide these helpers,
so make the default into useless stubs.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (36 commits)
Cache xattr security drop check for write v2
fs: block_page_mkwrite should wait for writeback to finish
mm: Wait for writeback when grabbing pages to begin a write
configfs: remove unnecessary dentry_unhash on rmdir, dir rename
fat: remove unnecessary dentry_unhash on rmdir, dir rename
hpfs: remove unnecessary dentry_unhash on rmdir, dir rename
minix: remove unnecessary dentry_unhash on rmdir, dir rename
fuse: remove unnecessary dentry_unhash on rmdir, dir rename
coda: remove unnecessary dentry_unhash on rmdir, dir rename
afs: remove unnecessary dentry_unhash on rmdir, dir rename
affs: remove unnecessary dentry_unhash on rmdir, dir rename
9p: remove unnecessary dentry_unhash on rmdir, dir rename
ncpfs: fix rename over directory with dangling references
ncpfs: document dentry_unhash usage
ecryptfs: remove unnecessary dentry_unhash on rmdir, dir rename
hostfs: remove unnecessary dentry_unhash on rmdir, dir rename
hfsplus: remove unnecessary dentry_unhash on rmdir, dir rename
hfs: remove unnecessary dentry_unhash on rmdir, dir rename
omfs: remove unnecessary dentry_unhash on rmdir, dir rneame
udf: remove unnecessary dentry_unhash from rmdir, dir rename
...
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, asm: Clean up desc.h a bit
x86, amd: Do not enable ARAT feature on AMD processors below family 0x12
x86: Move do_page_fault()'s error path under unlikely()
x86, efi: Retain boot service code until after switching to virtual mode
x86: Remove unnecessary check in detect_ht()
x86: Reorder mm_context_t to remove x86_64 alignment padding and thus shrink mm_struct
x86, UV: Clean up uv_tlb.c
x86, UV: Add support for SGI UV2 hub chip
x86, cpufeature: Update CPU feature RDRND to RDRAND
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: Start RCU kthreads in TASK_INTERRUPTIBLE state
rcu: Remove waitqueue usage for cpu, node, and boost kthreads
rcu: Avoid acquiring rcu_node locks in timer functions
atomic: Add atomic_or()
Documentation: Add statistics about nested locks
rcu: Decrease memory-barrier usage based on semi-formal proof
rcu: Make rcu_enter_nohz() pay attention to nesting
rcu: Don't do reschedule unless in irq
rcu: Remove old memory barriers from rcu_process_callbacks()
rcu: Add memory barriers
rcu: Fix unpaired rcu_irq_enter() from locking selftests
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits)
perf: Fix SIGIO handling
perf top: Don't stop if no kernel symtab is found
perf top: Handle kptr_restrict
perf top: Remove unused macro
perf events: initialize fd array to -1 instead of 0
perf tools: Make sure kptr_restrict warnings fit 80 col terms
perf tools: Fix build on older systems
perf symbols: Handle /proc/sys/kernel/kptr_restrict
perf: Remove duplicate headers
ftrace: Add internal recursive checks
tracing: Update btrfs's tracepoints to use u64 interface
tracing: Add __print_symbolic_u64 to avoid warnings on 32bit machine
ftrace: Set ops->flag to enabled even on static function tracing
tracing: Have event with function tracer check error return
ftrace: Have ftrace_startup() return failure code
jump_label: Check entries limit in __jump_label_update
ftrace/recordmcount: Avoid STT_FUNC symbols as base on ARM
scripts/tags.sh: Add magic for trace-events for etags too
scripts/tags.sh: Fix ctags for DEFINE_EVENT()
x86/ftrace: Fix compiler warning in ftrace.c
...
* 'for-usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sarah/xhci:
Intel xhci: Limit number of active endpoints to 64.
Intel xhci: Ignore spurious successful event.
Intel xhci: Support EHCI/xHCI port switching.
Intel xhci: Add PCI id for Panther Point xHCI host.
xhci: STFU: Be quieter during URB submission and completion.
xhci: STFU: Don't print event ring dequeue pointer.
xhci: STFU: Remove function tracing.
xhci: Don't submit commands when the host is dead.
xhci: Clear stopped_td when Stop Endpoint command completes.
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (33 commits)
x86: poll waiting for I/OAT DMA channel status
maintainers: add dma engine tree details
dmaengine: add TODO items for future work on dma drivers
dmaengine: Add API documentation for slave dma usage
dmaengine/dw_dmac: Update maintainer-ship
dmaengine: move link order
dmaengine/dw_dmac: implement pause and resume in dwc_control
dmaengine/dw_dmac: Replace spin_lock* with irqsave variants and enable submission from callback
dmaengine/dw_dmac: Divide one sg to many desc, if sg len is greater than DWC_MAX_COUNT
dmaengine/dw_dmac: set residue as total len in dwc_tx_status if status is !DMA_SUCCESS
dmaengine/dw_dmac: don't call callback routine in case dmaengine_terminate_all() is called
dmaengine: at_hdmac: pause: no need to wait for FIFO empty
pch_dma: modify pci device table definition
pch_dma: Support new device ML7223 IOH
pch_dma: Support I2S for ML7213 IOH
pch_dma: Fix DMA setting issue
pch_dma: modify for checkpatch
pch_dma: fix dma direction issue for ML7213 IOH video-in
dmaengine: at_hdmac: use descriptor chaining help function
dmaengine: at_hdmac: implement pause and resume in atc_control
...
Fix up trivial conflict in drivers/dma/dw_dmac.c
* 'spi/next' of git://git.secretlab.ca/git/linux-2.6:
spi/spi_bfin_sport: new driver for a SPI bus via the Blackfin SPORT peripheral
spi/tle620x: add missing device_remove_file()
* 'gpio/next' of git://git.secretlab.ca/git/linux-2.6:
gpio/pch_gpio: Support new device ML7223
gpio: make gpio_{request,free}_array gpio array parameter const
GPIO: OMAP: move to drivers/gpio
GPIO: OMAP: move register offset defines into <plat/gpio.h>
gpio: Convert gpio_is_valid to return bool
gpio: Move the s5pc100 GPIO to drivers/gpio
gpio: Move the s5pv210 GPIO to drivers/gpio
gpio: Move the exynos4 GPIO to drivers/gpio
gpio: Move to Samsung common GPIO library to drivers/gpio
gpio/nomadik: add function to read GPIO pull down status
gpio/nomadik: show all pins in debug
gpio: move Nomadik GPIO driver to drivers/gpio
gpio: move U300 GPIO driver to drivers/gpio
langwell_gpio: add runtime pm support
gpio/pca953x: Add support for pca9574 and pca9575 devices
gpio/cs5535: Show explicit dependency between gpio_cs5535 and mfd_cs5535
32bit and 64bit on x86 are tested and working. The rest I have looked
at closely and I can't find any problems.
setns is an easy system call to wire up. It just takes two ints so I
don't expect any weird architecture porting problems.
While doing this I have noticed that we have some architectures that are
very slow to get new system calls. cris seems to be the slowest where
the last system calls wired up were preadv and pwritev. avr32 is weird
in that recvmmsg was wired up but never declared in unistd.h. frv is
behind with perf_event_open being the last syscall wired up. On h8300
the last system call wired up was epoll_wait. On m32r the last system
call wired up was fallocate. mn10300 has recvmmsg as the last system
call wired up. The rest seem to at least have syncfs wired up which was
new in the 2.6.39.
v2: Most of the architecture support added by Daniel Lezcano <dlezcano@fr.ibm.com>
v3: ported to v2.6.36-rc4 by: Eric W. Biederman <ebiederm@xmission.com>
v4: Moved wiring up of the system call to another patch
v5: ported to v2.6.39-rc6
v6: rebased onto parisc-next and net-next to avoid syscall conflicts.
v7: ported to Linus's latest post 2.6.39 tree.
> arch/blackfin/include/asm/unistd.h | 3 ++-
> arch/blackfin/mach-common/entry.S | 1 +
Acked-by: Mike Frysinger <vapier@gentoo.org>
Oh - ia64 wiring looks good.
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some recent benchmarking on btrfs showed that a major scaling bottleneck
on large systems on btrfs is currently the xattr lookup on every write.
Why xattr lookup on every write I hear you ask?
write wants to drop suid and security related xattrs that could set o
capabilities for executables. To do that it currently looks up
security.capability on EVERY write (even for non executables) to decide
whether to drop it or not.
In btrfs this causes an additional tree walk, hitting some per file system
locks and quite bad scalability. In a simple read workload on a 8S
system I saw over 90% CPU time in spinlocks related to that.
Chris Mason tells me this is also a problem in ext4, where it hits
the global mbcache lock.
This patch adds a simple per inode to avoid this problem. We only
do the lookup once per file and then if there is no xattr cache
the decision. All xattr changes clear the flag.
I also used the same flag to avoid the suid check, although
that one is pretty cheap.
A file system can also set this flag when it creates the inode,
if it has a cheap way to do so. This is done for some common file systems
in followon patches.
With this patch a major part of the lock contention disappears
for btrfs. Some testing on smaller systems didn't show significant
performance changes, but at least it helps the larger systems
and is generally more efficient.
v2: Rename is_sgid. add file system helper.
Cc: chris.mason@oracle.com
Cc: josef@redhat.com
Cc: viro@zeniv.linux.org.uk
Cc: agruen@linbit.com
Cc: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Upon creation, kthreads are in TASK_UNINTERRUPTIBLE state, which can
result in softlockup warnings. Because some of RCU's kthreads can
legitimately be idle indefinitely, start them in TASK_INTERRUPTIBLE
state in order to avoid those warnings.
Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It is not necessary to use waitqueues for the RCU kthreads because
we always know exactly which thread is to be awakened. In addition,
wake_up() only issues an actual wakeup when there is a thread waiting on
the queue, which was why there was an extra explicit wake_up_process()
to get the RCU kthreads started.
Eliminating the waitqueues (and wake_up()) in favor of wake_up_process()
eliminates the need for the initial wake_up_process() and also shrinks
the data structure size a bit. The wakeup logic is placed in a new
rcu_wait() macro.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
An atomic_or() function is needed by TREE_RCU to avoid deadlock, so
add a generic version.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Explain what the trailing "/1" on some lock class names of
lock_stat output means.
Reviewed-by: Yong Zhang <yong.zhang0@gmail.com>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4DD4F6C1.5090701@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The rule is, we have to update tsk->rt.nr_cpus_allowed if we change
tsk->cpus_allowed. Otherwise RT scheduler may confuse.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4DD4B3FA.5060901@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dima Zavin <dima@android.com> reported:
"After pulling the thread off the run-queue during a cgroup change,
the cfs_rq.min_vruntime gets recalculated. The dequeued thread's vruntime
then gets normalized to this new value. This can then lead to the thread
getting an unfair boost in the new group if the vruntime of the next
task in the old run-queue was way further ahead."
Reported-by: Dima Zavin <dima@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Recalls-having-tested-once-upon-a-time-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1305674470-23727-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Marc reported that e4a52bcb9 (sched: Remove rq->lock from the first
half of ttwu()) broke his ARM-SMP machine. Now ARM is one of the few
__ARCH_WANT_INTERRUPTS_ON_CTXSW users, so that exception in the ttwu()
code was suspect.
Yong found that the interrupt could hit after context_switch() changes
current but before it clears p->on_cpu, if that interrupt were to
attempt a wake-up of p we would indeed find ourselves spinning in IRQ
context.
Fix this by reverting to the old behaviour for this situation and
perform a full remote wake-up.
Cc: Frank Rowand <frank.rowand@am.sony.com>
Cc: Yong Zhang <yong.zhang0@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Reported-by: Marc Zyngier <Marc.Zyngier@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
sched_domain iterations needs to be protected by rcu_read_lock() now,
this patch adds another two places which needs the rcu lock, which is
spotted by following suspicious rcu_dereference_check() usage warnings.
kernel/sched_rt.c:1244 invoked rcu_dereference_check() without protection!
kernel/sched_stats.h:41 invoked rcu_dereference_check() without protection!
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1303469634-11678-1-git-send-email-dfeng@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With the addition of a device platform mfd_cell pointer, MFD drivers
can go back to passing platform data back to their sub drivers.
This allows for an mfd_cell->mfd_data removal and thus keep the
sub drivers MFD agnostic. This is mostly needed for non MFD aware
sub drivers.
Signed-off-by: Mattias Wallin <mattias.wallin@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>