linux

Author	SHA1	Message	Date
Chris Wilson	a63b03e2d2	mutex: Always clear owner field upon mutex_unlock() Currently if DEBUG_MUTEXES is enabled, the mutex->owner field is only cleared iff debug_locks is active. This exposes a race to other users of the field where the mutex->owner may be still set to a stale value, potentially upsetting mutex_spin_on_owner() among others. References: https://bugs.freedesktop.org/show_bug.cgi?id=87955 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1420540175-30204-1-git-send-email-chris@chris-wilson.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-09 11:20:39 +01:00
Tetsuo Handa	7f1a169b88	sched/fair: Fix RCU stall upon -ENOMEM in sched_create_group() When alloc_fair_sched_group() in sched_create_group() fails, free_sched_group() is called, and free_fair_sched_group() is called by free_sched_group(). Since destroy_cfs_bandwidth() is called by free_fair_sched_group() without calling init_cfs_bandwidth(), RCU stall occurs at hrtimer_cancel(): INFO: rcu_sched self-detected stall on CPU { 1} (t=60000 jiffies g=13074 c=13073 q=0) Task dump for CPU 1: (fprintd) R running task 0 6249 1 0x00000088 ... Call Trace: <IRQ> [<ffffffff81094988>] sched_show_task+0xa8/0x110 [<ffffffff81097acd>] dump_cpu_task+0x3d/0x50 [<ffffffff810c3a80>] rcu_dump_cpu_stacks+0x90/0xd0 [<ffffffff810c7751>] rcu_check_callbacks+0x491/0x700 [<ffffffff810cbf2b>] update_process_times+0x4b/0x80 [<ffffffff810db046>] tick_sched_handle.isra.20+0x36/0x50 [<ffffffff810db0a2>] tick_sched_timer+0x42/0x70 [<ffffffff810ccb19>] __run_hrtimer+0x69/0x1a0 [<ffffffff810db060>] ? tick_sched_handle.isra.20+0x50/0x50 [<ffffffff810ccedf>] hrtimer_interrupt+0xef/0x230 [<ffffffff810452cb>] local_apic_timer_interrupt+0x3b/0x70 [<ffffffff8164a465>] smp_apic_timer_interrupt+0x45/0x60 [<ffffffff816485bd>] apic_timer_interrupt+0x6d/0x80 <EOI> [<ffffffff810cc588>] ? lock_hrtimer_base.isra.23+0x18/0x50 [<ffffffff81193cf1>] ? __kmalloc+0x211/0x230 [<ffffffff810cc9d2>] hrtimer_try_to_cancel+0x22/0xd0 [<ffffffff81193cf1>] ? __kmalloc+0x211/0x230 [<ffffffff810ccaa2>] hrtimer_cancel+0x22/0x30 [<ffffffff810a3cb5>] free_fair_sched_group+0x25/0xd0 [<ffffffff8108df46>] free_sched_group+0x16/0x40 [<ffffffff810971bb>] sched_create_group+0x4b/0x80 [<ffffffff810aa383>] sched_autogroup_create_attach+0x43/0x1c0 [<ffffffff8107dc9c>] sys_setsid+0x7c/0x110 [<ffffffff81647729>] system_call_fastpath+0x12/0x17 Check whether init_cfs_bandwidth() was called before calling destroy_cfs_bandwidth(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> [ Move the check into destroy_cfs_bandwidth() to aid compilability. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Ben Segall <bsegall@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/201412252210.GCC30204.SOMVFFOtQJFLOH@I-love.SAKURA.ne.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-09 11:19:00 +01:00
Luca Abeni	269ad8015a	sched/deadline: Avoid double-accounting in case of missed deadlines The dl_runtime_exceeded() function is supposed to ckeck if a SCHED_DEADLINE task must be throttled, by checking if its current runtime is <= 0. However, it also checks if the scheduling deadline has been missed (the current time is larger than the current scheduling deadline), further decreasing the runtime if this happens. This "double accounting" is wrong: - In case of partitioned scheduling (or single CPU), this happens if task_tick_dl() has been called later than expected (due to small HZ values). In this case, the current runtime is also negative, and replenish_dl_entity() can take care of the deadline miss by recharging the current runtime to a value smaller than dl_runtime - In case of global scheduling on multiple CPUs, scheduling deadlines can be missed even if the task did not consume more runtime than expected, hence penalizing the task is wrong This patch fix this problem by throttling a SCHED_DEADLINE task only when its runtime becomes negative, and not modifying the runtime Signed-off-by: Luca Abeni <luca.abeni@unitn.it> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@gmail.com> Cc: <stable@vger.kernel.org> Cc: Dario Faggioli <raistlin@linux.it> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1418813432-20797-3-git-send-email-luca.abeni@unitn.it Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-09 11:18:57 +01:00
Luca Abeni	6a503c3be9	sched/deadline: Fix migration of SCHED_DEADLINE tasks According to global EDF, tasks should be migrated between runqueues without checking if their scheduling deadlines and runtimes are valid. However, SCHED_DEADLINE currently performs such a check: a migration happens doing: deactivate_task(rq, next_task, 0); set_task_cpu(next_task, later_rq->cpu); activate_task(later_rq, next_task, 0); which ends up calling dequeue_task_dl(), setting the new CPU, and then calling enqueue_task_dl(). enqueue_task_dl() then calls enqueue_dl_entity(), which calls update_dl_entity(), which can modify scheduling deadline and runtime, breaking global EDF scheduling. As a result, some of the properties of global EDF are not respected: for example, a taskset {(30, 80), (40, 80), (120, 170)} scheduled on two cores can have unbounded response times for the third task even if 30/80+40/80+120/170 = 1.5809 < 2 This can be fixed by invoking update_dl_entity() only in case of wakeup, or if this is a new SCHED_DEADLINE task. Signed-off-by: Luca Abeni <luca.abeni@unitn.it> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@gmail.com> Cc: <stable@vger.kernel.org> Cc: Dario Faggioli <raistlin@linux.it> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1418813432-20797-2-git-send-email-luca.abeni@unitn.it Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-09 11:18:56 +01:00
Yuyang Du	32a8df4e0b	sched: Fix odd values in effective_load() calculations In effective_load, we have (long w * unsigned long tg->shares) / long W, when w is negative, it is cast to unsigned long and hence the product is insanely large. Fix this by casting tg->shares to long. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Yuyang Du <yuyang.du@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dave Jones <davej@redhat.com> Cc: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141219002956.GA25405@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-09 11:18:54 +01:00
Andy Lutomirski	88a7c26af8	perf: Move task_pt_regs sampling into arch code On x86_64, at least, task_pt_regs may be only partially initialized in many contexts, so x86_64 should not use it without extra care from interrupt context, let alone NMI context. This will allow x86_64 to override the logic and will supply some scratch space to use to make a cleaner copy of user regs. Tested-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: chenggang.qcg@taobao.com Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jean Pihet <jean.pihet@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Salter <msalter@redhat.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/e431cd4c18c2e1c44c774f10758527fb2d1025c4.1420396372.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-09 11:12:28 +01:00
Jiri Kosina	b9dfe0bed9	livepatch: handle ancient compilers with more grace We are aborting a build in case when gcc doesn't support fentry on x86_64 (regs->ip modification can't really reliably work with mcount). This however breaks allmodconfig for people with older gccs that don't support -mfentry. Turn the build-time failure into runtime failure, resulting in the whole infrastructure not being initialized if CC_USING_FENTRY is unset. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>	2015-01-09 10:55:10 +01:00
Oleg Nesterov	3245d6acab	exit: fix race between wait_consider_task() and wait_task_zombie() wait_consider_task() checks EXIT_ZOMBIE after EXIT_DEAD/EXIT_TRACE and both checks can fail if we race with EXIT_ZOMBIE -> EXIT_DEAD/EXIT_TRACE change in between, gcc needs to reload p->exit_state after security_task_wait(). In this case ->notask_error will be wrongly cleared and do_wait() can hang forever if it was the last eligible child. Many thanks to Arne who carefully investigated the problem. Note: this bug is very old but it was pure theoretical until commit `b3ab03160d` ("wait: completely ignore the EXIT_DEAD tasks"). Before this commit "-O2" was probably enough to guarantee that compiler won't read ->exit_state twice. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Arne Goedeke <el@laramies.com> Tested-by: Arne Goedeke <el@laramies.com> Cc: <stable@vger.kernel.org> [3.15+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-01-08 15:10:51 -08:00
Sasha Levin	5e5aeb4367	time: adjtimex: Validate the ADJ_FREQUENCY values Verify that the frequency value from userspace is valid and makes sense. Unverified values can cause overflows later on. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable <stable@vger.kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> [jstultz: Fix up bug for negative values and drop redunent cap check] Signed-off-by: John Stultz <john.stultz@linaro.org>	2015-01-07 09:50:32 -08:00
Sasha Levin	6ada1fc0e1	time: settimeofday: Validate the values of tv from user An unvalidated user input is multiplied by a constant, which can result in an undefined behaviour for large values. While this is validated later, we should avoid triggering undefined behaviour. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable <stable@vger.kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> [jstultz: include trivial milisecond->microsecond correction noticed by Andy] Signed-off-by: John Stultz <john.stultz@linaro.org>	2015-01-07 09:49:14 -08:00
David S. Miller	44d84d7272	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2015-01-06 22:29:20 -05:00
Christoph Jaeger	83ac237a95	livepatch: kconfig: use bool instead of boolean Keyword 'boolean' for type definition attributes is considered deprecated and should not be used anymore. No functional changes. Reference: http://lkml.kernel.org/r/cover.1418003065.git.cj@linux.com Reference: http://lkml.kernel.org/r/1419108071-11607-1-git-send-email-cj@linux.com Signed-off-by: Christoph Jaeger <cj@linux.com> Reviewed-by: Petr Mladek <pmladek@suse.cz> Reviewed-by: Jingoo Han <jg1.han@samsung.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2015-01-06 21:58:05 +01:00
Paul E. McKenney	e3663b1024	rcu: Handle gpnum/completed wrap while dyntick idle Subtle race conditions can result if a CPU stays in dyntick-idle mode long enough for the ->gpnum and ->completed fields to wrap. For example, consider the following sequence of events: o CPU 1 encounters a quiescent state while waiting for grace period 5 to complete, but then enters dyntick-idle mode. o While CPU 1 is in dyntick-idle mode, the grace-period counters wrap around so that the grace period number is now 4. o Just as CPU 1 exits dyntick-idle mode, grace period 4 completes and grace period 5 begins. o The quiescent state that CPU 1 passed through during the old grace period 5 looks like it applies to the new grace period 5. Therefore, the new grace period 5 completes without CPU 1 having passed through a quiescent state. This could clearly be a fatal surprise to any long-running RCU read-side critical section that happened to be running on CPU 1 at the time. At one time, this was not a problem, given that it takes significant time for the grace-period counters to overflow even on 32-bit systems. However, with the advent of NO_HZ_FULL and SMP embedded systems, arbitrarily long idle periods are now becoming quite feasible. It is therefore time to close this race. This commit therefore avoids this race condition by having the quiescent-state forcing code detect when a CPU is falling too far behind, and setting a new rcu_data field ->gpwrap when this happens. Whenever this new ->gpwrap field is set, the CPU's ->gpnum and ->completed fields are known to be untrustworthy, and can be ignored, along with any associated quiescent states. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:05:28 -08:00
Paul E. McKenney	6ccd2ecd42	rcu: Improve diagnostics for spurious RCU CPU stall warnings The current RCU CPU stall warning code will print "Stall ended before state dump start" any time that the stall-warning code is triggered on a CPU that has already reported a quiescent state for the current grace period and if all quiescent states have been reported for the current grace period. However, a true stall can result in these symptoms, for example, by preventing RCU's grace-period kthreads from ever running This commit therefore checks for this condition, reporting the end of the stall only if one of the grace-period counters has actually advanced. Otherwise, it reports the last time that the grace-period kthread made meaningful progress. (In normal situations, the grace-period kthread should make meaningful progress at least every jiffies_till_next_fqs jiffies.) Reported-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Miroslav Benes <mbenes@suse.cz>	2015-01-06 11:05:27 -08:00
Paul E. McKenney	fc908ed33e	rcu: Make RCU_CPU_STALL_INFO include number of fqs attempts One way that an RCU CPU stall warning can happen is if the grace-period kthread is not allowed to execute. One proxy for this kthread's forward progress is the number of force-quiescent-state (fqs) scans. This commit therefore adds the number of fqs scans to the RCU CPU stall warning printouts when CONFIG_RCU_CPU_STALL_INFO=y. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:05:25 -08:00
Pranith Kumar	83fe27ea53	rcu: Make SRCU optional by using CONFIG_SRCU SRCU is not necessary to be compiled by default in all cases. For tinification efforts not compiling SRCU unless necessary is desirable. The current patch tries to make compiling SRCU optional by introducing a new Kconfig option CONFIG_SRCU which is selected when any of the components making use of SRCU are selected. If we do not select CONFIG_SRCU, srcu.o will not be compiled at all. text data bss dec hex filename 2007 0 0 2007 7d7 kernel/rcu/srcu.o Size of arch/powerpc/boot/zImage changes from text data bss dec hex filename 831552 64180 23944 919676 e087c arch/powerpc/boot/zImage : before 829504 64180 23952 917636 e0084 arch/powerpc/boot/zImage : after so the savings are about ~2000 bytes. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> CC: Josh Triplett <josh@joshtriplett.org> CC: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]	2015-01-06 11:04:29 -08:00
Paul E. McKenney	a5c198f4f7	rcu: Expand SRCU ->completed to 64 bits When rcutorture used only the low-order 32 bits of the grace-period number, it was not a problem for SRCU to use a 32-bit completed field. However, rcutorture now uses the full 64 bits on 64-bit systems, so this commit converts SRCU's ->completed field to unsigned long so as to provide 64 bits on 64-bit systems. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:04:26 -08:00
Paul E. McKenney	ab954c167e	rcu: Remove redundant callback-list initialization The RCU callback lists are initialized in both rcu_boot_init_percpu_data() and rcu_init_percpu_data(). The former is intended for initializing immutable data, so this commit removes the initialization from rcu_boot_init_percpu_data() and leaves it in rcu_init_percpu_data(). This change prepares for permitting callbacks to be queued very early in boot. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:54 -08:00
Paul E. McKenney	6cd534ef8b	rcu: Don't scan root rcu_node structure for stalled tasks Now that blocked tasks are no longer migrated to the root rcu_node structure, there is no need to scan the root rcu_node structure for blocked tasks stalling the current grace period. This commit therefore removes this scan. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:53 -08:00
Lai Jiangshan	abaf3f9d27	rcu: Revert "Allow post-unlock reference for rt_mutex" to avoid priority-inversion The patch `dfeb9765ce` ("Allow post-unlock reference for rt_mutex") ensured rcu-boost safe even the rt_mutex has post-unlock reference. But rt_mutex allowing post-unlock reference is definitely a bug and it was fixed by the commit `27e35715df` ("rtmutex: Plug slow unlock race"). This fix made the previous patch (`dfeb9765ce`) useless. And even worse, the priority-inversion introduced by the the previous patch still exists. rcu_read_unlock_special() { rt_mutex_unlock(&rnp->boost_mtx); /* Priority-Inversion: * the current task had been deboosted and preempted as a low * priority task immediately, it could wait long before reschedule in, * and the rcu-booster also waits on this low priority task and sleeps. * This priority-inversion makes rcu-booster can't work * as expected. */ complete(&rnp->boost_completion); } Just revert the patch to avoid it. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:52 -08:00
Paul E. McKenney	3ba4d0e09b	rcu: Note quiescent state when CPU goes offline The rcu_cleanup_dead_cpu() function (called after a CPU has gone completely offline) has not reported a quiescent state because there was probably at least one synchronize_rcu() between the time the CPU went offline and the CPU_DEAD notifier, and this would have detected the CPU's offline state via quiescent-state forcing. However, the plan is for CPUs to take themselves offline, at which point it makes sense for them to report their own quiescent state. This commit makes this change in preparation for the new CPU-hotplug setup. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:51 -08:00
Paul E. McKenney	5d0b024973	rcu: Don't bother affinitying rcub kthreads away from offline CPUs When rcu_boost_kthread_setaffinity() sees that all CPUs for a given rcu_node structure are now offline, it affinities the corresponding RCU-boost ("rcub") kthread away from those CPUs. This is pointless because the kthread cannot run on those offline CPUs in any case. This commit therefore removes this unneeded code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:50 -08:00
Paul E. McKenney	1be0085b51	rcu: Don't initiate RCU priority boosting on root rcu_node Because there is no longer any preempted tasks on the root rcu_node, and because there is no longer ever an rcub kthread for the root rcu_node, this commit drops the code in force_qs_rnp() that attempts to awaken the non-existent root rcub kthread. This is strictly a performance enhancement, removing a root rcu_node ->lock acquisition and release along with some tests in rcu_initiate_boost(), ending with the test that notes that there is no rcub kthread. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:48 -08:00
Paul E. McKenney	3e9f5c70d8	rcu: Don't spawn rcub kthreads on root rcu_node structure Now that offlining CPUs no longer moves leaf rcu_node structures' ->blkd_tasks lists to the root, there is no way for the root rcu_node structure's ->blkd_task list to be nonempty, unless the root node is also the sole leaf node. This commit therefore refrains from creating an rcub kthread for the root rcu_node structure unless it is also the sole leaf. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:47 -08:00
Paul E. McKenney	96e92021d4	rcu: Make use of rcu_preempt_has_tasks() Given that there is now arcu_preempt_has_tasks() function that checks to see if the ->blkd_tasks list is non-empty, this commit makes use of it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:46 -08:00
Paul E. McKenney	a8f4cbadfb	rcu: Shorten irq-disable region in rcu_cleanup_dead_cpu() Now that we are not migrating callbacks, there is no need to hold the ->orphan_lock across the the ->qsmaskinit bit-clearing process. This commit therefore releases ->orphan_lock immediately after adopting the orphaned RCU callbacks. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:45 -08:00
Paul E. McKenney	d19fb8d1f3	rcu: Don't migrate blocked tasks even if all corresponding CPUs offline When the last CPU associated with a given leaf rcu_node structure goes offline, something must be done about the tasks queued on that rcu_node structure. Each of these tasks has been preempted on one of the leaf rcu_node structure's CPUs while in an RCU read-side critical section that it have not yet exited. Handling these tasks is the job of rcu_preempt_offline_tasks(), which migrates them from the leaf rcu_node structure to the root rcu_node structure. Unfortunately, this migration has to be done one task at a time because each tasks allegiance must be shifted from the original leaf rcu_node to the root, so that future attempts to deal with these tasks will acquire the root rcu_node structure's ->lock rather than that of the leaf. Worse yet, this migration must be done with interrupts disabled, which is not so good for realtime response, especially given that there is no bound on the number of tasks on a given rcu_node structure's list. (OK, OK, there is a bound, it is just that it is unreasonably large, especially on 64-bit systems.) This was not considered a problem back when rcu_preempt_offline_tasks() was first written because realtime systems were assumed not to do CPU-hotplug operations while real-time applications were running. This assumption has proved of dubious validity given that people are starting to run multiple realtime applications on a single SMP system and that it is common practice to offline then online a CPU before starting its real-time application in order to clear extraneous processing off of that CPU. So we now need CPU hotplug operations to avoid undue latencies. This commit therefore avoids migrating these tasks, instead letting them be dequeued one by one from the original leaf rcu_node structure by rcu_read_unlock_special(). This means that the clearing of bits from the upper-level rcu_node structures must be deferred until the last such task has been dequeued, because otherwise subsequent grace periods won't wait on them. This commit has the beneficial side effect of simplifying the CPU-hotplug code for TREE_PREEMPT_RCU, especially in CONFIG_RCU_BOOST builds. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:44 -08:00
Paul E. McKenney	b6a932d1d9	rcu: Make rcu_read_unlock_special() propagate ->qsmaskinit bit clearing This commit causes rcu_read_unlock_special() to propagate ->qsmaskinit bit clearing up the rcu_node tree once a given rcu_node structure's blkd_tasks list becomes empty. This is the final commit in preparation for the rework of RCU priority boosting: It enables preempted tasks to remain queued on their rcu_node structure even after all of that rcu_node structure's CPUs have gone offline. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:43 -08:00
Paul E. McKenney	8af3a5e78c	rcu: Abstract rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu() This commit abstracts rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu() in preparation for the rework of RCU priority boosting. This new function will be invoked from rcu_read_unlock_special() in the reworked scheme, which is why rcu_cleanup_dead_rnp() assumes that the leaf rcu_node structure's ->qsmaskinit field has already been updated. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:41 -08:00
Paul E. McKenney	74e871ac6c	rcu: Rename "empty" to "empty_norm" in preparation for boost rework This commit undertakes a simple variable renaming to make way for some rework of RCU priority boosting. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:40 -08:00
Paul E. McKenney	b08ea27d95	rcu: Protect rcu_boost() lockless accesses with ACCESS_ONCE() This commit prevents random compiler optimizations by applying ACCESS_ONCE() to lockless accesses. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:39 -08:00
Lai Jiangshan	5a43b88e98	rcu: Remove "select IRQ_WORK" from config TREE_RCU The `48a7639ce8` ("rcu: Make callers awaken grace-period kthread") removed the irq_work_queue(), so the TREE_RCU doesn't need irq work any more. This commit therefore updates RCU's Kconfig and Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:01:17 -08:00
Paul E. McKenney	41050a0096	rcu: Fix rcu_barrier() race that could result in too-short wait The rcu_barrier() no-callbacks check for no-CBs CPUs has race conditions. It checks a given CPU's lists of callbacks, and if all three no-CBs lists are empty, ignores that CPU. However, these three lists could potentially be empty even when callbacks are present if the check executed just as the callbacks were being moved from one list to another. It turns out that recent versions of rcutorture can spot this race. This commit plugs this hole by consolidating the per-list counts of no-CBs callbacks into a single count, which is incremented before the corresponding callback is posted and after it is invoked. Then rcu_barrier() checks this single count to reliably determine whether the corresponding CPU has no-CBs callbacks. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:01:15 -08:00
David Hildenbrand	87af9e7ff9	hotplugcpu: Avoid deadlocks by waking active_writer Commit `b2c4623dcd` ("rcu: More on deadlock between CPU hotplug and expedited grace periods") introduced another problem that can easily be reproduced by starting/stopping cpus in a loop. E.g.: for i in `seq 5000`; do echo 1 > /sys/devices/system/cpu/cpu1/online echo 0 > /sys/devices/system/cpu/cpu1/online done Will result in: INFO: task /cpu_start_stop:1 blocked for more than 120 seconds. Call Trace: ([<00000000006a028e>] __schedule+0x406/0x91c) [<0000000000130f60>] cpu_hotplug_begin+0xd0/0xd4 [<0000000000130ff6>] _cpu_up+0x3e/0x1c4 [<0000000000131232>] cpu_up+0xb6/0xd4 [<00000000004a5720>] device_online+0x80/0xc0 [<00000000004a57f0>] online_store+0x90/0xb0 ... And a deadlock. Problem is that if the last ref in put_online_cpus() can't get the cpu_hotplug.lock the puts_pending count is incremented, but a sleeping active_writer might never be woken up, therefore never exiting the loop in cpu_hotplug_begin(). This fix removes puts_pending and turns refcount into an atomic variable. We also introduce a wait queue for the active_writer, to avoid possible races and use-after-free. There is no need to take the lock in put_online_cpus() anymore. Can't reproduce it with this fix. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:01:14 -08:00
Lai Jiangshan	5f6130fa52	tiny_rcu: Directly force QS when call_rcu_[bh\|sched]() on idle_task For RCU in UP, context-switch = QS = GP, thus we can force a context-switch when any call_rcu_[bh\|sched]() is happened on idle_task. After doing so, rcu_idle/irq_enter/exit() are useless, so we can simply make these functions empty. More important, this change does not change the functionality logically. Note: raise_softirq(RCU_SOFTIRQ)/rcu_sched_qs() in rcu_idle_enter() and outmost rcu_irq_exit() will have to wake up the ksoftirqd (due to in_interrupt() == 0). Before this patch After this patch: call_rcu_sched() in idle; call_rcu_sched() in idle set resched do other stuffs; do other stuffs outmost rcu_irq_exit() outmost rcu_irq_exit() (empty function) (or rcu_idle_enter()) (or rcu_idle_enter(), also empty function) start to resched. (see above) rcu_sched_qs() rcu_sched_qs() QS,and GP and advance cb QS,and GP and advance cb wake up the ksoftirqd wake up the ksoftirqd set resched resched to ksoftirqd (or other) resched to ksoftirqd (or other) These two code patches are almost the same. Size changed after patched: size kernel/rcu/tiny-old.o kernel/rcu/tiny-patched.o text data bss dec hex filename 3449 206 8 3663 e4f kernel/rcu/tiny-old.o 2406 144 8 2558 9fe kernel/rcu/tiny-patched.o Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:01:12 -08:00
Thomas Graf	113948d841	spinlock: Add spin_lock_bh_nested() Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-03 14:32:57 -05:00
Linus Torvalds	5e0f872c7d	Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit Pull audit fix from Paul Moore: "One audit patch to resolve a panic/oops when recording filenames in the audit log, see the mail archive link below. The fix isn't as nice as I would like, as it involves an allocate/copy of the filename, but it solves the problem and the overhead should only affect users who have configured audit rules involving file names. We'll revisit this issue with future kernels in an attempt to make this suck less, but in the meantime I think this fix should go into the next release of v3.19-rcX. [ https://marc.info/?t=141986927600001&r=1&w=2 ]" * 'upstream' of git://git.infradead.org/users/pcmoore/audit: audit: create private file name copies when auditing inodes	2014-12-31 14:52:18 -08:00
Paul E. McKenney	924df8a011	rcu: Fix invoke_rcu_callbacks() comment Despite what the comment says, it is only softirqs that are disabled, not interrupts. This commit therefore fixes the comment. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-12-30 17:40:19 -08:00
Alexander Gordeev	ca9558a33f	rcu: Remove redundant rcu_is_cpu_rrupt_from_idle() from tiny RCU Let's start assuming that something in the idle loop posts a callback, and scheduling-clock interrupt occurs: 1. The system is idle and stays that way, no runnable tasks. 2. Scheduling-clock interrupt occurs, rcu_check_callbacks() is called as result, which in turn calls rcu_is_cpu_rrupt_from_idle(). 3. rcu_is_cpu_rrupt_from_idle() reports the CPU was interrupted from idle, which results in rcu_sched_qs() call, which does a raise_softirq(RCU_SOFTIRQ). 4. Upon return from interrupt, rcu_irq_exit() is invoked, which calls rcu_idle_enter_common(), which in turn calls rcu_sched_qs() again, which does another raise_softirq(RCU_SOFTIRQ). 5. The softirq happens shortly and invokes rcu_process_callbacks(), which invokes __rcu_process_callbacks(). 6. So now callbacks can be invoked. At least they can be if ->donetail has been updated. Which it will have been because rcu_sched_qs() invokes rcu_qsctr_help(). In the described scenario rcu_sched_qs() and raise_softirq(RCU_SOFTIRQ) get called twice in steps 3 and 4. This redundancy could be eliminated by removing rcu_is_cpu_rrupt_from_idle() function. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-12-30 17:40:18 -08:00
Paul E. McKenney	734d168013	rcu: Make rcu_nmi_enter() handle nesting The x86 architecture has multiple types of NMI-like interrupts: real NMIs, machine checks, and, for some values of NMI-like, debugging and breakpoint interrupts. These interrupts can nest inside each other. Andy Lutomirski is adding RCU support to these interrupts, so rcu_nmi_enter() and rcu_nmi_exit() must now correctly handle nesting. This commit therefore introduces nesting, using a clever NMI-coordination algorithm suggested by Andy. The trick is to atomically increment ->dynticks (if needed) before manipulating ->dynticks_nmi_nesting on entry (and, accordingly, after on exit). In addition, ->dynticks_nmi_nesting is incremented by one if ->dynticks was incremented and by two otherwise. This means that when rcu_nmi_exit() sees ->dynticks_nmi_nesting equal to one, it knows that ->dynticks must be atomically incremented. This NMI-coordination algorithms has been validated by the following Promela model: ------------------------------------------------------------------------ /* * Promela model for Andy Lutomirski's suggested change to rcu_nmi_enter() * that allows nesting. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, you can access it online at * http://www.gnu.org/licenses/gpl-2.0.html. * * Copyright IBM Corporation, 2014 * * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> / byte dynticks_nmi_nesting = 0; byte dynticks = 0; / * Promela verision of rcu_nmi_enter(). / inline rcu_nmi_enter() { byte incby; byte tmp; incby = BUSY_INCBY; assert(dynticks_nmi_nesting >= 0); if :: (dynticks & 1) == 0 -> atomic { dynticks = dynticks + 1; } assert((dynticks & 1) == 1); incby = 1; :: else -> skip; fi; tmp = dynticks_nmi_nesting; tmp = tmp + incby; dynticks_nmi_nesting = tmp; assert(dynticks_nmi_nesting >= 1); } / * Promela verision of rcu_nmi_exit(). / inline rcu_nmi_exit() { byte tmp; assert(dynticks_nmi_nesting > 0); assert((dynticks & 1) != 0); if :: dynticks_nmi_nesting != 1 -> tmp = dynticks_nmi_nesting; tmp = tmp - BUSY_INCBY; dynticks_nmi_nesting = tmp; :: else -> dynticks_nmi_nesting = 0; atomic { dynticks = dynticks + 1; } assert((dynticks & 1) == 0); fi; } / * Base-level NMI runs non-atomically. Crudely emulates process-level * dynticks-idle entry/exit. / proctype base_NMI() { byte busy; busy = 0; do :: / Emulate base-level dynticks and not. / if :: 1 -> atomic { dynticks = dynticks + 1; } busy = 1; :: 1 -> skip; fi; / Verify that we only sometimes have base-level dynticks. / if :: busy == 0 -> skip; :: busy == 1 -> skip; fi; / Model RCU's NMI entry and exit actions. / rcu_nmi_enter(); assert((dynticks & 1) == 1); rcu_nmi_exit(); / Emulated re-entering base-level dynticks and not. / if :: !busy -> skip; :: busy -> atomic { dynticks = dynticks + 1; } busy = 0; fi; / We had better now be in dyntick-idle mode. / assert((dynticks & 1) == 0); od; } / * Nested NMI runs atomically to emulate interrupting base_level(). / proctype nested_NMI() { do :: / * Use an atomic section to model a nested NMI. This is * guaranteed to interleave into base_NMI() between a pair * of base_NMI() statements, just as a nested NMI would. / atomic { / Verify that we only sometimes are in dynticks. / if :: (dynticks & 1) == 0 -> skip; :: (dynticks & 1) == 1 -> skip; fi; / Model RCU's NMI entry and exit actions. */ rcu_nmi_enter(); assert((dynticks & 1) == 1); rcu_nmi_exit(); } od; } init { run base_NMI(); run nested_NMI(); } ------------------------------------------------------------------------ The following script can be used to run this model if placed in rcu_nmi.spin: ------------------------------------------------------------------------ if ! spin -a rcu_nmi.spin then echo Spin errors!!! exit 1 fi if ! cc -DSAFETY -o pan pan.c then echo Compilation errors!!! exit 1 fi ./pan -m100000 Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2014-12-30 17:40:16 -08:00
Richard Cochran	2eebdde652	timecounter: keep track of accumulated fractional nanoseconds The current timecounter implementation will drop a variable amount of resolution, depending on the magnitude of the time delta. In other words, reading the clock too often or too close to a time stamp conversion will introduce errors into the time values. This patch fixes the issue by introducing a fractional nanosecond field that accumulates the low order bits. Reported-by: Janusz Użycki <j.uzycki@elproma.com.pl> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-30 18:29:27 -05:00
Richard Cochran	74d23cc704	time: move the timecounter/cyclecounter code into its own file. The timecounter code has almost nothing to do with the clocksource code. Let it live in its own file. This will help isolate the timecounter users from the clocksource users in the source tree. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-30 18:29:25 -05:00
Linus Torvalds	2c90331cf5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix double SKB free in bluetooth 6lowpan layer, from Jukka Rissanen. 2) Fix receive checksum handling in enic driver, from Govindarajulu Varadarajan. 3) Fix NAPI poll list corruption in virtio_net and caif_virtio, from Herbert Xu. Also, add code to detect drivers that have this mistake in the future. 4) Fix doorbell endianness handling in mlx4 driver, from Amir Vadai. 5) Don't clobber IP6CB() before xfrm6_policy_check() is called in TCP input path,f rom Nicolas Dichtel. 6) Fix MPLS action validation in openvswitch, from Pravin B Shelar. 7) Fix double SKB free in vxlan driver, also from Pravin. 8) When we scrub a packet, which happens when we are switching the context of the packet (namespace, etc.), we should reset the secmark. From Thomas Graf. 9) ->ndo_gso_check() needs to do more than return true/false, it also has to allow the driver to clear netdev feature bits in order for the caller to be able to proceed properly. From Jesse Gross. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits) genetlink: A genl_bind() to an out-of-range multicast group should not WARN(). netlink/genetlink: pass network namespace to bind/unbind ne2k-pci: Add pci_disable_device in error handling bonding: change error message to debug message in __bond_release_one() genetlink: pass multicast bind/unbind to families netlink: call unbind when releasing socket netlink: update listeners directly when removing socket genetlink: pass only network namespace to genl_has_listeners() netlink: rename netlink_unbind() to netlink_undo_bind() net: Generalize ndo_gso_check to ndo_features_check net: incorrect use of init_completion fixup neigh: remove next ptr from struct neigh_table net: xilinx: Remove unnecessary temac_property in the driver net: phy: micrel: use generic config_init for KSZ8021/KSZ8031 net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding openvswitch: fix odd_ptr_err.cocci warnings Bluetooth: Fix accepting connections when not using mgmt Bluetooth: Fix controller configuration with HCI_QUIRK_INVALID_BDADDR brcmfmac: Do not crash if platform data is not populated ipw2200: select CFG80211_WEXT ...	2014-12-30 10:45:47 -08:00
Paul Moore	fcf22d8267	audit: create private file name copies when auditing inodes Unfortunately, while commit `4a928436` ("audit: correctly record file names with different path name types") fixed a problem where we were not recording filenames, it created a new problem by attempting to use these file names after they had been freed. This patch resolves the issue by creating a copy of the filename which the audit subsystem frees after it is done with the string. At some point it would be nice to resolve this issue with refcounts, or something similar, instead of having to allocate/copy strings, but that is almost surely beyond the scope of a -rcX patch so we'll defer that for later. On the plus side, only audit users should be impacted by the string copying. Reported-by: Toralf Foerster <toralf.foerster@gmx.de> Signed-off-by: Paul Moore <pmoore@redhat.com>	2014-12-30 09:26:21 -05:00
Johannes Berg	023e2cfa36	netlink/genetlink: pass network namespace to bind/unbind Netlink families can exist in multiple namespaces, and for the most part multicast subscriptions are per network namespace. Thus it only makes sense to have bind/unbind notifications per network namespace. To achieve this, pass the network namespace of a given client socket to the bind/unbind functions. Also do this in generic netlink, and there also make sure that any bind for multicast groups that only exist in init_net is rejected. This isn't really a problem if it is accepted since a client in a different namespace will never receive any notifications from such a group, but it can confuse the family if not rejected (it's also possible to silently (without telling the family) accept it, but it would also have to be ignored on unbind so families that take any kind of action on bind/unbind won't do unnecessary work for invalid clients like that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-27 03:07:50 -05:00
Linus Torvalds	66b3f4f0a0	Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit Pull audit fixes from Paul Moore: "Four patches to fix various problems with the audit subsystem, all are fairly small and straightforward. One patch fixes a problem where we weren't using the correct gfp allocation flags (GFP_KERNEL regardless of context, oops), one patch fixes a problem with old userspace tools (this was broken for a while), one patch fixes a problem where we weren't recording pathnames correctly, and one fixes a problem with PID based filters. In general I don't think there is anything controversial with this patchset, and it fixes some rather unfortunate bugs; the allocation flag one can be particularly scary looking for users" * 'upstream' of git://git.infradead.org/users/pcmoore/audit: audit: restore AUDIT_LOGINUID unset ABI audit: correctly record file names with different path name types audit: use supplied gfp_mask from audit_buffer in kauditd_send_multicast_skb audit: don't attempt to lookup PIDs when changing PID filtering audit rules	2014-12-23 18:13:16 -08:00
Richard Guy Briggs	041d7b98ff	audit: restore AUDIT_LOGINUID unset ABI A regression was caused by commit `780a7654ce`: audit: Make testing for a valid loginuid explicit. (which in turn attempted to fix a regression caused by `e1760bd`) When audit_krule_to_data() fills in the rules to get a listing, there was a missing clause to convert back from AUDIT_LOGINUID_SET to AUDIT_LOGINUID. This broke userspace by not returning the same information that was sent and expected. The rule: auditctl -a exit,never -F auid=-1 gives: auditctl -l LIST_RULES: exit,never f24=0 syscall=all when it should give: LIST_RULES: exit,never auid=-1 (0xffffffff) syscall=all Tag it so that it is reported the same way it was set. Create a new private flags audit_krule field (pflags) to store it that won't interact with the public one from the API. Cc: stable@vger.kernel.org # v3.10-rc1+ Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com>	2014-12-23 16:40:18 -05:00
Alex Thorlton	b74e6278fd	sched: Fix KMALLOC_MAX_SIZE overflow during cpumask allocation When allocating space for load_balance_mask, in sched_init, when CPUMASK_OFFSTACK is set, we've managed to spill over KMALLOC_MAX_SIZE on our 6144 core machine. The patch below breaks up the allocations so that they don't overflow the max alloc size. It also allocates the masks on the the node from which they'll most commonly be accessed, to minimize remote accesses on NUMA machines. Suggested-by: George Beshers <gbeshers@sgi.com> Signed-off-by: Alex Thorlton <athorlton@sgi.com> Cc: George Beshers <gbeshers@sgi.com> Cc: Russ Anderson <rja@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1418928270-148543-1-git-send-email-athorlton@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-12-23 11:43:48 +01:00
Steven Rostedt (Red Hat)	d716ff71dd	tracing: Remove taking of trace_types_lock in pipe files Taking the global mutex "trace_types_lock" in the trace_pipe files causes a bottle neck as most the pipe files can be read per cpu and there's no reason to serialize them. The current_trace variable was given a ref count and it can not change when the ref count is not zero. Opening the trace_pipe files will up the ref count (and decremented on close), so that the lock no longer needs to be taken when accessing the current_trace variable. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-22 23:37:46 -05:00
Rusty Russell	574732c73d	param: initialize store function to NULL if not available. I rebased Kees' 'param: do not set store func without write perm' on top of my 'params: cleanup sysfs allocation'. However, my patch uses krealloc which doesn't zero memory, leaving .store unset. Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2014-12-23 15:07:41 +10:30
Steven Rostedt (Red Hat)	cf6ab6d914	tracing: Add ref count to tracer for when they are being read by pipe When one of the trace pipe files are being read (by either the trace_pipe or trace_pipe_raw), do not allow the current_trace to change. By adding a ref count that is incremented when the pipe files are opened, will prevent the current_trace from being changed. This will allow for the removal of the global trace_types_lock from reading the pipe buffers (which is currently a bottle neck). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-22 15:39:40 -05:00
Josh Poimboeuf	33e8612f64	livepatch: use FTRACE_OPS_FL_IPMODIFY Use the FTRACE_OPS_FL_IPMODIFY flag to prevent conflicts with other ftrace users who also modify regs->ip. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.cz> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2014-12-22 20:05:59 +01:00
Paul Moore	4a92843601	audit: correctly record file names with different path name types There is a problem with the audit system when multiple audit records are created for the same path, each with a different path name type. The root cause of the problem is in __audit_inode() when an exact match (both the path name and path name type) is not found for a path name record; the existing code creates a new path name record, but it never sets the path name in this record, leaving it NULL. This patch corrects this problem by assigning the path name to these newly created records. There are many ways to reproduce this problem, but one of the easiest is the following (assuming auditd is running): # mkdir /root/tmp/test # touch /root/tmp/test/567 # auditctl -a always,exit -F dir=/root/tmp/test # touch /root/tmp/test/567 Afterwards, or while the commands above are running, check the audit log and pay special attention to the PATH records. A faulty kernel will display something like the following for the file creation: type=SYSCALL msg=audit(1416957442.025:93): arch=c000003e syscall=2 success=yes exit=3 ... comm="touch" exe="/usr/bin/touch" type=CWD msg=audit(1416957442.025:93): cwd="/root/tmp" type=PATH msg=audit(1416957442.025:93): item=0 name="test/" inode=401409 ... nametype=PARENT type=PATH msg=audit(1416957442.025:93): item=1 name=(null) inode=393804 ... nametype=NORMAL type=PATH msg=audit(1416957442.025:93): item=2 name=(null) inode=393804 ... nametype=NORMAL While a patched kernel will show the following: type=SYSCALL msg=audit(1416955786.566:89): arch=c000003e syscall=2 success=yes exit=3 ... comm="touch" exe="/usr/bin/touch" type=CWD msg=audit(1416955786.566:89): cwd="/root/tmp" type=PATH msg=audit(1416955786.566:89): item=0 name="test/" inode=401409 ... nametype=PARENT type=PATH msg=audit(1416955786.566:89): item=1 name="test/567" inode=393804 ... nametype=NORMAL This issue was brought up by a number of people, but special credit should go to hujianyang@huawei.com for reporting the problem along with an explanation of the problem and a patch. While the original patch did have some problems (see the archive link below), it did demonstrate the problem and helped kickstart the fix presented here. * https://lkml.org/lkml/2014/9/5/66 Reported-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Richard Guy Briggs <rgb@redhat.com>	2014-12-22 12:27:39 -05:00
Li Bin	b5bfc51707	livepatch: move x86 specific ftrace handler code to arch/x86 The execution flow redirection related implemention in the livepatch ftrace handler is depended on the specific architecture. This patch introduces klp_arch_set_pc(like kgdb_arch_set_pc) interface to change the pt_regs. Signed-off-by: Li Bin <huawei.libin@huawei.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2014-12-22 15:40:49 +01:00
Seth Jennings	b700e7f03d	livepatch: kernel: add support for live patching This commit introduces code for the live patching core. It implements an ftrace-based mechanism and kernel interface for doing live patching of kernel and kernel module functions. It represents the greatest common functionality set between kpatch and kgraft and can accept patches built using either method. This first version does not implement any consistency mechanism that ensures that old and new code do not run together. In practice, ~90% of CVEs are safe to apply in this way, since they simply add a conditional check. However, any function change that can not execute safely with the old version of the function can _not_ be safely applied in this version. [ jkosina@suse.cz: due to the number of contributions that got folded into this original patch from Seth Jennings, add SUSE's copyright as well, as discussed via e-mail ] Signed-off-by: Seth Jennings <sjenning@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2014-12-22 15:40:49 +01:00
Seth Jennings	c5f4546593	livepatch: kernel: add TAINT_LIVEPATCH This adds a new taint flag to indicate when the kernel or a kernel module has been live patched. This will provide a clean indication in bug reports that live patching was used. Additionally, if the crash occurs in a live patched function, the live patch module will appear beside the patched function in the backtrace. Signed-off-by: Seth Jennings <sjenning@redhat.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2014-12-22 15:40:48 +01:00
Linus Torvalds	5d6a546886	CONFIG_PM_RUNTIME elimination for 3.19-rc1 This removes the last few uses of CONFIG_PM_RUNTIME introduced recently and makes that config option finally go away. CONFIG_PM will be available directly from the menu now and also it will be selected automatically if CONFIG_SUSPEND or CONFIG_HIBERNATION is set. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJUlYaPAAoJEILEb/54YlRx/SoP/2wYioGzBhOCYfHw6fZF8zrP rotQ86sakhvSHre8K9QyFjvsA9wJ0CaTJF46YKZuHFhqU+IJZ7aXvNdEM1hK214J Mf3L2AcbcdnXioAN+HpeZhQklp2qHe84YkVXBqsFD6kb/qUNV2LSjy6nKEUdY3jW 6KL2f3RgF/LDjTdedujJgcCYwMBwfX4B7U42BG4NQQ8z3wCV+imJgzNDrR5nNlqK xu8ab8hO1Gi3msOJxS0y4MN6VTUpYOvQKhSyM9ErcB2ibclAdmcivKuFAz6gy5U7 PyDfYo/P3mXjMRBFb9fLqGtRcfstsnxPPSeKwp236tIQFX19Bj76UVUMJoUlXJP5 /f55/P7mCascg74ZZC4GiD/BSCRdqwInCsFMzqAfSq2NciKzeS6W7Mhd9VTLKDpl 5kqE39imUjZyps7/QqkfWskzB7Puhmqk3ZgTq2yAd4uQTpV7xlJYcnvr4oHCmAia SsLdYOqMQzWr3qyz2f5cOqPAvOo3/Xk/HHfTOCHW/4L+Ov+C921/f3d5GnxX9Ha+ ucRaMp9j5FPYVwFaFkczAMNF2Eanq+Fupa3e6XUNNbYdchFqT9obnHZbVKyvswjR vdGAYAjP/cLzIH9ETDCCXCRvBRw5pzeelDgvDPjPdmPjndHXG8WViyTIEyLL4+1i BENtc/SUw3pZ7iNlGO78 =QnSO -----END PGP SIGNATURE----- Merge tag 'pm-config-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull CONFIG_PM_RUNTIME elimination from Rafael Wysocki: "This removes the last few uses of CONFIG_PM_RUNTIME introduced recently and makes that config option finally go away. CONFIG_PM will be available directly from the menu now and also it will be selected automatically if CONFIG_SUSPEND or CONFIG_HIBERNATION is set" * tag 'pm-config-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: Eliminate CONFIG_PM_RUNTIME tty: 8250_omap: Replace CONFIG_PM_RUNTIME with CONFIG_PM sound: sst-haswell-pcm: Replace CONFIG_PM_RUNTIME with CONFIG_PM spi: Replace CONFIG_PM_RUNTIME with CONFIG_PM	2014-12-20 13:37:44 -08:00
Richard Guy Briggs	54dc77d974	audit: use supplied gfp_mask from audit_buffer in kauditd_send_multicast_skb Eric Paris explains: Since kauditd_send_multicast_skb() gets called in audit_log_end(), which can come from any context (aka even a sleeping context) GFP_KERNEL can't be used. Since the audit_buffer knows what context it should use, pass that down and use that. See: https://lkml.org/lkml/2014/12/16/542 BUG: sleeping function called from invalid context at mm/slab.c:2849 in_atomic(): 1, irqs_disabled(): 0, pid: 885, name: sulogin 2 locks held by sulogin/885: #0: (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff91152e30>] prepare_bprm_creds+0x28/0x8b #1: (tty_files_lock){+.+.+.}, at: [<ffffffff9123e787>] selinux_bprm_committing_creds+0x55/0x22b CPU: 1 PID: 885 Comm: sulogin Not tainted 3.18.0-next-20141216 #30 Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A15 06/20/2014 ffff880223744f10 ffff88022410f9b8 ffffffff916ba529 0000000000000375 ffff880223744f10 ffff88022410f9e8 ffffffff91063185 0000000000000006 0000000000000000 0000000000000000 0000000000000000 ffff88022410fa38 Call Trace: [<ffffffff916ba529>] dump_stack+0x50/0xa8 [<ffffffff91063185>] ___might_sleep+0x1b6/0x1be [<ffffffff910632a6>] __might_sleep+0x119/0x128 [<ffffffff91140720>] cache_alloc_debugcheck_before.isra.45+0x1d/0x1f [<ffffffff91141d81>] kmem_cache_alloc+0x43/0x1c9 [<ffffffff914e148d>] __alloc_skb+0x42/0x1a3 [<ffffffff914e2b62>] skb_copy+0x3e/0xa3 [<ffffffff910c263e>] audit_log_end+0x83/0x100 [<ffffffff9123b8d3>] ? avc_audit_pre_callback+0x103/0x103 [<ffffffff91252a73>] common_lsm_audit+0x441/0x450 [<ffffffff9123c163>] slow_avc_audit+0x63/0x67 [<ffffffff9123c42c>] avc_has_perm+0xca/0xe3 [<ffffffff9123dc2d>] inode_has_perm+0x5a/0x65 [<ffffffff9123e7ca>] selinux_bprm_committing_creds+0x98/0x22b [<ffffffff91239e64>] security_bprm_committing_creds+0xe/0x10 [<ffffffff911515e6>] install_exec_creds+0xe/0x79 [<ffffffff911974cf>] load_elf_binary+0xe36/0x10d7 [<ffffffff9115198e>] search_binary_handler+0x81/0x18c [<ffffffff91153376>] do_execveat_common.isra.31+0x4e3/0x7b7 [<ffffffff91153669>] do_execve+0x1f/0x21 [<ffffffff91153967>] SyS_execve+0x25/0x29 [<ffffffff916c61a9>] stub_execve+0x69/0xa0 Cc: stable@vger.kernel.org #v3.16-rc1 Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Paul Moore <pmoore@redhat.com>	2014-12-19 18:37:56 -05:00
Paul Moore	3640dcfa4f	audit: don't attempt to lookup PIDs when changing PID filtering audit rules Commit `f1dc4867` ("audit: anchor all pid references in the initial pid namespace") introduced a find_vpid() call when adding/removing audit rules with PID/PPID filters; unfortunately this is problematic as find_vpid() only works if there is a task with the associated PID alive on the system. The following commands demonstrate a simple reproducer. # auditctl -D # auditctl -l # autrace /bin/true # auditctl -l This patch resolves the problem by simply using the PID provided by the user without any additional validation, e.g. no calls to check to see if the task/PID exists. Cc: stable@vger.kernel.org # 3.15 Cc: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com>	2014-12-19 18:35:53 -05:00
Rafael J. Wysocki	464ed18ebd	PM: Eliminate CONFIG_PM_RUNTIME Having switched over all of the users of CONFIG_PM_RUNTIME to use CONFIG_PM directly, turn the latter into a user-selectable option and drop the former entirely from the tree. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Kevin Hilman <khilman@linaro.org>	2014-12-19 22:55:06 +01:00
Linus Torvalds	4bb9374e0b	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull NOHZ update from Thomas Gleixner: "Remove the call into the nohz idle code from the fake 'idle' thread in the powerclamp driver along with the export of those functions which was smuggeled in via the thermal tree. People have tried to hack around it in the nohz core code, but it just violates all rightful assumptions of that code about the only valid calling context (i.e. the proper idle task). The powerclamp trainwreck will still work, it just wont get the benefit of long idle sleeps" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick/powerclamp: Remove tick_nohz_idle abuse	2014-12-19 13:29:20 -08:00
Linus Torvalds	ac88ee3b6c	Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq core fix from Thomas Gleixner: "A single fix plugging a long standing race between proc/stat and proc/interrupts access and freeing of interrupt descriptors" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Prevent proc race against freeing of irq descriptors	2014-12-19 13:26:08 -08:00
Linus Torvalds	88a57667f2	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes and cleanups from Ingo Molnar: "A kernel fix plus mostly tooling fixes, but also some tooling restructuring and cleanups" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits) perf: Fix building warning on ARM 32 perf symbols: Fix use after free in filename__read_build_id perf evlist: Use roundup_pow_of_two tools: Adopt roundup_pow_of_two perf tools: Make the mmap length autotuning more robust tools: Adopt rounddown_pow_of_two and deps tools: Adopt fls_long and deps tools: Move bitops.h from tools/perf/util to tools/ tools: Introduce asm-generic/bitops.h tools lib: Move asm-generic/bitops/find.h code to tools/include and tools/lib tools: Whitespace prep patches for moving bitops.h tools: Move code originally from asm-generic/atomic.h into tools/include/asm-generic/ tools: Move code originally from linux/log2.h to tools/include/linux/ tools: Move __ffs implementation to tools/include/asm-generic/bitops/__ffs.h perf evlist: Do not use hard coded value for a mmap_pages default perf trace: Let the perf_evlist__mmap autosize the number of pages to use perf evlist: Improve the strerror_mmap method perf evlist: Clarify sterror_mmap variable names perf evlist: Fixup brown paper bag on "hint" for --mmap-pages cmdline arg perf trace: Provide a better explanation when mmap fails ...	2014-12-19 13:15:24 -08:00
Thomas Gleixner	a5fd9733a3	tick/powerclamp: Remove tick_nohz_idle abuse commit `4dbd27711c` "tick: export nohz tick idle symbols for module use" was merged via the thermal tree without an explicit ack from the relevant maintainers. The exports are abused by the intel powerclamp driver which implements a fake idle state from a sched FIFO task. This causes all kinds of wreckage in the NOHZ core code which rightfully assumes that tick_nohz_idle_enter/exit() are only called from the idle task itself. Recent changes in the NOHZ core lead to a failure of the powerclamp driver and now people try to hack completely broken and backwards workarounds into the NOHZ core code. This is completely unacceptable and just papers over the real problem. There are way more subtle issues lurking around the corner. The real solution is to fix the powerclamp driver by rewriting it with a sane concept, but that's beyond the scope of this. So the only solution for now is to remove the calls into the core NOHZ code from the powerclamp trainwreck along with the exports. Fixes: `d6d71ee4a1` "PM: Introduce Intel PowerClamp Driver" Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Pan Jacob jun <jacob.jun.pan@intel.com> Cc: LKP <lkp@01.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412181110110.17382@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-19 14:05:52 +01:00
Linus Torvalds	d790be3863	The exciting thing here is the getting rid of stop_machine on module removal. This is possible by using a simple atomic_t for the counter, rather than our fancy per-cpu counter: it turns out that no one is doing a module increment per net packet, so the slowdown should be in the noise. Also, script fixed for new git version. Cheers, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUk3cQAAoJENkgDmzRrbjxr44P/25ZBYmKZZ3XM3flt2o0LCti 1Px+MRbWuXhueWQOYZSXOO3c2ENNuV3siaU4jQZqnxslpdvT4rVsVFkYuwva2vHT hqpoq1Hz++yjFJArjERFOdoZ1gxkBbZQQGYm8esToAqU3b2Z74SrU48dPwp65q/1 r6hbXdWSiKALEBZeW2coi+QVCL/oxE8hmNqDO1mpe82aEKu0xIVpTdU5vAfBIj8/ Z95U2bx+CjiP7khhSjBGtltLqxL6QXw1m2eg1gO9nf1gJNI0/dAY6IJmFbGz+7Bt CAyc9BRsB40Em8G7d7wr4FsURcLfmYNdjtx79j+Rot5PkVIi+Ztv7C1QYlMQESPa ESddUMySOmKlzTm50w3ZLvV1ZTRU8TjmttSkzQYZ3csCLkKUgfeL9SAxU9KGoA2l jFxrvDcWEHtuU1D/FeYyOofNaD/BflPfdhj4WAm9XnPPi+THEu7fulWJaIP4glHh 8TpYNbinXuZqXO4nJ41Ad5utbSbBQa4fFBUuViWRTU0TtWJT2HVqn/XoYJ5mnPEz IbYh31rQDKFJKzePfscWrJ6XzoF59yGiAVcWcI3HS7aT8bFZGapAQu9mNCVu+cLF uRxWrukHG7d8YeYrAtbVXWfxArR155V9QJN55hQ1nKLq2M03gNvYTtAPw2yEsfuw u3Fk/KkV1RfaiFurjoG/ =rDum -----END PGP SIGNATURE----- Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module updates from Rusty Russell: "The exciting thing here is the getting rid of stop_machine on module removal. This is possible by using a simple atomic_t for the counter, rather than our fancy per-cpu counter: it turns out that no one is doing a module increment per net packet, so the slowdown should be in the noise" * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: param: do not set store func without write perm params: cleanup sysfs allocation kernel:module Fix coding style errors and warnings. module: Remove stop_machine from module unloading module: Replace module_ref with atomic_t refcnt lib/bug: Use RCU list ops for module_bug_list module: Unlink module with RCU synchronizing instead of stop_machine module: Wait for RCU synchronizing before releasing a module	2014-12-18 20:55:41 -08:00
Linus Torvalds	c0f486fde3	More ACPI and power management updates for 3.19-rc1 - Fix a regression in leds-gpio introduced by a recent commit that inadvertently changed the name of one of the properties used by the driver (Fabio Estevam). - Fix a regression in the ACPI backlight driver introduced by a recent fix that missed one special case that had to be taken into account (Aaron Lu). - Drop the level of some new kernel messages from the ACPI core introduced by a recent commit to KERN_DEBUG which they should have used from the start and drop some other unuseful KERN_ERR messages printed by ACPI (Rafael J Wysocki). - Revert an incorrect commit modifying the cpupower tool (Prarit Bhargava). - Fix two regressions introduced by recent commits in the OPP library and clean up some existing minor issues in that code (Viresh Kumar). - Continue to replace CONFIG_PM_RUNTIME with CONFIG_PM throughout the tree (or drop it where that can be done) in order to make it possible to eliminate CONFIG_PM_RUNTIME (Rafael J Wysocki, Ulf Hansson, Ludovic Desroches). There will be one more "CONFIG_PM_RUNTIME removal" batch after this one, because some new uses of it have been introduced during the current merge window, but that should be sufficient to finally get rid of it. - Make the ACPI EC driver more robust against race conditions related to GPE handler installation failures (Lv Zheng). - Prevent the ACPI device PM core code from attempting to disable GPEs that it has not enabled which confuses ACPICA and makes it report errors unnecessarily (Rafael J Wysocki). - Add a "force" command line switch to the intel_pstate driver to make it possible to override the blacklisting of some systems in that driver if needed (Ethan Zhao). - Improve intel_pstate code documentation and add a MAINTAINERS entry for it (Kristen Carlson Accardi). - Make the ACPI fan driver create cooling device interfaces witn names that reflect the IDs of the ACPI device objects they are associated with, except for "generic" ACPI fans (PNP ID "PNP0C0B"). That's necessary for user space thermal management tools to be able to connect the fans with the parts of the system they are supposed to be cooling properly. From Srinivas Pandruvada. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJUk0IDAAoJEILEb/54YlRx7fgP/3+yF/0TnEW93j2ALDAQFiLF tSv2A2vQC8vtMJjjWx0z/HqPh86gfaReEFZmUJD/Q/e2LXEnxNZJ+QMjcekPVkDM mTvcIMc2MR8vOA/oMkgxeaKregrrx7RkCfojd+NWZhVukkjl+mvBHgAnYjXRL+NZ unDWGlbHG97vq/3kGjPYhDS00nxHblw8NHFBu5HL5RxwABdWoeZJITwqxXWyuPLw nlqNWlOxmwvtSbw2VMKz0uof1nFHyQLykYsMG0ZsyayCRdWUZYkEqmE7GGpCLkLu D6yfmlpen6ccIOsEAae0eXBt50IFY9Tihk5lovx1mZmci2SNRg29BqMI105wIn0u 8b8Ej7MNHp7yMxRpB5WfU90p/y7ioJns9guFZxY0CKaRnrI2+BLt3RscMi3MPI06 Cu2/WkSSa09fhDPA+pk+VDYsmWgyVawigesNmMP5/cvYO/yYywVRjOuO1k77qQGp 4dSpFYEHfpxinejZnVZOk2V9MkvSLoSMux6wPV0xM0IE1iD0ulVpHjTJrwp80ph4 +bfUFVr/vrD1y7EKbf1PD363ZKvJhWhvQWDgETsM1vgLf21PfWO7C2kflIAsWsdQ 1ukD5nCBRlP4K73hG7bdM6kRztXhUdR0SHg85/t0KB/ExiVqtcXIzB60D0G1lENd QlKbq3O4lim1WGuhazQY =5fo2 -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI and power management updates from Rafael Wysocki: "These are regression fixes (leds-gpio, ACPI backlight driver, operating performance points library, ACPI device enumeration messages, cpupower tool), other bug fixes (ACPI EC driver, ACPI device PM), some cleanups in the operating performance points (OPP) framework, continuation of CONFIG_PM_RUNTIME elimination, a couple of minor intel_pstate driver changes, a new MAINTAINERS entry for it and an ACPI fan driver change needed for better support of thermal management in user space. Specifics: - Fix a regression in leds-gpio introduced by a recent commit that inadvertently changed the name of one of the properties used by the driver (Fabio Estevam). - Fix a regression in the ACPI backlight driver introduced by a recent fix that missed one special case that had to be taken into account (Aaron Lu). - Drop the level of some new kernel messages from the ACPI core introduced by a recent commit to KERN_DEBUG which they should have used from the start and drop some other unuseful KERN_ERR messages printed by ACPI (Rafael J Wysocki). - Revert an incorrect commit modifying the cpupower tool (Prarit Bhargava). - Fix two regressions introduced by recent commits in the OPP library and clean up some existing minor issues in that code (Viresh Kumar). - Continue to replace CONFIG_PM_RUNTIME with CONFIG_PM throughout the tree (or drop it where that can be done) in order to make it possible to eliminate CONFIG_PM_RUNTIME (Rafael J Wysocki, Ulf Hansson, Ludovic Desroches). There will be one more "CONFIG_PM_RUNTIME removal" batch after this one, because some new uses of it have been introduced during the current merge window, but that should be sufficient to finally get rid of it. - Make the ACPI EC driver more robust against race conditions related to GPE handler installation failures (Lv Zheng). - Prevent the ACPI device PM core code from attempting to disable GPEs that it has not enabled which confuses ACPICA and makes it report errors unnecessarily (Rafael J Wysocki). - Add a "force" command line switch to the intel_pstate driver to make it possible to override the blacklisting of some systems in that driver if needed (Ethan Zhao). - Improve intel_pstate code documentation and add a MAINTAINERS entry for it (Kristen Carlson Accardi). - Make the ACPI fan driver create cooling device interfaces witn names that reflect the IDs of the ACPI device objects they are associated with, except for "generic" ACPI fans (PNP ID "PNP0C0B"). That's necessary for user space thermal management tools to be able to connect the fans with the parts of the system they are supposed to be cooling properly. From Srinivas Pandruvada" * tag 'pm+acpi-3.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits) MAINTAINERS: add entry for intel_pstate ACPI / video: update the skip case for acpi_video_device_in_dod() power / PM: Eliminate CONFIG_PM_RUNTIME NFC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM SCSI / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM ACPI / EC: Fix unexpected ec_remove_handlers() invocations Revert "tools: cpupower: fix return checks for sysfs_get_idlestate_count()" tracing / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM x86 / PM: Replace CONFIG_PM_RUNTIME in io_apic.c PM: Remove the SET_PM_RUNTIME_PM_OPS() macro mmc: atmel-mci: use SET_RUNTIME_PM_OPS() macro PM / Kconfig: Replace PM_RUNTIME with PM in dependencies ARM / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM sound / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM phy / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM video / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM tty / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM spi: Replace CONFIG_PM_RUNTIME with CONFIG_PM ACPI / PM: Do not disable wakeup GPEs that have not been enabled ACPI / utils: Drop error messages from acpi_evaluate_reference() ...	2014-12-18 20:28:33 -08:00
Kees Cook	b0a65b0ccc	param: do not set store func without write perm When a module_param is defined without DAC write permissions, it can still be changed at runtime and updated. Drivers using a 0444 permission may be surprised that these values can still be changed. For drivers that want to allow updates, any S_IW* flag will set the "store" function as before. Drivers without S_IW* flags will have the "store" function unset, unforcing a read-only value. Drivers that wish neither "store" nor "get" can continue to use "0" for perms to stay out of sysfs entirely. Old behavior: # cd /sys/module/snd/parameters # ls -l total 0 -r--r--r-- 1 root root 4096 Dec 11 13:55 cards_limit -r--r--r-- 1 root root 4096 Dec 11 13:55 major -r--r--r-- 1 root root 4096 Dec 11 13:55 slots # cat major 116 # echo -1 > major -bash: major: Permission denied # chmod u+w major # echo -1 > major # cat major -1 New behavior: ... # chmod u+w major # echo -1 > major -bash: echo: write error: Input/output error Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2014-12-18 12:38:51 +10:30
Linus Torvalds	87c31b39ab	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace related fixes from Eric Biederman: "As these are bug fixes almost all of thes changes are marked for backporting to stable. The first change (implicitly adding MNT_NODEV on remount) addresses a regression that was created when security issues with unprivileged remount were closed. I go on to update the remount test to make it easy to detect if this issue reoccurs. Then there are a handful of mount and umount related fixes. Then half of the changes deal with the a recently discovered design bug in the permission checks of gid_map. Unix since the beginning has allowed setting group permissions on files to less than the user and other permissions (aka ---rwx---rwx). As the unix permission checks stop as soon as a group matches, and setgroups allows setting groups that can not later be dropped, results in a situtation where it is possible to legitimately use a group to assign fewer privileges to a process. Which means dropping a group can increase a processes privileges. The fix I have adopted is that gid_map is now no longer writable without privilege unless the new file /proc/self/setgroups has been set to permanently disable setgroups. The bulk of user namespace using applications even the applications using applications using user namespaces without privilege remain unaffected by this change. Unfortunately this ix breaks a couple user space applications, that were relying on the problematic behavior (one of which was tools/selftests/mount/unprivileged-remount-test.c). To hopefully prevent needing a regression fix on top of my security fix I rounded folks who work with the container implementations mostly like to be affected and encouraged them to test the changes. > So far nothing broke on my libvirt-lxc test bed. :-) > Tested with openSUSE 13.2 and libvirt 1.2.9. > Tested-by: Richard Weinberger <richard@nod.at> > Tested on Fedora20 with libvirt 1.2.11, works fine. > Tested-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com> > Ok, thanks - yes, unprivileged lxc is working fine with your kernels. > Just to be sure I was testing the right thing I also tested using > my unprivileged nsexec testcases, and they failed on setgroup/setgid > as now expected, and succeeded there without your patches. > Tested-by: Serge Hallyn <serge.hallyn@ubuntu.com> > I tested this with Sandstorm. It breaks as is and it works if I add > the setgroups thing. > Tested-by: Andy Lutomirski <luto@amacapital.net> # breaks things as designed :(" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: userns: Unbreak the unprivileged remount tests userns; Correct the comment in map_write userns: Allow setting gid_maps without privilege when setgroups is disabled userns: Add a knob to disable setgroups on a per user namespace basis userns: Rename id_map_mutex to userns_state_mutex userns: Only allow the creator of the userns unprivileged mappings userns: Check euid no fsuid when establishing an unprivileged uid mapping userns: Don't allow unprivileged creation of gid mappings userns: Don't allow setgroups until a gid mapping has been setablished userns: Document what the invariant required for safe unprivileged mappings. groups: Consolidate the setgroups permission checks mnt: Clear mnt_expire during pivot_root mnt: Carefully set CL_UNPRIVILEGED in clone_mnt mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers. umount: Do not allow unmounting rootfs. umount: Disallow unprivileged mount force mnt: Update unprivileged remount test mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount	2014-12-17 12:31:40 -08:00
Linus Torvalds	603ba7e41b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile #2 from Al Viro: "Next pile (and there'll be one or two more). The large piece in this one is getting rid of /proc//ns/ weirdness; among other things, it allows to (finally) make nameidata completely opaque outside of fs/namei.c, making for easier further cleanups in there" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: coda_venus_readdir(): use file_inode() fs/namei.c: fold link_path_walk() call into path_init() path_init(): don't bother with LOOKUP_PARENT in argument fs/namei.c: new helper (path_cleanup()) path_init(): store the "base" pointer to file in nameidata itself make default ->i_fop have ->open() fail with ENXIO make nameidata completely opaque outside of fs/namei.c kill proc_ns completely take the targets of /proc//ns/ symlinks to separate fs bury struct proc_ns in fs/proc copy address of proc_ns_ops into ns_common new helpers: ns_alloc_inum/ns_free_inum make proc_ns_operations work with struct ns_common * instead of void * switch the rest of proc_ns_operations to working with &...->ns netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns common object embedded into various struct ....ns	2014-12-16 15:53:03 -08:00
Linus Torvalds	a7c180aa7e	As the merge window is still open, and this code was not as complex as I thought it might be. I'm pushing this in now. This will allow Thomas to debug his irq work for 3.20. This adds two new features: 1) Allow traceopoints to be enabled right after mm_init(). By passing in the trace_event= kernel command line parameter, tracepoints can be enabled at boot up. For debugging things like the initialization of interrupts, it is needed to have tracepoints enabled very early. People have asked about this before and this has been on my todo list. As it can be helpful for Thomas to debug his upcoming 3.20 IRQ work, I'm pushing this now. This way he can add tracepoints into the IRQ set up and have users enable them when things go wrong. 2) Have the tracepoints printed via printk() (the console) when they are triggered. If the irq code locks up or reboots the box, having the tracepoint output go into the kernel ring buffer is useless for debugging. But being able to add the tp_printk kernel command line option along with the trace_event= option will have these tracepoints printed as they occur, and that can be really useful for debugging early lock up or reboot problems. This code is not that intrusive and it passed all my tests. Thomas tried them out too and it works for his needs. Link: http://lkml.kernel.org/r/20141214201609.126831471@goodmis.org -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUjv3kAAoJEEjnJuOKh9ldLNsIANAe5EmDCBw0WjR72n+G3qOH NC8calXfkjqHU0bv8Q3dRv20KH4MHOy6l4+EiV9/ovt71LOF3NEyUJ3HuShf9a8b sWcUhYbX3D1hViQe5sOzv9AWhBCFlKQGoNmQnydX9xa8ivRsBaTGJIGktWlHcwBE jF1i3fj3l3vRQSS8qZFXp3bzreunlGyPoSHcT6eWQeos+utj4sKwQWTLXTLQeM+6 oQtFKRx7E5yX04qO1qFczS8qIEC6JH2C2jIRYEKUGepaELlnGkb8O7jQV/RaLF4/ 6P8VhZFG9YLS7fn7vWu0SnAN+Zwz5LzgjXAZt0FhGtIhLc18Oj8ouHH1UORsdQM= =Z4Un -----END PGP SIGNATURE----- Merge tag 'trace-3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "As the merge window is still open, and this code was not as complex as I thought it might be. I'm pushing this in now. This will allow Thomas to debug his irq work for 3.20. This adds two new features: 1) Allow traceopoints to be enabled right after mm_init(). By passing in the trace_event= kernel command line parameter, tracepoints can be enabled at boot up. For debugging things like the initialization of interrupts, it is needed to have tracepoints enabled very early. People have asked about this before and this has been on my todo list. As it can be helpful for Thomas to debug his upcoming 3.20 IRQ work, I'm pushing this now. This way he can add tracepoints into the IRQ set up and have users enable them when things go wrong. 2) Have the tracepoints printed via printk() (the console) when they are triggered. If the irq code locks up or reboots the box, having the tracepoint output go into the kernel ring buffer is useless for debugging. But being able to add the tp_printk kernel command line option along with the trace_event= option will have these tracepoints printed as they occur, and that can be really useful for debugging early lock up or reboot problems. This code is not that intrusive and it passed all my tests. Thomas tried them out too and it works for his needs. Link: http://lkml.kernel.org/r/20141214201609.126831471@goodmis.org" * tag 'trace-3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Add tp_printk cmdline to have tracepoints go to printk() tracing: Move enabling tracepoints to just after rcu_init()	2014-12-16 12:53:59 -08:00
Linus Torvalds	988adfdffd	Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux Pull drm updates from Dave Airlie: "Highlights: - AMD KFD driver merge This is the AMD HSA interface for exposing a lowlevel interface for GPGPU use. They have an open source userspace built on top of this interface, and the code looks as good as it was going to get out of tree. - Initial atomic modesetting work The need for an atomic modesetting interface to allow userspace to try and send a complete set of modesetting state to the driver has arisen, and been suffering from neglect this past year. No more, the start of the common code and changes for msm driver to use it are in this tree. Ongoing work to get the userspace ioctl finished and the code clean will probably wait until next kernel. - DisplayID 1.3 and tiled monitor exposed to userspace. Tiled monitor property is now exposed for userspace to make use of. - Rockchip drm driver merged. - imx gpu driver moved out of staging Other stuff: - core: panel - MIPI DSI + new panels. expose suggested x/y properties for virtual GPUs - i915: Initial Skylake (SKL) support gen3/4 reset work start of dri1/ums removal infoframe tracking fixes for lots of things. - nouveau: tegra k1 voltage support GM204 modesetting support GT21x memory reclocking work - radeon: CI dpm fixes GPUVM improvements Initial DPM fan control - rcar-du: HDMI support added removed some support for old boards slave encoder driver for Analog Devices adv7511 - exynos: Exynos4415 SoC support - msm: a4xx gpu support atomic helper conversion - tegra: iommu support universal plane support ganged-mode DSI support - sti: HDMI i2c improvements - vmwgfx: some late fixes. - qxl: use suggested x/y properties" * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (969 commits) drm: sti: fix module compilation issue drm/i915: save/restore GMBUS freq across suspend/resume on gen4 drm: sti: correctly cleanup CRTC and planes drm: sti: add HQVDP plane drm: sti: add cursor plane drm: sti: enable auxiliary CRTC drm: sti: fix delay in VTG programming drm: sti: prepare sti_tvout to support auxiliary crtc drm: sti: use drm_crtc_vblank_{on/off} instead of drm_vblank_{on/off} drm: sti: fix hdmi avi infoframe drm: sti: remove event lock while disabling vblank drm: sti: simplify gdp code drm: sti: clear all mixer control drm: sti: remove gpio for HDMI hot plug detection drm: sti: allow to change hdmi ddc i2c adapter drm/doc: Document drm_add_modes_noedid() usage drm/i915: Remove '& 0xffff' from the mask given to WA_REG() drm/i915: Invert the mask and val arguments in wa_add() and WA_REG() drm: Zero out DRM object memory upon cleanup drm/i915/bdw: Fix the write setting up the WIZ hashing mode ...	2014-12-15 15:52:01 -08:00
Steven Rostedt (Red Hat)	0daa230296	tracing: Add tp_printk cmdline to have tracepoints go to printk() Add the kernel command line tp_printk option that will have tracepoints that are active sent to printk() as well as to the trace buffer. Passing "tp_printk" will activate this. To turn it off, the sysctl /proc/sys/kernel/tracepoint_printk can have '0' echoed into it. Note, this only works if the cmdline option is used. Echoing 1 into the sysctl file without the cmdline option will have no affect. Note, this is a dangerous option. Having high frequency tracepoints send their data to printk() can possibly cause a live lock. This is another reason why this is only active if the command line option is used. Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos Suggested-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-15 10:17:38 -05:00
Steven Rostedt (Red Hat)	5f893b2639	tracing: Move enabling tracepoints to just after rcu_init() Enabling tracepoints at boot up can be very useful. The tracepoint can be initialized right after RCU has been. There's no need to wait for the early_initcall() to be called. That's too late for some things that can use tracepoints for debugging. Move the logic to enable tracepoints out of the initcalls and into init/main.c to right after rcu_init(). This also allows trace_printk() to be used early too. Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos Link: http://lkml.kernel.org/r/20141214164104.307127356@goodmis.org Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-15 10:16:50 -05:00
Linus Torvalds	37da7bbbe8	TTY/Serial driver patches for 3.19-rc1 Here's the big tty/serial driver update for 3.19-rc1. There are a number of TTY core changes/fixes in here from Peter Hurley that have all been teted in linux-next for a long time now. There are also the normal serial driver updates as well, full details in the changelog below. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlSOD/MACgkQMUfUDdst+ymW+wCfbSzoYMRObIImMPWfoQtxkvvN rpkAnAtyEP/zZIfkQIuKTSH6FJxocF8V =WZt3 -----END PGP SIGNATURE----- Merge tag 'tty-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver updates from Greg KH: "Here's the big tty/serial driver update for 3.19-rc1. There are a number of TTY core changes/fixes in here from Peter Hurley that have all been teted in linux-next for a long time now. There are also the normal serial driver updates as well, full details in the changelog below" * tag 'tty-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (219 commits) serial: pxa: hold port.lock when reporting modem line changes tty-hvsi_lib: Deletion of an unnecessary check before the function call "tty_kref_put" tty: Deletion of unnecessary checks before two function calls n_tty: Fix read_buf race condition, increment read_head after pushing data serial: of-serial: add PM suspend/resume support Revert "serial: of-serial: add PM suspend/resume support" Revert "serial: of-serial: fix up PM ops on no_console_suspend and port type" serial: 8250: don't attempt a trylock if in sysrq serial: core: Add big-endian iotype serial: samsung: use port->fifosize instead of hardcoded values serial: samsung: prefer to use fifosize from driver data serial: samsung: fix style problems serial: samsung: wait for transfer completion before clock disable serial: icom: fix error return code serial: tegra: clean up tty-flag assignments serial: Fix io address assign flow with Fintek PCI-to-UART Product serial: mxs-auart: fix tx_empty against shift register serial: mxs-auart: fix gpio change detection on interrupt serial: mxs-auart: Fix mxs_auart_set_ldisc() serial: 8250_dw: Use 64-bit access for OCTEON. ...	2014-12-14 15:23:32 -08:00
Linus Torvalds	caf292ae5b	Merge branch 'for-3.19/core' of git://git.kernel.dk/linux-block Pull block driver core update from Jens Axboe: "This is the pull request for the core block IO changes for 3.19. Not a huge round this time, mostly lots of little good fixes: - Fix a bug in sysfs blktrace interface causing a NULL pointer dereference, when enabled/disabled through that API. From Arianna Avanzini. - Various updates/fixes/improvements for blk-mq: - A set of updates from Bart, mostly fixing buts in the tag handling. - Cleanup/code consolidation from Christoph. - Extend queue_rq API to be able to handle batching issues of IO requests. NVMe will utilize this shortly. From me. - A few tag and request handling updates from me. - Cleanup of the preempt handling for running queues from Paolo. - Prevent running of unmapped hardware queues from Ming Lei. - Move the kdump memory limiting check to be in the correct location, from Shaohua. - Initialize all software queues at init time from Takashi. This prevents a kobject warning when CPUs are brought online that weren't online when a queue was registered. - Single writeback fix for I_DIRTY clearing from Tejun. Queued with the core IO changes, since it's just a single fix. - Version X of the __bio_add_page() segment addition retry from Maurizio. Hope the Xth time is the charm. - Documentation fixup for IO scheduler merging from Jan. - Introduce (and use) generic IO stat accounting helpers for non-rq drivers, from Gu Zheng. - Kill off artificial limiting of max sectors in a request from Christoph" * 'for-3.19/core' of git://git.kernel.dk/linux-block: (26 commits) bio: modify __bio_add_page() to accept pages that don't start a new segment blk-mq: Fix uninitialized kobject at CPU hotplugging blktrace: don't let the sysfs interface remove trace from running list blk-mq: Use all available hardware queues blk-mq: Micro-optimize bt_get() blk-mq: Fix a race between bt_clear_tag() and bt_get() blk-mq: Avoid that __bt_get_word() wraps multiple times blk-mq: Fix a use-after-free blk-mq: prevent unmapped hw queue from being scheduled blk-mq: re-check for available tags after running the hardware queue blk-mq: fix hang in bt_get() blk-mq: move the kdump check to blk_mq_alloc_tag_set blk-mq: cleanup tag free handling blk-mq: use 'nr_cpu_ids' as highest CPU ID count for hwq <-> cpu map blk: introduce generic io stat accounting help function blk-mq: handle the single queue case in blk_mq_hctx_next_cpu genhd: check for int overflow in disk_expand_part_tbl() blk-mq: add blk_mq_free_hctx_request() blk-mq: export blk_mq_free_request() blk-mq: use get_cpu/put_cpu instead of preempt_disable/preempt_enable ...	2014-12-13 14:14:23 -08:00
Linus Torvalds	8f4385d590	This code is a fork from the trace-3.19 pull as it needed the trace_seq clean ups from that branch. This code solves the issue of performing stack dumps from NMI context. The issue is that printk() is not safe from NMI context as if the NMI were to trigger when a printk() was being performed, the NMI could deadlock from the printk() internal locks. This has been seen in practice. With lots of review from Petr Mladek, this code went through several iterations, and we feel that it is now at a point of quality to be accepted into mainline. Here's what is contained in this patch set: o Creates a "seq_buf" generic buffer utility that allows a descriptor to be passed around where functions can write their own "printk()" formatted strings into it. The generic version was pulled out of the trace_seq() code that was made specifically for tracing. o The seq_buf code was change to model the seq_file code. I have a patch (not included for 3.19) that converts the seq_file.c code over to use seq_buf.c like the trace_seq.c code does. This was done to make sure that seq_buf.c is compatible with seq_file.c. I may try to get that patch in for 3.20. o The seq_buf.c file was moved to lib/ to remove it from being dependent on CONFIG_TRACING. o The printk() was updated to allow for a per_cpu "override" of the internal calls. That is, instead of writing to the console, a call to printk() may do something else. This made it easier to allow the NMI to change what printk() does in order to call dump_stack() without needing to update that code as well. o Finally, the dump_stack from all CPUs via NMI code was converted to use the seq_buf code. The caller to trigger the NMI code would wait till all the NMIs finished, and then it would print the seq_buf data to the console safely from a non NMI context. [ Updated to remove unnecessary preempt_disable in printk() ] -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUi8A8AAoJEEjnJuOKh9ldv0sH/A+l9Ewrc3Kd0XuUKX9UO9Mj yrDz5dSWTxD6Pi7ni5Zo2f/MebXhrgS8gF1MBN1HMS5s9/9XdTTQijosOfs75iFd xufiDur7ssl2EOLB/ouqWVn16tu1PrPyw+U76JUZvsYlIMSWQu2FH8DSdo59N6Iz 7RxS8rtxJ2IwehmO7tu2Lq5rB7zGL4SET5oIfQ1+KnjzqB5Z1bfm9nGwAc8nozx8 3MqwsClEnXBTkY4eYZzu9wD7Nl/eknzTrk8KDbQ49oTYmoBuuh/s1FMuxe75cY55 wEtDA6HvvTXYnw6YOAMUB41cGnRg3KVRmmhcH5T9jrBxg2iZjXYa8iZxvcAM6Es= =zDMJ -----END PGP SIGNATURE----- Merge tag 'trace-seq-buf-3.19-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixlet from Steven Rostedt: "Remove unnecessary preempt_disable in printk()" * tag 'trace-seq-buf-3.19-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: printk: Do not disable preemption for accessing printk_func	2014-12-13 14:04:41 -08:00
Linus Torvalds	a99abce2d9	Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit Pull audit updates from Paul Moore: "Two small patches from the audit next branch; only one of which has any real significant code changes, the other is simply a MAINTAINERS update for audit. The single code patch is pretty small and rather straightforward, it changes the audit "version" number reported to userspace from an integer to a bitmap which is used to indicate the functionality of the running kernel. This really doesn't have much impact on the kernel, but it will make life easier for the audit userspace folks. Thankfully we were still on a version number which allowed us to do this without breaking userspace" * 'upstream' of git://git.infradead.org/users/pcmoore/audit: audit: convert status version to a feature bitmap audit: add Paul Moore to the MAINTAINERS entry	2014-12-13 13:41:28 -08:00
Jan Kara	0809ab69a2	fsnotify: unify inode and mount marks handling There's a lot of common code in inode and mount marks handling. Factor it out to a common helper function. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:53 -08:00
Riku Voipio	957e3facd1	gcov: enable GCOV_PROFILE_ALL from ARCH Kconfigs Following the suggestions from Andrew Morton and Stephen Rothwell, Dont expand the ARCH list in kernel/gcov/Kconfig. Instead, define a ARCH_HAS_GCOV_PROFILE_ALL bool which architectures can enable. set ARCH_HAS_GCOV_PROFILE_ALL on Architectures where it was previously allowed + ARM64 which I tested. Signed-off-by: Riku Voipio <riku.voipio@linaro.org> Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:51 -08:00
Masanari Iida	d5393955c3	kexec: remove unnecessary KERN_ERR from kexec.c Remove unnecessary KERN_ERR from pr_err() within kexec.c. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:51 -08:00
David Drysdale	51f39a1f0c	syscalls: implement execveat() system call This patchset adds execveat(2) for x86, and is derived from Meredydd Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528). The primary aim of adding an execveat syscall is to allow an implementation of fexecve(3) that does not rely on the /proc filesystem, at least for executables (rather than scripts). The current glibc version of fexecve(3) is implemented via /proc, which causes problems in sandboxed or otherwise restricted environments. Given the desire for a /proc-free fexecve() implementation, HPA suggested (https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be an appropriate generalization. Also, having a new syscall means that it can take a flags argument without back-compatibility concerns. The current implementation just defines the AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be added in future -- for example, flags for new namespaces (as suggested at https://lkml.org/lkml/2006/7/11/474). Related history: - https://lkml.org/lkml/2006/12/27/123 is an example of someone realizing that fexecve() is likely to fail in a chroot environment. - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered documenting the /proc requirement of fexecve(3) in its manpage, to "prevent other people from wasting their time". - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a problem where a process that did setuid() could not fexecve() because it no longer had access to /proc/self/fd; this has since been fixed. This patch (of 4): Add a new execveat(2) system call. execveat() is to execve() as openat() is to open(): it takes a file descriptor that refers to a directory, and resolves the filename relative to that. In addition, if the filename is empty and AT_EMPTY_PATH is specified, execveat() executes the file to which the file descriptor refers. This replicates the functionality of fexecve(), which is a system call in other UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and so relies on /proc being mounted). The filename fed to the executed program as argv[0] (or the name of the script fed to a script interpreter) will be of the form "/dev/fd/<fd>" (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively reflecting how the executable was found. This does however mean that execution of a script in a /proc-less environment won't work; also, script execution via an O_CLOEXEC file descriptor fails (as the file will not be accessible after exec). Based on patches by Meredydd Luff. Signed-off-by: David Drysdale <drysdale@google.com> Cc: Meredydd Luff <meredydd@senatehouse.org> Cc: Shuah Khan <shuah.kh@samsung.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rich Felker <dalias@aerifal.cx> Cc: Christoph Hellwig <hch@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:51 -08:00
Joonsoo Kim	9a92a6ce6f	stacktrace: introduce snprint_stack_trace for buffer output Current stacktrace only have the function for console output. page_owner that will be introduced in following patch needs to print the output of stacktrace into the buffer for our own output format so so new function, snprint_stack_trace(), is needed. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dave Hansen <dave@sr71.net> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Jungsoo Son <jungsoo.son@lge.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:48 -08:00
Davidlohr Bueso	4a23717a23	uprobes: share the i_mmap_rwsem Both register and unregister call build_map_info() in order to create the list of mappings before installing or removing breakpoints for every mm which maps file backed memory. As such, there is no reason to hold the i_mmap_rwsem exclusively, so share it and allow concurrent readers to build the mapping data. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:45 -08:00
Davidlohr Bueso	c8c06efa8b	mm: convert i_mmap_mutex to rwsem The i_mmap_mutex is a close cousin of the anon vma lock, both protecting similar data, one for file backed pages and the other for anon memory. To this end, this lock can also be a rwsem. In addition, there are some important opportunities to share the lock when there are no tree modifications. This conversion is straightforward. For now, all users take the write lock. [sfr@canb.auug.org.au: update fremap.c] Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name> Acked-by: Hugh Dickins <hughd@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:45 -08:00
Davidlohr Bueso	83cde9e8ba	mm: use new helper functions around the i_mmap_mutex Convert all open coded mutex_lock/unlock calls to the i_mmap_[lock/unlock]_write() helpers. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name> Acked-by: Hugh Dickins <hughd@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:45 -08:00
Thomas Gleixner	c291ee6221	genirq: Prevent proc race against freeing of irq descriptors Since the rework of the sparse interrupt code to actually free the unused interrupt descriptors there exists a race between the /proc interfaces to the irq subsystem and the code which frees the interrupt descriptor. CPU0 CPU1 show_interrupts() desc = irq_to_desc(X); free_desc(desc) remove_from_radix_tree(); kfree(desc); raw_spinlock_irq(&desc->lock); /proc/interrupts is the only interface which can actively corrupt kernel memory via the lock access. /proc/stat can only read from freed memory. Extremly hard to trigger, but possible. The interfaces in /proc/irq/N/ are not affected by this because the removal of the proc file is serialized in procfs against concurrent readers/writers. The removal happens before the descriptor is freed. For architectures which have CONFIG_SPARSE_IRQ=n this is a non issue as the descriptor is never freed. It's merely cleared out with the irq descriptor lock held. So any concurrent proc access will either see the old correct value or the cleared out ones. Protect the lookup and access to the irq descriptor in show_interrupts() with the sparse_irq_lock. Provide kstat_irqs_usr() which is protecting the lookup and access with sparse_irq_lock and switch /proc/stat to use it. Document the existing kstat_irqs interfaces so it's clear that the caller needs to take care about protection. The users of these interfaces are either not affected due to SPARSE_IRQ=n or already protected against removal. Fixes: `1f5a5b87f7` "genirq: Implement a sane sparse_irq allocator" Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org	2014-12-13 13:33:07 +01:00
Rafael J. Wysocki	798bc6d8d5	tracing / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM After commit `b2b49ccbdd` (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected) PM_RUNTIME is always set if PM is set, so files that are build conditionally if CONFIG_PM_RUNTIME is set may now be build if CONFIG_PM is set. Replace CONFIG_PM_RUNTIME with CONFIG_PM in kernel/trace/Makefile for this reason. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Steven Rostedt <rostedt@goodmis.org.	2014-12-13 02:23:30 +01:00
Linus Torvalds	a7cb7bb664	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree update from Jiri Kosina: "Usual stuff: documentation updates, printk() fixes, etc" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits) intel_ips: fix a type in error message cpufreq: cpufreq-dt: Move newline to end of error message ps3rom: fix error return code treewide: fix typo in printk and Kconfig ARM: dts: bcm63138: change "interupts" to "interrupts" Replace mentions of "list_struct" to "list_head" kernel: trace: fix printk message scsi: mpt2sas: fix ioctl in comment zbud, zswap: change module author email clocksource: Fix 'clcoksource' typo in comment arm: fix wording of "Crotex" in CONFIG_ARCH_EXYNOS3 help gpio: msm-v1: make boolean argument more obvious usb: Fix typo in usb-serial-simple.c PCI: Fix comment typo 'COMFIG_PM_OPS' powerpc: Fix comment typo 'CONIFG_8xx' powerpc: Fix comment typos 'CONFiG_ALTIVEC' clk: st: Spelling s/stucture/structure/ isci: Spelling s/stucture/structure/ usb: gadget: zero: Spelling s/infrastucture/infrastructure/ treewide: Fix company name in module descriptions ...	2014-12-12 10:08:06 -08:00
Ingo Molnar	3459f0d78f	Merge branch 'linus' into perf/urgent, to pick up the upstream merged bits Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-12-12 09:09:03 +01:00
Linus Torvalds	2756d373a3	Merge branch 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup update from Tejun Heo: "cpuset got simplified a bit. cgroup core got a fix on unified hierarchy and grew some effective css related interfaces which will be used for blkio support for writeback IO traffic which is currently being worked on" * 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: implement cgroup_get_e_css() cgroup: add cgroup_subsys->css_e_css_changed() cgroup: add cgroup_subsys->css_released() cgroup: fix the async css offline wait logic in cgroup_subtree_control_write() cgroup: restructure child_subsys_mask handling in cgroup_subtree_control_write() cgroup: separate out cgroup_calc_child_subsys_mask() from cgroup_refresh_child_subsys_mask() cpuset: lock vs unlock typo cpuset: simplify cpuset_node_allowed API cpuset: convert callback_mutex to a spinlock	2014-12-11 18:57:19 -08:00
Linus Torvalds	0a27044c83	Merge branch 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue update from Tejun Heo: "Work items which may be involved in memory reclaim path may be executed by the rescuer under memory pressure. When a rescuer gets activated, it processes whatever are on the pending list and then goes back to sleep until the manager kicks it again which involves 100ms delay. This is problematic for self-requeueing work items or the ones running on ordered workqueues as there always is only one work item on the pending list when the rescuer kicks in. The execution of that work item produces more to execute but the rescuer won't see them until after the said 100ms has passed, so such workqueues would only execute one work item every 100ms under prolonged memory pressure, which BTW may be being prolonged due to the slow execution. Neil wrote up a patch which fixes this issue by keeping the rescuer working as long as the target workqueue is busy but doesn't have enough workers" * 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: allow rescuer thread to do more work. workqueue: invert the order between pool->lock and wq_mayday_lock workqueue: cosmetic update in rescuer_thread()	2014-12-11 18:48:45 -08:00
Linus Torvalds	eedb3d3304	Merge branch 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu updates from Tejun Heo: "Nothing interesting. A patch to convert the remaining __get_cpu_var() users, another to fix non-critical off-by-one in an assertion and a cosmetic conversion to lockless_dereference() in percpu-ref. The back-merge from mainline is to receive lockless_dereference()" * 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: Replace smp_read_barrier_depends() with lockless_dereference() percpu: Convert remaining __get_cpu_var uses in 3.18-rcX percpu: off by one in BUG_ON()	2014-12-11 18:36:26 -08:00
Dave Airlie	b59f78228c	Merge tag 'drm-intel-next-fixes-2014-12-11' of git://anongit.freedesktop.org/drm-intel into drm-next Here's a batch of i915 fixes for 3.19. * tag 'drm-intel-next-fixes-2014-12-11' of git://anongit.freedesktop.org/drm-intel: drm/i915: save/restore GMBUS freq across suspend/resume on gen4 drm/i915: Remove '& 0xffff' from the mask given to WA_REG() drm/i915: Invert the mask and val arguments in wa_add() and WA_REG() drm/i915/bdw: Fix the write setting up the WIZ hashing mode drm/i915: Don't complain about stolen conflicts on gen3 drm/i915: resume MST after reading back hw state drm/i915: Handle inaccurate time conversion issues drm/i915: compute wait_ioctl timeout correctly drm/i915: don't always do full mode sets when infoframes are enabled	2014-12-12 11:39:49 +10:00
Linus Torvalds	27afc5dbda	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "The most notable change for this pull request is the ftrace rework from Heiko. It brings a small performance improvement and the ground work to support a new gcc option to replace the mcount blocks with a single nop. Two new s390 specific system calls are added to emulate user space mmio for PCI, an artifact of the how PCI memory is accessed. Two patches for the memory management with changes to common code. For KVM mm_forbids_zeropage is added which disables the empty zero page for an mm that is used by a KVM process. And an optimization, pmdp_get_and_clear_full is added analog to ptep_get_and_clear_full. Some micro optimization for the cmpxchg and the spinlock code. And as usual bug fixes and cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (46 commits) s390/cputime: fix 31-bit compile s390/scm_block: make the number of reqs per HW req configurable s390/scm_block: handle multiple requests in one HW request s390/scm_block: allocate aidaw pages only when necessary s390/scm_block: use mempool to manage aidaw requests s390/eadm: change timeout value s390/mm: fix memory leak of ptlock in pmd_free_tlb s390: use local symbol names in entry[64].S s390/ptrace: always include vector registers in core files s390/simd: clear vector register pointer on fork/clone s390: translate cputime magic constants to macros s390/idle: convert open coded idle time seqcount s390/idle: add missing irq off lockdep annotation s390/debug: avoid function call for debug_sprintf_* s390/kprobes: fix instruction copy for out of line execution s390: remove diag 44 calls from cpu_relax() s390/dasd: retry partition detection s390/dasd: fix list corruption for sleep_on requests s390/dasd: fix infinite term I/O loop s390/dasd: remove unused code ...	2014-12-11 17:30:55 -08:00
Eric W. Biederman	36476beac4	userns; Correct the comment in map_write It is important that all maps are less than PAGE_SIZE or else setting the last byte of the buffer to '0' could write off the end of the allocated storage. Correct the misleading comment. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-12-11 18:07:06 -06:00
Eric W. Biederman	66d2f338ee	userns: Allow setting gid_maps without privilege when setgroups is disabled Now that setgroups can be disabled and not reenabled, setting gid_map without privielge can now be enabled when setgroups is disabled. This restores most of the functionality that was lost when unprivileged setting of gid_map was removed. Applications that use this functionality will need to check to see if they use setgroups or init_groups, and if they don't they can be fixed by simply disabling setgroups before writing to gid_map. Cc: stable@vger.kernel.org Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-12-11 18:07:06 -06:00
Eric W. Biederman	9cc46516dd	userns: Add a knob to disable setgroups on a per user namespace basis - Expose the knob to user space through a proc file /proc/<pid>/setgroups A value of "deny" means the setgroups system call is disabled in the current processes user namespace and can not be enabled in the future in this user namespace. A value of "allow" means the segtoups system call is enabled. - Descendant user namespaces inherit the value of setgroups from their parents. - A proc file is used (instead of a sysctl) as sysctls currently do not allow checking the permissions at open time. - Writing to the proc file is restricted to before the gid_map for the user namespace is set. This ensures that disabling setgroups at a user namespace level will never remove the ability to call setgroups from a process that already has that ability. A process may opt in to the setgroups disable for itself by creating, entering and configuring a user namespace or by calling setns on an existing user namespace with setgroups disabled. Processes without privileges already can not call setgroups so this is a noop. Prodcess with privilege become processes without privilege when entering a user namespace and as with any other path to dropping privilege they would not have the ability to call setgroups. So this remains within the bounds of what is possible without a knob to disable setgroups permanently in a user namespace. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-12-11 18:06:36 -06:00
Linus Torvalds	70e71ca0af	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) New offloading infrastructure and example 'rocker' driver for offloading of switching and routing to hardware. This work was done by a large group of dedicated individuals, not limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend, Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu 2) Start making the networking operate on IOV iterators instead of modifying iov objects in-situ during transfers. Thanks to Al Viro and Herbert Xu. 3) A set of new netlink interfaces for the TIPC stack, from Richard Alpe. 4) Remove unnecessary looping during ipv6 routing lookups, from Martin KaFai Lau. 5) Add PAUSE frame generation support to gianfar driver, from Matei Pavaluca. 6) Allow for larger reordering levels in TCP, which are easily achievable in the real world right now, from Eric Dumazet. 7) Add a variable of napi_schedule that doesn't need to disable cpu interrupts, from Eric Dumazet. 8) Use a doubly linked list to optimize neigh_parms_release(), from Nicolas Dichtel. 9) Various enhancements to the kernel BPF verifier, and allow eBPF programs to actually be attached to sockets. From Alexei Starovoitov. 10) Support TSO/LSO in sunvnet driver, from David L Stevens. 11) Allow controlling ECN usage via routing metrics, from Florian Westphal. 12) Remote checksum offload, from Tom Herbert. 13) Add split-header receive, BQL, and xmit_more support to amd-xgbe driver, from Thomas Lendacky. 14) Add MPLS support to openvswitch, from Simon Horman. 15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen Klassert. 16) Do gro flushes on a per-device basis using a timer, from Eric Dumazet. This tries to resolve the conflicting goals between the desired handling of bulk vs. RPC-like traffic. 17) Allow userspace to ask for the CPU upon what a packet was received/steered, via SO_INCOMING_CPU. From Eric Dumazet. 18) Limit GSO packets to half the current congestion window, from Eric Dumazet. 19) Add a generic helper so that all drivers set their RSS keys in a consistent way, from Eric Dumazet. 20) Add xmit_more support to enic driver, from Govindarajulu Varadarajan. 21) Add VLAN packet scheduler action, from Jiri Pirko. 22) Support configurable RSS hash functions via ethtool, from Eyal Perry. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits) Fix race condition between vxlan_sock_add and vxlan_sock_release net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header net/mlx4: Add support for A0 steering net/mlx4: Refactor QUERY_PORT net/mlx4_core: Add explicit error message when rule doesn't meet configuration net/mlx4: Add A0 hybrid steering net/mlx4: Add mlx4_bitmap zone allocator net/mlx4: Add a check if there are too many reserved QPs net/mlx4: Change QP allocation scheme net/mlx4_core: Use tasklet for user-space CQ completion events net/mlx4_core: Mask out host side virtualization features for guests net/mlx4_en: Set csum level for encapsulated packets be2net: Export tunnel offloads only when a VxLAN tunnel is created gianfar: Fix dma check map error when DMA_API_DEBUG is enabled cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call net: fec: only enable mdio interrupt before phy device link up net: fec: clear all interrupt events to support i.MX6SX net: fec: reset fep link status in suspend function net: sock: fix access via invalid file descriptor net: introduce helper macro for_each_cmsghdr ...	2014-12-11 14:27:06 -08:00
Steven Rostedt (Red Hat)	1fb8915b98	printk: Do not disable preemption for accessing printk_func As printk_func will either be the default function, or a per_cpu function for the current CPU, there's no reason to disable preemption to access it from printk. That's because if the printk_func is not the default then the caller had better disabled preemption as they were the one to change it. Link: http://lkml.kernel.org/r/CA+55aFz5-_LKW4JHEBoWinN9_ouNcGRWAF2FUA35u46FRN-Kxw@mail.gmail.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-11 09:12:01 -05:00
Jiri Olsa	9fc81d8742	perf: Fix events installation during moving group We allow PMU driver to change the cpu on which the event should be installed to. This happened in patch: `e2d37cd213` ("perf: Allow the PMU driver to choose the CPU on which to install events") This patch also forces all the group members to follow the currently opened events cpu if the group happened to be moved. This and the change of event->cpu in perf_install_in_context() function introduced in: `0cda4c0231` ("perf: Introduce perf_pmu_migrate_context()") forces group members to change their event->cpu, if the currently-opened-event's PMU changed the cpu and there is a group move. Above behaviour causes problem for breakpoint events, which uses event->cpu to touch cpu specific data for breakpoints accounting. By changing event->cpu, some breakpoints slots were wrongly accounted for given cpu. Vinces's perf fuzzer hit this issue and caused following WARN on my setup: WARNING: CPU: 0 PID: 20214 at arch/x86/kernel/hw_breakpoint.c:119 arch_install_hw_breakpoint+0x142/0x150() Can't find any breakpoint slot [...] This patch changes the group moving code to keep the event's original cpu. Reported-by: Vince Weaver <vince@deater.net> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vince@deater.net> Cc: Yan, Zheng <zheng.z.yan@intel.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1418243031-20367-3-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-12-11 11:24:15 +01:00
Linus Torvalds	92a578b064	ACPI and power management updates for 3.19-rc1 This time we have some more new material than we used to have during the last couple of development cycles. The most important part of it to me is the introduction of a unified interface for accessing device properties provided by platform firmware. It works with Device Trees and ACPI in a uniform way and drivers using it need not worry about where the properties come from as long as the platform firmware (either DT or ACPI) makes them available. It covers both devices and "bare" device node objects without struct device representation as that turns out to be necessary in some cases. This has been in the works for quite a few months (and development cycles) and has been approved by all of the relevant maintainers. On top of that, some drivers are switched over to the new interface (at25, leds-gpio, gpio_keys_polled) and some additional changes are made to the core GPIO subsystem to allow device drivers to manipulate GPIOs in the "canonical" way on platforms that provide GPIO information in their ACPI tables, but don't assign names to GPIO lines (in which case the driver needs to do that on the basis of what it knows about the device in question). That also has been approved by the GPIO core maintainers and the rfkill driver is now going to use it. Second is support for hardware P-states in the intel_pstate driver. It uses CPUID to detect whether or not the feature is supported by the processor in which case it will be enabled by default. However, it can be disabled entirely from the kernel command line if necessary. Next is support for a platform firmware interface based on ACPI operation regions used by the PMIC (Power Management Integrated Circuit) chips on the Intel Baytrail-T and Baytrail-T-CR platforms. That interface is used for manipulating power resources and for thermal management: sensor temperature reporting, trip point setting and so on. Also the ACPI core is now going to support the _DEP configuration information in a limited way. Basically, _DEP it supposed to reflect off-the-hierarchy dependencies between devices which may be very indirect, like when AML for one device accesses locations in an operation region handled by another device's driver (usually, the device depended on this way is a serial bus or GPIO controller). The support added this time is sufficient to make the ACPI battery driver work on Asus T100A, but it is general enough to be able to cover some other use cases in the future. Finally, we have a new cpufreq driver for the Loongson1B processor. In addition to the above, there are fixes and cleanups all over the place as usual and a traditional ACPICA update to a recent upstream release. As far as the fixes go, the ACPI LPSS (Low-power Subsystem) driver for Intel platforms should be able to handle power management of the DMA engine correctly, the cpufreq-dt driver should interact with the thermal subsystem in a better way and the ACPI backlight driver should handle some more corner cases, among other things. On top of the ACPICA update there are fixes for race conditions in the ACPICA's interrupt handling code which might lead to some random and strange looking failures on some systems. In the cleanups department the most visible part is the series of commits targeted at getting rid of the CONFIG_PM_RUNTIME configuration option. That was triggered by a discussion regarding the generic power domains code during which we realized that trying to support certain combinations of PM config options was painful and not really worth it, because nobody would use them in production anyway. For this reason, we decided to make CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and that lead to the conclusion that the latter became redundant and CONFIG_PM could be used instead of it. The material here makes that replacement in a major part of the tree, but there will be at least one more batch of that in the second part of the merge window. Specifics: - Support for retrieving device properties information from ACPI _DSD device configuration objects and a unified device properties interface for device drivers (and subsystems) on top of that. As stated above, this works with Device Trees and ACPI and allows device drivers to be written in a platform firmware (DT or ACPI) agnostic way. The at25, leds-gpio and gpio_keys_polled drivers are now going to use this new interface and the GPIO subsystem is additionally modified to allow device drivers to assign names to GPIO resources returned by ACPI _CRS objects (in case _DSD is not present or does not provide the expected data). The changes in this set are mostly from Mika Westerberg, Rafael J Wysocki, Aaron Lu, and Darren Hart with some fixes from others (Fabio Estevam, Geert Uytterhoeven). - Support for Hardware Managed Performance States (HWP) as described in Volume 3, section 14.4, of the Intel SDM in the intel_pstate driver. CPUID is used to detect whether or not the feature is supported by the processor. If supported, it will be enabled automatically unless the intel_pstate=no_hwp switch is present in the kernel command line. From Dirk Brandewie. - New Intel Broadwell-H ID for intel_pstate (Dirk Brandewie). - Support for firmware interface based on ACPI operation regions used by the PMIC chips on the Intel Baytrail-T and Baytrail-T-CR platforms for power resource control and thermal management (Aaron Lu). - Limited support for retrieving off-the-hierarchy dependencies between devices from ACPI _DEP device configuration objects and deferred probing support for the ACPI battery driver based on the _DEP information to make that driver work on Asus T100A (Lan Tianyu). - New cpufreq driver for the Loongson1B processor (Kelvin Cheung). - ACPICA update to upstream revision 20141107 which only affects tools (Bob Moore). - Fixes for race conditions in the ACPICA's interrupt handling code and in the ACPI code related to system suspend and resume (Lv Zheng and Rafael J Wysocki). - ACPI core fix for an RCU-related issue in the ioremap() regions management code that slowed down significantly after CPUs had been allowed to enter idle states even if they'd had RCU callbakcs queued and triggered some problems in certain proprietary graphics driver (and elsewhere). The fix replaces synchronize_rcu() in that code with synchronize_rcu_expedited() which makes the issue go away. From Konstantin Khlebnikov. - ACPI LPSS (Low-Power Subsystem) driver fix to handle power management of the DMA engine included into the LPSS correctly. The problem is that the DMA engine doesn't have ACPI PM support of its own and it simply is turned off when the last LPSS device having ACPI PM support goes into D3cold. To work around that, the PM domain used by the ACPI LPSS driver is redesigned so at least one device with ACPI PM support will be on as long as the DMA engine is in use. From Andy Shevchenko. - ACPI backlight driver fix to avoid using it on "Win8-compatible" systems where it doesn't work and where it was used by default by mistake (Aaron Lu). - Assorted minor ACPI core fixes and cleanups from Tomasz Nowicki, Sudeep Holla, Huang Rui, Hanjun Guo, Fabian Frederick, and Ashwin Chaugule (mostly related to the upcoming ARM64 support). - Intel RAPL (Running Average Power Limit) power capping driver fixes and improvements including new processor IDs (Jacob Pan). - Generic power domains modification to power up domains after attaching devices to them to meet the expectations of device drivers and bus types assuming devices to be accessible at probe time (Ulf Hansson). - Preliminary support for controlling device clocks from the generic power domains core code and modifications of the ARM/shmobile platform to use that feature (Ulf Hansson). - Assorted minor fixes and cleanups of the generic power domains core code (Ulf Hansson, Geert Uytterhoeven). - Assorted minor fixes and cleanups of the device clocks control code in the PM core (Geert Uytterhoeven, Grygorii Strashko). - Consolidation of device power management Kconfig options by making CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and removing the latter which is now redundant (Rafael J Wysocki and Kevin Hilman). That is the first batch of the changes needed for this purpose. - Core device runtime power management support code cleanup related to the execution of callbacks (Andrzej Hajda). - cpuidle ARM support improvements (Lorenzo Pieralisi). - cpuidle cleanup related to the CPUIDLE_FLAG_TIME_VALID flag and a new MAINTAINERS entry for ARM Exynos cpuidle (Daniel Lezcano and Bartlomiej Zolnierkiewicz). - New cpufreq driver callback (->ready) to be executed when the cpufreq core is ready to use a given policy object and cpufreq-dt driver modification to use that callback for cooling device registration (Viresh Kumar). - cpufreq core fixes and cleanups (Viresh Kumar, Vince Hsu, James Geboski, Tomeu Vizoso). - Assorted fixes and cleanups in the cpufreq-pcc, intel_pstate, cpufreq-dt, pxa2xx cpufreq drivers (Lenny Szubowicz, Ethan Zhao, Stefan Wahren, Petr Cvek). - OPP (Operating Performance Points) framework modification to allow OPPs to be removed too and update of a few cpufreq drivers (cpufreq-dt, exynos5440, imx6q, cpufreq) to remove OPPs (added during initialization) on driver removal (Viresh Kumar). - Hibernation core fixes and cleanups (Tina Ruchandani and Markus Elfring). - PM Kconfig fix related to CPU power management (Pankaj Dubey). - cpupower tool fix (Prarit Bhargava). / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJUhj6JAAoJEILEb/54YlRxTM4P/j5g5SfqvY0QKsn7sR7MGZ6v nsgCBhJAqTw3ocNC7EAs8z9h2GWy1KbKpakKYWAh9Fs1yZoey7tFSlcv/Rgjlp70 uU5sDQHtpE9mHKiymdsowiQuWgpl962L4k+k8hUslhlvgk1PvVbpajR6OqG8G+pD asuIW9eh1APNkLyXmRJ3ZPomzs0VmRdZJ0NEs0lKX9mJskqEvxPIwdaxq3iaJq9B Fo0J345zUDcJnxWblDRdHlOigCimglElfN5qJwaC4KpwUKuBvLRKbp4f69+wfT0c kYFiR29X5KjJ2kLfP/wKsLyuDCYYXRq3tCia5M1tAqOjZ+UA89H/GDftx/5lntmv qUlBa35VfdS1SX4HyApZitOHiLgo+It/hl8Z9bJnhyVw66NxmMQ8JYN2imb8Lhqh XCLR7BxLTah82AapLJuQ0ZDHPzZqMPG2veC2vAzRMYzVijict/p4Y2+qBqONltER 4rs9uRVn+hamX33lCLg8BEN8zqlnT3rJFIgGaKjq/wXHAU/zpE9CjOrKMQcAg9+s t51XMNPwypHMAYyGVhEL89ImjXnXxBkLRuquhlmEpvQchIhR+mR3dLsarGn7da44 WPIQJXzcsojXczcwwfqsJCR4I1FTFyQIW+UNh02GkDRgRovQqo+Jk762U7vQwqH+ LBdhvVaS1VW4v+FWXEoZ =5dox -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: "This time we have some more new material than we used to have during the last couple of development cycles. The most important part of it to me is the introduction of a unified interface for accessing device properties provided by platform firmware. It works with Device Trees and ACPI in a uniform way and drivers using it need not worry about where the properties come from as long as the platform firmware (either DT or ACPI) makes them available. It covers both devices and "bare" device node objects without struct device representation as that turns out to be necessary in some cases. This has been in the works for quite a few months (and development cycles) and has been approved by all of the relevant maintainers. On top of that, some drivers are switched over to the new interface (at25, leds-gpio, gpio_keys_polled) and some additional changes are made to the core GPIO subsystem to allow device drivers to manipulate GPIOs in the "canonical" way on platforms that provide GPIO information in their ACPI tables, but don't assign names to GPIO lines (in which case the driver needs to do that on the basis of what it knows about the device in question). That also has been approved by the GPIO core maintainers and the rfkill driver is now going to use it. Second is support for hardware P-states in the intel_pstate driver. It uses CPUID to detect whether or not the feature is supported by the processor in which case it will be enabled by default. However, it can be disabled entirely from the kernel command line if necessary. Next is support for a platform firmware interface based on ACPI operation regions used by the PMIC (Power Management Integrated Circuit) chips on the Intel Baytrail-T and Baytrail-T-CR platforms. That interface is used for manipulating power resources and for thermal management: sensor temperature reporting, trip point setting and so on. Also the ACPI core is now going to support the _DEP configuration information in a limited way. Basically, _DEP it supposed to reflect off-the-hierarchy dependencies between devices which may be very indirect, like when AML for one device accesses locations in an operation region handled by another device's driver (usually, the device depended on this way is a serial bus or GPIO controller). The support added this time is sufficient to make the ACPI battery driver work on Asus T100A, but it is general enough to be able to cover some other use cases in the future. Finally, we have a new cpufreq driver for the Loongson1B processor. In addition to the above, there are fixes and cleanups all over the place as usual and a traditional ACPICA update to a recent upstream release. As far as the fixes go, the ACPI LPSS (Low-power Subsystem) driver for Intel platforms should be able to handle power management of the DMA engine correctly, the cpufreq-dt driver should interact with the thermal subsystem in a better way and the ACPI backlight driver should handle some more corner cases, among other things. On top of the ACPICA update there are fixes for race conditions in the ACPICA's interrupt handling code which might lead to some random and strange looking failures on some systems. In the cleanups department the most visible part is the series of commits targeted at getting rid of the CONFIG_PM_RUNTIME configuration option. That was triggered by a discussion regarding the generic power domains code during which we realized that trying to support certain combinations of PM config options was painful and not really worth it, because nobody would use them in production anyway. For this reason, we decided to make CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and that lead to the conclusion that the latter became redundant and CONFIG_PM could be used instead of it. The material here makes that replacement in a major part of the tree, but there will be at least one more batch of that in the second part of the merge window. Specifics: - Support for retrieving device properties information from ACPI _DSD device configuration objects and a unified device properties interface for device drivers (and subsystems) on top of that. As stated above, this works with Device Trees and ACPI and allows device drivers to be written in a platform firmware (DT or ACPI) agnostic way. The at25, leds-gpio and gpio_keys_polled drivers are now going to use this new interface and the GPIO subsystem is additionally modified to allow device drivers to assign names to GPIO resources returned by ACPI _CRS objects (in case _DSD is not present or does not provide the expected data). The changes in this set are mostly from Mika Westerberg, Rafael J Wysocki, Aaron Lu, and Darren Hart with some fixes from others (Fabio Estevam, Geert Uytterhoeven). - Support for Hardware Managed Performance States (HWP) as described in Volume 3, section 14.4, of the Intel SDM in the intel_pstate driver. CPUID is used to detect whether or not the feature is supported by the processor. If supported, it will be enabled automatically unless the intel_pstate=no_hwp switch is present in the kernel command line. From Dirk Brandewie. - New Intel Broadwell-H ID for intel_pstate (Dirk Brandewie). - Support for firmware interface based on ACPI operation regions used by the PMIC chips on the Intel Baytrail-T and Baytrail-T-CR platforms for power resource control and thermal management (Aaron Lu). - Limited support for retrieving off-the-hierarchy dependencies between devices from ACPI _DEP device configuration objects and deferred probing support for the ACPI battery driver based on the _DEP information to make that driver work on Asus T100A (Lan Tianyu). - New cpufreq driver for the Loongson1B processor (Kelvin Cheung). - ACPICA update to upstream revision 20141107 which only affects tools (Bob Moore). - Fixes for race conditions in the ACPICA's interrupt handling code and in the ACPI code related to system suspend and resume (Lv Zheng and Rafael J Wysocki). - ACPI core fix for an RCU-related issue in the ioremap() regions management code that slowed down significantly after CPUs had been allowed to enter idle states even if they'd had RCU callbakcs queued and triggered some problems in certain proprietary graphics driver (and elsewhere). The fix replaces synchronize_rcu() in that code with synchronize_rcu_expedited() which makes the issue go away. From Konstantin Khlebnikov. - ACPI LPSS (Low-Power Subsystem) driver fix to handle power management of the DMA engine included into the LPSS correctly. The problem is that the DMA engine doesn't have ACPI PM support of its own and it simply is turned off when the last LPSS device having ACPI PM support goes into D3cold. To work around that, the PM domain used by the ACPI LPSS driver is redesigned so at least one device with ACPI PM support will be on as long as the DMA engine is in use. From Andy Shevchenko. - ACPI backlight driver fix to avoid using it on "Win8-compatible" systems where it doesn't work and where it was used by default by mistake (Aaron Lu). - Assorted minor ACPI core fixes and cleanups from Tomasz Nowicki, Sudeep Holla, Huang Rui, Hanjun Guo, Fabian Frederick, and Ashwin Chaugule (mostly related to the upcoming ARM64 support). - Intel RAPL (Running Average Power Limit) power capping driver fixes and improvements including new processor IDs (Jacob Pan). - Generic power domains modification to power up domains after attaching devices to them to meet the expectations of device drivers and bus types assuming devices to be accessible at probe time (Ulf Hansson). - Preliminary support for controlling device clocks from the generic power domains core code and modifications of the ARM/shmobile platform to use that feature (Ulf Hansson). - Assorted minor fixes and cleanups of the generic power domains core code (Ulf Hansson, Geert Uytterhoeven). - Assorted minor fixes and cleanups of the device clocks control code in the PM core (Geert Uytterhoeven, Grygorii Strashko). - Consolidation of device power management Kconfig options by making CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and removing the latter which is now redundant (Rafael J Wysocki and Kevin Hilman). That is the first batch of the changes needed for this purpose. - Core device runtime power management support code cleanup related to the execution of callbacks (Andrzej Hajda). - cpuidle ARM support improvements (Lorenzo Pieralisi). - cpuidle cleanup related to the CPUIDLE_FLAG_TIME_VALID flag and a new MAINTAINERS entry for ARM Exynos cpuidle (Daniel Lezcano and Bartlomiej Zolnierkiewicz). - New cpufreq driver callback (->ready) to be executed when the cpufreq core is ready to use a given policy object and cpufreq-dt driver modification to use that callback for cooling device registration (Viresh Kumar). - cpufreq core fixes and cleanups (Viresh Kumar, Vince Hsu, James Geboski, Tomeu Vizoso). - Assorted fixes and cleanups in the cpufreq-pcc, intel_pstate, cpufreq-dt, pxa2xx cpufreq drivers (Lenny Szubowicz, Ethan Zhao, Stefan Wahren, Petr Cvek). - OPP (Operating Performance Points) framework modification to allow OPPs to be removed too and update of a few cpufreq drivers (cpufreq-dt, exynos5440, imx6q, cpufreq) to remove OPPs (added during initialization) on driver removal (Viresh Kumar). - Hibernation core fixes and cleanups (Tina Ruchandani and Markus Elfring). - PM Kconfig fix related to CPU power management (Pankaj Dubey). - cpupower tool fix (Prarit Bhargava)" * tag 'pm+acpi-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (120 commits) i2c-omap / PM: Drop CONFIG_PM_RUNTIME from i2c-omap.c dmaengine / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM tools: cpupower: fix return checks for sysfs_get_idlestate_count() drivers: sh / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM e1000e / igb / PM: Eliminate CONFIG_PM_RUNTIME MMC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM MFD / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM misc / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM media / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM input / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM leds: leds-gpio: Fix multiple instances registration without 'label' property iio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM hsi / OMAP / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM i2c-hid / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM drm / exynos / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM gpio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM hwrandom / exynos / PM: Use CONFIG_PM in #ifdef block / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM USB / PM: Drop CONFIG_PM_RUNTIME from the USB core PM: Merge the SET*_RUNTIME_PM_OPS() macros ...	2014-12-10 21:17:00 -08:00
Linus Torvalds	350e4f4985	This code is a fork from the trace-3.19 pull as it needed the trace_seq clean ups from that branch. This code solves the issue of performing stack dumps from NMI context. The issue is that printk() is not safe from NMI context as if the NMI were to trigger when a printk() was being performed, the NMI could deadlock from the printk() internal locks. This has been seen in practice. With lots of review from Petr Mladek, this code went through several iterations, and we feel that it is now at a point of quality to be accepted into mainline. Here's what is contained in this patch set: o Creates a "seq_buf" generic buffer utility that allows a descriptor to be passed around where functions can write their own "printk()" formatted strings into it. The generic version was pulled out of the trace_seq() code that was made specifically for tracing. o The seq_buf code was change to model the seq_file code. I have a patch (not included for 3.19) that converts the seq_file.c code over to use seq_buf.c like the trace_seq.c code does. This was done to make sure that seq_buf.c is compatible with seq_file.c. I may try to get that patch in for 3.20. o The seq_buf.c file was moved to lib/ to remove it from being dependent on CONFIG_TRACING. o The printk() was updated to allow for a per_cpu "override" of the internal calls. That is, instead of writing to the console, a call to printk() may do something else. This made it easier to allow the NMI to change what printk() does in order to call dump_stack() without needing to update that code as well. o Finally, the dump_stack from all CPUs via NMI code was converted to use the seq_buf code. The caller to trigger the NMI code would wait till all the NMIs finished, and then it would print the seq_buf data to the console safely from a non NMI context. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUhbrnAAoJEEjnJuOKh9ldsCoIAJ3sKIJ5B3jxJJTCHPAx/lZD GVbV1J1mu4kTAZuhJZOAxW8D6PZGZMyEjg0y6ScDEnBGcjAZ9gTiWCdakPktf9EX GfaPPqwiL9dZ18J9Qc6uR+7M1Ffpzzwbcc6lJrpoTcjRgkoH9wCiLS9ozFQyYzWb /7m5UbUM/PIk9WAjLYXPW6UUVtPTPT0RdEQKofMGTeah+vgqj4TXCOROdlxsXXWF 77vqBvPd5TUPWFH9ftzJGDtZS8SroXVKCu3fZIqHgzAU0yqwVtH/JzDTy9u2UYhX GzDEPeAIdp6m6Uyc406VuIf1QW0gfBgmA0ir80vFoP27uFMM6j5HlF7azgQfx34= =YBgA -----END PGP SIGNATURE----- Merge tag 'trace-seq-buf-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull nmi-safe seq_buf printk update from Steven Rostedt: "This code is a fork from the trace-3.19 pull as it needed the trace_seq clean ups from that branch. This code solves the issue of performing stack dumps from NMI context. The issue is that printk() is not safe from NMI context as if the NMI were to trigger when a printk() was being performed, the NMI could deadlock from the printk() internal locks. This has been seen in practice. With lots of review from Petr Mladek, this code went through several iterations, and we feel that it is now at a point of quality to be accepted into mainline. Here's what is contained in this patch set: - Creates a "seq_buf" generic buffer utility that allows a descriptor to be passed around where functions can write their own "printk()" formatted strings into it. The generic version was pulled out of the trace_seq() code that was made specifically for tracing. - The seq_buf code was change to model the seq_file code. I have a patch (not included for 3.19) that converts the seq_file.c code over to use seq_buf.c like the trace_seq.c code does. This was done to make sure that seq_buf.c is compatible with seq_file.c. I may try to get that patch in for 3.20. - The seq_buf.c file was moved to lib/ to remove it from being dependent on CONFIG_TRACING. - The printk() was updated to allow for a per_cpu "override" of the internal calls. That is, instead of writing to the console, a call to printk() may do something else. This made it easier to allow the NMI to change what printk() does in order to call dump_stack() without needing to update that code as well. - Finally, the dump_stack from all CPUs via NMI code was converted to use the seq_buf code. The caller to trigger the NMI code would wait till all the NMIs finished, and then it would print the seq_buf data to the console safely from a non NMI context One added bonus is that this code also makes the NMI dump stack work on PREEMPT_RT kernels. As printk() includes sleeping locks on PREEMPT_RT, printk() only writes to console if the console does not use any rt_mutex converted spin locks. Which a lot do" * tag 'trace-seq-buf-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: x86/nmi: Fix use of unallocated cpumask_var_t printk/percpu: Define printk_func when printk is not defined x86/nmi: Perform a safe NMI stack trace on all CPUs printk: Add per_cpu printk func to allow printk to be diverted seq_buf: Move the seq_buf code to lib/ seq-buf: Make seq_buf_bprintf() conditional on CONFIG_BINARY_PRINTF tracing: Add seq_buf_get_buf() and seq_buf_commit() helper functions tracing: Have seq_buf use full buffer seq_buf: Add seq_buf_can_fit() helper function tracing: Add paranoid size check in trace_printk_seq() tracing: Use trace_seq_used() and seq_buf_used() instead of len tracing: Clean up tracing_fill_pipe_page() seq_buf: Create seq_buf_used() to find out how much was written tracing: Add a seq_buf_clear() helper and clear len and readpos in init tracing: Convert seq_buf fields to be like seq_file fields tracing: Convert seq_buf_path() to be like seq_path() tracing: Create seq_buf layer in trace_seq	2014-12-10 20:35:41 -08:00
Linus Torvalds	1dd7dcb6ea	There was a lot of clean ups and minor fixes. One of those clean ups was to the trace_seq code. It also removed the return values to the trace_seq_() functions and use trace_seq_has_overflowed() to see if the buffer filled up or not. This is similar to work being done to the seq_file code as well in another tree. Some of the other goodies include: o Added some "!" (NOT) logic to the tracing filter. o Fixed the frame pointer logic to the x86_64 mcount trampolines o Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems. That is, the ftrace trampoline can be dynamically allocated and be called directly by functions that only have a single hook to them. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUhbLGAAoJEEjnJuOKh9ldRV4H/3NcLbgGB2iu96la1zdYE6pG Q7cDJMxXK80YIIL70h9G0IItcD4t62LMb72lfBnMGRj3msgFb3AgISW57EuI0Pxk xk24wuIPoTG2S7v9sc3SboNFwO8qbtIjxD2OBmqIUrGo2sZIiGjyj3gX7mCY3uzL WB2bUOSFz/22OgaANinR5EELHA3pZZCf54Vz1K9ndmtK0xp0j1a7xJShD6TrMdYv mZ3zH5ViIhW4A3mdcMceh6fy2JLQAiEKF0uPTvcMMz7NlVul0mxyL/+10P7AE/3R Ehw4fzmm4NDshPDtBOkKH0LsppgXzuItFuQUTpact3JlqTg++bV6onSsrkt1hlY= =Z7Cm -----END PGP SIGNATURE----- Merge tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "There was a lot of clean ups and minor fixes. One of those clean ups was to the trace_seq code. It also removed the return values to the trace_seq_() functions and use trace_seq_has_overflowed() to see if the buffer filled up or not. This is similar to work being done to the seq_file code as well in another tree. Some of the other goodies include: - Added some "!" (NOT) logic to the tracing filter. - Fixed the frame pointer logic to the x86_64 mcount trampolines - Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems. That is, the ftrace trampoline can be dynamically allocated and be called directly by functions that only have a single hook to them" * tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (55 commits) tracing: Truncated output is better than nothing tracing: Add additional marks to signal very large time deltas Documentation: describe trace_buf_size parameter more accurately tracing: Allow NOT to filter AND and OR clauses tracing: Add NOT to filtering logic ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter ftrace/x86: Get rid of ftrace_caller_setup ftrace/x86: Have save_mcount_regs macro also save stack frames if needed ftrace/x86: Add macro MCOUNT_REG_SIZE for amount of stack used to save mcount regs ftrace/x86: Simplify save_mcount_regs on getting RIP ftrace/x86: Have save_mcount_regs store RIP in %rdi for first parameter ftrace/x86: Rename MCOUNT_SAVE_FRAME and add more detailed comments ftrace/x86: Move MCOUNT_SAVE_FRAME out of header file ftrace/x86: Have static tracing also use ftrace_caller_setup ftrace/x86: Have static function tracing always test for function graph kprobes: Add IPMODIFY flag to kprobe_ftrace_ops ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict kprobes/ftrace: Recover original IP if pre_handler doesn't change it tracing/trivial: Fix typos and make an int into a bool tracing: Deletion of an unnecessary check before iput() ...	2014-12-10 19:58:13 -08:00
Linus Torvalds	b6da0076ba	Merge branch 'akpm' (patchbomb from Andrew) Merge first patchbomb from Andrew Morton: - a few minor cifs fixes - dma-debug upadtes - ocfs2 - slab - about half of MM - procfs - kernel/exit.c - panic.c tweaks - printk upates - lib/ updates - checkpatch updates - fs/binfmt updates - the drivers/rtc tree - nilfs - kmod fixes - more kernel/exit.c - various other misc tweaks and fixes * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits) exit: pidns: fix/update the comments in zap_pid_ns_processes() exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting exit: exit_notify: re-use "dead" list to autoreap current exit: reparent: call forget_original_parent() under tasklist_lock exit: reparent: avoid find_new_reaper() if no children exit: reparent: introduce find_alive_thread() exit: reparent: introduce find_child_reaper() exit: reparent: document the ->has_child_subreaper checks exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper() exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting exit: proc: don't try to flush /proc/tgid/task/tgid exit: release_task: fix the comment about group leader accounting exit: wait: drop tasklist_lock before psig->c* accounting exit: wait: don't use zombie->real_parent exit: wait: cleanup the ptrace_reparented() checks usermodehelper: kill the kmod_thread_locker logic usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper() fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races ...	2014-12-10 18:34:42 -08:00
Al Viro	707c5960f1	Merge branch 'nsfs' into for-next	2014-12-10 21:31:59 -05:00
Oleg Nesterov	a53b831549	exit: pidns: fix/update the comments in zap_pid_ns_processes() The comments in zap_pid_ns_processes() are not clear, we need to explain how this code actually works. 1. "Ignore SIGCHLD" looks like optimization but it is not, we also need this for correctness. 2. The comment above sys_wait4() could tell more. EXIT_ZOMBIE child is only possible if it has exited before we ignored SIGCHLD. Or if it is traced from the parent namespace, but in this case it will be reaped by debugger after detach, sys_wait4() acts as a synchronization point. 3. The comment about TASK_DEAD (EXIT_DEAD in fact) children is outdated. Contrary to what it says we do not need to make sure they all go away after `0a01f2cc39` "pidns: Make the pidns proc mount/umount logic obvious". At the same time, we do need to wait for nr_hashed==init_pids, but the reasons are quite different and not obvious: setns(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:18 -08:00
Oleg Nesterov	24c037ebf5	exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting alloc_pid() does get_pid_ns() beforehand but forgets to put_pid_ns() if it fails because disable_pid_allocation() was called by the exiting child_reaper. We could simply move get_pid_ns() down to successful return, but this fix tries to be as trivial as possible. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Sterling Alexander <stalexan@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:18 -08:00
Oleg Nesterov	6c66e7dba3	exit: exit_notify: re-use "dead" list to autoreap current After the previous change we can add just the exiting EXIT_DEAD task to the "dead" list and remove another release_task(tsk). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:18 -08:00
Oleg Nesterov	482a3767e5	exit: reparent: call forget_original_parent() under tasklist_lock Shift "release dead children" loop from forget_original_parent() to its caller, exit_notify(). It is safe to reap them even if our parent reaps us right after we drop tasklist_lock, those children no longer have any connection to the exiting task. And this allows us to avoid write_lock_irq(tasklist_lock) right after it was released by forget_original_parent(), we can simply call it with tasklist_lock held. While at it, move the comment about forget_original_parent() up to this function. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:18 -08:00
Oleg Nesterov	ad9e206aef	exit: reparent: avoid find_new_reaper() if no children Now that pid_ns logic was isolated we can change forget_original_parent() to return right after find_child_reaper() when father->children is empty, there is nothing to reparent in this case. In particular this avoids find_alive_thread() and this can help if the whole process exits and it has a lot of PF_EXITING threads at the start of the thread list, this can easily lead to O(nr_threads ** 2) iterations. Trivial test case (tested under KVM, 2 CPUs): static void tfunc(void arg) { pause(); return NULL; } static int child(unsigned int nt) { pthread_t pt; while (nt--) assert(pthread_create(&pt, NULL, tfunc, NULL) == 0); pthread_kill(pt, SIGTRAP); pause(); return 0; } int main(int argc, const char *argv[]) { int stat; unsigned int nf = atoi(argv[1]); unsigned int nt = atoi(argv[2]); while (nf--) { if (!fork()) return child(nt); wait(&stat); assert(stat == SIGTRAP); } return 0; } $ time ./test 16 16536 shows: real user sys - 5m37.628s 0m4.437s 8m5.560s + 0m50.032s 0m7.130s 1m4.927s Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:18 -08:00
Oleg Nesterov	c9dc05bfdb	exit: reparent: introduce find_alive_thread() Add the new simple helper to factor out the for_each_thread() code in find_child_reaper() and find_new_reaper(). It can also simplify the potential PF_EXITING -> exit_state change, plus perhaps we can change this code to take SIGNAL_GROUP_EXIT into account. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Cc: Lennart Poettering <lennart@poettering.net> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:17 -08:00
Oleg Nesterov	1109909c7d	exit: reparent: introduce find_child_reaper() find_new_reaper() does 2 completely different things. Not only it finds a reaper, it also updates pid_ns->child_reaper or kills the whole namespace if the caller is ->child_reaper. Now that has_child_subreaper logic doesn't depend on child_reaper check we can move that pid_ns code into a separate helper. IMHO this makes the code more clean, and this allows the next changes. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Cc: Lennart Poettering <lennart@poettering.net> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:17 -08:00
Oleg Nesterov	175aed3f8d	exit: reparent: document the ->has_child_subreaper checks Swap the "init_task" and same_thread_group() checks. This way it is more simple to document these checks and we can remove the link to the previous discussion on lkml. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Cc: Lennart Poettering <lennart@poettering.net> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:17 -08:00
Oleg Nesterov	3750ef979c	exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper() Change find_new_reaper() to use for_each_thread() instead of deprecated while_each_thread(). We do not bother to check "thread != father" in the 1st loop, we can rely on PF_EXITING check. Note: this means the minor behavioural change: for_each_thread() starts from the group leader. But this should be fine, nobody should make any assumption about do_wait(__WNOTHREAD) when it comes to reparented tasks. And this can avoid the pointless reparenting to a short-living thread While zombie leaders are not that common. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Cc: Lennart Poettering <lennart@poettering.net> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:17 -08:00
Oleg Nesterov	7d24e2df52	exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting find_new_reaper() assumes that "has_child_subreaper" logic is safe as long as we are not the exiting ->child_reaper and this is doubly wrong: 1. In fact it is safe if "pid_ns->child_reaper == father"; there must be no children after zap_pid_ns_processes() returns, so it doesn't matter what we return in this case and even pid_ns->child_reaper is wrong otherwise: we can't reparent to ->child_reaper == current. This is not a bug, but this is confusing. 2. It is not safe if we are not pid_ns->child_reaper but from the same thread group. We drop tasklist_lock before zap_pid_ns_processes(), so another thread can lock it and choose the new reaper from the upper namespace if has_child_subreaper == T, and this is obviously wrong. This is not that bad, zap_pid_ns_processes() won't return until the the new reaper reaps all zombies, but this should be fixed anyway. We could change for_each_thread() loop to use ->exit_state instead of PF_EXITING which we had to use until `8aac62706a`, or we could change copy_signal() to check CLONE_NEWPID before setting has_child_subreaper, but lets change this code so that it is clear we can't look outside of our namespace, otherwise same_thread_group(reaper, child_reaper) check will look wrong and confusing anyway. We can simply start from "father" and fix the problem. We can't wrongly return a thread from the same thread group if ->is_child_subreaper == T, we know that all threads have PF_EXITING set. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Cc: Lennart Poettering <lennart@poettering.net> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:17 -08:00
Oleg Nesterov	8a1296aea4	exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting The ->has_child_subreaper code in find_new_reaper() finds alive "thread" but returns another "reaper" thread which can be dead. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Cc: Lennart Poettering <lennart@poettering.net> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:17 -08:00
Oleg Nesterov	26e75b5c3d	exit: release_task: fix the comment about group leader accounting Contrary to what the comment in __exit_signal() says we do account the group leader. Fix this and explain why. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:17 -08:00
Oleg Nesterov	986094dfe1	exit: wait: drop tasklist_lock before psig->c* accounting wait_task_zombie() no longer needs tasklist_lock to accumulate the psig->c* counters, we can drop it right after cmpxchg(exit_state). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:17 -08:00
Oleg Nesterov	f953ccd006	exit: wait: don't use zombie->real_parent 1. wait_task_zombie() uses p->real_parent to get psig/siglock. This is correct but needs tasklist_lock, ->real_parent can exit. We can use "current" instead. This is our natural child, its parent must be our sub-thread. 2. Read psig/sig outside of ->siglock, ->signal is no longer protected by this lock. 3. Fix the outdated comments about tasklist_lock. We can not race with __exit_signal(), the whole thread group is dead, nobody but us can call it. Also clarify the usage of ->stats_lock and ->siglock. Note: thread_group_cputime_adjusted() is sub-optimal in this case, we probably want to export cputime_adjust() to avoid thread_group_cputime(). The comment says "all threads" but there are no other threads. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:17 -08:00
Oleg Nesterov	f6507f83bc	exit: wait: cleanup the ptrace_reparented() checks Now that EXIT_DEAD is the terminal state we can kill "int traced" variable and check "state == EXIT_DEAD" instead to cleanup the code. In particular, this way it is clear that the check obviously doesn't need tasklist_lock. Also fix the type of "unsigned long state", "long" was always wrong although this doesn't matter because cmpxchg/xchg uses typeof(*ptr). [akpm@linux-foundation.org: don't make me google the C Operator Precedence table] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:17 -08:00
Oleg Nesterov	7f6def9f9b	usermodehelper: kill the kmod_thread_locker logic Now that we do not call kernel_thread(CLONE_VFORK) from the worker thread we can not deadlock if do_execve() in turn triggers another call_usermodehelper(), we can remove the kmod_thread_locker code. Note: we should probably kill khelper_wq and simply use one of the global workqueues, say, system_unbound_wq, this special wq for umh buys nothing nowadays. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:17 -08:00
Oleg Nesterov	7117bc8888	usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper() After "kernel/kmod: fix use-after-free of the sub_infostructure" CLONE_VFORK in __call_usermodehelper() buys nothing, we rely on on umh_complete() in ____call_usermodehelper() anyway. Remove it. This also eliminates the unnecessary sleep/wakeup in the likely case, and this allows the next change. While at it, kill the "int wait" locals in ____call_usermodehelper() and __call_usermodehelper(), they can safely use sub_info->wait. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:16 -08:00
Alex Elder	f099755d4c	printk: drop logbuf_cpu volatile qualifier Pranith Kumar posted a patch in which removed the "volatile" qualifier for the "logbuf_cpu" variable in vprintk_emit(). https://lkml.org/lkml/2014/11/13/894 In his patch, he used ACCESS_ONCE() for all references to that symbol to provide whatever protection was intended. There was some discussion that followed, and in the end Steven Rostedt concluded that not only was "volatile" not needed, neither was it required to use ACCESS_ONCE(). I offered an elaborate description that concluded Steven was right, and Pranith asked me to submit an alternative patch. And this is it. The basic reason "volatile" is not needed is that "logbuf_cpu" has static storage duration, and vprintk_emit() is an exported interface. This means that the value of logbuf_cpu must be read from memory the first time it is used in a particular call of vprintk_emit(). The variable's value is read only once in that function, when it's read it'll be the copy from memory (or cache). In addition, the value of "logbuf_cpu" is only ever written under protection of a spinlock. So the value that is read is the "real" value (and not an out-of-date cached one). If its value is not UINT_MAX, it is the current CPU's processor id, and it will have been last written by the running CPU. Signed-off-by: Alex Elder <elder@linaro.org> Reported-by: Pranith Kumar <bobby.prani@gmail.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Petr Mladek <pmladek@suse.cz> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:11 -08:00
Joe Perches	a39d4a857d	printk: add and use LOGLEVEL_<level> defines for KERN_<LEVEL> equivalents Use #defines instead of magic values. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Baron <jbaron@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:11 -08:00
Joe Perches	1dc6244bd6	printk: remove used-once early_vprintk Eliminate the unlikely possibility of message interleaving for early_printk/early_vprintk use. early_vprintk can be done via the %pV extension so remove this unnecessary function and change early_printk to have the equivalent vprintk code. All uses of early_printk already end with a newline so also remove the unnecessary newline from the early_printk function. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:10 -08:00
Prarit Bhargava	9e3961a097	kernel: add panic_on_warn There have been several times where I have had to rebuild a kernel to cause a panic when hitting a WARN() in the code in order to get a crash dump from a system. Sometimes this is easy to do, other times (such as in the case of a remote admin) it is not trivial to send new images to the user. A much easier method would be a switch to change the WARN() over to a panic. This makes debugging easier in that I can now test the actual image the WARN() was seen on and I do not have to engage in remote debugging. This patch adds a panic_on_warn kernel parameter and /proc/sys/kernel/panic_on_warn calls panic() in the warn_slowpath_common() path. The function will still print out the location of the warning. An example of the panic_on_warn output: The first line below is from the WARN_ON() to output the WARN_ON()'s location. After that the panic() output is displayed. WARNING: CPU: 30 PID: 11698 at /home/prarit/dummy_module/dummy-module.c:25 init_dummy+0x1f/0x30 [dummy_module]() Kernel panic - not syncing: panic_on_warn set ... CPU: 30 PID: 11698 Comm: insmod Tainted: G W OE 3.17.0+ #57 Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013 0000000000000000 000000008e3f87df ffff88080f093c38 ffffffff81665190 0000000000000000 ffffffff818aea3d ffff88080f093cb8 ffffffff8165e2ec ffffffff00000008 ffff88080f093cc8 ffff88080f093c68 000000008e3f87df Call Trace: [<ffffffff81665190>] dump_stack+0x46/0x58 [<ffffffff8165e2ec>] panic+0xd0/0x204 [<ffffffffa038e05f>] ? init_dummy+0x1f/0x30 [dummy_module] [<ffffffff81076b90>] warn_slowpath_common+0xd0/0xd0 [<ffffffffa038e040>] ? dummy_greetings+0x40/0x40 [dummy_module] [<ffffffff81076c8a>] warn_slowpath_null+0x1a/0x20 [<ffffffffa038e05f>] init_dummy+0x1f/0x30 [dummy_module] [<ffffffff81002144>] do_one_initcall+0xd4/0x210 [<ffffffff811b52c2>] ? __vunmap+0xc2/0x110 [<ffffffff810f8889>] load_module+0x16a9/0x1b30 [<ffffffff810f3d30>] ? store_uevent+0x70/0x70 [<ffffffff810f49b9>] ? copy_module_from_fd.isra.44+0x129/0x180 [<ffffffff810f8ec6>] SyS_finit_module+0xa6/0xd0 [<ffffffff8166cf29>] system_call_fastpath+0x12/0x17 Successfully tested by me. hpa said: There is another very valid use for this: many operators would rather a machine shuts down than being potentially compromised either functionally or security-wise. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:10 -08:00
Oleg Nesterov	7c8bd2322c	exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent() Now that forget_original_parent() uses ->ptrace_entry for EXIT_DEAD tasks, we can simply pass "dead_children" list to exit_ptrace() and remove another release_task() loop. Plus this way we do not need to drop and reacquire tasklist_lock. Also shift the list_empty(ptraced) check, if we want this optimization it makes sense to eliminate the function call altogether. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com>, Cc: Sterling Alexander <stalexan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roland McGrath <roland@hack.frob.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:10 -08:00
Oleg Nesterov	2831096e21	exit: reparent: cleanup the usage of reparent_leader() 1. Now that reparent_leader() doesn't abuse ->sibling we can shift list_move_tail() from reparent_leader() to forget_original_parent() and turn it into a single list_splice_tail_init(). This also makes BUG_ON(!list_empty()) and list_for_each_entry_safe() unnecessary. 2. This also allows to shift the same_thread_group() check, it looks a bit more clear in the caller. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com>, Cc: Sterling Alexander <stalexan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roland McGrath <roland@hack.frob.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:10 -08:00
Oleg Nesterov	57a059187d	exit: reparent: cleanup the changing of ->parent 1. Cosmetic, but "if (t->parent == father)" looks a bit confusing. We need to change t->parent if and only if t is not traced. 2. If we actually want this BUG_ON() to ensure that parent/ptrace match each other, then we should also take ptrace_reparented() case into account too. 3. Change this code to use for_each_thread() instead of deprecated while_each_thread(). [dan.carpenter@oracle.com: silence a bogus static checker warning] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com>, Cc: Sterling Alexander <stalexan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roland McGrath <roland@hack.frob.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:10 -08:00
Oleg Nesterov	dc2fd4b009	exit: reparent: use ->ptrace_entry rather than ->sibling for EXIT_DEAD tasks reparent_leader() reuses ->sibling as a list node to add an EXIT_DEAD task into dead_children list we are going to release. This obviously removes the dead task from its real_parent->children list and this is even good; the parent can do nothing with the EXIT_DEAD reparented zombie, it only makes do_wait() slower. But, this also means that it can not be reparented once again, so if its new parent dies too nobody will update ->parent/real_parent, they can point to the freed memory even before release_task() we are going to call, this breaks the code which relies on pid_alive() to access ->real_parent/parent. Fortunately this is mostly theoretical, this can only happen if init or PR_SET_CHILD_SUBREAPER process ignores SIGCHLD and the new parent sub-thread exits right after we drop tasklist_lock. Change this code to use ->ptrace_entry instead, we know that the child is not traced so nobody can ever use this member. This also allows to unify this logic with exit_ptrace(), see the next changes. Note: we really need to change release_task() to nullify real_parent/ parent/group_leader pointers, but we need to change the current users first somehow. And it would be better to reap this zombie immediately but release_task_locked() we need is complicated by proc_flush_task(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com>, Cc: Sterling Alexander <stalexan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roland McGrath <roland@hack.frob.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:10 -08:00
Oleg Nesterov	a90e984c8a	sched_show_task: fix unsafe usage of ->real_parent rcu_read_lock() can not protect p->real_parent if release_task(p) was already called, change sched_show_task() to check pis_alive() like other users do. Note: we need some helpers to cleanup the code like this. And it seems that that the usage of cpu_curr(cpu) in dump_cpu_task() is not safe too. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com>, Cc: Sterling Alexander <stalexan@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Roland McGrath <roland@hack.frob.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:09 -08:00
Johannes Weiner	5b1efc027c	kernel: res_counter: remove the unused API All memory accounting and limiting has been switched over to the lockless page counters. Bye, res_counter! [akpm@linux-foundation.org: update Documentation/cgroups/memory.txt] [mhocko@suse.cz: ditch the last remainings of res_counter] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:04 -08:00
Linus Torvalds	cbfe0de303	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS changes from Al Viro: "First pile out of several (there _definitely_ will be more). Stuff in this one: - unification of d_splice_alias()/d_materialize_unique() - iov_iter rewrite - killing a bunch of ->f_path.dentry users (and f_dentry macro). Getting that completed will make life much simpler for unionmount/overlayfs, since then we'll be able to limit the places sensitive to file _dentry_ to reasonably few. Which allows to have file_inode(file) pointing to inode in a covered layer, with dentry pointing to (negative) dentry in union one. Still not complete, but much closer now. - crapectomy in lustre (dead code removal, mostly) - "let's make seq_printf return nothing" preparations - assorted cleanups and fixes There _definitely_ will be more piles" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) copy_from_iter_nocache() new helper: iov_iter_kvec() csum_and_copy_..._iter() iov_iter.c: handle ITER_KVEC directly iov_iter.c: convert copy_to_iter() to iterate_and_advance iov_iter.c: convert copy_from_iter() to iterate_and_advance iov_iter.c: get rid of bvec_copy_page_{to,from}_iter() iov_iter.c: convert iov_iter_zero() to iterate_and_advance iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds iov_iter.c: convert iov_iter_npages() to iterate_all_kinds iov_iter.c: iterate_and_advance iov_iter.c: macros for iterating over iov_iter kill f_dentry macro dcache: fix kmemcheck warning in switch_names new helper: audit_file() nfsd_vfs_write(): use file_inode() ncpfs: use file_inode() kill f_dentry uses lockd: get rid of ->f_path.dentry->d_sb ...	2014-12-10 16:10:49 -08:00
Linus Torvalds	08e2fb6ce6	On a system that restricts access to dmesg, don't let people side-step that by reading copies that pstore saved. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUheJgAAoJEKurIx+X31iB5F0P/jdpAw6cI26icGiOcRvRYvce jLq/WbGggxZlx3rtgGpekJmcJ1NBBTLdyx4b86q4q/zstQkoJ9lqGCn63YcIMJNB pdctmbkGyoQQXBTAzSCFs6pybMUmtYKMDiT3OJddcCm4fUjd4RQHvNP+5ESsf0lQ 9YpIS+rZOtB2/5N6/i4+Lnaffc3s5gXw/dJMxOm/laWtRFRyhf22YP18cRp5LmuV NHqu1NoeLnar/qL6plPl73lEyZVOPRC01T7OWmmCkcLieYPGkqQlkoXp95VBKf5u CvD167oM71OccMa0gOTlCS8a6y5KO6y8I+YAR60iANTLDh+rHZiwNj1gY4v/Z29m 2ba1xAulQrpCxqml6eVxAKaF+4HXaXVXKqjQIivJcGyfYf6BXLMvC0M3Lsv7XQdz HKl++o0JELDEJjVW0i9Wa5CjgcqXdvuRXOoKDaKTZWff2yfUxqIN5Xl7zIV2kgVy ZqPDBHJSmHjuzmJ6inhPkmdS2uz94PVSE7ykeaa8iCBbpdsS+FchtF2sRMvUhU23 ekHsxk0Mk/pS5EBNc6rrrM9NtKrUQMa1e/oT5G7QowksDeNpsPjx92OeUImxgh3x +hmObN9vx6SepwVSfjI1rwrMsAknphJfPmyi/XJgkVbfRMCv2we1npvYd6hqFUMV daekMzGOi5eqoaWB8hje =Ezg0 -----END PGP SIGNATURE----- Merge tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull pstore fixes from Tony Luck: "On a system that restricts access to dmesg, don't let people side-step that by reading copies that pstore saved" * tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: syslog: Provide stub check_syslog_permissions pstore: Honor dmesg_restrict sysctl on dmesg dumps pstore/ram: Strip ramoops header for correct decompression	2014-12-10 15:15:56 -08:00
David S. Miller	22f10923dd	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/amd/xgbe/xgbe-desc.c drivers/net/ethernet/renesas/sh_eth.c Overlapping changes in both conflict cases. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-10 15:48:20 -05:00
Linus Torvalds	d82012695e	Merge branch 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more 2038 timer work from Thomas Gleixner: "Two more patches for the ongoing 2038 work: - New accessors to clock MONOTONIC and REALTIME seconds This is a seperate branch as Arnd has follow up work depending on this" * 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Provide y2038 safe accessor to the seconds portion of CLOCK_REALTIME timekeeping: Provide fast accessor to the seconds part of CLOCK_MONOTONIC	2014-12-10 10:13:28 -08:00
Linus Torvalds	3eb5b893eb	Merge branch 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 MPX support from Thomas Gleixner: "This enables support for x86 MPX. MPX is a new debug feature for bound checking in user space. It requires kernel support to handle the bound tables and decode the bound violating instruction in the trap handler" * 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: asm-generic: Remove asm-generic arch_bprm_mm_init() mm: Make arch_unmap()/bprm_mm_init() available to all architectures x86: Cleanly separate use of asm-generic/mm_hooks.h x86 mpx: Change return type of get_reg_offset() fs: Do not include mpx.h in exec.c x86, mpx: Add documentation on Intel MPX x86, mpx: Cleanup unused bound tables x86, mpx: On-demand kernel allocation of bounds tables x86, mpx: Decode MPX instruction to get bound violation information x86, mpx: Add MPX-specific mmap interface x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific x86, mpx: Add MPX to disabled features ia64: Sync struct siginfo with general version mips: Sync struct siginfo with general version mpx: Extend siginfo structure to include bound violation information x86, mpx: Rename cfg_reg_u and status_reg x86: mpx: Give bndX registers actual names x86: Remove arbitrary instruction size limit in instruction decoder	2014-12-10 09:34:43 -08:00
Linus Torvalds	9e66645d72	Merge branch 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq domain updates from Thomas Gleixner: "The real interesting irq updates: - Support for hierarchical irq domains: For complex interrupt routing scenarios where more than one interrupt related chip is involved we had no proper representation in the generic interrupt infrastructure so far. That made people implement rather ugly constructs in their nested irq chip implementations. The main offenders are x86 and arm/gic. To distangle that mess we have now hierarchical irqdomains which seperate the various interrupt chips and connect them via the hierarchical domains. That keeps the domain specific details internal to the particular hierarchy level and removes the criss/cross referencing of chip internals. The resulting hierarchy for a complex x86 system will look like this: vector mapped: 74 msi-0 mapped: 2 dmar-ir-1 mapped: 69 ioapic-1 mapped: 4 ioapic-0 mapped: 20 pci-msi-2 mapped: 45 dmar-ir-0 mapped: 3 ioapic-2 mapped: 1 pci-msi-1 mapped: 2 htirq mapped: 0 Neither ioapic nor pci-msi know about the dmar interrupt remapping between themself and the vector domain. If interrupt remapping is disabled ioapic and pci-msi become direct childs of the vector domain. In hindsight we should have done that years ago, but in hindsight we always know better :) - Support for generic MSI interrupt domain handling We have more and more non PCI related MSI interrupts, so providing a generic infrastructure for this is better than having all affected architectures implementing their own private hacks. - Support for PCI-MSI interrupt domain handling, based on the generic MSI support. This part carries the pci/msi branch from Bjorn Helgaas pci tree to avoid a massive conflict. The PCI/MSI parts are acked by Bjorn. I have two more branches on top of this. The full conversion of x86 to hierarchical domains and a partial conversion of arm/gic" * 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) genirq: Move irq_chip_write_msi_msg() helper to core PCI/MSI: Allow an msi_controller to be associated to an irq domain PCI/MSI: Provide mechanism to alloc/free MSI/MSIX interrupt from irqdomain PCI/MSI: Enhance core to support hierarchy irqdomain PCI/MSI: Move cached entry functions to irq core genirq: Provide default callbacks for msi_domain_ops genirq: Introduce msi_domain_alloc/free_irqs() asm-generic: Add msi.h genirq: Add generic msi irq domain support genirq: Introduce callback irq_chip.irq_write_msi_msg genirq: Work around __irq_set_handler vs stacked domains ordering issues irqdomain: Introduce helper function irq_domain_add_hierarchy() irqdomain: Implement a method to automatically call parent domains alloc/free genirq: Introduce helper irq_domain_set_info() to reduce duplicated code genirq: Split out flow handler typedefs into seperate header file genirq: Add IRQ_SET_MASK_OK_DONE to support stacked irqchip genirq: Introduce irq_chip.irq_compose_msi_msg() to support stacked irqchip genirq: Add more helper functions to support stacked irq_chip genirq: Introduce helper functions to support stacked irq_chip irqdomain: Do irq_find_mapping and set_type for hierarchy irqdomain in case OF ...	2014-12-10 09:01:01 -08:00
Linus Torvalds	ecb50f0afd	Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq core updates from Thomas Gleixner: "This is the first (boring) part of irq updates: - support for big endian I/O accessors in the generic irq chip - cleanup of brcmstb/bcm7120 drivers so they can be reused for non ARM SoCs - the usual pile of fixes and updates for the various ARM irq chips" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) irqchip: dw-apb-ictl: Add PM support irqchip: dw-apb-ictl: Enable IRQ_GC_MASK_CACHE_PER_TYPE irqchip: dw-apb-ictl: Always use use {readl\|writel}_relaxed ARM: orion: convert the irq_reg_{readl,writel} calls to the new API irqchip: atmel-aic: Add missing entry for rm9200 irq fixups irqchip: atmel-aic: Rename at91sam9_aic_irq_fixup for naming consistency irqchip: atmel-aic: Add specific irq fixup function for sam9g45 and sam9rl irqchip: atmel-aic: Add irq fixups for at91sam926x SoCs irqchip: atmel-aic: Add irq fixup for RTT block irqchip: brcmstb-l2: Convert driver to use irq_reg_{readl,writel} irqchip: bcm7120-l2: Convert driver to use irq_reg_{readl,writel} irqchip: bcm7120-l2: Decouple driver from brcmstb-l2 irqchip: bcm7120-l2: Extend driver to support 64+ bit controllers irqchip: bcm7120-l2: Use gc->mask_cache to simplify suspend/resume functions irqchip: bcm7120-l2: Fix missing nibble in gc->unused mask irqchip: bcm7120-l2: Make sure all register accesses use base+offset irqchip: bcm7120-l2, brcmstb-l2: Remove ARM Kconfig dependency irqchip: bcm7120-l2: Eliminate bad IRQ check irqchip: brcmstb-l2: Eliminate dependency on ARM code genirq: Generic chip: Add big endian I/O accessors ...	2014-12-10 08:38:57 -08:00
Linus Torvalds	a157508c97	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core updates from Thomas Gleixner: "The time(r) departement provides: - more infrastructure work on the year 2038 issue - a few fixes in the Armada SoC timers - the usual pile of fixlets and improvements" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource: armada-370-xp: Use the reference clock on A375 SoC watchdog: orion: Use the reference clock on Armada 375 SoC clocksource: armada-370-xp: Add missing clock enable time: Fix sign bug in NTP mult overflow warning time: Remove timekeeping_inject_sleeptime() rtc: Update suspend/resume timing to use 64bit time rtc/lib: Provide y2038 safe rtc_tm_to_time()/rtc_time_to_tm() replacement time: Fixup comments to reflect usage of timespec64 time: Expose get_monotonic_coarse64() for in-kernel uses time: Expose getrawmonotonic64 for in-kernel uses time: Provide y2038 safe mktime() replacement time: Provide y2038 safe timekeeping_inject_sleeptime() replacement time: Provide y2038 safe do_settimeofday() replacement time: Complete NTP adjustment threshold judging conditions time: Avoid possible NTP adjustment mult overflow. time: Rename udelay_test.c to test_udelay.c clocksource: sirf: Remove hard-coded clock rate	2014-12-10 08:18:32 -08:00
Linus Torvalds	86c6a2fddf	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle are: - 'Nested Sleep Debugging', activated when CONFIG_DEBUG_ATOMIC_SLEEP=y. This instruments might_sleep() checks to catch places that nest blocking primitives - such as mutex usage in a wait loop. Such bugs can result in hard to debug races/hangs. Another category of invalid nesting that this facility will detect is the calling of blocking functions from within schedule() -> sched_submit_work() -> blk_schedule_flush_plug(). There's some potential for false positives (if secondary blocking primitives themselves are not ready yet for this facility), but the kernel will warn once about such bugs per bootup, so the warning isn't much of a nuisance. This feature comes with a number of fixes, for problems uncovered with it, so no messages are expected normally. - Another round of sched/numa optimizations and refinements, for CONFIG_NUMA_BALANCING=y. - Another round of sched/dl fixes and refinements. Plus various smaller fixes and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) sched: Add missing rcu protection to wake_up_all_idle_cpus sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK sched/numa: Init numa balancing fields of init_task sched/deadline: Remove unnecessary definitions in cpudeadline.h sched/cpupri: Remove unnecessary definitions in cpupri.h sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task() sched/fair: Fix stale overloaded status in the busiest group finding logic sched: Move p->nr_cpus_allowed check to select_task_rq() sched/completion: Document when to use wait_for_completion_io_() sched: Update comments about CLONE_NEWUTS and CLONE_NEWIPC sched/fair: Kill task_struct::numa_entry and numa_group::task_list sched: Refactor task_struct to use numa_faults instead of numa_ pointers sched/deadline: Don't check CONFIG_SMP in switched_from_dl() sched/deadline: Reschedule from switched_from_dl() after a successful pull sched/deadline: Push task away if the deadline is equal to curr during wakeup sched/deadline: Add deadline rq status print sched/deadline: Fix artificial overrun introduced by yield_task_dl() sched/rt: Clean up check_preempt_equal_prio() sched/core: Use dl_bw_of() under rcu_read_lock_sched() sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu ...	2014-12-09 21:21:34 -08:00
Linus Torvalds	5706ffd045	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events update from Ingo Molnar: "On the kernel side there's few changes, the one that stands out is PEBS machine state sampling support on x86, by Stephane Eranian. On the tooling side: User visible tooling changes: - Don't open the DWARF info multiple times, keeping instead a dwfl handle in struct dso, greatly speeding up 'perf report' on powerpc. (Sukadev Bhattiprolu) - Introduce PARSE_OPT_DISABLED option flag and use it to avoid showing undersired options in tools that provides frontends to 'perf record', like sched, kvm, etc (Namhyung Kim) - Fallback to kallsyms when using the minimal 'ELF' loader (Arnaldo Carvalho de Melo) - Fix annotation with kcore (Adrian Hunter) - Support source line numbers in annotate using a hotkey (Andi Kleen) - Callchain improvements including: * Enable printing the srcline in the history * Make get_srcline fall back to sym+offset (Andi Kleen) - TUI hist_entry browser fixes, including showing missing overhead value for first level callchain. Detected comparing the output of --stdio/--gui (that matched) with --tui, that had this problem. (Namhyung Kim) - Support handling complete branch stacks as histograms (Andi Kleen) Tooling infrastructure changes: - Prep work for supporting per-pkg and snapshot counters in 'perf stat' (Jiri Olsa) - 'perf stat' refactorings, moving stuff from it to evsel.c to use in per-pkg/snapshot format changes (Jiri Olsa) - Add per-pkg format file parsing (Matt Fleming) - Clean up libelf feature support code (Namhyung Kim) - Add gzip decompression support for kernel modules (Namhyung Kim) - More prep patches for Intel PT, including a a thread stack and more stuff made available via the database export mechanism (Adrian Hunter) - More Intel PT work, including a facility to export sample data (comms, threads, symbol names, etc) in a database friendly way, with an script to use this to create a postgresql database. (Adrian Hunter) - Make sure that thread->mg->machine points to the machine where the thread exists (it was being set only for the kmaps kernel modules case, do it as well for the mmaps) and use it to shorten function signatures (Arnaldo Carvalho de Melo) ... and lots of other fixes and smaller improvements" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (91 commits) perf report: In branch stack mode use address history sorting perf report: Add --branch-history option perf callchain: Support handling complete branch stacks as histograms perf stat: Add support for snapshot counters perf stat: Add support for per-pkg counters perf tools: Remove perf_evsel__read interface perf stat: Use read_counter in read_counter_aggr perf stat: Make read_counter work over the thread dimension perf stat: Use perf_evsel__read_cb in read_counter perf tools: Add snapshot format file parsing perf tools: Add per-pkg format file parsing perf evsel: Introduce perf_evsel__read_cb function perf evsel: Introduce perf_counts_values__scale function perf evsel: Introduce perf_evsel__compute_deltas function perf tools: Allow to force redirect pr_debug to stderr. perf tools: Fix segfault due to invalid kernel dso access perf callchain: Make get_srcline fall back to sym+offset perf symbols: Move bfd_demangle stubbing to its only user perf callchain: Enable printing the srcline in the history perf tools: Collapse first level callchain entry if it has sibling ...	2014-12-09 20:55:37 -08:00
Linus Torvalds	c30110608c	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "These are the main changes in this cycle: - Streamline RCU's use of per-CPU variables, shifting from "cpu" arguments to functions to "this_"-style per-CPU variable accessors. - signal-handling RCU updates. - real-time updates. - torture-test updates. - miscellaneous fixes. - documentation updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) rcu: Fix FIXME in rcu_tasks_kthread() rcu: More info about potential deadlocks with rcu_read_unlock() rcu: Optimize cond_resched_rcu_qs() rcu: Add sparse check for RCU_INIT_POINTER() documentation: memory-barriers.txt: Correct example for reorderings documentation: Add atomic_long_t to atomic_ops.txt documentation: Additional restriction for control dependencies documentation: Document RCU self test boot params rcutorture: Fix rcu_torture_cbflood() memory leak rcutorture: Remove obsolete kversion param in kvm.sh rcutorture: Remove stale test configurations rcutorture: Enable RCU self test in configs rcutorture: Add early boot self tests torture: Run Linux-kernel binary out of results directory cpu: Avoid puts_pending overflow rcu: Remove "cpu" argument to rcu_cleanup_after_idle() rcu: Remove "cpu" argument to rcu_prepare_for_idle() rcu: Remove "cpu" argument to rcu_needs_cpu() rcu: Remove "cpu" argument to rcu_note_context_switch() rcu: Remove "cpu" argument to rcu_preempt_check_callbacks() ...	2014-12-09 20:23:19 -08:00
Eric W. Biederman	f0d62aec93	userns: Rename id_map_mutex to userns_state_mutex Generalize id_map_mutex so it can be used for more state of a user namespace. Cc: stable@vger.kernel.org Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-12-09 17:08:33 -06:00
Eric W. Biederman	f95d7918bd	userns: Only allow the creator of the userns unprivileged mappings If you did not create the user namespace and are allowed to write to uid_map or gid_map you should already have the necessary privilege in the parent user namespace to establish any mapping you want so this will not affect userspace in practice. Limiting unprivileged uid mapping establishment to the creator of the user namespace makes it easier to verify all credentials obtained with the uid mapping can be obtained without the uid mapping without privilege. Limiting unprivileged gid mapping establishment (which is temporarily absent) to the creator of the user namespace also ensures that the combination of uid and gid can already be obtained without privilege. This is part of the fix for CVE-2014-8989. Cc: stable@vger.kernel.org Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-12-09 17:08:32 -06:00
Eric W. Biederman	80dd00a237	userns: Check euid no fsuid when establishing an unprivileged uid mapping setresuid allows the euid to be set to any of uid, euid, suid, and fsuid. Therefor it is safe to allow an unprivileged user to map their euid and use CAP_SETUID privileged with exactly that uid, as no new credentials can be obtained. I can not find a combination of existing system calls that allows setting uid, euid, suid, and fsuid from the fsuid making the previous use of fsuid for allowing unprivileged mappings a bug. This is part of a fix for CVE-2014-8989. Cc: stable@vger.kernel.org Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-12-09 17:08:32 -06:00
Eric W. Biederman	be7c6dba23	userns: Don't allow unprivileged creation of gid mappings As any gid mapping will allow and must allow for backwards compatibility dropping groups don't allow any gid mappings to be established without CAP_SETGID in the parent user namespace. For a small class of applications this change breaks userspace and removes useful functionality. This small class of applications includes tools/testing/selftests/mount/unprivilged-remount-test.c Most of the removed functionality will be added back with the addition of a one way knob to disable setgroups. Once setgroups is disabled setting the gid_map becomes as safe as setting the uid_map. For more common applications that set the uid_map and the gid_map with privilege this change will have no affect. This is part of a fix for CVE-2014-8989. Cc: stable@vger.kernel.org Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-12-09 17:08:24 -06:00
Eric W. Biederman	273d2c67c3	userns: Don't allow setgroups until a gid mapping has been setablished setgroups is unique in not needing a valid mapping before it can be called, in the case of setgroups(0, NULL) which drops all supplemental groups. The design of the user namespace assumes that CAP_SETGID can not actually be used until a gid mapping is established. Therefore add a helper function to see if the user namespace gid mapping has been established and call that function in the setgroups permission check. This is part of the fix for CVE-2014-8989, being able to drop groups without privilege using user namespaces. Cc: stable@vger.kernel.org Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-12-09 16:58:40 -06:00
Arianna Avanzini	7eca210375	blktrace: don't let the sysfs interface remove trace from running list Currently, blktrace can be started/stopped via its ioctl-based interface (used by the userspace blktrace tool) or via its ftrace interface. The function blk_trace_remove_queue(), called each time an "enable" tunable of the ftrace interface transitions to zero, removes the trace from the running list, even if no function from the sysfs interface adds it to such a list. This leads to a null pointer dereference. This commit changes the blk_trace_remove_queue() function so that it does not remove the blk_trace from the running list. v2: - Now the patch removes the invocation of list_del() instead of adding an useless if branch, as suggested by Namhyung Kim. Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-12-09 14:59:09 -07:00
Paul Moore	0f7e94ee40	Merge branch 'next' into upstream for v3.19	2014-12-09 14:38:30 -05:00
Al Viro	ba00410b81	Merge branch 'iov_iter' into for-next	2014-12-08 20:39:29 -05:00
Rafael J. Wysocki	e3d857e1ae	Merge branch 'pm-runtime' * pm-runtime: (25 commits) i2c-omap / PM: Drop CONFIG_PM_RUNTIME from i2c-omap.c dmaengine / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM drivers: sh / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM e1000e / igb / PM: Eliminate CONFIG_PM_RUNTIME MMC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM MFD / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM misc / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM media / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM input / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM iio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM hsi / OMAP / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM i2c-hid / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM drm / exynos / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM gpio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM hwrandom / exynos / PM: Use CONFIG_PM in #ifdef block / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM USB / PM: Drop CONFIG_PM_RUNTIME from the USB core PM: Merge the SET*_RUNTIME_PM_OPS() macros PM / Kconfig: Do not select PM directly from Kconfig files PCI / PM: Drop CONFIG_PM_RUNTIME from the PCI core ...	2014-12-08 20:00:44 +01:00
Rafael J. Wysocki	389cbf36e5	Merge branches 'pm-domains', 'pm-sleep' and 'pm-tools' * pm-domains: ARM: shmobile: Convert to genpd flags for PM clocks for R-mobile ARM: shmobile: Convert to genpd flags for PM clocks for r8a7779 PM / Domains: Initial PM clock support for genpd PM / Domains: Power on the PM domain right after attach completes PM / Domains: Move struct pm_domain_data to pm_domain.h PM / Domains: Extract code to power off/on a PM domain PM / Domains: Make genpd parameter of pm_genpd_present() const * pm-sleep: PM / hibernate: Deletion of an unnecessary check before the function call "vfree" PM / Hibernate: Migrate to ktime_t * pm-tools: tools: cpupower: fix return checks for sysfs_get_idlestate_count()	2014-12-08 20:00:02 +01:00
Rafael J. Wysocki	5aee40e4f7	Merge branches 'powercap', 'pm-clk', 'pm-config' and 'pm-opp' * powercap: powercap / RAPL: fix build dependency on iosf_mbi powercap / RAPL: add new model ids powercap / RAPL: handle atom and core differences powercap / RAPL: abstract per cpu type functions * pm-clk: PM / clock_ops: make __pm_clk_enable more generic PM / clock_ops: Add pm_clk_add_clk() * pm-config: PM: Kconfig: fix unmet dependency for CPU_PM * pm-opp: PM / OPP replace kfree_rcu() with call_srcu() in opp_set_availability() PM / OPP Introduce APIs to remove OPPs PM / OPP mark OPPs as 'static' or 'dynamic' PM / OPP don't match for existing OPPs when list is empty PM / OPP rename 'head' as 'rcu_head' or 'srcu_head' based on its type	2014-12-08 19:57:41 +01:00
NeilBrown	008847f66c	workqueue: allow rescuer thread to do more work. When there is serious memory pressure, all workers in a pool could be blocked, and a new thread cannot be created because it requires memory allocation. In this situation a WQ_MEM_RECLAIM workqueue will wake up the rescuer thread to do some work. The rescuer will only handle requests that are already on ->worklist. If max_requests is 1, that means it will handle a single request. The rescuer will be woken again in 100ms to handle another max_requests requests. I've seen a machine (running a 3.0 based "enterprise" kernel) with thousands of requests queued for xfslogd, which has a max_requests of 1, and is needed for retiring all 'xfs' write requests. When one of the worker pools gets into this state, it progresses extremely slowly and possibly never recovers (only waited an hour or two). With this patch we leave a pool_workqueue on mayday list until it is clearly no longer in need of assistance. This allows all requests to be handled in a timely fashion. We keep each pool_workqueue on the mayday list until need_to_create_worker() is false, and no work for this workqueue is found in the pool. I have tested this in combination with a (hackish) patch which forces all work items to be handled by the rescuer thread. In that context it significantly improves performance. A similar patch for a 3.0 kernel significantly improved performance on a heavy work load. Thanks to Jan Kara for some design ideas, and to Dongsu Park for some comments and testing. tj: Inverted the lock order between wq_mayday_lock and pool->lock with a preceding patch and simplified this patch. Added comment and updated changelog accordingly. Dongsu spotted missing get_pwq() in the simplified code. Cc: Dongsu Park <dongsu.park@profitbricks.com> Cc: Jan Kara <jack@suse.cz> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-12-08 12:39:16 -05:00
Tejun Heo	b2d829096b	workqueue: invert the order between pool->lock and wq_mayday_lock Currently, pool->lock nests inside pool->lock. There's no inherent reason for this order. The only place where the two locks are held together is pool_mayday_timeout() and it just got decided that way. This nesting order turns out to complicate things with the planned rescuer_thread() update. Let's invert them. This doesn't cause any behavior differences. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: NeilBrown <neilb@suse.de> Cc: Dongsu Park <dongsu.park@profitbricks.com>	2014-12-08 12:39:16 -05:00
Andy Lutomirski	fd7de1e8d5	sched: Add missing rcu protection to wake_up_all_idle_cpus Locklessly doing is_idle_task(rq->curr) is only okay because of RCU protection. The older variant of the broken code checked rq->curr == rq->idle instead and therefore didn't need RCU. Fixes: `f6be8af1c9` ("sched: Add new API wake_up_if_idle() to wake up the idle cpu") Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Chuansheng Liu <chuansheng.liu@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/729365dddca178506dfd0a9451006344cd6808bc.1417277372.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-12-08 11:44:19 +01:00
Dave Airlie	8c86394470	Linux 3.18 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUhNLZAAoJEHm+PkMAQRiGAEcH/iclYDW7k2GKemMqboy+Ohmh +ELbQothNhlGZlS1wWdD69LBiiXkkQ+ufVYFh/hC0oy0gUdfPMt5t+bOHy6cjn6w 9zOcACtpDKnqbOwRqXZjZgNmIabk7lRjbn7GK4GQqpIaW4oO0FWcT91FFhtGSPDa tjtmGRqDmbNsqfzr18h0WPEpUZmT6MxIdv17AYDliPB1MaaRuAv1Kss05TJrXdfL Oucv+C0uwnybD9UWAz6pLJ3H/HR9VJFdkaJ4Y0pbCHAuxdd1+swoTpicluHlsJA1 EkK5iWQRMpcmGwKvB0unCAQljNpaJiq4/Tlmmv8JlYpMlmIiVLT0D8BZx5q05QQ= =oGNw -----END PGP SIGNATURE----- Merge tag 'v3.18' into drm-next Linux 3.18 Backmerge Linus tree into -next as we had conflicts in i915/radeon/nouveau, and everyone was solving them individually. * tag 'v3.18': (57 commits) Linux 3.18 watchdog: s3c2410_wdt: Fix the mask bit offset for Exynos7 uapi: fix to export linux/vm_sockets.h i2c: cadence: Set the hardware time-out register to maximum value i2c: davinci: generate STP always when NACK is received ahci: disable MSI on SAMSUNG 0xa800 SSD context_tracking: Restore previous state in schedule_user slab: fix nodeid bounds check for non-contiguous node IDs lib/genalloc.c: export devm_gen_pool_create() for modules mm: fix anon_vma_clone() error treatment mm: fix swapoff hang after page migration and fork fat: fix oops on corrupted vfat fs ipc/sem.c: fully initialize sem_array before making it visible drivers/input/evdev.c: don't kfree() a vmalloc address cxgb4: Fill in supported link mode for SFP modules xen-netfront: Remove BUGs on paged skb data which crosses a page boundary mm/vmpressure.c: fix race in vmpressure_work_fn() mm: frontswap: invalidate expired data on a dup-store failure mm: do not overwrite reserved pages counter at show_mem() drm/radeon: kernel panic in drm_calc_vbltimestamp_from_scanoutpos with 3.18.0-rc6 ... Conflicts: drivers/gpu/drm/i915/intel_display.c drivers/gpu/drm/nouveau/nouveau_drm.c drivers/gpu/drm/radeon/radeon_cs.c	2014-12-08 10:33:52 +10:00
Thomas Gleixner	74faaf7aa6	genirq: Move irq_chip_write_msi_msg() helper to core No point to expose this to the world. The only legitimate user is the core code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com>	2014-12-07 21:49:45 +01:00
Greg Kroah-Hartman	dd63af108f	Merge 3.18-rc7 into tty-next This resolves the merge issue with drivers/tty/serial/of_serial.c Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-12-06 08:17:24 -08:00
Alexei Starovoitov	ddd872bc30	bpf: verifier: add checks for BPF_ABS \| BPF_IND instructions introduce program type BPF_PROG_TYPE_SOCKET_FILTER that is used for attaching programs to sockets where ctx == skb. add verifier checks for ABS/IND instructions which can only be seen in socket filters, therefore the check: if (env->prog->aux->prog_type != BPF_PROG_TYPE_SOCKET_FILTER) verbose("BPF_LD_ABS\|IND instructions are only allowed in socket filters\n"); Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-05 21:47:32 -08:00
Eric W. Biederman	0542f17bf2	userns: Document what the invariant required for safe unprivileged mappings. The rule is simple. Don't allow anything that wouldn't be allowed without unprivileged mappings. It was previously overlooked that establishing gid mappings would allow dropping groups and potentially gaining permission to files and directories that had lesser permissions for a specific group than for all other users. This is the rule needed to fix CVE-2014-8989 and prevent any other security issues with new_idmap_permitted. The reason for this rule is that the unix permission model is old and there are programs out there somewhere that take advantage of every little corner of it. So allowing a uid or gid mapping to be established without privielge that would allow anything that would not be allowed without that mapping will result in expectations from some code somewhere being violated. Violated expectations about the behavior of the OS is a long way to say a security issue. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-12-05 19:07:26 -06:00
Eric W. Biederman	7ff4d90b4c	groups: Consolidate the setgroups permission checks Today there are 3 instances of setgroups and due to an oversight their permission checking has diverged. Add a common function so that they may all share the same permission checking code. This corrects the current oversight in the current permission checks and adds a helper to avoid this in the future. A user namespace security fix will update this new helper, shortly. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-12-05 17:19:27 -06:00
Daniel Vetter	7bd0e226e3	drm/i915: compute wait_ioctl timeout correctly We've lost the +1 required for correct timeouts in commit `5ed0bdf21a` Author: Thomas Gleixner <tglx@linutronix.de> Date: Wed Jul 16 21:05:06 2014 +0000 drm: i915: Use nsec based interfaces Use ktime_get_raw_ns() and get rid of the back and forth timespec conversions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: John Stultz <john.stultz@linaro.org> So fix this up by reinstating our handrolled _timeout function. While at it bother with handling MAX_JIFFIES. v2: Convert to usecs (we don't care about the accuracy anyway) first to avoid overflow issues Dave Gordon spotted. v3: Drop the explicit MAX_JIFFY_OFFSET check, usecs_to_jiffies should take care of that already. It might be a bit too enthusiastic about it though. v4: Chris has a much nicer color, so use his implementation. This requires to export nsec_to_jiffies from time.c. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Gordon <david.s.gordon@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82749 Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Acked-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2014-12-05 15:20:24 +02:00
Al Viro	f77c80142e	bury struct proc_ns in fs/proc a) make get_proc_ns() return a pointer to struct ns_common b) mirror ns_ops in dentry->d_fsdata of ns dentries, so that is_mnt_ns_file() could get away with fewer dereferences. That way struct proc_ns becomes invisible outside of fs/proc/*.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-04 14:34:54 -05:00
Al Viro	33c429405a	copy address of proc_ns_ops into ns_common Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-04 14:34:47 -05:00
Al Viro	6344c433a4	new helpers: ns_alloc_inum/ns_free_inum take struct ns_common *, for now simply wrappers around proc_{alloc,free}_inum() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-04 14:34:36 -05:00
Al Viro	64964528b2	make proc_ns_operations work with struct ns_common * instead of void * We can do that now. And kill ->inum(), while we are at it - all instances are identical. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-04 14:34:17 -05:00
Al Viro	3c04118461	switch the rest of proc_ns_operations to working with &...->ns Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-04 14:34:11 -05:00
Al Viro	435d5f4bb2	common object embedded into various struct ....ns for now - just move corresponding ->proc_inum instances over there Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-04 14:31:00 -05:00
Tejun Heo	0479c8c549	workqueue: cosmetic update in rescuer_thread() rescuer_thread() caches &rescuer->scheduled in a local variable scheduled for convenience. There's one WARN_ON_ONCE() which was using &rescuer->scheduled directly. Replace it with the local variable. This patch causes no functional difference. Signed-off-by: Tejun Heo <tj@kernel.org>	2014-12-04 10:14:54 -05:00
Andy Lutomirski	7cc78f8fa0	context_tracking: Restore previous state in schedule_user It appears that some SCHEDULE_USER (asm for schedule_user) callers in arch/x86/kernel/entry_64.S are called from RCU kernel context, and schedule_user will return in RCU user context. This causes RCU warnings and possible failures. This is intended to be a minimal fix suitable for 3.18. Reported-and-tested-by: Dave Jones <davej@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-03 20:55:58 -08:00
Rafael J. Wysocki	d30d819dc8	PM: Drop CONFIG_PM_RUNTIME from the driver core After commit `b2b49ccbdd` (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected) PM_RUNTIME is always set if PM is set, so quite a few depend on CONFIG_PM or even may be dropped entirely in some cases. Replace CONFIG_PM_RUNTIME with CONFIG_PM in the PM core code. Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-12-04 00:46:58 +01:00
Dan Carpenter	3558a5ac50	tracing: Truncated output is better than nothing The initial reason for this patch is that I noticed that: if (len > TRACE_BUF_SIZE) is off by one. In this code, if len == TRACE_BUF_SIZE, then it means we have truncated the last character off the output string. If we truncate two or more characters then we exit without printing. After some discussion, we decided that printing truncated data is better than not printing at all so we should just use vscnprintf() and remove the test entirely. Also I have updated memcpy() to copy the NUL char instead of setting the NUL in a separate step. Link: http://lkml.kernel.org/r/20141127155752.GA21914@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-03 17:10:14 -05:00
Byungchul Park	8e1e1df29d	tracing: Add additional marks to signal very large time deltas Currently, function graph tracer prints "!" or "+" just before function execution time to signal a function overhead, depending on the time. And some tracers tracing latency also print "!" or "+" just after time to signal overhead, depending on the interval between events. Even it is usually enough to do that, we sometimes need to signal for bigger execution time than 100 micro seconds. For example, I used function graph tracer to detect if there is any case that exit_mm() takes too much time. I did following steps in /sys/kernel/debug/tracing. It was easier to detect very large excution time with patched kernel than with original kernel. $ echo exit_mm > set_graph_function $ echo function_graph > current_tracer $ echo > trace $ cat trace_pipe > $LOGFILE ... (do something and terminate logging) $ grep "\\$" $LOGFILE 3) $ 22082032 us \| } /* kernel_map_pages / 3) $ 22082040 us \| } / free_pages_prepare / 3) $ 22082113 us \| } / free_hot_cold_page / 3) $ 22083455 us \| } / free_hot_cold_page_list / 3) $ 22083895 us \| } / release_pages / 3) $ 22177873 us \| } / free_pages_and_swap_cache / 3) $ 22178929 us \| } / unmap_single_vma / 3) $ 22198885 us \| } / unmap_vmas / 3) $ 22206949 us \| } / exit_mmap / 3) $ 22207659 us \| } / mmput / 3) $ 22207793 us \| } / exit_mm */ And then, it was easy to find out that a schedule-out occured by sub_preempt_count() within kernel_map_pages(). To detect very large function exection time caused by either problematic function implementation or scheduling issues, this patch can be useful. Link: http://lkml.kernel.org/r/1416789259-24038-1-git-send-email-byungchul.park@lge.com Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-03 17:10:13 -05:00
Steven Rostedt (Red Hat)	eabb8980a9	tracing: Allow NOT to filter AND and OR clauses Add support to allow not "!" for and (&&) and (\|\|). That is: !(field1 == X && field2 == Y) Where the value of the full clause will be notted. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-03 10:00:27 -05:00
Steven Rostedt (Red Hat)	e12c09cf30	tracing: Add NOT to filtering logic Ted noticed that he could not filter on an event for a bit being cleared. That's because the filtering logic only tests event fields with a limited number of comparisons which, for bit logic, only include "&", which can test if a bit is set, but there's no good way to see if a bit is clear. This adds a way to do: !(field & 2048) Which returns true if the bit is not set, and false otherwise. Note, currently !(field1 == 10 && field2 == 15) is not supported. That is, the 'not' only works for direct comparisons, not for the AND and OR logic. Link: http://lkml.kernel.org/r/20141202021912.GA29096@thunk.org Link: http://lkml.kernel.org/r/20141202120430.71979060@gandalf.local.home Acked-by: Alexei Starovoitov <ast@plumgrid.com> Suggested-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-03 10:00:13 -05:00
Dave Airlie	e8115e79aa	Linux 3.18-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUe7l9AAoJEHm+PkMAQRiGkGcIAIryQ7NKn4IaxUtS807Lx4Ih obEnx7nNKZTXCZpD/7XQGHMMJyozMJR50PHZESJoHu4Luhv9h7EFRnyJ6MdqMlwn zla3zY0yRsHwPoJKcHbSE0CPHZz0WPQHj7IEbM+XJz2tMNJfbgTrezElmcCM4DRp c9ae+ggwZ2cyNYM0r2RSwSJ525WMh69f9dzSUE27fpvkllQgwqNs/jHYz8HNOEht FWcv5UhvzKjwJS3awULfOB3zH2QdFvVTrwAzd+kbV2Q6T6CaUoFRlhXeKUO6W2Jv pJM6oj8tMZUkdXEv7EQXT1kwEqC4DULTTTHs4tSF79O1ESmNfePiOwwBcwoM2nM= =kG1Y -----END PGP SIGNATURE----- Merge tag 'v3.18-rc7' into drm-next This fixes a bunch of conflicts prior to merging i915 tree. Linux 3.18-rc7 Conflicts: drivers/gpu/drm/exynos/exynos_drm_drv.c drivers/gpu/drm/i915/i915_drv.c drivers/gpu/drm/i915/intel_pm.c drivers/gpu/drm/tegra/dc.c	2014-12-02 10:58:33 +10:00
David S. Miller	60b7379dc5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2014-11-29 20:47:48 -08:00
Rafael J. Wysocki	af261127e9	Merge back earlier 'pm-runtime' material for 3.19-rc1.	2014-11-27 01:37:53 +01:00
John Stultz	cb2aa63469	time: Fix sign bug in NTP mult overflow warning In commit `6067dc5a8c` ("time: Avoid possible NTP adjustment mult overflow") a new check was added to watch for adjustments that could cause a mult overflow. Unfortunately the check compares a signed with unsigned value and ignored the case where the adjustment was negative, which causes spurious warn-ons on some systems (and seems like it would result in problematic time adjustments there as well, due to the early return). Thus this patch adds a check to make sure the adjustment is positive before we check for an overflow, and resovles the issue in my testing. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Debugged-by: pang.xunlei <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/1416890145-30048-1-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-25 07:18:34 +01:00
Andy Lutomirski	82975bc6a6	uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME x86 call do_notify_resume on paranoid returns if TIF_UPROBE is set but not on non-paranoid returns. I suspect that this is a mistake and that the code only works because int3 is paranoid. Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround for the x86 bug. With that bug fixed, we can remove _TIF_NOTIFY_RESUME from the uprobes code. Reported-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-23 14:25:28 -08:00
Thomas Gleixner	90e362f4a7	sched: Provide update_curr callbacks for stop/idle scheduling classes Chris bisected a NULL pointer deference in task_sched_runtime() to commit `6e998916df` 'sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency'. Chris observed crashes in atop or other /proc walking programs when he started fork bombs on his machine. He assumed that this is a new exit race, but that does not make any sense when looking at that commit. What's interesting is that, the commit provides update_curr callbacks for all scheduling classes except stop_task and idle_task. While nothing can ever hit that via the clock_nanosleep() and clock_gettime() interfaces, which have been the target of the commit in question, the author obviously forgot that there are other code paths which invoke task_sched_runtime() do_task_stat(() thread_group_cputime_adjusted() thread_group_cputime() task_cputime() task_sched_runtime() if (task_current(rq, p) && task_on_rq_queued(p)) { update_rq_clock(rq); up->sched_class->update_curr(rq); } If the stats are read for a stomp machine task, aka 'migration/N' and that task is current on its cpu, this will happily call the NULL pointer of stop_task->update_curr. Ooops. Chris observation that this happens faster when he runs the fork bomb makes sense as the fork bomb will kick migration threads more often so the probability to hit the issue will increase. Add the missing update_curr callbacks to the scheduler classes stop_task and idle_task. While idle tasks cannot be monitored via /proc we have other means to hit the idle case. Fixes: `6e998916df` 'sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency' Reported-by: Chris Mason <clm@fb.com> Reported-and-tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-23 14:14:40 -08:00
Jiang Liu	38b6a1cf3e	PCI/MSI: Move cached entry functions to irq core Required to support non PCI based MSI. [ tglx: Extracted from Jiangs patch series ] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-23 13:01:47 +01:00
Jiang Liu	aeeb59657c	genirq: Provide default callbacks for msi_domain_ops Extend struct msi_domain_info and provide default callbacks for msi_domain_ops. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Yijing Wang <wangyijing@huawei.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Alexander Gordeev <agordeev@redhat.com> Link: http://lkml.kernel.org/r/1416061447-9472-8-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-23 13:01:47 +01:00
Jiang Liu	d9109698be	genirq: Introduce msi_domain_alloc/free_irqs() Introduce msi_domain_{alloc\|free}_irqs() to alloc/free interrupts from generic MSI irqdomain. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Yijing Wang <wangyijing@huawei.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Alexander Gordeev <agordeev@redhat.com> Link: http://lkml.kernel.org/r/1416061447-9472-7-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-23 13:01:47 +01:00
Jiang Liu	f3cf8bb0d6	genirq: Add generic msi irq domain support Implement the basic functions for MSI interrupt support with hierarchical interrupt domains. [ tglx: Extracted and combined from several patches ] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-23 13:01:47 +01:00
Marc Zyngier	f86eff222f	genirq: Work around __irq_set_handler vs stacked domains ordering issues With the introduction of stacked domains, we have the issue that, depending on where in the stack this is called, __irq_set_handler will succeed or fail: If this is called from the inner irqchip, __irq_set_handler() will fail, as it will look at the outer domain as the (desc->irq_data.chip == &no_irq_chip) test fails (we haven't set the top level yet). This patch implements the following: "If there is at least one valid irqchip in the domain, it will probably sort itself out". This is clearly not ideal, but it is far less confusing then crashing because the top-level domain is not up yet. [ tglx: Added comment and a protection against chained interrupts in that context ] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Jiang Liu <jiang.liu@linux.intel.com> Link: http://lkml.kernel.org/r/1416048553-29289-3-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-23 13:01:47 +01:00
Jiang Liu	afb7da83b9	irqdomain: Introduce helper function irq_domain_add_hierarchy() Introduce helper function irq_domain_add_hierarchy(), which creates a linear irqdomain if parameter 'size' is not zero, otherwise creates a tree irqdomain. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Yijing Wang <wangyijing@huawei.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Matthias Brugger <matthias.bgg@gmail.com> Link: http://lkml.kernel.org/r/1416061447-9472-5-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-23 13:01:46 +01:00
Jiang Liu	36d727310c	irqdomain: Implement a method to automatically call parent domains alloc/free Add a flags to irq_domain.flags to control whether the irqdomain core should automatically call parent irqdomain's alloc/free callbacks. It help to reduce hierarchy irqdomains users' code size. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Yijing Wang <wangyijing@huawei.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Matthias Brugger <matthias.bgg@gmail.com> Link: http://lkml.kernel.org/r/1416061447-9472-4-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-23 13:01:46 +01:00
Jiang Liu	1b5377087c	genirq: Introduce helper irq_domain_set_info() to reduce duplicated code Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-23 13:01:46 +01:00
Jiang Liu	2cb625478f	genirq: Add IRQ_SET_MASK_OK_DONE to support stacked irqchip Add IRQ_SET_MASK_OK_DONE in addition to IRQ_SET_MASK_OK and IRQ_SET_MASK_OK_NOCOPY to support stacked irqchip. IRQ_SET_MASK_OK_DONE is the same as IRQ_SET_MASK_OK to irq core. To stacked irqchip, it means that ascendant irqchips have done all the work and no more handling needed in descendant irqchips. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-23 13:01:46 +01:00
Jiang Liu	515085ef7e	genirq: Introduce irq_chip.irq_compose_msi_msg() to support stacked irqchip Add callback irq_compose_msi_msg to struct irq_chip, which will be used to support stacked irqchip. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-23 13:01:46 +01:00
Yingjoe Chen	56e8abab61	genirq: Add more helper functions to support stacked irq_chip Add more helper function for stacked irq_chip to just call parent's function. Signed-off-by: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Gran Likely <grant.likely@linaro.org> Cc: Boris BREZILLON <boris.brezillon@free-electrons.com> Cc: <linux-arm-kernel@lists.infradead.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Yijing Wang <wangyijing@huawei.com> Cc: <srv_heupstream@mediatek.com> Cc: <yingjoe.chen@gmail.com> Cc: <hc.yen@mediatek.com> Cc: <eddie.huang@mediatek.com> Cc: <nathan.chung@mediatek.com> Cc: <yh.chen@mediatek.com> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Link: http://lkml.kernel.org/r/1415893029-2971-3-git-send-email-yingjoe.chen@mediatek.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-23 13:01:46 +01:00
Jiang Liu	85f08c17de	genirq: Introduce helper functions to support stacked irq_chip Now we already support hierarchy irq_data, so introduce several helpers to support stacked irq_chips. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-23 13:01:46 +01:00
Yingjoe Chen	0cc01abab6	irqdomain: Do irq_find_mapping and set_type for hierarchy irqdomain in case OF It is possible to call irq_create_of_mapping to create/translate the same IRQ from DT for multiple times. Perform irq_find_mapping check and set_type for hierarchy irqdomain in irq_create_of_mapping() to avoid duplicate these functionality in all outer most irqdomain. Signed-off-by: Yingjoe Chen <yingjoe.chen@mediatek.com> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-23 13:01:46 +01:00
Jiang Liu	f8264e3496	irqdomain: Introduce new interfaces to support hierarchy irqdomains We plan to use hierarchy irqdomain to suppport CPU vector assignment, interrupt remapping controller, IO-APIC controller, MSI interrupt and hypertransport interrupt etc on x86 platforms. So extend irqdomain interfaces to support hierarchy irqdomain. There are already many clients of current irqdomain interfaces. To minimize the changes, we choose to introduce new version 2 interfaces to support hierarchy instead of extending existing irqdomain interfaces. According to Thomas's suggestion, the most important design decision is to build hierarchy struct irq_data to support hierarchy irqdomain, so hierarchy irqdomain related data could be saved in struct irq_data. With support of hierarchy irq_data, we could also support stacked irq_chips. This is most useful in case of set_affinity(). The new hierarchy irqdomain introduces following interfaces: 1) irq_domain_alloc_irqs()/irq_domain_free_irqs(): allocate/release IRQ and related resources. 2) __irq_domain_alloc_irqs(): a special version to support legacy IRQs. 3) irq_domain_activate_irq()/irq_domain_deactivate_irq(): program interrupt controllers to activate/deactivate interrupt. There are also several help functions to ease irqdomain implemenations: 1) irq_domain_get_irq_data(): get irq_data associated with a specific irqdomain. 2) irq_domain_set_hwirq_and_chip(): save irqdomain specific data into irq_data. 3) irq_domain_alloc_irqs_parent()/irq_domain_free_irqs_parent(): invoke parent irqdomain's alloc/free callbacks. We also changed irq_startup()/irq_shutdown() to invoke irq_domain_activate_irq()/irq_domain_deactivate_irq() to program interrupt controller when start/stop interrupts. [ tglx: Folded parts of the later patch series in ] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-23 13:01:45 +01:00
Tejun Heo	cceb9bd633	Merge branch 'master' into for-3.19 Pull in to receive `54ef6df3f3` ("rcu: Provide counterpart to rcu_dereference() for non-RCU situations"). Signed-off-by: Tejun Heo <tj@kernel.org>	2014-11-22 09:32:08 -05:00
David S. Miller	1459143386	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ieee802154/fakehard.c A bug fix went into 'net' for ieee802154/fakehard.c, which is removed in 'net-next'. Add build fix into the merge from Stephen Rothwell in openvswitch, the logging macros take a new initial 'log' argument, a new call was added in 'net' so when we merge that in here we have to explicitly add the new 'log' arg to it else the build fails. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-21 22:28:24 -05:00
Linus Torvalds	8b2ed21e84	Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Misc fixes: two NUMA fixes, two cputime fixes and an RCU/lockdep fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency sched/cputime: Fix cpu_timer_sample_group() double accounting sched/numa: Avoid selecting oneself as swap target sched/numa: Fix out of bounds read in sched_init_numa() sched: Remove lockdep check in sched_move_task()	2014-11-21 15:44:54 -08:00
Linus Torvalds	13f5004c94	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes: two Intel uncore driver fixes, a CPU-hotplug fix and a build dependencies fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Fix boot crash on SBOX PMU on Haswell-EP perf/x86/intel/uncore: Fix IRP uncore register offsets on Haswell EP perf: Fix corruption of sibling list with hotplug perf/x86: Fix embarrasing typo	2014-11-21 15:44:07 -08:00
John Stultz	5322e4c264	time: Fixup comments to reflect usage of timespec64 Fix up a few comments that weren't updated when the functions were converted to use timespec64 structures. Acked-by: Arnd Bergmann <arnd.bergmann@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-11-21 11:59:59 -08:00
John Stultz	334334b5f5	time: Expose get_monotonic_coarse64() for in-kernel uses Adds a timespec64 based get_monotonic_coarse64() implementation that can be used as we convert internal users of get_monotonic_coarse away from using timespecs. Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-11-21 11:59:59 -08:00
John Stultz	cdba2ec538	time: Expose getrawmonotonic64 for in-kernel uses Adds a timespec64 based getrawmonotonic64() implementation that can be used as we convert internal users of getrawmonotonic away from using timespecs. Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-11-21 11:59:58 -08:00
pang.xunlei	90b6ce9c40	time: Provide y2038 safe mktime() replacement As part of addressing "y2038 problem" for in-kernel uses, this patch adds safe mktime64() using time64_t. After this patch, mktime() is deprecated and all its call sites will be fixed using mktime64(), after that it can be removed. Signed-off-by: pang.xunlei <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-11-21 11:59:58 -08:00
pang.xunlei	04d9089086	time: Provide y2038 safe timekeeping_inject_sleeptime() replacement As part of addressing "y2038 problem" for in-kernel uses, this patch adds timekeeping_inject_sleeptime64() using timespec64. After this patch, timekeeping_inject_sleeptime() is deprecated and all its call sites will be fixed using the new interface, after that it can be removed. NOTE: timekeeping_inject_sleeptime() is safe actually, but we want to eliminate timespec eventually, so comes this patch. Signed-off-by: pang.xunlei <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-11-21 11:59:57 -08:00
pang.xunlei	21f7eca555	time: Provide y2038 safe do_settimeofday() replacement The kernel uses 32-bit signed value(time_t) for seconds elapsed 1970-01-01:00:00:00, thus it will overflow at 2038-01-19 03:14:08 on 32-bit systems. This is widely known as the y2038 problem. As part of addressing "y2038 problem" for in-kernel uses, this patch adds safe do_settimeofday64() using timespec64. After this patch, do_settimeofday() is deprecated and all its call sites will be fixed using do_settimeofday64(), after that it can be removed. Signed-off-by: pang.xunlei <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-11-21 11:59:57 -08:00
pang.xunlei	659bc17b80	time: Complete NTP adjustment threshold judging conditions The clocksource mult-adjustment threshold is [mult-maxadj, mult+maxadj], timekeeping_adjust() only deals with the upper threshold, but misses the lower threshold. This patch adds the lower threshold judging condition. Signed-off-by: pang.xunlei <pang.xunlei@linaro.org> [jstultz: Minor fix for > 80 char line] Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-11-21 11:59:56 -08:00
pang.xunlei	6067dc5a8c	time: Avoid possible NTP adjustment mult overflow. Ideally, __clocksource_updatefreq_scale, selects the largest shift value possible for a clocksource. This results in the mult memember of struct clocksource being particularly large, although not so large that NTP would adjust the clock to cause it to overflow. That said, nothing actually prohibits an overflow from occuring, its just that it "shouldn't" occur. So while very unlikely, and so far never observed, the value of (cs->mult+cs->maxadj) may have a chance to reach very near 0xFFFFFFFF, so there is a possibility it may overflow when doing NTP positive adjustment See the following detail: When NTP slewes the clock, kernel goes through update_wall_time()->...->timekeeping_apply_adjustment(): tk->tkr.mult += mult_adj; Since there is no guard against it, its possible tk->tkr.mult may overflow during this operation. This patch avoids any possible mult overflow by judging the overflow case before adding mult_adj to mult, also adds the WARNING message when capturing such case. Signed-off-by: pang.xunlei <pang.xunlei@linaro.org> [jstultz: Reworded commit message] Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-11-21 11:59:56 -08:00
John Stultz	fd866e2b11	time: Rename udelay_test.c to test_udelay.c Kees requested that this test module be renamed for consistency sake, so this patch renames the udelay_test.c file (recently added to tip/timers/core for 3.17) to test_udelay.c Cc: Kees Cook <keescook@chromium.org> Cc: Greg KH <greg@kroah.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Linux-Next <linux-next@vger.kernel.org> Cc: David Riley <davidriley@chromium.org> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-11-21 11:59:55 -08:00
Masami Hiramatsu	1d70be34df	kprobes: Add IPMODIFY flag to kprobe_ftrace_ops Add FTRACE_OPS_FL_IPMODIFY flag to kprobe_ftrace_ops since kprobes can changes regs->ip. Link: http://lkml.kernel.org/r/20141121102523.11844.21298.stgit@localhost.localdomain Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-21 14:44:15 -05:00
Masami Hiramatsu	f8b8be8a31	ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict Introduce FTRACE_OPS_FL_IPMODIFY to avoid conflict among ftrace users who may modify regs->ip to change the execution path. If two or more users modify the regs->ip on the same function entry, one of them will be broken. So they must add IPMODIFY flag and make sure that ftrace_set_filter_ip() succeeds. Note that ftrace doesn't allow ftrace_ops which has IPMODIFY flag to have notrace hash, and the ftrace_ops must have a filter hash (so that the ftrace_ops can hook only specific entries), because it strongly depends on the address and must be allowed for only few selected functions. Link: http://lkml.kernel.org/r/20141121102516.11844.27829.stgit@localhost.localdomain Cc: Jiri Kosina <jkosina@suse.cz> Cc: Seth Jennings <sjenning@redhat.com> Cc: Petr Mladek <pmladek@suse.cz> Cc: Vojtech Pavlik <vojtech@suse.cz> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> [ fixed up some of the comments ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-21 14:42:10 -05:00
Steven Rostedt (Red Hat)	04b74b27c2	printk/percpu: Define printk_func when printk is not defined To avoid include hell, the per_cpu variable printk_func was declared in percpu.h. But it is only defined if printk is defined. As users of printk may also use the printk_func variable, it needs to be defined even if CONFIG_PRINTK is not. Also add a printk.h include in percpu.h just to be safe. Link: http://lkml.kernel.org/r/20141121183215.01ba539c@canb.auug.org.au Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-21 11:19:15 -05:00
Steven Rostedt (Red Hat)	0af26492d5	tracing/trivial: Fix typos and make an int into a bool Fix up a few typos in comments and convert an int into a bool in update_traceon_count(). Link: http://lkml.kernel.org/r/546DD445.5080108@hitachi.com Suggested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-20 10:05:36 -05:00
Jiri Kosina	a02001086b	Merge Linus' tree to be be to apply submitted patches to newer code than current trivial.git base	2014-11-20 14:42:02 +01:00
Frans Klaver	eff264efee	kernel: trace: fix printk message s,produciton,production Signed-off-by: Frans Klaver <frans.klaver@xsens.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2014-11-20 14:29:19 +01:00
Ingo Molnar	d360b78f99	Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Streamline RCU's use of per-CPU variables, shifting from "cpu" arguments to functions to "this_"-style per-CPU variable accessors. - Signal-handling RCU updates. - Real-time updates. - Torture-test updates. - Miscellaneous fixes. - Documentation updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-20 08:57:58 +01:00
Steven Rostedt (Red Hat)	afdc34a3d3	printk: Add per_cpu printk func to allow printk to be diverted Being able to divert printk to call another function besides the normal logging is useful for such things like NMI handling. If some functions are to be called from NMI that does printk() it is possible to lock up the box if the nmi handler triggers when another printk is happening. One example of this use is to perform a stack trace on all CPUs via NMI. But if the NMI is to do the printk() it can cause the system to lock up. By allowing the printk to be diverted to another function that can safely record the printk output and then print it when it in a safe context then NMIs will be safe to call these functions like show_regs(). Link: http://lkml.kernel.org/p/20140619213952.209176403@goodmis.org Tested-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:21 -05:00
Steven Rostedt (Red Hat)	8d58e99af5	seq_buf: Move the seq_buf code to lib/ The seq_buf functions are rather useful outside of tracing. Instead of having it be dependent on CONFIG_TRACING, move the code into lib/ and allow other users to have access to it even when tracing is not configured. The seq_buf utility is similar to the seq_file utility, but instead of writing sending data back up to userland, it writes it into a buffer defined at seq_buf_init(). This allows us to send a descriptor around that writes printf() formatted strings into it that can be retrieved later. It is currently used by the tracing facility for such things like trace events to convert its binary saved data in the ring buffer into an ASCII human readable context to be displayed in /sys/kernel/debug/trace. It can also be used for doing NMI prints safely from NMI context into the seq_buf and retrieved later and dumped to printk() safely. Doing printk() from an NMI context is dangerous because an NMI can preempt a current printk() and deadlock on it. Link: http://lkml.kernel.org/p/20140619213952.058255809@goodmis.org Tested-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:20 -05:00
Steven Rostedt (Red Hat)	2448913ed2	seq-buf: Make seq_buf_bprintf() conditional on CONFIG_BINARY_PRINTF The function bstr_printf() from lib/vsprnintf.c is only available if CONFIG_BINARY_PRINTF is defined. This is due to the only user currently being the tracing infrastructure, which needs to select this config when tracing is configured. Until there is another user of the binary printf formats, this will continue to be the case. Since seq_buf.c is now lives in lib/ and is compiled even without tracing, it must encompass its use of bstr_printf() which is used by seq_buf_printf(). This too is only used by the tracing infrastructure and is still encapsulated by the CONFIG_BINARY_PRINTF. Link: http://lkml.kernel.org/r/20141104160222.969013383@goodmis.org Tested-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:19 -05:00
Steven Rostedt (Red Hat)	01cb06a4c2	tracing: Add seq_buf_get_buf() and seq_buf_commit() helper functions Add two helper functions; seq_buf_get_buf() and seq_buf_commit() that are used by seq_buf_path(). This makes the code similar to the seq_file: seq_path() function, and will help to be able to consolidate the functions between seq_file and trace_seq. Link: http://lkml.kernel.org/r/20141104160222.644881406@goodmis.org Link: http://lkml.kernel.org/r/20141114011412.977571447@goodmis.org Tested-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:18 -05:00
Steven Rostedt (Red Hat)	8cd709ae76	tracing: Have seq_buf use full buffer Currently seq_buf is full when all but one byte of the buffer is filled. Change it so that the seq_buf is full when all of the buffer is filled. Some of the functions would fill the buffer completely and report everything was fine. This was inconsistent with the max of size - 1. Changing this to be max of size makes all functions consistent. Link: http://lkml.kernel.org/r/20141104160222.502133196@goodmis.org Link: http://lkml.kernel.org/r/20141114011412.811957882@goodmis.org Tested-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:17 -05:00
Steven Rostedt (Red Hat)	9b77215382	seq_buf: Add seq_buf_can_fit() helper function Add a seq_buf_can_fit() helper function that removes the possible mistakes of comparing the seq_buf length plus added data compared to the size of the buffer. Link: http://lkml.kernel.org/r/20141118164025.GL23958@pathway.suse.cz Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:17 -05:00
Steven Rostedt (Red Hat)	820b75f63d	tracing: Add paranoid size check in trace_printk_seq() To be really paranoid about writing out of bound data in trace_printk_seq(), add another check of len compared to size. Link: http://lkml.kernel.org/r/20141119144004.GB2332@dhcp128.suse.cz Suggested-by: Petr Mladek <pmladek@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:16 -05:00
Steven Rostedt (Red Hat)	5ac4837841	tracing: Use trace_seq_used() and seq_buf_used() instead of len As the seq_buf->len will soon be +1 size when there's an overflow, we must use trace_seq_used() or seq_buf_used() methods to get the real length. This will prevent buffer overflow issues if just the len of the seq_buf descriptor is used to copy memory. Link: http://lkml.kernel.org/r/20141114121911.09ba3d38@gandalf.local.home Reported-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:15 -05:00
Steven Rostedt (Red Hat)	74f06bb723	tracing: Clean up tracing_fill_pipe_page() The function tracing_fill_pipe_page() logic is a little confusing with the use of count saving the seq.len and reusing it. Instead of subtracting a number that is calculated from the saved value of the seq.len from seq.len, just save the seq.len at the start and if we need to reset it, just assign it again. When the seq_buf overflow is len == size + 1, the current logic will break. Changing it to use a saved length for resetting back to the original value is more robust and will work when we change the way seq_buf sets the overflow. Link: http://lkml.kernel.org/r/20141118161546.GJ23958@pathway.suse.cz Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:14 -05:00
Steven Rostedt (Red Hat)	eeab98154d	seq_buf: Create seq_buf_used() to find out how much was written Add a helper function seq_buf_used() that replaces the SEQ_BUF_USED() private macro to let callers have a method to know how much of the seq_buf was written to. Link: http://lkml.kernel.org/r/20141114011412.170377300@goodmis.org Link: http://lkml.kernel.org/r/20141114011413.321654244@goodmis.org Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:13 -05:00
Steven Rostedt (Red Hat)	dd23180aac	tracing: Convert seq_buf_path() to be like seq_path() Rewrite seq_buf_path() like it is done in seq_path() and allow it to accept any escape character instead of just "\n". Making seq_buf_path() like seq_path() will help prevent problems when converting seq_file to use the seq_buf logic. Link: http://lkml.kernel.org/r/20141104160222.048795666@goodmis.org Link: http://lkml.kernel.org/r/20141114011412.338523371@goodmis.org Tested-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:10 -05:00
Steven Rostedt (Red Hat)	3a161d99c4	tracing: Create seq_buf layer in trace_seq Create a seq_buf layer that trace_seq sits on. The seq_buf will not be limited to page size. This will allow other usages of seq_buf instead of a hard set PAGE_SIZE one that trace_seq has. Link: http://lkml.kernel.org/r/20141104160221.864997179@goodmis.org Link: http://lkml.kernel.org/r/20141114011412.170377300@goodmis.org Tested-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:09 -05:00
Markus Elfring	16a8ef2751	tracing: Deletion of an unnecessary check before iput() The iput() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Link: http://lkml.kernel.org/r/5468F875.7080907@users.sourceforge.net Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 16:28:45 -05:00
Alexei Starovoitov	daaf427c6a	bpf: fix arraymap NULL deref and missing overflow and zero size checks - fix NULL pointer dereference: kernel/bpf/arraymap.c:41 array_map_alloc() error: potential null dereference 'array'. (kzalloc returns null) kernel/bpf/arraymap.c:41 array_map_alloc() error: we previously assumed 'array' could be null (see line 40) - integer overflow check was missing in arraymap (hashmap checks for overflow via kmalloc_array()) - arraymap can round_up(value_size, 8) to zero. check was missing. - hashmap was missing zero size check as well, since roundup_pow_of_two() can truncate into zero - found a typo in the arraymap comment and unnecessary empty line Fix all of these issues and make both overflow checks explicit U32 in size. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-19 15:40:00 -05:00
Steven Rostedt (Red Hat)	8e2e095cbe	tracing: Fix return value of ftrace_raw_output_prep() If the trace_seq of ftrace_raw_output_prep() is full this function returns TRACE_TYPE_PARTIAL_LINE, otherwise it returns zero. The problem is that TRACE_TYPE_PARTIAL_LINE happens to be zero! The thing is, the caller of ftrace_raw_output_prep() expects a success to be zero. Change that to expect it to be TRACE_TYPE_HANDLED. Link: http://lkml.kernel.org/r/20141114112522.GA2988@dhcp128.suse.cz Reminded-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:48 -05:00
Steven Rostedt (Red Hat)	dba39448ab	tracing: Remove return values of most trace_seq_*() functions The trace_seq_printf() and friends are used to store strings into a buffer that can be passed around from function to function. If the trace_seq buffer fills up, it will not print any more. The return values were somewhat inconsistant and using trace_seq_has_overflowed() was a better way to know if the write to the trace_seq buffer succeeded or not. Now that all users have removed reading the return value of the printf() type functions, they can safely return void and keep future users of them from reading the inconsistent values as well. Link: http://lkml.kernel.org/r/20141114011411.992510720@goodmis.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:47 -05:00
Steven Rostedt (Red Hat)	183742f08c	tracing: Do not use return values of trace_seq_printf() in syscall tracing The functions trace_seq_printf() and friends will not be returning values soon and will be void functions. To know if they succeeded or not, the functions trace_seq_has_overflowed() and trace_handle_return() should be used instead. Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:46 -05:00
Steven Rostedt (Red Hat)	8579a107a6	tracing/uprobes: Do not use return values of trace_seq_printf() The functions trace_seq_printf() and friends will soon no longer have return values. Using trace_seq_has_overflowed() and trace_handle_return() should be used instead. Link: http://lkml.kernel.org/r/20141114011411.693008134@goodmis.org Link: http://lkml.kernel.org/r/20141115050602.333705855@goodmis.org Reviewed-by: Masami Hiramatsu <masami.hiramatu.pt@hitachi.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:45 -05:00
Steven Rostedt (Red Hat)	d2b0191a38	tracing/probes: Do not use return value of trace_seq_printf() The functions trace_seq_printf() and friends will soon not have a return value and will only be a void function. Use trace_seq_has_overflowed() instead to know if the trace_seq operations succeeded or not. Link: http://lkml.kernel.org/r/20141114011411.530216306@goodmis.org Reviewed-by: Petr Mladek <pmladek@suse.cz> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:44 -05:00
Steven Rostedt (Red Hat)	a72e10afab	tracing: Do not check return values of trace_seq_p*() for mmio tracer The return values for trace_seq_printf() and friends are going to be removed and they will become void functions. The mmio tracer checked their return and even did so incorrectly. Some of the funtions which returned the values were never checked themselves. Removing all the checks simplifies the code. Use trace_seq_has_overflowed() and trace_handle_return() where necessary instead. Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:44 -05:00
Steven Rostedt (Red Hat)	85224da0b8	kprobes/tracing: Use trace_seq_has_overflowed() for overflow checks Instead of checking the return value of trace_seq_printf() and friends for overflowing of the buffer, use the trace_seq_has_overflowed() helper function. This cleans up the code quite a bit and also takes us a step closer to changing the return values of trace_seq_printf() and friends to void. Link: http://lkml.kernel.org/r/20141114011411.181812785@goodmis.org Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Petr Mladek <pmladek@suse.cz> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:43 -05:00
Steven Rostedt (Red Hat)	9d9add34ec	tracing: Have function_graph use trace_seq_has_overflowed() Instead of doing individual checks all over the place that makes the code very messy. Just check trace_seq_has_overflowed() at the end or in strategic places. This makes the code much cleaner and also helps with getting closer to removing the return values of trace_seq_printf() and friends. Link: http://lkml.kernel.org/r/20141114011410.987913836@goodmis.org Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:42 -05:00
Steven Rostedt (Red Hat)	7d40f67165	tracing: Have branch tracer use trace_handle_return() helper function The branch tracer should not be checking the trace_seq_printf() return value as that will soon be void. There's a new trace_handle_return() helper function that will return TRACE_TYPE_PARTIAL_LINE if the trace_seq overflowed and TRACE_TYPE_HANDLED otherwise. Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:41 -05:00
Steven Rostedt (Red Hat)	c0cd93aa16	ring-buffer: Remove check of trace_seq_{puts,printf}() return values Remove checking the return value of all trace_seq_puts(). It was wrong anyway as only the last return value mattered. But as the trace_seq_puts() is going to be a void function in the future, we should not be checking the return value of it anyway. Just return !trace_seq_has_overflowed() instead. Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:40 -05:00
Steven Rostedt (Red Hat)	f4a1d08ce6	blktrace/tracing: Use trace_seq_has_overflowed() helper function Checking the return code of every trace_seq_printf() operation and having to return early if it overflowed makes the code messy. Using the new trace_seq_has_overflowed() and trace_handle_return() functions allows us to clean up the code. In the future, trace_seq_printf() and friends will be turning into void functions and not returning a value. The trace_seq_has_overflowed() is to be used instead. This cleanup allows that change to take place. Cc: Jens Axboe <axboe@fb.com> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:39 -05:00
Steven Rostedt (Red Hat)	19a7fe2062	tracing: Add trace_seq_has_overflowed() and trace_handle_return() Adding a trace_seq_has_overflowed() which returns true if the trace_seq had too much written into it allows us to simplify the code. Instead of checking the return value of every call to trace_seq_printf() and friends, they can all be called normally, and at the end we can return !trace_seq_has_overflowed() instead. Several functions also return TRACE_TYPE_PARTIAL_LINE when the trace_seq overflowed and TRACE_TYPE_HANDLED otherwise. Another helper function was created called trace_handle_return() which takes a trace_seq and returns these enums. Using this helper function also simplifies the code. This change also makes it possible to remove the return values of trace_seq_printf() and friends. They should instead just be void functions. Link: http://lkml.kernel.org/r/20141114011410.365183157@goodmis.org Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:39 -05:00
Steven Rostedt (Red Hat)	e400a40cff	tracing: Fix trace_seq_bitmask() to start at current position In trace_seq_bitmask() it calls bitmap_scnprintf() not from the current position of the trace_seq buffer (s->buffer + s->len), but instead from the beginning of the buffer (s->buffer). Luckily, the only user of this "ipi_raise tracepoint" uses it as the first parameter, and as such, the start of the temp buffer in include/trace/ftrace.h (see __get_bitmask()). Reported-by: Petr Mladek <pmladek@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:38 -05:00
Steven Rostedt (Red Hat)	aec0be2d6e	ftrace/x86/extable: Add is_ftrace_trampoline() function Stack traces that happen from function tracing check if the address on the stack is a __kernel_text_address(). That is, is the address kernel code. This calls core_kernel_text() which returns true if the address is part of the builtin kernel code. It also calls is_module_text_address() which returns true if the address belongs to module code. But what is missing is ftrace dynamically allocated trampolines. These trampolines are allocated for individual ftrace_ops that call the ftrace_ops callback functions directly. But if they do a stack trace, the code checking the stack wont detect them as they are neither core kernel code nor module address space. Adding another field to ftrace_ops that also stores the size of the trampoline assigned to it we can create a new function called is_ftrace_trampoline() that returns true if the address is a dynamically allocate ftrace trampoline. Note, it ignores trampolines that are not dynamically allocated as they will return true with the core_kernel_text() function. Link: http://lkml.kernel.org/r/20141119034829.497125839@goodmis.org Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:26 -05:00
Al Viro	9f45f5bf30	new helper: audit_file() ... for situations when we don't have any candidate in pathnames - basically, in descriptor-based syscalls. [Folded the build fix for !CONFIG_AUDITSYSCALL configs from Chen Gang] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-11-19 13:01:26 -05:00
Al Viro	b583043e99	kill f_dentry uses Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-11-19 13:01:25 -05:00
Alexey Ishchuk	4eafad7feb	s390/kernel: add system calls for PCI memory access Add the new __NR_s390_pci_mmio_write and __NR_s390_pci_mmio_read system calls to allow user space applications to access device PCI I/O memory pages on s390x platform. [ Martin Schwidefsky: some code beautification ] Signed-off-by: Alexey Ishchuk <aishchuk@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2014-11-19 09:46:43 +01:00
Steven Rostedt (Red Hat)	a9ce7c36aa	tracing: Fix race of function probes counting The function probe counting for traceon and traceoff suffered a race condition where if the probe was executing on two or more CPUs at the same time, it could decrement the counter by more than one when disabling (or enabling) the tracer only once. The way the traceon and traceoff probes are suppose to work is that they disable (or enable) tracing once per count. If a user were to echo 'schedule:traceoff:3' into set_ftrace_filter, then when the schedule function was called, it would disable tracing. But the count should only be decremented once (to 2). Then if the user enabled tracing again (via tracing_on file), the next call to schedule would disable tracing again and the count would be decremented to 1. But if multiple CPUS called schedule at the same time, it is possible that the count would be decremented more than once because of the simple "count--" used. By reading the count into a local variable and using memory barriers we can guarantee that the count would only be decremented once per disable (or enable). The stack trace probe had a similar race, but here the stack trace will decrement for each time it is called. But this had the read-modify- write race, where it could stack trace more than the number of times that was specified. This case we use a cmpxchg to stack trace only the number of times specified. The dump probes can still use the old "update_count()" function as they only run once, and that is controlled by the dump logic itself. Link: http://lkml.kernel.org/r/20141118134643.4b550ee4@gandalf.local.home Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-18 23:06:35 -05:00
Rafael J. Wysocki	b2b49ccbdd	PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected The number of and dependencies between high-level power management Kconfig options make life much harder than necessary. Several conbinations of them have to be tested and supported, even though some of those combinations are very rarely used in practice (if they are used in practice at all). Moreover, the fact that we have separate independent Kconfig options for runtime PM and system suspend is a serious obstacle for integration between the two frameworks. To overcome these difficulties, always select PM_RUNTIME if PM_SLEEP is set. Among other things, this will allow system suspend callbacks provided by bus types and device drivers to rely on the runtime PM framework regardless of the kernel configuration. Enthusiastically-acked-by: Kevin Hilman <khilman@linaro.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-11-18 23:41:40 +01:00
Alexei Starovoitov	7943c0f329	bpf: remove test map scaffolding and user proper types proper types and function helpers are ready. Use them in verifier testsuite. Remove temporary stubs Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-18 13:44:00 -05:00
Alexei Starovoitov	d0003ec01c	bpf: allow eBPF programs to use maps expose bpf_map_lookup_elem(), bpf_map_update_elem(), bpf_map_delete_elem() map accessors to eBPF programs Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-18 13:44:00 -05:00
Alexei Starovoitov	a1854d6ac0	bpf: fix BPF_MAP_LOOKUP_ELEM command return code fix errno of BPF_MAP_LOOKUP_ELEM command as bpf manpage described it in commit b4fc1a460f30("Merge branch 'bpf-next'"): ----- BPF_MAP_LOOKUP_ELEM int bpf_lookup_elem(int fd, void key, void value) { union bpf_attr attr = { .map_fd = fd, .key = ptr_to_u64(key), .value = ptr_to_u64(value), }; return bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr)); } bpf() syscall looks up an element with given key in a map fd. If element is found it returns zero and stores element's value into value. If element is not found it returns -1 and sets errno to ENOENT. and further down in manpage: ENOENT For BPF_MAP_LOOKUP_ELEM or BPF_MAP_DELETE_ELEM, indicates that element with given key was not found. ----- In general all BPF commands return ENOENT when map element is not found (including BPF_MAP_GET_NEXT_KEY and BPF_MAP_UPDATE_ELEM with flags == BPF_MAP_UPDATE_ONLY) Subsequent patch adds a testsuite to check return values for all of these combinations. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-18 13:43:59 -05:00
Alexei Starovoitov	28fbcfa08d	bpf: add array type of eBPF maps add new map type BPF_MAP_TYPE_ARRAY and its implementation - optimized for fastest possible lookup() . in the future verifier/JIT may recognize lookup() with constant key and optimize it into constant pointer. Can optimize non-constant key into direct pointer arithmetic as well, since pointers and value_size are constant for the life of the eBPF program. In other words array_map_lookup_elem() may be 'inlined' by verifier/JIT while preserving concurrent access to this map from user space - two main use cases for array type: . 'global' eBPF variables: array of 1 element with key=0 and value is a collection of 'global' variables which programs can use to keep the state between events . aggregation of tracing events into fixed set of buckets - all array elements pre-allocated and zero initialized at init time - key as an index in array and can only be 4 byte - map_delete_elem() returns EINVAL, since elements cannot be deleted - map_update_elem() replaces elements in an non-atomic way (for atomic updates hashtable type should be used instead) Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-18 13:43:59 -05:00
Alexei Starovoitov	0f8e4bd8a1	bpf: add hashtable type of eBPF maps add new map type BPF_MAP_TYPE_HASH and its implementation - maps are created/destroyed by userspace. Both userspace and eBPF programs can lookup/update/delete elements from the map - eBPF programs can be called in_irq(), so use spin_lock_irqsave() mechanism for concurrent updates - key/value are opaque range of bytes (aligned to 8 bytes) - user space provides 3 configuration attributes via BPF syscall: key_size, value_size, max_entries - map takes care of allocating/freeing key/value pairs - map_update_elem() must fail to insert new element when max_entries limit is reached to make sure that eBPF programs cannot exhaust memory - map_update_elem() replaces elements in an atomic way - optimized for speed of lookup() which can be called multiple times from eBPF program which itself is triggered by high volume of events . in the future JIT compiler may recognize lookup() call and optimize it further, since key_size is constant for life of eBPF program Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-18 13:43:25 -05:00
Alexei Starovoitov	3274f52073	bpf: add 'flags' attribute to BPF_MAP_UPDATE_ELEM command the current meaning of BPF_MAP_UPDATE_ELEM syscall command is: either update existing map element or create a new one. Initially the plan was to add a new command to handle the case of 'create new element if it didn't exist', but 'flags' style looks cleaner and overall diff is much smaller (more code reused), so add 'flags' attribute to BPF_MAP_UPDATE_ELEM command with the following meaning: #define BPF_ANY 0 /* create new element or update existing / #define BPF_NOEXIST 1 / create new element if it didn't exist / #define BPF_EXIST 2 / update existing element / bpf_update_elem(fd, key, value, BPF_NOEXIST) call can fail with EEXIST if element already exists. bpf_update_elem(fd, key, value, BPF_EXIST) can fail with ENOENT if element doesn't exist. Userspace will call it as: int bpf_update_elem(int fd, void key, void *value, __u64 flags) { union bpf_attr attr = { .map_fd = fd, .key = ptr_to_u64(key), .value = ptr_to_u64(value), .flags = flags; }; return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr)); } First two bits of 'flags' are used to encode style of bpf_update_elem() command. Bits 2-63 are reserved for future use. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-18 13:43:25 -05:00
Tejun Heo	eeecbd1971	cgroup: implement cgroup_get_e_css() Implement cgroup_get_e_css() which finds and gets the effective css for the specified cgroup and subsystem combination. This function always returns a valid pinned css. This will be used by cgroup writeback support. While at it, add comment to cgroup_e_css() to explain why that function is different from cgroup_get_e_css() and has to test cgrp->child_subsys_mask instead of cgroup_css(cgrp, ss). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Zefan Li <lizefan@huawei.com>	2014-11-18 02:49:52 -05:00
Tejun Heo	56c807ba4e	cgroup: add cgroup_subsys->css_e_css_changed() Add a new cgroup_subsys operatoin ->css_e_css_changed(). This is invoked if any of the effective csses seen from the css's cgroup may have changed. This will be used to implement cgroup writeback support. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Zefan Li <lizefan@huawei.com>	2014-11-18 02:49:51 -05:00
Tejun Heo	7d172cc89b	cgroup: add cgroup_subsys->css_released() Add a new cgroup subsys callback css_released(). This is called when the reference count of the css (cgroup_subsys_state) reaches zero before RCU scheduling free. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Zefan Li <lizefan@huawei.com>	2014-11-18 02:49:51 -05:00
Tejun Heo	db6e305345	cgroup: fix the async css offline wait logic in cgroup_subtree_control_write() When a subsystem is offlined, its entry on @cgrp->subsys[] is cleared asynchronously. If cgroup_subtree_control_write() is requested to enable the subsystem again before the entry is cleared, it has to wait for the previous offlining to finish and clear the @cgrp->subsys[] entry before trying to enable the subsystem again. This is currently done while verifying the input enable / disable parameters. This used to be correct but `f63070d350` ("cgroup: make interface files visible iff enabled on cgroup->subtree_control") breaks it. The commit is one of the commits implementing subsystem dependency. Through subsystem dependency, some subsystems may be enabled and disabled implicitly in addition to the explicitly requested ones. The actual subsystems to be enabled and disabled are determined during @css_enable/disable calculation. The current offline wait logic skips the ones which are already implicitly enabled and then waits for subsystems in @enable; however, this misses the subsystems which may be implicitly enabled through dependency from @enable. If such implicitly subsystem hasn't yet finished offlining yet, the function ends up trying to create a css when its @cgrp->subsys[] slot is already occupied triggering BUG_ON() in init_and_link_css(). Fix it by moving the wait logic after @css_enable is calculated and waiting for all the subsystems in @css_enable. This fixes the above bug as the mask contains all subsystems which are to be enabled including the ones enabled through dependencies. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: `f63070d350` ("cgroup: make interface files visible iff enabled on cgroup->subtree_control") Acked-by: Zefan Li <lizefan@huawei.com>	2014-11-18 02:49:51 -05:00
Tejun Heo	755bf5ee86	cgroup: restructure child_subsys_mask handling in cgroup_subtree_control_write() Make cgroup_subtree_control_write() first calculate new subtree_control (new_sc), child_subsys_mask (new_ss) and css_enable/disable masks before applying them to the cgroup. Also, store the original subtree_control (old_sc) and child_subsys_mask (old_ss) and use them to restore the orignal state after failure. This patch shouldn't cause any behavior changes. This prepares for a fix for a bug in the async css offline wait logic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Zefan Li <lizefan@huawei.com>	2014-11-18 02:49:50 -05:00
Tejun Heo	0f060deb5c	cgroup: separate out cgroup_calc_child_subsys_mask() from cgroup_refresh_child_subsys_mask() cgroup_refresh_child_subsys_mask() calculates and updates the effective @cgrp->child_subsys_maks according to the current @cgrp->subtree_control. Separate out the calculation part into cgroup_calc_child_subsys_mask(). This will be used to fix a bug in the async css offline wait logic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Zefan Li <lizefan@huawei.com>	2014-11-18 02:49:50 -05:00
Pankaj Dubey	7d3dcd042c	PM: Kconfig: fix unmet dependency for CPU_PM If BL_SWITCHER is enabled but SUSPEND and CPU_IDLE is not enabled we are getting following config warning. warning: (BL_SWITCHER) selects CPU_PM which has unmet direct dependencies (SUSPEND \|\| CPU_IDLE) It has been noticed that CPU_PM dependencies in this file are not really required so let's remove these dependencies from CPU_PM. Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-11-18 02:51:17 +01:00
Markus Elfring	6c45de0d51	PM / hibernate: Deletion of an unnecessary check before the function call "vfree" The vfree() function performs also input parameter validation. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-11-18 02:49:40 +01:00
Rafael J. Wysocki	c68cfdc0a5	Merge back 'pm-sleep' material for 3.19-rc1.	2014-11-18 01:23:34 +01:00
Dave Hansen	fe3d197f84	x86, mpx: On-demand kernel allocation of bounds tables This is really the meat of the MPX patch set. If there is one patch to review in the entire series, this is the one. There is a new ABI here and this kernel code also interacts with userspace memory in a relatively unusual manner. (small FAQ below). Long Description: This patch adds two prctl() commands to provide enable or disable the management of bounds tables in kernel, including on-demand kernel allocation (See the patch "on-demand kernel allocation of bounds tables") and cleanup (See the patch "cleanup unused bound tables"). Applications do not strictly need the kernel to manage bounds tables and we expect some applications to use MPX without taking advantage of this kernel support. This means the kernel can not simply infer whether an application needs bounds table management from the MPX registers. The prctl() is an explicit signal from userspace. PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace to require kernel's help in managing bounds tables. PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace don't want kernel's help any more. With PR_MPX_DISABLE_MANAGEMENT, the kernel won't allocate and free bounds tables even if the CPU supports MPX. PR_MPX_ENABLE_MANAGEMENT will fetch the base address of the bounds directory out of a userspace register (bndcfgu) and then cache it into a new field (->bd_addr) in the 'mm_struct'. PR_MPX_DISABLE_MANAGEMENT will set "bd_addr" to an invalid address. Using this scheme, we can use "bd_addr" to determine whether the management of bounds tables in kernel is enabled. Also, the only way to access that bndcfgu register is via an xsaves, which can be expensive. Caching "bd_addr" like this also helps reduce the cost of those xsaves when doing table cleanup at munmap() time. Unfortunately, we can not apply this optimization to #BR fault time because we need an xsave to get the value of BNDSTATUS. ==== Why does the hardware even have these Bounds Tables? ==== MPX only has 4 hardware registers for storing bounds information. If MPX-enabled code needs more than these 4 registers, it needs to spill them somewhere. It has two special instructions for this which allow the bounds to be moved between the bounds registers and some new "bounds tables". They are similar conceptually to a page fault and will be raised by the MPX hardware during both bounds violations or when the tables are not present. This patch handles those #BR exceptions for not-present tables by carving the space out of the normal processes address space (essentially calling the new mmap() interface indroduced earlier in this patch set.) and then pointing the bounds-directory over to it. The tables need to be accessed and controlled by userspace because the instructions for moving bounds in and out of them are extremely frequent. They potentially happen every time a register pointing to memory is dereferenced. Any direct kernel involvement (like a syscall) to access the tables would obviously destroy performance. ==== Why not do this in userspace? ==== This patch is obviously doing this allocation in the kernel. However, MPX does not strictly require anything in the kernel. It can theoretically be done completely from userspace. Here are a few ways this could be done. I don't think any of them are practical in the real-world, but here they are. Q: Can virtual space simply be reserved for the bounds tables so that we never have to allocate them? A: As noted earlier, these tables are HUGE. An X-GB virtual area needs 4X GB of virtual space, plus 2GB for the bounds directory. If we were to preallocate them for the 128TB of user virtual address space, we would need to reserve 512TB+2GB, which is larger than the entire virtual address space today. This means they can not be reserved ahead of time. Also, a single process's pre-popualated bounds directory consumes 2GB of virtual AND* physical memory. IOW, it's completely infeasible to prepopulate bounds directories. Q: Can we preallocate bounds table space at the same time memory is allocated which might contain pointers that might eventually need bounds tables? A: This would work if we could hook the site of each and every memory allocation syscall. This can be done for small, constrained applications. But, it isn't practical at a larger scale since a given app has no way of controlling how all the parts of the app might allocate memory (think libraries). The kernel is really the only place to intercept these calls. Q: Could a bounds fault be handed to userspace and the tables allocated there in a signal handler instead of in the kernel? A: (thanks to tglx) mmap() is not on the list of safe async handler functions and even if mmap() would work it still requires locking or nasty tricks to keep track of the allocation state there. Having ruled out all of the userspace-only approaches for managing bounds tables that we could think of, we create them on demand in the kernel. Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151829.AD4310DE@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 00:58:53 +01:00
Qiaowei Ren	ee1b58d36a	mpx: Extend siginfo structure to include bound violation information This patch adds new fields about bound violation into siginfo structure. si_lower and si_upper are respectively lower bound and upper bound when bound violation is caused. Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151819.1908C900@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 00:58:53 +01:00
Richard Guy Briggs	0288d7183c	audit: convert status version to a feature bitmap The version field defined in the audit status structure was found to have limitations in terms of its expressibility of features supported. This is distict from the get/set features call to be able to command those features that are present. Converting this field from a version number to a feature bitmap will allow distributions to selectively backport and support certain features and will allow upstream to be able to deprecate features in the future. It will allow userspace clients to first query the kernel for which features are actually present and supported. Currently, EINVAL is returned rather than EOPNOTSUP, which isn't helpful in determining if there was an error in the command, or if it simply isn't supported yet. Past features are not represented by this bitmap, but their use may be converted to EOPNOTSUP if needed in the future. Since "version" is too generic to convert with a #define, use a union in the struct status, introducing the member "feature_bitmap" unionized with "version". Convert existing AUDIT_VERSION_* macros over to AUDIT_FEATURE_BITMAP* counterparts, leaving the former for backwards compatibility. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> [PM: minor whitespace tweaks] Signed-off-by: Paul Moore <pmoore@redhat.com>	2014-11-17 16:53:51 -05:00
Peter Zijlstra	2565711fb7	perf: Improve the perf_sample_data struct layout This patch reorders fields in the perf_sample_data struct in order to minimize the number of cachelines touched in perf_sample_data_init(). It also removes some intializations which are redundant with the code in kernel/events/core.c Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1411559322-16548-7-git-send-email-eranian@google.com Cc: cebbert.lkml@gmail.com Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: jolsa@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 11:42:04 +01:00
Stephane Eranian	60e2364e60	perf: Add ability to sample machine state on interrupt Enable capture of interrupted machine state for each sample. Registers to sample are passed per event in the sample_regs_intr bitmask. To sample interrupt machine state, the PERF_SAMPLE_INTR_REGS must be passed in sample_type. The list of available registers is arch dependent and provided by asm/perf_regs.h Registers are laid out as u64 in the order of the bit order of sample_intr_regs. This patch also adds a new ABI version PERF_ATTR_SIZE_VER4 because we extend the perf_event_attr struct with a new u64 field. Reviewed-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: cebbert.lkml@gmail.com Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/1411559322-16548-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 11:41:57 +01:00
Wanpeng Li	36ce98818a	sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK Introduce start_hrtick_dl for !CONFIG_SCHED_HRTICK to align with the fair class. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1415670747-58726-1-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 10:59:03 +01:00
pang.xunlei	c1a2b5f629	sched/deadline: Remove unnecessary definitions in cpudeadline.h Actually, cpudl_set() and cpudl_init() can never be used without CONFIG_SMP. Signed-off-by: pang.xunlei <pang.xunlei@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1415260327-30465-4-git-send-email-pang.xunlei@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 10:59:00 +01:00
pang.xunlei	74e6942fbc	sched/cpupri: Remove unnecessary definitions in cpupri.h Actually, cpupri_set() and cpupri_init() can never be used without CONFIG_SMP. Signed-off-by: pang.xunlei <pang.xunlei@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: "pang.xunlei" <pang.xunlei@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1415260327-30465-1-git-send-email-pang.xunlei@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 10:58:59 +01:00
Wanpeng Li	c51b8ab5ad	sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task() Do not call dequeue_pushable_dl_task() when failing to push an eligible task, as it remains pushable, merely not at this particular moment. Actually the patch is the same behavior as commit `311e800e16` ("sched, rt: Fix rq->rt.pushable_tasks bug in push_rt_task()" in -rt side. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1415258564-8573-1-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 10:58:57 +01:00
Wanpeng Li	cb0b9f2445	sched/fair: Fix stale overloaded status in the busiest group finding logic Commit `caeb178c60` ("sched/fair: Make update_sd_pick_busiest() return 'true' on a busier sd") changes groups to be ranked in the order of overloaded > imbalance > other, and busiest group is picked according to this order. sgs->group_capacity_factor is used to check if the group is overloaded. When the child domain prefers tasks to go to siblings first, the sgs->group_capacity_factor will be set lower than one in order to move all the excess tasks away. However, group overloaded status is not updated when sgs->group_capacity_factor is set to lower than one, which leads to us missing to find the busiest group. This patch fixes it by updating group overloaded status when sg capacity factor is set to one, in order to find the busiest group accurately. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1415144690-25196-1-git-send-email-wanpeng.li@linux.intel.com [ Fixed the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 10:58:56 +01:00
Wanpeng Li	6c1d9410f0	sched: Move p->nr_cpus_allowed check to select_task_rq() Move the p->nr_cpus_allowed check into kernel/sched/core.c: select_task_rq(). This change will make fair.c, rt.c, and deadline.c all start with the same logic. Suggested-and-Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "pang.xunlei" <pang.xunlei@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1415150077-59053-1-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 10:58:55 +01:00
Wolfram Sang	a1bd537335	sched/completion: Document when to use wait_for_completion_io_*() As discussed in [1], accounting IO is meant for blkio only. Document that so driver authors won't use them for device io. [1] http://thread.gmane.org/gmane.linux.drivers.i2c/20470 Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1415098901-2768-1-git-send-email-wsa@the-dreams.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 10:58:54 +01:00
Kirill Tkhai	753899183c	sched/fair: Kill task_struct::numa_entry and numa_group::task_list Nobody iterates over numa_group::task_list, this just confuses the readers. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1415358456.28592.17.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 10:58:48 +01:00
Ingo Molnar	e9ac5f0fa8	Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying more changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 10:50:25 +01:00
Stanislaw Gruszka	6e998916df	sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency Commit `d670ec1317` "posix-cpu-timers: Cure SMP wobbles" fixes one glibc test case in cost of breaking another one. After that commit, calling clock_nanosleep(TIMER_ABSTIME, X) and then clock_gettime(&Y) can result of Y time being smaller than X time. Reproducer/tester can be found further below, it can be compiled and ran by: gcc -o tst-cpuclock2 tst-cpuclock2.c -pthread while ./tst-cpuclock2 ; do : ; done This reproducer, when running on a buggy kernel, will complain about "clock_gettime difference too small". Issue happens because on start in thread_group_cputimer() we initialize sum_exec_runtime of cputimer with threads runtime not yet accounted and then add the threads runtime to running cputimer again on scheduler tick, making it's sum_exec_runtime bigger than actual threads runtime. KOSAKI Motohiro posted a fix for this problem, but that patch was never applied: https://lkml.org/lkml/2013/5/26/191 . This patch takes different approach to cure the problem. It calls update_curr() when cputimer starts, that assure we will have updated stats of running threads and on the next schedule tick we will account only the runtime that elapsed from cputimer start. That also assure we have consistent state between cpu times of individual threads and cpu time of the process consisted by those threads. Full reproducer (tst-cpuclock2.c): #define _GNU_SOURCE #include <unistd.h> #include <sys/syscall.h> #include <stdio.h> #include <time.h> #include <pthread.h> #include <stdint.h> #include <inttypes.h> /* Parameters for the Linux kernel ABI for CPU clocks. / #define CPUCLOCK_SCHED 2 #define MAKE_PROCESS_CPUCLOCK(pid, clock) \ ((~(clockid_t) (pid) << 3) \| (clockid_t) (clock)) static pthread_barrier_t barrier; / Help advance the clock. / static void chew_cpu(void arg) { pthread_barrier_wait(&barrier); while (1) ; return NULL; } / Don't use the glibc wrapper. / static int do_nanosleep(int flags, const struct timespec req) { clockid_t clock_id = MAKE_PROCESS_CPUCLOCK(0, CPUCLOCK_SCHED); return syscall(SYS_clock_nanosleep, clock_id, flags, req, NULL); } static int64_t tsdiff(const struct timespec before, const struct timespec after) { int64_t before_i = before->tv_sec * 1000000000ULL + before->tv_nsec; int64_t after_i = after->tv_sec * 1000000000ULL + after->tv_nsec; return after_i - before_i; } int main(void) { int result = 0; pthread_t th; pthread_barrier_init(&barrier, NULL, 2); if (pthread_create(&th, NULL, chew_cpu, NULL) != 0) { perror("pthread_create"); return 1; } pthread_barrier_wait(&barrier); /* The test. / struct timespec before, after, sleeptimeabs; int64_t sleepdiff, diffabs; const struct timespec sleeptime = {.tv_sec = 0,.tv_nsec = 100000000 }; / The relative nanosleep. Not sure why this is needed, but its presence seems to make it easier to reproduce the problem. / if (do_nanosleep(0, &sleeptime) != 0) { perror("clock_nanosleep"); return 1; } / Get the current time. / if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &before) < 0) { perror("clock_gettime[2]"); return 1; } / Compute the absolute sleep time based on the current time. / uint64_t nsec = before.tv_nsec + sleeptime.tv_nsec; sleeptimeabs.tv_sec = before.tv_sec + nsec / 1000000000; sleeptimeabs.tv_nsec = nsec % 1000000000; / Sleep for the computed time. / if (do_nanosleep(TIMER_ABSTIME, &sleeptimeabs) != 0) { perror("absolute clock_nanosleep"); return 1; } / Get the time after the sleep. / if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &after) < 0) { perror("clock_gettime[3]"); return 1; } / The time after sleep should always be equal to or after the absolute sleep time passed to clock_nanosleep. / sleepdiff = tsdiff(&sleeptimeabs, &after); if (sleepdiff < 0) { printf("absolute clock_nanosleep woke too early: %" PRId64 "\n", sleepdiff); result = 1; printf("Before %llu.%09llu\n", before.tv_sec, before.tv_nsec); printf("After %llu.%09llu\n", after.tv_sec, after.tv_nsec); printf("Sleep %llu.%09llu\n", sleeptimeabs.tv_sec, sleeptimeabs.tv_nsec); } / The difference between the timestamps taken before and after the clock_nanosleep call should be equal to or more than the duration of the sleep. */ diffabs = tsdiff(&before, &after); if (diffabs < sleeptime.tv_nsec) { printf("clock_gettime difference too small: %" PRId64 "\n", diffabs); result = 1; } pthread_cancel(th); return result; } Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141112155843.GA24803@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 10:04:20 +01:00
Peter Zijlstra	23cfa361f3	sched/cputime: Fix cpu_timer_sample_group() double accounting While looking over the cpu-timer code I found that we appear to add the delta for the calling task twice, through: cpu_timer_sample_group() thread_group_cputimer() thread_group_cputime() times->sum_exec_runtime += task_sched_runtime(); *sample = cputime.sum_exec_runtime + task_delta_exec(); Which would make the sample run ahead, making the sleep short. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20141112113737.GI10476@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 10:04:18 +01:00
Peter Zijlstra	7af683350c	sched/numa: Avoid selecting oneself as swap target Because the whole numa task selection stuff runs with preemption enabled (its long and expensive) we can end up migrating and selecting oneself as a swap target. This doesn't really work out well -- we end up trying to acquire the same lock twice for the swap migrate -- so avoid this. Reported-and-Tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141110100328.GF29390@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 10:04:17 +01:00
Mark Rutland	226424eee8	perf: Fix corruption of sibling list with hotplug When a CPU hotplugged out, we call perf_remove_from_context() (via perf_event_exit_cpu()) to rip each CPU-bound event out of its PMU's cpu context, but leave siblings grouped together. Freeing of these events is left to the mercy of the usual refcounting. When a CPU-bound event's refcount drops to zero we cross-call to __perf_remove_from_context() to clean it up, detaching grouped siblings. This works when the relevant CPU is online, but will fail if the CPU is currently offline, and we won't detach the event from its siblings before freeing the event, leaving the sibling list corrupt. If the sibling list is later walked (e.g. because the CPU cam online again before a remaining sibling's refcount drops to zero), we will walk the now corrupted siblings list, potentially dereferencing garbage values. Given that the events should never be scheduled again (as we removed them from their context), we can simply detatch siblings when the CPU goes down in the first place. If the CPU comes back online, the redundant call to __perf_remove_from_context() is safe. Reported-by: Drew Richardson <drew.richardson@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: vincent.weaver@maine.edu Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1415203904-25308-2-git-send-email-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 09:45:46 +01:00
Kevin Hilman	977d2fa6b2	PM / Runtime: Kconfig: move ia64 dependency to arch/ia64/Kconfig The IA64_HP_SIM dependency on PM_RUNTIME should be done in the arch Kconfig instead of in the PM core. Move it accordingly. NOTE: arch/ia64/Kconfig currently does a 'select PM', which since commit `1eb208aea3` (PM: Make CONFIG_PM depend on (CONFIG_PM_SLEEP \|\| CONFIG_PM_RUNTIME)) is effectively a noop unless PM_SLEEP or PM_RUNTIME are set elsewhere. Signed-off-by: Kevin Hilman <khilman@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-11-15 00:39:56 +01:00
Linus Torvalds	78646f62db	ACPI and power management fixes for 3.18-rc5 - Fix a crash in the suspend-to-idle code path introduced by a recent commit that forgot to check a pointer against NULL before dereferencing it (Dmitry Eremin-Solenikov). - Fix a boot crash on Exynos5 introduced by a recent commit making that platform use generic Device Tree bindings for power domains which exposed a weakness in the generic power domains framework leading to that crash (Ulf Hansson). - Fix a crash during system resume on systems where cpufreq depends on Operation Performance Points (OPP) for functionality, but CONFIG_OPP is not set. This leads the cpufreq driver registration to fail, but the resume code attempts to restore the pre-suspend cpufreq configuration (which does not exist) nevertheless and crashes. From Geert Uytterhoeven. - Add a new ACPI blacklist entry for Dell Vostro 3546 that has problems if it is reported as Windows 8 compatible to the BIOS (Adam Lee). - Fix swapped arguments in an error message in the cpufreq-dt driver (Abhilash Kesavan). - Fix up the prototypes of new callbacks in struct generic_pm_domain to make them more useful. Users of those callbacks will be added in 3.19 and it's better for them to be based on the correct struct definition in mainline from the start. From Ulf Hansson and Kevin Hilman. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJUZhtnAAoJEILEb/54YlRxx1EP/0Rk7pJUHeOMmdyXyY7B+n+f MlXHVMDhskT370fsdTGbpeYb5ATr5kGatfhr+vyDQmBtxdw7lDJxKq54s6kmmIL3 SEMRRb4NtkPsdDE7zq985JmjsrnHtKxC5NjSUwEGxdyyfAZxll4mrZL6RrqXCu44 L+qdVXRffCCrJDXZl5FZUpSZ3ZUc+xTiaDy7ObjLe2bwmzvBOAwS2flBMKxN9X+e khlGdQZ0e9T2Y3IXriHxHMui8OVbkPyYZkW1aubCd0HwuTMP7sebosX/2JWdJOmg q6bGcvPlBwXDRoShlzFO8CN5w5E8fIe0vfPcg9SB3s21S7rJEbYQX/5ytm107aJj Ysv7mcb2dAHG0V3J7hxhkS+7UNPxfk3G+8frxW2UQ6eIDlZkBORIUhGCzeSbIGYM aIKiomN4jGuPeaOkEnKl4RwMlzjuzAs2V06viffbq63eyWBvtHDW8M5bdq901pXp 1jOT7yKqLzOZYqcYaLr3z+IBw/+hfuG/FdCp3uGyFqeHPBNIP3BfFnWm6A6E13b+ aC6gvhQHojT7L2gqIBJ+Qn0EiRWNqwoLk6w6DLDYJna/hYyoXq0BKv+/x2OegItU ENKYVpfmSt3YsEhcTBW4h5IpUvK07o5Oa3nTxen6924Im61dMyaSUDD5DiaqCgXO bVJTsF983hBZGTy0IMX/ =wQxT -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: "These are three regression fixes, two recent (generic power domains, suspend-to-idle) and one older (cpufreq), an ACPI blacklist entry for one more machine having problems with Windows 8 compatibility, a minor cpufreq driver fix (cpufreq-dt) and a fixup for new callback definitions (generic power domains). Specifics: - Fix a crash in the suspend-to-idle code path introduced by a recent commit that forgot to check a pointer against NULL before dereferencing it (Dmitry Eremin-Solenikov). - Fix a boot crash on Exynos5 introduced by a recent commit making that platform use generic Device Tree bindings for power domains which exposed a weakness in the generic power domains framework leading to that crash (Ulf Hansson). - Fix a crash during system resume on systems where cpufreq depends on Operation Performance Points (OPP) for functionality, but CONFIG_OPP is not set. This leads the cpufreq driver registration to fail, but the resume code attempts to restore the pre-suspend cpufreq configuration (which does not exist) nevertheless and crashes. From Geert Uytterhoeven. - Add a new ACPI blacklist entry for Dell Vostro 3546 that has problems if it is reported as Windows 8 compatible to the BIOS (Adam Lee). - Fix swapped arguments in an error message in the cpufreq-dt driver (Abhilash Kesavan). - Fix up the prototypes of new callbacks in struct generic_pm_domain to make them more useful. Users of those callbacks will be added in 3.19 and it's better for them to be based on the correct struct definition in mainline from the start. From Ulf Hansson and Kevin Hilman" * tag 'pm+acpi-3.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / Domains: Fix initial default state of the need_restore flag PM / sleep: Fix entering suspend-to-IDLE if no freeze_oops is set PM / Domains: Change prototype for the attach and detach callbacks cpufreq: Avoid crash in resume on SMP without OPP cpufreq: cpufreq-dt: Fix arguments in clock failure error message ACPI / blacklist: blacklist Win8 OSI for Dell Vostro 3546	2014-11-14 13:38:02 -08:00
Byungchul Park	4526d0676a	function_graph: Fix micro seconds notations Usually, "msecs" notation means milli-seconds, and "usecs" notation means micro-seconds. Since the unit used in the code is micro-seconds, the notation should be replaced from msecs to usecs. Link: http://lkml.kernel.org/r/1415171926-9782-2-git-send-email-byungchul.park@lge.com Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-14 07:56:03 -05:00
Daniel Bristot de Oliveira	678f845ed0	ftrace-graph: show latency-format on print_graph_irq() On the function_graph tracer, the print_graph_irq() function prints a trace line with the flag ==========> on an irq handler entry, and the flag <========== on an irq handler return. But when the latency-format is enable, it is not printing the latency-format flags, causing the following error in the trace output: 0) ==========> \| 0) d... \| smp_apic_timer_interrupt() { This patch fixes this issue by printing the latency-format flags when it is enable. Link: http://lkml.kernel.org/r/7c2e226dac20c940b6242178fab7f0e3c9b5ce58.1415233316.git.bristot@redhat.com Reviewed-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-14 07:56:02 -05:00
Rasmus Villemoes	1177e43641	trace: Replace single-character seq_puts with seq_putc Printing a single character to a seqfile might as well be done with seq_putc instead of seq_puts; this avoids a strlen() call and a memory access. It also shaves another few bytes off the generated code. Link: http://lkml.kernel.org/r/1415479332-25944-4-git-send-email-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-14 07:55:55 -05:00
David S. Miller	076ce44825	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/chelsio/cxgb4vf/sge.c drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c sge.c was overlapping two changes, one to use the new __dev_alloc_page() in net-next, and one to use s->fl_pg_order in net. ixgbe_phy.c was a set of overlapping whitespace changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-14 01:01:12 -05:00
Rasmus Villemoes	d79ac28fde	tracing: Merge consecutive seq_puts calls Consecutive seq_puts calls with literal strings can be merged to a single call. This reduces the size of the generated code, and can also lead to slight .rodata reduction (because of fewer nul and padding bytes). It should also shave a off a few clock cycles. Link: http://lkml.kernel.org/r/1415479332-25944-3-git-send-email-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-13 21:33:34 -05:00
Rasmus Villemoes	fa6f0cc751	tracing: Replace seq_printf by simpler equivalents Using seq_printf to print a simple string or a single character is a lot more expensive than it needs to be, since seq_puts and seq_putc exist. These patches do seq_printf(m, s) -> seq_puts(m, s) seq_printf(m, "%s", s) -> seq_puts(m, s) seq_printf(m, "%c", c) -> seq_putc(m, c) Subsequent patches will simplify further. Link: http://lkml.kernel.org/r/1415479332-25944-2-git-send-email-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-13 21:32:19 -05:00
Daniel Thompson	8520dedbbf	tracing: kdb: Fix kernel livelock with empty buffers Currently kdb's ftdump command will livelock by constantly printk'ing the empty string at KERN_EMERG level if it run when the ftrace system is not in use. This occurs because trace_empty() never returns false when the ring buffers are left at the start of a non-consuming read [launched by ring_buffer_read_start()]. This patch changes the loop exit condition to use the result of trace_find_next_entry_inc(). Effectively this switches the non-consuming kdb dumper to follow the approach of the non-consuming userspace interface [s_next()] rather than the consuming ftrace_dump(). Link: http://lkml.kernel.org/r/1415277716-19419-3-git-send-email-daniel.thompson@linaro.org Cc: Ingo Molnar <mingo@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-13 21:27:25 -05:00
Daniel Thompson	c270cc75cd	tracing: kdb: Fix kernel panic during ftdump Currently kdb's ftdump command unconditionally crashes due to a null pointer de-reference whenever the command is run. This in turn causes the kernel to panic. The abridged stacktrace (gathered with ARCH=arm) is: --- cut here --- [<c09535ac>] (panic) from [<c02132dc>] (die+0x264/0x440) [<c02132dc>] (die) from [<c0952eb8>] (__do_kernel_fault.part.11+0x74/0x84) [<c0952eb8>] (__do_kernel_fault.part.11) from [<c021f954>] (do_page_fault+0x1d0/0x3c4) [<c021f954>] (do_page_fault) from [<c020846c>] (do_DataAbort+0x48/0xac) [<c020846c>] (do_DataAbort) from [<c0213c58>] (__dabt_svc+0x38/0x60) Exception stack(0xc0deba88 to 0xc0debad0) ba80: e8c29180 00000001 e9854304 e9854300 c0f567d8 c0df2580 baa0: 00000000 00000000 00000000 c0f117b8 c0e3a3c0 c0debb0c 00000000 c0debad0 bac0: 0000672e c02f4d60 60000193 ffffffff [<c0213c58>] (__dabt_svc) from [<c02f4d60>] (kdb_ftdump+0x1e4/0x3d8) [<c02f4d60>] (kdb_ftdump) from [<c02ce328>] (kdb_parse+0x2b8/0x698) [<c02ce328>] (kdb_parse) from [<c02ceef0>] (kdb_main_loop+0x52c/0x784) [<c02ceef0>] (kdb_main_loop) from [<c02d1b0c>] (kdb_stub+0x238/0x490) --- cut here --- The NULL deref occurs due to the initialized use of struct trace_iter's buffer_iter member. This is a regression, albeit a fairly elderly one. It was introduced by commit `6d158a813e` ("tracing: Remove NR_CPUS array from trace_iterator"). This patch solves this by providing a collection of ring_buffer_iter(s) and using this to initialize buffer_iter. Note that static allocation is used solely because the trace_iter itself is also static allocated. Static allocation also means that we have to NULL-ify the pointer during cleanup to avoid use-after-free problems. Link: http://lkml.kernel.org/r/1415277716-19419-2-git-send-email-daniel.thompson@linaro.org Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-13 21:24:24 -05:00
Luis Claudio R. Goncalves	933ff9f202	tracing: Fix traceoff_on_warning handling on boot command line According to the documentation, adding "traceoff_on_warning" to the boot command line should be enough to enable the feature. But right now it is necessary to specify "traceoff_on_warning=". Along with fixing that, also verify if the value passed, if any, is either "0" or "off". Link: http://lkml.kernel.org/r/20141112231400.GL12281@uudg.org Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-13 21:03:41 -05:00
Steven Rostedt (Red Hat)	fe578ba36f	ftrace: Have the control_ops get a trampoline With the new logic, if only a single user of ftrace function hooks is used, it will get its own trampoline assigned to it. The problem is that the control_ops is an indirect ops that perf ops uses. What that means is that when perf registers its ops with register_ftrace_function(), it has the CONTROL flag set and gets added to the control list instead of the global ftrace list. The control_ops gets added to that instead and the mcount trampoline calls the control_ops function. The control_ops function will iterate the control list and call the ops functions that are attached to it. But currently the trampoline is added to the perf ops and not the control ops, and when ftrace tries to find a trampoline hook for it, it fails to find one and gives the following splat: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 10133 at kernel/trace/ftrace.c:2033 ftrace_get_addr_new+0x6f/0xc0() Modules linked in: [...] CPU: 0 PID: 10133 Comm: perf Tainted: P 3.18.0-rc1-test+ #388 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 00000000000007f1 ffff8800c2643bc8 ffffffff814fca6e ffff88011ea0ed01 0000000000000000 ffff8800c2643c08 ffffffff81041ffd 0000000000000000 ffffffff810c388c ffffffff81a5a350 ffff880119b00000 ffffffff810001c8 Call Trace: [<ffffffff814fca6e>] dump_stack+0x46/0x58 [<ffffffff81041ffd>] warn_slowpath_common+0x81/0x9b [<ffffffff810c388c>] ? ftrace_get_addr_new+0x6f/0xc0 [<ffffffff810001c8>] ? 0xffffffff810001c8 [<ffffffff81042031>] warn_slowpath_null+0x1a/0x1c [<ffffffff810c388c>] ftrace_get_addr_new+0x6f/0xc0 [<ffffffff8102e938>] ftrace_replace_code+0xd6/0x334 [<ffffffff810c4116>] ftrace_modify_all_code+0x41/0xc5 [<ffffffff8102eba6>] arch_ftrace_update_code+0x10/0x19 [<ffffffff810c293c>] ftrace_run_update_code+0x21/0x42 [<ffffffff810c298f>] ftrace_startup_enable+0x32/0x34 [<ffffffff810c3049>] ftrace_startup+0x14e/0x15a [<ffffffff810c307c>] register_ftrace_function+0x27/0x40 [<ffffffff810dc118>] perf_ftrace_event_register+0x3e/0xee [<ffffffff810dbfbe>] perf_trace_init+0x29d/0x2a9 [<ffffffff810eb422>] perf_tp_event_init+0x27/0x3a [<ffffffff810f18bc>] perf_init_event+0x9e/0xed [<ffffffff810f1ba4>] perf_event_alloc+0x299/0x330 [<ffffffff810f236b>] SYSC_perf_event_open+0x3ee/0x816 [<ffffffff8115a066>] ? mntput+0x2d/0x2f [<ffffffff81142b00>] ? __fput+0xa7/0x1b2 [<ffffffff81091300>] ? do_gettimeofday+0x22/0x3a [<ffffffff810f279c>] SyS_perf_event_open+0x9/0xb [<ffffffff81502a92>] system_call_fastpath+0x12/0x17 ---[ end trace 81a53565150e4982 ]--- Bad trampoline accounting at: ffffffff810001c8 (run_init_process+0x0/0x2d) (10000001) Update the control_ops trampoline instead of the perf ops one. Reported-by: lkp@01.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-13 19:40:56 -05:00
Xie XiuQi	bc53a3f46d	kernel/panic.c: update comments for print_tainted Commit `69361eef90` ("panic: add TAINT_SOFTLOCKUP") added the 'L' flag, but failed to update the comments for print_tainted(). So, update the comments. Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-13 16:17:06 -08:00
Paul E. McKenney	9ea6c58856	Merge branches 'torture.2014.11.03a', 'cpu.2014.11.03a', 'doc.2014.11.13a', 'fixes.2014.11.13a', 'signal.2014.10.29a' and 'rt.2014.10.29a' into HEAD cpu.2014.11.03a: Changes for per-CPU variables. doc.2014.11.13a: Documentation updates. fixes.2014.11.13a: Miscellaneous fixes. signal.2014.10.29a: Signal changes. rt.2014.10.29a: Real-time changes. torture.2014.11.03a: torture-test changes.	2014-11-13 10:39:04 -08:00
Paul E. McKenney	60ced4950c	rcu: Fix FIXME in rcu_tasks_kthread() This commit affines rcu_tasks_kthread() to the housekeeping CPUs in CONFIG_NO_HZ_FULL builds. This is just a default, so systems administrators are free to put this kthread somewhere else if they wish. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-11-13 10:35:41 -08:00
Linus Torvalds	911883759f	Merge branch 'stable-3.18' of git://git.infradead.org/users/pcmoore/audit Pull audit fixes from Paul Moore: "After he sent the initial audit pull request for 3.18, Eric asked me to take over the management of the audit tree, hence this pull request to fix a couple of problems with audit. As you can see below, the changes are minimal: adding some whitespace to a string so userspace parses it correctly, and fixing a problem with audit's usage of fsnotify that was causing audit watch rules to be lost. Neither of these patches were very controversial on the mailing lists and they fix real problems, getting them into 3.18 would be a good thing" * 'stable-3.18' of git://git.infradead.org/users/pcmoore/audit: audit: keep inode pinned audit: AUDIT_FEATURE_CHANGE message format missing delimiting space	2014-11-13 09:36:39 -08:00
Miklos Szeredi	799b601451	audit: keep inode pinned Audit rules disappear when an inode they watch is evicted from the cache. This is likely not what we want. The guilty commit is "fsnotify: allow marks to not pin inodes in core", which didn't take into account that audit_tree adds watches with a zero mask. Adding any mask should fix this. Fixes: `90b1e7a578` ("fsnotify: allow marks to not pin inodes in core") Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org # 2.6.36+ Signed-off-by: Paul Moore <pmoore@redhat.com>	2014-11-11 14:20:22 -05:00
Jiang Liu	26488b3723	tracing: Add entry->next_cpu to trace_ctxwake_bin() Function trace_ctxwake_bin() misses ctx_switch_entry->next_cpu field, so user will get stale value for "next_cpu". Link: http://lkml.kernel.org/p/1377176379-27908-1-git-send-email-liuj97@gmail.com Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-11 12:43:34 -05:00
Steven Rostedt (Red Hat)	243f7610a6	tracing: Move tracing_sched_{switch,wakeup}() into wakeup tracer The only code that references tracing_sched_switch_trace() and tracing_sched_wakeup_trace() is the wakeup latency tracer. Those two functions use to belong to the sched_switch tracer which has long been removed. These functions were left behind because the wakeup latency tracer used them. But since the wakeup latency tracer is the only one to use them, they should be static functions inside that code. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-11 12:43:15 -05:00
Oleg Nesterov	458faf0b88	tracing: Kill the dead code in probe_sched_switch() and probe_sched_wakeup() After the previous patch it is clear that "tracer_enabled" can never be true, we can remove the "if (tracer_enabled)" code in probe_sched_switch() and probe_sched_wakeup(). Plus we can obviously remove tracer_enabled, ctx_trace, and sched_stopped as well. Link: http://lkml.kernel.org/p/20140723193503.GA30217@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-11 12:42:44 -05:00
Oleg Nesterov	632537256e	tracing: Kill tracing_{start,stop}_sched_switch_record() and tracing_sched_switch_assign_trace() tracing_{start,stop}_sched_switch_record() have no callers since `87d80de280` "tracing: Remove obsolete sched_switch tracer". The last caller of tracing_sched_switch_assign_trace() was removed by `30dbb20e68` "tracing: Remove boot tracer". Link: http://lkml.kernel.org/p/20140723193501.GA30214@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-11 12:42:23 -05:00
Steven Rostedt (Red Hat)	4fd3279b48	ftrace: Add more information to ftrace_bug() output With the introduction of the dynamic trampolines, it is useful that if things go wrong that ftrace_bug() produces more information about what the current state is. This can help debug issues that may arise. Ftrace has lots of checks to make sure that the state of the system it touchs is exactly what it expects it to be. When it detects an abnormality it calls ftrace_bug() and disables itself to prevent any further damage. It is crucial that ftrace_bug() produces sufficient information that can be used to debug the situation. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Borislav Petkov <bp@suse.de> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-11 12:42:13 -05:00
Steven Rostedt (Red Hat)	12cce594fa	ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines When the static ftrace_ops (like function tracer) enables tracing, and it is the only callback that is referencing a function, a trampoline is dynamically allocated to the function that calls the callback directly instead of calling a loop function that iterates over all the registered ftrace ops (if more than one ops is registered). But when it comes to dynamically allocated ftrace_ops, where they may be freed, on a CONFIG_PREEMPT kernel there's no way to know when it is safe to free the trampoline. If a task was preempted while executing on the trampoline, there's currently no way to know when it will be off that trampoline. But this is not true when it comes to !CONFIG_PREEMPT. The current method of calling schedule_on_each_cpu() will force tasks off the trampoline, becaues they can not schedule while on it (kernel preemption is not configured). That means it is safe to free a dynamically allocated ftrace ops trampoline when CONFIG_PREEMPT is not configured. Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-11 12:41:52 -05:00
Fabian Frederick	0f16996cf2	kernel/debug/debug_core.c: Logging clean-up -Convert printk( to pr_foo() -Add pr_fmt -Coalesce formats Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2014-11-11 09:31:53 -06:00
Daniel Thompson	a1465d2f39	kgdb: timeout if secondary CPUs ignore the roundup Currently if an active CPU fails to respond to a roundup request the CPU that requested the roundup will become stuck. This needlessly reduces the robustness of the debugger. This patch introduces a timeout allowing the system state to be examined even when the system contains unresponsive processors. It also modifies kdb's cpu command to make it censor attempts to switch to unresponsive processors and to report their state as (D)ead. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2014-11-11 09:31:53 -06:00
Daniel Thompson	b8017177cd	kdb: Allow access to sensitive commands to be restricted by default Currently kiosk mode must be explicitly requested by the bootloader or userspace. It is convenient to be able to change the default value in a similar manner to CONFIG_MAGIC_SYSRQ_DEFAULT_MASK. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2014-11-11 09:31:52 -06:00
Anton Vorontsov	420c2b1b0d	kdb: Add enable mask for groups of commands Currently all kdb commands are enabled whenever kdb is deployed. This makes it difficult to deploy kdb to help debug certain types of systems. Android phones provide one example; the FIQ debugger found on some Android devices has a deliberately weak set of commands to allow the debugger to enabled very late in the production cycle. Certain kiosk environments offer another interesting case where an engineer might wish to probe the system state using passive inspection commands without providing sufficient power for a passer by to root it. Without any restrictions, obtaining the root rights via KDB is a matter of a few commands, and works everywhere. For example, log in as a normal user: cbou:~$ id uid=1001(cbou) gid=1001(cbou) groups=1001(cbou) Now enter KDB (for example via sysrq): Entering kdb (current=0xffff8800065bc740, pid 920) due to Keyboard Entry kdb> ps 23 sleeping system daemon (state M) processes suppressed, use 'ps A' to see all. Task Addr Pid Parent [] cpu State Thread Command 0xffff8800065bc740 920 919 1 0 R 0xffff8800065bca20 bash 0xffff880007078000 1 0 0 0 S 0xffff8800070782e0 init [...snip...] 0xffff8800065be3c0 918 1 0 0 S 0xffff8800065be6a0 getty 0xffff8800065b9c80 919 1 0 0 S 0xffff8800065b9f60 login 0xffff8800065bc740 920 919 1 0 R 0xffff8800065bca20 *bash All we need is the offset of cred pointers. We can look up the offset in the distro's kernel source, but it is unnecessary. We can just start dumping init's task_struct, until we see the process name: kdb> md 0xffff880007078000 0xffff880007078000 0000000000000001 ffff88000703c000 ................ 0xffff880007078010 0040210000000002 0000000000000000 .....!@......... [...snip...] 0xffff8800070782b0 ffff8800073e0580 ffff8800073e0580 ..>.......>..... 0xffff8800070782c0 0000000074696e69 0000000000000000 init............ ^ Here, 'init'. Creds are just above it, so the offset is 0x02b0. Now we set up init's creds for our non-privileged shell: kdb> mm 0xffff8800065bc740+0x02b0 0xffff8800073e0580 0xffff8800065bc9f0 = 0xffff8800073e0580 kdb> mm 0xffff8800065bc740+0x02b8 0xffff8800073e0580 0xffff8800065bc9f8 = 0xffff8800073e0580 And thus gaining the root: kdb> go cbou:~$ id uid=0(root) gid=0(root) groups=0(root) cbou:~$ bash root:~# p.s. No distro enables kdb by default (although, with a nice KDB-over-KMS feature availability, I would expect at least some would enable it), so it's not actually some kind of a major issue. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2014-11-11 09:31:52 -06:00
Daniel Thompson	9452e977ac	kdb: Categorize kdb commands (similar to SysRq categorization) This patch introduces several new flags to collect kdb commands into groups (later allowing them to be optionally disabled). This follows similar prior art to enable/disable magic sysrq commands. The commands have been categorized as follows: Always on: go (w/o args), env, set, help, ?, cpu (w/o args), sr, dmesg, disable_nmi, defcmd, summary, grephelp Mem read: md, mdr, mdp, mds, ef, bt (with args), per_cpu Mem write: mm Reg read: rd Reg write: go (with args), rm Inspect: bt (w/o args), btp, bta, btc, btt, ps, pid, lsmod Flow ctrl: bp, bl, bph, bc, be, bd, ss Signal: kill Reboot: reboot All: cpu, kgdb, (and all of the above), nmi_console Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2014-11-11 09:31:52 -06:00
Anton Vorontsov	e8ab24d9b0	kdb: Remove KDB_REPEAT_NONE flag Since we now treat KDB_REPEAT_* as flags, there is no need to pass KDB_REPEAT_NONE. It's just the default behaviour when no flags are specified. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2014-11-11 09:31:52 -06:00
Anton Vorontsov	04bb171e7a	kdb: Use KDB_REPEAT_* values as flags The actual values of KDB_REPEAT_* enum values and overall logic stayed the same, but we now treat the values as flags. This makes it possible to add other flags and combine them, plus makes the code a lot simpler and shorter. But functionality-wise, there should be no changes. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2014-11-11 09:31:51 -06:00
Anton Vorontsov	42c884c10b	kdb: Rename kdb_register_repeat() to kdb_register_flags() We're about to add more options for commands behaviour, so let's give a more generic name to the low-level kdb command registration function. There are just various renames, no functional changes. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2014-11-11 09:31:51 -06:00
Anton Vorontsov	15a42a9bc9	kdb: Rename kdb_repeat_t to kdb_cmdflags_t, cmd_repeat to cmd_flags We're about to add more options for command behaviour, so let's expand the meaning of kdb_repeat_t. So far we just do various renames, there should be no functional changes. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2014-11-11 09:31:51 -06:00
Anton Vorontsov	a2e5d188aa	kdb: Remove currently unused kdbtab_t->cmd_flags The struct member is never used in the code, so we can remove it. We will introduce real flags soon by renaming cmd_repeat to cmd_flags. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2014-11-11 09:31:51 -06:00
Rusty Russell	18eb74fa94	params: cleanup sysfs allocation commit `63662139e5` attempted to patch a leak (which would only happen on OOM, ie. never), but it didn't quite work. This rewrites the code to be as simple as possible. add_sysfs_param() adds a parameter. If it fails, it's the caller's responsibility to clean up the parameters which already exist. The kzalloc-then-always-krealloc pattern is perhaps overly simplistic, but this code has clearly confused people. It worked on me... Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2014-11-11 17:07:47 +10:30
Ionut Alexa	6da0b56515	kernel:module Fix coding style errors and warnings. Fixed codin style errors and warnings. Changes printk with print_debug/warn. Changed seq_printf to seq_puts. Signed-off-by: Ionut Alexa <ionut.m.alexa@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (removed bogus KERN_DEFAULT conversion)	2014-11-11 17:07:47 +10:30
Masami Hiramatsu	e513cc1c07	module: Remove stop_machine from module unloading Remove stop_machine from module unloading by adding new reference counting algorithm. This atomic refcounter works like a semaphore, it can get (be incremented) only when the counter is not 0. When loading a module, kmodule subsystem sets the counter MODULE_REF_BASE (= 1). And when unloading the module, it subtracts MODULE_REF_BASE from the counter. If no one refers the module, the refcounter becomes 0 and we can remove the module safely. If someone referes it, we try to recover the counter by adding MODULE_REF_BASE unless the counter becomes 0, because the referrer can put the module right before recovering. If the recovering is failed, we can get the 0 refcount and it never be incremented again, it can be removed safely too. Note that __module_get() forcibly gets the module refcounter, users should use try_module_get() instead of that. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2014-11-11 17:07:46 +10:30
Masami Hiramatsu	2f35c41f58	module: Replace module_ref with atomic_t refcnt Replace module_ref per-cpu complex reference counter with an atomic_t simple refcnt. This is for code simplification. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2014-11-11 17:07:46 +10:30
Masami Hiramatsu	0286b5ea12	lib/bug: Use RCU list ops for module_bug_list Actually since module_bug_list should be used in BUG context, we may not need this. But for someone who want to use this from normal context, this makes module_bug_list an RCU list. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2014-11-11 17:07:46 +10:30
Masami Hiramatsu	461e34aed0	module: Unlink module with RCU synchronizing instead of stop_machine Unlink module from module list with RCU synchronizing instead of using stop_machine(). Since module list is already protected by rcu, we don't need stop_machine() anymore. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2014-11-11 17:07:45 +10:30
Masami Hiramatsu	4f48795b61	module: Wait for RCU synchronizing before releasing a module Wait for RCU synchronizing on failure path of module loading before releasing struct module, because the memory of mod->list can still be accessed by list walkers (e.g. kallsyms). Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2014-11-11 17:07:44 +10:30
Rabin Vincent	07906da788	tracing: Do not risk busy looping in buffer splice If the read loop in trace_buffers_splice_read() keeps failing due to memory allocation failures without reading even a single page then this function will keep busy looping. Remove the risk for that by exiting the function if memory allocation failures are seen. Link: http://lkml.kernel.org/r/1415309167-2373-2-git-send-email-rabin@rab.in Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-10 16:47:31 -05:00
Rabin Vincent	e30f53aad2	tracing: Do not busy wait in buffer splice On a !PREEMPT kernel, attempting to use trace-cmd results in a soft lockup: # trace-cmd record -e raw_syscalls:* -F false NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trace-cmd:61] ... Call Trace: [<ffffffff8105b580>] ? __wake_up_common+0x90/0x90 [<ffffffff81092e25>] wait_on_pipe+0x35/0x40 [<ffffffff810936e3>] tracing_buffers_splice_read+0x2e3/0x3c0 [<ffffffff81093300>] ? tracing_stats_read+0x2a0/0x2a0 [<ffffffff812d10ab>] ? _raw_spin_unlock+0x2b/0x40 [<ffffffff810dc87b>] ? do_read_fault+0x21b/0x290 [<ffffffff810de56a>] ? handle_mm_fault+0x2ba/0xbd0 [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80 [<ffffffff810951e2>] ? trace_buffer_lock_reserve+0x22/0x60 [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80 [<ffffffff8112415d>] do_splice_to+0x6d/0x90 [<ffffffff81126971>] SyS_splice+0x7c1/0x800 [<ffffffff812d1edd>] tracesys_phase2+0xd3/0xd8 The problem is this: tracing_buffers_splice_read() calls ring_buffer_wait() to wait for data in the ring buffers. The buffers are not empty so ring_buffer_wait() returns immediately. But tracing_buffers_splice_read() calls ring_buffer_read_page() with full=1, meaning it only wants to read a full page. When the full page is not available, tracing_buffers_splice_read() tries to wait again with ring_buffer_wait(), which again returns immediately, and so on. Fix this by adding a "full" argument to ring_buffer_wait() which will make ring_buffer_wait() wait until the writer has left the reader's page, i.e. until full-page reads will succeed. Link: http://lkml.kernel.org/r/1415645194-25379-1-git-send-email-rabin@rab.in Cc: stable@vger.kernel.org # 3.16+ Fixes: `b1169cc69b` ("tracing: Remove mock up poll wait function") Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-10 16:45:43 -05:00
Andrey Ryabinin	c123588b3b	sched/numa: Fix out of bounds read in sched_init_numa() On latest mm + KASan patchset I've got this: ================================================================== BUG: AddressSanitizer: out of bounds access in sched_init_smp+0x3ba/0x62c at addr ffff88006d4bee6c ============================================================================= BUG kmalloc-8 (Not tainted): kasan error ----------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: Allocated in alloc_vfsmnt+0xb0/0x2c0 age=75 cpu=0 pid=0 __slab_alloc+0x4b4/0x4f0 __kmalloc_track_caller+0x15f/0x1e0 kstrdup+0x44/0x90 alloc_vfsmnt+0xb0/0x2c0 vfs_kern_mount+0x35/0x190 kern_mount_data+0x25/0x50 pid_ns_prepare_proc+0x19/0x50 alloc_pid+0x5e2/0x630 copy_process.part.41+0xdf5/0x2aa0 do_fork+0xf5/0x460 kernel_thread+0x21/0x30 rest_init+0x1e/0x90 start_kernel+0x522/0x531 x86_64_start_reservations+0x2a/0x2c x86_64_start_kernel+0x15b/0x16a INFO: Slab 0xffffea0001b52f80 objects=24 used=22 fp=0xffff88006d4befc0 flags=0x100000000004080 INFO: Object 0xffff88006d4bed20 @offset=3360 fp=0xffff88006d4bee70 Bytes b4 ffff88006d4bed10: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ Object ffff88006d4bed20: 70 72 6f 63 00 6b 6b a5 proc.kk. Redzone ffff88006d4bed28: cc cc cc cc cc cc cc cc ........ Padding ffff88006d4bee68: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ CPU: 0 PID: 1 Comm: swapper/0 Tainted: G B 3.18.0-rc3-mm1+ #108 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 ffff88006d4be000 0000000000000000 ffff88006d4bed20 ffff88006c86fd18 ffffffff81cd0a59 0000000000000058 ffff88006d404240 ffff88006c86fd48 ffffffff811fa3a8 ffff88006d404240 ffffea0001b52f80 ffff88006d4bed20 Call Trace: dump_stack (lib/dump_stack.c:52) print_trailer (mm/slub.c:645) object_err (mm/slub.c:652) ? sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063) kasan_report_error (mm/kasan/report.c:102 mm/kasan/report.c:178) ? kasan_poison_shadow (mm/kasan/kasan.c:48) ? kasan_unpoison_shadow (mm/kasan/kasan.c:54) ? kasan_poison_shadow (mm/kasan/kasan.c:48) ? kasan_kmalloc (mm/kasan/kasan.c:311) __asan_load4 (mm/kasan/kasan.c:371) ? sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063) sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063) kernel_init_freeable (init/main.c:869 init/main.c:997) ? finish_task_switch (kernel/sched/sched.h:1036 kernel/sched/core.c:2248) ? rest_init (init/main.c:924) kernel_init (init/main.c:929) ? rest_init (init/main.c:924) ret_from_fork (arch/x86/kernel/entry_64.S:348) ? rest_init (init/main.c:924) Read of size 4 by task swapper/0: Memory state around the buggy address: ffff88006d4beb80: fc fc fc fc fc fc fc fc fc fc 00 fc fc fc fc fc ffff88006d4bec00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88006d4bec80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88006d4bed00: fc fc fc fc 00 fc fc fc fc fc fc fc fc fc fc fc ffff88006d4bed80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88006d4bee00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc 04 fc ^ ffff88006d4bee80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88006d4bef00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88006d4bef80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ffff88006d4bf000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88006d4bf080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Zero 'level' (e.g. on non-NUMA system) causing out of bounds access in this line: sched_max_numa_distance = sched_domains_numa_distance[level - 1]; Fix this by exiting from sched_init_numa() earlier. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Reviewed-by: Rik van Riel <riel@redhat.com> Fixes: `9942f79ba` ("sched/numa: Export info needed for NUMA balancing on complex topologies") Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1415372020-1871-1-git-send-email-a.ryabinin@samsung.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-10 10:33:22 +01:00
Kevin Cernekee	b79055952b	genirq: Generic chip: Add big endian I/O accessors Use io{read,write}32be if the caller specified IRQ_GC_BE_IO when creating the irqchip. Signed-off-by: Kevin Cernekee <cernekee@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lkml.kernel.org/r/1415342669-30640-5-git-send-email-cernekee@gmail.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>	2014-11-09 04:02:00 +00:00
Kevin Cernekee	332fd7c4fe	genirq: Generic chip: Change irq_reg_{readl,writel} arguments Pass in the irq_chip_generic struct so we can use different readl/writel settings for each irqchip driver, when appropriate. Compute (gc->reg_base + reg_offset) in the helper function because this is pretty much what all callers want to do anyway. Compile-tested using the following configurations: at91_dt_defconfig (CONFIG_ATMEL_AIC_IRQ=y) sama5_defconfig (CONFIG_ATMEL_AIC5_IRQ=y) sunxi_defconfig (CONFIG_ARCH_SUNXI=y) tb10x (ARC) is untested. Signed-off-by: Kevin Cernekee <cernekee@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lkml.kernel.org/r/1415342669-30640-3-git-send-email-cernekee@gmail.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>	2014-11-09 04:01:22 +00:00
Dmitry Eremin-Solenikov	403b9636fe	PM / sleep: Fix entering suspend-to-IDLE if no freeze_oops is set If no freeze_ops is set, trying to enter suspend-to-IDLE will cause a nice oops in platform_suspend_prepare_late(). Add respective checks to platform_suspend_prepare_late() and platform_resume_early() functions. Fixes: `a8d46b9e4e` (ACPI / sleep: Rework the handling of ACPI GPE wakeup ...) Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-11-08 22:30:05 +01:00
Peter Hurley	e1c2296c34	tty: Move session_of_pgrp() and make static tiocspgrp() is the lone caller of session_of_pgrp(); relocate and limit to file scope. Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Reviewed-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-11-05 16:26:14 -08:00
Sebastian Schmidt	68c4a4f8ab	pstore: Honor dmesg_restrict sysctl on dmesg dumps When the kernel.dmesg_restrict restriction is in place, only users with CAP_SYSLOG should be able to access crash dumps (like: attacker is trying to exploit a bug, watchdog reboots, attacker can happily read crash dumps and logs). This puts the restriction on console-* types as well as sensitive information could have been leaked there. Other log types are unaffected. Signed-off-by: Sebastian Schmidt <yath@yath.de> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>	2014-11-05 09:59:48 -08:00
Iulia Manda	44dba3d5d6	sched: Refactor task_struct to use numa_faults instead of numa_* pointers This patch simplifies task_struct by removing the four numa_* pointers in the same array and replacing them with the array pointer. By doing this, on x86_64, the size of task_struct is reduced by 3 ulong pointers (24 bytes on x86_64). A new parameter is added to the task_faults_idx function so that it can return an index to the correct offset, corresponding with the old precalculated pointers. All of the code in sched/ that depended on task_faults_idx and numa_* was changed in order to match the new logic. Signed-off-by: Iulia Manda <iulia.manda21@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: mgorman@suse.de Cc: dave@stgolabs.net Cc: riel@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141031001331.GA30662@winterfell Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-04 07:17:57 +01:00
Wanpeng Li	cad3bb32e1	sched/deadline: Don't check CONFIG_SMP in switched_from_dl() There are both UP and SMP version of pull_dl_task(), so don't need to check CONFIG_SMP in switched_from_dl(); Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414708776-124078-6-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-04 07:17:56 +01:00
Wanpeng Li	cd66091162	sched/deadline: Reschedule from switched_from_dl() after a successful pull In switched_from_dl() we have to issue a resched if we successfully pulled some task from other cpus. This patch also aligns the behavior with -rt. Suggested-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414708776-124078-5-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-04 07:17:55 +01:00
Wanpeng Li	6b0a563f3a	sched/deadline: Push task away if the deadline is equal to curr during wakeup This patch pushes task away if the dealine of the task is equal to current during wake up. The same behavior as rt class. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414708776-124078-4-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-04 07:17:55 +01:00
Wanpeng Li	acb32132ec	sched/deadline: Add deadline rq status print This patch add deadline rq status print. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414708776-124078-3-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-04 07:17:54 +01:00
Wanpeng Li	804968809c	sched/deadline: Fix artificial overrun introduced by yield_task_dl() The yield semantic of deadline class is to reduce remaining runtime to zero, and then update_curr_dl() will stop it. However, comsumed bandwidth is reduced from the budget of yield task again even if it has already been set to zero which leads to artificial overrun. This patch fix it by make sure we don't steal some more time from the task that yielded in update_curr_dl(). Suggested-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414708776-124078-2-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-04 07:17:53 +01:00
Wanpeng Li	308a623a40	sched/rt: Clean up check_preempt_equal_prio() This patch checks if current can be pushed/pulled somewhere else in advance to make logic clear, the same behavior as dl class. - If current can't be migrated, useless to reschedule, let's hope task can move out. - If task is migratable, so let's not schedule it and see if it can be pushed or pulled somewhere else. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414708776-124078-1-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-04 07:17:52 +01:00
Juri Lelli	75e23e49db	sched/core: Use dl_bw_of() under rcu_read_lock_sched() As per commit `f10e00f4bf` ("sched/dl: Use dl_bw_of() under rcu_read_lock_sched()"), dl_bw_of() has to be protected by rcu_read_lock_sched(). Signed-off-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414497286-28824-1-git-send-email-juri.lelli@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-04 07:17:52 +01:00
Yao Dongdong	9f96742a13	sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu Idle cpu is idler than non-idle cpu, so we needn't search for least_loaded_cpu after we have found an idle cpu. Signed-off-by: Yao Dongdong <yaodongdong@huawei.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414469286-6023-1-git-send-email-yaodongdong@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-04 07:17:51 +01:00
Kirill Tkhai	67dfa1b756	sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl() Currently used hrtimer_try_to_cancel() is racy: raw_spin_lock(&rq->lock) ... dl_task_timer raw_spin_lock(&rq->lock) ... raw_spin_lock(&rq->lock) ... switched_from_dl() ... ... hrtimer_try_to_cancel() ... ... switched_to_fair() ... ... ... ... ... ... ... ... raw_spin_unlock(&rq->lock) ... (asquired) ... ... ... ... ... ... do_exit() ... ... schedule() ... ... raw_spin_lock(&rq->lock) ... raw_spin_unlock(&rq->lock) ... ... ... raw_spin_unlock(&rq->lock) ... raw_spin_lock(&rq->lock) ... ... (asquired) put_task_struct() ... ... free_task_struct() ... ... ... ... raw_spin_unlock(&rq->lock) ... (asquired) ... ... ... ... ... (use after free) ... So, let's implement 100% guaranteed way to cancel the timer and let's be sure we are safe even in very unlikely situations. rq unlocking does not limit the area of switched_from_dl() use, because this has already been possible in pull_dl_task() below. Let's consider the safety of of this unlocking. New code in the patch is working when hrtimer_try_to_cancel() fails. This means the callback is running. In this case hrtimer_cancel() is just waiting till the callback is finished. Two 1) Since we are in switched_from_dl(), new class is not dl_sched_class and new prio is not less MAX_DL_PRIO. So, the callback returns early; it's right after !dl_task() check. After that hrtimer_cancel() returns back too. The above is: raw_spin_lock(rq->lock); ... ... dl_task_timer() ... raw_spin_lock(rq->lock); switched_from_dl() ... hrtimer_try_to_cancel() ... raw_spin_unlock(rq->lock); ... hrtimer_cancel() ... ... raw_spin_unlock(rq->lock); ... return HRTIMER_NORESTART; ... ... raw_spin_lock(rq->lock); ... 2) But the below is also possible: dl_task_timer() raw_spin_lock(rq->lock); ... raw_spin_unlock(rq->lock); raw_spin_lock(rq->lock); ... switched_from_dl() ... hrtimer_try_to_cancel() ... ... return HRTIMER_NORESTART; raw_spin_unlock(rq->lock); ... hrtimer_cancel(); ... raw_spin_lock(rq->lock); ... In this case hrtimer_cancel() returns immediately. Very unlikely case, just to mention. Nobody can manipulate the task, because check_class_changed() is always called with pi_lock locked. Nobody can force the task to participate in (concurrent) priority inheritance schemes (the same reason). All concurrent task operations require pi_lock, which is held by us. No deadlocks with dl_task_timer() are possible, because it returns right after !dl_task() check (it does nothing). If we receive a new dl_task during the time of unlocked rq, we just don't have to do pull_dl_task() in switched_from_dl() further. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> [ Added comments] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414420852.19914.186.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-04 07:17:50 +01:00
Peter Zijlstra	e7097e8bd0	sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test In some cases this can trigger a true flood of output. Requested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-04 07:17:49 +01:00
Peter Zijlstra	6b55fc63f4	audit, sched/wait: Fixup kauditd_thread() wait loop The kauditd_thread wait loop is a bit iffy; it has a number of problems: - calls try_to_freeze() before schedule(); you typically want the thread to re-evaluate the sleep condition when unfreezing, also freeze_task() issues a wakeup. - it unconditionally does the {add,remove}_wait_queue(), even when the sleep condition is false. Use wait_event_freezable() that does the right thing. Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Eric Paris <eparis@redhat.com> Cc: oleg@redhat.com Cc: Eric Paris <eparis@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141002102251.GA6324@worktop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-04 07:17:47 +01:00
Peter Zijlstra	cb6538e740	sched/wait: Fix a kthread race with wait_woken() There is a race between kthread_stop() and the new wait_woken() that can result in a lack of progress. CPU 0 \| CPU 1 \| rfcomm_run() \| kthread_stop() ... \| if (!test_bit(KTHREAD_SHOULD_STOP)) \| \| set_bit(KTHREAD_SHOULD_STOP) \| wake_up_process() wait_woken() \| wait_for_completion() set_current_state(INTERRUPTIBLE) \| if (!WQ_FLAG_WOKEN) \| schedule_timeout() \| \| After which both tasks will wait.. forever. Fix this by having wait_woken() check for kthread_should_stop() but only for kthreads (obviously). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-04 07:17:44 +01:00
Kirill Tkhai	f7b8a47da1	sched: Remove lockdep check in sched_move_task() sched_move_task() is the only interface to change sched_task_group: cpu_cgrp_subsys methods and autogroup_move_group() use it. Everything is synchronized by task_rq_lock(), so cpu_cgroup_attach() is ordered with other users of sched_move_task(). This means we do no need RCU here: if we've dereferenced a tg here, the .attach method hasn't been called for it yet. Thus, we should pass "true" to task_css_check() to silence lockdep warnings. Fixes: `eeb61e53ea` ("sched: Fix race between task_group and sched_task_group") Reported-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414473874.8574.2.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-04 07:07:30 +01:00
Paul E. McKenney	b8969d1a50	rcutorture: Fix rcu_torture_cbflood() memory leak Commit `38706bc5a2` (rcutorture: Add callback-flood test) vmalloc()ed a bunch of RCU callbacks, but failed to free them. This commit fixes that oversight. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>	2014-11-03 19:26:41 -08:00
Pranith Kumar	aa23c6fbc5	rcutorture: Add early boot self tests Add early boot self tests for RCU under CONFIG_PROVE_RCU. Currently the only test is adding a dummy callback which increments a counter which we then later verify after calling rcu_barrier*(). Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-11-03 19:26:37 -08:00
Paul E. McKenney	62db99f478	cpu: Avoid puts_pending overflow A long string of get_online_cpus() with each followed by a put_online_cpu() that fails to acquire cpu_hotplug.lock can result in overflow of the cpu_hotplug.puts_pending counter. Although this is perhaps improbably, a system with absolutely no CPU-hotplug operations will have an arbitrarily long time in which this overflow could occur. This commit therefore adds overflow checks to get_online_cpus() and try_get_online_cpus(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>	2014-11-03 19:21:01 -08:00
Paul E. McKenney	8fa7845df5	rcu: Remove "cpu" argument to rcu_cleanup_after_idle() The "cpu" argument to rcu_cleanup_after_idle() is always the current CPU, so drop it. This moves the smp_processor_id() from the caller to rcu_cleanup_after_idle(), saving argument-passing overhead. Again, the anticipated cross-CPU uses of these functions has been replaced by NO_HZ_FULL. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>	2014-11-03 19:20:56 -08:00
Paul E. McKenney	198bbf8127	rcu: Remove "cpu" argument to rcu_prepare_for_idle() The "cpu" argument to rcu_prepare_for_idle() is always the current CPU, so drop it. This in turn allows two of the uses of "cpu" in this function to be replaced with a this_cpu_ptr() and the third by smp_processor_id(), replacing that of the call to rcu_prepare_for_idle(). Again, the anticipated cross-CPU uses of these functions has been replaced by NO_HZ_FULL. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>	2014-11-03 19:20:49 -08:00

... 5 6 7 8 9 ...

19884 Commits