next-20140324 currently fails compiling celleb_defconfig with:
arch/powerpc/include/asm/opal.h:894:42: error: 'struct notifier_block' declared inside parameter list [-Werror]
arch/powerpc/include/asm/opal.h:894:42: error: its scope is only this definition or declaration, which is probably not what you want [-Werror]
arch/powerpc/include/asm/opal.h:896:14: error: 'struct notifier_block' declared inside parameter list [-Werror]
This is due to a missing include which is added here.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Just about all of these have been converted to __func__,
so convert the last uses.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Recent CPUs support quad word load and store instructions. Add
support to the alignment handler for them.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This call will not be understood by OPAL, and cause it to add an error
to it's log. Among other things, this is useful for testing the
behaviour of the log as it fills up.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
OPAL provides an in-memory circular buffer containing a message log
populated with various runtime messages produced by the firmware.
Provide a sysfs interface /sys/firmware/opal/msglog for userspace to
view the messages.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently we wrongly allocate mc_recoverable_range buffer (to hold
recoverable ranges) based on size of the property "mcheck-recoverable-ranges".
This results in allocating less memory to hold available recoverable range
entries from /proc/device-tree/ibm,opal/mcheck-recoverable-ranges.
This patch fixes this issue by allocating mc_recoverable_range buffer based
on number of entries of recoverable ranges instead of device property size.
Without this change we end up allocating less memory and run into memory
corruption issue.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In:
commit 742415d6b6
Author: Michael Neuling <mikey@neuling.org>
powerpc: Turn syscall handler into macros
We converted the syscall entry code onto macros, but in doing this we
introduced some cruft that's never run and should never have been added.
This removes that code.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
One OPAL call and one device tree property needed byte swapping.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Merge a few more patches from Andrew Morton:
"A few leftovers"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
fs/ncpfs/dir.c: fix indenting in ncp_lookup()
ncpfs/inode.c: fix mismatch printk formats and arguments
ncpfs: remove now unused PRINTK macro
ncpfs: convert PPRINTK to ncp_vdbg
ncpfs: convert DPRINTK/DDPRINTK to ncp_dbg
ncpfs: Add pr_fmt and convert printks to pr_<level>
arch/x86/mm/kmemcheck/kmemcheck.c: use kstrtoint() instead of sscanf()
lib/percpu_counter.c: fix bad percpu counter state during suspend
autofs4: check dev ioctl size before allocating
mm: vmscan: do not swap anon pages just because free+file is low
Pull second set of arm64 updates from Catalin Marinas:
"A second pull request for this merging window, mainly with fixes and
docs clarification:
- Documentation clarification on CPU topology and booting
requirements
- Additional cache flushing during boot (needed in the presence of
external caches or under virtualisation)
- DMA range invalidation fix for non cache line aligned buffers
- Build failure fix with !COMPAT
- Kconfig update for STRICT_DEVMEM"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Fix DMA range invalidation for cache line unaligned buffers
arm64: Add missing Kconfig for CONFIG_STRICT_DEVMEM
arm64: fix !CONFIG_COMPAT build failures
Revert "arm64: virt: ensure visibility of __boot_cpu_mode"
arm64: Relax the kernel cache requirements for boot
arm64: Update the TCR_EL1 translation granule definitions for 16K pages
ARM: topology: Make it clear that all CPUs need to be described
Pull second set of s390 patches from Martin Schwidefsky:
"The second part of Heikos uaccess rework, the page table walker for
uaccess is now a thing of the past (yay!)
The code change to fix the theoretical TLB flush problem allows us to
add a TLB flush optimization for zEC12, this machine has new
instructions that allow to do CPU local TLB flushes for single pages
and for all pages of a specific address space.
Plus the usual bug fixing and some more cleanup"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/uaccess: rework uaccess code - fix locking issues
s390/mm,tlb: optimize TLB flushing for zEC12
s390/mm,tlb: safeguard against speculative TLB creation
s390/irq: Use defines for external interruption codes
s390/irq: Add defines for external interruption codes
s390/sclp: add timeout for queued requests
kvm/s390: also set guest pages back to stable on kexec/kdump
lcs: Add missing destroy_timer_on_stack()
s390/tape: Add missing destroy_timer_on_stack()
s390/tape: Use del_timer_sync()
s390/3270: fix crash with multiple reset device requests
s390/bitops,atomic: add missing memory barriers
s390/zcrypt: add length check for aligned data to avoid overflow in msg-type 6
arm_pm_restart(), arm_pm_idle() and soft_restart() are all declared in
system_misc.h, but this file is not included in process.c. Add this
missing include. Found via sparse:
arch/arm/kernel/process.c:98:6: warning: symbol 'soft_restart' was not declared. Should it be static?
arch/arm/kernel/process.c:127:6: warning: symbol 'arm_pm_restart' was not declared. Should it be static?
arch/arm/kernel/process.c:134:6: warning: symbol 'arm_pm_idle' was not declared. Should it be static?
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* pm-cpufreq:
cpufreq: ppc: Remove duplicate inclusion of fsl_soc.h
cpufreq: create another field .flags in cpufreq_frequency_table
cpufreq: use kzalloc() to allocate memory for cpufreq_frequency_table
cpufreq: don't print value of .driver_data from core
cpufreq: ia64: don't set .driver_data to index
cpufreq: powernv: Select CPUFreq related Kconfig options for powernv
cpufreq: powernv: Use cpufreq_frequency_table.driver_data to store pstate ids
cpufreq: powernv: cpufreq driver for powernv platform
cpufreq: at32ap: don't declare local variable as static
cpufreq: loongson2_cpufreq: don't declare local variable as static
cpufreq: unicore32: fix typo issue for 'clk'
cpufreq: exynos: Disable on multiplatform build
Pull intel_idle and turbostat material for v3.15-rc1 from Len Brown.
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
intel_idle: fine-tune IVT residency targets
tools/power turbostat: Run on Broadwell
tools/power turbostat: simplify output, add Avg_MHz
intel_idle: Add CPU model 54 (Atom N2000 series)
intel_idle: support Bay Trail
intel_idle: allow sparse sub-state numbering, for Bay Trail
ACPI idle: permit sparse C-state sub-state numbers
If the buffer needing cache invalidation for inbound DMA does start or
end on a cache line aligned address, we need to use the non-destructive
clean&invalidate operation. This issue was introduced by commit
7363590d2c (arm64: Implement coherent DMA API based on swiotlb).
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Jon Medhurst (Tixy) <tixy@linaro.org>
Pull ARC changes from Vineet Gupta:
- Support for external initrd from Noam
- Fix broken serial console in nsimosci Virtual Platform
- Reuse of ENTRY/END assembler macros across hand asm code
- Other minor fixes here and there
* tag 'arc-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: [nsimosci] Unbork console
ARC: [nsimosci] Change .dts to use generic 8250 UART
ARC: [SMP] General Fixes
ARC: Remove unused DT template file
ARC: [clockevent] simplify timer ISR
ARC: [clockevent] can't be SoC specific
ARC: Remove ARC_HAS_COH_RTSC
ARC: switch to generic ENTRY/END assembler annotations
ARC: support external initrd
ARC: add uImage to .gitignore
ARC: [arcfpga] Fix __initconst data const-correctness
Fix probable typo of renesas,groups in the lager dt. The kernel has no
renesas,gpios but this should match renesas,groups.
Signed-off-by: Rob Taylor <rob.taylor@codethink.co.uk>
[ben.dooks@codethink.co.uk: fixup description]
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Merge second patch-bomb from Andrew Morton:
- the rest of MM
- zram updates
- zswap updates
- exit
- procfs
- exec
- wait
- crash dump
- lib/idr
- rapidio
- adfs, affs, bfs, ufs
- cris
- Kconfig things
- initramfs
- small amount of IPC material
- percpu enhancements
- early ioremap support
- various other misc things
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (156 commits)
MAINTAINERS: update Intel C600 SAS driver maintainers
fs/ufs: remove unused ufs_super_block_third pointer
fs/ufs: remove unused ufs_super_block_second pointer
fs/ufs: remove unused ufs_super_block_first pointer
fs/ufs/super.c: add __init to init_inodecache()
doc/kernel-parameters.txt: add early_ioremap_debug
arm64: add early_ioremap support
arm64: initialize pgprot info earlier in boot
x86: use generic early_ioremap
mm: create generic early_ioremap() support
x86/mm: sparse warning fix for early_memremap
lglock: map to spinlock when !CONFIG_SMP
percpu: add preemption checks to __this_cpu ops
vmstat: use raw_cpu_ops to avoid false positives on preemption checks
slub: use raw_cpu_inc for incrementing statistics
net: replace __this_cpu_inc in route.c with raw_cpu_inc
modules: use raw_cpu_write for initialization of per cpu refcount.
mm: use raw_cpu ops for determining current NUMA node
percpu: add raw_cpu_ops
slub: fix leak of 'name' in sysfs_slab_add
...
Presently, paging_init() calls init_mem_pgprot() to initialize pgprot
values used by macros such as PAGE_KERNEL, PAGE_KERNEL_EXEC, etc.
The new fixmap and early_ioremap support also needs to use these macros
before paging_init() is called. This patch moves the init_mem_pgprot()
call out of paging_init() and into setup_arch() so that pgprot_default
gets initialized in time for fixmap and early_ioremap.
Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch series takes the common bits from the x86 early ioremap
implementation and creates a generic implementation which may be used by
other architectures. The early ioremap interfaces are intended for
situations where boot code needs to make temporary virtual mappings
before the normal ioremap interfaces are available. Typically, this
means before paging_init() has run.
This patch (of 6):
There's a lot of sparse warnings for code like below: void *a =
early_memremap(phys_addr, size);
early_memremap intend to map kernel memory with ioremap facility, the
return pointer should be a kernel ram pointer instead of iomem one.
For making the function clearer and supressing sparse warnings this patch
do below two things:
1. cast to (__force void *) for the return value of early_memremap
2. add early_memunmap function and pass (__force void __iomem *) to iounmap
From Boris:
"Ingo told me yesterday, it makes sense too. I'd guess we can try it.
FWIW, all callers of early_memremap use the memory they get remapped
as normal memory so we should be safe"
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kernel has never been audited to ensure that this_cpu operations are
consistently used throughout the kernel. The code generated in many
places can be improved through the use of this_cpu operations (which
uses a segment register for relocation of per cpu offsets instead of
performing address calculations).
The patch set also addresses various consistency issues in general with
the per cpu macros.
A. The semantics of __this_cpu_ptr() differs from this_cpu_ptr only
because checks are skipped. This is typically shown through a raw_
prefix. So this patch set changes the places where __this_cpu_ptr()
is used to raw_cpu_ptr().
B. There has been the long term wish by some that __this_cpu operations
would check for preemption. However, there are cases where preemption
checks need to be skipped. This patch set adds raw_cpu operations that
do not check for preemption and then adds preemption checks to the
__this_cpu operations.
C. The use of __get_cpu_var is always a reference to a percpu variable
that can also be handled via a this_cpu operation. This patch set
replaces all uses of __get_cpu_var with this_cpu operations.
D. We can then use this_cpu RMW operations in various places replacing
sequences of instructions by a single one.
E. The use of this_cpu operations throughout will allow other arches than
x86 to implement optimized references and RMV operations to work with
per cpu local data.
F. The use of this_cpu operations opens up the possibility to
further optimize code that relies on synchronization through
per cpu data.
The patch set works in a couple of stages:
I. Patch 1 adds the additional raw_cpu operations and raw_cpu_ptr().
Also converts the existing __this_cpu_xx_# primitive in the x86
code to raw_cpu_xx_#.
II. Patch 2-4 use the raw_cpu operations in places that would give
us false positives once they are enabled.
III. Patch 5 adds preemption checks to __this_cpu operations to allow
checking if preemption is properly disabled when these functions
are used.
IV. Patches 6-20 are patches that simply replace uses of __get_cpu_var
with this_cpu_ptr. They do not depend on any changes to the percpu
code. No preemption tests are skipped if they are applied.
V. Patches 21-46 are conversion patches that use this_cpu operations
in various kernel subsystems/drivers or arch code.
VI. Patches 47/48 (not included in this series) remove no longer used
functions (__this_cpu_ptr and __get_cpu_var). These should only be
applied after all the conversion patches have made it and after we
have done additional passes through the kernel to ensure that none of
the uses of these functions remain.
This patch (of 46):
The patches following this one will add preemption checks to __this_cpu
ops so we need to have an alternative way to use this_cpu operations
without preemption checks.
raw_cpu_ops will be the basis for all other ops since these will be the
operations that do not implement any checks.
Primitive operations are renamed by this patch from __this_cpu_xxx to
raw_cpu_xxxx.
Also change the uses of the x86 percpu primitives in preempt.h.
These depend directly on asm/percpu.h (header #include nesting issue).
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Alex Shi <alex.shi@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bryan Wu <cooloney@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: David Daney <david.daney@cavium.com>
Cc: David Miller <davem@davemloft.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Hedi Berriche <hedi@sgi.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Robert Richter <rric@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the renamed symbol is defined lib/iomap.c implements ioport_map and
ioport_unmap and currently (nearly) all platforms define the port
accessor functions outb/inb and friend unconditionally. So
HAS_IOPORT_MAP is the better name for this.
Consequently NO_IOPORT is renamed to NO_IOPORT_MAP.
The motivation for this change is to reintroduce a symbol HAS_IOPORT
that signals if outb/int et al are available. I will address that at
least one merge window later though to keep surprises to a minimum and
catch new introductions of (HAS|NO)_IOPORT.
The changes in this commit were done using:
$ git grep -l -E '(NO|HAS)_IOPORT' | xargs perl -p -i -e 's/\b((?:CONFIG_)?(?:NO|HAS)_IOPORT)\b/$1_MAP/'
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This ensures that BUG() always has a definition that causes a trap (via
an undefined instruction), and that the compiler still recognizes the
code following BUG() as unreachable, avoiding warnings that would
otherwise appear (such as on non-void functions that don't return a
value after BUG()).
In addition to saving a few bytes over the generic infinite-loop
implementation, this implementation traps rather than looping, which
potentially allows for better error-recovery behavior (such as by
rebooting).
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix breakage which will be exposed by the patch "kconfig: make allnoconfig
disable options behind EMBEDDED and EXPERT".
arch/powerpc/kernel/mce.c, compiled in for PPC_BOOK3S_64, calls
functions only built when IRQ_WORK, so select it. Fixes the following
build error:
arch/powerpc/kernel/built-in.o: In function `.machine_check_queue_event':
(.text+0x11260): undefined reference to `.irq_work_queue'
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix breakage which will be exposed by the patch "kconfig: make allnoconfig
disable options behind EMBEDDED and EXPERT".
arch/ia64/kernel/unaligned.c uses tty_write_message to print an
unaligned access exception to the TTY of the current user process.
Enable TTY to prevent a build error.
Minimal fix, on the basis that few people on ia64 will care deeply about
kernel size enough to turn off TTY. Ideally, I'd instead suggest
dropping the tty_write_message entirely, and just leaving the printk.
Bonus: no need to sprintf first.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix breakage which will be exposed by the patch "kconfig: make allnoconfig
disable options behind EMBEDDED and EXPERT".
Now allnoconfig started disabling CONFIG_PROC_FS:
arch/cris/kernel/built-in.o:(.rodata+0xc): undefined reference to `show_cpuinfo'
make: *** [vmlinux] Error 1
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix breakage which will be exposed by the patch "kconfig: make allnoconfig
disable options behind EMBEDDED and EXPERT".
arch/cris/arch-v10/kernel/debugport.c, compiled in unconditionally with
ETRAX_ARCH_V10, requires TTY, so select TTY to avoid a build failure.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch removes an artificial RapidIO bus root device and establishes
actual device hierarchy by providing reference to real parent devices.
It also introduces device class for RapidIO controller devices (on-chip
or an eternal bridge, known as "mport").
Existing implementation was sufficient for SoC-based platforms that have
a single RapidIO controller. With introduction of devices using
multiple RapidIO controllers and PCIe-to-RapidIO bridges the old scheme
is very limiting or does not work at all. The implemented changes allow
to properly reference platform's local RapidIO mport devices and provide
device details needed for upper layers.
This change to RapidIO device hierarchy does not break any known
existing kernel or user space interfaces.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Stef van Os <stef.van.os@prodrive-technologies.com>
Cc: Jerry Jacobs <jerry.jacobs@prodrive-technologies.com>
Cc: Arno Tiemersma <arno.tiemersma@prodrive-technologies.com>
Cc: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch is a continuation of efforts trying to optimize find_vma(),
avoiding potentially expensive rbtree walks to locate a vma upon faults.
The original approach (https://lkml.org/lkml/2013/11/1/410), where the
largest vma was also cached, ended up being too specific and random,
thus further comparison with other approaches were needed. There are
two things to consider when dealing with this, the cache hit rate and
the latency of find_vma(). Improving the hit-rate does not necessarily
translate in finding the vma any faster, as the overhead of any fancy
caching schemes can be too high to consider.
We currently cache the last used vma for the whole address space, which
provides a nice optimization, reducing the total cycles in find_vma() by
up to 250%, for workloads with good locality. On the other hand, this
simple scheme is pretty much useless for workloads with poor locality.
Analyzing ebizzy runs shows that, no matter how many threads are
running, the mmap_cache hit rate is less than 2%, and in many situations
below 1%.
The proposed approach is to replace this scheme with a small per-thread
cache, maximizing hit rates at a very low maintenance cost.
Invalidations are performed by simply bumping up a 32-bit sequence
number. The only expensive operation is in the rare case of a seq
number overflow, where all caches that share the same address space are
flushed. Upon a miss, the proposed replacement policy is based on the
page number that contains the virtual address in question. Concretely,
the following results are seen on an 80 core, 8 socket x86-64 box:
1) System bootup: Most programs are single threaded, so the per-thread
scheme does improve ~50% hit rate by just adding a few more slots to
the cache.
+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline | 50.61% | 19.90 |
| patched | 73.45% | 13.58 |
+----------------+----------+------------------+
2) Kernel build: This one is already pretty good with the current
approach as we're dealing with good locality.
+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline | 75.28% | 11.03 |
| patched | 88.09% | 9.31 |
+----------------+----------+------------------+
3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.
+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline | 70.66% | 17.14 |
| patched | 91.15% | 12.57 |
+----------------+----------+------------------+
4) Ebizzy: There's a fair amount of variation from run to run, but this
approach always shows nearly perfect hit rates, while baseline is just
about non-existent. The amounts of cycles can fluctuate between
anywhere from ~60 to ~116 for the baseline scheme, but this approach
reduces it considerably. For instance, with 80 threads:
+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline | 1.06% | 91.54 |
| patched | 99.97% | 14.18 |
+----------------+----------+------------------+
[akpm@linux-foundation.org: fix nommu build, per Davidlohr]
[akpm@linux-foundation.org: document vmacache_valid() logic]
[akpm@linux-foundation.org: attempt to untangle header files]
[akpm@linux-foundation.org: add vmacache_find() BUG_ON]
[hughd@google.com: add vmacache_valid_mm() (from Oleg)]
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: adjust and enhance comments]
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The main motivation behind this patch is to provide a way to disable THP
for jobs where the code cannot be modified, and using a malloc hook with
madvise is not an option (i.e. statically allocated data). This patch
allows us to do just that, without affecting other jobs running on the
system.
We need to do this sort of thing for jobs where THP hurts performance,
due to the possibility of increased remote memory accesses that can be
created by situations such as the following:
When you touch 1 byte of an untouched, contiguous 2MB chunk, a THP will
be handed out, and the THP will be stuck on whatever node the chunk was
originally referenced from. If many remote nodes need to do work on
that same chunk, they'll be making remote accesses.
With THP disabled, 4K pages can be handed out to separate nodes as
they're needed, greatly reducing the amount of remote accesses to
memory.
This patch is based on some of my work combined with some
suggestions/patches given by Oleg Nesterov. The main goal here is to
add a prctl switch to allow us to disable to THP on a per mm_struct
basis.
Here's a bit of test data with the new patch in place...
First with the flag unset:
# perf stat -a ./prctl_wrapper_mmv3 0 ./thp_pthread -C 0 -m 0 -c 512 -b 256g
Setting thp_disabled for this task...
thp_disable: 0
Set thp_disabled state to 0
Process pid = 18027
PF/
MAX MIN TOTCPU/ TOT_PF/ TOT_PF/ WSEC/
TYPE: CPUS WALL WALL SYS USER TOTCPU CPU WALL_SEC SYS_SEC CPU NODES
512 1.120 0.060 0.000 0.110 0.110 0.000 28571428864 -9223372036854775808 55803572 23
Performance counter stats for './prctl_wrapper_mmv3_hack 0 ./thp_pthread -C 0 -m 0 -c 512 -b 256g':
273719072.841402 task-clock # 641.026 CPUs utilized [100.00%]
1,008,986 context-switches # 0.000 M/sec [100.00%]
7,717 CPU-migrations # 0.000 M/sec [100.00%]
1,698,932 page-faults # 0.000 M/sec
355,222,544,890,379 cycles # 1.298 GHz [100.00%]
536,445,412,234,588 stalled-cycles-frontend # 151.02% frontend cycles idle [100.00%]
409,110,531,310,223 stalled-cycles-backend # 115.17% backend cycles idle [100.00%]
148,286,797,266,411 instructions # 0.42 insns per cycle
# 3.62 stalled cycles per insn [100.00%]
27,061,793,159,503 branches # 98.867 M/sec [100.00%]
1,188,655,196 branch-misses # 0.00% of all branches
427.001706337 seconds time elapsed
Now with the flag set:
# perf stat -a ./prctl_wrapper_mmv3 1 ./thp_pthread -C 0 -m 0 -c 512 -b 256g
Setting thp_disabled for this task...
thp_disable: 1
Set thp_disabled state to 1
Process pid = 144957
PF/
MAX MIN TOTCPU/ TOT_PF/ TOT_PF/ WSEC/
TYPE: CPUS WALL WALL SYS USER TOTCPU CPU WALL_SEC SYS_SEC CPU NODES
512 0.620 0.260 0.250 0.320 0.570 0.001 51612901376 128000000000 100806448 23
Performance counter stats for './prctl_wrapper_mmv3_hack 1 ./thp_pthread -C 0 -m 0 -c 512 -b 256g':
138789390.540183 task-clock # 641.959 CPUs utilized [100.00%]
534,205 context-switches # 0.000 M/sec [100.00%]
4,595 CPU-migrations # 0.000 M/sec [100.00%]
63,133,119 page-faults # 0.000 M/sec
147,977,747,269,768 cycles # 1.066 GHz [100.00%]
200,524,196,493,108 stalled-cycles-frontend # 135.51% frontend cycles idle [100.00%]
105,175,163,716,388 stalled-cycles-backend # 71.07% backend cycles idle [100.00%]
180,916,213,503,160 instructions # 1.22 insns per cycle
# 1.11 stalled cycles per insn [100.00%]
26,999,511,005,868 branches # 194.536 M/sec [100.00%]
714,066,351 branch-misses # 0.00% of all branches
216.196778807 seconds time elapsed
As with previous versions of the patch, We're getting about a 2x
performance increase here. Here's a link to the test case I used, along
with the little wrapper to activate the flag:
http://oss.sgi.com/projects/memtests/thp_pthread_mmprctlv3.tar.gz
This patch (of 3):
Revert commit 8e72033f2a and add in code to fix up any issues caused
by the revert.
The revert is necessary because hugepage_madvise would return -EINVAL
when VM_NOHUGEPAGE is set, which will break subsequent chunks of this
patch set.
Here's a snip of an e-mail from Gerald detailing the original purpose of
this code, and providing justification for the revert:
"The intent of commit 8e72033f2a was to guard against any future
programming errors that may result in an madvice(MADV_HUGEPAGE) on
guest mappings, which would crash the kernel.
Martin suggested adding the bit to arch/s390/mm/pgtable.c, if
8e72033f2a was to be reverted, because that check will also prevent
a kernel crash in the case described above, it will now send a
SIGSEGV instead.
This would now also allow to do the madvise on other parts, if
needed, so it is a more flexible approach. One could also say that
it would have been better to do it this way right from the
beginning..."
Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull CPU hotplug notifiers registration fixes from Rafael Wysocki:
"The purpose of this single series of commits from Srivatsa S Bhat
(with a small piece from Gautham R Shenoy) touching multiple
subsystems that use CPU hotplug notifiers is to provide a way to
register them that will not lead to deadlocks with CPU online/offline
operations as described in the changelog of commit 93ae4f978c ("CPU
hotplug: Provide lockless versions of callback registration
functions").
The first three commits in the series introduce the API and document
it and the rest simply goes through the users of CPU hotplug notifiers
and converts them to using the new method"
* tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (52 commits)
net/iucv/iucv.c: Fix CPU hotplug callback registration
net/core/flow.c: Fix CPU hotplug callback registration
mm, zswap: Fix CPU hotplug callback registration
mm, vmstat: Fix CPU hotplug callback registration
profile: Fix CPU hotplug callback registration
trace, ring-buffer: Fix CPU hotplug callback registration
xen, balloon: Fix CPU hotplug callback registration
hwmon, via-cputemp: Fix CPU hotplug callback registration
hwmon, coretemp: Fix CPU hotplug callback registration
thermal, x86-pkg-temp: Fix CPU hotplug callback registration
octeon, watchdog: Fix CPU hotplug callback registration
oprofile, nmi-timer: Fix CPU hotplug callback registration
intel-idle: Fix CPU hotplug callback registration
clocksource, dummy-timer: Fix CPU hotplug callback registration
drivers/base/topology.c: Fix CPU hotplug callback registration
acpi-cpufreq: Fix CPU hotplug callback registration
zsmalloc: Fix CPU hotplug callback registration
scsi, fcoe: Fix CPU hotplug callback registration
scsi, bnx2fc: Fix CPU hotplug callback registration
scsi, bnx2i: Fix CPU hotplug callback registration
...
Pull OMAP fbdev changes from Tomi Valkeinen:
"This is based on the already pulled fbdev-main changes, and this also
merges .dts branch from Tony Lindgren (which has also been pulled), so
that I was able to add the display related .dts changes.
This contains OMAP related fbdev changes for 3.15. The bulk of the
patches are for adding Device Tree support for OMAP Display Subsystem:
- SoCs: OMAP2/3/4
- Boards: OMAP4 Panda, OMAP4 SDP, OMAP3 Beagle, OMAP3 Beagle-xM,
OMAP3 IGEP0020, OMAP3 N900
- Devices: TFP410 Encoder, tpd12s015 HDMI companion chip, Sony
acx565akm panel, MIPI DSI Command mode panel and HDMI, DVI and
Analog TV connectors"
* tag 'fbdev-omap-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: (45 commits)
OMAPDSS: HDMI: fix interlace output
OMAPDSS: add missing __init for dss_init_ports
ARM: OMAP2+: remove pdata quirks for displays
OMAPDSS: remove DT hacks for regulators
Doc/DT: Add DT binding documentation for tpd12s015 encoder
Doc/DT: Add DT binding documentation for TFP410 encoder
Doc/DT: Add DT binding documentation for Sony acx565akm panel
Doc/DT: Add DT binding documentation for MIPI DSI CM Panel
Doc/DT: Add DT binding documentation for HDMI Connector
Doc/DT: Add DT binding documentation for DVI Connector
Doc/DT: Add DT binding documentation for Analog TV Connector
ARM: omap3-n900.dts: add display information
ARM: omap3-igep0020.dts: add display information
ARM: omap3-beagle-xm.dts: add display information
ARM: omap3-beagle.dts: add display information
ARM: omap4-sdp.dts: add display information
Doc/DT: Add DT binding documentation for OMAP DSS
OMAPDSS: acx565akm: Add DT support
OMAPDSS: connector-analog-tv: Add DT support
OMAPDSS: hdmi-connector: Add DT support
...
Pull MFD updates from Lee Jones:
"Changes to existing drivers:
- Use of managed resources - omap, twl4030, ti_am335x_tscadc
- Advanced error handling - omap
- Rework clk management - omap
- Device Tree (re-)work - tc3589x, pm8921, da9055, sec
- IRC management overhaul and !BROKEN - pm8921
- Convert to regmap - ssbi, pm8921
- Use simple power-management ops - ucb1x00
- Include file clean-up - adp5520, cs5535, janz, lpc_ich,
- lpc_sch, max14577, mcp-sa11x0, pcf50633-adc, rc5t583,
rdc321x-southbridge, retu, smsc-ece1099, ti-ssp, ti_am335x_tscadc,
tps65912, vexpress-config, wm8350, ywm8350
- Various bug fixes across the subsystem
- NULL/invalid pointer dereference prevention
- Resource leak mitigation,
- Variable used initialised
- Staticise various containers
- Enforce return value checks
New drivers/supported devices:
- Add support for s2mps14 and s2mpa01 to sec
- Add support for da9063 (v5) to da9063
- Add support for atom-c2000 to gpio-ich
- Add support for come-{mbt10,cbt6,chl6} to kempld
- Add support for da9053 to da9052
- Add support for itco-wdt (v3) and baytrail to lpc_ich
- Add new drivers for tps65218, rtsx_usb, bcm590xx
(Re-)moved drivers:
- twl4030 ==> drivers/iio
- ti-ssp ==> /dev/null"
* tag 'mfd-for-linus-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (103 commits)
mfd: wm5110: Correct default for HEADPHONE_DETECT_1
mfd: arizona: Correct small errors in the DT binding documentation
mfd: arizona: Mark DSP clocking register as volatile
mfd: devicetree: bindings: Add pm8xxx RTC description
mfd: kempld-core: Fix potential hang-up during boot
mfd: sec-core: Fix uninitialized 'regmap_rtc' on S2MPA01
mfd: tps65910: Fix regmap_irq_chip_data leak on mfd_add_devices fail
mfd: tps65910: Fix possible invalid pointer dereference on regmap_add_irq_chip fail
mfd: sec-core: Fix I2C dummy device resource leak on probe failure
mfd: sec-core: Add of_compatible strings for clock MFD cells
mfd: Remove obsolete ti-ssp driver
Documentation: mfd: s2mps11: Describe S5M8767 and S2MPS14 clocks
mfd: bcm590xx: Fix type argument for module device table
mfd: lpc_ich: Add support for Intel Bay Trail SoC
mfd: lpc_ich: Add support for NM10 GPIO
mfd: lpc_ich: Change Avoton to iTCO v3
watchdog: iTCO_wdt: Add support for v3 silicon
mfd: lpc_ich: Add support for iTCO v3
mfd: lpc_ich: Remove lpc_ich_cfg struct use
mfd: lpc_ich: Only configure watchdog or GPIO when present
...
The Kconfig for CONFIG_STRICT_DEVMEM is missing despite being
used in mmap.c. Add it.
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Strictly speaking this call is a no-op on the platform where dcscb.c is
used since it only has architected caches. The call was there as a hint
to people inspired by this code when writing their own backend, but the
hint might not always be correct.
For example, if a PL310 were to be used it wouldn't be safe to call
the regular outer_flush_all() as atomic instructions for locking
are involved in that case and those instructions cannot be assumed to
still be operational after v7_exit_coherency_flush() has returned.
Given no other CPUs (in the cluster) should be running at that point
then standard concurrency concerns wouldn't apply.
So let's simply kill this call for now and enhance the existing comment.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>