Use the pci_info() and pci_err() wrappers for dev_printk() when possible.
Log PCI device vendor and device IDs and BAR information in the same format
used by other architectures.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Most architectures turn on PCI_COMMAND_IO and PCI_COMMAND_MEMORY in
pci_enable_device() when a driver claims the device. Sparc PCIC did it in
pcibios_fixup_bus(), which is called during enumeration, before any drivers
are attached.
Implement pcibios_enable_device() for PCIC so it will do the same as other
architectures. This implementation is copied verbatim from sparc64.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Most architectures turn on PCI_COMMAND_IO and PCI_COMMAND_MEMORY in
pci_enable_device() when a driver claims the device. Sparc LEON did it in
pcibios_fixup_bus(), which is called during enumeration, before any drivers
are attached.
Implement pcibios_enable_device() for LEON so it will do the same as other
architectures. This implementation is copied verbatim from sparc64.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Variants of proc_create{,_data} that directly take a seq_file show
callback and drastically reduces the boilerplate code in the callers.
All trivial callers converted over.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Nobody is using it anymore, and it's been abandoned. Since David
is fine with removing it, kill it.
Suggested-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since fe83963b7c ("bpf, sparc64: remove ld_abs/ld_ind") it's not
used anymore therefore remove it.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Benefit from the generic softirq mask implementation that rely on per-CPU
mutators instead of working with raw operators on top of this_cpu_ptr().
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1525786706-22846-10-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In order to consolidate and optimize generic softirq mask accesses, we
first need to convert architectures to use per-cpu operations when
possible.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1525786706-22846-3-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Define this symbol if the architecture either uses 64-bit pointers or the
PHYS_ADDR_T_64BIT is set. This covers 95% of the old arch magic. We only
need an additional select for Xen on ARM (why anyway?), and we now always
set ARCH_DMA_ADDR_T_64BIT on mips boards with 64-bit physical addressing
instead of only doing it when highmem is set.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: James Hogan <jhogan@kernel.org>
This way we have one central definition of it, and user can select it as
needed. Note that we now also always select it when CONFIG_DMA_API_DEBUG
is select, which fixes some incorrect checks in a few network drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
This way we have one central definition of it, and user can select it as
needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
This way we have one central definition of it, and user can select it as
needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
This code is only used by sparc, and all new iommu drivers should use the
drivers/iommu/ framework. Also remove the unused exports.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
There is no arch specific code required for dma-debug, so there is no
need to opt into the support either.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Most mainstream architectures are using 65536 entries, so lets stick to
that. If someone is really desperate to override it that can still be
done through <asm/dma-mapping.h>, but I'd rather see a really good
rationale for that.
dma_debug_init is now called as a core_initcall, which for many
architectures means much earlier, and provides dma-debug functionality
earlier in the boot process. This should be safe as it only relies
on the memory allocator already being available.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Minor conflict, a CHECK was placed into an if() statement
in net-next, whilst a newline was added to that CHECK
call in 'net'. Thanks to Daniel for the merge resolution.
Signed-off-by: David S. Miller <davem@davemloft.net>
This was used by the ide, scsi and networking code in the past to
determine if they should bounce payloads. Now that the dma mapping
always have to support dma to all physical memory (thanks to swiotlb
for non-iommu systems) there is no need to this crude hack any more.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Palmer Dabbelt <palmer@sifive.com> (for riscv)
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Since LD_ABS/LD_IND instructions are now removed from the core and
reimplemented through a combination of inlined BPF instructions and
a slow-path helper, we can get rid of the complexity from sparc64 JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- Fixup license text for oradax driver, from Rob Gardner.
- Release device object with put_device() instead of straight kfree(),
from Arvind Yadav.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: vio: use put_device() instead of kfree()
sparc64: Fix mistake in oradax license text
Never directly free @dev after calling device_register(), even
if it returned an error. Always use put_device() to give up the
reference initialized.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The license text in both oradax files mistakenly specifies "version 3" of
the GNU General Public License. This is corrected to specify "version 2".
Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Jonathan Helman <jonathan.helman@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Filling in struct siginfo before calling force_sig_info a tedious and
error prone process, where once in a great while the wrong fields
are filled out, and siginfo has been inconsistently cleared.
Simplify this process by using the helper force_sig_fault. Which
takes as a parameters all of the information it needs, ensures
all of the fiddly bits of filling in struct siginfo are done properly
and then calls force_sig_info.
In short about a 5 line reduction in code for every time force_sig_info
is called, which makes the calling function clearer.
Cc: David Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Filling in struct siginfo before calling send_sig_info a tedious and
error prone process, where once in a great while the wrong fields
are filled out, and siginfo has been inconsistently cleared.
Simplify this process by using the helper send_sig_fault. Which
takes as a parameters all of the information it needs, ensures
all of the fiddly bits of filling in struct siginfo are done properly
and then calls send_sig_info.
In short about a 5 line reduction in code for every time send_sig_info
is called, which makes the calling function clearer.
Cc: David Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Call clear_siginfo to ensure every stack allocated siginfo is properly
initialized before being passed to the signal sending functions.
Note: It is not safe to depend on C initializers to initialize struct
siginfo on the stack because C is allowed to skip holes when
initializing a structure.
The initialization of struct siginfo in tracehook_report_syscall_exit
was moved from the helper user_single_step_siginfo into
tracehook_report_syscall_exit itself, to make it clear that the local
variable siginfo gets fully initialized.
In a few cases the scope of struct siginfo has been reduced to make it
clear that siginfo siginfo is not used on other paths in the function
in which it is declared.
Instances of using memset to initialize siginfo have been replaced
with calls clear_siginfo for clarity.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
sparc, uses a nonstandard variation of the generic sysvipc
data structures, intended to have the padding moved around
so it can deal with big-endian 32-bit user space that has
64-bit time_t.
Unlike most architectures, sparc actually succeeded in
defining this right for big-endian CPUs, but as everyone else
got it wrong, we just use the same hack everywhere.
This takes just take the same approach here that we have for
the asm-generic headers and adds separate 32-bit fields for the
upper halves of the timestamps, to let libc deal with the mess
in user space.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Starting with commit v4.14-rc1~60^2^2~1, a SIGFPE signal sent via kill
results to wrong values in si_pid and si_uid fields of compat siginfo_t.
This happens due to FPE_FIXME being defined to 0 for sparc, and at the
same time siginfo_layout() introduced by the same commit returns
SIL_FAULT for SIGFPE if si_code == SI_USER and FPE_FIXME is defined to 0.
Fix this regression by removing FPE_FIXME macro and changing all its users
to assign FPE_FLTUNK to si_code instead of FPE_FIXME.
Note that FPE_FLTUNK is a new macro introduced by commit
266da65e91.
Tested with commit v4.16-11958-g16e205cf42da.
This bug was found by strace test suite.
In the discussion about FPE_FLTUNK on sparc David Miller said:
> Eric, feel free to do something similar on Sparc.
Link: https://github.com/strace/strace/issues/21
Fixes: cc731525f2 ("signal: Remove kernel interal si_code magic")
Fixes: 2.3.41
Cc: David Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Conceptually-Acked-By: David Miller <davem@davemloft.net>
Thanks-to: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
We have several files on sparc that include linux/compat.h and expect
asm/compat.h not to be included for 32-bit builds, otherwise we get a
build failure.
Since we need to include asm/compat.h for compat time_t handling on all
32-bit architectures now, this hides some portions of asm/compat.h in
order to let the rest of the file get included.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Previously we unconditionally requested the legacy VGA framebuffer (bus
address 0xa0000-0xbffff) before we even know what PCI devices are present,
in these paths:
pci_fire_pbm_init, schizo_pbm_init, pci_sun4v_pbm_init, psycho_pbm_init_common
pci_determine_mem_io_space
pci_register_legacy_regions
p->start = mem_res->start + 0xa0000
request_resource(mem_res, p) # claim VGA framebuffer
pci_scan_one_pbm
pci_of_scan_bus # scan DT for PCI devices
pci_claim_bus_resources # claim PCI device BARs
If we found a PCI device with a BAR or bridge window that overlapped the
framebuffer area, we complained about not being able to claim the BAR,
e.g.,
pci 0000:00:01.0: can't claim BAR 8 [mem 0x1ff00000000-0x1ffbfffffff]: address conflict with Video RAM area [??? 0x1ff000a0000-0x1ff000bffff flags 0x80000000]
pci 0000:02:01.0: can't claim BAR 8 [mem 0x1ff00100000-0x1ff028fffff]: no compatible bridge window
pci 0000:03:0f.0: can't claim BAR 8 [mem 0x1ff00100000-0x1ff028fffff]: no compatible bridge window
pci 0000:04:04.0: can't claim BAR 1 [mem 0x1ff02808000-0x1ff02808fff]: no compatible bridge window
This may make the conflicting device unusable because we try not to enable
devices that have unassigned or conflicting BARs, e.g.,
qla1280 0000:04:04.0: can't ioremap BAR 1: [mem size 0x00001000]
qla1280: Unable to map I/O memory
If there is no VGA device in the same PCI segment, there's no reason to
reserve the framebuffer and there's no conflict. If there *is* a VGA
device in the same segment, both the VGA device and the device with an
overlapping BAR may respond to the framebuffer addresses, which may cause
bus errors.
Request the legacy framebuffer area only when we actually find a VGA
device. The fact that VGA devices use the legacy framebuffer even though
it's not reported in a BAR is not sparc-specific, so the reservation of
that area could be made more generic in the PCI core eventually.
Note that on some systems, e.g., Blade 100, we still report a conflict
between an ISA bridge (00:07.0) and a VGA device (00:13.0):
pci_bus 0000:00: root bus resource [mem 0x1ff00000000-0x1ffffffffff] (bus address [0x00000000-0xffffffff])
pci 0000:00:07.0: reg 0x14: [mem 0x1ff00000000-0x1ff000fffff]
pci 0000:00:13.0: can't claim VGA legacy [mem 0x1ff000a0000-0x1ff000bffff]: address conflict with 0000:00:07.0 [mem 0x1ff00000000-0x1ff000fffff]
This is probably harmless, but if the VGA device and something behind the
ISA bridge both responded to reads of the framebuffer, it would cause a bus
error.
Link: https://lkml.kernel.org/r/alpine.LRH.2.21.1804112323170.25495@math.ut.ee
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=117191#c35
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
- pass HOSTLDFLAGS when compiling single .c host programs
- build genksyms lexer and parser files instead of using shipped
versions
- rename *-asn1.[ch] to *.asn1.[ch] for suffix consistency
- let the top .gitignore globally ignore artifacts generated by
flex, bison, and asn1_compiler
- let the top Makefile globally clean artifacts generated by
flex, bison, and asn1_compiler
- use safer .SECONDARY marker instead of .PRECIOUS to prevent
intermediate files from being removed
- support -fmacro-prefix-map option to make __FILE__ a relative path
- fix # escaping to prepare for the future GNU Make release
- clean up deb-pkg by using debian tools instead of handrolled
source/changes generation
- improve rpm-pkg portability by supporting kernel-install as a
fallback of new-kernel-pkg
- extend Kconfig listnewconfig target to provide more information
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJa0krLAAoJED2LAQed4NsGyCAP/3Vsb8A4sea7sE3LV6/aFUJp
WcAm6PXcip1MXy7GI5yxFciwen3Z3ghQUer7fJKDcHR5c4mRSfKaqWp+TLHd6uux
7I4pV0FNx2PapcPu5T7wNZHN96p3xZC0Z66sq9BCZ/+gNyYmZLIDcBUSIOEk0nzJ
IsvD46zy6R6KtEnycShKVscg4JyPXJIw1UBqsPDEFHg5l16ARkghND7e5zTW62Fi
2MqQxNXAksIKpxxoxPH/fIcNp1kFKVxYBH2CW4LQtOjC3GmrozdeV5PUc7yTezPc
dpqOuEcIAbMH91bkvhhF+ZBi34YrxRoT4S8B3G9iCXRz+2LRZZaitqO4dAH8Kjbn
0KjkqzNc5TosJXQ8RPTcQlRBi+JmE1bHxICvTx3XNJcqJMqIH0vs3ez/LJKOwhB4
DbAROoxQNfVcOdouHcx2EuCSdHn24BEyzaGFhi04LACpbRLxr8IJS7hSGXRloBYp
K3ydRvG/dCZjFRTS+xWWSi3Nzjih2mCctQlH3D4nf4M3vtCX+/k5B9IMEYFfHlvL
KoNlK4/1vP/dAJZj0iOqd2ksCA1G6iLoHrFp3E5pdtmb4sVe2Ez3gMt+pxz3htR9
XvjuHOzkWE9eiihs1NsFgQuyP/o3UmNKpDDW0irQ06IFEPXkA/y1mVmeTU3qtrII
ZDiwGozIkMMEy/MLkcjE
=tD6R
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
- pass HOSTLDFLAGS when compiling single .c host programs
- build genksyms lexer and parser files instead of using shipped
versions
- rename *-asn1.[ch] to *.asn1.[ch] for suffix consistency
- let the top .gitignore globally ignore artifacts generated by flex,
bison, and asn1_compiler
- let the top Makefile globally clean artifacts generated by flex,
bison, and asn1_compiler
- use safer .SECONDARY marker instead of .PRECIOUS to prevent
intermediate files from being removed
- support -fmacro-prefix-map option to make __FILE__ a relative path
- fix # escaping to prepare for the future GNU Make release
- clean up deb-pkg by using debian tools instead of handrolled
source/changes generation
- improve rpm-pkg portability by supporting kernel-install as a
fallback of new-kernel-pkg
- extend Kconfig listnewconfig target to provide more information
* tag 'kbuild-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: extend output of 'listnewconfig'
kbuild: rpm-pkg: use kernel-install as a fallback for new-kernel-pkg
Kbuild: fix # escaping in .cmd files for future Make
kbuild: deb-pkg: split generating packaging and build
kbuild: use -fmacro-prefix-map to make __FILE__ a relative path
kbuild: mark $(targets) as .SECONDARY and remove .PRECIOUS markers
kbuild: rename *-asn1.[ch] to *.asn1.[ch]
kbuild: clean up *-asn1.[ch] patterns from top-level Makefile
.gitignore: move *-asn1.[ch] patterns to the top-level .gitignore
kbuild: add %.dtb.S and %.dtb to 'targets' automatically
kbuild: add %.lex.c and %.tab.[ch] to 'targets' automatically
genksyms: generate lexer and parser during build instead of shipping
kbuild: clean up *.lex.c and *.tab.[ch] patterns from top-level Makefile
.gitignore: move *.lex.c *.tab.[ch] patterns to the top-level .gitignore
kbuild: use HOSTLDFLAGS for single .c executables
__get_user_pages_fast handles errors differently from
get_user_pages_fast: the former always returns the number of pages
pinned, the later might return a negative error code.
Link: http://lkml.kernel.org/r/1522962072-182137-6-git-send-email-mst@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "exec: Pin stack limit during exec".
Attempts to solve problems with the stack limit changing during exec
continue to be frustrated[1][2]. In addition to the specific issues
around the Stack Clash family of flaws, Andy Lutomirski pointed out[3]
other places during exec where the stack limit is used and is assumed to
be unchanging. Given the many places it gets used and the fact that it
can be manipulated/raced via setrlimit() and prlimit(), I think the only
way to handle this is to move away from the "current" view of the stack
limit and instead attach it to the bprm, and plumb this down into the
functions that need to know the stack limits. This series implements
the approach.
[1] 04e35f4495 ("exec: avoid RLIMIT_STACK races with prlimit()")
[2] 779f4e1c6c ("Revert "exec: avoid RLIMIT_STACK races with prlimit()"")
[3] to security@kernel.org, "Subject: existing rlimit races?"
This patch (of 3):
Since it is possible that the stack rlimit can change externally during
exec (either via another thread calling setrlimit() or another process
calling prlimit()), provide a way to pass the rlimit down into the
per-architecture mm layout functions so that the rlimit can stay in the
bprm structure instead of sitting in the signal structure until exec is
finalized.
Link: http://lkml.kernel.org/r/1518638796-20819-2-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Greg KH <greg@kroah.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull sparc syscall cleanups from Al Viro:
"sparc syscall stuff - killing pointless wrappers, conversions to
{COMPAT_,}SYSCALL_DEFINE"
* 'misc.sparc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
sparc: get rid of asm wrapper for nis_syscall()
sparc: switch compat {f,}truncate64() to COMPAT_SYSCALL_DEFINE
sparc: switch compat pread64 and pwrite64 to COMPAT_SYSCALL_DEFINE
convert compat sync_file_range() to COMPAT_SYSCALL_DEFINE
switch sparc_remap_file_pages() to SYSCALL_DEFINE
sparc: get rid of memory_ordering(2) wrapper
sparc: trivial conversions to {COMPAT_,}SYSCALL_DEFINE()
sparc: bury a zombie extern that had been that way for twenty years
sparc: get rid of remaining SIGN... wrappers
sparc: kill useless SIGN... wrappers
sparc: get rid of sys_sparc_pipe() wrappers
GNU Make automatically deletes intermediate files that are updated
in a chain of pattern rules.
Example 1) %.dtb.o <- %.dtb.S <- %.dtb <- %.dts
Example 2) %.o <- %.c <- %.c_shipped
A couple of makefiles mark such targets as .PRECIOUS to prevent Make
from deleting them, but the correct way is to use .SECONDARY.
.SECONDARY
Prerequisites of this special target are treated as intermediate
files but are never automatically deleted.
.PRECIOUS
When make is interrupted during execution, it may delete the target
file it is updating if the file was modified since make started.
If you mark the file as precious, make will never delete the file
if interrupted.
Both can avoid deletion of intermediate files, but the difference is
the behavior when Make is interrupted; .SECONDARY deletes the target,
but .PRECIOUS does not.
The use of .PRECIOUS is relatively rare since we do not want to keep
partially constructed (possibly corrupted) targets.
Another difference is that .PRECIOUS works with pattern rules whereas
.SECONDARY does not.
.PRECIOUS: $(obj)/%.lex.c
works, but
.SECONDARY: $(obj)/%.lex.c
has no effect. However, for the reason above, I do not want to use
.PRECIOUS which could cause obscure build breakage.
The targets specified as .SECONDARY must be explicit. $(targets)
contains all targets that need to include .*.cmd files. So, the
intermediates you want to keep are mostly in there. Therefore, mark
$(targets) as .SECONDARY. It means primary targets are also marked
as .SECONDARY, but I do not see any drawback for this.
I replaced some .SECONDARY / .PRECIOUS markers with 'targets'. This
will make Kbuild search for non-existing .*.cmd files, but this is
not a noticeable performance issue.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Frank Rowand <frowand.list@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlrHeY8UHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxhLRAAndV/0NDyWZU0eZNM6twri2SEFnF7
E4ar+YthxDxxJG4TLJbIA12jc5NgHZy4WuttDa6Jb99KreBXIHJFlNi/V/tme6zf
+yXUuxWae7wJzBiaay57VqLGSc80gt/LTgjLa1siwQqjTbO3wSXR6JJXNaE9FtQ4
/jL61t8bD1Peb5cWTpt9p0hrnKI0/pHwASdReyFS4F/HDKdvpof7BxE/OU3HSxxA
XKC2v6RjY4S93vkzvApDXQ+vhKquVRK7/ojyTXQUO/GIzcARprO7H4k62N4ar0x/
qbXLkR8IMkwA8ecsNmcL92ftb/cXoHfd+wdK8WpijqzF4kW4SdteVWbIhUzI0gbr
0gjDYIzjplvH3pZGv/qvx+8sFtAP95OdPjuAAW2qJ9TCVfmiS8naNFCvcxg87RhD
gjyQD3If1X7F8wy309lhq7VNyRexTHgIMgTXHyFvuZMzn/Qe1huL2XCwDcEAg/OX
AvU2iuSE5tWAh7gIUMF/aWi3uoeJUyyoru5ZR//gqdFfx9YxpSimO1UDXnpPi8SR
Iz/jzHJc0aWGYdQ9l6HiSbJF3P/QQcWYs9igt0A7BRGB05SPdWCh7sSO70FJa8ME
f4WID5/qEiaH26kiSRX4cUqpc8Amk8bT0DXw2OT57qy3JM0ZdV5ENQX11pSpr9hv
uLEf0DU7AEmdvzQ=
=T++R
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- move pci_uevent_ers() out of pci.h (Michael Ellerman)
- skip ASPM common clock warning if BIOS already configured it (Sinan
Kaya)
- fix ASPM Coverity warning about threshold_ns (Gustavo A. R. Silva)
- remove last user of pci_get_bus_and_slot() and the function itself
(Sinan Kaya)
- add decoding for 16 GT/s link speed (Jay Fang)
- add interfaces to get max link speed and width (Tal Gilboa)
- add pcie_bandwidth_capable() to compute max supported link bandwidth
(Tal Gilboa)
- add pcie_bandwidth_available() to compute bandwidth available to
device (Tal Gilboa)
- add pcie_print_link_status() to log link speed and whether it's
limited (Tal Gilboa)
- use PCI core interfaces to report when device performance may be
limited by its slot instead of doing it in each driver (Tal Gilboa)
- fix possible cpqphp NULL pointer dereference (Shawn Lin)
- rescan more of the hierarchy on ACPI hotplug to fix Thunderbolt/xHCI
hotplug (Mika Westerberg)
- add support for PCI I/O port space that's neither directly accessible
via CPU in/out instructions nor directly mapped into CPU physical
memory space. This is fairly intrusive and includes minor changes to
interfaces used for I/O space on most platforms (Zhichang Yuan, John
Garry)
- add support for HiSilicon Hip06/Hip07 LPC I/O space (Zhichang Yuan,
John Garry)
- use PCI_EXP_DEVCTL2_COMP_TIMEOUT in rapidio/tsi721 (Bjorn Helgaas)
- remove possible NULL pointer dereference in of_pci_bus_find_domain_nr()
(Shawn Lin)
- report quirk timings with dev_info (Bjorn Helgaas)
- report quirks that take longer than 10ms (Bjorn Helgaas)
- add and use Altera Vendor ID (Johannes Thumshirn)
- tidy Makefiles and comments (Bjorn Helgaas)
- don't set up INTx if MSI or MSI-X is enabled to align cris, frv,
ia64, and mn10300 with x86 (Bjorn Helgaas)
- move pcieport_if.h to drivers/pci/pcie/ to encapsulate it (Frederick
Lawler)
- merge pcieport_if.h into portdrv.h (Bjorn Helgaas)
- move workaround for BIOS PME issue from portdrv to PCI core (Bjorn
Helgaas)
- completely disable portdrv with "pcie_ports=compat" (Bjorn Helgaas)
- remove portdrv link order dependency (Bjorn Helgaas)
- remove support for unused VC portdrv service (Bjorn Helgaas)
- simplify portdrv feature permission checking (Bjorn Helgaas)
- remove "pcie_hp=nomsi" parameter (use "pci=nomsi" instead) (Bjorn
Helgaas)
- remove unnecessary "pcie_ports=auto" parameter (Bjorn Helgaas)
- use cached AER capability offset (Frederick Lawler)
- don't enable DPC if BIOS hasn't granted AER control (Mika Westerberg)
- rename pcie-dpc.c to dpc.c (Bjorn Helgaas)
- use generic pci_mmap_resource_range() instead of powerpc and xtensa
arch-specific versions (David Woodhouse)
- support arbitrary PCI host bridge offsets on sparc (Yinghai Lu)
- remove System and Video ROM reservations on sparc (Bjorn Helgaas)
- probe for device reset support during enumeration instead of runtime
(Bjorn Helgaas)
- add ACS quirk for Ampere (née APM) root ports (Feng Kan)
- add function 1 DMA alias quirk for Marvell 88SE9220 (Thomas
Vincent-Cross)
- protect device restore with device lock (Sinan Kaya)
- handle failure of FLR gracefully (Sinan Kaya)
- handle CRS (config retry status) after device resets (Sinan Kaya)
- skip various config reads for SR-IOV VFs as an optimization
(KarimAllah Ahmed)
- consolidate VPD code in vpd.c (Bjorn Helgaas)
- add Tegra dependency on PCI_MSI_IRQ_DOMAIN (Arnd Bergmann)
- add DT support for R-Car r8a7743 (Biju Das)
- fix a PCI_EJECT vs PCI_BUS_RELATIONS race condition in Hyper-V host
bridge driver that causes a general protection fault (Dexuan Cui)
- fix Hyper-V host bridge hang in MSI setup on 1-vCPU VMs with SR-IOV
(Dexuan Cui)
- fix Hyper-V host bridge hang when ejecting a VF before setting up MSI
(Dexuan Cui)
- make several structures static (Fengguang Wu)
- increase number of MSI IRQs supported by Synopsys DesignWare bridges
from 32 to 256 (Gustavo Pimentel)
- implemented multiplexed IRQ domain API and remove obsolete MSI IRQ
API from DesignWare drivers (Gustavo Pimentel)
- add Tegra power management support (Manikanta Maddireddy)
- add Tegra loadable module support (Manikanta Maddireddy)
- handle 64-bit BARs correctly in endpoint support (Niklas Cassel)
- support optional regulator for HiSilicon STB (Shawn Guo)
- use regulator bulk API for Qualcomm apq8064 (Srinivas Kandagatla)
- support power supplies for Qualcomm msm8996 (Srinivas Kandagatla)
* tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (123 commits)
MAINTAINERS: Add John Garry as maintainer for HiSilicon LPC driver
HISI LPC: Add ACPI support
ACPI / scan: Do not enumerate Indirect IO host children
ACPI / scan: Rename acpi_is_serial_bus_slave() for more general use
HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindings
of: Add missing I/O range exception for indirect-IO devices
PCI: Apply the new generic I/O management on PCI IO hosts
PCI: Add fwnode handler as input param of pci_register_io_range()
PCI: Remove __weak tag from pci_register_io_range()
MAINTAINERS: Add missing /drivers/pci/cadence directory entry
fm10k: Report PCIe link properties with pcie_print_link_status()
net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth
net/mlx5: Report PCIe link properties with pcie_print_link_status()
net/mlx4_core: Report PCIe link properties with pcie_print_link_status()
PCI: Add pcie_print_link_status() to log link speed and whether it's limited
PCI: Add pcie_bandwidth_available() to compute bandwidth available to device
misc: pci_endpoint_test: Handle 64-bit BARs properly
PCI: designware-ep: Make dw_pcie_ep_reset_bar() handle 64-bit BARs properly
PCI: endpoint: Make sure that BAR_5 does not have 64-bit flag set when clearing
PCI: endpoint: Make epc->ops->clear_bar()/pci_epc_clear_bar() take struct *epf_bar
...
Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
reason. It looks like it's only a convenience, so remove kmemleak.h
from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that
don't already #include it. Also remove <linux/kmemleak.h> from source
files that do not use it.
This is tested on i386 allmodconfig and x86_64 allmodconfig. It would
be good to run it through the 0day bot for other $ARCHes. I have
neither the horsepower nor the storage space for the other $ARCHes.
Update: This patch has been extensively build-tested by both the 0day
bot & kisskb/ozlabs build farms. Both of them reported 2 build failures
for which patches are included here (in v2).
[ slab.h is the second most used header file after module.h; kernel.h is
right there with slab.h. There could be some minor error in the
counting due to some #includes having comments after them and I didn't
combine all of those. ]
[akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr]
Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org
Link: http://kisskb.ellerman.id.au/kisskb/head/13396/
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Michael Ellerman <mpe@ellerman.id.au> [2 build failures]
Reported-by: Fengguang Wu <fengguang.wu@intel.com> [2 build failures]
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Wei Yongjun <weiyongjun1@huawei.com>
Cc: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thanks to commit 4b3ef9daa4 ("mm/swap: split swap cache into 64MB
trunks"), after swapoff the address_space associated with the swap
device will be freed. So page_mapping() users which may touch the
address_space need some kind of mechanism to prevent the address_space
from being freed during accessing.
The dcache flushing functions (flush_dcache_page(), etc) in architecture
specific code may access the address_space of swap device for anonymous
pages in swap cache via page_mapping() function. But in some cases
there are no mechanisms to prevent the swap device from being swapoff,
for example,
CPU1 CPU2
__get_user_pages() swapoff()
flush_dcache_page()
mapping = page_mapping()
... exit_swap_address_space()
... kvfree(spaces)
mapping_mapped(mapping)
The address space may be accessed after being freed.
But from cachetlb.txt and Russell King, flush_dcache_page() only care
about file cache pages, for anonymous pages, flush_anon_page() should be
used. The implementation of flush_dcache_page() in all architectures
follows this too. They will check whether page_mapping() is NULL and
whether mapping_mapped() is true to determine whether to flush the
dcache immediately. And they will use interval tree (mapping->i_mmap)
to find all user space mappings. While mapping_mapped() and
mapping->i_mmap isn't used by anonymous pages in swap cache at all.
So, to fix the race between swapoff and flush dcache, __page_mapping()
is add to return the address_space for file cache pages and NULL
otherwise. All page_mapping() invoking in flush dcache functions are
replaced with page_mapping_file().
[akpm@linux-foundation.org: simplify page_mapping_file(), per Mike]
Link: http://lkml.kernel.org/r/20180305083634.15174-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull sparc updates from David Miller:
1) Add support for ADI (Application Data Integrity) found in more
recent sparc64 cpus. Essentially this is keyed based access to
virtual memory, and if the key encoded in the virual address is
wrong you get a trap.
The mm changes were reviewed by Andrew Morton and others.
Work by Khalid Aziz.
2) Validate DAX completion index range properly, from Rob Gardner.
3) Add proper Kconfig deps for DAX driver. From Guenter Roeck.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next:
sparc64: Make atomic_xchg() an inline function rather than a macro.
sparc64: Properly range check DAX completion index
sparc: Make auxiliary vectors for ADI available on 32-bit as well
sparc64: Oracle DAX driver depends on SPARC64
sparc64: Update signal delivery to use new helper functions
sparc64: Add support for ADI (Application Data Integrity)
mm: Allow arch code to override copy_highpage()
mm: Clear arch specific VM flags on protection change
mm: Add address parameter to arch_validate_prot()
sparc64: Add auxiliary vectors to report platform ADI properties
sparc64: Add handler for "Memory Corruption Detected" trap
sparc64: Add HV fault type handlers for ADI related faults
sparc64: Add support for ADI register fields, ASIs and traps
mm, swap: Add infrastructure for saving page metadata on swap
signals, sparc: Add signal codes for ADI violations
This avoids a lot of -Wunused warnings such as:
====================
kernel/debug/debug_core.c: In function ‘kgdb_cpu_enter’:
./arch/sparc/include/asm/cmpxchg_64.h:55:22: warning: value computed is not used [-Wunused-value]
#define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
./arch/sparc/include/asm/atomic_64.h:86:30: note: in expansion of macro ‘xchg’
#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
^~~~
kernel/debug/debug_core.c:508:4: note: in expansion of macro ‘atomic_xchg’
atomic_xchg(&kgdb_active, cpu);
^~~~~~~~~~~
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull removal of in-kernel calls to syscalls from Dominik Brodowski:
"System calls are interaction points between userspace and the kernel.
Therefore, system call functions such as sys_xyzzy() or
compat_sys_xyzzy() should only be called from userspace via the
syscall table, but not from elsewhere in the kernel.
At least on 64-bit x86, it will likely be a hard requirement from
v4.17 onwards to not call system call functions in the kernel: It is
better to use use a different calling convention for system calls
there, where struct pt_regs is decoded on-the-fly in a syscall wrapper
which then hands processing over to the actual syscall function. This
means that only those parameters which are actually needed for a
specific syscall are passed on during syscall entry, instead of
filling in six CPU registers with random user space content all the
time (which may cause serious trouble down the call chain). Those
x86-specific patches will be pushed through the x86 tree in the near
future.
Moreover, rules on how data may be accessed may differ between kernel
data and user data. This is another reason why calling sys_xyzzy() is
generally a bad idea, and -- at most -- acceptable in arch-specific
code.
This patchset removes all in-kernel calls to syscall functions in the
kernel with the exception of arch/. On top of this, it cleans up the
three places where many syscalls are referenced or prototyped, namely
kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h"
* 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: (109 commits)
bpf: whitelist all syscalls for error injection
kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions
kernel/sys_ni: sort cond_syscall() entries
syscalls/x86: auto-create compat_sys_*() prototypes
syscalls: sort syscall prototypes in include/linux/compat.h
net: remove compat_sys_*() prototypes from net/compat.h
syscalls: sort syscall prototypes in include/linux/syscalls.h
kexec: move sys_kexec_load() prototype to syscalls.h
x86/sigreturn: use SYSCALL_DEFINE0
x86: fix sys_sigreturn() return type to be long, not unsigned long
x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()
mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()
mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()
mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()
fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate()
fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls
fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate()
fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall
kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid()
kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare()
...
driver and timer driver), which has been through 7 rounds of review on mailing
list.
It is able to boot to shell and passes most LTP-2017 testsuites in nds32 AE3XX
platform.
Total Tests: 1901
Total Skipped Tests: 618
Total Failures: 78
Copied below is the ChangeLog that contains the history of this patch set:
Changes in v7:
- Update cpu binding document to add "andestech,nds32v3" as fallback
- Remove unnecessary configs of arch/nds32/Kconfig
- Use GENERIC_CALIBRATE_DELAY
- Add more help texts for minimum CPU type config
- Update defconfig because of Kconfig changed and bug fixed
- Move early_trap_init() declaration to nds32.h
- Refine dma.c
- Remove apply_relocate() in module.c and include <linux/moduleloader.h> to catch it
- Add do_kernel_restart() in machine_restart()
- Clean up setup.c to remove CONFIG_VGA_CONSOLE and some extern declaration functions
- Add negative dependency for VGA_CONSOLE on nds32
- Refine ptrace.c and arch/nds32/include/asm/ptrace.h
- Refine syscall restart flow and arch/nds32/kernel/signal.c
- Fix a bug in VDSO
- Remove the handling for kernel code unaligned accessing
- Add a description for unaligned access handling in git commit message.
- Rebase to v4.16-rc1
- Replace ACCESS_ONCE with READ_ONCE
- Replace atomic_long_dec(&mm->nr_ptes) with mm_dec_nr_ptes(mm)
- Remove print_symbol(%s) with printk(%pS)
- Add bpf_perf_event.h
- Remove init_stack and init_thread_info
Changes in v6:
- Refine naming for atl2c
- Refine ae3xx.dts
- Remove CONFIG_TIMER_ATCPIT100 in defconfig
- Refine elf.h
- Fix a vdso bug
- Separate arch patchset and timer patchset
- To select TIMER_OF in drivers/clocksource/Kconfig instead of arch/nds32/Kconfig
Changes in v5:
- Remove __NR__llseek and sys_mmap()
- Add a comment to explain that we don't have clocksource cycle counter in the CPU
- Add volatile in iounmap()
- Fix typo Featuretures to Features
- Replace CPU_CACHE_NONALIASING with !CPU_CACHE_ALIASING
- Fix a endian bug when we try to get val = of_get_property(cpu,"clock-frequency", NULL)
- Add screen_info to fix the building error when CONFIG_ VGA_CONSOLE is enabled
- Remove unnecessary msync()
- Add depends on !64BIT || BROKEN for faraday Kconfig because the descriptor only supports 32bit
- Add atl2c binding document
- Remove unnecessary include headers
- Fix a vector table bug. It placed wrong vector handlers for 2 exceptions.
- Fix a vdso bug. It may encounter TLB multi-hit exception because we accidently set it as a global page.
- Add proper isb and barrier after some cache operations
- Fix a bug in system call restart flow. $r0 ~ $r5 does not be recovered before restarting system call
- Fix the build errors for OpenRISC and SPARC because io.h changed.
- Update ae3xx.dts to support atl2c.
Changes in v4:
- Add atcpit100 timer driver due to it include vdso implementations and sent
them together with nds32 may help reviewer to review.
- Update ae3xx.dts for atcpit100 clock setting and remove vdso settings.
- To get cycle counter register by timer driver instead of dts.
- Use "depends on NDS32 || COMPILE_TEST" in atcpit100 driver because it is needed for nds32 vdso
- Update defconfig becasue kconfig rename from CONFIG_CLKSRC_ATCPIT100 to CONFIG_TIMER_ATCPIT100
- Remove ag101p.dts because we are not yet ready for ag101p platform.
- Update copyright style to SPDX-License-Identifier
- Include <linux/uaccess.h> instead of <asm/uaccess.h>
- Add local_irq_save()/local_irq_restore() to protect SR_TLB_VPN in update_mmu_cache().
- Update cpu_dcache_inval_all implementation to make sure all level cache are writeback.
Changes in v3:
- Use arch's io.h instead of generic one
- Add andestech-boards binding document
- Update nds32/cpus.txt binding document
- Remove atcpit100 timer drivers
- Select NO_BOOTMEM and delete HAVE_MEMBLOCK_NODE_MAP
- make CPU_BIG_ENDIAN and CPU_LITTLE_ENDIAN are dependent
- Add cpu type to select HWZOL/CPU_CACHE_ALIASING
- Change CPU_CACHE_NONALIASING to CPU_CACHE_ALIASING
- Remove bootarg from device tree script
- Update ag101p.dts and ae3xx.dts for correct board name.
- Clear and simplify defconfig
- Implement L2C_R_REG/ L2C_W_REG with readl/writel instead of __raw_readl/__raw_writel for endian save
- Remove early_init_dt_add_memory_arch/early_init_dt_alloc_memory_arch to use the generic ones
- Refine devicetree.c
- Fix bug https://lkml.kernel.org/r/1499782590-31366-1-git-send-ema...
- Refine irqchip/irq-ativic32.c implementations
- Add COMPILE_TEST in drivers/net/ethernet/faraday/Kconfig
- Refine cache operations
- Add CONFIG_HW_SUPPORT_UNALIGNMENT_ACCESS
- Fix ZERO_PAGE define
- Remove SA_RESTORER
- Remove uapi/asm/signal.h
- Redefine user_pt_regs
- Remove spinlock.h
- Remove __ARCH_WANT_RENAMEAT and __ARCH_WANT_SYSCALL_OFF_T from unistd.h
- Remove set_fs(USER_DS) because flush_old_exec() will do this setting
- Replace in_atomic() with faulthandler_disabled()
- Add barrier.h
- Select COMMON_CLK
- Add clk_pll in dts
- Add of_clk_init() in arch/nds32/kernel/time.c
Changes in v2:
- Set GENERIC_CALIBRATE_DELAY default n
- Add earlycon support
- Remove earlyprintk
- Add CPU_BIG_ENDIAN, CPU_LITTLE_ENDIAN support
- Refine unalignment access exception handler
- Add VMSPLIT support
- Use only one defconfig
- Change interrupt-cells from 2 to 1
- Refine andestech cpu names in bindings/nds32/cpus.txt
- Get clock frequency in dts because fpga bitmap doesn't include this feature
- Update MAINTAINERS for bindings
- Remove unused configs in Kconfig
- Refine device tree scripts
- Refine coding style
- Use generic ioremap_nocache
- Remove L2CC_PA_BASE define and its codes in head.S. It will be moved to bootloader.
- Set PHYS_OFFSET to 0x0 instead of CONFIG_MEMORY_START
- Remove unused macros
- Simplify cpu_cache_* API
- Change __asm__ __volatile__ to asm volatile
- Refine uaccess.h
- Remove unused/deprecated syscall
- Use generic posix_types.h
- Remove arch_trace_hardirqs_on/arch_trace_hardirqs_off
- Fix bug of restart syscall
- Refine syscall implementations
- Use IS_ENABLED to replace ifdef as possible
- Remove device_initcall(nds32_device_probe)
- Refine vdso implementations
- Refine copy_from_user()/copy_to_user()/clear_user()/get_user()/memmove()/memcpy()
- Refine ioremap.c
- Refine irq-ativic32.c
- Fix a bug of earlycon.c
- Export ioremap_nocache/ioremap_uc/ioremap_wc/ioremap_wt
- Add atcpit100 driver
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
iQIcBAABAgAGBQJawZ28AAoJEHfB0l0b2JxExKYQAJ7btaPeIplIndrphlTQkfzW
d1AhVBhwAlvqsOcFf+kwVIqOnfcLCtzVvgc63qc6mAroDZKcd+uuqOtkC801b33i
jxcfSX802PciT3VhE8xz9OWFY9D3in/UCBhGe2OvY/cD/eC34gZhUqJhML/ioR5q
DNPvua6lgYIN9VrFds19MjRzl7RwDBqNccQoFTXWc9Hl1Vs0YdKdbkOz0IWNtoLQ
crP0v/UuHMC++WdU+MvDIEFqNVuXikg/NA+odPIbp3eF3xcmQBM0blWAi37eOKFo
rzTw7TKtL8xObjvhyzx3aYFKPpLBfuYwk8onoZlthlqcwFClZy4lzdDdDxJhKiJ6
5hilzCSqEWXB9osQsrWgAuK1rNRvroChIp6/rcdGAq33mTPLVydx7hSKELhE7wuN
UUaiJSSNRG1ZrR8tkccQpaRBjJ/gfXWGC3ys723oWz8A4bDzMkvZVzdOGOEZ+CsI
w4HKNHLeY50wztV6dDSiVPhvUXQjBH9qd2zVHlutbfulPI/XNkGRfWpEGVT1zD4y
pO3aHVJfsv+8aeyVBcXyN74O34a9HYa7811v7V5RI+uftdPhkzOwxuMQVMMJPe3s
4u4NglP7fekeWyDGCXFKOGoVOuCAUPHuzBUCFjL/cStHjrkBGVQvL1I925sJNabu
IatFh62x8Ez4m2hIf5fg
=J/YF
-----END PGP SIGNATURE-----
Merge tag 'nds32-for-linus-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux
Pull nds32 architecture support from Greentime Hu:
"This contains the core nds32 Linux port (including interrupt
controller driver and timer driver), which has been through seven
rounds of review on mailing list.
It is able to boot to shell and passes most LTP-2017 testsuites in
nds32 AE3XX platform:
Total Tests: 1901
Total Skipped Tests: 618
Total Failures: 78"
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
* tag 'nds32-for-linus-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux: (44 commits)
nds32: To use the generic dump_stack()
nds32: fix building failed if using elf toolchain.
nios2: add ioremap_nocache declaration before include asm-generic/io.h.
nds32: fix building failed if using older version gcc.
dt-bindings: timer: Add andestech atcpit100 timer binding doc
clocksource/drivers/atcpit100: VDSO support
clocksource/drivers/atcpit100: Add andestech atcpit100 timer
net: faraday add nds32 support.
irqchip: Andestech Internal Vector Interrupt Controller driver
dt-bindings: interrupt-controller: Andestech Internal Vector Interrupt Controller
dt-bindings: nds32 SoC Bindings
dt-bindings: nds32 L2 cache controller Bindings
dt-bindings: nds32 CPU Bindings
MAINTAINERS: Add nds32
nds32: Build infrastructure
nds32: defconfig
nds32: Miscellaneous header files
nds32: Device tree support
nds32: Generic timers support
nds32: Loadable modules
...
Using this helper allows us to avoid the in-kernel calls to the
sys_readahead() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_readahead().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Using this helper allows us to avoid the in-kernel calls to the
sys_mmap_pgoff() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_mmap_pgoff().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Using the ksys_fadvise64_64() helper allows us to avoid the in-kernel
calls to the sys_fadvise64_64() syscall. The ksys_ prefix denotes that
this function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as ksys_fadvise64_64().
Some compat stubs called sys_fadvise64(), which then just passed through
the arguments to sys_fadvise64_64(). Get rid of this indirection, and call
ksys_fadvise64_64() directly.
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Using the ksys_fallocate() wrapper allows us to get rid of in-kernel
calls to the sys_fallocate() syscall. The ksys_ prefix denotes that this
function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as sys_fallocate().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Using the ksys_p{read,write}64() wrappers allows us to get rid of
in-kernel calls to the sys_pread64() and sys_pwrite64() syscalls.
The ksys_ prefix denotes that this function is meant as a drop-in
replacement for the syscall. In particular, it uses the same calling
convention as sys_p{read,write}64().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Using the ksys_truncate() wrapper allows us to get rid of in-kernel
calls to the sys_truncate() syscall. The ksys_ prefix denotes that this
function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as sys_truncate().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Using this helper allows us to avoid the in-kernel calls to the
sys_sync_file_range() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it uses
the same calling convention as sys_sync_file_range().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Using this helper allows us to avoid the in-kernel calls to the
sys_sync() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it
uses the same calling convention as sys_sync().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Using the ksys_ftruncate() wrapper allows us to get rid of in-kernel
calls to the sys_ftruncate() syscall. The ksys_ prefix denotes that this
function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as sys_ftruncate().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Previously, pci_register_legacy_regions() reserved PCI address space under
every PCI host bridge for the System ROM and the Video ROM, but these
regions are not part of PCI address space.
Previously, pci_register_legacy_regions() reserved the following areas of
PCI address space under every PCI host bridge:
[bus 0xa0000-0xbffff] Video RAM area (VGA frame buffer)
[bus 0xc0000-0xc7fff] Video ROM
[bus 0xf0000-0xfffff] System ROM
It does need to reserve the [bus 0xa0000-0xbffff] region (at least if
there's a possibility of a VGA device below the bridge) because VGA devices
can respond to that even if they don't describe it with a BAR.
But the Video ROM and System ROM areas don't seem necessary because they
are not areas that legacy PCI devices respond to.
They appear to be copied from x86, where they describe areas of system
memory that depend on BIOS conventions. On x86, BIOS copies the option ROM
of the primary VGA device to RAM at 0xc0000, and the 0xf0000-0xfffff region
is reserved for the motherboard BIOS. Neither of these things applies to
sparc.
Stop reserving the System ROM and Video ROM regions in PCI space.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
... and drop the pointless checks - sys_truncate() itself
might've lacked the check when that stuff was first written,
but it has already grown one by the time that stuff went into
mainline.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Back in January 1998, in 2.1.79 check_pending() got taken out and shot.
Users in arch/sparc and arch/sparc64 got converted away from it. One
of the externs (in arch/sparc) was taken out at the same time. Two
other (in arch/sparc64/kernel/sys_sparc{,32}.c) got stuck. The one in
sys_sparc32.c was quietly removed 6 years later by an unrelated
commit. The last one kept shambling until now. It's time to end
that depravity, let's bury the body...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Commit c6202ca764 ("sparc64: Add auxiliary vectors to report
platform ADI properties") adds auxiliary vectors to report ADI
capabilities on sparc64 platform only. This needs to be uniform
across 64-bit and 32-bit. This patch makes the same vectors
available on 32-bit as well.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f8ec66014f ("signal: Add send_sig_fault and force_sig_fault")
added new helper functions to streamline signal delivery. This patch
updates signal delivery for new/updated handlers for ADI related
exceptions to use the helper function.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ADI is a new feature supported on SPARC M7 and newer processors to allow
hardware to catch rogue accesses to memory. ADI is supported for data
fetches only and not instruction fetches. An app can enable ADI on its
data pages, set version tags on them and use versioned addresses to
access the data pages. Upper bits of the address contain the version
tag. On M7 processors, upper four bits (bits 63-60) contain the version
tag. If a rogue app attempts to access ADI enabled data pages, its
access is blocked and processor generates an exception. Please see
Documentation/sparc/adi.txt for further details.
This patch extends mprotect to enable ADI (TSTATE.mcde), enable/disable
MCD (Memory Corruption Detection) on selected memory ranges, enable
TTE.mcd in PTEs, return ADI parameters to userspace and save/restore ADI
version tags on page swap out/in or migration. ADI is not enabled by
default for any task. A task must explicitly enable ADI on a memory
range and set version tag for ADI to be effective for the task.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ADI feature on M7 and newer processors has three important properties
relevant to userspace apps using ADI capabilities - (1) Size of block of
memory an ADI version tag applies to, (2) Number of uppermost bits in
virtual address used to encode ADI tag, and (3) The value M7 processor
will force the ADI tags to if it detects uncorrectable error in an ADI
tagged cacheline. Kernel can retrieve these properties for a platform
through machine description provided by the firmware. This patch adds
code to retrieve these properties and report them to userspace through
auxiliary vectors.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
M7 and newer processors add a "Memory corruption Detected" trap with
the addition of ADI feature. This trap is vectored into kernel by HV
through resumable error trap with error attribute for the resumable
error set to 0x00000800.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ADI (Application Data Integrity) feature on M7 and newer processors
adds new fault types for hypervisor - Invalid ASI and MCD disabled.
This patch expands data access exception handler to handle these
faults.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SPARC M7 processor adds new control register fields, ASIs and a new
trap to support the ADI (Application Data Integrity) feature. This
patch adds definitions for these register fields, ASIs and a handler
for the new precise memory corruption detected trap.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pmdp_invalidate() was changed to update the pmd atomically
(to not lose dirty/access bits) and return the original pmd
value.
However, in doing so, we lost a lot of the essential work that
set_pmd_at() does, namely to update hugepage mapping counts and
queuing up the batched TLB flush entry.
Thus we were not flushing entries out of the TLB when making
such PMD changes.
Fix this by abstracting the accounting work of set_pmd_at() out into a
separate function, and call it from pmdp_establish().
Fixes: a8e654f01c ("sparc64: update pmdp_invalidate() to return old pmd value")
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that all the grouping is done with RB trees, we no longer need
group_entry and can replace the whole thing with sibling_list.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It allows some architectures to use this generic macro instead of
defining theirs.
sparc: io: To use the define of ioremap_[nocache|wc|wb] in asm-generic/io.h
It will move the ioremap_nocache out of the CONFIG_MMU ifdef. This means that
in order to suppress re-definition errors we need to remove the #define
in arch/sparc/include/asm/io_32.h. Also, the change adds a prototype for
ioremap where size is size_t and offset is phys_addr_t so fix that as well.
Signed-off-by: Greentime Hu <greentime@andestech.com>
Looking at functions with large stack frames across all architectures
led me discovering that BUG() suffers from the same problem as
fortify_panic(), which I've added a workaround for already.
In short, variables that go out of scope by calling a noreturn function
or __builtin_unreachable() keep using stack space in functions
afterwards.
A workaround that was identified is to insert an empty assembler
statement just before calling the function that doesn't return. I'm
adding a macro "barrier_before_unreachable()" to document this, and
insert calls to that in all instances of BUG() that currently suffer
from this problem.
The files that saw the largest change from this had these frame sizes
before, and much less with my patch:
fs/ext4/inode.c:82:1: warning: the frame size of 1672 bytes is larger than 800 bytes [-Wframe-larger-than=]
fs/ext4/namei.c:434:1: warning: the frame size of 904 bytes is larger than 800 bytes [-Wframe-larger-than=]
fs/ext4/super.c:2279:1: warning: the frame size of 1160 bytes is larger than 800 bytes [-Wframe-larger-than=]
fs/ext4/xattr.c:146:1: warning: the frame size of 1168 bytes is larger than 800 bytes [-Wframe-larger-than=]
fs/f2fs/inode.c:152:1: warning: the frame size of 1424 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_core.c:1195:1: warning: the frame size of 1068 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_core.c:395:1: warning: the frame size of 1084 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_ftp.c:298:1: warning: the frame size of 928 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_ftp.c:418:1: warning: the frame size of 908 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_lblcr.c:718:1: warning: the frame size of 960 bytes is larger than 800 bytes [-Wframe-larger-than=]
drivers/net/xen-netback/netback.c:1500:1: warning: the frame size of 1088 bytes is larger than 800 bytes [-Wframe-larger-than=]
In case of ARC and CRIS, it turns out that the BUG() implementation
actually does return (or at least the compiler thinks it does),
resulting in lots of warnings about uninitialized variable use and
leaving noreturn functions, such as:
block/cfq-iosched.c: In function 'cfq_async_queue_prio':
block/cfq-iosched.c:3804:1: error: control reaches end of non-void function [-Werror=return-type]
include/linux/dmaengine.h: In function 'dma_maxpq':
include/linux/dmaengine.h:1123:1: error: control reaches end of non-void function [-Werror=return-type]
This makes them call __builtin_trap() instead, which should normally
dump the stack and kill the current process, like some of the other
architectures already do.
I tried adding barrier_before_unreachable() to panic() and
fortify_panic() as well, but that had very little effect, so I'm not
submitting that patch.
Vineet said:
: For ARC, it is double win.
:
: 1. Fixes 3 -Wreturn-type warnings
:
: | ../net/core/ethtool.c:311:1: warning: control reaches end of non-void function
: [-Wreturn-type]
: | ../kernel/sched/core.c:3246:1: warning: control reaches end of non-void function
: [-Wreturn-type]
: | ../include/linux/sunrpc/svc_xprt.h:180:1: warning: control reaches end of
: non-void function [-Wreturn-type]
:
: 2. bloat-o-meter reports code size improvements as gcc elides the
: generated code for stack return.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365
Link: http://lkml.kernel.org/r/20171219114112.939391-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc]
Tested-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc]
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christopher Li <sparse@chrisli.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that USB_UHCI_BIG_ENDIAN_MMIO and USB_UHCI_BIG_ENDIAN_DESC are moved
outside of the USB_SUPPORT conditional, simply select them from
SPARC_LEON rather than by the symbol's defaults in drivers/usb/Kconfig,
similar to how it is done for USB_EHCI_BIG_ENDIAN_MMIO and
USB_EHCI_BIG_ENDIAN_DESC.
Signed-off-by: James Hogan <jhogan@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Corentin Labbe <clabbe.montjoie@gmail.com>
Cc: sparclinux@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Acked-by: David S. Miller <davem@davemloft.net>
Patchwork: https://patchwork.linux-mips.org/patch/18560/
Add support for arbitrary bus address offset. Previously we ignored the
child (PCI) address in the "ranges" property and assumed it was always
zero. That means every host bridge window mapped to PCI bus address zero,
e.g.,
pci_bus 0000:00: root bus resource [mem 0x2000000000000-0x200007fffffff] (bus address [0x00000000-0x7fffffff])
But some systems have host bridge windows with non-zero child addresses, so
parse the child address and compute the offset between the parent (CPU) and
child (PCI) addresses. This allows windows like these:
/pci@305: PCI MEM [mem 0x2000000100000-0x200007effffff] offset 2000000000000
pci_sun4v f02ae7f8: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [mem 0x2000000100000-0x200007effffff] (bus address [0x00100000-0x7effffff])
[bhelgaas: changelog]
Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
except, again, POLLFREE and POLL_BUSY_LOOP.
With this, we finally get to the promised end result:
- POLL{IN,OUT,...} are plain integers and *not* in __poll_t, so any
stray instances of ->poll() still using those will be caught by
sparse.
- eventpoll.c and select.c warning-free wrt __poll_t
- no more kernel-side definitions of POLL... - userland ones are
visible through the entire kernel (and used pretty much only for
mangle/demangle)
- same behavior as after the first series (i.e. sparc et.al. epoll(2)
working correctly).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Require struct page by default for filesystem DAX to remove a number of
surprising failure cases. This includes failures with direct I/O, gdb and
fork(2).
* Add support for the new Platform Capabilities Structure added to the NFIT in
ACPI 6.2a. This new table tells us whether the platform supports flushing
of CPU and memory controller caches on unexpected power loss events.
* Revamp vmem_altmap and dev_pagemap handling to clean up code and better
support future future PCI P2P uses.
* Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has become
out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL spec, and
instead rely on the generic ND_CMD_CALL approach used by the two other IOCTL
families, NVDIMM_FAMILY_{HPE,MSFT}.
* Enhance nfit_test so we can test some of the new things added in version 1.6
of the DSM specification. This includes testing firmware download and
simulating the Last Shutdown State (LSS) status.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaeOg0AAoJEJ/BjXdf9fLBAFoQAI/IgcgJ2h9lfEpgjBRTC44t
2p8dxwT1Ofw3Y1aR/tI8nYRXjRtAGuP4UIeRVnb1CL/N7PagJyoMGU+6hmzg+ptY
c7cEDvw6nZOhrFwXx/xn7R53sYG8zH+UE6+jTR/PP/G4mQJfFCg4iF9R72Y7z0n7
aurf82Kz137NPUy6dNr4V9bmPMJWAaOci9WOj5SKddR5ZSNbjoxylTwQRvre5y4r
7HQTScEkirABOdSf1JoXTSUXCH/RC9UFFXR03ScHstGb1HjCj3KdcicVc50Q++Ub
qsEudhE6i44PEW1Hh4Qkg6hjHMEa8qHP+ShBuRuVaUmlghYTQn66niJAYLZilwdz
EVjE7vR+toHA5g3YCalEmYVutUEhIDkh/xfpd7vM6ZorUGJy95a2elEJs2fHBffC
gEhnCip7FROPcK5RDNUM8hBgnG/q5wwWPQMKY+6rKDZQx3mXssCrKp2Vlx7kBwMG
rpblkEpYjPonbLEHxsSU8yTg9Uq55ciIWgnOToffcjZvjbihi8WUVlHcwHUMPf/o
DWElg+4qmG0Sdd4S2NeAGwTl1Ewrf2RrtUGMjHtH4OUFs1wo6ZmfrxFzzMfoZ1Od
ko/s65v4uwtTzECh2o+XQaNsReR5YETXxmA40N/Jpo7/7twABIoZ/ASvj/3ZBYj+
sie+u2rTod8/gQWSfHpJ
=MIMX
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Ross Zwisler:
- Require struct page by default for filesystem DAX to remove a number
of surprising failure cases. This includes failures with direct I/O,
gdb and fork(2).
- Add support for the new Platform Capabilities Structure added to the
NFIT in ACPI 6.2a. This new table tells us whether the platform
supports flushing of CPU and memory controller caches on unexpected
power loss events.
- Revamp vmem_altmap and dev_pagemap handling to clean up code and
better support future future PCI P2P uses.
- Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has
become out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL
spec, and instead rely on the generic ND_CMD_CALL approach used by
the two other IOCTL families, NVDIMM_FAMILY_{HPE,MSFT}.
- Enhance nfit_test so we can test some of the new things added in
version 1.6 of the DSM specification. This includes testing firmware
download and simulating the Last Shutdown State (LSS) status.
* tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (37 commits)
libnvdimm, namespace: remove redundant initialization of 'nd_mapping'
acpi, nfit: fix register dimm error handling
libnvdimm, namespace: make min namespace size 4K
tools/testing/nvdimm: force nfit_test to depend on instrumented modules
libnvdimm/nfit_test: adding support for unit testing enable LSS status
libnvdimm/nfit_test: add firmware download emulation
nfit-test: Add platform cap support from ACPI 6.2a to test
libnvdimm: expose platform persistence attribute for nd_region
acpi: nfit: add persistent memory control flag for nd_region
acpi: nfit: Add support for detect platform CPU cache flush on power loss
device-dax: Fix trailing semicolon
libnvdimm, btt: fix uninitialized err_lock
dax: require 'struct page' by default for filesystem dax
ext2: auto disable dax instead of failing mount
ext4: auto disable dax instead of failing mount
mm, dax: introduce pfn_t_special()
mm: Fix devm_memremap_pages() collision handling
mm: Fix memory size alignment in devm_memremap_pages_release()
memremap: merge find_dev_pagemap into get_dev_pagemap
memremap: change devm_memremap_pages interface to use struct dev_pagemap
...
to the clk rate protection support added by Jerome Brunet. This feature
will allow consumers to lock in a certain rate on the output of a clk so
that things like audio playback don't hear pops when the clk frequency
changes due to shared parent clks changing rates. Currently the clk
API doesn't guarantee the rate of a clk stays at the rate you request
after clk_set_rate() is called, so this new API will allow drivers
to express that requirement. Beyond this, the core got some debugfs
pretty printing patches and a couple minor non-critical fixes.
Looking outside of the core framework diff we have some new driver
additions and the removal of a legacy TI clk driver. Both of these hit
high in the dirstat. Also, the removal of the asm-generic/clkdev.h file
causes small one-liners in all the architecture Kbuild files. Overall, the
driver diff seems to be the normal stuff that comes all the time to
fix little problems here and there and to support new hardware.
Core:
- Clk rate protection
- Symbolic clk flags in debugfs output
- Clk registration enabled clks while doing bookkeeping updates
New Drivers:
- Spreadtrum SC9860
- HiSilicon hi3660 stub
- Qualcomm A53 PLL, SPMI clkdiv, and MSM8916 APCS
- Amlogic Meson-AXG
- ASPEED BMC
Removed Drivers:
- TI OMAP 3xxx legacy clk (non-DT) support
- asm*/clkdev.h got removed (not really a driver)
Updates:
- Renesas FDP1-0 module clock on R-Car M3-W
- Renesas LVDS module clock on R-Car V3M
- Misc fixes to pr_err() prints
- Qualcomm MSM8916 audio fixes
- Qualcomm IPQ8074 rounded out support for more peripherals
- Qualcomm Alpha PLL variants
- Divider code was using container_of() on bad pointers
- Allwinner DE2 clks on H3
- Amlogic minor data fixes and dropping of CLK_IGNORE_UNUSED
- Mediatek clk driver compile test support
- AT91 PMC clk suspend/resume restoration support
- PLL issues fixed on si5351
- Broadcom IProc PLL calculation updates
- DVFS support for Armada mvebu CPU clks
- Allwinner fixed post-divider support
- TI clkctrl fixes and support for newer SoCs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJac5vRAAoJEK0CiJfG5JUlUaIP/Riq0tbApfc4k4GMvSvaieR/
AwZFIMCxOxO+KGdUsBWj7UUoDfBYmxyknHZkVUA/m+Lm7cRH/YHHMghEceZLaBYW
zPQmDfkTl/QkwysXZMCw9vg4vO0tt5gWbHljQnvVhxVVTCkIRpaE8Vkktj1RZzpY
WU/TkvPbVGY3SNm504TRXKWC9KpMTEXVvzqlg6zLDJ/jE7PGzBKtewqMoLDCBH2L
q6b50BSXDo2Hep0vm6e5xneXKjLNR4kgN4PkbM4Yoi4iWLLbgAu79NfyOvvr/imS
HxOHRms9tejtyaiR6bQSF0pbLOERZ3QSbMFEbxdxnCTuPEfy3Nw/2W7mNJlhJa8g
EGLMnLL4WdloL4Z83dAcMrj9OmxYf7Yobf5dMidLrQT5EYuafdj0ParbI8TQpWSB
eTqaffSUGPE/7xuKouYBcbvocpXXWCcokrP/mEn3OEHXkIeeut1Jd3RmEvsi3gtJ
pNraJTIpvt4c05rj6yLUOhWfyqlA+fH3p4Fx3rrH1tmKEiG+lrhKoxF26uALZe0V
OvarhG+LPIE10pCIYlQjZjQVnYLGCxsGAIoK1uz7VYvFPh2T0cxQlzzeqFgrlTyN
32hMj3LhkQw82FG9xZqjTX1935R35mySRlx63x7HStI1YFief2X9+RHjJR/lofG0
nC0JWTp5sC/pKf54QBXj
=bGPp
-----END PGP SIGNATURE-----
Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"The core framework has a handful of patches this time around, mostly
due to the clk rate protection support added by Jerome Brunet.
This feature will allow consumers to lock in a certain rate on the
output of a clk so that things like audio playback don't hear pops
when the clk frequency changes due to shared parent clks changing
rates. Currently the clk API doesn't guarantee the rate of a clk stays
at the rate you request after clk_set_rate() is called, so this new
API will allow drivers to express that requirement.
Beyond this, the core got some debugfs pretty printing patches and a
couple minor non-critical fixes.
Looking outside of the core framework diff we have some new driver
additions and the removal of a legacy TI clk driver. Both of these hit
high in the dirstat. Also, the removal of the asm-generic/clkdev.h
file causes small one-liners in all the architecture Kbuild files.
Overall, the driver diff seems to be the normal stuff that comes all
the time to fix little problems here and there and to support new
hardware.
Summary:
Core:
- Clk rate protection
- Symbolic clk flags in debugfs output
- Clk registration enabled clks while doing bookkeeping updates
New Drivers:
- Spreadtrum SC9860
- HiSilicon hi3660 stub
- Qualcomm A53 PLL, SPMI clkdiv, and MSM8916 APCS
- Amlogic Meson-AXG
- ASPEED BMC
Removed Drivers:
- TI OMAP 3xxx legacy clk (non-DT) support
- asm*/clkdev.h got removed (not really a driver)
Updates:
- Renesas FDP1-0 module clock on R-Car M3-W
- Renesas LVDS module clock on R-Car V3M
- Misc fixes to pr_err() prints
- Qualcomm MSM8916 audio fixes
- Qualcomm IPQ8074 rounded out support for more peripherals
- Qualcomm Alpha PLL variants
- Divider code was using container_of() on bad pointers
- Allwinner DE2 clks on H3
- Amlogic minor data fixes and dropping of CLK_IGNORE_UNUSED
- Mediatek clk driver compile test support
- AT91 PMC clk suspend/resume restoration support
- PLL issues fixed on si5351
- Broadcom IProc PLL calculation updates
- DVFS support for Armada mvebu CPU clks
- Allwinner fixed post-divider support
- TI clkctrl fixes and support for newer SoCs"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (125 commits)
clk: aspeed: Handle inverse polarity of USB port 1 clock gate
clk: aspeed: Fix return value check in aspeed_cc_init()
clk: aspeed: Add reset controller
clk: aspeed: Register gated clocks
clk: aspeed: Add platform driver and register PLLs
clk: aspeed: Register core clocks
clk: Add clock driver for ASPEED BMC SoCs
clk: mediatek: adjust dependency of reset.c to avoid unexpectedly being built
clk: fix reentrancy of clk_enable() on UP systems
clk: meson-axg: fix potential NULL dereference in axg_clkc_probe()
clk: Simplify debugfs registration
clk: Fix debugfs_create_*() usage
clk: Show symbolic clock flags in debugfs
clk: renesas: r8a7796: Add FDP clock
clk: Move __clk_{get,put}() into private clk.h API
clk: sunxi: Use CLK_IS_CRITICAL flag for critical clks
clk: Improve flags doc for of_clk_detect_critical()
arch: Remove clkdev.h asm-generic from Kbuild
clk: sunxi-ng: a83t: Add M divider to TCON1 clock
clk: Prepare to remove asm-generic/clkdev.h
...
Pull sparc updates from David Miller:
"Of note is the addition of a driver for the Data Analytics
Accelerator, and some small cleanups"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next:
oradax: Fix return value check in dax_attach()
sparc: vDSO: remove an extra tab
sparc64: drop unneeded compat include
sparc64: Oracle DAX driver
sparc64: Oracle DAX infrastructure
Pull asm/uaccess.h whack-a-mole from Al Viro:
"It's linux/uaccess.h, damnit... Oh, well - eventually they'll stop
cropping up..."
* 'work.whack-a-mole' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
asm-prototypes.h: use linux/uaccess.h, not asm/uaccess.h
riscv: use linux/uaccess.h, not asm/uaccess.h...
ppc: for put_user() pull linux/uaccess.h, not asm/uaccess.h
Merge updates from Andrew Morton:
- misc fixes
- ocfs2 updates
- most of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
mm: remove PG_highmem description
tools, vm: new option to specify kpageflags file
mm/swap.c: make functions and their kernel-doc agree
mm, memory_hotplug: fix memmap initialization
mm: correct comments regarding do_fault_around()
mm: numa: do not trap faults on shared data section pages.
hugetlb, mbind: fall back to default policy if vma is NULL
hugetlb, mempolicy: fix the mbind hugetlb migration
mm, hugetlb: further simplify hugetlb allocation API
mm, hugetlb: get rid of surplus page accounting tricks
mm, hugetlb: do not rely on overcommit limit during migration
mm, hugetlb: integrate giga hugetlb more naturally to the allocation path
mm, hugetlb: unify core page allocation accounting and initialization
mm/memcontrol.c: try harder to decrease [memory,memsw].limit_in_bytes
mm/memcontrol.c: make local symbol static
mm/hmm: fix uninitialized use of 'entry' in hmm_vma_walk_pmd()
include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer
mm/compaction.c: fix comment for try_to_compact_pages()
mm/page_ext.c: make page_ext_init a noop when CONFIG_PAGE_EXTENSION but nothing uses it
zsmalloc: use U suffix for negative literals being shifted
...
It's required to avoid losing dirty and accessed bits.
[akpm@linux-foundation.org: add a `do' to the do-while loop]
Link: http://lkml.kernel.org/r/20171213105756.69879-9-kirill.shutemov@linux.intel.com
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking updates from David Miller:
1) Significantly shrink the core networking routing structures. Result
of http://vger.kernel.org/~davem/seoul2017_netdev_keynote.pdf
2) Add netdevsim driver for testing various offloads, from Jakub
Kicinski.
3) Support cross-chip FDB operations in DSA, from Vivien Didelot.
4) Add a 2nd listener hash table for TCP, similar to what was done for
UDP. From Martin KaFai Lau.
5) Add eBPF based queue selection to tun, from Jason Wang.
6) Lockless qdisc support, from John Fastabend.
7) SCTP stream interleave support, from Xin Long.
8) Smoother TCP receive autotuning, from Eric Dumazet.
9) Lots of erspan tunneling enhancements, from William Tu.
10) Add true function call support to BPF, from Alexei Starovoitov.
11) Add explicit support for GRO HW offloading, from Michael Chan.
12) Support extack generation in more netlink subsystems. From Alexander
Aring, Quentin Monnet, and Jakub Kicinski.
13) Add 1000BaseX, flow control, and EEE support to mvneta driver. From
Russell King.
14) Add flow table abstraction to netfilter, from Pablo Neira Ayuso.
15) Many improvements and simplifications to the NFP driver bpf JIT,
from Jakub Kicinski.
16) Support for ipv6 non-equal cost multipath routing, from Ido
Schimmel.
17) Add resource abstration to devlink, from Arkadi Sharshevsky.
18) Packet scheduler classifier shared filter block support, from Jiri
Pirko.
19) Avoid locking in act_csum, from Davide Caratti.
20) devinet_ioctl() simplifications from Al viro.
21) More TCP bpf improvements from Lawrence Brakmo.
22) Add support for onlink ipv6 route flag, similar to ipv4, from David
Ahern.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1925 commits)
tls: Add support for encryption using async offload accelerator
ip6mr: fix stale iterator
net/sched: kconfig: Remove blank help texts
openvswitch: meter: Use 64-bit arithmetic instead of 32-bit
tcp_nv: fix potential integer overflow in tcpnv_acked
r8169: fix RTL8168EP take too long to complete driver initialization.
qmi_wwan: Add support for Quectel EP06
rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK
ipmr: Fix ptrdiff_t print formatting
ibmvnic: Wait for device response when changing MAC
qlcnic: fix deadlock bug
tcp: release sk_frag.page in tcp_disconnect
ipv4: Get the address of interface correctly.
net_sched: gen_estimator: fix lockdep splat
net: macb: Handle HRESP error
net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring
ipv6: addrconf: break critical section in addrconf_verify_rtnl()
ipv6: change route cache aging logic
i40e/i40evf: Update DESC_NEEDED value to reflect larger value
bnxt_en: cleanup DIM work on device shutdown
...
Pull crypto updates from Herbert Xu:
"API:
- Enforce the setting of keys for keyed aead/hash/skcipher
algorithms.
- Add multibuf speed tests in tcrypt.
Algorithms:
- Improve performance of sha3-generic.
- Add native sha512 support on arm64.
- Add v8.2 Crypto Extentions version of sha3/sm3 on arm64.
- Avoid hmac nesting by requiring underlying algorithm to be unkeyed.
- Add cryptd_max_cpu_qlen module parameter to cryptd.
Drivers:
- Add support for EIP97 engine in inside-secure.
- Add inline IPsec support to chelsio.
- Add RevB core support to crypto4xx.
- Fix AEAD ICV check in crypto4xx.
- Add stm32 crypto driver.
- Add support for BCM63xx platforms in bcm2835 and remove bcm63xx.
- Add Derived Key Protocol (DKP) support in caam.
- Add Samsung Exynos True RNG driver.
- Add support for Exynos5250+ SoCs in exynos PRNG driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (166 commits)
crypto: picoxcell - Fix error handling in spacc_probe()
crypto: arm64/sha512 - fix/improve new v8.2 Crypto Extensions code
crypto: arm64/sm3 - new v8.2 Crypto Extensions implementation
crypto: arm64/sha3 - new v8.2 Crypto Extensions implementation
crypto: testmgr - add new testcases for sha3
crypto: sha3-generic - export init/update/final routines
crypto: sha3-generic - simplify code
crypto: sha3-generic - rewrite KECCAK transform to help the compiler optimize
crypto: sha3-generic - fixes for alignment and big endian operation
crypto: aesni - handle zero length dst buffer
crypto: artpec6 - remove select on non-existing CRYPTO_SHA384
hwrng: bcm2835 - Remove redundant dev_err call in bcm2835_rng_probe()
crypto: stm32 - remove redundant dev_err call in stm32_cryp_probe()
crypto: axis - remove unnecessary platform_get_resource() error check
crypto: testmgr - test misuse of result in ahash
crypto: inside-secure - make function safexcel_try_push_requests static
crypto: aes-generic - fix aes-generic regression on powerpc
crypto: chelsio - Fix indentation warning
crypto: arm64/sha1-ce - get rid of literal pool
crypto: arm64/sha2-ce - move the round constant table to .rodata section
...
Pull poll annotations from Al Viro:
"This introduces a __bitwise type for POLL### bitmap, and propagates
the annotations through the tree. Most of that stuff is as simple as
'make ->poll() instances return __poll_t and do the same to local
variables used to hold the future return value'.
Some of the obvious brainos found in process are fixed (e.g. POLLIN
misspelled as POLL_IN). At that point the amount of sparse warnings is
low and most of them are for genuine bugs - e.g. ->poll() instance
deciding to return -EINVAL instead of a bitmap. I hadn't touched those
in this series - it's large enough as it is.
Another problem it has caught was eventpoll() ABI mess; select.c and
eventpoll.c assumed that corresponding POLL### and EPOLL### were
equal. That's true for some, but not all of them - EPOLL### are
arch-independent, but POLL### are not.
The last commit in this series separates userland POLL### values from
the (now arch-independent) kernel-side ones, converting between them
in the few places where they are copied to/from userland. AFAICS, this
is the least disruptive fix preserving poll(2) ABI and making epoll()
work on all architectures.
As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and
it will trigger only on what would've triggered EPOLLWRBAND on other
architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered
at all on sparc. With this patch they should work consistently on all
architectures"
* 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
make kernel-side POLL... arch-independent
eventpoll: no need to mask the result of epi_item_poll() again
eventpoll: constify struct epoll_event pointers
debugging printk in sg_poll() uses %x to print POLL... bitmap
annotate poll(2) guts
9p: untangle ->poll() mess
->si_band gets POLL... bitmap stored into a user-visible long field
ring_buffer_poll_wait() return value used as return value of ->poll()
the rest of drivers/*: annotate ->poll() instances
media: annotate ->poll() instances
fs: annotate ->poll() instances
ipc, kernel, mm: annotate ->poll() instances
net: annotate ->poll() instances
apparmor: annotate ->poll() instances
tomoyo: annotate ->poll() instances
sound: annotate ->poll() instances
acpi: annotate ->poll() instances
crypto: annotate ->poll() instances
block: annotate ->poll() instances
x86: annotate ->poll() instances
...
Pull siginfo cleanups from Eric Biederman:
"Long ago when 2.4 was just a testing release copy_siginfo_to_user was
made to copy individual fields to userspace, possibly for efficiency
and to ensure initialized values were not copied to userspace.
Unfortunately the design was complex, it's assumptions unstated, and
humans are fallible and so while it worked much of the time that
design failed to ensure unitialized memory is not copied to userspace.
This set of changes is part of a new design to clean up siginfo and
simplify things, and hopefully make the siginfo handling robust enough
that a simple inspection of the code can be made to ensure we don't
copy any unitializied fields to userspace.
The design is to unify struct siginfo and struct compat_siginfo into a
single definition that is shared between all architectures so that
anyone adding to the set of information shared with struct siginfo can
see the whole picture. Hopefully ensuring all future si_code
assignments are arch independent.
The design is to unify copy_siginfo_to_user32 and
copy_siginfo_from_user32 so that those function are complete and cope
with all of the different cases documented in signinfo_layout. I don't
think there was a single implementation of either of those functions
that was complete and correct before my changes unified them.
The design is to introduce a series of helpers including
force_siginfo_fault that take the values that are needed in struct
siginfo and build the siginfo structure for their callers. Ensuring
struct siginfo is built correctly.
The remaining work for 4.17 (unless someone thinks it is post -rc1
material) is to push usage of those helpers down into the
architectures so that architecture specific code will not need to deal
with the fiddly work of intializing struct siginfo, and then when
struct siginfo is guaranteed to be fully initialized change copy
siginfo_to_user into a simple wrapper around copy_to_user.
Further there is work in progress on the issues that have been
documented requires arch specific knowledge to sort out.
The changes below fix or at least document all of the issues that have
been found with siginfo generation. Then proceed to unify struct
siginfo the 32 bit helpers that copy siginfo to and from userspace,
and generally clean up anything that is not arch specific with regards
to siginfo generation.
It is a lot but with the unification you can of siginfo you can
already see the code reduction in the kernel"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (45 commits)
signal/memory-failure: Use force_sig_mceerr and send_sig_mceerr
mm/memory_failure: Remove unused trapno from memory_failure
signal/ptrace: Add force_sig_ptrace_errno_trap and use it where needed
signal/powerpc: Remove unnecessary signal_code parameter of do_send_trap
signal: Helpers for faults with specialized siginfo layouts
signal: Add send_sig_fault and force_sig_fault
signal: Replace memset(info,...) with clear_siginfo for clarity
signal: Don't use structure initializers for struct siginfo
signal/arm64: Better isolate the COMPAT_TASK portion of ptrace_hbptriggered
ptrace: Use copy_siginfo in setsiginfo and getsiginfo
signal: Unify and correct copy_siginfo_to_user32
signal: Remove the code to clear siginfo before calling copy_siginfo_from_user32
signal: Unify and correct copy_siginfo_from_user32
signal/blackfin: Remove pointless UID16_SIGINFO_COMPAT_NEEDED
signal/blackfin: Move the blackfin specific si_codes to asm-generic/siginfo.h
signal/tile: Move the tile specific si_codes to asm-generic/siginfo.h
signal/frv: Move the frv specific si_codes to asm-generic/siginfo.h
signal/ia64: Move the ia64 specific si_codes to asm-generic/siginfo.h
signal/powerpc: Remove redefinition of NSIGTRAP on powerpc
signal: Move addr_lsb into the _sigfault union for clarity
...
-----BEGIN PGP SIGNATURE-----
iQIVAwUAWl80tvSw1s6N8H32AQJq8A//ViRN5fExrd678Eh2Bz1ytrJYMUfYY3Hv
QTH5TH9zFyLFyWLB1Iwe13sdLVTTM88O0qcDb54Lx9fWUqeMZyYvBhLtWPc00lTU
0m3EyYR87MFWaEV+VxaVWgWaWkMDkd39KubDitcS+YIBDszTuMpYodhPUsgLt7lr
pePX7eurXKdQPTh4NUOjGA2NaZot3tga76J6D8NKruGYUstQCGxpP1ryiFfACnwf
NLWNO8ZBMtlDwX1mHYOOMFMaBzFzXorPm7jY4HJDf3mUM84xI3ach6CuH9RTSzfq
A+qB1U3QILPVFo2HtqOHui4bFjRwqOf6uIrI/KcnioJ37w1O+KFcMJeDnX2I211q
f2lXehJLQA7kPmxQw8T3//HDRaLXc0Qxt7IPZRFinrlkcN4oh3DD5euMfCFBSoZG
PTbjxlgMfzJPoZtqAcy0rV5L54a/F4h915OQPJCKLwujIsXD2nT993vNmGDyq4zh
BzNMxSXJC8p+jYvQpNhWyyxwDBBT/YsVQo/ACwg4eJnD3blVTAioRT9ZZcAcsY0F
0z1eWW5RiknzIaXQWvjfK0gYKpO+aMSu9+gipHfMbU3yXG+sPj/H6zAHYzqX3uQZ
jb5Iujjnu49W/YD+RiMenuu59lNXUnLSeRnlV7dw0qxGK1FzGo24+ZzKFhJhKvzG
tdfUsev1Mc8=
=jhWg
-----END PGP SIGNATURE-----
Merge tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull init_task initializer cleanups from David Howells:
"It doesn't seem useful to have the init_task in a header file rather
than in a normal source file. We could consolidate init_task handling
instead and expand out various macros.
Here's a series of patches that consolidate init_task handling:
(1) Make THREAD_SIZE available to vmlinux.lds for cris, hexagon and
openrisc.
(2) Alter the INIT_TASK_DATA linker script macro to set
init_thread_union and init_stack rather than defining these in C.
Insert init_task and init_thread_into into the init_stack area in
the linker script as appropriate to the configuration, with
different section markers so that they end up correctly ordered.
We can then get merge ia64's init_task.c into the main one.
We then have a bunch of single-use INIT_*() macros that seem only
to be macros because they used to be used per-arch. We can then
expand these in place of the user and get rid of a few lines and
a lot of backslashes.
(3) Expand INIT_TASK() in place.
(4) Expand in place various small INIT_*() macros that are defined
conditionally. Expand them and surround them by #if[n]def/#endif
in the .c file as it takes fewer lines.
(5) Expand INIT_SIGNALS() and INIT_SIGHAND() in place.
(6) Expand INIT_STRUCT_PID in place.
These macros can then be discarded"
* tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
Expand INIT_STRUCT_PID and remove
Expand the INIT_SIGNALS and INIT_SIGHAND macros and remove
Expand various INIT_* macros and remove
Expand INIT_TASK() in init/init_task.c and remove
Construct init thread stack in the linker script rather than by union
openrisc: Make THREAD_SIZE available to vmlinux.lds
hexagon: Make THREAD_SIZE available to vmlinux.lds
cris: Make THREAD_SIZE available to vmlinux.lds
Alexei Starovoitov says:
====================
pull-request: bpf-next 2018-01-26
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) A number of extensions to tcp-bpf, from Lawrence.
- direct R or R/W access to many tcp_sock fields via bpf_sock_ops
- passing up to 3 arguments to bpf_sock_ops functions
- tcp_sock field bpf_sock_ops_cb_flags for controlling callbacks
- optionally calling bpf_sock_ops program when RTO fires
- optionally calling bpf_sock_ops program when packet is retransmitted
- optionally calling bpf_sock_ops program when TCP state changes
- access to tclass and sk_txhash
- new selftest
2) div/mod exception handling, from Daniel.
One of the ugly leftovers from the early eBPF days is that div/mod
operations based on registers have a hard-coded src_reg == 0 test
in the interpreter as well as in JIT code generators that would
return from the BPF program with exit code 0. This was basically
adopted from cBPF interpreter for historical reasons.
There are multiple reasons why this is very suboptimal and prone
to bugs. To name one: the return code mapping for such abnormal
program exit of 0 does not always match with a suitable program
type's exit code mapping. For example, '0' in tc means action 'ok'
where the packet gets passed further up the stack, which is just
undesirable for such cases (e.g. when implementing policy) and
also does not match with other program types.
After considering _four_ different ways to address the problem,
we adapt the same behavior as on some major archs like ARMv8:
X div 0 results in 0, and X mod 0 results in X. aarch64 and
aarch32 ISA do not generate any traps or otherwise aborts
of program execution for unsigned divides.
Given the options, it seems the most suitable from
all of them, also since major archs have similar schemes in
place. Given this is all in the realm of undefined behavior,
we still have the option to adapt if deemed necessary.
3) sockmap sample refactoring, from John.
4) lpm map get_next_key fixes, from Yonghong.
5) test cleanups, from Alexei and Prashant.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we've changed div/mod exception handling for src_reg in
eBPF verifier itself, remove the leftovers from sparc64 JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch fixes the typo CONFIG_CRYPTO_DES_SPARC64 => CONFIG_CRYPTO_CAMELLIA_SPARC64
Fixes: 81658ad0d9 ("sparc64: Add CAMELLIA driver making use of the new camellia opcodes.")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This statement is indented one tab too far which is confusing and
leads to a Smatch warning:
arch/sparc/vdso/vma.c:254 arch_setup_additional_pages()
warn: curly braces intended?
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The last user of compat_old_sigset_t was removed in commit
2d7d5f0511.
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
DAX is a coprocessor which resides on the SPARC M7 (DAX1) and M8
(DAX2) processor chips, and has direct access to the CPU's L3 caches
as well as physical memory. It can perform several operations on data
streams with various input and output formats. This driver provides a
transport mechanism and has limited knowledge of the various opcodes
and data formats. A user space library provides high level services
and translates these into low level commands which are then passed
into the driver and subsequently the hypervisor and the coprocessor.
The library is the recommended way for applications to use the
coprocessor, and the driver interface is not intended for general use.
Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Jonathan Helman <jonathan.helman@oracle.com>
Signed-off-by: Sanath Kumar <sanath099@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds hypercall function stubs and C templates for
ccb_submit/info/kill which provide coprocessor services for the Oracle
Data Analytics Accelerator, registration for the DAX api group, and
all the various associated constants.
Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Jonathan Helman <jonathan.helman@oracle.com>
Signed-off-by: Sanath Kumar <sanath099@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Having a pure_initcall() callback just to permanently enable BPF
JITs under CONFIG_BPF_JIT_ALWAYS_ON is unnecessary and could leave
a small race window in future where JIT is still disabled on boot.
Since we know about the setting at compilation time anyway, just
initialize it properly there. Also consolidate all the individual
bpf_jit_enable variables into a single one and move them under one
location. Moreover, don't allow for setting unspecified garbage
values on them.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Among the existing architecture specific versions of
copy_siginfo_to_user32 there are several different implementation
problems. Some architectures fail to handle all of the cases in in
the siginfo union. Some architectures perform a blind copy of the
siginfo union when the si_code is negative. A blind copy suggests the
data is expected to be in 32bit siginfo format, which means that
receiving such a signal via signalfd won't work, or that the data is
in 64bit siginfo and the code is copying nonsense to userspace.
Create a single instance of copy_siginfo_to_user32 that all of the
architectures can share, and teach it to handle all of the cases in
the siginfo union correctly, with the assumption that siginfo is
stored internally to the kernel is 64bit siginfo format.
A special case is made for x86 x32 format. This is needed as presence
of both x32 and ia32 on x86_64 results in two different 32bit signal
formats. By allowing this small special case there winds up being
exactly one code base that needs to be maintained between all of the
architectures. Vastly increasing the testing base and the chances of
finding bugs.
As the x86 copy of copy_siginfo_to_user32 the call of the x86
signal_compat_build_tests were moved into sigaction_compat_abi, so
that they will keep running.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The function copy_siginfo_from_user32 is used for two things, in ptrace
since the dawn of siginfo for arbirarily modifying a signal that
user space sees, and in sigqueueinfo to send a signal with arbirary
siginfo data.
Create a single copy of copy_siginfo_from_user32 that all architectures
share, and teach it to handle all of the cases in the siginfo union.
In the generic version of copy_siginfo_from_user32 ensure that all
of the fields in siginfo are initialized so that the siginfo structure
can be safely copied to userspace if necessary.
When copying the embedded sigval union copy the si_int member. That
ensures the 32bit values passes through the kernel unchanged.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
--EWB Added #ifdef CONFIG_X86_X32_ABI to arch/x86/kernel/signal_compat.c
Changed #ifdef CONFIG_X86_X32 to #ifdef CONFIG_X86_X32_ABI in
linux/compat.h
CONFIG_X86_X32 is set when the user requests X32 support.
CONFIG_X86_X32_ABI is set when the user requests X32 support
and the tool-chain has X32 allowing X32 support to be built.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
We need to consistently enforce that keyed hashes cannot be used without
setting the key. To do this we need a reliable way to determine whether
a given hash algorithm is keyed or not. AF_ALG currently does this by
checking for the presence of a ->setkey() method. However, this is
actually slightly broken because the CRC-32 algorithms implement
->setkey() but can also be used without a key. (The CRC-32 "key" is not
actually a cryptographic key but rather represents the initial state.
If not overridden, then a default initial state is used.)
Prepare to fix this by introducing a flag CRYPTO_ALG_OPTIONAL_KEY which
indicates that the algorithm has a ->setkey() method, but it is not
required to be called. Then set it on all the CRC-32 algorithms.
The same also applies to the Adler-32 implementation in Lustre.
Also, the cryptd and mcryptd templates have to pass through the flag
from their underlying algorithm.
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Construct the init thread stack in the linker script rather than doing it
by means of a union so that ia64's init_task.c can be got rid of.
The following symbols are then made available from INIT_TASK_DATA() linker
script macro:
init_thread_union
init_stack
INIT_TASK_DATA() also expands the region to THREAD_SIZE to accommodate the
size of the init stack. init_thread_union is given its own section so that
it can be placed into the stack space in the right order. I'm assuming
that the ia64 ordering is correct and that the task_struct is first and the
thread_info second.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Will Deacon <will.deacon@arm.com> (arm64)
Tested-by: Palmer Dabbelt <palmer@sifive.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
We can just pass this on instead of having to do a radix tree lookup
without proper locking a few levels into the callchain.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We can just pass this on instead of having to do a radix tree lookup
without proper locking a few levels into the callchain.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that every architecture is using the generic clkdev.h file
and we no longer include asm/clkdev.h anywhere in the tree, we
can remove it.
Cc: Russell King <linux@armlinux.org.uk>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-arch@vger.kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2017-12-28
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Fix incorrect state pruning related to recognition of zero initialized
stack slots, where stacksafe exploration would mistakenly return a
positive pruning verdict too early ignoring other slots, from Gianluca.
2) Various BPF to BPF calls related follow-up fixes. Fix an off-by-one
in maximum call depth check, and rework maximum stack depth tracking
logic to fix a bypass of the total stack size check reported by Jann.
Also fix a bug in arm64 JIT where prog->jited_len was uninitialized.
Addition of various test cases to BPF selftests, from Alexei.
3) Addition of a BPF selftest to test_verifier that is related to BPF to
BPF calls which demonstrates a late caller stack size increase and
thus out of bounds access. Fixed above in 2). Test case from Jann.
4) Addition of correlating BPF helper calls, BPF to BPF calls as well
as BPF maps to bpftool xlated dump in order to allow for better
BPF program introspection and debugging, from Daniel.
5) Fixing several bugs in BPF to BPF calls kallsyms handling in order
to get it actually to work for subprogs, from Daniel.
6) Extending sparc64 JIT support for BPF to BPF calls and fix a couple
of build errors for libbpf on sparc64, from David.
7) Allow narrower context access for BPF dev cgroup typed programs in
order to adapt to LLVM code generation. Also adjust memlock rlimit
in the test_dev_cgroup BPF selftest, from Yonghong.
8) Add netdevsim Kconfig entry to BPF selftests since test_offload.py
relies on netdevsim device being available, from Jakub.
9) Reduce scope of xdp_do_generic_redirect_map() to being static,
from Xiongwei.
10) Minor cleanups and spelling fixes in BPF verifier, from Colin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit v4.12-rc4-1-g9289ea7f952b introduced a mistake that made the
64-bit hweight stub call the 16-bit hweight function.
Fixes: 9289ea7f95 ("sparc64: Use indirect calls in hamming weight stubs")
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Modelled strongly upon the arm64 implementation.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Lots of overlapping changes. Also on the net-next side
the XDP state management is handled more in the generic
layers so undo the 'net' nfp fix which isn't applicable
in net-next.
Include a necessary change by Jakub Kicinski, with log message:
====================
cls_bpf no longer takes care of offload tracking. Make sure
netdevsim performs necessary checks. This fixes a warning
caused by TC trying to remove a filter it has not added.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller"
"What's a holiday weekend without some networking bug fixes? [1]
1) Fix some eBPF JIT bugs wrt. SKB pointers across helper function
calls, from Daniel Borkmann.
2) Fix regression from errata limiting change to marvell PHY driver,
from Zhao Qiang.
3) Fix u16 overflow in SCTP, from Xin Long.
4) Fix potential memory leak during bridge newlink, from Nikolay
Aleksandrov.
5) Fix BPF selftest build on s390, from Hendrik Brueckner.
6) Don't append to cfg80211 automatically generated certs file,
always write new ones from scratch. From Thierry Reding.
7) Fix sleep in atomic in mac80211 hwsim, from Jia-Ju Bai.
8) Fix hang on tg3 MTU change with certain chips, from Brian King.
9) Add stall detection to arc emac driver and reset chip when this
happens, from Alexander Kochetkov.
10) Fix MTU limitng in GRE tunnel drivers, from Xin Long.
11) Fix stmmac timestamping bug due to mis-shifting of field. From
Fredrik Hallenberg.
12) Fix metrics match when deleting an ipv4 route. The kernel sets
some internal metrics bits which the user isn't going to set when
it makes the delete request. From Phil Sutter.
13) mvneta driver loop over RX queues limits on "txq_number" :-) Fix
from Yelena Krivosheev.
14) Fix double free and memory corruption in get_net_ns_by_id, from
Eric W. Biederman.
15) Flush ipv4 FIB tables in the reverse order. Some tables can share
their actual backing data, in particular this happens for the MAIN
and LOCAL tables. We have to kill the LOCAL table first, because
it uses MAIN's backing memory. Fix from Ido Schimmel.
16) Several eBPF verifier value tracking fixes, from Edward Cree, Jann
Horn, and Alexei Starovoitov.
17) Make changes to ipv6 autoflowlabel sysctl really propagate to
sockets, unless the socket has set the per-socket value
explicitly. From Shaohua Li.
18) Fix leaks and double callback invocations of zerocopy SKBs, from
Willem de Bruijn"
[1] Is this a trick question? "Relaxing"? "Quiet"? "Fine"? - Linus.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits)
skbuff: skb_copy_ubufs must release uarg even without user frags
skbuff: orphan frags before zerocopy clone
net: reevalulate autoflowlabel setting after sysctl setting
openvswitch: Fix pop_vlan action for double tagged frames
ipv6: Honor specified parameters in fibmatch lookup
bpf: do not allow root to mangle valid pointers
selftests/bpf: add tests for recent bugfixes
bpf: fix integer overflows
bpf: don't prune branches when a scalar is replaced with a pointer
bpf: force strict alignment checks for stack pointers
bpf: fix missing error return in check_stack_boundary()
bpf: fix 32-bit ALU op verification
bpf: fix incorrect tracking of register size truncation
bpf: fix incorrect sign extension in check_alu_op()
bpf/verifier: fix bounds calculation on BPF_RSH
ipv4: Fix use-after-free when flushing FIB tables
s390/qeth: fix error handling in checksum cmd callback
tipc: remove joining group member from congested list
selftests: net: Adding config fragment CONFIG_NUMA=y
nfp: bpf: keep track of the offloaded program
...
The hashing of %p was designed to restrict kernel addresses. There is
no reason to hash the userspace values seen during a segfault report,
so switch these to %px. (Some architectures already use %lx.)
Fixes: ad67b74d24 ("printk: hash addresses printed with %p")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2017-12-18
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Allow arbitrary function calls from one BPF function to another BPF function.
As of today when writing BPF programs, __always_inline had to be used in
the BPF C programs for all functions, unnecessarily causing LLVM to inflate
code size. Handle this more naturally with support for BPF to BPF calls
such that this __always_inline restriction can be overcome. As a result,
it allows for better optimized code and finally enables to introduce core
BPF libraries in the future that can be reused out of different projects.
x86 and arm64 JIT support was added as well, from Alexei.
2) Add infrastructure for tagging functions as error injectable and allow for
BPF to return arbitrary error values when BPF is attached via kprobes on
those. This way of injecting errors generically eases testing and debugging
without having to recompile or restart the kernel. Tags for opting-in for
this facility are added with BPF_ALLOW_ERROR_INJECTION(), from Josef.
3) For BPF offload via nfp JIT, add support for bpf_xdp_adjust_head() helper
call for XDP programs. First part of this work adds handling of BPF
capabilities included in the firmware, and the later patches add support
to the nfp verifier part and JIT as well as some small optimizations,
from Jakub.
4) The bpftool now also gets support for basic cgroup BPF operations such
as attaching, detaching and listing current BPF programs. As a requirement
for the attach part, bpftool can now also load object files through
'bpftool prog load'. This reuses libbpf which we have in the kernel tree
as well. bpftool-cgroup man page is added along with it, from Roman.
5) Back then commit e87c6bc385 ("bpf: permit multiple bpf attachments for
a single perf event") added support for attaching multiple BPF programs
to a single perf event. Given they are configured through perf's ioctl()
interface, the interface has been extended with a PERF_EVENT_IOC_QUERY_BPF
command in this work in order to return an array of one or multiple BPF
prog ids that are currently attached, from Yonghong.
6) Various minor fixes and cleanups to the bpftool's Makefile as well
as a new 'uninstall' and 'doc-uninstall' target for removing bpftool
itself or prior installed documentation related to it, from Quentin.
7) Add CONFIG_CGROUP_BPF=y to the BPF kernel selftest config file which is
required for the test_dev_cgroup test case to run, from Naresh.
8) Fix reporting of XDP prog_flags for nfp driver, from Jakub.
9) Fix libbpf's exit code from the Makefile when libelf was not found in
the system, also from Jakub.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2017-12-17
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a corner case in generic XDP where we have non-linear skbs
but enough tailroom in the skb to not miss to linearizing there,
from Song.
2) Fix BPF JIT bugs in s390x and ppc64 to not recache skb data when
BPF context is not skb, from Daniel.
3) Fix a BPF JIT bug in sparc64 where recaching skb data after helper
call would use the wrong register for the skb, from Daniel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
global bpf_jit_enable variable is tested multiple times in JITs,
blinding and verifier core. The malicious root can try to toggle
it while loading the programs. This race condition was accounted
for and there should be no issues, but it's safer to avoid
this race condition.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This reverts commits 5c9d2d5c26, c7da82b894, and e7fe7b5cae.
We'll probably need to revisit this, but basically we should not
complicate the get_user_pages_fast() case, and checking the actual page
table protection key bits will require more care anyway, since the
protection keys depend on the exact state of the VM in question.
Particularly when doing a "remote" page lookup (ie in somebody elses VM,
not your own), you need to be much more careful than this was. Dave
Hansen says:
"So, the underlying bug here is that we now a get_user_pages_remote()
and then go ahead and do the p*_access_permitted() checks against the
current PKRU. This was introduced recently with the addition of the
new p??_access_permitted() calls.
We have checks in the VMA path for the "remote" gups and we avoid
consulting PKRU for them. This got missed in the pkeys selftests
because I did a ptrace read, but not a *write*. I also didn't
explicitly test it against something where a COW needed to be done"
It's also not entirely clear that it makes sense to check the protection
key bits at this level at all. But one possible eventual solution is to
make the get_user_pages_fast() case just abort if it sees protection key
bits set, which makes us fall back to the regular get_user_pages() case,
which then has a vma and can do the check there if we want to.
We'll see.
Somewhat related to this all: what we _do_ want to do some day is to
check the PAGE_USER bit - it should obviously always be set for user
pages, but it would be a good check to have back. Because we have no
generic way to test for it, we lost it as part of moving over from the
architecture-specific x86 GUP implementation to the generic one in
commit e585513b76 ("x86/mm/gup: Switch GUP to the generic
get_user_page_fast() implementation").
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When LD_ABS/IND is used in the program, and we have a BPF helper
call that changes packet data (bpf_helper_changes_pkt_data() returns
true), then in case of sparc JIT, we try to reload cached skb data
from bpf2sparc[BPF_REG_6]. However, there is no such guarantee or
assumption that skb sits in R6 at this point, all helpers changing
skb data only have a guarantee that skb sits in R1. Therefore,
store BPF R1 in L7 temporarily and after procedure call use L7 to
reload cached skb data. skb sitting in R6 is only true at the time
when LD_ABS/IND is executed.
Fixes: 7a12b5031c ("sparc64: Add eBPF JIT.")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit 0515e5999a ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT
program type") introduced the bpf_perf_event_data structure which
exports the pt_regs structure. This is OK for multiple architectures
but fail for s390 and arm64 which do not export pt_regs. Programs
using them, for example, the bpf selftest fail to compile on these
architectures.
For s390, exporting the pt_regs is not an option because s390 wants
to allow changes to it. For arm64, there is a user_pt_regs structure
that covers parts of the pt_regs structure for use by user space.
To solve the broken uapi for s390 and arm64, introduce an abstract
type for pt_regs and add an asm/bpf_perf_event.h file that concretes
the type. An asm-generic header file covers the architectures that
export pt_regs today.
The arch-specific enablement for s390 and arm64 follows in separate
commits.
Reported-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Fixes: 0515e5999a ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type")
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Mergr misc fixes from Andrew Morton:
"28 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (28 commits)
fs/hugetlbfs/inode.c: change put_page/unlock_page order in hugetlbfs_fallocate()
mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine
autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"
autofs: revert "autofs: take more care to not update last_used on path walk"
fs/fat/inode.c: fix sb_rdonly() change
mm, memcg: fix mem_cgroup_swapout() for THPs
mm: migrate: fix an incorrect call of prep_transhuge_page()
kmemleak: add scheduling point to kmemleak_scan()
scripts/bloat-o-meter: don't fail with division by 0
fs/mbcache.c: make count_objects() more robust
Revert "mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical"
mm/madvise.c: fix madvise() infinite loop under special circumstances
exec: avoid RLIMIT_STACK races with prlimit()
IB/core: disable memory registration of filesystem-dax vmas
v4l2: disable filesystem-dax mapping support
mm: fail get_vaddr_frames() for filesystem-dax mappings
mm: introduce get_user_pages_longterm
device-dax: implement ->split() to catch invalid munmap attempts
mm, hugetlbfs: introduce ->split() to vm_operations_struct
scripts/faddr2line: extend usage on generic arch
...
The 'access_permitted' helper is used in the gup-fast path and goes
beyond the simple _PAGE_RW check to also:
- validate that the mapping is writable from a protection keys
standpoint
- validate that the pte has _PAGE_USER set since all fault paths where
pmd_write is must be referencing user-memory.
Link: http://lkml.kernel.org/r/151043111049.2842.15241454964150083466.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The 'access_permitted' helper is used in the gup-fast path and goes
beyond the simple _PAGE_RW check to also:
- validate that the mapping is writable from a protection keys
standpoint
- validate that the pte has _PAGE_USER set since all fault paths where
pud_write is must be referencing user-memory.
[dan.j.williams@intel.com: fix powerpc compile error]
Link: http://lkml.kernel.org/r/151129127237.37405.16073414520854722485.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/151043110453.2842.2166049702068628177.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In response to compile breakage introduced by a series that added the
pud_write helper to x86, Stephen notes:
did you consider using the other paradigm:
In arch include files:
#define pud_write pud_write
static inline int pud_write(pud_t pud)
.....
Then in include/asm-generic/pgtable.h:
#ifndef pud_write
tatic inline int pud_write(pud_t pud)
{
....
}
#endif
If you had, then the powerpc code would have worked ... ;-) and many
of the other interfaces in include/asm-generic/pgtable.h are
protected that way ...
Given that some architecture already define pmd_write() as a macro, it's
a net reduction to drop the definition of __HAVE_ARCH_PMD_WRITE.
Link: http://lkml.kernel.org/r/151129126721.37405.13339850900081557813.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we don't put the NG4fls.o object into the same part of
the link as the generic sparc64 objects for fls() and __fls()
then the relocation in the branch we use for patching will
not fit.
Move NG4fls.o into lib-y to fix this problem.
Fixes: 46ad8d2d22 ("sparc64: Use sparc optimized fls and __fls for T4 and above")
Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com>
Pull sparc updates from David Miller:
1) Add missing cmpxchg64() for 32-bit sparc.
2) Timer conversions from Allen Pais and Kees Cook.
3) vDSO support, from Nagarathnam Muthusamy.
4) Fix sparc64 huge page table walks based upon bug report by Al Viro,
from Nitin Gupta.
5) Optimized fls() for T4 and above, from Vijay Kumar.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix page table walk for PUD hugepages
sparc64: Convert timers to user timer_setup()
sparc64: convert mdesc_handle.refcnt from atomic_t to refcount_t
sparc/led: Convert timers to use timer_setup()
sparc64: Use sparc optimized fls and __fls for T4 and above
sparc64: SPARC optimized __fls function
sparc64: SPARC optimized fls function
sparc64: Define SPARC default __fls function
sparc64: Define SPARC default fls function
vDSO for sparc
sparc32: Add cmpxchg64().
sbus: char: Move D7S_MINOR to include/linux/miscdevice.h
sparc: time: Remove unneeded linux/miscdevice.h include
sparc64: mmu_context: Add missing include files
Commit a7be6e5a7f ("mm: drop useless local parameters of
__register_one_node()") removed the last user of parent_node().
The parent_node() macro in SPARC64 platform is unnecessary.
Remove it for cleanup.
Link: http://lkml.kernel.org/r/1504234599-29533-6-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull parisc updates from Helge Deller:
"Highlights:
- one important fix from Dave to prevent kernel crash when userspace
hands over invalid values to our in-kernel CAS implementation.
- added CPU topology support, including multi-core scheduler support
on PA8900 CPUs
Minor changes:
- minor fixes for sparse (from Luc)
- drop duplicates for CPU_BIG_ENDIAN from parisc and sparc top
Kconfig files (from Babu)
- reorganized parisc PDC (firmware-access) header files for usage
from userspace. Required for upcoming qemu parisc emulator and
SeaBIOS fork to support parisc"
* 'parisc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
arch: Fix duplicates in Kconfig for parisc and sparc
parisc: Make some PDC structures accessible in uapi headers
parisc: Pass endianness info to sparse
parisc: Add CPU topology support
parisc: Fix validity check of pointer size argument in new CAS implementation
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaDuoWAAoJEAAOaEEZVoIVXEQP/jQYoU9hgvEj8j3ZIgi56SDJ
pR45w2zcJz2/uU43DEKyShyLgsuoBbJQ3l/gGBH/tl+xGm9NzB0gatoEu9GmKNYz
/IN6/vUFnoIAUyD+iMZbpmsYKIkz0z2YJo261IfspAwIft/cvHJnYYGQrP9YXg9F
c7bdDuANTKocdQigc4BQyOe3OfIBGfTwJhuakO+1yuZmGOVNyxEcdYbMM8FiTfc8
+62kvQQ3t7WMqSbM8M0QdGcYQjG0EwcVAuV7COurLJIva7hUkVel32MVUjoFcf28
BnRu2ztFJCubm1HA85twlJDtpeXbcMqrUl/CcwRMpwDaePd5GVB1h5iKqbZ51BZ1
fWT2STmt+8hY2B5eiXoYEaG3B7ZRr+r0oroxqOxpiZ/m4AVeouF+gPGv+NV5zgvD
NGWC0MdklIJ4xaC99NEeP6kBhz0M74VKymFCTeHkVg9m4TqDepNvitKed0qagw19
uw8seei7TOTm4o117+l55NHmyfTHXFO4U0WLTJyeZcoEnUs0rOcHeqyy0RwCBMrK
W2fJtdBLFr+tBIIrID4TnPhhYtSvIPjz+FpiRDobqhgvMva/PIvLGTWK4unrgIjG
ZQ7YGnwWda8GjqKhgZacn/BSXyJzOAF9hJp0mz2ORaOxaMarEV55duiZufCvGuZw
uUQWRCKuQX7Oi05i9jXp
=fCeF
-----END PGP SIGNATURE-----
Merge tag 'locks-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull file locking update from Jeff Layton:
"A couple of fixes for a patch that went into v4.14, and the bug report
just came in a few days ago.. It passes my (minimal) testing, and has
been in linux-next for a few days now.
I also would like to get my address changed in MAINTAINERS to clear
that hurdle"
* tag 'locks-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscall
fcntl: don't leak fd reference when fixup_compat_flock fails
MAINTAINERS: s/jlayton@poochiereds.net/jlayton@kernel.org/
Pull compat and uaccess updates from Al Viro:
- {get,put}_compat_sigset() series
- assorted compat ioctl stuff
- more set_fs() elimination
- a few more timespec64 conversions
- several removals of pointless access_ok() in places where it was
followed only by non-__ variants of primitives
* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (24 commits)
coredump: call do_unlinkat directly instead of sys_unlink
fs: expose do_unlinkat for built-in callers
ext4: take handling of EXT4_IOC_GROUP_ADD into a helper, get rid of set_fs()
ipmi: get rid of pointless access_ok()
pi433: sanitize ioctl
cxlflash: get rid of pointless access_ok()
mtdchar: get rid of pointless access_ok()
r128: switch compat ioctls to drm_ioctl_kernel()
selection: get rid of field-by-field copyin
VT_RESIZEX: get rid of field-by-field copyin
i2c compat ioctls: move to ->compat_ioctl()
sched_rr_get_interval(): move compat to native, get rid of set_fs()
mips: switch to {get,put}_compat_sigset()
sparc: switch to {get,put}_compat_sigset()
s390: switch to {get,put}_compat_sigset()
ppc: switch to {get,put}_compat_sigset()
parisc: switch to {get,put}_compat_sigset()
get_compat_sigset()
get rid of {get,put}_compat_itimerspec()
io_getevents: Use timespec64 to represent timeouts
...
Fix duplicates for sparc and parisc. This was due these following commits.
1. commit 4c97a0c8fe ("arch: define CPU_BIG_ENDIAN for all fixed big
endian archs")
2. commit 97d9f96916 ("arch/sparc: Define config parameter
CPU_BIG_ENDIAN")
3. commit 74ad3d28af ("parisc: Define CONFIG_CPU_BIG_ENDIAN")
Remove duplicates.
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Merge updates from Andrew Morton:
- a few misc bits
- ocfs2 updates
- almost all of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits)
memory hotplug: fix comments when adding section
mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP
mm: simplify nodemask printing
mm,oom_reaper: remove pointless kthread_run() error check
mm/page_ext.c: check if page_ext is not prepared
writeback: remove unused function parameter
mm: do not rely on preempt_count in print_vma_addr
mm, sparse: do not swamp log with huge vmemmap allocation failures
mm/hmm: remove redundant variable align_end
mm/list_lru.c: mark expected switch fall-through
mm/shmem.c: mark expected switch fall-through
mm/page_alloc.c: broken deferred calculation
mm: don't warn about allocations which stall for too long
fs: fuse: account fuse_inode slab memory as reclaimable
mm, page_alloc: fix potential false positive in __zone_watermark_ok
mm: mlock: remove lru_add_drain_all()
mm, sysctl: make NUMA stats configurable
shmem: convert shmem_init_inodecache() to void
Unify migrate_pages and move_pages access checks
mm, pagevec: rename pagevec drained field
...
Most callers users of free_hot_cold_page claim the pages being released
are cache hot. The exception is the page reclaim paths where it is
likely that enough pages will be freed in the near future that the
per-cpu lists are going to be recycled and the cache hotness information
is lost. As no one really cares about the hotness of pages being
released to the allocator, just ditch the parameter.
The APIs are renamed to indicate that it's no longer about hot/cold
pages. It should also be less confusing as there are subtle differences
between them. __free_pages drops a reference and frees a page when the
refcount reaches zero. free_hot_cold_page handled pages whose refcount
was already zero which is non-obvious from the name. free_unref_page
should be more obvious.
No performance impact is expected as the overhead is marginal. The
parameter is removed simply because it is a bit stupid to have a useless
parameter copied everywhere.
[mgorman@techsingularity.net: add pages to head, not tail]
Link: http://lkml.kernel.org/r/20171019154321.qtpzaeftoyyw4iey@techsingularity.net
Link: http://lkml.kernel.org/r/20171018075952.10627-8-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add an optimized mm_zero_struct_page(), so struct page's are zeroed
without calling memset(). We do eight to ten regular stores based on
the size of struct page. Compiler optimizes out the conditions of
switch() statement.
SPARC-M6 with 15T of memory, single thread performance:
BASE FIX OPTIMIZED_FIX
bootmem_init 28.440467985s 2.305674818s 2.305161615s
free_area_init_nodes 202.845901673s 225.343084508s 172.556506560s
--------------------------------------------
Total 231.286369658s 227.648759326s 174.861668175s
BASE: current linux
FIX: This patch series without "optimized struct page zeroing"
OPTIMIZED_FIX: This patch series including the current patch.
bootmem_init() is where memory for struct pages is zeroed during
allocation. Note, about two seconds in this function is a fixed time:
it does not increase as memory is increased.
Link: http://lkml.kernel.org/r/20171013173214.27300-11-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove duplicating code by using common functions vmemmap_pud_populate
and vmemmap_pgd_populate.
Link: http://lkml.kernel.org/r/20171013173214.27300-5-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Without deferred struct page feature (CONFIG_DEFERRED_STRUCT_PAGE_INIT),
flags and other fields in "struct page"es are never changed prior to
first initializing struct pages by going through __init_single_page().
With deferred struct page feature enabled there is a case where we set
some fields prior to initializing:
mem_init() {
register_page_bootmem_info();
free_all_bootmem();
...
}
When register_page_bootmem_info() is called only non-deferred struct
pages are initialized. But, this function goes through some reserved
pages which might be part of the deferred, and thus are not yet
initialized.
mem_init
register_page_bootmem_info
register_page_bootmem_info_node
get_page_bootmem
.. setting fields here ..
such as: page->freelist = (void *)type;
free_all_bootmem()
free_low_memory_core_early()
for_each_reserved_mem_region()
reserve_bootmem_region()
init_reserved_page() <- Only if this is deferred reserved page
__init_single_pfn()
__init_single_page()
memset(0) <-- Loose the set fields here
We end up with similar issue as in the previous patch, where currently
we do not observe problem as memory is zeroed. But, if flag asserts are
changed we can start hitting issues.
Also, because in this patch series we will stop zeroing struct page
memory during allocation, we must make sure that struct pages are
properly initialized prior to using them.
The deferred-reserved pages are initialized in free_all_bootmem().
Therefore, the fix is to switch the above calls.
Link: http://lkml.kernel.org/r/20171013173214.27300-4-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert all allocations that used a NOTRACK flag to stop using it.
Link: http://lkml.kernel.org/r/20171007030159.22241-3-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Hansen <devtimhansen@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Let's add wrappers for ->nr_ptes with the same interface as for nr_pmd
and nr_pud.
The patch also makes nr_ptes accounting dependent onto CONFIG_MMU. Page
table accounting doesn't make sense if you don't have page tables.
It's preparation for consolidation of page-table counters in mm_struct.
Link: http://lkml.kernel.org/r/20171006100651.44742-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On a machine with 5-level paging support a process can allocate
significant amount of memory and stay unnoticed by oom-killer and memory
cgroup. The trick is to allocate a lot of PUD page tables. We don't
account PUD page tables, only PMD and PTE.
We already addressed the same issue for PMD page tables, see commit
dc6c9a35b6 ("mm: account pmd page tables to the process").
Introduction of 5-level paging brings the same issue for PUD page
tables.
The patch expands accounting to PUD level.
[kirill.shutemov@linux.intel.com: s/pmd_t/pud_t/]
Link: http://lkml.kernel.org/r/20171004074305.x35eh5u7ybbt5kar@black.fi.intel.com
[heiko.carstens@de.ibm.com: s390/mm: fix pud table accounting]
Link: http://lkml.kernel.org/r/20171103090551.18231-1-heiko.carstens@de.ibm.com
Link: http://lkml.kernel.org/r/20171002080427.3320-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, we're capping the values too low in the F_GETLK64 case. The
fields in that structure are 64-bit values, so we shouldn't need to do
any sort of fixup there.
Make sure we check that assumption at build time in the future however
by ensuring that the sizes we're copying will fit.
With this, we no longer need COMPAT_LOFF_T_MAX either, so remove it.
Fixes: 94073ad77f (fs/locks: don't mess with the address limit in compat_fcntl64)
Reported-by: Vitaly Lipatov <lav@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
For a PUD hugepage entry, we need to propagate bits [32:22]
from virtual address to resolve at 4M granularity. However,
the current code was incorrectly propagating bits [29:19].
This bug can cause incorrect data to be returned for pages
backed with 16G hugepages.
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch to using the new timer_setup() and from_timer()
in LDOM Virtual I/O handshake.
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.
The variable mdesc_handle.refcnt is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. Adds a static variable to hold timeout
value.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geliang Tang <geliangtang@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: sparclinux@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For T4 and above, patch fls and __fls functions
at the boot time to use lzcnt instruction.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Defined SPARC optimized __fls using lzcnt opcode.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Defined SPARC optimized fls using lzcnt opcode.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__fls will now require a boot time patching on T4 and above.
Redefining it under arch/sparc/lib.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fls will now require a boot time patching on T4 and above.
Redefining it under arch/sparc/lib.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Following patch is based on work done by Nick Alcock on 64-bit vDSO for sparc
in Oracle linux. I have extended it to include support for 32-bit vDSO for sparc
on 64-bit kernel.
vDSO for sparc is based on the X86 implementation. This patch
provides vDSO support for both 64-bit and 32-bit programs on 64-bit kernel.
vDSO will be disabled on 32-bit linux kernel on sparc.
*) vclock_gettime.c contains all the vdso functions. Since data page is mapped
before the vdso code page, the pointer to data page is got by subracting offset
from an address in the vdso code page. The return address stored in
%i7 is used for this purpose.
*) During compilation, both 32-bit and 64-bit vdso images are compiled and are
converted into raw bytes by vdso2c program to be ready for mapping into the
process. 32-bit images are compiled only if CONFIG_COMPAT is enabled. vdso2c
generates two files vdso-image-64.c and vdso-image-32.c which contains the
respective vDSO image in C structure.
*) During vdso initialization, required number of vdso pages are allocated and
raw bytes are copied into the pages.
*) During every exec, these pages are mapped into the process through
arch_setup_additional_pages and the location of mapping is passed on to the
process through aux vector AT_SYSINFO_EHDR which is used by glibc.
*) A new update_vsyscall routine for sparc is added to keep the data page in
vdso updated.
*) As vDSO cannot contain dynamically relocatable references, a new version of
cpu_relax is added for the use of vDSO.
This change also requires a putback to glibc to use vDSO. For testing,
programs planning to try vDSO can be compiled against the generated
vdso(64/32).so in the source.
Testing:
========
[root@localhost ~]# cat vdso_test.c
int main() {
struct timespec tv_start, tv_end;
struct timeval tv_tmp;
int i;
int count = 1 * 1000 * 10000;
long long diff;
clock_gettime(0, &tv_start);
for (i = 0; i < count; i++)
gettimeofday(&tv_tmp, NULL);
clock_gettime(0, &tv_end);
diff = (long long)(tv_end.tv_sec -
tv_start.tv_sec)*(1*1000*1000*1000);
diff += (tv_end.tv_nsec - tv_start.tv_nsec);
printf("Start sec: %d\n", tv_start.tv_sec);
printf("End sec : %d\n", tv_end.tv_sec);
printf("%d cycles in %lld ns = %f ns/cycle\n", count, diff,
(double)diff / (double)count);
return 0;
}
[root@localhost ~]# cc vdso_test.c -o t32_without_fix -m32 -lrt
[root@localhost ~]# ./t32_without_fix
Start sec: 1502396130
End sec : 1502396140
10000000 cycles in 9565148528 ns = 956.514853 ns/cycle
[root@localhost ~]# cc vdso_test.c -o t32_with_fix -m32 ./vdso32.so.dbg
[root@localhost ~]# ./t32_with_fix
Start sec: 1502396168
End sec : 1502396169
10000000 cycles in 798141262 ns = 79.814126 ns/cycle
[root@localhost ~]# cc vdso_test.c -o t64_without_fix -m64 -lrt
[root@localhost ~]# ./t64_without_fix
Start sec: 1502396208
End sec : 1502396218
10000000 cycles in 9846091800 ns = 984.609180 ns/cycle
[root@localhost ~]# cc vdso_test.c -o t64_with_fix -m64 ./vdso64.so.dbg
[root@localhost ~]# ./t64_with_fix
Start sec: 1502396257
End sec : 1502396257
10000000 cycles in 380984048 ns = 38.098405 ns/cycle
V1 to V2 Changes:
=================
Added hot patching code to switch the read stick instruction to read
tick instruction based on the hardware.
V2 to V3 Changes:
=================
Merged latest changes from sparc-next and moved the initialization
of clocksource_tick.archdata.vclock_mode to time_init_early. Disabled
queued spinlock and rwlock configuration when simulating 32-bit config
to compile 32-bit VDSO.
V3 to V4 Changes:
=================
Hardcoded the page size as 8192 in linker script for both 64-bit and
32-bit binaries. Removed unused variables in vdso2c.h. Added -mv8plus flag to
Makefile to prevent the generation of relocation entries for __lshrdi3 in 32-bit
vdso binary.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Signed-off-by: Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- turn dma_cache_sync into a dma_map_ops instance and remove
implementation that purely are dead because the architecture
doesn't support noncoherent allocations
- add a flag for busses that need DMA configuration (Robin Murphy)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAloLSrYLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYOMuQ//XXD94uNPYavrgXzGsAtg+I+LEm+xyk4T0dX5fxfj
amXX49MHoGemjsBgzJlkQMMFqwDEdkKyEuFnEuy6OeowYCyD6zW0MJ3MwP9OosNJ
PNTdGZIfSvxPYEW8cR9AdK3iQ2loMBZnYhd+O/oVjSugULLW2DNa7r2VRktcCKoh
8Ob/8gL6Y9xEYJBRszhrBwKTa/hU8IThxxozBFzN7I3LIKyFboSTcwXGLAHow43g
4anCTjWTaDcoU2JwY6UTRKRRTV+gD0ZRcsZfd8lNNb5rtMVZkBVOHbF14SMAmw1r
kSgRcU3+WIFPhK/8wBYqtGZZGnOgFBTHVeqow3AdS728pBWlWl8niTK0DiIgCd3m
qzScF6SqfN1bCZkZAy8FUV2l0DPYKS6lvyNkf00Eb2W/f6LEqAcjCi2QDDxRfaw+
Vm97nPUiM+uXNy/6KtAy6ChdprSqx12/edXPp7Y3H2rS/+Dmr6exeix+wb7QUN8W
JI7ZRHo4JLaJZk/XrZtGX/6jnN1Jo7vfApQOmYDY7kE1iGtOU/LQQj8gcZRVQxML
4soN6ivSmZX2k03LabWHpYQ8QiyCSYChLC+Az7rQH47LDLeu1IdTJu6orpXpaxyo
ymzEWlHbmF7mE66X4g/Up/eAYk2YLUA3rKLGVjAIaWDBzHftSFg5EaAnqMADC1G2
hSo=
=ALJf
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-4.15' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- turn dma_cache_sync into a dma_map_ops instance and remove
implementation that purely are dead because the architecture doesn't
support noncoherent allocations
- add a flag for busses that need DMA configuration (Robin Murphy)
* tag 'dma-mapping-4.15' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: turn dma_cache_sync into a dma_map_ops method
sh: make dma_cache_sync a no-op
xtensa: make dma_cache_sync a no-op
unicore32: make dma_cache_sync a no-op
powerpc: make dma_cache_sync a no-op
mn10300: make dma_cache_sync a no-op
microblaze: make dma_cache_sync a no-op
ia64: make dma_cache_sync a no-op
frv: make dma_cache_sync a no-op
x86: make dma_cache_sync a no-op
floppy: consolidate the dummy fd_cacheflush definition
drivers: flag buses which demand DMA configuration
Pull timer updates from Thomas Gleixner:
"Yet another big pile of changes:
- More year 2038 work from Arnd slowly reaching the point where we
need to think about the syscalls themself.
- A new timer function which allows to conditionally (re)arm a timer
only when it's either not running or the new expiry time is sooner
than the armed expiry time. This allows to use a single timer for
multiple timeout requirements w/o caring about the first expiry
time at the call site.
- A new NMI safe accessor to clock real time for the printk timestamp
work. Can be used by tracing, perf as well if required.
- A large number of timer setup conversions from Kees which got
collected here because either maintainers requested so or they
simply got ignored. As Kees pointed out already there are a few
trivial merge conflicts and some redundant commits which was
unavoidable due to the size of this conversion effort.
- Avoid a redundant iteration in the timer wheel softirq processing.
- Provide a mechanism to treat RTC implementations depending on their
hardware properties, i.e. don't inflict the write at the 0.5
seconds boundary which originates from the PC CMOS RTC to all RTCs.
No functional change as drivers need to be updated separately.
- The usual small updates to core code clocksource drivers. Nothing
really exciting"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits)
timers: Add a function to start/reduce a timer
pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday()
timer: Prepare to change all DEFINE_TIMER() callbacks
netfilter: ipvs: Convert timers to use timer_setup()
scsi: qla2xxx: Convert timers to use timer_setup()
block/aoe: discover_timer: Convert timers to use timer_setup()
ide: Convert timers to use timer_setup()
drbd: Convert timers to use timer_setup()
mailbox: Convert timers to use timer_setup()
crypto: Convert timers to use timer_setup()
drivers/pcmcia: omap1: Fix error in automated timer conversion
ARM: footbridge: Fix typo in timer conversion
drivers/sgi-xp: Convert timers to use timer_setup()
drivers/pcmcia: Convert timers to use timer_setup()
drivers/memstick: Convert timers to use timer_setup()
drivers/macintosh: Convert timers to use timer_setup()
hwrng/xgene-rng: Convert timers to use timer_setup()
auxdisplay: Convert timers to use timer_setup()
sparc/led: Convert timers to use timer_setup()
mips: ip22/32: Convert timers to use timer_setup()
...
<linux/pci.h> defines struct pci_bus and struct pci_dev and includes the
struct resource definition before including <asm/pci.h>. Nobody includes
<asm/pci.h> directly, so they don't need their own declarations.
Remove the redundant struct pci_dev, pci_bus, resource declarations.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> # CRIS
Acked-by: Ralf Baechle <ralf@linux-mips.org> # MIPS
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. Adds a static variable to hold timeout
value.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geliang Tang <geliangtang@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: sparclinux@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Many user space API headers are missing licensing information, which
makes it hard for compliance tools to determine the correct license.
By default are files without license information under the default
license of the kernel, which is GPLV2. Marking them GPLV2 would exclude
them from being included in non GPLV2 code, which is obviously not
intended. The user space API headers fall under the syscall exception
which is in the kernels COPYING file:
NOTE! This copyright does *not* cover user programs that use kernel
services by normal system calls - this is merely considered normal use
of the kernel, and does *not* fall under the heading of "derived work".
otherwise syscall usage would not be possible.
Update the files which contain no license information with an SPDX
license identifier. The chosen identifier is 'GPL-2.0 WITH
Linux-syscall-note' which is the officially assigned identifier for the
Linux syscall exception. SPDX license identifiers are a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne. See the previous patch in this series for the
methodology of how this patch was researched.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.
However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:
----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()
// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
virtual patch
@ depends on patch @
expression E1, E2;
@@
- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)
@ depends on patch @
expression E;
@@
- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
linux/compiler.h is included indirectly by linux/types.h via
uapi/linux/types.h -> uapi/linux/posix_types.h -> linux/stddef.h
-> uapi/linux/stddef.h and is needed to provide a proper definition of
offsetof.
Unfortunately, compiler.h requires a definition of
smp_read_barrier_depends() for defining lockless_dereference() and soon
for defining READ_ONCE(), which means that all
users of READ_ONCE() will need to include asm/barrier.h to avoid splats
such as:
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from arch/h8300/kernel/asm-offsets.c:11:
include/linux/list.h: In function 'list_empty':
>> include/linux/compiler.h:343:2: error: implicit declaration of function 'smp_read_barrier_depends' [-Werror=implicit-function-declaration]
smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
^
A better alternative is to include asm/barrier.h in linux/compiler.h,
but this requires a type definition for "bool" on some architectures
(e.g. x86), which is defined later by linux/types.h. Type "bool" is also
used directly in linux/compiler.h, so the whole thing is pretty fragile.
This patch splits compiler.h in two: compiler_types.h contains type
annotations, definitions and the compiler-specific parts, whereas
compiler.h #includes compiler-types.h and additionally defines macros
such as {READ,WRITE.ACCESS}_ONCE().
uapi/linux/stddef.h and linux/linkage.h are then moved over to include
linux/compiler_types.h, which fixes the build for h8 and blackfin.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1508840570-22169-2-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
After we removed all the dead wood it turns out only two architectures
actually implement dma_cache_sync as a real op: mips and parisc. Add
a cache_sync method to struct dma_map_ops and implement it for the
mips defualt DMA ops, and the parisc pa11 ops.
Note that arm, arc and openrisc support DMA_ATTR_NON_CONSISTENT, but
never provided a functional dma_cache_sync implementations, which
seems somewhat odd.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Only mips defines this helper, so remove all the other arch definitions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
The arch_{read,spin,write}_lock_flags() macros are simply mapped to the
non-flags versions by the majority of architectures, so do this in core
code and remove the dummy implementations. Also remove the implementation
in spinlock_up.h, since all callers of do_raw_spin_lock_flags() call
local_irq_save(flags) anyway.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1507055129-12300-4-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
arch_{read,spin,write}_relax() are defined as cpu_relax() by the core
code, so architectures that can't do better (i.e. most of them) don't
need to bother with the dummy definitions.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1507055129-12300-3-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Outside of the locking code itself, {read,spin,write}_can_lock() have no
users in tree. Apparmor (the last remaining user of write_can_lock()) got
moved over to lockdep by the previous patch.
This patch removes the use of {read,spin,write}_can_lock() from the
BUILD_LOCK_OPS macro, deferring to the trylock operation for testing the
lock status, and subsequently removes the unused macros altogether. They
aren't guaranteed to work in a concurrent environment and can give
incorrect results in the case of qrwlock.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1507055129-12300-2-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
to indicate that Relaxed Ordering Attributes (RO) should not
be used for Transaction Layer Packets (TLP) targeted toward
these affected Root Port, it will clear the bit4 in the PCIe
Device Control register, so the PCIe device drivers could
query PCIe configuration space to determine if it can send
TLPs to Root Port with the Relaxed Ordering Attributes set.
With this new flag we don't need the config ARCH_WANT_RELAX_ORDER
to control the Relaxed Ordering Attributes for the ixgbe drivers
just like the commit 1a8b6d76dc ("net:add one common config...") did,
so revert this commit.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
arch/sparc/kernel/time_64.c does not contain any miscdevice so the
inclusion of linux/miscdevice.h is unnecessary.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following build errors.
In file included from arch/sparc/include/asm/mmu_context.h:4:0,
from include/linux/mmu_context.h:4,
from drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h:29,
from drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:23:
arch/sparc/include/asm/mmu_context_64.h:22:37: error:
unknown type name 'per_cpu_secondary_mm'
arch/sparc/include/asm/mmu_context_64.h: In function 'switch_mm':
arch/sparc/include/asm/mmu_context_64.h:79:2: error:
implicit declaration of function 'smp_processor_id'
Fixes: 70539bd795 ("drm/amd: Update MEC HQD loading code for KFD")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
slightly more complicated than usual, since old sigframe layout
on sparc keeps the first 32 bits of mask away from the rest
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
There are 4 callers of sigset_to_compat() in the entire kernel. One is
in sparc compat rt_sigaction(2), the rest are in kernel/signal.c itself.
All are followed by copy_to_user(), and all but the sparc one are under
"if it's big-endian..." ifdefs.
Let's transform sigset_to_compat() into put_compat_sigset() that also
calls copy_to_user().
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull namespace updates from Eric Biederman:
"Life has been busy and I have not gotten half as much done this round
as I would have liked. I delayed it so that a minor conflict
resolution with the mips tree could spend a little time in linux-next
before I sent this pull request.
This includes two long delayed user namespace changes from Kirill
Tkhai. It also includes a very useful change from Serge Hallyn that
allows the security capability attribute to be used inside of user
namespaces. The practical effect of this is people can now untar
tarballs and install rpms in user namespaces. It had been suggested to
generalize this and encode some of the namespace information
information in the xattr name. Upon close inspection that makes the
things that should be hard easy and the things that should be easy
more expensive.
Then there is my bugfix/cleanup for signal injection that removes the
magic encoding of the siginfo union member from the kernel internal
si_code. The mips folks reported the case where I had used FPE_FIXME
me is impossible so I have remove FPE_FIXME from mips, while at the
same time including a return statement in that case to keep gcc from
complaining about unitialized variables.
I almost finished the work to get make copy_siginfo_to_user a trivial
copy to user. The code is available at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3
But I did not have time/energy to get the code posted and reviewed
before the merge window opened.
I was able to see that the security excuse for just copying fields
that we know are initialized doesn't work in practice there are buggy
initializations that don't initialize the proper fields in siginfo. So
we still sometimes copy unitialized data to userspace"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
Introduce v3 namespaced file capabilities
mips/signal: In force_fcr31_sig return in the impossible case
signal: Remove kernel interal si_code magic
fcntl: Don't use ambiguous SIG_POLL si_codes
prctl: Allow local CAP_SYS_ADMIN changing exe_file
security: Use user_namespace::level to avoid redundant iterations in cap_capable()
userns,pidns: Verify the userns for new pid namespaces
signal/testing: Don't look for __SI_FAULT in userspace
signal/mips: Document a conflict with SI_USER with SIGFPE
signal/sparc: Document a conflict with SI_USER with SIGFPE
signal/ia64: Document a conflict with SI_USER with SIGFPE
signal/alpha: Document a conflict with SI_USER for SIGTRAP
Load instructions using ASI_PNF or other no-fault ASIs should not
cause a SIGSEGV or SIGBUS.
A garden variety unmapped address follows the TSB miss path, and when
no valid mapping is found in the process page tables, the miss handler
checks to see if the access was via a no-fault ASI. It then fixes up
the target register with a zero, and skips the no-fault load
instruction.
But different paths are taken for data access exceptions and alignment
traps, and these do not respect the no-fault ASI. We add checks in
these paths for the no-fault ASI, and fix up the target register and
TPC just like in the TSB miss case.
Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For many sun4v processor types, reading or writing a privileged register
has a latency of 40 to 70 cycles. Use a combination of the low-latency
allclean, otherw, normalw, and nop instructions in etrap and rtrap to
replace 2 rdpr and 5 wrpr instructions and improve etrap/rtrap
performance. allclean, otherw, and normalw are available on NG2 and
later processors.
The average ticks to execute the flush windows trap ("ta 0x3") with and
without this patch on select platforms:
CPU Not patched Patched % Latency Reduction
NG2 1762 1558 -11.58
NG4 3619 3204 -11.47
M7 3015 2624 -12.97
SPARC64-X 829 770 -7.12
Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge more updates from Andrew Morton:
- most of the rest of MM
- a small number of misc things
- lib/ updates
- checkpatch
- autofs updates
- ipc/ updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (126 commits)
ipc: optimize semget/shmget/msgget for lots of keys
ipc/sem: play nicer with large nsops allocations
ipc/sem: drop sem_checkid helper
ipc: convert kern_ipc_perm.refcount from atomic_t to refcount_t
ipc: convert sem_undo_list.refcnt from atomic_t to refcount_t
ipc: convert ipc_namespace.count from atomic_t to refcount_t
kcov: support compat processes
sh: defconfig: cleanup from old Kconfig options
mn10300: defconfig: cleanup from old Kconfig options
m32r: defconfig: cleanup from old Kconfig options
drivers/pps: use surrounding "if PPS" to remove numerous dependency checks
drivers/pps: aesthetic tweaks to PPS-related content
cpumask: make cpumask_next() out-of-line
kmod: move #ifdef CONFIG_MODULES wrapper to Makefile
kmod: split off umh headers into its own file
MAINTAINERS: clarify kmod is just a kernel module loader
kmod: split out umh code into its own file
test_kmod: flip INT checks to be consistent
test_kmod: remove paranoid UINT_MAX check on uint range processing
vfat: deduplicate hex2bin()
...
Patch series "Define CPU_BIG_ENDIAN or warn for inconsistencies", v3.
While working on enabling queued rwlock on SPARC, found this following
code in include/asm-generic/qrwlock.h which uses CONFIG_CPU_BIG_ENDIAN to
clear a byte.
static inline u8 *__qrwlock_write_byte(struct qrwlock *lock)
{
return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
}
Problem is many of the fixed big endian architectures don't define
CPU_BIG_ENDIAN and clears the wrong byte.
Define CPU_BIG_ENDIAN for all the fixed big endian architecture to fix it.
Also found few more references of this config parameter in
drivers/of/base.c
drivers/of/fdt.c
drivers/tty/serial/earlycon.c
drivers/tty/serial/serial_core.c
Be aware that this may cause regressions if someone has worked-around
problems in the above code already. Remove the work-around.
Here is our original discussion
https://lkml.org/lkml/2017/5/24/620
Link: http://lkml.kernel.org/r/1499358861-179979-2-git-send-email-babu.moger@oracle.com
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Stafford Horne <shorne@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Where possible, call memset16(), memmove() or memcpy() instead of using
open-coded loops. I don't like the calling convention that uses a byte
count instead of a count of u16s, but it's a little late to change that.
Reduces code size of fbcon.o by almost 400 bytes on my laptop build.
[akpm@linux-foundation.org: fix build]
Link: http://lkml.kernel.org/r/20170720184539.31609-9-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZsr8cAAoJEFmIoMA60/r8lXYQAKViYIRMJDD4n3NhjMeLOsnJ
vwaBmWlLRjSFIEpag5kMjS1RJE17qAvmkBZnDvSNZ6cT28INkkZnVM2IW96WECVq
64MIvDijVPcvqGuWePCfWdDiSXApiDWwJuw55BOhmvV996wGy0gYgzpPY+1g0Knh
XzH9IOzDL79hZleLfsxX0MLV6FGBVtOsr0jvQ04k4IgEMIxEDTlbw85rnrvzQUtc
0Vj2koaxWIESZsq7G/wiZb2n6ekaFdXO/VlVvvhmTSDLCBaJ63Hb/gfOhwMuVkS6
B3cVprNrCT0dSzWmU4ZXf+wpOyDpBexlemW/OR/6CQUkC6AUS6kQ5si1X44dbGmJ
nBPh414tdlm/6V4h/A3UFPOajSGa/ZWZ/uQZPfvKs1R6WfjUerWVBfUpAzPbgjam
c/mhJ19HYT1J7vFBfhekBMeY2Px3JgSJ9rNsrFl48ynAALaX5GEwdpo4aqBfscKz
4/f9fU4ysumopvCEuKD2SsJvsPKd5gMQGGtvAhXM1TxvAoQ5V4cc99qEetAPXXPf
h2EqWm4ph7YP4a+n/OZBjzluHCmZJn1CntH5+//6wpUk6HnmzsftGELuO9n12cLE
GGkreI3T9ctV1eOkzVVa0l0QTE1X/VLyEyKCtb9obXsDaG4Ud7uKQoZgB19DwyTJ
EG76ridTolUFVV+wzJD9
=9cLP
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- add enhanced Downstream Port Containment support, which prints more
details about Root Port Programmed I/O errors (Dongdong Liu)
- add Layerscape ls1088a and ls2088a support (Hou Zhiqiang)
- add MediaTek MT2712 and MT7622 support (Ryder Lee)
- add MediaTek MT2712 and MT7622 MSI support (Honghui Zhang)
- add Qualcom IPQ8074 support (Varadarajan Narayanan)
- add R-Car r8a7743/5 device tree support (Biju Das)
- add Rockchip per-lane PHY support for better power management (Shawn
Lin)
- fix IRQ mapping for hot-added devices by replacing the
pci_fixup_irqs() boot-time design with a host bridge hook called at
probe-time (Lorenzo Pieralisi, Matthew Minter)
- fix race when enabling two devices that results in upstream bridge
not being enabled correctly (Srinath Mannam)
- fix pciehp power fault infinite loop (Keith Busch)
- fix SHPC bridge MSI hotplug events by enabling bus mastering
(Aleksandr Bezzubikov)
- fix a VFIO issue by correcting PCIe capability sizes (Alex
Williamson)
- fix an INTD issue on Xilinx and possibly other drivers by unifying
INTx IRQ domain support (Paul Burton)
- avoid IOMMU stalls by marking AMD Stoney GPU ATS as broken (Joerg
Roedel)
- allow APM X-Gene device assignment to guests by adding an ACS quirk
(Feng Kan)
- fix driver crashes by disabling Extended Tags on Broadcom HT2100
(Extended Tags support is required for PCIe Receivers but not
Requesters, and we now enable them by default when Requesters support
them) (Sinan Kaya)
- fix MSIs for devices that use phantom RIDs for DMA by assuming MSIs
use the real Requester ID (not a phantom RID) (Robin Murphy)
- prevent assignment of Intel VMD children to guests (which may be
supported eventually, but isn't yet) by not associating an IOMMU with
them (Jon Derrick)
- fix Intel VMD suspend/resume by releasing IRQs on suspend (Scott
Bauer)
- fix a Function-Level Reset issue with Intel 750 NVMe by waiting
longer (up to 60sec instead of 1sec) for device to become ready
(Sinan Kaya)
- fix a Function-Level Reset issue on iProc Stingray by working around
hardware defects in the CRS implementation (Oza Pawandeep)
- fix an issue with Intel NVMe P3700 after an iProc reset by adding a
delay during shutdown (Oza Pawandeep)
- fix a Microsoft Hyper-V lockdep issue by polling instead of blocking
in compose_msi_msg() (Stephen Hemminger)
- fix a wireless LAN driver timeout by clearing DesignWare MSI
interrupt status after it is handled, not before (Faiz Abbas)
- fix DesignWare ATU enable checking (Jisheng Zhang)
- reduce Layerscape dependencies on the bootloader by doing more
initialization in the driver (Hou Zhiqiang)
- improve Intel VMD performance allowing allocation of more IRQ vectors
than present CPUs (Keith Busch)
- improve endpoint framework support for initial DMA mask, different
BAR sizes, configurable page sizes, MSI, test driver, etc (Kishon
Vijay Abraham I, Stan Drozd)
- rework CRS support to add periodic messages while we poll during
enumeration and after Function-Level Reset and prepare for possible
other uses of CRS (Sinan Kaya)
- clean up Root Port AER handling by removing unnecessary code and
moving error handler methods to struct pcie_port_service_driver
(Christoph Hellwig)
- clean up error handling paths in various drivers (Bjorn Andersson,
Fabio Estevam, Gustavo A. R. Silva, Harunobu Kurokawa, Jeffy Chen,
Lorenzo Pieralisi, Sergei Shtylyov)
- clean up SR-IOV resource handling by disabling VF decoding before
updating the corresponding resource structs (Gavin Shan)
- clean up DesignWare-based drivers by unifying quirks to update Class
Code and Interrupt Pin and related handling of write-protected
registers (Hou Zhiqiang)
- clean up by adding empty generic pcibios_align_resource() and
pcibios_fixup_bus() and removing empty arch-specific implementations
(Palmer Dabbelt)
- request exclusive reset control for several drivers to allow cleanup
elsewhere (Philipp Zabel)
- constify various structures (Arvind Yadav, Bhumika Goyal)
- convert from full_name() to %pOF (Rob Herring)
- remove unused variables from iProc, HiSi, Altera, Keystone (Shawn
Lin)
* tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (170 commits)
PCI: xgene: Clean up whitespace
PCI: xgene: Define XGENE_PCI_EXP_CAP and use generic PCI_EXP_RTCTL offset
PCI: xgene: Fix platform_get_irq() error handling
PCI: xilinx-nwl: Fix platform_get_irq() error handling
PCI: rockchip: Fix platform_get_irq() error handling
PCI: altera: Fix platform_get_irq() error handling
PCI: spear13xx: Fix platform_get_irq() error handling
PCI: artpec6: Fix platform_get_irq() error handling
PCI: armada8k: Fix platform_get_irq() error handling
PCI: dra7xx: Fix platform_get_irq() error handling
PCI: exynos: Fix platform_get_irq() error handling
PCI: iproc: Clean up whitespace
PCI: iproc: Rename PCI_EXP_CAP to IPROC_PCI_EXP_CAP
PCI: iproc: Add 500ms delay during device shutdown
PCI: Fix typos and whitespace errors
PCI: Remove unused "res" variable from pci_resource_io()
PCI: Correct kernel-doc of pci_vpd_srdt_size(), pci_vpd_srdt_tag()
PCI/AER: Reformat AER register definitions
iommu/vt-d: Prevent VMD child devices from being remapping targets
x86/PCI: Use is_vmd() rather than relying on the domain number
...
Pull networking updates from David Miller:
1) Support ipv6 checksum offload in sunvnet driver, from Shannon
Nelson.
2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric
Dumazet.
3) Allow generic XDP to work on virtual devices, from John Fastabend.
4) Add bpf device maps and XDP_REDIRECT, which can be used to build
arbitrary switching frameworks using XDP. From John Fastabend.
5) Remove UFO offloads from the tree, gave us little other than bugs.
6) Remove the IPSEC flow cache, from Florian Westphal.
7) Support ipv6 route offload in mlxsw driver.
8) Support VF representors in bnxt_en, from Sathya Perla.
9) Add support for forward error correction modes to ethtool, from
Vidya Sagar Ravipati.
10) Add time filter for packet scheduler action dumping, from Jamal Hadi
Salim.
11) Extend the zerocopy sendmsg() used by virtio and tap to regular
sockets via MSG_ZEROCOPY. From Willem de Bruijn.
12) Significantly rework value tracking in the BPF verifier, from Edward
Cree.
13) Add new jump instructions to eBPF, from Daniel Borkmann.
14) Rework rtnetlink plumbing so that operations can be run without
taking the RTNL semaphore. From Florian Westphal.
15) Support XDP in tap driver, from Jason Wang.
16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal.
17) Add Huawei hinic ethernet driver.
18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan
Delalande.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits)
i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq
i40e: avoid NVM acquire deadlock during NVM update
drivers: net: xgene: Remove return statement from void function
drivers: net: xgene: Configure tx/rx delay for ACPI
drivers: net: xgene: Read tx/rx delay for ACPI
rocker: fix kcalloc parameter order
rds: Fix non-atomic operation on shared flag variable
net: sched: don't use GFP_KERNEL under spin lock
vhost_net: correctly check tx avail during rx busy polling
net: mdio-mux: add mdio_mux parameter to mdio_mux_init()
rxrpc: Make service connection lookup always check for retry
net: stmmac: Delete dead code for MDIO registration
gianfar: Fix Tx flow control deactivation
cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6
cxgb4: Fix pause frame count in t4_get_port_stats
cxgb4: fix memory leak
tun: rename generic_xdp to skb_xdp
tun: reserve extra headroom only when XDP is set
net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
net: dsa: bcm_sf2: Advertise number of egress queues
...
Pull locking updates from Ingo Molnar:
- Add 'cross-release' support to lockdep, which allows APIs like
completions, where it's not the 'owner' who releases the lock, to be
tracked. It's all activated automatically under
CONFIG_PROVE_LOCKING=y.
- Clean up (restructure) the x86 atomics op implementation to be more
readable, in preparation of KASAN annotations. (Dmitry Vyukov)
- Fix static keys (Paolo Bonzini)
- Add killable versions of down_read() et al (Kirill Tkhai)
- Rework and fix jump_label locking (Marc Zyngier, Paolo Bonzini)
- Rework (and fix) tlb_flush_pending() barriers (Peter Zijlstra)
- Remove smp_mb__before_spinlock() and convert its usages, introduce
smp_mb__after_spinlock() (Peter Zijlstra)
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
locking/lockdep/selftests: Fix mixed read-write ABBA tests
sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK()
acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse
locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures
smp: Avoid using two cache lines for struct call_single_data
locking/lockdep: Untangle xhlock history save/restore from task independence
locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being
futex: Remove duplicated code and fix undefined behaviour
Documentation/locking/atomic: Finish the document...
locking/lockdep: Fix workqueue crossrelease annotation
workqueue/lockdep: 'Fix' flush_work() annotation
locking/lockdep/selftests: Add mixed read-write ABBA tests
mm, locking/barriers: Clarify tlb_flush_pending() barriers
locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive
locking/lockdep: Explicitly initialize wq_barrier::done::map
locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS
locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config
locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING
locking/refcounts, x86/asm: Implement fast refcount overflow protection
locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease
...
Pull RCU updates from Ingo Molnad:
"The main RCU related changes in this cycle were:
- Removal of spin_unlock_wait()
- SRCU updates
- RCU torture-test updates
- RCU Documentation updates
- Extend the sys_membarrier() ABI with the MEMBARRIER_CMD_PRIVATE_EXPEDITED variant
- Miscellaneous RCU fixes
- CPU-hotplug fixes"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
arch: Remove spin_unlock_wait() arch-specific definitions
locking: Remove spin_unlock_wait() generic definitions
drivers/ata: Replace spin_unlock_wait() with lock/unlock pair
ipc: Replace spin_unlock_wait() with lock/unlock pair
exit: Replace spin_unlock_wait() with lock/unlock pair
completion: Replace spin_unlock_wait() with lock/unlock pair
doc: Set down RCU's scheduling-clock-interrupt needs
doc: No longer allowed to use rcu_dereference on non-pointers
doc: Add RCU files to docbook-generation files
doc: Update memory-barriers.txt for read-to-write dependencies
doc: Update RCU documentation
membarrier: Provide expedited private command
rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter()
rcu: Add warning to rcu_idle_enter() for irqs enabled
rcu: Make rcu_idle_enter() rely on callers disabling irqs
rcu: Add assertions verifying blocked-tasks list
rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit()
rcu: Add TPS() protection for _rcu_barrier_trace strings
rcu: Use idle versions of swait to make idle-hack clear
swait: Add idle variants which don't contribute to load average
...
of_device_id are not supposed to change at runtime. All functions
working with of_device_id provided by <linux/of.h> work with const
of_device_ids. So mark the const and __initconst.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
of_device_id are not supposed to change at runtime. All functions
working with of_device_id provided by <linux/of.h> work with const
of_device_ids. So mark the const and __initconst.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is code duplicated over all architecture's headers for
futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr,
and comparison of the result.
Remove this duplication and leave up to the arches only the needed
assembly which is now in arch_futex_atomic_op_inuser.
This effectively distributes the Will Deacon's arm64 fix for undefined
behaviour reported by UBSAN to all architectures. The fix was done in
commit 5f16a046f8 (arm64: futex: Fix undefined behaviour with
FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.
And as suggested by Thomas, check for negative oparg too, because it was
also reported to cause undefined behaviour report.
Note that s390 removed access_ok check in d12a29703 ("s390/uaccess:
remove pointless access_ok() checks") as access_ok there returns true.
We introduce it back to the helper for the sake of simplicity (it gets
optimized away anyway).
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390]
Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org>
Reviewed-by: Will Deacon <will.deacon@arm.com> [core/arm64]
Cc: linux-mips@linux-mips.org
Cc: Rich Felker <dalias@libc.org>
Cc: linux-ia64@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: peterz@infradead.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: sparclinux@vger.kernel.org
Cc: Jonas Bonn <jonas@southpole.se>
Cc: linux-s390@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: linux-hexagon@vger.kernel.org
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-xtensa@linux-xtensa.org
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: openrisc@lists.librecores.org
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Stafford Horne <shorne@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Richard Henderson <rth@twiddle.net>
Cc: Chris Zankel <chris@zankel.net>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-parisc@vger.kernel.org
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: linux-alpha@vger.kernel.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20170824073105.3901-1-jslaby@suse.cz
When building the kernel for Sparc using gcc 7.x, the build fails
with:
arch/sparc/kernel/pcic.c: In function ‘pcibios_fixup_bus’:
arch/sparc/kernel/pcic.c:647:8: error: ‘cmd’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
cmd |= PCI_COMMAND_IO;
^~
The simplified code looks like this:
unsigned int cmd;
[...]
pcic_read_config(dev->bus, dev->devfn, PCI_COMMAND, 2, &cmd);
[...]
cmd |= PCI_COMMAND_IO;
I.e, the code assumes that pcic_read_config() will always initialize
cmd. But it's not the case. Looking at pcic_read_config(), if
bus->number is != 0 or if the size is not one of 1, 2 or 4, *val will
not be initialized.
As a simple fix, we initialize cmd to zero at the beginning of
pcibios_fixup_bus.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair. This commit therefore removes the underlying arch-specific
arch_spin_unlock_wait() for all architectures providing them.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
There is no need to log message if ATU hvapi couldn't get register.
Unlike PCI hvapi, ATU hvapi registration failure is not hard error.
Even if ATU hvapi registration fails (on system with ATU or without
ATU) system continues with legacy IOMMU. So only log message when
ATU hvapi successfully get registered.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
%g4 and %g5 are fixed registers used by the kernel for the thread
pointer and the per-cpu offset. Use %o4 and %g7 instead.
Diagnosis by Anthony Yznaga.
Fixes: 1b4af13ff2 ("sparc64: Add __multi3 for gcc 7.x and later.")
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flatten out nested code structure in huge_pte_offset()
and huge_pte_alloc().
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support for 16GB hugepage size. To use this page size
use kernel parameters as:
default_hugepagesz=16G hugepagesz=16G hugepages=10
Testing:
Tested with the stream benchmark which allocates 48G of
arrays backed by 16G hugepages and does RW operation on
them in parallel.
Orabug: 25362942
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
get_user_pages() is used to do direct IO. It already
handles the case where the address range is backed
by PMD huge pages. This patch now adds the case where
the range could be backed by PUD huge pages.
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add RX & TX timers to perform delayed/asynchronous LDC
read and write operations.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enables VCC port probe and removal to initialize and terminate
VCC ports respectively. When a device/port matching the VCC driver
is added, the probe function is invoked along with a reference
to the device. remove function is called when the device is
removed.
Also add APIs to cache and retrieve VCC ports from a VCC table
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add C macros to print debug messages from VCC module
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enables the Virtual Console Concentrator (VCC) module
in linux kernel
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
include/linux/mm_types.h
mm/huge_memory.c
I removed the smp_mb__before_spinlock() like the following commit does:
8b1b436dd1 ("mm, locking: Rework {set,clear,mm}_tlb_flush_pending()")
and fixed up the affected commits.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
THP migration is added but only supports x86_64 at the moment. For all
other architectures, swp_entry_to_pmd() only returns a zero pmd_t.
Due to a GCC zero initializer bug #53119, the standard (pmd_t){0}
initializer is not accepted by all GCC versions. __pmd() is a feasible
workaround. In addition, sparc32's pmd_t is an array instead of a single
value, so we need (pmd_t){ {0}, } instead of (pmd_t){0}. Thus,
a different __pmd() definition is needed in sparc32.
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
New algorithm that takes advantage of the M7/M8 block init store
ASI, ie, overlapping pipelines and miss buffer filling.
Full details in code comments.
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename exception handlers to memcpy_xxx as these
are going to be used by new memcpy routines and these
handlers are not exclusive to NG4memcpy anymore.
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Separate the exception handlers from NG4memcpy so that it can be
used with new memcpy routines. Make a separate file for all these handlers.
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update comments about the range the different
parts of the code copies, the original comments were wrong.
Introduce a few descriptive labels too.
No functional changes.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mainline had UFO fixes, but UFO is removed in net-next so we
take the HEAD hunks.
Minor context conflict in bcmsysport statistics bug fix.
Signed-off-by: David S. Miller <davem@davemloft.net>
It overflows the amount of space available in the initial .text section
of trap handler assembler in some configurations, resulting in build
failures.
Signed-off-by: David S. Miller <davem@davemloft.net>
Those architectures that have a special atomic_set implementation also
need a special atomic_set_release(), because for the very same reason
WRITE_ONCE() is broken for them, smp_store_release() is too.
The vast majority is architectures that have spinlock hash based atomic
implementation except hexagon which seems to have a hardware 'feature'.
The spinlock based atomics should be SC, that is, none of them appear to
place extra barriers in atomic_cmpxchg() or any of the other SC atomic
primitives and therefore seem to rely on their spinlock implementation
being SC (I did not fully validate all that).
Therefore, the normal atomic_set() is SC and can be used at
atomic_set_release().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: davem@davemloft.net
Cc: james.hogan@imgtec.com
Cc: jejb@parisc-linux.org
Cc: rkuo@codeaurora.org
Cc: vgupta@synopsys.com
Link: http://lkml.kernel.org/r/20170609110506.yod47flaav3wgoj5@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use CPU_POKE hypervisor call to resume idle cpu if supported.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a new hypercall CPU_POKE for quickly waking up an idle CPU.
CPU_POKE should only be sent to valid non-local CPUs.
Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flatten out nested code structure in huge_pte_offset()
and huge_pte_alloc().
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support for 16GB hugepage size. To use this page size
use kernel parameters as:
default_hugepagesz=16G hugepagesz=16G hugepages=10
Testing:
Tested with the stream benchmark which allocates 48G of
arrays backed by 16G hugepages and does RW operation on
them in parallel.
Orabug: 25362942
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
get_user_pages() is used to do direct IO. It already
handles the case where the address range is backed
by PMD huge pages. This patch now adds the case where
the range could be backed by PUD huge pages.
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions
with BPF_X/BPF_K variants for the sparc64 eBPF JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This series includes some mlx5 updates for both net-next and rdma trees.
From Saeed,
Core driver updates to allow selectively building the driver with
or without some large driver components, such as
- E-Switch (Ethernet SRIOV support).
- Multi-Physical Function Switch (MPFs) support.
For that we split E-Switch and MPFs functionalities into separate files.
From Erez,
Delay mlx5_core events when mlx5 interfaces, namely mlx5_ib, registration
is taking place and until it completes.
From Rabie,
Increase the maximum supported flow counters.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJZiDoAAAoJEEg/ir3gV/o+594H/RH5kRwC719s/5YQFJXvGsVC
fjtj3UUJPLrWB8XBh7a4PRcxXPIHaFKJuY3MU7KHFIeZQFklJcit3njjpxDlUINo
F5S1LHBSYBkeMD/ksWBA8OLCBprNGN6WQ2tuFfAjZlQQ44zqv8LJmegoDtW9bGRy
aGAkjUmALEblQsq81y0BQwN2/8DA8HAywrs8L2dkH1LHwijoIeYMZFOtKugv1FbB
ABSKxcU7D/NYw6rsVdZG59fHFQ+eKOspDFqBZrUzfQ+zUU2hFFo96ovfXBfIqYCV
7BtJuKXu2LeGPzFLsuw4h1131iqFT1iSMy9fEhf/4OwaL/KPP/+Umy8vP/XfM+U=
=wCpd
-----END PGP SIGNATURE-----
Merge tag 'mlx5-shared-2017-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
mlx5-shared-2017-08-07
This series includes some mlx5 updates for both net-next and rdma trees.
From Saeed,
Core driver updates to allow selectively building the driver with
or without some large driver components, such as
- E-Switch (Ethernet SRIOV support).
- Multi-Physical Function Switch (MPFs) support.
For that we split E-Switch and MPFs functionalities into separate files.
From Erez,
Delay mlx5_core events when mlx5 interfaces, namely mlx5_ib, registration
is taking place and until it completes.
From Rabie,
Increase the maximum supported flow counters.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
On M8 chips, use a max_phys_bits value of 51.
Also, M8 supports VA bits up to 54 bits. However, for now
restrict VA bits to 53 due to 4-level pagetable limitation.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recognize SPARC-M8 cpu type, hardware caps and cpu
distribution map.
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull sparc fixes from David Miller:
- block interrupts properly across the entire MMU context change (both
the hw MMU context change and the TSB table change) so that we don't
get a perf event interrupt in the middle. From Rob Gardner.
- be sure to register hugepages early enough, from Nitin Gupta.
- UltraSPARC-III user copy exception handling would return garbage for
the copied length in some circumstances.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix exception handling in UltraSPARC-III memcpy.
sbus: Convert to using %pOF instead of full_name
sparc: defconfig: Cleanup from old Kconfig options
sparc64: Register hugepages during arch init
sparc64: Prevent perf from running during super critical sections
Mikael Pettersson reported that some test programs in the strace-4.18
testsuite cause an OOPS.
After some debugging it turns out that garbage values are returned
when an exception occurs, causing the fixup memset() to be run with
bogus arguments.
The problem is that two of the exception handler stubs write the
successfully copied length into the wrong register.
Fixes: ee841d0aff ("sparc64: Convert U3copy_{from,to}_user to accurate exception reporting.")
Reported-by: Mikael Pettersson <mikpelinux@gmail.com>
Tested-by: Mikael Pettersson <mikpelinux@gmail.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The send call ignores unknown flags. Legacy applications may already
unwittingly pass MSG_ZEROCOPY. Continue to ignore this flag unless a
socket opts in to zerocopy.
Introduce socket option SO_ZEROCOPY to enable MSG_ZEROCOPY processing.
Processes can also query this socket option to detect kernel support
for the feature. Older kernels will return ENOPROTOOPT.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are quite a number of occurrences in the kernel of the pattern
if (dst != src)
memcpy(dst, src, walk.total % AES_BLOCK_SIZE);
crypto_xor(dst, final, walk.total % AES_BLOCK_SIZE);
or
crypto_xor(keystream, src, nbytes);
memcpy(dst, keystream, nbytes);
where crypto_xor() is preceded or followed by a memcpy() invocation
that is only there because crypto_xor() uses its output parameter as
one of the inputs. To avoid having to add new instances of this pattern
in the arm64 code, which will be refactored to implement non-SIMD
fallbacks, add an alternative implementation called crypto_xor_cpy(),
taking separate input and output arguments. This removes the need for
the separate memcpy().
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The pci_fixup_irqs() function allocates IRQs for all PCI devices present in
a system; those PCI devices possibly belong to different PCI bus trees (and
possibly rooted at different host bridges) and may well be enabled (ie
probed and bound to a driver) by the time pci_fixup_irqs() is called when
probing a given host bridge driver.
Furthermore, current kernel code relying on pci_fixup_irqs() to assign
legacy PCI IRQs to devices does not work at all for hotplugged devices in
that the code carrying out the IRQ fixup is called at host bridge driver
probe time, which just cannot take into account devices hotplugged after
the system has booted.
The introduction of map/swizzle function hooks in struct pci_host_bridge
allows us to define per-bridge map/swizzle functions that can be used at
device probe time in PCI core code to allocate IRQs for a given device
(through pci_assign_irq()).
Convert PCI host bridge initialization code to the
pci_scan_root_bus_bridge() API (that allows to pass a struct
pci_host_bridge with initialized map/swizzle pointers) and remove the
pci_fixup_irqs() call from arch code.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Multiple architectures define this as a trivial function, and I'm adding
another one as part of the RISC-V port. Add a __weak version of
pcibios_align_resource() and delete the now-obselete ones in a handful of
ports.
The only functional change should be that a handful of ports used to export
pcibios_fixup_bus(). Only some architectures export this, so I just
dropped it.
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Multiple architectures define this as an empty function, and I'm adding
another one as part of the RISC-V port. Add a __weak version of
pcibios_fixup_bus() and delete the now-obselete ones in a handful of
ports.
The only functional change should be that microblaze used to export
pcibios_fixup_bus(). None of the other architectures exports this, so I
just dropped it.
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
struct siginfo is a union and the kernel since 2.4 has been hiding a union
tag in the high 16bits of si_code using the values:
__SI_KILL
__SI_TIMER
__SI_POLL
__SI_FAULT
__SI_CHLD
__SI_RT
__SI_MESGQ
__SI_SYS
While this looks plausible on the surface, in practice this situation has
not worked well.
- Injected positive signals are not copied to user space properly
unless they have these magic high bits set.
- Injected positive signals are not reported properly by signalfd
unless they have these magic high bits set.
- These kernel internal values leaked to userspace via ptrace_peek_siginfo
- It was possible to inject these kernel internal values and cause the
the kernel to misbehave.
- Kernel developers got confused and expected these kernel internal values
in userspace in kernel self tests.
- Kernel developers got confused and set si_code to __SI_FAULT which
is SI_USER in userspace which causes userspace to think an ordinary user
sent the signal and that it was not kernel generated.
- The values make it impossible to reorganize the code to transform
siginfo_copy_to_user into a plain copy_to_user. As si_code must
be massaged before being passed to userspace.
So remove these kernel internal si codes and make the kernel code simpler
and more maintainable.
To replace these kernel internal magic si_codes introduce the helper
function siginfo_layout, that takes a signal number and an si_code and
computes which union member of siginfo is being used. Have
siginfo_layout return an enumeration so that gcc will have enough
information to warn if a switch statement does not handle all of union
members.
A couple of architectures have a messed up ABI that defines signal
specific duplications of SI_USER which causes more special cases in
siginfo_layout than I would like. The good news is only problem
architectures pay the cost.
Update all of the code that used the previous magic __SI_ values to
use the new SIL_ values and to call siginfo_layout to get those
values. Escept where not all of the cases are handled remove the
defaults in the switch statements so that if a new case is missed in
the future the lack will show up at compile time.
Modify the code that copies siginfo si_code to userspace to just copy
the value and not cast si_code to a short first. The high bits are no
longer used to hold a magic union member.
Fixup the siginfo header files to stop including the __SI_ values in
their constants and for the headers that were missing it to properly
update the number of si_codes for each signal type.
The fixes to copy_siginfo_from_user32 implementations has the
interesting property that several of them perviously should never have
worked as the __SI_ values they depended up where kernel internal.
With that dependency gone those implementations should work much
better.
The idea of not passing the __SI_ values out to userspace and then
not reinserting them has been tested with criu and criu worked without
changes.
Ref: 2.4.0-test1
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Here are some small tty and serial driver fixes for 4.13-rc2. Nothing
huge at all, a revert of a patch that turned out to break things, a fix
up for a new tty ioctl we added in 4.13-rc1 to get the uapi definition
correct, and a few minor serial driver fixes for reported issues.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWXMebA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yn9MgCdGsIEQlTvTeG4QikxUPB7dkEJQacAoLWOKJzQ
A65mBAeOfiJztczi8uo4
=KEXp
-----END PGP SIGNATURE-----
Merge tag 'tty-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty and serial driver fixes for 4.13-rc2. Nothing
huge at all, a revert of a patch that turned out to break things, a
fix up for a new tty ioctl we added in 4.13-rc1 to get the uapi
definition correct, and a few minor serial driver fixes for reported
issues.
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: Fix TIOCGPTPEER ioctl definition
tty: hide unused pty_get_peer function
tty: serial: lpuart: Fix the logic for detecting the 32-bit type UART
serial: imx: Prevent TX buffer PIO write when a DMA has been started
Revert "serial: imx-serial - move DMA buffer configuration to DT"
serial: sh-sci: Uninitialized variables in sysfs files
serial: st-asc: Potential error pointer dereference
Remove old, dead Kconfig options (in order appearing in this commit):
- EXPERIMENTAL is gone since v3.9;
- INET_LRO: commit 7bbf3cae65 ("ipv4: Remove inet_lro library");
- AUTOFS_FS: commit 561c5cf923 ("staging: Remove autofs3");
- RCU_CPU_STALL_DETECTOR: commit a00e0d714f ("rcu: Remove conditional
compilation for RCU CPU stall warnings");
- USB_DEVICE_CLASS: commit 007bab9132 ("USB: remove
CONFIG_USB_DEVICE_CLASS");
- SYSCTL_SYSCALL_CHECK: commit 7c60c48f58 ("sysctl: Improve the
sysctl sanity checks");
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add hstate for each supported hugepage size using
arch initcall. This change fixes some hugepage
parameter parsing inconsistencies:
case 1: no hugepage parameters
Without hugepage parameters, only a hugepages-8192kB entry is visible
in sysfs. It's different from x86_64 where both 2M and 1G hugepage
sizes are available.
case 2: default_hugepagesz=[64K|256M|2G]
When specifying only a default_hugepagesz parameter, the default
hugepage size isn't really changed and it stays at 8M. This is again
different from x86_64.
Orabug: 25869946
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Setting si_code to __SI_FAULT results in a userspace seeing
an si_code of 0. This is the same si_code as SI_USER. Posix
and common sense requires that SI_USER not be a signal specific
si_code. As such this use of 0 for the si_code is a pretty
horribly broken ABI.
This was introduced in 2.3.41 so this mess has had a long time for
people to be able to start depending on it.
As this bug has existed for 17 years already I don't know if it is
worth fixing. It is definitely worth documenting what is going
on so that no one decides to copy this bad decision.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
This fixes another cause of random segfaults and bus errors that may
occur while running perf with the callgraph option.
Critical sections beginning with spin_lock_irqsave() raise the interrupt
level to PIL_NORMAL_MAX (14) and intentionally do not block performance
counter interrupts, which arrive at PIL_NMI (15).
But some sections of code are "super critical" with respect to perf
because the perf_callchain_user() path accesses user space and may cause
TLB activity as well as faults as it unwinds the user stack.
One particular critical section occurs in switch_mm:
spin_lock_irqsave(&mm->context.lock, flags);
...
load_secondary_context(mm);
tsb_context_switch(mm);
...
spin_unlock_irqrestore(&mm->context.lock, flags);
If a perf interrupt arrives in between load_secondary_context() and
tsb_context_switch(), then perf_callchain_user() could execute with
the context ID of one process, but with an active TSB for a different
process. When the user stack is accessed, it is very likely to
incur a TLB miss, since the h/w context ID has been changed. The TLB
will then be reloaded with a translation from the TSB for one process,
but using a context ID for another process. This exposes memory from
one process to another, and since it is a mapping for stack memory,
this usually causes the new process to crash quickly.
This super critical section needs more protection than is provided
by spin_lock_irqsave() since perf interrupts must not be allowed in.
Since __tsb_context_switch already goes through the trouble of
disabling interrupts completely, we fix this by moving the secondary
context load down into this better protected region.
Orabug: 25577560
Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull sparc fixes from David Miller:
- Fix DMA regression in 4.13 merge window, only certain chips can do
64-bit DMA. From Dave Dushar.
- Correct cpu cross-call algorithm to correctly detect stalled or stuck
remote cpus, from Jane Chu.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Measure receiver forward progress to avoid send mondo timeout
SPARC64: Fix sun4v DMA panic
This ioctl does nothing to justify an _IOC_READ or _IOC_WRITE flag
because it doesn't copy anything from/to userspace to access the
argument.
Fixes: 54ebbfb160 ("tty: add TIOCGPTPEER ioctl")
Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Acked-by: Aleksa Sarai <asarai@suse.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull uacess-unaligned removal from Al Viro:
"That stuff had just one user, and an exotic one, at that - binfmt_flat
on arm and m68k"
* 'work.uaccess-unaligned' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
kill {__,}{get,put}_user_unaligned()
binfmt_flat: flat_{get,put}_addr_from_rp() should be able to fail
A large sun4v SPARC system may have moments of intensive xcall activities,
usually caused by unmapping many pages on many CPUs concurrently. This can
flood receivers with CPU mondo interrupts for an extended period, causing
some unlucky senders to hit send-mondo timeout. This problem gets worse
as cpu count increases because sometimes mappings must be invalidated on
all CPUs, and sometimes all CPUs may gang up on a single CPU.
But a busy system is not a broken system. In the above scenario, as long
as the receiver is making forward progress processing mondo interrupts,
the sender should continue to retry.
This patch implements the receiver's forward progress meter by introducing
a per cpu counter 'cpu_mondo_counter[cpu]' where 'cpu' is in the range
of 0..NR_CPUS. The receiver increments its counter as soon as it receives
a mondo and the sender tracks the receiver's counter. If the receiver has
stopped making forward progress when the retry limit is reached, the sender
declares send-mondo-timeout and panic; otherwise, the receiver is allowed
to keep making forward progress.
In addition, it's been observed that PCIe hotplug events generate Correctable
Errors that are handled by hypervisor and then OS. Hypervisor 'borrows'
a guest cpu strand briefly to provide the service. If the cpu strand is
simultaneously the only cpu targeted by a mondo, it may not be available
for the mondo in 20msec, causing SUN4V mondo timeout. It appears that 1 second
is the agreed wait time between hypervisor and guest OS, this patch makes
the adjustment.
Orabug: 25476541
Orabug: 26417466
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Reviewed-by: Thomas Tai <thomas.tai@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Move generic-y of exported headers to uapi/asm/Kbuild
for complete de-coupling of UAPI
- Clean up scripts/Makefile.headersinst
- Fix host programs for 32 bit machine with XFS file system
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZZ5yPAAoJED2LAQed4NsGSHkP/1sQy6k7OnBZKNzYsYXdUfQT
tNm+k0NVZJk2jSa2ASHFMCEywnZ/afBU6nyvYSs4zKI6biaV/k0XZ2PqN0JDNmNh
9cZEoW1iPgnmu6Y+VxFxezjO34Qr2Gi94e55SO8qSfCZsrfcsr4OMCNTd67ar2YB
IbIV5+sCCO/dzkX9F2UnNX+elt2PfJ1yoJCeVHOmMmtv+I5MbQhNhpF9gElhYSc3
dDckZvoNOPCP5ddQcUqLbIFL1GVzse4XkY5l766RlnvPjguNmktzrRBxtOT3g6oP
kJSmYYh7DcctTFS8c1lqf0ERNxoshGBIagPieROVF1wpBXY8hDsUPV6Xr/L/5qPo
Iy4A8SsvRADRPf5HKspN6ykg0/bttcNOCBIp2EEYuxomIa9D3BTf8vJZaJU/LrYv
zsZMQ5t78oH6HmkubmGlvb/5Mvt/Vi5upXHgmJtZ4cwT1meURpOhcPmoer4hs++/
QimZ5JnngAhB8w5oYPr/3vnJaWUA1VZnmZq64AXPHQoExc+Q9gJUtan91GjIq/Kb
HVKpDQn82/qfPFoTiJ4/CksLtF/s9e20CzGfFbPE8N9ekJoWGnIFW7yly2Bi/5C7
4oBjLT9gDZP5+nouM14/DwPCmjVTmx29Uj9aXDrE9a+jhKeycKWNeAx6coHiqS8v
DK7gzKyuovdmuu5T7y9P
=Sw8W
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
- Move generic-y of exported headers to uapi/asm/Kbuild for complete
de-coupling of UAPI
- Clean up scripts/Makefile.headersinst
- Fix host programs for 32 bit machine with XFS file system
* tag 'kbuild-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (29 commits)
kbuild: Enable Large File Support for hostprogs
kbuild: remove wrapper files handling from Makefile.headersinst
kbuild: split exported generic header creation into uapi-asm-generic
kbuild: do not include old-kbuild-file from Makefile.headersinst
xtensa: move generic-y of exported headers to uapi/asm/Kbuild
unicore32: move generic-y of exported headers to uapi/asm/Kbuild
tile: move generic-y of exported headers to uapi/asm/Kbuild
sparc: move generic-y of exported headers to uapi/asm/Kbuild
sh: move generic-y of exported headers to uapi/asm/Kbuild
parisc: move generic-y of exported headers to uapi/asm/Kbuild
openrisc: move generic-y of exported headers to uapi/asm/Kbuild
nios2: move generic-y of exported headers to uapi/asm/Kbuild
nios2: remove unneeded arch/nios2/include/(generated/)asm/signal.h
microblaze: move generic-y of exported headers to uapi/asm/Kbuild
metag: move generic-y of exported headers to uapi/asm/Kbuild
m68k: move generic-y of exported headers to uapi/asm/Kbuild
m32r: move generic-y of exported headers to uapi/asm/Kbuild
ia64: remove redundant generic-y += kvm_para.h from asm/Kbuild
hexagon: move generic-y of exported headers to uapi/asm/Kbuild
h8300: move generic-y of exported headers to uapi/asm/Kbuild
...
__GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to
the page allocator. This has been true but only for allocations
requests larger than PAGE_ALLOC_COSTLY_ORDER. It has been always
ignored for smaller sizes. This is a bit unfortunate because there is
no way to express the same semantic for those requests and they are
considered too important to fail so they might end up looping in the
page allocator for ever, similarly to GFP_NOFAIL requests.
Now that the whole tree has been cleaned up and accidental or misled
usage of __GFP_REPEAT flag has been removed for !costly requests we can
give the original flag a better name and more importantly a more useful
semantic. Let's rename it to __GFP_RETRY_MAYFAIL which tells the user
that the allocator would try really hard but there is no promise of a
success. This will work independent of the order and overrides the
default allocator behavior. Page allocator users have several levels of
guarantee vs. cost options (take GFP_KERNEL as an example)
- GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
attempt to free memory at all. The most light weight mode which even
doesn't kick the background reclaim. Should be used carefully because
it might deplete the memory and the next user might hit the more
aggressive reclaim
- GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic
allocation without any attempt to free memory from the current
context but can wake kswapd to reclaim memory if the zone is below
the low watermark. Can be used from either atomic contexts or when
the request is a performance optimization and there is another
fallback for a slow path.
- (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
non sleeping allocation with an expensive fallback so it can access
some portion of memory reserves. Usually used from interrupt/bh
context with an expensive slow path fallback.
- GFP_KERNEL - both background and direct reclaim are allowed and the
_default_ page allocator behavior is used. That means that !costly
allocation requests are basically nofail but there is no guarantee of
that behavior so failures have to be checked properly by callers
(e.g. OOM killer victim is allowed to fail currently).
- GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
and all allocation requests fail early rather than cause disruptive
reclaim (one round of reclaim in this implementation). The OOM killer
is not invoked.
- GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
behavior and all allocation requests try really hard. The request
will fail if the reclaim cannot make any progress. The OOM killer
won't be triggered.
- GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
and all allocation requests will loop endlessly until they succeed.
This might be really dangerous especially for larger orders.
Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
because they already had their semantic. No new users are added.
__alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
there is no progress and we have already passed the OOM point.
This means that all the reclaim opportunities have been exhausted except
the most disruptive one (the OOM killer) and a user defined fallback
behavior is more sensible than keep retrying in the page allocator.
[akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
[mhocko@suse.com: semantic fix]
Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
[mhocko@kernel.org: address other thing spotted by Vlastimil]
Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alex Belits <alex.belits@cavium.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: NeilBrown <neilb@suse.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For architectures that define HAVE_NMI_WATCHDOG, instead of having them
provide the complete touch_nmi_watchdog() function, just have them
provide arch_touch_nmi_watchdog().
This gives the generic code more flexibility in implementing this
function, and arch implementations don't miss out on touching the
softlockup watchdog or other generic details.
Link: http://lkml.kernel.org/r/20170616065715.18390-3-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Tested-by: Babu Moger <babu.moger@oracle.com> [sparc]
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
64bit DMA only supported on sun4v equipped with ATU IOMMU HW.
'Commit b02c2b0bfd ("sparc: remove arch specific dma_supported
implementations")' introduced a code that incorrectly allow
dma_supported() to succeed for 64bit dma mask even if system doesn't
have ATU IOMMU. This results into panic.
Fix it.
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull sparc fixes from David Miller:
- Fix symbol version generation for assembler on sparc, from
Nagarathnam Muthusamy.
- Fix compound page handling in gup_huge_pmd(), from Nitin Gupta.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix gup_huge_pmd
Adding the type of exported symbols
sed regex in Makefile.build requires line break between exported symbols
Adding asm-prototypes.h for genksyms to generate crc
Since commit fcc8487d47 ("uapi: export all headers under uapi
directories"), all (and only) headers under uapi directories are
exported, but asm-generic wrappers are still exceptions.
To complete de-coupling the uapi from kernel headers, move generic-y
of exported headers to uapi/asm/Kbuild.
With this change, "make headers_install" will just need to parse
uapi/asm/Kbuild to build up exported headers.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Pull sparc updates from David Miller:
1) Queued spinlocks and rwlocks for sparc64, from Babu Moger.
2) Some const'ification from Arvind Yadav.
3) LDC/VIO driver infrastructure changes to facilitate future upcoming
drivers, from Jag Raman.
4) Initialize sched_clock() et al. early so that the initial printk
timestamps are all done while the implementation is available and
functioning. From Pavel Tatashin.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: (38 commits)
sparc: kernel: pmc: make of_device_ids const.
sparc64: fix typo in property
sparc64: add port_id to VIO device metadata
sparc64: Enhance search for VIO device in MDESC
sparc64: enhance VIO device probing
sparc64: check if a client is allowed to register for MDESC notifications
sparc64: remove restriction on VIO device name size
sparc64: refactor code to obtain cfg_handle property from MDESC
sparc64: add MDESC node name property to VIO device metadata
sparc64: mdesc: use __GFP_REPEAT action modifier for VM allocation
sparc64: expand MDESC interface
sparc64: skip handshake for LDC channels in RAW mode
sparc64: specify the device class in VIO version info. packet
sparc64: ensure VIO operations are defined while being used
sparc: kernel: apc: make of_device_ids const
sparc/time: make of_device_ids const
sparc64: broken %tick frequency on spitfire cpus
sparc64: use prom interface to get %stick frequency
sparc64: optimize functions that access tick
sparc64: add hot-patched and inlined get_tick()
...
Thin archives migration by Nicholas Piggin.
THIN_ARCHIVES has been available for a while as an optional feature
only for PowerPC architecture, but we do not need two different
intermediate-artifact schemes.
Using thin archives instead of conventional incremental linking has
various advantages:
- save disk space for builds
- speed-up building a little
- fix some link issues (for example, allyesconfig on ARM) due to
more flexibility for the final linking
- work better with dead code elimination we are planning
As discussed before, this migration has been done unconditionally
so that any problems caused by this will show up with "git bisect".
With testing with 0-day and linux-next, some architectures actually
showed up problems, but they were trivial and all fixed now.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZXsiSAAoJED2LAQed4NsGfqUQAIxbR4JcFCeGNNqgOV1q7Ban
CaMzVZWPum0Mq+JWzknHrCJQzBE+4BPLbOtZH4Y0YhjXVfc2/M8QkzEzSWyEPm03
FyaQ6WTq479mv7Ot2nAwaRSUYNSOuvlCx5KUOxITMJ/VmxwXXc9fCuT3ORu9opdK
4iyh0P2D+IeABQlrS5k1Rj+y4u/BtpiGY9U5RDssn7u8sjEgBHWFXFfE2fQ0No+0
1lzwa5EVyPHuq0XTBeZkPSDNxtou4iZzQC9QeNIYlyiod1G9deE4lzB55s+Qtkk0
h6rN9WF+Rvy7/hjFUJy0TDPNx0io2kdJxMaMKp2HaES49w5fHv7NAgxuipFC91vE
5UKs1sXxBe8dpPjfZWY7QSQ/JQv6NuG7NWcSGM29BWy3yFefSAXCggM+nn5IWzLH
pSutfOBGeceJdyKMcdn3AgcHCj0wddFxX8AXst+ZebnqVoNxR/Nu6HGmyaucwyp3
6fFTkbZ6DvOlu9MKbK0HSqrsT3DlAas2YWZKZ4Cc20wM99Z0OtFZlmpMCRIdiYtx
hZBwze/ElheUbZu6igH6UX2lpOlat0V6nT5vKHGGeOJlwkxduKi3Kj6zVSkCHic5
w3NLXr5FDWdkrMiC6/Z0Uae5mtAWOYyt6z1CwjgVmFrAkqlL8aWNagOcDCSFc1qR
+3Cv7pZQSRWy2TaaLMzo
=PAWi
-----END PGP SIGNATURE-----
Merge tag 'kbuild-thinar-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild thin archives updates from Masahiro Yamada:
"Thin archives migration by Nicholas Piggin.
THIN_ARCHIVES has been available for a while as an optional feature
only for PowerPC architecture, but we do not need two different
intermediate-artifact schemes.
Using thin archives instead of conventional incremental linking has
various advantages:
- save disk space for builds
- speed-up building a little
- fix some link issues (for example, allyesconfig on ARM) due to more
flexibility for the final linking
- work better with dead code elimination we are planning
As discussed before, this migration has been done unconditionally so
that any problems caused by this will show up with "git bisect".
With testing with 0-day and linux-next, some architectures actually
showed up problems, but they were trivial and all fixed now"
* tag 'kbuild-thinar-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
tile: remove unneeded extra-y in Makefile
kbuild: thin archives make default for all archs
x86/um: thin archives build fix
tile: thin archives fix linking
ia64: thin archives fix linking
sh: thin archives fix linking
kbuild: handle libs-y archives separately from built-in.o archives
kbuild: thin archives use P option to ar
kbuild: thin archives final link close --whole-archives option
ia64: remove unneeded extra-y in Makefile.gate
tile: fix dependency and .*.cmd inclusion for incremental build
sparc64: Use indirect calls in hamming weight stubs
Merge misc updates from Andrew Morton:
- a few hotfixes
- various misc updates
- ocfs2 updates
- most of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (108 commits)
mm, memory_hotplug: move movable_node to the hotplug proper
mm, memory_hotplug: drop CONFIG_MOVABLE_NODE
mm, memory_hotplug: drop artificial restriction on online/offline
mm: memcontrol: account slab stats per lruvec
mm: memcontrol: per-lruvec stats infrastructure
mm: memcontrol: use generic mod_memcg_page_state for kmem pages
mm: memcontrol: use the node-native slab memory counters
mm: vmstat: move slab statistics from zone to node counters
mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare()
mm/zswap.c: improve a size determination in zswap_frontswap_init()
mm/zswap.c: delete an error message for a failed memory allocation in zswap_pool_create()
mm/swapfile.c: sort swap entries before free
mm/oom_kill: count global and memory cgroup oom kills
mm: per-cgroup memory reclaim stats
mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects
mm: kmemleak: factor object reference updating out of scan_block()
mm: kmemleak: slightly reduce the size of some structures on 64-bit architectures
mm, mempolicy: don't check cpuset seqlock where it doesn't matter
mm, cpuset: always use seqlock when changing task's nodemask
mm, mempolicy: simplify rebinding mempolicies when updating cpusets
...
Pull user access str* updates from Al Viro:
"uaccess str...() dead code removal"
* 'uaccess.strlen' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
s390 keyboard.c: don't open-code strndup_user()
mips: get rid of unused __strnlen_user()
get rid of unused __strncpy_from_user() instances
kill strlen_user()
In this new subsystem we'll try to properly maintain all the generic
code related to dma-mapping, and will further consolidate arch code
into common helpers.
This pull request contains:
- removal of the DMA_ERROR_CODE macro, replacing it with calls
to ->mapping_error so that the dma_map_ops instances are
more self contained and can be shared across architectures (me)
- removal of the ->set_dma_mask method, which duplicates the
->dma_capable one in terms of functionality, but requires more
duplicate code.
- various updates for the coherent dma pool and related arm code
(Vladimir)
- various smaller cleanups (me)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlldmw0LHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYOiKA/+Ln1mFLSf3nfTzIHa24Bbk8ZTGr0B8TD4Vmyyt8iG
oO3AeaTLn3d6ugbH/uih/tPz8PuyXsdiTC1rI/ejDMiwMTSjW6phSiIHGcStSR9X
VFNhmMFacp7QpUpvxceV0XZYKDViAoQgHeGdp3l+K5h/v4AYePV/v/5RjQPaEyOh
YLbCzETO+24mRWdJxdAqtTW4ovYhzj6XsiJ+pAjlV0+SWU6m5L5E+VAPNi1vqv1H
1O2KeCFvVYEpcnfL3qnkw2timcjmfCfeFAd9mCUAc8mSRBfs3QgDTKw3XdHdtRml
LU2WuA5cpMrOdBO4mVra2plo8E2szvpB1OZZXoKKdCpK3VGwVpVHcTvClK2Ks/3B
GDLieroEQNu2ZIUIdWXf/g2x6le3BcC9MmpkAhnGPqCZ7skaIBO5Cjpxm0zTJAPl
PPY3CMBBEktAvys6DcudOYGixNjKUuAm5lnfpcfTEklFdG0AjhdK/jZOplAFA6w4
LCiy0rGHM8ZbVAaFxbYoFCqgcjnv6EjSiqkJxVI4fu/Q7v9YXfdPnEmE0PJwCVo5
+i7aCLgrYshTdHr/F3e5EuofHN3TDHwXNJKGh/x97t+6tt326QMvDKX059Kxst7R
rFukGbrYvG8Y7yXwrSDbusl443ta0Ht7T1oL4YUoJTZp0nScAyEluDTmrH1JVCsT
R4o=
=0Fso
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping infrastructure from Christoph Hellwig:
"This is the first pull request for the new dma-mapping subsystem
In this new subsystem we'll try to properly maintain all the generic
code related to dma-mapping, and will further consolidate arch code
into common helpers.
This pull request contains:
- removal of the DMA_ERROR_CODE macro, replacing it with calls to
->mapping_error so that the dma_map_ops instances are more self
contained and can be shared across architectures (me)
- removal of the ->set_dma_mask method, which duplicates the
->dma_capable one in terms of functionality, but requires more
duplicate code.
- various updates for the coherent dma pool and related arm code
(Vladimir)
- various smaller cleanups (me)"
* tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping: (56 commits)
ARM: dma-mapping: Remove traces of NOMMU code
ARM: NOMMU: Set ARM_DMA_MEM_BUFFERABLE for M-class cpus
ARM: NOMMU: Introduce dma operations for noMMU
drivers: dma-mapping: allow dma_common_mmap() for NOMMU
drivers: dma-coherent: Introduce default DMA pool
drivers: dma-coherent: Account dma_pfn_offset when used with device tree
dma: Take into account dma_pfn_offset
dma-mapping: replace dmam_alloc_noncoherent with dmam_alloc_attrs
dma-mapping: remove dmam_free_noncoherent
crypto: qat - avoid an uninitialized variable warning
au1100fb: remove a bogus dma_free_nonconsistent call
MAINTAINERS: add entry for dma mapping helpers
powerpc: merge __dma_set_mask into dma_set_mask
dma-mapping: remove the set_dma_mask method
powerpc/cell: use the dma_supported method for ops switching
powerpc/cell: clean up fixed mapping dma_ops initialization
tile: remove dma_supported and mapping_error methods
xen-swiotlb: remove xen_swiotlb_set_dma_mask
arm: implement ->dma_supported instead of ->set_dma_mask
mips/loongson64: implement ->dma_supported instead of ->set_dma_mask
...
A poisoned or migrated hugepage is stored as a swap entry in the page
tables. On architectures that support hugepages consisting of
contiguous page table entries (such as on arm64) this leads to ambiguity
in determining the page table entry to return in huge_pte_offset() when
a poisoned entry is encountered.
Let's remove the ambiguity by adding a size parameter to convey
additional information about the requested address. Also fixup the
definition/usage of huge_pte_offset() throughout the tree.
Link: http://lkml.kernel.org/r/20170522133604.11392-4-punit.agrawal@arm.com
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: James Hogan <james.hogan@imgtec.com> (odd fixer:METAG ARCHITECTURE)
Cc: Ralf Baechle <ralf@linux-mips.org> (supporter:MIPS)
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking updates from David Miller:
"Reasonably busy this cycle, but perhaps not as busy as in the 4.12
merge window:
1) Several optimizations for UDP processing under high load from
Paolo Abeni.
2) Support pacing internally in TCP when using the sch_fq packet
scheduler for this is not practical. From Eric Dumazet.
3) Support mutliple filter chains per qdisc, from Jiri Pirko.
4) Move to 1ms TCP timestamp clock, from Eric Dumazet.
5) Add batch dequeueing to vhost_net, from Jason Wang.
6) Flesh out more completely SCTP checksum offload support, from
Davide Caratti.
7) More plumbing of extended netlink ACKs, from David Ahern, Pablo
Neira Ayuso, and Matthias Schiffer.
8) Add devlink support to nfp driver, from Simon Horman.
9) Add RTM_F_FIB_MATCH flag to RTM_GETROUTE queries, from Roopa
Prabhu.
10) Add stack depth tracking to BPF verifier and use this information
in the various eBPF JITs. From Alexei Starovoitov.
11) Support XDP on qed device VFs, from Yuval Mintz.
12) Introduce BPF PROG ID for better introspection of installed BPF
programs. From Martin KaFai Lau.
13) Add bpf_set_hash helper for TC bpf programs, from Daniel Borkmann.
14) For loads, allow narrower accesses in bpf verifier checking, from
Yonghong Song.
15) Support MIPS in the BPF selftests and samples infrastructure, the
MIPS eBPF JIT will be merged in via the MIPS GIT tree. From David
Daney.
16) Support kernel based TLS, from Dave Watson and others.
17) Remove completely DST garbage collection, from Wei Wang.
18) Allow installing TCP MD5 rules using prefixes, from Ivan
Delalande.
19) Add XDP support to Intel i40e driver, from Björn Töpel
20) Add support for TC flower offload in nfp driver, from Simon
Horman, Pieter Jansen van Vuuren, Benjamin LaHaise, Jakub
Kicinski, and Bert van Leeuwen.
21) IPSEC offloading support in mlx5, from Ilan Tayari.
22) Add HW PTP support to macb driver, from Rafal Ozieblo.
23) Networking refcount_t conversions, From Elena Reshetova.
24) Add sock_ops support to BPF, from Lawrence Brako. This is useful
for tuning the TCP sockopt settings of a group of applications,
currently via CGROUPs"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1899 commits)
net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap
dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap
cxgb4: Support for get_ts_info ethtool method
cxgb4: Add PTP Hardware Clock (PHC) support
cxgb4: time stamping interface for PTP
nfp: default to chained metadata prepend format
nfp: remove legacy MAC address lookup
nfp: improve order of interfaces in breakout mode
net: macb: remove extraneous return when MACB_EXT_DESC is defined
bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case
bpf: fix return in load_bpf_file
mpls: fix rtm policy in mpls_getroute
net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t
net, ax25: convert ax25_route.refcount from atomic_t to refcount_t
net, ax25: convert ax25_uid_assoc.refcount from atomic_t to refcount_t
net, sctp: convert sctp_ep_common.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_auth_bytes.refcnt from atomic_t to refcount_t
...
Here is the big driver core update for 4.13-rc1.
The large majority of this is a lot of cleanup of old fields in the
driver core structures and their remaining usages in random drivers.
All of those fixes have been reviewed by the various subsystem
maintainers. There's also some small firmware updates in here, a new
kobject uevent api interface that makes userspace interaction easier,
and a few other minor things.
All of these have been in linux-next for a long while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWVpX4A8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymobgCfd0d13IfpZoq1N41wc6z2Z0xD7cwAnRMeH1/p
kEeISGpHPYP9f8PBh9FO
=Hfqt
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big driver core update for 4.13-rc1.
The large majority of this is a lot of cleanup of old fields in the
driver core structures and their remaining usages in random drivers.
All of those fixes have been reviewed by the various subsystem
maintainers. There's also some small firmware updates in here, a new
kobject uevent api interface that makes userspace interaction easier,
and a few other minor things.
All of these have been in linux-next for a long while with no reported
issues"
* tag 'driver-core-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (56 commits)
arm: mach-rpc: ecard: fix build error
zram: convert remaining CLASS_ATTR() to CLASS_ATTR_RO()
driver-core: remove struct bus_type.dev_attrs
powerpc: vio_cmo: use dev_groups and not dev_attrs for bus_type
powerpc: vio: use dev_groups and not dev_attrs for bus_type
USB: usbip: convert to use DRIVER_ATTR_RW
s390: drivers: convert to use DRIVER_ATTR_RO/WO
platform: thinkpad_acpi: convert to use DRIVER_ATTR_RO/RW
pcmcia: ds: convert to use DRIVER_ATTR_RO
wireless: ipw2x00: convert to use DRIVER_ATTR_RW
net: ehea: convert to use DRIVER_ATTR_RO
net: caif: convert to use DRIVER_ATTR_RO
TTY: hvc: convert to use DRIVER_ATTR_RW
PCI: pci-driver: convert to use DRIVER_ATTR_WO
IB: nes: convert to use DRIVER_ATTR_RW
HID: hid-core: convert to use DRIVER_ATTR_RO and drv_groups
arm: ecard: fix dev_groups patch typo
tty: serdev: use dev_groups and not dev_attrs for bus_type
sparc: vio: use dev_groups and not dev_attrs for bus_type
hid: intel-ish-hid: use dev_groups and not dev_attrs for bus_type
...
Here is the large tty/serial patchset for 4.13-rc1.
A lot of tty and serial driver updates are in here, along with some
fixups for some __get/put_user usages that were reported. Nothing huge,
just lots of development by a number of different developers, full
details in the shortlog.
All of these have been in linux-next for a while. There will be a merge
issue with the arm-soc tree in the include/linux/platform_data/atmel.h
file. Stephen has sent out a fixup for it, so it shouldn't be that
difficult to merge.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWVpZ9w8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylkTgCfV2HhbxIph/aEL1nJmwW64oCXFrMAoK59ZH65
tBZIosv0d91K1A+mObBT
=adPL
-----END PGP SIGNATURE-----
Merge tag 'tty-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial updates from Greg KH:
"Here is the large tty/serial patchset for 4.13-rc1.
A lot of tty and serial driver updates are in here, along with some
fixups for some __get/put_user usages that were reported. Nothing
huge, just lots of development by a number of different developers,
full details in the shortlog.
All of these have been in linux-next for a while"
* tag 'tty-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (71 commits)
tty: serial: lpuart: add a more accurate baud rate calculation method
tty: serial: lpuart: add earlycon support for imx7ulp
tty: serial: lpuart: add imx7ulp support
dt-bindings: serial: fsl-lpuart: add i.MX7ULP support
tty: serial: lpuart: add little endian 32 bit register support
tty: serial: lpuart: refactor lpuart32_{read|write} prototype
tty: serial: lpuart: introduce lpuart_soc_data to represent SoC property
serial: imx-serial - move DMA buffer configuration to DT
serial: imx: Enable RTSD only when needed
serial: imx: Remove unused members from imx_port struct
serial: 8250: 8250_omap: Fix race b/w dma completion and RX timeout
serial: 8250: Fix THRE flag usage for CAP_MINI
tty/serial: meson_uart: update to stable bindings
dt-bindings: serial: Add bindings for the Amlogic Meson UARTs
serial: Delete dead code for CIR serial ports
serial: sirf: make of_device_ids const
serial/mpsc: switch to dma_alloc_attrs
tty: serial: Add Actions Semi Owl UART earlycon
dt-bindings: serial: Document Actions Semi Owl UARTs
tty/serial: atmel: make the driver DT only
...
Pull SMP hotplug updates from Thomas Gleixner:
"This update is primarily a cleanup of the CPU hotplug locking code.
The hotplug locking mechanism is an open coded RWSEM, which allows
recursive locking. The main problem with that is the recursive nature
as it evades the full lockdep coverage and hides potential deadlocks.
The rework replaces the open coded RWSEM with a percpu RWSEM and
establishes full lockdep coverage that way.
The bulk of the changes fix up recursive locking issues and address
the now fully reported potential deadlocks all over the place. Some of
these deadlocks have been observed in the RT tree, but on mainline the
probability was low enough to hide them away."
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
cpu/hotplug: Constify attribute_group structures
powerpc: Only obtain cpu_hotplug_lock if called by rtasd
ARM/hw_breakpoint: Fix possible recursive locking for arch_hw_breakpoint_init
cpu/hotplug: Remove unused check_for_tasks() function
perf/core: Don't release cred_guard_mutex if not taken
cpuhotplug: Link lock stacks for hotplug callbacks
acpi/processor: Prevent cpu hotplug deadlock
sched: Provide is_percpu_thread() helper
cpu/hotplug: Convert hotplug locking to percpu rwsem
s390: Prevent hotplug rwsem recursion
arm: Prevent hotplug rwsem recursion
arm64: Prevent cpu hotplug rwsem recursion
kprobes: Cure hotplug lock ordering issues
jump_label: Reorder hotplug lock and jump_label_lock
perf/tracing/cpuhotplug: Fix locking order
ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus()
PCI: Replace the racy recursion prevention
PCI: Use cpu_hotplug_disable() instead of get_online_cpus()
perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode()
x86/perf: Drop EXPORT of perf_check_microcode
...
Pull timer updates from Thomas Gleixner:
"A rather large update for timers/timekeeping:
- compat syscall consolidation (Al Viro)
- Posix timer consolidation (Christoph Helwig / Thomas Gleixner)
- Cleanup of the device tree based initialization for clockevents and
clocksources (Daniel Lezcano)
- Consolidation of the FTTMR010 clocksource/event driver (Linus
Walleij)
- The usual set of small fixes and updates all over the place"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (93 commits)
timers: Make the cpu base lock raw
clocksource/drivers/mips-gic-timer: Fix an error code in 'gic_clocksource_of_init()'
clocksource/drivers/fsl_ftm_timer: Unmap region obtained by of_iomap
clocksource/drivers/tcb_clksrc: Make IO endian agnostic
clocksource/drivers/sun4i: Switch to the timer-of common init
clocksource/drivers/timer-of: Fix invalid iomap check
Revert "ktime: Simplify ktime_compare implementation"
clocksource/drivers: Fix uninitialized variable use in timer_of_init
kselftests: timers: Add test for frequency step
kselftests: timers: Fix inconsistency-check to not ignore first timestamp
time: Add warning about imminent deprecation of CONFIG_GENERIC_TIME_VSYSCALL_OLD
time: Clean up CLOCK_MONOTONIC_RAW time handling
posix-cpu-timers: Make timespec to nsec conversion safe
itimer: Make timeval to nsec conversion range limited
timers: Fix parameter description of try_to_del_timer_sync()
ktime: Simplify ktime_compare implementation
clocksource/drivers/fttmr010: Factor out clock read code
clocksource/drivers/fttmr010: Implement delay timer
clocksource/drivers: Add timer-of common init routine
clocksource/drivers/tcb_clksrc: Save timer context on suspend/resume
...
of_device_ids are not supposed to change at runtime. All functions
working with of_device_ids provided by <linux/of.h> work with const
of_device_ids. So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise, depending upon link order, the branch relocation
limits could be exceeded.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
The only user of thread_saved_pc() in non-arch-specific code was removed
in commit 8243d55977 ("sched/core: Remove pointless printout in
sched_show_task()"). Remove the implementations as well.
Some architectures use thread_saved_pc() in their arch-specific code.
Leave their thread_saved_pc() intact.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Usually dma_supported decisions are done by the dma_map_ops instance.
Switch sparc to that model by providing a ->dma_supported instance for
sbus that always returns false, and implementations tailored to the sun4u
and sun4v cases for sparc64, and leave it unimplemented for PCI on
sparc32, which means always supported.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net>
There is a typo in a comment that propagated into code:
upa-portis instead of upa-portid
This problem was detected by code inspection.
Fixes: eea9833453 ("sparc64: broken %tick frequency on spitfire cpus"
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reported-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add port_id field to VIO device metadata to identify the port of
VIO device.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enhances search for VIO device in MDESC by leveraging already existing
MDESC APIs. Enhances changes in earlier patch,
"sparc: Machine description indices can vary", by using existing MD
search functions. It also specifies a match function, thereby
enabling device_find_child() to use it for the purpose of matching
device nodes in MDESC.
An API to find VDEV node in MDESC based on its md_node_info is also
added. It is planned to be used by VIO device clients in the future.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Allocate IRQs for VIO devices during probing.
- Allow clients to specify if IRQs would be allocated for a given
VIO device.
- Cache the device handle of the root node of channel-devices sub-tree in
Machine Description (MDESC).
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check if a client is supported, by comparing against a whitelist, to
register for notifications from Machine Description (MDESC)
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removes restriction on VIO device's size limit. Since KOBJ_NAME_LEN
has been dropped from kobject, there doesn't seem to be a
restriction on the device name anymore. This limit therefore
doesn't make sense.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactors code to get the cfg_handle property of a node from Machine
Description (MDESC)
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the MDESC node name of MDESC client to VIO device metadata. It is
later used to uniquely identify a node in the MDESC. VIO & MDESC APIs
are updated to handle this node name.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During MDESC handle allocation, use the __GFP_REPEAT flag instead of
__GFP_NOFAIL. If memory is not available, the caller expects a NULL
pointer instead of waiting until memory is allocated.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the following two APIs to Machine Description (MDESC)
- mdesc_get_node: Searches for a node in the Machine
Description tree based on given information about
that node.
- mdesc_get_node_info: Retrieves information about a
given node.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LDC channels in RAW mode does not provide any session management. No
handshake protocol is defined for LDC channels in RAW mode. It's
therefore skipped.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Specify the class of VIO device in the version info. packet. The device's
class identifies the type of VIO device, whether it's DISK, CONSOLE,
NETWORK, etc... This packet is used in the handshake between the
client and server for this device.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's possible that VIO operations are not defined for some VIO
clients. In that case, VIO ops pointer should be checked for
NULL before being used
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
of_device_ids are not supposed to change at runtime. All functions
working with of_device_ids provided by <linux/of.h> work with const
of_device_ids. So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function assumes that each PMD points to head of a
huge page. This is not correct as a PMD can point to
start of any 8M region with a, say 256M, hugepage. The
fix ensures that it points to the correct head of any PMD
huge page.
Cc: Julian Calaby <julian.calaby@gmail.com>
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two entries being added at the same time to the IFLA
policy table, whilst parallel bug fixes to decnet
routing dst handling overlapping with the dst gc removal
in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the new getsockopt(2) option SO_PEERGROUPS on SOL_SOCKET to
retrieve the auxiliary groups of the remote peer. It is designed to
naturally extend SO_PEERCRED. That is, the underlying data is from the
same credentials. Regarding its syntax, it is based on SO_PEERSEC. That
is, if the provided buffer is too small, ERANGE is returned and @optlen
is updated. Otherwise, the information is copied, @optlen is set to the
actual size, and 0 is returned.
While SO_PEERCRED (and thus `struct ucred') already returns the primary
group, it lacks the auxiliary group vector. However, nearly all access
controls (including kernel side VFS and SYSVIPC, but also user-space
polkit, DBus, ...) consider the entire set of groups, rather than just
the primary group. But this is currently not possible with pure
SO_PEERCRED. Instead, user-space has to work around this and query the
system database for the auxiliary groups of a UID retrieved via
SO_PEERCRED.
Unfortunately, there is no race-free way to query the auxiliary groups
of the PID/UID retrieved via SO_PEERCRED. Hence, the current user-space
solution is to use getgrouplist(3p), which itself falls back to NSS and
whatever is configured in nsswitch.conf(3). This effectively checks
which groups we *would* assign to the user if it logged in *now*. On
normal systems it is as easy as reading /etc/group, but with NSS it can
resort to quering network databases (eg., LDAP), using IPC or network
communication.
Long story short: Whenever we want to use auxiliary groups for access
checks on IPC, we need further IPC to talk to the user/group databases,
rather than just relying on SO_PEERCRED and the incoming socket. This
is unfortunate, and might even result in dead-locks if the database
query uses the same IPC as the original request.
So far, those recursions / dead-locks have been avoided by using
primitive IPC for all crucial NSS modules. However, we want to avoid
re-inventing the wheel for each NSS module that might be involved in
user/group queries. Hence, we would preferably make DBus (and other IPC
that supports access-management based on groups) work without resorting
to the user/group database. This new SO_PEERGROUPS ioctl would allow us
to make dbus-daemon work without ever calling into NSS.
Cc: Michal Sekletar <msekleta@redhat.com>
Cc: Simon McVittie <simon.mcvittie@collabora.co.uk>
Reviewed-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Missing symbol type for few functions prevents genksyms from generating
symbol versions for those functions. This patch fixes them.
Signed-off-by: Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following regex in Makefile.build matches only one ___EXPORT_SYMBOL per line.
sed
's/.*___EXPORT_SYMBOL[[:space:]]*\([a-zA-Z0-9_]*\)[[:space:]]*,.*/EXPORT_SYMBOL(\1);/'
ATOMIC_OPS macro in atomic_64.S expands multiple symbols in same line hence
version generation is done only for the last matched symbol. This patch adds
new line between the symbol expansions.
Signed-off-by: Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the prototypes of assembly defined functions to asm-prototypes.h.
Some prototypes are directly added as they are not present in any existing header
files.
Signed-off-by: Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.
This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.
Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.
One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications. For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).
Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.
Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.
Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
of_device_ids are not supposed to change at runtime. All functions
working with of_device_ids provided by <linux/of.h> work with const
of_device_ids. So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After early boot time stamps project the %tick frequency is detected
incorrectly on spittfire cpus.
We must use cpuid of boot cpu to find corresponding cpu node in OpenBoot,
and extract clock-frequency property from there.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We initialize time early, we must use prom interface instead of open
firmware driver, which is not yet initialized.
Also, use prom_getintdefault() instead of prom_getint() to be compatible
with the code before early boot timestamps project.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace read tick function pointers with the new hot-patched get_tick().
This optimizes the performance of functions such as: sched_clock()
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the new get_tick() function that is hot-patched during boot based on
processor we are booting on.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In Linux it is possible to configure printk() to output timestamp next to
every line. This is very useful to determine the slow parts of the boot
process, and also to avoid regressions, as boot time is visiable to
everyone.
Also, there are scripts that change these time stamps to intervals.
However, on larger machines these timestamps start appearing many seconds,
and even minutes into the boot process. This patch gets stick-frequency
property early from OpenBoot, and uses its value to initialize time stamps
before the first printk() messages are printed.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch prepares the code for early boot time stamps by making it more
modular.
- init_tick_ops() to initialize struct sparc64_tick_ops
- new sparc64_tick_ops operation get_frequency() which returns a
frequency
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In clock sched we now have three loads:
- Function pointer
- quotient for multiplication
- offset
However, it is possible to improve performance substantially, by
guaranteeing that all three loads are from the same cacheline.
By moving these three values first in sparc64_tick_ops, and by having
tick_operations 64-byte aligned we guarantee this.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On most platforms, time is shown from the beginning of boot. This patch is
adding offset to sched_clock() for SPARC, to also show time from 0.
This means we will have one more load, but we saved one in an ealier patch.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In timer_64.c tick functions are access via pointer (tick_ops), every time
clock is read, there is one extra load to get to the function.
This patch optimizes it, by accessing functions pointer from value.
Current ched_clock():
sethi %hi(0xb9b400), %g1
ldx [ %g1 + 0x250 ], %g1 ! <tick_ops>
ldx [ %g1 ], %g1
call %g1
nop
sethi %hi(0xb9b400), %g1
ldx [ %g1 + 0x300 ], %g1 ! <timer_ticks_per_nsec_quotient>
mulx %o0, %g1, %g1
rett %i7 + 8
srlx %g1, 0xa, %o0
New sched_clock():
sethi %hi(0xb9b400), %g1
ldx [ %g1 + 0x340 ], %g1
call %g1
nop
sethi %hi(0xb9b400), %g1
ldx [ %g1 + 0x378 ], %g1
mulx %o0, %g1, %g1
rett %i7 + 8
srlx %g1, 0xa, %o0
Before three loads, now two loads.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A few changes that were reported by checkpatch, removed all trailing white
spaces in these two files.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Print debug messages when reading from given LDC channel.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Aaron Young <aaron.young@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orabug: 20902628
When an ldc control-only packet is received during data exchange in
read_nonraw(), a new rx head is calculated but the rx queue head is not
actually advanced (rx_set_head() is not called) and a branch is taken to
'no_data' at which point two things can happen depending on the value
of the newly calculated rx head and the current rx tail:
- If the rx queue is determined to be not empty, then the wrong packet
is picked up.
- If the rx queue is determined to be empty, then a read error (EAGAIN)
is eventually returned since it is falsely assumed that more data was
expected.
The fix is to update the rx head and return in case of a control only
packet during data exchange.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Aaron Young <aaron.young@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ensure that LDC channel is up before writing to it, in RAW mode. Generate
event to bring the LDC channel up, if it's not up already.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Aaron Young <aaron.young@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enhance ldc_abort to accept a message to be printed when it is called. Add
a macro, LDC_ABORT, to print info. about the function that calls ldc_abort.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Aaron Young <aaron.young@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the following LDC APIs which are planned to be used by
LDC clients in the future:
- ldc_set_state: Sets given LDC channel to given state
- ldc_mode: Returns the mode of given LDC channel
- ldc_print: Prints info about given LDC channel
- ldc_rx_reset: Reset the RX queue of given LDC channel
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Aaron Young <aaron.young@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When opening the slave end of a PTY, it is not possible for userspace to
safely ensure that /dev/pts/$num is actually a slave (in cases where the
mount namespace in which devpts was mounted is controlled by an
untrusted process). In addition, there are several unresolvable
race conditions if userspace were to attempt to detect attacks through
stat(2) and other similar methods [in addition it is not clear how
userspace could detect attacks involving FUSE].
Resolve this by providing an interface for userpace to safely open the
"peer" end of a PTY file descriptor by using the dentry cached by
devpts. Since it is not possible to have an open master PTY without
having its slave exposed in /dev/pts this interface is safe. This
interface currently does not provide a way to get the master pty (since
it is not clear whether such an interface is safe or even useful).
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Valentin Rothberg <vrothberg@suse.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The dev_attrs field has long been "depreciated" and is finally being
removed, so move the driver to use the "correct" dev_groups field
instead for struct bus_type.
Acked-by: "David S. Miller" <davem@davemloft.net>
Cc: <sparclinux@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CONFIG_KEYS_COMPAT is defined in arch-specific Kconfigs and is missing for
several 64-bit architectures : mips, parisc, tile.
At the moment and for those architectures, calling in 32-bit userspace the
keyctl syscall would return an ENOSYS error.
This patch moves the CONFIG_KEYS_COMPAT option to security/keys/Kconfig, to
make sure the compatibility wrapper is registered by default for any 64-bit
architecture as long as it is configured with CONFIG_COMPAT.
[DH: Modified to remove arm64 compat enablement also as requested by Eric
Biggers]
Signed-off-by: Bilal Amarni <bilal.amarni@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
cc: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
The old method that is using xcall and softint to get new context id is
deleted, as it is replaced by a method of using per_cpu_secondary_mm
without xcall to perform the context wrap.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current wrap implementation has a race issue: it is called outside of
the ctx_alloc_lock, and also does not wait for all CPUs to complete the
wrap. This means that a thread can get a new context with a new version
and another thread might still be running with the same context. The
problem is especially severe on CPUs with shared TLBs, like sun4v. I used
the following test to very quickly reproduce the problem:
- start over 8K processes (must be more than context IDs)
- write and read values at a memory location in every process.
Very quickly memory corruptions start happening, and what we read back
does not equal what we wrote.
Several approaches were explored before settling on this one:
Approach 1:
Move smp_new_mmu_context_version() inside ctx_alloc_lock, and wait for
every process to complete the wrap. (Note: every CPU must WAIT before
leaving smp_new_mmu_context_version_client() until every one arrives).
This approach ends up with deadlocks, as some threads own locks which other
threads are waiting for, and they never receive softint until these threads
exit smp_new_mmu_context_version_client(). Since we do not allow the exit,
deadlock happens.
Approach 2:
Handle wrap right during mondo interrupt. Use etrap/rtrap to enter into
into C code, and issue new versions to every CPU.
This approach adds some overhead to runtime: in switch_mm() we must add
some checks to make sure that versions have not changed due to wrap while
we were loading the new secondary context. (could be protected by PSTATE_IE
but that degrades performance as on M7 and older CPUs as it takes 50 cycles
for each access). Also, we still need a global per-cpu array of MMs to know
where we need to load new contexts, otherwise we can change context to a
thread that is going way (if we received mondo between switch_mm() and
switch_to() time). Finally, there are some issues with window registers in
rtrap() when context IDs are changed during CPU mondo time.
The approach in this patch is the simplest and has almost no impact on
runtime. We use the array with mm's where last secondary contexts were
loaded onto CPUs and bump their versions to the new generation without
changing context IDs. If a new process comes in to get a context ID, it
will go through get_new_mmu_context() because of version mismatch. But the
running processes do not need to be interrupted. And wrap is quicker as we
do not need to xcall and wait for everyone to receive and complete wrap.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new wrap is going to use information from this array to figure out
mm's that currently have valid secondary contexts setup.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CTX_FIRST_VERSION defines the first context version, but also it defines
first context. This patch redefines it to only include the first context
version.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only difference between these two functions is that in activate_mm we
unconditionally flush context. However, there is no need to keep this
difference after fixing a bug where cpumask was not reset on a wrap. So, in
this patch we combine these.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After a wrap (getting a new context version) a process must get a new
context id, which means that we would need to flush the context id from
the TLB before running for the first time with this ID on every CPU. But,
we use mm_cpumask to determine if this process has been running on this CPU
before, and this mask is not reset after a wrap. So, there are two possible
fixes for this issue:
1. Clear mm cpumask whenever mm gets a new context id
2. Unconditionally flush context every time process is running on a CPU
This patch implements the first solution
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hugetlb_bad_size needs to be called on invalid values. Also change the
pr_warn to a pr_err to better align with other platforms.
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VIO devices were being looked up by their index in the machine
description node block, but this often varies over time as devices are
added and removed. Instead, store the ID and look up using the type,
config handle and ID.
Signed-off-by: James Clarke <jrtc27@jrtc27.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112541
Signed-off-by: David S. Miller <davem@davemloft.net>
When a TSB grows beyond its current capacity, a new TSB is allocated
and copy_tsb is called to copy entries from the old TSB to the new.
A hash shift based on page size is used to calculate the index of an
entry in the TSB. copy_tsb has hard coded PAGE_SHIFT in these
calculations. However, for huge page TSBs the value REAL_HPAGE_SHIFT
should be used. As a result, when copy_tsb is called for a huge page
TSB the entries are placed at the incorrect index in the newly
allocated TSB. When doing hardware table walk, the MMU does not
match these entries and we end up in the TSB miss handling code.
This code will then create and write an entry to the correct index
in the TSB. We take a performance hit for the table walk miss and
recreation of these entries.
Pass a new parameter to copy_tsb that is the page size shift to be
used when copying the TSB.
Suggested-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linux SPARC64 limits NR_CPUS to 4064 because init_cpu_send_mondo_info()
only allocates a single page for NR_CPUS mondo entries. Thus we cannot
use all 4096 CPUs on some SPARC platforms.
To fix, allocate (2^order) pages where order is set according to the size
of cpu_list for possible cpus. Since cpu_list_pa and cpu_mondo_block_pa
are not used in asm code, there are no imm13 offsets from the base PA
that will break because they can only reach one page.
Orabug: 25505750
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Atish Patra <atish.patra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add jited_len to struct bpf_prog. It will be
useful for the struct bpf_prog_info which will
be added in the later patch.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
By moving the kernel side __SI_* defintions right next to the userspace
ones we can kill the non-uapi versions of <asm/siginfo.h> include
include/asm-generic/siginfo.h and untangle the unholy mess of includes.
[ tglx: Removed uapi/asm/siginfo.h from m32r, microblaze, mn10300 and score ]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: sparclinux@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20170603190102.28866-6-hch@lst.de
There is no need for the forward declaration of compat_siginfo provided
here. We can't yet use the generic header as we need to pull in the
sparc-specific version of the uapi <asm/siginfo.h>, but this prepares
for removing the non-uapi <asm/siginfo.h> entirely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/20170603190102.28866-2-hch@lst.de
arch/sparc/kernel/ds.c: In function ‘register_services’:
arch/sparc/kernel/ds.c:912:3: error: ‘strcpy’: writing at least 1 byte
into a region of size 0 overflows the destination
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
free up BPF_JMP | BPF_CALL | BPF_X opcode to be used by actual
indirect call by register and use kernel internal opcode to
mark call instruction into bpf_tail_call() helper.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
SPARC M6-32 platform has (2^5) NUMA nodes, so need to bump up the
CONFIG_NODES_SHIFT to 5.
Orabug: 25577754
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Atish Patra <atish.patra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The conversion of the hotplug locking to a percpu rwsem unearthed lock
ordering issues all over the place.
The jump_label code has two issues:
1) Nested get_online_cpus() invocations
2) Ordering problems vs. the cpus rwsem and the jump_label_mutex
To cure these, the following lock order has been established;
cpus_rwsem -> jump_label_lock -> text_mutex
Even if not all architectures need protection against CPU hotplug, taking
cpus_rwsem before jump_label_lock is now mandatory in code pathes which
actually modify code and therefor need text_mutex protection.
Move the get_online_cpus() invocations into the core jump label code and
establish the proper lock order where required.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "David S. Miller" <davem@davemloft.net>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Link: http://lkml.kernel.org/r/20170524081549.025830817@linutronix.de
This patch makes the necessary changes in SPARC architecture to enable
queued spinlock support. Here are some of the earlier discussions about
this feature.
https://lwn.net/Articles/561775/https://lwn.net/Articles/590243/
Cleaned-up the spinlock_64.h. The definitions of arch_spin_xxx are
replaced by the function in <asm-generic/qspinlock.h>
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SPARC supports 32 bit and 64 bit xchg right now. Add the support
for 16 bit (2 byte) xchg. This is required to support queued spinlock
feature which uses 2 byte xchg. This is achieved using 4 byte cas
instructions with byte manipulations.
Also re-arranged the code to call __cmpxchg_u32 inside xchg16.
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable queued rwlocks for SPARC. Here are the discussions on this feature
when this was introduced.
https://lwn.net/Articles/572765/https://lwn.net/Articles/582200/
Cleaned-up the arch_read_xxx and arch_write_xxx definitions in spinlock_64.h.
These routines are replaced by the functions in include/asm-generic/qrwlock.h
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SPARC supports 32 bit and 64 bit cmpxchg right now. Add support
for 8 bit (1 byte) cmpxchg. This is required to support queued
rwlocks feature which uses 1 byte cmpxchg.
The function __cmpxchg_u8 here uses the 4 byte cas instruction with a
byte manipulation to achieve 1 byte cmpxchg.
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Found this problem while enabling queued rwlock on SPARC.
The parameter CONFIG_CPU_BIG_ENDIAN is used to clear the
specific byte in qrwlock structure. Without this parameter,
we clear the wrong byte. Here is the code.
static inline u8 *__qrwlock_write_byte(struct qrwlock *lock)
{
return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
}
Define CPU_BIG_ENDIAN for SPARC to fix it.
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saw these compile errors on SPARC when queued rwlock feature is enabled.
CC kernel/locking/qrwlock.o
In file included from ./include/asm-generic/qrwlock_types.h:5,
from ./arch/sparc/include/asm/qrwlock.h:4,
from kernel/locking/qrwlock.c:24:
./arch/sparc/include/asm/spinlock_types.h:5:3: error:
#error "please don't include this file directly"
SPARC has this guard which causes compile error when spinlock_types.h
is included directly.
@ifndef __LINUX_SPINLOCK_TYPES_H
@ error "please don't include this file directly"
@endif
Remove this un-necessary "ifndef __LINUX_SPINLOCK_TYPES_H" stanza from SPARC.
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
A definition was only provided for asm-generic/socket.h
using platforms, define it for the others as well
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ftrace function_graph time measurements of a given function is not
accurate according to those recorded by ftrace using the function
filters. This change pulls the x86_64 fix from 'commit 722b3c7469
("ftrace/graph: Trace function entry before updating index")' into the
sparc specific prepare_ftrace_return which stops ftrace from
counting interrupted tasks in the time measurement.
Example measurements for select_task_rq_fair running "hackbench 100
process 1000":
| tracing/trace_stat/function0 | function_graph
Before patch | 2.802 us | 4.255 us
After patch | 2.749 us | 3.094 us
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greetings,
GCC 7 introduced the -Wstringop-overflow flag to detect buffer overflows
in calls to string handling functions [1][2]. Due to the way
``empty_zero_page'' is declared in arch/sparc/include/setup.h, this
causes a warning to trigger at compile time in the function mem_init(),
which is subsequently converted to an error. The ensuing patch fixes
this issue and aligns the declaration of empty_zero_page to that of
other architectures. Thank you.
Cheers,
Orlando.
[1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02308.html
[2] https://gcc.gnu.org/gcc-7/changes.html
Signed-off-by: Orlando Arias <oarias@knights.ucf.edu>
--------------------------------------------------------------------------------
Signed-off-by: David S. Miller <davem@davemloft.net>
An incorrect huge page alignment check caused
mmap failure for 64K pages when MAP_FIXED is used
with address not aligned to HPAGE_SIZE.
Orabug: 25885991
Fixes: dcd1912d21 ("sparc64: Add 64K page size support")
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Improvement of headers_install by Nicolas Dichtel.
It has been long since the introduction of uapi directories,
but the de-coupling of exported headers has not been completed.
Headers listed in header-y are exported whether they exist in
uapi directories or not. His work fixes this inconsistency.
All (and only) headers under uapi directories are now exported.
The asm-generic wrappers are still exceptions, but this is a big
step forward.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZE7MBAAoJED2LAQed4NsGroAP/iARejrIFmxuH96D5h2aiP1j
c8KHQ+5fuq4w2KBmfbfkNvWbazlVheT6RrYWBUh/GABGsSqQC07d8New6B8TaUkE
K0E48RsuYxouP18Ys6BOO4/zyRhEFD7Ta72PGQ/gDQY+6hAu4jYQnMdG0wipTblS
QWgnUxTqfCbTjnRpRKXpcwRff+OeTWtOv3s0V8UashJUxnFVQ7Br2uRsm/KKkU/k
jQC65KyHL4HlsFeeAiMmQ9IQPVwLsd6+d5crs0nydHaJ2XrFlNNQ7EEMyG8FxPdx
9b/VpS+XY6DO+jeqkcpFrdL9IgcmCn72Qc5/4vrHuQO2dpWW5mVaVPq9RAGP0Yq/
FB0vZRTp/tOIkD+0esirZW2gJtU3DWMY1A9rc5jjLRabdnRXVTdLfhEnksYJEfES
yPbDEuKyzo6a+zBSqNtMquJPmYVYEDS2mcmgxY5sB58qtXkUN2Yr+uUALxC8XhXW
SHHwIAV3a+UX5ZU9Ys8dp2hI4EXYXtdvsz2zvl4qPIn/Q9d1YoEJRe7/Y0p8gBXM
5pVJ1yohKoYrNZVGBe0LO/gHGVAVgMj0cKn0Xg51bbvjxY2U5djUbMY0uw1gFrrM
O9ld3C6O8zH5BsExCfwp9iPz2SW5W9N80kgnKfjCHBRUKuMTkm02DJf8Hx+pyfVQ
DCy9lYTi76IgZ1uflKq9
=Rqdo
-----END PGP SIGNATURE-----
Merge tag 'kbuild-uapi-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild UAPI updates from Masahiro Yamada:
"Improvement of headers_install by Nicolas Dichtel.
It has been long since the introduction of uapi directories, but the
de-coupling of exported headers has not been completed. Headers listed
in header-y are exported whether they exist in uapi directories or
not. His work fixes this inconsistency.
All (and only) headers under uapi directories are now exported. The
asm-generic wrappers are still exceptions, but this is a big step
forward"
* tag 'kbuild-uapi-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
arch/include: remove empty Kbuild files
uapi: export all arch specifics directories
uapi: export all headers under uapi directories
smc_diag.h: fix include from userland
btrfs_tree.h: fix include from userland
uapi: includes linux/types.h before exporting files
Makefile.headersinst: remove destination-y option
Makefile.headersinst: cleanup input files
x86: stop exporting msr-index.h to userland
nios2: put setup.h in uapi
h8300: put bitsperlong.h in uapi
Regularly, when a new header is created in include/uapi/, the developer
forgets to add it in the corresponding Kbuild file. This error is usually
detected after the release is out.
In fact, all headers under uapi directories should be exported, thus it's
useless to have an exhaustive list.
After this patch, the following files, which were not exported, are now
exported (with make headers_install_all):
asm-arc/kvm_para.h
asm-arc/ucontext.h
asm-blackfin/shmparam.h
asm-blackfin/ucontext.h
asm-c6x/shmparam.h
asm-c6x/ucontext.h
asm-cris/kvm_para.h
asm-h8300/shmparam.h
asm-h8300/ucontext.h
asm-hexagon/shmparam.h
asm-m32r/kvm_para.h
asm-m68k/kvm_para.h
asm-m68k/shmparam.h
asm-metag/kvm_para.h
asm-metag/shmparam.h
asm-metag/ucontext.h
asm-mips/hwcap.h
asm-mips/reg.h
asm-mips/ucontext.h
asm-nios2/kvm_para.h
asm-nios2/ucontext.h
asm-openrisc/shmparam.h
asm-parisc/kvm_para.h
asm-powerpc/perf_regs.h
asm-sh/kvm_para.h
asm-sh/ucontext.h
asm-tile/shmparam.h
asm-unicore32/shmparam.h
asm-unicore32/ucontext.h
asm-x86/hwcap2.h
asm-xtensa/kvm_para.h
drm/armada_drm.h
drm/etnaviv_drm.h
drm/vgem_drm.h
linux/aspeed-lpc-ctrl.h
linux/auto_dev-ioctl.h
linux/bcache.h
linux/btrfs_tree.h
linux/can/vxcan.h
linux/cifs/cifs_mount.h
linux/coresight-stm.h
linux/cryptouser.h
linux/fsmap.h
linux/genwqe/genwqe_card.h
linux/hash_info.h
linux/kcm.h
linux/kcov.h
linux/kfd_ioctl.h
linux/lightnvm.h
linux/module.h
linux/nbd-netlink.h
linux/nilfs2_api.h
linux/nilfs2_ondisk.h
linux/nsfs.h
linux/pr.h
linux/qrtr.h
linux/rpmsg.h
linux/sched/types.h
linux/sed-opal.h
linux/smc.h
linux/smc_diag.h
linux/stm.h
linux/switchtec_ioctl.h
linux/vfio_ccw.h
linux/wil6210_uapi.h
rdma/bnxt_re-abi.h
Note that I have removed from this list the files which are generated in every
exported directories (like .install or .install.cmd).
Thanks to Julien Floret <julien.floret@6wind.com> for the tip to get all
subdirs with a pure makefile command.
For the record, note that exported files for asm directories are a mix of
files listed by:
- include/uapi/asm-generic/Kbuild.asm;
- arch/<arch>/include/uapi/asm/Kbuild;
- arch/<arch>/include/asm/Kbuild.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Mark Salter <msalter@redhat.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Pull sparc updates from David Miller:
"sparc changes, including a bug fix for handling exceptions during
bzero on some sparc64 cpus"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: fix fault handling in NGbzero.S and GENbzero.S
sparc: use memdup_user_nul in sun4m LED driver
sparc: Remove redundant tests in boot_flags_init().
When any of the functions contained in NGbzero.S and GENbzero.S
vector through *bzero_from_clear_user, we may end up taking a
fault when executing one of the store alternate address space
instructions. If this happens, the exception handler does not
restore the %asi register.
This commit fixes the issue by introducing a new exception
handler that ensures the %asi register is restored when
a fault is handled.
Orabug: 25577560
Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use memdup_user_nul() helper instead of open-coding to simplify the code.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZEHmsAAoJEFmIoMA60/r88SgQAJbFddueb0+DfJ+USDud4b/Z
akfS+G1UAm+TgtMyh1wM49dHzFssp36uWJxtWI+bPqBzuy94PMCbz7JVUV28gX9G
tFhFuc5YH94I/3y85rbZnolb6uZN9MhLjzTFqDC9ilW6HFqmwK4t4wlHSCjQN1St
svLYvs2G6n6/VK3Fre7/wOvdZ1erG4Qod+kn5Tx3K5TQydmRlaSBfK+DRANuDBkM
KzGO7Bkc/Cx8hb9pHmaey/wxmNrrgmVjTtWrEnb2tEq833zP4h6GhUIJEKodMSi5
gXPNZgKlu3n5L592M0UCh4EoHejzkv9wrcsoDm+djmsc5Zg2Howq4kAdHP8k4hUG
0gt8n0ni9vhJN56jikrGi7cAdHCKSNnx2Ue/qTCbX0ncB3XUMuJxJwCsgW/6wa9f
oU7tRtTS03UltnKoFAcyYclS4TaSY4SA4ySaK6Hi+cRkdVFDdyHQYbHHNSU7MsA+
IS2tXvGoIdSYyrZMHSRcl2rRTfYQUkmPEvBF3LvqZr32M4mJMmUNAPLZaly373ZE
iwq0ZJlrLeM0cqdFIG3S60RtJyQk/HBN1NMqrYHArWOxvWIgNd5F8NCsTTxY3wU3
IxgBIuUFcbVwVkqEHGs8K5AvB3oghqdnA3eGOV79799eMtLn3LOvyIlpHMSw9WUq
ags00JtMLitfNPBH3eSl
=eE4D
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- add framework for supporting PCIe devices in Endpoint mode (Kishon
Vijay Abraham I)
- use non-postable PCI config space mappings when possible (Lorenzo
Pieralisi)
- clean up and unify mmap of PCI BARs (David Woodhouse)
- export and unify Function Level Reset support (Christoph Hellwig)
- avoid FLR for Intel 82579 NICs (Sasha Neftin)
- add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig)
- short-circuit config access failures for disconnected devices (Keith
Busch)
- remove D3 sleep delay when possible (Adrian Hunter)
- freeze PME scan before suspending devices (Lukas Wunner)
- stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava)
- disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann)
- add arch-specific alignment control to improve device passthrough by
avoiding multiple BARs in a page (Yongji Xie)
- add sysfs sriov_drivers_autoprobe to control VF driver binding
(Bodong Wang)
- allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas)
- fix crashes when unbinding host controllers that don't support
removal (Brian Norris)
- add driver for MicroSemi Switchtec management interface (Logan
Gunthorpe)
- add driver for Faraday Technology FTPCI100 host bridge (Linus
Walleij)
- add i.MX7D support (Andrey Smirnov)
- use generic MSI support for Aardvark (Thomas Petazzoni)
- make Rockchip driver modular (Brian Norris)
- advertise 128-byte Read Completion Boundary support for Rockchip
(Shawn Lin)
- advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin)
- convert atomic_t to refcount_t in HV driver (Elena Reshetova)
- add CPU IRQ affinity in HV driver (K. Y. Srinivasan)
- fix PCI bus removal in HV driver (Long Li)
- add support for ThunderX2 DMA alias topology (Jayachandran C)
- add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki)
- add ITE 8893 bridge DMA alias quirk (Jarod Wilson)
- restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices
(Manish Jaggi)
* tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits)
PCI: Don't allow unbinding host controllers that aren't prepared
ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
MAINTAINERS: Add PCI Endpoint maintainer
Documentation: PCI: Add userguide for PCI endpoint test function
tools: PCI: Add sample test script to invoke pcitest
tools: PCI: Add a userspace tool to test PCI endpoint
Documentation: misc-devices: Add Documentation for pci-endpoint-test driver
misc: Add host side PCI driver for PCI test function device
PCI: Add device IDs for DRA74x and DRA72x
dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access
PCI: dwc: dra7xx: Workaround for errata id i870
dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode
PCI: dwc: dra7xx: Add EP mode support
PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently
dt-bindings: PCI: Add DT bindings for PCI designware EP mode
PCI: dwc: designware: Add EP mode support
Documentation: PCI: Add binding documentation for pci-test endpoint function
ixgbe: Use pcie_flr() instead of duplicating it
IB/hfi1: Use pcie_flr() instead of duplicating it
PCI: imx6: Fix spelling mistake: "contol" -> "control"
...
The test:
*commands && *commands == ' '
is equivalent to:
*commands == ' '
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Millar:
"Here are some highlights from the 2065 networking commits that
happened this development cycle:
1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)
2) Add a generic XDP driver, so that anyone can test XDP even if they
lack a networking device whose driver has explicit XDP support
(me).
3) Sparc64 now has an eBPF JIT too (me)
4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
Starovoitov)
5) Make netfitler network namespace teardown less expensive (Florian
Westphal)
6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)
7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)
8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)
9) Multiqueue support in stmmac driver (Joao Pinto)
10) Remove TCP timewait recycling, it never really could possibly work
well in the real world and timestamp randomization really zaps any
hint of usability this feature had (Soheil Hassas Yeganeh)
11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
Aleksandrov)
12) Add socket busy poll support to epoll (Sridhar Samudrala)
13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
and several others)
14) IPSEC hw offload infrastructure (Steffen Klassert)"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
tipc: refactor function tipc_sk_recv_stream()
tipc: refactor function tipc_sk_recvmsg()
net: thunderx: Optimize page recycling for XDP
net: thunderx: Support for XDP header adjustment
net: thunderx: Add support for XDP_TX
net: thunderx: Add support for XDP_DROP
net: thunderx: Add basic XDP support
net: thunderx: Cleanup receive buffer allocation
net: thunderx: Optimize CQE_TX handling
net: thunderx: Optimize RBDR descriptor handling
net: thunderx: Support for page recycling
ipx: call ipxitf_put() in ioctl error path
net: sched: add helpers to handle extended actions
qed*: Fix issues in the ptp filter config implementation.
qede: Fix concurrency issue in PTP Tx path processing.
stmmac: Add support for SIMATIC IOT2000 platform
net: hns: fix ethtool_get_strings overflow in hns driver
tcp: fix wraparound issue in tcp_lp
bpf, arm64: fix jit branch offset related to ldimm64
bpf, arm64: implement jiting of BPF_XADD
...
Like other JITs, sparc64 maintains an array of instruction offsets but
stores the entries off by one. This is done because jumps to the
exit block are indexed to one past the last BPF instruction.
So if we size the array by the program length, we need to record
the previous instruction in order to stay within the array bounds.
This is explained in ARM JIT commit 8eee539dde ("arm64: bpf: fix
out-of-bounds read in bpf2a64_offset()").
But this scheme requires a little bit of careful handling when
the instruction before the branch destination is a 64-bit load
immediate. It takes up 2 BPF instruction slots.
Therefore, we have to fill in the array entry for the second
half of the 64-bit load immediate instruction rather than for
the one for the beginning of that instruction.
Fixes: 7a12b5031c ("sparc64: Add eBPF JIT.")
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull timer updates from Thomas Gleixner:
"The timer departement delivers:
- more year 2038 rework
- a massive rework of the arm achitected timer
- preparatory patches to allow NTP correction of clock event devices
to avoid early expiry
- the usual pile of fixes and enhancements all over the place"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (91 commits)
timer/sysclt: Restrict timer migration sysctl values to 0 and 1
arm64/arch_timer: Mark errata handlers as __maybe_unused
Clocksource/mips-gic: Remove redundant non devicetree init
MIPS/Malta: Probe gic-timer via devicetree
clocksource: Use GENMASK_ULL in definition of CLOCKSOURCE_MASK
acpi/arm64: Add SBSA Generic Watchdog support in GTDT driver
clocksource: arm_arch_timer: add GTDT support for memory-mapped timer
acpi/arm64: Add memory-mapped timer support in GTDT driver
clocksource: arm_arch_timer: simplify ACPI support code.
acpi/arm64: Add GTDT table parse driver
clocksource: arm_arch_timer: split MMIO timer probing.
clocksource: arm_arch_timer: add structs to describe MMIO timer
clocksource: arm_arch_timer: move arch_timer_needs_of_probing into DT init call
clocksource: arm_arch_timer: refactor arch_timer_needs_probing
clocksource: arm_arch_timer: split dt-only rate handling
x86/uv/time: Set ->min_delta_ticks and ->max_delta_ticks
unicore32/time: Set ->min_delta_ticks and ->max_delta_ticks
um/time: Set ->min_delta_ticks and ->max_delta_ticks
tile/time: Set ->min_delta_ticks and ->max_delta_ticks
score/time: Set ->min_delta_ticks and ->max_delta_ticks
...
Pull uaccess unification updates from Al Viro:
"This is the uaccess unification pile. It's _not_ the end of uaccess
work, but the next batch of that will go into the next cycle. This one
mostly takes copy_from_user() and friends out of arch/* and gets the
zero-padding behaviour in sync for all architectures.
Dealing with the nocache/writethrough mess is for the next cycle;
fortunately, that's x86-only. Same for cleanups in iov_iter.c (I am
sold on access_ok() in there, BTW; just not in this pile), same for
reducing __copy_... callsites, strn*... stuff, etc. - there will be a
pile about as large as this one in the next merge window.
This one sat in -next for weeks. -3KLoC"
* 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (96 commits)
HAVE_ARCH_HARDENED_USERCOPY is unconditional now
CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now
m32r: switch to RAW_COPY_USER
hexagon: switch to RAW_COPY_USER
microblaze: switch to RAW_COPY_USER
get rid of padding, switch to RAW_COPY_USER
ia64: get rid of copy_in_user()
ia64: sanitize __access_ok()
ia64: get rid of 'segment' argument of __do_{get,put}_user()
ia64: get rid of 'segment' argument of __{get,put}_user_check()
ia64: add extable.h
powerpc: get rid of zeroing, switch to RAW_COPY_USER
esas2r: don't open-code memdup_user()
alpha: fix stack smashing in old_adjtimex(2)
don't open-code kernel_setsockopt()
mips: switch to RAW_COPY_USER
mips: get rid of tail-zeroing in primitives
mips: make copy_from_user() zero tail explicitly
mips: clean and reorder the forest of macros...
mips: consolidate __invoke_... wrappers
...
Doing a full 64-bit decomposition is really stupid especially for
simple values like 0 and -1.
But if we are going to optimize this, go all the way and try for all 2
and 3 instruction sequences not requiring a temporary register as
well.
First we do the easy cases where it's a zero or sign extended 32-bit
number (sethi+or, sethi+xor, respectively).
Then we try to find a range of set bits we can load simply then shift
up into place, in various ways.
Then we try negating the constant and see if we can do a simple
sequence using that with a xor at the end. (f.e. the range of set
bits can't be loaded simply, but for the negated value it can)
The final optimized strategy involves 4 instructions sequences not
needing a temporary register.
Otherwise we sadly fully decompose using a temp..
Example, from ALU64_XOR_K: 0x0000ffffffff0000 ^ 0x0 = 0x0000ffffffff0000:
0000000000000000 <foo>:
0: 9d e3 bf 50 save %sp, -176, %sp
4: 01 00 00 00 nop
8: 90 10 00 18 mov %i0, %o0
c: 13 3f ff ff sethi %hi(0xfffffc00), %o1
10: 92 12 63 ff or %o1, 0x3ff, %o1 ! ffffffff <foo+0xffffffff>
14: 93 2a 70 10 sllx %o1, 0x10, %o1
18: 15 3f ff ff sethi %hi(0xfffffc00), %o2
1c: 94 12 a3 ff or %o2, 0x3ff, %o2 ! ffffffff <foo+0xffffffff>
20: 95 2a b0 10 sllx %o2, 0x10, %o2
24: 92 1a 60 00 xor %o1, 0, %o1
28: 12 e2 40 8a cxbe %o1, %o2, 38 <foo+0x38>
2c: 9a 10 20 02 mov 2, %o5
30: 10 60 00 03 b,pn %xcc, 3c <foo+0x3c>
34: 01 00 00 00 nop
38: 9a 10 20 01 mov 1, %o5 ! 1 <foo+0x1>
3c: 81 c7 e0 08 ret
40: 91 eb 40 00 restore %o5, %g0, %o0
Signed-off-by: David S. Miller <davem@davemloft.net>
cbcond combines a compare with a branch into a single instruction.
The limitations are:
1) Only newer chips support it
2) For immediate compares we are limited to 5-bit signed immediate
values
3) The branch displacement is limited to 10-bit signed
4) We cannot use it for JSET
Also, cbcond (unlike all other sparc control transfers) lacks a delay
slot.
Currently we don't have a useful instruction we can push into the
delay slot of normal branches. So using cbcond pretty much always
increases code density, and is therefore a win.
Signed-off-by: David S. Miller <davem@davemloft.net>
This is an eBPF JIT for sparc64. All major features are supported.
All tests under tools/testing/selftests/bpf/ pass.
Signed-off-by: David S. Miller <davem@davemloft.net>
A function in kernel/bpf/syscall.c which got a bug fix in 'net'
was moved to kernel/bpf/verifier.c in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
In all cases we know which BAR it is. Passing it in means that arch code
(or generic code; watch this space) won't have to go looking for it again.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Make sure the start adderess is aligned to PMD_SIZE
boundary when freeing page table backing a hugepage
region. The issue was causing segfaults when a region
backed by 64K pages was unmapped since such a region
is in general not PMD_SIZE aligned.
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CONFIG_PROVE_LOCKING_SMALL shrinks the memory usage of lockdep so the
kernel text, data, and bss fit in the required 32MB limit, but this
option is not set for every config that enables lockdep.
A 4.10 kernel fails to boot with the console output
Kernel: Using 8 locked TLB entries for main kernel image.
hypervisor_tlb_lock[2000000:0:8000000071c007c3:1]: errors with f
Program terminated
with these config options
CONFIG_LOCKDEP=y
CONFIG_LOCK_STAT=y
CONFIG_PROVE_LOCKING=n
To fix, rename CONFIG_PROVE_LOCKING_SMALL to CONFIG_LOCKDEP_SMALL, and
enable this option with CONFIG_LOCKDEP=y so we get the reduced memory
usage every time lockdep is turned on.
Tested that CONFIG_LOCKDEP_SMALL is set to 'y' if and only if
CONFIG_LOCKDEP is set to 'y'. When other lockdep-related config options
that select CONFIG_LOCKDEP are enabled (e.g. CONFIG_LOCK_STAT or
CONFIG_PROVE_LOCKING), verified that CONFIG_LOCKDEP_SMALL is also
enabled.
Fixes: e6b5f1be7a ("config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparc")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is relatively esoteric, and knowing that we don't have it makes life
easier in some cases rather than just an eventual -EINVAL from
pci_mmap_page_range().
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
We can declare it <linux/pci.h> even on platforms where it isn't going to
be defined. There's no need to have it littered through the various
<asm/pci.h> files.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Conflicts were simply overlapping changes. In the net/ipv4/route.c
case the code had simply moved around a little bit and the same fix
was made in both 'net' and 'net-next'.
In the net/sched/sch_generic.c case a fix in 'net' happened at
the same time that a new argument was added to qdisc_hash_add().
Signed-off-by: David S. Miller <davem@davemloft.net>
The mmustat_enable sysfs file accessor functions must run code on the
target CPU. This is achieved by temporarily setting the affinity of the
calling user space thread to the requested CPU and reset it to the original
affinity afterwards.
That's racy vs. concurrent affinity settings for that thread resulting in
code executing on the wrong CPU and overwriting the new affinity setting.
Replace it by using work_on_cpu() which guarantees to run the code on the
requested CPU.
Protection against CPU hotplug is not required as the open sysfs file
already prevents the removal from the CPU offline callback. Using the
hotplug protected version would actually be wrong because it would deadlock
against a CPU hotplug operation of the CPU associated to the sysfs file in
progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: herbert@gondor.apana.org.au
Cc: rjw@rjwysocki.net
Cc: peterz@infradead.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: jiangshanlai@gmail.com
Cc: sparclinux@vger.kernel.org
Cc: viresh.kumar@linaro.org
Cc: mpe@ellerman.id.au
Cc: tj@kernel.org
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1704131001270.2408@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.
Make the sparc arch's clockevent drivers initialize these fields properly.
This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from these
drivers.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Introduce a new getsockopt operation to retrieve the socket cookie
for a specific socket based on the socket fd. It returns a unique
non-decreasing cookie for each socket.
Tested: https://android-review.googlesource.com/#/c/358163/
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull sparc fixes from David Miller:
"Several fixes here, mostly having to due with either build errors or
memory corruptions depending upon whether you have THP enabled or not"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: remove unused wp_works_ok macro
sparc32: Export vac_cache_size to fix build error
sparc64: Fix memory corruption when THP is enabled
sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write()
arch/sparc: Avoid DCTI Couples
sparc64: kern_addr_valid regression
sparc64: Add support for 2G hugepages
sparc64: Fix size check in huge_pte_alloc
It's unused for ages, used to be required for ksyms.c back in the v1.1
times.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
sparc32:allmodconfig fails to build with the following error.
ERROR: "vac_cache_size" [drivers/infiniband/sw/rxe/rdma_rxe.ko] undefined!
Fixes: cb88645596 ("infiniband: Fix alignment of mmap cookies ...")
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Doug Ledford <dledford@redhat.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The memory corruption was happening due to incorrect
TLB/TSB flushing of hugepages.
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mostly simple cases of overlapping changes (adding code nearby,
a function whose name changes, for example).
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge PTRACE_SETREGSET leakage fixes from Dave Martin:
"This series is the collection of fixes I proposed on this topic, that
have not yet appeared upstream or in the stable branches,
The issue can leak kernel stack, but doesn't appear to allow userspace
to attack the kernel directly. The affected architectures are c6x,
h8300, metag, mips and sparc.
[ Mark Salter points out that c6x has no MMU or other mechanism to
prevent userspace access to kernel code or data on c6x, but it
doesn't hurt to clean that case up too. ]
The bugs arise from use of user_regset_copyin(). Users of
user_regset_copyin() can work in one of two ways:
1) Copy directly to thread_struct or equivalent. (This seems to be
the design assumption of the regset API, and is the most common
approach.)
2) Copy to a local variable and then transfer to thread_struct. (A
significant minority of cases.)
Buggy code typically involves approach 2"
* emailed patches from Dave Martin <Dave.Martin@arm.com>:
sparc/ptrace: Preserve previous registers for short regset write
mips/ptrace: Preserve previous registers for short regset write
metag/ptrace: Reject partial NT_METAG_RPIPE writes
metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS
metag/ptrace: Preserve previous registers for short regset write
h8300/ptrace: Fix incorrect register transfer count
c6x/ptrace: Remove useless PTRACE_SETREGSET implementation
Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid un-intended DCTI Couples. Use of DCTI couples is deprecated.
Also address the "Programming Note" for optimal performance.
Here is the complete text from Oracle SPARC Architecture Specs.
6.3.4.7 DCTI Couples
"A delayed control transfer instruction (DCTI) in the delay slot of
another DCTI is referred to as a “DCTI couple”. The use of DCTI couples
is deprecated in the Oracle SPARC Architecture; no new software should
place a DCTI in the delay slot of another DCTI, because on future Oracle
SPARC Architecture implementations DCTI couples may execute either
slowly or differently than the programmer assumes it will.
SPARC V8 and SPARC V9 Compatibility Note
The SPARC V8 architecture left behavior undefined for a DCTI couple. The
SPARC V9 architecture defined behavior in that case, but as of
UltraSPARC Architecture 2005, use of DCTI couples was deprecated.
Software should not expect high performance from DCTI couples, and
performance of DCTI couples should be expected to decline further in
future processors.
Programming Note
As noted in TABLE 6-5 on page 115, an annulled branch-always
(branch-always with a = 1) instruction is not architecturally a DCTI.
However, since not all implementations make that distinction, for
optimal performance, a DCTI should not be placed in the instruction word
immediately following an annulled branch-always instruction (BA,A or
BPA,A)."
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I encountered this bug when using /proc/kcore to examine the kernel. Plus a
coworker inquired about debugging tools. We computed pa but did
not use it during the maximum physical address bits test. Instead we used
the identity mapped virtual address which will always fail this test.
I believe the defect came in here:
[bpicco@zareason linus.git]$ git describe --contains bb4e6e85da
v3.18-rc1~87^2~4
.
Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This socket option returns the NAPI ID associated with the queue on which
the last frame is received. This information can be used by the apps to
split the incoming flows among the threads based on the Rx queue on which
they are received.
If the NAPI ID actually represents a sender_cpu then the value is ignored
and 0 is returned.
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allows reading of SK_MEMINFO_VARS via socket option. This way an
application can get all meminfo related information in single socket
option call instead of multiple calls.
Adds helper function, sk_get_meminfo(), and uses that for both
getsockopt and sock_diag_put_meminfo().
Suggested by Eric Dumazet.
Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Jason Baron <jbaron@akamai.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an architecture uses 4level-fixup.h we don't need to do anything as
it includes 5level-fixup.h.
If an architecture uses pgtable-nop*d.h, define __ARCH_USE_5LEVEL_HACK
before inclusion of the header. It makes asm-generic code to use
5level-fixup.h.
If an architecture has 4-level paging or folds levels on its own,
include 5level-fixup.h directly.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
None of those file is ever included from uapi stuff, so __KERNEL__
is always defined. None of them is ever included from assembler
(they are only pulled from linux/uaccess.h, which _can't_ be
included from assembler), so __ASSEMBLY__ is never defined.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Update code that relied on sched.h including various MM types for them.
This will allow us to remove the <linux/mm_types.h> include from <linux/sched.h>.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/task_stack.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/task_stack.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/task.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/task.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/hotplug.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/hotplug.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/debug.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split more MM APIs out of <linux/sched.h>, which
will have to be picked up from a couple of .c files.
The APIs that we are going to move are:
arch_pick_mmap_layout()
arch_get_unmapped_area()
arch_get_unmapped_area_topdown()
mm_update_next_owner()
Include the header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/loadavg.h> out of <linux/sched.h>, which
will have to be picked up from a couple of .c files.
Create a trivial placeholder <linux/sched/topology.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which
will have to be picked up from other headers and .c files.
Create a trivial placeholder <linux/sched/clock.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
So the original intention of tsk_cpus_allowed() was to 'future-proof'
the field - but it's pretty ineffectual at that, because half of
the code uses ->cpus_allowed directly ...
Also, the wrapper makes the code longer than the original expression!
So just get rid of it. This also shrinks <linux/sched.h> a bit.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Apart from adding the helper function itself, the rest of the kernel is
converted mechanically using:
git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)->mm_count);/mmgrab\(\1\);/'
git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)\.mm_count);/mmgrab\(\&\1\);/'
This is needed for a later patch that hooks into the helper, but might
be a worthwhile cleanup on its own.
(Michal Hocko provided most of the kerneldoc comment.)
Link: http://lkml.kernel.org/r/20161218123229.22952-1-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix typos and add the following to the scripts/spelling.txt:
partiton||partition
Link: http://lkml.kernel.org/r/1481573103-11329-7-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix typos and add the following to the scripts/spelling.txt:
swith||switch
swithable||switchable
swithed||switched
swithing||switching
While we are here, fix the "update" to "updates" in the touched hunk in
drivers/net/wireless/marvell/mwifiex/wmm.c.
Link: http://lkml.kernel.org/r/1481573103-11329-2-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Often all is needed is these small helpers, instead of compiler.h or a
full kprobes.h. This is important for asm helpers, in fact even some
asm/kprobes.h make use of these helpers... instead just keep a generic
asm file with helpers useful for asm code with the least amount of
clutter as possible.
Likewise we need now to also address what to do about this file for both
when architectures have CONFIG_HAVE_KPROBES, and when they do not. Then
for when architectures have CONFIG_HAVE_KPROBES but have disabled
CONFIG_KPROBES.
Right now most asm/kprobes.h do not have guards against CONFIG_KPROBES,
this means most architecture code cannot include asm/kprobes.h safely.
Correct this and add guards for architectures missing them.
Additionally provide architectures that not have kprobes support with
the default asm-generic solution. This lets us force asm/kprobes.h on
the header include/linux/kprobes.h always, but most importantly we can
now safely include just asm/kprobes.h on architecture code without
bringing the full kitchen sink of header files.
Two architectures already provided a guard against CONFIG_KPROBES on its
kprobes.h: sh, arch. The rest of the architectures needed gaurds added.
We avoid including any not-needed headers on asm/kprobes.h unless
kprobes have been enabled.
In a subsequent atomic change we can try now to remove compiler.h from
include/linux/kprobes.h.
During this sweep I've also identified a few architectures defining a
common macro needed for both kprobes and ftrace, that of the definition
of the breakput instruction up. Some refer to this as
BREAKPOINT_INSTRUCTION. This must be kept outside of the #ifdef
CONFIG_KPROBES guard.
[mcgrof@kernel.org: fix arm64 build]
Link: http://lkml.kernel.org/r/CAB=NE6X1WMByuARS4mZ1g9+W=LuVBnMDnh_5zyN0CLADaVh=Jw@mail.gmail.com
[sfr@canb.auug.org.au: fixup for kprobes declarations moving]
Link: http://lkml.kernel.org/r/20170214165933.13ebd4f4@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170203233139.32682-1-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bart Van Assche noted that the ib DMA mapping code was significantly
similar enough to the core DMA mapping code that with a few changes
it was possible to remove the IB DMA mapping code entirely and
switch the RDMA stack to use the core DMA mapping code. This resulted
in a nice set of cleanups, but touched the entire tree. This branch
will be submitted separately to Linus at the end of the merge window
as per normal practice for tree wide changes like this.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYo06oAAoJELgmozMOVy/d9Z8QALedWHdu98St1L0u2c8sxnR9
2zo/4sF5Vb9u7FpmdIX32L4SQ9s9KhPE8Qp8NtZLf9v10zlDebIRJDpXknXtKooV
CAXxX4sxBXV27/UrhbZEfXiPrmm6ccJFyIfRnMU6NlMqh2AtAsRa5AC2/RMp8oUD
Med97PFiF0o6TD22/UH1VFbRpX1zjaKyqm7a3as5sJfzNA+UGIZAQ7Euz8000DKZ
xCgVLTEwS0FmOujtBkCst7xa9TjuqR1HLOB4DdGvAhP6BHdz2yamM7Qmh9NN+NEX
0BtjsuXomtn6j6AszGC+bpipCZh3NUigcwoFAARXCYFHibBvo4DPdFeGsraFgXdy
1+KyR8CCeQG3Aly5Vwr264RFPGkGpwMj8PsBlXgQVtrlg4rriaCzOJNmIIbfdADw
ftqhxBOzReZw77aH2s+9p2ILRfcAmPqhynLvFGFo9LBvsik8LVso7YgZN0xGxwcI
IjI/XGC8UskPVsIZBIYA6sl2bYzgOjtBIHiXjRrPlW3uhduIXLrvKFfLPP/5XLAG
ehLXK+J0bfsyY9ClmlNS8oH/WdLhXAyy/KNmnj5bRRm9qg6BRJR3bsOBhZJODuoC
XgEXFfF6/7roNESWxowff7pK0rTkRg/m/Pa4VQpeO+6NWHE7kgZhL6kyIp5nKcwS
3e7mgpcwC+3XfA/6vU3F
=e0Si
-----END PGP SIGNATURE-----
Merge tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma DMA mapping updates from Doug Ledford:
"Drop IB DMA mapping code and use core DMA code instead.
Bart Van Assche noted that the ib DMA mapping code was significantly
similar enough to the core DMA mapping code that with a few changes it
was possible to remove the IB DMA mapping code entirely and switch the
RDMA stack to use the core DMA mapping code.
This resulted in a nice set of cleanups, but touched the entire tree
and has been kept separate for that reason."
* tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it
IB/core: Remove ib_device.dma_device
nvme-rdma: Switch from dma_device to dev.parent
RDS: net: Switch from dma_device to dev.parent
IB/srpt: Modify a debug statement
IB/srp: Switch from dma_device to dev.parent
IB/iser: Switch from dma_device to dev.parent
IB/IPoIB: Switch from dma_device to dev.parent
IB/rxe: Switch from dma_device to dev.parent
IB/vmw_pvrdma: Switch from dma_device to dev.parent
IB/usnic: Switch from dma_device to dev.parent
IB/qib: Switch from dma_device to dev.parent
IB/qedr: Switch from dma_device to dev.parent
IB/ocrdma: Switch from dma_device to dev.parent
IB/nes: Remove a superfluous assignment statement
IB/mthca: Switch from dma_device to dev.parent
IB/mlx5: Switch from dma_device to dev.parent
IB/mlx4: Switch from dma_device to dev.parent
IB/i40iw: Remove a superfluous assignment statement
IB/hns: Switch from dma_device to dev.parent
...
Patch "sparc64: Add 64K page size support"
unconditionally used __flush_huge_tsb_one_entry()
which is available only when hugetlb support is
enabled.
Another issue was incorrect TSB flushing for 64K
pages in flush_tsb_user().
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In add_node_ranges() when memblock resize happens, the iterator keeps using
the previous freed array. This bug cause hangs on machine where there are
over 128 memory blocks during boot. For example, on machines where memory
interleaving is small.
The problem is seen on T4-4 because it cant have 2T of memory, and memory
is interleaved at 8G. So we have 2T/8G = 256 regions to set node IDs. The
starting size of regions array is 128. Thus, we have to double at least one
time (actually we have to double twice because some memory is already
reserved and thus we need more than 256 regions). We start using an
incorrect pointer to the array after the first doubling.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
add_node_ranges() takes 2.6s - 3.6s per 1T of boot time. On machine with 6T
memory it takes 15.4s, on 32T it would take 82s-115s of boot time.
This function sets NUMA ids for memory blocks, and scans the whole memory a
page at a time to do so. But, we could use values in latency groups mask
and match to determine the boundaries without checking every single page.
With the fix the add_node_ranges() time is reduced from 15.4s down to 0.2s
on machine with 6T memory.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch depends on:
[v6] sparc64: Multi-page size support
- Testing
Tested on Sonoma by running stream benchmark instance which allocated
48G worth of 64K pages.
boot params: default_hugepagesz=64K hugepagesz=64K hugepages=1310720
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for using multiple hugepage sizes simultaneously
on mainline. Currently, support for 256M has been added which
can be used along with 8M pages.
Page tables are set like this (e.g. for 256M page):
VA + (8M * x) -> PA + (8M * x) (sz bit = 256M) where x in [0, 31]
and TSB is set similarly:
VA + (4M * x) -> PA + (4M * x) (sz bit = 256M) where x in [0, 63]
- Testing
Tested on Sonoma (which supports 256M pages) by running stream
benchmark instances in parallel: one instance uses 8M pages and
another uses 256M pages, consuming 48G each.
Boot params used:
default_hugepagesz=256M hugepagesz=256M hugepages=300 hugepagesz=8M
hugepages=10000
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On panic, all other CPUs are stopped except the one which had
hit panic. To keep console alive, we need to migrate hvcons irq
to panicked CPU.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CPU needs to be marked offline before stopping it. When not marked
offline, the xcall receives HV_EWOULDBLOCK and so assumes that not all
CPUs received the message, and retries. After 10000 retries, it finally
fails with fatal mondo timeout.
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When returning from the user probe code into userspace process, PC & NPC are
truncated to 32 bits.
Due to shared libraries getting loaded very high in the virtual address
space of
the process, placing a user probe inside a shared library makes the kernel
return into the process at the wrong address, causing it to seg'fault
most of
the time.
This patch prevents truncating PC and NPC.
Signed-off-by: Eric Saint Etienne <eric.saint.etienne@oracle.com>
Reviewed-by: David Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently define macros referring to cpu_data if CONFIG_SMP is
defined, but only include the declaration if CONFIG_NUMA is defined.
Fixes: 541cc39433 ("sparc: fix a building error reported by kbuild")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The objects viking_ops, viking_sun4d_smp_ops and smp_cachetlb_ops of
type sparc32_cachetlb_ops are not modified anywhere after getting modified
in the init functions. Inside init their reference is also stored in a
pointer of type const struct sparc32_cachetlb_ops *. So these structures
are never modified after init, therefore add __ro_after to the declaration
of these structures.
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
show_mem() allows to filter out node specific data which is irrelevant
to the allocation request via SHOW_MEM_FILTER_NODES. The filtering is
done in skip_free_areas_node which skips all nodes which are not in the
mems_allowed of the current process. This works most of the time as
expected because the nodemask shouldn't be outside of the allocating
task but there are some exceptions. E.g. memory hotplug might want to
request allocations from outside of the allowed nodes (see
new_node_page).
Get rid of this hardcoded behavior and push the allocation mask down the
show_mem path and use it instead of cpuset_current_mems_allowed. NULL
nodemask is interpreted as cpuset_current_mems_allowed.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170117091543.25850-5-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have a generic implementation for quite some time already. If there
is any arch specific information to be printed then we should add a
callback called from the generic code rather than duplicate the whole
show_mem.
The current code has resulted in the code duplication and the output
divergence which is both confusing and adds maintainance costs.
Let's just get rid of this mess.
Link: http://lkml.kernel.org/r/20170117091543.25850-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn> [UniCore32]
Acked-by: Helge Deller <deller@gmx.de> [for parisc]
Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking updates from David Miller:
"Highlights:
1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini
Varadhan.
2) Simplify classifier state on sk_buff in order to shrink it a bit.
From Willem de Bruijn.
3) Introduce SIPHASH and it's usage for secure sequence numbers and
syncookies. From Jason A. Donenfeld.
4) Reduce CPU usage for ICMP replies we are going to limit or
suppress, from Jesper Dangaard Brouer.
5) Introduce Shared Memory Communications socket layer, from Ursula
Braun.
6) Add RACK loss detection and allow it to actually trigger fast
recovery instead of just assisting after other algorithms have
triggered it. From Yuchung Cheng.
7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot.
8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert.
9) Export MPLS packet stats via netlink, from Robert Shearman.
10) Significantly improve inet port bind conflict handling, especially
when an application is restarted and changes it's setting of
reuseport. From Josef Bacik.
11) Implement TX batching in vhost_net, from Jason Wang.
12) Extend the dummy device so that VF (virtual function) features,
such as configuration, can be more easily tested. From Phil
Sutter.
13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric
Dumazet.
14) Add new bpf MAP, implementing a longest prefix match trie. From
Daniel Mack.
15) Packet sample offloading support in mlxsw driver, from Yotam Gigi.
16) Add new aquantia driver, from David VomLehn.
17) Add bpf tracepoints, from Daniel Borkmann.
18) Add support for port mirroring to b53 and bcm_sf2 drivers, from
Florian Fainelli.
19) Remove custom busy polling in many drivers, it is done in the core
networking since 4.5 times. From Eric Dumazet.
20) Support XDP adjust_head in virtio_net, from John Fastabend.
21) Fix several major holes in neighbour entry confirmation, from
Julian Anastasov.
22) Add XDP support to bnxt_en driver, from Michael Chan.
23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan.
24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi.
25) Support GRO in IPSEC protocols, from Steffen Klassert"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits)
Revert "ath10k: Search SMBIOS for OEM board file extension"
net: socket: fix recvmmsg not returning error from sock_error
bnxt_en: use eth_hw_addr_random()
bpf: fix unlocking of jited image when module ronx not set
arch: add ARCH_HAS_SET_MEMORY config
net: napi_watchdog() can use napi_schedule_irqoff()
tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"
net/hsr: use eth_hw_addr_random()
net: mvpp2: enable building on 64-bit platforms
net: mvpp2: switch to build_skb() in the RX path
net: mvpp2: simplify MVPP2_PRS_RI_* definitions
net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT
net: mvpp2: remove unused register definitions
net: mvpp2: simplify mvpp2_bm_bufs_add()
net: mvpp2: drop useless fields in mvpp2_bm_pool and related code
net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue'
net: mvpp2: release reference to txq_cpu[] entry after unmapping
net: mvpp2: handle too large value in mvpp2_rx_time_coal_set()
net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set()
net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYrFn+AAoJEOvOhAQsB9HWbDAP/i1bMxYJSnwD2D/PBvzpY2AW
uinEigaRr6kpRCOfa9FrfgxokKfosZOx5h7Se3f6O3mPwgpsU+dqbaE18Z5XSgxh
+a9+HvAv3/XNZg7SvBtBaoYDblHWJ6AJ9rN9fuKg3e8btE3rSFG147vj1atlVz1+
iRsXcCPb1p5db2+wZdsYJPI5Zwt4N0nR6cxPX4RQ6jseiVqPpt/FDtB60RYCjbID
J0cOk1VV1Jn2H1Rfl+hjNQjIPMNx3zftOLQ2usr/kwuEqeuTKZR06yLXFOT6bdXU
6JBdfL+e2kHKbaLyJGr6MCjTokaMgN3SGZJWJqHgk5Nggq5BD+2c4AOs8t6URnE0
KThGiyY+YI5C/W6kMlEozLARiMKe4IIQpx1uj2Hv+YkndntvqjCqvfdQQJKnzm0G
YWfPnsG2dysiovwEOBoBwyFVFLFzzJ1o3uyRGkCzVGaLQVzD5ktAJM6ynMOxwcIn
zSN+agzdTAD7QJIDaa1p2r5fAqy7i4xIn2+ts1s9c410fdUTB4A2QJzTywjPAdCp
IRxcLLpYDeBZ5cbhqjR677WgPtteYFTljoX+/8BOFO2PI+HjKHrxfW02WSiCS0iu
CUndrlpmuyKlIrpw7mYpDTbORcSQSiUkB1pRGT7poh2p0KKAGSo9ZrRbx+qRvPdH
AxO+ZR6Jjj5LAMk5MkRz
=RK2h
-----END PGP SIGNATURE-----
Merge tag 'extable-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Pull exception table module split from Paul Gortmaker:
"Final extable.h related changes.
This completes the separation of exception table content from the
module.h header file. This is achieved with the final commit that
removes the one line back compatible change that sourced extable.h
into the module.h file.
The commits are unchanged since January, with the exception of a
couple Acks that came in for the last two commits a bit later. The
changes have been in linux-next for quite some time[1] and have got
widespread arch coverage via toolchains I have and also from
additional ones the kbuild bot has.
Maintaners of the various arch were Cc'd during the postings to
lkml[2] and informed that the intention was to take the remaining arch
specific changes and lump them together with the final two non-arch
specific changes and submit for this merge window.
The ia64 diffstat stands out and probably warrants a mention. In an
earlier review, Al Viro made a valid comment that the original header
separation of content left something to be desired, and that it get
fixed as a part of this change, hence the larger diffstat"
* tag 'extable-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (21 commits)
module.h: remove extable.h include now users have migrated
core: migrate exception table users off module.h and onto extable.h
cris: migrate exception table users off module.h and onto extable.h
hexagon: migrate exception table users off module.h and onto extable.h
microblaze: migrate exception table users off module.h and onto extable.h
unicore32: migrate exception table users off module.h and onto extable.h
score: migrate exception table users off module.h and onto extable.h
metag: migrate exception table users off module.h and onto extable.h
arc: migrate exception table users off module.h and onto extable.h
nios2: migrate exception table users off module.h and onto extable.h
sparc: migrate exception table users onto extable.h
openrisc: migrate exception table users off module.h and onto extable.h
frv: migrate exception table users off module.h and onto extable.h
sh: migrate exception table users off module.h and onto extable.h
xtensa: migrate exception table users off module.h and onto extable.h
mn10300: migrate exception table users off module.h and onto extable.h
alpha: migrate exception table users off module.h and onto extable.h
arm: migrate exception table users off module.h and onto extable.h
m32r: migrate exception table users off module.h and onto extable.h
ia64: ensure exception table search users include extable.h
...
cputime_t is now only used by two architectures:
* powerpc (when CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y)
* s390
And since the core doesn't use it anymore, we don't need any arch support
from the others. So we can remove their stub implementations.
A final cleanup would be to provide an efficient pure arch
implementation of cputime_to_nsec() for s390 and powerpc and finally
remove include/linux/cputime.h .
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-36-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull sparc fixes from David Miller:
"Several small bug fixes and tidies, along with a fix for non-resumable
memory errors triggered by userspace"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Handle PIO & MEM non-resumable errors.
sparc64: Zero pages on allocation for mondo and error queues.
sparc: Fixed typo in sstate.c. Replaced panicing with panicking
sparc: use symbolic names for tsb indexing
User processes trying to access an invalid memory address via PIO will
receive a SIGBUS signal instead of causing a panic. Memory errors will
receive a SIGKILL since a SIGBUS may result in a coredump which may
attempt to repeat the faulting access.
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Error queues use a non-zero first word to detect if the queues are full.
Using pages that have not been zeroed may result in false positive
overflow events. These queues are set up once during boot so zeroing
all mondo and error queue pages is safe.
Note that the false positive overflow does not always occur because the
page allocation for these queues is so early in the boot cycle that
higher number CPUs get fresh pages. It is only when traps are serviced
with lower number CPUs who were given already used pages that this issue
is exposed.
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This file was using module.h for search_exception_table. We've
now separated that content out into its own file "extable.h" so
now move over to that. Unlike most other instances, we can't
delete the module.h include here since the file needs that for
the within_module_init definition.
Reported-by: kbuild test robot <lkp@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Introduce a new architecture-specific get_arch_dma_ops() function
that takes a struct bus_type * argument. Add get_dma_ops() in
<linux/dma-mapping.h>.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Russell King <linux@armlinux.org.uk>
Cc: x86@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
Relax ordering(RO) is one feature of 82599 NIC, to enable this feature can
enhance the performance for some cpu architecure, such as SPARC and so on.
Currently it only supports one special cpu architecture(SPARC) in 82599
driver to enable RO feature, this is not very common for other cpu architecture
which really needs RO feature.
This patch add one common config CONFIG_ARCH_WANT_RELAX_ORDER to set RO feature,
and should define CONFIG_ARCH_WANT_RELAX_ORDER in sparc Kconfig firstly.
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Reviewed-by: Alexander Duyck <alexander.duyck@gmail.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use symbolic names MM_TSB_BASE and MM_TSB_HUGE instead of numeric values
0 and 1 in __tsb_context_switch. Code cleanup only, no functional change.
Suggested-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no point in having an extra type for extra confusion. u64 is
unambiguous.
Conversion was done with the following coccinelle script:
@rem@
@@
-typedef u64 cycle_t;
@fix@
typedef cycle_t;
@@
-cycle_t
+u64
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement functions watchdog_nmi_enable and watchdog_nmi_disable to
enable/disable nmi watchdog. Sparc uses arch specific nmi watchdog
handler. Currently, we do not have a way to enable/disable nmi watchdog
dynamically. With these patches we can enable or disable arch specific
nmi watchdogs using proc or sysctl interface.
Example commands.
To enable: echo 1 > /proc/sys/kernel/nmi_watchdog
To disable: echo 0 > /proc/sys/kernel/nmi_watchdog
It can also achieved using the sysctl parameter kernel.nmi_watchdog
Link: http://lkml.kernel.org/r/1478034826-43888-4-git-send-email-babu.moger@oracle.com
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Josh Hunt <johunt@akamai.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
via a sync_for_cpu or sync_for_device call.
Link: http://lkml.kernel.org/r/20161110113544.76501.40008.stgit@ahduyck-blue-test.jf.intel.com
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull smp hotplug updates from Thomas Gleixner:
"This is the final round of converting the notifier mess to the state
machine. The removal of the notifiers and the related infrastructure
will happen around rc1, as there are conversions outstanding in other
trees.
The whole exercise removed about 2000 lines of code in total and in
course of the conversion several dozen bugs got fixed. The new
mechanism allows to test almost every hotplug step standalone, so
usage sites can exercise all transitions extensively.
There is more room for improvement, like integrating all the
pointlessly different architecture mechanisms of synchronizing,
setting cpus online etc into the core code"
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
tracing/rb: Init the CPU mask on allocation
soc/fsl/qbman: Convert to hotplug state machine
soc/fsl/qbman: Convert to hotplug state machine
zram: Convert to hotplug state machine
KVM/PPC/Book3S HV: Convert to hotplug state machine
arm64/cpuinfo: Convert to hotplug state machine
arm64/cpuinfo: Make hotplug notifier symmetric
mm/compaction: Convert to hotplug state machine
iommu/vt-d: Convert to hotplug state machine
mm/zswap: Convert pool to hotplug state machine
mm/zswap: Convert dst-mem to hotplug state machine
mm/zsmalloc: Convert to hotplug state machine
mm/vmstat: Convert to hotplug state machine
mm/vmstat: Avoid on each online CPU loops
mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead()
tracing/rb: Convert to hotplug state machine
oprofile/nmi timer: Convert to hotplug state machine
net/iucv: Use explicit clean up labels in iucv_init()
x86/pci/amd-bus: Convert to hotplug state machine
x86/oprofile/nmi: Convert to hotplug state machine
...
Pull locking updates from Ingo Molnar:
"The tree got pretty big in this development cycle, but the net effect
is pretty good:
115 files changed, 673 insertions(+), 1522 deletions(-)
The main changes were:
- Rework and generalize the mutex code to remove per arch mutex
primitives. (Peter Zijlstra)
- Add vCPU preemption support: add an interface to query the
preemption status of vCPUs and use it in locking primitives - this
optimizes paravirt performance. (Pan Xinhui, Juergen Gross,
Christian Borntraeger)
- Introduce cpu_relax_yield() and remov cpu_relax_lowlatency() to
clean up and improve the s390 lock yielding machinery and its core
kernel impact. (Christian Borntraeger)
- Micro-optimize mutexes some more. (Waiman Long)
- Reluctantly add the to-be-deprecated mutex_trylock_recursive()
interface on a temporary basis, to give the DRM code more time to
get rid of its locking hacks. Any other users will be NAK-ed on
sight. (We turned off the deprecation warning for the time being to
not pollute the build log.) (Peter Zijlstra)
- Improve the rtmutex code a bit, in light of recent long lived
bugs/races. (Thomas Gleixner)
- Misc fixes, cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/paravirt: Fix bool return type for PVOP_CALL()
x86/paravirt: Fix native_patch()
locking/ww_mutex: Use relaxed atomics
locking/rtmutex: Explain locking rules for rt_mutex_proxy_unlock()/init_proxy_locked()
locking/rtmutex: Get rid of RT_MUTEX_OWNER_MASKALL
x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()
locking/mutex: Break out of expensive busy-loop on {mutex,rwsem}_spin_on_owner() when owner vCPU is preempted
locking/osq: Break out of spin-wait busy waiting loop for a preempted vCPU in osq_lock()
Documentation/virtual/kvm: Support the vCPU preemption check
x86/xen: Support the vCPU preemption check
x86/kvm: Support the vCPU preemption check
x86/kvm: Support the vCPU preemption check
kvm: Introduce kvm_write_guest_offset_cached()
locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests
locking/spinlocks, s390: Implement vcpu_is_preempted(cpu)
locking/core, powerpc: Implement vcpu_is_preempted(cpu)
sched/core: Introduce the vcpu_is_preempted(cpu) interface
sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q
locking/core: Provide common cpu_relax_yield() definition
locking/mutex: Don't mark mutex_trylock_recursive() as deprecated, temporarily
...
Pull sparc updates from David Miller:
"Just a bunch of small cleanups and fixes here, and support for user
probes from Allen Pais"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: fix a building error reported by kbuild
sparc64: fix typo in pgd_clear()
sparc64: restore irq in error paths in iommu
sparc: leon: Fix a retry loop in leon_init_timers()
sparc64: make string buffers large enough
sparc64: move dereference after check for NULL
sparc: kernel: use builtin_platform_driver
sparc64:Support User Probes for sparc
>> arch/sparc/include/asm/topology_64.h:44:44:
error: implicit declaration of function 'cpu_data'
[-Werror=implicit-function-declaration]
#define topology_physical_package_id(cpu) (cpu_data(cpu).proc_id)
^
Let's include cpudata.h in topology_64.h.
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Suggested-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It really has to be pgdp, not pgd.
It just happend to work since all callers have 'pgd' as an argument.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are some error paths where we should restore IRQs but we don't.
Fixes: bb620c3d39 ("sparc: Make sparc64 use scalable lib/iommu-common.c functions")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The original code causes a static checker warning because it has a
continue inside a do { } while (0); loop. In that context, a continue
and a break are equivalent. The intent was to go back to the start of
the loop so the continue was a bug.
I've added a retry label at the start and changed the continue to a goto
retry. Then I removed the do { } while (0) loop and pulled the code in
one indent level.
Fixes: 2791c1a439 ("SPARC/LEON: added support for selecting Timer Core and Timer within core")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
My static checker complains that if "lvl" is ULONG_MAX (this is 64 bit)
then some of the strings will overflow. I don't know if that's possible
but it seems simple enough to make the buffers slightly larger.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We shouldn't dereference "iommu" until after we have checked that it is
non-NULL.
Fixes: f08978b0fd ("sparc64: Enable sun4v dma ops to use IOMMU v2 APIs")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use builtin_platform_driver() helper to simplify the code.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Saint Etienne <eric.saint.etienne@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch exports the sender chronograph stats via the socket
SO_TIMESTAMPING channel. Currently we can instrument how long a
particular application unit of data was queued in TCP by tracking
SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having
these sender chronograph stats exported simultaneously along with
these timestamps allow further breaking down the various sender
limitation. For example, a video server can tell if a particular
chunk of video on a connection takes a long time to deliver because
TCP was experiencing small receive window. It is not possible to
tell before this patch without packet traces.
To prepare these stats, the user needs to set
SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags
while requesting other SOF_TIMESTAMPING TX timestamps. When the
timestamps are available in the error queue, the stats are returned
in a separate control message of type SCM_TIMESTAMPING_OPT_STATS,
in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME,
TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Unit is microsecond.
Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.
The previous convention of keeping the files around until the CPU is dead
has not been preserved as there is no point to keep them available when the
cpu is going down. This makes the hotplug call symmetric.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Cc: rt@linuxtronix.de
Link: http://lkml.kernel.org/r/20161117183541.8588-18-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Drop duplicate header scatterlist.h from iommu_common.h.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This new config parameter limits the space used for "Lock debugging:
prove locking correctness" by about 4MB. The current sparc systems have
the limitation of 32MB size for kernel size including .text, .data and
.bss sections. With PROVE_LOCKING feature, the kernel size could grow
beyond this limit and causing system boot-up issues. With this option,
kernel limits the size of the entries of lock_chains, stack_trace etc.,
so that kernel fits in required size limit. This is not visible to user
and only used for sparc.
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ATU 64bit addressing allows PCIe devices with 64bit DMA capabilities
to use ATU for 64bit DMA.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: chris hyser <chris.hyser@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add Hypervisor IOMMU v2 APIs pci_iotsb_map(), pci_iotsb_demap() and
enable sun4v dma ops to use IOMMU v2 API for all PCIe devices with
64bit DMA mask.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: chris hyser <chris.hyser@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to use Hypervisor (HV) IOMMU v2 API for map/demap, each PCIe
device has to be bound to IOTSB using HV API pci_iotsb_bind().
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: chris hyser <chris.hyser@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like legacy IOMMU, use common iommu_map_table and iommu_pool for ATU.
This change initializes iommu_map_table and iommu_pool for ATU.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: chris hyser <chris.hyser@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with
Hypervisor IOMMU v2 APIs.
Current SPARC IOMMU supports only 32bit address ranges and one TSB
per PCIe root complex that has a 2GB per root complex DVMA space
limit. The limit has become a scalability bottleneck nowadays that
a typical 10G/40G NIC can consume 300MB-500MB DVMA space per
instance. When DVMA resource is exhausted, devices will not be usable
since the driver can't allocate DVMA.
ATU removes bottleneck by allowing guest os to create IOTSB of size
32G (or more) with 64bit address ranges available in ATU HW. 32G is
more than enough DVMA space to be shared by all PCIe devices under
root complex contrast to 2G space provided by legacy IOMMU.
ATU allows PCIe devices to use 64bit DMA addressing. Devices
which choose to use 32bit DMA mask will continue to work with the
existing legacy IOMMU.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: chris hyser <chris.hyser@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change allows ATU (new IOMMU) in SPARC systems to request
large (32M) contiguous memory during boot for creating IOTSB backing
store.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to duplicate the same define everywhere. Since
the only user is stop-machine and the only provider is
s390, we can use a default implementation of cpu_relax_yield()
in sched.h.
Suggested-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-s390 <linux-s390@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1479298985-191589-1-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As there are no users left, we can remove cpu_relax_lowlatency()
implementations from every architecture.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Cc: <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1477386195-32736-6-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For spinning loops people do often use barrier() or cpu_relax().
For most architectures cpu_relax and barrier are the same, but on
some architectures cpu_relax can add some latency.
For example on power,sparc64 and arc, cpu_relax can shift the CPU
towards other hardware threads in an SMT environment.
On s390 cpu_relax does even more, it uses an hypercall to the
hypervisor to give up the timeslice.
In contrast to the SMT yielding this can result in larger latencies.
In some places this latency is unwanted, so another variant
"cpu_relax_lowlatency" was introduced. Before this is used in more
and more places, lets revert the logic and provide a cpu_relax_yield
that can be called in places where yielding is more important than
latency. By default this is the same as cpu_relax on all architectures.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1477386195-32736-2-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A compile warning is introduced by a commit to fix the find_node().
This patch fix the compile warning by moving find_node() into __init
section. Because find_node() is only used by memblock_nid_range() which
is only used by a __init add_node_ranges(). find_node() and
memblock_nid_range() should also be inside __init section.
Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When booting up LDOM, find_node() warns that a physical address
doesn't match a NUMA node.
WARNING: CPU: 0 PID: 0 at arch/sparc/mm/init_64.c:835
find_node+0xf4/0x120 find_node: A physical address doesn't
match a NUMA node rule. Some physical memory will be
owned by node 0.Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-rc3 #4
Call Trace:
[0000000000468ba0] __warn+0xc0/0xe0
[0000000000468c74] warn_slowpath_fmt+0x34/0x60
[00000000004592f4] find_node+0xf4/0x120
[0000000000dd0774] add_node_ranges+0x38/0xe4
[0000000000dd0b1c] numa_parse_mdesc+0x268/0x2e4
[0000000000dd0e9c] bootmem_init+0xb8/0x160
[0000000000dd174c] paging_init+0x808/0x8fc
[0000000000dcb0d0] setup_arch+0x2c8/0x2f0
[0000000000dc68a0] start_kernel+0x48/0x424
[0000000000dcb374] start_early_boot+0x27c/0x28c
[0000000000a32c08] tlb_fixup_done+0x4c/0x64
[0000000000027f08] 0x27f08
It is because linux use an internal structure node_masks[] to
keep the best memory latency node only. However, LDOM mdesc can
contain single latency-group with multiple memory latency nodes.
If the address doesn't match the best latency node within
node_masks[], it should check for an alternative via mdesc.
The warning message should only be printed if the address
doesn't match any node_masks[] nor within mdesc. To minimize
the impact of searching mdesc every time, the last matched
mask and index is stored in a variable.
Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the vmalloc area gets fragmented, and because the firmware
mapping area sits between where modules live and the vmalloc area, we
can sometimes receive requests for enormous kernel TLB range flushes.
When this happens the cpu just spins flushing billions of pages and
this triggers the NMI watchdog and other problems.
We took care of this on the TSB side by doing a linear scan of the
table once we pass a certain threshold.
Do something similar for the TLB flush, however we are limited by
the TLB flush facilities provided by the different chip variants.
First of all we use an (mostly arbitrary) cut-off of 256K which is
about 32 pages. This can be tuned in the future.
The huge range code path for each chip works as follows:
1) On spitfire we flush all non-locked TLB entries using diagnostic
acceses.
2) On cheetah we use the "flush all" TLB flush.
3) On sun4v/hypervisor we do a TLB context flush on context 0, which
unlike previous chips does not remove "permanent" or locked
entries.
We could probably do something better on spitfire, such as limiting
the flush to kernel TLB entries or even doing range comparisons.
However that probably isn't worth it since those chips are old and
the TLB only had 64 entries.
Reported-by: James Clarke <jrtc27@jrtc27.com>
Tested-by: James Clarke <jrtc27@jrtc27.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just like the non-cross-call TLB flush handlers, the cross-call ones need
to avoid doing PC-relative branches outside of their code blocks.
Signed-off-by: David S. Miller <davem@davemloft.net>
If the number of pages we are flushing is more than twice the number
of entries in the TSB, just scan the TSB table for matches rather
than probing each and every page in the range.
Based upon a patch and report by James Clarke.
Signed-off-by: David S. Miller <davem@davemloft.net>
Additionally, if the offset will overflow the immediate for a ba,pt
instruction, fall back on a standard ba to get an extra 3 bits.
Signed-off-by: James Clarke <jrtc27@jrtc27.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we copy code over to patch another piece of code, we can only use
PC-relative branches that target code within that piece of code.
Such PC-relative branches cannot be made to external symbols because
the patch moves the location of the code and thus modifies the
relative address of external symbols.
Use an absolute jmpl to fix this problem.
Signed-off-by: David S. Miller <davem@davemloft.net>
Its all generic atomic_long_t stuff now.
Tested-by: Jason Low <jason.low2@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that all of the user copy routines are converted to return
accurate residual lengths when an exception occurs, we no longer need
the broken fixup routines.
Signed-off-by: David S. Miller <davem@davemloft.net>
Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.
Signed-off-by: David S. Miller <davem@davemloft.net>
Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.
Signed-off-by: David S. Miller <davem@davemloft.net>
Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.
Signed-off-by: David S. Miller <davem@davemloft.net>
Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.
Signed-off-by: David S. Miller <davem@davemloft.net>
Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.
Signed-off-by: David S. Miller <davem@davemloft.net>
Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.
Signed-off-by: David S. Miller <davem@davemloft.net>
Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.
Signed-off-by: David S. Miller <davem@davemloft.net>
The fixup helper function mechanism for handling user copy fault
handling is not %100 accurrate, and can never be made so.
We are going to transition the code to return the running return
return length, which is always kept track in one or more registers
of each of these routines.
In order to convert them one by one, we have to allow the existing
behavior to continue functioning.
Therefore make all the copy code that wants the fixup helper to be
used return negative one.
After all of the user copy routines have been converted, this logic
and the fixup helpers themselves can be removed completely.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix [-Wold-style-declaration] GCC warnings by moving the inline keyword
before the return type.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix [-Wold-style-declaration] GCC warnings by moving the inline keyword
before the return type.
Signed-off-by: Tobias Klnuser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Individual scheduler domain should consist different hierarchy
consisting of cores sharing similar property. Currently, no
scheduler domain is defined separately for the cores that shares
the last level cache. As a result, the scheduler fails to take
advantage of cache locality while migrating tasks during load
balancing.
Here are the cpu masks currently present for sparc that are/can
be used in scheduler domain construction.
cpu_core_map : set based on the cores that shares l1 cache.
core_core_sib_map : is set based on the socket id.
The prior SPARC notion of socket was defined as highest level of
shared cache. However, the MD record on T7 platforms now describes
the CPUs that share the physical socket and this is no longer tied
to shared cache.
That's why a separate cpu mask needs to be created that truly
represent highest level of shared cache for all platforms.
Signed-off-by: Atish Patra <atish.patra@oracle.com>
Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge the gup_flags cleanups from Lorenzo Stoakes:
"This patch series adjusts functions in the get_user_pages* family such
that desired FOLL_* flags are passed as an argument rather than
implied by flags.
The purpose of this change is to make the use of FOLL_FORCE explicit
so it is easier to grep for and clearer to callers that this flag is
being used. The use of FOLL_FORCE is an issue as it overrides missing
VM_READ/VM_WRITE flags for the VMA whose pages we are reading
from/writing to, which can result in surprising behaviour.
The patch series came out of the discussion around commit 38e0885465
("mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing"),
which addressed a BUG_ON() being triggered when a page was faulted in
with PROT_NONE set but having been overridden by FOLL_FORCE.
do_numa_page() was run on the assumption the page _must_ be one marked
for NUMA node migration as an actual PROT_NONE page would have been
dealt with prior to this code path, however FOLL_FORCE introduced a
situation where this assumption did not hold.
See
https://marc.info/?l=linux-mm&m=147585445805166
for the patch proposal"
Additionally, there's a fix for an ancient bug related to FOLL_FORCE and
FOLL_WRITE by me.
[ This branch was rebased recently to add a few more acked-by's and
reviewed-by's ]
* gup_flag-cleanups:
mm: replace access_process_vm() write parameter with gup_flags
mm: replace access_remote_vm() write parameter with gup_flags
mm: replace __access_remote_vm() write parameter with gup_flags
mm: replace get_user_pages_remote() write/force parameters with gup_flags
mm: replace get_user_pages() write/force parameters with gup_flags
mm: replace get_vaddr_frames() write/force parameters with gup_flags
mm: replace get_user_pages_locked() write/force parameters with gup_flags
mm: replace get_user_pages_unlocked() write/force parameters with gup_flags
mm: remove write/force parameters from __get_user_pages_unlocked()
mm: remove write/force parameters from __get_user_pages_locked()
mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
This removes the 'write' argument from access_process_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.
We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This removes the 'write' and 'force' use from get_user_pages_unlocked()
and replaces them with 'gup_flags' to make the use of FOLL_FORCE
explicit in callers as use of this flag can result in surprising
behaviour (and hence bugs) within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull kbuild updates from Michal Marek:
- EXPORT_SYMBOL for asm source by Al Viro.
This does bring a regression, because genksyms no longer generates
checksums for these symbols (CONFIG_MODVERSIONS). Nick Piggin is
working on a patch to fix this.
Plus, we are talking about functions like strcpy(), which rarely
change prototypes.
- Fixes for PPC fallout of the above by Stephen Rothwell and Nick
Piggin
- fixdep speedup by Alexey Dobriyan.
- preparatory work by Nick Piggin to allow architectures to build with
-ffunction-sections, -fdata-sections and --gc-sections
- CONFIG_THIN_ARCHIVES support by Stephen Rothwell
- fix for filenames with colons in the initramfs source by me.
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (22 commits)
initramfs: Escape colons in depfile
ppc: there is no clear_pages to export
powerpc/64: whitelist unresolved modversions CRCs
kbuild: -ffunction-sections fix for archs with conflicting sections
kbuild: add arch specific post-link Makefile
kbuild: allow archs to select link dead code/data elimination
kbuild: allow architectures to use thin archives instead of ld -r
kbuild: Regenerate genksyms lexer
kbuild: genksyms fix for typeof handling
fixdep: faster CONFIG_ search
ia64: move exports to definitions
sparc32: debride memcpy.S a bit
[sparc] unify 32bit and 64bit string.h
sparc: move exports to definitions
ppc: move exports to definitions
arm: move exports to definitions
s390: move exports to definitions
m68k: move exports to definitions
alpha: move exports to actual definitions
x86: move exports to actual definitions
...
Pull uaccess.h prepwork from Al Viro:
"Preparations to tree-wide switch to use of linux/uaccess.h (which,
obviously, will allow to start unifying stuff for real). The last step
there, ie
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
`git grep -l "$PATT"|grep -v ^include/linux/uaccess.h`
is not taken here - I would prefer to do it once just before or just
after -rc1. However, everything should be ready for it"
* 'work.uaccess2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
remove a stray reference to asm/uaccess.h in docs
sparc64: separate extable_64.h, switch elf_64.h to it
score: separate extable.h, switch module.h to it
mips: separate extable.h, switch module.h to it
x86: separate extable.h, switch sections.h to it
remove stray include of asm/uaccess.h from cacheflush.h
mn10300: remove a bogus processor.h->uaccess.h include
xtensa: split uaccess.h into C and asm sides
bonding: quit messing with IOCTL
kill __kernel_ds_p off
mn10300: finish verify_area() off
frv: move HAVE_ARCH_UNMAPPED_AREA to pgtable.h
exceptions: detritus removal
When doing an nmi backtrace of many cores, most of which are idle, the
output is a little overwhelming and very uninformative. Suppress
messages for cpus that are idling when they are interrupted and just
emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
We do this by grouping all the cpuidle code together into a new
.cpuidle.text section, and then checking the address of the interrupted
PC to see if it lies within that section.
This commit suitably tags x86 and tile idle routines, and only adds in
the minimal framework for other architectures.
Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
Tested-by: Petr Mladek <pmladek@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "improvements to the nmi_backtrace code" v9.
This patch series modifies the trigger_xxx_backtrace() NMI-based remote
backtracing code to make it more flexible, and makes a few small
improvements along the way.
The motivation comes from the task isolation code, where there are
scenarios where we want to be able to diagnose a case where some cpu is
about to interrupt a task-isolated cpu. It can be helpful to see both
where the interrupting cpu is, and also an approximation of where the
cpu that is being interrupted is. The nmi_backtrace framework allows us
to discover the stack of the interrupted cpu.
I've tested that the change works as desired on tile, and build-tested
x86, arm, mips, and sparc64. For x86 I confirmed that the generic
cpuidle stuff as well as the architecture-specific routines are in the
new cpuidle section. For arm, mips, and sparc I just build-tested it
and made sure the generic cpuidle routines were in the new cpuidle
section, but I didn't attempt to figure out which the platform-specific
idle routines might be. That might be more usefully done by someone
with platform experience in follow-up patches.
This patch (of 4):
Currently you can only request a backtrace of either all cpus, or all
cpus but yourself. It can also be helpful to request a remote backtrace
of a single cpu, and since we want that, the logical extension is to
support a cpumask as the underlying primitive.
This change modifies the existing lib/nmi_backtrace.c code to take a
cpumask as its basic primitive, and modifies the linux/nmi.h code to use
the new "cpumask" method instead.
The existing clients of nmi_backtrace (arm and x86) are converted to
using the new cpumask approach in this change.
The other users of the backtracing API (sparc64 and mips) are converted
to use the cpumask approach rather than the all/allbutself approach.
The mips code ignored the "include_self" boolean but with this change it
will now also dump a local backtrace if requested.
Link: http://lkml.kernel.org/r/1472487169-14923-2-git-send-email-cmetcalf@mellanox.com
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This came to light when implementing native 64-bit atomics for ARCv2.
The atomic64 self-test code uses CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
to check whether atomic64_dec_if_positive() is available. It seems it
was needed when not every arch defined it. However as of current code
the Kconfig option seems needless
- for CONFIG_GENERIC_ATOMIC64 it is auto-enabled in lib/Kconfig and a
generic definition of API is present lib/atomic64.c
- arches with native 64-bit atomics select it in arch/*/Kconfig and
define the API in their headers
So I see no point in keeping the Kconfig option
Compile tested for:
- blackfin (CONFIG_GENERIC_ATOMIC64)
- x86 (!CONFIG_GENERIC_ATOMIC64)
- ia64
Link: http://lkml.kernel.org/r/1473703083-8625-3-git-send-email-vgupta@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ming Lin <ming.l@ssi.samsung.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull sparc updates from David Miller:
"Besides some cleanups the major thing here is supporting relaxed
ordering PCIe transactions on newer sparc64 machines, from Chris
Hyser"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: fixing ident and beautifying code
sparc64: Enable setting "relaxed ordering" in IOMMU mappings
sparc64: Enable PCI IOMMU version 2 API
sparc: migrate exception table users off module.h and onto extable.h
Good evening,
Following LinuxCodingStyle documentation and with the help of Sam, fixed
severals identation issues in the code, and few others cosmetic changes
And last and i hope least fixing my name :)
Signed-off-by : Dominique Carrel <netmonk@netmonk.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable relaxed ordering for memory writes in IOMMU TSB entry from
dma_4v_alloc_coherent(), dma_4v_map_page() and dma_4v_map_sg() when
dma_attrs DMA_ATTR_WEAK_ORDERING is set. This requires PCI IOMMU I/O
Translation Services version 2.0 API.
Many PCIe devices allow enabling relaxed-ordering (memory writes bypassing
other memory writes) for various DMA buffers. A notable exception is the
Mellanox mlx4 IB adapter. Due to the nature of x86 HW this appears to have
little performance impact there. On SPARC HW however, this results in major
performance degradation getting only about 3Gbps. Enabling RO in the IOMMU
entries corresponding to mlx4 data buffers increases the throughput to
about 13 Gbps.
Orabug: 19245907
Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable Version 2 of the PCI IOMMU API needed for advanced features
such as PCI Relaxed Ordering and greater than 2 GB DMA address
space per root complex.
Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These files were only including module.h for exception table
related functions. We've now separated that content out into its
own file "extable.h" so now move over to that and avoid all the
extra header content in module.h that we don't really need to compile
these files.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull CPU hotplug updates from Thomas Gleixner:
"Yet another batch of cpu hotplug core updates and conversions:
- Provide core infrastructure for multi instance drivers so the
drivers do not have to keep custom lists.
- Convert custom lists to the new infrastructure. The block-mq custom
list conversion comes through the block tree and makes the diffstat
tip over to more lines removed than added.
- Handle unbalanced hotplug enable/disable calls more gracefully.
- Remove the obsolete CPU_STARTING/DYING notifier support.
- Convert another batch of notifier users.
The relayfs changes which conflicted with the conversion have been
shipped to me by Andrew.
The remaining lot is targeted for 4.10 so that we finally can remove
the rest of the notifiers"
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
cpufreq: Fix up conversion to hotplug state machine
blk/mq: Reserve hotplug states for block multiqueue
x86/apic/uv: Convert to hotplug state machine
s390/mm/pfault: Convert to hotplug state machine
mips/loongson/smp: Convert to hotplug state machine
mips/octeon/smp: Convert to hotplug state machine
fault-injection/cpu: Convert to hotplug state machine
padata: Convert to hotplug state machine
cpufreq: Convert to hotplug state machine
ACPI/processor: Convert to hotplug state machine
virtio scsi: Convert to hotplug state machine
oprofile/timer: Convert to hotplug state machine
block/softirq: Convert to hotplug state machine
lib/irq_poll: Convert to hotplug state machine
x86/microcode: Convert to hotplug state machine
sh/SH-X3 SMP: Convert to hotplug state machine
ia64/mca: Convert to hotplug state machine
ARM/OMAP/wakeupgen: Convert to hotplug state machine
ARM/shmobile: Convert to hotplug state machine
arm64/FP/SIMD: Convert to hotplug state machine
...
Pull low-level x86 updates from Ingo Molnar:
"In this cycle this topic tree has become one of those 'super topics'
that accumulated a lot of changes:
- Add CONFIG_VMAP_STACK=y support to the core kernel and enable it on
x86 - preceded by an array of changes. v4.8 saw preparatory changes
in this area already - this is the rest of the work. Includes the
thread stack caching performance optimization. (Andy Lutomirski)
- switch_to() cleanups and all around enhancements. (Brian Gerst)
- A large number of dumpstack infrastructure enhancements and an
unwinder abstraction. The secret long term plan is safe(r) live
patching plus maybe another attempt at debuginfo based unwinding -
but all these current bits are standalone enhancements in a frame
pointer based debug environment as well. (Josh Poimboeuf)
- More __ro_after_init and const annotations. (Kees Cook)
- Enable KASLR for the vmemmap memory region. (Thomas Garnier)"
[ The virtually mapped stack changes are pretty fundamental, and not
x86-specific per se, even if they are only used on x86 right now. ]
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
x86/asm: Get rid of __read_cr4_safe()
thread_info: Use unsigned long for flags
x86/alternatives: Add stack frame dependency to alternative_call_2()
x86/dumpstack: Fix show_stack() task pointer regression
x86/dumpstack: Remove dump_trace() and related callbacks
x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder
oprofile/x86: Convert x86_backtrace() to use the new unwinder
x86/stacktrace: Convert save_stack_trace_*() to use the new unwinder
perf/x86: Convert perf_callchain_kernel() to use the new unwinder
x86/unwind: Add new unwind interface and implementations
x86/dumpstack: Remove NULL task pointer convention
fork: Optimize task creation by caching two thread stacks per CPU if CONFIG_VMAP_STACK=y
sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK
lib/syscall: Pin the task stack in collect_syscall()
x86/process: Pin the target stack in get_wchan()
x86/dumpstack: Pin the target stack when dumping it
kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function
sched/core: Add try_get_task_stack() and put_task_stack()
x86/entry/64: Fix a minor comment rebase error
iommu/amd: Don't put completion-wait semaphore on stack
...
Need to provide a dummy smp_fill_in_cpu_possible_map.
Fixes: 9b2f753ec2 ("sparc64: Fix cpu_possible_mask if nr_cpus is set")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, irq stack bootmem is allocated for all possible cpus
before nr_cpus value changes the list of possible cpus. As a result,
there is unnecessary wastage of bootmemory.
Move the irq stack bootmem allocation so that it happens after
possible cpu list is modified based on nr_cpus value.
Signed-off-by: Atish Patra <atish.patra@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If kernel boot parameter nr_cpus is set, it should define the number
of CPUs that can ever be available in the system i.e.
cpu_possible_mask. setup_nr_cpu_ids() overrides the nr_cpu_ids based
on the cpu_possible_mask during kernel initialization. If
cpu_possible_mask is not set based on the nr_cpus value, earlier part
of the kernel would be initialized using nr_cpus value leading to a
kernel crash.
Set cpu_possible_mask based on nr_cpus value. Thus setup_nr_cpu_ids()
becomes redundant and does not corrupt nr_cpu_ids value.
Signed-off-by: Atish Patra <atish.patra@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit af1b1a9b36 ("sparc64 mm: Fix base TSB sizing when hugetlb
pages are used") addressed the difference between hugetlb and THP
pages when computing TSB sizes. The following additional issues
were also discovered while working with the code.
In order to save memory, THP makes use of a huge zero page. This huge
zero page does not count against a task's RSS, but it does consume TSB
entries. This is similar to hugetlb pages. Therefore, count huge
zero page entries in hugetlb_pte_count.
Accounting of THP pages is done in the routine set_pmd_at().
Unfortunately, this does not catch the case where a THP page is split.
To handle this case, decrement the count in pmdp_invalidate().
pmdp_invalidate is only called when splitting a THP. However, 'sanity
checks' are added in case it is ever called for other purposes.
A more general issue exists with HPAGE_SIZE accounting.
hugetlb_pte_count tracks the number of HPAGE_SIZE (8M) pages. This
value is used to size the TSB for HPAGE_SIZE pages. However,
each HPAGE_SIZE page consists of two REAL_HPAGE_SIZE (4M) pages.
The TSB contains an entry for each REAL_HPAGE_SIZE page. Therefore,
the number of REAL_HPAGE_SIZE pages should be used to size the huge
page TSB. A new compile time constant REAL_HPAGE_PER_HPAGE is used
to multiply hugetlb_pte_count before sizing the TSB.
Changes from V1
- Fixed build issue if hugetlb or THP not configured
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To fix:
WARNING: vmlinux.o(.text.unlikely+0x580): Section mismatch in
reference from the function find_numa_latencies_for_group() to the
function .init.text:find_mlgroup()
The function find_numa_latencies_for_group() references the
function __init find_mlgroup(). This is often because
find_numa_latencies_for_group lacks a __init annotation or the
annotation of find_mlgroup is wrong.
It turns out find_numa_latencies_for_group is only called from:
static int __init numa_parse_mdesc(void)
and hence we can tag find_numa_latencies_for_group with __init.
In doing so we see that find_best_numa_node_for_mlgroup is only
called from within __init and hence can also be marked with __init.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Nitin Gupta <nitin.m.gupta@oracle.com>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: sparclinux@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull uaccess fixes from Al Viro:
"Fixes for broken uaccess primitives - mostly lack of proper zeroing
in copy_from_user()/get_user()/__get_user(), but for several
architectures there's more (broken clear_user() on frv and
strncpy_from_user() on hexagon)"
* 'uaccess-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits)
avr32: fix copy_from_user()
microblaze: fix __get_user()
microblaze: fix copy_from_user()
m32r: fix __get_user()
blackfin: fix copy_from_user()
sparc32: fix copy_from_user()
sh: fix copy_from_user()
sh64: failing __get_user() should zero
score: fix copy_from_user() and friends
score: fix __get_user/get_user
s390: get_user() should zero on failure
ppc32: fix copy_from_user()
parisc: fix copy_from_user()
openrisc: fix copy_from_user()
nios2: fix __get_user()
nios2: copy_from_user() should zero the tail of destination
mn10300: copy_from_user() should zero on access_ok() failure...
mn10300: failing __get_user() and get_user() should zero
mips: copy_from_user() must zero the destination on access_ok() failure
ARC: uaccess: get_user to zero out dest in cause of fault
...
Instead of having each caller of check_object_size() need to remember to
check for a const size parameter, move the check into check_object_size()
itself. This actually matches the original implementation in PaX, though
this commit cleans up the now-redundant builtin_const() calls in the
various architectures.
Signed-off-by: Kees Cook <keescook@chromium.org>
All users are converted to state machine, remove CPU_STARTING and the
corresponding CPU_DYING.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160818125731.27256-2-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Storing this value will help prevent unwinders from getting out of sync
with the function graph tracer ret_stack. Now instead of needing a
stateful iterator, they can compare the return address pointer to find
the right ret_stack entry.
Note that an array of 50 ftrace_ret_stack structs is allocated for every
task. So when an arch implements this, it will add either 200 or 400
bytes of memory usage per task (depending on whether it's a 32-bit or
64-bit platform).
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a95cfcc39e8f26b89a430c56926af0bb217bc0a1.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make HAVE_FUNCTION_GRAPH_FP_TEST a normal define, independent from
kconfig. This removes some config file pollution and simplifies the
checking for the fp test.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2c4e5f05054d6d367f702fd153af7a0109dd5c81.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cleanups:
- huge cleanup of rtc-generic and char/genrtc this allowed to cleanup rtc-cmos,
rtc-sh, rtc-m68k, rtc-powerpc and rtc-parisc
- move mn10300 to rtc-cmos
Subsystem:
- fix wakealarms after hibernate
- multiples fixes for rctest
- simplify implementations of .read_alarm
New drivers:
- Maxim MAX6916
Drivers:
- ds1307: fix weekday
- m41t80: add wakeup support
- pcf85063: add support for PCF85063A variant
- rv8803: extend i2c fix and other fixes
- s35390a: fix alarm reading, this fixes instant reboot after shutdown for QNAP
TS-41x
- s3c: clock fixes
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJXokhIAAoJENiigzvaE+LCZqQP+wWzintN/N1u3dKiVB7iSdwq
+S/jAXD9wW8OK9PI60/YUGRYeUXmZW9t4XYg1VKCxU9KpVC17LgOtDyXD8BufP1V
uREJEzZw9O7zCCjeHp/ICFjBkc62Net6ZDOO+ZyXPNfddpS1Xq1uUgXLZc/202UR
ID/kewu0pJRDnoxyqznWn9+8D33w/ygXs2slY2Ive0ONtjdgxGcsj2rNbb2RYn2z
OP7br3lLg7qkFh4TtXb61eh/9GYIk6wzP/CrX5l/jH4SjQnrIk5g/X/Cd1qQ/qso
JZzFoonOKvIp5Gw/+fZ9NP3YFcnkoRMv4NjZV8PAmsYLds+ibRiBcoB8u6FmiJV7
WW5uopgPkfCGN5BV3+QHwJDVe+WlgnlzaT5zPUCcP5KWusDts4fWIgzP7vrtAzf4
3OJLrgSGdBeOqWnJD21nxKUD27JOseX7D+BFtwxR4lMsXHqlHJfETpZ8gts1ZGH3
2U353j/jkZvGWmc6dMcuxOXT2K4VqpYeIIqs0IcLu6hM9crtR89zPR2Iu1AilfDW
h2NroF+Q//SgMMzWoTEG6Tn7RAc7MthgA/tRCFZF9CBMzNs988w0CTHnKsIHmjpU
UKkMeJGAC9YrPYIcqrg0oYsmLUWXc8JuZbGJBnei3BzbaMTlcwIN9qj36zfq6xWc
TMLpbWEoIsgFIZMP/hAP
=rpGB
-----END PGP SIGNATURE-----
Merge tag 'rtc-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"RTC for 4.8
Cleanups:
- huge cleanup of rtc-generic and char/genrtc this allowed to cleanup
rtc-cmos, rtc-sh, rtc-m68k, rtc-powerpc and rtc-parisc
- move mn10300 to rtc-cmos
Subsystem:
- fix wakealarms after hibernate
- multiples fixes for rctest
- simplify implementations of .read_alarm
New drivers:
- Maxim MAX6916
Drivers:
- ds1307: fix weekday
- m41t80: add wakeup support
- pcf85063: add support for PCF85063A variant
- rv8803: extend i2c fix and other fixes
- s35390a: fix alarm reading, this fixes instant reboot after
shutdown for QNAP TS-41x
- s3c: clock fixes"
* tag 'rtc-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (65 commits)
rtc: rv8803: Clear V1F when setting the time
rtc: rv8803: Stop the clock while setting the time
rtc: rv8803: Always apply the I²C workaround
rtc: rv8803: Fix read day of week
rtc: rv8803: Remove the check for valid time
rtc: rv8803: Kconfig: Indicate rx8900 support
rtc: asm9260: remove .owner field for driver
rtc: at91sam9: Fix missing spin_lock_init()
rtc: m41t80: add suspend handlers for alarm IRQ
rtc: m41t80: make it a real error message
rtc: pcf85063: Add support for the PCF85063A device
rtc: pcf85063: fix year range
rtc: hym8563: in .read_alarm set .tm_sec to 0 to signal minute accuracy
rtc: explicitly set tm_sec = 0 for drivers with minute accurancy
rtc: s3c: Add s3c_rtc_{enable/disable}_clk in s3c_rtc_setfreq()
rtc: s3c: Remove unnecessary call to disable already disabled clock
rtc: abx80x: use devm_add_action_or_reset()
rtc: m41t80: use devm_add_action_or_reset()
rtc: fix a typo and reduce three empty lines to one
rtc: s35390a: improve two comments in .set_alarm
...
The jump table can reference text found in an __exit section. Thus,
instead of discarding it at build/link time, include EXIT_TEXT as part
of __init and release it at system boot time.
Without this patch the link fails with:
`.exit.text' referenced in section `__jump_table' of xxx.o:
defined in discarded section `.exit.text' of xxx.o
Link: http://lkml.kernel.org/r/d822da427ab07a02a394602eca687104ff682f83.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer. Thus the pointer can point to const data.
However the attributes do not have to be a bitfield. Instead unsigned
long will do fine:
1. This is just simpler. Both in terms of reading the code and setting
attributes. Instead of initializing local attributes on the stack
and passing pointer to it to dma_set_attr(), just set the bits.
2. It brings safeness and checking for const correctness because the
attributes are passed by value.
Semantic patches for this change (at least most of them):
virtual patch
virtual context
@r@
identifier f, attrs;
@@
f(...,
- struct dma_attrs *attrs
+ unsigned long attrs
, ...)
{
...
}
@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)
and
// Options: --all-includes
virtual patch
virtual context
@r@
identifier f, attrs;
type t;
@@
t f(..., struct dma_attrs *attrs);
@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)
Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge yet more updates from Andrew Morton:
- the rest of ocfs2
- various hotfixes, mainly MM
- quite a bit of misc stuff - drivers, fork, exec, signals, etc.
- printk updates
- firmware
- checkpatch
- nilfs2
- more kexec stuff than usual
- rapidio updates
- w1 things
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (111 commits)
ipc: delete "nr_ipc_ns"
kcov: allow more fine-grained coverage instrumentation
init/Kconfig: add clarification for out-of-tree modules
config: add android config fragments
init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig
relay: add global mode support for buffer-only channels
init: allow blacklisting of module_init functions
w1:omap_hdq: fix regression
w1: add helper macro module_w1_family
w1: remove need for ida and use PLATFORM_DEVID_AUTO
rapidio/switches: add driver for IDT gen3 switches
powerpc/fsl_rio: apply changes for RIO spec rev 3
rapidio: modify for rev.3 specification changes
rapidio: change inbound window size type to u64
rapidio/idt_gen2: fix locking warning
rapidio: fix error handling in mbox request/release functions
rapidio/tsi721_dma: advance queue processing from transfer submit call
rapidio/tsi721: add messaging mbox selector parameter
rapidio/tsi721: add PCIe MRRS override parameter
rapidio/tsi721_dma: add channel mask and queue size parameters
...
In general, there's no need for the "restore sigmask" flag to live in
ti->flags. alpha, ia64, microblaze, powerpc, sh, sparc (64-bit only),
tile, and x86 use essentially identical alternative implementations,
placing the flag in ti->status.
Replace those optimized implementations with an equally good common
implementation that stores it in a bitfield in struct task_struct and
drop the custom implementations.
Additional architectures can opt in by removing their
TIF_RESTORE_SIGMASK defines.
Link: http://lkml.kernel.org/r/8a14321d64a28e40adfddc90e18a96c086a6d6f9.1468522723.git.luto@kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For PMD aligned (8M) hugepages, we currently allocate
all four page table levels which is wasteful. We now
allocate till PMD level only which saves memory usage
from page tables.
Also, when freeing page table for 8M hugepage backed region,
make sure we don't try to access non-existent PTE level.
Orabug: 22630259
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
do_sparc64_fault() calculates both the base and huge page RSS sizes and
uses this information in calls to tsb_grow(). The calculation for base
page TSB size is not correct if the task uses hugetlb pages. hugetlb
pages are not accounted for in RSS, therefore the call to get_mm_rss(mm)
does not include hugetlb pages. However, the number of pages based on
huge_pte_count (which does include hugetlb pages) is subtracted from
this value. This will result in an artificially small and often negative
RSS calculation. The base TSB size is then often set to max_tsb_size
as the passed RSS is unsigned, so a negative value looks really big.
THP pages are also accounted for in huge_pte_count, and THP pages are
accounted for in RSS so the calculation in do_sparc64_fault() is correct
if a task only uses THP pages.
A single huge_pte_count is not sufficient for TSB sizing if both hugetlb
and THP pages can be used. Instead of a single counter, use two: one
for hugetlb and one for THP.
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Smatch complains that these tests are off by one, which is true but not
life threatening.
arch/sparc/kernel/irq_32.c:169 irq_link()
error: buffer overflow 'irq_map' 384 <= 384
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On pre-Niagara systems, we fetch the fault address on data TLB
exceptions from the TLB_TAG_ACCESS register. But this register also
contains the context ID assosciated with the fault in the low 13 bits
of the register value.
This propagates into current_thread_info()->fault_address and can
cause trouble later on.
So clear the low 13-bits out of the TLB_TAG_ACCESS value in the cases
where it matters.
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull irq updates from Thomas Gleixner:
"The irq department delivers:
- new core infrastructure to allow better management of multi-queue
devices (interrupt spreading, node aware descriptor allocation ...)
- a new interrupt flow handler to support the new fangled Intel VMD
devices.
- yet another new interrupt controller driver.
- a series of fixes which addresses sparse warnings, missing
includes, missing static declarations etc from Ben Dooks.
- a fix for the error handling in the hierarchical domain allocation
code.
- the usual pile of small updates to core and driver code"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
genirq: Fix missing irq allocation affinity hint
irqdomain: Fix irq_domain_alloc_irqs_recursive() error handling
irq/Documentation: Correct result of echnoing 5 to smp_affinity
MAINTAINERS: Remove Jiang Liu from irq domains
genirq/msi: Fix broken debug output
genirq: Add a helper to spread an affinity mask for MSI/MSI-X vectors
genirq/msi: Make use of affinity aware allocations
genirq: Use affinity hint in irqdesc allocation
genirq: Add affinity hint to irq allocation
genirq: Introduce IRQD_AFFINITY_MANAGED flag
genirq/msi: Remove unused MSI_FLAG_IDENTITY_MAP
irqchip/s3c24xx: Fixup IO accessors for big endian
irqchip/exynos-combiner: Fix usage of __raw IO
irqdomain: Fix disposal of mappings for interrupt hierarchies
irqchip/aspeed-vic: Add irq controller for Aspeed
doc/devicetree: Add Aspeed VIC bindings
x86/PCI/VMD: Use untracked irq handler
genirq: Add untracked irq handler
irqchip/mips-gic: Populate irq_domain names
irqchip/gicv3-its: Implement two-level(indirect) device table support
...
Pull locking updates from Ingo Molnar:
"The locking tree was busier in this cycle than the usual pattern - a
couple of major projects happened to coincide.
The main changes are:
- implement the atomic_fetch_{add,sub,and,or,xor}() API natively
across all SMP architectures (Peter Zijlstra)
- add atomic_fetch_{inc/dec}() as well, using the generic primitives
(Davidlohr Bueso)
- optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
Waiman Long)
- optimize smp_cond_load_acquire() on arm64 and implement LSE based
atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
on arm64 (Will Deacon)
- introduce smp_acquire__after_ctrl_dep() and fix various barrier
mis-uses and bugs (Peter Zijlstra)
- after discovering ancient spin_unlock_wait() barrier bugs in its
implementation and usage, strengthen its semantics and update/fix
usage sites (Peter Zijlstra)
- optimize mutex_trylock() fastpath (Peter Zijlstra)
- ... misc fixes and cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
locking/static_keys: Fix non static symbol Sparse warning
locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
locking/atomic, arch/tile: Fix tilepro build
locking/atomic, arch/m68k: Remove comment
locking/atomic, arch/arc: Fix build
locking/Documentation: Clarify limited control-dependency scope
locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
locking/atomic, arch/mips: Convert to _relaxed atomics
locking/atomic, arch/alpha: Convert to _relaxed atomics
locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
locking/atomic: Fix atomic64_relaxed() bits
locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
...
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.
{pud,pmd}_alloc_one is using __GFP_REPEAT but it always allocates from
pgtable_cache which is initialzed to PAGE_SIZE objects. This means that
this flag has never been actually useful here because it has always been
used only for PAGE_ALLOC_COSTLY requests.
Link: http://lkml.kernel.org/r/1464599699-30131-13-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the third version of the patchset previously sent [1]. I have
basically only rebased it on top of 4.7-rc1 tree and dropped "dm: get
rid of superfluous gfp flags" which went through dm tree. I am sending
it now because it is tree wide and chances for conflicts are reduced
considerably when we want to target rc2. I plan to send the next step
and rename the flag and move to a better semantic later during this
release cycle so we will have a new semantic ready for 4.8 merge window
hopefully.
Motivation:
While working on something unrelated I've checked the current usage of
__GFP_REPEAT in the tree. It seems that a majority of the usage is and
always has been bogus because __GFP_REPEAT has always been about costly
high order allocations while we are using it for order-0 or very small
orders very often. It seems that a big pile of them is just a
copy&paste when a code has been adopted from one arch to another.
I think it makes some sense to get rid of them because they are just
making the semantic more unclear. Please note that GFP_REPEAT is
documented as
* __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt
* _might_ fail. This depends upon the particular VM implementation.
while !costly requests have basically nofail semantic. So one could
reasonably expect that order-0 request with __GFP_REPEAT will not loop
for ever. This is not implemented right now though.
I would like to move on with __GFP_REPEAT and define a better semantic
for it.
$ git grep __GFP_REPEAT origin/master | wc -l
111
$ git grep __GFP_REPEAT | wc -l
36
So we are down to the third after this patch series. The remaining
places really seem to be relying on __GFP_REPEAT due to large allocation
requests. This still needs some double checking which I will do later
after all the simple ones are sorted out.
I am touching a lot of arch specific code here and I hope I got it right
but as a matter of fact I even didn't compile test for some archs as I
do not have cross compiler for them. Patches should be quite trivial to
review for stupid compile mistakes though. The tricky parts are usually
hidden by macro definitions and thats where I would appreciate help from
arch maintainers.
[1] http://lkml.kernel.org/r/1461849846-27209-1-git-send-email-mhocko@kernel.org
This patch (of 19):
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations. Yet we
have the full kernel tree with its usage for apparently order-0
allocations. This is really confusing because __GFP_REPEAT is
explicitly documented to allow allocation failures which is a weaker
semantic than the current order-0 has (basically nofail).
Let's simply drop __GFP_REPEAT from those places. This would allow to
identify place which really need allocator to retry harder and formulate
a more specific semantic for what the flag is supposed to do actually.
Link: http://lkml.kernel.org/r/1464599699-30131-2-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: John Crispin <blogic@openwrt.org>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"User" addresses are shown in /sys/devices/pci.../.../resource and
/proc/bus/pci/devices and used as mmap offsets for /proc/bus/pci/BB/DD.F
files. On sparc, these are PCI bus addresses, i.e., raw BAR values.
Previously pci_resource_to_user() computed the user address by
subtracting either pbm->io_space.start or pbm->mem_space.start from the
resource start.
We've already told the PCI core about those offsets here:
pci_scan_one_pbm()
pci_add_resource_offset(&resources, &pbm->io_space, pbm->io_space.start);
pci_add_resource_offset(&resources, &pbm->mem_space, pbm->mem_space.start);
pci_add_resource_offset(&resources, &pbm->mem64_space, pbm->mem_space.start);
so pcibios_resource_to_bus() knows how to do that translation.
No functional change intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Replace the pci_resource_to_user() declarations in each arch that defines
HAVE_ARCH_PCI_RESOURCE_TO_USER with a single one in linux/pci.h.
Change the MIPS static inline implementation to a non-inline version so the
static inline doesn't conflict with the new non-static linux/pci.h
declaration.
No functional change intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Since all architectures have this implemented now natively, remove this
dead code.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Implement FETCH-OP atomic primitives, these are very similar to the
existing OP-RETURN primitives we already have, except they return the
value of the atomic variable _before_ modification.
This is especially useful for irreversible operations -- such as
bitops (because it becomes impossible to reconstruct the state prior
to modification).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: James Y Knight <jyknight@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch updates/fixes all spin_unlock_wait() implementations.
The update is in semantics; where it previously was only a control
dependency, we now upgrade to a full load-acquire to match the
store-release from the spin_unlock() we waited on. This ensures that
when spin_unlock_wait() returns, we're guaranteed to observe the full
critical section we waited on.
This fixes a number of spin_unlock_wait() users that (not
unreasonably) rely on this.
I also fixed a number of ticket lock versions to only wait on the
current lock holder, instead of for a full unlock, as this is
sufficient.
Furthermore; again for ticket locks; I added an smp_rmb() in between
the initial ticket load and the spin loop testing the current value
because I could not convince myself the address dependency is
sufficient, esp. if the loads are of different sizes.
I'm more than happy to remove this smp_rmb() again if people are
certain the address dependency does indeed work as expected.
Note: PPC32 will be fixed independently
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: chris@zankel.net
Cc: cmetcalf@mellanox.com
Cc: davem@davemloft.net
Cc: dhowells@redhat.com
Cc: james.hogan@imgtec.com
Cc: jejb@parisc-linux.org
Cc: linux@armlinux.org.uk
Cc: mpe@ellerman.id.au
Cc: ralf@linux-mips.org
Cc: realmz6@gmail.com
Cc: rkuo@codeaurora.org
Cc: rth@twiddle.net
Cc: schwidefsky@de.ibm.com
Cc: tony.luck@intel.com
Cc: vgupta@synopsys.com
Cc: ysato@users.sourceforge.jp
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
sparc32:allmodconfig fails to build in next-20160602 as follows.
In file included from drivers/block/floppy.c:185:0:
include/linux/mc146818rtc.h: In function 'mc146818_is_updating':
include/linux/mc146818rtc.h:138:9: error: 'rtc_port' undeclared (first use in this function)
include/linux/mc146818rtc.h:138:9: note: each undeclared identifier is reported only once for each function it appears in
include/linux/mc146818rtc.h: In function 'mc146818_get_time':
include/linux/mc146818rtc.h:172:17: error: 'rtc_port' undeclared (first use in this function)
include/linux/mc146818rtc.h: In function 'mc146818_set_time':
include/linux/mc146818rtc.h:278:8: error: 'rtc_port' undeclared (first use in this function)
scripts/Makefile.build:295: recipe for target 'drivers/block/floppy.o' failed
The reason is a duplicate definition of the RTC_PORT macro. The
one in arch/sparc/include/asm/io_32.h was apparently used a long time
ago for the drivers/char/rtc.c driver that is not available on SPARC
any more, since we now select 'RTC_CLASS' unconditionally.
Removing the macro fixes the build problem, and for consistency,
this also removes the RTC_ALWAYS_BCD macro and the comment for both.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: fd09cc80165c ("rtc: cmos: move mc146818rtc code out of asm-generic/rtc.h")
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Pull sparc fixes from David Miller:
"sparc64 mmu context allocation and trap return bug fixes"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix return from trap window fill crashes.
sparc: Harden signal return frame checks.
sparc64: Take ctx_alloc_lock properly in hugetlb_setup().
We must handle data access exception as well as memory address unaligned
exceptions from return from trap window fill faults, not just normal
TLB misses.
Otherwise we can get an OOPS that looks like this:
ld-linux.so.2(36808): Kernel bad sw trap 5 [#1]
CPU: 1 PID: 36808 Comm: ld-linux.so.2 Not tainted 4.6.0 #34
task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000
TSTATE: 0000004410001601 TPC: 0000000000a1a784 TNPC: 0000000000a1a788 Y: 00000002 Not tainted
TPC: <do_sparc64_fault+0x5c4/0x700>
g0: fff8000024fc8248 g1: 0000000000db04dc g2: 0000000000000000 g3: 0000000000000001
g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0000000000000001
o0: 0000000000b95ee8 o1: 000000000000012b o2: 0000000000000000 o3: 0000000200b9b358
o4: 0000000000000000 o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 0000000000a1a77c
RPC: <do_sparc64_fault+0x5bc/0x700>
l0: 00000000000007ff l1: 0000000000000000 l2: 000000000000005f l3: 0000000000000000
l4: fff8000301347e98 l5: fff8000024ff3060 l6: 0000000000000000 l7: 0000000000000000
i0: fff8000301347f60 i1: 0000000000102400 i2: 0000000000000000 i3: 0000000000000000
i4: 0000000000000000 i5: 0000000000000000 i6: fff80003013476a1 i7: 0000000000404d4c
I7: <user_rtt_fill_fixup+0x6c/0x7c>
Call Trace:
[0000000000404d4c] user_rtt_fill_fixup+0x6c/0x7c
The window trap handlers are slightly clever, the trap table entries for them are
composed of two pieces of code. First comes the code that actually performs
the window fill or spill trap handling, and then there are three instructions at
the end which are for exception processing.
The userland register window fill handler is:
add %sp, STACK_BIAS + 0x00, %g1; \
ldxa [%g1 + %g0] ASI, %l0; \
mov 0x08, %g2; \
mov 0x10, %g3; \
ldxa [%g1 + %g2] ASI, %l1; \
mov 0x18, %g5; \
ldxa [%g1 + %g3] ASI, %l2; \
ldxa [%g1 + %g5] ASI, %l3; \
add %g1, 0x20, %g1; \
ldxa [%g1 + %g0] ASI, %l4; \
ldxa [%g1 + %g2] ASI, %l5; \
ldxa [%g1 + %g3] ASI, %l6; \
ldxa [%g1 + %g5] ASI, %l7; \
add %g1, 0x20, %g1; \
ldxa [%g1 + %g0] ASI, %i0; \
ldxa [%g1 + %g2] ASI, %i1; \
ldxa [%g1 + %g3] ASI, %i2; \
ldxa [%g1 + %g5] ASI, %i3; \
add %g1, 0x20, %g1; \
ldxa [%g1 + %g0] ASI, %i4; \
ldxa [%g1 + %g2] ASI, %i5; \
ldxa [%g1 + %g3] ASI, %i6; \
ldxa [%g1 + %g5] ASI, %i7; \
restored; \
retry; nop; nop; nop; nop; \
b,a,pt %xcc, fill_fixup_dax; \
b,a,pt %xcc, fill_fixup_mna; \
b,a,pt %xcc, fill_fixup;
And the way this works is that if any of those memory accesses
generate an exception, the exception handler can revector to one of
those final three branch instructions depending upon which kind of
exception the memory access took. In this way, the fault handler
doesn't have to know if it was a spill or a fill that it's handling
the fault for. It just always branches to the last instruction in
the parent trap's handler.
For example, for a regular fault, the code goes:
winfix_trampoline:
rdpr %tpc, %g3
or %g3, 0x7c, %g3
wrpr %g3, %tnpc
done
All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the
trap time program counter, we'll get that final instruction in the
trap handler.
On return from trap, we have to pull the register window in but we do
this by hand instead of just executing a "restore" instruction for
several reasons. The largest being that from Niagara and onward we
simply don't have enough levels in the trap stack to fully resolve all
possible exception cases of a window fault when we are already at
trap level 1 (which we enter to get ready to return from the original
trap).
This is executed inline via the FILL_*_RTRAP handlers. rtrap_64.S's
code branches directly to these to do the window fill by hand if
necessary. Now if you look at them, we'll see at the end:
ba,a,pt %xcc, user_rtt_fill_fixup;
ba,a,pt %xcc, user_rtt_fill_fixup;
ba,a,pt %xcc, user_rtt_fill_fixup;
And oops, all three cases are handled like a fault.
This doesn't work because each of these trap types (data access
exception, memory address unaligned, and faults) store their auxiliary
info in different registers to pass on to the C handler which does the
real work.
So in the case where the stack was unaligned, the unaligned trap
handler sets up the arg registers one way, and then we branched to
the fault handler which expects them setup another way.
So the FAULT_TYPE_* value ends up basically being garbage, and
randomly would generate the backtrace seen above.
Reported-by: Nick Alcock <nix@esperi.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
All signal frames must be at least 16-byte aligned, because that is
the alignment we explicitly create when we build signal return stack
frames.
All stack pointers must be at least 8-byte aligned.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull perf updates from Ingo Molnar:
"Mostly tooling and PMU driver fixes, but also a number of late updates
such as the reworking of the call-chain size limiting logic to make
call-graph recording more robust, plus tooling side changes for the
new 'backwards ring-buffer' extension to the perf ring-buffer"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
perf record: Read from backward ring buffer
perf record: Rename variable to make code clear
perf record: Prevent reading invalid data in record__mmap_read
perf evlist: Add API to pause/resume
perf trace: Use the ptr->name beautifier as default for "filename" args
perf trace: Use the fd->name beautifier as default for "fd" args
perf report: Add srcline_from/to branch sort keys
perf evsel: Record fd into perf_mmap
perf evsel: Add overwrite attribute and check write_backward
perf tools: Set buildid dir under symfs when --symfs is provided
perf trace: Only auto set call-graph to "dwarf" when syscalls are being traced
perf annotate: Sort list of recognised instructions
perf annotate: Fix identification of ARM blt and bls instructions
perf tools: Fix usage of max_stack sysctl
perf callchain: Stop validating callchains by the max_stack sysctl
perf trace: Fix exit_group() formatting
perf top: Use machine->kptr_restrict_warned
perf trace: Warn when trying to resolve kernel addresses with kptr_restrict=1
perf machine: Do not bail out if not managing to read ref reloc symbol
perf/x86/intel/p4: Trival indentation fix, remove space
...
On cheetahplus chips we take the ctx_alloc_lock in order to
modify the TLB lookup parameters for the indexed TLBs, which
are stored in the context register.
This is called with interrupts disabled, however ctx_alloc_lock
is an IRQ safe lock, therefore we must take acquire/release it
properly with spin_{lock,unlock}_irq().
Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull sparc updates from David Miller:
"Some 32-bit kgdb cleanups from Sam Ravnborg, and a hugepage TLB flush
overhead fix on 64-bit from Nitin Gupta"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Reduce TLB flushes during hugepte changes
aeroflex/greth: fix warning about unused variable
openprom: fix warning
sparc32: drop superfluous cast in calls to __nocache_pa()
sparc32: fix build with STRICT_MM_TYPECHECKS
sparc32: use proper prototype for trapbase
sparc32: drop local prototype in kgdb_32
sparc32: drop hardcoding trap_level in kgdb_trap
During hugepage map/unmap, TSB and TLB flushes are currently
issued at every PAGE_SIZE'd boundary which is unnecessary.
We now issue the flush at REAL_HPAGE_SIZE boundaries only.
Without this patch workloads which unmap a large hugepage
backed VMA region get CPU lockups due to excessive TLB
flush calls.
Orabug: 22365539, 22643230, 22995196
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The binary GCD algorithm is based on the following facts:
1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2)
2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)
Even on x86 machines with reasonable division hardware, the binary
algorithm runs about 25% faster (80% the execution time) than the
division-based Euclidian algorithm.
On platforms like Alpha and ARMv6 where division is a function call to
emulation code, it's even more significant.
There are two variants of the code here, depending on whether a fast
__ffs (find least significant set bit) instruction is available. This
allows the unpredictable branches in the bit-at-a-time shifting loop to
be eliminated.
If fast __ffs is not available, the "even/odd" GCD variant is used.
I use the following code to benchmark:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#define swap(a, b) \
do { \
a ^= b; \
b ^= a; \
a ^= b; \
} while (0)
unsigned long gcd0(unsigned long a, unsigned long b)
{
unsigned long r;
if (a < b) {
swap(a, b);
}
if (b == 0)
return a;
while ((r = a % b) != 0) {
a = b;
b = r;
}
return b;
}
unsigned long gcd1(unsigned long a, unsigned long b)
{
unsigned long r = a | b;
if (!a || !b)
return r;
b >>= __builtin_ctzl(b);
for (;;) {
a >>= __builtin_ctzl(a);
if (a == b)
return a << __builtin_ctzl(r);
if (a < b)
swap(a, b);
a -= b;
}
}
unsigned long gcd2(unsigned long a, unsigned long b)
{
unsigned long r = a | b;
if (!a || !b)
return r;
r &= -r;
while (!(b & r))
b >>= 1;
for (;;) {
while (!(a & r))
a >>= 1;
if (a == b)
return a;
if (a < b)
swap(a, b);
a -= b;
a >>= 1;
if (a & r)
a += b;
a >>= 1;
}
}
unsigned long gcd3(unsigned long a, unsigned long b)
{
unsigned long r = a | b;
if (!a || !b)
return r;
b >>= __builtin_ctzl(b);
if (b == 1)
return r & -r;
for (;;) {
a >>= __builtin_ctzl(a);
if (a == 1)
return r & -r;
if (a == b)
return a << __builtin_ctzl(r);
if (a < b)
swap(a, b);
a -= b;
}
}
unsigned long gcd4(unsigned long a, unsigned long b)
{
unsigned long r = a | b;
if (!a || !b)
return r;
r &= -r;
while (!(b & r))
b >>= 1;
if (b == r)
return r;
for (;;) {
while (!(a & r))
a >>= 1;
if (a == r)
return r;
if (a == b)
return a;
if (a < b)
swap(a, b);
a -= b;
a >>= 1;
if (a & r)
a += b;
a >>= 1;
}
}
static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = {
gcd0, gcd1, gcd2, gcd3, gcd4,
};
#define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0]))
#if defined(__x86_64__)
#define rdtscll(val) do { \
unsigned long __a,__d; \
__asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \
(val) = ((unsigned long long)__a) | (((unsigned long long)__d)<<32); \
} while(0)
static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
unsigned long a, unsigned long b, unsigned long *res)
{
unsigned long long start, end;
unsigned long long ret;
unsigned long gcd_res;
rdtscll(start);
gcd_res = gcd(a, b);
rdtscll(end);
if (end >= start)
ret = end - start;
else
ret = ~0ULL - start + 1 + end;
*res = gcd_res;
return ret;
}
#else
static inline struct timespec read_time(void)
{
struct timespec time;
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time);
return time;
}
static inline unsigned long long diff_time(struct timespec start, struct timespec end)
{
struct timespec temp;
if ((end.tv_nsec - start.tv_nsec) < 0) {
temp.tv_sec = end.tv_sec - start.tv_sec - 1;
temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec;
} else {
temp.tv_sec = end.tv_sec - start.tv_sec;
temp.tv_nsec = end.tv_nsec - start.tv_nsec;
}
return temp.tv_sec * 1000000000ULL + temp.tv_nsec;
}
static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
unsigned long a, unsigned long b, unsigned long *res)
{
struct timespec start, end;
unsigned long gcd_res;
start = read_time();
gcd_res = gcd(a, b);
end = read_time();
*res = gcd_res;
return diff_time(start, end);
}
#endif
static inline unsigned long get_rand()
{
if (sizeof(long) == 8)
return (unsigned long)rand() << 32 | rand();
else
return rand();
}
int main(int argc, char **argv)
{
unsigned int seed = time(0);
int loops = 100;
int repeats = 1000;
unsigned long (*res)[TEST_ENTRIES];
unsigned long long elapsed[TEST_ENTRIES];
int i, j, k;
for (;;) {
int opt = getopt(argc, argv, "n:r:s:");
/* End condition always first */
if (opt == -1)
break;
switch (opt) {
case 'n':
loops = atoi(optarg);
break;
case 'r':
repeats = atoi(optarg);
break;
case 's':
seed = strtoul(optarg, NULL, 10);
break;
default:
/* You won't actually get here. */
break;
}
}
res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops);
memset(elapsed, 0, sizeof(elapsed));
srand(seed);
for (j = 0; j < loops; j++) {
unsigned long a = get_rand();
/* Do we have args? */
unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
unsigned long long min_elapsed[TEST_ENTRIES];
for (k = 0; k < repeats; k++) {
for (i = 0; i < TEST_ENTRIES; i++) {
unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &res[j][i]);
if (k == 0 || min_elapsed[i] > tmp)
min_elapsed[i] = tmp;
}
}
for (i = 0; i < TEST_ENTRIES; i++)
elapsed[i] += min_elapsed[i];
}
for (i = 0; i < TEST_ENTRIES; i++)
printf("gcd%d: elapsed %llu\n", i, elapsed[i]);
k = 0;
srand(seed);
for (j = 0; j < loops; j++) {
unsigned long a = get_rand();
unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
for (i = 1; i < TEST_ENTRIES; i++) {
if (res[j][i] != res[j][0])
break;
}
if (i < TEST_ENTRIES) {
if (k == 0) {
k = 1;
fprintf(stderr, "Error:\n");
}
fprintf(stderr, "gcd(%lu, %lu): ", a, b);
for (i = 0; i < TEST_ENTRIES; i++)
fprintf(stderr, "%ld%s", res[j][i], i < TEST_ENTRIES - 1 ? ", " : "\n");
}
}
if (k == 0)
fprintf(stderr, "PASS\n");
free(res);
return 0;
}
Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64" got:
zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
gcd0: elapsed 10174
gcd1: elapsed 2120
gcd2: elapsed 2902
gcd3: elapsed 2039
gcd4: elapsed 2812
PASS
zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
gcd0: elapsed 9309
gcd1: elapsed 2280
gcd2: elapsed 2822
gcd3: elapsed 2217
gcd4: elapsed 2710
PASS
zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
gcd0: elapsed 9589
gcd1: elapsed 2098
gcd2: elapsed 2815
gcd3: elapsed 2030
gcd4: elapsed 2718
PASS
zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
gcd0: elapsed 9914
gcd1: elapsed 2309
gcd2: elapsed 2779
gcd3: elapsed 2228
gcd4: elapsed 2709
PASS
[akpm@linux-foundation.org: avoid #defining a CONFIG_ variable]
Signed-off-by: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
Signed-off-by: George Spelvin <linux@horizon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
printk() takes some locks and could not be used a safe way in NMI
context.
The chance of a deadlock is real especially when printing stacks from
all CPUs. This particular problem has been addressed on x86 by the
commit a9edc88093 ("x86/nmi: Perform a safe NMI stack trace on all
CPUs").
The patchset brings two big advantages. First, it makes the NMI
backtraces safe on all architectures for free. Second, it makes all NMI
messages almost safe on all architectures (the temporary buffer is
limited. We still should keep the number of messages in NMI context at
minimum).
Note that there already are several messages printed in NMI context:
WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
handlers. These are not easy to avoid.
This patch reuses most of the code and makes it generic. It is useful
for all messages and architectures that support NMI.
The alternative printk_func is set when entering and is reseted when
leaving NMI context. It queues IRQ work to copy the messages into the
main ring buffer in a safe context.
__printk_nmi_flush() copies all available messages and reset the buffer.
Then we could use a simple cmpxchg operations to get synchronized with
writers. There is also used a spinlock to get synchronized with other
flushers.
We do not longer use seq_buf because it depends on external lock. It
would be hard to make all supported operations safe for a lockless use.
It would be confusing and error prone to make only some operations safe.
The code is put into separate printk/nmi.c as suggested by Steven
Rostedt. It needs a per-CPU buffer and is compiled only on
architectures that call nmi_enter(). This is achieved by the new
HAVE_NMI Kconfig flag.
The are MN10300 and Xtensa architectures. We need to clean up NMI
handling there first. Let's do it separately.
The patch is heavily based on the draft from Peter Zijlstra, see
https://lkml.org/lkml/2015/6/10/327
[arnd@arndb.de: printk-nmi: use %zu format string for size_t]
[akpm@linux-foundation.org: min_t->min - all types are size_t here]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jan Kara <jack@suse.cz>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> [arm part]
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to call exit_thread from copy_process in a fail path. So make it
accept task_struct as a parameter.
[v2]
* s390: exit_thread_runtime_instr doesn't make sense to be called for
non-current tasks.
* arm: fix the comment in vfp_thread_copy
* change 'me' to 'tsk' for task_struct
* now we can change only archs that actually have exit_thread
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Define HAVE_EXIT_THREAD for archs which want to do something in
exit_thread. For others, let's define exit_thread as an empty inline.
This is a cleanup before we change the prototype of exit_thread to
accept a task parameter.
[akpm@linux-foundation.org: fix mips]
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Based on recent thread on linux-arch (some weeks ago) I
decided to check how much work was required to build sparc32
with STRICT_MM_TYPECHECKS enabled.
The resulting binary (checked srmmu.o) was to my suprise smaller with
STRICT_MM_TYPECHECKS defined, than without.
As I have no working gear to test sparc32 bits at for the moment,
I did not enable STRICT_MM_TYPECHECKS - but was tempeted to do so.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This killed an extern ... in a .c file.
No functional change.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix this so we pass the trap_level from the actual trap
code like we do in sparc64.
Add use on ENTRY(), ENDPROC() in the assembler function too.
This fixes a bug where the hardcoded value for trap_level
was the sparc64 value.
As the generic code does not use the trap_level argument
(for sparc32) - this patch does not have any functional impact.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
User visible:
- Honour the kernel.perf_event_max_stack knob more precisely by not counting
PERF_CONTEXT_{KERNEL,USER} when deciding when to stop adding entries to
the perf_sample->ip_callchain[] array (Arnaldo Carvalho de Melo)
- Fix identation of 'stalled-backend-cycles' in 'perf stat' (Namhyung Kim)
- Update runtime using 'cpu-clock' event in 'perf stat' (Namhyung Kim)
- Use 'cpu-clock' for cpu targets in 'perf stat' (Namhyung Kim)
- Avoid fractional digits for integer scales in 'perf stat' (Andi Kleen)
- Store vdso buildid unconditionally, as it appears in callchains and
we're not checking those when creating the build-id table, so we
end up not being able to resolve VDSO symbols when doing analysis
on a different machine than the one where recording was done, possibly
of a different arch even (arm -> x86_64) (He Kuang)
Infrastructure:
- Generalize max_stack sysctl handler, will be used for configuring
multiple kernel knobs related to callchains (Arnaldo Carvalho de Melo)
Cleanups:
- Introduce DSO__NAME_KALLSYMS and DSO__NAME_KCORE, to stop using
open coded strings (Masami Hiramatsu)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJXOn7eAAoJENZQFvNTUqpAsOAP/3f/XJekPQAnMcKRBp2noCuj
nRu1kBltVJyP8iOU5PKSJwel4F9ykNNMl+/rzzxHDo13IM8uc+HnZOJZ6e9mJIJ1
xqjdqM4EDlYYoFApJzCjTK6CMlevCazosdQT1bbmMDYVPc2uQR/GnutFrzqf/Plg
hEougIGtfrdy85g95CRdxpy2yMwDK4EwsiDRm9ib1hnuamQZl97buWemBVqSJmLY
p82E2aMU5Fv5+B8AO4I7V88ZmgpmryjxpM+LjffgNUDSKsSHrlG4NiQ3znV1bgst
Rc++w78+qxoIozOu6/IX8eSI2L/1eyM/yQ6Qre0KuvYXCl+NopTAYSSJlaA4tyHF
c55z7HucuyATN3PrFRHlbWUT/RMIVC0j0lnZOc7SJLl90hJQ+nv0iZcbYwMbeHu1
3LGlcd9jDwQYiClbaT9ATxZJ8B9An0/k/HJdatbAHN0wRomP2Ozz/qD2nmEbUwpV
sCyLOo/LJkvVkuUjSg6ZiOArNIk4iTSPSAUV+SAL6YOEOZMAX5ISUJQ174+zFC9a
gqtVsCXvwLIsndXb8ys1r9/fit/MUci0OzKX3SG1K765+E4Bk23KcAgMNbM/a7lp
ZmHDXMC+yBYcnYNnaxkp7c55CWUlKGOeR4e+KmB99KoeIleYgPhD2UM5beo61TmN
yUEPtiiFiZmTRkiAu83R
=7OdF
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo-20160516' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Honour the kernel.perf_event_max_stack knob more precisely by not counting
PERF_CONTEXT_{KERNEL,USER} when deciding when to stop adding entries to
the perf_sample->ip_callchain[] array (Arnaldo Carvalho de Melo)
- Fix identation of 'stalled-backend-cycles' in 'perf stat' (Namhyung Kim)
- Update runtime using 'cpu-clock' event in 'perf stat' (Namhyung Kim)
- Use 'cpu-clock' for cpu targets in 'perf stat' (Namhyung Kim)
- Avoid fractional digits for integer scales in 'perf stat' (Andi Kleen)
- Store vdso buildid unconditionally, as it appears in callchains and
we're not checking those when creating the build-id table, so we
end up not being able to resolve VDSO symbols when doing analysis
on a different machine than the one where recording was done, possibly
of a different arch even (arm -> x86_64) (He Kuang)
Infrastructure changes:
- Generalize max_stack sysctl handler, will be used for configuring
multiple kernel knobs related to callchains (Arnaldo Carvalho de Melo)
Cleanups:
- Introduce DSO__NAME_KALLSYMS and DSO__NAME_KCORE, to stop using
open coded strings (Masami Hiramatsu)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I've just discovered that the useful-sounding has_transparent_hugepage()
is actually an architecture-dependent minefield: on some arches it only
builds if CONFIG_TRANSPARENT_HUGEPAGE=y, on others it's also there when
not, but on some of those (arm and arm64) it then gives the wrong
answer; and on mips alone it's marked __init, which would crash if
called later (but so far it has not been called later).
Straighten this out: make it available to all configs, with a sensible
default in asm-generic/pgtable.h, removing its definitions from those
arches (arc, arm, arm64, sparc, tile) which are served by the default,
adding #define has_transparent_hugepage has_transparent_hugepage to
those (mips, powerpc, s390, x86) which need to override the default at
runtime, and removing the __init from mips (but maybe that kind of code
should be avoided after init: set a static variable the first time it's
called).
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Yang Shi <yang.shi@linaro.org>
Cc: Ning Qu <quning@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [arch/s390]
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Core infrastructural changes:
- Support for natively single-ended GPIO driver stages. This
means that if the hardware has registers to configure open
drain or open source configuration, we use that rather than
(as we did before) try to emulate it by switching the line
to an input to get high impedance. This is also documented
throughly in Documentation/gpio/driver.txt for those of you
who did not understand one word of what I just wrote.
- Start to do away with the unnecessarily complex and
unitelligible ARCH_REQUIRE_GPIOLIB and
ARCH_WANT_OPTIONAL_GPIOLIB, another evolutional artifact from
the time when the GPIO subsystem was unmaintained. Archs can
now just select GPIOLIB and be done with it, cleanups to
arches will trickle in for the next kernel. Some minor archs
ACKed the changes immediately so these are included in this
pull request.
- Advancing the use of the data pointer inside the GPIO device
for storing driver data by switching the PowerPC, Super-H
Unicore and a few other subarches or subsystem drivers in
ALSA SoC, Input, serial, SSB, staging etc to use it.
- The initialization now reads the input/output state of the
GPIO lines, so that each GPIO descriptor knows - if this
callback is implemented - whether the line is input or
output. This also reflects nicely in userspace "lsgpio".
- It is now possible to name GPIO producer names, line names,
from the device tree. (Platform data has been supported for
a while.) I bet we will get a similar mechanism for ACPI
one of those days. This makes is possible to get sensible
producer names for e.g. GPIO rails in "lsgpio" in userspace.
New drivers:
- New driver for the Loongson1.
- The XLP driver now supports Broadcom Vulcan ARM64.
- The IT87 driver now supports IT8620 and IT8628.
- The PCA953X driver now supports Galileo Gen2.
Driver improvements:
- MCP23S08 was switched to use the gpiolib irqchip helpers and
now also suppors level-triggered interrupts.
- 74x164 and RCAR now supports the .set_multiple() callback
- AMDPT was converted to use generic GPIO.
- TC3589x, TPS65218, SX150X, F7188X, MENZ127, VX855, WM831X, WM8994
support the new single ended callback for open drain
and in some cases open source.
- Implement the .get_direction() callback for a few more drivers
like PL061, Xgene.
Cleanups:
- Paul Gortmaker combed through the drivers and de-modularized
those who are not really modules.
- Move the GPIO poweroff DT bindings to the power subdir where
they belong.
- Rename gpio-generic.c to gpio-mmio.c, which is much more to the
point. That's what it is handling, nothing more, nothing less.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXOuJ5AAoJEEEQszewGV1zNXsQAII5wtkP69WRJ3goYBKg1dZN
DkuLqZyVI4hCgRhptzUW10gDLHKKOCVubfetTJHSpyG/dWDJXPCyH6FHF+pW6lMX
y+em8kAvWctKpaosy4EM7O55/IohW0/fNCTOfzfrUNivjydFuA2XwPUiPqC7111O
DeKlC/t+W1JEvZTiKMi83pKq+9wqhiHmD0qxRHhV57S+MT8e7mdlSKOp7uUkKPkg
LPlerXosnmeFjL2emuSnKl/tq8pOyruU6uaIGG/uwpbo2W86Dok9GY2GWkQ4pANT
pDtprc4aJ/Clf6Q0CoKwQbmAozqTDeJo+Und9tRs2KuZRly2bWOcyVE0lyK+Y4s0
544LcKw2q6cB9ARZ6JExEVRJejPISGKMqo9TaHkyNSIJoiiatKYvNS4WVeFtTgbI
W+1WfM1svPymNRqVPO1PMLV+3m9dalDH2WjtaFF21uCAQ/G0AuPEHjEDbbx0HIpb
qrvWmYzZ97Rm/LdYROFRO53nEdCp2jh6c3n4/2kGYM8H0suvGxXZsB1g4i+Dm+B+
qKVTS282azlDuH9ohXeXizeb6atK6s8TC3Rmew97SmXDO00cUQzEQO/ZquRLHY9r
n83afQ4OL2Z9yruAxAk7pCshVSyheOsHuFPuZ7bwPW31VMdoWNRkhnaTUXMjGfYg
3y39IHrCKWNMCCVM1iNl
=z4d6
-----END PGP SIGNATURE-----
Merge tag 'gpio-v4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
"This is the bulk of GPIO changes for kernel cycle v4.7:
Core infrastructural changes:
- Support for natively single-ended GPIO driver stages.
This means that if the hardware has registers to configure open
drain or open source configuration, we use that rather than (as we
did before) try to emulate it by switching the line to an input to
get high impedance.
This is also documented throughly in Documentation/gpio/driver.txt
for those of you who did not understand one word of what I just
wrote.
- Start to do away with the unnecessarily complex and unitelligible
ARCH_REQUIRE_GPIOLIB and ARCH_WANT_OPTIONAL_GPIOLIB, another
evolutional artifact from the time when the GPIO subsystem was
unmaintained.
Archs can now just select GPIOLIB and be done with it, cleanups to
arches will trickle in for the next kernel. Some minor archs ACKed
the changes immediately so these are included in this pull request.
- Advancing the use of the data pointer inside the GPIO device for
storing driver data by switching the PowerPC, Super-H Unicore and
a few other subarches or subsystem drivers in ALSA SoC, Input,
serial, SSB, staging etc to use it.
- The initialization now reads the input/output state of the GPIO
lines, so that each GPIO descriptor knows - if this callback is
implemented - whether the line is input or output. This also
reflects nicely in userspace "lsgpio".
- It is now possible to name GPIO producer names, line names, from
the device tree. (Platform data has been supported for a while).
I bet we will get a similar mechanism for ACPI one of those days.
This makes is possible to get sensible producer names for e.g.
GPIO rails in "lsgpio" in userspace.
New drivers:
- New driver for the Loongson1.
- The XLP driver now supports Broadcom Vulcan ARM64.
- The IT87 driver now supports IT8620 and IT8628.
- The PCA953X driver now supports Galileo Gen2.
Driver improvements:
- MCP23S08 was switched to use the gpiolib irqchip helpers and now
also suppors level-triggered interrupts.
- 74x164 and RCAR now supports the .set_multiple() callback
- AMDPT was converted to use generic GPIO.
- TC3589x, TPS65218, SX150X, F7188X, MENZ127, VX855, WM831X, WM8994
support the new single ended callback for open drain and in some
cases open source.
- Implement the .get_direction() callback for a few more drivers like
PL061, Xgene.
Cleanups:
- Paul Gortmaker combed through the drivers and de-modularized those
who are not really modules.
- Move the GPIO poweroff DT bindings to the power subdir where they
belong.
- Rename gpio-generic.c to gpio-mmio.c, which is much more to the
point. That's what it is handling, nothing more, nothing less"
* tag 'gpio-v4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (126 commits)
MIPS: do away with ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB
gpio: zevio: make it explicitly non-modular
gpio: timberdale: make it explicitly non-modular
gpio: stmpe: make it explicitly non-modular
gpio: sodaville: make it explicitly non-modular
pinctrl: sh-pfc: Let gpio_chip.to_irq() return zero on error
gpio: dwapb: Add ACPI device ID for DWAPB GPIO controller on X-Gene platforms
gpio: dt-bindings: add wd,mbl-gpio bindings
gpio: of: make it possible to name GPIO lines
gpio: make gpiod_to_irq() return negative for NO_IRQ
gpio: xgene: implement .get_direction()
gpio: xgene: Enable ACPI support for X-Gene GFC GPIO driver
gpio: tegra: Implement gpio_get_direction callback
gpio: set up initial state from .get_direction()
gpio: rename gpio-generic.c into gpio-mmio.c
gpio: generic: fix GPIO_GENERIC_PLATFORM is set to module case
gpio: dwapb: add gpio-signaled acpi event support
gpio: dwapb: convert device node to fwnode
gpio: dwapb: remove name from dwapb_port_property
gpio/qoriq: select IRQ_DOMAIN
...
Pull networking updates from David Miller:
"Highlights:
1) Support SPI based w5100 devices, from Akinobu Mita.
2) Partial Segmentation Offload, from Alexander Duyck.
3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE.
4) Allow cls_flower stats offload, from Amir Vadai.
5) Implement bpf blinding, from Daniel Borkmann.
6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is
actually using FASYNC these atomics are superfluous. From Eric
Dumazet.
7) Run TCP more preemptibly, also from Eric Dumazet.
8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e
driver, from Gal Pressman.
9) Allow creating ppp devices via rtnetlink, from Guillaume Nault.
10) Improve BPF usage documentation, from Jesper Dangaard Brouer.
11) Support tunneling offloads in qed, from Manish Chopra.
12) aRFS offloading in mlx5e, from Maor Gottlieb.
13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo
Leitner.
14) Add MSG_EOR support to TCP, this allows controlling packet
coalescing on application record boundaries for more accurate
socket timestamp sampling. From Martin KaFai Lau.
15) Fix alignment of 64-bit netlink attributes across the board, from
Nicolas Dichtel.
16) Per-vlan stats in bridging, from Nikolay Aleksandrov.
17) Several conversions of drivers to ethtool ksettings, from Philippe
Reynes.
18) Checksum neutral ILA in ipv6, from Tom Herbert.
19) Factorize all of the various marvell dsa drivers into one, from
Vivien Didelot
20) Add VF support to qed driver, from Yuval Mintz"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits)
Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m"
Revert "phy dp83867: Make rgmii parameters optional"
r8169: default to 64-bit DMA on recent PCIe chips
phy dp83867: Make rgmii parameters optional
phy dp83867: Fix compilation with CONFIG_OF_MDIO=m
bpf: arm64: remove callee-save registers use for tmp registers
asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions
switchdev: pass pointer to fib_info instead of copy
net_sched: close another race condition in tcf_mirred_release()
tipc: fix nametable publication field in nl compat
drivers: net: Don't print unpopulated net_device name
qed: add support for dcbx.
ravb: Add missing free_irq() calls to ravb_close()
qed: Remove a stray tab
net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings
net: ethernet: fec-mpc52xx: use phydev from struct net_device
bpf, doc: fix typo on bpf_asm descriptions
stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set
net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings
net: ethernet: fs-enet: use phydev from struct net_device
...
We will use it to count how many addresses are in the entry->ip[] array,
excluding PERF_CONTEXT_{KERNEL,USER,etc} entries, so that we can really
return the number of entries specified by the user via the relevant
sysctl, kernel.perf_event_max_contexts, or via the per event
perf_event_attr.sample_max_stack knob.
This way we keep the perf_sample->ip_callchain->nr meaning, that is the
number of entries, be it real addresses or PERF_CONTEXT_ entries, while
honouring the max_stack knobs, i.e. the end result will be max_stack
entries if we have at least that many entries in a given stack trace.
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-s8teto51tdqvlfhefndtat9r@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This makes perf_callchain_{user,kernel}() receive the max stack
as context for the perf_callchain_entry, instead of accessing
the global sysctl_perf_event_max_stack.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf updates from Ingo Molnar:
"Bigger kernel side changes:
- Add backwards writing capability to the perf ring-buffer code,
which is preparation for future advanced features like robust
'overwrite support' and snapshot mode. (Wang Nan)
- Add pause and resume ioctls for the perf ringbuffer (Wang Nan)
- x86 Intel cstate code cleanups and reorgnization (Thomas Gleixner)
- x86 Intel uncore and CPU PMU driver updates (Kan Liang, Peter
Zijlstra)
- x86 AUX (Intel PT) related enhancements and updates (Alexander
Shishkin)
- x86 MSR PMU driver enhancements and updates (Huang Rui)
- ... and lots of other changes spread out over 40+ commits.
Biggest tooling side changes:
- 'perf trace' features and enhancements. (Arnaldo Carvalho de Melo)
- BPF tooling updates (Wang Nan)
- 'perf sched' updates (Jiri Olsa)
- 'perf probe' updates (Masami Hiramatsu)
- ... plus 200+ other enhancements, fixes and cleanups to tools/
The merge commits, the shortlog and the changelogs contain a lot more
details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (249 commits)
perf/core: Disable the event on a truncated AUX record
perf/x86/intel/pt: Generate PMI in the STOP region as well
perf buildid-cache: Use lsdir() for looking up buildid caches
perf symbols: Use lsdir() for the search in kcore cache directory
perf tools: Use SBUILD_ID_SIZE where applicable
perf tools: Fix lsdir to set errno correctly
perf trace: Move seccomp args beautifiers to tools/perf/trace/beauty/
perf trace: Move flock op beautifier to tools/perf/trace/beauty/
perf build: Add build-test for debug-frame on arm/arm64
perf build: Add build-test for libunwind cross-platforms support
perf script: Fix export of callchains with recursion in db-export
perf script: Fix callchain addresses in db-export
perf script: Fix symbol insertion behavior in db-export
perf symbols: Add dso__insert_symbol function
perf scripting python: Use Py_FatalError instead of die()
perf tools: Remove xrealloc and ALLOC_GROW
perf help: Do not use ALLOC_GROW in add_cmd_list
perf pmu: Make pmu_formats_string to check return value of strbuf
perf header: Make topology checkers to check return value of strbuf
perf tools: Make alias handler to check return value of strbuf
...
Pull support for killable rwsems from Ingo Molnar:
"This, by Michal Hocko, implements down_write_killable().
The main usecase will be to update mm_sem usage sites to use this new
API, to allow the mm-reaper introduced in commit aac4536355 ("mm,
oom: introduce oom reaper") to tear down oom victim address spaces
asynchronously with minimum latencies and without deadlock worries"
[ The vfs will want it too as the inode lock is changed from a mutex to
a rwsem due to the parallel lookup and readdir updates ]
* 'locking-rwsem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Fix comment on register clobbering
locking/rwsem: Fix down_write_killable()
locking/rwsem, x86: Add frame annotation for call_rwsem_down_write_failed_killable()
locking/rwsem: Provide down_write_killable()
locking/rwsem, x86: Provide __down_write_killable()
locking/rwsem, s390: Provide __down_write_killable()
locking/rwsem, ia64: Provide __down_write_killable()
locking/rwsem, alpha: Provide __down_write_killable()
locking/rwsem: Introduce basis for down_write_killable()
locking/rwsem, sparc: Drop superfluous arch specific implementation
locking/rwsem, sh: Drop superfluous arch specific implementation
locking/rwsem, xtensa: Drop superfluous arch specific implementation
locking/rwsem: Drop explicit memory barriers
locking/rwsem: Get rid of __down_write_nested()
Split the HAVE_BPF_JIT into two for distinguishing cBPF and eBPF JITs.
Current cBPF ones:
# git grep -n HAVE_CBPF_JIT arch/
arch/arm/Kconfig:44: select HAVE_CBPF_JIT
arch/mips/Kconfig:18: select HAVE_CBPF_JIT if !CPU_MICROMIPS
arch/powerpc/Kconfig:129: select HAVE_CBPF_JIT
arch/sparc/Kconfig:35: select HAVE_CBPF_JIT
Current eBPF ones:
# git grep -n HAVE_EBPF_JIT arch/
arch/arm64/Kconfig:61: select HAVE_EBPF_JIT
arch/s390/Kconfig:126: select HAVE_EBPF_JIT if PACK_STACK && HAVE_MARCH_Z196_FEATURES
arch/x86/Kconfig:94: select HAVE_EBPF_JIT if X86_64
Later code also needs this facility to check for eBPF JITs.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The system call tracing bug fix mentioned in the Fixes tag
below increased the amount of assembler code in the sequence
of assembler files included by head_64.S
This caused to total set of code to exceed 0x4000 bytes in
size, which overflows the expression in head_64.S that works
to place swapper_tsb at address 0x408000.
When this is violated, the TSB is not properly aligned, and
also the trap table is not aligned properly either. All of
this together results in failed boots.
So, do two things:
1) Simplify some code by using ba,a instead of ba/nop to get
those bytes back.
2) Add a linker script assertion to make sure that if this
happens again the build will fail.
Fixes: 1a40b95374 ("sparc: Fix system call tracing register handling.")
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Joerg Abraham <joerg.abraham@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The default remains 127, which is good for most cases, and not even hit
most of the time, but then for some cases, as reported by Brendan, 1024+
deep frames are appearing on the radar for things like groovy, ruby.
And in some workloads putting a _lower_ cap on this may make sense. One
that is per event still needs to be put in place tho.
The new file is:
# cat /proc/sys/kernel/perf_event_max_stack
127
Chaging it:
# echo 256 > /proc/sys/kernel/perf_event_max_stack
# cat /proc/sys/kernel/perf_event_max_stack
256
But as soon as there is some event using callchains we get:
# echo 512 > /proc/sys/kernel/perf_event_max_stack
-bash: echo: write error: Device or resource busy
#
Because we only allocate the callchain percpu data structures when there
is a user, which allows for changing the max easily, its just a matter
of having no callchain users at that point.
Reported-and-Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/r/20160426002928.GB16708@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This symbols is not needed to get access to selecting the
GPIOLIB anymore: any arch can select GPIOLIB.
Cc: Michael Büsch <m@bues.ch>
Cc: sparclinux@vger.kernel.org
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Add code to recognize SPARC-Sonoma cpu correctly and update cpu hardware
caps and cpu distribution map. SPARC-Sonoma is based upon SPARC-M7 core
along with additional PCI functions added on and is reported by firmware
as "SPARC-SN".
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Acked-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function pcibios_add_device() added by commit d0c31e0200
("sparc/PCI: Fix for panic while enabling SR-IOV") initializes
the dev_archdata by doing a memcpy from the PF. This has the
problem that it erroneously copies the OF device without
explicitly refcounting it.
As David Miller pointed out: "Generally speaking we don't
really support hot-plug for OF probed devices, but if we did
all of the device tree pointers have to be refcounted properly."
To fix this error, and also avoid code duplication, this patch
creates a new helper function, pci_init_dev_archdata(), that
initializes the fields in dev_archdata, and can be invoked
by callers after they have taken the needed refcounts
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Tested-by: Babu Moger <babu.moger@oracle.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Option is long gone, see
5d9efa7ee9 ("ipv6: Remove privacy config option.")
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
This is no longer used anywhere and all callers (__down_write()) use
0 as a subclass. Ditch __down_write_nested() to make the code easier
to follow.
This shouldn't introduce any functional change.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-2-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull sparc fixes from David Miller:
"Minor typing cleanup from Joe Perches, and some comment typo fixes
from Adam Buchbinder"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: Convert naked unsigned uses to unsigned int
sparc: Fix misspellings in comments.
KASAN needs to know whether the allocation happens in an IRQ handler.
This lets us strip everything below the IRQ entry point to reduce the
number of unique stack traces needed to be stored.
Move the definition of __irq_entry to <linux/interrupt.h> so that the
users don't need to pull in <linux/ftrace.h>. Also introduce the
__softirq_entry macro which is similar to __irq_entry, but puts the
corresponding functions to the .softirqentry.text section.
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sparc's syscall_get_arch was buggy: it returned the task arch, not the
syscall arch. This could confuse seccomp and audit.
I don't think this is as bad for seccomp as it looks: sparc's 32-bit and
64-bit syscalls are numbered the same.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On sparc64 compat-enabled kernels, any task can make 32-bit and 64-bit
syscalls. is_compat_task returns true in 32-bit tasks, which does not
necessarily imply that the current syscall is 32-bit.
Provide an in_compat_syscall implementation that checks whether the
current syscall is compat.
As far as I know, sparc is the only architecture on which is_compat_task
checks the compat status of the task and on which the compat status of a
syscall can differ from the compat status of the task. On x86,
is_compat_task checks the syscall type, not the task type.
[akpm@linux-foundation.org: add comment, per Sam]
[akpm@linux-foundation.org: update comment, per Andy]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the more normal kernel definition/declaration style.
Done via:
$ git ls-files arch/sparc | \
xargs ./scripts/checkpatch.pl -f --fix-inplace --types=unspecified_int
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Reviewed-by: Julian Calaby <julian.calaby@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull x86 protection key support from Ingo Molnar:
"This tree adds support for a new memory protection hardware feature
that is available in upcoming Intel CPUs: 'protection keys' (pkeys).
There's a background article at LWN.net:
https://lwn.net/Articles/643797/
The gist is that protection keys allow the encoding of
user-controllable permission masks in the pte. So instead of having a
fixed protection mask in the pte (which needs a system call to change
and works on a per page basis), the user can map a (handful of)
protection mask variants and can change the masks runtime relatively
cheaply, without having to change every single page in the affected
virtual memory range.
This allows the dynamic switching of the protection bits of large
amounts of virtual memory, via user-space instructions. It also
allows more precise control of MMU permission bits: for example the
executable bit is separate from the read bit (see more about that
below).
This tree adds the MM infrastructure and low level x86 glue needed for
that, plus it adds a high level API to make use of protection keys -
if a user-space application calls:
mmap(..., PROT_EXEC);
or
mprotect(ptr, sz, PROT_EXEC);
(note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
this special case, and will set a special protection key on this
memory range. It also sets the appropriate bits in the Protection
Keys User Rights (PKRU) register so that the memory becomes unreadable
and unwritable.
So using protection keys the kernel is able to implement 'true'
PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
PROT_READ as well. Unreadable executable mappings have security
advantages: they cannot be read via information leaks to figure out
ASLR details, nor can they be scanned for ROP gadgets - and they
cannot be used by exploits for data purposes either.
We know about no user-space code that relies on pure PROT_EXEC
mappings today, but binary loaders could start making use of this new
feature to map binaries and libraries in a more secure fashion.
There is other pending pkeys work that offers more high level system
call APIs to manage protection keys - but those are not part of this
pull request.
Right now there's a Kconfig that controls this feature
(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
(like most x86 CPU feature enablement code that has no runtime
overhead), but it's not user-configurable at the moment. If there's
any serious problem with this then we can make it configurable and/or
flip the default"
* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
mm/core, x86/mm/pkeys: Add execute-only protection keys support
x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
x86/mm/pkeys: Allow kernel to modify user pkey rights register
x86/fpu: Allow setting of XSAVE state
x86/mm: Factor out LDT init from context init
mm/core, x86/mm/pkeys: Add arch_validate_pkey()
mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
x86/mm/pkeys: Add Kconfig prompt to existing config option
x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
x86/mm/pkeys: Dump PKRU with other kernel registers
mm/core, x86/mm/pkeys: Differentiate instruction fetches
x86/mm/pkeys: Optimize fault handling in access_error()
mm/core: Do not enforce PKEY permissions on remote mm access
um, pkeys: Add UML arch_*_access_permitted() methods
mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
x86/mm/gup: Simplify get_user_pages() PTE bit handling
...
Pull networking updates from David Miller:
"Highlights:
1) Support more Realtek wireless chips, from Jes Sorenson.
2) New BPF types for per-cpu hash and arrap maps, from Alexei
Starovoitov.
3) Make several TCP sysctls per-namespace, from Nikolay Borisov.
4) Allow the use of SO_REUSEPORT in order to do per-thread processing
of incoming TCP/UDP connections. The muxing can be done using a
BPF program which hashes the incoming packet. From Craig Gallek.
5) Add a multiplexer for TCP streams, to provide a messaged based
interface. BPF programs can be used to determine the message
boundaries. From Tom Herbert.
6) Add 802.1AE MACSEC support, from Sabrina Dubroca.
7) Avoid factorial complexity when taking down an inetdev interface
with lots of configured addresses. We were doing things like
traversing the entire address less for each address removed, and
flushing the entire netfilter conntrack table for every address as
well.
8) Add and use SKB bulk free infrastructure, from Jesper Brouer.
9) Allow offloading u32 classifiers to hardware, and implement for
ixgbe, from John Fastabend.
10) Allow configuring IRQ coalescing parameters on a per-queue basis,
from Kan Liang.
11) Extend ethtool so that larger link mode masks can be supported.
From David Decotigny.
12) Introduce devlink, which can be used to configure port link types
(ethernet vs Infiniband, etc.), port splitting, and switch device
level attributes as a whole. From Jiri Pirko.
13) Hardware offload support for flower classifiers, from Amir Vadai.
14) Add "Local Checksum Offload". Basically, for a tunneled packet
the checksum of the outer header is 'constant' (because with the
checksum field filled into the inner protocol header, the payload
of the outer frame checksums to 'zero'), and we can take advantage
of that in various ways. From Edward Cree"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
bonding: fix bond_get_stats()
net: bcmgenet: fix dma api length mismatch
net/mlx4_core: Fix backward compatibility on VFs
phy: mdio-thunder: Fix some Kconfig typos
lan78xx: add ndo_get_stats64
lan78xx: handle statistics counter rollover
RDS: TCP: Remove unused constant
RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
net: smc911x: convert pxa dma to dmaengine
team: remove duplicate set of flag IFF_MULTICAST
bonding: remove duplicate set of flag IFF_MULTICAST
net: fix a comment typo
ethernet: micrel: fix some error codes
ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
bpf, dst: add and use dst_tclassid helper
bpf: make skb->tc_classid also readable
net: mvneta: bm: clarify dependencies
cls_bpf: reset class and reuse major in da
ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
ldmvsw: Add ldmvsw.c driver code
...
Merge second patch-bomb from Andrew Morton:
- a couple of hotfixes
- the rest of MM
- a new timer slack control in procfs
- a couple of procfs fixes
- a few misc things
- some printk tweaks
- lib/ updates, notably to radix-tree.
- add my and Nick Piggin's old userspace radix-tree test harness to
tools/testing/radix-tree/. Matthew said it was a godsend during the
radix-tree work he did.
- a few code-size improvements, switching to __always_inline where gcc
screwed up.
- partially implement character sets in sscanf
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
sscanf: implement basic character sets
lib/bug.c: use common WARN helper
param: convert some "on"/"off" users to strtobool
lib: add "on"/"off" support to kstrtobool
lib: update single-char callers of strtobool()
lib: move strtobool() to kstrtobool()
include/linux/unaligned: force inlining of byteswap operations
include/uapi/linux/byteorder, swab: force inlining of some byteswap operations
include/asm-generic/atomic-long.h: force inlining of some atomic_long operations
usb: common: convert to use match_string() helper
ide: hpt366: convert to use match_string() helper
ata: hpt366: convert to use match_string() helper
power: ab8500: convert to use match_string() helper
power: charger_manager: convert to use match_string() helper
drm/edid: convert to use match_string() helper
pinctrl: convert to use match_string() helper
device property: convert to use match_string() helper
lib/string: introduce match_string() helper
radix-tree tests: add test for radix_tree_iter_next
radix-tree tests: add regression3 test
...
Add ldmvsw.c driver
Details:
The ldmvsw driver very closely follows the sunvnet.c code and makes
use of the sunvnet_common.c code for core functionality.
A significant difference between sunvnet and ldmvsw driver is
sunvnet creates a network interface for each vnet-port *parent*
node in the MD while the ldmvsw driver creates a network interface
for every vsw-port node in the Machine Description (MD).
Therefore the netdev_priv() for sunvnet is a vnet structure while
the netdev_priv() for ldmvsw is a vnet_port structure.
Vnet_port structures allocated by ldmvsw have the vsw bit set.
When finding the net_device associated with a port, the common code keys
off this bit to use either the net_device found in the vnet_port or the
net_device in the vnet structure (see the VNET_PORT_TO_NET_DEVICE() macro in
sunvnet_common.h). This scheme allows the common code to work with
both drivers with minimal changes.
Similar to Xen, network interfaces created by the ldmvsw driver will always
have a HW Addr (i.e. mac address) of FE:FF:FF:FF:FF:FF and each will be
assigned the devname "vif<cfg_handle>.<port_id>" - where <cfg_handle> and
<port_id> are a unique handle/port pair assigned to the associated
vsw-port node in the MD.
Signed-off-by: Aaron Young <aaron.young@oracle.com>
Signed-off-by: Rashmi Narasimhan <rashmi.narasimhan@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Alexandre Chartre <Alexandre.Chartre@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Core changes:
- The gpio_chip is now a *real device*. Until now the gpio chips
were just piggybacking the parent device or (gasp) floating in
space outside of the device model. We now finally make GPIO chips
devices. The gpio_chip will create a gpio_device which contains
a struct device, and this gpio_device struct is kept private.
Anything that needs to be kept private from the rest of the kernel
will gradually be moved over to the gpio_device.
- As a result of making the gpio_device a real device, we have added
resource management, so devm_gpiochip_add_data() will cut down on
overhead and reduce code lines. A huge slew of patches convert
almost all drivers in the subsystem to use this.
- Building on making the GPIO a real device, we add the first step
of a new userspace ABI: the GPIO character device. We take small
steps here, so we first add a pure *information* ABI and the tool
"lsgpio" that will list all GPIO devices on the system and all
lines on these devices. We can now discover GPIOs properly from
userspace. We still have not come up with a way to actually *use*
GPIOs from userspace.
- To encourage people to use the character device for the future,
we have it always-enabled when using GPIO. The old sysfs ABI is
still opt-in (and can be used in parallel), but is marked as
deprecated. We will keep it around for the foreseeable future,
but it will not be extended to cover ever more use cases.
Cleanup:
- Bjorn Helgaas removed a whole slew of per-architecture <asm/gpio.h>
includes. This dates back to when GPIO was an opt-in feature and
no shared library even existed: just a header file with proper
prototypes was provided and all semantics were up to the arch to
implement. These patches make the GPIO chip even more a proper
device and cleans out leftovers of the old in-kernel API here
and there. Still some cruft is left but it's very little now.
- There is still some clamping of return values for .get() going
on, but we now return sane values in the vast majority of drivers
and the errorpath is sanitized. Some patches for powerpc, blackfin
and unicore still drop in.
- We continue to switch the ARM, MIPS, blackfin, m68k local GPIO
implementations to use gpiochip_add_data() and cut down on code
lines.
- MPC8xxx is converted to use the generic GPIO helpers.
- ATH79 is converted to use the generic GPIO helpers.
New drivers:
- WinSystems WS16C48
- Acces 104-DIO-48E
- F81866 (a F7188x variant)
- Qoric (a MPC8xxx variant)
- TS-4800
- SPI serializers (pisosr): simple 74xx shift registers connected
to SPI to obtain a dirt-cheap output-only GPIO expander.
- Texas Instruments TPIC2810
- Texas Instruments TPS65218
- Texas Instruments TPS65912
- X-Gene (ARM64) standby GPIO controller
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJW6m24AAoJEEEQszewGV1zUasP/RpTrjRcNI5QFHjudd2oioDx
R/IljC06Q072ZqVy/MR7QxwhoU8jUnCgKgv4rgMa1OcfHblxC2R1+YBKOUSij831
E+SYmYDYmoMhN7j5Aslr66MXg1rLdFSdCZWemuyNruAK8bx6cTE1AWS8AELQzzTn
Re/CPpCDbujLy0ZK2wJHgr9ZkdcBGICtDRCrOR3Kyjpwk/DSZcruK1PDN+VQMI3k
bJlwgtGenOHINgCq/16edpwj/hzmoJXhTOZXJHI5XVR6czTwb3SvCYACvCkauI/a
/N7b3quG88b5y0OPQPVxp5+VVl9GyVcv5oGzIfTNat/g5QinShZIT4kVV9r0xu6/
TQHh1HlXleh+QI3yX0oRv9ztHreMf+vdpw1dhIwLqHqfJ7AWdOGk7BbKjwCrsOoq
t/qUVFnyvooLpyr53Z5JY8+LqyynHF68G+jUQyHLgTZ0GCE+z+1jqNl1T501n3kv
3CSlNYxSN/YUBN3cnroAIU/ZWcV4YRdxmOtEWP+7xgcdzTE6s/JHb2fuEfVHzWPf
mHWtJGy8U0IR4VSSEln5RtjhRr0PAjTHeTOGAmivUnaIGDziTowyUVF+X5hwC77E
DGTuLVx/Kniv173DK7xNAsUZNAETBa3fQZTgu+RfOpMiM1FZc7tI1rd7K7PjbyCc
d2M0gcq+d11ITJTxC7OM
=9AJ4
-----END PGP SIGNATURE-----
Merge tag 'gpio-v4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
"This is the bulk of GPIO changes for kernel v4.6. There is quite a
lot of interesting stuff going on.
The patches to other subsystems and arch-wide are ACKed as far as
possible, though I consider things like per-arch <asm/gpio.h> as
essentially a part of the GPIO subsystem so it should not be needed.
Core changes:
- The gpio_chip is now a *real device*. Until now the gpio chips
were just piggybacking the parent device or (gasp) floating in
space outside of the device model.
We now finally make GPIO chips devices. The gpio_chip will create
a gpio_device which contains a struct device, and this gpio_device
struct is kept private. Anything that needs to be kept private
from the rest of the kernel will gradually be moved over to the
gpio_device.
- As a result of making the gpio_device a real device, we have added
resource management, so devm_gpiochip_add_data() will cut down on
overhead and reduce code lines. A huge slew of patches convert
almost all drivers in the subsystem to use this.
- Building on making the GPIO a real device, we add the first step of
a new userspace ABI: the GPIO character device. We take small
steps here, so we first add a pure *information* ABI and the tool
"lsgpio" that will list all GPIO devices on the system and all
lines on these devices.
We can now discover GPIOs properly from userspace. We still have
not come up with a way to actually *use* GPIOs from userspace.
- To encourage people to use the character device for the future, we
have it always-enabled when using GPIO. The old sysfs ABI is still
opt-in (and can be used in parallel), but is marked as deprecated.
We will keep it around for the foreseeable future, but it will not
be extended to cover ever more use cases.
Cleanup:
- Bjorn Helgaas removed a whole slew of per-architecture <asm/gpio.h>
includes.
This dates back to when GPIO was an opt-in feature and no shared
library even existed: just a header file with proper prototypes was
provided and all semantics were up to the arch to implement. These
patches make the GPIO chip even more a proper device and cleans out
leftovers of the old in-kernel API here and there.
Still some cruft is left but it's very little now.
- There is still some clamping of return values for .get() going on,
but we now return sane values in the vast majority of drivers and
the errorpath is sanitized. Some patches for powerpc, blackfin and
unicore still drop in.
- We continue to switch the ARM, MIPS, blackfin, m68k local GPIO
implementations to use gpiochip_add_data() and cut down on code
lines.
- MPC8xxx is converted to use the generic GPIO helpers.
- ATH79 is converted to use the generic GPIO helpers.
New drivers:
- WinSystems WS16C48
- Acces 104-DIO-48E
- F81866 (a F7188x variant)
- Qoric (a MPC8xxx variant)
- TS-4800
- SPI serializers (pisosr): simple 74xx shift registers connected to
SPI to obtain a dirt-cheap output-only GPIO expander.
- Texas Instruments TPIC2810
- Texas Instruments TPS65218
- Texas Instruments TPS65912
- X-Gene (ARM64) standby GPIO controller"
* tag 'gpio-v4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (194 commits)
Revert "Share upstreaming patches"
gpio: mcp23s08: Fix clearing of interrupt.
gpiolib: Fix comment referring to gpio_*() in gpiod_*()
gpio: pca953x: Fix pca953x_gpio_set_multiple() on 64-bit
gpio: xgene: Fix kconfig for standby GIPO contoller
gpio: Add generic serializer DT binding
gpio: uapi: use 0xB4 as ioctl() major
gpio: tps65912: fix bad merge
Revert "gpio: lp3943: Drop pin_used and lp3943_gpio_request/lp3943_gpio_free"
gpio: omap: drop dev field from gpio_bank structure
gpio: mpc8xxx: Slightly update the code for better readability
gpio: mpc8xxx: Remove *read_reg and *write_reg from struct mpc8xxx_gpio_chip
gpio: mpc8xxx: Fixup setting gpio direction output
gpio: mcp23s08: Add support for mcp23s18
dt-bindings: gpio: altera: Fix altr,interrupt-type property
gpio: add driver for MEN 16Z127 GPIO controller
gpio: lp3943: Drop pin_used and lp3943_gpio_request/lp3943_gpio_free
gpio: timberdale: Switch to devm_ioremap_resource()
gpio: ts4800: Add IMX51 dependency
gpiolib: rewrite gpiodev_add_to_list
...
There are few things about *pte_alloc*() helpers worth cleaning up:
- 'vma' argument is unused, let's drop it;
- most __pte_alloc() callers do speculative check for pmd_none(),
before taking ptl: let's introduce pte_alloc() macro which does
the check.
The only direct user of __pte_alloc left is userfaultfd, which has
different expectation about atomicity wrt pmd.
- pte_alloc_map() and pte_alloc_map_lock() are redefined using
pte_alloc().
[sudeep.holla@arm.com: fix build for arm64 hugetlbpage]
[sfr@canb.auug.org.au: fix arch/arm/mm/mmu.c some more]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>