With LFENCE now a serializing instruction, use LFENCE_RDTSC in preference
to MFENCE_RDTSC. However, since the kernel could be running under a
hypervisor that does not support writing that MSR, read the MSR back and
verify that the bit has been set successfully. If the MSR can be read
and the bit is set, then set the LFENCE_RDTSC feature, otherwise set the
MFENCE_RDTSC feature.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/20180108220932.12580.52458.stgit@tlendack-t1.amdoffice.net
To aid in speculation control, make LFENCE a serializing instruction
since it has less overhead than MFENCE. This is done by setting bit 1
of MSR 0xc0011029 (DE_CFG). Some families that support LFENCE do not
have this MSR. For these families, the LFENCE instruction is already
serializing.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/20180108220921.12580.71694.stgit@tlendack-t1.amdoffice.net
- One line fix to mlx4 error flow (same as mlx5 fix in last pull request,
just in the mlx4 driver)
- Fix a race condition in the IPoIB driver. This patch is larger than
just a one line fix, but resolves a race condition in a fairly
straight forward manner
- Fix a locking issue in the RDMA netlink code. This patch is also
larger than I would like for a late -rc. It has, however, had a week
to bake in the rdma tree prior to this pull request
- One line fix to fix granting remote machine access to memory that they
don't need and shouldn't have
- One line fix to correct the fact that our sgid/dgid pair is swapped
from what you would expect when receiving an incoming connection
request
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaU+ZkAAoJELgmozMOVy/dLw8P/1f27k9c7Bg91VfuyQeIcSxA
kyRDdzlkRzuI/6QJ4ErK+IkOH8ADG6UGmQa+fOv1dxG8do+YwVflcY7gEgjJA7fP
k0oPuGjiq8wrEWZrFGinln38ou0KALYd4F2C32unVYrsIohQLHSr1D6Ttw0W5FA6
NQG4nVn9FzmilgjqtkW2zOGKw4jdAn57J47tUp49KufuPBTUcxjmZCdaV5AmiuzN
5JpZUieL49Zoc18pcm1OreqDPZcj5LV1XquDNV+AZgU9+uGKoIb932k6hQjBRuml
FSePxpPjdN8zX/KVaa4HQHX4U4uMBp0HcRHYME1bDsKwTh/d9xKM/yTPzzCtJz+r
wmGJ9TPr2nq8blJJq17nSXbaJ4LmzlScCwork3LomdZJi880JwWJlvjFG3M/Yir9
HvS2zIOUJm+xZBNCDVEayYcBMkXew5XjxETtDwOvfYX8FM419LLk1WOp2y/4LKDD
hIR8QYkZMl37lMYqWZUghNjR7Rov6jdd30KDiCGdOAO/qszlNyTSL+icWyzc1t/X
VT4ai7vc0RTicPWwb8H8o8/dQNj8Ed8w5NnMq3hjen+KrTKShkZTMuW+or/E9jZN
ha9jIzSPLRfOvX6mZRrQVe6hiY3fOWMZXdw7gtehUy2hX7LCSwwbn2v6FcsDxyMQ
UW6ZVG3ccP9YSY+tBWKg
=kUnv
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Doug Ledford:
- One line fix to mlx4 error flow (same as mlx5 fix in last pull
request, just in the mlx4 driver)
- Fix a race condition in the IPoIB driver. This patch is larger than
just a one line fix, but resolves a race condition in a fairly
straight forward manner
- Fix a locking issue in the RDMA netlink code. This patch is also
larger than I would like for a late -rc. It has, however, had a week
to bake in the rdma tree prior to this pull request
- One line fix to fix granting remote machine access to memory that
they don't need and shouldn't have
- One line fix to correct the fact that our sgid/dgid pair is swapped
from what you would expect when receiving an incoming connection
request
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/srpt: Fix ACL lookup during login
IB/srpt: Disable RDMA access by the initiator
RDMA/netlink: Fix locking around __ib_get_device_by_index
IB/ipoib: Fix race condition in neigh creation
IB/mlx4: Fix mlx4_ib_alloc_mr error flow
Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.
To avoid leaking kernel data round up array-based maps and mask the index
after bounds check, so speculated load with out of bounds index will load
either valid value from the array or zero from the padded area.
Unconditionally mask index for all array types even when max_entries
are not rounded to power of 2 for root user.
When map is created by unpriv user generate a sequence of bpf insns
that includes AND operation to make sure that JITed code includes
the same 'index & index_mask' operation.
If prog_array map is created by unpriv user replace
bpf_tail_call(ctx, map, index);
with
if (index >= max_entries) {
index &= map->index_mask;
bpf_tail_call(ctx, map, index);
}
(along with roundup to power 2) to prevent out-of-bounds speculation.
There is secondary redundant 'if (index >= max_entries)' in the interpreter
and in all JITs, but they can be optimized later if necessary.
Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.
That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.
v2->v3:
Daniel noticed that attack potentially can be crafted via syscall commands
without loading the program, so add masking to those paths as well.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Address a wmi initcall ordering race resulting in a difficult to
reproduce boot failure.
The following is an automated git shortlog grouped by driver:
wmi:
- Call acpi_wmi_init() later
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJaU72UAAoJEKbMaAwKp364/xYH/28Hv4EBwwMMWDMCdqhRUqaZ
W0w5OiOZYXeFp48IE336udfONQs2IKOSnkq4A5C+vkcKSHAfM1aNY32Muk0E0Dpt
fXuI2SnvhDTUc9jvN7n3RiRw9KpQ2rReEPRcUWhAZ89HkuYBSJ6/uG+aKtbouOWg
Mxy1sVsKLxmWm4D+i2PNBo6b7zqAs7TL7FKeeaNx7SPe3sQb/xFNf+wFiD5VPW5S
ecEc2djOOBkVQ83xmrNSi5c9RpwlviKRwFDD+3LFX5RuXqdK+unbOq4iPUiCNSso
eAybSqohPPCcIgT1EpTsnXjtUGw0wfbkOz5aj7MEDodmTWGVVTvCEyq5fVHKjFY=
=SKD2
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v4.15-4' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fix from Darren Hart:
"Address a wmi initcall ordering race resulting in a difficult to
reproduce boot failure"
* tag 'platform-drivers-x86-v4.15-4' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: wmi: Call acpi_wmi_init() later
The TXALCR1 offsets are incorrect in the register offset tables, most
probably due to copy&paste error. Luckily, the driver never uses this
register. :-)
Fixes: 4a55530f38 ("net: sh_eth: modify the definitions of register")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the probing of the regulator is deferred, the memory allocated by
'mdiobus_alloc_size()' will be leaking.
It should be freed before the next call to 'sun4i_mdio_probe()' which will
reallocate it.
Fixes: 4bdcb1dd9f ("net: Add MDIO bus driver for the Allwinner EMAC")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 1463447 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner says:
====================
SCTP PMTU discovery fixes
This patchset fixes 2 issues with PMTU discovery that can lead to flood
of retransmissions.
The first patch fixes the issue for when PMTUD is disabled by the
application, while the second fixes it for when its enabled.
Please consider these to stable.
====================
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot reported a hang involving SCTP, on which it kept flooding dmesg
with the message:
[ 246.742374] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
low, using default minimum of 512
That happened because whenever SCTP hits an ICMP Frag Needed, it tries
to adjust to the new MTU and triggers an immediate retransmission. But
it didn't consider the fact that MTUs smaller than the SCTP minimum MTU
allowed (512) would not cause the PMTU to change, and issued the
retransmission anyway (thus leading to another ICMP Frag Needed, and so
on).
As IPv4 (ip_rt_min_pmtu=556) and IPv6 (IPV6_MIN_MTU=1280) minimum MTU
are higher than that, sctp_transport_update_pmtu() is changed to
re-fetch the PMTU that got set after our request, and with that, detect
if there was an actual change or not.
The fix, thus, skips the immediate retransmission if the received ICMP
resulted in no change, in the hope that SCTP will select another path.
Note: The value being used for the minimum MTU (512,
SCTP_DEFAULT_MINSEGMENT) is not right and instead it should be (576,
SCTP_MIN_PMTU), but such change belongs to another patch.
Changes from v1:
- do not disable PMTU discovery, in the light of commit
06ad391919 ("[SCTP] Don't disable PMTU discovery when mtu is small")
and as suggested by Xin Long.
- changed the way to break the rtx loop by detecting if the icmp
resulted in a change or not
Changes from v2:
none
See-also: https://lkml.org/lkml/2017/12/22/811
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, if PMTU discovery is disabled on a given transport, but the
configured value is higher than the actual PMTU, it is likely that we
will get some icmp Frag Needed. The issue is, if PMTU discovery is
disabled, we won't update the information and will issue a
retransmission immediately, which may very well trigger another ICMP,
and another retransmission, leading to a loop.
The fix is to simply not trigger immediate retransmissions if PMTU
discovery is disabled on the given transport.
Changes from v2:
- updated stale comment, noticed by Xin Long
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When loading the module after unloading it, the network interface would
not be enabled and thus wouldn't have a backend counterpart and unable
to be used by the guest.
The guest would face errors like:
[root@guest ~]# ethtool -i eth0
Cannot get driver information: No such device
[root@guest ~]# ifconfig eth0
eth0: error fetching interface information: Device not found
This patch initializes the state of the netfront device whenever it is
loaded manually, this state would communicate the netback to create its
device and establish the connection between them.
Signed-off-by: Eduardo Otubo <otubo@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: 2 small bug fixes.
The first one fixes the TC Flower flow parameter passed to firmware. The
2nd one fixes the VF index range checking for iproute2 SRIOV related commands.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In bnxt_vf_ndo_prep (which is called by bnxt_get_vf_config ndo), there is a
check for "Invalid VF id". Currently, the check is done against max_vfs.
However, the user doesn't always create max_vfs. So, the check should be
against the created number of VFs. The number of bnxt_vf_info structures
that are allocated in bnxt_alloc_vf_resources routine is the "number of
requested VFs". So, if an "invalid VF id" falls between the requested
number of VFs and the max_vfs, the driver will be dereferencing an invalid
pointer.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Venkat Devvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
flow_type in HWRM_FLOW_ALLOC is not being populated correctly due to
incorrect passing of pointer and size of l3_mask argument of is_wildcard().
Fixed this.
Fixes: db1d36a273 ("bnxt_en: add TC flower offload flow_alloc/free FW cmds")
Signed-off-by: Sunil Challa <sunilkumar.challa@broadcom.com>
Reviewed-by: Sathya Perla <sathya.perla@broadcom.com>
Reviewed-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull cgroup fixes from Tejun Heo:
"This contains fixes for the following two non-trivial issues:
- The task iterator got broken while adding thread mode support for
v4.14. It was less visible because it only triggers when both
cgroup1 and cgroup2 hierarchies are in use. The recent versions of
systemd uses cgroup2 for process management even when cgroup1 is
used for resource control exposing this issue.
- cpuset CPU hotplug path could deadlock when racing against exits.
There also are two patches to replace unlimited strcpy() usages with
strlcpy()"
* 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix css_task_iter crash on CSS_TASK_ITER_PROC
cgroup: Fix deadlock in cpu hotplug path
cgroup: use strlcpy() instead of strscpy() to avoid spurious warning
cgroup: avoid copying strings longer than the buffers
ARCH_HAS_REFCOUNT is no longer marked as broken ('if BROKEN'), so remove
the stale comment regarding it being broken.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171229195303.17781-1-ebiggers3@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On uniprocessor systems, critical and non-critical tasks cannot be
isolated, as there is only a single CPU core. Hence enabling CPU
isolation by default on such systems does not make much sense.
Instead of changing the default for !SMP, fix this by making the feature
depend on SMP, with an override for compile-testing. Note that its sole
selector (NO_HZ_FULL) already depends on SMP.
This decreases kernel size for a default uniprocessor kernel by ca. 1 KiB.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2c43838c99 ("sched/isolation: Enable CONFIG_CPU_ISOLATION=y by default")
Link: http://lkml.kernel.org/r/1514891590-20782-1-git-send-email-geert@linux-m68k.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
So one of the constification patches unearthed a type casting fragility
of the underlying code:
276c870547 ("x86/platform/intel-mid: Make 'bt_sfi_data' const")
converted the struct to be const while it is also used as a temporary
container for important data that is used to fill 'parent' and 'name'
fields in struct platform_device_info.
The compiler doesn't notice this due to an explicit type cast that loses
the const - which fragility will be fixed separately.
This type cast turned a seemingly trivial const propagation patch into a
hard to debug data corruptor and crasher bug.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bhumika Goyal <bhumirks@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: julia.lawall@lip6.fr
Cc: platform-driver-x86@vger.kernel.org
Link: http://lkml.kernel.org/r/20171228122523.21802-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Calling acpi_wmi_init() at the subsys_initcall() level causes ordering
issues to appear on some systems and they are difficult to reproduce,
because there is no guaranteed ordering between subsys_initcall()
calls, so they may occur in different orders on different systems.
In particular, commit 86d9f48534 (mm/slab: fix kmemcg cache
creation delayed issue) exposed one of these issues where genl_init()
and acpi_wmi_init() are both called at the same initcall level, but
the former must run before the latter so as to avoid a NULL pointer
dereference.
For this reason, move the acpi_wmi_init() invocation to the
initcall_sync level which should still be early enough for things
to work correctly in the WMI land.
Link: https://marc.info/?t=151274596700002&r=1&w=2
Reported-by: Jonathan McDowell <noodles@earth.li>
Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Tested-by: Jonathan McDowell <noodles@earth.li>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
The following code contains dead logic:
162 if (pgd_none(*pgd)) {
163 unsigned long new_p4d_page = __get_free_page(gfp);
164 if (!new_p4d_page)
165 return NULL;
166
167 if (pgd_none(*pgd)) {
168 set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
169 new_p4d_page = 0;
170 }
171 if (new_p4d_page)
172 free_page(new_p4d_page);
173 }
There can't be any difference between two pgd_none(*pgd) at L162 and L167,
so it's always false at L171.
Dave Hansen explained:
Yes, the double-test was part of an optimization where we attempted to
avoid using a global spinlock in the fork() path. We would check for
unallocated mid-level page tables without the lock. The lock was only
taken when we needed to *make* an entry to avoid collisions.
Now that it is all single-threaded, there is no chance of a collision,
no need for a lock, and no need for the re-check.
As all these functions are only called during init, mark them __init as
well.
Fixes: 03f4424f34 ("x86/mm/pti: Add functions to clone kernel PMDs")
Signed-off-by: Jike Song <albcamus@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Jiri Koshina <jikos@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kees Cook <keescook@google.com>
Cc: Andi Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg KH <gregkh@linux-foundation.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Paul Turner <pjt@google.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180108160341.3461-1-albcamus@gmail.com
This is another case similar to what EFI does: create a new set of
page tables, map some code at a low address, and jump to it. PTI
mistakes this low address for userspace and mistakenly marks it
non-executable in an effort to make it unusable for userspace.
Undo the poison to allow execution.
Fixes: 385ce0ea4c ("x86/mm/pti: Add Kconfig")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jeff Law <law@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: David" <dwmw@amazon.co.uk>
Cc: Nick Clifton <nickc@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180108102805.GK25546@redhat.com
The cross-release lockdep functionality has been removed in:
e966eaeeb6: ("locking/lockdep: Remove the cross-release locking checks")
... leaving the kernel parameter docs behind. The code handling
the parameter does not exist so this is a plain documentation change.
Signed-off-by: David Sterba <dsterba@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: byungchul.park@lge.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/20180108152731.27613-1-dsterba@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
PCM OSS read/write loops keep taking the mutex lock for the whole
read/write, and this might take very long when the exceptionally high
amount of data is given. Also, since it invokes with mutex_lock(),
the concurrent read/write becomes unbreakable.
This patch tries to address these issues by replacing mutex_lock()
with mutex_lock_interruptible(), and also splits / re-takes the lock
at each read/write period chunk, so that it can switch the context
more finely if requested.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Variable Length Arrays In Structs (VLAIS) is not supported by Clang, and
frowned upon by others.
https://lkml.org/lkml/2013/9/23/500
Here, the VLAIS was used because the size of the bitmap returned from
xen_mc_entry() depended on possibly (based on kernel configuration)
runtime sized data. Rather than declaring args as a VLAIS then calling
sizeof on *args, we calculate the appropriate sizeof args manually.
Further, we can get rid of the #ifdef's and rely on num_possible_cpus()
(thanks to a helpful checkpatch warning from an earlier version of this
patch).
Suggested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
The header declares this function as __init but is defined in __ref
section.
Signed-off-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
The loops for read and write in PCM OSS emulation have no proper check
of pending signals, and they keep processing even after user tries to
break. This results in a very long delay, often seen as RCU stall
when a huge unprocessed bytes remain queued. The bug could be easily
triggered by syzkaller.
As a simple workaround, this patch adds the proper check of pending
signals and aborts the loop appropriately.
Reported-by: syzbot+993cb4cfcbbff3947c21@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Implement the CPU vulnerabilty show functions for meltdown, spectre_v1 and
spectre_v2.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20180107214913.177414879@linutronix.de
As the meltdown/spectre problem affects several CPU architectures, it makes
sense to have common way to express whether a system is affected by a
particular vulnerability or not. If affected the way to express the
mitigation should be common as well.
Create /sys/devices/system/cpu/vulnerabilities folder and files for
meltdown, spectre_v1 and spectre_v2.
Allow architectures to override the show function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20180107214913.096657732@linutronix.de
The hotplug code uses its own workqueue to handle IRQ requests
(pseries_hp_wq), however that workqueue is initialized after
init_ras_IRQ(). That can lead to a kernel panic if any hotplug
interrupts fire after init_ras_IRQ() but before pseries_hp_wq is
initialised. eg:
UDP-Lite hash table entries: 2048 (order: 0, 65536 bytes)
NET: Registered protocol family 1
Unpacking initramfs...
(qemu) object_add memory-backend-ram,id=mem1,size=10G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1
Unable to handle kernel paging request for data at address 0xf94d03007c421378
Faulting instruction address: 0xc00000000012d744
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc2-ziviani+ #26
task: (ptrval) task.stack: (ptrval)
NIP: c00000000012d744 LR: c00000000012d744 CTR: 0000000000000000
REGS: (ptrval) TRAP: 0380 Not tainted (4.15.0-rc2-ziviani+)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28088042 XER: 20040000
CFAR: c00000000012d3c4 SOFTE: 0
...
NIP [c00000000012d744] __queue_work+0xd4/0x5c0
LR [c00000000012d744] __queue_work+0xd4/0x5c0
Call Trace:
[c0000000fffefb90] [c00000000012d744] __queue_work+0xd4/0x5c0 (unreliable)
[c0000000fffefc70] [c00000000012dce4] queue_work_on+0xb4/0xf0
This commit makes the RAS IRQ registration explicitly dependent on the
creation of the pseries_hp_wq.
Reported-by: Min Deng <mdeng@redhat.com>
Reported-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Tested-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
The RISC-V port doesn't suport a nommu mode, so there is no reason
to provide some code only under a CONFIG_MMU ifdef.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
We were hoping to avoid making this visible to userspace, but it looks
like we're going to have to because QEMU's user-mode emulation doesn't
want to emulate a vDSO. Having vDSO-only system calls was a bit
unothodox anyway, so I think in this case it's OK to just make the
actual system call number public.
This patch simply moves the definition of __NR_riscv_flush_icache
availiable to userspace, which results in the deletion of the now empty
vdso-syscalls.h.
Changes since v1:
* I've moved the definition into uapi/asm/syscalls.h rathen than
uapi/asm/unistd.h. This allows me to keep asm/unistd.h, so we can
keep the syscall table macros sane.
* As a side effect of the above, this no longer disables all system
calls on RISC-V. Whoops!
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
This patch provides a basic defconfig for the RISC-V
architecture that enables enough kernel features to run a
basic Linux distribution on qemu's "virt" board for native
software development. Features include:
- serial console
- virtio block and network device support
- VFAT and ext2/3/4 filesystem support
- NFS client and NFS rootfs support
- an assortment of other kernel features required for
running systemd
It also enables a number of drivers for physical hardware
that target the "SiFive U500" SoC and the corresponding
development platform. These include:
- PCIe host controller support for the FPGA-based U500
development platform (PCIE_XILINX)
- USB host controller support (OHCI/EHCI/XHCI)
- USB HID (keyboard/mouse) support
- USB mass storage support (bulk and UAS)
- SATA support (AHCI)
- ethernet drivers (MACB for a SoC-internal MAC block, microsemi
ethernet phy, E1000E and R8169 for PCIe-connected external devices)
- DRM and framebuffer console support for PCIe-connected
Radeon graphics chips
Signed-off-by: Karsten Merker <merker@debian.org>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Pull parisc fixes from Helge Deller:
- Many small fixes to show the real physical addresses of devices
instead of hashed addresses.
- One important fix to unbreak 32-bit SMP support: We forgot to 16-byte
align the spinlocks in the assembler code.
- Qemu support: The host will get a chance to sleep when the parisc
guest is idle. We use the same mechanism as the power architecture by
overlaying the "or %r10,%r10,%r10" instruction which is simply a nop
on real hardware.
* 'parisc-4.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: qemu idle sleep support
parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel
parisc: Show unhashed EISA EEPROM address
parisc: Show unhashed HPA of Dino chip
parisc: Show initial kernel memory layout unhashed
parisc: Show unhashed hardware inventory
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJaUiTsAAoJEAUvNnAY1cPYA1MP/37LQRImjRGq0wGp9RE1BrCI
VobxrcnzmfMSTlGYGHJOXt81dy6Xr+qcri9Qfr+x4fb79R49NOS+2usQZ3gHu05+
WZOgTOut87uGDzNSWawn9HrsCs89obi+G8jVBt1gCcG4oazp00EHiFQEy6Lkt9D0
5JB4Sx53aTPcqkS/uDrEIDtClnBn/c5EYInKMh4aTaDOAaOAva67n6QELRWyUHoz
y1uf7YQWOoUkuQIgGFhsH/mcvSu71W4ZbIO1CuvsTBcUdXi0q7Hdp23EWdLuYgPe
dPTs1ibhE7gEg0xJ5f2lqF8uL8XYAsZGSWhgPDtcbWq+kpRIGa8TuHhuca84tRMU
6hf3xZwqYFItG9fzLxD6ZJdEmYdHFxlY6KHQ+azTxUdkomJJB72wsRdhB679+Xsa
GMGJ4W4xxUHX+2u5I7p/5FifzmHf2g6YK+v4kkz1U8d9Vgh68b20V+ioC7RgR6k6
qDuVxs0K3g5ikP8WQKzHwKfEp1Z62uHV8HnsmRmuYzoPzbPy3szmgM8tGk+/02Qw
JHEf/umPauG1QjHLMU8HkiB6OP8wFs0Y/mma+Iqy2WrFFPo0oa3A4AW3HKRC5imS
lUryPugYAoioAcu4raYZYKw/fv16YKP0wwbcLKKH2jA6TlUqseJaCW6K1asgMqbh
72UyDLCIjrAmlhpSdLhg
=b5fz
-----END PGP SIGNATURE-----
Merge tag 'apparmor-pr-2018-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
Pull apparmor fix from John Johansen:
"This fixes a regression when the kernel feature set is reported as
supporting mount and policy is pinned to a feature set that does not
support mount mediation"
* tag 'apparmor-pr-2018-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: fix regression in mount mediation when feature set is pinned
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJaUhWqAAoJEL1qUBy3i3wmDysP/1mi2O3hzvTW06a3c4HMZpi/
Wp4HHuJkJQhq3UUT/s9OLzimWVJhH8lNorHm0zPOYCj3C7L9NBgomV1JCdSSevQM
b1L2u5oXWnZrutLE7Echmf5CC6b9ntZBlfQIzULS7N7e0wFwpBPOF06nhfx/nkvT
9quci3Ha9NTIEUovuj3Lqs4Nh76qkvFhZ0LK39KWT5Gx60niCPTWmyCMEtp5qvWl
Y+RXvyRD7XYTSdkL8WJ3j/fDFVdj1zyRziir1BJ8K365TeyxH1Jb0EO/QY6ZnEX0
EMMi85GklPlMa911Hu3fCdok7dYoIVnC1BYnd3n8Em9EkvFUpvdCRAUrTHK/y1iy
1HhunswN+UTXp/04tjGDxlmXJt9DlfbcCi95hNKaeMBTO6/rj2AOug7xDqJOqTSg
/TZYEuOnvAKa0bv4JdtdX9Aw1Tl5vIEL/rYPhW/BPQk6yz7+1LFIB4WUC11lmb/T
4da6/Oxxn/94LD5mq9FriQPL+ibNPVWDe747x8yh84lC6/UT7TnxuWZGE70QvBa7
1bZHb9T2yK+/d05QjejdgQqdhKUAtAnSWv3/bTow9JdGa5HeJpiLaWa/KvLvYuua
y3A7QrpPVvdsyTWKzhgbAzV9OvQSsvnxigEQUPTLamCEWv20PZFaP775ubWOZqIx
05ZFOYSPLF/wyHqcttMM
=9XZT
-----END PGP SIGNATURE-----
Merge tag 'led_fixes_for_4.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds
Pull LED fix from Jacek Anaszewski:
"The commit 2b83ff96f5 for 4.15-rc6, which was fixing LED brightness
setting after clearing delay_off broke the behavior on any alteration
of delay_on{off} properties, due to use of a LED core helper that does
too much for this particular case"
* tag 'led_fixes_for_4.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
leds: core: Fix regression caused by commit 2b83ff96f5
Commit 2b83ff96f5 ("led: core: Fix brightness setting when setting delay_off=0")
replaced del_timer_sync(&led_cdev->blink_timer) with led_stop_software_blink()
in led_blink_set(), which additionally clears LED_BLINK_SW flag as well as
zeroes blink_delay_on and blink_delay_off properties of the struct led_classdev.
Cleansing of the latter ones wasn't required to fix the original issue but
wasn't considered harmful. It nonetheless turned out to be so in case when
pointer to one or both props is passed to led_blink_set() like in the
ledtrig-timer.c. In such cases zeroes are passed later in delay_on and/or
delay_off arguments to led_blink_setup(), which results either in stopping
the software blinking or setting blinking frequency always to 1Hz.
Avoid using led_stop_software_blink() and add a single call required
to clear LED_BLINK_SW flag, which was the only needed modification to
fix the original issue.
Fixes 2b83ff96f5 ("led: core: Fix brightness setting when setting delay_off=0")
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Pull vfs fixes from Al Viro:
- untangle sys_close() abuses in xt_bpf
- deal with register_shrinker() failures in sget()
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix "netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'"
sget(): handle failures of register_shrinker()
mm,vmscan: Make unregister_shrinker() no-op if register_shrinker() failed.
s390:
* Two fixes for potential bitmap overruns in the cmma migration code
x86:
* Clear guest provided GPRs to defeat the Project Zero PoC for CVE
2017-5715
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJaUTJ4AAoJEED/6hsPKofohk0IAJAFlMG66u5MxC0kSM61U4Zf
1vkzRwAkBbcN82LpGQKbqabVyTq0F3aLipyOn6WO5SN0K5m+OI2OV/aAroPyX8bI
F7nWIqTXLhJ9X6KXINFvyavHMprvWl8PA72tR/B/7GhhfShrZ2wGgqhl0vv/kCUK
/8q+5e693yJqw8ceemin9a6kPJrLpmjeH+Oy24KIlGbvJWV4UrIE86pRHnAnBtg8
L7Vbxn5+ezKmakvBh+zF8NKcD1zHDcmQZHoYFPsQT0vX5GPoYqT2bcO6gsh1Grmp
8ti6KkrnP+j2A/OEna4LBWfwKI/1xHXneB22BYrAxvNjHt+R4JrjaPpx82SEB4Y=
=URMR
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"s390:
- Two fixes for potential bitmap overruns in the cmma migration code
x86:
- Clear guest provided GPRs to defeat the Project Zero PoC for CVE
2017-5715"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: vmx: Scrub hardware GPRs at VM-exit
KVM: s390: prevent buffer overrun on memory hotplug during migration
KVM: s390: fix cmma migration for multiple memory slots
since commit 82abbf8d2f the verifier rejects the bit-wise
arithmetic on pointers earlier.
The test 'dubious pointer arithmetic' now has less output to match on.
Adjust it.
Fixes: 82abbf8d2f ("bpf: do not allow root to mangle valid pointers")
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add psock NULL check to handle a racing sock event that can get the
sk_callback_lock before this case but after xchg happens causing the
refcnt to hit zero and sock user data (psock) to be null and queued
for garbage collection.
Also add a comment in the code because this is a bit subtle and
not obvious in my opinion.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
In the current driver, OOB bytes are accessed in raw mode, and when a
page access is done with NDCR_SPARE_EN set and NDCR_ECC_EN cleared, the
driver must read the whole spare area (64 bytes in case of a 2k page,
16 bytes for a 512 page). The driver was only reading the free OOB
bytes, which was leaving some unread data in the FIFO and was somehow
leading to a timeout.
We could patch the driver to read ->spare_size + ->ecc_size instead of
just ->spare_size when READOOB is requested, but we'd better make
in-band and OOB accesses consistent.
Since the driver is always accessing in-band data in non-raw mode (with
the ECC engine enabled), we should also access OOB data in this mode.
That's particularly useful when using the BCH engine because in this
mode the free OOB bytes are also ECC protected.
Fixes: 43bcfd2bb2 ("mtd: nand: pxa3xx: Add driver-specific ECC BCH support")
Cc: stable@vger.kernel.org
Reported-by: Sean Nyekjær <sean.nyekjaer@prevas.dk>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Tested-by: Sean Nyekjaer <sean.nyekjaer@prevas.dk>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Richard Weinberger <richard@nod.at>