Try to lock moved BOs if it's successful we can update the
PTEs directly to the new location.
v2: rebase
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We can actually handle invalid huge pages perfectly fine now.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Suppress warning messages when allocating huge pages fails since we can
always fall back to normal pages.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No one will use this function except ttm_bo_io_mem_pfn() now, so move
the calculation of ttm_bo_default_io_mem_pfn() into ttm_bo_io_mem_pfn()
and do some cleanup.
Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The default interface situation has been taken into the framework, so
remove the default set of each module.
Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
add smu_free_memory when smu fini to prevent memory leakage
v2: squash in typo fix (Yintian) and warning (Harry)
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use adev->vm_manager.id_mgr[0].num_ids rather than hardcoded 16.
v2: use AMDGPU_GFXHUB rather than hardcoded 0 (Christian)
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Noticed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add quirks for handling PX/HG systems. In this case, add
a quirk for a weston dGPU that only seems to properly power
down using ATPX power control rather than HG (_PR3).
v2: append a new weston XT
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Junwei Zhang <Jerry.Zhang@amd.com> (v2)
Reviewed-and-Tested-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
A vlan device with vid 0 is allow to creat by not able to be fully
cleaned up by unregister_vlan_dev() which checks for vlan_id!=0.
Also, VLAN 0 is probably not a valid number and it is kinda
"reserved" for HW accelerating devices, but it is probably too
late to reject it from creation even if makes sense. Instead,
just remove the check in unregister_vlan_dev().
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: ad1afb0039 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hopefully the last set of fixes for 4.15.
iwlwifi
* fix DMA mapping regression since v4.14
wcn36xx
* fix dynamic power save which has been broken since the driver was commited
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJaVLeaAAoJEG4XJFUm622bYBUH/2NhQwUJbKIrbxYhYpo0d++8
GK3TjxpzTByCe0nXGUT5iuDaY72i9C2UoLhFQ5smVg04pE0IXQQjZus6vSGx4biz
GUave/SkzL0EruUXjXLBYWiYDND4iynk82gWX2/Lh7qGoT2SQfD5cKz0cMdE5NrW
E7Q1CMiaoB0i9jcksaU2uWA0XPwISxl61kU2dXuKHQOJ1CW1goI/YIHsBajshHmi
ZEfMqxZFE+jz2Kkp4tKhvG/Xva0ylJv8bwK8CMK6MqA8oa3xdgOhv67E9mm7IGpD
ETrRLJNnWGJlnod5u7QOZWcS01gAgT5whCqDl/lTEty0823kvjMQdwckQaIU+AE=
=me16
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-for-davem-2018-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 4.15
Hopefully the last set of fixes for 4.15.
iwlwifi
* fix DMA mapping regression since v4.14
wcn36xx
* fix dynamic power save which has been broken since the driver was commited
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If one of the child devices is missing the of_mdiobus_register_phy()
call will return -ENODEV. When a missing device is encountered the
registration of the remaining PHYs is stopped and the MDIO bus will
fail to register. Propagate all errors except ENODEV to avoid it.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
gcc-8 reports
net/caif/caif_usb.c: In function 'cfusbl_device_notify':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' output may
be truncated copying 15 bytes from a string of length 15
[-Wstringop-truncation]
The compiler require that the input param 'len' of strncpy() should be
greater than the length of the src string, so that '\0' is copied as
well. We can just use strlcpy() to avoid this warning.
Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kornilios Kourtis <kou@zurich.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
set_fipers() calling should be protected by spinlock in
case that any interrupt breaks related registers setting
and the function we expect. This patch is to move set_fipers()
to spinlock protecting area in ptp_gianfar_adjtime().
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner says:
====================
sctp: Some sockopt optlen fixes
Hangbin Liu reported that some SCTP sockopt are allowing the user to get
the kernel to allocate really large buffers by not having a ceiling on
optlen.
This patchset address this issue (in patch 2), replace an GFP_ATOMIC
that isn't needed and avoid calculating the option size multiple times
in some setsockopt.
====================
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some sockopt handling functions were calculating the length of the
buffer to be written to userspace and then calculating it again when
actually writing the buffer, which could lead to some write not using
an up-to-date length.
This patch updates such places to just make use of the len variable.
Also, replace some sizeof(type) to sizeof(var).
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu reported that some sockopt calls could cause the kernel to log
a warning on memory allocation failure if the user supplied a large optlen
value. That is because some of them called memdup_user() without a ceiling
on optlen, allowing it to try to allocate really large buffers.
This patch adds a ceiling by limiting optlen to the maximum allowed that
would still make sense for these sockopt.
Reported-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So replace it with GFP_USER and also add __GFP_NOWARN.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for devfreq to dynamically control the GPU frequency.
By default try to use the 'simple_ondemand' governor which can
adjust the frequency based on GPU load.
v2: Fix __aeabi_uldivmod issue from the 0 day bot and use
devfreq_recommended_opp() as suggested by Rob.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
A collection of the last-minute small PCM fixes:
- A workaround for the recent regression wrt PulseAudio
- Removal of spurious WARN_ON() that is triggered by syzkaller
- Fixes for aloop, hardening racy accesses
- Fixes in PCM OSS emulation wrt the unabortable loops that may cause
RCU stall
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAlpUeWoOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE8rMxAAy+XJNWigvkWHd79ttKeAmndia/u9d+T6Ge/I
VI/SJSy8vhnGO0YNf/AHEs6vtad73XnXP76x1H3TkCsrDxykfhKogCvp0Aat/Ji7
LQFkhQKsaEdACm2TlPxmxpO64sYB8UjvcZBFS82tCmNCldMkwi8T+DDDHocP0A0D
pOQogjffqPBZdk7X1hJxoVKOm95GI1ms09+JPrLl47aa6mLIvNxa81RGnrVK5blE
+kYZQAblweGN8RsMVWqyrnxgRatF59UbV6JIKui/8KD2AXl3Hya/Dn2aFWtMqqH8
p8siLsUI+tACPucNk7tMt9UjHEy7yGK02hClhYVZG6vZ81nSoJsJFTdwXBMKjrfy
Fa1bBb8quM6WfBEHXB7YISulUrrc2nftkPhB/zIa5E9arkHWY4FL7jhdUTEAjkgr
D0Ka3Q/PtdXxmK+NBqUpoiqDHoOQeA5HG+njsz5L0xSbxoxMy8guyxSaoeF4BOnW
KbrVbzcJzSxDWPYGbKmeLEYHW8P3FOKNv9SI/WZErmyjQkeMiq7AuP93yYACFEyj
LhSAxBZ00sStl6IgM4Unw6p4Gi0SOawQfADDG4Arfr/fRA52l9wmpaUwU3uJ1RMn
gLvLfJkBbs/MwBjD5BPxfdjKIuREvMdUBwl/hZk1zp5d2ay0lAl6toNcee//MuHf
DKd1t3I=
=Fz+p
-----END PGP SIGNATURE-----
Merge tag 'sound-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of the last-minute small PCM fixes:
- A workaround for the recent regression wrt PulseAudio
- Removal of spurious WARN_ON() that is triggered by syzkaller
- Fixes for aloop, hardening racy accesses
- Fixes in PCM OSS emulation wrt the unabortable loops that may cause
RCU stall"
* tag 'sound-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: pcm: Allow aborting mutex lock at OSS read/write loops
ALSA: pcm: Abort properly at pending signal in OSS read/write loops
ALSA: aloop: Fix racy hw constraints adjustment
ALSA: aloop: Fix inconsistent format due to incomplete rule
ALSA: aloop: Release cable upon open error path
ALSA: pcm: Workaround for weird PulseAudio behavior on rewind error
ALSA: pcm: Add missing error checks in OSS emulation plugin builder
ALSA: pcm: Remove incorrect snd_BUG_ON() usages
This register does not contain it. Instead, we have to look into FAULT_TLB_DATA0 & 1
(where, by the way, we can also get the address space).
v2: Right formatting
v3:
- Use 12 (as per the register format) instead of PAGE_SIZE (Chris)
- s/BITS_44_TO_47/HIGHBITS (Chris)
- Right formatting, this time for real
Fixes: b03ec3d67a ("drm/i915: There is only one fault register from GEN8 onwards")
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1513982329-32191-1-git-send-email-oscar.mateo@intel.com
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The alternatives code checks only the first byte whether it is a NOP, but
with NOPs in front of the payload and having actual instructions after it
breaks the "optimized' test.
Make sure to scan all bytes before deciding to optimize the NOPs in there.
Reported-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/20180110112815.mgciyf5acwacphkq@pd.tnic
Daniel Borkmann says:
====================
pull-request: bpf 2018-01-09
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Prevent out-of-bounds speculation in BPF maps by masking the
index after bounds checks in order to fix spectre v1, and
add an option BPF_JIT_ALWAYS_ON into Kconfig that allows for
removing the BPF interpreter from the kernel in favor of
JIT-only mode to make spectre v2 harder, from Alexei.
2) Remove false sharing of map refcount with max_entries which
was used in spectre v1, from Daniel.
3) Add a missing NULL psock check in sockmap in order to fix
a race, from John.
4) Fix test_align BPF selftest case since a recent change in
verifier rejects the bit-wise arithmetic on pointers
earlier but test_align update was missing, from Alexei.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The vmw_view_cmd_to_type() function returns vmw_view_max (3) on error.
It's one element beyond the end of the vmw_view_cotables[] table.
My read on this is that it's possible to hit this failure. header->id
comes from vmw_cmd_check() and it's a user controlled number between
1040 and 1225 so we can hit that error. But I don't have the hardware
to test this code.
Fixes: d80efd5cb3 ("drm/vmwgfx: Initial DX support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: <stable@vger.kernel.org>
While moving code around for solving lockdep issue for GuC log relay,
spotted that uc_fini_wq is not being called in failure path in gem_init.
Missed in the below commit. Add it.
v2: Removed GEM_BUG_ON(!HAS_GUC()) from intel_uc_fini_wq as init happens
only based on enable_guc module parameter and does not consider has_guc
capability. (Michal)
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Fixes: 3176ff49bc ("drm/i915/guc: Move GuC workqueue allocations outside of the mutex")
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1515588857-10283-1-git-send-email-sagar.a.kamble@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Even though the default countable for CP0 is CP_ALWAYS_COUNT (0),
program the selector during HW initialization in an effort to be
up front about which counters are programmed and why.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Some 5xx based chipsets have different bins for GPU clock speeds.
Read the fuses (if applicable) and set the appropriate OPP table.
This will only work with OPP v2 tables - the bin will be ignored
for legacy pwrlevel tables.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Move the clock parsing to adreno_gpu_init() to allow for target
specific probing and manipulation of the clock tables.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
We don't need to convert the chipid to an intermediate value and
then back again into a struct adreno_rev.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Remove the downstream bus scaling code. It isn't needed for for
compatibility with a downstream or vendor kernel. Get it out of the
way to clear space for devfreq support.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Calling dev_pm_opp_find_freq_floor() returns the matched frequency
in 'freq'. We don't need to call dev_pm_opp_get_freq() again
to get the frequency value.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
We need to call dev_pm_opp_put() to put back the reference
for the OPP struct after calling the various dev_pm_opp_get_*
functions.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
When cleaning up after a partially successful gntdev_mmap(), unmap the
successfully mapped grant pages otherwise Xen will kill the domain if
in debug mode (Attempt to implicitly unmap a granted PTE) or Linux will
kill the process and emit "BUG: Bad page map in process" if Xen is in
release mode.
This is only needed when use_ptemod is true because gntdev_put_map()
will unmap grant pages itself when use_ptemod is false.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
If the requested range has a hole, the calculation of the number of
pages to unmap is off by one. Fix it.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Since commit f11a04464a ("i2c: gpio: Enable working over slow
can_sleep GPIOs"), probing the i2c RTC connected to an i2c-gpio bus on
r8a7740/armadillo fails with:
rtc-s35390a 0-0030: error resetting chip
rtc-s35390a: probe of 0-0030 failed with error -5
More debug code reveals:
i2c i2c-0: master_xfer[0] R, addr=0x30, len=1
i2c i2c-0: NAK from device addr 0x30 msg #0
s35390a_get_reg: ret = -6
Commit 02e479808b ("gpio: Alter semantics of *raw* operations to
actually be raw") moved open drain/source handling from
gpiod_set_raw_value_commit() to gpiod_set_value(), but forgot to take
into account that gpiod_set_value_cansleep() also needs this handling.
The i2c protocol mandates that i2c signals are open drain, hence i2c
communication fails.
Fix this by adding the missing handling to gpiod_set_value_cansleep(),
using a new common helper gpiod_set_value_nocheck().
Fixes: 02e479808b ("gpio: Alter semantics of *raw* operations to actually be raw")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
[removed underscore syntax, added kerneldoc]
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The SOR0 found on Tegra124 and Tegra210 only supports eDP and LVDS and
therefore has a slightly different clock tree than the SOR1 which does
not support eDP, but HDMI and DP instead.
Commit e1335e2f0c ("drm/tegra: sor: Reimplement pad clock") breaks
setups with eDP because the sor->clk_out clock is uninitialized and
therefore setting the parent clock (either the safe clock or either of
the display PLLs) fails, which can cause hangs later on since there is
no clock driving the module.
Fix this by falling back to the module clock for sor->clk_out on those
setups. This guarantees that the module will always be clocked by an
enabled clock and hence prevents those hangs.
Fixes: e1335e2f0c ("drm/tegra: sor: Reimplement pad clock")
Reported-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
New device-tree properties are available which tell the hypervisor
settings related to the RFI flush. Use them to determine the
appropriate flush instruction to use, and whether the flush is
required.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
A new hypervisor call is available which tells the guest settings
related to the RFI flush. Use it to query the appropriate flush
instruction(s), and whether the flush is required.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Because there may be some performance overhead of the RFI flush, add
kernel command line options to disable it.
We add a sensibly named 'no_rfi_flush' option, but we also hijack the
x86 option 'nopti'. The RFI flush is not the same as KPTI, but if we
see 'nopti' we can guess that the user is trying to avoid any overhead
of Meltdown mitigations, and it means we don't have to educate every
one about a different command line option.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On some CPUs we can prevent the Meltdown vulnerability by flushing the
L1-D cache on exit from kernel to user mode, and from hypervisor to
guest.
This is known to be the case on at least Power7, Power8 and Power9. At
this time we do not know the status of the vulnerability on other CPUs
such as the 970 (Apple G5), pasemi CPUs (AmigaOne X1000) or Freescale
CPUs. As more information comes to light we can enable this, or other
mechanisms on those CPUs.
The vulnerability occurs when the load of an architecturally
inaccessible memory region (eg. userspace load of kernel memory) is
speculatively executed to the point where its result can influence the
address of a subsequent speculatively executed load.
In order for that to happen, the first load must hit in the L1,
because before the load is sent to the L2 the permission check is
performed. Therefore if no kernel addresses hit in the L1 the
vulnerability can not occur. We can ensure that is the case by
flushing the L1 whenever we return to userspace. Similarly for
hypervisor vs guest.
In order to flush the L1-D cache on exit, we add a section of nops at
each (h)rfi location that returns to a lower privileged context, and
patch that with some sequence. Newer firmwares are able to advertise
to us that there is a special nop instruction that flushes the L1-D.
If we do not see that advertised, we fall back to doing a displacement
flush in software.
For guest kernels we support migration between some CPU versions, and
different CPUs may use different flush instructions. So that we are
prepared to migrate to a machine with a different flush instruction
activated, we may have to patch more than one flush instruction at
boot if the hypervisor tells us to.
In the end this patch is mostly the work of Nicholas Piggin and
Michael Ellerman. However a cast of thousands contributed to analysis
of the issue, earlier versions of the patch, back ports testing etc.
Many thanks to all of them.
Tested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The KVM_PPC_ALLOCATE_HTAB ioctl(), implemented by kvmppc_alloc_reset_hpt()
is supposed to completely clear and reset a guest's Hashed Page Table (HPT)
allocating or re-allocating it if necessary.
In the case where an HPT of the right size already exists and it just
zeroes it, it forces a TLB flush on all guest CPUs, to remove any stale TLB
entries loaded from the old HPT.
However, that situation can arise when the HPT is resizing as well - or
even when switching from an RPT to HPT - so those cases need a TLB flush as
well.
So, move the TLB flush to trigger in all cases except for errors.
Cc: stable@vger.kernel.org # v4.10+
Fixes: f98a8bf9ee ("KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size")
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Commit 96df226 ("KVM: PPC: Book3S PR: Preserve storage control bits")
added code to preserve WIMG bits but it missed 2 special cases:
- a magic page in kvmppc_mmu_book3s_64_xlate() and
- guest real mode in kvmppc_handle_pagefault().
For these ptes, WIMG was 0 and pHyp failed on these causing a guest to
stop in the very beginning at NIP=0x100 (due to bd9166ffe "KVM: PPC:
Book3S PR: Exit KVM on failed mapping").
According to LoPAPR v1.1 14.5.4.1.2 H_ENTER:
The hypervisor checks that the WIMG bits within the PTE are appropriate
for the physical page number else H_Parameter return. (For System Memory
pages WIMG=0010, or, 1110 if the SAO option is enabled, and for IO pages
WIMG=01**.)
This hence initializes WIMG to non-zero value HPTE_R_M (0x10), as expected
by pHyp.
[paulus@ozlabs.org - fix compile for 32-bit]
Cc: stable@vger.kernel.org # v4.11+
Fixes: 96df226 "KVM: PPC: Book3S PR: Preserve storage control bits"
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Tested-by: Ruediger Oertel <ro@suse.de>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
smp_call_function_many() requires disabling preemption around the call.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: <stable@vger.kernel.org> # v4.14+
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Andrew Hunter <ahh@google.com>
Cc: Avi Kivity <avi@scylladb.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maged Michael <maged.michael@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171215192310.25293-1-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This contains what I hope are the last RISC-V changes to go into 4.15.
I know it's a bit last minute, but I think they're all fairly small
changes:
* SR_* constants have been renamed to match the latest ISA
specification.
* Some CONFIG_MMU #ifdef cruft has been removed. We've never supported
!CONFIG_MMU.
* __NR_riscv_flush_icache is now visible to userspace. We were hoping
to avoid making this public in order to force userspace to call the
vDSO entry, but it looks like QEMU's user-mode emulation doesn't want
to emulate a vDSO. In order to allow glibc to fall back to a system
call when the vDSO entry doesn't exist we're just
* Our defconfig is no long empty. This is another one that just slipped
through the cracks. The defconfig isn't perfect, but it's at least
close to what users will want for the first RISC-V development board.
Getting closer is kind of splitting hairs here: none of the RISC-V
specific drivers are in yet, so it's not like things will boot out of
the box.
The only one that's strictly necessary is the __NR_riscv_flush_icache
change, as I want that to be part of the public API starting from our
first kernel so nobody has to worry about it. The others are nice to
haves, but they seem sane for 4.15 to me.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAlpU/UcTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRDvTKFQLMurQdy5D/9fcTwXTk98U2gSoR4Dv25tztqbNMhw
+Lae5EeIqAaPI4xfyLGldJe0BWAJaouZWIY5xkB5JWzsdYPx/jYgC+SbwI/3aGVy
VjcU0d4haZtz2kdm0Y0ZKIGg91vDlULoVvcxrM8Jff0gDmyKoT1OjwKpt3esyhmN
Vc+iC0FxtJow/xIaFlnPa42qh/pFkcLDmmY/Im6N8IEcyHBT6vCDnD3CgCFY/hdu
9vcWJDvFBj4SFwL8y+ajspQ4tPzDt4Ko+3NLxtEv+19y3NEgLm+shbxv/J8AVO8O
BvBr51QfggM2rAqGzCa4nEZZR7Roxgg9bJVQARXyzX1tUhtBEz9+eUArJ0tzMtbx
GyXYY5NwyupDJ/MA9yn+GqYlLNnS2yL2y0zIBJehi/37+KpAFtH/cRnA58sXViqw
IKGhKW7JCGU3/xyW+RtuY3N5urU18+qE4CZRLtI5QN0QRcTWLhqqQvQRud86HqqD
g4KPo6g9Z6Ak9Xu81n/liIExp3Vp2kpQUts1lCF1D+4WYRwpb4Mqy4HiOCSf/OO2
wOuX5HY+tbS8yvupgYjszTXaYDn35RoGkcjK9o1Lkq9RgI5kzHDyaQrSK/c/oAzn
A7cJ2z7dBaV0W4O7R+2SJ2k9DHw1db/WVf19pKVjSi5osSoUds5w1YxHK25cSBUz
+47LVCgkQI/Scw==
=PhUK
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-4.15-rc8_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux
Pull RISC-V updates from Palmer Dabbelt:
"This contains what I hope are the last RISC-V changes to go into 4.15.
I know it's a bit last minute, but I think they're all fairly small
changes:
- SR_* constants have been renamed to match the latest ISA
specification.
- Some CONFIG_MMU #ifdef cruft has been removed. We've never
supported !CONFIG_MMU.
- __NR_riscv_flush_icache is now visible to userspace. We were hoping
to avoid making this public in order to force userspace to call the
vDSO entry, but it looks like QEMU's user-mode emulation doesn't
want to emulate a vDSO. In order to allow glibc to fall back to a
system call when the vDSO entry doesn't exist we're just
- Our defconfig is no long empty. This is another one that just
slipped through the cracks. The defconfig isn't perfect, but it's
at least close to what users will want for the first RISC-V
development board. Getting closer is kind of splitting hairs here:
none of the RISC-V specific drivers are in yet, so it's not like
things will boot out of the box.
The only one that's strictly necessary is the __NR_riscv_flush_icache
change, as I want that to be part of the public API starting from our
first kernel so nobody has to worry about it. The others are nice to
haves, but they seem sane for 4.15 to me"
* tag 'riscv-for-linus-4.15-rc8_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux:
riscv: rename SR_* constants to match the spec
riscv: remove CONFIG_MMU ifdefs
RISC-V: Make __NR_riscv_flush_icache visible to userspace
RISC-V: Add a basic defconfig
Pull MIPS fixes from Ralf Baechle:
"Another round of MIPS fixes for 4.15.
- Maciej Rozycki found another series of FP issues which requires a
seven part series to restructure and fix.
- James fixes a warning about .set mt which gas doesn't like when
building for R1 processors"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task
MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses
MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_SETREGSET
MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA
MIPS: Consistently handle buffer counter with PTRACE_SETREGSET
MIPS: Guard against any partial write attempt with PTRACE_SETREGSET
MIPS: Factor out NT_PRFPREG regset access helpers
MIPS: CPS: Fix r1 .set mt assembler warning
The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.
A quote from goolge project zero blog:
"At this point, it would normally be necessary to locate gadgets in
the host kernel code that can be used to actually leak data by reading
from an attacker-controlled location, shifting and masking the result
appropriately and then using the result of that as offset to an
attacker-controlled address for a load. But piecing gadgets together
and figuring out which ones work in a speculation context seems annoying.
So instead, we decided to use the eBPF interpreter, which is built into
the host kernel - while there is no legitimate way to invoke it from inside
a VM, the presence of the code in the host kernel's text section is sufficient
to make it usable for the attack, just like with ordinary ROP gadgets."
To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
option that removes interpreter from the kernel in favor of JIT-only mode.
So far eBPF JIT is supported by:
x64, arm64, arm32, sparc64, s390, powerpc64, mips64
The start of JITed program is randomized and code page is marked as read-only.
In addition "constant blinding" can be turned on with net.core.bpf_jit_harden
v2->v3:
- move __bpf_prog_ret0 under ifdef (Daniel)
v1->v2:
- fix init order, test_bpf and cBPF (Daniel's feedback)
- fix offloaded bpf (Jakub's feedback)
- add 'return 0' dummy in case something can invoke prog->bpf_func
- retarget bpf tree. For bpf-next the patch would need one extra hunk.
It will be sent when the trees are merged back to net-next
Considered doing:
int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
but it seems better to land the patch as-is and in bpf-next remove
bpf_jit_enable global variable from all JITs, consolidate in one place
and remove this jit_init() function.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>