Swap the I/O memory read value back to cpu endianness before storing it in
a data structures which are defined in the MPI headers where u8 components
are not defined in the endianness order.
In this area from day one mpt3sas driver is using le32_to_cpu() &
cpu_to_le32() APIs. But in commit cf6bf9710c
(mpt3sas: Bug fix for big endian systems) we have removed these APIs
before reading I/O memory which we should haven't done it. So
in this patch I am correcting it by adding these APIs back
before accessing I/O memory.
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
- Refuse invalid port numbers in the modify_qp system call
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCgAGBQJbZH4AAAoJEDht9xV+IJsausAP/RBqXIgC7qvZGrrmhBo5dF1j
n3+fmynR6KaGDMYRlEbuj+Ce70HbOWyvHx/KgyEMtejjtldVvQqBmGYGR6Dcpn0/
NA3VWFm7XaM+Oxg136NwkWtdiwUt0a/Ois6wSZsY6XJkXzTKmlyImcDJx1oQwvxZ
+WwVx7t3kWoCWTXlcbAmz41gDHJFD7vszavhgz0o7Ik1IloYs4bV7NVgGw/athan
2M8Df7tZNHhMgZDFhp3HeMbkjgApwnSMbGwtaEuaBZF4y1yEw7tp/xn6b87LzP9N
1JoVPenCp5+Pzw73FgT2sPY9gfI6RieHH/ZGM09Ng9awwPe44pIPnbH9AF6KxlBB
xFZT43i33CP1+85NFeSyxDeW3wwYXi48Qup7brtJ91LyD/KUmihAyoBuVaFbfYOr
lOPn8aMBvyumprSjAGzCiiEO8nrwoGs8ZRMJeQXyBkpT9siPvLAMUt5ZweWl4rD2
48/oR+KOrJ2rJCLwsmuQBoevxAJZ0/FVlg/o4oBk8ntjbDKVJzTZgEccqN+6sKrj
W3CgK8RHpXWWIScZHcAseQvdKrrOPNtNszOYtXFF8QjdP1QzQvoisrd9yM1L+/+g
RlsvMpLTHTF0S+d2mGNtB0xY6jh5HOAT62YNJGt8GNJ4d5c23bQ3s+SuiWLrF+6f
jKER3Mz0YbwwVR07Ai2h
=8q1F
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fix from Jason Gunthorpe:
"One bug for missing user input validation: refuse invalid port numbers
in the modify_qp system call"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/uverbs: Expand primary and alt AV port checks
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAltkcI4QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpuByD/9Pf8mfsv/WDJposa5pRqnY18dtBNdfHjsb
t6vuBZiLxe86LDo1Y3e7663u+kgraARyswUpNTv5iAKseVkwIcz0RTArQJkaDLy7
AD2BEfLvpcacAoWZ96MiayjQabGXEnTWqiKnIlcbc6J4/LIfdYYZHd+SXUXH0R81
BaXPANL222srxuGbBifsfZ/yXxNPmBYfT1mTaFU6cKAftVO/24RV/MbwkU4cgym7
0ZM/iBDkFDy6eV3XhhGeX1miDhGa2zNM/NHvsnSsP8jQqlyJW+lpI5dRPEoH1Fzp
ecpxSH/HtjW5GYMwP7qhHMth/XHhjkOmNDRblkRWYrMGxgy7mAcSR8axdjqTdDxD
+vlez+b9uPF3qXUk1D2NkW3RmTSkxjV6ztzc+yddbFSzqmQcHSqH+7ohaHnQRefX
IGngTOq5pSmluNmrul48y/wmkSwVI1/jrteOfJfhJXWL0pY65g+8KWQxLmeZHke7
ytFu6msOsVmZjDQ7tckSxSLMB+frvzuc/h+eOBTx8ida0bjopVlfNUcLQRD4kZfy
xHn6nPxGsdiq2PM71nxYvqoUfmZnIE3o0A9+IB4OIp5bLVAGt2/fbWV3llunGK0A
JM8UmyOysh71xh288VyE/hPz/7tyea/KgTaTm0jdTkTeoHH2isHsgWWvYB3zjeEn
/1c7o2T13Q==
=Ml3z
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20180803' of git://git.kernel.dk/linux-block
Pull block fix from Jens Axboe:
"Just a single fix, from Ming, fixing a regression in this cycle where
the busy tag iteration was changed to only calling the callback
function for requests that are started. We really want all non-free
requests.
This fixes a boot regression on certain VM setups"
* tag 'for-linus-20180803' of git://git.kernel.dk/linux-block:
blk-mq: fix blk_mq_tagset_busy_iter
One fix for a regression in a recent TLB flush optimisation, which caused us to
incorrectly not send TLB invalidations to coprocessors.
Thanks to:
Frederic Barrat, Nicholas Piggin, Vaibhav Jain.
-----BEGIN PGP SIGNATURE-----
iQIwBAABCAAaBQJbZDxOExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYAm
7w//QcYYpBcBu9XK0E53XglSoC4iTzMrPnIIivPsSngLXWtEC/5315nbCOdhT4mh
CTYxmwCm0pZG4kxJntnT4MXwSLhM46eFs586uOS1BgEJw+m85m2tiEsrNQSogJ14
GzZiIk1UbzKUZjpSTIXigx0Um6x+q83kIBhd/ewrejB1wz6d5T7Wktjv2iWewGdf
kOJPXuvaWXeGmJWJ2Gek3T5j7PCII0humEQjsy40XVFR3NYC7jT1r4UONE3NeBTU
xavo/EcB3JHx2MPPgAfJT/KAiHLQZzuR382azf7W5sQJ7/Yy2N1jpskSxFoMEMtc
axYWniHfGwJsnwcO4HPisIA67tXR6EzugAyuta4IQufRVp1s+j2WIzyqjl4uOB7Q
v5ZXBH/Y4wShqVHhUzzQasiJgGYZoUoeOvJIoFNEo5f6UFuT/l9mkuM4vPtNKi6i
NpT2Hj/hTATNRGyrBnvjcktS8uHbVxSHE6Cm4ZHSfQ8POMulUKvhHTZ7gYsxpxUE
ktKCNgNL57QHRQojY0AftpEMLrDYiJEE9+6OLjEk3Mf7ayiZAAjIpy7LRD9+p8Q8
eomEXmwQr1YOXGgvS0mFFPd+P2PfJoQztd1fWZtZIY1xksdwGep65e24hBOEjCAV
7vVx+3pm+dekEFVf976ZYgACHCjOn8Efd5t+AK7b8S0u0uw=
=Z81h
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.18-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"One fix for a regression in a recent TLB flush optimisation, which
caused us to incorrectly not send TLB invalidations to coprocessors.
Thanks to Frederic Barrat, Nicholas Piggin, Vaibhav Jain"
* tag 'powerpc-4.18-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s/radix: Fix missing global invalidations when removing copro
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbY+nYAAoJEAx081l5xIa+/bkP+wfj0XStqhexEAENgxQ3nzNc
YgHhXNMW4s/OKMqZZgYLbZNN7dYZ+HqAzGlCvgyDcX1aJ1vSx+Ozua9KK8xqWBFY
VuhrklY8ppLqTz+F9M2CgDvmRJfb0ncfNndA0UPmYcno8sljrDhiA+dhAQVNAxY+
cu8dPXi3ra23eIpid2tomHVZpntACwbu1Vya5BjWn/Q3FMjWaLP/wcimy2NVoagx
vr9qmuDtipLxGQZVNgoRfY2BZfzIbfkDTXBkKFz/esJiDCehpgckk53yr4v1juCI
kx1QkiOg1bAgdDOsOMc2p23DF64fS469AExL/vKLB7OAk1MCY1EIX/3wk4eZ80uq
q2p1SNcfYjySX2k7AaYlLDLndDhPaPxdQxDnF469i2it3EZ+aaeOIg4DFEfZ8na6
QgX5wSKLpyjIo545mw94nGQ/3Dbj/3890HHWrPAgK+XoUL0+ms1cjEi75UJqsXDI
JN/fX2H3rkII8nDH3FiKHVuutLHdqQ3eZc2SkRtj3UhlpWDZ0eikKThgc2bdTAC1
zPJJlpYXBlZiSgL5a/ek4JLVf+k/JN2wLgDt5OulKc/tvqQ4xt4KsuWD7WTFnV8Q
le/KsumBbFfdpuB98Qtv50yFCIINIAk/vIIwUzCk8mTgk163KlE1W8m9tXpgjDNU
hRYBKTZNzTlnoiJ3VEAC
=rjIZ
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-2018-08-03' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Nothing too major at this late stage:
- adv7511: reset fix
- vc4: scaling fix
- two atomic core fixes
- one legacy core error handling fix
I had a bunch of driver fixes from hdlcd but I think I'll leave them
for -next at this point"
* tag 'drm-fixes-2018-08-03' of git://anongit.freedesktop.org/drm/drm:
drm/vc4: Reset ->{x, y}_scaling[1] when dealing with uniplanar formats
drm/atomic: Initialize variables in drm_atomic_helper_async_check() to make gcc happy
drm/atomic: Check old_plane_state->crtc in drm_atomic_helper_async_check()
drm: re-enable error handling
drm/bridge: adv7511: Reset registers on hotplug
Pull crypto fix from Herbert Xu:
"Fix a memory corruption in the padlock-aes driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: padlock-aes - Fix Nano workaround data corruption
The full nohz tick is reprogrammed in irq_exit() only if the exit is not in
a nesting interrupt. This stands as an optimization: whether a hardirq or a
softirq is interrupted, the tick is going to be reprogrammed when necessary
at the end of the inner interrupt, with even potential new updates on the
timer queue.
When soft interrupts are interrupted, it's assumed that they are executing
on the tail of an interrupt return. In that case tick_nohz_irq_exit() is
called after softirq processing to take care of the tick reprogramming.
But the assumption is wrong: softirqs can be processed inline as well, ie:
outside of an interrupt, like in a call to local_bh_enable() or from
ksoftirqd.
Inline softirqs don't reprogram the tick once they are done, as opposed to
interrupt tail softirq processing. So if a tick interrupts an inline
softirq processing, the next timer will neither be reprogrammed from the
interrupting tick's irq_exit() nor after the interrupted softirq
processing. This situation may leave the tick unprogrammed while timers are
armed.
To fix this, simply keep reprogramming the tick even if a softirq has been
interrupted. That can be optimized further, but for now correctness is more
important.
Note that new timers enqueued in nohz_full mode after a softirq gets
interrupted will still be handled just fine through self-IPIs triggered by
the timer code.
Reported-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: stable@vger.kernel.org # 4.14+
Link: https://lkml.kernel.org/r/1533303094-15855-1-git-send-email-frederic@kernel.org
Add initial support for s390 auxiliary traces using the CPU-Measurement
Sampling Facility.
Support and ignore PERF_REPORT_AUXTRACE_INFO records in the perf data
file. Later patches will show the contents of the auxiliary traces.
Setup the auxtrace queues and data structures for s390. A raw dump of
the perf.data file now does not show an error when an auxtrace event is
encountered.
Output before:
[root@s35lp76 perf]# ./perf report -D -i perf.data.auxtrace
0x128 [0x10]: failed to process type: 70
Error:
failed to process sample
0x128 [0x10]: event: 70
.
. ... raw event: size 16 bytes
. 0000: 00 00 00 46 00 00 00 10 00 00 00 00 00 00 00 00 ...F............
0x128 [0x10]: PERF_RECORD_AUXTRACE_INFO type: 0
[root@s35lp76 perf]#
Output after:
# ./perf report -D -i perf.data.auxtrace |fgrep PERF_RECORD_AUXTRACE
0 0 0x128 [0x10]: PERF_RECORD_AUXTRACE_INFO type: 5
0 0 0x25a66 [0x30]: PERF_RECORD_AUXTRACE size: 0x40000
offset: 0 ref: 0 idx: 4 tid: -1 cpu: 4
....
Additional notes about the underlying hardware and software
implementation, provided by Hendrik Brueckner (see Link: below).
=============================================================================
The CPU-Measurement Facility (CPU-MF) provides a set of functions to obtain
performance information on the mainframe. Basically, it was introduced
with System z10 years ago for the z/Architecture, that means, 64-bit.
For Linux, there are two facilities of interest, counter facility and sampling
facility. The counter facility provides hardware counters for instructions,
cycles, crypto-activities, and many more.
The sampling facility is a hardware sampler that when started will write
samples at a particular interval into a sampling buffer. At some point,
for example, if a sample block is full, it generates an interrupt to collect
samples (while the sampler continues to run).
Few years ago, I started to provide the a perf PMU to use the counter
and sampling facilities. Recently, the device driver was updated to also
"export" the sampling buffer into the AUX area. Thomas now completed the
related perf work to interpret and process these AUX data.
If people are more interested in the sampling facility, they can have a
look into:
- The Load-Program-Parameter and the CPU-Measurement Facilities, SA23-2260-05
http://www-01.ibm.com/support/docview.wss?uid=isg26fcd1cc32246f4c8852574ce0044734a
and to learn how-to use it for Linux on Z, have look at chapter 54,
"Using the CPU-measurement facilities" in the:
- Device Drivers, Features, and Commands, SC33-8411-34
http://public.dhe.ibm.com/software/dw/linux390/docu/l416dd34.pdf
=============================================================================
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Link: http://lkml.kernel.org/r/20180803100758.GA28475@linux.ibm.com
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180802074622.13641-2-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The support of force threading interrupts which are set up with both a
primary and a threaded handler wreckaged the setup of regular requested
threaded interrupts (primary handler == NULL).
The reason is that it does not check whether the primary handler is set to
the default handler which wakes the handler thread. Instead it replaces the
thread handler with the primary handler as it would do with force threaded
interrupts which have been requested via request_irq(). So both the primary
and the thread handler become the same which then triggers the warnon that
the thread handler tries to wakeup a not configured secondary thread.
Fortunately this only happens when the driver omits the IRQF_ONESHOT flag
when requesting the threaded interrupt, which is normaly caught by the
sanity checks when force irq threading is disabled.
Fix it by skipping the force threading setup when a regular threaded
interrupt is requested. As a consequence the interrupt request which lacks
the IRQ_ONESHOT flag is rejected correctly instead of silently wreckaging
it.
Fixes: 2a1d3ab898 ("genirq: Handle force threading of irqs with primary and thread handler")
Reported-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
Cc: stable@vger.kernel.org
Peter is objecting to the direct PMU access in RDT. Right now the PMU usage
is broken anyway as it is not coordinated with perf.
Until this discussion settled, disable the PMU mechanics by simply
rejecting the type '2' measurement in the resctrl file.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
CC: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: hpa@zytor.com
Future Intel processors will support "Enhanced IBRS" which is an "always
on" mode i.e. IBRS bit in SPEC_CTRL MSR is enabled once and never
disabled.
From the specification [1]:
"With enhanced IBRS, the predicted targets of indirect branches
executed cannot be controlled by software that was executed in a less
privileged predictor mode or on another logical processor. As a
result, software operating on a processor with enhanced IBRS need not
use WRMSR to set IA32_SPEC_CTRL.IBRS after every transition to a more
privileged predictor mode. Software can isolate predictor modes
effectively simply by setting the bit once. Software need not disable
enhanced IBRS prior to entering a sleep state such as MWAIT or HLT."
If Enhanced IBRS is supported by the processor then use it as the
preferred spectre v2 mitigation mechanism instead of Retpoline. Intel's
Retpoline white paper [2] states:
"Retpoline is known to be an effective branch target injection (Spectre
variant 2) mitigation on Intel processors belonging to family 6
(enumerated by the CPUID instruction) that do not have support for
enhanced IBRS. On processors that support enhanced IBRS, it should be
used for mitigation instead of retpoline."
The reason why Enhanced IBRS is the recommended mitigation on processors
which support it is that these processors also support CET which
provides a defense against ROP attacks. Retpoline is very similar to ROP
techniques and might trigger false positives in the CET defense.
If Enhanced IBRS is selected as the mitigation technique for spectre v2,
the IBRS bit in SPEC_CTRL MSR is set once at boot time and never
cleared. Kernel also has to make sure that IBRS bit remains set after
VMEXIT because the guest might have cleared the bit. This is already
covered by the existing x86_spec_ctrl_set_guest() and
x86_spec_ctrl_restore_host() speculation control functions.
Enhanced IBRS still requires IBPB for full mitigation.
[1] Speculative-Execution-Side-Channel-Mitigations.pdf
[2] Retpoline-A-Branch-Target-Injection-Mitigation.pdf
Both documents are available at:
https://bugzilla.kernel.org/show_bug.cgi?id=199511
Originally-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim C Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/1533148945-24095-1-git-send-email-sai.praneeth.prakhya@intel.com
Some Intel processors have an EPT feature whereby the accessed & dirty bits
in EPT entries can be updated by HW. MSR IA32_VMX_EPT_VPID_CAP exposes the
presence of this capability.
There is no point in trying to use that new feature bit in the VMX code as
VMX needs to read the MSR anyway to access other bits, but having the
feature bit for EPT_AD in place helps virtualization management as it
exposes "ept_ad" in /proc/cpuinfo/$proc/flags if the feature is present.
[ tglx: Amended changelog ]
Signed-off-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: Peter Shier <pshier@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20180801180657.138051-1-pshier@google.com
Code is emitting the following error message during boot on systems
without PMU hardware support while probing NMI capability.
NMI watchdog: Perf event create on CPU 0 failed with -2
This error is emitted as the perf subsystem returns -ENOENT due to lack of
PMUs in the system.
It is followed by the warning that NMI watchdog is disabled:
NMI watchdog: Perf NMI watchdog permanently disabled
While NMI disabled information is useful for ordinary users, seeing a PERF
event create failed with error code -2 is not.
Reduce the message severity to debug so that if debugging is still possible
in case the error code returned by perf is required for analysis.
Signed-off-by: Sinan Kaya <okaya@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=599368
Link: https://lkml.kernel.org/r/20180803060943.2643-1-okaya@kernel.org
The shell file for test_lwt_seg6local contains an early iproute2 syntax
for installing a seg6local End.BPF route. iproute2 support for this
feature has recently been upstreamed, but with an additional keyword
required. This patch updates test_lwt_seg6local.sh to the definitive
iproute2 syntax
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbY5PfAAoJEAhfPr2O5OEVCTsP/2d8kSVYtdMbdqwvgNX/Tvnb
xaEnQiZWirS5E8v1t+M6qB5L/EvOGbGf0O+6Fvzg1lUoHu/Ni4v3b/7CauRCD4ct
XSR6tQXJ9ySglr5TfY+p2JZvls6R1sDRpOTnWtrllJ4KPALrB4ogavPD8gHV11Ut
Eza4eARzPrPl0HTFLbUwijiAN+HBFndx8GQIZ8pc/4lVWS4bMQKae+Ev7Crj2rNZ
hx84QGzTcYmzfqrXJJWOgBtuEg1sN7zuoVtytarGLTxQL97I7761pS1cP57P2wL1
TQKsK95k1yENBOA2bGg3OGWxMtlhDsCUs2vIpqh1fh/5WXkwJfu6k93E9SosV5YG
yfTvRbhfqFuLfOvQEz2CdYjKWjXhF+6If1bV35jNFvmGZrA1dtm2hEtVEUe4SOjk
Xm8a/zwV1u1w8axH89KNQUMpiW6Ad6w/VkZKgsBYzf1vbR5TDHSPnnriZ9EKLVvC
XOPYLbxlxSxaJF3xwtexH9pI4Ah9RlrmzwXIxHCRH2kzz0sPRVAPPq8EmQhIdZZ5
COKmNoYE3hKjEMdp6obOTyJOEuhoH8Y/rQF6uSOt2VM6ug1UkAJ7kGeAaNWBzA94
9IPlAL+W0xpviRc3G5q3FP/06Ok0HHrPlHPFV06/n/fRrcX15tPseCqOzCsiOtcv
vvIDjdyJQcrzB0Tq8eBk
=TwSI
-----END PGP SIGNATURE-----
Merge tag 'media/v4.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- a deadlock regression at vsp1 driver
- some Remote Controller fixes related to the new BPF filter logic
added on it for Kernel 4.18.
* tag 'media/v4.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: v4l: vsp1: Fix deadlock in VSPDL DRM pipelines
media: rc: read out of bounds if bpf reports high protocol number
media: bpf: ensure bpf program is freed on detach
media: rc: be less noisy when driver misbehaves
Merge misc fixes from Andrew Morton:
"3 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
userfaultfd: remove uffd flags from vma->vm_flags if UFFD_EVENT_FORK fails
ipc/shm.c add ->pagesize function to shm_vm_ops
memcg: remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
The fix in commit 0cbb4b4f4c ("userfaultfd: clear the
vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails") cleared the
vma->vm_userfaultfd_ctx but kept userfaultfd flags in vma->vm_flags
that were copied from the parent process VMA.
As the result, there is an inconsistency between the values of
vma->vm_userfaultfd_ctx.ctx and vma->vm_flags which triggers BUG_ON
in userfaultfd_release().
Clearing the uffd flags from vma->vm_flags in case of UFFD_EVENT_FORK
failure resolves the issue.
Link: http://lkml.kernel.org/r/1532931975-25473-1-git-send-email-rppt@linux.vnet.ibm.com
Fixes: 0cbb4b4f4c ("userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails")
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Reported-by: syzbot+121be635a7a35ddb7dcb@syzkaller.appspotmail.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 05ea88608d ("mm, hugetlbfs: introduce ->pagesize() to
vm_operations_struct") adds a new ->pagesize() function to
hugetlb_vm_ops, intended to cover all hugetlbfs backed files.
With System V shared memory model, if "huge page" is specified, the
"shared memory" is backed by hugetlbfs files, but the mappings initiated
via shmget/shmat have their original vm_ops overwritten with shm_vm_ops,
so we need to add a ->pagesize function to shm_vm_ops. Otherwise,
vma_kernel_pagesize() returns PAGE_SIZE given a hugetlbfs backed vma,
result in below BUG:
fs/hugetlbfs/inode.c
443 if (unlikely(page_mapped(page))) {
444 BUG_ON(truncate_op);
resulting in
hugetlbfs: oracle (4592): Using mlock ulimits for SHM_HUGETLB is deprecated
------------[ cut here ]------------
kernel BUG at fs/hugetlbfs/inode.c:444!
Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 ...
CPU: 35 PID: 5583 Comm: oracle_5583_sbt Not tainted 4.14.35-1829.el7uek.x86_64 #2
RIP: 0010:remove_inode_hugepages+0x3db/0x3e2
....
Call Trace:
hugetlbfs_evict_inode+0x1e/0x3e
evict+0xdb/0x1af
iput+0x1a2/0x1f7
dentry_unlink_inode+0xc6/0xf0
__dentry_kill+0xd8/0x18d
dput+0x1b5/0x1ed
__fput+0x18b/0x216
____fput+0xe/0x10
task_work_run+0x90/0xa7
exit_to_usermode_loop+0xdd/0x116
do_syscall_64+0x187/0x1ae
entry_SYSCALL_64_after_hwframe+0x150/0x0
[jane.chu@oracle.com: relocate comment]
Link: http://lkml.kernel.org/r/20180731044831.26036-1-jane.chu@oracle.com
Link: http://lkml.kernel.org/r/20180727211727.5020-1-jane.chu@oracle.com
Fixes: 05ea88608d ("mm, hugetlbfs: introduce ->pagesize() to vm_operations_struct")
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In case of memcg_online_kmem() failure, memcg_cgroup::id remains hashed
in mem_cgroup_idr even after memcg memory is freed. This leads to leak
of ID in mem_cgroup_idr.
This patch adds removal into mem_cgroup_css_alloc(), which fixes the
problem. For better readability, it adds a generic helper which is used
in mem_cgroup_alloc() and mem_cgroup_id_put_many() as well.
Link: http://lkml.kernel.org/r/152354470916.22460.14397070748001974638.stgit@localhost.localdomain
Fixes 73f576c04b ("mm: memcontrol: fix cgroup creation failure after many small jobs")
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Current value for a target abort error is 0x010, however, this value
should in fact be 0x002. As it stands, the range of error is 0..7 so
it is currently never being detected. This bug has been in the driver
since the early 2.6.12 days (or before).
Detected by CoverityScan, CID#744290 ("Logically dead code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit d250bf4e776ff09d5("blk-mq: only iterate over inflight requests
in blk_mq_tagset_busy_iter") uses 'blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT'
to replace 'blk_mq_request_started(req)', this way is wrong, and causes
lots of test system hang during booting.
Fix the issue by using blk_mq_request_started(req) inside bt_tags_iter().
Fixes: d250bf4e77 ("blk-mq: only iterate over inflight requests in blk_mq_tagset_busy_iter")
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matt Hart <matthew.hart@linaro.org>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: John Garry <john.garry@huawei.com>
Cc: Hannes Reinecke <hare@suse.com>,
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: linux-scsi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Mark Brown <broonie@kernel.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The position calculation in iomap_bmap() shifts bno the wrong way,
so we don't progress properly and end up re-mapping block zero
over and over, yielding an unchanging physical block range as the
logical block advances:
# filefrag -Be file
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 0: 21.. 21: 1: merged
1: 1.. 1: 21.. 21: 1: 22: merged
Discontinuity: Block 1 is at 21 (was 22)
2: 2.. 2: 21.. 21: 1: 22: merged
Discontinuity: Block 2 is at 21 (was 22)
3: 3.. 3: 21.. 21: 1: 22: merged
This breaks the FIBMAP interface for anyone using it (XFS), which
in turn breaks LILO, zipl, etc.
Bug-actually-spotted-by: Darrick J. Wong <darrick.wong@oracle.com>
Fixes: 89eb1906a9 ("iomap: add an iomap-based bmap implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
When receiving a LOGO request we forget to clear the FC_RP_STARTED flag
before starting the rport delete routine.
As the started flag was not cleared, we're not deleting the rport but
waiting for a restart and thus are keeping the reference count of the rdata
object at 1.
This leads to the following kmemleak report:
unreferenced object 0xffff88006542aa00 (size 512):
comm "kworker/0:2", pid 24, jiffies 4294899222 (age 226.880s)
hex dump (first 32 bytes):
68 96 fe 65 00 88 ff ff 00 00 00 00 00 00 00 00 h..e............
01 00 00 00 08 00 00 00 02 c5 45 24 ac b8 00 10 ..........E$....
backtrace:
[<(____ptrval____)>] fcoe_ctlr_vn_add.isra.5+0x7f/0x770 [libfcoe]
[<(____ptrval____)>] fcoe_ctlr_vn_recv+0x12af/0x27f0 [libfcoe]
[<(____ptrval____)>] fcoe_ctlr_recv_work+0xd01/0x32f0 [libfcoe]
[<(____ptrval____)>] process_one_work+0x7ff/0x1420
[<(____ptrval____)>] worker_thread+0x87/0xef0
[<(____ptrval____)>] kthread+0x2db/0x390
[<(____ptrval____)>] ret_from_fork+0x35/0x40
[<(____ptrval____)>] 0xffffffffffffffff
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reported-by: ard <ard@kwaak.net>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
KASAN reports a use-after-free in fcoe_ctlr_els_send() when we're sending a
LOGO and have FIP debugging enabled. This is because we're first freeing
the skb and then printing the frame's DID. But the DID is a member of the
FC frame header which in turn is the skb's payload.
Exchange the debug print and kfree_skb() calls so we're not touching the
freed data.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Now it looks just about the same as for the trace__sys_{enter,exit}.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-y59may7zx1eccnp4m3qm4u0b@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mapping "__syscall_nr" to "id" and setting up "args" from the offset of
"__syscall_nr" + sizeof(u64), as the payload for syscalls:* is the same
as for raw_syscalls:*, just the fields have different names.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ogeenrpviwcpwl3oy1l55f3m@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To avoid having to ask libtraceevent to find a field by name when
handling each tracepoint event, we setup a struct syscall_tp with
a tp_field struct having an extractor function + the offset for the
"id", "args" and "ret" raw_syscalls:sys_{enter,exit} tracepoints.
Now that we want to do the same with syscalls:sys_{entry,exit}_NAME
individual syscall tracepoints, where we have "id" as "__syscall_nr" and
"args" as the actual series of per syscall parameters, we need more
flexibility from the routines that set up these pre-looked up syscall
tracepoint arg fields.
The next cset will use it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-v59q5e0jrlzkpl9a1c7t81ni@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Because raw_syscalls have the field for the syscall number as 'id' while
the syscalls:sys_{enter,exit}_NAME have it as __syscall_nr...
Since we want to support both for being able to enable just a
syscalls:sys_{enter,exit}_name instead of asking for
raw_syscalls:sys_{enter,exit} plus filters, make the method names for
each kind of tracepoint more explicit.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-4rixbfzco6tsry0w9ghx3ktb@git.kernel.org
Signef-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We were using the beautifiers only when processing the
raw_syscalls:sys_enter events, but we can as well use them for the
syscalls:sys_enter_NAME events, as the layout is the same.
Some more tweaking is needed as we're processing them straight away,
i.e. there is no buffering in the sys_enter_NAME event to wait for
things like vfs_getname to provide pointer contents and then flushing
at sys_exit_NAME, so we need to state in the syscall_arg that this
is unbuffered, just print the pointer values, beautifying just
non-pointer syscall args.
This just shows an alternative way of processing tracepoints, that we
will end up using when creating "tracepoint" payloads that already copy
pointer contents (or chunks of it, i.e. not the whole filename, but just
the end of it, not all the bf for a read/write, but just the start,
etc), directly in the kernel using eBPF.
E.g.:
# perf trace -e syscalls:*enter*sleep,*sleep sleep 1
0.303 ( ): syscalls:sys_enter_nanosleep:rqtp: 0x7ffc93d5ecc0
0.305 (1000.229 ms): sleep/8746 nanosleep(rqtp: 0x7ffc93d5ecc0) = 0
# perf trace -e syscalls:*_*sleep,*sleep sleep 1
0.288 ( ): syscalls:sys_enter_nanosleep:rqtp: 0x7ffecde87e40
0.289 ( ): sleep/8748 nanosleep(rqtp: 0x7ffecde87e40) ...
1000.479 ( ): syscalls:sys_exit_nanosleep:0x0
0.289 (1000.208 ms): sleep/8748 ... [continued]: nanosleep()) = 0
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-jehyd2zwhw00z3p7v7mg9632@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
use actual protocol family passed by user rather than hardcoded
AF_INTE6 to cerate sockets.
current code is not working for IPv4.
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Fix potential clobbering of user vector register state by AES ghash code
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJbYtldAAoJELescNyEwWM0aisH/ReBqfJb1kTr5ybUriSTsuTl
Vjn8O17aH+177P3xlY47+cYWsvewJxiy7MldlLEiER0tsyjYvqTDuzy7+ffdJJ8R
rs0MGhYh9WpD3OVjng7TmwXD71xefk/gLq4MPqmgn9e/DoAt4HO/7jC4c+mSwxRZ
gHsAH5g6WUclRvU1zXaT8QKdZnmwudPFvy5O+bQYQrIJr++zBiyJ47qu1+TjJQuC
kVmM7XV0c0L1fK9z7A18PcW+tMlIu15ITzJwEercXen/7XypdDOufgc4Y9odHCkC
5ZWnV5wZtaJIuo/JWWPnoheWfTqIVF7ggXwsaXmZ0hKfQkBvbJ8fxhogCIWgt40=
=rpVW
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 regression fix from Will Deacon:
"Ard found a nasty arm64 regression in 4.18 where the AES ghash/gcm
code doesn't notify the kernel about its use of the vector registers,
therefore potentially corrupting live user state.
The fix is straightforward and Herbert agreed for it to go via arm64"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
crypto/arm64: aes-ce-gcm - add missing kernel_neon_begin/end pair
Pull networking fixes from David Miller:
"Fixes keep trickling in:
1) Various IP fragmentation memory limit hardening changes from Eric
Dumazet.
2) Revert ipv6 metrics leak change, it causes more problems than it
fixes for now.
3) Fix WoL regression in stmmac driver, from Jose Abreu.
4) Netlink socket spectre v1 gadget fix, from Jeremy Cline"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
Revert "net/ipv6: fix metrics leak"
rxrpc: Fix user call ID check in rxrpc_service_prealloc_one
net: dsa: Do not suspend/resume closed slave_dev
netlink: Fix spectre v1 gadget in netlink_create()
Documentation: dpaa2: Use correct heading adornment
net: stmmac: Fix WoL for PCI-based setups
bonding: avoid lockdep confusion in bond_get_stats()
enic: do not call enic_change_mtu in enic_probe
ipv4: frags: handle possible skb truesize change
inet: frag: enforce memory limits earlier
net/mlx5e: IPoIB, Set the netdevice sw mtu in ipoib enhanced flow
net/mlx5e: Fix null pointer access when setting MTU of vport representor
net/mlx5e: Set port trust mode to PCP as default
net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager
net: dsa: mv88e6xxx: Fix SERDES support on 88E6141/6341
brcmfmac: fix regression in parsing NVRAM for multiple devices
iwlwifi: add more card IDs for 9000 series
Previously in squashfs_readpage() when copying data into the page
cache, it used the length of the datablock read from the filesystem
(after decompression). However, if the filesystem has been corrupted
this data block may be short, which will leave pages unfilled.
The fix for this is to compute the expected number of bytes to copy
from the inode size, and use this to detect if the block is short.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Tested-by: Willy Tarreau <w@1wt.eu>
Cc: Анатолий Тросиненко <anatoly.trosinenko@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The squashfs fragment reading code doesn't actually verify that the
fragment is inside the fragment table. The end result _is_ verified to
be inside the image when actually reading the fragment data, but before
that is done, we may end up taking a page fault because the fragment
table itself might not even exist.
Another report from Anatoly and his endless squashfs image fuzzing.
Reported-by: Анатолий Тросиненко <anatoly.trosinenko@gmail.com>
Acked-by:: Phillip Lougher <phillip.lougher@gmail.com>,
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the vfs_getname() wannabe tracepoint is in place:
# perf probe -l
probe:vfs_getname (on getname_flags:73@acme/git/linux/fs/namei.c with pathname)
#
'perf trace' will use it to get the pathname when it is copied from
userspace to the kernel, right after syscalls:sys_enter_open, copied
in the 'probe:vfs_getname', stash it somewhere and then, at
syscalls:sys_exit_open time, if the 'open' return is not -1, i.e. a
successfull open syscall, associate that pathname to this return, i.e.
the fd.
We were not doing this for the 'openat' syscall, which would cause 'perf
trace' to fallback to using /proc to get the fd, change it so that we
use what we got from probe:vfs_getname, reducing the 'openat'
beautification process cost, ditching the syscalls performed to read
procfs state and avoiding some possible races in the process.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-xnp44ao3bkb6ejeczxfnjwsh@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Using cpu_all_mask in clockevents cpumask may result in issues while
comparing multiple clockevent devices to choose the preferred one.
On one of the platforms with 2 system (i.e. non per-CPU) timers with
different ratings, having cpu_all_mask for one of the device resulted in a
boot hang due to a endless loop in clockevents_notify_released() as both
were clocksources were selected as preferred.
In order to prevent such issues in the future, warn if any clockevent
driver sets cpu_all_mask as it's cpumask and just override it to use
cpu_possible_mask. All the existing occurrences of cpu_all_mask are already
replaced with cpu_possible_mask.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lkml.kernel.org/r/1531308264-24220-3-git-send-email-sudeep.holla@arm.com
This is the last instance of cpu_all_mask usage in the core framework.
Replace it with cpu_possible_mask like all other instances in the
clockevent drivers. This makes it possible to add a warning in the core
clockevents_register_device on usage of cpu_all_mask from any clockevent
drivers in the future.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lkml.kernel.org/r/1531308264-24220-2-git-send-email-sudeep.holla@arm.com
Using cpu_all_mask as target mask for clockevents is wrong as it never can
actually target not possible CPUs. Use cpu_possible_mask instead
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>