If the RECV CQE is in error, ignore the MSN check. This was causing
recvs that were flushed into the sw cq to be completed with the wrong
status (BAD_MSN instead of FLUSHED).
Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Due to a spec misinterpretation, the Linux implementation of the BTT log
area had a different padding scheme from other implementations, such as
UEFI and NVML.
This fixes the padding scheme, and defaults to it for new BTT layouts.
We attempt to detect the padding scheme in use when probing for an
existing BTT. If we detect the older/incompatible scheme, we continue
using it.
Reported-by: Juston Li <juston.li@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Fixes: 5212e11fde ("nd_btt: atomic sector updates")
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Recent updates to btt.h neglected to add corresponding kernel-doc lines
for new structure members. Add them.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Saeed Mahameed says:
===================
Mellanox, mlx5 fixes 2017-12-19
The following series includes some fixes for mlx5 core and ethernet
driver.
Please pull and let me know if there is any problem.
This series doesn't introduce any conflict with the ongoing mlx5 for-next
submission.
For -stable:
kernels >= v4.7.y
("net/mlx5e: Fix possible deadlock of VXLAN lock")
("net/mlx5e: Add refcount to VXLAN structure")
("net/mlx5e: Prevent possible races in VXLAN control flow")
("net/mlx5e: Fix features check of IPv6 traffic")
kernels >= v4.9.y
("net/mlx5: Fix error flow in CREATE_QP command")
("net/mlx5: Fix rate limit packet pacing naming and struct")
kernels >= v4.13.y
("net/mlx5: FPGA, return -EINVAL if size is zero")
kernels >= v4.14.y
("Revert "mlx5: move affinity hints assignments to generic code")
All above patches apply and compile with no issues on corresponding -stable.
===================
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_copy_ubufs creates a private copy of frags[] to release its hold
on user frags, then calls uarg->callback to notify the owner.
Call uarg->callback even when no frags exist. This edge case can
happen when zerocopy_sg_from_iter finds enough room in skb_headlen
to copy all the data.
Fixes: 3ece782693 ("sock: skb_copy_ubufs support for compound pages")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call skb_zerocopy_clone after skb_orphan_frags, to avoid duplicate
calls to skb_uarg(skb)->callback for the same data.
skb_zerocopy_clone associates skb_shinfo(skb)->uarg from frag_skb
with each segment. This is only safe for uargs that do refcounting,
which is those that pass skb_orphan_frags without dropping their
shared frags. For others, skb_orphan_frags drops the user frags and
sets the uarg to NULL, after which skb_zerocopy_clone has no effect.
Qemu hangs were reported due to duplicate vhost_net_zerocopy_callback
calls for the same data causing the vhost_net_ubuf_ref->refcount to
drop below zero.
Link: http://lkml.kernel.org/r/<CAF=yD-LWyCD4Y0aJ9O0e_CHLR+3JOeKicRRTEVCPxgw4XOcqGQ@mail.gmail.com>
Fixes: 1f8b977ab3 ("sock: enable MSG_ZEROCOPY")
Reported-by: Andreas Hartmann <andihartmann@01019freenet.de>
Reported-by: David Hill <dhill@redhat.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull block fixes from Jens Axboe:
"It's been a few weeks, so here's a small collection of fixes that
should go into the current series.
This contains:
- NVMe pull request from Christoph, with a few important fixes.
- kyber hang fix from Omar.
- A blk-throttl fix from Shaohua, fixing a case where we double
charge a bio.
- Two call_single_data alignment fixes from me, fixing up some
unfortunate changes that went into 4.14 without being properly
reviewed on the block side (since nobody was CC'ed on the
patch...).
- A bounce buffer fix in two parts, one from me and one from Ming.
- Revert bdi debug error handling patch. It's causing boot issues for
some folks, and a week down the line, we're still no closer to a
fix. Revert this patch for now until it's figured out, then we can
retry for 4.16"
* 'for-linus' of git://git.kernel.dk/linux-block:
Revert "bdi: add error handle for bdi_debug_register"
null_blk: unalign call_single_data
block: unalign call_single_data in struct request
block-throttle: avoid double charge
block: fix blk_rq_append_bio
block: don't let passthrough IO go into .make_request_fn()
nvme: setup streams after initializing namespace head
nvme: check hw sectors before setting chunk sectors
nvme: call blk_integrity_unregister after queue is cleaned up
nvme-fc: remove double put reference if admin connect fails
nvme: set discard_alignment to zero
kyber: fix another domain token wait queue hang
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"ARM fixes:
- A bug in handling of SPE state for non-vhe systems
- A fix for a crash on system shutdown
- Three timer fixes, introduced by the timer optimizations for v4.15
x86 fixes:
- fix for a WARN that was introduced in 4.15
- fix for SMM when guest uses PCID
- fixes for several bugs found by syzkaller
... and a dozen papercut fixes for the kvm_stat tool"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
tools/kvm_stat: sort '-f help' output
kvm: x86: fix RSM when PCID is non-zero
KVM: Fix stack-out-of-bounds read in write_mmio
KVM: arm/arm64: Fix timer enable flow
KVM: arm/arm64: Properly handle arch-timer IRQs after vtimer_save_state
KVM: arm/arm64: timer: Don't set irq as forwarded if no usable GIC
KVM: arm/arm64: Fix HYP unmapping going off limits
arm64: kvm: Prevent restoring stale PMSCR_EL1 for vcpu
KVM/x86: Check input paging mode when cs.l is set
tools/kvm_stat: add line for totals
tools/kvm_stat: stop ignoring unhandled arguments
tools/kvm_stat: suppress usage information on command line errors
tools/kvm_stat: handle invalid regular expressions
tools/kvm_stat: add hint on '-f help' to man page
tools/kvm_stat: fix child trace events accounting
tools/kvm_stat: fix extra handling of 'help' with fields filter
tools/kvm_stat: fix missing field update after filter change
tools/kvm_stat: fix drilldown in events-by-guests mode
tools/kvm_stat: fix command line option '-g'
kvm: x86: fix WARN due to uninitialized guest FPU state
...
The default value of sysctl.ipv6.auto_flowlabels is 1. On our hosts, we set
it to 2. If the sockopt doesn't set autoflowlabel, outgoing packets from the
hosts are supposed to not include a flowlabel. This is true for normal
packets, but not for reset packets.
The reason is that ipv6_pinfo.autoflowlabel is set at sock creation. If we
later change sysctl.ipv6.auto_flowlabels, ipv6_pinfo.autoflowlabel isn't
changed, so the sock keeps the old behavior in terms of auto
flowlabel. Reset packets suffer from this problem because they are
sent from a special control socket, which is created at boot
time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control
socket will always have its ipv6_pinfo.autoflowlabel set, even after the
user sets sysctl.ipv6.auto_flowlabels to 2, so reset packets will always
have a flowlabel. A normal sock created before the sysctl setting suffers
from the same issue. We can't even turn off autoflowlabel unless we kill all
socks on the hosts.
To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the
autoflowlabel setting from user, otherwise we always call
ip6_default_np_autolabel() which has the new settings of sysctl.
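The selection rule described above can be sketched in Python. This is a simplified model, not the kernel code: the Sock class, the uses_autoflowlabel helper and the SYSCTL_DEFAULT table are hypothetical names, and only the two sysctl values mentioned in this message (1 and 2) are modelled.

```python
from typing import Optional

# Assumption: only the two sysctl values discussed in the message are modelled.
# 1 = labels on unless a socket opts out, 2 = labels off unless a socket opts in.
SYSCTL_DEFAULT = {1: True, 2: False}


class Sock:
    """Hypothetical stand-in for the per-socket IPv6 state."""

    def __init__(self) -> None:
        # None means the IPV6_AUTOFLOWLABEL sockopt was never used.
        self.autoflowlabel: Optional[bool] = None


def uses_autoflowlabel(sk: Sock, sysctl_value: int) -> bool:
    """Prefer an explicit sockopt; otherwise read the current sysctl."""
    if sk.autoflowlabel is not None:
        return sk.autoflowlabel
    return SYSCTL_DEFAULT[sysctl_value]


control_sk = Sock()  # e.g. a socket created at boot, while the sysctl was 1
print(uses_autoflowlabel(control_sk, 1))  # True
print(uses_autoflowlabel(control_sk, 2))  # False: a later sysctl change is honoured

opted_in = Sock()
opted_in.autoflowlabel = True  # user set the sockopt explicitly
print(uses_autoflowlabel(opted_in, 2))  # True
```

Because the default is looked up at use time rather than cached at socket creation, a socket that never touched the sockopt follows later sysctl changes.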
Note, this changes behavior a little bit. Before commit 42240901f7
(ipv6: Implement different admin modes for automatic flow labels), the
autoflowlabel behavior of a sock wasn't sticky, e.g., if the sysctl changed,
existing connections would change autoflowlabel behavior. After that
commit, autoflowlabel behavior is sticky for the whole life of the sock.
With this patch, the behavior isn't sticky again.
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Herbert <tom@quantonium.net>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_vlan_pop() expects skb->protocol to be a valid TPID for double
tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop()
shift the true ethertype into position for us.
Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets")
Signed-off-by: Eric Garver <e@erig.me>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit a0747a859e.
It breaks booting for some users, and more than a week
into this, there's still no good fix. Revert this commit
for now until a solution has been found.
Reported-by: Laura Abbott <labbott@redhat.com>
Reported-by: Bruno Wolff III <bruno@wolff.to>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, parameters such as oif and source address are not taken into
account during fibmatch lookup. Example (IPv4 for reference) before
patch:
$ ip -4 route show
192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
198.51.100.0/24 dev dummy1 proto kernel scope link src 198.51.100.1
$ ip -6 route show
2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
2001:db8:2::/64 dev dummy1 proto kernel metric 256 pref medium
fe80::/64 dev dummy0 proto kernel metric 256 pref medium
fe80::/64 dev dummy1 proto kernel metric 256 pref medium
$ ip -4 route get fibmatch 192.0.2.2 oif dummy0
192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
$ ip -4 route get fibmatch 192.0.2.2 oif dummy1
RTNETLINK answers: No route to host
$ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
$ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
After:
$ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
$ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
RTNETLINK answers: Network is unreachable
The problem stems from the fact that the necessary route lookup flags
are not set based on these parameters.
Instead of duplicating the same logic for fibmatch, we can simply
resolve the original route from its copy and dump it instead.
Fixes: 18c3a61c42 ("net: ipv6: RTM_GETROUTE: return matched fib result when requested")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For rmap removal, refactor the rmap owner checks into a separate
function, then skip the checks if we are performing an unknown-owner
removal.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Calling xfs_rmap_free with an unknown owner is supposed to remove any
rmaps covering that range regardless of owner. This is used by the EFI
recovery code to say "we're freeing this, it mustn't be owned by
anything anymore", but for whatever reason xfs_free_ag_extent filters
them out.
Therefore, remove the filter and make xfs_rmap_unmap actually treat it
as a wildcard owner -- free anything that's already there, and if
there's no owner at all then that's fine too.
There are two existing callers of bmap_add_free that take care of the rmap
deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap
cleanup; convert these to use OWN_NULL (via helpers), and now we really
require that an RUI (if any) gets added to the defer ops before any EFI.
Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests,
growfs will have to consult directly with the rmap to ensure that there
aren't any rmaps in the grown region.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Under the deferred rmap operation scheme, there's a certain order in
which the rmap deferred ops have to be queued to maintain integrity
during log replay. For alloc/map operations that order is cui -> rui;
for free/unmap operations that order is cui -> rui -> efi. However, the
initial refcount code got the ordering wrong on the free side of things
because it queued a refcount free op and an EFI, and the refcount free op
queued an rmap free op, resulting in the order cui -> efi -> rui.
If we fail before the efd finishes, the efi recovery will try to do a
wildcard rmap removal and the subsequent rui will fail to find the rmap
and blow up. This never happened because of other screw-ups in handling
unknown owner rmap removals, but those other screw-ups broke recovery in
other ways, so fix the ordering to follow the intended rules.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
If a user performs a direct CoW write, we end up loading the CoW fork
with preallocated extents. Therefore, we must set the cowblocks tag so
that they can be cleared out if we run low on space.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
When we're remounting the filesystem readonly, remove all CoW
preallocations prior to going ro. If the fs goes down after the ro
remount, we never clean up the staging extents, which means xfs_check
will trip over them on a subsequent run. Practically speaking, the next
mount will clean them up too, so this is unlikely to be seen. Since we
shut down the cowblocks cleaner on remount-ro, we also have to make sure
we start it back up if/when we remount-rw.
Found by adding clonerange to fsstress and running xfs/017.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Currently, xfs_itruncate_extents clears the cowblocks tag if i_cnextents
is zero. This is wrong, since i_cnextents only tracks real extents in
the CoW fork, which means that we could have some delayed CoW
reservations still in there that will now never get cleaned.
Fix a further bug where we /don't/ clear the reflink iflag if there are
any attribute blocks -- really, it's only safe to clear the reflink flag
if there are no data fork extents and no cow fork extents.
Found by adding clonerange to fsstress in xfs/017.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Sort the fields returned by specifying '-f help' on the command line.
While at it, simplify the code a bit, indent the output and eliminate an
extra blank line at the beginning.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
rsm_load_state_64() and rsm_enter_protected_mode() load CR3, then
CR4 & ~PCIDE, then CR0, then CR4.
However, setting CR4.PCIDE fails if CR3[11:0] != 0. It's probably easier
in the long run to replace rsm_enter_protected_mode() with an emulator
callback that sets all the special registers (like KVM_SET_SREGS would
do). For now, set the PCID field of CR3 only after CR4.PCIDE is 1.
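The ordering problem can be illustrated with a toy model in Python. This is only a sketch: ToyCpu, load_old_order and load_new_order are hypothetical names, the model enforces just the single rule quoted above (CR4.PCIDE cannot be turned on while CR3[11:0] != 0), and the CR0 step is left out.

```python
CR4_PCIDE = 1 << 17     # PCID-enable bit in CR4
CR3_PCID_MASK = 0xFFF   # CR3[11:0] holds the PCID when CR4.PCIDE is 1


class ToyCpu:
    """Hypothetical model enforcing only the rule quoted in the message."""

    def __init__(self) -> None:
        self.cr3 = 0
        self.cr4 = 0

    def set_cr3(self, value: int) -> None:
        self.cr3 = value

    def set_cr4(self, value: int) -> None:
        turning_on = bool(value & CR4_PCIDE) and not (self.cr4 & CR4_PCIDE)
        if turning_on and (self.cr3 & CR3_PCID_MASK):
            raise ValueError("CR4.PCIDE cannot be set while CR3[11:0] != 0")
        self.cr4 = value


def load_old_order(cpu: ToyCpu, cr3: int, cr4: int) -> None:
    # CR3, then CR4 & ~PCIDE, then (CR0, omitted), then CR4.
    cpu.set_cr3(cr3)
    cpu.set_cr4(cr4 & ~CR4_PCIDE)
    cpu.set_cr4(cr4)  # raises when the saved CR3 carries a non-zero PCID


def load_new_order(cpu: ToyCpu, cr3: int, cr4: int) -> None:
    # Load CR3 without the PCID first; add the PCID once CR4.PCIDE is 1.
    pcide = bool(cr4 & CR4_PCIDE)
    cpu.set_cr3((cr3 & ~CR3_PCID_MASK) if pcide else cr3)
    cpu.set_cr4(cr4 & ~CR4_PCIDE)
    cpu.set_cr4(cr4)
    if pcide:
        cpu.set_cr3(cr3)


saved_cr3, saved_cr4 = 0x1005, CR4_PCIDE
try:
    load_old_order(ToyCpu(), saved_cr3, saved_cr4)
except ValueError as err:
    print("old order:", err)

cpu = ToyCpu()
load_new_order(cpu, saved_cr3, saved_cr4)
print(hex(cpu.cr3), hex(cpu.cr4))  # 0x1005 0x20000
```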
Reported-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Fixes: 660a5d517a
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We added support for EXTPROC back in 2010 in commit 26df6d1340 ("tty:
Add EXTPROC support for LINEMODE") and the intent was to allow it to
override some (all?) ICANON behavior. Quoting from that original commit
message:
There is a new bit in the termios local flag word, EXTPROC.
When this bit is set, several aspects of the terminal driver
are disabled. Input line editing, character echo, and mapping
of signals are all disabled. This allows the telnetd to turn
off these functions when in linemode, but still keep track of
what state the user wants the terminal to be in.
but the problem turns out that "several aspects of the terminal driver
are disabled" is a bit ambiguous, and you can really confuse the n_tty
layer by setting EXTPROC and then causing some of the ICANON invariants
to no longer be maintained.
This fixes at least one such case (TIOCINQ) becoming unhappy because of
the confusion over whether ICANON really means ICANON when EXTPROC is set.
This basically makes TIOCINQ match the case of read: if EXTPROC is set,
we ignore ICANON. Also, make sure to reset the ICANON state if EXTPROC
changes, not just if ICANON changes.
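The two rules above can be written down as small predicates. The Python sketch below is illustrative only: the helper names are hypothetical, and the flag values are assumed to match the generic Linux termios bits rather than being taken from this message.

```python
# Assumed flag values (generic Linux termios local-mode bits).
ICANON = 0o0000002
EXTPROC = 0o0200000


def canonical_mode_applies(c_lflag: int) -> bool:
    """ICANON only counts when EXTPROC is not set."""
    return bool(c_lflag & ICANON) and not (c_lflag & EXTPROC)


def canon_state_needs_reset(old_lflag: int, new_lflag: int) -> bool:
    """Reset when either ICANON or EXTPROC changed, not just ICANON."""
    return bool((old_lflag ^ new_lflag) & (ICANON | EXTPROC))


print(canonical_mode_applies(ICANON))                     # True
print(canonical_mode_applies(ICANON | EXTPROC))           # False
print(canon_state_needs_reset(ICANON, ICANON | EXTPROC))  # True
```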
Fixes: 26df6d1340 ("tty: Add EXTPROC support for LINEMODE")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Reported-by: syzkaller <syzkaller@googlegroups.com>
Cc: Jiri Slaby <jslaby@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The commit 4a336a23d6 ("kobject: copy env blob in one go") optimized
constructing uevent data for delivery over netlink by using the raw
environment buffer, instead of reconstructing it from individual
environment pointers. Unfortunately in doing so it broke suppressing
MODALIAS attribute for KOBJ_UNBIND events, as the code that suppressed this
attribute only adjusted the environment pointers, but left the buffer
itself alone. Let's fix it by making sure the offending attribute is
obliterated from the buffer as well.
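A small Python model shows why adjusting only the pointers is not enough once the raw buffer is what gets sent. The layout (NUL-terminated KEY=value strings plus a list of offsets) and all function names are assumptions made for illustration, not the kobject code.

```python
from typing import List, Tuple


def build_env(strings: List[str]) -> Tuple[bytearray, List[int]]:
    """Pack KEY=value strings into one NUL-separated buffer plus offsets."""
    buf, envp = bytearray(), []
    for s in strings:
        envp.append(len(buf))
        buf += s.encode() + b"\0"
    return buf, envp


def drop_pointers_only(buf: bytearray, envp: List[int], key: str) -> List[int]:
    """Old behaviour: hide the variable from the pointer array only."""
    prefix = key.encode() + b"="
    return [off for off in envp if not buf.startswith(prefix, off)]


def drop_everywhere(buf: bytearray, envp: List[int], key: str) -> Tuple[bytearray, List[int]]:
    """Fixed behaviour: rebuild the buffer and the offsets without the variable."""
    prefix = key.encode() + b"="
    new_buf, new_envp = bytearray(), []
    for off in envp:
        end = buf.index(b"\0", off) + 1
        if buf.startswith(prefix, off):
            continue
        new_envp.append(len(new_buf))
        new_buf += buf[off:end]
    return new_buf, new_envp


buf, envp = build_env(["ACTION=unbind", "MODALIAS=pci:v0001", "SEQNUM=7"])
drop_pointers_only(buf, envp, "MODALIAS")
print(b"MODALIAS" in buf)      # True: a consumer of the raw buffer still sees it
clean, _ = drop_everywhere(buf, envp, "MODALIAS")
print(b"MODALIAS" in clean)    # False
```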
Reported-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Casey Leedom <leedom@chelsio.com>
Fixes: 4a336a23d6 ("kobject: copy env blob in one go")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Patch bd36d3bab2 fixed a deadlock in the
failure path of drm_lease_create. This made the partially initialized
lease object visible for a short window of time.
To avoid having the lessee state appear transiently, I've rearranged
the code so that the lessor fields are not filled in until the
parameters are all validated and the function will succeed.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171221065424.1304-1-keithp@keithp.com
Daniel Borkmann says:
====================
pull-request: bpf 2017-12-21
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix multiple security issues in the BPF verifier mostly related
to the value and min/max bounds tracking rework in 4.14. Issues
range from incorrect bounds calculation in some BPF_RSH cases,
to improper sign extension and reg size handling on 32 bit
ALU ops, missing strict alignment checks on stack pointers, and
several others that got fixed, from Jann, Alexei and Edward.
2) Fix various build failures in BPF selftests on sparc64. More
specifically, librt needed to be added to the libs to link
against and a few format string fixups for sizeof, from David.
3) Fix one last remaining issue from BPF selftest build that was
still occurring on s390x from the asm/bpf_perf_event.h include
which could not find the asm/ptrace.h copy, from Hendrik.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When an I/O is returned with an srb_status of SRB_STATUS_INVALID_LUN
which has zero good_bytes it must be assigned an error. Otherwise the
I/O will be continuously requeued and will cause a deadlock in the case
where disks are being hot added and removed. sd_probe_async will wait
forever for its I/O to complete while holding scsi_sd_probe_domain.
Also returning the default error of DID_TARGET_FAILURE causes multipath
to not retry the I/O resulting in applications receiving I/O errors
before a failover can occur.
Signed-off-by: Cathy Avery <cavery@redhat.com>
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Do not allow root to convert valid pointers into unknown scalars.
In particular disallow:
ptr &= reg
ptr <<= reg
ptr += ptr
and explicitly allow:
ptr -= ptr
since pkt_end - pkt == length
1.
This minimizes the amount of address leaks root can do.
In the future we may need to further tighten the leaks with kptr_restrict.
2.
If a program has such pointer math it's likely a user mistake, and
when the verifier complains about it right away instead of many instructions
later on an invalid memory access, it's easier for users to fix their progs.
3.
When a register holding a pointer cannot change to a scalar, it allows JITs to
optimize better. For example, 32-bit archs could use a single register for
pointers instead of the pair required to hold 64-bit scalars.
4.
This reduces architecture-dependent behavior, since code such as:
r1 = r10;
r1 &= 0xff;
if (r1 ...)
will behave differently on arm64 vs x64 and offloaded vs native.
A significant chunk of ptr mangling was allowed by
commit f1174f77b5 ("bpf/verifier: rework value tracking")
yet some of it was allowed even earlier.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Alexei Starovoitov says:
====================
This patch set addresses a set of security vulnerabilities
in bpf verifier logic discovered by Jann Horn.
All of the patches are candidates for 4.14 stable.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
These tests should cover the following cases:
- MOV with both zero-extended and sign-extended immediates
- implicit truncation of register contents via ALU32/MOV32
- implicit 32-bit truncation of ALU32 output
- oversized register source operand for ALU32 shift
- right-shift of a number that could be positive or negative
- map access where adding the operation size to the offset causes signed
32-bit overflow
- direct stack access at a ~4GiB offset
Also remove the F_LOAD_WITH_STRICT_ALIGNMENT flag from a bunch of tests
that should fail independent of what flags userspace passes.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
There were various issues related to the limited size of integers used in
the verifier:
- `off + size` overflow in __check_map_access()
- `off + reg->off` overflow in check_mem_access()
- `off + reg->var_off.value` overflow or 32-bit truncation of
`reg->var_off.value` in check_mem_access()
- 32-bit truncation in check_stack_boundary()
Make sure that any integer math cannot overflow by not allowing
pointer math with large values.
Also reduce the scope of "scalar op scalar" tracking.
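The `off + size` case can be reproduced with a Python model of 32-bit signed arithmetic. The check below is a simplified stand-in for a map bounds check, and LIMIT is a hypothetical cap chosen for illustration; neither is taken from the verifier source.

```python
def to_s32(x: int) -> int:
    """Wrap an integer to a signed 32-bit value, as a C int would."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


LIMIT = 1 << 29  # hypothetical cap on offsets used in pointer math


def naive_in_bounds(off: int, size: int, value_size: int) -> bool:
    """Bounds check whose sum is computed in 32 bits and may wrap."""
    return not (off < 0 or size <= 0 or to_s32(off + size) > value_size)


def capped_in_bounds(off: int, size: int, value_size: int) -> bool:
    """Reject large offsets up front so the sum cannot overflow."""
    if not (-LIMIT < off < LIMIT):
        return False
    return not (off < 0 or size <= 0 or off + size > value_size)


off, size, value_size = 0x7FFFFFFF, 8, 64
print(to_s32(off + size))                       # -2147483641: the sum wrapped
print(naive_in_bounds(off, size, value_size))   # True: wrongly accepted
print(capped_in_bounds(off, size, value_size))  # False
```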
Fixes: f1174f77b5 ("bpf/verifier: rework value tracking")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This could be made safe by passing through a reference to env and checking
for env->allow_ptr_leaks, but it would only work one way and is probably
not worth the hassle - not doing it will not directly lead to program
rejection.
Fixes: f1174f77b5 ("bpf/verifier: rework value tracking")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Force strict alignment checks for stack pointers because the tracking of
stack spills relies on it; unaligned stack accesses can lead to corruption
of spilled registers, which is exploitable.
Fixes: f1174f77b5 ("bpf/verifier: rework value tracking")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
32-bit ALU ops operate on 32-bit values and have 32-bit outputs.
Adjust the verifier accordingly.
Fixes: f1174f77b5 ("bpf/verifier: rework value tracking")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Properly handle register truncation to a smaller size.
The old code first mirrors the clearing of the high 32 bits in the bitwise
tristate representation, which is correct. But then, it computes the new
arithmetic bounds as the intersection between the old arithmetic bounds and
the bounds resulting from the bitwise tristate representation. Therefore,
when coerce_reg_to_32() is called on a number with bounds
[0xffff'fff8, 0x1'0000'0007], the verifier computes
[0xffff'fff8, 0xffff'ffff] as bounds of the truncated number.
This is incorrect: The truncated number could also be in the range [0, 7],
and no meaningful arithmetic bounds can be computed in that case apart from
the obvious [0, 0xffff'ffff].
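The example range can be checked by brute force. The Python snippet below simply truncates every value in [0xffff'fff8, 0x1'0000'0007] to 32 bits and lists the results that fall outside the bounds the old code computed; truncate32 is a helper defined here for illustration.

```python
def truncate32(value: int) -> int:
    """Keep only the low 32 bits, as a 32-bit register write would."""
    return value & 0xFFFFFFFF


lo, hi = 0xFFFFFFF8, 0x100000007
truncated = sorted({truncate32(v) for v in range(lo, hi + 1)})

# Bounds the old code derived for the truncated number.
old_min, old_max = 0xFFFFFFF8, 0xFFFFFFFF
outside = [v for v in truncated if not (old_min <= v <= old_max)]

print(outside)                                  # [0, 1, 2, 3, 4, 5, 6, 7]
print(hex(min(truncated)), hex(max(truncated))) # 0x0 0xffffffff
```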
Starting with v4.14, this is exploitable by unprivileged users as long as
the unprivileged_bpf_disabled sysctl isn't set.
Debian assigned CVE-2017-16996 for this issue.
v2:
- flip the mask during arithmetic bounds calculation (Ben Hutchings)
v3:
- add CVE number (Ben Hutchings)
Fixes: b03c9f9fdc ("bpf/verifier: track signed and unsigned min/max values")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Distinguish between
BPF_ALU64|BPF_MOV|BPF_K (load 32-bit immediate, sign-extended to 64-bit)
and BPF_ALU|BPF_MOV|BPF_K (load 32-bit immediate, zero-padded to 64-bit);
only perform sign extension in the first case.
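The difference between the two encodings is easy to see on a negative immediate. The Python helpers below are a model of the described semantics (names are hypothetical), with the 64-bit register represented as an unsigned integer.

```python
def mov64_imm(imm32: int) -> int:
    """BPF_ALU64|BPF_MOV|BPF_K: 32-bit immediate, sign-extended to 64 bits."""
    imm32 &= 0xFFFFFFFF
    if imm32 & 0x80000000:
        return imm32 | 0xFFFFFFFF00000000
    return imm32


def mov32_imm(imm32: int) -> int:
    """BPF_ALU|BPF_MOV|BPF_K: 32-bit immediate, upper 32 bits zeroed."""
    return imm32 & 0xFFFFFFFF


print(hex(mov64_imm(-1)))  # 0xffffffffffffffff
print(hex(mov32_imm(-1)))  # 0xffffffff
print(hex(mov64_imm(5)), hex(mov32_imm(5)))  # 0x5 0x5
```

Treating the second form like the first makes the verifier believe the register holds 0xffffffffffffffff when it actually holds 0xffffffff.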
Starting with v4.14, this is exploitable by unprivileged users as long as
the unprivileged_bpf_disabled sysctl isn't set.
Debian assigned CVE-2017-16995 for this issue.
v3:
- add CVE number (Ben Hutchings)
Fixes: 484611357c ("bpf: allow access into map value arrays")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Incorrect signed bounds were being computed.
If the old upper signed bound was positive and the old lower signed bound was
negative, this could cause the new upper signed bound to be too low,
leading to security issues.
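A small Python model shows how an upper signed bound can come out too low. The message does not name the operation; this sketch assumes an unsigned (logical) 64-bit right shift, in line with the BPF_RSH case mentioned in the pull request summary earlier in this log, and the "naive" bound is an illustrative mistake rather than the exact old verifier code.

```python
MASK64 = (1 << 64) - 1


def to_s64(x: int) -> int:
    """Interpret the low 64 bits of x as a signed value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def rsh64(value: int, shift: int) -> int:
    """Logical right shift of a 64-bit register, result read as signed."""
    return to_s64((value & MASK64) >> shift)


smin, smax = -1, 1           # old signed bounds straddle zero
naive_smax = rsh64(smax, 1)  # shifting the old upper bound gives 0

results = [rsh64(v, 1) for v in range(smin, smax + 1)]
print(naive_smax)                 # 0
print(max(results) == 2**63 - 1)  # True: -1 >> 1 is a huge positive number
```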
Fixes: b03c9f9fdc ("bpf/verifier: track signed and unsigned min/max values")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
[jannh@google.com: changed description to reflect bug impact]
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The EOFBLOCKS/COWBLOCKS tags are totally separate things, so track them
with separate i_flags. Right now we're abusing IEOFBLOCKS for both,
which is totally bogus because we won't tag the inode with COWBLOCKS if
IEOFBLOCKS was set by a previous tagging of the inode with EOFBLOCKS.
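The effect of sharing one flag can be shown with a toy model in Python. The Inode class, set_tag helper and bit values are hypothetical; the only behaviour modelled is "skip tagging if the guard flag is already set".

```python
IEOFBLOCKS = 1 << 0  # hypothetical bit values for illustration
ICOWBLOCKS = 1 << 1


class Inode:
    def __init__(self) -> None:
        self.flags = 0
        self.tags = set()


def set_tag(inode: Inode, tag: str, guard_flag: int) -> None:
    """Tag the inode once; the guard flag records that it was done."""
    if inode.flags & guard_flag:
        return  # believed to be tagged already
    inode.flags |= guard_flag
    inode.tags.add(tag)


# Old behaviour: both tags are guarded by IEOFBLOCKS.
shared = Inode()
set_tag(shared, "eofblocks", IEOFBLOCKS)
set_tag(shared, "cowblocks", IEOFBLOCKS)
print(sorted(shared.tags))  # ['eofblocks']

# Separate flags: each tag has its own guard.
split = Inode()
set_tag(split, "eofblocks", IEOFBLOCKS)
set_tag(split, "cowblocks", ICOWBLOCKS)
print(sorted(split.tags))   # ['cowblocks', 'eofblocks']
```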
Found by wiring up clonerange to fsstress in xfs/017.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
drm/i915 fixes for v4.15-rc5
* tag 'drm-intel-fixes-2017-12-20' of git://anongit.freedesktop.org/drm/drm-intel:
drm/i915: Protect DDI port to DPLL map from theoretical race.
drm/i915/lpe: Remove double-encapsulation of info string
Two simple fixes: one for sparse warnings that were introduced by the
merge window conversion to blist_flags_t and the other to fix dropped
I/O during reset in aacraid.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two simple fixes: one for sparse warnings that were introduced by the
merge window conversion to blist_flags_t and the other to fix dropped
I/O during reset in aacraid"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: aacraid: Fix I/O drop during reset
scsi: core: Use blist_flags_t consistently
Pull ARM fix from Russell King:
"Just one fix for a problem in the csum_partial_copy_from_user()
implementation when software PAN is enabled"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8731/1: Fix csum_partial_copy_from_user() stack mismatch
Merge tag 'acpi-4.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix a recently introduced issue in the ACPI CPPC driver and an
obscure error handling bug in the APEI code.
Specifics:
- Fix an error handling issue in the ACPI APEI implementation of the
->read callback in struct pstore_info (Takashi Iwai).
- Fix a possible out-of-bounds array read in the ACPI CPPC driver
(Colin Ian King)"
* tag 'acpi-4.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: APEI / ERST: Fix missing error handling in erst_reader()
ACPI: CPPC: remove initial assignment of pcc_ss_data
- Fix an issue in the PCI handling of the "thaw" transition during
hibernation (after creating an image), introduced by a bug fix
from the 4.13 cycle and exposed by recent changes in the IRQ
subsystem, that caused pci_restore_state() to be called for
devices in low-power states in some cases, which is incorrect
and breaks MSI management on some systems (Rafael Wysocki).
- Fix a recent regression in the imx6q cpufreq driver that broke
speed grading on i.MX6 QuadPlus by omitting the checks that
disable invalid operating performance points (OPPs) on that
SoC (Lucas Stach).
- Fix a regression introduced during the 4.14 cycle in the ondemand
and conservative cpufreq governors that causes the sampling
interval used by them to be shorter than the tick period in some
cases, which leads to incorrect decisions (Rafael Wysocki).
Merge tag 'pm-4.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a regression in the ondemand and conservative cpufreq
governors that was introduced during the 4.14 cycle, a recent
regression in the imx6q cpufreq driver and a regression in the PCI
handling of hibernation caused by a bug fix from the 4.13 cycle.
Specifics:
- Fix an issue in the PCI handling of the "thaw" transition during
hibernation (after creating an image), introduced by a bug fix from
the 4.13 cycle and exposed by recent changes in the IRQ subsystem,
that caused pci_restore_state() to be called for devices in
low-power states in some cases, which is incorrect and breaks MSI
management on some systems (Rafael Wysocki).
- Fix a recent regression in the imx6q cpufreq driver that broke
speed grading on i.MX6 QuadPlus by omitting the checks that disable
invalid operating performance points (OPPs) on that SoC
(Lucas Stach).
- Fix a regression introduced during the 4.14 cycle in the ondemand
and conservative cpufreq governors that causes the sampling
interval used by them to be shorter than the tick period in some
cases, which leads to incorrect decisions (Rafael Wysocki)"
* tag 'pm-4.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: governor: Ensure sufficiently large sampling intervals
cpufreq: imx6q: fix speed grading regression on i.MX6 QuadPlus
PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
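The governor fix listed above is about sampling intervals that end up shorter than the tick period. A minimal sketch of the idea is to clamp the requested interval to a floor expressed in tick periods. The helper names and the min_ticks parameter are illustrative assumptions, not the kernel's actual code or constants.

```c
#include <assert.h>

#define USEC_PER_SEC 1000000u

static_assert(USEC_PER_SEC == 1000u * 1000u, "microseconds per second");

/* Tick period in microseconds for a timer running at hz ticks per second. */
static unsigned int tick_period_us(unsigned int hz)
{
	return USEC_PER_SEC / hz;
}

/*
 * Clamp a requested sampling interval so that it never drops below
 * min_ticks tick periods. min_ticks is a hypothetical tuning knob
 * used only for this illustration.
 */
static unsigned int clamp_sampling_interval_us(unsigned int requested_us,
					       unsigned int hz,
					       unsigned int min_ticks)
{
	unsigned int floor_us = min_ticks * tick_period_us(hz);

	return requested_us < floor_us ? floor_us : requested_us;
}
```

With a 250 Hz tick and a floor of two ticks, a requested 500 us interval would be raised to 8000 us, while a 10000 us request would pass through unchanged.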
A bunch of really small fixes here, all driver specific and mostly in
error handling and remove paths. The most important fixes are for the
a3700 clock configuration and a fix for a nasty stall which could
potentially cause data corruption with the xilinx driver.
Merge tag 'spi-fix-v4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A bunch of really small fixes here, all driver specific and mostly in
error handling and remove paths.
The most important fixes are for the a3700 clock configuration and a
fix for a nasty stall which could potentially cause data corruption
with the xilinx driver"
* tag 'spi-fix-v4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: atmel: fixed spin_lock usage inside atmel_spi_remove
spi: sun4i: disable clocks in the remove function
spi: rspi: Do not set SPCR_SPE in qspi_set_config_register()
spi: Fix double "when"
spi: a3700: Fix clk prescaling for coefficient over 15
spi: xilinx: Detect stall with Unknown commands
spi: imx: Update device tree binding documentation
- Fix message timing issues;
cros_ec_spi
- Report correct state when an error occurs;
cros_ec_spi
- Reorder enums used for Power Management;
rtsx_pci
- Use correct OF helper for obtaining child nodes;
twl4030-audio, twl6040
Merge tag 'mfd-fixes-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD bugfixes from Lee Jones:
- Fix message timing issues and report correct state when an error
occurs in cros_ec_spi
- Reorder enums used for Power Management in rtsx_pci
- Use correct OF helper for obtaining child nodes in twl4030-audio and
twl6040
* tag 'mfd-fixes-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
mfd: Fix RTS5227 (and others) powermanagement
mfd: cros ec: spi: Fix "in progress" error signaling
mfd: twl6040: Fix child-node lookup
mfd: twl4030-audio: Fix sibling-node lookup
mfd: cros ec: spi: Don't send first message too soon
All stable fixes here:
- A regression fix of USB-audio for the previous hardening patch
- A potential UAF fix in rawmidi
- HD-audio and USB-audio quirks, the missing new ID
Merge tag 'sound-4.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"All stable fixes here:
- a regression fix of USB-audio for the previous hardening patch
- a potential UAF fix in rawmidi
- HD-audio and USB-audio quirks, the missing new ID"
* tag 'sound-4.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Fix the missing ctl name suffix at parsing SU
ALSA: hda/realtek - Fix Dell AIO LineOut issue
ALSA: rawmidi: Avoid racy info ioctl via ctl device
ALSA: hda - Add vendor id for Cannonlake HDMI codec
ALSA: usb-audio: Add native DSD support for Esoteric D-05X
Commit 966a967116 randomly added alignment to this structure, but
it's actually detrimental to performance of null_blk. Test case:
Running on both the home and remote node shows a ~5% degradation
in performance.
While in there, move blk_status_t to the hole after the integer tag
in the nullb_cmd structure. After this patch, we shrink the size
from 192 to 152 bytes.
Fixes: 966a967116 ("smp: Avoid using two cache lines for struct call_single_data")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
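The null_blk message above mentions moving a status field into the hole after an integer tag. The sketch below shows the general effect of filling a padding hole, using made-up structures rather than the real nullb_cmd layout; the one-byte status member stands in for a small status type and is an assumption for illustration. Exact sizes depend on the ABI, so the checks only compare the two layouts.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Loose ordering: on ABIs where pointers are wider than int, the
 * compiler pads after 'tag' to align 'next', and pads again after
 * the trailing one-byte 'status'.
 */
struct cmd_loose {
	void *queue;
	int tag;
	void *next;
	unsigned char status;
};

/*
 * Tight ordering: 'status' sits directly after 'tag', inside what
 * would otherwise be padding, so no extra tail padding is needed.
 */
struct cmd_tight {
	void *queue;
	int tag;
	unsigned char status;
	void *next;
};

/* Reordering can only help or break even, never grow the struct here. */
static_assert(sizeof(struct cmd_tight) <= sizeof(struct cmd_loose),
	      "filling the hole must not grow the structure");
static_assert(offsetof(struct cmd_tight, status) ==
	      offsetof(struct cmd_tight, tag) + sizeof(int),
	      "status is placed immediately after tag");
```

On a typical 64-bit ABI the tight layout comes out one pointer-sized slot smaller; on 32-bit ABIs the two layouts may be the same size.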
A previous change blindly added massive alignment to the
call_single_data structure in struct request. This ballooned it in size
from 296 to 320 bytes on my setup, for no valid reason at all.
Use the unaligned struct __call_single_data variant instead.
Fixes: 966a967116 ("smp: Avoid using two cache lines for struct call_single_data")
Cc: stable@vger.kernel.org # v4.14
Signed-off-by: Jens Axboe <axboe@kernel.dk>
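The message above blames forced alignment of an embedded member for growing the containing structure. A minimal sketch of that effect follows, using hypothetical stand-in types (csd_plain, csd_aligned, req_plain, req_aligned) and an arbitrary 32-byte alignment; none of this is the kernel's actual definition, and the concrete sizes depend on the ABI.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Plain variant: members keep their natural alignment. */
struct csd_plain {
	void *next;
	void (*func)(void *);
	void *info;
	unsigned int flags;
};

/* Aligned variant: same members, alignment raised to 32 bytes. */
struct csd_aligned {
	alignas(32) void *next;
	void (*func)(void *);
	void *info;
	unsigned int flags;
};

/* Hypothetical containers embedding each variant between small fields. */
struct req_plain {
	int tag;
	struct csd_plain csd;
	int state;
};

struct req_aligned {
	int tag;
	struct csd_aligned csd;
	int state;
};

/*
 * The aligned member forces padding before it and rounds the whole
 * container up to a multiple of 32 bytes.
 */
static_assert(alignof(struct csd_aligned) == 32, "member alignment raised");
static_assert(sizeof(struct req_aligned) % 32 == 0, "container rounded up");
static_assert(sizeof(struct req_aligned) > sizeof(struct req_plain),
	      "alignment padding grows the container");
```

Embedding the plain variant, as the message recommends for struct request, avoids both the leading padding and the rounded-up total size.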