Commit Graph

798566 Commits

Author SHA1 Message Date
Ming Lei
5938870247 blk-mq: re-build queue map in case of kdump kernel
Now almost all .map_queues() implementation based on managed irq
affinity doesn't update queue mapping and it just retrieves the
old built mapping, so if nr_hw_queues is changed, the mapping talbe
includes stale mapping. And only blk_mq_map_queues() may rebuild
the mapping talbe.

One case is that we limit .nr_hw_queues as 1 in case of kdump kernel.
However, drivers often builds queue mapping before allocating tagset
via pci_alloc_irq_vectors_affinity(), but set->nr_hw_queues can be set
as 1 in case of kdump kernel, so wrong queue mapping is used, and
kernel panic[1] is observed during booting.

This patch fixes the kernel panic triggerd on nvme by rebulding the
mapping table via blk_mq_map_queues().

[1] kernel panic log
[    4.438371] nvme nvme0: 16/0/0 default/read/poll queues
[    4.443277] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
[    4.444681] PGD 0 P4D 0
[    4.445367] Oops: 0000 [#1] SMP NOPTI
[    4.446342] CPU: 3 PID: 201 Comm: kworker/u33:10 Not tainted 4.20.0-rc5-00664-g5eb02f7ee1eb-dirty #459
[    4.447630] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
[    4.448689] Workqueue: nvme-wq nvme_scan_work [nvme_core]
[    4.449368] RIP: 0010:blk_mq_map_swqueue+0xfb/0x222
[    4.450596] Code: 04 f5 20 28 ef 81 48 89 c6 39 55 30 76 93 89 d0 48 c1 e0 04 48 03 83 f8 05 00 00 48 8b 00 42 8b 3c 28 48 8b 43 58 48 8b 04 f8 <48> 8b b8 98 00 00 00 4c 0f a3 37 72 42 f0 4c 0f ab 37 66 8b b8 f6
[    4.453132] RSP: 0018:ffffc900023b3cd8 EFLAGS: 00010286
[    4.454061] RAX: 0000000000000000 RBX: ffff888174448000 RCX: 0000000000000001
[    4.456480] RDX: 0000000000000001 RSI: ffffe8feffc506c0 RDI: 0000000000000001
[    4.458750] RBP: ffff88810722d008 R08: ffff88817647a880 R09: 0000000000000002
[    4.464580] R10: ffffc900023b3c10 R11: 0000000000000004 R12: ffff888174448538
[    4.467803] R13: 0000000000000004 R14: 0000000000000001 R15: 0000000000000001
[    4.469220] FS:  0000000000000000(0000) GS:ffff88817bac0000(0000) knlGS:0000000000000000
[    4.471554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.472464] CR2: 0000000000000098 CR3: 0000000174e4e001 CR4: 0000000000760ee0
[    4.474264] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    4.476007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    4.477061] PKRU: 55555554
[    4.477464] Call Trace:
[    4.478731]  blk_mq_init_allocated_queue+0x36a/0x3ad
[    4.479595]  blk_mq_init_queue+0x32/0x4e
[    4.480178]  nvme_validate_ns+0x98/0x623 [nvme_core]
[    4.480963]  ? nvme_submit_sync_cmd+0x1b/0x20 [nvme_core]
[    4.481685]  ? nvme_identify_ctrl.isra.8+0x70/0xa0 [nvme_core]
[    4.482601]  nvme_scan_work+0x23a/0x29b [nvme_core]
[    4.483269]  ? _raw_spin_unlock_irqrestore+0x25/0x38
[    4.483930]  ? try_to_wake_up+0x38d/0x3b3
[    4.484478]  ? process_one_work+0x179/0x2fc
[    4.485118]  process_one_work+0x1d3/0x2fc
[    4.485655]  ? rescuer_thread+0x2ae/0x2ae
[    4.486196]  worker_thread+0x1e9/0x2be
[    4.486841]  kthread+0x115/0x11d
[    4.487294]  ? kthread_park+0x76/0x76
[    4.487784]  ret_from_fork+0x3a/0x50
[    4.488322] Modules linked in: nvme nvme_core qemu_fw_cfg virtio_scsi ip_tables
[    4.489428] Dumping ftrace buffer:
[    4.489939]    (ftrace buffer empty)
[    4.490492] CR2: 0000000000000098
[    4.491052] ---[ end trace 03cd268ad5a86ff7 ]---

Cc: Christoph Hellwig <hch@lst.de>
Cc: linux-nvme@lists.infradead.org
Cc: David Milburn <dmilburn@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:38 -07:00
Dennis Zhou
4705de735b blkcg: put back rcu lock in blkcg_bio_issue_check()
I was a little overzealous in removing the rcu_read_lock() call from
blkcg_bio_issue_check() and it broke blk-throttle. Put it back.

Fixes: e35403a034bf ("blkcg: associate blkg when associating a device")
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:38 -07:00
Josef Bacik
d3fcdff190 block: convert io-latency to use rq_qos_wait
Now that we have this common helper, convert io-latency over to use it
as well.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:38 -07:00
Josef Bacik
b6c7b58f5f block: convert wbt_wait() to use rq_qos_wait()
Now that we have rq_qos_wait() in place, convert wbt_wait() over to
using it with it's specific callbacks.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:38 -07:00
Josef Bacik
84f603246d block: add rq_qos_wait to rq_qos
Originally when I split out the common code from blk-wbt into rq_qos I
left the wbt_wait() where it was and simply copied and modified it
slightly to work for io-latency.  However they are both basically the
same thing, and as time has gone on wbt_wait() has ended up much smarter
and kinder than it was when I copied it into io-latency, which means
io-latency has lost out on these improvements.

Since they are the same thing essentially except for a few minor things,
create rq_qos_wait() that replicates what wbt_wait() currently does with
callbacks that can be passed in for the snowflakes to do their own thing
as appropriate.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:38 -07:00
Dennis Zhou
7754f669ff blkcg: rename blkg_try_get() to blkg_tryget()
blkg reference counting now uses percpu_ref rather than atomic_t. Let's
make this consistent with css_tryget. This renames blkg_try_get to
blkg_tryget and now returns a bool rather than the blkg or %NULL.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:38 -07:00
Dennis Zhou
7fcf2b033b blkcg: change blkg reference counting to use percpu_ref
Every bio is now associated with a blkg putting blkg_get, blkg_try_get,
and blkg_put on the hot path. Switch over the refcnt in blkg to use
percpu_ref.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:38 -07:00
Dennis Zhou
6f70fb6618 blkcg: remove bio_disassociate_task()
Now that a bio only holds a blkg reference, so clean up is simply
putting back that reference. Remove bio_disassociate_task() as it just
calls bio_disassociate_blkg() and call the latter directly.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:38 -07:00
Dennis Zhou
fc5a828bfa blkcg: remove additional reference to the css
The previous patch in this series removed carrying around a pointer to
the css in blkg. However, the blkg association logic still relied on
taking a reference on the css to ensure we wouldn't fail in getting a
reference for the blkg.

Here the implicit dependency on the css is removed. The association
continues to rely on the tryget logic walking up the blkg tree. This
streamlines the three ways that association can happen: normal, swap,
and writeback.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:37 -07:00
Dennis Zhou
db6638d7d1 blkcg: remove bio->bi_css and instead use bio->bi_blkg
Prior patches ensured that any bio that interacts with a request_queue
is properly associated with a blkg. This makes bio->bi_css unnecessary
as blkg maintains a reference to blkcg already.

This removes the bio field bi_css and transfers corresponding uses to
access via bi_blkg.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:37 -07:00
Dennis Zhou
fd42df305f blkcg: associate writeback bios with a blkg
One of the goals of this series is to remove a separate reference to
the css of the bio. This can and should be accessed via bio_blkcg(). In
this patch, wbc_init_bio() now requires a bio to have a device
associated with it.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:37 -07:00
Dennis Zhou
6a7f6d86a5 blkcg: associate a blkg for pages being evicted by swap
A prior patch in this series added blkg association to bios issued by
cgroups. There are two other paths that we want to attribute work back
to the appropriate cgroup: swap and writeback. Here we modify the way
swap tags bios to include the blkg. Writeback will be tackle in the next
patch.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:37 -07:00
Dennis Zhou
e439bedf6b blkcg: consolidate bio_issue_init() to be a part of core
bio_issue_init among other things initializes the timestamp for an IO.
Rather than have this logic handled by policies, this consolidates it to
be on the init paths (normal, clone, bounce clone).

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:37 -07:00
Dennis Zhou
5cdf2e3fea blkcg: associate blkg when associating a device
Previously, blkg association was handled by controller specific code in
blk-throttle and blk-iolatency. However, because a blkg represents a
relationship between a blkcg and a request_queue, it makes sense to keep
the blkg->q and bio->bi_disk->queue consistent.

This patch moves association into the bio_set_dev macro(). This should
cover the majority of cases where the device is set/changed keeping the
two pointers consistent. Fallback code is added to
blkcg_bio_issue_check() to catch any missing paths.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:37 -07:00
Dennis Zhou
892ad71f62 dm: set the static flush bio device on demand
The next patch changes the macro bio_set_dev() to associate a bio with a
blkg based on the device set. However, dm creates a static bio to be
used as the basis for cloning empty flush bios on creation. The
bio_set_dev() call in alloc_dev() will cause problems with the next
patch adding association to bio_set_dev() because the call is before the
bdev is associated with a gendisk (bd_disk is %NULL). To get around
this, set the device on the static bio every time and use that to clone
to the other bios.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: Alasdair Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:37 -07:00
Dennis Zhou
2268c0feb0 blkcg: introduce common blkg association logic
There are 3 ways blkg association can happen: association with the
current css, with the page css (swap), or from the wbc css (writeback).

This patch handles how association is done for the first case where we
are associating bsaed on the current css. If there is already a blkg
associated, the css will be reused and association will be redone as the
request_queue may have changed.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:36 -07:00
Dennis Zhou
beea9da07d blkcg: convert blkg_lookup_create() to find closest blkg
There are several scenarios where blkg_lookup_create() can fail such as
the blkcg dying, request_queue is dying, or simply being OOM. Most
handle this by simply falling back to the q->root_blkg and calling it a
day.

This patch implements the notion of closest blkg. During
blkg_lookup_create(), if it fails to create, return the closest blkg
found or the q->root_blkg. blkg_try_get_closest() is introduced and used
during association so a bio is always attached to a blkg.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:36 -07:00
Dennis Zhou
b978962ad4 blkcg: update blkg_lookup_create() to do locking
To know when to create a blkg, the general pattern is to do a
blkg_lookup() and if that fails, lock and do the lookup again, and if
that fails finally create. It doesn't make much sense for everyone who
wants to do creation to write this themselves.

This changes blkg_lookup_create() to do locking and implement this
pattern. The old blkg_lookup_create() is renamed to
__blkg_lookup_create().  If a call site wants to do its own error
handling or already owns the queue lock, they can use
__blkg_lookup_create(). This will be used in upcoming patches.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:36 -07:00
Dennis Zhou
0fe061b9f0 blkcg: fix ref count issue with bio_blkcg() using task_css
The bio_blkcg() function turns out to be inconsistent and consequently
dangerous to use. The first part returns a blkcg where a reference is
owned by the bio meaning it does not need to be rcu protected. However,
the third case, the last line, is problematic:

	return css_to_blkcg(task_css(current, io_cgrp_id));

This can race against task migration and the cgroup dying. It is also
semantically different as it must be called rcu protected and is
susceptible to failure when trying to get a reference to it.

This patch adds association ahead of calling bio_blkcg() rather than
after. This makes association a required and explicit step along the
code paths for calling bio_blkcg(). In blk-iolatency, association is
moved above the bio_blkcg() call to ensure it will not return %NULL.

BFQ uses the old bio_blkcg() function, but I do not want to address it
in this series due to the complexity. I have created a private version
documenting the inconsistency and noting not to use it.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:36 -07:00
Jens Axboe
6e0de61107 blk-mq: remove QUEUE_FLAG_POLL from default MQ flags
We only support polling if we have poll queues now, but the flag is
being set by default. Remove the default QUEUE_FLAG_POLL setting, we'll
set it in blk_mq_init_allocated_queue() if we have poll queues available
for this device.

Fixes: 6544d229bf ("block: enable polling by default if a poll map is initalized")
Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:36 -07:00
David S. Miller
8b78903bc5 Merge branch 'skb-headroom-slab-out-of-bounds'
Stefano Brivio says:

====================
Fix slab out-of-bounds on insufficient headroom for IPv6 packets

Patch 1/2 fixes a slab out-of-bounds occurring with short SCTP packets over
IPv4 over L2TP over IPv6 on a configuration with relatively low HEADER_MAX.

Patch 2/2 makes sure we avoid writing before the allocated buffer in
neigh_hh_output() in case the headroom is enough for the unaligned hardware
header size, but not enough for the aligned one, and that we warn if we hit
this condition.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07 16:24:40 -08:00
Stefano Brivio
e6ac64d4c4 neighbour: Avoid writing before skb->head in neigh_hh_output()
While skb_push() makes the kernel panic if the skb headroom is less than
the unaligned hardware header size, it will proceed normally in case we
copy more than that because of alignment, and we'll silently corrupt
adjacent slabs.

In the case fixed by the previous patch,
"ipv6: Check available headroom in ip6_xmit() even without options", we
end up in neigh_hh_output() with 14 bytes headroom, 14 bytes hardware
header and write 16 bytes, starting 2 bytes before the allocated buffer.

Always check we're not writing before skb->head and, if the headroom is
not enough, warn and drop the packet.

v2:
 - instead of panicking with BUG_ON(), WARN_ON_ONCE() and drop the packet
   (Eric Dumazet)
 - if we avoid the panic, though, we need to explicitly check the headroom
   before the memcpy(), otherwise we'll have corrupted slabs on a running
   kernel, after we warn
 - use __skb_push() instead of skb_push(), as the headroom check is
   already implemented here explicitly (Eric Dumazet)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07 16:24:40 -08:00
Stefano Brivio
66033f47ca ipv6: Check available headroom in ip6_xmit() even without options
Even if we send an IPv6 packet without options, MAX_HEADER might not be
enough to account for the additional headroom required by alignment of
hardware headers.

On a configuration without HYPERV_NET, WLAN, AX25, and with IPV6_TUNNEL,
sending short SCTP packets over IPv4 over L2TP over IPv6, we start with
100 bytes of allocated headroom in sctp_packet_transmit(), end up with 54
bytes after l2tp_xmit_skb(), and 14 bytes in ip6_finish_output2().

Those would be enough to append our 14 bytes header, but we're going to
align that to 16 bytes, and write 2 bytes out of the allocated slab in
neigh_hh_output().

KASan says:

[  264.967848] ==================================================================
[  264.967861] BUG: KASAN: slab-out-of-bounds in ip6_finish_output2+0x1aec/0x1c70
[  264.967866] Write of size 16 at addr 000000006af1c7fe by task netperf/6201
[  264.967870]
[  264.967876] CPU: 0 PID: 6201 Comm: netperf Not tainted 4.20.0-rc4+ #1
[  264.967881] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
[  264.967887] Call Trace:
[  264.967896] ([<00000000001347d6>] show_stack+0x56/0xa0)
[  264.967903]  [<00000000017e379c>] dump_stack+0x23c/0x290
[  264.967912]  [<00000000007bc594>] print_address_description+0xf4/0x290
[  264.967919]  [<00000000007bc8fc>] kasan_report+0x13c/0x240
[  264.967927]  [<000000000162f5e4>] ip6_finish_output2+0x1aec/0x1c70
[  264.967935]  [<000000000163f890>] ip6_finish_output+0x430/0x7f0
[  264.967943]  [<000000000163fe44>] ip6_output+0x1f4/0x580
[  264.967953]  [<000000000163882a>] ip6_xmit+0xfea/0x1ce8
[  264.967963]  [<00000000017396e2>] inet6_csk_xmit+0x282/0x3f8
[  264.968033]  [<000003ff805fb0ba>] l2tp_xmit_skb+0xe02/0x13e0 [l2tp_core]
[  264.968037]  [<000003ff80631192>] l2tp_eth_dev_xmit+0xda/0x150 [l2tp_eth]
[  264.968041]  [<0000000001220020>] dev_hard_start_xmit+0x268/0x928
[  264.968069]  [<0000000001330e8e>] sch_direct_xmit+0x7ae/0x1350
[  264.968071]  [<000000000122359c>] __dev_queue_xmit+0x2b7c/0x3478
[  264.968075]  [<00000000013d2862>] ip_finish_output2+0xce2/0x11a0
[  264.968078]  [<00000000013d9b14>] ip_finish_output+0x56c/0x8c8
[  264.968081]  [<00000000013ddd1e>] ip_output+0x226/0x4c0
[  264.968083]  [<00000000013dbd6c>] __ip_queue_xmit+0x894/0x1938
[  264.968100]  [<000003ff80bc3a5c>] sctp_packet_transmit+0x29d4/0x3648 [sctp]
[  264.968116]  [<000003ff80b7bf68>] sctp_outq_flush_ctrl.constprop.5+0x8d0/0xe50 [sctp]
[  264.968131]  [<000003ff80b7c716>] sctp_outq_flush+0x22e/0x7d8 [sctp]
[  264.968146]  [<000003ff80b35c68>] sctp_cmd_interpreter.isra.16+0x530/0x6800 [sctp]
[  264.968161]  [<000003ff80b3410a>] sctp_do_sm+0x222/0x648 [sctp]
[  264.968177]  [<000003ff80bbddac>] sctp_primitive_ASSOCIATE+0xbc/0xf8 [sctp]
[  264.968192]  [<000003ff80b93328>] __sctp_connect+0x830/0xc20 [sctp]
[  264.968208]  [<000003ff80bb11ce>] sctp_inet_connect+0x2e6/0x378 [sctp]
[  264.968212]  [<0000000001197942>] __sys_connect+0x21a/0x450
[  264.968215]  [<000000000119aff8>] sys_socketcall+0x3d0/0xb08
[  264.968218]  [<000000000184ea7a>] system_call+0x2a2/0x2c0

[...]

Just like ip_finish_output2() does for IPv4, check that we have enough
headroom in ip6_xmit(), and reallocate it if we don't.

This issue is older than git history.

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07 16:24:40 -08:00
Eric Dumazet
f9bfe4e6a9 tcp: lack of available data can also cause TSO defer
tcp_tso_should_defer() can return true in three different cases :

 1) We are cwnd-limited
 2) We are rwnd-limited
 3) We are application limited.

Neal pointed out that my recent fix went too far, since
it assumed that if we were not in 1) case, we must be rwnd-limited

Fix this by properly populating the is_cwnd_limited and
is_rwnd_limited booleans.

After this change, we can finally move the silly check for FIN
flag only for the application-limited case.

The same move for EOR bit will be handled in net-next,
since commit 1c09f7d073 ("tcp: do not try to defer skbs
with eor mark (MSG_EOR)") is scheduled for linux-4.21

Tested by running 200 concurrent netperf -t TCP_RR -- -r 60000,100
and checking none of them was rwnd_limited in the chrono_stat
output from "ss -ti" command.

Fixes: 41727549de ("tcp: Do not underestimate rwnd_limited")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07 16:18:22 -08:00
Linus Torvalds
5f179793f0 virtio fixes
A couple of last-minute fixes.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcCdNyAAoJECgfDbjSjVRpNAoH/A2eW0+UjIQar+jPKh1XPwN6
 uOoPAS3AnXGC0qhlJb2/W77intpLmF/SMkOSNBfvg1MXYTPJRGXcjS7v7446qpZf
 iCk/UEDN3Ck0OuxoR4AfO5qkbZSDGBGYDoSvB4JLufy9PTaNCrd+gBM349Fg1cRc
 lkK6aXLe7vydB+x0rLyfCrBEccZ8UtmCGs1FIzd9bvDgRGzlPnPGwavX0w8lx7Jn
 ZtQClajG2BjF0kMu+1hW9m781yjh7TYwshG2lYhQama5a8x3X5jOIUDNknfw3uQd
 crWOPPlzIELJcWrmG2/psydNGcq1b0S6vhFbpO7BiFpvCvdW5hAMk75c7hBWpls=
 =DLS0
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull vhost/virtio fixes from Michael Tsirkin:
 "A couple of last-minute fixes"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost/vsock: fix use-after-free in network stack callers
  virtio/s390: fix race in ccw_io_helper()
  virtio/s390: avoid race on vcdev->config
  vhost/vsock: fix reset orphans race with close timeout
2018-12-07 14:34:10 -08:00
Linus Torvalds
b8bf4692c9 - Avoid sending IPIs with interrupts disabled
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlwKvQgACgkQa9axLQDI
 XvH+uw//UTEMR3jPre+hLqXQ+/hWWu9yYQeD9PVsyce6a7dIJ2NsL0Y9k4eORArw
 2rftLWq1r2CjndlULwxge01WQ2RXGiDgQ5GFutwGw/6kWid/JPnHj4yhDyDz3mR7
 DtioRhVUx1X3I+gQIqlGZSlF24wHcJfcbbVDBpydpW428Qo0OqM4GZAiCQha7QCZ
 HUMFZgWrYh6vsSe77151cRqoe8gozwlJ3aYrgf1E7cORCVRCbc2uyKLNihizmzIr
 3FwSPPn6zVPiKHfoYPjoyIBGd5Um3I1MgUhRVY/w4sZbylsdVx9whJbC6J5rg3xu
 5gCU3B0tJbZCQJQlJwtUrWNocJ0CGwK20wBukpWtHPyIZg4lf9etzthXEP1Pqn6m
 WT7LXj3SMV0qykp8mPes+AbSKJHOhE9/yoS0wu1As/mGA6sbnu0p/T77DeovChC/
 MTvNFQTvd1MudqOGbvMJrWtyrKG5dPJsE1RZ/xdgqA+jEfgJ7foZz8XCnT8Ypf9X
 Gl0JwcnmhNRXXCZ6FQweyHUgkUMFyLce1YnvtTmWr9QP94HG32IpNBqOLhaIX7Nz
 2NvENq6pmKzzOBOXB8IWz29ZFUhaiM0gaI5BT8XmMQHI6q/Xctv04KY91aJ+RAoU
 zkVt5n1e2gIW3z7oAA6hfRbn7JRvF7TWNaGjRb23783Y3nBHQtw=
 =812H
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:
 "Avoid sending IPIs with interrupts disabled"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: hibernate: Avoid sending cross-calling with interrupts disabled
2018-12-07 14:18:49 -08:00
Linus Torvalds
1cdc3624a1 Fixes for stackleak
- Remove tracing for inserted stack depth marking function (Anders Roxell)
 - Move gcc-plugin pass location to avoid objtool warnings (Alexander Popov)
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlwKp1IWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJuT9D/9DP75YerfMFxiVx8BsFnGVfPW3
 QWa/nf2c8VMhmouQ9OI8j8Nj+T4q5VXewbGC5I0F6b2YsIPjHOwK0PR557xn7jRi
 7bY3aTRzJs4v+dDYkXqTkGx4zQ9FSD3NDM0T4vtnVGEdOmojcvoLX6+V7WArOTaa
 M9oP4iNn5/+Z677HyMP3DyTY093WpCx0fNOAf1HI/kpM3TPVJiE5OLXBZY957N01
 eBrt0WHJkmaZkHeqUkK06RTxYzIKBQqFRw77pPiKq79ETxBEwHOgU2hmwwHBv4+h
 u6TQmy7aVsUiXfS1GVvkNkX/jCNxYuK8kP5dsd+cQKn3AfkDHj3RvBTOvrkD0xyF
 7F9Toz/Wpw8+/YVx8ks6cNrssmEq4rd6T7MJcoud1TwEG1o/bSUbPc4uednuIUGL
 sB4J6sxApL2vaZtgqUePVZZJVKwiryFa8LymihkHMfPU4dgCycrYLGa3A1ju9WVs
 psGYhFTEfC1KVLgTmfwZlxz/FWbRmSERRF7cl9cdw8mdlqkKxP1C//VgsdJXOnnW
 c51BS+XK9OI8HTYXmWah82ysuCE7qou4DUJA91jhyza5tEp2V5C0uhOQz2odFcBF
 8axjqExFr4YfAwIgtGOClPA0e5CaB4ASRbOIs8+WL03LiNbfP/p6+92TpnwaP637
 Q5CbAMIfKqNpqAcAJg==
 =1JZ6
 -----END PGP SIGNATURE-----

Merge tag 'gcc-plugins-v4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull gcc stackleak plugin fixes from Kees Cook:

 - Remove tracing for inserted stack depth marking function (Anders
   Roxell)

 - Move gcc-plugin pass location to avoid objtool warnings (Alexander
   Popov)

* tag 'gcc-plugins-v4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  stackleak: Register the 'stackleak_cleanup' pass before the '*free_cfg' pass
  stackleak: Mark stackleak_track_stack() as notrace
2018-12-07 13:13:07 -08:00
Linus Torvalds
52ab2ec005 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:

 - Disable the new crypto stats interface as it's still being changed

 - Fix potential uses-after-free in cbc/cfb/pcbc.

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: user - Disable statistics interface
  crypto: do not free algorithm before using
2018-12-07 13:07:10 -08:00
Linus Torvalds
7b24f6c082 pci-v4.20-fixes-3
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlwKrrsUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxKmxAAjYa1d5fcj+7tG4xPLwr2OkKVhx5p
 GKiocbCohifhtCaqYf2WONDbE7hKgDaU6GdHOgnKY3ViSOxwuJmOcSn+npCE3kTf
 Ty56mEERt2CzopUr+jqBjgoabpymCVBybDDXbs4rogevc5mp4Y4czOKjiAURKSqn
 eSI/Z9O9g0OM6Bwc3aSqI7D88G1DDryrnKD2TX5YRVh0CPI/JvPR+HAhCM2350qT
 t3o5G0lIhHyI97gtW2ZKcijUYyqYbaUPkUHqe5uFIo6tAbz6LbUEFIFjXJgx5D7P
 GpIjDPm4zYY0W/ENMGgFsijEZF3eaTYK95BbYTge4I0n2wV9qrr9wHcX/iWwA0kT
 1KPMY8rYEepT8l+vtrufX/+z5olP0a36c8ySkmAAPdRtGTeFOa+Er/cNEHJPD26G
 rpCUjLgI2553NlDpf1RDJJXbctpHb9j2JMDRp2m7esRqNU9S4iEnPTa0wCbTaNFL
 OE1qE68ZwcKQPJvcajpmVZAiYwyba5jRhcIy1XHSMgVb5r+xlyJ8WXUatSjxquDq
 yJ14JxCsYVHiv4qATCDmN4SkgZN3f+7nPVrNpJWbwjbNOeofvQti1mKdcuvLKlTE
 VHMzPiOP7AExv/YMWOSsdHQ11pIrUv1/AbMC4ln2eAy0D5bhL3F4gi8FmBaba+53
 exf99NojNiAY1Ts=
 =h1rz
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.20-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
 "Revert ASPM change that caused a regression"

* tag 'pci-v4.20-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  Revert "PCI/ASPM: Do not initialize link state when aspm_disabled is set"
2018-12-07 12:58:34 -08:00
Shmulik Ladkani
1b4e5ad5d6 ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output
In 'seg6_output', stack variable 'struct flowi6 fl6' was missing
initialization.

Fixes: 6c8702c60b ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07 12:22:39 -08:00
Linus Torvalds
0b43a29979 for-linus-20181207
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlwKqUQQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpg/8D/9AffzAvrfKo9aX8/yGcwFOE9VxmgGI8N/u
 XHVnuz+g+EjmTC3k+S3uMkp7+ED/oJvicqsbv93097ZVmEF1crdW/Uk0SLBkusTL
 njDBEWKgAz/nTOx8Tp1zWeXCMtIQs3LxUEIj+jPzR/ia9NW3shqNDM0v71u/tE6W
 Kjxi/uVxQ6VzJ8ppY+xCJC3d+QJ6r8SXdSmo84+OK/3u4jablX8QectrnuD/vBXd
 K9BfB5EewH+D+OR0MeGij8QlPdKV4an/RvGawFwRMQoUvJj3Uq0lh9dMEDcYjz9P
 7uHxPFAoUiNLBdasuSt4EVyRvVwnhfIvXa+6vC57Pi+vFo+GIH0v4lMa39nl0NJr
 woMLCIGS9cZCRRT9PL19bOo/JjHe8EvniHnHbJJTFG3w73SRg4TuTnjoodRLPR7G
 8TkNnO1F/LQFRn7J6+QgL17OxI4EySxSqCqTNNapHXealAQy+5P1JOCBfUWMlvsf
 it6piYt38zyKRAd3I+dw7VFkyS+7XCrKwbH6kphPRb+lhuvPo0IwxtFNMaGfbyFW
 FmDdeIHQNZ2U0MaaUn+dujkLi0eCbVw7T5FIxYuj4IQb3vefThya9FBgQsd2nlAn
 Eu2ATCclgp2wfTHlbg8lLGvjzbHgTRu5VrzjDg6If1kN8pXq6vy72oNLjN2smDHi
 eLxCawiC2g==
 =irzk
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20181207' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Let's try this again...

  We're finally happy with the DM livelock issue, and it's also passed
  overnight testing and the corruption regression test. The end result
  is much nicer now too, which is great.

  Outside of that fix, there's a pull request for NVMe with two small
  fixes, and a regression fix for BFQ from this merge window. The BFQ
  fix looks bigger than it is, it's 90% comment updates"

* tag 'for-linus-20181207' of git://git.kernel.dk/linux-block:
  blk-mq: punt failed direct issue to dispatch list
  nvmet-rdma: fix response use after free
  nvme: validate controller state before rescheduling keep alive
  block, bfq: fix decrement of num_active_groups
2018-12-07 10:40:37 -08:00
Linus Torvalds
52f842ccd6 Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "A set of driver bugfixes for the I2C subsystem"

* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: uniphier-f: fix violation of tLOW requirement for Fast-mode
  i2c: uniphier: fix violation of tLOW requirement for Fast-mode
  i2c: uniphier-f: fill TX-FIFO only in IRQ handler for repeated START
  i2c: uniphier-f: fix timeout error after reading 8 bytes
  i2c: scmi: Fix probe error on devices with an empty SMB0001 ACPI device node
  i2c: axxia: properly handle master timeout
  i2c: rcar: check bus state before reinitializing
  i2c: nvidia-gpu: limit reads also for combined messages
  i2c: nvidia-gpu: adhere to I2C fault codes
2018-12-07 10:31:31 -08:00
Linus Torvalds
c431b42058 dmaengine-4.20-rc6
dmaengine fixes for v4.20-rc6
 
  - Fixing imx-sdma handling of channel terminations, this involves
    reverting two commits and implement async termination
  - Fix cppi dma channel deletion from pending list on stop
  - Fix FIFO size for dw controller in Intel Merrifield
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJcCfKmAAoJEHwUBw8lI4NHiDUP/2dOZoPBnzbSXRYVr0PQY1Us
 6aB3cClYtrLeGSKzSjiMgNqxRQ/VnMX3iZMmEseuLd9JNQXF48qWcH0HiPgWmoxy
 OgzcJxc/faAfFumpBnjM5Zm3+kzOWf0xPdVBkFKrJoLIMCZ4rU1bstEujzs+3ViE
 a8U2BoAADW6RWqjBSMjXGhRfmo0QxXz88wzzeV9l2gtFvgN32SlhD3+cHcRdV81K
 23XbeVD7ivmUKPTZWA+4T7O7ShGH2wB4it9DpTgCMsPdrQkdncBTjYg5JwSfNJ4j
 vfQQO02ctqm9ipZacifc0VknISzMIPRY2F/PScCuUfHbDfyw10u4NHGcZDgRrXLz
 BvE624XXZoISmRzLvRxdayo1pPVyVzvkT5BvTcsMGx44SrO8fmTf/M/B4HMlj6F+
 vCe+h6e77eY+FsIvZfrZiOYEp45OEHTT+kuv97kvLdfBl998CpP9K8B6aODIVCt2
 YujqHUSE/Udato9H5jrAms0tZDQqSdTgQU95ZHOWrNn9+MPWRLZ9HUBF7Hmp9zA8
 SbCDseUR1bc/WeW8ge/WAbS+KCL1wbwCXm5/t6JI7yXPn/eiVQgUMI71wphtiDFK
 hMSQX6oZy0vmblMiI7WjInZJn01+NigN7xiF2uxai4Xh7MvRbg9L6pFrgxc3wfB2
 3JPfWLdIXp/Zcf/qlR2E
 =qeTx
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-fix-4.20-rc6' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
 "Another pull request for dmaengine. We got bunch of fixes early this
  week and all are tagged to stable. Hope this is last fix for this
  cycle:

   - Fix imx-sdma handling of channel terminations, this involves
     reverting two commits and implement async termination

   - Fix cppi dma channel deletion from pending list on stop

   - Fix FIFO size for dw controller in Intel Merrifield"

* tag 'dmaengine-fix-4.20-rc6' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: dw: Fix FIFO size for Intel Merrifield
  dmaengine: cppi41: delete channel from pending list when stop channel
  dmaengine: imx-sdma: use GFP_NOWAIT for dma descriptor allocations
  dmaengine: imx-sdma: implement channel termination via worker
  Revert "dmaengine: imx-sdma: alloclate bd memory from dma pool"
  Revert "dmaengine: imx-sdma: Use GFP_NOWAIT for dma allocations"
2018-12-07 09:58:34 -08:00
Nick Desaulniers
ac3e233d29 x86/vdso: Drop implicit common-page-size linker flag
GNU linker's -z common-page-size's default value is based on the target
architecture. arch/x86/entry/vdso/Makefile sets it to the architecture
default, which is implicit and redundant. Drop it.

Fixes: 2aae950b21 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu")
Reported-by: Dmitry Golovin <dima@golovin.in>
Reported-by: Bill Wendling <morbo@google.com>
Suggested-by: Dmitry Golovin <dima@golovin.in>
Suggested-by: Rui Ueyama <ruiu@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Fangrui Song <maskray@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181206191231.192355-1-ndesaulniers@google.com
Link: https://bugs.llvm.org/show_bug.cgi?id=38774
Link: https://github.com/ClangBuiltLinux/linux/issues/31
2018-12-07 18:57:38 +01:00
Will Deacon
b4aecf7808 arm64: hibernate: Avoid sending cross-calling with interrupts disabled
Since commit 3b8c9f1cdf ("arm64: IPI each CPU after invalidating the
I-cache for kernel mappings"), a call to flush_icache_range() will use
an IPI to cross-call other online CPUs so that any stale instructions
are flushed from their pipelines. This triggers a WARN during the
hibernation resume path, where flush_icache_range() is called with
interrupts disabled and is therefore prone to deadlock:

  | Disabling non-boot CPUs ...
  | CPU1: shutdown
  | psci: CPU1 killed.
  | CPU2: shutdown
  | psci: CPU2 killed.
  | CPU3: shutdown
  | psci: CPU3 killed.
  | WARNING: CPU: 0 PID: 1 at ../kernel/smp.c:416 smp_call_function_many+0xd4/0x350
  | Modules linked in:
  | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc4 #1

Since all secondary CPUs have been taken offline prior to invalidating
the I-cache, there's actually no need for an IPI and we can simply call
__flush_icache_range() instead.

Cc: <stable@vger.kernel.org>
Fixes: 3b8c9f1cdf ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings")
Reported-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Tested-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Tested-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-12-07 15:52:39 +00:00
Jens Axboe
8b878ee247 Merge branch 'nvme-4.20' of git://git.infradead.org/nvme into for-linus
Pull NVMe fixes from Christoph.

* 'nvme-4.20' of git://git.infradead.org/nvme:
  nvmet-rdma: fix response use after free
  nvme: validate controller state before rescheduling keep alive
2018-12-07 08:40:13 -07:00
Jens Axboe
c616cbee97 blk-mq: punt failed direct issue to dispatch list
After the direct dispatch corruption fix, we permanently disallow direct
dispatch of non read/write requests. This works fine off the normal IO
path, as they will be retried like any other failed direct dispatch
request. But for the blk_insert_cloned_request() that only DM uses to
bypass the bottom level scheduler, we always first attempt direct
dispatch. For some types of requests, that's now a permanent failure,
and no amount of retrying will make that succeed. This results in a
livelock.

Instead of making special cases for what we can direct issue, and now
having to deal with DM solving the livelock while still retaining a BUSY
condition feedback loop, always just add a request that has been through
->queue_rq() to the hardware queue dispatch list. These are safe to use
as no merging can take place there. Additionally, if requests do have
prepped data from drivers, we aren't dependent on them not sharing space
in the request structure to safely add them to the IO scheduler lists.

This basically reverts ffe81d4532 and is based on a patch from Ming,
but with the list insert case covered as well.

Fixes: ffe81d4532 ("blk-mq: fix corruption with direct issue")
Cc: stable@vger.kernel.org
Suggested-by: Ming Lei <ming.lei@redhat.com>
Reported-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 08:16:11 -07:00
Israel Rukshin
d7dcdf9d4e nvmet-rdma: fix response use after free
nvmet_rdma_release_rsp() may free the response before using it at error
flow.

Fixes: 8407879 ("nvmet-rdma: fix possible bogus dereference under heavy load")
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-07 07:11:11 -08:00
James Smart
86880d6461 nvme: validate controller state before rescheduling keep alive
Delete operations are seeing NULL pointer references in call_timer_fn.
Tracking these back, the timer appears to be the keep alive timer.

nvme_keep_alive_work() which is tied to the timer that is cancelled
by nvme_stop_keep_alive(), simply starts the keep alive io but doesn't
wait for it's completion. So nvme_stop_keep_alive() only stops a timer
when it's pending. When a keep alive is in flight, there is no timer
running and the nvme_stop_keep_alive() will have no affect on the keep
alive io. Thus, if the io completes successfully, the keep alive timer
will be rescheduled.   In the failure case, delete is called, the
controller state is changed, the nvme_stop_keep_alive() is called while
the io is outstanding, and the delete path continues on. The keep
alive happens to successfully complete before the delete paths mark it
as aborted as part of the queue termination, so the timer is restarted.
The delete paths then tear down the controller, and later on the timer
code fires and the timer entry is now corrupt.

Fix by validating the controller state before rescheduling the keep
alive. Testing with the fix has confirmed the condition above was hit.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-07 07:11:11 -08:00
Paolo Valente
ba7aeae553 block, bfq: fix decrement of num_active_groups
Since commit '2d29c9f89fcd ("block, bfq: improve asymmetric scenarios
detection")', if there are process groups with I/O requests waiting for
completion, then BFQ tags the scenario as 'asymmetric'. This detection
is needed for preserving service guarantees (for details, see comments
on the computation * of the variable asymmetric_scenario in the
function bfq_better_to_idle).

Unfortunately, commit '2d29c9f89fcd ("block, bfq: improve asymmetric
scenarios detection")' contains an error exactly in the updating of
the number of groups with I/O requests waiting for completion: if a
group has more than one descendant process, then the above number of
groups, which is renamed from num_active_groups to a more appropriate
num_groups_with_pending_reqs by this commit, may happen to be wrongly
decremented multiple times, namely every time one of the descendant
processes gets all its pending I/O requests completed.

A correct, complete solution should work as follows. Consider a group
that is inactive, i.e., that has no descendant process with pending
I/O inside BFQ queues. Then suppose that num_groups_with_pending_reqs
is still accounting for this group, because the group still has some
descendant process with some I/O request still in
flight. num_groups_with_pending_reqs should be decremented when the
in-flight request of the last descendant process is finally completed
(assuming that nothing else has changed for the group in the meantime,
in terms of composition of the group and active/inactive state of
child groups and processes). To accomplish this, an additional
pending-request counter must be added to entities, and must be
updated correctly.

To avoid this additional field and operations, this commit resorts to
the following tradeoff between simplicity and accuracy: for an
inactive group that is still counted in num_groups_with_pending_reqs,
this commit decrements num_groups_with_pending_reqs when the first
descendant process of the group remains with no request waiting for
completion.

This simplified scheme provides a fix to the unbalanced decrements
introduced by 2d29c9f89f. Since this error was also caused by lack
of comments on this non-trivial issue, this commit also adds related
comments.

Fixes: 2d29c9f89f ("block, bfq: improve asymmetric scenarios detection")
Reported-by: Steven Barrett <steven@liquorix.net>
Tested-by: Steven Barrett <steven@liquorix.net>
Tested-by: Lucjan Lucjanov <lucjan.lucjanov@gmail.com>
Reviewed-by: Federico Motta <federico@willer.it>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 07:40:07 -07:00
Greg Kroah-Hartman
dbde117c31 GNSS fixes for 4.20-rc6
Here's a fix for a broken activation retry loop in the sirf driver.
 
 Included are also two MAINTAINERS updates.
 
 All have been in linux-next with no reported issues.
 
 Signed-off-by: Johan Hovold <johan@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQHbPq+cpGvN/peuzMLxc3C7H1lCAUCXAptswAKCRALxc3C7H1l
 CPSwAQChmb2y7CwP5il1PxqTa98F9hRx4H3DBCyPvS4WT5U1MgEAlrx1fnuAPiS3
 97Q6+ZRCyy5rbyjPpzJ6NwTyt13c4w0=
 =oN0i
 -----END PGP SIGNATURE-----

Merge tag 'gnss-4.20-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/gnss into char-misc-linus

Johan writes:

GNSS fixes for 4.20-rc6

Here's a fix for a broken activation retry loop in the sirf driver.

Included are also two MAINTAINERS updates.

All have been in linux-next with no reported issues.

Signed-off-by: Johan Hovold <johan@kernel.org>

* tag 'gnss-4.20-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/gnss:
  MAINTAINERS: exclude gnss from SIRFPRIMA2 regex matching
  MAINTAINERS: add gnss scm tree
  gnss: sirf: fix activation retry handling
2018-12-07 14:05:28 +01:00
Long Li
6ac79291fb CIFS: Avoid returning EBUSY to upper layer VFS
EBUSY is not handled by VFS, and will be passed to user-mode. This is not
correct as we need to wait for more credits.

This patch also fixes a bug where rsize or wsize is used uninitialized when
the call to server->ops->wait_mtu_credits() fails.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-12-07 00:59:23 -06:00
Herbert Xu
e61efff4ae crypto: user - Disable statistics interface
Since this user-space API is still undergoing significant changes,
this patch disables it for the current merge window.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-07 13:56:08 +08:00
Linus Torvalds
d387ac13ad ast, msm display fixes, amdgpu fixes, lease fix, omap fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJcCc0fAAoJEAx081l5xIa+6X4P/1pTMY/XJob0UVrmygU+oZu8
 NIRO+axzV5EksEaWHCxVN5VtxaXpA1BMbFQeMpen2Vv2+Bts/YYP36Vm6okPK2Kh
 rBByQDdZvpaTovrypososqJ/BBD2ak6CGqiedbb35RKdetxYpVW+gFjEb9spNeOw
 fQAqSaP605sJg+YxVMIg8bEPP4hENSjx+CsmyjDcw3Q5CzeyYz3kFjZ9QKMtb8UK
 inCDc/A4phY9YiglKbdiMfJzJf8HN3ZYLwIOrXJqsmi4HTElyoWCXMMOZNMp4UVO
 c95+AGart+YIXRJ/mbO+l9fitvvaymHNRfTpyvgF7P/MjCrazul+rzhfpjP+hU8a
 oz/8XFUqVphBdWDLHo4FesqpGPMBuEWSs8143Ohz2lG3F/wZknrXkVWi9ojviZ1u
 ahUmaCi+OwyEMCZQNJVcgNdHZrSu06ZjOLyhfy0oab3mXyr03cS2BX4Gv5FslMjQ
 IQUntn9vPu7Po4nsENKV5uSoBozoLlKw4b+r3w2sQbJFLlUXavrsx5lSlTB9ELDF
 AHZsJ0NkLaXbY5It+atzmLLeHwdepYoM1Ocx3DKy9D2gqdaPXQdDTamvoLeq7dOg
 zmGh9wNJdtX3Y2/I/5YVeClpUhuR6vQtj1x2x7gDHE2rIs+IoGnfUBg2NW67HH4o
 JhF+LXQclvtsAzp3P86L
 =6wgz
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2018-12-07' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "There's a bit more in here than I'd like, and I'm hoping things calm
  down when I'm out.

  msm:
   - a bunch of display fixes for the new DPU
   - a couple of command submission fixes

  omap:
   - some DSI fixes

  ast:
   - driver unload crash fix

  core:
   - fix the lease uevent so userspace can distinguish it

  amd:
   - fix a bpc regression
   - fix lru handling regression
   - fixed firmware support for new GPUs
   - power management fixes for vega20"

* tag 'drm-fixes-2018-12-07' of git://anongit.freedesktop.org/drm/drm: (37 commits)
  drm/ast: Fix connector leak during driver unload
  drm/amdgpu/vcn: Update vcn.cur_state during suspend
  drm/amd/display: Fix overflow/truncation from strncpy.
  drm/amd/powerplay: improve OD code robustness
  drm/amdgpu: enlarge maximum waiting time of KIQ
  drm/fb-helper: Fix typo in parameter description
  drm/amd/powerplay: support SoftMin/Max setting for some specific DPM
  drm/amd/powerplay: issue pre-display settings for display change event
  drm/amd/powerplay: support new pptable upload on Vega20
  drm/amdgpu/gmc8: always load MC firmware in the driver
  drm/amdgpu/gmc8: update MC firmware for polaris
  drm/amdgpu: update mc firmware image for polaris12 variants
  drm/msm: Fix error return checking
  drm/msm/dpu: Ignore alpha for XBGR8888 format
  drm/msm: dpu: Fix "WARNING: invalid free of devm_ allocated data"
  drm/msm/hdmi: Drop pointless static qualifier in msm_hdmi_bind()
  drm/msm: Move fence put to where failure occurs
  drm/msm: dpu: Don't set legacy plane->crtc pointer
  drm/msm/gpu: Don't map command buffers with nr_relocs equal to 0
  drm/msm/hdmi: Enable HPD after HDMI IRQ is set up
  ...
2018-12-06 19:35:50 -08:00
Linus Torvalds
7f80c7325b NFS client bugfixes for Linux 4.20
Highlights include:
 
 Stable fixes:
  - Fix a page leak when using RPCSEC_GSS/krb5p to encrypt data.
 
 Bugfixes:
  - Fix a regression that causes the RPC receive code to hang
  - Fix call_connect_status() so that it handles tasks that got transmitted
    while queued waiting for the socket lock.
  - Fix a memory leak in call_encode()
  - Fix several other connect races.
  - Fix receive code error handling.
  - Use the discard iterator rather than MSG_TRUNC for compatibility with
    AF_UNIX/AF_LOCAL sockets.
  - nfs: don't dirty kernel pages read by direct-io
  - pnfs/Flexfiles fix to enforce per-mirror stateid only for NFSv4 data
    servers
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJcCWIOAAoJEA4mA3inWBJc3BsP/i/VXd0ZSxxL8i/++qCR1KGT
 /p0+t2HbrhPzb3jKmuaBe/6T6bLMbpmkwbesA6cHENkaPiOqxPhxLsJlh4o2BHwg
 NcjAbbov/hkakFAHlp69KqiL7DZe8YEqQE8GlUnn+3C3RM3i2TSRQ3AGXUH22P2a
 MY5fqiub2PmEwe2UZR8BzIEQd5w60AzTNXzQb181/+SCTOPdJTKneh0Tw54lD4d6
 vWKhi64cyQxQxshCvrX6IpcNWu9qwm7qDGQ3rDAg0whunve4YGtTz1suRUk888M4
 VfNxA8skFZuaQS/UU6oek2xaeMlSzEoJQXimKLYTEJKoqf7sWxfNLAfqHwnfyo4T
 Yab3cfVRs5KgEltVZyodb9oVQd6KI13hYeT+vXubz2kq1Ode4NJCnzgEefOP0hNV
 ENDal0hqBrfjfVIkpg/wfgRJln/W4Y/U0oPPm50eJJxa0ZKTfftBWo4me5DwCFF9
 0/XhPdFWTvZsYjmSGRC1RsaSrzUvO+wFo3tKQ2lQqf8QP3ix9ZtGQHN+h8RN9SxK
 ti5OxTMsfM3jYg7+yu4yOAQkcCcoaDA37+JztpuUSlMRfNss8uM7cQKsQ4WQf6Nr
 24At5Wr/ib7hVkAQ5oB98UWh5q1ZLzmmHhzsf8KacTSNcfjgu0H0DmKtm3CfThFK
 xfTHotzM3IqbUXRZQ7++
 =M/mt
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.20-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "This is mainly fallout from the updates to the SUNRPC code that is
  being triggered from less common combinations of NFS mount options.

  Highlights include:

  Stable fixes:
   - Fix a page leak when using RPCSEC_GSS/krb5p to encrypt data.

  Bugfixes:
   - Fix a regression that causes the RPC receive code to hang
   - Fix call_connect_status() so that it handles tasks that got
     transmitted while queued waiting for the socket lock.
   - Fix a memory leak in call_encode()
   - Fix several other connect races.
   - Fix receive code error handling.
   - Use the discard iterator rather than MSG_TRUNC for compatibility
     with AF_UNIX/AF_LOCAL sockets.
   - nfs: don't dirty kernel pages read by direct-io
   - pnfs/Flexfiles fix to enforce per-mirror stateid only for NFSv4
     data servers"

* tag 'nfs-for-4.20-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: Don't force a redundant disconnection in xs_read_stream()
  SUNRPC: Fix up socket polling
  SUNRPC: Use the discard iterator rather than MSG_TRUNC
  SUNRPC: Treat EFAULT as a truncated message in xs_read_stream_request()
  SUNRPC: Fix up handling of the XDRBUF_SPARSE_PAGES flag
  SUNRPC: Fix RPC receive hangs
  SUNRPC: Fix a potential race in xprt_connect()
  SUNRPC: Fix a memory leak in call_encode()
  SUNRPC: Fix leak of krb5p encode pages
  SUNRPC: call_connect_status() must handle tasks that got transmitted
  nfs: don't dirty kernel pages read by direct-io
  flexfiles: enforce per-mirror stateid only for v4 DSes
2018-12-06 18:57:04 -08:00
Linus Torvalds
b72f711a4e Merge branch 'spectre' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM spectre fix from Russell King:
 "Exynos folk noticed that CPU hotplug wasn't working with their kernel
  configuration, and have tested this as fixing the problem"

* 'spectre' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: ensure that processor vtables is not lost after boot
2018-12-06 16:45:36 -08:00
Linus Torvalds
7e40b56c77 Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
 "Some small fixes that have been accumulated:

   - Chris Cole noticed that in a SMP environment, the DMA cache
     coherence handling can produce undesirable results in a corner
     case

   - Propagate that fix for ARMv7M as well

   - Fix a false positive with source fortification

   - Fix an uninitialised return that Nathan Jones spotted"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8816/1: dma-mapping: fix potential uninitialized return
  ARM: 8815/1: V7M: align v7m_dma_inv_range() with v7 counterpart
  ARM: 8814/1: mm: improve/fix ARM v7_dma_inv_range() unaligned address handling
  ARM: 8806/1: kprobes: Fix false positive with FORTIFY_SOURCE
2018-12-06 16:39:44 -08:00
Masahiro Yamada
ece27a337d i2c: uniphier-f: fix violation of tLOW requirement for Fast-mode
Currently, the clock duty is set as tLOW/tHIGH = 1/1. For Fast-mode,
tLOW is set to 1.25 us while the I2C spec requires tLOW >= 1.3 us.

tLOW/tHIGH = 5/4 would meet both Standard-mode and Fast-mode:
  Standard-mode: tLOW = 5.56 us, tHIGH = 4.44 us
  Fast-mode:     tLOW = 1.39 us, tHIGH = 1.11 us

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-12-06 23:14:59 +01:00
Masahiro Yamada
8469636ab5 i2c: uniphier: fix violation of tLOW requirement for Fast-mode
Currently, the clock duty is set as tLOW/tHIGH = 1/1. For Fast-mode,
tLOW is set to 1.25 us while the I2C spec requires tLOW >= 1.3 us.

tLOW/tHIGH = 5/4 would meet both Standard-mode and Fast-mode:
  Standard-mode: tLOW = 5.56 us, tHIGH = 4.44 us
  Fast-mode:     tLOW = 1.39 us, tHIGH = 1.11 us

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-12-06 23:14:59 +01:00
Masahiro Yamada
cd8843f541 i2c: uniphier-f: fill TX-FIFO only in IRQ handler for repeated START
- For a repeated START condition, this controller starts data transfer
   immediately after the slave address is written to the TX-FIFO.

 - Once the TX-FIFO empty interrupt is asserted, the controller makes
   a pause even if additional data are written to the TX-FIFO.

Given those circumstances, the data after a repeated START may not be
transferred if the interrupt is asserted while the TX-FIFO is being
filled up. A more reliable way is to append TX data only in the
interrupt handler.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-12-06 23:14:59 +01:00