Commit Graph

873808 Commits

Author SHA1 Message Date
Matt Lupfer
5ff8437214 scsi: virtio_scsi: unplug LUNs when events missed
The event handler calls scsi_scan_host() when events are missed, which will
hotplug new LUNs.  However, this function won't remove any unplugged LUNs.
The result is that hotunplug doesn't work properly when the number of
unplugged LUNs exceeds the event queue size (currently 8).

Scan existing LUNs when events are missed to check if they are still
present.  If not, remove them.

Link: https://lore.kernel.org/r/20190905181903.29756-1-mlupfer@ddn.com
Signed-off-by: Matt Lupfer <mlupfer@ddn.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-10 22:10:17 -04:00
Martin Wilck
57adf5d4cf scsi: scsi_dh_rdac: zero cdb in send_mode_select()
cdb in send_mode_select() is not zeroed and is only partially filled in
rdac_failover_get(), which leads to some random data getting to the
device. Users have reported storage responding to such commands with
INVALID FIELD IN CDB. Code before commit 3278255741 was not affected, as
it called blk_rq_set_block_pc().

Fix this by zeroing out the cdb first.

Identified & fix proposed by HPE.

Fixes: 3278255741 ("scsi_dh_rdac: switch to scsi_execute_req_flags()")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20190904155205.1666-1-martin.wilck@suse.com
Signed-off-by: Martin Wilck <mwilck@suse.com>
Acked-by: Ales Novak <alnovak@suse.cz>
Reviewed-by: Shane Seymour <shane.seymour@hpe.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-10 21:25:36 -04:00
zhengbin
f96d279f0a scsi: fcoe: fix null-ptr-deref Read in fc_release_transport
In fcoe_if_init, if fc_attach_transport(&fcoe_vport_fc_functions)
fails, need to free the previously memory and return fail, otherwise
will trigger null-ptr-deref Read in fc_release_transport.

fcoe_exit
  fcoe_if_exit
    fc_release_transport(fcoe_vport_scsi_transport)

Link: https://lore.kernel.org/r/1566279789-58207-1-git-send-email-zhengbin13@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-10 21:19:55 -04:00
Linus Walleij
5fbe5b5883 gpio: Initialize the irqchip valid_mask with a callback
After changing the valid_mask for the struct gpio_chip
to detect the need and presence of a valid mask with the
presence of a .init_valid_mask() callback to fill it in,
we augment the gpio_irq_chip to use the same logic.

Switch all driver using the gpio_irq_chio valid_mask
over to this new method.

This makes sure the valid_mask for the gpio_irq_chip gets
filled in when we add the gpio_chip, which makes it a
little easier to switch over drivers using the old
way of setting up gpio_irq_chip over to the new method
of passing the gpio_irq_chip along with the gpio_chip.
(See drivers/gpio/TODO for details.)

Cc: Joel Stanley <joel@jms.id.au>
Cc: Thierry Reding <treding@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Patrice Chotard <patrice.chotard@st.com>
Link: https://lore.kernel.org/r/20190904140104.32426-1-linus.walleij@linaro.org
2019-09-11 01:09:37 +01:00
Tony Luck
38c7a30a9d Documentation/process: Volunteer as the ambassador for Intel
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Trilok Soni <tsoni@codeaurora.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20190910172646.25BFCE7B@viggo.jf.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-10 23:20:40 +01:00
zhong jiang
16a0f687ca misc: mic: Use PTR_ERR_OR_ZERO rather than its implementation
PTR_ERR_OR_ZERO contains if(IS_ERR(...)) + PTR_ERR. It is better to
use it directly. hence just replace it.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Link: https://lore.kernel.org/r/1567665795-5901-3-git-send-email-zhongjiang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-10 23:18:05 +01:00
Pablo Neira Ayuso
be2861dc36 netfilter: nft_{fwd,dup}_netdev: add offload support
This patch adds support for packet mirroring and redirection. The
nft_fwd_dup_netdev_offload() function configures the flow_action object
for the fwd and the dup actions.

Extend nft_flow_rule_destroy() to release the net_device object when the
flow_rule object is released, since nft_fwd_dup_netdev_offload() bumps
the net_device reference counter.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: wenxu <wenxu@ucloud.cn>
2019-09-10 22:44:29 +02:00
Saeed Mahameed
fa355bb1b0 net/mlx5: FWTrace, Reduce stack usage
Mark mlx5_tracer_print_trace as noinline as the function only uses 512
bytes on the stack to avoid the following build warning:

drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c:660:13: error: stack frame size of 1032 bytes in function 'mlx5_fw_tracer_handle_traces' [-Werror,-Wframe-larger-than=]

Fixes: 70dd6fdb89 ("net/mlx5: FW tracer, parse traces and kernel tracing support")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-10 13:43:27 -07:00
Nathan Chancellor
334a306f7b net/mlx5: Fix addr's type in mlx5dr_icm_dm
clang errors when CONFIG_PHYS_ADDR_T_64BIT is not set:

drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c:121:8:
error: incompatible pointer types passing 'u64 *' (aka 'unsigned long
long *') to parameter of type 'phys_addr_t *' (aka 'unsigned int *')
[-Werror,-Wincompatible-pointer-types]
                                   &icm_mr->dm.addr, &icm_mr->dm.obj_id);
                                   ^~~~~~~~~~~~~~~~
include/linux/mlx5/driver.h:1092:39: note: passing argument to parameter
'addr' here
                         u64 length, u16 uid, phys_addr_t *addr, u32 *obj_id);
                                                           ^
1 error generated.

Use phys_addr_t for addr's type in mlx5dr_icm_dm, which won't change
anything with 64-bit builds because phys_addr_t is u64 when
CONFIG_PHYS_ADDR_T_64BIT is set, which is always when CONFIG_64BIT is
set.

Fixes: 29cf8febd1 ("net/mlx5: DR, ICM pool memory allocator")
Link: https://github.com/ClangBuiltLinux/linux/issues/653
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-10 13:43:16 -07:00
Nathan Chancellor
7550d5415c net/mlx5: Fix rt's type in dr_action_create_reformat_action
clang warns:

drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c:1080:9:
warning: implicit conversion from enumeration type 'enum
mlx5_reformat_ctx_type' to different enumeration type 'enum
mlx5dr_action_type' [-Wenum-conversion]
                        rt = MLX5_REFORMAT_TYPE_L2_TO_L2_TUNNEL;
                           ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c:1082:9:
warning: implicit conversion from enumeration type 'enum
mlx5_reformat_ctx_type' to different enumeration type 'enum
mlx5dr_action_type' [-Wenum-conversion]
                        rt = MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL;
                           ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c:1084:51:
warning: implicit conversion from enumeration type 'enum
mlx5dr_action_type' to different enumeration type 'enum
mlx5_reformat_ctx_type' [-Wenum-conversion]
                ret = mlx5dr_cmd_create_reformat_ctx(dmn->mdev, rt, data_sz, data,
                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~            ^~
3 warnings generated.

Use the right type for rt, which is mlx5_reformat_ctx_type so there are
no warnings about mismatched types.

Fixes: 9db810ed2d ("net/mlx5: DR, Expose steering action functionality")
Link: https://github.com/ClangBuiltLinux/linux/issues/652
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Austin Kim <austindh.kim@gmail.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-10 13:43:05 -07:00
Fernando Fernandez Mancera
ee394f96ad netfilter: nft_synproxy: add synproxy stateful object support
Register a new synproxy stateful object type into the stateful object
infrastructure.

Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-10 22:35:37 +02:00
Dan Robertson
ffd96868ac hwmon: (shtc1) add support for the SHTC3 sensor
Add support for the Sensirion SHTC3 humidity and temperature sensor to
the shtc1 module.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Link: https://lore.kernel.org/r/20190905014554.21658-2-dan@dlrobertson.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2019-09-10 11:42:56 -07:00
Dan Robertson
fdc7d8e829 hwmon: (shtc1) fix shtc1 and shtw1 id mask
Fix an error in the bitmaskfor the shtc1 and shtw1 bitmask used to
retrieve the chip ID from the ID register. See section 5.7 of the shtw1
or shtc1 datasheet for details.

Fixes: 1a539d372e ("hwmon: add support for Sensirion SHTC1 sensor")
Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Link: https://lore.kernel.org/r/20190905014554.21658-3-dan@dlrobertson.com
[groeck: Reordered to be first in series and adjusted accordingly]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2019-09-10 11:42:22 -07:00
Tejun Heo
7c1ee704a1 iocost_monitor: Report debt
Report debt and rename del_ms row to delay for consistency.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 12:31:39 -06:00
Tejun Heo
b06f2d35c6 iocost_monitor: Report more info with higher accuracy
When outputting json:

* Don't truncate numbers.

* Report address of iocg to ease drilling down further.

When outputting table:

* Use math.ceil() for delay_ms so that small delays don't read as 0.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 12:31:39 -06:00
Tejun Heo
e742bd5cb5 iocost_monitor: Always use strings for json values
Json has limited accuracy for numbers and can silently truncate 64bit
values, which can be extremely confusing.  Let's consistently use
string encapsulated values for json output.

While at it, convert an unnecesary f-string to str().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 12:31:39 -06:00
Tejun Heo
e1518f63f2 blk-iocost: Don't let merges push vtime into the future
Merges have the same problem that forced-bios had which is fixed by
the previous patch.  The cost of a merge is calculated at the time of
issue and force-advances vtime into the future.  Until global vtime
catches up, how the cgroup's hweight changes in the meantime doesn't
matter and it often leads to situations where the cost is calculated
at one hweight and paid at a very different one.  See the previous
patch for more details.

Fix it by never advancing vtime into the future for merges.  If budget
is available, vtime is advanced.  Otherwise, the cost is charged as
debt.

This brings merge cost handling in line with issue cost handling in
ioc_rqos_throttle().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 12:31:39 -06:00
Tejun Heo
36a524814f blk-iocost: Account force-charged overage in absolute vtime
Currently, when a bio needs to be force-charged and there isn't enough
budget, vtime is simply pushed into the future.  This means that the
cost of the whole bio is scaled using the current hweight and then
charged immediately.  Until the global vtime advances beyond this
future vtime, the cgroup won't be allowed to issue normal IOs.

This is incorrect and can lead to, for example, exploding vrate or
extended stalls if vrate range is constrained.  Consider the following
scenario.

1. A cgroup with a very low hweight runs out of budget.

2. A storm of swap-out happens on it.  All of them are scaled
   according to the current low hweight and charged to vtime pushing
   it to a far future.

3. All other cgroups go idle and now the above cgroup has access to
   the whole device.  However, because vtime is already wound using
   the past low hweight, what its current hweight is doesn't matter
   until global vtime catches up to the local vtime.

4. As a result, either vrate gets ramped up extremely or the IOs stall
   while the underlying device is idle.

This is because the hweight the overage is calculated at is different
from the hweight that it's being paid at.

Fix it by remembering the overage in absoulte vtime and continuously
paying with the actual budget according to the current hweight at each
period.

Note that non-forced bios which wait already remembers the cost in
absolute vtime.  This brings forced-bio accounting in line.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 12:31:39 -06:00
Tejun Heo
e036c4caba blk-iocost: Fix incorrect operation order during iocg free
ioc_pd_free() first cancels the hrtimers and then deactivates the
iocg.  However, the iocg timer can run inbetween and reschedule the
hrtimers which will end up running after the iocg is freed leading to
crashes like the following.

  general protection fault: 0000 [#1] SMP
  ...
  RIP: 0010:iocg_kick_delay+0xbe/0x1b0
  RSP: 0018:ffffc90003598ea0 EFLAGS: 00010046
  RAX: 1cee00fd69512b54 RBX: ffff8881bba48400 RCX: 00000000000003e8
  RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8881bba48400
  RBP: 0000000000004e20 R08: 0000000000000002 R09: 00000000000003e8
  R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90003598ef0
  R13: 00979f3810ad461f R14: ffff8881bba4b400 R15: 25439f950d26e1d1
  FS:  0000000000000000(0000) GS:ffff88885f800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f64328c7e40 CR3: 0000000002409005 CR4: 00000000003606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <IRQ>
   iocg_delay_timer_fn+0x3d/0x60
   __hrtimer_run_queues+0xfe/0x270
   hrtimer_interrupt+0xf4/0x210
   smp_apic_timer_interrupt+0x5e/0x120
   apic_timer_interrupt+0xf/0x20
   </IRQ>

Fix it by canceling hrtimers after deactivating the iocg.

Fixes: 7caa47151a ("blkcg: implement blk-iocost")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 12:17:04 -06:00
Xin Long
f794dc2304 sctp: fix the missing put_user when dumping transport thresholds
This issue causes SCTP_PEER_ADDR_THLDS sockopt not to be able to dump
a transport thresholds info.

Fix it by adding 'goto' put_user in sctp_getsockopt_paddr_thresholds.

Fixes: 8add543e36 ("sctp: add SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_THLDS sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 18:32:28 +01:00
Cong Wang
d4d6ec6dac sch_hhf: ensure quantum and hhf_non_hh_weight are non-zero
In case of TCA_HHF_NON_HH_WEIGHT or TCA_HHF_QUANTUM is zero,
it would make no progress inside the loop in hhf_dequeue() thus
kernel would get stuck.

Fix this by checking this corner case in hhf_change().

Fixes: 10239edf86 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
Reported-by: syzbot+bc6297c11f19ee807dc2@syzkaller.appspotmail.com
Reported-by: syzbot+041483004a7f45f1f20a@syzkaller.appspotmail.com
Reported-by: syzbot+55be5f513bed37fc4367@syzkaller.appspotmail.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Terry Lam <vtlam@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 18:31:00 +01:00
Cong Wang
8b142a00ed net_sched: check cops->tcf_block in tc_bind_tclass()
At least sch_red and sch_tbf don't implement ->tcf_block()
while still have a non-zero tc "class".

Instead of adding nop implementations to each of such qdisc's,
we can just relax the check of cops->tcf_block() in
tc_bind_tclass(). They don't support TC filter anyway.

Reported-by: syzbot+21b29db13c065852f64b@syzkaller.appspotmail.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 18:28:56 +01:00
Sean Christopherson
1edce0a9eb KVM: x86: Add kvm_emulate_{rd,wr}msr() to consolidate VXM/SVM code
Move RDMSR and WRMSR emulation into common x86 code to consolidate
nearly identical SVM and VMX code.

Note, consolidating RDMSR introduces an extra indirect call, i.e.
retpoline, due to reaching {svm,vmx}_get_msr() via kvm_x86_ops, but a
guest kernel likely has bigger problems if increasing the latency of
RDMSR VM-Exits by ~70 cycles has a measurable impact on overall VM
performance.  E.g. the only recurring RDMSR VM-Exits (after booting) on
my system running Linux 5.2 in the guest are for MSR_IA32_TSC_ADJUST via
arch_cpu_idle_enter().

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10 19:18:29 +02:00
Sean Christopherson
f20935d85a KVM: x86: Refactor up kvm_{g,s}et_msr() to simplify callers
Refactor the top-level MSR accessors to take/return the index and value
directly instead of requiring the caller to dump them into a msr_data
struct.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10 19:18:14 +02:00
Xiaoyao Li
b274a29081 doc: kvm: Fix return description of KVM_SET_MSRS
Userspace can use ioctl KVM_SET_MSRS to update a set of MSRs of guest.
This ioctl set specified MSRs one by one. If it fails to set an MSR,
e.g., due to setting reserved bits, the MSR is not supported/emulated by
KVM, etc..., it stops processing the MSR list and returns the number of
MSRs have been set successfully.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10 19:13:53 +02:00
Peter Xu
4f75bcc332 KVM: X86: Tune PLE Window tracepoint
The PLE window tracepoint triggers even if the window is not changed,
and the wording can be a bit confusing too.  One example line:

  kvm_ple_window: vcpu 0: ple_window 4096 (shrink 4096)

It easily let people think of "the window now is 4096 which is
shrinked", but the truth is the value actually didn't change (4096).

Let's only dump this message if the value really changed, and we make
the message even simpler like:

  kvm_ple_window: vcpu 4 old 4096 new 8192 (growed)

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10 19:13:21 +02:00
Peter Xu
c5c5d6fae0 KVM: VMX: Change ple_window type to unsigned int
The VMX ple_window is 32 bits wide, so logically it can overflow with
an int.  The module parameter is declared as unsigned int which is
good, however the dynamic variable is not.  Switching all the
ple_window references to use unsigned int.

The tracepoint changes will also affect SVM, but SVM is using an even
smaller width (16 bits) so it's always fine.

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10 19:13:20 +02:00
Peter Xu
13a7e370cb KVM: X86: Remove tailing newline for tracepoints
It's done by TP_printk() already.

Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10 19:13:19 +02:00
Peter Xu
d94fdcd7ea KVM: X86: Trace vcpu_id for vmexit
Tracing the ID helps to pair vmenters and vmexits for guests with
multiple vCPUs.

Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10 19:13:18 +02:00
Paolo Bonzini
32d1d15c52 Merge tag 'kvmarm-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm updates for 5.4

- New ITS translation cache
- Allow up to 512 CPUs to be supported with GICv3 (for real this time)
- Now call kvm_arch_vcpu_blocking early in the blocking sequence
- Tidy-up device mappings in S2 when DIC is available
- Clean icache invalidation on VMID rollover
- General cleanup
2019-09-10 19:09:14 +02:00
David S. Miller
074be7fd99 Merge branch 'nfp-implement-firmware-loading-policy'
Simon Horman says:

====================
nfp: implement firmware loading policy

Dirk says:

This series adds configuration capabilities to the firmware loading policy of
the NFP driver.

NFP firmware loading is controlled via three HWinfo keys which can be set per
device: 'abi_drv_reset', 'abi_drv_load_ifc' and 'app_fw_from_flash'.
Refer to patch #11 for more detail on how these control the firmware loading.

In order to configure the full extend of FW loading policy, a new devlink
parameter has been introduced, 'reset_dev_on_drv_probe', which controls if the
driver should reset the device when it's probed. This, in conjunction with the
existing 'fw_load_policy' (extended to include a 'disk' option) provides the
means to tweak the NFP HWinfo keys as required by users.

Patches 1 and 2 adds the devlink modifications and patches 3 through 9 adds the
support into the NFP driver. Furthermore, the last 2 patches are documentation
only.

v2:
  Renamed all 'reset_dev_on_drv_probe' defines the same as the devlink parameter
  name (Jiri)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:27 +01:00
Dirk van der Merwe
40a962beeb Documentation: nfp: add nfp driver specific notes
This adds the initial documentation for the NFP driver specific
documentation.

Right now, only basic information is provided about acquiring firmware
and configuring device firmware loading.

Original driver documentation can be found here:
https://github.com/Netronome/nfp-drv-kmods/blob/master/README.md

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:27 +01:00
Dirk van der Merwe
8fb822ce93 kdoc: fix nfp_fw_load documentation
Fixed the incorrect prefix for the 'nfp_fw_load' function.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:27 +01:00
Dirk van der Merwe
0fbee0ec1f nfp: devlink: add 'reset_dev_on_drv_probe' support
Add support for the 'reset_dev_on_drv_probe' devlink parameter. The
reset control policy is controlled by the 'abi_drv_reset' hwinfo key.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:27 +01:00
Dirk van der Merwe
ff04788c5b nfp: devlink: add 'fw_load_policy' support
Add support for the 'fw_load_policy' devlink parameter. The FW load
policy is controlled by the 'app_fw_from_flash' hwinfo key.

Remap the values from devlink to the hwinfo key and back.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:27 +01:00
Dirk van der Merwe
165c3c9f8c nfp: add devlink param infrastructure
Register devlink parameters for driver use. Subsequent patches will add
support for specific parameters.

In order to support devlink parameters, the management firmware needs to
be able to lookup and set hwinfo keys.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:27 +01:00
Dirk van der Merwe
f8921d7330 nfp: honor FW reset and loading policies
The firmware reset and loading policies can be controlled with the
combination of three hwinfo keys, 'abi_drv_reset', 'abi_drv_load_ifc'
and 'app_fw_from_flash'.

'app_fw_from_flash' defines which firmware should take precedence,
'Disk', 'Flash' or the 'Preferred' firmware. When 'Preferred'
is selected, the management firmware makes the decision on which
firmware will be loaded by comparing versions of the flash firmware
and the host supplied firmware.

'abi_drv_reset' defines when the driver should reset the firmware when
the driver is probed, either 'Disk' if firmware was found on disk,
'Always' reset or 'Never' reset. Note that the device is always reset
on driver unload if firmware was loaded when the driver was probed.

'abi_drv_load_ifc' defines a list of PF devices allowed to load FW on
the device.

Furthermore, we limit the cases to where the driver will unload firmware
again when the driver is removed to only when firmware was loaded by the
driver and only if this particular device was the only one that could
have loaded firmware. This is needed to avoid firmware being removed
while in use on multi-host platforms.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:27 +01:00
Dirk van der Merwe
e69e9db903 nfp: nsp: add support for hwinfo set operation
Add support for the NSP HWinfo set command. This closely follows the
HWinfo lookup command.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:26 +01:00
Dirk van der Merwe
74612cdaf5 nfp: nsp: add support for optional hwinfo lookup
There are cases where we want to read a hwinfo entry from the NFP, and
if it doesn't exist, use a default value instead.

To support this, we must silence warning/error messages when the hwinfo
entry doesn't exist since this is a valid use case. The NSP command
structure provides the ability to silence command errors, in which case
the caller should log any command errors appropriately. Protocol errors
are unaffected by this.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:26 +01:00
Dirk van der Merwe
1da16f0c84 nfp: nsp: add support for fw_loaded command
Add support for the simple command that indicates whether application
firmware is loaded.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:26 +01:00
Dirk van der Merwe
5bbd21df5a devlink: add 'reset_dev_on_drv_probe' param
Add the 'reset_dev_on_drv_probe' devlink parameter, controlling the
device reset policy on driver probe.

This parameter is useful in conjunction with the existing
'fw_load_policy' parameter.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:26 +01:00
Dirk van der Merwe
e019a3b17f devlink: extend 'fw_load_policy' values
Add the 'disk' value to the generic 'fw_load_policy' devlink parameter.
This value indicates that firmware should always be loaded from disk
only.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:26 +01:00
David S. Miller
4bb2f84a2a Merge branch 'net-dsa-mv88e6xxx-add-PCL-support'
Vivien Didelot says:

====================
net: dsa: mv88e6xxx: add PCL support

This small series implements the ethtool RXNFC operations in the
mv88e6xxx DSA driver to configure a port's Layer 2 Policy Control List
(PCL) supported by models such as 88E6352 and 88E6390 and equivalent.

This allows to configure a port to discard frames based on a configured
destination or source MAC address and an optional VLAN, with e.g.:

    # ethtool --config-nfc lan1 flow-type ether src 00:11:22:33:44:55 action -1
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 16:53:31 +01:00
Vivien Didelot
da7dc87553 net: dsa: mv88e6xxx: add RXNFC support
Implement the .get_rxnfc and .set_rxnfc DSA operations to configure
a port's Layer 2 Policy Control List (PCL) via ethtool.

Currently only dropping frames based on MAC Destination or Source
Address (including the option VLAN parameter) is supported.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 16:53:31 +01:00
Vivien Didelot
f3a2cd326e net: dsa: mv88e6xxx: introduce .port_set_policy
Introduce a new .port_set_policy operation to configure a port's
Policy Control List, based on mapping such as DA, SA, Etype and so on.

Models similar to 88E6352 and 88E6390 are supported at the moment.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 16:53:31 +01:00
Vivien Didelot
d8291a956a net: dsa: mv88e6xxx: complete ATU state definitions
Marvell has different values for the state of a MAC address,
depending on its multicast bit. This patch completes the definitions
for these states.

At the same time, use 0 which is intuitive enough and simplifies the
code a bit, instead of the UC or MC unused value.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 16:53:31 +01:00
Jens Axboe
54a91f3bb9 io_uring: limit parallelism of buffered writes
All the popular filesystems need to grab the inode lock for buffered
writes. With io_uring punting buffered writes to async context, we
observe a lot of contention with all workers hamming this mutex.

For buffered writes, we generally don't need a lot of parallelism on
the submission side, as the flushing will take care of that for us.
Hence we don't need a deep queue on the write side, as long as we
can safely punt from the original submission context.

Add a workqueue with a limit of 2 that we can use for buffered writes.
This greatly improves the performance and efficiency of higher queue
depth buffered async writes with io_uring.

Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 09:49:35 -06:00
Jens Axboe
18d9be1a97 io_uring: add io_queue_async_work() helper
Add a helper for queueing a request for async execution, in preparation
for optimizing it.

No functional change in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 09:13:05 -06:00
Eric W. Biederman
821cc7b0b2 waitid: Add support for waiting for the current process group
It was recently discovered that the linux version of waitid is not a
superset of the other wait functions because it does not include support
for waiting for the current process group. This has two downsides:
1. An extra system call is needed to get the current process group.
2. After the current process group is received and before it is passed
   to waitid a signal could arrive causing the current process group to change.
   Inherent race-conditions as these make it impossible for userspace to
   emulate this functionaly and thus violate async-signal safety
   requirements for waitpid.

Arguments can be made for using a different choice of idtype and id
for this case but the BSDs already use this P_PGID and 0 to indicate
waiting for the current process's process group.  So be nice to user
space programmers and don't introduce an unnecessary incompatibility.

Some people have noted that the posix description is that
waitpid will wait for the current process group, and that in
the presence of pthreads that process group can change.  To get
clarity on this issue I looked at XNU, FreeBSD, and Luminos.  All of
those flavors of unix waited for the current process group at the
time of call and as written could not adapt to the process group
changing after the call.

At one point Linux did adapt to the current process group changing but
that stopped in 161550d74c ("pid: sys_wait... fixes").  It has been
over 11 years since Linux has that behavior, no programs that fail
with the change in behavior have been reported, and I could not
find any other unix that does this.  So I think it is safe to clarify
the definition of current process group, to current process group
at the time of the wait function.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Alistair Francis <alistair23@gmail.com>
Cc: Zong Li <zongbox@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: GNU C Library <libc-alpha@sourceware.org>
Link: https://lore.kernel.org/r/20190814154400.6371-2-christian.brauner@ubuntu.com
2019-09-10 17:05:46 +02:00
Paolo Bonzini
8146856b0a Merge tag 'kvm-ppc-next-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
PPC KVM update for 5.4

- Some prep for extending the uses of the rmap array
- Various minor fixes
- Commits from the powerpc topic/ppc-kvm branch, which fix a problem
  with interrupts arriving after free_irq, causing host hangs and crashes.
2019-09-10 16:51:17 +02:00