Commit Graph

840049 Commits

Author SHA1 Message Date
Martyna Szapar
24474f2709 i40e: Fix of memory leak and integer truncation in i40e_virtchnl.c
Fixed possible memory leak in i40e_vc_add_cloud_filter function:
cfilter is being allocated and in some error conditions
the function returns without freeing the memory.

Fix of integer truncation from u16 (type of queue_id value) to u8
when calling i40e_vc_isvalid_queue_id function.

Signed-off-by: Martyna Szapar <martyna.szapar@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 17:40:25 -07:00
Gustavo A. R. Silva
825f0a4eb7 i40e: Use struct_size() in kzalloc()
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

size = sizeof(struct foo) + count * sizeof(struct boo);
instance = kzalloc(size, GFP_KERNEL)

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL)

Notice that, in this case, variable size is not necessary, hence it
is removed.

This code was detected with the help of Coccinelle.

Signed-off-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 17:34:43 -07:00
Maciej Paczkowski
0a92892c69 i40e: Revert ShadowRAM checksum calculation change
The reason of this revert is unexpected issue found in NVM Update tool
during NVM image downgrade. The implementation is no longer needed
since the QV tools are already aware of new FW double ShadowRAM dump
mechanism.

This patch reverts ShadowRAM checksum calculation change introduced in
commit 9d12f0c4e436 ("i40e: Revert ShadowRAM checksum calculation change")

Signed-off-by: Maciej Paczkowski <maciej.paczkowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 17:30:58 -07:00
Martyna Szapar
d29e0d233e i40e: missing input validation on VF message handling by the PF
Patch is adding missing input validation on VF message handling
by the PF to the functions with opcodes:
	VIRTCHNL_OP_CONFIG_VSI_QUEUES = 6
	VIRTCHNL_OP_CONFIG_IRQ_MAP = 7,
	VIRTCHNL_OP_DISABLE_QUEUES = 9,
	VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE = 14,

Signed-off-by: Martyna Szapar <martyna.szapar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 17:28:04 -07:00
Aleksandr Loktionov
2e45d3f467 i40e: Add support for X710 B/P & SFP+ cards
New device ids are created to support X710 backplane and SFP+ cards.

This patch adds in i40e driver support for 2.5GbaseT and 5GbaseT speed.
It's implemented by checking I40E_CAP_PHY_TYPE_2_5GBASE_T,
I40E_CAP_PHY_TYPE_5GBASE_T bits from f/w and setting corresponding bits
in ethtool link ksettings supported and advertising masks.

Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 17:24:48 -07:00
Grzegorz Siwik
c004804dce i40e: Wrong truncation from u16 to u8
In this patch fixed wrong truncation method from u16 to u8 during
validation.

It was changed by changing u8 to u32 parameter in method declaration
and arguments were changed to u32.

Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 17:22:48 -07:00
Sergey Nemov
7015ca3df9 i40e: add num_vectors checker in iwarp handler
Field num_vectors from struct virtchnl_iwarp_qvlist_info should not be
larger than num_msix_vectors_vf in the hw struct.  The iwarp uses the
same set of vectors as the LAN VF driver.

Signed-off-by: Sergey Nemov <sergey.nemov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 17:21:07 -07:00
Grzegorz Siwik
1aa874b42e i40e: Fix the typo in adding 40GE KR4 mode
This patch fixes the typo in I40E_CAP_PHY_TYPE mode link code.
It was fixed by changing 40000baseLR4_Full to 40000baseKR4_Full

Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 17:16:47 -07:00
Grzegorz Siwik
40a23040d8 i40e: Setting VF to VLAN 0 requires restart
This patch fixes a bug where changing VLAN to 0 was not set until VF
restart.

Now we are setting pvid info to 0 when we have to change VLAN to 0.
Without this change when VF VLAN was changed to 0 nothing happened until
VF restart. For changing to VLAN different than 0 it worked correctly.

Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 17:13:30 -07:00
Aleksandr Loktionov
e576e76966 i40e: add new pci id for X710/XXV710 N3000 cards
New device ids are created to support X710/XXV710 N3000 cards.

Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 17:10:52 -07:00
Grzegorz Siwik
937f599a11 i40e: VF's promiscuous attribute is not kept
This patch fixes a bug where the promiscuous mode was not being
kept when the VF switched to a new VLAN.
Now we are config two times a promiscuous mode when we switch VLAN.
Without this change when we change VF VLAN we still receive
all the packets from previous VLAN and only unicast from new VLAN.

Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 16:55:34 -07:00
Michal Swiatkowski
64439f8f0b ice: Disable sniffing VF traffic on PF
Delete code that add default Tx rule on PF. With this rule PF can see
Tx VF traffic that should go outside. For traffic from VF to another
VF default Tx rule on PF doesn't apply because of lower priority than
VF mac rule.

With this change on PF in promisc mode we can see only Rx traffic that
doesn't match any other rule (mac etc.). We can't see Tx traffic from
other VSI.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 14:44:47 -07:00
Jesse Brandeburg
0690527014 ice: Use more efficient structures
Move a bunch of members around to make more efficient use of
memory, eliminating holes where possible. None of these members
are hot path so cache line alignment is not very important here.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 14:40:36 -07:00
Jesse Brandeburg
0437f1a98a ice: Use bitfields where possible
The driver was converted to not use bool, but it was
neglected that the bools should have been converted to bit fields
as bit fields in software structures are ok, as long as they
use the correct kinds of unsigned types. This avoids
wasting lots of storage space to store single bit values.

One of the change hunks moves a variable lport out of
a group of "combinable" bit fields because all bits of
the u8 lport are valid and the variable can be packed in the
struct in struct holes.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 14:38:57 -07:00
Akeem G Abodunrin
d95276ced0 ice: Add function to program ethertype based filter rule on VSIs
This patch adds function to program VSI with ethertype based filter rule,
so that all flow control frames would be disallowed from being transmitted
to the client, in order to prevent malicious VSI, especially VF from
sending out PAUSE or PFC frames, and then control other VSIs traffic.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 14:36:28 -07:00
Tony Nguyen
8f529ff912 ice: Separate if conditions for ice_set_features()
Set features can have multiple features turned on|off in a single
call.  Grouping these all in an if/else means after one condition
is met, other conditions/features will not be evaluated.  Break
the if/else statements by feature to ensure all features will be
handled properly.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 14:33:44 -07:00
Tony Nguyen
a03499d614 ice: Remove __always_unused attribute
The variable netdev is being used in this function; remove the
__always_unused attribute from it.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 14:31:59 -07:00
Bruce Allan
c3a6825e82 ice: Suppress false-positive style issues reported by static analyzer
A recent version of cppcheck falsely reports-
    Variable ip.hdr is assigned a value that is never used.

ip is a union so the pointer ip.hdr is actually used when referenced as
ip.v4 and ip.v6.  Silence these false reports when using cppcheck with the
--inline-suppr command-line option.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 14:30:05 -07:00
Brett Creeley
e40c899a64 ice: Refactor getting/setting coalesce
Currently if the driver has an uneven amount of Rx/Tx queues
setting the coalesce settings through ethtool will result in
an error. This is happening because in the setting coalesce
flow we are reporting an error if either Rx or Tx fails.

Also, the flow for setting/getting per_q_coalesce and
setting/getting coalesce settings for the entire device
is different.

Fix these issues by adding one function, ice_set_q_coalesce(),
and another, ice_get_q_coalesce(), that both getting/setting
per_q and entire device coalesce can use. This makes handling
the error cases generic between the two flows and simplifies
__ice_set_coalesce() and __ice_get_coalesce().

Also, add a header comment to __ice_set_coalesce().

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 14:25:26 -07:00
Brett Creeley
a85a3847fb ice: Always free/allocate q_vectors
Currently when probing/removing the driver we allocate/deallocate
each vsi->q_vectors array in ice_vsi_alloc_arrays() and
ice_vsi_free_arrays() respectively. However, we don't do this
during the reset and VSI rebuild flow. This is inconsistent
and unnecessary to have a difference between the two flows.

This patch makes the change to always allocate/deallocate the
vsi->q_vectors array regardless of the driver flow we are in.

Also, update the comment for ice_vsi_free_arrays() to be more
descriptive.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 14:22:55 -07:00
Bruce Allan
207e3721ac ice: Do not unnecessarily initialize local variable
The local variable speed does not need to be initialized and can cause some
static analysis tools to complain the initial assigned value is never used.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 14:21:01 -07:00
Michal Swiatkowski
ba0db585bd ice: Add more validation in ice_vc_cfg_irq_map_msg
Add few checks to validate msg from iavf driver.

Test if we have got enough q_vectors allocated in VSI connected with VF.
Add masks for itr_indx and msix_indx to avoid writing to reserved fieldi
of QINT. Clear q_vector->num_ring_rx/tx, without it we can increment this
value every time we send irq map msg from VF. So after second call this
value will be incorrect.

Decrement num_vectors from msg, because last vector in iavf msg is misc
vector (we don't set map for it).

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 14:18:27 -07:00
Akeem G Abodunrin
bb877b22bc ice: Don't remove VLAN filters that were never programmed
In case of non-trusted VFs, it is possible to program VLAN filter far
less than what is requested by the VF originally, thereby makes number of
VLAN elements being tracked by VF different from actual VLAN tags. This
patch makes sure that we are not attempting to remove VLAN filter that
does not exist.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 14:07:34 -07:00
Tony Nguyen
e80e76db6c ice: Preserve VLAN Rx stripping settings
When Tx insertion is set, we are not accounting for the state of Rx
stripping.  This causes Rx stripping to be enabled any time Tx
insertion is changed, even when it's supposed to be disabled.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 13:45:00 -07:00
Michal Swiatkowski
a52db6b260 ice: Fix for allowing too many MDD events on VF
Disable VF if any malicious device driver (MDD) event is detected by
hardware. Track vf->num_mdd_events for information about VF MDD events.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 13:42:53 -07:00
Jesse Brandeburg
819d899863 ice: Use pf instead of vsi-back
Many times in our functions we have a local variable pf, which is
equivalent to vsi->back. Just use pf consistently instead of vsi->back
where available.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-04 13:06:56 -07:00
Linus Torvalds
6203838dec powerpc fixes for 5.1 #7
One regression fix.
 
 Changes we merged to STRICT_KERNEL_RWX on 32-bit were causing crashes under
 load on some machines depending on memory layout.
 
 Thanks to:
   Christophe Leroy.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJczY3sAAoJEFHr6jzI4aWA9UkP/3iM1wt/L3iYpmfPA0MJ4h4f
 oEB2WR4KFhBHcMbLKUNS0m9MfecAxHyrFXW46lnFbqVLbaJ3rBSRBitTySo1VnT3
 wJKJi/rEf1uNmjxFTKRuE4eL1+4H2OVbwF+CLRB4jaILBmgDyvgYQDkeYQE689ut
 VjLA1C/8PRlKFO/82nGPaASdugqk4o+swxNxODhG+Bjv+9KGSinK4LN6zqyLoU/f
 aS4APLO59xd6vXosDZRiK6C5efoMu+8kAm8/pyA9E8+0TFTdrtZcPrbCme9Vj9xh
 i1Ruzxf9njZaSG37TQpgaFA/IKlSjBqAdaRmA6dy7fQ+GAOpt7YhtNIhk9prvOT6
 uGarcHuusQ/Nhy0pb+Kt5H90RKzcZr/pgDEjhujJdCnVozQnX4fvkOjm8SvV6vc4
 ucy2zzraMRo3OTvs38YAyGj/MPxOvMYZ1G+v/rAxpfvmdDl6KQ3uCvSczujP7qo6
 49VfvfENGRTQhb+KSQtD4VtR9rFvMco5F8m1TFlYI1+RcoxHqqcR8aTK22wQvs45
 2J97mkOvVeIp2pO7+yB5c2t3rPG9Qg9ARifnxXyF+9Cb1h1USy3vF2zUCj+wZSeP
 l+cYYcjrk/C6btDTVARMOXLKyeB4CWHaSCz8s3HIrXkuZsJbA3/qGi25mCWAEOVj
 1qN6UpFNICtXOHBCAPYx
 =/m4Q
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.1-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
 "One regression fix.

  Changes we merged to STRICT_KERNEL_RWX on 32-bit were causing crashes
  under load on some machines depending on memory layout.

  Thanks to Christophe Leroy"

* tag 'powerpc-5.1-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/32s: Fix BATs setting with CONFIG_STRICT_KERNEL_RWX
2019-05-04 12:24:05 -07:00
Ming Lei
662156641b block: don't drain in-progress dispatch in blk_cleanup_queue()
Now freeing hw queue resource is moved to hctx's release handler,
we don't need to worry about the race between blk_cleanup_queue and
run queue any more.

So don't drain in-progress dispatch in blk_cleanup_queue().

This is basically revert of c2856ae2f3 ("blk-mq: quiesce queue before
freeing queue").

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-04 07:24:11 -06:00
Ming Lei
1b97871b50 blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release
hctx is always released after requeue is freed.

With holding queue's kobject refcount, it is safe for driver to run queue,
so one run queue might be scheduled after blk_sync_queue() is done.

So moving the cancel of hctx->run_work into blk_mq_hw_sysfs_release()
for avoiding run released queue.

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-04 07:24:09 -06:00
Ming Lei
2f8f1336a4 blk-mq: always free hctx after request queue is freed
In normal queue cleanup path, hctx is released after request queue
is freed, see blk_mq_release().

However, in __blk_mq_update_nr_hw_queues(), hctx may be freed because
of hw queues shrinking. This way is easy to cause use-after-free,
because: one implicit rule is that it is safe to call almost all block
layer APIs if the request queue is alive; and one hctx may be retrieved
by one API, then the hctx can be freed by blk_mq_update_nr_hw_queues();
finally use-after-free is triggered.

Fixes this issue by always freeing hctx after releasing request queue.
If some hctxs are removed in blk_mq_update_nr_hw_queues(), introduce
a per-queue list to hold them, then try to resuse these hctxs if numa
node is matched.

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-04 07:24:08 -06:00
Ming Lei
7c6c5b7c91 blk-mq: split blk_mq_alloc_and_init_hctx into two parts
Split blk_mq_alloc_and_init_hctx into two parts, and one is
blk_mq_alloc_hctx() for allocating all hctx resources, another
is blk_mq_init_hctx() for initializing hctx, which serves as
counter-part of blk_mq_exit_hctx().

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org
Cc: Martin K . Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-04 07:24:06 -06:00
Ming Lei
c7e2d94b3d blk-mq: free hw queue's resource in hctx's release handler
Once blk_cleanup_queue() returns, tags shouldn't be used any more,
because blk_mq_free_tag_set() may be called. Commit 45a9c9d909
("blk-mq: Fix a use-after-free") fixes this issue exactly.

However, that commit introduces another issue. Before 45a9c9d909,
we are allowed to run queue during cleaning up queue if the queue's
kobj refcount is held. After that commit, queue can't be run during
queue cleaning up, otherwise oops can be triggered easily because
some fields of hctx are freed by blk_mq_free_queue() in blk_cleanup_queue().

We have invented ways for addressing this kind of issue before, such as:

	8dc765d438 ("SCSI: fix queue cleanup race before queue initialization is done")
	c2856ae2f3 ("blk-mq: quiesce queue before freeing queue")

But still can't cover all cases, recently James reports another such
kind of issue:

	https://marc.info/?l=linux-scsi&m=155389088124782&w=2

This issue can be quite hard to address by previous way, given
scsi_run_queue() may run requeues for other LUNs.

Fixes the above issue by freeing hctx's resources in its release handler, and this
way is safe becasue tags isn't needed for freeing such hctx resource.

This approach follows typical design pattern wrt. kobject's release handler.

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Reported-by: James Smart <james.smart@broadcom.com>
Fixes: 45a9c9d909 ("blk-mq: Fix a use-after-free")
Cc: stable@vger.kernel.org
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-04 07:24:05 -06:00
Ming Lei
fbc2a15e34 blk-mq: move cancel of requeue_work into blk_mq_release
With holding queue's kobject refcount, it is safe for driver
to schedule requeue. However, blk_mq_kick_requeue_list() may
be called after blk_sync_queue() is done because of concurrent
requeue activities, then requeue work may not be completed when
freeing queue, and kernel oops is triggered.

So moving the cancel of requeue_work into blk_mq_release() for
avoiding race between requeue and freeing queue.

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-04 07:24:04 -06:00
Ming Lei
e87eb301be blk-mq: grab .q_usage_counter when queuing request from plug code path
Just like aio/io_uring, we need to grab 2 refcount for queuing one
request, one is for submission, another is for completion.

If the request isn't queued from plug code path, the refcount grabbed
in generic_make_request() serves for submission. In theroy, this
refcount should have been released after the sumission(async run queue)
is done. blk_freeze_queue() works with blk_sync_queue() together
for avoiding race between cleanup queue and IO submission, given async
run queue activities are canceled because hctx->run_work is scheduled with
the refcount held, so it is fine to not hold the refcount when
running the run queue work function for dispatch IO.

However, if request is staggered into plug list, and finally queued
from plug code path, the refcount in submission side is actually missed.
And we may start to run queue after queue is removed because the queue's
kobject refcount isn't guaranteed to be grabbed in flushing plug list
context, then kernel oops is triggered, see the following race:

blk_mq_flush_plug_list():
        blk_mq_sched_insert_requests()
                insert requests to sw queue or scheduler queue
                blk_mq_run_hw_queue

Because of concurrent run queue, all requests inserted above may be
completed before calling the above blk_mq_run_hw_queue. Then queue can
be freed during the above blk_mq_run_hw_queue().

Fixes the issue by grab .q_usage_counter before calling
blk_mq_sched_insert_requests() in blk_mq_flush_plug_list(). This way is
safe because the queue is absolutely alive before inserting request.

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-04 07:24:02 -06:00
Sameer Pujar
f33e7bb3eb dmaengine: tegra210-adma: restore channel status
Status of ADMA channel registers is not saved and restored during system
suspend. During active playback if system enters suspend, this results in
wrong state of channel registers during system resume and playback fails
to resume properly. Fix this by saving following channel registers in
runtime suspend and restore during runtime resume.
 * ADMA_CH_LOWER_SRC_ADDR
 * ADMA_CH_LOWER_TRG_ADDR
 * ADMA_CH_FIFO_CTRL
 * ADMA_CH_CONFIG
 * ADMA_CH_CTRL
 * ADMA_CH_CMD
 * ADMA_CH_TC
Runtime PM calls will be inovked during system resume path if a playback
or capture needs to be resumed. Hence above changes work fine for system
suspend case.

Fixes: f46b195799 ("dmaengine: tegra-adma: Add support for Tegra210 ADMA")
Signed-off-by: Sameer Pujar <spujar@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-05-04 16:13:42 +05:30
Sameer Pujar
f030e41950 dmaengine: tegra210-dma: free dma controller in remove()
Following kernel panic is seen during DMA driver unload->load sequence
==========================================================================
Unable to handle kernel paging request at virtual address ffffff8001198880
Internal error: Oops: 86000007 [#1] PREEMPT SMP
CPU: 0 PID: 5907 Comm: HwBinder:4123_1 Tainted: G C 4.9.128-tegra-g065839f
Hardware name: galen (DT)
task: ffffffc3590d1a80 task.stack: ffffffc3d0678000
PC is at 0xffffff8001198880
LR is at of_dma_request_slave_channel+0xd8/0x1f8
pc : [<ffffff8001198880>] lr : [<ffffff8008746f30>] pstate: 60400045
sp : ffffffc3d067b710
x29: ffffffc3d067b710 x28: 000000000000002f
x27: ffffff800949e000 x26: ffffff800949e750
x25: ffffff800949e000 x24: ffffffbefe817d84
x23: ffffff8009f77cb0 x22: 0000000000000028
x21: ffffffc3ffda49c8 x20: 0000000000000029
x19: 0000000000000001 x18: ffffffffffffffff
x17: 0000000000000000 x16: ffffff80082b66a0
x15: ffffff8009e78250 x14: 000000000000000a
x13: 0000000000000038 x12: 0101010101010101
x11: 0000000000000030 x10: 0101010101010101
x9 : fffffffffffffffc x8 : 7f7f7f7f7f7f7f7f
x7 : 62ff726b6b64622c x6 : 0000000000008064
x5 : 6400000000000000 x4 : ffffffbefe817c44
x3 : ffffffc3ffda3e08 x2 : ffffff8001198880
x1 : ffffffc3d48323c0 x0 : ffffffc3d067b788

Process HwBinder:4123_1 (pid: 5907, stack limit = 0xffffffc3d0678028)
Call trace:
[<ffffff8001198880>] 0xffffff8001198880
[<ffffff80087459f8>] dma_request_chan+0x50/0x1f0
[<ffffff8008745bc0>] dma_request_slave_channel+0x28/0x40
[<ffffff8001552c44>] tegra_alt_pcm_open+0x114/0x170
[<ffffff8008d65fa4>] soc_pcm_open+0x10c/0x878
[<ffffff8008d18618>] snd_pcm_open_substream+0xc0/0x170
[<ffffff8008d1878c>] snd_pcm_open+0xc4/0x240
[<ffffff8008d189e0>] snd_pcm_playback_open+0x58/0x80
[<ffffff8008cfc6d4>] snd_open+0xb4/0x178
[<ffffff8008250628>] chrdev_open+0xb8/0x1d0
[<ffffff8008246fdc>] do_dentry_open+0x214/0x318
[<ffffff80082485d0>] vfs_open+0x58/0x88
[<ffffff800825bce0>] do_last+0x450/0xde0
[<ffffff800825c718>] path_openat+0xa8/0x368
[<ffffff800825dd84>] do_filp_open+0x8c/0x110
[<ffffff8008248a74>] do_sys_open+0x164/0x220
[<ffffff80082b66dc>] compat_SyS_openat+0x3c/0x50
[<ffffff8008083040>] el0_svc_naked+0x34/0x38
---[ end trace 67e6d544e65b5145 ]---
Kernel panic - not syncing: Fatal exception
==========================================================================

In device probe(), of_dma_controller_register() registers DMA controller.
But when driver is removed, this is not freed. During driver reload this
results in data abort and kernel panic. Add of_dma_controller_free() in
driver remove path to fix the issue.

Fixes: f46b195799 ("dmaengine: tegra-adma: Add support for Tegra210 ADMA")
Signed-off-by: Sameer Pujar <spujar@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-05-04 16:13:42 +05:30
Sameer Pujar
94dc8f4ed4 dmaengine: tegra210-adma: add pause/resume support
During an audio playback session it is observed that, audio goes off after
few seconds of continuous pause and play. No audio is heard even when the
playback is resumed.

The reason for above is, currently ADMA driver does not handle DMA_PAUSE/
DMA_RESUME and relevant callbacks for dma_device are not implemented. This
patch implements device_pause and device_resume callbacks for the device.
During pause TRANSFER_PAUSE bit of dma channel control register is set and
the same is cleared during resume.

Signed-off-by: Sameer Pujar <spujar@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-05-04 16:13:42 +05:30
Sameer Pujar
433de642a7 dmaengine: tegra210-adma: add support for Tegra186/Tegra194
Add Tegra186 specific macro defines and chip_data structure for chip
specific information. New compatibility is added to select relevant
chip details. There is no major change for Tegra194 and hence it can
use the same chip data.

The bits in the BURST_SIZE field of the ADMA CH_CONFIG register are
encoded differently on Tegra186 and Tegra194 compared with Tegra210.
On Tegra210 the bits are encoded as follows ...

 1 = WORD_1
 2 = WORDS_2
 3 = WORDS_4
 4 = WORDS_8
 5 = WORDS_16

Where as on Tegra186 and Tegra194 the bits are encoded as ...

 0 = WORD_1
 1 = WORDS_2
 2 = WORDS_3
 3 = WORDS_4
 4 = WORDS_5
 ...
 15 = WORDS_16

Add helper functions for generating the correct burst size.

Signed-off-by: Sameer Pujar <spujar@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-05-04 16:13:41 +05:30
Sameer Pujar
c0e74dd254 Documentation: DT: Add compatibility binding for Tegra186
Tegra186 Audio DMA controller has new updates from Tegra210 version.
Thus add new compatibility binding string and the same can be used
by Tegra194 as well.

Signed-off-by: Sameer Pujar <spujar@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-05-04 16:13:41 +05:30
Sameer Pujar
ded1f3db4c dmaengine: tegra210-adma: prepare for supporting newer Tegra chips
This is a preparatory patch to add support for Tegra186 and Tegra194 chips.
Following changes are necessary to make driver code generic.
 * chip_data structure is enhanced to have chip specific details and
   following are the additions to the structure
   * Offset addresses for ADMA global and channel registers
   * Offset values for Tx and Rx channel selection
   * Maximum supported Tx and Rx channels
   * Tx and Rx channel request mask
   * ADMA channel register space size
 * Make use of above chip_data to generalise the driver code

Support for Tegra186 and Tegra194 will be added in subsequent patches of
the series.

Signed-off-by: Sameer Pujar <spujar@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-05-04 16:13:41 +05:30
Dan Carpenter
0b515abb6b dmaengine: at_xdmac: remove a stray bottom half unlock
We switched this code from spin_lock_bh() to vanilla spin_lock() but
there was one stray spin_unlock_bh() that was overlooked.  This
patch converts it to spin_unlock() as well.

Fixes: d8570d018f ("dmaengine: at_xdmac: move spin_lock_bh to spin_lock in tasklet")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-05-04 16:11:02 +05:30
Krzysztof Kozlowski
e095189a54 dmaengine: fsl-edma: Adjust indentation
Fix indentation and remove unneeded space after 'return' keyword.  This
fixes checkpatch warning:
    WARNING: Statements should start on a tabstop

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-05-04 15:50:26 +05:30
Krzysztof Kozlowski
32685552fd dmaengine: fsl-edma: Fix typo in Vybrid name
Fix typo in comment for Vybrid SoC family.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-05-04 15:50:26 +05:30
Arnaud Pouliquen
2a4885abf5 dmaengine: stm32-dma: fix residue calculation in stm32-dma
In double buffer mode, during residue calculation, the DMA can
automatically switch to the next transfer. Indeed the CT bit that
gives position in the double buffer can has been updated by the
hardware, during calculation.
In this case the SxNDTR register value can not be trusted.
If a transition is detected we consider that the DMA has switched to
the beginning of next sg.

Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-05-04 15:46:58 +05:30
David Ahern
7fcd1e033d ipmr_base: Do not reset index in mr_table_dump
e is the counter used to save the location of a dump when an
skb is filled. Once the walk of the table is complete, mr_table_dump
needs to return without resetting that index to 0. Dump of a specific
table is looping because of the reset because there is no way to
indicate the walk of the table is done.

Move the reset to the caller so the dump of each table starts at 0,
but the loop counter is maintained if a dump fills an skb.

Fixes: e1cedae1ba ("ipmr: Refactor mr_rtm_dumproute")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-04 01:38:15 -04:00
Eelco Chaudron
a734d1f4c2 net: openvswitch: return an error instead of doing BUG_ON()
For all other error cases in queue_userspace_packet() the error is
returned, so it makes sense to do the same for these two error cases.

Reported-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-04 01:36:36 -04:00
Heiner Kallweit
3aa4c491c5 r8169: remove rtl_write_exgmac_batch
rtl_write_exgmac_batch is used in only one place, so we can remove it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-04 01:34:50 -04:00
David S. Miller
8cca3397f6 Merge branch 'netlink-strict-attribute-checking-follow-up'
Michal Kubecek says:

====================
netlink: strict attribute checking follow-up

Three follow-up patches for recent strict netlink validation series.

Patch 1 fixes dump handling for genetlink families which validate and parse
messages themselves (e.g. because they need different policies for diferent
commands).

Patch 2 sets bad_attr in extack in one place where this was omitted.

Patch 3 adds new NL_VALIDATE_NESTED flags for strict validation to enable
checking that NLA_F_NESTED value in received messages matches expectations
and includes this flag in NL_VALIDATE_STRICT. This would change userspace
visible behavior but the previous switching to NL_VALIDATE_STRICT for new
code is still only in net-next at the moment.

v2: change error messages to mention NLA_F_NESTED explicitly
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-04 01:27:11 -04:00
Michal Kubecek
b424e432e7 netlink: add validation of NLA_F_NESTED flag
Add new validation flag NL_VALIDATE_NESTED which adds three consistency
checks of NLA_F_NESTED_FLAG:

  - the flag is set on attributes with NLA_NESTED{,_ARRAY} policy
  - the flag is not set on attributes with other policies except NLA_UNSPEC
  - the flag is set on attribute passed to nla_parse_nested()

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>

v2: change error messages to mention NLA_F_NESTED explicitly
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-04 01:27:11 -04:00
Michal Kubecek
d54a16b201 netlink: set bad attribute also on maxtype check
The check that attribute type is within 0...maxtype range in
__nla_validate_parse() sets only error message but not bad_attr in extack.
Set also bad_attr to tell userspace which attribute failed validation.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-04 01:27:10 -04:00