Commit Graph

575695 Commits

Author SHA1 Message Date
Linus Torvalds
a9f70bd4e7 This includes two fixes.
The first is something that has come up a few times and has been
 worked out individually, but it's come up now enough that the problem
 should be generic. Tracepoints are protected by RCU sched. There are
 several tracepoints within core infrastructure like kfree().
 If a tracepoint is called when the CPU is going down, or when it's
 coming up but has yet to be recognized by RCU, a RCU warning is
 triggered. This is a true bug as that tracepoint is not protected by
 RCU. Usually, this is taken care of by testing for cpu online as
 a tracepoint condition. But as this is happening more often, moving
 it from a individual tracepoint to a check in the tracepoint infrastructure
 is more robust.
 
 Note, there is now a duplicate of a cpu online test, because this update
 does not remove the individual checks. But the overhead is small enough
 that the removal can be done in another release.
 
 The second change is strange linker breakage due to the branch tracer's
 builtin_constant_p() check failing, and treating the condition as a
 variable instead of a constant. Arnd Bergmann found that this can be
 fixed by testing !!(cond) instead of just (cond).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWw2vTAAoJEKKk/i67LK/8vkMIAI+Fx+S9sCeWVGp4VZ3DKH9K
 DibRD/2KREZe1AjYEU8ZAgo+VsFzW8OHiI1TI/1jP61YkiQSIhu6kVdPCoLG5buy
 8WwiKEQ94VWC1hbPOiiq3K7THEu+M8zuFdU3+odS8E3sXIGqKPKQ3iFwwfTVHI6o
 /cMTuefqsxo/hj8VwwaZdwlgWwLltM8sR040auTTEsqBLZ7D1q0aCyBrnju3FtBt
 uSIPK91d92ANkpq3ELDihxBa41XSEahYgGm/ozewjHwpooWvIQz4tpGaxxkyltuE
 RzeYBrM5LNBQUaXZ6C6jAdL0Y+bukS2MdNUjv8U6LwKbUvQoLuYteGEQ9g/m+mE=
 =8LDX
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-v4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "This includes two fixes.

  The first is something that has come up a few times and has been
  worked out individually, but it's come up now enough that the problem
  should be generic.  Tracepoints are protected by RCU sched.  There are
  several tracepoints within core infrastructure like kfree().  If a
  tracepoint is called when the CPU is going down, or when it's coming
  up but has yet to be recognized by RCU, a RCU warning is triggered.

  This is a true bug as that tracepoint is not protected by RCU.
  Usually, this is taken care of by testing for cpu online as a
  tracepoint condition.  But as this is happening more often, moving it
  from a individual tracepoint to a check in the tracepoint
  infrastructure is more robust.

  Note, there is now a duplicate of a cpu online test, because this
  update does not remove the individual checks.  But the overhead is
  small enough that the removal can be done in another release.

  The second change is strange linker breakage due to the branch
  tracer's builtin_constant_p() check failing, and treating the
  condition as a variable instead of a constant.  Arnd Bergmann found
  that this can be fixed by testing !!(cond) instead of just (cond)"

* tag 'trace-fixes-v4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix freak link error caused by branch tracer
  tracepoints: Do not trace when cpu is offline
2016-02-17 11:35:41 -08:00
Sowmini Varadhan
ba94272d08 i40e: use eth_platform_get_mac_address()
This commit converts commit b499ffb0a2 ("i40e: Look up MAC address in
Open Firmware or IDPROM") to use eth_platform_get_mac_address()
added by commit c7f5d10549 ("net: Add eth_platform_get_mac_address()
helper.")

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-17 09:24:35 -08:00
Alan
18f922d037 blk: fix overflow in queue_discard_max_hw_show
We get this right for queue_discard_max_show but not max_hw_show. Follow the
same pattern as queue_discard_max_show instead so that we don't truncate.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-17 10:20:42 -07:00
Anjali Singhai Jain
72b7486980 i40e: add priv flag for automatic rule eviction
The X722 can support automatic rule eviction for automatically added
flow director rules.  Feature is (should be) disabled by default.

Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-17 09:13:27 -08:00
Anjali Singhai
a340c7895a i40e: Enable Geneve offload for FW API ver > 1.4 for XL710/X710 devices
This patch makes sure we check the GENEVE offload capable flag before
we attempt offload.
It also enables the Capability for XL710/X710 devices with FW API
version higher than 1.4

Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-17 09:04:30 -08:00
Jesse Brandeburg
4580de0de4 i40e/i40evf: bump version to 1.4.12/1.4.8
Bump driver versions to i40e-1.4.12 and i40evf-1.4.8

Change-ID: I0ad82668c4ded04250391fda396ce191a42ab754
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-17 09:03:54 -08:00
Jacob Keller
45d043597d i40e: avoid large memcpy by assigning struct
Assign the i40e_pf structure directly instead of using a large memcpy,
which avoids a sparse warning and lets the compiler optimize the copy
since it knows the size of the structure in advance.

Change-ID: I17604e23be2616521eb760290befcb767b52b3f7
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-17 08:57:44 -08:00
Jesse Brandeburg
c40918c3ad i40e: count allocation errors
Driver already counted allocation errors, so print
them as part of the ethtool -S output.  Useful for debugging
if your system is having trouble making memory available for
the driver.

Change-ID: I83839fa86e81e6d80f03b917c88dd3ef9a64dde0
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-17 08:53:46 -08:00
Jesse Brandeburg
426bda0fe4 i40e: drop unused function
Delete the unused irq_dynamic_disable function.

Change-ID: Ia46071066babd121c7c90f141b6210b00078de3f
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Anjali Singhai <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-17 08:47:25 -08:00
Shannon Nelson
2f0aff4151 i40e: negate PHY int mask bits
The PHY interrupt mask bits mask out the events we don't want,
so we need to negate the bitmask of events we want.

Change-ID: I273244da5a8d285b6abc84fd68a90f1e6fa0393e
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-17 08:42:59 -08:00
Kiran Patil
7bd6875bef i40e: APIs to Add/remove port mirroring rules
This patch implements necessary functions related to port
mirroring features such as add/delete mirror rule, function
to set promiscuous VLAN mode for VSI if mirror rule_type is
"VLAN Mirroring".

Change-ID: Iaf513fd5f188f99dcb977b48f99e73185dfddc40
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-17 08:38:41 -08:00
Jesse Brandeburg
c53934c6d1 i40e: fix: do not sleep in netdev_ops
The driver was being called by VLAN, bonding, teaming operations
that expected to be able to hold locks like rcu_read_lock().

This causes the driver to be held to the requirement to not sleep,
and was found by the kernel debug options for checking sleep
inside critical section, and the locking validator.

Change-ID: Ibc68c835f5ffa8ffe0638ffe910a66fc5649a7f7
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Nelson, Shannon <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-17 08:31:37 -08:00
Jesse Brandeburg
38c3cec73c i40e: allocate memory safer
The sync_vsi_filter function was allocating memory in such
a way that it could sleep (GFP_KERNEL) which was causing a problem
when called by the team driver under rcu_read_lock(), which cannot
be held while sleeping.  Found with lockdep.

Change-ID: I4e59053cb5eedcf3d0ca151715be3dc42a94bdd5
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-17 08:27:21 -08:00
Shannon Nelson
e9f6563d7b i40e: do TSO only if CHECKSUM_PARTIAL is set
Don't bother trying to set up a TSO if the skb->ip_summed is not
set to CHECKSUM_PARTIAL.

Change-ID: I6495b3568e404907a2965b48cf3e2effa7c9ab55
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-17 08:20:45 -08:00
Jesse Brandeburg
3578fa0a8c i40e: fix bug in dma sync
Driver was using an offset based off a DMA handle while mapping and
unmapping using sync_single_range_for[cpu|device], where it should
be using DMA handle (returned from alloc_coherent) and the offset of the
memory to be sync'd.

Change-ID: I208256565b1595ff0e9171ab852de06b997917c6
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Nelson, Shannon <shannon.nelson@intel.com>
Reviewed-by: Williams, Mitch A <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-17 08:16:18 -08:00
Jesse Brandeburg
d89d967f71 i40e: trivial: fix missing space
Missing space in comment, fixed.

Change-ID: I8cdf3ce5994b4a97dcc3eeb33422533918546667
Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-17 08:11:52 -08:00
Jesse Brandeburg
f3699b3c57 i40e: trivial: drop duplicate definition
The probe routine already had a u32 val declared, no need
to do it again.  Found by W=2 compile.

Change-ID: Id7b65f6d0ef6bb71067d0557f5be0202b6d8741e
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-17 07:59:18 -08:00
David Rivshin
d148bbd37a drivers: net: cpsw-phy-sel: add dev_warn() for unsupported PHY mode
The cpsw-phy-sel driver supports only MII, RMII, and RGMII PHY modes,
and silently handled any other values as if MII was specified. In a
case where the PHY mode was incorrectly specified, or a bug elsewhere,
there would be no indication of a problem. If MII was the correct mode,
then this will go unnoticed, otherwise the symptom will be a failure
to transmit/receive data over the RMII/RGMII link.

Add a dev_warn() to make this condition obvious and provide a
breadcrumb to follow.

Cc: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David Rivshin <drivshin@allworx.com>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 10:50:15 -05:00
Woojung.Huh@microchip.com
cd772de358 phy: keep pause flags in phy driver features
genphy_config_init() masked out pause flags set in phy driver structure.
Pause flags needs to be preserved in phydev->supported &
phydev->advertising.

Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 10:48:07 -05:00
David S. Miller
1543b765d2 Merge branch 'mlx4-fixes'
Or Gerlitz says:

====================
Mellanox 10/40G mlx4 driver fixes for 4.5-rc

Bunch of fixes from the team to the mlx4 Eth and core drivers.

Series generated against net commit aac8d3c "qmi_wwan: add "4G LTE usb-modem U901""

Please push patches 1,2 and 6 to -stable  as well

changes from v0:
 - handled another wrongly accounted HW counter in patch #1 (Rick)
 - fixed coding style issues in patch #4 (Sergei)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 10:29:27 -05:00
Eugenia Emantayev
925ab1aa93 net/mlx4_en: Avoid changing dev->features directly in run-time
It's forbidden to manually change dev->features in run-time. Currently, this is
done in the driver to make sure that GSO_UDP_TUNNEL is advertized only when
VXLAN tunnel is set. However, since the stack actually does features intersection
with hw_enc_features, we can safely revert to advertizing features early when
registering the netdevice.

Fixes: f4a1edd561 ('net/mlx4_en: Advertize encapsulation offloads [...]')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 10:29:27 -05:00
Huy Nguyen
85743f1eb3 net/mlx4_core: Set UAR page size to 4KB regardless of system page size
problem description:

The current code sets UAR page size equal to system page size.
The ConnectX-3 and ConnectX-3 Pro HWs require minimum 128 UAR pages.
The mlx4 kernel drivers are not loaded if there is less than 128 UAR pages.

solution:

Always set UAR page to 4KB. This allows more UAR pages if the OS
has PAGE_SIZE larger than 4KB. For example, PowerPC kernel use 64KB
system page size, with 4MB uar region, there are 4MB/2/64KB = 32
uars (half for uar, half for blueflame). This does not meet minimum 128
UAR pages requirement. With 4KB UAR page, there are 4MB/2/4KB = 512 uars
which meet the minimum requirement.

Note that only codes in mlx4_core that deal with firmware know that uar
page size is 4KB. Codes that deal with usr page in cq and qp context
(mlx4_ib, mlx4_en and part of mlx4_core) still have the same assumption
that uar page size equals to system page size.

Note that with this implementation, on 64KB system page size kernel, there
are 16 uars per system page but only one uars is used. The other 15
uars are ignored because of the above assumption.

Regarding SR-IOV, mlx4_core in hypervisor will set the uar page size
to 4KB and mlx4_core code in virtual OS will obtain the uar page size from
firmware.

Regarding backward compatibility in SR-IOV, if hypervisor has this new code,
the virtual OS must be updated. If hypervisor has old code, and the virtual
OS has this new code, the new code will be backward compatible with the
old code. If the uar size is big enough, this new code in VF continues to
work with 64 KB uar page size (on PowerPc kernel). If the uar size does not
meet 128 uars requirement, this new code not loaded in VF and print the same
error message as the old code in Hypervisor.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 10:29:27 -05:00
Daniel Jurgens
22e3817e6c net/mlx4_core: Do not BUG_ON during reset when PCI is offline
The PCI channel could go offline during reset due to EEH.  Don't bug on in
this case, the error is recoverable.

Fixes: f6bc11e426 ('net/mlx4_core: Enhance the catas flow to support device reset')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 10:29:26 -05:00
Eran Ben Elisha
6b94bab0ee net/mlx4_core: Fix potential corruption in counters database
The error flow in procedure handle_existing_counter() is wrong.

The procedure should exit after encountering the error, not continue
as if everything is OK.

Fixes: 68230242cd ('net/mlx4_core: Add port attribute when tracking counters')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 10:29:26 -05:00
Eugenia Emantayev
31c128b66e net/mlx4_en: Choose time-stamping shift value according to HW frequency
Previously, the shift value used for time-stamping was constant and didn't
depend on the HW chip frequency. Change that to take the frequency into account
and calculate the maximal value in cycles per wraparound of ten seconds. This
time slot was chosen since it gives a good accuracy in time synchronization.

Algorithm for shift value calculation:
 * Round up the maximal value in cycles to nearest power of two

 * Calculate maximal multiplier by division of all 64 bits set
   to above result

 * Then, invert the function clocksource_khz2mult() to get the shift from
   maximal mult value

Fixes: ec693d4701 ('net/mlx4_en: Add HW timestamping (TS) support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 10:29:25 -05:00
Amir Vadai
281e8b2fdf net/mlx4_en: Count HW buffer overrun only once
RdropOvflw counts overrun of HW buffer, therefore should
be used for rx_fifo_errors only.

Currently RdropOvflw counter is mistakenly also set into
rx_missed_errors and rx_over_errors too, which makes the
device total dropped packets accounting to show wrong results.

Fix that. Use it for rx_fifo_errors only.

Fixes: c27a02cd94 ('mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC')
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 10:29:25 -05:00
Eran Ben Elisha
c2bab61981 IB/mlx4: Add support for the port info class for RoCE ports
Report that driver supports IB_PMA_CLASS_CAP_EXT_WIDTH in respond for
IB_MGMT_CLASS_PERF_MGMT mad with IB_PMA_CLASS_PORT_INFO attr id.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-17 10:07:20 -05:00
Eran Ben Elisha
c3c0c83667 IB/mlx4: Add support for extended counters over RoCE ports
When attribute IB_PMA_PORT_COUNTERS_EXT is set, we now return 64 bit
values for the counters.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-17 10:07:20 -05:00
Devesh Sharma
b41f7852f3 RDMA/ocrdma: Fix arm logic to align with new cq API
Today ocrdma driver defer arming the CQ till poll is called.
This was used to prevent calling poll-cq on an armed CQ.

Recently a set of new CQ API has been introduced into the linux
kernel. The implementation of this API guarantees that a given
CQ is never armed before calling poll on it. Most of the kernel
ULPs have already moved to use this new API or have a code where
poll is called before arming the CQ.

Thus, the above workaround in ocrdma is not needed anymore.
This patch removes the additional logic to deffer arm till poll
is called. This patch adds a simple scheme where ib_req_notify_cq()
will actually arm the cq.

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-17 10:07:20 -05:00
David S. Miller
f551588595 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2016-02-16

This series contains updates to i40e/i40evf only.

Shannon adds flags to MAC allocation requests to signify that the MAC VLAN
filters should come from the shared resource pool.  Added a new "set switch
config" admin queue command and the new Cisco VXLAN-GPE cloud tunnel type
for the admin queue commands.  Added more detail to the NVM update debug
message in order to see the full ethtool request data.  Also added a few
more bits of netdev data into the debugfs output for dump VSI.

Pandi fixes the width of two datatypes which were being declared a different
size from what they are assigned.

Anjali fixes an issue where we were not doing write-back on interrupt
throttle for legacy case in x722.

Mitch adds a counter for ARQ overflows since sometimes an ever-growing
number indicates that something bad is happening.  Also added 20G speed for
Tx bandwidth calculations.

Jesse refactors the DCB function based on a community suggestion to change
the multi-level if statement into a switch statement.  Cleans up VF device
IDs in the PF, since it does not need to know them.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:48:50 -05:00
David S. Miller
86f447783f Merge branch 'tc-hw-offload'
John Fastabend says:

====================
tc offload for cls_u32 on ixgbe

This extends the setup_tc framework so it can support more than just
the mqprio offload and push other classifiers and qdiscs into the
hardware. The series here targets the u32 classifier and ixgbe
driver. I worked out the u32 classifier because it is protocol
oblivious and aligns with multiple hardware devices I have access
to. I did an initial implementation on ixgbe because (a) I have one
in my box (b) its a stable driver and (c) it is relatively simple
compared to the other devices I have here but still has enough
flexibility to exercise the features of cls_u32.

I intentionally limited the scope of this series to the basic
feature set. Specifically this uses a 'big hammer' feature bit
to do the offload or not. If the bit is set you get offloaded rules
if it is not then rules will not be offloaded. If we can agree on
this patch series there are some more patches on my queue we can
talk about to make the offload decision per rule using flags similar
to how we do l2 mac updates. Additionally the error strategy can
be improved to be hard aborting, log and continue, etc. I think
these are nice to have improvements but shouldn't block this series.

Also by adding get_parse_graph and set_parse_graph attributes as
in my previous flow_api work we can build programmable devices
and programmatically learn when rules can or can not be loaded
into the hardware. Again future work.

... v3 includes a couple style fixups suggested by Jiri and a
quick fix to ixgbe to check features flag and not dev_features
flag the latter being always set.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:37 -05:00
John Fastabend
db956ae882 net: ixgbe: abort with cls u32 divisor groups greater than 1
This patch ensures ixgbe will not try to offload hash tables from the
u32 module. The device class does not currently support this so until
it is enabled just abort on these tables.

Interestingly the more flexible your hardware is the less code you
need to implement to guard against these cases.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:37 -05:00
John Fastabend
b82b17d929 net: ixgbe: add support for tc_u32 offload
This adds initial support for offloading the u32 tc classifier. This
initial implementation only implements a few base matches and actions
to illustrate the use of the infrastructure patches.

However it is an interesting subset because it handles the u32 next
hdr logic to correctly map tcp packets from ip headers using the ihl
and protocol fields. After this is accepted we can extend the match
and action fields easily by updating the model header file.

Also only the drop action is supported initially.

Here is a short test script,

 #tc qdisc add dev eth4 ingress
 #tc filter add dev eth4 parent ffff: protocol ip \
	u32 ht 800: order 1 \
	match ip dst 15.0.0.1/32 match ip src 15.0.0.2/32 action drop

<-- hardware has dst/src ip match rule installed -->

 #tc filter del dev eth4 parent ffff: prio 49152
 #tc filter add dev eth4 parent ffff: protocol ip prio 99 \
	handle 1: u32 divisor 1
 #tc filter add dev eth4 protocol ip parent ffff: prio 99 \
	u32 ht 800: order 1 link 1: \
	offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
 #tc filter add dev eth4 parent ffff: protocol ip \
	u32 ht 1: order 3 match tcp src 23 ffff action drop

<-- hardware has tcp src port rule installed -->

 #tc qdisc del dev eth4 parent ffff:

<-- hardware cleaned up -->

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:36 -05:00
John Fastabend
9d35cf062e net: ixgbe: add minimal parser details for ixgbe
This adds an ixgbe data structure that is used to determine what
headers:fields can be matched and in what order they are supported.

For hardware devices this can be a bit tricky because typically
only pre-programmed (firmware, ucode, rtl) parse graphs will be
supported and we don't yet have an interface to change these from
the OS. So its sort of a you get whatever your friendly vendor
provides affair at the moment.

In the future we can add the get routines and set routines to
update this data structure. One interesting thing to note here
is the data structure here identifies ethernet, ip, and tcp
fields without having to hardcode them as enumerations or use
other identifiers.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:36 -05:00
John Fastabend
3b01cf56da net: tc: helper functions to query action types
This is a helper function drivers can use to learn if the
action type is a drop action.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:36 -05:00
John Fastabend
1c78c64e9c net: add tc offload feature flag
Its useful to turn off the qdisc offload feature at a per device
level. This gives us a big hammer to enable/disable offloading.
More fine grained control (i.e. per rule) may be supported later.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:36 -05:00
John Fastabend
a1b7c5fd7f net: sched: add cls_u32 offload hooks for netdevs
This patch allows netdev drivers to consume cls_u32 offloads via
the ndo_setup_tc ndo op.

This works aligns with how network drivers have been doing qdisc
offloads for mqprio.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:36 -05:00
John Fastabend
16e5cc6471 net: rework setup_tc ndo op to consume general tc operand
This patch updates setup_tc so we can pass additional parameters into
the ndo op in a generic way. To do this we provide structured union
and type flag.

This lets each classifier and qdisc provide its own set of attributes
without having to add new ndo ops or grow the signature of the
callback.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:35 -05:00
John Fastabend
e4c6734eaa net: rework ndo tc op to consume additional qdisc handle parameter
The ndo_setup_tc() op was added to support drivers offloading tx
qdiscs however only support for mqprio was ever added. So we
only ever added support for passing the number of traffic classes
to the driver.

This patch generalizes the ndo_setup_tc op so that a handle can
be provided to indicate if the offload is for ingress or egress
or potentially even child qdiscs.

CC: Murali Karicheri <m-karicheri2@ti.com>
CC: Shradha Shah <sshah@solarflare.com>
CC: Or Gerlitz <ogerlitz@mellanox.com>
CC: Ariel Elior <ariel.elior@qlogic.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Bruce Allan <bruce.w.allan@intel.com>
CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:35 -05:00
Alexey Kardashevskiy
6ecad912a0 powerpc/ioda: Set "read" permission when "write" is set
Quite often drivers set only "write" permission assuming that this
includes "read" permission as well and this works on plenty of
platforms. However IODA2 is strict about this and produces an EEH when
"read" permission is not set and reading happens.

This adds a workaround in the IODA code to always add the "read" bit
when the "write" bit is set.

Fixes: 10b35b2b74 ("powerpc/powernv: Do not set "read" flag if direction==DMA_NONE")
Cc: stable@vger.kernel.org # 4.2+
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Tested-by: Douglas Miller <dougmill@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-17 23:52:17 +11:00
Marek Szyprowski
722ec35f7f arm64: dma-mapping: fix handling of devices registered before arch_initcall
This patch ensures that devices, which got registered before arch_initcall
will be handled correctly by IOMMU-based DMA-mapping code.

Cc: <stable@vger.kernel.org>
Fixes: 13b8629f65 ("arm64: Add IOMMU dma_ops")
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-02-17 11:48:01 +00:00
Ville Syrjälä
8d409cb3e8 drm/i915: Fix hpd live status bits for g4x
Looks like g4x hpd live status bits actually agree with the spec. At
least they do on the machine I have, and apparently on Nick Bowler's
g4x as well.

So gm45 may be the only platform where they don't agree. At least
that seems to be the case based on the (somewhat incomplete)
logs/dumps in [1], and Daniel has also tested this on his gm45
sometime in the past.

So let's change the bits to match the spec on g4x. That actually makes
the g4x bits identical to vlv/chv so we can just share the code
between those platforms, leaving gm45 as the special case.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=52361

Cc: Shashank Sharma <shashank.sharma@intel.com>
Cc: Sonika Jindal <sonika.jindal@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Nick Bowler <nbowler@draconx.ca>
References: https://lists.freedesktop.org/archives/dri-devel/2016-February/100382.html
Reported-by: Nick Bowler <nbowler@draconx.ca>
Cc: stable@vger.kernel.org
Fixes: 237ed86c69 ("drm/i915: Check live status before reading edid")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1455127145-20087-1-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit 0780cd36c7)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-02-17 11:52:54 +02:00
Thomas Gleixner
059fcd8cd1 perf/core: Plug potential memory leak in CPU_UP_PREPARE
If CPU_UP_PREPARE is called it is not guaranteed, that a previously allocated
and assigned hash has been freed already, but perf_event_init_cpu()
unconditionally allocates and assignes a new hash if the swhash is referenced.
By overwriting the pointer the existing hash is not longer accessible.

Verify that there is no hash assigned on this cpu before allocating and
assigning a new one.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20160209201007.843269966@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 10:37:30 +01:00
Thomas Gleixner
27ca9236c9 perf/core: Remove the bogus and dangerous CPU_DOWN_FAILED hotplug state
If CPU_DOWN_PREPARE fails the perf hotplug notifier is called for
CPU_DOWN_FAILED and calls perf_event_init_cpu(), which checks whether the
swhash is referenced. If yes it allocates a new hash and stores the pointer in
the per cpu data structure.

But at this point the cpu is still online, so there must be a valid hash
already. By overwriting the pointer the existing hash is not longer
accessible.

Remove the CPU_DOWN_FAILED state, as there is nothing to (re)allocate.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20160209201007.763417379@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 10:37:29 +01:00
Thomas Gleixner
b4f75d44be perf/core: Remove bogus UP_CANCELED hotplug state
If CPU_UP_PREPARE fails the perf hotplug code calls perf_event_exit_cpu(),
which is a pointless exercise. The cpu is not online, so the smp function
calls return -ENXIO. So the result is a list walk to call noops.

Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20160209201007.682184765@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 10:37:28 +01:00
Michael S. Tsirkin
4e7f9df258 hpet: Drop stale URLs
Looks like the HPET spec at intel.com got moved.
It isn't hard to find so drop the link, just mention
the revision assumed.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1455145462-3877-1-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 09:39:56 +01:00
Stefan Haberland
12d319b920 s390/dasd: fix performance drop
Commit ca369d51b ("sd: Fix device-imposed transfer length limits")
introduced a new queue limit max_dev_sectors which limits the maximum
sectors for requests. The default value leads to small dasd requests
and therefor to a performance drop.
Set the max_dev_sectors value to the same value as the max_hw_sectors
to use the maximum available request size for DASD devices.

Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 4.4+
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-17 09:24:07 +01:00
Toshi Kani
a82eee7424 x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()
Data corruption issues were observed in tests which initiated
a system crash/reset while accessing BTT devices.  This problem
is reproducible.

The BTT driver calls pmem_rw_bytes() to update data in pmem
devices.  This interface calls __copy_user_nocache(), which
uses non-temporal stores so that the stores to pmem are
persistent.

__copy_user_nocache() uses non-temporal stores when a request
size is 8 bytes or larger (and is aligned by 8 bytes).  The
BTT driver updates the BTT map table, which entry size is
4 bytes.  Therefore, updates to the map table entries remain
cached, and are not written to pmem after a crash.

Change __copy_user_nocache() to use non-temporal store when
a request size is 4 bytes.  The change extends the current
byte-copy path for a less-than-8-bytes request, and does not
add any overhead to the regular path.

Reported-and-tested-by: Micah Parrish <micah.parrish@hpe.com>
Reported-and-tested-by: Brian Boylston <brian.boylston@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/1455225857-12039-3-git-send-email-toshi.kani@hpe.com
[ Small readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 09:10:23 +01:00
Toshi Kani
ee9737c924 x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable
Add comments to __copy_user_nocache() to clarify its procedures
and alignment requirements.

Also change numeric branch target labels to named local labels.

No code changed:

 arch/x86/lib/copy_user_64.o:

    text    data     bss     dec     hex filename
    1239       0       0    1239     4d7 copy_user_64.o.before
    1239       0       0    1239     4d7 copy_user_64.o.after

 md5:
    58bed94c2db98c1ca9a2d46d0680aaae  copy_user_64.o.before.asm
    58bed94c2db98c1ca9a2d46d0680aaae  copy_user_64.o.after.asm

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: brian.boylston@hpe.com
Cc: dan.j.williams@intel.com
Cc: linux-nvdimm@lists.01.org
Cc: micah.parrish@hpe.com
Cc: ross.zwisler@linux.intel.com
Cc: vishal.l.verma@intel.com
Link: http://lkml.kernel.org/r/1455225857-12039-2-git-send-email-toshi.kani@hpe.com
[ Small readability edits and added object file comparison. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 09:10:22 +01:00
Heiko Carstens
52499d93d6 s390/maccess: reduce stnsm instructions
When fixing the DAT off bug ("s390: fix DAT off memory access, e.g.
on kdump") both Christian and I missed that we can save an additional
stnsm instruction.

This saves us a couple of cycles which could improve the speed of
memcpy_real.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-17 09:05:04 +01:00