Commit Graph

678703 Commits

Author SHA1 Message Date
Jiri Pirko
e25ea21ffa net: sched: introduce a TRAP control action
There is need to instruct the HW offloaded path to push certain matched
packets to cpu/kernel for further analysis. So this patch introduces a
new TRAP control action to TC.

For kernel datapath, this action does not make much sense. So with the
same logic as in HW, new TRAP behaves similar to STOLEN. The skb is just
dropped in the datapath (and virtually ejected to an upper level, which
does not exist in case of kernel).

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 12:45:23 -04:00
Colin Ian King
928a759593 net/mlxfw: remove redundant goto on error check
The check to see of err is set and the subsequent goto is extraneous
as the next statement is where the goto is jumping to. Remove this
redundant check and goto.

Detected by CoverityScan, CID#1437734 ("Identical code for
different branches")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 12:21:29 -04:00
David S. Miller
bb36314054 RxRPC rewrite
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAWTZjxvSw1s6N8H32AQKwUQ/8CPF6CFwn+oS7cTkkI27sKaW43tTWyxxl
 1qAXjeI5dqrnmR+xW5Xu06HwO8TQKfum5dvXJLse5y15ttbK9/fevRW1IzcYxeHQ
 YcR414c0akIZ72hJ93LZypmLwlhEicZs4dZXrUs6f6WuqFLwYrt4K1MyrY8Bt+bM
 +a2yLVToF4L0nI/aAhoU0Hh0sNv4AP/PKrLWEzhLDq1Q6xBiQHSsrHLPOPkJ9QqA
 KOZWSjZJj8j7gSBoXtMwQiBxV76KptbksYQFLpy3EwL/r7z1qBPI0TOAKnLDLs5Q
 cDHf2uSUTrgfO7TIg02/SJcHm+8s0p3K585E9iK5JZ6BMjdSRfKR14nJdlWyXdZ5
 EvvEA7AlUpukHVv+CP+03sdBfkZ3PSb4sAQ+CbwY30SKwL1fRE26NW0fZa5lSmUt
 E1ixCxHPJXPnSZJAa5kePdWDgQjn2qJI+3Zh+jw0yaQ+rAgpP4M95xckeWdU9PKg
 8uFMM7Z1h70PnmVV3nX603MqyVivpKEZKHKTQgqGz4BvB1ZEu9noLTfwQCodXtns
 /8/8sVD65L4/SpHr1AM3Y+v7483bHth8edAI0k/QZerdKGImR+enrYBoSZ53QkEf
 TG8pvK74Tdpw2LQJsUIDvL5+oBO4FtPNOmT4UHbotenrVkF/4laIFcCVPW58scG1
 mB8kAUS+bzs=
 =M7Pr
 -----END PGP SIGNATURE-----

Merge tag 'rxrpc-rewrite-20170606' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Support service upgrade

Here's a set of patches that allow AF_RXRPC to support the AuriStor service
upgrade facility.  This allows the server to change the service ID
requested to an upgraded service if the client requests it upon the
initiation of a connection.

This is used by the AuriStor AFS-compatible servers to implement IPv6
handling and improved facilities by providing improved volume location,
volume, protection, file and cache management services.  Note that certain
parts of the AFS protocol carry hard-coded IPv4 addresses.

The reason AuriStor does it this way is that probing the improved service
ID first will not incur an ABORT or any other response on some servers if
the server is not listening on it - and so one have to employ a timeout.

This is implemented in the server by allowing an AF_RXRPC server to call
bind() twice on a socket to allow it to listen on two service IDs and then
call setsockopt() to instruct the server to upgrade one into the other if
the client requests it (by setting userStatus to 1 on the first DATA packet
on a connection).  If the upgrade occurs, all further operations on that
connection are done with the new service ID.  AF_RXRPC has to handle this
automatically as connections are not exposed to userspace.

Clients can request this facility by setting an RXRPC_UPGRADE_SERVICE
command in the sendmsg() control buffer and then observing the resultant
service ID in the msg_addr returned by recvmsg().  This should only be used
to probe the service.  Clients should then use the returned service ID in
all subsequent communications with that server.  Note that the kernel will
not retain this information should the connection expire from its cache.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 12:05:57 -04:00
David S. Miller
25f411501b Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2017-06-06

This series contains updates and fixes to e1000e and igb.

Matwey V Kornilov fixes an issue where igb_get_phy_id_82575() relies on
the fact that page 0 is already selected, but this is not the case after
igb_read_phy_reg_gs40g()/igb_write_phy_reg_gs40g() were removed in a
previous commit.  This leads to initialization failure and some devices
not working.  To fix the issue, explicitly select page 0 before first
access to PHY registers.

Arnd Bergmann modifies the driver to avoid a "defined but not used"
warning by removing #ifdefs and using __maybe_unused annotation instead
for new power management functions.

Jake provides most of the changes in the series, all around PTP and
timestamp fixes/updates.  Resolved several race conditions based on
the hardware can only handle one transmit timestamp at a time, so
fix the locking logic, as well as create a statistic for "skipped"
timestamps to help administrators identify issues.

Benjamin Poirier provides 2 changes, first to igb to remove the
second argument to igb_update_stats() since it always passes the
same two arguments.  So instead of having to pass the second argument,
just update the function to the necessary information from the adapter
structure.  Second modifies the e1000e_get_stats64() call to
dev_get_stats() to avoid ethtool garbage being reported.

Konstantin Khlebnikov modifies e1000e to use disable_hardirq(), instead
of disable_irq() for MSIx vectors in e1000_netpoll().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 11:58:16 -04:00
Konstantin Khlebnikov
fd8e597ba4 e1000e: use disable_hardirq() also for MSIX vectors in e1000_netpoll()
Replace disable_irq() which waits for threaded irq handlers with
disable_hardirq() which waits only for hardirq part.

Fixes: 3111912971 ("e1000: use disable_hardirq() for e1000_netpoll()")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 01:06:06 -07:00
Benjamin Poirier
24ad2a9209 e1000e: Don't return uninitialized stats
Some statistics passed to ethtool are garbage because e1000e_get_stats64()
doesn't write them, for example: tx_heartbeat_errors. This leaks kernel
memory to userspace and confuses users.

Do like ixgbe and use dev_get_stats() which first zeroes out
rtnl_link_stats64.

Fixes: 5944701df9 ("net: remove useless memset's in drivers get_stats64")
Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 01:05:13 -07:00
Benjamin Poirier
81e3f64a9b igb: Remove useless argument
Given that all callers of igb_update_stats() pass the same two arguments:
(adapter, &adapter->stats64), the second argument can be removed.

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 01:04:13 -07:00
Jacob Keller
e5f36ad14c igb: check for Tx timestamp timeouts during watchdog
The igb driver has logic to handle only one Tx timestamp at a time,
using a state bit lock to avoid multiple requests at once.

It may be possible, if incredibly unlikely, that a Tx timestamp event is
requested but never completes. Since we use an interrupt scheme to
determine when the Tx timestamp occurred we would never clear the state
bit in this case.

Add an igb_ptp_tx_hang() function similar to the already existing
igb_ptp_rx_hang() function. This function runs in the watchdog routine
and makes sure we eventually recover from this case instead of
permanently disabling Tx timestamps.

Note: there is no currently known way to cause this without hacking the
driver code to force it.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 01:03:17 -07:00
Jacob Keller
c3b8f85ec2 igb: add statistic indicating number of skipped Tx timestamps
The igb driver can only handle one Tx timestamp request at a time.
This means it is possible for an application timestamp request to be
ignored.

There is no easy way for an administrator to determine if this occurred.
Add a new statistic which tracks this, tx_hwtstamp_skipped.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 01:02:05 -07:00
Jacob Keller
cff5714145 e1000e: add statistic indicating number of skipped Tx timestamps
The e1000e driver can only handle one Tx timestamp request at a time.
This means it is possible for an application timestamp request to be
ignored.

There is no easy way for an administrator to determine if this occurred.
Add a new statistic which tracks this, tx_hwtstamp_skipped.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 01:01:27 -07:00
Jacob Keller
74344e32fc igb: avoid permanent lock of *_PTP_TX_IN_PROGRESS
The igb driver uses a state bit lock to avoid handling more than one Tx
timestamp request at once. This is required because hardware is limited
to a single set of registers for Tx timestamps.

The state bit lock is not properly cleaned up during
igb_xmit_frame_ring() if the transmit fails such as due to DMA or TSO
failure. In some hardware this results in blocking timestamps until the
service task times out. In other hardware this results in a permanent
lock of the timestamp bit because we never receive an interrupt
indicating the timestamp occurred, since indeed the packet was never
transmitted.

Fix this by checking for DMA and TSO errors in igb_xmit_frame_ring() and
properly cleaning up after ourselves when these occur.

Reported-by: Reported-by: David Mirabito <davidm@metamako.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 00:53:48 -07:00
Jacob Keller
4ccdc013b0 igb: fix race condition with PTP_TX_IN_PROGRESS bits
Hardware related to the igb driver has a limitation of only handling one
Tx timestamp at a time. Thus, the driver uses a state bit lock to
enforce that only one timestamp request is honored at a time.

Unfortunately this suffers from a simple race condition. The bit lock is
not cleared until after skb_tstamp_tx() is called notifying the stack of
a new Tx timestamp. Even a well behaved application which sends only one
timestamp request at once and waits for a response might wake up and
send a new packet before the bit lock is cleared. This results in
needlessly dropping some Tx timestamp requests.

We can fix this by unlocking the state bit as soon as we read the
Timestamp register, as this is the first point at which it is safe to
unlock.

To avoid issues with the skb pointer, we'll use a copy of the pointer
and set the global variable in the driver structure to NULL first. This
ensures that the next timestamp request does not modify our local copy
of the skb pointer.

This ensures that well behaved applications do not accidentally race
with the unlock bit. Obviously an application which sends multiple Tx
timestamp requests at once will still only timestamp one packet at
a time. Unfortunately there is nothing we can do about this.

Reported-by: David Mirabito <davidm@metamako.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 00:53:07 -07:00
Jacob Keller
5012863b73 e1000e: fix race condition around skb_tstamp_tx()
The e1000e driver and related hardware has a limitation on Tx PTP
packets which requires we limit to timestamping a single packet at once.
We do this by verifying that we never request a new Tx timestamp while
we still have a tx_hwtstamp_skb pointer.

Unfortunately the driver suffers from a race condition around this. The
tx_hwtstamp_skb pointer is not set to NULL until after skb_tstamp_tx()
is called. This function notifies the stack and applications of a new
timestamp. Even a well behaved application that only sends a new request
when the first one is finished might be woken up and possibly send
a packet before we can free the timestamp in the driver again. The
result is that we needlessly ignore some Tx timestamp requests in this
corner case.

Fix this by assigning the tx_hwtstamp_skb pointer prior to calling
skb_tstamp_tx() and use a temporary pointer to hold the timestamped skb
until that function finishes. This ensures that the application is not
woken up until the driver is ready to begin timestamping a new packet.

This ensures that well behaved applications do not accidentally race
with condition to skip Tx timestamps. Obviously an application which
sends multiple Tx timestamp requests at once will still only timestamp
one packet at a time. Unfortunately there is nothing we can do about
this.

Reported-by: David Mirabito <davidm@metamako.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 00:52:17 -07:00
Arnd Bergmann
000ba1f2eb igb: mark PM functions as __maybe_unused
The new wake function is only used by the suspend/resume handlers that
are defined in inside of an #ifdef, which can cause this harmless
warning:

drivers/net/ethernet/intel/igb/igb_main.c:7988:13: warning: 'igb_deliver_wake_packet' defined but not used [-Wunused-function]

Removing the #ifdef, instead using a __maybe_unused annotation
simplifies the code and avoids the warning.

Fixes: b90fa87635 ("igb: Enable reading of wake up packet")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 00:51:36 -07:00
Matwey V Kornilov
440aeca4b9 igb: Explicitly select page 0 at initialization
The functions igb_read_phy_reg_gs40g/igb_write_phy_reg_gs40g (which were
removed in 2a3cdea) explicitly selected the required page at every phy_reg
access. Currently, igb_get_phy_id_82575 relays on the fact that page 0 is
already selected. The assumption is not fulfilled for my Lex 3I380CW
motherboard with integrated dual i211 based gigabit ethernet. This leads to igb
initialization failure and network interfaces are not working:

    igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
    igb: Copyright (c) 2007-2014 Intel Corporation.
    igb: probe of 0000:01:00.0 failed with error -2
    igb: probe of 0000:02:00.0 failed with error -2

In order to fix it, we explicitly select page 0 before first access to phy
registers.

See also: https://bugzilla.suse.com/show_bug.cgi?id=1009911
See also: http://www.lex.com.tw/products/pdf/3I380A&3I380CW.pdf

Fixes: 2a3cdea ("igb: Remove GS40G specific defines/functions")
Cc: <stable@vger.kernel.org> # 4.5+
Signed-off-by: Matwey V Kornilov <matwey@sai.msu.ru>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 00:49:20 -07:00
Colin Ian King
9d15e5cc8c mdio: mux: fix an incorrect less than zero error check using a u32
The u32 variable v is being checked to see if an error return is
less than zero and this check has no effect because it is unsigned.
Fix this by making v and int (this also matches the type of
cb->bus_number which is assigned to the value in v).

Detected by CoverityScan, CID#1440454 ("Unsigned compared against zero")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05 17:45:51 -04:00
Icenowy Zheng
2f878491b3 net-next: stmmac: dwmac-sun8i: ensure the EPHY is properly reseted
The EPHY may be already enabled by bootloaders which have Ethernet
capability (e.g. current U-Boot). Thus it should be reseted properly
before doing the enabling sequence in the dwmac-sun8i driver, otherwise
the EMAC reset process may fail if no cable is plugged, and then fail
the dwmac-sun8i probing.

Tested on Orange Pi PC, One and Zero. All the boards fail to have
dwmac-sun8i probed with "EMAC reset timeout" without cable plugged
before, and with this fix they're now all able to successfully probe the
EMAC without cable plugged and then use the connection after a cable is
hot-plugged in.

Fixes: 9f93ac8d40 ("net-next: stmmac: Add dwmac-sun8i")
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Reviewed-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: is not as formal as Signed-off-by:.  It is a record that the acker
Reviewed-by: is similar.
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05 11:03:52 -04:00
yuval.shaia@oracle.com
697dae1ee1 net/3com: Make el3_netdev_get_ecmd return void
Make return value void since function never returns meaningfull value.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05 11:00:43 -04:00
yuval.shaia@oracle.com
82c01a84d5 net/{mii, smsc}: Make mii_ethtool_get_link_ksettings and smc_netdev_get_ecmd return void
Make return value void since functions never returns meaningfull value.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05 11:00:42 -04:00
yuval.shaia@oracle.com
c7c6b8715a net/dec: Make __de_get_link_ksettings return void
Make return value void since function never return meaningfull value

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05 11:00:41 -04:00
Jiri Pirko
8ec1507dc9 net: sched: select cls when cls_act is enabled
It really makes no sense to have cls_act enabled without cls. In that
case, the cls_act code is dead. So select it.

This also fixes an issue recently reported by kbuild robot:
[linux-next:master 1326/4151] net/sched/act_api.c:37:18: error: implicit declaration of function 'tcf_chain_get'

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05 10:56:36 -04:00
Rosen, Rami
4e2ec43654 genetlink: remove ops_list from genetlink header.
commit d91824c08f ("genetlink: register family ops as array") removed the
ops_list member from both genl_family and genl_ops; while the
documentation of genl_family was updated accordingly by this patch,
ops_list remained in the documentation of the genl_ops object.
This patch fixes it by removing ops_list from genl_ops documentation.

Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05 10:54:55 -04:00
David Howells
4e255721d1 rxrpc: Add service upgrade support for client connections
Make it possible for a client to use AuriStor's service upgrade facility.

The client does this by adding an RXRPC_UPGRADE_SERVICE control message to
the first sendmsg() of a call.  This takes no parameters.

When recvmsg() starts returning data from the call, the service ID field in
the returned msg_name will reflect the result of the upgrade attempt.  If
the upgrade was ignored, srx_service will match what was set in the
sendmsg(); if the upgrade happened the srx_service will be altered to
indicate the service the server upgraded to.

Note that:

 (1) The choice of upgrade service is up to the server

 (2) Further client calls to the same server that would share a connection
     are blocked if an upgrade probe is in progress.

 (3) This should only be used to probe the service.  Clients should then
     use the returned service ID in all subsequent communications with that
     server (and not set the upgrade).  Note that the kernel will not
     retain this information should the connection expire from its cache.

 (4) If a server that supports upgrading is replaced by one that doesn't,
     whilst a connection is live, and if the replacement is running, say,
     OpenAFS 1.6.4 or older or an older IBM AFS, then the replacement
     server will not respond to packets sent to the upgraded connection.

     At this point, calls will time out and the server must be reprobed.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-06-05 14:30:49 +01:00
David Howells
4722974d90 rxrpc: Implement service upgrade
Implement AuriStor's service upgrade facility.  There are three problems
that this is meant to deal with:

 (1) Various of the standard AFS RPC calls have IPv4 addresses in their
     requests and/or replies - but there's no room for including IPv6
     addresses.

 (2) Definition of IPv6-specific RPC operations in the standard operation
     sets has not yet been achieved.

 (3) One could envision the creation a new service on the same port that as
     the original service.  The new service could implement improved
     operations - and the client could try this first, falling back to the
     original service if it's not there.

     Unfortunately, certain servers ignore packets addressed to a service
     they don't implement and don't respond in any way - not even with an
     ABORT.  This means that the client must then wait for the call timeout
     to occur.

What service upgrade does is to see if the connection is marked as being
'upgradeable' and if so, change the service ID in the server and thus the
request and reply formats.  Note that the upgrade isn't mandatory - a
server that supports only the original call set will ignore the upgrade
request.

In the protocol, the procedure is then as follows:

 (1) To request an upgrade, the first DATA packet in a new connection must
     have the userStatus set to 1 (this is normally 0).  The userStatus
     value is normally ignored by the server.

 (2) If the server doesn't support upgrading, the reply packets will
     contain the same service ID as for the first request packet.

 (3) If the server does support upgrading, all future reply packets on that
     connection will contain the new service ID and the new service ID will
     be applied to *all* further calls on that connection as well.

 (4) The RPC op used to probe the upgrade must take the same request data
     as the shadow call in the upgrade set (but may return a different
     reply).  GetCapability RPC ops were added to all standard sets for
     just this purpose.  Ops where the request formats differ cannot be
     used for probing.

 (5) The client must wait for completion of the probe before sending any
     further RPC ops to the same destination.  It should then use the
     service ID that recvmsg() reported back in all future calls.

 (6) The shadow service must have call definitions for all the operation
     IDs defined by the original service.


To support service upgrading, a server should:

 (1) Call bind() twice on its AF_RXRPC socket before calling listen().
     Each bind() should supply a different service ID, but the transport
     addresses must be the same.  This allows the server to receive
     requests with either service ID.

 (2) Enable automatic upgrading by calling setsockopt(), specifying
     RXRPC_UPGRADEABLE_SERVICE and passing in a two-member array of
     unsigned shorts as the argument:

	unsigned short optval[2];

     This specifies a pair of service IDs.  They must be different and must
     match the service IDs bound to the socket.  Member 0 is the service ID
     to upgrade from and member 1 is the service ID to upgrade to.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-06-05 14:30:49 +01:00
David Howells
28036f4485 rxrpc: Permit multiple service binding
Permit bind() to be called on an AF_RXRPC socket more than once (currently
maximum twice) to bind multiple listening services to it.  There are some
restrictions:

 (1) All bind() calls involved must have a non-zero service ID.

 (2) The service IDs must all be different.

 (3) The rest of the address (notably the transport part) must be the same
     in all (a single UDP socket is shared).

 (4) This must be done before listen() or sendmsg() is called.

This allows someone to connect to the service socket with different service
IDs and lays the foundation for service upgrading.

The service ID used by an incoming call can be extracted from the msg_name
returned by recvmsg().

Signed-off-by: David Howells <dhowells@redhat.com>
2017-06-05 14:30:49 +01:00
David Howells
68d6d1ae5c rxrpc: Separate the connection's protocol service ID from the lookup ID
Keep the rxrpc_connection struct's idea of the service ID that is exposed
in the protocol separate from the service ID that's used as a lookup key.

This allows the protocol service ID on a client connection to get upgraded
without making the connection unfindable for other client calls that also
would like to use the upgraded connection.

The connection's actual service ID is then returned through recvmsg() by
way of msg_name.

Whilst we're at it, we get rid of the last_service_id field from each
channel.  The service ID is per-connection, not per-call and an entire
connection is upgraded in one go.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-06-05 14:30:49 +01:00
David S. Miller
aae1a2ce14 Merge branch 'mlxsw-Minor-cleanup'
Jiri Pirko says:

====================
mlxsw: Minor cleanup

Fix small issues I noticed during the refactoring.

First patch adds file name comments in the header file to make it clear
what goes where. Second patch fixes a typo and third patch simply aligns
RIF index allocation with similar allocations in the driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:49:49 -04:00
Ido Schimmel
de5ed99e97 mlxsw: spectrum_router: Align RIF index allocation with existing code
The way we usually allocate an index is by letting the allocation
function return an error instead of an invalid index.

Do the same for RIF index.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:49:49 -04:00
Ido Schimmel
da0abcf93f mlxsw: Fix typo inside enumeration
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:49:48 -04:00
Ido Schimmel
cb4cc0e0b1 mlxsw: spectrum: Tidy up header file
Make it clear where functions are defined and move misplaced declaration
to their correct place.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:49:48 -04:00
Yotam Gigi
a4e1ce24f7 mlxsw: spectrum: Rename the firmware file
Change the firmware file name to be in "mellanox" directory.

This commit is a followup to the linux-firmware commit a4c72696f5f4
("Mellanox: Add firmware for mlxsw_spectrum")

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:48:03 -04:00
David S. Miller
fc85c910ae Merge branch 'qed-vf-xdp'
Yuval Mintz says:

qed*: Support VF XDP attachment

====================
Each driver queue [Rx, Tx, XDP-forwarding] requires an allocated HW/FW
connection + configured queue-zone.

VF handling by the PF has several limitations that prevented adding the
capability to perform XDP at driver-level:

 - The VF assumes there's 1-to-1 correspondance between the VF queue and
   the used connection, meaning q<x> is always going to use cid<x>,
   whereas for its own queues the PF is acquiring a new cid per each new
   queue.

 - There's a 1-to-1 correspondate between the VF-queues and the HW queue
   zones. While this is necessary for Rx-queues [as the queue-zone
   contains the producer], transmission queues can share the underlaying
   queue-zone [only shared configuration is coalescing].
   But all VF<->PF communication mechanisms assume there's a single
   identifier that identify a queue [as queue-zone == queue], while
   sharing queue-zones requires passing additional information.

 - VFs currently don't try mapping a doorbell bar - there's a small
   doorbell window in the regview allowing VFs to doorbell up to 16
   connections; but this window isn's wide enough for the added XDP
   forwarding queues.

This series is going to add the necessary infrastrucutre to finally let
our VFs support XDP assuming both the PF and VF drivers are sufficiently
new [Legacy support would be retained both for older VFs and older PFs,
but both will be needed for this new support to work].
Basically, the various database driver maintains for its queue-cids
would be revised, and queue-cids would be identified using the
(queue-zone, unique index) pair. The TLV mechanism would then be
extended to allow VFs to communicate that unique-index as well as the
already provided queue-zone. Finally, the VFs would try to map their
doorbell bar and inform their PF that they're using it.

Almost all the changes are in qed, with exception of #3 [which does some
cleanup in qede as well] and #11 that actually enables the feature.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:08:32 -04:00
Mintz, Yuval
e7b80dece8 qede: VF XDP support
This introduces 2 changes needed for XDP to be supported for VFs:

 a. On VF-side, publish the NDO based on qed outputs

 b. On PF-side, request qed to allocate sufficient cids per-VF
    to allow the child vfs to support it

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:08:31 -04:00
Mintz, Yuval
cbb8a12c08 qed: VF XDP support
The final addition on the qed front -
 - VFs would now require their PFs to provide multiple CIDs
 - Based on the availability of connections from PF, determine whether
   XDP is feasible and share it with qede via dev_info.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:08:31 -04:00
Mintz, Yuval
1a850bfc9e qed: VFs to try utilizing the doorbell bar
VFs are currently not mapping their doorbell bar, instead relying
on the small doorbell window they have in their limited regview bar.

In order to increase the number of possible Tx connections [queues]
employeed by VF past 16, we need to start using the doorbell bar if
one such is exposed - VF would communicate this fact to PF which would
return the size-bar internally configured into chip, according to
which the VF would decide whether to actually utilize the doorbell
bar.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:08:31 -04:00
Mintz, Yuval
08bc8f15e6 qed: Multiple qzone queues for VFs
This adds the infrastructure for supporting VFs that want to open
multiple transmission queues on the same queue-zone.
At this point, there are no VFs that actually request this functionality,
but later patches would remedy that.

 a. VF and PF would communicate the capability during ACQUIRE;
    Legacy VFs would continue on behaving as they do today

 b. PF would communicate number of supported CIDs to the VF
    and would enforce said limitation

 c. Whenever VF passes a request for a given queue configuration
    it would also pass an associated index within said queue-zone

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:08:31 -04:00
Mintz, Yuval
007bc37179 qed: IOV db support multiple queues per qzone
Allow the infrastructure a PF maintains for each one of its VFs
to support multiple queue-cids on a single queue-zone.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:08:31 -04:00
Mintz, Yuval
3b19f47820 qed: Make VF legacy a bitfield
Until now we used to have a single VF legacy compatibility mode,
one that affected the place of the Rx producers of those VFs [mostly].

As PF would soon support allocating CIDs for VFs instead of having
a static CID<->queue configuration for them, we'll need to have
an additional legacy mode since existing VFs would need to continue
on using the older mode of operation.

Change the infrastrucutre so that the legacy would be able to indicate
which of the legacy behaviors is needed for a given VF.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:08:31 -04:00
Mintz, Yuval
bbe3f233ec qed: Assign a unique per-queue index to queue-cid
When a queue-cid is allocated, assign an index inside that's
CID's queue-zone.

For PFs and VFS, this number is going to be unique and derive
from a per-queue-zone bitmap, while for PF's VFs queues the
number is currently going to constant; Later, we'd add the
capability of a VF to communicate such an index to its PF.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:08:31 -04:00
Mintz, Yuval
3946497aff qed: Pass vf_params when creating a queue-cid
We're going to need additional information for queue-cids
that a PF creates for its VFs, so start by refactoring existing
logic used for initializing said struct into receiving a structure
encapsulating the VF-specific information that needs to be provided.

This also introduces QED_QUEUE_CID_SELF - each queue-cid would hold
an indication to whether it belongs to the hw-function holding it
[whether that's a PF or a VF], or else what's the VF id it belongs
to.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:08:30 -04:00
Mintz, Yuval
f604b17d7f qed*: L2 interface to use the SB structures directly
Part of an effort of a cleaner seperation between qed and the protocol
drivers, the L2 interface is to use the SB structure for initialization
purposes opaquely.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:08:30 -04:00
Mintz, Yuval
0db711bb26 qed: Create L2 queue database
First step in allowing a single PF/VF to open multiple queues on
the same queue zone is to add per-hwfn database of queue-cids
as a two-dimensional array where entry would be according to
[queue zone][internal index].

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:08:30 -04:00
Mintz, Yuval
6bea61da17 qed: Add bitmaps for VF CIDs
Each PF has a bitmap for its own ranges of CIDs, to allow easy grabbing
of an available CID when such is needed. But VFs are not using the same
mechanism, instead relying on hard-coded CIDs [ queue-index == cid ].

As an infrastructure step toward increasing number of CIDs of VFs,
the PF is going to maintain bitmaps for the VF CIDs as well -
the bitmaps would be per-VF and the ranges would be the same [in HW all
VFs of a given PF have the same mapping of CIDs, and the HW is capable
of distinguishing between those according to the VF index]

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:08:30 -04:00
David S. Miller
a619cc8bed Merge branch 'skb-sgvec-overflow'
Jason A. Donenfeld says:

====================
net: Avoiding stack overflow in skb_to_sgvec

The recent bug with macsec and historical one with virtio have
indicated that letting skb_to_sgvec trounce all over an sglist
without checking the length is probably a bad idea. And it's not
necessary either: an sglist already explicitly marks its last
item, and the initialization functions are diligent in doing so.
Thus there's a clear way of avoiding future overflows.

So, this patchset, from a high level, makes skb_to_sgvec return
a potential error code, and then adjusts all callers to check
for the error code. There are two situations in which skb_to_sgvec
might return such an error:

   1) When the passed in sglist is too small; and
   2) When the passed in skbuff is too deeply nested.

So, the first patch in this series handles the issues with
skb_to_sgvec directly, and the remaining ones then handle the call
sites.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:01:48 -04:00
Jason A. Donenfeld
e2fcad58fd virtio_net: check return value of skb_to_sgvec always
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:01:48 -04:00
Jason A. Donenfeld
cda7ea6903 macsec: check return value of skb_to_sgvec always
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:01:47 -04:00
Jason A. Donenfeld
89a5ea9966 rxrpc: check return value of skb_to_sgvec always
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:01:47 -04:00
Jason A. Donenfeld
3f29770723 ipsec: check return value of skb_to_sgvec always
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:01:47 -04:00
Jason A. Donenfeld
48a1df6533 skbuff: return -EMSGSIZE in skb_to_sgvec to prevent overflow
This is a defense-in-depth measure in response to bugs like
4d6fa57b4d ("macsec: avoid heap overflow in skb_to_sgvec"). There's
not only a potential overflow of sglist items, but also a stack overflow
potential, so we fix this by limiting the amount of recursion this function
is allowed to do. Not actually providing a bounded base case is a future
disaster that we can easily avoid here.

As a small matter of house keeping, we take this opportunity to move the
documentation comment over the actual function the documentation is for.

While this could be implemented by using an explicit stack of skbuffs,
when implementing this, the function complexity increased considerably,
and I don't think such complexity and bloat is actually worth it. So,
instead I built this and tested it on x86, x86_64, ARM, ARM64, and MIPS,
and measured the stack usage there. I also reverted the recent MIPS
changes that give it a separate IRQ stack, so that I could experience
some worst-case situations. I found that limiting it to 24 layers deep
yielded a good stack usage with room for safety, as well as being much
deeper than any driver actually ever creates.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:01:47 -04:00
David S. Miller
a11227dcc3 Merge branch 'bpf-Add-BPF-support-to-all-perf_event'
Merge branch 'bpf-Add-BPF-support-to-all-perf_event'

Alexei Starovoitov says:

====================
bpf: Add BPF support to all perf_event

v3->v4: one more tweak to reject unsupported events at map
update time as Peter suggested

v2->v3: more refactoring to address Peter's feedback.
Now all perf_events are attachable and readable

v1->v2: address Peter's feedback. Refactor patch 1 to allow attaching
bpf programs to all event types and reading counters from all of them as well
patch 2 - more tests
patch 3 - address Dave's feedback and document bpf_perf_event_read()
and bpf_perf_event_output() properly
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 21:58:24 -04:00