This change pulls the core functionality out of __netdev_alloc_skb and
places them in a new function named __alloc_rx_skb. The reason for doing
this is to make these bits accessible to a new function __napi_alloc_skb.
In addition __alloc_rx_skb now has a new flags value that is used to
determine which page frag pool to allocate from. If the SKB_ALLOC_NAPI
flag is set then the NAPI pool is used. The advantage of this is that we
do not have to use local_irq_save/restore when accessing the NAPI pool from
NAPI context.
In my test setup I saw at least 11ns of savings using the napi_alloc_skb
function versus the netdev_alloc_skb function, most of this being due to
the fact that we didn't have to call local_irq_save/restore.
The main use case for napi_alloc_skb would be for things such as copybreak
or page fragment based receive paths where an skb is allocated after the
data has been received instead of before.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch splits the netdev_alloc_frag function up so that it can be used
on one of two page frag pools instead of being fixed on the
netdev_alloc_cache. By doing this we can add a NAPI specific function
__napi_alloc_frag that accesses a pool that is only used from softirq
context. The advantage to this is that we do not need to call
local_irq_save/restore which can be a significant savings.
I also took the opportunity to refactor the core bits that were placed in
__alloc_page_frag. First I updated the allocation to do either a 32K
allocation or an order 0 page. This is based on the changes in commmit
d9b2938aa where it was found that latencies could be reduced in case of
failures. Then I also rewrote the logic to work from the end of the page to
the start. By doing this the size value doesn't have to be used unless we
have run out of space for page fragments. Finally I cleaned up the atomic
bits so that we just do an atomic_sub_and_test and if that returns true then
we set the page->_count via an atomic_set. This way we can remove the extra
conditional for the atomic_read since it would have led to an atomic_inc in
the case of success anyway.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use MODULE_VERSION() now that dummy driver has a version.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The disable_irq_nosync function, not the disable_irq function, must be
used to disable the DMA channel interrupt from within the interrupt
service routine. Change the disable_irq call to disable_irq_nosync.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit fb9962f3ce ("tipc: ensure all name sequences are properly
protected with its lock") involves below errors:
net/tipc/name_table.c:980 tipc_purge_publications() error: double lock 'spin_lock:&seq->lock'
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hook a ndo_set_mac_address callback, update the internal Ethernet MAC in
the netdevice structure, and finally write that address down to the
UniMAC registers. If the interface is down, and most likely clock gated,
we do not update the registers but just the local copy, such that next
ndo_open() call will effectively write down the address.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu says:
====================
remove bridge mode BRIDGE_MODE_SWDEV
BRIDGE_MODE_SWDEV was introduced to indicate switchdev offloads
for bridging from user space (In other words to call into the hw switch
port driver directly). But user can use existing BRIDGE_FLAGS_SELF
to call into the hw switch port driver today. swdev mode is not required
anymore. So, this patch removes it.
v4 - v5
incorporate comments
- Define BRIDGE_MODE_UNDEF to handle cases where mode is not defined
- reverse the order of patches
- include patch comments in all patches
====================
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes bridge mode swdev.
Users can use BRIDGE_FLAGS_SELF to indicate swdev offload
if needed.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove use of 'swdev' mode in rocker. rocker dev offloads
can use the BRIDGE_FLAGS_SELF to indicate offload to hardware.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds mode BRIDGE_MODE_UNDEF for cases where mode is not needed.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless-next 2014-12-08
Please pull this last batch of pending wireless updates for the 3.19 tree...
For the wireless bits, Johannes says:
"This time I have Felix's no-status rate control work, which will allow
drivers to work better with rate control even if they don't have perfect
status reporting. In addition to this, a small hwsim fix from Patrik,
one of the regulatory patches from Arik, and a number of cleanups and
fixes I did myself.
Of note is a patch where I disable CFG80211_WEXT so that compatibility
is no longer selectable - this is intended as a wake-up call for anyone
who's still using it, and is still easily worked around (it's a one-line
patch) before we fully remove the code as well in the future."
For the Bluetooth bits, Johan says:
"Here's one more bluetooth-next pull request for 3.19:
- Minor cleanups for ieee802154 & mac802154
- Fix for the kernel warning with !TASK_RUNNING reported by Kirill A.
Shutemov
- Support for another ath3k device
- Fix for tracking link key based security level
- Device tree bindings for btmrvl + a state update fix
- Fix for wrong ACL flags on LE links"
And...
"In addition to the previous one this contains two more cleanups to
mac802154 as well as support for some new HCI features from the
Bluetooth 4.2 specification.
From the original request:
'Here's what should be the last bluetooth-next pull request for 3.19.
It's rather large but the majority of it is the Low Energy Secure
Connections feature that's part of the Bluetooth 4.2 specification. The
specification went public only this week so we couldn't publish the
corresponding code before that. The code itself can nevertheless be
considered fairly mature as it's been in development for over 6 months
and gone through several interoperability test events.
Besides LE SC the pull request contains an important fix for command
complete events for mgmt sockets which also fixes some leaks of hci_conn
objects when powering off or unplugging Bluetooth adapters.
A smaller feature that's part of the pull request is service discovery
support. This is like normal device discovery except that devices not
matching specific UUIDs or strong enough RSSI are filtered out.
Other changes that the pull request contains are firmware dump support
to the btmrvl driver, firmware download support for Broadcom BCM20702A0
variants, as well as some coding style cleanups in 6lowpan &
ieee802154/mac802154 code.'"
For the NFC bits, Samuel says:
"With this one we get:
- NFC digital improvements for DEP support: Chaining, NACK and ATN
support added.
- NCI improvements: Support for p2p target, SE IO operand addition,
SE operands extensions to support proprietary implementations, and
a few fixes.
- NFC HCI improvements: OPEN_PIPE and NOTIFY_ALL_CLEARED support,
and SE IO operand addition.
- A bunch of minor improvements and fixes for STMicro st21nfcb and
st21nfca"
For the iwlwifi bits, Emmanuel says:
"Major works are CSA and TDLS. On top of that I have a new
firmware API for scan and a few rate control improvements.
Johannes find a few tricks to improve our CPU utilization
and adds support for a new spin of 7265 called 7265D.
Along with this a few random things that don't stand out."
And...
"I deprecate here -8.ucode since -9 has been published long ago.
Along with that I have a new activity, we have now better
a infrastructure for firmware debugging. This will allow to
have configurable probes insides the firmware.
Luca continues his work on NetDetect, this feature is now
complete. All the rest is minor fixes here and there."
For the Atheros bits, Kalle says:
"Only ath10k changes this time and no major changes. Most visible are:
o new debugfs interface for runtime firmware debugging (Yanbo)
o fix shared WEP (Sujith)
o don't rebuild whenever kernel version changes (Johannes)
o lots of refactoring to make it easier to add new hw support (Michal)
There's also smaller fixes and improvements with no point of listing
here."
In addition, there are a few last minute updates to ath5k,
ath9k, brcmfmac, brcmsmac, mwifiex, rt2x00, rtlwifi, and wil6210.
Also included is a pull of the wireless tree to pick-up the fixes
originally included in "pull request: wireless 2014-12-03"...
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
PTR_ALIGN macro after skb_reserve is redundant, because skb_reserve
function adjusts the alignment of skb->data.
Signed-off-by: Mitsuhiro Kimura <mitsuhiro.kimura.kc@renesas.com>
Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both of 'boguscnt' and 'quota' have nearly meaning as the condition of
the reception loop.
In order to cut down redundant processing, this patch changes excess
judgement.
Signed-off-by: Mitsuhiro Kimura <mitsuhiro.kimura.kc@renesas.com>
Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
the queue length of sd->input_pkt_queue has been put into qlen,
and impossible to change, since hold the lock
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2014-12-09
This series contains updates to i40e and i40evf.
Jeff (me) provides a single patch to convert a macro to a static inline
function based on feedback from Joe Perches on a previous patch.
Shannon provides the remaining twelve patches against i40e. Almost all
of Shannon's patches cleanup/fix NVM issues varying in range from
adding more detail to debug messages, to removing dead code, to fixing
NVM state transitions after an error. Change the handy decoder interface
for admin queue return code to help catch and properly report the condition
as a useful errno rather than returning a misleading '0'. Added a range
check to avoid any possible array index-out-of-bound issues.
v2:
- fixed up patch 05 in the series to use the ARRAY_SIZE() macro as suggested
by Sergei Shtylyov
- fix up patch 13 to remove unnecessary parens in the return statement
as suggested by Sergei Shtylyov
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUhNqoAAoJECte4hHFiupU57IP/1ioNl+EkM8ZXCH+pZCsMuoF
S33lLQjJ2WEh2WZXDEJqGWdv7FRh5dUyRB67TpMCzQa8lsyPykapFAy4s1DEEZ46
EbsRHjkJdw+fg3dRGp33XPD55t2xXz9CYB7OuVGLjBEWdFb5a2by+JYCctTynqum
xI+qGo6IKcAvlyYAmiopZ+FOBUMhRo30GkkzVnoIsQn+Z1HYEdJ+QGryL1rOY01D
Gt4d+hZ6bT08yy+4ZB3Sr6/H3w4e8saUCS8H+JyLVYR+quM0T/uV4drqk/21kUNU
954LPu5GY5l6gYDEaki96Rc6DpuqsWlgy7oh1E3p9XN0vZFPEjmFXkic28hHpvKm
nDThB9qllwYUu9hmALaMuxkbRmJK/NvFwlzdtp0uZIiiENGGQrD368wiWxyzD3aP
HvthWTNM2E+T15gmmzUNnGPbaTWgxjp4G4wEucX/yLiZDTu0ftoFBvnRy3emWhI0
3N1Lf3ZBGYuHQvyUMWMgQ53nwuPuDgcVy/wYEUu11rI4zFcP7OmrznPhtnwfwQmz
lMppDC0d3L0PGjI4/oKXJAXrCuAVldv+eLFOpHaJuXU+VuglEOpetjUDMv2A0hbQ
23HcX+rIRd+8M8H+RtAYrqmocAOw70/cy0NzuLfI8a7kOW9H55dHADx4IFTae2E+
X1dBTj1EHrIlyw6lkC9e
=icE1
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-3.19-20141207' of git://gitorious.org/linux-can/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2014-12-07
this is a pull request of 8 patches for net-next/master.
Andri Yngvason contributes 4 patches in which the CAN state change
handling is consolidated and unified among the sja1000, mscan and
flexcan driver. The three patches by Jeremiah Mahler fix spelling
mistakes and eliminate the banner[] variable in various parts. And a
patch by me that switches on sparse endianess checking by default.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 95bd09eb27 ("tcp: TSO packets automatic sizing") tried to
control TSO size, but did this at the wrong place (sendmsg() time)
At sendmsg() time, we might have a pessimistic view of flow rate,
and we end up building very small skbs (with 2 MSS per skb).
This is bad because :
- It sends small TSO packets even in Slow Start where rate quickly
increases.
- It tends to make socket write queue very big, increasing tcp_ack()
processing time, but also increasing memory needs, not necessarily
accounted for, as fast clones overhead is currently ignored.
- Lower GRO efficiency and more ACK packets.
Servers with a lot of small lived connections suffer from this.
Lets instead fill skbs as much as possible (64KB of payload), but split
them at xmit time, when we have a precise idea of the flow rate.
skb split is actually quite efficient.
Patch looks bigger than necessary, because TCP Small Queue decision now
has to take place after the eventual split.
As Neal suggested, introduce a new tcp_tso_autosize() helper, so that
tcp_tso_should_defer() can be synchronized on same goal.
Rename tp->xmit_size_goal_segs to tp->gso_segs, as this variable
contains number of mss that we can put in GSO packet, and is not
related to the autosizing goal anymore.
Tested:
40 ms rtt link
nstat >/dev/null
netperf -H remote -l -2000000 -- -s 1000000
nstat | egrep "IpInReceives|IpOutRequests|TcpOutSegs|IpExtOutOctets"
Before patch :
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/s
87380 2000000 2000000 0.36 44.22
IpInReceives 600 0.0
IpOutRequests 599 0.0
TcpOutSegs 1397 0.0
IpExtOutOctets 2033249 0.0
After patch :
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 2000000 2000000 0.36 44.27
IpInReceives 221 0.0
IpOutRequests 232 0.0
TcpOutSegs 1397 0.0
IpExtOutOctets 2013953 0.0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
... making both non-draining. That means that tcp_recvmsg() becomes
non-draining. And _that_ would break iscsit_do_rx_data() unless we
a) make sure tcp_recvmsg() is uniformly non-draining (it is)
b) make sure it copes with arbitrary (including shifted)
iov_iter (it does, all it uses is iov_iter primitives)
c) make iscsit_do_rx_data() initialize ->msg_iter only once.
Fortunately, (c) is doable with minimal work and we are rid of one
the two places where kernel send/recvmsg users would be unhappy with
non-draining behaviour.
Actually, that makes all but one of ->recvmsg() instances iov_iter-clean.
The exception is skcipher_recvmsg() and it also isn't hard to convert
to primitives (iov_iter_get_pages() is needed there). That'll wait
a bit - there's some interplay with ->sendmsg() path for that one.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Just use copy_from_iter(). That's what this method is trying to do
in all cases, in a very convoluted fashion.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Note that the code _using_ ->msg_iter at that point will be very
unhappy with anything other than unshifted iovec-backed iov_iter.
We still need to convert users to proper primitives.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
it'll die soon enough - now that kvec-backed iov_iter works regardless
of set_fs(), both instances will become copy_from_iter() as soon as
we introduce ->msg_iter...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Replace a misspelled function name by %s and then __func__.
This was done using Coccinelle, including the use of Levenshtein distance,
as proposed by Rasmus Villemoes.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace a misspelled function name by %s and then __func__.
This was done using Coccinelle, including the use of Levenshtein distance,
as proposed by Rasmus Villemoes.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace a misspelled function name by %s and then __func__.
This was done using Coccinelle, including the use of Levenshtein distance,
as proposed by Rasmus Villemoes.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace a misspelled function name by %s and then __func__.
In the first case, the print is just dropped, because kmalloc itself does
enough error reporting.
This was done using Coccinelle, including the use of Levenshtein distance,
as proposed by Rasmus Villemoes.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function name contains cleanup, not clean.
This was done using Coccinelle, including the use of Levenshtein distance,
as proposed by Rasmus Villemoes.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If device is already used as an ipvlan port then refuse to
use it as a macvlan port at early stage of port creation.
thost1:~# ip link add link eth0 ipvl0 type ipvlan
thost1:~# echo $?
0
thost1:~# ip link add link eth0 mvl0 type macvlan
RTNETLINK answers: Device or resource busy
thost1:~# echo $?
2
thost1:~#
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the port check [ipvlan_dev_master()] and device check
[ipvlan_dev_slave()] functions to netdevice.h and rename them
netif_is_ipvlan_port() and netif_is_ipvlan() resp. to be
consistent with macvlan api naming.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a device is already a macvlan port then refuse to use it as
an ipvlan port in the early stage of port creation.
thost1:~# ip link add link eth0 mvl0 type macvlan
thost1:~# echo $?
0
thost1:~# ip link add link eth0 ipvl0 type ipvlan
RTNETLINK answers: Device or resource busy
thost1:~# echo $?
2
thost1:~#
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to a check for macvlan device, netif_is_macvlan(), add
another function to check if a device is used as macvlan port.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit f886497212 ("ipv4: fix dst race in sk_dst_get()")
DST_NOCACHE dst_entries get freed by RCU. So there is no need to get a
reference on them when we are in rcu protected sections.
Cc: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
The command 'ethtool -i' is useful to find details
about the interface like the device driver being used.
This was missing for dummy driver.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the erronous usage of an hexadecimal address in the
example, by replacing it with a decimal address.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Respect what the caller passed to ovs_tunnel_get_egress_info.
Fixes: 8f0aad6f35 ("openvswitch: Extend packet attribute for egress tunnel info")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit fbe168ba91 ("net: generic dev_disable_lro() stacked
device handling"), dev_disable_lro() zeroes NETIF_F_LRO feature flag
first for a macvlan device and then for its lower device. As an attempt
to set NETIF_F_LRO to zero is ignored, dev_disable_lro() issues a
warning and taints kernel.
Allowing NETIF_F_LRO to be set independently of the lower device
consists of three parts:
- add the flag to hw_features to allow toggling it
- allow setting it to 0 even if lower device has the flag set
- add the flag to MACVLAN_FEATURES to restore copying from lower
device on macvlan creation
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Inline functions are preferred over macros when they can be used
interchangeably.
CC: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add a little more state context to an NVM update debug message.
Change-ID: I512160259052bcdbe5bdf1adf403ab2bf7984970
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Decoding the AQ return code is great except when the AQ send timed out
and there's no return code set. This changes the handy decoder
interface to help catch and properly report the condition as a useful
errno rather than returning a misleading '0'.
Change-ID: I07a1f94f921606da49ffac7837bcdc37cd8222eb
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>