Configure arp request broadcast filter if this
option is enabled, in order to allow only arp
request broadcasts to pass-in.
(A following patch will make this filter even narrower
by limiting the arp request to our own ip)
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Broadcast filtering allows dropping broadcast
frames that don't match the configured patterns.
Use predefined filters, and configure them for
each associated station vif.
There is no need to optimize and attach the same
filter to multiple vifs, as a following patch
will configure each filter to have per-vif unique
values.
Configure the bcast filtering on assoc changes.
Add a new IWLWIFI_BCAST_FILTERING Kconfig option
in order to enable broadcast filtering.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
As a debug tool, we dump the SRAM from the device when an
error occurs. The main users of this want it in a different
format, so change the format to suit their needs.
Also - add a short delay between the prints to make sure
that the user space logger can catch up.
This happens only when the firmware asserts, and only when
fw_restart is set to 0 which is typically a testing
configuration.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Based on the Bluetooth activity grading, we can stop using
the shared antenna and ask the stations to honor the new
SMPS state.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
If a vif is in low latency mode, it should be in primary
channel.
Also tell BT Coex about the change when a vif enters or
exits low latency mode.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Limit the scheduling duration of bindings without a low-latency
interface in the firmware, this prevents those bindings from
occupying the medium for a period of time longer than what we
want for the other interfaces in low-latency mode.
As older firmware doesn't do anything with the max_duration field
and ignores it completely, there's no need for a firmware flag.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
If there is/are interface(s) in low-latency mode, reserve a
percentage (currently 64%) of the quota for that binding to
improve the quality of service for those interfaces. However,
if there's more than one binding that has low-latency, then
give up and don't reserve, we can't allocate more than 100%.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
While an interface is in low-latency mode, for now powersave
should be disabled for it, so take low-latency into account
in the powersave code and force powersave recalculation when
low-latency mode changes.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
For various traffic use cases, we want to be able to treat multi-
channel scenarios differently. Introduce a low-latency framework
that currently only has a debugfs file to enable low-latency mode,
but can later be extended.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Notify scan completed if fw_restart flow isn't going to be run.
Otherwise, the scan will stay stack forever and mac80211 will
not be able to remove the interface.
Signed-off-by: David Spinadel <david.spinadel@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Don't stop scheduled scan before reporting HW restart;
mac80211 was changed to reschedule it after reconfigure.
Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
In iwl_pcie_int_cause_non_ict, trans_pcie is used for lockdep
purposes only. Since this might not be enabled, trans_pcie
finds itself without user leading to a complaint from gcc.
Avoid using trans_pcie by inlining IWL_TRANS_GET_PCIE_TRANS.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Since we use IWL_MVM_STATION_COUNT all over the driver, we
need to make sure that it is the right constant to look at.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
We somtimes need to fetch the iwl_mvm_sta structure from a
station index - provide a helper to do that.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
iwlwifi-7260-8.ucode has been release. Warn if it is not
on the file system.
iwlwifi-7260-7.ucode is still supported for another kernel
version.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
The code seems fine, as buf won't be assigned when an error
is returned, but checking for the error first is easier to
understand.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Also handle the bypass mode in which the second CPU doesn't
interfere.
Signed-off-by: Eran Harary <eran.harary@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
This configuration is invalid for this family.
Signed-off-by: Eran Harary <eran.harary@intel.com>
Reviewed-by: Dor Shaish <dor.shaish@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
This register is not present in 8000 family devices.
There is prph register instead.
Signed-off-by: Eran Harary <eran.harary@intel.com>
Reviewed-by: Dor Shaish <dor.shaish@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
APMG HW block was removed in this NIC, hence, no need to
configure it.
Signed-off-by: Eran Harary <eran.harary@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
The identification of the hardware section in the NVM
of new devices has been changed, hence the need to add it
to iwl_cfg and adapt the code that uses this value
accordingly.
Signed-off-by: Eran Harary <eran.harary@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Newer firmware will support uAPSD clients in AP/GO mode, so complete
the driver support for it. The way it works is described in comments
in the code, but basically the driver just has to pass down all the
mac80211 requests and do accounting on agg/non-agg queues properly.
For older firmware, this doesn't change anything as it ignores the
fields used by the new firmware, and we only advertise uAPSD support
when the firmware does.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
The driver wasn't reading the NVM properly. While this
didn't lead to any issue until now, it seems that there
is an old version of the NVM in the wild.
In this version, the A band channels appear to be valid
but the SKU capabilities (another field of the NVM) says
that A band isn't supported at all.
With this specific version of the NVM, the driver would
think that A band is supported while the HW / firmware
don't. This leads to asserts.
Cc: <stable@vger.kernel.org> [3.10+]
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Not doing so will let BT kill our probe requests leading to
failures in scan.
Cc: <stable@vger.kernel.org> [3.10+]
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Merge misc fixes from Andrew Morton:
"A few hotfixes and various leftovers which were awaiting other merges.
Mainly movement of zram into mm/"
* emailed patches fron Andrew Morton <akpm@linux-foundation.org>: (25 commits)
memcg: fix mutex not unlocked on memcg_create_kmem_cache fail path
Documentation/filesystems/vfs.txt: update file_operations documentation
mm, oom: base root bonus on current usage
mm: don't lose the SOFT_DIRTY flag on mprotect
mm/slub.c: fix page->_count corruption (again)
mm/mempolicy.c: fix mempolicy printing in numa_maps
zram: remove zram->lock in read path and change it with mutex
zram: remove workqueue for freeing removed pending slot
zram: introduce zram->tb_lock
zram: use atomic operation for stat
zram: remove unnecessary free
zram: delay pending free request in read path
zram: fix race between reset and flushing pending work
zsmalloc: add maintainers
zram: add zram maintainers
zsmalloc: add copyright
zram: add copyright
zram: remove old private project comment
zram: promote zram from staging
zsmalloc: move it under mm
...
Pull more powerpc bits from Ben Herrenschmidt:
"Here are a few more powerpc bits for this merge window. The bulk is
made of two pull requests from Scott and Anatolij that I had missed
previously (they arrived while I was away). Since both their branches
are in -next independently, and the content has been around for a
little while, they can still go in.
The rest is mostly bug and regression fixes, a small series of
cleanups to our pseries cpuidle code (including moving it to the right
place), and one new cpuidle bakend for the powernv platform. I also
wired up the new sched_attr syscalls"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (37 commits)
powerpc: Wire up sched_setattr and sched_getattr syscalls
powerpc/hugetlb: Replace __get_cpu_var with get_cpu_var
powerpc: Make sure "cache" directory is removed when offlining cpu
powerpc/mm: Fix mmap errno when MAP_FIXED is set and mapping exceeds the allowed address space
powerpc/powernv/cpuidle: Back-end cpuidle driver for powernv platform.
powerpc/pseries/cpuidle: smt-snooze-delay cleanup.
powerpc/pseries/cpuidle: Remove MAX_IDLE_STATE macro.
powerpc/pseries/cpuidle: Make cpuidle-pseries backend driver a non-module.
powerpc/pseries/cpuidle: Use cpuidle_register() for initialisation.
powerpc/pseries/cpuidle: Move processor_idle.c to drivers/cpuidle.
powerpc: Fix 32-bit frames for signals delivered when transactional
powerpc/iommu: Fix initialisation of DART iommu table
powerpc/numa: Fix decimal permissions
powerpc/mm: Fix compile error of pgtable-ppc64.h
powerpc: Fix hw breakpoints on !HAVE_HW_BREAKPOINT configurations
clk: corenet: Adds the clock binding
powerpc/booke64: Guard e6500 tlb handler with CONFIG_PPC_FSL_BOOK3E
powerpc/512x: dts: add MPC5125 clock specs
powerpc/512x: clk: support MPC5121/5123/5125 SoC variants
powerpc/512x: clk: enforce even SDHC divider values
...
It is required to call put_device() if device_register() fails, so that
we give up the last reference to the device. Calling put_device allows
for mdiobus_release to be executed, kfreeing the bus.
Signed-off-by: Levente Kurusa <levex@linux.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: David Daney <david.daney@cavium.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We had a bug that prevented us from removing a station
after we entered the drain flow:
We assign sta to be NULL if it was an error value.
Then we tested it against -EBUSY, but forget to retrieve
the value again from mvm->fw_id_to_mac_id[sta_id].
Due to this bug, we ended up never removing the STA from
the firmware. This led to an firmware assert when we remove
the GO vif.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Configure scheduled scan to notify match found on every beacon
or probe response if the scan request doesn't contain valid ssid
list for filtering.
Without this configuration the FW passes all beacons to the host
but doesn't notify the stack that the scan results are ready for
processing.
Signed-off-by: David Spinadel <david.spinadel@intel.com>
Reviewed-by: Alexander Bondar <alexander.bondar@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
This can be useful to be able to spot the firmware version
from the error reports without needing to fetch it from
another place.
Cc: <stable@vger.kernel.org> [3.10+]
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
The iwlwifi scheduled scan implementation doesn't adhere to the
userspace API correctly - the API assumes that any new incoming
'incompatible' request (like scan or remain-on-channel for this
driver) will just cancel the scheduled scan. Instead our driver
relies on userspace cancelling it, thus breaking existing wpa_s
versions.
Cc: stable@vger.kernel.org [3.13]
Fixes: 35a000b7c1 ("iwlwifi: mvm: support sched scan if supported by the fw")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
The address pointer used in the function shouldn't be static
since it's local data only. Having it static causes races if
a single machine has two devices, as the pointer would be
shared between instances.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Pull networking fixes from David Miller:
"Several fixups, of note:
1) Fix unlock of not held spinlock in RXRPC code, from Alexey
Khoroshilov.
2) Call pci_disable_device() from the correct shutdown path in bnx2x
driver, from Yuval Mintz.
3) Fix qeth build on s390 for some configurations, from Eugene
Crosser.
4) Cure locking bugs in bond_loadbalance_arp_mon(), from Ding
Tianhong.
5) Must do netif_napi_add() before registering netdevice in sky2
driver, from Stanislaw Gruszka.
6) Fix lost bug fix during merge due to code movement in ieee802154,
noticed and fixed by the eagle eyed Stephen Rothwell.
7) Get rid of resource leak in xen-netfront driver, from Annie Li.
8) Bounds checks in qlcnic driver are off by one, from Manish Chopra.
9) TPROXY can leak sockets when TCP early demux is enabled, fix from
Holger Eitzenberger"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (32 commits)
qeth: fix build of s390 allmodconfig
bonding: fix locking in bond_loadbalance_arp_mon()
tun: add device name(iff) field to proc fdinfo entry
DT: net: davinci_emac: "ti, davinci-no-bd-ram" property is actually optional
DT: net: davinci_emac: "ti, davinci-rmii-en" property is actually optional
bnx2x: Fix generic option settings
net: Fix warning on make htmldocs caused by skbuff.c
llc: remove noisy WARN from llc_mac_hdr_init
qlcnic: Fix loopback test failure
qlcnic: Fix tx timeout.
qlcnic: Fix initialization of vlan list.
qlcnic: Correct off-by-one errors in bounds checks
net: Document promote_secondaries
net: gre: use icmp_hdr() to get inner ip header
i40e: Add missing braces to i40e_dcb_need_reconfig()
xen-netfront: fix resource leak in netfront
net: 6lowpan: fixup for code movement
hyperv: Add support for physically discontinuous receive buffer
sky2: initialize napi before registering device
net: Fix memory leak if TPROXY used with TCP early demux
...
The commit 1d3ee88ae0
(bonding: add netlink attributes to slave link dev)
has add rtmsg_ifinfo() in bond_set_active_slave() and
bond_set_backup_slave(), so the two function need to
called in RTNL lock, but bond_loadbalance_arp_mon()
only calling these functions in RCU, warning message
will occurs.
fix this by add a new function bond_slave_state_change(),
which will reset the slave's state after slave link check,
so remove the bond_set_xxx_slave() from the cycle and only
record the slave_state_changed, this will call the new
function to set all slaves to new state in RTNL later.
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A file descriptor opened for /dev/net/tun and a tun device are
connected with ioctl. Though understanding the connection is
important for trouble shooting, no way is given to a user to know
the connected device for a given file descriptor at userland.
This patch adds a new fdinfo field for the device name connected to
a file descriptor opened for /dev/net/tun.
Here is an example of the field:
# lsof | grep tun
qemu-syst 4565 qemu 25u CHR 10,200 0t138 12921 /dev/net/tun
...
# cat /proc/4565/fdinfo/25
pos: 138
flags: 0104002
iff: vnet0
# ip link show dev vnet0
8: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...
changelog:
v2: indent iff just like the other fdinfo fields are.
v3: remove unused variable.
Both are suggested by David Miller <davem@davemloft.net>.
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
<<
Switch mpc512x to the common clock framework and adapt mpc512x
drivers to use the new clock driver. Old PPC_CLOCK code is
removed entirely since there are no users any more.
>>
When user tried to change generic options using "ethtool -s" command, while SFP
module is plugged out or during module detection, the command would have failed
with "Unsupported port type" message. The fix is to ignore the port option in
case it's same as the current port configuration.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver was returning from link event handler without
setting linkup variable
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o __qlcnic_down call's netif_tx_disable which in turn stops
all the TX queues, corresponding start queue was missing in
__qlcnic_up which was leading to tx timeout.
o The commit b84caae486
(qlcnic: Fix usage of netif_tx_{wake, stop} api during link change.)
exposed this issue.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Do not re-initialize vlan list in case of adapter reset.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Bound checks should be >= instead of > for number of receive descriptors
and number of receive rings.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull powerpc updates from Ben Herrenschmidt:
"So here's my next branch for powerpc. A bit late as I was on vacation
last week. It's mostly the same stuff that was in next already, I
just added two patches today which are the wiring up of lockref for
powerpc, which for some reason fell through the cracks last time and
is trivial.
The highlights are, in addition to a bunch of bug fixes:
- Reworked Machine Check handling on kernels running without a
hypervisor (or acting as a hypervisor). Provides hooks to handle
some errors in real mode such as TLB errors, handle SLB errors,
etc...
- Support for retrieving memory error information from the service
processor on IBM servers running without a hypervisor and routing
them to the memory poison infrastructure.
- _PAGE_NUMA support on server processors
- 32-bit BookE relocatable kernel support
- FSL e6500 hardware tablewalk support
- A bunch of new/revived board support
- FSL e6500 deeper idle states and altivec powerdown support
You'll notice a generic mm change here, it has been acked by the
relevant authorities and is a pre-req for our _PAGE_NUMA support"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (121 commits)
powerpc: Implement arch_spin_is_locked() using arch_spin_value_unlocked()
powerpc: Add support for the optimised lockref implementation
powerpc/powernv: Call OPAL sync before kexec'ing
powerpc/eeh: Escalate error on non-existing PE
powerpc/eeh: Handle multiple EEH errors
powerpc: Fix transactional FP/VMX/VSX unavailable handlers
powerpc: Don't corrupt transactional state when using FP/VMX in kernel
powerpc: Reclaim two unused thread_info flag bits
powerpc: Fix races with irq_work
Move precessing of MCE queued event out from syscall exit path.
pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines
powerpc: Make add_system_ram_resources() __init
powerpc: add SATA_MV to ppc64_defconfig
powerpc/powernv: Increase candidate fw image size
powerpc: Add debug checks to catch invalid cpu-to-node mappings
powerpc: Fix the setup of CPU-to-Node mappings during CPU online
powerpc/iommu: Don't detach device without IOMMU group
powerpc/eeh: Hotplug improvement
powerpc/eeh: Call opal_pci_reinit() on powernv for restoring config space
powerpc/eeh: Add restore_config operation
...
Indentation mismatch spotted with Coverity.
Introduced in 4e3b35b044 ("i40e: add DCB and DCBNL support")
Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes grant transfer releasing code from netfront, and uses
gnttab_end_foreign_access to end grant access since
gnttab_end_foreign_access_ref may fail when the grant entry is
currently used for reading or writing.
* clean up grant transfer code kept from old netfront(2.6.18) which grants
pages for access/map and transfer. But grant transfer is deprecated in current
netfront, so remove corresponding release code for transfer.
* fix resource leak, release grant access (through gnttab_end_foreign_access)
and skb for tx/rx path, use get_page to ensure page is released when grant
access is completed successfully.
Xen-blkfront/xen-tpmfront/xen-pcifront also have similar issue, but patches
for them will be created separately.
V6: Correct subject line and commit message.
V5: Remove unecessary change in xennet_end_access.
V4: Revert put_page in gnttab_end_foreign_access, and keep netfront change in
single patch.
V3: Changes as suggestion from David Vrabel, ensure pages are not freed untill
grant acess is ended.
V2: Improve patch comments.
Signed-off-by: Annie Li <annie.li@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will allow us to use bigger receive buffer, and prevent allocation failure
due to fragmented memory.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we're calling it from under RCU context, however we're using some
functions that require rtnl to be held.
Fix this by restructuring the locking - don't call it under any locks,
aquire rcu_read_lock() if we're sending _only_ (i.e. we have the active
slave present), and use rtnl locking otherwise - if we need to modify
(in)active flags of a slave.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently bond_ab_arp_probe() is always called under rcu_read_lock(),
however to work with curr_active_slave we're still holding the
curr_slave_lock.
To remove that curr_slave_lock - rcu_dereference the bond's
curr_active_slave and use it further - so that we're sure the slave won't
go away, and we don't care if it will change in the meanwhile.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Submission d9aee59 "bnx2x: Don't release PCI bars on shutdown" separated
the PCI remove and shutdown flows, but pci_disable_device() is still
being called on both.
As a result, a dev_WARN_ONCE will be hit during shutdown for every bnx2x
VF probed on a hypervisor (as its shutdown callback will be called and later
pci_disable_sriov() will call its remove callback).
This calls the pci_disable_device() only on the remove flow.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Logging the MAC address on every if-up, is not really useful, and annoying when
there is no cable inserted and NetworkManager tries the ifup every 50 seconds.
Also change the log level from warning to info, as that is what it is.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Logging a PTP error on hw which simply does not support PTP is not very
useful. Moreover this message gets logged on every if-up, and if there is
no cable inserted NetworkManager will re-try the ifup every 50 seconds.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/ethernet/8390/apne.c: In function ‘apne_probe1’:
drivers/net/ethernet/8390/apne.c:215: warning: unused variable ‘ei_local’
Introduced by commit c45f812f02 ("8390 :
Replace ei_debug with msg_enable/NETIF_MSG_* feature"), which added the
variable without using it.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
1) BPF debugger and asm tool by Daniel Borkmann.
2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.
3) Correct reciprocal_divide and update users, from Hannes Frederic
Sowa and Daniel Borkmann.
4) Currently we only have a "set" operation for the hw timestamp socket
ioctl, add a "get" operation to match. From Ben Hutchings.
5) Add better trace events for debugging driver datapath problems, also
from Ben Hutchings.
6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we
have a small send and a previous packet is already in the qdisc or
device queue, defer until TX completion or we get more data.
7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.
8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
Borkmann.
9) Share IP header compression code between Bluetooth and IEEE802154
layers, from Jukka Rissanen.
10) Fix ipv6 router reachability probing, from Jiri Benc.
11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.
12) Support tunneling in GRO layer, from Jerry Chu.
13) Allow bonding to be configured fully using netlink, from Scott
Feldman.
14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
already get the TCI. From Atzm Watanabe.
15) New "Heavy Hitter" qdisc, from Terry Lam.
16) Significantly improve the IPSEC support in pktgen, from Fan Du.
17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom
Herbert.
18) Add Proportional Integral Enhanced packet scheduler, from Vijay
Subramanian.
19) Allow openvswitch to mmap'd netlink, from Thomas Graf.
20) Key TCP metrics blobs also by source address, not just destination
address. From Christoph Paasch.
21) Support 10G in generic phylib. From Andy Fleming.
22) Try to short-circuit GRO flow compares using device provided RX
hash, if provided. From Tom Herbert.
The wireless and netfilter folks have been busy little bees too.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
net/cxgb4: Fix referencing freed adapter
ipv6: reallocate addrconf router for ipv6 address when lo device up
fib_frontend: fix possible NULL pointer dereference
rtnetlink: remove IFLA_BOND_SLAVE definition
rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
qlcnic: update version to 5.3.55
qlcnic: Enhance logic to calculate msix vectors.
qlcnic: Refactor interrupt coalescing code for all adapters.
qlcnic: Update poll controller code path
qlcnic: Interrupt code cleanup
qlcnic: Enhance Tx timeout debugging.
qlcnic: Use bool for rx_mac_learn.
bonding: fix u64 division
rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
sfc: Use the correct maximum TX DMA ring size for SFC9100
Add Shradha Shah as the sfc driver maintainer.
net/vxlan: Share RX skb de-marking and checksum checks with ovs
tulip: cleanup by using ARRAY_SIZE()
ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
net/cxgb4: Don't retrieve stats during recovery
...
- Flow steering for InfiniBand UD traffic
- IP-based addressing for IBoE aka RoCE
- Pass SRP submaintainership from Dave to Bart
- SRP transport fixes from Bart
- Add the new Cisco usNIC low-level device driver
- Various other fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)
iQIcBAABCAAGBQJS4rzgAAoJEENa44ZhAt0hOL4P/jrNK2GeXqfzWIURL5MYtG9A
YK8xnRonlKQNo850E0WuC5wHCHqQ6Pqze+PL1rgR/MegNGrQ577qKo2eYumnMHSy
NO2BNhHa+5cUf04dXWOeJgyMTqo7CKwO7trZ6KwD+HFBAZqLDTFHPklH0qMI2bF6
U8HbKVslrvaDL3PywHop9Gxh9fWKY8ngw7LWPKkm5PQ0BFw8lZLOrGhWYr1MfJoY
iptf+wqehqlO8u7khfpo8tvar0hGbRYrUanx94RU/B5FbiQN936AXURtmbM+4MDD
o0QhzJKaaCmB1eYaeLsrEHyGcgAnifFPNzq/SLeRvL3TYIfMvTFWDECZDsdz0n5y
YuyuIQvs3FcbP9C014e2o8SXEfdJoR4Ht6XH2+wwDCD55t66ZnBHupYiVdYEJz09
UKBvvlY+v5cdzUmOeut21NgLHqQ/zpqWihfEFdTwXBNmKY27Ai9JILpFnrRCppYh
mawcEPKEXX1c0Adr1bXXsFBWhONgEOQFoth4FpVK31hJ1o2F9EyTZdLObcNHWcts
NdzOQ9S5UDcYN5OYfCM183cf/JmDBJB3Q01ms/1L2rhpLnoYA/Mj2BDzF+82FsMK
0BrPm7vX28a1mUVgLgpEpk1VEhJUJxzrmK8xogV9dU6vyGWmXPWhmBolLbPCx3SY
6dQ6u8v8tdl54CXStFuP
=wAWA
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband updates from Roland Dreier:
"Main batch of InfiniBand/RDMA changes for 3.14:
- Flow steering for InfiniBand UD traffic
- IP-based addressing for IBoE aka RoCE
- Pass SRP submaintainership from Dave to Bart
- SRP transport fixes from Bart
- Add the new Cisco usNIC low-level device driver
- Various other fixes"
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (75 commits)
IB/mlx5: Verify reserved fields are cleared
IB/mlx5: Remove old field for create mkey mailbox
IB/mlx5: Abort driver cleanup if teardown hca fails
IB/mlx5: Allow creation of QPs with zero-length work queues
mlx5_core: Fix PowerPC support
mlx5_core: Improve debugfs readability
IB/mlx5: Add support for resize CQ
IB/mlx5: Implement modify CQ
IB/mlx5: Make sure doorbell record is visible before doorbell
mlx5_core: Use mlx5 core style warning
IB/mlx5: Clear out struct before create QP command
mlx5_core: Fix out arg size in access_register command
RDMA/nes: Slight optimization of Ethernet address compare
IB/qib: Fix QP check when looping back to/from QP1
RDMA/cxgb4: Fix gcc warning on 32-bit arch
IB/usnic: Remove unused includes of <linux/version.h>
RDMA/amso1100: Add check if cache memory was allocated before freeing it
IPoIB: Report operstate consistently when brought up without a link
IB/core: Fix unused variable warning
RDMA/cma: Handle global/non-linklocal IPv6 addresses in cma_check_linklocal()
...
The adapter is freed before we check its flags. It was caused
by commit 144be3d ("net/cxgb4: Avoid disabling PCI device for
towice"). The problem was reported by Intel's "0-day" tool.
The patch fixes it to avoid reverting commit 144be3d.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Refactored MSI-x vector calculation for All adapters.
Decoupled logic in the code which was using same call to
request MSI-x vectors in default driver load, as well as
during set_channel() operation for TSS/RSS. This refactoring
simplifies code for TSS/RSS code path as well as probe path
of the driver load for all adapters.
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Refactor configuration of interrupt coalescing parameters for
all supported adapters.
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for MSI/MSI-X mode in poll controller routine.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Added hardware ops for interrupt enable/disable functions
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Dump each Tx queue details with all descriptors, queue indices
and Tx queue stats to imporve data colletion in situations
where Tx timeout occurs.
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Use boolean type instead of u8.
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the option conversion downdelay and updelay divide a u64
and on a 32 bit this causes the following errors:
ERROR: "__udivdi3" [drivers/net/bonding/bonding.ko] undefined!
ERROR: "__umoddi3" [drivers/net/bonding/bonding.ko] undefined!
Fix it by using a normal int instead because newval->value is capped
at INT_MAX by the way the option is defined.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of a workaround for a hardware erratum in the SFC9100 family
(SF bug 35388), the TX_DESC_UPD_DWORD register address is also used
for communicating with the event block, and only descriptor pointer
values < 2048 are valid.
If the TX DMA ring size is increased to 4096 descriptors (which the
firmware still allows) then we may write a descriptor pointer
value >= 2048, which has entirely different and undesirable effects!
Limit the TX DMA ring size correctly when this workaround is in
effect.
Fixes: 8127d661e7 ('sfc: Add support for Solarflare SFC9100 family')
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure the practice set by commit 0afb166 "vxlan: Add capability
of Rx checksum offload for inner packet" is applied when the skb
goes through the portion of the RX code which is shared between
vxlan netdevices and ovs vxlan port instances.
Cc: Joseph Gasparakis <joseph.gasparakis@intel.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In this situation then ARRAY_SIZE() and sizeof() are the same, but we're
really dealing with array indexes and not byte offsets so ARRAY_SIZE()
is cleaner.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We possibly retrieve the adapter's statistics during EEH recovery
and that should be disallowed. Otherwise, it would possibly incur
replicate EEH error and EEH recovery is going to fail eventually.
The patch reuses statistics lock and checks net_device is attached
before going to retrieve statistics, so that the problem can be
avoided.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we have EEH error happens to the adapter and we have to remove
it from the system for some reasons (e.g. more than 5 EEH errors
detected from the device in last hour), the adapter will be disabled
for towice separately by eeh_err_detected() and remove_one(), which
will incur following unexpected backtrace. The patch tries to avoid
it.
WARNING: at drivers/pci/pci.c:1431
CPU: 12 PID: 121 Comm: eehd Not tainted 3.13.0-rc7+ #1
task: c0000001823a3780 ti: c00000018240c000 task.ti: c00000018240c000
NIP: c0000000003c1e40 LR: c0000000003c1e3c CTR: 0000000001764c5c
REGS: c00000018240f470 TRAP: 0700 Not tainted (3.13.0-rc7+)
MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 28000024 XER: 00000004
CFAR: c000000000706528 SOFTE: 1
GPR00: c0000000003c1e3c c00000018240f6f0 c0000000010fe1f8 0000000000000035
GPR04: 0000000000000000 0000000000000000 00000000003ae509 0000000000000000
GPR08: 000000000000346f 0000000000000000 0000000000000000 0000000000003fef
GPR12: 0000000028000022 c00000000ec93000 c0000000000c11b0 c000000184ac3e40
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 c0000000009398d8 c00000000101f9c0 c0000001860ae000
GPR28: c000000182ba0000 00000000000001f0 c0000001860ae6f8 c0000001860ae000
NIP [c0000000003c1e40] .pci_disable_device+0xd0/0xf0
LR [c0000000003c1e3c] .pci_disable_device+0xcc/0xf0
Call Trace:
[c0000000003c1e3c] .pci_disable_device+0xcc/0xf0 (unreliable)
[d0000000073881c4] .remove_one+0x174/0x320 [cxgb4]
[c0000000003c57e0] .pci_device_remove+0x60/0x100
[c00000000046396c] .__device_release_driver+0x9c/0x120
[c000000000463a20] .device_release_driver+0x30/0x60
[c0000000003bcdb4] .pci_stop_bus_device+0x94/0xd0
[c0000000003bcf48] .pci_stop_and_remove_bus_device+0x18/0x30
[c00000000003f548] .pcibios_remove_pci_devices+0xa8/0x140
[c000000000035c00] .eeh_handle_normal_event+0xa0/0x3c0
[c000000000035f50] .eeh_handle_event+0x30/0x2b0
[c0000000000362c4] .eeh_event_handler+0xf4/0x1b0
[c0000000000c12b8] .kthread+0x108/0x130
[c00000000000a168] .ret_from_kernel_thread+0x5c/0x74
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The threshold values for RX interrupt mitigation
are different for AR9003 and AR9002 families.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The commit "ath9k: Process GTT interrupts" accidentally
had a line that was commented out.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Use the right function to update frequency value.
If rx skb is probe response or beacon, the wrong frequency value can
cause problem that bss info can't be updated when it should be.
Cc: <stable@vger.kernel.org>
Fixes: 8318d78a44 ("cfg80211 API for channels/bitrates, mac80211
and driver conversion")
Signed-off-by: ZHAO Gang <gamerh2o@gmail.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
8 bytes preamble
14 bytes src/dst/eth_type
6 bytes 0xff:0xff..
http://en.wikipedia.org/wiki/Wake-on-LAN#Magic_packethttp://en.wikipedia.org/wiki/EtherType
This will fail if we VLAN or the magic packet is encapsulated
as a UDP packet...
Signed-off-by: Andreas Fenkart <afenkart@gmail.com>
Acked-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Do not continue with cleanup flow. If this ever happens we can check which
resources remained open.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
1. Fix derivation of sub-page index from the dma address in free_4k.
2. Fix the DMA address passed to dma_unmap_page by masking it properly.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Use strings to display transport service or state of QPs. Use numeric
value for MTU of a QP.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Implement resize CQ which is a mandatory verb in mlx5.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Modify CQ is used by ULPs like IPoIB to change moderation parameters. This
patch adds support in mlx5.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Use mlx5_core_warn(), which is the standard warning emitter function, instead
of pr_warn().
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Output structs are expected by firmware to be cleared when a command is called.
Clear the "out" struct instead of "dout" which is used only later.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The output size should be the sum of the core access reg output struct
plus the size of the specific register data provided by the caller.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
- FIFO event channels. Key advantages: support for over 100,000 events (2^17),
16 different event priorities, improved fairness in event latency through
the use of FIFOs.
- Xen PVH support. "It’s a fully PV kernel mode, running with paravirtualized
disk and network, paravirtualized interrupts and timers, no emulated devices
of any kind (and thus no qemu), no BIOS or legacy boot — but instead of
requiring PV MMU, it uses the HVM hardware extensions to virtualize the
pagetables, as well as system calls and other privileged operations."
(from "The Paravirtualization Spectrum, Part 2: From poles to a spectrum")
Bug-fixes:
- Fixes in balloon driver (refactor and make it work under ARM)
- Allow xenfb to be used in HVM guests.
- Allow xen_platform_pci=0 to work properly.
- Refactors in event channels.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJS4BmLAAoJEFjIrFwIi8fJ4SAH/iNGESowgMhfW64vRA8pBWq+
NRJpUjYjjwmbxpwoNl6NPwn15cIXFyc3sMtvvrDD3taRDyko2RFuT+NTjpO05xPh
d/cRpRXpXERHoiFgPf/WTp7ONBDhvPtHG0+BzJKwgqEIOUYXdbhD+gEjaVlFJScS
CAY68OLmk7XYMSZBNzPfKNbSCyhVgZF7wpaimK9lxZBKsFRCDIq6jIyrAsC8epIL
6V/V4l2S6lk/uUeGB6ULphYeINjI2kkpbSfCd1vyenLfWpVscc2o8uWEYFcZMAxy
V4HpsoseuqrfdDqgPfud3VgogdISvbkCvDfW85rzfDP4MWxei2mVHFtJ/gSBV+g=
=ToNG
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.14-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen updates from Konrad Rzeszutek Wilk:
"Two major features that Xen community is excited about:
The first is event channel scalability by David Vrabel - we switch
over from an two-level per-cpu bitmap of events (IRQs) - to an FIFO
queue with priorities. This lets us be able to handle more events,
have lower latency, and better scalability. Good stuff.
The other is PVH by Mukesh Rathor. In short, PV is a mode where the
kernel lets the hypervisor program page-tables, segments, etc. With
EPT/NPT capabilities in current processors, the overhead of doing this
in an HVM (Hardware Virtual Machine) container is much lower than the
hypervisor doing it for us.
In short we let a PV guest run without doing page-table, segment,
syscall, etc updates through the hypervisor - instead it is all done
within the guest container. It is a "hybrid" PV - hence the 'PVH'
name - a PV guest within an HVM container.
The major benefits are less code to deal with - for example we only
use one function from the the pv_mmu_ops (which has 39 function
calls); faster performance for syscall (no context switches into the
hypervisor); less traps on various operations; etc.
It is still being baked - the ABI is not yet set in stone. But it is
pretty awesome and we are excited about it.
Lastly, there are some changes to ARM code - you should get a simple
conflict which has been resolved in #linux-next.
In short, this pull has awesome features.
Features:
- FIFO event channels. Key advantages: support for over 100,000
events (2^17), 16 different event priorities, improved fairness in
event latency through the use of FIFOs.
- Xen PVH support. "It’s a fully PV kernel mode, running with
paravirtualized disk and network, paravirtualized interrupts and
timers, no emulated devices of any kind (and thus no qemu), no BIOS
or legacy boot — but instead of requiring PV MMU, it uses the HVM
hardware extensions to virtualize the pagetables, as well as system
calls and other privileged operations." (from "The
Paravirtualization Spectrum, Part 2: From poles to a spectrum")
Bug-fixes:
- Fixes in balloon driver (refactor and make it work under ARM)
- Allow xenfb to be used in HVM guests.
- Allow xen_platform_pci=0 to work properly.
- Refactors in event channels"
* tag 'stable/for-linus-3.14-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (52 commits)
xen/pvh: Set X86_CR0_WP and others in CR0 (v2)
MAINTAINERS: add git repository for Xen
xen/pvh: Use 'depend' instead of 'select'.
xen: delete new instances of __cpuinit usage
xen/fb: allow xenfb initialization for hvm guests
xen/evtchn_fifo: fix error return code in evtchn_fifo_setup()
xen-platform: fix error return code in platform_pci_init()
xen/pvh: remove duplicated include from enlighten.c
xen/pvh: Fix compile issues with xen_pvh_domain()
xen: Use dev_is_pci() to check whether it is pci device
xen/grant-table: Force to use v1 of grants.
xen/pvh: Support ParaVirtualized Hardware extensions (v3).
xen/pvh: Piggyback on PVHVM XenBus.
xen/pvh: Piggyback on PVHVM for grant driver (v4)
xen/grant: Implement an grant frame array struct (v3).
xen/grant-table: Refactor gnttab_init
xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init.
xen/pvh: Piggyback on PVHVM for event channels (v2)
xen/pvh: Update E820 to work with PVH (v2)
xen/pvh: Secondary VCPU bringup (non-bootup CPUs)
...
Like bonding, team as netdevice doesn't cross netns boundaries.
Team ports and team itself live in same netns.
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like bridge, bonding as netdevice doesn't cross netns boundaries.
Bonding ports and bonding itself live in same netns.
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow user to identify easily what the attributes are related to. Change
the name of the group to "bonding_slave" to be similar to master which
is named "bonding".
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
A patch for fixing a race between queue selection and changing queues
was introduced in commit 92bb73ea2("tuntap: fix a possible race between
queue selection and changing queues").
The fix was to prevent the driver from re-reading the tun->numqueues
more than once within tun_select_queue() using ACCESS_ONCE().
We have been experiancing 'Divide-by-zero' errors in tun_net_xmit()
since we moved from 3.6 to 3.10, and believe that they come from a
simular source where the value of tun->numqueues changes to zero
between the first and a subsequent read of tun->numqueues.
The fix is a simular use of ACCESS_ONCE(), as well as a multiply
instead of a divide in the if statement.
Signed-off-by: Dominic Curran <dominic.curran@citrix.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Maxim Krasnyansky <maxk@qti.qualcomm.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Max Krasnyansky <maxk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull trivial tree updates from Jiri Kosina:
"Usual rocket science stuff from trivial.git"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
neighbour.h: fix comment
sched: Fix warning on make htmldocs caused by wait.h
slab: struct kmem_cache is protected by slab_mutex
doc: Fix typo in USB Gadget Documentation
of/Kconfig: Spelling s/one/once/
mkregtable: Fix sscanf handling
lp5523, lp8501: comment improvements
thermal: rcar: comment spelling
treewide: fix comments and printk msgs
IXP4xx: remove '1 &&' from a condition check in ixp4xx_restart()
Documentation: update /proc/uptime field description
Documentation: Fix size parameter for snprintf
arm: fix comment header and macro name
asm-generic: uaccess: Spelling s/a ny/any/
mtd: onenand: fix comment header
doc: driver-model/platform.txt: fix a typo
drivers: fix typo in DEVTMPFS_MOUNT Kconfig help text
doc: Fix typo (acces_process_vm -> access_process_vm)
treewide: Fix typos in printk
drivers/gpu/drm/qxl/Kconfig: reformat the help text
...
If the new primay is not matching any slave in the bond, the bond should
record it to params, clean the primary slave and select a new active slave.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As per suggestion from Eric W. Biederman, vxlan should be using
{un,}register_pernet_subsys() instead of {un,}register_pernet_device()
to ensure the vxlan_net structure is initialized before and cleaned
up after all network devices in a given network namespace i.e. when
dealing with network notifiers. This is similarly handeled already in
commit 91e2ff3528 ("net: Teach vlans to cleanup as a pernet subsystem")
and, thus, improves upon fd27e0d44a ("net: vxlan: do not use vxlan_net
before checking event type"). Just as in 91e2ff3528, we do not need
to explicitly handle deletion of vxlan devices as network namespace
exit calls dellink on all remaining virtual devices, and
rtnl_link_unregister() calls dellink on all outstanding devices in that
network namespace, so we can entirely drop the pernet exit operation
as well. Moreover, on vxlan module exit, rcu_barrier() is called by
netns since commit 3a765edadb ("netns: Add an explicit rcu_barrier
to unregister_pernet_{device|subsys}"), so this may be omitted. Tested
with various scenarios and works well on my side.
Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so slaves would use
the new bonding option API. Also move the option to its own set function
in bond_options.c and fix some style errors.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so lp_interval would use
the new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so resend_igmp would use
the new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so all_slaves_active would use
the new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so queue_id would use
the new bonding option API. Also move it to its own set function in
bond_options.c and fix some style errors.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so active_slave would use
the new bonding option API. Also some trivial/style fixes.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so use_carrier would use
the new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so primary_reselect would use
the new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so primary would use
the new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so miimon would use
the new bonding option API. The "default" definition has been removed as
it was 0.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so num_peer_notif would use
the new bonding option API.
When the auto-sysfs generation is done an alias should be added for
this option as there're currently 2 entries in sysfs for it.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so ad_select would use
the new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so min_links would use
the new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so lacp_rate would use
the new bonding option API. Also some trivial/style error fixes.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so updelay would use
the new bonding option API. Also some trivial style fixes.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so downdelay would use
the new bonding option API. Also some trivial style fixes.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so arp_ip_target would use
the new bonding option API. This option is an exception because of
the way it's currently implemented that's why its netlink code is
a bit different from the other options to keep the functionality as
before and at the same time to have a single set function.
This patch also fixes a few stylistic errors.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so arp_interval would use
the new bonding option API. The "default" definition has been removed as
it was 0.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so fail_over_mac would use
the new bonding option API. Also fixes a trivial copy/paste error in
bond_check_params where the wrong variable was used for the error msg.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so arp_all_targets would use the
new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so arp_validate would use the
new bonding option API. Also fix some trivial/style errors.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so xmit_hash_policy would use the
new bonding option API. Also fix some trivial/style errors.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary changes so packets_per_slave would use the
new bonding option API.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the bond's mode setting use the new option API and
adds support for dependency printing which relies on having an entry for
the mode option in the bond_opts[] array.
Also add the ability to print the mode name when mode dependency fails
and fix some trivial/style errors.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary basic infrastructure to support
centralized and unified option manipulation API for the bonding. The new
structure bond_option will be used to describe each option with its
dependencies on modes which will be checked automatically thus removing a
lot of duplicated code. Also automatic range checking is added for
some options. Currently the option setting function requires RTNL to
be acquired prior to calling it, since many options already rely on RTNL
it seemed like the best choice to protect all against common race
conditions.
In order to add an option the following steps need to be done:
1. Add an entry BOND_OPT_<option> to bond_options.h so it gets a unique id
and a bit corresponding to the id
2. Add a bond_option entry to the bond_opts[] array in bond_options.c which
describes the option, its dependencies and its manipulation function
3. Add code to export the option through sysfs and/or as a module parameter
(the sysfs export will be made automatically in the future)
The options can have different flags set, currently the following are
supported:
BOND_OPTFLAG_NOSLAVES - require that the bond device has no slaves prior
to setting the option
BOND_OPTFLAG_IFDOWN - require that the bond device is down prior to
setting the option
BOND_OPTFLAG_RAWVAL - don't parse the value but return it raw for the
option to parse
There's a new value structure to describe different types of values
which can have the following flags:
BOND_VALFLAG_DEFAULT - marks the default option (permanent string alias
to this option is "default")
BOND_VALFLAG_MIN - the minimum value that this option can have
BOND_VALFLAG_MAX - the maximum value that this option can have
An example would be nice here, so if we have an option which can have
the values "off"(2), "special"(4, default) and supports a range, say
16 - 32, it should be defined as follows:
"off", 2,
"special", 4, BOND_VALFLAG_DEFAULT,
"rangemin", 16, BOND_VALFLAG_MIN,
"rangemax", 32, BOND_VALFLAG_MAX
So we have the valid intervals: [2, 2], [4, 4], [16, 32]
Also the valid strings: "off" = 2, "special" and "default" = 4
"rangemin" = 16, "rangemax" = 32
BOND_VALFLAG_(MIN|MAX) can be used to specify a valid range for an
option, if MIN is omitted then 0 is considered as a minimum. If an
exact match is found in the values[] table it will be returned,
otherwise the range is tried (if available).
The option parameter passing is done by using a special structure called
bond_opt_value which can take either a string or a value to parse. One
of the bond_opt_init(val|str) macros should be used depending on which
one does the user want to parse (string or value). Then a call to
__bond_opt_set should be done under RTNL.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Zawadzki noticed that some divisions by reciprocal_divide()
were not correct [1][2], which he could also show with BPF code
after divisions are transformed into reciprocal_value() for runtime
invariance which can be passed to reciprocal_divide() later on;
reverse in BPF dump ended up with a different, off-by-one K in
some situations.
This has been fixed by Eric Dumazet in commit aee636c480
("bpf: do not use reciprocal divide"). This follow-up patch
improves reciprocal_value() and reciprocal_divide() to work in
all cases by using Granlund and Montgomery method, so that also
future use is safe and without any non-obvious side-effects.
Known problems with the old implementation were that division by 1
always returned 0 and some off-by-ones when the dividend and divisor
where very large. This seemed to not be problematic with its
current users, as far as we can tell. Eric Dumazet checked for
the slab usage, we cannot surely say so in the case of flex_array.
Still, in order to fix that, we propose an extension from the
original implementation from commit 6a2d7a955d resp. [3][4],
by using the algorithm proposed in "Division by Invariant Integers
Using Multiplication" [5], Torbjörn Granlund and Peter L.
Montgomery, that is, pseudocode for q = n/d where q, n, d is in
u32 universe:
1) Initialization:
int l = ceil(log_2 d)
uword m' = floor((1<<32)*((1<<l)-d)/d)+1
int sh_1 = min(l,1)
int sh_2 = max(l-1,0)
2) For q = n/d, all uword:
uword t = (n*m')>>32
q = (t+((n-t)>>sh_1))>>sh_2
The assembler implementation from Agner Fog [6] also helped a lot
while implementing. We have tested the implementation on x86_64,
ppc64, i686, s390x; on x86_64/haswell we're still half the latency
compared to normal divide.
Joint work with Daniel Borkmann.
[1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
[2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
[3] https://gmplib.org/~tege/division-paper.pdf
[4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
[5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
[6] http://www.agner.org/optimize/asmlib.zip
Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: Jesse Gross <jesse@nicira.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many functions have open coded a function that returns a random
number in range [0,N-1]. Under the assumption that we have a PRNG
such as taus113 with being well distributed in [0, ~0U] space,
we can implement such a function as uword t = (n*m')>>32, where
m' is a random number obtained from PRNG, n the right open interval
border and t our resulting random number, with n,m',t in u32 universe.
Lets go with Joe and simply call it prandom_u32_max(), although
technically we have an right open interval endpoint, but that we
have documented. Other users can further be migrated to the new
prandom_u32_max() function later on; for now, we need to make sure
to migrate reciprocal_divide() users for the reciprocal_divide()
follow-up fixup since their function signatures are going to change.
Joint work with Hannes Frederic Sowa.
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a fix to a regression introduced by commit:
"982290a net/mlx4_core: Check port number for validity
before accessing data"
IPoIB could not attach to multicast group and we get this in dmesg:
[144214.145008] ib0: failed to attach to multicast group, ret = -22
[144214.145016] ib0: couldn't attach QP to multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
[144214.145019] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
The cause to the problem is because port is extracted from gid[5].
Which is only valid for Ethernet.
Removed this validation in mlx4_qp_attach_common(), which is accessed
from both Ethernet and IB flows.
Error flow for bad port value in Ethernet is already exists in that
function.
Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current logic to put interface into VLAN Promiscous mode is not correct.
We should increment "adapter->vlans_added" before calling be_vid_config().
Also removed some unwanted log messages.
Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a VF originating from a given PF is flr-ed, that PF gets an interrupt
from the chip management and takes a part in the flr process.
This patch fixes several corner cases in which the driver performs its part
of the flr flow out-of-order, causing the FW to assert due to badly timed
messages received from the driver.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a description to the vxlan module, helping save the world
from the minions of destruction and confusion.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Somehow I transposed these two while bringing the original statistics
support upstream.
Fixes: 99691c4ac1 ('sfc: Add PTP counters to ethtool stats')
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PORT_RESET MC command was NOP in the ef10 firmware hence we are using
ENTITY_RESET to make sure all resource allocations are reset.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull RCU updates from Ingo Molnar:
- add RCU torture scripts/tooling
- static analysis improvements
- update RCU documentation
- miscellaneous fixes
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
rcu: Remove "extern" from function declarations in kernel/rcu/rcu.h
rcu: Remove "extern" from function declarations in include/linux/*rcu*.h
rcu/torture: Dynamically allocate SRCU output buffer to avoid overflow
rcu: Don't activate RCU core on NO_HZ_FULL CPUs
rcu: Warn on allegedly impossible rcu_read_unlock_special() from irq
rcu: Add an RCU_INITIALIZER for global RCU-protected pointers
rcu: Make rcu_assign_pointer's assignment volatile and type-safe
bonding: Use RCU_INIT_POINTER() for better overhead and for sparse
rcu: Add comment on evaluate-once properties of rcu_assign_pointer().
rcu: Provide better diagnostics for blocking in RCU callback functions
rcu: Improve SRCU's grace-period comments
rcu: Fix CONFIG_RCU_FANOUT_EXACT for odd fanout/leaf values
rcu: Fix coccinelle warnings
rcutorture: Stop tracking FSF's postal address
rcutorture: Move checkarg to functions.sh
rcutorture: Flag errors and warnings with color coding
rcutorture: Record results from repeated runs of the same test scenario
rcutorture: Test summary at end of run with less chattiness
rcutorture: Update comment in kvm.sh listing typical RCU trace events
rcutorture: Add tracing-enabled version of TREE08
...
Pull m68k updates from Geert Uytterhoeven:
- Zorro bus cleanups and UAPI revival
- Bootinfo cleanups and UAPI revival
- Kexec support
- Memory size reductions and bug fixes for multi-platform kernels
- Polled interrupt support for Atari EtherNAT, EtherNEC and NetUSBee
- Machine-specific random_get_entropy()
- Defconfig updates and cleanups
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: (46 commits)
m68k/mac: Make SCC reset work more reliably
m68k/irq - Use polled IRQ flag for MFP timer cascaded interrupts
m68k: Update defconfigs for v3.13-rc1
m68k/defconfig: Enable EARLY_PRINTK
m68k/mm: kmap spelling/grammar fixes
m68k: Convert arch/m68k/kernel/traps.c to pr_*()
m68k: Convert arch/m68k/mm/fault.c to pr_*()
m68k/mm: Check for mm != NULL in do_page_fault() debug code
m68k/defconfig: Disable /sbin/hotplug fork-bomb by default
m68k/atari: Hide RTC_PORT() macro from rtc-cmos
m68k/amiga,atari: Fix specifying multiple debug= parameters
m68k/defconfig: Use ext4 for ext2/ext3 file systems
m68k: Add support to export bootinfo in procfs
m68k: Add kexec support
m68k/mac: Mark Mac IIsi ADB driver BROKEN
m68k/amiga: Provide mach_random_get_entropy()
m68k: Add infrastructure for machine-specific random_get_entropy()
m68k/atari: Call paging_init() before nf_init()
m68k: Remove superfluous inclusions of <asm/bootinfo.h>
m68k/UAPI: Use proper types (endianness/size) in <asm/bootinfo*.h>
...
None of the devices supported by iwldvm have support for
shadow registers. This means that we wake the NIC
when we increment the write pointer on Tx ring.
This happened even before my bad commit mentionned below.
Since my commit below, we wake up the NIC when we put a
host command on the ring regardless of shadow register
support. This means that in iwldvm (when the NIC doesn't
support shadow register), we wake up the NIC twice:
pcie_enqueue_hcmd:
wake up the NIC
iwl_pcie_txq_inc_wr_ptr:
wake up the NIC - no shadow reg support
Since waking up the NIC means that we need to acquire a
spinlock, this obviously leads to a recursive spinlock
and hence a freeze.
Fixes: b943949105 ("iwlwifi: pcie: keep the NIC awake when commands are in flight")
Reported-by: Janusz Dziedzic <janusz.dziedzic@gmail.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
When timestamping is enabled, stmmac_tx_clean will call
stmmac_get_tx_hwtstamp to get tx TS.
But the skb can be NULL because the last of its tx_skbuff is NULL
if this packet frame is filled in more than one descriptors.
To fix the issue, change the code:
- Store TX skb to the tx_skbuff[] of frame's last segment.
- Check skb is not NULL in stmmac_get_tx_hwtstamp.
Signed-off-by: Bruce Liu <damuzi000@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Allwinner A20 has an ethernet controller that seems to be
an early version of Synopsys DesignWare MAC 10/100/1000 Universal,
which is supported by the stmmac driver.
Allwinner's GMAC requires setting additional registers in the SoC's
clock control unit.
The exact version of the DWMAC IP that Allwinner uses is unknown,
thus the exact feature set is unknown.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The stmmac driver core allows passing feature flags and callbacks via
platform data. Add a similar stmmac_of_data to pass flags and callbacks
tied to compatible strings. This allows us to extend stmmac with glue
layers for different SoCs.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The snps,phy-addr device tree property is non-standard, and should be
removed in favor of proper phy node support. Remove it from the binding
documents and warn if the property is still used.
Most PHYs respond to address 0, but a few don't, so auto-detect PHY
address by default, to make up for the lack of explicit address selection.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
"snps,force_sf_dma_mode" is documented in stmmac device tree bindings,
but is never handled by the driver.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current .init and .exit callbacks requires access to driver
private data structures. This is not a good seperation and abstraction.
Instead, we add a new .setup callback for allocating private data, and
pass the returned pointer to the other callbacks.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DWMAC has a reset assert line, which is used on some SoCs. Add an
optional reset control to stmmac driver core.
To support reset control deferred probing, this patch changes the driver
probe function to return the actual error, instead of just -EINVAL.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The stmmac driver does not enable the main clock during the probe phase.
If the clock was not enabled by the boot loader or was disabled by the
kernel, hardware features and the MDIO bus would not be probed properly.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Code should avoid needless exports, don't export something unless it used.
Make local functions static and remove unused stubs.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previous patch changed prototypes, but forgot functions.
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP based RoCE gids don't store Ethernet L2 parameters, MAC and VLAN.
Therefore, we need to extract them from the CQE and place them in
struct ib_wc (to be used for cases were they were taken from the gid).
Also, when modifying a QP or building address handle, instead of
parsing the dgid to get the MAC and VLAN, take them from the address
handle attributes.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Conflicts:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
net/ipv4/tcp_metrics.c
Overlapping changes between the "don't create two tcp metrics objects
with the same key" race fix in net and the addition of the destination
address in the lookup key in net-next.
Minor overlapping changes in bnx2x driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
This change merges the ixgbevf_tx_map call and the ixgbevf_tx_queue call
into a single function. In order to make room for this setting of cmd_type
and olinfo flags is done in separate functions.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch takes advantage of the dma buffer always being present in the
first descriptor and mapped as single. As such we can call dma_unmap_single
and don't need to check for DMA mapping in ixgbevf_clean_tx_irq().
In addition this patch makes use of the DMA API.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change makes it so that the first tx_buffer structure acts as a
central storage location for most of the info about the skb we are about
to transmit.
In addition this patch makes tx_flags part of the ixgbevf_tx_buffer struct.
This allows us to use the flags directly from the stucture and as result
removes the tx_flags parameter from some functions. Also as a cleanup
mapped_as_page is folded into tx_flags and some unused flags were removed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds counters for tx_restart_queue and tx_timeout_count.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the Tx/Rx counters for checksum offload.
The Tx counter was never updated and the Rx counter is of limited use.
This is in effort to clean up the counters and make them consistent
with the counters shown by ixgbe.
Also this patch removes some members of the adapter structure that were
never used and shuffles others to reduce number of holes.
before:
/* size: 1568, cachelines: 25, members: 48 */
/* sum members: 1519, holes: 10, sum holes: 43 */
/* padding: 6 */
/* last cacheline: 32 bytes */
after:
/* size: 1480, cachelines: 24, members: 43 */
/* sum members: 1479, holes: 1, sum holes: 1 */
/* last cacheline: 8 bytes */
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves hot-path specific statistics into the ring structure.
This allows us to drop the adapter structure in some functions and should
help with performance.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch cleans up the code by removing the adapter structure as
parameter from multiple functions. The adapter structure was previously
being used to access the dev pointer, but this can also be done via the
ixgbevf_ring structure. This way we can drop the adapter as parameter from
these functions.
This patch also includes small cleanups in some error code paths.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rework the device ID #defines to follow the _DEV_ID convention
already established in the other Intel drivers.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow compiling DCB related files if I40E_DCB option
is supported in the kernel configuration.
DCB is disabled by default.
Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Tested-By: Jack Morgan<jack.morgan@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds capability to configure DCB on i40e network
interfaces using Intel XL710 adapter firmware APIs.
By default all VSIs are only enabled for the default traffic
class enabled by firmware for any given PF. The driver would
query the firmware for the traffic classes that are enabled for
the port and reconfigure the LAN VSI to match to the port traffic
class settings. All other VSIs are only enabled for the default
traffic class settings for now.
The driver registers and listens to firmware events that may
require change in the DCB settings. It may reconfigure the VSI
settings based on these events.
This patch exposes IEEE DCBNL interfaces for the i40e driver to
allow any application to query the DCB settings on the adapter.
Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-By: Jack Morgan<jack.morgan@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Intel XL710 series of adapters support QoS as per the
IEEE 802.1 DCB (Data Center Bridging) standard.
This is supported in conjuction with:
- Enhanced Transmission Selection (ETS) - IEEE 802.1Qaz
- Priority Flow Control (PFC) - IEEE 802.1Qbb
- DCB eXchange Protocol (DCBX) - IEEE 802.1Qaz
On Intel XL710 adapters DCBX is performed by the adapter
firmware. The firmware runs DCBX in willing mode and configures
the port as per the DCB settings recommended by it's link
partner.
By default in absence of any DCBX; firmware would configure the
port with a single traffic class and all of the port bandwith
will be allocated to that traffic class.
This patch adds functions and calls to support querying and
configuring DCB using firmware APIs.
Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-By: Jack Morgan<jack.morgan@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The i40e hardware was generating some inconsistent results
when using current programming methods. This refactor
fixes the inconsistencies that were preventing clean
unloads of the driver, and moves the queues for handling
flow director errors into their own hardware VSI.
This patch also implements a corrected version of the
basic ethtool add ntuple rule, which will disable
the driver's automatic flow programming. A future patch
adds remove/replay/list support for ntuple.
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The FLAG_FDIR_* defines can be renamed to be more descriptive.
This patch is in preparation for the following where the fdir
code is refactored.
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix more whitespace issues, including making some locals declared
in a nicer order.
Also update Copyright string printed when the driver loads.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove a workaround that is no longer necessary and implement
a better understanding of what firmware is returning in the MSI-X
vector count. This makes it so that the driver ends up with the
right amount of queues when using all available MSI-X vectors.
Change-ID: I34e60cc71dcfb1b5412f37df956fedcc49ade187
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Compile testing with higher warning levels found this complaint:
i40e_nvm.c: warning: 'checksum_local' may be used uninitialized in
this function
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit ab7db91705 ("virtio-net: auto-tune mergeable rx buffer size for
improved performance") introduced a virtio-net dependency on EWMA.
The inclusion of EWMA is controlled by CONFIG_AVERAGE. Fix build error
when CONFIG_AVERAGE is not enabled by adding select AVERAGE to
virtio-net's Kconfig entry.
Build failure reported using config make ARCH=s390 defconfig.
Signed-off-by: Michael Dalton <mwdalton@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bump the version number to better match functionality provided with out of
tree driver of the same version.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bump the version number to better match functionality provided with out
of tree driver of the same version.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds braces around the ixgbe_qv_lock_* calls which previously only
had braces around the if portion. Kernel style guidelines for this require
parenthesis around all conditions if they are required around one. In addition
the comment while not illegal C syntax makes the code look wrong at a cursory
glance. This patch corrects the style and adds braces so that the full if-else
block is uniform.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to get correct drop monitor notifications for dropped
packets, we should call kfree_skb() instead of dev_kfree_skb()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If link is IFF_SLAVE, extend link dev netlink attributes to include
slave attributes with new IFLA_SLAVE nest. Add netlink notification
(RTM_NEWLINK) when slave status changes from backup to active, or
visa-versa.
Adds new ndo_get_slave op to net_device_ops to fill skb with IFLA_SLAVE
attributes. Currently only used by bonding driver, but could be
used by other aggregating devices with slaves.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add sub-directory under /sys/class/net/<interface>/slave with
read-only attributes for slave. Directory only appears when
<interface> is a slave.
$ tree /sys/class/net/eth2/slave/
/sys/class/net/eth2/slave/
├── ad_aggregator_id
├── link_failure_count
├── mii_status
├── perm_hwaddr
├── queue_id
└── state
$ cat /sys/class/net/eth2/slave/*
2
0
up
40:02:10:ef:06:01
0
active
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg reported that commit acaf4e7099 caused a panic
when adding a network namespace while vxlan module was present in
the system:
[<ffffffff814d0865>] vxlan_lowerdev_event+0xf5/0x100
[<ffffffff816e9e5d>] notifier_call_chain+0x4d/0x70
[<ffffffff810912be>] __raw_notifier_call_chain+0xe/0x10
[<ffffffff810912d6>] raw_notifier_call_chain+0x16/0x20
[<ffffffff815d9610>] call_netdevice_notifiers_info+0x40/0x70
[<ffffffff815d9656>] call_netdevice_notifiers+0x16/0x20
[<ffffffff815e1bce>] register_netdevice+0x1be/0x3a0
[<ffffffff815e1dce>] register_netdev+0x1e/0x30
[<ffffffff814cb94a>] loopback_net_init+0x4a/0xb0
[<ffffffffa016ed6e>] ? lockd_init_net+0x6e/0xb0 [lockd]
[<ffffffff815d6bac>] ops_init+0x4c/0x150
[<ffffffff815d6d23>] setup_net+0x73/0x110
[<ffffffff815d725b>] copy_net_ns+0x7b/0x100
[<ffffffff81090e11>] create_new_namespaces+0x101/0x1b0
[<ffffffff81090f45>] copy_namespaces+0x85/0xb0
[<ffffffff810693d5>] copy_process.part.26+0x935/0x1500
[<ffffffff811d5186>] ? mntput+0x26/0x40
[<ffffffff8106a15c>] do_fork+0xbc/0x2e0
[<ffffffff811b7f2e>] ? ____fput+0xe/0x10
[<ffffffff81089c5c>] ? task_work_run+0xac/0xe0
[<ffffffff8106a406>] SyS_clone+0x16/0x20
[<ffffffff816ee689>] stub_clone+0x69/0x90
[<ffffffff816ee329>] ? system_call_fastpath+0x16/0x1b
Apparently loopback device is being registered first and thus we
receive an event notification when vxlan_net is not ready. Hence,
when we call net_generic() and request vxlan_net_id, we seem to
access garbage at that point in time. In setup_net() where we set
up a newly allocated network namespace, we traverse the list of
pernet ops ...
list_for_each_entry(ops, &pernet_list, list) {
error = ops_init(ops, net);
if (error < 0)
goto out_undo;
}
... and loopback_net_init() is invoked first here, so in the middle
of setup_net() we get this notification in vxlan. As currently we
only care about devices that unregister, move access through
net_generic() there. Fix is based on Cong Wang's proposal, but
only changes what is needed here. It sucks a bit as we only work
around the actual cure: right now it seems the only way to check if
a netns actually finished traversing all init ops would be to check
if it's part of net_namespace_list. But that I find quite expensive
each time we go through a notifier callback. Anyway, did a couple
of tests and it seems good for now.
Fixes: acaf4e7099 ("net: vxlan: when lower dev unregisters remove vxlan dev as well")
Reported-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Tested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 43dc4e01 Limit number of reported VFs to device
specific value It doesn't work and always returns -EBUSY because VFs are
already enabled.
ixgbe_enable_sriov()
pci_enable_sriov()
sriov_enable()
{
... ..
iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
pci_cfg_access_lock(dev);
... ...
}
pci_sriov_set_totalvfs()
{
... ...
if (dev->sriov->ctrl & PCI_SRIOV_CTRL_VFE)
return -EBUSY;
...
}
So should set driver_max_VFs with pci_sriov_set_totalvfs() before
enable VFs with ixgbe_enable_sriov().
V2: revised for net-next tree.
Signed-off-by: Ethan Zhao <ethan.kernel@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because ixgbe driver limit the max number of VF
functions could be enabled to 63, so define one macro IXGBE_MAX_VFS_DRV_LIMIT
and cleanup the const 63 in code.
v3: revised for net-next tree.
Signed-off-by: Ethan Zhao <ethan.kernel@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The r7s72100 SoC includes a fast ethernet controller.
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Return a boolean from sh_eth_is_gether() and refactor it as a one-liner.
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove function qlcnic_enable_eswitch which was defined
but never used in current code.
Compile tested only.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Functions only used in one file should be static.
Found by running make namespacecheck
Compile tested only.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add initial support for per-rx queue sysfs attributes to virtio-net. If
mergeable packet buffers are enabled, adds a read-only mergeable packet
buffer size sysfs attribute for each RX queue.
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 2613af0ed1 ("virtio_net: migrate mergeable rx buffers to page frag
allocators") changed the mergeable receive buffer size from PAGE_SIZE to
MTU-size, introducing a single-stream regression for benchmarks with large
average packet size. There is no single optimal buffer size for all
workloads. For workloads with packet size <= MTU bytes, MTU + virtio-net
header-sized buffers are preferred as larger buffers reduce the TCP window
due to SKB truesize. However, single-stream workloads with large average
packet sizes have higher throughput if larger (e.g., PAGE_SIZE) buffers
are used.
This commit auto-tunes the mergeable receiver buffer packet size by
choosing the packet buffer size based on an EWMA of the recent packet
sizes for the receive queue. Packet buffer sizes range from MTU_SIZE +
virtio-net header len to PAGE_SIZE. This improves throughput for
large packet workloads, as any workload with average packet size >=
PAGE_SIZE will use PAGE_SIZE buffers.
These optimizations interact positively with recent commit
ba27524103 ("virtio-net: coalesce rx frags when possible during rx"),
which coalesces adjacent RX SKB fragments in virtio_net. The coalescing
optimizations benefit buffers of any size.
Benchmarks taken from an average of 5 netperf 30-second TCP_STREAM runs
between two QEMU VMs on a single physical machine. Each VM has two VCPUs
with all offloads & vhost enabled. All VMs and vhost threads run in a
single 4 CPU cgroup cpuset, using cgroups to ensure that other processes
in the system will not be scheduled on the benchmark CPUs. Trunk includes
SKB rx frag coalescing.
net-next w/ virtio_net before 2613af0ed1 (PAGE_SIZE bufs): 14642.85Gb/s
net-next (MTU-size bufs): 13170.01Gb/s
net-next + auto-tune: 14555.94Gb/s
Jason Wang also reported a throughput increase on mlx4 from 22Gb/s
using MTU-sized buffers to about 26Gb/s using auto-tuning.
Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The virtio-net driver currently uses netdev_alloc_frag() for GFP_ATOMIC
mergeable rx buffer allocations. This commit migrates virtio-net to use
per-receive queue page frags for GFP_ATOMIC allocation. This change unifies
mergeable rx buffer memory allocation, which now will use skb_refill_frag()
for both atomic and GFP-WAIT buffer allocations.
To address fragmentation concerns, if after buffer allocation there
is too little space left in the page frag to allocate a subsequent
buffer, the remaining space is added to the current allocated buffer
so that the remaining space can be used to store packet data.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It looks like there's no need for those two fields:
- Unless there's a failure for the first refill try, rq->max should be always
equal to the vring size.
- rq->num is only used to determine the condition that we need to do the refill,
we could check vq->num_free instead.
- rq->num was required to be increased or decreased explicitly after each
get/put which results a bad API.
So this patch removes them both to make the code simpler.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes following sparse warning
davinci_mdio.c:85:27: warning: symbol 'default_pdata' was not declared. Should it be static?
Also makes the default_pdata as a constant.
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, if a slave's name change, we just pass it by. However, if the
slave is a current primary_slave, then we end up with using a slave, whose
name != params.primary, for primary_slave. And vice-versa, if we don't have
a primary_slave but have params.primary set - we will not detected a new
primary_slave.
Fix this by catching the NETDEV_CHANGENAME event and setting primary_slave
accordingly. Also, if the primary_slave was changed, issue a reselection of
the active slave, cause the priorities have changed.
Reported-by: Ding Tianhong <dingtianhong@huawei.com>
CC: Ding Tianhong <dingtianhong@huawei.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refinements to cloud support in the Firmware API.
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check that the descriptors were allocated before trying to dump
them to the logfile. While we're there, de-trick-ify the code
so as to be easier to read and not abusing the types and unions.
Change-ID: I22898f4b22cecda3582d4d9e4018da9cd540f177
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now it catches the NETDEV_CHANGEMTU notification, which is signaled after
the actual change happened on the device, and returns NOTIFY_BAD, so that
the change on the device is reverted.
This might be quite costly and messy, so use the new NETDEV_PRECHANGEMTU to
catch the MTU change before the actual change happens and signal that it's
forbidden to do it.
CC: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ever since this driver was merged the following code was included:
if (skb->len < MISR)
skb->len = MISR;
MISR is defined to 0x3C which is also equivalent to ETH_ZLEN, but use
ETH_ZLEN directly which is exactly what we want to be checking for.
Reported-by: Marc Volovic <marcv@ezchip.com>
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
On newer and faster machines (Vortex X86DX) using the r6040 driver, it
was noticed that the driver was returning an error during probing traced
down to being the MDIO bus probing and the inability to complete a MDIO
read operation in time. It turns out that the MDIO operations on these
faster machines usually complete after ~2140 iterations which is bigger
than 2048 (MAC_DEF_TIMEOUT) and results in spurious timeouts depending
on the system load.
Update r6040_phy_read() and r6040_phy_write() to include a 1
micro second delay in each busy-looping iteration of the loop which is a
much safer operation than incrementing MAC_DEF_TIMEOUT.
Reported-by: Nils Koehler <nils.koehler@ibt-interfaces.de>
Reported-by: Daniel Goertzen <daniel.goertzen@gmail.com>
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for IPv6 checksum offload and GSO when those
features are available in the backend.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove unused function vxge_hw_vpath_vid_get
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use 'make namespacecheck' to code that could be declared static.
After that remove code that is not being used.
Compile tested only.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
recently, dm9000 codes have many checkpatch errors and warnings:
WARNING: please, no space before tabs
3: FILE: dm9000.c:3:
+ * ^ICopyright (C) 1997 Sten Wang$
WARNING: please, no space before tabs
5: FILE: dm9000.c:5:
+ * ^IThis program is free software; you can redistribute it and/or$
WARNING: please, no space before tabs
6: FILE: dm9000.c:6:
+ * ^Imodify it under the terms of the GNU General Public License$
WARNING: please, no space before tabs
7: FILE: dm9000.c:7:
+ * ^Ias published by the Free Software Foundation; either version 2$
WARNING: please, no space before tabs
8: FILE: dm9000.c:8:
+ * ^Iof the License, or (at your option) any later version.$
WARNING: please, no space before tabs
10: FILE: dm9000.c:10:
+ * ^IThis program is distributed in the hope that it will be useful,$
WARNING: please, no space before tabs
11: FILE: dm9000.c:11:
+ * ^Ibut WITHOUT ANY WARRANTY; without even the implied warranty of$
WARNING: please, no space before tabs
12: FILE: dm9000.c:12:
+ * ^IMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the$
WARNING: please, no space before tabs
13: FILE: dm9000.c:13:
+ * ^IGNU General Public License for more details.$
WARNING: do not add new typedefs
97: FILE: dm9000.c:97:
+typedef struct board_info {
ERROR: spaces prohibited around that ':' (ctx:WxV)
113: FILE: dm9000.c:113:
+ unsigned int in_suspend :1;
^
ERROR: spaces prohibited around that ':' (ctx:WxV)
114: FILE: dm9000.c:114:
+ unsigned int wake_supported :1;
^
This patch fixes important errors in it.
Signed-off-by: Barry Song <Baohua.Song@csr.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 9d8bf54 ("i40e: associate VMDq queue with VM type")
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Building resource_tracker.o triggers a GCC warning:
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'mlx4_HW2SW_SRQ_wrapper':
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:3202:17: warning: 'srq' may be used uninitialized in this function [-Wmaybe-uninitialized]
atomic_dec(&srq->mtt->ref_count);
^
This is a false positive. But a cleanup of srq_res_start_move_to() can
help GCC here. The code currently uses a switch statement where a plain
if/else would do, since only two of the switch's four cases can ever
occur. Dropping that switch makes the warning go away.
While we're at it, add some missing braces, and convert state to the
correct type.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Building resource_tracker.o triggers a GCC warning:
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'mlx4_HW2SW_CQ_wrapper':
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:3019:16: warning: 'cq' may be used uninitialized in this function [-Wmaybe-uninitialized]
atomic_dec(&cq->mtt->ref_count);
^
This is a false positive. But a cleanup of cq_res_start_move_to() can
help GCC here. The code currently uses a switch statement where an
if/else construct would do too, since only two of the switch's four
cases can ever occur. Dropping that switch makes the warning go away.
While we're at it, add some missing braces.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 7509963c70 (e1000e: Fix a compile flag mis-match for
suspend/resume) moved suspend and resume hooks to be available when
CONFIG_PM is set. However, it can be set even if CONFIG_PM_SLEEP is not set
causing following warnings to be emitted:
drivers/net/ethernet/intel/e1000e/netdev.c:6178:12: warning:
‘e1000_suspend’ defined but not used [-Wunused-function]
drivers/net/ethernet/intel/e1000e/netdev.c:6185:12: warning:
‘e1000_resume’ defined but not used [-Wunused-function]
To fix this make the hooks to be available only when CONFIG_PM_SLEEP is set
and remove CONFIG_PM wrapping from driver ops because this is already
handled by SET_SYSTEM_SLEEP_PM_OPS() and SET_RUNTIME_PM_OPS().
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Dave Ertman <davidx.m.ertman@intel.com>
Cc: Aaron Brown <aaron.f.brown@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>