Commit Graph

599120 Commits

Author SHA1 Message Date
Andy Shevchenko
0d08df6c49 net/lapb: tuse %*ph to dump buffers
Use %*ph specifier to dump small buffers in hex format instead doing this
byte-by-byte.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-29 22:33:25 -07:00
Dan Carpenter
6756325a9a ptp: oops in ptp_ioctl()
If we pass ERR_PTR(-EFAULT) to kfree() then it's going to oops.

Fixes: 2ece068e1b ('ptp: use memdup_user().')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-29 22:32:27 -07:00
Arnd Bergmann
fabb13db44 fou: add Kconfig options for IPv6 support
A previous patch added the fou6.ko module, but that failed to link
in a couple of configurations:

net/built-in.o: In function `ip6_tnl_encap_add_fou_ops':
net/ipv6/fou6.c:88: undefined reference to `ip6_tnl_encap_add_ops'
net/ipv6/fou6.c:94: undefined reference to `ip6_tnl_encap_add_ops'
net/ipv6/fou6.c:97: undefined reference to `ip6_tnl_encap_del_ops'
net/built-in.o: In function `ip6_tnl_encap_del_fou_ops':
net/ipv6/fou6.c:106: undefined reference to `ip6_tnl_encap_del_ops'
net/ipv6/fou6.c:107: undefined reference to `ip6_tnl_encap_del_ops'

If CONFIG_IPV6=m, ip6_tnl_encap_add_ops/ip6_tnl_encap_del_ops
are in a module, but fou6.c can still be built-in, and that
obviously fails to link.

Also, if CONFIG_IPV6=y, but CONFIG_IPV6_TUNNEL=m or
CONFIG_IPV6_TUNNEL=n, the same problem happens for a different
reason.

This adds two new silent Kconfig symbols to work around both
problems:

- CONFIG_IPV6_FOU is now always set to 'm' if either CONFIG_NET_FOU=m
  or CONFIG_IPV6=m
- CONFIG_IPV6_FOU_TUNNEL is set implicitly when IPV6_FOU is enabled
  and NET_FOU_IP_TUNNELS is also turned out, and it will ensure
  that CONFIG_IPV6_TUNNEL is also available.

The options could be made user-visible as well, to give additional
room for configuration, but it seems easier not to bother users
with more choice here.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: aa3463d65e ("fou: Add encap ops for IPv6 tunnels")
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-29 22:24:21 -07:00
Arnd Bergmann
9791d8e762 ipv6: hide ip6_encap_hlen/ip6_tnl_encap definitions
A recent cleanup moved MAX_IPTUN_ENCAP_OPS along with some other
definitions, but it is now invisible when CONFIG_INET is
not defined, but still referenced from ip6_tunnel.h:

In file included from net/xfrm/xfrm_input.c:17:0:
include/net/ip6_tunnel.h:67:17: error: 'MAX_IPTUN_ENCAP_OPS' undeclared here (not in a function)
   ip6tun_encaps[MAX_IPTUN_ENCAP_OPS];
                 ^~~~~~~~~~~~~~~~~~~

This hides the ip6_encap_hlen and ip6_tnl_encap functions inside
of CONFIG_INET so we don't run into the the problem.

Alternatively we could move the macro out of the #ifdef again to
restore the previous behavior

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 55c2bc1432 ("net: Cleanup encap items in ip_tunnels.h")
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-29 22:24:21 -07:00
David S. Miller
b7e7ad611e Merge branch 'qed-fixes'
Yuval Mintz says:

====================
qed*: Bug fixes

This series contain several small fixes, most of which deal with
either 100g support, sriov or bandwidth configurations.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-26 12:27:33 -07:00
Yuval Mintz
3e7cfce228 qed: Don't config min BW on 100g on link flap
Currently 100g devices don't support minimum/maximum BW configurations,
yet link flaps might cause the driver to attempt to do such a
configuration. Prevent this just as we do for the maximum BW.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-26 12:27:33 -07:00
Sudarsana Reddy Kalluru
bb13ace7dc qed: Prevent 100g from working in MSI
Adapter can support 100g in both MSIx and INTa, but not in MSI.

Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-26 12:27:33 -07:00
Yuval Mintz
1af9dcf7f9 qed: Add missing 100g init mode
Some of the HW configurations are currently missing for 100g devices.
This can cause various classification issues, as well as prevent device
from fully reaching line-rate.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-26 12:27:33 -07:00
Yuval Mintz
cc3d5eb091 qed: Save min/max accross dcbx-change
When DCBx re-negotiation is occurring, the PF's configurations for
maximum and minimum bandwidth guarantees are currently lost.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-26 12:27:32 -07:00
Sudarsana Reddy Kalluru
795292916c qed: Fix allocation in interrupt context
Commit 39651abd28 ("qed: add support for dcbx") is re-configuring
the QM hw-block as part of its sequence. This is done in attention
handling context which is non-sleepable, yet memory is allocated in
this flow using GFP_KERNEL.

Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-26 12:27:32 -07:00
Yuval Mintz
6ecb0a0c0d qede: Don't expose self-test for VFs
PFs and VFs differ in their registered ethtool operations,
but they're using a common function for get_sset_count().
As a result, `ethtool -i' for a VF would indicate it supports
selftest, although that's not the case.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-26 12:27:32 -07:00
Yuval Mintz
ce2b885cc5 qede: Reload on GRO changes
Since driver is using a FW-based GRO implementation, this has some
effects on its ability to cope with GRO enablement/disablement.
As a result, driver must perform an inner-reload as a result of a state
change in the offload configuration of said feature.

[Failure to do so means network stack would continue to receive
aggregated packets even though user requested the feature to be disabled].

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-26 12:27:32 -07:00
Yuval Mintz
be7b6d64c0 qede: Fix VF minimum BW setting
VF is currently ignoring the minimum provided by the API,
mistakenly using the maximum for minimum as well.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-26 12:27:32 -07:00
David S. Miller
61248720c4 Merge branch 'mlx4-stats-fixes'
Eric Dumazet says:

====================
net/mlx4_en: fix stats

mlx4 has various bugs in its ndo_get_stats() and related functions.
This patch series address the obvious issues.
Remaining ones will be discussed later.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 22:15:50 -07:00
Eric Dumazet
f73a6f439f net/mlx4_en: get rid of private net_device_stats
We simply can use the standard net_device stats.

We do not need to clear fields that are already 0.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 22:15:50 -07:00
Eric Dumazet
9ed17db17f net/mlx4_en: get rid of ret_stats
mlx4 uses a private struct net_device_stats in a vain attempt
to avoid races.

This is buggy because multiple cpus could call mlx4_en_get_stats()
at the same time, so ret_stats can not guarantee stable results.

To fix this, we need to switch to ndo_get_stats64() as this
method provides per-thread storage.

This allows to reduce mlx4_en_priv bloat.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 22:15:50 -07:00
Eric Dumazet
45acbac609 net/mlx4_en: clear some TX ring stats in mlx4_en_clear_stats()
mlx4_en_clear_stats() clears about everything but few TX ring
fields are missing :
- queue_stopped, wake_queue, tso_packets, xmit_more

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 22:15:50 -07:00
Eric Dumazet
63a664b7e9 net/mlx4_en: fix tx_dropped bug
1) mlx4_en_xmit() can increment priv->stats.tx_dropped, but this variable
is overwritten in mlx4_en_DUMP_ETH_STATS().

2) This increment was not SMP safe, as a port might have many TX queues.

Add a per TX ring tx_dropped to fix these issues.

This is u32 as mlx4_en_DUMP_ETH_STATS() will add a 32bit field.

So lets avoid bugs with SNMP agents having to cope with partial
overwraps. (One of these agents being bond_fold_stats())

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 22:15:49 -07:00
Xin Long
bed187b540 sctp: fix double EPs display in sctp_diag
We have this situation: that EP hash table, contains only the EPs
that are listening, while the transports one, has the opposite.
We have to traverse both to dump all.

But when we traverse the transports one we will also get EPs that are
in the EP hash if they are listening. In this case, the EP is dumped
twice.

We will fix it by checking if the endpoint that is in the endpoint
hash table contains any ep->asoc in there, as it means we will also
find it via transport hash, and thus we can/should skip it, depending
on the filters used, like 'ss -l'.

Still, we should NOT skip it if the user is listing only listening
endpoints, because then we are not traversing the transport hash.
so we have to check idiag_states there also.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 22:14:31 -07:00
Marek Vasut
3424d9be8f net: arc: trivial: Replace comma with a semicolon
Fix a typo in the driver, replace comma with a semicolon at the end
of statement. While using comma is a legal C here and probably does
not even generate compiler warning, it was unlikely the intention.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Caesar Wang <wxt@rock-chips.com>
Cc: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 22:13:15 -07:00
Marek Vasut
643d60bf57 net: stmmac: Fix incorrect memcpy source memory
The memcpy() currently copies mdio_bus_data into new_bus->irq, which
makes no sense, since the mdio_bus_data structure contains more than
just irqs. The code was likely supposed to copy mdio_bus_data->irqs
into the new_bus->irq instead, so fix this.

Fixes: e7f4dc3536 ("mdio: Move allocation of interrupts into core")
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 21:43:35 -07:00
Feng Tang
26c5f03b2a net: alx: use custom skb allocator
This patch follows Eric Dumazet's commit 7b70176421 for Atheros
atl1c driver to fix one exactly same bug in alx driver, that the
network link will be lost in 1-5 minutes after the device is up.

My laptop Lenovo Y580 with Atheros AR8161 ethernet device hit the
same problem with kernel 4.4, and it will be cured by Jarod Wilson's
commit c406700c for alx driver which get merged in 4.5. But there
are still some alx devices can't function well even with Jarod's
patch, while this patch could make them work fine. More details on
	https://bugzilla.kernel.org/show_bug.cgi?id=70761

The debug shows the issue is very likely to be related with the RX
DMA address, specifically 0x...f80, if RX buffer get 0x...f80 several
times, their will be RX overflow error and device will stop working.

For kernel 4.5.0 with Jarod's patch which works fine with my
AR8161/Lennov Y580, if I made some change to the
	__netdev_alloc_skb
		--> __alloc_page_frag()
to make the allocated buffer can get an address with 0x...f80,
then the same error happens. If I make it to 0x...f40 or 0x....fc0,
everything will be still fine. So I tend to believe that the
0x..f80 address cause the silicon to behave abnormally.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70761
Cc: Eric Dumazet <edumazet@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Tested-by: Ole Lukoie <olelukoie@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 19:52:27 -04:00
Ivan Vecera
f6988cb63a team: don't call netdev_change_features under team->lock
The team_device_event() notifier calls team_compute_features() to fix
vlan_features under team->lock to protect team->port_list. The problem is
that subsequent __team_compute_features() calls netdev_change_features()
to propagate vlan_features to upper vlan devices while team->lock is still
taken. This can lead to deadlock when NETIF_F_LRO is modified on lower
devices or team device itself.

Example:
The team0 as active backup with eth0 and eth1 NICs. Both eth0 & eth1 are
LRO capable and LRO is enabled. Thus LRO is also enabled on team0.

The command 'ethtool -K team0 lro off' now hangs due to this deadlock:

dev_ethtool()
-> ethtool_set_features()
 -> __netdev_update_features(team)
  -> netdev_sync_lower_features()
   -> netdev_update_features(lower_1)
    -> __netdev_update_features(lower_1)
    -> netdev_features_change(lower_1)
     -> call_netdevice_notifiers(...)
      -> team_device_event(lower_1)
       -> team_compute_features(team) [TAKES team->lock]
        -> netdev_change_features(team)
         -> __netdev_update_features(team)
          -> netdev_sync_lower_features()
           -> netdev_update_features(lower_2)
            -> __netdev_update_features(lower_2)
            -> netdev_features_change(lower_2)
             -> call_netdevice_notifiers(...)
              -> team_device_event(lower_2)
               -> team_compute_features(team) [DEADLOCK]

The bug is present in team from the beginning but it appeared after the commit
fd867d5 (net/core: generic support for disabling netdev features down stack)
that adds synchronization of features with lower devices.

Fixes: fd867d5 (net/core: generic support for disabling netdev features down stack)
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 12:58:04 -07:00
David S. Miller
f58fb33012 Merge branch 'dsa-doc-fixes'
Florian Fainelli says:

====================
Documentation: dsa: misc fixes

Here are some miscelaneous documentation fixes for DSA, I targeted "net"
because these are not functional code changes, but still documentation fixes
per-se.

Changes in v2:

- reword what the port_vlan_filtering is about based on feedback from Vivien and Ido
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 12:45:41 -07:00
Florian Fainelli
f05e2db199 Documentation: networking: dsa: Describe port_vlan_filtering
Described what the port_vlan_filtering function is supposed to
accomplish.

Fixes: fb2dabad69 ("net: dsa: support VLAN filtering switchdev attr")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 12:45:40 -07:00
Florian Fainelli
7013d8e1d0 Documentation: networking: dsa: Remove priv_size description
We no longer have a priv_size structure member since 5feebd0a8a ("net:
dsa: Remove allocation of driver private memory")

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 12:45:40 -07:00
Florian Fainelli
75c0bbd0e2 Documentation: networking: dsa: Remove poll_link description
This function has been removed in 4baee937b8 ("net: dsa: remove DSA
link polling") in favor of using the PHYLIB polling mechanism.

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 12:45:40 -07:00
Edward Cree
c0795bf64c sfc: on MC reset, clear PIO buffer linkage in TXQs
Otherwise, if we fail to allocate new PIO buffers, our TXQs will try to
use the old ones, which aren't there any more.

Fixes: 183233bec8 "sfc: Allocate and link PIO buffers; map them with write-combining"
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 12:44:11 -07:00
David S. Miller
217f97e856 Merge branch 'hwbm-locking-fixes'
Gregory CLEMENT says:

====================
Fix spinlock usage in HWBM

these two patches fix spinlock related issues introduced in v4.6. They
have been reported by Russell King and Jean-Jacques Hiblot.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 12:35:09 -07:00
Gregory CLEMENT
b388fc7405 net: hwbm: Fix unbalanced spinlock in error case
When hwbm_pool_add exited in error the spinlock was not released. This
patch fixes this issue.

Fixes: 8cb2d8bf57 ("net: add a hardware buffer management helper API")
Reported-by: Jean-Jacques Hiblot <jjhiblot@traphandler.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 12:35:09 -07:00
Gregory CLEMENT
91c45e38b9 net: mvneta: Fix lacking spinlock initialization
The spinlock used by the hwbm functions must be initialized by the
network driver. This commit fixes this lack and the following erros when
lockdep is enabled:

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
[<c010ff80>] (unwind_backtrace) from [<c010bd08>] (show_stack+0x10/0x14)
[<c010bd08>] (show_stack) from [<c032913c>] (dump_stack+0xb4/0xe0)
[<c032913c>] (dump_stack) from [<c01670e4>] (__lock_acquire+0x1f58/0x2060)
[<c01670e4>] (__lock_acquire) from [<c0167dec>] (lock_acquire+0xa4/0xd0)
[<c0167dec>] (lock_acquire) from [<c06f6650>] (_raw_spin_lock_irqsave+0x54/0x68)
[<c06f6650>] (_raw_spin_lock_irqsave) from [<c058e830>] (hwbm_pool_add+0x1c/0xdc)
[<c058e830>] (hwbm_pool_add) from [<c043f4e8>] (mvneta_bm_pool_use+0x338/0x490)
[<c043f4e8>] (mvneta_bm_pool_use) from [<c0443198>] (mvneta_probe+0x654/0x1284)
[<c0443198>] (mvneta_probe) from [<c03b894c>] (platform_drv_probe+0x4c/0xb0)
[<c03b894c>] (platform_drv_probe) from [<c03b7158>] (driver_probe_device+0x214/0x2c0)
[<c03b7158>] (driver_probe_device) from [<c03b72c4>] (__driver_attach+0xc0/0xc4)
[<c03b72c4>] (__driver_attach) from [<c03b5440>] (bus_for_each_dev+0x68/0x9c)
[<c03b5440>] (bus_for_each_dev) from [<c03b65b8>] (bus_add_driver+0x1a0/0x218)
[<c03b65b8>] (bus_add_driver) from [<c03b79cc>] (driver_register+0x78/0xf8)
[<c03b79cc>] (driver_register) from [<c01018f4>] (do_one_initcall+0x90/0x1dc)
[<c01018f4>] (do_one_initcall) from [<c0900de4>] (kernel_init_freeable+0x15c/0x1fc)
[<c0900de4>] (kernel_init_freeable) from [<c06eed90>] (kernel_init+0x8/0x114)
[<c06eed90>] (kernel_init) from [<c0107910>] (ret_from_fork+0x14/0x24)

Fixes: baa11ebc0c ("net: mvneta: Use the new hwbm framework")
Reported-by: Russell King <rmk+kernel@armlinux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 12:35:08 -07:00
Baozeng Ding
297f7d2cce tipc: fix potential null pointer dereferences in some compat functions
Before calling the nla_parse_nested function, make sure the pointer to the
attribute is not null. This patch fixes several potential null pointer
dereference vulnerabilities in the tipc netlink functions.

Signed-off-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 12:33:52 -07:00
Sudarsana Reddy Kalluru
ec7c7f5caf qed: Reset the enable flag for eth protocol.
This patch fixes the coding error in determining the enable flag for
the application/protocol. The enable flag should be set for all protocols
but the eth.

Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 12:32:41 -07:00
Vidya Sagar Ravipati
3851112e47 ethtool: add support for 25G/50G/100G speed modes
This patch enhances ethtool link mode bitmap to include
25G/50G/100G speed along with interface modes

Signed-off-by: Vidya Sagar Ravipati <vidya@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 12:26:54 -07:00
Jamal Hadi Salim
3d3ed18151 net sched actions: policer missing timestamp processing
Policer was not dumping or updating timestamps

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-24 16:23:23 -07:00
Eric Dumazet
a9efad8b24 net_sched: avoid too many hrtimer_start() calls
I found a serious performance bug in packet schedulers using hrtimers.

sch_htb and sch_fq are definitely impacted by this problem.

We constantly rearm high resolution timers if some packets are throttled
in one (or more) class, and other packets are flying through qdisc on
another (non throttled) class.

hrtimer_start() does not have the mod_timer() trick of doing nothing if
expires value does not change :

	if (timer_pending(timer) &&
            timer->expires == expires)
                return 1;

This issue is particularly visible when multiple cpus can queue/dequeue
packets on the same qdisc, as hrtimer code has to lock a remote base.

I used following fix :

1) Change htb to use qdisc_watchdog_schedule_ns() instead of open-coding
it.

2) Cache watchdog prior expiration. hrtimer might provide this, but I
prefer to not rely on some hrtimer internal.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-24 14:49:14 -07:00
Gavin Shan
3275c0c6c5 net/qlge: Avoids recursive EEH error
One timer, whose handler keeps reading on MMIO register for EEH
core to detect error in time, is started when the PCI device driver
is loaded. MMIO register can't be accessed during PE reset in EEH
recovery. Otherwise, the unexpected recursive error is triggered.
The timer isn't closed that time if the interface isn't brought
up. So the unexpected recursive error is seen during EEH recovery
when the interface is down.

This avoids the unexpected recursive EEH error by closing the timer
in qlge_io_error_detected() before EEH PE reset unconditionally. The
timer is started unconditionally after EEH PE reset in qlge_io_resume().
Also, the timer should be closed unconditionally when the device is
removed from the system permanently in qlge_io_error_detected().

Reported-by: Shriya R. Kulkarni <shriyakul@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-24 14:47:19 -07:00
Haishuang Yan
252f3f5a11 ip6_gre: Set flowi6_proto as IPPROTO_GRE in xmit path.
In gre6 xmit path, we are sending a GRE packet, so set fl6 proto
to IPPROTO_GRE properly.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-24 14:33:48 -07:00
Haishuang Yan
1b227e5366 ip6_gre: Fix MTU setting for ip6gretap
When creat an ip6gretap interface with an unreachable route,
the MTU is about 14 bytes larger than what was needed.

If the remote address is reachable:
ping6 2001:0:130::1 -c 2
PING 2001:0:130::1(2001:0:130::1) 56 data bytes
64 bytes from 2001:0:130::1: icmp_seq=1 ttl=64 time=1.46 ms
64 bytes from 2001:0:130::1: icmp_seq=2 ttl=64 time=81.1 ms

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-24 14:33:48 -07:00
Dan Carpenter
54b9430f04 qed: signedness bug in qed_dcbx_process_tlv()
"priority" needs to be signed for the error handling to work.

Fixes: 39651abd28 ('qed: add support for dcbx.')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-23 15:11:08 -07:00
Daniel Borkmann
612bacad78 bpf, inode: disallow userns mounts
Follow-up to commit e27f4a942a ("bpf: Use mount_nodev not mount_ns
to mount the bpf filesystem"), which removes the FS_USERNS_MOUNT flag.

The original idea was to have a per mountns instance instead of a
single global fs instance, but that didn't work out and we had to
switch to mount_nodev() model. The intent of that middle ground was
that we avoid users who don't play nice to create endless instances
of bpf fs which are difficult to control and discover from an admin
point of view, but at the same time it would have allowed us to be
more flexible with regard to namespaces.

Therefore, since we now did the switch to mount_nodev() as a fix
where individual instances are created, we also need to remove userns
mount flag along with it to avoid running into mentioned situation.
I don't expect any breakage at this early point in time with removing
the flag and we can revisit this later should the requirement for
this come up with future users. This and commit e27f4a942a have
been split to facilitate tracking should any of them run into the
unlikely case of causing a regression.

Fixes: b2197755b2 ("bpf: add support for persistent maps/progs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-23 15:08:26 -07:00
Geert Uytterhoeven
156f4fbcc2 MAINTAINERS: Add file patterns for net device tree bindings
Submitters of device tree binding documentation may forget to CC
the subsystem maintainer if this is missing.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-23 15:05:58 -07:00
Ezequiel Garcia
049bbf589e ipv4: Fix non-initialized TTL when CONFIG_SYSCTL=n
Commit fa50d974d1 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
moves the default TTL assignment, and as side-effect IPv4 TTL now
has a default value only if sysctl support is enabled (CONFIG_SYSCTL=y).

The sysctl_ip_default_ttl is fundamental for IP to work properly,
as it provides the TTL to be used as default. The defautl TTL may be
used in ip_selected_ttl, through the following flow:

  ip_select_ttl
    ip4_dst_hoplimit
      net->ipv4.sysctl_ip_default_ttl

This commit fixes the issue by assigning net->ipv4.sysctl_ip_default_ttl
in net_init_net, called during ipv4's initialization.

Without this commit, a kernel built without sysctl support will send
all IP packets with zero TTL (unless a TTL is explicitly set, e.g.
with setsockopt).

Given a similar issue might appear on the other knobs that were
namespaceify, this commit also moves them.

Fixes: fa50d974d1 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-23 14:32:06 -07:00
Muhammad Falak R Wani
2ece068e1b ptp: use memdup_user().
Use memdup_user to duplicate a memory region from user-space to
kernel-space, instead of open coding using kmalloc & copy_from_user.

Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-23 14:05:30 -07:00
xypron.glpk@gmx.de
8da07a393f net: hns: avoid null pointer dereference
In the statement
  assert(priv || priv->ae_handle);
the right side of || is only evaluated if priv is null.

v2:
	As suggested by David Leight and David Miller the assert
	statements are removed.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-23 13:54:03 -07:00
Stefan Hajnoczi
c685293aa3 net/atm: sk_err_soft must be positive
The sk_err and sk_err_soft fields are positive errno values and
userspace applications rely on this when using getsockopt(SO_ERROR).

ATM code places an -errno into sk_err_soft in sigd_send() and returns it
from svc_addparty()/svc_dropparty().

Although I am not familiar with ATM code I came to this conclusion
because:

1. sigd_send() msg->type cases as_okay and as_error both have:

   sk->sk_err = -msg->reply;

   while the as_addparty and as_dropparty cases have:

   sk->sk_err_soft = msg->reply;

   This is the source of the inconsistency.

2. svc_addparty() returns an -errno and assumes sk_err_soft is also an
   -errno:

       if (flags & O_NONBLOCK) {
           error = -EINPROGRESS;
           goto out;
       }
       ...
       error = xchg(&sk->sk_err_soft, 0);
   out:
       release_sock(sk);
       return error;

   This shows that sk_err_soft is indeed being treated as an -errno.

This patch ensures that sk_err_soft is always a positive errno.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-23 13:51:10 -07:00
xypron.glpk@gmx.de
bbf178e0a0 net: pegasus: simplify logical constraint
If !count is true, count < 4 is also true.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-23 13:48:45 -07:00
Linus Torvalds
7639dad93a Three more changes.
1) I forgot that I had another selftest to stress test the ftrace
    instance creation. It was actually suppose to go into the 4.6
    merge window, but I never committed it. I almost forgot about it
    again, but noticed it was missing from your tree.
 
 2) Soumya PN sent me a clean up patch to not disable interrupts when
    taking the tasklist_lock for read, as it's unnecessary because
    that lock is never taken for write in irq context.
 
 3) Newer gcc's can cause the jump in the function_graph code to the
    global ftrace_stub label to be a short jump instead of a long one.
    As that jump is dynamically converted to jump to the trace code to
    do function graph tracing, and that conversion expects a long jump
    it can corrupt the ftrace_stub itself (it's directly after that call).
    One way to prevent gcc from using a short jump is to declare the
    ftrace_stub as a weak function, which we do here to keep gcc from
    optimizing too much.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXQhYQAAoJEKKk/i67LK/82pAH/3XzRCP366HqWnKdvluPB8vX
 UnVoXGAX1Eh2ZpvlPIJBXNYOZlnGRMMMAoeI+su31FoJHrzTzfGXvRynTkZPFZtd
 XakvHfACjtGtvi2MuCN1t9/d1ty/ob2o05KB9qc+JRlzHM09qTL/HX8hwZeEsMQ4
 NYgEY4Y727LOSCrJieLktchpwtie77q8Wq25oiWIVWOyDjpCsPnZyaOqaQSANot9
 Gd00cixbMam7Ba1BjoRsRQZaT2pYZ8vt7HDXDBfAOW1oOjalWARLhRg/zww1V3WD
 DEptuEeyAgMJS3v76Z6Sbk/QM7hyGUWCcmC2qaN1yc2n1Sh+zBOiN1eyiiUh/2U=
 =ERxv
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull motr tracing updates from Steven Rostedt:
 "Three more changes.

   - I forgot that I had another selftest to stress test the ftrace
     instance creation.  It was actually suppose to go into the 4.6
     merge window, but I never committed it.  I almost forgot about it
     again, but noticed it was missing from your tree.

   - Soumya PN sent me a clean up patch to not disable interrupts when
     taking the tasklist_lock for read, as it's unnecessary because that
     lock is never taken for write in irq context.

   - Newer gcc's can cause the jump in the function_graph code to the
     global ftrace_stub label to be a short jump instead of a long one.
     As that jump is dynamically converted to jump to the trace code to
     do function graph tracing, and that conversion expects a long jump
     it can corrupt the ftrace_stub itself (it's directly after that
     call).  One way to prevent gcc from using a short jump is to
     declare the ftrace_stub as a weak function, which we do here to
     keep gcc from optimizing too much"

* tag 'trace-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace/x86: Set ftrace_stub to weak to prevent gcc from using short jumps to it
  ftrace: Don't disable irqs when taking the tasklist_lock read_lock
  ftracetest: Add instance created, delete, read and enable event test
2016-05-22 19:40:39 -07:00
Linus Torvalds
77ed402b7f Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu update from Greg Ungerer:
 "Only a single change to update my email address in the MAINTAINERS
  file"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68k: change m68knommu maintainer email address
2016-05-22 19:35:27 -07:00
Linus Torvalds
21f9debf74 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc updates from David Miller:
 "Some 32-bit kgdb cleanups from Sam Ravnborg, and a hugepage TLB flush
  overhead fix on 64-bit from Nitin Gupta"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Reduce TLB flushes during hugepte changes
  aeroflex/greth: fix warning about unused variable
  openprom: fix warning
  sparc32: drop superfluous cast in calls to __nocache_pa()
  sparc32: fix build with STRICT_MM_TYPECHECKS
  sparc32: use proper prototype for trapbase
  sparc32: drop local prototype in kgdb_32
  sparc32: drop hardcoding trap_level in kgdb_trap
2016-05-22 19:24:13 -07:00