Thankfully, the neighbour subsystem is agnostic to the upper protocol
and used by both IPv4 and IPv6. By removing assumptions regarding the
neighbour type we can thus re-use much of the neighbour-related code for
both IPv4 and IPv6.
For each nexthop, store its gateway IP and for nexthop group store the
neighbour table used by its nexthops.
Use this information throughout the code and remove assumption about the
neighbour type.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The neighbours' activity is currently dumped according to the ARP
table's DELAY_PROBE time, but with the introduction of IPv6 offload we
should set the interval according to the minimum between the ARP and
ndisc tables.
Signed-off-by: Arkadi Sharshvesky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the register so that the active IPv6 neighbours could be dumped
from the device's neighbour table.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As with IPv4, listen to NEIGH_UPDATE events from the ndisc table and
program relevant neighbours to the device's neighbour table.
Note that neighbours with a link-local IP address aren't programmed, as
packets with a link-local destination IP are trapped after LPM lookup
and never reach the neighbour table.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the register, so the IPv6 neighbours could be programmed to the
device's neighbour table.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a netdev is configured with an IP address a router interface (RIF)
should be configured for it in the device. Allow configuration of RIFs
based on IPv6 address notifications as well as IPv4.
Note that the RIF exists as long as an IP address is configured on the
netdev, regardless of the address family.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up until now we only flooded broadcast packets to the router when an L3
interface was configured on top of a bridge. However, IPv6 Neighbour
Discovery packets are trapped to the CPU inside the router and these can
be sent with a multicast address.
Flood unregistered multicast packets to the router port, so that
relevant packets could be trapped there.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before we can start using IPv6, we need to trap certain control packets
to the CPU. Among others, these include Neighbour Discovery, DHCP and
neighbour misses.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable IPv6 and IPv6 forwarding on router interfaces (RIFs), so that
they will be able to receive and forward IPv6 traffic.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before we add IPv6 constructs like traps and router interfaces, we first
need to enable IPv6 routing in the device.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull MD fixes from Shaohua Li:
- raid5-ppl fix by Artur. This one is introduced in this release cycle.
- raid5 reshape fix by Xiao. This is an old bug and will be added to
stable.
- bitmap fix by Guoqing.
* tag 'md/4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
raid5-ppl: use BIOSET_NEED_BVECS when creating bioset
Raid5 should update rdev->sectors after reshape
md/bitmap: don't read page from device with Bitmap_sync
If 'dma_set_mask_and_coherent()' fails, we must undo the previous
'pci_request_regions()' call.
Adjust corresponding 'goto' to jump at the right place of the error
handling path.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
NUMA rework of workqueue made the combination of max_active of 1 and
WQ_UNBOUND insufficient to guarantee ST behavior system wide.
alloc_ordered_queue should now be used instead.
Signed-off-by: Alexei Potashnik <alexei@purestorage.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This fixes another cause of random segfaults and bus errors that may
occur while running perf with the callgraph option.
Critical sections beginning with spin_lock_irqsave() raise the interrupt
level to PIL_NORMAL_MAX (14) and intentionally do not block performance
counter interrupts, which arrive at PIL_NMI (15).
But some sections of code are "super critical" with respect to perf
because the perf_callchain_user() path accesses user space and may cause
TLB activity as well as faults as it unwinds the user stack.
One particular critical section occurs in switch_mm:
spin_lock_irqsave(&mm->context.lock, flags);
...
load_secondary_context(mm);
tsb_context_switch(mm);
...
spin_unlock_irqrestore(&mm->context.lock, flags);
If a perf interrupt arrives in between load_secondary_context() and
tsb_context_switch(), then perf_callchain_user() could execute with
the context ID of one process, but with an active TSB for a different
process. When the user stack is accessed, it is very likely to
incur a TLB miss, since the h/w context ID has been changed. The TLB
will then be reloaded with a translation from the TSB for one process,
but using a context ID for another process. This exposes memory from
one process to another, and since it is a mapping for stack memory,
this usually causes the new process to crash quickly.
This super critical section needs more protection than is provided
by spin_lock_irqsave() since perf interrupts must not be allowed in.
Since __tsb_context_switch already goes through the trouble of
disabling interrupts completely, we fix this by moving the secondary
context load down into this better protected region.
Orabug: 25577560
Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We already have macros for values used by driver and Device Tree
sources for pin mux configuration. Use them instead of duplicating
defines.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
The global percpu variable ppp_xmit_recursion is used to detect the ppp
xmit recursion to avoid the deadlock, which is caused by one CPU tries to
lock the xmit lock twice. But it would report false recursion when one CPU
wants to send the skb from two different PPP devices, like one L2TP on the
PPPoE. It is a normal case actually.
Now use one percpu member of struct ppp instead of the gloable variable to
detect the xmit recursion of one ppp device.
Fixes: 55454a5658 ("ppp: avoid dealock on recursive xmit")
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: Liu Jianying <jianying.liu@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull rdma fixes from Doug Ledford:
"First set of -rc fixes for 4.13 cycle:
- misc iSER fixes
- namespace fixups
- fix the fact that IPoIB didn't use the proper API for noio mem allocs
- rxe driver fixes
- hns_roce fixes
- misc core fixes
- misc IPoIB fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (27 commits)
IB/core: Allow QP state transition from reset to error
IB/hns: Fix for checkpatch.pl comment style warnings
IB/hns: Fix the bug with modifying the MAC address without removing the driver
IB/hns: Fix the bug with rdma operation
IB/hns: Fix the bug with wild pointer when destroy rc qp
IB/hns: Fix the bug of polling cq failed for loopback Qps
IB/rxe: Set dma_mask and coherent_dma_mask
IB/rxe: Fix kernel panic from skb destructor
IB/ipoib: Let lower driver handle get_stats64 call
IB/core: Add ordered workqueue for RoCE GID management
IB/mlx5: Clean mr_cache debugfs in case of failure
IB/core: Remove NOIO QP create flag
{net, IB}/mlx4: Remove gfp flags argument
IB/{rdmavt, qib, hfi1}: Remove gfp flags argument
IB/IPoIB: Convert IPoIB to memalloc_noio_* calls
IB/IPoIB: Forward MTU change to driver below
IB: Convert msleep below 20ms to usleep_range
IB/uverbs: Make use of ib_modify_qp variant to avoid resolving DMAC
IB/core: Introduce modify QP operation with udata
IB/core: Don't resolve IP address to the loopback device
...
Florian Westphal says:
====================
xfrm: remove flow cache
After RCU-ification of ipsec packet path there are no major scalability
issues anymore without flow cache.
We still incur a performance hit, which comes mostly from the extra xfrm
dst allocation/freeing.
The last patch in the series adds a simple percpu cache to avoid the
extra allocation if a packet matched the same policies as last one.
The main concern with this is that we will see performance drops,
especially with large numbers of policies/SAs.
However, during hallway discussions at nfws 2017 it seemed the issues
with flow caching outweight the removal downsides, and that it
might be best to just 'remove it' and see where the practical issues
(if any) will appear.
It should now be possible to also remove the genid member in the policies
as we don't hold bundles for prolonged time anymore, but I think
this change is controversial (and intrusive) enough as-is, so defer
that to a later point in time.
Changes since last rfc:
- fix build failures due to implicit interrupt.h includes
- rework last patch (pcpu cache):
* avoid xchg()
* check policies for walk.dead = 1 instead of more costly bundle_ok().
* flush pcpu bundles when sa/policies get removed, to allow module
references to go away (suggested by Ilan Tayari)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
retain last used xfrm_dst in a pcpu cache.
On next request, reuse this dst if the policies are the same.
The cache will not help with strict RR workloads as there is no hit.
The cache packet-path part is reasonably small, the notifier part is
needed so we do not add long hangs when a device is dismantled but some
pcpu xdst still holds a reference, there are also calls to the flush
operation when userspace deletes SAs so modules can be removed
(there is no hit.
We need to run the dst_release on the correct cpu to avoid races with
packet path. This is done by adding a work_struct for each cpu and then
doing the actual test/release on each affected cpu via schedule_work_on().
Test results using 4 network namespaces and null encryption:
ns1 ns2 -> ns3 -> ns4
netperf -> xfrm/null enc -> xfrm/null dec -> netserver
what TCP_STREAM UDP_STREAM UDP_RR
Flow cache: 14644.61 294.35 327231.64
No flow cache: 14349.81 242.64 202301.72
Pcpu cache: 14629.70 292.21 205595.22
UDP tests used 64byte packets, tests ran for one minute each,
value is average over ten iterations.
'Flow cache' is 'net-next', 'No flow cache' is net-next plus this
series but without this patch.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
After rcu conversions performance degradation in forward tests isn't that
noticeable anymore.
See next patch for some numbers.
A followup patcg could then also remove genid from the policies
as we do not cache bundles anymore.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows to remove flow cache object embedded in struct xfrm_dst.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This removes the wrapper and renames the __xfrm_policy_lookup variant
to get rid of another place that used flow cache objects.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
XFRM_POLICY_IN/OUT/FWD are identical to FLOW_DIR_*, so gcc already
removed this function as its just returns the argument. Again, no
code change.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
after previous change oldflo and xdst are always NULL.
These branches were already removed by gcc, this doesn't change code.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of consulting flow cache, call the xfrm bundle/policy lookup
functions directly. This pretends the flow cache had no entry.
This helps to gradually remove flow cache integration,
followup commit will remove the dead code that this change adds.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
revert c386578f1c ("xfrm: Let the flowcache handle its size by default.").
Once we remove flow cache, we don't have a flow cache limit anymore.
We must not allow (virtually) unlimited allocations of xfrm dst entries.
Revert back to the old xfrm dst gc limits.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
these drivers use tasklets or irq apis, but don't include interrupt.h.
Once flow cache is removed the implicit interrupt.h inclusion goes away
which will break the build.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull nfsd fix from Bruce Fields:
"One fix for a problem introduced in the most recent merge window and
found by Dave Jones and KASAN"
* tag 'nfsd-4.13-1' of git://linux-nfs.org/~bfields/linux:
nfsd: Fix a memory scribble in the callback channel
Vivien Didelot says:
====================
net: dsa: mv88e6xxx: cleanup capabilities
This patch series removes the remaining capabilities as well as the
flags bitmap in the info structures. Most of them are turned into ops,
or new info members.
There is no mv88e6xxx_cap enum or bitmap flags anymore, only
mv88e6xxx_info and mv88e6xxx_ops structures.
While reviewing and documenting the related G2 registers, fix a few
inconsistencies: 88E6185 has no interrupt in G2 and 88E6390 has a POT.
Except these two adjustments, there is no functional changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of relying on a bitmap flag, add a new multi_chip info flag to
describe the presence of the indirect SMI access though the two device
registers 0x0 and 0x1.
All remaining capabilities and flags are now unused. Remove the
mv88e6xxx_cap enum and the info flags bitmaps.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 88E6352 family supports Energy Detect and has one bit for Sense and
one bit for periodically transmit NLP (Energy Detect+TM). The 88E6390
family adds another bit to distinguish Auto or SW wake-up. Chips
supporting EEE all have an EEE Enabled bit in the Port Status Register.
This patch adds new ops for the PHY Energy Detect accesses.
This also allows us to get rid of the MV88E6XXX_FLAG_EEE flag.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similarly to global1_addr, add a global2_addr member in the info
structure to describe the presence of the Global 2 Registers.
This allows us to get rid of the MV88E6XXX_FLAG_GLOBAL2 flag.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a pot_clear operation to clear the Priority Override Table and wrap
its call into a mv88e6xxx_pot_setup helper.
This allows us to get rid of the MV88E6XXX_FLAG_G2_POT flag.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 88E6390 family clear the Priority Override Table the same way as
88E6352, thus add MV88E6XXX_FLAG_G2_POT to MV88E6XXX_FLAGS_FAMILY_6390.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 88E6185 family only has one 16-bit register to mark the 16 802.1D
reserved multicast addresses in the range of 01:80:C2:00:00:0x as MGMT.
The 88E6352 family also has one 16-bit register to mark the 16 GARP
reserved multicast addresses in the range of 01:80:C2:00:00:2x as MGMT.
Split the existing mv88e6095 prefixed mgmt_rsvd2cpu operation into two
distinct mv88e6185 and mv88e6352 prefixed operations, and wrap its call
into a mv88e6xxx_rsvd2cpu_setup helper.
This allows us to also get rid of the MV88E6XXX_CAP_G2_MGMT_EN_* flags.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similarly to g1_irqs, add a g2_irqs member to the info structure to
indicates the presence of the Global 2 Interrupt Source and Mask
registers.
At the same time, provide helpers and document the registers since they
differ a bit between 88E6352 and 88E6390 families.
This allows us to get rid of the MV88E6XXX_FLAG_G2_INT flag.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 88E6185 family has no Global 2 Interrupt Source or Mask registers.
Remove the MV88E6XXX_FLAG_G2_INT from MV88E6XXX_FLAGS_FAMILY_6185.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
2610 304 8 2922 b6a sound/hda/hdac_i915.o
File size After adding 'const':
text data bss dec hex filename
2674 240 8 2922 b6a sound/hda/hdac_i915.o
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
On Exynos, 0xf is always used as value of external interrupt in pin mux
function thus a more descriptive macro name can be used.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
When setting the pin function for external interrupts, the driver used
wrong IO memory address base. The pin function register is always under
pctl_base, not the eint_base.
By updating wrong register, the external interrupts for chosen GPIO
would not work at all and some other GPIO might be configured to wrong
value. For example on Exynos5433-based boards, the external interrupts
for gpf{1-5}-X GPIOs should not work at all (driver toggled reserved
registers from ALIVE bank instead).
Platforms other than Exynos5433 should not be affected as eint_base
equals pctl_base in such case.
Fixes: 8b1bd11c1f ("pinctrl: samsung: Add the support the multiple IORESOURCE_MEM for one pin-bank")
Cc: <stable@vger.kernel.org>
Reported-by: Tomasz Figa <tomasz.figa@gmail.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Tested-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
This driver adds support for the Altera / Intel modular Scatter-Gather
Direct Memory Access (mSGDMA) intellectual property (IP) to the Linux
DMAengine subsystem. Currently it supports the following op modes:
- DMA_MEMCPY
- DMA_SG
- DMA_SLAVE
This implementation has been tested on an Altera Cyclone FPGA connected
via PCIe, both on an ARM and an x86 platform.
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>