Commit Graph

768012 Commits

Author SHA1 Message Date
Antoine Tenart
662ae3fe65 net: mvpp2: improve the distribution of packets on CPUs when using RSS
This patch adds an extra indirection when setting the indirection table
into the RSS hardware table to improve the packets distribution across
CPUs. For example, if 2 queues are used on a multi-core system this new
indirection will choose two queues on two different CPUs instead of the
two first queues which are on the same first CPU.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 17:30:48 -07:00
Antoine Tenart
8179642b52 net: mvpp2: RSS indirection table support
This patch adds the RSS indirection table support, allowing to use the
ethtool -x and -X options to dump and set this table.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
[Maxime: Small warning fixes, use one table per port]
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 17:30:48 -07:00
Maxime Chevallier
a27a254c26 net: mvpp2: use one RSS table per port
PPv2 Controller has 8 RSS Tables, of 32 entries each. A lookup in the
RXQ2RSS_TABLE is performed for each incoming packet, and the RSS Table
to be used is chosen according to the default rx queue that would be
used for the packet.

This default rx queue is set in the Lookup_id Table (also called
Decoding Table), and is equal to the port->first_rxq.

Since the Classifier itself isn't active at any time for the moment,
this doesn't have a direct effect, the default rx queue at the moment is
the one where all packets end-up into.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 17:30:48 -07:00
Maxime Chevallier
4b86097be7 net: mvpp2: fix RSS register definitions
There is no RSS_TABLE register in PPv2 Controller. The register 0x1510
which was specified is actually named "RSS_HASH_SEL", but isn't used by
this driver at all.

Based on how this register was used, it should have been the
RXQ2RSS_TABLE register, which allows to select the RSS table that will
be used for the incoming packet.

The RSS_TABLE_POINTER is actually a field of this RXQ2RSS_TABLE
register.

Since RSS tables are actually not used by the driver for now, this
commit does not fix a runtime bug.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 17:30:48 -07:00
Antoine Tenart
132baa0378 net: mvpp2: fix a typo in the RSS code
Cosmetic patch fixing a typo in one of the RSS comments.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 17:30:48 -07:00
Maxime Chevallier
f8c6ba8424 net: mvpp2: use only one rx queue per port per CPU
The number of receive queue per port is :
 - MVPP2_DEFAULT_RXQ if in single queue mode
 - MVPP2_DEFAULT_RXQ * num_possible_cpus if in multi queue mode

with MVPP2_DEFAULT_RXQ = 4.

However, we don't use the extra rx queues at the moment, we really only
need one per port per CPU, until some more advanced classification rules
are implemented.

Suggested-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 17:30:48 -07:00
Maxime Chevallier
790d32c6d3 net: mvpp2: fix hardcoded number of rx queues
There's a dedicated #define that indicates the number of rx queues per
port per cpu, this commit removes a harcoded use of that value

This doesn't fix any runtime bugs since the harcoded value matches the
expected value.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 17:30:48 -07:00
Yan Markman
4c4a5686c4 net: mvpp2: use RSS only when using multi-queue mode
Since RSS only applies when we have per-cpu rx queues, it should only
be enabled when the driver is configured to make use of multi-queue
mode.

Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Maxime: Commit message]
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 17:30:48 -07:00
Maxime Chevallier
3f6aaf7289 net: mvpp2: make multi queue mode the default mode
The multi queue mode is needed to have RSS available, and offers some
nice advantages, being able to have one rx queue vector per CPU.

This mode has been usable through the use of a module parameter, this
commit makes it the default value.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 17:30:48 -07:00
Maxime Chevallier
1e27a628e3 net: mvpp2: make sure we use single queue mode on PPv2.1
The PPv2 driver defines 2 "queue_modes" :
 - QDIST_SINGLE_MODE, where each port share one rx queue vector
   between all CPUs
 - QDIST_MULTI_MODE, where each port has one rx queue vector per CPU.

Multi queue mode isn't available on PPv2.1, make sure we fallback to
single mode when running on this revision.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 17:30:48 -07:00
Maxime Chevallier
0ad2f53906 net: mvpp2: define the number of RSS entries per table in mvpp2.h
The size of the the RSS indirection tables should be defined in mvpp2.h,
so that we can use it in all files of the PPv2 driver.

This commit moves the define in mvpp2.h, and adds the missing #include
in mvpp2_cls.h.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 17:30:47 -07:00
Maxime Chevallier
53a40025c0 net: mvpp2: fix include guards in mvpp2_prs.h
Include guards should be put before #includes. This doesn't fix any bug,
but prevent future compilation issues when adding new files in the mvpp2
driver

The Header Parser init function needs the platform_device definition,
and with the fixed include guards we need to add the missing include.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 17:30:47 -07:00
Prashant Bhole
68d2f84a13 net: gro: properly remove skb from list
Following crash occurs in validate_xmit_skb_list() when same skb is
iterated multiple times in the loop and consume_skb() is called.

The root cause is calling list_del_init(&skb->list) and not clearing
skb->next in d4546c2509. list_del_init(&skb->list) sets skb->next
to point to skb itself. skb->next needs to be cleared because other
parts of network stack uses another kind of SKB lists.
validate_xmit_skb_list() uses such list.

A similar type of bugfix was reported by Jesper Dangaard Brouer.
https://patchwork.ozlabs.org/patch/942541/

This patch clears skb->next and changes list_del_init() to list_del()
so that list->prev will maintain the list poison.

[  148.185511] ==================================================================
[  148.187865] BUG: KASAN: use-after-free in validate_xmit_skb_list+0x4b/0xa0
[  148.190158] Read of size 8 at addr ffff8801e52eefc0 by task swapper/1/0
[  148.192940]
[  148.193642] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.18.0-rc3+ #25
[  148.195423] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
[  148.199129] Call Trace:
[  148.200565]  <IRQ>
[  148.201911]  dump_stack+0xc6/0x14c
[  148.203572]  ? dump_stack_print_info.cold.1+0x2f/0x2f
[  148.205083]  ? kmsg_dump_rewind_nolock+0x59/0x59
[  148.206307]  ? validate_xmit_skb+0x2c6/0x560
[  148.207432]  ? debug_show_held_locks+0x30/0x30
[  148.208571]  ? validate_xmit_skb_list+0x4b/0xa0
[  148.211144]  print_address_description+0x6c/0x23c
[  148.212601]  ? validate_xmit_skb_list+0x4b/0xa0
[  148.213782]  kasan_report.cold.6+0x241/0x2fd
[  148.214958]  validate_xmit_skb_list+0x4b/0xa0
[  148.216494]  sch_direct_xmit+0x1b0/0x680
[  148.217601]  ? dev_watchdog+0x4e0/0x4e0
[  148.218675]  ? do_raw_spin_trylock+0x10/0x120
[  148.219818]  ? do_raw_spin_lock+0xe0/0xe0
[  148.221032]  __dev_queue_xmit+0x1167/0x1810
[  148.222155]  ? sched_clock+0x5/0x10
[...]

[  148.474257] Allocated by task 0:
[  148.475363]  kasan_kmalloc+0xbf/0xe0
[  148.476503]  kmem_cache_alloc+0xb4/0x1b0
[  148.477654]  __build_skb+0x91/0x250
[  148.478677]  build_skb+0x67/0x180
[  148.479657]  e1000_clean_rx_irq+0x542/0x8a0
[  148.480757]  e1000_clean+0x652/0xd10
[  148.481772]  net_rx_action+0x4ea/0xc20
[  148.482808]  __do_softirq+0x1f9/0x574
[  148.483831]
[  148.484575] Freed by task 0:
[  148.485504]  __kasan_slab_free+0x12e/0x180
[  148.486589]  kmem_cache_free+0xb4/0x240
[  148.487634]  kfree_skbmem+0xed/0x150
[  148.488648]  consume_skb+0x146/0x250
[  148.489665]  validate_xmit_skb+0x2b7/0x560
[  148.490754]  validate_xmit_skb_list+0x70/0xa0
[  148.491897]  sch_direct_xmit+0x1b0/0x680
[  148.493949]  __dev_queue_xmit+0x1167/0x1810
[  148.495103]  br_dev_queue_push_xmit+0xce/0x250
[  148.496196]  br_forward_finish+0x276/0x280
[  148.497234]  __br_forward+0x44f/0x520
[  148.498260]  br_forward+0x19f/0x1b0
[  148.499264]  br_handle_frame_finish+0x65e/0x980
[  148.500398]  NF_HOOK.constprop.10+0x290/0x2a0
[  148.501522]  br_handle_frame+0x417/0x640
[  148.502582]  __netif_receive_skb_core+0xaac/0x18f0
[  148.503753]  __netif_receive_skb_one_core+0x98/0x120
[  148.504958]  netif_receive_skb_internal+0xe3/0x330
[  148.506154]  napi_gro_complete+0x190/0x2a0
[  148.507243]  dev_gro_receive+0x9f7/0x1100
[  148.508316]  napi_gro_receive+0xcb/0x260
[  148.509387]  e1000_clean_rx_irq+0x2fc/0x8a0
[  148.510501]  e1000_clean+0x652/0xd10
[  148.511523]  net_rx_action+0x4ea/0xc20
[  148.512566]  __do_softirq+0x1f9/0x574
[  148.513598]
[  148.514346] The buggy address belongs to the object at ffff8801e52eefc0
[  148.514346]  which belongs to the cache skbuff_head_cache of size 232
[  148.517047] The buggy address is located 0 bytes inside of
[  148.517047]  232-byte region [ffff8801e52eefc0, ffff8801e52ef0a8)
[  148.519549] The buggy address belongs to the page:
[  148.520726] page:ffffea000794bb00 count:1 mapcount:0 mapping:ffff880106f4dfc0 index:0xffff8801e52ee840 compound_mapcount: 0
[  148.524325] flags: 0x17ffffc0008100(slab|head)
[  148.525481] raw: 0017ffffc0008100 ffff880106b938d0 ffff880106b938d0 ffff880106f4dfc0
[  148.527503] raw: ffff8801e52ee840 0000000000190011 00000001ffffffff 0000000000000000
[  148.529547] page dumped because: kasan: bad access detected

Fixes: d4546c2509 ("net: Convert GRO SKB handling to list_head.")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Reported-by: Tyler Hicks <tyhicks@canonical.com>
Tested-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 17:00:35 -07:00
David S. Miller
c8c81de96b Merge branch 's390-qeth-updates'
Julian Wiedmann says:

====================
s390/qeth: updates 2018-07-11

please apply this first batch of qeth patches for net-next. It brings the
usual cleanups, and some performance improvements to the transmit paths.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 16:42:40 -07:00
Julian Wiedmann
fb321f25e5 s390/qeth: speed-up IPv4 OSA xmit
Move the xmit of offload-eligible (ie IPv4) traffic on OSA over to the
new, copy-free path.
As with L2, we'll need to preserve the skb_orphan() behaviour of the
old code path until TX completion is sufficiently fast.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 16:42:40 -07:00
Julian Wiedmann
a647a02512 s390/qeth: speed-up L3 IQD xmit
This implements a new xmit path for L3 HiperSockets, which carves the
HW header from skb headroom instead of allocating it from the hdr cache.
It also adds NETIF_F_SG support.

The delta in qeth_l3_xmit() is all just removal of IQD-specific code and
some minor consolidation.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 16:42:40 -07:00
Julian Wiedmann
ea1d4a0c7f s390/qeth: add a L3 xmit wrapper
In preparation for future work, move the high-level xmit work into a
separate wrapper. This matches the L2 xmit code.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 16:42:39 -07:00
Julian Wiedmann
371a1e7a07 s390/qeth: increase GSO max size for eligible L3 devices
When a L3 device doesn't offer TSO, allow the stack to build full-size
GSO skbs.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 16:42:39 -07:00
Julian Wiedmann
09960b3a0a s390/qeth: clean up exported symbols
Remove some redundant EXPORTs. While at it, also move some L2-only
prototypes into the proper header file.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 16:42:39 -07:00
Julian Wiedmann
6d8769abe4 s390/qeth: consolidate ccwgroup driver definition
Reshuffle the code a bit so that everything is in one place.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 16:42:39 -07:00
Julian Wiedmann
86c0cdb9e0 s390/qeth: clean up Output Queue selection
Consolidate duplicated code, fix the misuse of RTN_UNSPEC and simplify
the handling of non-unicast traffic on IQD devices.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 16:42:39 -07:00
Julian Wiedmann
9aa17df3b8 s390/qeth: fine-tune RX modesetting
Changing a device's address lists (or its promisc mode) already triggers
an RX modeset, there's no need to do it manually from the L2 driver's
ndo_vlan_rx_kill_vid() hook.

Also when setting a device online, dev_open() already calls
dev_set_rx_mode(). So a manual modeset is only necessary from the
recovery path.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 16:42:39 -07:00
Julian Wiedmann
f67a43a73b s390/qeth: remove unused buffer->aob pointer
Except for tracing, the pointer is not used.

At the same time, accessing it from qeth_qdio_output_handler() is racy:
whenever qeth_qdio_cq_handler() gets control, its call to
qeth_qdio_handle_aob() frees the AOB.

So the AOB pointer that qeth_qdio_output_handler() stores into 'buffer'
can go stale at any time, and trigger a use-after-free.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 16:42:39 -07:00
Julian Wiedmann
3b346c1814 s390/qeth: various buffer management cleanups
Use the new qeth_scrub_qdio_buffer() helper, remove an extra parameter
from qeth_clear_output_buffer(), init the bufstates.user field just once
(in qeth_flush_buffers()) and remove some noisy trace messages.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 16:42:39 -07:00
Jesper Dangaard Brouer
0761680d52 net: ipv4: fix listify ip_rcv_finish in case of forwarding
In commit 5fa12739a5 ("net: ipv4: listify ip_rcv_finish") calling
dst_input(skb) was split-out.  The ip_sublist_rcv_finish() just calls
dst_input(skb) in a loop.

The problem is that ip_sublist_rcv_finish() forgot to remove the SKB
from the list before invoking dst_input().  Further more we need to
clear skb->next as other parts of the network stack use another kind
of SKB lists for xmit_more (see dev_hard_start_xmit).

A crash occurs if e.g. dst_input() invoke ip_forward(), which calls
dst_output()/ip_output() that eventually calls __dev_queue_xmit() +
sch_direct_xmit(), and a crash occurs in validate_xmit_skb_list().

This patch only fixes the crash, but there is a huge potential for
a performance boost if we can pass an SKB-list through to ip_forward.

Fixes: 5fa12739a5 ("net: ipv4: listify ip_rcv_finish")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 16:40:19 -07:00
Arnd Bergmann
51bef926be nfp: avoid using getnstimeofday64()
getnstimeofday64 is deprecated in favor of the ktime_get() family of
functions. The direct replacement would be ktime_get_real_ts64(),
but I'm picking the basic ktime_get() instead:

- using a ktime_t simplifies the code compared to timespec64
- using monotonic time instead of real time avoids issues caused
  by a concurrent settimeofday() or during a leap second adjustment.

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 14:55:39 -07:00
Arnd Bergmann
44c58899b0 liquidio: use ktime_get_real_ts64() instead of getnstimeofday64()
The two do the same thing, but we want to have a consistent
naming in the kernel.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 14:55:39 -07:00
David S. Miller
ccb06fba51 Merge branch 'net-sched-act_skbedit-lockless-data-path'
Davide Caratti says:

====================
net/sched: act_skbedit: lockless data path

the data path of act_skbedit can be faster if we avoid using spinlocks:
 - patch 1 converts act_skbedit statistics to use per-cpu counters
 - patch 2 lets act_skbedit use RCU to read/update its configuration

test procedure (using pktgen from https://github.com/netoptimizer):

 # ip link add name eth1 type dummy
 # ip link set dev eth1 up
 # tc qdisc add dev eth1 clsact
 # tc filter add dev eth1 egress matchall action skbedit priority c1a0:c1a0
 # for c in 1 2 4 ; do
 > ./pktgen_bench_xmit_mode_queue_xmit.sh -v -s 64 -t $c -n 5000000 -i eth1
 > done

test results (avg. pps/thread)

  $c | before patch |  after patch | improvement
 ----+--------------+--------------+------------
   1 | 3917464 ± 3% | 4000458 ± 3% |  irrelevant
   2 | 3455367 ± 4% | 3953076 ± 1% |        +14%
   4 | 2496594 ± 2% | 3801123 ± 3% |        +52%

v2: rebased on latest net-next
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 14:54:12 -07:00
Davide Caratti
c749cdda90 net/sched: act_skbedit: don't use spinlock in the data path
use RCU instead of spin_{,un}lock_bh, to protect concurrent read/write on
act_skbedit configuration. This reduces the effects of contention in the
data path, in case multiple readers are present.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 14:54:12 -07:00
Davide Caratti
6f3dfb0dc8 net/sched: skbedit: use per-cpu counters
use per-CPU counters, instead of sharing a single set of stats with all
cores: this removes the need of spinlocks when stats are read/updated.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 14:54:12 -07:00
Arnd Bergmann
cca9bab1b7 tcp: use monotonic timestamps for PAWS
Using get_seconds() for timestamps is deprecated since it can lead
to overflows on 32-bit systems. While the interface generally doesn't
overflow until year 2106, the specific implementation of the TCP PAWS
algorithm breaks in 2038 when the intermediate signed 32-bit timestamps
overflow.

A related problem is that the local timestamps in CLOCK_REALTIME form
lead to unexpected behavior when settimeofday is called to set the system
clock backwards or forwards by more than 24 days.

While the first problem could be solved by using an overflow-safe method
of comparing the timestamps, a nicer solution is to use a monotonic
clocksource with ktime_get_seconds() that simply doesn't overflow (at
least not until 136 years after boot) and that doesn't change during
settimeofday().

To make 32-bit and 64-bit architectures behave the same way here, and
also save a few bytes in the tcp_options_received structure, I'm changing
the type to a 32-bit integer, which is now safe on all architectures.

Finally, the ts_recent_stamp field also (confusingly) gets used to store
a jiffies value in tcp_synq_overflow()/tcp_synq_no_recent_overflow().
This is currently safe, but changing the type to 32-bit requires
some small changes there to keep it working.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 14:50:40 -07:00
Vakul Garg
d2bdd26812 net/tls: Use aead_request_alloc/free for request alloc/free
Instead of kzalloc/free for aead_request allocation and free, use
functions aead_request_alloc(), aead_request_free(). It ensures that
any sensitive crypto material held in crypto transforms is securely
erased from memory.

Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 14:44:11 -07:00
Pieter Jansen van Vuuren
cba54f9cf4 tc-testing: add geneve options in tunnel_key unit tests
Extend tc tunnel_key action unit tests with geneve options. Tests
include testing single and multiple geneve options, as well as
testing geneve options that are expected to fail.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 12:34:31 -07:00
Daniel Borkmann
6fd0666041 Merge branch 'bpf-arm-jit-improvements'
Russell King says:

====================
This series improves the ARM BPF JIT compiler by:

- enumerating the stack layout rather than using constants that happen
  to be multiples of four
- rejig the BPF "register" accesses to use negative numbers instead of
  positive, which could be confused with register numbers in the bpf2a32
  array.
- since we maintain the ARM FP register as a pointer to the top of our
  scratch space (or, with frame pointers enabled, a valid ARM frame
  pointer register), we can access our scratch space using FP, which is
  constant across all BPF programs, including tail-called programs.
- use immediate forms of ARM instructions where possible, rather than
  first loading the immediate into an ARM register.
- use load-with-shift instruction rather than seperate shift instruction
  followed by load
- avoid reloading index and array in the tail-call code
- use double-word load/store instructions where available

Version 2:

- Fix ARMv5 test pointed out by Olof
- Fix build error found by 0-day (adding an additional patch)
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-12 20:45:24 +02:00
Russell King
8c9602d38c ARM: net: bpf: use double-word load/stores where available
Use double-word load and stores where support for this instruction is
supported by the CPU architecture.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-12 20:45:23 +02:00
Russell King
bef8968df8 ARM: net: bpf: always use odd/even register pair
Always use an odd/even register pair for our 64-bit registers, so that
we're able to use the double-word load/store instructions in the future.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-12 20:45:23 +02:00
Russell King
b504522998 ARM: net: bpf: avoid reloading 'array'
Rearranging the order of the initial tail call code a little allows is
to avoid reloading the 'array' pointer.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-12 20:45:23 +02:00
Russell King
aaffd2f5c3 ARM: net: bpf: avoid reloading 'index'
Avoid reloading 'index' after we have validated it - it remains in
tmp2[1] up to the point that we begin the code to index the pointer
array, so with a little rearrangement of the registers, we can use
the already loaded value.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-12 20:45:23 +02:00
Russell King
2b6958ef11 ARM: net: bpf: use ldr instructions with shifted rm register
Rather than pre-shifting the rm register for the ldr in the tail call,
shift it in the load instruction.  This eliminates one unnecessary
instruction.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-12 20:45:23 +02:00
Russell King
828e2b90e8 ARM: net: bpf: use immediate forms of instructions where possible
Rather than moving constants to a register and then using them in a
subsequent instruction, use them directly in the desired instruction
cutting out the "middle" register.  This removes two instructions from
the tail call code path.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-12 20:45:23 +02:00
Russell King
1ca3b17b77 ARM: net: bpf: imm12 constant conversion
Provide a version of the imm8m() function that the compiler can optimise
when used with a constant expression.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-12 20:45:23 +02:00
Russell King
96cced4e77 ARM: net: bpf: access eBPF scratch space using ARM FP register
Access the eBPF scratch space using the frame pointer rather than our
stack pointer, as the offsets from the ARM frame pointer are constant
across all eBPF programs.

Since we no longer reference the scratch space registers from the stack
pointer, this simplifies emit_push_r64() as it no longer needs to know
how many words are pushed onto the stack.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-12 20:45:22 +02:00
Russell King
a6eccac507 ARM: net: bpf: 64-bit accessor functions for BPF registers
Provide a couple of 64-bit register accessors, and use them where
appropriate

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-12 20:45:22 +02:00
Russell King
7a98702563 ARM: net: bpf: provide accessor functions for BPF registers
Many of the code paths need to have knowledge about whether a register
is stacked or in a CPU register.  Move this decision making to a pair
of helper functions instead of having it scattered throughout the
code.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-12 20:45:22 +02:00
Russell King
47b9c3bf41 ARM: net: bpf: remove is_on_stack() and sstk/dstk
The decision about whether a BPF register is on the stack or in a CPU
register is detected at the top BPF insn processing level, and then
percolated throughout the remainder of the code.  Since we now use
negative register values to represent stacked registers, we can detect
where a BPF register is stored without restoring to carrying this
additional metadata through all code paths.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-12 20:45:22 +02:00
Russell King
1c35ba122d ARM: net: bpf: use negative numbers for stacked registers
Use negative numbers for eBPF registers that live on the stack.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-12 20:45:22 +02:00
Russell King
a8ef95a034 ARM: net: bpf: provide load/store ops with negative immediates
Provide a set of load/store opcode generators that work with negative
immediates as well as positive ones.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-12 20:45:22 +02:00
Russell King
d449ceb11b ARM: net: bpf: enumerate the JIT scratch stack layout
Enumerate the contents of the JIT scratch stack layout used for storing
some of the JITs 64-bit registers, tail call counter and AX register.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-12 20:45:22 +02:00
Daniel Borkmann
b103cbe0d7 Merge branch 'bpf-helper-man-install'
Quentin Monnet says:

====================
The three patches in this series are related to the documentation for eBPF
helpers. The first patch brings minor formatting edits to the documentation
in include/uapi/linux/bpf.h, and the second one updates the related header
file under tools/.

The third patch adds a Makefile under tools/bpf for generating the
documentation (man pages) about eBPF helpers. The targets defined in this
file can also be called from the bpftool directory (please refer to
relevant commit logs for details).
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-12 18:55:54 +02:00
Quentin Monnet
86f7d85cec tools: bpf: build and install man page for eBPF helpers from bpftool/
Provide a new Makefile.helpers in tools/bpf, in order to build and
install the man page for eBPF helpers. This Makefile is also included in
the one used to build bpftool documentation, so that it can be called
either on its own (cd tools/bpf && make -f Makefile.helpers) or from
bpftool directory (cd tools/bpf/bpftool && make doc, or
cd tools/bpf/bpftool/Documentation && make helpers).

Makefile.helpers is not added directly to bpftool to avoid changing its
Makefile too much (helpers are not 100% directly related with bpftool).
But the possibility to build the page from bpftool directory makes us
able to package the helpers man page with bpftool, and to install it
along with bpftool documentation, so that the doc for helpers becomes
easily available to developers through the "man" program.

Cc: linux-man@vger.kernel.org
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-12 18:55:53 +02:00