Commit Graph

73 Commits

Author SHA1 Message Date
Cong Wang
695176bfe5 net_sched: refactor TC action init API
TC action ->init() API has 10 parameters, it becomes harder
to read. Some of them are just boolean and can be replaced
by flags. Similarly for the internal API tcf_action_init()
and tcf_exts_validate().

This patch converts them to flags and fold them into
the upper 16 bits of "flags", whose lower 16 bits are still
reserved for user-space. More specifically, the following
kernel flags are introduced:

TCA_ACT_FLAGS_POLICE replace 'name' in a few contexts, to
distinguish whether it is compatible with policer.

TCA_ACT_FLAGS_BIND replaces 'bind', to indicate whether
this action is bound to a filter.

TCA_ACT_FLAGS_REPLACE  replaces 'ovr' in most contexts,
means we are replacing an existing action.

TCA_ACT_FLAGS_NO_RTNL replaces 'rtnl_held' but has the
opposite meaning, because we still hold RTNL in most
cases.

The only user-space flag TCA_ACT_FLAGS_NO_PERCPU_STATS is
untouched and still stored as before.

I have tested this patch with tdc and I do not see any
failure related to this patch.

Tested-by: Vlad Buslov <vladbu@nvidia.com>
Acked-by: Jamal Hadi Salim<jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-02 10:24:38 +01:00
Davide Caratti
a7a12b5a0f net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels
the following command

 # tc action add action tunnel_key \
 > set src_ip 2001:db8::1 dst_ip 2001:db8::2 id 10 erspan_opts 1:6789:0:0

generates the following splat:

 BUG: KASAN: slab-out-of-bounds in tunnel_key_copy_opts+0xcc9/0x1010 [act_tunnel_key]
 Write of size 4 at addr ffff88813f5f1cc8 by task tc/873

 CPU: 2 PID: 873 Comm: tc Not tainted 5.9.0+ #282
 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
 Call Trace:
  dump_stack+0x99/0xcb
  print_address_description.constprop.7+0x1e/0x230
  kasan_report.cold.13+0x37/0x7c
  tunnel_key_copy_opts+0xcc9/0x1010 [act_tunnel_key]
  tunnel_key_init+0x160c/0x1f40 [act_tunnel_key]
  tcf_action_init_1+0x5b5/0x850
  tcf_action_init+0x15d/0x370
  tcf_action_add+0xd9/0x2f0
  tc_ctl_action+0x29b/0x3a0
  rtnetlink_rcv_msg+0x341/0x8d0
  netlink_rcv_skb+0x120/0x380
  netlink_unicast+0x439/0x630
  netlink_sendmsg+0x719/0xbf0
  sock_sendmsg+0xe2/0x110
  ____sys_sendmsg+0x5ba/0x890
  ___sys_sendmsg+0xe9/0x160
  __sys_sendmsg+0xd3/0x170
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7f872a96b338
 Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55
 RSP: 002b:00007ffffe367518 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 000000005f8f5aed RCX: 00007f872a96b338
 RDX: 0000000000000000 RSI: 00007ffffe367580 RDI: 0000000000000003
 RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000001c
 R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000001
 R13: 0000000000686760 R14: 0000000000000601 R15: 0000000000000000

 Allocated by task 873:
  kasan_save_stack+0x19/0x40
  __kasan_kmalloc.constprop.7+0xc1/0xd0
  __kmalloc+0x151/0x310
  metadata_dst_alloc+0x20/0x40
  tunnel_key_init+0xfff/0x1f40 [act_tunnel_key]
  tcf_action_init_1+0x5b5/0x850
  tcf_action_init+0x15d/0x370
  tcf_action_add+0xd9/0x2f0
  tc_ctl_action+0x29b/0x3a0
  rtnetlink_rcv_msg+0x341/0x8d0
  netlink_rcv_skb+0x120/0x380
  netlink_unicast+0x439/0x630
  netlink_sendmsg+0x719/0xbf0
  sock_sendmsg+0xe2/0x110
  ____sys_sendmsg+0x5ba/0x890
  ___sys_sendmsg+0xe9/0x160
  __sys_sendmsg+0xd3/0x170
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

 The buggy address belongs to the object at ffff88813f5f1c00
  which belongs to the cache kmalloc-256 of size 256
 The buggy address is located 200 bytes inside of
  256-byte region [ffff88813f5f1c00, ffff88813f5f1d00)
 The buggy address belongs to the page:
 page:0000000011b48a19 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13f5f0
 head:0000000011b48a19 order:1 compound_mapcount:0
 flags: 0x17ffffc0010200(slab|head)
 raw: 0017ffffc0010200 0000000000000000 0000000d00000001 ffff888107c43400
 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff88813f5f1b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff88813f5f1c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 >ffff88813f5f1c80: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
                                               ^
  ffff88813f5f1d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff88813f5f1d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

using IPv6 tunnels, act_tunnel_key allocates a fixed amount of memory for
the tunnel metadata, but then it expects additional bytes to store tunnel
specific metadata with tunnel_key_copy_opts().

Fix the arguments of __ipv6_tun_set_dst(), so that 'md_size' contains the
size previously computed by tunnel_key_get_opts_len(), like it's done for
IPv4 tunnels.

Fixes: 0ed5269f9e ("net/sched: add tunnel option support to act_tunnel_key")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://lore.kernel.org/r/36ebe969f6d13ff59912d6464a4356fe6f103766.1603231100.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-20 21:10:41 -07:00
Cong Wang
e49d8c22f1 net_sched: defer tcf_idr_insert() in tcf_action_init_1()
All TC actions call tcf_idr_insert() for new action at the end
of their ->init(), so we can actually move it to a central place
in tcf_action_init_1().

And once the action is inserted into the global IDR, other parallel
process could free it immediately as its refcnt is still 1, so we can
not fail after this, we need to move it after the goto action
validation to avoid handling the failure case after insertion.

This is found during code review, is not directly triggered by syzbot.
And this prepares for the next patch.

Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:46:21 -07:00
Xin Long
13e6ce98aa net: sched: only keep the available bits when setting vxlan md->gbp
As we can see from vxlan_build/parse_gbp_hdr(), when processing metadata
on vxlan rx/tx path, only dont_learn/policy_applied/policy_id fields can
be set to or parse from the packet for vxlan gbp option.

So we'd better do the mask when set it in act_tunnel_key and cls_flower.
Otherwise, when users don't know these bits, they may configure with a
value which can never be matched.

Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-14 16:49:39 -07:00
Linus Torvalds
1ae78780ed Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Dynamic tick (nohz) updates, perhaps most notably changes to force
     the tick on when needed due to lengthy in-kernel execution on CPUs
     on which RCU is waiting.

   - Linux-kernel memory consistency model updates.

   - Replace rcu_swap_protected() with rcu_prepace_pointer().

   - Torture-test updates.

   - Documentation updates.

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  security/safesetid: Replace rcu_swap_protected() with rcu_replace_pointer()
  net/sched: Replace rcu_swap_protected() with rcu_replace_pointer()
  net/netfilter: Replace rcu_swap_protected() with rcu_replace_pointer()
  net/core: Replace rcu_swap_protected() with rcu_replace_pointer()
  bpf/cgroup: Replace rcu_swap_protected() with rcu_replace_pointer()
  fs/afs: Replace rcu_swap_protected() with rcu_replace_pointer()
  drivers/scsi: Replace rcu_swap_protected() with rcu_replace_pointer()
  drm/i915: Replace rcu_swap_protected() with rcu_replace_pointer()
  x86/kvm/pmu: Replace rcu_swap_protected() with rcu_replace_pointer()
  rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer()
  rcu: Suppress levelspread uninitialized messages
  rcu: Fix uninitialized variable in nocb_gp_wait()
  rcu: Update descriptions for rcu_future_grace_period tracepoint
  rcu: Update descriptions for rcu_nocb_wake tracepoint
  rcu: Remove obsolete descriptions for rcu_barrier tracepoint
  rcu: Ensure that ->rcu_urgent_qs is set before resched IPI
  workqueue: Convert for_each_wq to use built-in list check
  rcu: Several rcu_segcblist functions can be static
  rcu: Remove unused function hlist_bl_del_init_rcu()
  Documentation: Rename rcu_node_context_switch() to rcu_note_context_switch()
  ...
2019-11-26 15:42:43 -08:00
Jakub Kicinski
a9f852e92e Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Minor conflict in drivers/s390/net/qeth_l2_main.c, kept the lock
from commit c8183f5489 ("s390/qeth: fix potential deadlock on
workqueue flush"), removed the code which was removed by commit
9897d583b0 ("s390/qeth: consolidate some duplicated HW cmd code").

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-22 16:27:24 -08:00
Xin Long
e20d4ff2ac net: sched: add erspan option support to act_tunnel_key
This patch is to allow setting erspan options using the
act_tunnel_key action. Different from geneve options,
only one option can be set. And also, geneve options,
vxlan options or erspan options can't be set at the
same time.

Options are expressed as ver:index:dir:hwid, when ver
is set to 1, index will be applied while dir and hwid
will be ignored, and when ver is set to 2, dir and
hwid will be used while index will be ignored.

  # ip link add name erspan1 type erspan external
  # tc qdisc add dev eth0 ingress
  # tc filter add dev eth0 protocol ip parent ffff: \
           flower indev eth0 \
              ip_proto udp \
              action tunnel_key \
                  set src_ip 10.0.99.192 \
                  dst_ip 10.0.99.193 \
                  dst_port 6081 \
                  id 11 \
  		erspan_opts 1:2:0:0 \
          action mirred egress redirect dev erspan1

v1->v2:
  - do the validation when dst is not yet allocated as Jakub suggested.
  - use Duplicate instead of Wrong in err msg for extack.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21 11:44:06 -08:00
Xin Long
fca3f91cc3 net: sched: add vxlan option support to act_tunnel_key
This patch is to allow setting vxlan options using the
act_tunnel_key action. Different from geneve options,
only one option can be set. And also, geneve options
and vxlan options can't be set at the same time.

gbp is the only param for vxlan options:

  # ip link add name vxlan0 type vxlan dstport 0 external
  # tc qdisc add dev eth0 ingress
  # tc filter add dev eth0 protocol ip parent ffff: \
           flower indev eth0 \
              ip_proto udp \
              action tunnel_key \
                  set src_ip 10.0.99.192 \
                  dst_ip 10.0.99.193 \
                  dst_port 6081 \
                  id 11 \
  		  vxlan_opts 01020304 \
          action mirred egress redirect dev vxlan0

v1->v2:
  - add .strict_start_type for enc_opts_policy as Jakub noticed.
  - use Duplicate instead of Wrong in err msg for extack as Jakub
    suggested.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21 11:44:06 -08:00
Xin Long
4f0e97d070 net: sched: ensure opts_len <= IP_TUNNEL_OPTS_MAX in act_tunnel_key
info->options_len is 'u8' type, and when opts_len with a value >
IP_TUNNEL_OPTS_MAX, 'info->options_len = opts_len' will cast int
to u8 and set a wrong value to info->options_len.

Kernel crashed in my test when doing:

  # opts="0102:80:00800022"
  # for i in {1..99}; do opts="$opts,0102:80:00800022"; done
  # ip link add name geneve0 type geneve dstport 0 external
  # tc qdisc add dev eth0 ingress
  # tc filter add dev eth0 protocol ip parent ffff: \
       flower indev eth0 ip_proto udp action tunnel_key \
       set src_ip 10.0.99.192 dst_ip 10.0.99.193 \
       dst_port 6081 id 11 geneve_opts $opts \
       action mirred egress redirect dev geneve0

So we should do the similar check as cls_flower does, return error
when opts_len > IP_TUNNEL_OPTS_MAX in tunnel_key_copy_opts().

Fixes: 0ed5269f9e ("net/sched: add tunnel option support to act_tunnel_key")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-18 17:17:07 -08:00
Vlad Buslov
e382267860 net: sched: update action implementations to support flags
Extend struct tc_action with new "tcfa_flags" field. Set the field in
tcf_idr_create() function and provide new helper
tcf_idr_create_from_flags() that derives 'cpustats' boolean from flags
value. Update individual hardware-offloaded actions init() to pass their
"flags" argument to new helper in order to skip percpu stats allocation
when user requested it through flags.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 18:07:51 -07:00
Vlad Buslov
abbb0d3363 net: sched: extend TCA_ACT space with TCA_ACT_FLAGS
Extend TCA_ACT space with nla_bitfield32 flags. Add
TCA_ACT_FLAGS_NO_PERCPU_STATS as the only allowed flag. Parse the flags in
tcf_action_init_1() and pass resulting value as additional argument to
a_o->init().

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 18:07:50 -07:00
Vlad Buslov
5e1ad95b63 net: sched: extract bstats update code into function
Extract common code that increments cpu_bstats counter into standalone act
API function. Change hardware offloaded actions that use percpu counter
allocation to use the new function instead of incrementing cpu_bstats
directly.

This commit doesn't change functionality.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 18:07:50 -07:00
Paul E. McKenney
445d374931 net/sched: Replace rcu_swap_protected() with rcu_replace_pointer()
This commit replaces the use of rcu_swap_protected() with the more
intuitively appealing rcu_replace_pointer() as a step towards removing
rcu_swap_protected().

Link: https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=Z7-GGtM6wcvtyytXZA1+BHqta4gg6Hw@mail.gmail.com/
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
[ paulmck: From rcu_replace() to rcu_replace_pointer() per Ingo Molnar. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: <netdev@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
2019-10-30 08:45:48 -07:00
Cong Wang
981471bd3a net_sched: fix a NULL pointer deref in ipt action
The net pointer in struct xt_tgdtor_param is not explicitly
initialized therefore is still NULL when dereferencing it.
So we have to find a way to pass the correct net pointer to
ipt_destroy_target().

The best way I find is just saving the net pointer inside the per
netns struct tcf_idrinfo, which could make this patch smaller.

Fixes: 0c66dc1ea3 ("netfilter: conntrack: register hooks in netns when needed by ruleset")
Reported-and-tested-by: itugrok@yahoo.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27 15:05:58 -07:00
Dmytro Linkin
7be8ef2cdb net: sched: use temporary variable for actions indexes
Currently init call of all actions (except ipt) init their 'parm'
structure as a direct pointer to nla data in skb. This leads to race
condition when some of the filter actions were initialized successfully
(and were assigned with idr action index that was written directly
into nla data), but then were deleted and retried (due to following
action module missing or classifier-initiated retry), in which case
action init code tries to insert action to idr with index that was
assigned on previous iteration. During retry the index can be reused
by another action that was inserted concurrently, which causes
unintended action sharing between filters.
To fix described race condition, save action idr index to temporary
stack-allocated variable instead on nla data.

Fixes: 0190c1d452 ("net: sched: atomically check-allocate action")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-05 10:59:14 -07:00
Thomas Gleixner
2874c5fd28 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:26:32 -07:00
Johannes Berg
8cb081746c netlink: make validation more configurable for future strictness
We currently have two levels of strict validation:

 1) liberal (default)
     - undefined (type >= max) & NLA_UNSPEC attributes accepted
     - attribute length >= expected accepted
     - garbage at end of message accepted
 2) strict (opt-in)
     - NLA_UNSPEC attributes accepted
     - attribute length >= expected accepted

Split out parsing strictness into four different options:
 * TRAILING     - check that there's no trailing data after parsing
                  attributes (in message or nested)
 * MAXTYPE      - reject attrs > max known type
 * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
 * STRICT_ATTRS - strictly validate attribute size

The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().

Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.

We end up with the following renames:
 * nla_parse           -> nla_parse_deprecated
 * nla_parse_strict    -> nla_parse_deprecated_strict
 * nlmsg_parse         -> nlmsg_parse_deprecated
 * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
 * nla_parse_nested    -> nla_parse_nested_deprecated
 * nla_validate_nested -> nla_validate_nested_deprecated

Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.

Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.

Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.

In effect then, this adds fully strict validation for any new command.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27 17:07:21 -04:00
Michal Kubecek
ae0be8de9a netlink: make nla_nest_start() add NLA_F_NESTED flag
Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
netlink based interfaces (including recently added ones) are still not
setting it in kernel generated messages. Without the flag, message parsers
not aware of attribute semantics (e.g. wireshark dissector or libmnl's
mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
the structure of their contents.

Unfortunately we cannot just add the flag everywhere as there may be
userspace applications which check nlattr::nla_type directly rather than
through a helper masking out the flags. Therefore the patch renames
nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
are rewritten to use nla_nest_start().

Except for changes in include/net/netlink.h, the patch was generated using
this semantic patch:

@@ expression E1, E2; @@
-nla_nest_start(E1, E2)
+nla_nest_start_noflag(E1, E2)

@@ expression E1, E2; @@
-nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
+nla_nest_start(E1, E2)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27 17:03:44 -04:00
Davide Caratti
e5fdabacbf net/sched: act_tunnel_key: validate the control action inside init()
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action tunnel_key set src_ip 10.10.10.1 dst_ip 20.20.2 dst_port 3128 \
 > nocsum id 1 pass index 90
 # tc actions replace action tunnel_key \
 > set src_ip 10.10.10.1 dst_ip 20.20.2 dst_port 3128 nocsum id 1 \
 > goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action tunnel_key

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: tunnel_key  set
         src_ip 10.10.10.1
         dst_ip 20.20.2.0
         key_id 1
         dst_port 3128
         nocsum goto chain 42
          index 90 ref 2 bind 1
         cookie c1a0c1a0

then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000002aba4067 P4D 800000002aba4067 PUD 795f9067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc4.gotochain_crash+ #536
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff9346bdb83be0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff9346bb795c00 RCX: 0000000000000002
 RDX: 0000000000000000 RSI: ffff93466c881700 RDI: 0000000000000246
 RBP: ffff9346bdb83c80 R08: ffff9346b3e1e0c8 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9346b978f000
 R13: ffff9346b978f008 R14: 0000000000000001 R15: ffff93466dceeb40
 FS:  0000000000000000(0000) GS:ffff9346bdb80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000007a6c2002 CR4: 00000000001606e0
 Call Trace:
  <IRQ>
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ip6_finish_output2+0x369/0x590
  ip6_finish_output2+0x369/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.35+0x79/0xc0
  mld_sendpack+0x16f/0x220
  mld_ifc_timer_expire+0x195/0x2c0
  ? igmp6_timer_handler+0x70/0x70
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? tick_sched_timer+0x37/0x70
  __do_softirq+0xe3/0x2f5
  irq_exit+0xf0/0x100
  smp_apic_timer_interrupt+0x6c/0x130
  apic_timer_interrupt+0xf/0x20
  </IRQ>
 RIP: 0010:native_safe_halt+0x2/0x10
 Code: 55 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
 RSP: 0018:ffffa48a8038feb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
 RAX: ffffffffaa8184f0 RBX: 0000000000000003 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000003
 RBP: 0000000000000003 R08: 0011251c6fcfac49 R09: ffff9346b995be00
 R10: ffffa48a805e7ce8 R11: 00000000024c38dd R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  ? __sched_text_end+0x1/0x1
  default_idle+0x1c/0x140
  do_idle+0x1c4/0x280
  cpu_startup_entry+0x19/0x20
  start_secondary+0x1a7/0x200
  secondary_startup_64+0xa4/0xb0
 Modules linked in: act_tunnel_key veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel mbcache snd_hda_intel jbd2 snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev snd_timer snd pcspkr virtio_balloon soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops ttm net_failover virtio_console virtio_blk failover drm serio_raw crc32c_intel ata_piix virtio_pci floppy virtio_ring libata virtio dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_tunnel_key_init() proved to fix
the above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:42 -07:00
Davide Caratti
85d0966fa5 net/sched: prepare TC actions to properly validate the control action
- pass a pointer to struct tcf_proto in each actions's init() handler,
  to allow validating the control action, checking whether the chain
  exists and (eventually) refcounting it.
- remove code that validates the control action after a successful call
  to the action's init() handler, and replace it with a test that forbids
  addition of actions having 'goto_chain' and NULL goto_chain pointer at
  the same time.
- add tcf_action_check_ctrlact(), that will validate the control action
  and eventually allocate the action 'goto_chain' within the init()
  handler.
- add tcf_action_set_ctrlact(), that will assign the control action and
  swap the current 'goto_chain' pointer with the new given one.

This disallows 'goto_chain' on actions that don't initialize it properly
in their init() handler, i.e. calling tcf_action_check_ctrlact() after
successful IDR reservation and then calling tcf_action_set_ctrlact()
to assign 'goto_chain' and 'tcf_action' consistently.

By doing this, the kernel does not leak anymore refcounts when a valid
'goto chain' handle is replaced in TC actions, causing kmemleak splats
like the following one:

 # tc chain add dev dd0 chain 42 ingress protocol ip flower \
 > ip_proto tcp action drop
 # tc chain add dev dd0 chain 43 ingress protocol ip flower \
 > ip_proto udp action drop
 # tc filter add dev dd0 ingress matchall \
 > action gact goto chain 42 index 66
 # tc filter replace dev dd0 ingress matchall \
 > action gact goto chain 43 index 66
 # echo scan >/sys/kernel/debug/kmemleak
 <...>
 unreferenced object 0xffff93c0ee09f000 (size 1024):
 comm "tc", pid 2565, jiffies 4295339808 (age 65.426s)
 hex dump (first 32 bytes):
   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
   00 00 00 00 08 00 06 00 00 00 00 00 00 00 00 00  ................
 backtrace:
   [<000000009b63f92d>] tc_ctl_chain+0x3d2/0x4c0
   [<00000000683a8d72>] rtnetlink_rcv_msg+0x263/0x2d0
   [<00000000ddd88f8e>] netlink_rcv_skb+0x4a/0x110
   [<000000006126a348>] netlink_unicast+0x1a0/0x250
   [<00000000b3340877>] netlink_sendmsg+0x2c1/0x3c0
   [<00000000a25a2171>] sock_sendmsg+0x36/0x40
   [<00000000f19ee1ec>] ___sys_sendmsg+0x280/0x2f0
   [<00000000d0422042>] __sys_sendmsg+0x5e/0xa0
   [<000000007a6c61f9>] do_syscall_64+0x5b/0x180
   [<00000000ccd07542>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
   [<0000000013eaa334>] 0xffffffffffffffff

Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:41 -07:00
wenxu
4177c5d942 net/sched: act_tunnel_key: Fix double free dst_cache
dst_cache_destroy will be called in dst_release

dst_release-->dst_destroy_rcu-->dst_destroy-->metadata_dst_free
-->dst_cache_destroy

It should not call dst_cache_destroy before dst_release

Fixes: 41411e2fd6 ("net/sched: act_tunnel_key: Add dst_cache support")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-05 12:57:28 -08:00
Arnd Bergmann
096461de96 net/sched: avoid unused-label warning
The label is only used from inside the #ifdef and should be
hidden the same way, to avoid this warning:

net/sched/act_tunnel_key.c: In function 'tunnel_key_init':
net/sched/act_tunnel_key.c:389:1: error: label 'release_tun_meta' defined but not used [-Werror=unused-label]
 release_tun_meta:

Fixes: 41411e2fd6 ("net/sched: act_tunnel_key: Add dst_cache support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04 13:24:25 -08:00
David S. Miller
9eb359140c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-03-02 12:54:35 -08:00
Vlad Buslov
87750d173c net: sched: act_tunnel_key: fix metadata handling
Tunnel key action params->tcft_enc_metadata is only set when action is
TCA_TUNNEL_KEY_ACT_SET. However, metadata pointer is incorrectly
dereferenced during tunnel key init and release without verifying that
action is if correct type, which causes NULL pointer dereference. Metadata
tunnel dst_cache is also leaked on action overwrite.

Fix metadata handling:
- Verify that metadata pointer is not NULL before dereferencing it in
  tunnel_key_init error handling code.
- Move dst_cache destroy code into tunnel_key_release_params() function
  that is called in both action overwrite and release cases (fixes resource
  leak) and verifies that actions has correct type before dereferencing
  metadata pointer (fixes NULL pointer dereference).

Oops with KASAN enabled during tdc tests execution:

[  261.080482] ==================================================================
[  261.088049] BUG: KASAN: null-ptr-deref in dst_cache_destroy+0x21/0xa0
[  261.094613] Read of size 8 at addr 00000000000000b0 by task tc/2976
[  261.102524] CPU: 14 PID: 2976 Comm: tc Not tainted 5.0.0-rc7+ #157
[  261.108844] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[  261.116726] Call Trace:
[  261.119234]  dump_stack+0x9a/0xeb
[  261.122625]  ? dst_cache_destroy+0x21/0xa0
[  261.126818]  ? dst_cache_destroy+0x21/0xa0
[  261.131004]  kasan_report+0x176/0x192
[  261.134752]  ? idr_get_next+0xd0/0x120
[  261.138578]  ? dst_cache_destroy+0x21/0xa0
[  261.142768]  dst_cache_destroy+0x21/0xa0
[  261.146799]  tunnel_key_release+0x3a/0x50 [act_tunnel_key]
[  261.152392]  tcf_action_cleanup+0x2c/0xc0
[  261.156490]  tcf_generic_walker+0x4c2/0x5c0
[  261.160794]  ? tcf_action_dump_1+0x390/0x390
[  261.165163]  ? tunnel_key_walker+0x5/0x1a0 [act_tunnel_key]
[  261.170865]  ? tunnel_key_walker+0xe9/0x1a0 [act_tunnel_key]
[  261.176641]  tca_action_gd+0x600/0xa40
[  261.180482]  ? tca_get_fill.constprop.17+0x200/0x200
[  261.185548]  ? __lock_acquire+0x588/0x1d20
[  261.189741]  ? __lock_acquire+0x588/0x1d20
[  261.193922]  ? mark_held_locks+0x90/0x90
[  261.197944]  ? mark_held_locks+0x90/0x90
[  261.202018]  ? __nla_parse+0xfe/0x190
[  261.205774]  tc_ctl_action+0x218/0x230
[  261.209614]  ? tcf_action_add+0x230/0x230
[  261.213726]  rtnetlink_rcv_msg+0x3a5/0x600
[  261.217910]  ? lock_downgrade+0x2d0/0x2d0
[  261.222006]  ? validate_linkmsg+0x400/0x400
[  261.226278]  ? find_held_lock+0x6d/0xd0
[  261.230200]  ? match_held_lock+0x1b/0x210
[  261.234296]  ? validate_linkmsg+0x400/0x400
[  261.238567]  netlink_rcv_skb+0xc7/0x1f0
[  261.242489]  ? netlink_ack+0x470/0x470
[  261.246319]  ? netlink_deliver_tap+0x1f3/0x5a0
[  261.250874]  netlink_unicast+0x2ae/0x350
[  261.254884]  ? netlink_attachskb+0x340/0x340
[  261.261647]  ? _copy_from_iter_full+0xdd/0x380
[  261.268576]  ? __virt_addr_valid+0xb6/0xf0
[  261.275227]  ? __check_object_size+0x159/0x240
[  261.282184]  netlink_sendmsg+0x4d3/0x630
[  261.288572]  ? netlink_unicast+0x350/0x350
[  261.295132]  ? netlink_unicast+0x350/0x350
[  261.301608]  sock_sendmsg+0x6d/0x80
[  261.307467]  ___sys_sendmsg+0x48e/0x540
[  261.313633]  ? copy_msghdr_from_user+0x210/0x210
[  261.320545]  ? save_stack+0x89/0xb0
[  261.326289]  ? __lock_acquire+0x588/0x1d20
[  261.332605]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  261.340063]  ? mark_held_locks+0x90/0x90
[  261.346162]  ? do_filp_open+0x138/0x1d0
[  261.352108]  ? may_open_dev+0x50/0x50
[  261.357897]  ? match_held_lock+0x1b/0x210
[  261.364016]  ? __fget_light+0xa6/0xe0
[  261.369840]  ? __sys_sendmsg+0xd2/0x150
[  261.375814]  __sys_sendmsg+0xd2/0x150
[  261.381610]  ? __ia32_sys_shutdown+0x30/0x30
[  261.388026]  ? lock_downgrade+0x2d0/0x2d0
[  261.394182]  ? mark_held_locks+0x1c/0x90
[  261.400230]  ? do_syscall_64+0x1e/0x280
[  261.406172]  do_syscall_64+0x78/0x280
[  261.411932]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  261.419103] RIP: 0033:0x7f28e91a8b87
[  261.424791] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53 48 89 f3 48
[  261.448226] RSP: 002b:00007ffdc5c4e2d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  261.458183] RAX: ffffffffffffffda RBX: 000000005c73c202 RCX: 00007f28e91a8b87
[  261.467728] RDX: 0000000000000000 RSI: 00007ffdc5c4e340 RDI: 0000000000000003
[  261.477342] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000000c
[  261.486970] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001
[  261.496599] R13: 000000000067b4e0 R14: 00007ffdc5c5248c R15: 00007ffdc5c52480
[  261.506281] ==================================================================
[  261.516076] Disabling lock debugging due to kernel taint
[  261.523979] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
[  261.534413] #PF error: [normal kernel read fault]
[  261.541730] PGD 8000000317400067 P4D 8000000317400067 PUD 316878067 PMD 0
[  261.551294] Oops: 0000 [#1] SMP KASAN PTI
[  261.557985] CPU: 14 PID: 2976 Comm: tc Tainted: G    B             5.0.0-rc7+ #157
[  261.568306] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[  261.578874] RIP: 0010:dst_cache_destroy+0x21/0xa0
[  261.586413] Code: f4 ff ff ff eb f6 0f 1f 00 0f 1f 44 00 00 41 56 41 55 49 c7 c6 60 fe 35 af 41 54 55 49 89 fc 53 bd ff ff ff ff e8 ef 98 73 ff <49> 83 3c 24 00 75 35 eb 6c 4c 63 ed e8 de 98 73 ff 4a 8d 3c ed 40
[  261.611247] RSP: 0018:ffff888316447160 EFLAGS: 00010282
[  261.619564] RAX: 0000000000000000 RBX: ffff88835b3e2f00 RCX: ffffffffad1c5071
[  261.629862] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: 0000000000000297
[  261.640149] RBP: 00000000ffffffff R08: fffffbfff5dd4e89 R09: fffffbfff5dd4e89
[  261.650467] R10: 0000000000000001 R11: fffffbfff5dd4e88 R12: 00000000000000b0
[  261.660785] R13: ffff8883267a10c0 R14: ffffffffaf35fe60 R15: 0000000000000001
[  261.671110] FS:  00007f28ea3e6400(0000) GS:ffff888364200000(0000) knlGS:0000000000000000
[  261.682447] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  261.691491] CR2: 00000000000000b0 CR3: 00000003178ae004 CR4: 00000000001606e0
[  261.701283] Call Trace:
[  261.706374]  tunnel_key_release+0x3a/0x50 [act_tunnel_key]
[  261.714522]  tcf_action_cleanup+0x2c/0xc0
[  261.721208]  tcf_generic_walker+0x4c2/0x5c0
[  261.728074]  ? tcf_action_dump_1+0x390/0x390
[  261.734996]  ? tunnel_key_walker+0x5/0x1a0 [act_tunnel_key]
[  261.743247]  ? tunnel_key_walker+0xe9/0x1a0 [act_tunnel_key]
[  261.751557]  tca_action_gd+0x600/0xa40
[  261.757991]  ? tca_get_fill.constprop.17+0x200/0x200
[  261.765644]  ? __lock_acquire+0x588/0x1d20
[  261.772461]  ? __lock_acquire+0x588/0x1d20
[  261.779266]  ? mark_held_locks+0x90/0x90
[  261.785880]  ? mark_held_locks+0x90/0x90
[  261.792470]  ? __nla_parse+0xfe/0x190
[  261.798738]  tc_ctl_action+0x218/0x230
[  261.805145]  ? tcf_action_add+0x230/0x230
[  261.811760]  rtnetlink_rcv_msg+0x3a5/0x600
[  261.818564]  ? lock_downgrade+0x2d0/0x2d0
[  261.825433]  ? validate_linkmsg+0x400/0x400
[  261.832256]  ? find_held_lock+0x6d/0xd0
[  261.838624]  ? match_held_lock+0x1b/0x210
[  261.845142]  ? validate_linkmsg+0x400/0x400
[  261.851729]  netlink_rcv_skb+0xc7/0x1f0
[  261.857976]  ? netlink_ack+0x470/0x470
[  261.864132]  ? netlink_deliver_tap+0x1f3/0x5a0
[  261.870969]  netlink_unicast+0x2ae/0x350
[  261.877294]  ? netlink_attachskb+0x340/0x340
[  261.883962]  ? _copy_from_iter_full+0xdd/0x380
[  261.890750]  ? __virt_addr_valid+0xb6/0xf0
[  261.897188]  ? __check_object_size+0x159/0x240
[  261.903928]  netlink_sendmsg+0x4d3/0x630
[  261.910112]  ? netlink_unicast+0x350/0x350
[  261.916410]  ? netlink_unicast+0x350/0x350
[  261.922656]  sock_sendmsg+0x6d/0x80
[  261.928257]  ___sys_sendmsg+0x48e/0x540
[  261.934183]  ? copy_msghdr_from_user+0x210/0x210
[  261.940865]  ? save_stack+0x89/0xb0
[  261.946355]  ? __lock_acquire+0x588/0x1d20
[  261.952358]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  261.959468]  ? mark_held_locks+0x90/0x90
[  261.965248]  ? do_filp_open+0x138/0x1d0
[  261.970910]  ? may_open_dev+0x50/0x50
[  261.976386]  ? match_held_lock+0x1b/0x210
[  261.982210]  ? __fget_light+0xa6/0xe0
[  261.987648]  ? __sys_sendmsg+0xd2/0x150
[  261.993263]  __sys_sendmsg+0xd2/0x150
[  261.998613]  ? __ia32_sys_shutdown+0x30/0x30
[  262.004555]  ? lock_downgrade+0x2d0/0x2d0
[  262.010236]  ? mark_held_locks+0x1c/0x90
[  262.015758]  ? do_syscall_64+0x1e/0x280
[  262.021234]  do_syscall_64+0x78/0x280
[  262.026500]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  262.033207] RIP: 0033:0x7f28e91a8b87
[  262.038421] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53 48 89 f3 48
[  262.060708] RSP: 002b:00007ffdc5c4e2d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  262.070112] RAX: ffffffffffffffda RBX: 000000005c73c202 RCX: 00007f28e91a8b87
[  262.079087] RDX: 0000000000000000 RSI: 00007ffdc5c4e340 RDI: 0000000000000003
[  262.088122] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000000c
[  262.097157] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001
[  262.106207] R13: 000000000067b4e0 R14: 00007ffdc5c5248c R15: 00007ffdc5c52480
[  262.115271] Modules linked in: act_tunnel_key act_skbmod act_simple act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 act_csum libcrc32c act_meta_skbtcindex act_meta_skbprio act_meta_mark act_ife ife act_police act_sample psample act_gact veth nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc intel_rapl sb_edac mlx5_ib x86_pkg_temp_thermal sunrpc intel_powerclamp coretemp ib_uverbs kvm_intel ib_core kvm irqbypass mlx5_core crct10dif_pclmul crc32_pclmul crc32c_intel igb ghash_clmulni_intel intel_cstate mlxfw iTCO_wdt devlink intel_uncore iTCO_vendor_support ipmi_ssif ptp mei_me intel_rapl_perf ioatdma joydev pps_core ses mei i2c_i801 pcspkr enclosure lpc_ich dca wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter pcc_cpufreq ast i2c_algo_bit drm_kms_helper ttm drm mpt3sas raid_class scsi_transport_sas
[  262.204393] CR2: 00000000000000b0
[  262.210390] ---[ end trace 2e41d786f2c7901a ]---
[  262.226790] RIP: 0010:dst_cache_destroy+0x21/0xa0
[  262.234083] Code: f4 ff ff ff eb f6 0f 1f 00 0f 1f 44 00 00 41 56 41 55 49 c7 c6 60 fe 35 af 41 54 55 49 89 fc 53 bd ff ff ff ff e8 ef 98 73 ff <49> 83 3c 24 00 75 35 eb 6c 4c 63 ed e8 de 98 73 ff 4a 8d 3c ed 40
[  262.258311] RSP: 0018:ffff888316447160 EFLAGS: 00010282
[  262.266304] RAX: 0000000000000000 RBX: ffff88835b3e2f00 RCX: ffffffffad1c5071
[  262.276251] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: 0000000000000297
[  262.286208] RBP: 00000000ffffffff R08: fffffbfff5dd4e89 R09: fffffbfff5dd4e89
[  262.296183] R10: 0000000000000001 R11: fffffbfff5dd4e88 R12: 00000000000000b0
[  262.306157] R13: ffff8883267a10c0 R14: ffffffffaf35fe60 R15: 0000000000000001
[  262.316139] FS:  00007f28ea3e6400(0000) GS:ffff888364200000(0000) knlGS:0000000000000000
[  262.327146] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  262.335815] CR2: 00000000000000b0 CR3: 00000003178ae004 CR4: 00000000001606e0

Fixes: 41411e2fd6 ("net/sched: act_tunnel_key: Add dst_cache support")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27 21:31:15 -08:00
Vlad Buslov
a3df633a3c net: sched: act_tunnel_key: fix NULL pointer dereference during init
Metadata pointer is only initialized for action TCA_TUNNEL_KEY_ACT_SET, but
it is unconditionally dereferenced in tunnel_key_init() error handler.
Verify that metadata pointer is not NULL before dereferencing it in
tunnel_key_init error handling code.

Fixes: ee28bb56ac ("net/sched: fix memory leak in act_tunnel_key_init()")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25 10:13:38 -08:00
wenxu
41411e2fd6 net/sched: act_tunnel_key: Add dst_cache support
The metadata_dst is not init the dst_cache which make the
ip_md_tunnel_xmit can't use the dst_cache. It will lookup
route table every packets.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24 21:51:55 -08:00
Eli Cohen
eddd2cf195 net: Change TCA_ACT_* to TCA_ID_* to match that of TCA_ID_POLICE
Modify the kernel users of the TCA_ACT_* macros to use TCA_ID_*. For
example, use TCA_ID_GACT instead of TCA_ACT_GACT. This will align with
TCA_ID_POLICE and also differentiates these identifier, used in struct
tc_action_ops type field, from other macros starting with TCA_ACT_.

To make things clearer, we name the enum defining the TCA_ID_*
identifiers and also change the "type" field of struct tc_action to
id.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-10 09:28:43 -08:00
Davide Caratti
9174c3df1c net/sched: act_tunnel_key: fix memory leak in case of action replace
running the following TDC test cases:

 7afc - Replace tunnel_key set action with all parameters
 364d - Replace tunnel_key set action with all parameters and cookie

it's possible to trigger kmemleak warnings like:

  unreferenced object 0xffff94797127ab40 (size 192):
  comm "tc", pid 3248, jiffies 4300565293 (age 1006.862s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 c0 93 f9 8a ff ff ff ff  ................
    41 84 ee 89 ff ff ff ff 00 00 00 00 00 00 00 00  A...............
  backtrace:
    [<000000001e85b61c>] tunnel_key_init+0x31d/0x820 [act_tunnel_key]
    [<000000007f3f6ee7>] tcf_action_init_1+0x384/0x4c0
    [<00000000e89e3ded>] tcf_action_init+0x12b/0x1a0
    [<00000000c1c8c0f8>] tcf_action_add+0x73/0x170
    [<0000000095a9fc28>] tc_ctl_action+0x122/0x160
    [<000000004bebeac5>] rtnetlink_rcv_msg+0x263/0x2d0
    [<000000009fd862dd>] netlink_rcv_skb+0x4a/0x110
    [<00000000b55199e7>] netlink_unicast+0x1a0/0x250
    [<000000004996cd21>] netlink_sendmsg+0x2c1/0x3c0
    [<000000004d6a94b4>] sock_sendmsg+0x36/0x40
    [<000000005d9f0208>] ___sys_sendmsg+0x280/0x2f0
    [<00000000dec19023>] __sys_sendmsg+0x5e/0xa0
    [<000000004b82ac81>] do_syscall_64+0x5b/0x180
    [<00000000a0f1209a>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [<000000002926b2ab>] 0xffffffffffffffff

when the tunnel_key action is replaced, the kernel forgets to release the
dst metadata: ensure they are released by tunnel_key_init(), the same way
it's done in tunnel_key_release().

Fixes: d0f6dd8a91 ("net/sched: Introduce act_tunnel_key")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-15 21:38:48 -08:00
Adi Nissim
1c25324caf net/sched: act_tunnel_key: Don't dump dst port if it wasn't set
It's possible to set a tunnel without a destination port. However,
on dump(), a zero dst port is returned to user space even if it was not
set, fix that.

Note that so far it wasn't required, b/c key less tunnels were not
supported and the UDP tunnels do require destination port.

Signed-off-by: Adi Nissim <adin@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04 20:53:37 -08:00
Adi Nissim
80ef0f22ce net/sched: act_tunnel_key: Allow key-less tunnels
Allow setting a tunnel without a tunnel key. This is required for
tunneling protocols, such as GRE, that define the key as an optional
field.

Signed-off-by: Adi Nissim <adin@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04 20:53:37 -08:00
David S. Miller
aaf9253025 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-09-12 22:22:42 -07:00
Cong Wang
a162c35114 net_sched: properly cancel netlink dump on failure
When nla_put*() fails after nla_nest_start(), we need
to call nla_nest_cancel() to cancel the message, otherwise
we end up calling nla_nest_end() like a success.

Fixes: 0ed5269f9e ("net/sched: add tunnel option support to act_tunnel_key")
Cc: Davide Caratti <dcaratti@redhat.com>
Cc: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-07 23:05:07 -07:00
Davide Caratti
ee28bb56ac net/sched: fix memory leak in act_tunnel_key_init()
If users try to install act_tunnel_key 'set' rules with duplicate values
of 'index', the tunnel metadata are allocated, but never released. Then,
kmemleak complains as follows:

 # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
 # echo clear > /sys/kernel/debug/kmemleak
 # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
 Error: TC IDR already exists.
 We have an error talking to the kernel
 # echo scan > /sys/kernel/debug/kmemleak
 # cat /sys/kernel/debug/kmemleak
 unreferenced object 0xffff8800574e6c80 (size 256):
   comm "tc", pid 5617, jiffies 4298118009 (age 57.990s)
   hex dump (first 32 bytes):
     00 00 00 00 00 00 00 00 00 1c e8 b0 ff ff ff ff  ................
     81 24 c2 ad ff ff ff ff 00 00 00 00 00 00 00 00  .$..............
   backtrace:
     [<00000000b7afbf4e>] tunnel_key_init+0x8a5/0x1800 [act_tunnel_key]
     [<000000007d98fccd>] tcf_action_init_1+0x698/0xac0
     [<0000000099b8f7cc>] tcf_action_init+0x15c/0x590
     [<00000000dc60eebe>] tc_ctl_action+0x336/0x5c2
     [<000000002f5a2f7d>] rtnetlink_rcv_msg+0x357/0x8e0
     [<000000000bfe7575>] netlink_rcv_skb+0x124/0x350
     [<00000000edab656f>] netlink_unicast+0x40f/0x5d0
     [<00000000b322cdcb>] netlink_sendmsg+0x6e8/0xba0
     [<0000000063d9d490>] sock_sendmsg+0xb3/0xf0
     [<00000000f0d3315a>] ___sys_sendmsg+0x654/0x960
     [<00000000c06cbd42>] __sys_sendmsg+0xd3/0x170
     [<00000000ce72e4b0>] do_syscall_64+0xa5/0x470
     [<000000005caa2d97>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
     [<00000000fac1b476>] 0xffffffffffffffff

This problem theoretically happens also in case users attempt to setup a
geneve rule having wrong configuration data, or when the kernel fails to
allocate 'params_new'. Ensure that tunnel_key_init() releases the tunnel
metadata also in the above conditions.

Addresses-Coverity-ID: 1373974 ("Resource leak")
Fixes: d0f6dd8a91 ("net/sched: Introduce act_tunnel_key")
Fixes: 0ed5269f9e ("net/sched: add tunnel option support to act_tunnel_key")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:18:54 -07:00
Cong Wang
f061b48c17 Revert "net: sched: act: add extack for lookup callback"
This reverts commit 331a9295de ("net: sched: act: add extack for lookup callback").

This extack is never used after 6 months... In fact, it can be just
set in the caller, right after ->lookup().

Cc: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 22:50:15 -07:00
Cong Wang
97a3f84f2c net_sched: remove unnecessary ops->delete()
All ops->delete() wants is getting the tn->idrinfo, but we already
have tc_action before calling ops->delete(), and tc_action has
a pointer ->idrinfo.

More importantly, each type of action does the same thing, that is,
just calling tcf_idr_delete_index().

So it can be just removed.

Fixes: b409074e66 ("net: sched: add 'delete' function to action ops")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-21 12:45:44 -07:00
Vlad Buslov
653cd284a8 net: sched: always disable bh when taking tcf_lock
Recently, ops->init() and ops->dump() of all actions were modified to
always obtain tcf_lock when accessing private action state. Actions that
don't depend on tcf_lock for synchronization with their data path use
non-bh locking API. However, tcf_lock is also used to protect rate
estimator stats in softirq context by timer callback.

Change ops->init() and ops->dump() of all actions to disable bh when using
tcf_lock to prevent deadlock reported by following lockdep warning:

[  105.470398] ================================
[  105.475014] WARNING: inconsistent lock state
[  105.479628] 4.18.0-rc8+ #664 Not tainted
[  105.483897] --------------------------------
[  105.488511] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[  105.494871] swapper/16/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[  105.500449] 00000000f86c012e (&(&p->tcfa_lock)->rlock){+.?.}, at: est_fetch_counters+0x3c/0xa0
[  105.509696] {SOFTIRQ-ON-W} state was registered at:
[  105.514925]   _raw_spin_lock+0x2c/0x40
[  105.519022]   tcf_bpf_init+0x579/0x820 [act_bpf]
[  105.523990]   tcf_action_init_1+0x4e4/0x660
[  105.528518]   tcf_action_init+0x1ce/0x2d0
[  105.532880]   tcf_exts_validate+0x1d8/0x200
[  105.537416]   fl_change+0x55a/0x268b [cls_flower]
[  105.542469]   tc_new_tfilter+0x748/0xa20
[  105.546738]   rtnetlink_rcv_msg+0x56a/0x6d0
[  105.551268]   netlink_rcv_skb+0x18d/0x200
[  105.555628]   netlink_unicast+0x2d0/0x370
[  105.559990]   netlink_sendmsg+0x3b9/0x6a0
[  105.564349]   sock_sendmsg+0x6b/0x80
[  105.568271]   ___sys_sendmsg+0x4a1/0x520
[  105.572547]   __sys_sendmsg+0xd7/0x150
[  105.576655]   do_syscall_64+0x72/0x2c0
[  105.580757]   entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  105.586243] irq event stamp: 489296
[  105.590084] hardirqs last  enabled at (489296): [<ffffffffb507e639>] _raw_spin_unlock_irq+0x29/0x40
[  105.599765] hardirqs last disabled at (489295): [<ffffffffb507e745>] _raw_spin_lock_irq+0x15/0x50
[  105.609277] softirqs last  enabled at (489292): [<ffffffffb413a6a3>] irq_enter+0x83/0xa0
[  105.618001] softirqs last disabled at (489293): [<ffffffffb413a800>] irq_exit+0x140/0x190
[  105.626813]
               other info that might help us debug this:
[  105.633976]  Possible unsafe locking scenario:

[  105.640526]        CPU0
[  105.643325]        ----
[  105.646125]   lock(&(&p->tcfa_lock)->rlock);
[  105.650747]   <Interrupt>
[  105.653717]     lock(&(&p->tcfa_lock)->rlock);
[  105.658514]
                *** DEADLOCK ***

[  105.665349] 1 lock held by swapper/16/0:
[  105.669629]  #0: 00000000a640ad99 ((&est->timer)){+.-.}, at: call_timer_fn+0x10b/0x550
[  105.678200]
               stack backtrace:
[  105.683194] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 4.18.0-rc8+ #664
[  105.690249] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[  105.698626] Call Trace:
[  105.701421]  <IRQ>
[  105.703791]  dump_stack+0x92/0xeb
[  105.707461]  print_usage_bug+0x336/0x34c
[  105.711744]  mark_lock+0x7c9/0x980
[  105.715500]  ? print_shortest_lock_dependencies+0x2e0/0x2e0
[  105.721424]  ? check_usage_forwards+0x230/0x230
[  105.726315]  __lock_acquire+0x923/0x26f0
[  105.730597]  ? debug_show_all_locks+0x240/0x240
[  105.735478]  ? mark_lock+0x493/0x980
[  105.739412]  ? check_chain_key+0x140/0x1f0
[  105.743861]  ? __lock_acquire+0x836/0x26f0
[  105.748323]  ? lock_acquire+0x12e/0x290
[  105.752516]  lock_acquire+0x12e/0x290
[  105.756539]  ? est_fetch_counters+0x3c/0xa0
[  105.761084]  _raw_spin_lock+0x2c/0x40
[  105.765099]  ? est_fetch_counters+0x3c/0xa0
[  105.769633]  est_fetch_counters+0x3c/0xa0
[  105.773995]  est_timer+0x87/0x390
[  105.777670]  ? est_fetch_counters+0xa0/0xa0
[  105.782210]  ? lock_acquire+0x12e/0x290
[  105.786410]  call_timer_fn+0x161/0x550
[  105.790512]  ? est_fetch_counters+0xa0/0xa0
[  105.795055]  ? del_timer_sync+0xd0/0xd0
[  105.799249]  ? __lock_is_held+0x93/0x110
[  105.803531]  ? mark_held_locks+0x20/0xe0
[  105.807813]  ? _raw_spin_unlock_irq+0x29/0x40
[  105.812525]  ? est_fetch_counters+0xa0/0xa0
[  105.817069]  ? est_fetch_counters+0xa0/0xa0
[  105.821610]  run_timer_softirq+0x3c4/0x9f0
[  105.826064]  ? lock_acquire+0x12e/0x290
[  105.830257]  ? __bpf_trace_timer_class+0x10/0x10
[  105.835237]  ? __lock_is_held+0x25/0x110
[  105.839517]  __do_softirq+0x11d/0x7bf
[  105.843542]  irq_exit+0x140/0x190
[  105.847208]  smp_apic_timer_interrupt+0xac/0x3b0
[  105.852182]  apic_timer_interrupt+0xf/0x20
[  105.856628]  </IRQ>
[  105.859081] RIP: 0010:cpuidle_enter_state+0xd8/0x4d0
[  105.864395] Code: 46 ff 48 89 44 24 08 0f 1f 44 00 00 31 ff e8 cf ec 46 ff 80 7c 24 07 00 0f 85 1d 02 00 00 e8 9f 90 4b ff fb 66 0f 1f 44 00 00 <4c> 8b 6c 24 08 4d 29 fd 0f 80 36 03 00 00 4c 89 e8 48 ba cf f7 53
[  105.884288] RSP: 0018:ffff8803ad94fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[  105.892494] RAX: 0000000000000000 RBX: ffffe8fb300829c0 RCX: ffffffffb41e19e1
[  105.899988] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8803ad9358ac
[  105.907503] RBP: ffffffffb6636300 R08: 0000000000000004 R09: 0000000000000000
[  105.914997] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
[  105.922487] R13: ffffffffb6636140 R14: ffffffffb66362d8 R15: 000000188d36091b
[  105.929988]  ? trace_hardirqs_on_caller+0x141/0x2d0
[  105.935232]  do_idle+0x28e/0x320
[  105.938817]  ? arch_cpu_idle_exit+0x40/0x40
[  105.943361]  ? mark_lock+0x8c1/0x980
[  105.947295]  ? _raw_spin_unlock_irqrestore+0x32/0x60
[  105.952619]  cpu_startup_entry+0xc2/0xd0
[  105.956900]  ? cpu_in_idle+0x20/0x20
[  105.960830]  ? _raw_spin_unlock_irqrestore+0x32/0x60
[  105.966146]  ? trace_hardirqs_on_caller+0x141/0x2d0
[  105.971391]  start_secondary+0x2b5/0x360
[  105.975669]  ? set_cpu_sibling_map+0x1330/0x1330
[  105.980654]  secondary_startup_64+0xa5/0xb0

Taking tcf_lock in sample action with bh disabled causes lockdep to issue a
warning regarding possible irq lock inversion dependency between tcf_lock,
and psample_groups_lock that is taken when holding tcf_lock in sample init:

[  162.108959]  Possible interrupt unsafe locking scenario:

[  162.116386]        CPU0                    CPU1
[  162.121277]        ----                    ----
[  162.126162]   lock(psample_groups_lock);
[  162.130447]                                local_irq_disable();
[  162.136772]                                lock(&(&p->tcfa_lock)->rlock);
[  162.143957]                                lock(psample_groups_lock);
[  162.150813]   <Interrupt>
[  162.153808]     lock(&(&p->tcfa_lock)->rlock);
[  162.158608]
                *** DEADLOCK ***

In order to prevent potential lock inversion dependency between tcf_lock
and psample_groups_lock, extract call to psample_group_get() from tcf_lock
protected section in sample action init function.

Fixes: 4e232818bd ("net: sched: act_mirred: remove dependency on rtnl lock")
Fixes: 764e9a2448 ("net: sched: act_vlan: remove dependency on rtnl lock")
Fixes: 729e012609 ("net: sched: act_tunnel_key: remove dependency on rtnl lock")
Fixes: d772849566 ("net: sched: act_sample: remove dependency on rtnl lock")
Fixes: e8917f4370 ("net: sched: act_gact: remove dependency on rtnl lock")
Fixes: b6a2b971c0 ("net: sched: act_csum: remove dependency on rtnl lock")
Fixes: 2142236b45 ("net: sched: act_bpf: remove dependency on rtnl lock")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-19 10:46:21 -07:00
Vlad Buslov
729e012609 net: sched: act_tunnel_key: remove dependency on rtnl lock
Use tcf lock to protect tunnel key action struct private data from
concurrent modification in init and dump. Use rcu swap operation to
reassign params pointer under protection of tcf lock. (old params value is
not used by init, so there is no need of standalone rcu dereference step)

Remove rtnl lock assertion that is no longer required.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:37:10 -07:00
Paolo Abeni
7fd4b288ea tc/act: remove unneeded RCU lock in action callback
Each lockless action currently does its own RCU locking in ->act().
This allows using plain RCU accessor, even if the context
is really RCU BH.

This change drops the per action RCU lock, replace the accessors
with the _bh variant, cleans up a bit the surrounding code and
documents the RCU status in the relevant header.
No functional nor performance change is intended.

The goal of this patch is clarifying that the RCU critical section
used by the tc actions extends up to the classifier's caller.

v1 -> v2:
 - preserve rcu lock in act_bpf: it's needed by eBPF helpers,
   as pointed out by Daniel

v3 -> v4:
 - fixed some typos in the commit message (JiriP)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30 09:31:13 -07:00
David S. Miller
c4c5551df1 Merge ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux
All conflicts were trivial overlapping changes, so reasonably
easy to resolve.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20 21:17:12 -07:00
Or Gerlitz
07a557f47d net/sched: tunnel_key: Allow to set tos and ttl for tc based ip tunnels
Allow user-space to provide tos and ttl to be set for the tunnel headers.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19 23:26:01 -07:00
Vlad Buslov
0190c1d452 net: sched: atomically check-allocate action
Implement function that atomically checks if action exists and either takes
reference to it, or allocates idr slot for action index to prevent
concurrent allocations of actions with same index. Use EBUSY error pointer
to indicate that idr slot is reserved.

Implement cleanup helper function that removes temporary error pointer from
idr. (in case of error between idr allocation and insertion of newly
created action to specified index)

Refactor all action init functions to insert new action to idr using this
API.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-08 12:42:29 +09:00
Vlad Buslov
4e8ddd7f17 net: sched: don't release reference on action overwrite
Return from action init function with reference to action taken,
even when overwriting existing action.

Action init API initializes its fourth argument (pointer to pointer to tc
action) to either existing action with same index or newly created action.
In case of existing index(and bind argument is zero), init function returns
without incrementing action reference counter. Caller of action init then
proceeds working with action, without actually holding reference to it.
This means that action could be deleted concurrently.

Change action init behavior to always take reference to action before
returning successfully, in order to protect from concurrent deletion.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-08 12:42:29 +09:00
Vlad Buslov
b409074e66 net: sched: add 'delete' function to action ops
Extend action ops with 'delete' function. Each action type to implements
its own delete function that doesn't depend on rtnl lock.

Implement delete function that is required to delete actions without
holding rtnl lock. Use action API function that atomically deletes action
only if it is still in action idr. This implementation prevents concurrent
threads from deleting same action twice.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-08 12:42:29 +09:00
Vlad Buslov
789871bb2a net: sched: implement unlocked action init API
Add additional 'rtnl_held' argument to act API init functions. It is
required to implement actions that need to release rtnl lock before loading
kernel module and reacquire if afterwards.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-08 12:42:28 +09:00
Vlad Buslov
036bb44327 net: sched: change type of reference and bind counters
Change type of action reference counter to refcount_t.

Change type of action bind counter to atomic_t.
This type is used to allow decrementing bind counter without testing
for 0 result.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-08 12:42:28 +09:00
Davide Caratti
38230a3e0e net/sched: act_tunnel_key: fix NULL dereference when 'goto chain' is used
the control action in the common member of struct tcf_tunnel_key must be a
valid value, as it can contain the chain index when 'goto chain' is used.
Ensure that the control action can be read as x->tcfa_action, when x is a
pointer to struct tc_action and x->ops->type is TCA_ACT_TUNNEL_KEY, to
prevent the following command:

 # tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \
 > $tcflags dst_mac $h2mac action tunnel_key unset goto chain 1

from causing a NULL dereference when a matching packet is received:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 PGD 80000001097ac067 P4D 80000001097ac067 PUD 103b0a067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 0 PID: 3491 Comm: mausezahn Tainted: G            E     4.18.0-rc2.auguri+ #421
 Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.58 02/07/2013
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff95145ea03c40 EFLAGS: 00010246
 RAX: 0000000020000001 RBX: ffff9514499e5800 RCX: 0000000000000001
 RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
 RBP: ffff95145ea03e60 R08: 0000000000000000 R09: ffff95145ea03c9c
 R10: ffff95145ea03c78 R11: 0000000000000008 R12: ffff951456a69800
 R13: ffff951456a69808 R14: 0000000000000001 R15: ffff95144965ee40
 FS:  00007fd67ee11740(0000) GS:ffff95145ea00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000001038a2006 CR4: 00000000001606f0
 Call Trace:
  <IRQ>
  fl_classify+0x1ad/0x1c0 [cls_flower]
  ? __update_load_avg_se.isra.47+0x1ca/0x1d0
  ? __update_load_avg_se.isra.47+0x1ca/0x1d0
  ? update_load_avg+0x665/0x690
  ? update_load_avg+0x665/0x690
  ? kmem_cache_alloc+0x38/0x1c0
  tcf_classify+0x89/0x140
  __netif_receive_skb_core+0x5ea/0xb70
  ? enqueue_entity+0xd0/0x270
  ? process_backlog+0x97/0x150
  process_backlog+0x97/0x150
  net_rx_action+0x14b/0x3e0
  __do_softirq+0xde/0x2b4
  do_softirq_own_stack+0x2a/0x40
  </IRQ>
  do_softirq.part.18+0x49/0x50
  __local_bh_enable_ip+0x49/0x50
  __dev_queue_xmit+0x4ab/0x8a0
  ? wait_woken+0x80/0x80
  ? packet_sendmsg+0x38f/0x810
  ? __dev_queue_xmit+0x8a0/0x8a0
  packet_sendmsg+0x38f/0x810
  sock_sendmsg+0x36/0x40
  __sys_sendto+0x10e/0x140
  ? do_vfs_ioctl+0xa4/0x630
  ? syscall_trace_enter+0x1df/0x2e0
  ? __audit_syscall_exit+0x22a/0x290
  __x64_sys_sendto+0x24/0x30
  do_syscall_64+0x5b/0x180
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7fd67e18dc93
 Code: 48 8b 0d 18 83 20 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 59 c7 20 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 2b f7 ff ff 48 89 04 24
 RSP: 002b:00007ffe0189b748 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 00000000020ca010 RCX: 00007fd67e18dc93
 RDX: 0000000000000062 RSI: 00000000020ca322 RDI: 0000000000000003
 RBP: 00007ffe0189b780 R08: 00007ffe0189b760 R09: 0000000000000014
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000062
 R13: 00000000020ca322 R14: 00007ffe0189b760 R15: 0000000000000003
 Modules linked in: act_tunnel_key act_gact cls_flower sch_ingress vrf veth act_csum(E) xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter intel_rapl snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek coretemp snd_hda_codec_generic kvm_intel kvm irqbypass snd_hda_intel crct10dif_pclmul crc32_pclmul hp_wmi ghash_clmulni_intel pcbc snd_hda_codec aesni_intel sparse_keymap rfkill snd_hda_core snd_hwdep snd_seq crypto_simd iTCO_wdt gpio_ich iTCO_vendor_support wmi_bmof cryptd mei_wdt glue_helper snd_seq_device snd_pcm pcspkr snd_timer snd i2c_i801 lpc_ich sg soundcore wmi mei_me
  mei ie31200_edac nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod sr_mod cdrom i915 video i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci crc32c_intel libahci serio_raw sfc libata mtd drm ixgbe mdio i2c_core e1000e dca
 CR2: 0000000000000000
 ---[ end trace 1ab8b5b5d4639dfc ]---
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff95145ea03c40 EFLAGS: 00010246
 RAX: 0000000020000001 RBX: ffff9514499e5800 RCX: 0000000000000001
 RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
 RBP: ffff95145ea03e60 R08: 0000000000000000 R09: ffff95145ea03c9c
 R10: ffff95145ea03c78 R11: 0000000000000008 R12: ffff951456a69800
 R13: ffff951456a69808 R14: 0000000000000001 R15: ffff95144965ee40
 FS:  00007fd67ee11740(0000) GS:ffff95145ea00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000001038a2006 CR4: 00000000001606f0
 Kernel panic - not syncing: Fatal exception in interrupt
 Kernel Offset: 0x11400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
 ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Fixes: d0f6dd8a91 ("net/sched: Introduce act_tunnel_key")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-07 22:01:08 +09:00
Simon Horman
0ed5269f9e net/sched: add tunnel option support to act_tunnel_key
Allow setting tunnel options using the act_tunnel_key action.

Options are expressed as class:type:data and multiple options
may be listed using a comma delimiter.

 # ip link add name geneve0 type geneve dstport 0 external
 # tc qdisc add dev eth0 ingress
 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
            set src_ip 10.0.99.192 \
            dst_ip 10.0.99.193 \
            dst_port 6081 \
            id 11 \
            geneve_opts 0102:80:00800022,0102:80:00800022 \
    action mirred egress redirect dev geneve0

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-29 23:50:26 +09:00
Simon Horman
9d7298cd1d net/sched: act_tunnel_key: add extended ack support
Add extended ack support for the tunnel key action by using NL_SET_ERR_MSG
during validation of user input.

Cc: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-29 23:50:26 +09:00
Simon Horman
a1165b5919 net/sched: act_tunnel_key: disambiguate metadata dst error cases
Metadata may be NULL for one of two reasons:
* Missing user input
* Failure to allocate the metadata dst

Disambiguate these case by returning -EINVAL for the former and -ENOMEM
for the latter rather than -EINVAL for both cases.

This is in preparation for using extended ack to provide more information
to users when parsing their input.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-29 23:50:26 +09:00
Kirill Tkhai
2f635ceeb2 net: Drop pernet_operations::async
Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27 13:18:09 -04:00