Merge branch 'skb-mono-delivery-time'

Martin KaFai Lau says:

====================
Preserve mono delivery time (EDT) in skb->tstamp

skb->tstamp was first used as the (rcv) timestamp.
Its major usage is to report it to user space (e.g. SO_TIMESTAMP).

Later, skb->tstamp is also set as the (future) delivery_time (e.g. EDT in TCP)
during egress and used by the qdisc (e.g. sch_fq) to decide when
the skb can be passed to the dev.

Currently, there is no way to tell whether skb->tstamp holds the (rcv)
timestamp or the delivery_time, so it is always reset to 0 whenever the
skb is forwarded between egress and ingress.

While it makes sense to always clear the (rcv) timestamp in skb->tstamp
to avoid confusing sch_fq, which expects the delivery_time, clearing the
delivery_time is a performance issue [0] when the skb finally egresses
to a fq@phy-dev.

This set keeps the mono delivery time and makes it available to
the final egress interface.  Please see the individual patches for
the details.

[0] (slide 22): https://linuxplumbersconf.org/event/11/contributions/953/attachments/867/1658/LPC_2021_BPF_Datapath_Extensions.pdf
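
For reference, a minimal sketch of the semantics this set introduces (the
one-bit flag and the helper come from the individual patches below):

    /* skb->mono_delivery_time == 1: skb->tstamp holds the mono
     *                               delivery_time (EDT).
     * skb->mono_delivery_time == 0: skb->tstamp holds the (rcv)
     *                               timestamp at ingress and the
     *                               delivery_time at egress.
     *
     * On forward:
     *   before: skb->tstamp = 0;        (rcv timestamp and EDT both lost)
     *   after:  skb_clear_tstamp(skb);  (rcv timestamp cleared, EDT kept)
     */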

v6:
- Add kdoc and use non-UAPI type in patch 6 (Jakub)

v5:
netdev:
- Patch 3 in v4 is broken down into smaller patches 3, 4, and 5 in v5
- The mono_delivery_time bit clearing in __skb_tstamp_tx() is
  done in __net_timestamp() instead.  This is patch 4 in v5.
- v4 missed a skb_clear_delivery_time() for the 'skip_classify' case
  in dev.c.  That is fixed in patch 5 of v5 for correctness.
  The skb_clear_delivery_time() is moved to a later stage in patch 10,
  so it was only an intermediate error in v4.
- Added delivery time handling for nfnetlink_{log, queue}.c in patch 9 (Daniel)
- Added delivery time handling for the IPv6 IOAM hop-by-hop option,
  which has an experimental IANA-assigned value of 49, in patch 8
- Added delivery time handling in nf_conntrack for the ipv6 defrag case
  in patch 7
- Removed unlikely() from testing skb->mono_delivery_time (Daniel)

bpf:
- Removed the skb->tstamp dance at ingress.  This depends on the bpf
  insn rewrite in patch 11, which returns 0 if skb->tstamp has the
  delivery time, keeping backward compatibility with the existing
  tc-bpf@ingress.
- bpf_skb_set_delivery_time() also allows dtime == 0 with
  dtime_type == BPF_SKB_DELIVERY_TIME_NONE as arguments
  in patch 12.

v4:
netdev:
- Pushed the skb_clear_delivery_time() from
  ip_local_deliver() and ip6_input() to
  ip_local_deliver_finish() and ip6_input_finish()
  to accommodate the ipvs forward path.
  This is the notable change in v4 on the netdev side.

    - Patch 3/8 first does the skb_clear_delivery_time() after
      sch_handle_ingress() in dev.c, which makes the tc-bpf forward
      path work via the bpf_redirect_*() helpers.

    - The next patch 4/8 (new in v4) then postpones the
      skb_clear_delivery_time() from dev.c to
      ip_local_deliver_finish() and ip6_input_finish(), after
      taking care of the tstamp usage in the ip defrag case.
      This makes the kernel forward path, e.g. ip[6]_forward(),
      work as well.

- Fixed a case that v3 missed: setting the skb->mono_delivery_time bit
  when sending TCP rst/ack in some cases (e.g. from a ctl_sk).
  That case happens at ip_send_unicast_reply() and
  tcp_v6_send_response().  It is fixed in patch 1/8 (and
  then patch 3/8) in v4.

bpf:
- Added __sk_buff->delivery_time_type instead of adding
  __sk_buff->mono_delivery_time as in v3.  The tc-bpf can stay with
  one __sk_buff->tstamp instead of having two 'time' fields
  where one is 0 and the other is not.
  tc-bpf can use the new __sk_buff->delivery_time_type to tell
  what is stored in __sk_buff->tstamp.
- Added the bpf_skb_set_delivery_time() helper to switch
  __sk_buff->tstamp from a non-mono delivery_time to a
  mono delivery_time.
- Most of the convert_ctx_access() bpf insn rewrite in v3
  is gone, so no new rewrite added for __sk_buff->tstamp.
  The only rewrite added is for reading the new
  __sk_buff->delivery_time_type.
- Added selftests, test_tc_dtime.c

v3:
- Feedback on v2 was that using shinfo(skb)->tx_flags could be racy.
- Considered reusing a few bits in skb->tstamp to represent
  different semantics; aside from more code churn, it would break
  the bpf usecase which currently can write and then read back
  the skb->tstamp.
- Went back to the v1 idea of adding a bit to the skb and addressed
  the feedback on v1:
- Added one bit skb->mono_delivery_time to flag that
  the skb->tstamp has the mono delivery_time (EDT), instead
  of adding a bit to flag if the skb->tstamp has been forwarded or not.
- Instead of resetting the delivery_time back to the (rcv) timestamp
  during the recvmsg syscall, which may be too late and not useful,
  the delivery_time reset in v3 happens earlier, once the stack
  knows that the skb will be delivered locally.
- Handled the tapping@ingress case in af_packet
- No need to change the (rcv) timestamp to mono clock base as in v1.
  The added one bit to flag skb->mono_delivery_time is enough
  to keep the EDT delivery_time during forward.
- Added logic on the bpf side so that existing bpf programs running
  at ingress can still get the (rcv) timestamp
  when reading __sk_buff->tstamp.  A new __sk_buff->mono_delivery_time
  is also added.  A test is still needed for this piece.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller 2022-03-03 14:38:49 +00:00
commit 01e2d15796
38 changed files with 1174 additions and 73 deletions

View File

@ -74,7 +74,7 @@ static netdev_tx_t loopback_xmit(struct sk_buff *skb,
skb_tx_timestamp(skb);
/* do not fool net_timestamp_check() with various clock bases */
skb->tstamp = 0;
skb_clear_tstamp(skb);
skb_orphan(skb);

View File

@ -572,7 +572,8 @@ struct bpf_prog {
has_callchain_buf:1, /* callchain buffer allocated? */
enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
call_get_func_ip:1; /* Do we call get_func_ip() */
call_get_func_ip:1, /* Do we call get_func_ip() */
delivery_time_access:1; /* Accessed __sk_buff->delivery_time_type */
enum bpf_prog_type type; /* Type of BPF program */
enum bpf_attach_type expected_attach_type; /* For some prog types */
u32 len; /* Number of filter blocks */

View File

@ -795,6 +795,10 @@ typedef unsigned char *sk_buff_data_t;
* @dst_pending_confirm: need to confirm neighbour
* @decrypted: Decrypted SKB
* @slow_gro: state present at GRO time, slower prepare step required
* @mono_delivery_time: When set, skb->tstamp has the
* delivery_time in mono clock base (i.e. EDT). Otherwise, the
* skb->tstamp has the (rcv) timestamp at ingress and
* delivery_time at egress.
* @napi_id: id of the NAPI struct this skb came from
* @sender_cpu: (aka @napi_id) source CPU in XPS
* @secmark: security marking
@ -937,8 +941,12 @@ struct sk_buff {
__u8 vlan_present:1; /* See PKT_VLAN_PRESENT_BIT */
__u8 csum_complete_sw:1;
__u8 csum_level:2;
__u8 csum_not_inet:1;
__u8 dst_pending_confirm:1;
__u8 mono_delivery_time:1;
#ifdef CONFIG_NET_CLS_ACT
__u8 tc_skip_classify:1;
__u8 tc_at_ingress:1;
#endif
#ifdef CONFIG_IPV6_NDISC_NODETYPE
__u8 ndisc_nodetype:2;
#endif
@ -949,10 +957,6 @@ struct sk_buff {
#ifdef CONFIG_NET_SWITCHDEV
__u8 offload_fwd_mark:1;
__u8 offload_l3_fwd_mark:1;
#endif
#ifdef CONFIG_NET_CLS_ACT
__u8 tc_skip_classify:1;
__u8 tc_at_ingress:1;
#endif
__u8 redirected:1;
#ifdef CONFIG_NET_REDIRECT
@ -965,6 +969,7 @@ struct sk_buff {
__u8 decrypted:1;
#endif
__u8 slow_gro:1;
__u8 csum_not_inet:1;
#ifdef CONFIG_NET_SCHED
__u16 tc_index; /* traffic control index */
@ -1042,10 +1047,16 @@ struct sk_buff {
/* if you move pkt_vlan_present around you also must adapt these constants */
#ifdef __BIG_ENDIAN_BITFIELD
#define PKT_VLAN_PRESENT_BIT 7
#define TC_AT_INGRESS_MASK (1 << 0)
#define SKB_MONO_DELIVERY_TIME_MASK (1 << 2)
#else
#define PKT_VLAN_PRESENT_BIT 0
#define TC_AT_INGRESS_MASK (1 << 7)
#define SKB_MONO_DELIVERY_TIME_MASK (1 << 5)
#endif
#define PKT_VLAN_PRESENT_OFFSET offsetof(struct sk_buff, __pkt_vlan_present_offset)
#define TC_AT_INGRESS_OFFSET offsetof(struct sk_buff, __pkt_vlan_present_offset)
#define SKB_MONO_DELIVERY_TIME_OFFSET offsetof(struct sk_buff, __pkt_vlan_present_offset)
#ifdef __KERNEL__
/*
@ -3976,6 +3987,7 @@ static inline void skb_get_new_timestampns(const struct sk_buff *skb,
static inline void __net_timestamp(struct sk_buff *skb)
{
skb->tstamp = ktime_get_real();
skb->mono_delivery_time = 0;
}
static inline ktime_t net_timedelta(ktime_t t)
@ -3983,6 +3995,56 @@ static inline ktime_t net_timedelta(ktime_t t)
return ktime_sub(ktime_get_real(), t);
}
static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
bool mono)
{
skb->tstamp = kt;
skb->mono_delivery_time = kt && mono;
}
DECLARE_STATIC_KEY_FALSE(netstamp_needed_key);
/* It is used in the ingress path to clear the delivery_time.
* If needed, set the skb->tstamp to the (rcv) timestamp.
*/
static inline void skb_clear_delivery_time(struct sk_buff *skb)
{
if (skb->mono_delivery_time) {
skb->mono_delivery_time = 0;
if (static_branch_unlikely(&netstamp_needed_key))
skb->tstamp = ktime_get_real();
else
skb->tstamp = 0;
}
}
static inline void skb_clear_tstamp(struct sk_buff *skb)
{
if (skb->mono_delivery_time)
return;
skb->tstamp = 0;
}
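/* Editorial sketch: note the asymmetry between the two helpers above.
 * skb_clear_delivery_time() runs once the stack knows the skb will be
 * delivered locally: it drops a mono EDT and, when timestamping is
 * enabled, replaces it with a fresh (rcv) timestamp.  skb_clear_tstamp()
 * runs on forward: it clears a (rcv) timestamp but deliberately keeps a
 * mono EDT so a fq qdisc on the final egress dev can still use it.
 */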
static inline ktime_t skb_tstamp(const struct sk_buff *skb)
{
if (skb->mono_delivery_time)
return 0;
return skb->tstamp;
}
static inline ktime_t skb_tstamp_cond(const struct sk_buff *skb, bool cond)
{
if (!skb->mono_delivery_time && skb->tstamp)
return skb->tstamp;
if (static_branch_unlikely(&netstamp_needed_key) || cond)
return ktime_get_real();
return 0;
}
static inline u8 skb_metadata_len(const struct sk_buff *skb)
{
return skb_shinfo(skb)->meta_len;
@ -4839,7 +4901,7 @@ static inline void skb_set_redirected(struct sk_buff *skb, bool from_ingress)
#ifdef CONFIG_NET_REDIRECT
skb->from_ingress = from_ingress;
if (skb->from_ingress)
skb->tstamp = 0;
skb_clear_tstamp(skb);
#endif
}
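
The later hunks replace the open-coded "skb->tstamp = 0" in the various
forward paths with the helper above.  A minimal sketch of the pattern
(fwd_finish() is an illustrative name; ip_forward_finish() below is one
real call site):

    static int fwd_finish(struct net *net, struct sock *sk,
                          struct sk_buff *skb)
    {
            /* was: skb->tstamp = 0;  -- this also destroyed a mono EDT */
            skb_clear_tstamp(skb);  /* clears a rcv tstamp, keeps a mono EDT */
            return dst_output(net, sk, skb);
    }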

View File

@ -70,6 +70,7 @@ struct frag_v6_compare_key {
* @stamp: timestamp of the last received fragment
* @len: total length of the original datagram
* @meat: length of received fragments so far
* @mono_delivery_time: stamp has a mono delivery time (EDT)
* @flags: fragment queue flags
* @max_size: maximum received fragment size
* @fqdir: pointer to struct fqdir
@ -90,6 +91,7 @@ struct inet_frag_queue {
ktime_t stamp;
int len;
int meat;
u8 mono_delivery_time;
__u8 flags;
u16 max_size;
struct fqdir *fqdir;
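
A short sketch of how the new field flows through defrag, matching the
ip_frag_queue() and inet_frag_reasm_finish() hunks below: the mono bit is
saved next to the queue's stamp when a fragment is queued and restored
onto the reassembled head.

    /* fragment enqueue (e.g. ip_frag_queue()): */
    q->stamp = skb->tstamp;
    q->mono_delivery_time = skb->mono_delivery_time;

    /* reassembly (inet_frag_reasm_finish()): */
    head->tstamp = q->stamp;
    head->mono_delivery_time = q->mono_delivery_time;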

View File

@ -5086,6 +5086,37 @@ union bpf_attr {
* Return
* 0 on success, or a negative error in case of failure. On error
* *dst* buffer is zeroed out.
*
* long bpf_skb_set_delivery_time(struct sk_buff *skb, u64 dtime, u32 dtime_type)
* Description
* Set a *dtime* (delivery time) to the __sk_buff->tstamp and also
* change the __sk_buff->delivery_time_type to *dtime_type*.
*
* When setting a delivery time (non zero *dtime*) to
* __sk_buff->tstamp, only BPF_SKB_DELIVERY_TIME_MONO *dtime_type*
* is supported. It is the only delivery_time_type that will be
* kept after bpf_redirect_*().
*
* If there is no need to change the __sk_buff->delivery_time_type,
* the delivery time can be directly written to __sk_buff->tstamp
* instead.
*
* *dtime* 0 and *dtime_type* BPF_SKB_DELIVERY_TIME_NONE
* can be used to clear any delivery time stored in
* __sk_buff->tstamp.
*
* Only IPv4 and IPv6 skb->protocol are supported.
*
* This function is most useful when a bpf prog needs to set a
* mono delivery time to __sk_buff->tstamp and then
* bpf_redirect_*() to the egress of an iface. For example,
* changing the (rcv) timestamp in __sk_buff->tstamp at
* ingress to a mono delivery time before bpf_redirect_*()
* to sch_fq@phy-dev.
* Return
* 0 on success.
* **-EINVAL** for invalid input
* **-EOPNOTSUPP** for unsupported delivery_time_type and protocol
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@ -5280,6 +5311,7 @@ union bpf_attr {
FN(xdp_load_bytes), \
FN(xdp_store_bytes), \
FN(copy_from_user_task), \
FN(skb_set_delivery_time), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
@ -5469,6 +5501,12 @@ union { \
__u64 :64; \
} __attribute__((aligned(8)))
enum {
BPF_SKB_DELIVERY_TIME_NONE,
BPF_SKB_DELIVERY_TIME_UNSPEC,
BPF_SKB_DELIVERY_TIME_MONO,
};
/* user accessible mirror of in-kernel sk_buff.
* new fields can only be added to the end of this structure
*/
@ -5509,7 +5547,8 @@ struct __sk_buff {
__u32 gso_segs;
__bpf_md_ptr(struct bpf_sock *, sk);
__u32 gso_size;
__u32 :32; /* Padding, future use. */
__u8 delivery_time_type;
__u32 :24; /* Padding, future use. */
__u64 hwtstamp;
};
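
A minimal tc-bpf sketch of the intended usage of the new field and helper
(this mirrors what the selftest below does at the forwarding namespace;
the program name is illustrative only):

    #include <linux/bpf.h>
    #include <linux/pkt_cls.h>
    #include <bpf/bpf_helpers.h>

    SEC("tc")
    int promote_dtime(struct __sk_buff *skb)
    {
            __u64 now = bpf_ktime_get_ns();  /* mono clock, fits EDT */

            if (skb->delivery_time_type == BPF_SKB_DELIVERY_TIME_MONO)
                    return TC_ACT_OK;  /* already an EDT; survives redirect */

            /* Promote skb->tstamp to a mono delivery time so that it is
             * kept after bpf_redirect_*() to the egress of another iface.
             * A real prog would add a pacing delay to "now".  Failure
             * (e.g. non-IP protocol) is ignored and the skb left alone.
             */
            bpf_skb_set_delivery_time(skb, now, BPF_SKB_DELIVERY_TIME_MONO);
            return TC_ACT_OK;
    }

    char __license[] SEC("license") = "GPL";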

View File

@ -62,7 +62,7 @@ EXPORT_SYMBOL_GPL(br_dev_queue_push_xmit);
int br_forward_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
{
skb->tstamp = 0;
skb_clear_tstamp(skb);
return NF_HOOK(NFPROTO_BRIDGE, NF_BR_POST_ROUTING,
net, sk, skb, NULL, skb->dev,
br_dev_queue_push_xmit);

View File

@ -32,6 +32,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
struct sk_buff *))
{
int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size;
bool mono_delivery_time = skb->mono_delivery_time;
unsigned int hlen, ll_rs, mtu;
ktime_t tstamp = skb->tstamp;
struct ip_frag_state state;
@ -81,7 +82,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
if (iter.frag)
ip_fraglist_prepare(skb, &iter);
skb->tstamp = tstamp;
skb_set_delivery_time(skb, tstamp, mono_delivery_time);
err = output(net, sk, data, skb);
if (err || !iter.frag)
break;
@ -112,7 +113,7 @@ slow_path:
goto blackhole;
}
skb2->tstamp = tstamp;
skb_set_delivery_time(skb2, tstamp, mono_delivery_time);
err = output(net, sk, data, skb2);
if (err)
goto blackhole;

View File

@ -2047,7 +2047,8 @@ void net_dec_egress_queue(void)
EXPORT_SYMBOL_GPL(net_dec_egress_queue);
#endif
static DEFINE_STATIC_KEY_FALSE(netstamp_needed_key);
DEFINE_STATIC_KEY_FALSE(netstamp_needed_key);
EXPORT_SYMBOL(netstamp_needed_key);
#ifdef CONFIG_JUMP_LABEL
static atomic_t netstamp_needed_deferred;
static atomic_t netstamp_wanted;
@ -2108,14 +2109,15 @@ EXPORT_SYMBOL(net_disable_timestamp);
static inline void net_timestamp_set(struct sk_buff *skb)
{
skb->tstamp = 0;
skb->mono_delivery_time = 0;
if (static_branch_unlikely(&netstamp_needed_key))
__net_timestamp(skb);
skb->tstamp = ktime_get_real();
}
#define net_timestamp_check(COND, SKB) \
if (static_branch_unlikely(&netstamp_needed_key)) { \
if ((COND) && !(SKB)->tstamp) \
__net_timestamp(SKB); \
(SKB)->tstamp = ktime_get_real(); \
} \
bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff *skb)

View File

@ -2107,7 +2107,7 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
}
skb->dev = dev;
skb->tstamp = 0;
skb_clear_tstamp(skb);
dev_xmit_recursion_inc();
ret = dev_queue_xmit(skb);
@ -2176,7 +2176,7 @@ static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb,
}
skb->dev = dev;
skb->tstamp = 0;
skb_clear_tstamp(skb);
if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
skb = skb_expand_head(skb, hh_len);
@ -2274,7 +2274,7 @@ static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb,
}
skb->dev = dev;
skb->tstamp = 0;
skb_clear_tstamp(skb);
if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
skb = skb_expand_head(skb, hh_len);
@ -7388,6 +7388,43 @@ static const struct bpf_func_proto bpf_sock_ops_reserve_hdr_opt_proto = {
.arg3_type = ARG_ANYTHING,
};
BPF_CALL_3(bpf_skb_set_delivery_time, struct sk_buff *, skb,
u64, dtime, u32, dtime_type)
{
/* skb_clear_delivery_time() is done for inet protocol */
if (skb->protocol != htons(ETH_P_IP) &&
skb->protocol != htons(ETH_P_IPV6))
return -EOPNOTSUPP;
switch (dtime_type) {
case BPF_SKB_DELIVERY_TIME_MONO:
if (!dtime)
return -EINVAL;
skb->tstamp = dtime;
skb->mono_delivery_time = 1;
break;
case BPF_SKB_DELIVERY_TIME_NONE:
if (dtime)
return -EINVAL;
skb->tstamp = 0;
skb->mono_delivery_time = 0;
break;
default:
return -EOPNOTSUPP;
}
return 0;
}
static const struct bpf_func_proto bpf_skb_set_delivery_time_proto = {
.func = bpf_skb_set_delivery_time,
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_CTX,
.arg2_type = ARG_ANYTHING,
.arg3_type = ARG_ANYTHING,
};
#endif /* CONFIG_INET */
bool bpf_helper_changes_pkt_data(void *func)
@ -7749,6 +7786,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_tcp_gen_syncookie_proto;
case BPF_FUNC_sk_assign:
return &bpf_sk_assign_proto;
case BPF_FUNC_skb_set_delivery_time:
return &bpf_skb_set_delivery_time_proto;
#endif
default:
return bpf_sk_base_func_proto(func_id);
@ -8088,7 +8127,9 @@ static bool bpf_skb_is_valid_access(int off, int size, enum bpf_access_type type
return false;
info->reg_type = PTR_TO_SOCK_COMMON_OR_NULL;
break;
case offsetofend(struct __sk_buff, gso_size) ... offsetof(struct __sk_buff, hwtstamp) - 1:
case offsetof(struct __sk_buff, delivery_time_type):
return false;
case offsetofend(struct __sk_buff, delivery_time_type) ... offsetof(struct __sk_buff, hwtstamp) - 1:
/* Explicitly prohibit access to padding in __sk_buff. */
return false;
default:
@ -8443,6 +8484,15 @@ static bool tc_cls_act_is_valid_access(int off, int size,
break;
case bpf_ctx_range_till(struct __sk_buff, family, local_port):
return false;
case offsetof(struct __sk_buff, delivery_time_type):
/* The convert_ctx_access() on reading and writing
* __sk_buff->tstamp depends on whether the bpf prog
* has used __sk_buff->delivery_time_type or not.
* Thus, we need to set prog->delivery_time_access
* earlier during is_valid_access() here.
*/
((struct bpf_prog *)prog)->delivery_time_access = 1;
return size == sizeof(__u8);
}
return bpf_skb_is_valid_access(off, size, type, prog, info);
@ -8838,6 +8888,45 @@ static u32 flow_dissector_convert_ctx_access(enum bpf_access_type type,
return insn - insn_buf;
}
static struct bpf_insn *bpf_convert_dtime_type_read(const struct bpf_insn *si,
struct bpf_insn *insn)
{
__u8 value_reg = si->dst_reg;
__u8 skb_reg = si->src_reg;
__u8 tmp_reg = BPF_REG_AX;
*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg,
SKB_MONO_DELIVERY_TIME_OFFSET);
*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg,
SKB_MONO_DELIVERY_TIME_MASK);
*insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0, 2);
/* value_reg = BPF_SKB_DELIVERY_TIME_MONO */
*insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_DELIVERY_TIME_MONO);
*insn++ = BPF_JMP_A(IS_ENABLED(CONFIG_NET_CLS_ACT) ? 10 : 5);
*insn++ = BPF_LDX_MEM(BPF_DW, tmp_reg, skb_reg,
offsetof(struct sk_buff, tstamp));
*insn++ = BPF_JMP_IMM(BPF_JNE, tmp_reg, 0, 2);
/* value_reg = BPF_SKB_DELIVERY_TIME_NONE */
*insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_DELIVERY_TIME_NONE);
*insn++ = BPF_JMP_A(IS_ENABLED(CONFIG_NET_CLS_ACT) ? 6 : 1);
#ifdef CONFIG_NET_CLS_ACT
*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, TC_AT_INGRESS_OFFSET);
*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, TC_AT_INGRESS_MASK);
*insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0, 2);
/* At ingress, value_reg = 0 */
*insn++ = BPF_MOV32_IMM(value_reg, 0);
*insn++ = BPF_JMP_A(1);
#endif
/* value_reg = BPF_SKB_DELIVERY_TIME_UNSPEC */
*insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_DELIVERY_TIME_UNSPEC);
/* 15 insns with CONFIG_NET_CLS_ACT */
return insn;
}
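/* Editorial sketch: the insn sequence above is equivalent to this C when
 * reading __sk_buff->delivery_time_type:
 *
 *	if (skb->mono_delivery_time)
 *		val = BPF_SKB_DELIVERY_TIME_MONO;
 *	else if (!skb->tstamp)
 *		val = BPF_SKB_DELIVERY_TIME_NONE;
 *	else if (skb->tc_at_ingress)
 *		val = BPF_SKB_DELIVERY_TIME_NONE;  the tstamp is a rcv timestamp
 *	else
 *		val = BPF_SKB_DELIVERY_TIME_UNSPEC;
 */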
static struct bpf_insn *bpf_convert_shinfo_access(const struct bpf_insn *si,
struct bpf_insn *insn)
{
@ -8859,6 +8948,71 @@ static struct bpf_insn *bpf_convert_shinfo_access(const struct bpf_insn *si,
return insn;
}
static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
const struct bpf_insn *si,
struct bpf_insn *insn)
{
__u8 value_reg = si->dst_reg;
__u8 skb_reg = si->src_reg;
#ifdef CONFIG_NET_CLS_ACT
if (!prog->delivery_time_access) {
__u8 tmp_reg = BPF_REG_AX;
*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, TC_AT_INGRESS_OFFSET);
*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, TC_AT_INGRESS_MASK);
*insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0, 5);
/* @ingress, read __sk_buff->tstamp as the (rcv) timestamp,
* so check the skb->mono_delivery_time.
*/
*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg,
SKB_MONO_DELIVERY_TIME_OFFSET);
*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg,
SKB_MONO_DELIVERY_TIME_MASK);
*insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0, 2);
/* skb->mono_delivery_time is set, read 0 as the (rcv) timestamp. */
*insn++ = BPF_MOV64_IMM(value_reg, 0);
*insn++ = BPF_JMP_A(1);
}
#endif
*insn++ = BPF_LDX_MEM(BPF_DW, value_reg, skb_reg,
offsetof(struct sk_buff, tstamp));
return insn;
}
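/* Editorial sketch: for a prog that never read delivery_time_type, the
 * generated read above is equivalent to:
 *
 *	if (skb->tc_at_ingress && skb->mono_delivery_time)
 *		val = 0;	hide the EDT; legacy ingress progs expect
 *				a (rcv) timestamp or 0
 *	else
 *		val = skb->tstamp;
 *
 * A prog that did read delivery_time_type gets the plain load.
 */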
static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
const struct bpf_insn *si,
struct bpf_insn *insn)
{
__u8 value_reg = si->src_reg;
__u8 skb_reg = si->dst_reg;
#ifdef CONFIG_NET_CLS_ACT
if (!prog->delivery_time_access) {
__u8 tmp_reg = BPF_REG_AX;
*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, TC_AT_INGRESS_OFFSET);
*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, TC_AT_INGRESS_MASK);
*insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0, 3);
/* Writing __sk_buff->tstamp at ingress as the (rcv) timestamp.
* Clear the skb->mono_delivery_time.
*/
*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg,
SKB_MONO_DELIVERY_TIME_OFFSET);
*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg,
~SKB_MONO_DELIVERY_TIME_MASK);
*insn++ = BPF_STX_MEM(BPF_B, skb_reg, tmp_reg,
SKB_MONO_DELIVERY_TIME_OFFSET);
}
#endif
/* skb->tstamp = tstamp */
*insn++ = BPF_STX_MEM(BPF_DW, skb_reg, value_reg,
offsetof(struct sk_buff, tstamp));
return insn;
}
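/* Editorial sketch: likewise, for a prog that never read
 * delivery_time_type, the generated write above is equivalent to:
 *
 *	if (skb->tc_at_ingress)
 *		skb->mono_delivery_time = 0;	value is a (rcv) timestamp
 *	skb->tstamp = val;
 */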
static u32 bpf_convert_ctx_access(enum bpf_access_type type,
const struct bpf_insn *si,
struct bpf_insn *insn_buf,
@ -9167,17 +9321,13 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
BUILD_BUG_ON(sizeof_field(struct sk_buff, tstamp) != 8);
if (type == BPF_WRITE)
*insn++ = BPF_STX_MEM(BPF_DW,
si->dst_reg, si->src_reg,
bpf_target_off(struct sk_buff,
tstamp, 8,
target_size));
insn = bpf_convert_tstamp_write(prog, si, insn);
else
*insn++ = BPF_LDX_MEM(BPF_DW,
si->dst_reg, si->src_reg,
bpf_target_off(struct sk_buff,
tstamp, 8,
target_size));
insn = bpf_convert_tstamp_read(prog, si, insn);
break;
case offsetof(struct __sk_buff, delivery_time_type):
insn = bpf_convert_dtime_type_read(si, insn);
break;
case offsetof(struct __sk_buff, gso_segs):

View File

@ -4851,7 +4851,7 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
if (hwtstamps)
*skb_hwtstamps(skb) = *hwtstamps;
else
skb->tstamp = ktime_get_real();
__net_timestamp(skb);
__skb_complete_tx_timestamp(skb, sk, tstype, opt_stats);
}
@ -5381,7 +5381,7 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet)
ipvs_reset(skb);
skb->mark = 0;
skb->tstamp = 0;
skb_clear_tstamp(skb);
}
EXPORT_SYMBOL_GPL(skb_scrub_packet);

View File

@ -130,6 +130,7 @@ static int lowpan_frag_queue(struct lowpan_frag_queue *fq,
goto err;
fq->q.stamp = skb->tstamp;
fq->q.mono_delivery_time = skb->mono_delivery_time;
if (frag_type == LOWPAN_DISPATCH_FRAG1)
fq->q.flags |= INET_FRAG_FIRST_IN;

View File

@ -572,6 +572,7 @@ void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head,
skb_mark_not_on_list(head);
head->prev = NULL;
head->tstamp = q->stamp;
head->mono_delivery_time = q->mono_delivery_time;
}
EXPORT_SYMBOL(inet_frag_reasm_finish);

View File

@ -79,7 +79,7 @@ static int ip_forward_finish(struct net *net, struct sock *sk, struct sk_buff *s
if (unlikely(opt->optlen))
ip_forward_options(skb);
skb->tstamp = 0;
skb_clear_tstamp(skb);
return dst_output(net, sk, skb);
}

View File

@ -349,6 +349,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
qp->iif = dev->ifindex;
qp->q.stamp = skb->tstamp;
qp->q.mono_delivery_time = skb->mono_delivery_time;
qp->q.meat += skb->len;
qp->ecn |= ecn;
add_frag_mem_limit(qp->q.fqdir, skb->truesize);

View File

@ -226,6 +226,7 @@ resubmit:
static int ip_local_deliver_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
{
skb_clear_delivery_time(skb);
__skb_pull(skb, skb_network_header_len(skb));
rcu_read_lock();

View File

@ -761,6 +761,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
{
struct iphdr *iph;
struct sk_buff *skb2;
bool mono_delivery_time = skb->mono_delivery_time;
struct rtable *rt = skb_rtable(skb);
unsigned int mtu, hlen, ll_rs;
struct ip_fraglist_iter iter;
@ -852,7 +853,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
}
}
skb->tstamp = tstamp;
skb_set_delivery_time(skb, tstamp, mono_delivery_time);
err = output(net, sk, skb);
if (!err)
@ -908,7 +909,7 @@ slow_path:
/*
* Put this fragment into the sending queue.
*/
skb2->tstamp = tstamp;
skb_set_delivery_time(skb2, tstamp, mono_delivery_time);
err = output(net, sk, skb2);
if (err)
goto fail;
@ -1727,6 +1728,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
arg->csumoffset) = csum_fold(csum_add(nskb->csum,
arg->csum));
nskb->ip_summed = CHECKSUM_NONE;
nskb->mono_delivery_time = !!transmit_time;
ip_push_pending_frames(sk, &fl4);
}
out:

View File

@ -1253,7 +1253,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
tp = tcp_sk(sk);
prior_wstamp = tp->tcp_wstamp_ns;
tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache);
skb->skb_mstamp_ns = tp->tcp_wstamp_ns;
skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true);
if (clone_it) {
oskb = skb;
@ -1589,7 +1589,7 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
skb_split(skb, buff, len);
buff->tstamp = skb->tstamp;
skb_set_delivery_time(buff, skb->tstamp, true);
tcp_fragment_tstamp(skb, buff);
old_factor = tcp_skb_pcount(skb);
@ -2616,7 +2616,8 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
if (unlikely(tp->repair) && tp->repair_queue == TCP_SEND_QUEUE) {
/* "skb_mstamp_ns" is used as a start point for the retransmit timer */
skb->skb_mstamp_ns = tp->tcp_wstamp_ns = tp->tcp_clock_cache;
tp->tcp_wstamp_ns = tp->tcp_clock_cache;
skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true);
list_move_tail(&skb->tcp_tsorted_anchor, &tp->tsorted_sent_queue);
tcp_init_tso_segs(skb, mss_now);
goto repair; /* Skip network transmission */
@ -3541,11 +3542,12 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
now = tcp_clock_ns();
#ifdef CONFIG_SYN_COOKIES
if (unlikely(synack_type == TCP_SYNACK_COOKIE && ireq->tstamp_ok))
skb->skb_mstamp_ns = cookie_init_timestamp(req, now);
skb_set_delivery_time(skb, cookie_init_timestamp(req, now),
true);
else
#endif
{
skb->skb_mstamp_ns = now;
skb_set_delivery_time(skb, now, true);
if (!tcp_rsk(req)->snt_synack) /* Timestamp first SYNACK */
tcp_rsk(req)->snt_synack = tcp_skb_timestamp_us(skb);
}
@ -3594,7 +3596,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
bpf_skops_write_hdr_opt((struct sock *)sk, skb, req, syn_skb,
synack_type, &opts);
skb->skb_mstamp_ns = now;
skb_set_delivery_time(skb, now, true);
tcp_add_tx_delay(skb, tp);
return skb;
@ -3771,7 +3773,7 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation);
syn->skb_mstamp_ns = syn_data->skb_mstamp_ns;
skb_set_delivery_time(syn, syn_data->skb_mstamp_ns, true);
/* Now full SYN+DATA was cloned and sent (or not),
* remove the SYN from the original skb (syn_data)

View File

@ -635,7 +635,8 @@ static void __ioam6_fill_trace_data(struct sk_buff *skb,
struct ioam6_schema *sc,
u8 sclen, bool is_input)
{
struct __kernel_sock_timeval ts;
struct timespec64 ts;
ktime_t tstamp;
u64 raw64;
u32 raw32;
u16 raw16;
@ -680,10 +681,9 @@ static void __ioam6_fill_trace_data(struct sk_buff *skb,
if (!skb->dev) {
*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE);
} else {
if (!skb->tstamp)
__net_timestamp(skb);
tstamp = skb_tstamp_cond(skb, true);
ts = ktime_to_timespec64(tstamp);
skb_get_new_timestamp(skb, &ts);
*(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
}
data += sizeof(__be32);
@ -694,13 +694,12 @@ static void __ioam6_fill_trace_data(struct sk_buff *skb,
if (!skb->dev) {
*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE);
} else {
if (!skb->tstamp)
__net_timestamp(skb);
if (!trace->type.bit2) {
tstamp = skb_tstamp_cond(skb, true);
ts = ktime_to_timespec64(tstamp);
}
if (!trace->type.bit2)
skb_get_new_timestamp(skb, &ts);
*(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
*(__be32 *)data = cpu_to_be32((u32)(ts.tv_nsec / NSEC_PER_USEC));
}
data += sizeof(__be32);
}

View File

@ -459,6 +459,7 @@ discard:
static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
{
skb_clear_delivery_time(skb);
rcu_read_lock();
ip6_protocol_deliver_rcu(net, skb, 0, false);
rcu_read_unlock();

View File

@ -440,7 +440,7 @@ static inline int ip6_forward_finish(struct net *net, struct sock *sk,
}
#endif
skb->tstamp = 0;
skb_clear_tstamp(skb);
return dst_output(net, sk, skb);
}
@ -813,6 +813,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
struct ipv6_pinfo *np = skb->sk && !dev_recursion_level() ?
inet6_sk(skb->sk) : NULL;
bool mono_delivery_time = skb->mono_delivery_time;
struct ip6_frag_state state;
unsigned int mtu, hlen, nexthdr_offset;
ktime_t tstamp = skb->tstamp;
@ -903,7 +904,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
if (iter.frag)
ip6_fraglist_prepare(skb, &iter);
skb->tstamp = tstamp;
skb_set_delivery_time(skb, tstamp, mono_delivery_time);
err = output(net, sk, skb);
if (!err)
IP6_INC_STATS(net, ip6_dst_idev(&rt->dst),
@ -962,7 +963,7 @@ slow_path:
/*
* Put this fragment into the sending queue.
*/
frag->tstamp = tstamp;
skb_set_delivery_time(frag, tstamp, mono_delivery_time);
err = output(net, sk, frag);
if (err)
goto fail;

View File

@ -121,6 +121,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
struct sk_buff *))
{
int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size;
bool mono_delivery_time = skb->mono_delivery_time;
ktime_t tstamp = skb->tstamp;
struct ip6_frag_state state;
u8 *prevhdr, nexthdr = 0;
@ -186,7 +187,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
if (iter.frag)
ip6_fraglist_prepare(skb, &iter);
skb->tstamp = tstamp;
skb_set_delivery_time(skb, tstamp, mono_delivery_time);
err = output(net, sk, data, skb);
if (err || !iter.frag)
break;
@ -219,7 +220,7 @@ slow_path:
goto blackhole;
}
skb2->tstamp = tstamp;
skb_set_delivery_time(skb2, tstamp, mono_delivery_time);
err = output(net, sk, data, skb2);
if (err)
goto blackhole;

View File

@ -264,6 +264,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
fq->iif = dev->ifindex;
fq->q.stamp = skb->tstamp;
fq->q.mono_delivery_time = skb->mono_delivery_time;
fq->q.meat += skb->len;
fq->ecn |= ecn;
if (payload_len > fq->q.max_size)

View File

@ -194,6 +194,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
fq->iif = dev->ifindex;
fq->q.stamp = skb->tstamp;
fq->q.mono_delivery_time = skb->mono_delivery_time;
fq->q.meat += skb->len;
fq->ecn |= ecn;
add_frag_mem_limit(fq->q.fqdir, skb->truesize);

View File

@ -940,7 +940,7 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
} else {
mark = sk->sk_mark;
}
buff->tstamp = tcp_transmit_time(sk);
skb_set_delivery_time(buff, tcp_transmit_time(sk), true);
}
fl6.flowi6_mark = IP6_REPLY_MARK(net, skb->mark) ?: mark;
fl6.fl6_dport = t1->dest;

View File

@ -610,7 +610,7 @@ static inline int ip_vs_tunnel_xmit_prepare(struct sk_buff *skb,
nf_reset_ct(skb);
skb_forward_csum(skb);
if (skb->dev)
skb->tstamp = 0;
skb_clear_tstamp(skb);
}
return ret;
}
@ -652,7 +652,7 @@ static inline int ip_vs_nat_send_or_cont(int pf, struct sk_buff *skb,
if (!local) {
skb_forward_csum(skb);
if (skb->dev)
skb->tstamp = 0;
skb_clear_tstamp(skb);
NF_HOOK(pf, NF_INET_LOCAL_OUT, cp->ipvs->net, NULL, skb,
NULL, skb_dst(skb)->dev, dst_output);
} else
@ -674,7 +674,7 @@ static inline int ip_vs_send_or_cont(int pf, struct sk_buff *skb,
ip_vs_drop_early_demux_sk(skb);
skb_forward_csum(skb);
if (skb->dev)
skb->tstamp = 0;
skb_clear_tstamp(skb);
NF_HOOK(pf, NF_INET_LOCAL_OUT, cp->ipvs->net, NULL, skb,
NULL, skb_dst(skb)->dev, dst_output);
} else

View File

@ -19,7 +19,7 @@ static void nf_do_netdev_egress(struct sk_buff *skb, struct net_device *dev)
skb_push(skb, skb->mac_len);
skb->dev = dev;
skb->tstamp = 0;
skb_clear_tstamp(skb);
dev_queue_xmit(skb);
}

View File

@ -376,7 +376,7 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
nf_flow_nat_ip(flow, skb, thoff, dir, iph);
ip_decrease_ttl(iph);
skb->tstamp = 0;
skb_clear_tstamp(skb);
if (flow_table->flags & NF_FLOWTABLE_COUNTER)
nf_ct_acct_update(flow->ct, tuplehash->tuple.dir, skb->len);
@ -611,7 +611,7 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
nf_flow_nat_ipv6(flow, skb, dir, ip6h);
ip6h->hop_limit--;
skb->tstamp = 0;
skb_clear_tstamp(skb);
if (flow_table->flags & NF_FLOWTABLE_COUNTER)
nf_ct_acct_update(flow->ct, tuplehash->tuple.dir, skb->len);

View File

@ -460,6 +460,7 @@ __build_packet_message(struct nfnl_log_net *log,
sk_buff_data_t old_tail = inst->skb->tail;
struct sock *sk;
const unsigned char *hwhdrp;
ktime_t tstamp;
nlh = nfnl_msg_put(inst->skb, 0, 0,
nfnl_msg_type(NFNL_SUBSYS_ULOG, NFULNL_MSG_PACKET),
@ -588,9 +589,10 @@ __build_packet_message(struct nfnl_log_net *log,
goto nla_put_failure;
}
if (hooknum <= NF_INET_FORWARD && skb->tstamp) {
tstamp = skb_tstamp_cond(skb, false);
if (hooknum <= NF_INET_FORWARD && tstamp) {
struct nfulnl_msg_packet_timestamp ts;
struct timespec64 kts = ktime_to_timespec64(skb->tstamp);
struct timespec64 kts = ktime_to_timespec64(tstamp);
ts.sec = cpu_to_be64(kts.tv_sec);
ts.usec = cpu_to_be64(kts.tv_nsec / NSEC_PER_USEC);

View File

@ -392,6 +392,7 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
bool csum_verify;
char *secdata = NULL;
u32 seclen = 0;
ktime_t tstamp;
size = nlmsg_total_size(sizeof(struct nfgenmsg))
+ nla_total_size(sizeof(struct nfqnl_msg_packet_hdr))
@ -407,7 +408,8 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
+ nla_total_size(sizeof(u_int32_t)) /* skbinfo */
+ nla_total_size(sizeof(u_int32_t)); /* cap_len */
if (entskb->tstamp)
tstamp = skb_tstamp_cond(entskb, false);
if (tstamp)
size += nla_total_size(sizeof(struct nfqnl_msg_packet_timestamp));
size += nfqnl_get_bridge_size(entry);
@ -582,9 +584,9 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
if (nfqnl_put_bridge(entry, skb) < 0)
goto nla_put_failure;
if (entry->state.hook <= NF_INET_FORWARD && entskb->tstamp) {
if (entry->state.hook <= NF_INET_FORWARD && tstamp) {
struct nfqnl_msg_packet_timestamp ts;
struct timespec64 kts = ktime_to_timespec64(entskb->tstamp);
struct timespec64 kts = ktime_to_timespec64(tstamp);
ts.sec = cpu_to_be64(kts.tv_sec);
ts.usec = cpu_to_be64(kts.tv_nsec / NSEC_PER_USEC);

View File

@ -145,7 +145,7 @@ static void nft_fwd_neigh_eval(const struct nft_expr *expr,
return;
skb->dev = dev;
skb->tstamp = 0;
skb_clear_tstamp(skb);
neigh_xmit(neigh_table, dev, addr, skb);
out:
regs->verdict.code = verdict;

View File

@ -507,7 +507,7 @@ void ovs_vport_send(struct vport *vport, struct sk_buff *skb, u8 mac_proto)
}
skb->dev = vport->dev;
skb->tstamp = 0;
skb_clear_tstamp(skb);
vport->ops->send(skb);
return;

View File

@ -460,7 +460,7 @@ static __u32 tpacket_get_timestamp(struct sk_buff *skb, struct timespec64 *ts,
return TP_STATUS_TS_RAW_HARDWARE;
if ((flags & SOF_TIMESTAMPING_SOFTWARE) &&
ktime_to_timespec64_cond(skb->tstamp, ts))
ktime_to_timespec64_cond(skb_tstamp(skb), ts))
return TP_STATUS_TS_SOFTWARE;
return 0;
@ -2199,6 +2199,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
spin_lock(&sk->sk_receive_queue.lock);
po->stats.stats1.tp_packets++;
sock_skb_set_dropcount(sk, skb);
skb_clear_delivery_time(skb);
__skb_queue_tail(&sk->sk_receive_queue, skb);
spin_unlock(&sk->sk_receive_queue.lock);
sk->sk_data_ready(sk);
@ -2377,6 +2378,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
po->stats.stats1.tp_packets++;
if (copy_skb) {
status |= TP_STATUS_COPY;
skb_clear_delivery_time(copy_skb);
__skb_queue_tail(&sk->sk_receive_queue, copy_skb);
}
spin_unlock(&sk->sk_receive_queue.lock);

View File

@ -53,6 +53,8 @@ static int tcf_bpf_act(struct sk_buff *skb, const struct tc_action *act,
bpf_compute_data_pointers(skb);
filter_res = bpf_prog_run(filter, skb);
}
if (unlikely(!skb->tstamp && skb->mono_delivery_time))
skb->mono_delivery_time = 0;
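/* Editorial note on the check above: a tc-bpf prog may also write
 * __sk_buff->tstamp = 0 directly (without the helper), so the mono bit
 * is cleared here to keep the invariant that mono_delivery_time is
 * never set while skb->tstamp == 0.
 */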
if (skb_sk_is_prefetched(skb) && filter_res != TC_ACT_OK)
skb_orphan(skb);

View File

@ -102,6 +102,8 @@ static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
bpf_compute_data_pointers(skb);
filter_res = bpf_prog_run(prog->filter, skb);
}
if (unlikely(!skb->tstamp && skb->mono_delivery_time))
skb->mono_delivery_time = 0;
if (prog->exts_integrated) {
res->class = 0;

View File

@ -190,7 +190,7 @@ static void xfrmi_dev_uninit(struct net_device *dev)
static void xfrmi_scrub_packet(struct sk_buff *skb, bool xnet)
{
skb->tstamp = 0;
skb_clear_tstamp(skb);
skb->pkt_type = PACKET_HOST;
skb->skb_iif = 0;
skb->ignore_df = 0;

View File

@ -5086,6 +5086,37 @@ union bpf_attr {
* Return
* 0 on success, or a negative error in case of failure. On error
* *dst* buffer is zeroed out.
*
* long bpf_skb_set_delivery_time(struct sk_buff *skb, u64 dtime, u32 dtime_type)
* Description
* Set a *dtime* (delivery time) to the __sk_buff->tstamp and also
* change the __sk_buff->delivery_time_type to *dtime_type*.
*
* When setting a delivery time (non zero *dtime*) to
* __sk_buff->tstamp, only BPF_SKB_DELIVERY_TIME_MONO *dtime_type*
* is supported. It is the only delivery_time_type that will be
* kept after bpf_redirect_*().
*
* If there is no need to change the __sk_buff->delivery_time_type,
* the delivery time can be directly written to __sk_buff->tstamp
* instead.
*
* *dtime* 0 and *dtime_type* BPF_SKB_DELIVERY_TIME_NONE
* can be used to clear any delivery time stored in
* __sk_buff->tstamp.
*
* Only IPv4 and IPv6 skb->protocol are supported.
*
* This function is most useful when a bpf prog needs to set a
* mono delivery time to __sk_buff->tstamp and then
* bpf_redirect_*() to the egress of an iface. For example,
* changing the (rcv) timestamp in __sk_buff->tstamp at
* ingress to a mono delivery time before bpf_redirect_*()
* to sch_fq@phy-dev.
* Return
* 0 on success.
* **-EINVAL** for invalid input
* **-EOPNOTSUPP** for unsupported delivery_time_type and protocol
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@ -5280,6 +5311,7 @@ union bpf_attr {
FN(xdp_load_bytes), \
FN(xdp_store_bytes), \
FN(copy_from_user_task), \
FN(skb_set_delivery_time), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
@ -5469,6 +5501,12 @@ union { \
__u64 :64; \
} __attribute__((aligned(8)))
enum {
BPF_SKB_DELIVERY_TIME_NONE,
BPF_SKB_DELIVERY_TIME_UNSPEC,
BPF_SKB_DELIVERY_TIME_MONO,
};
/* user accessible mirror of in-kernel sk_buff.
* new fields can only be added to the end of this structure
*/
@ -5509,7 +5547,8 @@ struct __sk_buff {
__u32 gso_segs;
__bpf_md_ptr(struct bpf_sock *, sk);
__u32 gso_size;
__u32 :32; /* Padding, future use. */
__u8 delivery_time_type;
__u32 :24; /* Padding, future use. */
__u64 hwtstamp;
};

View File

@ -17,6 +17,8 @@
#include <linux/if_tun.h>
#include <linux/limits.h>
#include <linux/sysctl.h>
#include <linux/time_types.h>
#include <linux/net_tstamp.h>
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>
@ -29,6 +31,11 @@
#include "test_tc_neigh_fib.skel.h"
#include "test_tc_neigh.skel.h"
#include "test_tc_peer.skel.h"
#include "test_tc_dtime.skel.h"
#ifndef TCP_TX_DELAY
#define TCP_TX_DELAY 37
#endif
#define NS_SRC "ns_src"
#define NS_FWD "ns_fwd"
@ -61,6 +68,7 @@
#define CHK_PROG_PIN_FILE "/sys/fs/bpf/test_tc_chk"
#define TIMEOUT_MILLIS 10000
#define NSEC_PER_SEC 1000000000ULL
#define log_err(MSG, ...) \
fprintf(stderr, "(%s:%d: errno: %s) " MSG "\n", \
@ -440,6 +448,431 @@ static int set_forwarding(bool enable)
return 0;
}
static void rcv_tstamp(int fd, const char *expected, size_t s)
{
struct __kernel_timespec pkt_ts = {};
char ctl[CMSG_SPACE(sizeof(pkt_ts))];
struct timespec now_ts;
struct msghdr msg = {};
__u64 now_ns, pkt_ns;
struct cmsghdr *cmsg;
struct iovec iov;
char data[32];
int ret;
iov.iov_base = data;
iov.iov_len = sizeof(data);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_control = &ctl;
msg.msg_controllen = sizeof(ctl);
ret = recvmsg(fd, &msg, 0);
if (!ASSERT_EQ(ret, s, "recvmsg"))
return;
ASSERT_STRNEQ(data, expected, s, "expected rcv data");
cmsg = CMSG_FIRSTHDR(&msg);
if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
cmsg->cmsg_type == SO_TIMESTAMPNS_NEW)
memcpy(&pkt_ts, CMSG_DATA(cmsg), sizeof(pkt_ts));
pkt_ns = pkt_ts.tv_sec * NSEC_PER_SEC + pkt_ts.tv_nsec;
ASSERT_NEQ(pkt_ns, 0, "pkt rcv tstamp");
ret = clock_gettime(CLOCK_REALTIME, &now_ts);
ASSERT_OK(ret, "clock_gettime");
now_ns = now_ts.tv_sec * NSEC_PER_SEC + now_ts.tv_nsec;
if (ASSERT_GE(now_ns, pkt_ns, "check rcv tstamp"))
ASSERT_LT(now_ns - pkt_ns, 5 * NSEC_PER_SEC,
"check rcv tstamp");
}
static void snd_tstamp(int fd, char *b, size_t s)
{
struct sock_txtime opt = { .clockid = CLOCK_TAI };
char ctl[CMSG_SPACE(sizeof(__u64))];
struct timespec now_ts;
struct msghdr msg = {};
struct cmsghdr *cmsg;
struct iovec iov;
__u64 now_ns;
int ret;
ret = clock_gettime(CLOCK_TAI, &now_ts);
ASSERT_OK(ret, "clock_get_time(CLOCK_TAI)");
now_ns = now_ts.tv_sec * NSEC_PER_SEC + now_ts.tv_nsec;
iov.iov_base = b;
iov.iov_len = s;
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_control = &ctl;
msg.msg_controllen = sizeof(ctl);
cmsg = CMSG_FIRSTHDR(&msg);
cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SCM_TXTIME;
cmsg->cmsg_len = CMSG_LEN(sizeof(now_ns));
*(__u64 *)CMSG_DATA(cmsg) = now_ns;
ret = setsockopt(fd, SOL_SOCKET, SO_TXTIME, &opt, sizeof(opt));
ASSERT_OK(ret, "setsockopt(SO_TXTIME)");
ret = sendmsg(fd, &msg, 0);
ASSERT_EQ(ret, s, "sendmsg");
}
static void test_inet_dtime(int family, int type, const char *addr, __u16 port)
{
int opt = 1, accept_fd = -1, client_fd = -1, listen_fd, err;
char buf[] = "testing testing";
struct nstoken *nstoken;
nstoken = open_netns(NS_DST);
if (!ASSERT_OK_PTR(nstoken, "setns dst"))
return;
listen_fd = start_server(family, type, addr, port, 0);
close_netns(nstoken);
if (!ASSERT_GE(listen_fd, 0, "listen"))
return;
/* Ensure the kernel puts a (rcv) timestamp on all skb */
err = setsockopt(listen_fd, SOL_SOCKET, SO_TIMESTAMPNS_NEW,
&opt, sizeof(opt));
if (!ASSERT_OK(err, "setsockopt(SO_TIMESTAMPNS_NEW)"))
goto done;
if (type == SOCK_STREAM) {
/* Ensure the kernel sets EDT when sending out rst/ack
* from the kernel's ctl_sk.
*/
err = setsockopt(listen_fd, SOL_TCP, TCP_TX_DELAY, &opt,
sizeof(opt));
if (!ASSERT_OK(err, "setsockopt(TCP_TX_DELAY)"))
goto done;
}
nstoken = open_netns(NS_SRC);
if (!ASSERT_OK_PTR(nstoken, "setns src"))
goto done;
client_fd = connect_to_fd(listen_fd, TIMEOUT_MILLIS);
close_netns(nstoken);
if (!ASSERT_GE(client_fd, 0, "connect_to_fd"))
goto done;
if (type == SOCK_STREAM) {
int n;
accept_fd = accept(listen_fd, NULL, NULL);
if (!ASSERT_GE(accept_fd, 0, "accept"))
goto done;
n = write(client_fd, buf, sizeof(buf));
if (!ASSERT_EQ(n, sizeof(buf), "send to server"))
goto done;
rcv_tstamp(accept_fd, buf, sizeof(buf));
} else {
snd_tstamp(client_fd, buf, sizeof(buf));
rcv_tstamp(listen_fd, buf, sizeof(buf));
}
done:
close(listen_fd);
if (accept_fd != -1)
close(accept_fd);
if (client_fd != -1)
close(client_fd);
}
static int netns_load_dtime_bpf(struct test_tc_dtime *skel)
{
struct nstoken *nstoken;
#define PIN_FNAME(__file) "/sys/fs/bpf/" #__file
#define PIN(__prog) ({ \
int err = bpf_program__pin(skel->progs.__prog, PIN_FNAME(__prog)); \
if (!ASSERT_OK(err, "pin " #__prog)) \
goto fail; \
})
/* setup ns_src tc progs */
nstoken = open_netns(NS_SRC);
if (!ASSERT_OK_PTR(nstoken, "setns " NS_SRC))
return -1;
PIN(egress_host);
PIN(ingress_host);
SYS("tc qdisc add dev veth_src clsact");
SYS("tc filter add dev veth_src ingress bpf da object-pinned "
PIN_FNAME(ingress_host));
SYS("tc filter add dev veth_src egress bpf da object-pinned "
PIN_FNAME(egress_host));
close_netns(nstoken);
/* setup ns_dst tc progs */
nstoken = open_netns(NS_DST);
if (!ASSERT_OK_PTR(nstoken, "setns " NS_DST))
return -1;
PIN(egress_host);
PIN(ingress_host);
SYS("tc qdisc add dev veth_dst clsact");
SYS("tc filter add dev veth_dst ingress bpf da object-pinned "
PIN_FNAME(ingress_host));
SYS("tc filter add dev veth_dst egress bpf da object-pinned "
PIN_FNAME(egress_host));
close_netns(nstoken);
/* setup ns_fwd tc progs */
nstoken = open_netns(NS_FWD);
if (!ASSERT_OK_PTR(nstoken, "setns " NS_FWD))
return -1;
PIN(ingress_fwdns_prio100);
PIN(egress_fwdns_prio100);
PIN(ingress_fwdns_prio101);
PIN(egress_fwdns_prio101);
SYS("tc qdisc add dev veth_dst_fwd clsact");
SYS("tc filter add dev veth_dst_fwd ingress prio 100 bpf da object-pinned "
PIN_FNAME(ingress_fwdns_prio100));
SYS("tc filter add dev veth_dst_fwd ingress prio 101 bpf da object-pinned "
PIN_FNAME(ingress_fwdns_prio101));
SYS("tc filter add dev veth_dst_fwd egress prio 100 bpf da object-pinned "
PIN_FNAME(egress_fwdns_prio100));
SYS("tc filter add dev veth_dst_fwd egress prio 101 bpf da object-pinned "
PIN_FNAME(egress_fwdns_prio101));
SYS("tc qdisc add dev veth_src_fwd clsact");
SYS("tc filter add dev veth_src_fwd ingress prio 100 bpf da object-pinned "
PIN_FNAME(ingress_fwdns_prio100));
SYS("tc filter add dev veth_src_fwd ingress prio 101 bpf da object-pinned "
PIN_FNAME(ingress_fwdns_prio101));
SYS("tc filter add dev veth_src_fwd egress prio 100 bpf da object-pinned "
PIN_FNAME(egress_fwdns_prio100));
SYS("tc filter add dev veth_src_fwd egress prio 101 bpf da object-pinned "
PIN_FNAME(egress_fwdns_prio101));
close_netns(nstoken);
#undef PIN
return 0;
fail:
close_netns(nstoken);
return -1;
}
enum {
INGRESS_FWDNS_P100,
INGRESS_FWDNS_P101,
EGRESS_FWDNS_P100,
EGRESS_FWDNS_P101,
INGRESS_ENDHOST,
EGRESS_ENDHOST,
SET_DTIME,
__MAX_CNT,
};
const char *cnt_names[] = {
"ingress_fwdns_p100",
"ingress_fwdns_p101",
"egress_fwdns_p100",
"egress_fwdns_p101",
"ingress_endhost",
"egress_endhost",
"set_dtime",
};
enum {
TCP_IP6_CLEAR_DTIME,
TCP_IP4,
TCP_IP6,
UDP_IP4,
UDP_IP6,
TCP_IP4_RT_FWD,
TCP_IP6_RT_FWD,
UDP_IP4_RT_FWD,
UDP_IP6_RT_FWD,
UKN_TEST,
__NR_TESTS,
};
const char *test_names[] = {
"tcp ip6 clear dtime",
"tcp ip4",
"tcp ip6",
"udp ip4",
"udp ip6",
"tcp ip4 rt fwd",
"tcp ip6 rt fwd",
"udp ip4 rt fwd",
"udp ip6 rt fwd",
};
static const char *dtime_cnt_str(int test, int cnt)
{
static char name[64];
snprintf(name, sizeof(name), "%s %s", test_names[test], cnt_names[cnt]);
return name;
}
static const char *dtime_err_str(int test, int cnt)
{
static char name[64];
snprintf(name, sizeof(name), "%s %s errs", test_names[test],
cnt_names[cnt]);
return name;
}
static void test_tcp_clear_dtime(struct test_tc_dtime *skel)
{
int i, t = TCP_IP6_CLEAR_DTIME;
__u32 *dtimes = skel->bss->dtimes[t];
__u32 *errs = skel->bss->errs[t];
skel->bss->test = t;
test_inet_dtime(AF_INET6, SOCK_STREAM, IP6_DST, 0);
ASSERT_EQ(dtimes[INGRESS_FWDNS_P100], 0,
dtime_cnt_str(t, INGRESS_FWDNS_P100));
ASSERT_EQ(dtimes[INGRESS_FWDNS_P101], 0,
dtime_cnt_str(t, INGRESS_FWDNS_P101));
ASSERT_GT(dtimes[EGRESS_FWDNS_P100], 0,
dtime_cnt_str(t, EGRESS_FWDNS_P100));
ASSERT_EQ(dtimes[EGRESS_FWDNS_P101], 0,
dtime_cnt_str(t, EGRESS_FWDNS_P101));
ASSERT_GT(dtimes[EGRESS_ENDHOST], 0,
dtime_cnt_str(t, EGRESS_ENDHOST));
ASSERT_GT(dtimes[INGRESS_ENDHOST], 0,
dtime_cnt_str(t, INGRESS_ENDHOST));
for (i = INGRESS_FWDNS_P100; i < __MAX_CNT; i++)
ASSERT_EQ(errs[i], 0, dtime_err_str(t, i));
}
static void test_tcp_dtime(struct test_tc_dtime *skel, int family, bool bpf_fwd)
{
__u32 *dtimes, *errs;
const char *addr;
int i, t;
if (family == AF_INET) {
t = bpf_fwd ? TCP_IP4 : TCP_IP4_RT_FWD;
addr = IP4_DST;
} else {
t = bpf_fwd ? TCP_IP6 : TCP_IP6_RT_FWD;
addr = IP6_DST;
}
dtimes = skel->bss->dtimes[t];
errs = skel->bss->errs[t];
skel->bss->test = t;
test_inet_dtime(family, SOCK_STREAM, addr, 0);
/* fwdns_prio100 prog does not read delivery_time_type, so
* kernel puts the (rcv) timestamp in __sk_buff->tstamp
*/
ASSERT_EQ(dtimes[INGRESS_FWDNS_P100], 0,
dtime_cnt_str(t, INGRESS_FWDNS_P100));
for (i = INGRESS_FWDNS_P101; i < SET_DTIME; i++)
ASSERT_GT(dtimes[i], 0, dtime_cnt_str(t, i));
for (i = INGRESS_FWDNS_P100; i < __MAX_CNT; i++)
ASSERT_EQ(errs[i], 0, dtime_err_str(t, i));
}
static void test_udp_dtime(struct test_tc_dtime *skel, int family, bool bpf_fwd)
{
__u32 *dtimes, *errs;
const char *addr;
int i, t;
if (family == AF_INET) {
t = bpf_fwd ? UDP_IP4 : UDP_IP4_RT_FWD;
addr = IP4_DST;
} else {
t = bpf_fwd ? UDP_IP6 : UDP_IP6_RT_FWD;
addr = IP6_DST;
}
dtimes = skel->bss->dtimes[t];
errs = skel->bss->errs[t];
skel->bss->test = t;
test_inet_dtime(family, SOCK_DGRAM, addr, 0);
ASSERT_EQ(dtimes[INGRESS_FWDNS_P100], 0,
dtime_cnt_str(t, INGRESS_FWDNS_P100));
/* non mono delivery time is not forwarded */
ASSERT_EQ(dtimes[INGRESS_FWDNS_P101], 0,
dtime_cnt_str(t, INGRESS_FWDNS_P101));
for (i = EGRESS_FWDNS_P100; i < SET_DTIME; i++)
ASSERT_GT(dtimes[i], 0, dtime_cnt_str(t, i));
for (i = INGRESS_FWDNS_P100; i < __MAX_CNT; i++)
ASSERT_EQ(errs[i], 0, dtime_err_str(t, i));
}
static void test_tc_redirect_dtime(struct netns_setup_result *setup_result)
{
struct test_tc_dtime *skel;
struct nstoken *nstoken;
int err;
skel = test_tc_dtime__open();
if (!ASSERT_OK_PTR(skel, "test_tc_dtime__open"))
return;
skel->rodata->IFINDEX_SRC = setup_result->ifindex_veth_src_fwd;
skel->rodata->IFINDEX_DST = setup_result->ifindex_veth_dst_fwd;
err = test_tc_dtime__load(skel);
if (!ASSERT_OK(err, "test_tc_dtime__load"))
goto done;
if (netns_load_dtime_bpf(skel))
goto done;
nstoken = open_netns(NS_FWD);
if (!ASSERT_OK_PTR(nstoken, "setns fwd"))
goto done;
err = set_forwarding(false);
close_netns(nstoken);
if (!ASSERT_OK(err, "disable forwarding"))
goto done;
test_tcp_clear_dtime(skel);
test_tcp_dtime(skel, AF_INET, true);
test_tcp_dtime(skel, AF_INET6, true);
test_udp_dtime(skel, AF_INET, true);
test_udp_dtime(skel, AF_INET6, true);
/* Test the kernel ip[6]_forward path instead
* of bpf_redirect_neigh().
*/
nstoken = open_netns(NS_FWD);
if (!ASSERT_OK_PTR(nstoken, "setns fwd"))
goto done;
err = set_forwarding(true);
close_netns(nstoken);
if (!ASSERT_OK(err, "enable forwarding"))
goto done;
test_tcp_dtime(skel, AF_INET, false);
test_tcp_dtime(skel, AF_INET6, false);
test_udp_dtime(skel, AF_INET, false);
test_udp_dtime(skel, AF_INET6, false);
done:
test_tc_dtime__destroy(skel);
}
static void test_tc_redirect_neigh_fib(struct netns_setup_result *setup_result)
{
struct nstoken *nstoken = NULL;
@ -787,6 +1220,7 @@ static void *test_tc_redirect_run_tests(void *arg)
RUN_TEST(tc_redirect_peer_l3);
RUN_TEST(tc_redirect_neigh);
RUN_TEST(tc_redirect_neigh_fib);
RUN_TEST(tc_redirect_dtime);
return NULL;
}

View File

@ -0,0 +1,349 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2022 Meta
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>
#include <linux/bpf.h>
#include <linux/stddef.h>
#include <linux/pkt_cls.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
#include <sys/socket.h>
/* veth_src --- veth_src_fwd --- veth_dst_fwd --- veth_dst
* | |
* ns_src | ns_fwd | ns_dst
*
* ns_src and ns_dst: ENDHOST namespace
* ns_fwd: Forwarding namespace
*/
#define ctx_ptr(field) (void *)(long)(field)
#define ip4_src __bpf_htonl(0xac100164) /* 172.16.1.100 */
#define ip4_dst __bpf_htonl(0xac100264) /* 172.16.2.100 */
#define ip6_src { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, \
0x00, 0x01, 0xde, 0xad, 0xbe, 0xef, 0xca, 0xfe }
#define ip6_dst { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, \
0x00, 0x02, 0xde, 0xad, 0xbe, 0xef, 0xca, 0xfe }
#define v6_equal(a, b) (a.s6_addr32[0] == b.s6_addr32[0] && \
a.s6_addr32[1] == b.s6_addr32[1] && \
a.s6_addr32[2] == b.s6_addr32[2] && \
a.s6_addr32[3] == b.s6_addr32[3])
volatile const __u32 IFINDEX_SRC;
volatile const __u32 IFINDEX_DST;
#define EGRESS_ENDHOST_MAGIC 0x0b9fbeef
#define INGRESS_FWDNS_MAGIC 0x1b9fbeef
#define EGRESS_FWDNS_MAGIC 0x2b9fbeef
enum {
INGRESS_FWDNS_P100,
INGRESS_FWDNS_P101,
EGRESS_FWDNS_P100,
EGRESS_FWDNS_P101,
INGRESS_ENDHOST,
EGRESS_ENDHOST,
SET_DTIME,
__MAX_CNT,
};
enum {
TCP_IP6_CLEAR_DTIME,
TCP_IP4,
TCP_IP6,
UDP_IP4,
UDP_IP6,
TCP_IP4_RT_FWD,
TCP_IP6_RT_FWD,
UDP_IP4_RT_FWD,
UDP_IP6_RT_FWD,
UKN_TEST,
__NR_TESTS,
};
enum {
SRC_NS = 1,
DST_NS,
};
__u32 dtimes[__NR_TESTS][__MAX_CNT] = {};
__u32 errs[__NR_TESTS][__MAX_CNT] = {};
__u32 test = 0;
static void inc_dtimes(__u32 idx)
{
if (test < __NR_TESTS)
dtimes[test][idx]++;
else
dtimes[UKN_TEST][idx]++;
}
static void inc_errs(__u32 idx)
{
if (test < __NR_TESTS)
errs[test][idx]++;
else
errs[UKN_TEST][idx]++;
}
static int skb_proto(int type)
{
return type & 0xff;
}
static int skb_ns(int type)
{
return (type >> 8) & 0xff;
}
static bool fwdns_clear_dtime(void)
{
return test == TCP_IP6_CLEAR_DTIME;
}
static bool bpf_fwd(void)
{
return test < TCP_IP4_RT_FWD;
}
/* -1: parse error: TC_ACT_SHOT
* 0: not testing traffic: TC_ACT_OK
* >0: first byte is the inet_proto, second byte has the netns
* of the sender
*/
static int skb_get_type(struct __sk_buff *skb)
{
void *data_end = ctx_ptr(skb->data_end);
void *data = ctx_ptr(skb->data);
__u8 inet_proto = 0, ns = 0;
struct ipv6hdr *ip6h;
struct iphdr *iph;
switch (skb->protocol) {
case __bpf_htons(ETH_P_IP):
iph = data + sizeof(struct ethhdr);
if (iph + 1 > data_end)
return -1;
if (iph->saddr == ip4_src)
ns = SRC_NS;
else if (iph->saddr == ip4_dst)
ns = DST_NS;
inet_proto = iph->protocol;
break;
case __bpf_htons(ETH_P_IPV6):
ip6h = data + sizeof(struct ethhdr);
if (ip6h + 1 > data_end)
return -1;
if (v6_equal(ip6h->saddr, (struct in6_addr)ip6_src))
ns = SRC_NS;
else if (v6_equal(ip6h->saddr, (struct in6_addr)ip6_dst))
ns = DST_NS;
inet_proto = ip6h->nexthdr;
break;
default:
return 0;
}
if ((inet_proto != IPPROTO_TCP && inet_proto != IPPROTO_UDP) || !ns)
return 0;
return (ns << 8 | inet_proto);
}
/* format: direction@iface@netns
* egress@veth_(src|dst)@ns_(src|dst)
*/
SEC("tc")
int egress_host(struct __sk_buff *skb)
{
int skb_type;
skb_type = skb_get_type(skb);
if (skb_type == -1)
return TC_ACT_SHOT;
if (!skb_type)
return TC_ACT_OK;
if (skb_proto(skb_type) == IPPROTO_TCP) {
if (skb->delivery_time_type == BPF_SKB_DELIVERY_TIME_MONO &&
skb->tstamp)
inc_dtimes(EGRESS_ENDHOST);
else
inc_errs(EGRESS_ENDHOST);
} else {
if (skb->delivery_time_type == BPF_SKB_DELIVERY_TIME_UNSPEC &&
skb->tstamp)
inc_dtimes(EGRESS_ENDHOST);
else
inc_errs(EGRESS_ENDHOST);
}
skb->tstamp = EGRESS_ENDHOST_MAGIC;
return TC_ACT_OK;
}
/* ingress@veth_(src|dst)@ns_(src|dst) */
SEC("tc")
int ingress_host(struct __sk_buff *skb)
{
int skb_type;
skb_type = skb_get_type(skb);
if (skb_type == -1)
return TC_ACT_SHOT;
if (!skb_type)
return TC_ACT_OK;
if (skb->delivery_time_type == BPF_SKB_DELIVERY_TIME_MONO &&
skb->tstamp == EGRESS_FWDNS_MAGIC)
inc_dtimes(INGRESS_ENDHOST);
else
inc_errs(INGRESS_ENDHOST);
return TC_ACT_OK;
}
/* ingress@veth_(src|dst)_fwd@ns_fwd priority 100 */
SEC("tc")
int ingress_fwdns_prio100(struct __sk_buff *skb)
{
int skb_type;
skb_type = skb_get_type(skb);
if (skb_type == -1)
return TC_ACT_SHOT;
if (!skb_type)
return TC_ACT_OK;
/* delivery_time is only available at ingress
* if the tc-bpf reads the skb->delivery_time_type.
*/
if (skb->tstamp == EGRESS_ENDHOST_MAGIC)
inc_errs(INGRESS_FWDNS_P100);
if (fwdns_clear_dtime())
skb->tstamp = 0;
return TC_ACT_UNSPEC;
}
/* egress@veth_(src|dst)_fwd@ns_fwd priority 100 */
SEC("tc")
int egress_fwdns_prio100(struct __sk_buff *skb)
{
int skb_type;
skb_type = skb_get_type(skb);
if (skb_type == -1)
return TC_ACT_SHOT;
if (!skb_type)
return TC_ACT_OK;
/* delivery_time is always available at egress even if
* the tc-bpf did not read the delivery_time_type.
*/
if (skb->tstamp == INGRESS_FWDNS_MAGIC)
inc_dtimes(EGRESS_FWDNS_P100);
else
inc_errs(EGRESS_FWDNS_P100);
if (fwdns_clear_dtime())
skb->tstamp = 0;
return TC_ACT_UNSPEC;
}
/* ingress@veth_(src|dst)_fwd@ns_fwd priority 101 */
SEC("tc")
int ingress_fwdns_prio101(struct __sk_buff *skb)
{
__u64 expected_dtime = EGRESS_ENDHOST_MAGIC;
int skb_type;
skb_type = skb_get_type(skb);
if (skb_type == -1 || !skb_type)
/* Should have been handled in prio100 */
return TC_ACT_SHOT;
if (skb_proto(skb_type) == IPPROTO_UDP)
expected_dtime = 0;
if (skb->delivery_time_type) {
if (fwdns_clear_dtime() ||
skb->delivery_time_type != BPF_SKB_DELIVERY_TIME_MONO ||
skb->tstamp != expected_dtime)
inc_errs(INGRESS_FWDNS_P101);
else
inc_dtimes(INGRESS_FWDNS_P101);
} else {
if (!fwdns_clear_dtime() && expected_dtime)
inc_errs(INGRESS_FWDNS_P101);
}
if (skb->delivery_time_type == BPF_SKB_DELIVERY_TIME_MONO) {
skb->tstamp = INGRESS_FWDNS_MAGIC;
} else {
if (bpf_skb_set_delivery_time(skb, INGRESS_FWDNS_MAGIC,
BPF_SKB_DELIVERY_TIME_MONO))
inc_errs(SET_DTIME);
if (!bpf_skb_set_delivery_time(skb, INGRESS_FWDNS_MAGIC,
BPF_SKB_DELIVERY_TIME_UNSPEC))
inc_errs(SET_DTIME);
}
if (skb_ns(skb_type) == SRC_NS)
return bpf_fwd() ?
bpf_redirect_neigh(IFINDEX_DST, NULL, 0, 0) : TC_ACT_OK;
else
return bpf_fwd() ?
bpf_redirect_neigh(IFINDEX_SRC, NULL, 0, 0) : TC_ACT_OK;
}
/* egress@veth_(src|dst)_fwd@ns_fwd priority 101 */
SEC("tc")
int egress_fwdns_prio101(struct __sk_buff *skb)
{
int skb_type;
skb_type = skb_get_type(skb);
if (skb_type == -1 || !skb_type)
/* Should have been handled in prio100 */
return TC_ACT_SHOT;
if (skb->delivery_time_type) {
if (fwdns_clear_dtime() ||
skb->delivery_time_type != BPF_SKB_DELIVERY_TIME_MONO ||
skb->tstamp != INGRESS_FWDNS_MAGIC)
inc_errs(EGRESS_FWDNS_P101);
else
inc_dtimes(EGRESS_FWDNS_P101);
} else {
if (!fwdns_clear_dtime())
inc_errs(EGRESS_FWDNS_P101);
}
if (skb->delivery_time_type == BPF_SKB_DELIVERY_TIME_MONO) {
skb->tstamp = EGRESS_FWDNS_MAGIC;
} else {
if (bpf_skb_set_delivery_time(skb, EGRESS_FWDNS_MAGIC,
BPF_SKB_DELIVERY_TIME_MONO))
inc_errs(SET_DTIME);
if (!bpf_skb_set_delivery_time(skb, EGRESS_FWDNS_MAGIC,
BPF_SKB_DELIVERY_TIME_UNSPEC))
inc_errs(SET_DTIME);
}
return TC_ACT_OK;
}
char __license[] SEC("license") = "GPL";