Merge branch 'skb-mono-delivery-time'
Martin KaFai Lau says:

====================
Preserve mono delivery time (EDT) in skb->tstamp

skb->tstamp was first used as the (rcv) timestamp.  Its major usage is
to report the timestamp to the user (e.g. SO_TIMESTAMP).

Later, skb->tstamp is also set as the (future) delivery_time (e.g. EDT
in TCP) during egress and used by the qdisc (e.g. sch_fq) to decide
when the skb can be passed to the dev.

Currently, there is no way to tell whether skb->tstamp has the (rcv)
timestamp or the delivery_time, so it is always reset to 0 whenever
forwarded between egress and ingress.

While it makes sense to always clear the (rcv) timestamp in skb->tstamp
to avoid confusing sch_fq, which expects the delivery_time, it is a
performance issue [0] to clear the delivery_time if the skb finally
egresses to a fq@phy-dev.  This set keeps the mono delivery time and
makes it available to the final egress interface.  Please see the
individual patches for the details.

[0] (slide 22): https://linuxplumbersconf.org/event/11/contributions/953/attachments/867/1658/LPC_2021_BPF_Datapath_Extensions.pdf

v6:
- Add kdoc and use a non-UAPI type in patch 6 (Jakub)

v5:
netdev:
- Patch 3 in v4 is broken down into smaller patches 3, 4, and 5 in v5
- The mono_delivery_time bit clearing in __skb_tstamp_tx() is done in
  __net_timestamp() instead.  This is patch 4 in v5.
- Missed a skb_clear_delivery_time() for the 'skip_classify' case in
  dev.c in v4.  That is fixed in patch 5 in v5 for correctness.
  The skb_clear_delivery_time() will be moved to a later stage in
  patch 10, so it was an intermediate error in v4.
- Added delivery time handling for nfnetlink_{log, queue}.c in
  patch 9 (Daniel)
- Added delivery time handling in the IPv6 IOAM hop-by-hop option,
  which has an experimental IANA assigned value 49, in patch 8
- Added delivery time handling in nf_conntrack for the ipv6 defrag
  case in patch 7
- Removed unlikely() from testing skb->mono_delivery_time (Daniel)

bpf:
- Remove the skb->tstamp dance at ingress.  Depend on the bpf insn
  rewrite to return 0 if skb->tstamp has the delivery time, in
  patch 11.  This stays backward compatible with the existing
  tc-bpf@ingress, also in patch 11.
- bpf_set_delivery_time() will also allow dtime == 0 and
  dtime_type == BPF_SKB_DELIVERY_TIME_NONE as arguments in patch 12.

v4:
netdev:
- Push the skb_clear_delivery_time() from ip_local_deliver() and
  ip6_input() to ip_local_deliver_finish() and ip6_input_finish() to
  accommodate the ipvs forward path.  This is the notable change in
  v4 on the netdev side.
- Patch 3/8 first does the skb_clear_delivery_time() after
  sch_handle_ingress() in dev.c, which makes the tc-bpf forward path
  work via the bpf_redirect_*() helpers.
- The next patch 4/8 (new in v4) then postpones the
  skb_clear_delivery_time() from dev.c to ip_local_deliver_finish()
  and ip6_input_finish(), after taking care of the tstamp usage in
  the ip defrag case.  This makes the kernel forward path work as
  well, e.g. ip[6]_forward().
- Fixed a case in v3 which missed setting the skb->mono_delivery_time
  bit when sending a TCP rst/ack in some cases (e.g. from a ctl_sk).
  That case happens at ip_send_unicast_reply() and
  tcp_v6_send_response().  It is fixed in patch 1/8 (and then
  patch 3/8) in v4.

bpf:
- Add __sk_buff->delivery_time_type instead of adding
  __sk_buff->mono_delivery_time as in v3.  tc-bpf can stay with one
  __sk_buff->tstamp instead of having two 'time' fields where one is
  0 and the other is not.  tc-bpf can use the new
  __sk_buff->delivery_time_type to tell what is stored in
  __sk_buff->tstamp.
- The bpf_skb_set_delivery_time() helper is added to change
  __sk_buff->tstamp from a non-mono delivery_time to a mono
  delivery_time.
- Most of the convert_ctx_access() bpf insn rewrite from v3 is gone,
  so no new rewrite is added for __sk_buff->tstamp.  The only rewrite
  added is for reading the new __sk_buff->delivery_time_type.
- Added selftests, test_tc_dtime.c

v3:
- Feedback on v2 was that using shinfo(skb)->tx_flags could be racy.
- Considered reusing a few bits in skb->tstamp to represent different
  semantics; aside from the extra code churn, it would break the bpf
  usecase which currently can write and then read back skb->tstamp.
- Went back to the v1 idea of adding a bit to the skb and addressed
  the feedback on v1:
- Added one bit, skb->mono_delivery_time, to flag that skb->tstamp
  has the mono delivery_time (EDT), instead of adding a bit to flag
  whether skb->tstamp has been forwarded or not.
- Instead of resetting the delivery_time back to the (rcv) timestamp
  during the recvmsg syscall, which may be too late and not useful,
  the delivery_time reset in v3 happens earlier, once the stack knows
  that the skb will be delivered locally.
- Handled the tapping@ingress case by af_packet
- No need to change the (rcv) timestamp to a mono clock base as in
  v1.  The added skb->mono_delivery_time bit is enough to keep the
  EDT delivery_time during forward.
- Added logic on the bpf side so that existing bpf programs running
  at ingress can still get the (rcv) timestamp when reading
  __sk_buff->tstamp.  A new __sk_buff->mono_delivery_time is also
  added.  A test is still needed to exercise this piece.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
commit 01e2d15796
@@ -74,7 +74,7 @@ static netdev_tx_t loopback_xmit(struct sk_buff *skb,
 	skb_tx_timestamp(skb);
 
 	/* do not fool net_timestamp_check() with various clock bases */
-	skb->tstamp = 0;
+	skb_clear_tstamp(skb);
 
 	skb_orphan(skb);
 
@@ -572,7 +572,8 @@ struct bpf_prog {
 				has_callchain_buf:1, /* callchain buffer allocated? */
 				enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
 				call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
-				call_get_func_ip:1; /* Do we call get_func_ip() */
+				call_get_func_ip:1, /* Do we call get_func_ip() */
+				delivery_time_access:1; /* Accessed __sk_buff->delivery_time_type */
 	enum bpf_prog_type	type;		/* Type of BPF program */
 	enum bpf_attach_type	expected_attach_type; /* For some prog types */
 	u32			len;		/* Number of filter blocks */
@@ -795,6 +795,10 @@ typedef unsigned char *sk_buff_data_t;
 *	@dst_pending_confirm: need to confirm neighbour
 *	@decrypted: Decrypted SKB
 *	@slow_gro: state present at GRO time, slower prepare step required
+*	@mono_delivery_time: When set, skb->tstamp has the
+*		delivery_time in mono clock base (i.e. EDT).  Otherwise, the
+*		skb->tstamp has the (rcv) timestamp at ingress and
+*		delivery_time at egress.
 *	@napi_id: id of the NAPI struct this skb came from
 *	@sender_cpu: (aka @napi_id) source CPU in XPS
 *	@secmark: security marking
@@ -937,8 +941,12 @@ struct sk_buff {
 	__u8			vlan_present:1;	/* See PKT_VLAN_PRESENT_BIT */
 	__u8			csum_complete_sw:1;
 	__u8			csum_level:2;
-	__u8			csum_not_inet:1;
 	__u8			dst_pending_confirm:1;
+	__u8			mono_delivery_time:1;
+#ifdef CONFIG_NET_CLS_ACT
+	__u8			tc_skip_classify:1;
+	__u8			tc_at_ingress:1;
+#endif
 #ifdef CONFIG_IPV6_NDISC_NODETYPE
 	__u8			ndisc_nodetype:2;
 #endif
@@ -949,10 +957,6 @@ struct sk_buff {
 #ifdef CONFIG_NET_SWITCHDEV
 	__u8			offload_fwd_mark:1;
 	__u8			offload_l3_fwd_mark:1;
 #endif
-#ifdef CONFIG_NET_CLS_ACT
-	__u8			tc_skip_classify:1;
-	__u8			tc_at_ingress:1;
-#endif
 	__u8			redirected:1;
 #ifdef CONFIG_NET_REDIRECT
@@ -965,6 +969,7 @@ struct sk_buff {
 	__u8			decrypted:1;
 #endif
 	__u8			slow_gro:1;
+	__u8			csum_not_inet:1;
 
 #ifdef CONFIG_NET_SCHED
 	__u16			tc_index;	/* traffic control index */
@@ -1042,10 +1047,16 @@ struct sk_buff {
 /* if you move pkt_vlan_present around you also must adapt these constants */
 #ifdef __BIG_ENDIAN_BITFIELD
 #define PKT_VLAN_PRESENT_BIT	7
+#define TC_AT_INGRESS_MASK		(1 << 0)
+#define SKB_MONO_DELIVERY_TIME_MASK	(1 << 2)
 #else
 #define PKT_VLAN_PRESENT_BIT	0
+#define TC_AT_INGRESS_MASK		(1 << 7)
+#define SKB_MONO_DELIVERY_TIME_MASK	(1 << 5)
 #endif
 #define PKT_VLAN_PRESENT_OFFSET	offsetof(struct sk_buff, __pkt_vlan_present_offset)
+#define TC_AT_INGRESS_OFFSET offsetof(struct sk_buff, __pkt_vlan_present_offset)
+#define SKB_MONO_DELIVERY_TIME_OFFSET offsetof(struct sk_buff, __pkt_vlan_present_offset)
 
 #ifdef __KERNEL__
 /*
@@ -3976,6 +3987,7 @@ static inline void skb_get_new_timestampns(const struct sk_buff *skb,
 static inline void __net_timestamp(struct sk_buff *skb)
 {
 	skb->tstamp = ktime_get_real();
+	skb->mono_delivery_time = 0;
 }
 
 static inline ktime_t net_timedelta(ktime_t t)
@@ -3983,6 +3995,56 @@ static inline ktime_t net_timedelta(ktime_t t)
 	return ktime_sub(ktime_get_real(), t);
 }
 
+static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
+					 bool mono)
+{
+	skb->tstamp = kt;
+	skb->mono_delivery_time = kt && mono;
+}
+
+DECLARE_STATIC_KEY_FALSE(netstamp_needed_key);
+
+/* It is used in the ingress path to clear the delivery_time.
+ * If needed, set the skb->tstamp to the (rcv) timestamp.
+ */
+static inline void skb_clear_delivery_time(struct sk_buff *skb)
+{
+	if (skb->mono_delivery_time) {
+		skb->mono_delivery_time = 0;
+		if (static_branch_unlikely(&netstamp_needed_key))
+			skb->tstamp = ktime_get_real();
+		else
+			skb->tstamp = 0;
+	}
+}
+
+static inline void skb_clear_tstamp(struct sk_buff *skb)
+{
+	if (skb->mono_delivery_time)
+		return;
+
+	skb->tstamp = 0;
+}
+
+static inline ktime_t skb_tstamp(const struct sk_buff *skb)
+{
+	if (skb->mono_delivery_time)
+		return 0;
+
+	return skb->tstamp;
+}
+
+static inline ktime_t skb_tstamp_cond(const struct sk_buff *skb, bool cond)
+{
+	if (!skb->mono_delivery_time && skb->tstamp)
+		return skb->tstamp;
+
+	if (static_branch_unlikely(&netstamp_needed_key) || cond)
+		return ktime_get_real();
+
+	return 0;
+}
+
 static inline u8 skb_metadata_len(const struct sk_buff *skb)
 {
 	return skb_shinfo(skb)->meta_len;
@@ -4839,7 +4901,7 @@ static inline void skb_set_redirected(struct sk_buff *skb, bool from_ingress)
 #ifdef CONFIG_NET_REDIRECT
 	skb->from_ingress = from_ingress;
 	if (skb->from_ingress)
-		skb->tstamp = 0;
+		skb_clear_tstamp(skb);
 #endif
 }
@@ -70,6 +70,7 @@ struct frag_v6_compare_key {
 * @stamp: timestamp of the last received fragment
 * @len: total length of the original datagram
 * @meat: length of received fragments so far
+* @mono_delivery_time: stamp has a mono delivery time (EDT)
 * @flags: fragment queue flags
 * @max_size: maximum received fragment size
 * @fqdir: pointer to struct fqdir
@@ -90,6 +91,7 @@ struct inet_frag_queue {
 	ktime_t			stamp;
 	int			len;
 	int			meat;
+	u8			mono_delivery_time;
 	__u8			flags;
 	u16			max_size;
 	struct fqdir		*fqdir;
@@ -5086,6 +5086,37 @@ union bpf_attr {
 *	Return
 *		0 on success, or a negative error in case of failure. On error
 *		*dst* buffer is zeroed out.
+ *
+ * long bpf_skb_set_delivery_time(struct sk_buff *skb, u64 dtime, u32 dtime_type)
+ *	Description
+ *		Set a *dtime* (delivery time) to the __sk_buff->tstamp and also
+ *		change the __sk_buff->delivery_time_type to *dtime_type*.
+ *
+ *		When setting a delivery time (non-zero *dtime*) to
+ *		__sk_buff->tstamp, only BPF_SKB_DELIVERY_TIME_MONO *dtime_type*
+ *		is supported.  It is the only delivery_time_type that will be
+ *		kept after bpf_redirect_*().
+ *
+ *		If there is no need to change the __sk_buff->delivery_time_type,
+ *		the delivery time can be directly written to __sk_buff->tstamp
+ *		instead.
+ *
+ *		*dtime* 0 and *dtime_type* BPF_SKB_DELIVERY_TIME_NONE
+ *		can be used to clear any delivery time stored in
+ *		__sk_buff->tstamp.
+ *
+ *		Only IPv4 and IPv6 skb->protocol are supported.
+ *
+ *		This function is most useful when it needs to set a
+ *		mono delivery time to __sk_buff->tstamp and then
+ *		bpf_redirect_*() to the egress of an iface.  For example,
+ *		changing the (rcv) timestamp in __sk_buff->tstamp at
+ *		ingress to a mono delivery time and then bpf_redirect_*()
+ *		to sch_fq@phy-dev.
+ *	Return
+ *		0 on success.
+ *		**-EINVAL** for invalid input
+ *		**-EOPNOTSUPP** for unsupported delivery_time_type and protocol
 */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5280,6 +5311,7 @@ union bpf_attr {
 	FN(xdp_load_bytes),		\
 	FN(xdp_store_bytes),		\
 	FN(copy_from_user_task),	\
+	FN(skb_set_delivery_time),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -5469,6 +5501,12 @@ union {					\
 	__u64 :64;			\
 } __attribute__((aligned(8)))
 
+enum {
+	BPF_SKB_DELIVERY_TIME_NONE,
+	BPF_SKB_DELIVERY_TIME_UNSPEC,
+	BPF_SKB_DELIVERY_TIME_MONO,
+};
+
 /* user accessible mirror of in-kernel sk_buff.
 * new fields can only be added to the end of this structure
 */
@@ -5509,7 +5547,8 @@ struct __sk_buff {
 	__u32 gso_segs;
 	__bpf_md_ptr(struct bpf_sock *, sk);
 	__u32 gso_size;
-	__u32 :32;		/* Padding, future use. */
+	__u8  delivery_time_type;
+	__u32 :24;		/* Padding, future use. */
 	__u64 hwtstamp;
 };
@@ -62,7 +62,7 @@ EXPORT_SYMBOL_GPL(br_dev_queue_push_xmit);
 
 int br_forward_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
-	skb->tstamp = 0;
+	skb_clear_tstamp(skb);
 	return NF_HOOK(NFPROTO_BRIDGE, NF_BR_POST_ROUTING,
 		       net, sk, skb, NULL, skb->dev,
 		       br_dev_queue_push_xmit);
@@ -32,6 +32,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
 			     struct sk_buff *))
 {
 	int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size;
+	bool mono_delivery_time = skb->mono_delivery_time;
 	unsigned int hlen, ll_rs, mtu;
 	ktime_t tstamp = skb->tstamp;
 	struct ip_frag_state state;
@@ -81,7 +82,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
 		if (iter.frag)
 			ip_fraglist_prepare(skb, &iter);
 
-		skb->tstamp = tstamp;
+		skb_set_delivery_time(skb, tstamp, mono_delivery_time);
 		err = output(net, sk, data, skb);
 		if (err || !iter.frag)
 			break;
@@ -112,7 +113,7 @@ slow_path:
 		goto blackhole;
 	}
 
-	skb2->tstamp = tstamp;
+	skb_set_delivery_time(skb2, tstamp, mono_delivery_time);
 	err = output(net, sk, data, skb2);
 	if (err)
 		goto blackhole;
@@ -2047,7 +2047,8 @@ void net_dec_egress_queue(void)
 EXPORT_SYMBOL_GPL(net_dec_egress_queue);
 #endif
 
-static DEFINE_STATIC_KEY_FALSE(netstamp_needed_key);
+DEFINE_STATIC_KEY_FALSE(netstamp_needed_key);
+EXPORT_SYMBOL(netstamp_needed_key);
 #ifdef CONFIG_JUMP_LABEL
 static atomic_t netstamp_needed_deferred;
 static atomic_t netstamp_wanted;
@@ -2108,14 +2109,15 @@ EXPORT_SYMBOL(net_disable_timestamp);
 static inline void net_timestamp_set(struct sk_buff *skb)
 {
 	skb->tstamp = 0;
+	skb->mono_delivery_time = 0;
 	if (static_branch_unlikely(&netstamp_needed_key))
-		__net_timestamp(skb);
+		skb->tstamp = ktime_get_real();
 }
 
 #define net_timestamp_check(COND, SKB)				\
 	if (static_branch_unlikely(&netstamp_needed_key)) {	\
 		if ((COND) && !(SKB)->tstamp)			\
-			__net_timestamp(SKB);			\
+			(SKB)->tstamp = ktime_get_real();	\
 	}							\
 
 bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff *skb)
@@ -2107,7 +2107,7 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
 	}
 
 	skb->dev = dev;
-	skb->tstamp = 0;
+	skb_clear_tstamp(skb);
 
 	dev_xmit_recursion_inc();
 	ret = dev_queue_xmit(skb);
@@ -2176,7 +2176,7 @@ static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb,
 	}
 
 	skb->dev = dev;
-	skb->tstamp = 0;
+	skb_clear_tstamp(skb);
 
 	if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
 		skb = skb_expand_head(skb, hh_len);
@@ -2274,7 +2274,7 @@ static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb,
 	}
 
 	skb->dev = dev;
-	skb->tstamp = 0;
+	skb_clear_tstamp(skb);
 
 	if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
 		skb = skb_expand_head(skb, hh_len);
@@ -7388,6 +7388,43 @@ static const struct bpf_func_proto bpf_sock_ops_reserve_hdr_opt_proto = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_3(bpf_skb_set_delivery_time, struct sk_buff *, skb,
+	   u64, dtime, u32, dtime_type)
+{
+	/* skb_clear_delivery_time() is done for inet protocol */
+	if (skb->protocol != htons(ETH_P_IP) &&
+	    skb->protocol != htons(ETH_P_IPV6))
+		return -EOPNOTSUPP;
+
+	switch (dtime_type) {
+	case BPF_SKB_DELIVERY_TIME_MONO:
+		if (!dtime)
+			return -EINVAL;
+		skb->tstamp = dtime;
+		skb->mono_delivery_time = 1;
+		break;
+	case BPF_SKB_DELIVERY_TIME_NONE:
+		if (dtime)
+			return -EINVAL;
+		skb->tstamp = 0;
+		skb->mono_delivery_time = 0;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_skb_set_delivery_time_proto = {
+	.func		= bpf_skb_set_delivery_time,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+};
+
 #endif /* CONFIG_INET */
 
 bool bpf_helper_changes_pkt_data(void *func)
@@ -7749,6 +7786,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_tcp_gen_syncookie_proto;
 	case BPF_FUNC_sk_assign:
 		return &bpf_sk_assign_proto;
+	case BPF_FUNC_skb_set_delivery_time:
+		return &bpf_skb_set_delivery_time_proto;
 #endif
 	default:
 		return bpf_sk_base_func_proto(func_id);
@@ -8088,7 +8127,9 @@ static bool bpf_skb_is_valid_access(int off, int size, enum bpf_access_type type
 			return false;
 		info->reg_type = PTR_TO_SOCK_COMMON_OR_NULL;
 		break;
-	case offsetofend(struct __sk_buff, gso_size) ... offsetof(struct __sk_buff, hwtstamp) - 1:
+	case offsetof(struct __sk_buff, delivery_time_type):
+		return false;
+	case offsetofend(struct __sk_buff, delivery_time_type) ... offsetof(struct __sk_buff, hwtstamp) - 1:
 		/* Explicitly prohibit access to padding in __sk_buff. */
 		return false;
 	default:
@@ -8443,6 +8484,15 @@ static bool tc_cls_act_is_valid_access(int off, int size,
 		break;
 	case bpf_ctx_range_till(struct __sk_buff, family, local_port):
 		return false;
+	case offsetof(struct __sk_buff, delivery_time_type):
+		/* The convert_ctx_access() on reading and writing
+		 * __sk_buff->tstamp depends on whether the bpf prog
+		 * has used __sk_buff->delivery_time_type or not.
+		 * Thus, we need to set prog->delivery_time_access
+		 * earlier during is_valid_access() here.
+		 */
+		((struct bpf_prog *)prog)->delivery_time_access = 1;
+		return size == sizeof(__u8);
 	}
 
 	return bpf_skb_is_valid_access(off, size, type, prog, info);
@@ -8838,6 +8888,45 @@ static u32 flow_dissector_convert_ctx_access(enum bpf_access_type type,
 	return insn - insn_buf;
 }
 
+static struct bpf_insn *bpf_convert_dtime_type_read(const struct bpf_insn *si,
+						    struct bpf_insn *insn)
+{
+	__u8 value_reg = si->dst_reg;
+	__u8 skb_reg = si->src_reg;
+	__u8 tmp_reg = BPF_REG_AX;
+
+	*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg,
+			      SKB_MONO_DELIVERY_TIME_OFFSET);
+	*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg,
+				SKB_MONO_DELIVERY_TIME_MASK);
+	*insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0, 2);
+	/* value_reg = BPF_SKB_DELIVERY_TIME_MONO */
+	*insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_DELIVERY_TIME_MONO);
+	*insn++ = BPF_JMP_A(IS_ENABLED(CONFIG_NET_CLS_ACT) ? 10 : 5);
+
+	*insn++ = BPF_LDX_MEM(BPF_DW, tmp_reg, skb_reg,
+			      offsetof(struct sk_buff, tstamp));
+	*insn++ = BPF_JMP_IMM(BPF_JNE, tmp_reg, 0, 2);
+	/* value_reg = BPF_SKB_DELIVERY_TIME_NONE */
+	*insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_DELIVERY_TIME_NONE);
+	*insn++ = BPF_JMP_A(IS_ENABLED(CONFIG_NET_CLS_ACT) ? 6 : 1);
+
+#ifdef CONFIG_NET_CLS_ACT
+	*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, TC_AT_INGRESS_OFFSET);
+	*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, TC_AT_INGRESS_MASK);
+	*insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0, 2);
+	/* At ingress, value_reg = 0 */
+	*insn++ = BPF_MOV32_IMM(value_reg, 0);
+	*insn++ = BPF_JMP_A(1);
+#endif
+
+	/* value_reg = BPF_SKB_DELIVERY_TIME_UNSPEC */
+	*insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_DELIVERY_TIME_UNSPEC);
+
+	/* 15 insns with CONFIG_NET_CLS_ACT */
+	return insn;
+}
+
 static struct bpf_insn *bpf_convert_shinfo_access(const struct bpf_insn *si,
 						  struct bpf_insn *insn)
 {
@@ -8859,6 +8948,71 @@ static struct bpf_insn *bpf_convert_shinfo_access(const struct bpf_insn *si,
 	return insn;
 }
 
+static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
+						const struct bpf_insn *si,
+						struct bpf_insn *insn)
+{
+	__u8 value_reg = si->dst_reg;
+	__u8 skb_reg = si->src_reg;
+
+#ifdef CONFIG_NET_CLS_ACT
+	if (!prog->delivery_time_access) {
+		__u8 tmp_reg = BPF_REG_AX;
+
+		*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, TC_AT_INGRESS_OFFSET);
+		*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, TC_AT_INGRESS_MASK);
+		*insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0, 5);
+		/* @ingress, read __sk_buff->tstamp as the (rcv) timestamp,
+		 * so check the skb->mono_delivery_time.
+		 */
+		*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg,
+				      SKB_MONO_DELIVERY_TIME_OFFSET);
+		*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg,
+					SKB_MONO_DELIVERY_TIME_MASK);
+		*insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0, 2);
+		/* skb->mono_delivery_time is set, read 0 as the (rcv) timestamp. */
+		*insn++ = BPF_MOV64_IMM(value_reg, 0);
+		*insn++ = BPF_JMP_A(1);
+	}
+#endif
+
+	*insn++ = BPF_LDX_MEM(BPF_DW, value_reg, skb_reg,
+			      offsetof(struct sk_buff, tstamp));
+	return insn;
+}
+
+static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
+						 const struct bpf_insn *si,
+						 struct bpf_insn *insn)
+{
+	__u8 value_reg = si->src_reg;
+	__u8 skb_reg = si->dst_reg;
+
+#ifdef CONFIG_NET_CLS_ACT
+	if (!prog->delivery_time_access) {
+		__u8 tmp_reg = BPF_REG_AX;
+
+		*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, TC_AT_INGRESS_OFFSET);
+		*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, TC_AT_INGRESS_MASK);
+		*insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0, 3);
+		/* Writing __sk_buff->tstamp at ingress as the (rcv) timestamp.
+		 * Clear the skb->mono_delivery_time.
+		 */
+		*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg,
+				      SKB_MONO_DELIVERY_TIME_OFFSET);
+		*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg,
+					~SKB_MONO_DELIVERY_TIME_MASK);
+		*insn++ = BPF_STX_MEM(BPF_B, skb_reg, tmp_reg,
+				      SKB_MONO_DELIVERY_TIME_OFFSET);
+	}
+#endif
+
+	/* skb->tstamp = tstamp */
+	*insn++ = BPF_STX_MEM(BPF_DW, skb_reg, value_reg,
+			      offsetof(struct sk_buff, tstamp));
+	return insn;
+}
+
 static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 				  const struct bpf_insn *si,
 				  struct bpf_insn *insn_buf,
@@ -9167,17 +9321,13 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 		BUILD_BUG_ON(sizeof_field(struct sk_buff, tstamp) != 8);
 
 		if (type == BPF_WRITE)
-			*insn++ = BPF_STX_MEM(BPF_DW,
-					      si->dst_reg, si->src_reg,
-					      bpf_target_off(struct sk_buff,
-							     tstamp, 8,
-							     target_size));
+			insn = bpf_convert_tstamp_write(prog, si, insn);
 		else
-			*insn++ = BPF_LDX_MEM(BPF_DW,
-					      si->dst_reg, si->src_reg,
-					      bpf_target_off(struct sk_buff,
-							     tstamp, 8,
-							     target_size));
+			insn = bpf_convert_tstamp_read(prog, si, insn);
 		break;
 
+	case offsetof(struct __sk_buff, delivery_time_type):
+		insn = bpf_convert_dtime_type_read(si, insn);
+		break;
+
 	case offsetof(struct __sk_buff, gso_segs):
@@ -4851,7 +4851,7 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
 	if (hwtstamps)
 		*skb_hwtstamps(skb) = *hwtstamps;
 	else
-		skb->tstamp = ktime_get_real();
+		__net_timestamp(skb);
 
 	__skb_complete_tx_timestamp(skb, sk, tstype, opt_stats);
 }
@@ -5381,7 +5381,7 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet)
 
 	ipvs_reset(skb);
 	skb->mark = 0;
-	skb->tstamp = 0;
+	skb_clear_tstamp(skb);
 }
 EXPORT_SYMBOL_GPL(skb_scrub_packet);
@@ -130,6 +130,7 @@ static int lowpan_frag_queue(struct lowpan_frag_queue *fq,
 		goto err;
 
 	fq->q.stamp = skb->tstamp;
+	fq->q.mono_delivery_time = skb->mono_delivery_time;
 	if (frag_type == LOWPAN_DISPATCH_FRAG1)
 		fq->q.flags |= INET_FRAG_FIRST_IN;
 
@@ -572,6 +572,7 @@ void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head,
 	skb_mark_not_on_list(head);
 	head->prev = NULL;
 	head->tstamp = q->stamp;
+	head->mono_delivery_time = q->mono_delivery_time;
 }
 EXPORT_SYMBOL(inet_frag_reasm_finish);
@@ -79,7 +79,7 @@ static int ip_forward_finish(struct net *net, struct sock *sk, struct sk_buff *s
 	if (unlikely(opt->optlen))
 		ip_forward_options(skb);
 
-	skb->tstamp = 0;
+	skb_clear_tstamp(skb);
 	return dst_output(net, sk, skb);
 }
 
@@ -349,6 +349,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 		qp->iif = dev->ifindex;
 
 	qp->q.stamp = skb->tstamp;
+	qp->q.mono_delivery_time = skb->mono_delivery_time;
 	qp->q.meat += skb->len;
 	qp->ecn |= ecn;
 	add_frag_mem_limit(qp->q.fqdir, skb->truesize);
@@ -226,6 +226,7 @@ resubmit:
 
 static int ip_local_deliver_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
+	skb_clear_delivery_time(skb);
 	__skb_pull(skb, skb_network_header_len(skb));
 
 	rcu_read_lock();
@@ -761,6 +761,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 {
 	struct iphdr *iph;
 	struct sk_buff *skb2;
+	bool mono_delivery_time = skb->mono_delivery_time;
 	struct rtable *rt = skb_rtable(skb);
 	unsigned int mtu, hlen, ll_rs;
 	struct ip_fraglist_iter iter;
@@ -852,7 +853,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 			}
 		}
 
-		skb->tstamp = tstamp;
+		skb_set_delivery_time(skb, tstamp, mono_delivery_time);
 		err = output(net, sk, skb);
 
 		if (!err)
@@ -908,7 +909,7 @@ slow_path:
 		/*
 		 *	Put this fragment into the sending queue.
 		 */
-		skb2->tstamp = tstamp;
+		skb_set_delivery_time(skb2, tstamp, mono_delivery_time);
 		err = output(net, sk, skb2);
 		if (err)
 			goto fail;
@@ -1727,6 +1728,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
 			  arg->csumoffset) = csum_fold(csum_add(nskb->csum,
 								arg->csum));
 		nskb->ip_summed = CHECKSUM_NONE;
+		nskb->mono_delivery_time = !!transmit_time;
 		ip_push_pending_frames(sk, &fl4);
 	}
 out:
@@ -1253,7 +1253,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
 	tp = tcp_sk(sk);
 	prior_wstamp = tp->tcp_wstamp_ns;
 	tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache);
-	skb->skb_mstamp_ns = tp->tcp_wstamp_ns;
+	skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true);
 	if (clone_it) {
 		oskb = skb;
 
@@ -1589,7 +1589,7 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
 
 	skb_split(skb, buff, len);
 
-	buff->tstamp = skb->tstamp;
+	skb_set_delivery_time(buff, skb->tstamp, true);
 	tcp_fragment_tstamp(skb, buff);
 
 	old_factor = tcp_skb_pcount(skb);
@@ -2616,7 +2616,8 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 
 		if (unlikely(tp->repair) && tp->repair_queue == TCP_SEND_QUEUE) {
 			/* "skb_mstamp_ns" is used as a start point for the retransmit timer */
-			skb->skb_mstamp_ns = tp->tcp_wstamp_ns = tp->tcp_clock_cache;
+			tp->tcp_wstamp_ns = tp->tcp_clock_cache;
+			skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true);
 			list_move_tail(&skb->tcp_tsorted_anchor, &tp->tsorted_sent_queue);
 			tcp_init_tso_segs(skb, mss_now);
 			goto repair; /* Skip network transmission */
@@ -3541,11 +3542,12 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 	now = tcp_clock_ns();
 #ifdef CONFIG_SYN_COOKIES
 	if (unlikely(synack_type == TCP_SYNACK_COOKIE && ireq->tstamp_ok))
-		skb->skb_mstamp_ns = cookie_init_timestamp(req, now);
+		skb_set_delivery_time(skb, cookie_init_timestamp(req, now),
+				      true);
 	else
 #endif
 	{
-		skb->skb_mstamp_ns = now;
+		skb_set_delivery_time(skb, now, true);
 		if (!tcp_rsk(req)->snt_synack) /* Timestamp first SYNACK */
 			tcp_rsk(req)->snt_synack = tcp_skb_timestamp_us(skb);
 	}
@@ -3594,7 +3596,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 	bpf_skops_write_hdr_opt((struct sock *)sk, skb, req, syn_skb,
 				synack_type, &opts);
 
-	skb->skb_mstamp_ns = now;
+	skb_set_delivery_time(skb, now, true);
 	tcp_add_tx_delay(skb, tp);
 
 	return skb;
@@ -3771,7 +3773,7 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 
 	err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation);
 
-	syn->skb_mstamp_ns = syn_data->skb_mstamp_ns;
+	skb_set_delivery_time(syn, syn_data->skb_mstamp_ns, true);
 
 	/* Now full SYN+DATA was cloned and sent (or not),
 	 * remove the SYN from the original skb (syn_data)
@@ -635,7 +635,8 @@ static void __ioam6_fill_trace_data(struct sk_buff *skb,
 					struct ioam6_schema *sc,
 					u8 sclen, bool is_input)
 {
-	struct __kernel_sock_timeval ts;
+	struct timespec64 ts;
+	ktime_t tstamp;
 	u64 raw64;
 	u32 raw32;
 	u16 raw16;
@@ -680,10 +681,9 @@ static void __ioam6_fill_trace_data(struct sk_buff *skb,
 		if (!skb->dev) {
 			*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE);
 		} else {
-			if (!skb->tstamp)
-				__net_timestamp(skb);
+			tstamp = skb_tstamp_cond(skb, true);
+			ts = ktime_to_timespec64(tstamp);
 
-			skb_get_new_timestamp(skb, &ts);
 			*(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
 		}
 		data += sizeof(__be32);
@@ -694,13 +694,12 @@ static void __ioam6_fill_trace_data(struct sk_buff *skb,
 		if (!skb->dev) {
 			*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE);
 		} else {
-			if (!skb->tstamp)
-				__net_timestamp(skb);
+			if (!trace->type.bit2) {
+				tstamp = skb_tstamp_cond(skb, true);
+				ts = ktime_to_timespec64(tstamp);
+			}
 
-			if (!trace->type.bit2)
-				skb_get_new_timestamp(skb, &ts);
-
-			*(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
+			*(__be32 *)data = cpu_to_be32((u32)(ts.tv_nsec / NSEC_PER_USEC));
 		}
 		data += sizeof(__be32);
 	}
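The IOAM hunks above switch the timestamp source from a `__kernel_sock_timeval` (`tv_usec`) to a `timespec64` (`tv_nsec`), while the trace format on the wire still carries a microsecond fraction; hence the `tv_nsec / NSEC_PER_USEC` division. A minimal sketch of that conversion (function name is illustrative, not kernel API):

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC  1000000000ULL
#define NSEC_PER_USEC 1000ULL

/* Sketch of the fraction computation after the timespec64 switch: the
 * sub-second part of a nanosecond timestamp is reduced to microseconds
 * before being stored as the 32-bit fraction field.
 */
static uint32_t ioam_ts_frac(uint64_t tstamp_ns)
{
	uint64_t tv_nsec = tstamp_ns % NSEC_PER_SEC;

	return (uint32_t)(tv_nsec / NSEC_PER_USEC);
}
```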
@@ -459,6 +459,7 @@ discard:
 
 static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
+	skb_clear_delivery_time(skb);
 	rcu_read_lock();
 	ip6_protocol_deliver_rcu(net, skb, 0, false);
 	rcu_read_unlock();
@@ -440,7 +440,7 @@ static inline int ip6_forward_finish(struct net *net, struct sock *sk,
 	}
 #endif
 
-	skb->tstamp = 0;
+	skb_clear_tstamp(skb);
 	return dst_output(net, sk, skb);
 }
 
@@ -813,6 +813,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 	struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
 	struct ipv6_pinfo *np = skb->sk && !dev_recursion_level() ?
 				inet6_sk(skb->sk) : NULL;
+	bool mono_delivery_time = skb->mono_delivery_time;
 	struct ip6_frag_state state;
 	unsigned int mtu, hlen, nexthdr_offset;
 	ktime_t tstamp = skb->tstamp;
@@ -903,7 +904,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 		if (iter.frag)
 			ip6_fraglist_prepare(skb, &iter);
 
-		skb->tstamp = tstamp;
+		skb_set_delivery_time(skb, tstamp, mono_delivery_time);
 		err = output(net, sk, skb);
 		if (!err)
 			IP6_INC_STATS(net, ip6_dst_idev(&rt->dst),
@@ -962,7 +963,7 @@ slow_path:
 		/*
 		 *	Put this fragment into the sending queue.
 		 */
-		frag->tstamp = tstamp;
+		skb_set_delivery_time(frag, tstamp, mono_delivery_time);
 		err = output(net, sk, frag);
 		if (err)
 			goto fail;
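The fragmentation hunks above sample the `(tstamp, mono_delivery_time)` pair once from the packet being fragmented and re-apply it to every fragment, so an EDT keeps its "mono" meaning across fragmentation. An illustrative model (not kernel code; `struct frag_skb` is a stand-in for `sk_buff`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of restoring the saved delivery time on each fragment, as the
 * ip6_fragment()/br_ip6_fragment() hunks do per output skb.
 */
struct frag_skb {
	uint64_t tstamp;
	bool mono_delivery_time;
};

static void frag_inherit_time(struct frag_skb *frag, uint64_t tstamp,
			      bool mono_delivery_time)
{
	/* mirrors skb_set_delivery_time(frag, tstamp, mono_delivery_time) */
	frag->tstamp = tstamp;
	frag->mono_delivery_time = tstamp && mono_delivery_time;
}
```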
@@ -121,6 +121,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 					  struct sk_buff *))
 {
 	int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size;
+	bool mono_delivery_time = skb->mono_delivery_time;
 	ktime_t tstamp = skb->tstamp;
 	struct ip6_frag_state state;
 	u8 *prevhdr, nexthdr = 0;
@@ -186,7 +187,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 		if (iter.frag)
 			ip6_fraglist_prepare(skb, &iter);
 
-		skb->tstamp = tstamp;
+		skb_set_delivery_time(skb, tstamp, mono_delivery_time);
 		err = output(net, sk, data, skb);
 		if (err || !iter.frag)
 			break;
@@ -219,7 +220,7 @@ slow_path:
 			goto blackhole;
 		}
 
-		skb2->tstamp = tstamp;
+		skb_set_delivery_time(skb2, tstamp, mono_delivery_time);
 		err = output(net, sk, data, skb2);
 		if (err)
 			goto blackhole;
@@ -264,6 +264,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
 		fq->iif = dev->ifindex;
 
 	fq->q.stamp = skb->tstamp;
+	fq->q.mono_delivery_time = skb->mono_delivery_time;
 	fq->q.meat += skb->len;
 	fq->ecn |= ecn;
 	if (payload_len > fq->q.max_size)
@@ -194,6 +194,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
 		fq->iif = dev->ifindex;
 
 	fq->q.stamp = skb->tstamp;
+	fq->q.mono_delivery_time = skb->mono_delivery_time;
 	fq->q.meat += skb->len;
 	fq->ecn |= ecn;
 	add_frag_mem_limit(fq->q.fqdir, skb->truesize);
@@ -940,7 +940,7 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
 		} else {
 			mark = sk->sk_mark;
 		}
-		buff->tstamp = tcp_transmit_time(sk);
+		skb_set_delivery_time(buff, tcp_transmit_time(sk), true);
 	}
 	fl6.flowi6_mark = IP6_REPLY_MARK(net, skb->mark) ?: mark;
 	fl6.fl6_dport = t1->dest;
@@ -610,7 +610,7 @@ static inline int ip_vs_tunnel_xmit_prepare(struct sk_buff *skb,
 		nf_reset_ct(skb);
 		skb_forward_csum(skb);
 		if (skb->dev)
-			skb->tstamp = 0;
+			skb_clear_tstamp(skb);
 	}
 	return ret;
 }
@@ -652,7 +652,7 @@ static inline int ip_vs_nat_send_or_cont(int pf, struct sk_buff *skb,
 	if (!local) {
 		skb_forward_csum(skb);
 		if (skb->dev)
-			skb->tstamp = 0;
+			skb_clear_tstamp(skb);
 		NF_HOOK(pf, NF_INET_LOCAL_OUT, cp->ipvs->net, NULL, skb,
 			NULL, skb_dst(skb)->dev, dst_output);
 	} else
@@ -674,7 +674,7 @@ static inline int ip_vs_send_or_cont(int pf, struct sk_buff *skb,
 		ip_vs_drop_early_demux_sk(skb);
 		skb_forward_csum(skb);
 		if (skb->dev)
-			skb->tstamp = 0;
+			skb_clear_tstamp(skb);
 		NF_HOOK(pf, NF_INET_LOCAL_OUT, cp->ipvs->net, NULL, skb,
 			NULL, skb_dst(skb)->dev, dst_output);
 	} else
@@ -19,7 +19,7 @@ static void nf_do_netdev_egress(struct sk_buff *skb, struct net_device *dev)
 	skb_push(skb, skb->mac_len);
 
 	skb->dev = dev;
-	skb->tstamp = 0;
+	skb_clear_tstamp(skb);
 	dev_queue_xmit(skb);
 }
 
@@ -376,7 +376,7 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 	nf_flow_nat_ip(flow, skb, thoff, dir, iph);
 
 	ip_decrease_ttl(iph);
-	skb->tstamp = 0;
+	skb_clear_tstamp(skb);
 
 	if (flow_table->flags & NF_FLOWTABLE_COUNTER)
 		nf_ct_acct_update(flow->ct, tuplehash->tuple.dir, skb->len);
@@ -611,7 +611,7 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 	nf_flow_nat_ipv6(flow, skb, dir, ip6h);
 
 	ip6h->hop_limit--;
-	skb->tstamp = 0;
+	skb_clear_tstamp(skb);
 
 	if (flow_table->flags & NF_FLOWTABLE_COUNTER)
 		nf_ct_acct_update(flow->ct, tuplehash->tuple.dir, skb->len);
@@ -460,6 +460,7 @@ __build_packet_message(struct nfnl_log_net *log,
 	sk_buff_data_t old_tail = inst->skb->tail;
 	struct sock *sk;
 	const unsigned char *hwhdrp;
+	ktime_t tstamp;
 
 	nlh = nfnl_msg_put(inst->skb, 0, 0,
 			   nfnl_msg_type(NFNL_SUBSYS_ULOG, NFULNL_MSG_PACKET),
@@ -588,9 +589,10 @@ __build_packet_message(struct nfnl_log_net *log,
 			goto nla_put_failure;
 	}
 
-	if (hooknum <= NF_INET_FORWARD && skb->tstamp) {
+	tstamp = skb_tstamp_cond(skb, false);
+	if (hooknum <= NF_INET_FORWARD && tstamp) {
 		struct nfulnl_msg_packet_timestamp ts;
-		struct timespec64 kts = ktime_to_timespec64(skb->tstamp);
+		struct timespec64 kts = ktime_to_timespec64(tstamp);
 		ts.sec = cpu_to_be64(kts.tv_sec);
 		ts.usec = cpu_to_be64(kts.tv_nsec / NSEC_PER_USEC);
 
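Both nfnetlink hunks read the timestamp through `skb_tstamp_cond(skb, false)` instead of `skb->tstamp` directly, so a mono delivery time is never reported to userspace as if it were a real-clock timestamp. A hedged model of that helper's behavior (a sketch, not the kernel source; `get_real` stands in for `ktime_get_real()`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the skb_tstamp_cond() logic the nfnetlink hunks rely on:
 * a (rcv) timestamp passes through unchanged; a mono EDT is reported
 * as 0 unless the caller (cond = true) asks for a freshly taken
 * real-clock time instead. struct log_skb is illustrative.
 */
struct log_skb {
	uint64_t tstamp;
	bool mono_delivery_time;
};

static uint64_t model_tstamp_cond(const struct log_skb *skb, bool cond,
				  uint64_t (*get_real)(void))
{
	if (!skb->mono_delivery_time)
		return skb->tstamp;

	return cond ? get_real() : 0;
}

static uint64_t fake_real_clock(void)
{
	return 1647000000000000000ULL; /* arbitrary fixed "real" time */
}
```

With `cond = false`, as in the nfnetlink callers, an EDT-carrying skb simply produces no timestamp attribute.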
@@ -392,6 +392,7 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 	bool csum_verify;
 	char *secdata = NULL;
 	u32 seclen = 0;
+	ktime_t tstamp;
 
 	size = nlmsg_total_size(sizeof(struct nfgenmsg))
 		+ nla_total_size(sizeof(struct nfqnl_msg_packet_hdr))
@@ -407,7 +408,8 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 		+ nla_total_size(sizeof(u_int32_t))	/* skbinfo */
 		+ nla_total_size(sizeof(u_int32_t));	/* cap_len */
 
-	if (entskb->tstamp)
+	tstamp = skb_tstamp_cond(entskb, false);
+	if (tstamp)
 		size += nla_total_size(sizeof(struct nfqnl_msg_packet_timestamp));
 
 	size += nfqnl_get_bridge_size(entry);
@@ -582,9 +584,9 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 	if (nfqnl_put_bridge(entry, skb) < 0)
 		goto nla_put_failure;
 
-	if (entry->state.hook <= NF_INET_FORWARD && entskb->tstamp) {
+	if (entry->state.hook <= NF_INET_FORWARD && tstamp) {
 		struct nfqnl_msg_packet_timestamp ts;
-		struct timespec64 kts = ktime_to_timespec64(entskb->tstamp);
+		struct timespec64 kts = ktime_to_timespec64(tstamp);
 
 		ts.sec = cpu_to_be64(kts.tv_sec);
 		ts.usec = cpu_to_be64(kts.tv_nsec / NSEC_PER_USEC);
@@ -145,7 +145,7 @@ static void nft_fwd_neigh_eval(const struct nft_expr *expr,
 		return;
 
 	skb->dev = dev;
-	skb->tstamp = 0;
+	skb_clear_tstamp(skb);
 	neigh_xmit(neigh_table, dev, addr, skb);
 out:
 	regs->verdict.code = verdict;
@@ -507,7 +507,7 @@ void ovs_vport_send(struct vport *vport, struct sk_buff *skb, u8 mac_proto)
 	}
 
 	skb->dev = vport->dev;
-	skb->tstamp = 0;
+	skb_clear_tstamp(skb);
 	vport->ops->send(skb);
 	return;
 
@@ -460,7 +460,7 @@ static __u32 tpacket_get_timestamp(struct sk_buff *skb, struct timespec64 *ts,
 		return TP_STATUS_TS_RAW_HARDWARE;
 
 	if ((flags & SOF_TIMESTAMPING_SOFTWARE) &&
-	    ktime_to_timespec64_cond(skb->tstamp, ts))
+	    ktime_to_timespec64_cond(skb_tstamp(skb), ts))
 		return TP_STATUS_TS_SOFTWARE;
 
 	return 0;
@@ -2199,6 +2199,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
 	spin_lock(&sk->sk_receive_queue.lock);
 	po->stats.stats1.tp_packets++;
 	sock_skb_set_dropcount(sk, skb);
+	skb_clear_delivery_time(skb);
 	__skb_queue_tail(&sk->sk_receive_queue, skb);
 	spin_unlock(&sk->sk_receive_queue.lock);
 	sk->sk_data_ready(sk);
@@ -2377,6 +2378,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 	po->stats.stats1.tp_packets++;
 	if (copy_skb) {
 		status |= TP_STATUS_COPY;
+		skb_clear_delivery_time(copy_skb);
 		__skb_queue_tail(&sk->sk_receive_queue, copy_skb);
 	}
 	spin_unlock(&sk->sk_receive_queue.lock);
@@ -53,6 +53,8 @@ static int tcf_bpf_act(struct sk_buff *skb, const struct tc_action *act,
 		bpf_compute_data_pointers(skb);
 		filter_res = bpf_prog_run(filter, skb);
 	}
+	if (unlikely(!skb->tstamp && skb->mono_delivery_time))
+		skb->mono_delivery_time = 0;
 	if (skb_sk_is_prefetched(skb) && filter_res != TC_ACT_OK)
 		skb_orphan(skb);
 
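The two lines added above (and the matching cls_bpf change) enforce one invariant: a bpf program may write 0 to `__sk_buff->tstamp` directly, which would otherwise leave a stale `mono_delivery_time` bit behind; after the program runs, a zero tstamp clears the bit. A userspace model of that fixup (field names mirror the kernel's `sk_buff`, but the struct is only for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the post-bpf_prog_run() fixup from the act_bpf/cls_bpf
 * hunks: tstamp == 0 must imply mono_delivery_time == 0, so sch_fq
 * never sees a "mono EDT" flag without an EDT value.
 */
struct tc_skb {
	uint64_t tstamp;
	bool mono_delivery_time;
};

static void fixup_dtime_after_bpf(struct tc_skb *skb)
{
	if (!skb->tstamp && skb->mono_delivery_time)
		skb->mono_delivery_time = false;
}
```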
@@ -102,6 +102,8 @@ static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			bpf_compute_data_pointers(skb);
 			filter_res = bpf_prog_run(prog->filter, skb);
 		}
+		if (unlikely(!skb->tstamp && skb->mono_delivery_time))
+			skb->mono_delivery_time = 0;
 
 		if (prog->exts_integrated) {
 			res->class = 0;
@@ -190,7 +190,7 @@ static void xfrmi_dev_uninit(struct net_device *dev)
 
 static void xfrmi_scrub_packet(struct sk_buff *skb, bool xnet)
 {
-	skb->tstamp = 0;
+	skb_clear_tstamp(skb);
 	skb->pkt_type = PACKET_HOST;
 	skb->skb_iif = 0;
 	skb->ignore_df = 0;
@@ -5086,6 +5086,37 @@ union bpf_attr {
  *	Return
  *		0 on success, or a negative error in case of failure. On error
  *		*dst* buffer is zeroed out.
+ *
+ * long bpf_skb_set_delivery_time(struct sk_buff *skb, u64 dtime, u32 dtime_type)
+ *	Description
+ *		Set a *dtime* (delivery time) to the __sk_buff->tstamp and also
+ *		change the __sk_buff->delivery_time_type to *dtime_type*.
+ *
+ *		When setting a delivery time (non zero *dtime*) to
+ *		__sk_buff->tstamp, only BPF_SKB_DELIVERY_TIME_MONO *dtime_type*
+ *		is supported.  It is the only delivery_time_type that will be
+ *		kept after bpf_redirect_*().
+ *
+ *		If there is no need to change the __sk_buff->delivery_time_type,
+ *		the delivery time can be directly written to __sk_buff->tstamp
+ *		instead.
+ *
+ *		*dtime* 0 and *dtime_type* BPF_SKB_DELIVERY_TIME_NONE
+ *		can be used to clear any delivery time stored in
+ *		__sk_buff->tstamp.
+ *
+ *		Only IPv4 and IPv6 skb->protocol are supported.
+ *
+ *		This function is most useful when it needs to set a
+ *		mono delivery time to __sk_buff->tstamp and then
+ *		bpf_redirect_*() to the egress of an iface.  For example,
+ *		changing the (rcv) timestamp in __sk_buff->tstamp at
+ *		ingress to a mono delivery time and then bpf_redirect_*()
+ *		to sch_fq@phy-dev.
+ *	Return
+ *		0 on success.
+ *		**-EINVAL** for invalid input
+ *		**-EOPNOTSUPP** for unsupported delivery_time_type and protocol
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5280,6 +5311,7 @@ union bpf_attr {
 	FN(xdp_load_bytes),		\
 	FN(xdp_store_bytes),		\
 	FN(copy_from_user_task),	\
+	FN(skb_set_delivery_time),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -5469,6 +5501,12 @@ union {					\
 	__u64 :64;			\
 } __attribute__((aligned(8)))
 
+enum {
+	BPF_SKB_DELIVERY_TIME_NONE,
+	BPF_SKB_DELIVERY_TIME_UNSPEC,
+	BPF_SKB_DELIVERY_TIME_MONO,
+};
+
 /* user accessible mirror of in-kernel sk_buff.
  * new fields can only be added to the end of this structure
  */
@@ -5509,7 +5547,8 @@ struct __sk_buff {
 	__u32 gso_segs;
 	__bpf_md_ptr(struct bpf_sock *, sk);
 	__u32 gso_size;
-	__u32 :32;		/* Padding, future use. */
+	__u8  delivery_time_type;
+	__u32 :24;		/* Padding, future use. */
 	__u64 hwtstamp;
 };
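Per the UAPI comment above, `__sk_buff->delivery_time_type` is a read-only view derived from the skb's timestamp state. A sketch of one consistent way to read that mapping (the enum values are copied from the uapi hunk; the mapping function itself is an assumption for illustration, not the kernel implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed mapping from (tstamp, mono_delivery_time) to the new
 * delivery_time_type field: MONO when a mono EDT is stored, NONE when
 * tstamp is zero, UNSPEC otherwise (e.g. a (rcv) timestamp at ingress).
 */
enum {
	BPF_SKB_DELIVERY_TIME_NONE,
	BPF_SKB_DELIVERY_TIME_UNSPEC,
	BPF_SKB_DELIVERY_TIME_MONO,
};

static int model_delivery_time_type(uint64_t tstamp, bool mono_delivery_time)
{
	if (mono_delivery_time)
		return BPF_SKB_DELIVERY_TIME_MONO;
	return tstamp ? BPF_SKB_DELIVERY_TIME_UNSPEC
		      : BPF_SKB_DELIVERY_TIME_NONE;
}
```

This matches the helper contract above: only `BPF_SKB_DELIVERY_TIME_MONO` survives `bpf_redirect_*()`, and a zero dtime with `BPF_SKB_DELIVERY_TIME_NONE` clears the stored delivery time.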
@@ -17,6 +17,8 @@
 #include <linux/if_tun.h>
 #include <linux/limits.h>
 #include <linux/sysctl.h>
+#include <linux/time_types.h>
+#include <linux/net_tstamp.h>
 #include <sched.h>
 #include <stdbool.h>
 #include <stdio.h>
@@ -29,6 +31,11 @@
 #include "test_tc_neigh_fib.skel.h"
 #include "test_tc_neigh.skel.h"
 #include "test_tc_peer.skel.h"
+#include "test_tc_dtime.skel.h"
+
+#ifndef TCP_TX_DELAY
+#define TCP_TX_DELAY 37
+#endif
 
 #define NS_SRC "ns_src"
 #define NS_FWD "ns_fwd"
@@ -61,6 +68,7 @@
 #define CHK_PROG_PIN_FILE "/sys/fs/bpf/test_tc_chk"
 
 #define TIMEOUT_MILLIS		10000
+#define NSEC_PER_SEC		1000000000ULL
 
 #define log_err(MSG, ...) \
 	fprintf(stderr, "(%s:%d: errno: %s) " MSG "\n", \
@@ -440,6 +448,431 @@ static int set_forwarding(bool enable)
 	return 0;
 }
 
+static void rcv_tstamp(int fd, const char *expected, size_t s)
+{
+	struct __kernel_timespec pkt_ts = {};
+	char ctl[CMSG_SPACE(sizeof(pkt_ts))];
+	struct timespec now_ts;
+	struct msghdr msg = {};
+	__u64 now_ns, pkt_ns;
+	struct cmsghdr *cmsg;
+	struct iovec iov;
+	char data[32];
+	int ret;
+
+	iov.iov_base = data;
+	iov.iov_len = sizeof(data);
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = &ctl;
+	msg.msg_controllen = sizeof(ctl);
+
+	ret = recvmsg(fd, &msg, 0);
+	if (!ASSERT_EQ(ret, s, "recvmsg"))
+		return;
+	ASSERT_STRNEQ(data, expected, s, "expected rcv data");
+
+	cmsg = CMSG_FIRSTHDR(&msg);
+	if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
+	    cmsg->cmsg_type == SO_TIMESTAMPNS_NEW)
+		memcpy(&pkt_ts, CMSG_DATA(cmsg), sizeof(pkt_ts));
+
+	pkt_ns = pkt_ts.tv_sec * NSEC_PER_SEC + pkt_ts.tv_nsec;
+	ASSERT_NEQ(pkt_ns, 0, "pkt rcv tstamp");
+
+	ret = clock_gettime(CLOCK_REALTIME, &now_ts);
+	ASSERT_OK(ret, "clock_gettime");
+	now_ns = now_ts.tv_sec * NSEC_PER_SEC + now_ts.tv_nsec;
+
+	if (ASSERT_GE(now_ns, pkt_ns, "check rcv tstamp"))
+		ASSERT_LT(now_ns - pkt_ns, 5 * NSEC_PER_SEC,
+			  "check rcv tstamp");
+}
+
+static void snd_tstamp(int fd, char *b, size_t s)
+{
+	struct sock_txtime opt = { .clockid = CLOCK_TAI };
+	char ctl[CMSG_SPACE(sizeof(__u64))];
+	struct timespec now_ts;
+	struct msghdr msg = {};
+	struct cmsghdr *cmsg;
+	struct iovec iov;
+	__u64 now_ns;
+	int ret;
+
+	ret = clock_gettime(CLOCK_TAI, &now_ts);
+	ASSERT_OK(ret, "clock_get_time(CLOCK_TAI)");
+	now_ns = now_ts.tv_sec * NSEC_PER_SEC + now_ts.tv_nsec;
+
+	iov.iov_base = b;
+	iov.iov_len = s;
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = &ctl;
+	msg.msg_controllen = sizeof(ctl);
+
+	cmsg = CMSG_FIRSTHDR(&msg);
+	cmsg->cmsg_level = SOL_SOCKET;
+	cmsg->cmsg_type = SCM_TXTIME;
+	cmsg->cmsg_len = CMSG_LEN(sizeof(now_ns));
+	*(__u64 *)CMSG_DATA(cmsg) = now_ns;
+
+	ret = setsockopt(fd, SOL_SOCKET, SO_TXTIME, &opt, sizeof(opt));
+	ASSERT_OK(ret, "setsockopt(SO_TXTIME)");
+
+	ret = sendmsg(fd, &msg, 0);
+	ASSERT_EQ(ret, s, "sendmsg");
+}
+
+static void test_inet_dtime(int family, int type, const char *addr, __u16 port)
+{
+	int opt = 1, accept_fd = -1, client_fd = -1, listen_fd, err;
+	char buf[] = "testing testing";
+	struct nstoken *nstoken;
+
+	nstoken = open_netns(NS_DST);
+	if (!ASSERT_OK_PTR(nstoken, "setns dst"))
+		return;
+	listen_fd = start_server(family, type, addr, port, 0);
+	close_netns(nstoken);
+
+	if (!ASSERT_GE(listen_fd, 0, "listen"))
+		return;
+
+	/* Ensure the kernel puts the (rcv) timestamp for all skb */
+	err = setsockopt(listen_fd, SOL_SOCKET, SO_TIMESTAMPNS_NEW,
+			 &opt, sizeof(opt));
+	if (!ASSERT_OK(err, "setsockopt(SO_TIMESTAMPNS_NEW)"))
+		goto done;
+
+	if (type == SOCK_STREAM) {
+		/* Ensure the kernel set EDT when sending out rst/ack
+		 * from the kernel's ctl_sk.
+		 */
+		err = setsockopt(listen_fd, SOL_TCP, TCP_TX_DELAY, &opt,
+				 sizeof(opt));
+		if (!ASSERT_OK(err, "setsockopt(TCP_TX_DELAY)"))
+			goto done;
+	}
+
+	nstoken = open_netns(NS_SRC);
+	if (!ASSERT_OK_PTR(nstoken, "setns src"))
+		goto done;
+	client_fd = connect_to_fd(listen_fd, TIMEOUT_MILLIS);
+	close_netns(nstoken);
+
+	if (!ASSERT_GE(client_fd, 0, "connect_to_fd"))
+		goto done;
+
+	if (type == SOCK_STREAM) {
+		int n;
+
+		accept_fd = accept(listen_fd, NULL, NULL);
+		if (!ASSERT_GE(accept_fd, 0, "accept"))
+			goto done;
+
+		n = write(client_fd, buf, sizeof(buf));
+		if (!ASSERT_EQ(n, sizeof(buf), "send to server"))
+			goto done;
+		rcv_tstamp(accept_fd, buf, sizeof(buf));
+	} else {
+		snd_tstamp(client_fd, buf, sizeof(buf));
+		rcv_tstamp(listen_fd, buf, sizeof(buf));
+	}
+
+done:
+	close(listen_fd);
+	if (accept_fd != -1)
+		close(accept_fd);
+	if (client_fd != -1)
+		close(client_fd);
+}
+
+static int netns_load_dtime_bpf(struct test_tc_dtime *skel)
+{
+	struct nstoken *nstoken;
+
+#define PIN_FNAME(__file) "/sys/fs/bpf/" #__file
+#define PIN(__prog) ({							\
+		int err = bpf_program__pin(skel->progs.__prog, PIN_FNAME(__prog)); \
+		if (!ASSERT_OK(err, "pin " #__prog))			\
+			goto fail;					\
+	})
+
+	/* setup ns_src tc progs */
+	nstoken = open_netns(NS_SRC);
+	if (!ASSERT_OK_PTR(nstoken, "setns " NS_SRC))
+		return -1;
+	PIN(egress_host);
+	PIN(ingress_host);
+	SYS("tc qdisc add dev veth_src clsact");
+	SYS("tc filter add dev veth_src ingress bpf da object-pinned "
+	    PIN_FNAME(ingress_host));
+	SYS("tc filter add dev veth_src egress bpf da object-pinned "
+	    PIN_FNAME(egress_host));
+	close_netns(nstoken);
+
+	/* setup ns_dst tc progs */
+	nstoken = open_netns(NS_DST);
+	if (!ASSERT_OK_PTR(nstoken, "setns " NS_DST))
+		return -1;
+	PIN(egress_host);
+	PIN(ingress_host);
+	SYS("tc qdisc add dev veth_dst clsact");
+	SYS("tc filter add dev veth_dst ingress bpf da object-pinned "
+	    PIN_FNAME(ingress_host));
+	SYS("tc filter add dev veth_dst egress bpf da object-pinned "
+	    PIN_FNAME(egress_host));
+	close_netns(nstoken);
+
+	/* setup ns_fwd tc progs */
+	nstoken = open_netns(NS_FWD);
+	if (!ASSERT_OK_PTR(nstoken, "setns " NS_FWD))
+		return -1;
+	PIN(ingress_fwdns_prio100);
+	PIN(egress_fwdns_prio100);
+	PIN(ingress_fwdns_prio101);
+	PIN(egress_fwdns_prio101);
+	SYS("tc qdisc add dev veth_dst_fwd clsact");
+	SYS("tc filter add dev veth_dst_fwd ingress prio 100 bpf da object-pinned "
+	    PIN_FNAME(ingress_fwdns_prio100));
+	SYS("tc filter add dev veth_dst_fwd ingress prio 101 bpf da object-pinned "
+	    PIN_FNAME(ingress_fwdns_prio101));
+	SYS("tc filter add dev veth_dst_fwd egress prio 100 bpf da object-pinned "
+	    PIN_FNAME(egress_fwdns_prio100));
+	SYS("tc filter add dev veth_dst_fwd egress prio 101 bpf da object-pinned "
+	    PIN_FNAME(egress_fwdns_prio101));
+	SYS("tc qdisc add dev veth_src_fwd clsact");
+	SYS("tc filter add dev veth_src_fwd ingress prio 100 bpf da object-pinned "
+	    PIN_FNAME(ingress_fwdns_prio100));
+	SYS("tc filter add dev veth_src_fwd ingress prio 101 bpf da object-pinned "
+	    PIN_FNAME(ingress_fwdns_prio101));
+	SYS("tc filter add dev veth_src_fwd egress prio 100 bpf da object-pinned "
+	    PIN_FNAME(egress_fwdns_prio100));
+	SYS("tc filter add dev veth_src_fwd egress prio 101 bpf da object-pinned "
+	    PIN_FNAME(egress_fwdns_prio101));
+	close_netns(nstoken);
+
+#undef PIN
+
+	return 0;
+
+fail:
+	close_netns(nstoken);
+	return -1;
+}
+
+enum {
+	INGRESS_FWDNS_P100,
+	INGRESS_FWDNS_P101,
+	EGRESS_FWDNS_P100,
+	EGRESS_FWDNS_P101,
+	INGRESS_ENDHOST,
+	EGRESS_ENDHOST,
+	SET_DTIME,
+	__MAX_CNT,
+};
+
+const char *cnt_names[] = {
+	"ingress_fwdns_p100",
+	"ingress_fwdns_p101",
+	"egress_fwdns_p100",
+	"egress_fwdns_p101",
+	"ingress_endhost",
+	"egress_endhost",
+	"set_dtime",
+};
+
+enum {
+	TCP_IP6_CLEAR_DTIME,
+	TCP_IP4,
+	TCP_IP6,
+	UDP_IP4,
+	UDP_IP6,
+	TCP_IP4_RT_FWD,
+	TCP_IP6_RT_FWD,
+	UDP_IP4_RT_FWD,
+	UDP_IP6_RT_FWD,
+	UKN_TEST,
+	__NR_TESTS,
+};
+
+const char *test_names[] = {
+	"tcp ip6 clear dtime",
+	"tcp ip4",
+	"tcp ip6",
+	"udp ip4",
+	"udp ip6",
+	"tcp ip4 rt fwd",
+	"tcp ip6 rt fwd",
+	"udp ip4 rt fwd",
+	"udp ip6 rt fwd",
+};
+
+static const char *dtime_cnt_str(int test, int cnt)
+{
+	static char name[64];
+
+	snprintf(name, sizeof(name), "%s %s", test_names[test], cnt_names[cnt]);
+
+	return name;
+}
+
+static const char *dtime_err_str(int test, int cnt)
+{
+	static char name[64];
+
+	snprintf(name, sizeof(name), "%s %s errs", test_names[test],
+		 cnt_names[cnt]);
+
+	return name;
+}
+
+static void test_tcp_clear_dtime(struct test_tc_dtime *skel)
+{
+	int i, t = TCP_IP6_CLEAR_DTIME;
+	__u32 *dtimes = skel->bss->dtimes[t];
+	__u32 *errs = skel->bss->errs[t];
+
+	skel->bss->test = t;
+	test_inet_dtime(AF_INET6, SOCK_STREAM, IP6_DST, 0);
+
+	ASSERT_EQ(dtimes[INGRESS_FWDNS_P100], 0,
+		  dtime_cnt_str(t, INGRESS_FWDNS_P100));
+	ASSERT_EQ(dtimes[INGRESS_FWDNS_P101], 0,
+		  dtime_cnt_str(t, INGRESS_FWDNS_P101));
+	ASSERT_GT(dtimes[EGRESS_FWDNS_P100], 0,
+		  dtime_cnt_str(t, EGRESS_FWDNS_P100));
+	ASSERT_EQ(dtimes[EGRESS_FWDNS_P101], 0,
+		  dtime_cnt_str(t, EGRESS_FWDNS_P101));
+	ASSERT_GT(dtimes[EGRESS_ENDHOST], 0,
+		  dtime_cnt_str(t, EGRESS_ENDHOST));
+	ASSERT_GT(dtimes[INGRESS_ENDHOST], 0,
+		  dtime_cnt_str(t, INGRESS_ENDHOST));
+
+	for (i = INGRESS_FWDNS_P100; i < __MAX_CNT; i++)
+		ASSERT_EQ(errs[i], 0, dtime_err_str(t, i));
+}
+
+static void test_tcp_dtime(struct test_tc_dtime *skel, int family, bool bpf_fwd)
+{
+	__u32 *dtimes, *errs;
+	const char *addr;
+	int i, t;
+
+	if (family == AF_INET) {
+		t = bpf_fwd ? TCP_IP4 : TCP_IP4_RT_FWD;
+		addr = IP4_DST;
+	} else {
+		t = bpf_fwd ? TCP_IP6 : TCP_IP6_RT_FWD;
+		addr = IP6_DST;
+	}
+
+	dtimes = skel->bss->dtimes[t];
+	errs = skel->bss->errs[t];
+
+	skel->bss->test = t;
+	test_inet_dtime(family, SOCK_STREAM, addr, 0);
+
+	/* fwdns_prio100 prog does not read delivery_time_type, so
+	 * kernel puts the (rcv) timestamp in __sk_buff->tstamp
+	 */
+	ASSERT_EQ(dtimes[INGRESS_FWDNS_P100], 0,
+		  dtime_cnt_str(t, INGRESS_FWDNS_P100));
+	for (i = INGRESS_FWDNS_P101; i < SET_DTIME; i++)
+		ASSERT_GT(dtimes[i], 0, dtime_cnt_str(t, i));
+
+	for (i = INGRESS_FWDNS_P100; i < __MAX_CNT; i++)
+		ASSERT_EQ(errs[i], 0, dtime_err_str(t, i));
+}
+
+static void test_udp_dtime(struct test_tc_dtime *skel, int family, bool bpf_fwd)
+{
+	__u32 *dtimes, *errs;
+	const char *addr;
+	int i, t;
+
+	if (family == AF_INET) {
+		t = bpf_fwd ? UDP_IP4 : UDP_IP4_RT_FWD;
+		addr = IP4_DST;
+	} else {
+		t = bpf_fwd ? UDP_IP6 : UDP_IP6_RT_FWD;
+		addr = IP6_DST;
+	}
+
+	dtimes = skel->bss->dtimes[t];
+	errs = skel->bss->errs[t];
+
+	skel->bss->test = t;
+	test_inet_dtime(family, SOCK_DGRAM, addr, 0);
+
+	ASSERT_EQ(dtimes[INGRESS_FWDNS_P100], 0,
+		  dtime_cnt_str(t, INGRESS_FWDNS_P100));
+	/* non mono delivery time is not forwarded */
+	ASSERT_EQ(dtimes[INGRESS_FWDNS_P101], 0,
+		  dtime_cnt_str(t, INGRESS_FWDNS_P101));
+	for (i = EGRESS_FWDNS_P100; i < SET_DTIME; i++)
+		ASSERT_GT(dtimes[i], 0, dtime_cnt_str(t, i));
+
+	for (i = INGRESS_FWDNS_P100; i < __MAX_CNT; i++)
+		ASSERT_EQ(errs[i], 0, dtime_err_str(t, i));
+}
+
+static void test_tc_redirect_dtime(struct netns_setup_result *setup_result)
+{
+	struct test_tc_dtime *skel;
+	struct nstoken *nstoken;
+	int err;
+
+	skel = test_tc_dtime__open();
+	if (!ASSERT_OK_PTR(skel, "test_tc_dtime__open"))
+		return;
+
+	skel->rodata->IFINDEX_SRC = setup_result->ifindex_veth_src_fwd;
+	skel->rodata->IFINDEX_DST = setup_result->ifindex_veth_dst_fwd;
+
+	err = test_tc_dtime__load(skel);
+	if (!ASSERT_OK(err, "test_tc_dtime__load"))
+		goto done;
+
+	if (netns_load_dtime_bpf(skel))
+		goto done;
+
+	nstoken = open_netns(NS_FWD);
+	if (!ASSERT_OK_PTR(nstoken, "setns fwd"))
+		goto done;
+	err = set_forwarding(false);
+	close_netns(nstoken);
+	if (!ASSERT_OK(err, "disable forwarding"))
+		goto done;
+
+	test_tcp_clear_dtime(skel);
+
+	test_tcp_dtime(skel, AF_INET, true);
+	test_tcp_dtime(skel, AF_INET6, true);
+	test_udp_dtime(skel, AF_INET, true);
+	test_udp_dtime(skel, AF_INET6, true);
+
+	/* Test the kernel ip[6]_forward path instead
+	 * of bpf_redirect_neigh().
+	 */
+	nstoken = open_netns(NS_FWD);
+	if (!ASSERT_OK_PTR(nstoken, "setns fwd"))
+		goto done;
+	err = set_forwarding(true);
+	close_netns(nstoken);
+	if (!ASSERT_OK(err, "enable forwarding"))
+		goto done;
+
+	test_tcp_dtime(skel, AF_INET, false);
+	test_tcp_dtime(skel, AF_INET6, false);
+	test_udp_dtime(skel, AF_INET, false);
+	test_udp_dtime(skel, AF_INET6, false);
+
+done:
+	test_tc_dtime__destroy(skel);
+}
+
 static void test_tc_redirect_neigh_fib(struct netns_setup_result *setup_result)
 {
 	struct nstoken *nstoken = NULL;
@@ -787,6 +1220,7 @@ static void *test_tc_redirect_run_tests(void *arg)
 	RUN_TEST(tc_redirect_peer_l3);
 	RUN_TEST(tc_redirect_neigh);
 	RUN_TEST(tc_redirect_neigh_fib);
+	RUN_TEST(tc_redirect_dtime);
 	return NULL;
 }

 tools/testing/selftests/bpf/progs/test_tc_dtime.c | 349 +++++++++++++++++++++ (new file)
@@ -0,0 +1,349 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2022 Meta
+
+#include <stddef.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <linux/bpf.h>
+#include <linux/stddef.h>
+#include <linux/pkt_cls.h>
+#include <linux/if_ether.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+#include <sys/socket.h>
+
+/* veth_src --- veth_src_fwd --- veth_dst_fwd --- veth_dst
+ *           |                                 |
+ *  ns_src   |             ns_fwd              |  ns_dst
+ *
+ * ns_src and ns_dst: ENDHOST namespace
+ *             ns_fwd: Forwarding namespace
+ */
+
+#define ctx_ptr(field)	(void *)(long)(field)
+
+#define ip4_src __bpf_htonl(0xac100164) /* 172.16.1.100 */
+#define ip4_dst __bpf_htonl(0xac100264) /* 172.16.2.100 */
+
+#define ip6_src { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, \
+		  0x00, 0x01, 0xde, 0xad, 0xbe, 0xef, 0xca, 0xfe }
+#define ip6_dst { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, \
+		  0x00, 0x02, 0xde, 0xad, 0xbe, 0xef, 0xca, 0xfe }
+
+#define v6_equal(a, b) (a.s6_addr32[0] == b.s6_addr32[0] && \
+			a.s6_addr32[1] == b.s6_addr32[1] && \
+			a.s6_addr32[2] == b.s6_addr32[2] && \
+			a.s6_addr32[3] == b.s6_addr32[3])
+
+volatile const __u32 IFINDEX_SRC;
+volatile const __u32 IFINDEX_DST;
+
+#define EGRESS_ENDHOST_MAGIC	0x0b9fbeef
+#define INGRESS_FWDNS_MAGIC	0x1b9fbeef
+#define EGRESS_FWDNS_MAGIC	0x2b9fbeef
+
+enum {
+	INGRESS_FWDNS_P100,
+	INGRESS_FWDNS_P101,
+	EGRESS_FWDNS_P100,
+	EGRESS_FWDNS_P101,
+	INGRESS_ENDHOST,
+	EGRESS_ENDHOST,
+	SET_DTIME,
+	__MAX_CNT,
+};
+
+enum {
+	TCP_IP6_CLEAR_DTIME,
+	TCP_IP4,
+	TCP_IP6,
+	UDP_IP4,
+	UDP_IP6,
+	TCP_IP4_RT_FWD,
+	TCP_IP6_RT_FWD,
+	UDP_IP4_RT_FWD,
+	UDP_IP6_RT_FWD,
+	UKN_TEST,
+	__NR_TESTS,
+};
+
+enum {
+	SRC_NS = 1,
+	DST_NS,
+};
+
+__u32 dtimes[__NR_TESTS][__MAX_CNT] = {};
+__u32 errs[__NR_TESTS][__MAX_CNT] = {};
+__u32 test = 0;
+
+static void inc_dtimes(__u32 idx)
+{
+	if (test < __NR_TESTS)
+		dtimes[test][idx]++;
+	else
+		dtimes[UKN_TEST][idx]++;
+}
+
+static void inc_errs(__u32 idx)
+{
+	if (test < __NR_TESTS)
+		errs[test][idx]++;
+	else
+		errs[UKN_TEST][idx]++;
+}
+
+static int skb_proto(int type)
+{
+	return type & 0xff;
+}
+
+static int skb_ns(int type)
+{
+	return (type >> 8) & 0xff;
+}
+
+static bool fwdns_clear_dtime(void)
+{
+	return test == TCP_IP6_CLEAR_DTIME;
+}
+
+static bool bpf_fwd(void)
+{
+	return test < TCP_IP4_RT_FWD;
+}
+
+/* -1: parse error: TC_ACT_SHOT
+ *  0: not testing traffic: TC_ACT_OK
+ * >0: first byte is the inet_proto, second byte has the netns
+ *     of the sender
+ */
+static int skb_get_type(struct __sk_buff *skb)
+{
+	void *data_end = ctx_ptr(skb->data_end);
+	void *data = ctx_ptr(skb->data);
+	__u8 inet_proto = 0, ns = 0;
+	struct ipv6hdr *ip6h;
+	struct iphdr *iph;
+
+	switch (skb->protocol) {
+	case __bpf_htons(ETH_P_IP):
+		iph = data + sizeof(struct ethhdr);
+		if (iph + 1 > data_end)
+			return -1;
+		if (iph->saddr == ip4_src)
+			ns = SRC_NS;
+		else if (iph->saddr == ip4_dst)
+			ns = DST_NS;
+		inet_proto = iph->protocol;
+		break;
+	case __bpf_htons(ETH_P_IPV6):
+		ip6h = data + sizeof(struct ethhdr);
+		if (ip6h + 1 > data_end)
+			return -1;
+		if (v6_equal(ip6h->saddr, (struct in6_addr)ip6_src))
+			ns = SRC_NS;
+		else if (v6_equal(ip6h->saddr, (struct in6_addr)ip6_dst))
+			ns = DST_NS;
+		inet_proto = ip6h->nexthdr;
+		break;
+	default:
+		return 0;
+	}
+
+	if ((inet_proto != IPPROTO_TCP && inet_proto != IPPROTO_UDP) || !ns)
+		return 0;
+
+	return (ns << 8 | inet_proto);
+}
+
+/* format: direction@iface@netns
+ * egress@veth_(src|dst)@ns_(src|dst)
+ */
+SEC("tc")
+int egress_host(struct __sk_buff *skb)
+{
+	int skb_type;
+
+	skb_type = skb_get_type(skb);
+	if (skb_type == -1)
+		return TC_ACT_SHOT;
+	if (!skb_type)
+		return TC_ACT_OK;
+
+	if (skb_proto(skb_type) == IPPROTO_TCP) {
+		if (skb->delivery_time_type == BPF_SKB_DELIVERY_TIME_MONO &&
+		    skb->tstamp)
+			inc_dtimes(EGRESS_ENDHOST);
+		else
+			inc_errs(EGRESS_ENDHOST);
+	} else {
+		if (skb->delivery_time_type == BPF_SKB_DELIVERY_TIME_UNSPEC &&
|
||||
skb->tstamp)
|
||||
inc_dtimes(EGRESS_ENDHOST);
|
||||
else
|
||||
inc_errs(EGRESS_ENDHOST);
|
||||
}
|
||||
|
||||
skb->tstamp = EGRESS_ENDHOST_MAGIC;
|
||||
|
||||
return TC_ACT_OK;
|
||||
}
|
||||
|
||||
/* ingress@veth_(src|dst)@ns_(src|dst) */
|
||||
SEC("tc")
|
||||
int ingress_host(struct __sk_buff *skb)
|
||||
{
|
||||
int skb_type;
|
||||
|
||||
skb_type = skb_get_type(skb);
|
||||
if (skb_type == -1)
|
||||
return TC_ACT_SHOT;
|
||||
if (!skb_type)
|
||||
return TC_ACT_OK;
|
||||
|
||||
if (skb->delivery_time_type == BPF_SKB_DELIVERY_TIME_MONO &&
|
||||
skb->tstamp == EGRESS_FWDNS_MAGIC)
|
||||
inc_dtimes(INGRESS_ENDHOST);
|
||||
else
|
||||
inc_errs(INGRESS_ENDHOST);
|
||||
|
||||
return TC_ACT_OK;
|
||||
}
|
||||
|
||||
/* ingress@veth_(src|dst)_fwd@ns_fwd priority 100 */
|
||||
SEC("tc")
|
||||
int ingress_fwdns_prio100(struct __sk_buff *skb)
|
||||
{
|
||||
int skb_type;
|
||||
|
||||
skb_type = skb_get_type(skb);
|
||||
if (skb_type == -1)
|
||||
return TC_ACT_SHOT;
|
||||
if (!skb_type)
|
||||
return TC_ACT_OK;
|
||||
|
||||
/* delivery_time is only available to the ingress
|
||||
* if the tc-bpf checks the skb->delivery_time_type.
|
||||
*/
|
||||
if (skb->tstamp == EGRESS_ENDHOST_MAGIC)
|
||||
inc_errs(INGRESS_FWDNS_P100);
|
||||
|
||||
if (fwdns_clear_dtime())
|
||||
skb->tstamp = 0;
|
||||
|
||||
return TC_ACT_UNSPEC;
|
||||
}
|
||||
|
||||
/* egress@veth_(src|dst)_fwd@ns_fwd priority 100 */
|
||||
SEC("tc")
|
||||
int egress_fwdns_prio100(struct __sk_buff *skb)
|
||||
{
|
||||
int skb_type;
|
||||
|
||||
skb_type = skb_get_type(skb);
|
||||
if (skb_type == -1)
|
||||
return TC_ACT_SHOT;
|
||||
if (!skb_type)
|
||||
return TC_ACT_OK;
|
||||
|
||||
/* delivery_time is always available to egress even
|
||||
* the tc-bpf did not use the delivery_time_type.
|
||||
*/
|
||||
if (skb->tstamp == INGRESS_FWDNS_MAGIC)
|
||||
inc_dtimes(EGRESS_FWDNS_P100);
|
||||
else
|
||||
inc_errs(EGRESS_FWDNS_P100);
|
||||
|
||||
if (fwdns_clear_dtime())
|
||||
skb->tstamp = 0;
|
||||
|
||||
return TC_ACT_UNSPEC;
|
||||
}
|
||||
|
||||
/* ingress@veth_(src|dst)_fwd@ns_fwd priority 101 */
|
||||
SEC("tc")
|
||||
int ingress_fwdns_prio101(struct __sk_buff *skb)
|
||||
{
|
||||
__u64 expected_dtime = EGRESS_ENDHOST_MAGIC;
|
||||
int skb_type;
|
||||
|
||||
skb_type = skb_get_type(skb);
|
||||
if (skb_type == -1 || !skb_type)
|
||||
/* Should have handled in prio100 */
|
||||
return TC_ACT_SHOT;
|
||||
|
||||
if (skb_proto(skb_type) == IPPROTO_UDP)
|
||||
expected_dtime = 0;
|
||||
|
||||
if (skb->delivery_time_type) {
|
||||
if (fwdns_clear_dtime() ||
|
||||
skb->delivery_time_type != BPF_SKB_DELIVERY_TIME_MONO ||
|
||||
skb->tstamp != expected_dtime)
|
||||
inc_errs(INGRESS_FWDNS_P101);
|
||||
else
|
||||
inc_dtimes(INGRESS_FWDNS_P101);
|
||||
} else {
|
||||
if (!fwdns_clear_dtime() && expected_dtime)
|
||||
inc_errs(INGRESS_FWDNS_P101);
|
||||
}
|
||||
|
||||
if (skb->delivery_time_type == BPF_SKB_DELIVERY_TIME_MONO) {
|
||||
skb->tstamp = INGRESS_FWDNS_MAGIC;
|
||||
} else {
|
||||
if (bpf_skb_set_delivery_time(skb, INGRESS_FWDNS_MAGIC,
|
||||
BPF_SKB_DELIVERY_TIME_MONO))
|
||||
inc_errs(SET_DTIME);
|
||||
if (!bpf_skb_set_delivery_time(skb, INGRESS_FWDNS_MAGIC,
|
||||
BPF_SKB_DELIVERY_TIME_UNSPEC))
|
||||
inc_errs(SET_DTIME);
|
||||
}
|
||||
|
||||
if (skb_ns(skb_type) == SRC_NS)
|
||||
return bpf_fwd() ?
|
||||
bpf_redirect_neigh(IFINDEX_DST, NULL, 0, 0) : TC_ACT_OK;
|
||||
else
|
||||
return bpf_fwd() ?
|
||||
bpf_redirect_neigh(IFINDEX_SRC, NULL, 0, 0) : TC_ACT_OK;
|
||||
}
|
||||
|
||||
/* egress@veth_(src|dst)_fwd@ns_fwd priority 101 */
|
||||
SEC("tc")
|
||||
int egress_fwdns_prio101(struct __sk_buff *skb)
|
||||
{
|
||||
int skb_type;
|
||||
|
||||
skb_type = skb_get_type(skb);
|
||||
if (skb_type == -1 || !skb_type)
|
||||
/* Should have handled in prio100 */
|
||||
return TC_ACT_SHOT;
|
||||
|
||||
if (skb->delivery_time_type) {
|
||||
if (fwdns_clear_dtime() ||
|
||||
skb->delivery_time_type != BPF_SKB_DELIVERY_TIME_MONO ||
|
||||
skb->tstamp != INGRESS_FWDNS_MAGIC)
|
||||
inc_errs(EGRESS_FWDNS_P101);
|
||||
else
|
||||
inc_dtimes(EGRESS_FWDNS_P101);
|
||||
} else {
|
||||
if (!fwdns_clear_dtime())
|
||||
inc_errs(EGRESS_FWDNS_P101);
|
||||
}
|
||||
|
||||
if (skb->delivery_time_type == BPF_SKB_DELIVERY_TIME_MONO) {
|
||||
skb->tstamp = EGRESS_FWDNS_MAGIC;
|
||||
} else {
|
||||
if (bpf_skb_set_delivery_time(skb, EGRESS_FWDNS_MAGIC,
|
||||
BPF_SKB_DELIVERY_TIME_MONO))
|
||||
inc_errs(SET_DTIME);
|
||||
if (!bpf_skb_set_delivery_time(skb, EGRESS_FWDNS_MAGIC,
|
||||
BPF_SKB_DELIVERY_TIME_UNSPEC))
|
||||
inc_errs(SET_DTIME);
|
||||
}
|
||||
|
||||
return TC_ACT_OK;
|
||||
}
|
||||
|
||||
char __license[] SEC("license") = "GPL";