forked from Minki/linux
bpf: btf: add btf documentation
This patch added documentation for BTF (BPF Debug Format). The document is placed under linux:Documentation/bpf directory. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This commit is contained in:
parent
cbeaad9028
commit
ffcf7ce933
870
Documentation/bpf/btf.rst
Normal file
870
Documentation/bpf/btf.rst
Normal file
@ -0,0 +1,870 @@
|
||||
=====================
|
||||
BPF Type Format (BTF)
|
||||
=====================
|
||||
|
||||
1. Introduction
|
||||
***************
|
||||
|
||||
BTF (BPF Type Format) is the meta data format which
|
||||
encodes the debug info related to BPF program/map.
|
||||
The name BTF was used initially to describe
|
||||
data types. The BTF was later extended to include
|
||||
function info for defined subroutines, and line info
|
||||
for source/line information.
|
||||
|
||||
The debug info is used for map pretty print, function
|
||||
signature, etc. The function signature enables better
|
||||
bpf program/function kernel symbol.
|
||||
The line info helps generate
|
||||
source annotated translated byte code, jited code
|
||||
and verifier log.
|
||||
|
||||
The BTF specification contains two parts,
|
||||
* BTF kernel API
|
||||
* BTF ELF file format
|
||||
|
||||
The kernel API is the contract between
|
||||
user space and kernel. The kernel verifies
|
||||
the BTF info before using it.
|
||||
The ELF file format is a user space contract
|
||||
between ELF file and libbpf loader.
|
||||
|
||||
The type and string sections are part of the
|
||||
BTF kernel API, describing the debug info
|
||||
(mostly types related) referenced by the bpf program.
|
||||
These two sections are discussed in
|
||||
details in :ref:`BTF_Type_String`.
|
||||
|
||||
.. _BTF_Type_String:
|
||||
|
||||
2. BTF Type and String Encoding
|
||||
*******************************
|
||||
|
||||
The file ``include/uapi/linux/btf.h`` provides high
|
||||
level definition on how types/strings are encoded.
|
||||
|
||||
The beginning of data blob must be::
|
||||
|
||||
struct btf_header {
|
||||
__u16 magic;
|
||||
__u8 version;
|
||||
__u8 flags;
|
||||
__u32 hdr_len;
|
||||
|
||||
/* All offsets are in bytes relative to the end of this header */
|
||||
__u32 type_off; /* offset of type section */
|
||||
__u32 type_len; /* length of type section */
|
||||
__u32 str_off; /* offset of string section */
|
||||
__u32 str_len; /* length of string section */
|
||||
};
|
||||
|
||||
The magic is ``0xeB9F``, which has different encoding for big and little
|
||||
endian system, and can be used to test whether BTF is generated for
|
||||
big or little endian target.
|
||||
The btf_header is designed to be extensible with hdr_len equal to
|
||||
``sizeof(struct btf_header)`` when the data blob is generated.
|
||||
|
||||
2.1 String Encoding
|
||||
===================
|
||||
|
||||
The first string in the string section must be a null string.
|
||||
The rest of string table is a concatenation of other null-treminated
|
||||
strings.
|
||||
|
||||
2.2 Type Encoding
|
||||
=================
|
||||
|
||||
The type id ``0`` is reserved for ``void`` type.
|
||||
The type section is parsed sequentially and the type id is assigned to
|
||||
each recognized type starting from id ``1``.
|
||||
Currently, the following types are supported::
|
||||
|
||||
#define BTF_KIND_INT 1 /* Integer */
|
||||
#define BTF_KIND_PTR 2 /* Pointer */
|
||||
#define BTF_KIND_ARRAY 3 /* Array */
|
||||
#define BTF_KIND_STRUCT 4 /* Struct */
|
||||
#define BTF_KIND_UNION 5 /* Union */
|
||||
#define BTF_KIND_ENUM 6 /* Enumeration */
|
||||
#define BTF_KIND_FWD 7 /* Forward */
|
||||
#define BTF_KIND_TYPEDEF 8 /* Typedef */
|
||||
#define BTF_KIND_VOLATILE 9 /* Volatile */
|
||||
#define BTF_KIND_CONST 10 /* Const */
|
||||
#define BTF_KIND_RESTRICT 11 /* Restrict */
|
||||
#define BTF_KIND_FUNC 12 /* Function */
|
||||
#define BTF_KIND_FUNC_PROTO 13 /* Function Proto */
|
||||
|
||||
Note that the type section encodes debug info, not just pure types.
|
||||
``BTF_KIND_FUNC`` is not a type, and it represents a defined subprogram.
|
||||
|
||||
Each type contains the following common data::
|
||||
|
||||
struct btf_type {
|
||||
__u32 name_off;
|
||||
/* "info" bits arrangement
|
||||
* bits 0-15: vlen (e.g. # of struct's members)
|
||||
* bits 16-23: unused
|
||||
* bits 24-27: kind (e.g. int, ptr, array...etc)
|
||||
* bits 28-30: unused
|
||||
* bit 31: kind_flag, currently used by
|
||||
* struct, union and fwd
|
||||
*/
|
||||
__u32 info;
|
||||
/* "size" is used by INT, ENUM, STRUCT and UNION.
|
||||
* "size" tells the size of the type it is describing.
|
||||
*
|
||||
* "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
|
||||
* FUNC and FUNC_PROTO.
|
||||
* "type" is a type_id referring to another type.
|
||||
*/
|
||||
union {
|
||||
__u32 size;
|
||||
__u32 type;
|
||||
};
|
||||
};
|
||||
|
||||
For certain kinds, the common data are followed by kind specific data.
|
||||
The ``name_off`` in ``struct btf_type`` specifies the offset in the string table.
|
||||
The following details encoding of each kind.
|
||||
|
||||
2.2.1 BTF_KIND_INT
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
``struct btf_type`` encoding requirement:
|
||||
* ``name_off``: any valid offset
|
||||
* ``info.kind_flag``: 0
|
||||
* ``info.kind``: BTF_KIND_INT
|
||||
* ``info.vlen``: 0
|
||||
* ``size``: the size of the int type in bytes.
|
||||
|
||||
``btf_type`` is followed by a ``u32`` with following bits arrangement::
|
||||
|
||||
#define BTF_INT_ENCODING(VAL) (((VAL) & 0x0f000000) >> 24)
|
||||
#define BTF_INT_OFFSET(VAL) (((VAL & 0x00ff0000)) >> 16)
|
||||
#define BTF_INT_BITS(VAL) ((VAL) & 0x000000ff)
|
||||
|
||||
The ``BTF_INT_ENCODING`` has the following attributes::
|
||||
|
||||
#define BTF_INT_SIGNED (1 << 0)
|
||||
#define BTF_INT_CHAR (1 << 1)
|
||||
#define BTF_INT_BOOL (1 << 2)
|
||||
|
||||
The ``BTF_INT_ENCODING()`` provides extra information, signness,
|
||||
char, or bool, for the int type. The char and bool encoding
|
||||
are mostly useful for pretty print. At most one encoding can
|
||||
be specified for the int type.
|
||||
|
||||
The ``BTF_INT_BITS()`` specifies the number of actual bits held by
|
||||
this int type. For example, a 4-bit bitfield encodes
|
||||
``BTF_INT_BITS()`` equals to 4. The ``btf_type.size * 8``
|
||||
must be equal to or greater than ``BTF_INT_BITS()`` for the type.
|
||||
The maximum value of ``BTF_INT_BITS()`` is 128.
|
||||
|
||||
The ``BTF_INT_OFFSET()`` specifies the starting bit offset to
|
||||
calculate values for this int. For example, a bitfield struct
|
||||
member has
|
||||
|
||||
* btf member bit offset 100 from the start of the structure,
|
||||
* btf member pointing to an int type,
|
||||
* the int type has ``BTF_INT_OFFSET() = 2`` and ``BTF_INT_BITS() = 4``
|
||||
|
||||
Then in the struct memory layout, this member will occupy
|
||||
``4`` bits starting from bits ``100 + 2 = 102``.
|
||||
|
||||
Alternatively, the bitfield struct member can be the following to
|
||||
access the same bits as the above:
|
||||
|
||||
* btf member bit offset 102,
|
||||
* btf member pointing to an int type,
|
||||
* the int type has ``BTF_INT_OFFSET() = 0`` and ``BTF_INT_BITS() = 4``
|
||||
|
||||
The original intention of ``BTF_INT_OFFSET()`` is to provide
|
||||
flexibility of bitfield encoding.
|
||||
Currently, both llvm and pahole generates ``BTF_INT_OFFSET() = 0``
|
||||
for all int types.
|
||||
|
||||
2.2.2 BTF_KIND_PTR
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
``struct btf_type`` encoding requirement:
|
||||
* ``name_off``: 0
|
||||
* ``info.kind_flag``: 0
|
||||
* ``info.kind``: BTF_KIND_PTR
|
||||
* ``info.vlen``: 0
|
||||
* ``type``: the pointee type of the pointer
|
||||
|
||||
No additional type data follow ``btf_type``.
|
||||
|
||||
2.2.3 BTF_KIND_ARRAY
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
``struct btf_type`` encoding requirement:
|
||||
* ``name_off``: 0
|
||||
* ``info.kind_flag``: 0
|
||||
* ``info.kind``: BTF_KIND_ARRAY
|
||||
* ``info.vlen``: 0
|
||||
* ``size/type``: 0, not used
|
||||
|
||||
btf_type is followed by one "struct btf_array"::
|
||||
|
||||
struct btf_array {
|
||||
__u32 type;
|
||||
__u32 index_type;
|
||||
__u32 nelems;
|
||||
};
|
||||
|
||||
The ``struct btf_array`` encoding:
|
||||
* ``type``: the element type
|
||||
* ``index_type``: the index type
|
||||
* ``nelems``: the number of elements for this array (``0`` is also allowed).
|
||||
|
||||
The ``index_type`` can be any regular int types
|
||||
(u8, u16, u32, u64, unsigned __int128).
|
||||
The original design of including ``index_type`` follows dwarf
|
||||
which has a ``index_type`` for its array type.
|
||||
Currently in BTF, beyond type verification, the ``index_type`` is not used.
|
||||
|
||||
The ``struct btf_array`` allows chaining through element type to represent
|
||||
multiple dimensional arrays. For example, ``int a[5][6]``, the following
|
||||
type system illustrates the chaining:
|
||||
|
||||
* [1]: int
|
||||
* [2]: array, ``btf_array.type = [1]``, ``btf_array.nelems = 6``
|
||||
* [3]: array, ``btf_array.type = [2]``, ``btf_array.nelems = 5``
|
||||
|
||||
Currently, both pahole and llvm collapse multiple dimensional array
|
||||
into one dimensional array, e.g., ``a[5][6]``, the btf_array.nelems
|
||||
equal to ``30``. This is because the original use case is map pretty
|
||||
print where the whole array is dumped out so one dimensional array
|
||||
is enough. As more BTF usage is explored, pahole and llvm can be
|
||||
changed to generate proper chained representation for
|
||||
multiple dimensional arrays.
|
||||
|
||||
2.2.4 BTF_KIND_STRUCT
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
2.2.5 BTF_KIND_UNION
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
``struct btf_type`` encoding requirement:
|
||||
* ``name_off``: 0 or offset to a valid C identifier
|
||||
* ``info.kind_flag``: 0 or 1
|
||||
* ``info.kind``: BTF_KIND_STRUCT or BTF_KIND_UNION
|
||||
* ``info.vlen``: the number of struct/union members
|
||||
* ``info.size``: the size of the struct/union in bytes
|
||||
|
||||
``btf_type`` is followed by ``info.vlen`` number of ``struct btf_member``.::
|
||||
|
||||
struct btf_member {
|
||||
__u32 name_off;
|
||||
__u32 type;
|
||||
__u32 offset;
|
||||
};
|
||||
|
||||
``struct btf_member`` encoding:
|
||||
* ``name_off``: offset to a valid C identifier
|
||||
* ``type``: the member type
|
||||
* ``offset``: <see below>
|
||||
|
||||
If the type info ``kind_flag`` is not set, the offset contains
|
||||
only bit offset of the member. Note that the base type of the
|
||||
bitfield can only be int or enum type. If the bitfield size
|
||||
is 32, the base type can be either int or enum type.
|
||||
If the bitfield size is not 32, the base type must be int,
|
||||
and int type ``BTF_INT_BITS()`` encodes the bitfield size.
|
||||
|
||||
If the ``kind_flag`` is set, the ``btf_member.offset``
|
||||
contains both member bitfield size and bit offset. The
|
||||
bitfield size and bit offset are calculated as below.::
|
||||
|
||||
#define BTF_MEMBER_BITFIELD_SIZE(val) ((val) >> 24)
|
||||
#define BTF_MEMBER_BIT_OFFSET(val) ((val) & 0xffffff)
|
||||
|
||||
In this case, if the base type is an int type, it must
|
||||
be a regular int type:
|
||||
|
||||
* ``BTF_INT_OFFSET()`` must be 0.
|
||||
* ``BTF_INT_BITS()`` must be equal to ``{1,2,4,8,16} * 8``.
|
||||
|
||||
The following kernel patch introduced ``kind_flag`` and
|
||||
explained why both modes exist:
|
||||
|
||||
https://github.com/torvalds/linux/commit/9d5f9f701b1891466fb3dbb1806ad97716f95cc3#diff-fa650a64fdd3968396883d2fe8215ff3
|
||||
|
||||
2.2.6 BTF_KIND_ENUM
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
``struct btf_type`` encoding requirement:
|
||||
* ``name_off``: 0 or offset to a valid C identifier
|
||||
* ``info.kind_flag``: 0
|
||||
* ``info.kind``: BTF_KIND_ENUM
|
||||
* ``info.vlen``: number of enum values
|
||||
* ``size``: 4
|
||||
|
||||
``btf_type`` is followed by ``info.vlen`` number of ``struct btf_enum``.::
|
||||
|
||||
struct btf_enum {
|
||||
__u32 name_off;
|
||||
__s32 val;
|
||||
};
|
||||
|
||||
The ``btf_enum`` encoding:
|
||||
* ``name_off``: offset to a valid C identifier
|
||||
* ``val``: any value
|
||||
|
||||
2.2.7 BTF_KIND_FWD
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
``struct btf_type`` encoding requirement:
|
||||
* ``name_off``: offset to a valid C identifier
|
||||
* ``info.kind_flag``: 0 for struct, 1 for union
|
||||
* ``info.kind``: BTF_KIND_FWD
|
||||
* ``info.vlen``: 0
|
||||
* ``type``: 0
|
||||
|
||||
No additional type data follow ``btf_type``.
|
||||
|
||||
2.2.8 BTF_KIND_TYPEDEF
|
||||
~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
``struct btf_type`` encoding requirement:
|
||||
* ``name_off``: offset to a valid C identifier
|
||||
* ``info.kind_flag``: 0
|
||||
* ``info.kind``: BTF_KIND_TYPEDEF
|
||||
* ``info.vlen``: 0
|
||||
* ``type``: the type which can be referred by name at ``name_off``
|
||||
|
||||
No additional type data follow ``btf_type``.
|
||||
|
||||
2.2.9 BTF_KIND_VOLATILE
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
``struct btf_type`` encoding requirement:
|
||||
* ``name_off``: 0
|
||||
* ``info.kind_flag``: 0
|
||||
* ``info.kind``: BTF_KIND_VOLATILE
|
||||
* ``info.vlen``: 0
|
||||
* ``type``: the type with ``volatile`` qualifier
|
||||
|
||||
No additional type data follow ``btf_type``.
|
||||
|
||||
2.2.10 BTF_KIND_CONST
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
``struct btf_type`` encoding requirement:
|
||||
* ``name_off``: 0
|
||||
* ``info.kind_flag``: 0
|
||||
* ``info.kind``: BTF_KIND_CONST
|
||||
* ``info.vlen``: 0
|
||||
* ``type``: the type with ``const`` qualifier
|
||||
|
||||
No additional type data follow ``btf_type``.
|
||||
|
||||
2.2.11 BTF_KIND_RESTRICT
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
``struct btf_type`` encoding requirement:
|
||||
* ``name_off``: 0
|
||||
* ``info.kind_flag``: 0
|
||||
* ``info.kind``: BTF_KIND_RESTRICT
|
||||
* ``info.vlen``: 0
|
||||
* ``type``: the type with ``restrict`` qualifier
|
||||
|
||||
No additional type data follow ``btf_type``.
|
||||
|
||||
2.2.12 BTF_KIND_FUNC
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
``struct btf_type`` encoding requirement:
|
||||
* ``name_off``: offset to a valid C identifier
|
||||
* ``info.kind_flag``: 0
|
||||
* ``info.kind``: BTF_KIND_FUNC
|
||||
* ``info.vlen``: 0
|
||||
* ``type``: a BTF_KIND_FUNC_PROTO type
|
||||
|
||||
No additional type data follow ``btf_type``.
|
||||
|
||||
A BTF_KIND_FUNC defines, not a type, but a subprogram (function) whose
|
||||
signature is defined by ``type``. The subprogram is thus an instance of
|
||||
that type. The BTF_KIND_FUNC may in turn be referenced by a func_info in
|
||||
the :ref:`BTF_Ext_Section` (ELF) or in the arguments to
|
||||
:ref:`BPF_Prog_Load` (ABI).
|
||||
|
||||
2.2.13 BTF_KIND_FUNC_PROTO
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
``struct btf_type`` encoding requirement:
|
||||
* ``name_off``: 0
|
||||
* ``info.kind_flag``: 0
|
||||
* ``info.kind``: BTF_KIND_FUNC_PROTO
|
||||
* ``info.vlen``: # of parameters
|
||||
* ``type``: the return type
|
||||
|
||||
``btf_type`` is followed by ``info.vlen`` number of ``struct btf_param``.::
|
||||
|
||||
struct btf_param {
|
||||
__u32 name_off;
|
||||
__u32 type;
|
||||
};
|
||||
|
||||
If a BTF_KIND_FUNC_PROTO type is referred by a BTF_KIND_FUNC type,
|
||||
then ``btf_param.name_off`` must point to a valid C identifier
|
||||
except for the possible last argument representing the variable
|
||||
argument. The btf_param.type refers to parameter type.
|
||||
|
||||
If the function has variable arguments, the last parameter
|
||||
is encoded with ``name_off = 0`` and ``type = 0``.
|
||||
|
||||
3. BTF Kernel API
|
||||
*****************
|
||||
|
||||
The following bpf syscall command involves BTF:
|
||||
* BPF_BTF_LOAD: load a blob of BTF data into kernel
|
||||
* BPF_MAP_CREATE: map creation with btf key and value type info.
|
||||
* BPF_PROG_LOAD: prog load with btf function and line info.
|
||||
* BPF_BTF_GET_FD_BY_ID: get a btf fd
|
||||
* BPF_OBJ_GET_INFO_BY_FD: btf, func_info, line_info
|
||||
and other btf related info are returned.
|
||||
|
||||
The workflow typically looks like:
|
||||
::
|
||||
|
||||
Application:
|
||||
BPF_BTF_LOAD
|
||||
|
|
||||
v
|
||||
BPF_MAP_CREATE and BPF_PROG_LOAD
|
||||
|
|
||||
V
|
||||
......
|
||||
|
||||
Introspection tool:
|
||||
......
|
||||
BPF_{PROG,MAP}_GET_NEXT_ID (get prog/map id's)
|
||||
|
|
||||
V
|
||||
BPF_{PROG,MAP}_GET_FD_BY_ID (get a prog/map fd)
|
||||
|
|
||||
V
|
||||
BPF_OBJ_GET_INFO_BY_FD (get bpf_prog_info/bpf_map_info with btf_id)
|
||||
| |
|
||||
V |
|
||||
BPF_BTF_GET_FD_BY_ID (get btf_fd) |
|
||||
| |
|
||||
V |
|
||||
BPF_OBJ_GET_INFO_BY_FD (get btf) |
|
||||
| |
|
||||
V V
|
||||
pretty print types, dump func signatures and line info, etc.
|
||||
|
||||
|
||||
3.1 BPF_BTF_LOAD
|
||||
================
|
||||
|
||||
Load a blob of BTF data into kernel. A blob of data
|
||||
described in :ref:`BTF_Type_String`
|
||||
can be directly loaded into the kernel.
|
||||
A ``btf_fd`` returns to userspace.
|
||||
|
||||
3.2 BPF_MAP_CREATE
|
||||
==================
|
||||
|
||||
A map can be created with ``btf_fd`` and specified key/value type id.::
|
||||
|
||||
__u32 btf_fd; /* fd pointing to a BTF type data */
|
||||
__u32 btf_key_type_id; /* BTF type_id of the key */
|
||||
__u32 btf_value_type_id; /* BTF type_id of the value */
|
||||
|
||||
In libbpf, the map can be defined with extra annotation like below:
|
||||
::
|
||||
|
||||
struct bpf_map_def SEC("maps") btf_map = {
|
||||
.type = BPF_MAP_TYPE_ARRAY,
|
||||
.key_size = sizeof(int),
|
||||
.value_size = sizeof(struct ipv_counts),
|
||||
.max_entries = 4,
|
||||
};
|
||||
BPF_ANNOTATE_KV_PAIR(btf_map, int, struct ipv_counts);
|
||||
|
||||
Here, the parameters for macro BPF_ANNOTATE_KV_PAIR are map name,
|
||||
key and value types for the map.
|
||||
During ELF parsing, libbpf is able to extract key/value type_id's
|
||||
and assigned them to BPF_MAP_CREATE attributes automatically.
|
||||
|
||||
.. _BPF_Prog_Load:
|
||||
|
||||
3.3 BPF_PROG_LOAD
|
||||
=================
|
||||
|
||||
During prog_load, func_info and line_info can be passed to kernel with
|
||||
proper values for the following attributes:
|
||||
::
|
||||
|
||||
__u32 insn_cnt;
|
||||
__aligned_u64 insns;
|
||||
......
|
||||
__u32 prog_btf_fd; /* fd pointing to BTF type data */
|
||||
__u32 func_info_rec_size; /* userspace bpf_func_info size */
|
||||
__aligned_u64 func_info; /* func info */
|
||||
__u32 func_info_cnt; /* number of bpf_func_info records */
|
||||
__u32 line_info_rec_size; /* userspace bpf_line_info size */
|
||||
__aligned_u64 line_info; /* line info */
|
||||
__u32 line_info_cnt; /* number of bpf_line_info records */
|
||||
|
||||
The func_info and line_info are an array of below, respectively.::
|
||||
|
||||
struct bpf_func_info {
|
||||
__u32 insn_off; /* [0, insn_cnt - 1] */
|
||||
__u32 type_id; /* pointing to a BTF_KIND_FUNC type */
|
||||
};
|
||||
struct bpf_line_info {
|
||||
__u32 insn_off; /* [0, insn_cnt - 1] */
|
||||
__u32 file_name_off; /* offset to string table for the filename */
|
||||
__u32 line_off; /* offset to string table for the source line */
|
||||
__u32 line_col; /* line number and column number */
|
||||
};
|
||||
|
||||
func_info_rec_size is the size of each func_info record, and line_info_rec_size
|
||||
is the size of each line_info record. Passing the record size to kernel make
|
||||
it possible to extend the record itself in the future.
|
||||
|
||||
Below are requirements for func_info:
|
||||
* func_info[0].insn_off must be 0.
|
||||
* the func_info insn_off is in strictly increasing order and matches
|
||||
bpf func boundaries.
|
||||
|
||||
Below are requirements for line_info:
|
||||
* the first insn in each func must points to a line_info record.
|
||||
* the line_info insn_off is in strictly increasing order.
|
||||
|
||||
For line_info, the line number and column number are defined as below:
|
||||
::
|
||||
|
||||
#define BPF_LINE_INFO_LINE_NUM(line_col) ((line_col) >> 10)
|
||||
#define BPF_LINE_INFO_LINE_COL(line_col) ((line_col) & 0x3ff)
|
||||
|
||||
3.4 BPF_{PROG,MAP}_GET_NEXT_ID
|
||||
|
||||
In kernel, every loaded program, map or btf has a unique id.
|
||||
The id won't change during the life time of the program, map or btf.
|
||||
|
||||
The bpf syscall command BPF_{PROG,MAP}_GET_NEXT_ID
|
||||
returns all id's, one for each command, to user space, for bpf
|
||||
program or maps,
|
||||
so the inspection tool can inspect all programs and maps.
|
||||
|
||||
3.5 BPF_{PROG,MAP}_GET_FD_BY_ID
|
||||
|
||||
The introspection tool cannot use id to get details about program or maps.
|
||||
A file descriptor needs to be obtained first for reference counting purpose.
|
||||
|
||||
3.6 BPF_OBJ_GET_INFO_BY_FD
|
||||
==========================
|
||||
|
||||
Once a program/map fd is acquired, the introspection tool can
|
||||
get the detailed information from kernel about this fd,
|
||||
some of which is btf related. For example,
|
||||
``bpf_map_info`` returns ``btf_id``, key/value type id.
|
||||
``bpf_prog_info`` returns ``btf_id``, func_info and line info
|
||||
for translated bpf byte codes, and jited_line_info.
|
||||
|
||||
3.7 BPF_BTF_GET_FD_BY_ID
|
||||
========================
|
||||
|
||||
With ``btf_id`` obtained in ``bpf_map_info`` and ``bpf_prog_info``,
|
||||
bpf syscall command BPF_BTF_GET_FD_BY_ID can retrieve a btf fd.
|
||||
Then, with command BPF_OBJ_GET_INFO_BY_FD, the btf blob, originally
|
||||
loaded into the kernel with BPF_BTF_LOAD, can be retrieved.
|
||||
|
||||
With the btf blob, ``bpf_map_info`` and ``bpf_prog_info``, the introspection
|
||||
tool has full btf knowledge and is able to pretty print map key/values,
|
||||
dump func signatures, dump line info along with byte/jit codes.
|
||||
|
||||
4. ELF File Format Interface
|
||||
****************************
|
||||
|
||||
4.1 .BTF section
|
||||
================
|
||||
|
||||
The .BTF section contains type and string data. The format of this section
|
||||
is same as the one describe in :ref:`BTF_Type_String`.
|
||||
|
||||
.. _BTF_Ext_Section:
|
||||
|
||||
4.2 .BTF.ext section
|
||||
====================
|
||||
|
||||
The .BTF.ext section encodes func_info and line_info which
|
||||
needs loader manipulation before loading into the kernel.
|
||||
|
||||
The specification for .BTF.ext section is defined at
|
||||
``tools/lib/bpf/btf.h`` and ``tools/lib/bpf/btf.c``.
|
||||
|
||||
The current header of .BTF.ext section::
|
||||
|
||||
struct btf_ext_header {
|
||||
__u16 magic;
|
||||
__u8 version;
|
||||
__u8 flags;
|
||||
__u32 hdr_len;
|
||||
|
||||
/* All offsets are in bytes relative to the end of this header */
|
||||
__u32 func_info_off;
|
||||
__u32 func_info_len;
|
||||
__u32 line_info_off;
|
||||
__u32 line_info_len;
|
||||
};
|
||||
|
||||
It is very similar to .BTF section. Instead of type/string section,
|
||||
it contains func_info and line_info section. See :ref:`BPF_Prog_Load`
|
||||
for details about func_info and line_info record format.
|
||||
|
||||
The func_info is organized as below.::
|
||||
|
||||
func_info_rec_size
|
||||
btf_ext_info_sec for section #1 /* func_info for section #1 */
|
||||
btf_ext_info_sec for section #2 /* func_info for section #2 */
|
||||
...
|
||||
|
||||
``func_info_rec_size`` specifies the size of ``bpf_func_info`` structure
|
||||
when .BTF.ext is generated. btf_ext_info_sec, defined below, is
|
||||
the func_info for each specific ELF section.::
|
||||
|
||||
struct btf_ext_info_sec {
|
||||
__u32 sec_name_off; /* offset to section name */
|
||||
__u32 num_info;
|
||||
/* Followed by num_info * record_size number of bytes */
|
||||
__u8 data[0];
|
||||
};
|
||||
|
||||
Here, num_info must be greater than 0.
|
||||
|
||||
The line_info is organized as below.::
|
||||
|
||||
line_info_rec_size
|
||||
btf_ext_info_sec for section #1 /* line_info for section #1 */
|
||||
btf_ext_info_sec for section #2 /* line_info for section #2 */
|
||||
...
|
||||
|
||||
``line_info_rec_size`` specifies the size of ``bpf_line_info`` structure
|
||||
when .BTF.ext is generated.
|
||||
|
||||
The interpretation of ``bpf_func_info->insn_off`` and
|
||||
``bpf_line_info->insn_off`` is different between kernel API and ELF API.
|
||||
For kernel API, the ``insn_off`` is the instruction offset in the unit
|
||||
of ``struct bpf_insn``. For ELF API, the ``insn_off`` is the byte offset
|
||||
from the beginning of section (``btf_ext_info_sec->sec_name_off``).
|
||||
|
||||
5. Using BTF
|
||||
************
|
||||
|
||||
5.1 bpftool map pretty print
|
||||
============================
|
||||
|
||||
With BTF, the map key/value can be printed based on fields rather than
|
||||
simply raw bytes. This is especially
|
||||
valuable for large structure or if you data structure
|
||||
has bitfields. For example, for the following map,::
|
||||
|
||||
enum A { A1, A2, A3, A4, A5 };
|
||||
typedef enum A ___A;
|
||||
struct tmp_t {
|
||||
char a1:4;
|
||||
int a2:4;
|
||||
int :4;
|
||||
__u32 a3:4;
|
||||
int b;
|
||||
___A b1:4;
|
||||
enum A b2:4;
|
||||
};
|
||||
struct bpf_map_def SEC("maps") tmpmap = {
|
||||
.type = BPF_MAP_TYPE_ARRAY,
|
||||
.key_size = sizeof(__u32),
|
||||
.value_size = sizeof(struct tmp_t),
|
||||
.max_entries = 1,
|
||||
};
|
||||
BPF_ANNOTATE_KV_PAIR(tmpmap, int, struct tmp_t);
|
||||
|
||||
bpftool is able to pretty print like below:
|
||||
::
|
||||
|
||||
[{
|
||||
"key": 0,
|
||||
"value": {
|
||||
"a1": 0x2,
|
||||
"a2": 0x4,
|
||||
"a3": 0x6,
|
||||
"b": 7,
|
||||
"b1": 0x8,
|
||||
"b2": 0xa
|
||||
}
|
||||
}
|
||||
]
|
||||
|
||||
5.2 bpftool prog dump
|
||||
=====================
|
||||
|
||||
The following is an example to show func_info and line_info
|
||||
can help prog dump with better kernel symbol name, function prototype
|
||||
and line information.::
|
||||
|
||||
$ bpftool prog dump jited pinned /sys/fs/bpf/test_btf_haskv
|
||||
[...]
|
||||
int test_long_fname_2(struct dummy_tracepoint_args * arg):
|
||||
bpf_prog_44a040bf25481309_test_long_fname_2:
|
||||
; static int test_long_fname_2(struct dummy_tracepoint_args *arg)
|
||||
0: push %rbp
|
||||
1: mov %rsp,%rbp
|
||||
4: sub $0x30,%rsp
|
||||
b: sub $0x28,%rbp
|
||||
f: mov %rbx,0x0(%rbp)
|
||||
13: mov %r13,0x8(%rbp)
|
||||
17: mov %r14,0x10(%rbp)
|
||||
1b: mov %r15,0x18(%rbp)
|
||||
1f: xor %eax,%eax
|
||||
21: mov %rax,0x20(%rbp)
|
||||
25: xor %esi,%esi
|
||||
; int key = 0;
|
||||
27: mov %esi,-0x4(%rbp)
|
||||
; if (!arg->sock)
|
||||
2a: mov 0x8(%rdi),%rdi
|
||||
; if (!arg->sock)
|
||||
2e: cmp $0x0,%rdi
|
||||
32: je 0x0000000000000070
|
||||
34: mov %rbp,%rsi
|
||||
; counts = bpf_map_lookup_elem(&btf_map, &key);
|
||||
[...]
|
||||
|
||||
5.3 verifier log
|
||||
================
|
||||
|
||||
The following is an example how line_info can help verifier failure debug.::
|
||||
|
||||
/* The code at tools/testing/selftests/bpf/test_xdp_noinline.c
|
||||
* is modified as below.
|
||||
*/
|
||||
data = (void *)(long)xdp->data;
|
||||
data_end = (void *)(long)xdp->data_end;
|
||||
/*
|
||||
if (data + 4 > data_end)
|
||||
return XDP_DROP;
|
||||
*/
|
||||
*(u32 *)data = dst->dst;
|
||||
|
||||
$ bpftool prog load ./test_xdp_noinline.o /sys/fs/bpf/test_xdp_noinline type xdp
|
||||
; data = (void *)(long)xdp->data;
|
||||
224: (79) r2 = *(u64 *)(r10 -112)
|
||||
225: (61) r2 = *(u32 *)(r2 +0)
|
||||
; *(u32 *)data = dst->dst;
|
||||
226: (63) *(u32 *)(r2 +0) = r1
|
||||
invalid access to packet, off=0 size=4, R2(id=0,off=0,r=0)
|
||||
R2 offset is outside of the packet
|
||||
|
||||
6. BTF Generation
|
||||
*****************
|
||||
|
||||
You need latest pahole
|
||||
|
||||
https://git.kernel.org/pub/scm/devel/pahole/pahole.git/
|
||||
|
||||
or llvm (8.0 or later). The pahole acts as a dwarf2btf converter. It doesn't support .BTF.ext
|
||||
and btf BTF_KIND_FUNC type yet. For example,::
|
||||
|
||||
-bash-4.4$ cat t.c
|
||||
struct t {
|
||||
int a:2;
|
||||
int b:3;
|
||||
int c:2;
|
||||
} g;
|
||||
-bash-4.4$ gcc -c -O2 -g t.c
|
||||
-bash-4.4$ pahole -JV t.o
|
||||
File t.o:
|
||||
[1] STRUCT t kind_flag=1 size=4 vlen=3
|
||||
a type_id=2 bitfield_size=2 bits_offset=0
|
||||
b type_id=2 bitfield_size=3 bits_offset=2
|
||||
c type_id=2 bitfield_size=2 bits_offset=5
|
||||
[2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
|
||||
|
||||
The llvm is able to generate .BTF and .BTF.ext directly with -g for bpf target only.
|
||||
The assembly code (-S) is able to show the BTF encoding in assembly format.::
|
||||
|
||||
-bash-4.4$ cat t2.c
|
||||
typedef int __int32;
|
||||
struct t2 {
|
||||
int a2;
|
||||
int (*f2)(char q1, __int32 q2, ...);
|
||||
int (*f3)();
|
||||
} g2;
|
||||
int main() { return 0; }
|
||||
int test() { return 0; }
|
||||
-bash-4.4$ clang -c -g -O2 -target bpf t2.c
|
||||
-bash-4.4$ readelf -S t2.o
|
||||
......
|
||||
[ 8] .BTF PROGBITS 0000000000000000 00000247
|
||||
000000000000016e 0000000000000000 0 0 1
|
||||
[ 9] .BTF.ext PROGBITS 0000000000000000 000003b5
|
||||
0000000000000060 0000000000000000 0 0 1
|
||||
[10] .rel.BTF.ext REL 0000000000000000 000007e0
|
||||
0000000000000040 0000000000000010 16 9 8
|
||||
......
|
||||
-bash-4.4$ clang -S -g -O2 -target bpf t2.c
|
||||
-bash-4.4$ cat t2.s
|
||||
......
|
||||
.section .BTF,"",@progbits
|
||||
.short 60319 # 0xeb9f
|
||||
.byte 1
|
||||
.byte 0
|
||||
.long 24
|
||||
.long 0
|
||||
.long 220
|
||||
.long 220
|
||||
.long 122
|
||||
.long 0 # BTF_KIND_FUNC_PROTO(id = 1)
|
||||
.long 218103808 # 0xd000000
|
||||
.long 2
|
||||
.long 83 # BTF_KIND_INT(id = 2)
|
||||
.long 16777216 # 0x1000000
|
||||
.long 4
|
||||
.long 16777248 # 0x1000020
|
||||
......
|
||||
.byte 0 # string offset=0
|
||||
.ascii ".text" # string offset=1
|
||||
.byte 0
|
||||
.ascii "/home/yhs/tmp-pahole/t2.c" # string offset=7
|
||||
.byte 0
|
||||
.ascii "int main() { return 0; }" # string offset=33
|
||||
.byte 0
|
||||
.ascii "int test() { return 0; }" # string offset=58
|
||||
.byte 0
|
||||
.ascii "int" # string offset=83
|
||||
......
|
||||
.section .BTF.ext,"",@progbits
|
||||
.short 60319 # 0xeb9f
|
||||
.byte 1
|
||||
.byte 0
|
||||
.long 24
|
||||
.long 0
|
||||
.long 28
|
||||
.long 28
|
||||
.long 44
|
||||
.long 8 # FuncInfo
|
||||
.long 1 # FuncInfo section string offset=1
|
||||
.long 2
|
||||
.long .Lfunc_begin0
|
||||
.long 3
|
||||
.long .Lfunc_begin1
|
||||
.long 5
|
||||
.long 16 # LineInfo
|
||||
.long 1 # LineInfo section string offset=1
|
||||
.long 2
|
||||
.long .Ltmp0
|
||||
.long 7
|
||||
.long 33
|
||||
.long 7182 # Line 7 Col 14
|
||||
.long .Ltmp3
|
||||
.long 7
|
||||
.long 58
|
||||
.long 8206 # Line 8 Col 14
|
||||
|
||||
7. Testing
|
||||
**********
|
||||
|
||||
Kernel bpf selftest `test_btf.c` provides extensive set of BTF related tests.
|
@ -15,6 +15,13 @@ that goes into great technical depth about the BPF Architecture.
|
||||
The primary info for the bpf syscall is available in the `man-pages`_
|
||||
for `bpf(2)`_.
|
||||
|
||||
BPF Type Format (BTF)
|
||||
=====================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
btf
|
||||
|
||||
|
||||
Frequently asked questions (FAQ)
|
||||
|
Loading…
Reference in New Issue
Block a user