· 7 years ago · Jan 25, 2019, 12:10 PM
# bcc Reference Guide

Intended for search (Ctrl-F) and reference. For tutorials, start with [tutorial.md](tutorial.md).

This guide is incomplete. If something feels missing, check the bcc and kernel source. And if you confirm we're missing something, please send a pull request to fix it, and help out everyone.

## Contents

- [BPF C](#bpf-c)
    - [Events & Arguments](#events--arguments)
        - [1. kprobes](#1-kprobes)
        - [2. kretprobes](#2-kretprobes)
        - [3. Tracepoints](#3-tracepoints)
        - [4. uprobes](#4-uprobes)
        - [5. uretprobes](#5-uretprobes)
        - [6. USDT probes](#6-usdt-probes)
        - [7. Raw Tracepoints](#7-raw-tracepoints)
    - [Data](#data)
        - [1. bpf_probe_read()](#1-bpf_probe_read)
        - [2. bpf_probe_read_str()](#2-bpf_probe_read_str)
        - [3. bpf_ktime_get_ns()](#3-bpf_ktime_get_ns)
        - [4. bpf_get_current_pid_tgid()](#4-bpf_get_current_pid_tgid)
        - [5. bpf_get_current_uid_gid()](#5-bpf_get_current_uid_gid)
        - [6. bpf_get_current_comm()](#6-bpf_get_current_comm)
        - [7. bpf_get_current_task()](#7-bpf_get_current_task)
        - [8. bpf_log2l()](#8-bpf_log2l)
        - [9. bpf_get_prandom_u32()](#9-bpf_get_prandom_u32)
    - [Debugging](#debugging)
        - [1. bpf_override_return()](#1-bpf_override_return)
    - [Output](#output)
        - [1. bpf_trace_printk()](#1-bpf_trace_printk)
        - [2. BPF_PERF_OUTPUT](#2-bpf_perf_output)
        - [3. perf_submit()](#3-perf_submit)
    - [Maps](#maps)
        - [1. BPF_TABLE](#1-bpf_table)
        - [2. BPF_HASH](#2-bpf_hash)
        - [3. BPF_ARRAY](#3-bpf_array)
        - [4. BPF_HISTOGRAM](#4-bpf_histogram)
        - [5. BPF_STACK_TRACE](#5-bpf_stack_trace)
        - [6. BPF_PERF_ARRAY](#6-bpf_perf_array)
        - [7. BPF_PERCPU_ARRAY](#7-bpf_percpu_array)
        - [8. BPF_LPM_TRIE](#8-bpf_lpm_trie)
        - [9. BPF_PROG_ARRAY](#9-bpf_prog_array)
        - [10. BPF_DEVMAP](#10-bpf_devmap)
        - [11. BPF_CPUMAP](#11-bpf_cpumap)
        - [12. map.lookup()](#12-maplookup)
        - [13. map.lookup_or_init()](#13-maplookup_or_init)
        - [14. map.delete()](#14-mapdelete)
        - [15. map.update()](#15-mapupdate)
        - [16. map.insert()](#16-mapinsert)
        - [17. map.increment()](#17-mapincrement)
        - [18. map.get_stackid()](#18-mapget_stackid)
        - [19. map.perf_read()](#19-mapperf_read)
        - [20. map.call()](#20-mapcall)
        - [21. map.redirect_map()](#21-mapredirect_map)
    - [Licensing](#licensing)

- [bcc Python](#bcc-python)
    - [Initialization](#initialization)
        - [1. BPF](#1-bpf)
        - [2. USDT](#2-usdt)
    - [Events](#events)
        - [1. attach_kprobe()](#1-attach_kprobe)
        - [2. attach_kretprobe()](#2-attach_kretprobe)
        - [3. attach_tracepoint()](#3-attach_tracepoint)
        - [4. attach_uprobe()](#4-attach_uprobe)
        - [5. attach_uretprobe()](#5-attach_uretprobe)
        - [6. USDT.enable_probe()](#6-usdtenable_probe)
        - [7. attach_raw_tracepoint()](#7-attach_raw_tracepoint)
    - [Debug Output](#debug-output)
        - [1. trace_print()](#1-trace_print)
        - [2. trace_fields()](#2-trace_fields)
    - [Output](#output)
        - [1. perf_buffer_poll()](#1-perf_buffer_poll)
    - [Maps](#maps)
        - [1. get_table()](#1-get_table)
        - [2. open_perf_buffer()](#2-open_perf_buffer)
        - [3. items()](#3-items)
        - [4. values()](#4-values)
        - [5. clear()](#5-clear)
        - [6. print_log2_hist()](#6-print_log2_hist)
        - [7. print_linear_hist()](#7-print_linear_hist)
    - [Helpers](#helpers)
        - [1. ksym()](#1-ksym)
        - [2. ksymname()](#2-ksymname)
        - [3. sym()](#3-sym)
        - [4. num_open_kprobes()](#4-num_open_kprobes)

- [BPF Errors](#bpf-errors)
    - [1. Invalid mem access](#1-invalid-mem-access)
    - [2. Cannot call GPL only function from proprietary program](#2-cannot-call-gpl-only-function-from-proprietary-program)

- [Environment Variables](#envvars)
    - [1. kernel source directory](#1-kernel-source-directory)
    - [2. kernel version overriding](#2-kernel-version-overriding)

# BPF C

This section describes the C part of a bcc program.

## Events & Arguments

### 1. kprobes

Syntax: kprobe__*kernel_function_name*

```kprobe__``` is a special prefix that creates a kprobe (dynamic tracing of a kernel function call) for the kernel function name provided as the remainder. You can also use kprobes by declaring a normal C function, then using the Python ```BPF.attach_kprobe()``` (covered later) to associate it with a kernel function.

Functions that match a pattern can be listed from Python with ```BPF.get_kprobe_functions(pattern)``` (kernel functions) and ```BPF.get_user_functions_and_addresses(library, pattern)``` (user-level functions).

Arguments are specified on the function declaration: kprobe__*kernel_function_name*(struct pt_regs *ctx [, *argument1* ...])

For example:

```C
int kprobe__tcp_v4_connect(struct pt_regs *ctx, struct sock *sk)
{
    [...]
}
```

This instruments the tcp_v4_connect() kernel function using a kprobe, with the following arguments:

- ```struct pt_regs *ctx```: Registers and BPF context.
- ```struct sock *sk```: First argument to tcp_v4_connect().

The first argument is always ```struct pt_regs *```; the remainder are the arguments to the function (they don't need to be specified if you don't intend to use them).

Examples in situ:
[code](https://github.com/iovisor/bcc/blob/4afa96a71c5dbfc4c507c3355e20baa6c184a3a8/examples/tracing/tcpv4connect.py#L28) ([output](https://github.com/iovisor/bcc/blob/5bd0eb21fd148927b078deb8ac29fff2fb044b66/examples/tracing/tcpv4connect_example.txt#L8)),
[code](https://github.com/iovisor/bcc/commit/310ab53710cfd46095c1f6b3e44f1dbc8d1a41d8#diff-8cd1822359ffee26e7469f991ce0ef00R26) ([output](https://github.com/iovisor/bcc/blob/3b9679a3bd9b922c736f6061dc65cb56de7e0250/examples/tracing/bitehist_example.txt#L6))
<!--- I can't add search links here, since github currently cannot handle partial-word searches needed for "kprobe__" --->

### 2. kretprobes

Syntax: kretprobe__*kernel_function_name*

```kretprobe__``` is a special prefix that creates a kretprobe (dynamic tracing of a kernel function return) for the kernel function name provided as the remainder. You can also use kretprobes by declaring a normal C function, then using the Python ```BPF.attach_kretprobe()``` (covered later) to associate it with a kernel function.

Return value is available as ```PT_REGS_RC(ctx)```, given a function declaration of: kretprobe__*kernel_function_name*(struct pt_regs *ctx)

For example:

```C
int kretprobe__tcp_v4_connect(struct pt_regs *ctx)
{
    int ret = PT_REGS_RC(ctx);
    [...]
}
```

This instruments the return of the tcp_v4_connect() kernel function using a kretprobe, and stores the return value in ```ret```.

Examples in situ:
[code](https://github.com/iovisor/bcc/blob/4afa96a71c5dbfc4c507c3355e20baa6c184a3a8/examples/tracing/tcpv4connect.py#L38) ([output](https://github.com/iovisor/bcc/blob/5bd0eb21fd148927b078deb8ac29fff2fb044b66/examples/tracing/tcpv4connect_example.txt#L8))

### 3. Tracepoints

Syntax: TRACEPOINT_PROBE(*category*, *event*)

Example: TRACEPOINT_PROBE(syscalls, sys_enter_read)

This is a macro that instruments the tracepoint defined by *category*:*event*.

Macro (from [helpers.h](https://github.com/iovisor/bcc/blob/master/src/cc/export/helpers.h)): ```#define TRACEPOINT_PROBE(category, event) int tracepoint__##category##__##event(struct tracepoint__##category##__##event *args)```

From Python, tracepoints can be listed with ```BPF.get_tracepoints(pattern)```, for example ```BPF.get_tracepoints("sys*")```; ```BPF.get_tracepoints()``` lists all of them.

Arguments are available in an ```args``` struct, whose fields are the tracepoint arguments. One way to list these is to cat the relevant format file under /sys/kernel/debug/tracing/events/*category*/*event*/format, for example /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/format.
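
As a rough illustration of reading a format file, the sketch below pulls the `field:` lines out of format text and returns the tracepoint-specific fields. The sample text is a hand-written, abbreviated stand-in for a real format file, and `parse_format_fields` is a hypothetical helper, not part of bcc.

```python
import re

# Hand-written, abbreviated sample in the style of a tracepoint format file.
# On a real system, read the file under
# /sys/kernel/debug/tracing/events/<category>/<event>/format instead.
SAMPLE_FORMAT = """\
name: urandom_read
format:
\tfield:unsigned short common_type;\toffset:0;\tsize:2;\tsigned:0;
\tfield:int common_pid;\toffset:4;\tsize:4;\tsigned:1;

\tfield:int got_bits;\toffset:8;\tsize:4;\tsigned:1;
"""

FIELD_RE = re.compile(r"^\s*field:(?P<decl>[^;]+);")


def parse_format_fields(text):
    """Return (type, name) pairs for the non-common fields (hypothetical helper)."""
    fields = []
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue
        ctype, _, name = match.group("decl").rpartition(" ")
        if name.startswith("common_"):
            continue  # skipped here: common_* fields are shared by all tracepoints
        fields.append((ctype, name))
    return fields


print(parse_format_fields(SAMPLE_FORMAT))
```

The names returned this way are the ones to look for as members of ```args```.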

The ```args``` struct can be used in place of ``ctx`` in any function that requires a context as an argument. This notably includes [perf_submit()](#3-perf_submit).

For example:

```C
TRACEPOINT_PROBE(random, urandom_read) {
    // args is from /sys/kernel/debug/tracing/events/random/urandom_read/format
    bpf_trace_printk("%d\\n", args->got_bits);
    return 0;
}
```

This instruments the random:urandom_read tracepoint, and prints the tracepoint argument ```got_bits```.

Hint: output written with ```bpf_trace_printk()``` can be read from Python with ```b.trace_fields()```. ```BPF_PERF_OUTPUT``` is usually the better choice, because the trace pipe is shared and the output of other traced programs can get mixed into yours. If loading fails with a permission error, the verifier has rejected the program; check, for example, that every BPF function returns a value.

Examples in situ:
[code](https://github.com/iovisor/bcc/blob/a4159da8c4ea8a05a3c6e402451f530d6e5a8b41/examples/tracing/urandomread.py#L19) ([output](https://github.com/iovisor/bcc/commit/e422f5e50ecefb96579b6391a2ada7f6367b83c4#diff-41e5ecfae4a3b38de5f4e0887ed160e5R10)),
[search /examples](https://github.com/iovisor/bcc/search?q=TRACEPOINT_PROBE+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=TRACEPOINT_PROBE+path%3Atools&type=Code)

### 4. uprobes

These are instrumented by declaring a normal function in C, then associating it as a uprobe probe in Python via ```BPF.attach_uprobe()``` (covered later).

Arguments can be examined using ```PT_REGS_PARM``` macros.

For example:

```C
int count(struct pt_regs *ctx) {
    char buf[64];
    bpf_probe_read(&buf, sizeof(buf), (void *)PT_REGS_PARM1(ctx));
    bpf_trace_printk("%s %d", buf, PT_REGS_PARM2(ctx));
    return(0);
}
```

This reads the first argument as a string, and then prints it with the second argument as an integer.

Hint: memory outside the BPF stack, such as the user-space string above, cannot be read or written directly. Go through the bpf_* helper functions (here ```bpf_probe_read()```); otherwise the BPF program may fail to compile, or fail to load with a Permission denied error.

Examples in situ:
[code](https://github.com/iovisor/bcc/blob/4afa96a71c5dbfc4c507c3355e20baa6c184a3a8/examples/tracing/strlen_count.py#L26)

### 5. uretprobes

These are instrumented by declaring a normal function in C, then associating it as a uretprobe probe in Python via ```BPF.attach_uretprobe()``` (covered later).

Return value is available as ```PT_REGS_RC(ctx)```, given a function declaration of: *function_name*(struct pt_regs *ctx)

For example:

```C
BPF_HISTOGRAM(dist);
int count(struct pt_regs *ctx) {
    dist.increment(PT_REGS_RC(ctx));
    return 0;
}
```

This increments the bucket in the ```dist``` histogram that is indexed by the return value.

Examples in situ:
[code](https://github.com/iovisor/bcc/blob/4afa96a71c5dbfc4c507c3355e20baa6c184a3a8/examples/tracing/strlen_hist.py#L39) ([output](https://github.com/iovisor/bcc/blob/4afa96a71c5dbfc4c507c3355e20baa6c184a3a8/examples/tracing/strlen_hist.py#L15)),
[code](https://github.com/iovisor/bcc/blob/4afa96a71c5dbfc4c507c3355e20baa6c184a3a8/tools/bashreadline.py) ([output](https://github.com/iovisor/bcc/commit/aa87997d21e5c1a6a20e2c96dd25eb92adc8e85d#diff-2fd162f9e594206f789246ce97d62cf0R7))

### 6. USDT probes

These are User Statically-Defined Tracing (USDT) probes, which may be placed in some applications or libraries to provide a user-level equivalent of tracepoints. The primary BPF method provided for USDT support is ```enable_probe()```. USDT probes are instrumented by declaring a normal function in C, then associating it as a USDT probe in Python via ```USDT.enable_probe()```.

Arguments can be read via: bpf_usdt_readarg(*index*, ctx, &addr)

For example:

```C
int do_trace(struct pt_regs *ctx) {
    uint64_t addr;
    char path[128];
    bpf_usdt_readarg(6, ctx, &addr);
    bpf_probe_read(&path, sizeof(path), (void *)addr);
    bpf_trace_printk("path:%s\\n", path);
    return 0;
}
```

This reads the sixth USDT argument, and then pulls it in as a string to ```path```.

Examples in situ:
[code](https://github.com/iovisor/bcc/commit/4f88a9401357d7b75e917abd994aa6ea97dda4d3#diff-04a7cad583be5646080970344c48c1f4R24),
[search /examples](https://github.com/iovisor/bcc/search?q=bpf_usdt_readarg+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=bpf_usdt_readarg+path%3Atools&type=Code)

### 7. Raw Tracepoints

Syntax: RAW_TRACEPOINT_PROBE(*event*)

Example: RAW_TRACEPOINT_PROBE(sched_wakeup_new), as used in [runqslower.py](https://github.com/iovisor/bcc/blob/master/tools/runqslower.py#L151)

Macro (from [helpers.h](https://github.com/iovisor/bcc/blob/master/src/cc/export/helpers.h)): ```int raw_tracepoint__##event(struct bpf_raw_tracepoint_args *ctx)```

This is a macro that instruments the raw tracepoint defined by *event*.

The argument is a pointer to struct ```bpf_raw_tracepoint_args```, which is defined in [bpf.h](https://github.com/iovisor/bcc/blob/master/src/cc/compat/linux/bpf.h). The struct field ```args``` contains all parameters of the raw tracepoint, which can be found in the Linux tree under the [include/trace/events](https://github.com/torvalds/linux/tree/master/include/trace/events) directory.

For example:

```C
RAW_TRACEPOINT_PROBE(sched_switch)
{
    // TP_PROTO(bool preempt, struct task_struct *prev, struct task_struct *next)
    struct task_struct *prev = (struct task_struct *)ctx->args[1];
    struct task_struct *next = (struct task_struct *)ctx->args[2];
    s32 prev_tgid, next_tgid;

    bpf_probe_read(&prev_tgid, sizeof(prev->tgid), &prev->tgid);
    bpf_probe_read(&next_tgid, sizeof(next->tgid), &next->tgid);
    bpf_trace_printk("%d -> %d\\n", prev_tgid, next_tgid);
    return 0;
}
```

This instruments the sched:sched_switch tracepoint, and prints the prev and next tgid.

Examples in situ:
[search /tools](https://github.com/iovisor/bcc/search?q=RAW_TRACEPOINT_PROBE+path%3Atools&type=Code)

## Data

### 1. bpf_probe_read()

Syntax: ```int bpf_probe_read(void *dst, int size, const void *src)```

Return: 0 on success

This copies a memory location to the BPF stack, so that BPF can later operate on it. For safety, all memory reads must pass through bpf_probe_read(). This happens automatically in some cases, such as dereferencing kernel variables, as bcc will rewrite the BPF program to include the necessary bpf_probe_read() calls.

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=bpf_probe_read+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=bpf_probe_read+path%3Atools&type=Code)

### 2. bpf_probe_read_str()

Syntax: ```int bpf_probe_read_str(void *dst, int size, const void *src)```

Return:
  - \> 0 length of the string including the trailing NULL on success
  - \< 0 error

This copies a `NULL` terminated string from a memory location to the BPF stack, so that BPF can later operate on it. If the string length is smaller than size, the target is not padded with further `NULL` bytes. If the string length is larger than size, only `size - 1` bytes are copied and the last byte is set to `NULL`.
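
The copy and truncation rules above can be modelled in plain Python. This is only a sketch of the documented behaviour for a positive `size`; it does not model the error return, and `probe_read_str_model` is a hypothetical name, not a bcc or kernel function.

```python
def probe_read_str_model(src, size):
    """Model the copy: return (dst_bytes, return_value) for a positive size."""
    if size <= 0:
        raise ValueError("this model only covers size > 0")
    nul = src.find(b"\x00")
    text = src if nul < 0 else src[:nul]   # bytes before the terminator
    dst = text[:size - 1] + b"\x00"        # at most size - 1 bytes, then the terminator
    return dst, len(dst)                   # length includes the terminator


print(probe_read_str_model(b"hello\x00junk", 16))  # fits: whole string copied
print(probe_read_str_model(b"hello\x00junk", 4))   # truncated to size - 1 bytes
```

In the first call nothing after the terminator is written, which is why the rest of a larger destination buffer is left as it was.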

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=bpf_probe_read_str+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=bpf_probe_read_str+path%3Atools&type=Code)

### 3. bpf_ktime_get_ns()

Syntax: ```u64 bpf_ktime_get_ns(void)```

Return: current time in nanoseconds

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=bpf_ktime_get_ns+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=bpf_ktime_get_ns+path%3Atools&type=Code)

### 4. bpf_get_current_pid_tgid()

Syntax: ```u64 bpf_get_current_pid_tgid(void)```

Return: ```current->tgid << 32 | current->pid```

Returns the process ID in the lower 32 bits (kernel's view of the PID, which in user space is usually presented as the thread ID), and the thread group ID in the upper 32 bits (what user space often thinks of as the PID). By directly setting this to a u32, we discard the upper 32 bits.
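
The bit layout can be checked with a few lines of plain Python. This is a sketch of the arithmetic only; `split_pid_tgid` is a hypothetical helper and the IDs are made up.

```python
def split_pid_tgid(pid_tgid):
    """Split the 64-bit value into (tgid, pid) following the layout above."""
    tgid = pid_tgid >> 32           # upper 32 bits: what user space calls the PID
    pid = pid_tgid & 0xFFFFFFFF     # lower 32 bits: what user space calls the thread ID
    return tgid, pid


# Hypothetical IDs, packed the same way as the helper's return value.
packed = (1234 << 32) | 1250
print(split_pid_tgid(packed))
```

Masking with `0xFFFFFFFF` is the Python counterpart of assigning the return value to a u32 in C.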

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=bpf_get_current_pid_tgid+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=bpf_get_current_pid_tgid+path%3Atools&type=Code)

### 5. bpf_get_current_uid_gid()

Syntax: ```u64 bpf_get_current_uid_gid(void)```

Return: ```current_gid << 32 | current_uid```

Returns the user ID in the lower 32 bits and the group ID in the upper 32 bits.

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=bpf_get_current_uid_gid+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=bpf_get_current_uid_gid+path%3Atools&type=Code)

### 6. bpf_get_current_comm()

Syntax: ```bpf_get_current_comm(char *buf, int size_of_buf)```

Return: 0 on success

Populates the first argument address with the current process name. It should be a pointer to a char array of at least size TASK_COMM_LEN, which is defined in linux/sched.h. For example:

```C
#include <linux/sched.h>

int do_trace(struct pt_regs *ctx) {
    char comm[TASK_COMM_LEN];
    bpf_get_current_comm(&comm, sizeof(comm));
[...]
```

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=bpf_get_current_comm+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=bpf_get_current_comm+path%3Atools&type=Code)

### 7. bpf_get_current_task()

Syntax: ```bpf_get_current_task()```

Return: current task as a pointer to struct task_struct.

Returns a pointer to the current task's task_struct object. This helper can be used to compute the on-CPU time for a process, identify kernel threads, get the current CPU's run queue, or retrieve many other pieces of information.

With Linux 4.13, due to issues with field randomization, you may need two #define directives before the includes:

```C
#define randomized_struct_fields_start  struct {
#define randomized_struct_fields_end    };
#include <linux/sched.h>

int do_trace(void *ctx) {
    struct task_struct *t = (struct task_struct *)bpf_get_current_task();
[...]
```

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=bpf_get_current_task+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=bpf_get_current_task+path%3Atools&type=Code)

### 8. bpf_log2l()

Syntax: ```unsigned int bpf_log2l(unsigned long v)```

Returns the log-2 of the provided value. This is often used to create indexes for histograms, to construct power-of-2 histograms.
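
To show what a power-of-2 histogram index does, here is a small Python model. It assumes a plain floor(log2) index; the exact value returned by bpf_log2l() may be offset from this, so treat `log2_bucket` as a hypothetical illustration rather than a reimplementation. The sample values are made up.

```python
from collections import Counter


def log2_bucket(value):
    """floor(log2(value)) for value > 0, and 0 for value == 0 (model only)."""
    return value.bit_length() - 1 if value > 0 else 0


# Hypothetical latencies in microseconds.
samples = [1, 3, 4, 7, 8, 100, 1000]
hist = Counter(log2_bucket(v) for v in samples)

# Each bucket covers a range that doubles in width.
for bucket in sorted(hist):
    print("[%d, %d): %d" % (2 ** bucket, 2 ** (bucket + 1), hist[bucket]))
```

Because each bucket is twice as wide as the previous one, a few dozen buckets are enough to cover the whole range of a 64-bit value.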

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=bpf_log2l+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=bpf_log2l+path%3Atools&type=Code)

### 9. bpf_get_prandom_u32()

Syntax: ```u32 bpf_get_prandom_u32()```

Returns a pseudo-random u32.

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=bpf_get_prandom_u32+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=bpf_get_prandom_u32+path%3Atools&type=Code)

## Debugging

### 1. bpf_override_return()

Syntax: ```int bpf_override_return(struct pt_regs *, unsigned long rc)```

Return: 0 on success

When used in a program attached to a function entry kprobe, causes the execution of the function to be skipped, immediately returning `rc` instead. This is used for targeted error injection.

bpf_override_return will only work when the kprobed function is whitelisted to allow error injections. Whitelisting entails tagging a function with `BPF_ALLOW_ERROR_INJECTION()` in the kernel source tree; see `io_ctl_init` for an example. If the kprobed function is not whitelisted, the bpf program will fail to attach with `ioctl(PERF_EVENT_IOC_SET_BPF): Invalid argument`.

```C
int kprobe__io_ctl_init(void *ctx) {
    bpf_override_return(ctx, -ENOMEM);
    return 0;
}
```

## Output

### 1. bpf_trace_printk()

Syntax: ```int bpf_trace_printk(const char *fmt, ...)```

Return: 0 on success

A simple kernel facility for printf() to the common trace_pipe (/sys/kernel/debug/tracing/trace_pipe). This is ok for some quick examples, but has limitations: 3 args max, 1 %s only, and trace_pipe is globally shared, so concurrent programs will have clashing output. A better interface is via BPF_PERF_OUTPUT(). Note that calling this helper is made simpler than the original kernel version, which has ```fmt_size``` as the second parameter.

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=bpf_trace_printk+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=bpf_trace_printk+path%3Atools&type=Code)

### 2. BPF_PERF_OUTPUT

Syntax: ```BPF_PERF_OUTPUT(name)```

Creates a BPF table for pushing out custom event data to user space via a perf ring buffer. This is the preferred method for pushing per-event data to user space.

For example:

```C
#include <linux/sched.h>    // for TASK_COMM_LEN

struct data_t {
    u32 pid;
    u64 ts;
    char comm[TASK_COMM_LEN];
};
BPF_PERF_OUTPUT(events);

int hello(struct pt_regs *ctx) {
    struct data_t data = {};

    data.pid = bpf_get_current_pid_tgid();
    data.ts = bpf_ktime_get_ns();
    bpf_get_current_comm(&data.comm, sizeof(data.comm));

    events.perf_submit(ctx, &data, sizeof(data));

    return 0;
}
```

The matching Python program, with a callback function:

```Python
from bcc import BPF
import ctypes as ct

# load BPF program; prog holds the C text shown above
b = BPF(text=prog)
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")

# define output data structure in Python
TASK_COMM_LEN = 16    # linux/sched.h
class Data(ct.Structure):
    _fields_ = [("pid", ct.c_uint),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN)]

# header
print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE"))

# callback function, invoked once per event
start = 0
def callback(cpu, data, size):
    global start    # assigned below, so it must be declared global
    assert size >= ct.sizeof(Data)
    event = ct.cast(data, ct.POINTER(Data)).contents
    if start == 0:
        start = event.ts
    time_s = (float(event.ts - start)) / 1000000000
    print("%-18.9f %-16s %-6d %s" % (time_s, event.comm, event.pid,
        "Hello, perf_output!"))

# loop, calling callback for each event
b["events"].open_perf_buffer(callback)
while 1:
    b.perf_buffer_poll()
```

This Python example is slightly modified from [hello_perf_output.py](https://github.com/iovisor/bcc/blob/b0bf04ac4042a6c004b15dcc5e40e12ae78020f9/examples/tracing/hello_perf_output.py).

The output table is named ```events```, and data is pushed to it via ```events.perf_submit()```.
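
On the user-space side, each submitted event arrives as raw bytes that have to be mapped back onto the C struct. The sketch below shows that decoding step in isolation with ctypes, so it can be run without bcc or root. `decode_event` is a hypothetical helper, and the event bytes are fabricated locally to stand in for what ```perf_submit()``` would deliver.

```python
import ctypes as ct

TASK_COMM_LEN = 16  # matches linux/sched.h


class Data(ct.Structure):
    # Field order and types must mirror struct data_t in the C program.
    _fields_ = [("pid", ct.c_uint),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN)]


def decode_event(raw):
    """Decode raw event bytes into a Data instance (hypothetical helper)."""
    if len(raw) < ct.sizeof(Data):
        raise ValueError("event is smaller than struct data_t")
    return Data.from_buffer_copy(raw[:ct.sizeof(Data)])


# Build a fake event in place of one delivered through the perf buffer.
fake = bytes(Data(pid=42, ts=1000000000, comm=b"bash"))
event = decode_event(fake)
print(event.pid, event.ts, event.comm.decode())
```

If the Python field list drifts from the C struct (order, types, or array length), the decoded values will be silently wrong, so the two definitions are best kept next to each other.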

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=BPF_PERF_OUTPUT+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=BPF_PERF_OUTPUT+path%3Atools&type=Code)

### 3. perf_submit()

Syntax: ```int perf_submit((void *)ctx, (void *)data, u32 data_size)```

Return: 0 on success

A method of a BPF_PERF_OUTPUT table, for submitting custom event data to user space. See the BPF_PERF_OUTPUT entry. (This ultimately calls bpf_perf_event_output().)

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=perf_submit+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=perf_submit+path%3Atools&type=Code)

## Maps

Maps are BPF data stores, and are the basis for higher level object types including tables, hashes, and histograms.

### 1. BPF_TABLE

Syntax: ```BPF_TABLE(_table_type, _key_type, _leaf_type, _name, _max_entries)```

Creates a map named ```_name```. Most of the time this will be used via higher-level macros, like BPF_HASH, BPF_HISTOGRAM, etc.

Methods (covered later): map.lookup(), map.lookup_or_init(), map.delete(), map.update(), map.insert(), map.increment().

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=BPF_TABLE+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=BPF_TABLE+path%3Atools&type=Code)

### 2. BPF_HASH

Syntax: ```BPF_HASH(name [, key_type [, leaf_type [, size]]])```

Creates a hash map (associative array) named ```name```, with optional parameters.

Defaults: ```BPF_HASH(name, key_type=u64, leaf_type=u64, size=10240)```

For example:

```C
BPF_HASH(start, struct request *);
```

This creates a hash named ```start``` where the key is a ```struct request *```, and the value defaults to u64. This hash is used by the disksnoop.py example for saving timestamps for each I/O request, where the key is the pointer to struct request, and the value is the timestamp.

Methods (covered later): map.lookup(), map.lookup_or_init(), map.delete(), map.update(), map.insert(), map.increment().

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=BPF_HASH+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=BPF_HASH+path%3Atools&type=Code)

### 3. BPF_ARRAY

Syntax: ```BPF_ARRAY(name [, leaf_type [, size]])```

Creates an int-indexed array which is optimized for fastest lookup and update, named ```name```, with optional parameters.

Defaults: ```BPF_ARRAY(name, leaf_type=u64, size=10240)```

For example:

```C
BPF_ARRAY(counts, u64, 32);
```

This creates an array named ```counts``` with 32 buckets and 64-bit integer values. This array is used by the funccount.py example for saving the call count of each function.

Methods (covered later): map.lookup(), map.update(), map.increment(). Note that all array elements are pre-allocated with zero values and cannot be deleted.

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=BPF_ARRAY+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=BPF_ARRAY+path%3Atools&type=Code)

### 4. BPF_HISTOGRAM

Syntax: ```BPF_HISTOGRAM(name [, key_type [, size ]])```

Creates a histogram map named ```name```, with optional parameters.

Defaults: ```BPF_HISTOGRAM(name, key_type=int, size=64)```

For example:

```C
BPF_HISTOGRAM(dist);
```

This creates a histogram named ```dist```, which defaults to 64 buckets indexed by keys of type int.

Methods (covered later): map.increment().

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=BPF_HISTOGRAM+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=BPF_HISTOGRAM+path%3Atools&type=Code)

### 5. BPF_STACK_TRACE

Syntax: ```BPF_STACK_TRACE(name, max_entries)```

Creates a stack trace map named ```name```, with a maximum entry count provided. These maps are used to store stack traces.

For example:

```C
BPF_STACK_TRACE(stack_traces, 1024);
```

This creates a stack trace map named ```stack_traces```, with a maximum number of stack trace entries of 1024.

Methods (covered later): map.get_stackid().

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=BPF_STACK_TRACE+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=BPF_STACK_TRACE+path%3Atools&type=Code)

### 6. BPF_PERF_ARRAY

Syntax: ```BPF_PERF_ARRAY(name, max_entries)```

Creates a perf array named ```name```, with a maximum entry count provided, which must be equal to the number of system CPUs. These maps are used to fetch hardware performance counters.

For example:

```Python
import multiprocessing
import bcc

text="""
BPF_PERF_ARRAY(cpu_cycles, NUM_CPUS);
"""
b = bcc.BPF(text=text, cflags=["-DNUM_CPUS=%d" % multiprocessing.cpu_count()])
b["cpu_cycles"].open_perf_event(b["cpu_cycles"].HW_CPU_CYCLES)
```

This creates a perf array named ```cpu_cycles```, with number of entries equal to the number of cpus/cores. The array is configured so that later calling map.perf_read() will return a hardware-calculated counter of the number of cycles elapsed from some point in the past. Only one type of hardware counter may be configured per table at a time.

Methods (covered later): map.perf_read().

Examples in situ:
[search /tests](https://github.com/iovisor/bcc/search?q=BPF_PERF_ARRAY+path%3Atests&type=Code)

### 7. BPF_PERCPU_ARRAY

Syntax: ```BPF_PERCPU_ARRAY(name [, leaf_type [, size]])```

Creates NUM_CPU int-indexed arrays which are optimized for fastest lookup and update, named ```name```, with optional parameters. Each CPU will have a separate copy of this array. The copies are not kept synchronized in any way.

Defaults: ```BPF_PERCPU_ARRAY(name, leaf_type=u64, size=10240)```

For example:

```C
BPF_PERCPU_ARRAY(counts, u64, 32);
```

This creates NUM_CPU arrays named ```counts``` with 32 buckets and 64-bit integer values.
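
A plain-Python model may help picture the per-CPU copies. It is only a conceptual sketch: the CPU count, the `increment` helper, and the list-of-lists layout are assumptions made for illustration, not how bcc exposes the map.

```python
NUM_CPUS = 4   # hypothetical CPU count
SIZE = 32      # buckets per CPU, as in BPF_PERCPU_ARRAY(counts, u64, 32)

# One independent copy of the array per CPU, all elements zero-initialized.
counts = [[0] * SIZE for _ in range(NUM_CPUS)]


def increment(cpu, index):
    """Model an increment made by a program running on `cpu`."""
    counts[cpu][index] += 1   # touches only that CPU's copy


increment(0, 5)
increment(0, 5)
increment(3, 5)

# The copies are not synchronized, so a total has to be summed across CPUs.
total = sum(per_cpu[5] for per_cpu in counts)
print(total)
```

Since each CPU writes only to its own copy, updates do not contend with each other; the cost is that the reader has to aggregate.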

Methods (covered later): map.lookup(), map.update(), map.increment(). Note that all array elements are pre-allocated with zero values and cannot be deleted.

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=BPF_PERCPU_ARRAY+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=BPF_PERCPU_ARRAY+path%3Atools&type=Code)

### 8. BPF_LPM_TRIE

Syntax: `BPF_LPM_TRIE(name [, key_type [, leaf_type [, size]]])`

Creates a longest prefix match trie map named `name`, with optional parameters.

Defaults: `BPF_LPM_TRIE(name, key_type=u64, leaf_type=u64, size=10240)`

For example:

```c
BPF_LPM_TRIE(trie, struct key_v6);
```

This creates an LPM trie map named `trie` where the key is a `struct key_v6`, and the value defaults to u64.
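
For readers new to longest prefix matching, the lookup rule can be sketched in plain Python with the standard `ipaddress` module. The table contents and the `lpm_lookup` helper are hypothetical, and a linear scan stands in for the trie; only the matching rule is the point here.

```python
import ipaddress

# Hypothetical table: prefix -> value.
table = {
    ipaddress.ip_network("10.0.0.0/8"): 1,
    ipaddress.ip_network("10.1.0.0/16"): 2,
    ipaddress.ip_network("0.0.0.0/0"): 0,
}


def lpm_lookup(addr):
    """Return the value of the longest prefix containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in table if ip in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


print(lpm_lookup("10.1.2.3"))     # matches /0, /8 and /16; the /16 wins
print(lpm_lookup("10.2.0.1"))     # matches /0 and /8; the /8 wins
print(lpm_lookup("192.168.0.1"))  # only the default /0 matches
```

A key for this kind of map therefore carries both a prefix length and the address bits, which is why the example above uses a struct as the key type.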
717
718Methods (covered later): map.lookup(), map.lookup_or_init(), map.delete(), map.update(), map.insert(), map.increment().
719
720Examples in situ:
721[search /examples](https://github.com/iovisor/bcc/search?q=BPF_LPM_TRIE+path%3Aexamples&type=Code),
722[search /tools](https://github.com/iovisor/bcc/search?q=BPF_LPM_TRIE+path%3Atools&type=Code)
723
724### 9. BPF_PROG_ARRAY
725
726Syntax: ```BPF_PROG_ARRAY(name, size)```
727
728This creates a program array named ```name``` with ```size``` entries. Each entry of the array is either a file descriptor to a bpf program or ```NULL```. The array acts as a jump table so that bpf programs can "tail-call" other bpf programs.
729
730Methods (covered later): map.call().
731
732Examples in situ:
733[search /examples](https://github.com/iovisor/bcc/search?q=BPF_PROG_ARRAY+path%3Aexamples&type=Code),
734[search /tests](https://github.com/iovisor/bcc/search?q=BPF_PROG_ARRAY+path%3Atests&type=Code),
735[assign fd](https://github.com/iovisor/bcc/blob/master/examples/networking/tunnel_monitor/monitor.py#L24-L26)
736
737### 10. BPF_DEVMAP
738
739Syntax: ```BPF_DEVMAP(name, size)```
740
741This creates a device map named ```name``` with ```size``` entries. Each entry of the map is an `ifindex` to a network interface. This map is only used in XDP.
742
743For example:
744```C
745BPF_DEVMAP(devmap, 10);
746```
747
748Methods (covered later): map.redirect_map().
749
750Examples in situ:
751[search /examples](https://github.com/iovisor/bcc/search?q=BPF_DEVMAP+path%3Aexamples&type=Code),
752
753### 11. BPF_CPUMAP
754
755Syntax: ```BPF_CPUMAP(name, size)```
756
757This creates a cpu map named ```name``` with ```size``` entries. The index of the map represents the CPU id and each entry is the size of the ring buffer allocated for the CPU. This map is only used in XDP.
758
759For example:
760```C
761BPF_CPUMAP(cpumap, 16);
762```
763
764Methods (covered later): map.redirect_map().
765
766Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=BPF_CPUMAP+path%3Aexamples&type=Code)
768
769### 12. map.lookup()
770
771Syntax: ```*val map.lookup(&key)```
772
Looks up the key in the map and returns a pointer to its value if it exists, else NULL. The key is passed by address (a pointer to the key).
774
775Examples in situ:
776[search /examples](https://github.com/iovisor/bcc/search?q=lookup+path%3Aexamples&type=Code),
777[search /tools](https://github.com/iovisor/bcc/search?q=lookup+path%3Atools&type=Code)
778
779### 13. map.lookup_or_init()
780
781Syntax: ```*val map.lookup_or_init(&key, &zero)```
782
Looks up the key in the map and returns a pointer to its value if it exists; otherwise, initializes the key's value to the second argument. This is often used to initialize values to zero.
784
785Examples in situ:
786[search /examples](https://github.com/iovisor/bcc/search?q=lookup_or_init+path%3Aexamples&type=Code),
787[search /tools](https://github.com/iovisor/bcc/search?q=lookup_or_init+path%3Atools&type=Code)
788
789### 14. map.delete()
790
791Syntax: ```map.delete(&key)```
792
793Delete the key from the hash.
794
795Examples in situ:
796[search /examples](https://github.com/iovisor/bcc/search?q=delete+path%3Aexamples&type=Code),
797[search /tools](https://github.com/iovisor/bcc/search?q=delete+path%3Atools&type=Code)
798
799### 15. map.update()
800
801Syntax: ```map.update(&key, &val)```
802
803Associate the value in the second argument to the key, overwriting any previous value.
804
805Examples in situ:
806[search /examples](https://github.com/iovisor/bcc/search?q=update+path%3Aexamples&type=Code),
807[search /tools](https://github.com/iovisor/bcc/search?q=update+path%3Atools&type=Code)
808
809### 16. map.insert()
810
811Syntax: ```map.insert(&key, &val)```
812
813Associate the value in the second argument to the key, only if there was no previous value.
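
The difference between `map.update()` and `map.insert()` (and the related `map.lookup_or_init()`) can be illustrated with a plain Python dictionary. This is only a user-space analogy of the semantics described above; the helper names below are hypothetical and are not part of bcc.

```Python
def update(table, key, val):
    """Like map.update(): always store val, overwriting any previous value."""
    table[key] = val

def insert(table, key, val):
    """Like map.insert(): store val only if the key has no value yet."""
    if key not in table:
        table[key] = val

def lookup_or_init(table, key, init):
    """Like map.lookup_or_init(): return the existing value, or store and return init."""
    return table.setdefault(key, init)

counts = {}
insert(counts, "pid_1", 5)                  # key absent: stored
insert(counts, "pid_1", 9)                  # key present: ignored
update(counts, "pid_1", 7)                  # overwrites 5 with 7
first = lookup_or_init(counts, "pid_2", 0)  # key absent: initialized to 0
print(counts, first)
```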
814
815Examples in situ:
816[search /examples](https://github.com/iovisor/bcc/search?q=insert+path%3Aexamples&type=Code)
817
818### 17. map.increment()
819
820Syntax: ```map.increment(key[, increment_amount])```
821
822Increments the key's value by `increment_amount`, which defaults to 1. Used for histograms.
823
824Examples in situ:
825[search /examples](https://github.com/iovisor/bcc/search?q=increment+path%3Aexamples&type=Code),
826[search /tools](https://github.com/iovisor/bcc/search?q=increment+path%3Atools&type=Code)
827
828### 18. map.get_stackid()
829
830Syntax: ```int map.get_stackid(void *ctx, u64 flags)```
831
832This walks the stack found via the struct pt_regs in ```ctx```, saves it in the stack trace map, and returns a unique ID for the stack trace.
833
834Examples in situ:
835[search /examples](https://github.com/iovisor/bcc/search?q=get_stackid+path%3Aexamples&type=Code),
836[search /tools](https://github.com/iovisor/bcc/search?q=get_stackid+path%3Atools&type=Code)
837
838### 19. map.perf_read()
839
840Syntax: ```u64 map.perf_read(u32 cpu)```
841
This returns the hardware performance counter as configured in [6. BPF_PERF_ARRAY](#6-bpf_perf_array).
843
844Examples in situ:
845[search /tests](https://github.com/iovisor/bcc/search?q=perf_read+path%3Atests&type=Code)
846
847### 20. map.call()
848
849Syntax: ```void map.call(void *ctx, int index)```
850
This invokes ```bpf_tail_call()``` to tail-call the bpf program that the ```index``` entry in [9. BPF_PROG_ARRAY](#9-bpf_prog_array) points to. A tail call differs from a normal call: it reuses the current stack frame after jumping to the other bpf program and never returns. If the ```index``` entry is empty, no jump occurs and program execution continues as normal.
852
853For example:
854
855```C
856BPF_PROG_ARRAY(prog_array, 10);
857
858int tail_call(void *ctx) {
859 bpf_trace_printk("Tail-call\n");
860 return 0;
861}
862
863int do_tail_call(void *ctx) {
864 bpf_trace_printk("Original program\n");
865 prog_array.call(ctx, 2);
866 return 0;
867}
868```
869
870```Python
871b = BPF(src_file="example.c")
872tail_fn = b.load_func("tail_call", BPF.KPROBE)
873prog_array = b.get_table("prog_array")
874prog_array[c_int(2)] = c_int(tail_fn.fd)
875b.attach_kprobe(event="some_kprobe_event", fn_name="do_tail_call")
876```
877
This assigns ```tail_call()``` to ```prog_array[2]```. At the end of ```do_tail_call()```, ```prog_array.call(ctx, 2)``` tail-calls ```tail_call()``` and executes it.

**NOTE:** To prevent infinite loops, the maximum number of tail-calls is 32 ([```MAX_TAIL_CALL_CNT```](https://github.com/torvalds/linux/search?l=C&q=MAX_TAIL_CALL_CNT+path%3Ainclude%2Flinux&type=Code)).
881
882Examples in situ:
883[search /examples](https://github.com/iovisor/bcc/search?l=C&q=call+path%3Aexamples&type=Code),
884[search /tests](https://github.com/iovisor/bcc/search?l=C&q=call+path%3Atests&type=Code)
885
886### 21. map.redirect_map()
887
888Syntax: ```int map.redirect_map(int index, int flags)```
889
890This redirects the incoming packets based on the ```index``` entry. If the map is [10. BPF_DEVMAP](#10-bpf_devmap), the packet will be sent to the transmit queue of the network interface that the entry points to. If the map is [11. BPF_CPUMAP](#11-bpf_cpumap), the packet will be sent to the ring buffer of the ```index``` CPU and be processed by the CPU later.
891
892If the packet is redirected successfully, the function will return XDP_REDIRECT. Otherwise, it will return XDP_ABORTED to discard the packet.
893
894For example:
895```C
896BPF_DEVMAP(devmap, 1);
897
898int redirect_example(struct xdp_md *ctx) {
899 return devmap.redirect_map(0, 0);
900}
901int xdp_dummy(struct xdp_md *ctx) {
902 return XDP_PASS;
903}
904```
905
906```Python
907ip = pyroute2.IPRoute()
908idx = ip.link_lookup(ifname="eth1")[0]
909
910b = bcc.BPF(src_file="example.c")
911
912devmap = b.get_table("devmap")
913devmap[c_uint32(0)] = c_int(idx)
914
915in_fn = b.load_func("redirect_example", BPF.XDP)
916out_fn = b.load_func("xdp_dummy", BPF.XDP)
917b.attach_xdp("eth0", in_fn, 0)
918b.attach_xdp("eth1", out_fn, 0)
919```
920
921Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?l=C&q=redirect_map+path%3Aexamples&type=Code)
923
924## Licensing
925
926Depending on which [BPF helpers](kernel-versions.md#helpers) are used, a GPL-compatible license is required.
927
928The special BCC macro `BPF_LICENSE` specifies the license of the BPF program. You can set the license as a comment in your source code, but the kernel has a special interface to specify it programmatically. If you need to use GPL-only helpers, it is recommended to specify the macro in your C code so that the kernel can understand it:
929
930```C
931// SPDX-License-Identifier: GPL-2.0+
932#define BPF_LICENSE GPL
933```
934
Otherwise, the kernel may reject loading your program (see the [error description](#2-cannot-call-gpl-only-function-from-proprietary-program) below). Note that the macro accepts multiple words, and quotes are not necessary:
936
937```C
938// SPDX-License-Identifier: GPL-2.0+ OR BSD-2-Clause
939#define BPF_LICENSE Dual BSD/GPL
940```
941
942Check the [BPF helpers reference](kernel-versions.md#helpers) to see which helpers are GPL-only and what the kernel understands as GPL-compatible.
943
944**If the macro is not specified, BCC will automatically define the license of the program as GPL.**
945
946# bcc Python
947
948## Initialization
949
950Constructors.
951
952### 1. BPF
953
954Syntax: ```BPF({text=BPF_program | src_file=filename} [, usdt_contexts=[USDT_object, ...]] [, cflags=[arg1, ...]] [, debug=int])```
955
956Creates a BPF object. This is the main object for defining a BPF program, and interacting with its output.
957
958Exactly one of `text` or `src_file` must be supplied (not both).
959
960The `cflags` specifies additional arguments to be passed to the compiler, for example `-DMACRO_NAME=value` or `-I/include/path`. The arguments are passed as an array, with each element being an additional argument. Note that strings are not split on whitespace, so each argument must be a different element of the array, e.g. `["-include", "header.h"]`.
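
Because strings are not split on whitespace, a flag string copied from a shell command line has to be split before it is passed as `cflags`. A minimal sketch using the standard library's `shlex.split()`; the flag string itself is a made-up example:

```Python
import shlex

# Hypothetical compiler flags as they might appear on a shell command line
flag_string = "-DMACRO_NAME=value -I/include/path -include header.h"

# Split into one list element per argument, as cflags expects
cflags = shlex.split(flag_string)
print(cflags)

# The list would then be passed as: BPF(text=prog, cflags=cflags)
```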
961
962The `debug` flags control debug output, and can be or'ed together:
963- `DEBUG_LLVM_IR = 0x1` compiled LLVM IR
964- `DEBUG_BPF = 0x2` loaded BPF bytecode and register state on branches
965- `DEBUG_PREPROCESSOR = 0x4` pre-processor result
966- `DEBUG_SOURCE = 0x8` ASM instructions embedded with source
967- `DEBUG_BPF_REGISTER_STATE = 0x10` register state on all instructions in addition to DEBUG_BPF
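
As a small illustration of combining flags, the sketch below ORs two of the values listed above. The constants are redefined locally so the snippet runs without bcc; in a real script the combined value would be passed as `debug=` to `BPF()`.

```Python
# Values copied from the list above
DEBUG_LLVM_IR = 0x1
DEBUG_BPF = 0x2
DEBUG_PREPROCESSOR = 0x4
DEBUG_SOURCE = 0x8
DEBUG_BPF_REGISTER_STATE = 0x10

# Request both the loaded bytecode and the source-annotated ASM
debug = DEBUG_BPF | DEBUG_SOURCE
print(hex(debug))

# Check whether a particular flag is set in the combined value
print(bool(debug & DEBUG_SOURCE), bool(debug & DEBUG_LLVM_IR))
```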
968
969Examples:
970
```Python
# define entire BPF program in one line:
BPF(text='int do_trace(void *ctx) { bpf_trace_printk("hit!\\n"); return 0; }')

# define program as a variable:
prog = """
int hello(void *ctx) {
    bpf_trace_printk("Hello, World!\\n");
    return 0;
}
"""
b = BPF(text=prog)

# source a file:
b = BPF(src_file="vfsreadlat.c")

# include a USDT object:
u = USDT(pid=int(pid))
[...]
b = BPF(text=bpf_text, usdt_contexts=[u])

# add include paths:
b = BPF(text=prog, cflags=["-I/path/to/include"])
```
995
996Examples in situ:
997[search /examples](https://github.com/iovisor/bcc/search?q=BPF+path%3Aexamples+language%3Apython&type=Code),
998[search /tools](https://github.com/iovisor/bcc/search?q=BPF+path%3Atools+language%3Apython&type=Code)
999
1000### 2. USDT
1001
1002Syntax: ```USDT({pid=pid | path=path})```
1003
1004Creates an object to instrument User Statically-Defined Tracing (USDT) probes. Its primary method is ```enable_probe()```.
1005
1006Arguments:
1007
1008- pid: attach to this process ID.
1009- path: instrument USDT probes from this binary path.
1010
1011Examples:
1012
1013```Python
1014# include a USDT object:
1015u = USDT(pid=int(pid))
1016[...]
1017b = BPF(text=bpf_text, usdt_contexts=[u])
1018```
1019
1020Examples in situ:
1021[search /examples](https://github.com/iovisor/bcc/search?q=USDT+path%3Aexamples+language%3Apython&type=Code),
1022[search /tools](https://github.com/iovisor/bcc/search?q=USDT+path%3Atools+language%3Apython&type=Code)
1023
1024## Events
1025
1026### 1. attach_kprobe()
1027
1028Syntax: ```BPF.attach_kprobe(event="event", fn_name="name")```
1029
1030Instruments the kernel function ```event()``` using kernel dynamic tracing of the function entry, and attaches our C defined function ```name()``` to be called when the kernel function is called.
1031
1032For example:
1033
1034```Python
1035b.attach_kprobe(event="sys_clone", fn_name="do_trace")
1036```
1037
1038This will instrument the kernel ```sys_clone()``` function, which will then run our BPF defined ```do_trace()``` function each time it is called.
1039
1040You can call attach_kprobe() more than once, and attach your BPF function to multiple kernel functions.
1041
1042See the previous kprobes section for how to instrument arguments from BPF.
1043
1044Examples in situ:
1045[search /examples](https://github.com/iovisor/bcc/search?q=attach_kprobe+path%3Aexamples+language%3Apython&type=Code),
1046[search /tools](https://github.com/iovisor/bcc/search?q=attach_kprobe+path%3Atools+language%3Apython&type=Code)
1047
1048### 2. attach_kretprobe()
1049
1050Syntax: ```BPF.attach_kretprobe(event="event", fn_name="name")```
1051
1052Instruments the return of the kernel function ```event()``` using kernel dynamic tracing of the function return, and attaches our C defined function ```name()``` to be called when the kernel function returns.
1053
1054For example:
1055
1056```Python
1057b.attach_kretprobe(event="vfs_read", fn_name="do_return")
1058```
1059
This will instrument the return of the kernel ```vfs_read()``` function, which will then run our BPF defined ```do_return()``` function each time it returns.
1061
1062You can call attach_kretprobe() more than once, and attach your BPF function to multiple kernel function returns.
1063
1064See the previous kretprobes section for how to instrument the return value from BPF.
1065
1066Examples in situ:
1067[search /examples](https://github.com/iovisor/bcc/search?q=attach_kretprobe+path%3Aexamples+language%3Apython&type=Code),
1068[search /tools](https://github.com/iovisor/bcc/search?q=attach_kretprobe+path%3Atools+language%3Apython&type=Code)
1069
1070### 3. attach_tracepoint()
1071
1072Syntax: ```BPF.attach_tracepoint(tp="tracepoint", fn_name="name")```
1073
1074Instruments the kernel tracepoint described by ```tracepoint```, and when hit, runs the BPF function ```name()```.
1075
1076This is an explicit way to instrument tracepoints. The ```TRACEPOINT_PROBE``` syntax, covered in the earlier tracepoints section, is an alternate method with the advantage of auto-declaring an ```args``` struct containing the tracepoint arguments. With ```attach_tracepoint()```, the tracepoint arguments need to be declared in the BPF program.
1077
1078For example:
1079
```Python
from bcc import BPF

# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>

struct urandom_read_args {
    // from /sys/kernel/debug/tracing/events/random/urandom_read/format
    u64 __unused__;
    u32 got_bits;
    u32 pool_left;
    u32 input_left;
};

int printarg(struct urandom_read_args *args) {
    bpf_trace_printk("%d\\n", args->got_bits);
    return 0;
}
"""

# load BPF program
b = BPF(text=bpf_text)
b.attach_tracepoint(tp="random:urandom_read", fn_name="printarg")
```
1103
1104Notice how the first argument to ```printarg()``` is now our defined struct.
1105
1106Examples in situ:
1107[code](https://github.com/iovisor/bcc/blob/a4159da8c4ea8a05a3c6e402451f530d6e5a8b41/examples/tracing/urandomread-explicit.py#L41)
1108
1109### 4. attach_uprobe()
1110
1111Syntax: ```BPF.attach_uprobe(name="location", sym="symbol", fn_name="name")```
1112
Instruments the user-level function ```symbol()``` from either the library or binary named by ```location``` using user-level dynamic tracing of the function entry, and attaches our C defined function ```name()``` to be called whenever the user-level function is called.
1114
1115Libraries can be given in the name argument without the lib prefix, or with the full path (/usr/lib/...). Binaries can be given only with the full path (/bin/sh).
1116
1117For example:
1118
1119```Python
1120b.attach_uprobe(name="c", sym="strlen", fn_name="count")
1121```
1122
This will instrument the ```strlen()``` function from libc, and call our BPF function ```count()``` when it is called. Note that the "lib" in "libc" does not need to be specified.
1124
1125Other examples:
1126
1127```Python
1128b.attach_uprobe(name="c", sym="getaddrinfo", fn_name="do_entry")
1129b.attach_uprobe(name="/usr/bin/python", sym="main", fn_name="do_main")
1130```
1131
1132You can call attach_uprobe() more than once, and attach your BPF function to multiple user-level functions.
1133
1134See the previous uprobes section for how to instrument arguments from BPF.
1135
1136Examples in situ:
1137[search /examples](https://github.com/iovisor/bcc/search?q=attach_uprobe+path%3Aexamples+language%3Apython&type=Code),
1138[search /tools](https://github.com/iovisor/bcc/search?q=attach_uprobe+path%3Atools+language%3Apython&type=Code)
1139
1140### 5. attach_uretprobe()
1141
1142Syntax: ```BPF.attach_uretprobe(name="location", sym="symbol", fn_name="name")```
1143
Instruments the return of the user-level function ```symbol()``` from either the library or binary named by ```location``` using user-level dynamic tracing of the function return, and attaches our C defined function ```name()``` to be called whenever the user-level function returns.
1145
1146For example:
1147
1148```Python
1149b.attach_uretprobe(name="c", sym="strlen", fn_name="count")
1150```
1151
This will instrument the ```strlen()``` function from libc, and call our BPF function ```count()``` when it returns.

Other examples:

```Python
b.attach_uretprobe(name="c", sym="getaddrinfo", fn_name="do_return")
b.attach_uretprobe(name="/usr/bin/python", sym="main", fn_name="do_main")
```

You can call attach_uretprobe() more than once, and attach your BPF function to multiple user-level function returns.
1162
1163See the previous uretprobes section for how to instrument the return value from BPF.
1164
1165Examples in situ:
1166[search /examples](https://github.com/iovisor/bcc/search?q=attach_uretprobe+path%3Aexamples+language%3Apython&type=Code),
1167[search /tools](https://github.com/iovisor/bcc/search?q=attach_uretprobe+path%3Atools+language%3Apython&type=Code)
1168
1169### 6. USDT.enable_probe()
1170
1171Syntax: ```USDT.enable_probe(probe=probe, fn_name=name)```
1172
1173Attaches a BPF C function ```name``` to the USDT probe ```probe```.
1174
1175Example:
1176
1177```Python
1178# enable USDT probe from given PID
1179u = USDT(pid=int(pid))
1180u.enable_probe(probe="http__server__request", fn_name="do_trace")
1181```
1182
1183To check if your binary has USDT probes, and what they are, you can run ```readelf -n binary``` and check the stap debug section.
1184
1185Examples in situ:
1186[search /examples](https://github.com/iovisor/bcc/search?q=enable_probe+path%3Aexamples+language%3Apython&type=Code),
1187[search /tools](https://github.com/iovisor/bcc/search?q=enable_probe+path%3Atools+language%3Apython&type=Code)
1188
1189### 7. attach_raw_tracepoint()
1190
1191Syntax: ```BPF.attach_raw_tracepoint(tp="tracepoint", fn_name="name")```
1192
1193Instruments the kernel raw tracepoint described by ```tracepoint``` (```event``` only, no ```category```), and when hit, runs the BPF function ```name()```.
1194
1195This is an explicit way to instrument tracepoints. The ```RAW_TRACEPOINT_PROBE``` syntax, covered in the earlier raw tracepoints section, is an alternate method.
1196
1197For example:
1198
```Python
b.attach_raw_tracepoint(tp="sched_switch", fn_name="do_trace")
```
1202
1203Examples in situ:
1204[search /tools](https://github.com/iovisor/bcc/search?q=attach_raw_tracepoint+path%3Atools+language%3Apython&type=Code)
1205
1206## Debug Output
1207
1208### 1. trace_print()
1209
1210Syntax: ```BPF.trace_print(fmt="fields")```
1211
This method continually reads the globally shared /sys/kernel/debug/tracing/trace_pipe file and prints its contents. This file can be written to via BPF and the bpf_trace_printk() function; however, that method has limitations, including a lack of concurrent tracing support. The BPF_PERF_OUTPUT mechanism, covered earlier, is preferred.
1213
1214Arguments:
1215
1216- ```fmt```: optional, and can contain a field formatting string. It defaults to ```None```.
1217
1218Examples:
1219
1220```Python
1221# print trace_pipe output as-is:
1222b.trace_print()
1223
1224# print PID and message:
1225b.trace_print(fmt="{1} {5}")
1226```
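
The placeholders in `fmt` are positional indexes into the fields of a trace line. Assuming the field order is the same `(task, pid, cpu, flags, ts, msg)` tuple that `trace_fields()` returns, and that the formatting behaves like Python's `str.format()`, `"{1} {5}"` selects the PID and the message. The sample values below are invented:

```Python
# Invented sample fields, in the order (task, pid, cpu, flags, ts, msg)
fields = ("bash", 1234, 0, "....", 5012.345678, "Hello, World!")

fmt = "{1} {5}"
line = fmt.format(*fields)
print(line)
```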
1227
1228Examples in situ:
1229[search /examples](https://github.com/iovisor/bcc/search?q=trace_print+path%3Aexamples+language%3Apython&type=Code),
1230[search /tools](https://github.com/iovisor/bcc/search?q=trace_print+path%3Atools+language%3Apython&type=Code)
1231
1232### 2. trace_fields()
1233
1234Syntax: ```BPF.trace_fields(nonblocking=False)```
1235
This method reads one line from the globally shared /sys/kernel/debug/tracing/trace_pipe file and returns it as fields. This file can be written to via BPF and the bpf_trace_printk() function; however, that method has limitations, including a lack of concurrent tracing support. The BPF_PERF_OUTPUT mechanism, covered earlier, is preferred.
1237
1238Arguments:
1239
1240- ```nonblocking```: optional, defaults to ```False```. When set to ```True```, the program will not block waiting for input.
1241
1242Examples:
1243
1244```Python
1245while 1:
1246 try:
1247 (task, pid, cpu, flags, ts, msg) = b.trace_fields()
1248 except ValueError:
1249 continue
1250 [...]
1251```
1252
1253Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=trace_fields+path%3Aexamples+language%3Apython&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=trace_fields+path%3Atools+language%3Apython&type=Code)
1256
1257## Output
1258
1259Normal output from a BPF program is either:
1260
- per-event: using BPF_PERF_OUTPUT, open_perf_buffer(), and perf_buffer_poll().
1262- map summary: using items(), or print_log2_hist(), covered in the Maps section.
1263
1264### 1. perf_buffer_poll()
1265
1266Syntax: ```BPF.perf_buffer_poll()```
1267
1268This polls from all open perf ring buffers, calling the callback function that was provided when calling open_perf_buffer for each entry.
1269
1270Example:
1271
1272```Python
1273# loop with callback to print_event
1274b["events"].open_perf_buffer(print_event)
1275while 1:
1276 b.perf_buffer_poll()
1277```
1278
1279Examples in situ:
1280[code](https://github.com/iovisor/bcc/blob/08fbceb7e828f0e3e77688497727c5b2405905fd/examples/tracing/hello_perf_output.py#L61),
1281[search /examples](https://github.com/iovisor/bcc/search?q=perf_buffer_poll+path%3Aexamples+language%3Apython&type=Code),
1282[search /tools](https://github.com/iovisor/bcc/search?q=perf_buffer_poll+path%3Atools+language%3Apython&type=Code)
1283
1284## Maps
1285
1286Maps are BPF data stores, and are used in bcc to implement a table, and then higher level objects on top of tables, including hashes and histograms.
1287
1288### 1. get_table()
1289
1290Syntax: ```BPF.get_table(name)```
1291
1292Returns a table object. This is no longer used, as tables can now be read as items from BPF. Eg: ```BPF[name]```.
1293
1294Examples:
1295
1296```Python
1297counts = b.get_table("counts")
1298
1299counts = b["counts"]
1300```
1301
1302These are equivalent.
1303
1304### 2. open_perf_buffer()
1305
Syntax: ```table.open_perf_buffer(callback, page_cnt=N, lost_cb=None)```
1307
This operates on a table as defined in BPF as BPF_PERF_OUTPUT(), and associates the callback Python function ```callback``` to be called when data is available in the perf ring buffer. This is part of the recommended mechanism for transferring per-event data from kernel to user space. The size of the perf ring buffer can be specified via the ```page_cnt``` parameter, which is a number of pages, must be a power of two, and defaults to 8. If the callback is not processing data fast enough, some submitted data may be lost. ```lost_cb``` will be called to log / monitor the lost count. If ```lost_cb``` is left at its default of ```None```, a one-line message is printed to ```stderr``` instead.
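
Since `page_cnt` must be a power of two, a script that takes the buffer size from user input may want to validate it first. A minimal sketch; `is_power_of_two()` is a hypothetical helper, not part of bcc:

```Python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; False for zero, negatives, and other values."""
    return n > 0 and (n & (n - 1)) == 0

for page_cnt in (8, 64, 100):
    print(page_cnt, is_power_of_two(page_cnt))

# A valid value would then be passed as:
#   b["events"].open_perf_buffer(print_event, page_cnt=64)
```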
1309
1310Example:
1311
1312```Python
1313# process event
1314def print_event(cpu, data, size):
1315 event = ct.cast(data, ct.POINTER(Data)).contents
1316 [...]
1317
1318# loop with callback to print_event
1319b["events"].open_perf_buffer(print_event)
1320while 1:
1321 b.perf_buffer_poll()
1322```
1323
1324Note that the data structure transferred will need to be declared in C in the BPF program, and in Python. For example:
1325
1326```C
1327// define output data structure in C
1328struct data_t {
1329 u32 pid;
1330 u64 ts;
1331 char comm[TASK_COMM_LEN];
1332};
1333```
1334
```Python
import ctypes as ct

# define output data structure in Python
TASK_COMM_LEN = 16    # linux/sched.h
class Data(ct.Structure):
    _fields_ = [("pid", ct.c_uint),        # u32 pid
                ("ts", ct.c_ulonglong),    # u64 ts
                ("comm", ct.c_char * TASK_COMM_LEN)]
```
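
A quick way to check that the two declarations agree is to inspect the layout `ctypes` computes. The sketch below mirrors `struct data_t` with `c_uint` for the `u32 pid` field; the expected offsets assume a typical 64-bit platform where a 64-bit integer is 8-byte aligned, so 4 bytes of padding follow `pid`.

```Python
import ctypes as ct

TASK_COMM_LEN = 16  # linux/sched.h

class Data(ct.Structure):
    _fields_ = [("pid", ct.c_uint),          # u32 pid
                ("ts", ct.c_ulonglong),      # u64 ts
                ("comm", ct.c_char * TASK_COMM_LEN)]

# Print each field's byte offset and the total struct size
for name, _ in Data._fields_:
    print(name, getattr(Data, name).offset)
print("size", ct.sizeof(Data))
```

If the offsets or the size differ from what the C compiler produces, the fields read in the callback will be misaligned.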
1343
1344Perhaps in a future bcc version, the Python data structure will be automatically generated from the C declaration.
1345
1346Examples in situ:
1347[code](https://github.com/iovisor/bcc/blob/08fbceb7e828f0e3e77688497727c5b2405905fd/examples/tracing/hello_perf_output.py#L59),
1348[search /examples](https://github.com/iovisor/bcc/search?q=open_perf_buffer+path%3Aexamples+language%3Apython&type=Code),
1349[search /tools](https://github.com/iovisor/bcc/search?q=open_perf_buffer+path%3Atools+language%3Apython&type=Code)
1350
1351### 3. items()
1352
1353Syntax: ```table.items()```
1354
Returns an array of the (key, value) pairs in a table. This can be used with BPF_HASH maps to fetch, and iterate over, the keys and their values.
1356
1357Example:
1358
1359```Python
1360# print output
1361print("%10s %s" % ("COUNT", "STRING"))
1362counts = b.get_table("counts")
1363for k, v in sorted(counts.items(), key=lambda counts: counts[1].value):
1364 print("%10d \"%s\"" % (v.value, k.c.encode('string-escape')))
1365```
1366
1367This example also uses the ```sorted()``` method to sort by value.
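
The sorting idiom can be tried without a running BPF program. In this sketch a plain dictionary with `ctypes` values stands in for the table (an assumption for illustration; real tables also use `ctypes` objects for keys):

```Python
import ctypes as ct

# Stand-in for a BPF table: string keys, ctypes counters as values
counts = {
    "read": ct.c_ulonglong(42),
    "write": ct.c_ulonglong(7),
    "open": ct.c_ulonglong(19),
}

# Sort (key, value) pairs by the counter, smallest first
ordered = sorted(counts.items(), key=lambda kv: kv[1].value)

print("%10s %s" % ("COUNT", "STRING"))
for k, v in ordered:
    print("%10d \"%s\"" % (v.value, k))
```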
1368
1369Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=items+path%3Aexamples+language%3Apython&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=items+path%3Atools+language%3Apython&type=Code)
1372
1373### 4. values()
1374
1375Syntax: ```table.values()```
1376
1377Returns an array of the values in a table.
1378
1379### 5. clear()
1380
1381Syntax: ```table.clear()```
1382
1383Clears the table: deletes all entries.
1384
1385Example:
1386
1387```Python
1388# print map summary every second:
1389while True:
1390 time.sleep(1)
1391 print("%-8s\n" % time.strftime("%H:%M:%S"), end="")
1392 dist.print_log2_hist(sym + " return:")
1393 dist.clear()
1394```
1395
1396Examples in situ:
1397[search /examples](https://github.com/iovisor/bcc/search?q=clear+path%3Aexamples+language%3Apython&type=Code),
1398[search /tools](https://github.com/iovisor/bcc/search?q=clear+path%3Atools+language%3Apython&type=Code)
1399
1400### 6. print_log2_hist()
1401
1402Syntax: ```table.print_log2_hist(val_type="value", section_header="Bucket ptr", section_print_fn=None)```
1403
1404Prints a table as a log2 histogram in ASCII. The table must be stored as log2, which can be done using the BPF function ```bpf_log2l()```.
1405
1406Arguments:
1407
1408- val_type: optional, column header.
1409- section_header: if the histogram has a secondary key, multiple tables will print and section_header can be used as a header description for each.
1410- section_print_fn: if section_print_fn is not None, it will be passed the bucket value.
1411
1412Example:
1413
1414```Python
1415b = BPF(text="""
1416BPF_HISTOGRAM(dist);
1417
1418int kprobe__blk_account_io_completion(struct pt_regs *ctx, struct request *req)
1419{
1420 dist.increment(bpf_log2l(req->__data_len / 1024));
1421 return 0;
1422}
1423""")
1424[...]
1425
1426b["dist"].print_log2_hist("kbytes")
1427```
1428
1429Output:
1430
1431```
1432 kbytes : count distribution
1433 0 -> 1 : 3 | |
1434 2 -> 3 : 0 | |
1435 4 -> 7 : 211 |********** |
1436 8 -> 15 : 0 | |
1437 16 -> 31 : 0 | |
1438 32 -> 63 : 0 | |
1439 64 -> 127 : 1 | |
1440 128 -> 255 : 800 |**************************************|
1441```
1442
1443This output shows a multi-modal distribution, with the largest mode of 128->255 kbytes and a count of 800.
1444
1445This is an efficient way to summarize data, as the summarization is performed in-kernel, and only the count column is passed to user space.
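
To see how values fall into the rows above, the following pure-Python model maps a value to its power-of-two bucket. The boundaries are inferred from the example output (0 -> 1, 2 -> 3, 4 -> 7, ...); `bucket_range()` is a hypothetical helper, not a bcc function, and the in-kernel ```bpf_log2l()``` may differ in detail.

```Python
def bucket_range(value):
    """Return the (low, high) bounds of the power-of-two bucket containing value."""
    if value < 2:
        return (0, 1)             # 0 and 1 share the first bucket
    exp = value.bit_length() - 1  # floor(log2(value))
    return (1 << exp, (1 << (exp + 1)) - 1)

for kbytes in (0, 1, 5, 64, 200):
    low, high = bucket_range(kbytes)
    print("%4d falls in %d -> %d" % (kbytes, low, high))
```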
1446
1447Examples in situ:
1448[search /examples](https://github.com/iovisor/bcc/search?q=print_log2_hist+path%3Aexamples+language%3Apython&type=Code),
1449[search /tools](https://github.com/iovisor/bcc/search?q=print_log2_hist+path%3Atools+language%3Apython&type=Code)
1450
### 7. print_linear_hist()
1452
1453Syntax: ```table.print_linear_hist(val_type="value", section_header="Bucket ptr", section_print_fn=None)```
1454
1455Prints a table as a linear histogram in ASCII. This is intended to visualize small integer ranges, eg, 0 to 100.
1456
1457Arguments:
1458
1459- val_type: optional, column header.
1460- section_header: if the histogram has a secondary key, multiple tables will print and section_header can be used as a header description for each.
1461- section_print_fn: if section_print_fn is not None, it will be passed the bucket value.
1462
1463Example:
1464
1465```Python
1466b = BPF(text="""
1467BPF_HISTOGRAM(dist);
1468
1469int kprobe__blk_account_io_completion(struct pt_regs *ctx, struct request *req)
1470{
1471 dist.increment(req->__data_len / 1024);
1472 return 0;
1473}
1474""")
1475[...]
1476
1477b["dist"].print_linear_hist("kbytes")
1478```
1479
1480Output:
1481
1482```
1483 kbytes : count distribution
1484 0 : 3 |****** |
1485 1 : 0 | |
1486 2 : 0 | |
1487 3 : 0 | |
1488 4 : 19 |****************************************|
1489 5 : 0 | |
1490 6 : 0 | |
1491 7 : 0 | |
1492 8 : 4 |******** |
1493 9 : 0 | |
1494 10 : 0 | |
1495 11 : 0 | |
1496 12 : 0 | |
1497 13 : 0 | |
1498 14 : 0 | |
1499 15 : 0 | |
1500 16 : 2 |**** |
1501[...]
1502```
1503
1504This is an efficient way to summarize data, as the summarization is performed in-kernel, and only the values in the count column are passed to user space.
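
The bars in the distribution column are scaled relative to the largest count. A minimal model that reproduces the bar lengths in the output above, assuming a 40-character column and rounding down (`bar()` is a hypothetical helper, not the bcc implementation):

```Python
def bar(count, max_count, width=40):
    """Return a bar of asterisks proportional to count / max_count."""
    if max_count == 0:
        return ""
    return "*" * (width * count // max_count)

# Counts taken from the example output above: {bucket: count}
counts = {0: 3, 4: 19, 8: 4, 16: 2}
max_count = max(counts.values())
for bucket in sorted(counts):
    print("%5d : %-5d |%-40s|" % (bucket, counts[bucket], bar(counts[bucket], max_count)))
```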
1505
1506Examples in situ:
1507[search /examples](https://github.com/iovisor/bcc/search?q=print_linear_hist+path%3Aexamples+language%3Apython&type=Code),
1508[search /tools](https://github.com/iovisor/bcc/search?q=print_linear_hist+path%3Atools+language%3Apython&type=Code)
1509
1510## Helpers
1511
1512Some helper methods provided by bcc. Note that since we're in Python, we can import any Python library and their methods, including, for example, the libraries: argparse, collections, ctypes, datetime, re, socket, struct, subprocess, sys, and time.
1513
1514### 1. ksym()
1515
1516Syntax: ```BPF.ksym(addr)```
1517
1518Translate a kernel memory address into a kernel function name, which is returned.
1519
1520Example:
1521
1522```Python
1523print("kernel function: " + b.ksym(addr))
1524```
1525
1526Examples in situ:
1527[search /examples](https://github.com/iovisor/bcc/search?q=ksym+path%3Aexamples+language%3Apython&type=Code),
1528[search /tools](https://github.com/iovisor/bcc/search?q=ksym+path%3Atools+language%3Apython&type=Code)
1529
1530### 2. ksymname()
1531
1532Syntax: ```BPF.ksymname(name)```
1533
1534Translate a kernel name into an address. This is the reverse of ksym. Returns -1 when the function name is unknown.
1535
1536Example:
1537
1538```Python
1539print("kernel address: %x" % b.ksymname("vfs_read"))
1540```
1541
1542Examples in situ:
1543[search /examples](https://github.com/iovisor/bcc/search?q=ksymname+path%3Aexamples+language%3Apython&type=Code),
1544[search /tools](https://github.com/iovisor/bcc/search?q=ksymname+path%3Atools+language%3Apython&type=Code)
1545
1546### 3. sym()
1547
1548Syntax: ```BPF.sym(addr, pid, show_module=False, show_offset=False)```
1549
1550Translate a memory address into a function name for a pid, which is returned. A pid of less than zero will access the kernel symbol cache. The `show_module` and `show_offset` parameters control whether the module in which the symbol lies should be displayed, and whether the instruction offset from the beginning of the symbol should be displayed. These extra parameters default to `False`.
1551
1552Example:
1553
1554```Python
1555print("function: " + b.sym(addr, pid))
1556```
1557
1558Examples in situ:
1559[search /examples](https://github.com/iovisor/bcc/search?q=sym+path%3Aexamples+language%3Apython&type=Code),
1560[search /tools](https://github.com/iovisor/bcc/search?q=sym+path%3Atools+language%3Apython&type=Code)
1561
1562### 4. num_open_kprobes()
1563
1564Syntax: ```BPF.num_open_kprobes()```
1565
1566Returns the number of open k[ret]probes. Can be useful for scenarios where event_re is used while attaching and detaching probes. Excludes perf_events readers.
1567

Example:

```Python
b.attach_kprobe(event_re=pattern, fn_name="trace_count")
matched = b.num_open_kprobes()
if matched == 0:
    print("0 functions matched by \"%s\". Exiting." % pattern)
    exit()
```

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=num_open_kprobes+path%3Aexamples+language%3Apython&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=num_open_kprobes+path%3Atools+language%3Apython&type=Code)

# BPF Errors

See the "Understanding eBPF verifier messages" section in the kernel source under Documentation/networking/filter.txt.

## 1. Invalid mem access

This can be caused by trying to read memory directly, instead of operating on memory on the BPF stack. All memory reads must go through bpf_probe_read(), which copies the memory onto the BPF stack and does all the required checks. The bcc rewriter can insert this call automatically in some cases of simple dereferencing.

Example:

```
bpf: Permission denied
0: (bf) r6 = r1
1: (79) r7 = *(u64 *)(r6 +80)
2: (85) call 14
3: (bf) r8 = r0
[...]
23: (69) r1 = *(u16 *)(r7 +16)
R7 invalid mem access 'inv'

Traceback (most recent call last):
  File "./tcpaccept", line 179, in <module>
    b = BPF(text=bpf_text)
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 172, in __init__
    self._trace_autoload()
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 612, in _trace_autoload
    fn = self.load_func(func_name, BPF.KPROBE)
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 212, in load_func
    raise Exception("Failed to load BPF program %s" % func_name)
Exception: Failed to load BPF program kretprobe__inet_csk_accept
```
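
The usual fix is to copy the value onto the BPF stack with bpf_probe_read() before using it. The sketch below is not the actual tcpaccept source: the probe name is taken from the traceback above, while the field being read (`skc_family`) and the `load()` helper are assumptions chosen for illustration.

```Python
# Hypothetical BPF C program text; the field read here is an illustrative choice.
bpf_text = r"""
#include <uapi/linux/ptrace.h>
#include <net/sock.h>

int kretprobe__inet_csk_accept(struct pt_regs *ctx)
{
    struct sock *newsk = (struct sock *)PT_REGS_RC(ctx);
    if (newsk == NULL)
        return 0;

    // Copy the field onto the BPF stack instead of dereferencing newsk directly.
    u16 family = 0;
    bpf_probe_read(&family, sizeof(family), &newsk->__sk_common.skc_family);

    bpf_trace_printk("family=%d\n", family);
    return 0;
}
"""

def load():
    # Imported lazily so the text above can be inspected without bcc installed.
    from bcc import BPF
    return BPF(text=bpf_text)
```

Calling `load()` requires bcc, kernel headers, and root privileges.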

## 2. Cannot call GPL only function from proprietary program

This error happens when a GPL-only helper is called from a non-GPL BPF program. To fix this error, do not use GPL-only helpers from a proprietary BPF program, or relicense the BPF program under a GPL-compatible license. Check which [BPF helpers](https://github.com/iovisor/bcc/blob/master/docs/kernel-versions.md#helpers) are GPL-only, and what licenses are considered GPL-compatible.

Example calling `bpf_get_stackid()`, a GPL-only BPF helper, from a proprietary program (`#define BPF_LICENSE Proprietary`):

```
bpf: Failed to load program: Invalid argument
[...]
8: (85) call bpf_get_stackid#27
cannot call GPL only function from proprietary program
```

# Environment Variables

## 1. Kernel source directory

eBPF program compilation needs kernel sources or kernel headers with headers
compiled. If your kernel sources are at a non-standard location where BCC
cannot find them, you can give BCC the absolute path of that location
by setting `BCC_KERNEL_SOURCE` to it.
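
The variable can be exported in the shell, or set from a Python tool before the BPF program is compiled. A minimal sketch of the latter, assuming BCC reads the variable at compile time; the helper name `use_kernel_source` and the path in the usage comment are hypothetical:

```Python
import os

def use_kernel_source(path):
    """Point BCC at kernel sources or headers located at `path`."""
    abs_path = os.path.abspath(path)
    if not os.path.isdir(abs_path):
        raise ValueError("not a directory: %s" % abs_path)
    # BCC_KERNEL_SOURCE is expected to hold an absolute path.
    os.environ["BCC_KERNEL_SOURCE"] = abs_path
    return abs_path

# Usage (hypothetical path), before constructing BPF(...):
# use_kernel_source("/opt/src/linux-4.9.10")
```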

## 2. Kernel version overriding

By default, BCC stores the `LINUX_VERSION_CODE` in the generated eBPF object,
which is then passed along to the kernel when the eBPF program is loaded.
This can be inconvenient when the running kernel has been slightly updated,
for example by an LTS kernel release. It's extremely unlikely that such a
slight mismatch would cause any issues with the loaded eBPF program. Setting
`BCC_LINUX_VERSION_CODE` to the version of the running kernel bypasses the
kernel version check. This is needed for programs that use kprobes. The value
must be encoded in the format `(VERSION * 65536) + (PATCHLEVEL * 256) + SUBLEVEL`.
For example, if the running kernel is `4.9.10`, you can set
`export BCC_LINUX_VERSION_CODE=264458` to override the kernel version check
successfully.
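
The encoding can be computed rather than worked out by hand. A minimal sketch, assuming the release string starts with `VERSION.PATCHLEVEL[.SUBLEVEL]`; the function name `linux_version_code` is hypothetical and not part of bcc:

```Python
import re

def linux_version_code(release):
    """Encode a kernel release such as "4.9.10" as a LINUX_VERSION_CODE value."""
    # Match the leading numeric part; suffixes like "-43-generic" are ignored.
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    version, patchlevel = int(m.group(1)), int(m.group(2))
    sublevel = int(m.group(3) or 0)  # a missing SUBLEVEL is treated as 0
    return (version * 65536) + (patchlevel * 256) + sublevel

print(linux_version_code("4.9.10"))  # 264458
```

For the running kernel, `platform.release()` from the Python standard library can supply the release string.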