Demonstrating 4 nifty ways that bpftrace can be used to report SKB drops in the Linux kernel networking data path.
The Linux kernel networking subystem used to silently drop SKBs on error, with no way of knowing
what caused the drops. Then came the kfree_skb
tracepoint that gave visibility of SKB drops,
albeit in a raw way:
cat /sys/kernel/tracing/events/skb/kfree_skb/format
name: kfree_skb
ID: 1886
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:void * skbaddr; offset:8; size:8; signed:0;
field:void * location; offset:16; size:8; signed:0;
field:void * rx_sk; offset:24; size:8; signed:0;
field:unsigned short protocol; offset:32; size:2; signed:0;
field:enum skb_drop_reason reason; offset:36; size:4; signed:0;
This is today's format of the kfree_skb
trace event, which includes a reason
field. The
advent of drop reasons brought much needed precision to the information reported by kfree_skb
events.
The bpftrace
tool makes it super easy to write observability programs that attach to kernel
trace points, amongst other things. Here are a few simple bpftrace
one-liners for reporting
SKB drops as they happen.
Printing SKB drops as they happen
Let's start with a really simple bpftrace
program that reports SKB drops as they happen.
bpftrace -e 'tracepoint:skb:kfree_skb { print(args->reason) }'
Attaching 1 probe...
SKB_DROP_REASON_TCP_ABORT_ON_DATA
SKB_DROP_REASON_QUEUE_PURGE
SKB_DROP_REASON_QUEUE_PURGE
SKB_DROP_REASON_TCP_ABORT_ON_DATA
SKB_DROP_REASON_TCP_ABORT_ON_DATA
SKB_DROP_REASON_TCP_ABORT_ON_DATA
SKB_DROP_REASON_TCP_ABORT_ON_DATA
SKB_DROP_REASON_QUEUE_PURGE
SKB_DROP_REASON_QUEUE_PURGE
SKB_DROP_REASON_QUEUE_PURGE
SKB_DROP_REASON_TCP_ABORT_ON_DATA
SKB_DROP_REASON_QUEUE_PURGE
SKB_DROP_REASON_TCP_OLD_SEQUENCE
If you're just trying to identify why packets are getting dropped, this might be good enough. But it can be a bit of a firehose when there are a lot of drops for different reasons.
Printing an updating report of SKB drops
This next program builds an array of reason occurrences and prints it out once a second. It prints a console blanking escape sequence so that you get a dynamically refreshing list rather that a scrolling display in the terminal window.
sudo bpftrace -e '
tracepoint:skb:kfree_skb { @[args->reason] = count(); }
interval:s:1 { print("\033[H\033[J"); print(@) }'
Attaching 2 probes...
@[SKB_DROP_REASON_UNHANDLED_PROTO]: 1
@[SKB_DROP_REASON_TCP_CLOSE]: 2
@[SKB_DROP_REASON_NO_SOCKET]: 11
@[SKB_DROP_REASON_NOT_SPECIFIED]: 11
@[SKB_DROP_REASON_TCP_OLD_SEQUENCE]: 13
@[SKB_DROP_REASON_TCP_ABORT_ON_DATA]: 115
@[SKB_DROP_REASON_QUEUE_PURGE]: 277
Printing length of SKB drops
If you need to dig a little deeper to understand more about the packets that get dropped, then
the other fields of the kfree_skb
tracepoint come in handy.
This program dereferences the skbaddr
field of the tracepoint to find the packet length. The
program then uses a nice feature of bpftrace
to build a historgram of packet lengths that are
getting dropped.
sudo bpftrace -e '
tracepoint:skb:kfree_skb { @skb_len = hist(((struct sk_buff *) args->skbaddr)->len) }
interval:s:1 { print("\033[H\033[J"); print(@skb_len) }'
Attaching 2 probes...
@skb_len:
[0] 402 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1] 0 | |
[2, 4) 0 | |
[4, 8) 1 | |
[8, 16) 0 | |
[16, 32) 2 | |
[32, 64) 202 |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[64, 128) 20 |@@ |
[128, 256) 9 |@ |
[256, 512) 1 | |
[512, 1K) 18 |@@ |
Printing where SKB drops happen
There are still some calls to kfree_skb
in the Linux kernel that have not been annotated with
a drop reason. These drops are reported with SKB_DROP_REASON_NOT_SPECIFIED
. We can find out
where they occur by printing the kernel stack trace of these calls.
This program builds an array of stack occurrences for SKB drops with
SKB_DROP_REASON_NOT_SPECIFIED
and prints out the stack when the -c
command terminates.
bpftrace -e '
tracepoint:skb:kfree_skb {
if (args->reason == SKB_DROP_REASON_NOT_SPECIFIED) { @[kstack] = count(); }
}' -c 'sleep 10'
Attaching 1 probe...
@[
sk_skb_reason_drop+148
sk_skb_reason_drop+148
tcp_v4_rcv+157
ip_protocol_deliver_rcu+53
ip_local_deliver_finish+118
__netif_receive_skb_one_core+133
process_backlog+135
__napi_poll+49
net_rx_action+824
handle_softirqs+240
do_softirq.part.0+59
__local_bh_enable_ip+96
__dev_queue_xmit+1041
ip_finish_output2+587
__ip_queue_xmit+873
__tcp_transmit_skb+2326
tcp_write_xmit+641
__tcp_push_pending_frames+57
tcp_sendmsg_locked+1992
tcp_sendmsg+47
sock_write_iter+294
vfs_write+964
ksys_write+191
do_syscall_64+130
entry_SYSCALL_64_after_hwframe+118
]: 2
@[
sk_skb_reason_drop+148
sk_skb_reason_drop+148
__udp4_lib_mcast_deliver.isra.0+900
__udp4_lib_rcv+1224
ip_protocol_deliver_rcu+205
ip_local_deliver_finish+118
__netif_receive_skb_one_core+133
netif_receive_skb+280
br_handle_frame_finish+713
br_handle_frame+953
__netif_receive_skb_core.constprop.0+536
__netif_receive_skb_list_core+307
netif_receive_skb_list_internal+462
napi_complete_done+130
rtl8169_poll+433
__napi_poll+49
net_rx_action+824
handle_softirqs+240
__irq_exit_rcu+203
common_interrupt+133
asm_common_interrupt+38
cpuidle_enter_state+204
cpuidle_enter+49
cpuidle_idle_call+245
do_idle+120
cpu_startup_entry+41
start_secondary+294
common_startup_64+318
]: 14