Packet SKB drops four ways

Demonstrating 4 nifty ways that bpftrace can be used to report SKB drops in the Linux kernel networking data path.

The Linux kernel networking subystem used to silently drop SKBs on error, with no way of knowing what caused the drops. Then came the kfree_skb tracepoint that gave visibility of SKB drops, albeit in a raw way:

cat /sys/kernel/tracing/events/skb/kfree_skb/format
name: kfree_skb
ID: 1886
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:void * skbaddr;	offset:8;	size:8;	signed:0;
	field:void * location;	offset:16;	size:8;	signed:0;
	field:void * rx_sk;	offset:24;	size:8;	signed:0;
	field:unsigned short protocol;	offset:32;	size:2;	signed:0;
	field:enum skb_drop_reason reason;	offset:36;	size:4;	signed:0;

This is today's format of the kfree_skb trace event, which includes a reason field. The advent of drop reasons brought much needed precision to the information reported by kfree_skb events.

The bpftrace tool makes it super easy to write observability programs that attach to kernel trace points, amongst other things. Here are a few simple bpftrace one-liners for reporting SKB drops as they happen.

Printing SKB drops as they happen

Let's start with a really simple bpftrace program that reports SKB drops as they happen.

bpftrace -e 'tracepoint:skb:kfree_skb { print(args->reason) }'
Attaching 1 probe...
SKB_DROP_REASON_TCP_ABORT_ON_DATA
SKB_DROP_REASON_QUEUE_PURGE
SKB_DROP_REASON_QUEUE_PURGE
SKB_DROP_REASON_TCP_ABORT_ON_DATA
SKB_DROP_REASON_TCP_ABORT_ON_DATA
SKB_DROP_REASON_TCP_ABORT_ON_DATA
SKB_DROP_REASON_TCP_ABORT_ON_DATA
SKB_DROP_REASON_QUEUE_PURGE
SKB_DROP_REASON_QUEUE_PURGE
SKB_DROP_REASON_QUEUE_PURGE
SKB_DROP_REASON_TCP_ABORT_ON_DATA
SKB_DROP_REASON_QUEUE_PURGE
SKB_DROP_REASON_TCP_OLD_SEQUENCE

If you're just trying to identify why packets are getting dropped, this might be good enough. But it can be a bit of a firehose when there are a lot of drops for different reasons.

Printing an updating report of SKB drops

This next program builds an array of reason occurrences and prints it out once a second. It prints a console blanking escape sequence so that you get a dynamically refreshing list rather that a scrolling display in the terminal window.

sudo bpftrace -e '
    tracepoint:skb:kfree_skb { @[args->reason] = count(); }
    interval:s:1 { print("\033[H\033[J"); print(@) }'
Attaching 2 probes...
@[SKB_DROP_REASON_UNHANDLED_PROTO]: 1
@[SKB_DROP_REASON_TCP_CLOSE]: 2
@[SKB_DROP_REASON_NO_SOCKET]: 11
@[SKB_DROP_REASON_NOT_SPECIFIED]: 11
@[SKB_DROP_REASON_TCP_OLD_SEQUENCE]: 13
@[SKB_DROP_REASON_TCP_ABORT_ON_DATA]: 115
@[SKB_DROP_REASON_QUEUE_PURGE]: 277

Printing length of SKB drops

If you need to dig a little deeper to understand more about the packets that get dropped, then the other fields of the kfree_skb tracepoint come in handy.

This program dereferences the skbaddr field of the tracepoint to find the packet length. The program then uses a nice feature of bpftrace to build a historgram of packet lengths that are getting dropped.

sudo bpftrace -e '
    tracepoint:skb:kfree_skb { @skb_len = hist(((struct sk_buff *) args->skbaddr)->len) }
    interval:s:1 { print("\033[H\033[J"); print(@skb_len) }'
Attaching 2 probes...
@skb_len:
[0]                  402 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1]                    0 |                                                    |
[2, 4)                 0 |                                                    |
[4, 8)                 1 |                                                    |
[8, 16)                0 |                                                    |
[16, 32)               2 |                                                    |
[32, 64)             202 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
[64, 128)             20 |@@                                                  |
[128, 256)             9 |@                                                   |
[256, 512)             1 |                                                    |
[512, 1K)             18 |@@                                                  |

Printing where SKB drops happen

There are still some calls to kfree_skb in the Linux kernel that have not been annotated with a drop reason. These drops are reported with SKB_DROP_REASON_NOT_SPECIFIED. We can find out where they occur by printing the kernel stack trace of these calls.

This program builds an array of stack occurrences for SKB drops with SKB_DROP_REASON_NOT_SPECIFIED and prints out the stack when the -c command terminates.

bpftrace -e '
    tracepoint:skb:kfree_skb {
        if (args->reason == SKB_DROP_REASON_NOT_SPECIFIED) { @[kstack] = count(); }
    }' -c 'sleep 10'
Attaching 1 probe...

@[
        sk_skb_reason_drop+148
        sk_skb_reason_drop+148
        tcp_v4_rcv+157
        ip_protocol_deliver_rcu+53
        ip_local_deliver_finish+118
        __netif_receive_skb_one_core+133
        process_backlog+135
        __napi_poll+49
        net_rx_action+824
        handle_softirqs+240
        do_softirq.part.0+59
        __local_bh_enable_ip+96
        __dev_queue_xmit+1041
        ip_finish_output2+587
        __ip_queue_xmit+873
        __tcp_transmit_skb+2326
        tcp_write_xmit+641
        __tcp_push_pending_frames+57
        tcp_sendmsg_locked+1992
        tcp_sendmsg+47
        sock_write_iter+294
        vfs_write+964
        ksys_write+191
        do_syscall_64+130
        entry_SYSCALL_64_after_hwframe+118
]: 2
@[
        sk_skb_reason_drop+148
        sk_skb_reason_drop+148
        __udp4_lib_mcast_deliver.isra.0+900
        __udp4_lib_rcv+1224
        ip_protocol_deliver_rcu+205
        ip_local_deliver_finish+118
        __netif_receive_skb_one_core+133
        netif_receive_skb+280
        br_handle_frame_finish+713
        br_handle_frame+953
        __netif_receive_skb_core.constprop.0+536
        __netif_receive_skb_list_core+307
        netif_receive_skb_list_internal+462
        napi_complete_done+130
        rtl8169_poll+433
        __napi_poll+49
        net_rx_action+824
        handle_softirqs+240
        __irq_exit_rcu+203
        common_interrupt+133
        asm_common_interrupt+38
        cpuidle_enter_state+204
        cpuidle_enter+49
        cpuidle_idle_call+245
        do_idle+120
        cpu_startup_entry+41
        start_secondary+294
        common_startup_64+318
]: 14
kernel  bpf