Difference between BPF_PROG_TYPE_SOCK_OPS and BPF_PROG_TYPE_CGROUP_SOCK
Question
The BPF_PROG_TYPE_SOCK_OPS and BPF_PROG_TYPE_CGROUP_SOCK program types seem to be very similar. According to the kernel source, these are the definitions of the two program types:
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCK, cg_sock,
struct bpf_sock, struct sock)
BPF_PROG_TYPE(BPF_PROG_TYPE_SOCK_OPS, sock_ops,
struct bpf_sock_ops, struct bpf_sock_ops_kern)
Is CGROUP_SOCK a subset of the SOCK_OPS program type? I ask because its associated bpf_sock seems to share common fields with bpf_sock_ops.
Edit: While testing, I also realized that the bpf_sock struct only allows restricted access to the source and destination IPs. Does this reinforce the idea that CGROUP_SOCK is a subset of the SOCK_OPS program type?
Answer 1
Score: 2
This is the commit that introduced the BPF_PROG_TYPE_SOCK_OPS program type: https://github.com/torvalds/linux/commit/40304b2a1567fecc321f640ee4239556dd0f3ee0
The following is from the commit message which outlines the differences quite nicely:
> Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding
struct that allows BPF programs of this type to access some of the
socket's fields (such as IP addresses, ports, etc.). It uses the
existing bpf cgroups infrastructure so the programs can be attached per
cgroup with full inheritance support. The program will be called at
appropriate times to set relevant connections parameters such as buffer
sizes, SYN and SYN-ACK RTOs, etc., based on connection information such
as IP addresses, port numbers, etc.
>
> Although there are already 3 mechanisms to set parameters (sysctls,
route metrics and setsockopts), this new mechanism provides some
distinct advantages. Unlike sysctls, it can set parameters per
connection. In contrast to route metrics, it can also use port numbers
and information provided by a user level program. In addition, it could
set parameters probabilistically for evaluation purposes (i.e. do
something different on 10% of the flows and compare results with the
other 90% of the flows). Also, in cases where IPv6 addresses contain
geographic information, the rules to make changes based on the distance
(or RTT) between the hosts are much easier than route metric rules and
can be global. Finally, unlike setsockopt, it does not require
application changes and it can be updated easily at any time.
>
> Although the bpf cgroup framework already contains a sock related
program type (BPF_PROG_TYPE_CGROUP_SOCK), I created the new type
(BPF_PROG_TYPE_SOCK_OPS) because the existing type expects to be called
only once during the connection's lifetime. In contrast, the new
program type will be called multiple times from different places in the
network stack code. For example, before sending SYN and SYN-ACKs to set
an appropriate timeout, when the connection is established to set
congestion control, etc. As a result it has "op" field to specify the
type of operation requested.
>
> The purpose of this new program type is to simplify setting connection
parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is
easy to use facebook's internal IPv6 addresses to determine if both hosts
of a connection are in the same datacenter. Therefore, it is easy to
write a BPF program to choose a small SYN RTO value when both hosts are
in the same datacenter.

> While testing, I also realized that the bpf_sock struct only allows restricted access to the source and destination IPs. Does this reinforce that CGROUP_SOCK is a subset of the SOCK_OPS program type?

No, neither is a subset of the other; they are simply different program types intended for different use cases. They have different contexts, and each program type can have very specific rules about which fields are readable and which are writable.
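To make the contrast concrete, here is a minimal sketch of one program of each type. It is not taken from the commit; the function names, the `EX_*` constants, the raw-socket policy and the "same first 64 bits means same datacenter" rule are all assumptions made for illustration. The `SEC` macro normally comes from libbpf's `bpf_helpers.h` and is redefined here only so the snippet stands alone. The cgroup/sock program sees a `struct bpf_sock` and returns a single allow/deny verdict, while the sockops program sees a `struct bpf_sock_ops`, is invoked for several events, and dispatches on the `op` field.

```c
/* Sketch only: build with clang -target bpf and attach both programs to a cgroup. */
#include <linux/bpf.h> /* struct bpf_sock, struct bpf_sock_ops, BPF_SOCK_OPS_* */

/* Normally provided by <bpf/bpf_helpers.h>; defined here to stay self-contained. */
#ifdef __bpf__
#define SEC(name) __attribute__((section(name), used))
#else
#define SEC(name) /* no-op when compiled natively, e.g. for a unit test */
#endif

/* Illustrative constants (assumptions, not from the original post). */
#define EX_AF_INET6        10 /* numeric value of AF_INET6 on Linux */
#define EX_SOCK_RAW         3 /* numeric value of SOCK_RAW on Linux */
#define EX_SAME_DC_SYN_RTO 10 /* hypothetical SYN RTO for nearby peers */

/* BPF_PROG_TYPE_CGROUP_SOCK: one verdict per socket (1 = allow, 0 = deny). */
SEC("cgroup/sock")
int example_sock_create(struct bpf_sock *sk)
{
    /* Hypothetical policy: refuse raw sockets inside this cgroup. */
    if (sk->type == EX_SOCK_RAW)
        return 0;
    return 1;
}

/* BPF_PROG_TYPE_SOCK_OPS: called repeatedly; 'op' says why it was called. */
SEC("sockops")
int example_sock_ops(struct bpf_sock_ops *skops)
{
    int rv = -1; /* -1 is used here to mean "no opinion" */

    switch (skops->op) {
    case BPF_SOCK_OPS_TIMEOUT_INIT:
        /* Assumed rule: equal upper 64 bits of the IPv6 addresses
         * means both hosts are in the same datacenter. */
        if (skops->family == EX_AF_INET6 &&
            skops->local_ip6[0] == skops->remote_ip6[0] &&
            skops->local_ip6[1] == skops->remote_ip6[1])
            rv = EX_SAME_DC_SYN_RTO;
        break;
    case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
    case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
        /* A real program might tune buffers or congestion control here. */
        break;
    default:
        break;
    }

    skops->reply = rv; /* the answer goes back through the context */
    return 1;
}
```

The two handlers take different context structs and have different return conventions, which is why neither type can stand in for the other even though some field names overlap.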