eBPF map concurrent update and delete in a loop

Question

My goal is to implement flow logging with an eBPF program and attach that program to different network interfaces using tc.

The eBPF map will look like this:
The key is a tuple of srcIp, srcPort, dstIp, dstPort, protocol.
The value is a struct of bytesTransmitted and packetsTransmitted.
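A minimal sketch of that layout in C, assuming IPv4 addresses. The struct and field names, the explicit padding, and the helper `flow_key_make` are illustrative assumptions, not something from the original post. The padding is there because a BPF hash map hashes and compares the raw key bytes, so any undefined padding byte could make two logically equal flows look like different keys.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 5-tuple key (IPv4 only). Names are illustrative. */
struct flow_key {
    uint32_t src_ip;    /* source address */
    uint32_t dst_ip;    /* destination address */
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  protocol;  /* e.g. 6 = TCP, 17 = UDP */
    uint8_t  pad[3];    /* explicit padding so every key byte is defined */
};

/* Hypothetical per-flow counters. */
struct flow_stats {
    uint64_t packets;
    uint64_t bytes;
};

/* Build a key with all bytes zeroed first, so equal flows yield
 * byte-identical keys. */
struct flow_key flow_key_make(uint32_t src_ip, uint16_t src_port,
                              uint32_t dst_ip, uint16_t dst_port,
                              uint8_t protocol)
{
    struct flow_key k;

    memset(&k, 0, sizeof(k));
    k.src_ip = src_ip;
    k.dst_ip = dst_ip;
    k.src_port = src_port;
    k.dst_port = dst_port;
    k.protocol = protocol;
    return k;
}
```

The same zero-then-fill pattern applies when the key is built on the stack inside the eBPF program.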

Each time a network interface sees a packet, I will update the map to increment the packet and byte counts.

However, an eBPF map has a maximum size, so I will need to periodically clean up stale entries; otherwise the map will fill up quickly.

So I plan to implement a userspace program that periodically expires flow map entries.

However, what are the implications of accessing that map concurrently from both the userspace program and the kernel? Should I use bpf_spin_lock? I am worried that acquiring a lock on every packet could be expensive.

I also found a post, https://justin.azoff.dev/blog/bpf_map_get_next_key-pitfalls/, whose author uses an unusual way to iterate over and delete map entries. However, the kernel source tree ( https://elixir.bootlin.com/linux/latest/source/samples/bpf/trace_event_user.c#L108 ) just deletes items in a while loop. Who is right and who is wrong? (I guess I should trust the kernel source tree more.)

The example above indicates that I can delete map items while iterating over the map. However, the post does not mention whether the kernel can update map elements concurrently while I do so.

I would really appreciate any advice.

Answer 1

Score: 1

> However, there is max size of the ebpf map, and I will need to periodically cleanup the stale entries in that map, otherwise the map will be full pretty quickly.
>
> So I plan to implement a userspace program that will periodically expire the flow map entries.

Another approach might be to use an LRU map.

> However, what's the implication of accessing that map concurrently from both userspace program and kernel? Should I use bpf_spin_lock? I am worried that acquiring lock on each packet can be expensive.

It depends on your goals. You are not required to use synchronization primitives when reading from user space. The syscall copies the map value into the user-supplied buffer, after which the copy can no longer change. However, the following can happen:

Let's say the map value has two fields, field1 and field2, and the eBPF program increments both (field1++; field2++;). The copy in the syscall could then happen between these two modifications, so you could end up with a value that combines the map state from before and after the eBPF program ran. For most applications, such as statistics, this isn't an issue. But if for whatever reason changes within a map value need to be atomic as a whole, you will have to use a spin lock (bpf_spin_lock), which is not ideal for performance if you expect to update the same map values concurrently a lot.

For most stats-related use cases, atomic operations on the eBPF side or per-CPU maps are enough to keep an accurate count. You can then collect the counts periodically from userspace.
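Both options can be sketched in plain C, under some assumptions. The names `flow_stats`, `flow_stats_add_atomic` and `flow_stats_sum` are hypothetical. The first function shows the kind of atomic increment an eBPF program would apply to a value in a shared map, using the `__sync_fetch_and_add` compiler builtin. The second shows the userspace side for a per-CPU map, where a lookup returns one value per possible CPU and the reader adds them up.

```c
#include <stdint.h>

/* Hypothetical per-flow counters. */
struct flow_stats {
    uint64_t packets;
    uint64_t bytes;
};

/* Shared map value: each add is atomic, so concurrent CPUs do not lose
 * increments. The two adds together are still not one atomic unit, so a
 * reader can see packets updated and bytes not yet updated. */
void flow_stats_add_atomic(struct flow_stats *s, uint64_t pkt_len)
{
    __sync_fetch_and_add(&s->packets, 1);
    __sync_fetch_and_add(&s->bytes, pkt_len);
}

/* Per-CPU map value: every CPU updates its own copy without atomics,
 * and userspace sums the copies it gets back from a lookup. */
struct flow_stats flow_stats_sum(const struct flow_stats *percpu, int ncpus)
{
    struct flow_stats total = { 0, 0 };

    for (int i = 0; i < ncpus; i++) {
        total.packets += percpu[i].packets;
        total.bytes += percpu[i].bytes;
    }
    return total;
}
```

With libbpf, the buffer passed to a per-CPU lookup needs room for one value per possible CPU (see `libbpf_num_possible_cpus()`), not per online CPU.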

> Who is right and who is wrong?

Both are right. The first example wants to delete only part of the map and observes that once a key has been deleted from the map, it can no longer be used to get the key after it, so you need to watch the order of operations to avoid restarting the loop and doing extra work. The second example simply removes all values and does not care if iteration restarts from the beginning.

So it's all right to delete while iterating, but if you only want a partial deletion, you have to watch the order of operations to avoid extra work.
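The ordering can be sketched with a toy in-memory map. Everything named `toy_*`, as well as `expire_stale`, is a hypothetical stand-in. It only mimics one behaviour of `bpf_map_get_next_key()`: a NULL or missing key yields the first key. Real code would call `bpf_map_get_next_key()`, `bpf_map_lookup_elem()` and `bpf_map_delete_elem()` from libbpf on a map file descriptor instead. The point is that the successor is fetched before the current key is deleted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CAP 8

/* Toy stand-in for a BPF hash map: flow id -> last-seen time. */
struct toy_map {
    bool     used[TOY_CAP];
    uint32_t key[TOY_CAP];
    uint64_t last_seen[TOY_CAP];
    int      next_key_calls;  /* counts iteration steps taken */
};

static int toy_find(const struct toy_map *m, uint32_t key)
{
    for (int i = 0; i < TOY_CAP; i++)
        if (m->used[i] && m->key[i] == key)
            return i;
    return -1;
}

/* A NULL or missing cur restarts from the first key; -1 means no more. */
int toy_get_next_key(struct toy_map *m, const uint32_t *cur, uint32_t *next)
{
    int start = 0;

    m->next_key_calls++;
    if (cur) {
        int i = toy_find(m, *cur);
        if (i >= 0)
            start = i + 1;
    }
    for (int i = start; i < TOY_CAP; i++) {
        if (m->used[i]) {
            *next = m->key[i];
            return 0;
        }
    }
    return -1;
}

int toy_update(struct toy_map *m, uint32_t key, uint64_t last_seen)
{
    int i = toy_find(m, key);

    for (int j = 0; i < 0 && j < TOY_CAP; j++)
        if (!m->used[j])
            i = j;          /* first free slot */
    if (i < 0)
        return -1;          /* map full */
    m->used[i] = true;
    m->key[i] = key;
    m->last_seen[i] = last_seen;
    return 0;
}

int toy_delete(struct toy_map *m, uint32_t key)
{
    int i = toy_find(m, key);

    if (i < 0)
        return -1;
    m->used[i] = false;
    return 0;
}

/* Delete entries older than max_age; returns how many were deleted. */
int expire_stale(struct toy_map *m, uint64_t now, uint64_t max_age)
{
    uint32_t cur = 0, next = 0;
    int deleted = 0;
    bool more = toy_get_next_key(m, NULL, &cur) == 0;

    while (more) {
        /* Fetch the successor first, while cur is still in the map. */
        bool has_next = toy_get_next_key(m, &cur, &next) == 0;
        int i = toy_find(m, cur);

        if (i >= 0 && now - m->last_seen[i] > max_age) {
            toy_delete(m, cur);
            deleted++;
        }
        cur = next;
        more = has_next;
    }
    return deleted;
}
```

With four entries the loop makes five `toy_get_next_key` calls in total and never restarts from the beginning. A real hash map has no guaranteed iteration order and can change underneath the loop, so entries inserted concurrently may be skipped or visited twice.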

> Even with the example above, it's indicating that I can delete the map item while iterating the map. However, the post above did not mention anything like if I can concurrently update the map element in the kernel.

It depends on what you mean by "can". Your computer will not explode or crash; it's allowed. Unfortunately, there is no synchronization mechanism that spans multiple syscalls, so you can't read a map value, modify it in userspace, and write it back without eBPF being able to modify it in the meantime. Any modifications eBPF makes between the read and update syscalls will be lost. The only way I know to do something like this is to use map-in-maps to swap out full maps atomically, which brings challenges of its own.

huangapple
  • Published on June 1, 2023, 03:38:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76376767.html