How to use the Judy array library properly in Go?
Question
In Go, calling a C library works differently from other mainstream languages such as PHP, Python, or Java, because Go has its own multitasking mechanism that is not based on OS threads. As I understand it, calling a C function may therefore cause a context switch or a thread switch. In my project I'm trying to use Judy arrays in Go (in a queue worker) to do simple but high-volume dictionary-style calculations such as "select distinct", so:
> What's the best practice for using such a C library (for relatively compute-heavy work) while keeping the added performance overhead as small as possible?
Answer 1
Score: 1
Despite the title, the question really has two parts: a generic one about interfacing Go with C efficiently, and a specific one about using Judy arrays performantly.
This thread seems to summarize the costs: https://groups.google.com/forum/#!topic/golang-nuts/RTtMsgZi88Q . In short, a cgo call is expensive compared to a call within plain C, so you should try to minimize the number of crossings from Go to C.
Here's some additional advice specific to Judy arrays. I've used them before in C/C++ code. The library's interface is not intuitive in certain places, and by default it uses a C-macro based API, which makes it tricky to get the usage right because the compiler can't offer as much help as usual.
What I recommend, therefore, is that you write your tests and benchmarks in C first, so you understand the API and its odd cases. When I benchmarked Judy arrays for my application (against a C++ vector of strings) they were 3x faster, so it can be worth it. But break the task into three phases. First, do what you want to do in C, and make sure it works as expected in your own C code. Second, extend that basic C interface to handle batches of the work you need done, so as to minimize the number of Go->C switches. Third, bind your new C interface from Go.
Answer 2
Score: 0
If you are writing the binding for the library from scratch, I'd start by using cgo in the most straightforward way possible, and then see whether the performance meets your requirements.
If it doesn't, try minimising the number of C calls you make in commonly called spots. As you've already mentioned in the question, Go switches to a different stack when it makes a C call and this will affect the performance if you make lots of cgo calls to trivial functions. So one way to improve performance is to reduce the total number of C calls.
For example, if you need to call multiple C functions to implement one operation in your Go API, consider whether you could write a small shim C function that could combine those calls.
If the API you're wrapping deals with a lot of strings, the overhead can show up when you have many calls like:
    func foo(bar string) {
        cBar := C.CString(bar)               // copies the string into C memory
        defer C.free(unsafe.Pointer(cBar))   // releases the copy when foo returns
        C.foo(cBar)
    }
That is three C calls (C.CString, C.free, and C.foo). If the API you're wrapping can handle strings that are not NUL-terminated, one option is to pass a pointer to the Go string to a wrapper and use the GoString type defined in the generated _cgo_export.h. For example, on the Go side:
    func foo(bar string) {
        C.foo_wrapper(unsafe.Pointer(&bar))
    }
And on the C side:
    #include "_cgo_export.h"
    void foo_wrapper(void *ptr_to_string) {
        GoString *bar = ptr_to_string;
        foo_with_length(bar->p, bar->n);
    }
As long as the library doesn't hold on to the string data after foo_wrapper returns, this should be safe.
There are probably other optimisations that could help, but I'd strongly recommend keeping things simple initially and putting your effort into optimising the areas that matter.