How to put a regexp inside a proto struct in Go?
Question
I have a tree-like structure that holds a regexp as a string, and I want the compiled *regexp.Regexp to be part of it as well, so that I can run algorithms on the tree. When I marshal it and pass it to a different machine, I can simply recompile the regexp from the string. What is the correct way to do this? How can I make protobuf keep a pointer in the struct that it ideally won't marshal? (The only way I can see is to add a uint64 field and cast its value to and from *Regexp.)
Pseudo-code (because the features I want don't seem to be in the language):
// struct generated by protoc
type ProtoMessage struct {
	Data   string
	Source string
	Regexp uint64 // should not be marshalled; should be omitted from the payload by proto.Marshal; ideally it would be *regexp.Regexp
	Left   *ProtoMessage
	Right  *ProtoMessage
}

func main() {
	// sender computer: doSend()
	mSrc := &ProtoMessage{Data: "its meee!!!", Source: "hello.+world"}
	payload, _ := proto.Marshal(mSrc)

	// receiver computer: onRecv()
	mDst := new(ProtoMessage)
	proto.Unmarshal(payload, mDst)
	r, _ := regexp.Compile(mDst.Source)
	mDst.Regexp = uint64(unsafe.Pointer(r)) // not working btw

	TreeMatch = func(tree *ProtoMessage, line string) string {
		if (*regexp.Regexp)(tree.Regexp).Match(line) { // not working line
			return tree.Data
		}
		if tree.Left == nil {
			return ""
		}
		return TreeMatch(tree.Left, line)
	}
	assert(TreeMatch(mDst, "hello, world") == "its meee!!!") // panic if condition is false
}
With JSON marshalling I can just put a pointer to the regexp in the struct and tag it json:"-" so that the field is left out of the marshalled structure. Of course, this is an important feature for keeping a marshalling/unmarshalling system efficient (e.g. running algorithms on the same structure and avoiding data copying after unmarshal). How can I do the same with protobuf?
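For reference, the JSON behavior described above can be sketched as follows. This is a minimal illustration, not code from the question: the Node type and the Encode, Decode, and compileTree helpers are hypothetical names, and only the standard encoding/json and regexp packages are used.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// Node is a hypothetical tree node for the JSON case. The compiled regexp
// lives in the same struct but is tagged json:"-", so encoding/json skips it.
type Node struct {
	Data   string         `json:"data"`
	Source string         `json:"source"`
	Regexp *regexp.Regexp `json:"-"`
	Left   *Node          `json:"left,omitempty"`
	Right  *Node          `json:"right,omitempty"`
}

// Encode marshals the tree; the Regexp field never reaches the payload.
func Encode(n *Node) (string, error) {
	b, err := json.Marshal(n)
	return string(b), err
}

// compileTree recompiles every node's regexp from its Source string.
func compileTree(n *Node) error {
	if n == nil {
		return nil
	}
	re, err := regexp.Compile(n.Source)
	if err != nil {
		return err
	}
	n.Regexp = re
	if err := compileTree(n.Left); err != nil {
		return err
	}
	return compileTree(n.Right)
}

// Decode unmarshals the payload and restores the compiled regexps in place.
func Decode(payload string) (*Node, error) {
	n := new(Node)
	if err := json.Unmarshal([]byte(payload), n); err != nil {
		return nil, err
	}
	if err := compileTree(n); err != nil {
		return nil, err
	}
	return n, nil
}

func main() {
	src := &Node{Data: "its meee!!!", Source: "hello.+world"}
	payload, err := Encode(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(payload)

	dst, err := Decode(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(dst.Regexp.MatchString("hello, world"))
}
```

The receiver works on the very struct it unmarshalled into, which is the property the question wants to keep with protobuf.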
Answer 1
Score: 2
You can't store a pointer in a protobuf, as the recipient is likely a different computer. Even if you could, you'd get a panic as soon as you tried to dereference the pointer. The easiest thing to do is to pass just the regexp source string and compile it again at the destination:
package main

import (
	"fmt"

	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/types/known/structpb"
)

func main() {
	v := structpb.NewStringValue("hello.+world")
	b, err := proto.Marshal(v)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", b) // "\x1a\fhello.+world"
}
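One way to "compile again at the destination" without touching the generated struct is to build a parallel tree of wrapper nodes after unmarshalling. The sketch below assumes a hand-written ProtoMessage as a stand-in for the protoc output (so it runs without the protobuf module); Node, Compile, and TreeMatch are hypothetical names, not part of any library.

```go
package main

import (
	"fmt"
	"regexp"
)

// ProtoMessage stands in for the protoc-generated struct (assumption:
// the real one has the same exported fields plus protobuf internals).
type ProtoMessage struct {
	Data   string
	Source string
	Left   *ProtoMessage
	Right  *ProtoMessage
}

// Node pairs a decoded message with its compiled regexp. It points at
// the message instead of copying it.
type Node struct {
	Msg   *ProtoMessage
	Re    *regexp.Regexp
	Left  *Node
	Right *Node
}

// Compile walks the decoded tree once and builds the wrapper tree.
func Compile(m *ProtoMessage) (*Node, error) {
	if m == nil {
		return nil, nil
	}
	re, err := regexp.Compile(m.Source)
	if err != nil {
		return nil, fmt.Errorf("compile %q: %w", m.Source, err)
	}
	left, err := Compile(m.Left)
	if err != nil {
		return nil, err
	}
	right, err := Compile(m.Right)
	if err != nil {
		return nil, err
	}
	return &Node{Msg: m, Re: re, Left: left, Right: right}, nil
}

// TreeMatch follows the Left chain, mirroring the question's pseudo-code.
func TreeMatch(n *Node, line string) string {
	for ; n != nil; n = n.Left {
		if n.Re.MatchString(line) {
			return n.Msg.Data
		}
	}
	return ""
}

func main() {
	// Imagine mDst was just filled in by proto.Unmarshal on the receiver.
	mDst := &ProtoMessage{Data: "its meee!!!", Source: "hello.+world"}
	root, err := Compile(mDst)
	if err != nil {
		panic(err)
	}
	fmt.Println(TreeMatch(root, "hello, world"))
}
```

The cost is one extra small allocation per node; the message data itself is not copied.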
Note: you can't hack around this with Gob either:
package main

import (
	"bytes"
	"encoding/gob"
	"regexp"
)

func main() {
	re := regexp.MustCompile("hello.+world")
	buf := new(bytes.Buffer)
	if err := gob.NewEncoder(buf).Encode(re); err != nil {
		panic(err) // type regexp.Regexp has no exported fields
	}
}
Answer 2
Score: 1
Found a solution: you just need some pointer field inside the struct. It doesn't matter whether that field gets marshalled, because its unmarshalled value is never used on the receiver side.
Proto declaration:
syntax = "proto3";
package main;
option go_package = ".;main";

message Empty {
}

message ProtoMessage {
	string data = 1;
	string source = 2;
	Empty regexp = 3; // ideally should not be marshalled at all, like `json:"-"` but for protobuf
	ProtoMessage left = 4;
	ProtoMessage right = 5;
}
Testing code:
package main

import (
	"regexp"
	"testing"
	"unsafe"
)

type Empty struct {
	//state protoimpl.MessageState
	//sizeCache protoimpl.SizeCache
	//unknownFields protoimpl.UnknownFields
}

// struct generated by protoc
type ProtoMessage struct {
	//state protoimpl.MessageState
	//sizeCache protoimpl.SizeCache
	//unknownFields protoimpl.UnknownFields
	Data   string        `protobuf:"bytes,1,opt,name=data,proto3" json:"data,omitempty"`
	Source string        `protobuf:"bytes,2,opt,name=source,proto3" json:"source,omitempty"`
	Regexp *Empty        `protobuf:"bytes,3,opt,name=regexp,proto3" json:"regexp,omitempty"` // ideally should not be marshalled at all, like `json:"-"` but for protobuf
	Left   *ProtoMessage `protobuf:"bytes,4,opt,name=left,proto3" json:"left,omitempty"`
	Right  *ProtoMessage `protobuf:"bytes,5,opt,name=right,proto3" json:"right,omitempty"`
}

// GetCompiledRegexp reinterprets the placeholder pointer as a *regexp.Regexp.
func (p *ProtoMessage) GetCompiledRegexp() *regexp.Regexp {
	return (*regexp.Regexp)(unsafe.Pointer(p.Regexp))
}

// SetCompiledRegexp stores the compiled regexp in the placeholder field.
func (p *ProtoMessage) SetCompiledRegexp(r *regexp.Regexp) {
	p.Regexp = (*Empty)(unsafe.Pointer(r))
}

func TreeMatch(tree *ProtoMessage, line string) string {
	if tree.GetCompiledRegexp().Match([]byte(line)) {
		return tree.Data
	}
	if tree.Left == nil {
		return ""
	}
	return TreeMatch(tree.Left, line)
}

func TestTreeMatch(t *testing.T) {
	// happening at the receiver side: imagine it is proto.Unmarshal(payload, receiverMsg)
	receiverMsg := &ProtoMessage{
		Data:   "its meee!!!",
		Source: "hello.+world",
	}
	r, _ := regexp.Compile(receiverMsg.Source)
	receiverMsg.SetCompiledRegexp(r)
	if TreeMatch(receiverMsg, "helloworld") != "" {
		t.Fatal("TreeMatch gives non-existing match!")
	}
	if TreeMatch(receiverMsg, "hello, world") != "its meee!!!" {
		t.Fatal("TreeMatch is not working!")
	}
}

// ProtoMessageDirect holds the *regexp.Regexp directly, as a baseline for the benchmark.
type ProtoMessageDirect struct {
	Data   string
	Source string
	Regexp *regexp.Regexp
	Left   *ProtoMessageDirect
	Right  *ProtoMessageDirect
}

func (p *ProtoMessageDirect) GetCompiledRegexp() *regexp.Regexp {
	return p.Regexp
}

func (p *ProtoMessageDirect) SetCompiledRegexp(r *regexp.Regexp) {
	p.Regexp = r
}

func TreeMatchDirect(tree *ProtoMessageDirect, line string) string {
	if tree.GetCompiledRegexp().Match([]byte(line)) {
		return tree.Data
	}
	if tree.Left == nil {
		return ""
	}
	return TreeMatchDirect(tree.Left, line)
}

func BenchmarkRegexpCast(b *testing.B) {
	receiverMsg := &ProtoMessage{
		Data:   "its meee!!!",
		Source: "hello.+world",
	}
	r, _ := regexp.Compile(receiverMsg.Source)
	receiverMsg.SetCompiledRegexp(r)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		TreeMatch(receiverMsg, "hello, world")
	}
}

func BenchmarkRegexpDirect(b *testing.B) {
	receiverMsg := &ProtoMessageDirect{
		Data:   "its meee!!!",
		Source: "hello.+world",
	}
	r, _ := regexp.Compile(receiverMsg.Source)
	receiverMsg.SetCompiledRegexp(r)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		TreeMatchDirect(receiverMsg, "hello, world")
	}
}
TestTreeMatch passes, and the benchmarks show that the cast makes no meaningful difference:
BenchmarkRegexpCast-20 2741786 376.7 ns/op 16 B/op 1 allocs/op
BenchmarkRegexpDirect-20 3075280 377.0 ns/op 16 B/op 1 allocs/op
PASS
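For comparison, here is a sketch of a different technique that needs neither unsafe nor a placeholder field: a side table of compiled patterns keyed by the Source string. It is an alternative to the cast shown above, not the same method. Note that the test above never calls proto.Marshal on a message whose Regexp field holds the cast pointer, so that path is untested in this answer; the side table avoids the question entirely. ProtoMessage here is a hand-written stand-in for the generated struct, and compiled, CompiledRegexp, and TreeMatch are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// ProtoMessage stands in for the protoc-generated struct (assumption).
// It carries no regexp field at all.
type ProtoMessage struct {
	Data   string
	Source string
	Left   *ProtoMessage
	Right  *ProtoMessage
}

// compiled caches patterns by source text: string -> *regexp.Regexp.
var compiled sync.Map

// CompiledRegexp returns the cached regexp for m.Source, compiling it
// on first use.
func CompiledRegexp(m *ProtoMessage) (*regexp.Regexp, error) {
	if re, ok := compiled.Load(m.Source); ok {
		return re.(*regexp.Regexp), nil
	}
	re, err := regexp.Compile(m.Source)
	if err != nil {
		return nil, err
	}
	// If another goroutine stored the same pattern first, use its value.
	actual, _ := compiled.LoadOrStore(m.Source, re)
	return actual.(*regexp.Regexp), nil
}

// TreeMatch follows the Left chain, as the code above does.
func TreeMatch(tree *ProtoMessage, line string) (string, error) {
	for ; tree != nil; tree = tree.Left {
		re, err := CompiledRegexp(tree)
		if err != nil {
			return "", err
		}
		if re.MatchString(line) {
			return tree.Data, nil
		}
	}
	return "", nil
}

func main() {
	msg := &ProtoMessage{Data: "its meee!!!", Source: "hello.+world"}
	got, err := TreeMatch(msg, "hello, world")
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

The trade-off is a map lookup per visited node instead of a pointer read, and cache entries that live until the process exits unless they are evicted.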