Convert dot-separated values into Go structs using Python

Question

This is a specific requirement for an application whose configuration can change (specifically WSO2 Identity Server, since I'm writing a Kubernetes operator for it in Go), but that detail isn't really relevant here. I want a solution that makes it easy to manage a large number of config mappings and generate Go structs from them. The configs are mapped in a .csv file.

Link to .csv - my_configs.csv

I want to write a Python script that automatically generates the Go structs, so that any change to the application's configs can be picked up by simply re-running the script to regenerate the corresponding structs. I'm referring to the configs of the application itself. For example, the TOML key names in the CSV can change, or new values can be added.

I have so far managed to write a Python script that nearly achieves my goal. The script is:

import pandas as pd

def convert_to_dict(data):
    result = {}
    for row in data:
        current_dict = result
        for item in row[:-1]:
            if item is not None:
                if item not in current_dict:
                    current_dict[item] = {}
                current_dict = current_dict[item]
    return result

def extract_json_key(yaml_key):
    if isinstance(yaml_key, str) and '.' in yaml_key:
        return yaml_key.split('.')[-1]
    else:
        return yaml_key

def add_fields_to_struct(struct_string,go_var,go_type,json_key,toml_key):
    struct_string += str(go_var) + " " + str(go_type) + ' `json:"' + str(json_key) + ',omitempty" toml:"' +str(toml_key) + '"` ' + "\n"
    return struct_string

def generate_go_struct(struct_name, struct_data):
    struct_name="Configurations" if struct_name == "" else struct_name
    struct_string = "type " + struct_name + " struct {\n"
    yaml_key=df['yaml_key'].str.split('.').str[-1]

    # Base case: Generate fields for the current struct level
    for key, value in struct_data.items():
        selected_rows = df[yaml_key == key]

        if len(selected_rows) > 1:
            go_var = selected_rows['go_var'].values[1]
            toml_key = selected_rows['toml_key'].values[1]
            go_type=selected_rows['go_type'].values[1]
            json_key=selected_rows['json_key'].values[1]
        else:
            go_var = selected_rows['go_var'].values[0]
            toml_key = selected_rows['toml_key'].values[0]
            go_type=selected_rows['go_type'].values[0]
            json_key=selected_rows['json_key'].values[0]

        # Add fields to the body of the struct
        struct_string=add_fields_to_struct(struct_string,go_var,go_type,json_key,toml_key)

    struct_string += "}\n\n"

    # Recursive case: Generate struct definitions for nested structs
    for key, value in struct_data.items():
        selected_rows = df[yaml_key == key]

        if len(selected_rows) > 1:
            go_var = selected_rows['go_var'].values[1]
        else:
            go_var = selected_rows['go_var'].values[0]

        if isinstance(value, dict) and any(isinstance(v, dict) for v in value.values()):
            nested_struct_name = go_var
            nested_struct_data = value
            struct_string += generate_go_struct(nested_struct_name, nested_struct_data)

    return struct_string

# Read the CSV
csv_file = "~/Downloads/my_configs.csv"
df = pd.read_csv(csv_file)

# Remove rows where all columns are NaN
df = df.dropna(how='all')
# Create the 'json_key' column using the custom function
df['json_key'] = df['yaml_key'].apply(extract_json_key)

data=df['yaml_key'].values.tolist() # Read the 'yaml_key' column
data = pd.DataFrame({'column':data}) # Convert to dataframe

data=data['column'].str.split('.', expand=True) # Split by '.'

nested_list = data.values.tolist() # Convert to nested list
data=nested_list

result_json = convert_to_dict(data) # Convert to dict (JSON)

# The generated Go code
go_struct = generate_go_struct("", result_json)

# Write to file
file_path = "output.go"
with open(file_path, "w") as file:
    file.write(go_struct)

The problem is this (see the following part of the CSV):

authentication.authenticator.basic
authentication.authenticator.basic.parameters
authentication.authenticator.basic.parameters.showAuthFailureReason
authentication.authenticator.basic.parameters.showAuthFailureReasonOnLoginPage
authentication.authenticator.totp
authentication.authenticator.totp.parameters
authentication.authenticator.totp.parameters.showAuthFailureReason
authentication.authenticator.totp.parameters.showAuthFailureReasonOnLoginPage
authentication.authenticator.totp.parameters.encodingMethod
authentication.authenticator.totp.parameters.timeStepSize

Here, since the parameters field is repeated under both basic and totp, the script gets confused and produces two TotpParameters structs. The expected outcome is one BasicParameters struct and one TotpParameters struct. Many similar repeated names are present in the CSV's yaml_key column.

I understand this has something to do with the index being hardcoded as 1 in go_var = selected_rows['go_var'].values[1], but I'm having a hard time fixing it.
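
To make the collision concrete, here is a minimal sketch that uses plain Python data instead of the real CSV. The yaml_key paths come from the listing above; the go_var values and both helper names are assumptions made up for illustration. Matching on the last path segment returns both parameters rows, so a fixed index of 1 always picks the totp row, while matching on the full dotted path returns exactly one row.

```python
# Hypothetical rows: yaml_key paths are from the question, go_var values are assumed.
rows = [
    {"yaml_key": "authentication.authenticator.basic.parameters", "go_var": "BasicParameters"},
    {"yaml_key": "authentication.authenticator.totp.parameters", "go_var": "TotpParameters"},
]

def by_last_segment(rows, key):
    """Mimics df[yaml_key == key]: compares only the final path segment."""
    return [r for r in rows if r["yaml_key"].split(".")[-1] == key]

def by_full_path(rows, path):
    """Compares the whole dotted path, which identifies a single row."""
    return [r for r in rows if r["yaml_key"] == path]

ambiguous = by_last_segment(rows, "parameters")
unique = by_full_path(rows, "authentication.authenticator.basic.parameters")

print(len(ambiguous))           # two candidate rows for one key
print(ambiguous[1]["go_var"])   # index 1 always selects the totp row
print(unique[0]["go_var"])      # the full path selects the basic row
```

This suggests that any fix probably has to carry the full path down through the recursion rather than only the current key.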

Could anyone please point me to an answer? I think one of the following might be the root cause:

  1. An issue with the recursive function
  2. An issue in the code that generates the JSON

Thanks!

I also tried ChatGPT, but since this involves nesting and recursion, the answers it provided were not very useful.

UPDATE

I found that the problem occurs for the rows that contain the properties, poolOptions, endpoint, and parameters fields, because these names are repeated in the yaml_key column.
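
A quick way to list every name affected by this is to count the final path segments. The sketch below is only an illustration: the function name is hypothetical and the sample keys are a shortened subset of the listing in the question, not the real CSV.

```python
from collections import Counter

def repeated_leaf_names(yaml_keys):
    """Return the final path segments that appear under more than one path."""
    counts = Counter(key.split(".")[-1] for key in yaml_keys)
    return sorted(name for name, n in counts.items() if n > 1)

# Shortened sample taken from the question's listing.
yaml_keys = [
    "authentication.authenticator.basic.parameters",
    "authentication.authenticator.basic.parameters.showAuthFailureReason",
    "authentication.authenticator.totp.parameters",
    "authentication.authenticator.totp.parameters.showAuthFailureReason",
    "authentication.authenticator.totp.parameters.encodingMethod",
]

print(repeated_leaf_names(yaml_keys))
```

Running the same function over the whole yaml_key column would show which keys cannot be looked up safely by their last segment alone.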

Answer 1

Score: 1

I was able to solve the issue, but I had to take a completely new approach: build a tree data structure from the keys and then traverse it. The main logic behind it is level-order traversal - https://www.geeksforgeeks.org/level-order-tree-traversal/

Here's the working Python code.

import pandas as pd
from collections import deque

structs=[]
class TreeNode:
    def __init__(self, name):
        self.name = name
        self.children = []
        self.path=""

    def add_child(self, child):
        self.children.append(child)

def create_tree(data):
    root = TreeNode('')
    for item in data:
        node = root
        for name in item.split('.'):
            existing_child = next((child for child in node.children if child.name == name), None)
            if existing_child:
                node = existing_child
            else:
                new_child = TreeNode(name)
                node.add_child(new_child)
                node = new_child
    return root

def generate_go_struct(struct_data):
    struct_name = struct_data['struct_name']
    fields = struct_data['fields']
    
    go_struct = f"type {struct_name} struct {{\n"

    for field in fields:
        field_name = field['name']
        field_type = field['type']
        field_default_val = str(field['default_val'])
        json_key=field['json_key']
        toml_key=field['toml_key']

        tail_part=f"\t{field_name} {field_type} `json:\"{json_key},omitempty\" toml:\"{toml_key}\"`\n\n"

        if pd.isna(field['default_val']):
            go_struct += tail_part
        else:
            field_default_val = "\t// +kubebuilder:default:=" + field_default_val
            go_struct += field_default_val + "\n" + tail_part

    go_struct += "}\n\n"
    return go_struct

def write_go_file(go_structs, file_path):
    with open(file_path, 'w') as file:
        for go_struct in go_structs:
            file.write(go_struct)

def create_new_struct(struct_name):
    struct_name = "Configurations" if struct_name == "" else struct_name
    struct_dict = {
        "struct_name": struct_name,
        "fields": []
    }
    
    return struct_dict

def add_field(struct_dict, field_name, field_type,default_val,json_key, toml_key):
    field_dict = {
        "name": field_name,
        "type": field_type,
        "default_val": default_val,
        "json_key":json_key,
        "toml_key":toml_key
    }
    struct_dict["fields"].append(field_dict)
    
    return struct_dict

def traverse_tree(root):
    queue = deque([root])  
    while queue:
        node = queue.popleft()
        filtered_df = df[df['yaml_key'] == node.path]
        go_var = filtered_df['go_var'].values[0] if not filtered_df.empty else None
        go_type = filtered_df['go_type'].values[0] if not filtered_df.empty else None

        if node.path=="":
            go_type="Configurations"

        # The structs themselves
        current_struct = create_new_struct(go_type)
        
        for child in node.children:  
            if (node.name!=""):
                child.path=node.path+"."+child.name   
            else:
                child.path=child.name

            filtered_df = df[df['yaml_key'] == child.path]
            go_var = filtered_df['go_var'].values[0] if not filtered_df.empty else None
            go_type = filtered_df['go_type'].values[0] if not filtered_df.empty else None
            default_val = filtered_df['default_val'].values[0] if not filtered_df.empty else None

            # Struct fields
            json_key = filtered_df['yaml_key'].values[0].split('.')[-1] if not filtered_df.empty else None
            toml_key = filtered_df['toml_key'].values[0].split('.')[-1] if not filtered_df.empty else None
            
            current_struct = add_field(current_struct, go_var, go_type,default_val,json_key, toml_key)

            if (child.children):
                # Add each child to the queue for processing
                queue.append(child)

        go_struct = generate_go_struct(current_struct)
        # print(go_struct,"\n")        
        structs.append(go_struct)

    write_go_file(structs, "output.go")

csv_file = "~/Downloads/my_configs.csv"
df = pd.read_csv(csv_file) 

sample_data=df['yaml_key'].values.tolist()

# Create the tree
tree = create_tree(sample_data)

# Traverse the tree
traverse_tree(tree)
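
The core idea of the code above, stripped of pandas and of the Go output, can be sketched as follows. This is a simplified illustration rather than a replacement: it uses nested dicts instead of TreeNode, and the function names build_tree and level_order are assumptions. The point it shows is that each node is visited together with its full dotted path, so two nodes that are both named parameters never get mixed up.

```python
from collections import deque

def build_tree(paths):
    """Nested dicts keyed by path segment; shared prefixes share nodes."""
    root = {}
    for path in paths:
        node = root
        for name in path.split("."):
            node = node.setdefault(name, {})
    return root

def level_order(root):
    """Yield (full_path, child_names) for every node that has children."""
    queue = deque([("", root)])
    while queue:
        path, node = queue.popleft()
        if node:
            yield path, list(node)
        for name, child in node.items():
            queue.append((f"{path}.{name}" if path else name, child))

paths = [
    "authentication.authenticator.basic.parameters.showAuthFailureReason",
    "authentication.authenticator.totp.parameters.timeStepSize",
]
for path, children in level_order(build_tree(paths)):
    print(path or "<root>", children)
```

Each yielded pair corresponds to one struct: the path identifies the struct and the child names are its fields.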

Thanks a lot for all of your help!

Answer 2

Score: 0

Rukshan, I see you did some work and came up with a solution you like. I tried it and still got some weird errors. Maybe the CSV or the code has changed and it does indeed run without errors for you; I don't know.

I see problems in the output, like Endpoint being duplicated, and Go's basic string type being redefined:

type string struct {
	FromAddress string `json:"fromAddress,omitempty" toml:"from_address"`
	Username string `json:"username,omitempty" toml:"username"`
	Password string `json:"password,omitempty" toml:"password"`
	Hostname string `json:"hostname,omitempty" toml:"hostname"`
	...
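
One way such a redefinition could be avoided is to check the type name before emitting a struct. The sketch below is only an assumption about how a guard might look, not part of either program: the helper name is hypothetical, and the set lists Go's predeclared type names.

```python
# Go's predeclared type names; a generated struct must not reuse any of them.
GO_PREDECLARED_TYPES = {
    "any", "bool", "byte", "comparable", "error", "rune", "string",
    "int", "int8", "int16", "int32", "int64",
    "uint", "uint8", "uint16", "uint32", "uint64", "uintptr",
    "float32", "float64", "complex64", "complex128",
}

def needs_struct_definition(go_type, has_children):
    """True only for nodes with children whose type is a custom (non-builtin) name."""
    return has_children and go_type not in GO_PREDECLARED_TYPES

print(needs_struct_definition("UserStore", True))
print(needs_struct_definition("string", True))
print(needs_struct_definition("UserStore", False))
```

A node that has children but a builtin go_type most likely points at an inconsistent CSV row, so a generator could also choose to report it instead of silently skipping it.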

I've worked out a different solution.

Starting with a simple, sample CSV, like:

| toml_key | yaml_key | go_var | go_type |
|----------|----------|--------|---------|
| a        | a        | A      | A       |
| a.b      | a.b      | B      | string  |
| a.c      | a.c      | C      | string  |
| a.d      | a.d      | D      | bool    |
| e        | e        | E      | E       |
| e.f      | e.f      | F      | F       |
| e.f.g    | e.f.g    | G      | string  |
| e.h      | e.h      | H      | string  |

My program generates the following Go code:

// Code generated by ... DO NOT EDIT.
package main

type A struct {
	B string `json:"b,omitempty" toml:"b"`
	C string `json:"c,omitempty" toml:"c"`
	D bool   `json:"d,omitempty" toml:"d"`
}

type E struct {
	F struct {
		G string `json:"g,omitempty" toml:"g"`
	} `json:"f,omitempty" toml:"f"`
	H string `json:"h,omitempty" toml:"h"`
}

I chose to just create nested structs since I don't see the need for F to be its own type that is referenced in E. This allowed for a direct approach to parsing the structure from the rows of CSV, and finally generating the Go code. My program also allows for F and H to be siblings, even with the more-deeply nested G coming before H (in the CSV).

I started with a typed dict named TypeDef:

from typing import TypedDict

class TypeDef(TypedDict):
    """A (possible) hierarchy of config values that will be materialized into (possibly nested) Go structs."""

    yaml_key_path: str
    """The original dot-separated YAML key path."""

    toml_name: str
    yaml_name: str
    go_name: str
    go_type: str

    fields: list["TypeDef"]

The fields key allows me to recursively add more TypeDefs, to mimic the nested structs we saw above.
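
As a small illustration of that recursion, the sketch below builds the E struct from the sample CSV by hand. The leaf helper is hypothetical and exists only to keep the example short; it assumes the TOML and YAML names are both the last path segment, which happens to hold for the sample but not necessarily for the real CSV.

```python
from typing import TypedDict

class TypeDef(TypedDict):
    yaml_key_path: str
    toml_name: str
    yaml_name: str
    go_name: str
    go_type: str
    fields: list["TypeDef"]

def leaf(path: str, go_name: str, go_type: str) -> TypeDef:
    """Hypothetical helper: a TypeDef with no fields, named after the last path segment."""
    name = path.split(".")[-1]
    return TypeDef(yaml_key_path=path, toml_name=name, yaml_name=name,
                   go_name=go_name, go_type=go_type, fields=[])

e = leaf("e", "E", "E")
f = leaf("e.f", "F", "F")
f["fields"].append(leaf("e.f.g", "G", "string"))   # G nests inside F
e["fields"].extend([f, leaf("e.h", "H", "string")])  # F and H are siblings in E

print([child["go_name"] for child in e["fields"]])
print(e["fields"][0]["fields"][0]["yaml_key_path"])
```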

I struggled with the approach to reading the CSV and settled on this:

def add(td: TypeDef, tl_tds: list[TypeDef]):
    """
    Adds td either as a top-level TypeDef, directly to a top-level TypeDef's fields, or to
    a sub-field of a top-level TypeDef.
    """
    print(f"adding {td['yaml_key_path']}")

    path_keys = td["yaml_key_path"].split(".")

    if len(path_keys) == 1:
        tl_tds.append(td)
        return

    tl_key = path_keys[0]
    tl_td: TypeDef | None = None
    for x_td in tl_tds:
        if x_td["yaml_name"] == tl_key:
            tl_td = x_td
            break
    assert tl_td is not None, f"could not find top-level TypeDef with key {tl_key}"

    if len(path_keys) == 2:
        tl_td["fields"].append(td)
        return

    parent_td = tl_td  # rename top-level to parent (same object)

    # Skip top-level key and omit final key.  If not all intermediate keys exist as
    # prescribed by YAML key the last-found parent will be used.
    intermediate_keys = path_keys[1:-1]
    for key in intermediate_keys:
        for child_td in parent_td["fields"]:
            if child_td["yaml_name"] == key:
                parent_td = child_td
                break
    parent_td["fields"].append(td)

The function makes special cases for a TypeDef:

  • without a parent: just add it to the list of structs

  • with just one parent: add it to the fields of the top-level TypeDef

  • finally, any number of intermediate (parent) TypeDefs: find the last parent TypeDef. Notice the comment about missing intermediate keys/TypeDefs. Some of the configs specify YAML key paths that don't exist:

    user_store.connection_password,userStore.connectionPassword,ConnectionPassword,string
    user_store.properties.CaseInsensitiveUsername,userStore.properties.caseInsensitiveUsername,CaseInsensitiveUsername,bool
    

    CaseInsensitiveUsername appears to be a child of properties, but properties was never defined on its own, so the program will add CaseInsensitiveUsername directly to userStore, like:

    [
        ...
        {
            "yaml_key_path": "userStore",
            "toml_name": "user_store",
            "yaml_name": "userStore",
            "go_name": "UserStore",
            "go_type": "UserStore",
            "fields": [
                {
                    "yaml_key_path": "userStore.type",
                    "toml_name": "type",
                    "yaml_name": "type",
                    "go_name": "Type",
                    "go_type": "string",
                    "fields": [],
                },
                ...
                {
                    "yaml_key_path": "userStore.properties.caseInsensitiveUsername",
                    "toml_name": "CaseInsensitiveUsername",
                    "yaml_name": "caseInsensitiveUsername",
                    "go_name": "CaseInsensitiveUsername",
                    "go_type": "bool",
                    "fields": [],
                },
                ...
            ],
        }
        ...
    ]
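
An alternative to attaching the field to the last-found parent would be to create the missing intermediate parents on the fly. The sketch below shows that variant with plain dicts; it is not what the program described here does. The function names are hypothetical, and the placeholder's toml_name and go_name are assumptions derived from the YAML key, since the CSV has no row to supply them.

```python
def find_child(parent, key):
    """Return the field of parent whose yaml_name equals key, or None."""
    return next((c for c in parent["fields"] if c["yaml_name"] == key), None)

def add_with_placeholders(td, tl_tds):
    """Attach td under its full path, creating empty parents where rows are missing."""
    keys = td["yaml_key_path"].split(".")
    container = {"fields": tl_tds}  # virtual root whose fields are the top-level list
    for depth, key in enumerate(keys[:-1], start=1):
        child = find_child(container, key)
        if child is None:
            go_name = key[:1].upper() + key[1:]  # assumed naming for placeholders
            child = {
                "yaml_key_path": ".".join(keys[:depth]),
                "toml_name": key,  # assumption: no TOML name is known for this key
                "yaml_name": key,
                "go_name": go_name,
                "go_type": go_name,
                "fields": [],
            }
            container["fields"].append(child)
        container = child
    container["fields"].append(td)

tl_tds = []
add_with_placeholders({
    "yaml_key_path": "userStore.properties.caseInsensitiveUsername",
    "toml_name": "CaseInsensitiveUsername",
    "yaml_name": "caseInsensitiveUsername",
    "go_name": "CaseInsensitiveUsername",
    "go_type": "bool",
    "fields": [],
}, tl_tds)
print(tl_tds[0]["yaml_name"], tl_tds[0]["fields"][0]["yaml_name"])
```

Whether a placeholder or the flattening shown above is preferable depends on how the generated structs are expected to map back onto the TOML file.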
    

Writing that sample structure above to Go code becomes fairly simple.

I start iterating the top-level TypeDefs (structs) in the list-of-TypeDefs:

w("// Code generated by ... DO NOT EDIT.\n")
w("package main\n")
w("\n")

# Top-level TypeDefs are 'types' in Go
for tl_td in tl_tds:
    w("type ")
    write_td(tl_td)
    w("\n\n")

and for each top-level TypeDef, recursively write its fields:

def write_td(td: TypeDef):
    """Write a TypeDef (and recursively its fields)."""

    def write_struct_tag(td: TypeDef):
        w(f"`json:\"{td['yaml_name']},omitempty\" toml:\"{td['toml_name']}\"`")

    w("\n")
    w(td["go_name"] + " ")

    # If not a struct (simple Go type)
    if td["fields"] == []:
        w(td["go_type"])
        return

    # Else a struct
    w("struct {")
    for x_td in td["fields"]:
        write_td(x_td)
        write_struct_tag(x_td)
    w("}")
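
The w function is not shown in these snippets. To try write_td in isolation, one possible stand-in is a function that appends to an in-memory buffer. Everything about this w is an assumption (the real one lives in the full code), and write_td is repeated here in condensed form with plain dicts so the block runs on its own.

```python
import io

out = io.StringIO()

def w(text):
    """Hypothetical stand-in for w(): append text to an in-memory buffer."""
    out.write(text)

def write_td(td):
    """Condensed copy of write_td from above, operating on plain dicts."""
    w("\n")
    w(td["go_name"] + " ")
    if td["fields"] == []:
        w(td["go_type"])  # simple Go type, no nested struct
        return
    w("struct {")
    for x_td in td["fields"]:
        write_td(x_td)
        w(f"`json:\"{x_td['yaml_name']},omitempty\" toml:\"{x_td['toml_name']}\"`")
    w("}")

a = {
    "go_name": "A", "go_type": "A", "yaml_name": "a", "toml_name": "a",
    "fields": [
        {"go_name": "B", "go_type": "string", "yaml_name": "b", "toml_name": "b", "fields": []},
    ],
}
w("type ")
write_td(a)
w("\n\n")
print(out.getvalue())
```

Writing to a buffer first also makes it easy to inspect the generated text before it is saved to output.go.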

The final Go code looks a bit sloppy:

// Code generated by ... DO NOT EDIT.
package main

type 
A struct {
B string`json:"b,omitempty" toml:"b"`
C string`json:"c,omitempty" toml:"c"`
D bool`json:"d,omitempty" toml:"d"`}

type 
E struct {
F struct {
G string`json:"g,omitempty" toml:"g"`}`json:"f,omitempty" toml:"f"`
H string`json:"h,omitempty" toml:"h"`}

but it has the minimum number of line breaks and spaces to be syntactically correct, which is all gofmt needs:

import subprocess

try:
    subprocess.run(["go", "fmt", "output.go"], check=True)
except subprocess.CalledProcessError as e:
    print(f"could not run go fmt output.go: {e}")

to make it look like the sample I shared at the top of the post.

You can see all the Python code, CSVs, and generated Go code I used/made in this solution, here.

In the full code you'll see I don't use Pandas (in my opinion, not the correct application for Pandas), and you'll see me validating each row:

for row in reader:
    for i, x in enumerate(row):
        assert x != "", f"on line {reader.line_num} field {i+1} was empty"

because I found a row in my_configs.csv that was missing a Go type:

AssertionError: on line 165 field 4 was empty

which corresponds to this row:

authentication.endpoint.enableMergingCustomClaimMappingsWithDefault,authentication.endpoint.enableMergingCustomClaimMappingsWithDefault,EnableMergingCustomClaimMappingsWithDefault,

which I believe is missing (and I corrected to) the Go type bool.
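
The row validation above can be tried on an in-memory CSV. The sketch below is a variation, not the code from the full program: instead of stopping at the first assertion it collects every empty field, and the function name and sample rows are made up for illustration.

```python
import csv
import io

def find_empty_fields(csv_text):
    """Return (line_number, field_number) for every empty field, both 1-based."""
    reader = csv.reader(io.StringIO(csv_text))
    problems = []
    for row in reader:
        for i, x in enumerate(row):
            if x == "":
                problems.append((reader.line_num, i + 1))
    return problems

# Made-up sample: the last row is missing its Go type.
sample = (
    "toml_key,yaml_key,go_var,go_type\n"
    "a,a,A,A\n"
    "a.b,a.b,B,\n"
)
print(find_empty_fields(sample))
```

Collecting all problems in one pass can be more convenient when a large CSV has several incomplete rows.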

huangapple
  • Posted on July 4, 2023, 17:41:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76611276.html