Convert dot-separated values into Go structs using Python
Question
This is a specific requirement for an application whose configuration can be changed (specifically WSO2 Identity Server, since I'm writing a Kubernetes operator for it in Go), but that isn't really relevant here. I want to create a solution that makes it easy to manage a large number of config mappings and generate Go structs from them. These configs are mapped in a .csv file.
Link to the .csv: my_configs.csv
I want to write a Python script that automatically generates the Go structs. That way, any change to the application's configs can be picked up by re-running the script to create the corresponding Go structs. I'm referring to the configs of the application itself: for example, the TOML key names in the CSV can change, or new values can be added.
So far I have managed to write a Python script that nearly achieves my goal. The script is:
import pandas as pd

def convert_to_dict(data):
    result = {}
    for row in data:
        current_dict = result
        for item in row[:-1]:
            if item is not None:
                if item not in current_dict:
                    current_dict[item] = {}
                current_dict = current_dict[item]
    return result

def extract_json_key(yaml_key):
    if isinstance(yaml_key, str) and '.' in yaml_key:
        return yaml_key.split('.')[-1]
    else:
        return yaml_key

def add_fields_to_struct(struct_string, go_var, go_type, json_key, toml_key):
    struct_string += str(go_var) + " " + str(go_type) + ' `json:"' + str(json_key) + ',omitempty" toml:"' + str(toml_key) + '"` ' + "\n"
    return struct_string

def generate_go_struct(struct_name, struct_data):
    struct_name = "Configurations" if struct_name == "" else struct_name
    struct_string = "type " + struct_name + " struct {\n"
    yaml_key = df['yaml_key'].str.split('.').str[-1]
    # Base case: generate fields for the current struct level
    for key, value in struct_data.items():
        selected_rows = df[yaml_key == key]
        if len(selected_rows) > 1:
            go_var = selected_rows['go_var'].values[1]
            toml_key = selected_rows['toml_key'].values[1]
            go_type = selected_rows['go_type'].values[1]
            json_key = selected_rows['json_key'].values[1]
        else:
            go_var = selected_rows['go_var'].values[0]
            toml_key = selected_rows['toml_key'].values[0]
            go_type = selected_rows['go_type'].values[0]
            json_key = selected_rows['json_key'].values[0]
        # Add fields to the body of the struct
        struct_string = add_fields_to_struct(struct_string, go_var, go_type, json_key, toml_key)
    struct_string += "}\n\n"
    # Recursive case: generate struct definitions for nested structs
    for key, value in struct_data.items():
        selected_rows = df[yaml_key == key]
        if len(selected_rows) > 1:
            go_var = selected_rows['go_var'].values[1]
        else:
            go_var = selected_rows['go_var'].values[0]
        if isinstance(value, dict) and any(isinstance(v, dict) for v in value.values()):
            nested_struct_name = go_var
            nested_struct_data = value
            struct_string += generate_go_struct(nested_struct_name, nested_struct_data)
    return struct_string

# Read the CSV
csv_file = "~/Downloads/my_configs.csv"
df = pd.read_csv(csv_file)
# Remove rows where all columns are NaN
df = df.dropna(how='all')
# Create the 'json_key' column using the custom function
df['json_key'] = df['yaml_key'].apply(extract_json_key)
data = df['yaml_key'].values.tolist()  # Read the 'yaml_key' column
data = pd.DataFrame({'column': data})  # Convert to dataframe
data = data['column'].str.split('.', expand=True)  # Split by '.'
nested_list = data.values.tolist()  # Convert to nested list
data = nested_list
result_json = convert_to_dict(data)  # Convert to dict (JSON)
# Generate the Go code
go_struct = generate_go_struct("", result_json)
# Write to file
file_path = "output.go"
with open(file_path, "w") as file:
    file.write(go_struct)
The problem shows up in this part of the CSV:
authentication.authenticator.basic
authentication.authenticator.basic.parameters
authentication.authenticator.basic.parameters.showAuthFailureReason
authentication.authenticator.basic.parameters.showAuthFailureReasonOnLoginPage
authentication.authenticator.totp
authentication.authenticator.totp.parameters
authentication.authenticator.totp.parameters.showAuthFailureReason
authentication.authenticator.totp.parameters.showAuthFailureReasonOnLoginPage
authentication.authenticator.totp.parameters.encodingMethod
authentication.authenticator.totp.parameters.timeStepSize
Because the parameters field is repeated under both basic and totp, the script gets confused and produces two TotpParameters structs. The expected outcome is one BasicParameters struct and one TotpParameters struct. Many similar repeated words appear in the CSV's yaml_key column.
I understand this has something to do with the index being hard-coded as 1 in go_var = selected_rows['go_var'].values[1], but I'm having a hard time fixing it.
Could anyone please point me to an answer? I think the root cause might be one of the following:
- an issue with the recursive function
- an issue in the code that generates the JSON
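Judging from the code above, a likely cause is that generate_go_struct matches rows on the last path segment only (yaml_key == key). As a result, parameters matches both the basic and the totp rows, and values[1] always picks the second one. A minimal sketch of an alternative carries the path prefix down the recursion and looks rows up by full path. It assumes the rows can be keyed by their full yaml_key: a plain dict stands in for the DataFrame, the go_var values are illustrative, and leaves are represented as empty dicts.

```python
# Stand-in for the CSV, keyed by the FULL yaml_key (illustrative go_var values).
ROWS = {
    "authentication": {"go_var": "Authentication"},
    "authentication.authenticator": {"go_var": "Authenticator"},
    "authentication.authenticator.basic": {"go_var": "Basic"},
    "authentication.authenticator.basic.parameters": {"go_var": "BasicParameters"},
    "authentication.authenticator.totp": {"go_var": "Totp"},
    "authentication.authenticator.totp.parameters": {"go_var": "TotpParameters"},
}


def struct_names(tree: dict, prefix: str = "") -> list[str]:
    """Return the go_var of every node that has children, walking depth-first."""
    names = []
    for key, value in tree.items():
        # Rebuild the full path instead of relying on the last segment alone
        path = f"{prefix}.{key}" if prefix else key
        if value:  # a non-empty dict means this node becomes a nested struct
            names.append(ROWS[path]["go_var"])
            names.extend(struct_names(value, path))
    return names


# Nested dict as produced by splitting yaml_key on '.'; leaves are empty dicts
tree = {
    "authentication": {
        "authenticator": {
            "basic": {"parameters": {"showAuthFailureReason": {}}},
            "totp": {"parameters": {"timeStepSize": {}}},
        }
    }
}

print(struct_names(tree))
```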
Thanks!
I also tried ChatGPT, but since this involves nesting and recursion, the answers it gave weren't very useful.
UPDATE
I found that the problem occurs for rows containing the properties, poolOptions, endpoint and parameters fields, because these names are repeated in the yaml_key column.
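One way to list every affected segment is to group the yaml_key values by their last segment. The sketch below assumes the keys have already been read into a list of strings; the function name ambiguous_segments is hypothetical.

```python
from collections import defaultdict


def ambiguous_segments(yaml_keys: list[str]) -> dict[str, list[str]]:
    """Map each last segment shared by more than one yaml_key to those keys."""
    by_segment: dict[str, list[str]] = defaultdict(list)
    for key in yaml_keys:
        by_segment[key.split(".")[-1]].append(key)
    return {seg: keys for seg, keys in by_segment.items() if len(keys) > 1}


keys = [
    "authentication.authenticator.basic",
    "authentication.authenticator.basic.parameters",
    "authentication.authenticator.basic.parameters.showAuthFailureReason",
    "authentication.authenticator.basic.parameters.showAuthFailureReasonOnLoginPage",
    "authentication.authenticator.totp",
    "authentication.authenticator.totp.parameters",
    "authentication.authenticator.totp.parameters.showAuthFailureReason",
    "authentication.authenticator.totp.parameters.showAuthFailureReasonOnLoginPage",
    "authentication.authenticator.totp.parameters.encodingMethod",
    "authentication.authenticator.totp.parameters.timeStepSize",
]

for segment, paths in ambiguous_segments(keys).items():
    print(segment, len(paths))
```

On this excerpt, the function reports parameters along with the leaf fields showAuthFailureReason and showAuthFailureReasonOnLoginPage. This suggests that the field lookup in the base case, not only the nested struct names, is exposed to the same last-segment matching.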
Answer 1
Score: 1
I was able to solve the issue, but I had to switch to a completely new approach: building a tree data structure and then traversing it. Here's the main logic behind it: https://www.geeksforgeeks.org/level-order-tree-traversal/
Here's the working Python code.
import pandas as pd
from collections import deque

structs = []

class TreeNode:
    def __init__(self, name):
        self.name = name
        self.children = []
        self.path = ""

    def add_child(self, child):
        self.children.append(child)

def create_tree(data):
    root = TreeNode('')
    for item in data:
        node = root
        for name in item.split('.'):
            existing_child = next((child for child in node.children if child.name == name), None)
            if existing_child:
                node = existing_child
            else:
                new_child = TreeNode(name)
                node.add_child(new_child)
                node = new_child
    return root

def generate_go_struct(struct_data):
    struct_name = struct_data['struct_name']
    fields = struct_data['fields']
    go_struct = f"type {struct_name} struct {{\n"
    for field in fields:
        field_name = field['name']
        field_type = field['type']
        field_default_val = str(field['default_val'])
        json_key = field['json_key']
        toml_key = field['toml_key']
        tail_part = f"\t{field_name} {field_type} `json:\"{json_key},omitempty\" toml:\"{toml_key}\"`\n\n"
        if pd.isna(field['default_val']):
            go_struct += tail_part
        else:
            field_default_val = "\t// +kubebuilder:default:=" + field_default_val
            go_struct += field_default_val + "\n" + tail_part
    go_struct += "}\n\n"
    return go_struct

def write_go_file(go_structs, file_path):
    with open(file_path, 'w') as file:
        for go_struct in go_structs:
            file.write(go_struct)

def create_new_struct(struct_name):
    struct_name = "Configurations" if struct_name == "" else struct_name
    struct_dict = {
        "struct_name": struct_name,
        "fields": []
    }
    return struct_dict

def add_field(struct_dict, field_name, field_type, default_val, json_key, toml_key):
    field_dict = {
        "name": field_name,
        "type": field_type,
        "default_val": default_val,
        "json_key": json_key,
        "toml_key": toml_key
    }
    struct_dict["fields"].append(field_dict)
    return struct_dict

def traverse_tree(root):
    queue = deque([root])
    while queue:
        node = queue.popleft()
        filtered_df = df[df['yaml_key'] == node.path]
        go_var = filtered_df['go_var'].values[0] if not filtered_df.empty else None
        go_type = filtered_df['go_type'].values[0] if not filtered_df.empty else None
        if node.path == "":
            go_type = "Configurations"
        # The structs themselves
        current_struct = create_new_struct(go_type)
        for child in node.children:
            if node.name != "":
                child.path = node.path + "." + child.name
            else:
                child.path = child.name
            filtered_df = df[df['yaml_key'] == child.path]
            go_var = filtered_df['go_var'].values[0] if not filtered_df.empty else None
            go_type = filtered_df['go_type'].values[0] if not filtered_df.empty else None
            default_val = filtered_df['default_val'].values[0] if not filtered_df.empty else None
            # Struct fields
            json_key = filtered_df['yaml_key'].values[0].split('.')[-1] if not filtered_df.empty else None
            toml_key = filtered_df['toml_key'].values[0].split('.')[-1] if not filtered_df.empty else None
            current_struct = add_field(current_struct, go_var, go_type, default_val, json_key, toml_key)
            if child.children:
                # Add each child to the queue for processing
                queue.append(child)
        go_struct = generate_go_struct(current_struct)
        # print(go_struct, "\n")
        structs.append(go_struct)
    write_go_file(structs, "output.go")

csv_file = "~/Downloads/my_configs.csv"
df = pd.read_csv(csv_file)
sample_data = df['yaml_key'].values.tolist()
# Create the tree
tree = create_tree(sample_data)
# Traverse the tree
traverse_tree(tree)
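The key difference from the question's script is that every tree node is identified by its full path, so basic.parameters and totp.parameters can never be confused. The same idea can be sketched with the standard library alone, using nested dicts as the tree and a deque for the level-order walk. This is an illustration, not the code above: level_order_structs is a hypothetical name, and it only returns paths, not Go code.

```python
from collections import deque


def level_order_structs(yaml_keys: list[str]) -> list[tuple[str, list[str]]]:
    """Return (path, child paths) for every node with children, in level order.

    The empty path "" is the root, which the answer above names Configurations.
    """
    # Build a trie of nested dicts: each key segment becomes one level
    root: dict = {}
    for key in yaml_keys:
        node = root
        for part in key.split("."):
            node = node.setdefault(part, {})

    result = []
    queue = deque([("", root)])
    while queue:
        path, children = queue.popleft()
        if not children:
            continue  # leaves become struct fields, not structs
        child_paths = []
        for name, grandchildren in children.items():
            child_path = f"{path}.{name}" if path else name
            child_paths.append(child_path)
            queue.append((child_path, grandchildren))
        result.append((path, child_paths))
    return result


keys = [
    "authentication.authenticator.basic.parameters.showAuthFailureReason",
    "authentication.authenticator.totp.parameters.showAuthFailureReason",
    "authentication.authenticator.totp.parameters.timeStepSize",
]
for path, child_paths in level_order_structs(keys):
    print(repr(path), "->", child_paths)
```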
Thanks a lot for all your help!
Answer 2
Score: 0
Rukshan, I see you did some work and came up with a solution you like. I tried it and still got some odd errors. Maybe the CSV or the code has changed and it does run without errors for you; I don't know.
I see problems in the output, such as Endpoint being duplicated and Go's built-in string type being redefined:
type string struct {
	FromAddress string `json:"fromAddress,omitempty" toml:"from_address"`
	Username string `json:"username,omitempty" toml:"username"`
	Password string `json:"password,omitempty" toml:"password"`
	Hostname string `json:"hostname,omitempty" toml:"hostname"`
	...
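In the accepted script, a node's struct name comes from its own go_type, and only nodes with children become structs. A row that has children but a scalar go_type such as string therefore produces type string struct. That shadows the predeclared type and breaks every other string field in the package. A check like the one below could surface both symptoms at generation time. It is a sketch: check_struct_names is a hypothetical helper, and the example paths are illustrative, not taken from my_configs.csv.

```python
# Go's predeclared type identifiers, which a generated struct must not reuse
GO_PREDECLARED_TYPES = frozenset({
    "any", "bool", "byte", "comparable", "complex64", "complex128", "error",
    "float32", "float64", "int", "int8", "int16", "int32", "int64", "rune",
    "string", "uint", "uint8", "uint16", "uint32", "uint64", "uintptr",
})


def check_struct_names(structs: list[tuple[str, str]]) -> list[str]:
    """Return a problem message for each bad (struct_name, yaml_key_path) pair."""
    problems = []
    seen: dict[str, str] = {}  # struct name -> first yaml_key_path that used it
    for name, path in structs:
        if name in GO_PREDECLARED_TYPES:
            problems.append(f"{path}: struct name {name!r} shadows a predeclared Go type")
        elif name in seen:
            problems.append(f"{path}: struct name {name!r} is already used by {seen[name]}")
        else:
            seen[name] = path
    return problems


# Illustrative (struct name, yaml_key_path) pairs
example = [
    ("Endpoint", "a.endpoint"),
    ("Endpoint", "b.endpoint"),
    ("string", "c.sender"),
    ("Configurations", ""),
]
for problem in check_struct_names(example):
    print(problem)
```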
I've worked out a different solution.
Starting with a simple sample CSV like this:
| toml_key | yaml_key | go_var | go_type |
|----------|----------|--------|---------|
| a | a | A | A |
| a.b | a.b | B | string |
| a.c | a.c | C | string |
| a.d | a.d | D | bool |
| e | e | E | E |
| e.f | e.f | F | F |
| e.f.g | e.f.g | G | string |
| e.h | e.h | H | string |
My program generates the following Go code:
// Code generated by ... DO NOT EDIT.
package main

type A struct {
	B string `json:"b,omitempty" toml:"b"`
	C string `json:"c,omitempty" toml:"c"`
	D bool   `json:"d,omitempty" toml:"d"`
}

type E struct {
	F struct {
		G string `json:"g,omitempty" toml:"g"`
	} `json:"f,omitempty" toml:"f"`
	H string `json:"h,omitempty" toml:"h"`
}
I chose to generate nested structs because I don't see the need for F to be its own named type referenced from E. This allows a direct approach: parse the structure from the CSV rows, then generate the Go code. My program also handles F and H as siblings, even though the more deeply nested G comes before H in the CSV.
I started with a typed dict named TypeDef:
from typing import TypedDict


class TypeDef(TypedDict):
    """A (possible) hierarchy of config values that will be materialized into (possibly nested) Go structs."""

    yaml_key_path: str
    """The original dot-separated YAML key path."""
    toml_name: str
    yaml_name: str
    go_name: str
    go_type: str
    fields: list["TypeDef"]
The fields key lets me recursively nest more TypeDefs, mirroring the nested structs shown above.
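The answer doesn't show how each CSV row becomes a TypeDef (the full code is linked). The sketch below is one plausible version. It assumes the column order toml_key, yaml_key, go_var, go_type from the sample table, and that the tag names are the last segment of each key, which matches the userStore dump further down. The name row_to_typedef is hypothetical.

```python
from typing import TypedDict


class TypeDef(TypedDict):
    yaml_key_path: str
    toml_name: str
    yaml_name: str
    go_name: str
    go_type: str
    fields: list["TypeDef"]


def row_to_typedef(row: list[str]) -> TypeDef:
    """Build a childless TypeDef from one CSV row (toml_key, yaml_key, go_var, go_type)."""
    toml_key, yaml_key, go_var, go_type = row
    return {
        "yaml_key_path": yaml_key,
        # Struct tags only need the last dot-separated segment of each key
        "toml_name": toml_key.split(".")[-1],
        "yaml_name": yaml_key.split(".")[-1],
        "go_name": go_var,
        "go_type": go_type,
        "fields": [],  # children get attached later, e.g. by add()
    }


td = row_to_typedef([
    "user_store.properties.CaseInsensitiveUsername",
    "userStore.properties.caseInsensitiveUsername",
    "CaseInsensitiveUsername",
    "bool",
])
print(td["toml_name"], td["yaml_name"], td["go_type"])
```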
I struggled with how to read the CSV and settled on this:
def add(td: TypeDef, tl_tds: list[TypeDef]):
    """
    Adds td either as a top-level TypeDef, directly to a top-level TypeDef's fields, or to
    a sub-field of a top-level TypeDef.
    """
    print(f"adding {td['yaml_key_path']}")
    path_keys = td["yaml_key_path"].split(".")

    if len(path_keys) == 1:
        tl_tds.append(td)
        return

    tl_key = path_keys[0]
    tl_td: TypeDef | None = None
    for x_td in tl_tds:
        if x_td["yaml_name"] == tl_key:
            tl_td = x_td
            break
    assert tl_td is not None, f"could not find top-level TypeDef with key {tl_key}"

    if len(path_keys) == 2:
        tl_td["fields"].append(td)
        return

    parent_td = tl_td  # rename top-level to parent (same object)
    # Skip top-level key and omit final key. If not all intermediate keys exist as
    # prescribed by YAML key the last-found parent will be used.
    intermediate_keys = path_keys[1:-1]
    for key in intermediate_keys:
        for child_td in parent_td["fields"]:
            if child_td["yaml_name"] == key:
                parent_td = child_td
                break
    parent_td["fields"].append(td)
The function handles three cases for a TypeDef:
- without a parent: just add it to the list of structs
- with exactly one parent: add it to the fields of the top-level TypeDef
- with any number of intermediate (parent) TypeDefs: find the last parent TypeDef. Note the comment about missing intermediate keys/TypeDefs: some of the configs specify YAML key paths that don't exist:
user_store.connection_password,userStore.connectionPassword,ConnectionPassword,string
user_store.properties.CaseInsensitiveUsername,userStore.properties.caseInsensitiveUsername,CaseInsensitiveUsername,bool
CaseInsensitiveUsername appears to be a child of properties, but properties was never defined on its own, so the program will add CaseInsensitiveUsername directly to userStore, like:
[
    ...
    {
        "yaml_key_path": "userStore",
        "toml_name": "user_store",
        "yaml_name": "userStore",
        "go_name": "UserStore",
        "go_type": "UserStore",
        "fields": [
            {
                "yaml_key_path": "userStore.type",
                "toml_name": "type",
                "yaml_name": "type",
                "go_name": "Type",
                "go_type": "string",
                "fields": [],
            },
            ...
            {
                "yaml_key_path": "userStore.properties.caseInsensitiveUsername",
                "toml_name": "CaseInsensitiveUsername",
                "yaml_name": "caseInsensitiveUsername",
                "go_name": "CaseInsensitiveUsername",
                "go_type": "bool",
                "fields": [],
            },
            ...
        ],
    }
    ...
]
Writing that structure out as Go code is then fairly simple.
I start by iterating over the top-level TypeDefs (structs) in the list:
# w(s) is a helper (not shown here) that writes the string s to the output file
w("// Code generated by ... DO NOT EDIT.\n")
w("package main\n")
w("\n")

# Top-level TypeDefs are 'types' in Go
for tl_td in tl_tds:
    w("type ")
    write_td(tl_td)
    w("\n\n")
and for each top-level TypeDef, recursively write its fields:
def write_td(td: TypeDef):
    """Write a TypeDef (and recursively its fields)."""

    def write_struct_tag(td: TypeDef):
        w(f"`json:\"{td['yaml_name']},omitempty\" toml:\"{td['toml_name']}\"`")

    w("\n")
    w(td["go_name"] + " ")

    # If not a struct (simple Go type)
    if td["fields"] == []:
        w(td["go_type"])
        return

    # Else a struct
    w("struct {")
    for x_td in td["fields"]:
        write_td(x_td)
        write_struct_tag(x_td)
    w("}")
The final Go code looks a bit sloppy:
// Code generated by ... DO NOT EDIT.
package main
type
A struct {
B string`json:"b,omitempty" toml:"b"`
C string`json:"c,omitempty" toml:"c"`
D bool`json:"d,omitempty" toml:"d"`}
type
E struct {
F struct {
G string`json:"g,omitempty" toml:"g"`}`json:"f,omitempty" toml:"f"`
H string`json:"h,omitempty" toml:"h"`}
but it has the minimum number of line breaks and spaces needed to be syntactically correct, which is all gofmt needs:
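import subprocess

try:
    subprocess.run(["go", "fmt", "output.go"], check=True)
except FileNotFoundError:
    # Raised when the go command itself is not installed or not on PATH
    print("could not run go fmt output.go: the go command was not found")
except subprocess.CalledProcessError as e:
    print(f"could not run go fmt output.go: {e}")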
to make it look like the sample I shared at the top of the post.
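If Go isn't available wherever the generator runs, a variant of write_td can track the nesting depth and emit tab-indented code directly. This is a sketch, not the answer's code: render_go is a hypothetical name. Unlike gofmt, it does not align field types and tags into columns, so running go fmt afterwards may still adjust whitespace.

```python
from typing import TypedDict


class TypeDef(TypedDict):
    yaml_key_path: str
    toml_name: str
    yaml_name: str
    go_name: str
    go_type: str
    fields: list["TypeDef"]


def render_go(tl_tds: list[TypeDef]) -> str:
    """Render top-level TypeDefs as tab-indented Go source without gofmt."""
    out: list[str] = []

    def tag(td: TypeDef) -> str:
        return f'`json:"{td["yaml_name"]},omitempty" toml:"{td["toml_name"]}"`'

    def write_fields(td: TypeDef, depth: int) -> None:
        indent = "\t" * depth
        for child in td["fields"]:
            if child["fields"]:
                # Nested anonymous struct: open it, recurse one level deeper, close with its tag
                out.append(f"{indent}{child['go_name']} struct {{\n")
                write_fields(child, depth + 1)
                out.append(f"{indent}}} {tag(child)}\n")
            else:
                out.append(f"{indent}{child['go_name']} {child['go_type']} {tag(child)}\n")

    out.append("// Code generated by ... DO NOT EDIT.\npackage main\n")
    for td in tl_tds:
        if not td["fields"]:
            # A top-level key without children becomes a plain named type
            out.append(f"\ntype {td['go_name']} {td['go_type']}\n")
            continue
        out.append(f"\ntype {td['go_name']} struct {{\n")
        write_fields(td, 1)
        out.append("}\n")
    return "".join(out)


def leaf(name: str, go_type: str) -> TypeDef:
    # Illustrative helper: key names are just the lower-cased Go name here
    low = name.lower()
    return {"yaml_key_path": low, "toml_name": low, "yaml_name": low,
            "go_name": name, "go_type": go_type, "fields": []}


# The E/F/G/H sample from the table above
f_td: TypeDef = {**leaf("F", "F"), "fields": [leaf("G", "string")]}
e_td: TypeDef = {**leaf("E", "E"), "fields": [f_td, leaf("H", "string")]}
print(render_go([e_td]))
```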
You can see all the Python code, CSVs, and generated Go code I used/made in this solution, here.
In the full code you'll see that I don't use pandas (in my opinion it isn't the right tool here), and that I validate each row:
for row in reader:
    for i, x in enumerate(row):
        assert x != "", f"on line {reader.line_num} field {i+1} was empty"
because I found a row in my_configs.csv that was missing a Go type:
AssertionError: on line 165 field 4 was empty
which corresponds to this row:
authentication.endpoint.enableMergingCustomClaimMappingsWithDefault,authentication.endpoint.enableMergingCustomClaimMappingsWithDefault,EnableMergingCustomClaimMappingsWithDefault,
I believe this row is missing the Go type bool, so I added it.
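The assert above stops at the first empty field. When cleaning up a large CSV, it can be handier to collect every problem in one pass. The sketch below uses only the csv module. It assumes a header row and reads from a string for illustration; find_empty_fields is a hypothetical name.

```python
import csv
import io


def find_empty_fields(csv_text: str) -> list[str]:
    """Return one message per empty field instead of stopping at the first."""
    problems: list[str] = []
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row (toml_key,yaml_key,go_var,go_type)
    for row in reader:
        for i, value in enumerate(row):
            if value.strip() == "":
                problems.append(f"on line {reader.line_num} field {i + 1} was empty")
    return problems


sample = (
    "toml_key,yaml_key,go_var,go_type\n"
    "a,a,A,A\n"
    "a.b,a.b,B,\n"       # missing go_type, like line 165 of my_configs.csv
    "a.c,,C,string\n"    # missing yaml_key
)
for problem in find_empty_fields(sample):
    print(problem)
```

To run it on the real file, open it with newline="" and pass the file's contents to find_empty_fields.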