Convert specific columns to a list and then create JSON

Question


I have a spreadsheet like the following:
[Spreadsheet screenshot: columns sku, brand, tags_0, tags_1, tags_2, ...]

As you can see, there are multiple "tags" columns, such as "tags_0", "tags_1" and "tags_2", and there can be more of them.

I'm trying to collect all the "tags" values of each row into a list using a pandas DataFrame, and eventually write them as a "tags" array in a JSON file.

I thought of using regex, but I can't find a way to apply it.

This is the function I'm using to write the JSON file. I added the "tags" array as a placeholder:

import json

import pandas as pd

def convert_products():
    read_exc = pd.read_excel('./data/products.xlsx')
    df = pd.DataFrame(read_exc)
    all_data = []

    for i in range(len(df)):
        js = {
            "sku": df['sku'][i],
            "brand": df['brand'][i],
            "tags": [?]
        }

        all_data.append(js)

    json_object = json.dumps(all_data, ensure_ascii=False, indent=2)

    with open("./data/products.json", "w", encoding='utf-8') as outfile:
        outfile.write(json_object)

How can I achieve this?

Thanks

Answer 1

Score: 1

You can achieve that in a much simpler way with something like this:

import pandas as pd

df = pd.read_excel('your_file.xlsx')

# Names of all columns that start with "tags_"
tags_columns = [col for col in df.columns if col.startswith("tags_")]

# One list of tag values per row
df["tags"] = df[tags_columns].values.tolist()

df[["sku","brand","tags"]].to_json("test.json",orient="records")

You can also try other JSON orientations if you need them: ["index","columns","split","records","values","table"]. See the pandas documentation for details: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html
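To try this without an Excel file, here is a minimal sketch that builds a small DataFrame in memory in place of pd.read_excel. The first row reuses the values from the sample output in Answer 3; the second row (ADX113, poster) is hypothetical. Rows with fewer tags have empty cells, which .values.tolist() keeps and to_json writes as null, so this version swaps in a per-row dropna(); that step is an addition, not part of the answer.

```python
import json

import pandas as pd

# Hypothetical sample data standing in for pd.read_excel('your_file.xlsx')
df = pd.DataFrame(
    {
        "sku": ["ADX112", "ADX113"],
        "brand": ["ADX", "ADX"],
        "tags_0": ["art", "poster"],
        "tags_1": ["frame", None],
        "tags_2": ["painting", None],
    }
)

# Every column whose name starts with "tags_", in spreadsheet order
tags_columns = [col for col in df.columns if col.startswith("tags_")]

# One list per row; dropna() removes the empty cells of rows with fewer tags
df["tags"] = df[tags_columns].apply(lambda row: row.dropna().tolist(), axis=1)

json_text = df[["sku", "brand", "tags"]].to_json(orient="records")

# Re-serialize with indentation only to make the printed result readable
print(json.dumps(json.loads(json_text), indent=2))
```

With the answer's original .values.tolist() line instead, the second record would come out as "tags":["poster",null,null].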

Answer 2

Score: 0


First, you can get all the column names as a list:

list(df.columns.values)

Then search this list for the column names that contain tags_. Once you have all the tag column names, loop through them, retrieve the value of each tag column for the current row, put those values into a list, and pass that list into the JSON object:

# Names of all columns that contain "tags_"
tagColumnList = [col for col in df.columns if "tags_" in col]

for i in range(len(df)):
    tagList = []
    for tagColumn in tagColumnList:
        tagList.append(df[tagColumn][i])

    # ... your code for creating the JSON object ...
    # pass tagList as the value of the "tags" key in the JSON object
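Below is a runnable sketch of the loop described above, plugged into the shape of the question's convert_products function. The helper name build_records and the in-memory sample DataFrame are hypothetical stand-ins for the question's Excel-reading code, and the pd.notna check that skips empty cells is an addition to the answer. It uses .iloc[i] for positional access, which behaves like df[tagColumn][i] as long as the DataFrame keeps its default 0..n-1 index.

```python
import json

import pandas as pd


def build_records(df):
    """Hypothetical helper: one dict per row, with all tags_* values in a list."""
    # Column names that contain "tags_", as described above
    tag_column_list = [col for col in df.columns if "tags_" in col]

    all_data = []
    for i in range(len(df)):
        tag_list = []
        for tag_column in tag_column_list:
            value = df[tag_column].iloc[i]
            if pd.notna(value):  # skip empty cells (an addition to the answer)
                tag_list.append(value)
        all_data.append(
            {"sku": df["sku"].iloc[i], "brand": df["brand"].iloc[i], "tags": tag_list}
        )
    return all_data


# Hypothetical sample data standing in for pd.read_excel('./data/products.xlsx')
df = pd.DataFrame(
    {
        "sku": ["ADX112"],
        "brand": ["ADX"],
        "tags_0": ["art"],
        "tags_1": ["frame"],
        "tags_2": ["painting"],
    }
)

json_object = json.dumps(build_records(df), ensure_ascii=False, indent=2)
print(json_object)
```

Writing json_object to ./data/products.json then works exactly as in the question's function.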
        

Answer 3

Score: 0


You are probably looking for DataFrame.filter, which can select the tag columns with a regex:

import pandas as pd

# df is the DataFrame read from the spreadsheet
out = pd.concat([df[['sku', 'brand']], 
                 df.filter(regex='^tags_').agg(list, axis=1).rename('tags')],
                axis=1).to_json(orient='records', indent=2)
print(out)

# Output
[
  {
    "sku":"ADX112",
    "brand":"ADX",
    "tags":[
      "art",
      "frame",
      "painting"
    ]
  }
]
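Since the question mentions regex: filter(regex=...) keeps the column labels for which re.search(regex, label) finds a match, so '^tags_' selects every label starting with tags_. A slightly stricter sketch, assuming tag columns always end in a number, matches the whole name with ^tags_(\d+)$ (so a hypothetical column such as tags_note is ignored) and orders the matches by their numeric suffix, because as plain strings tags_10 would sort before tags_2. The sample DataFrame and its column order are made up for illustration.

```python
import re

import pandas as pd

# Hypothetical sample: tag columns out of order, plus an unrelated "tags_note" column
df = pd.DataFrame(
    {
        "sku": ["ADX112"],
        "brand": ["ADX"],
        "tags_10": ["sale"],
        "tags_2": ["painting"],
        "tags_0": ["art"],
        "tags_1": ["frame"],
        "tags_note": ["internal"],
    }
)

# Only labels of the exact form tags_<number>; the number is captured in group 1
pattern = re.compile(r"^tags_(\d+)$")

# Matching columns, ordered numerically by their suffix (0, 1, 2, 10)
tag_columns = sorted(
    (col for col in df.columns if pattern.match(col)),
    key=lambda col: int(pattern.match(col).group(1)),
)

# One list per row, built the same way as in the answer above
tags = df[tag_columns].agg(list, axis=1).rename("tags")

out = pd.concat([df[["sku", "brand"]], tags], axis=1).to_json(orient="records", indent=2)
print(out)
```

If the spreadsheet always lists the tag columns in order, the answer's df.filter(regex='^tags_') is enough; the sorting only matters when the column order cannot be trusted.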

huangapple
  • Posted on 2023-02-19 13:44:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75498232.html