2023-02-19 13:44:14

Convert specific columns to a list and then create JSON

Question

I have a spreadsheet with several "tags" columns named "tags_0", "tags_1", "tags_2", and so on. There can be more of them.

I'm trying to collect all the "tags" values for each row into a list using a pandas DataFrame, and eventually write them to a "tags" array in a JSON file.

I thought of using regex, but I can't find a way to apply it.

This is the function I'm using to output the JSON file. I added the "tags" array for reference:

    import json
    import pandas as pd

    def convert_products():
        read_exc = pd.read_excel('./data/products.xlsx')
        df = pd.DataFrame(read_exc)
        all_data = []

        for i in range(len(df)):
            js = {
                "sku": df['sku'][i],
                "brand": df['brand'][i],
                "tags": [?]
            }

            all_data.append(js)

        json_object = json.dumps(all_data, ensure_ascii=False, indent=2)

        with open("./data/products.json", "w", encoding='utf-8') as outfile:
            outfile.write(json_object)

How can I achieve this?

Thanks

Answer 1

Score: 1

You can achieve that much more easily with something like this:

    import pandas as pd

    df = pd.read_excel('your_file.xlsx')

    # Collect every column whose name starts with "tags_"
    tags_columns = [col for col in df.columns if col.startswith("tags_")]

    # One list of tag values per row
    df["tags"] = df[tags_columns].values.tolist()

    df[["sku", "brand", "tags"]].to_json("test.json", orient="records")

You can try other JSON orientations if you want: `["index","columns","split","records","values","table"]`. See the [pandas documentation][1] for details.

  [1]: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html
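One case the `.values.tolist()` approach does not cover is a row with fewer tags than there are tag columns: empty spreadsheet cells are read as NaN and end up as `null` entries in the JSON array. The following is a minimal sketch of one way to skip them, assuming that is the desired behavior. The sample DataFrame (including the `BXY200` row) is hypothetical and stands in for the spreadsheet.

```python
import json
import pandas as pd

# Hypothetical sample data standing in for the spreadsheet;
# the second row has only one tag, the other cells are empty.
df = pd.DataFrame({
    "sku": ["ADX112", "BXY200"],
    "brand": ["ADX", "BXY"],
    "tags_0": ["art", "poster"],
    "tags_1": ["frame", None],
    "tags_2": ["painting", None],
})

tag_columns = [col for col in df.columns if col.startswith("tags_")]

# Build one list per row, skipping empty (NaN/None) cells.
df["tags"] = [
    [tag for tag in row if pd.notna(tag)]
    for row in df[tag_columns].to_numpy()
]

# Serialize only the columns wanted in the output.
records = json.loads(df[["sku", "brand", "tags"]].to_json(orient="records"))
print(json.dumps(records, ensure_ascii=False, indent=2))
```

Passing a file path to `to_json`, as in the answer above, would write the same records straight to disk.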

Answer 2

Score: 0

First, you can get all the column names as a list:

    list(df.columns.values)

Next, search this list for every column name that contains `tags_`. Once you have the tag column names, loop over them for each row, collect that row's tag values in a list, and pass the list to the JSON object:

    tagColumnList = [col for col in df.columns if "tags_" in col]

    for i in range(len(df)):
        tagList = []
        for tagColumn in tagColumnList:
            tagList.append(df[tagColumn][i])

        # ... your code for creating the JSON object ...
        # pass tagList as the value of the "tags" key
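Put together, the loop described above might look like the sketch below. The helper name `build_records` and the sample data are hypothetical, and skipping empty cells with `pd.notna` is an added assumption rather than part of the answer. It uses an in-memory DataFrame so it runs without an Excel file.

```python
import json
import pandas as pd

def build_records(df):
    """Return one dict per row, gathering all tags_* columns into a list."""
    tag_columns = [col for col in df.columns if "tags_" in col]
    all_data = []
    for _, row in df.iterrows():
        # Collect this row's tag values, skipping empty cells (assumption).
        tag_list = [row[col] for col in tag_columns if pd.notna(row[col])]
        all_data.append({
            "sku": row["sku"],
            "brand": row["brand"],
            "tags": tag_list,
        })
    return all_data

# Hypothetical sample data standing in for pd.read_excel(...)
df = pd.DataFrame({
    "sku": ["ADX112", "BXY200"],
    "brand": ["ADX", "BXY"],
    "tags_0": ["art", "poster"],
    "tags_1": ["frame", None],
    "tags_2": ["painting", None],
})

json_object = json.dumps(build_records(df), ensure_ascii=False, indent=2)
print(json_object)
```

Note that `json.dumps` cannot serialize NumPy scalar types, so if a column such as `sku` holds numbers, the values may need converting to plain Python types first (for example with `int()`).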

Answer 3

Score: 0

You are probably looking for `filter`:

    out = pd.concat([df[['sku', 'brand']],
                     df.filter(regex='^tags_').agg(list, axis=1).rename('tags')],
                    axis=1).to_json(orient='records', indent=2)
    print(out)

    # Output
    [
      {
        "sku":"ADX112",
        "brand":"ADX",
        "tags":[
          "art",
          "frame",
          "painting"
        ]
      }
    ]
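Since the question asked about regex specifically, here is a small sketch of how the pattern passed to `filter` controls which columns are selected. The `tags_note` column is a hypothetical example of a non-tag column that happens to share the prefix; if the spreadsheet has nothing like it, the looser `^tags_` pattern is enough.

```python
import pandas as pd

df = pd.DataFrame({
    "sku": ["ADX112"],
    "brand": ["ADX"],
    "tags_0": ["art"],
    "tags_1": ["frame"],
    "tags_note": ["internal"],  # hypothetical column sharing the "tags_" prefix
})

# Matches any column name that starts with "tags_"
loose = df.filter(regex=r"^tags_").columns.tolist()

# Matches only "tags_" followed by digits, e.g. tags_0, tags_1, tags_12
strict = df.filter(regex=r"^tags_\d+$").columns.tolist()

print(loose)   # ['tags_0', 'tags_1', 'tags_note']
print(strict)  # ['tags_0', 'tags_1']
```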
