2023-06-16 10:54:49

JSON to CSV in Python, CSV has more rows than JSON

Question

I am trying to convert JSON to CSV with Python, but 2 lines of JSON produce 6 lines of CSV (a header plus 5 data rows). Here are samples of my original JSON and the resulting CSV output:

JSON:

{ "_id" : "a1", "pl" : [ { "age" : 45, "n" : "ar" }, { "age" : 52, "n" : "Pi" }, { "age" : 18, "n" : "al" } ], "ld" : 13 }
{ "_id" : "a2", "pl" : [ { "age" : 85, "n" : "ta" }, { "age" : 46, "n" : "lee" } ], "ld" : 14 }

CSV:

_id, ld, age, n, age, n
a1, 13, 45, ar, 45, ar
a1, 13, 52, Pi, 52, Pi
a1, 13, 18, al, 18, al
a2, 14, 85, ta, 85, ta
a2, 14, 46, lee, 46, lee

I am processing 2 lines of JSON but unexpectedly get 5 data rows of CSV output.
Below is the Python code I am currently using:
```python
import glob

import pandas as pd

# Raw strings keep '\1' from being read as an escape sequence.
for i in glob.glob(r'D:\1.json'):
    data = []
    # Read the JSON Lines file in chunks of 100 records.
    df_it = pd.read_json(i, encoding='utf-8', lines=True, chunksize=100, dtype=object)
    for sub in df_it:
        data.append(sub)
    df = pd.concat(data)
    # explode() emits one row per element of 'pl', which multiplies the rows.
    df = df.explode('pl')
    df = pd.concat([
        df.reset_index(drop=True),
        pd.json_normalize(df.pl),
    ], axis=1)
    df = df.drop(['pl'], axis=1)
    df.to_csv(r'D:\1.csv', index=None, encoding='utf-8')
```

I am hoping to get a CSV output formatted like this:

_id, pl.age1, pl.n1, pl.age2, pl.n2, pl.age3, pl.n3, ld
a1, 45, ar, 52, Pi, 18, al, 13
a2, 85, ta, 46, lee, , , 14

What changes to my code or implementation do I need to make to achieve my desired CSV output?


# Answer 1
**Score**: 0
I am assuming that each JSON document is stored in its own file.
**Implementation:**
```python3
import glob
import json

import pandas as pd
from cherrypicker import CherryPicker

rows = []
for file_name in glob.glob("json/*.json"):
    with open(file_name, encoding="utf-8") as file:
        data = json.load(file)
    # Flatten nested keys into names such as pl_0_age.
    picker = CherryPicker(data)
    rows.append(picker.flatten().get())

# DataFrame.append was removed in pandas 2.0, so build the frame in one step.
df = pd.DataFrame(rows)
df.to_csv("output.csv", index=False)
```

Output:

_id,pl_0_age,pl_0_n,pl_1_age,pl_1_n,pl_2_age,pl_2_n,ld
a1,45,ar,52,Pi,18.0,al,13
a2,85,ta,46,lee,,,14
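
If adding the cherrypicker dependency is not an option, the same wide layout can probably be produced with pandas alone. The following is a minimal sketch, assuming the input is JSON Lines as in the question and that every `pl` value is a list of flat dicts. `SAMPLE` and `flatten_pl` are hypothetical names introduced here for illustration, not part of pandas.

```python
import io

import pandas as pd

# Sample input copied from the question: one JSON object per line.
SAMPLE = (
    '{"_id": "a1", "pl": [{"age": 45, "n": "ar"}, {"age": 52, "n": "Pi"}, '
    '{"age": 18, "n": "al"}], "ld": 13}\n'
    '{"_id": "a2", "pl": [{"age": 85, "n": "ta"}, {"age": 46, "n": "lee"}], '
    '"ld": 14}\n'
)


def flatten_pl(df, list_col="pl"):
    """Spread each list of dicts in `list_col` over numbered columns.

    Hypothetical helper: keeps one output row per input record instead of
    one row per list element, which is what explode() would produce.
    """
    rows = []
    for items in df[list_col]:
        flat = {}
        # Count from 1 to match the pl.age1, pl.n1, ... naming in the question.
        for pos, item in enumerate(items, start=1):
            for key, value in item.items():
                flat[f"{list_col}.{key}{pos}"] = value
        rows.append(flat)
    wide = pd.DataFrame(rows, index=df.index)
    return pd.concat([df.drop(columns=[list_col]), wide], axis=1)


df = pd.read_json(io.StringIO(SAMPLE), lines=True)
result = flatten_pl(df)
# Move "ld" to the end to match the column order requested in the question.
result = result[[c for c in result.columns if c != "ld"] + ["ld"]]
print(result.to_csv(index=False))
```

Records with fewer `pl` entries get empty cells in the trailing columns. As with the `18.0` in the output above, a column that contains a missing value is stored as a float.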

