Python: json_normalize gives AttributeError for list of dict values
# Question
I have a pandas DataFrame in which two columns are nested columns containing `Decimal` values. `df.tail(1).to_dict('list')` gives this kind of data:

```python
{'nested_col1': [array([{'key1': 'CO', 'key2': Decimal('8.940000000')}], dtype=object)],
 'nested_col2': [array([{'key3': 'CO', 'key4': 'P14', 'key5': Decimal('8.940000000'), 'key6': None}], dtype=object)]}
```
I am trying to explode the DataFrame with this:

```python
df = (df.drop(cols, axis=1)
        .join(pd.concat(
            [pd.json_normalize(df[x].explode(), errors='ignore').applymap(
                lambda x: str(x) if isinstance(x, (int, float)) else x).add_prefix(f'{x}.')
             for x in cols],
            axis=1)))
```
In some cases, however, I get the following error:

```
Traceback (most recent call last):
  File "data_load.py.py", line 365, in <module>
    df = prepare_data(data, transaction_id, cohort_no)
  File "data_load.py.py", line 274, in prepare_data
    df = flatten_dataframe(cols_to_explode, df)
  File "data_load.py.py", line 204, in flatten_dataframe
    df1 = pd.concat([pd.json_normalize(df[c].explode()) for c in cols],
  File "data_load.py.py", line 204, in <listcomp>
    df1 = pd.concat([pd.json_normalize(df[c].explode()) for c in cols],
  File "/project1/venv/lib/python3.6/site-packages/pandas/io/json/_normalize.py", line 270, in _json_normalize
    if any([isinstance(x, dict) for x in y.values()] for y in data):
  File "/project1/venv/lib/python3.6/site-packages/pandas/io/json/_normalize.py", line 270, in <genexpr>
    if any([isinstance(x, dict) for x in y.values()] for y in data):
AttributeError: 'float' object has no attribute 'values'
failed to run commands: exit status 1
```
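Is there anything I am still missing, or is there a better way to do the same thing?

The expected output should have columns like `nested_col1.key1`, `nested_col1.key2`, `nested_col2.key3`, and so on.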
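The traceback fails on `y.values()` for an item `y` of the input, so at least one item handed to `json_normalize` is a float rather than a dict. In pandas that float is most likely `NaN`: `explode()` yields `NaN` for a row whose nested value is missing or an empty array. A minimal sketch, assuming that is what happens in "some cases", replaces every non-dict item with an empty dict before normalizing. The helper name `normalize_column` and the sample data are hypothetical.

```python
from decimal import Decimal

import numpy as np
import pandas as pd


def normalize_column(s: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: explode a column of dict arrays and flatten it."""
    exploded = s.explode()
    # explode() leaves a float NaN for missing values and empty arrays;
    # swap anything that is not a dict for {} so json_normalize only sees dicts.
    records = [r if isinstance(r, dict) else {} for r in exploded]
    flat = pd.json_normalize(records)
    # Reuse the exploded row labels so the result can be joined back to the frame.
    flat.index = exploded.index
    return flat.add_prefix(f"{s.name}.")


# Hypothetical sample: row 1 holds NaN instead of an array of dicts.
df = pd.DataFrame({
    "nested_col1": [
        np.array([{"key1": "CO", "key2": Decimal("8.940000000")}], dtype=object),
        np.nan,
    ],
})

out = normalize_column(df["nested_col1"])
print(out)
```

Rows without records come back as all-`NaN` rows instead of raising, and when each row holds a single record the index matches `df`, so the result can be passed to `df.join(...)`.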
# Answer 1
**Score**: 1
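It seems there is only one element in each nested column:

```python
import pandas as pd

# Take the single record of each nested column and prefix its keys with the column name.
out = pd.concat([pd.json_normalize(df[x][0]).add_prefix(f'{x}.')
                 for x in cols], axis=1)
# Convert numeric-looking values to numbers; keep everything else as it was.
out = out.apply(pd.to_numeric, errors='coerce').fillna(out)
print(out)
```

Output:

```
   nested_col1.key1  nested_col1.key2  nested_col2.key3  nested_col2.key4  nested_col2.key5  nested_col2.key6
0                CO              8.94                CO               P14              8.94              None
```

Note: if each nested list holds more than one record, you can replace `[0]` with `.explode()`.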
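The `.explode()` variant from the note can be sketched as follows. This assumes every row has the same number of records in each nested column; otherwise the rows of the two columns would be paired up by position incorrectly. The sample data is hypothetical, and calling `.tolist()` on the exploded Series is a choice made here so that `json_normalize` always returns a fresh 0..n-1 index, which lets `concat` line the columns up by position.

```python
from decimal import Decimal

import numpy as np
import pandas as pd

cols = ["nested_col1", "nested_col2"]

# Hypothetical single-row frame with two records per nested array.
df = pd.DataFrame({
    "nested_col1": [np.array([{"key1": "CO", "key2": Decimal("8.94")},
                              {"key1": "TX", "key2": Decimal("1.50")}], dtype=object)],
    "nested_col2": [np.array([{"key3": "CO", "key4": "P14"},
                              {"key3": "TX", "key4": "P15"}], dtype=object)],
})

# .explode() instead of [0]: one output row per record.
out = pd.concat(
    [pd.json_normalize(df[x].explode().tolist()).add_prefix(f"{x}.") for x in cols],
    axis=1,
)
# Same conversion as in the answer: numbers become numeric, strings are kept.
out = out.apply(pd.to_numeric, errors="coerce").fillna(out)
print(out)
```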
# Answer 2
**Score**: 1
You can use [`json_normalize`](https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html) and [`concat`](https://pandas.pydata.org/docs/reference/api/pandas.concat.html):
```python
import pandas as pd

cols = ['nested_col1', 'nested_col2']
out = pd.concat([pd.json_normalize(df[c].explode()) for c in cols],
                keys=cols, axis=1)
out.columns = out.columns.map('.'.join)
```

Output:

```
   nested_col1.key1  nested_col1.key2  nested_col2.key3  nested_col2.key4  nested_col2.key5  nested_col2.key6
0                CO       8.940000000                CO               P14       8.940000000              None
```
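This answer relies on two mechanics: `keys=cols` gives the concatenated frame a two-level column index such as `('nested_col1', 'key1')`, and `out.columns.map('.'.join)` joins each tuple into a single string. As the output shows, the `Decimal` values are kept as-is. A sketch, assuming plain floats are wanted afterwards, casts only the columns that hold `Decimal` objects; the sample frame mirrors the question's data, and the `dec_cols` step is an addition, not part of the answer.

```python
from decimal import Decimal

import numpy as np
import pandas as pd

cols = ["nested_col1", "nested_col2"]

# Sample row mirroring the question's df.tail(1).to_dict('list') output.
df = pd.DataFrame({
    "nested_col1": [np.array([{"key1": "CO", "key2": Decimal("8.940000000")}], dtype=object)],
    "nested_col2": [np.array([{"key3": "CO", "key4": "P14",
                               "key5": Decimal("8.940000000"), "key6": None}], dtype=object)],
})

# keys=cols adds an outer column level: ('nested_col1', 'key1'), ...
out = pd.concat([pd.json_normalize(df[c].explode()) for c in cols], keys=cols, axis=1)
# Join each (outer, inner) tuple with '.' -> 'nested_col1.key1', ...
out.columns = out.columns.map(".".join)

# Added step: cast only the columns that actually hold Decimal objects.
dec_cols = [c for c in out.columns
            if out[c].map(lambda v: isinstance(v, Decimal)).any()]
for c in dec_cols:
    out[c] = out[c].astype(float)

print(out)
```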