Explode a column vertically to create new columns
Question
I have a DataFrame like this:

    name  zones
    aa    []
    bb    [{"rack":11,"bin":22},{"rack":33,"bin":44}]

Now I want to transform it into something like this:

    name  rack  bin
    aa    -     -
    bb    11    22
    bb    33    44

I tried this:

    cols = ['zones',]
    df1 = (df.drop(cols, axis=1)
             .join(pd.concat([pd.json_normalize(df[x].explode()).add_prefix(f'{x}.')
                              for x in cols], axis=1)))

But it only gives the values of the first dictionary:

      name  zones.rack  zones.bin
    0   aa         NaN        NaN
    1   bb        11.0       22.0
Answer 1
Score: 1
You could [`explode`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.explode.html) before converting to columns with [`json_normalize`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html) and joining:

    out = df.explode('zones', ignore_index=True)
    out = out.join(pd.json_normalize(out.pop('zones')).add_prefix('zones.'))

*NB. Add `.fillna('-')` if you want to fill the NaNs with `'-'`.*

Output:

      name  zones.rack  zones.bin
    0   aa         NaN        NaN
    1   bb        11.0       22.0
    2   bb        33.0       44.0

If you have more than one column to handle, use a loop and [`concat`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html). Exploding several columns at once requires pandas 1.3 or later, and the lists in each row must have the same length:

    cols = ['zones', 'zones2']
    out = df2.explode(cols, ignore_index=True)
    out = out.join(pd.concat([pd.json_normalize(out.pop(col))
                                .add_prefix(f'{col}.')
                              for col in cols
                             ], axis=1)
                  )

Example, if there were also a column `zones2`:

      name  zones.rack  zones.bin  zones2.rack  zones2.bin
    0   aa         NaN        NaN          NaN         NaN
    1   bb        11.0       22.0         11.0        22.0
    2   bb        33.0       44.0         33.0        44.0
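For anyone who wants to try the explode-then-normalize approach end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question on the assumption that `zones` holds real Python lists of dicts (not JSON strings), and `explode_dict_column` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def explode_dict_column(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Hypothetical helper: one output row per dict stored in df[col]."""
    # explode() gives each list element its own row; an empty list becomes
    # a single NaN row, and ignore_index=True renumbers the rows 0..n-1.
    out = df.explode(col, ignore_index=True)
    # A plain list is passed so the normalized frame also gets a 0..n-1
    # index, which lines up with `out` for the join below.
    expanded = pd.json_normalize(out.pop(col).tolist())
    return out.join(expanded.add_prefix(f"{col}."))


# Sample data assumed from the question.
df = pd.DataFrame({
    "name": ["aa", "bb"],
    "zones": [[], [{"rack": 11, "bin": 22}, {"rack": 33, "bin": 44}]],
})

result = explode_dict_column(df, "zones")
print(result)
```

Because `explode` returns a new frame, the `pop` inside the helper does not modify the caller's `df`.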
Answer 2
Score: 0
After [`json_normalize`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html), you need to set the index of the result to the index of the Series created by [`Series.explode`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.explode.html). The exploded Series repeats the original row labels, so restoring them lets the join match every exploded row to its original row:

```python
import pandas as pd

cols = ['zones',]

dfs = []
for x in cols:
    # Exploded Series keeps the original row labels (repeated per element)
    s = df[x].explode()
    # Reuse those labels so the join below aligns on the original rows
    df1 = pd.json_normalize(s).set_index(s.index).add_prefix(f'{x}.')
    dfs.append(df1)

df1 = df.drop(cols, axis=1).join(pd.concat(dfs, axis=1))
print(df1)
#   name  zones.rack  zones.bin
# 0   aa         NaN        NaN
# 1   bb        11.0       22.0
# 1   bb        33.0       44.0
```
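To make the index point concrete, here is a small sketch comparing a join on a default index with a join on the restored exploded index. It passes a plain list to `json_normalize` so the normalized frame gets a default 0..n-1 index, which is my understanding of what went wrong in the question's attempt. The sample frame is assumed from the question.

```python
import pandas as pd

# Sample data assumed from the question.
df = pd.DataFrame({
    "name": ["aa", "bb"],
    "zones": [[], [{"rack": 11, "bin": 22}, {"rack": 33, "bin": 44}]],
})

s = df["zones"].explode()             # row labels are [0, 1, 1]
flat = pd.json_normalize(s.tolist())  # row labels are [0, 1, 2]

base = df.drop(columns="zones")

# Default index: label 2 has no partner in `base`, so the second dict is lost.
misaligned = base.join(flat)

# Restored index: the repeated label 1 matches row "bb" twice.
aligned = base.join(flat.set_index(s.index))

print(misaligned)
print(aligned)
```

The left join keeps only labels present in `base`, which is why the unmatched label 2 silently disappears rather than raising an error.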
Answer 3
Score: 0
Here is another option:

    out = (
        df.join(df.pop("zones").explode().dropna()
                  .apply(lambda x: pd.Series({k: v for k, v in x.items()})))
          .fillna("-")  # optional: replaces the NaNs with "-"
    )

Output:

    print(out)

      names   rack    bin
    0    aa      -      -
    1    bb  11.00  22.00
    1    bb  33.00  44.00
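A possible variation on this idea, offered as a sketch rather than a tested recommendation: calling `pd.Series` once per row through `apply` is typically slow on large frames, so the dicts can instead be handed to the `DataFrame` constructor in one go while reusing the exploded index. The sample frame is assumed from the question, and this version leaves the original `df` untouched instead of using `pop`.

```python
import pandas as pd

# Sample data assumed from the question.
df = pd.DataFrame({
    "name": ["aa", "bb"],
    "zones": [[], [{"rack": 11, "bin": 22}, {"rack": 33, "bin": 44}]],
})

# Keep only the real dicts; the NaN produced by the empty list is dropped.
s = df["zones"].explode().dropna()

# Build all columns at once and keep the exploded row labels for alignment.
expanded = pd.DataFrame(s.tolist(), index=s.index)

# Rows without any dict (here "aa") survive the left join with NaN values.
out = df.drop(columns="zones").join(expanded)
print(out)
```

Chaining `.fillna("-")` onto `out` would give the dashes shown in the question, at the cost of turning the numeric columns into object dtype.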