Explode a column vertically to create new columns

Question

I have a DataFrame like this:

name	zones
aa	[]
bb	[{"rack":11,"bin":22},{"rack":33,"bin":44}]
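
For reproducibility, the sample frame can be rebuilt roughly as follows. This is a sketch that assumes `zones` holds real Python lists of dicts; if the column actually contains JSON strings, they would need parsing first (for example with `json.loads`).

```python
import pandas as pd

# Sample data reconstructed from the table above.
# Assumption: each "zones" entry is a Python list of dicts, not a JSON string.
df = pd.DataFrame({
    "name": ["aa", "bb"],
    "zones": [[], [{"rack": 11, "bin": 22}, {"rack": 33, "bin": 44}]],
})
print(df)
```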

I want to transform it into something like this:

name	rack	bin
aa	-	-
bb	11	22
bb	33	44

I tried this:

cols = ['zones']

df1 = (df.drop(cols, axis=1)
         .join(pd.concat([pd.json_normalize(df[x].explode()).add_prefix(f'{x}.')
                          for x in cols], axis=1)))

But it only gives the values of the first dictionary:

name  zones.rack  zones.bin
0   aa         NaN        NaN
1   bb        11.0       22.0

Answer 1

Score: 1

You can [`explode`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.explode.html) the column first, then convert the dictionaries to columns with [`json_normalize`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html) and join the result back:

out = df.explode('zones', ignore_index=True)
out = out.join(pd.json_normalize(out.pop('zones')).add_prefix('zones.'))

*NB. Add `.fillna('-')` if you want to fill the NaNs with `'-'`.*

Output:

  name  zones.rack  zones.bin
0   aa         NaN        NaN
1   bb        11.0       22.0
2   bb        33.0       44.0
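
To get the layout asked for in the question (plain `rack` and `bin` columns, with `-` for missing values), the same idea can be sketched end to end. This is only a sketch under assumptions: the sample frame is rebuilt by hand, the names `records` and `filled` are illustrative, and the NaN left by the empty list is swapped for an empty dict before building the columns, so the step does not depend on how `json_normalize` treats missing entries.

```python
import pandas as pd

# Sample frame rebuilt by hand (assumption: "zones" holds lists of dicts).
df = pd.DataFrame({
    "name": ["aa", "bb"],
    "zones": [[], [{"rack": 11, "bin": 22}, {"rack": 33, "bin": 44}]],
})

# One row per list element; an empty list becomes a single NaN row.
out = df.explode("zones", ignore_index=True)

# Replace the NaN placeholder with an empty dict so every entry is a dict.
records = [z if isinstance(z, dict) else {} for z in out.pop("zones")]

# The new frame gets a default RangeIndex, which matches out's reset index.
out = out.join(pd.DataFrame(records))

# Optional: display missing values as "-" (the columns become object dtype).
filled = out.fillna("-")
print(filled)
```

Because `ignore_index=True` resets the index after exploding, the join lines up row by row without any extra index handling.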


If you have more than one column to handle, use a loop and [`concat`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html):

cols = ['zones', 'zones2']

out = df2.explode(cols, ignore_index=True)
out = out.join(pd.concat([pd.json_normalize(out.pop(col))
                            .add_prefix(f'{col}.')
                          for col in cols],
                         axis=1))

Example with an additional `zones2` column:

  name  zones.rack  zones.bin  zones2.rack  zones2.bin
0   aa         NaN        NaN          NaN         NaN
1   bb        11.0       22.0         11.0        22.0
2   bb        33.0       44.0         33.0        44.0
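
A self-contained version of the multi-column case might look like the sketch below. The frame `df2` and its `zones2` contents are invented for illustration. Exploding several columns at once needs pandas 1.3 or later and requires the lists in each row to have the same length, so the sketch checks that first. As an extra precaution that is not part of the answer's code, NaN placeholders are replaced with empty dicts before the columns are built.

```python
import pandas as pd

# Hypothetical frame: "zones2" simply mirrors "zones" for illustration.
df2 = pd.DataFrame({
    "name": ["aa", "bb"],
    "zones": [[], [{"rack": 11, "bin": 22}, {"rack": 33, "bin": 44}]],
    "zones2": [[], [{"rack": 11, "bin": 22}, {"rack": 33, "bin": 44}]],
})
cols = ["zones", "zones2"]

# Multi-column explode only works when list lengths match within each row.
lengths = df2[cols].apply(lambda s: s.map(len))
assert lengths.nunique(axis=1).eq(1).all(), "list lengths differ within a row"

out = df2.explode(cols, ignore_index=True)

parts = []
for col in cols:
    # Swap NaN placeholders (from empty lists) for empty dicts.
    records = [v if isinstance(v, dict) else {} for v in out.pop(col)]
    parts.append(pd.DataFrame(records).add_prefix(f"{col}."))

out = out.join(pd.concat(parts, axis=1))
print(out)
```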





Answer 2

Score: 0

After [`json_normalize`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html), you need to restore the index of the Series created by [`Series.explode`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.explode.html):

```python
cols = ['zones']

dfs = []
for x in cols:
    # explode keeps the original row labels, so label 1 appears twice
    s = df[x].explode()
    # json_normalize returns a fresh RangeIndex; put the exploded index back
    df1 = pd.json_normalize(s).set_index(s.index).add_prefix(f'{x}.')
    dfs.append(df1)

df1 = df.drop(cols, axis=1).join(pd.concat(dfs, axis=1))
print(df1)
#   name  zones.rack  zones.bin
# 0   aa         NaN        NaN
# 1   bb        11.0       22.0
# 1   bb        33.0       44.0
```
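
Why the index matters can be seen in a small sketch. The names `flat`, `truncated`, and `aligned` are illustrative, and the flattened frame is built with `pd.DataFrame` from a list of dicts rather than `json_normalize`, purely to keep the example independent of how missing entries are handled.

```python
import pandas as pd

# Sample frame rebuilt by hand (assumption: "zones" holds lists of dicts).
df = pd.DataFrame({
    "name": ["aa", "bb"],
    "zones": [[], [{"rack": 11, "bin": 22}, {"rack": 33, "bin": 44}]],
})

s = df["zones"].explode()  # row labels are now [0, 1, 1]
records = [z if isinstance(z, dict) else {} for z in s]
flat = pd.DataFrame(records)  # fresh RangeIndex [0, 1, 2]

# Joining on the RangeIndex matches only labels 0 and 1: the third row is lost.
truncated = df[["name"]].join(flat)

# With the exploded index restored, label 1 matches twice.
aligned = df[["name"]].join(flat.set_index(s.index))

print(truncated)
print(aligned)
```

The first join reproduces the truncated result from the question, while the second keeps both dictionaries for `bb`.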

Answer 3

Score: 0

Here is another option:

out = (
    df.join(df.pop("zones").explode().dropna()
              .apply(lambda x: pd.Series({k: v for k, v in x.items()})))
      .fillna("-")  # this step is optional
)

Output:

print(out)

  names  rack   bin
0    aa     -     -
1    bb 11.00 22.00
1    bb 33.00 44.00
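
Note that `df.pop("zones")` removes the column from `df` in place, and building a `pd.Series` per row can be slow on large frames. A possible variant that leaves the input untouched is sketched below. The helper name `explode_dicts` is hypothetical, and the sketch assumes every non-missing element of the column is a dict.

```python
import pandas as pd


def explode_dicts(frame: pd.DataFrame, col: str) -> pd.DataFrame:
    """Hypothetical helper: one row per dict in `col`, one column per key."""
    # Drop the NaN rows produced by empty lists; the left join restores them.
    s = frame[col].explode().dropna()
    # Build all columns at once from the list of dicts, keeping the row labels.
    flat = pd.DataFrame(s.tolist(), index=s.index)
    # drop() returns a copy, so the caller's frame is not modified.
    return frame.drop(columns=col).join(flat)


df = pd.DataFrame({
    "name": ["aa", "bb"],
    "zones": [[], [{"rack": 11, "bin": 22}, {"rack": 33, "bin": 44}]],
})
result = explode_dicts(df, "zones")
print(result.fillna("-"))
```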
huangapple
  • Posted on May 15, 2023, 14:11:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76251296.html