Compacting data in a pandas DataFrame by removing NaNs and shifting values left to reduce number of columns
Question
I have a data frame that looks as below:
5.29559 NaN 2.38176 NaN 0.51521 NaN 0.04454 0.00000 None None None None None None None None
0 NaN NaN NaN NaN 0 NaN NaN 0 NaN NaN 0 2 None None None
4.32454 NaN 1.77600 NaN 0.04454 NaN 0.00000 None None None None None None None None None
0 NaN NaN NaN NaN 0 NaN NaN 0 NaN NaN 2 None None None None
I am trying to generate a data frame by removing all the NaN values, so that the current data frame looks like this:
5.29559 2.38176 0.51521 0.04454 0.00000
0 0 0 0 2
4.32454 1.77600 0.04454 0.00000
0 0 0 2
Can someone please help?
I tried the dropna() method but it did not help.
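For context on why dropna() alone does not help here: it removes whole rows or whole columns, never individual cells. A minimal sketch, assuming the frame holds floats with NaN in every empty cell (the `rows` literal is a reconstruction of the sample data above, not the asker's actual code):

```python
import pandas as pd

# Reconstruction of the sample frame (an assumption): {column label: value}
# per row; every other cell in the 16 columns is NaN.
rows = [
    {0: 5.29559, 2: 2.38176, 4: 0.51521, 6: 0.04454, 7: 0.0},
    {0: 0.0, 5: 0.0, 8: 0.0, 11: 0.0, 12: 2.0},
    {0: 4.32454, 2: 1.776, 4: 0.04454, 6: 0.0},
    {0: 0.0, 5: 0.0, 8: 0.0, 11: 2.0},
]
df = pd.DataFrame(rows).reindex(columns=range(16))

# Default axis=0 drops any row that contains a NaN, and every row does.
print(df.dropna().shape)        # (0, 16)
# axis=1 drops any column that contains a NaN; only column 0 survives.
print(df.dropna(axis=1).shape)  # (4, 1)
```

Neither call shifts the remaining values to the left, which is what the answers below do.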
Answer 1
Score: 2
Let's try stacking to eliminate the NaNs, then renumber the inner index level within each row, and finally unstack again:
(df.stack()
.groupby(level=0)
.apply(lambda df: df.reset_index(drop=True))
.unstack())
0 1 2 3 4
0 5.29559 2.38176 0.51521 0.04454 0.0
1 0.00000 0.00000 0.00000 0.00000 2.0
2 4.32454 1.77600 0.04454 0.00000 NaN
3 0.00000 0.00000 0.00000 2.00000 NaN
Explanation:
First, stack to remove the NaNs:
df.stack()
0 0 5.29559
2 2.38176
4 0.51521
6 0.04454
7 0.00000
1 0 0.00000
5 0.00000
8 0.00000
11 0.00000
12 2.00000
2 0 4.32454
2 1.77600
4 0.04454
6 0.00000
3 0 0.00000
5 0.00000
8 0.00000
11 2.00000
dtype: float64
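You'll notice the inner level of the index still holds the original column labels, so it has gaps (0, 2, 4, 6, 7 rather than 0 to 4). Let's fix that with groupby.apply (here _ is the previous result in an interactive session):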
_.groupby(level=0).apply(lambda df: df.reset_index(drop=True))
0 0 5.29559
1 2.38176
2 0.51521
3 0.04454
4 0.00000
1 0 0.00000
1 0.00000
2 0.00000
3 0.00000
4 2.00000
2 0 4.32454
1 1.77600
2 0.04454
3 0.00000
3 0 0.00000
1 0.00000
2 0.00000
3 2.00000
dtype: float64
Now we unstack:
_.unstack()
0 1 2 3 4
0 5.29559 2.38176 0.51521 0.04454 0.0
1 0.00000 0.00000 0.00000 0.00000 2.0
2 4.32454 1.77600 0.04454 0.00000 NaN
3 0.00000 0.00000 0.00000 2.00000 NaN
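The three steps above can also be sketched as one self-contained script. This variant is not the code from the answer: it swaps groupby.apply for groupby(...).cumcount() to renumber the inner level, and it calls dropna() explicitly after stack(), since newer pandas releases offer a stack implementation that keeps NaNs. The `rows` literal is an assumed reconstruction of the sample data.

```python
import pandas as pd

# Reconstruction of the sample frame (an assumption): {column label: value}
# per row; every other cell in the 16 columns is NaN.
rows = [
    {0: 5.29559, 2: 2.38176, 4: 0.51521, 6: 0.04454, 7: 0.0},
    {0: 0.0, 5: 0.0, 8: 0.0, 11: 0.0, 12: 2.0},
    {0: 4.32454, 2: 1.776, 4: 0.04454, 6: 0.0},
    {0: 0.0, 5: 0.0, 8: 0.0, 11: 2.0},
]
df = pd.DataFrame(rows).reindex(columns=range(16))

# Long format, one entry per (row, column); the explicit dropna() makes the
# result independent of whether stack() drops NaNs by itself.
stacked = df.stack().dropna()

# Number the surviving values within each original row: 0, 1, 2, ...
position = stacked.groupby(level=0).cumcount()

# Swap the original column labels for those positions, then pivot back.
stacked.index = pd.MultiIndex.from_arrays(
    [stacked.index.get_level_values(0), position.to_numpy()]
)
compact = stacked.unstack()
print(compact)
```

Rows with fewer surviving values are padded with NaN on the right, as in the output shown above.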
Answer 2
Score: 1
You can use a custom function to remove null values from each row:
>>> df.agg(lambda x: pd.Series([v for v in x if pd.notna(v)]), axis=1)
0 1 2 3 4
0 5.29559 2.38176 0.51521 0.04454 0.0
1 0.00000 0.00000 0.00000 0.00000 2.0
2 4.32454 1.77600 0.04454 0.00000 NaN
3 0.00000 0.00000 0.00000 2.00000 NaN
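The row-wise function above runs Python code once per row. For larger frames, a vectorized NumPy variant may be worth trying. The sketch below is a different technique from the answer's, under the assumption that every column can be cast to float: a stable argsort on the NaN mask moves the non-NaN values of each row to the front while preserving their order. `shift_left` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def shift_left(df: pd.DataFrame) -> pd.DataFrame:
    """Pack each row's non-NaN values to the left (hypothetical helper)."""
    values = df.to_numpy(dtype=float)
    # False (not NaN) sorts before True (NaN); "stable" keeps original order.
    order = np.argsort(np.isnan(values), axis=1, kind="stable")
    packed = np.take_along_axis(values, order, axis=1)
    # Drop the trailing columns that are now entirely NaN.
    return pd.DataFrame(packed, index=df.index).dropna(axis=1, how="all")


# Reconstruction of the sample frame (an assumption): {column label: value}
# per row; every other cell in the 16 columns is NaN.
rows = [
    {0: 5.29559, 2: 2.38176, 4: 0.51521, 6: 0.04454, 7: 0.0},
    {0: 0.0, 5: 0.0, 8: 0.0, 11: 0.0, 12: 2.0},
    {0: 4.32454, 2: 1.776, 4: 0.04454, 6: 0.0},
    {0: 0.0, 5: 0.0, 8: 0.0, 11: 2.0},
]
df = pd.DataFrame(rows).reindex(columns=range(16))
result = shift_left(df)
print(result)
```

This trades the readability of the one-liner for array operations, and it does not handle columns that cannot be converted to float.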
Answer 3
Score: 1
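Try this. Note that df.dropna(axis=1) would drop every column that contains a NaN, and reshape(-1, 5) fails when rows keep different numbers of values, so compact each row instead:
import pandas as pd
df = pd.DataFrame(your_table)
# Keep each row's non-null values in order; shorter rows are padded with NaN
rows = [row.dropna().tolist() for _, row in df.iterrows()]
df = pd.DataFrame(rows, columns=['col1', 'col2', 'col3', 'col4', 'col5'])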