April 17, 2023, 02:16:08

Compacting data in a pandas DataFrame by removing NaNs and shifting values left to reduce number of columns

Question

I have a data frame that looks as below:

5.29559 	NaN 	2.38176 	NaN 	0.51521 	NaN 	0.04454 	0.00000 	None 	None 	None 	None 	None 	None 	None 	None
0 	NaN 	NaN 	NaN 	NaN 	0 	NaN 	NaN 	0 	NaN 	NaN 	0 	2 	None 	None 	None
4.32454 	NaN 	1.77600 	NaN 	0.04454 	NaN 	0.00000 	None 	None 	None 	None 	None 	None 	None 	None 	None
0 	NaN 	NaN 	NaN 	NaN 	0 	NaN 	NaN 	0 	NaN 	NaN 	2 	None 	None 	None 	None

I am trying to generate a data frame by removing all the NaN values, so that the current data frame looks like this:

5.29559 	2.38176 	0.51521 	0.04454 	0.00000
      0 	      0 	      0 	      0 	      2
4.32454 	1.77600 	0.04454 	0.00000
      0 	      0 	      0 	      2

Can someone please help?
I tried the dropna() method but it did not help.
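
To see why dropna() alone does not help, here is a minimal sketch. The frame is a hypothetical reconstruction of the data shown above (the variable names are assumptions), and both calls rely on the default how='any' behavior:

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical reconstruction of the question's frame (NaN and None both count as missing).
df = pd.DataFrame([
    [5.29559, nan, 2.38176, nan, 0.51521, nan, 0.04454, 0.0] + [None] * 8,
    [0, nan, nan, nan, nan, 0, nan, nan, 0, nan, nan, 0, 2] + [None] * 3,
    [4.32454, nan, 1.77600, nan, 0.04454, nan, 0.0] + [None] * 9,
    [0, nan, nan, nan, nan, 0, nan, nan, 0, nan, nan, 2] + [None] * 4,
])

# Default dropna(): removes every row that contains at least one missing value.
rows_kept = df.dropna()
# dropna(axis=1): removes every column that contains at least one missing value.
cols_kept = df.dropna(axis=1)

print(rows_kept.shape)  # (0, 16): every row has a gap, so nothing survives
print(cols_kept.shape)  # (4, 1): only the first column is gap-free
```

Neither call moves values sideways: dropna() only discards whole rows or whole columns, so the values have to be repacked row by row first.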

Answer 1

Score: 2

Let's try stacking to eliminate the NaNs, then resetting the inner index within each row group, and finally unstacking again:

(df.stack()
   .groupby(level=0)
   .apply(lambda s: s.reset_index(drop=True))
   .unstack())

         0        1        2        3    4
0  5.29559  2.38176  0.51521  0.04454  0.0
1  0.00000  0.00000  0.00000  0.00000  2.0
2  4.32454  1.77600  0.04454  0.00000  NaN
3  0.00000  0.00000  0.00000  2.00000  NaN

Explanation:

First, stack to remove the NaNs:

df.stack()

0  0     5.29559
   2     2.38176
   4     0.51521
   6     0.04454
   7     0.00000
1  0     0.00000
   5     0.00000
   8     0.00000
   11    0.00000
   12    2.00000
2  0     4.32454
   2     1.77600
   4     0.04454
   6     0.00000
3  0     0.00000
   5     0.00000
   8     0.00000
   11    2.00000 
dtype: float64

Notice that the inner level of the index is no longer consecutive (0, 2, 4, ... instead of 0, 1, 2, ...). Let's fix that with groupby.apply (in an interactive session, _ holds the previous result):

_.groupby(level=0).apply(lambda s: s.reset_index(drop=True))

0  0    5.29559
   1    2.38176
   2    0.51521
   3    0.04454
   4    0.00000
1  0    0.00000
   1    0.00000
   2    0.00000
   3    0.00000
   4    2.00000
2  0    4.32454
   1    1.77600
   2    0.04454
   3    0.00000
3  0    0.00000
   1    0.00000
   2    0.00000
   3    2.00000
dtype: float64

Now we unstack:

_.unstack()

         0        1        2        3    4
0  5.29559  2.38176  0.51521  0.04454  0.0
1  0.00000  0.00000  0.00000  0.00000  2.0
2  4.32454  1.77600  0.04454  0.00000  NaN
3  0.00000  0.00000  0.00000  2.00000  NaN
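
The three steps above can be collected into one self-contained sketch. The helper name compact_left and the small sample frame are assumptions made for illustration. The explicit dropna() is a precaution, because whether stack() discards NaNs by itself may depend on the pandas version:

```python
import numpy as np
import pandas as pd


def compact_left(frame: pd.DataFrame) -> pd.DataFrame:
    """Shift the non-missing values of each row to the left (hypothetical helper)."""
    # Long format: one entry per (row, column); dropna() removes any remaining NaNs.
    stacked = frame.stack().dropna()
    # Renumber the inner index level 0, 1, 2, ... within each original row.
    renumbered = stacked.groupby(level=0).apply(lambda s: s.reset_index(drop=True))
    # Pivot the renumbered inner level back into columns.
    return renumbered.unstack()


# Small illustrative frame, not the question's data.
sample = pd.DataFrame([
    [1.0, np.nan, 2.0, np.nan],
    [np.nan, 3.0, np.nan, np.nan],
])
result = compact_left(sample)
print(result)
```

A row that is entirely missing contributes nothing to the stacked series, so it would be absent from the result; reindexing with the original index (result.reindex(sample.index)) is one way to bring it back.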

Answer 2

Score: 1

You can use a custom function to remove null values from each row:

>>> df.agg(lambda x: pd.Series([v for v in x if pd.notna(v)]), axis=1)
         0        1        2        3    4
0  5.29559  2.38176  0.51521  0.04454  0.0
1  0.00000  0.00000  0.00000  0.00000  2.0
2  4.32454  1.77600  0.04454  0.00000  NaN
3  0.00000  0.00000  0.00000  2.00000  NaN
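
The same idea can be written with a named function, which makes it easier to test on its own. This is only a sketch: drop_missing and the sample frame are hypothetical, and it uses apply in place of agg. pd.notna treats NaN and None alike, which matters because the question's data contains both:

```python
import numpy as np
import pandas as pd


def drop_missing(row: pd.Series) -> pd.Series:
    """Keep only the non-missing values of one row, renumbered from 0 (hypothetical helper)."""
    return pd.Series([v for v in row if pd.notna(v)])


# Small illustrative frame mixing NaN and None, as the question's data does.
sample = pd.DataFrame([
    [1.0, np.nan, 2.0, None],
    [np.nan, 3.0, np.nan, None],
])
# Rows that end up shorter than the longest one are padded with NaN on the right.
compacted = sample.apply(drop_missing, axis=1)
print(compacted)
```

Because the function runs once per row in Python, this is likely to be slower on large frames than an approach that works on whole arrays.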

Answer 3

Score: 1

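Try this:

df = pd.DataFrame(your_table)
# Note: dropna(axis=1) drops every column that contains at least one NaN.
df = df.dropna(axis=1)
# reshape(-1, 5) needs the count of remaining values to be a multiple of 5.
df = pd.DataFrame(df.values.reshape(-1, 5), columns=['col1', 'col2', 'col3', 'col4', 'col5'])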
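
A reshape only works when every row holds the same number of non-missing values, which is not the case in the question (5, 5, 4 and 4 values). As a different, NumPy-based technique (not this answer's method), a stable sort on the missing-value mask can pack each row to the left. The function name shift_left and the sample data are assumptions, and the sketch assumes an all-numeric frame:

```python
import numpy as np
import pandas as pd


def shift_left(frame: pd.DataFrame) -> pd.DataFrame:
    """Pack the non-missing values of each row to the left (hypothetical helper)."""
    values = frame.to_numpy(dtype=float)  # assumes purely numeric data
    missing = np.isnan(values)
    # Stable argsort on the boolean mask: present values (False) come first
    # and keep their original left-to-right order.
    order = np.argsort(missing, axis=1, kind="stable")
    packed = np.take_along_axis(values, order, axis=1)
    # Keep only as many columns as the fullest row needs.
    width = int((~missing).sum(axis=1).max())
    return pd.DataFrame(packed[:, :width], index=frame.index)


# Small illustrative frame with rows of different lengths once NaNs are removed.
sample = pd.DataFrame([
    [1.0, np.nan, 2.0, np.nan],
    [np.nan, 3.0, np.nan, np.nan],
])
shifted = shift_left(sample)
print(shifted)
```

Unlike the stack-based approach, this keeps rows that are entirely missing, since no row is ever dropped from the array.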
