Compacting data in a pandas DataFrame by removing NaNs and shifting values left to reduce the number of columns

Question

I have a data frame that looks as below:

    5.29559 NaN 2.38176 NaN 0.51521 NaN 0.04454 0.00000 None None None None None None None None
    0 NaN NaN NaN NaN 0 NaN NaN 0 NaN NaN 0 2 None None None
    4.32454 NaN 1.77600 NaN 0.04454 NaN 0.00000 None None None None None None None None None
    0 NaN NaN NaN NaN 0 NaN NaN 0 NaN NaN 2 None None None None

I am trying to generate a data frame by removing all the NaN values, so that the current data frame looks like this:

    5.29559 2.38176 0.51521 0.04454 0.00000
    0 0 0 0 2
    4.32454 1.77600 0.04454 0.00000
    0 0 0 2

Can someone please help?
I tried the dropna() method but it did not help.
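For reference, `DataFrame.dropna()` removes entire rows (or, with `axis=1`, entire columns) that contain a missing value; it never shifts individual cells to the left. The sketch below rebuilds the sample data to show why that is not enough here. It assumes the `None` cells are meant as missing values too, so every missing cell is encoded as NaN.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data from the question (None cells encoded as NaN).
rows = [
    [5.29559, nan, 2.38176, nan, 0.51521, nan, 0.04454, 0.0] + [nan] * 8,
    [0, nan, nan, nan, nan, 0, nan, nan, 0, nan, nan, 0, 2] + [nan] * 3,
    [4.32454, nan, 1.77600, nan, 0.04454, nan, 0.0] + [nan] * 9,
    [0, nan, nan, nan, nan, 0, nan, nan, 0, nan, nan, 2] + [nan] * 4,
]
df = pd.DataFrame(rows)

# Default dropna() drops every row containing a NaN: no rows survive.
print(df.dropna().shape)                    # (0, 16)
# dropna(axis=1) drops every column containing a NaN: only column 0 is complete.
print(df.dropna(axis=1).columns.tolist())   # [0]
```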

Answer 1

Score: 2

Let's try stacking to eliminate the NaNs, then resetting the inner index within each row, and finally unstacking again:

    (df.stack()
       .groupby(level=0)
       .apply(lambda s: s.reset_index(drop=True))
       .unstack())

which gives:

             0        1        2        3    4
    0  5.29559  2.38176  0.51521  0.04454  0.0
    1  0.00000  0.00000  0.00000  0.00000  2.0
    2  4.32454  1.77600  0.04454  0.00000  NaN
    3  0.00000  0.00000  0.00000  2.00000  NaN

Explanation:

First, stack to remove the NaNs:

    df.stack()

    0  0     5.29559
       2     2.38176
       4     0.51521
       6     0.04454
       7     0.00000
    1  0     0.00000
       5     0.00000
       8     0.00000
       11    0.00000
       12    2.00000
    2  0     4.32454
       2     1.77600
       4     0.04454
       6     0.00000
    3  0     0.00000
       5     0.00000
       8     0.00000
       11    2.00000
    dtype: float64
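This step relies on `stack()` dropping missing values, which is the long-standing default behaviour. As far as I know, pandas 2.1 added an opt-in `future_stack=True` implementation of `stack()` that keeps missing values, and that behaviour is intended to become the default, so an explicit `.dropna()` is a cheap hedge that gives the same result under either implementation. A minimal sketch on the question's sample data (None cells encoded as NaN, an assumption):

```python
import numpy as np
import pandas as pd

nan = np.nan
rows = [
    [5.29559, nan, 2.38176, nan, 0.51521, nan, 0.04454, 0.0] + [nan] * 8,
    [0, nan, nan, nan, nan, 0, nan, nan, 0, nan, nan, 0, 2] + [nan] * 3,
    [4.32454, nan, 1.77600, nan, 0.04454, nan, 0.0] + [nan] * 9,
    [0, nan, nan, nan, nan, 0, nan, nan, 0, nan, nan, 2] + [nan] * 4,
]
df = pd.DataFrame(rows)

# stack() moves the column labels into an inner index level; the explicit
# dropna() removes missing cells whether or not this pandas' stack() already does.
stacked = df.stack().dropna()
print(len(stacked))                    # 18
print(stacked.loc[0].index.tolist())   # [0, 2, 4, 6, 7]
```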

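You'll notice that the inner level of the index still holds the original column labels (0, 2, 4, 6, 7 for the first row), so it isn't a contiguous 0, 1, 2, … sequence. Let's fix that with groupby.apply (`_` is the previous result in an interactive session):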

    _.groupby(level=0).apply(lambda s: s.reset_index(drop=True))

    0  0    5.29559
       1    2.38176
       2    0.51521
       3    0.04454
       4    0.00000
    1  0    0.00000
       1    0.00000
       2    0.00000
       3    0.00000
       4    2.00000
    2  0    4.32454
       1    1.77600
       2    0.04454
       3    0.00000
    3  0    0.00000
       1    0.00000
       2    0.00000
       3    2.00000
    dtype: float64
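The same renumbering can also be done without `groupby.apply` by using `groupby(level=0).cumcount()`, which numbers the entries 0, 1, 2, … within each row. This is a swapped-in alternative, not the answerer's code; a minimal sketch on the sample data (None cells encoded as NaN, an assumption), finishing with the unstack step:

```python
import numpy as np
import pandas as pd

nan = np.nan
rows = [
    [5.29559, nan, 2.38176, nan, 0.51521, nan, 0.04454, 0.0] + [nan] * 8,
    [0, nan, nan, nan, nan, 0, nan, nan, 0, nan, nan, 0, 2] + [nan] * 3,
    [4.32454, nan, 1.77600, nan, 0.04454, nan, 0.0] + [nan] * 9,
    [0, nan, nan, nan, nan, 0, nan, nan, 0, nan, nan, 2] + [nan] * 4,
]
df = pd.DataFrame(rows)

stacked = df.stack().dropna()  # explicit dropna() in case stack() keeps NaN
# Per-row counter 0, 1, 2, ... replaces the old column labels in the inner level.
position = stacked.groupby(level=0).cumcount()
stacked.index = pd.MultiIndex.from_arrays(
    [stacked.index.get_level_values(0), position.to_numpy()]
)
result = stacked.unstack()  # shorter rows are padded with NaN on the right
print(result)
```

`cumcount()` is computed in a single grouped pass, whereas `apply` calls a Python lambda once per row, which may matter on large frames.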

Now we unstack:

    _.unstack()

             0        1        2        3    4
    0  5.29559  2.38176  0.51521  0.04454  0.0
    1  0.00000  0.00000  0.00000  0.00000  2.0
    2  4.32454  1.77600  0.04454  0.00000  NaN
    3  0.00000  0.00000  0.00000  2.00000  NaN

Answer 2

Score: 1

You can use a custom function to remove null values from each row:

    >>> df.agg(lambda x: pd.Series([v for v in x if pd.notna(v)]), axis=1)
             0        1        2        3    4
    0  5.29559  2.38176  0.51521  0.04454  0.0
    1  0.00000  0.00000  0.00000  0.00000  2.0
    2  4.32454  1.77600  0.04454  0.00000  NaN
    3  0.00000  0.00000  0.00000  2.00000  NaN
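The same row-wise idea can be written with `DataFrame.apply` and `Series.dropna`, resetting each row's index so the surviving values are relabelled 0, 1, 2, …. When the applied function returns a Series, `apply(axis=1)` expands the results into columns and pads shorter rows with NaN. A minimal sketch on the sample data (None cells encoded as NaN, an assumption):

```python
import numpy as np
import pandas as pd

nan = np.nan
rows = [
    [5.29559, nan, 2.38176, nan, 0.51521, nan, 0.04454, 0.0] + [nan] * 8,
    [0, nan, nan, nan, nan, 0, nan, nan, 0, nan, nan, 0, 2] + [nan] * 3,
    [4.32454, nan, 1.77600, nan, 0.04454, nan, 0.0] + [nan] * 9,
    [0, nan, nan, nan, nan, 0, nan, nan, 0, nan, nan, 2] + [nan] * 4,
]
df = pd.DataFrame(rows)

# For each row: drop the missing cells, then renumber the survivors 0..k-1.
result = df.apply(lambda row: row.dropna().reset_index(drop=True), axis=1)
print(result)
```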

Answer 3

Score: 1

Try this:

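    import pandas as pd
    df = pd.DataFrame(your_table)  # your_table: placeholder for your data
    values = df.to_numpy().ravel()  # flatten the cells row by row
    values = values[pd.notna(values)]  # drop NaN/None cells; df.dropna(axis=1) would drop whole columns instead
    # Only valid when every row has exactly 5 non-null values (not true for the sample data)
    df = pd.DataFrame(values.reshape(-1, 5), columns=['col1', 'col2', 'col3', 'col4', 'col5'])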
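A NumPy route that also handles rows with different numbers of values is to sort each row's cells so that non-missing ones come first (a stable argsort on the missing-value mask keeps their original left-to-right order), then trim the columns that end up entirely empty. This is a swapped-in technique rather than the answer's code, and `compact_left` is a hypothetical helper name; a minimal sketch on the sample data (None cells encoded as NaN, an assumption):

```python
import numpy as np
import pandas as pd


def compact_left(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: shift each row's non-missing values to the left."""
    values = df.to_numpy()
    missing = pd.isna(values)
    # Stable sort on the boolean mask: False (present) sorts before True (missing),
    # and equal keys keep their original order.
    order = np.argsort(missing, axis=1, kind="stable")
    shifted = np.take_along_axis(values, order, axis=1)
    # Keep only as many columns as the longest row needs.
    width = int((~missing).sum(axis=1).max())
    return pd.DataFrame(shifted[:, :width], index=df.index)


nan = np.nan
rows = [
    [5.29559, nan, 2.38176, nan, 0.51521, nan, 0.04454, 0.0] + [nan] * 8,
    [0, nan, nan, nan, nan, 0, nan, nan, 0, nan, nan, 0, 2] + [nan] * 3,
    [4.32454, nan, 1.77600, nan, 0.04454, nan, 0.0] + [nan] * 9,
    [0, nan, nan, nan, nan, 0, nan, nan, 0, nan, nan, 2] + [nan] * 4,
]
result = compact_left(pd.DataFrame(rows))
print(result)
```

`pd.isna` works element-wise on object arrays as well, so the same helper should also accept frames that mix `None` and NaN.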

huangapple
  • Published on 2023-04-17 02:16:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76029552.html