Pandas: Dropping rows and leaving 2 blank lines every time a run of 0s appears

Question

I have a big chunk of data with a lot of columns, but at some points column "two" contains runs of 0. Every time a run of 0s appears in column "two", I want to drop those rows and leave 2 blank lines in their place.

    one  two  three
      1    4      4
      3    5      5
      5    7      5
    666    0      6
    785    0      8
    455    0      9
    454    0      9
     12    2      8
     23    5      9
      2    3      7
      1    5      5
    123    0      7
    123    0      7
      3    5      5

Desired output:

    one  two  three
      1    4      4
      3    5      5
      5    7      5


     12    2      8
     23    5      9
      2    3      7
      1    5      5


      3    5      5

I tried different functions (split, groupby, drop with conditions, ...), but none of them did what I need (most probably because I am not good at coding).
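To make the example reproducible, here is a minimal sketch that builds the sample data. It assumes the data sits in a pandas DataFrame with integer columns; the variable name `df` is an assumption, chosen because the answers' code uses it.

```python
import pandas as pd

# Sample data copied from the question; the name `df` is an assumption
df = pd.DataFrame({
    'one':   [1, 3, 5, 666, 785, 455, 454, 12, 23, 2, 1, 123, 123, 3],
    'two':   [4, 5, 7, 0, 0, 0, 0, 2, 5, 3, 5, 0, 0, 5],
    'three': [4, 5, 5, 6, 8, 9, 9, 8, 9, 7, 5, 7, 7, 5],
})

# Rows where "two" is 0: these are the runs that should become blank lines
print(df[df['two'].eq(0)])
```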

Answer 1

Score: 1

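You can use a custom groupby:

    import pandas as pd

    # True for the rows to keep (non-zero values in column "two")
    m = df['two'].ne(0)
    out = (df[m]
           # a new group starts at each kept row that is not preceded by a kept row
           .groupby((m & ~m.shift(fill_value=False)).cumsum(), group_keys=False)
           # append two blank rows after every group (including the last one)
           .apply(lambda g: pd.concat([g, pd.DataFrame('', columns=g.columns, index=[0, 1])]))
           .reset_index(drop=True)
          )
    print(out.to_string(index=False))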

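Or with a hacky repeat:

    N = 2  # number of blank rows to insert
    m1 = df['two'].ne(0)
    # last kept row of each run that is followed by a 0
    m2 = (m1 & ~m1.shift(-1, fill_value=True))
    # repeat those rows N extra times
    idx = df.index[m1].repeat(m2[m1]*N+1)
    # object dtype so that empty strings fit into the numeric columns
    out = df.loc[idx].astype(object)
    # blank out the repeated copies
    out[out.index.duplicated()] = ''
    print(out.to_string(index=False))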

Output:

    one  two  three
      1    4      4
      3    5      5
      5    7      5


     12    2      8
     23    5      9
      2    3      7
      1    5      5


      3    5      5
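To see why the groupby key works, it can help to print the intermediate masks. The following is only a sketch: it rebuilds the question's sample data under the assumed name `df`, and the names `starts` and `group_id` are introduced here purely for illustration.

```python
import pandas as pd

# Sample data from the question (the name `df` is an assumption)
df = pd.DataFrame({
    'one':   [1, 3, 5, 666, 785, 455, 454, 12, 23, 2, 1, 123, 123, 3],
    'two':   [4, 5, 7, 0, 0, 0, 0, 2, 5, 3, 5, 0, 0, 5],
    'three': [4, 5, 5, 6, 8, 9, 9, 8, 9, 7, 5, 7, 7, 5],
})

m = df['two'].ne(0)                      # rows to keep
starts = m & ~m.shift(fill_value=False)  # kept rows not preceded by a kept row
group_id = starts.cumsum()               # running count of starts = block label

# Labels of the kept rows only
print(group_id[m].tolist())  # [1, 1, 1, 2, 2, 2, 2, 3]
```

Each distinct label is one block of consecutive non-zero rows, and the groupby appends the two blank rows after each such block.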

Answer 2

Score: 0

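Here is another way:

pd.Index.union() sorts its result by default (see its sort option), so the half-step labels created for the blank rows end up in the correct positions when reindexing.

    # True for the rows to keep
    m = df['two'].ne(0)
    # take the first row of each run of 0s, repeat its label twice and shift it by .5,
    # then merge these new labels with the labels of the kept rows
    df.reindex(df.loc[m].index.union(df.loc[m.diff().ne(0) & ~m].index.repeat(2) + .5)).reset_index(drop=True)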

Output:

         one  two  three
    0    1.0  4.0    4.0
    1    3.0  5.0    5.0
    2    5.0  7.0    5.0
    3    NaN  NaN    NaN
    4    NaN  NaN    NaN
    5   12.0  2.0    8.0
    6   23.0  5.0    9.0
    7    2.0  3.0    7.0
    8    1.0  5.0    5.0
    9    NaN  NaN    NaN
    10   NaN  NaN    NaN
    11   3.0  5.0    5.0
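Note that this output contains NaN rows and float columns rather than blank lines. One possible way to get closer to the requested output is sketched below, under the assumption that the original columns hold integers. The frame `df` is rebuilt from the question's sample, and `first_zero` and `new_index` are names introduced here for illustration.

```python
import pandas as pd

# Sample data from the question (the name `df` is an assumption)
df = pd.DataFrame({
    'one':   [1, 3, 5, 666, 785, 455, 454, 12, 23, 2, 1, 123, 123, 3],
    'two':   [4, 5, 7, 0, 0, 0, 0, 2, 5, 3, 5, 0, 0, 5],
    'three': [4, 5, 5, 6, 8, 9, 9, 8, 9, 7, 5, 7, 7, 5],
})

m = df['two'].ne(0)
first_zero = m.diff().ne(0) & ~m  # first row of each run of 0s
new_index = df.loc[m].index.union(df.loc[first_zero].index.repeat(2) + .5)

out = (df.reindex(new_index)
         .reset_index(drop=True)
         .astype('Int64')   # nullable integers: 1.0 becomes 1, NaN becomes <NA>
         .astype(object)    # object dtype so empty strings are allowed
         .fillna(''))       # missing rows become blank
print(out.to_string(index=False))
```

The detour through `Int64` only serves to get rid of the `.0` suffixes before the missing rows are blanked; for non-integer data that step would have to be dropped.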

huangapple
  • Posted on May 11, 2023, 19:31:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76227165.html