May 11, 2023 19:31:27

Pandas: Dropping rows and leaving 2 blank lines every time a series of 0s appears

Question

I have a big chunk of data with a lot of columns, but these columns contain 0 at some points. Every time a run of 0s appears in column "two", I want to drop those rows and leave 2 blank lines in their place.


one  two three 
1     4     4
3     5     5
5     7     5
666   0     6
785   0     8 
455   0     9 
454   0     9
12    2     8
23    5     9
2     3     7
1     5     5 
123   0     7 
123   0     7
3     5     5
 
Desired output:
one  two three 
1     4     4
3     5     5
5     7     5
12    2     8
23    5     9
2     3     7
1     5     5 
3     5     5

I tried different functions (split, groupby, drop with conditions, ...), but none of them did what I need (most probably because I suck at coding).
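For anyone who wants to run the answers, the sample above can be rebuilt as a DataFrame. This is only a sketch: the variable name `df` is taken from the answers, and the integer dtypes are an assumption about the real data.

```python
import pandas as pd

# Sample data copied from the question; `df` is the name the answers use.
df = pd.DataFrame({
    "one":   [1, 3, 5, 666, 785, 455, 454, 12, 23, 2, 1, 123, 123, 3],
    "two":   [4, 5, 7, 0, 0, 0, 0, 2, 5, 3, 5, 0, 0, 5],
    "three": [4, 5, 5, 6, 8, 9, 9, 8, 9, 7, 5, 7, 7, 5],
})

# Quick sanity check: 14 rows, 6 of them with a 0 in "two".
print(df.shape, int(df["two"].eq(0).sum()))
```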

Answer 1

Score: 1

You can use a custom groupby:

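import pandas as pd

# True for the rows to keep (non-zero in "two")
m = df['two'].ne(0)
out = (df[m]
        # a new group starts at each first kept row after a run of zeros
        .groupby((m & ~m.shift(fill_value=False)).cumsum(), group_keys=False)
        # append 2 blank rows after every group (including the last one)
        .apply(lambda g: pd.concat([g, pd.DataFrame('', columns=g.columns, index=[0, 1])]))
        .reset_index(drop=True)
      )
print(out.to_string(index=False))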
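The group key is the least obvious part of the groupby version, so here is a small sketch that prints its intermediate steps for the "two" column of the question. The names `starts`, `group_id` and `steps` are only illustrative and do not appear in the answer.

```python
import pandas as pd

# Only the "two" column from the question is needed to build the key.
df = pd.DataFrame({"two": [4, 5, 7, 0, 0, 0, 0, 2, 5, 3, 5, 0, 0, 5]})

m = df["two"].ne(0)                      # True where the row is kept
starts = m & ~m.shift(fill_value=False)  # True on the first row of each non-zero block
group_id = starts.cumsum()               # running count of blocks started so far

steps = pd.DataFrame({"two": df["two"], "m": m, "starts": starts, "group_id": group_id})
print(steps)
```

The zero rows also receive a `group_id`, but they are filtered out by `df[m]` before grouping, so each group contains exactly one block of consecutive kept rows.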

Or with a hacky repeat:

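N = 2
# True for the rows to keep (non-zero in "two")
m1 = df['two'].ne(0)
# True on the last kept row before each run of zeros
m2 = (m1 & ~m1.shift(-1, fill_value=True))
# repeat those rows N extra times, then blank out the extra copies
idx = df.index[m1].repeat(m2[m1]*N+1)
out = df.loc[idx].astype(object)  # object dtype so '' can be stored
out[out.index.duplicated()] = ''
print(out.to_string(index=False))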

Output:

one two three
  1   4     4
  3   5     5
  5   7     5
             
             
 12   2     8
 23   5     9
  2   3     7
  1   5     5
             
             
  3   5     5
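If this is needed for more than one column or gap size, the same idea can be wrapped in a small helper. The sketch below is not from the answer: `drop_zero_runs` and its parameters are hypothetical names, and it deliberately adds blank rows only between kept blocks, so a run of zeros at the very start or end of the data leaves no blank rows.

```python
import pandas as pd

def drop_zero_runs(df, col, n_blank=2):
    """Drop rows where `col` is 0 and put `n_blank` empty rows between the kept blocks."""
    keep = df[col].ne(0)
    # Consecutive kept rows share one id; the id increases after each run of zeros.
    block_id = (keep & ~keep.shift(fill_value=False)).cumsum()
    blank = pd.DataFrame("", index=range(n_blank), columns=df.columns)

    pieces = []
    for _, block in df[keep].groupby(block_id[keep]):
        if pieces:                       # separator only between blocks
            pieces.append(blank)
        pieces.append(block.astype(object))
    if not pieces:                       # every row was zero
        return df.iloc[0:0].copy()
    return pd.concat(pieces, ignore_index=True)

# Sample data from the question.
df = pd.DataFrame({
    "one":   [1, 3, 5, 666, 785, 455, 454, 12, 23, 2, 1, 123, 123, 3],
    "two":   [4, 5, 7, 0, 0, 0, 0, 2, 5, 3, 5, 0, 0, 5],
    "three": [4, 5, 5, 6, 8, 9, 9, 8, 9, 7, 5, 7, 7, 5],
})
out = drop_zero_runs(df, "two")
print(out.to_string(index=False))
```

Note that the result has object dtype because numbers and empty strings share the same columns, so it is best treated as a display or export step rather than something to compute on.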

Answer 2

Score: 0

Here is another way:

pd.Index.union() sorts its result by default (see its sort option), so after reindexing the rows should come out in the correct order.

m = df['two'].ne(0)
df.reindex(df.loc[m].index.union(df.loc[m.diff().ne(0) & ~m].index.repeat(2) + .5)).reset_index(drop=True)

Output:

     one  two  three
0    1.0  4.0    4.0
1    3.0  5.0    5.0
2    5.0  7.0    5.0
3    NaN  NaN    NaN
4    NaN  NaN    NaN
5   12.0  2.0    8.0
6   23.0  5.0    9.0
7    2.0  3.0    7.0
8    1.0  5.0    5.0
9    NaN  NaN    NaN
10   NaN  NaN    NaN
11   3.0  5.0    5.0
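The one-liner is dense, so here is the same logic split into named steps. This is a sketch under the assumption that `df` has the default 0..n-1 integer index; the names `run_starts`, `placeholders` and `new_index` are illustrative only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "one":   [1, 3, 5, 666, 785, 455, 454, 12, 23, 2, 1, 123, 123, 3],
    "two":   [4, 5, 7, 0, 0, 0, 0, 2, 5, 3, 5, 0, 0, 5],
    "three": [4, 5, 5, 6, 8, 9, 9, 8, 9, 7, 5, 7, 7, 5],
})

m = df["two"].ne(0)                              # rows to keep
run_starts = df.loc[m.diff().ne(0) & ~m].index   # first row of each run of zeros
placeholders = run_starts.repeat(2) + 0.5        # fractional labels that do not exist in df
new_index = df.loc[m].index.union(placeholders)  # kept labels plus placeholders, sorted
out = df.reindex(new_index).reset_index(drop=True)
print(out)
```

Because the placeholder labels match no row of `df`, `reindex` fills them with NaN, which is also why the integer columns are shown as floats in the output above.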
