Pandas: Dropping rows and leaving 2 blank lines every time a run of 0s appears
Question
I have a big chunk of data with a lot of columns, and at some points these columns contain runs of 0. Every time a run of 0s appears in the column "two", I want to drop those rows and leave 2 blank lines in their place.
one two three
1 4 4
3 5 5
5 7 5
666 0 6
785 0 8
455 0 9
454 0 9
12 2 8
23 5 9
2 3 7
1 5 5
123 0 7
123 0 7
3 5 5
Desired output:
one two three
1 4 4
3 5 5
5 7 5


12 2 8
23 5 9
2 3 7
1 5 5


3 5 5
I tried different functions (split, groupby, drop with conditions, ...), but none of them did what I need (most probably because I suck at coding).
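For anyone who wants to reproduce this, here is a minimal sketch that builds the sample data as a DataFrame. The variable name `df` is an assumption, chosen because the answers refer to a frame by that name; the values are copied from the table in the question, and `zero_rows` is just an illustrative helper.

```python
import pandas as pd

# Sample data copied from the question; `df` is the name the answers assume.
df = pd.DataFrame({
    "one":   [1, 3, 5, 666, 785, 455, 454, 12, 23, 2, 1, 123, 123, 3],
    "two":   [4, 5, 7, 0, 0, 0, 0, 2, 5, 3, 5, 0, 0, 5],
    "three": [4, 5, 5, 6, 8, 9, 9, 8, 9, 7, 5, 7, 7, 5],
})

# Index labels of the rows where "two" is 0, i.e. the rows that should disappear.
zero_rows = df.index[df["two"].eq(0)].tolist()
print(zero_rows)
```

The zero rows form two separate runs, so the result should contain two gaps of 2 blank lines each.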
Answer 1
Score: 1
You can use a custom groupby:
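import pandas as pd

# True for the rows to keep (non-zero values in "two")
m = df['two'].ne(0)
# (m & ~m.shift(...)) flags the first row of each non-zero run; cumsum() turns the flags into group ids
out = (df[m]
       .groupby((m & ~m.shift(fill_value=False)).cumsum(), group_keys=False)
       # append two empty rows after every group (including the last one)
       .apply(lambda g: pd.concat([g, pd.DataFrame('', columns=g.columns, index=[0, 1])]))
       .reset_index(drop=True)
      )
print(out.to_string(index=False))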
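To make the grouping key less opaque, here is a small sketch that prints its intermediate values for the sample column. The names `two`, `starts` and `key` are only illustrative and are not part of the answer's code.

```python
import pandas as pd

# Column "two" from the question; the other columns do not affect the grouping.
two = pd.Series([4, 5, 7, 0, 0, 0, 0, 2, 5, 3, 5, 0, 0, 5], name="two")

m = two.ne(0)                             # True where the row is kept
starts = m & ~m.shift(fill_value=False)   # True on the first row of each non-zero run
key = starts.cumsum()                     # run number, shared by all rows of a run

# Only the kept rows matter, because the groupby is applied to df[m].
print(key[m].tolist())
```

Each distinct value of `key[m]` is one block of consecutive non-zero rows, and the two empty rows are appended once per block.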
Or with a hacky repeat:
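N = 2  # number of blank rows to insert
m1 = df['two'].ne(0)  # rows to keep
# first row of every non-zero run, except a run that starts at the very top
m2 = (m1 & ~m1.shift(fill_value=True))
# repeat those rows N+1 times, every other kept row once
idx = df.index[m1].repeat(m2[m1]*N+1)
# cast to object so that '' can be stored in the numeric columns
out = df.loc[idx].astype(object)
# blank every copy except the last one, so the empty rows come before the real row
out[out.index.duplicated(keep='last')] = ''
print(out.to_string(index=False))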
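To see what the repeat trick is doing, this sketch prints the repeated index labels for the sample column. It assumes the default integer index from the question; `two` and `counts` are illustrative names that do not appear in the answer.

```python
import pandas as pd

two = pd.Series([4, 5, 7, 0, 0, 0, 0, 2, 5, 3, 5, 0, 0, 5], name="two")

N = 2
m1 = two.ne(0)
m2 = m1 & ~m1.shift(fill_value=True)   # run starts, ignoring a run at the very top
counts = m2[m1] * N + 1                # N + 1 for run starts, 1 for every other kept row
idx = two.index[m1].repeat(counts.to_numpy())

print(idx.tolist())
```

The labels that follow a run of zeros show up N + 1 times; N of those copies are then overwritten with empty strings, which is where the blank lines come from. Note that this relies on the original index being unique.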
Output:
one two three
1 4 4
3 5 5
5 7 5


12 2 8
23 5 9
2 3 7
1 5 5


3 5 5
Answer 2
Score: 0
Here is another way: pd.Index.union() has a sort option, so the combined index is sorted and, after reindexing, the rows come out in the correct order.
m = df['two'].ne(0)
df.reindex(df.loc[m].index.union(df.loc[m.diff().ne(0) & ~m].index.repeat(2) + .5)).reset_index(drop=True)
Output:
one two three
0 1.0 4.0 4.0
1 3.0 5.0 5.0
2 5.0 7.0 5.0
3 NaN NaN NaN
4 NaN NaN NaN
5 12.0 2.0 8.0
6 23.0 5.0 9.0
7 2.0 3.0 7.0
8 1.0 5.0 5.0
9 NaN NaN NaN
10 NaN NaN NaN
11 3.0 5.0 5.0
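The one-liner above is dense, so here is a step-by-step sketch of the same idea on the sample data. The intermediate names (`first_zero`, `gaps`, `new_index`, `blank_positions`) are only illustrative, and the sketch assumes the default integer index, so that a label like 3.5 sorts between the kept labels 2 and 7.

```python
import pandas as pd

df = pd.DataFrame({
    "one":   [1, 3, 5, 666, 785, 455, 454, 12, 23, 2, 1, 123, 123, 3],
    "two":   [4, 5, 7, 0, 0, 0, 0, 2, 5, 3, 5, 0, 0, 5],
    "three": [4, 5, 5, 6, 8, 9, 9, 8, 9, 7, 5, 7, 7, 5],
})

m = df["two"].ne(0)
# First row of each zero run: the mask changes there and the row itself is zero.
first_zero = m.diff().ne(0) & ~m
# Two fractional labels per zero run; they do not exist in df, so reindex fills them with NaN.
gaps = df.index[first_zero].repeat(2) + 0.5
# union() sorts the combined labels, which puts the gaps in the right place.
new_index = df.index[m].union(gaps)

out = df.reindex(new_index).reset_index(drop=True)
blank_positions = out.index[out["two"].isna()].tolist()
print(blank_positions)
```

Unlike the first answer, the gaps here are NaN rows rather than empty strings, which is why the numeric columns are shown as floats in the output.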