How to drop a row in Pandas if the third letter in a column is W?
Question
I have a dataframe of this kind:
ID | Jan | Feb | Mar |
---|---|---|---|
20WAST | 2 | 2 | 5 |
20S22 | 0 | 0 | 1 |
20W1ST | 2 | 2 | 5 |
200122 | 0 | 0 | 1 |
I want to drop every row where the third character of the first column (ID) is 'W', giving this output:
ID | Jan | Feb | Mar |
---|---|---|---|
20S22 | 0 | 0 | 1 |
200122 | 0 | 0 | 1 |
It is a very large dataframe and I tried doing something like this:
df[df.ID[2] != 'W']
But this only selects the item in the second row. I could potentially iterate over the dataframe but wanted to see if there was a better option.
Answer 1
Score: 4
You are almost there. Use:
df = df[df['ID'].str[2].ne('W')]
You might want to reset the index after this selection, for example with df.reset_index(drop=True).
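For reference, here is a minimal, self-contained sketch of this approach on the sample data from the question. The variable names `mask` and `filtered` are only illustrative. The original attempt, `df.ID[2]`, looks up the single element whose index label is 2, whereas `.str[2]` takes the third character of every value in the column.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["20WAST", "20S22", "20W1ST", "200122"],
    "Jan": [2, 0, 2, 0],
    "Feb": [2, 0, 2, 0],
    "Mar": [5, 1, 5, 1],
})

# .str[2] extracts the third character of each ID;
# .ne("W") is the element-wise "not equal" comparison.
mask = df["ID"].str[2].ne("W")

# Keep only the rows where the mask is True, then renumber the index from 0.
filtered = df[mask].reset_index(drop=True)
print(filtered)
```

This stays vectorized, so it should scale to a large dataframe without an explicit Python loop.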
Answer 2
Score: 0
You can use a regular expression to test the third character:
out = df[df['ID'].str.contains('^.{2}(?!W)')]
# or
out = df[df['ID'].str.match('.{2}(?!W)')]
# or
out = df[df['ID'].str.match('.{2}[^W]')]
NOTE: the difference between str.contains and str.match is that str.match only matches from the beginning of the string, which is why the str.contains version needs the explicit ^ anchor.
print(out)
       ID  Jan  Feb  Mar
1   20S22    0    0    1
3  200122    0    0    1
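As a quick check, the sketch below builds the boolean mask for each of the three patterns on the question's IDs and compares them with the plain `.str[2]` slice. The extra `short` series containing the two-character ID "20" is a hypothetical value, not part of the original data, added only to show where the patterns can differ.

```python
import pandas as pd

# IDs copied from the question.
ids = pd.Series(["20WAST", "20S22", "20W1ST", "200122"])

# The three regex variants from this answer, written as raw strings.
by_contains = ids.str.contains(r"^.{2}(?!W)")
by_match_lookahead = ids.str.match(r".{2}(?!W)")
by_match_class = ids.str.match(r".{2}[^W]")

# Plain slicing, for comparison.
by_slice = ids.str[2].ne("W")

print(by_contains.tolist())
print(by_match_class.tolist())

# Hypothetical edge case: an ID with only two characters.
short = pd.Series(["20"])
keeps_short = short.str.match(r".{2}(?!W)")   # lookahead: nothing follows, so it matches
drops_short = short.str.match(r".{2}[^W]")    # character class: requires a third character
```

On the sample data all four masks agree. They diverge only for IDs shorter than three characters: the lookahead patterns keep such rows, while `[^W]` drops them because it needs an actual third character to match.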