Pandas extract only capturing first character
Question
I want to use extract to capture all the characters that match a regular expression and put them in another column. When I run the code below, it only captures the first character. I want to capture all the letters except W, with no digits or dashes.
Here's the code:
df["type"] = df["callsign"].str.extract(r'([^W0-9-])')
Currently the DataFrame shows the following result:
callsign | type |
---|---|
1AB3-W9 | A |
23DC-W0 | D |
But I need it to produce:
callsign | type |
---|---|
1AB3-W9 | AB |
23DC-W0 | DC |
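A minimal reproduction, assuming a DataFrame rebuilt from just the two callsigns in the tables above, shows why only one character comes back: str.extract searches each row once and returns the first match, and the group [^W0-9-] has no quantifier, so it matches exactly one character. (expand=False is added here only so the result is a Series rather than a one-column DataFrame.)

```python
import pandas as pd

# Hypothetical frame rebuilt from the two example rows in the question.
df = pd.DataFrame({"callsign": ["1AB3-W9", "23DC-W0"]})

# extract() stops at the first match in each row, and the character class
# [^W0-9-] without a quantifier matches a single character.
df["type"] = df["callsign"].str.extract(r'([^W0-9-])', expand=False)
print(df)
```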
Answer 1
Score: 1
Don't use extract(); use replace() to replace the unwanted characters with an empty string.
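# regex=True is required on pandas >= 2.0, where str.replace matches the pattern literally by default
df["type"] = df["callsign"].str.replace(r'[W0-9-]', '', regex=True)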
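A self-contained sketch of this approach, assuming pandas 2.x: from 2.0 onward Series.str.replace treats the pattern as a literal string unless regex=True is passed, so the flag is spelled out here. The two callsigns are the ones from the question.

```python
import pandas as pd

df = pd.DataFrame({"callsign": ["1AB3-W9", "23DC-W0"]})

# Delete every W, digit and dash; whatever is left becomes the type.
# regex=True makes pandas interpret [W0-9-] as a character class.
df["type"] = df["callsign"].str.replace(r'[W0-9-]', '', regex=True)
print(df["type"].tolist())  # ['AB', 'DC']
```

Because it deletes the unwanted characters instead of extracting the wanted ones, this keeps every remaining letter even when digits split them up (a hypothetical callsign A1B-W2 would become AB).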
Answer 2
Score: 1
Assuming you want to extract the letters right before the -W, use:
df["type"] = df["callsign"].str.extract(r'([a-zA-Z]+)-W')
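The assumption matters for the question's data. A quick check, assuming the same two example callsigns: the pattern only matches when the letters sit directly in front of -W, so 23DC-W0 yields DC, while 1AB3-W9 (where the digit 3 sits between B and -W) does not match and yields NaN.

```python
import pandas as pd

df = pd.DataFrame({"callsign": ["1AB3-W9", "23DC-W0"]})

# ([a-zA-Z]+)-W needs the letters immediately before "-W".
# In "1AB3-W9" the digit 3 breaks that, so the row gets NaN.
df["type"] = df["callsign"].str.extract(r'([a-zA-Z]+)-W', expand=False)
print(df["type"].tolist())
```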
For the first set of letters that are not W, you're missing a +:
df["callsign"].str.extract(r'([^W0-9-]+)')
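Adding + lets the group match a whole run of allowed characters, which gives AB and DC for the two example rows. A sketch, with a third callsign A1B-W2 invented purely to illustrate the limit: extract still returns only the first run per row, so letters separated by a digit are cut off.

```python
import pandas as pd

# "A1B-W2" is a hypothetical callsign added to show the limitation.
df = pd.DataFrame({"callsign": ["1AB3-W9", "23DC-W0", "A1B-W2"]})

# [^W0-9-]+ grabs one contiguous run of allowed characters,
# but extract() still returns only the first such run in each row.
df["type"] = df["callsign"].str.extract(r'([^W0-9-]+)', expand=False)
print(df["type"].tolist())  # ['AB', 'DC', 'A']
```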
Answer 3
Score: 0
As an alternative to the other valid answers, you can use findall:
df["callsign"].str.findall(r'([^W0-9-])')
This gives you a list with all the matches, which you can then join:
df["type"] = df["callsign"].str.findall(r'([^W0-9-])').str.join("")
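A runnable sketch of the findall approach, alongside an equivalent variant built on str.extractall (included here as an assumption of this sketch, not as part of the answer): extractall returns one row per match under a (row, match) MultiIndex, and grouping on level 0 joins the matches back together per original row. The hypothetical callsign A1B-W2 shows that both collect every allowed character, not just the first run.

```python
import pandas as pd

df = pd.DataFrame({"callsign": ["1AB3-W9", "23DC-W0", "A1B-W2"]})

# findall -> list of all single-character matches per row; str.join glues them.
via_findall = df["callsign"].str.findall(r'[^W0-9-]').str.join("")

# extractall -> one row per match; column 0 holds the captured group.
# Grouping by the original row label (level 0) and joining rebuilds each string.
via_extractall = (
    df["callsign"]
    .str.extractall(r'([^W0-9-])')[0]
    .groupby(level=0)
    .agg("".join)
)

print(via_findall.tolist())     # ['AB', 'DC', 'AB']
print(via_extractall.tolist())  # ['AB', 'DC', 'AB']
```

One difference: a row with no matching character is dropped from the extractall result (calling .reindex(df.index) on it would restore that row as NaN), whereas findall followed by str.join gives an empty string for it.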