Pandas extract only capturing first character
Question
df["type"] = df["callsign"].str.extractall(r'([^W0-9-])').groupby(level=0).apply(''.join)
I want to extract all characters that match a regular expression and put them in another column. When I run the code below, it captures only the first character. I want to capture all the letters except W, and no digits or dashes.
Here's the code:
df["type"] = df["callsign"].str.extract(r'([^W0-9-])')
Currently the DataFrame shows the following result:
| callsign | type |
|---|---|
| 1AB3-W9 | A |
| 23DC-W0 | D |
But I need it to produce:
| callsign | type |
|---|---|
| 1AB3-W9 | AB |
| 23DC-W0 | DC |
Answer 1
Score: 1
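Don't use extract(); use replace() to replace the unwanted characters with an empty string. Pass regex=True so the pattern is treated as a regular expression (in pandas 2.0 and later the default is a literal string match).
df["type"] = df["callsign"].str.replace(r'[W0-9-]', '', regex=True)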
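A minimal sketch of the replace() approach, run on the question's sample data. The two-row DataFrame is rebuilt here from the table in the question, so treat it as an assumption about the real data.

```python
import pandas as pd

# Sample data reconstructed from the question's table (an assumption).
df = pd.DataFrame({"callsign": ["1AB3-W9", "23DC-W0"]})

# Delete every W, digit, and dash; whatever is left is the "type".
# regex=True is explicit because pandas >= 2.0 defaults to a literal match.
df["type"] = df["callsign"].str.replace(r"[W0-9-]", "", regex=True)

print(df)
```

Because this removes unwanted characters rather than capturing wanted ones, it also keeps letters that are not adjacent to each other in the callsign.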
Answer 2
Score: 1
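Assuming you want to extract the letters right before the -W, use the pattern below. Note that it only matches when the letters are immediately followed by -W, so a value like 1AB3-W9 (where a digit sits in between) yields no match.
df["type"] = df["callsign"].str.extract(r'([a-zA-Z]+)-W')
To capture the first run of characters that are not W, digits, or dashes, your pattern is only missing a +:
df["callsign"].str.extract(r'([^W0-9-]+)')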
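The two patterns behave differently on the question's sample rows, which the sketch below tries to make visible. The DataFrame is rebuilt from the question's table, and the variable names `before_w` and `first_run` are only illustrative. `expand=False` is used so that each call returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample data reconstructed from the question's table (an assumption).
df = pd.DataFrame({"callsign": ["1AB3-W9", "23DC-W0"]})

# Letters that sit directly in front of "-W".
# "1AB3-W9" has a digit between "AB" and "-W", so this row gets NaN.
before_w = df["callsign"].str.extract(r"([a-zA-Z]+)-W", expand=False)

# First consecutive run of characters that are not W, a digit, or a dash.
first_run = df["callsign"].str.extract(r"([^W0-9-]+)", expand=False)

df["type"] = first_run
print(before_w)
print(df)
```

Keep in mind that extract() returns only the first match per row, so the second pattern would stop at the first run of letters if a callsign had letters separated by digits.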
Answer 3
Score: 0
As an alternative to the other valid answers, you can use findall:
df["callsign"].str.findall(r'([^W0-9-])')
This gives you a list with all the matches for each row, which you can then join:
df["type"] = df["callsign"].str.findall(r'([^W0-9-])').str.join("")
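The two steps can be sketched on the question's sample data as follows. The DataFrame is rebuilt from the question's table and the name `matches` is only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question's table (an assumption).
df = pd.DataFrame({"callsign": ["1AB3-W9", "23DC-W0"]})

# Step 1: one list per row holding every single-character match.
matches = df["callsign"].str.findall(r"([^W0-9-])")

# Step 2: concatenate each row's list into one string.
df["type"] = matches.str.join("")

print(matches.tolist())
print(df)
```

Unlike extract(), findall() collects every match in the string, which is why the single-character pattern works here without a +.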