How to identify all-UPPER-case strings and move them after the other words
Question
I have created this pandas dataframe:
import pandas as pd
ds = {"col1": ["ROSSI Mauro", "Luca Giacomini", "Sonny Crockett"]}
df = pd.DataFrame(data=ds)
It looks like this:
print(df)
col1
0 ROSSI Mauro
1 Luca Giacomini
2 Sonny Crockett
Let's take a look at the column col1, which contains first names and last names (in varying order).
If a string is all upper case (for example, ROSSI in record 0), then it is a last name, and I need to move it after the string that is not all upper case.
So, the resulting dataframe would look like this:
col1
0 Mauro ROSSI
1 Luca Giacomini
2 Sonny Crockett
Does anyone know how to identify the all-upper case string in col1 and move it after the non all-upper case string?
Answer 1
Score: 3
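You can use str.replace with a custom function:
df['col1'] = df['col1'].str.replace(
    r'(\S+)\s*(\S+)',
    # swap the two words only when the first one is all upper case
    lambda m: f'{m.group(2)} {m.group(1)}' if m.group(1).isupper() else m.group(0),
    regex=True,  # a callable replacement needs regex=True (not the default in pandas >= 2.0)
)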
Or use a temporary DataFrame built with str.extract, plus boolean indexing with str.isupper:
tmp = df['col1'].str.extract(r'(\S+)\s*(\S+)')
df.loc[tmp[0].str.isupper(), 'col1'] = tmp[1] + ' ' + tmp[0]
NB. This assumes that each name consists of exactly 2 distinct words; if not, you need to adapt the regex accordingly (regex demo).
Output:
col1
0 Mauro ROSSI
1 Luca Giacomini
2 Sonny Crockett
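If some names have more than two words, which the note above flags as a limitation, one possible alternative is to split each name into tokens instead of using a fixed two-group regex. The following is only a sketch: the helper name move_upper_last and the third sample name are made up for illustration, and it assumes every all-upper-case token belongs to the last name (so an initial such as "J." would also be moved).

```python
import pandas as pd

def move_upper_last(name: str) -> str:
    """Put all-upper-case tokens after the other tokens, keeping relative order."""
    tokens = name.split()
    upper = [t for t in tokens if t.isupper()]      # assumed to be last-name parts
    other = [t for t in tokens if not t.isupper()]  # everything else
    return " ".join(other + upper)

# "DE LUCA Anna Maria" is a hypothetical multi-word example, not from the question
df = pd.DataFrame({"col1": ["ROSSI Mauro", "Luca Giacomini", "DE LUCA Anna Maria"]})
df["col1"] = df["col1"].map(move_upper_last)
print(df["col1"].tolist())
```

Rows that contain no all-upper-case token come back unchanged, so the function can be mapped over the whole column without a mask.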
Answer 2
Score: 3
We can also use capture groups with a regex in str.replace:
# \s* consumes the separating space so the result has no leading space
df['col1 new'] = df['col1'].str.replace(r'([A-Z]+)\b\s*(.*)', r'\2 \1', regex=True)
Output:
             col1        col1 new
0     ROSSI Mauro     Mauro ROSSI
1  Luca Giacomini  Luca Giacomini
2  Sonny Crockett  Sonny Crockett
The parentheses () create capture groups and \b marks a word boundary, so \2 and \1 in the replacement reorder the groups. With more complex data, you'll probably have to adjust your regex.
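As one example of such an adjustment, the pattern can be anchored so that only a leading all-caps word followed by at least one more word is moved. This is a sketch with plain re rather than the answer's exact code; the names PATTERN and reorder are illustrative, and it assumes surnames use only the ASCII letters A-Z, as in the sample data.

```python
import re

# ^...$ anchors the match to the whole string; \s+ requires a second word,
# so a lone "ROSSI" or an already ordered "Mauro ROSSI" is left untouched.
PATTERN = re.compile(r'^([A-Z]+)\b\s+(.+)$')

def reorder(name: str) -> str:
    # \2 and \1 refer to the second and first capture groups
    return PATTERN.sub(r'\2 \1', name)

for name in ["ROSSI Mauro", "Luca Giacomini", "Mauro ROSSI"]:
    print(repr(reorder(name)))
```

The same pattern string can be passed to df['col1'].str.replace(..., regex=True) to apply it to the whole column.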