July 3, 2023, 21:35:15

How to identify all-upper-case strings and move them

Question


I have created this pandas dataframe:

import pandas as pd

ds = {"col1": ["ROSSI Mauro", "Luca Giacomini", "Sonny Crockett"]}
df = pd.DataFrame(data=ds)

Which looks like this:

print(df)
             col1
0     ROSSI Mauro
1  Luca Giacomini
2  Sonny Crockett

The column col1 contains first names and last names, in varying order.
If a word is in all upper case (for example, ROSSI in record 0), then it is a last name, and I need to move it after the word that is not all upper case.

So, the resulting dataframe would look like this:

             col1
0     Mauro ROSSI
1  Luca Giacomini
2  Sonny Crockett

Does anyone know how to identify the all-upper case string in col1 and move it after the non all-upper case string?

Answer 1

Score: 3


You can use str.replace with a custom function:

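# regex=True is needed: since pandas 2.0, str.replace treats the pattern
# as a literal string by default, and a callable replacement requires a regex
df['col1'] = df['col1'].str.replace(r'(\S+)\s*(\S+)',
                                    lambda m: f'{m.group(2)} {m.group(1)}'
                                    if m.group(1).isupper() else m.group(0),
                                    regex=True)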

Or use a temporary DataFrame and boolean indexing with str.isupper:

tmp = df['col1'].str.extract(r'(\S+)\s*(\S+)')
df.loc[tmp[0].str.isupper(), 'col1'] = tmp[1] + ' ' + tmp[0]

NB. This assumes that each name consists of exactly two words; if not, you need to adapt the regex accordingly.

Output:

             col1
0     Mauro ROSSI
1  Luca Giacomini
2  Sonny Crockett
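
For names with more than two words, one possible adaptation is to skip the regex and split on whitespace instead. The following is only a sketch under that assumption: the helper name move_upper_last and the third sample name are made up for illustration and are not part of the original data.

```python
def move_upper_last(name: str) -> str:
    """Move all-uppercase words after the other words, keeping relative order."""
    words = name.split()
    upper = [w for w in words if w.isupper()]      # assumed to be last names
    other = [w for w in words if not w.isupper()]  # everything else
    return " ".join(other + upper)


# "DE LUCA Anna Maria" is a hypothetical multi-word example
names = ["ROSSI Mauro", "Luca Giacomini", "DE LUCA Anna Maria"]
print([move_upper_last(n) for n in names])
# ['Mauro ROSSI', 'Luca Giacomini', 'Anna Maria DE LUCA']
```

On the dataframe this could be applied with df['col1'].map(move_upper_last). Note that str.isupper is also true for upper-case initials such as "J.", so data containing initials would need an extra check.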

Answer 2

Score: 3


We can also use captured groups with regex in str.replace:

df['col1 new'] = df['col1'].str.replace(r'([A-Z]+)\b\s*(.*)', r'\2 \1',
                                        regex=True)

Output:

             col1        col1 new
0     ROSSI Mauro     Mauro ROSSI
1  Luca Giacomini  Luca Giacomini
2  Sonny Crockett  Sonny Crockett

Using () to create capture groups and \b as a word boundary, we can reorder the groups with \2 and \1. With more complex data, you will probably have to adjust the regex.
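
To see what the capture groups hold, the same idea can be tried with Python's built-in re module on single strings. This is a minimal sketch, not the answer's exact code: the \s* between the groups is an assumption added here so that the separating space is not captured into the second group.

```python
import re

# Group 1: a run of capitals ending at a word boundary; group 2: the rest.
# The \s* in between is an addition for this sketch, to drop the separator.
pattern = re.compile(r'([A-Z]+)\b\s*(.*)')

m = pattern.search("ROSSI Mauro")
print(m.groups())                             # ('ROSSI', 'Mauro')
print(pattern.sub(r'\2 \1', "ROSSI Mauro"))   # Mauro ROSSI

# In "Luca", the capital L is followed by a word character, so \b fails
# and the string is left unchanged.
print(pattern.search("Luca Giacomini"))       # None
```

Because [A-Z] only covers ASCII capitals, last names with accented upper-case letters would not match this pattern.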
