February 14, 2023 02:06:08

Pandas DF: Create New Col by removing last word from existing column

Question

This should be easy, but I'm stumped.
I have a df that includes a column of PLACENAMES. Some of these have multiple word names:

Able County
Baker County
Charlie County
St. Louis County

All I want to do is to create a new column in my df that has just the name, without the "county" word:

Able
Baker
Charlie
St. Louis

I've tried a variety of things:

1. places['name_split'] = places['PLACENAME'].str.split()
2. places['name_split'] = places['PLACENAME'].str.split()[:-1]
3. places['name_split'] = places['PLACENAME'].str.rsplit(' ',1)[0]
4. places = places.assign(name_split = lambda x: ' '.join(x['PLACENAME'].str.split()[:-1]))

1. Works - splits the names into a list ['St.','Louis','County']
2. The list slice is ignored, resulting in the same list ['St.','Louis','County'] rather than ['St.','Louis']
3. Raises a ValueError: Length of values (2) does not match length of index (41414)
4. Raises a TypeError: sequence item 0: expected str instance, list found

I've also defined a function and called it with .assign():

def processField(namelist):
  words = namelist[:-1]
  name = ' '.join(words)
  return name
places = places.assign(name_split = lambda x: processField(x['PLACENAME']))

This also raises a TypeError: sequence item 0: expected str instance, list found

This seems to be a very simple goal and I've probably overthought it, but I'm just stumped. Suggestions about what I should be doing would be deeply appreciated.
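
Note on the attempts above: in 2 and 3 the [...] indexing is applied to the Series returned by .str.split(), not to each row's list, so [:-1] drops the last row and [0] picks the row labelled 0. In 4 and in processField, ' '.join receives a Series of lists rather than a list of strings. A minimal sketch of the element-wise version, assuming a small sample frame built from the names in the question (the sample data is an assumption, not the asker's real 41414-row df):

```python
import pandas as pd

# Hypothetical sample frame using the place names from the question.
places = pd.DataFrame(
    {"PLACENAME": ["Able County", "Baker County", "Charlie County", "St. Louis County"]}
)

# .str.split() yields one list per row, .str[:-1] slices each row's list,
# and .str.join(" ") glues the remaining words back together.
places["name_split"] = places["PLACENAME"].str.split().str[:-1].str.join(" ")

print(places["name_split"].tolist())
```

Each step goes through the .str accessor, so it runs per row instead of on the Series as a whole.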

Answer 1

Score: 1

Use Series.str.rpartition, which splits each string at the last occurrence of the separator (a space by default), and take the first part:

places['name_split'] = places['PLACENAME'].str.rpartition()[0]
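
To see what rpartition returns before column 0 is selected, here is a small sketch. The sample frame is an assumption for illustration, and it includes a one-word name to show an edge case the question does not mention:

```python
import pandas as pd

# Hypothetical sample data; "Able" has no space at all.
places = pd.DataFrame({"PLACENAME": ["Able County", "St. Louis County", "Able"]})

# With the defaults (sep=" ", expand=True) the result is a DataFrame with
# columns 0, 1, 2: text before the last space, the space, the last word.
parts = places["PLACENAME"].str.rpartition()
places["name_split"] = parts[0]

print(parts)
print(places)
```

When a value contains no space, the whole string ends up in column 2 and column 0 is an empty string, so single-word names come out empty with this approach.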

Answer 2

Score: 1

Use str.replace to remove the last word and the preceding spaces:

places['new'] = places['PLACENAME'].str.replace(r'\s*\w+$', '', regex=True)
# or
places['new'] = places['PLACENAME'].str.replace(r'\s*\S+$', '', regex=True)
# or, only match 'County'
places['new'] = places['PLACENAME'].str.replace(r'\s*County$', '', regex=True)

Output:

          PLACENAME        new
0       Able County       Able
1      Baker County      Baker
2    Charlie County    Charlie
3  St. Louis County  St. Louis
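
The generic patterns and the 'County'-only pattern behave differently on names that do not end in "County". A rough sketch of the difference, assuming a sample frame with one such name ("Carson City" is an illustrative value, not from the question):

```python
import pandas as pd

# Hypothetical sample data; the last row does not end in "County".
places = pd.DataFrame(
    {"PLACENAME": ["Able County", "St. Louis County", "Carson City"]}
)

# Drops whatever the last whitespace-delimited word is.
places["any_last"] = places["PLACENAME"].str.replace(r"\s*\S+$", "", regex=True)

# Drops the last word only when it is literally "County".
places["county_only"] = places["PLACENAME"].str.replace(r"\s*County$", "", regex=True)

print(places)
```

If every value in the column is known to end in "County", the variants are interchangeable; otherwise the 'County'-only pattern is the safer choice because it leaves other names untouched.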
