Pandas DF: Create a new column by removing the last word of an existing column

Question

This should be easy, but I'm stumped.
I have a df that includes a column of PLACENAMES. Some of these are multi-word names:

Able County
Baker County
Charlie County
St. Louis County

All I want to do is to create a new column in my df that has just the name, without the "county" word:

Able
Baker
Charlie
St. Louis

I've tried a variety of things:

1. places['name_split'] = places['PLACENAME'].str.split()
2. places['name_split'] = places['PLACENAME'].str.split()[:-1]
3. places['name_split'] = places['PLACENAME'].str.rsplit(' ',1)[0]
4. places = places.assign(name_split = lambda x: ' '.join(x['PLACENAME'].str.split()[:-1]))

Results:

  1. Works - splits the names into a list ['St.','Louis','County']
  2. The list slice is ignored, resulting in the same list ['St.','Louis','County'] rather than ['St.','Louis']
  3. Raises a ValueError: Length of values (2) does not match length of index (41414)
  4. Raises a TypeError: sequence item 0: expected str instance, list found

I've also defined a function and called it with .assign():

def processField(namelist):
  words = namelist[:-1]
  name = ' '.join(words)
  return name

places = places.assign(name_split = lambda x: processField(x['PLACENAME']))

This also raises a TypeError: sequence item 0: expected str instance, list found

This seems to be a very simple goal and I've probably overthought it, but I'm just stumped. Suggestions about what I should be doing would be deeply appreciated.
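
A note on why attempts 2 to 4 behave as described: .str.split() returns a Series in which every row holds a list, so plain indexing such as [:-1] or [0] selects rows of that Series rather than items inside each list, and ' '.join(...) ends up iterating over lists instead of strings. The sketch below reproduces this on the four sample names from the question (the small DataFrame and the name_rsplit column are made up for illustration) and shows one possible fix that goes through the .str accessor a second time.

```python
import pandas as pd

# Illustrative sample data taken from the question text.
places = pd.DataFrame(
    {"PLACENAME": ["Able County", "Baker County", "Charlie County", "St. Louis County"]}
)

# A Series of lists: one list of words per row.
words = places["PLACENAME"].str.split()

# Indexing the Series itself acts on rows, not on the lists inside it.
rows_without_last = words[:-1]   # drops the last ROW, every list is unchanged
first_row = words[0]             # the whole list stored in row 0

# Going through .str again applies the slice and the join to each row's list.
places["name_split"] = words.str[:-1].str.join(" ")

# Alternative: split once from the right, then take the first piece per row.
places["name_rsplit"] = places["PLACENAME"].str.rsplit(n=1).str[0]

print(places)
```

The two columns agree on the sample data. They differ for a name with no space at all: slicing off the last word leaves an empty string, while the rsplit variant keeps the single word.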

Answer 1

Score: 1

Apply the Series.str.rpartition method, which by default splits each string at its last space; column 0 of the result holds everything before that space:

places['name_split'] = places['PLACENAME'].str.rpartition()[0]
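
For anyone who wants to try this, here is a minimal runnable sketch of the one-liner above, using the four sample names from the question as assumed test data. It also keeps the intermediate result in a variable (parts, a name chosen here for illustration) so the three columns produced by rpartition can be inspected.

```python
import pandas as pd

# Illustrative sample data taken from the question text.
places = pd.DataFrame(
    {"PLACENAME": ["Able County", "Baker County", "Charlie County", "St. Louis County"]}
)

# rpartition splits at the LAST separator (a space by default). With the
# default expand=True it returns a DataFrame with three columns:
# 0 = text before the separator, 1 = the separator, 2 = text after it.
parts = places["PLACENAME"].str.rpartition()

places["name_split"] = parts[0]
print(places)

# A value with no space has nothing before the separator, so column 0 is empty.
single_word = pd.Series(["Springfield"]).str.rpartition()[0]
```

One thing to keep in mind is the last line: a one-word place name yields an empty string in column 0, so this approach fits best when every value really ends in a trailing word to drop.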

Answer 2

Score: 1

Use str.replace to remove the last word and the preceding spaces:

places['new'] = places['PLACENAME'].str.replace(r'\s*\w+$', '', regex=True)

# or
places['new'] = places['PLACENAME'].str.replace(r'\s*\S+$', '', regex=True)

# or, only match 'County'
places['new'] = places['PLACENAME'].str.replace(r'\s*County$', '', regex=True)

Output:

          PLACENAME        new
0       Able County       Able
1      Baker County      Baker
2    Charlie County    Charlie
3  St. Louis County  St. Louis
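
The three patterns give the same result on the sample data but differ on other inputs. The sketch below compares them on a few made-up values ("Able Co." and "Springfield" are not from the question, they are assumptions chosen to show the differences): \w+ only matches a trailing run of word characters, \S+ matches any trailing run of non-whitespace, and the literal County pattern leaves everything else untouched.

```python
import pandas as pd

# Hypothetical inputs chosen to show where the three patterns diverge.
names = pd.Series(["St. Louis County", "Able Co.", "Springfield"])

# Last run of word characters: does not match "Co." because of the final dot.
last_word = names.str.replace(r"\s*\w+$", "", regex=True)

# Last run of non-whitespace characters: also removes "Co.".
last_token = names.str.replace(r"\s*\S+$", "", regex=True)

# Only a trailing literal "County": other endings are left as they are.
county_only = names.str.replace(r"\s*County$", "", regex=True)

comparison = pd.DataFrame(
    {
        "PLACENAME": names,
        "last_word": last_word,
        "last_token": last_token,
        "county_only": county_only,
    }
)
print(comparison)
```

If the column is known to always end in "County", the literal pattern is probably the safest of the three, since the other two also blank out single-word names.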

Posted by huangapple on February 14, 2023 at 02:06:08
When reposting, please keep the link to this article: https://go.coder-hub.com/75439661.html