Pandas DF: Create a new column by removing the last word of an existing column
Question
This should be easy, but I'm stumped.
I have a df that includes a PLACENAME column. Some of the values are multi-word names:
Able County
Baker County
Charlie County
St. Louis County
All I want to do is to create a new column in my df that has just the name, without the "county" word:
Able
Baker
Charlie
St. Louis
I've tried a variety of things:
1. places['name_split'] = places['PLACENAME'].str.split()
2. places['name_split'] = places['PLACENAME'].str.split()[:-1]
3. places['name_split'] = places['PLACENAME'].str.rsplit(' ',1)[0]
4. places = places.assign(name_split = lambda x: ' '.join(x['PLACENAME'].str.split()[:-1]))
Results, in the same order:
1. Works: splits each name into a list, e.g. ['St.','Louis','County']
2. The list slice is ignored, resulting in the same list ['St.','Louis','County'] rather than ['St.','Louis']
3. Raises a ValueError: Length of values (2) does not match length of index (41414)
4. Raises a TypeError: sequence item 0: expected str instance, list found
I've also defined a function and called it with .assign():
def processField(namelist):
    words = namelist[:-1]
    name = ' '.join(words)
    return name

places = places.assign(name_split = lambda x: processField(x['PLACENAME']))
This also raises a TypeError: sequence item 0: expected str instance, list found
This seems to be a very simple goal and I've probably overthought it, but I'm just stumped. Suggestions about what I should be doing would be deeply appreciated.
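For readers following along, here is a minimal sketch of what the slicing attempts appear to be doing. It assumes a four-row sample frame built from the names in the question (the real df has 41414 rows). The key point is that `[:-1]` applied directly to the result of `.str.split()` indexes the Series, whereas `.str[:-1]` indexes each element. The variable names `split` and `rows_sliced` are illustrative only.

```python
import pandas as pd

# Sample data taken from the question; the real frame is much larger.
places = pd.DataFrame({
    "PLACENAME": ["Able County", "Baker County", "Charlie County", "St. Louis County"]
})

# Each element becomes a list of words, e.g. ['St.', 'Louis', 'County'].
split = places["PLACENAME"].str.split()

# Plain [:-1] slices the Series itself: it drops the last ROW,
# it does not shorten the list stored in each row.
rows_sliced = split[:-1]

# .str[:-1] slices inside every element, and .str.join(" ") turns
# each shortened list back into a single string.
places["name_split"] = split.str[:-1].str.join(" ")

print(places)
```

The same reasoning would explain attempt 3: `[0]` after `.str.rsplit(...)` picks the row labelled 0 (a two-item list), which is why pandas complains about a length of 2.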
Answer 1
Score: 1
Apply the Series.str.rpartition function:
places['name_split'] = places['PLACENAME'].str.rpartition()[0]
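A small runnable sketch of this approach, assuming the sample names from the question. `str.rpartition` splits on the last occurrence of the separator (a single space by default) and returns a three-column frame: the text before the separator, the separator, and the text after it. The `solo` Series is a hypothetical extra case, added only to show what happens when a value contains no space.

```python
import pandas as pd

places = pd.DataFrame({
    "PLACENAME": ["Able County", "Baker County", "Charlie County", "St. Louis County"]
})

# Columns 0, 1, 2 hold: text before the last space, the space, the last word.
parts = places["PLACENAME"].str.rpartition()
places["name_split"] = parts[0]

# Hypothetical edge case: with no separator present, the whole string
# ends up in column 2 and column 0 is an empty string.
solo = pd.Series(["Solo"]).str.rpartition()

print(places)
print(solo)
```

If single-word names can occur in the real data, it may be worth checking for empty results in the new column before relying on it.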
Answer 2
Score: 1
Use str.replace to remove the last word and the preceding spaces:
places['new'] = places['PLACENAME'].str.replace(r'\s*\w+$', '', regex=True)
# or
places['new'] = places['PLACENAME'].str.replace(r'\s*\S+$', '', regex=True)
# or, only match 'County'
places['new'] = places['PLACENAME'].str.replace(r'\s*County$', '', regex=True)
Output:
          PLACENAME        new
0       Able County       Able
1      Baker County      Baker
2    Charlie County    Charlie
3  St. Louis County  St. Louis
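The three patterns give the same result on the sample data but can differ on other inputs. The sketch below illustrates this with two made-up values, "Wayne Co." and "Baltimore City", which are assumptions for demonstration and do not come from the question. The helper `strip_last_word` is likewise a hypothetical wrapper around `str.replace`.

```python
import pandas as pd

names = pd.Series(["St. Louis County", "Wayne Co.", "Baltimore City"])

def strip_last_word(series: pd.Series, pattern: str) -> pd.Series:
    """Remove whatever the regex matches at the end of each string."""
    return series.str.replace(pattern, "", regex=True)

# \w+ only matches letters, digits and underscores, so a last word that
# ends in punctuation ("Co.") is left untouched.
word_chars = strip_last_word(names, r"\s*\w+$")

# \S+ matches any run of non-whitespace, so "Co." is removed as well.
non_space = strip_last_word(names, r"\s*\S+$")

# Matching the literal word leaves names that do not end in "County" alone.
county_only = strip_last_word(names, r"\s*County$")

print(pd.DataFrame({"name": names, r"\w+": word_chars,
                    r"\S+": non_space, "County": county_only}))
```

Which pattern is appropriate depends on the data: the literal `County` pattern is the most conservative, while `\S+` removes the final token whatever it is.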