How to group by elements of a list
Question
I have a dataframe that looks like this:
```
81883  2011000011  ...  [South Sturgeon, Creek]
81884  2011000022  ...  [Meadowood]
81885  2011000016  ...  [South, Portage]
81886  2011000011  ...  [North Sturgeon, Creek]
```
I want to group rows that have words in common in the last column (named Locations), where the words are the elements of each row's Locations value, split on ','. For example, in the sample above I want to group by Creek. When no common words are found, the rows should be kept as they are (or, better, joined into a string).
I tried the following:
```python
def get_grp(list_current_row, df, column_location):
    # Collect the positions of all *other* rows whose location list
    # contains any element of the current row's list
    rows_index_to_groupby = []
    for string_element in list_current_row:
        for idx, row in enumerate(df[column_location].values):
            if row != list_current_row and string_element in row:
                rows_index_to_groupby.append(idx)
    return rows_index_to_groupby

grouped_dataframe = resulting_dataframe.groupby(
    lambda x: [resulting_dataframe[column_location][i]
               for i in get_grp(x, resulting_dataframe, column_location)]
)
```
The desired output would be:

```
Locations
Creek           0  Creek           81886  2011000011  ...
                1  Creek           81883  2011000011  ...
South, Portage  2  South, Portage  81885  2011000016  ...
Meadowood       3  Meadowood       81884  2011000022
```
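A minimal sketch of the grouping described above, under some assumptions that are not in the question: sharing is treated as transitive (if row A shares a word with B and B shares one with C, all three form one group), a group is labelled by the words that occur in more than one of its rows, and a row with no matches is labelled by its own words joined with ", ". The helper name `group_key_by_shared_words` and the `group` column are hypothetical. It uses a small union-find over row positions rather than a row-by-row search like `get_grp`.

```python
import pandas as pd


def group_key_by_shared_words(df, column):
    """Return one label per row (positional). Rows whose lists share an
    element, directly or through a chain of rows, get the same label."""
    values = df[column].tolist()
    parent = list(range(len(values)))  # union-find over row positions

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_row_with_word = {}
    for pos, words in enumerate(values):
        for word in words:
            if word in first_row_with_word:
                # merge this row's group with that of the first row containing the word
                parent[find(pos)] = find(first_row_with_word[word])
            else:
                first_row_with_word[word] = pos

    # collect the row positions belonging to each group root
    members = {}
    for pos in range(len(values)):
        members.setdefault(find(pos), []).append(pos)

    labels = [None] * len(values)
    for rows in members.values():
        counts = {}
        for pos in rows:
            for word in dict.fromkeys(values[pos]):  # de-duplicate, keep order
                counts[word] = counts.get(word, 0) + 1
        shared = [w for w, c in counts.items() if c > 1]
        for pos in rows:
            # a group is labelled by its shared words, a lone row by its own words
            labels[pos] = ", ".join(shared if shared else values[pos])
    return labels


df = pd.DataFrame({
    'number': [81883, 81884, 81885, 81886],
    'date': ["2011000011", "2011000022", "2011000016", "2011000011"],
    'Locations': [["South Sturgeon", "Creek"], ["Meadowood"],
                  ["South", "Portage"], ["North Sturgeon", "Creek"]],
})
df['group'] = group_key_by_shared_words(df, 'Locations')
for label, grp in df.groupby('group', sort=False):
    print(label, grp['number'].tolist())
```

With `sort=False` the groups appear in order of first occurrence, so this prints `Creek [81883, 81886]`, `Meadowood [81884]` and `South, Portage [81885]`. Matching is on whole list elements, so "South" and "South Sturgeon" are not considered common words. If only direct sharing (no chains) is wanted, the union-find step would need a different rule.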
Answer 1
Score: 0
While not exactly what is asked, the following might get you what you want. It simply extracts the last element of each `location` list and assigns that to the index:

```python
import pandas as pd

df = pd.DataFrame({
    'number': [81883, 81884, 81885, 81886],
    'date': ["2011000011", "2011000022", "2011000016", "2011000011"],
    'location': [["South Sturgeon", "Creek"], ["Meadowood"], ["South", "Portage"], ["North Sturgeon", "Creek"]],
})
# .str[-1] takes the last element of each list in the column
df.index = df.location.str[-1]
print(df)
```

yields

```
           number        date                 location
location
Creek       81883  2011000011  [South Sturgeon, Creek]
Meadowood   81884  2011000022              [Meadowood]
Portage     81885  2011000016         [South, Portage]
Creek       81886  2011000011  [North Sturgeon, Creek]
```

Now you can simply get all Creek entries with, e.g.:

```python
df.loc['Creek']
```

Since the index has the same name as a column (`location`), you may want to rename the index:

```python
df.index.names = ['primary_location']
```

and for grouped operations you can then do, e.g.:

```python
df.groupby('primary_location')['number'].sum()
```

```
primary_location
Creek        163769
Meadowood     81884
Portage       81885
Name: number, dtype: int64
```
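The approach above keys each row on the last list element only. If a row should instead be found under every element it contains, one variant (not part of the original answer) is to explode the lists first with `DataFrame.explode` (available since pandas 0.25), which yields one row per list element, and then group on that column. This sketch assumes each word appears at most once per row; the names `exploded`, `rows_per_word` and `shared` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'number': [81883, 81884, 81885, 81886],
    'date': ["2011000011", "2011000022", "2011000016", "2011000011"],
    'location': [["South Sturgeon", "Creek"], ["Meadowood"],
                 ["South", "Portage"], ["North Sturgeon", "Creek"]],
})

# one output row per list element; the original index label is repeated
exploded = df.explode('location')

# for each individual location word, the numbers of the rows that contain it
rows_per_word = exploded.groupby('location')['number'].apply(list)

# keep only the words that occur in more than one original row
shared = rows_per_word[rows_per_word.map(len) > 1]
print(shared)
```

For the sample data only `Creek` survives the filter, mapped to the two Creek rows. Unlike the last-element index, a row with several words appears once per word in `rows_per_word`.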