How do I return a list of column values when the rows are the same for every other column in pandas?
Question
I apologize if my question was unclear. I have a dataframe that contains these values:
Rating | UserName | ReviewDate | Article | Score |
---|---|---|---|---|
5 | Madison | 2023-06-05 | 025-03-7579 | 0.924742 |
5 | Madison | 2023-06-05 | 025-03-2888 | 0.924742 |
5 | Lindsanna | 2023-06-05 | 025-03-9268 | 0.990300 |
4 | Mamax4 | 2023-06-05 | 025-03-1736 | 0.875178 |
As we can see, the first two rows contain the same data except for the 'Article' column. I want to write code that returns a list of tuples, each containing the 'Article' values of rows that are otherwise identical. In this particular case the ideal output would be something like:
[('025-03-7579','025-03-2888')]
I have tried to use df.groupby('Article') to create two lists and compare them, returning a tuple whenever there was a match in articles, but I have had no success.
Any feedback would be highly appreciated!
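A minimal sketch for reproducing the setup. It rebuilds the sample table as a DataFrame and checks the observation above that rows 0 and 1 agree on every column except Article. Storing ReviewDate as a plain string is an assumption; in the real data it may be a datetime column.

```python
import pandas as pd

# Sample rows copied from the table in the question (ReviewDate stored as str: an assumption)
df = pd.DataFrame(
    [[5, "Madison", "2023-06-05", "025-03-7579", 0.924742],
     [5, "Madison", "2023-06-05", "025-03-2888", 0.924742],
     [5, "Lindsanna", "2023-06-05", "025-03-9268", 0.990300],
     [4, "Mamax4", "2023-06-05", "025-03-1736", 0.875178]],
    columns=["Rating", "UserName", "ReviewDate", "Article", "Score"],
)

# Drop Article from two rows and compare what is left
row0 = df.iloc[0].drop("Article")
row1 = df.iloc[1].drop("Article")
print(row0.equals(row1))  # True: identical apart from Article
```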
Answer 1
Score: 1
Group by all columns except Article and keep only the groups with more than one row, i.e. sets of rows that differ only in Article:
# One tuple of Article values per group of rows that match on every other column
[tuple(gr['Article']) for _, gr in df.groupby(['Rating', 'UserName', 'ReviewDate', 'Score'])
    if gr.index.size > 1]
[('025-03-7579', '025-03-2888')]
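To run Answer 1 end to end, here is a self-contained sketch using the sample rows from the question. The key columns are spelled out by hand as in the answer, so the list would need updating if the DataFrame gains columns. Groups come out in sorted key order because groupby sorts by default.

```python
import pandas as pd

df = pd.DataFrame(
    [[5, "Madison", "2023-06-05", "025-03-7579", 0.924742],
     [5, "Madison", "2023-06-05", "025-03-2888", 0.924742],
     [5, "Lindsanna", "2023-06-05", "025-03-9268", 0.990300],
     [4, "Mamax4", "2023-06-05", "025-03-1736", 0.875178]],
    columns=["Rating", "UserName", "ReviewDate", "Article", "Score"],
)

# Every column except Article, hard-coded as in the answer
keys = ["Rating", "UserName", "ReviewDate", "Score"]

# One tuple of articles per group that holds more than one row
result = [tuple(gr["Article"]) for _, gr in df.groupby(keys) if gr.index.size > 1]
print(result)  # [('025-03-7579', '025-03-2888')]
```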
Answer 2
Score: 0
Did you try a groupby more like this?
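# One row per group; Article holds the list of that group's articles (single-article groups included)
df1 = df.groupby(["Rating", "UserName", "ReviewDate", "Score"])["Article"].apply(list).reset_index()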
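Answer 2's df1 keeps every group, including the single-article ones (Lindsanna and Mamax4 in the sample), so on its own it does not yet produce the requested output. Below is a sketch of the missing filtering step, assuming the sample data from the question: keep the rows whose list has more than one element, then convert each list to a tuple.

```python
import pandas as pd

df = pd.DataFrame(
    [[5, "Madison", "2023-06-05", "025-03-7579", 0.924742],
     [5, "Madison", "2023-06-05", "025-03-2888", 0.924742],
     [5, "Lindsanna", "2023-06-05", "025-03-9268", 0.990300],
     [4, "Mamax4", "2023-06-05", "025-03-1736", 0.875178]],
    columns=["Rating", "UserName", "ReviewDate", "Article", "Score"],
)

keys = ["Rating", "UserName", "ReviewDate", "Score"]
# Answer 2: one row per group, with Article collected into a list
df1 = df.groupby(keys)["Article"].apply(list).reset_index()

# Extra step: keep groups with several articles and turn each list into a tuple
multi = df1[df1["Article"].map(len) > 1]
result = [tuple(articles) for articles in multi["Article"]]
print(result)  # [('025-03-7579', '025-03-2888')]
```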
Answer 3
Score: 0
Another possible solution:
# keep=False marks every row of a duplicate set, not just the repeats
df['Article'][df.duplicated(
    df.columns[df.columns != 'Article'], keep=False)].to_list()
Output:
['025-03-7579', '025-03-2888']
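The same idea as a self-contained sketch on the sample rows. The keep argument matters here: the default keep='first' flags only the later copy, while keep=False flags every member of a duplicate set, which is what lets both articles through. Note that the match on Score is exact float equality, so scores differing only in a later decimal place would not count as duplicates.

```python
import pandas as pd

df = pd.DataFrame(
    [[5, "Madison", "2023-06-05", "025-03-7579", 0.924742],
     [5, "Madison", "2023-06-05", "025-03-2888", 0.924742],
     [5, "Lindsanna", "2023-06-05", "025-03-9268", 0.990300],
     [4, "Mamax4", "2023-06-05", "025-03-1736", 0.875178]],
    columns=["Rating", "UserName", "ReviewDate", "Article", "Score"],
)

cols = df.columns[df.columns != "Article"]  # every column except Article

# keep="first" (the default) flags only the repeat, not the first occurrence
print(df.duplicated(subset=cols).to_list())  # [False, True, False, False]

# keep=False flags both rows of the duplicate pair
mask = df.duplicated(subset=cols, keep=False)
articles = df.loc[mask, "Article"].to_list()
print(articles)  # ['025-03-7579', '025-03-2888']
```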
If a tuple per group is needed:
cols = df.columns[df.columns != 'Article']
(df[df.duplicated(cols, keep=False)]
.groupby(cols.to_list())['Article'].agg(tuple).tolist())
Output:
[('025-03-7579', '025-03-2888')]
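If the tuple variant is needed in more than one place, it can be wrapped in a small function. The sketch below uses a hypothetical helper name, articles_sharing_other_columns, and derives the key columns with Index.drop instead of a hand-written list. It assumes the key columns contain no missing values, since groupby drops NaN keys by default while duplicated treats NaN values as equal.

```python
import pandas as pd


def articles_sharing_other_columns(df: pd.DataFrame, col: str = "Article") -> list:
    """Hypothetical helper: one tuple of `col` values per group of rows
    that are identical in every other column (assumes no NaN in those columns)."""
    others = df.columns.drop(col).to_list()
    # keep=False retains every member of each duplicate set
    dup = df[df.duplicated(subset=others, keep=False)]
    return dup.groupby(others)[col].agg(tuple).tolist()


df = pd.DataFrame(
    [[5, "Madison", "2023-06-05", "025-03-7579", 0.924742],
     [5, "Madison", "2023-06-05", "025-03-2888", 0.924742],
     [5, "Lindsanna", "2023-06-05", "025-03-9268", 0.990300],
     [4, "Mamax4", "2023-06-05", "025-03-1736", 0.875178]],
    columns=["Rating", "UserName", "ReviewDate", "Article", "Score"],
)

print(articles_sharing_other_columns(df))  # [('025-03-7579', '025-03-2888')]
```

Because the key columns are computed from the frame, the helper keeps working if columns are added, as long as Article remains the only column that is allowed to differ.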