How do I return a list of column values when the rows are the same for every other column in pandas?

Question

I apologize if my question was unclear. I have a dataframe that contains these values:

Rating  UserName   ReviewDate  Article      Score
5       Madison    2023-06-05  025-03-7579  0.924742
5       Madison    2023-06-05  025-03-2888  0.924742
5       Lindsanna  2023-06-05  025-03-9268  0.990300
4       Mamax4     2023-06-05  025-03-1736  0.875178

As we can see, the first two rows contain the same data except for the 'Article' column. I want to write code that returns a list of tuples, where each tuple holds the 'Article' values of rows that are identical in every other column. In this particular case the ideal output would be something like:

[('025-03-7579','025-03-2888')]

I have tried using df.groupby('Article') to create two lists and compare them, returning a tuple whenever the articles matched, but I have had no success.

Any feedback would be highly appreciated!

Answer 1

Score: 1

Group by every column except Article, then keep only the groups that contain more than one row:

[tuple(gr['Article']) for _, gr in df.groupby(['Rating', 'UserName', 'ReviewDate', 'Score']) 
 if gr.index.size > 1]

Output:

[('025-03-7579', '025-03-2888')]
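
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question (storing ReviewDate as plain strings is an assumption, the original dtype is not stated) and derives the grouping keys from the column names instead of typing them out; `key_cols` and `result` are illustrative names, not part of the original answer.

```python
import pandas as pd

# Sample data from the question; ReviewDate kept as strings (assumption).
df = pd.DataFrame({
    "Rating": [5, 5, 5, 4],
    "UserName": ["Madison", "Madison", "Lindsanna", "Mamax4"],
    "ReviewDate": ["2023-06-05"] * 4,
    "Article": ["025-03-7579", "025-03-2888", "025-03-9268", "025-03-1736"],
    "Score": [0.924742, 0.924742, 0.990300, 0.875178],
})

# Every column except 'Article' acts as a grouping key.
key_cols = [c for c in df.columns if c != "Article"]

# One tuple of articles per group that holds more than one row.
result = [tuple(g["Article"]) for _, g in df.groupby(key_cols) if len(g) > 1]
print(result)
```

Deriving `key_cols` from `df.columns` means the snippet keeps working if more columns are added later, at the cost of grouping on all of them.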

Answer 2

Score: 0

Have you tried a groupby along these lines? It collects the Article values of each group into a list:

df1 = df.groupby(["Rating", "UserName", "ReviewDate", "Score"])["Article"].apply(list).reset_index()
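
Note that this groupby returns one row per group, including groups with a single article. The following sketch, which assumes the sample data from the question (with ReviewDate as strings), shows one possible way to narrow `df1` down to the groups that share more than one article; the names `shared` and `result` are only illustrative.

```python
import pandas as pd

# Sample data from the question; ReviewDate kept as strings (assumption).
df = pd.DataFrame({
    "Rating": [5, 5, 5, 4],
    "UserName": ["Madison", "Madison", "Lindsanna", "Mamax4"],
    "ReviewDate": ["2023-06-05"] * 4,
    "Article": ["025-03-7579", "025-03-2888", "025-03-9268", "025-03-1736"],
    "Score": [0.924742, 0.924742, 0.990300, 0.875178],
})

# One row per group, with the group's articles collected into a list.
df1 = (
    df.groupby(["Rating", "UserName", "ReviewDate", "Score"])["Article"]
    .apply(list)
    .reset_index()
)

# Keep only the groups whose list holds more than one article.
shared = df1[df1["Article"].map(len) > 1]

# Convert each remaining list to a tuple to match the requested output shape.
result = [tuple(articles) for articles in shared["Article"]]
print(result)
```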

Answer 3

Score: 0

Another possible solution:

df['Article'][df.duplicated(
    df.columns[df.columns != 'Article'], keep=False)].to_list()

Output:

['025-03-7579', '025-03-2888']

If a tuple per group is needed:

cols = df.columns[df.columns != 'Article']
(df[df.duplicated(cols, keep=False)]
 .groupby(cols.to_list())['Article'].agg(tuple).tolist())

Output:

[('025-03-7579', '025-03-2888')]
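
If this comes up for more than one column, the same idea can be wrapped in a small helper. This is only a sketch: the function name `values_sharing_other_columns` is made up for illustration, and the sample data assumes ReviewDate is stored as strings.

```python
import pandas as pd


def values_sharing_other_columns(df, column):
    """Return one tuple of `column` values per set of rows that
    agree on every other column (hypothetical helper)."""
    other_cols = [c for c in df.columns if c != column]
    # keep=False marks every member of a duplicated set, not just the repeats.
    dup_rows = df[df.duplicated(other_cols, keep=False)]
    return [tuple(g[column]) for _, g in dup_rows.groupby(other_cols)]


# Sample data from the question; ReviewDate kept as strings (assumption).
df = pd.DataFrame({
    "Rating": [5, 5, 5, 4],
    "UserName": ["Madison", "Madison", "Lindsanna", "Mamax4"],
    "ReviewDate": ["2023-06-05"] * 4,
    "Article": ["025-03-7579", "025-03-2888", "025-03-9268", "025-03-1736"],
    "Score": [0.924742, 0.924742, 0.990300, 0.875178],
})

pairs = values_sharing_other_columns(df, "Article")
print(pairs)
```

Filtering with `duplicated` before grouping keeps the groupby small, since rows that are unique on the other columns never enter it.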

huangapple
  • Posted on 2023-07-07 02:59:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76631808.html