2023-07-07 02:59:33

How do I return a list of column values when the rows are the same for every other column in pandas?

Question

I apologize if my question was unclear. I have a dataframe that contains these values:

Rating	UserName	ReviewDate	Article	Score
5	Madison	2023-06-05	025-03-7579	0.924742
5	Madison	2023-06-05	025-03-2888	0.924742
5	Lindsanna	2023-06-05	025-03-9268	0.990300
4	Mamax4	2023-06-05	025-03-1736	0.875178

As we can see, the first two rows contain the same data except for the 'Article' column. I want to write code that returns a list of tuples, where each tuple holds the 'Article' values of rows that are identical in every other column. In this particular case, the ideal output would be something like:

[('025-03-7579','025-03-2888')]

I have tried using df.groupby('Article') to create two lists and compare them, returning a tuple whenever the articles matched, but I have had no success.

Any feedback would be highly appreciated!

Answer 1

Score: 1

Group by all columns except Article and keep only the groups that contain more than one row:

[tuple(gr['Article']) for _, gr in df.groupby(['Rating', 'UserName', 'ReviewDate', 'Score'])
 if len(gr) > 1]

Output:

[('025-03-7579', '025-03-2888')]
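For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and derives the grouping keys from the frame instead of hard-coding them. The helper name `articles_sharing_other_columns` and its `target` parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Rating": [5, 5, 5, 4],
    "UserName": ["Madison", "Madison", "Lindsanna", "Mamax4"],
    "ReviewDate": ["2023-06-05"] * 4,
    "Article": ["025-03-7579", "025-03-2888", "025-03-9268", "025-03-1736"],
    "Score": [0.924742, 0.924742, 0.990300, 0.875178],
})


def articles_sharing_other_columns(frame, target="Article"):
    """Hypothetical helper: one tuple of `target` values per group of rows
    that agree on every other column, skipping groups with a single row."""
    keys = [c for c in frame.columns if c != target]
    # sort=False keeps groups in order of first appearance.
    return [
        tuple(group[target])
        for _, group in frame.groupby(keys, sort=False)
        if len(group) > 1
    ]


result = articles_sharing_other_columns(df)
print(result)
```

Note that `groupby` drops rows whose key columns contain NaN by default, so data with missing values may need `dropna=False`.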

Answer 2

Score: 0

Have you tried a groupby along these lines? It collects the Article values of each group into a list:

df1 = df.groupby(["Rating", "UserName", "ReviewDate", "Score"])["Article"].apply(list).reset_index()
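As written, this groupby returns one row per group, including groups that hold a single article. A possible follow-up step, sketched here on the sample data from the question, keeps only the lists with more than one entry and converts them to tuples. The variable names `keys`, `grouped`, `shared` and `pairs` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Rating": [5, 5, 5, 4],
    "UserName": ["Madison", "Madison", "Lindsanna", "Mamax4"],
    "ReviewDate": ["2023-06-05"] * 4,
    "Article": ["025-03-7579", "025-03-2888", "025-03-9268", "025-03-1736"],
    "Score": [0.924742, 0.924742, 0.990300, 0.875178],
})

keys = ["Rating", "UserName", "ReviewDate", "Score"]

# One row per distinct key combination, with the articles gathered in a list.
grouped = df.groupby(keys)["Article"].apply(list).reset_index()

# Keep only the groups that actually share their key values with another row.
shared = grouped[grouped["Article"].map(len) > 1]

# Convert each remaining list to a tuple to match the requested output shape.
pairs = [tuple(articles) for articles in shared["Article"]]
print(pairs)
```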

Answer 3

Score: 0

Another possible solution:

df['Article'][df.duplicated(
    df.columns[df.columns != 'Article'], keep=False)].to_list()

Output:

['025-03-7579', '025-03-2888']

If a tuple per group is needed:

cols = df.columns[df.columns != 'Article']
(df[df.duplicated(cols, keep=False)]
 .groupby(cols.to_list())['Article'].agg(tuple).tolist())

Output:

[('025-03-7579', '025-03-2888')]
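The difference between the two variants only shows up once more than one set of rows matches. The sketch below adds a fifth row that is not in the question (the article number `025-03-0000` is invented) so that two groups qualify: the flat list then merges both groups, while the per-group version keeps them apart.

```python
import pandas as pd

# Data from the question plus one invented row (the last one) so that
# a second group of matching rows exists.
df = pd.DataFrame({
    "Rating": [5, 5, 5, 4, 4],
    "UserName": ["Madison", "Madison", "Lindsanna", "Mamax4", "Mamax4"],
    "ReviewDate": ["2023-06-05"] * 5,
    "Article": ["025-03-7579", "025-03-2888", "025-03-9268",
                "025-03-1736", "025-03-0000"],
    "Score": [0.924742, 0.924742, 0.990300, 0.875178, 0.875178],
})

cols = df.columns[df.columns != "Article"]

# True for every row that has at least one twin on all non-Article columns.
mask = df.duplicated(cols, keep=False)

# Variant 1: a single flat list, group boundaries are lost.
flat = df.loc[mask, "Article"].to_list()

# Variant 2: one tuple per group of matching rows.
per_group = (df[mask]
             .groupby(cols.to_list(), sort=False)["Article"]
             .agg(tuple)
             .tolist())

print(flat)
print(per_group)
```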
