June 19, 2023, 19:00:31

Combining two unique values in a Pandas column if they have the same value in another column

Question

Let's say I have a very large Pandas dataframe in Python that looks something like this:

import pandas as pd

df_test = pd.DataFrame(data=None, columns=['file', 'source'])
df_test.file = ['file_1', 'file_1', 'file_2', 'file_2', 'file_3', 'file_3']
df_test.source = ['usa', 'uk', 'jp', 'sk', 'au', 'nz']

What I want is for the 'source' column to combine the unique sources into a single string, separated by '; ', for all rows that share the same value in the 'file' column. The end result for the 'source' column should therefore be:

['usa; uk', 'usa; uk', 'jp; sk', 'jp; sk', 'au; nz', 'au; nz']

This is because 'file_1' in the 'file' column has the two sources 'usa' and 'uk', and so on. The actual dataframe is very large, so this must be done automatically rather than manually. Any help on how to do this would be really appreciated, thanks!

Answer 1

Score: 1

Use a lambda function in GroupBy.transform and remove duplicate values with dict.fromkeys (which keeps the original order) or with a set:

df_test['new'] = (df_test.groupby('file')['source']
                         .transform(lambda x: '; '.join(dict.fromkeys(x))))
print(df_test)
     file source      new
0  file_1    usa  usa; uk
1  file_1     uk  usa; uk
2  file_2     jp   jp; sk
3  file_2     sk   jp; sk
4  file_3     au   au; nz
5  file_3     nz   au; nz

# set() does not preserve order, so the joined order may differ between runs
df_test['new'] = df_test.groupby('file')['source'].transform(lambda x: '; '.join(set(x)))
print(df_test)
     file source      new
0  file_1    usa  uk; usa
1  file_1     uk  uk; usa
2  file_2     jp   jp; sk
3  file_2     sk   jp; sk
4  file_3     au   nz; au
5  file_3     nz   nz; au
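
The sample data in the question has no repeated sources within a file, so the de-duplication step is not visible in the output above. The sketch below repeats 'usa' within file_1 to show it, and also tries a possible alternative that builds one string per file and maps it back onto the rows. The column names `combined` and `combined_via_map` and the helper `join_unique` are made up for illustration, and the code assumes every value in 'source' is a string (missing values would need handling first).

```python
import pandas as pd

# Hypothetical sample data: 'usa' is repeated within file_1 to show de-duplication.
df = pd.DataFrame({
    "file": ["file_1", "file_1", "file_1", "file_2", "file_2"],
    "source": ["usa", "uk", "usa", "jp", "sk"],
})


def join_unique(values):
    """Join the distinct values in first-seen order, separated by '; '."""
    return "; ".join(dict.fromkeys(values))


# Approach from the answer: transform broadcasts the joined string
# to every row of the group.
df["combined"] = df.groupby("file")["source"].transform(join_unique)

# Alternative sketch: drop repeated (file, source) pairs, build one string
# per file, then map that string back onto the rows by file name.
per_file = (
    df.drop_duplicates(["file", "source"])
      .groupby("file", sort=False)["source"]
      .agg("; ".join)
)
df["combined_via_map"] = df["file"].map(per_file)

print(df)
```

Both columns should hold 'usa; uk' for the file_1 rows and 'jp; sk' for the file_2 rows. Which variant is faster on a very large dataframe depends on the data, so it is worth timing both on a representative sample.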
