Combining two unique values in a Pandas column if they have the same value in another column

Question

Let's say I have a very large Pandas dataframe in Python that looks something like this:

import pandas as pd

df_test = pd.DataFrame(data=None, columns=['file', 'source'])
df_test.file = ['file_1', 'file_1', 'file_2', 'file_2', 'file_3', 'file_3']
df_test.source = ['usa', 'uk', 'jp', 'sk', 'au', 'nz']

For each distinct value in the 'file' column, I want to combine the unique sources from the 'source' column into a single string, separated by '; '. The end result for the 'source' column should therefore be:

['usa; uk', 'usa; uk', 'jp; sk', 'jp; sk', 'au; nz', 'au; nz']

This is because 'file_1' in the 'file' column has the two sources 'usa' and 'uk', and so on. The actual dataframe is very large, so this must be done automatically and not manually. Any help on how to do this would be really appreciated, thanks!

Answer 1

Score: 1

Use a lambda function in GroupBy.transform and remove duplicate values either with dict.fromkeys, which keeps the order of first appearance, or with set, which does not guarantee any order:

df_test['new'] = (df_test.groupby('file')['source']
                         .transform(lambda x: '; '.join(dict.fromkeys(x))))
print(df_test)
     file source      new
0  file_1    usa  usa; uk
1  file_1     uk  usa; uk
2  file_2     jp   jp; sk
3  file_2     sk   jp; sk
4  file_3     au   au; nz
5  file_3     nz   au; nz

df_test['new'] = df_test.groupby('file')['source'].transform(lambda x: '; '.join(set(x)))
print(df_test)
     file source      new
0  file_1    usa  uk; usa
1  file_1     uk  uk; usa
2  file_2     jp   jp; sk
3  file_2     sk   jp; sk
4  file_3     au   nz; au
5  file_3     nz   nz; au
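
The sample data in the question has no repeated source within a file, so the de-duplication step is not visible in the outputs above. Here is a minimal sketch of the same GroupBy.transform approach on hypothetical data where 'usa' appears twice for 'file_1'. The names df, join_unique, combined and combined_sorted are only for illustration, and the sorted(set(...)) variant is an extra suggestion for getting a deterministic order, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: 'file_1' lists 'usa' twice to show de-duplication.
df = pd.DataFrame({
    "file": ["file_1", "file_1", "file_1", "file_2", "file_2"],
    "source": ["usa", "uk", "usa", "jp", "sk"],
})


def join_unique(values):
    """Join the unique values with '; ', keeping first-appearance order."""
    # dict.fromkeys drops repeats while preserving insertion order.
    return "; ".join(dict.fromkeys(values))


# transform broadcasts the per-group string back to every row of that group.
df["combined"] = df.groupby("file")["source"].transform(join_unique)

# Assumed variant: sort the set so the order is alphabetical and repeatable.
df["combined_sorted"] = df.groupby("file")["source"].transform(
    lambda s: "; ".join(sorted(set(s)))
)

print(df)
```

If the order of the sources matters, dict.fromkeys or sorted(set(...)) is probably the safer choice, since a plain set may join the values in a different order from one run to the next.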

huangapple
  • Posted on June 19, 2023, 19:00:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76505989.html