Combining two unique values in a Pandas column if they have the same value in another column
Question
Let's say I have a very large Pandas dataframe in Python that looks something like this:
import pandas as pd
df_test = pd.DataFrame(data=None, columns=['file', 'source'])
df_test.file = ['file_1', 'file_1', 'file_2', 'file_2', 'file_3', 'file_3']
df_test.source = ['usa', 'uk', 'jp', 'sk', 'au', 'nz']
For each distinct value in the 'file' column, I want to combine the unique values of the 'source' column into a single string, joined with '; '. The end result for the 'source' column should therefore be:
['usa; uk', 'usa; uk', 'jp; sk', 'jp; sk', 'au; nz', 'au; nz']
This is because 'file_1' in the 'file' column has the two sources 'usa' and 'uk', and so on. The actual dataframe is very large, so this must be done automatically rather than by hand. Any help on how to do this would be really appreciated, thanks!
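As a side note (a sketch, not part of the original question), the same example frame can be built in a single call by passing a dict of lists to the `pd.DataFrame` constructor instead of creating an empty frame and assigning each column afterwards:

```python
import pandas as pd

# Build both columns at once from a dict of equal-length lists;
# this yields the same 6-row frame as the column-by-column assignment.
df_test = pd.DataFrame({
    'file': ['file_1', 'file_1', 'file_2', 'file_2', 'file_3', 'file_3'],
    'source': ['usa', 'uk', 'jp', 'sk', 'au', 'nz'],
})
print(df_test.shape)
```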
Answer 1
Score: 1
Use a lambda function in GroupBy.transform and remove duplicated values either with dict.fromkeys, which keeps them in order of first appearance, or with a set, whose order is not guaranteed:
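To see why the two deduplication options behave differently, here is a small plain-Python illustration. The `sources` list is a hypothetical input with repeated values, chosen only to show the effect:

```python
# Hypothetical input with duplicates, for illustration only
sources = ['usa', 'uk', 'usa', 'uk']

# dict.fromkeys keeps the first occurrence of each value; dicts preserve
# insertion order (guaranteed since Python 3.7), so the order is stable.
ordered_unique = '; '.join(dict.fromkeys(sources))
print(ordered_unique)  # usa; uk

# A set also drops duplicates, but its iteration order is arbitrary,
# so this may print either 'usa; uk' or 'uk; usa'.
unordered_unique = '; '.join(set(sources))
print(unordered_unique)
```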
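# Join the unique sources of each 'file' group; dict.fromkeys drops
# duplicates while keeping their original order
df_test['new'] = (df_test.groupby('file')['source']
                  .transform(lambda x: '; '.join(dict.fromkeys(x))))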
print(df_test)
file source new
0 file_1 usa usa; uk
1 file_1 uk usa; uk
2 file_2 jp jp; sk
3 file_2 sk jp; sk
4 file_3 au au; nz
5 file_3 nz au; nz
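The question asks for the 'source' column itself to hold the combined string. A minimal variant, assuming overwriting the original values is acceptable, assigns the same `transform` result back to 'source' instead of to a new column:

```python
import pandas as pd

df_test = pd.DataFrame({
    'file': ['file_1', 'file_1', 'file_2', 'file_2', 'file_3', 'file_3'],
    'source': ['usa', 'uk', 'jp', 'sk', 'au', 'nz'],
})

# transform returns a Series aligned to the original index, so each row
# receives the joined string of its own 'file' group; writing it back to
# 'source' replaces the original per-row values.
df_test['source'] = (df_test.groupby('file')['source']
                     .transform(lambda x: '; '.join(dict.fromkeys(x))))
print(df_test['source'].tolist())
# ['usa; uk', 'usa; uk', 'jp; sk', 'jp; sk', 'au; nz', 'au; nz']
```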
# Same idea with a set; the order inside each joined string is arbitrary
df_test['new'] = df_test.groupby('file')['source'].transform(lambda x: '; '.join(set(x)))
print(df_test)
file source new
0 file_1 usa uk; usa
1 file_1 uk uk; usa
2 file_2 jp jp; sk
3 file_2 sk jp; sk
4 file_3 au nz; au
5 file_3 nz nz; au
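The set output above is only one possible result: Python randomizes string hashing by default, so set iteration order can differ between runs. If a reproducible result is needed and alphabetical order is acceptable (an assumption, since the question does not specify an order), sorting the set makes the output deterministic:

```python
import pandas as pd

df_test = pd.DataFrame({
    'file': ['file_1', 'file_1', 'file_2', 'file_2', 'file_3', 'file_3'],
    'source': ['usa', 'uk', 'jp', 'sk', 'au', 'nz'],
})

# sorted() turns the set into an alphabetically ordered list, so the
# joined string no longer depends on set iteration order.
df_test['new'] = (df_test.groupby('file')['source']
                  .transform(lambda x: '; '.join(sorted(set(x)))))
print(df_test['new'].tolist())
# ['uk; usa', 'uk; usa', 'jp; sk', 'jp; sk', 'au; nz', 'au; nz']
```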