PySpark: Merge two dataframes
Question
I'm a beginner in Python and more experienced in C++ and JavaScript, so maybe you can help me.
I have two dataframes, called df1 and df2. They have different columns and different lengths.
Df1:
Id-name | Match | Title |
---|---|---|
21171 | 2500B | Title1 |
21171 | 2400B | Title2 |
21171 | 2400C | Title3 |
21171 | 3000A | Title4 |
22000 | 1000A | Title5 |
Df2:
Prio | Document |
---|---|
2500B | Doc1 |
2500B | Doc2 |
2500B | Doc3 |
1000A | Doc5 |
1000A | Doc6 |
1000A | Doc7 |
1000A | Doc8 |
Output:
Id-name | Match | Title | Prio | Document |
---|---|---|---|---|
21171 | 2500B | Title1 | 2500B | Doc1 |
21171 | 2500B | Title1 | 2500B | Doc2 |
21171 | 2500B | Title1 | 2500B | Doc3 |
21171 | 2400B | Title2 | null | null |
21171 | 2400C | Title3 | null | null |
21171 | 3000A | Title4 | null | null |
22000 | 1000A | Title5 | 1000A | Doc5 |
22000 | 1000A | Title5 | 1000A | Doc6 |
22000 | 1000A | Title5 | 1000A | Doc7 |
22000 | 1000A | Title5 | 1000A | Doc8 |
I already tried to merge the dataframes with the union function, but without success.
Can somebody help me, or point me to the proper way to do this?
Many thanks in advance.
Answer 1
Score: 0
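I think what you're after here is a join rather than a union: union appends rows and expects both dataframes to have the same number of columns, whereas a join matches rows on a key. More precisely, it is a left join, since your expected output keeps the df1 rows that have no match and shows NULL in the Prio and Document columns for them.
You can do it as follows:
df1.join(df2, df1['Match'] == df2['Prio'], how='left')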