PySpark: Merge two dataframes

Question

I'm a beginner in Python and more experienced in C++ and JavaScript, so maybe you can help me.

I have two dataframes, called df1 and df2. They have different columns and different numbers of rows.

Df1:

Id-name  Match  Title
21171    2500B  Title1
21171    2400B  Title2
21171    2400C  Title3
21171    3000A  Title4
22000    1000A  Title5

Df2:

Prio   Document
2500B  Doc1
2500B  Doc2
2500B  Doc3
1000A  Doc5
1000A  Doc6
1000A  Doc7
1000A  Doc8

Expected output:

Id-name  Match  Title   Prio   Document
21171    2500B  Title1  2500B  Doc1
21171    2500B  Title1  2500B  Doc2
21171    2500B  Title1  2500B  Doc3
21171    2400B  Title2  null   null
21171    2400C  Title3  null   null
21171    3000A  Title4  null   null
22000    1000A  Title5  1000A  Doc5
22000    1000A  Title5  1000A  Doc6
22000    1000A  Title5  1000A  Doc7
22000    1000A  Title5  1000A  Doc8

I already tried to merge the dataframes with the union function, but without success.
Can somebody help me or point me to the proper way to do this?

Many thanks in advance.

Answer 1

Score: 0

I think what you want here is a join; specifically a left join, since your expected output keeps the rows of Df1 that have no match in Df2 and shows null in the Prio and Document columns for them.

You can do it as follows:

Df1.join(Df2, Df1['Match'] == Df2['Prio'], how='left')
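
To see why a left join rather than a union gives the expected output, the semantics can be sketched in plain Python, without Spark. This is only an illustration under assumptions: left_join is a hypothetical helper written for this example, not part of PySpark, rows are modelled as dictionaries, and Spark's real implementation is distributed and works differently. Every row of Df1 is kept; it is repeated once per matching row of Df2, or padded with None (displayed as null in Spark) when nothing matches.

```python
# Plain-Python illustration of left-join semantics (not how Spark implements it).
# left_join is a hypothetical helper written for this example only.

df1 = [
    {"Id-name": 21171, "Match": "2500B", "Title": "Title1"},
    {"Id-name": 21171, "Match": "2400B", "Title": "Title2"},
    {"Id-name": 21171, "Match": "2400C", "Title": "Title3"},
    {"Id-name": 21171, "Match": "3000A", "Title": "Title4"},
    {"Id-name": 22000, "Match": "1000A", "Title": "Title5"},
]

df2 = [
    {"Prio": "2500B", "Document": "Doc1"},
    {"Prio": "2500B", "Document": "Doc2"},
    {"Prio": "2500B", "Document": "Doc3"},
    {"Prio": "1000A", "Document": "Doc5"},
    {"Prio": "1000A", "Document": "Doc6"},
    {"Prio": "1000A", "Document": "Doc7"},
    {"Prio": "1000A", "Document": "Doc8"},
]


def left_join(left, right, left_key, right_key):
    """Keep every left row; pair it with each matching right row,
    or with None values when no right row matches."""
    right_columns = list(right[0].keys()) if right else []
    joined = []
    for l_row in left:
        matches = [r_row for r_row in right if r_row[right_key] == l_row[left_key]]
        if matches:
            # One output row per matching right-hand row.
            for r_row in matches:
                joined.append({**l_row, **r_row})
        else:
            # No match: keep the left row and fill the right-hand columns with None.
            joined.append({**l_row, **{col: None for col in right_columns}})
    return joined


result = left_join(df1, df2, "Match", "Prio")
for row in result:
    print(row)
```

On the sample data this yields 10 rows: three for Title1, one each with None values for Title2 to Title4, and four for Title5. A union, by contrast, stacks the rows of two dataframes that share the same columns, which is why it does not fit this case.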

huangapple
  • Posted on July 13, 2023 at 22:18:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76680416.html