Pandas: How to map the values of a Dataframe to another Dataframe?

Question

I am completely new to Python and am learning it through a few use cases of my own.

I have two DataFrames. The main one has an empty 'Country' column that I need to fill in; the other, a reference DataFrame, has a 'Countries' column whose values should be mapped into the main DataFrame by matching its 'Data' column against the text in 'Name Data'.
(My apologies if this question has already been answered.)

Below is the Main DataFrame:

Name Data                     | Country
----------------------------- | ---------
Arjun Kumar Reddy las Vegas   |
Divya london Khosla           |
new delhi Pragati Kumari      |
Will London Turner            |
Joseph Mascurenus Bombay      |
Jason New York Bourne         |
New york Vice Roy             |
Joseph Mascurenus new York    |
Peter Parker California       |
Bruce (istanbul) Wayne        |

Below is the reference DataFrame:

Data           | Countries
-------------- | ---------
las Vegas      | US
london         | UK
New Delhi      | IN
London         | UK
bombay         | IN
New York       | US
New york       | US
new York       | US
California     | US
istanbul       | TR
Moscow         | RS
Cape Town      | SA

The result I want looks like this:

Name Data                     | Country
----------------------------- | ---------
Arjun Kumar Reddy las Vegas   | US
Divya london Khosla           | UK
new delhi Pragati Kumari      | IN
Will London Turner            | UK
Joseph Mascurenus Bombay      | IN
Jason New York Bourne         | US
New york Vice Roy             | US
Joseph Mascurenus new York    | US
Peter Parker California       | US
Bruce (istanbul) Wayne        | TR

Please note that the two DataFrames are not the same size.
I thought of using map or FuzzyWuzzy, but couldn't get the desired result.

Answer 1

Score: 2

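Extract from each row the key that matches an entry in the reference DataFrame.

import re  # for the re.I (case-insensitive) flag
# One capture group per reference value; bfill(axis=1) pulls the matched group into column 0.
regex = '(' + ')|('.join(ref_df['Data']) + ')'
df['key'] = df['Name Data'].str.extract(regex, flags=re.I).bfill(axis=1)[0]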

>>> df
                     Name Data        key
0  Arjun Kumar Reddy las Vegas  las Vegas
1       Bruce (istanbul) Wayne   istanbul
2   Joseph Mascurenus new York   new York


>>> ref_df
        Data Country
0  las Vegas      US
1   new York      US
2   istanbul      TR

Merge the two DataFrames on the extracted key.

>>> pd.merge(df, ref_df, left_on='key', right_on='Data')
                     Name Data        key       Data Country
0  Arjun Kumar Reddy las Vegas  las Vegas  las Vegas      US
1       Bruce (istanbul) Wayne   istanbul   istanbul      TR
2   Joseph Mascurenus new York   new York   new York      US
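
One caveat: the extracted key keeps whatever casing appears in 'Name Data', so with the question's full data an exact merge on 'key' would miss rows such as "new delhi" (reference: "New Delhi") and "Bombay" (reference: "bombay"). Below is a minimal sketch of one way around that, assuming it is acceptable to compare names in lower case. It swaps the merge for a dictionary lookup via `Series.map`; the names `lookup`, `names`, `pattern` and `key` are illustrative and not part of the original answer.

```python
import re

import pandas as pd

# Main DataFrame from the question (the Country column is filled in below).
df = pd.DataFrame({
    "Name Data": [
        "Arjun Kumar Reddy las Vegas",
        "Divya london Khosla",
        "new delhi Pragati Kumari",
        "Will London Turner",
        "Joseph Mascurenus Bombay",
        "Jason New York Bourne",
        "New york Vice Roy",
        "Joseph Mascurenus new York",
        "Peter Parker California",
        "Bruce (istanbul) Wayne",
    ]
})

# Reference DataFrame from the question.
ref_df = pd.DataFrame({
    "Data": ["las Vegas", "london", "New Delhi", "London", "bombay", "New York",
             "New york", "new York", "California", "istanbul", "Moscow", "Cape Town"],
    "Countries": ["US", "UK", "IN", "UK", "IN", "US",
                  "US", "US", "US", "TR", "RS", "SA"],
})

# Lower-cased lookup table: place name -> country code (case variants collapse).
lookup = dict(zip(ref_df["Data"].str.lower(), ref_df["Countries"]))

# One capture group; re.escape guards against regex metacharacters in the names.
# Longer names come first so a longer name wins over a shorter one it starts with.
names = sorted(lookup, key=len, reverse=True)
pattern = "(" + "|".join(re.escape(name) for name in names) + ")"

# Extract the first matching place name, normalise its case, then map it to a country.
key = df["Name Data"].str.extract(pattern, flags=re.I, expand=False).str.lower()
df["Country"] = key.map(lookup)

print(df)
```

Rows with no matching place name end up with NaN in 'Country'. Note that the pattern matches substrings without word boundaries, which is fine for this data but may need tightening for shorter place names.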

Answer 2

Score: 1

It looks like both DataFrames are in the same row order, so you can merge on the index:

mdf.merge(rdf, left_index=True, right_index=True)
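
Merging on the index pairs rows purely by position, so it only gives the right answer while both frames stay in the same order. Here is a small sketch of that behaviour, using shortened stand-ins for the question's tables. The `aligned` column is an illustrative sanity check, not part of the answer.

```python
import pandas as pd

# Shortened stand-ins for the question's tables; `mdf` and `rdf` follow the answer's naming.
mdf = pd.DataFrame({"Name Data": ["Arjun Kumar Reddy las Vegas",
                                  "Divya london Khosla",
                                  "new delhi Pragati Kumari"]})
rdf = pd.DataFrame({"Data": ["las Vegas", "london", "New Delhi", "Moscow"],
                    "Countries": ["US", "UK", "IN", "RS"]})

# Inner join on the index: row i of mdf is paired with row i of rdf.
# The extra reference row (index 3) has no partner in mdf and is dropped.
merged = mdf.merge(rdf, left_index=True, right_index=True)

# Sanity check: does each paired reference name actually occur in its row?
merged["aligned"] = [
    place.lower() in text.lower()
    for text, place in zip(merged["Name Data"], merged["Data"])
]

print(merged)
```

If any value in `aligned` is False, the rows are not in matching order and a key-based lookup is the safer route.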

huangapple
  • Posted on January 7, 2020, at 00:18:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59615483.html