英文:
Pandas: How to map the values of a Dataframe to another Dataframe?
问题
我是完全新手,正在学习Python,并且有一些用例需要使用。
我有两个数据框,一个是我需要Country列中的值,另一个是具有名为'Countries'列中的值,需要映射到主数据框中的名为'Data'的列中。
(如果此问题已经得到回答,请接受我的道歉)
以下是主数据框:
Name Data | Country
----------------------------- | ---------
Arjun Kumar Reddy las Vegas |
Divya london Khosla |
new delhi Pragati Kumari |
Will London Turner |
Joseph Mascurenus Bombay |
Jason New York Bourne |
New york Vice Roy |
Joseph Mascurenus new York |
Peter Parker California |
Bruce (istanbul) Wayne |
以下是引用的数据框:
Data | Countries
-------------- | ---------
las Vegas | US
london | UK
New Delhi | IN
London | UK
bombay | IN
New York | US
New york | US
new York | US
California | US
istanbul | TR
Moscow | RS
Cape Town | SA
而我想要的结果如下所示:
Name Data | Country
----------------------------- | ---------
Arjun Kumar Reddy las Vegas | US
Divya london Khosla | UK
new delhi Pragati Kumari | IN
Will London Turner | UK
Joseph Mascurenus Bombay | IN
Jason New York Bourne | US
New york Vice Roy | US
Joseph Mascurenus new York | US
Peter Parker California | US
Bruce (istanbul) Wayne | TR
请注意,两个数据框的大小不同。
我考虑使用map或Fuzzywuzzy方法,但没有真正实现所需的结果。
英文:
I am totally new to Python and just learning with some use cases I have.
I have 2 Data Frames, one is where I need the values in the Country Column, and another is having the values in the column named 'Countries' which needs to be mapped in the main Data Frame referring to the column named 'Data'.
(Please accept my apology if this question has already been answered)
Below is the Main DataFrame:
Name Data | Country
----------------------------- | ---------
Arjun Kumar Reddy las Vegas |
Divya london Khosla |
new delhi Pragati Kumari |
Will London Turner |
Joseph Mascurenus Bombay |
Jason New York Bourne |
New york Vice Roy |
Joseph Mascurenus new York |
Peter Parker California |
Bruce (istanbul) Wayne |
Below is the Referenced DataFrame:
Data | Countries
-------------- | ---------
las Vegas | US
london | UK
New Delhi | IN
London | UK
bombay | IN
New York | US
New york | US
new York | US
California | US
istanbul | TR
Moscow | RS
Cape Town | SA
And what I want in the result will look like below:
Name Data | Country
----------------------------- | ---------
Arjun Kumar Reddy las Vegas | US
Divya london Khosla | UK
new delhi Pragati Kumari | IN
Will London Turner | UK
Joseph Mascurenus Bombay | IN
Jason New York Bourne | US
New york Vice Roy | US
Joseph Mascurenus new York | US
Peter Parker California | US
Bruce (istanbul) Wayne | TR
Please note, Both the dataframes are not same in size.
I though of using map or Fuzzywuzzy method but couldn't really achieved the result.
答案1
得分: 2
提取与参考数据框匹配的国家键。
regex = '(' + '|'.join(ref_df['Data']) + ')'
df['key'] = df['Name Data'].str.extract(regex, flags=re.I).bfill(axis=1)[0]
合并两个数据框,根据提取的键进行合并。
pd.merge(df, ref_df, left_on='key', right_on='Data')
英文:
Find the country key that matches in the reference dataframe and extract it.
regex = '(' + ')|('.join(ref_df['Data']) + ')'
df['key'] = df['Name Data'].str.extract(regex, flags=re.I).bfill(axis=1)[0]
>>> df
Name Data key
0 Arjun Kumar Reddy las Vegas las Vegas
1 Bruce (istanbul) Wayne istanbul
2 Joseph Mascurenus new York new York
>>> ref_df
Data Country
0 las Vegas US
1 new York US
2 istanbul TR
Merge both the dataframes on key extracted.
pd.merge(df, ref_df, left_on='key', right_on='Data')
Name Data key Data Country
0 Arjun Kumar Reddy las Vegas las Vegas las Vegas US
1 Bruce (istanbul) Wayne istanbul istanbul TR
2 Joseph Mascurenus new York new York new York US
答案2
得分: 1
好的,以下是翻译好的部分:
看起来一切都已经排序,所以你可以在索引上进行合并
mdf.merge(rdf, left_index=True, right_index=True)
英文:
It looks like everything is sorted so you can merge on index
mdf.merge(rdf, left_index=True, right_index=True)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论