English:
copy data from one df to match rows in another
Question
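You can use the pandas merge function to combine the two DataFrames on the 'email' column, then select the columns in the desired order to get the expected output. Here is example code: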
import pandas as pd

# Create DataFrame One
data1 = {'number': [1234, 5678, 9012, 3456],
         'email': ['me@mail.com', 'you@mail.com', 'us@mail.com', 'them@mail.com'],
         'name': ['Me', 'You', 'Us', 'Them']}
df1 = pd.DataFrame(data1)

# Create DataFrame Two
data2 = {'email': ['you@mail.com', 'them@mail.com', 'me@mail.com', 'us@mail.com'],
         'name': ['You', 'Them', 'Me', 'Us'],
         'open': [31, 84, 6, 0],
         'click': [7, 15, 1, 4]}
df2 = pd.DataFrame(data2)

# Merge the two DataFrames on 'email'. df2's duplicate 'name' column is dropped
# first; otherwise the result would contain 'name_x' and 'name_y' instead of 'name'.
result = df1.merge(df2.drop(columns='name'), on='email')

# Select the columns in the desired order
result = result[['number', 'email', 'name', 'open', 'click']]

# Print the result
print(result)
This code merges the two DataFrames on the 'email' column, so the 'open' and 'click' values line up with the matching 'email' and 'name' and give the desired output.
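Because both DataFrames in the question carry a 'name' column, the choice of merge key affects the resulting column names. The sketch below uses two small made-up frames (`left` and `right` are illustrative names, not from the question) to show the difference between merging on 'email' alone and merging on both shared columns.

```python
import pandas as pd

# Small illustrative frames; the rows are deliberately in a different order.
left = pd.DataFrame({'email': ['me@mail.com', 'you@mail.com'],
                     'name': ['Me', 'You'],
                     'number': [1234, 5678]})
right = pd.DataFrame({'email': ['you@mail.com', 'me@mail.com'],
                      'name': ['You', 'Me'],
                      'open': [31, 6]})

# Merging on 'email' only: the shared non-key column 'name' appears twice,
# disambiguated by pandas' default suffixes '_x' and '_y'.
by_email = left.merge(right, on='email')
print(list(by_email.columns))

# Merging on both shared columns keeps a single 'name' column.
by_both = left.merge(right, on=['email', 'name'])
print(list(by_both.columns))
print(by_both)
```

Rows are matched by key value rather than by position, so the differing row order in the two frames does not matter.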
English:
Dataframe One
   number          email  name
0    1234    me@mail.com    Me
1    5678   you@mail.com   You
2    9012    us@mail.com    Us
3    3456  them@mail.com  Them
Dataframe Two
           email  name  open  click
0   you@mail.com   You    31      7
1  them@mail.com  Them    84     15
2    me@mail.com    Me     6      1
3    us@mail.com    Us     0      4
I would like to combine the two dataframes so that I end up with a single one:
Desired Output:
   number          email  name  open  click
0    1234    me@mail.com    Me     6      1
1    5678   you@mail.com   You    31      7
2    9012    us@mail.com    Us     0      4
3    3456  them@mail.com  Them    84     15
What confuses me is how to make sure the 'open' and 'click' values from dataframe two end up on the correct rows when combined with dataframe one, since the 'email' and 'name' columns are in a different row order in each dataframe.
Answer 1
Score: 1
If by "a different order" you mean that the rows of the second dataframe do not line up by index with those of the first, as in your example, just merge on email.
df3 = pd.merge(df1, df2, on='email', suffixes=('', '_y')).filter(regex='^(?!.*_y)')
Or do a left merge, with the first dataframe on the left and the second on the right. All rows of the first dataframe are kept, and where the second dataframe has no matching row, the columns coming from it are filled with NaN.
df3 = pd.merge(df1, df2, left_on='email', right_on='email', how='left',
               suffixes=('', '_y')).filter(regex='^(?!.*_y)')
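To see what a left merge does with unmatched rows, here is a minimal sketch. The frames `people` and `stats` and the address new@mail.com are made up for illustration; `indicator=True` is a standard `merge` option that adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Illustrative data: 'new@mail.com' has no counterpart in the stats frame.
people = pd.DataFrame({'email': ['me@mail.com', 'you@mail.com', 'new@mail.com'],
                       'number': [1234, 5678, 1111]})
stats = pd.DataFrame({'email': ['you@mail.com', 'me@mail.com'],
                      'open': [31, 6]})

# how='left' keeps every row of `people`; rows without a match get NaN in 'open'.
# indicator=True adds a '_merge' column with 'both' or 'left_only'.
merged = people.merge(stats, on='email', how='left', indicator=True)
print(merged)

# Rows of `people` that found no partner in `stats`.
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched['email'].tolist())
```

Note that 'open' becomes a float column here, because NaN cannot be stored in a plain integer column.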
Update
If I understand you correctly, you need to merge, but the two dataframes have different values in the open and click columns. I made two dataframes to show this. To avoid duplicate columns, I give the second copy of each duplicate the suffix '_y' and then drop those columns with filter(regex='^(?!.*_y)'). To keep the columns I need, I rename them first (so that they are not treated as duplicates).
import pandas as pd

df3 = pd.DataFrame(
    {'number': [1234, 5678, 9012, 3456], 'email': ['me@mail.com', 'you@mail.com', 'us@mail.com', 'them@mail.com'],
     'name': ['Me', 'You', 'Us', 'Them'], 'open': [6, 31, 0, 84], 'click': [1, 7, 4, 15]})
df4 = pd.DataFrame(
    {'number': [3456, 5678, 9012, 1234], 'email': ['them@mail.com', 'you@mail.com', 'us@mail.com', 'me@mail.com'],
     'name': ['Them', 'You', 'Us', 'Me'], 'open': [1, 2, 3, 4], 'click': [4, 3, 2, 1]})
# Give the columns to keep distinct names so they do not collide in the merge
df3.rename(columns={'open': 'open_df3', 'click': 'click_df3'}, inplace=True)
df4.rename(columns={'open': 'open_df4', 'click': 'click_df4'}, inplace=True)
# Left merge on email; the remaining duplicates from df4 ('number', 'name')
# get the suffix '_y' and are removed by the filter
df5 = pd.merge(df3, df4, left_on='email', right_on='email', how='left',
               suffixes=('', '_y')).filter(regex='^(?!.*_y)')
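The suffix-and-filter step can be inspected in isolation. The sketch below uses two small made-up frames (`a` and `b` are illustrative names) and also shows one possible alternative to the regex, selecting columns with `str.endswith`; that alternative is a suggestion, not part of the original answer.

```python
import pandas as pd

a = pd.DataFrame({'email': ['me@mail.com', 'you@mail.com'],
                  'name': ['Me', 'You'], 'open': [6, 31]})
b = pd.DataFrame({'email': ['you@mail.com', 'me@mail.com'],
                  'name': ['You', 'Me'], 'open': [2, 4]})

# Columns shared by both frames (other than the key) keep their name on the
# left side and get '_y' appended on the right side.
merged = pd.merge(a, b, on='email', how='left', suffixes=('', '_y'))
print(list(merged.columns))

# The negative lookahead keeps only column names that contain no '_y' anywhere.
kept = merged.filter(regex='^(?!.*_y)')
print(list(kept.columns))

# Alternative: drop only the columns whose names end in '_y'.
kept_alt = merged.loc[:, ~merged.columns.str.endswith('_y')]
print(kept.equals(kept_alt))
```

The regex removes any column whose name contains "_y" at any position, so a column such as a hypothetical 'daily_yield' would be dropped as well; the `endswith` variant is narrower in that respect.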