Copy data from one df to match rows in another

Question

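You can use the pandas merge function to join the two DataFrames on the 'email' column and then select the columns in the desired order. Because both DataFrames also contain a 'name' column, merge on both 'email' and 'name'; merging on 'email' alone would produce 'name_x' and 'name_y' instead of a single 'name' column. Here is example code:

import pandas as pd

# Create Dataframe One
data1 = {'number': [1234, 5678, 9012, 3456],
         'email': ['me@mail.com', 'you@mail.com', 'us@mail.com', 'them@mail.com'],
         'name': ['Me', 'You', 'Us', 'Them']}
df1 = pd.DataFrame(data1)

# Create Dataframe Two
data2 = {'email': ['you@mail.com', 'them@mail.com', 'me@mail.com', 'us@mail.com'],
         'name': ['You', 'Them', 'Me', 'Us'],
         'open': [31, 84, 6, 0],
         'click': [7, 15, 1, 4]}
df2 = pd.DataFrame(data2)

# Merge the two DataFrames on the columns they share
result = df1.merge(df2, on=['email', 'name'])

# Select the columns in the desired order
result = result[['number', 'email', 'name', 'open', 'click']]

# Print the desired output
print(result)

This code matches rows by key rather than by position, so the 'open' and 'click' values line up with the corresponding 'email' and 'name' and give the desired output.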
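
To make the point about row order concrete, here is a minimal sketch using the same sample data. It assumes each email appears at most once in each DataFrame (my assumption, not something the question states), which `validate='one_to_one'` checks. The names `shuffled` and `result_shuffled` are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'number': [1234, 5678, 9012, 3456],
    'email': ['me@mail.com', 'you@mail.com', 'us@mail.com', 'them@mail.com'],
    'name': ['Me', 'You', 'Us', 'Them'],
})
df2 = pd.DataFrame({
    'email': ['you@mail.com', 'them@mail.com', 'me@mail.com', 'us@mail.com'],
    'name': ['You', 'Them', 'Me', 'Us'],
    'open': [31, 84, 6, 0],
    'click': [7, 15, 1, 4],
})

# Drop the redundant 'name' column from df2 so only one 'name' survives.
# validate='one_to_one' raises a MergeError if an email is duplicated
# on either side (assumed not to happen here).
result = df1.merge(df2.drop(columns='name'), on='email', validate='one_to_one')

# Reorder df2's rows: the merge matches on the key, not on position,
# so the outcome should not change.
shuffled = df2.sample(frac=1, random_state=0)
result_shuffled = df1.merge(shuffled.drop(columns='name'), on='email',
                            validate='one_to_one')

print(result)
print(result.equals(result_shuffled))
```

An inner merge keeps the key order of the left DataFrame, which is why the rows come out in the order of df1.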

Original question:

Dataframe One

    number   email          name
0   1234     me@mail.com    Me
1   5678     you@mail.com   You
2   9012     us@mail.com    Us
3   3456     them@mail.com  Them

Dataframe Two

    email         name    open  click
0   you@mail.com  You     31    7  
1   them@mail.com Them    84    15
2   me@mail.com   Me      6     1
3   us@mail.com   Us      0     4

I would like to combine the two dfs into a single df:

Desired Output:

    number   email          name   open  click
0   1234     me@mail.com    Me     6     1
1   5678     you@mail.com   You    31    7
2   9012     us@mail.com    Us     0     4
3   3456     them@mail.com  Them   84    15

What confuses me is how to ensure that the data in the 'open' and 'click' columns from dataframe two lines up correctly when combined with dataframe one, since the rows (identified by 'email' and 'name') are in a different order in each dataframe.

Answer 1

Score: 1

If by "a different order" you mean that the rows of the second dataframe do not line up with the first by index, as in your example, just merge on email.

df3 = pd.merge(df1, df2, on='email', suffixes=('', '_y')).filter(regex='^(?!.*_y)')

Or do a left merge, with the first dataframe on the left and the second on the right. All rows of the first dataframe are kept, and any row with no match in the second dataframe gets empty (NaN) values in the columns that come from it.

df3 = pd.merge(df1, df2, left_on='email', right_on='email', how='left',
               suffixes=('', '_y')).filter(regex='^(?!.*_y)')
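
As a rough illustration of what the left merge does with unmatched rows, here is a small sketch. The frames `left` and `right` are made-up data for this example only (one email is deliberately missing from `right`); `indicator=True` is a standard `merge` option that adds a `_merge` column showing where each row came from.

```python
import pandas as pd

# Hypothetical data: 'us@mail.com' has no counterpart in `right`.
left = pd.DataFrame({
    'number': [1234, 5678, 9012],
    'email': ['me@mail.com', 'you@mail.com', 'us@mail.com'],
})
right = pd.DataFrame({
    'email': ['you@mail.com', 'me@mail.com'],
    'open': [31, 6],
    'click': [7, 1],
})

# how='left' keeps every row of `left`; indicator=True adds a '_merge'
# column with 'both' or 'left_only' for each row.
merged = left.merge(right, on='email', how='left', indicator=True)

# Rows of `left` that found no match have NaN in 'open' and 'click'.
unmatched = merged[merged['_merge'] == 'left_only']

print(merged)
print(unmatched['email'].tolist())
```

Note that 'open' and 'click' become floats once a NaN appears in them.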

Update

If I understand you correctly, you need to merge, but the two dataframes have different values in the 'open' and 'click' columns. I made two dataframes to illustrate. To avoid duplicate columns, I use the suffix '_y' for the duplicated columns from the second dataframe and drop them with filter(regex='^(?!.*_y)'). To keep the columns I need, I renamed them first (so that they are not treated as duplicates).

df3 = pd.DataFrame(
    {'number': [1234, 5678, 9012, 3456], 'email': ['me@mail.com', 'you@mail.com', 'us@mail.com', 'them@mail.com'],
     'name': ['Me', 'You', 'Us', 'Them'], 'open': [6, 31, 0, 84], 'click': [1, 7, 4, 15]})

df4 = pd.DataFrame(
    {'number': [3456, 5678, 9012, 1234], 'email': ['them@mail.com', 'you@mail.com', 'us@mail.com', 'me@mail.com'],
     'name': ['Them', 'You', 'Us', 'Me'], 'open': [1, 2, 3, 4], 'click': [4, 3, 2, 1]})

# Rename the value columns so they are not treated as duplicates during the merge
df3.rename(columns={'open': 'open_df3', 'click': 'click_df3'}, inplace=True)
df4.rename(columns={'open': 'open_df4', 'click': 'click_df4'}, inplace=True)

df5 = pd.merge(df3, df4, left_on='email', right_on='email', how='left',
               suffixes=('', '_y')).filter(regex='^(?!.*_y)')
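
A possible alternative to renaming and filtering, sketched here as a variation rather than as the answer's own method: select only the needed columns from the second dataframe and let `suffixes` label the overlapping 'open' and 'click' columns. The data is the same as in the update above.

```python
import pandas as pd

df3 = pd.DataFrame({
    'number': [1234, 5678, 9012, 3456],
    'email': ['me@mail.com', 'you@mail.com', 'us@mail.com', 'them@mail.com'],
    'name': ['Me', 'You', 'Us', 'Them'],
    'open': [6, 31, 0, 84],
    'click': [1, 7, 4, 15],
})
df4 = pd.DataFrame({
    'number': [3456, 5678, 9012, 1234],
    'email': ['them@mail.com', 'you@mail.com', 'us@mail.com', 'me@mail.com'],
    'name': ['Them', 'You', 'Us', 'Me'],
    'open': [1, 2, 3, 4],
    'click': [4, 3, 2, 1],
})

# Take only the key and the value columns from df4, so 'number' and 'name'
# are not duplicated. The overlapping 'open' and 'click' columns get the
# suffixes '_df3' and '_df4' respectively.
df5 = pd.merge(
    df3,
    df4[['email', 'open', 'click']],
    on='email',
    how='left',
    suffixes=('_df3', '_df4'),
)

print(df5)
```

This leaves df3 and df4 unmodified, since nothing is renamed in place.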

huangapple
  • Published on April 17, 2023, 21:33:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76035744.html