2023-04-17 21:33:36

copy data from one df to match rows in another

Question

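You can use the pandas merge function to combine the two DataFrames on their shared 'email' and 'name' columns, then select the desired column order to get the expected output. Merging on 'email' alone would leave 'name_x' and 'name_y' columns, and the column selection below would then raise a KeyError. Example code:

import pandas as pd

# Create Dataframe One
data1 = {'number': [1234, 5678, 9012, 3456],
         'email': ['me@mail.com', 'you@mail.com', 'us@mail.com', 'them@mail.com'],
         'name': ['Me', 'You', 'Us', 'Them']}
df1 = pd.DataFrame(data1)

# Create Dataframe Two
data2 = {'email': ['you@mail.com', 'them@mail.com', 'me@mail.com', 'us@mail.com'],
         'name': ['You', 'Them', 'Me', 'Us'],
         'open': [31, 84, 6, 0],
         'click': [7, 15, 1, 4]}
df2 = pd.DataFrame(data2)

# Merge the two DataFrames on both shared columns so 'name' is not duplicated
result = df1.merge(df2, on=['email', 'name'])

# Select the desired column order
result = result[['number', 'email', 'name', 'open', 'click']]

# Print the expected output
print(result)

This code merges the two DataFrames on the key columns, so the 'open' and 'click' values line up with the matching 'email' and 'name' regardless of row order, which gives the desired output.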
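Because merge aligns rows by key value rather than by position, the row order of the second frame should not affect the result. The following is a minimal sketch of that point, assuming the sample data from the question; `shuffled`, `merged_a`, and `merged_b` are hypothetical names introduced only for illustration, and merging on both `email` and `name` is one choice among several for avoiding a duplicated `name` column.

```python
import pandas as pd

df1 = pd.DataFrame({
    'number': [1234, 5678, 9012, 3456],
    'email': ['me@mail.com', 'you@mail.com', 'us@mail.com', 'them@mail.com'],
    'name': ['Me', 'You', 'Us', 'Them'],
})
df2 = pd.DataFrame({
    'email': ['you@mail.com', 'them@mail.com', 'me@mail.com', 'us@mail.com'],
    'name': ['You', 'Them', 'Me', 'Us'],
    'open': [31, 84, 6, 0],
    'click': [7, 15, 1, 4],
})

# Reverse the row order of df2 (a hypothetical variation of the sample data).
shuffled = df2.iloc[::-1].reset_index(drop=True)

# Merge on both shared columns so only one 'name' column remains.
merged_a = df1.merge(df2, on=['email', 'name'])
merged_b = df1.merge(shuffled, on=['email', 'name'])

# Rows are matched by key, so both results are expected to be identical.
same = merged_a.equals(merged_b)
print(merged_a)
print(same)
```

If the key columns contain duplicates, the result can grow beyond the length of df1, so it may be worth checking key uniqueness before relying on this.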

Original question:

Dataframe One

    number   email          name
0   1234     me@mail.com    Me
1   5678     you@mail.com   You
2   9012     us@mail.com    Us
3   3456     them@mail.com  Them

Dataframe Two

    email          name    open  click
0   you@mail.com   You     31    7
1   them@mail.com  Them    84    15
2   me@mail.com    Me      6     1
3   us@mail.com    Us      0     4

I would like to combine the two dataframes so that I end up with a single dataframe:

Desired Output:

    number   email          name   open  click
0   1234     me@mail.com    Me     6     1
1   5678     you@mail.com   You    31    7
2   9012     us@mail.com    Us     0     4
3   3456     them@mail.com  Them   84    15

What confuses me is how to ensure that the data in columns 'open' and 'click' from dataframe two lines up correctly when combined with dataframe one, since the rows (identified by the 'email' and 'name' columns) are in a different order in each dataframe.

Answer 1

Score: 1

If by "a different order" you mean that the rows of the second dataframe do not line up by index with the first, as in your example, just merge on email.

df3 = pd.merge(df1, df2, on='email', suffixes=('', '_y')).filter(regex='^(?!.*_y)')

Or do a left merge, with the first dataframe on the left and the second on the right. All rows of the first dataframe are kept, and any row with no match in the second dataframe gets empty (NaN) values in the columns that come from the second.

df3 = pd.merge(df1, df2, left_on='email', right_on='email', how='left',
               suffixes=('', '_y')).filter(regex='^(?!.*_y)')
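To see what the left merge does with unmatched rows, here is a small sketch under assumed data: `left`, `right`, `out`, and `unmatched` are hypothetical names, and the `right` frame deliberately lacks one email. The `indicator=True` argument of merge adds a `_merge` column that marks where each row came from.

```python
import pandas as pd

# Hypothetical frames: 'us@mail.com' has no counterpart in `right`.
left = pd.DataFrame({'email': ['me@mail.com', 'you@mail.com', 'us@mail.com'],
                     'number': [1234, 5678, 9012]})
right = pd.DataFrame({'email': ['you@mail.com', 'me@mail.com'],
                      'open': [31, 6]})

# Left merge keeps every row of `left`; indicator=True adds a '_merge' column.
out = left.merge(right, on='email', how='left', indicator=True)

# Rows marked 'left_only' had no match, so their 'open' value is NaN.
unmatched = out.loc[out['_merge'] == 'left_only', 'email'].tolist()
print(out)
print(unmatched)
```

Note that a column receiving NaN is typically converted to a float dtype, so integer counts may print as 31.0 rather than 31.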

Update

If I understand you correctly, you need to merge, but the two dataframes have different values in the 'open' and 'click' columns. I made two example dataframes. To avoid duplicate columns, I use the suffix '_y' for the duplicated columns from the second dataframe and drop them with filter(regex='^(?!.*_y)'). To keep the columns that are needed from both dataframes, I renamed them first so that they are not treated as duplicates.

df3 = pd.DataFrame(
    {'number': [1234, 5678, 9012, 3456], 'email': ['me@mail.com', 'you@mail.com', 'us@mail.com', 'them@mail.com'],
     'name': ['Me', 'You', 'Us', 'Them'], 'open': [6, 31, 0, 84], 'click': [1, 7, 4, 15]})

df4 = pd.DataFrame(
    {'number': [3456, 5678, 9012, 1234], 'email': ['them@mail.com', 'you@mail.com', 'us@mail.com', 'me@mail.com'],
     'name': ['Them', 'You', 'Us', 'Me'], 'open': [1, 2, 3, 4], 'click': [4, 3, 2, 1]})

# Rename the value columns so they survive the merge under distinct names
df3.rename(columns={'open': 'open_df3', 'click': 'click_df3'}, inplace=True)
df4.rename(columns={'open': 'open_df4', 'click': 'click_df4'}, inplace=True)

# Duplicated columns from df4 get the '_y' suffix and are then filtered out
df5 = pd.merge(df3, df4, left_on='email', right_on='email', how='left',
               suffixes=('', '_y')).filter(regex='^(?!.*_y)')
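The suffix-and-filter pattern can be checked on a smaller case. This sketch uses hypothetical frames `a` and `b` and compares the pattern with one possible alternative, dropping the overlapping non-key columns from the right frame before merging; the alternative is my own illustration, not something the answer proposes.

```python
import pandas as pd

# Hypothetical frames sharing 'email' (the key) and 'name' (a duplicate column).
a = pd.DataFrame({'email': ['me@mail.com', 'you@mail.com'],
                  'name': ['Me', 'You'],
                  'open_a': [6, 31]})
b = pd.DataFrame({'email': ['you@mail.com', 'me@mail.com'],
                  'name': ['You', 'Me'],
                  'open_b': [2, 4]})

# Pattern from the answer: suffix the right-hand duplicates with '_y', then drop them.
via_filter = pd.merge(a, b, on='email', how='left',
                      suffixes=('', '_y')).filter(regex='^(?!.*_y)')

# Alternative sketch: remove overlapping non-key columns from b before merging.
overlap = [c for c in b.columns if c in a.columns and c != 'email']
via_drop = a.merge(b.drop(columns=overlap), on='email', how='left')

print(list(via_filter.columns))
print(via_filter.equals(via_drop))
```

One caveat of the regex approach is that it drops every column whose name contains "_y", including columns that were named that way before the merge; dropping the overlap explicitly avoids that.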
