2023-06-06 16:12:03

Conditional merging of two dataframes in Python 3.7

Question

I have the following dataframe:

    col1   term1  term2
    ab|a   ab     a
    cd     cd     None

I would like to merge this dataframe with another dataframe (`df2`) using both the `term1` and `term2` columns, but skip/ignore a value when it is None (as in row 2). I am trying to do this with an if/else condition inside a for loop. Please see the pseudocode below (it is not functional code; it also raises an error).
Is this the right approach, or is there a nicer way to do it?

    df1 = pd.concat([df["col1"], df["col1"].str.split("|", expand=True)], axis=1)
    df1.rename(columns={0: 'term1', 1: 'term2'}, inplace=True)

    for index, row in df1.iterrows():
        if row['term1'] is None:
            break
        else:
            row = row.to_frame()
            print(row)
            row.merge(df2, how='inner', left_on='term1', right_on='STR')

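To make the setup concrete, here is a minimal sketch of how the split step produces the missing value described in the question. The sample data is taken from the table above; the intermediate name `parts` is only illustrative.

```python
import pandas as pd

# Sample input matching the table in the question.
df = pd.DataFrame({"col1": ["ab|a", "cd"]})

# A single-character pattern such as "|" is treated as a literal string,
# so no escaping is needed here.
parts = df["col1"].str.split("|", expand=True)
parts = parts.rename(columns={0: "term1", 1: "term2"})

df1 = pd.concat([df["col1"], parts], axis=1)

# Rows with fewer parts than the widest row are padded with missing values,
# which is where the empty term2 in the second row comes from.
print(df1)
print(df1["term2"].isna().tolist())
```

Checking with `isna()` rather than `is None` is the safer test, since the padding value may be shown as `None` or `NaN` depending on the pandas version and dtype.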
# Answer 1

**Score**: 1

A loop over a pandas dataframe is a code smell. To exclude rows with null values, just drop them before merging. You could first use pandas' `dropna`. Something like this:

```python
df1 = df1.dropna(subset=["term1", "term2"])
```

Then apply pandas' `merge`:

```python
df = df1.merge(df2, on=["term1", "term2"])
```

To make the code shorter, you could first define `merge_columns = ["term1", "term2"]` and use it in both the `dropna` and `merge` calls. You could also do the filtering right inside the merge call; I did it step by step for clarity.

Hope it helps.
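The steps above can be sketched end to end as follows. Two things here are assumptions rather than facts from the thread: the contents of `df2` are made up, and the second variant is a different technique (reshaping with `melt`) that is not part of the answer. Variant 1 follows the answer and assumes `df2` has `term1` and `term2` columns; note that `dropna` with `subset` drops a row when any listed column is missing, so the `cd` row is removed entirely. Variant 2 assumes `df2` has a single key column `STR`, as in the question's pseudocode, and skips only the missing term rather than the whole row.

```python
import pandas as pd

# df1 as in the question; None marks the missing second term.
df1 = pd.DataFrame(
    {"col1": ["ab|a", "cd"], "term1": ["ab", "cd"], "term2": ["a", None]}
)

# Variant 1: the answer's approach.
# Assumption: df2 has "term1" and "term2" columns (hypothetical data).
merge_columns = ["term1", "term2"]
df2_pairs = pd.DataFrame(
    {"term1": ["ab", "cd"], "term2": ["a", "x"], "value": [10, 20]}
)
# Rows with a missing value in either merge column are dropped before merging.
pair_merged = df1.dropna(subset=merge_columns).merge(df2_pairs, on=merge_columns)

# Variant 2: a swapped-in alternative, not from the answer.
# Assumption: df2 has one lookup column "STR" (hypothetical data).
df2_lookup = pd.DataFrame({"STR": ["ab", "a", "cd", "zz"], "value": [1, 2, 3, 4]})
# Reshape to one row per (col1, term) and drop only the missing terms.
long_terms = df1.melt(
    id_vars="col1",
    value_vars=["term1", "term2"],
    var_name="term_column",
    value_name="term",
).dropna(subset=["term"])
term_merged = long_terms.merge(
    df2_lookup, how="inner", left_on="term", right_on="STR"
)

print(pair_merged)
print(term_merged)
```

Which variant fits depends on what `df2` actually looks like: the first matches pairs of terms, while the second matches each term on its own and keeps `cd` even though its second term is missing.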
