Conditional merging of two dataframes in Python 3.7.

Question

I have the following dataframe:

    col1   term1   term2
    ab|a   ab      a
    cd     cd      None

I would like to merge this dataframe with another dataframe (df2) using both columns "term1" and "term2", but skip/ignore a value when it is None (as in row 2). I am trying to do this with an if/else condition inside a for loop. Please see the pseudocode below (it is not functional code, since it also raises an error).

Is this the right approach, or is there a nicer way to do it?

    df1 = pd.concat([df["col1"], df["col1"].str.split("|", expand=True)], axis=1)
    df1.rename(columns={0: 'term1', 1: 'term2'}, inplace=True)

    for index, row in df1.iterrows():
        if row['term1'] is None:
            break
        else:
            row = row.to_frame()
            print(row)
            row.merge(df2, how='inner', left_on='term1', right_on='STR')


# Answer 1
**Score**: 1

A loop over a pandas dataframe is a code smell. To exclude rows with null values, just drop them before merging. You could first use pandas' `dropna`. Something like this:

```python
df1 = df1.dropna(subset=["term1", "term2"])
```

And then apply pandas' `merge`:

```python
df = df1.merge(df2, on=["term1", "term2"])
```

To make the code shorter, you could first define `merge_columns = ["term1", "term2"]` and use it in both the `dropna` and `merge` calls. You could also do the filtering right inside the merge; I just did it step by step to be clear.

Hope it helps.
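
For anyone who wants to try this, here is a minimal end-to-end sketch with made-up data. The contents of both `df2_pairs` and `df2_str` are assumptions for illustration only, since the question never shows `df2`. Variant 1 follows the `dropna` then `merge` steps described above and assumes `df2` has its own `term1`/`term2` columns. Variant 2 is a different technique (reshaping with `melt`), which may fit better if `df2` only has the single key column `STR` used in the question's loop and each non-null term should be matched on its own.

```python
import pandas as pd

# Sample data mirroring the question's df.
df = pd.DataFrame({"col1": ["ab|a", "cd"]})

# Split on the literal "|" (a one-character pattern is not treated as a regex);
# rows with fewer pieces get a missing value in the extra column.
df1 = pd.concat([df["col1"], df["col1"].str.split("|", expand=True)], axis=1)
df1 = df1.rename(columns={0: "term1", 1: "term2"})

merge_columns = ["term1", "term2"]

# Variant 1 (the approach above). ASSUMPTION: df2 has term1/term2 columns.
df2_pairs = pd.DataFrame(
    {"term1": ["ab", "cd"], "term2": ["a", "x"], "value": [1, 2]}
)
complete = df1.dropna(subset=merge_columns)  # drops the "cd" row (term2 is missing)
merged_pairs = complete.merge(df2_pairs, on=merge_columns)  # inner join by default

# Variant 2. ASSUMPTION: df2 has one key column "STR", as in the question's loop.
df2_str = pd.DataFrame({"STR": ["ab", "a", "cd"], "info": ["x", "y", "z"]})
long_terms = df1.melt(
    id_vars="col1", value_vars=merge_columns, var_name="slot", value_name="term"
).dropna(subset=["term"])  # one row per non-null term
merged_terms = long_terms.merge(df2_str, how="inner", left_on="term", right_on="STR")

print(merged_pairs)
print(merged_terms)
```

Note the difference in behavior: variant 1 discards the whole `cd` row because one of its terms is missing, while variant 2 keeps `cd` and only skips the missing `term2` value.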

huangapple
  • Published on June 6, 2023, 16:12:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76412630.html