Conditional merging of two dataframes in Python 3.7.


Question
I have the following dataframe:

    col1   term1  term2
    ab|a   ab     a
    cd     cd     None

I would like to merge this dataframe with another dataframe (`df2`) using both the "term1" and "term2" columns, but skip/ignore a value when it is None (as in row 2). I am trying to use an if/else condition inside a for loop. Please see the pseudocode below (it is not functional code; it also raises an error).

Is this the right approach, or is there a nicer way to do this?

    df1 = pd.concat([df["col1"], df["col1"].str.split("|", expand=True)], axis=1)
    df1.rename(columns={0: 'term1', 1: 'term2'}, inplace=True)

    for index, row in df1.iterrows():
        if row['term1'] is None:
            break
        else:
            row = row.to_frame()
            print(row)
            row.merge(df2, how='inner', left_on='term1', right_on='STR')

# Answer 1

**Score**: 1

Looping over the rows of a pandas DataFrame is a code smell. To exclude rows with null values, just drop them before merging. You could first use pandas' `dropna`. Something like this:

```python
df1 = df1.dropna(subset=["term1", "term2"])
```

And then apply pandas' `merge`:

```python
df = df1.merge(df2, on=["term1", "term2"])
```

To make the code shorter, you could first define `merge_columns = ["term1", "term2"]` and use it in both the `dropna` and `merge` calls. You could also apply the filter right inside the merge expression; I just did it step by step to be clear.
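
As a minimal sketch of the steps described above, the snippet below builds a small `df1` shaped like the one in the question and runs both the step-by-step and the inline variant. The contents of `df2` are made up for illustration, and it is assumed here that `df2` has columns named `term1` and `term2`; the question itself only shows a column called `STR`, so the real column names may differ.

```python
import pandas as pd

# Sample data shaped like the question's df1 (term2 is missing in row 2).
df1 = pd.DataFrame({
    "col1": ["ab|a", "cd"],
    "term1": ["ab", "cd"],
    "term2": ["a", None],
})

# Hypothetical df2: assumed to share the term1/term2 column names.
df2 = pd.DataFrame({
    "term1": ["ab", "cd"],
    "term2": ["a", "x"],
    "value": [1, 2],
})

merge_columns = ["term1", "term2"]

# Step by step: drop incomplete rows, then inner-merge on both columns.
complete = df1.dropna(subset=merge_columns)
merged = complete.merge(df2, on=merge_columns)

# The same operation with the filter applied inline, in one expression.
merged_inline = df1.dropna(subset=merge_columns).merge(df2, on=merge_columns)

print(merged)
```

Only the `ab|a` row survives here, because the `cd` row has no `term2` and is dropped before the merge.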

Hope it helps.
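
One caveat worth noting: calling `dropna` on both columns removes the `cd` row entirely, because its `term2` is missing. If the goal is to keep that row and skip only the missing term, one possible alternative (not part of the answer above) is to reshape the term columns into a single column, drop the missing entries, and merge once against `df2`. The sketch below assumes `df2` is keyed by a single `STR` column, as the question's `merge` call suggests; the `value` column and all sample data are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({
    "col1": ["ab|a", "cd"],
    "term1": ["ab", "cd"],
    "term2": ["a", None],
})

# Hypothetical df2 keyed by "STR", as in the question's merge call.
df2 = pd.DataFrame({"STR": ["ab", "a", "cd"], "value": [10, 20, 30]})

# Reshape to one row per (col1, term) pair, then drop only the missing terms.
long_terms = (
    df1.melt(id_vars="col1", value_vars=["term1", "term2"],
             var_name="term_col", value_name="term")
       .dropna(subset=["term"])
)

# Every remaining term is matched against df2["STR"] in a single merge.
matched = long_terms.merge(df2, how="inner", left_on="term", right_on="STR")
print(matched)
```

With this layout the `cd` row still contributes its `term1` match, and no explicit loop or if/else is needed.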

huangapple
  • Posted on June 6, 2023, 16:12:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76412630.html