Updating the columns in nested loops in Python

Question

I have two dataframes A and B as follows:

A

    col1 col2 col3
    A    B    V1
    A    B    V2
    A    C    V1
    A    E    V2

B

    Col1 Col2 Value1 Value2
    A    B    nan    nan
    A    D    nan    nan
    A    C    nan    nan
    A    G    nan    nan
    A    E    nan    nan

I want to update the columns Value1 and Value2 in dataframe B on the basis of dataframe A: if a combination of col1 and col2 from A also exists in B (as Col1 and Col2), the Value1 and Value2 columns of that row in B should be filled with the corresponding values from col3 in A.

I want the output as:

    Col1 Col2 Value1 Value2
    A    B    V1     V2
    A    D    nan    nan
    A    C    V1     nan
    A    G    nan    nan
    A    E    nan    V2

I tried the following code in Python:

    def update_b():
        for x in b.index:
            for y in a.index:
                if ((a["col1"][y] == b["col1"][x]) & (a["col2"][y] == b["col2"][x])):
                    if (a["col3"][y] == "V1"):
                        b["value1"][x] = "V1"
                    else:
                        b["value2"][x] = "V2"

    update_b()

but it gives me an error:

> ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
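
For context, pandas raises this error whenever a Series is used where a single True/False is expected, such as in an `if` condition. The question does not show the index of either dataframe, so the following is only a sketch of one plausible cause, assuming the index contains duplicate labels: a lookup such as `a["col1"][y]` then returns a Series instead of a scalar. The frame and variable names below are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose index has a duplicated label (0 appears twice).
a = pd.DataFrame({"col1": ["A", "A", "A"]}, index=[0, 0, 1])

unique_lookup = a["col1"][1]      # label 1 is unique: a plain string
duplicate_lookup = a["col1"][0]   # label 0 is duplicated: a Series of two values

error_message = None
try:
    # Comparing a Series with a string yields a Series of booleans,
    # which `if` cannot reduce to a single truth value.
    if duplicate_lookup == "A":
        pass
except ValueError as exc:
    error_message = str(exc)

print(type(unique_lookup).__name__, type(duplicate_lookup).__name__)
print(error_message)
```

If this is the cause, resetting the index with `reset_index(drop=True)` before looping would make each lookup return a scalar again.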

Answer 1

Score: 1

Your dataframes are poorly designed and you should revisit them, but to answer your question as asked, we can leverage a MultiIndex and map (here df is dataframe A and df2 is dataframe B):

    # join the col3 values of each (col1, col2) pair, e.g. ('A', 'B') -> 'V1,V2'
    s = df.groupby(['col1','col2'])['col3'].agg(','.join).to_dict()
    # look up each (Col1, Col2) pair of df2 in that dictionary
    df2['Val'] = df2.set_index(['Col1','Col2']).index.map(s)
    df2.loc[df2['Val'].str.contains('v1', case=False, na=False), 'Value1'] = 'V1'
    df2.loc[df2['Val'].str.contains('v2', case=False, na=False), 'Value2'] = 'V2'
    df2.pop('Val')

Result:

    print(df2)
      Col1 Col2 Value1 Value2
    0    A    B     V1     V2
    1    A    D    NaN    NaN
    2    A    C     V1    NaN
    3    A    G    NaN    NaN
    4    A    E    NaN     V2
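
To make the intermediate steps of this approach visible, here is a minimal self-contained sketch. It rebuilds the sample data from the question (the construction of `df` and `df2` is an assumption about how the frames were created) and fills Value1 and Value2 from the joined string instead of writing into pre-existing NaN columns.

```python
import pandas as pd

# Sample data from the question: df plays the role of A, df2 the role of B.
df = pd.DataFrame({
    "col1": ["A", "A", "A", "A"],
    "col2": ["B", "B", "C", "E"],
    "col3": ["V1", "V2", "V1", "V2"],
})
df2 = pd.DataFrame({
    "Col1": ["A", "A", "A", "A", "A"],
    "Col2": ["B", "D", "C", "G", "E"],
})

# Step 1: one comma-joined string of col3 values per (col1, col2) pair.
s = df.groupby(["col1", "col2"])["col3"].agg(",".join).to_dict()
print(s)  # {('A', 'B'): 'V1,V2', ('A', 'C'): 'V1', ('A', 'E'): 'V2'}

# Step 2: look up each (Col1, Col2) pair of df2; pairs missing from s become NaN.
keys = pd.MultiIndex.from_arrays([df2["Col1"], df2["Col2"]])
joined = pd.Series(list(keys.map(s)), index=df2.index)

# Step 3: one output column per expected value, NaN where the value is absent.
for value, column in [("V1", "Value1"), ("V2", "Value2")]:
    present = joined.str.contains(value, na=False)
    df2[column] = pd.Series(value, index=df2.index).where(present)

print(df2)
```

Because the lookup is keyed on the (Col1, Col2) tuple, the row order of `df2` is preserved and no explicit loop over rows is needed.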

Answer 2

Score: 1

As far as I understand the logic, you may try the function below. Note that the last row differs from your expected output (I couldn't figure out how that output is possible).

    import pandas as pd

    def return_updated_B(A, B):
        m = A.rename(columns=lambda x: x.capitalize())  # capitalize cols of A
        # position of each row within its (Col1, Col2) group: 1, 2, ...
        pos = m.groupby(['Col1', 'Col2']).cumcount().add(1)
        n = (m.set_index(['Col1', 'Col2', pos])
              .unstack().droplevel(0, axis=1).add_prefix('Value'))
        # align to B's (Col1, Col2) pairs, keeping B's row order
        return n.reindex(pd.MultiIndex.from_arrays((B['Col1'], B['Col2']))).reset_index()

    print(return_updated_B(A, B))
      Col1 Col2 Value1 Value2
    0    A    B     V1     V2
    1    A    D    NaN    NaN
    2    A    C     V1    NaN
    3    A    G    NaN    NaN
    4    A    E     V2    NaN
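
The last row differs because `cumcount` numbers values by their position within each (Col1, Col2) group, so the single V2 of the (A, E) pair is numbered 1 and lands in Value1. A possible variant, sketched below, keys on the value itself instead, so that V1 always goes to Value1 and V2 to Value2. It swaps `cumcount` for an unstack on col3 and assumes that each (col1, col2, col3) combination occurs at most once in A and that col3 only contains "V1" or "V2".

```python
import numpy as np
import pandas as pd

# Sample data from the question.
A = pd.DataFrame({
    "col1": ["A", "A", "A", "A"],
    "col2": ["B", "B", "C", "E"],
    "col3": ["V1", "V2", "V1", "V2"],
})
B = pd.DataFrame({
    "Col1": ["A", "A", "A", "A", "A"],
    "Col2": ["B", "D", "C", "G", "E"],
    "Value1": np.nan,
    "Value2": np.nan,
})

# Spread col3 into one column per distinct value, keyed by (col1, col2).
wide = (
    A.assign(val=A["col3"])
     .set_index(["col1", "col2", "col3"])["val"]
     .unstack("col3")                                   # columns: V1, V2
     .rename(columns=lambda c: c.replace("V", "Value"))  # V1 -> Value1
     .reindex(columns=["Value1", "Value2"])              # keep both, even if one is absent
)
wide.index.names = ["Col1", "Col2"]  # match B's key column names
wide.columns.name = None

# A left merge keeps every row of B, in B's order; unmatched pairs stay NaN.
result = B[["Col1", "Col2"]].merge(wide.reset_index(), on=["Col1", "Col2"], how="left")
print(result)
```

With the sample data this yields V2 in the Value2 column for the (A, E) row, which matches the output requested in the question.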

Answer 3

Score: 0

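You could try iterating over the rows of the first dataframe and using loc to select the matching rows of the second one and assign the values.

Something like the following:

    for index, row in dfA.iterrows():
        # parenthesize each comparison: & binds more tightly than ==
        mask = (dfB.Col1 == row['col1']) & (dfB.Col2 == row['col2'])
        dfB.loc[mask, row['col3'].replace('V', 'Value')] = row['col3']  # 'V1' goes to 'Value1'

I am writing this answer from memory, so there may be some syntax errors.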
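
For completeness, a runnable sketch of this row-by-row idea on the sample data from the question might look as follows. Two details are assumptions rather than part of the answer: the target column is derived from the col3 value ("V1" goes to Value1, "V2" to Value2), and the empty columns are created with object dtype so that strings can be written into them without a dtype conflict.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
dfA = pd.DataFrame({
    "col1": ["A", "A", "A", "A"],
    "col2": ["B", "B", "C", "E"],
    "col3": ["V1", "V2", "V1", "V2"],
})
dfB = pd.DataFrame({
    "Col1": ["A", "A", "A", "A", "A"],
    "Col2": ["B", "D", "C", "G", "E"],
    # object dtype so that string values can be assigned later
    "Value1": pd.Series([np.nan] * 5, dtype=object),
    "Value2": pd.Series([np.nan] * 5, dtype=object),
})

for _, row in dfA.iterrows():
    # Boolean mask of the rows in dfB whose key pair matches this row of dfA.
    mask = (dfB["Col1"] == row["col1"]) & (dfB["Col2"] == row["col2"])
    # "V1" is written to column "Value1", "V2" to column "Value2".
    target = row["col3"].replace("V", "Value")
    dfB.loc[mask, target] = row["col3"]

print(dfB)
```

This loops once over A rather than over every (A, B) row pair, but it is still slower than the vectorized approaches on large frames.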

huangapple
  • Posted on January 6, 2020 at 22:10:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59613589.html