Iteratively concatenate rows from one df into cells of another df, with conditions

Question

Hi all. I have two data frames that look like this:

    import pandas as pd

    # Parent table: one row per person.
    data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
            'Age': [20, 21, 19, 18],
            'nickname': ['T', 'N', 'K', 'J']}
    df1 = pd.DataFrame(data)

    # Child table: one row per child, keyed by the parent's Name.
    data2 = {'Kids': ['child1', 'child2', 'child3', 'child4', 'child1', 'child2', 'child3', 'child4'],
             'Name': ['Tom', 'Tom', 'Tom', 'Tom', 'nick', 'nick', 'nick', 'nick'],
             'nickname': ['T', 'T', 'T', 'T', 'N', 'N', 'N', 'N']}
    df2 = pd.DataFrame(data2)

I would like to append each name's children from df2 to the first cell of that name's row in df1. Something like this:

    Name    Age  nickname
    Tom     20   T
    child1
    child2
    child3
    child4
    nick    21   N
    child1
    child2
    child3
    child4
    krish   19   K
    jack    18   J
    child1
    child2
    child3
    child4

I have been able to do this by making each child a row, but that won't work for what I need to do downstream. I don't want each child to be its own row. I just want the text concatenated inside the cell of the corresponding parent, if that makes sense. As you will notice, I also need to skip some names. In my real dataset I will need to do this by row index. Thank you in advance for your help.

Harry

Answer 1

Score: 1

You can use GroupBy.apply together with pd.concat:

    out = (
        df1.groupby("Name", group_keys=False, sort=False)
           .apply(lambda g: pd.concat(
               [g, df2.loc[df2["Name"].eq(g.name), "Kids"].to_frame("Name")]
           ))  # optionally pass ignore_index=True to pd.concat
        # .fillna("")  # uncomment to replace NaN with empty strings
    )

Output:

    print(out)
         Name    Age nickname
    0     Tom  20.00        T
    0  child1    NaN      NaN
    1  child2    NaN      NaN
    2  child3    NaN      NaN
    3  child4    NaN      NaN
    1    nick  21.00        N
    4  child1    NaN      NaN
    5  child2    NaN      NaN
    6  child3    NaN      NaN
    7  child4    NaN      NaN
    2   krish  19.00        K
    3    jack  18.00        J
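
The question also mentions selecting parents by row index and skipping some names. As a rough sketch of how the same interleaving could be restricted that way, the loop below adds children only for chosen index labels. The list `rows_to_expand` and the name `out_by_index` are hypothetical, introduced only for illustration, and the sample frames are rebuilt so the block runs on its own.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'jack'],
                    'Age': [20, 21, 19, 18],
                    'nickname': ['T', 'N', 'K', 'J']})
df2 = pd.DataFrame({'Kids': ['child1', 'child2', 'child3', 'child4'] * 2,
                    'Name': ['Tom'] * 4 + ['nick'] * 4,
                    'nickname': ['T'] * 4 + ['N'] * 4})

# Hypothetical selection: df1 index labels whose children should be added.
rows_to_expand = [0, 1]

pieces = []
for idx in df1.index:
    # Keep the parent row as a one-row frame.
    pieces.append(df1.loc[[idx]])
    if idx in rows_to_expand:
        # Children of this parent, placed in the Name column.
        kids = df2.loc[df2['Name'].eq(df1.at[idx, 'Name']), 'Kids']
        pieces.append(kids.to_frame('Name'))

# A single concat at the end; Age and nickname are NaN on child rows.
out_by_index = pd.concat(pieces, ignore_index=True)
print(out_by_index)
```

Collecting the pieces in a list and concatenating once avoids growing a frame inside the loop, which tends to be slow on larger data.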

Answer 2

Score: 1

Using concat and sort_values:

    out = (
        pd.concat([df1, df2[['Name', 'Kids']]])
          .sort_values(by='Name', kind='stable', ignore_index=True)
          .assign(Name=lambda d: d.pop('Kids').fillna(d['Name']))
    )

Output:

          Name   Age nickname
    0      Tom  20.0        T
    1   child1   NaN      NaN
    2   child2   NaN      NaN
    3   child3   NaN      NaN
    4   child4   NaN      NaN
    5     jack  18.0        J
    6    krish  19.0        K
    7     nick  21.0        N
    8   child1   NaN      NaN
    9   child2   NaN      NaN
    10  child3   NaN      NaN
    11  child4   NaN      NaN

If the original order of df1 matters and you want empty strings instead of NaN:

    # Map each name to its position in df1 so sorting keeps df1's order.
    mapper = pd.Series({k: v for v, k in enumerate(df1['Name'].unique())})
    out = (
        pd.concat([df1, df2[['Name', 'Kids']]])
          .sort_values(by='Name', kind='stable',
                       key=mapper.get, ignore_index=True)
          .assign(Name=lambda d: d.pop('Kids').fillna(d['Name']))
          .fillna('')  # blank out Age and nickname on the child rows
    )

Output:

          Name   Age nickname
    0      Tom  20.0        T
    1   child1
    2   child2
    3   child3
    4   child4
    5     nick  21.0        N
    6   child1
    7   child2
    8   child3
    9   child4
    10   krish  19.0        K
    11    jack  18.0        J
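
As an alternative way to keep df1's order, the sort can be driven by an ordered categorical dtype instead of a lookup Series. This is only a sketch of a swapped-in technique, not the answer's own method; the names `name_order` and `out_cat` are made up for illustration, and the sample frames are rebuilt so the block is self-contained.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'jack'],
                    'Age': [20, 21, 19, 18],
                    'nickname': ['T', 'N', 'K', 'J']})
df2 = pd.DataFrame({'Kids': ['child1', 'child2', 'child3', 'child4'] * 2,
                    'Name': ['Tom'] * 4 + ['nick'] * 4,
                    'nickname': ['T'] * 4 + ['N'] * 4})

# Ordered categories follow the order in which names first appear in df1.
name_order = pd.CategoricalDtype(df1['Name'].unique(), ordered=True)

out_cat = (
    pd.concat([df1, df2[['Name', 'Kids']]])
      .astype({'Name': name_order})
      # Stable sort: each parent (from df1) stays ahead of its children.
      .sort_values(by='Name', kind='stable', ignore_index=True)
      # Back to plain objects so child labels can be written into Name.
      .astype({'Name': object})
      .assign(Name=lambda d: d.pop('Kids').fillna(d['Name']))
)
print(out_cat)
```

A categorical sort key makes the intended order explicit in the dtype, at the cost of two extra casts.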

Answer 3

Score: 0

Try:

```py
for i, row in df1.iterrows():
    # Replace the name with a list: the parent followed by its children.
    df1.at[i, 'Name'] = [row['Name'], *df2.loc[df2.Name == row['Name'], 'Kids']]

# One row per list element; object dtype lets Age hold '' next to numbers.
df1 = df1.explode('Name').astype({'Age': object})
# Child rows repeat their parent's Age/nickname, so blank the repeats.
m = df1.duplicated(subset=['Age', 'nickname'])
df1.loc[m, ['Age', 'nickname']] = ''
print(df1)
```

Prints:

         Name Age nickname
    0     Tom  20        T
    0  child1
    0  child2
    0  child3
    0  child4
    1    nick  21        N
    1  child1
    1  child2
    1  child3
    1  child4
    2   krish  19        K
    3    jack  18        J
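
The answers above all produce one row per child, whereas the question asks for the children's text to be concatenated inside the parent's own cell. A minimal sketch of that variant follows, assuming a newline is an acceptable separator inside the cell. The names `kids_text`, `extra` and `out_cell` are hypothetical, and the sample frames are rebuilt so the block runs on its own.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'jack'],
                    'Age': [20, 21, 19, 18],
                    'nickname': ['T', 'N', 'K', 'J']})
df2 = pd.DataFrame({'Kids': ['child1', 'child2', 'child3', 'child4'] * 2,
                    'Name': ['Tom'] * 4 + ['nick'] * 4,
                    'nickname': ['T'] * 4 + ['N'] * 4})

# One newline-joined string of children per parent name.
kids_text = df2.groupby('Name')['Kids'].agg('\n'.join)

# Look up each parent's children; parents without children get ''.
extra = df1['Name'].map(kids_text).fillna('')

out_cell = df1.copy()
# Append the children below the parent's name inside the same cell,
# then drop the trailing newline left on parents with no children.
out_cell['Name'] = (df1['Name'] + '\n' + extra).str.rstrip('\n')
print(out_cell.to_string())
```

The frame keeps its four rows and original index, so later steps that rely on row index are unaffected; only the Name cells of Tom and nick grow.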

huangapple
  • Posted on June 2, 2023, 04:27:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76385492.html