Iteratively concatenate rows from one df into cells of another df with conditions
Question
Hi all. I have two data frames. They look like this:
import pandas as pd

# initialize data of lists.
data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Age': [20, 21, 19, 18],
        'nickname': ['T', 'N', 'K', 'J']}
df1 = pd.DataFrame(data)

data2 = {'Kids': ['child1', 'child2', 'child3', 'child4', 'child1', 'child2', 'child3', 'child4'],
         'Name': ['Tom', 'Tom', 'Tom', 'Tom', 'nick', 'nick', 'nick', 'nick'],
         'nickname': ['T', 'T', 'T', 'T', 'N', 'N', 'N', 'N']}
df2 = pd.DataFrame(data2)
I would like to append each name's children from df2 to the first cell of that name's row in df1. Something like this:
Name    Age  nickname
Tom     20   T
child1
child2
child3
child4
nick    21   N
child1
child2
child3
child4
krish   19   K
jack    18   J
child1
child2
child3
child4
I have been able to do this by making each child a row, but that won't work for what I need to do downstream. I don't want each child to be its own row. I just want the text concatenated inside the cell of the corresponding parent, if that makes sense. As you will notice, I need to skip some names. In my real dataset I will need to do this by row index. Thank you in advance for your help.
Harry
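The requirement described above (children concatenated inside the parent's cell, for selected row indices only) could be sketched roughly as follows. This is a minimal sketch, not a confirmed solution from the thread: the `rows` list is a hypothetical selection of df1 index labels, and joining with a newline is an assumption about what "concatenated inside the cell" should look like.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Tom", "nick", "krish", "jack"],
    "Age": [20, 21, 19, 18],
    "nickname": ["T", "N", "K", "J"],
})
df2 = pd.DataFrame({
    "Kids": ["child1", "child2", "child3", "child4"] * 2,
    "Name": ["Tom"] * 4 + ["nick"] * 4,
})

# One newline-joined string of children per parent name.
kids = df2.groupby("Name", sort=False)["Kids"].agg("\n".join)

# Hypothetical selection: the df1 row index labels that should receive children.
rows = [0, 1, 3]

out = df1.copy()
names = out.loc[rows, "Name"]
extra = names.map(kids)  # NaN for names with no children in df2 (here: jack)

# Append the children only where some were found; other cells stay unchanged.
out.loc[rows, "Name"] = names.where(extra.isna(), names + "\n" + extra)

print(out.loc[0, "Name"])
```

With this approach `out` keeps one row per parent, so the Age and nickname columns keep their original values and dtypes.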
Answer 1
Score: 1
You can use GroupBy.apply/concat:
out = (
    df1.groupby("Name", group_keys=False, sort=False)
    .apply(lambda g: pd.concat(
        [g, df2.loc[df2["Name"].eq(g.name),
                    "Kids"].to_frame("Name")]
        )  # with optional ignore_index=True
    )
    # .fillna("")  # if needed, uncomment this chain
)
Output:
print(out)
     Name   Age nickname
0     Tom 20.00        T
0  child1   NaN      NaN
1  child2   NaN      NaN
2  child3   NaN      NaN
3  child4   NaN      NaN
1    nick 21.00        N
4  child1   NaN      NaN
5  child2   NaN      NaN
6  child3   NaN      NaN
7  child4   NaN      NaN
2   krish 19.00        K
3    jack 18.00        J
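Note that the output above carries duplicate index labels (0 appears twice, and so on). As a hedged alternative, the same interleaving can be sketched with an explicit loop and a single `pd.concat`, which avoids depending on how `GroupBy.apply` treats the grouping column (recent pandas versions warn about this). The variable names `pieces` and `children` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Tom", "nick", "krish", "jack"],
    "Age": [20, 21, 19, 18],
    "nickname": ["T", "N", "K", "J"],
})
df2 = pd.DataFrame({
    "Kids": ["child1", "child2", "child3", "child4"] * 2,
    "Name": ["Tom"] * 4 + ["nick"] * 4,
})

pieces = []
for idx, name in df1["Name"].items():
    # The parent row, kept as a one-row DataFrame.
    pieces.append(df1.loc[[idx]])
    # That parent's children, relabelled so they land in the Name column.
    children = df2.loc[df2["Name"].eq(name), ["Kids"]].rename(columns={"Kids": "Name"})
    if not children.empty:
        pieces.append(children)

# ignore_index=True gives a clean 0..n-1 index instead of repeated labels.
out = pd.concat(pieces, ignore_index=True)
print(out)
```

Age and nickname are NaN on the child rows, as in the answer; a trailing `.fillna("")` could blank them if needed.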
Answer 2
Score: 1
Using concat and sort_values:
out = (
    pd.concat([df1, df2[['Name', 'Kids']]])
    .sort_values(by='Name', kind='stable', ignore_index=True)
    .assign(Name=lambda d: d.pop('Kids').fillna(d['Name']))
)
Output:
      Name   Age nickname
0      Tom  20.0        T
1   child1   NaN      NaN
2   child2   NaN      NaN
3   child3   NaN      NaN
4   child4   NaN      NaN
5     jack  18.0        J
6    krish  19.0        K
7     nick  21.0        N
8   child1   NaN      NaN
9   child2   NaN      NaN
10  child3   NaN      NaN
11  child4   NaN      NaN
If order and empty strings are important:
mapper = pd.Series({k: v for v, k in enumerate(df1['Name'].unique())})

out = (
    pd.concat([df1, df2[['Name', 'Kids']]])
    .sort_values(by='Name', kind='stable',
                 key=mapper.get, ignore_index=True)
    .assign(Name=lambda d: d.pop('Kids').fillna(d['Name']))
    .fillna('')
)
Output:
      Name   Age nickname
0      Tom  20.0        T
1   child1
2   child2
3   child3
4   child4
5     nick  21.0        N
6   child1
7   child2
8   child3
9   child4
10   krish  19.0        K
11    jack  18.0        J
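The order-preserving sort above relies on a name-to-position mapper passed as `key`. As a sketch of another way to keep df1's order (an assumption on my part, not something the answer proposes), the Name column can be cast to an ordered categorical before the stable sort; `order` and `combined` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Tom", "nick", "krish", "jack"],
    "Age": [20, 21, 19, 18],
    "nickname": ["T", "N", "K", "J"],
})
df2 = pd.DataFrame({
    "Kids": ["child1", "child2", "child3", "child4"] * 2,
    "Name": ["Tom"] * 4 + ["nick"] * 4,
})

# Categories follow the order in which names appear in df1.
order = pd.CategoricalDtype(df1["Name"].unique(), ordered=True)

combined = pd.concat([df1, df2[["Name", "Kids"]]])
combined["Name"] = combined["Name"].astype(order)

# A stable sort keeps each parent row ahead of its children, in df2 order.
combined = combined.sort_values("Name", kind="stable", ignore_index=True)

# Child rows take their Kids value; parent rows keep their own name.
combined["Name"] = combined.pop("Kids").fillna(combined["Name"].astype(object))
print(combined)
```

The categorical route avoids a custom `key` function at the cost of one extra cast back to plain strings.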
Answer 3
Score: 0
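Try:
```py
# Replace each parent's Name with a list: [parent, child1, child2, ...]
for i, row in df1.iterrows():
    df1.at[i, 'Name'] = [row['Name'], *df2.loc[df2.Name == row['Name'], 'Kids']]

# Expand each list into one row per element (index labels repeat)
df1 = df1.explode('Name')

# Blank out Age/nickname on the repeated (child) rows
m = df1.duplicated(subset=['Age', 'nickname'])
df1.loc[m, ['Age', 'nickname']] = ''
print(df1)
```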
Prints:
     Name Age nickname
0     Tom  20        T
0  child1
0  child2
0  child3
0  child4
1    nick  21        N
1  child1
1  child2
1  child3
1  child4
2   krish  19        K
3    jack  18        J