Iteratively concatenate rows from one DataFrame into cells of another DataFrame, with conditions

Question

I have two data frames that look like this:

```py
import pandas as pd

# initialize data of lists.
data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Age': [20, 21, 19, 18],
        'nickname': ['T', 'N', 'K', 'J']}
df1 = pd.DataFrame(data)

data2 = {'Kids': ['child1', 'child2', 'child3', 'child4', 'child1', 'child2', 'child3', 'child4'],
         'Name': ['Tom', 'Tom', 'Tom', 'Tom', 'nick', 'nick', 'nick', 'nick'],
         'nickname': ['T', 'T', 'T', 'T', 'N', 'N', 'N', 'N']}
df2 = pd.DataFrame(data2)
```

I would like to append each name's children from df2 to the first cell (the Name cell) of that name's row in df1. Something like this:

```
Name       Age      nickname
Tom        20          T
child1
child2
child3
child4
nick       21          N
child1
child2
child3
child4
krish      19          K
jack       18          J
child1
child2
child3
child4
```

I have been able to do this by making each child a row, but that won't work for what I need to do downstream. I don't want each child to be its own row; I just want the text concatenated inside the cell of the corresponding parent, if that makes sense. As you will notice, I also need to skip some names (krish has no children). In my real dataset I will need to do this by row index. Thank you for your help in advance.

Harry
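Taken literally, the request is for one row per parent, with the kids' names appended as text inside the parent's Name cell. A minimal sketch of that reading, assuming kids are matched to parents by `Name` and that a newline is an acceptable separator (both are assumptions; the separator is arbitrary):

```py
import pandas as pd

# Sample frames from the question
df1 = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'jack'],
                    'Age': [20, 21, 19, 18],
                    'nickname': ['T', 'N', 'K', 'J']})
df2 = pd.DataFrame({'Kids': ['child1', 'child2', 'child3', 'child4'] * 2,
                    'Name': ['Tom'] * 4 + ['nick'] * 4,
                    'nickname': ['T'] * 4 + ['N'] * 4})

# One newline-joined string of kids per parent name (index: Tom, nick)
kids = df2.groupby('Name')['Kids'].agg('\n'.join)

out = df1.copy()
# str.cat yields NaN where a parent has no kids (krish, jack);
# fillna then restores the plain parent name for those rows
out['Name'] = (out['Name']
               .str.cat(out['Name'].map(kids), sep='\n')
               .fillna(out['Name']))
print(out)
```

This keeps `Age` and `nickname` untouched and the row count equal to `len(df1)`.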

Answer 1

Score: 1

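You can use `GroupBy.apply` with `concat`:

```py
import pandas as pd

# Note: pandas >= 2.2 warns that apply operating on the grouping column is deprecated
out = (
    df1.groupby("Name", group_keys=False, sort=False)
       .apply(lambda g: pd.concat(
           # the parent row, then its kids placed in the "Name" column
           [g, df2.loc[df2["Name"].eq(g.name), "Kids"].to_frame("Name")]
           # optionally pass ignore_index=True to pd.concat here
       ))
       # .fillna("")  # uncomment this chain if you want blanks instead of NaN
)

print(out)
```

Output:

```
     Name   Age nickname
0     Tom 20.00        T
0  child1   NaN      NaN
1  child2   NaN      NaN
2  child3   NaN      NaN
3  child4   NaN      NaN
1    nick 21.00        N
4  child1   NaN      NaN
5  child2   NaN      NaN
6  child3   NaN      NaN
7  child4   NaN      NaN
2   krish 19.00        K
3    jack 18.00        J
```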
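An alternative sketch that builds the same row layout without `groupby(...).apply` (recent pandas versions warn when `apply` operates on the grouping column). It walks `df1` by row label, which may fit the note about working by row index, and it skips empty kid selections instead of passing them to `pd.concat`. The variable names `pieces` and `kids` are illustrative only:

```py
import pandas as pd

df1 = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'jack'],
                    'Age': [20, 21, 19, 18],
                    'nickname': ['T', 'N', 'K', 'J']})
df2 = pd.DataFrame({'Kids': ['child1', 'child2', 'child3', 'child4'] * 2,
                    'Name': ['Tom'] * 4 + ['nick'] * 4,
                    'nickname': ['T'] * 4 + ['N'] * 4})

pieces = []
for i in df1.index:
    pieces.append(df1.loc[[i]])  # the parent row as a one-row DataFrame
    kids = df2.loc[df2['Name'].eq(df1.at[i, 'Name']), 'Kids']
    if not kids.empty:
        pieces.append(kids.to_frame('Name'))  # kids go in the Name column

# Concatenate once at the end rather than growing a frame inside the loop
out = pd.concat(pieces, ignore_index=True).fillna('')
print(out)
```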

Answer 2

Score: 1

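Using `concat` and `sort_values`:

```py
import pandas as pd

out = (
    pd.concat([df1, df2[['Name', 'Kids']]])  # parent rows first, then kid rows
      # a stable sort keeps each parent ahead of its kids;
      # uppercase sorts first, so 'Tom' lands before 'jack'
      .sort_values(by='Name', kind='stable', ignore_index=True)
      # kid rows take their label from 'Kids'; parent rows keep their Name
      .assign(Name=lambda d: d.pop('Kids').fillna(d['Name']))
)
```

Output:

```
      Name   Age nickname
0      Tom  20.0        T
1   child1   NaN      NaN
2   child2   NaN      NaN
3   child3   NaN      NaN
4   child4   NaN      NaN
5     jack  18.0        J
6    krish  19.0        K
7     nick  21.0        N
8   child1   NaN      NaN
9   child2   NaN      NaN
10  child3   NaN      NaN
11  child4   NaN      NaN
```

If the original order of df1 matters and you want empty strings instead of NaN:

```py
# Rank each parent name by first appearance in df1: Tom=0, nick=1, krish=2, jack=3
mapper = pd.Series({k: v for v, k in enumerate(df1['Name'].unique())})

out = (
    pd.concat([df1, df2[['Name', 'Kids']]])
      .sort_values(by='Name', kind='stable',
                   key=mapper.get, ignore_index=True)  # sort by rank, not alphabetically
      .assign(Name=lambda d: d.pop('Kids').fillna(d['Name']))
      .fillna('')  # empty strings instead of NaN
)
```

Output:

```
      Name   Age nickname
0      Tom  20.0        T
1   child1
2   child2
3   child3
4   child4
5     nick  21.0        N
6   child1
7   child2
8   child3
9   child4
10   krish  19.0        K
11    jack  18.0        J
```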
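The `key` step is what restores df1's order: a plain sort is lexicographic, and because uppercase letters sort before lowercase ones, `Tom` lands ahead of `jack`. A minimal illustration of ranking by first appearance, using `Series.map` with a plain dict; for names that appear in df1 this gives the same ranks as `mapper.get`, and a name missing from the dict maps to NaN, which `sort_values` places last by default:

```py
import pandas as pd

# First four entries play the role of df1['Name']; the rest are kid rows' parent names
names = pd.Series(['Tom', 'nick', 'krish', 'jack', 'Tom', 'nick'])

# Rank each distinct name by where it first appears: Tom=0, nick=1, krish=2, jack=3
rank = {name: pos for pos, name in enumerate(names.unique())}

# Plain lexicographic sort, for comparison
alpha = names.sort_values(kind='stable')

# Sort by rank instead; 'stable' keeps tied rows in their original order
ordered = names.sort_values(key=lambda s: s.map(rank), kind='stable')
print(alpha.tolist())
print(ordered.tolist())
```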

Answer 3

Score: 0

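Try:

```py
import pandas as pd

# Replace each Name cell with a list: [parent, kid1, kid2, ...]
for i, row in df1.iterrows():
    df1.at[i, 'Name'] = [row['Name'], *df2.loc[df2.Name == row['Name'], 'Kids']]

# One row per list element; kid rows repeat their parent's index label
df1 = df1.explode('Name')
# Flag kid rows by index (value-based duplicated() could hit a second parent)
m = df1.index.duplicated()
df1 = df1.astype({'Age': object})  # so Age can hold empty strings
df1.loc[m, ['Age', 'nickname']] = ''

print(df1)
```

Prints:

```
     Name Age nickname
0     Tom  20        T
0  child1
0  child2
0  child3
0  child4
1    nick  21        N
1  child1
1  child2
1  child3
1  child4
2   krish  19        K
3    jack  18        J
```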
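The row-by-row `iterrows` loop can also be replaced by collecting each parent's kids once with `groupby(...).agg(list)` and building the `[parent, *kids]` lists with `Series.map`. A sketch under the same assumption that kids are matched to parents by `Name`; after `explode`, kid rows are identified by their repeated index label:

```py
import pandas as pd

df1 = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'jack'],
                    'Age': [20, 21, 19, 18],
                    'nickname': ['T', 'N', 'K', 'J']})
df2 = pd.DataFrame({'Kids': ['child1', 'child2', 'child3', 'child4'] * 2,
                    'Name': ['Tom'] * 4 + ['nick'] * 4,
                    'nickname': ['T'] * 4 + ['N'] * 4})

# List of kids per parent name, e.g. kids['Tom'] == ['child1', ..., 'child4']
kids = df2.groupby('Name')['Kids'].agg(list)

out = df1.copy()
# Each Name cell becomes [parent, *kids]; parents without kids get [parent]
out['Name'] = out['Name'].map(lambda n: [n, *kids.get(n, [])])

# One row per list element; every element keeps its parent's index label
out = out.explode('Name')

# Repeated index labels are the kid rows: blank their Age and nickname.
# Age is cast to object first so it can hold empty strings.
is_kid = out.index.duplicated()
out = out.astype({'Age': object})
out.loc[is_kid, ['Age', 'nickname']] = ''
print(out)
```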

huangapple
  • Posted on 2023-06-02 04:27:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76385492.html