Creating column based off values in separate row

Question

Here is an example of my DataFrame:

    import pandas as pd

    df = pd.DataFrame([['Arsenal FC', '', ''],
                       ['In', 'Age', 'Nat.'],
                       ['Leandro Trossard', 28, 'Belgium'],
                       ['Jakub Kiwior', 22, 'Poland'],
                       ['Jorginho', 32, 'Italy'],
                       ['Chelsea FC', '', ''],
                       ['In', 'Age', 'Nat.'],
                       ['Enzo Fernández ', 22, 'Argentina'],
                       ['Mykhaylo Mudryk', 22, 'Ukraine'],
                      ], columns=['Player', 'Age', 'Nat.'])

I want to create a new column "Club" that takes the club name from the "Player" column (the rows such as "Arsenal FC") and attaches it to each player listed below that row.

The tricky part is getting the right club assigned to the right players.

This is my desired output:

    df = pd.DataFrame([['In', 'Age', 'Nat.'],
                       ['Leandro Trossard', 28, 'Belgium', 'Arsenal FC'],
                       ['Jakub Kiwior', 22, 'Poland', 'Arsenal FC'],
                       ['Jorginho', 32, 'Italy', 'Arsenal FC'],
                       ['In', 'Age', 'Nat.'],
                       ['Enzo Fernández ', 22, 'Argentina', 'Chelsea FC'],
                       ['Mykhaylo Mudryk', 22, 'Ukraine', 'Chelsea FC'],
                      ], columns=['Player', 'Age', 'Nat.', 'Club'])

I can't find another question that covers this problem. Is this possible in Python?

Answer 1

Score: 2

One option is to use boolean masks together with mask and ffill:

    # which rows have a non-empty Age (i.e. are not club-name rows)?
    m1 = df['Age'].ne('')
    # which rows are not internal headers (the repeated In/Age/Nat. rows)?
    m2 = df['Player'].ne('In')
    # keep only player rows; the club names are forward-filled onto them
    out = df[m1 & m2].assign(Club=df.loc[m2, 'Player'].mask(m1).ffill())

Output:

                 Player  Age       Nat.        Club
    2  Leandro Trossard   28    Belgium  Arsenal FC
    3      Jakub Kiwior   22     Poland  Arsenal FC
    4          Jorginho   32      Italy  Arsenal FC
    7    Enzo Fernández   22  Argentina  Chelsea FC
    8   Mykhaylo Mudryk   22    Ukraine  Chelsea FC

Intermediates:

                 Player     m1        mask       ffill
    0        Arsenal FC  False  Arsenal FC  Arsenal FC
    2  Leandro Trossard   True         NaN  Arsenal FC
    3      Jakub Kiwior   True         NaN  Arsenal FC
    4          Jorginho   True         NaN  Arsenal FC
    5        Chelsea FC  False  Chelsea FC  Chelsea FC
    7    Enzo Fernández   True         NaN  Chelsea FC
    8   Mykhaylo Mudryk   True         NaN  Chelsea FC
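
The intermediate columns above can be rebuilt one step at a time. This is a minimal sketch, assuming the sample df from the question (without the trailing space in one name) and that internal header rows have 'In' in the Player column. The names players, masked, filled and intermediates are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame([['Arsenal FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Leandro Trossard', 28, 'Belgium'],
                   ['Jakub Kiwior', 22, 'Poland'],
                   ['Jorginho', 32, 'Italy'],
                   ['Chelsea FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Enzo Fernández', 22, 'Argentina'],
                   ['Mykhaylo Mudryk', 22, 'Ukraine'],
                  ], columns=['Player', 'Age', 'Nat.'])

m1 = df['Age'].ne('')        # True where Age is not an empty string
m2 = df['Player'].ne('In')   # True where the row is not an internal header

players = df.loc[m2, 'Player']  # Player column without the internal header rows
masked = players.mask(m1)       # player names become NaN, club names survive
filled = masked.ffill()         # each club name is carried down to its players

# Side-by-side view of every step (illustrative helper frame)
intermediates = pd.DataFrame({'Player': players, 'm1': m1[m2],
                              'mask': masked, 'ffill': filled})
print(intermediates)
```

Because assign aligns on the index, only the rows that survive the df[m1 & m2] filter pick up a Club value from the filled Series.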

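To keep the In/Age/Nat. rows instead:

    # which rows have a non-empty Age (i.e. are not club-name rows)?
    m1 = df['Age'].ne('')
    # which rows are not internal headers (the repeated In/Age/Nat. rows)?
    m2 = df['Player'].ne('In')
    # filter on m1 only, so the internal header rows stay (with Club = NaN)
    out = df[m1].assign(Club=df.loc[m2, 'Player'].mask(m1).ffill())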

Output:

                 Player  Age       Nat.        Club
    1                In  Age       Nat.         NaN
    2  Leandro Trossard   28    Belgium  Arsenal FC
    3      Jakub Kiwior   22     Poland  Arsenal FC
    4          Jorginho   32      Italy  Arsenal FC
    6                In  Age       Nat.         NaN
    7    Enzo Fernández   22  Argentina  Chelsea FC
    8   Mykhaylo Mudryk   22    Ukraine  Chelsea FC
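
The same result can probably also be reached by labelling each club block with a running counter and taking the first Player value of every block. This is a sketch of that alternative, not part of the original answer. It assumes that club-name rows are exactly the rows with an empty Age and that the data starts with a club-name row. The names is_club, group_id and club are illustrative.

```python
import pandas as pd

df = pd.DataFrame([['Arsenal FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Leandro Trossard', 28, 'Belgium'],
                   ['Jakub Kiwior', 22, 'Poland'],
                   ['Jorginho', 32, 'Italy'],
                   ['Chelsea FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Enzo Fernández', 22, 'Argentina'],
                   ['Mykhaylo Mudryk', 22, 'Ukraine'],
                  ], columns=['Player', 'Age', 'Nat.'])

is_club = df['Age'].eq('')   # assumption: an empty Age marks a club-name row
group_id = is_club.cumsum()  # counter that increases at every club-name row

# The first Player value in each block is the club name itself
club = df.groupby(group_id)['Player'].transform('first')

# Attach the club, then drop club-name rows and internal header rows
out = df.assign(Club=club)[~is_club & df['Player'].ne('In')]
print(out)
```

If the frame could start with player rows before any club-name row, those rows would be grouped under counter value 0 and would need separate handling.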

Answer 2

Score: 1

Edited:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame([['Arsenal FC', '', ''],
                       ['In', 'Age', 'Nat.'],
                       ['Leandro Trossard', 28, 'Belgium'],
                       ['Jakub Kiwior', 22, 'Poland'],
                       ['Jorginho', 32, 'Italy'],
                       ['Chelsea FC', '', ''],
                       ['In', 'Age', 'Nat.'],
                       ['Enzo Fernández ', 22, 'Argentina'],
                       ['Mykhaylo Mudryk', 22, 'Ukraine'],
                      ], columns=['Player', 'Age', 'Nat.'])

    clubs = []
    current_club = None
    for _, row in df.iterrows():
        if row['Player'] in ['Arsenal FC', 'Chelsea FC']:
            # club-name row: remember it and label the row with its own name
            current_club = row['Player']
            clubs.append(current_club)
        elif row['Player'] == 'In':
            # internal header row: no club
            clubs.append(np.nan)
        else:
            # player row: belongs to the most recent club
            clubs.append(current_club)

    # one entry was appended per row, so the lengths match
    df['Club'] = clubs
    print(df)

Output:

                 Player  Age       Nat.        Club
    0        Arsenal FC                  Arsenal FC
    1                In  Age       Nat.         NaN
    2  Leandro Trossard   28    Belgium  Arsenal FC
    3      Jakub Kiwior   22     Poland  Arsenal FC
    4          Jorginho   32      Italy  Arsenal FC
    5        Chelsea FC                  Chelsea FC
    6                In  Age       Nat.         NaN
    7    Enzo Fernández   22  Argentina  Chelsea FC
    8   Mykhaylo Mudryk   22    Ukraine  Chelsea FC

Edit 2: Multiple club names

    # the literal ... stands for the remaining club names
    clubs = ['Arsenal FC', 'Chelsea FC', 'Other Club 1', 'Other Club 2', ..., 'Other Club n']

    df['Club'] = ''
    club = ''
    for index, row in df.iterrows():
        if row['Player'] in clubs:
            # club-name row: remember the club for the rows that follow
            club = row['Player']
        else:
            # any other row gets the most recent club
            df.at[index, 'Club'] = club

    # club-name rows still have an empty Club, so this drops them
    df = df[df['Club'] != ''].reset_index(drop=True)
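
When there are many clubs, the hard-coded clubs list could be derived from the data instead of typed out. This is a sketch under one assumption that is not stated in the answer: a row is a club-name row exactly when its Age cell is an empty string. The rest of the loop follows the approach above.

```python
import pandas as pd

df = pd.DataFrame([['Arsenal FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Leandro Trossard', 28, 'Belgium'],
                   ['Jakub Kiwior', 22, 'Poland'],
                   ['Jorginho', 32, 'Italy'],
                   ['Chelsea FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Enzo Fernández', 22, 'Argentina'],
                   ['Mykhaylo Mudryk', 22, 'Ukraine'],
                  ], columns=['Player', 'Age', 'Nat.'])

# Assumption: club-name rows are the rows whose Age is an empty string
clubs = df.loc[df['Age'].eq(''), 'Player'].unique().tolist()

df['Club'] = ''
club = ''
for index, row in df.iterrows():
    if row['Player'] in clubs:
        club = row['Player']          # remember the most recent club
    else:
        df.at[index, 'Club'] = club   # label every other row with it

# Club-name rows still have an empty Club, so this removes them
df = df[df['Club'] != ''].reset_index(drop=True)
print(clubs)
print(df)
```

The internal In/Age/Nat. rows are kept and labelled with the club of their block. Filtering on df['Player'] != 'In' as well would drop them.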

huangapple
  • Published on 2023-02-09 00:53:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75389120.html