Pandas: Updating the Position of Values in a Row

Question

I have the following DataFrame, which is split into two teams, P and E. Each player and their position are related data and should be kept together, e.g. P1 and P1_Position.

I'm interested in:

  • Viewing specific players from each team: P: [('T', '3'), ('K', '2')]
    and E: [('N', '1')]

    I would like to order each row with respect to the players I'm
    interested in viewing, so that ('T', '3'), ('K', '2') and ('N', '1')
    are at the beginning of their teams.

P1  P1_Position  P2  P2_Position  P3  P3_Position  E1  E1_Position  E2  E2_Position  E3  E3_Position
H   1            K   2            T   3            H   3            N   1            S   5
K   2            T   3            Y   4            A   2            H   3            N   1
K   2            T   3            Y   4            AK  4            H   3            N   1
K   2            T   3            Y   4            AK  4            A   2            N   1

The resulting DataFrame would look something like:

P1  P1_Position  P2  P2_Position  P3  P3_Position  E1  E1_Position  E2  E2_Position  E3  E3_Position
T   3            K   2            H   1            N   1            H   3            S   5
T   3            K   2            Y   4            N   1            H   3            A   2
T   3            K   2            Y   4            N   1            H   3            AK  4
T   3            K   2            Y   4            N   1            A   2            AK  4

The purpose of the ordering is to have consistent formatting when performing a groupby operation, e.g. all players of interest are at the far left of each entry.

My current thinking is that I will need to use rename(columns={}) on each row via apply, but my first reaction is that it's going to be slow.

def reorderrows(row, players, enemies):
    # Placeholder: reorder this row's (player, position) pairs here.
    return row

desiredrows = desiredrows.apply(reorderrows, axis=1, players=players, enemies=enemies)

Is there a good way of enforcing that ordering, given that each list, P: [('T', '3'), ('K', '2')] and E: [('N', '1')], could include up to 3 players?

Example DataFrame:

import pandas as pd

df = pd.DataFrame({'P1': ['H', 'K', 'K', 'K'],
                   'P1_Position': [1, 2, 2, 2],
                   'P2': ['K', 'T', 'T', 'T'],
                   'P2_Position': [2, 3, 3, 3],
                   'P3': ['T', 'Y', 'Y', 'Y'],
                   'P3_Position': [3, 4, 4, 4],
                   'E1': ['H', 'A', 'AK', 'AK'],
                   'E1_Position': [3, 2, 4, 4],
                   'E2': ['N', 'H', 'H', 'A'],
                   'E2_Position': [1, 3, 3, 2],
                   'E3': ['S', 'N', 'N', 'N'],
                   'E3_Position': [5, 1, 1, 1]})
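
For reference, the row-wise apply idea described in the question could be filled in roughly as below. This is only a sketch under assumptions: the helper name reorder_row and its n_slots parameter are made up for illustration, every team is assumed to have columns named like P1 / P1_Position, and positions are compared as strings because the lists of interest hold '3' while the DataFrame holds 3. Players that are not in the lists keep their original left-to-right order, which may differ slightly from the swap-like order in the expected table.

```python
import pandas as pd

df = pd.DataFrame({'P1': ['H', 'K', 'K', 'K'],
                   'P1_Position': [1, 2, 2, 2],
                   'P2': ['K', 'T', 'T', 'T'],
                   'P2_Position': [2, 3, 3, 3],
                   'P3': ['T', 'Y', 'Y', 'Y'],
                   'P3_Position': [3, 4, 4, 4],
                   'E1': ['H', 'A', 'AK', 'AK'],
                   'E1_Position': [3, 2, 4, 4],
                   'E2': ['N', 'H', 'H', 'A'],
                   'E2_Position': [1, 3, 3, 2],
                   'E3': ['S', 'N', 'N', 'N'],
                   'E3_Position': [5, 1, 1, 1]})

players = [('T', '3'), ('K', '2')]
enemies = [('N', '1')]


def reorder_row(row, players, enemies, n_slots=3):
    """Hypothetical helper: move the players of interest to the first slots."""
    out = {}
    for team, targets in (('P', players), ('E', enemies)):
        # (name, position) pairs of this team, in their current slot order
        pairs = [(row[f'{team}{i}'], row[f'{team}{i}_Position'])
                 for i in range(1, n_slots + 1)]
        # Compare positions as strings so ('T', '3') matches ('T', 3)
        wanted = [(name, str(pos)) for name, pos in targets]
        key = lambda pair: (pair[0], str(pair[1]))
        first = [p for w in wanted for p in pairs if key(p) == w]
        rest = [p for p in pairs if key(p) not in wanted]
        # Write the pairs back into slots 1..n_slots in the new order
        for i, (name, pos) in enumerate(first + rest, start=1):
            out[f'{team}{i}'] = name
            out[f'{team}{i}_Position'] = pos
    return pd.Series(out)


result = df.apply(reorder_row, axis=1, players=players, enemies=enemies)
result = result[df.columns]  # keep the original column order
print(result)
```

Because the function runs once per row in Python, it is easy to read but is likely to be slow on large frames, which matches the concern raised above.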

Answer 1

Score: 1

First, create a dictionary specifying the P/E players of interest and convert it to a DataFrame with an index column used for sorting:

d = {'P': [('T', '3'), ('K', '2')], 'E': [('N', '1')]}

df1 = pd.DataFrame([(k, *x) for k, v in d.items() for x in v],
                   columns=['key','_','Position']).reset_index()
print (df1)
   index key  _ Position
0      0   P  T        3
1      1   P  K        2
2      2   E  N        1

Then reshape the DataFrame: split the column names on _ into a MultiIndex, rename the NaN level values to _, and reshape to long format with DataFrame.stack. Extract the key column from the non-numeric prefix of the original column names, convert Position to strings, and left-join the helper DataFrame. Finally, sort by the helper index and reshape back with DataFrame.unstack, using running counters to rebuild the row indices and column names:

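# Replace the missing (NaN) second level of the split column names with '_'.
f = lambda x: '_' if pd.isna(x) else x
df2 = (df.pipe(lambda x: x.set_axis(x.columns.str.split('_', expand=True), axis=1))
          .rename(columns=f)
          # Long format: one row per (original row, player slot).
          .stack(0)
          .reset_index()
          # key is the team prefix of the slot name, e.g. 'P' from 'P1'.
          .assign(key=lambda x: x.level_1.str.extract(r'^(\D+)', expand=False),
                   Position = lambda x: x.Position.astype(str))
          # Attach the sort order ('index') of each player of interest; others get NaN.
          .merge(df1, how='left')
          .sort_values('index')
          # Rebuild row and slot labels from a running count per team.
          .assign(i = lambda x: x.groupby('key').cumcount(),
                   level_0 = lambda x: x['i'] % len(df),
                   level_1 = lambda x: x['key'] + (x['i'] // len(df) + 1).astype(str))
          .set_index(['level_0','level_1'])[['_','Position']].rename(columns={'_':''})
          .unstack()
          # Flatten the MultiIndex columns back to names like 'P1' and 'P1_Position'.
          .pipe(lambda x: x.set_axis([f'{b}_{a}'.strip('_') for a,b in x.columns], axis=1))
          .reindex(df.columns, axis=1)
          )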

print (df2)
        P1 P1_Position P2 P2_Position P3 P3_Position E1 E1_Position E2  \
level_0                                                                  
0        T           3  K           2  H           1  N           1  H   
1        T           3  K           2  Y           4  N           1  S   
2        T           3  K           2  Y           4  N           1  A   
3        T           3  K           2  Y           4  N           1  H   

        E2_Position  E3 E3_Position  
level_0                              
0                 3  AK           4  
1                 5   H           3  
2                 2  AK           4  
3                 3   A           2  
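
Note that in the printed output above, some of the non-selected E players end up in a different row than in the input: S 5 appears in row 1, although it occurs only in row 0 of the example data. If every row has to keep its own players, one alternative is to reorder the slots of each team with NumPy. The following is a sketch under assumptions, not part of the answer above: the helper name reorder_team and its n_slots parameter are hypothetical, the columns are assumed to follow the P1 / P1_Position naming pattern, and positions are compared as strings to match the d dictionary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'P1': ['H', 'K', 'K', 'K'],
                   'P1_Position': [1, 2, 2, 2],
                   'P2': ['K', 'T', 'T', 'T'],
                   'P2_Position': [2, 3, 3, 3],
                   'P3': ['T', 'Y', 'Y', 'Y'],
                   'P3_Position': [3, 4, 4, 4],
                   'E1': ['H', 'A', 'AK', 'AK'],
                   'E1_Position': [3, 2, 4, 4],
                   'E2': ['N', 'H', 'H', 'A'],
                   'E2_Position': [1, 3, 3, 2],
                   'E3': ['S', 'N', 'N', 'N'],
                   'E3_Position': [5, 1, 1, 1]})

# Players of interest per team, with positions given as strings.
d = {'P': [('T', '3'), ('K', '2')], 'E': [('N', '1')]}


def reorder_team(frame, team, targets, n_slots=3):
    """Hypothetical helper: return a copy with `targets` in the first slots of `team`."""
    name_cols = [f'{team}{i}' for i in range(1, n_slots + 1)]
    pos_cols = [f'{c}_Position' for c in name_cols]
    names = frame[name_cols].to_numpy()
    pos = frame[pos_cols].to_numpy()
    pos_str = pos.astype(str)

    # rank[r, s] is the slot's index in `targets`, or len(targets) if not listed.
    rank = np.full(names.shape, len(targets))
    for j, (name, position) in enumerate(targets):
        rank[(names == name) & (pos_str == str(position))] = j

    # A stable sort keeps non-listed players in their original left-to-right order.
    order = np.argsort(rank, axis=1, kind='stable')
    out = frame.copy()
    out[name_cols] = np.take_along_axis(names, order, axis=1)
    out[pos_cols] = np.take_along_axis(pos, order, axis=1)
    return out


result = df
for team, targets in d.items():
    result = reorder_team(result, team, targets)
print(result)
```

Each row is only permuted within its own team, so for the example data row 1 of team E becomes N 1, A 2, H 3. The work per team is a handful of array operations rather than a Python call per row, which should scale better than a row-wise apply.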
