Pandas: Updating the Position of Values in a Row
Question
I have the following DataFrame, which is split into two teams, `P` and `E`. Each player and their position are related data, so they should be kept together (for example `P1` and `P1_Position`).

I'm interested in viewing specific players from each team: `P: [('T', '3'), ('K', '2')]` and `E: [('N', '1')]`.

I would like to order each row with respect to the players I'm interested in viewing, so that `('T', '3'), ('K', '2')` and `('N', '1')` are at the beginning of their teams.

P1 | P1_Position | P2 | P2_Position | P3 | P3_Position | E1 | E1_Position | E2 | E2_Position | E3 | E3_Position |
---|---|---|---|---|---|---|---|---|---|---|---|
H | 1 | K | 2 | T | 3 | H | 3 | N | 1 | S | 5 |
K | 2 | T | 3 | Y | 4 | A | 2 | H | 3 | N | 1 |
K | 2 | T | 3 | Y | 4 | AK | 4 | H | 3 | N | 1 |
K | 2 | T | 3 | Y | 4 | AK | 4 | A | 2 | N | 1 |
The resulting DataFrame would look something like:
P1 | P1_Position | P2 | P2_Position | P3 | P3_Position | E1 | E1_Position | E2 | E2_Position | E3 | E3_Position |
---|---|---|---|---|---|---|---|---|---|---|---|
T | 3 | K | 2 | H | 1 | N | 1 | H | 3 | S | 5 |
T | 3 | K | 2 | Y | 4 | N | 1 | H | 3 | A | 2 |
T | 3 | K | 2 | Y | 4 | N | 1 | H | 3 | AK | 4 |
T | 3 | K | 2 | Y | 4 | N | 1 | A | 2 | AK | 4 |
The purpose of the ordering is to have consistent formatting when performing a groupby operation, e.g. all players of interest are at the far left of each entry.

My current thought is that I will need to use `rename(columns={})` on each row via `apply`, but my first reaction is that this is going to be slow.

def reorderrows(row, players, enemies):
    # placeholder: reorder the (player, position) pairs within the row
    doSomething(row)

desiredrows = desiredrows.apply(reorderrows, axis=1, players=players, enemies=enemies)

Is there a good way of enforcing that ordering, given that each list `P: [('T', '3'), ('K', '2')]` and `E: [('N', '1')]` could include up to 3 players?

Example DataFrame:
import pandas as pd

df = pd.DataFrame({'P1': ['H', 'K', 'K', 'K'],
                   'P1_Position': [1, 2, 2, 2],
                   'P2': ['K', 'T', 'T', 'T'],
                   'P2_Position': [2, 3, 3, 3],
                   'P3': ['T', 'Y', 'Y', 'Y'],
                   'P3_Position': [3, 4, 4, 4],
                   'E1': ['H', 'A', 'AK', 'AK'],
                   'E1_Position': [3, 2, 4, 4],
                   'E2': ['N', 'H', 'H', 'A'],
                   'E2_Position': [1, 3, 3, 2],
                   'E3': ['S', 'N', 'N', 'N'],
                   'E3_Position': [5, 1, 1, 1]})

Answer 1
Score: 1
First, create a dictionary specifying the `P`/`E` players of interest and convert it to a DataFrame with an `index` column to sort by:

d = {'P': [('T', '3'), ('K', '2')], 'E': [('N', '1')]}
df1 = pd.DataFrame([(k, *x) for k, v in d.items() for x in v],
                   columns=['key', '_', 'Position']).reset_index()
print(df1)
   index key  _ Position
0      0   P  T        3
1      1   P  K        2
2      2   E  N        1

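To illustrate why the helper's `index` column is useful, here is a minimal sketch of the left join and sort on a tiny long-format sample. The `long_p` frame is a hypothetical stand-in for one row's worth of team `P` data, not part of the original answer.

```python
import pandas as pd

# Helper frame as built in the answer: players of interest, ranked by `index`.
d = {'P': [('T', '3'), ('K', '2')], 'E': [('N', '1')]}
df1 = pd.DataFrame([(k, *x) for k, v in d.items() for x in v],
                   columns=['key', '_', 'Position']).reset_index()

# Hypothetical long-format sample: three team-P players with string positions.
long_p = pd.DataFrame({'key': ['P', 'P', 'P'],
                       '_': ['H', 'K', 'T'],
                       'Position': ['1', '2', '3']})

# Left join: players of interest receive their rank, all others get NaN.
ranked = long_p.merge(df1, how='left', on=['key', '_', 'Position'])

# sort_values puts NaN last by default, so unmatched players trail the ranked ones.
ordered = ranked.sort_values('index', kind='stable')
print(ordered['_'].tolist())  # ['T', 'K', 'H']
```

Note that `Position` has to be a string on both sides here, which is why the answer converts it with `astype(str)` before merging.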
Then reshape the DataFrame: split the column names on `_` into a `MultiIndex`, rename the `NaN` level values to `_`, and reshape with `DataFrame.stack`. Extract the `key` column (the non-numeric prefix of the original column names), convert `Position` to strings, left-join the helper DataFrame, sort, and reshape back with `DataFrame.unstack`, using counters to rebuild the original index and column names:

# replace the missing second level (plain player columns) with '_'
f = lambda x: '_' if pd.isna(x) else x

df2 = (df.pipe(lambda x: x.set_axis(x.columns.str.split('_', expand=True), axis=1))
         .rename(columns=f)
         .stack(0)
         .reset_index()
         # raw string avoids an invalid-escape warning for \D
         .assign(key=lambda x: x.level_1.str.extract(r'^(\D+)', expand=False),
                 Position=lambda x: x.Position.astype(str))
         .merge(df1, how='left')
         .sort_values('index')
         .assign(i=lambda x: x.groupby('key').cumcount(),
                 level_0=lambda x: x['i'] % len(df),
                 level_1=lambda x: x['key'] + (x['i'] // len(df) + 1).astype(str))
         .set_index(['level_0', 'level_1'])[['_', 'Position']].rename(columns={'_': ''})
         .unstack()
         .pipe(lambda x: x.set_axis([f'{b}_{a}'.strip('_') for a, b in x.columns], axis=1))
         .reindex(df.columns, axis=1)
       )
print(df2)
        P1 P1_Position P2 P2_Position P3 P3_Position E1 E1_Position E2  \
level_0
0        T           3  K           2  H           1  N           1  H
1        T           3  K           2  Y           4  N           1  S
2        T           3  K           2  Y           4  N           1  A
3        T           3  K           2  Y           4  N           1  H

        E2_Position E3 E3_Position
level_0
0                 3 AK           4
1                 5  H           3
2                 2 AK           4
3                 3  A           2
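
The printed result above appears to move some players between rows (for example, `S`/`5` shows up in row 1 although it only occurs in row 0 of the input). If each row has to keep its own players, a row-wise `apply` along the lines the question sketches is one possible alternative. The following is a minimal sketch, not the answer's method: the names `wanted`, `reorder_row` and `n_slots` are hypothetical, and positions are written as integers to match the example DataFrame's dtype.

```python
import pandas as pd

df = pd.DataFrame({'P1': ['H', 'K', 'K', 'K'], 'P1_Position': [1, 2, 2, 2],
                   'P2': ['K', 'T', 'T', 'T'], 'P2_Position': [2, 3, 3, 3],
                   'P3': ['T', 'Y', 'Y', 'Y'], 'P3_Position': [3, 4, 4, 4],
                   'E1': ['H', 'A', 'AK', 'AK'], 'E1_Position': [3, 2, 4, 4],
                   'E2': ['N', 'H', 'H', 'A'], 'E2_Position': [1, 3, 3, 2],
                   'E3': ['S', 'N', 'N', 'N'], 'E3_Position': [5, 1, 1, 1]})

# Players of interest per team (assumption: integer positions, as in df).
wanted = {'P': [('T', 3), ('K', 2)], 'E': [('N', 1)]}

def reorder_row(row, wanted, n_slots=3):
    """Move the wanted (player, position) pairs to the front of each team."""
    out = {}
    for team, targets in wanted.items():
        # Collect this team's (player, position) pairs from the row.
        pairs = [(row[f'{team}{i}'], row[f'{team}{i}_Position'])
                 for i in range(1, n_slots + 1)]
        # Wanted players sort by their list order; everyone else ties for last.
        rank = {pair: r for r, pair in enumerate(targets)}
        pairs.sort(key=lambda pair: rank.get(pair, len(targets)))
        # Write the pairs back into slots 1..n_slots.
        for i, (name, pos) in enumerate(pairs, start=1):
            out[f'{team}{i}'] = name
            out[f'{team}{i}_Position'] = pos
    return pd.Series(out)

result = df.apply(reorder_row, axis=1, wanted=wanted)[df.columns]
print(result)
```

Because Python's sort is stable, players that are not of interest keep their original relative order within the row, so the trailing columns can differ slightly from the illustrative table in the question. This runs once per row and so will be slower than a fully vectorised approach on large frames.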