June 13, 2023, 18:50:45

Pandas: Updating the Position of Values in a Row

Question

I have the following DataFrame. It is split into two teams, P and E, and each player and their position are related data, so they should be kept together (for example, P1 and P1_Position).

I'm interested in viewing specific players from each team: P: [('T', '3'), ('K', '2')] and E: [('N', '1')].

I would like to order each row with respect to the players I'm interested in, so that ('T', '3'), ('K', '2') and ('N', '1') are at the beginning of their teams.

P1	P1_Position	P2	P2_Position	P3	P3_Position	E1	E1_Position	E2	E2_Position	E3	E3_Position
H	1	K	2	T	3	H	3	N	1	S	5
K	2	T	3	Y	4	A	2	H	3	N	1
K	2	T	3	Y	4	AK	4	H	3	N	1
K	2	T	3	Y	4	AK	4	A	2	N	1

The resulting DataFrame would look something like:

P1	P1_Position	P2	P2_Position	P3	P3_Position	E1	E1_Position	E2	E2_Position	E3	E3_Position
T	3	K	2	H	1	N	1	H	3	S	5
T	3	K	2	Y	4	N	1	H	3	A	2
T	3	K	2	Y	4	N	1	H	3	AK	4
T	3	K	2	Y	4	N	1	A	2	AK	4

The purpose of the ordering is to have a consistent format when performing a groupby operation, e.g. all players of interest are at the far left of each entry.

My current thinking is that I will need to use rename(columns={}) on each row via apply, but my first reaction is that it's going to be slow.

def reorderrows(row, players, enemies):
    doSomething(row)

desiredrows = desiredrows.apply(reorderrows, axis=1, players=players, enemies=enemies)

Is there a good way of enforcing that ordering, given that each list, P: [('T', '3'), ('K', '2')] and E: [('N', '1')], could include up to 3 players?

Example DataFrame:

import pandas as pd

df = pd.DataFrame({'P1': ['H', 'K', 'K', 'K'],
                   'P1_Position': [1, 2, 2, 2],
                   'P2': ['K', 'T', 'T', 'T'],
                   'P2_Position': [2, 3, 3, 3],
                   'P3': ['T', 'Y', 'Y', 'Y'],
                   'P3_Position': [3, 4, 4, 4],
                   'E1': ['H', 'A', 'AK', 'AK'],
                   'E1_Position': [3, 2, 4, 4],
                   'E2': ['N', 'H', 'H', 'A'],
                   'E2_Position': [1, 3, 3, 2],
                   'E3': ['S', 'N', 'N', 'N'],
                   'E3_Position': [5, 1, 1, 1]})

Answer 1

Score: 1

First, create a dictionary specifying the P/E players and convert it to a DataFrame with an index column for sorting:

d = {'P': [('T', '3'), ('K', '2')], 'E': [('N', '1')]}

df1 = pd.DataFrame([(k, *x) for k, v in d.items() for x in v],
                   columns=['key', '_', 'Position']).reset_index()
print(df1)
   index key  _ Position
0      0   P  T        3
1      1   P  K        2
2      2   E  N        1

Then reshape the DataFrame: split the column names on _ into a MultiIndex, rename the NaN level values to _, and reshape with DataFrame.stack. Extract the key column (the non-numeric prefix of the original column names), convert Position to strings, and left-join the helper DataFrame. Finally, sort by index and reshape back with DataFrame.unstack, rebuilding the row labels and column names from a per-key counter:

f = lambda x: '_' if pd.isna(x) else x
df2 = (df.pipe(lambda x: x.set_axis(x.columns.str.split('_', expand=True), axis=1))
          .rename(columns=f)
          .stack(0)
          .reset_index()
          .assign(key=lambda x: x.level_1.str.extract(r'^(\D+)', expand=False),
                  Position=lambda x: x.Position.astype(str))
          .merge(df1, how='left')
          .sort_values('index')
          .assign(i=lambda x: x.groupby('key').cumcount(),
                  level_0=lambda x: x['i'] % len(df),
                  level_1=lambda x: x['key'] + (x['i'] // len(df) + 1).astype(str))
          .set_index(['level_0', 'level_1'])[['_', 'Position']].rename(columns={'_': ''})
          .unstack()
          .pipe(lambda x: x.set_axis([f'{b}_{a}'.strip('_') for a, b in x.columns], axis=1))
          .reindex(df.columns, axis=1)
          )

print(df2)
        P1 P1_Position P2 P2_Position P3 P3_Position E1 E1_Position E2  \
level_0                                                                  
0        T           3  K           2  H           1  N           1  H   
1        T           3  K           2  Y           4  N           1  S   
2        T           3  K           2  Y           4  N           1  A   
3        T           3  K           2  Y           4  N           1  H   

        E2_Position  E3 E3_Position  
level_0                              
0                 3  AK           4  
1                 5   H           3  
2                 2  AK           4  
3                 3   A           2
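
In the printed output above, some unselected E players appear to land in a different row than in the input (for example, S 5 starts in row 0 but is shown in row 1). If every player has to stay in their own row, a row-wise reorder along the lines of the question's apply idea is one option. The following is only a minimal sketch, not the answer's method: the names reorder_row and wanted are hypothetical, positions are assumed to be integers so they match the *_Position columns, and unselected players are assumed to keep their original relative order (which differs slightly from the swap-style example in the question).

```python
import pandas as pd

df = pd.DataFrame({'P1': ['H', 'K', 'K', 'K'],
                   'P1_Position': [1, 2, 2, 2],
                   'P2': ['K', 'T', 'T', 'T'],
                   'P2_Position': [2, 3, 3, 3],
                   'P3': ['T', 'Y', 'Y', 'Y'],
                   'P3_Position': [3, 4, 4, 4],
                   'E1': ['H', 'A', 'AK', 'AK'],
                   'E1_Position': [3, 2, 4, 4],
                   'E2': ['N', 'H', 'H', 'A'],
                   'E2_Position': [1, 3, 3, 2],
                   'E3': ['S', 'N', 'N', 'N'],
                   'E3_Position': [5, 1, 1, 1]})

# Players of interest per team, in the order they should appear.
# Assumption: positions are ints here, matching the *_Position columns.
wanted = {'P': [('T', 3), ('K', 2)], 'E': [('N', 1)]}


def reorder_row(row, wanted):
    """Move the wanted (name, position) pairs to the front of each team."""
    out = row.copy()
    for team, targets in wanted.items():
        # Slot columns such as P1, P2, P3: team prefix followed by digits only.
        slots = [c for c in row.index
                 if c.startswith(team) and c[len(team):].isdigit()]
        pairs = [(row[s], row[f'{s}_Position']) for s in slots]
        # Wanted pairs are ranked by their index in targets; all other pairs
        # share the last rank. sorted() is stable, so unselected players
        # keep their original relative order within the row.
        rank = {p: i for i, p in enumerate(targets)}
        ordered = sorted(pairs, key=lambda p: rank.get(p, len(targets)))
        for s, (name, pos) in zip(slots, ordered):
            out[s] = name
            out[f'{s}_Position'] = pos
    return out


result = df.apply(reorder_row, axis=1, wanted=wanted)
print(result)
```

Because this calls a Python function once per row, it is likely to be slower than a fully vectorized reshape on large frames, but each name stays paired with its position and with its original row.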
