February 9, 2023, 00:53:11

Creating column based off values in separate row

Question

Here is an example of my dataframe:

import pandas as pd

df = pd.DataFrame([['Arsenal FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Leandro Trossard', 28, 'Belgium'],
                   ['Jakub Kiwior', 22, 'Poland'],
                   ['Jorginho', 32, 'Italy'],
                   ['Chelsea FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Enzo Fernández', 22, 'Argentina'],
                   ['Mykhaylo Mudryk', 22, 'Ukraine'],
                  ], columns=['Player', 'Age', 'Nat.'])

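I want to create a new column "Club" that takes the string in the "Player" cell of each club row and attaches it to the players listed below that row.

The tricky part is getting the right club assigned to the right players.

This is my desired output: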

df = pd.DataFrame([['In', 'Age', 'Nat.'],
                   ['Leandro Trossard', 28, 'Belgium', 'Arsenal FC'],
                   ['Jakub Kiwior', 22, 'Poland', 'Arsenal FC'],
                   ['Jorginho', 32, 'Italy', 'Arsenal FC'],
                   ['In', 'Age', 'Nat.'],
                   ['Enzo Fernández', 22, 'Argentina', 'Chelsea FC'],
                   ['Mykhaylo Mudryk', 22, 'Ukraine', 'Chelsea FC'],
                  ], columns=['Player', 'Age', 'Nat.', 'Club'])

I can't find another question that relates to this problem. Is this possible in Python?

Answer 1

Score: 2

One option using boolean masks with mask and ffill:

# which rows have a non-empty Age (i.e. are not club-name rows)?
m1 = df['Age'].ne('')
# which rows are not internal headers?
m2 = df['Player'].ne('In')
out = df[m1 & m2].assign(Club=df.loc[m2, 'Player'].mask(m1).ffill())

Output:

             Player  Age       Nat.        Club
2  Leandro Trossard   28    Belgium  Arsenal FC
3      Jakub Kiwior   22     Poland  Arsenal FC
4          Jorginho   32      Italy  Arsenal FC
7    Enzo Fernández   22  Argentina  Chelsea FC
8   Mykhaylo Mudryk   22    Ukraine  Chelsea FC

Intermediates:

             Player     m1        mask       ffill
0        Arsenal FC  False  Arsenal FC  Arsenal FC
2  Leandro Trossard   True         NaN  Arsenal FC
3      Jakub Kiwior   True         NaN  Arsenal FC
4          Jorginho   True         NaN  Arsenal FC
5        Chelsea FC  False  Chelsea FC  Chelsea FC
7    Enzo Fernández   True         NaN  Chelsea FC
8   Mykhaylo Mudryk   True         NaN  Chelsea FC

Keeping the In/Age/Nat. rows:

# which rows have a non-empty Age (i.e. are not club-name rows)?
m1 = df['Age'].ne('')
# which rows are not internal headers?
m2 = df['Player'].ne('In')
out = df[m1].assign(Club=df.loc[m2, 'Player'].mask(m1).ffill())

Output:

             Player  Age       Nat.        Club
1                In  Age       Nat.         NaN
2  Leandro Trossard   28    Belgium  Arsenal FC
3      Jakub Kiwior   22     Poland  Arsenal FC
4          Jorginho   32      Italy  Arsenal FC
6                In  Age       Nat.         NaN
7    Enzo Fernández   22  Argentina  Chelsea FC
8   Mykhaylo Mudryk   22    Ukraine  Chelsea FC
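
For readers who want to run both variants end to end, here is a minimal self-contained sketch of the same mask/ffill idea. The helper name `add_club` and its `header_label` and `keep_headers` parameters are hypothetical and not part of the answer. It assumes, as in the sample data, that club rows are the only rows with an empty `Age` and that internal headers have `'In'` in the `Player` column.

```python
import pandas as pd


def add_club(df, header_label="In", keep_headers=False):
    """Attach the preceding club name to each row (hypothetical helper)."""
    # True where Age is filled in, i.e. the row is not a club-name row
    not_club = df["Age"].ne("")
    # True where the row is not a repeated internal header
    not_header = df["Player"].ne(header_label)
    # Keep only the club names, blank out everything else, then carry them down
    club = df.loc[not_header, "Player"].mask(not_club).ffill()
    # Optionally keep the internal header rows (their Club stays NaN)
    rows = not_club if keep_headers else not_club & not_header
    return df[rows].assign(Club=club)


df = pd.DataFrame([['Arsenal FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Leandro Trossard', 28, 'Belgium'],
                   ['Jakub Kiwior', 22, 'Poland'],
                   ['Jorginho', 32, 'Italy'],
                   ['Chelsea FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Enzo Fernández', 22, 'Argentina'],
                   ['Mykhaylo Mudryk', 22, 'Ukraine'],
                  ], columns=['Player', 'Age', 'Nat.'])

out = add_club(df)                      # players only
kept = add_club(df, keep_headers=True)  # players plus In/Age/Nat. rows
print(out)
print(kept)
```

Because `assign` aligns on the index, the club series can safely contain extra rows (the club rows themselves); only the labels present in the filtered frame are used.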

Answer 2

Score: 1

Edited:

import pandas as pd

df = pd.DataFrame([['Arsenal FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Leandro Trossard', 28, 'Belgium'],
                   ['Jakub Kiwior', 22, 'Poland'],
                   ['Jorginho', 32, 'Italy'],
                   ['Chelsea FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Enzo Fernández', 22, 'Argentina'],
                   ['Mykhaylo Mudryk', 22, 'Ukraine'],
                  ], columns=['Player', 'Age', 'Nat.'])
clubs = []
current_club = None
for i, row in df.iterrows():
    if row['Player'] in ['Arsenal FC', 'Chelsea FC']:
        # club row: remember the club and label the row itself
        current_club = row['Player']
        clubs.append(current_club)
    elif row['Player'] == 'In':
        # internal header row: no club (one entry per row is required)
        clubs.append(float('nan'))
    else:
        # player row: inherits the most recent club
        clubs.append(current_club)
df['Club'] = clubs
print(df)

Output:

             Player  Age       Nat.        Club
0        Arsenal FC                  Arsenal FC
1                In  Age       Nat.         NaN
2  Leandro Trossard   28    Belgium  Arsenal FC
3      Jakub Kiwior   22     Poland  Arsenal FC
4          Jorginho   32      Italy  Arsenal FC
5        Chelsea FC                  Chelsea FC
6                In  Age       Nat.         NaN
7    Enzo Fernández   22  Argentina  Chelsea FC
8   Mykhaylo Mudryk   22    Ukraine  Chelsea FC

Edit 2: Multiple club names

# known club names; extend this list as needed
clubs = ['Arsenal FC', 'Chelsea FC', 'Other Club 1', 'Other Club 2', ..., 'Other Club n']
df['Club'] = ''
club = ''
for index, row in df.iterrows():
    if row['Player'] in clubs:
        # club row: remember the name, leave its own Club empty
        club = row['Player']
    else:
        # every other row inherits the most recent club
        df.at[index, 'Club'] = club
# drop the club rows, whose Club is still ''
df = df[df['Club'] != ''].reset_index(drop=True)
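
If listing every club name by hand is impractical, one possible variation is to build the list from the dataframe itself. This sketch is not part of the answer. It assumes, as in the sample data, that a row is a club row exactly when its `Age` cell is an empty string. The rest follows the loop above.

```python
import pandas as pd

df = pd.DataFrame([['Arsenal FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Leandro Trossard', 28, 'Belgium'],
                   ['Jakub Kiwior', 22, 'Poland'],
                   ['Jorginho', 32, 'Italy'],
                   ['Chelsea FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Enzo Fernández', 22, 'Argentina'],
                   ['Mykhaylo Mudryk', 22, 'Ukraine'],
                  ], columns=['Player', 'Age', 'Nat.'])

# Assumption: club rows are exactly the rows with an empty Age
clubs = df.loc[df['Age'].eq(''), 'Player'].tolist()

df['Club'] = ''
club = ''
for index, row in df.iterrows():
    if row['Player'] in clubs:
        # club row: remember the name for the rows that follow
        club = row['Player']
    else:
        # header and player rows inherit the most recent club
        df.at[index, 'Club'] = club

# Drop the club rows, whose Club is still ''
df = df[df['Club'] != ''].reset_index(drop=True)
print(df)
```

Note that the internal `In`/`Age`/`Nat.` header rows are kept and also receive a club; filter them out with `df[df['Player'] != 'In']` if they are not wanted.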
