How to apply a function to the values of one column, then take the output and apply it to multiple other columns in pandas

Question

Alright y'all, maybe my strategy here isn't ideal, but I've got a very awkward dataset to work with and I need help.

I have a pandas dataframe that's structured such that only the first column has values:

df = 
|Ind| Column A | Column B | Column C |
| - | -------- | -------- | -------- |
| 0 | String1  | Null     | Null     |
| 1 | String2  | Null     | Null     |

What I'd like to do is iteratively take the value from Column A and put it through a function whose output is a list. From there I need to fill the remaining columns with the output of the function, such that:

df = 
|Ind| Column A | Column B         | Column C         |
| - | -------- | ---------------- | ---------------- |
| 0 | String1  | func(String1)[0] | func(String1)[1] |
| 1 | String2  | func(String2)[0] | func(String2)[1] |

Thus far I've been trying to do this using anonymous functions, as such:

df.iloc[:,1:].apply(lambda y: df["Column A"].apply(lambda x: list(map(func, x))))

Which almost does what I want, but does not map the list into the respective columns, and the result is instead:

df = 
|Ind| Column A | Column B      | Column C      |
| - | -------- | ------------- | ------------- |
| 0 | String1  | func(String1) | func(String1) |
| 1 | String2  | func(String2) | func(String2) |

If there's a better approach I'm totally open.

Answer 1

Score: 0

You could use a temporary helper column to hold the list and then drop it at the end, like this:

# Store each row's list of results in a helper column
df['temporary'] = df['Column A'].map(func)
# .str[i] indexes into each list element-wise
df['Column B'] = df['temporary'].str[0]
df['Column C'] = df['temporary'].str[1]
# Remove the helper column
df = df.drop('temporary', axis=1)
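
If the helper column feels clunky, the same idea can probably be done in one assignment. This is only a sketch: `func` below is a made-up stand-in that returns a two-element list (the real function isn't shown in the question), and the starting frame is rebuilt from the table in the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's function: returns a two-element list
def func(s):
    return [s.upper(), len(s)]

# Starting frame as described in the question: only Column A is filled
df = pd.DataFrame({
    "Column A": ["String1", "String2"],
    "Column B": [None, None],
    "Column C": [None, None],
})

# Turn the Series of lists into a two-column frame aligned on df's index
expanded = pd.DataFrame(df["Column A"].map(func).tolist(), index=df.index)

# Assign both target columns at once; columns are matched by position
df[["Column B", "Column C"]] = expanded
print(df)
```

This assumes every call to `func` returns exactly as many items as there are target columns; if the lengths differ, the assignment will raise rather than silently misalign.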

Answer 2

Score: 0

Functional programming is not as fun as they say it is. Here's a procedural version that applies a function to each value in the column and extends the data frame. Note that the function can return a variable number of results.

import pandas as pd

# Make a three-row, one-column data frame
df = pd.DataFrame(["foo", "bar", "fubar"], columns=["Column A"])

# Function applied to each value of Column A (variable-length output is fine)
def fun(s):
    return list(s)

# Make a new data frame by applying the function; shorter rows get missing values
df2 = pd.DataFrame([fun(s) for s in df["Column A"]])

# Name the new columns "Column B", "Column C", ... (the first column stays the same)
# The letter string limits this to at most 10 new columns
column_names = {i: f"Column {'ABCDEFGHIJK'[i + 1]}" for i in range(len(df2.columns))}

# Add the new columns using the new names
df = pd.concat([df, df2], axis=1).rename(columns=column_names)
df
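
One limitation of the snippet above is the hard-coded letter string, which runs out after ten new columns. A possible variation, sketched here with the same toy `fun` and the standard library's `string.ascii_uppercase` (my substitution, not part of the original answer), names the columns before joining:

```python
import string
import pandas as pd

def fun(s):
    # Same toy function as in the answer: split a string into characters
    return list(s)

df = pd.DataFrame(["foo", "bar", "fubar"], columns=["Column A"])

# One row of results per input value; shorter rows are padded with missing values
df2 = pd.DataFrame([fun(s) for s in df["Column A"]], index=df.index)

# Label the new columns B, C, D, ... (covers up to 25 new columns)
df2.columns = [f"Column {string.ascii_uppercase[i + 1]}" for i in range(df2.shape[1])]

# Join on the shared index so each row keeps its own results
result = df.join(df2)
print(result)
```

Passing `index=df.index` keeps the rows aligned even if the original frame does not use the default 0..n-1 index.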

Posted by huangapple on March 4, 2023, 07:12:35
When reposting, please keep the link to this article: https://go.coder-hub.com/75632610.html