How to apply a function to the values of one column, then take the output and apply it to multiple other columns in pandas

Question

Alright y'all, maybe my strategy here isn't ideal, but I've got a very awkward dataset to work with and I need help.

I have a pandas dataframe that's structured such that only the first column has values:

    df =
    |Ind| Column A | Column B | Column C |
    | - | -------- | -------- | -------- |
    | 0 | String1  | Null     | Null     |
    | 1 | String2  | Null     | Null     |

What I'd like to do is iteratively take the value from Column A and put it through a function whose output is a list. From there I need to fill the remaining columns with the output of the function, such that:

    df =
    |Ind| Column A | Column B         | Column C         |
    | - | -------- | ---------------- | ---------------- |
    | 0 | String1  | func(String1)[0] | func(String1)[1] |
    | 1 | String2  | func(String2)[0] | func(String2)[1] |

Thus far I've been trying to do this using anonymous functions, as such:

    df.iloc[:,1:].apply(lambda y: df["Column A"].apply(lambda x: list(map(func, x))))

Which almost does what I want, but does not map the list into the respective columns, and the result is instead:

    df =
    |Ind| Column A | Column B      | Column C      |
    | - | -------- | ------------- | ------------- |
    | 0 | String1  | func(String1) | func(String1) |
    | 1 | String2  | func(String2) | func(String2) |

If there's a better approach I'm totally open.

Answer 1

Score: 0

You could use a temporary helper column with the list and then drop it at the end, like this:

    df['temporary'] = df['Column A'].map(func)
    df['Column B'] = df['temporary'].str[0]
    df['Column C'] = df['temporary'].str[1]
    df = df.drop('temporary', axis=1)
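If you would rather skip the helper column, one possible variation is to expand the lists into their own data frame and assign both target columns in one step. This is only a sketch: `func` here is a hypothetical stand-in that returns a two-element list, so swap in your real function, and it assumes every call returns exactly two items.

```python
import pandas as pd

def func(s):
    # Hypothetical stand-in for the real function: returns a two-element list
    return [s.lower(), len(s)]

df = pd.DataFrame({
    "Column A": ["String1", "String2"],
    "Column B": [None, None],
    "Column C": [None, None],
})

# Build one row of results per value in Column A, keeping the original index
expanded = pd.DataFrame(
    df["Column A"].map(func).tolist(),
    index=df.index,
    columns=["Column B", "Column C"],
)

# Fill the existing columns from the expanded results
df[["Column B", "Column C"]] = expanded
print(df)
```

Passing `index=df.index` keeps the results aligned with the original rows even if the index is not a plain 0..n-1 range.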

Answer 2

Score: 0

Functional programming is not as fun as they say it is. Here's a procedural version that applies a function to each value in the column and extends the data frame with the results. Note that the function can return a variable number of results.

    import pandas as pd

    # Make a three-row, one-column data frame
    df = pd.DataFrame(["foo", "bar", "fubar"], columns=["Column A"])

    # Function applied to each value in Column A (variable-length output is fine)
    def fun(s):
        return list(s)

    # Make a new data frame with one row of results per input value
    df2 = pd.DataFrame(fun(s) for s in df["Column A"])

    # Name the new columns B, C, D, ... (the first column stays the same)
    column_names = {i: f"Column {'ABCDEFGHIJK'[i+1]}" for i in range(len(df2.columns))}

    # Add the new columns under their new names
    df = pd.concat([df, df2], axis=1).rename(columns=column_names)
    print(df)
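To see what "variable number of results" means in practice, here is a small sketch of the same idea. It assumes the same character-splitting `fun` as above, and the `Out 0`, `Out 1`, ... column names are a made-up naming scheme for illustration. Shorter result lists are padded with missing values up to the length of the longest one.

```python
import pandas as pd

def fun(s):
    # Example function with variable-length output: split a string into characters
    return list(s)

df = pd.DataFrame({"Column A": ["foo", "bar", "fubar"]})

# Ragged lists are padded with missing values up to the longest length
expanded = pd.DataFrame([fun(s) for s in df["Column A"]], index=df.index)

# Hypothetical naming scheme for the new columns: "Out 0", "Out 1", ...
expanded.columns = [f"Out {i}" for i in expanded.columns]

# Attach the new columns to the original frame by index
df = df.join(expanded)
print(df)
```

Generating the names from the column positions avoids running out of letters when the function returns many items.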

huangapple
  • Published on 2023-03-04 07:12:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75632610.html