How do you create a new column in a dataframe based on multiple conditions/other variables in Python?

Question

I am working with a large demographic dataset. I have a variable called "Ethnicity" which has: African American, White, Latino, and Asian. I also have a "Gender" variable where the genders for each person are "M" or "F".

What I want to do is create a new variable/column called "Gen_Eth" where it is coded based on gender AND ethnicity in the following manner:

  • African American male = 0
  • African American female = 1
  • White male = 2
  • White female = 3
  • Latino male = 4
  • Latino female = 5
  • Asian male = 6
  • Asian female = 7

How would I go about doing this? Should I use an if-else statement based on the "Ethnicity" and "Gender" variables, or convert gender to 0s and 1s and then write a function of the two variables? Thank you so much!

Answer 1

Score: 1

Here is an example of what you can do.

    import pandas as pd

    # Data
    d = {'ethnicity': ['asian', 'african', 'white', 'latino', 'latino', 'african', 'white', 'asian'],
         'gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F']}
    df = pd.DataFrame(data=d)  # create the pandas DataFrame
    df_int = df.astype(object)  # object-dtype copy, so integer codes can replace the string labels

    # Change the labels to integers (.loc with a row mask avoids chained assignment)
    for idx, ethnicity in enumerate(['african', 'white', 'latino', 'asian']):
        df_int.loc[df_int['ethnicity'] == ethnicity, 'ethnicity'] = idx
    for idx, gender in enumerate(['M', 'F']):
        df_int.loc[df_int['gender'] == gender, 'gender'] = idx

    # Create the new column in the original DataFrame
    df['Gen_Eth'] = (2 * df_int['ethnicity'] + df_int['gender']).astype(int)
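
The label-to-integer loops above can also be written with pandas categoricals. This is only a sketch, assuming the same lowercase labels as the sample data. The order of each `categories` list is what fixes the codes, and the names `eth_codes` and `gen_codes` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ethnicity': ['asian', 'african', 'white', 'latino', 'latino', 'african', 'white', 'asian'],
    'gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F'],
})

# The position of each label in `categories` becomes its integer code.
eth_codes = pd.Categorical(df['ethnicity'],
                           categories=['african', 'white', 'latino', 'asian']).codes
gen_codes = pd.Categorical(df['gender'], categories=['M', 'F']).codes

# Labels that are missing from `categories` get the code -1, so check first.
assert (eth_codes >= 0).all() and (gen_codes >= 0).all()

# Same formula as above: two slots per ethnicity, gender picks the slot.
df['Gen_Eth'] = 2 * eth_codes + gen_codes
```

Listing the categories explicitly matters here: without it, pandas would sort the labels alphabetically and the codes would no longer match the order asked for in the question.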

Answer 2

Score: 1

I particularly like using the map function for this. The rest of this answer is similar to @Jason Yu's answer.

    ethnic_mapper = {"African American": 0, "White": 1, "Latino": 2, "Asian": 3}
    gender_mapper = {"M": 0, "F": 1}
    df["ethnic_int"] = df.ethnicity.map(ethnic_mapper)
    df["gender_int"] = df.gender.map(gender_mapper)

After the two map calls, df looks like this:

      gender         ethnicity  ethnic_int  gender_int
    0      M  African American           0           0
    1      M             White           1           0
    2      F            Latino           2           1
    3      M             Asian           3           0
    4      F            Latino           2           1
    5      F             Asian           3           1

Then combine the two codes and drop the helper columns:

    df["gen_eth"] = 2 * df["ethnic_int"] + df["gender_int"]
    df = df.drop(columns=["ethnic_int", "gender_int"])

Result:

      gender         ethnicity  gen_eth
    0      M  African American        0
    1      M             White        2
    2      F            Latino        5
    3      M             Asian        6
    4      F            Latino        5
    5      F             Asian        7
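
One thing to watch with map: a label that is missing from the dictionary becomes NaN instead of raising an error. Here is a small sketch of how that could be checked, assuming a made-up label "Other" that does not appear in the question's data. The names `gen_eth` and `unmapped` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "F", "F"],
    "ethnicity": ["White", "Other", "Asian"],  # "Other" is a hypothetical unmapped label
})

ethnic_mapper = {"African American": 0, "White": 1, "Latino": 2, "Asian": 3}
gender_mapper = {"M": 0, "F": 1}

# map() leaves NaN wherever the key is not in the dictionary
gen_eth = 2 * df["ethnicity"].map(ethnic_mapper) + df["gender"].map(gender_mapper)

# Rows that could not be coded
unmapped = df[gen_eth.isna()]

# The nullable integer dtype keeps the codes as integers and the gaps as <NA>
df["gen_eth"] = gen_eth.astype("Int64")
```

On a large demographic dataset it is probably worth inspecting `unmapped` before using the new column, since misspelled or differently capitalized labels would end up there silently.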

Answer 3

Score: 0

I would use a dictionary in conjunction with df.apply().

Example:

    import pandas as pd

    df = pd.DataFrame({
        "gender": ["M", "M", "F", "M", "F", "F"],
        "ethnicity": ["African American", "White", "Latino", "Asian", "Latino", "Asian"]
    })

    ge_map = {
        "M": 0,
        "F": 1,
        "African American": 0,
        "White": 2,
        "Latino": 4,
        "Asian": 6
    }

    df["gen_eth"] = df.apply(lambda x: ge_map[x["gender"]] + ge_map[x["ethnicity"]], axis=1)
    print(df)

Result:

      gender         ethnicity  gen_eth
    0      M  African American        0
    1      M             White        2
    2      F            Latino        5
    3      M             Asian        6
    4      F            Latino        5
    5      F             Asian        7
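
Row-wise apply can be slow on a large dataset. A possible alternative, sketched here under the assumption that every label appears in `ge_map`, is to reuse the same dictionary but map each column as a whole. How much faster it is will depend on the size of the data.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "M", "F", "M", "F", "F"],
    "ethnicity": ["African American", "White", "Latino", "Asian", "Latino", "Asian"],
})

ge_map = {"M": 0, "F": 1, "African American": 0, "White": 2, "Latino": 4, "Asian": 6}

# Map each column in one call, then add the two integer Series element-wise
df["gen_eth"] = df["gender"].map(ge_map) + df["ethnicity"].map(ge_map)
```

This gives the same column as the apply version for the sample data. Unlike the dictionary lookup inside the lambda, which raises a KeyError on an unknown label, map would produce NaN for it.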

huangapple
  • Posted on February 18, 2023 at 12:24:19
  • If you repost this article, please keep the link to it: https://go.coder-hub.com/75491183.html