How do you create a new column in a dataframe based on multiple conditions/other variables in Python?

Question

I am working with a large demographic dataset. I have a variable called "Ethnicity" which has: African American, White, Latino, and Asian. I also have a "Gender" variable where the genders for each person are "M" or "F".

What I want to do is create a new variable/column called "Gen_Eth" where it is coded based on gender AND ethnicity in the following manner:

  • African American male = 0
  • African American female = 1
  • White male = 2
  • White female = 3
  • Latino male = 4
  • Latino female = 5
  • Asian male = 6
  • Asian female = 7

How would I go about doing this? Should I use an if-else statement based on the "Ethnicity" and "Gender" variables, or convert gender to 0s and 1s and then write a function that takes the two variables? Thank you so much!

Answer 1

Score: 1

Here is an example of what you can do.

import pandas as pd

# Data
d = {'ethnicity': ['asian', 'african', 'white', 'latino', 'latino', 'african', 'white', 'asian'], 'gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F']}

df = pd.DataFrame(data=d)  # create the pandas dataframe
df_int = df.copy(deep=True).astype(object)  # working copy; object dtype so labels can be overwritten with ints

# Change the labels to ints (a single .loc call avoids chained assignment)
for idx, ethnicity in enumerate(['african', 'white', 'latino', 'asian']):
    df_int.loc[df_int['ethnicity'] == ethnicity, 'ethnicity'] = idx

for idx, gender in enumerate(['M', 'F']):
    df_int.loc[df_int['gender'] == gender, 'gender'] = idx

# Create the new column in your original dataframe
df['Gen_Eth'] = (2 * df_int['ethnicity'] + df_int['gender']).astype(int)
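
The label-to-integer step in this answer can also be done without a loop. The following is a minimal sketch, not part of the original answer, that assumes the same lowercase labels and uses `pd.Categorical` so that each label's code is its position in an explicit category list. The names `eth_order`, `gen_order`, `eth_codes`, and `gen_codes` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ethnicity": ["asian", "african", "white", "latino"],
    "gender": ["M", "F", "M", "F"],
})

# The position of a label in these lists becomes its integer code.
eth_order = ["african", "white", "latino", "asian"]
gen_order = ["M", "F"]

eth_codes = pd.Categorical(df["ethnicity"], categories=eth_order).codes
gen_codes = pd.Categorical(df["gender"], categories=gen_order).codes

# Labels that are not in the category lists get the code -1, so fail early.
if (eth_codes < 0).any() or (gen_codes < 0).any():
    raise ValueError("found a label that is not in eth_order / gen_order")

# Same arithmetic as above: two codes per ethnicity, offset by gender.
df["Gen_Eth"] = 2 * eth_codes.astype("int64") + gen_codes
print(df)
```

Because the order is written out explicitly, the codes do not depend on the order in which labels happen to appear in the data.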

Answer 2

Score: 1

I particularly like using the map function for this. The rest of this answer is similar to @Jason Yu's answer.

ethnic_mapper = {"African American": 0, "White": 1, "Latino": 2, "Asian": 3}
gender_mapper = {"M": 0, "F": 1}
df["ethnic_int"] = df.ethnicity.map(ethnic_mapper)
df["gender_int"] = df.gender.map(gender_mapper)

The dataframe now has two helper columns:

  gender         ethnicity  ethnic_int  gender_int
0      M  African American           0           0
1      M             White           1           0
2      F            Latino           2           1
3      M             Asian           3           0
4      F            Latino           2           1
5      F             Asian           3           1

df["gen_eth"] = 2 * df["ethnic_int"] + df["gender_int"]
df = df.drop(columns=["ethnic_int", "gender_int"])

After dropping the helper columns:

  gender         ethnicity  gen_eth
0      M  African American        0
1      M             White        2
2      F            Latino        5
3      M             Asian        6
4      F            Latino        5
5      F             Asian        7
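
One thing to watch with `map` on a large dataset: a label that is missing from the dictionary does not raise an error, it silently becomes NaN. The sketch below is an addition rather than part of the original answer. It assumes a hypothetical unmapped label "Other" and shows one way to find such rows and keep the result as integers.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "F", "F"],
    "ethnicity": ["White", "Latino", "Other"],  # "Other" is a made-up unmapped label
})

ethnic_mapper = {"African American": 0, "White": 1, "Latino": 2, "Asian": 3}
gender_mapper = {"M": 0, "F": 1}

ethnic_int = df["ethnicity"].map(ethnic_mapper)
gender_int = df["gender"].map(gender_mapper)

# Rows whose label was not found in a mapper come back as NaN.
unmapped = df[ethnic_int.isna() | gender_int.isna()]
print(unmapped)

# The nullable Int64 dtype keeps integer codes and marks unmapped rows as <NA>.
df["gen_eth"] = (2 * ethnic_int + gender_int).astype("Int64")
print(df)
```

Inspecting `unmapped` before building the final column is a cheap way to catch typos or unexpected categories such as stray whitespace or different capitalization.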

Answer 3

Score: 0

I would use a dictionary in conjunction with df.apply().

Example:

import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "M", "F", "M", "F", "F"],
    "ethnicity": ["African American", "White", "Latino", "Asian", "Latino", "Asian"]
})

ge_map = {
    "M": 0,
    "F": 1,
    "African American": 0,
    "White": 2,
    "Latino": 4, 
    "Asian": 6
}

df["gen_eth"] = df.apply(lambda x: ge_map[x["gender"]] + ge_map[x["ethnicity"]], axis=1)

print(df)

Result:

  gender         ethnicity  gen_eth
0      M  African American        0
1      M             White        2
2      F            Latino        5
3      M             Asian        6
4      F            Latino        5
5      F             Asian        7
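
If the codes should not depend on arithmetic at all, the eight combinations from the question can be written out directly. This is a sketch of that variant, not the answerer's code: `pair_codes` is a hypothetical dictionary keyed by (ethnicity, gender) tuples, and the lookup walks both columns in parallel instead of calling `df.apply` row by row.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "M", "F", "M", "F", "F"],
    "ethnicity": ["African American", "White", "Latino", "Asian", "Latino", "Asian"],
})

# One entry per (ethnicity, gender) combination, copied from the question.
pair_codes = {
    ("African American", "M"): 0, ("African American", "F"): 1,
    ("White", "M"): 2, ("White", "F"): 3,
    ("Latino", "M"): 4, ("Latino", "F"): 5,
    ("Asian", "M"): 6, ("Asian", "F"): 7,
}

# zip pairs up the two columns; a combination missing from pair_codes raises KeyError.
df["gen_eth"] = [pair_codes[pair] for pair in zip(df["ethnicity"], df["gender"])]
print(df)
```

The table makes every code visible in one place, which can be easier to audit than the `2 * ethnicity + gender` formula if the coding scheme ever changes.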

huangapple
  • Published on February 18, 2023 at 12:24:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75491183.html