How do you create a new column in a dataframe based on multiple conditions/other variables in Python?
Question
I am working with a large demographic dataset. I have a variable called "Ethnicity" with the values African American, White, Latino, and Asian. I also have a "Gender" variable where each person is coded as "M" or "F".
What I want to do is create a new variable/column called "Gen_Eth" that is coded based on gender AND ethnicity as follows:
- African American male = 0
- African American female = 1
- White male = 2
- White female = 3
- Latino male = 4
- Latino female = 5
- Asian male = 6
- Asian female = 7
How would I go about doing this? Should I use an if-else statement based on the "Ethnicity" and "Gender" variables, or convert gender to 0s and 1s and then write a function of the two variables? Thank you so much!
Answer 1
Score: 1
Here is an example of what you can do: replace each label with an integer, then combine the two integers as 2 * ethnicity + gender.
import pandas as pd

# Data
d = {'ethnicity': ['asian', 'african', 'white', 'latino', 'latino', 'african', 'white', 'asian'],
     'gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F']}
df = pd.DataFrame(data=d)  # creating pandas dataframe
df_int = df.astype(object)  # copy of the dataframe; object dtype lets integers replace the labels
# Changing the labels to int (a single .loc call avoids chained assignment)
for idx, ethnicity in enumerate(['african', 'white', 'latino', 'asian']):
    df_int.loc[df_int['ethnicity'] == ethnicity, 'ethnicity'] = idx
for idx, gender in enumerate(['M', 'F']):
    df_int.loc[df_int['gender'] == gender, 'gender'] = idx
# Creating a new column in your original dataframe
df['Gen_Eth'] = (2 * df_int['ethnicity'] + df_int['gender']).astype(int)
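The label-to-integer loops above could also be expressed with pandas categoricals. The following is only a sketch, assuming the same lowercase labels as in this answer's sample data; the variable names `eth_codes` and `gen_codes` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ethnicity": ["asian", "african", "white", "latino",
                  "latino", "african", "white", "asian"],
    "gender": ["M", "F", "M", "F", "M", "F", "M", "F"],
})

# The order of `categories` fixes the integer code of each label (0, 1, 2, ...).
eth_codes = pd.Categorical(
    df["ethnicity"], categories=["african", "white", "latino", "asian"]
).codes
gen_codes = pd.Categorical(df["gender"], categories=["M", "F"]).codes

# Widen from the small integer dtype of `.codes` before doing arithmetic.
df["Gen_Eth"] = 2 * eth_codes.astype(int) + gen_codes.astype(int)
print(df)
```

One caveat with this variant: a label that is not listed in `categories` gets the code -1, so it is worth checking for negative codes before trusting the result.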
Answer 2
Score: 1
I particularly like using the map function for this. The rest of this answer is similar to @Jason Yu's answer.
import pandas as pd

# Sample data, matching the output shown below
df = pd.DataFrame({
    "gender": ["M", "M", "F", "M", "F", "F"],
    "ethnicity": ["African American", "White", "Latino", "Asian", "Latino", "Asian"],
})
ethnic_mapper = {"African American": 0, "White": 1, "Latino": 2, "Asian": 3}
gender_mapper = {"M": 0, "F": 1}
df["ethnic_int"] = df["ethnicity"].map(ethnic_mapper)
df["gender_int"] = df["gender"].map(gender_mapper)
  gender         ethnicity  ethnic_int  gender_int
0      M  African American           0           0
1      M             White           1           0
2      F            Latino           2           1
3      M             Asian           3           0
4      F            Latino           2           1
5      F             Asian           3           1
df["gen_eth"] = 2 * df["ethnic_int"] + df["gender_int"]
df = df.drop(columns=["ethnic_int", "gender_int"])
  gender         ethnicity  gen_eth
0      M  African American        0
1      M             White        2
2      F            Latino        5
3      M             Asian        6
4      F            Latino        5
5      F             Asian        7
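One thing to watch with map on a large dataset: any label that is missing from the mapper becomes NaN instead of raising an error. A minimal sketch of how that could be detected, assuming a hypothetical unmapped label "Other" that does not appear in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "F", "F"],
    "ethnicity": ["White", "Latino", "Other"],  # "Other" is a hypothetical unmapped label
})
ethnic_mapper = {"African American": 0, "White": 1, "Latino": 2, "Asian": 3}
gender_mapper = {"M": 0, "F": 1}

gen_eth = 2 * df["ethnicity"].map(ethnic_mapper) + df["gender"].map(gender_mapper)

# Rows whose label is missing from a mapper come out as NaN.
unmapped = df[gen_eth.isna()]
print(unmapped)

# The nullable Int64 dtype keeps integer codes alongside missing values.
df["gen_eth"] = gen_eth.astype("Int64")
print(df)
```

Whether unmapped rows should be dropped, given a separate code, or treated as an error depends on the dataset, so the sketch only surfaces them.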
Answer 3
Score: 0
I would use a dictionary in conjunction with df.apply().
Example:
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "M", "F", "M", "F", "F"],
    "ethnicity": ["African American", "White", "Latino", "Asian", "Latino", "Asian"]
})
# Gender contributes 0 or 1; each ethnicity contributes an even offset
ge_map = {
    "M": 0,
    "F": 1,
    "African American": 0,
    "White": 2,
    "Latino": 4,
    "Asian": 6
}
df["gen_eth"] = df.apply(lambda x: ge_map[x["gender"]] + ge_map[x["ethnicity"]], axis=1)
print(df)
Result:
| id | gender | ethnicity | gen_eth |
|---|---|---|---|
| 0 | M | African American | 0 |
| 1 | M | White | 2 |
| 2 | F | Latino | 5 |
| 3 | M | Asian | 6 |
| 4 | F | Latino | 5 |
| 5 | F | Asian | 7 |
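A related variant, sketched here only as an illustration, keys one dictionary on (ethnicity, gender) pairs so that each combination has an explicit code. The names `ethnicities`, `genders`, and `pair_codes` are made up for this example.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "M", "F", "M", "F", "F"],
    "ethnicity": ["African American", "White", "Latino", "Asian", "Latino", "Asian"],
})

ethnicities = ["African American", "White", "Latino", "Asian"]
genders = ["M", "F"]

# product() varies the last element fastest, so the codes run
# (African American, M) = 0, (African American, F) = 1, (White, M) = 2, ...
pair_codes = {pair: code for code, pair in enumerate(product(ethnicities, genders))}

# Look up each row's (ethnicity, gender) pair; an unknown pair raises KeyError.
df["gen_eth"] = [pair_codes[pair] for pair in zip(df["ethnicity"], df["gender"])]
print(df)
```

Iterating over zipped columns avoids the per-row overhead of apply with axis=1, which may matter on a large dataset.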