February 18, 2023, 12:24:19

How do you create a new column in a dataframe based on multiple conditions/other variables in Python?

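Question

I am working with a large demographic dataset. I have a variable called "Ethnicity" whose values are African American, White, Latino, and Asian. I also have a "Gender" variable in which each person is coded as "M" or "F".

I want to create a new variable/column called "Gen_Eth" that is coded from gender AND ethnicity as follows:

African American male = 0
African American female = 1
White male = 2
White female = 3
Latino male = 4
Latino female = 5
Asian male = 6
Asian female = 7

How would I go about doing this? I was thinking of an if-else statement based on the "Ethnicity" and "Gender" variables, or of converting gender to 0s and 1s and then writing a function of the two variables. Thank you so much!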

Answer 1

Score: 1

Here is an example of what you can do.

import pandas as pd
# Data
d = {'ethnicity': ['asian', 'african', 'white', 'latino', 'latino', 'african', 'white', 'asian'], 'gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F']}
df = pd.DataFrame(data=d)  # creating pandas dataframe
df_int = df.astype(object)  # object-dtype copy of the dataframe, so it can hold the integer codes
# Changing the labels to int (a single .loc call avoids chained assignment)
for idx, ethnicity in enumerate(['african', 'white', 'latino', 'asian']):
    df_int.loc[df_int['ethnicity'] == ethnicity, 'ethnicity'] = idx
for idx, gender in enumerate(['M', 'F']):
    df_int.loc[df_int['gender'] == gender, 'gender'] = idx
# Creating a new column in your original dataframe
df['Gen_Eth'] = (2 * df_int['ethnicity'] + df_int['gender']).astype(int)
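
The two loops above can also be replaced by categorical codes, which avoids writing into a copy of the dataframe. This is only a sketch of an alternative, not this answer's method. It assumes the same lowercase labels used in this answer and that the position of each label in the categories list is the integer code you want. pd.Categorical gives the code -1 to any label that is not in the list, so the sketch checks for that before combining the codes.

```python
import pandas as pd

df = pd.DataFrame({
    'ethnicity': ['asian', 'african', 'white', 'latino'],
    'gender': ['M', 'F', 'M', 'F'],
})

# The position of each label in these lists is its integer code (assumed order).
eth_codes = pd.Categorical(df['ethnicity'], categories=['african', 'white', 'latino', 'asian']).codes
gen_codes = pd.Categorical(df['gender'], categories=['M', 'F']).codes

# Labels missing from the lists get code -1; fail loudly rather than miscode them.
if (eth_codes < 0).any() or (gen_codes < 0).any():
    raise ValueError('unexpected ethnicity or gender label')

# Same formula as above: two codes per ethnicity, offset by gender.
df['Gen_Eth'] = 2 * eth_codes.astype(int) + gen_codes
print(df)
```

The explicit check matters on a large dataset, where a stray spelling or capitalization would otherwise turn into a wrong code without any error.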

Answer 2

Score: 1

I particularly like using the map function for this. The rest of this answer is similar to @Jason Yu's answer.

ethnic_mapper = {"African American": 0, "White": 1, "Latino": 2, "Asian": 3}
gender_mapper = {"M": 0, "F": 1}
df["ethnic_int"] = df.ethnicity.map(ethnic_mapper)
df["gender_int"] = df.gender.map(gender_mapper)

  gender         ethnicity  ethnic_int  gender_int
0      M  African American           0           0
1      M             White           1           0
2      F            Latino           2           1
3      M             Asian           3           0
4      F            Latino           2           1
5      F             Asian           3           1

df["gen_eth"] = 2 * df["ethnic_int"] + df["gender_int"]
df = df.drop(columns=["ethnic_int", "gender_int"])

  gender         ethnicity  gen_eth
0      M  African American        0
1      M             White        2
2      F            Latino        5
3      M             Asian        6
4      F            Latino        5
5      F             Asian        7
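
One thing to watch with the map calls above: a label that is not a key in the mapper becomes NaN rather than raising an error. The sketch below shows one way to find such rows before relying on the new column. It reuses the mappers from this answer; the "Other" label is a made-up value added only to trigger the case, and the nullable Int64 dtype is one possible choice for keeping the result as integers.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "F", "F"],
    "ethnicity": ["White", "Latino", "Other"],  # "Other" is a made-up unmapped label
})
ethnic_mapper = {"African American": 0, "White": 1, "Latino": 2, "Asian": 3}
gender_mapper = {"M": 0, "F": 1}

ethnic_int = df["ethnicity"].map(ethnic_mapper)  # NaN where the label is not a key
gender_int = df["gender"].map(gender_mapper)

# Rows whose ethnicity or gender was not found in the mappers.
unmapped = df[ethnic_int.isna() | gender_int.isna()]
print(unmapped)

# Nullable Int64 keeps mapped rows as integers and unmapped rows as <NA>.
df["gen_eth"] = (2 * ethnic_int + gender_int).astype("Int64")
print(df)
```

If unmapped is empty, every row was coded. Otherwise it lists the rows whose labels need cleaning or an extra mapper entry.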

Answer 3

Score: 0

I would use a dictionary in conjunction with df.apply().

Example:

import pandas as pd
df = pd.DataFrame({
    "gender": ["M", "M", "F", "M", "F", "F"],
    "ethnicity": ["African American", "White", "Latino", "Asian", "Latino", "Asian"]
})
ge_map = {
    "M": 0,
    "F": 1,
    "African American": 0,
    "White": 2,
    "Latino": 4, 
    "Asian": 6
}
df["gen_eth"] = df.apply(lambda x: ge_map[x["gender"]] + ge_map[x["ethnicity"]], axis=1)
print(df)

Result:

id	gender	ethnicity	gen_eth
0	M	African American	0
1	M	White	2
2	F	Latino	5
3	M	Asian	6
4	F	Latino	5
5	F	Asian	7
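
df.apply with axis=1 calls the lambda once per row, which can become slow on a large dataset like the one in the question. A possible variant, sketched here with the same ge_map dictionary and the same sample data as above, maps each column as a whole and adds the two results:

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "M", "F", "M", "F", "F"],
    "ethnicity": ["African American", "White", "Latino", "Asian", "Latino", "Asian"],
})
ge_map = {"M": 0, "F": 1, "African American": 0, "White": 2, "Latino": 4, "Asian": 6}

# Map each column as a whole instead of calling a lambda once per row.
df["gen_eth"] = df["gender"].map(ge_map) + df["ethnicity"].map(ge_map)
print(df)
```

This relies on the gender and ethnicity labels never colliding, since both columns share one dictionary. It gives the same gen_eth values as the apply version for this sample.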
