How to add labels to a pandas DataFrame column with an else condition?

Question

I have a DataFrame with a column like this:

POLITICS
BUSINESS
TRAVEL
SPORTS
....
DIVORCE
ARTS
WELLNESS
CRIME

For example:

import pandas as pd

data = [['CRIME', 10], ['BUSINESS', 15], ['SPORTS', 12], ['TRAVEL', 2], ['WELLNESS', 3], ['ARTS', 25]]

df = pd.DataFrame(data, columns=['category', 'no'])
df

I want to add a column 'label' and map four categories to labels like so:

label_dict = {'CRIME': 1, 'BUSINESS': 2, 'SPORTS': 3, 'ARTS': 4}

All of the remaining categories should then be labeled 5.
I tried the following, but I get a KeyError: 'label':

df['label'] = df['category'].apply(lambda x: label_dict[x] if x in label_dict.keys() else 5)

How can I achieve this?

Answer 1

Score: 2

Try map with the dictionary, then fill the unmatched categories with 5:

df['label'] = df['category'].map(label_dict).fillna(5).astype(int)
print(df)

# Output
   category  no  label
0     CRIME  10      1
1  BUSINESS  15      2
2    SPORTS  12      3
3    TRAVEL   2      5
4  WELLNESS   3      5
5      ARTS  25      4
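
The one-line chain above can be unpacked to see why fillna and astype(int) are both needed. This is a minimal sketch using the sample data from the question; the intermediate names mapped and filled are introduced here only for illustration.

```python
import pandas as pd

data = [['CRIME', 10], ['BUSINESS', 15], ['SPORTS', 12],
        ['TRAVEL', 2], ['WELLNESS', 3], ['ARTS', 25]]
df = pd.DataFrame(data, columns=['category', 'no'])
label_dict = {'CRIME': 1, 'BUSINESS': 2, 'SPORTS': 3, 'ARTS': 4}

# Step 1: categories missing from label_dict become NaN,
# which forces the result to a float dtype.
mapped = df['category'].map(label_dict)

# Step 2: replace the NaN entries with the fallback label 5
# (the dtype is still float at this point).
filled = mapped.fillna(5)

# Step 3: cast back to integers for the final column.
df['label'] = filled.astype(int)
print(df)
```

Without the final cast, the column would hold 1.0, 2.0, ... rather than whole-number labels.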

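Or with replace, adding a catch-all regex for the remaining categories (the | operator for merging dicts requires Python 3.9 or later):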

df['label'] = df['category'].replace(label_dict | {'.*': 5}, regex=True)

Or, as suggested by @mozway, use dict.get with a default value:

df['label'] = df['category'].map(lambda x: label_dict.get(x, 5))
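
For comparison, here is a self-contained sketch of the dict.get approach next to a variant that is not part of the original answer: passing a collections.defaultdict to map. Series.map is documented to use a dict subclass's __missing__ default instead of NaN; the names via_get, via_default and labels_with_default are hypothetical.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame(
    [['CRIME', 10], ['BUSINESS', 15], ['SPORTS', 12],
     ['TRAVEL', 2], ['WELLNESS', 3], ['ARTS', 25]],
    columns=['category', 'no'],
)
label_dict = {'CRIME': 1, 'BUSINESS': 2, 'SPORTS': 3, 'ARTS': 4}

# dict.get supplies the fallback directly, so no NaN ever appears
# and no fillna/astype step is needed.
via_get = df['category'].map(lambda x: label_dict.get(x, 5))

# Variant (an addition, not from the answer): a defaultdict defines
# __missing__, and Series.map uses that default for unknown keys.
labels_with_default = defaultdict(lambda: 5, label_dict)
via_default = df['category'].map(labels_with_default)

df['label'] = via_get
print(df)
```

Both variants keep the labels as integers throughout, which avoids the float round trip of the fillna approach.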

huangapple
  • Posted on July 10, 2023, 22:20:59
  • Please keep this link when reposting: https://go.coder-hub.com/76654672.html