How to add labels to a pandas DataFrame column with an else condition?

Question

I have a DataFrame with a column containing values like these:

    POLITICS
    BUSINESS
    TRAVEL
    SPORTS
    ....
    DIVORCE
    ARTS
    WELLNESS
    CRIME

For example:

    import pandas as pd

    data = [['CRIME', 10], ['BUSINESS', 15], ['SPORTS', 12], ['TRAVEL', 2], ['WELLNESS', 3], ['ARTS', 25]]
    df = pd.DataFrame(data, columns=['category', 'no'])
    df

I want to add a 'label' column and map four categories to labels like so:

    label_dict = {'CRIME': 1, 'BUSINESS': 2, 'SPORTS': 3, 'ARTS': 4}

All remaining categories should then be labeled 5.
I tried the following, but I get a KeyError: 'label':

    df['label'] = df['category'].apply(lambda x: label_dict[x] if x in label_dict.keys() else 5)

How can I achieve this?

Answer 1

Score: 2

Try with map:

    df['label'] = df['category'].map(label_dict).fillna(5).astype(int)
    print(df)

    # Output
       category  no  label
    0     CRIME  10      1
    1  BUSINESS  15      2
    2    SPORTS  12      3
    3    TRAVEL   2      5
    4  WELLNESS   3      5
    5      ARTS  25      4
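
To see why the `map` one-liner needs both `fillna` and `astype`, here is a minimal self-contained sketch that splits it into steps. It assumes the sample data from the question; the intermediate name `mapped` is introduced here purely for illustration. `Series.map` with a dict produces NaN for any value that is not a key, which turns the column into floats until it is cast back.

```python
import pandas as pd

data = [['CRIME', 10], ['BUSINESS', 15], ['SPORTS', 12],
        ['TRAVEL', 2], ['WELLNESS', 3], ['ARTS', 25]]
df = pd.DataFrame(data, columns=['category', 'no'])
label_dict = {'CRIME': 1, 'BUSINESS': 2, 'SPORTS': 3, 'ARTS': 4}

# Step 1: categories missing from label_dict (TRAVEL, WELLNESS) become NaN,
# so the intermediate Series has a float dtype.
mapped = df['category'].map(label_dict)

# Step 2: fill the NaN slots with the default label 5.
# Step 3: cast back to int, since the NaN step left floats behind.
df['label'] = mapped.fillna(5).astype(int)

print(df)
```

This approach stays vectorized, which is likely to matter more as the number of rows grows.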

Or with replace (the dict merge operator | requires Python 3.9 or later):

    df['label'] = df['category'].replace(label_dict | {'.*': 5}, regex=True)

Or, as suggested by @mozway:

    df['label'] = df['category'].map(lambda x: label_dict.get(x, 5))
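
The `dict.get` variant can be wrapped in a small reusable helper, sketched below. The function name `add_labels` and its parameters (`default`, `source`, `target`) are hypothetical and not part of pandas or of the original answer. Because `dict.get` supplies the fallback directly, no NaN ever appears and no `fillna` or `astype` step is needed.

```python
import pandas as pd


def add_labels(frame, mapping, default=5, source='category', target='label'):
    """Return a copy of `frame` with a `target` column derived from `source`.

    Values found in `mapping` get their mapped label; all others get `default`.
    (Hypothetical helper for illustration only.)
    """
    out = frame.copy()
    # dict.get returns `default` when the key is absent, acting as the "else" branch.
    out[target] = out[source].map(lambda value: mapping.get(value, default))
    return out


data = [['CRIME', 10], ['BUSINESS', 15], ['SPORTS', 12],
        ['TRAVEL', 2], ['WELLNESS', 3], ['ARTS', 25]]
df = pd.DataFrame(data, columns=['category', 'no'])
label_dict = {'CRIME': 1, 'BUSINESS': 2, 'SPORTS': 3, 'ARTS': 4}

labeled = add_labels(df, label_dict)
print(labeled)
```

Returning a copy leaves the original `df` untouched, which is a design choice made here for safety; assigning the column in place would work equally well.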

huangapple
  • Published on July 10, 2023, 22:20:59
  • When reposting, please retain the link to this article: https://go.coder-hub.com/76654672.html