Kaggle Data Clean Up

Question

I am trying to clean unwanted values from my dataset. I am currently working on the gender column, which has a lot of 'joke' answers that I want to remove, but I only know how to remove them one by one. Is there a more efficient way to clean the data so that I am left with just Male and Female?

Here is what the unique values currently look like:

['Male' 'Female' 'Nonbinary' 'Nonbinary woman' 'Trans male' 'human'
'Trans guy' 'Athlet' 'Living being ' 'I am a meat popsicle' 'Transmale'
'Transmasculine Genderqueer' 'Nonbinary girl' 'Diverse and flexible'
'He/Him' 'Male teen' 'PINEAPPLE (rpz le meilleur sex)' 'Agender' 'Mixed'
'Gamer' 'trans man' 'Female High-schooler' 'Trans female' 'Trans Female'
'trans male' 'a human person :) (male)' "this question doesn't matter"
'A guy who is determined to learn web development' 'non-binary woman'
'student?' 'Male, just Male' 'questioning '
'possibly a descendant of a Norse God.' 'Ajith' 'Helicopter '
"I'm human. What else matters?" 'bi sexual' 'transguy' 'Trans Girl'
'Carbon-15' 'Life'
'I am a woman but "female" and "male" are not the best terms - you should use man or woman since you\'re asking about gender not biological sex. Also, the term "female" is often used in a dehumanizing manner. '
'pre anything MtF' 'transman' 'Motiviert, zielstrebend und zuverlässig'
'demigirl' 'My gender is: Apache Helicopter' 'There are just two genders'
'WTF' 'Transgender man'
'Male but I like how you phrased that question :P' '(Trans) Woman'
'Trans-NB' 'transgender' 'Cyborg' 'Attack Helicopter' 'Alpha ;)'
'Straight forward lol.' 'Genderfluid'
'I don\'t "think of myself" as anything, I am a male.'
"LOL how dumb. Why do people get sucked into this nonsense? I'm giraffe okay!?"
'A Creative, compassionate citizen of the Global Garden.' 'bi'
'Bigender She/her he/him' 'Genetically and scientifically male'
'Am a human not an alien' 'Trans Masc ' 'attack helicopter'
'Michelle "Big Mike" Obama' 'Homosexual']

I have tried:

df_clean = df_clean[df_clean["Gender"] == 'Male' or 'Female']

but that doesn't work with both values on the same line, and when I put them on two separate lines it removes all the rows.
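A minimal reproduction of the two failures described above, using a small hypothetical DataFrame (the sample values are made up, not the real Kaggle data). It shows that `or` raises a `ValueError` because pandas will not reduce a whole Series to a single True/False, and that two filters applied one after another leave nothing, since no row that survived the "Male" filter can also equal "Female".

```python
import pandas as pd

# Hypothetical sample data standing in for the real Kaggle "Gender" column.
df_clean = pd.DataFrame(
    {"Gender": ["Male", "Female", "Nonbinary", "Attack Helicopter", "Male"]}
)

# 1) `==` binds tighter than `or`, so this evaluates (Series == "Male") or "Female".
#    `or` needs a single True/False from the whole Series, and pandas refuses.
try:
    df_clean[df_clean["Gender"] == "Male" or "Female"]
    or_raised = False
except ValueError as exc:
    or_raised = True
    print("ValueError:", exc)

# 2) Filtering twice in a row: the second filter only sees the rows the first
#    one kept (all "Male"), so no row can also equal "Female".
only_male = df_clean[df_clean["Gender"] == "Male"]
male_then_female = only_male[only_male["Gender"] == "Female"]
print(len(male_then_female))  # 0
```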

Answer 1

Score: 0

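You're going to want to use | instead of or when filtering DataFrames. `or` tries to reduce the whole comparison Series to a single True/False, which pandas refuses to do, while | combines the two boolean masks row by row. Wrap each comparison in parentheses, because | binds more tightly than ==: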

df_clean = df_clean[(df_clean["Gender"] == 'Male') | (df_clean["Gender"] == 'Female')]
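A minimal sketch of the filter above on a small hypothetical DataFrame (the rows are made up for illustration), alongside `Series.isin`, a standard pandas method that expresses the same "keep only these values" check as a list. Both produce the same rows, so `isin` may be easier to extend if more allowed values are needed later.

```python
import pandas as pd

# Hypothetical sample data; the real column has many more free-text answers.
df_clean = pd.DataFrame(
    {"Gender": ["Male", "Female", "Nonbinary", "Attack Helicopter", "Male", "Trans female"]}
)

# Element-wise OR of two boolean masks; each comparison needs its own
# parentheses because `|` binds tighter than `==` in Python.
mask = (df_clean["Gender"] == "Male") | (df_clean["Gender"] == "Female")
kept_with_or = df_clean[mask]

# Equivalent filter with Series.isin, which takes a list of allowed values.
kept_with_isin = df_clean[df_clean["Gender"].isin(["Male", "Female"])]

print(kept_with_or["Gender"].tolist())  # ['Male', 'Female', 'Male']
print(kept_with_or.equals(kept_with_isin))  # True
```

Note that only exact matches are kept: an answer such as 'Male, just Male' or 'Trans female' is dropped by either version.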

You can take a look at the pandas documentation on boolean indexing for more info.

huangapple
  • This article was published on March 21, 2023 03:42:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75794619.html