Kaggle Data Clean Up
Question
I am trying to clean unwanted values from my dataset. Right now I am working on the Gender column, which has a lot of 'joke' answers that I want to remove, but I only know how to remove them one by one. Is there a more efficient way to clean the data so that I am left with just Male and Female?
Here is what the unique values currently look like:
['Male' 'Female' 'Nonbinary' 'Nonbinary woman' 'Trans male' 'human'
'Trans guy' 'Athlet' 'Living being ' 'I am a meat popsicle' 'Transmale'
'Transmasculine Genderqueer' 'Nonbinary girl' 'Diverse and flexible'
'He/Him' 'Male teen' 'PINEAPPLE (rpz le meilleur sex)' 'Agender' 'Mixed'
'Gamer' 'trans man' 'Female High-schooler' 'Trans female' 'Trans Female'
'trans male' 'a human person :) (male)' "this question doesn't matter"
'A guy who is determined to learn web development' 'non-binary woman'
'student?' 'Male, just Male' 'questioning '
'possibly a descendant of a Norse God.' 'Ajith' 'Helicopter '
"I'm human. What else matters?" 'bi sexual' 'transguy' 'Trans Girl'
'Carbon-15' 'Life'
'I am a woman but "female" and "male" are not the best terms - you should use man or woman since you\'re asking about gender not biological sex. Also, the term "female" is often used in a dehumanizing manner. '
'pre anything MtF' 'transman' 'Motiviert, zielstrebend und zuverlässig'
'demigirl' 'My gender is: Apache Helicopter' 'There are just two genders'
'WTF' 'Transgender man'
'Male but I like how you phrased that question :P' '(Trans) Woman'
'Trans-NB' 'transgender' 'Cyborg' 'Attack Helicopter' 'Alpha ;)'
'Straight forward lol.' 'Genderfluid'
'I don\'t "think of myself" as anything, I am a male.'
"LOL how dumb. Why do people get sucked into this nonsense? I'm giraffe okay!?"
'A Creative, compassionate citizen of the Global Garden.' 'bi'
'Bigender She/her he/him' 'Genetically and scientifically male'
'Am a human not an alien' 'Trans Masc ' 'attack helicopter'
'Michelle "Big Mike" Obama' 'Homosexual']
I have tried to do:
df_clean = df_clean[df_clean["Gender"] == 'Male' or 'Female']
but I can't get both conditions to work on the same line, and when I put them on two separate lines it just removes everything.
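A minimal sketch of why that attempt fails, using a small made-up DataFrame in place of the real Kaggle data (the toy values are an assumption). Python parses the expression as (df_clean["Gender"] == 'Male') or 'Female', and `or` has to call bool() on the resulting boolean Series, which pandas refuses with a ValueError. Filtering on two separate lines fails differently: the second filter only sees rows that are already all 'Male', so none of them can equal 'Female' and the result is empty.

```python
import pandas as pd

# Hypothetical toy data standing in for the real survey column.
df_clean = pd.DataFrame({"Gender": ["Male", "Female", "Attack Helicopter", "Male"]})

# 1) `or` forces bool() on a whole boolean Series, which pandas treats as ambiguous.
raised = False
try:
    df_clean[df_clean["Gender"] == "Male" or "Female"]
except ValueError as err:
    raised = True
    print("ValueError:", err)

# 2) Two separate filters: step1 keeps only 'Male' rows, so step2 can never
#    find a 'Female' row among them and ends up empty.
step1 = df_clean[df_clean["Gender"] == "Male"]
step2 = step1[step1["Gender"] == "Female"]
print(len(step1), len(step2))
```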
Answer 1
Score: 0
You're going to want to use the | operator instead of or when filtering DataFrames:
df_clean = df_clean[(df_clean["Gender"] == 'Male') | (df_clean["Gender"] == 'Female')]
You can take a look at the pandas documentation on boolean indexing for more info.
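A minimal sketch comparing that filter with Series.isin, a swapped-in alternative that scales better when you want to keep more than two values, since you list them once instead of chaining | comparisons. The sample values and the keep list are assumptions for illustration. Note that the parentheses in the | version are required because | binds more tightly than == in Python, and that both versions match exactly, so variants such as 'Male teen' are dropped as well.

```python
import pandas as pd

# Hypothetical sample of the survey column (a few values from the question).
df_clean = pd.DataFrame(
    {"Gender": ["Male", "Female", "Nonbinary", "Attack Helicopter", "Male teen", "Female"]}
)

# Version from the answer: each comparison wrapped in its own parentheses.
with_or = df_clean[(df_clean["Gender"] == "Male") | (df_clean["Gender"] == "Female")]

# Equivalent filter with isin: keep rows whose value appears in the allowed list.
keep = ["Male", "Female"]
with_isin = df_clean[df_clean["Gender"].isin(keep)]

print(with_isin["Gender"].tolist())  # ['Male', 'Female', 'Female']
print(with_or.equals(with_isin))     # True
```

Both produce the same rows; isin simply moves the allowed values into one list you can extend.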