How to check if a pandas data frame column contains any value from a list and return that value
Question
I have a list of countries:
countries = ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola", "Antigua and Barbuda", "Argentina", "Armenia", "Austria", "Azerbaijan", "Bahrain", "Bangladesh", "Barbados", "Belarus", "Belgium", "Belize", "Benin", "Bhutan", "Bolivia", "Bosnia and Herzegovina", "Botswana", "Brazil", "Brunei", "Bulgaria", "Burkina Faso", "Burundi", "Cabo Verde", "Cambodia", "Cameroon", "Canada", "Central African Republic", "Chad", "Channel Islands", "Chile", "China", "Colombia", "Comoros", "Congo", "Costa Rica", "Côte d'Ivoire", "Croatia", "Cuba", "Cyprus", "Czech Republic", "Denmark", "Djibouti", "Dominica", "Dominican Republic", "DR Congo", "Ecuador", "Egypt", "El Salvador", "Equatorial Guinea", "Eritrea", "Estonia", "Eswatini", "Ethiopia", "Faeroe Islands", "Finland", "France", "French Guiana", "Gabon", "Gambia", "Georgia", "Germany", "Ghana", "Gibraltar", "Greece", "Grenada", "Guatemala", "Guinea", "Guinea-Bissau", "Guyana", "Haiti", "Holy See", "Honduras", "Hong Kong", "Hungary", "Iceland", "India", "Indonesia", "Iran", "Iraq", "Ireland", "Isle of Man", "Israel", "Italy", "Jamaica", "Japan", "Jordan", "Kazakhstan", "Kenya", "Kuwait", "Kyrgyzstan", "Laos", "Latvia", "Lebanon", "Lesotho", "Liberia", "Libya", "Liechtenstein", "Lithuania", "Luxembourg", "Macao", "Madagascar", "Malawi", "Malaysia", "Maldives", "Mali", "Malta", "Mauritania", "Mauritius", "Mayotte", "Mexico", "Moldova", "Monaco", "Mongolia", "Montenegro", "Morocco", "Mozambique", "Myanmar", "Namibia", "Nepal", "Netherlands", "Nicaragua", "Niger", "Nigeria", "North Korea", "North Macedonia", "Norway", "Oman", "Pakistan", "Panama", "Paraguay", "Peru", "Philippines", "Poland", "Portugal", "Qatar", "Réunion", "Romania", "Russia", "Rwanda", "Saint Helena", "Saint Kitts and Nevis", "Saint Lucia", "Saint Vincent and the Grenadines", "San Marino", "Sao Tome & Principe", "Saudi Arabia", "Senegal", "Serbia", "Seychelles", "Sierra Leone", "Singapore", "Slovakia", "Slovenia", "Somalia", "South Africa", "South Korea", "South Sudan", "Spain", "Sri Lanka", "State of Palestine", "Sudan", "Suriname", "Sweden", "Switzerland", "Syria", "Taiwan", "Tajikistan", "Tanzania", "Thailand", "The Bahamas", "Timor-Leste", "Togo", "Trinidad and Tobago", "Tunisia", "Turkey", "Turkmenistan", "Uganda", "Ukraine", "United Arab Emirates", "United Kingdom", "United States", "Uruguay", "Uzbekistan", "Venezuela", "Vietnam", "Western Sahara", "Yemen", "Zambia", "Zimbabwe"]
I also have this pandas dataframe:
import pandas as pd
import numpy as np
ds1 = {'remarks':["DOB 21 Mar 1974; POB Baghdad, Iraq.","DOB 26 Mar 1969; POB Tunis, Tunisia; Italian Fiscal Code TLLLHR69C26Z352G.","DOB 10 Jun 1970; POB Tunis, Tunisia; nationality Tunisia; Passport L550681 issued 23 Sep 1997 expires 22 Sep 2002; Italian Fiscal Code WDDHBB70H10Z352O."], "Latitude" : [-23.69057,-23.41165,-23.51482]}
df1 = pd.DataFrame(data=ds1)
The dataframe looks like this:
print(df1)
I need to:
- check whether the column remarks contains any of the countries in the list called countries;
- if so, create a new column (called country) which contains the name of the matched country.
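The first step on its own (the yes/no check) maps onto Series.str.contains with a single regex alternation built from the list. A minimal sketch, assuming a shortened countries list and a made-up third remark that mentions no listed country:

```python
import re
import pandas as pd

# Hypothetical shortened list; the full countries list from the question works the same way
countries = ["Iraq", "Tunisia"]

df1 = pd.DataFrame({
    "remarks": [
        "DOB 21 Mar 1974; POB Baghdad, Iraq.",
        "DOB 26 Mar 1969; POB Tunis, Tunisia; Italian Fiscal Code TLLLHR69C26Z352G.",
        "DOB 01 Jan 1980; POB Paris.",  # hypothetical row with no listed country
    ]
})

# Join the names into one regex alternation; re.escape protects any special characters
pattern = "|".join(map(re.escape, countries))

# Boolean mask: True where the remark mentions at least one listed country
has_country = df1["remarks"].str.contains(pattern, regex=True)
print(has_country.tolist())  # [True, True, False]
```

The same alternation string is what str.extract() needs for the second step; it only has to be wrapped in a capturing group so the matched name is returned.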
From the example above, the resulting dataframe would look like this:
Can anyone help me please?
Answer 1
Score: 1
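str.extract() works well for problems like this: join the country names into a single regex alternation, wrap it in one capturing group, and extract the first match from each remark (rows with no match get NaN).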
df1['country'] = df1['remarks'].str.extract(("(" + "|".join(countries) +")"), expand=False)
print(df1)
remarks Latitude country
0 DOB 21 Mar 1974; POB Baghdad, Iraq. -23.69057 Iraq
1 DOB 26 Mar 1969; POB Tunis, FakeCountry; Itali... -23.41165 NaN #I replaced the Country here to show a test case
2 DOB 10 Jun 1970; POB Tunis, Tunisia; nationali... -23.51482 Tunisia
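One caveat: a regex alternation is tried left to right at each position, and the countries list is alphabetical, so "Niger" comes before "Nigeria", "Guinea" before "Guinea-Bissau" and "Dominica" before "Dominican Republic". With the pattern above, a remark saying "Nigeria" would yield "Niger", and a name can also match inside a longer word ("India" inside "Indiana"). Below is a sketch of a more defensive pattern, assuming a hypothetical shortened list and made-up remarks: names are sorted longest first, escaped with re.escape, and wrapped in \b word boundaries. It also uses Series.str.findall to collect every distinct country when a remark mentions more than one (the all_countries column name is illustrative).

```python
import re
import pandas as pd

# Hypothetical shortened list that deliberately contains overlapping names
countries = ["Dominica", "Dominican Republic", "Guinea", "Guinea-Bissau",
             "Niger", "Nigeria", "Iraq", "Jordan"]

# Made-up remarks for illustration
df1 = pd.DataFrame({
    "remarks": [
        "DOB 21 Mar 1974; POB Baghdad, Iraq.",
        "POB Lagos, Nigeria.",
        "POB Santo Domingo, Dominican Republic.",
        "POB Bissau, Guinea-Bissau; nationality Guinea.",
        "POB Amman, Jordan; nationality Iraq.",
        "POB Tunis, FakeCountry.",
    ]
})

# Longest names first, so "Guinea-Bissau" is tried before its prefix "Guinea"
ordered = sorted(countries, key=len, reverse=True)
# \b stops matches inside longer words; re.escape handles any special characters
pattern = r"\b(" + "|".join(map(re.escape, ordered)) + r")\b"

# First match per row (NaN when nothing matches)
df1["country"] = df1["remarks"].str.extract(pattern, expand=False)

# Every distinct match per row, in order of appearance (dict.fromkeys drops repeats)
df1["all_countries"] = df1["remarks"].str.findall(pattern).apply(
    lambda found: list(dict.fromkeys(found))
)
print(df1[["country", "all_countries"]])
```

Sorting alone fixes the prefix problem, and the \b boundaries handle names embedded in longer words; the pattern is built once, so both are cheap. If remarks can contain NaN, str.findall returns NaN for those rows and the lambda would need a guard (for example, returning [] when the value is not a list).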