Extracting a data frame filtered by country names from a larger data frame in R
Question
I have a large (circa 9500 rows) untidy dataset containing country names along with several variables and numerical output. I've made an example of the data frame as such:
```r
country1 <- c("Arab World", "Caribbean small states", "Central Europe and the Baltics", "Australia", "Brazil", "Sweden")
indicator1 <- c("Age at first marriage, female", "Age at first marriage, male", "Birth rate, crude (per 1,000 people)", "Death rate, crude (per 1,000 people)", "Fertility rate, total (births per woman)", "Hospital beds (per 1,000 people)")
year1 <- c(1960, 1961, 1962, 1963, 1964, 1965)
test <- data.frame(country = country1, indicator = indicator1, year = year1)
```
I need to extract a smaller data frame from this that keeps only rows for individual countries, e.g. "Sweden", and excludes agglomerations of countries, e.g. "Central Europe".
Would appreciate any assistance in this matter. I am quite new to R so not really sure where to begin, but I would imagine that I would first need to create a new data frame containing rows of all possible country names and then do a left join with my above test data frame. How would I go about getting that initial df of all countries?
Thanks.
Answer 1
Score: 1
You can either create your own list of valid country names or try extracting one from the {maps} package:
```r
library(maps)
x <- map("world", plot = FALSE)
country_list <- x$names
```
I'd recommend manually inspecting to see if this list is up to date enough for your data.
Then subset based on this list of countries:
```r
test_countries <- test[test$country %in% country_list, ]
```
which gives:
```
    country                                indicator year
4 Australia     Death rate, crude (per 1,000 people) 1963
5    Brazil Fertility rate, total (births per woman) 1964
6    Sweden         Hospital beds (per 1,000 people) 1965
```
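For the manual inspection step, it can help to list which names in the data were not matched, so that misspellings or alternate spellings of real countries are not dropped silently. The following is only a sketch in base R: `valid_countries` is a hypothetical hand-written list standing in for whatever reference list you end up using, and `dropped` is just an illustrative name.

```r
# Example data from the question
test <- data.frame(
  country = c("Arab World", "Caribbean small states",
              "Central Europe and the Baltics", "Australia", "Brazil", "Sweden"),
  indicator = c("Age at first marriage, female", "Age at first marriage, male",
                "Birth rate, crude (per 1,000 people)",
                "Death rate, crude (per 1,000 people)",
                "Fertility rate, total (births per woman)",
                "Hospital beds (per 1,000 people)"),
  year = c(1960, 1961, 1962, 1963, 1964, 1965)
)

# Hypothetical reference list (an assumption for illustration only)
valid_countries <- c("Australia", "Brazil", "Norway", "Sweden")

# Keep only rows whose country appears in the reference list
test_countries <- test[test$country %in% valid_countries, ]

# Names present in the data but absent from the reference list;
# review these by hand before trusting the filtered result
dropped <- setdiff(unique(test$country), valid_countries)
print(dropped)
```

With the example data, `dropped` contains only the three aggregate names. On the real dataset, anything in it that looks like an actual country points to a spelling mismatch between the data and the reference list.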
Answer 2
Score: 1
Alternatively, use `map_df`, which builds the subset as a data frame, or simply use `filter`. This assumes you have a vector `nam` whose elements are the countries you want to keep, which you can use to subset the data frame:
```r
library(tidyverse)
nam <- c('Brazil', 'Australia')
# one filter per name, row-bound in the order of `nam`
new_df <- map_df(nam, \(x) test %>% filter(country == x))
# or, more simply (rows stay in the order of `test`)
new_df <- test %>% filter(country %in% nam)
```
The `map_df` version gives:
```
    country                                indicator year
1    Brazil Fertility rate, total (births per woman) 1964
2 Australia     Death rate, crude (per 1,000 people) 1963
```
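The question proposed a left join against a table of country names. As a rough sketch of that idea in base R, assuming a hypothetical one-column reference table called `countries`, the comparison below suggests why an inner join is the one that filters: a left join keeps every row of `test`, aggregates included.

```r
# Example data from the question
test <- data.frame(
  country = c("Arab World", "Caribbean small states",
              "Central Europe and the Baltics", "Australia", "Brazil", "Sweden"),
  indicator = c("Age at first marriage, female", "Age at first marriage, male",
                "Birth rate, crude (per 1,000 people)",
                "Death rate, crude (per 1,000 people)",
                "Fertility rate, total (births per woman)",
                "Hospital beds (per 1,000 people)"),
  year = c(1960, 1961, 1962, 1963, 1964, 1965)
)

# Hypothetical reference table of individual countries (illustration only)
countries <- data.frame(country = c("Australia", "Brazil", "Sweden"))

# Inner join (merge's default): keeps only rows of `test` with a match
inner <- merge(test, countries, by = "country")

# Left join: keeps all rows of `test`, so nothing is filtered out
left <- merge(test, countries, by = "country", all.x = TRUE)

print(nrow(inner))
print(nrow(left))
```

Because `countries` holds nothing but the key column, the inner join is equivalent to the `%in%` filter here. Note that `merge` sorts the result by the key column by default.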