How to compare list elements and keep duplicates?
I have a large list of redditors, one data frame per user:
str(list_map_descr)
List of 4570
$ Europa_Teles_BTR :'data.frame': 1916 obs. of 2 variables:
..$ subreddit: chr [1:1916] "portugal" "Warthunder" "Warthunder" "portugal" ...
..$ date_utc : chr [1:1916] "2020-05-30" "2020-05-30" "2020-05-30" "2020-05-30" ...
$ growmylife :'data.frame': 92 obs. of 2 variables:
..$ subreddit: chr [1:92] "PsoriaticArthritis" "Telegram" "google" "Notion" ...
..$ date_utc : chr [1:92] "2021-06-27" "2021-01-04" "2020-12-14" "2020-10-01" ...
$ fzncdata :'data.frame': 182 obs. of 2 variables:
..$ subreddit: chr [1:182] "a:t5_39x4c" "nba" "NEET" "NEET" ...
..$ date_utc : chr [1:182] "2019-06-21" "2019-06-11" "2019-06-09" "2019-04-30" ...
I also have a transformed, filtered version of this list for my analysis:
str(list_map_date_o_2_1)
List of 2132
$ Europa_Teles_BTR :'data.frame': 562 obs. of 4 variables:
..$ subreddit : chr [1:562] "Warthunder" "Warthunder" "Warthunder" "Warthunder" ...
..$ date_utc : Date[1:562], format: "2020-05-30" "2020-05-30" "2020-05-29" ...
..$ Posts_stop: num [1:562] NA NA NA NA NA NA NA NA NA NA ...
..$ Posts_game: num [1:562] 1 1 1 1 1 1 1 1 1 1 ...
$ growmylife :'data.frame': 37 obs. of 4 variables:
..$ subreddit : chr [1:37] "RocketLeague" "DaysGone" "StopGaming" "StopGaming" ...
..$ date_utc : Date[1:37], format: "2020-09-23" "2020-04-04" "2019-10-10" ...
..$ Posts_stop: num [1:37] NA NA 1 1 1 1 1 1 1 1 ...
..$ Posts_game: num [1:37] 1 1 NA NA NA NA NA NA NA NA ...
$ fzncdata :'data.frame': 15 obs. of 4 variables:
..$ subreddit : chr [1:15] "DotA2" "GlobalOffensive" "DotA2" "DotA2" ...
..$ date_utc : Date[1:15], format: "2019-03-30" "2019-03-02" "2018-11-28" ...
..$ Posts_stop: num [1:15] NA NA NA NA NA NA NA NA NA NA ...
..$ Posts_game: num [1:15] 1 1 1 1 1 1 1 1 1 1 ...
I now want to filter my old list list_map_descr by the elements of the new list list_map_date_o_2_1.
I thought it could be tricky that the lists share their element names but have different variables in their data frames, so I first extracted only the names of the elements:
words <- as.list(names(list_map_date_o_2_1))
Then I tried every variation of filter, keep, and lapply that I could think of, e.g.:
list_map_descr_test_3 <- map(list_map_descr, ~filter(words %in% .x))
list_map_descr_test_2 <- map(list_map_descr, ~ filter(.x, .x %in% words))
list_map_descr_test_2 <- map(list_map_descr, ~ keep(any(.x %in% words == TRUE)))
list_map_descr_test_2 <- mapply(function(x, y) x %in% y, list_map_descr, words, SIMPLIFY=FALSE)
list_map_descr_test_2 <- lapply(function(x, y) x %in% y, list_map_descr, words, SIMPLIFY=FALSE)
list_map_descr_test_2 <- purrr::keep(list_map_descr, ~.x %in% words == TRUE)
None of this worked. I think the problem is that I am not trying to filter by a value; instead, I want R to compare the element names of the two lists, and I cannot implement that with my approaches.
I would expect this:
str(list_map_descr_test_3)
List of 2132
$ Europa_Teles_BTR :'data.frame': 1916 obs. of 2 variables:
..$ subreddit: chr [1:1916] "portugal" "Warthunder" "Warthunder" "portugal" ...
..$ date_utc : chr [1:1916] "2020-05-30" "2020-05-30" "2020-05-30" "2020-05-30" ...
$ growmylife :'data.frame': 92 obs. of 2 variables:
..$ subreddit: chr [1:92] "PsoriaticArthritis" "Telegram" "google" "Notion" ...
..$ date_utc : chr [1:92] "2021-06-27" "2021-01-04" "2020-12-14" "2020-10-01" ...
$ fzncdata :'data.frame': 182 obs. of 2 variables:
..$ subreddit: chr [1:182] "a:t5_39x4c" "nba" "NEET" "NEET" ...
..$ date_utc : chr [1:182] "2019-06-21" "2019-06-11" "2019-06-09" "2019-04-30" ...
I am very grateful for any suggestion!
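Answer 1
Score: 1
We can subset the old list by the names that the two lists share:
# dplyr is optional here; base R's intersect() gives the same result on character vectors
library(dplyr)
# names that occur in both lists
nm1 <- intersect(names(list_map_descr), names(list_map_date_o_2_1))
# keep only those elements of the old list; the data frames themselves are unchanged
list_map_new <- list_map_descr[nm1]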
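The attempts in the question most likely failed because map() and keep() hand each data frame, not its name, to the function, so `.x %in% words` compares the data frame's columns with the user names. Below is a minimal sketch of the name-based approach on toy data. The lists old_list and new_list and the users user_a, user_b, user_c are hypothetical stand-ins for the real lists, and only base R is used.

```r
# Toy stand-ins for the two lists (hypothetical data, not the real redditor lists).
old_list <- list(
  user_a = data.frame(subreddit = c("portugal", "Warthunder")),
  user_b = data.frame(subreddit = "nba"),
  user_c = data.frame(subreddit = c("NEET", "google", "Notion"))
)
new_list <- list(
  user_a = data.frame(subreddit = "Warthunder", Posts_game = 1),
  user_c = data.frame(subreddit = "NEET", Posts_game = NA)
)

# Names present in both lists.
shared <- intersect(names(old_list), names(new_list))

# Single-bracket subsetting by name returns a list that holds the
# original, unmodified data frames from old_list.
filtered <- old_list[shared]

# Equivalent form with a logical index built from the names.
filtered_alt <- old_list[names(old_list) %in% names(new_list)]

print(names(filtered))
```

Because the comparison is made on names(old_list) rather than on the data frames, the differing columns in the two lists do not matter.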