February 19, 2023, 22:10:46

How to compare list elements and keep duplicates?

Question

I have a large list of redditors:

str(list_map_descr)
List of 4570
 $ Europa_Teles_BTR    : 'data.frame':	1916 obs. of  2 variables:
  ..$ subreddit: chr [1:1916] "portugal" "Warthunder" "Warthunder" "portugal" ...
  ..$ date_utc : chr [1:1916] "2020-05-30" "2020-05-30" "2020-05-30" "2020-05-30" ...
 $ growmylife          : 'data.frame':	92 obs. of  2 variables:
  ..$ subreddit: chr [1:92] "PsoriaticArthritis" "Telegram" "google" "Notion" ...
  ..$ date_utc : chr [1:92] "2021-06-27" "2021-01-04" "2020-12-14" "2020-10-01" ...
 $ fzncdata            : 'data.frame':	182 obs. of  2 variables:
  ..$ subreddit: chr [1:182] "a:t5_39x4c" "nba" "NEET" "NEET" ...
  ..$ date_utc : chr [1:182] "2019-06-21" "2019-06-11" "2019-06-09" "2019-04-30" ...

and a transformed, filtered version of this list for my analysis:

str(list_map_date_o_2_1)
List of 2132
 $ Europa_Teles_BTR    : 'data.frame':	562 obs. of  4 variables:
  ..$ subreddit : chr [1:562] "Warthunder" "Warthunder" "Warthunder" "Warthunder" ...
  ..$ date_utc  : Date[1:562], format: "2020-05-30" "2020-05-30" "2020-05-29" ...
  ..$ Posts_stop: num [1:562] NA NA NA NA NA NA NA NA NA NA ...
  ..$ Posts_game: num [1:562] 1 1 1 1 1 1 1 1 1 1 ...
 $ growmylife          : 'data.frame':	37 obs. of  4 variables:
  ..$ subreddit : chr [1:37] "RocketLeague" "DaysGone" "StopGaming" "StopGaming" ...
  ..$ date_utc  : Date[1:37], format: "2020-09-23" "2020-04-04" "2019-10-10" ...
  ..$ Posts_stop: num [1:37] NA NA 1 1 1 1 1 1 1 1 ...
  ..$ Posts_game: num [1:37] 1 1 NA NA NA NA NA NA NA NA ...
 $ fzncdata            : 'data.frame':	15 obs. of  4 variables:
  ..$ subreddit : chr [1:15] "DotA2" "GlobalOffensive" "DotA2" "DotA2" ...
  ..$ date_utc  : Date[1:15], format: "2019-03-30" "2019-03-02" "2018-11-28" ...
  ..$ Posts_stop: num [1:15] NA NA NA NA NA NA NA NA NA NA ...
  ..$ Posts_game: num [1:15] 1 1 1 1 1 1 1 1 1 1 ...

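I now want to filter my old list list_map_descr by the elements of the new list list_map_date_o_2_1, keeping only the entries whose names also appear in the new list.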

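I thought this could be tricky because the lists share their element names but have different variables in their data frames, so I first extracted only the names of the elements: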

words <- as.list(names(list_map_date_o_2_1))

Then I tried every version of filter, keep, and lapply that I could think of, e.g.:

list_map_descr_test_3 <- map(list_map_descr, ~filter(words %in% .x))
list_map_descr_test_2 <- map(list_map_descr, ~ filter(.x, .x %in% words))
list_map_descr_test_2 <- map(list_map_descr, ~ keep(any(.x %in% words == TRUE)))
list_map_descr_test_2 <- mapply(function(x, y) x %in% y, list_map_descr, words, SIMPLIFY=FALSE)
list_map_descr_test_2 <- lapply(function(x, y) x %in% y, list_map_descr, words, SIMPLIFY=FALSE)
list_map_descr_test_2 <- purrr::keep(list_map_descr, ~.x %in% words == TRUE)

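None of these worked. I think the problem is that I am not trying to filter by a value; instead I want to tell R to compare the element names of the two lists, and I cannot implement that with my approaches.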

I would expect this:

str(list_map_descr_test_3)
List of 2132
 $ Europa_Teles_BTR    : 'data.frame':	1916 obs. of  2 variables:
  ..$ subreddit: chr [1:1916] "portugal" "Warthunder" "Warthunder" "portugal" ...
  ..$ date_utc : chr [1:1916] "2020-05-30" "2020-05-30" "2020-05-30" "2020-05-30" ...
 $ growmylife          : 'data.frame':	92 obs. of  2 variables:
  ..$ subreddit: chr [1:92] "PsoriaticArthritis" "Telegram" "google" "Notion" ...
  ..$ date_utc : chr [1:92] "2021-06-27" "2021-01-04" "2020-12-14" "2020-10-01" ...
 $ fzncdata            : 'data.frame':	182 obs. of  2 variables:
  ..$ subreddit: chr [1:182] "a:t5_39x4c" "nba" "NEET" "NEET" ...
  ..$ date_utc : chr [1:182] "2019-06-21" "2019-06-11" "2019-06-09" "2019-04-30" ...

I am very grateful for any suggestion!


Answer 1

Score: 1

We may use

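library(dplyr)  # loaded as in the original answer; base R's intersect() also works on character vectors
# Names that occur in both lists
nm1 <- intersect(names(list_map_descr), names(list_map_date_o_2_1))
# Subset the original list by those names
list_map_new <- list_map_descr[nm1]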
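
To see why this works where the attempts in the question did not: functions such as purrr::keep() pass each element's value (here, a data frame) to the predicate, not its name, so the comparison has to be made on names() directly. Below is a minimal base-R sketch on toy data. The user_a/user_b/user_c entries are made-up stand-ins, not the asker's data, and list_map_alt is a hypothetical name for an equivalent logical-index variant.

```r
# Toy stand-ins for the two lists (hypothetical data, not the asker's)
list_map_descr <- list(
  user_a = data.frame(subreddit = c("portugal", "Warthunder"),
                      date_utc  = c("2020-05-30", "2020-05-30")),
  user_b = data.frame(subreddit = "nba",  date_utc = "2019-06-11"),
  user_c = data.frame(subreddit = "NEET", date_utc = "2019-06-09")
)
list_map_date_o_2_1 <- list(
  user_c = data.frame(subreddit = "DotA2"),
  user_a = data.frame(subreddit = "Warthunder")
)

# Names present in both lists, in the order of the first argument
nm1 <- intersect(names(list_map_descr), names(list_map_date_o_2_1))

# Subsetting by name keeps the original (untransformed) data frames
list_map_new <- list_map_descr[nm1]

# Equivalent variant using a logical index on the names
list_map_alt <- list_map_descr[names(list_map_descr) %in% names(list_map_date_o_2_1)]

names(list_map_new)                    # "user_a" "user_c"
identical(list_map_new, list_map_alt)  # TRUE
```

Both forms return the matching elements of list_map_descr in their original order; the data frames inside are left untouched, which matches the expected output shown in the question.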


