Writing test conditions in a map_if function: apply a function to all data frames with a column that includes specific values

Question

Once again I'm struggling with purrr's map functions.

I've got a list of data frames, all with ID and Name columns.

I want to recode and then aggregate the rows that have certain ID values.
For that purpose, I've got another data frame that maps each ID to a new ID (IDagr); I want to replace the IDs before aggregating (summing all numeric variables).

I know how to do this on one data frame (see II/), but I don't know what test to write in a map_if call to apply those operations to every data frame whose ID column includes some values of newIDdf$ID (here data frames B and C).

Any ideas?

    library(dplyr)
    library(purrr)
    library(stringr)

    ## I/ 2 objects
    # a list of df
    list_df <- list(A = data.frame(ID = c("a", "b", "c", "Z", "Y"),
                                   Name = c("a_name", "b_name", "c_name", "Z_name", "Y_name"),
                                   Var1 = rnorm(5),
                                   Var2 = rnorm(5),
                                   Var3 = rnorm(5)),
                    B = data.frame(ID = c("a", "b", "z1", "z2", "z3"),
                                   Name = c("a_name", "b_name", "z1_name", "z2_name", "z3_name"),
                                   Var1 = rnorm(5),
                                   Var2 = rnorm(5)),
                    C = data.frame(ID = c("y1", "y2", "z1", "z2", "z3"),
                                   Name = c("y1_name", "y2_name", "z1_name", "z2_name", "z3_name"),
                                   Var1 = rnorm(5),
                                   Var2 = rnorm(5)))

    # a dataframe of correspondence for aggregation operations
    newIDdf <- data.frame(ID = c("y1", "y2", "z1", "z2", "z3"),
                          IDagr = c("Y", "Y", "Z", "Z", "Z"))

    ## II/ what I want to do (but on 1 df)
    # example on 1 df
    On1df <- list_df[["B"]] %>%
      mutate(ID = reduce2(newIDdf$ID, newIDdf$IDagr,
                          .init = ID,
                          str_replace)) %>%
      mutate(Name = case_when(ID == "Z" ~ "Z_name",
                              ID == "Y" ~ "Y_name",
                              TRUE ~ Name)) %>%
      group_by(ID) %>%
      mutate_if(is.numeric, ~list(. = sum(.))) %>%
      distinct(ID, .keep_all = TRUE)

    ## III/ What I really want to achieve
    # what if I want to do that simultaneously on df B and C
    # I mean applying those operations on dataframes
    # where column ID includes some values of newIDdf$ID
    list_df_output <- list_df %>%
      map_if(.p = ~ any(ID %in% newIDdf$ID), ### what test to put here? (because this doesn't work)
             ~ mutate(.x, ID = reduce2(newIDdf$ID, newIDdf$IDagr,
                                       .init = ID,
                                       str_replace)) %>%
               mutate(., Name = case_when(ID == "Z" ~ "Z_name",
                                          ID == "Y" ~ "Y_name",
                                          TRUE ~ Name)) %>%
               group_by(., ID) %>%
               mutate_if(., is.numeric, ~list(. = sum(.))) %>%
               distinct(., ID, .keep_all = TRUE))

Answer 1

Score: 3

I'm not sure whether the approach below yields your desired output. By the way, we don't need `mutate_if`; we can use `across(where())` instead. We also don't need `reduce2()`: we can pass a lookup vector (created below with `set_names()`) to `str_replace_all()`:

    library(dplyr)
    library(purrr)
    library(stringr)

    list_df %>%
      map_if(~ any(.x$ID %in% newIDdf$ID),
             ~ .x %>%
               mutate(ID = str_replace_all(ID, set_names(newIDdf$IDagr, newIDdf$ID)),
                      Name = case_when(ID == "Z" ~ "Z_name",
                                       ID == "Y" ~ "Y_name",
                                       TRUE ~ Name)
               ) %>%
               group_by(ID) %>%
               mutate(across(where(is.numeric), ~ sum(.))) %>%
               distinct(ID, .keep_all = TRUE)
      )
    #> $A
    #>   ID   Name       Var1       Var2       Var3
    #> 1  a a_name -0.9958825 -0.4822998 -0.5283220
    #> 2  b b_name  0.5309721  0.7133405 -1.1024029
    #> 3  c c_name -1.2049361  0.2681276  0.1179077
    #> 4  Z Z_name -0.7167132 -1.0513967 -1.5125656
    #> 5  Y Y_name -0.5056531  0.6273818  1.4781721
    #>
    #> $B
    #> # A tibble: 3 x 4
    #> # Groups:   ID [3]
    #>   ID    Name     Var1  Var2
    #>   <chr> <chr>   <dbl> <dbl>
    #> 1 a     a_name -0.967  2.78
    #> 2 b     b_name -0.814  1.37
    #> 3 Z     Z_name  0.354  2.33
    #>
    #> $C
    #> # A tibble: 2 x 4
    #> # Groups:   ID [2]
    #>   ID    Name    Var1   Var2
    #>   <chr> <chr>  <dbl>  <dbl>
    #> 1 Y     Y_name -2.71 -0.852
    #> 2 Z     Z_name -2.06 -1.52

Data from OP

    list_df <- list(A = data.frame(ID = c("a", "b", "c", "Z", "Y"),
                                   Name = c("a_name", "b_name", "c_name", "Z_name", "Y_name"),
                                   Var1 = rnorm(5),
                                   Var2 = rnorm(5),
                                   Var3 = rnorm(5)),
                    B = data.frame(ID = c("a", "b", "z1", "z2", "z3"),
                                   Name = c("a_name", "b_name", "z1_name", "z2_name", "z3_name"),
                                   Var1 = rnorm(5),
                                   Var2 = rnorm(5)),
                    C = data.frame(ID = c("y1", "y2", "z1", "z2", "z3"),
                                   Name = c("y1_name", "y2_name", "z1_name", "z2_name", "z3_name"),
                                   Var1 = rnorm(5),
                                   Var2 = rnorm(5)))

    # a dataframe of correspondence for aggregation operations
    newIDdf <- data.frame(ID = c("y1", "y2", "z1", "z2", "z3"),
                          IDagr = c("Y", "Y", "Z", "Z", "Z"))

<sup>Created on 2023-03-03 by the reprex package (v2.0.1)</sup>
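
One caveat about the lookup-vector approach, offered as a sketch rather than a correction: `str_replace_all()` treats each name in the vector as a regular-expression pattern, so it also rewrites substrings of longer IDs. That is harmless with the IDs in the question, but assuming IDs should only ever match exactly, indexing the lookup vector by name is one alternative. The ID `"z10"` below is hypothetical and not part of the original data.

```r
library(dplyr)
library(stringr)

# Lookup vector: names are the old IDs, values are the aggregated IDs
lookup <- c(y1 = "Y", y2 = "Y", z1 = "Z", z2 = "Z", z3 = "Z")

# "z10" is a made-up ID used only to show the difference
ids <- c("a", "z1", "z10", "y2")

# Regex replacement: every name is a pattern, so "z1" also matches inside "z10"
by_regex <- str_replace_all(ids, lookup)

# Exact lookup: index by name, fall back to the original ID when nothing matches
by_exact <- coalesce(unname(lookup[ids]), ids)
```

With these inputs, `by_regex` should turn `"z10"` into `"Z0"`, while `by_exact` leaves it unchanged. Which behaviour is wanted depends on the real IDs.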

Answer 2

Score: 1

Is this what you want? I also changed your `mutate_if` call to the more recent version using `across` and `where`:

    list_df |>
      map_if(~ any(.x$ID %in% newIDdf$ID), ~ .x |>
               mutate(ID = reduce2(newIDdf$ID, newIDdf$IDagr,
                                   .init = ID,
                                   str_replace)) %>%
               mutate(Name = case_when(ID == "Z" ~ "Z_name",
                                       ID == "Y" ~ "Y_name",
                                       TRUE ~ Name)) %>%
               group_by(ID) %>%
               mutate(across(where(is.numeric), ~ sum(.))) %>%
               distinct(ID, .keep_all = TRUE))

**Output**:

    $A
      ID   Name       Var1       Var2       Var3
    1  a a_name  0.1015844  0.6306434  0.5058593
    2  b b_name -0.1420690  0.5152645  0.2497879
    3  c c_name  0.5841423  1.2883330  0.5297098
    4  Z Z_name  1.6645565  0.2307524 -1.0418045
    5  Y Y_name -0.1293767 -2.4152871 -0.1935843

    $B
    # A tibble: 3 × 4
    # Groups:   ID [3]
      ID    Name     Var1   Var2
      <chr> <chr>   <dbl>  <dbl>
    1 a     a_name -0.512 -0.119
    2 b     b_name -2.14  -0.834
    3 Z     Z_name  0.468  2.54

    $C
    # A tibble: 2 × 4
    # Groups:   ID [2]
      ID    Name    Var1  Var2
      <chr> <chr>  <dbl> <dbl>
    1 Y     Y_name 1.15  0.162
    2 Z     Z_name 0.790 2.03
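
Both answers use the same predicate, `~ any(.x$ID %in% newIDdf$ID)`. The version in the question fails because a bare `ID` is not defined inside the formula: the current data frame is available as `.x`, so its column has to be reached as `.x$ID`. As a quick check, and assuming small stand-in data of the same shape (the values below are made up), `map_lgl()` shows which list elements `map_if()` would transform:

```r
library(purrr)

# Stand-in data with the same structure as the question's list (values are made up)
list_df <- list(
  A = data.frame(ID = c("a", "Z"), Var1 = c(1, 2)),
  B = data.frame(ID = c("a", "z1", "z2"), Var1 = c(1, 2, 3)),
  C = data.frame(ID = c("y1", "z1"), Var1 = c(4, 5))
)
newIDdf <- data.frame(ID = c("y1", "y2", "z1", "z2", "z3"),
                      IDagr = c("Y", "Y", "Z", "Z", "Z"))

# Evaluate the predicate on every data frame: TRUE means map_if() applies .f to it
needs_recode <- map_lgl(list_df, ~ any(.x$ID %in% newIDdf$ID))
needs_recode
```

Here `needs_recode` should be `FALSE` for A and `TRUE` for B and C, which matches the data frames the question wants to transform.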

huangapple
  • Posted on March 3, 2023, 18:51:54
  • Please retain this link when reposting: https://go.coder-hub.com/75626117.html