for loop in R - repeating a block of code iteratively

Question

I have a block of code that extracts the top 50 values from a specific column in my dataframe, outputting a new dataframe named accordingly. I want to repeat this process for every column in my dataframe (50-100 columns), as follows. How can I automate this?

    column1_top50 <- dataframe %>%
      arrange(desc(column1)) %>%
      slice_head(n = 50) %>%
      select(sample_name, column1)

    column2_top50 <- dataframe %>%
      arrange(desc(column2)) %>%
      slice_head(n = 50) %>%
      select(sample_name, column2)

    column3_top50 <- dataframe %>%
      arrange(desc(column3)) %>%
      slice_head(n = 50) %>%
      select(sample_name, column3)

Answer 1

Score: 1

I'm unable to test without any sample data, but here's an option using purrr::map_dfr, where you "loop" through each column in the data.frame and return its top 50 numeric values.

    library(dplyr)
    library(purrr)

    set.seed(100)
    tmp = data.frame(
      col1 = rnorm(100),
      col2 = rnorm(100),
      col3 = rnorm(100)
    )

    # Sort each column in decreasing order and keep its 50 largest values
    tmp %>%
      map_dfr(~ head(sort(.x, decreasing = TRUE), 50))

    # A tibble: 50 × 3
        col1  col2  col3
       <dbl> <dbl> <dbl>
     1  2.58  2.17  2.73
     2  2.45  1.90  2.61
     3  2.31  1.65  2.55
     4  1.90  1.62  1.88
     5  1.82  1.58  1.79
     6  1.76  1.36  1.35
     7  1.73  1.35  1.35
     8  1.65  1.27  1.24
     9  1.43  1.24  1.23
    10  1.40  1.03  1.14
    # … with 40 more rows
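
Note that the code above sorts each column independently, so the sample_name belonging to each value is not carried along. As a minimal sketch of a variant that keeps it, one could map over the column names and collect one small data frame per column in a named list. This assumes, as in the question, an identifier column called sample_name; the data frame `dat` and the names `value_cols` and `top50` are made up for illustration and are not part of the original answer.

```r
library(dplyr)
library(purrr)

set.seed(100)
# Hypothetical example data: an identifier column plus three numeric columns
dat <- data.frame(
  sample_name = paste0("s", 1:100),
  col1 = rnorm(100),
  col2 = rnorm(100),
  col3 = rnorm(100)
)

# Every column except the identifier
value_cols <- setdiff(names(dat), "sample_name")

# One data frame per column: the 50 rows with the largest values,
# keeping only sample_name and that column
top50 <- value_cols %>%
  set_names() %>%                        # name each element after its column
  map(function(col_name) {
    dat %>%
      arrange(desc(.data[[col_name]])) %>%
      slice_head(n = 50) %>%
      select(sample_name, all_of(col_name))
  })

head(top50$col1)   # plays the role of column1_top50 in the question
```

Keeping the results in one named list, rather than 50-100 separate objects, means a single result is reached with `top50[["col2"]]` and the whole set can be passed on to further map() calls.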

Answer 2

Score: 1

I'm not sure a for loop is the most efficient way to do this (a purrr approach would probably be cleaner, if not necessarily faster). I'll also note that creating data frames inside a for loop is generally frowned upon (making an empty placeholder before the loop and filling it in is usually faster and easier to manage). Still, without any sample data, I tried to generalize this for the 50-100 columns you said you have:

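    library(tidyverse)

    # Example data: one identifier column plus three numeric columns
    df = data.frame(sample_name = rep("example", 100),
                    x1 = rnorm(100),
                    x2 = rnorm(100),
                    x3 = rnorm(100))

    # Start at column 2 so that sample_name itself is not ranked
    for (i in 2:ncol(df)) {
      # Sort by the i-th column (descending) and keep the top 50 rows
      df1 <- df %>%
        arrange(desc(.[[i]])) %>%
        slice_head(n = 50)
      # Creates x1_top50, x2_top50, ... holding sample_name and column i
      assign(paste0(colnames(df)[[i]], "_top50"), df1[, c(1, i)])
    }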
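
The "placeholder" idea mentioned in this answer could look roughly like the following sketch, which pre-allocates a named list and fills it inside the loop instead of calling assign(). The names `value_cols` and `top50` are illustrative and not part of the original answer, and base R's order() is used here only to keep the example free of package dependencies.

```r
set.seed(1)
# Hypothetical example data, shaped like the data frame in the answer
df <- data.frame(sample_name = paste0("s", 1:100),
                 x1 = rnorm(100),
                 x2 = rnorm(100),
                 x3 = rnorm(100))

# Every column except the identifier
value_cols <- setdiff(names(df), "sample_name")

# Pre-allocate a named list with one slot per value column
top50 <- vector("list", length(value_cols))
names(top50) <- value_cols

for (col_name in value_cols) {
  # Row indices of the 50 largest values in this column
  idx <- head(order(df[[col_name]], decreasing = TRUE), 50)
  # Keep the identifier next to the ranked column
  top50[[col_name]] <- df[idx, c("sample_name", col_name)]
}

str(top50$x1)
```

With this layout `top50[["x2"]]` takes the place of `x2_top50`, and the global environment is not filled with dozens of similarly named objects.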

huangapple
  • Published on July 7, 2023 at 00:53:42
  • When reposting, please be sure to keep the link to this article: https://go.coder-hub.com/76631017.html