for loop in R - repeating a block of code iteratively

Question

I have a block of code that extracts the top 50 values from a specific column in my dataframe, outputting a new dataframe named accordingly. I want to repeat this process for every column in my dataframe (50-100 columns), as follows. How can I automate this?

column1_top50 <- dataframe %>%
  arrange(desc(column1)) %>%
  slice_head(n = 50) %>%
  select(sample_name, column1)

column2_top50 <- dataframe %>%
  arrange(desc(column2)) %>%
  slice_head(n = 50) %>%
  select(sample_name, column2)

column3_top50 <- dataframe %>%
  arrange(desc(column3)) %>%
  slice_head(n = 50) %>%
  select(sample_name, column3)

Answer 1

Score: 1

I can't test this without sample data, but here's an option using purrr::map_dfr: it "loops" through each column in the data.frame and returns the top 50 numeric values, combined into one data frame.

library(dplyr)
library(purrr)

set.seed(100)

tmp = data.frame(
  col1 = rnorm(100),
  col2 = rnorm(100),
  col3 = rnorm(100)
)

tmp %>%
  map_dfr(~ head(sort(.x, decreasing = TRUE), 50))

# A tibble: 50 × 3
    col1  col2  col3
   <dbl> <dbl> <dbl>
 1  2.58  2.17  2.73
 2  2.45  1.90  2.61
 3  2.31  1.65  2.55
 4  1.90  1.62  1.88
 5  1.82  1.58  1.79
 6  1.76  1.36  1.35
 7  1.73  1.35  1.35
 8  1.65  1.27  1.24
 9  1.43  1.24  1.23
10  1.40  1.03  1.14
# … with 40 more rows
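
Note that the map_dfr call above sorts each column independently and drops sample_name. If the goal is the per-column data frames from the question (sample_name plus one value column), a minimal sketch could look like the following. The sample data, value_cols, and top50 are illustrative assumptions, not part of the original answer, and the results are kept in a named list rather than as separate variables.

```r
library(dplyr)
library(purrr)

set.seed(100)

# Hypothetical data: sample_name is the ID column, as in the question
dataframe <- data.frame(
  sample_name = paste0("s", 1:100),
  column1 = rnorm(100),
  column2 = rnorm(100),
  column3 = rnorm(100)
)

# Every column except the ID column, named after the desired outputs
value_cols <- setdiff(names(dataframe), "sample_name")
names(value_cols) <- paste0(value_cols, "_top50")

# One data frame per value column; map() keeps the names of its input
top50 <- map(value_cols, function(col) {
  dataframe %>%
    arrange(desc(.data[[col]])) %>%   # sort by the column named in col
    slice_head(n = 50) %>%            # keep the 50 largest rows
    select(sample_name, all_of(col))
})

top50$column1_top50
```

Each element can then be accessed by name, for example top50$column2_top50 or top50[["column3_top50"]].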

Answer 2

Score: 1

I'm not sure a for loop would be the most efficient way to do this (purrr would probably be faster). I'll also note that creating a data frame within a for loop is generally frowned upon (if you make an empty placeholder df before the for loop it'd be much faster), but without any sample data I tried to generalize this for the 50-100 columns you said you have:

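library(tidyverse)

df <- data.frame(sample_name = rep("example", 100),
                 x1 = rnorm(100),
                 x2 = rnorm(100),
                 x3 = rnorm(100))

# Column 1 is sample_name, so start at column 2
for (i in 2:ncol(df)) {
  # Sort by column i in descending order and keep the first 50 rows
  df1 <- df %>%
    arrange(desc(.[[i]])) %>%
    slice_head(n = 50)
  # Creates x1_top50, x2_top50, ... each holding sample_name and column i
  assign(paste0(colnames(df)[[i]], "_top50"), df1[, c(1, i)])
}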
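
The "placeholder before the loop" idea mentioned above can be sketched in base R as follows. This is only one possible reading of that remark: instead of creating many variables with assign(), it fills a named list that is allocated before the loop. The sample data and the names value_cols and results are assumptions made for illustration.

```r
set.seed(1)

# Hypothetical data in the same shape as the example above
df <- data.frame(sample_name = paste0("s", 1:100),
                 x1 = rnorm(100),
                 x2 = rnorm(100),
                 x3 = rnorm(100))

# Every column except the ID column
value_cols <- setdiff(colnames(df), "sample_name")

# Preallocate a named list with one slot per value column
results <- vector("list", length(value_cols))
names(results) <- paste0(value_cols, "_top50")

for (j in seq_along(value_cols)) {
  col <- value_cols[j]
  # Row indices ordered by this column, largest first; keep at most 50
  idx <- head(order(df[[col]], decreasing = TRUE), 50)
  results[[j]] <- df[idx, c("sample_name", col)]
}

results$x1_top50
```

Keeping the outputs in one list makes it easy to iterate over them later, which is harder with 50-100 free-standing variables.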

huangapple
  • Posted on 2023-07-07 00:53:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76631017.html