for loop in R - repeating a block of code iteratively
Question
I have a block of code that extracts the top 50 values from a specific column in my dataframe and outputs a new dataframe named accordingly, as shown below. I want to repeat this process for every column in my dataframe (50-100 columns). How can I automate this?
column1_top50 <- dataframe %>%
  arrange(desc(column1)) %>%
  slice_head(n = 50) %>%
  select(sample_name, column1)
column2_top50 <- dataframe %>%
  arrange(desc(column2)) %>%
  slice_head(n = 50) %>%
  select(sample_name, column2)
column3_top50 <- dataframe %>%
  arrange(desc(column3)) %>%
  slice_head(n = 50) %>%
  select(sample_name, column3)
Answer 1
Score: 1
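I can't test this without sample data, but here's an option using purrr::map_dfr, which "loops" over each column of the data.frame and returns its top 50 numeric values. Note that each column is sorted independently, so the rows of the result no longer line up with the original samples.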
library(dplyr)
library(purrr)
set.seed(100)
tmp = data.frame(
  col1 = rnorm(100),
  col2 = rnorm(100),
  col3 = rnorm(100)
)
tmp %>%
  map_dfr(~ head(sort(.x, decreasing = TRUE), 50))
# A tibble: 50 × 3
    col1  col2  col3
   <dbl> <dbl> <dbl>
 1  2.58  2.17  2.73
 2  2.45  1.90  2.61
 3  2.31  1.65  2.55
 4  1.90  1.62  1.88
 5  1.82  1.58  1.79
 6  1.76  1.36  1.35
 7  1.73  1.35  1.35
 8  1.65  1.27  1.24
 9  1.43  1.24  1.23
10  1.40  1.03  1.14
# … with 40 more rows
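If the goal is closer to the question's original code (one data frame per column, each still paired with sample_name), a variation on the same idea is to map over the column names rather than the column values. The following is only a sketch under assumptions: the example data and the names value_cols and top50 are hypothetical, and sample_name is assumed to be the only non-value column.

```r
library(dplyr)
library(purrr)

set.seed(100)
# Hypothetical example data; sample_name mirrors the column named in the question
tmp <- data.frame(
  sample_name = paste0("s", 1:100),
  col1 = rnorm(100),
  col2 = rnorm(100),
  col3 = rnorm(100)
)

# Columns to rank: everything except the identifier column (an assumption)
value_cols <- setdiff(names(tmp), "sample_name")

# Build one data frame per column and keep them together in a named list
top50 <- value_cols %>%
  set_names(paste0(value_cols, "_top50")) %>%
  map(function(col) {
    tmp %>%
      arrange(desc(.data[[col]])) %>%   # sort by the current column, descending
      slice_head(n = 50) %>%            # keep the 50 largest rows
      select(sample_name, all_of(col))  # keep the identifier and that column
  })

names(top50)
head(top50$col1_top50)
```

Keeping the results in a list, rather than as 50-100 separate variables, makes it straightforward to iterate over them later, for example with map() or lapply().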
Answer 2
Score: 1
I'm not sure a for loop is the most efficient way to do this (purrr would probably be faster). I'll also note that creating data frames inside a for loop is generally frowned upon (preallocating an empty placeholder before the loop would be much faster), but without any sample data I tried to generalize this to the 50-100 columns you said you have:
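library(tidyverse)
df = data.frame(sample_name = rep("example", 100),
                x1 = rnorm(100),
                x2 = rnorm(100),
                x3 = rnorm(100))
# Start at column 2 so that the sample_name column itself is skipped
for (i in 2:ncol(df)) {
  # Sort by the i-th column in descending order and keep the top 50 rows
  df1 <- df %>%
    arrange(desc(.[[i]])) %>%
    slice_head(n = 50)
  # Create a variable such as x1_top50 holding sample_name and column i
  assign(paste0(colnames(df)[[i]], "_top50"), df1[, c(1, i)])
}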
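The preallocation mentioned above could look roughly like the sketch below, which fills a list created before the loop instead of calling assign() on each iteration. It assumes the first column is the identifier (sample_name); the names value_cols and results are hypothetical.

```r
library(dplyr)

set.seed(1)
# Hypothetical example data with an identifier column followed by value columns
df <- data.frame(
  sample_name = paste0("sample", 1:100),
  x1 = rnorm(100),
  x2 = rnorm(100),
  x3 = rnorm(100)
)

# Assumption: every column after the first is a value column to rank
value_cols <- names(df)[-1]

# Preallocate a named list with one slot per value column
results <- vector("list", length(value_cols))
names(results) <- paste0(value_cols, "_top50")

for (i in seq_along(value_cols)) {
  col <- value_cols[i]
  results[[i]] <- df %>%
    arrange(desc(.data[[col]])) %>%   # sort by the current column, descending
    slice_head(n = 50) %>%            # keep the 50 largest rows
    select(sample_name, all_of(col))  # keep the identifier and that column
}

str(results, max.level = 1)
```

If separate variables really are needed afterwards, list2env(results, envir = globalenv()) would create them from the list in one step, although working with the list directly is usually easier.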