R for-loops for applying a function to a dataframe vector based on the ID of another column
Question
I'm learning R and have run into a for-loop problem that I can't find a simple example of. I have a data set with 82 unique IDs and a list of 82 functions, one per ID. I want to apply them to a numeric column with 2000+ values, but each function should only be applied to the values that share its ID.
I have a data frame similar to this where I want to use the functions to make the prediction:
ID | num | pred |
---|---|---|
ID1 | 0.1 | - |
ID1 | 0.2 | - |
ID2 | 0.2 | - |
ID2 | 0.3 | - |
ID3 | 0.5 | - |
ID3 | 0.7 | - |
ID3 | 0.7 | - |
funclist[["ID1"]](df$num[df$ID == "ID1"])
I know this line of code will give me the predicted values for that specific ID, but I am not sure how to make a for-loop so I don't have to brute force it 82 times.
pred <- numeric()
for (i in 1:length(funclist)) {
pred <- funclist[[i]](df$num[df$ID == i])
}
This returned nothing, while
pred <- numeric()
for (i in 1:length(funclist)) {
pred <- funclist[[i]](df$num)
}
gave a result, but when I double-checked it against the single-ID line of code above, the predictions had not been matched to the right IDs.
If someone could offer some advice on how to do this successfully, or point out the basic for-loop knowledge I am lacking, that would be awesome.
Answer 1
Score: 2
You could use a for loop in base R, or load an additional package and apply the same operation across a list or data frame without writing the loop yourself, using the tidyverse packages.
If you are new to R, I would definitely recommend going this route for learning data analysis.
Here's the approach I would take. Many thanks to @DaveArmstrong for the data.
library(tidyverse)
# put the list of functions here
funclist <- list(ID1 = function(x){x*2},
ID2 = function(x){x*3},
ID3 = function(x){x*4})
tib <- tibble::tribble(
~ID, ~num, ~pred,
"ID1", 0.1, NA,
"ID1", 0.2, NA,
"ID2", 0.2, NA,
"ID2", 0.3, NA,
"ID3", 0.5, NA,
"ID3", 0.7, NA,
"ID3", 0.7, NA)
# map2_dbl() walks over ID (.x) and num (.y) in parallel and, for each row,
# applies the function in funclist whose name matches that row's ID
mutate(tib,
       pred = map2_dbl(ID, num, ~ funclist[[.x]](.y)))
#> # A tibble: 7 × 3
#> ID num pred
#> <chr> <dbl> <dbl>
#> 1 ID1 0.1 0.2
#> 2 ID1 0.2 0.4
#> 3 ID2 0.2 0.6
#> 4 ID2 0.3 0.9
#> 5 ID3 0.5 2
#> 6 ID3 0.7 2.8
#> 7 ID3 0.7 2.8
<sup>Created on 2023-08-08 with reprex v2.0.2</sup>
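For comparison, the same row-by-row lookup can be sketched in base R with mapply(), which also walks over two vectors in parallel. This is only an illustration under assumptions: the data frame name df and the three toy functions are stand-ins for the real data and the real 82 functions.

```r
# Toy stand-ins for the real function list (assumed, as in the answer above)
funclist <- list(ID1 = function(x) x * 2,
                 ID2 = function(x) x * 3,
                 ID3 = function(x) x * 4)

# Hypothetical data frame; stringsAsFactors = FALSE keeps ID as character
# so that funclist[[id]] looks functions up by name on any R version
df <- data.frame(ID  = c("ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID3"),
                 num = c(0.1, 0.2, 0.2, 0.3, 0.5, 0.7, 0.7),
                 stringsAsFactors = FALSE)

# Walk over ID and num in parallel; for each row, pick the function
# named after that row's ID and apply it to that row's value
df$pred <- mapply(function(id, x) funclist[[id]](x),
                  df$ID, df$num,
                  USE.NAMES = FALSE)

df
```

Like map2_dbl(), this calls a function once per row, so with many rows a group-wise approach is likely to be faster.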
Answer 2
Score: 0
You were on the right track; your loop just needs a bit of tweaking. First, we can make the data:
tib <- tibble::tribble(
~ID, ~num, ~pred,
"ID1", 0.1, NA,
"ID1", 0.2, NA,
"ID2", 0.2, NA,
"ID2", 0.3, NA,
"ID3", 0.5, NA,
"ID3", 0.7, NA,
"ID3", 0.7, NA)
I just defined a simple function list so you can see how it works; presumably yours is more complicated than this.
funclist <- list(ID1 = function(x){x*2},
ID2 = function(x){x*3},
ID3 = function(x){x*4})
Since you want the values that the loop steps through to be IDs (e.g., "ID1", "ID2"), those are what you need to loop over. So, instead of looping over 1:length(funclist), you can loop over names(funclist); the index will then be one of the names in the function list on each pass. Inside the loop, you want to change the values of pred, but only for the rows with the matching ID. You replace just those observations with predictions from the matching element of funclist, applied to the column num, again restricted to the ID in question.
for (i in names(funclist)) {
tib$pred[tib$ID == i] <- funclist[[i]](tib$num[tib$ID == i])
}
tib
#> # A tibble: 7 × 3
#> ID num pred
#> <chr> <dbl> <dbl>
#> 1 ID1 0.1 0.2
#> 2 ID1 0.2 0.4
#> 3 ID2 0.2 0.6
#> 4 ID2 0.3 0.9
#> 5 ID3 0.5 2
#> 6 ID3 0.7 2.8
#> 7 ID3 0.7 2.8
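To make the difference from the loops in the question concrete, here is a small base R sketch of what appears to go wrong there. It assumes a plain data frame called df with the same toy values; the names are illustrative only.

```r
# Toy stand-ins (assumed) for the real function list and data
funclist <- list(ID1 = function(x) x * 2,
                 ID2 = function(x) x * 3,
                 ID3 = function(x) x * 4)
df <- data.frame(ID  = c("ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID3"),
                 num = c(0.1, 0.2, 0.2, 0.3, 0.5, 0.7, 0.7),
                 stringsAsFactors = FALSE)

# Problem 1: 1:length(funclist) yields the integers 1, 2, 3, but the ID
# column holds the strings "ID1", "ID2", "ID3", so no row ever matches
selected <- df$num[df$ID == 1]
length(selected)

# names(funclist) yields the same strings that are stored in df$ID
names(funclist)
df$num[df$ID == "ID1"]

# Problem 2: assigning to the whole of pred overwrites it on every pass,
# so only the result of the last iteration (here ID3) is left at the end
pred <- numeric()
for (i in names(funclist)) {
  pred <- funclist[[i]](df$num[df$ID == i])
}
pred
```

Assigning into the matching positions, as in the loop above, avoids both problems because each pass only touches the rows for its own ID.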
You could also do this with mutate() and rowwise() to call the function on each row individually:
library(tidyverse)
tib %>%
rowwise() %>%
mutate(pred = funclist[[ID]](num))
#> # A tibble: 7 × 3
#> # Rowwise:
#> ID num pred
#> <chr> <dbl> <dbl>
#> 1 ID1 0.1 0.2
#> 2 ID1 0.2 0.4
#> 3 ID2 0.2 0.6
#> 4 ID2 0.3 0.9
#> 5 ID3 0.5 2
#> 6 ID3 0.7 2.8
#> 7 ID3 0.7 2.8
Or, even more efficiently, with group_by() and mutate(), which calls each function once per group and looks it up using the first value of ID in that group:
tib %>%
group_by(ID) %>%
mutate(pred = funclist[[ID[1]]](num))
#> # A tibble: 7 × 3
#> # Groups: ID [3]
#> ID num pred
#> <chr> <dbl> <dbl>
#> 1 ID1 0.1 0.2
#> 2 ID1 0.2 0.4
#> 3 ID2 0.2 0.6
#> 4 ID2 0.3 0.9
#> 5 ID3 0.5 2
#> 6 ID3 0.7 2.8
#> 7 ID3 0.7 2.8
<sup>Created on 2023-08-09 with reprex v2.0.2</sup>
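If you would rather stay in base R, a group-wise version can be sketched with split() and unsplit(). With 82 IDs it may also be worth checking first that every ID in the data has a function. The names df and missing_ids below are assumptions made for illustration, not part of the original code.

```r
# Toy stand-ins (assumed) for the real function list and data
funclist <- list(ID1 = function(x) x * 2,
                 ID2 = function(x) x * 3,
                 ID3 = function(x) x * 4)
df <- data.frame(ID  = c("ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID3"),
                 num = c(0.1, 0.2, 0.2, 0.3, 0.5, 0.7, 0.7),
                 stringsAsFactors = FALSE)

# Stop early if some ID in the data has no function of the same name
missing_ids <- setdiff(unique(df$ID), names(funclist))
if (length(missing_ids) > 0) {
  stop("No function for: ", paste(missing_ids, collapse = ", "))
}

# Split num into one vector per ID
groups <- split(df$num, df$ID)

# Apply each group's own function once, matching functions to groups by name
preds <- Map(function(f, x) f(x), funclist[names(groups)], groups)

# Put the per-group results back in the original row order
df$pred <- unsplit(preds, df$ID)

df
```

Indexing funclist by names(groups) keeps the functions in the same order as the groups, which is what unsplit() relies on when it reassembles the column.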