2023-08-09 07:13:39

R for-loops for applying a function to a dataframe vector based on the ID of another column

Question

I'm learning R and have ventured into for-loops that I can't find simplified versions of. I have a data set with 82 unique IDs. I have a list of 82 functions, one for each ID, that I want to apply over a numeric column with 2000+ values, but I only want to apply each function to the values that share its ID.

I have a data frame similar to this where I want to use the functions to make the prediction:

ID	num	pred
ID1	0.1	-
ID1	0.2	-
ID2	0.2	-
ID2	0.3	-
ID3	0.5	-
ID3	0.7	-
ID3	0.7	-

funclist[["ID1"]](df$num[df$ID == "ID1"])

I know this line of code will give me the predicted values for that specific ID, but I am not sure how to make a for-loop so I don't have to brute force it 82 times.

pred <- numeric()
for (i in 1:length(funclist)) {
 pred <- funclist[[i]](df$num[df$ID == i])
}

This resulted in nothing while

pred <- numeric()
for (i in 1:length(funclist)) {
 pred <- funclist[[i]](df$num)
}

gave a result, but after double checking with the first line of code it did not match the function ID to the ID of the numeric column.

If someone could offer some advice on how to successfully do this, or some basic for-loop knowledge I am lacking, that would be awesome.

Answer 1

Score: 2

You could use the for-loop approach with base R, or load an additional package and let the tidyverse do the iteration for you: purrr's map functions apply the same operation to every element of a list or data frame column.

And if you are new to R, I would definitely recommend going this route for learning data analysis.

Here's the approach I would take. Many thanks to @DaveArmstrong for the data.

library(tidyverse)
# put the list of functions here
funclist <- list(ID1 = function(x){x*2},
                 ID2 = function(x){x*3},
                 ID3 = function(x){x*4})
tib <- tibble::tribble(
  ~ID,    ~num,   ~pred,
  "ID1",  0.1, NA,
  "ID1",  0.2, NA,
  "ID2",  0.2, NA,
  "ID2",  0.3, NA,
  "ID3",  0.5, NA,
  "ID3",  0.7, NA,
  "ID3",  0.7, NA)
mutate(tib,
       pred = map2_dbl(ID, num, # run over two parallel vectors
                       ~ funclist[[.x]](.y))) # and apply the function
# ID matching vector 1 (.x) to values in vector 2 (.y)
#> # A tibble: 7 × 3
#>   ID      num  pred
#>   <chr> <dbl> <dbl>
#> 1 ID1     0.1   0.2
#> 2 ID1     0.2   0.4
#> 3 ID2     0.2   0.6
#> 4 ID2     0.3   0.9
#> 5 ID3     0.5   2  
#> 6 ID3     0.7   2.8
#> 7 ID3     0.7   2.8

Created on 2023-08-08 with reprex v2.0.2

Answer 2

Score: 0

You were on the right track; your loop just needs a bit of tweaking. First, we can make the data:

tib <- tibble::tribble(
  ~ID,    ~num,   ~pred,
  "ID1",  0.1, NA,
  "ID1",  0.2, NA,
  "ID2",  0.2, NA,
  "ID2",  0.3, NA,
  "ID3",  0.5, NA,
  "ID3",  0.7, NA,
  "ID3",  0.7, NA)

I just defined a simple function list so you can see how it works; presumably yours is more complicated than this.

funclist <- list(ID1 = function(x){x*2},
                 ID2 = function(x){x*3},
                 ID3 = function(x){x*4})

Since you want the values that get propagated through the loop to be IDs (e.g., "ID1", "ID2"), you need those to be what you loop over. So, instead of looping over 1:length(funclist), loop over names(funclist); the index i will then be one of the function names each time. Inside the loop, you want to change the values of pred, but only for the cases with the matching ID. So you replace just those observations with predictions, using the matching element of funclist and the column num, again only for the ID in question.

for (i in names(funclist)) {
  tib$pred[tib$ID == i] <- funclist[[i]](tib$num[tib$ID == i])
}
tib
#> # A tibble: 7 × 3
#>   ID      num  pred
#>   <chr> <dbl> <dbl>
#> 1 ID1     0.1   0.2
#> 2 ID1     0.2   0.4
#> 3 ID2     0.2   0.6
#> 4 ID2     0.3   0.9
#> 5 ID3     0.5   2  
#> 6 ID3     0.7   2.8
#> 7 ID3     0.7   2.8
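
For reference, here is a minimal sketch of why the two loops in the question did not do what was intended. It uses a plain data.frame called df built from the same toy values; the helper names n_selected and overwritten are introduced here only for illustration.

```r
# Toy data and functions mirroring the example above (base R only).
df <- data.frame(ID  = c("ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID3"),
                 num = c(0.1, 0.2, 0.2, 0.3, 0.5, 0.7, 0.7))
funclist <- list(ID1 = function(x) x * 2,
                 ID2 = function(x) x * 3,
                 ID3 = function(x) x * 4)

# First attempt: i is an integer (1, 2, 3), so df$ID == i compares
# "ID1" with "1" and is never TRUE; the function receives an empty vector.
n_selected <- length(df$num[df$ID == 1])

# Second attempt: every function is applied to the whole column, and
# `pred <-` replaces the previous result on each pass, so only the
# output of the last function in the list is left at the end.
pred <- numeric()
for (i in seq_along(funclist)) {
  pred <- funclist[[i]](df$num)
}
overwritten <- isTRUE(all.equal(pred, df$num * 4))
```

Here n_selected is 0 and overwritten is TRUE. Looping over names(funclist) fixes the comparison, and assigning into the matching rows of pred fixes the overwriting.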

You could also do this with mutate and rowwise() to call the function on each row individually:

library(tidyverse)
tib %>%
  rowwise() %>%
  mutate(pred = funclist[[ID]](num))
#> # A tibble: 7 × 3
#> # Rowwise: 
#>   ID      num  pred
#>   <chr> <dbl> <dbl>
#> 1 ID1     0.1   0.2
#> 2 ID1     0.2   0.4
#> 3 ID2     0.2   0.6
#> 4 ID2     0.3   0.9
#> 5 ID3     0.5   2  
#> 6 ID3     0.7   2.8
#> 7 ID3     0.7   2.8

Or, even more efficiently, use group_by() and mutate(): each function is then called once per group on that group's num values, and is looked up using the first value of ID in the group.

tib %>%
  group_by(ID) %>%
  mutate(pred = funclist[[ID[1]]](num))
#> # A tibble: 7 × 3
#> # Groups:   ID [3]
#>   ID      num  pred
#>   <chr> <dbl> <dbl>
#> 1 ID1     0.1   0.2
#> 2 ID1     0.2   0.4
#> 3 ID2     0.2   0.6
#> 4 ID2     0.3   0.9
#> 5 ID3     0.5   2  
#> 6 ID3     0.7   2.8
#> 7 ID3     0.7   2.8
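
With 82 IDs it may also be worth checking that every ID actually has a function before predicting. The following is only a sketch in base R, assuming each function accepts a whole numeric vector and returns one value per input; missing_ids, pieces and preds are names made up for this example.

```r
# Same toy data as above, as a plain data.frame (no packages needed).
tib <- data.frame(ID  = c("ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID3"),
                  num = c(0.1, 0.2, 0.2, 0.3, 0.5, 0.7, 0.7))
funclist <- list(ID1 = function(x) x * 2,
                 ID2 = function(x) x * 3,
                 ID3 = function(x) x * 4)

# IDs present in the data that have no function in the list.
missing_ids <- setdiff(unique(tib$ID), names(funclist))
stopifnot(length(missing_ids) == 0)

# Split num by ID, apply the function with the same name to each piece,
# then put the results back in the original row order.
pieces <- split(tib$num, tib$ID)
preds  <- Map(function(f, x) f(x), funclist[names(pieces)], pieces)
tib$pred <- unsplit(preds, tib$ID)
```

This gives the same pred column as the loop and the group_by() version, and it stops early with an error if an ID has no matching function.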

Created on 2023-08-09 with reprex v2.0.2
