January 6, 2020 22:55:37


Creating function from merge script using data.table in R

Question


I have the following code (which runs as expected):

library("data.table")
DT1 <- data.table(V1 = c(1,3,3,NA,5,6,7,8,9,10), V2 = LETTERS[c(1,1,3,4,5,6,7,NA,9,10)], V3 = c(1,NA,2,NA,3,3,3,4,5,6))
test <- merge(DT1[,c("V1","V2")],unique(DT1[,"V3"]),by.x="V1",by.y = "V3")
test <- test[!is.na(V1),]
test <- test[!V1 %in% V1[which(duplicated(test$V1))]]
DT1[, V4 := merge(DT1[,"V3"],test,by.x = "V3", by.y = "V1", all.x=T, sort= F, all.y = F)[,2]]

I want to turn this into a function:

fillInFields <- function(data,V1,V2,V3,V4){
  test <- merge(data[,c(V1,V2)],unique(data[,V3]),by.x=V1,by.y = V3)

  test <- test[!is.na(cat(V1)),]

  test <- test[!cat(V1) %in% cat(V1)[which(duplicated(test[,V1]))]]
  
  data[, cat(V4) := merge(data[,V3],test,by.x = V3, by.y = V1, all.x=T, sort= F, all.y = F)[,2]]
  
}

However, when I run:

DT1 <- data.table(V1 = c(1,3,3,NA,5,6,7,8,9,10), V2 = LETTERS[c(1,1,3,4,5,6,7,NA,9,10)], V3 = c(1,NA,2,NA,3,3,3,4,5,6))
fillInFields(DT1,"V1","V2","V3","V4")

I get the following error (traceback included):

 Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column 
7.
stop(ngettext(sum(bad), "'by' must specify a uniquely valid column", 
    "'by' must specify uniquely valid columns"), domain = NA) 
6.
fix.by(by.x, x) 
5.
merge.data.frame(as.data.frame(x), as.data.frame(y), ...) 
4.
merge(as.data.frame(x), as.data.frame(y), ...) 
3.
merge.default(data[, c(V1, V2)], unique(data[, V3]), by.x = V1, 
    by.y = V3) 
2.
merge(data[, c(V1, V2)], unique(data[, V3]), by.x = V1, by.y = V3) 
1.
fillInFields(DT1, "V1", "V2", "V3", "V4")

Can anyone see what I am doing wrong? All help appreciated, thanks in advance!
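Here is a minimal sketch of what the first merge() call actually receives. It reuses only the DT1 example above; select_cols and select_cols2 are hypothetical helpers added for illustration. Inside data[...], j is evaluated with the table's columns in scope, so c(V1, V2) concatenates the columns V1 and V2 rather than the strings passed as arguments. The result is a plain character vector, merge() falls through to merge.default() (as the traceback shows), and that object has no column V1 to join on:

```r
library(data.table)

DT1 <- data.table(V1 = c(1,3,3,NA,5,6,7,8,9,10),
                  V2 = LETTERS[c(1,1,3,4,5,6,7,NA,9,10)],
                  V3 = c(1,NA,2,NA,3,3,3,4,5,6))

# Hypothetical helper: argument names clash with column names, as in fillInFields()
select_cols <- function(data, V1, V2) data[, c(V1, V2)]

res <- select_cols(DT1, "V1", "V2")
print(class(res))   # "character": the V1 and V2 *columns* were concatenated
print(length(res))  # 20 values, not a two-column table

# Hypothetical fix: build the name vector first and use the .. prefix,
# which tells data.table to look up cols in the calling scope, not among the columns
select_cols2 <- function(data, V1, V2) {
  cols <- c(V1, V2)
  data[, ..cols]
}
print(names(select_cols2(DT1, "V1", "V2")))  # "V1" "V2"
```

The .. prefix is available in reasonably recent data.table versions. The with = FALSE form used in the revised function below fixes this particular call as well.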

After making some changes based on the comment from chinsoon12, I now have the following function:

fillInFields <- function(data,V1,V2,V3,V4){
  test <- merge(data[,c(V1,V2),with=F],unique(data[,V3,with=F]),by.x=V1,by.y = V3)

  test <- test[!is.na(V1),]

  test <- test[!V1 %in% V1[which(duplicated(test[,V1]))]]
  
  return(data[, V4 := merge(data[,V3,with=F],test,by.x = V3, by.y = V1, all.x=T, sort= F, all.y = F)[,2]])
  
}

I now get the following error:

 Error in `[.data.table`(data, , V3, with = F) : 
  Item 8 of j is 4 which is outside the column number range [1,ncol=3] 
9.
stop("Item ", which.first(w), " of j is ", j[which.first(w)], 
    " which is outside the column number range [1,ncol=", ncol(x), 
    "]") 
8.
`[.data.table`(data, , V3, with = F) 
7.
data[, V3, with = F] 
6.
merge(data[, V3, with = F], test, by.x = V3, by.y = V1, all.x = T, 
    sort = F, all.y = F) 
5.
eval(jsub, SDenv, parent.frame()) 
4.
eval(jsub, SDenv, parent.frame()) 
3.
`[.data.table`(data, , `:=`(V4, merge(data[, V3, with = F], test, 
    by.x = V3, by.y = V1, all.x = T, sort = F, all.y = F)[, 2])) 
2.
data[, `:=`(V4, merge(data[, V3, with = F], test, by.x = V3, 
    by.y = V1, all.x = T, sort = F, all.y = F)[, 2])] 
1.
fillInFields(DT1, "V1", "V2", "V3", "V4")

Any ideas?
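The second error has the same root cause, one level deeper. The sketch below assumes the DT1 example from the question; peek is a hypothetical helper. It shows that inside j the name V3 resolves to the numeric column, not to the argument "V3". In the last line of the function, the inner data[, V3, with = F] is evaluated inside the := expression, where V3 is the column. The call therefore becomes a request for columns by number, c(1, NA, 2, NA, 3, 3, 3, 4, 5, 6). Item 8 of that vector is 4, which exceeds ncol = 3, and that matches the error message:

```r
library(data.table)

DT1 <- data.table(V1 = c(1,3,3,NA,5,6,7,8,9,10),
                  V2 = LETTERS[c(1,1,3,4,5,6,7,NA,9,10)],
                  V3 = c(1,NA,2,NA,3,3,3,4,5,6))

# Hypothetical helper: the argument V3 has the same name as a column of data
peek <- function(data, V3) {
  list(
    outside = V3,          # evaluated in the function body: the string "V3"
    inside  = data[, V3]   # evaluated as j: the column V3 masks the argument
  )
}

p <- peek(DT1, "V3")
print(p$outside)  # "V3"
print(p$inside)   # 1 NA 2 NA 3 3 3 4 5 6
```

Renaming the arguments so they cannot collide with column names, or doing the lookups outside data[...], avoids the clash.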

Answer 1

Score: 0


There are several scoping issues with the current approach, because the same names are used both for the function arguments and for the columns of the data.table: inside data[...], a column such as V1 masks the function argument V1. A suggestion is to rename the function arguments. Here is a rewrite of your function:

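fillInFields <- function(data, idcol, vcol, newidcol, newvcol) {
    # rows whose id (idcol) is non-missing and occurs exactly once
    nondup <- data[{
        x <- get(idcol)
        !is.na(x) & !(duplicated(x) | duplicated(x, fromLast=TRUE))
    }]
    # update join: match data[[newidcol]] against nondup[[idcol]] and copy
    # the matching vcol values into the new column newvcol by reference
    data[nondup, on=paste0(newidcol,"==",idcol), (newvcol) := get(paste0("i.", vcol))]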
}
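For the example data, the paste0() calls in this rewrite expand to on = "V3==V1" and i.V2. The sketch below writes those names out literally. It assumes the DT1 from the question, and keep is a hypothetical intermediate name. It makes the two steps visible. First it keeps the rows whose V1 is non-missing and occurs exactly once. Then it runs an update join that matches DT1$V3 against those V1 values and copies the matching V2 into the new column V4 by reference:

```r
library(data.table)

DT1 <- data.table(V1 = c(1,3,3,NA,5,6,7,8,9,10),
                  V2 = LETTERS[c(1,1,3,4,5,6,7,NA,9,10)],
                  V3 = c(1,NA,2,NA,3,3,3,4,5,6))

# Step 1: rows whose V1 is not NA and appears exactly once
# (duplicated() from both ends flags every copy of a repeated value)
x <- DT1$V1
keep <- !is.na(x) & !(duplicated(x) | duplicated(x, fromLast = TRUE))
nondup <- DT1[keep]
print(nondup$V1)  # 1 5 6 7 8 9 10

# Step 2: update join; each DT1 row whose V3 equals some nondup$V1 gets
# that nondup row's V2 (referred to as i.V2) in the new column V4
DT1[nondup, on = "V3==V1", V4 := i.V2]
print(DT1)
```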

Example usage:

library(data.table)
DT1 <- data.table(V1 = c(1,3,3,NA,5,6,7,8,9,10), V2 = LETTERS[c(1,1,3,4,5,6,7,NA,9,10)], V3 = c(1,NA,2,NA,3,3,3,4,5,6))
fillInFields(DT1, "V1", "V2", "V3", "V4")[]

Output:

    V1   V2 V3   V4
 1:  1    A  1    A
 2:  3    A NA <NA>
 3:  3    C  2 <NA>
 4: NA    D NA <NA>
 5:  5    E  3 <NA>
 6:  6    F  3 <NA>
 7:  7    G  3 <NA>
 8:  8 <NA>  4 <NA>
 9:  9    I  5    E
10: 10    J  6    F
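Another way around the scoping problem, closer to the question's original structure, is to do every lookup outside data[...]. The sketch below uses a hypothetical fillInFieldsMatch whose arguments are named so they cannot collide with columns. It swaps the second merge() for base R's match(), which returns one position per row of data. The new column therefore lines up with the original row order without relying on sort = FALSE. It assigns with data.table's set(), which does no non-standard evaluation:

```r
library(data.table)

# Hypothetical variant: all lookups happen outside data[...],
# so no argument can be masked by a column of the same name
fillInFieldsMatch <- function(data, idcol, vcol, keycol, newcol) {
  ids  <- data[[idcol]]
  # keep ids that are non-missing and occur exactly once
  # (mirrors the question's NA and duplicate filter)
  keep <- !is.na(ids) & !(ids %in% ids[duplicated(ids)])
  # position of each keycol value among the kept ids (NA when there is no match)
  pos  <- match(data[[keycol]], ids[keep])
  # add or overwrite newcol by reference
  set(data, j = newcol, value = data[[vcol]][keep][pos])
  data[]
}

DT1 <- data.table(V1 = c(1,3,3,NA,5,6,7,8,9,10),
                  V2 = LETTERS[c(1,1,3,4,5,6,7,NA,9,10)],
                  V3 = c(1,NA,2,NA,3,3,3,4,5,6))
fillInFieldsMatch(DT1, "V1", "V2", "V3", "V4")
```

For this data it should fill V4 with the same values as the answer's output above. The update join in the answer avoids the intermediate vectors and is the more idiomatic data.table style.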
