Creating a function from a merge script using data.table in R

Question

I have the following code (which runs as expected):

library("data.table")

DT1 <- data.table(V1 = c(1,3,3,NA,5,6,7,8,9,10), V2 = LETTERS[c(1,1,3,4,5,6,7,NA,9,10)], V3 = c(1,NA,2,NA,3,3,3,4,5,6))

test <- merge(DT1[,c("V1","V2")],unique(DT1[,"V3"]),by.x="V1",by.y = "V3")

test <- test[!is.na(V1),]

test <- test[!V1 %in% V1[which(duplicated(test$V1))]]

DT1[, V4 := merge(DT1[,"V3"],test,by.x = "V3", by.y = "V1", all.x=T, sort= F, all.y = F)[,2]]

I want to turn this into a function:

fillInFields <- function(data,V1,V2,V3,V4){
  test <- merge(data[,c(V1,V2)],unique(data[,V3]),by.x=V1,by.y = V3)
  
  test <- test[!is.na(cat(V1)),]
  
  test <- test[!cat(V1) %in% cat(V1)[which(duplicated(test[,V1]))]]
  
  data[, cat(V4) := merge(data[,V3],test,by.x = V3, by.y = V1, all.x=T, sort= F, all.y = F)[,2]]
  
}

However, when I run:

DT1 <- data.table(V1 = c(1,3,3,NA,5,6,7,8,9,10), V2 = LETTERS[c(1,1,3,4,5,6,7,NA,9,10)], V3 = c(1,NA,2,NA,3,3,3,4,5,6))

fillInFields(DT1,"V1","V2","V3","V4")

I get the following error (traceback included):

 Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column 
7.
stop(ngettext(sum(bad), "'by' must specify a uniquely valid column", 
    "'by' must specify uniquely valid columns"), domain = NA) 
6.
fix.by(by.x, x) 
5.
merge.data.frame(as.data.frame(x), as.data.frame(y), ...) 
4.
merge(as.data.frame(x), as.data.frame(y), ...) 
3.
merge.default(data[, c(V1, V2)], unique(data[, V3]), by.x = V1, 
    by.y = V3) 
2.
merge(data[, c(V1, V2)], unique(data[, V3]), by.x = V1, by.y = V3) 
1.
fillInFields(DT1, "V1", "V2", "V3", "V4") 

Can anyone see what I am doing wrong? All help appreciated, thanks in advance!

After making some changes following the comment from chinsoon12, I now have the following function:

fillInFields <- function(data,V1,V2,V3,V4){
  test <- merge(data[,c(V1,V2),with=F],unique(data[,V3,with=F]),by.x=V1,by.y = V3)
  
  test <- test[!is.na(V1),]
  
  test <- test[!V1 %in% V1[which(duplicated(test[,V1]))]]
  
  return(data[, V4 := merge(data[,V3,with=F],test,by.x = V3, by.y = V1, all.x=T, sort= F, all.y = F)[,2]])
  
}

I now get the following error:

 Error in `[.data.table`(data, , V3, with = F) : 
  Item 8 of j is 4 which is outside the column number range [1,ncol=3] 
9.
stop("Item ", which.first(w), " of j is ", j[which.first(w)], 
    " which is outside the column number range [1,ncol=", ncol(x), 
    "]") 
8.
`[.data.table`(data, , V3, with = F) 
7.
data[, V3, with = F] 
6.
merge(data[, V3, with = F], test, by.x = V3, by.y = V1, all.x = T, 
    sort = F, all.y = F) 
5.
eval(jsub, SDenv, parent.frame()) 
4.
eval(jsub, SDenv, parent.frame()) 
3.
`[.data.table`(data, , `:=`(V4, merge(data[, V3, with = F], test, 
    by.x = V3, by.y = V1, all.x = T, sort = F, all.y = F)[, 2])) 
2.
data[, `:=`(V4, merge(data[, V3, with = F], test, by.x = V3, 
    by.y = V1, all.x = T, sort = F, all.y = F)[, 2])] 
1.
fillInFields(DT1, "V1", "V2", "V3", "V4") 

Any ideas?

Answer 1

Score: 0

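The current approach has several scoping problems because the same names are used for both the function arguments and the columns of the data.table. Inside data[...], a name such as V3 resolves to the column V3 before it resolves to the function argument. In the second attempt, the inner data[, V3, with = F] runs inside the j of the outer data[...] call, so V3 is the column's values rather than the string "V3"; its eighth value, 4, is not a valid column number, hence the error. I suggest renaming the function arguments. Here is a rewrite of your function: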

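fillInFields <- function(data, idcol, vcol, newidcol, newvcol) {
    # Keep rows whose id is non-missing and occurs exactly once
    nondup <- data[{
        x <- get(idcol)
        !is.na(x) & !(duplicated(x) | duplicated(x, fromLast=TRUE))
    }]
    # Update join: match data[[newidcol]] to nondup[[idcol]] and assign
    # nondup's vcol (prefixed with i.) to the new column by reference
    data[nondup, on=paste0(newidcol,"==",idcol), (newvcol) := get(paste0("i.", vcol))]
}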
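
The scoping issue can be reproduced in isolation. The following is only a minimal sketch: the table DT and the helpers show_masking and show_lookup are hypothetical names made up for illustration and are not part of the original code. It shows that a function argument named like a column is hidden by that column inside data[...], while a differently named argument can be used for lookup.

```r
library(data.table)

# Small illustrative table (hypothetical, not from the question)
DT <- data.table(V1 = 1:3, V2 = c("a", "b", "c"))

# Hypothetical helper: the argument is deliberately named like a column.
show_masking <- function(data, V1) {
  # Inside `[.data.table`, V1 resolves to the column V1 first,
  # so the string passed in as the argument is never used here.
  data[, V1]
}

# Hypothetical helper: the argument name does not clash with any column.
show_lookup <- function(data, colname) {
  # colname is an ordinary character string, so [[ ]] selects that column.
  data[[colname]]
}

masked <- show_masking(DT, "V2")   # values of column V1, not V2
looked_up <- show_lookup(DT, "V2") # values of column V2
```

Renaming the arguments, as the rewrite does with idcol, vcol, newidcol and newvcol, avoids the clash as long as no column carries those names.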

Example usage:

library(data.table)
DT1 <- data.table(V1 = c(1,3,3,NA,5,6,7,8,9,10), V2 = LETTERS[c(1,1,3,4,5,6,7,NA,9,10)], V3 = c(1,NA,2,NA,3,3,3,4,5,6))

fillInFields(DT1, "V1", "V2", "V3", "V4")[]

Output:

    V1   V2 V3   V4
 1:  1    A  1    A
 2:  3    A NA <NA>
 3:  3    C  2 <NA>
 4: NA    D NA <NA>
 5:  5    E  3 <NA>
 6:  6    F  3 <NA>
 7:  7    G  3 <NA>
 8:  8 <NA>  4 <NA>
 9:  9    I  5    E
10: 10    J  6    F
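
For comparison, here is a sketch of what the function amounts to when the column names are written out literally instead of being passed as strings. It assumes the same DT1 as in the example and uses the on = .(V3 = V1) form of the join condition in place of the pasted "V3==V1" string; treat it as an illustration of the update-join idea rather than as part of the original answer.

```r
library(data.table)

DT1 <- data.table(
  V1 = c(1, 3, 3, NA, 5, 6, 7, 8, 9, 10),
  V2 = LETTERS[c(1, 1, 3, 4, 5, 6, 7, NA, 9, 10)],
  V3 = c(1, NA, 2, NA, 3, 3, 3, 4, 5, 6)
)

# Keep rows whose V1 is non-missing and occurs exactly once.
nondup <- DT1[!is.na(V1) & !(duplicated(V1) | duplicated(V1, fromLast = TRUE))]

# Update join: match DT1$V3 against nondup$V1 and copy nondup's V2 into V4.
# The i. prefix refers to columns of the table in the i position (nondup).
# Rows of DT1 without a match keep NA in V4.
DT1[nondup, on = .(V3 = V1), V4 := i.V2]

DT1[]
```

Because := modifies DT1 by reference, no merge result has to be re-aligned with the original row order, which is what the all.x = T, sort = F arguments in the merge-based version were for.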

huangapple
  • Published on January 6, 2020, 22:55:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59614247.html