Creating function from merge script using data.table in R
Question
I have the following code (which runs as expected):
library("data.table")
DT1 <- data.table(V1 = c(1,3,3,NA,5,6,7,8,9,10), V2 = LETTERS[c(1,1,3,4,5,6,7,NA,9,10)], V3 = c(1,NA,2,NA,3,3,3,4,5,6))
test <- merge(DT1[,c("V1","V2")],unique(DT1[,"V3"]),by.x="V1",by.y = "V3")
test <- test[!is.na(V1),]
test <- test[!V1 %in% V1[which(duplicated(test$V1))]]
DT1[, V4 := merge(DT1[,"V3"],test,by.x = "V3", by.y = "V1", all.x=T, sort= F, all.y = F)[,2]]
I want to turn this into a function:
fillInFields <- function(data,V1,V2,V3,V4){
  test <- merge(data[,c(V1,V2)],unique(data[,V3]),by.x=V1,by.y = V3)
  test <- test[!is.na(cat(V1)),]
  test <- test[!cat(V1) %in% cat(V1)[which(duplicated(test[,V1]))]]
  data[, cat(V4) := merge(data[,V3],test,by.x = V3, by.y = V1, all.x=T, sort= F, all.y = F)[,2]]
}
However, when I run:
DT1 <- data.table(V1 = c(1,3,3,NA,5,6,7,8,9,10), V2 = LETTERS[c(1,1,3,4,5,6,7,NA,9,10)], V3 = c(1,NA,2,NA,3,3,3,4,5,6))
fillInFields(DT1,"V1","V2","V3","V4")
I get the following error (traceback included):
Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column
7. stop(ngettext(sum(bad), "'by' must specify a uniquely valid column",
       "'by' must specify uniquely valid columns"), domain = NA)
6. fix.by(by.x, x)
5. merge.data.frame(as.data.frame(x), as.data.frame(y), ...)
4. merge(as.data.frame(x), as.data.frame(y), ...)
3. merge.default(data[, c(V1, V2)], unique(data[, V3]), by.x = V1,
       by.y = V3)
2. merge(data[, c(V1, V2)], unique(data[, V3]), by.x = V1, by.y = V3)
1. fillInFields(DT1, "V1", "V2", "V3", "V4")
Can anyone see what I am doing wrong? All help appreciated, thanks in advance!
After making some changes following the comment from chinsoon12, I now have the following function:
fillInFields <- function(data,V1,V2,V3,V4){
  test <- merge(data[,c(V1,V2),with=F],unique(data[,V3,with=F]),by.x=V1,by.y = V3)
  test <- test[!is.na(V1),]
  test <- test[!V1 %in% V1[which(duplicated(test[,V1]))]]
  return(data[, V4 := merge(data[,V3,with=F],test,by.x = V3, by.y = V1, all.x=T, sort= F, all.y = F)[,2]])
}
I now get the following error:
Error in `[.data.table`(data, , V3, with = F) :
Item 8 of j is 4 which is outside the column number range [1,ncol=3]
9. stop("Item ", which.first(w), " of j is ", j[which.first(w)],
       " which is outside the column number range [1,ncol=", ncol(x),
       "]")
8. `[.data.table`(data, , V3, with = F)
7. data[, V3, with = F]
6. merge(data[, V3, with = F], test, by.x = V3, by.y = V1, all.x = T,
       sort = F, all.y = F)
5. eval(jsub, SDenv, parent.frame())
4. eval(jsub, SDenv, parent.frame())
3. `[.data.table`(data, , `:=`(V4, merge(data[, V3, with = F], test,
       by.x = V3, by.y = V1, all.x = T, sort = F, all.y = F)[, 2]))
2. data[, `:=`(V4, merge(data[, V3, with = F], test, by.x = V3,
       by.y = V1, all.x = T, sort = F, all.y = F)[, 2])]
1. fillInFields(DT1, "V1", "V2", "V3", "V4")
Any ideas?
Answer 1
Score: 0
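The current approach has several scoping problems because the same names are used for the function arguments and for the columns of the data.table. Inside data[...], a bare name such as V3 is looked up among the columns first, so the inner data[, V3, with = F] most likely received the values of column V3 (whose 8th element is 4) rather than the string "V3", which would explain the second error. One suggestion is to rename the function arguments so they cannot clash with column names. Here is a rewrite of your function: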
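fillInFields <- function(data, idcol, vcol, newidcol, newvcol) {
  # keep rows whose idcol value is non-NA and occurs exactly once
  nondup <- data[{
    x <- get(idcol)
    !is.na(x) & !(duplicated(x) | duplicated(x, fromLast=TRUE))
  }]
  # update join: match data[[newidcol]] against nondup[[idcol]] and
  # copy vcol from nondup into the new column, by reference
  data[nondup, on=paste0(newidcol,"==",idcol), (newvcol) := get(paste0("i.", vcol))]
}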
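The last line of the function is an update join. As a minimal sketch of that one step in isolation, the toy tables below (target, lookup, and all of their column names are made up for illustration and do not come from the question) show how the on string and the i. prefix fit together:

```r
library(data.table)

# Hypothetical toy tables, not taken from the question.
target <- data.table(ref = c(1, 2, 3))
lookup <- data.table(id = c(1, 3), label = c("x", "y"))

# "ref==id" pairs target$ref (left side, the table being updated) with
# lookup$id (right side, the table passed in i).
# The i. prefix selects the column from lookup; := adds the new column
# to target by reference, leaving NA where no match is found.
target[lookup, on = "ref==id", label_from_lookup := i.label]

print(target)
```

In fillInFields the same pattern is built from strings with paste0(), which is why the new column name is wrapped in parentheses and the source column is fetched with get().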
Example usage:
library(data.table)
DT1 <- data.table(V1 = c(1,3,3,NA,5,6,7,8,9,10), V2 = LETTERS[c(1,1,3,4,5,6,7,NA,9,10)], V3 = c(1,NA,2,NA,3,3,3,4,5,6))
fillInFields(DT1, "V1", "V2", "V3", "V4")[]
output:
    V1   V2 V3   V4
 1:  1    A  1    A
 2:  3    A NA <NA>
 3:  3    C  2 <NA>
 4: NA    D NA <NA>
 5:  5    E  3 <NA>
 6:  6    F  3 <NA>
 7:  7    G  3 <NA>
 8:  8 <NA>  4 <NA>
 9:  9    I  5    E
10: 10    J  6    F
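The name clash described at the start of this answer can be reproduced on its own. The sketch below uses two hypothetical helper functions (clash and renamed are illustrative names, not part of the question or of data.table) to show which object a bare name refers to inside data[...]:

```r
library(data.table)

DT <- data.table(V1 = c(10, 20, 30))

# Argument deliberately named like the column (the situation in the question).
clash <- function(data, V1) {
  # In j, V1 is looked up among the columns first, so this reports the
  # class of the numeric column, not of the character argument.
  data[, class(V1)]
}

# Hypothetical renamed argument: no column is called idcol, so the string
# reaches the code intact and get() turns it into the column explicitly.
renamed <- function(data, idcol) {
  data[, sum(get(idcol))]
}

clash_class <- clash(DT, "V1")
renamed_sum <- renamed(DT, "V1")
print(clash_class)
print(renamed_sum)
```

Here clash() sees the numeric column even though a string was passed in, while renamed() uses the string as intended, which is the reason for the argument names idcol, vcol, newidcol and newvcol in the rewrite.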