Creating a function from a merge script using data.table in R

Question


I have the following code (which runs as expected):

    library("data.table")
    DT1 <- data.table(V1 = c(1,3,3,NA,5,6,7,8,9,10), V2 = LETTERS[c(1,1,3,4,5,6,7,NA,9,10)], V3 = c(1,NA,2,NA,3,3,3,4,5,6))
    test <- merge(DT1[,c("V1","V2")],unique(DT1[,"V3"]),by.x="V1",by.y = "V3")
    test <- test[!is.na(V1),]
    test <- test[!V1 %in% V1[which(duplicated(test$V1))]]
    DT1[, V4 := merge(DT1[,"V3"],test,by.x = "V3", by.y = "V1", all.x=T, sort= F, all.y = F)[,2]]

I want to turn it into a function:

    fillInFields <- function(data,V1,V2,V3,V4){
      test <- merge(data[,c(V1,V2)],unique(data[,V3]),by.x=V1,by.y = V3)
      test <- test[!is.na(cat(V1)),]
      test <- test[!cat(V1) %in% cat(V1)[which(duplicated(test[,V1]))]]
      data[, cat(V4) := merge(data[,V3],test,by.x = V3, by.y = V1, all.x=T, sort= F, all.y = F)[,2]]
    }

However, when I run:

    DT1 <- data.table(V1 = c(1,3,3,NA,5,6,7,8,9,10), V2 = LETTERS[c(1,1,3,4,5,6,7,NA,9,10)], V3 = c(1,NA,2,NA,3,3,3,4,5,6))
    fillInFields(DT1,"V1","V2","V3","V4")

I get the following error (traceback included):

    Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column
    7. stop(ngettext(sum(bad), "'by' must specify a uniquely valid column",
           "'by' must specify uniquely valid columns"), domain = NA)
    6. fix.by(by.x, x)
    5. merge.data.frame(as.data.frame(x), as.data.frame(y), ...)
    4. merge(as.data.frame(x), as.data.frame(y), ...)
    3. merge.default(data[, c(V1, V2)], unique(data[, V3]), by.x = V1,
           by.y = V3)
    2. merge(data[, c(V1, V2)], unique(data[, V3]), by.x = V1, by.y = V3)
    1. fillInFields(DT1, "V1", "V2", "V3", "V4")

Can anyone see what I am doing wrong? All help appreciated, thanks in advance!

After doing some alterations following the comment from chinsoon12 I now have the following function:

    fillInFields <- function(data,V1,V2,V3,V4){
      test <- merge(data[,c(V1,V2),with=F],unique(data[,V3,with=F]),by.x=V1,by.y = V3)
      test <- test[!is.na(V1),]
      test <- test[!V1 %in% V1[which(duplicated(test[,V1]))]]
      return(data[, V4 := merge(data[,V3,with=F],test,by.x = V3, by.y = V1, all.x=T, sort= F, all.y = F)[,2]])
    }

I now get the following error:

    Error in `[.data.table`(data, , V3, with = F) :
      Item 8 of j is 4 which is outside the column number range [1,ncol=3]
    9. stop("Item ", which.first(w), " of j is ", j[which.first(w)],
           " which is outside the column number range [1,ncol=", ncol(x),
           "]")
    8. `[.data.table`(data, , V3, with = F)
    7. data[, V3, with = F]
    6. merge(data[, V3, with = F], test, by.x = V3, by.y = V1, all.x = T,
           sort = F, all.y = F)
    5. eval(jsub, SDenv, parent.frame())
    4. eval(jsub, SDenv, parent.frame())
    3. `[.data.table`(data, , `:=`(V4, merge(data[, V3, with = F], test,
           by.x = V3, by.y = V1, all.x = T, sort = F, all.y = F)[, 2]))
    2. data[, `:=`(V4, merge(data[, V3, with = F], test, by.x = V3,
           by.y = V1, all.x = T, sort = F, all.y = F)[, 2])]
    1. fillInFields(DT1, "V1", "V2", "V3", "V4")

Any ideas?

Answer 1

Score: 0


The current approach has a lot of scoping issues because the same names are used for the function arguments and for the columns in the data.table. I suggest renaming the function arguments. Here is a rewrite of your function:

    fillInFields <- function(data, idcol, vcol, newidcol, newvcol) {
        nondup <- data[{
            x <- get(idcol)
            !is.na(x) & !(duplicated(x) | duplicated(x, fromLast=TRUE))
        }]
        data[nondup, on=paste0(newidcol,"==",idcol), (newvcol) := get(paste0("i.", vcol))]
    }
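
To make the scoping clash concrete, here is a minimal sketch. The helper names `shadowed` and `renamed` and the toy table `DT` are made up for illustration and are not part of the original code. Inside `[...]`, data.table evaluates `j` with the columns in scope, so an argument that shares a column's name is masked by that column.

```r
library(data.table)

# Toy table, for illustration only.
DT <- data.table(V1 = c(10, 20), V2 = c("a", "b"))

# Hypothetical helper: the argument V1 has the same name as a column,
# so the bare name V1 in j refers to the column, not to the argument.
shadowed <- function(data, V1) data[, V1]

# Hypothetical helper: the argument has a distinct name and the column
# it names is looked up explicitly with get().
renamed <- function(data, col) data[, get(col)]

shadowed(DT, "V2")  # the values of column V1: 10 20
renamed(DT, "V2")   # the values of column V2: "a" "b"
```

This is likely also what the second traceback in the question reflects: the inner `data[, V3, with = F]` sits inside the `j` of the outer call, where `V3` resolves to the column's values rather than to the string passed as the argument.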

Example usage:

    library(data.table)
    DT1 <- data.table(V1 = c(1,3,3,NA,5,6,7,8,9,10), V2 = LETTERS[c(1,1,3,4,5,6,7,NA,9,10)], V3 = c(1,NA,2,NA,3,3,3,4,5,6))
    fillInFields(DT1, "V1", "V2", "V3", "V4")[]

Output:

        V1   V2 V3   V4
     1:  1    A  1    A
     2:  3    A NA <NA>
     3:  3    C  2 <NA>
     4: NA    D NA <NA>
     5:  5    E  3 <NA>
     6:  6    F  3 <NA>
     7:  7    G  3 <NA>
     8:  8 <NA>  4 <NA>
     9:  9    I  5    E
    10: 10    J  6    F
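
The last line of the function is an update join. As a rough sketch of how that step works on its own, the example below uses two small tables, `lookup` and `target`, with column names `id`, `label`, `ref` and `matched`; all of these are hypothetical and chosen only for illustration.

```r
library(data.table)

# Hypothetical lookup table: one label per unique id.
lookup <- data.table(id = c(1, 5, 6), label = c("A", "E", "F"))

# Hypothetical target table whose `ref` column points at `lookup$id`.
target <- data.table(ref = c(1, 2, 5, 6, NA))

# Update join: for every row of `target` whose `ref` equals an `id` in
# `lookup`, copy that row's `label` (the i. prefix refers to columns of
# the table passed in i) into the new column `matched`.
# Rows of `target` without a match are left as NA.
target[lookup, on = "ref==id", matched := i.label]

print(target)
```

Because `:=` assigns by reference, `target` itself is modified here, just as `DT1` is modified in place by `fillInFields`. If the original table must stay untouched, one option is to pass `copy(DT1)` instead.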

huangapple
  • Posted on January 6, 2020, 22:55:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59614247.html