Change/Map levels of a data frame columns using other reference list in R

Question

I am trying to convert the columns of a data frame to the appropriate type, and set their factor levels, using a reference list.

Sample Data:

dt <- data.frame(a=c(1,2,3,4,5,6),b=c(50,10,20,30,99,190),c=c("a","b","c","d","e","f"))
ref_list <- list(a=c(1,2,3,4,5,6),b=NULL,c=c("a","b","c","d","e","f"))

str(dt)
'data.frame':	6 obs. of  3 variables:
 $ a: num  1 2 3 4 5 6
 $ b: num  50 10 20 30 99 190
 $ c: Factor w/ 6 levels "a","b","c","d",..: 1 2 3 4 5 6

Desired Output:

Convert a column to numeric if its entry in ref_list is NULL; otherwise convert it to a factor with the levels given in ref_list.

'data.frame':	6 obs. of  3 variables:
 $ a: Factor w/ 6 levels "1","2","3","4",..: 1 2 3 4 5 6
 $ b: num  50 10 20 30 99 190
 $ c: Factor w/ 6 levels "a","b","c","d",..: 1 2 3 4 5 6

Answer 1

Score: 2

I would do it like this: first identify the variables that need converting, i.e. those that are not NULL in the list and also appear in the data frame, then convert them in a quick for loop. Columns whose entry is NULL are left untouched, which is fine here because b is already numeric.

vars_to_convert = intersect(names(ref_list)[lengths(ref_list) > 0], names(dt))

for (var in vars_to_convert) {
  dt[[var]] = factor(dt[[var]], levels = ref_list[[var]])
}

str(dt)
# 'data.frame':	6 obs. of  3 variables:
#  $ a: Factor w/ 6 levels "1","2","3","4",..: 1 2 3 4 5 6
#  $ b: num  50 10 20 30 99 190
#  $ c: Factor w/ 6 levels "a","b","c","d",..: 1 2 3 4 5 6
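
The loop above only handles the factor branch of the rule. If a column with a NULL entry might not already be numeric, both branches could be applied together. The following is a minimal sketch under that assumption, not part of the original answer: apply_ref_types is a hypothetical helper name, and stringsAsFactors = TRUE is set explicitly on the assumption that the sample data should start with c as a factor (since R 4.0.0, data.frame() no longer converts strings to factors by default).

```r
# Hypothetical helper (illustration only): apply both branches of the rule.
apply_ref_types <- function(df, ref) {
  for (var in intersect(names(ref), names(df))) {
    lv <- ref[[var]]
    if (is.null(lv)) {
      # NULL entry: make the column numeric.
      # A factor is converted via its labels, not its integer codes.
      x <- df[[var]]
      df[[var]] <- if (is.factor(x)) as.numeric(as.character(x)) else as.numeric(x)
    } else {
      # Non-NULL entry: factor with exactly the reference levels.
      # Values that are not listed in lv become NA.
      df[[var]] <- factor(df[[var]], levels = lv)
    }
  }
  df
}

dt <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(50, 10, 20, 30, 99, 190),
  c = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = TRUE  # assumption: reproduce the factor column shown above
)
ref_list <- list(
  a = c(1, 2, 3, 4, 5, 6),
  b = NULL,
  c = c("a", "b", "c", "d", "e", "f")
)

dt2 <- apply_ref_types(dt, ref_list)
str(dt2)
```

Because factor() silently turns any value missing from the supplied levels into NA, it may be worth running anyNA() on the converted columns when the reference list comes from a different source than the data.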

huangapple
  • Posted on January 3, 2020, 23:24:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59581123.html