Call a variable created within a dplyr pipe - R

Question

Reproducible data:

library(dplyr)  # %>% and select()
library(tidyr)  # spread()

# Truncated exponential distributions
lambda_1 <- 1/2
lambda_2 <- 1/10
ff1 <- function(x) pexp(x, lambda_1)
f1.inv <- function(q) qexp(q, lambda_1)
ff2 <- function(x) pexp(x, lambda_2)
f2.inv <- function(q) qexp(q, lambda_2)
a <- 0
n <- 50
x1 <- f1.inv(runif(n))
x1.trunc <- f1.inv(runif(n, ff1(a)))
x2 <- f2.inv(runif(n))
x2.trunc <- f2.inv(runif(n, ff2(a)))
T_Phone <- c(x1.trunc, x2.trunc)

# Normal data - equal variances
Normal_1_Eq <- rnorm(n = 50, mean = 24.6, sd = .95)
Normal_2_Eq <- rnorm(n = 50, mean = 38, sd = 1.05)
Weight <- c(Normal_1_Eq, Normal_2_Eq)

# Normal data - unequal variances
Normal_1_Uneq <- rnorm(n = 50, mean = 24.6, sd = .23)
Normal_2_Uneq <- rnorm(n = 50, mean = 38, sd = 2.95)
Head_Circumference <- c(Normal_1_Uneq, Normal_2_Uneq)

# Poisson
Poisson_1 <- rpois(n = 50, lambda = 4.5)
Poisson_2 <- rpois(n = 50, lambda = 14.5)
Daily_Snacks <- c(Poisson_1, Poisson_2)

# Assign groups
Group <- rep(c("A", "B"), each = 50)
ID <- rep(c(1:50), each = 1, times = 2)

# Set into a data frame
df <- data.frame(ID, Group, Weight, Head_Circumference, Daily_Snacks, T_Phone)
df[,c(1:2)] <- lapply(df[,c(1:2)], as.factor)
df[,c(3:6)] <- lapply(df[,c(3:6)], as.numeric)
df <- df %>% janitor::clean_names()

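Problem:
I am working with the long-format data above and want to reshape it to wide format only when needed, inside a dplyr pipe chain. This works for the "weight" variable:

df %>% select(id, group, weight) %>% spread(key = "group", value = "weight") 

Now I want to call the newly created variables, A and B, and test homogeneity of variances between them:

df %>% select(id, group, weight) %>% spread(key = "group", value = "weight") %>%
  var.test(.$A, .$B)

However, in that last step (var.test(.$)) the only variables I can access are the ones I originally selected from df (e.g., id and group).

If I save the result to a new data frame:

t_frame <- df %>% select(id, group, weight) %>% spread(key = "group", value = "weight")
var.test(t_frame$A, t_frame$B)

then everything works. How can I make the newly created A and B variables available to var.test within the pipe?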

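Answer 1

Score: 3

Replace your last pipe step with:

%>% {var.test(.$A, .$B)}

Without the {}, the pipe passes the whole data frame as the first argument, so the call becomes var.test(., .$A, .$B). The curly braces suppress that behavior, so var.test receives only the columns you select with $.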
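
As a rough illustration of the point above, the sketch below compares the braced form with a few alternatives that also keep the data frame out of the first argument. It is not part of the original answer: the `wide` and `long` data frames, the seed, and the result names are hypothetical stand-ins for the reshaped `df`, and the `%$%` operator assumes the magrittr package is attached.

```r
library(magrittr)  # provides %>% and the exposition pipe %$%

set.seed(1)  # hypothetical seed, only so the sketch is repeatable

# Stand-in for the wide data frame produced by spread() in the question
wide <- data.frame(
  id = factor(1:50),
  A  = rnorm(50, mean = 24.6, sd = 0.95),
  B  = rnorm(50, mean = 38,   sd = 1.05)
)

# Reference result, computed outside any pipe
direct <- var.test(wide$A, wide$B)

# Without braces the call is equivalent to var.test(wide, wide$A, wide$B);
# try() keeps the script running if that call fails
unbraced <- try(wide %>% var.test(.$A, .$B), silent = TRUE)

# 1. Braces: the data frame is not inserted as the first argument
braced <- wide %>% {var.test(.$A, .$B)}

# 2. Exposition pipe: columns of the left-hand side are exposed by name
exposed <- wide %$% var.test(A, B)

# 3. with(): the piped data frame becomes the evaluation environment
with_res <- wide %>% with(var.test(A, B))

# 4. Formula interface on long data, which needs no reshaping at all
long <- data.frame(
  group  = factor(rep(c("A", "B"), each = 50)),
  weight = c(wide$A, wide$B)
)
formula_res <- var.test(weight ~ group, data = long)

# Compare the F statistics from all approaches
sapply(
  list(direct = direct, braced = braced, exposed = exposed,
       with = with_res, formula = formula_res),
  function(res) unname(res$statistic)
)
```

The braced form is the smallest change to the original pipeline. The formula interface may be worth considering when the data are already in long format, since it skips the reshape step entirely.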

huangapple
  • Posted on June 16, 2023 at 15:13:54
  • Please keep this link when reposting: https://go.coder-hub.com/76487770.html