Using an outside function in dplyr to standardize values by a selected geometric mean (computed per sample, not over the full column)


Question


Good evening, fellow programmers, statisticians, etc.

I'm trying to standardize a set of variables by dividing them by the geometric mean of a set of variables (the same ones or others) that I'm using as a reference. The problem is that when I do this via dplyr, I get results that don't match what I get when I compute it case by case.

Below is some code showing what I have done and why it failed. It seems that dplyr is not taking my values per sample (row-wise) and is instead using the full column to compute the geometric mean.

I have reviewed some questions, including some about geometric means, but so far I have not found a solution.

    # A set of functions I'm using to calculate the geometric mean.
    gm_mean = function(x, na.rm=TRUE){
      exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))
    }
    gm_mean2 = function(x, na.rm=TRUE){
      # Pass the na.rm argument through instead of hard-coding TRUE.
      exp(mean(log(x[x > 0]), na.rm=na.rm))
    }
    # And also psych::geometric.mean()

    x <- c(4, 8, 9, 9, 12, 14, 17)
    gm_mean(x)  # It works as intended.
    gm_mean2(x)  # It works as intended.
    psych::geometric.mean(x)  # Indeed, it works.
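A side note for readers: the two helpers agree on the all-positive vector above, but they are not interchangeable in general. The sketch below reproduces them so it runs on its own and uses a hypothetical input `y` with one zero appended (not part of the original question) to show where they diverge: `gm_mean` divides by the full length, while `gm_mean2` averages only over the positive values.

```r
# The two helpers from the question, reproduced so the snippet is self-contained.
gm_mean  <- function(x, na.rm = TRUE) exp(sum(log(x[x > 0]), na.rm = na.rm) / length(x))
gm_mean2 <- function(x, na.rm = TRUE) exp(mean(log(x[x > 0]), na.rm = na.rm))

x <- c(4, 8, 9, 9, 12, 14, 17)
all.equal(gm_mean(x), gm_mean2(x))  # the helpers agree when every value is positive

# Hypothetical input with one zero appended: the helpers now disagree.
y <- c(x, 0)
gm_mean(y)   # sum of 7 logs divided by length(y) = 8
gm_mean2(y)  # mean of the 7 logs only, same as gm_mean2(x)
```

Which behavior is preferable depends on how zeros should be treated in the data, so it may be worth deciding that explicitly before picking a helper.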

So, using the iris dataset, I want to standardize a set of columns (coln1) by dividing them by the geometric mean of another set of columns. (I would like to store that second set in a variable too, but since I can't get it to work even without one, I'm naming the columns directly for now.)

So far I have tried this (and failed):

    library(dplyr)
    coln1 <- colnames(iris)[1:2]
    iris %>% mutate(across( any_of(coln1),  ~ .x / psych::geometric.mean(c(Sepal.Length,Sepal.Width)) )) ## Does it work as intended? No, not at all.

    # Let me illustrate. Is the value we get doing it case by case equal to the output?
    iris[1,1] / psych::geometric.mean(c(iris[1,1],iris[1,2]))
    # 1.207 != 1.2187
    iris[1,1] / psych::geometric.mean(c(iris$Sepal.Length,iris$Sepal.Width))
    # 1.2187 == 1.2187
    # It's taking the full column of values, all of them, and not the values corresponding to that sample (2 in this case, but we could have more or fewer variables by changing what goes into psych::geometric.mean()).
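To make the diagnosis above concrete, here is a minimal sketch of what an ungrouped `mutate()` hands to the function. The helper `gm` and the column names `n_ref` and `gm_ref` are hypothetical names introduced only for this illustration; `gm` stands in for `psych::geometric.mean()` so the snippet needs only dplyr.

```r
library(dplyr)

# Hypothetical helper: plain geometric mean of positive values.
gm <- function(x) exp(mean(log(x)))

res <- iris %>%
  mutate(
    n_ref  = length(c(Sepal.Length, Sepal.Width)),  # how many values the helper receives
    gm_ref = gm(c(Sepal.Length, Sepal.Width))       # one number, recycled to every row
  )

unique(res$n_ref)           # 300: both full columns, not 2 values per row
length(unique(res$gm_ref))  # 1: every row is divided by the same constant
```

Inside an ungrouped `mutate()`, a bare column name refers to the whole column, so `c(Sepal.Length, Sepal.Width)` concatenates 150 + 150 values rather than the two values of the current row.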


    # Notes.
    # The geometric mean is the nth root of the product of n values, or e to the mean of log(x). It is useful for describing non-normal, i.e., geometric, distributions. We are using it via psych:: because values could be negative and we would need to handle that.

    # iris %>% mutate(across( any_of(coln1),  ~ .x / exp(mean(log(Sepal.Length+Sepal.Width)))   )) # No, because this takes the log of the sum, a single value, instead of averaging the logs of the two values.
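The two definitions in the note, and the reason the commented-out attempt is something different, can be checked on the first iris row (5.1 and 3.5). This is only an illustrative sketch; the variable names `v`, `root_of_product`, `exp_mean_log` and `log_of_sum` are assumptions made for the example.

```r
# First iris row: Sepal.Length = 5.1, Sepal.Width = 3.5
v <- c(5.1, 3.5)

root_of_product <- prod(v)^(1 / length(v))  # nth root of the product of n values
exp_mean_log    <- exp(mean(log(v)))        # e to the mean of the logs
all.equal(root_of_product, exp_mean_log)    # both definitions give the same number

# The commented-out attempt logs the SUM of the two values, so for a single
# row exp(log(a + b)) just returns a + b, which is not a geometric mean.
log_of_sum <- exp(log(v[1] + v[2]))
all.equal(log_of_sum, sum(v))
```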

Answer 1

Score: 1

I think you've done a great job setting it up; it's really just 'rowwise()' that you're missing! I've rearranged the logic in the mutate call, but it's basically just rowwise.

    coln1 <- colnames(iris)[3:4]
    
    iris %>% 
      rowwise() %>%
      mutate(geo.mean = psych::geometric.mean(c(Sepal.Length,Sepal.Width)),
             across(.cols = all_of(coln1), .fns = ~ .x / geo.mean, .names = '{.col}_{.fn}'))
        
    # A tibble: 150 x 8
    # Rowwise: 
       Sepal.Length Sepal.Width Petal.Length Petal.Width Species geo.mean Petal.Length_1 Petal.Width_1
              <dbl>       <dbl>        <dbl>       <dbl> <fct>      <dbl>          <dbl>         <dbl>
     1          5.1         3.5          1.4         0.2 setosa      4.22          0.331        0.0473

    # prove it's functioning correctly with the first entry:
    1.4 / psych::geometric.mean(c(5.1, 3.5))
    [1] 0.3313667
    
    0.2 / psych::geometric.mean(c(5.1, 3.5))
    [1] 0.04733811
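The question also asks for the reference columns to live in a variable. A possible sketch of that, not part of the original answer: `ref_cols`, `target_cols`, the helper `gm` and the `_std` suffix are all hypothetical names chosen for illustration, and `gm` stands in for `psych::geometric.mean()`. It shows a row-wise version using `c_across()` and, as an alternative, a vectorized version that avoids `rowwise()` by averaging the logged columns with `rowMeans()`.

```r
library(dplyr)

ref_cols    <- c("Sepal.Length", "Sepal.Width")  # columns forming the reference
target_cols <- c("Petal.Length", "Petal.Width")  # columns to standardize

# Hypothetical helper: geometric mean of positive values.
gm <- function(x) exp(mean(log(x)))

# Row-wise version: c_across() collects the reference columns of the current row.
by_row <- iris %>%
  rowwise() %>%
  mutate(geo.mean = gm(c_across(all_of(ref_cols)))) %>%
  ungroup() %>%  # drop the row-wise grouping once the per-row mean exists
  mutate(across(all_of(target_cols), ~ .x / geo.mean, .names = "{.col}_std"))

# Vectorized version: log each reference column, then average across columns per row.
vectorized <- iris %>%
  mutate(geo.mean = exp(rowMeans(across(all_of(ref_cols), log))),
         across(all_of(target_cols), ~ .x / geo.mean, .names = "{.col}_std"))

all.equal(by_row$Petal.Length_std, vectorized$Petal.Length_std)
```

On a data set as small as iris both forms are fine; the vectorized form is likely to scale better on large tables because it avoids calling the helper once per row. Both assume strictly positive reference values.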



huangapple
  • Posted on February 8, 2023, at 20:30:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75385828.html