Row-wise Boolean comparison of data

Question

I have grouped my data appropriately, and I need to make sure that the "x" and "y" values are equal for each unique combination of Group1 and Group2. In other words, what code could I use to go through this dataset and confirm that A1x == A1y, A2x == A2y, and so on?

Here is the example data:

Group1  Group2  group3  values
A       1       x       10
A       1       y       10
A       2       x       15
A       2       y       15

To make answering easier, here is the example as a data.frame:

d <- data.frame(Group1 = c("A", "A", "A", "A"),
                Group2 = c("1", "1", "2", "2"),
                group3 = c("x", "y", "x", "y"),
                values = c(10, 10, 15, 15))

Answer 1

Score: 4

With dplyr, you can do:

library(dplyr)

d %>%
  group_by(Group1, Group2) %>%
  mutate(cond = all(values == first(values)))

  Group1 Group2 group3 values cond 
  <fct>  <fct>  <fct>   <dbl> <lgl>
1 A      1      x          10 TRUE 
2 A      1      y          10 TRUE 
3 A      2      x          15 TRUE 
4 A      2      y          15 TRUE 

Or:

d %>%
  group_by(Group1, Group2) %>%
  mutate(cond = n_distinct(values) == 1)
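When the goal is to find the groups that fail rather than to flag every row, the same n_distinct() test can be used with summarise() to get one row per group. The following is a minimal sketch under that assumption; the data frame d2 and its mismatching group (B, 1) are hypothetical and were added only to show a FALSE result.

```r
library(dplyr)

# Hypothetical data: the example from the question plus a group (B, 1)
# whose x and y values differ.
d2 <- data.frame(
  Group1 = c("A", "A", "A", "A", "B", "B"),
  Group2 = c("1", "1", "2", "2", "1", "1"),
  group3 = c("x", "y", "x", "y", "x", "y"),
  values = c(10, 10, 15, 15, 20, 21)
)

# One row per (Group1, Group2) with a single logical flag.
group_check <- d2 %>%
  group_by(Group1, Group2) %>%
  summarise(all_equal = n_distinct(values) == 1, .groups = "drop")

# Keep only the groups where x and y disagree.
mismatched <- filter(group_check, !all_equal)
print(mismatched)
```

Here mismatched should contain only the (B, 1) group, which makes it easy to report or inspect the offending combinations.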

Answer 2

Score: 3

You can also do this with pivot_wider:

library(dplyr)  # provides %>%

tidyr::pivot_wider(d, names_from = 'group3', values_from = 'values') %>%
  dplyr::mutate(eq = x == y)
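Because values is a double column, an exact == comparison can report FALSE for numbers that differ only by floating-point rounding. If that matters for your data, one possible variation is dplyr's near(), which compares within a small tolerance. This is a sketch only; the data frame d_float is hypothetical and chosen to trigger the rounding issue.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: 0.1 + 0.2 and 0.3 are not bit-identical as doubles.
d_float <- data.frame(
  Group1 = c("A", "A"),
  Group2 = c("1", "1"),
  group3 = c("x", "y"),
  values = c(0.1 + 0.2, 0.3)
)

wide <- d_float %>%
  pivot_wider(names_from = group3, values_from = values) %>%
  mutate(eq_exact = x == y,      # exact comparison
         eq_near  = near(x, y))  # comparison within a small tolerance
print(wide)
```

For integer-like values such as those in the question, the exact comparison is sufficient.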

Answer 3

Score: 1

I think you went too far in reshaping your data into long format. The wide form may be easier to work with:

library(dplyr)
library(tidyr)

d %>%
  pivot_wider(names_from = group3, values_from = values) %>%
  mutate(is_equal = x == y)
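One thing to keep in mind with the wide form: if a group has an x row but no y row (or the reverse), pivot_wider fills the missing cell with NA, and x == y is then NA rather than FALSE. The sketch below shows one way to treat such groups as failures using coalesce(); the data frame d_gap and the column name is_equal_strict are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: group (B, 1) has an x row but no y row.
d_gap <- data.frame(
  Group1 = c("A", "A", "B"),
  Group2 = c("1", "1", "1"),
  group3 = c("x", "y", "x"),
  values = c(10, 10, 20)
)

wide_gap <- d_gap %>%
  pivot_wider(names_from = group3, values_from = values) %>%
  mutate(is_equal = x == y,                              # NA when one side is missing
         is_equal_strict = coalesce(is_equal, FALSE))    # count a missing side as a failure
print(wide_gap)
```

Whether an incomplete pair should count as a failure or be reported separately depends on the data, so this is a design choice rather than a rule.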

Answer 4

Score: 1

Here is a base R solution using ave():

d <- within(d, isequal <- as.logical(ave(values, Group1, Group2, FUN = function(v) length(unique(v)) == 1)))

such that

> d
  Group1 Group2 group3 values isequal
1      A      1      x     10    TRUE
2      A      1      y     10    TRUE
3      A      2      x     15    TRUE
4      A      2      y     15    TRUE

Note that ave() evaluates the function within each combination of Group1 and Group2, checking whether values holds a single distinct value there, and the result is stored in the isequal column.
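A subtlety with per-group tests inside ave(): an element-wise test of the form v == unique(v) returns all TRUE for a two-row group whose values differ, because unique(v) then has the same two elements in the same order. Testing length(unique(v)) == 1 avoids this. The sketch below contrasts the two on a hypothetical data frame d_bad with one mismatching group; the column names elementwise and single_value are illustrative only.

```r
# Hypothetical data: group (B, 1) has x = 20 and y = 21.
d_bad <- data.frame(
  Group1 = c("A", "A", "B", "B"),
  Group2 = c("1", "1", "1", "1"),
  group3 = c("x", "y", "x", "y"),
  values = c(10, 10, 20, 21)
)

# Element-wise comparison against unique(v): for (B, 1) this compares
# c(20, 21) with c(20, 21), so the mismatch goes unnoticed.
d_bad$elementwise <- as.logical(
  ave(d_bad$values, d_bad$Group1, d_bad$Group2,
      FUN = function(v) v == unique(v))
)

# One distinct value per group: a single TRUE/FALSE recycled over the group.
d_bad$single_value <- as.logical(
  ave(d_bad$values, d_bad$Group1, d_bad$Group2,
      FUN = function(v) length(unique(v)) == 1)
)
print(d_bad)
```

Only the single_value column flags the (B, 1) rows as unequal.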

Answer 5

Score: 0

Another option, if the data is sorted by group and each group has exactly 2 rows:

d$check <- rep(d$values[seq(1L, nrow(d), 2L)] == d$values[seq(2L, nrow(d), 2L)], each = 2L)
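The positional approach depends on the rows being ordered by group and on every group having exactly two rows. A minimal sketch of how those assumptions might be enforced before comparing is shown below; the shuffled data frame and the mismatching value 16 are hypothetical.

```r
# Hypothetical data: same structure as the question, but shuffled,
# and with a mismatch in group (A, 2).
d <- data.frame(
  Group1 = c("A", "A", "A", "A"),
  Group2 = c("2", "1", "2", "1"),
  group3 = c("y", "x", "x", "y"),
  values = c(16, 10, 15, 10)
)

# Sort so that each group's x and y rows are adjacent.
d <- d[order(d$Group1, d$Group2, d$group3), ]

# Stop early if any observed group does not have exactly two rows.
counts <- table(interaction(d$Group1, d$Group2, drop = TRUE))
stopifnot(all(counts == 2L))

# Compare each odd row with the row that follows it.
odd <- seq(1L, nrow(d), by = 2L)
d$check <- rep(d$values[odd] == d$values[odd + 1L], each = 2L)
print(d)
```

If the stopifnot() check fails, one of the group-based approaches is a safer choice than positional indexing.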

Answer 6

Score: -1

A simple way is to merge the sub-tables for group x and group y and compare their values.

> d[d$group3=="y",]

#      Group1 Group2 group3 values
#    2      A      1      y     10
#    4      A      2      y     15

> merge(d[d$group3=="y",], d[d$group3=="x",], by=c("Group1","Group2"))

#  Group1 Group2 group3.x values.x group3.y values.y
#  1      A      1        y       10        x       10
#  2      A      2        y       15        x       15

with(merge(d[d$group3=="y",], d[d$group3=="x",],
     by=c("Group1","Group2")),
     values.x==values.y)

## [1] TRUE TRUE

Of course there are fancier ways of doing it, but it is not bad to start simple.
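In the merge above, the y rows are passed first, so the column values.x actually holds the y values and values.y holds the x values. If that naming is confusing, one possible variation is to pass explicit suffixes to merge(). This is a sketch only; the names x_rows, y_rows, m and equal are illustrative.

```r
# The example data frame from the question.
d <- data.frame(
  Group1 = c("A", "A", "A", "A"),
  Group2 = c("1", "1", "2", "2"),
  group3 = c("x", "y", "x", "y"),
  values = c(10, 10, 15, 15)
)

# One subset per group3 level, keeping only the keys and the value.
x_rows <- d[d$group3 == "x", c("Group1", "Group2", "values")]
y_rows <- d[d$group3 == "y", c("Group1", "Group2", "values")]

# Explicit suffixes make the column names say which side they came from.
m <- merge(x_rows, y_rows, by = c("Group1", "Group2"),
           suffixes = c("_x", "_y"))
m$equal <- m$values_x == m$values_y
print(m)
```

The result has one row per Group1 and Group2 combination, with the comparison stored alongside the two values.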



huangapple
  • Published on January 3, 2020, 22:04:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59579928.html