Conditional calculation of new variable based on specific groups of rows, row values and columns in R dataframe

huangapple go评论63阅读模式
英文:

Conditional calculation of new variable based on specific groups of rows, row values and columns in R dataframe

问题

示例数据

我在R中有以下调查数据集,并需要帮助计算特定新变量的条件计算。

# 加载包
library(tidyverse)

# 设置种子以保证可复现性
set.seed(123)

# 创建数据:步骤1
df <- tibble(
  country = c(rep("A", 10), rep("B", 10)),
  respondent_id = 1:20,
  vote_choice = c(sample(c("PartyA", "PartyB", "PartyC"), 10, replace = TRUE),
                  sample(c("PartyD", "PartyE", "PartyF"), 10, replace = TRUE)),
  ptv_1 = runif(20, min = 0, max = 1) %>% round(., 3),
  ptv_2 = runif(20, min = 0, max = 1) %>% round(., 3),
  ptv_3 = runif(20, min = 0, max = 1) %>% round(., 3)
)

# 创建数据:步骤2
df <- df %>% 
  group_by(vote_choice, country) %>%
  summarize(across(starts_with("ptv"), \(x) mean(x, na.rm = TRUE))) %>%
  pivot_longer(cols = starts_with("ptv"), names_to = "party_to_ptv", values_to = "average_value") %>%
  group_by(vote_choice, country) %>%
  slice_max(order_by = average_value) %>%
  ungroup() %>%
  mutate(average_value = NULL) %>%
  right_join(., df, by = c("vote_choice", "country"))

# 检查数据
df

变量信息:

  • country 包含了我示例数据集中的2个国家,每个国家有10个受访者,并且在上次选举中,受访者可以在3个不同的政党中进行选择(实际数据还包含一个名为year的变量,为了简单起见,我没有包含它)
  • respondent_id 是调查数据集中的受访者编号,表明数据集是以受访者为单位的,但在其他方面可以忽略不计
  • vote_choice 以名称表示受访者在上次选举中投票的政党
  • ptv_1ptv_2ptv_3 表示每个可选政党的受访者对该政党的倾向(在实际数据中,受访者当然更倾向于他们投票的政党);范围:0-1
  • party_to_ptv 是一个转换列表,指示 vote_choice 中的哪个政党对应于 ptv_*

问题描述

现在,我需要计算一组(3个)名为 electoral_opportunites_* 的新变量,其中 * 是指代三个 PTV 的占位符。这个想法是根据其他政党选民的有利倾向来计算政党获得新选民的机会。

为了做到这一点,我需要计算:1 - (sqrt(所投政党的PTV) - sqrt(其他政党的PTV)),其思想是将自己所支持政党的支持力量与新政党进行比较。例如,如果一个受访者强烈支持他们自己的政党A,PTV = 1.0,那么他们对政党B的倾向为 PTV = 0.4 并不那么重要。

我在计算中遇到的问题是条件性:我需要为每个受访者找到与他们选择的政党相对应的 PTV 列的值(可能不是行中最高的 PTV 值),然后从中减去另一列的平方根值。

手动计算,我会按照以下方式对示例数据集进行操作。

预期结果(对于 electoral_opportunities_1

df %>% 
  mutate(electoral_potential_1 = 
           # 减法:PTV(所投政党)- PTV(PTV列1)...
           c(1 - ( sqrt(0.799) - sqrt(0.691) ),
             1 - ( sqrt(0.810) - sqrt(0.544) ),
             1 - ( sqrt(0.794) - sqrt(0.289) ),
             1 - ( sqrt(0.440) - sqrt(0.147) ),
             1 - ( sqrt(0.754) - sqrt(0.963) ),
             NA, # ...除非它们都是相同的。
             NA,
             NA,
             NA,
             NA,
             1 - ( sqrt(0.220) - sqrt(0.478) ),
             1 - ( sqrt(0.352) - sqrt(0.318) ),
             1 - ( sqrt(0.668) - sqrt(0.415) ),
             1 - ( sqrt(0.418) - sqrt(0.414) ),
             NA,
             NA,
             NA,
             1 - ( sqrt(0.753) - sqrt(0.216) ),
             1 - ( sqrt(0.374) - sqrt(0.232) ),
             1 - ( sqrt(0.665) - sqrt(0.143) )) ) -> df

df

作为一个小细节,之后我会检查是否有任何值大于1,并将它们限制在1以内,这意味着如果受访者更倾向于他们实际没有投票的政党,该政党将在其吸引该选民的选举机会方面获得最高分(1)。

df %>% 
  mutate(electoral_opportunities_1 = ifelse(electoral_opportunities_1 > 1, 1, electoral_opportunities_1)) -> df

df

我无法手动完成所有这些操作。因此,我将非常感谢您提供一个高效且整洁的解决方案,以计算各个 PTV 列的选举机会。我尝试了许多不同的方法,包括对数据框进行旋转,但迄今为止都没有成功。总的来说,这个过程是:

  • 获取与 vote_choice 对应的 PTV 列的值。
  • 从该值中减去给定 PTV 列的值,以计算该政党的选举机会。
  • 除非两个政党相同,否则将值设置为 NA。
  • 然后,检查是否有任何值大于1,并将它们限制在1以内。

编辑

我刚刚注意到,在最终的数据框中,我当然需要计算 vote_choice 中每个政党的平均选举机会,而不是三个单独的列!

英文:

Example Data

I have got the following survey dataset in R and need help with the conditional calculation of a specific new variable.

# Load package
library(tidyverse)

# Important: set seed for replicability
set.seed(123)

# Create data: step 1
df &lt;- tibble(
  country = c(rep(&quot;A&quot;, 10), rep(&quot;B&quot;, 10)),
  respondent_id = 1:20,
  vote_choice = c(sample(c(&quot;PartyA&quot;, &quot;PartyB&quot;, &quot;PartyC&quot;), 10, replace = TRUE),
                  sample(c(&quot;PartyD&quot;, &quot;PartyE&quot;, &quot;PartyF&quot;), 10, replace = TRUE)),
  ptv_1 = runif(20, min = 0, max = 1) %&gt;% round(., 3),
  ptv_2 = runif(20, min = 0, max = 1) %&gt;% round(., 3),
  ptv_3 = runif(20, min = 0, max = 1) %&gt;% round(., 3)
)

# Create data: step 2
df &lt;- df %&gt;% 
  group_by(vote_choice, country) %&gt;%
  summarize(across(starts_with(&quot;ptv&quot;), \(x) mean(x, na.rm = TRUE))) %&gt;%
  pivot_longer(cols = starts_with(&quot;ptv&quot;), names_to = &quot;party_to_ptv&quot;, values_to = &quot;average_value&quot;) %&gt;%
  group_by(vote_choice, country) %&gt;%
  slice_max(order_by = average_value) %&gt;%
  ungroup() %&gt;%
  mutate(average_value = NULL) %&gt;%
  right_join(., df, by = c(&quot;vote_choice&quot;, &quot;country&quot;))

# Inspect data
df

# A tibble: 20 &#215; 7
   vote_choice country party_to_ptv respondent_id ptv_1 ptv_2 ptv_3
   &lt;chr&gt;       &lt;chr&gt;   &lt;chr&gt;                &lt;int&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
 1 PartyA      A       ptv_2                   10 0.691 0.799 0.710 
 2 PartyB      A       ptv_3                    4 0.544 0.233 0.810 
 3 PartyB      A       ptv_3                    6 0.289 0.266 0.794
 4 PartyB      A       ptv_3                    7 0.147 0.858 0.440 
 5 PartyB      A       ptv_3                    8 0.963 0.046 0.754
 6 PartyC      A       ptv_1                    1 0.994 0.369 0.274
 7 PartyC      A       ptv_1                    2 0.656 0.152 0.815
 8 PartyC      A       ptv_1                    3 0.709 0.139 0.449
 9 PartyC      A       ptv_1                    5 0.594 0.466 0.812
10 PartyC      A       ptv_1                    9 0.902 0.442 0.629
11 PartyD      B       ptv_3                   13 0.478 0.207 0.220 
12 PartyD      B       ptv_3                   16 0.318 0.895 0.352
13 PartyD      B       ptv_3                   19 0.415 0.095 0.668
14 PartyD      B       ptv_3                   20 0.414 0.384 0.418
15 PartyE      B       ptv_1                   11 0.795 0.122 0.001
16 PartyE      B       ptv_1                   12 0.025 0.561 0.475
17 PartyE      B       ptv_1                   14 0.758 0.128 0.380 
18 PartyF      B       ptv_2                   15 0.216 0.753 0.613
19 PartyF      B       ptv_2                   17 0.232 0.374 0.111
20 PartyF      B       ptv_2                   18 0.143 0.665 0.244

Information on the variables:

  • country encompasses 2 countries in my example df which each contain 10 respondents and a set of 3 distinct political parties respondents got to choose between at the last election (the real data also contain a variable year which I did not include for the sake of simplicity)
  • respondent_id refers to the respondent in the survey dataset and demonstrates that the dataset is at respondent-level but can otherwise be ignored
  • vote_choice denotes by name the party the respondent voted for at the last election
  • ptv_1, ptv_2, and ptv_3 indicate for each party that is available the leaning of each respondent to this party (in the real data, respondents of course lean more strongly to the party they voted for); scale: 0-1
  • party_to_ptv is a conversion list that indicates which party in vote_choice corresponds to which ptv_* column

Problem Description

I now need to calculate a set of (3) new variables called electoral_opportunites_* where the asterisk is a placeholder for 1-3 refering to the three PTVs. The idea is to calculate the changes parties have of gaining new voters based on the favorable leaning of other parties' voters.

To do so, I need to calculate: 1 - (sqrt(PTV of party voted for) - sqrt(PTV of other party)), the idea of which is to set the strength of support of one's own party in relation to a new party. For example, if a respondent strongly supports their own party, A, by PTV = 1.0, it doesn't really matter that much that they also lean to B by PTV = 0.4.

My problem with the calculation is the conditionality: I need to find rowwise for each respondent the PTV column value that corresponds to their party of choice (which may not be the highest PTV value in the row), and then subtract from it the square-rooted value of another column.

Manually, I would do it as follows for the example df.

Expected Outcome (for electoral_opportunities_1)

df %&gt;% 
  mutate(electoral_potential_1 = 
           # Subtract: PTV (party voted for) - PTV (PTV column 1)...
           c(1 - ( sqrt(0.799) - sqrt(0.691) ),
             1 - ( sqrt(0.810) - sqrt(0.544) ),
             1 - ( sqrt(0.794) - sqrt(0.289) ),
             1 - ( sqrt(0.440) - sqrt(0.147) ),
             1 - ( sqrt(0.754) - sqrt(0.963) ),
             NA, # ...unless they are both the same.
             NA,
             NA,
             NA,
             NA,
             1 - ( sqrt(0.220) - sqrt(0.478) ),
             1 - ( sqrt(0.352) - sqrt(0.318) ),
             1 - ( sqrt(0.668) - sqrt(0.415) ),
             1 - ( sqrt(0.418) - sqrt(0.414) ),
             NA,
             NA,
             NA,
             1 - ( sqrt(0.753) - sqrt(0.216) ),
             1 - ( sqrt(0.374) - sqrt(0.232) ),
             1 - ( sqrt(0.665) - sqrt(0.143) )) ) -&gt; df

df

As a minor detail, I would afterwards check if there are any values > 1 and cap them at 1, which means that if respondents are leaning more strongly to a party they did not actually vote for, said party will receive the highest score (1) in terms of its eletoral changes to pursuade that voter.

df %&gt;% 
  mutate(electoral_opportunities_1 = ifelse(electoral_opportunities_1 &gt; 1, 1, electoral_opportunities_1)) -&gt; df

I cannot do all of this by hand. Hence I would be grateful for an efficient and tidy solution to calculate the electoral opportunities for individual PTV columns. I have tried many different approaches, including pivoting the df, none of which have worked so far. Taken together, the process is:

  • Take the value of the PTV column that corresponds to vote_choice.
  • Subtract from this value the value of a given PTV column to calculate that party's electoral opportunities.
  • Unless both parties are the same, in which case set the value to NA.
  • Then, check if there are any values > 1 and cap them at 1.

EDIT

I just noticed that in the final df I would of course need the average electoral opportunity for each party in vote_choice, instead of three separate columns!

答案1

得分: 1

继续你的解决方案,

df %>%
  mutate(
    ptv_v = case_when(
      party_to_ptv == "ptv_1" ~ ptv_1,
      party_to_ptv == "ptv_2" ~ ptv_2,
      party_to_ptv == "ptv_3" ~ ptv_3,
      TRUE ~ NA_real_
    ),
    opportunity_1 = ifelse(ptv_1 > ptv_v, 1, 1 - (sqrt(ptv_v) - sqrt(ptv_1))),
    opportunity_2 = ifelse(ptv_2 > ptv_v, 1, 1 - (sqrt(ptv_v) - sqrt(ptv_2))),
    opportunity_3 = ifelse(ptv_3 > ptv_v, 1, 1 - (sqrt(ptv_v) - sqrt(ptv_3))),
  ) %>%
  mutate_at(vars(starts_with("opportunity")), ~ifelse(party_to_ptv == substr(., start = 14, stop = 18), NA, .)) %>%
  group_by(vote_choice) %>%
  summarise(avg_opportunity = mean(c(opportunity_1, opportunity_2, opportunity_3), na.rm = TRUE))

现在得到的结果是,

vote_choice avg_opportunity
  <chr>                 <dbl>
1 PartyA                0.962
2 PartyB                0.813
3 PartyC                0.836
4 PartyD                0.937
5 PartyE                0.759
6 PartyF                0.816

初始尝试:

library(tidyverse) 

df %>%
  pivot_longer(cols = starts_with("ptv"), 
               names_to = "ptv", 
               values_to = "ptv_value") %>%
  group_by(respondent_id) %>%
  mutate(voted_party_ptv = ptv_value[party_to_ptv == ptv]) %>%
  ungroup() %>%
  mutate(electoral_opportunity = ifelse(party_to_ptv != ptv, 
                                        pmin(1, 1 - (sqrt(voted_party_ptv) - sqrt(ptv_value))), 
                                        NA)) %>% 
  select(-c(voted_party_ptv, ptv_value)) %>%
  pivot_wider(names_from = ptv, 
              values_from = electoral_opportunity, 
              names_prefix = "electoral_opportunity_") %>%
  mutate(avg_electoral_opportunity = rowMeans(select(., starts_with("electoral_opportunity")), na.rm = TRUE))

得到的结果是:

vote_choice country party_to_ptv respondent_id electoral_opportunity_ptv_1 electoral_opportunity_ptv_2 electoral_opportunity_ptv_3 avg_electoral_opportunity
   <chr>       <chr>   <chr>                <int>                       <dbl>                       <dbl>                      <dbl>                   <dbl>
 1 PartyA      A       ptv_2                   10                       0.937                      NA                          0.949                   0.943
 2 PartyB      A       ptv_3                    4                       0.838                       0.583                     NA                       0.710
 3 PartyB      A       ptv_3                    6                       0.647                       0.625                     NA                       0.636
 4 PartyB      A       ptv_3                    7                       0.720                       1                         NA                       0.860
 5 PartyB      A       ptv_3                    8                       1                           0.346                     NA                       0.673
 6 PartyC      A       ptv_1                    1                      NA                           0.610                      0.526                   0.568
 7 PartyC      A       ptv_1                    2                      NA                           0.580                      1                       0.790
 8 PartyC      A       ptv_1                    3                      NA                           0.531                      0.828                   0.679
 9 PartyC      A       ptv_1                    5                      NA                           0.912                      1                       0.956
10 PartyC      A       ptv_1                    9                      NA                           0.715                      0.843                   0.779
11 PartyD      B       ptv_3                   13                       1                           0.986                     NA                       0.993
12 PartyD      B       ptv_3                   16                       0.971                       1                         NA                       0.985
13 PartyD      B       ptv_3                   19                       0.827                       0.491                     NA                       0.659
14 PartyD      B       ptv_3                   20                       0.997                       0.973                     NA                       0.985
15 PartyE      B       ptv_1                   11                      NA                           0.458                      0.140                   0.299
16 PartyE      B       ptv_1                   12                      NA                           1                          1                       1    
17 PartyE      B       ptv_1                   14                      NA                           0.487                      0.746                   0.616
18 PartyF      B       ptv_2                   15                       0.597                      NA                          0.915                   0.756
19 PartyF      B       ptv_2                   17                       0.870                      NA                          0.722                   0.796
20 PartyF      B       ptv_2                   18                       0.563                      NA                          0.678                   0.621

你可以省略你不需要的任何列。

英文:

Continuing form your solution,

df %&gt;%
  mutate(
    ptv_v = case_when(
      party_to_ptv == &quot;ptv_1&quot; ~ ptv_1,
      party_to_ptv == &quot;ptv_2&quot; ~ ptv_2,
      party_to_ptv == &quot;ptv_3&quot; ~ ptv_3,
      TRUE ~ NA_real_
    ),
    opportunity_1 = ifelse(ptv_1 &gt; ptv_v, 1, 1 - (sqrt(ptv_v) - sqrt(ptv_1))),
    opportunity_2 = ifelse(ptv_2 &gt; ptv_v, 1, 1 - (sqrt(ptv_v) - sqrt(ptv_2))),
    opportunity_3 = ifelse(ptv_3 &gt; ptv_v, 1, 1 - (sqrt(ptv_v) - sqrt(ptv_3))),
  ) %&gt;%
  mutate_at(vars(starts_with(&quot;opportunity&quot;)), ~ifelse(party_to_ptv == substr(., start = 14, stop = 18), NA, .)) %&gt;%
  group_by(vote_choice) %&gt;%
  summarise(avg_opportunity = mean(c(opportunity_1, opportunity_2, opportunity_3), na.rm = TRUE))

which now gives,

vote_choice avg_opportunity
  &lt;chr&gt;                 &lt;dbl&gt;
1 PartyA                0.962
2 PartyB                0.813
3 PartyC                0.836
4 PartyD                0.937
5 PartyE                0.759
6 PartyF                0.816

Initial attempt

library(tidyverse) 

df %&gt;%
  pivot_longer(cols = starts_with(&quot;ptv&quot;), 
               names_to = &quot;ptv&quot;, 
               values_to = &quot;ptv_value&quot;) %&gt;%
  group_by(respondent_id) %&gt;%
  mutate(voted_party_ptv = ptv_value[party_to_ptv == ptv]) %&gt;%
  ungroup() %&gt;%
  mutate(electoral_opportunity = ifelse(party_to_ptv != ptv, 
                                        pmin(1, 1 - (sqrt(voted_party_ptv) - sqrt(ptv_value))), 
                                        NA)) %&gt;% 
  select(-c(voted_party_ptv, ptv_value)) %&gt;%
  pivot_wider(names_from = ptv, 
              values_from = electoral_opportunity, 
              names_prefix = &quot;electoral_opportunity_&quot;) %&gt;%
  mutate(avg_electoral_opportunity = rowMeans(select(., starts_with(&quot;electoral_opportunity&quot;)), na.rm = TRUE))

which gives:

vote_choice country party_to_ptv respondent_id electoral_opportunity_ptv_1 electoral_opportunity_ptv_2 electoral_opportunity_pt…&#185; avg_e…&#178;
   &lt;chr&gt;       &lt;chr&gt;   &lt;chr&gt;                &lt;int&gt;                       &lt;dbl&gt;                       &lt;dbl&gt;                      &lt;dbl&gt;   &lt;dbl&gt;
 1 PartyA      A       ptv_2                   10                       0.937                      NA                          0.949   0.943
 2 PartyB      A       ptv_3                    4                       0.838                       0.583                     NA       0.710
 3 PartyB      A       ptv_3                    6                       0.647                       0.625                     NA       0.636
 4 PartyB      A       ptv_3                    7                       0.720                       1                         NA       0.860
 5 PartyB      A       ptv_3                    8                       1                           0.346                     NA       0.673
 6 PartyC      A       ptv_1                    1                      NA                           0.610                      0.526   0.568
 7 PartyC      A       ptv_1                    2                      NA                           0.580                      1       0.790
 8 PartyC      A       ptv_1                    3                      NA                           0.531                      0.828   0.679
 9 PartyC      A       ptv_1                    5                      NA                           0.912                      1       0.956
10 PartyC      A       ptv_1                    9                      NA                           0.715                      0.843   0.779
11 PartyD      B       ptv_3                   13                       1                           0.986                     NA       0.993
12 PartyD      B       ptv_3                   16                       0.971                       1                         NA       0.985
13 PartyD      B       ptv_3                   19                       0.827                       0.491                     NA       0.659
14 PartyD      B       ptv_3                   20                       0.997                       0.973                     NA       0.985
15 PartyE      B       ptv_1                   11                      NA                           0.458                      0.140   0.299
16 PartyE      B       ptv_1                   12                      NA                           1                          1       1    
17 PartyE      B       ptv_1                   14                      NA                           0.487                      0.746   0.616
18 PartyF      B       ptv_2                   15                       0.597                      NA                          0.915   0.756
19 PartyF      B       ptv_2                   17                       0.870                      NA                          0.722   0.796
20 PartyF      B       ptv_2                   18                       0.563                      NA                          0.678   0.621

You can omit any columns you don't need

答案2

得分: 0

好的,以下是我翻译好的内容:

好的,这显然比我最初想的要简单。以下是我解决问题最大部分的步骤。

library(tidyverse)

df %>%
  mutate(ptv_v = case_when(party_to_ptv == "ptv_1" ~ ptv_1,
                           party_to_ptv == "ptv_2" ~ ptv_2,
                           party_to_ptv == "ptv_3" ~ ptv_3,
                           T ~ NA_real_)) %>%
  mutate(electoral_opportunity_1 = ifelse(ptv_1 > ptv_v, 1, 1 - (sqrt(ptv_v) - sqrt(ptv_1)) ) %>% ifelse(party_to_ptv == "ptv_1", NA, .),
         electoral_opportunity_2 = ifelse(ptv_2 > ptv_v, 1, 1 - (sqrt(ptv_v) - sqrt(ptv_2)) ) %>% ifelse(party_to_ptv == "ptv_2", NA, .),
         electoral_opportunity_3 = ifelse(ptv_3 > ptv_v, 1, 1 - (sqrt(ptv_v) - sqrt(ptv_3)) ) %>% ifelse(party_to_ptv == "ptv_3", NA, .) ) -> df

现在我只需要计算每个投票选择中的平均选举机会。这有点棘手,我还在努力弄清楚我实际上想要什么。这有点笨拙,但我认为它能实现我想要的效果:

df %>%
  mutate(opportunity = case_when(is.na(electoral_opportunity_1) ~ mean(electoral_opportunity_1, na.rm = T),
                                 is.na(electoral_opportunity_2) ~ mean(electoral_opportunity_2, na.rm = T),
                                 is.na(electoral_opportunity_3) ~ mean(electoral_opportunity_3, na.rm = T),
                                 T ~ NA_real_)) -> df

df %>%
  group_by(vote_choice, country) %>%
  summarize(opportunity = mean(opportunity, na.rm = T))

结果如下:

  vote_choice country opportunity
  <chr>       <chr>         <dbl>
1 PartyA      A             0.706
2 PartyB      A             0.779
3 PartyC      A             0.830
4 PartyD      B             0.779
5 PartyE      B             0.830
6 PartyF      B             0.706
英文:

Okay, it is apparently more straight forward than I initially thought. Here is what I have done to solve the biggest part of the problem.

library(tidyverse)

df %&gt;% 
  mutate(ptv_v = case_when(party_to_ptv == &quot;ptv_1&quot; ~ ptv_1,
                                    party_to_ptv == &quot;ptv_2&quot; ~ ptv_2,
                                    party_to_ptv == &quot;ptv_3&quot; ~ ptv_3,
                                    T ~ NA_real_)) %&gt;% 
  mutate(electoral_opportunity_1 = ifelse(ptv_1 &gt; ptv_v, 1, 1 - (sqrt(ptv_v) - sqrt(ptv_1)) ) %&gt;% ifelse(party_to_ptv == &quot;ptv_1&quot;, NA, .),
         electoral_opportunity_2 = ifelse(ptv_2 &gt; ptv_v, 1, 1 - (sqrt(ptv_v) - sqrt(ptv_2)) ) %&gt;% ifelse(party_to_ptv == &quot;ptv_2&quot;, NA, .),
         electoral_opportunity_3 = ifelse(ptv_3 &gt; ptv_v, 1, 1 - (sqrt(ptv_v) - sqrt(ptv_3)) ) %&gt;% ifelse(party_to_ptv == &quot;ptv_3&quot;, NA, .) ) -&gt; df

Now I just need to get the average electoral opportunity for each party in vote_choice. It's tricky and I'm still triyng to figure out what I actually want. This is all a bit clumsy but I think it does what I want:

df %&gt;%
  mutate(opportunity = case_when(is.na(electoral_opportunity_1) ~ mean(electoral_opportunity_1, na.rm = T),
                                 is.na(electoral_opportunity_2) ~ mean(electoral_opportunity_2, na.rm = T),
                                 is.na(electoral_opportunity_3) ~ mean(electoral_opportunity_3, na.rm = T),
                                 T ~ NA_real_)) -&gt; df

df %&gt;% 
  group_by(vote_choice, country) %&gt;%
  summarize(opportunity = mean(opportunity, na.rm = T))

  vote_choice country opportunity
  &lt;chr&gt;       &lt;chr&gt;         &lt;dbl&gt;
1 PartyA      A             0.706
2 PartyB      A             0.779
3 PartyC      A             0.830
4 PartyD      B             0.779
5 PartyE      B             0.830
6 PartyF      B             0.706

huangapple
  • 本文由 发表于 2023年7月27日 17:49:38
  • 转载请务必保留本文链接:https://go.coder-hub.com/76778513.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定