How could one branch multiple variables under another variable in R?
Question
I'm trying to group my observations by a set of variables under another set of variables which is, finally, under a last set of variables. Here's what I have for example:
     country      name     ethnicity   party
     Afghanistan  john     Pashtun     X Party
     Afghanistan  oliver   Pashtun     Y Party
     Afghanistan  brad     Tajik       X Party
     Afghanistan  chad     Hazara      X Party
     Bosnia       virgin   Serb        P Party
     Bosnia       mary     Serb        P Party
     Bosnia       jesus    Croat       C Party
The result should list every ethnicity that exists in a country under each of that country's parties, count how many people of each ethnicity belong to each party, and look something like this:
     country      party     ethnicity   count
     Afghanistan  X Party   Pashtun     1
     Afghanistan  X Party   Tajik       1
     Afghanistan  X Party   Hazara      1
     Afghanistan  Y Party   Pashtun     1
     Afghanistan  Y Party   Tajik       0
     Afghanistan  Y Party   Hazara      0
     Bosnia       P Party   Serb        2
     Bosnia       P Party   Croat       0
     Bosnia       C Party   Serb        0
     Bosnia       C Party   Croat       1
So far I've tried the functions group_by and aggregate to no avail.
Answer 1
Score: 1
This is a really simple operation; please read this book: https://r4ds.had.co.nz/
library(data.table)
library(tidyverse)
df_example <- fread("country      name     ethnicity   party coolness
Afghanistan  john     Pashtun     X_Party     cool
Afghanistan  oliver   Pashtun     Y_Party     not_cool
Afghanistan  brad     Tajik       X_Party     cool
Afghanistan  chad     Hazara      X_Party     not_cool
Bosnia       virgin   Serb        P_Party     cool
Bosnia       mary     Serb        P_Party     cool
Bosnia       jesus    Croat       C_Party     not_cool",
                    header = TRUE)
df_example %>% 
  group_by(country,ethnicity,party) %>% 
  add_tally() %>% 
  select(-name) %>% # drop columns you don't need in the result
  distinct()
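For reference, the group_by / add_tally / select / distinct chain above can often be shortened to a single dplyr::count() call. The sketch below is only an illustration: it assumes the data frame is named df (a hypothetical name), uses the party labels from the question, and leaves out the extra coolness column. Like the chain above, it only returns combinations that actually occur in the data, so the zero-count rows asked for in the question are not produced.

```r
library(dplyr)

# Sample data from the question (df is a hypothetical name)
df <- data.frame(
  country   = c("Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",
                "Bosnia", "Bosnia", "Bosnia"),
  name      = c("john", "oliver", "brad", "chad", "virgin", "mary", "jesus"),
  ethnicity = c("Pashtun", "Pashtun", "Tajik", "Hazara", "Serb", "Serb", "Croat"),
  party     = c("X Party", "Y Party", "X Party", "X Party",
                "P Party", "P Party", "C Party")
)

# One row per observed country/party/ethnicity combination,
# with the number of people stored in a column called "count"
counts <- df %>%
  count(country, party, ethnicity, name = "count")

print(counts)
```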
Answer 2
Score: 1
You can use dplyr and tidyr:
df %>%
 count(!!!select(., -name)) %>%
 group_by(country) %>%
 complete(ethnicity, nesting(party), fill = list(n = 0))
   country     ethnicity party       n
   <chr>       <chr>     <fct>   <dbl>
 1 Afghanistan Hazara    X Party     1
 2 Afghanistan Hazara    Y Party     0
 3 Afghanistan Pashtun   X Party     1
 4 Afghanistan Pashtun   Y Party     1
 5 Afghanistan Tajik     X Party     1
 6 Afghanistan Tajik     Y Party     0
 7 Bosnia      Croat     C Party     1
 8 Bosnia      Croat     P Party     0
 9 Bosnia      Serb      C Party     0
10 Bosnia      Serb      P Party     2
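The snippet above relies on a df object that is not defined in the answer. A self-contained sketch of the same idea follows, assuming the question's sample data is stored in a data frame named df (a hypothetical name) and that the result column should be called count, as in the question. It names the grouping columns explicitly instead of splicing them in with !!!select(., -name), and passes party and ethnicity directly to complete(), which should yield the same combinations here because only one column was wrapped in nesting().

```r
library(dplyr)
library(tidyr)

# Sample data from the question (df is a hypothetical name)
df <- data.frame(
  country   = c("Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",
                "Bosnia", "Bosnia", "Bosnia"),
  name      = c("john", "oliver", "brad", "chad", "virgin", "mary", "jesus"),
  ethnicity = c("Pashtun", "Pashtun", "Tajik", "Hazara", "Serb", "Serb", "Croat"),
  party     = c("X Party", "Y Party", "X Party", "X Party",
                "P Party", "P Party", "C Party")
)

result <- df %>%
  # count people per observed country/party/ethnicity combination
  count(country, party, ethnicity, name = "count") %>%
  # complete within each country, so parties and ethnicities
  # from different countries are never combined
  group_by(country) %>%
  # add the missing party/ethnicity pairs and give them a count of 0
  complete(party, ethnicity, fill = list(count = 0L)) %>%
  ungroup()

print(result)
```

Grouping by country before complete() is what keeps the expansion inside each country; without it, every party would be paired with every ethnicity across both countries. The rows come back sorted alphabetically rather than in the order shown in the question.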