Count number of filled out columns in R dataframe to create new column.

Question

I have a data frame df_team with multiple team columns, not all of which have values. Here is a sample; the real data has over 100 of these Team_ columns:

dput(df_team)

structure(list(Project = c("etwbv", "werg", "sdfg", "qwreg", 
"cae", "refdc"), Team_1 = c("ewrg", "werg", "asd", "qwe", NA, 
"vsfd"), Team_URL_1 = c("abc", "bfh", "fse", "rege", NA, "vsefr"
), Team_2 = c("abc1", "bfh", "fse", "rege1", NA, NA), Team_URL_2 = c("abc", 
"bfh", "fse", "rege", NA, NA), Team_3 = c("abc1", "bfh", NA, 
NA, NA, NA), Team_URL_3 = c("abc", "bfh", NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-6L))

I want to create a new column Team_Count that counts the filled Team_x columns (only Team_x, not Team_URL_x) and ignores the NAs, so that I know how many team members each Project has.

The new dataframe should look like this:

Project Team_1 Team_URL_1 Team_2 Team_URL_2 Team_3 Team_URL_3 Team_Count
etwbv   ewrg   abc        abc1   abc        abc1   abc        3
werg    werg   bfh        bfh    bfh        bfh    bfh        3
sdfg    asd    fse        fse    fse        NA     NA         2
qwreg   qwe    rege       rege1  rege       NA     NA         2
cae     NA     NA         NA     NA         NA     NA         0
refdc   vsfd   vsefr      NA     NA         NA     NA         1

Answer 1

Score: 2

library(dplyr)

# count the non-NA values across the Team_<digits> columns in each row
df_team %>% 
  mutate(Team_Count = rowSums(!is.na(pick(matches("Team_\\d+")))))

If you are using a dplyr version older than 1.1.0, replace pick() with across().

This works by selecting the columns whose names match the regular expression "Team_\d+" ("Team_" followed by one or more digits, so the Team_URL_x columns are excluded), turning each value into TRUE (filled) or FALSE (NA) with !is.na(), and then counting the TRUE values in each row with rowSums().
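To make the intermediate steps visible, here is a base R sketch of the same logic. It is an equivalent re-expression rather than the answer's exact code: grepl() stands in for matches() and plain column indexing stands in for pick(). It rebuilds df_team from the question's dput() output:

```r
# Rebuild the example data from the question's dput() output
df_team <- data.frame(
  Project    = c("etwbv", "werg", "sdfg", "qwreg", "cae", "refdc"),
  Team_1     = c("ewrg", "werg", "asd", "qwe", NA, "vsfd"),
  Team_URL_1 = c("abc", "bfh", "fse", "rege", NA, "vsefr"),
  Team_2     = c("abc1", "bfh", "fse", "rege1", NA, NA),
  Team_URL_2 = c("abc", "bfh", "fse", "rege", NA, NA),
  Team_3     = c("abc1", "bfh", NA, NA, NA, NA),
  Team_URL_3 = c("abc", "bfh", NA, NA, NA, NA)
)

# Step 1: keep the columns whose names contain "Team_" followed by digits;
# Team_URL_x does not match because "Team_" is followed by "U" there
team_cols <- df_team[grepl("Team_\\d+", names(df_team))]
names(team_cols)   # "Team_1" "Team_2" "Team_3"

# Step 2: logical matrix, TRUE where a cell is filled, FALSE where it is NA
filled <- !is.na(team_cols)

# Step 3: rowSums() treats TRUE as 1, so this counts filled cells per row
df_team$Team_Count <- rowSums(filled)
df_team$Team_Count   # 3 3 2 2 0 1
```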

Output

  Project Team_1 Team_URL_1 Team_2 Team_URL_2 Team_3 Team_URL_3 Team_Count
1   etwbv   ewrg        abc   abc1        abc   abc1        abc          3
2    werg   werg        bfh    bfh        bfh    bfh        bfh          3
3    sdfg    asd        fse    fse        fse   <NA>       <NA>          2
4   qwreg    qwe       rege  rege1       rege   <NA>       <NA>          2
5     cae   <NA>       <NA>   <NA>       <NA>   <NA>       <NA>          0
6   refdc   vsfd      vsefr   <NA>       <NA>   <NA>       <NA>          1
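If you would rather not load dplyr, the same count can be computed in base R. The sketch below wraps it in a helper, count_filled(), which is a hypothetical name introduced here for illustration. It also anchors the pattern as "^Team_\\d+$", an assumption that goes beyond the answer, so that only names consisting of exactly "Team_" plus digits are counted (a column named, say, Old_Team_1 would be ignored):

```r
# Hypothetical helper: counts the non-NA cells per row among the columns
# whose names match `pattern` (anchored, so e.g. "Old_Team_1" is not matched).
count_filled <- function(df, pattern = "^Team_\\d+$") {
  selected <- df[grepl(pattern, names(df))]  # stays a data frame even with 0 or 1 match
  rowSums(!is.na(selected))                  # TRUE counts as 1
}

# Example data from the question's dput() output
df_team <- data.frame(
  Project    = c("etwbv", "werg", "sdfg", "qwreg", "cae", "refdc"),
  Team_1     = c("ewrg", "werg", "asd", "qwe", NA, "vsfd"),
  Team_URL_1 = c("abc", "bfh", "fse", "rege", NA, "vsefr"),
  Team_2     = c("abc1", "bfh", "fse", "rege1", NA, NA),
  Team_URL_2 = c("abc", "bfh", "fse", "rege", NA, NA),
  Team_3     = c("abc1", "bfh", NA, NA, NA, NA),
  Team_URL_3 = c("abc", "bfh", NA, NA, NA, NA)
)

df_team$Team_Count <- count_filled(df_team)
df_team$Team_Count  # 3 3 2 2 0 1
```

Note that is.na() only treats real NA values as missing. If the data had been read in with empty strings ("") in the blank cells, those cells would be counted as filled.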

huangapple
  • Published on March 21, 2023 at 01:07:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75793271.html