Count the number of filled-out columns in an R data frame to create a new column
Question
I have a data frame `df_team` with many team columns, not all of which have values. Here is a small version; the real one has over 100 of these `Team_` columns:
dput(df_team)
structure(list(Project = c("etwbv", "werg", "sdfg", "qwreg",
"cae", "refdc"), Team_1 = c("ewrg", "werg", "asd", "qwe", NA,
"vsfd"), Team_URL_1 = c("abc", "bfh", "fse", "rege", NA, "vsefr"
), Team_2 = c("abc1", "bfh", "fse", "rege1", NA, NA), Team_URL_2 = c("abc",
"bfh", "fse", "rege", NA, NA), Team_3 = c("abc1", "bfh", NA,
NA, NA, NA), Team_URL_3 = c("abc", "bfh", NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-6L))
I want to create a new column `Team_Count` that counts the filled `Team_x` columns (not the `Team_URL_x` columns) and ignores the `NA`s, so that I know how many team members each `Project` has.
The new dataframe should look like this:
Project Team_1 Team_URL_1 Team_2 Team_URL_2 Team_3 Team_URL_3 Team_Count
etwbv ewrg abc abc1 abc abc1 abc 3
werg werg bfh bfh bfh bfh bfh 3
sdfg asd fse fse fse NA NA 2
qwreg qwe rege rege1 rege NA NA 2
cae NA NA NA NA NA NA 0
refdc vsfd vsefr NA NA NA NA 1
Answer 1
Score: 2
library(dplyr)
# Count the non-NA values in the Team_<n> columns of each row
df_team %>%
  mutate(Team_Count = rowSums(!is.na(pick(matches("Team_\\d+")))))
If you are using dplyr < 1.1.0, replace `pick()` with `across()`.
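A sketch of that substitution, assuming dplyr 1.0.x, where `across()` called without a function returns the selected columns unchanged as a data frame. It rebuilds `df_team` from the question's `dput()` output; the name `result` is only illustrative. On dplyr 1.1.0 and later, `pick()` is the intended replacement for this use of `across()`.

```r
library(dplyr)

# Rebuild df_team from the dput() output in the question
df_team <- data.frame(
  Project    = c("etwbv", "werg", "sdfg", "qwreg", "cae", "refdc"),
  Team_1     = c("ewrg", "werg", "asd", "qwe", NA, "vsfd"),
  Team_URL_1 = c("abc", "bfh", "fse", "rege", NA, "vsefr"),
  Team_2     = c("abc1", "bfh", "fse", "rege1", NA, NA),
  Team_URL_2 = c("abc", "bfh", "fse", "rege", NA, NA),
  Team_3     = c("abc1", "bfh", NA, NA, NA, NA),
  Team_URL_3 = c("abc", "bfh", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# across() with no function returns the matched columns as a data frame.
# matches() is a regex search, so Team_URL_1 is not selected:
# there "Team_" is followed by "U", not by a digit.
result <- df_team %>%
  mutate(Team_Count = rowSums(!is.na(across(matches("Team_\\d+")))))

print(result)
```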
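This works in three steps: `pick(matches("Team_\\d+"))` creates a data frame of the columns whose names match the regular expression `Team_\d+` ("Team_" followed by one or more digits), so the `Team_URL_` columns are left out; `!is.na()` turns that data frame into a logical matrix that is `TRUE` wherever a value is present; and `rowSums()` adds up the `TRUE` values in each row, which gives the number of filled `Team_` columns.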
Output
Project Team_1 Team_URL_1 Team_2 Team_URL_2 Team_3 Team_URL_3 Team_Count
1 etwbv ewrg abc abc1 abc abc1 abc 3
2 werg werg bfh bfh bfh bfh bfh 3
3 sdfg asd fse fse fse <NA> <NA> 2
4 qwreg qwe rege rege1 rege <NA> <NA> 2
5 cae <NA> <NA> <NA> <NA> <NA> <NA> 0
6 refdc vsfd vsefr <NA> <NA> <NA> <NA> 1
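The three steps in the explanation above can also be traced one at a time in base R, without dplyr. This is a sketch, not part of the original answer: it rebuilds `df_team` from the question's `dput()` output, swaps `matches()` for base `grep()` with an anchored pattern, and uses the illustrative intermediate names `team_cols` and `filled`.

```r
# Rebuild df_team from the dput() output in the question
df_team <- data.frame(
  Project    = c("etwbv", "werg", "sdfg", "qwreg", "cae", "refdc"),
  Team_1     = c("ewrg", "werg", "asd", "qwe", NA, "vsfd"),
  Team_URL_1 = c("abc", "bfh", "fse", "rege", NA, "vsefr"),
  Team_2     = c("abc1", "bfh", "fse", "rege1", NA, NA),
  Team_URL_2 = c("abc", "bfh", "fse", "rege", NA, NA),
  Team_3     = c("abc1", "bfh", NA, NA, NA, NA),
  Team_URL_3 = c("abc", "bfh", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Step 1: select columns named "Team_" followed only by digits.
# The pattern is anchored, so Team_URL_1 etc. are not selected.
team_cols <- grep("^Team_[0-9]+$", names(df_team), value = TRUE)
print(team_cols)  # "Team_1" "Team_2" "Team_3"

# Step 2: logical matrix, TRUE where a value is present, FALSE where it is NA
filled <- !is.na(df_team[team_cols])

# Step 3: TRUE counts as 1 in a sum, so each row sum is the number of filled columns
df_team$Team_Count <- rowSums(filled)
print(df_team)
```

Anchoring the pattern with `^` and `$` is slightly stricter than `matches("Team_\\d+")`, which would also select a column such as `Old_Team_1`; with the column names in this question, both select the same three columns.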