Count number of filled out columns in R dataframe to create new column
Question
I have a dataframe `df_team` with multiple Team columns, not all of which have values. The df looks like this, just with over 100 of these `Team_` columns:
dput(df_team)
structure(list(Project = c("etwbv", "werg", "sdfg", "qwreg",
"cae", "refdc"), Team_1 = c("ewrg", "werg", "asd", "qwe", NA,
"vsfd"), Team_URL_1 = c("abc", "bfh", "fse", "rege", NA, "vsefr"
), Team_2 = c("abc1", "bfh", "fse", "rege1", NA, NA), Team_URL_2 = c("abc",
"bfh", "fse", "rege", NA, NA), Team_3 = c("abc1", "bfh", NA,
NA, NA, NA), Team_URL_3 = c("abc", "bfh", NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-6L))
I want to create a new column `Team_Count` which counts the filled `Team_x` columns and disregards the NAs, so that I know how many team members each `Project` has.
The new dataframe should look like this:
Project Team_1 Team_URL_1 Team_2 Team_URL_2 Team_3 Team_URL_3 Team_Count
etwbv   ewrg   abc        abc1   abc        abc1   abc        3
werg    werg   bfh        bfh    bfh        bfh    bfh        3
sdfg    asd    fse        fse    fse        NA     NA         2
qwreg   qwe    rege       rege1  rege       NA     NA         2
cae     NA     NA         NA     NA         NA     NA         0
refdc   vsfd   vsefr      NA     NA         NA     NA         1
Answer 1
Score: 2
library(dplyr)

df_team %>%
  mutate(Team_Count = rowSums(!is.na(pick(matches("Team_\\d+")))))
If you are using dplyr < 1.1.0, replace `pick` with `across`.
This works by selecting the columns whose names match the regular expression "Team_\\d+" ("Team_" followed by one or more digits, which excludes the Team_URL_ columns), testing each value to get TRUE where it is not NA, and then summing those logical values by row. Each row's sum is the number of filled Team columns.
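The same three steps can also be written out one at a time in base R, which may help if dplyr is not available. This is only a minimal sketch using the data from the question; the helper names `team_cols` and `filled` are illustrative and not part of the original answer, and `[0-9]+` is used in place of `\\d+` with explicit anchors.

```r
# Data from the question's dput() output
df_team <- data.frame(
  Project    = c("etwbv", "werg", "sdfg", "qwreg", "cae", "refdc"),
  Team_1     = c("ewrg", "werg", "asd", "qwe", NA, "vsfd"),
  Team_URL_1 = c("abc", "bfh", "fse", "rege", NA, "vsefr"),
  Team_2     = c("abc1", "bfh", "fse", "rege1", NA, NA),
  Team_URL_2 = c("abc", "bfh", "fse", "rege", NA, NA),
  Team_3     = c("abc1", "bfh", NA, NA, NA, NA),
  Team_URL_3 = c("abc", "bfh", NA, NA, NA, NA)
)

# Step 1: flag the columns named "Team_" followed only by digits
# (this skips Project and the Team_URL_ columns)
team_cols <- grepl("^Team_[0-9]+$", names(df_team))

# Step 2: logical matrix that is TRUE where a value is present
filled <- !is.na(df_team[, team_cols, drop = FALSE])

# Step 3: TRUE counts as 1, so the row sums are the filled-team counts
df_team$Team_Count <- rowSums(filled)

print(df_team$Team_Count)
# 3 3 2 2 0 1
```

Keeping `drop = FALSE` means the subset stays a data frame even if only one column matches, so `rowSums()` still receives a two-dimensional input.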
Output
  Project Team_1 Team_URL_1 Team_2 Team_URL_2 Team_3 Team_URL_3 Team_Count
1   etwbv   ewrg        abc   abc1        abc   abc1        abc          3
2    werg   werg        bfh    bfh        bfh    bfh        bfh          3
3    sdfg    asd        fse    fse        fse   <NA>       <NA>          2
4   qwreg    qwe       rege  rege1       rege   <NA>       <NA>          2
5     cae   <NA>       <NA>   <NA>       <NA>   <NA>       <NA>          0
6   refdc   vsfd      vsefr   <NA>       <NA>   <NA>       <NA>          1