Grouping by one column while keeping other constant columns
Question
How can I change the grouping code below so that the result also includes the constant value of startdate?
# Reproducible example of the input data
library(dplyr)
employee <- c('John Doe', 'John Doe', 'Peter Gynn', 'Peter Gynn', 'Jolie Hope', 'Jolie Hope')
startdate <- as.Date(c('2010-11-1', '2010-11-1', '2008-3-25', '2008-3-25', '2007-3-14', '2007-3-14'))
salary <- c(100, 200, 100, 300, 800, 12)
employ.data <- data.frame(employee, startdate, salary)
# Grouping by employee and summing salary
grouped.file <- employ.data %>%
  group_by(employee) %>%
  summarize(salary = sum(salary, na.rm = TRUE))
# But I would like to get a data frame like this
# (stored under a separate name so the input above is not overwritten):
desired.data <- data.frame(
  employee = c('John Doe', 'Peter Gynn', 'Jolie Hope'),
  startdate = as.Date(c('2010-11-1', '2008-3-25', '2007-3-14')),
  salary = c(300, 400, 812)
)
Note that the code above groups by employee name and sums the salaries, but the result does not carry startdate. Because startdate is constant within each employee, first() can be used inside summarize to take the first startdate of each group.
Answer 1
Score: 2
If startdate is constant within each employee, you can include it in group_by:
library(dplyr)
employ.data %>%
  group_by(employee, startdate) %>%
  summarize(salary = sum(salary, na.rm = TRUE))
#   employee   startdate  salary
#   <fct>      <date>      <dbl>
# 1 John Doe   2010-11-01    300
# 2 Jolie Hope 2007-03-14    812
# 3 Peter Gynn 2008-03-25    400
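One detail of this first variant: with more than one grouping column, summarize() drops only the last grouping level, so the result is still grouped by employee. The following is a minimal sketch, assuming dplyr 1.0.0 or later (where summarize() accepts a .groups argument), that returns an ungrouped result instead. The name totals is only illustrative.

```r
library(dplyr)

# Same example data as in the question
employ.data <- data.frame(
  employee  = c("John Doe", "John Doe", "Peter Gynn", "Peter Gynn", "Jolie Hope", "Jolie Hope"),
  startdate = as.Date(c("2010-11-01", "2010-11-01", "2008-03-25", "2008-03-25", "2007-03-14", "2007-03-14")),
  salary    = c(100, 200, 100, 300, 800, 12)
)

# .groups = "drop" removes all grouping from the summarized result
totals <- employ.data %>%
  group_by(employee, startdate) %>%
  summarize(salary = sum(salary, na.rm = TRUE), .groups = "drop")

# Check whether any grouping is left on the result
is_grouped_df(totals)
```

Dropping the grouping is a matter of taste here, but it avoids surprises if further mutate() or summarize() calls are chained onto the result.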
Or take its first value in summarize:
employ.data %>%
  group_by(employee) %>%
  summarize(startdate = first(startdate), salary = sum(salary, na.rm = TRUE))
Or use mutate and keep only the first row of each group (any row would do, since all rows in a group hold the same values after the mutate):
employ.data %>%
  group_by(employee) %>%
  mutate(salary = sum(salary, na.rm = TRUE)) %>%
  slice(1L)
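The first() and slice(1L) variants silently pick one date if startdate is not actually constant within an employee, whereas grouping by both columns would produce several rows for that employee. A small sketch of how one might check the assumption beforehand, using dplyr's n_distinct(); the names date.check and all.constant are only illustrative.

```r
library(dplyr)

# Same example data as in the question
employ.data <- data.frame(
  employee  = c("John Doe", "John Doe", "Peter Gynn", "Peter Gynn", "Jolie Hope", "Jolie Hope"),
  startdate = as.Date(c("2010-11-01", "2010-11-01", "2008-03-25", "2008-03-25", "2007-03-14", "2007-03-14")),
  salary    = c(100, 200, 100, 300, 800, 12)
)

# Count the distinct start dates per employee; a count above 1 means
# first() or slice(1L) would keep just one of several different dates
date.check <- employ.data %>%
  group_by(employee) %>%
  summarize(n_dates = n_distinct(startdate))

# TRUE only when every employee has exactly one start date
all.constant <- all(date.check$n_dates == 1)
all.constant
```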
Answer 2
Score: 1
Here are two base R approaches:
- Using aggregate()
aggregate(salary ~ employee + startdate, employ.data, FUN = function(x) sum(x, na.rm = TRUE))
which gives
    employee  startdate salary
1 Jolie Hope 2007-03-14    812
2 Peter Gynn 2008-03-25    400
3   John Doe 2010-11-01    300
- Using ave() and unique()
unique(within(employ.data, salary <- ave(salary, employee, startdate, FUN = function(x) sum(x, na.rm = TRUE))))
which gives
    employee  startdate salary
1   John Doe 2010-11-01    300
3 Peter Gynn 2008-03-25    400
5 Jolie Hope 2007-03-14    812
Both calls are shown without assigning back to employ.data, so each one runs on the original six-row data frame.
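The two base R results come back in different row orders, so a direct comparison with the data frame requested in the question needs a sort first. A rough sketch of such a check, using only base R; the names result, desired, and same are illustrative and not part of either answer.

```r
# Same example data as in the question
employ.data <- data.frame(
  employee  = c("John Doe", "John Doe", "Peter Gynn", "Peter Gynn", "Jolie Hope", "Jolie Hope"),
  startdate = as.Date(c("2010-11-01", "2010-11-01", "2008-03-25", "2008-03-25", "2007-03-14", "2007-03-14")),
  salary    = c(100, 200, 100, 300, 800, 12)
)

# The data frame the question asks for
desired <- data.frame(
  employee  = c("John Doe", "Peter Gynn", "Jolie Hope"),
  startdate = as.Date(c("2010-11-01", "2008-03-25", "2007-03-14")),
  salary    = c(300, 400, 812)
)

# aggregate() approach from the answer
result <- aggregate(salary ~ employee + startdate, employ.data,
                    FUN = function(x) sum(x, na.rm = TRUE))

# Sort both by employee name so the row order no longer matters
result  <- result[order(as.character(result$employee)), ]
desired <- desired[order(as.character(desired$employee)), ]

# Compare column by column (as.character guards against factor columns)
same <- all(as.character(result$employee) == as.character(desired$employee)) &&
  all(result$startdate == desired$startdate) &&
  all(result$salary == desired$salary)
same
```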