Grouping by one column while keeping other constant columns

Question

How can I change the grouping in the code below so that the result also includes the constant value of startdate?

library(dplyr)

# Reproducible example of the data I have:
employee <- c('John Doe', 'John Doe', 'Peter Gynn', 'Peter Gynn', 'Jolie Hope', 'Jolie Hope')
startdate <- as.Date(c('2010-11-1', '2010-11-1', '2008-3-25', '2008-3-25', '2007-3-14', '2007-3-14'))
salary <- c(100, 200, 100, 300, 800, 12)
employ.data <- data.frame(employee, startdate, salary)

# Grouping by employee and summing salary
grouped.file <- employ.data %>% group_by(employee) %>%
  summarize(salary = sum(salary, na.rm = TRUE))

# But I would like to have a data frame like this:
employee <- c('John Doe', 'Peter Gynn', 'Jolie Hope')
startdate <- as.Date(c('2010-11-1', '2008-3-25', '2007-3-14'))
salary <- c(300, 400, 812)
desired.data <- data.frame(employee, startdate, salary)

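Note that this code groups by employee name and sums the salaries, but it drops startdate from the result. To keep the constant startdate value, you can use first() inside summarize() to take the first startdate in each group.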

Answer 1

Score: 2

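If startdate is constant within each employee, you can include it in group_by:

library(dplyr)

employ.data %>%
  group_by(employee, startdate) %>%
  summarize(salary = sum(salary, na.rm = TRUE))

#  employee   startdate  salary
#  <fct>      <date>      <dbl>
#1 John Doe   2010-11-01    300
#2 Jolie Hope 2007-03-14    812
#3 Peter Gynn 2008-03-25    400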

Or take its first value in summarize:

employ.data %>%
  group_by(employee) %>%
  summarize(startdate = first(startdate), salary = sum(salary, na.rm = TRUE))

Or use mutate and keep only the first (or any) row in each group:

employ.data %>%
  group_by(employee) %>%
  mutate(salary = sum(salary, na.rm = TRUE)) %>%
  slice(1L)
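
All three variants above rely on startdate really being constant within each employee. Here is a minimal sketch of checking that assumption before collapsing the rows, using the sample data from the question. The names `inconsistent` and `result` are made up for illustration, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample data from the question
employ.data <- data.frame(
  employee  = c("John Doe", "John Doe", "Peter Gynn", "Peter Gynn",
                "Jolie Hope", "Jolie Hope"),
  startdate = as.Date(c("2010-11-01", "2010-11-01", "2008-03-25",
                        "2008-03-25", "2007-03-14", "2007-03-14")),
  salary    = c(100, 200, 100, 300, 800, 12)
)

# Employees that have more than one distinct start date.
# An empty result means startdate is constant per employee.
inconsistent <- employ.data %>%
  group_by(employee) %>%
  summarize(n_dates = n_distinct(startdate)) %>%
  filter(n_dates > 1)

# Only collapse when the assumption holds
stopifnot(nrow(inconsistent) == 0)

result <- employ.data %>%
  group_by(employee, startdate) %>%
  summarize(salary = sum(salary, na.rm = TRUE), .groups = "drop")
```

If `inconsistent` had rows, grouping by both columns would return more than one row for those employees, while first() and slice(1L) would silently keep whichever start date comes first.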

Answer 2

Score: 1

Here are two base R approaches:

  • Using aggregate()
agg.data <- aggregate(salary ~ employee + startdate, employ.data, FUN = function(x) sum(x, na.rm = TRUE))

which gives

> agg.data
    employee  startdate salary
1 Jolie Hope 2007-03-14    812
2 Peter Gynn 2008-03-25    400
3   John Doe 2010-11-01    300

  • Using ave() and unique()
unique(within(employ.data, salary <- ave(salary, employee, startdate, FUN = function(x) sum(x, na.rm = TRUE))))

which gives

    employee  startdate salary
1   John Doe 2010-11-01    300
3 Peter Gynn 2008-03-25    400
5 Jolie Hope 2007-03-14    812
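
As a quick sanity check, the two base R approaches can be compared on the sample data. This is only a sketch: `by.aggregate`, `by.ave`, and `sort.rows` are names invented here, and `na.rm = TRUE` is passed to sum() through aggregate()'s `...` argument instead of an anonymous function.

```r
# Sample data from the question
employ.data <- data.frame(
  employee  = c("John Doe", "John Doe", "Peter Gynn", "Peter Gynn",
                "Jolie Hope", "Jolie Hope"),
  startdate = as.Date(c("2010-11-01", "2010-11-01", "2008-03-25",
                        "2008-03-25", "2007-03-14", "2007-03-14")),
  salary    = c(100, 200, 100, 300, 800, 12)
)

# Approach 1: aggregate(); extra arguments are forwarded to FUN
by.aggregate <- aggregate(salary ~ employee + startdate, employ.data,
                          FUN = sum, na.rm = TRUE)

# Approach 2: ave() fills in the group total on every row,
# then unique() drops the repeated rows
by.ave <- unique(within(employ.data,
  salary <- ave(salary, employee, startdate,
                FUN = function(x) sum(x, na.rm = TRUE))))

# Hypothetical helper: put rows in a common order and reset row names
sort.rows <- function(df) {
  df <- df[order(as.character(df$employee)), ]
  rownames(df) <- NULL
  df
}

a <- sort.rows(by.aggregate)
b <- sort.rows(by.ave)

# Both should be TRUE if the two approaches agree
same.salary <- all(a$salary == b$salary)
same.dates  <- all(a$startdate == b$startdate)
```

The two results differ only in row order and row names, which is why the sketch sorts both before comparing.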

huangapple
  • Posted on January 3, 2020, 18:04:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59576549.html