Grouping by one column while keeping other constant columns

Question

How can I change the grouping code below so that the result also includes the constant value of startdate?

    library(dplyr)

    # Reproducible example data:
    employee <- c('John Doe', 'John Doe', 'Peter Gynn', 'Peter Gynn', 'Jolie Hope', 'Jolie Hope')
    startdate <- as.Date(c('2010-11-1', '2010-11-1', '2008-3-25', '2008-3-25', '2007-3-14', '2007-3-14'))
    salary <- c(100, 200, 100, 300, 800, 12)
    employ.data <- data.frame(employee, startdate, salary)

    # Grouping by employee and summing salary
    grouped.file <- employ.data %>%
      group_by(employee) %>%
      summarize(salary = sum(salary, na.rm = TRUE))

    # But I would like to have a data frame like this:
    employee <- c('John Doe', 'Peter Gynn', 'Jolie Hope')
    startdate <- as.Date(c('2010-11-1', '2008-3-25', '2007-3-14'))
    salary <- c(300, 400, 812)
    employ.data <- data.frame(employee, startdate, salary)

Answer 1

Score: 2

If startdate is constant within each employee, you can add it to group_by:

    library(dplyr)

    employ.data %>%
      group_by(employee, startdate) %>%
      summarize(salary = sum(salary, na.rm = TRUE))
    #   employee   startdate  salary
    #   <fct>      <date>      <dbl>
    # 1 John Doe   2010-11-01    300
    # 2 Jolie Hope 2007-03-14    812
    # 3 Peter Gynn 2008-03-25    400
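
With dplyr 1.0.0 or later, summarizing after grouping by two columns leaves the result grouped by employee and prints a message saying so. The following is a minimal sketch, assuming that dplyr version and the question's data, of the same call returning a plain ungrouped table. The name totals is chosen only for illustration.

```r
library(dplyr)

# The example data from the question
employ.data <- data.frame(
  employee  = c("John Doe", "John Doe", "Peter Gynn", "Peter Gynn", "Jolie Hope", "Jolie Hope"),
  startdate = as.Date(c("2010-11-01", "2010-11-01", "2008-03-25", "2008-03-25", "2007-03-14", "2007-03-14")),
  salary    = c(100, 200, 100, 300, 800, 12)
)

# .groups = "drop" (assumes dplyr >= 1.0.0) removes all grouping from the result
totals <- employ.data %>%
  group_by(employee, startdate) %>%
  summarize(salary = sum(salary, na.rm = TRUE), .groups = "drop")

print(totals)
```

Dropping the grouping avoids surprises if further verbs such as mutate() or slice() are applied to the result later.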

Or take its first value in summarize:

    employ.data %>%
      group_by(employee) %>%
      summarize(startdate = first(startdate), salary = sum(salary, na.rm = TRUE))
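
Taking first(startdate) silently hides any employee who has more than one start date. As a hedged sketch using the question's data, one way to confirm the "constant" assumption beforehand is to count distinct start dates per employee. The names check and n_startdates are illustrative only.

```r
library(dplyr)

# The example data from the question
employ.data <- data.frame(
  employee  = c("John Doe", "John Doe", "Peter Gynn", "Peter Gynn", "Jolie Hope", "Jolie Hope"),
  startdate = as.Date(c("2010-11-01", "2010-11-01", "2008-03-25", "2008-03-25", "2007-03-14", "2007-03-14")),
  salary    = c(100, 200, 100, 300, 800, 12)
)

# Count the distinct start dates seen for each employee
check <- employ.data %>%
  group_by(employee) %>%
  summarize(n_startdates = n_distinct(startdate))

# Stop early if any employee has more than one start date
stopifnot(all(check$n_startdates == 1))
```

If the check fails, grouping by both employee and startdate is the safer of the two approaches, because it keeps the differing dates as separate rows.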

Or use mutate and keep only the first row of each group (any row would do, since the values are constant):

    employ.data %>%
      group_by(employee) %>%
      mutate(salary = sum(salary, na.rm = TRUE)) %>%
      slice(1L)
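
One practical difference of the mutate and slice approach is that every other column is carried along without being named. The sketch below illustrates this with an extra column, department, which is hypothetical and not part of the original question. The final ungroup() is an addition so that the result is an ordinary table.

```r
library(dplyr)

# The question's data plus a hypothetical constant column `department`
employ.data <- data.frame(
  employee   = c("John Doe", "John Doe", "Peter Gynn", "Peter Gynn", "Jolie Hope", "Jolie Hope"),
  startdate  = as.Date(c("2010-11-01", "2010-11-01", "2008-03-25", "2008-03-25", "2007-03-14", "2007-03-14")),
  salary     = c(100, 200, 100, 300, 800, 12),
  department = c("Sales", "Sales", "IT", "IT", "HR", "HR")
)

# Replace salary by the group total, then keep one row per employee
collapsed <- employ.data %>%
  group_by(employee) %>%
  mutate(salary = sum(salary, na.rm = TRUE)) %>%
  slice(1L) %>%
  ungroup()

print(collapsed)
```

Columns that are not constant within a group would simply keep the value from the first row, so this shortcut is only safe under the same assumption as first().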

Answer 2

Score: 1

Here are two base R approaches:

  • Using aggregate()

    employ.data <- aggregate(salary ~ employee + startdate, employ.data, FUN = function(x) sum(x, na.rm = TRUE))

which gives

    > employ.data
        employee  startdate salary
    1 Jolie Hope 2007-03-14    812
    2 Peter Gynn 2008-03-25    400
    3   John Doe 2010-11-01    300

  • Using ave() and unique()

    unique(within(employ.data, salary <- ave(salary, employee, startdate, FUN = function(x) sum(x, na.rm = TRUE))))

which, run on the original employ.data, gives

        employee  startdate salary
    1   John Doe 2010-11-01    300
    3 Peter Gynn 2008-03-25    400
    5 Jolie Hope 2007-03-14    812
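
aggregate() returns its rows sorted by the grouping columns rather than in input order, which is why the two outputs above list the employees differently. As a small sketch using the question's data and plain sum as FUN, the result can be sorted by employee when a stable order matters. The name agg is chosen only for illustration.

```r
# The example data from the question
employ.data <- data.frame(
  employee  = c("John Doe", "John Doe", "Peter Gynn", "Peter Gynn", "Jolie Hope", "Jolie Hope"),
  startdate = as.Date(c("2010-11-01", "2010-11-01", "2008-03-25", "2008-03-25", "2007-03-14", "2007-03-14")),
  salary    = c(100, 200, 100, 300, 800, 12)
)

# Sum salary per employee/startdate pair without overwriting the input
agg <- aggregate(salary ~ employee + startdate, employ.data, FUN = sum)

# Reorder the rows alphabetically by employee and reset the row names
agg <- agg[order(agg$employee), ]
rownames(agg) <- NULL

print(agg)
```

Assigning to a new name instead of employ.data also keeps the original rows available for the ave() variant.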

huangapple
  • Posted on January 3, 2020, 18:04:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59576549.html