January 3, 2020 18:04:13

grouping one column and leaving other constants in there

Question

How can I change the group function in the code below to also include the constant value of startdate?

# Reproducing an example of what I would like to have:
library(dplyr)
employee <- c('John Doe', 'John Doe', 'Peter Gynn', 'Peter Gynn', 'Jolie Hope', 'Jolie Hope')
startdate <- as.Date(c('2010-11-1', '2010-11-1', '2008-3-25', '2008-3-25', '2007-3-14', '2007-3-14'))
salary <- c(100, 200, 100, 300, 800, 12)
employ.data <- data.frame(employee, startdate, salary)
# Grouping by employee and summing salary
grouped.file <- employ.data %>% group_by(employee) %>%
  summarize(salary = sum(salary, na.rm = TRUE))
# But I would like to have a data frame like this:
employee <- c('John Doe', 'Peter Gynn', 'Jolie Hope')
startdate <- as.Date(c('2010-11-1', '2008-3-25', '2007-3-14'))
salary <- c(300, 400, 812)
employ.data <- data.frame(employee, startdate, salary)

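Note that this code groups by employee name and sums the salaries, but it drops startdate from the result. To keep the constant startdate, you can take the first startdate value of each group with first() inside summarize().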

Answer 1

Score: 2

If startdate is constant within each employee, you can include it in group_by.

library(dplyr)
employ.data %>%
  group_by(employee, startdate) %>%
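  summarize(salary = sum(salary, na.rm = TRUE))
#   employee   startdate  salary
# 1 John Doe   2010-11-01    300
# 2 Jolie Hope 2007-03-14    812
# 3 Peter Gynn 2008-03-25    400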

Or get its first value in summarize.

employ.data %>%
  group_by(employee) %>%
  summarize(startdate = first(startdate), salary = sum(salary, na.rm = TRUE))

Or use mutate and keep only the first row of each group (any row would do, since every row then holds the group total).

employ.data %>%
  group_by(employee) %>%
  mutate(salary = sum(salary, na.rm = TRUE)) %>%
  slice(1L)
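All three variants rely on startdate really being constant within each employee. The following is a minimal sketch of how that assumption could be checked before using first(). The names date.check, inconsistent, and result are hypothetical and not part of the original answer.

```r
library(dplyr)

# Same example data as in the question
employ.data <- data.frame(
  employee  = c("John Doe", "John Doe", "Peter Gynn", "Peter Gynn", "Jolie Hope", "Jolie Hope"),
  startdate = as.Date(c("2010-11-01", "2010-11-01", "2008-03-25", "2008-03-25", "2007-03-14", "2007-03-14")),
  salary    = c(100, 200, 100, 300, 800, 12)
)

# Count distinct start dates per employee. A count above 1 means startdate
# is not constant, and first(startdate) would silently discard the others.
date.check <- employ.data %>%
  group_by(employee) %>%
  summarize(n_dates = n_distinct(startdate))

# Stop early if any employee has more than one start date
inconsistent <- filter(date.check, n_dates > 1)
stopifnot(nrow(inconsistent) == 0)

# With the assumption confirmed, first() is safe to use
result <- employ.data %>%
  group_by(employee) %>%
  summarize(startdate = first(startdate), salary = sum(salary, na.rm = TRUE))
```

If the check fails, grouping by both employee and startdate is probably the safer choice, because it returns one row per distinct pair instead of dropping dates.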

Answer 2

Score: 1

Here are two base R approaches:

Using aggregate()

employ.data <- aggregate(salary ~ employee + startdate, employ.data, FUN = function(x) sum(x, na.rm = TRUE))

which gives

> employ.data
    employee  startdate salary
1 Jolie Hope 2007-03-14    812
2 Peter Gynn 2008-03-25    400
3   John Doe 2010-11-01    300

Using ave() and unique() (run this on the original employ.data, before the aggregate() call above overwrites it)

unique(within(employ.data, salary <- ave(salary, employee, startdate, FUN = function(x) sum(x, na.rm = TRUE))))

which gives

    employee  startdate salary
1   John Doe 2010-11-01    300
3 Peter Gynn 2008-03-25    400
5 Jolie Hope 2007-03-14    812
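One detail about the aggregate() approach: the formula method uses na.action = na.omit by default, so rows with a missing salary are dropped before FUN runs, and na.rm = TRUE inside FUN then has nothing left to remove. The sketch below illustrates the difference on a small made-up data frame. The names df and sum_na are hypothetical and the data is invented for illustration only.

```r
# Made-up data: employee B has only a missing salary
df <- data.frame(
  employee  = c("A", "A", "B"),
  startdate = as.Date(c("2010-11-01", "2010-11-01", "2008-03-25")),
  salary    = c(100, NA, NA)
)

sum_na <- function(x) sum(x, na.rm = TRUE)

# Default na.action = na.omit: rows with NA salary are removed first,
# so employee B disappears from the result entirely.
default.res <- aggregate(salary ~ employee + startdate, df, FUN = sum_na)

# na.action = na.pass keeps those rows; FUN then handles the NAs itself
# and employee B is reported with a sum of 0.
pass.res <- aggregate(salary ~ employee + startdate, df, FUN = sum_na,
                      na.action = na.pass)
```

The na.pass variant matches what the dplyr versions with sum(salary, na.rm = TRUE) return for a group whose salaries are all missing.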
