February 8, 2023 23:26:56

R: Only keeping the first observation of the month in dataset

Question

I have the following kind of data frame, with thousands of columns and rows. The first column contains dates, and the following columns contain the asset return indices for that date.

DATE	Asset_1	Asset_2	Asset_3	Asset_4
2000-01-01	1000	300	2900	NA
.....
2000-01-31	1100	350	2950	NA
2000-02-02	1200	330	2970	100
...
2000-02-28	1200	360	3000	200
2000-03-01	1200	370	3500	300

I want to make this into a monthly dataset by only keeping the first observation of the month.

I have come up with the following script:

library(dplyr)
library(lubridate)
monthly <- daily %>% filter(day(DATE) == 1)

However, this doesn't work for months whose first day is not a trading day (i.e. that date is missing from the daily dataset).

So when I run the command, months that have no row for the 1st are dropped from the result entirely.
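A minimal reproduction of the problem, using a small made-up dataset (hypothetical values, written in base R so it runs without packages) in which February has no row dated the 1st. The base-R test `as.integer(format(DATE, "%d")) == 1` stands in for `day(DATE) == 1`:

```r
# Hypothetical toy data: there is no row for 2000-02-01,
# mirroring a month whose 1st is not a trading day
daily <- data.frame(
  DATE    = as.Date(c("2000-01-01", "2000-01-31", "2000-02-02",
                      "2000-02-28", "2000-03-01")),
  Asset_1 = c(1000, 1100, 1200, 1200, 1200),
  Asset_4 = c(NA, NA, 100, 200, 300)
)

# Base-R equivalent of filter(day(DATE) == 1): keep only rows dated the 1st
first_of_month <- daily[as.integer(format(daily$DATE, "%d")) == 1, ]
print(first_of_month)
# February does not appear, because 2000-02-01 is absent from the data
```

Only January and March survive; February is lost even though it has data from the 2nd onward.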

Answer 1

Score: 2

If the data is always sorted by date, you can group by year/month and then keep (slice) the first record of each group, like this:

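# Toy data: 45 consecutive days, 2023-01-02 through 2023-02-15
df <- data.frame(mydate = as.Date("2023-01-01") + 1:45)

library(tidyverse)
library(lubridate)

df %>%
  # one group per year-month; relies on rows already being sorted by date
  group_by(ym = paste(year(mydate), month(mydate))) %>%
  # alternative: group_by(year(mydate), month(mydate)) %>%
  slice_head(n = 1)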
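If the rows might not be sorted, a base-R sketch of the same idea (an alternative, not the answer's own code) is to sort by date first and then keep the first row of each year-month. The toy data and the `ym` key are hypothetical. A zero-padded key such as "2000-01" also sorts chronologically as text, which a key built with `paste(year, month)` does not ("2000 10" sorts before "2000 2").

```r
# Hypothetical toy data, deliberately out of date order
daily <- data.frame(
  DATE    = as.Date(c("2000-02-28", "2000-01-31", "2000-03-01",
                      "2000-02-02", "2000-01-03")),
  Asset_1 = c(1200, 1100, 1200, 1200, 1000)
)

# Sort by date so the first row of each month is that month's earliest date
daily <- daily[order(daily$DATE), ]

# Zero-padded year-month key, e.g. "2000-01"
ym <- format(daily$DATE, "%Y-%m")

# duplicated() is FALSE only for the first occurrence of each key,
# so this keeps exactly one row (the earliest) per month
monthly <- daily[!duplicated(ym), ]
print(monthly)
```

Because the sort happens explicitly, the result no longer depends on the input order.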

Answer 2

Score: 2

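Use slice_min() to take the earliest date within each year-month (as.yearmon() from zoo builds the year-month key):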

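library(dplyr) # version 1.1.0 or later (needed for the `by` argument of slice_min())
library(zoo)   # provides as.yearmon()

daily %>%
  mutate(ym = as.yearmon(DATE)) %>%   # year-month key, e.g. "Jan 2000"
  slice_min(DATE, by = ym)            # earliest date within each year-month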
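With a dplyr older than 1.1.0, the usual equivalent is `group_by(ym) %>% slice_min(DATE) %>% ungroup()`. Note also that slice_min() keeps ties by default (`with_ties = TRUE`), so duplicate dates within a month would return several rows. Below is a base-R sketch of the same "earliest date per month" rule, which does not depend on row order; the data and names are hypothetical, and which.min() returns only the first row on ties:

```r
# Hypothetical toy data, unsorted
daily <- data.frame(
  DATE    = as.Date(c("2000-03-01", "2000-02-28", "2000-01-31",
                      "2000-02-02", "2000-01-03")),
  Asset_2 = c(370, 360, 350, 330, 300)
)

# Year-month key, e.g. "2000-02"
ym <- format(daily$DATE, "%Y-%m")

# For each month, pick the row index holding that month's smallest date
# (which.min returns the first such index if there are ties)
idx <- vapply(
  split(seq_len(nrow(daily)), ym),
  function(i) i[which.min(daily$DATE[i])],
  integer(1)
)

monthly <- daily[idx, ]
print(monthly)
```

split() orders its groups by the sorted key, so the output comes out in chronological month order.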
