R: Only keeping the first observation of the month in dataset

Question

I have the following kind of data frame, with thousands of columns and rows. The first column contains dates, and the following columns contain the asset return indices for that date.

DATE Asset_1 Asset_2 Asset_3 Asset_4
2000-01-01 1000 300 2900 NA
.....
2000-01-31 1100 350 2950 NA
2000-02-02 1200 330 2970 100
...
2000-02-28 1200 360 3000 200
2000-03-01 1200 370 3500 300

I want to make this into a monthly dataset by only keeping the first observation of the month.

I have come up with the following script:

library(dplyr)
library(lubridate)
# Keep only rows whose date falls on the 1st of the month
monthly <- daily %>% filter(day(DATE) == 1)

However, this doesn't work for months where the first day of the month is not a trading date (i.e., it is missing from the daily dataset).

As a result, when I run the command, every month that has no row dated on the 1st is dropped from my dataset entirely.
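To make the failure concrete, here is a minimal reproducible sketch in base R. The daily data frame is a hypothetical stand-in built only from the dated rows in the table above (the elided rows are left out), and format(DATE, "%d") is used as a base-R substitute for lubridate's day().

```r
# Hypothetical toy version of the question's data (only the dated rows shown there)
daily <- data.frame(
  DATE    = as.Date(c("2000-01-01", "2000-01-31", "2000-02-02",
                      "2000-02-28", "2000-03-01")),
  Asset_1 = c(1000, 1100, 1200, 1200, 1200),
  Asset_2 = c(300, 350, 330, 360, 370),
  Asset_3 = c(2900, 2950, 2970, 3000, 3500),
  Asset_4 = c(NA, NA, 100, 200, 300)
)

# Base-R equivalent of filter(day(DATE) == 1): keep rows dated on the 1st
first_of_month <- daily[as.integer(format(daily$DATE, "%d")) == 1, ]
print(first_of_month)
# February's first row is dated the 2nd, so February vanishes from the result
```

Only the January and March rows survive, which is exactly the symptom described in the question.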

Answer 1

Score: 2

If the data is always sorted by date, you can group by year and month and then keep (slice) the first record of each group. For example:

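# Example data: 45 consecutive days, 2023-01-02 through 2023-02-15
df <- data.frame(mydate = as.Date("2023-01-01") + 1:45)

library(tidyverse)
library(lubridate)

# Group by a year-month key, then keep the first row of each group.
# This assumes the rows are already sorted by date.
df %>%
  group_by(ym = paste(year(mydate), month(mydate))) %>%
  # group_by(year(mydate), month(mydate)) %>%   # alternative grouping
  slice_head(n = 1)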
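Because this approach relies on the rows being in date order, it can be worth sorting explicitly first. The following is a base-R sketch of the same idea, not part of the answer itself. It assumes a hypothetical toy data frame named daily with a DATE column of class Date: sort by date, build a year-month key with format(), and keep the first row for each key with duplicated().

```r
# Hypothetical toy data, deliberately out of date order
daily <- data.frame(
  DATE    = as.Date(c("2000-02-28", "2000-01-31", "2000-03-01",
                      "2000-02-02", "2000-01-03")),
  Asset_1 = c(1200, 1100, 1200, 1200, 1000)
)

# Sort by date so the first row of each month is also its earliest row
daily <- daily[order(daily$DATE), ]

# Year-month key such as "2000-01"; duplicated() is FALSE for the first
# occurrence of each key, so negating it keeps exactly one row per month
ym <- format(daily$DATE, "%Y-%m")
monthly <- daily[!duplicated(ym), ]
print(monthly)
```

With this toy data, the result keeps 2000-01-03, 2000-02-02 and 2000-03-01, so months that do not start with a trading day are still represented.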

Answer 2

Score: 2

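Use slice_min(), which keeps the row with the earliest DATE in each year-month group and therefore does not require the data to be sorted: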

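library(dplyr) # version 1.1.0 or later (needed for the by argument)
library(zoo)

# as.yearmon() maps each date to its year-month; slice_min() then keeps
# the row with the smallest DATE within each ym group
daily %>%
  mutate(ym = as.yearmon(DATE)) %>%
  slice_min(DATE, by = ym)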
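For readers who would rather avoid the dplyr and zoo dependencies, the same "earliest date per month" logic can be sketched in base R. This is an illustration, not part of the answer, and the daily data frame below is hypothetical toy data. One difference to note: which.min() returns a single index, so exactly one row per month is kept even if two rows share the same earliest date, whereas slice_min() keeps ties by default.

```r
# Hypothetical toy data in arbitrary order
daily <- data.frame(
  DATE    = as.Date(c("2000-02-28", "2000-01-31", "2000-03-01",
                      "2000-02-02", "2000-01-03")),
  Asset_1 = c(1200, 1100, 1200, 1200, 1000)
)

# Split the row indices by year-month key ("2000-01", "2000-02", ...)
rows_by_month <- split(seq_len(nrow(daily)), format(daily$DATE, "%Y-%m"))

# Within each month, pick the row index holding the earliest date
first_idx <- vapply(rows_by_month,
                    function(i) i[which.min(daily$DATE[i])],
                    integer(1))

# split() orders groups by key, so the rows come out in calendar order
monthly <- daily[first_idx, ]
print(monthly)
```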

huangapple
  • This post was published on 8 February 2023 at 23:26:56
  • If reposting, please keep the link to this post: https://go.coder-hub.com/75388010.html