R: Only keeping the first observation of the month in dataset
Question
I have a data frame like the one below, with thousands of rows and columns. The first column contains dates, and the remaining columns contain the asset return indexes for that date.
DATE | Asset_1 | Asset_2 | Asset_3 | Asset_4 |
---|---|---|---|---|
2000-01-01 | 1000 | 300 | 2900 | NA |
..... | ||||
2000-01-31 | 1100 | 350 | 2950 | NA |
2000-02-02 | 1200 | 330 | 2970 | 100 |
... | ||||
2000-02-28 | 1200 | 360 | 3000 | 200 |
2000-03-01 | 1200 | 370 | 3500 | 300 |
I want to make this into a monthly dataset by only keeping the first observation of the month.
I have come up with the following script:
library(dplyr)
library(lubridate)
monthly <- daily %>% filter(day(DATE) == 1)
However, this doesn't work for months where the first day of the month is not a trading day (i.e. it is missing from the daily dataset).
When I run the command, those months are dropped from the result entirely.
Answer 1
Score: 2
If the data is always ordered by date, you could group by year/month, then keep (slice) the first record from each group. Like:
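df <- data.frame(mydate = as.Date("2023-01-01") + 1:45)  # sample data: 45 consecutive days
library(tidyverse)
library(lubridate)
df %>%
  # one group per year-month; the key is a string such as "2023 1"
  group_by(ym = paste(year(mydate), month(mydate))) %>%
  # alternative: group on two numeric columns instead of a pasted string
  # group_by(year(mydate), month(mydate)) %>%
  slice_head(n = 1)  # keep the first row of each group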
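If the rows are not guaranteed to be in date order, sorting first removes that assumption. Below is a minimal sketch, assuming a data frame named `daily` with a `Date`-class column `DATE` as in the question. The toy rows are taken from the question's table and deliberately shuffled; the helper columns `yr` and `mo` are names chosen here for illustration.

```r
library(dplyr)
library(lubridate)

# Toy data modeled on the question's table, in shuffled order
daily <- data.frame(
  DATE = as.Date(c("2000-02-28", "2000-01-01", "2000-03-01",
                   "2000-02-02", "2000-01-31")),
  Asset_1 = c(1200, 1000, 1200, 1200, 1100)
)

monthly <- daily %>%
  arrange(DATE) %>%                                # "first row" now means "earliest date"
  group_by(yr = year(DATE), mo = month(DATE)) %>%  # one group per calendar month
  slice_head(n = 1) %>%                            # keep the earliest row of each month
  ungroup() %>%
  select(-yr, -mo)                                 # drop the helper columns

monthly
```

February is kept even though 2000-02-01 is missing, because the earliest available date in the month (2000-02-02) is selected instead.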
Answer 2
Score: 2
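Use slice_min, which picks the row with the earliest DATE in each month, so the data does not need to be sorted first: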
library(dplyr) # version 1.1.0 or later
library(zoo)
daily %>%
mutate(ym = as.yearmon(DATE)) %>%
slice_min(DATE, by = ym)
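The same idea can probably be written without zoo by building the month key with lubridate's `floor_date`. This is a sketch rather than a drop-in replacement: it assumes `DATE` is of class `Date`, uses toy rows copied from the question's table, and introduces a helper column `month_start` purely for illustration. Setting `with_ties = FALSE` is an added precaution so that a duplicated date would still yield one row per month.

```r
library(dplyr)      # 1.1.0 or later for the `by` argument
library(lubridate)

# Toy data copied from the question's table
daily <- data.frame(
  DATE = as.Date(c("2000-01-01", "2000-01-31", "2000-02-02",
                   "2000-02-28", "2000-03-01")),
  Asset_1 = c(1000, 1100, 1200, 1200, 1200),
  Asset_4 = c(NA, NA, 100, 200, 300)
)

monthly <- daily %>%
  mutate(month_start = floor_date(DATE, "month")) %>%       # e.g. 2000-02-02 -> 2000-02-01
  slice_min(DATE, by = month_start, with_ties = FALSE) %>%  # earliest row per month
  select(-month_start)                                      # drop the helper column

monthly
```

Because the selection is based on the minimum of `DATE` within each month, the result does not depend on the row order of `daily`.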