How to identify missing months from a column vector?

Question

Let's assume I have a data frame with one column, Date, that goes from 2000 to 2019. The problem is that I don't have a perfect monthly frequency (in fact I should have 245 observations, but I only have 215). My aim is to detect which months are missing from the column.

Let's take this example. Here is a sample data frame:

df <- data.frame(Date = c("2015-01-22", "2015-03-05", "2015-04-15", "2015-06-03", "2015-07-16", "2015-09-03", "2015-10-22", "2015-12-03", "2016-01-21", "2016-03-10", "2016-04-21", "2016-06-02", "2016-07-21", "2016-09-08", "2016-10-20", "2016-12-08", "2017-01-19", "2017-03-09", "2017-04-27", "2017-06-08", "2017-07-20", "2017-09-07", "2017-10-26", "2017-12-14", "2018-01-25", "2018-03-08", "2018-04-26", "2018-06-14", "2018-07-26", "2018-09-13", "2018-10-25", "2018-12-13", "2019-01-24", "2019-03-07", "2019-04-10", "2019-06-06", "2019-07-25", "2019-09-12", "2019-10-24", "2019-12-12"))

df

I would like code that tells me which months are missing from my column vector of dates.

Can anyone help me?

Thanks a lot
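Before listing the missing months, it can help to confirm how many there are. The sketch below is plain base R and assumes the expected range runs from the month of the earliest date to the month of the latest date in `df$Date` (the question's full 2000 to 2019 range would need explicit bounds instead). It maps each date to a single month index, 12 × year + month, so the number of calendar months in the span is simply max − min + 1; the variable names are illustrative only.

```r
# Sample data from the question
df <- data.frame(Date = c(
  "2015-01-22", "2015-03-05", "2015-04-15", "2015-06-03", "2015-07-16",
  "2015-09-03", "2015-10-22", "2015-12-03", "2016-01-21", "2016-03-10",
  "2016-04-21", "2016-06-02", "2016-07-21", "2016-09-08", "2016-10-20",
  "2016-12-08", "2017-01-19", "2017-03-09", "2017-04-27", "2017-06-08",
  "2017-07-20", "2017-09-07", "2017-10-26", "2017-12-14", "2018-01-25",
  "2018-03-08", "2018-04-26", "2018-06-14", "2018-07-26", "2018-09-13",
  "2018-10-25", "2018-12-13", "2019-01-24", "2019-03-07", "2019-04-10",
  "2019-06-06", "2019-07-25", "2019-09-12", "2019-10-24", "2019-12-12"))

dates <- as.Date(df$Date)

# One integer per calendar month: 12 * year + month (January 2015 -> 24181)
ym <- as.integer(format(dates, "%Y")) * 12 + as.integer(format(dates, "%m"))

# Months in the span from the first to the last observed month, inclusive
n_expected <- max(ym) - min(ym) + 1
# Distinct months actually present (several rows in one month count once)
n_observed <- length(unique(ym))

c(expected = n_expected, observed = n_observed, missing = n_expected - n_observed)
# gives 60 expected, 40 observed, 20 missing for this sample
```

Counting distinct months rather than rows matters if a month can hold more than one observation; in that case `nrow(df)` would overstate how complete the series is.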

Answer 1

Score: 2

Here are two ways to find the missing months, using base R:

  • If you want the missing months regardless of year, you can try the following code:
missingMonths <- month.name[setdiff(seq(12), as.numeric(format(as.Date(df$Date), "%m")))]

which gives:

> missingMonths
[1] "February" "May"      "August"   "November"
  • If you want to check the missing months by year, you can try the code below:
missingMonths <- lapply(split(df, format(as.Date(df$Date), "%Y")), 
                        function(x) month.name[setdiff(seq(12), as.numeric(format(as.Date(x$Date), "%m")))])

which gives:

> missingMonths
$`2015`
[1] "February" "May"      "August"   "November"

$`2016`
[1] "February" "May"      "August"   "November"

$`2017`
[1] "February" "May"      "August"   "November"

$`2018`
[1] "February" "May"      "August"   "November"

$`2019`
[1] "February" "May"      "August"   "November"

Answer 2

Score: 1

Not as succinct as the answer above, but it still does the trick in a couple of steps:

# Reduce each observed date to the first day of its month, e.g. "2015-01-22" -> "2015-01-01"
month_date_strings <- unique(paste0(sub("-[^-]+$", "", as.character(df$Date)), "-01"))

# Every month start from January 2000 to December 2019
month_seq_strings <- as.character(seq.Date(as.Date("2000-01-01", "%Y-%m-%d"),
                                           as.Date("2019-12-31", "%Y-%m-%d"), by = "month"))

# Month starts in the full range that never appear in the data
month_seq_strings[!(month_seq_strings %in% month_date_strings)]
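The same month-start comparison can be wrapped in a small reusable function so that the range is a parameter rather than hard-coded. `missing_months()` is a hypothetical helper, not part of base R; this sketch compares `Date` values directly instead of strings and accepts character, factor or `Date` input (factor columns are what `data.frame()` produces from strings by default before R 4.0.0).

```r
# Hypothetical helper: months between `from` and `to` (inclusive) with no
# observation in `dates`. Returns the first day of each missing month as a Date.
missing_months <- function(dates, from, to) {
  dates <- as.Date(dates)  # character, factor or Date input
  # Normalise every observation to the first day of its month
  observed <- as.Date(format(dates, "%Y-%m-01"))
  # Full monthly grid, anchored on the first day of each month
  grid <- seq(as.Date(format(as.Date(from), "%Y-%m-01")),
              as.Date(format(as.Date(to), "%Y-%m-01")),
              by = "month")
  grid[!grid %in% observed]
}

# Usage: only November 2019 is absent from these two dates
res <- missing_months(c("2019-10-24", "2019-12-12"),
                      from = "2019-10-01", to = "2019-12-31")
format(res, "%Y-%m")
```

Applied to the question's sample, `missing_months(df$Date, "2000-01-01", "2019-12-31")` should return the 200 months of the 240 between January 2000 and December 2019 that are not represented among its 40 dates.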

huangapple
  • Published on 3 January 2020, 18:50:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59577236.html