How do I convert a quarter variable with repeated values into a date format suitable for plotting a time series in R?


Question


I want to convert the Quarter column in the data frame df1_long into a date format.

    Quarter        Country    value
 1: Q1-2019      Australia 1607.929
 2: Q2-2019      Australia 1267.899
 3: Q3-2019      Australia 1584.615
 4: Q4-2019      Australia 1627.014
 5: Q1-2020      Australia 2000.000
 6: Q3-2022      Australia 1960.000
 7: Q4-2022      Australia 1908.295
 8: Q1-2023      Australia 2200.000
 9: Q2-2023      Australia 1838.000
10: Q1-2019             US 3652.640
11: Q2-2019             US 3017.615
12: Q3-2019             US 3081.797
13: Q4-2019             US 3179.357
14: Q1-2020             US 4064.289
15: Q3-2022             US 3076.462
16: Q4-2022             US 3987.771
17: Q1-2023             US 4036.000
18: Q2-2023             US 3032.000
19: Q1-2019         Canada 3311.035

When I try df1_long$Q1 <- as.Date(as.yearqtr(df1_long$Quarter, format = "Q%q-%y")) using the lubridate library, all the years default to 2020:

    Quarter        Country    value         Q1
 1: Q1-2019      Australia 1607.929 2020-01-01
 2: Q2-2019      Australia 1267.899 2020-04-01
 3: Q3-2019      Australia 1584.615 2020-07-01
 4: Q4-2019      Australia 1627.014 2020-10-01
 5: Q1-2020      Australia 2000.000 2020-01-01
 6: Q3-2022      Australia 1960.000 2020-07-01
 7: Q4-2022      Australia 1908.295 2020-10-01
 8: Q1-2023      Australia 2200.000 2020-01-01
 9: Q2-2023      Australia 1838.000 2020-04-01
10: Q1-2019             US 3652.640 2020-01-01
11: Q2-2019             US 3017.615 2020-04-01
12: Q3-2019             US 3081.797 2020-07-01
13: Q4-2019             US 3179.357 2020-10-01
14: Q1-2020             US 4064.289 2020-01-01
15: Q3-2022             US 3076.462 2020-07-01
16: Q4-2022             US 3987.771 2020-10-01
17: Q1-2023             US 4036.000 2020-01-01
18: Q2-2023             US 3032.000 2020-04-01
19: Q1-2019         Canada 3311.035 2020-01-01

How can I correct this?

I have also tried lubridate's quarter():
df1a_long <- df1_long %>%
  mutate(qtr = quarter(x, with_year = TRUE))

but this didn't work.
When I tried other options, I often got the error "Caused by error in as.POSIXlt.character():
! character string is not in a standard unambiguous format". I think this happens because the same quarter appears more than once, since multiple countries are measured at the same time points.

Answer 1

Score: 1


The issue was that %y is for a year without a century (00–99). What you want is %Y, which is year with century (e.g. 2023).
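As a minimal sketch of that fix: the snippet below assumes the zoo package is installed (as.yearqtr() comes from zoo rather than lubridate) and uses a small hypothetical data frame that mirrors a few rows of df1_long from the question.

```r
library(zoo)  # provides as.yearqtr(); assumed to be installed

# Hypothetical sample mirroring a few rows of df1_long
df1_long <- data.frame(
  Quarter = c("Q1-2019", "Q2-2019", "Q3-2022", "Q1-2019", "Q2-2023"),
  Country = c("Australia", "Australia", "Australia", "US", "US"),
  value   = c(1607.929, 1267.899, 1960.000, 3652.640, 3032.000)
)

# %Y reads the full four-digit year; %y expects a two-digit year,
# which is why "2019" ended up as 2020 in the question's output
df1_long$Q1 <- as.Date(as.yearqtr(df1_long$Quarter, format = "Q%q-%Y"))

print(df1_long)
```

as.Date() on a yearqtr value returns the first day of the quarter, so rows for different countries in the same quarter share the same date. That is fine for plotting, as long as the lines are grouped by Country.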

You can read more about date format strings in the R documentation for strptime (run ?strptime in R).
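On the "not in a standard unambiguous format" error from the question: it most likely comes from handing the raw "Q1-2019" strings to a date parser, not from the same quarter repeating across countries. The sketch below illustrates this in base R and shows one way to do the conversion without zoo. It is an alternative to as.yearqtr(), not part of the original answer, and the helper name quarter_to_date is hypothetical.

```r
# Hypothetical sample: the same quarter appears once per country
quarters <- c("Q1-2019", "Q2-2019", "Q1-2019", "Q2-2019")

# as.Date() on the raw strings fails because "Q1-2019" is not a
# recognisable date, whether or not values repeat
raw_attempt <- tryCatch(
  as.Date(quarters),
  error = function(e) conditionMessage(e)
)

# Hypothetical helper: convert strings like "Q3-2022" to the first day
# of that quarter, assuming the "Q<quarter>-<4-digit year>" layout
quarter_to_date <- function(x) {
  q     <- as.integer(sub("^Q([1-4])-[0-9]{4}$", "\\1", x))
  year  <- as.integer(sub("^Q[1-4]-([0-9]{4})$", "\\1", x))
  month <- (q - 1L) * 3L + 1L
  as.Date(sprintf("%04d-%02d-01", year, month), format = "%Y-%m-%d")
}

dates <- quarter_to_date(quarters)

print(raw_attempt)  # the error message from the failed parse
print(dates)        # repeated quarters simply map to repeated dates
```

Repeated dates are not a problem for the conversion itself; they only matter at plotting time, where each country needs its own line.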

huangapple
  • Posted on August 9, 2023 at 17:27:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76866347.html