R: Correctly Writing and Converting Dates in UTC Format

Question

I am trying to learn how to use the Reddit API (https://www.reddit.com/prefs/apps).

I registered for an account and API access. I want to retrieve 100 comments containing the word "covid" that were posted between March 1, 2020 and March 2, 2020.

Using the documentation from the API (https://www.reddit.com/dev/api/), I ran the following R code.

First, I authenticated with the API and obtained an access token:

    library(httr)
    library(jsonlite)

    # Request an OAuth access token (credentials redacted)
    response <- POST("https://www.reddit.com/api/v1/access_token",
                     authenticate('****', '***'),
                     user_agent("some_name"),
                     body = list(grant_type = "password",
                                 username = "aaaaaa",
                                 password = "bbbbbbb"))

    # Parse the JSON body and extract the token
    access_token_json <- rawToChar(response$content)
    access_token_content <- fromJSON(access_token_json)
    access_token <- access_token_content$access_token
    access_token

    # Call an OAuth endpoint with the bearer token
    url <- "https://oauth.reddit.com/LISTING" # try api/v1/me
    authorization_bearer <- paste("Bearer ", access_token, sep = "")
    result <- GET(url,
                  user_agent("some_name"),
                  add_headers(Authorization = authorization_bearer))

Next, I tried to make a request:

    # Set start and end times (Unix timestamps, UTC)
    start_time <- as.numeric(as.POSIXct("2020-03-01 16:52:00", tz = "UTC"))
    end_time <- as.numeric(as.POSIXct("2020-03-02 13:52:00", tz = "UTC"))

    # Set query parameters
    query_params <- list(
      limit = 100,
      q = "covid",
      #subreddit = "news",
      #author = "some_author",
      after = start_time,
      before = end_time
    )

    # Make the API request, reusing the bearer header and user agent from the token step
    response <- GET(url,
                    query = query_params,
                    user_agent("some_name"),
                    add_headers(Authorization = authorization_bearer))

    # Extract the response
    response_json <- rawToChar(response$content)
    response_content <- fromJSON(response_json)

    # Extract the relevant fields
    posts <- response_content$data$children$data
    final_result <- data.frame(
      title = posts$title,
      subreddit = posts$subreddit,
      author = posts$author,
      created_utc = as.POSIXct(posts$created_utc, origin = "1970-01-01", tz = "UTC"),
      permalink = paste0("https://www.reddit.com", posts$permalink)
    )

When I look at the results:

    > head(final_result)
                                                                                       title         subreddit            author         created_utc
    1 COVID-19 is a leading cause of death in children and young people in the United States           science thebelsnickle1991 2023-01-30 16:27:21
    2                I thought Covid was a thing of the past.. double vaccinated wicked sick mildlyinfuriating    Northeast4life 2023-02-19 00:52:03
    3          'People aren't taking this seriously': experts say US Covid surge is big risk              news           ttkciar 2023-01-15 17:25:12

As we can see here, the dates are not between the specified dates.
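
One way to confirm this without reading the dates by eye is to compare each timestamp against the requested window. The sketch below hard-codes the three created_utc values printed above instead of calling the API; the names `window_start`, `window_end`, `created`, and `in_window` are illustrative only.

```r
# Requested window, same values as in the request above (UTC)
window_start <- as.POSIXct("2020-03-01 16:52:00", tz = "UTC")
window_end   <- as.POSIXct("2020-03-02 13:52:00", tz = "UTC")

# created_utc values copied from the printed output
created <- as.POSIXct(
  c("2023-01-30 16:27:21", "2023-02-19 00:52:03", "2023-01-15 17:25:12"),
  tz = "UTC"
)

# TRUE where a post falls inside the requested window
in_window <- created >= window_start & created <= window_end
print(in_window)
# [1] FALSE FALSE FALSE
```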

I thought the problem might be that I was not converting the dates properly, so I looked at the raw UTC timestamps:

    > final_result$created_utc
    [1] 1675096041 1676767923 1673803512 1671022066 1674220458

But these dates do not seem to correspond to the dates I wanted in March 2020 (https://www.epochconverter.com/).
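
The same check can be done in R rather than on a website. This is a minimal sketch in base R using the raw values shown above; the variable names `raw_utc`, `as_dates`, and `start_epoch` are made up for illustration.

```r
# Raw epoch seconds copied from the output above
raw_utc <- c(1675096041, 1676767923, 1673803512, 1671022066, 1674220458)

# Epoch seconds -> POSIXct in UTC
as_dates <- as.POSIXct(raw_utc, origin = "1970-01-01", tz = "UTC")
print(format(as_dates, "%Y-%m-%d %H:%M:%S"))

# POSIXct -> epoch seconds for the requested start of the window
start_epoch <- as.numeric(as.POSIXct("2020-03-01 16:52:00", tz = "UTC"))
print(start_epoch)
# [1] 1583081520

# Every returned timestamp is later than the requested start
print(all(raw_utc > start_epoch))
# [1] TRUE
```

The round trip suggests that the conversion in both directions behaves as expected, so the mismatch is more likely in how the request uses the values than in how the dates are written.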

Am I writing the dates incorrectly? Can someone please show me how to fix this?

Thanks!

Answer 1

Score: 2

Try converting the start and end times to character strings with the as.character() function (note that this yields formatted date-time strings rather than numeric Unix timestamps):

    # Set start and end times
    start_time <- as.character(as.POSIXct("2020-03-01 00:00:00", tz = "UTC"))
    end_time <- as.character(as.POSIXct("2020-03-02 23:59:59", tz = "UTC"))
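
To see what each conversion produces, and as a possible fallback if the endpoint ignores the time bounds, the results can also be filtered locally after the request. This is only a sketch under assumptions: as far as I know, `after` and `before` on Reddit listing endpoints are pagination cursors rather than timestamps, so check that against the API documentation. The helper `filter_by_window()` and the sample data frame `posts` are hypothetical and not part of any library.

```r
x <- as.POSIXct("2020-03-02 23:59:59", tz = "UTC")

# as.numeric() gives seconds since 1970-01-01 UTC (a Unix timestamp)
print(as.numeric(x))
# [1] 1583193599

# format() gives a character string, not a timestamp
print(format(x, "%Y-%m-%d %H:%M:%S"))
# [1] "2020-03-02 23:59:59"

# Hypothetical helper: keep rows whose created_utc (epoch seconds)
# lies between start and end (both POSIXct)
filter_by_window <- function(df, start, end) {
  keep <- df$created_utc >= as.numeric(start) & df$created_utc <= as.numeric(end)
  df[keep, , drop = FALSE]
}

# Made-up sample data standing in for an API response
posts <- data.frame(
  title = c("inside window", "too late"),
  created_utc = c(1583100000, 1675096041),
  stringsAsFactors = FALSE
)

kept <- filter_by_window(posts, as.POSIXct("2020-03-01 00:00:00", tz = "UTC"), x)
print(kept$title)
# [1] "inside window"
```

Filtering locally only narrows what the API already returned, so it does not by itself retrieve older posts.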

huangapple
  • Posted on February 27, 2023 at 09:02:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75576022.html