R: Correctly Writing and Converting Dates in UTC Format
Question
I am trying to learn how to use the Reddit API (https://www.reddit.com/prefs/apps).
I registered an account and an API app, and I want to retrieve 100 comments containing the word "covid" that were posted between March 1, 2020 and March 2, 2020.
Using the documentation from the API (https://www.reddit.com/dev/api/), I ran the following R code.
First, I authenticated and requested an access token:
library(httr)
library(jsonlite)
response <- POST("https://www.reddit.com/api/v1/access_token",
                 authenticate('****', '***'),
                 user_agent("some_name"),
                 body = list(grant_type = "password",
                             username = "aaaaaa",
                             password = "bbbbbbb"))
access_token_json <- rawToChar(response$content)
access_token_content <- fromJSON(access_token_json)
access_token <- access_token_content$access_token
access_token
url <- "https://oauth.reddit.com/LISTING" # try api/v1/me
authorization_bearer <- paste("Bearer ", access_token, sep="")
result <- GET(url,
              user_agent("some_name"),
              add_headers(Authorization = authorization_bearer))
Next, I tried to make a request:
# Set start and end times
start_time <- as.numeric(as.POSIXct("2020-03-01 16:52:00", tz = "UTC"))
end_time <- as.numeric(as.POSIXct("2020-03-02 13:52:00", tz = "UTC"))
# Set query parameters
query_params <- list(
  limit = 100,
  q = "covid",
  #subreddit = "news", #
  #author = "some_author", #
  after = start_time,
  before = end_time
)
# Make the API request
response <- GET(url, query = query_params, user_agent("some_name"), add_headers(Authorization = authorization_bearer))
# Extract the response
response_json <- rawToChar(response$content)
response_content <- fromJSON(response_json)
# Extract the relevant fields
final_result <- data.frame(
  title = response_content$data$children$data$title,
  subreddit = response_content$data$children$data$subreddit,
  author = response_content$data$children$data$author,
  created_utc = as.POSIXct(response_content$data$children$data$created_utc, origin = "1970-01-01", tz = "UTC"),
  permalink = paste0("https://www.reddit.com", response_content$data$children$data$permalink)
)
When I look at the results:
> head(final_result)
title subreddit author created_utc
1 COVID-19 is a leading cause of death in children and young people in the United States science thebelsnickle1991 2023-01-30 16:27:21
2 I thought Covid was a thing of the past.. double vaccinated wicked sick mildlyinfuriating Northeast4life 2023-02-19 00:52:03
3 ‘People aren’t taking this seriously’: experts say US Covid surge is big risk news ttkciar 2023-01-15 17:25:12
As we can see, the dates fall outside the specified range.
I thought the problem might be that I am not converting the dates properly, so I looked at the raw UTC timestamps:
> final_result$created_utc
[1] 1675096041 1676767923 1673803512 1671022066 1674220458
But these timestamps do not correspond to the March 2020 dates I wanted (checked with https://www.epochconverter.com/).
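For reference, the raw values can also be decoded directly in R instead of through an external converter. This is only a minimal sketch: the names raw_utc and decoded are illustrative and do not appear in the original code, and the three values are copied from the output above.

```r
# Raw epoch seconds copied from the printed output (first three values)
raw_utc <- c(1675096041, 1676767923, 1673803512)

# Interpret them as seconds since 1970-01-01 00:00:00 UTC
decoded <- as.POSIXct(raw_utc, origin = "1970-01-01", tz = "UTC")

format(decoded, "%Y-%m-%d %H:%M:%S")
# "2023-01-30 16:27:21" "2023-02-19 00:52:03" "2023-01-15 17:25:12"

# The requested window, converted the same way as in the question
start_time <- as.numeric(as.POSIXct("2020-03-01 16:52:00", tz = "UTC"))
end_time   <- as.numeric(as.POSIXct("2020-03-02 13:52:00", tz = "UTC"))

# TRUE only for values that lie inside the requested window
raw_utc >= start_time & raw_utc <= end_time
# FALSE FALSE FALSE
```

The decoded values match the created_utc column shown earlier, which suggests the conversion itself is sound and that the returned items really were created in early 2023.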
Am I writing the dates incorrectly? Can someone please show me how to fix this?
Thanks!
Answer 1
Score: 2
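Try converting the start and end times with the as.character() function, so that they are passed as strings (as.character() returns a formatted date-time string, whereas as.numeric() returns the Unix timestamp):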
# Set start and end times
start_time <- as.character(as.POSIXct("2020-03-01 00:00:00", tz = "UTC"))
end_time <- as.character(as.POSIXct("2020-03-02 23:59:59", tz = "UTC"))
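To see what each conversion produces, the sketch below compares the numeric and string forms of the same bounds and then checks timestamps against the window on the client side. The helper in_window and the sample value 1583100000 are hypothetical, introduced only for illustration. Whether the API honors either form of the after and before parameters is an open question that this sketch does not settle.

```r
start_ct <- as.POSIXct("2020-03-01 00:00:00", tz = "UTC")
end_ct   <- as.POSIXct("2020-03-02 23:59:59", tz = "UTC")

# Epoch seconds (Unix timestamps)
start_epoch <- as.numeric(start_ct)  # 1583020800
end_epoch   <- as.numeric(end_ct)    # 1583193599

# Human-readable strings, with an explicit format so the time part is kept
start_str <- format(start_ct, "%Y-%m-%d %H:%M:%S")  # "2020-03-01 00:00:00"
end_str   <- format(end_ct, "%Y-%m-%d %H:%M:%S")    # "2020-03-02 23:59:59"

# Hypothetical helper: TRUE where an epoch value lies inside [from, to]
in_window <- function(created_utc, from, to) {
  created_utc >= from & created_utc <= to
}

# One made-up value inside the window and one value from the question
in_window(c(1583100000, 1675096041), start_epoch, end_epoch)
# TRUE FALSE
```

Applying a check like in_window to the raw created_utc values after the request is one way to keep only rows inside the range, however the server treats the query parameters.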