Converting JSON Lists into Data Frames
Question
I extracted the JSON from the following page:
library(jsonlite)
results <- fromJSON("https://www.reddit.com/r/gardening/comments/1196opl/tree_surgeon_butchered_my_tree_will_it_be_ok/.json")
final = results$data
When I inspect the output, I can see that even though it is in a "list" format, there appears to be a tabular (data frame) structure within it:
t3, NA, gardening, , FALSE, NA, 0, FALSE, Tree surgeon butchered my tree - will it be ok?, r/gardening, FALSE, 6, NA, 0, 140, NA, all_ads, FALSE, t3_1196op
My question: based on the above, is it possible to convert this output into a data frame?
I tried the following code:
dataframe_list = as.data.frame(final)
The code ran, but the result is still not in a tabular (data frame) format.
In the end, I would like to have the result in the following format:
  comment_id  comment_text
1 1           I like gardening!
2 2           I dont like to garden!
3 3           its too cold outside?
4 4           try planting something different?
5 5           garden is fun!
Can someone please show me how to do this?
Thanks!
Note: If you look at the actual page, https://www.reddit.com/r/gardening/comments/1196opl/tree_surgeon_butchered_my_tree_will_it_be_ok/.json, the desired text appears to sit between the "body" and "edited" fields.
Maybe I am approaching this problem the wrong way and there is a better way of doing this?
Answer 1
Score: 2
Here is one approach using pluck(), bind_rows() and unnest():
library(jsonlite)
library(purrr)
library(dplyr)
library(tidyr)
URL <- "https://www.reddit.com/r/gardening/comments/1196opl/tree_surgeon_butchered_my_tree_will_it_be_ok/.json"
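fromJSON(URL) |>
  pluck("data", "children") |>                        # .$data$children
  bind_rows() |>                                      # stack the post and the comments into one data frame
  filter(row_number() > 1) |>                         # drop the first row (the post itself)
  unnest(data) |>                                     # expand the nested "data" column into regular columns
  select(id, author, body) |>
  mutate(comment_id = row_number(), .before = "id")   # running comment number as the first column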
Output:
# A tibble: 75 × 4
comment_id id author body
<int> <chr> <chr> <chr>
1 1 j9ktvi3 mikpgod "It'll grow back, probably won't be able to tell by summer. Except it'll be smaller"
2 2 j9l0egd hrudnick "Saw a tree surgeons advert today. Said, \"Don't worry, I hug them first.\""
3 3 j9kyb1v anonnewengland "It will be covered in new growth in a few months."
4 4 j9kqqqk Beatnikdan "He must've been a civil war surgeon. \n\nThey should survive but get a different tree guy to cl…
5 5 j9n0kp8 Live-Steaky "Very few people in there comment section actually know what’s up. It’s a fine pruning job, extr…
6 6 j9l2gxf Luke_low "Speaking of Tree Butchery, My parents have hired an \"amateur landscaper guy\" a bunch of times…
7 7 j9npnl1 tomt6371 "In all honesty it looks good,and definitely could have been pollarded further, it's the right s…
8 8 j9kpkux Amezrou "Had a tree surgeon round today to take the height off my Hazel and Plum trees and he’s absolute…
9 9 j9kxjyz testhec10ck "Those cut angles all look good. This seems pretty standard for an early spring pruning"
10 10 j9laq63 MarieTC "Lots of new growth will come and the tree will be fuller"
# … with 65 more rows
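To see what each step of that pipeline does without hitting the network, here is a minimal sketch that runs the same pluck() / bind_rows() / unnest() chain on a tiny inline JSON string. The mock_json payload and its values are made up for illustration and only imitate the shape of the Reddit response (a two-element array: the post with kind "t3", then the comments with kind "t1"). Filtering on kind == "t1" instead of dropping the first row is a variation on the answer, not part of it.

```r
library(jsonlite)
library(purrr)
library(dplyr)
library(tidyr)

# Hypothetical miniature of the Reddit payload (assumption: same nesting as
# the real response, with far fewer fields).
mock_json <- '[
  {"kind": "Listing", "data": {"children": [
    {"kind": "t3", "data": {"id": "p1", "author": "op", "title": "A post"}}
  ]}},
  {"kind": "Listing", "data": {"children": [
    {"kind": "t1", "data": {"id": "c1", "author": "ann", "body": "I like gardening!"}},
    {"kind": "t1", "data": {"id": "c2", "author": "bob", "body": "garden is fun!"}}
  ]}}
]'

comments <- fromJSON(mock_json) |>
  pluck("data", "children") |>   # list of two data frames: post, comments
  bind_rows() |>                 # one data frame with a nested "data" column
  filter(kind == "t1") |>        # keep comment rows only
  unnest(data) |>                # nested fields become ordinary columns
  transmute(comment_id = row_number(), comment_text = body)

print(comments)
```

The final transmute() keeps just the two columns asked for in the question, comment_id and comment_text.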
Answer 2
Score: 1
For parsing JSON from Reddit, you may want to check the RedditExtractoR package. Its get_thread_content() function returns a list of 2 data frames, one for the thread and another for the comments:
library(dplyr)
thread <- RedditExtractoR::get_thread_content("https://www.reddit.com/r/gardening/comments/1196opl/tree_surgeon_butchered_my_tree_will_it_be_ok/")
thread$threads %>%
select(author, title, text) %>%
as_tibble()
#> # A tibble: 1 × 3
#> author title text
#> <chr> <chr> <chr>
#> 1 Amezrou Tree surgeon butchered my tree - will it be ok? ""
thread$comments %>%
select(comment_id, author, comment) %>%
as_tibble()
#> # A tibble: 176 × 3
#> comment_id author comment
#> <chr> <chr> <chr>
#> 1 1 mikpgod "It'll grow back, probably won't be able to tel…
#> 2 1_1 Amezrou "I really hope so&"
#> 3 1_1_1 mikpgod "Hazel's difficult to kill."
#> 4 1_1_1_1 symetry_myass "&gt; Hazel's difficult to kill.\n\nI see an Um…
#> 5 1_1_1_2 Amezrou "Yeah but what\u0019s it going to look like whe…
#> 6 1_1_1_2_1 EpidonoTheFool "Not very good in my opinion a lot of weak grow…
#> 7 1_1_1_2_2 Cold-Pack-7653 "It will eventually look normal but its going t…
#> 8 1_1_1_2_3 lethal_moustache "Look for images of coppiced trees. Yours will …
#> 9 1_1_1_2_3_1 LeGrandePoobah "This is an interesting article. I\u0019m not s…
#> 10 1_1_1_2_3_2 treecarefanatic "this is pollarding not coppicing"
#> # … with 166 more rows
Created on 2023-02-23 with reprex v2.0.2
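To get from that comments data frame to the two-column layout requested in the question, something like the following sketch may help. The comments_raw data frame is a hand-made stand-in for thread$comments with invented rows, so the example runs without the package or a network connection. Treating an underscore in comment_id as a sign of a nested reply is an assumption read off the printed output above (1, 1_1, 1_1_1), not something confirmed from the package documentation.

```r
library(dplyr)

# Hypothetical stand-in for thread$comments (invented rows, same column names
# as in the output above).
comments_raw <- data.frame(
  comment_id = c("1", "1_1", "2"),
  author     = c("ann", "bob", "cat"),
  comment    = c("I like gardening!", "I dont like to garden!", "garden is fun!"),
  stringsAsFactors = FALSE
)

# All comments, renumbered 1..n in their current order.
all_comments <- comments_raw |>
  transmute(comment_id = row_number(), comment_text = comment)

# Top-level comments only (assumption: replies carry "_" in comment_id).
top_level <- comments_raw |>
  filter(!grepl("_", comment_id, fixed = TRUE)) |>
  transmute(comment_id = row_number(), comment_text = comment)

print(all_comments)
print(top_level)
```

With the real data, replacing comments_raw by thread$comments should give the same layout. The difference between the two variants may also explain why Answer 1 reports 75 rows while this answer reports 176.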