How to unnest a dictionary from XML in R?
Question
I am trying to convert this XML file to a data frame in R:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml
library(xml2)
library(tidyverse)
fileurl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
xmllist <- as_list(read_xml(fileurl))
xml_df = tibble::as_tibble(xmllist) %>%
  unnest_longer(response)
row_wider = xml_df %>%
  unnest_wider(response)
row_df = row_wider %>%
  unnest(cols = names(.)) %>%
  unnest(cols = names(.)) %>%
  readr::type_convert()
The issue is that the 'location_1' column is a dictionary, and it shows up as NA when I unnest. How can I get each of the values in this dictionary into the data frame? Any help is much appreciated, thanks.
Answer 1
Score: 3
The requested address data is stored as JSON in an attribute of the XML node.
Below I extract the attribute, parse the JSON, and then combine the results. The resulting data frame can then be bound to the work you did above.
See the comments for details.
```r
library(xml2)
library(jsonlite)
library(tidyverse)

# URL from the question
fileurl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"

# read the file as XML
page <- read_xml(fileurl)
# extract the restaurant nodes into a node set
restaurants <- page %>% xml_find_all(".//row/row")
# get the address data, which is stored in the human_address attribute
addresses <- restaurants %>% xml_find_first(".//location_1") %>% xml_attr("human_address")
# this is a character vector of JSON strings
# convert each JSON string to a data frame
dfs <- lapply(addresses, function(address) {
  address %>% fromJSON() %>% as.data.frame()
})
# combine all of the data frames
answer <- bind_rows(dfs)
answer
# first six rows of the printed output:
#>              address      city state zip
#> 1   4509 BELAIR ROAD Baltimore    MD
#> 2      1919 FLEET ST Baltimore    MD
#> 3     2844 HUDSON ST Baltimore    MD
#> 4    3998 ROLAND AVE Baltimore    MD
#> 5 2481 frederick ave Baltimore    MD
#> 6    2722 HARFORD RD Baltimore    MD
```
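Binding the address columns back onto the other restaurant fields could look roughly like the sketch below. It uses a hypothetical two-row inline sample instead of the real file, and the `basics` and `combined` names are made up for illustration. It also assumes both tables are built from the same node set, so that the rows line up by position.

```r
library(xml2)
library(jsonlite)
library(dplyr)

# Hypothetical two-row sample that mimics the layout of restaurants.xml
sample_xml <- '<response><row>
  <row><name>410</name><zipcode>21206</zipcode>
    <location_1 human_address="{&quot;address&quot;:&quot;4509 BELAIR ROAD&quot;,&quot;city&quot;:&quot;Baltimore&quot;,&quot;state&quot;:&quot;MD&quot;,&quot;zip&quot;:&quot;&quot;}" needs_recoding="false"/></row>
  <row><name>1919</name><zipcode>21231</zipcode>
    <location_1 human_address="{&quot;address&quot;:&quot;1919 FLEET ST&quot;,&quot;city&quot;:&quot;Baltimore&quot;,&quot;state&quot;:&quot;MD&quot;,&quot;zip&quot;:&quot;&quot;}" needs_recoding="false"/></row>
</row></response>'

page <- read_xml(sample_xml)
restaurants <- xml_find_all(page, ".//row/row")

# Plain child nodes: one value per restaurant
basics <- tibble::tibble(
  name    = xml_text(xml_find_first(restaurants, "./name")),
  zipcode = xml_text(xml_find_first(restaurants, "./zipcode"))
)

# Address fields parsed from the JSON held in the human_address attribute
addresses <- restaurants %>%
  xml_find_first("./location_1") %>%
  xml_attr("human_address")
address_df <- bind_rows(lapply(addresses, function(a) as.data.frame(fromJSON(a))))

# Both tables come from the same node set, so a positional bind is safe here
combined <- bind_cols(basics, address_df)
combined
```

If the two tables were built separately, joining on a shared key would be safer than relying on row order.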
Answer 2
Score: 2
The `location_1` column is an empty list (hence you get `NA`s) with two attributes: `human_address`, which is a JSON string, and a `needs_recoding` flag. One option to get your desired result is to first extract the contents of these attributes and store them in a `list`. Afterwards you can use two `unnest_wider()` calls to unnest the list column.
library(xml2)
library(tidyverse)

# Replace the empty location_1 list with a list built from its attributes
parse_location_1 <- function(x) {
  x$location_1 <- list(
    human_address = jsonlite::fromJSON(attr(x$location_1, "human_address")),
    needs_recoding = attr(x$location_1, "needs_recoding")
  )
  x
}

fileurl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
xmllist <- as_list(read_xml(fileurl))

xml_df <- tibble::as_tibble(xmllist) %>%
  unnest_longer(response) %>%
  mutate(response = map(
    response, parse_location_1
  ))

row_wider <- xml_df %>%
  unnest_wider(response) %>%
  unnest_wider(location_1) %>%
  unnest_wider(human_address)

row_df <- row_wider %>%
  unnest(cols = where(is.list)) %>%
  unnest(cols = where(is.list)) %>%
  readr::type_convert()
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#> name = col_character(),
#> zipcode = col_double(),
#> neighborhood = col_character(),
#> councildistrict = col_double(),
#> policedistrict = col_character(),
#> address = col_character(),
#> city = col_character(),
#> state = col_character(),
#> zip = col_logical(),
#> needs_recoding = col_logical(),
#> response_id = col_character()
#> )
head(row_df)
#> # A tibble: 6 × 11
#> name zipcode neighborhood councildistrict policedistrict address city state
#> <chr> <dbl> <chr> <dbl> <chr> <chr> <chr> <chr>
#> 1 410 21206 Frankford 2 NORTHEASTERN 4509 B… Balt… MD
#> 2 1919 21231 Fells Point 1 SOUTHEASTERN 1919 F… Balt… MD
#> 3 SAUTE 21224 Canton 1 SOUTHEASTERN 2844 H… Balt… MD
#> 4 #1 CH… 21211 Hampden 14 NORTHERN 3998 R… Balt… MD
#> 5 #1 ch… 21223 Millhill 9 SOUTHWESTERN 2481 f… Balt… MD
#> 6 19TH … 21218 Clifton Park 14 NORTHEASTERN 2722 H… Balt… MD
#> # ℹ 3 more variables: zip <lgl>, needs_recoding <lgl>, response_id <chr>
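The empty-list behaviour described in this answer can be checked on a small sample. The sketch below uses a hypothetical one-row XML string rather than the real file. It shows that `as_list()` turns a childless element into a zero-length list and carries the XML attributes along as R attributes. The attribute values arrive as character strings, which is consistent with `needs_recoding` only becoming logical after `readr::type_convert()`.

```r
library(xml2)
library(jsonlite)

# Hypothetical one-row sample; the real file holds many such rows
doc <- read_xml('<row><name>410</name><location_1 human_address="{&quot;address&quot;:&quot;4509 BELAIR ROAD&quot;,&quot;city&quot;:&quot;Baltimore&quot;}" needs_recoding="false"/></row>')

loc <- as_list(xml_find_first(doc, "//location_1"))

# The element has no children, so there is nothing for unnest() to expand
length(loc)

# The data lives in the R attributes instead
names(attributes(loc))
attr(loc, "needs_recoding")

# Parsing the JSON string gives a named list of address fields
address <- fromJSON(attr(loc, "human_address"))
address$city
```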