How to convert information in a JSON file into a dataframe in R?
Question
I have some information in a JSON file named data.json, shown below:
{
"header" : {
"apiVersion" : "v1",
"code" : "200",
"service" : "catalogwebservice",
"developerMessage" : "",
"userMessage" : "OK",
"errorCode" : "1",
"docLink" : "https://ega-archive.org",
"errorStack" : ""
},
"response" : {
"numTotalResults" : 12,
"resultType" : "SampleData",
"result" : [ {
"alias" : "JKDFG093.T2",
"egaStableId" : "EGAN00003456789",
"centerName" : "Novartis",
"creationTime" : "2016-05-13Y17:08.001Z",
"title" : "JKDFG093.T2",
"bioSampleId" : "MADFG110656789",
"subjectId" : "JKDFG093",
"gender" : "male",
"phenotype" : "Cancer",
"attributes" : null
}, {
"alias" : "JKDFG093.T1",
"egaStableId" : "EGAN00003456780",
"centerName" : "Novartis",
"creationTime" : "2016-05-13Y17:08.001Z",
"title" : "JKDFG093.T1",
"bioSampleId" : "MADFG110656790",
"subjectId" : "JKDFG093",
"gender" : "female",
"phenotype" : "Cancer",
"attributes" : null
}, {
"alias" : "JKDFG087.T1",
"egaStableId" : "EGAN00003456781",
"centerName" : "Novartis",
"creationTime" : "2016-05-13Y17:08.001Z",
"title" : "JKDFG087.T1",
"bioSampleId" : "MADFG110656791",
"subjectId" : "JKDFG087",
"gender" : "male",
"phenotype" : "Cancer",
"attributes" : null
} ]
}
}
I want to convert the information in this JSON file into a data frame. I need the fields alias, egaStableId, centerName, creationTime, title, bioSampleId, subjectId, gender, phenotype, and attributes from the JSON above as column names, with their respective values as the rows of the data frame.
I loaded the JSON file in R and tried converting it into a data frame, but ended up with an error.
library(rjson)
data <- rjson::fromJSON(file = "data.json")
json_data_frame <- as.data.frame(data)
The error I got:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 1, 0
Any help is appreciated. Thank you!
Answer 1
Score: 3
You need to dig into the parsed data to get to the results, namely $response$result.
Replace json below with your filename:
jsonlite::fromJSON(json)$response$result
#         alias     egaStableId centerName          creationTime       title    bioSampleId subjectId gender phenotype attributes
# 1 JKDFG093.T2 EGAN00003456789   Novartis 2016-05-13Y17:08.001Z JKDFG093.T2 MADFG110656789  JKDFG093   male    Cancer         NA
# 2 JKDFG093.T1 EGAN00003456780   Novartis 2016-05-13Y17:08.001Z JKDFG093.T1 MADFG110656790  JKDFG093 female    Cancer         NA
# 3 JKDFG087.T1 EGAN00003456781   Novartis 2016-05-13Y17:08.001Z JKDFG087.T1 MADFG110656791  JKDFG087   male    Cancer         NA
That is, if you look at the output of fromJSON, you'll see that there is a data frame nested (a little deeply) in the list:
str(jsonlite::fromJSON(json))
# List of 2
# $ header :List of 8
# ..$ apiVersion : chr "v1"
# ..$ code : chr "200"
# ..$ service : chr "catalogwebservice"
# ..$ developerMessage: chr ""
# ..$ userMessage : chr "OK"
# ..$ errorCode : chr "1"
# ..$ docLink : chr "https://ega-archive.org"
# ..$ errorStack : chr ""
# $ response:List of 3
# ..$ numTotalResults: int 12
# ..$ resultType : chr "SampleData"
# ..$ result :'data.frame': 3 obs. of 10 variables:
# .. ..$ alias : chr [1:3] "JKDFG093.T2" "JKDFG093.T1" "JKDFG087.T1"
# .. ..$ egaStableId : chr [1:3] "EGAN00003456789" "EGAN00003456780" "EGAN00003456781"
# .. ..$ centerName : chr [1:3] "Novartis" "Novartis" "Novartis"
# .. ..$ creationTime: chr [1:3] "2016-05-13Y17:08.001Z" "2016-05-13Y17:08.001Z" "2016-05-13Y17:08.001Z"
# .. ..$ title : chr [1:3] "JKDFG093.T2" "JKDFG093.T1" "JKDFG087.T1"
# .. ..$ bioSampleId : chr [1:3] "MADFG110656789" "MADFG110656790" "MADFG110656791"
# .. ..$ subjectId : chr [1:3] "JKDFG093" "JKDFG093" "JKDFG087"
# .. ..$ gender : chr [1:3] "male" "female" "male"
# .. ..$ phenotype : chr [1:3] "Cancer" "Cancer" "Cancer"
# .. ..$ attributes : logi [1:3] NA NA NA
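To try the extraction end to end, here is a minimal sketch. It assumes the jsonlite package is installed, and it writes a trimmed-down copy of the JSON (fewer fields and records than the real data.json) to a temporary file, so the file path and the variable names are illustrative only:

```r
library(jsonlite)

# A shortened stand-in for data.json: same nesting, fewer fields and records.
json_text <- '{
  "header": {"code": "200", "userMessage": "OK"},
  "response": {
    "numTotalResults": 12,
    "resultType": "SampleData",
    "result": [
      {"alias": "JKDFG093.T2", "gender": "male", "phenotype": "Cancer", "attributes": null},
      {"alias": "JKDFG093.T1", "gender": "female", "phenotype": "Cancer", "attributes": null}
    ]
  }
}'
path <- tempfile(fileext = ".json")
writeLines(json_text, path)

# fromJSON accepts a file path; with its default simplification the array of
# objects under response$result comes back as a data frame, and JSON null as NA.
parsed <- jsonlite::fromJSON(path)
samples <- parsed$response$result

str(samples)
```

With the real file, passing "data.json" in place of path should give one row per sample and one column per field.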
I'm using jsonlite; I believe it is close enough to rjson that the same thing should work. If result is not listed as
..$ result :'data.frame': 3 obs. of 10 variables:
in the str output, then just wrap it in as.data.frame, as in
as.data.frame(jsonlite::fromJSON(json)$response$result)
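If you stay with rjson, note that it returns the records as a plain list of lists and turns JSON null into NULL. A NULL field has length 0, which is the likely source of the "differing number of rows: 1, 0" error. The sketch below uses only base R and a hand-built list that is assumed to mirror what rjson::fromJSON(file = "data.json")$response$result returns; the helper name record_to_row is hypothetical:

```r
# Stand-in for rjson::fromJSON(file = "data.json")$response$result
# (assumed shape: one named list per sample, with NULL for JSON null).
records <- list(
  list(alias = "JKDFG093.T2", gender = "male", attributes = NULL),
  list(alias = "JKDFG093.T1", gender = "female", attributes = NULL)
)

# Convert one record to a one-row data frame, replacing NULL fields with NA
# so that every column has length 1.
record_to_row <- function(rec) {
  rec <- lapply(rec, function(x) if (is.null(x)) NA else x)
  as.data.frame(rec, stringsAsFactors = FALSE)
}

# Stack the one-row data frames into a single data frame.
samples <- do.call(rbind, lapply(records, record_to_row))

str(samples)
```

This works row by row, so it is fine for a few records; for large files the jsonlite route above avoids the repeated rbind.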