How to convert information in a JSON file into a data frame in R?


Question

I have a JSON file named data.json that contains the following:

{
  "header": {
    "apiVersion": "v1",
    "code": "200",
    "service": "catalogwebservice",
    "developerMessage": "",
    "userMessage": "OK",
    "errorCode": "1",
    "docLink": "https://ega-archive.org",
    "errorStack": ""
  },
  "response": {
    "numTotalResults": 12,
    "resultType": "SampleData",
    "result": [
      {
        "alias": "JKDFG093.T2",
        "egaStableId": "EGAN00003456789",
        "centerName": "Novartis",
        "creationTime": "2016-05-13Y17:08.001Z",
        "title": "JKDFG093.T2",
        "bioSampleId": "MADFG110656789",
        "subjectId": "JKDFG093",
        "gender": "male",
        "phenotype": "Cancer",
        "attributes": null
      },
      {
        "alias": "JKDFG093.T1",
        "egaStableId": "EGAN00003456780",
        "centerName": "Novartis",
        "creationTime": "2016-05-13Y17:08.001Z",
        "title": "JKDFG093.T1",
        "bioSampleId": "MADFG110656790",
        "subjectId": "JKDFG093",
        "gender": "female",
        "phenotype": "Cancer",
        "attributes": null
      },
      {
        "alias": "JKDFG087.T1",
        "egaStableId": "EGAN00003456781",
        "centerName": "Novartis",
        "creationTime": "2016-05-13Y17:08.001Z",
        "title": "JKDFG087.T1",
        "bioSampleId": "MADFG110656791",
        "subjectId": "JKDFG087",
        "gender": "male",
        "phenotype": "Cancer",
        "attributes": null
      }
    ]
  }
}

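I want to convert the information in this JSON file into a data frame. The columns should be alias, egaStableId, centerName, creationTime, title, bioSampleId, subjectId, gender, phenotype, and attributes, with each record's values in the corresponding row.

I loaded the JSON file in R and tried converting it into a data frame, but got an error.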

library(rjson)
data <- rjson::fromJSON(file = "data.json")
json_data_frame <- as.data.frame(data)

The error I got:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 1, 0

Any help is appreciated. Thank you!

Answer 1

Score: 3

You need to dig into the parsed data to reach the results, namely $response$result.

Replace json here with your filename:

jsonlite::fromJSON(json)$response$result
#         alias     egaStableId centerName          creationTime       title    bioSampleId subjectId gender phenotype attributes
# 1 JKDFG093.T2 EGAN00003456789   Novartis 2016-05-13Y17:08.001Z JKDFG093.T2 MADFG110656789  JKDFG093   male    Cancer         NA
# 2 JKDFG093.T1 EGAN00003456780   Novartis 2016-05-13Y17:08.001Z JKDFG093.T1 MADFG110656790  JKDFG093 female    Cancer         NA
# 3 JKDFG087.T1 EGAN00003456781   Novartis 2016-05-13Y17:08.001Z JKDFG087.T1 MADFG110656791  JKDFG087   male    Cancer         NA

In other words, if you look at the output of fromJSON, you'll see a data frame nested a little way down in the list:

str(jsonlite::fromJSON(json))
# List of 2
#  $ header  :List of 8
#   ..$ apiVersion      : chr "v1"
#   ..$ code            : chr "200"
#   ..$ service         : chr "catalogwebservice"
#   ..$ developerMessage: chr ""
#   ..$ userMessage     : chr "OK"
#   ..$ errorCode       : chr "1"
#   ..$ docLink         : chr "https://ega-archive.org"
#   ..$ errorStack      : chr ""
#  $ response:List of 3
#   ..$ numTotalResults: int 12
#   ..$ resultType     : chr "SampleData"
#   ..$ result         :'data.frame':	3 obs. of  10 variables:
#   .. ..$ alias       : chr [1:3] "JKDFG093.T2" "JKDFG093.T1" "JKDFG087.T1"
#   .. ..$ egaStableId : chr [1:3] "EGAN00003456789" "EGAN00003456780" "EGAN00003456781"
#   .. ..$ centerName  : chr [1:3] "Novartis" "Novartis" "Novartis"
#   .. ..$ creationTime: chr [1:3] "2016-05-13Y17:08.001Z" "2016-05-13Y17:08.001Z" "2016-05-13Y17:08.001Z"
#   .. ..$ title       : chr [1:3] "JKDFG093.T2" "JKDFG093.T1" "JKDFG087.T1"
#   .. ..$ bioSampleId : chr [1:3] "MADFG110656789" "MADFG110656790" "MADFG110656791"
#   .. ..$ subjectId   : chr [1:3] "JKDFG093" "JKDFG093" "JKDFG087"
#   .. ..$ gender      : chr [1:3] "male" "female" "male"
#   .. ..$ phenotype   : chr [1:3] "Cancer" "Cancer" "Cancer"
#   .. ..$ attributes  : logi [1:3] NA NA NA
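
To check this structure without a file on disk, here is a minimal sketch that parses an inline JSON string. The string json_text is a shortened, illustrative stand-in for data.json (two records, three fields), not the full file. It relies on jsonlite's default behavior of simplifying an array of objects into a data frame.

```r
library(jsonlite)

# Shortened stand-in for data.json; illustrative only.
json_text <- '{
  "header": {"code": "200", "userMessage": "OK"},
  "response": {
    "numTotalResults": 12,
    "resultType": "SampleData",
    "result": [
      {"alias": "JKDFG093.T2", "gender": "male", "attributes": null},
      {"alias": "JKDFG093.T1", "gender": "female", "attributes": null}
    ]
  }
}'

# fromJSON accepts a JSON string, a file path, or a URL.
parsed <- jsonlite::fromJSON(json_text)

# The array of objects under response$result comes back as a data frame.
samples <- parsed$response$result
print(class(samples))
print(names(samples))
```

For the real file, passing the path (for example "data.json") instead of json_text should give the same shape, with one row per entry in result.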

I'm using jsonlite; I believe it is close enough to rjson that the same approach should work. If your str output does not show

  ..$ result         :'data.frame':	3 obs. of  10 variables:

then just wrap the result in as.data.frame, as in

as.data.frame(jsonlite::fromJSON(json)$response$result)
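
If you would rather stay with rjson, the following is a hedged sketch rather than a tested recipe for your file. It assumes rjson returns response$result as a list of named lists and represents JSON null as NULL, which is what the "differing number of rows: 1, 0" error suggests. The helper record_to_row and the shortened json_text are hypothetical names introduced here for illustration.

```r
library(rjson)

# Shortened stand-in for data.json; illustrative only.
json_text <- '{
  "response": {
    "result": [
      {"alias": "JKDFG093.T2", "gender": "male", "attributes": null},
      {"alias": "JKDFG093.T1", "gender": "female", "attributes": null}
    ]
  }
}'

parsed <- rjson::fromJSON(json_text)
records <- parsed$response$result

# Hypothetical helper: turn one record (a named list) into a one-row
# data frame, replacing NULL (JSON null) with NA so every column has length 1.
record_to_row <- function(rec) {
  rec[vapply(rec, is.null, logical(1))] <- NA
  as.data.frame(rec, stringsAsFactors = FALSE)
}

# Stack the one-row data frames into a single data frame.
samples <- do.call(rbind, lapply(records, record_to_row))
print(samples)
```

For the real file, rjson::fromJSON(file = "data.json") would replace the inline string. Replacing NULL with NA is the key step: a zero-length column is what makes as.data.frame fail.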

huangapple
  • Posted on February 19, 2023 at 07:50:51
  • Source: https://go.coder-hub.com/75497144.html