将多个 JSON 值附加到单个 Pandas 列中使用 Python

huangapple go评论65阅读模式
英文:

Appending multiple values of a json into a Single Pandas column using python

问题

I have trouble in getting values from a Json and storing in a Dataframe.

My Json looks like

{
  "issues": [
    {
      "expand": "operations",
      "id": "1",
      "fields": {
        "customfield_100": [
          {
            "self": "https://url1001",
            "value": "Mobile",
            "id": "1001",
            "disabled": "false"
          }
        ]
      }
    },
    {
      "expand": "operations",
      "id": "2",
      "fields": {
        "customfield_100": [
          {
            "self": "https://url1002",
            "value": "Desktop",
            "id": "1002",
            "disabled": false
          },
          {
            "self": "https://url1001",
            "value": "Mobile",
            "id": "1001",
            "disabled": false
          }
        ]
      }
    },
    {
      "expand": "operations",
      "id": "3",
      "fields": {
        "customfield_100": [
          {
            "self": "https://url1003",
            "value": "ios",
            "id": "1002",
            "disabled": false
          }
        ]
      }
    },
    {
      "expand": "operations",
      "id": "4",
      "fields": {
        "customfield_100": [
          {
            "self": "https://url1002",
            "value": "Desktop",
            "id": "1002",
            "disabled": false
          },
          {
            "self": "https://url1001",
            "value": "Mobile",
            "id": "1001",
            "disabled": false
          },
          {
            "self": "https://url1003",
            "value": "ios",
            "id": "1003",
            "disabled": false
          }
        ]
      }
    }
  ]
}

Below is my part of the code

df2=pd.dataframe()
d=pd.json_normalize(json.loads(df1['customfield_100'].to_json(orient='record')))
filtered_component=[]
for index in range(len(issues.id)):
     if((pd.json_normalize(df1['customfield_100'][index])).size>0):
         filtered_component.append(d[0]['value'])
     else:
         filtered_component.append('No Component')
df2['Component']=filterd_component

When I list df2['Component'], I am getting the below output

'Mobile'
'Desktop'
'ios'
'Desktop'

I would like my output to be like (when I list df2[components]), i.e., if more than a single value for customfield_100, then I would like those to be separated by ";".

I am not sure how the loop/code should be written for it.

'Mobile'
'Desktop';'Mobile'
'ios'
'Desktop';'Mobile';'ios'
英文:

I have trouble in getting values from a Json and storing in a Dataframe.
My Json looks like

{
"issues": [
{
"expand": "operations",
"id": "1",
"fields": {
"customfield_100": [
{
"self": "https://url1001",
"value": "Mobile",
"id": "1001",
"disabled": "false"
}
]
}
},
{
"expand": "operations",
"id": "2",
"fields": {
"customfield_100": [
{
"self": "https://url1002",
"value": "Desktop",
"id": "1002",
"disabled": false
},
{
"self": "https://url1001",
"value": "Mobile",
"id": "1001",
"disabled": false
}
]
}
},
{
"expand": "operations",
"id": "3",
"fields": {
"customfield_100": [
{
"self": "https://url1003",
"value": "ios",
"id": "1002",
"disabled": false
}
]
}
},
{
"expand": "operations",
"id": "4",
"fields": {
"customfield_100": [
{
"self": "https://url1002",
"value": "Desktop",
"id": "1002",
"disabled": false
},
{
"self": "https://url1001",
"value": "Mobile",
"id": "1001",
"disabled": false
},
{
"self": "https://url1003",
"value": "ios",
"id": "1003",
"disabled": false
}
]
}
}
]
}

Below is my part of the code

        df2=pd.dataframe()
d=pd.json_normalize(json.loads(df1['customfield_100'].to_json(orient='record')))
filtered_component=[]
for index in range(len(issues.id)):
if((pd.json_normalize(df1['customfield_100'][index])).size>0):
filtered_component.append(d[0][index]['value']
else:
filtered_component.append('No Component')
df2['Component']=filterd_component

When i list df2['Component'], i am getting the below output

'Mobile'
'Desktop'
'ios'
'Desktop'

I would like my output to be like(when i list df2[components]) i.e if more than single value for customfield_100 then i would like those to be separated by ;. I am not sure how the loop/code should be written for it

'Mobile'
'Desktop';'Mobile'
'ios'
'Desktop';'Mobile';'ios'

答案1

得分: 1

以下是翻译好的部分:

Another possible solution:
df = pd.json_normalize(
data, 
record_path=['issues', 'fields', 'customfield_100'], 
meta=[['issues', 'id']])
df.groupby('issues.id')['value'].agg(';'.join)
where
data = {
"issues": [
{
"expand": "operations",
"id": "1",
"fields": {
"customfield_100": [
{
"self": "https://url1001",
"value": "Mobile",
"id": "1001",
"disabled": False
}
]
}
},
{
"expand": "operations",
"id": "2",
"fields": {
"customfield_100": [
{
"self": "https://url1002",
"value": "Desktop",
"id": "1002",
"disabled": False
},
{
"self": "https://url1001",
"value": "Mobile",
"id": "1001",
"disabled": False
}
]
}
},
{
"expand": "operations",
"id": "3",
"fields": {
"customfield_100": [
{
"self": "https://url1003",
"value": "ios",
"id": "1002",
"disabled": False
}
]
}
},
{
"expand": "operations",
"id": "4",
"fields": {
"customfield_100": [
{
"self": "https://url1002",
"value": "Desktop",
"id": "1002",
"disabled": False
},
{
"self": "https://url1001",
"value": "Mobile",
"id": "1001",
"disabled": False
},
{
"self": "https://url1003",
"value": "ios",
"id": "1003",
"disabled": False
}
]
}
}
]
}
Output:
issues.id
1                Mobile
2        Desktop;Mobile
3                   ios
4    Desktop;Mobile;ios
Name: value, dtype: object

希望这对你有帮助。

英文:

Another possible solution:

df = pd.json_normalize(
data, 
record_path=['issues', 'fields', 'customfield_100'], 
meta=[['issues', 'id']])
df.groupby('issues.id')['value'].agg(';'.join)

where

data = {
"issues": [
{
"expand": "operations",
"id": "1",
"fields": {
"customfield_100": [
{
"self": "https://url1001",
"value": "Mobile",
"id": "1001",
"disabled": False
}
]
}
},
{
"expand": "operations",
"id": "2",
"fields": {
"customfield_100": [
{
"self": "https://url1002",
"value": "Desktop",
"id": "1002",
"disabled": False
},
{
"self": "https://url1001",
"value": "Mobile",
"id": "1001",
"disabled": False
}
]
}
},
{
"expand": "operations",
"id": "3",
"fields": {
"customfield_100": [
{
"self": "https://url1003",
"value": "ios",
"id": "1002",
"disabled": False
}
]
}
},
{
"expand": "operations",
"id": "4",
"fields": {
"customfield_100": [
{
"self": "https://url1002",
"value": "Desktop",
"id": "1002",
"disabled": False
},
{
"self": "https://url1001",
"value": "Mobile",
"id": "1001",
"disabled": False
},
{
"self": "https://url1003",
"value": "ios",
"id": "1003",
"disabled": False
}
]
}
}
]
}

Output:

issues.id
1                Mobile
2        Desktop;Mobile
3                   ios
4    Desktop;Mobile;ios
Name: value, dtype: object

答案2

得分: 0

如果 `data` 包含了您解析的 Json 数据则可以执行以下操作

```py
all_data = []
for i in data['issues']:
    for k, v in i['fields'].items():
        for vv in v:
            all_data.append({'main_id': i['id'], 'field_id': k, **vv})

df = pd.DataFrame(all_data)
print(df)

这将打印:

  main_id         field_id             self    value    id disabled
0       1  customfield_100  https://url1001   Mobile  1001    false
1       2  customfield_100  https://url1002  Desktop  1002    False
2       2  customfield_100  https://url1001   Mobile  1001    False
3       3  customfield_100  https://url1003      ios  1002    False
4       4  customfield_100  https://url1002  Desktop  1002    False
5       4  customfield_100  https://url1001   Mobile  1001    False
6       4  customfield_100  https://url1003      ios  1003    False

然后您可以按 main_id 进行分组,例如:

df = df.groupby('main_id')['value'].agg(';'.join)
print(df)

这将打印:

main_id
1                Mobile
2        Desktop;Mobile
3                   ios
4    Desktop;Mobile;ios
Name: value, dtype: object

<details>
<summary>英文:</summary>
If `data` contains your parsed Json data then you can do:
```py
all_data = []
for i in data[&#39;issues&#39;]:
for k, v in i[&#39;fields&#39;].items():
for vv in v:
all_data.append({&#39;main_id&#39;: i[&#39;id&#39;], &#39;field_id&#39;: k, **vv})
df = pd.DataFrame(all_data)
print(df)

This prints:

  main_id         field_id             self    value    id disabled
0       1  customfield_100  https://url1001   Mobile  1001    false
1       2  customfield_100  https://url1002  Desktop  1002    False
2       2  customfield_100  https://url1001   Mobile  1001    False
3       3  customfield_100  https://url1003      ios  1002    False
4       4  customfield_100  https://url1002  Desktop  1002    False
5       4  customfield_100  https://url1001   Mobile  1001    False
6       4  customfield_100  https://url1003      ios  1003    False

Then you can group by main_id, e.g.:

df = df.groupby(&#39;main_id&#39;)[&#39;value&#39;].agg(&#39;;&#39;.join)
print(df)

This prints:

main_id
1                Mobile
2        Desktop;Mobile
3                   ios
4    Desktop;Mobile;ios
Name: value, dtype: object

huangapple
  • 本文由 发表于 2023年6月19日 23:37:59
  • 转载请务必保留本文链接:https://go.coder-hub.com/76508142.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定