How to get values from a JSON file efficiently with Python?
Question
I'm trying to retrieve values from different layers of a JSON file. My current approach is rather clumsy: I reach the values in one dictionary nested inside another with nested for loops. I want to collect every "title" and "question" and put them in a list or a pandas DataFrame. How can I retrieve these values in a simpler way? And how should JSON files be handled efficiently in general?
Thanks a lot to anyone who answers the question :)
Here's a piece of the JSON:
{
    "contact": "xxx",
    "version": 1.0,
    "data": [
        {
            "title": "anges-musiciens-(national-gallery)",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "answers": [
                                {
                                    "text": "La Vierge aux rochers"
                                }
                            ],
                            "question": "Que concerne principalement les documents ?"
                        }
                    ]
                }
            ]
        }
    ]
}
# data is the dictionary loaded from the JSON file (e.g. with json.load)
titles = []
questions = []
for i in data["data"]:
    titles.append(i["title"])
    for p in i["paragraphs"]:
        for q in p["qas"]:
            questions.append(q["question"])
print(titles)
print(questions)
Answer 1
Score: 0
If the structure is regular (i.e. always the same hierarchy pattern and no missing keys when a dictionary is present), you can obtain your results with nested list comprehensions:
titles = [d["title"] for d in data["data"]]
questions = [q["question"] for d in data.get("data", [])
             for p in d.get("paragraphs", [])
             for q in p.get("qas", [])]
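If each question should stay tied to the title it belongs to, the same nested comprehension can emit pairs instead of two separate lists. This is only a minimal sketch, assuming the data has the shape shown in the question; the `sample` dictionary and the `pairs` name are illustrative and not part of the original answer.

```python
# Illustrative sample with the same shape as the JSON in the question.
sample = {
    "data": [
        {
            "title": "anges-musiciens-(national-gallery)",
            "paragraphs": [
                {"qas": [{"question": "Que concerne principalement les documents ?"}]}
            ],
        }
    ]
}

# One (title, question) tuple per question; .get() tolerates missing keys
# (a missing "title" or "question" shows up as None in the tuple).
pairs = [
    (d.get("title"), q.get("question"))
    for d in sample.get("data", [])
    for p in d.get("paragraphs", [])
    for q in p.get("qas", [])
]
print(pairs)
```

Keeping the pairs together makes it straightforward to build a two-column table later.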
If the structure is not regular, you will need to keep track of new entries as you progress deeper and deeper in the structure. You can do this with a list (or a queue):
titles = []
questions = []
more = [*data.items()]                  # start with key/values of first level dictionary
while more:
    key, value = more.pop(0)            # get next key/value pair to process
    if isinstance(value, list):         # if value is a list
        more.extend(enumerate(value))   # add key/values using indexes as keys
    elif isinstance(value, dict):       # if value is a dictionary
        more.extend(value.items())      # add more key/values from its items
    elif key == "title":                # for "title" key, add to titles list
        titles.append(value)
    elif key == "question":             # same for "question" keys
        questions.append(value)
Output:
print(titles)
['anges-musiciens-(national-gallery)']
print(questions)
['Que concerne principalement les documents ?']
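The queue variant mentioned above could look roughly like the following. It is a sketch rather than the answer's own code: the `collect` function, its `wanted` parameter, and the `sample` data are hypothetical names introduced for illustration, and it assumes the top level of the data is a dictionary. `collections.deque.popleft()` removes the first element in constant time, whereas `list.pop(0)` has to shift every remaining element.

```python
from collections import deque


def collect(data, wanted=("title", "question")):
    """Breadth-first walk over nested dicts/lists, collecting values by key."""
    found = {key: [] for key in wanted}
    queue = deque(data.items())             # top-level key/value pairs
    while queue:
        key, value = queue.popleft()        # O(1), unlike list.pop(0)
        if isinstance(value, dict):
            queue.extend(value.items())     # descend into nested dictionaries
        elif isinstance(value, list):
            queue.extend(enumerate(value))  # list indexes stand in as keys
        elif key in found:
            found[key].append(value)
    return found


# Illustrative sample with the same shape as the JSON in the question.
sample = {
    "contact": "xxx",
    "data": [
        {
            "title": "anges-musiciens-(national-gallery)",
            "paragraphs": [
                {"qas": [{"question": "Que concerne principalement les documents ?"}]}
            ],
        }
    ],
}
result = collect(sample)
print(result)
```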
Answer 2
Score: 0
If you want a DataFrame, pd.json_normalize can flatten the nested records:
import pandas as pd

data = {
    "contact": "xxx",
    "version": 1.0,
    "data": [
        {
            "title": "anges-musiciens-(national-gallery)",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "answers": [{
                                "text": "La Vierge aux rochers"
                            }
                            ],
                            "question": "Que concerne principalement les documents ?"
                        }
                    ]
                }
            ]
        }
    ]
}
# Walk data -> paragraphs -> qas for the rows, and carry each record's title along
df = pd.json_normalize(data['data'], ['paragraphs', 'qas'], 'title')[['title', 'question']]
print(df)
Output:
                                title                                     question
0  anges-musiciens-(national-gallery)  Que concerne principalement les documents ?
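The same table can also be assembled by hand as a list of row dictionaries, a form that `pandas.DataFrame` accepts directly. The sketch below assumes the JSON arrives as text: the `raw` string stands in for the file contents and is illustrative only. With a real file, `json.load(f)` on the open file object would replace `json.loads(raw)`.

```python
import json

# Stand-in for the contents of the JSON file (illustrative only).
raw = """
{"data": [{"title": "anges-musiciens-(national-gallery)",
           "paragraphs": [{"qas": [{"answers": [{"text": "La Vierge aux rochers"}],
                                    "question": "Que concerne principalement les documents ?"}]}]}]}
"""
data = json.loads(raw)

# One dictionary per question, each carrying the title of its parent record.
rows = [
    {"title": d["title"], "question": q["question"]}
    for d in data["data"]
    for p in d["paragraphs"]
    for q in p["qas"]
]
print(rows)
# pandas.DataFrame(rows) would then produce the same two columns.
```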
Answer 3
Score: -1
You can use recursion to perform a depth-first search on the nested structure:
def extract_fields(json_data, fields_of_interest=None, extracted=None):
    # Guard the None defaults: `field in None` would raise a TypeError
    if fields_of_interest is None:
        fields_of_interest = ()
    if extracted is None:
        extracted = {}
    if isinstance(json_data, dict):
        for field, value in json_data.items():
            if field in fields_of_interest:
                extracted.setdefault(field, []).append(value)
            elif isinstance(value, (dict, list)):
                extract_fields(value, fields_of_interest, extracted)
    elif isinstance(json_data, list):
        for x in json_data:
            extract_fields(x, fields_of_interest, extracted)
    return extracted

j = {'title': 'abc',
     'deep': {'question': 'zyx',
              'deeper': [{'title': 'def',
                          'question': 'wvu',
                          'nothing': 'hahaha'},
                         {'even deeper': [{'title': 'ghi',
                                           'question': 'tsr',
                                           'answer': 42},
                                          {'not a title': "ceci n'est pas une pipe"}]}]}}
extracted = extract_fields(j, ('title', 'question'))
print(extracted)
# {'title': ['abc', 'def', 'ghi'], 'question': ['zyx', 'wvu', 'tsr']}
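For very deeply nested data, Python's recursion limit could become a concern. An explicit stack gives a depth-first traversal without recursion. The following is a sketch under that assumption; `extract_fields_iter` is a hypothetical helper name, and because it collects a dictionary's matching fields before descending into that dictionary's nested values, the collected lists may be ordered differently from the recursive version on some inputs.

```python
def extract_fields_iter(json_data, fields_of_interest=()):
    """Collect values of the given keys from nested dicts/lists, without recursion."""
    extracted = {}
    stack = [json_data]                   # nodes still to be visited
    while stack:
        node = stack.pop()                # last in, first out: depth-first
        if isinstance(node, dict):
            for field, value in node.items():
                if field in fields_of_interest:
                    extracted.setdefault(field, []).append(value)
                elif isinstance(value, (dict, list)):
                    stack.append(value)   # visit nested containers later
        elif isinstance(node, list):
            stack.extend(reversed(node))  # reversed so items pop in list order
    return extracted


nested = {'title': 'abc',
          'deep': {'question': 'zyx',
                   'deeper': [{'title': 'def'}]}}
result = extract_fields_iter(nested, ('title', 'question'))
print(result)
```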