March 7, 2023, 04:12:28

How to get values from a JSON file efficiently with Python?

Question

I'm trying to retrieve values from different layers of a JSON file, and my current approach feels clumsy: I use nested for loops to pull values out of one dictionary inside another. I want to get all the "title" and "question" values and put them in a list or a pandas DataFrame. How can I retrieve these values in a simpler way? And how should JSON files be handled efficiently in general?
Thanks a lot to anyone who answers the question :)

Here's a piece of the JSON:

{
    "contact": "xxx",
    "version": 1.0,
    "data": [
        {
            "title": "anges-musiciens-(national-gallery)",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "answers": [
                                {
                                    "text": "La Vierge aux rochers"
                                }
                            ],
                            "question": "Que concerne principalement les documents ?"
                        }
                    ]
                }
            ]
        }
    ]
}

titles = []
questions = []

for i in data["data"]:
    titles.append(i["title"])

    for p in i["paragraphs"]:
        for q in p["qas"]:
            questions.append(q["question"])

print(titles)
print(questions)
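
The loops above assume that `data` already holds the parsed JSON. As a minimal sketch of that step, the following writes a small sample to a temporary file and reads it back with `json.load`. The sample content and the temporary file are stand-ins for the real file, whose name is not given in the question.

```python
import json
import os
import tempfile

# Hypothetical sample standing in for the real file (trimmed from the excerpt above).
sample = {
    "data": [
        {
            "title": "anges-musiciens-(national-gallery)",
            "paragraphs": [
                {"qas": [{"question": "Que concerne principalement les documents ?"}]}
            ],
        }
    ]
}

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as f:
    json.dump(sample, f, ensure_ascii=False)
    path = f.name

# json.load parses the whole file into nested dicts and lists.
with open(path, encoding="utf-8") as f:
    data = json.load(f)

os.remove(path)  # clean up the temporary file

titles = [item["title"] for item in data["data"]]
print(titles)
```

With a real file, only the `open(...)` / `json.load(...)` pair is needed; the rest exists to keep the sketch runnable on its own.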

Answer 1

Score: 0

If the structure is regular (i.e. always the same hierarchy patterns and no missing keys when a dictionary is present), then you can obtain your results with nested list comprehensions:

titles    = [d["title"] for d in data["data"]]
questions = [q["question"] for d in data.get("data", [])
                           for p in d.get("paragraphs", [])
                           for q in p.get("qas", [])]
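
If each question should stay attached to its title (for example, to build table rows later), the same nested comprehension can emit pairs instead of two separate lists. This is only a sketch: the name `rows` and the second entry in the sample data are made up for illustration.

```python
# Sample data: the first entry comes from the question, the second is hypothetical.
data = {
    "data": [
        {
            "title": "anges-musiciens-(national-gallery)",
            "paragraphs": [
                {"qas": [{"question": "Que concerne principalement les documents ?"}]}
            ],
        },
        {
            "title": "hypothetical-title",
            "paragraphs": [
                {"qas": [{"question": "Question A ?"}, {"question": "Question B ?"}]}
            ],
        },
    ]
}

# One (title, question) pair per question; .get(..., []) tolerates missing lists.
rows = [
    (d["title"], q["question"])
    for d in data.get("data", [])
    for p in d.get("paragraphs", [])
    for q in p.get("qas", [])
]

for title, question in rows:
    print(title, "|", question)
```

A list of pairs like this can be passed to a DataFrame constructor along with column names if a table is wanted.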

If the structure is not regular, you will need to keep track of new entries as you progress deeper and deeper in the structure. You can do this with a list (or a queue):

titles    = []
questions = []
more      = [*data.items()]  # start with key/values of first level dictionary
while more:
    key, value = more.pop(0)           # get next key/value pair to process
    if isinstance(value, list):        # if value is a list
        more.extend(enumerate(value))  # add key/values using indexes as keys
    elif isinstance(value, dict):      # if value is a dictionary
        more.extend(value.items())     # add more key/values from its items
    elif key == "title":               # for "title" key, add to titles list
        titles.append(value)
    elif key == "question":            # same for "question" keys
        questions.append(value)

Output:

print(titles)
['anges-musiciens-(national-gallery)']

print(questions)
['Que concerne principalement les documents ?']
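
One detail of the loop above: `list.pop(0)` shifts every remaining element, so each call costs time proportional to the length of the list. A sketch of the same breadth-first traversal using `collections.deque`, whose `popleft()` is constant-time, could look like this. The function name `collect` and its `wanted` parameter are assumptions for illustration, not part of the answer.

```python
from collections import deque


def collect(data, wanted=("title", "question")):
    """Breadth-first walk over nested dicts/lists, collecting values of wanted keys."""
    found = {key: [] for key in wanted}
    queue = deque(data.items())             # start with the top-level key/value pairs
    while queue:
        key, value = queue.popleft()        # O(1), unlike list.pop(0)
        if isinstance(value, list):
            queue.extend(enumerate(value))  # list items are queued with their index as key
        elif isinstance(value, dict):
            queue.extend(value.items())     # nested dict: queue its key/value pairs
        elif key in found:
            found[key].append(value)        # scalar value stored under a wanted key
    return found


# Sample taken from the question.
data = {
    "contact": "xxx",
    "version": 1.0,
    "data": [
        {
            "title": "anges-musiciens-(national-gallery)",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "answers": [{"text": "La Vierge aux rochers"}],
                            "question": "Que concerne principalement les documents ?",
                        }
                    ]
                }
            ],
        }
    ],
}

result = collect(data)
print(result["title"])
print(result["question"])
```

For small files the difference is negligible; it only starts to matter when the queue grows large.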

Answer 2

Score: 0

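If you want to return a DataFrame, you can use pd.json_normalize. In the call below, ['paragraphs', 'qas'] is the record path (one row per entry in "qas") and 'title' is carried along as metadata from the enclosing level: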

import pandas as pd

data = {
    "contact": "xxx",
    "version": 1.0,
    "data": [
        {
            "title": "anges-musiciens-(national-gallery)",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "answers": [
                                {
                                    "text": "La Vierge aux rochers"
                                }
                            ],
                            "question": "Que concerne principalement les documents ?"
                        }
                    ]
                }
            ]
        }
    ]
}

df = pd.json_normalize(data['data'], ['paragraphs', 'qas'], 'title')[['title', 'question']]
print(df)

                                title                                     question
0  anges-musiciens-(national-gallery)  Que concerne principalement les documents ?

Answer 3

Score: -1

You can use recursion to perform a depth-first search on the nested structure:

def extract_fields(json_data, fields_of_interest=(), extracted=None):
    # Default is an empty tuple: the original default of None would make
    # "field in fields_of_interest" raise a TypeError.
    if extracted is None:
        extracted = {}
    if isinstance(json_data, dict):
        for field, value in json_data.items():
            if field in fields_of_interest:
                # collect the value; a matching field is not searched any deeper
                extracted.setdefault(field, []).append(value)
            elif isinstance(value, (dict, list)):
                extract_fields(value, fields_of_interest, extracted)
    elif isinstance(json_data, list):
        for x in json_data:
            extract_fields(x, fields_of_interest, extracted)
    return extracted

j = {'title': 'abc',
     'deep': {'question': 'zyx',
              'deeper': [{'title': 'def',
                          'question': 'wvu',
                          'nothing': 'hahaha'},
                         {'even deeper': [{'title': 'ghi',
                                           'question': 'tsr',
                                           'answer': 42},
                                          {'not a title': "ceci n'est pas une pipe"}]}]}}

extracted = extract_fields(j, ('title', 'question'))

print(extracted)
# {'title': ['abc', 'def', 'ghi'], 'question': ['zyx', 'wvu', 'tsr']}
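
A variation on the same idea, offered as a sketch rather than as part of the answer: a generator that yields `(key, value)` pairs lazily, so the caller decides how to collect them. The name `iter_fields` and the shortened sample are hypothetical.

```python
def iter_fields(node, wanted):
    """Depth-first walk yielding (key, value) for every key found in `wanted`."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in wanted:
                yield key, value                       # matching key: emit, do not descend
            elif isinstance(value, (dict, list)):
                yield from iter_fields(value, wanted)  # recurse into nested containers
    elif isinstance(node, list):
        for item in node:
            yield from iter_fields(item, wanted)


# Shortened, made-up sample in the spirit of the one above.
j = {
    "title": "abc",
    "deep": {
        "question": "zyx",
        "deeper": [{"title": "def", "question": "wvu", "nothing": "hahaha"}],
    },
}

pairs = list(iter_fields(j, {"title", "question"}))
print(pairs)
# [('title', 'abc'), ('question', 'zyx'), ('title', 'def'), ('question', 'wvu')]
```

Because nothing is accumulated inside the function, the pairs can be filtered, grouped, or streamed by the caller without changing the traversal.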
