January 9, 2023, 16:46:24

Reshaping a large dictionary

Question

I am working on XBRL document parsing. I have reached a point where I have a large dictionary structured like this:

sample of a dictionary I'm working on

Since it's a bit challenging to describe the pattern I'm trying to achieve, I have just included an example of what I'd like it to look like:

sample of what I'm trying to achieve

Since I'm fairly new to programming, I have been struggling with this for days, trying different approaches with loops and with list and dict comprehensions, starting from here:


for k in storage_gaap:
    if 'context_ref' in storage_gaap[k]:
        for _k in storage_gaap[k]['context_ref']:
            storage_gaap[k]['context_ref'] = {_k}

storage_gaap is the master dictionary. Sorry for attaching pictures, but it is much clearer to see the dictionary that way.

I'd really appreciate any and all help.

Answer 1

Score: 0

Here's a solution that uses zip and a dictionary comprehension to do what you're trying to do, with toy data in a similar structure.

import itertools
import pprint

# Sample data similar to provided screenshots
data = {
    'a': {
        'id': 'a',
        'vals': ['a1', 'a2', 'a3'],
        'val_num': [1, 2, 3]
    },
    'b': {
        'id': 'b',
        'vals': ['b1', 'b2', 'b3'],
        'val_num': [4, 5, 6]
    }
}

# Takes a tuple of keys and a list of tuples of values, and transforms them into a list of dicts,
# e.g. ('id', 'val'), [('a', 1), ('b', 2)] => [{'id': 'a', 'val': 1}, {'id': 'b', 'val': 2}]
def get_list_of_dict(keys, list_of_tuples):
    list_of_dict = [dict(zip(keys, values)) for values in list_of_tuples]
    return list_of_dict

def process_dict(key, values):
    # Transform the dict with lists of values into a list of dicts
    list_of_dicts = get_list_of_dict(
        ('id', 'val', 'val_num'),
        zip(itertools.repeat(key, len(values['vals'])), values['vals'], values['val_num']),
    )
    # Dictionary comprehension to group them based on the 'val' property of each dict
    return {d['val']: {k: v for k, v in d.items() if k != 'val'} for d in list_of_dicts}

# Reorganize to put dict under a 'context_values' key
processed = {k: {'context_values': process_dict(k, v)} for k, v in data.items()}

# {'a': {'context_values': {'a1': {'id': 'a', 'val_num': 1},
#                           'a2': {'id': 'a', 'val_num': 2},
#                           'a3': {'id': 'a', 'val_num': 3}}},
#  'b': {'context_values': {'b1': {'id': 'b', 'val_num': 4},
#                           'b2': {'id': 'b', 'val_num': 5},
#                           'b3': {'id': 'b', 'val_num': 6}}}}
pprint.pprint(processed)
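
The core step in the code above is turning parallel lists into one small record per item. A minimal sketch of just that step, assuming the same toy values and using illustrative names (`record`, `rows`, `by_val`) that are not part of the original answer, might look like this:

```python
# Toy record with parallel lists, mirroring the 'a' entry of the sample data.
record = {'id': 'a', 'vals': ['a1', 'a2', 'a3'], 'val_num': [1, 2, 3]}

# zip pairs the i-th element of each list: ('a1', 1), ('a2', 2), ('a3', 3)
rows = list(zip(record['vals'], record['val_num']))

# Key each row by its 'vals' entry and keep the remaining fields as a small dict.
by_val = {val: {'id': record['id'], 'val_num': num} for val, num in rows}

print(by_val)
# {'a1': {'id': 'a', 'val_num': 1}, 'a2': {'id': 'a', 'val_num': 2}, 'a3': {'id': 'a', 'val_num': 3}}
```

This skips the intermediate list of dicts, which is fine for two fields; the helper-based version above scales more comfortably when there are many keys.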

Answer 2

Score: 0

OK, here is the updated solution for my case. The catch for me was the zip function, since it stops at the shortest iterable passed to it. The solution was itertools.cycle. Here is the code:

import itertools
import pprint

data = {
    'us-gaap_WeightedAverageNumberOfDilutedSharesOutstanding': {
        'context_ref': ['D20210801-20220731', 'D20200801-20210731', 'D20190801-20200731',
                        'D20210801-20220731', 'D20200801-20210731', 'D20190801-20200731'],
        'decimals': ['-5', '-5', '-5', '-5', '-5', '-5'],
        'id': ['us-gaap:WeightedAverageNumberOfDilutedSharesOutstanding'],
        'master_id': ['us-gaap_WeightedAverageNumberOfDilutedSharesOutstanding'],
        'unit_ref': ['shares', 'shares', 'shares', 'shares', 'shares', 'shares'],
        'value': ['98500000', '96400000', '96900000', '98500000', '96400000', '96900000'],
    },
}

def get_list_of_dict(keys, list_of_tuples):
    list_of_dict = [dict(zip(keys, values)) for values in list_of_tuples]
    return list_of_dict

def process_dict(k, values):
    # 'id' and 'master_id' hold a single element, so cycle them to match the longer lists
    list_of_dicts = get_list_of_dict(
        ('context_ref', 'decimals', 'id', 'master_id', 'unit_ref', 'value'),
        zip(values['context_ref'], values['decimals'], itertools.cycle(values['id']),
            itertools.cycle(values['master_id']), values['unit_ref'], values['value']))
    return {d['context_ref']: {k: v for k, v in d.items() if k != 'context_ref'} for d in list_of_dicts}

processed = {k: {'context_values': process_dict(k, v)} for k, v in data.items()}

pprint.pprint(processed)
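
To see why `itertools.cycle` matters here, it may help to look at `zip` on lists of unequal length in isolation. The sketch below uses shortened, made-up values (`refs`, `ids`) rather than the real XBRL data, so treat it as an illustration only:

```python
import itertools

refs = ['D2021', 'D2020', 'D2019']   # three entries (hypothetical, shortened)
ids = ['us-gaap:Example']            # a single entry (hypothetical)

# zip stops as soon as the shortest input is exhausted.
truncated = list(zip(refs, ids))
print(truncated)
# [('D2021', 'us-gaap:Example')]

# cycle repeats the single id indefinitely, so zip runs to the end of refs.
padded = list(zip(refs, itertools.cycle(ids)))
print(padded)
# [('D2021', 'us-gaap:Example'), ('D2020', 'us-gaap:Example'), ('D2019', 'us-gaap:Example')]
```

At least one argument to `zip` has to be finite, otherwise the loop never ends. Also, because the result above is keyed by `context_ref`, repeated `context_ref` values (as in the sample data) collapse into a single entry, with the later occurrence overwriting the earlier one.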
