Reshaping a large dictionary

Question

I am working on XBRL document parsing. I have reached a point where I have a large dictionary structured like this:

sample of a dictionary I'm working on

Since it's a bit challenging to describe the pattern I'm trying to achieve, here is an example of what I'd like it to become:

sample of what I'm trying to achieve

Since I'm fairly new to programming, I have been struggling with this for days. I have tried different approaches with loops and with list and dictionary comprehensions, starting from here:


for k in storage_gaap:
    if 'context_ref' in storage_gaap[k]:
        for _k in storage_gaap[k]['context_ref']:
            storage_gaap[k]['context_ref'] = {_k}

storage_gaap is the master dictionary. Sorry for attaching pictures, but the dictionary is much easier to read that way.

I'd really appreciate any and all help.

Answer 1

Score: 0

Here's a solution that uses zip and a dictionary comprehension to do what you're describing, demonstrated on toy data with a similar structure.

import itertools
import pprint

# Sample data similar to provided screenshots
data = {
    'a': {
        'id': 'a',
        'vals': ['a1', 'a2', 'a3'],
        'val_num': [1, 2, 3]
    },
    'b': {
        'id': 'b',
        'vals': ['b1', 'b2', 'b3'],
        'val_num': [4, 5, 6]
    }
}

# Takes a tuple of keys and an iterable of value tuples, and turns them into a list of dicts,
# e.g. ('id', 'val'), [('a', 1), ('b', 2)] => [{'id': 'a', 'val': 1}, {'id': 'b', 'val': 2}]
def get_list_of_dict(keys, list_of_tuples):
    list_of_dict = [dict(zip(keys, values)) for values in list_of_tuples]
    return list_of_dict

def process_dict(key, values):
    # Transform the dict with lists of values into a list of dicts
    rows = zip(itertools.repeat(key, len(values['vals'])), values['vals'], values['val_num'])
    list_of_dicts = get_list_of_dict(('id', 'val', 'val_num'), rows)
    # Dictionary comprehension that keys each dict by its 'val' property
    return {d['val']: {k: v for k, v in d.items() if k != 'val'} for d in list_of_dicts}

# Reorganize to put each result under a 'context_values' key
processed = {k: {'context_values': process_dict(k, v)} for k, v in data.items()}

# {'a': {'context_values': {'a1': {'id': 'a', 'val_num': 1},
#                           'a2': {'id': 'a', 'val_num': 2},
#                           'a3': {'id': 'a', 'val_num': 3}}},
#  'b': {'context_values': {'b1': {'id': 'b', 'val_num': 4},
#                           'b2': {'id': 'b', 'val_num': 5},
#                           'b3': {'id': 'b', 'val_num': 6}}}}
pprint.pprint(processed)
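
The two building blocks used above can be looked at in isolation. The following is a minimal sketch, reusing the toy values from this answer, of how zip turns parallel lists into rows and how dict(zip(keys, row)) labels each row. The variable names rows, keys, and records are illustrative only and not part of the original answer.

```python
# Column-oriented input: parallel lists that line up by position.
vals = ['a1', 'a2', 'a3']
val_num = [1, 2, 3]

# zip pairs the lists position by position, producing one tuple per row.
rows = list(zip(vals, val_num))

# dict(zip(keys, row)) attaches a field name to each element of a row.
keys = ('val', 'val_num')
records = [dict(zip(keys, row)) for row in rows]

print(rows)
print(records)
```

Once every row is a dict, regrouping by any one field is a single dictionary comprehension, which is what process_dict does with the 'val' field.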

Answer 2

Score: 0

OK, here is the updated solution for my case. The catch for me was the zip function, which stops as soon as the shortest iterable it is given is exhausted. The solution was itertools.cycle. Here is the code:

import itertools
import pprint

data = {
    'us-gaap_WeightedAverageNumberOfDilutedSharesOutstanding': {
        'context_ref': ['D20210801-20220731', 'D20200801-20210731', 'D20190801-20200731',
                        'D20210801-20220731', 'D20200801-20210731', 'D20190801-20200731'],
        'decimals': ['-5', '-5', '-5', '-5', '-5', '-5'],
        'id': ['us-gaap:WeightedAverageNumberOfDilutedSharesOutstanding'],
        'master_id': ['us-gaap_WeightedAverageNumberOfDilutedSharesOutstanding'],
        'unit_ref': ['shares', 'shares', 'shares', 'shares', 'shares', 'shares'],
        'value': ['98500000', '96400000', '96900000',
                  '98500000', '96400000', '96900000'],
    },
}

def get_list_of_dict(keys, list_of_tuples):
    list_of_dict = [dict(zip(keys, values)) for values in list_of_tuples]
    return list_of_dict

def process_dict(key, values):
    # 'id' and 'master_id' hold a single element, so cycle() repeats them for every row
    rows = zip(values['context_ref'], values['decimals'], itertools.cycle(values['id']),
               itertools.cycle(values['master_id']), values['unit_ref'], values['value'])
    list_of_dicts = get_list_of_dict(
        ('context_ref', 'decimals', 'id', 'master_id', 'unit_ref', 'value'), rows)
    # Rows that share a context_ref collapse into one entry; the later row wins
    return {d['context_ref']: {k: v for k, v in d.items() if k != 'context_ref'}
            for d in list_of_dicts}

processed = {k: {'context_values': process_dict(k, v)} for k, v in data.items()}

pprint.pprint(processed)
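
To make the zip behaviour mentioned in this answer concrete, here is a small sketch using a shortened copy of the data above. The names refs, ids, truncated, and broadcast are illustrative only. It shows plain zip cutting the result down to one row, and itertools.cycle repeating the single id so that every context_ref gets paired.

```python
import itertools

refs = ['D20210801-20220731', 'D20200801-20210731', 'D20190801-20200731']
ids = ['us-gaap:WeightedAverageNumberOfDilutedSharesOutstanding']  # single element

# Plain zip stops as soon as the shortest input runs out, so only one row survives.
truncated = list(zip(refs, ids))

# cycle() repeats the single id indefinitely; zip still stops when refs runs out.
broadcast = list(zip(refs, itertools.cycle(ids)))

print(len(truncated))  # 1
print(len(broadcast))  # 3
```

One caveat with this approach: cycle hides any length mismatch, so it is best applied only to fields that are known to hold a single repeated value. If one of those lists were empty, cycle would yield nothing and zip would return no rows at all.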

huangapple
  • Posted on January 9, 2023 at 16:46:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75054863.html