How to force YAML to overlook duplicate keys?

Question

I'm currently trying to build an HTML parser that takes YAML files like the one below and converts them into actual HTML. I've built the parser to take the dict and pull out what's required, but I didn't realize YAML doesn't support duplicate elements/keys in the same dict; it takes the last one as its value. Is there a way I can force it to disregard this, or is there a similar config format (other than JSON) that would keep the minimal look?

The main reason I picked YAML is that it doesn't need anything like XML's open/close brackets (because at that point it would just be an XML-generated website) or JSON's massive amounts of open/close braces, quotes, etc.


html:
  head:
    title: 
      content: Always Under Development
    meta: 
      attributes:
        charset: utf-8
    link: 
      attributes: 
        rel: stylesheet
        href: static/css/base.css
  body:
    div:
      attributes:
        class: Index_Div_Nav_Bar
      ul:
        li:
          content:
            index : Home
        li:
          content:
            projects : Projects
    div:
      attributes:
        class: foreground
    footer:
      div:
        attributes:
          class: Index_Div_Social_Media
        a:
          content:
            img:
              attributes:
                src: static/images/Linkedin-logo-2011-2019-241642781.png
                style: 'width:8%; height:5%;'
          attributes:
            href: 'https://www.linkedin.com/'
        br:
        a:
          content:
            img:
              attributes:
                src: static/images/gitlab-logo-square-1426672963.png
                style: 'width:5%; height:5%'
          attributes:
            href: 'https://gitlab.com/AltTabLife'

I've tried to find other config formats and to use yaml.load directly; both failed. Formats like JSON and plain Python require quotes around everything and massive amounts of brain-overloading curly braces. Formats like INI don't support complex file structures (to my understanding).

yaml.load() doesn't seem to have a way to load the file the way I want. I just want to use the whitespace syntax to define where each element is meant to be nested.

Lastly, I've tried ruamel.yaml with duplicate keys allowed, but all that does is give a warning and then process the file as before, allowing only one div tree to make it through.

Answer 1

Score: 0

If your keys aren't unique, you probably want to use a list - note the dashes denoting each element.

import yaml

# A maps to a sequence (note the dashes); B maps to a nested mapping.
vals = """
A:
   - 1
   - 2
   - 3
B:
   C: 4
   D: 5
"""

print(yaml.load(vals, yaml.SafeLoader))

Yields:

{'A': [1, 2, 3], 'B': {'C': 4, 'D': 5}}
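
Applied to the question's document, this could look roughly like the sketch below. The restructured YAML is my own hypothetical rewrite of part of the original file, not something from the question: each sibling element becomes a list item holding a one-key mapping, so two div entries no longer collide. It assumes PyYAML is installed.

```python
import yaml  # PyYAML

# Hypothetical rewrite of part of the question's file: siblings become list items.
doc = """
body:
  - div:
      attributes:
        class: Index_Div_Nav_Bar
  - div:
      attributes:
        class: foreground
  - footer:
"""

data = yaml.safe_load(doc)

# Every list item is a one-key mapping, so repeated tag names are all kept, in order.
tags = [next(iter(item)) for item in data["body"]]
print(tags)
```

The cost is one extra dash and one extra indentation level per element, but the file stays valid YAML and any standard loader can read it.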

Answer 2

Score: 0

By specification, YAML does not allow duplicate keys in a mapping. If a parser incorrectly ignores this rule, there is no guarantee which of the values for the key is taken, especially because the YAML specification also states that the order of the keys is not significant. Since the usual data structure created for a YAML mapping is a Python dict, there is no way for it to hold multiple values for a key and also keep the order of all key-value pairs. (You can make each dict value a list of one or more elements, but that only preserves the order of the values for a single key; the original ordering of the key-value pairs is still lost.)
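
A small illustration of that limitation in plain Python, with no YAML library involved. The pairs are made-up sample data, chosen so that a footer sits between two div entries:

```python
from collections import defaultdict

# Key-value pairs in document order (hypothetical sample data).
pairs = [("div", "nav bar"), ("footer", "links"), ("div", "foreground")]

# A plain dict keeps only the last value for a repeated key.
as_dict = dict(pairs)

# Grouping values per key keeps both divs, but no longer records
# that the footer sat between them.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

print(as_dict)
print(dict(grouped))
```

Only the original list of pairs still holds both the duplicates and their relative order, which is why a list of tuples is a natural target structure here.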

What you are looking for is parsing something that is not YAML. But since it is close to YAML, you can start with a YAML parser and derive a parser for your purposes from it. E.g. when ruamel.yaml parses a mapping, all the key-value pairs are kept in order, and you can change the method that constructs a mapping to forgo checking for duplicate keys and create a data structure that keeps the information you need for generating HTML.

Assuming your input is in a file input.yaml:

from pathlib import Path
import ruamel.yaml

class MyConstructor(ruamel.yaml.RoundTripConstructor):
    def construct_mapping(self, node, datatyp, deep=False):
        if not isinstance(node, ruamel.yaml.nodes.MappingNode):
            raise ruamel.yaml.constructor.ConstructorError(
                None, None, f'expected a mapping node, but found {node.id!s}', node.start_mark,
            )
        ret_val = datatyp
        for key_node, value_node in node.value:
            # keys can be list -> deep
            key = self.construct_object(key_node, deep=True)
            assert isinstance(key, str)
            value = self.construct_object(value_node, deep=deep)
            # append a (key, value) tuple instead of filling a dict, so duplicates survive
            ret_val.append((key, value))
        return ret_val

    def construct_yaml_map(self, node):
        data = []
        yield data
        self.construct_mapping(node, data, deep=True)

MyConstructor.add_constructor(
    'tag:yaml.org,2002:map', MyConstructor.construct_yaml_map
)

file_in = Path('input.yaml')

yaml = ruamel.yaml.YAML()
yaml.Constructor = MyConstructor
data = yaml.load(file_in)

def attr(data):
    # build the attribute string for an element from its 'attributes' entry
    ret_val = ''
    if not isinstance(data, list):
        return ret_val
    for key, value in data:
        if key == 'attributes':
            for k1, v1 in value:
                ret_val += f' {k1}="{v1}"'
    return ret_val

def html(data, level=0):
    indent = '  ' * level
    if isinstance(data, list):
        for elem in data:
            if elem[0] == 'attributes':
                continue
            # note: elem[1][0] is a (key, value) tuple, so the second test never
            # matches and attribute-only elements still get a closing tag
            if elem[1] is None or len(elem[1]) == 1 and elem[1][0] == 'attributes':
                print(f'{indent}<{elem[0]}{attr(elem[1])}>')
                continue
            print(f'{indent}<{elem[0]}{attr(elem[1])}>')
            html(elem[1], level+1)
            print(f'{indent}</{elem[0]}>')
    elif isinstance(data, str):
        print(f'{indent}{data}')
    else:
        print('type', type(data))
        # raise NotImplementedError

html(data)

which gives:

<html>
  <head>
    <title>
      <content>
        Always Under Development
      </content>
    </title>
    <meta charset="utf-8">
    </meta>
    <link rel="stylesheet" href="static/css/base.css">
    </link>
  </head>
  <body>
    <div class="Index_Div_Nav_Bar">
      <ul>
        <li>
          <content>
            <index>
              Home
            </index>
          </content>
        </li>
        <li>
          <content>
            <projects>
              Projects
            </projects>
          </content>
        </li>
      </ul>
    </div>
    <div class="foreground">
    </div>
    <footer>
      <div class="Index_Div_Social_Media">
        <a href="https://www.linkedin.com/">
          <content>
            <img src="static/images/Linkedin-logo-2011-2019-241642781.png" style="width:8%; height:5%;">
            </img>
          </content>
        </a>
        <br>
        <a href="https://gitlab.com/AltTabLife">
          <content>
            <img src="static/images/gitlab-logo-square-1426672963.png" style="width:5%; height:5%">
            </img>
          </content>
        </a>
      </div>
    </footer>
  </body>
</html>
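
The output still contains closing tags such as </meta> and </img> for elements that HTML treats as void. One possible variation on the rendering step, sketched below, skips those closing tags, escapes attribute values, and returns lines instead of printing them. The names render and VOID_TAGS, the choice of void tags, and the sample data are my own assumptions, not part of the code above; the sketch only relies on the list-of-tuples shape the constructor produces.

```python
from html import escape

# Assumed subset of HTML void elements; extend as needed.
VOID_TAGS = {"meta", "link", "img", "br"}

def render(pairs, level=0):
    """Return HTML lines for a list of (tag, value) tuples."""
    lines = []
    indent = "  " * level
    for tag, value in pairs:
        if tag == "attributes":
            continue  # rendered as part of the parent's opening tag
        children = value if isinstance(value, list) else []
        attrs = "".join(
            f' {name}="{escape(str(val))}"'
            for key, sub in children if key == "attributes"
            for name, val in sub
        )
        lines.append(f"{indent}<{tag}{attrs}>")
        if tag in VOID_TAGS:
            continue  # void elements have no content and no closing tag
        if isinstance(value, str):
            lines.append(f"{indent}  {escape(value)}")
        else:
            lines.extend(render(children, level + 1))
        lines.append(f"{indent}</{tag}>")
    return lines

# Hand-written sample in the same shape as the loaded data (hypothetical).
sample = [("head", [("meta", [("attributes", [("charset", "utf-8")])]),
                    ("title", "Always Under Development")])]
print("\n".join(render(sample)))
```

Returning a list of lines rather than printing makes the result easier to write to a file or to check in a test.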

The data structure you end up with has a list of tuples where you would normally get a dict-like object. That way you preserve the order and can handle tuples that have the same first element (i.e. the "key"). YAML merge keys (<<) are not handled, but anchors/aliases can probably still be used in the input (if you use them, make sure you check for (infinite) recursion in the routine processing data).
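
A minimal sketch of such a recursion check, assuming the loaded data consists only of nested lists and tuples as produced above. The function name has_cycle is hypothetical, and I have not verified under which inputs the custom constructor actually yields a self-referencing structure; the self-containing list below is built by hand to stand in for one.

```python
def has_cycle(node, _path=None):
    """Return True if a nested list/tuple structure contains itself."""
    if not isinstance(node, (list, tuple)):
        return False
    path = _path if _path is not None else set()
    if id(node) in path:
        return True  # this container is already being visited further up
    path.add(id(node))
    try:
        return any(has_cycle(child, path) for child in node)
    finally:
        path.discard(id(node))  # leaving this branch, so shared subtrees are fine

tree = [("div", [("p", "text")])]
loop = [("div", None)]
loop[0] = ("div", loop)  # the list now contains itself

print(has_cycle(tree), has_cycle(loop))
```

Tracking only the containers on the current path (rather than everything ever seen) means a subtree that is merely reused in two places is not reported as a cycle.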

huangapple
  • Published on June 26, 2023, 05:37:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76552508.html