Combining Dumper class with string representer to get exact required YAML output


Question

I'm using PyYAML 6.0 with Python 3.9.

In order, I am trying to:

  1. Create a YAML list
  2. Embed this list as a multi-line string in another YAML object
  3. Replace this YAML object in an existing document
  4. Write the document back, in a format that will pass YAML 1.2 linting

I have the process working, apart from the YAML 1.2 requirement, with the following code:

import yaml

def str_presenter(dumper, data):
    """configures yaml for dumping multiline strings
    Ref: https://stackoverflow.com/questions/8640959/how-can-i-control-what-scalar-form-pyyaml-uses-for-my-data"""
    if data.count('\n') > 0:  # check for multiline string
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)

yaml.add_representer(str, str_presenter)
yaml.representer.SafeRepresenter.add_representer(str, str_presenter)

class DoYamlStuff:
    @staticmethod
    def post_renderers(images):
        return yaml.dump([
            {
                "op": "replace",
                "path": "/spec/postRenderers",
                "value": [
                    {
                        "kustomize": {
                            "images": images
                        }
                    }
                ]
            }])

    @classmethod
    def images_patch(cls, chart, images, ecr_url):
        return {
            "target": {
                "kind": "HelmRelease",
                "name": chart,
                "namespace": chart
            },
            "patch": cls.post_renderers([x.patch(ecr_url) for x in images])
        }

This produces something like this:

- patch: |
    - op: replace
      path: /spec/postRenderers
      value:
      - kustomize:
          images:
          - name: nginx:latest
            newName: 12345678910.dkr.ecr.eu-west-1.amazonaws.com/nginx
            newTag: latest    
  target:
    kind: HelmRelease
    name: nginx
    namespace: nginx

As you can see, that mostly works: it is valid YAML and does what it needs to.

Unfortunately, it doesn't indent the list items by 2 spaces, so the YAML linter in our repository's pre-commit hook then reformats everything. That makes the repo messy and causes PRs to regularly include changes that aren't relevant.

I then set out to implement this PrettyDumper class from StackOverflow. This reversed the effects - my indentation is now right, but my scalars aren't working at all:

  - patch: "- op: replace\n  path: /spec/postRenderers\n  value:\n    - kustomize:\n\
      \        images:\n          - name: nginx:latest\n           \
      \ newName: 793961818876.dkr.ecr.eu-west-1.amazonaws.com/nginx\n        \
      \    newTag: latest\n"
    target:
      kind: HelmRelease
      name: nginx
      namespace: nginx

I have tried to merge the str_presenter function with the PrettyDumper class, but the scalars still don't work:

import yaml.emitter
import yaml.serializer
import yaml.representer
import yaml.resolver


class IndentingEmitter(yaml.emitter.Emitter):
    def increase_indent(self, flow=False, indentless=False):
        """Ensure that lists items are always indented."""
        return super().increase_indent(
            flow=False,
            indentless=False,
        )


class PrettyDumper(
    IndentingEmitter,
    yaml.serializer.Serializer,
    yaml.representer.Representer,
    yaml.resolver.Resolver,
):
    def __init__(
        self,
        stream,
        default_style=None,
        default_flow_style=False,
        canonical=None,
        indent=None,
        width=None,
        allow_unicode=None,
        line_break=None,
        encoding=None,
        explicit_start=None,
        explicit_end=None,
        version=None,
        tags=None,
        sort_keys=True,
    ):
        IndentingEmitter.__init__(
            self,
            stream,
            canonical=canonical,
            indent=indent,
            width=width,
            allow_unicode=allow_unicode,
            line_break=line_break,
        )
        yaml.serializer.Serializer.__init__(
            self,
            encoding=encoding,
            explicit_start=explicit_start,
            explicit_end=explicit_end,
            version=version,
            tags=tags,
        )
        yaml.representer.Representer.__init__(
            self,
            default_style=default_style,
            default_flow_style=default_flow_style,
            sort_keys=sort_keys,
        )
        yaml.resolver.Resolver.__init__(self)
        
        yaml.add_representer(str, self.str_presenter)
        yaml.representer.SafeRepresenter.add_representer(
            str, self.str_presenter) 

    def str_presenter(self, data):
        print(data)
        """configures yaml for dumping multiline strings
        Ref: https://stackoverflow.com/questions/8640959/how-can-i-control-what-scalar-form-pyyaml-uses-for-my-data"""
        if data.count('\n') > 0:  # check for multiline string
            return self.represent_scalar('tag:yaml.org,2002:str', data, style='|')
        return self.represent_scalar('tag:yaml.org,2002:str', data)

If I could merge these two approaches into the PrettyDumper class, I think it would do everything I require. Can anyone point me in the right direction?
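For reference, here is a minimal sketch of one way the two pieces might be combined within PyYAML itself. It assumes the representer is registered on the dumper class that is actually passed to yaml.dump (in the attempt above it is registered on the module defaults, which PrettyDumper does not consult). The name IndentedDumper and the sample data are made up for illustration, and this does not change which YAML version PyYAML implements.

```python
import yaml


class IndentedDumper(yaml.Dumper):
    """Hypothetical name: a Dumper that always indents block sequences."""

    def increase_indent(self, flow=False, indentless=False):
        # Ignore the "indentless" request so nested list items get indented.
        return super().increase_indent(flow=flow, indentless=False)


def str_presenter(dumper, data):
    """Emit multi-line strings as literal block scalars (style '|')."""
    if "\n" in data:
        return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
    return dumper.represent_scalar("tag:yaml.org,2002:str", data)


# Register on the class that is passed as Dumper=..., not on the module default.
IndentedDumper.add_representer(str, str_presenter)

# Illustrative data only, shaped like the patch in the question.
inner = yaml.dump(
    [{
        "op": "replace",
        "path": "/spec/postRenderers",
        "value": [{"kustomize": {"images": [{"name": "nginx:latest"}]}}],
    }],
    Dumper=IndentedDumper,
    sort_keys=False,
)
outer = yaml.dump(
    [{"patch": inner, "target": {"kind": "HelmRelease", "name": "nginx"}}],
    Dumper=IndentedDumper,
    sort_keys=False,
)
print(outer)
```

Because add_representer is a classmethod that stores the function per class, registering it once at module level on the subclass should be enough; nothing needs to happen in __init__.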

Answer 1

Score: 1

If you need to pass your output through YAML 1.2 linting, you should not use PyYAML as it only supports (a subset of) YAML 1.1.
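To make the YAML 1.1 point concrete, here is a small sketch of how PyYAML resolves plain scalars using YAML 1.1 rules. The keys and values are made up for illustration. Under the YAML 1.2 core schema, `yes` would be an ordinary string and `010` the decimal integer 10.

```python
import yaml

# Hypothetical sample document; the keys are for illustration only.
doc = yaml.safe_load("enabled: yes\nmode: 010\n")

# PyYAML applies YAML 1.1 resolution: "yes" becomes a boolean
# and "010" is read as an octal integer.
print(doc)
```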

ruamel.yaml can handle more, e.g. using a sequence as a mapping key, something that PyYAML cannot handle at all, although it is valid YAML 1.1. Apart from that it supports, and defaults to, YAML 1.2 loading/dumping (disclaimer: I am the author of that package).

Over the years ruamel.yaml's round-trip mode, which was originally built to preserve comments, has been extended and now handles superfluous quotes, anchor/alias name preservation, differently formatted string scalars, integers, floats, etc. You can use its underlying technology to easily get what you want, without mucking with representers:

import sys
import io
import ruamel.yaml

images = [
   dict(name='nginx:latest', newName='12345678910.dkr.ecr.eu-west-1.amazonaws.com/nginx', newTag='latest'),
]
chart = 'nginx'

def data_as_literal_scalar(d):
    """dump a data structure d and make it a literal scalar string for further dumping"""
    yaml = ruamel.yaml.YAML()
    yaml.indent(sequence=4, offset=2)  # this indents even the root sequence by 2 extra positions
    buf = io.StringIO()
    yaml.dump(d, buf)
    v = ''.join([x[2:] for x in buf.getvalue().splitlines(True)])  # strip extra positions
    return ruamel.yaml.scalarstring.LiteralScalarString(v)

data = [dict(
    patch=data_as_literal_scalar([{
        "op": "replace",
        "path": "/spec/postRenderers",
        "value": [
            {
                "kustomize": {
                    "images": images
                }
            }
        ]
    }]),
    target={
        "kind": "HelmRelease",
        "name": chart,
        "namespace": chart
    },
)]

yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout)

which gives:

- patch: |
    - op: replace
      path: /spec/postRenderers
      value:
        - kustomize:
            images:
              - name: nginx:latest
                newName: 12345678910.dkr.ecr.eu-west-1.amazonaws.com/nginx
                newTag: latest
  target:
    kind: HelmRelease
    name: nginx
    namespace: nginx

huangapple
  • Posted on March 21, 2023, 00:41:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75793013.html