Combining Dumper class with string representer to get exact required YAML output
Question
I'm using PyYAML 6.0 with Python 3.9.
In order, I am trying to...
- Create a YAML list
- Embed this list as a multi-line string in another YAML object
- Replace this YAML object in an existing document
- Write the document back, in a format that will pass YAML 1.2 linting
I have the process working, apart from the YAML 1.2 requirement, with the following code:
import yaml


def str_presenter(dumper, data):
    """configures yaml for dumping multiline strings
    Ref: https://stackoverflow.com/questions/8640959/how-can-i-control-what-scalar-form-pyyaml-uses-for-my-data"""
    if data.count('\n') > 0:  # check for multiline string
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)


yaml.add_representer(str, str_presenter)
yaml.representer.SafeRepresenter.add_representer(
    str, str_presenter)


class DoYamlStuff:
    @staticmethod
    def post_renderers(images):
        # Dump the JSON-patch list to a YAML string so it can be embedded as a scalar
        return yaml.dump([
            {
                "op": "replace",
                "path": "/spec/postRenderers",
                "value": [
                    {
                        "kustomize": {
                            "images": images
                        }
                    }
                ]
            }])

    @classmethod
    def images_patch(cls, chart, images, ecr_url):
        return {
            "target": {
                "kind": "HelmRelease",
                "name": chart,
                "namespace": chart
            },
            "patch": cls.post_renderers([x.patch(ecr_url) for x in images])
        }
This produces something like this:
- patch: |
    - op: replace
      path: /spec/postRenderers
      value:
      - kustomize:
          images:
          - name: nginx:latest
            newName: 12345678910.dkr.ecr.eu-west-1.amazonaws.com/nginx
            newTag: latest
  target:
    kind: HelmRelease
    name: nginx
    namespace: nginx
As you can see, that's mostly working. Valid YAML, does what it needs to, etc.
Unfortunately... it doesn't indent the list item by 2 spaces, so the YAML linter in our repository's pre-commit then adjusts everything. Makes the repo messy, and causes PRs to regularly include changes that aren't relevant.
I then set out to implement this PrettyDumper class from StackOverflow. This reversed the effects - my indentation is now right, but my scalars aren't working at all:
- patch: "- op: replace\n  path: /spec/postRenderers\n  value:\n  - kustomize:\n\
    \      images:\n      - name: nginx:latest\n      \
    \  newName: 793961818876.dkr.ecr.eu-west-1.amazonaws.com/nginx\n      \
    \  newTag: latest\n"
  target:
    kind: HelmRelease
    name: nginx
    namespace: nginx
I have tried to merge the str_presenter function with the PrettyDumper class, but the scalars still don't work:
import yaml.emitter
import yaml.serializer
import yaml.representer
import yaml.resolver


class IndentingEmitter(yaml.emitter.Emitter):
    def increase_indent(self, flow=False, indentless=False):
        """Ensure that lists items are always indented."""
        return super().increase_indent(
            flow=False,
            indentless=False,
        )


class PrettyDumper(
    IndentingEmitter,
    yaml.serializer.Serializer,
    yaml.representer.Representer,
    yaml.resolver.Resolver,
):
    def __init__(
        self,
        stream,
        default_style=None,
        default_flow_style=False,
        canonical=None,
        indent=None,
        width=None,
        allow_unicode=None,
        line_break=None,
        encoding=None,
        explicit_start=None,
        explicit_end=None,
        version=None,
        tags=None,
        sort_keys=True,
    ):
        IndentingEmitter.__init__(
            self,
            stream,
            canonical=canonical,
            indent=indent,
            width=width,
            allow_unicode=allow_unicode,
            line_break=line_break,
        )
        yaml.serializer.Serializer.__init__(
            self,
            encoding=encoding,
            explicit_start=explicit_start,
            explicit_end=explicit_end,
            version=version,
            tags=tags,
        )
        yaml.representer.Representer.__init__(
            self,
            default_style=default_style,
            default_flow_style=default_flow_style,
            sort_keys=sort_keys,
        )
        yaml.resolver.Resolver.__init__(self)
        yaml.add_representer(str, self.str_presenter)
        yaml.representer.SafeRepresenter.add_representer(
            str, self.str_presenter)

    def str_presenter(self, data):
        print(data)
        """configures yaml for dumping multiline strings
        Ref: https://stackoverflow.com/questions/8640959/how-can-i-control-what-scalar-form-pyyaml-uses-for-my-data"""
        if data.count('\n') > 0:  # check for multiline string
            return self.represent_scalar('tag:yaml.org,2002:str', data, style='|')
        return self.represent_scalar('tag:yaml.org,2002:str', data)
If I could merge these two approaches into the PrettyDumper class, I think it would do everything I require. Can anyone point me in the right direction?
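One likely reason the merged attempt has no effect: yaml.add_representer registers on yaml.Dumper unless a Dumper argument is passed, so the representer never reaches PrettyDumper. Below is a minimal sketch of how the two approaches might be combined while staying on PyYAML, by subclassing yaml.Dumper and registering the string representer on that subclass. The names IndentedDumper and dump_pretty are hypothetical, and the sample data is shortened for illustration.

```python
import yaml  # PyYAML


class IndentedDumper(yaml.Dumper):
    """Hypothetical dumper that always indents block sequences."""

    def increase_indent(self, flow=False, indentless=False):
        # Ignore the emitter's request for an "indentless" sequence.
        return super().increase_indent(flow, False)


def str_presenter(dumper, data):
    # Use literal block style (|) for multi-line strings only.
    if "\n" in data:
        return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
    return dumper.represent_scalar("tag:yaml.org,2002:str", data)


# Register on the subclass itself; yaml.add_representer() without a
# Dumper argument would register on yaml.Dumper instead.
IndentedDumper.add_representer(str, str_presenter)


def dump_pretty(obj):
    # Hypothetical helper: dump with the custom dumper, keeping key order.
    return yaml.dump(obj, Dumper=IndentedDumper, sort_keys=False)


inner = dump_pretty([{"op": "replace", "value": [{"kustomize": "nginx"}]}])
outer = dump_pretty([{"patch": inner, "target": {"kind": "HelmRelease"}}])
print(outer, end="")
```

This only addresses indentation and scalar style. PyYAML still implements YAML 1.1, so whether the result satisfies a particular YAML 1.2 linter is not guaranteed.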
Answer 1
Score: 1
If you need to pass your output through YAML 1.2 linting, you should not use PyYAML as it only supports (a subset of) YAML 1.1.
ruamel.yaml can handle more, e.g. using a sequence as a mapping key, something that PyYAML cannot handle at all, although it is valid YAML 1.1. Apart from that, it supports, and defaults to, YAML 1.2 loading/dumping (disclaimer: I am the author of that package).
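As a small illustration of the YAML 1.1 behaviour in PyYAML (a sketch with a made-up document, not taken from the question): plain scalars such as yes and on are resolved as booleans, whereas a parser following the YAML 1.2 core schema would read them as strings.

```python
import yaml  # PyYAML

# Made-up sample document; "yes" and "on" are YAML 1.1 boolean spellings.
doc = "enabled: yes\nverbose: on\nname: nginx\n"
loaded = yaml.safe_load(doc)
print(loaded)
```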
Over the years, ruamel.yaml's round-trip mode, which was originally built to preserve comments, has been extended and now handles superfluous quotes, anchor/alias name preservation, differently formatted string scalars, integers, floats, etc. You can use its underlying technology to easily get what you want, without mucking with representers:
import sys
import io
import ruamel.yaml

images = [
    dict(name='nginx:latest', newName='12345678910.dkr.ecr.eu-west-1.amazonaws.com/nginx', newTag='latest'),
]
chart = 'nginx'


def data_as_literal_scalar(d):
    """dump a data structure d and make it a literal scalar string for further dumping"""
    yaml = ruamel.yaml.YAML()
    yaml.indent(sequence=4, offset=2)  # this indents even the root sequence by 2 extra positions
    buf = io.StringIO()
    yaml.dump(d, buf)
    v = ''.join([x[2:] for x in buf.getvalue().splitlines(True)])  # strip extra positions
    return ruamel.yaml.scalarstring.LiteralScalarString(v)


data = [dict(patch=data_as_literal_scalar([{
            "op": "replace",
            "path": "/spec/postRenderers",
            "value": [
                {
                    "kustomize": {
                        "images": images
                    }
                }
            ]
        }]),
        target={
            "kind": "HelmRelease",
            "name": chart,
            "namespace": chart
        },
)]

yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout)
which gives:
- patch: |
    - op: replace
      path: /spec/postRenderers
      value:
        - kustomize:
            images:
              - name: nginx:latest
                newName: 12345678910.dkr.ecr.eu-west-1.amazonaws.com/nginx
                newTag: latest
  target:
    kind: HelmRelease
    name: nginx
    namespace: nginx
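The "strip extra positions" step in data_as_literal_scalar can be tried on its own, without ruamel.yaml. The sketch below uses a hypothetical helper strip_offset and a hand-written sample string standing in for the dumped text.

```python
def strip_offset(text: str, width: int = 2) -> str:
    """Remove the first `width` characters of every line, keeping line endings."""
    return "".join(line[width:] for line in text.splitlines(True))


# Hand-written stand-in for a dump whose root sequence is offset by 2 spaces.
dumped = (
    "  - op: replace\n"
    "    value:\n"
    "      - kustomize: {}\n"
)
stripped = strip_offset(dumped)
print(stripped, end="")
```

One caveat: a line shorter than the offset, such as an empty line consisting only of "\n", is reduced to an empty string by this slicing, so its line break is lost. That does not matter for the data shown here, but it could for strings containing blank lines.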