Combining Dumper class with string representer to get exact required YAML output


Question


I'm using PyYAML 6.0 with Python 3.9.

In order, I am trying to...

  1. Create a YAML list
  2. Embed this list as a multi-line string in another YAML object
  3. Replace this YAML object in an existing document
  4. Write the document back, in a format that will pass YAML 1.2 linting

I have the process working, apart from the YAML 1.2 requirement, with the following code:

    import yaml


    def str_presenter(dumper, data):
        """Configure yaml for dumping multiline strings.

        Ref: https://stackoverflow.com/questions/8640959/how-can-i-control-what-scalar-form-pyyaml-uses-for-my-data
        """
        if data.count('\n') > 0:  # check for multiline string
            return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
        return dumper.represent_scalar('tag:yaml.org,2002:str', data)


    # Register the presenter on both the default and the safe representer
    yaml.add_representer(str, str_presenter)
    yaml.representer.SafeRepresenter.add_representer(str, str_presenter)


    class DoYamlStuff:
        @staticmethod
        def post_renderers(images):
            # Dump the JSON-patch list to a YAML string
            return yaml.dump([
                {
                    "op": "replace",
                    "path": "/spec/postRenderers",
                    "value": [
                        {
                            "kustomize": {
                                "images": images
                            }
                        }
                    ]
                }])

        @classmethod
        def images_patch(cls, chart, images, ecr_url):
            return {
                "target": {
                    "kind": "HelmRelease",
                    "name": chart,
                    "namespace": chart
                },
                # The dumped patch is embedded as a multi-line string
                "patch": cls.post_renderers([x.patch(ecr_url) for x in images])
            }

This produces something like this:

    - patch: |
        - op: replace
          path: /spec/postRenderers
          value:
          - kustomize:
              images:
              - name: nginx:latest
                newName: 12345678910.dkr.ecr.eu-west-1.amazonaws.com/nginx
                newTag: latest
      target:
        kind: HelmRelease
        name: nginx
        namespace: nginx

As you can see, that's mostly working. Valid YAML, does what it needs to, etc.

Unfortunately... it doesn't indent the list item by 2 spaces, so the YAML linter in our repository's pre-commit then adjusts everything. Makes the repo messy, and causes PRs to regularly include changes that aren't relevant.

I then set out to implement this PrettyDumper class from StackOverflow. This reversed the effects - my indentation is now right, but my scalars aren't working at all:

    - patch: "- op: replace\n path: /spec/postRenderers\n value:\n - kustomize:\n\
        \ images:\n - name: nginx:latest\n \
        \ newName: 793961818876.dkr.ecr.eu-west-1.amazonaws.com/nginx\n \
        \ newTag: latest\n"
      target:
        kind: HelmRelease
        name: nginx
        namespace: nginx

I have tried to merge the str_presenter function with the PrettyDumper class, but the scalars still don't work:

    import yaml.emitter
    import yaml.serializer
    import yaml.representer
    import yaml.resolver


    class IndentingEmitter(yaml.emitter.Emitter):
        def increase_indent(self, flow=False, indentless=False):
            """Ensure that lists items are always indented."""
            return super().increase_indent(
                flow=False,
                indentless=False,
            )


    class PrettyDumper(
        IndentingEmitter,
        yaml.serializer.Serializer,
        yaml.representer.Representer,
        yaml.resolver.Resolver,
    ):
        def __init__(
            self,
            stream,
            default_style=None,
            default_flow_style=False,
            canonical=None,
            indent=None,
            width=None,
            allow_unicode=None,
            line_break=None,
            encoding=None,
            explicit_start=None,
            explicit_end=None,
            version=None,
            tags=None,
            sort_keys=True,
        ):
            IndentingEmitter.__init__(
                self,
                stream,
                canonical=canonical,
                indent=indent,
                width=width,
                allow_unicode=allow_unicode,
                line_break=line_break,
            )
            yaml.serializer.Serializer.__init__(
                self,
                encoding=encoding,
                explicit_start=explicit_start,
                explicit_end=explicit_end,
                version=version,
                tags=tags,
            )
            yaml.representer.Representer.__init__(
                self,
                default_style=default_style,
                default_flow_style=default_flow_style,
                sort_keys=sort_keys,
            )
            yaml.resolver.Resolver.__init__(self)
            yaml.add_representer(str, self.str_presenter)
            yaml.representer.SafeRepresenter.add_representer(
                str, self.str_presenter)

        def str_presenter(self, data):
            print(data)
            """configures yaml for dumping multiline strings
            Ref: https://stackoverflow.com/questions/8640959/how-can-i-control-what-scalar-form-pyyaml-uses-for-my-data"""
            if data.count('\n') > 0:  # check for multiline string
                return self.represent_scalar('tag:yaml.org,2002:str', data, style='|')
            return self.represent_scalar('tag:yaml.org,2002:str', data)

If I could merge these two approaches into the PrettyDumper class, I think it would do everything I require. Can anyone point me in the right direction?

Answer 1

Score: 1


If you need to pass your output through YAML 1.2 linting, you should not use PyYAML as it only supports (a subset of) YAML 1.1.
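As one concrete illustration of that YAML 1.1 behaviour, the sketch below loads an unquoted `yes` with PyYAML. The key name `enabled` is made up for the example. Under the YAML 1.2 core schema the same value would be a plain string, which is the kind of difference a 1.2 linter may flag.

```python
import yaml

# PyYAML follows YAML 1.1 resolution rules, so an unquoted yes/no
# is resolved to a boolean instead of staying a string.
loaded = yaml.safe_load("enabled: yes")
print(loaded)

# Quoting the value keeps it a string under both 1.1 and 1.2 rules.
quoted = yaml.safe_load("enabled: 'yes'")
print(quoted)
```

Quoting such values explicitly is one way to keep the output unambiguous whichever library later reads it.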

ruamel.yaml can handle more, e.g. using a sequence as a mapping key, something that PyYAML cannot handle at all, although it is valid YAML 1.1. Apart from that, it supports, and defaults to, YAML 1.2 loading/dumping (disclaimer: I am the author of that package).

Over the years ruamel.yaml's round-trip mode, which was originally built to preserve comments, has been extended and now handles superfluous quotes, anchor/alias name preservation, differently formatted string scalars, integers, floats, etc. You can use its underlying technology to easily get what you want, without mucking with representers:

    import sys
    import io
    import ruamel.yaml

    images = [
        dict(name='nginx:latest', newName='12345678910.dkr.ecr.eu-west-1.amazonaws.com/nginx', newTag='latest'),
    ]
    chart = 'nginx'


    def data_as_literal_scalar(d):
        """dump a data structure d and make it a literal scalar string for further dumping"""
        yaml = ruamel.yaml.YAML()
        yaml.indent(sequence=4, offset=2)  # this indents even the root sequence by 2 extra positions
        buf = io.StringIO()
        yaml.dump(d, buf)
        v = ''.join([x[2:] for x in buf.getvalue().splitlines(True)])  # strip extra positions
        return ruamel.yaml.scalarstring.LiteralScalarString(v)


    data = [dict(patch=data_as_literal_scalar([{
                "op": "replace",
                "path": "/spec/postRenderers",
                "value": [
                    {
                        "kustomize": {
                            "images": images
                        }
                    }
                ]
            }]),
            target={
                "kind": "HelmRelease",
                "name": chart,
                "namespace": chart
            },
    )]

    yaml = ruamel.yaml.YAML()
    yaml.dump(data, sys.stdout)

which gives:

    - patch: |
        - op: replace
          path: /spec/postRenderers
          value:
            - kustomize:
                images:
                  - name: nginx:latest
                    newName: 12345678910.dkr.ecr.eu-west-1.amazonaws.com/nginx
                    newTag: latest
      target:
        kind: HelmRelease
        name: nginx
        namespace: nginx
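For readers who want to stay on PyYAML despite the YAML 1.1 caveat, the merge asked about in the question can probably be sketched as below. This is not part of the answer above. The class name `IndentedDumper` and the trimmed sample data are assumptions made for the example. The likely reason the attempt in the question had no effect is that `yaml.add_representer` registers on `yaml.Dumper` by default, not on the custom dumper class, so the sketch registers the presenter on the subclass itself.

```python
import yaml


class IndentedDumper(yaml.Dumper):
    """Hypothetical Dumper that indents list items under their parent key."""

    def increase_indent(self, flow=False, indentless=False):
        # Ignore the emitter's request for an indentless sequence.
        return super().increase_indent(flow, False)


def str_presenter(dumper, data):
    # Use literal block style (|) for multi-line strings only.
    if "\n" in data:
        return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
    return dumper.represent_scalar("tag:yaml.org,2002:str", data)


# Register on the custom class, so it applies whenever this Dumper is used.
IndentedDumper.add_representer(str, str_presenter)

# Inner document, dumped first and then embedded as a string.
patch = yaml.dump(
    [{"op": "replace", "value": [{"a": 1}]}],
    Dumper=IndentedDumper,
)

# Outer document that carries the patch as a literal block scalar.
document = yaml.dump(
    [{"patch": patch, "target": {"kind": "HelmRelease"}}],
    Dumper=IndentedDumper,
)
print(document)
```

Passing `Dumper=IndentedDumper` to every `yaml.dump` call is what ties the indentation override and the string presenter together. Calls that omit it fall back to the stock dumper.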

  • Posted on 2023-03-21 00:41:43
  • Please keep the link to this article when reposting: https://go.coder-hub.com/75793013.html