Ansible: How to wait_for service on one of many hosts?

Question

I'm using Ansible to spin up a service on my hosts to conduct some tests. One specific host is the (initial) primary host; if the service stops or fails on this host, the test has concluded. Each host has a host_id, which is assigned in the inventory and passed to the running service.

However, during execution, the primary host may change according to a list called primary_host_ids.
In this case, the initial primary node is no longer the one that indicates the completion of the test; instead, it is one of the other nodes whose host_id appears in this list.

In some cases (depending on the settings for this specific test iteration), the test may not conclude within a certain time limit. If that is the case, I also want to stop it and start the next test with the new parameters.

In pseudo-code, this would be:

for every parameter_set in parameter_set_list:
  Extract facts from the parameter_set
  Change configuration files according to the set facts
  Start the service
  Wait for the service to finish on at least one of the potential primary nodes OR for the timeout to occur
  Stop the service
  Fetch some logs

I've written a role to handle the test loop:

# roles/run-tests/main.yml
- name: Loop test
  include_tasks: run-single-test.yml
  loop: "{{ parameter_set_list }}"
  loop_control:
    loop_var: parameter_set

# play.yml
- hosts: all
  name: Loop test role
  roles:
    - name: Run test
      role: run-tests
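
For context, the variables referenced above (host_id, primary_host_ids, parameter_set_list) could be defined in a YAML inventory along these lines. This is only a sketch: the host names, the ID values, and the fields inside each parameter set (name, timeout_minutes) are made-up placeholders, not taken from the original setup.

```yaml
# inventory.yml (hypothetical example; all names and values are assumptions)
all:
  vars:
    # IDs of the hosts that may act as primary during a test
    primary_host_ids: [1, 2]
    # One entry per test iteration; the keys inside each entry are placeholders
    parameter_set_list:
      - name: baseline
        timeout_minutes: 60
      - name: stress
        timeout_minutes: 120
  hosts:
    node1.example.com:
      host_id: 1
    node2.example.com:
      host_id: 2
    node3.example.com:
      host_id: 3
```

With integer IDs on both sides, a condition such as host_id in primary_host_ids evaluates as expected. If one side ends up as a string (for example when passed with -e key=value), a cast such as | int would be needed before comparing.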

I've tried various constructs using include_tasks, block, until, delay, retries, etc., but I can't find something that works. Below are some examples of what I tried:

# roles/run-tests/run-single-test.yml
# [...Setup...]
# The following doesn't work because I cannot use `until` with `block`
# Moving the block to its own tasks file also doesn't work because `until`/`delay` cannot be used with `import_tasks` or `include_tasks` either
- name: Wait for completion on at least one primary node
  when: host_id in primary_host_ids
  block:
    - name: Check if service is no longer running
      ansible.builtin.service_facts:
      register: result

    - name: Set end flag
      when: result.ansible_facts.services['myservice.service'].state != "running"
      ansible.builtin.set_fact:
        _primary_completed: true
  until: _primary_completed == true
  retries: 20
  delay: 300
  ignore_errors: true
# [...Cleanup...]

# roles/run-tests/run-single-test.yml
# [...Setup...]
# This loops incorrectly: The retry-until is executed first until completion or timeout and is then iterated over the different IDs
- name: Wait for completion on at least one primary host
  when: item|int == host_id|int
  ansible.builtin.service_facts:
  register: result
  until: result.ansible_facts.services['myservice.service'].state != "running"
  loop: "{{ primary_host_ids }}"
  retries: 20
  delay: 300
  ignore_errors: true
# [...Cleanup...]

# roles/run-tests/run-single-test.yml
# [...Setup...]
# This would wait for the service to complete on *all* potential primary hosts
# I only want to wait for *any* of them to complete
- name: Wait for completion on at least one primary host
  when: host_id in primary_host_ids
  ansible.builtin.service_facts:
  register: result
  until: result.ansible_facts.services['myservice.service'].state != "running"
  retries: 20
  delay: 300
  ignore_errors: true
# [...Cleanup...]

If there is no way to do this with just Ansible, is there a way to write a simple Bash script and execute multiple playbooks? I don't really care how ugly this gets but it's important that I find a way to run my tests automatically overnight.

Answer 1

Score: 0

I managed to find a solution based on the answer U880D linked:

# roles/run-tests/run-single-test.yml
# [...Setup...]
- name: Invalidate finish flag
  ansible.builtin.set_fact:
    experiment_finished: false
  delegate_to: localhost
  delegate_facts: true

- name: Wait for completion on at least one primary node
  when: host_id in primary_host_ids
  block:
    - name: Initialize facts
      ansible.builtin.set_fact:
        tries_max: 20
        tries_counter: 1

    - name: Run check
      ansible.builtin.include_tasks: "./wait-for-completion.yml"
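
As an aside on sizing the timeout: with the recursive include used in wait-for-completion.yml, the counter starts at 1 and is incremented before the repeat condition is evaluated, so tries_max: 20 yields 20 checks with 19 pauses in between, i.e. roughly 19 × 300 s = 95 minutes with the default sleep_duration. One possible way to derive tries_max from a desired timeout is sketched here; test_timeout_seconds is a hypothetical variable that does not appear in the original.

```yaml
# Hypothetical variant of the "Initialize facts" task.
# Assumption: test_timeout_seconds is defined elsewhere (e.g. per parameter set).
- name: Initialize facts from a desired timeout
  ansible.builtin.set_fact:
    # N checks give N - 1 pauses, hence the "+ 1"
    tries_max: >-
      {{ ((test_timeout_seconds | int) / (sleep_duration | default(300) | int))
         | round(0, 'ceil') | int + 1 }}
    tries_counter: 1
```

For instance, a timeout of 6000 seconds with the default 300-second sleep would give 21 checks separated by 20 pauses.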


# roles/run-tests/wait-for-completion.yml
---
- name: Get service status
  when: host_id in primary_host_ids
  ansible.builtin.service_facts:
  register: result
  ignore_errors: true

- name: Check if service is completed on primary nodes
  when:
    - host_id in primary_host_ids
    - result.ansible_facts.services['myservice.service'].state != "running"
  ansible.builtin.set_fact:
    experiment_finished: true
  delegate_to: localhost
  delegate_facts: true

- name: Print result
  when:
    - host_id in primary_host_ids
  ansible.builtin.debug:
    msg: "{{ hostvars['localhost']['experiment_finished'] }}"

- name: Increase try counter by 1
  ansible.builtin.set_fact:
    tries_counter: "{{ (tries_counter | int) + 1 }}"

- name: Sleep and repeat if service is not completed
  when:
    - not hostvars['localhost']['experiment_finished']
    - (tries_counter | int) <= (tries_max | int)
    - host_id in primary_host_ids
  block:
    - name: Sleep
      ansible.builtin.pause:
        seconds: "{{ sleep_duration | default(300) }}"

    - name: Run again
      ansible.builtin.include_tasks: "./wait-for-completion.yml"
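
Once the include returns, the flag stored on localhost tells the remaining tasks whether a primary node finished or the retries ran out. A minimal sketch of how the cleanup part of run-single-test.yml might use it follows. The task names, the use of become, and the choice of ansible.builtin.service for stopping the unit are assumptions, since the original cleanup tasks are not shown.

```yaml
# roles/run-tests/run-single-test.yml (hypothetical continuation after the wait block)
- name: Report how the wait ended
  run_once: true
  ansible.builtin.debug:
    msg: >-
      {{ 'A primary node finished the test'
         if (hostvars['localhost']['experiment_finished'] | bool)
         else 'Retries exhausted before any primary node finished' }}

# Assumption: the unit is called myservice.service, as in the checks above,
# and stopping it requires privilege escalation
- name: Stop the service on all hosts
  become: true
  ansible.builtin.service:
    name: myservice.service
    state: stopped
```

Because the flag lives in the hostvars of localhost, every host in the play reads the same value, so the report only needs to run once.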

huangapple
  • Published on March 7, 2023, 03:22:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75654984.html